Models API

API reference for object detection models.

Model Registry

The global registry for all detection models. Use this to build models by name.

objdet.models.registry.MODEL_REGISTRY

Generic registry for plugin-style component management.

This class provides a centralized registry where components can be registered by name and later retrieved. It supports both decorator-style and direct registration.

Parameters:

name – Name of this registry (for logging/error messages).

objdet.models.registry.name

Registry name.

objdet.models.registry._registry

Internal dictionary mapping names to registered items.

Example

>>> registry = Registry[nn.Module]("models")
>>> registry.register("my_model")(MyModelClass)
>>> model_cls = registry.get("my_model")
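The registry pattern described above can be sketched in a few lines of self-contained Python. This is an illustrative reimplementation, not the actual `objdet` source; the `Detector` class is a hypothetical stand-in:

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class Registry(Generic[T]):
    """Minimal registry sketch: register classes by name, then look them up or build them."""

    def __init__(self, name: str) -> None:
        self.name = name  # used in error messages
        self._registry: dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        """Decorator-style registration: @registry.register("my_model")."""
        def wrapper(cls: type) -> type:
            self._registry[name] = cls
            return cls
        return wrapper

    def get(self, name: str) -> type:
        if name not in self._registry:
            raise KeyError(f"{name!r} not found in {self.name} registry")
        return self._registry[name]

    def build(self, name: str, **kwargs):
        """Instantiate a registered class with keyword arguments."""
        return self.get(name)(**kwargs)


registry = Registry("models")


@registry.register("my_model")
class Detector:
    def __init__(self, num_classes: int = 80) -> None:
        self.num_classes = num_classes


model = registry.build("my_model", num_classes=20)
print(model.num_classes)  # 20
```

Direct registration works the same way without the decorator sugar, e.g. `registry.register("my_model")(Detector)`.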

Registered Models

| Name        | Aliases            | Class      |
|-------------|--------------------|------------|
| faster_rcnn | fasterrcnn, frcnn  | FasterRCNN |
| retinanet   | -                  | RetinaNet  |
| yolov8      | yolo8              | YOLOv8     |
| yolov11     | yolo11             | YOLOv11    |

from objdet.models import MODEL_REGISTRY

# Build model from registry
model = MODEL_REGISTRY.build("faster_rcnn", num_classes=80)

Base Class

BaseLightningDetector

class objdet.models.base.BaseLightningDetector(num_classes, class_index_mode=ClassIndexMode.TORCHVISION, learning_rate=0.001, weight_decay=0.0001, confidence_threshold=0.25, nms_threshold=0.45, pretrained=True, pretrained_backbone=True, optimizer='adamw', scheduler='cosine', scheduler_kwargs=None)[source]

Bases: LightningModule

Abstract base class for object detection models.

This class provides common functionality for all detection models:
  • Standard training/validation/test step implementations
  • Metric computation (mAP)
  • Optimizer and scheduler configuration
  • Logging integration

Subclasses must implement:
  • forward() – Model forward pass
  • _build_model() – Model architecture construction
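The subclass contract can be illustrated with a stdlib-only sketch of the template-method pattern. `BaseDetectorSketch` and `ToyDetector` are hypothetical stand-ins; the real base class inherits from LightningModule and works with tensors:

```python
from abc import ABC, abstractmethod


class BaseDetectorSketch(ABC):
    """Illustration of the BaseLightningDetector contract: the base class
    drives the training loop; subclasses supply the model and forward pass."""

    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes
        self.model = self._build_model()  # architecture comes from the subclass

    @abstractmethod
    def _build_model(self):
        """Construct and return the model architecture."""

    @abstractmethod
    def forward(self, images, targets=None):
        """Return a loss dict when targets are given, predictions otherwise."""

    def training_step(self, batch):
        images, targets = batch
        losses = self.forward(images, targets)  # training mode -> loss dict
        return sum(losses.values())             # total loss for backprop


class ToyDetector(BaseDetectorSketch):
    def _build_model(self):
        return object()  # stand-in for an nn.Module

    def forward(self, images, targets=None):
        if targets is not None:
            return {"loss_cls": 0.5, "loss_box": 0.25}  # training: losses
        return [{"boxes": [], "labels": [], "scores": []} for _ in images]  # inference


det = ToyDetector(num_classes=3)
print(det.training_step(([0, 1], [{}, {}])))  # 0.75
print(det.forward([0, 1]))                    # two empty prediction dicts
```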

Parameters:
  • num_classes (int) – Number of object classes to detect (excluding background for TorchVision models, including all classes for YOLO).

  • class_index_mode (ClassIndexMode | str) – How class indices are handled. TORCHVISION expects background at index 0, YOLO has no background class.

  • learning_rate (float) – Initial learning rate for optimizer.

  • weight_decay (float) – Weight decay for optimizer.

  • confidence_threshold (float) – Minimum confidence for predictions.

  • nms_threshold (float) – IoU threshold for NMS.

  • pretrained (bool) – Whether to use pretrained weights.

  • pretrained_backbone (bool) – Whether to use pretrained backbone only.

num_classes

Number of detection classes.

class_index_mode

Class index handling mode.

hparams

Hyperparameters (auto-saved by Lightning).

Example

>>> model = MyDetector(num_classes=80, pretrained=True)
>>> trainer = L.Trainer(max_epochs=100)
>>> trainer.fit(model, datamodule)
abstractmethod forward(images, targets=None)[source]

Forward pass of the model.

Parameters:
  • images (list[Tensor]) – List of image tensors, each of shape (C, H, W).

  • targets (list[DetectionTarget] | None) – Optional list of target dictionaries for training. Each target contains ‘boxes’ and ‘labels’ at minimum.

Returns:

During training (targets provided): Dictionary of losses. During inference (no targets): List of prediction dicts.

training_step(batch, batch_idx)[source]

Perform a single training step.

Parameters:
  • batch (tuple[list[Tensor], list[DetectionTarget]] | list) – Tuple of (images, targets).

  • batch_idx (int) – Index of the current batch.

Return type:

Tensor

Returns:

Total loss for backpropagation.

validation_step(batch, batch_idx)[source]

Perform a single validation step.

Parameters:
  • batch (tuple[list[Tensor], list[DetectionTarget]] | list) – Tuple of (images, targets).

  • batch_idx (int) – Index of the current batch.

Return type:

None

on_validation_epoch_end()[source]

Compute and log validation metrics at epoch end.

Return type:

None

test_step(batch, batch_idx)[source]

Perform a single test step.

Parameters:
  • batch (tuple[list[Tensor], list[DetectionTarget]] | list) – Tuple of (images, targets).

  • batch_idx (int) – Index of the current batch.

Return type:

None

on_test_epoch_end()[source]

Compute and log test metrics at epoch end.

Return type:

None

predict_step(batch, batch_idx)[source]

Perform a single prediction step.

Parameters:
  • batch (tuple[list[Tensor], list[DetectionTarget]] | list) – Tuple of (images, targets).

  • batch_idx (int) – Index of the current batch.

Return type:

list[DetectionPrediction]

Returns:

List of prediction dictionaries.

configure_optimizers()[source]

Configure optimizer and learning rate scheduler.

Return type:

OptimizerLRSchedulerConfig

Returns:

Dictionary with optimizer and optional lr_scheduler configuration.

property num_model_classes: int

Get the number of classes expected by the model.

For TorchVision models, this includes the background class. For YOLO models, this is the same as num_classes.

Returns:

Number of classes for the model architecture.
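The logic behind this property can be sketched as a stand-alone function (an illustration, not the actual property implementation):

```python
from enum import Enum


class ClassIndexMode(Enum):
    TORCHVISION = "torchvision"  # background reserved at index 0
    YOLO = "yolo"                # no background class


def num_model_classes(num_classes: int, mode: ClassIndexMode) -> int:
    """TorchVision heads need one extra slot for the background
    class; YOLO heads use the class count as-is."""
    if mode is ClassIndexMode.TORCHVISION:
        return num_classes + 1
    return num_classes


print(num_model_classes(80, ClassIndexMode.TORCHVISION))  # 81
print(num_model_classes(80, ClassIndexMode.YOLO))         # 80
```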

get_model_info()[source]

Get model information dictionary.

Return type:

dict[str, Any]

Returns:

Dictionary with model metadata.


TorchVision Models

FasterRCNN

Two-stage detector with Region Proposal Network.

class objdet.models.torchvision.faster_rcnn.FasterRCNN(num_classes, backbone='resnet50_fpn_v2', pretrained=False, pretrained_backbone=True, trainable_backbone_layers=3, min_size=800, max_size=1333, **kwargs)[source]

Bases: BaseLightningDetector

Faster R-CNN with ResNet-50 FPN backbone.

This is a two-stage object detector consisting of:
  1. Region Proposal Network (RPN) for generating object proposals
  2. Fast R-CNN head for classification and bounding box regression

The model uses TorchVision class indexing (background at index 0).

Parameters:
  • num_classes (int) – Number of object classes (NOT including background). The model will internally use num_classes + 1.

  • backbone (str) – Backbone variant - “resnet50_fpn” or “resnet50_fpn_v2”.

  • pretrained (bool) – If True, use pretrained weights on COCO.

  • pretrained_backbone (bool) – If True, use ImageNet pretrained backbone.

  • trainable_backbone_layers (int) – Number of trainable backbone layers (0-5).

  • min_size (int) – Minimum image size for inference.

  • max_size (int) – Maximum image size for inference.

  • **kwargs (Any) – Additional arguments for BaseLightningDetector.

model

The underlying TorchVision Faster R-CNN model.

Example

>>> model = FasterRCNN(num_classes=20, pretrained_backbone=True)
>>> images = [torch.rand(3, 800, 600) for _ in range(4)]
>>> predictions = model(images)
forward(images, targets=None)[source]

Forward pass.

Parameters:
  • images (list[Tensor]) – List of image tensors (C, H, W).

  • targets (list[DetectionTarget] | None) – Optional list of target dicts for training.

Returns:

Training: Dict of losses. Inference: List of prediction dicts with ‘boxes’, ‘labels’, ‘scores’.

get_model_info()[source]

Get model information.

Return type:

dict[str, Any]

from objdet.models.torchvision import FasterRCNN
from lightning import Trainer

model = FasterRCNN(
    num_classes=80,
    backbone="resnet50_fpn_v2",
    pretrained_backbone=True,
    trainable_backbone_layers=3,
)

trainer = Trainer(max_epochs=100)
trainer.fit(model, datamodule)

RetinaNet

One-stage detector with focal loss.

class objdet.models.torchvision.retinanet.RetinaNet(num_classes, backbone='resnet50_fpn_v2', pretrained=False, pretrained_backbone=True, trainable_backbone_layers=3, min_size=800, max_size=1333, score_thresh=0.05, nms_thresh=0.5, detections_per_img=300, **kwargs)[source]

Bases: BaseLightningDetector

RetinaNet with ResNet-50 FPN backbone.

RetinaNet is a one-stage object detector that uses:
  1. Feature Pyramid Network (FPN) for multi-scale features
  2. Focal loss to address class imbalance
  3. Separate classification and regression heads

The model uses TorchVision class indexing (background at index 0).

Parameters:
  • num_classes (int) – Number of object classes (NOT including background).

  • backbone (str) – Backbone variant - “resnet50_fpn” or “resnet50_fpn_v2”.

  • pretrained (bool) – If True, use pretrained weights on COCO.

  • pretrained_backbone (bool) – If True, use ImageNet pretrained backbone.

  • trainable_backbone_layers (int) – Number of trainable backbone layers (0-5).

  • min_size (int) – Minimum image size for inference.

  • max_size (int) – Maximum image size for inference.

  • score_thresh (float) – Score threshold for predictions.

  • nms_thresh (float) – NMS threshold.

  • detections_per_img (int) – Maximum detections per image.

  • **kwargs (Any) – Additional arguments for BaseLightningDetector.

Example

>>> model = RetinaNet(num_classes=20, pretrained_backbone=True)
>>> trainer = Trainer(max_epochs=50)
>>> trainer.fit(model, datamodule)
forward(images, targets=None)[source]

Forward pass.

Parameters:
  • images (list[Tensor]) – List of image tensors (C, H, W).

  • targets (list[DetectionTarget] | None) – Optional list of target dicts for training.

Returns:

Training: Dict of losses {‘classification’, ‘bbox_regression’}. Inference: List of prediction dicts with ‘boxes’, ‘labels’, ‘scores’.

get_model_info()[source]

Get model information.

Return type:

dict[str, Any]

from objdet.models.torchvision import RetinaNet

model = RetinaNet(
    num_classes=80,
    backbone="resnet50_fpn_v2",
    pretrained_backbone=True,
    score_thresh=0.05,
    nms_thresh=0.5,
)
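The effect of `score_thresh` and `nms_thresh` can be illustrated with a pure-Python sketch of the post-processing (greedy NMS over `(x1, y1, x2, y2)` boxes; TorchVision's batched implementation differs, but the thresholds play the same roles):

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def postprocess(boxes, scores, score_thresh=0.05, nms_thresh=0.5):
    """Drop low-score boxes, then greedily suppress ones that
    overlap an already-kept box by more than nms_thresh."""
    kept = []
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(postprocess(boxes, scores))  # [0, 2] -- box 1 overlaps box 0 too much
```

Raising `nms_thresh` keeps more overlapping boxes; raising `score_thresh` discards more low-confidence ones before NMS even runs.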

YOLO Models

YOLOv8

class objdet.models.yolo.yolov8.YOLOv8(num_classes, model_size='n', pretrained=True, conf_thres=0.25, iou_thres=0.45, **kwargs)[source]

Bases: YOLOBaseLightning

YOLOv8 object detection model wrapped for Lightning.

YOLOv8 is a state-of-the-art real-time object detector featuring:
  • Anchor-free detection head
  • C2f modules for efficient feature extraction
  • Mosaic and MixUp augmentation (handled via transforms)
  • Task-aligned assigner for positive sample selection

Available model sizes:
  • n (nano): Fastest, lowest accuracy (~3.2M params)
  • s (small): Fast with good accuracy (~11.2M params)
  • m (medium): Balanced speed/accuracy (~25.9M params)
  • l (large): High accuracy (~43.7M params)
  • x (extra-large): Highest accuracy (~68.2M params)

Warning

There is a known bug in the training pipeline that causes IndexError: too many indices for tensor of dimension 2 during the loss computation. This affects training via both CLI and Python API. Investigation is ongoing to resolve this issue in the Ultralytics loss integration.

Parameters:
  • num_classes (int) – Number of object classes (no background).

  • model_size (str) – Model size variant (“n”, “s”, “m”, “l”, “x”).

  • pretrained (bool) – If True, load COCO pretrained weights.

  • conf_thres (float) – Confidence threshold for predictions.

  • iou_thres (float) – IoU threshold for NMS.

  • **kwargs (Any) – Additional arguments for BaseLightningDetector.

Example

>>> # Create YOLOv8-medium model
>>> model = YOLOv8(num_classes=20, model_size="m")
>>>
>>> # Train with Lightning
>>> trainer = Trainer(
...     max_epochs=100,
...     callbacks=[ModelCheckpoint(monitor="val/mAP")],
... )
>>> trainer.fit(model, datamodule)
MODEL_VARIANTS = {'l': 'yolov8l.pt', 'm': 'yolov8m.pt', 'n': 'yolov8n.pt', 's': 'yolov8s.pt', 'x': 'yolov8x.pt'}

Model Sizes:

| Size            | Variant    | Parameters |
|-----------------|------------|------------|
| n (nano)        | yolov8n.pt | ~3.2M      |
| s (small)       | yolov8s.pt | ~11.2M     |
| m (medium)      | yolov8m.pt | ~25.9M     |
| l (large)       | yolov8l.pt | ~43.7M     |
| x (extra-large) | yolov8x.pt | ~68.2M     |

from objdet.models.yolo import YOLOv8

model = YOLOv8(
    num_classes=80,
    model_size="m",
    pretrained=True,
    conf_thres=0.25,
    iou_thres=0.45,
)



YOLOv11

class objdet.models.yolo.yolov11.YOLOv11(num_classes, model_size='n', pretrained=True, conf_thres=0.25, iou_thres=0.45, **kwargs)[source]

Bases: YOLOBaseLightning

YOLOv11 (YOLO11) object detection model wrapped for Lightning.

YOLOv11 is the latest iteration of the YOLO series featuring:
  • Improved C3k2 blocks for better feature extraction
  • Enhanced attention mechanisms
  • Better small object detection
  • Optimized architecture for efficiency

Available model sizes:
  • n (nano): Fastest, lowest accuracy
  • s (small): Fast with good accuracy
  • m (medium): Balanced speed/accuracy
  • l (large): High accuracy
  • x (extra-large): Highest accuracy

Warning

There is a known bug in the training pipeline that may cause IndexError: too many indices for tensor of dimension 2 during the loss computation. This is the same issue as YOLOv8. Investigation is ongoing to resolve this issue.

Parameters:
  • num_classes (int) – Number of object classes (no background).

  • model_size (str) – Model size variant (“n”, “s”, “m”, “l”, “x”).

  • pretrained (bool) – If True, load COCO pretrained weights.

  • conf_thres (float) – Confidence threshold for predictions.

  • iou_thres (float) – IoU threshold for NMS.

  • **kwargs (Any) – Additional arguments for BaseLightningDetector.

Example

>>> # Create YOLOv11-large model
>>> model = YOLOv11(num_classes=20, model_size="l")
>>>
>>> # Train with Lightning
>>> trainer = Trainer(max_epochs=100)
>>> trainer.fit(model, datamodule)
MODEL_VARIANTS = {'l': 'yolo11l.pt', 'm': 'yolo11m.pt', 'n': 'yolo11n.pt', 's': 'yolo11s.pt', 'x': 'yolo11x.pt'}

Model Sizes:

| Size            | Variant    |
|-----------------|------------|
| n (nano)        | yolo11n.pt |
| s (small)       | yolo11s.pt |
| m (medium)      | yolo11m.pt |
| l (large)       | yolo11l.pt |
| x (extra-large) | yolo11x.pt |

from objdet.models.yolo import YOLOv11

model = YOLOv11(
    num_classes=80,
    model_size="l",
    pretrained=True,
)
