Models API

API reference for object detection models.

Model Registry

The global registry for all detection models. Use this to build models by name.

objdet.models.registry.MODEL_REGISTRY

Generic registry for plugin-style component management.

This class provides a centralized registry where components can be registered by name and later retrieved. It supports both decorator-style and direct registration.

Parameters:

name – Name of this registry (for logging/error messages).

objdet.models.registry.name

Registry name.

objdet.models.registry._registry

Internal dictionary mapping names to registered items.

Example

>>> registry = Registry[nn.Module]("models")
>>> registry.register("my_model")(MyModelClass)
>>> model_cls = registry.get("my_model")
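The registry pattern described above can be sketched in a few lines of self-contained Python. This is an illustrative reimplementation, not the actual `objdet` source; the `Detector` class is a hypothetical stand-in:

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class Registry(Generic[T]):
    """Minimal registry sketch: register classes by name, then look them up or build them."""

    def __init__(self, name: str) -> None:
        self.name = name  # used in error messages
        self._registry: dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        """Decorator-style registration: @registry.register("my_model")."""
        def wrapper(cls: type) -> type:
            self._registry[name] = cls
            return cls
        return wrapper

    def get(self, name: str) -> type:
        if name not in self._registry:
            raise KeyError(f"{name!r} not found in {self.name} registry")
        return self._registry[name]

    def build(self, name: str, **kwargs):
        """Instantiate a registered class with keyword arguments."""
        return self.get(name)(**kwargs)


registry = Registry("models")


@registry.register("my_model")
class Detector:
    def __init__(self, num_classes: int = 80) -> None:
        self.num_classes = num_classes


model = registry.build("my_model", num_classes=20)
print(model.num_classes)  # 20
```

Direct registration works the same way without the decorator sugar, e.g. `registry.register("my_model")(Detector)`.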

Registered Models

| Name        | Aliases            | Class      |
|-------------|--------------------|------------|
| faster_rcnn | fasterrcnn, frcnn  | FasterRCNN |
| retinanet   | -                  | RetinaNet  |
| yolov8      | yolo8              | YOLOv8     |
| yolov11     | yolo11             | YOLOv11    |

from objdet.models import MODEL_REGISTRY

# Build model from registry
model = MODEL_REGISTRY.build("faster_rcnn", num_classes=80)

Base Class

BaseLightningDetector

class objdet.models.base.BaseLightningDetector(num_classes, class_index_mode=ClassIndexMode.TORCHVISION, learning_rate=0.001, weight_decay=0.0001, confidence_threshold=0.25, nms_threshold=0.45, pretrained=True, pretrained_backbone=True, optimizer='adamw', scheduler='cosine', scheduler_kwargs=None)[source]

Bases: LightningModule

Abstract base class for object detection models.

This class provides common functionality for all detection models:
  • Standard training/validation/test step implementations
  • Metric computation (mAP)
  • Optimizer and scheduler configuration
  • Logging integration

Subclasses must implement:
  • forward() – Model forward pass
  • _build_model() – Model architecture construction
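The subclass contract can be illustrated with a stdlib-only sketch of the template-method pattern. `BaseDetectorSketch` and `ToyDetector` are hypothetical stand-ins; the real base class inherits from LightningModule and works with tensors:

```python
from abc import ABC, abstractmethod


class BaseDetectorSketch(ABC):
    """Illustration of the BaseLightningDetector contract: the base class
    drives the training loop; subclasses supply the model and forward pass."""

    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes
        self.model = self._build_model()  # architecture comes from the subclass

    @abstractmethod
    def _build_model(self):
        """Construct and return the model architecture."""

    @abstractmethod
    def forward(self, images, targets=None):
        """Return a loss dict when targets are given, predictions otherwise."""

    def training_step(self, batch):
        images, targets = batch
        losses = self.forward(images, targets)  # training mode -> loss dict
        return sum(losses.values())             # total loss for backprop


class ToyDetector(BaseDetectorSketch):
    def _build_model(self):
        return object()  # stand-in for an nn.Module

    def forward(self, images, targets=None):
        if targets is not None:
            return {"loss_cls": 0.5, "loss_box": 0.25}  # training: losses
        return [{"boxes": [], "labels": [], "scores": []} for _ in images]  # inference


det = ToyDetector(num_classes=3)
print(det.training_step(([0, 1], [{}, {}])))  # 0.75
print(det.forward([0, 1]))                    # two empty prediction dicts
```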

Parameters:
  • num_classes (int) – Number of object classes to detect (excluding background for TorchVision models, including all classes for YOLO).

  • class_index_mode (ClassIndexMode | str) – How class indices are handled. TORCHVISION expects background at index 0, YOLO has no background class.

  • learning_rate (float) – Initial learning rate for optimizer.

  • weight_decay (float) – Weight decay for optimizer.

  • confidence_threshold (float) – Minimum confidence for predictions.

  • nms_threshold (float) – IoU threshold for NMS.

  • pretrained (bool) – Whether to use pretrained weights.

  • pretrained_backbone (bool) – Whether to use pretrained backbone only.

num_classes

Number of detection classes.

class_index_mode

Class index handling mode.

hparams

Hyperparameters (auto-saved by Lightning).

Example

>>> model = MyDetector(num_classes=80, pretrained=True)
>>> trainer = L.Trainer(max_epochs=100)
>>> trainer.fit(model, datamodule)
abstractmethod forward(images, targets=None)[source]

Forward pass of the model.

Parameters:
  • images (list[Tensor]) – List of image tensors, each of shape (C, H, W).

  • targets (list[DetectionTarget] | None) – Optional list of target dictionaries for training. Each target contains ‘boxes’ and ‘labels’ at minimum.

Returns:

During training (targets provided): Dictionary of losses. During inference (no targets): List of prediction dicts.

training_step(batch, batch_idx)[source]

Perform a single training step.

Parameters:
  • batch (tuple[list[Tensor], list[DetectionTarget]] | list) – Tuple of (images, targets).

  • batch_idx (int) – Index of the current batch.

Return type:

Tensor

Returns:

Total loss for backpropagation.

validation_step(batch, batch_idx)[source]

Perform a single validation step.

Parameters:
  • batch (tuple[list[Tensor], list[DetectionTarget]] | list) – Tuple of (images, targets).

  • batch_idx (int) – Index of the current batch.

Return type:

None

on_validation_epoch_end()[source]

Compute and log validation metrics at epoch end.

Return type:

None

test_step(batch, batch_idx)[source]

Perform a single test step.

Parameters:
  • batch (tuple[list[Tensor], list[DetectionTarget]] | list) – Tuple of (images, targets).

  • batch_idx (int) – Index of the current batch.

Return type:

None

on_test_epoch_end()[source]

Compute and log test metrics at epoch end.

Return type:

None

predict_step(batch, batch_idx)[source]

Perform a single prediction step.

Parameters:
  • batch (tuple[list[Tensor], list[DetectionTarget]] | list) – Tuple of (images, targets).

  • batch_idx (int) – Index of the current batch.

Return type:

list[DetectionPrediction]

Returns:

List of prediction dictionaries.

configure_optimizers()[source]

Configure optimizer and learning rate scheduler.

Return type:

OptimizerLRSchedulerConfig

Returns:

Dictionary with optimizer and optional lr_scheduler configuration.

property num_model_classes: int

Get the number of classes expected by the model.

For TorchVision models, this includes the background class. For YOLO models, this is the same as num_classes.

Returns:

Number of classes for the model architecture.
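The logic behind this property can be sketched as a stand-alone function (an illustration, not the actual property implementation):

```python
from enum import Enum


class ClassIndexMode(Enum):
    TORCHVISION = "torchvision"  # background reserved at index 0
    YOLO = "yolo"                # no background class


def num_model_classes(num_classes: int, mode: ClassIndexMode) -> int:
    """TorchVision heads need one extra slot for the background
    class; YOLO heads use the class count as-is."""
    if mode is ClassIndexMode.TORCHVISION:
        return num_classes + 1
    return num_classes


print(num_model_classes(80, ClassIndexMode.TORCHVISION))  # 81
print(num_model_classes(80, ClassIndexMode.YOLO))         # 80
```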

get_model_info()[source]

Get model information dictionary.

Return type:

dict[str, Any]

Returns:

Dictionary with model metadata.


TorchVision Models

FasterRCNN

Two-stage detector with Region Proposal Network.

class objdet.models.torchvision.faster_rcnn.FasterRCNN(num_classes, backbone='resnet50_fpn_v2', pretrained=False, pretrained_backbone=True, trainable_backbone_layers=3, min_size=800, max_size=1333, **kwargs)[source]

Bases: BaseLightningDetector

Faster R-CNN with ResNet-50 FPN backbone.

This is a two-stage object detector consisting of:
  1. Region Proposal Network (RPN) for generating object proposals
  2. Fast R-CNN head for classification and bounding box regression

The model uses TorchVision class indexing (background at index 0).

Parameters:
  • num_classes (int) – Number of object classes (NOT including background). The model will internally use num_classes + 1.

  • backbone (str) – Backbone variant - “resnet50_fpn” or “resnet50_fpn_v2”.

  • pretrained (bool) – If True, use pretrained weights on COCO.

  • pretrained_backbone (bool) – If True, use ImageNet pretrained backbone.

  • trainable_backbone_layers (int) – Number of trainable backbone layers (0-5).

  • min_size (int) – Minimum image size for inference.

  • max_size (int) – Maximum image size for inference.

  • **kwargs (Any) – Additional arguments for BaseLightningDetector.

model

The underlying TorchVision Faster R-CNN model.

Example

>>> model = FasterRCNN(num_classes=20, pretrained_backbone=True)
>>> images = [torch.rand(3, 800, 600) for _ in range(4)]
>>> predictions = model(images)
forward(images, targets=None)[source]

Forward pass.

Parameters:
  • images (list[Tensor]) – List of image tensors (C, H, W).

  • targets (list[DetectionTarget] | None) – Optional list of target dicts for training.

Returns:

Training: Dict of losses. Inference: List of prediction dicts with ‘boxes’, ‘labels’, ‘scores’.

get_model_info()[source]

Get model information.

Return type:

dict[str, Any]

from objdet.models.torchvision import FasterRCNN
from lightning import Trainer

model = FasterRCNN(
    num_classes=80,
    backbone="resnet50_fpn_v2",
    pretrained_backbone=True,
    trainable_backbone_layers=3,
)

trainer = Trainer(max_epochs=100)
trainer.fit(model, datamodule)

RetinaNet

One-stage detector with focal loss.

class objdet.models.torchvision.retinanet.RetinaNet(num_classes, backbone='resnet50_fpn_v2', pretrained=False, pretrained_backbone=True, trainable_backbone_layers=3, min_size=800, max_size=1333, score_thresh=0.05, nms_thresh=0.5, detections_per_img=300, **kwargs)[source]

Bases: BaseLightningDetector

RetinaNet with ResNet-50 FPN backbone.

RetinaNet is a one-stage object detector that uses:
  1. Feature Pyramid Network (FPN) for multi-scale features
  2. Focal loss to address class imbalance
  3. Separate classification and regression heads

The model uses TorchVision class indexing (background at index 0).

Parameters:
  • num_classes (int) – Number of object classes (NOT including background).

  • backbone (str) – Backbone variant - “resnet50_fpn” or “resnet50_fpn_v2”.

  • pretrained (bool) – If True, use pretrained weights on COCO.

  • pretrained_backbone (bool) – If True, use ImageNet pretrained backbone.

  • trainable_backbone_layers (int) – Number of trainable backbone layers (0-5).

  • min_size (int) – Minimum image size for inference.

  • max_size (int) – Maximum image size for inference.

  • score_thresh (float) – Score threshold for predictions.

  • nms_thresh (float) – NMS threshold.

  • detections_per_img (int) – Maximum detections per image.

  • **kwargs (Any) – Additional arguments for BaseLightningDetector.

Example

>>> model = RetinaNet(num_classes=20, pretrained_backbone=True)
>>> trainer = Trainer(max_epochs=50)
>>> trainer.fit(model, datamodule)
forward(images, targets=None)[source]

Forward pass.

Parameters:
  • images (list[Tensor]) – List of image tensors (C, H, W).

  • targets (list[DetectionTarget] | None) – Optional list of target dicts for training.

Returns:

Training: Dict of losses {‘classification’, ‘bbox_regression’}. Inference: List of prediction dicts with ‘boxes’, ‘labels’, ‘scores’.

get_model_info()[source]

Get model information.

Return type:

dict[str, Any]

from objdet.models.torchvision import RetinaNet

model = RetinaNet(
    num_classes=80,
    backbone="resnet50_fpn_v2",
    pretrained_backbone=True,
    score_thresh=0.05,
    nms_thresh=0.5,
)
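The effect of `score_thresh` and `nms_thresh` can be illustrated with a pure-Python sketch of the post-processing (greedy NMS over `(x1, y1, x2, y2)` boxes; TorchVision's batched implementation differs, but the thresholds play the same roles):

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def postprocess(boxes, scores, score_thresh=0.05, nms_thresh=0.5):
    """Drop low-score boxes, then greedily suppress ones that
    overlap an already-kept box by more than nms_thresh."""
    kept = []
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(postprocess(boxes, scores))  # [0, 2] -- box 1 overlaps box 0 too much
```

Raising `nms_thresh` keeps more overlapping boxes; raising `score_thresh` discards more low-confidence ones before NMS even runs.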

YOLO Models

YOLOv8

class objdet.models.yolo.yolov8.YOLOv8(num_classes, model_size='n', pretrained=True, conf_thres=0.25, iou_thres=0.45, **kwargs)[source]

Bases: YOLOBaseLightning

YOLOv8 object detection model wrapped for Lightning.

YOLOv8 is a state-of-the-art real-time object detector featuring:
  • Anchor-free detection head
  • C2f modules for efficient feature extraction
  • Mosaic and MixUp augmentation (handled via transforms)
  • Task-aligned assigner for positive sample selection

Available model sizes:
  • n (nano): Fastest, lowest accuracy (~3.2M params)
  • s (small): Fast with good accuracy (~11.2M params)
  • m (medium): Balanced speed/accuracy (~25.9M params)
  • l (large): High accuracy (~43.7M params)
  • x (extra-large): Highest accuracy (~68.2M params)

Warning

There is a known bug in the training pipeline that causes IndexError: too many indices for tensor of dimension 2 during the loss computation. This affects training via both CLI and Python API. Investigation is ongoing to resolve this issue in the Ultralytics loss integration.

Parameters:
  • num_classes (int) – Number of object classes (no background).

  • model_size (str) – Model size variant (“n”, “s”, “m”, “l”, “x”).

  • pretrained (bool) – If True, load COCO pretrained weights.

  • conf_thres (float) – Confidence threshold for predictions.

  • iou_thres (float) – IoU threshold for NMS.

  • **kwargs (Any) – Additional arguments for BaseLightningDetector.

Example

>>> # Create YOLOv8-medium model
>>> model = YOLOv8(num_classes=20, model_size="m")
>>>
>>> # Train with Lightning
>>> trainer = Trainer(
...     max_epochs=100,
...     callbacks=[ModelCheckpoint(monitor="val/mAP")],
... )
>>> trainer.fit(model, datamodule)
MODEL_VARIANTS = {'l': 'yolov8l.pt', 'm': 'yolov8m.pt', 'n': 'yolov8n.pt', 's': 'yolov8s.pt', 'x': 'yolov8x.pt'}

Model Sizes:

| Size            | Variant    | Parameters |
|-----------------|------------|------------|
| n (nano)        | yolov8n.pt | ~3.2M      |
| s (small)       | yolov8s.pt | ~11.2M     |
| m (medium)      | yolov8m.pt | ~25.9M     |
| l (large)       | yolov8l.pt | ~43.7M     |
| x (extra-large) | yolov8x.pt | ~68.2M     |

from objdet.models.yolo import YOLOv8

model = YOLOv8(
    num_classes=80,
    model_size="m",
    pretrained=True,
    conf_thres=0.25,
    iou_thres=0.45,
)



YOLOv11

class objdet.models.yolo.yolov11.YOLOv11(num_classes, model_size='n', pretrained=True, conf_thres=0.25, iou_thres=0.45, **kwargs)[source]

Bases: YOLOBaseLightning

YOLOv11 (YOLO11) object detection model wrapped for Lightning.

YOLOv11 is the latest iteration of the YOLO series featuring:
  • Improved C3k2 blocks for better feature extraction
  • Enhanced attention mechanisms
  • Better small object detection
  • Optimized architecture for efficiency

Available model sizes:
  • n (nano): Fastest, lowest accuracy
  • s (small): Fast with good accuracy
  • m (medium): Balanced speed/accuracy
  • l (large): High accuracy
  • x (extra-large): Highest accuracy

Warning

There is a known bug in the training pipeline that may cause IndexError: too many indices for tensor of dimension 2 during the loss computation. This is the same issue as YOLOv8. Investigation is ongoing to resolve this issue.

Parameters:
  • num_classes (int) – Number of object classes (no background).

  • model_size (str) – Model size variant (“n”, “s”, “m”, “l”, “x”).

  • pretrained (bool) – If True, load COCO pretrained weights.

  • conf_thres (float) – Confidence threshold for predictions.

  • iou_thres (float) – IoU threshold for NMS.

  • **kwargs (Any) – Additional arguments for BaseLightningDetector.

Example

>>> # Create YOLOv11-large model
>>> model = YOLOv11(num_classes=20, model_size="l")
>>>
>>> # Train with Lightning
>>> trainer = Trainer(max_epochs=100)
>>> trainer.fit(model, datamodule)
MODEL_VARIANTS = {'l': 'yolo11l.pt', 'm': 'yolo11m.pt', 'n': 'yolo11n.pt', 's': 'yolo11s.pt', 'x': 'yolo11x.pt'}

Model Sizes:

| Size            | Variant    |
|-----------------|------------|
| n (nano)        | yolo11n.pt |
| s (small)       | yolo11s.pt |
| m (medium)      | yolo11m.pt |
| l (large)       | yolo11l.pt |
| x (extra-large) | yolo11x.pt |

from objdet.models.yolo import YOLOv11

model = YOLOv11(
    num_classes=80,
    model_size="l",
    pretrained=True,
)
