Data Formats

ObjDet supports multiple data formats for object detection tasks, with optimized streaming via LitData.

Supported Formats

| Format     | DataModule        | Description                      |
|------------|-------------------|----------------------------------|
| COCO       | COCODataModule    | JSON annotations with image paths |
| Pascal VOC | VOCDataModule     | XML annotations per image        |
| YOLO       | YOLODataModule    | Text annotations per image       |
| LitData    | LitDataDataModule | Optimized streaming format       |

LitData Streaming Format

LitData provides optimized streaming for large-scale datasets with:

  • Native Streaming: Uses StreamingDataset and StreamingDataLoader for efficient data loading

  • Cloud Integration: Stream directly from S3, GCS, or Azure Blob Storage

  • Automatic Prefetching: Optimized chunk-based prefetching

  • Distributed Training: Built-in support for multi-GPU and multi-node training

Usage

from objdet.data.formats.litdata import (
    LitDataDataModule,
    DetectionStreamingDataset,
    create_streaming_dataloader,
)

# Using the DataModule (recommended)
datamodule = LitDataDataModule(
    data_dir="/data/coco_litdata",
    batch_size=16,
    num_workers=4,
)
datamodule.setup("fit")
train_loader = datamodule.train_dataloader()

# Using the dataset directly
dataset = DetectionStreamingDataset(
    input_dir="/data/coco_litdata/train",
    shuffle=True,
)

# Create dataloader with detection collation
loader = create_streaming_dataloader(
    dataset=dataset,
    batch_size=16,
    num_workers=4,
)
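
Detection batches need custom collation because each image carries a different number of boxes, so targets cannot be stacked into one tensor. The sketch below illustrates the general idea only; it is not the collation that create_streaming_dataloader actually uses, and the (image, target) sample layout and the name detection_collate are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Assumed sample layout: an (image, target) pair, where target holds boxes and labels.
Sample = Tuple[Any, Dict[str, Any]]


def detection_collate(batch: List[Sample]) -> Tuple[List[Any], List[Dict[str, Any]]]:
    """Group images and targets into parallel lists without stacking them."""
    images, targets = zip(*batch)
    return list(images), list(targets)


# Two samples with different box counts (strings stand in for image tensors).
batch = [
    ("image_a", {"boxes": [[10, 20, 50, 80]], "labels": [1]}),
    ("image_b", {"boxes": [[5, 5, 40, 40], [60, 10, 90, 70]], "labels": [2, 3]}),
]
images, targets = detection_collate(batch)
```

Keeping targets as a list of per-image dictionaries preserves the variable box count, which is the layout torchvision-style detection models accept.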

Configuration

data:
  class_path: objdet.data.formats.litdata.LitDataDataModule
  init_args:
    data_dir: /path/to/litdata
    train_subdir: train
    val_subdir: val
    batch_size: 16
    num_workers: 4

Converting Datasets to LitData

Convert existing datasets to the optimized format:

# CLI
objdet preprocess \
    --input /path/to/coco \
    --output /path/to/coco_litdata \
    --format coco

# Python API
from objdet.data.preprocessing import convert_to_litdata

convert_to_litdata(
    input_dir="/data/coco",
    output_dir="/data/coco_litdata",
    format_name="coco",
    num_workers=8,
)

COCO Format

Standard COCO JSON format with bounding boxes.

Expected Structure

coco_dataset/
├── annotations/
│   ├── instances_train.json
│   └── instances_val.json
└── images/
    ├── train/
    └── val/
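
For orientation, the annotation files follow the standard COCO layout, where each entry in "annotations" stores its box as [x, y, width, height] in pixels. The sketch below reads such a file with the standard library and converts boxes to corner format. It is an illustration of the file format, not COCODataModule's loading code, and the helper name load_coco_boxes is hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


def load_coco_boxes(ann_json: str) -> Dict[int, List[Tuple[List[float], int]]]:
    """Map image_id -> list of ([x_min, y_min, x_max, y_max], category_id)."""
    data = json.loads(ann_json)
    boxes_by_image = defaultdict(list)
    for ann in data["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores top-left corner plus size
        boxes_by_image[ann["image_id"]].append(([x, y, x + w, y + h], ann["category_id"]))
    return dict(boxes_by_image)


# A tiny in-memory annotation file with one image and one box.
example = json.dumps({
    "images": [{"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480}],
    "annotations": [{"id": 7, "image_id": 1, "category_id": 3, "bbox": [10, 20, 30, 40]}],
    "categories": [{"id": 3, "name": "dog"}],
})
coco_boxes = load_coco_boxes(example)
```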

Usage

from objdet.data.formats.coco import COCODataModule

datamodule = COCODataModule(
    data_dir="/data/coco",
    train_ann_file="annotations/instances_train.json",
    val_ann_file="annotations/instances_val.json",
)

Pascal VOC Format

One XML annotation file per image.

Expected Structure

voc_dataset/
├── Annotations/     # XML files
├── ImageSets/Main/  # train.txt, val.txt
└── JPEGImages/      # Image files
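
Each file under Annotations/ describes one image, with an object element per instance holding a class name and a bndbox given as pixel corners. The following sketch parses one such file with the standard library. It shows the file format only and makes no claim about how VOCDataModule reads it; parse_voc_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_voc_xml(xml_text: str) -> List[Tuple[str, List[int]]]:
    """Return (class_name, [xmin, ymin, xmax, ymax]) for every object in the file."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Some VOC-style exports write coordinates as floats, so parse via float first.
        box = [int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((name, box))
    return objects


example_xml = """
<annotation>
  <filename>000001.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""
voc_objects = parse_voc_xml(example_xml)
```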

Usage

from objdet.data.formats.voc import VOCDataModule

datamodule = VOCDataModule(
    data_dir="/data/voc",
)

YOLO Format

Text annotations with one file per image.

Expected Structure

yolo_dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/

Label Format

Each line describes one object: class_id center_x center_y width height. Coordinates are normalized to the 0-1 range relative to the image width and height.
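
As an illustration of the label format, the sketch below converts one label line into pixel corner coordinates. The function name yolo_line_to_xyxy is hypothetical and is not part of the objdet API.

```python
from typing import Tuple


def yolo_line_to_xyxy(
    line: str, img_w: int, img_h: int
) -> Tuple[int, Tuple[float, float, float, float]]:
    """Convert 'class_id cx cy w h' (normalized) to (class_id, pixel corner box)."""
    class_id, cx, cy, w, h = line.split()
    # Scale normalized values back to pixels.
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    # Move from center/size to corner coordinates.
    return int(class_id), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


label = yolo_line_to_xyxy("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
```

Here a box centered in a 640x480 image, a quarter of its width and half of its height, spans x from 240 to 400 and y from 120 to 360.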

Usage

from objdet.data.formats.yolo import YOLODataModule

datamodule = YOLODataModule(
    data_dir="/data/yolo",
)

Class Index Modes

Different model architectures expect different class indexing:

| Mode        | Background | Class Range | Models                  |
|-------------|------------|-------------|-------------------------|
| torchvision | Index 0    | 1 to N      | Faster R-CNN, RetinaNet |
| yolo        | None       | 0 to N-1    | YOLOv8, YOLOv11         |

Specify in your config:

data:
  class_index_mode: torchvision  # or "yolo"
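
The two modes differ by an offset of one: torchvision reserves index 0 for background, while yolo starts object classes at 0. The sketch below shows that mapping under this assumption. The function names are hypothetical, and objdet's own conversion may differ.

```python
from typing import List


def yolo_to_torchvision(labels: List[int]) -> List[int]:
    """Shift 0..N-1 labels to 1..N, leaving index 0 free for background."""
    return [label + 1 for label in labels]


def torchvision_to_yolo(labels: List[int]) -> List[int]:
    """Shift 1..N labels to 0..N-1; background (0) has no yolo equivalent."""
    if any(label == 0 for label in labels):
        raise ValueError("background label 0 cannot be expressed in yolo mode")
    return [label - 1 for label in labels]


tv_labels = yolo_to_torchvision([0, 2, 5])
yolo_labels = torchvision_to_yolo(tv_labels)
```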

Custom Transforms

Apply augmentations using Albumentations:

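import albumentations as A

from objdet.data.formats.coco import COCODataModule

train_transforms = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.Resize(800, 1333),  # height=800, width=1333
], bbox_params=A.BboxParams(
    format='pascal_voc',  # boxes as [x_min, y_min, x_max, y_max] in pixels
    label_fields=['labels'],  # keeps labels aligned with their boxes
))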

datamodule = COCODataModule(
    data_dir="/data/coco",
    train_transforms=train_transforms,
)
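
The bbox_params argument matters because geometric augmentations must move the boxes together with the pixels. As a minimal sketch of what that means for a horizontal flip on pascal_voc style boxes, the helper below mirrors a box across the vertical axis. It illustrates the geometry only and is not Albumentations' internal implementation; hflip_box_xyxy is a hypothetical name.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def hflip_box_xyxy(box: Box, img_w: float) -> Box:
    """Mirror a corner-format box horizontally inside an image of width img_w."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, and vice versa.
    return (img_w - x_max, y_min, img_w - x_min, y_max)


flipped = hflip_box_xyxy((10, 20, 50, 80), img_w=100)
```

Flipping twice returns the original box, which is a quick sanity check for any box transform of this kind.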