# Data Formats
ObjDet supports multiple data formats for object detection tasks, with optimized streaming via LitData.
## Supported Formats

| Format | DataModule | Description |
|---|---|---|
| COCO | `COCODataModule` | JSON annotations with image paths |
| Pascal VOC | `VOCDataModule` | XML annotations per image |
| YOLO | `YOLODataModule` | Text annotations per image |
| LitData | `LitDataDataModule` | Optimized streaming format |
## LitData Streaming Format

LitData provides optimized streaming for large-scale datasets with:

- **Native Streaming**: Uses `StreamingDataset` and `StreamingDataLoader` for efficient data loading
- **Cloud Integration**: Stream directly from S3, GCS, or Azure Blob Storage
- **Automatic Prefetching**: Optimized chunk-based prefetching
- **Distributed Training**: Built-in support for multi-GPU and multi-node training
### Usage

```python
from objdet.data.formats.litdata import (
    LitDataDataModule,
    DetectionStreamingDataset,
    create_streaming_dataloader,
)

# Using the DataModule (recommended)
datamodule = LitDataDataModule(
    data_dir="/data/coco_litdata",
    batch_size=16,
    num_workers=4,
)
datamodule.setup("fit")
train_loader = datamodule.train_dataloader()

# Using the dataset directly
dataset = DetectionStreamingDataset(
    input_dir="/data/coco_litdata/train",
    shuffle=True,
)

# Create a dataloader with detection collation
loader = create_streaming_dataloader(
    dataset=dataset,
    batch_size=16,
    num_workers=4,
)
```
### Configuration

```yaml
data:
  class_path: objdet.data.formats.litdata.LitDataDataModule
  init_args:
    data_dir: /path/to/litdata
    train_subdir: train
    val_subdir: val
    batch_size: 16
    num_workers: 4
```
### Converting Datasets to LitData

Convert existing datasets to the optimized format:

```bash
# CLI
objdet preprocess \
    --input /path/to/coco \
    --output /path/to/coco_litdata \
    --format coco
```

```python
# Python API
from objdet.data.preprocessing import convert_to_litdata

convert_to_litdata(
    input_dir="/data/coco",
    output_dir="/data/coco_litdata",
    format_name="coco",
    num_workers=8,
)
```
## COCO Format

Standard COCO JSON format with bounding boxes.

### Expected Structure

```
coco_dataset/
├── annotations/
│   ├── instances_train.json
│   └── instances_val.json
└── images/
    ├── train/
    └── val/
```
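For reference, each annotation file follows the standard COCO layout: top-level `images`, `annotations`, and `categories` lists, with each `bbox` given as `[x, y, width, height]` in pixels. A minimal sketch (file names, IDs, and categories are illustrative):

```json
{
  "images": [
    {"id": 1, "file_name": "train/000001.jpg", "width": 640, "height": 480}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1,
     "bbox": [100, 120, 80, 60], "area": 4800, "iscrowd": 0}
  ],
  "categories": [
    {"id": 1, "name": "person"}
  ]
}
```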
### Usage

```python
from objdet.data.formats.coco import COCODataModule

datamodule = COCODataModule(
    data_dir="/data/coco",
    train_ann_file="annotations/instances_train.json",
    val_ann_file="annotations/instances_val.json",
)
```
## Pascal VOC Format

XML annotations with one file per image.

### Expected Structure

```
voc_dataset/
├── Annotations/    # XML files
├── ImageSets/Main/ # train.txt, val.txt
└── JPEGImages/     # Image files
```
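Each file in `Annotations/` is a standard VOC XML document; boxes are axis-aligned pixel corners (`xmin`, `ymin`, `xmax`, `ymax`). An illustrative sketch:

```xml
<annotation>
  <filename>000001.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>100</xmin>
      <ymin>120</ymin>
      <xmax>180</xmax>
      <ymax>180</ymax>
    </bndbox>
  </object>
</annotation>
```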
### Usage

```python
from objdet.data.formats.voc import VOCDataModule

datamodule = VOCDataModule(
    data_dir="/data/voc",
)
```
## YOLO Format

Text annotations with one file per image.

### Expected Structure

```
yolo_dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/
```
### Label Format

Each line contains `class_id center_x center_y width height`, with coordinates normalized to the 0-1 range.
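To make the coordinate convention concrete, here is a minimal sketch (the helper name is ours, not an ObjDet API) that parses one label line and converts the normalized center-based box into absolute pixel corner coordinates:

```python
def yolo_line_to_pixels(line: str, img_w: int, img_h: int):
    """Parse one YOLO label line into (class_id, (x_min, y_min, x_max, y_max))."""
    class_id, cx, cy, w, h = line.split()
    # De-normalize: coordinates are stored as fractions of the image size
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    # Convert the center-based box to corner coordinates
    return int(class_id), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# A box centered in a 640x480 image, half the image's width and height:
print(yolo_line_to_pixels("0 0.5 0.5 0.5 0.5", 640, 480))
# (0, (160.0, 120.0, 480.0, 360.0))
```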
### Usage

```python
from objdet.data.formats.yolo import YOLODataModule

datamodule = YOLODataModule(
    data_dir="/data/yolo",
)
```
## Class Index Modes

Different model architectures expect different class indexing:

| Mode | Background | Class Range | Models |
|---|---|---|---|
| `torchvision` | Index 0 | 1 to N | Faster R-CNN, RetinaNet |
| `yolo` | None | 0 to N-1 | YOLOv8, YOLOv11 |

Specify in your config:

```yaml
data:
  class_index_mode: torchvision  # or "yolo"
```
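The practical difference between the two modes is a constant offset: YOLO-style labels occupy `0..N-1`, while torchvision-style labels reserve index 0 for background and occupy `1..N`. A minimal sketch of the shift (these helpers are illustrative, not part of ObjDet):

```python
def yolo_to_torchvision(class_ids):
    """Shift YOLO-style indices (0..N-1) up by one to torchvision-style (1..N)."""
    return [c + 1 for c in class_ids]

def torchvision_to_yolo(class_ids):
    """Shift torchvision-style indices (1..N) down to YOLO-style (0..N-1)."""
    return [c - 1 for c in class_ids]

print(yolo_to_torchvision([0, 1, 2]))  # [1, 2, 3]
print(torchvision_to_yolo([1, 2, 3]))  # [0, 1, 2]
```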
## Custom Transforms

Apply augmentations using Albumentations:

```python
import albumentations as A

from objdet.data.formats.coco import COCODataModule

train_transforms = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.Resize(800, 1333),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

datamodule = COCODataModule(
    data_dir="/data/coco",
    train_transforms=train_transforms,
)
```