Serving API

API reference for model serving utilities.

Server

run_server

objdet.serving.server.run_server(config_path=None, checkpoint_path=None, host='0.0.0.0', port=8000, workers_per_device=1, accelerator='auto', devices='auto', max_batch_size=8, batch_timeout=0.01, api_path='/predict')[source]

Run the detection inference server.

Parameters:
  • config_path (str | Path | None) – Path to serving configuration YAML.

  • checkpoint_path (str | Path | None) – Path to model checkpoint (if not in config).

  • host (str) – Host to bind server.

  • port (int) – Port to bind server.

  • workers_per_device (int) – Number of worker processes per device.

  • accelerator (Literal['auto', 'cpu', 'cuda', 'mps']) – Accelerator type (“auto”, “cpu”, “cuda”, or “mps”).

  • devices (Union[int, Literal['auto']]) – Number of devices or “auto”.

  • max_batch_size (int) – Maximum batch size for dynamic batching.

  • batch_timeout (float) – Timeout for batch collection (seconds).

  • api_path (str) – API endpoint path.

Return type:

None

from objdet.serving import run_server

run_server(
    checkpoint_path="model.ckpt",
    host="0.0.0.0",
    port=8000,
    max_batch_size=8,
    accelerator="cuda",
    devices=1,
)

CLI Usage:

objdet serve --checkpoint model.ckpt --host 0.0.0.0 --port 8000
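
Once the server is up, requests can be sent from any HTTP client. Below is a minimal stdlib-only Python client sketch, assuming the default /predict endpoint and the base64 image request field documented under Request Format; adjust the URL and field name if your deployment differs:

```python
import base64
import json
from urllib import request as urlrequest


def build_request(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in the JSON payload the server expects."""
    return {"image": base64.b64encode(image_bytes).decode("ascii")}


def predict(image_path: str, url: str = "http://localhost:8000/predict") -> dict:
    """POST an image file to the running server and return the parsed response."""
    with open(image_path, "rb") as f:
        payload = build_request(f.read())
    req = urlrequest.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.loads(resp.read())
```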

API Classes

DetectionAPI

LitServe API implementation for object detection inference.

class objdet.serving.api.DetectionAPI(checkpoint_path, model_class=None, device='cuda', confidence_threshold=0.25, max_batch_size=8)[source]

Bases: LitAPI

LitServe API for object detection.

This class implements the LitServe API interface for serving detection models. Supports dynamic batching and async processing.

Parameters:
  • checkpoint_path (str | Path) – Path to model checkpoint.

  • model_class (type | None) – Model class (if not inferrable from checkpoint).

  • device (str) – Device for inference.

  • confidence_threshold (float) – Minimum confidence for predictions.

  • max_batch_size (int) – Maximum batch size for dynamic batching.

Example

>>> api = DetectionAPI(checkpoint_path="model.ckpt")
>>> # Use with LitServer
>>> import litserve as ls
>>> server = ls.LitServer(api, accelerator="cuda")

setup(device)[source]

Setup method called by LitServe.

Parameters:

device (str) – Device assigned by LitServe.

Return type:

None

decode_request(request)[source]

Decode incoming request to image tensor.

Supports:

  • Base64-encoded images in the ‘image’ field

  • Image URLs in the ‘url’ field

  • Raw tensor data in the ‘tensor’ field

Parameters:

request (dict[str, Any]) – Request dictionary.

Return type:

Tensor

Returns:

Image tensor (C, H, W).

predict(inputs)[source]

Run prediction on batch of inputs.

Parameters:

inputs (list[Tensor]) – List of image tensors.

Return type:

list[dict[str, Tensor]]

Returns:

List of prediction dictionaries.

encode_response(output)[source]

Encode prediction to JSON-serializable response.

Parameters:

output (dict[str, Tensor]) – Prediction dictionary with tensors.

Return type:

dict[str, Any]

Returns:

JSON-serializable response.
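
For intuition, the tensor-to-JSON step can be sketched as follows. This is not the actual implementation: the "boxes"/"labels"/"scores" keys and the CLASS_NAMES lookup are assumptions, and the only requirement on the input values is that they expose .tolist() (as PyTorch tensors do):

```python
CLASS_NAMES = {1: "person"}  # hypothetical label-id to class-name lookup


def encode_output(output) -> dict:
    """Sketch: flatten a dict of prediction tensors into the JSON response shape."""
    boxes = output["boxes"].tolist()
    labels = output["labels"].tolist()
    scores = output["scores"].tolist()
    detections = [
        {
            "box": [round(v) for v in box],
            "label": label,
            "score": round(score, 4),
            "class_name": CLASS_NAMES.get(label, "unknown"),
        }
        for box, label, score in zip(boxes, labels, scores)
    ]
    return {"detections": detections, "num_detections": len(detections)}
```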

Request Format:

The API accepts requests with one of the following fields:

  • image: Base64-encoded image data

  • url: URL to fetch image from

  • tensor: Raw tensor data
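
For illustration, the three payload shapes might look like the following (the (C, H, W) nested-list layout for tensor is an assumption; check your deployment's decode behavior):

```python
import base64

# 1. Base64-encoded image bytes in the 'image' field
raw = b"\x89PNG\r\n"  # raw encoded image bytes, e.g. read from a .png file
req_image = {"image": base64.b64encode(raw).decode("ascii")}

# 2. A URL the server fetches the image from
req_url = {"url": "https://example.com/street.jpg"}

# 3. Raw tensor data as nested lists; (C, H, W) layout is an assumption
req_tensor = {"tensor": [[[0.0] * 8] * 8] * 3}  # 3 channels, 8x8 pixels
```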

Response Format:

{
  "detections": [
    {
      "box": [100, 50, 300, 400],
      "label": 1,
      "score": 0.95,
      "class_name": "person"
    }
  ],
  "num_detections": 5
}

ABTestingAPI

A/B testing wrapper for comparing multiple model versions.

class objdet.serving.api.ABTestingAPI(models, device='cuda')[source]

Bases: object

A/B testing wrapper for multiple detection models.

Routes requests to different model versions based on configured traffic splits.

Parameters:
  • models (dict[str, tuple[str | Path, float]]) – Dict mapping model name to (checkpoint_path, weight).

  • device (str) – Device for inference.

Example

>>> api = ABTestingAPI(
...     {
...         "v1": ("model_v1.ckpt", 0.7),  # 70% traffic
...         "v2": ("model_v2.ckpt", 0.3),  # 30% traffic
...     }
... )

setup(device)[source]

Setup all model APIs.

Return type:

None

decode_request(request)[source]

Decode the request using the first model's API.

Return type:

Tensor

predict(inputs)[source]

Run prediction with selected model.

Return type:

list[tuple[str, dict[str, Any]]]

encode_response(output)[source]

Encode response with model version info.

Return type:

dict[str, Any]

from objdet.serving.api import ABTestingAPI

# Configure models with traffic weights
api = ABTestingAPI(
    models={
        "v1": ("model_v1.ckpt", 0.7),  # 70% traffic
        "v2": ("model_v2.ckpt", 0.3),  # 30% traffic
    },
    device="cuda",
)
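
Under the hood, routing by traffic weight amounts to a weighted random choice per request; a minimal sketch of that selection step (an assumption about the implementation, not its actual code):

```python
import random


def pick_model(weights: dict, rng=random) -> str:
    """Pick a model name with probability proportional to its traffic weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Over many requests the observed split converges to the configured weights
rng = random.Random(0)
counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[pick_model({"v1": 0.7, "v2": 0.3}, rng)] += 1
```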

Response includes model version:

{
  "model_version": "v2",
  "detections": [{...}],
  "num_detections": 5
}