Serving API

API reference for model serving utilities.

Server

run_server

objdet.serving.server.run_server(config_path=None, checkpoint_path=None, host='0.0.0.0', port=8000, workers_per_device=1, accelerator='auto', devices='auto', max_batch_size=8, batch_timeout=0.01, api_path='/predict')[source]

Run the detection inference server.

Parameters:
  • config_path (str | Path | None) – Path to serving configuration YAML.

  • checkpoint_path (str | Path | None) – Path to model checkpoint (if not in config).

  • host (str) – Host to bind server.

  • port (int) – Port to bind server.

  • workers_per_device (int) – Number of worker processes per device.

  • accelerator (Literal['auto', 'cpu', 'cuda', 'mps']) – Accelerator type (“auto”, “cpu”, “cuda”, or “mps”).

  • devices (Union[int, Literal['auto']]) – Number of devices or “auto”.

  • max_batch_size (int) – Maximum batch size for dynamic batching.

  • batch_timeout (float) – Timeout for batch collection (seconds).

  • api_path (str) – API endpoint path.

Return type:

None

from objdet.serving import run_server

run_server(
    checkpoint_path="model.ckpt",
    host="0.0.0.0",
    port=8000,
    max_batch_size=8,
    accelerator="cuda",
    devices=1,
)

CLI Usage:

objdet serve --checkpoint model.ckpt --host 0.0.0.0 --port 8000
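
Once the server is up, requests can be sent from any HTTP client. Below is a minimal stdlib-only Python client sketch, assuming the default /predict endpoint and the base64 image request field documented under Request Format; adjust the URL and field name if your deployment differs:

```python
import base64
import json
from urllib import request as urlrequest


def build_request(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in the JSON payload the server expects."""
    return {"image": base64.b64encode(image_bytes).decode("ascii")}


def predict(image_path: str, url: str = "http://localhost:8000/predict") -> dict:
    """POST an image file to the running server and return the parsed response."""
    with open(image_path, "rb") as f:
        payload = build_request(f.read())
    req = urlrequest.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.loads(resp.read())
```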

API Classes

DetectionAPI

LitServe API implementation for object detection inference.

class objdet.serving.api.DetectionAPI(checkpoint_path, model_class=None, device='cuda', confidence_threshold=0.25, max_batch_size=8)[source]

Bases: LitAPI

LitServe API for object detection.

This class implements the LitServe API interface for serving detection models. Supports dynamic batching and async processing.

Parameters:
  • checkpoint_path (str | Path) – Path to model checkpoint.

  • model_class (type | None) – Model class (if not inferrable from checkpoint).

  • device (str) – Device for inference.

  • confidence_threshold (float) – Minimum confidence for predictions.

  • max_batch_size (int) – Maximum batch size for dynamic batching.

Example

>>> api = DetectionAPI(checkpoint_path="model.ckpt")
>>> # Use with LitServer
>>> import litserve as ls
>>> server = ls.LitServer(api, accelerator="cuda")

setup(device)[source]

Setup method called by LitServe.

Parameters:

device (str) – Device assigned by LitServe.

Return type:

None

decode_request(request)[source]

Decode incoming request to image tensor.

Supports:

  • Base64-encoded images in the ‘image’ field

  • Image URLs in the ‘url’ field

  • Raw tensor data in the ‘tensor’ field

Parameters:

request (dict[str, Any]) – Request dictionary.

Return type:

Tensor

Returns:

Image tensor (C, H, W).

predict(inputs)[source]

Run prediction on batch of inputs.

Parameters:

inputs (list[Tensor]) – List of image tensors.

Return type:

list[dict[str, Tensor]]

Returns:

List of prediction dictionaries.

encode_response(output)[source]

Encode prediction to JSON-serializable response.

Parameters:

output (dict[str, Tensor]) – Prediction dictionary with tensors.

Return type:

dict[str, Any]

Returns:

JSON-serializable response.
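
For intuition, the tensor-to-JSON step can be sketched as follows. This is not the actual implementation: the "boxes"/"labels"/"scores" keys and the CLASS_NAMES lookup are assumptions, and the only requirement on the input values is that they expose .tolist() (as PyTorch tensors do):

```python
CLASS_NAMES = {1: "person"}  # hypothetical label-id to class-name lookup


def encode_output(output) -> dict:
    """Sketch: flatten a dict of prediction tensors into the JSON response shape."""
    boxes = output["boxes"].tolist()
    labels = output["labels"].tolist()
    scores = output["scores"].tolist()
    detections = [
        {
            "box": [round(v) for v in box],
            "label": label,
            "score": round(score, 4),
            "class_name": CLASS_NAMES.get(label, "unknown"),
        }
        for box, label, score in zip(boxes, labels, scores)
    ]
    return {"detections": detections, "num_detections": len(detections)}
```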

Request Format:

The API accepts requests with one of the following fields:

  • image: Base64-encoded image data

  • url: URL to fetch image from

  • tensor: Raw tensor data
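
For illustration, the three payload shapes might look like the following (the (C, H, W) nested-list layout for tensor is an assumption; check your deployment's decode behavior):

```python
import base64

# 1. Base64-encoded image bytes in the 'image' field
raw = b"\x89PNG\r\n"  # raw encoded image bytes, e.g. read from a .png file
req_image = {"image": base64.b64encode(raw).decode("ascii")}

# 2. A URL the server fetches the image from
req_url = {"url": "https://example.com/street.jpg"}

# 3. Raw tensor data as nested lists; (C, H, W) layout is an assumption
req_tensor = {"tensor": [[[0.0] * 8] * 8] * 3}  # 3 channels, 8x8 pixels
```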

Response Format:

{
  "detections": [
    {
      "box": [100, 50, 300, 400],
      "label": 1,
      "score": 0.95,
      "class_name": "person"
    }
  ],
  "num_detections": 5
}

ABTestingAPI

A/B testing wrapper for comparing multiple model versions.

class objdet.serving.api.ABTestingAPI(models, device='cuda')[source]

Bases: object

A/B testing wrapper for multiple detection models.

Routes requests to different model versions based on configured traffic splits.

Parameters:
  • models (dict[str, tuple[str | Path, float]]) – Dict mapping model name to (checkpoint_path, weight).

  • device (str) – Device for inference.

Example

>>> api = ABTestingAPI(
...     {
...         "v1": ("model_v1.ckpt", 0.7),  # 70% traffic
...         "v2": ("model_v2.ckpt", 0.3),  # 30% traffic
...     }
... )

setup(device)[source]

Setup all model APIs.

Return type:

None

decode_request(request)[source]

Decode the request using the first model's API.

Return type:

Tensor

predict(inputs)[source]

Run prediction with selected model.

Return type:

list[tuple[str, dict[str, Any]]]

encode_response(output)[source]

Encode response with model version info.

Return type:

dict[str, Any]

from objdet.serving.api import ABTestingAPI

# Configure models with traffic weights
api = ABTestingAPI(
    models={
        "v1": ("model_v1.ckpt", 0.7),  # 70% traffic
        "v2": ("model_v2.ckpt", 0.3),  # 30% traffic
    },
    device="cuda",
)
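
Under the hood, routing by traffic weight amounts to a weighted random choice per request; a minimal sketch of that selection step (an assumption about the implementation, not its actual code):

```python
import random


def pick_model(weights: dict, rng=random) -> str:
    """Pick a model name with probability proportional to its traffic weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Over many requests the observed split converges to the configured weights
rng = random.Random(0)
counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[pick_model({"v1": 0.7, "v2": 0.3}, rng)] += 1
```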

Response includes model version:

{
  "model_version": "v2",
  "detections": [{...}],
  "num_detections": 5
}