Serving API¶
API reference for model serving utilities.
Server¶
run_server¶
- objdet.serving.server.run_server(config_path=None, checkpoint_path=None, host='0.0.0.0', port=8000, workers_per_device=1, accelerator='auto', devices='auto', max_batch_size=8, batch_timeout=0.01, api_path='/predict')[source]¶
Run the detection inference server.
- Parameters:
config_path (str | Path | None) – Path to serving configuration YAML.
- checkpoint_path (str | Path | None) – Path to model checkpoint (if not in config).
- host (str) – Host to bind the server to.
- port (int) – Port to bind the server to.
- workers_per_device (int) – Number of worker processes per device.
- accelerator (Literal['auto', 'cpu', 'cuda', 'mps']) – Accelerator type ("auto", "cpu", "cuda", or "mps").
- devices (int | Literal['auto']) – Number of devices, or "auto".
- max_batch_size (int) – Maximum batch size for dynamic batching.
- batch_timeout (float) – Timeout for batch collection (seconds).
- api_path (str) – API endpoint path.
- Return type:

Example:

from objdet.serving import run_server
run_server(
checkpoint_path="model.ckpt",
host="0.0.0.0",
port=8000,
max_batch_size=8,
accelerator="cuda",
devices=1,
)
CLI Usage:
objdet serve --checkpoint model.ckpt --host 0.0.0.0 --port 8000
API Classes¶
DetectionAPI¶
LitServe API implementation for object detection inference.
- class objdet.serving.api.DetectionAPI(checkpoint_path, model_class=None, device='cuda', confidence_threshold=0.25, max_batch_size=8)[source]¶
Bases: LitAPI

LitServe API for object detection.
This class implements the LitServe API interface for serving detection models. Supports dynamic batching and async processing.
- Parameters:
- checkpoint_path (str | Path) – Path to the model checkpoint to serve.
- model_class (type | None) – Model class to instantiate (inferred from the checkpoint if None).
- device (str) – Device to run inference on.
- confidence_threshold (float) – Minimum score for a detection to be returned.
- max_batch_size (int) – Maximum batch size for dynamic batching.
Example
>>> api = DetectionAPI(checkpoint_path="model.ckpt")
>>> # Use with LitServer
>>> import litserve as ls
>>> server = ls.LitServer(api, accelerator="cuda")
- decode_request(request)[source]¶
Decode incoming request to image tensor.
Supports:
- Base64-encoded images in the ‘image’ field
- Image URLs in the ‘url’ field
- Raw tensor data in the ‘tensor’ field
Request Format:
The API accepts requests with one of the following fields:
- image: Base64-encoded image data
- url: URL to fetch the image from
- tensor: Raw tensor data
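For illustration, a request body using the image field might be built like this (a sketch only; it assumes the server accepts a JSON object with a base64-encoded string, as described above — the fake image bytes are placeholders):

```python
import base64
import json


def build_image_request(image_bytes: bytes) -> str:
    """Build a JSON request body with a base64-encoded 'image' field."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return json.dumps(payload)


# Raw bytes standing in for an actual image file
body = build_image_request(b"\x89PNG fake image data")
decoded = json.loads(body)
# The server would base64-decode the 'image' field back to the original bytes
print(base64.b64decode(decoded["image"]))  # b'\x89PNG fake image data'
```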
Response Format:
{
"detections": [
{
"box": [100, 50, 300, 400],
"label": 1,
"score": 0.95,
"class_name": "person"
}
],
"num_detections": 5
}
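A client can post-process the response above directly; for example, keeping only high-confidence detections (a sketch using a hard-coded response in the documented format):

```python
# Sample /predict response in the documented format
response = {
    "detections": [
        {"box": [100, 50, 300, 400], "label": 1, "score": 0.95, "class_name": "person"},
        {"box": [10, 20, 50, 80], "label": 3, "score": 0.40, "class_name": "car"},
    ],
    "num_detections": 2,
}

# Keep detections whose score clears a client-side threshold
confident = [d for d in response["detections"] if d["score"] >= 0.5]
names = [d["class_name"] for d in confident]
print(names)  # ['person']
```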
ABTestingAPI¶
A/B testing wrapper for comparing multiple model versions.
- class objdet.serving.api.ABTestingAPI(models, device='cuda')[source]¶
Bases: object

A/B testing wrapper for multiple detection models.
Routes requests to different model versions based on configured traffic splits.
- Parameters:
- models (dict[str, tuple[str | Path, float]]) – Mapping from version name to (checkpoint_path, traffic_weight).
- device (str) – Device to run inference on.
Example
>>> api = ABTestingAPI(
...     {
...         "v1": ("model_v1.ckpt", 0.7),  # 70% traffic
...         "v2": ("model_v2.ckpt", 0.3),  # 30% traffic
...     }
... )
from objdet.serving.api import ABTestingAPI
# Configure models with traffic weights
api = ABTestingAPI(
models={
"v1": ("model_v1.ckpt", 0.7), # 70% traffic
"v2": ("model_v2.ckpt", 0.3), # 30% traffic
},
device="cuda",
)
The response includes the model version that handled the request:
{
"model_version": "v2",
"detections": [{...}],
"num_detections": 5
}
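The traffic split can be sketched as weighted random choice over versions (a minimal illustration of the routing idea, not the actual ABTestingAPI internals):

```python
import random


def pick_version(models: dict[str, tuple[str, float]], rng: random.Random) -> str:
    """Pick a model version with probability proportional to its traffic weight."""
    versions = list(models)
    weights = [weight for _, weight in models.values()]
    return rng.choices(versions, weights=weights, k=1)[0]


models = {"v1": ("model_v1.ckpt", 0.7), "v2": ("model_v2.ckpt", 0.3)}
rng = random.Random(0)  # seeded for reproducibility
counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[pick_version(models, rng)] += 1
# Over many requests the split converges to roughly 70/30
print(counts["v1"] / 10_000)
```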