API Reference
Start the REST server with routeframe serve (default: localhost:11435). All endpoints accept and return JSON.
POST /api/forecast
Run a time series forecast with a loaded model.
Request body
{
  "model": "toto",
  "input": [[20, 22, 19, 23, 21, 25, 24, 26, 27, 25]],
  "prediction_length": 4,
  "future_exogenous": [[0, 1, 1, 0]]
}
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name: toto, timesfm, chronos2. |
| input | float[][] | Yes | Array of time series arrays. One inner array per variate. |
| prediction_length | int | No | Number of future steps to predict. Default: 1. |
| future_exogenous | float[][] | No | Known future covariate values (one array per covariate, length = prediction_length). |
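The shape rules in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of routeframe: `build_forecast_payload` is a hypothetical helper, and the rule that `prediction_length` must be at least 1 is an assumption, since the table only states the default.

```python
from typing import Optional, Sequence


def build_forecast_payload(
    model: str,
    series: Sequence[Sequence[float]],
    prediction_length: int = 1,
    future_exogenous: Optional[Sequence[Sequence[float]]] = None,
) -> dict:
    """Assemble a /api/forecast body (hypothetical helper, client-side checks only)."""
    if not series or any(len(variate) == 0 for variate in series):
        raise ValueError("input needs at least one non-empty variate")
    # Assumption: a forecast horizon below 1 is never meaningful.
    if prediction_length < 1:
        raise ValueError("prediction_length must be a positive integer")

    payload = {
        "model": model,
        "input": [[float(x) for x in variate] for variate in series],
        "prediction_length": prediction_length,
    }
    if future_exogenous is not None:
        # Documented rule: one array per covariate, length = prediction_length.
        for covariate in future_exogenous:
            if len(covariate) != prediction_length:
                raise ValueError(
                    "each future_exogenous array must have prediction_length values"
                )
        payload["future_exogenous"] = [
            [float(x) for x in covariate] for covariate in future_exogenous
        ]
    return payload
```

Omitting `future_exogenous` leaves the key out of the body entirely, which matches its optional status in the table.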
Response
{
  "model": "toto",
  "mean": [[68.42, 71.05, 69.88, 72.31]],
  "inference_time_ms": 4
}
| Field | Type | Description |
|---|---|---|
| model | string | Name of the model that produced the forecast. |
| mean | float[][] | Point predictions. One inner array per variate. |
| inference_time_ms | int | Wall-clock inference latency in milliseconds. |
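A forecast call from Python might look like the sketch below, which uses only the standard library. The `http://` scheme, the `Content-Type` header, and the helper names `make_forecast_request` and `forecast` are assumptions for illustration; only the path, the default address, and the `mean` field come from this page.

```python
import json
import urllib.request

# Default address from this page; the http:// scheme is an assumption.
BASE_URL = "http://localhost:11435"


def make_forecast_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /api/forecast request."""
    return urllib.request.Request(
        f"{base_url}/api/forecast",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the server expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def forecast(payload: dict, base_url: str = BASE_URL, timeout: float = 30.0) -> list:
    """Send the request and return the `mean` predictions, one list per variate."""
    request = make_forecast_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["mean"]
```

With a server running, `forecast({"model": "toto", "input": [[20, 22, 19, 23]], "prediction_length": 4})` would return the nested list found under `mean`.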
GET /api/tags
List all models that are downloaded and ready to use.
{
  "models": [
    {
      "name": "toto",
      "architecture": "toto2",
      "size_bytes": 16581280
    },
    {
      "name": "timesfm",
      "architecture": "timesfm",
      "size_bytes": 462817120
    }
  ]
}
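As a small illustration of consuming this response, the sketch below extracts the model names and the total disk footprint from a body shaped like the example above. `summarize_tags` is a hypothetical helper, and the sample is copied from this page rather than fetched from a live server.

```python
import json


def summarize_tags(tags_body: dict) -> tuple:
    """Return (model names, total size in bytes) from a /api/tags response body."""
    models = tags_body["models"]
    names = [model["name"] for model in models]
    total_bytes = sum(model["size_bytes"] for model in models)
    return names, total_bytes


# Sample body copied from the example response above.
SAMPLE = json.loads("""
{
  "models": [
    {"name": "toto", "architecture": "toto2", "size_bytes": 16581280},
    {"name": "timesfm", "architecture": "timesfm", "size_bytes": 462817120}
  ]
}
""")

names, total_bytes = summarize_tags(SAMPLE)
print(names, f"{total_bytes / 1e6:.1f} MB on disk")
```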
POST /api/pull
Download a model from the registry in the background. The model becomes available for inference once the download completes. With no optional fields, the request pulls the latest version and the smallest parameter count.
// Request -- pull the default for this model
{ "model": "toto" }
// Request -- pin a specific version + parameter count + variant
{ "model": "toto", "version": "2.0", "params": "1b", "variant": "f32" }
// Response
{ "status": "started", "model": "toto" }
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name: toto, timesfm, chronos2. |
| version | string | No | Model version (e.g. 1.0, 2.0). Defaults to the registry's default_version. |
| params | string | No | Parameter-count label (e.g. 4m, 1b, open-base). Defaults to the version's default_params. |
| variant | string | No | File variant: f16 (half-precision) or f32 (full precision). Defaults to f16 when available, else f32. |
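Because every field except `model` is optional, a client can leave out anything it does not want to pin and let the registry defaults apply. The sketch below shows one way to build such a body; `build_pull_payload` is a hypothetical helper, and rejecting unknown variants on the client is an assumption based on the two values listed in the table.

```python
from typing import Optional

# The two variants named in the field table.
KNOWN_VARIANTS = {"f16", "f32"}


def build_pull_payload(
    model: str,
    version: Optional[str] = None,
    params: Optional[str] = None,
    variant: Optional[str] = None,
) -> dict:
    """Build a /api/pull body, omitting any optional field left as None."""
    if variant is not None and variant not in KNOWN_VARIANTS:
        raise ValueError(f"variant must be one of {sorted(KNOWN_VARIANTS)}")
    payload = {"model": model}
    optional = {"version": version, "params": params, "variant": variant}
    # Only send what the caller pinned; omitted fields fall back to defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

Calling `build_pull_payload("toto")` yields the default request shown above, and passing `version="2.0", params="1b", variant="f32"` yields the pinned one.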
A request with only {"model","variant"} continues to work; it resolves to the model's legacy variants block. Toto 2.0 ships f32 only; see the Toto model page for why.
POST /api/finetune
Start a fine-tuning job in the background. The job runs asynchronously — poll /api/finetune/status to track progress.
// Request
{
  "output_model": "my-model",
  "data_path": "/path/to/data.csv",
  "targets": ["cpu_usage"],
  "exogenous": ["is_holiday"],
  "steps": 1400
}
// Response
{ "job_id": "ft-1", "status": "running" }
| Field | Type | Description |
|---|---|---|
| output_model | string | Name for the saved fine-tuned model. |
| data_path | string | Absolute path to a CSV file with a timestamp column and target column(s). |
| targets | string[] | Column names to forecast. |
| exogenous | string[] | Optional known-future covariate column names. |
| steps | int | Training steps. Default: 1000. |
Check job status
{
  "job_id": "ft-1",
  "status": "completed",
  "output_model": "my-model",
  "error": null
}
Status values: running, completed, failed.
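A polling loop over these status values could be sketched as follows. This page does not say which HTTP method the status endpoint uses or how the job ID is passed, so the sketch takes a caller-supplied `fetch_status` function instead of guessing; `wait_for_job`, the polling interval, and the poll limit are all hypothetical.

```python
import time
from typing import Callable

# Terminal states from the documented status values.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job completes or fails; return the final status body.

    `fetch_status` must return a dict shaped like the status response above.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status["status"] in TERMINAL_STATES:
            if status["status"] == "failed":
                raise RuntimeError(status.get("error") or "fine-tuning job failed")
            return status
        sleep(interval_s)  # still running: wait before asking again
    raise TimeoutError(f"job still running after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.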
POST /api/show
Return metadata for a specific model.
// Request
{ "model": "toto" }
// Response
{
  "name": "toto",
  "architecture": "toto2",
  "size_bytes": 16581280,
  "path": "/Users/you/.routeframe/models/toto.gguf"
}
DELETE /api/delete
Remove a downloaded model from disk.
// Request
{ "model": "toto" }
// Response
{ "status": "deleted", "model": "toto" }
GET /health
Health check. Returns 200 OK when the server is ready.
{ "status": "ok", "version": "0.7.3" }
Error responses
All endpoints return standard HTTP status codes. Error bodies follow this shape:
{
  "error": "model 'toto' not found — run: routeframe pull toto"
}
| Status | Meaning |
|---|---|
| 400 | Bad request: missing or invalid fields in the request body. |
| 404 | Model or resource not found. |
| 500 | Inference or internal error. |
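On the client side, the error body and status code can be folded into a single exception. The sketch below assumes requests are made with Python's `urllib`, which raises `urllib.error.HTTPError` for 4xx and 5xx responses; `RouteframeError` and `to_api_error` are hypothetical names, not part of routeframe.

```python
import io
import json
import urllib.error


class RouteframeError(Exception):
    """Hypothetical client-side exception carrying the status and error text."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def to_api_error(exc: urllib.error.HTTPError) -> RouteframeError:
    """Convert an HTTPError into a RouteframeError using the documented body shape."""
    raw = exc.read()
    try:
        message = json.loads(raw)["error"]
    except (ValueError, KeyError, TypeError):
        # Body was not the documented {"error": ...} shape; fall back to raw text.
        message = raw.decode("utf-8", errors="replace") or str(exc.reason)
    return RouteframeError(exc.code, message)


# Simulated 404 response, so no server is needed to try the conversion.
simulated = urllib.error.HTTPError(
    "http://localhost:11435/api/show", 404, "Not Found", {},
    io.BytesIO(json.dumps({"error": "model 'toto' not found"}).encode("utf-8")),
)
print(to_api_error(simulated))
```

In real use, the conversion would sit in an `except urllib.error.HTTPError as exc:` clause around the `urlopen` call.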