Toto
Time-series foundation model family from Datadog. Trained on 2 trillion data points of real-world infrastructure, business, and IoT metrics. Toto 2.0 ships in five sizes from 4M to 2.5B parameters, so you can trade off forecast quality against memory and latency.
Quick start
With no flags, routeframe pull toto grabs the latest version and the smallest parameter count.
That pulls Toto 2.0 / 4M (~16 MB). To see everything available, run:
Versions & sizes
Toto 2.0 (default)
Current generation. u-µP-scaled decoder with alternating time / variate attention and a quantile output head.
| Params | Size (f32) | Pull command |
|---|---|---|
| 4M (default) | 16 MB | routeframe pull toto |
| 22M | 84 MB | routeframe pull toto --params 22m |
| 313M | 1.2 GB | routeframe pull toto --params 313m |
| 1B | 3.9 GB | routeframe pull toto --params 1b |
| 2.5B | 9.1 GB | routeframe pull toto --params 2.5b |
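The f32 sizes in the table follow roughly from 4 bytes per parameter. The sketch below shows that arithmetic and picks the largest listed size that fits a memory budget. The function and dictionary names are illustrative, not part of routeframe, and the conversion of 1.2 GB to 1200 MB assumes decimal units. The figures cover weights only; activations and runtime overhead are not included.

```python
from typing import Optional

# f32 sizes copied from the Toto 2.0 table, in MB.
# Assumption: GB values are converted with 1 GB = 1000 MB.
TOTO_2_SIZES_MB = {
    "4m": 16,
    "22m": 84,
    "313m": 1200,
    "1b": 3900,
    "2.5b": 9100,
}


def f32_footprint_mb(n_params: float) -> float:
    """Rough weight footprint: 4 bytes per float32 parameter, in MB (10**6 bytes)."""
    return n_params * 4 / 1e6


def largest_size_within(budget_mb: float) -> Optional[str]:
    """Return the largest listed size whose f32 weights fit the budget, or None."""
    fitting = [(mb, tag) for tag, mb in TOTO_2_SIZES_MB.items() if mb <= budget_mb]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    print(f32_footprint_mb(4e6))      # nominal 4M parameters
    print(largest_size_within(2000))  # budget of about 2 GB for weights
```

The estimate for a nominal 22M parameters is 88 MB, slightly above the 84 MB listed, which suggests the parameter counts in the table are rounded.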
Toto 1.0
Original Toto-Open-Base model. Mixture-of-Student-T output head. Supports multivariate inputs and exogenous covariates (via fine-tuning).
| Params | Size | Pull command |
|---|---|---|
| Open-Base (200M) | 289 MB (f16), 577 MB (f32) | routeframe pull toto --version 1.0 |
Pre-v0.7.0 routeframe clients still default to this version, so existing installations are unaffected by the 2.0 release.
Architecture
Toto 2.0
- Architecture: Decoder-only transformer with u-µP scaling
- Attention: Alternating time / variate, partial RoPE + xPos on Q/K
- Output head: Quantile knots at [0.1, 0.2, ..., 0.9]
- Scaler: Patched causal std-mean + asinh
- Default context: 512 timesteps
- Patch size: 32
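To make the quantile output head concrete, here is a minimal sketch of how nine knot values for one forecast step could be read. It assumes the head emits one value per knot at levels 0.1 through 0.9; the function names and the linear interpolation between knots are illustrative choices, not a description of Toto's own code.

```python
# Knot levels listed for the Toto 2.0 output head: 0.1, 0.2, ..., 0.9.
KNOTS = [k / 10 for k in range(1, 10)]


def quantile_at(level, knot_values):
    """Linearly interpolate between adjacent knots (hypothetical helper).

    No extrapolation: levels outside [0.1, 0.9] raise ValueError.
    """
    if len(knot_values) != len(KNOTS):
        raise ValueError("expected one value per knot")
    if not KNOTS[0] <= level <= KNOTS[-1]:
        raise ValueError("level must lie within the knot range")
    for i in range(len(KNOTS) - 1):
        q_lo, q_hi = KNOTS[i], KNOTS[i + 1]
        if q_lo <= level <= q_hi:
            w = (level - q_lo) / (q_hi - q_lo)
            return knot_values[i] + w * (knot_values[i + 1] - knot_values[i])


def central_interval(knot_values):
    """The 0.1 and 0.9 knots bound a central 80% interval."""
    return knot_values[0], knot_values[-1]


if __name__ == "__main__":
    step = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]  # made-up values
    print(quantile_at(0.5, step))   # the median is the 0.5 knot
    print(central_interval(step))
```

With knots at these levels, the widest interval available without extrapolating is the central 80% band.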
Toto 1.0
- Architecture: Decoder-only transformer (200M Open-Base)
- Output head: Mixture of Student-T (24 components)
- Scaler: Causal patch std-mean
- Default context: 4096 timesteps
- Patch size: 64
- Multivariate: Yes (supports exogenous covariates via fine-tuning)
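Both versions list a causal std-mean scaler over patches, and Toto 2.0 adds asinh. The sketch below is one plausible reading of that: each patch is normalized with the mean and standard deviation of all points up to the end of that patch, so no future values leak in, and the result is optionally passed through asinh to compress outliers. This interpretation, the function names, and the `eps` parameter are assumptions; the actual implementation may differ.

```python
import math


def num_patches(context_len, patch_size):
    """Number of patches the model sees for a full default context."""
    return context_len // patch_size


def causal_patch_scale(series, patch_size, use_asinh=False, eps=1e-8):
    """Scale each patch using statistics of the data seen so far (hypothetical sketch).

    Mean and std for a patch are computed over every point from the start of
    the series through the end of that patch, never over later points.
    """
    if len(series) % patch_size != 0:
        raise ValueError("series length must be a multiple of patch_size")
    scaled = []
    total = 0.0
    total_sq = 0.0
    for start in range(0, len(series), patch_size):
        patch = series[start:start + patch_size]
        total += sum(patch)
        total_sq += sum(x * x for x in patch)
        n = start + patch_size  # points seen so far
        mean = total / n
        var = max(total_sq / n - mean * mean, 0.0)
        std = math.sqrt(var) + eps
        for x in patch:
            z = (x - mean) / std
            scaled.append(math.asinh(z) if use_asinh else z)
    return scaled


if __name__ == "__main__":
    print(num_patches(512, 32))    # Toto 2.0 defaults
    print(num_patches(4096, 64))   # Toto 1.0 defaults
    print(causal_patch_scale([1, 2, 3, 4, 5, 6, 7, 8], 4, use_asinh=True))
```

With the listed defaults, Toto 2.0 processes 16 patches per series and Toto 1.0 processes 64.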
Feature support
| Feature | Toto 1.0 | Toto 2.0 |
|---|---|---|
| Zero-shot forecasting | ✓ | ✓ |
| Fine-tuning | ✓ | × |
| Exogenous covariates* | ✓ | × |
| Multivariate inputs | ✓ | ✓ |
* Exogenous covariates rely on fine-tuning, which only Toto 1.0 supports. If you need them, use Toto 1.0.
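The feature table and the pull commands listed earlier can be combined into a small selection helper. This is a sketch only: `pull_command` is a hypothetical function, and it emits nothing beyond the command strings that appear in the tables on this page.

```python
# Size tags that appear in the Toto 2.0 pull commands on this page.
TOTO_2_PARAMS = ("4m", "22m", "313m", "1b", "2.5b")


def pull_command(needs_fine_tuning=False, needs_exogenous=False, params=None):
    """Build a pull command from the requirements (hypothetical helper).

    Fine-tuning and exogenous covariates are only marked as supported
    for Toto 1.0, which is listed in a single size.
    """
    if needs_fine_tuning or needs_exogenous:
        if params is not None:
            raise ValueError("Toto 1.0 is listed in a single size (Open-Base)")
        return "routeframe pull toto --version 1.0"
    if params is None or params == "4m":
        # The bare command pulls the default: Toto 2.0 / 4M.
        return "routeframe pull toto"
    if params not in TOTO_2_PARAMS:
        raise ValueError(f"unknown size {params!r}; expected one of {TOTO_2_PARAMS}")
    return f"routeframe pull toto --params {params}"


if __name__ == "__main__":
    print(pull_command())
    print(pull_command(params="313m"))
    print(pull_command(needs_exogenous=True))
```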
Run a forecast
See the CLI docs for monitoring, exogenous covariates, and fine-tuning.