Timer-S1: 8.3B time series foundation model achieves state-of-the-art forecasting on GIFT-Eval

Researchers have introduced Timer-S1, a Mixture-of-Experts time series foundation model with 8.3 billion total parameters and 750 million activated parameters per token. The model achieves state-of-the-art forecasting performance on the GIFT-Eval leaderboard, with the best MASE and CRPS scores among pre-trained models.

Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

Researchers have published Timer-S1, a Mixture-of-Experts (MoE) time series foundation model designed to overcome scalability limitations in existing pre-trained time series systems. The model contains 8.3 billion total parameters with 750 million activated parameters per token and supports a context length of 11.5K tokens.

Architecture and Training Approach

Timer-S1 employs a novel "Serial Scaling" strategy across three dimensions: model architecture, dataset, and training pipeline. The architecture integrates sparse TimeMoE blocks with generic TimeSTP (Serial-Token Prediction) blocks, which the researchers position as a more appropriate training objective for time series forecasting than standard next-token prediction.
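The abstract does not describe the internals of the TimeMoE blocks, but sparse MoE layers conventionally route each token to a small subset of experts so that only a fraction of parameters (here, 750M of 8.3B) is active per token. A minimal top-k routing sketch, with all shapes and names hypothetical:

```python
import numpy as np

def moe_layer(x, gate_w, expert_ws, k=2):
    """Toy top-k sparse MoE layer (illustrative, not the paper's design).

    x: (tokens, d) token embeddings
    gate_w: (d, n_experts) router weights
    expert_ws: list of n_experts weight matrices, each (d, d)
    """
    logits = x @ gate_w                          # router score per expert
    topk = np.argsort(logits, axis=1)[:, -k:]    # top-k experts per token
    sel = np.take_along_axis(logits, topk, axis=1)
    w = np.exp(sel - sel.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)            # softmax over selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            e = topk[t, j]
            # Only the k selected experts compute for this token,
            # which is why activated parameters << total parameters.
            out[t] += w[t, j] * (x[t] @ expert_ws[e])
    return out
```

The key property is that compute per token scales with k, not with the total expert count, which is how MoE models reach billions of parameters at a modest inference cost.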

The key innovation lies in the Serial-Token Prediction paradigm, which introduces serial computations to improve long-horizon predictions while avoiding the costly rolling-style inference, and the error accumulation, that conventional next-token prediction incurs.
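The contrast between the two inference styles can be sketched in a toy setting (both model functions below are hypothetical stand-ins, not the paper's architecture). Rolling inference feeds each one-step prediction back in as input, so any per-step bias compounds over the horizon; direct multi-step prediction issues the whole horizon from the observed context in one pass:

```python
def rolling_forecast(last_obs, step_model, horizon):
    """Roll a one-step model forward: each prediction becomes the next
    input, so a per-step error of (1 + eps) compounds to (1 + eps)**h."""
    preds, x = [], last_obs
    for _ in range(horizon):
        x = step_model(x)      # prediction re-enters the model as input
        preds.append(x)
    return preds

def direct_forecast(last_obs, horizon_model, horizon):
    """Predict all h future steps from observed data in one call;
    no prediction is ever fed back in, so errors do not compound."""
    return horizon_model(last_obs, horizon)
```

For example, if the true dynamics decay by a factor of 0.9 per step but the one-step model applies 0.945 (a 5% over-prediction), the rolling forecast's relative error at horizon h grows like 1.05**h, while a direct model's per-step errors stay independent of h.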

Dataset and Post-Training

The team curated TimeBench, a training corpus containing one trillion time points. The dataset includes meticulous data augmentation to mitigate predictive bias across different time series domains.

The training pipeline incorporates a post-training stage with two components: continued pre-training to enhance short-term performance and long-context extension to optimize for longer time horizons.

Benchmark Performance

Evaluated on the large-scale GIFT-Eval leaderboard, Timer-S1 achieves state-of-the-art forecasting performance. The model attains the best MASE (Mean Absolute Scaled Error) and CRPS (Continuous Ranked Probability Score) scores among all pre-trained models tested.

MASE and CRPS are standard metrics for time series forecasting accuracy. MASE measures forecast error scaled by the in-sample error of a naive (e.g. seasonal-naive) baseline, so values below 1 beat the baseline; CRPS evaluates probabilistic forecasts by comparing the entire predicted distribution to the observed outcome, generalizing absolute error to distributional forecasts.
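Both metrics have compact standard forms. The sketch below uses the textbook definitions: MASE as forecast MAE divided by the in-sample MAE of a seasonal-naive forecast, and CRPS via its energy-form sample estimator, E|X − y| − ½·E|X − X′|. (This illustrates the metrics themselves, not GIFT-Eval's exact evaluation protocol.)

```python
import numpy as np

def mase(y_true, y_pred, y_train, m=1):
    """Mean Absolute Scaled Error: forecast MAE scaled by the
    in-sample MAE of a seasonal-naive forecast with period m."""
    mae = np.mean(np.abs(y_true - y_pred))
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return mae / naive_mae

def crps_ensemble(samples, y):
    """CRPS for one observation y, estimated from forecast samples:
    E|X - y| - 0.5 * E|X - X'| (energy form)."""
    s = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(s - y))
    term2 = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))
    return term1 - term2
```

A point forecast collapses CRPS to plain absolute error, while a well-spread predictive distribution is rewarded when the observation falls inside it.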

Release and Availability

The researchers state that Timer-S1 will be released to facilitate further research, though specific details regarding code and model availability are not yet disclosed in the abstract.

What This Means

Timer-S1 represents a step forward in scaling time series models to match the parameter counts and architectural sophistication of language models. The combination of sparse MoE architecture and Serial-Token Prediction suggests that time series forecasting calls for a different approach than language modeling: the serial nature of forecasting demands different training objectives and inference strategies. If the model is released as stated, it could serve as a foundation for downstream time series applications across finance, energy, climate, and other domains. The billion-scale parameter count and demonstrated SOTA performance indicate that time series modeling has reached the stage where massive pre-trained foundation models are becoming competitive with domain-specific approaches.