
ms-Mamba outperforms Transformer models on time-series forecasting with fewer parameters

Researchers introduced ms-Mamba, a multi-scale Mamba architecture for time-series forecasting that outperforms recent Transformer and Mamba-based models while using significantly fewer parameters. On the Solar-Energy dataset, ms-Mamba achieved 0.229 mean-squared error versus 0.240 for S-Mamba while using only 3.53M parameters compared to 4.77M.



A new architecture called Multi-scale Mamba (ms-Mamba) achieves better performance than recent Transformer and Mamba-based models on time-series forecasting benchmarks while maintaining a smaller model footprint.

The Architecture

ms-Mamba addresses a fundamental limitation of existing forecasting models: they process temporal data at a single time scale. The new approach incorporates multiple temporal scales by running several Mamba blocks in parallel with different sampling rates (different values of the step-size parameter Δ), allowing the model to capture patterns across different time horizons simultaneously.

Mamba, a state-space model architecture introduced in 2023, has gained traction as a computationally efficient alternative to Transformers for sequence modeling. This work extends Mamba specifically for multi-scale temporal understanding.
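The multi-scale idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: a simple exponential moving average stands in for each learned Mamba block, and the different sampling rates are modeled by strided subsampling followed by upsampling back to the original length.

```python
# Toy sketch of multi-scale temporal processing (NOT the paper's code).
# Each branch subsamples the series at a different rate (standing in for a
# different sampling rate / Δ of a parallel Mamba block), applies a simple
# smoothing filter (standing in for the block itself), upsamples back, and
# the branch outputs are averaged into one fused representation.

def smooth(xs, alpha=0.5):
    """Exponential moving average: a stand-in for a learned sequence block."""
    out, state = [], xs[0]
    for x in xs:
        state = alpha * x + (1 - alpha) * state
        out.append(state)
    return out

def branch(xs, stride):
    """Subsample at `stride`, smooth, then expand back to full length."""
    sub = smooth(xs[::stride])
    # nearest-neighbour upsampling back to the original length
    return [sub[min(i // stride, len(sub) - 1)] for i in range(len(xs))]

def multi_scale(xs, strides=(1, 2, 4)):
    """Fuse the branch outputs, one branch per temporal scale."""
    branches = [branch(xs, s) for s in strides]
    return [sum(vals) / len(vals) for vals in zip(*branches)]

series = [float(i % 8) for i in range(32)]  # toy series with a period-8 pattern
fused = multi_scale(series)
print(len(fused))  # fused output has the same length as the input
```

In the actual architecture the branches are learned state-space blocks and the fusion is learned as well; the sketch only shows the structural idea of parallel scales over one sequence.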

Benchmark Results

On the Solar-Energy dataset, ms-Mamba achieved measurably better results than its closest competitor S-Mamba:

  • Mean-squared error: 0.229 vs. 0.240 (4.6% improvement)
  • Parameters: 3.53M vs. 4.77M (26% fewer)
  • Memory usage: 13.46MB vs. 18.18MB (26% reduction)
  • Computational operations: 14.93G vs. 20.53G MACs (27% reduction)

These metrics are averaged across four different forecast lengths. The results suggest ms-Mamba achieves better accuracy while being more efficient across multiple dimensions—a meaningful achievement in a field where model size and inference cost directly impact deployment feasibility.
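The relative differences in the list above follow directly from the raw figures reported in the source, which a quick check confirms:

```python
# Recompute the reported relative reductions from the raw benchmark figures
# (Solar-Energy dataset, averaged over four forecast lengths, per the article).
ms_mamba = {"mse": 0.229, "params_M": 3.53, "mem_MB": 13.46, "macs_G": 14.93}
s_mamba  = {"mse": 0.240, "params_M": 4.77, "mem_MB": 18.18, "macs_G": 20.53}

for key in ms_mamba:
    reduction = 100 * (1 - ms_mamba[key] / s_mamba[key])
    print(f"{key}: {reduction:.1f}% lower")
# mse: 4.6% lower, params_M: 26.0%, mem_MB: 26.0%, macs_G: 27.3%
```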

Significance for Time-Series Forecasting

Time-series forecasting underpins critical applications: energy demand prediction, stock market analysis, weather forecasting, and supply chain optimization. Most existing deep learning approaches—recurrent networks, Transformers, and early Mamba models—treat temporal data uniformly, potentially missing patterns that emerge at different frequencies.

ms-Mamba's multi-scale approach mirrors successful techniques in computer vision (like feature pyramids) and signal processing (wavelet analysis), where capturing information at multiple resolutions improves model robustness. For time-series, this means simultaneously learning fast-changing dynamics (hourly fluctuations) and slow trends (seasonal patterns) in a single, parameter-efficient model.
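As a toy illustration of that intuition, a series can be split into a slow trend and a fast residual with moving averages at different window sizes. This is a generic signal-processing decomposition, not the paper's method:

```python
import math

def moving_average(xs, window):
    """Centered moving average; the window is clamped at the edges."""
    n, out = len(xs), []
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out

# Synthetic series: a slow "seasonal" sine plus a fast "hourly" oscillation.
series = [math.sin(2 * math.pi * i / 100) + 0.3 * math.sin(2 * math.pi * i / 8)
          for i in range(200)]

slow = moving_average(series, window=25)      # captures the seasonal trend
fast = [x - s for x, s in zip(series, slow)]  # leftover fast dynamics
```

A single-scale model must learn both components through one representation; a multi-scale model gets each component at a resolution where it is easy to see.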

Next Steps

The authors indicate that code and trained models will be made available, suggesting the work is intended for reproducibility and adoption by the research community.

What This Means

ms-Mamba demonstrates that the recently popular Mamba architecture can be effectively adapted to domain-specific problems with multi-scale structure. For practitioners, the efficiency gains (fewer parameters, lower memory, fewer MACs) without accuracy loss suggest ms-Mamba could reduce deployment costs for time-series applications at scale. The result also reinforces a broader trend: specialized architectures with domain-aware design can beat general-purpose Transformers on specific tasks. However, this is an academic paper without a commercial implementation; real-world adoption will depend on integration into forecasting platforms and validation on production datasets beyond the benchmarks tested.
