Allen Institute Releases OlmoEarth v1.1 with 3x Compute Reduction for Satellite Imagery
The Allen Institute for AI (AI2) released OlmoEarth v1.1, a family of transformer-based models for satellite imagery that reduces compute costs by up to 3x compared to the original OlmoEarth v1. The efficiency gain comes from collapsing Sentinel-2's per-resolution band tokens into a single token, which cuts sequence length threefold while, according to AI2, keeping benchmark performance similar.
On May 19, 2026, the Allen Institute for AI (AI2) released OlmoEarth v1.1, a family of transformer-based models that cuts compute costs by up to 3x compared to OlmoEarth v1 while maintaining performance on remote sensing benchmarks.
Technical Implementation
The efficiency gains stem from a redesign of how the model tokenizes Sentinel-2 satellite imagery. OlmoEarth v1 created a separate token for each of Sentinel-2's three native resolutions (10 m, 20 m, and 60 m), producing 6 tokens per spatial patch for a 2-timestep input (3 per timestep). OlmoEarth v1.1 collapses the three into a single token per timestep, cutting that to 2 tokens per patch and reducing sequence length threefold.
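The token arithmetic above can be checked with a short Python sketch. It assumes the 12 bands split by native resolution as 4 (10 m), 6 (20 m), and 2 (60 m), the usual Sentinel-2 L2A grouping, and square spatial patches; `S2_BAND_GROUPS`, `token_input_dims`, and `sequence_length` are hypothetical names for illustration, not OlmoEarth's code.

```python
# Assumed grouping of the 12 Sentinel-2 L2A bands by native resolution.
# Illustrative only; not taken from the OlmoEarth code base.
S2_BAND_GROUPS = {
    "10m": ["B02", "B03", "B04", "B08"],
    "20m": ["B05", "B06", "B07", "B8A", "B11", "B12"],
    "60m": ["B01", "B09"],
}


def token_input_dims(patch_size, collapsed):
    """Flattened input size of each token produced for one patch at one timestep."""
    pixels = patch_size * patch_size
    if collapsed:
        # v1.1-style: every band of the patch feeds one token
        return [pixels * sum(len(bands) for bands in S2_BAND_GROUPS.values())]
    # v1-style: one token per resolution group
    return [pixels * len(bands) for bands in S2_BAND_GROUPS.values()]


def sequence_length(h, w, t, patch_size, collapsed):
    """Number of encoder tokens for an [H, W, T, D=12] input."""
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be multiples of patch_size")
    n_patches = (h // patch_size) * (w // patch_size)
    return n_patches * t * len(token_input_dims(patch_size, collapsed))
```

With a patch size of 8, one patch over two timesteps yields 6 tokens in the per-group layout and 2 when collapsed. In this sketch each collapsed token carries all 768 values that the three per-group tokens (256, 384, and 128 values) held between them, so the raw input is the same and only the number of sequence positions drops.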
The change was not straightforward. AI2 reports that naively merging the tokens caused a 10 percentage point drop in kNN accuracy on m-eurosat, a standard remote sensing benchmark. The team adjusted its pretraining recipe to preserve modeling of cross-band relationships; the details are in its technical report.
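kNN evaluations such as the m-eurosat one typically score frozen embeddings: each test image is labeled by a vote among its nearest training embeddings, with no fine-tuning. The sketch below assumes cosine similarity and a majority vote; the default `k` and the function names are illustrative assumptions, not AI2's evaluation code.

```python
import math
from collections import Counter

# Assumed protocol: cosine similarity, majority vote, illustrative default k=20.


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(train_emb, train_labels, query, k):
    """Majority vote among the k training embeddings most similar to the query."""
    ranked = sorted(range(len(train_emb)), key=lambda i: cosine(train_emb[i], query), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_emb, train_labels, test_emb, test_labels, k=20):
    """Top-1 accuracy, in percent, of a kNN classifier on frozen embeddings."""
    correct = sum(
        knn_predict(train_emb, train_labels, q, k) == y
        for q, y in zip(test_emb, test_labels)
    )
    return 100.0 * correct / len(test_labels)
```

Because accuracy is reported on a 0 to 100 scale, a 10 percentage point drop is an absolute difference (for example, a hypothetical 90 falling to 80), not a 10% relative decline.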
Model Family
AI2 released three model sizes:
- Base
- Tiny
- Nano
All variants take Sentinel-2 input as tensors of shape [H, W, T, D=12], where H and W are the spatial dimensions in pixels (latitude and longitude), T is the number of timesteps, and D indexes the 12 Sentinel-2 spectral bands.
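As a sketch of this layout, the helper below stacks per-band time series into a nested [H][W][T][D] list using only the standard library. It assumes every band was first resampled to one common pixel grid and uses an assumed band order; `S2_BANDS`, `to_hwtd`, and `shape_of` are hypothetical, and a real pipeline would use an array library and the channel order given in the model card.

```python
# Assumed channel order for D=12: the 12 Sentinel-2 L2A bands sorted by band number.
# Check the model card for the order OlmoEarth actually expects.
S2_BANDS = ["B01", "B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B09", "B11", "B12"]


def to_hwtd(band_series):
    """Stack {band: T grids of H x W values} into a nested [H][W][T][D] list.

    Assumes every band was already resampled to the same H x W grid.
    """
    missing = [b for b in S2_BANDS if b not in band_series]
    if missing:
        raise ValueError(f"missing bands: {missing}")
    first = band_series[S2_BANDS[0]]
    t, h, w = len(first), len(first[0]), len(first[0][0])
    return [
        [[[band_series[b][ti][y][x] for b in S2_BANDS] for ti in range(t)] for x in range(w)]
        for y in range(h)
    ]


def shape_of(nested):
    """Dimensions of a regular nested list, outermost first."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)
```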
Performance Trade-offs
AI2 states that OlmoEarth v1.1 performs similarly to v1 on its benchmark mix and on partner-constructed tasks, though the technical report notes "some regressions." Because both versions were trained on identical datasets, performance differences can be attributed to the methodological changes rather than to the data.
Transformer compute grows with token sequence length: linearly in the projection and feed-forward layers and quadratically in self-attention, so a threefold cut in tokens substantially lowers the cost of inference and fine-tuning. AI2 measured efficiency in MACs (multiply-accumulate operations per forward pass).
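To see how a threefold token cut translates into encoder compute, here is a back-of-the-envelope MAC count for one standard transformer encoder layer with sequence length N and width d: about 12Nd² for the attention projections and a 4x-wide MLP, plus 2N²d for attention itself. The layer layout, MLP ratio, and example sizes are assumptions; this is not how AI2 computed its figures, and it ignores patch embedding, decoder heads, and other per-model costs.

```python
def encoder_layer_macs(n_tokens, d_model, mlp_ratio=4):
    """Approximate MACs for one standard transformer encoder layer.

    Counts only matrix multiplies; layer norms, softmax, and biases are ignored.
    """
    projections = 4 * n_tokens * d_model * d_model         # Q, K, V and output projections
    attention = 2 * n_tokens * n_tokens * d_model          # QK^T scores plus weighted sum of V
    mlp = 2 * n_tokens * d_model * (mlp_ratio * d_model)   # two feed-forward matmuls
    return projections + attention + mlp


def macs_ratio(n_tokens_before, n_tokens_after, d_model):
    """How many times more MACs the longer sequence needs per layer."""
    return encoder_layer_macs(n_tokens_before, d_model) / encoder_layer_macs(n_tokens_after, d_model)
```

For a hypothetical N = 384 cut to 128 with d = 768, the per-layer ratio is about 3.16. In general the ratio is 3(6d + N)/(6d + N/3), which stays near 3 while N is small relative to 6d and approaches 9 only as attention comes to dominate.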
Deployment Context
Since OlmoEarth v1's release in November 2025, partners have deployed it for mangrove tracking, forest loss classification, and country-scale crop mapping. AI2 reports that deployments now cover national, continental, and global areas, and that at these scales the end-to-end pipeline of data export, preprocessing, inference, and post-processing is dominated by compute costs.
What This Means
OlmoEarth v1.1 targets a practical bottleneck in satellite imagery AI: compute cost at scale. A reduction of up to 3x could allow more frequent planet-scale map refreshes and lower the barrier for organizations without large compute budgets. For researchers, because v1.1 was trained on the same data as v1, the two versions offer a controlled comparison for studying pretraining methods in remote sensing.
The model family is available on Hugging Face along with full training code. Given the documented regressions, AI2 recommends that existing OlmoEarth v1 users test v1.1 on their specific tasks.