Allen Institute Releases OlmoEarth v1.1 with 3x Compute Reduction for Satellite Imagery

TL;DR

The Allen Institute for AI (AI2) released OlmoEarth v1.1, a family of transformer-based models for satellite imagery processing that reduces compute costs by up to 3x compared to the original OlmoEarth v1. The efficiency gains come from collapsing Sentinel-2's three resolution groups into a single token, cutting sequence lengths by a factor of three while keeping benchmark performance close to v1.


OlmoEarth v1.1: 3x Compute Reduction for Satellite Imagery Models

On May 19, 2026, the Allen Institute for AI (AI2) released OlmoEarth v1.1, a family of transformer-based models that cuts compute costs by up to 3x compared to OlmoEarth v1 while maintaining performance on remote sensing benchmarks.

Technical Implementation

The efficiency gains stem from a redesign of how the model tokenizes Sentinel-2 satellite imagery. OlmoEarth v1 created a separate token for each of Sentinel-2's three resolutions (10m, 20m, and 60m), so a 2-timestep input produced 6 tokens per spatial patch (3 resolutions x 2 timesteps). OlmoEarth v1.1 collapses the three resolution tokens into one, so the same input produces 2 tokens per patch, a threefold reduction in token count.
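The token arithmetic can be checked with a few lines of Python. This is a minimal sketch of the counting described in the article, not AI2's tokenizer; the function and constant names are hypothetical.

```python
# Illustrative token counting only; names are hypothetical, not from the OlmoEarth code.

V1_TOKENS_PER_PATCH_TIMESTEP = 3    # one token per Sentinel-2 resolution (10m, 20m, 60m)
V1_1_TOKENS_PER_PATCH_TIMESTEP = 1  # resolutions collapsed into a single token


def tokens_per_input(patches: int, timesteps: int, tokens_per_patch_timestep: int) -> int:
    """Total tokens for an input with the given number of patches and timesteps."""
    return patches * timesteps * tokens_per_patch_timestep


# One spatial patch observed at two timesteps, as in the article's example.
v1_tokens = tokens_per_input(1, 2, V1_TOKENS_PER_PATCH_TIMESTEP)
v1_1_tokens = tokens_per_input(1, 2, V1_1_TOKENS_PER_PATCH_TIMESTEP)
reduction = v1_tokens // v1_1_tokens

print(v1_tokens, v1_1_tokens, reduction)
```

The same factor of three holds for any number of patches and timesteps, since only the per-patch, per-timestep token count changes.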

Collapsing the tokens was not straightforward. AI2 reports that naive token merging caused a 10-percentage-point drop on m-eurosat kNN, a standard remote sensing benchmark. To keep the model learning cross-band relationships, the team modified its pretraining regimen; the changes are detailed in the technical report.

Model Family

AI2 released three model sizes:

  • Base
  • Tiny
  • Nano

All variants process Sentinel-2 data with tensors formatted as [H, W, T, D=12], where H and W represent latitudinal and longitudinal pixels, T is the temporal dimension, and D covers 12 Sentinel-2 channels.
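To make the [H, W, T, D=12] layout concrete, here is a small sketch that validates an input shape and shows how 12 channels could split across the three native resolutions. The band list is an assumption: it uses the standard Sentinel-2 band names with B10 (cirrus) omitted, as in Level-2A products, because the article does not say which 12 channels the models use or in what order. The function name is hypothetical.

```python
# Assumed channel set: 13 Sentinel-2 MSI bands minus B10 (cirrus).
# The article only states D=12; the exact bands and their order are not given.
BANDS_BY_RESOLUTION = {
    "10m": ["B02", "B03", "B04", "B08"],
    "20m": ["B05", "B06", "B07", "B8A", "B11", "B12"],
    "60m": ["B01", "B09"],
}

EXPECTED_CHANNELS = sum(len(bands) for bands in BANDS_BY_RESOLUTION.values())


def describe_shape(shape):
    """Validate an [H, W, T, D] shape and return it as a labelled dict."""
    if len(shape) != 4:
        raise ValueError(f"expected 4 dimensions [H, W, T, D], got {len(shape)}")
    height, width, timesteps, channels = shape
    if channels != EXPECTED_CHANNELS:
        raise ValueError(f"expected D={EXPECTED_CHANNELS}, got D={channels}")
    return {"height": height, "width": width, "timesteps": timesteps, "channels": channels}


info = describe_shape((64, 64, 2, 12))
print(info)
```

Under this assumed grouping, v1 would emit one token for each of the three dictionary keys, while v1.1 would emit a single token covering all 12 channels.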

Performance Trade-offs

AI2 states that OlmoEarth v1.1 maintains similar performance to v1 on its benchmark mix and partner-constructed tasks, though the technical report notes "some regressions." Because both versions train on identical datasets, any performance difference can be attributed to the methodological changes rather than to the data.

In a transformer, self-attention cost scales quadratically with token sequence length while the rest of each layer scales linearly, so a 3x reduction in tokens is significant for both inference and fine-tuning. AI2 measured efficiency in MACs (multiply-accumulate operations) per forward pass.
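A rough estimate shows how token count feeds into MACs. The sketch below counts MACs for one standard transformer encoder layer (four attention projections, the two attention matrix products, and a two-layer MLP). It is a textbook approximation under assumed dimensions, not AI2's measurement method: the token count of 600, the width of 768, and the MLP ratio of 4 are illustrative values, and the estimate ignores patch embedding, normalization, and everything outside the encoder layers.

```python
def layer_macs(n_tokens: int, d_model: int, mlp_ratio: int = 4) -> int:
    """Approximate MACs for one standard transformer encoder layer."""
    projections = 4 * n_tokens * d_model * d_model      # Q, K, V and output projections
    attention = 2 * n_tokens * n_tokens * d_model       # Q @ K^T and weights @ V
    mlp = 2 * mlp_ratio * n_tokens * d_model * d_model  # two linear layers
    return projections + attention + mlp


def macs_ratio(n_tokens_before: int, d_model: int, token_reduction: int = 3) -> float:
    """Ratio of per-layer MACs before and after dividing the token count."""
    n_tokens_after = n_tokens_before // token_reduction
    return layer_macs(n_tokens_before, d_model) / layer_macs(n_tokens_after, d_model)


# Assumed, illustrative dimensions: 600 tokens before merging, model width 768.
ratio = macs_ratio(600, 768)
print(f"per-layer MACs ratio: {ratio:.2f}")
```

In this approximation the linear terms shrink by 3x and the attention term by 9x, so the per-layer ratio sits between those two bounds and stays close to 3 whenever the sequence is short relative to the model width. End-to-end savings also depend on the parts of the pipeline this estimate leaves out.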

Deployment Context

Since OlmoEarth v1's November 2025 release, partners have deployed it for mangrove tracking, forest loss classification, and country-scale crop mapping. AI2 reports that deployments now scale to national, continental, and global areas, where compute costs dominate every stage: data export, preprocessing, inference, and post-processing.

What This Means

OlmoEarth v1.1 addresses the practical bottleneck in satellite imagery AI: compute cost at scale. A 3x reduction enables more frequent planet-scale map refreshes and lowers barriers for organizations without large compute budgets. For researchers, training on identical datasets to v1 creates a controlled comparison for studying pretraining methodologies in remote sensing.

The model family is available on Hugging Face with full training code. AI2 recommends existing OlmoEarth v1 users test v1.1 for their specific tasks given the documented performance regressions.
