Meta's TRIBE v2 AI predicts brain activity from images, audio, and speech with 70,000-voxel fMRI mapping

TL;DR

Meta's FAIR lab released TRIBE v2, an AI model that predicts human brain activity from images, audio, and text. Trained on over 1,000 hours of fMRI data from 720 subjects, the model maps predictions to 70,000 voxels and often matches group-average brain responses more accurately than individual brain scans.

Meta's TRIBE v2 AI Predicts Brain Activity from Images, Audio, and Speech

Meta's FAIR lab released TRIBE v2, an AI model that predicts how the human brain responds to visual, auditory, and language stimuli. Trained on more than 1,000 hours of fMRI data from 720 subjects, the model maps its predictions to 70,000 voxels (the 3D units that make up an fMRI scan), and those predictions often match group-average responses more closely than an individual subject's scan does.

Architecture and Training

TRIBE v2 processes three input types through separate pre-trained Meta models: Llama 3.2 for text, Wav2Vec-Bert-2.0 for audio, and V-JEPA 2 for video. Each model produces embeddings that capture the semantic content of its input. A transformer then processes all three representations jointly, identifying patterns across stimuli and subjects. A final subject-specific layer translates the transformer's output into predicted brain activation maps.
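The data flow in this paragraph can be sketched in a few lines of plain Python. This is a toy illustration, not Meta's implementation: random vectors stand in for the encoder embeddings, and the dimensions, the helper names (`predict_voxels`, `self_attention`), the subject ID, and the single attention step without learned projections are all assumptions made for the example.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Small random weight matrix standing in for learned parameters."""
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def rand_sequence(steps, dim):
    """Random per-time-step vectors standing in for encoder embeddings."""
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(steps)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(sequence):
    """Single-head dot-product attention over time steps (no learned weights)."""
    dim = len(sequence[0])
    mixed = []
    for query in sequence:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        weights = softmax(scores)
        mixed.append([sum(w * value[i] for w, value in zip(weights, sequence))
                      for i in range(dim)])
    return mixed

def predict_voxels(text_emb, audio_emb, video_emb, projections, subject_head):
    # 1. Project each modality to a shared width and concatenate per time step.
    fused = [matvec(projections["text"], t)
             + matvec(projections["audio"], a)
             + matvec(projections["video"], v)
             for t, a, v in zip(text_emb, audio_emb, video_emb)]
    # 2. Mix information across time steps (the "joint transformer" stage).
    hidden = self_attention(fused)
    # 3. Subject-specific linear readout to voxel activations.
    return [matvec(subject_head, h) for h in hidden]

# Toy sizes (assumed); the real model predicts about 70,000 voxels.
T, D_TEXT, D_AUDIO, D_VIDEO, D_SHARED, N_VOXELS = 5, 8, 6, 10, 4, 12
projections = {
    "text": rand_matrix(D_SHARED, D_TEXT),
    "audio": rand_matrix(D_SHARED, D_AUDIO),
    "video": rand_matrix(D_SHARED, D_VIDEO),
}
subject_heads = {"sub-01": rand_matrix(N_VOXELS, 3 * D_SHARED)}  # hypothetical ID

pred = predict_voxels(rand_sequence(T, D_TEXT), rand_sequence(T, D_AUDIO),
                      rand_sequence(T, D_VIDEO), projections,
                      subject_heads["sub-01"])
print(len(pred), len(pred[0]))  # 5 12
```

Keeping the readout in a per-subject dictionary mirrors the description above: the encoders and the joint stage are shared, and only the last mapping differs from one subject to the next.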

The model generalizes to new subjects without retraining and shows steady accuracy improvements with additional training data—a scaling pattern similar to large language models.

Performance and Validation

In testing, TRIBE v2's predictions often correlated more strongly with the actual group-average brain response than individual subjects' scans did. On the Human Connectome Project dataset (7 Tesla fMRI), the model's correlation with the group average was twice that of the median individual subject.
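The comparison behind this claim can be illustrated on synthetic data. The sketch below assumes Pearson correlation as the metric and a leave-one-out group average for scoring each subject; the article specifies neither, and the signal, noise levels, and function names are invented for the example.

```python
import math
import random
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def group_average(responses):
    """Average the subjects' time courses point by point."""
    return [mean(values) for values in zip(*responses)]

def compare_to_group(subject_responses, model_prediction):
    # Score each subject against the average of the *other* subjects,
    # so no subject is correlated with a mean that contains itself.
    subject_scores = []
    for i, held_out in enumerate(subject_responses):
        others = subject_responses[:i] + subject_responses[i + 1:]
        subject_scores.append(pearson(held_out, group_average(others)))
    model_score = pearson(model_prediction, group_average(subject_responses))
    return model_score, median(subject_scores)

# Synthetic single-voxel data: a shared signal plus independent noise.
random.seed(1)
signal = [math.sin(2 * math.pi * t / 20) for t in range(200)]
subjects = [[s + random.gauss(0, 1.0) for s in signal] for _ in range(6)]
prediction = [s + random.gauss(0, 0.3) for s in signal]  # toy "model" output

model_score, median_subject = compare_to_group(subjects, prediction)
print(f"model vs group: {model_score:.2f}, median subject vs group: {median_subject:.2f}")
```

The toy setup shows why such a result is possible: a single scan carries that subject's measurement noise, while a prediction that tracks the shared signal does not, so it can sit closer to the group average than any one scan.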

Compared to optimized linear baseline models, TRIBE v2 showed significant improvements across all datasets tested. Its predecessor, TRIBE v1, was trained on only four subjects and predicted 1,000 voxels, yet it won the Algonauts 2025 competition against 263 other teams. TRIBE v2 extends that scope to 720 subjects and 70,000 voxels.

Replicating Decades of Neuroscience

Using controlled test protocols from the Individual Brain Charting dataset, researchers validated TRIBE v2 against classical neuroscience findings. In visual tasks, the model correctly identified specialized brain regions for faces, places, bodies, and characters. In language experiments, it replicated known patterns: it distinguished speech from silence and emotional from physical pain processing, and it showed the expected left-hemisphere activation for complete sentences versus word lists.

By selectively disabling input channels, the team mapped which sensory modality drives activity in specific regions: audio predicts activity near auditory cortex, video maps to visual cortex, text activates language areas. In multimodal regions like the temporal-parietal-occipital junction, using all three channels improved prediction accuracy by up to 50 percent versus single channels.
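The ablation analysis can be summarized with a small helper. The scores below are invented placeholders, not results from the study, and the region keys and the `ablation_summary` function are hypothetical. The sketch assumes one accuracy score per region for each single channel and for all three channels together.

```python
CHANNELS = ("text", "audio", "video")

def ablation_summary(scores):
    """Map each region to (dominant single channel, relative multimodal gain).

    `scores` maps region -> {frozenset of enabled channels: accuracy}.
    Assumes the best single-channel score is non-zero.
    """
    summary = {}
    for region, by_config in scores.items():
        single = {ch: by_config[frozenset([ch])] for ch in CHANNELS}
        dominant = max(single, key=single.get)
        full = by_config[frozenset(CHANNELS)]
        gain = (full - single[dominant]) / single[dominant]
        summary[region] = (dominant, gain)
    return summary

ALL = frozenset(CHANNELS)
# Invented scores for illustration only.
toy_scores = {
    "auditory_cortex": {
        frozenset(["text"]): 0.125,
        frozenset(["audio"]): 0.5,
        frozenset(["video"]): 0.0625,
        ALL: 0.5,
    },
    "tpo_junction": {
        frozenset(["text"]): 0.1875,
        frozenset(["audio"]): 0.25,
        frozenset(["video"]): 0.21875,
        ALL: 0.375,
    },
}

for region, (dominant, gain) in ablation_summary(toy_scores).items():
    print(f"{region}: dominant={dominant}, multimodal gain={gain:.0%}")
```

In this toy data the multimodal region gains 50 percent over its best single channel, which is the kind of relative improvement the paragraph describes, while the unimodal region gains nothing from the extra channels.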

Significant Limitations

TRIBE v2 treats the brain as a passive sensory receiver without modeling active decision-making or motor output. fMRI's indirect measurement via blood flow introduces multi-second delays, obscuring millisecond-scale neural dynamics. The model covers only three sensory channels; smell, touch, and balance remain unmapped. It cannot capture developmental changes or clinical conditions.

Accuracy varies by stimulus type and brain region, with some areas showing notably lower prediction quality than others.

Availability and Impact

Meta has released TRIBE v2's code, weights, and an interactive demo at no cost. The researchers propose three primary use cases: planning expensive neuroscience experiments computationally before lab time, building more brain-like AI architectures, and accelerating neuroscience research by reducing measurement bottlenecks.

The first of these could substantially lower research costs, since researchers could prototype an experiment computationally before committing resources to an actual fMRI study.

What This Means

TRIBE v2 demonstrates that large-scale multimodal AI models trained on neuroimaging data can capture generalizable patterns of human brain function. This has practical value for neuroscience labs, potentially cutting experimental timelines and costs. However, the model's limitations (it treats the brain as passive, lacks fine temporal resolution, and covers only some senses) mean it complements rather than replaces empirical neuroscience. The steady gains from additional training data suggest that future versions could improve as fMRI datasets expand.
