Meta releases SAM 3.1, adding 7x faster multi-object tracking to vision foundation model
Meta has released SAM 3.1, an update to its Segment Anything Model that adds Object Multiplex, a shared-memory approach for joint multi-object tracking. The new version achieves approximately 7x faster inference when tracking 128 objects on a single H100 GPU while improving video object segmentation (VOS) performance on 6 out of 7 benchmarks.
Key Improvements
SAM 3.1 builds on SAM 3, Meta's unified foundation model for promptable segmentation that handles both images and videos. The new version maintains backward compatibility with SAM 3's core capabilities—detecting, segmenting, and tracking objects via text or visual prompts (points, boxes, masks)—while adding significant performance gains for tracking scenarios.
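The prompt types mentioned above (points, boxes, and short text phrases) can be sketched as simple data structures. This is an illustrative sketch only, not SAM 3's actual interface; all class and function names here are assumptions:

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass
class PointPrompt:
    xy: Tuple[float, float]
    positive: bool = True  # clicks can include or exclude a region

@dataclass
class BoxPrompt:
    xyxy: Tuple[float, float, float, float]  # x1, y1, x2, y2

@dataclass
class TextPrompt:
    phrase: str  # short noun phrase, e.g. "yellow school bus"

Prompt = Union[PointPrompt, BoxPrompt, TextPrompt]

def describe(p: Prompt) -> str:
    """Toy dispatcher showing how a promptable model might route inputs."""
    if isinstance(p, TextPrompt):
        # text prompts are open-vocabulary: segment *every* matching instance
        return f"segment every instance of '{p.phrase}'"
    if isinstance(p, BoxPrompt):
        return f"segment the object inside box {p.xyxy}"
    return f"segment the object at point {p.xy}"

print(describe(TextPrompt("yellow school bus")))
```

The key distinction the sketch captures: visual prompts (points, boxes) select one object, while a text prompt asks for all instances of a concept.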
Meta says the Object Multiplex approach delivers approximately 7x faster inference when tracking 128 objects on a single H100 GPU, without sacrificing accuracy. SAM 3.1 also shows improved video object segmentation (VOS) performance on 6 of the 7 benchmarks tested.
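Meta has not published implementation details here, but the intuition behind a shared-memory multi-object design can be sketched: instead of running the heavy per-frame computation once per tracked object, frame features are computed once and shared across all object queries, so per-object cost shrinks to a lightweight head. A toy illustration in Python (all names are illustrative, not the actual SAM 3.1 architecture):

```python
def encode_frame(frame):
    """Stand-in for a heavy image/memory encoder."""
    return [x * 2.0 for x in frame]  # toy "features"

def track_per_object(frames, num_objects):
    """Naive scheme: redo the heavy pass once per tracked object."""
    encoder_calls = 0
    for frame in frames:
        for _ in range(num_objects):
            encode_frame(frame)
            encoder_calls += 1
    return encoder_calls

def track_multiplexed(frames, num_objects):
    """Shared scheme: one heavy pass per frame, reused by all objects."""
    encoder_calls = 0
    for frame in frames:
        feats = encode_frame(frame)
        # cheap per-object heads reuse the shared features
        _ = [[f + i for f in feats] for i in range(num_objects)]
        encoder_calls += 1
    return encoder_calls

frames = [[0.0] * 16 for _ in range(10)]
print(track_per_object(frames, 128))   # 1280 heavy passes
print(track_multiplexed(frames, 128))  # 10 heavy passes
```

With 128 objects, the shared scheme does 128x fewer heavy passes in this toy model; in practice the real speedup (Meta reports ~7x) is smaller because the per-object heads are not free.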
Concept Vocabulary Scale
SAM 3's open-vocabulary capability remains central to SAM 3.1. The model can exhaustively segment all instances of a concept specified via a short text phrase, covering over 50x more unique concepts than existing benchmarks, according to Meta.
Deployment Details
The SAM 3.1 model checkpoints are available on Hugging Face at facebook/sam3.1. Meta notes that there is no Hugging Face Transformers integration for this release; users must consult the SAM 3 GitHub repository for installation instructions, code examples, and full documentation. Access is gated: users must share contact information under Meta's privacy framework before downloading the checkpoints.
At the time of writing, the model has recorded 1,865 downloads in the previous month on Hugging Face. No inference providers currently host SAM 3.1, so users must run the model locally or on their own infrastructure.
What This Means
SAM 3.1 positions Meta's segmentation foundation model for production use cases in video understanding and multi-object tracking—domains where inference speed directly impacts real-world feasibility. The 7x speedup at scale (128 objects) suggests the model targets enterprise applications in video analysis, autonomous systems, and robotics rather than single-object or lightweight use cases. The lack of inference provider deployment currently limits accessibility to developers with GPU infrastructure, though the open availability of checkpoints aligns with Meta's broader strategy of releasing foundational models for research and production adoption.