Meta Releases SAM 3.1 with 7x Faster Multi-Object Tracking
Meta has released SAM 3.1, an updated version of its Segment Anything Model that introduces Object Multiplex, a shared-memory architecture designed for efficient joint multi-object tracking in videos.
Key Improvements
SAM 3.1 builds on SAM 3, Meta's unified foundation model for promptable segmentation that handles both images and videos. The new version maintains backward compatibility with SAM 3's core capabilities—detecting, segmenting, and tracking objects via text or visual prompts (points, boxes, masks)—while adding significant performance gains for tracking scenarios.
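For orientation, here is a hypothetical sketch of what point- and box-prompted image segmentation might look like. The `sam3` import, predictor builder, and method names are assumptions modeled loosely on SAM 2's public predictor interface, not the actual SAM 3 API; the real interface is documented in the SAM 3 GitHub repository.

```python
# Hypothetical sketch of visually prompted segmentation with SAM 3.1.
# The import path, builder, and method names below are assumptions,
# not the official API; see the SAM 3 GitHub repo for the real interface.
import numpy as np
from PIL import Image

from sam3 import build_sam3_image_predictor  # hypothetical import

predictor = build_sam3_image_predictor(checkpoint="sam3.1_checkpoint.pt")

image = np.array(Image.open("frame.jpg").convert("RGB"))
predictor.set_image(image)

# Visual prompts: one foreground click plus a bounding box around the target.
masks, scores = predictor.predict(
    point_coords=np.array([[420, 310]]),  # (x, y) pixel of a positive click
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    box=np.array([350, 250, 520, 400]),   # xyxy box prompt
)
print(masks.shape, scores)
```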
The Object Multiplex approach delivers approximately 7x faster inference when tracking 128 objects on a single H100 GPU, according to Meta, which says the speedup comes without sacrificing accuracy. SAM 3.1 also improves video object segmentation (VOS) performance on 6 of the 7 benchmarks tested.
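Meta has not detailed the Object Multiplex implementation here, but the stated idea of a shared memory for joint tracking can be illustrated with a toy sketch: encode each frame once and update all object memories in a single batched attention call, rather than running a full pass per object. The code below is a conceptual illustration, not Meta's method.

```python
# Toy illustration (not Meta's code) of the shared-memory idea behind
# Object Multiplex: encode each frame once and update all object memories
# in one batched attention call, instead of one full pass per object.
import torch

torch.manual_seed(0)
D, T, N = 64, 256, 128                      # feature dim, tokens/frame, objects
frames = [torch.randn(T, D) for _ in range(8)]
memories = torch.randn(N, D)                # one memory vector per tracked object

def attend(feats, mem):
    # Cross-attention stand-in: each object's memory reads the frame features.
    weights = torch.softmax(mem @ feats.T / D**0.5, dim=-1)  # (N, T)
    return weights @ feats                                    # (N, D)

# Per-object baseline: N separate attention calls per frame.
mem_a = memories.clone()
for feats in frames:
    mem_a = torch.stack([attend(feats, mem_a[i:i+1]).squeeze(0) for i in range(N)])

# Multiplexed variant: one batched call per frame covering all N objects.
mem_b = memories.clone()
for feats in frames:
    mem_b = attend(feats, mem_b)

assert torch.allclose(mem_a, mem_b, atol=1e-5)  # same result, far fewer calls
```

In a real tracker the per-frame encoder cost dominates, so sharing that work across 128 objects and batching the memory updates is a plausible source of the claimed speedup; the actual mechanism may differ.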
Concept Vocabulary Scale
SAM 3.1 retains SAM 3's open-vocabulary capability: the model can exhaustively segment all instances of a concept specified by a short text phrase, and Meta says it handles over 50x more unique concepts than existing benchmarks cover.
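As a hypothetical usage sketch (the function and field names below are illustrative assumptions, not the repository's real API), text-prompted concept segmentation on video would look roughly like this:

```python
# Hypothetical text-prompt interface; names are illustrative, not official.
from sam3 import build_sam3_video_predictor  # assumed import path

predictor = build_sam3_video_predictor(checkpoint="sam3.1_checkpoint.pt")

# A short noun phrase prompts exhaustive segmentation of every matching
# instance across the clip, each with a stable track identity.
results = predictor.segment_video("clip.mp4", phrases=["yellow school bus"])
for track in results:
    print(track.object_id, track.phrase, len(track.masks))
```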
Deployment Details
The SAM 3.1 model checkpoints are available on Hugging Face at facebook/sam3.1. Meta notes that there is no Hugging Face Transformers integration for this release; installation instructions, code examples, and full documentation live in the SAM 3 GitHub repository. Access is gated: users must share contact information under Meta's privacy framework before they can download the model.
At the time of writing, the model had recorded 1,865 downloads on Hugging Face over the previous month. No inference providers currently serve SAM 3.1, so users must run the model locally or on their own infrastructure.
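With no hosted provider, a typical first step is pulling the checkpoints locally with the `huggingface_hub` client. The snippet below assumes access has already been granted on the model page and a token has been stored via `huggingface-cli login`.

```python
# Download the gated SAM 3.1 checkpoints for local inference.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="facebook/sam3.1",  # gated: requires accepting Meta's terms first
    token=True,                 # reuse the token stored by `huggingface-cli login`
)
print("Checkpoints saved to:", local_dir)
```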
What This Means
SAM 3.1 positions Meta's segmentation foundation model for production use cases in video understanding and multi-object tracking—domains where inference speed directly impacts real-world feasibility. The 7x speedup at scale (128 objects) suggests the model targets enterprise applications in video analysis, autonomous systems, and robotics rather than single-object or lightweight use cases. The lack of inference provider deployment currently limits accessibility to developers with GPU infrastructure, though the open availability of checkpoints aligns with Meta's broader strategy of releasing foundational models for research and production adoption.
Related Articles
Baidu Releases Qianfan-OCR-Fast Model with 66K Context at $0.68 Per 1M Input Tokens
Baidu has released Qianfan-OCR-Fast, a multimodal model specialized for optical character recognition tasks. The model offers a 66,000 token context window and is priced at $0.68 per 1M input tokens and $2.81 per 1M output tokens.
DeepSeek Releases V4 Flash: 284B-Parameter MoE Model with 1M Context Window, Free via OpenRouter
DeepSeek has released V4 Flash, a Mixture-of-Experts model with 284B total parameters and 13B activated parameters per forward pass. The model supports a 1M-token context window and is available free through OpenRouter, targeting high-throughput coding and chat applications.
Perceptron Launches Mk1 Vision-Language Model with Video Reasoning at $0.15/$1.50 per 1M Tokens
Perceptron has released Perceptron Mk1, a vision-language model designed for video understanding and embodied reasoning tasks. The model accepts image and video inputs with a 33K context window, is priced at $0.15 per 1M input tokens and $1.50 per 1M output tokens, and supports structured spatial annotations on demand.
Mira Murati's Thinking Machines announces full-duplex AI model with 0.40-second response time
Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, announced TML-Interaction-Small, a full-duplex AI model that processes input while simultaneously generating responses. The company claims a 0.40-second response time, matching natural human conversation speed.