Microsoft Releases Fara-7B: 7B Parameter Computer Use Agent Trained in 2.5 Days on 64 H100s
Microsoft Research has released Fara-7B, a 7-billion parameter small language model designed for computer automation tasks. The model, which took 2.5 days to train on 64 H100 GPUs, can navigate websites to complete tasks like booking restaurants and shopping, using screenshots as input with a 128K token context window.
Microsoft Releases Fara-7B Computer Use Agent
Microsoft Research has released Fara-7B, a 7-billion parameter multimodal model specialized for automating web-based tasks. The model was trained between October 26-29, 2025, requiring 2.5 days on 64 H100 GPUs.
Model Specifications
Fara-7B is built on Qwen 2.5-VL (7B) and processes screenshots alongside text inputs to predict actions for computer use tasks. The model supports a 128K token context window and outputs chain-of-thought reasoning followed by structured tool calls indicating specific actions.
Inputs include user goals in text, current browser screenshots, and history of previous actions. The model directly predicts grounded actions with arguments, such as pixel coordinates for mouse clicks.
Training and Data
According to Microsoft, the model was trained on a "large-scale, fully synthetic dataset of action trajectories generated and verified by a multi-agent pipeline." The company has not disclosed specific dataset sizes or compositions. Training used both public and private data sources.
The model is English-only and follows a decoder-only architecture adapted for multimodal inputs.
Safety and Limitations
Microsoft claims Fara-7B incorporates "critical point recognition" to halt execution when tasks require user permission or sensitive information. The model is trained to refuse eight categories of harmful tasks, including illegal activities, deceptive tasks, harassment, and misinformation.
Critical points where the model stops include entering personal information, completing purchases, making phone calls, sending emails, and signing into accounts. The model underwent automated red teaming for grounding, jailbreaks, and copyright violations, though specific evaluation results were not disclosed.
Availability and Use Cases
The model is available under an MIT license on Hugging Face and Azure AI Foundry. Microsoft designed it for automating shopping, travel booking, restaurant reservations, and account workflows through step-by-step browser actions.
The company notes the model can be deployed on-device for privacy and lower latency. For inference, Microsoft requires using a specific ChatML template with a detailed system prompt defining function signatures and interaction guidelines.
What This Means
Fara-7B represents Microsoft's entry into computer use agents at a compact 7B parameter scale, competing with larger models from Anthropic and others in this emerging category. The 2.5-day training time on 64 H100s suggests efficient training methods, though the reliance on fully synthetic data raises questions about real-world performance compared to models trained on human demonstrations. The MIT license and focus on on-device deployment differentiate this from API-only alternatives, though the English-only limitation and lack of disclosed benchmark scores make direct performance comparisons difficult.
Related Articles
DeepReinforce Releases Ornith-1.0, Open-Source Agentic Coding Model in 9B to 397B Sizes
DeepReinforce has released Ornith-1.0, an MIT-licensed model designed for agentic coding tasks with variants ranging from 9B to 397B parameters. Built on top of Apache 2.0-licensed Gemma 4 and Qwen 3.5 base models, the company claims it achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks.
DeepSeek Releases V4 Models: 1M Context Window, 90% Less KV Cache Than V3
DeepSeek has released two new MoE models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated). Both models support a one million token context window and use a hybrid attention architecture that requires only 27% of single-token inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2.
DeepSeek Releases V4-Pro with 1.6T Parameters, 1M Token Context at 27% Inference Cost of V3
DeepSeek has released two Mixture-of-Experts models: V4-Pro with 1.6 trillion parameters (49B activated) and V4-Flash with 284B parameters (13B activated), both supporting 1 million token context windows. V4-Pro requires only 27% of inference FLOPs and 10% of KV cache compared to V3.2 at 1M token context, trained on over 32 trillion tokens.
Anthropic's Fable 5 model expected to return next week after 15-day government shutdown
The Trump administration is close to allowing Anthropic to restore access to its Fable 5 model, which has been offline for 15 days due to national security concerns. Insiders expect restrictions could be lifted as soon as next week, though Pentagon and NSA approval is still required.
Comments
Loading...