Microsoft Releases Fara-7B: 7B Parameter Computer Use Agent Trained in 2.5 Days on 64 H100s
Microsoft Research has released Fara-7B, a 7-billion parameter small language model designed for computer automation tasks. The model, which took 2.5 days to train on 64 H100 GPUs, can navigate websites to complete tasks like booking restaurants and shopping, using screenshots as input with a 128K token context window.
Microsoft Releases Fara-7B Computer Use Agent
Microsoft Research has released Fara-7B, a 7-billion parameter multimodal model specialized for automating web-based tasks. The model was trained between October 26-29, 2025, requiring 2.5 days on 64 H100 GPUs.
Model Specifications
Fara-7B is built on Qwen 2.5-VL (7B) and processes screenshots alongside text inputs to predict actions for computer use tasks. The model supports a 128K token context window and outputs chain-of-thought reasoning followed by structured tool calls indicating specific actions.
Inputs include user goals in text, current browser screenshots, and history of previous actions. The model directly predicts grounded actions with arguments, such as pixel coordinates for mouse clicks.
Training and Data
According to Microsoft, the model was trained on a "large-scale, fully synthetic dataset of action trajectories generated and verified by a multi-agent pipeline." The company has not disclosed specific dataset sizes or compositions. Training used both public and private data sources.
The model is English-only and follows a decoder-only architecture adapted for multimodal inputs.
Safety and Limitations
Microsoft claims Fara-7B incorporates "critical point recognition" to halt execution when tasks require user permission or sensitive information. The model is trained to refuse eight categories of harmful tasks, including illegal activities, deceptive tasks, harassment, and misinformation.
Critical points where the model stops include entering personal information, completing purchases, making phone calls, sending emails, and signing into accounts. The model underwent automated red teaming for grounding, jailbreaks, and copyright violations, though specific evaluation results were not disclosed.
Availability and Use Cases
The model is available under an MIT license on Hugging Face and Azure AI Foundry. Microsoft designed it for automating shopping, travel booking, restaurant reservations, and account workflows through step-by-step browser actions.
The company notes the model can be deployed on-device for privacy and lower latency. For inference, Microsoft requires using a specific ChatML template with a detailed system prompt defining function signatures and interaction guidelines.
What This Means
Fara-7B represents Microsoft's entry into computer use agents at a compact 7B parameter scale, competing with larger models from Anthropic and others in this emerging category. The 2.5-day training time on 64 H100s suggests efficient training methods, though the reliance on fully synthetic data raises questions about real-world performance compared to models trained on human demonstrations. The MIT license and focus on on-device deployment differentiate this from API-only alternatives, though the English-only limitation and lack of disclosed benchmark scores make direct performance comparisons difficult.
Related Articles
Microsoft Edge adds Copilot feature to analyze content across all open browser tabs
Microsoft is updating Edge to let Copilot read and analyze content across all open browser tabs simultaneously. The update includes AI-generated podcasts from tabs, study mode with quizzes, and long-term conversation memory.
Microsoft Cancels Claude Code Licenses, Pushes Developers to GitHub Copilot CLI
Microsoft is removing Claude Code access from its Experiences + Devices division by June 30, 2026, redirecting thousands of engineers to GitHub Copilot CLI instead. The decision follows six months of Claude Code proving more popular than Microsoft's own coding tool among internal developers.
Microsoft Edge mobile adds multi-tab summarization, podcast generation, and browsing history recall via Copilot
Microsoft Edge mobile version 148 and higher integrates six AI-powered features from its desktop version, including the ability to summarize multiple tabs simultaneously, generate podcasts from web pages, and recall browsing history for continued conversations. The update also adds a Journeys feature that tracks research topics and a Study and Learn mode for interactive quizzes.
Baidu Releases Qianfan-OCR-Fast Model with 66K Context at $0.68 Per 1M Input Tokens
Baidu has released Qianfan-OCR-Fast, a multimodal model specialized for optical character recognition tasks. The model offers a 66,000 token context window and is priced at $0.68 per 1M input tokens and $2.81 per 1M output tokens.
Comments
Loading...