Microsoft Releases Fara-7B: 7B Parameter Computer Use Agent Trained in 2.5 Days on 64 H100s

TL;DR

Microsoft Research has released Fara-7B, a 7-billion parameter small language model designed for computer automation tasks. The model, which took 2.5 days to train on 64 H100 GPUs, can navigate websites to complete tasks like booking restaurants and shopping, using screenshots as input with a 128K token context window.

May 15, 2026 · 11:20 PM2 min read

Fara-7B — Quick Specs

Context window128K tokens

Compare Fara-7B with other models →

Microsoft Releases Fara-7B Computer Use Agent

Microsoft Research has released Fara-7B, a 7-billion parameter multimodal model specialized for automating web-based tasks. The model was trained between October 26-29, 2025, requiring 2.5 days on 64 H100 GPUs.

Model Specifications

Fara-7B is built on Qwen 2.5-VL (7B) and processes screenshots alongside text inputs to predict actions for computer use tasks. The model supports a 128K token context window and outputs chain-of-thought reasoning followed by structured tool calls indicating specific actions.

Inputs include user goals in text, current browser screenshots, and history of previous actions. The model directly predicts grounded actions with arguments, such as pixel coordinates for mouse clicks.

Training and Data

According to Microsoft, the model was trained on a "large-scale, fully synthetic dataset of action trajectories generated and verified by a multi-agent pipeline." The company has not disclosed specific dataset sizes or compositions. Training used both public and private data sources.

The model is English-only and follows a decoder-only architecture adapted for multimodal inputs.

Safety and Limitations

Microsoft claims Fara-7B incorporates "critical point recognition" to halt execution when tasks require user permission or sensitive information. The model is trained to refuse eight categories of harmful tasks, including illegal activities, deceptive tasks, harassment, and misinformation.

Critical points where the model stops include entering personal information, completing purchases, making phone calls, sending emails, and signing into accounts. The model underwent automated red teaming for grounding, jailbreaks, and copyright violations, though specific evaluation results were not disclosed.

Availability and Use Cases

The model is available under an MIT license on Hugging Face and Azure AI Foundry. Microsoft designed it for automating shopping, travel booking, restaurant reservations, and account workflows through step-by-step browser actions.

The company notes the model can be deployed on-device for privacy and lower latency. For inference, Microsoft requires using a specific ChatML template with a detailed system prompt defining function signatures and interaction guidelines.

What This Means

Fara-7B represents Microsoft's entry into computer use agents at a compact 7B parameter scale, competing with larger models from Anthropic and others in this emerging category. The 2.5-day training time on 64 H100s suggests efficient training methods, though the reliance on fully synthetic data raises questions about real-world performance compared to models trained on human demonstrations. The MIT license and focus on on-device deployment differentiate this from API-only alternatives, though the English-only limitation and lack of disclosed benchmark scores make direct performance comparisons difficult.

Source: huggingface.co ↗

Fara-7B Microsoft computer use agentic AI Qwen multimodal web automation synthetic data

model releaseJune 29, 2026

DeepReinforce Releases Ornith-1.0, Open-Source Agentic Coding Model in 9B to 397B Sizes

DeepReinforce has released Ornith-1.0, an MIT-licensed model designed for agentic coding tasks with variants ranging from 9B to 397B parameters. Built on top of Apache 2.0-licensed Gemma 4 and Qwen 3.5 base models, the company claims it achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks.

model releaseJune 29, 2026

DeepSeek Releases V4 Models: 1M Context Window, 90% Less KV Cache Than V3

DeepSeek has released two new MoE models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated). Both models support a one million token context window and use a hybrid attention architecture that requires only 27% of single-token inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2.

model releaseJune 27, 2026

DeepSeek Releases V4-Pro with 1.6T Parameters, 1M Token Context at 27% Inference Cost of V3

DeepSeek has released two Mixture-of-Experts models: V4-Pro with 1.6 trillion parameters (49B activated) and V4-Flash with 284B parameters (13B activated), both supporting 1 million token context windows. V4-Pro requires only 27% of inference FLOPs and 10% of KV cache compared to V3.2 at 1M token context, trained on over 32 trillion tokens.