Alibaba Qwen Releases 35B Language World Model for Agent Environment Simulation Across 7 Domains
Alibaba's Qwen team released Qwen-AgentWorld-35B-A3B, a 35-billion-parameter language world model designed for agentic environment simulation. The single model covers seven domains (MCP tool calling, Search, Terminal, Software Engineering, Android, Web, and OS) and has a 262,144-token context window.
Architecture and Training
Qwen-AgentWorld-35B-A3B is built on Qwen3.5-35B-A3B-Base with 35 billion total parameters and 3 billion activated parameters using a Mixture of Experts (MoE) architecture. The model employs 256 experts with 8 activated experts plus 1 shared expert per layer.
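As a rough illustration of what "8 activated experts plus 1 shared expert" means per token, the sketch below implements generic top-k expert routing in plain Python. The article does not describe Qwen's actual router, so the softmax-over-selected-logits weighting, the function name `route_token`, and the random logits are assumptions made purely for illustration.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts per layer (figure from the article)
TOP_K = 8          # routed experts activated per token (figure from the article)


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and normalize their weights.

    Hypothetical helper: generic top-k routing, not Qwen's confirmed router.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Softmax over the selected logits only (one common convention, assumed here).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
routing = route_token(logits)

# The shared expert runs for every token, so 8 routed + 1 shared are evaluated.
experts_evaluated = len(routing) + 1
```

Only the selected experts' feed-forward blocks run for a given token, which is why the activated parameter count (3 billion) is far smaller than the total (35 billion).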
According to Qwen, the model was trained through a three-stage pipeline: continual pre-training (CPT) to inject environment knowledge, supervised fine-tuning (SFT) to activate next-state-prediction reasoning, and reinforcement learning (RL) using GSPO to improve simulation fidelity. The team claims this makes it a "native world model" where environment modeling is the training objective from CPT onward, not a post-hoc adaptation.
The architecture uses 40 layers with a hidden dimension of 2048 and combines Gated DeltaNet and Gated Attention mechanisms. The model supports rotary position embeddings with dimension 64.
Benchmark Performance
On AgentWorldBench, Qwen-AgentWorld-35B-A3B achieved an overall score of 56.39 across seven domains, evaluated on five dimensions: Format, Factuality, Consistency, Realism, and Quality. By the figures reported here, that places it slightly below both GPT-4o (58.25) and Claude Opus 4.6 (57.80) in overall performance.
The model's strongest performance was in the MCP domain (64.79) and Software Engineering (65.63), while Search (36.69) and Web (49.55) showed lower scores. According to Qwen, the model demonstrates zero-shot generalization to out-of-domain environments and supports controllable perturbations.
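The reported figures can be compared directly. The short sketch below only restates the scores quoted in this article and computes the gaps between them; the dictionary and variable names are illustrative, and the three domains without a quoted score (Terminal, Android, OS) are left out rather than guessed.

```python
# Overall AgentWorldBench scores as quoted in the article.
overall = {
    "Qwen-AgentWorld-35B-A3B": 56.39,
    "GPT-4o": 58.25,
    "Claude Opus 4.6": 57.80,
}

# Per-domain scores for Qwen-AgentWorld-35B-A3B (only the four quoted domains).
domain = {
    "MCP": 64.79,
    "Software Engineering": 65.63,
    "Search": 36.69,
    "Web": 49.55,
}

qwen = overall["Qwen-AgentWorld-35B-A3B"]
ranked = sorted(overall, key=overall.get, reverse=True)

# Gaps in benchmark points, rounded to avoid floating-point noise.
gap_to_gpt4o = round(overall["GPT-4o"] - qwen, 2)
gap_to_opus = round(overall["Claude Opus 4.6"] - qwen, 2)

best_domain = max(domain, key=domain.get)
worst_domain = min(domain, key=domain.get)
domain_spread = round(domain[best_domain] - domain[worst_domain], 2)
```

On these numbers the gap to the two larger models is under two points overall, while the spread between the strongest and weakest quoted domains is far wider.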
Deployment and Usage
The model is available on Hugging Face and compatible with vLLM, SGLang, and Transformers. Qwen recommends running with a minimum context length of 128K tokens despite the 262K maximum, as the model leverages extended context for multi-turn environment simulation.
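To make the context figures concrete, the sketch below works out how many tokens remain for the simulated trajectory once room is reserved for the model's output. It assumes "128K" means 131,072 tokens (128 × 1024) and uses a 32,768-token output reservation by default, the output length the article cites for most queries. The helper name `prompt_budget` is hypothetical.

```python
MAX_CONTEXT = 262_144      # maximum context window from the article (256 * 1024)
RECOMMENDED_MIN = 131_072  # "128K", assuming K = 1024


def prompt_budget(context_len, reserved_output=32_768):
    """Tokens left for the multi-turn history after reserving output room.

    Hypothetical helper for capacity planning; not part of any Qwen tooling.
    """
    if not RECOMMENDED_MIN <= context_len <= MAX_CONTEXT:
        raise ValueError("context_len outside the 128K to 262,144 token range")
    if not 0 <= reserved_output < context_len:
        raise ValueError("reserved_output must fit inside the context window")
    return context_len - reserved_output


budget_at_min = prompt_budget(RECOMMENDED_MIN)  # 98,304 tokens of history
budget_at_max = prompt_budget(MAX_CONTEXT)      # 229,376 tokens of history
```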
Recommended inference parameters are temperature=0.6, top_p=0.95, and top_k=20, with an output length of 32,768 tokens for most queries. The model uses a thinking mode by default (enclosed in <think>...</think> tags) to reason about environment state transitions before producing predictions.
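A minimal sketch of how a client might carry these settings and separate the reasoning from the predicted environment state is shown below. The `max_tokens` key follows the common OpenAI-style naming and is an assumption, as are the helper name `split_thinking` and the sample output string; none of them come from Qwen's documentation.

```python
import re

# Recommended sampling settings from the article. The "max_tokens" key name
# is an assumption (OpenAI-style); check the serving framework's own naming.
SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "max_tokens": 32_768,
}

# Non-greedy match so only the first <think>...</think> span is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (reasoning, prediction) from raw model output.

    Hypothetical helper: if no <think> block is present, the reasoning is
    empty and the whole text is treated as the prediction.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    prediction = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, prediction


# Made-up output for illustration only.
sample = "<think>The ls command lists the directory.</think>\nREADME.md  src"
reasoning, prediction = split_thinking(sample)
```

Keeping the reasoning separate lets an agent harness log it for debugging while passing only the predicted state back to the agent.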
Pricing information has not been disclosed. The model requires tensor parallelism across 4 GPUs for deployment.
What This Means
Qwen-AgentWorld represents a shift toward specialized models trained explicitly for agent simulation rather than repurposing general-purpose language models. The unified seven-domain approach in a single 35B parameter model suggests environment simulation may not require frontier-scale models, though performance still trails GPT-4o. The MoE architecture with only 3B activated parameters makes deployment more efficient than dense models of similar total size, potentially enabling faster local agent development workflows. Whether language world models become standard infrastructure for agent development will depend on whether simulated environments adequately replace real environment testing.