product update

Microsoft expands Copilot Cowork with AI model critique feature and cross-model comparison

TL;DR

Microsoft is expanding Copilot Cowork availability and introducing a Critique function that enables one AI model to review another's output. The update also includes a new Researcher agent claiming best-in-class deep research performance, outperforming Perplexity by 7 points, and a Model Council feature for direct model comparison.

2 min read

Microsoft Expands Copilot Cowork With AI Models Reviewing Each Other's Work

Microsoft is broadening access to Copilot Cowork and introducing a new Critique function that lets AI models evaluate each other's outputs, part of Wave 3 of Microsoft 365 Copilot.

The expanded Copilot Cowork feature builds on the previously announced Claude Cowork capability, enabling the system to handle multi-step tasks with tools, access and output files, plan calendars, and deliver daily briefings. The feature is now available through Microsoft's Frontier program.

AI Models Checking Each Other's Work

The new Critique function represents a shift toward ensemble model validation. In this workflow, one AI model generates a draft response while a second model reviews and critiques the output. Microsoft draws from both Anthropic and OpenAI models for this capability, allowing different model combinations to work in tandem.

This approach addresses a persistent challenge in AI deployment: a single model can propagate its own errors or miss nuances without external validation. By enabling cross-model review, Microsoft is attempting to improve output quality through a form of model consensus.
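The draft-then-critique loop described above can be sketched in a few lines. This is a minimal illustration of the pattern, not Microsoft's implementation: the model calls are stubbed stand-ins, and the function names (`draft_model`, `critique_model`, `draft_with_critique`) and the citation check are hypothetical.

```python
def draft_model(prompt: str) -> str:
    """Stand-in for the drafting model (e.g. an OpenAI model)."""
    return f"Draft answer to: {prompt}"

def critique_model(prompt: str, draft: str) -> dict:
    """Stand-in for the reviewing model (e.g. an Anthropic model).
    Returns an approval verdict plus a list of issues found."""
    issues = []
    if "source" not in draft.lower():  # hypothetical check: drafts must cite sources
        issues.append("No sources cited.")
    return {"approved": not issues, "issues": issues}

def draft_with_critique(prompt: str, max_rounds: int = 2) -> str:
    """Generate a draft, have a second model critique it, and revise
    until the critic approves or the round budget is exhausted."""
    draft = draft_model(prompt)
    for _ in range(max_rounds):
        review = critique_model(prompt, draft)
        if review["approved"]:
            break
        # Feed the critic's objections back into the drafting model.
        feedback = "; ".join(review["issues"])
        draft = draft_model(f"{prompt}\nAddress: {feedback}\nsource: ...")
    return draft
```

With real model endpoints swapped in for the stubs, the same loop structure applies: the drafter and critic can be different vendors' models, which is what gives the review independent signal.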

Researcher Agent Performance Claims

Microsoft also introduced a new Researcher tool that incorporates the Critique function and, the company claims, achieves "best-in-class deep research performance." According to Microsoft's benchmark, the Researcher agent outperforms Perplexity with Claude Opus 4.6 by 7 points.

However, the benchmark notably excludes comparison with OpenAI's GPT-5-based Deep Research, limiting assessment of competitive positioning in this capability area.

Model Council for Side-by-Side Comparison

A new Model Council feature allows users to compare answers from different AI models simultaneously, displaying where models agree or diverge. This provides transparency into model behavior and reasoning differences, helping users identify which model performs better for specific tasks.

The feature addresses a practical pain point for organizations deploying multiple models: without direct comparison tools, determining model strengths for different use cases requires manual testing.
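The comparison workflow Model Council automates can be sketched as follows. This is an illustrative stand-in, not Microsoft's product: the model names and callables are stubs, and `model_council` is a hypothetical helper showing the agree/diverge bookkeeping.

```python
from collections import Counter

def model_council(prompt: str, models: dict) -> dict:
    """Run every model on the same prompt, tally identical answers,
    and surface the majority answer plus any dissenting models."""
    answers = {name: fn(prompt) for name, fn in models.items()}
    tally = Counter(answers.values())
    majority, count = tally.most_common(1)[0]
    dissenters = [name for name, ans in answers.items() if ans != majority]
    return {
        "answers": answers,          # each model's raw answer
        "majority": majority,        # most common answer
        "consensus": count == len(models),  # did every model agree?
        "dissenters": dissenters,    # models that diverged
    }

# Usage with stubbed models standing in for real endpoints:
council = model_council(
    "2 + 2?",
    {
        "model-a": lambda p: "4",
        "model-b": lambda p: "4",
        "model-c": lambda p: "5",
    },
)
```

Exact-string matching is the simplest possible agreement test; a production version would need semantic comparison, but the divergence report is the part that saves the manual testing described above.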

What This Means

Microsoft's emphasis on AI-to-AI validation and explicit model comparison reflects industry movement toward collaborative and competitive model architectures. Rather than optimizing single models in isolation, these updates suggest a strategy of leveraging multiple models as checks on each other—reducing hallucination, improving reasoning accuracy, and giving enterprise users visible control over model selection.

The Critique function's reliance on both Anthropic and OpenAI models demonstrates Microsoft's hedging strategy in the multi-model ecosystem. However, the absence of OpenAI's latest deep research tool from the benchmarks raises questions about how these capabilities stack up against competitors' newest offerings. The narrow claim (a 7-point margin over one competing tool) suggests a marginal rather than substantial advantage.

Related Articles

product update

Microsoft Cancels Claude Code Licenses, Pushes Developers to GitHub Copilot CLI

Microsoft is removing Claude Code access from its Experiences + Devices division by June 30, 2026, redirecting thousands of engineers to GitHub Copilot CLI instead. The decision follows six months of Claude Code proving more popular than Microsoft's own coding tool among internal developers.

product update

OpenAI brings Codex coding agent to iOS and Android with remote environment monitoring

OpenAI has integrated its Codex coding agent into the ChatGPT mobile app for iOS and Android, allowing developers to monitor live development environments and manage workflows from their phones. The update, announced May 14, 2026, is now available in preview across all ChatGPT plans.

product update

OpenAI adds remote Codex control to ChatGPT mobile apps for iOS and Android

OpenAI has integrated remote Codex control into the ChatGPT mobile apps for iPhone and Android. Users can now approve tasks, review outputs, and manage Codex running on Mac computers, laptops, or remote environments directly from their smartphones.

product update

Google names upcoming Gemini AI agent 'Spark,' adds autonomous task execution to mobile app

Google is preparing to launch Gemini Spark, an autonomous AI agent that will operate within the Gemini mobile app. According to code found in Google app beta version 17.23, Spark can access connected apps, personal data, and location to execute tasks like managing inboxes and scheduling meetings, though Google warns it may occasionally act without permission.
