Microsoft expands Copilot Cowork with AI model critique feature and cross-model comparison
Microsoft is expanding Copilot Cowork availability and introducing a Critique function that enables one AI model to review another's output. The update also includes a new Researcher agent that Microsoft claims delivers best-in-class deep research performance, outperforming Perplexity by 7 points, and a Model Council feature for direct model comparison.
Microsoft Expands Copilot Cowork With AI Models Reviewing Each Other's Work
Microsoft is broadening access to Copilot Cowork and introducing a new Critique function that lets AI models evaluate each other's outputs, part of Wave 3 of Microsoft 365 Copilot.
The expanded Copilot Cowork feature builds on the previously announced Claude Cowork capability, enabling the system to handle multi-step tasks using tools, access and output files, plan calendars, and deliver daily briefings. The feature is now available through Microsoft's Frontier program.
AI Models Checking Each Other's Work
The new Critique function represents a shift toward ensemble model validation. In this workflow, one AI model generates a draft response while a second model reviews and critiques the output. Microsoft draws from both Anthropic and OpenAI models for this capability, allowing different model combinations to work in tandem.
This approach addresses a persistent challenge in AI deployment: single models can propagate errors or miss nuances without external validation. By enabling cross-model review, Microsoft is attempting to improve output quality through algorithmic consensus.
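The draft-and-critique pattern described above can be sketched generically. The `call_model` function below is a stub standing in for real provider API calls (Anthropic, OpenAI, or others); the function names, prompts, and canned responses are illustrative assumptions, not Microsoft's implementation.

```python
# Minimal sketch of a two-model draft-and-critique loop.
# call_model is a placeholder for a real provider API call;
# here it returns canned text purely for illustration.

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a provider API call (hypothetical)."""
    canned = {
        "drafter": "Draft: quarterly revenue grew 12% year over year.",
        "critic": "Critique: cite the source for the 12% figure.",
    }
    return canned[model]

def draft_and_critique(task: str) -> dict:
    # Step 1: one model produces an initial draft.
    draft = call_model("drafter", task)
    # Step 2: a second model reviews that draft and flags issues.
    critique = call_model("critic", f"Review this draft:\n{draft}")
    # A production system would feed the critique back for revision;
    # this sketch just returns both artifacts.
    return {"draft": draft, "critique": critique}

result = draft_and_critique("Summarize Q3 financials.")
print(result["critique"])
```

The key design point is that the reviewing model sees only the draft, not the drafter's internal reasoning, so its critique acts as an independent check.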
Researcher Agent Performance Claims
Microsoft introduced a new Researcher tool featuring the Critique function and claims it achieves "best-in-class deep research performance." According to Microsoft's benchmark, the Researcher agent outperforms Perplexity with Claude Opus 4.6 by 7 points.
However, the benchmark notably excludes comparison with OpenAI's GPT-5-based Deep Research, limiting assessment of competitive positioning in this capability area.
Model Council for Side-by-Side Comparison
A new Model Council feature allows users to compare answers from different AI models simultaneously, displaying where models agree or diverge. This provides transparency into model behavior and reasoning differences, helping users identify which model performs better for specific tasks.
The feature addresses a practical pain point for organizations deploying multiple models: without direct comparison tools, determining model strengths for different use cases requires manual testing.
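At its core, a council-style comparison reduces to collecting answers from several models and reporting where they agree or diverge. The sketch below uses stubbed model outputs; the model names and helper function are hypothetical, not the Copilot API.

```python
from collections import Counter

# Stubbed answers from three hypothetical models to the same question.
answers = {
    "model_a": "Paris",
    "model_b": "Paris",
    "model_c": "Lyon",
}

def council_summary(answers: dict) -> dict:
    """Group models by answer and report consensus vs. divergence."""
    tally = Counter(answers.values())
    majority, votes = tally.most_common(1)[0]
    dissenters = [m for m, a in answers.items() if a != majority]
    return {"majority": majority, "votes": votes, "dissenters": dissenters}

summary = council_summary(answers)
print(summary)  # {'majority': 'Paris', 'votes': 2, 'dissenters': ['model_c']}
```

Real answers are rarely identical strings, so a production version would compare semantically (for example, via embedding similarity) rather than by exact match.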
What This Means
Microsoft's emphasis on AI-to-AI validation and explicit model comparison reflects industry movement toward collaborative and competitive model architectures. Rather than optimizing single models in isolation, these updates suggest a strategy of leveraging multiple models as checks on each other: reducing hallucination, improving reasoning accuracy, and giving enterprise users visible control over model selection.
The Critique function's reliance on both Anthropic and OpenAI models demonstrates Microsoft's hedging strategy in the multi-model ecosystem. However, the absence of OpenAI's latest deep research tool from the benchmarks raises questions about how these capabilities stack up against competitors' newest offerings, and the narrow claim (a 7-point margin over one competing tool) suggests a marginal rather than substantial advantage.
Related Articles
Microsoft Copilot Researcher adds multi-model features using GPT and Claude
Microsoft has enabled its Copilot Researcher tool to simultaneously leverage OpenAI's GPT and Anthropic's Claude through two new features: Critique, which uses GPT responses refined by Claude, and Model Council, which displays side-by-side outputs with agreement/disagreement analysis. Both features are rolling out in the Microsoft 365 Copilot Frontier early access program.
OpenAI adds plugins to Codex to compete with Claude Code's workflow automation
OpenAI is introducing plugins for Codex that bundle skills, integrations, and connectors into shareable workflow packages. The move directly addresses Claude Code's lead among developers and positions Codex beyond coding into broader agentic work platforms. Over 20 plugins are currently available, including integrations with Figma, Notion, Gmail, Slack, and Google Drive.
Microsoft releases Harrier embedding models with 32K token context, tops multilingual benchmark
Microsoft has released Harrier-OSS-v1, a family of multilingual text embedding models trained with contrastive learning and knowledge distillation. The 0.6B parameter variant achieves a 69.0 score on the Multilingual MTEB v2 benchmark with support for 32,768 token context windows and 45+ languages.
OpenAI shuts down Sora and indefinitely pauses ChatGPT adult mode in March purge
OpenAI shut down two projects in March 2026: the Sora AI video app (launched September 2025, operational for six months) and indefinitely paused the planned ChatGPT adult mode. The company cited sexual dataset management and illegal content elimination as barriers to the adult feature launch.