Microsoft on Monday introduced Critique, a multi‑model deep research system now integrated into Microsoft 365 Copilot. The system separates generation from evaluation and combines frontier AI models, including Anthropic's Claude and OpenAI's ChatGPT, with the aim of producing more reliable responses.
Why it matters: Instead of relying on a single model, Critique lets one model generate answers while another evaluates them – a "checks and balances" approach intended to produce more reliable, accurate research outputs than a single model working alone.
How Critique Works
The architecture rests on one idea: separating generation from evaluation. One model (or a set of models) generates candidate answers. A different model then evaluates those candidates for accuracy, completeness, and relevance. The best candidate is selected and refined.
This approach targets a core problem in AI‑powered research: single models can hallucinate or miss context. By having a "critic" model check the "generator" model's work, the system is designed to produce outputs that exceed what a single frontier model achieves on its own.
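The generate-then-evaluate loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Microsoft's implementation: the names `critique_pipeline`, `terse_model`, `detailed_model`, and `citation_critic` are hypothetical, the "models" are plain functions standing in for real model calls, and the final refinement step is left out.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interfaces: a generator maps a question to a draft answer,
# and a critic maps (question, draft) to a numeric score (higher is better).
Generator = Callable[[str], str]
Critic = Callable[[str, str], float]


@dataclass
class Review:
    draft: str
    score: float


def critique_pipeline(
    question: str, generators: Sequence[Generator], critic: Critic
) -> Review:
    """Collect one draft per generator, score each with the critic, keep the best."""
    if not generators:
        raise ValueError("at least one generator is required")
    drafts = [generate(question) for generate in generators]
    reviews = [Review(draft, critic(question, draft)) for draft in drafts]
    # On a tie, max() keeps the earliest draft.
    return max(reviews, key=lambda review: review.score)


# Stand-in "models" for illustration only; a real system would call model APIs.
def terse_model(question: str) -> str:
    return "Paris."


def detailed_model(question: str) -> str:
    return "Paris is the capital of France [source: example.org]."


def citation_critic(question: str, draft: str) -> float:
    """Toy critic: starts at 0.5 and adds 0.5 when the draft cites a source."""
    score = 0.5
    if "[source:" in draft:
        score += 0.5
    return score


best = critique_pipeline(
    "What is the capital of France?",
    [terse_model, detailed_model],
    citation_critic,
)
print(best.draft)
```

The point of the design is that the critic is a separate function from every generator, so the selection step never depends on a model grading its own answer.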
Why Microsoft Chose a Multi‑Model Approach
Microsoft has been quietly testing this architecture for months. Internal benchmarks reportedly show that combining Claude's reasoning strength with ChatGPT's broad knowledge base produces significantly better research outputs than either model alone.
The system is now the default in Copilot's Researcher feature, which helps users compile complex research reports, analyse documents, and synthesise information from multiple sources.
What This Means for Users
For business and enterprise users of Microsoft 365 Copilot, Critique means:
- More reliable research outputs – fewer hallucinations and factual errors
- Deeper analysis – the system can handle complex, multi‑step research tasks
- Better citations – the critic model verifies source accuracy
- No extra cost – included with existing Copilot subscriptions
The Bigger Picture: Multi‑Model AI Is the Next Frontier
Microsoft's move reflects a growing trend in enterprise AI: using multiple models together rather than betting on one. Each frontier model has distinct strengths:
- Claude – excels at reasoning, long‑context understanding, and nuanced analysis
- ChatGPT – broad knowledge base, strong in structured outputs and code
- Gemini – real‑time web access and Google ecosystem integration
By combining them, systems like Critique can get the best of all worlds. Expect to see similar multi‑model architectures from Google, Anthropic, and other players soon.
Critique is already rolling out to Microsoft 365 Copilot users as the default for the Researcher feature. No additional setup or configuration is required – it works automatically in the background.