Microsoft on Monday introduced Critique, a multi‑model deep research system now integrated into Microsoft 365 Copilot. The system separates generation from evaluation and combines frontier AI models, including Anthropic's Claude and OpenAI's GPT models (the family behind ChatGPT), so that one model's output is checked by another.

Why it matters: Instead of relying on a single model, Critique lets one model generate answers while another evaluates them – a "checks and balances" approach intended to yield more reliable and accurate research outputs than a single model working alone.

At a glance:

  • 2+ – AI models working together
  • GPT-5.4 – OpenAI's latest model
  • Claude Opus – Anthropic's flagship
  • Default – in Copilot Researcher

How Critique Works

The architecture rests on one idea: separating generation from evaluation. One model (or a set of models) generates candidate answers. A different model then evaluates those candidates for accuracy, completeness, and relevance, and the best one is selected and refined.
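The generate‑then‑evaluate split can be sketched in a few lines of Python. This is a minimal illustration only, not Microsoft's implementation: `select_best`, the two toy generators, and the word‑overlap `toy_critic` are hypothetical stand‑ins for real model calls, which the announcement does not describe.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases: a generator maps a query to an answer,
# and a critic maps (query, answer) to a score where higher is better.
Generator = Callable[[str], str]
Critic = Callable[[str, str], float]


def select_best(query: str, generators: List[Generator], critic: Critic) -> Tuple[str, float]:
    """Collect one candidate per generator, score each with the critic, keep the best."""
    if not generators:
        raise ValueError("at least one generator is required")
    candidates = [generate(query) for generate in generators]
    scored = [(answer, critic(query, answer)) for answer in candidates]
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins for two different models (assumptions for illustration).
def generator_a(query: str) -> str:
    return "Revenue rose."


def generator_b(query: str) -> str:
    return "Quarterly revenue trends were flat."


def toy_critic(query: str, answer: str) -> float:
    """Score an answer by the fraction of query words it mentions."""
    words = set(query.lower().split())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in answer.lower())
    return hits / len(words)


best, score = select_best("quarterly revenue trends", [generator_a, generator_b], toy_critic)
print(best, score)
```

The point of the design is that the component scoring the answers is not the one that wrote them, so a weak candidate from one generator can be outvoted by a stronger one from another.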

This approach targets a core problem in AI‑powered research: a single model can hallucinate or miss context, and it has no independent check on its own output. By having a "critic" model review the "generator" model's work, the system aims to produce outputs that are more reliable than those of any single frontier model.

Why Microsoft Chose a Multi‑Model Approach

Microsoft has been quietly testing this architecture for months. Internal benchmarks reportedly show that combining Claude's reasoning strength with GPT's broad knowledge base produces significantly better research outputs than either model alone.

The system is now the default in Copilot's Researcher feature, which helps users compile complex research reports, analyse documents, and synthesise information from multiple sources.

🔍 The Tech Behind It: Critique uses a "generator‑critic" loop. The generator (often a combination of Claude and GPT) produces multiple candidate answers. A critic model evaluates each candidate against the original query, then either selects the best or sends feedback for another round. This iterative process continues until the critic is satisfied or a time limit is reached.
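The loop described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the production system: `critique_loop`, the `Review` record, the 0.9 threshold, the 5‑second limit, and the toy `toy_generate` and `toy_review` functions are all hypothetical, chosen only to show how feedback and a time limit might interact.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Review:
    score: float    # 0.0 (poor) to 1.0 (fully satisfactory)
    feedback: str   # guidance handed to the generator for the next round


def critique_loop(
    query: str,
    generate: Callable[[str, str, int], str],
    review: Callable[[str, str], Review],
    threshold: float = 0.9,         # assumed "critic is satisfied" cutoff
    time_limit_s: float = 5.0,      # assumed time budget
    candidates_per_round: int = 2,
) -> Tuple[Optional[str], float, int]:
    """Generate candidates, let the critic score them, and retry with feedback."""
    deadline = time.monotonic() + time_limit_s
    best_answer: Optional[str] = None
    best_review = Review(score=float("-inf"), feedback="")
    rounds = 0
    while True:
        rounds += 1
        feedback = best_review.feedback
        for index in range(candidates_per_round):
            answer = generate(query, feedback, index)
            result = review(query, answer)
            if result.score > best_review.score:
                best_answer, best_review = answer, result
        # Stop when the critic is satisfied or the time budget is spent.
        if best_review.score >= threshold or time.monotonic() >= deadline:
            return best_answer, best_review.score, rounds


# Toy stand-ins for the generator and critic models (assumptions).
def toy_generate(query: str, feedback: str, index: int) -> str:
    draft = f"Draft {index} on {query}"
    return draft + " [with sources]" if "sources" in feedback else draft


def toy_review(query: str, answer: str) -> Review:
    if "[with sources]" in answer:
        return Review(score=1.0, feedback="")
    return Review(score=0.4, feedback="add sources")


answer, score, rounds = critique_loop("multi-model research", toy_generate, toy_review)
print(answer, score, rounds)
```

In this toy run the first round is rejected for lacking sources, the feedback is passed back to the generator, and the second round clears the threshold. With a time limit of zero the loop would instead return the best candidate from the first round.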

What This Means for Users

For business and enterprise users of Microsoft 365 Copilot, Critique means:

  • More reliable research outputs – fewer hallucinations and factual errors
  • Deeper analysis – the system can handle complex, multi‑step research tasks
  • Better citations – the critic model verifies source accuracy
  • No extra cost – included with existing Copilot subscriptions

The Bigger Picture: Multi‑Model AI Is the Next Frontier

Microsoft's move reflects a growing trend in enterprise AI: using multiple models together rather than betting on one. Each frontier model has distinct strengths:

  • Claude – excels at reasoning, long‑context understanding, and nuanced analysis
  • GPT (OpenAI, the models behind ChatGPT) – broad knowledge base, strong in structured outputs and code
  • Gemini – real‑time web access and Google ecosystem integration

By combining them, systems like Critique can get the best of all worlds. Expect to see similar multi‑model architectures from Google, Anthropic, and other players soon.

✅ Availability
Critique is already rolling out to Microsoft 365 Copilot users as the default for the Researcher feature. No additional setup or configuration is required – it works automatically in the background.

Frequently Asked Questions