Microsoft Made GPT and Claude Work Together—And the Result Beats Every AI Research Tool Out There

In brief

  • Microsoft released two different modes that pair GPT and Claude to increase the quality of AI research.
  • Critique makes the models collaborate, whereas Council makes them work in parallel while a third model acts as judge and flags the discrepancies.
  • This two-model workflow targets hallucinations, weak citations, and other problems associated with single-model AI research.

Deep research AI has been one of the hottest arms races in tech this year. Google announced its research agent for Gemini in December 2024, OpenAI released its own research agent in February 2025, xAI followed suit, Perplexity doubled down, and Anthropic introduced a research agent for Claude in April 2025, building a loyal following among professionals who need detailed, cited answers.

Every company has been trying to convince you that their single AI model is the smartest researcher in the room. Microsoft just said: Why pick one?

The company announced two new features on Monday for Copilot’s Researcher tool, called Critique and Council, that put OpenAI’s GPT and Anthropic’s Claude to work on the same research task. With Critique, according to Microsoft’s testing against an industry benchmark, the result scores higher than every other system included in that test, including models from the top AI companies.

“Critique is a new multi model deep research system designed for complex research tasks. It separates generation from evaluation and utilizes a combination of models from Frontier labs, including Anthropic and OpenAI,” Microsoft explains. “One model leads the generation phase, planning the task, iterating through retrieval, and producing an initial draft, while a second model focuses on review and refinement, acting as an expert reviewer before the final report is produced.”

Here’s the basic problem Critique is designed to fix: Every AI research tool today works the same way. You ask a question, one model plans a search, scours sources, writes a report, and hands it back to you. That single model is doing everything with no one checking its work.

As a result, hallucinations, citation errors, and fabricated or inaccurate claims can slip into the final report.

Critique breaks that workflow in two. GPT handles the first phase—it plans the research, pulls sources, and writes an initial draft. Then Claude steps in as a strict editor, reviewing the report for factual accuracy, citation quality, and whether the answer actually addressed what was asked. Only after that review does the final report reach the user. Microsoft says the roles can eventually run in the opposite direction too, with Claude drafting and GPT critiquing, though for now GPT goes first.
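
The draft-then-review split described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not Microsoft's implementation: `run_critique`, `CritiqueResult`, and the stub models are hypothetical names, and a "model" is assumed to be any function that takes a prompt and returns text.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any function mapping a prompt to text.
Model = Callable[[str], str]


@dataclass
class CritiqueResult:
    draft: str   # output of the generation phase
    final: str   # output after the review phase


def run_critique(question: str, generator: Model, reviewer: Model) -> CritiqueResult:
    """Generate a draft with one model, then have a second model review it."""
    draft = generator(f"Research and draft a cited report answering: {question}")
    final = reviewer(
        "Review this report for factual accuracy, citation quality, and "
        f"whether it answers the question.\nQuestion: {question}\nDraft: {draft}"
    )
    return CritiqueResult(draft=draft, final=final)


# Stand-in models so the sketch runs without any API access.
def stub_generator(prompt: str) -> str:
    return "DRAFT: example findings [source 1]"


def stub_reviewer(prompt: str) -> str:
    # Take the text after "Draft: " and mark it as reviewed.
    draft = prompt.partition("Draft: ")[2]
    return draft.replace("DRAFT:", "REVIEWED:")


result = run_critique("What is DRACO?", stub_generator, stub_reviewer)
print(result.final)
```

Because the two roles are plain arguments in this sketch, the reverse ordering Microsoft mentions would amount to passing the models in the opposite order.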

On the DRACO benchmark, a standardized test covering 100 complex research tasks across 10 domains including medicine, law, and technology, Copilot with Critique scored 57.4 points, while Anthropic’s Claude Opus 4.6 by itself scored 42.7. Microsoft’s combined system beats the next best result by nearly 14%.
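
The two scores quoted above can be compared directly. The article does not name the next-best system or its score, so the 14% figure cannot be recomputed from the numbers given here; the short Python sketch below only works out the gap against Claude Opus 4.6 on its own. The variable names are illustrative.

```python
# Scores as reported in the article.
critique_score = 57.4  # Copilot with Critique
opus_alone = 42.7      # Claude Opus 4.6 by itself

point_gap = critique_score - opus_alone          # absolute difference in points
relative_gain = point_gap / opus_alone * 100     # difference as a percentage

print(f"{point_gap:.1f} points, {relative_gain:.1f}% higher")
# prints "14.7 points, 34.4% higher"
```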

The biggest gains showed up in breadth of analysis and presentation quality, with factual accuracy also posting a significant improvement.

The second feature, Council, takes a different approach to the same problem. Instead of having one model review the other’s work, Council runs GPT and Claude simultaneously and puts their full reports side by side. A third “judge” model then reads both and writes a summary explaining where the two AIs agreed, where they diverged, and what unique angles each one caught that the other missed. Until now, users had to make that comparison between AI research tools manually.
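
A minimal Python sketch of that parallel-plus-judge arrangement follows. It is an assumption-laden illustration, not Microsoft's code: `run_council` and the stub functions are hypothetical, each "model" is assumed to be a function from prompt to text, and the stub judge simply compares semicolon-separated claims.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumption: a "model" is any function mapping a prompt to text.
Model = Callable[[str], str]


def run_council(question: str, model_a: Model, model_b: Model, judge: Model) -> dict:
    """Run two models on the same question in parallel, then ask a judge to compare."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(model_a, question)
        future_b = pool.submit(model_b, question)
        report_a, report_b = future_a.result(), future_b.result()
    summary = judge(
        "Summarize where these reports agree, where they diverge, and what "
        f"each one uniquely covers.\nReport A: {report_a}\nReport B: {report_b}"
    )
    return {"report_a": report_a, "report_b": report_b, "summary": summary}


# Stand-in models so the sketch runs without any API access.
def stub_a(question: str) -> str:
    return "costs fell; adoption rose"


def stub_b(question: str) -> str:
    return "costs fell; regulation tightened"


def stub_judge(prompt: str) -> str:
    # Recover both reports from the prompt and compare their claims as sets.
    a = set(prompt.split("Report A: ")[1].split("\nReport B: ")[0].split("; "))
    b = set(prompt.split("Report B: ")[1].split("; "))
    return f"agree={sorted(a & b)} only_a={sorted(a - b)} only_b={sorted(b - a)}"


council = run_council("How did the market change?", stub_a, stub_b, stub_judge)
print(council["summary"])
```

Both full reports stay in the returned dictionary alongside the judge's summary, mirroring the side-by-side presentation the article describes.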

In Critique, the models essentially collaborate; in Council, they compete.

Critique is the default experience in Researcher, whereas Council requires you to select “Model Council” from the picker to activate the side-by-side mode. Both features are currently available to users enrolled in Microsoft’s Frontier program, the early-access channel for Copilot’s newest capabilities. Access requires both a Microsoft 365 Copilot license ($30/user/month) and Frontier enrollment.

OpenAI and Microsoft have a multibillion-dollar partnership, but Microsoft’s bet is that no single model stays on top for long, and that the real value is in the orchestration layer that routes tasks to whichever combination works best.

Source: https://decrypt.co/362805/microsoft-gpt-claude-work-together-ai-research
