Problem
Relying on a single LLM for code review creates blind spots. Claude excels at architectural reasoning and security analysis but may miss performance edge cases. GPT models are strong at API correctness. Gemini 2.5 Pro has deep reasoning for complex logic bugs. Using only one model means you miss the issues that another model's strengths would catch. Manually running code through multiple models is tedious and breaks your workflow.
Solution
Option 1: Zen MCP for multi-model routing via OpenRouter
Configure Zen MCP to query multiple models through a single MCP interface:
```json
{
  "mcpServers": {
    "zen": {
      "command": "npx",
      "args": ["-y", "zen-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-xxx",
        "ZEN_MODELS": "openai/o3,google/gemini-2.5-pro,anthropic/claude-sonnet-4"
      }
    }
  }
}
```
Then in Claude Code, ask for a multi-model review:
```
Use zen to review this pull request with all configured models. Ask each
model to focus on: security vulnerabilities, performance issues, and
correctness of error handling. Summarize where they agree and disagree.
```
Option 2: Manual multi-model review workflow
If you prefer direct control, set up a structured review process:
```markdown
# CLAUDE.md review instructions

## Code Review Checklist

When reviewing code, use zen MCP to get opinions from multiple models:

1. Security review: Route to Claude (strongest at security analysis)
2. Logic correctness: Route to o3 (strongest at reasoning)
3. Performance review: Route to Gemini 2.5 Pro (strongest at optimization)
4. Combine findings into a single review summary with confidence levels
```
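The routing table above can be sketched directly against OpenRouter's OpenAI-compatible chat completions endpoint. The `pickModel` and `reviewWith` helpers and the exact model IDs are illustrative assumptions, not part of Zen MCP:

```typescript
type Focus = "security" | "logic" | "performance";

// Map each review focus to the model from the checklist above
// (model IDs are illustrative OpenRouter-style identifiers).
const ROUTING: Record<Focus, string> = {
  security: "anthropic/claude-sonnet-4",
  logic: "openai/o3",
  performance: "google/gemini-2.5-pro",
};

function pickModel(focus: Focus): string {
  return ROUTING[focus];
}

// Hypothetical helper: send a diff to the routed model through
// OpenRouter's OpenAI-compatible chat completions endpoint.
async function reviewWith(
  focus: Focus,
  diff: string,
  apiKey: string,
): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: pickModel(focus),
      messages: [
        { role: "system", content: `Review this diff for ${focus} issues.` },
        { role: "user", content: diff },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Each focus area then becomes one cheap, targeted call rather than one model trying to cover everything at once.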
Option 3: Consensus-based merge decisions
```typescript
// Example review aggregation pattern
interface ModelReview {
  model: string;
  severity: "critical" | "warning" | "info";
  issues: string[];
  approved: boolean;
}

function shouldMerge(reviews: ModelReview[]): boolean {
  // Block the merge if any model files a critical-severity review.
  const hasCritical = reviews.some((r) => r.severity === "critical");
  // Require a strict majority of models to approve.
  const majorityApproved =
    reviews.filter((r) => r.approved).length > reviews.length / 2;
  return !hasCritical && majorityApproved;
}
```
Example output from a multi-model review:
```markdown
## Multi-Model Review Summary

### Claude Sonnet 4
- CRITICAL: SQL injection in user search endpoint (line 45)
- WARNING: Missing rate limiting on auth endpoints

### o3
- WARNING: Race condition in concurrent inventory updates
- INFO: Consider using database transactions for atomicity

### Gemini 2.5 Pro
- CRITICAL: Agrees with SQL injection finding
- WARNING: N+1 query in order listing (will degrade at scale)

### Consensus
- 2/3 models flagged SQL injection -> HIGH CONFIDENCE fix required
- Unique findings from each model added to review
```
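The consensus tally in that summary can be computed mechanically once each model's findings are reduced to normalized issue keys. A minimal sketch, assuming the keys and the two-model threshold are chosen by you (the `consensus` helper is illustrative):

```typescript
// Count how many models flagged each normalized finding, and mark
// anything flagged by `threshold` or more models as high confidence.
function consensus(
  findingsByModel: Record<string, string[]>,
  threshold = 2,
): Map<string, { count: number; highConfidence: boolean }> {
  const tally = new Map<string, { count: number; highConfidence: boolean }>();
  for (const findings of Object.values(findingsByModel)) {
    // Deduplicate within a model so one model can't inflate the count.
    for (const key of new Set(findings)) {
      const entry = tally.get(key) ?? { count: 0, highConfidence: false };
      entry.count += 1;
      entry.highConfidence = entry.count >= threshold;
      tally.set(key, entry);
    }
  }
  return tally;
}
```

Findings that clear the threshold map to the "HIGH CONFIDENCE" bucket; everything else is reported as a unique, single-model insight.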
Why It Works
Different models are trained on different data and have different architectural biases, so they catch different categories of bugs. Routing through OpenRouter via Zen MCP lets you query multiple models from a single integration point without managing separate API keys or switching between chat interfaces. The consensus approach surfaces high-confidence issues (found by multiple models) while still capturing unique insights from each model's specialization.
Context
- OpenRouter provides unified access to models from OpenAI, Google, Anthropic, and Meta through a single API key
- Chain-of-Recursive-Thoughts is a related MCP that uses LLM-as-a-judge patterns for deeper analysis
- Claude Code best practices suggest using a separate Claude session as a "reviewer" checking the "coder" session's output
- Cost per review depends on model selection; using smaller models (GPT-4.1 mini, Gemini Flash) for initial screening reduces costs
- This pattern complements rather than replaces human code review -- use it for automated first-pass analysis
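The cost note above can be sketched as a two-tier pipeline: a cheap screening pass decides whether the expensive multi-model panel runs at all. Model IDs and the escalation rule here are assumptions, not a prescribed setup:

```typescript
// Two-tier review: run a cheap screening model first, and pay for the
// full multi-model panel only when the screener flags something.
const FULL_PANEL = [
  "anthropic/claude-sonnet-4",
  "openai/o3",
  "google/gemini-2.5-pro",
];

// Decide which models to invoke after the cheap screening pass:
// a clean screen needs no further review; any flagged issue escalates.
function modelsToRun(screenIssueCount: number): string[] {
  return screenIssueCount > 0 ? FULL_PANEL : [];
}
```

This keeps the expensive models off trivial diffs while still giving risky changes the full treatment.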