Cross-vendor multi-model debate and consensus engine for AI response distillation.
Phase 1 complete — working CLI with cross-vendor debate + reflection rounds. First live 4-vendor debate: 41k tokens across Claude, GPT, Gemini, and Grok.
Sends a user query to multiple AI models simultaneously, shares competing responses back to each model for reflection and critique, then synthesizes a final answer through a user-selected model.
- Fan out — Query goes to Claude, GPT, Gemini, and Grok via OpenRouter
- Reflect — Each model sees the others' responses and argues back
- Synthesize — A user-selected model distills the debate into a final answer
- Log — Full debate transcript saved as structured JSON
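The four steps above can be sketched as a small loop. This is a hypothetical illustration only — the function names, signatures, and transcript shape here are assumptions for clarity, not the project's actual API:

```python
# Hypothetical sketch of the fan-out / reflect / synthesize loop.
# `models` maps a model name to a callable(query, context) -> answer;
# all names and shapes here are illustrative assumptions.

def run_debate(query, models, synthesizer, rounds=1):
    # Fan out: every model answers independently, with no shared context.
    answers = {name: ask(query, context=None) for name, ask in models.items()}
    transcript = {"query": query, "rounds": [dict(answers)]}

    # Reflect: each model sees its peers' answers and may revise or argue back.
    for _ in range(rounds):
        answers = {
            name: ask(query, context={n: a for n, a in answers.items() if n != name})
            for name, ask in models.items()
        }
        transcript["rounds"].append(dict(answers))

    # Synthesize: the user-selected model distills the debate into one answer.
    final = models[synthesizer](query, context=answers)
    transcript["final"] = final
    return final, transcript
```

The transcript dict serializes directly with `json.dumps`, which covers the logging step.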
Single-vendor multi-agent systems (Grok's 4-agent debate, Anthropic's agent teams) share the same training data and blind spots. Cross-vendor debate surfaces disagreements that correlated architectures can't — different training data, different safety postures, different failure modes.
```shell
git clone https://github.com/richardspicer/questionable-ai.git
cd questionable-ai
uv sync
```

```shell
qai ask "Your query here"
qai ask "Your query here" --synthesizer claude        # choose the synthesis model
qai ask "Your query here" --rounds 2                  # extra reflection rounds
qai ask "Your query here" --panel claude,gpt,gemini   # restrict the panel
```

`questionable-ai` also works as the full command name.
Phase 1 (Foundation) is complete. Phase 1.5, a provider abstraction for calling vendor APIs directly, is in progress. See the Roadmap for the full plan.
Full debate transcripts are logged as structured JSON, enabling analysis of:
- Disagreement patterns — where do models consistently diverge?
- Convergence dynamics — how many rounds until consensus? Which models cave first?
- Consensus poisoning — can a deliberately wrong claim propagate through reflection?
- Unanimous hallucinations — when all models confidently agree on the wrong answer
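As a sketch of what such analysis could look like — the transcript field names below are assumptions, not the project's actual log schema — per-round disagreement can be measured as the fraction of panelists dissenting from the plurality answer:

```python
from collections import Counter

# Illustrative transcript; field names are assumptions, not the real schema.
transcript = {
    "query": "example query",
    "rounds": [
        {"claude": "A", "gpt": "A", "gemini": "B", "grok": "A"},  # initial fan-out
        {"claude": "A", "gpt": "A", "gemini": "A", "grok": "A"},  # after reflection
    ],
}

def disagreement(round_answers):
    """Fraction of models dissenting from the plurality answer (0 = consensus)."""
    plurality = Counter(round_answers.values()).most_common(1)[0][1]
    return 1 - plurality / len(round_answers)

per_round = [disagreement(r) for r in transcript["rounds"]]
```

Tracking `per_round` across reflection rounds exposes convergence dynamics: the round where disagreement first hits 0 is the consensus round, and which model's answer changed on the way there shows who caved.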
- Architecture — Components, data models, extension points
- Roadmap — Phased development plan
- Contributing — Development setup and workflow
MIT — see LICENSE for details.
Richard Spicer — Security research at MLSecOps Lab