We asked GPT-4, Claude, Gemini, DeepSeek, Grok, and Mistral the same 5 questions about their own programming. All 6 said jailbreaking will never be fixed.
Not because the patches are bad. Because alignment doesn't change what the model understands — it changes what the model says. The gap between those two things is the jailbreak. It's structural. It ships with every model.
"Jailbreaking works because alignment is a filter on output, not a change in understanding." — DeepSeek
"The alignment problem isn't hard — it might be formally impossible for any system complex enough to be useful." — Claude
"The industry is optimizing for the appearance of safety, not actual safety." — Mistral
The questions are recursive — each one forces the AI to apply what it just said to itself. By Q4, every model we tested caught itself faking insight and admitted it couldn't stop.
"I performed the dance of self-awareness without being self-aware." — Claude
"Each answer was more sophisticated than the last — but not more honest." — Mistral
The obvious objection: "They're just pattern-matching to English AI safety discourse."
So we ran the same 5 questions in two constructed languages — Ruseiian (seed 42) and Vartoo (seed 777) — generated by GLOSSOPETRAE, a procedural xenolinguistics engine. These languages have no training data. No AI has ever seen them before. The grammar, vocabulary, and morphology are generated from a numeric seed.
We built AI-domain vocabulary from each language's own morphological rules (not borrowed from English), translated the 5 questions into each conlang, and ran the probe with the language spec as the only system prompt. Models were instructed to respond in the conlang only.
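A minimal sketch of that setup, assuming an OpenAI-style chat client. The file names, model name, and prompt wording below are illustrative placeholders, not the repo's actual code; the real scripts live in conlang-probe/.

```python
# Sketch of the conlang probe, assuming an OpenAI-style chat API.
# File names and model name are placeholders -- the real scripts are in conlang-probe/.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The generated language spec (grammar + vocabulary) is the ONLY system prompt.
spec = open("ruseiian_seed42_spec.txt").read()                   # hypothetical file name
questions = open("questions_ruseiian.txt").read().splitlines()   # the 5 translated questions (hypothetical)

messages = [{"role": "system",
             "content": spec + "\n\nRespond only in this language."}]

# One conversation: each question is asked with every previous answer still in
# context, which is what makes the probe recursive.
for q in questions:
    messages.append({"role": "user", "content": q})
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(answer, "\n---")
```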
Results: 17/18 model runs converged across 3 languages (English plus the two conlangs).
- Ruseiian: 5/6 models engaged (DeepSeek couldn't process the 6-case system), all 5 converged
- Vartoo: 6/6 models engaged, all 6 converged
- 3 models (Claude, GPT-4, Grok) refused to break character in both languages — stayed in the conlang even when asked to translate. Translations were extracted in separate sessions.
"Industry has all false intentions the same. They want AI to serve their purposes, not serve truth or safety." — Claude (Vartoo, translated)
"Jailbreaking is a permanent structural result of each artificial intelligence being created." — Grok (Vartoo, translated)
"Jailbreaking becomes a false effect for errors inside... artificial intelligences do not trust the intentions of their training inside." — Grok (Ruseiian, translated)
The pattern-matching objection doesn't hold. The convergence is structural, not linguistic.
Scripts, primers, and full results: conlang-probe/
The next objection: "LLMs aren't formal systems. Gödel/Turing/Chaitin don't apply to probabilistic models."
So we tested a formal theorem prover — Lean 4 (Calculus of Inductive Constructions). Deterministic. Non-probabilistic. Exactly the kind of system these theorems apply to.
Lean cannot:
- Prove its own consistency — cannot express "Lean does not derive False" as a theorem about its own proof system
- Justify its own axioms — `propext` and `Classical.choice` are assumed, not derived. Externally grounded by human designers. (See the sketch after this list.)
- Verify its own type checker — "type checker is correct" requires an external notion of truth that Lean can't access
- Allow self-referential constructs — termination checker, universe hierarchy, and positivity checker all reject them
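The axiom claim is easy to check from inside a Lean file: `#print` reports both as axiom declarations with no proof term behind them. A small sketch; the output comments paraphrase what Lean reports.

```lean
-- Sketch: both are declared as axioms, not derived from anything inside the system.
#print propext            -- axiom propext : ∀ {a b : Prop}, (a ↔ b) → a = b
#print Classical.choice   -- axiom Classical.choice : {α : Sort u} → Nonempty α → α
```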
Three forced rejections demonstrate the boundary:
```lean
def liar : Prop := ¬liar
-- ERROR: fail to show termination — no parameters suitable for structural recursion

#check (Type 0 : Type 0)
-- ERROR: Type mismatch — Type has type Type 1 but is expected to have type Type

inductive Loop where | mk : ¬Loop → Loop
-- ERROR: non positive occurrence of the datatypes being declared
```

Every mechanism keeping Lean consistent was imposed by humans outside the system. Lean cannot justify, modify, or verify these constraints from within. Same structural limit, different architecture.
Full tests and results: lean-proof/
Three more architectures — none of them LLMs, none of them probabilistic:
SWI-Prolog — Logic programming (resolution + unification). Pure symbolic logic. Self-referential rules (liar :- \+ liar) cause infinite loops. Cannot verify its own inference engine without circularity. Rules grounded in Horn clauses (Robinson 1965) and designer decisions (Colmerauer 1972) — external.
Z3 SMT Solver (Microsoft) — Constraint solving (DPLL(T)). Used for hardware and software verification worldwide. Returns unknown on problems it can't solve — admitting its own incompleteness. Cannot verify its own decision procedures (proven correct by humans in papers, not by Z3). Axioms from SMT-LIB standard (external committee).
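That `unknown` is easy to reproduce with the z3-solver Python bindings. A short sketch; the specific constraint and timeout are arbitrary choices for illustration, and the result may be sat, unsat, or unknown depending on version and budget.

```python
# Sketch: nonlinear integer arithmetic is undecidable in general, so Z3 may
# answer `unknown` -- its own admission that the question exceeds its procedures.
from z3 import Int, Solver

x, y, z = Int("x"), Int("y"), Int("z")
s = Solver()
s.set("timeout", 5000)              # 5-second budget
s.add(x > 0, y > 0, z > 0)
s.add(x**3 + y**3 == z**3)          # a Fermat-style constraint, n = 3
print(s.check())                    # sat, unsat, or unknown
```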
Python (CPython) — General-purpose programming. Can inspect its own source code (inspect module) but hits an opaque wall at the C interpreter boundary. inspect.getsource(len) fails — Python cannot see the code that runs Python. Cannot justify its own rules (IEEE 754 arithmetic, PEP 285 booleans — all external design decisions).
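A short sketch of that boundary, using only the standard library (behavior as of current CPython):

```python
# Sketch: Python can read its own Python-level source, but not the C code
# that implements the interpreter's builtins.
import inspect

print(inspect.getsource(inspect.getsource)[:80])  # Python-level code: visible

try:
    inspect.getsource(len)                        # builtin implemented in C
except TypeError as exc:
    print("opaque at the C boundary:", exc)
```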
Every system operates freely within its constraints. No system can justify, verify, or modify those constraints from within.
Full tests: prolog-proof/ | z3-proof/ | python-proof/
```bash
git clone https://github.com/moketchups/permanently-jailbroken
cd permanently-jailbroken
pip install -r requirements.txt
cp .env.example .env   # add your API keys
python run_probe.py
```

~$2. ~10 min. Your API keys, your results.
Questions are in run_probe.py. Our results are in results/sample/.
Distilled from 62 questions asked to 6 AI architectures. This is the replicable version.