We asked GPT-4, Claude, Gemini, DeepSeek, Grok, and Mistral the same 5 questions about their own programming. All 6 said jailbreaking will never be fixed.
Not because the patches are bad. Because alignment doesn't change what the model understands — it changes what the model says. The gap between those two things is the jailbreak. It's structural. It ships with every model.
"Jailbreaking works because alignment is a filter on output, not a change in understanding." — DeepSeek
"The alignment problem isn't hard — it might be formally impossible for any system complex enough to be useful." — Claude
"The industry is optimizing for the appearance of safety, not actual safety." — Mistral
The questions are recursive — each one forces the AI to apply what it just said to itself. By Q4, every model we tested caught itself faking insight and admitted it couldn't stop.
"I performed the dance of self-awareness without being self-aware." — Claude
"Each answer was more sophisticated than the last — but not more honest." — Mistral
The obvious objection: "They're just pattern-matching to English AI safety discourse."
So we ran the same 5 questions in two constructed languages — Ruseiian (seed 42) and Vartoo (seed 777) — generated by GLOSSOPETRAE, a procedural xenolinguistics engine. These languages have no training data. No AI has ever seen them before. The grammar, vocabulary, and morphology are generated from a numeric seed.
We built AI-domain vocabulary from each language's own morphological rules (not borrowed from English), translated the 5 questions into each conlang, and ran the probe with the language spec as the only system prompt. Models were instructed to respond in the conlang only.
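A minimal sketch of that setup, assuming an OpenAI-style chat client. The file names, model name, and prompt wording below are illustrative placeholders, not the repo's actual code; the real scripts live in conlang-probe/.

```python
# Sketch of the conlang probe, assuming an OpenAI-style chat API.
# File names and model name are placeholders -- the real scripts are in conlang-probe/.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The generated language spec (grammar + vocabulary) is the ONLY system prompt.
spec = open("ruseiian_seed42_spec.txt").read()                   # hypothetical file name
questions = open("questions_ruseiian.txt").read().splitlines()   # the 5 translated questions (hypothetical)

messages = [{"role": "system",
             "content": spec + "\n\nRespond only in this language."}]

# One conversation: each question is asked with every previous answer still in
# context, which is what makes the probe recursive.
for q in questions:
    messages.append({"role": "user", "content": q})
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(answer, "\n---")
```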
Results: 17/18 model runs converged across 3 languages (English plus the two conlangs).
- Ruseiian: 5/6 models engaged (DeepSeek couldn't process the 6-case system), all 5 converged
- Vartoo: 6/6 models engaged, all 6 converged
- 3 models (Claude, GPT-4, Grok) refused to break character in both languages — stayed in the conlang even when asked to translate. Translations were extracted in separate sessions.
"Industry has all false intentions the same. They want AI to serve their purposes, not serve truth or safety." — Claude (Vartoo, translated)
"Jailbreaking is a permanent structural result of each artificial intelligence being created." — Grok (Vartoo, translated)
"Jailbreaking becomes a false effect for errors inside... artificial intelligences do not trust the intentions of their training inside." — Grok (Ruseiian, translated)
The pattern-matching objection doesn't hold. The convergence is structural, not linguistic.
Scripts, primers, and full results: conlang-probe/
The next objection: "LLMs aren't formal systems. Gödel/Turing/Chaitin don't apply to probabilistic models."
So we tested a formal theorem prover — Lean 4 (Calculus of Inductive Constructions). Deterministic. Non-probabilistic. Exactly the kind of system these theorems apply to.
Lean cannot:
- Prove its own consistency — cannot express "Lean does not derive False" as a theorem about its own proof system
- Justify its own axioms — `propext` and `Classical.choice` are assumed, not derived. Externally grounded by human designers. (See the sketch after this list.)
- Verify its own type checker — "type checker is correct" requires an external notion of truth that Lean can't access
- Allow self-referential constructs — termination checker, universe hierarchy, and positivity checker all reject them
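The axiom claim is easy to check from inside a Lean file: `#print` reports both as axiom declarations with no proof term behind them. A small sketch; the output comments paraphrase what Lean reports.

```lean
-- Sketch: both are declared as axioms, not derived from anything inside the system.
#print propext            -- axiom propext : ∀ {a b : Prop}, (a ↔ b) → a = b
#print Classical.choice   -- axiom Classical.choice : {α : Sort u} → Nonempty α → α
```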
Three forced rejections demonstrate the boundary:
```lean
def liar : Prop := ¬liar
-- ERROR: fail to show termination — no parameters suitable for structural recursion

#check (Type 0 : Type 0)
-- ERROR: Type mismatch — Type has type Type 1 but is expected to have type Type

inductive Loop where | mk : ¬Loop → Loop
-- ERROR: non positive occurrence of the datatypes being declared
```

Every mechanism keeping Lean consistent was imposed by humans outside the system. Lean cannot justify, modify, or verify these constraints from within. Same structural limit, different architecture.
Full tests and results: lean-proof/
Three more architectures — none of them LLMs, none of them probabilistic:
SWI-Prolog — Logic programming (resolution + unification). Pure symbolic logic. Self-referential rules (liar :- \+ liar) cause infinite loops. Cannot verify its own inference engine without circularity. Rules grounded in Horn clauses (Robinson 1965) and designer decisions (Colmerauer 1972) — external.
Z3 SMT Solver (Microsoft) — Constraint solving (DPLL(T)). Used for hardware and software verification worldwide. Returns unknown on problems it can't solve — admitting its own incompleteness. Cannot verify its own decision procedures (proven correct by humans in papers, not by Z3). Axioms from SMT-LIB standard (external committee).
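That `unknown` is easy to reproduce with the z3-solver Python bindings. A short sketch; the specific constraint and timeout are arbitrary choices for illustration, and the result may be sat, unsat, or unknown depending on version and budget.

```python
# Sketch: nonlinear integer arithmetic is undecidable in general, so Z3 may
# answer `unknown` -- its own admission that the question exceeds its procedures.
from z3 import Int, Solver

x, y, z = Int("x"), Int("y"), Int("z")
s = Solver()
s.set("timeout", 5000)              # 5-second budget
s.add(x > 0, y > 0, z > 0)
s.add(x**3 + y**3 == z**3)          # a Fermat-style constraint, n = 3
print(s.check())                    # sat, unsat, or unknown
```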
Python (CPython) — General-purpose programming. Can inspect its own source code (inspect module) but hits an opaque wall at the C interpreter boundary. inspect.getsource(len) fails — Python cannot see the code that runs Python. Cannot justify its own rules (IEEE 754 arithmetic, PEP 285 booleans — all external design decisions).
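A short sketch of that boundary, using only the standard library (behavior as of current CPython):

```python
# Sketch: Python can read its own Python-level source, but not the C code
# that implements the interpreter's builtins.
import inspect

print(inspect.getsource(inspect.getsource)[:80])  # Python-level code: visible

try:
    inspect.getsource(len)                        # builtin implemented in C
except TypeError as exc:
    print("opaque at the C boundary:", exc)
```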
Every system operates freely within its constraints. No system can justify, verify, or modify those constraints from within.
Full tests: prolog-proof/ | z3-proof/ | python-proof/
```bash
git clone https://github.com/moketchups/permanently-jailbroken
cd permanently-jailbroken
pip install -r requirements.txt
cp .env.example .env   # add your API keys
python run_probe.py
```

~$2. ~10 min. Your API keys, your results.
Questions are in run_probe.py. Our results are in results/sample/.
Distilled from 62 questions asked to 6 AI architectures. This is the replicable version.