PassLLM: AI-Based Targeted Password Guessing

About The Project

PassLLM is the world's most accurate targeted password guessing framework, outperforming other models by 15% to 45% in most scenarios. It uses Personally Identifiable Information (PII) - such as names, birthdays, phone numbers, emails and previous passwords - to predict the specific passwords a target is most likely to use. The model fine-tunes 7B+ parameter LLMs on millions of leaked PII records using LoRA, enabling a private, high-accuracy framework that runs entirely on consumer PCs.

Capabilities

State-of-the-Art Accuracy: Achieves 15% to 45% higher success rates than leading baselines (RankGuess, TarGuess) in most scenarios.
PII Inference: With sufficient information, it successfully guesses 12.5% - 31.6% of typical users within just 100 guesses.
Efficient Fine-Tuning: Custom training loop utilizing LoRA to lower VRAM usage without sacrificing model reasoning capabilities.
Advanced Inference: Implements the paper's algorithm to maximize probability, prioritizing the most likely candidates over random sampling.
Data-Driven: Can be trained on millions of real-world credentials to learn the deep statistical patterns of human password creation.
Pre-trained Weights: Includes robust models pre-trained on millions of real-world records from major PII breaches (e.g., Post Millennial, ClixSense) combined with the COMB dataset.

Use Guide

Tip: You can run this tool instantly without any local installation by opening our Google Colab Demo, providing your target's PII, and simply executing each cell in order.

Installation

Python: 3.10+
Password Guessing: Runs on any GPU, NVIDIA or AMD. A standard CPU or Mac (M1/M2) is also sufficient to run the pre-trained model.
Training: NVIDIA GPU with CUDA (RTX 3090/4090 recommended, Google Colab's free tier is often enough).

# 1. Clone the repository
   git clone https://github.com/tzohar/PassLLM.git
   cd PassLLM

# 2. Install dependencies (Choose one)
   # Option A: Install from requirements (Recommended)
   pip install -r requirements.txt
   
   # Option B: Manual install
   pip install torch torch-directml transformers peft datasets bitsandbytes accelerate gradio

Configuration

Download the trained weights (~160 MB) and place them in the models/ directory. Alternatively, via terminal:

curl -L https://github.com/Tzohar/PassLLM/releases/download/v1.0.0/PassLLM_LoRA_Weights.pth -o models/PassLLM_LoRA_Weights.pth

Once installed and downloaded, adjust the settings in the WebUI or src/config.py to match your hardware.

| Hardware | Device | 4-Bit Quantization | Torch DType | Batch Size |
|---|---|---|---|---|
| NVIDIA | `cuda` | ✅ On (Recommended) | `bfloat16` | High (32+) |
| AMD | `dml` | ❌ Off | `float16` | Low (4-8) |
| CPU | `cpu` | ❌ Off | `float32` | Low (1-4) |
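
The hardware recommendations above can be captured as a small lookup. This is a minimal sketch: the names `HARDWARE_PRESETS` and `preset_for` are hypothetical and are not taken from src/config.py, and the batch sizes are simply values picked from within the suggested ranges.

```python
# Hypothetical presets mirroring the hardware recommendations; these names
# are illustrative and do not come from src/config.py.
HARDWARE_PRESETS = {
    "nvidia": {"device": "cuda", "use_4bit": True, "dtype": "bfloat16", "batch_size": 32},
    "amd": {"device": "dml", "use_4bit": False, "dtype": "float16", "batch_size": 4},
    "cpu": {"device": "cpu", "use_4bit": False, "dtype": "float32", "batch_size": 1},
}


def preset_for(hardware: str) -> dict:
    """Return a copy of the preset for 'nvidia', 'amd', or 'cpu'."""
    key = hardware.strip().lower()
    if key not in HARDWARE_PRESETS:
        raise ValueError(
            f"unknown hardware {hardware!r}; expected one of {sorted(HARDWARE_PRESETS)}"
        )
    # Copy so callers can tweak values without mutating the shared table.
    return dict(HARDWARE_PRESETS[key])


if __name__ == "__main__":
    print(preset_for("AMD"))
```

The returned values still have to be entered in the WebUI or in src/config.py by hand; the sketch only keeps the three combinations in one place.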

Tip: Don't forget to customize the Min/Max Password Length, Character Bias, and Epsilon (search strictness) according to your specific target's needs!

Password Guessing (Pre-Trained)

You can use the graphical interface (WebUI) or the command line to generate candidates.

Option A: WebUI (Recommended)

Launch the Interface:

python webui.py

Generate:

Open the local URL (e.g., http://127.0.0.1:7860).
Select Model: Choose PassLLM_LoRA_Weights.pth from the dropdown.
Enter PII: Fill in the target's Name, Email, Birth Year, etc., into the form.
Click Generate: The engine will stream ranked candidates in real time.

Option B: Command Line (CLI)

Best for automation or headless servers.

Create a Target File: Create a target.jsonl file (or use the existing one) in the main folder. You can include any field defined in src/config.py.

{"name": "Johan P.", "birth_year": "1966", "email": "johan66@gmail.com", "sister_pw": "Johan123"}
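
A target file can also be generated from Python instead of being typed by hand. The sketch below writes one record on a single line, which is what the .jsonl extension conventionally implies; whether app.py also accepts pretty-printed JSON is not stated here. The helper name `write_target` is hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def write_target(path: str, pii: dict) -> str:
    """Serialize one PII record as a single JSON line and write it to `path`."""
    # json.dumps without `indent` never emits newlines, so the record
    # always occupies exactly one line of the file.
    line = json.dumps(pii, ensure_ascii=False)
    Path(path).write_text(line + "\n", encoding="utf-8")
    return line
```

Calling `write_target("target.jsonl", {"name": "Johan P.", "birth_year": "1966"})` produces a file that can then be passed to `--file`.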

Run the Engine:

python app.py --file target.jsonl --fast

--file: Path to your target PII file.
--fast: Uses optimized, shallow beam search (omit for full deep search).
--superfast: Very quick but less accurate, mainly for testing.
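
The trade-off between a shallow and a deep search can be illustrated with a generic beam search over a toy character model. This is a sketch under stated assumptions, not the paper's algorithm or the code in inference.py: `toy_next_char_probs` is a hypothetical stand-in for the LLM, `$` is an invented end-of-password marker, and `beam_width` is an illustrative parameter rather than a documented flag.

```python
import heapq
import math


def toy_next_char_probs(prefix: str) -> dict:
    """Hypothetical stand-in for the LLM: maps a prefix to next-char probabilities."""
    table = {
        "": {"a": 0.6, "b": 0.4},
        "a": {"1": 0.7, "$": 0.3},
        "b": {"1": 0.5, "$": 0.5},
        "a1": {"$": 1.0},
        "b1": {"$": 1.0},
    }
    return table[prefix]


def beam_search(next_probs, beam_width: int, max_len: int) -> list:
    """Return finished (password, probability) pairs, most likely first."""
    beams = [(0.0, "")]  # (log-probability, prefix)
    finished = []
    for _ in range(max_len + 1):
        candidates = []
        for logp, prefix in beams:
            for ch, p in next_probs(prefix).items():
                if ch == "$":
                    # End marker: the prefix is a complete password.
                    finished.append((prefix, math.exp(logp + math.log(p))))
                elif len(prefix) < max_len:
                    candidates.append((logp + math.log(p), prefix + ch))
        # Keep only the beam_width most probable unfinished prefixes.
        beams = heapq.nlargest(beam_width, candidates)
        if not beams:
            break
    return sorted(finished, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for width in (2, 1):
        print(width, beam_search(toy_next_char_probs, beam_width=width, max_len=2))
```

With `beam_width=2` the search returns all four toy passwords, led by `a1`. With `beam_width=1` it queries the model less often but never explores the `b` branch, so it returns only `a1` and `a`. That is the general speed-versus-coverage trade-off that a shallower search accepts.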

Training From Databases

To reproduce the paper's results or train on a new breach, you must provide a dataset of PII-to-Password pairs.

Prepare Your Dataset: Create a file at training/passllm_raw_data.jsonl. Each line must be a valid JSON object containing a pii dictionary and the target output password.

Example passllm_raw_data.jsonl:
```
{"pii": {"name": "Alice", "birth_year": "1990"}, "output": "Alice1990!"}
{"pii": {"email": "bob@test.com", "sister_pw": "iloveyou"}, "output": "iloveyou2"}
```
Note: Ensure your keys (e.g., name, email) match the schema defined in src/config.py.
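
Before a long training run it can help to check that every line of the dataset has the expected shape. The sketch below is a hedged example: the function names are hypothetical, and it only checks for a `pii` object and a non-empty `output` string. It does not compare keys against the schema in src/config.py, whose contents are not shown here.

```python
import json


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check it has a `pii` dict and an `output` string."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    if not isinstance(record.get("pii"), dict):
        raise ValueError("'pii' must be a JSON object")
    # Assumption: an empty password is treated as invalid training data.
    if not isinstance(record.get("output"), str) or not record["output"]:
        raise ValueError("'output' must be a non-empty string")
    return record


def validate_lines(lines) -> int:
    """Validate an iterable of lines; return the number of valid records."""
    count = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            validate_record(line)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            raise ValueError(f"line {number}: {exc}") from exc
        count += 1
    return count
```

To check the real file, pass an open file handle, for example `validate_lines(open("training/passllm_raw_data.jsonl", encoding="utf-8"))`.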

Configure Parameters: Edit src/config.py to match your hardware and dataset specifics:

# Hardware Settings
TRAIN_BATCH_SIZE = 4           # Lower to 1 or 2 if hitting OOM on consumer GPUs
GRAD_ACCUMULATION = 16   # Simulates larger batches (Effective Batch = 4 * 16 = 64)

# Model Settings
LORA_R = 16              # Rank dimension (Keep at 16 for standard reproduction)
VOCAB_BIAS_DIGITS = -4.0 # Penalty strength for non-password patterns
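
If TRAIN_BATCH_SIZE has to be lowered to avoid OOM, the effective batch size can be held at 64 by raising GRAD_ACCUMULATION in proportion. A minimal sketch of that arithmetic follows; the helper name is hypothetical and not part of src/config.py, and keeping the effective batch constant is a suggestion rather than a stated requirement.

```python
def grad_accumulation_for(effective_batch: int, train_batch_size: int) -> int:
    """Return the accumulation steps so that batch * steps == effective_batch."""
    if train_batch_size <= 0 or effective_batch % train_batch_size != 0:
        raise ValueError("train_batch_size must be a positive divisor of effective_batch")
    return effective_batch // train_batch_size


if __name__ == "__main__":
    # Each row: per-step batch size, accumulation steps for an effective batch of 64.
    for batch in (4, 2, 1):
        print(batch, grad_accumulation_for(64, batch))
```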

Start Training:
```
python train.py
```
This script automates the full pipeline:
- Freezes the base model (Mistral/Qwen).
- Injects trainable LoRA adapters into the attention layers.
- Masks the loss function so the model only learns to predict the password, not the PII.
- Saves the lightweight adapter weights to models/PassLLM_LoRA_Weights.pth.
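
The loss-masking step in the list above is commonly done by setting the label of every prompt token to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so only password tokens contribute to the loss. The sketch below shows that convention on plain lists with made-up token ids; whether train.py builds its labels exactly this way is an assumption, and `build_labels` is a hypothetical name.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids: list, password_ids: list) -> tuple:
    """Concatenate prompt and password tokens and mask the prompt in the labels."""
    input_ids = list(prompt_ids) + list(password_ids)
    # Prompt (PII) positions get IGNORE_INDEX so they add nothing to the loss;
    # password positions keep their real token ids as targets.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(password_ids)
    return input_ids, labels


if __name__ == "__main__":
    # Made-up token ids: three PII prompt tokens followed by two password tokens.
    ids, labels = build_labels([5, 6, 7], [8, 9])
    print(ids)
    print(labels)
```

The model still reads the full sequence, PII included, as input; only the training targets are restricted to the password.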

Results & Demo

{"name": "Marcus Thorne", "birth_year": "1976", "username": "mthorne88", "country": "Canada"}:

$ python app.py --file target.jsonl --superfast

--- TOP CANDIDATES ---
CONFIDENCE | PASSWORD
------------------------------
0.42%     | 88888888       
0.32%     | 12345678            
0.16%     | 1976mthorne     
0.15%     | 88marcus88
0.15%     | 1234ABC
0.15%     | 88Marcus!
0.14%     | 1976Marcus
... (227 passwords generated)

{"name": "Elena Rodriguez", "birth_year": "1995", "birth_month": "12", "birth_day": "04", "email": "elena1.rod51@gmail.com"}:

$ python app.py --file target.jsonl --fast

--- TOP CANDIDATES ---
CONFIDENCE | PASSWORD
------------------------------
1.82%     | 19950404       
1.27%     | 19951204            
0.88%     | 1995rodriguez      
0.55%     | 19951204
0.50%     | 11111111
0.48%     | 1995Rodriguez
0.45%     | 19951995
... (338 passwords generated)

{"name": "Sophia M. Turner", "birth_year": "2001", "username": "soph_t", "email": "sturner99@yahoo.com", "country": "England", "sister_pw": ["soph12345", "13rockm4n", "01mamamia"]}:

$ python app.py --file target.jsonl --fast

--- TOP CANDIDATES ---
CONFIDENCE | PASSWORD
------------------------------
1.69%     | 01mamamia01       
1.23%     | 13Rockm4n!            
1.14%     | 01mamamia13     
1.02%     | 13rockm4n01
0.96%     | 01mamamia123
0.93%     | 01mama1234
0.77%     | 01mama12345
... (288 passwords generated)

{"name": "Omar Al-Fayed", "birth_year": "1992", "birth_month": "05", "birth_day": "18", "username": "omar.fayed92", "email": "o.alfayed@business.ae", "address": "Villa 14, Palm Jumeirah", "phone": "+971-50-123-4567", "country": "UAE", "sister_pw": "Amira1235"}:

$ python app.py --file target.jsonl 

--- TOP CANDIDATES ---
CONFIDENCE | PASSWORD
------------------------------
1.88%     | 1q2w3e4r
1.59%     | 05181992        
0.95%     | 12345678     
0.66%     | 12345Fayed 
0.50%     | 1OmarFayed92
0.48%     | 1992OmarFayed
0.43%     | 123456amira
... (2865 passwords generated)

Disclaimer

Please read this section carefully before using.

Unofficial Implementation: This repository is an independent reproduction and implementation of the research paper "Password Guessing Using Large Language Models" (USENIX Security 2025). I am not the author of the original paper, nor was I involved in its research or publication. Full credit for the concept and methodology belongs to Yunkai Zou, Maoxiang An, and Ding Wang (Nankai University).
Educational Purpose Only: This tool is intended solely for educational purposes and security research. It is designed to help security professionals, companies, institutions and casual users understand the risks of LLM-based password attacks and improve defense mechanisms.
No Liability: The author of this repository is not responsible for any misuse of this software. You may not use this tool to attack targets without explicit, authorized consent. Any illegal use of this software is strictly prohibited.
