This project provides multiple approaches to decrypt Playfair cipher text using frequency analysis, statistical optimization, and constraint-based key recovery.
The Playfair cipher is a digraphic substitution cipher that encrypts pairs of letters. This project implements several methods to:
- Decrypt ciphertext when the key is unknown (using frequency analysis)
- Recover the key when both plaintext and ciphertext are known
- Find the key whose decryption is statistically closest to the known plaintext
playfairgame/
├── Core Modules
│ ├── preprocessing.py # Text cleaning and formatting
│ ├── frequency_analysis.py # N-gram frequency analysis
│ ├── playfair_cipher.py # Playfair encryption/decryption
│ ├── scoring.py # English-likeness scoring
│ └── key_search.py # Key search algorithms
│
├── Main Scripts
│ ├── main.py # Standard decryption (ciphertext only)
│ ├── main_improve.py # Improved decryption with strategies
│ ├── restore_from_open.py # Key recovery from plaintext-ciphertext pairs
│ └── find_key_statistical.py # Statistical optimization approach
│
├── Data Files
│ ├── cyphertext.txt # Encrypted message to decrypt
│ ├── microcypheropen.txt # Ciphertext and plaintext pairs
│ ├── english_1grams.csv # English letter frequencies
│ ├── english_2grams.csv # English bigram frequencies
│ ├── english_2grams_norm.csv # Normalized bigram frequencies
│ ├── english_3grams.csv # English trigram frequencies
│ └── english_3grams_norm.csv # Normalized trigram frequencies
│
├── Utilities
│ └── normalize/
│   └── normalize_ngrams.py # Normalizes n-gram frequency files
│
└── Output Files
  ├── decrypted_output.txt # Decryption results
  └── restored_key.txt # Recovered key
Purpose: Handles loading and cleaning ciphertext/plaintext.
Functions:
- load_ciphertext(filepath): Reads ciphertext from a file
- preprocess_text(raw_text): Removes punctuation, converts to uppercase, treats J as I
- restore_formatting(decrypted_text, non_alpha_positions): Restores original formatting
Usage: Used by all main scripts for text preprocessing.
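A minimal sketch of the cleaning and restoring steps described above. The function names and parameters come from this README, but the bodies, the return shape of preprocess_text, and the (index, character) layout of non_alpha_positions are assumptions made for illustration, not the project's actual code.

```python
import string


def preprocess_text(raw_text):
    """Return (letters, non_alpha_positions) for a raw message.

    letters: uppercase A-Z only, with J folded into I.
    non_alpha_positions: (index, character) for every character removed.
    """
    letters = []
    non_alpha_positions = []
    for index, char in enumerate(raw_text):
        if char in string.ascii_letters:
            upper = char.upper()
            letters.append("I" if upper == "J" else upper)
        else:
            non_alpha_positions.append((index, char))
    return "".join(letters), non_alpha_positions


def restore_formatting(decrypted_text, non_alpha_positions):
    """Re-insert the removed characters at their original indices."""
    chars = list(decrypted_text)
    # Positions are in ascending order, so inserting in sequence
    # keeps every later index valid.
    for index, char in non_alpha_positions:
        chars.insert(index, char)
    return "".join(chars)
```

In this sketch the original letter case and the J/I distinction are not recoverable; only punctuation and spacing are restored.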
Purpose: Analyzes letter and digram frequencies in text.
Functions:
- count_monograms(text): Counts single letter frequencies
- count_digrams(text): Counts letter pair frequencies
- get_english_ngram_data(): Loads English frequency tables from CSV files
- analyze_frequency_match(decrypted_text, ...): Compares decrypted text to English patterns
Usage: Provides frequency data for scoring and analysis.
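The two counting functions could look roughly like this. It is a sketch only: whether the project counts overlapping or non-overlapping pairs is not stated here, so the non-overlapping choice below (which matches how Playfair consumes text) is an assumption.

```python
from collections import Counter


def count_monograms(text):
    """Count how often each single letter occurs."""
    return Counter(text)


def count_digrams(text):
    """Count non-overlapping letter pairs (positions 0-1, 2-3, ...).

    A trailing unpaired letter is ignored.
    """
    return Counter(text[i:i + 2] for i in range(0, len(text) - 1, 2))
```

Counter.most_common(n) then gives the n most frequent digrams, which is a convenient starting point for comparing against English bigram tables.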
Purpose: Implements Playfair cipher encryption and decryption.
Functions:
- create_key_matrix(key_string): Creates 5x5 key matrix from 25-letter string
- decrypt_playfair(cipher_text, key_matrix, letter_to_pos): Decrypts ciphertext
- encrypt_playfair(plain_text, key_matrix, letter_to_pos): Encrypts plaintext
- find_position(letter, letter_to_pos): Finds letter position in key matrix
Playfair Rules:
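- Same row: replace each letter with the one to its right (encrypt) or left (decrypt), wrapping around the row
- Same column: replace each letter with the one below it (encrypt) or above it (decrypt), wrapping around the column
- Rectangle: replace each letter with the letter in its own row and the other letter's column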
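The three rules above can be sketched for decryption as follows. The function names mirror the ones listed in this README, but the bodies and the tuple returned by create_key_matrix are assumptions for illustration, not the module's actual code.

```python
def create_key_matrix(key_string):
    """Split a 25-letter key into a 5x5 grid and a letter -> (row, col) map."""
    if len(key_string) != 25 or len(set(key_string)) != 25:
        raise ValueError("key must contain 25 distinct letters")
    matrix = [list(key_string[r * 5:(r + 1) * 5]) for r in range(5)]
    letter_to_pos = {
        letter: (r, c)
        for r, row in enumerate(matrix)
        for c, letter in enumerate(row)
    }
    return matrix, letter_to_pos


def decrypt_playfair(cipher_text, key_matrix, letter_to_pos):
    """Decrypt the text two letters at a time using the Playfair rules."""
    plain = []
    for i in range(0, len(cipher_text) - 1, 2):
        r1, c1 = letter_to_pos[cipher_text[i]]
        r2, c2 = letter_to_pos[cipher_text[i + 1]]
        if r1 == r2:
            # Same row: step one column left, wrapping around.
            plain += [key_matrix[r1][(c1 - 1) % 5], key_matrix[r2][(c2 - 1) % 5]]
        elif c1 == c2:
            # Same column: step one row up, wrapping around.
            plain += [key_matrix[(r1 - 1) % 5][c1], key_matrix[(r2 - 1) % 5][c2]]
        else:
            # Rectangle: keep each letter's row, take the other letter's column.
            plain += [key_matrix[r1][c2], key_matrix[r2][c1]]
    return "".join(plain)
```

For example, with the key string PLAYFIREXMBCDGHKNOQSTUVWZ the ciphertext BMODZBXDNABEKUDMUIXMMOUVIF decrypts to HIDETHEGOLDINTHETREXESTUMP, where the X is the filler inserted between the doubled E during encryption.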
Purpose: Scores how "English-like" decrypted text is.
Functions:
- load_ngram_tables(): Loads English frequency tables
- score_text(text, mono_table, bi_table, tri_table): Scores text using n-gram probabilities
Scoring Method: Uses log probabilities of monograms, bigrams, and trigrams. Higher score = more English-like.
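A rough sketch of log-probability scoring, assuming each table is a dict mapping an n-gram to its probability. The equal weighting of the three n-gram sizes and the floor parameter for unseen n-grams are hypothetical choices; the project may weight or smooth differently.

```python
import math


def score_text(text, mono_table, bi_table, tri_table, floor=1e-9):
    """Sum log10 probabilities of every 1-, 2- and 3-gram in the text.

    N-grams missing from a table get the small `floor` probability
    (an assumed smoothing choice) so the logarithm stays defined.
    """
    score = 0.0
    for n, table in ((1, mono_table), (2, bi_table), (3, tri_table)):
        for i in range(len(text) - n + 1):
            score += math.log10(table.get(text[i:i + n], floor))
    return score
```

Scores are negative because every probability is below 1; a score closer to zero means the text looks more like English under these tables.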
Purpose: Searches for the correct Playfair key.
Functions:
- mutate_key(key_str): Swaps two random letters in key
- hill_climb_search(initial_key, cipher_text, score_func, ...): Hill climbing algorithm
- simulated_annealing_search(initial_key, cipher_text, score_func, ...): Simulated annealing algorithm
- search_for_key(cipher_text, english_ngram_tables, ...): Main search function combining both methods
Algorithms:
- Hill Climbing: Accepts only better keys, can get stuck in local optima
- Simulated Annealing: Sometimes accepts worse keys to escape local optima
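The difference between the two algorithms comes down to the acceptance rule. The sketch below uses one loop for both: a temperature of zero gives hill climbing, a positive temperature gives simulated annealing. The accept and search helpers, the rng parameter, and the linear cooling schedule are hypothetical; only mutate_key is named in this README.

```python
import math
import random


def mutate_key(key_str, rng=random):
    """Return a copy of the key with two randomly chosen letters swapped."""
    i, j = rng.sample(range(len(key_str)), 2)
    letters = list(key_str)
    letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)


def accept(candidate_score, current_score, temperature, rng=random):
    """Decide whether to move to the candidate key."""
    if candidate_score >= current_score:
        return True
    if temperature <= 0:
        return False  # hill climbing: never accept a worse key
    # Simulated annealing: accept a worse key with a probability that
    # shrinks as the score gap grows or the temperature falls.
    return rng.random() < math.exp((candidate_score - current_score) / temperature)


def search(initial_key, score_key, steps, start_temp=0.0, rng=random):
    """Run the mutate/score/accept loop and return the best key seen."""
    current, current_score = initial_key, score_key(initial_key)
    best, best_score = current, current_score
    for step in range(steps):
        temperature = start_temp * (1 - step / steps)  # linear cooling
        candidate = mutate_key(current, rng)
        candidate_score = score_key(candidate)
        if accept(candidate_score, current_score, temperature, rng):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score
```

Here score_key would typically decrypt the ciphertext with the candidate key and return its English-likeness score.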
Purpose: Decrypts ciphertext when key is unknown.
How to Run:
python main.py
What It Does:
- Loads ciphertext from cyphertext.txt
- Preprocesses text (removes punctuation, converts to uppercase)
- Analyzes ciphertext frequencies
- Loads English frequency tables
- Searches for key using hill climbing and simulated annealing
- Scores decrypted text using n-gram statistics
- Saves results to decrypted_output.txt
Output:
- Key matrix (5x5 grid)
- Decrypted plaintext
- Frequency match score
- N-gram analysis
When to Use: You have ciphertext but don't know the key or plaintext.
Purpose: Enhanced decryption with multiple mutation strategies.
How to Run:
python main_improve.py
Improvements:
- Multiple mutation strategies (swap, rotate, shuffle)
- Better key initialization
- Strategy analysis and selection
- Improved convergence
When to Use: Standard decryption isn't finding a good key.
Purpose: Recovers the exact Playfair key when you have both plaintext and ciphertext.
How to Run:
python restore_from_open.py
Input Format: microcypheropen.txt with:
- Line 1: Ciphertext
- Line 2: Plaintext
What It Does:
- Reads ciphertext and plaintext pairs
- Extracts digram pairs (plaintext → ciphertext)
- Uses constraint propagation to build key matrix
- Processes pairs to determine letter positions
- Uses backtracking to resolve conflicts
- Verifies recovered key
- Saves key to restored_key.txt
Algorithm: Constraint-based solving with recursive backtracking.
When to Use: You have both plaintext and ciphertext and want to recover the exact key.
Note: Requires sufficient plaintext-ciphertext pairs (typically 20+ digrams).
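As an illustration of the pair-extraction step that feeds the constraint solver, the sketch below builds the plaintext-to-ciphertext digram map and applies two checks that hold for any Playfair key. The helper names are hypothetical, and the plaintext is assumed to be already prepared (uppercase, J folded into I, filler letters inserted) so that it has the same even length as the ciphertext.

```python
def extract_digram_pairs(plain_text, cipher_text):
    """Map each plaintext digram to its ciphertext digram.

    Raises ValueError if the texts cannot come from one Playfair key.
    """
    if len(plain_text) != len(cipher_text) or len(plain_text) % 2:
        raise ValueError("texts must have the same even length")
    mapping = {}
    for i in range(0, len(plain_text), 2):
        plain, cipher = plain_text[i:i + 2], cipher_text[i:i + 2]
        # Under every Playfair rule a letter moves to a different cell.
        if plain[0] == cipher[0] or plain[1] == cipher[1]:
            raise ValueError(f"{plain} -> {cipher}: a letter maps to itself")
        # The same plaintext digram must always give the same ciphertext.
        if mapping.setdefault(plain, cipher) != cipher:
            raise ValueError(f"{plain} maps to two different digrams")
    return mapping


def known_rectangles(mapping):
    """Digrams that must follow the rectangle rule.

    If AB encrypts to CD and CD encrypts back to AB, the pair cannot be a
    row or column case, because shifting twice in a row or column of
    length 5 never returns to the starting cell.
    """
    return {plain for plain, cipher in mapping.items() if mapping.get(cipher) == plain}
```

Each entry in the map constrains where four letters can sit relative to one another, which is the information a backtracking solver would consume.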
Purpose: Finds key that produces decrypted text statistically closest to known plaintext.
How to Run:
python find_key_statistical.py
Input Format: microcypheropen.txt with:
- Line 1: Ciphertext
- Line 2: Plaintext
What It Does:
- Loads ciphertext and plaintext
- Uses optimization algorithms (hill climbing, simulated annealing)
- Scores keys by statistical distance between decrypted text and plaintext
- Compares:
  - Monogram frequencies
  - Bigram frequencies
  - Character-level accuracy
- Runs multiple trials to find best key
- Saves best key to restored_key.txt
Statistical Metrics:
- Chi-square distance
- Total variation distance
- Character accuracy
- KL divergence
When to Use: You have plaintext and want to find a key that produces statistically similar decryption (even if not exact).
Advantages: Works even with partial or noisy plaintext.
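The statistical metrics listed above can be sketched as follows for letter distributions. All function names and the eps smoothing term are hypothetical; how the script combines the metrics into one score is not described here, so no combination is shown.

```python
import math
from collections import Counter


def distribution(text):
    """Relative frequency of each symbol in the text."""
    counts = Counter(text)
    total = sum(counts.values())
    return {symbol: count / total for symbol, count in counts.items()}


def total_variation(p, q):
    """Half the sum of absolute differences; 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def chi_square(observed, expected, eps=1e-9):
    """Squared differences scaled by the expected frequency."""
    keys = set(observed) | set(expected)
    return sum(
        (observed.get(k, 0.0) - expected.get(k, 0.0)) ** 2 / (expected.get(k, 0.0) + eps)
        for k in keys
    )


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), with eps guarding against symbols missing from q."""
    return sum(pv * math.log(pv / (q.get(k, 0.0) + eps)) for k, pv in p.items() if pv > 0)


def character_accuracy(decrypted, reference):
    """Fraction of positions where the two texts agree."""
    matches = sum(a == b for a, b in zip(decrypted, reference))
    return matches / max(len(decrypted), len(reference), 1)
```

The three distances are lower-is-better, while character accuracy is higher-is-better, so any combined score has to flip one side.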
Purpose: Normalizes n-gram frequency files by converting counts to probabilities.
How to Run:
python normalize/normalize_ngrams.py
What It Does:
- Reads english_2grams.csv and english_3grams.csv
- Calculates total frequency sum
- Divides each frequency by total to get probability
- Saves normalized versions to *_norm.csv files
When to Use: After updating frequency data files, or to create normalized versions for scoring.
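The normalization itself is a single division per row. The sketch below assumes each CSV row has the form ngram,count with no header line; the real file layout is not documented here, and both function names are hypothetical.

```python
import csv


def normalize_counts(counts):
    """Convert raw counts to probabilities that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty or all-zero table")
    return {ngram: count / total for ngram, count in counts.items()}


def normalize_file(src_path, dst_path):
    """Read ngram,count rows and write ngram,probability rows."""
    with open(src_path, newline="") as src:
        # Assumed layout: two columns, no header row.
        counts = {row[0]: float(row[1]) for row in csv.reader(src) if len(row) >= 2}
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for ngram, probability in normalize_counts(counts).items():
            writer.writerow([ngram, probability])
```

Under these assumptions, calling normalize_file("english_2grams.csv", "english_2grams_norm.csv") would produce the normalized bigram table.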
- Place your ciphertext in cyphertext.txt
- Run: python main.py
- Check decrypted_output.txt for results
- If results are poor, try main_improve.py for better search
Option A: Exact Key Recovery
- Create microcypheropen.txt:
  - Line 1: ciphertext
  - Line 2: plaintext
- Run: python restore_from_open.py
- Check restored_key.txt for recovered key
Option B: Statistical Optimization
- Create microcypheropen.txt (same format)
- Run: python find_key_statistical.py
- Check restored_key.txt for statistically best key
- Update CSV files with new frequency data
- Run: python normalize/normalize_ngrams.py
- Normalized files will be used for scoring
Preprocessing:
- Removes punctuation and spaces
- Converts to uppercase
- Treats J as I (Playfair convention)
- Records original formatting positions
Frequency Analysis:
- Counts digrams in ciphertext
- Compares to English bigram frequencies
- Identifies common patterns
Key Search:
- Starts with random key
- Mutates key (swaps letters)
- Decrypts with new key
- Scores decrypted text
- Accepts better keys (hill climbing)
- Sometimes accepts worse keys (simulated annealing)
Scoring:
- Uses English n-gram probabilities
- Monograms: single letter frequencies
- Bigrams: letter pair frequencies
- Trigrams: letter triple frequencies
- Higher score = more English-like
Output:
- Best key found
- Decrypted plaintext
- Frequency match analysis
- Formatted with original punctuation
Pair Extraction:
- Processes plaintext-ciphertext pairs
- Extracts digram relationships
- Determines encryption rule (row/column/rectangle)
Constraint Building:
- For each pair, determines letter positions
- Builds constraints on key matrix
- Propagates constraints
Solving:
- Uses backtracking to resolve conflicts
- Tries all valid placements
- Verifies consistency
Verification:
- Tests recovered key
- Encrypts plaintext to verify ciphertext match
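The verification step amounts to encrypting the known plaintext with the candidate key and comparing the result to the known ciphertext. A compact sketch, with hypothetical helper names and the assumption that the plaintext is already prepared (uppercase, even length, no doubled letters within a pair):

```python
def encrypt_with_key(plain_text, key_string):
    """Encrypt prepared plaintext with a 25-letter key read row by row."""
    pos = {letter: divmod(i, 5) for i, letter in enumerate(key_string)}
    out = []
    for i in range(0, len(plain_text) - 1, 2):
        (r1, c1), (r2, c2) = pos[plain_text[i]], pos[plain_text[i + 1]]
        if r1 == r2:
            c1, c2 = (c1 + 1) % 5, (c2 + 1) % 5  # same row: step right
        elif c1 == c2:
            r1, r2 = (r1 + 1) % 5, (r2 + 1) % 5  # same column: step down
        else:
            c1, c2 = c2, c1  # rectangle: swap columns
        out.append(key_string[r1 * 5 + c1] + key_string[r2 * 5 + c2])
    return "".join(out)


def verify_key(key_string, plain_text, cipher_text):
    """True if the key reproduces the known ciphertext exactly."""
    return encrypt_with_key(plain_text, key_string) == cipher_text
```

A recovered key that passes this check reproduces every known digram. Note that cyclically rotating the rows or columns of a Playfair grid yields an equivalent key, so the recovered grid may differ from the original in layout while still passing.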
- Python 3.6+
- CSV files with English frequency data:
  - english_1grams.csv
  - english_2grams.csv
  - english_3grams.csv
  - english_2grams_norm.csv (generated)
  - english_3grams_norm.csv (generated)
- Standard Decryption: May take 5-15 minutes depending on ciphertext length
- Key Recovery: Can be fast (< 1 minute) with sufficient pairs, or slow with few pairs
- Statistical Optimization: Typically 5-10 minutes for thorough search
All scripts print progress updates during execution.
Problem: Decryption produces gibberish
- Solution: Try main_improve.py or increase search iterations
Problem: Key recovery fails
- Solution: Ensure you have enough plaintext-ciphertext pairs (20+ digrams recommended)
Problem: Statistical optimization has low accuracy
- Solution: Increase number of trials or iterations in find_key_statistical.py
Problem: Missing CSV files
- Solution: Ensure all frequency data files are present in the project directory
This project is for educational and research purposes.