mtiutin/playfaircryptoanalysis


Playfair Cipher Decryption Project

This project provides multiple approaches to decrypt Playfair cipher text using frequency analysis, statistical optimization, and constraint-based key recovery.

Overview

The Playfair cipher is a digraphic substitution cipher that encrypts pairs of letters. This project implements several methods to:

  • Decrypt ciphertext when the key is unknown (using frequency analysis)
  • Recover the key when both plaintext and ciphertext are known
  • Find the key whose decryption is statistically closest to a known plaintext

Project Structure

playfairgame/
├── Core Modules
│   ├── preprocessing.py          # Text cleaning and formatting
│   ├── frequency_analysis.py     # N-gram frequency analysis
│   ├── playfair_cipher.py        # Playfair encryption/decryption
│   ├── scoring.py                # English-likeness scoring
│   └── key_search.py             # Key search algorithms
│
├── Main Scripts
│   ├── main.py                   # Standard decryption (ciphertext only)
│   ├── main_improve.py           # Improved decryption with strategies
│   ├── restore_from_open.py      # Key recovery from plaintext-ciphertext pairs
│   └── find_key_statistical.py   # Statistical optimization approach
│
├── Data Files
│   ├── cyphertext.txt            # Encrypted message to decrypt
│   ├── microcypheropen.txt       # Ciphertext and plaintext pairs
│   ├── english_1grams.csv        # English letter frequencies
│   ├── english_2grams.csv        # English bigram frequencies
│   ├── english_2grams_norm.csv   # Normalized bigram frequencies
│   ├── english_3grams.csv        # English trigram frequencies
│   └── english_3grams_norm.csv   # Normalized trigram frequencies
│
├── Utilities
│   └── normalize/
│       └── normalize_ngrams.py   # Normalizes n-gram frequency files
│
└── Output Files
    ├── decrypted_output.txt      # Decryption results
    └── restored_key.txt          # Recovered key

Core Modules

preprocessing.py

Purpose: Handles loading and cleaning ciphertext/plaintext.

Functions:

  • load_ciphertext(filepath): Reads ciphertext from a file
  • preprocess_text(raw_text): Removes punctuation, converts to uppercase, treats J as I
  • restore_formatting(decrypted_text, non_alpha_positions): Restores original formatting

Usage: Used by all main scripts for text preprocessing.
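
As a rough illustration of the cleaning and formatting round trip described above, here is a minimal sketch. The function names come from the list above, but the return shape of preprocess_text (a tuple of cleaned letters plus a list of recorded positions) is an assumption, not the module's confirmed interface.

```python
def preprocess_text(raw_text):
    """Return (letters, non_alpha_positions) for a raw message.

    Assumed return shape: the cleaned A-Z text (J folded into I) and a list
    of (index_in_raw_text, character) for everything that was dropped.
    """
    letters = []
    non_alpha_positions = []
    for index, char in enumerate(raw_text):
        if char.isascii() and char.isalpha():
            upper = char.upper()
            letters.append("I" if upper == "J" else upper)
        else:
            non_alpha_positions.append((index, char))
    return "".join(letters), non_alpha_positions


def restore_formatting(decrypted_text, non_alpha_positions):
    """Re-insert the recorded characters at their original indices."""
    chars = list(decrypted_text)
    # Positions are in ascending order, so each insert restores the
    # original index of everything that follows it.
    for index, char in non_alpha_positions:
        chars.insert(index, char)
    return "".join(chars)
```

Letter case and the J/I distinction are not recovered by this sketch; only punctuation and spacing are put back.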


frequency_analysis.py

Purpose: Analyzes letter and digram frequencies in text.

Functions:

  • count_monograms(text): Counts single letter frequencies
  • count_digrams(text): Counts letter pair frequencies
  • get_english_ngram_data(): Loads English frequency tables from CSV files
  • analyze_frequency_match(decrypted_text, ...): Compares decrypted text to English patterns

Usage: Provides frequency data for scoring and analysis.
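
The counting functions can be sketched in a few lines. The step parameter below is an assumption added for illustration: Playfair encrypts non-overlapping pairs, so ciphertext digrams are counted two letters at a time, while English bigram tables are usually built from a sliding window.

```python
from collections import Counter


def count_monograms(text):
    """Count how often each single letter occurs."""
    return Counter(text)


def count_digrams(text, step=2):
    """Count letter pairs.

    step=2 counts the non-overlapping pairs Playfair operates on;
    step=1 counts every sliding-window bigram instead.
    """
    return Counter(text[i:i + 2] for i in range(0, len(text) - 1, step))
```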


playfair_cipher.py

Purpose: Implements Playfair cipher encryption and decryption.

Functions:

  • create_key_matrix(key_string): Creates 5x5 key matrix from 25-letter string
  • decrypt_playfair(cipher_text, key_matrix, letter_to_pos): Decrypts ciphertext
  • encrypt_playfair(plain_text, key_matrix, letter_to_pos): Encrypts plaintext
  • find_position(letter, letter_to_pos): Finds letter position in key matrix

Playfair Rules:

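  • Same row: shift right (encrypt) or left (decrypt), wrapping around the row
  • Same column: shift down (encrypt) or up (decrypt), wrapping around the column
  • Rectangle: each letter stays in its row and takes the column of the other letter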
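
The three rules can be sketched as follows. The function names and parameters match the list above, but the bodies, the transform_pair helper, and the tuple returned by create_key_matrix are assumptions made for illustration. The text is assumed to be already prepared: even length, no J, and no digram with two identical letters.

```python
def create_key_matrix(key_string):
    """Return (key_matrix, letter_to_pos) for a 25-letter key string."""
    if len(key_string) != 25 or len(set(key_string)) != 25:
        raise ValueError("key must be 25 distinct letters")
    key_matrix = [list(key_string[row * 5:row * 5 + 5]) for row in range(5)]
    letter_to_pos = {
        letter: (row, col)
        for row, letters in enumerate(key_matrix)
        for col, letter in enumerate(letters)
    }
    return key_matrix, letter_to_pos


def transform_pair(first, second, key_matrix, letter_to_pos, shift):
    """Apply the Playfair rules to one digram; shift=+1 encrypts, -1 decrypts."""
    r1, c1 = letter_to_pos[first]
    r2, c2 = letter_to_pos[second]
    if r1 == r2:  # same row: move along the row, wrapping around
        return key_matrix[r1][(c1 + shift) % 5] + key_matrix[r2][(c2 + shift) % 5]
    if c1 == c2:  # same column: move along the column, wrapping around
        return key_matrix[(r1 + shift) % 5][c1] + key_matrix[(r2 + shift) % 5][c2]
    # rectangle: keep the row, take the other letter's column
    return key_matrix[r1][c2] + key_matrix[r2][c1]


def _transform_text(text, key_matrix, letter_to_pos, shift):
    return "".join(
        transform_pair(text[i], text[i + 1], key_matrix, letter_to_pos, shift)
        for i in range(0, len(text) - 1, 2)
    )


def encrypt_playfair(plain_text, key_matrix, letter_to_pos):
    return _transform_text(plain_text, key_matrix, letter_to_pos, +1)


def decrypt_playfair(cipher_text, key_matrix, letter_to_pos):
    return _transform_text(cipher_text, key_matrix, letter_to_pos, -1)
```

With the key string PLAYFIREXMBCDGHKNOQSTUVWZ, encrypting HIDETHEGOLDINTHETREXESTUMP gives BMODZBXDNABEKUDMUIXMMOUVIF, and decrypting that ciphertext returns the original text.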

scoring.py

Purpose: Scores how "English-like" decrypted text is.

Functions:

  • load_ngram_tables(): Loads English frequency tables
  • score_text(text, mono_table, bi_table, tri_table): Scores text using n-gram probabilities

Scoring Method: Uses log probabilities of monograms, bigrams, and trigrams. Higher score = more English-like.
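
A minimal sketch of log-probability scoring, assuming each table is a dict mapping an n-gram to its probability. The FLOOR constant for unseen n-grams and the weights parameter are assumptions for illustration and are not taken from scoring.py.

```python
import math

FLOOR = 1e-9  # assumed probability for n-grams missing from a table


def ngram_log_score(text, table, n):
    """Sum of log probabilities of every sliding n-gram in text."""
    return sum(
        math.log(table.get(text[i:i + n], FLOOR))
        for i in range(len(text) - n + 1)
    )


def score_text(text, mono_table, bi_table, tri_table, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of monogram, bigram, and trigram log scores."""
    tables = ((mono_table, 1), (bi_table, 2), (tri_table, 3))
    return sum(
        weight * ngram_log_score(text, table, n)
        for weight, (table, n) in zip(weights, tables)
    )
```

Scores are negative and grow more negative with text length, so they are only comparable between candidate decryptions of the same ciphertext.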


key_search.py

Purpose: Searches for the correct Playfair key.

Functions:

  • mutate_key(key_str): Swaps two random letters in key
  • hill_climb_search(initial_key, cipher_text, score_func, ...): Hill climbing algorithm
  • simulated_annealing_search(initial_key, cipher_text, score_func, ...): Simulated annealing algorithm
  • search_for_key(cipher_text, english_ngram_tables, ...): Main search function combining both methods

Algorithms:

  • Hill Climbing: Accepts only better keys, can get stuck in local optima
  • Simulated Annealing: Sometimes accepts worse keys to escape local optima
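
The difference between the two algorithms comes down to the acceptance rule. The sketch below is not the code in key_search.py: the accept and anneal helpers, the linear cooling schedule, and the default parameters are all assumptions. Setting start_temp to 0 reduces it to plain hill climbing.

```python
import math
import random


def mutate_key(key_str, rng=random):
    """Swap two randomly chosen letters of the key."""
    i, j = rng.sample(range(len(key_str)), 2)
    letters = list(key_str)
    letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)


def accept(delta, temperature, rng=random):
    """Decide whether to keep a candidate; delta = candidate score - current score."""
    if delta > 0:
        return True  # better keys are always accepted
    if temperature <= 0:
        return False  # hill climbing: never accept a key that is not better
    # simulated annealing: accept worse keys with probability exp(delta / T)
    return rng.random() < math.exp(delta / temperature)


def anneal(initial_key, score_key, steps=10000, start_temp=20.0, rng=random):
    """Search for a high-scoring key; score_key maps a key string to a number."""
    current = best = initial_key
    current_score = best_score = score_key(current)
    for step in range(steps):
        temperature = start_temp * (1 - step / steps)  # assumed linear cooling
        candidate = mutate_key(current, rng)
        candidate_score = score_key(candidate)
        if accept(candidate_score - current_score, temperature, rng):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score
```

In practice score_key would decrypt the ciphertext with the candidate key and return its English-likeness score.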

Main Scripts

1. main.py - Standard Decryption

Purpose: Decrypts ciphertext when key is unknown.

How to Run:

python main.py

What It Does:

  1. Loads ciphertext from cyphertext.txt
  2. Preprocesses text (removes punctuation, converts to uppercase)
  3. Analyzes ciphertext frequencies
  4. Loads English frequency tables
  5. Searches for key using hill climbing and simulated annealing
  6. Scores decrypted text using n-gram statistics
  7. Saves results to decrypted_output.txt

Output:

  • Key matrix (5x5 grid)
  • Decrypted plaintext
  • Frequency match score
  • N-gram analysis

When to Use: You have ciphertext but don't know the key or plaintext.


2. main_improve.py - Improved Decryption

Purpose: Enhanced decryption with multiple mutation strategies.

How to Run:

python main_improve.py

Improvements:

  • Multiple mutation strategies (swap, rotate, shuffle)
  • Better key initialization
  • Strategy analysis and selection
  • Improved convergence

When to Use: Standard decryption isn't finding a good key.
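
The README does not spell out what the swap, rotate, and shuffle strategies do. The sketch below shows one plausible reading, with every function name, the row-based interpretation of rotate and shuffle, and the selection weights being assumptions.

```python
import random


def swap_letters(key, rng):
    """Exchange two randomly chosen letters."""
    i, j = rng.sample(range(25), 2)
    letters = list(key)
    letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)


def rotate_row(key, rng):
    """Cyclically shift one randomly chosen row by one position."""
    start = rng.randrange(5) * 5
    row = key[start:start + 5]
    return key[:start] + row[1:] + row[:1] + key[start + 5:]


def shuffle_row(key, rng):
    """Randomly permute the letters of one row."""
    start = rng.randrange(5) * 5
    row = list(key[start:start + 5])
    rng.shuffle(row)
    return key[:start] + "".join(row) + key[start + 5:]


# Hypothetical weights: mostly small swaps, occasionally a larger move.
STRATEGIES = {
    "swap": (swap_letters, 0.8),
    "rotate": (rotate_row, 0.1),
    "shuffle": (shuffle_row, 0.1),
}


def mutate(key, rng=random):
    """Pick a strategy by weight and return (strategy_name, new_key)."""
    names = list(STRATEGIES)
    weights = [STRATEGIES[name][1] for name in names]
    name = rng.choices(names, weights=weights, k=1)[0]
    return name, STRATEGIES[name][0](key, rng)
```

Returning the strategy name alongside the new key makes it possible to tally which strategies produce accepted moves, which is one way to support strategy analysis.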


3. restore_from_open.py - Key Recovery from Known Pairs

Purpose: Recovers the exact Playfair key when you have both plaintext and ciphertext.

How to Run:

python restore_from_open.py

Input Format: microcypheropen.txt with:

  • Line 1: Ciphertext
  • Line 2: Plaintext

What It Does:

  1. Reads ciphertext and plaintext pairs
  2. Extracts digram pairs (plaintext → ciphertext)
  3. Uses constraint propagation to build key matrix
  4. Processes pairs to determine letter positions
  5. Uses backtracking to resolve conflicts
  6. Verifies recovered key
  7. Saves key to restored_key.txt

Algorithm: Constraint-based solving with recursive backtracking.

When to Use: You have both plaintext and ciphertext and want to recover the exact key.

Note: Requires sufficient plaintext-ciphertext pairs (typically 20+ digrams).


4. find_key_statistical.py - Statistical Optimization

Purpose: Finds the key whose decryption is statistically closest to the known plaintext.

How to Run:

python find_key_statistical.py

Input Format: microcypheropen.txt with:

  • Line 1: Ciphertext
  • Line 2: Plaintext

What It Does:

  1. Loads ciphertext and plaintext
  2. Uses optimization algorithms (hill climbing, simulated annealing)
  3. Scores keys by statistical distance between decrypted text and plaintext
  4. Compares:
    • Monogram frequencies
    • Bigram frequencies
    • Character-level accuracy
  5. Runs multiple trials to find best key
  6. Saves best key to restored_key.txt

Statistical Metrics:

  • Chi-square distance
  • Total variation distance
  • Character accuracy
  • KL divergence
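
The four metrics can be sketched on plain dict distributions. These are the textbook definitions, not necessarily the exact forms used in find_key_statistical.py; the eps smoothing term and the choice of denominator in character_accuracy are assumptions.

```python
import math
from collections import Counter


def distribution(text, n=1):
    """Relative frequency of each sliding n-gram in text."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def chi_square(p, q, eps=1e-9):
    """Chi-square distance of observed p from expected q."""
    keys = set(p) | set(q)
    return sum((p.get(k, 0) - q.get(k, 0)) ** 2 / (q.get(k, 0) + eps) for k in keys)


def total_variation(p, q):
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in set(p) | set(q))


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), with eps guarding against zero probabilities in q."""
    return sum(pv * math.log(pv / (q.get(k, 0) + eps)) for k, pv in p.items() if pv > 0)


def character_accuracy(decrypted, plaintext):
    """Fraction of positions where the two texts agree."""
    longest = max(len(decrypted), len(plaintext))
    if longest == 0:
        return 0.0
    return sum(a == b for a, b in zip(decrypted, plaintext)) / longest
```

A key would then be scored by comparing distribution(decrypted) against distribution(plaintext) with one or more of these distances, where lower distance and higher accuracy are better.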

When to Use: You have plaintext and want to find a key that produces statistically similar decryption (even if not exact).

Advantages: Works even with partial or noisy plaintext.


Utility Scripts

normalize/normalize_ngrams.py

Purpose: Normalizes n-gram frequency files by converting counts to probabilities.

How to Run:

python normalize/normalize_ngrams.py

What It Does:

  1. Reads english_2grams.csv and english_3grams.csv
  2. Calculates total frequency sum
  3. Divides each frequency by total to get probability
  4. Saves normalized versions to *_norm.csv files
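
The normalization step amounts to dividing each count by the total. The sketch below assumes a headerless two-column CSV layout (n-gram, count); the real files may be laid out differently, and the function names are hypothetical.

```python
import csv


def normalize_counts(counts):
    """Convert a dict of n-gram counts into probabilities that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {ngram: count / total for ngram, count in counts.items()}


def normalize_file(src_path, dst_path):
    """Read 'ngram,count' rows (assumed layout) and write 'ngram,probability' rows."""
    with open(src_path, newline="") as src:
        counts = {row[0]: float(row[1]) for row in csv.reader(src) if len(row) >= 2}
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for ngram, probability in normalize_counts(counts).items():
            writer.writerow([ngram, probability])
```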

When to Use: After updating frequency data files, or to create normalized versions for scoring.


Step-by-Step Usage Guide

Scenario 1: Decrypt Unknown Ciphertext

  1. Place your ciphertext in cyphertext.txt
  2. Run:
    python main.py
  3. Check decrypted_output.txt for results
  4. If results are poor, try main_improve.py for better search

Scenario 2: Recover Key from Known Plaintext-Ciphertext

Option A: Exact Key Recovery

  1. Create microcypheropen.txt:
    • Line 1: ciphertext
    • Line 2: plaintext
  2. Run:
    python restore_from_open.py
  3. Check restored_key.txt for recovered key

Option B: Statistical Optimization

  1. Create microcypheropen.txt (same format)
  2. Run:
    python find_key_statistical.py
  3. Check restored_key.txt for statistically best key

Scenario 3: Update Frequency Data

  1. Update CSV files with new frequency data
  2. Run:
    python normalize/normalize_ngrams.py
  3. Normalized files will be used for scoring

How It Works

Standard Decryption Process

  1. Preprocessing:

    • Removes punctuation and spaces
    • Converts to uppercase
    • Treats J as I (Playfair convention)
    • Records original formatting positions
  2. Frequency Analysis:

    • Counts digrams in ciphertext
    • Compares to English bigram frequencies
    • Identifies common patterns
  3. Key Search:

    • Starts with random key
    • Mutates key (swaps letters)
    • Decrypts with new key
    • Scores decrypted text
    • Accepts better keys (hill climbing)
    • Sometimes accepts worse keys (simulated annealing)
  4. Scoring:

    • Uses English n-gram probabilities
    • Monograms: single letter frequencies
    • Bigrams: letter pair frequencies
    • Trigrams: letter triple frequencies
    • Higher score = more English-like
  5. Output:

    • Best key found
    • Decrypted plaintext
    • Frequency match analysis
    • Formatted with original punctuation

Key Recovery Process

  1. Pair Extraction:

    • Processes plaintext-ciphertext pairs
    • Extracts digram relationships
    • Determines encryption rule (row/column/rectangle)
  2. Constraint Building:

    • For each pair, determines letter positions
    • Builds constraints on key matrix
    • Propagates constraints
  3. Solving:

    • Uses backtracking to resolve conflicts
    • Tries all valid placements
    • Verifies consistency
  4. Verification:

    • Tests recovered key
    • Encrypts plaintext to verify ciphertext match
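
Pair extraction and the first constraints can be sketched as below. This is not the solver in restore_from_open.py; the function names are hypothetical, and the plaintext is assumed to be already split into the same digrams that were encrypted. The checks rely on three Playfair properties: a digram always encrypts the same way, reversing a plaintext digram reverses its ciphertext digram, and no letter encrypts to itself.

```python
def split_digrams(text):
    """Split text into non-overlapping letter pairs."""
    return [text[i:i + 2] for i in range(0, len(text) - 1, 2)]


def extract_pair_map(plaintext, ciphertext):
    """Map each plaintext digram to its ciphertext digram, rejecting impossible data."""
    if len(plaintext) != len(ciphertext):
        raise ValueError("plaintext and ciphertext must have equal length")
    mapping = {}
    for plain, cipher in zip(split_digrams(plaintext), split_digrams(ciphertext)):
        if plain[0] == cipher[0] or plain[1] == cipher[1]:
            raise ValueError(f"{plain}->{cipher}: a letter cannot encrypt to itself")
        if mapping.setdefault(plain, cipher) != cipher:
            raise ValueError(f"{plain} maps to both {mapping[plain]} and {cipher}")
    for plain, cipher in mapping.items():
        mirrored = mapping.get(plain[::-1])
        if mirrored is not None and mirrored != cipher[::-1]:
            raise ValueError(f"{plain} and {plain[::-1]} encrypt inconsistently")
    return mapping


def adjacency_hints(mapping):
    """Return (x, y) pairs where y directly follows x in some row or column.

    If a ciphertext letter equals the other plaintext letter, the rectangle
    rule is ruled out, so both plaintext letters share a row or a column.
    """
    hints = []
    for plain, cipher in mapping.items():
        if cipher[0] == plain[1]:
            hints.append((plain[0], plain[1]))
        if cipher[1] == plain[0]:
            hints.append((plain[1], plain[0]))
    return hints
```

Here "follows" means one step right in a row or one step down in a column, with wraparound. Hints like these give the backtracking search fixed relationships to start from before it tries full placements.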

Requirements

  • Python 3.6+
  • CSV files with English frequency data:
    • english_1grams.csv
    • english_2grams.csv
    • english_3grams.csv
    • english_2grams_norm.csv (generated)
    • english_3grams_norm.csv (generated)
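
A quick way to confirm the data files are in place before a long run is a small check like the one below. The helper name is hypothetical; the file names are the ones listed above.

```python
from pathlib import Path

REQUIRED_FILES = (
    "english_1grams.csv",
    "english_2grams.csv",
    "english_3grams.csv",
    "english_2grams_norm.csv",
    "english_3grams_norm.csv",
)


def missing_data_files(directory="."):
    """Return the names of required frequency files not found in directory."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

If only the two *_norm.csv files are reported missing, running normalize/normalize_ngrams.py should generate them.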

Performance Notes

  • Standard Decryption: May take 5-15 minutes depending on ciphertext length
  • Key Recovery: Can be fast (< 1 minute) with sufficient pairs, or slow with few pairs
  • Statistical Optimization: Typically 5-10 minutes for thorough search

All scripts print progress updates during execution.


Troubleshooting

Problem: Decryption produces gibberish

  • Solution: Try main_improve.py or increase search iterations

Problem: Key recovery fails

  • Solution: Ensure you have enough plaintext-ciphertext pairs (20+ digrams recommended)

Problem: Statistical optimization has low accuracy

  • Solution: Increase number of trials or iterations in find_key_statistical.py

Problem: Missing CSV files

  • Solution: Ensure all frequency data files are present in the project directory

License

This project is for educational and research purposes.
