kbhujbal/Handwritten_Text_Extraction_OCR

Handwritten Text Extraction OCR

A deep learning-based Optical Character Recognition (OCR) system for extracting text from handwritten documents using Connectionist Temporal Classification (CTC) and Convolutional Neural Networks (CNN).

Features

  • Handwritten text recognition using deep learning
  • Uses CTC loss for sequence prediction, so no per-character alignment between image and text is required
  • Implements a CNN + bidirectional LSTM architecture for robust text recognition
  • Supports variable-length text sequences
  • Preprocessing pipeline for image normalization and frame splitting
  • Batch processing capabilities for efficient training

Architecture

The model architecture consists of:

  • Time-distributed CNN layers for feature extraction
  • Bidirectional LSTM layers for sequence modeling
  • CTC loss function for sequence prediction
  • Batch normalization for improved training stability
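To show how a CTC-trained network's per-frame outputs turn into a label sequence, the sketch below implements greedy (best-path) decoding: take the most likely class at each time step, merge consecutive repeats, then drop the blank symbol. This is a generic illustration, not the decoding code in CTCModel.py; the function name `ctc_greedy_decode` and the choice of blank index are assumptions.

```python
import numpy as np


def ctc_greedy_decode(probs, blank):
    """Best-path CTC decoding.

    probs: array of shape (time_steps, num_classes) with per-frame class scores.
    blank: index of the CTC blank class (an assumption; frameworks differ on
           whether the blank is the first or the last class).
    """
    best_path = np.argmax(probs, axis=1)
    decoded = []
    previous = None
    for label in best_path:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(int(label))
        previous = label
    return decoded


if __name__ == "__main__":
    # One-hot scores for the frame-wise path 1 1 _ 1 2 2 _ (blank = 0).
    scores = np.eye(3)[[1, 1, 0, 1, 2, 2, 0]]
    print(ctc_greedy_decode(scores, blank=0))
```

The blank between the two runs of class 1 is what lets CTC emit a doubled character; without it the two runs would be merged into one.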

Requirements

  • Python 3.x
  • Keras
  • OpenCV
  • NumPy
  • Pandas
  • TensorFlow (backend for Keras)

Project Structure

Handwritten_Text_Extraction_OCR/
├── Data/                 # Dataset directory
│   ├── list.csv          # Dataset annotations
│   └── class.txt         # Character classes
├── CTCModel.py           # CTC model implementation
├── configuration.py      # Model configuration parameters
├── model.py              # Main model architecture
├── read_images.py        # Image preprocessing utilities
└── LICENSE               # MIT License

Usage

  1. Prepare your dataset:

    • Place images in the Data directory
    • Update list.csv with image paths and annotations
    • Ensure class.txt contains all character classes
  2. Configure parameters in configuration.py:

    • Set window dimensions
    • Adjust batch size and epochs
    • Configure model parameters
  3. Train the model:

    python model.py
  4. Running the script will:

    • Preprocess images
    • Train on the dataset
    • Save the trained model
    • Evaluate performance
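For the dataset preparation in step 1, annotations have to be mapped to integer class indices and padded to a common length before they can be batched for CTC training. The sketch below shows one way this could look. It assumes class.txt holds one character per line, which the repository does not confirm; the helper names `build_charset` and `encode_batch` and the padding value of -1 are hypothetical.

```python
import numpy as np


def build_charset(class_lines):
    """Map each character to an integer index (one character per line assumed)."""
    chars = [line.rstrip("\n") for line in class_lines if line.rstrip("\n")]
    return {ch: i for i, ch in enumerate(chars)}


def encode_batch(texts, char_to_index, pad_value=-1):
    """Encode variable-length strings as a padded integer matrix.

    Returns (labels, lengths): labels has shape (batch, max_len) and lengths
    holds the true length of each sequence so the padding can be ignored.
    """
    lengths = np.array([len(t) for t in texts], dtype=np.int64)
    labels = np.full((len(texts), int(lengths.max())), pad_value, dtype=np.int64)
    for row, text in enumerate(texts):
        labels[row, :len(text)] = [char_to_index[ch] for ch in text]
    return labels, lengths


if __name__ == "__main__":
    charset = build_charset(["a\n", "b\n", "c\n"])  # stand-in for the lines of class.txt
    labels, lengths = encode_batch(["ab", "cab"], charset)
    print(labels)
    print(lengths)
```

A character that is missing from the class list raises a `KeyError` here, which is a cheap way to check that class.txt really covers every character used in the annotations.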

Model Training

The training process includes:

  • Image preprocessing and normalization
  • Frame splitting for sequence processing
  • CTC loss optimization
  • Performance evaluation using:
    • Loss metrics
    • Label Error Rate (LER)
    • Sequence Error Rate (SER)
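The two error rates can be sketched in plain Python. The definitions used here are the common ones and are an assumption about this project: LER is the total edit distance between predictions and references divided by the total number of reference labels, and SER is the fraction of sequences containing at least one error. The function names are illustrative and not taken from the repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def label_error_rate(refs, hyps):
    """Total edit distance divided by the total number of reference labels."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_labels = sum(len(r) for r in refs)
    return total_edits / total_labels


def sequence_error_rate(refs, hyps):
    """Fraction of sequences that are not reproduced exactly."""
    wrong = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return wrong / len(refs)


if __name__ == "__main__":
    references = ["hello", "world"]
    predictions = ["hallo", "world"]
    print(label_error_rate(references, predictions))
    print(sequence_error_rate(references, predictions))
```

LER gives partial credit for nearly correct lines, while SER counts a line as wrong if even one label differs, so SER is usually the higher of the two.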

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Kunal Bhujbal

About

πŸ“ Deep learning OCR system for handwritten text recognition using CTC loss & CNN-BiLSTM architecture. Features include image preprocessing with adaptive frame splitting, time-distributed convolutional layers for feature extraction, bidirectional LSTM for sequence modeling, batch normalization for training stability. LER & SER performance metrics
