A deep learning-based Optical Character Recognition (OCR) system for extracting text from handwritten documents, built on a Convolutional Neural Network (CNN) with recurrent layers and Connectionist Temporal Classification (CTC).
- Handwritten text recognition using deep learning
- Utilizes CTC (Connectionist Temporal Classification) for sequence prediction
- Implements a CNN + LSTM architecture for robust text recognition
- Supports variable-length text sequences
- Preprocessing pipeline for image normalization and frame splitting
- Batch processing capabilities for efficient training
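The normalization and frame-splitting step can be sketched as follows. This is a minimal illustration using only NumPy; the frame width of 8 is a placeholder, and the project's actual preprocessing lives in read_images.py with dimensions set in configuration.py:

```python
import numpy as np

def preprocess(image, frame_width=8):
    """Normalize a grayscale image to [0, 1] and split it into
    fixed-width vertical frames for sequential processing.

    frame_width=8 is illustrative, not the project's setting.
    """
    img = image.astype(np.float32) / 255.0  # scale pixels to [0, 1]
    height, width = img.shape
    # Pad the width so it divides evenly into frames.
    pad = (-width) % frame_width
    img = np.pad(img, ((0, 0), (0, pad)), constant_values=0.0)
    n_frames = img.shape[1] // frame_width
    # Each frame is a (height, frame_width) slice, ordered left to right.
    return img.reshape(height, n_frames, frame_width).transpose(1, 0, 2)

frames = preprocess(np.zeros((32, 100), dtype=np.uint8))
# 100 columns padded to 104 -> 13 frames of width 8
```

Splitting the image into frames is what turns a 2-D picture into the time-ordered sequence the recurrent layers and CTC loss operate on.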
The model architecture consists of:
- Time-distributed CNN layers for feature extraction
- Bidirectional LSTM layers for sequence modeling
- CTC loss function for sequence prediction
- Batch normalization for improved training stability
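The architecture above can be sketched in Keras roughly as follows. Layer counts, filter sizes, and the frame dimensions are illustrative assumptions, not the values used in CTCModel.py or configuration.py; training would additionally attach a CTC loss (e.g. `tf.nn.ctc_loss`) to the softmax output:

```python
from tensorflow.keras import layers, models

def build_model(frame_height=32, frame_width=8, n_classes=80):
    # Input: a variable-length sequence of grayscale frames (time steps).
    inputs = layers.Input(shape=(None, frame_height, frame_width, 1))
    # Time-distributed CNN: the same conv stack runs on every frame.
    x = layers.TimeDistributed(
        layers.Conv2D(32, 3, padding="same", activation="relu"))(inputs)
    x = layers.TimeDistributed(layers.BatchNormalization())(x)
    x = layers.TimeDistributed(layers.MaxPooling2D(2))(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    # Bidirectional LSTM models left and right context of each frame.
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    # Per-frame distribution over characters plus the CTC blank label.
    outputs = layers.Dense(n_classes + 1, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_model()
```

Note the `+ 1` on the output layer: CTC reserves one extra class for the blank symbol it uses to separate repeated characters.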
- Python 3.x
- Keras
- OpenCV
- NumPy
- Pandas
- TensorFlow (backend for Keras)
Handwritten_Text_Extraction_OCR/
├── Data/                 # Dataset directory
│   ├── list.csv          # Dataset annotations
│   └── class.txt         # Character classes
├── CTCModel.py           # CTC model implementation
├── configuration.py      # Model configuration parameters
├── model.py              # Main model architecture
├── read_images.py        # Image preprocessing utilities
└── LICENSE               # MIT License
1. Prepare your dataset:
   - Place images in the Data directory
   - Update list.csv with image paths and annotations
   - Ensure class.txt contains all character classes
2. Configure parameters in configuration.py:
   - Set window dimensions
   - Adjust batch size and epochs
   - Configure model parameters
3. Train the model:
   python model.py
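The dataset-preparation step maps each transcription to an integer label sequence for CTC. A minimal sketch of that encoding, using an in-memory stand-in for Data/class.txt (the real file's exact format is defined by the repository, so the one-character-per-line layout here is an assumption):

```python
import io

def load_classes(fh):
    """Read one character class per line from a class.txt-style file.

    Returns a char -> integer label map; CTC reserves one extra index
    (len(classes)) for its blank symbol.
    """
    classes = [line.rstrip("\n") for line in fh]
    return {c: i for i, c in enumerate(classes)}

def encode(text, char_to_label):
    """Turn a transcription string into an integer label sequence."""
    return [char_to_label[c] for c in text]

# Example with an in-memory stand-in for Data/class.txt.
char_to_label = load_classes(io.StringIO("a\nb\nc\n"))
labels = encode("cab", char_to_label)  # -> [2, 0, 1]
```

In the actual pipeline, list.csv would supply the (image path, transcription) pairs that feed this encoding, e.g. loaded with pandas.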
The model will:
- Preprocess images
- Train on the dataset
- Save the trained model
- Evaluate performance
The training process includes:
- Image preprocessing and normalization
- Frame splitting for sequence processing
- CTC loss optimization
- Performance evaluation using:
  - Loss metrics
  - Label Error Rate (LER)
  - Sequence Error Rate (SER)
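The two error rates have standard definitions: LER is total edit distance between predicted and reference label sequences divided by total reference length, and SER is the fraction of sequences with at least one error. A self-contained sketch (the function names are illustrative, not the project's):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def ler(refs, hyps):
    """Label Error Rate: total edit distance / total reference length."""
    return (sum(edit_distance(r, h) for r, h in zip(refs, hyps))
            / sum(len(r) for r in refs))

def ser(refs, hyps):
    """Sequence Error Rate: fraction of sequences with any error."""
    return sum(r != h for r, h in zip(refs, hyps)) / len(refs)

refs = [[1, 2, 3], [4, 5]]
hyps = [[1, 2, 3], [4, 6]]  # one substitution in the second sequence
print(ler(refs, hyps), ser(refs, hyps))  # 0.2 0.5
```

Note how one substitution yields a small LER (1 error over 5 labels) but a large SER (1 of 2 sequences wrong), which is why both metrics are reported.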
This project is licensed under the MIT License - see the LICENSE file for details.
Kunal Bhujbal