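Optical Character Recognition (OCR) is a technology that converts printed or handwritten text in scanned documents and images into machine-readable text. A typical OCR pipeline has four stages: image preprocessing, text detection, character recognition, and post-processing.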
-
Image Preprocessing
The input image is cleaned to improve text visibility. Common steps:
- Grayscale conversion
- Noise removal
- Binarization (convert to black and white)
- Deskewing and resizing
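Two of the steps above, grayscale conversion and binarization, can be sketched in plain Python. This is a minimal illustration, not a production pipeline: it assumes the image is a list of rows of (R, G, B) tuples, uses Otsu's method as one common way to pick the binarization threshold, and leaves out noise removal, deskewing, and resizing. The function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values."""
    # ITU-R BT.601 luma weights, a common choice for grayscale conversion.
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


def otsu_threshold(gray_image):
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    histogram = [0] * 256
    for row in gray_image:
        for value in row:
            histogram[value] += 1

    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += histogram[level]
        if weight_bg == 0:
            continue  # no pixels at or below this level yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += level * histogram[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (total_sum - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(gray_image, threshold):
    """Map pixels at or below the threshold to 0 (ink), the rest to 255 (paper)."""
    return [[0 if v <= threshold else 255 for v in row] for row in gray_image]
```

In practice these operations are usually done with an image library rather than nested lists; the sketch only shows what each step computes.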
-
Text Detection
- The algorithm identifies regions of interest (ROIs) that contain text.
- Modern OCR systems use deep learning models (such as EAST or CRAFT) to locate text areas accurately.
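To make the idea of locating text regions concrete, here is a small sketch of a classical technique, the horizontal projection profile, which finds the row ranges that contain ink. It is not EAST or CRAFT; those are trained neural networks and cannot be reproduced in a few lines. The sketch assumes a binarized image given as rows of 0 (ink) and 255 (background), and the function name and `min_ink_pixels` parameter are hypothetical.

```python
def find_text_lines(binary_image, min_ink_pixels=1):
    """Return inclusive (top, bottom) row ranges that contain ink.

    binary_image: rows of pixels, where 0 is ink and 255 is background.
    """
    bands = []
    start = None
    for y, row in enumerate(binary_image):
        ink = sum(1 for pixel in row if pixel == 0)
        if ink >= min_ink_pixels:
            if start is None:
                start = y  # a new band of text rows begins here
        elif start is not None:
            bands.append((start, y - 1))  # the band ended on the previous row
            start = None
    if start is not None:
        # The last band runs to the bottom edge of the image.
        bands.append((start, len(binary_image) - 1))
    return bands
```

This only works on clean, horizontally aligned pages, which is one reason learned detectors are preferred for photos and complex layouts.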
-
Character Recognition
- Each detected text region is analyzed to recognize characters or words.
- Traditional OCR uses pattern matching or feature extraction.
- Modern systems use deep learning models (CNNs, LSTMs, Transformers), which are generally more accurate on varied fonts and noisy input.
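The traditional pattern-matching approach can be sketched as follows: each candidate glyph is compared pixel by pixel against stored templates, and the closest template wins. The 3x5 bitmaps, the three-letter alphabet, and the function names are all made up for illustration; real systems use much larger templates or extracted features.

```python
# Hypothetical 3x5 templates; "1" is ink, "0" is background.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def pixel_distance(glyph, template):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(
        1
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
        if g != t
    )


def recognize_glyph(glyph, templates=TEMPLATES):
    """Return (character, distance) for the template closest to the glyph."""
    best_char = min(templates, key=lambda ch: pixel_distance(glyph, templates[ch]))
    return best_char, pixel_distance(glyph, templates[best_char])
```

A glyph with a little noise still maps to the nearest template, but the approach breaks down when fonts, sizes, or alignment differ from the templates, which is the gap that learned models address.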
-
Post-processing
- Corrects errors and reconstructs structured text.
- Includes spell-checking, language modeling, and format preservation (like paragraphs or tables).
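As a rough illustration of the spell-checking step, the sketch below replaces words that are not in a known vocabulary with their closest vocabulary entry, using `difflib.get_close_matches` from the Python standard library. The vocabulary, the `cutoff` value, and the function name are assumptions chosen for the example; real post-processing typically relies on larger dictionaries or language models.

```python
import difflib


def correct_words(text, vocabulary, cutoff=0.75):
    """Replace out-of-vocabulary words with their closest vocabulary match.

    vocabulary: list of lowercase words considered valid.
    Words with no sufficiently similar match are left unchanged.
    Corrected words come back in lowercase, as stored in the vocabulary.
    """
    corrected = []
    for word in text.split():
        lowered = word.lower()
        if lowered in vocabulary:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)
```

For example, with a vocabulary of invoice terms, typical OCR confusions such as "l" for "i" or "1" for "l" can be repaired, while unrelated tokens pass through untouched.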
-
OCR converts scanned or photographed documents (PDFs, images) into:
- Editable text (TXT, DOCX, etc.)
- Searchable PDFs
- Extracted structured data (e.g., names, dates, numbers)
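Common OCR tools and services:
- Tesseract OCR (open source; originally developed at HP, later sponsored by Google)
- EasyOCR (open source, deep-learning-based)
- PaddleOCR (open source, deep-learning-based)
- AWS Textract, Google Vision API, Azure OCR (cloud services)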
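The structured-data output mentioned above (dates, numbers) is often pulled out of the recognized text with simple patterns. The sketch below assumes ISO-style dates (YYYY-MM-DD) and amounts with exactly two decimal places; both patterns and the function name are hypothetical, and extracting names generally needs more than regular expressions.

```python
import re

# Assumed formats for this example only.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_PATTERN = re.compile(r"\b\d+\.\d{2}\b")


def extract_fields(text):
    """Return the dates and decimal amounts found in OCR output text."""
    return {
        "dates": DATE_PATTERN.findall(text),
        "amounts": [float(a) for a in AMOUNT_PATTERN.findall(text)],
    }
```

Applied to text such as an invoice header and total line, this returns a dictionary with a list of date strings and a list of amounts as floats.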