ramesh6762/OCR_API


🧠 Overview: OCR Text Extraction

Optical Character Recognition (OCR) is a technology that converts printed, handwritten, or scanned text in images or documents into machine-readable text.


⚙️ How It Works

  1. Image Preprocessing

    • The input image is cleaned to improve text visibility.

    • Common steps:

      • Grayscale conversion
      • Noise removal
      • Binarization (convert to black and white)
      • Deskewing and resizing
  2. Text Detection

    • The detector identifies regions of interest (ROIs) that contain text.
    • Modern OCR systems use deep learning models (such as EAST or CRAFT) to locate text areas accurately.
  3. Character Recognition

    • Each detected text region is analyzed to recognize characters or words.
    • Traditional OCR uses pattern matching or feature extraction.
    • Modern systems use deep learning (CNNs, LSTMs, Transformers) for better accuracy.
  4. Post-processing

    • Corrects errors and reconstructs structured text.
    • Includes spell-checking, language modeling, and format preservation (like paragraphs or tables).
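
Two of the stages above (binarization during preprocessing and dictionary-based correction during post-processing) can be sketched in plain Python. This is a minimal illustration, not this repository's implementation: the function names, the nested-list image representation, and the use of Otsu's method and `difflib` are all assumptions chosen to keep the example dependency-free.

```python
import difflib


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def otsu_threshold(gray_image):
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    histogram = [0] * 256
    for row in gray_image:
        for value in row:
            histogram[value] += 1

    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    background_count, background_sum = 0, 0
    for level in range(256):
        background_count += histogram[level]
        if background_count == 0:
            continue  # no pixels at or below this level yet
        foreground_count = total - background_count
        if foreground_count == 0:
            break  # every pixel is already in the background class
        background_sum += level * histogram[level]
        mean_bg = background_sum / background_count
        mean_fg = (weighted_total - background_sum) / foreground_count
        variance = background_count * foreground_count * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(gray_image, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if v > threshold else 0 for v in row] for row in gray_image]


def correct_tokens(tokens, vocabulary, cutoff=0.8):
    """Replace each unknown token with its closest vocabulary word, if any."""
    corrected = []
    for token in tokens:
        word = token.lower()
        if word in vocabulary:
            corrected.append(token)
            continue
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected
```

In practice these steps are usually delegated to an image library and a language model; the sketch only shows the underlying idea. A typical call would be `binarize(gray, otsu_threshold(gray))`, followed by `correct_tokens(...)` on the recognized words.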

📄 Output

  • OCR converts scanned or photographed documents (PDFs, images) into:

    • Editable text (TXT, DOCX, etc.)
    • Searchable PDFs
    • Extracted structured data (e.g., names, dates, numbers)
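
As a rough illustration of the structured-data output, the sketch below pulls dates and decimal amounts out of already-recognized text with regular expressions. The patterns, the `extract_fields` name, and the sample field formats (ISO dates, two-decimal amounts) are assumptions for this example. Names are left out because they generally need a named-entity recognizer rather than a regular expression.

```python
import re

# Assumed formats: ISO dates (YYYY-MM-DD) and amounts with two decimals.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_PATTERN = re.compile(r"\b\d+\.\d{2}\b")


def extract_fields(text):
    """Return the dates and amounts found in OCR output text."""
    return {
        "dates": DATE_PATTERN.findall(text),
        "amounts": [float(match) for match in AMOUNT_PATTERN.findall(text)],
    }


if __name__ == "__main__":
    sample = "Invoice date: 2024-03-15\nTotal due: 149.50\nPaid: 20.00"
    print(extract_fields(sample))
```

Real documents vary widely in layout and locale, so patterns like these would need adapting to each document type.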

⚙️ Common Tools and Libraries

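  • Tesseract OCR (open-source; originally developed at HP, later sponsored by Google)
  • EasyOCR (deep learning–based)
  • PaddleOCR (open-source, deep learning–based)
  • Cloud services: Amazon Textract, Google Cloud Vision API, Azure AI Vision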
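
For orientation, a minimal call into Tesseract from Python might look like the sketch below. It assumes the `pytesseract` and `Pillow` packages are installed and the Tesseract binary is on the system path. The `extract_text` wrapper and the `scan.png` file name are hypothetical and are not taken from this repository.

```python
def extract_text(image_path, language="eng"):
    """Run Tesseract on one image file and return the recognized text."""
    # Imported lazily so this module loads even without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=language)


if __name__ == "__main__":
    # "scan.png" is a placeholder; point this at a real image to try it.
    print(extract_text("scan.png"))
```

The other libraries in the list expose their own interfaces, so this sketch applies to Tesseract only.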
