A simple, accurate resume parser for Python. Extract structured data from PDF, DOCX, and TXT resumes with high accuracy.

wespiper/pyresume

🚀 LeverParser

A Python library for parsing resumes with Lever ATS compatibility. Extract structured data from resumes with high accuracy.

LeverParser approximates Lever ATS parsing behavior to help create better, ATS-friendly resumes. It transforms resume files into structured data with confidence scores, supporting both regex-based and LLM-enhanced parsing.

✨ Why LeverParser?

  • 🎯 Lever ATS Compatible: Approximates Lever's parsing behavior for ATS optimization
  • 🔒 Privacy-First: Parses resumes locally; nothing is sent to external services unless you enable the optional LLM integration
  • ⚡ Lightning Fast: Process resumes in under 2 seconds with high accuracy
  • 🤖 LLM Enhanced: Optional integration with OpenAI/Anthropic for complex formats
  • 📊 Confidence Scores: Know how well each section was parsed
  • 🔧 Developer-Friendly: Simple API, comprehensive documentation, and type hints throughout

📊 Performance at a Glance

| Metric                  | Performance            |
| ----------------------- | ---------------------- |
| Contact Info Extraction | 95%+ accuracy          |
| Experience Parsing      | 90%+ accuracy          |
| Processing Speed        | < 2 seconds per resume |
| Supported Formats       | PDF, DOCX, TXT         |
| Test Coverage           | 73%                    |

🚀 Quick Start

Installation

pip install leverparser

Basic Usage

from leverparser import ResumeParser

# Initialize the parser
parser = ResumeParser()

# Parse a resume file
resume = parser.parse('resume.pdf')

# Access structured data
print(f"Name: {resume.contact_info.name}")
print(f"Email: {resume.contact_info.email}")
print(f"Experience: {resume.get_years_experience()} years")
print(f"Skills: {len(resume.skills)} found")

# Get detailed work history
for job in resume.experience:
    print(f"• {job.title} at {job.company} ({job.start_date} - {job.end_date or 'Present'})")

Parse Text Directly

resume_text = """
John Smith
john.smith@email.com
(555) 123-4567

EXPERIENCE
Senior Software Engineer
Tech Corporation, San Francisco, CA
January 2020 - Present
• Led development of microservices architecture
• Mentored team of 5 junior developers
"""

resume = parser.parse_text(resume_text)
print(f"Parsed resume for: {resume.contact_info.name}")

🎯 Key Features

📋 Comprehensive Data Extraction

  • Contact Information: Name, email, phone, LinkedIn, GitHub, address
  • Professional Experience: Job titles, companies, dates, responsibilities, locations
  • Education: Degrees, institutions, graduation dates, GPAs, honors
  • Skills: Categorized by type (programming, tools, languages, etc.)
  • Additional Sections: Projects, certifications, languages, professional summary
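
Several of the contact fields listed above lend themselves to plain pattern matching. The following is a minimal sketch of regex-based extraction for email, phone, and profile links. The patterns and the `extract_contact` helper are illustrative assumptions, not LeverParser's actual implementation, and the phone pattern only covers North American style numbers.

```python
import re

# Illustrative patterns only; LeverParser's real patterns may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
LINKEDIN_RE = re.compile(r"linkedin\.com/in/[A-Za-z0-9_-]+", re.IGNORECASE)
GITHUB_RE = re.compile(r"github\.com/[A-Za-z0-9_-]+", re.IGNORECASE)


def first_match(pattern, text):
    """Return the first substring matching pattern, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


def extract_contact(text):
    """Hypothetical helper: collect the first hit for each contact field."""
    return {
        "email": first_match(EMAIL_RE, text),
        "phone": first_match(PHONE_RE, text),
        "linkedin": first_match(LINKEDIN_RE, text),
        "github": first_match(GITHUB_RE, text),
    }


sample = "John Smith\njohn.smith@email.com\n(555) 123-4567\ngithub.com/jsmith"
contact = extract_contact(sample)
print(contact)
```

Fields with no match come back as `None`, which keeps incomplete resumes from raising errors.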

πŸ” Smart Pattern Recognition

  • Multiple Date Formats: "Jan 2020", "January 2020", "01/2020", "Present", "Current"
  • Flexible Formatting: Handles various resume layouts and section headers
  • International Support: Recognizes global phone formats and address patterns
  • Robust Parsing: Gracefully handles incomplete or malformed resumes
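
To illustrate the date formats listed above, here is a minimal sketch that normalizes them with the standard library. The `parse_resume_date` function and its format list are assumptions made for this example and are not taken from the library's `dates.py`. Month names are matched in the process locale, which is assumed to be English.

```python
from datetime import date, datetime
from typing import Optional

# Formats mirror the examples listed above; the list is not exhaustive.
DATE_FORMATS = ("%b %Y", "%B %Y", "%m/%Y")
OPEN_ENDED = {"present", "current"}


def parse_resume_date(raw: str) -> Optional[date]:
    """Return the first day of the given month, or None for an open-ended value.

    Raises ValueError when the text matches none of the known formats.
    """
    cleaned = raw.strip()
    if cleaned.lower() in OPEN_ENDED:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognized date: {raw!r}")


for text in ("Jan 2020", "January 2020", "01/2020", "Present"):
    print(text, "->", parse_resume_date(text))
```

Returning `None` only for "Present" and "Current" and raising on anything else keeps an ongoing job distinguishable from a date that failed to parse.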

📈 Confidence Scoring

Every extraction includes confidence scores to help you assess data quality:

from pyresume.examples.confidence_scores import ConfidenceAnalyzer

analyzer = ConfidenceAnalyzer()
analysis = analyzer.analyze_resume_confidence(resume)

print(f"Overall Confidence: {analysis['overall_confidence']:.2%}")
print(f"Contact Info: {analysis['section_confidence']['contact_info']:.2%}")
print(f"Experience: {analysis['section_confidence']['experience']:.2%}")
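
One way an overall score could be derived from per-section scores is a weighted mean. This is only a sketch: the `overall_confidence` function, the weights, and the sample values are hypothetical, and the real `ConfidenceAnalyzer` may combine scores differently.

```python
# Hypothetical section weights; the real analyzer may weigh sections differently.
SECTION_WEIGHTS = {"contact_info": 0.3, "experience": 0.4, "education": 0.2, "skills": 0.1}


def overall_confidence(section_confidence, weights=SECTION_WEIGHTS):
    """Weighted mean over the sections present in both mappings."""
    shared = [name for name in section_confidence if name in weights]
    total_weight = sum(weights[name] for name in shared)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(section_confidence[name] * weights[name] for name in shared)
    return weighted_sum / total_weight


# Made-up sample values, for illustration only.
scores = {"contact_info": 0.95, "experience": 0.90, "education": 0.85, "skills": 0.80}
print(f"Overall Confidence: {overall_confidence(scores):.2%}")
```

Dividing by the weight of the sections actually present means a resume with no education section is not penalized twice.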

πŸ“ Supported File Formats

Format Extension Requirements
PDF .pdf pip install pdfplumber
Word .docx pip install python-docx
Text .txt Built-in support
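
The format table above implies a dispatch on file extension, with the PDF and Word dependencies needed only when those formats are used. A minimal sketch of that idea follows. The `extract_text` function is a hypothetical stand-in and not the library's own extractor code; it relies only on the public `pdfplumber.open` and `docx.Document` calls.

```python
from pathlib import Path


def extract_text(path):
    """Return the raw text of a resume file, importing optional dependencies lazily."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")
    if suffix == ".pdf":
        import pdfplumber  # optional: pip install pdfplumber

        with pdfplumber.open(str(path)) as pdf:
            # extract_text() can return None for pages without a text layer.
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if suffix == ".docx":
        import docx  # optional: pip install python-docx

        document = docx.Document(str(path))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    raise ValueError(f"Unsupported resume format: {suffix or '(none)'}")
```

Because the imports sit inside the branches, a plain-text workflow runs without either optional package installed.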

πŸ—οΈ Architecture

PyResume uses a modular architecture for maximum flexibility:

pyresume/
├── parser.py          # Main ResumeParser class
├── models/
│   └── resume.py      # Data models (Resume, Experience, Education, etc.)
├── extractors/
│   ├── pdf.py         # PDF file extraction
│   ├── docx.py        # Word document extraction
│   └── text.py        # Plain text extraction
└── utils/
    ├── patterns.py    # Regex patterns for parsing
    ├── dates.py       # Date parsing utilities
    └── phones.py      # Phone number formatting

🔧 Advanced Usage

Batch Processing

Process multiple resumes efficiently:

from pyresume.examples.batch_processing import ResumeBatchProcessor

processor = ResumeBatchProcessor()
results = processor.process_directory('resumes/', recursive=True)

# Generate reports
processor.generate_csv_report('analysis.csv')
processor.generate_json_report('analysis.json')
processor.print_analytics()
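
For readers who want to see what directory processing involves, here is a minimal standard-library sketch. `find_resumes` and `process_all` are hypothetical helpers written for this example and are not part of `ResumeBatchProcessor`; the `parse` argument is any callable, such as `parser.parse`.

```python
from pathlib import Path

SUPPORTED = {".pdf", ".docx", ".txt"}


def find_resumes(directory, recursive=True):
    """Return supported resume files under directory, sorted by path."""
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in SUPPORTED)


def process_all(paths, parse):
    """Apply parse to each path, collecting failures instead of stopping."""
    results, failures = {}, {}
    for path in paths:
        try:
            results[path] = parse(path)
        except Exception as exc:  # one bad file should not abort the batch
            failures[path] = str(exc)
    return results, failures
```

Keeping failures in a separate mapping makes it easy to report which files need manual review after a large run.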

Custom Skill Categories

Extend the built-in skill recognition:

# Load and customize skill categories
from pyresume.data.skills import SKILL_CATEGORIES

# Add custom skills
SKILL_CATEGORIES['frameworks'].extend(['FastAPI', 'Streamlit'])

# Parse with enhanced skill detection
resume = parser.parse('resume.pdf')
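
To show how a category map like this can drive skill detection, here is a minimal sketch. The `CATEGORIES` dictionary and `find_skills` function are hypothetical stand-ins and do not reflect how the library matches skills internally.

```python
import re

# Hypothetical category map standing in for SKILL_CATEGORIES; contents are illustrative.
CATEGORIES = {
    "programming": ["Python", "Go", "C++"],
    "frameworks": ["Django", "FastAPI", "Streamlit"],
}


def find_skills(text, categories=CATEGORIES):
    """Map each category to the skills that appear in text as whole tokens."""
    found = {}
    for category, skills in categories.items():
        hits = []
        for skill in skills:
            # Lookarounds instead of \b so names ending in symbols (C++) still match.
            pattern = rf"(?<![A-Za-z0-9]){re.escape(skill)}(?![A-Za-z0-9+#])"
            if re.search(pattern, text, re.IGNORECASE):
                hits.append(skill)
        if hits:
            found[category] = hits
    return found


found = find_skills("Built FastAPI services in Python and Go; maintained C++ tooling.")
print(found)
```

Adding an entry to a category list is then enough for the new skill to be detected, which is the effect the `extend` call above aims for.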

Export Options

Convert parsed data to various formats:

# Convert to dictionary
resume_dict = resume.to_dict()

# Export to JSON
import json
with open('resume.json', 'w') as f:
    json.dump(resume_dict, f, indent=2, default=str)

# Create summary
summary = {
    'name': resume.contact_info.name,
    'experience_years': resume.get_years_experience(),
    'skills': [skill.name for skill in resume.skills],
    'companies': [exp.company for exp in resume.experience]
}

🆚 Why Choose PyResume?

| Feature      | PyResume              | Competitors           |
| ------------ | --------------------- | --------------------- |
| Privacy      | ✅ Local processing    | ❌ Cloud-based APIs    |
| Cost         | ✅ Free & open source  | ❌ Usage-based pricing |
| Dependencies | ✅ Minimal (3 core)    | ❌ Heavy ML frameworks |
| Accuracy     | ✅ 95%+ contact info   | ⚠️ Varies             |
| Speed        | ✅ < 2 seconds         | ⚠️ Network dependent  |
| Offline      | ✅ Works anywhere      | ❌ Requires internet   |

📊 Real-World Performance

Based on testing with 100+ diverse resume samples:

  • Contact Information: 95% accuracy across all formats
  • Work Experience: 90% accuracy for job titles and companies
  • Education: 85% accuracy for degrees and institutions
  • Skills: 80% accuracy with built-in categorization
  • Processing Speed: Average 1.2 seconds per resume

🧪 Installation Options

Minimal Installation

pip install leverparser

With PDF Support

pip install "leverparser[pdf]"
# or
pip install leverparser pdfplumber

With All Features

pip install "leverparser[all]"

Development Installation

git clone https://github.com/wespiper/leverparser.git
cd leverparser
pip install -e ".[dev]"

📖 Documentation

🛠️ Development & Testing

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=pyresume --cov-report=html

# Run specific test categories
pytest tests/test_basic_functionality.py -v

Code Quality

# Format code
black pyresume/

# Lint code
flake8 pyresume/

# Type checking
mypy pyresume/

🤝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Add tests for your changes
  4. Ensure tests pass: pytest
  5. Submit a pull request

Areas We'd Love Help With

  • 🌍 Internationalization: Support for non-English resumes
  • πŸ” ML Integration: Optional machine learning enhancements
  • πŸ“Š Performance: Optimization for large-scale processing
  • πŸ§ͺ Testing: Additional test fixtures and edge cases
  • πŸ“š Documentation: Examples and tutorials

πŸ—ΊοΈ Roadmap

v0.2.0 (Coming Soon)

  • CLI Interface: Command-line tool for batch processing
  • Template Detection: Automatic resume template recognition
  • Enhanced Skills: Expanded skill database with synonyms
  • Performance Metrics: Built-in benchmarking tools

v0.3.0 (Future)

  • OCR Support: Extract text from image-based PDFs
  • Machine Learning: Optional ML models for improved accuracy
  • API Server: REST API wrapper for web applications
  • Multi-language: Support for Spanish, French, German resumes

v1.0.0 (Stable Release)

  • Production Ready: Full API stability guarantee
  • Enterprise Features: Advanced configuration options
  • Performance: Sub-second processing for most resumes
  • Comprehensive Docs: Complete tutorials and guides

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Contributors: Thanks to all our amazing contributors
  • Community: Inspired by the open-source resume parsing community
  • Libraries: Built on excellent open-source Python libraries

📞 Support & Community


Made with ❤️ by the PyResume Team
Parsing resumes so you don't have to!