A flexible Python web scraper that lets you:
- Input any URL at runtime
- Preview available HTML tags and select which elements to scrape
- Export scraped data to CSV or Excel
- Choose tags dynamically (no hardcoded tag list)
- Confirm before final scraping
- Recover gracefully from invalid input
- See the saved file location at the end
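As an illustration of accepting a URL at runtime and recovering from invalid input, here is a minimal sketch. It uses only the standard library; the function names `is_valid_url` and `prompt_for_url` are hypothetical and are not taken from `scraper.py`, whose actual checks may differ.

```python
from urllib.parse import urlparse


def is_valid_url(text):
    """Return True if `text` looks like an absolute http(s) URL."""
    try:
        parsed = urlparse(text.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs (e.g. "http://[").
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def prompt_for_url(read=input):
    """Ask until a valid URL is entered; return None if the user types q.

    `read` is injectable so the loop can be exercised without a terminal.
    """
    while True:
        answer = read("Enter a URL to scrape (q to quit): ").strip()
        if answer.lower() == "q":
            return None
        if is_valid_url(answer):
            return answer
        print("That does not look like a valid http(s) URL. Please try again.")
```

Calling `prompt_for_url()` with no arguments reads from the keyboard. Re-prompting instead of raising is one way to keep a bad entry from crashing the script.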
This scraper does not currently support JavaScript-rendered pages.
Support for JS-rendered pages (via Selenium or Playwright) is planned for a future release.
- Dynamic Tag Detection: Pre-scrapes the page and lists all available HTML tags
- User-Controlled Scraping: Select which tags you want to scrape
- Multiple Export Options: Save as CSV or Excel
- Error Handling: Handles invalid choices without crashing
- Clear Exit Options: Press q anytime to quit
- File Path Confirmation: Confirms where your files were saved
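The tag detection step can be pictured roughly as follows. This is a sketch under stated assumptions, not the project's code: it swaps in the standard library's `html.parser` so that it runs without extra installs, whereas the scraper itself depends on `beautifulsoup4`. The names `TagCounter` and `list_tags` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening tag encountered in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        self.counts[tag] += 1


def list_tags(html):
    """Return the sorted, de-duplicated tag names present in `html`."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return sorted(parser.counts)


sample = "<html><body><h1>Title</h1><p>One</p><p>Two</p></body></html>"
print(list_tags(sample))
```

Because the list is built from whatever the page contains, no tag names need to be hardcoded.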
- Python 3.8+
- The following Python libraries (see requirements.txt):
  - requests
  - beautifulsoup4
  - pandas
Clone the repository:
git clone https://github.com/YOUR_USERNAME/python-web-scraper.git
cd python-web-scraper
Install dependencies:
pip install -r requirements.txt
Run the scraper:
python scraper.py
- Enter the URL you want to scrape
- The script previews all HTML tags found
- Select tags to scrape (e.g., p, h1, h2)
- Choose CSV or Excel output
- Confirm and scrape
- Files are saved in the current folder, and the path is displayed at the end
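The selection and export steps above might look something like the sketch below. It is an illustration only: it uses the standard library's `csv` module as a stand-in for the pandas-based export the project relies on, it covers CSV but not Excel, and the names `parse_selection`, `save_csv`, and `scraped_data.csv` are hypothetical.

```python
import csv
import os
import tempfile


def parse_selection(raw, available):
    """Split input such as 'p, h1, h2' into tag names.

    Returns (chosen, rejected): tags found on the page, and tags that were not.
    """
    wanted = [part.strip().lower() for part in raw.split(",") if part.strip()]
    chosen = [tag for tag in wanted if tag in available]
    rejected = [tag for tag in wanted if tag not in available]
    return chosen, rejected


def save_csv(rows, filename="scraped_data.csv"):
    """Write (tag, text) rows to a CSV file and return its absolute path."""
    with open(filename, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["tag", "text"])
        writer.writerows(rows)
    return os.path.abspath(filename)


# Demo: write to a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "scraped_data.csv")
    path = save_csv([("h1", "Title"), ("p", "One")], target)
    print(f"Saved to: {path}")
```

Returning the absolute path from the save function makes it straightforward to display the file location at the end of a run.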
- Add support for JavaScript-rendered pages
- Add a Streamlit web interface for easy use
- Deploy on Streamlit Cloud so anyone can try it online
- Support search by CSS selectors or attributes
Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.