Open
Labels
enhancement (New feature or request), help wanted (Extra attention is needed), scraper
Description
A new web scraper we would like to develop would (1) determine whether government websites have a terms of service/privacy policy page, and (2) evaluate how good that page is.
Steps we need to take to actualize these goals (not necessarily in order):
- Verify that the existing software infrastructure for developing/running scrapers is functional. (I've heard @kbalajisrinivas might be useful for this.)
- Get a sense for where government sites tend to keep their terms of service/privacy policy pages
- Define what our metrics and evaluation system are for a "good" terms of service page
- Write a new Python class in `scrapers/scrapers/` that builds upon `base_scraper.py` and contains methods for scraping webpages, finding their terms of service/privacy policy page locations (if they exist), and analyzing their contents (as determined by the previous step)
- Write tests for this new class
I invite anyone to add/modify this list!
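To make the "finding their terms of service/privacy policy page locations" step concrete, here is a rough sketch of one possible approach: scan a page's links for policy-related keywords. Everything in it is an assumption for discussion. `POLICY_KEYWORDS`, `PolicyLinkFinder`, and `find_policy_links` are hypothetical names, the keyword list is a guess until we have surveyed real government sites, and it deliberately does not subclass anything in `base_scraper.py` since this sketch does not rely on that file's interface. It uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed keywords; the real list should come from surveying government sites.
POLICY_KEYWORDS = ("privacy", "terms of service", "terms of use", "terms and conditions")


class PolicyLinkFinder(HTMLParser):
    """Hypothetical helper: collects <a> links whose text or href mentions a policy keyword."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []     # absolute URLs of candidate policy pages, in page order
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Treat "-" and "_" as spaces so "/terms-of-use" matches "terms of use".
            haystack = (" ".join(self._text) + " " + self._href).lower()
            haystack = haystack.replace("-", " ").replace("_", " ")
            if any(keyword in haystack for keyword in POLICY_KEYWORDS):
                url = urljoin(self.base_url, self._href)
                if url not in self.links:
                    self.links.append(url)
            self._href = None


def find_policy_links(html, base_url):
    """Return absolute URLs of links that look like policy pages."""
    finder = PolicyLinkFinder(base_url)
    finder.feed(html)
    return finder.links
```

Called as `find_policy_links(page_html, "https://example.gov/")`, this returns a list of candidate URLs, or an empty list if nothing matches. Keyword matching will miss pages with unusual link text and may flag unrelated links, so it is only a starting point for the location step; the evaluation metrics still need to be defined separately.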