Jeffrey Ware & Roger Camps http://jsdev.learnersguild.org/goals/102-Build_a_Node_Web_Crawler_-_Web_Scraper.html
Web Scraper with Node.js Challenge Rubric
- Have built a full-stack application
- Have worked with databases
- Can build on a tutorial to make it your own
- Can render information from an HTTP call
- Are comfortable with node modules
- Are interested in learning more about Web Scraper technology

Description
“Web scraping is a technique in data extraction where you pull information from websites.” *1
Create a web scraper that gathers information from the web. The tutorial listed below takes you step by step through setting up your own crawler. Modify the tutorial to fit a subject that interests you, and build an app with unique content of your choosing (e.g., a list of movie titles with images, a list of related blogs, or news articles).
Web scrapers are used to pull information from websites where no API is available, or where you’re pulling data from multiple sites at once. A web scraper extracts information from a page's HTML. Once you have the information, you can format and store it in a more structured way, such as a JSON object.
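As a rough illustration of going from HTML to a JSON object, here is a minimal sketch. The markup, the class names, and the `textByClass` helper are all hypothetical and exist only for this example. A regular expression is used so the snippet runs with plain Node.js. For real pages, a dedicated HTML parser is usually the more robust choice.

```javascript
// Hypothetical markup standing in for a page the scraper has already downloaded.
const html = `
  <div class="movie">
    <h1 class="title">Example Movie</h1>
    <span class="director">Jane Doe</span>
    <span class="release">2016</span>
  </div>`;

// Hypothetical helper: return the text inside the first element with the given class.
// The pattern is good enough for this fixed snippet; it is not a general HTML parser.
function textByClass(source, className) {
  const pattern = new RegExp(
    `<(\\w+)[^>]*class="${className}"[^>]*>([\\s\\S]*?)</\\1>`,
    'i'
  );
  const match = pattern.exec(source);
  return match ? match[2].trim() : null;
}

// Collect the extracted pieces into one structured object.
const movie = {
  title: textByClass(html, 'title'),
  director: textByClass(html, 'director'),
  release: textByClass(html, 'release'),
};

// Serialize the object as pretty-printed JSON.
const json = JSON.stringify(movie, null, 2);
console.log(json);
```

The same shape (one plain object per scraped item) carries over regardless of how the HTML is parsed, which keeps the storage and display code independent of the extraction step.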
When designing your scraper, think about the kind of data you want to receive. You can search a single page for information, or search multiple pages at once for a specific type of content (e.g., searching multiple sites to find videos about puppies). Your job is to build a simple web scraper and then modify it to gather information from several different sites to populate your app.
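One possible way to gather content from several sites at once is sketched below. It assumes Node 18 or later, where `fetch` is available as a global. The names `extractTitle`, `fetchHtml`, and `scrapeAll` are hypothetical, and the page loader is passed in as a parameter so it can be swapped for a stub while developing.

```javascript
// Hypothetical helper: pull the <title> text out of an HTML string.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Default page loader; assumes Node 18+ where fetch is a global.
async function fetchHtml(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.text();
}

// Load every URL concurrently. Promise.allSettled waits for all requests,
// so one failing site does not discard the results from the others.
async function scrapeAll(urls, loadPage = fetchHtml) {
  const settled = await Promise.allSettled(urls.map((url) => loadPage(url)));
  return settled.map((result, index) => ({
    url: urls[index],
    title: result.status === 'fulfilled' ? extractTitle(result.value) : null,
    error:
      result.status === 'rejected'
        ? (result.reason instanceof Error ? result.reason.message : String(result.reason))
        : null,
  }));
}
```

Calling `scrapeAll(['https://example.com/'])` returns a promise that resolves to an array with one entry per URL. Recording the error alongside each result, rather than throwing, makes it easier to see which sites failed during a long crawl.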
Web scraping has many practical uses and is a technique that many companies employ today. For example, a scraper can populate a list of movies or collect posts from particular blogs that you like.
This project is also a practical lesson in why efficient code matters, since a running crawler pushes a process to its limits.
Follow the tutorial here: https://scotch.io/tutorials/scraping-the-web-with-node-js
- Artifact produced is a repo
- Can run a command to start the scraper server
- Can scrape a single site for multiple pieces of information (e.g., extracting name, director, and release date from IMDB)
- Can store information in the form of JSON
- Content is generated from multiple sites
- Build an interface to display information extracted
- UI is easy to navigate and intuitive
- Commit messages are concise and descriptive
- Code is well formatted without any linting errors
- Variables, functions, CSS classes, etc. are meaningfully named
- Functions are small and serve a single purpose
- Code is well organized into a meaningful file structure
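
For the item above about storing information in the form of JSON, a minimal sketch using only Node's built-in `fs`, `os`, and `path` modules might look like this. The function names, the sample record, and the temp-directory output path are assumptions for illustration. A real project would pick its own file location.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write scraped records to disk as pretty-printed JSON.
function saveResults(filePath, records) {
  fs.writeFileSync(filePath, JSON.stringify(records, null, 2) + '\n', 'utf8');
}

// Read previously saved records back, or return an empty list
// if nothing has been saved at that path yet.
function loadResults(filePath) {
  if (!fs.existsSync(filePath)) return [];
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Illustrative data and output path (hypothetical).
const outputFile = path.join(os.tmpdir(), 'scraper-output.json');
saveResults(outputFile, [
  { title: 'Example Movie', director: 'Jane Doe', release: '2016' },
]);
console.log(loadResults(outputFile));
```

Keeping the save and load steps in small, separate functions also fits the rubric item about functions serving a single purpose.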
- Recreate the project without jQuery or Request
- Recreate the project with another type of content

Resources
Tutorial: