
Kit2345/abstract-scraper


academic-abstract-scraper

Background

Research papers often involve repetitive tasks, and automation can reduce the time spent on them. When conducting a Systematic Review of the literature, you may need to categorise a large number of papers. One common way to categorise a large volume of papers is by which keywords appear in their abstracts. The papers are often listed in an Excel file as URLs pointing to the web pages that host them. Rather than opening and reading each page by hand, a script can visit every URL and scan the abstract for the keywords of interest.
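
The keyword scan itself can be sketched with Python's standard `re` module. This is a minimal illustration, not code from this repository: `find_keywords` is a hypothetical helper, and it assumes that a case-insensitive, whole-word match is what counts as a keyword being "present" in an abstract.

```python
from __future__ import annotations

import re


def find_keywords(abstract: str, keywords: list[str]) -> dict[str, bool]:
    """Map each keyword to whether it appears in the abstract.

    Matching is case-insensitive and anchored at word boundaries, so
    "model" does not match "modelling". Multi-word keywords such as
    "machine learning" are matched as whole phrases.
    """
    results = {}
    for keyword in keywords:
        # re.escape stops characters such as "." or "+" being read as regex syntax
        pattern = r"\b" + re.escape(keyword) + r"\b"
        results[keyword] = re.search(pattern, abstract, flags=re.IGNORECASE) is not None
    return results


print(find_keywords("We apply Machine Learning to screen papers.",
                    ["machine learning", "screen", "survey"]))
# {'machine learning': True, 'screen': True, 'survey': False}
```

Word boundaries behave awkwardly for keywords that begin or end with punctuation (for example "C++"), so terms like that may need a hand-written pattern.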

Required steps

MVP

  • Extract URLs from an Excel file into Python.
  • Build a web scraping programme that visits each URL and extracts the text of the article's abstract.
  • Return whether each abstract contains a given keyword.
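
The three MVP steps could be sketched as follows. This is an illustration under several assumptions rather than this project's implementation: all function names are hypothetical, the spreadsheet is assumed to hold its URLs in a column named `URL`, the Excel step assumes pandas and openpyxl are installed, and `extract_abstract` assumes the page exposes its abstract in a `<meta>` tag such as `citation_abstract`, which many publisher pages do but some do not.

```python
from __future__ import annotations

import re
import urllib.request
from html.parser import HTMLParser

# Assumed <meta> tag names that may carry an abstract, tried in this order.
# "description" is a last resort: on many sites it is a generic page summary.
ABSTRACT_META_NAMES = ("citation_abstract", "dc.description", "description")


def read_urls(xlsx_path: str, column: str = "URL") -> list[str]:
    """Step 1: read the URL column of the first sheet (column name is assumed)."""
    import pandas as pd  # needs pandas + openpyxl; imported here so parsing works without them

    df = pd.read_excel(xlsx_path)
    return df[column].dropna().astype(str).str.strip().tolist()


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page; some sites reject requests that lack a User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class _MetaCollector(HTMLParser):
    """Record the first content value seen for each <meta name="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name and content and name not in self.meta:
            self.meta[name] = content


def extract_abstract(html: str) -> str | None:
    """Step 2: return the abstract from the page's <meta> tags, or None if absent."""
    parser = _MetaCollector()
    parser.feed(html)
    for name in ABSTRACT_META_NAMES:
        if name in parser.meta:
            return parser.meta[name].strip()
    return None


def contains_keyword(abstract: str, keyword: str) -> bool:
    """Step 3: case-insensitive whole-word match, so "review" does not match "reviews"."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, abstract, re.IGNORECASE) is not None


def check_urls(xlsx_path: str, keyword: str) -> dict[str, bool | None]:
    """Run all three steps; None marks a URL whose page could not be fetched."""
    results = {}
    for url in read_urls(xlsx_path):
        try:
            html = fetch_html(url)
        except (OSError, ValueError):  # network errors/timeouts, or a malformed URL
            results[url] = None
            continue
        results[url] = contains_keyword(extract_abstract(html) or "", keyword)
    return results
```

Pages that build their abstract with JavaScript, or sit behind a login, will yield no abstract with this approach. If the sheet stores links as hyperlinked titles rather than plain URLs, `read_excel` will return the displayed text, so the link targets may need to be read with openpyxl directly.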

EXTENSION

  • Return a dataset showing the presence or absence of each keyword across many articles.
  • Export the analysis to a new Excel file.
  • Identify the schema for locating abstracts on different domains (e.g. Scopus vs ResearchGate).
  • Extract the domain name from a URL automatically.
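
The extension steps could be sketched as below. Again, this is an assumption-laden illustration rather than the repository's code: `domain_of`, `schema_for`, `keyword_matrix` and `export_to_excel` are hypothetical names, the entries in `DOMAIN_SCHEMAS` are unverified placeholders to be confirmed by inspecting real pages on each site, and the Excel export assumes pandas and openpyxl are installed.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Hypothetical per-domain "schema": the <meta> tag names to try, in order,
# when looking for an abstract. These entries are unverified placeholders;
# inspect real pages on each site before relying on them.
DOMAIN_SCHEMAS = {
    "scopus.com": ("citation_abstract", "description"),
    "researchgate.net": ("description",),
}
DEFAULT_SCHEMA = ("citation_abstract", "dc.description", "description")


def domain_of(url: str) -> str:
    """Return the URL's host name, lower-cased and without a leading "www."."""
    # urlparse only finds the host when the URL includes a scheme such as https://
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def schema_for(url: str) -> tuple[str, ...]:
    """Pick the abstract-locating schema for a URL's domain, or a generic default."""
    return DOMAIN_SCHEMAS.get(domain_of(url), DEFAULT_SCHEMA)


def keyword_matrix(abstracts: dict[str, str], keywords: list[str]) -> list[dict]:
    """Build one row per article: its URL, domain and a True/False column per keyword."""
    rows = []
    for url, abstract in abstracts.items():
        row = {"url": url, "domain": domain_of(url)}
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword) + r"\b"
            row[keyword] = re.search(pattern, abstract, re.IGNORECASE) is not None
        rows.append(row)
    return rows


def export_to_excel(rows: list[dict], path: str) -> None:
    """Write the rows to a new .xlsx file (needs pandas and openpyxl)."""
    import pandas as pd  # imported here so the other helpers work without pandas

    pd.DataFrame(rows).to_excel(path, index=False)
```

Keeping one row per article and one True/False column per keyword means the exported sheet can be filtered or pivoted directly in Excel.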

Why Python?

Python is well suited to this kind of data extraction task, especially as we are comfortable running the tool from a command-line interface.
