academic-abstract-scraper

Background

Research papers often involve repetitive tasks. Automation can help reduce time spent on menial tasks. When conducting a Systematic Review of literature, you may need to categorise a great number of papers. One common way of categorising a large volume of papers is by the presence of various keywords in their abstracts. These papers are likely to be stored in an Excel file as URLs, where the papers are hosted on various web pages. To prevent time-consuming analsysis, an algorithm can be used to check each of the websites and scan the abstracts for keywords.

Required steps

MVP

Extract URLs from an Excel file into Python.
Configure a web scraping programme that can go to the URL and scrape the text from the abstract of an article.
Return whether the articles contain that keyword.

EXTENSION

Return a dataset that shows the presence or absence of keywords for many articles.
Export the analysis to a new Excel file.
Identify schema for accessing abstracts on different domains (e.g. Scopus vs ResearchGate)
Extract domain name automatically from a URL

Why Python?

Python is ideal for this kind of data extraction task where we are also happy working in a Command Line Interface.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.gitignore		.gitignore
README.md		README.md
abstracts.csv		abstracts.csv
allAbstracts.py		allAbstracts.py
input.csv		input.csv
script.py		script.py
testRead.py		testRead.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

academic-abstract-scraper

Background

Required steps

Why Python?

About

Uh oh!

Releases

Packages

Languages

Kit2345/abstract-scraper

Folders and files

Latest commit

History

Repository files navigation

academic-abstract-scraper

Background

Required steps

Why Python?

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages