Google Scholar Scraper


Unfortunately, Google Scholar does not support exporting results... I needed the most cited papers for a research project, and after trying an imperfect script I decided to write my own.

Important note: The spiders send no more than 2 requests per second to Google Scholar. Nobody likes solving CAPTCHAs, so it's better to wait a little and act like a human. Changing your IP address from time to time is also a good idea... 😩
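To illustrate the rate limit described above, here is a minimal sketch of a throttle that spaces calls at least half a second apart. It is not taken from this repository: the `Throttle` class and its `wait` method are hypothetical names, and the spiders may enforce the limit in a different way.

```python
import time


class Throttle:
    """Hypothetical helper: allow at most `max_per_second` calls per second."""

    def __init__(self, max_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between two consecutive calls, in seconds.
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each request keeps the request rate at or below the configured limit. The `clock` and `sleep` parameters exist only so the behavior can be checked without real delays.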

Features

  • Supports multiple languages
  • Customizable date range
  • Sorts by number of citations
  • Sorts by year
  • Searches for articles
  • Searches for case law
  • Searches in a profile by ID
  • Graphical interface


Usage

Install the dependencies:

pip install -r requirements.txt

Run the scraper just by typing the keyword:

python core.py "cryptography"

Customize the date range:

python core.py "metaverse" -s 1997 -e 2018

Limit the languages to one or more:

python core.py "medical" -l en es zh-tw fr

Set the output file path:

python core.py "machine learning" -s 2002 -o exports/most_cited_ml_articles_since_2002.csv

Sort the output by year:

python core.py "oceanography" -y

Search for case law:

python core.py "privacy" -c

Get the articles of a specific profile by its user ID:

python core.py "nms69lqaaaaj" -p -o jeff_dean_articles.csv

Make the program quiet:

python core.py "philosophy" -e 1234 -q
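For a quick overview, the flags used in the commands above can be summarized in an `argparse` sketch. This is an assumption-laden illustration, not the actual parser in `core.py`: the short flags come from the examples, while the `dest` names, the help texts, and the reading of `-s`/`-e` as start and end year are guesses.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="core.py")
    # Positional argument: the search keyword, or a profile ID when -p is set.
    parser.add_argument("keyword")
    # Assumed meaning: start and end of the date range.
    parser.add_argument("-s", dest="start_year", type=int)
    parser.add_argument("-e", dest="end_year", type=int)
    # One or more language codes, e.g. en es zh-tw fr.
    parser.add_argument("-l", dest="languages", nargs="+")
    # Output file path.
    parser.add_argument("-o", dest="output")
    # Boolean switches: sort by year, case law, profile search, quiet mode.
    parser.add_argument("-y", dest="sort_by_year", action="store_true")
    parser.add_argument("-c", dest="case_law", action="store_true")
    parser.add_argument("-p", dest="profile", action="store_true")
    parser.add_argument("-q", dest="quiet", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["medical", "-l", "en", "es"])` yields a namespace whose `languages` attribute is `["en", "es"]`.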

Here are some example exports to help you see whether the scraper meets your needs!

License

This project is licensed under the MIT license found in the LICENSE file in the root directory of this repository.