Google Scholar Scraper

Unfortunately, Google Scholar does not support exporting results. I needed the most cited papers for a research project, and after trying an imperfect script I decided to write my own.

Important note: The spiders send no more than 2 requests per second to Google Scholar. Nobody likes solving CAPTCHAs, so it's better to wait a little and act like a human. Changing your IP address from time to time is also a good idea... 😩
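
To make the request limit concrete, here is a minimal sketch of a throttle that spaces calls at least half a second apart. It is not taken from this project's code: the `Throttle` class, its `wait` method, and the injectable `clock` and `sleep` parameters are hypothetical names chosen for illustration.

```python
import time


class Throttle:
    """Space calls so that at most `max_per_second` happen per second.

    Hypothetical helper for illustration; not part of this repository.
    """

    def __init__(self, max_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call; None before the first call

    def wait(self):
        """Sleep just long enough to respect the limit, then record this call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each request would keep a loop under the limit. A real crawler might also add a small random delay so that the request pattern looks less mechanical.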

Features

Supports multiple languages
Customizable date range
Sorts by number of citations
Sorts by year
Searches for articles
Searches for case law
Searches in a profile by ID
Graphical interface

Usage

Install the dependencies:

pip install -r requirements.txt

Run the scraper just by typing the keyword:

python core.py "cryptography"

Customize the date range:

python core.py "metaverse" -s 1997 -e 2018

Limit the results to one or more languages:

python core.py "medical" -l en es zh-tw fr

Set the output file path:

python core.py "machine learning" -s 2002 -o exports/most_cited_ml_articles_since_2002.csv

Sort the output by year:

python core.py "oceanography" -y

Search for case law:

python core.py "privacy" -c

Get the articles of a specific profile by its user ID:

python core.py "nms69lqaaaaj" -p -o jeff_dean_articles.csv

Make the program quiet:

python core.py "philosophy" -e 1234 -q
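
The commands above can also be scripted when several keywords need to be exported in one go. The following is a rough sketch that only uses the flags shown in this section. The function names `build_command` and `run_batch`, the `output_dir` parameter, and the assumption that the flags can be freely combined are mine, not something this README confirms.

```python
import subprocess
import sys
from pathlib import Path


def build_command(keyword, start=None, end=None, output=None, by_year=False,
                  case_law=False, profile=False, quiet=False, languages=None):
    """Build the argument list for one core.py run (hypothetical helper)."""
    cmd = [sys.executable, "core.py", keyword]
    if start is not None:
        cmd += ["-s", str(start)]    # first year of the date range
    if end is not None:
        cmd += ["-e", str(end)]      # last year of the date range
    if output:
        cmd += ["-o", str(output)]   # output file path
    if by_year:
        cmd.append("-y")             # sort by year
    if case_law:
        cmd.append("-c")             # search case law
    if profile:
        cmd.append("-p")             # treat the keyword as a profile ID
    if quiet:
        cmd.append("-q")             # quiet mode
    if languages:
        cmd += ["-l", *languages]    # kept last, as in the example above
    return cmd


def run_batch(keywords, output_dir="exports", **options):
    """Run core.py once per keyword, writing one CSV per keyword."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for keyword in keywords:
        out = Path(output_dir) / (keyword.replace(" ", "_") + ".csv")
        subprocess.run(build_command(keyword, output=out, **options), check=True)
```

For example, running `run_batch(["cryptography", "oceanography"], start=2010, quiet=True)` from the repository root would invoke the scraper twice, one keyword after the other, and stop at the first run that exits with an error.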

Here are some example exports so you can check whether the scraper meets your needs!

License

This project is licensed under the MIT license found in the LICENSE file in the root directory of this repository.
