Scholar Crawler

A work in progress. It is designed to crawl through Google Scholar and build networks of co-authorship.

Installation

Clone or download the repo; it runs from source.
For cloning via HTTPS:
git clone https://github.com/jamespreed/scholar-crawler.git
For cloning via SSH:
git clone git@github.com:jamespreed/scholar-crawler.git

Requirements

Because of captchas, this runs using selenium and Firefox, so you must have Firefox installed. This is currently designed for Windows. Feel free to use the browser of your choice, but you will need to roll your own session class.
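As a rough idea of what rolling your own session class could look like, here is a minimal sketch built on selenium's Chrome driver. The class name `ChromeSession`, its `fetch` method, and the `driver_factory` parameter are all hypothetical and are not part of scholar_crawler; the interface the crawler actually expects is not documented here, so treat this as a starting point only.

```python
class ChromeSession:
    """Hypothetical browser session wrapper; not scholar_crawler's API."""

    def __init__(self, driver_factory=None):
        # driver_factory is any zero-argument callable returning a WebDriver.
        self._driver_factory = driver_factory or self._default_driver
        self.driver = None

    @staticmethod
    def _default_driver():
        # Imported lazily so the class can be defined without selenium installed.
        from selenium import webdriver
        return webdriver.Chrome()  # assumes chromedriver is on PATH

    def __enter__(self):
        self.driver = self._driver_factory()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always shut the browser down, even if the body raised.
        if self.driver is not None:
            self.driver.quit()
            self.driver = None
        return False

    def fetch(self, url):
        """Load a URL in the browser and return the rendered page source."""
        self.driver.get(url)
        return self.driver.page_source
```

It could be used as a context manager, for example `with ChromeSession() as session: html = session.fetch(url)`, so the browser is closed even when a page load fails.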

Here is a Conda environment file. Copy it, save it as scholar.yaml, and create the environment by running conda env create -n scholar -f scholar.yaml

name: scholar
channels:
- defaults
- conda-forge
dependencies:
- python=3.7.*
- pywin32=227  # [win]
- selenium=3.14.0
- geckodriver=0.26.0
- lxml=4.4.2
- urllib3=1.25.8
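Once the environment is activated, a quick sanity check can confirm that the driver binary is discoverable before launching a crawl. The sketch below uses only the standard library; the helper name `missing_executables` is made up for this example. It checks geckodriver only, since on Windows Firefox itself is often not on PATH even when it is installed.

```python
import shutil


def missing_executables(names, which=shutil.which):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]
```

Calling `missing_executables(["geckodriver"])` inside the activated environment should return an empty list if the geckodriver package was installed correctly.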

Usage

While the project is still in alpha, it needs to be run in interactive mode, or you can build your own scripts.

from scholar_crawler import ScholarQueue

sq = ScholarQueue()  # launches Firefox
sq.search_authors('some dude')  # search for authors by name
sq.crawl()
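The format of the crawl results is not described here, so the following is only an illustration of what a co-authorship network looks like as a data structure. It assumes the results can be reduced to one list of author names per paper; the function `coauthor_edges` and the placeholder names are invented for this sketch and are not part of scholar_crawler.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Count how many papers each pair of authors has written together."""
    edges = Counter()
    for authors in papers:
        # Sort and de-duplicate so (a, b) and (b, a) map to the same edge.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


# Placeholder data: each inner list is the author list of one paper.
papers = [
    ["A. Author", "B. Author", "C. Author"],
    ["A. Author", "B. Author"],
    ["C. Author"],
]
edges = coauthor_edges(papers)
```

Each key in `edges` is an undirected edge between two authors, and the value is the edge weight (the number of shared papers). Single-author papers contribute no edges.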
