This repo contains example scraping scripts in both Ruby and Python that use concepts taught in the NICAR 2015 advanced web scraping course. The class focuses on using the web inspector to find the information needed for more sophisticated scrapes. The slide deck for the presentation can be found here.
### Python
The Python scrapes require only two modules that are not part of the Python standard library. BeautifulSoup4 parses markup languages such as HTML and XML. Requests makes both GET and POST web requests.
Both can be installed individually with pip (`pip install beautifulsoup4` and `pip install requests`) or together with `pip install -r requirements.txt`.
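A minimal sketch of how the two modules fit together. The search URL, the `q` parameter, and the sample table are hypothetical, used only for illustration. To keep the example runnable offline, it builds the GET and POST requests with `requests.Request(...).prepare()` instead of sending them, then parses a hard-coded HTML string with BeautifulSoup.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical search endpoint, used only for illustration.
SEARCH_URL = "https://example.com/search"


def build_requests(query):
    """Build (but do not send) a GET and a POST request for the same query."""
    # GET: the query is encoded into the URL's query string.
    get_req = requests.Request("GET", SEARCH_URL, params={"q": query}).prepare()
    # POST: the query is form-encoded into the request body instead.
    post_req = requests.Request("POST", SEARCH_URL, data={"q": query}).prepare()
    return get_req, post_req


def parse_rows(html):
    """Return the text of each data cell in each table row, skipping header rows."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows use <th>, so they yield no <td> cells
            rows.append(cells)
    return rows


# Hypothetical page content standing in for a real response body.
sample_html = """
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Alice</td><td>Denver</td></tr>
  <tr><td>Bob</td><td>Atlanta</td></tr>
</table>
"""

get_req, post_req = build_requests("budget")
print(get_req.url)    # https://example.com/search?q=budget
print(post_req.body)  # q=budget
print(parse_rows(sample_html))  # [['Alice', 'Denver'], ['Bob', 'Atlanta']]
```

In a live scrape you would typically send the request with `requests.get(url, params=...)` or `requests.post(url, data=...)` and pass `response.text` to `BeautifulSoup`. Which of the two a site expects is the kind of detail the web inspector's network panel reveals.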
### Ruby
The Ruby scripts require three libraries. The first is Nokogiri, a Ruby parser for HTML and XML. The ASP.NET scrape requires Mechanize to emulate a browser. Rest-Client is needed to make web requests in the mapscrape.rb example.
If you have Bundler installed, navigate to the Ruby directory and run `bundle install` to install the required libraries. Otherwise, install each one with `gem install <package name>` (for example, `gem install nokogiri`).