Launch OpenWPM crawls as Kubernetes Job workloads, or stand up docker-compose services, to run a crawl in a distributed fashion.
A Redis work queue is set up and loaded with the list of URLs to crawl.
Containers running either locally or in the cloud execute the OpenWPM crawler.py script, which repeatedly fetches sites from the queue and exits once no sites remain.
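The worker loop described above can be sketched as follows. This is a minimal illustration, not the actual crawler.py: an in-memory queue stands in for Redis so the example runs without a server, and the names `InMemoryQueue`, `run_worker`, and `visit` are hypothetical.

```python
from collections import deque
from typing import Callable, List, Optional


class InMemoryQueue:
    """Hypothetical stand-in for the Redis work queue (not OpenWPM's API)."""

    def __init__(self, urls: List[str]) -> None:
        # Load the queue with the list of URLs to crawl.
        self._items = deque(urls)

    def pop(self) -> Optional[str]:
        # Return the next URL, or None when the queue is empty.
        return self._items.popleft() if self._items else None


def run_worker(queue: InMemoryQueue, visit: Callable[[str], None]) -> int:
    """Fetch sites until the queue is empty, then return the number visited."""
    visited = 0
    while True:
        url = queue.pop()
        if url is None:
            # No additional sites in the queue: the worker exits.
            break
        visit(url)
        visited += 1
    return visited


if __name__ == "__main__":
    q = InMemoryQueue(["https://example.com", "https://example.org"])
    count = run_worker(q, lambda url: print("visiting", url))
    print("sites visited:", count)
```

Because each worker only pops from a shared queue, several containers can run the same loop in parallel without coordinating with one another, assuming the real queue's pop operation is atomic.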
To install all the required tools (using conda):
./install.sh
conda activate openwpm-crawler
See ./deployment/local/README.md.
See ./deployment/gcp/README.md.
See ./deployment/local-compose/README.md. This is the simplest option: it requires only docker-compose, which ships with Docker on both Mac and Windows. However, behaviour might differ slightly from cloud crawls.
jupyter notebook
After launching Jupyter, navigate to analysis/Sample Analysis.ipynb and choose Kernel -> Change Kernel -> openwpm-crawler in the menu.