A scraping Master-slave system based on Google App Engine
This repository shows how to orchestrate, from a local process, a Scraper deployed on Google App Engine. The approach is a workaround for the HTTP 429 Too Many Requests error: whenever the error shows up, the Scraper is redeployed so that it gets a new IP address.
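The redeploy-on-429 loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code in master.py: `scrape_with_redeploy`, `redeploy_with_gcloud`, the `fetch` callable and `max_redeploys` are hypothetical names. The redeploy step assumes the `gcloud` CLI is installed and is run from the Scraper's directory (the one holding its `app.yaml`). The cap on redeploys is an added safeguard, not part of the original description.

```python
import subprocess
from typing import Callable, Optional

TOO_MANY_REQUESTS = 429


def redeploy_with_gcloud() -> None:
    """Hypothetical redeploy step: push the Scraper to App Engine again."""
    # Assumption: the current directory contains the Scraper's app.yaml.
    # --quiet disables interactive prompts so the master can run unattended.
    subprocess.run(["gcloud", "app", "deploy", "--quiet"], check=True)


def scrape_with_redeploy(
    urls: list[str],
    fetch: Callable[[str], tuple[int, Optional[str]]],
    redeploy: Callable[[], None] = redeploy_with_gcloud,
    max_redeploys: int = 5,
) -> dict[str, str]:
    """Fetch every URL through the Scraper, redeploying it on each HTTP 429."""
    results: dict[str, str] = {}
    redeploys = 0
    i = 0
    while i < len(urls):
        status, body = fetch(urls[i])
        if status == TOO_MANY_REQUESTS:
            if redeploys >= max_redeploys:
                raise RuntimeError("still rate-limited after max_redeploys redeploys")
            redeploy()      # new deployment, expected to come with a new IP
            redeploys += 1
            continue        # retry the same URL after redeploying
        results[urls[i]] = body or ""
        i += 1
    return results
```

Passing `fetch` and `redeploy` in as callables keeps the retry logic independent of how the Scraper is reached, so it can be exercised locally with stubs before touching App Engine.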
For more details, take a look at the article I published about this approach.
To test this locally, clone the repo and install the dependencies:
pip install -r requirements.txt
Then run the master in one terminal:
python master.py
and the slave in a different terminal:
gunicorn -b :8080 slave:app --timeout 360000 --preload
The output of the master looks like this.