Crawl collection information from QQ Music for further data analysis. The crawler first retrieves data from the QQ Music collection homepage, then visits each detail page to get the list of songs. The two crawlers are split into two files. The first file, "qq_music_collection_step1.py", crawls the basic information of each collection and stores it in the Mongo database. The second file, "qq_music_collection_step2.py", gets the detailed information from the detail page.
Python 3.6.5
- Install virtualenv
pip install virtualenv
- Activate virtualenv
cd [project_path]
virtualenv .venv --python=python3
source .venv/bin/activate
- Install packages
pip install -r requirements.txt
- Set up MongoDB connection
Depending on your Mongo setup, you need to configure the URI in settings.py in order to store data in the database. Start from settings.py.example and modify it to proceed.
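A minimal settings.py might look like the sketch below. The variable names (MONGO_URI, MONGO_DB, MONGO_COLLECTION) are assumptions for illustration; check which names the step scripts actually import.

```python
# settings.py -- hypothetical example layout; adjust to your MongoDB deployment.
# The variable names below are assumed, not taken from the project's code.

# Standard MongoDB connection string (add username/password if auth is enabled)
MONGO_URI = "mongodb://localhost:27017"

# Database and collection names used to store the crawled data
MONGO_DB = "qq_music"
MONGO_COLLECTION = "collections"
```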
- Modify request.py
You will likely need some kind of IP rotation to avoid being banned. Modify the request.py file and add a proxy in it.
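As a rough sketch, proxy rotation could look like the following. The proxy addresses and the get_proxy helper are hypothetical; the actual integration point depends on how request.py issues its HTTP requests.

```python
import random

# Hypothetical proxy pool -- replace with real proxies or a proxy-service API.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]

def get_proxy():
    """Pick a random proxy, in the dict format the `requests` library expects."""
    addr = random.choice(PROXIES)
    return {"http": addr, "https": addr}

# Usage sketch (assuming request.py uses the `requests` library):
#   requests.get(url, headers=headers, proxies=get_proxy(), timeout=10)
```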
- Run the first script
python qq_music_collection_step1.py
This will get the basic information of the collections from the homepage.
- Run the second script
python qq_music_collection_step2.py
This will get the detailed information of each collection obtained in the first step.
Known anti-crawling mechanisms
- Need to set Referer in the header of each request
- Need to set User-Agent
- IP rotation (not tested yet)
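The first two points can be handled by attaching headers to every request, sketched below with the standard library. The exact Referer value is an assumption; it should point at the QQ Music page that links to the resource being fetched.

```python
import urllib.request

# Headers that mimic a regular browser visit. The Referer URL is an assumed
# example -- QQ Music checks that requests appear to come from its own pages.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Referer": "https://y.qq.com/n/ryqq/playlist",
}

def build_request(url):
    """Attach the anti-crawling headers to a stdlib urllib request."""
    return urllib.request.Request(url, headers=HEADERS)
```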