Scripts to download videos using particular search phrases from Pornhub, split the audio components out and run audiogrep to transcribe the audio files

arunk/godiogrep

About

This set of files downloads videos from the Pornhub website, strips the audio from the video files, and runs an audio transcription tool called audiogrep on the audio files. Video files are downloaded to ./video; the stripped audio files and transcribed text files are written to ./audio.

Install

Use a virtualenv if possible. Otherwise, to install system-wide, make sure pip is installed, then run the following in a shell:

$ sudo pip install -r requirements.txt

Leave out the sudo if you are installing inside a virtual environment.

Running

Open main.py and edit the following if required:

SEARCH_PHRASES - add search phrases to this list. The videos listed on the search result pages for these phrases will be downloaded.

NUM_PAGES - the maximum number of pages of search results to download videos from.

MAX_DURATION - the longest duration of video to download.
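As a rough illustration of how these three settings interact, here is a minimal sketch. Only the names SEARCH_PHRASES, NUM_PAGES and MAX_DURATION come from main.py. The values, the unit of MAX_DURATION (assumed to be seconds here), and the two helper functions are hypothetical and not taken from the script.

```python
# Hypothetical example values; check main.py for the real defaults and units.
SEARCH_PHRASES = ["example phrase one", "example phrase two"]
NUM_PAGES = 2          # at most this many result pages per phrase
MAX_DURATION = 600     # ASSUMPTION: seconds; the unit is not stated in the README


def planned_requests(phrases, num_pages):
    """Return every (phrase, page) pair that would be visited (hypothetical helper)."""
    return [(phrase, page)
            for phrase in phrases
            for page in range(1, num_pages + 1)]


def within_limit(duration, max_duration=MAX_DURATION):
    """True if a video of the given duration would be kept (hypothetical helper)."""
    return duration <= max_duration


if __name__ == "__main__":
    for phrase, page in planned_requests(SEARCH_PHRASES, NUM_PAGES):
        print(f"would fetch page {page} of results for {phrase!r}")
```

The number of result pages visited grows as len(SEARCH_PHRASES) * NUM_PAGES, so each of those two settings scales the workload linearly.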

Now run

$ python main.py

This might take a few hours to run, depending on the number of search phrases, the number of pages, the maximum duration, your internet connection speed, and so on.

Then run

$ bash process.sh

This might again take a few hours, depending on the number of videos downloaded.
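The contents of process.sh are not shown here, so the following is only a sketch of what the two steps described above (strip the audio, then transcribe it) might look like per file. It assumes ffmpeg is used for extraction and that the audio is saved as MP3; both are assumptions, as are the helper names. The code only builds the command lines and does not run them.

```python
from pathlib import Path


def audio_path_for(video_path, audio_dir="audio"):
    """Map video/clip.mp4 to audio/clip.mp3 (MP3 is an assumed output format)."""
    return Path(audio_dir) / (Path(video_path).stem + ".mp3")


def build_commands(video_path, audio_dir="audio"):
    """Return (extract, transcribe) argument lists for one video (hypothetical helper)."""
    audio_path = audio_path_for(video_path, audio_dir)
    # ffmpeg: -i names the input file, -vn drops the video stream.
    extract = ["ffmpeg", "-i", str(video_path), "-vn", str(audio_path)]
    # audiogrep: --input names the audio file, --transcribe produces the transcript.
    transcribe = ["audiogrep", "--input", str(audio_path), "--transcribe"]
    return extract, transcribe


if __name__ == "__main__":
    for video in sorted(Path("video").glob("*")):
        for command in build_commands(video):
            print(" ".join(command))
```

To actually execute a command, each list could be passed to subprocess.run(command, check=True). Transcription is likely the slow step, which would explain why the run time depends on how many videos were downloaded.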

Finding god

Use grep

$ grep -lir "god" audio/*.txt
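If grep is not available, a rough Python equivalent of the command above can be sketched as follows. The function name files_mentioning is hypothetical. Like grep -li, it reports each .txt file in the directory that contains the word, ignoring case.

```python
from pathlib import Path


def files_mentioning(word, directory="audio"):
    """Return the .txt files in `directory` whose text contains `word`, ignoring case."""
    needle = word.lower()
    matches = []
    for path in sorted(Path(directory).glob("*.txt")):
        # Ignore undecodable bytes so one odd file does not abort the search.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            matches.append(path)
    return matches


if __name__ == "__main__":
    for path in files_mentioning("god"):
        print(path)
```

As with the grep command, this is a substring match, so it also reports files containing words such as "goddess".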
