Sample Faust project that processes tweets in real time and counts hashtags.
A custom Faust CLI command filters a stream of tweets using a list of words in CSV format.
More information about the Twitter API track filter: https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters.html
For this, the command is integrated with peony-twitter to process the Twitter stream.
Finally, the command will create one event for each hashtag found in tweets returned by the Twitter stream.
The agent will process all events and store the hashtag counters in a tumbling window table.
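To illustrate what a tumbling window table does with these events, here is a minimal plain-Python sketch. It is not the project's agent code and does not use Faust; the 60-second window size, the `(timestamp, hashtag)` event shape, and the function names are assumptions made for the example.

```python
from collections import Counter, defaultdict

WINDOW_SIZE = 60.0  # seconds; hypothetical value chosen for this sketch


def window_start(timestamp, size=WINDOW_SIZE):
    """Return the start of the fixed, non-overlapping window containing timestamp."""
    return (timestamp // size) * size


def count_hashtags(events, size=WINDOW_SIZE):
    """Count hashtags per tumbling window.

    events is an iterable of (timestamp, hashtag) pairs, one per hashtag event.
    """
    counts = defaultdict(Counter)
    for timestamp, hashtag in events:
        counts[window_start(timestamp, size)][hashtag] += 1
    return counts


# Three events fall in the first window, one in the second.
events = [(0.0, "python"), (10.0, "faust"), (59.9, "python"), (60.0, "python")]
counts = count_hashtags(events)
```

Each event belongs to exactly one window, so a counter restarts from zero whenever a new window begins.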
This project will expose a few Faust views:
- Get all hashtags
- Get hashtag count
Requirements:
- Python 3.6+
- pipenv: https://docs.pipenv.org/en/latest/
- Twitter developer account: https://developer.twitter.com/en.html
- Make a copy of the .env.example file and rename it to .env
- Set the values from your developer account
pip install -U pipenv
pipenv sync
To install dependencies for development:
pipenv sync --dev
Check all env vars defined in the .env file and set the corresponding values.
Note:
- The Kafka connection string is the list of brokers, each with its port, separated by semicolons.
You can use your own cluster or one of the docker compose files provided in the docker folder.
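As a sketch of the semicolon-separated connection string described in the note, the hypothetical helper below joins broker host and port pairs. The `kafka://` scheme and the helper name are assumptions for illustration; check the .env.example file for the exact format the project expects.

```python
def build_broker_string(brokers):
    """Join (host, port) pairs into one semicolon-separated connection string.

    The kafka:// prefix is an assumption made for this sketch.
    """
    return ";".join(f"kafka://{host}:{port}" for host, port in brokers)


# A single local broker, then a two-broker cluster with hypothetical host names.
local = build_broker_string([("localhost", 9092)])
cluster = build_broker_string([("kafka1", 9092), ("kafka2", 9092)])
print(local)
print(cluster)
```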
From the docker folder:
docker-compose up
This will run both Zookeeper and Kafka on their default ports, 2181 and 9092.
If you want to persist the data, run the following from the docker folder:
docker-compose -f docker-compose-with-storage.yml up
To stop the running containers, press CTRL+C, or run the following from another window:
docker-compose stop
From the project's folder:
pipenv run faust -A commands.commands -l info hashtags_events_generator --track word1,hashtag1,word2
pipenv run faust -A commands.commands -l info hashtags_events_generator --help
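The `--track` value is a comma-separated list of words. A minimal sketch of how such a value could be split into individual terms follows; `parse_track` is a hypothetical helper and not necessarily how the command itself parses the option.

```python
def parse_track(value):
    """Split a comma-separated --track value into a list of terms.

    Surrounding whitespace is stripped and empty entries are dropped.
    """
    terms = (term.strip() for term in value.split(","))
    return [term for term in terms if term]


# Stray spaces and a trailing comma are tolerated.
track = parse_track("word1,hashtag1, word2,")
print(track)
```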
In a different terminal:
pipenv run faust -A src.app worker -l info
Important:
- This will expose the views on the default port 6066
- The worker will store data in the default folder
Get all hashtags: this view is exposed at http://localhost:6066/hashtags
Get hashtag count: this view is exposed at http://localhost:6066/{hashtag}/count
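The two views can be queried from Python with the standard library alone. The sketch below assumes the worker is running locally on port 6066 and that the hashtag is passed in the URL without the leading `#`; the helper names are hypothetical, and the response body is returned as raw text because its format is not described here.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:6066"  # default port used by the worker


def hashtags_url(base_url=BASE_URL):
    """URL of the view that returns all hashtags."""
    return f"{base_url}/hashtags"


def hashtag_count_url(hashtag, base_url=BASE_URL):
    """URL of the view that returns the count for one hashtag."""
    # Percent-encode the hashtag so special characters are safe in the path.
    return f"{base_url}/{quote(hashtag, safe='')}/count"


def fetch(url, timeout=5.0):
    """Perform a GET request and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the worker running, `fetch(hashtags_url())` requests the full list and `fetch(hashtag_count_url("python"))` requests the counter for a single hashtag.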