
News Stream Infrastructure


News Stream Workflow

The following figure shows the workflow of the News Stream Analyzer.

Data is initially stored in a MySQL database and updated daily. Each time new data is inserted or updated in the MySQL database:

  1. The Kafka Connect component polls the table and publishes the new rows to a Kafka queue (on topic quickstart-jdbc-PRO_clip_repository). The polling strategy is timestamp-based on the dataset attribute insertdate (see the Kafka Connect JDBC connector documentation for more information).
  2. The News Analyzer component enriches the input message with additional information.
  3. The enriched message is sent to Kafka (on topic news_genero).
  4. Finally, the Kafka Connect component writes the messages from topic news_genero into the Elasticsearch database.
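
The enrichment step (2) can be sketched as follows. This is a minimal illustration, not the actual analyzer in consumers/news_analyzer: the classify_genre helper and the record field names (other than insertdate, which comes from the workflow above) are hypothetical.

```python
import json

def classify_genre(text):
    """Hypothetical genre classifier; the real analyzer would be more elaborate."""
    keywords = {"sport": "sports", "election": "politics", "market": "economy"}
    for keyword, genre in keywords.items():
        if keyword in text.lower():
            return genre
    return "other"

def enrich(raw_message):
    """Enrich one record read from quickstart-jdbc-PRO_clip_repository
    before it is re-published on the news_genero topic."""
    record = json.loads(raw_message)
    record["genre"] = classify_genre(record.get("text", ""))
    return json.dumps(record)

# Example record as it might arrive from the JDBC source connector
# (field names are illustrative, except insertdate from the workflow above).
msg = json.dumps({"text": "Stock market rallies", "insertdate": "2020-01-01"})
print(enrich(msg))
```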

Run Environment with Docker

The following commands start Docker containers running a MySQL server preloaded with 200 data records, an Elasticsearch server, Kafka, and Kafka Connect:

> cd ./kafka_infrastructure_poc/docker
> chmod +x launch_demo.sh
> ./launch_demo.sh
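
After launching the stack, one quick way to verify that the services came up is to probe their ports. The port numbers below (3306 for MySQL, 9200 for Elasticsearch, 9092 for Kafka) are conventional defaults and may differ in this docker setup:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Conventional default ports; adjust to match the docker configuration.
services = {"mysql": 3306, "elasticsearch": 9200, "kafka": 9092}
for name, port in services.items():
    status = "up" if port_open("localhost", port) else "down"
    print(f"{name}: {status}")
```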

To run the news analyzer component:

> cd ./kafka_infrastructure_poc/src
> python -m consumers.news_analyzer

Web UIs

Kafka Topics UI

It allows users to:

- browse Kafka topics and understand what's happening on the cluster;
- find topics and their metadata;
- browse Kafka messages and download them.

The web UI will be available at localhost:8001.

Kafka Schema Registry UI

It allows users to create, view, search, and update Avro schemas of the Kafka cluster.

The web UI will be available at localhost:8000.
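
As an illustration, an Avro schema for the enriched news_genero messages might look like the following; the field names here are hypothetical, apart from insertdate, which comes from the workflow description:

```json
{
  "type": "record",
  "name": "NewsGenero",
  "fields": [
    {"name": "text", "type": "string"},
    {"name": "genre", "type": "string"},
    {"name": "insertdate", "type": "string"}
  ]
}
```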

Kafka Connect UI

This is a web tool for setting up and managing Kafka Connect connectors across multiple Connect clusters.

The web UI will be available at localhost:8002.
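
For reference, a JDBC source connector matching the workflow above could be configured roughly like this. The mode, timestamp column, and topic prefix follow from the workflow description; the connection settings are placeholders and must be adapted to the actual docker setup:

```json
{
  "name": "quickstart-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://<mysql-host>:3306/<database>",
    "mode": "timestamp",
    "timestamp.column.name": "insertdate",
    "table.whitelist": "PRO_clip_repository",
    "topic.prefix": "quickstart-jdbc-"
  }
}
```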