datamics/Data-Streaming-with-Kafka-and-PySpark

 
 

Data-Streaming-with-Kafka-and-PySpark

"Data-Streaming-with-Kafka-and-PySpark" is a GitHub repository that provides concise guidance and examples for integrating Apache Kafka with PySpark for real-time data streaming and processing tasks.

Kaggle link to the dataset: https://www.kaggle.com/datasets/marcpaulo/harry-potter-reviews
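
To give a feel for the kind of integration this repository covers, here is a minimal sketch of reading a Kafka topic with PySpark Structured Streaming. It is not taken from the repository's notebooks. The broker address `localhost:9092`, the topic name `reviews`, the app name, and the helper names `kafka_source_options` and `run_stream` are all assumptions made for illustration, and the message value is kept as a raw string because the dataset's schema is not described here.

```python
from typing import Dict

# Assumed values for illustration only; the repository does not state them.
KAFKA_BOOTSTRAP_SERVERS = "localhost:9092"  # hypothetical broker address
KAFKA_TOPIC = "reviews"                     # hypothetical topic name


def kafka_source_options(servers: str, topic: str) -> Dict[str, str]:
    """Build the options for Spark's built-in "kafka" streaming source."""
    return {
        "kafka.bootstrap.servers": servers,
        "subscribe": topic,
        # Start from the oldest retained message rather than only new ones.
        "startingOffsets": "earliest",
    }


def run_stream() -> None:
    """Read the topic as a stream and print each message value to the console."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-pyspark-sketch").getOrCreate()

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(KAFKA_BOOTSTRAP_SERVERS, KAFKA_TOPIC))
        .load()
    )

    # Kafka delivers keys and values as bytes; cast the value to a string.
    messages = raw.select(col("value").cast("string").alias("value"))

    # Print each micro-batch to the console and block until the query stops.
    query = messages.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
```

Calling `run_stream()` needs a running Kafka broker and a Spark session that has the Spark Kafka connector package (`spark-sql-kafka-0-10`) available. The console sink is used here only because it makes the stream easy to inspect while experimenting.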

Languages

  • Jupyter Notebook 100.0%