[ZEPPELIN-6067] Add docker-compose file for running with Spark #4820

ParkGyeongTae · 2024-09-08T05:43:58Z

What is this PR for?

Provide a YAML file that creates Apache Zeppelin and Apache Spark containers together using the docker compose command.

What type of PR is it?

Improvement

Todos

- Update .gitignore
- Add docker-compose-with-spark.yml
- Update README.md

What is the Jira issue?

https://issues.apache.org/jira/projects/ZEPPELIN/issues/ZEPPELIN-6067

How should this be tested?

Install Spark binary

cd scripts/docker/zeppelin-quick-start
wget https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz
tar -xvf spark-3.5.2-bin-hadoop3.tgz

docker compose -f docker-compose-with-spark.yml up
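
The download steps above can be wrapped in a small helper so the Spark version is set in one place. This is only a sketch: `SPARK_VERSION`, `spark_dir_name`, `spark_archive_url`, and `fetch_spark` are illustrative names that are not part of this PR, and it assumes the archive follows the `spark-<version>-bin-hadoop3` naming used above. If a version other than 3.5.2 is chosen, the mount path in docker-compose-with-spark.yml would presumably need to match the extracted directory.

```shell
#!/bin/sh
# Sketch only: the variable and function names here are hypothetical helpers,
# not something the PR ships.
SPARK_VERSION="${SPARK_VERSION:-3.5.2}"

# Directory name produced by extracting the archive, e.g. spark-3.5.2-bin-hadoop3.
# Assumes the "-bin-hadoop3" suffix used in the PR description.
spark_dir_name() {
  printf 'spark-%s-bin-hadoop3\n' "$1"
}

# Download URL on archive.apache.org for the given version.
spark_archive_url() {
  printf 'https://archive.apache.org/dist/spark/spark-%s/%s.tgz\n' \
    "$1" "$(spark_dir_name "$1")"
}

# Download and extract Spark into the current directory unless the
# extracted directory is already there.
fetch_spark() {
  version="$1"
  dir="$(spark_dir_name "$version")"
  if [ -d "$dir" ]; then
    echo "$dir already present, skipping download"
    return 0
  fi
  wget "$(spark_archive_url "$version")" && tar -xf "$dir.tgz"
}
```

One possible use, run from scripts/docker/zeppelin-quick-start: `fetch_spark "$SPARK_VERSION" && docker compose -f docker-compose-with-spark.yml up`.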
Run paragraph

%spark.conf

SPARK_HOME /opt/spark
spark.master spark://spark-master:7077

Run paragraph

%spark

val sdf = spark.createDataFrame(Seq((0, "park", 13, 70, "Korea"), (1, "xing", 14, 80, "China"), (2, "john", 15, 90, "USA"))).toDF("id", "name", "age", "score", "country")
sdf.printSchema
sdf.show()

Result

root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = false)
 |-- score: integer (nullable = false)
 |-- country: string (nullable = true)

+---+----+---+-----+-------+
| id|name|age|score|country|
+---+----+---+-----+-------+
|  0|park| 13|   70|  Korea|
|  1|xing| 14|   80|  China|
|  2|john| 15|   90|    USA|
+---+----+---+-----+-------+

Screenshots (if appropriate)

Zeppelin UI (http://localhost:8080)
Spark Master UI (http://localhost:18080)
Spark Worker UI (http://localhost:18081)
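
Once the containers are up, the three UIs listed above can also be checked from a terminal. The following is a minimal sketch, assuming `curl` is installed on the host and that each UI answers its root URL with a successful HTTP status; `ui_endpoints` and `check_uis` are hypothetical helper names, and only the URLs come from this PR.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative; the URLs are those listed in
# the PR description.

# Print one "name url" pair per line.
ui_endpoints() {
  cat <<'EOF'
Zeppelin http://localhost:8080
Spark-Master http://localhost:18080
Spark-Worker http://localhost:18081
EOF
}

# Request each UI once and report whether it answered successfully.
# Returns non-zero if any endpoint failed.
check_uis() {
  status=0
  while read -r name url; do
    # -f makes curl fail on HTTP errors; --max-time bounds each request.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "OK   $name $url"
    else
      echo "FAIL $name $url"
      status=1
    fi
  done <<EOF
$(ui_endpoints)
EOF
  return "$status"
}
```

Calling `check_uis` after `docker compose -f docker-compose-with-spark.yml up` has finished starting the services prints one line per UI.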

Questions:

Do the license files need to be updated? No.
Are there breaking changes for older versions? No.
Does this need documentation? No.

Reamer

LGTM

pan3793 · 2024-09-19T06:58:45Z

scripts/docker/zeppelin-quick-start/README.md

+```bash
+cd scripts/docker/zeppelin-quick-start
+wget https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz
+tar -xvf spark-3.5.2-bin-hadoop3.tgz


Why do we need to install Spark outside of the container?

Oh, I see, you mount the Spark binary into the container later.

@pan3793
I mounted the Spark binary from the host because I wanted to recommend choosing the Spark version to run with, rather than fixing it inside the container.

@pan3793 ping :)

ParkGyeongTae added 4 commits September 8, 2024 14:33

[ZEPPELIN-6067] Update .gitignore

d73f1cd

[ZEPPELIN-6067] Add docker-compose-with-spark.yml file

20cbdf3

[ZEPPELIN-6067] Update README.md

0c8ea68

[ZEPPELIN-6067] Update Spark version

38c8d74

Reamer approved these changes Sep 9, 2024


pan3793 reviewed Sep 19, 2024



pan3793 Sep 19, 2024

pan3793 Sep 19, 2024

ParkGyeongTae Sep 23, 2024

ParkGyeongTae Oct 12, 2024
