Apache Spark, which is used at Astronomy Commons, enables large-scale distributed computing by creating local and remote clusters. The problem is that creating and configuring a cluster requires multiple lines of cumbersome code, and scheduling a job on Kubernetes requires even more options and configuration. In the long run, maintaining clusters programmatically becomes inefficient, complicated, and hard to keep working. For example, to customize a cluster's executors and driver we need to write something like:
```python
import pyspark

# Spark Session
spark = (
    pyspark
    .sql
    .SparkSession
    .builder
    .config("spark.driver.memory", "12g")
    .config("spark.driver.cores", "4")
    .config("spark.executor.instances", "24")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "57500m")
    .getOrCreate()
)

# Spark Context
sc = spark.sparkContext
```
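Targeting Kubernetes adds still more boilerplate on top of this. A rough sketch of the extra options typically involved is below; the API server URL, container image, namespace, and service account are placeholders, not values used by Astronomy Commons:

```python
import pyspark

# Spark Session targeting a Kubernetes cluster (illustrative values only)
spark = (
    pyspark.sql.SparkSession.builder
    .master("k8s://https://kubernetes.example.org:443")                           # placeholder API server
    .config("spark.kubernetes.container.image", "example-repo/spark-py:latest")   # placeholder image
    .config("spark.kubernetes.namespace", "spark-jobs")                           # placeholder namespace
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")   # placeholder account
    .config("spark.executor.instances", "24")
    .config("spark.executor.memory", "57500m")
    .getOrCreate()
)
```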
The extension provides a well-designed interface within the Jupyter Notebook that replaces the programmatic creation of (or connection to) a Spark cluster with simple UI elements.
This extension, called "SparkManager", thus enables the entire astronomy community to manage their Apache Spark clusters easily from within a Jupyter Notebook.
With the extension, one can create PySpark clusters locally or on Kubernetes based on the selected config files. The drop-down lets the user choose which config file to load, and the cluster is created from that file.
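The exact file format is defined by the extension; as a rough illustration only, a config could be a flat mapping of Spark properties to values that gets fed into the session builder. The file name and structure below are hypothetical:

```python
import json
import pyspark

# Hypothetical config file: a flat mapping of Spark properties to values,
# e.g. {"spark.master": "local[4]", "spark.driver.memory": "12g"}
with open("local-4-cores.json") as f:      # file name is illustrative
    conf = json.load(f)

builder = pyspark.sql.SparkSession.builder
for key, value in conf.items():
    builder = builder.config(key, value)   # apply every property from the file

spark = builder.getOrCreate()
```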
Once the cluster is created with the existing configuration, users can also update the memory, cores, and other configuration values!
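Most Spark properties cannot be changed on a live session, so updating memory or cores generally means stopping the current session and rebuilding it with the new values. A minimal sketch of that pattern follows; the extension may handle this differently under the hood:

```python
import pyspark

# "spark" is the session previously created by the extension
spark.stop()

# Rebuild the session with the updated resources (values are examples)
spark = (
    pyspark.sql.SparkSession.builder
    .config("spark.driver.memory", "16g")   # updated driver memory
    .config("spark.executor.cores", "8")    # updated executor cores
    .getOrCreate()
)
sc = spark.sparkContext
```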
The "spark" variable is injected into the kernel and now can be accessed and used.
When the user clicks the "UI" button, they are taken to the Spark web UI.
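The same address can also be retrieved from the kernel if needed:

```python
# URL of the Spark web UI for the running application
print(spark.sparkContext.uiWebUrl)
```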
Please visit: How It Works
Please visit: Installation
- Make the extension available for JupyterLab
- Integrate real-time logs from the kernel
- Add tooltips for buttons