spark-euca

Please read bellow to make sure these tools will work with your system

Has been tested with boto version 2.31.1 and 2.38.

Eucalyptus private cloud is API compatible with Amazon EC2. The scripts on this repo are a modified version of the spark-ec2 tools that you can find here: https://github.com/mesos/spark-ec2. The scripts can be used unmodified with Eucalyptus. If you want to use Amazon you should copy the connector from the spark-ec2 repo. The current git repo might contain code from the original spark tools that is not currently used for the installation to Eucalyptus.

spark-euca

This repository contains the set of scripts used to setup:

A Spark standalone cluster from scratch (emi or ami with Ubuntu 12.0.4) or
A Mesos cluster with Spark, Spark-Streaming, MapReduce, Storm, Kafka and Hama running on Eucalyptus. The deployment code supports installation both from a completely empty Ubuntu 12.04 precise emi or from a preconfigured emi with hdfs, mesos, hadoop and spark already installed.

In both cases I am assuming you already have a running eucalyptus cluster. The scripts require that Eycaluptus is already installed and then create the instances to run your cluster according to the EMI you specify.

Environment setup

If you haven't done already when setting up your eucalyptus environment you will need to create an ssh key in order to be able to ssh without password to the instances. This key will be on your -i and -k options used on the deployment code. euca-create-keypair keyname.key > keyname.key (or for older versions of eucalyptus: euca-add-keypair keyname.key > keyname.key) chmod 600 keyname.key
You need to source a eucarc file that looks like the following:

EUCA_KEY_DIR=$(cd $(dirname ${BASH_SOURCE:-$0}); pwd -P)
export WALRUS_IP=[WALRUS_IP]
export EC2_URL=http://[WALRUS_IP]:8773/services/Eucalyptus
export S3_URL=http://[WALRUS_IP]:8773/services/Walrus
export EUARE_URL=http://[WALRUS_IP]:8773/services/Euare
export TOKEN_URL=http://[WALRUS_IP]:8773/services/Tokens
export AWS_AUTO_SCALING_URL=http://[WALRUS_IP]:8773/services/AutoScaling
export AWS_CLOUDWATCH_URL=http://[WALRUS_IP]:8773/services/CloudWatch
export AWS_ELB_URL=http://[WALRUS_IP]:8773/services/LoadBalancing
export EUSTORE_URL=http://emis.eucalyptus.com/
export EC2_PRIVATE_KEY=${EUCA_KEY_DIR}/euca2-admin-cd349995-pk.pem
export EC2_CERT=${EUCA_KEY_DIR}/euca2-admin-cd349995-cert.pem
export EC2_JVM_ARGS=-Djavax.net.ssl.trustStore=${EUCA_KEY_DIR}/jssecacerts
export EUCALYPTUS_CERT=${EUCA_KEY_DIR}/cloud-cert.pem
export EC2_ACCOUNT_NUMBER='[YOUR_NUMBER]'
export EC2_ACCESS_KEY='[YOUR_ACCESS_KEY]'
export EC2_SECRET_KEY='[YOUR_SECRET_KEY]'
export AWS_ACCESS_KEY='[YOUR_ACCESS_KEY]'
export AWS_SECRET_KEY='[YOUR_SECRET_KEY]'
export AWS_ACCESS_KEY_ID='[YOUR_ACCESS_KEY]'
export AWS_SECRET_ACCESS_KEY='[YOUR_SECRET_KEY]'
export AWS_CREDENTIAL_FILE=${EUCA_KEY_DIR}/iamrc
export EC2_USER_ID='[YOUR_USER_ID]'
alias ec2-bundle-image="ec2-bundle-image --cert ${EC2_CERT} --privatekey ${EC2_PRIVATE_KEY} --user ${EC2_ACCOUNT_NUMBER} --ec2cert ${EUCALYPTUS_CERT}"
alias ec2-upload-bundle="ec2-upload-bundle -a ${EC2_ACCESS_KEY} -s ${EC2_SECRET_KEY} --url ${S3_URL}"

Modify /etc/ssh_config on a mac (or /etc/ssh/ssh_config on ubuntu) uncomment or add: StrictHostKeyChecking no UserKnownHostsFile=/dev/null
Make sure boto is installed.

Usage

To install Spark Standalone all you need to do is run something like this:

./spark-euca 
-i xxxx.pem 
-k xxxx 
-s n 
-emi-xyzzyzxy
-t x2.2xlarge 
--no-ganglia 
--os-type ubuntu
--user-data zzzz.sh  
launch spark-test

To install a mesos cluster all you need to do is run something like this:

./mesos-euca-emi -i xxxx.pem.pem
-k xxxx 
-s n
-emi-master emi-xyzzyzxy
-e emi-xyzzyzxy  
-t m2.2xlarge 
--no-ganglia 
-w 60 
--user-data-file zzzz.sh
--installation-type mesos-emi
--run-tests True
launch mesos-cluster-emi

You could use the mesos-euca-emi to also install a Spark Standalone cluster by specifying the correct arguments but the spark-euca script will work just fine for Spark-standalone

To install from an empty Ubuntu 12.04 precise emi use the option --installation-type=empty-emi and: on euca00 cluster: use emi-56CB3EE9 for both masters and slaves on euca eci cluster: use emi-DF913965 for both masters and slaves
for installation starting from an emi with pre-installed hdfs, mesos, hadoop and spark use the option --installation-type=mesos-emi and: on euca00: --emi-master emi-283B3B45 -e emi-35E93896 on euca eci: TBD

Examples

example for installation from a preconfigured emi on euca00:

./mesos-euca-generic -i ~/vagrant_euca/stratos.pem -k stratos --ft 3 -s 2 --emi-master emi-283B3B45 -e emi-35E93896 -t m2.2xlarge --no-ganglia --user-data-file clear-key-ubuntu.sh --installation-type mesos-emi --run-tests True --cohost --swap 4096 launch cluster-name

example for installation from am empty emi on euca00:

./mesos-euca-generic -i ~/vagrant_euca/stratos.pem -k stratos --ft 3 -s 2 --emi-master emi-283B3B45 -e emi-35E93896 -t m2.2xlarge --no-ganglia --user-data-file clear-key-ubuntu.sh --installation-type mesos-emi --run-tests True --cohost --swap 4096 launch cluster-name

Check scripts for more options

Details

The Spark cluster setup is guided by the values set in ec2-variables.sh.setup.sh first performs basic operations like enabling ssh across machines, mounting ephemeral drives and also creates files named /root/spark-euca/masters, and /root/spark-euca/slaves. Following that every module listed in MODULES is initialized.

To add a new module, you will need to do the following:

a. Create a directory with the module's name

b. Optionally add a file named init.sh. This is called before templates are configured and can be used to install any pre-requisites.

c. Add any files that need to be configured based on the cluster setup to templates/. The path of the file determines where the configured file will be copied to. Right now the set of variables that can be used in a template are

      {{master_list}}
      {{active_master}}
      {{slave_list}}
      {{zoo_list}}
      {{cluster_url}}
      {{hdfs_data_dirs}}
      {{mapred_local_dirs}}
      {{spark_local_dirs}}
      {{default_spark_mem}}
      {{spark_worker_instances}}
      {{spark_worker_cores}}
      {{spark_master_opts}}

You can add new variables by modifying deploy_templates.py

d. Add a file named setup.sh to launch any services on the master/slaves. This is called after the templates have been configured. You can use the environment variables $SLAVES to get a list of slave hostnames and /root/spark-euca/copy-dir to sync a directory across machines.

Name		Name	Last commit message	Last commit date
Latest commit History 634 Commits
ami-list		ami-list
backup		backup
cloudera-hdfs		cloudera-hdfs
environment-setup		environment-setup
ephemeral-hdfs		ephemeral-hdfs
euca		euca
ganglia		ganglia
hadoop-on-mesos		hadoop-on-mesos
hama-on-mesos		hama-on-mesos
kafka		kafka
mapreduce		mapreduce
mesos		mesos
monit		monit
mpich2		mpich2
persistent-hdfs		persistent-hdfs
s3cmd		s3cmd
scala		scala
shark		shark
spark-on-mesos		spark-on-mesos
spark-standalone		spark-standalone
spark		spark
storm-on-mesos		storm-on-mesos
tachyon		tachyon
templates		templates
utilities		utilities
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
README.md		README.md
clear-key-ubuntu.sh		clear-key-ubuntu.sh
copy-dir		copy-dir
copy-dir-generic		copy-dir-generic
create-swap.sh		create-swap.sh
deploy_templates.py		deploy_templates.py
deploy_templates_mesos.py		deploy_templates_mesos.py
github.hostkey		github.hostkey
prepare-slaves-centos.sh		prepare-slaves-centos.sh
prepare-slaves-ubuntu.sh		prepare-slaves-ubuntu.sh
setup-mesos-emi-slave.sh		setup-mesos-emi-slave.sh
setup-mesos-emi.sh		setup-mesos-emi.sh
setup-mesos-generic.sh		setup-mesos-generic.sh
setup-mesos-slave.sh		setup-mesos-slave.sh
setup-mesos.sh		setup-mesos.sh
setup-slave.sh		setup-slave.sh
setup.sh		setup.sh
ssh-no-keychecking.sh		ssh-no-keychecking.sh
tachyon-0.5.0-bin.tar.gz		tachyon-0.5.0-bin.tar.gz
test.txt		test.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

spark-euca

Environment setup

Usage

Examples

Details

About

Releases 5

Packages

Languages

License

MAYHEM-Lab/spark-euca

Folders and files

Latest commit

History

Repository files navigation

spark-euca

Environment setup

Usage

Examples

Details

About

Resources

License

Stars

Watchers

Forks

Releases 5

Packages 0

Languages

Packages