How can I run code on IBM Cloud Watson Studio?
Here are the setup steps to use the Machine Learning (ML) service. NB: These steps use a Command Line Interface (CLI); there is also an alternative browser-based interface.
What we will do as a one-time setup:
- Confirm that you have an IBM Cloud user ID
- Download CLI tools to access and manage resources in the IBM Cloud
- Log in through the CLI to your IBM Cloud account
- Configure your IBM Cloud account
- Create a Watson ML instance
- Define a Cloud Object Storage instance to store your data
Go to https://console.bluemix.net/ and log in. (If you are part of the IBM-MIT AI Lab but do NOT have a valid account, please contact noor.fairoza@ibm.com.)
The bx CLI allows you to start and manage resources (e.g., applications, containers, services, ...) in the IBM Cloud.
Download the bx CLI and install it, following the instructions for your local machine's operating system (OSX, Linux, or Windows).
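If the installation succeeded, asking the CLI for its version should work (the exact version string will vary with your install):
bx --version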
1.2. Install awscli using pip
The aws CLI lets you set up and upload data to your buckets (we will get to buckets later).
pip install awscli
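You can verify that the aws tool is installed and on your PATH with:
aws --version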
The machine-learning plugin for bx lets you start, view, and stop your Machine Learning jobs on Watson.
bx plugin install machine-learning
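To confirm the plugin was installed, list your installed plugins:
bx plugin list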
Now we will create data and service resources in the IBM Cloud. First, we log in.
bx login
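If your IBM Cloud account uses a federated (single sign-on) ID, the plain login may be rejected; in that case you can try the one-time passcode flow instead:
bx login --sso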
In order to run jobs on Watson, you need an organization (also called an org) and a space to hold your jobs. Org names are also globally unique. The account owner should have already created an org for you (and others) to share assets.
You can find out which organizations are available to you with the command:
bx account orgs
The command will return something like:
Name Region Account owner Account ID Status
MITIBMWatsonAiLab us-south ailab@us.ibm.com 5eb998dd20e3d7fc0153329e32362d64 active
Select the correct Name and save it in a variable, i.e.,
org_name="MITIBMWatsonAiLab"
bx target -o $org_name
Now let's find out the name of your space under the org.
bx account spaces
This will return something like:
Getting spaces under organization MITIBMWatsonAiLab in region us-south as myuserid@mit.edu...
OK
Name
dev
Select the correct space Name and save it in a variable, e.g.,
space_name="dev"
bx target -o $org_name -s $space_name
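To double-check that you are pointing at the right account, org, and space, you can print the current target:
bx target
Next, create a Watson ML service instance (service name pm-20, lite plan), create a service key for it, and pull the credentials out of the key into shell variables: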
bx service create pm-20 lite CLI_WML_Instance
bx service key-create CLI_WML_Instance cli_key_CLI_WML_Instance
instance_id=`bx service key-show CLI_WML_Instance cli_key_CLI_WML_Instance | grep "instance_id"| awk -F": " '{print $2}'| cut -d'"' -f2`
username=`bx service key-show CLI_WML_Instance cli_key_CLI_WML_Instance | grep "username"| awk -F": " '{print $2}'| cut -d'"' -f2`
password=`bx service key-show CLI_WML_Instance cli_key_CLI_WML_Instance | grep "password"| awk -F": " '{print $2}'| cut -d'"' -f2`
url=`bx service key-show CLI_WML_Instance cli_key_CLI_WML_Instance | grep "url"| awk -F": " '{print $2}'| cut -d'"' -f2`
export ML_INSTANCE=$instance_id
export ML_USERNAME=$username
export ML_PASSWORD=$password
export ML_ENV=$url
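As a quick sanity check, you can echo the exported values to make sure none of them came back empty (the exact values are specific to your instance):
echo $ML_INSTANCE
echo $ML_USERNAME
echo $ML_ENV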
A bucket is a huge "folder" in the cloud. You use the bucket to put and get any file or folder (e.g., your datasets) using an API-style interface.
First, let's create your own personal cloud storage instance to hold your bucket(s) and name the instance my_instance.
bx resource service-instance-create "my_instance" cloud-object-storage standard global
bx resource service-instance "my_instance"
We then create credentials for my_instance, naming the key my_cli_key, so that you can create and access your bucket.
Create the key, store it, and print it:
bx resource service-key-create "my_cli_key" Writer --instance-name "my_instance" --parameters '{"HMAC":true}' > /dev/null 2>&1
access_key_id=`bx resource service-key my_cli_key | grep "access_key_id"| cut -d\: -f2`
secret_access_key=`bx resource service-key my_cli_key | grep "secret_access_key"| cut -d\: -f2`
echo ""; echo "Credentials:"; echo "access_key_id - $access_key_id"; echo "secret_access_key - $secret_access_key"; echo ""
Use the aws tool to add access_key_id and secret_access_key to a profile named my_profile (leave the other fields as None).
aws configure --profile my_profile
export MY_BUCKET_KEY=$access_key_id
export MY_BUCKET_SECRET_KEY=$secret_access_key
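If you prefer not to type the keys interactively, a non-interactive alternative (assuming the access_key_id and secret_access_key shell variables from the key step above are still set) is:
aws configure set aws_access_key_id "$access_key_id" --profile my_profile
aws configure set aws_secret_access_key "$secret_access_key" --profile my_profile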
Now, let's make a bucket and name it something unique! Buckets are named globally, which means that only one IBM Cloud account can have a bucket with a particular name. NB: bucket names may not contain upper-case letters, underscores, or spaces; to be safe, just use simple lower-case text, e.g., below we call the bucket "mybucket".
bucket_name="mybucket"
aws --endpoint-url=https://s3-api.us-geo.objectstorage.softlayer.net s3api create-bucket --bucket $bucket_name --profile my_profile 2>&1
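To confirm the bucket was actually created (creation will fail if the name is already taken by another account), list the buckets visible to your profile:
aws --endpoint-url=https://s3-api.us-geo.objectstorage.softlayer.net s3 ls --profile my_profile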
Now, to test that your setup is working, let's try a simple model.
For example, let's get the cifar10 dataset and do a little training. You can get this dataset from the Internet, e.g., by doing:
mkdir cifar10
cd cifar10
wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
tar xvf cifar-10-python.tar.gz
rm cifar-10-python.tar.gz
cd ..
aws --endpoint-url=https://s3-api.us-geo.objectstorage.softlayer.net --profile my_profile s3 cp cifar10/ s3://$bucket_name/cifar10 --recursive
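You can verify the upload by listing the contents of the cifar10 prefix in your bucket:
aws --endpoint-url=https://s3-api.us-geo.objectstorage.softlayer.net --profile my_profile s3 ls s3://$bucket_name/cifar10/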
The training job is described by a yaml file, which holds all the information needed for executing the job, including which bucket, ML framework, and computing instance to use. Create yours by copying the provided template:
cp pytorch-cifar-template.yml my-pytorch-cifar.yml
Add your author info and replace the values of aws_access_key_id, aws_secret_access_key, and bucket in my-pytorch-cifar.yml with your storage instance credentials and your chosen bucket name. This should be done for both the data input reference (e.g., training_data_reference) and the output reference (e.g., training_results_reference). Notice that you may use the same bucket for both input and output, but this is not required.
model_definition:
  framework:
    #framework name and version (supported list of frameworks available at 'bx ml list frameworks')
    name: pytorch
    version: 0.3
  #name of the training-run
  name: MYRUN
  #Author name and email
  author:
    name: JOHN DOE
    email: JOHNDOE@MIT.EDU
  description: This is running cifar training on multiple models
  execution:
    #Command to execute -- see script parameters in later section !!
    command: python3 main.py --cifar_path ${DATA_DIR}/cifar10
      --checkpoint_path ${RESULT_DIR} --epochs 10
    compute_configuration:
      #Valid values for name - k80/k80x2/k80x4/p100/p100x2/v100/v100x2
      name: k80
training_data_reference:
  name: training_data_reference_name
  connection:
    endpoint_url: "https://s3-api.us-geo.objectstorage.service.networklayer.com"
    aws_access_key_id: < YOUR SAVED ACCESS KEY >
    aws_secret_access_key: < YOUR SAVED SECRET ACCESS KEY >
  source:
    bucket: < mybucketname >
  type: s3
training_results_reference:
  name: training_results_reference_name
  connection:
    endpoint_url: "https://s3-api.us-geo.objectstorage.service.networklayer.com"
    aws_access_key_id: < YOUR SAVED ACCESS KEY >
    aws_secret_access_key: < YOUR SAVED SECRET ACCESS KEY >
  target:
    bucket: < mybucketname >
  type: s3
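The framework block above uses pytorch 0.3. As the comment in the template notes, you can list the framework names and versions currently supported by the service with:
bx ml list frameworks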
Notice that under execution in the yaml file, we specified a command that will be executed when the job starts running on the server.
python3 main.py --cifar_path ${DATA_DIR}/cifar10
--checkpoint_path ${RESULT_DIR} --epochs 10
This command will execute main.py, which starts a training run of a specified model. Since no model is specified, it will train the default model, vgg16, for 10 epochs using the dataset that we uploaded to the bucket.
Now zip up the training code and submit the job to Watson ML, pointing it at your edited manifest:
zip model.zip main.py models/*
bx ml train model.zip my-pytorch-cifar.yml
That's it! The command should generate a training ID for you, meaning your model has started training on Watson! You can list and monitor your training runs with:
bx ml list training-runs
bx ml monitor training-runs < trainingID >
As training proceeds, you should see results from the training process being copied to the results bucket specified in your training job yaml file (training_results_reference.target.bucket).
You can also inspect the status of training by downloading and viewing the training log file, which is copied to the results bucket. This is useful for debugging errors and failed jobs.
aws --endpoint-url=https://s3-api.us-geo.objectstorage.softlayer.net --profile my_profile s3 cp s3://$bucket_name/< trainingID >/learner-1/training-log.txt -
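Once the run completes, you can also copy everything the job wrote under its training ID (checkpoints, logs) down to a local folder; the local results/ directory name here is just an illustrative choice:
aws --endpoint-url=https://s3-api.us-geo.objectstorage.softlayer.net --profile my_profile s3 cp s3://$bucket_name/< trainingID >/ results/ --recursive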
Content derived from material provided by Hendrik Strobelt (IBM Research), Evan Phibbs (IBM Research), and Victor C. Dibia (IBM Research).