From a30e724c5ade26e68a678b29a04b7ffc02a6f65f Mon Sep 17 00:00:00 2001
From: David Jurado
Date: Fri, 2 Aug 2024 10:07:35 -0500
Subject: [PATCH 1/2] add minified benchmarks documentation

---
 docs/minified-benchmarks/3d-unet.md          |  58 +++++++++
 docs/minified-benchmarks/bert.md             | 124 +++++++++++++++++++
 docs/minified-benchmarks/introduction.md     |  20 +++
 docs/minified-benchmarks/llama2.md           |  90 ++++++++++++++
 docs/minified-benchmarks/object-detection.md |  69 +++++++++++
 docs/minified-benchmarks/resnet.md           |  51 ++++++++
 docs/minified-benchmarks/stable-diffusion.md |  89 +++++++++++++
 mkdocs.yml                                   |   8 ++
 8 files changed, 509 insertions(+)
 create mode 100644 docs/minified-benchmarks/3d-unet.md
 create mode 100644 docs/minified-benchmarks/bert.md
 create mode 100644 docs/minified-benchmarks/introduction.md
 create mode 100644 docs/minified-benchmarks/llama2.md
 create mode 100644 docs/minified-benchmarks/object-detection.md
 create mode 100644 docs/minified-benchmarks/resnet.md
 create mode 100644 docs/minified-benchmarks/stable-diffusion.md

diff --git a/docs/minified-benchmarks/3d-unet.md b/docs/minified-benchmarks/3d-unet.md
new file mode 100644
index 00000000..971a647a
--- /dev/null
+++ b/docs/minified-benchmarks/3d-unet.md
@@ -0,0 +1,58 @@
+# 3D Unet
+
+The benchmark reference for 3D Unet can be found at this [link](https://github.com/mlcommons/training/tree/master/retired_benchmarks/unet3d/pytorch), and here is the PR for the minified benchmark implementation: [link](https://github.com/mlcommons/training/pull/695).
+
+## Project setup
+
+An important requirement is that you must have Docker installed.
+
+```bash
+# Create Python environment and install MLCube Docker runner
+virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0 && pip install mlcube-docker
+# Fetch the implementation from GitHub
+git clone https://github.com/mlcommons/training && cd ./training
+git fetch origin pull/695/head:feature/mlcube_3d_unet && git checkout feature/mlcube_3d_unet
+cd ./image_segmentation/pytorch/mlcube
+```
+
+Inside the mlcube directory, run the following command to check the implemented tasks.
+
+```shell
+mlcube describe
+```
+
+### MLCube tasks
+
+Download dataset.
+
+```shell
+mlcube run --task=download_data -Pdocker.build_strategy=always
+```
+
+Process dataset.
+
+```shell
+mlcube run --task=process_data -Pdocker.build_strategy=always
+```
+
+Train 3D Unet.
+
+```shell
+mlcube run --task=train -Pdocker.build_strategy=always
+```
+
+### Execute the complete pipeline
+
+You can execute the complete pipeline with a single command.
+
+```shell
+mlcube run --task=download_data,process_data,train -Pdocker.build_strategy=always
+```
+
+## Run a quick demo
+
+You can run a quick demo that first downloads a tiny dataset and then executes a short training workload.
+
+```shell
+mlcube run --task=download_demo,demo -Pdocker.build_strategy=always
+```
diff --git a/docs/minified-benchmarks/bert.md b/docs/minified-benchmarks/bert.md
new file mode 100644
index 00000000..8853b310
--- /dev/null
+++ b/docs/minified-benchmarks/bert.md
@@ -0,0 +1,124 @@
+# Bert
+
+The benchmark reference for Bert can be found at this [link](https://github.com/mlcommons/training/tree/master/language_model/tensorflow/bert), and here is the PR for the minified benchmark implementation: [link](https://github.com/mlcommons/training/pull/632).
+
+## Project setup
+
+An important requirement is that you must have Docker installed.
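+
+If you want to confirm that Docker is working before going further, a quick optional sanity check (not part of the benchmark itself) is:
+
+```bash
+# Verify the Docker CLI and daemon are reachable
+docker --version
+docker run --rm hello-world
+```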
+
+```bash
+# Create Python environment and install MLCube Docker runner
+virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0 && pip install mlcube-docker
+# Fetch the implementation from GitHub
+git clone https://github.com/mlcommons/training && cd ./training/language_model/tensorflow/bert
+```
+
+Go to the mlcube directory and check which tasks MLCube implements.
+
+```shell
+cd ./mlcube
+mlcube describe
+```
+
+### Demo execution
+
+These tasks will use a demo dataset to execute a faster training workload for a quick demo (~8 min):
+
+```bash
+mlcube run --task=download_demo -Pdocker.build_strategy=always
+
+mlcube run --task=demo -Pdocker.build_strategy=always
+```
+
+It's also possible to execute the two tasks in a single instruction:
+
+```bash
+mlcube run --task=download_demo,demo -Pdocker.build_strategy=always
+```
+
+### MLCube tasks
+
+Download dataset.
+
+```shell
+mlcube run --task=download_data -Pdocker.build_strategy=always
+```
+
+Process dataset.
+
+```shell
+mlcube run --task=process_data -Pdocker.build_strategy=always
+```
+
+Train Bert.
+
+```shell
+mlcube run --task=train -Pdocker.build_strategy=always
+```
+
+Run compliance checker.
+
+```shell
+mlcube run --task=check_logs -Pdocker.build_strategy=always
+```
+
+### Execute the complete pipeline
+
+You can execute the complete pipeline with a single command.
+
+```shell
+mlcube run --task=download_data,process_data,train,check_logs -Pdocker.build_strategy=always
+```
+
+## TPU Training
+
+To execute this benchmark using a TPU, you will need access to [Google Cloud Platform](https://cloud.google.com/), where you can create a project (note: all the resources should be created in the same project). After that, follow these steps:
+
+1. Create a TPU node
+
+In the Google Cloud console, search for the Cloud TPU API page, then click Enable.
+
+Then go to the virtual machines section and select [TPUs](https://console.cloud.google.com/compute/tpus).
+
+Select Create TPU node and fill in all the needed parameters; the recommended TPU type in the [readme](../README.md#on-tpu-v3-128) is v3-128, and the recommended TPU software version is 2.4.0.
+
+The three most important parameters you need to remember are: `project name`, `TPU name`, and `TPU Zone`.
+
+After creating it, click on the TPU name to see the TPU details, and copy the Service account (should be in the format: )
+
+2. Create a Google Storage Bucket
+
+Go to [Google Storage](https://console.cloud.google.com/storage/browser) and create a new bucket, defining the needed parameters.
+
+In the bucket list, select the checkbox for the bucket you just created, click on Permissions, and then click on Add principal.
+
+In the new principals field, paste the Service account from step 1, and for the roles select Storage Legacy Bucket Owner, Storage Legacy Bucket Reader, and Storage Legacy Bucket Writer. Then click on Save; this will allow the TPU to save the checkpoints during training.
+
+3. Create a VM instance
+
+The idea is to create a virtual machine instance containing all the code we will execute using MLCube.
+
+Go to [VM instances](https://console.cloud.google.com/compute/instances), then click on Create instance and define all the needed parameters (no GPU needed).
+
+**IMPORTANT:** In the section Identity and API access, check the option `Allow full access to all Cloud APIs`; this will allow the connection between this VM, the Cloud Storage Bucket, and the TPU.
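+
+For reference, the console steps above have rough command-line equivalents. The commands below are only a sketch: `your-tpu`, `your-bucket`, `your-vm`, the zone, and the service account are placeholders, and the exact gcloud/gsutil flags may differ depending on your SDK version.
+
+```bash
+# 1. Create the TPU node (placeholder name and zone)
+gcloud compute tpus create your-tpu --zone=us-central1-a --accelerator-type=v3-128 --version=2.4.0
+# 2. Create the bucket and grant the TPU service account one of the legacy roles
+#    (repeat for the Owner/Reader/Writer roles described above)
+gsutil mb -l us-central1 gs://your-bucket
+gsutil iam ch serviceAccount:your-tpu-service-account:roles/storage.legacyBucketWriter gs://your-bucket
+# 3. Create the VM with full access to the Cloud APIs (no GPU needed)
+gcloud compute instances create your-vm --zone=us-central1-a --scopes=cloud-platform
+```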
+
+Start the VM, connect to it via SSH, then use this [tutorial](https://docs.docker.com/engine/install/debian/) to install Docker.
+
+After installing Docker, clone the repo and follow the Project setup steps above to install MLCube, then go to the path: `training/language_model/tensorflow/bert/mlcube`
+
+There, modify the file at `workspace/parameters.yaml` and fill in your values for:
+
+```yaml
+output_gs: your_gs_bucket_name
+tpu_name: your_tpu_instance_name
+tpu_zone: your_tpu_zone
+gcp_project: your_gcp_project
+```
+
+After that, run the command:
+
+```shell
+mlcube run --task=train_tpu --mlcube=mlcube_tpu.yaml -Pdocker.build_strategy=always
+```
+
+This will start the MLCube task; internally, the host VM sends all the required data to the TPU through gRPC, then the TPU fetches the code to execute, reads the Cloud Storage Bucket information, and runs the training workload.
diff --git a/docs/minified-benchmarks/introduction.md b/docs/minified-benchmarks/introduction.md
new file mode 100644
index 00000000..39396a97
--- /dev/null
+++ b/docs/minified-benchmarks/introduction.md
@@ -0,0 +1,20 @@
+# Minified Benchmarks
+
+## What is a Minified Benchmark?
+
+A minified benchmark is a reduced version of an MLCommons training benchmark designed to be easily reproduced using MLCube. It simplifies the benchmarking process by scaling down the dataset and training duration, and it has a simple installation and reproduction process.
+
+The main advantages of these minified benchmarks are:
+
+**Faster Execution**: Minified benchmarks are quicker to run (between 10 and 15 minutes), allowing for faster iteration and validation.
+**Easier implementation**: By using MLCube, users don't need to worry about installing everything from scratch.
+**Reference preparation**: Minified benchmarks can be used as an introductory step for users interested in executing the MLCommons reference benchmarks.
+
+## List of Minified Benchmarks
+
+- [LLama 2](llama2.md)
+- [Stable Diffusion](stable-diffusion.md)
+- [3D Unet](3d-unet.md)
+- [ResNet](resnet.md)
+- [Bert](bert.md)
+- [Object Detection](object-detection.md)
diff --git a/docs/minified-benchmarks/llama2.md b/docs/minified-benchmarks/llama2.md
new file mode 100644
index 00000000..4657c217
--- /dev/null
+++ b/docs/minified-benchmarks/llama2.md
@@ -0,0 +1,90 @@
+# LLama 2
+
+The benchmark reference for LLama 2 can be found at this [link](https://github.com/mlcommons/training/tree/master/llama2_70b_lora), and here is the PR for the minified benchmark implementation: [link](https://github.com/mlcommons/training/pull/749).
+
+This video explains all the following steps:
+
+[![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/1Y9q-nltI8U/0.jpg)](https://youtu.be/1Y9q-nltI8U)
+
+## Project setup
+
+An important requirement is that you must have Docker installed.
+
+```bash
+# Create Python environment and install MLCube Docker runner
+virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0 && pip install mlcube-docker
+# Fetch the implementation from GitHub
+git clone https://github.com/mlcommons/training && cd ./training
+git fetch origin pull/749/head:feature/mlcube_llama2 && git checkout feature/mlcube_llama2
+cd ./llama2_70b_lora/mlcube
+```
+
+Inside the mlcube directory, run the following command to check the implemented tasks.
+
+```shell
+mlcube describe
+```
+
+### Extra requirements
+
+Install Rclone on your system by following [these instructions](https://rclone.org/install/).
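+
+On most Linux systems, the official install script referenced on that page is the quickest route (shown here only for convenience; the page above is the authoritative source):
+
+```bash
+# Install Rclone using the official script (requires sudo), then confirm it works
+curl https://rclone.org/install.sh | sudo bash
+rclone version
+```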
+
+MLCommons hosts the model for download exclusively by MLCommons Members. You must first agree to the [confidentiality notice](https://docs.google.com/forms/d/e/1FAIpQLSc_8VIvRmXM3I8KQaYnKf7gy27Z63BBoI_I1u02f4lw6rBp3g/viewform).
+
+After submitting the form, you will be redirected to a Drive folder containing a file called `CLI Download Instructions`; follow the instructions inside that file up to step `#3 Authenticate Rclone with Google Drive`.
+
+After finishing that step, the Rclone configuration file will contain the necessary data to download the dataset and models. To check where this file is located, run the command:
+
+```bash
+rclone config file
+```
+
+**Default:** `~/.config/rclone/rclone.conf`
+
+Finally, copy that file into the `workspace` folder located in the same path as this readme; it must have the name `rclone.conf`.
+
+### MLCube tasks
+
+* Core tasks:
+
+Download dataset.
+
+```shell
+mlcube run --task=download_data -Pdocker.build_strategy=always
+```
+
+Train.
+
+```shell
+mlcube run --task=train -Pdocker.build_strategy=always
+```
+
+* Demo tasks:
+
+Download demo dataset.
+
+```shell
+mlcube run --task=download_demo -Pdocker.build_strategy=always
+```
+
+Train demo.
+
+```shell
+mlcube run --task=demo -Pdocker.build_strategy=always
+```
+
+### Execute the complete pipeline
+
+You can execute the complete pipeline with a single command.
+
+* Core pipeline:
+
+```shell
+mlcube run --task=download_data,train -Pdocker.build_strategy=always
+```
+
+* Demo pipeline:
+
+```shell
+mlcube run --task=download_demo,demo -Pdocker.build_strategy=always
+```
diff --git a/docs/minified-benchmarks/object-detection.md b/docs/minified-benchmarks/object-detection.md
new file mode 100644
index 00000000..71a140ad
--- /dev/null
+++ b/docs/minified-benchmarks/object-detection.md
@@ -0,0 +1,69 @@
+# Object Detection (Mask R-CNN)
+
+The benchmark reference for Object Detection (Mask R-CNN) can be found at this [link](https://github.com/mlcommons/training/tree/master/retired_benchmarks/maskrcnn), and here is the PR for the minified benchmark implementation: [link](https://github.com/mlcommons/training/pull/501).
+
+### Project setup
+
+```bash
+# Create Python environment and install MLCube Docker runner
+virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0 && pip install mlcube-docker
+
+# Fetch the Object Detection workload
+git clone https://github.com/mlcommons/training && cd ./training
+git fetch origin pull/501/head:feature/object_detection && git checkout feature/object_detection
+cd ./object_detection/mlcube
+```
+
+### Dataset
+
+The COCO dataset will be downloaded and extracted. Sizes of the dataset in each step:
+
+| Dataset Step                   | MLCube Task       | Format         | Size     |
+|--------------------------------|-------------------|----------------|----------|
+| Download (Compressed dataset)  | download_data     | Tar/Zip files  | ~20.5 GB |
+| Extract (Uncompressed dataset) | download_data     | Jpg/Json files | ~21.2 GB |
+| Total                          | (After all tasks) | All            | ~41.7 GB |
+
+### Tasks execution
+
+Parameters are defined in these files:
+
+* MLCube user parameters: mlcube/workspace/parameters.yaml
+* Project user parameters: pytorch/configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
+* Project default parameters: pytorch/maskrcnn_benchmark/config/defaults.py
+
+```bash
+# Download COCO dataset. Default path = /workspace/data
+mlcube run --task=download_data -Pdocker.build_strategy=always
+
+# Run benchmark.
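+# Hyperparameters for this run are read from the files listed above,
+# e.g. mlcube/workspace/parameters.yaml; adjust them there before training if needed.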
+# Default paths = ./workspace/data
+mlcube run --task=train -Pdocker.build_strategy=always
+```
+
+### Demo execution
+
+These tasks will use a demo dataset (39 MB) to execute a faster training workload for a quick demo (~12 min):
+
+```bash
+# Download subsampled dataset. Default path = /workspace/demo
+mlcube run --task=download_demo -Pdocker.build_strategy=always
+
+# Run benchmark. Default paths = ./workspace/demo and ./workspace/demo_output
+mlcube run --task=demo -Pdocker.build_strategy=always
+```
+
+It's also possible to execute the two tasks in a single instruction:
+
+```bash
+mlcube run --task=download_demo,demo -Pdocker.build_strategy=always
+```
+
+### Additional options
+
+Parameters defined in **mlcube/mlcube.yaml** can be overridden using: `--param=input`
+
+We are targeting pull-type installation, so MLCube images should be available on Docker Hub. If not, try this:
+
+```bash
+mlcube run ... -Pdocker.build_strategy=always
+```
diff --git a/docs/minified-benchmarks/resnet.md b/docs/minified-benchmarks/resnet.md
new file mode 100644
index 00000000..b36ee8c5
--- /dev/null
+++ b/docs/minified-benchmarks/resnet.md
@@ -0,0 +1,51 @@
+# ResNet
+
+The benchmark reference for ResNet can be found at this [link](https://github.com/mlcommons/training/tree/master/retired_benchmarks/resnet-tf2), and here is the PR for the minified benchmark implementation: [link](https://github.com/mlcommons/training/pull/686).
+
+## Project setup
+
+```bash
+# Create Python environment and install MLCube Docker runner
+virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0 && pip install mlcube-docker
+
+# Fetch the implementation from GitHub
+git clone https://github.com/mlcommons/training && cd ./training/image_classification
+git fetch origin pull/686/head:feature/resnet_mlcube && git checkout feature/resnet_mlcube
+```
+
+Go to the mlcube directory and check which tasks MLCube implements.
+
+```shell
+cd ./mlcube
+mlcube describe
+```
+
+### MLCube tasks
+
+To use the entire [ImageNet](https://image-net.org/) dataset, you will need to download it and place it in the workspace under the mlcube folder; then you can use the following tasks:
+
+Process dataset.
+
+```shell
+mlcube run --task=process_data -Pdocker.build_strategy=always
+```
+
+Train ResNet.
+
+```shell
+mlcube run --task=train -Pdocker.build_strategy=always
+```
+
+Run compliance checker.
+
+```shell
+mlcube run --task=check_logs -Pdocker.build_strategy=always
+```
+
+### Running a small demo
+
+To download the subsampled dataset and run the demo, use the following command:
+
+```shell
+mlcube run --task=download_demo,demo -Pdocker.build_strategy=always
+```
diff --git a/docs/minified-benchmarks/stable-diffusion.md b/docs/minified-benchmarks/stable-diffusion.md
new file mode 100644
index 00000000..0cfe18bd
--- /dev/null
+++ b/docs/minified-benchmarks/stable-diffusion.md
@@ -0,0 +1,89 @@
+# Stable Diffusion
+
+The benchmark reference for Stable Diffusion can be found at this [link](https://github.com/mlcommons/training/tree/master/stable_diffusion), and here is the PR for the minified benchmark implementation: [link](https://github.com/mlcommons/training/pull/696).
+
+This video explains all the following steps:
+
+[![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/Aa__68lX9Ks/0.jpg)](https://youtu.be/Aa__68lX9Ks)
+
+## Project setup
+
+An important requirement is that you must have Docker installed.
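+
+Because this benchmark is GPU-heavy (the demo pipeline below was tested on an A100 40 GB), it can be worth confirming that Docker can see your GPU first. This check assumes the NVIDIA Container Toolkit is installed, and the CUDA image tag is only an example that may need adjusting:
+
+```bash
+# Check that the GPU is visible on the host and from inside a container
+nvidia-smi
+docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
+```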
+
+```bash
+# Create Python environment and install MLCube Docker runner
+virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0
+pip install mlcube-docker
+# Fetch the implementation from GitHub
+git clone https://github.com/mlcommons/training && cd ./training
+git fetch origin pull/696/head:feature/mlcube_sd && git checkout feature/mlcube_sd
+cd ./stable_diffusion/mlcube
+```
+
+Inside the mlcube directory, run the following command to check the implemented tasks.
+
+```shell
+mlcube describe
+```
+
+### MLCube tasks
+
+* Core tasks:
+
+Download dataset.
+
+```shell
+mlcube run --task=download_data
+```
+
+Download models.
+
+```shell
+mlcube run --task=download_models
+```
+
+Train.
+
+```shell
+mlcube run --task=train
+```
+
+* Demo tasks:
+
+Download demo dataset.
+
+```shell
+mlcube run --task=download_demo
+```
+
+Download models.
+
+```shell
+mlcube run --task=download_models
+```
+
+Train demo.
+
+```shell
+mlcube run --task=demo
+```
+
+### Execute the complete pipeline
+
+You can execute the complete pipeline with a single command.
+
+* Core pipeline:
+
+```shell
+mlcube run --task=download_data,download_models,train
+```
+
+* Demo pipeline:
+
+Tested on an Nvidia A100 (40 GB).
+
+```shell
+mlcube run --task=download_demo,download_models,demo
+```
+
+**Note**: To rebuild the image, use the flag `-Pdocker.build_strategy=always` with the `mlcube run` command.
diff --git a/mkdocs.yml b/mkdocs.yml
index af18e183..a0f2e3c9 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -26,6 +26,14 @@ nav:
     - Kubernetes Runner: runners/kubernetes.md
     - Kubeflow Runner: runners/kubeflow.md
     - GCP Runner: runners/gcp-runner.md
+    - Minified Benchmarks:
+      - Introduction: minified-benchmarks/introduction.md
+      - LLama 2: minified-benchmarks/llama2.md
+      - Stable Diffusion: minified-benchmarks/stable-diffusion.md
+      - 3D Unet: minified-benchmarks/3d-unet.md
+      - ResNet: minified-benchmarks/resnet.md
+      - Bert: minified-benchmarks/bert.md
+      - Object Detection: minified-benchmarks/object-detection.md
 
 theme:
   features:

From cebad1c7ceec822953e7d4686b4778b77a3058e9 Mon Sep 17 00:00:00 2001
From: David Jurado
Date: Fri, 2 Aug 2024 11:47:27 -0500
Subject: [PATCH 2/2] add minified benchmarks documentation

---
 docs/minified-benchmarks/introduction.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/minified-benchmarks/introduction.md b/docs/minified-benchmarks/introduction.md
index 39396a97..23d0e96f 100644
--- a/docs/minified-benchmarks/introduction.md
+++ b/docs/minified-benchmarks/introduction.md
@@ -6,9 +6,9 @@ A minified benchmark is a reduced version of an MLCommons training benchmark desi
 
 The main advantages of these minified benchmarks are:
 
-**Faster Execution**: Minified benchmarks are quicker to run (between 10 and 15 minutes), allowing for faster iteration and validation.
-**Easier implementation**: By using MLCube, users don't need to worry about installing everything from scratch.
-**Reference preparation**: Minified benchmarks can be used as an introductory step for users interested in executing the MLCommons reference benchmarks.
+- **Faster Execution**: Minified benchmarks are quicker to run (between 10 and 15 minutes), allowing for faster iteration and validation.
+- **Easier implementation**: By using MLCube, users don't need to worry about installing everything from scratch.
+- **Reference preparation**: Minified benchmarks can be used as an introductory step for users interested in executing the MLCommons reference benchmarks.
 
 ## List of Minified Benchmarks