General Instructions for DL Workspace Deployment

These are the general steps to deploy a DL Workspace cluster.

  1. [Run Once] Set up the development environment.

  2. Configure the cluster, and determine its key parameters (e.g., cluster name, number of Etcd servers used). Please refer to Backup/Restore for instructions on backing up and restoring the cluster configuration.

  3. Configure and set up the database used in the cluster.

  4. Configure the shared file system to be used in the cluster, following the instructions in Storage.md and the configuration.

  5. [Build]: build credentials.

    python deploy.py -y build
  1. Deploy the base OS via USB, a PXE server, on Azure, or on a private Philly cluster.

    • If you would like to deploy a small cluster for testing, or your cluster doesn't have a VLAN set up, we recommend the deployment procedure in USB.md.
    • If you would like to deploy a production cluster, we recommend setting up a VLAN for your cluster and using a PXE server. The procedure is described in PXEServer.md.
    • If you would like to deploy a cluster on Azure, please follow the procedure in Azure.
    • If you are using a private CoreOS cluster, please follow the procedure in CoreOS. If you plan to install only a small number of machines (say 1-2), you may use an ISO image. For any larger cluster, a PXE server is highly recommended.
  2. Start the master and etcd servers. Use the '-public' option if you run the command from inside a firewall while the cluster is public (e.g., on Azure or AWS).

    deploy.py -y deploy
    
  3. Start the worker nodes. Use the '-public' option if you run the command from inside a firewall while the cluster is public (e.g., on Azure or AWS).

    deploy.py -y updateworker
    

    If you stop here, you will have a fully functional Kubernetes cluster. Part of the DL Workspace setup can therefore be seen as an automated procedure for setting up a Kubernetes cluster. You don't need a shared file system or a database for Kubernetes cluster operation.
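One way to confirm that the cluster is up at this point is to count the nodes that report Ready. The sketch below assumes that `kubectl` is installed and configured against the new cluster on the machine where you run it, which this guide does not state. The helper names `ready_nodes` and `query_ready_nodes` are illustrative and are not part of deploy.py.

```python
import subprocess
from typing import List


def ready_nodes(kubectl_output: str) -> List[str]:
    """Return the names of nodes whose STATUS column contains 'Ready'.

    Expects the text printed by `kubectl get nodes`, where the first
    column is the node name and the second is a comma-separated status
    (for example 'Ready', 'NotReady', or 'Ready,SchedulingDisabled').
    """
    names = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        if "Ready" in fields[1].split(","):
            names.append(fields[0])
    return names


def query_ready_nodes() -> List[str]:
    """Run `kubectl get nodes` (assumed to be on PATH) and parse its output."""
    result = subprocess.run(
        ["kubectl", "get", "nodes"],
        check=True,
        capture_output=True,
        text=True,
    )
    return ready_nodes(result.stdout)
```

Calling `query_ready_nodes()` after the master and worker nodes have started should list every node you expect; a shorter list suggests that a node has not joined yet.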

  4. Static IP: Static IPs or DNS names are strongly recommended for the master and Etcd servers, especially if you want High Availability (HA) operation. Please contact your IT department to set up static IPs for the master and Etcd servers. With static IPs, DL Workspace can operate without interruption.

    Otherwise, each time the master and Etcd servers are rebooted, they may obtain new IP addresses, and you will need to restart the master, etcd, and worker nodes by repeating the two start steps (deploy.py -y deploy, then deploy.py -y updateworker).
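If static IPs are not available, a small check can tell you when the restart described above is needed. This is a minimal sketch, assuming you keep your own record of the last known address for each master and Etcd host. The function name `changed_hosts` and the host names in the usage note are hypothetical and do not come from DL Workspace.

```python
import socket
from typing import Callable, Dict, List


def changed_hosts(
    recorded: Dict[str, str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return the hosts whose current address differs from the recorded one.

    `recorded` maps a host name to the IPv4 address seen at deployment
    time. A host that can no longer be resolved is also reported.
    """
    changed = []
    for host, old_ip in sorted(recorded.items()):
        try:
            current = resolve(host)
        except OSError:
            # socket.gaierror is a subclass of OSError: resolution failed.
            current = None
        if current != old_ip:
            changed.append(host)
    return changed
```

For example, `changed_hosts({"master01": "10.0.0.1", "etcd01": "10.0.0.2"})` returns an empty list while both names still resolve to the recorded addresses. A non-empty result means the master, etcd, and worker nodes should be restarted.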

  5. Set the hostname of the cluster.

    deploy.py -y hostname set
    
  6. Label the nodes, so that the DL Workspace services can be deployed to the proper set of nodes.

    deploy.py -y kubernetes labels
    
  7. Build and mount the shared file system.

  8. Build and push the Docker images used by the cluster, then start the webUI service.

    deploy.py webui
    deploy.py docker push restfulapi
    deploy.py docker push webui
    deploy.py -y kubernetes start webportal
    deploy.py -y kubernetes start restfulapi
    deploy.py -y kubernetes start jobmanager
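The six commands above have to run in order, and a failed build or push should stop the sequence before any service is started. The following is a minimal sketch of such a wrapper, not part of DL Workspace. It assumes that deploy.py is in the current directory and is invoked through `python`, as in the build step; the names `WEBUI_STEPS`, `build_commands`, and `run_all` are illustrative.

```python
import subprocess
from typing import List, Sequence

# Arguments of the six deploy.py invocations from this step, in order.
WEBUI_STEPS = [
    ["webui"],
    ["docker", "push", "restfulapi"],
    ["docker", "push", "webui"],
    ["-y", "kubernetes", "start", "webportal"],
    ["-y", "kubernetes", "start", "restfulapi"],
    ["-y", "kubernetes", "start", "jobmanager"],
]


def build_commands(
    steps: Sequence[Sequence[str]], script: str = "deploy.py"
) -> List[List[str]]:
    """Turn each argument list into a full `python deploy.py ...` command."""
    return [["python", script, *args] for args in steps]


def run_all(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it.

    check=True raises CalledProcessError on a non-zero exit code, so the
    sequence stops at the first failing command.
    """
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands(WEBUI_STEPS))` only prints the commands, which is a safe way to review them; pass `dry_run=False` to execute them.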
    
  9. If you encounter problems, please check the known issues.