On-prem Single-Ubuntu #193

Open
shizhaojingszj opened this issue Mar 31, 2018 · 6 comments

@shizhaojingszj

shizhaojingszj commented Mar 31, 2018

Hi guys, I want to test DLWorkspace on my Ubuntu server WITHOUT connecting to any Azure cluster.

So I am following the SingleUbuntu on-prem document. Unfortunately, I am stuck at the last line.

./deploy.py --verbose scriptblocks ubuntu_uncordon

My Ubuntu server has the same setup as indicated in the document:

  1. Ubuntu 16.04 x64

Things I have done:

  1. I ran src/ClusterBootstrap/install_prerequisites.sh, as mentioned in DevEnvironment/Readme.md.
  2. I manually installed docker-ce, built a new GPU-favored Kubernetes binary, and added its folder to PATH, also as described in DevEnvironment/Readme.md.
  3. I created an SSH key pair and manually put the keys inside the ./src/ClusterBootstrap/deploy/sshkey folder.
  4. I set up the mssqlserver docker container on my Ubuntu server (docker-compose run db), with the port and authentication shown in my config.yaml. I am assuming DLWorkspace will find the database and set up everything it needs.
  5. I set up an email account at outlook.com and put its details inside my config.yaml, following Auth.

Now my whole config.yaml file looks like this:
image

Now I keep getting the error shown below:
image

It looks like the Kubernetes cluster is not running. What should I do to get it working on my own server?

@shizhaojingszj
Author

Several things to clarify:

  1. My Ubuntu server has the domain name GPU2.
  2. cluster_name and clusterId are randomly chosen.

@jinlccs
Contributor

jinlccs commented Apr 3, 2018

Please uncomment:

useclusterfile : true

and try again.
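
As a rough sketch of what "uncomment" means here, assuming config.yaml holds the setting as a flat `key : value` line that is currently prefixed with `#`: the hypothetical helper below (not part of DLWorkspace) reports whether a key is active, commented out, or missing. It uses plain string matching rather than a YAML parser, so nested keys are not handled, and the sample config text is invented for illustration.

```python
import re


def key_state(config_text, key):
    """Return 'active', 'commented', or 'missing' for a flat config key.

    Hypothetical helper for illustration only; it scans `key : value`
    lines and does not parse YAML.
    """
    active = re.compile(r"^\s*" + re.escape(key) + r"\s*:")
    commented = re.compile(r"^\s*#+\s*" + re.escape(key) + r"\s*:")
    state = "missing"
    for line in config_text.splitlines():
        if active.match(line):
            return "active"  # an uncommented line wins immediately
        if commented.match(line):
            state = "commented"  # remember it, but keep looking
    return state


# Invented sample text; "example" is a placeholder value.
before = "cluster_name: example\n# useclusterfile : true\n"
after = "cluster_name: example\nuseclusterfile : true\n"

print(key_state(before, "useclusterfile"))
print(key_state(after, "useclusterfile"))
```

Running a check like this against config.yaml before calling deploy.py is one way to confirm the edit was actually saved.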

@shizhaojingszj
Author

shizhaojingszj commented Apr 5, 2018

Thanks for your suggestion. I have uncommented the line and tried again, but now I get a different error: a "curl error", as shown in this gist.

Besides this curl error, I also found that my docker service was no longer working after running deploy.py. When I tried to start it manually, the error message was "Failed to start docker.service: Unit flanneld.service not found."

I cannot get the docker service to work again unless I reboot the Ubuntu server. For the run in the gist mentioned above, I'm pretty sure my docker service was running before I ran deploy.py.

Any suggestions?

@hongzhili
Member

@shizhaojingszj
Please make sure the flannel service is not required by the docker service.
BTW, which version of the code are you using? It seems your code is not up to date.
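
A minimal sketch of how one might check this, assuming the dependency comes from a systemd unit file or drop-in for docker.service (the thread does not confirm where it is set). `systemctl cat docker.service` prints the unit file together with any drop-ins; the hypothetical helper below scans such text for dependency directives that name flanneld.service. The sample unit text is invented for illustration and is not taken from the reporter's machine.

```python
# systemd [Unit] directives that tie one unit to another.
DEPENDENCY_KEYS = ("Requires", "Requisite", "BindsTo", "PartOf", "Wants", "After")


def find_unit_references(unit_text, unit_name):
    """Return (directive, line) pairs in unit_text that mention unit_name."""
    hits = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() in DEPENDENCY_KEYS and unit_name in value.split():
            hits.append((key.strip(), line))
    return hits


# Made-up unit text, e.g. what `systemctl cat docker.service` might print.
sample = """[Unit]
Description=Docker Application Container Engine
After=network.target flanneld.service
Requires=flanneld.service

[Service]
ExecStart=/usr/bin/dockerd
"""

for directive, line in find_unit_references(sample, "flanneld.service"):
    print(directive, "->", line)
```

If a directive like this turns up, editing or removing the file that contains it and then running `systemctl daemon-reload` makes systemd re-read the unit. Whether deploy.py writes the dependency back on its next run is not established in this thread.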

@shizhaojingszj
Author

shizhaojingszj commented Apr 15, 2018

MY BIG MISTAKE: I thought I was using this repo, but I was actually using the master branch of another repo. I'm looking forward to retrying the whole thing once the following problem is fixed.

"Please make sure the flannel service is not required by docker service."

**I don't know how to restore my docker service so that it no longer has flanneld as a dependency. Could you explain in more detail?**

  1. I have tried several times to reinstall docker-ce, but it failed every time.
    image

When I try to restart the docker service, I keep getting this:
image

After a reboot, docker run --rm -it hello-world works fine, but docker fails again after I run deploy.py.

@shizhaojingszj
Author

https://gist.github.com/shizhaojingszj/00ac3f7d57ce0b0aaf1886a906f5185c

The gist above shows my docker info output.
