Apply suggestions from code review
Co-authored-by: Kenneth Hoste <kenneth.hoste@ugent.be>
xerbalind and boegel authored Aug 14, 2023
1 parent 42d8543 commit fa72b4f
Showing 1 changed file with 26 additions and 24 deletions.
50 changes: 26 additions & 24 deletions mkdocs/docs/HPC/getting_started.md
@@ -2,13 +2,13 @@

Welcome to the "Getting Started" guide. This chapter will lead you through the initial steps of logging into the {{hpcinfra}} and submitting your very first job. We'll also walk you through the process step by step using a practical example.

In addition to this chapter, you might find this [video](https://www.ugent.be/hpc/en/training/introhpcugent-recording) to be a useful resource.
In addition to this chapter, you might find the [recording of the *Introduction to HPC-UGent* training session](https://www.ugent.be/hpc/en/training/introhpcugent-recording) to be a useful resource.

Before proceeding, read [the introduction to HPC](introduction.md) to gain an understanding of the {{ hpcinfra }} and its terminology.
Before proceeding, read [the introduction to HPC](introduction.md) to gain an understanding of the {{ hpcinfra }} and related terminology.

### Getting Access

To get access, visit [Getting an HPC Account](account.md).
To get access to the {{hpcinfra}}, visit [Getting an HPC Account](account.md).

If you have not used Linux before,
{%- if site == 'Gent' %}
@@ -27,22 +27,22 @@ please learn some basics first before continuing. (see [Appendix C - Useful Linu
6. Study the results generated by your jobs, either on the cluster or
after downloading them locally.

We will walk through an illustrative workload to get you started. In this example, our objective is to train a deep learning model for recognizing hand-written digits (MNIST dataset).
See the [example](https://github.com/hpcugent/vsc_user_docs/tree/main/examples/tensorflow_mnist).
We will walk through an illustrative workload to get you started. In this example, our objective is to train a deep learning model for recognizing hand-written digits (MNIST dataset) using [TensorFlow](https://www.tensorflow.org/);
see the [example scripts](https://github.com/hpcugent/vsc_user_docs/tree/main/examples/tensorflow_mnist).

### Getting Connected

There are two options to connect:

- Using a terminal to connect via SSH (for power users) (see [First Time connection to the {{ hpcinfra}}](connecting.md#first-time-connection-to-the-hpc-infrastructure))
- [Using the HPC-Ugent web portal](web_portal.md#using-the-hpc-ugent-web-portal)
- [Using the web portal](web_portal.md#using-the-hpc-ugent-web-portal)

Considering your operating system is **{{OS}}**,

{%- if OS == linux %}
it's recommended to make use of a terminal with ssh to get the most flexibility.
it's recommended to make use of the `ssh` command in a terminal to get the most flexibility.

Assuming you've already generated SSH keys in the previous step ([Getting Access](#getting-access)), you should now be able to log in by running the following command:
Assuming you have already generated SSH keys in the previous step ([Getting Access](#getting-access)), and that they are in a default location, you should now be able to log in by running the following command:

<pre><code>ssh {{userid}}@{{loginnode}}</code></pre>
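
If your key is not in a default location, `ssh` can be pointed at it explicitly with its `-i` option. The sketch below is illustrative only: the key path `~/.ssh/id_rsa_vsc` is a hypothetical name, and `<userid>` and `<loginnode>` stand for the values used in the command above.

```shell
#!/bin/sh
# Hypothetical location of the private key; adjust to wherever yours is stored.
key="$HOME/.ssh/id_rsa_vsc"

if [ -f "$key" ]; then
    # -i selects the identity (private key) file used for authentication.
    echo "Key found, connect with: ssh -i $key <userid>@<loginnode>"
else
    echo "No key found at $key; check the path or generate a key first."
fi
```

The script only prints the command instead of running it, so it is safe to try on any machine before connecting for real.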

@@ -56,14 +56,14 @@ Assuming you've already generated SSH keys in the previous step ([Getting Acces

{%- else %}
{%- if OS == windows %} it's recommended to use the web portal.
{%- else %} it should be easy to make use of a terminal with ssh, but the web portal will to the trick too: {%- endif %}
{%- else %} it should be easy to make use of the `ssh` command in a terminal, but the web portal will work too: {%- endif %}

This platform offers a convenient way to upload files and gain shell access to the {{hpcinfra}} from a standard web browser (no software installation of configuration required).
This platform offers a convenient way to upload files and gain shell access to the {{hpcinfra}} from a standard web browser (no software installation or configuration required).

See [shell access](web_portal.md#shell-access) when using the web portal, or
[connection to the {{hpcinfra}}](connecting.md#first-time-connection-to-the-hpc-infrastructure) when using a terminal.

Make sure you can get to a shell before proceeding with the next steps.
Make sure you can get shell access to the {{hpcinfra}} before proceeding with the next steps.

{%- endif %}

@@ -74,17 +74,17 @@ Make sure you can get to a shell before proceeding with the next steps.

### Transfer your files

Now that you can log in, it's time to transfer files from your local computer to your **Home Directory** on a cluster.
Now that you can log in, it's time to transfer files from your local computer to your **home directory** on the {{hpcinfra}}.

Download [tensorflow_mnist.py](https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/examples/tensorflow_mnist/tensorflow_mnist.py)
and [run.sh](https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/examples/tensorflow_mnist/run.sh) from the [example](https://github.com/hpcugent/vsc_user_docs/tree/main/examples/tensorflow_mnist) to your computer.
and [run.sh](https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/examples/tensorflow_mnist/run.sh) example scripts to your computer (from [here](https://github.com/hpcugent/vsc_user_docs/tree/main/examples/tensorflow_mnist)).
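
If you prefer the command line, the same two files can be fetched with `curl`, assuming it is installed on your computer. This is only a sketch: the URLs are the ones linked above, and the timeout values are arbitrary choices.

```shell
#!/bin/sh
# Base URL of the example scripts (taken from the links above).
base="https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/examples/tensorflow_mnist"

for f in tensorflow_mnist.py run.sh; do
    # -f: fail on HTTP errors, -sS: quiet but still show errors,
    # -L: follow redirects, -O: save under the remote file name.
    curl -fsSL -O --connect-timeout 10 --max-time 60 "${base}/${f}" \
        || echo "Could not download ${f}; use the links above instead."
done
```

Both files end up in the directory where the script is run, ready to be transferred to the {{hpcinfra}}.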

{%- if OS == windows %}

The [HPC-Ugent web portal](https://login.hpc.ugent.be) provides a file browser that allows uploading files.
The [HPC-UGent web portal](https://login.hpc.ugent.be) provides a file browser that allows uploading files.
For more information see the [file browser section](web_portal.md#file-browser).

Upload both files (`run.sh` and `tensorflow_mnist.py`) to your **Home Directory** and go back to your shell.
Upload both files (`run.sh` and `tensorflow_mnist.py`) to your **home directory** and go back to your shell.

!!! Info

@@ -116,7 +116,7 @@ run.sh tensorflow_mnist.py

!!! Warning

When you do not see these files, make sure you uploaded the files to your **Home Directory**.
When you do not see these files, make sure you uploaded the files to your **home directory**.

### Submitting a job

@@ -138,7 +138,7 @@ module load TensorFlow/2.11.0-foss-2022a
time python tensorflow_mnist.py

```
<sub>As you can see this job script will run the Python script: **tensorflow_mnist.py**</sub>
<sub>As you can see this job script will run the Python script named **tensorflow_mnist.py**.</sub>


The jobs you submit are by default executed on **cluster/{{defaultcluster}}**; you can swap to another cluster by issuing the following command.
@@ -168,18 +168,18 @@ This command returns a job identifier (*{{jobid}}*) on the HPC cluster. This is

!!! Warning

Please take note that the module commands exclusively modify environment variables. For instance, using `module swap cluster/{{othercluster}}` will instruct `qsub` to submit the job to the {{othercluster}} cluster,
Note that the module commands only modify environment variables. For instance, running `module swap cluster/{{othercluster}}` will update your shell environment so that `qsub` submits a job to the {{othercluster}} cluster,
but our active shell session is still running on the login node.

It is important to understand that while `module` commands affect your session environment, they do <b style="color:orange">not</b> change where the commands you are running are being executed: they will still be run on the login node you are on.
It is important to understand that while `module` commands affect your session environment, they do ***not*** change where the commands you are running are being executed: they will still be run on the login node you are on.

But when submitting a job script, the commands <b style="color:orange">in</b> the job script will be run on a workernode of the cluster the job was submitted to (like `{{othercluster}}`).
When you submit a job script, however, the commands ***in*** the job script will be run on a workernode of the cluster the job was submitted to (like `{{othercluster}}`).

For detailed information about `module` commands, read the [running batch jobs](running_batch_jobs.md) chapter.

### Wait for job to be executed

Your job is first put into a queue before being executed, so it can take a while.
Your job is put into a queue before being executed, so it may take a while before it actually starts.
(see [when will my job start?](running_batch_jobs.md#when-will-my-job-start) for scheduling policy)

You can get an overview of the active jobs using the `qstat` command:
@@ -219,8 +219,8 @@ By default located in the directory where you issued `qsub`.

In our example when running <code>ls</code> in the current directory you should see 2 new files:

- **run.sh.o{{jobid}}**: normal output messages
- **run.sh.e{{jobid}}**: error and warning messages
- **run.sh.o{{jobid}}**, containing normal output messages produced by job {{jobid}};
- **run.sh.e{{jobid}}**, containing error and warning messages produced by job {{jobid}}.
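
To look at both files from the shell, something along these lines should work. The job ID `123456` is a made-up placeholder; substitute the identifier that `qsub` printed for your job.

```shell
#!/bin/sh
# Hypothetical job ID; replace it with the identifier returned by qsub.
jobid=123456
out_file="run.sh.o${jobid}"   # normal output messages
err_file="run.sh.e${jobid}"   # error and warning messages

for f in "$out_file" "$err_file"; do
    if [ -f "$f" ]; then
        echo "== $f =="
        cat "$f"
    else
        # The files may not exist while the job is still queued or running.
        echo "$f not found in the current directory"
    fi
done
```

Run it from the directory where you issued `qsub`, since that is where the files are written by default.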

!!! Info

@@ -248,7 +248,9 @@ Hurray 🎉, we trained a deep learning model and achieved 97,64 percent accurac

!!! Warning

You should run tensorflow calculations on GPUs on a GPU cluster for better performance, see [GPU clusters](gpu.md).
When using TensorFlow specifically, you should actually submit jobs to a GPU cluster for better performance, see [GPU clusters](gpu.md).

For the purpose of this example, we are running a very small TensorFlow workload on a CPU-only cluster.

### Next steps
