Merge pull request #412 from uvarc/staging
Staging
rsdmse authored Aug 23, 2023
2 parents bc51f3d + d5fc089 commit 22509c3
Showing 10 changed files with 10 additions and 45 deletions.
2 changes: 1 addition & 1 deletion content/userinfo/computing-environments.md
@@ -12,7 +12,7 @@ aliases = [ "/facilities" ]

# Rivanna

- The primary vehicle for high-performance computing since 2014 has been the Rivanna cluster. Rivanna is a heterogeneous system with a total of {{< rivanna-node-count >}} nodes and {{< rivanna-core-count >}} CPU cores. It consists of 527 nodes with 20-40 cores and 128-768GB of RAM each, 11 large-memory nodes with 16-48 cores and 1-1.5TB of RAM each, and 34 nodes with a total of 228 NVIDIA GPU accelerators (K80, P100, V100, A100, RTX2080Ti, RTX3090). These nodes are partitioned for various types of workloads, including development, parallel, HTC, and instructional partitions. All nodes are connected by a high-performance EDR/FDR InfiniBand network using Mellanox hardware. The Rivanna cluster also provides approximately {{< rivanna-scratch-capacity >}} of scratch (temporary) storage on a high-speed Lustre filesystem. Users may also lease space on “Research Project” and “Research Standard” storage that is mounted on Rivanna, as well as elsewhere.
+ The primary vehicle for high-performance computing since 2014 has been the Rivanna cluster. Rivanna is a heterogeneous system with a total of {{< rivanna-node-count >}} nodes and {{< rivanna-core-count >}} CPU cores. It consists of 527 nodes with 20-40 cores and 128-768GB of RAM each, 11 large-memory nodes with 16-48 cores and 1-1.5TB of RAM each, and 22 nodes with a total of 148 NVIDIA GPU accelerators (A100, V100, RTX3090, RTX2080Ti). These nodes are partitioned for various types of workloads, including development, parallel, HTC, and instructional partitions. All nodes are connected by a high-performance EDR/FDR InfiniBand network using Mellanox hardware. The Rivanna cluster also provides approximately {{< rivanna-scratch-capacity >}} of scratch (temporary) storage on a high-speed Lustre filesystem. Users may also lease space on “Research Project” and “Research Standard” storage that is mounted on Rivanna, as well as elsewhere.

Rivanna is allocated by service units (SUs) and is managed under the “hotel” model, in which researchers buy SUs rather than physical nodes. Service units generally correspond to core-hours, but for very large memory jobs users must also pay, at least in part, for cores that the resource manager leaves unscheduled on the node because of the additional memory usage. We have also begun implementing a partial “condo” model, in which researchers can purchase time tied to cores or purchase their own hardware to add to the cluster. This model coexists with the SU model for users who do not wish to make a large expenditure for time, or who can manage their projects through free allocations from RC or from their Dean’s office.
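For example, taking the core-hour correspondence at face value, a job running on 8 cores for 24 hours would consume roughly 8 × 24 = 192 SUs.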

4 changes: 2 additions & 2 deletions content/userinfo/rivanna/slurm.md
@@ -486,15 +486,15 @@ The following example runs a total of 32 MPI processes, 8 on each node, with eac

## GPU Computations

- The `gpu` queue provides access to compute nodes equipped with RTX2080Ti, RTX3090, K80, P100, V100, and A100 NVIDIA GPU devices.
+ The `gpu` queue provides access to compute nodes equipped with RTX2080Ti, RTX3090, V100, and A100 NVIDIA GPU devices.

{{< highlight >}}
In order to use GPU devices, jobs must be submitted to the <b>gpu</b> partition and must include the <b>--gres=gpu</b> option.
{{< /highlight >}}

{{< pull-code file="/static/scripts/gpu_job.slurm" lang="no-hightlight" >}}

- The second argument to `gres` can be `rtx2080`, `rtx3090`, `k80`, `p100`, `v100`, or `a100` for the different GPU architectures. The third argument to `gres` specifies the number of devices to be requested. If unspecified, the job will run on the first available GPU node with a single GPU device regardless of architecture.
+ The second argument to `gres` can be `rtx2080`, `rtx3090`, `v100`, or `a100` for the different GPU architectures. The third argument to `gres` specifies the number of devices to be requested. If unspecified, the job will run on the first available GPU node with a single GPU device regardless of architecture.
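For illustration, a request for two devices of one architecture using this syntax might look like the following (the `a100` type and the count of 2 are arbitrary choices for the example):

```
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:2   # type (second argument) and device count (third argument)
```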

### NVIDIA GPU BasePOD™ Now Available for Rivanna Users

2 changes: 1 addition & 1 deletion content/userinfo/rivanna/software/julia.md
@@ -263,7 +263,7 @@ Hello! I am 0 of 8 on udc-ba34-10c8

[General guidelines on requesting GPUs on Rivanna](https://www.rc.virginia.edu/userinfo/rivanna/slurm/#gpu-intensive-computation)

- The following slurm script is for submitting a Julia job that uses 1 of the K80 GPUs. For each GPU requested, the script requests one CPU (`ntasks-per-node`). The article [An Introduction to GPU Programming in Julia](https://nextjournal.com/sdanisch/julia-gpu-programming) provides more details to get started.
+ The following slurm script is for submitting a Julia job that uses 1 GPU. For each GPU requested, the script requests one CPU (`ntasks-per-node`). The article [An Introduction to GPU Programming in Julia](https://nextjournal.com/sdanisch/julia-gpu-programming) provides more details to get started.

{{< pull-code file="/static/scripts/julia_gpu.slurm" lang="no-hightlight" >}}
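The pulled-in script is not rendered on this page. As a rough sketch, assuming the same request pattern as the generic GPU script above and the modules loaded at the end of this diff, it would look approximately like:

```
#!/bin/bash
#SBATCH -A mygroup                # replace with your allocation account
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1              # one GPU device
#SBATCH --ntasks-per-node=1       # one CPU per requested GPU
#SBATCH --time=12:00:00

module purge
module load julia/1.5.0 cuda/10.2.89

julia gpuTest1.jl
```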

2 changes: 1 addition & 1 deletion content/userinfo/rivanna/software/matlab.md
@@ -334,7 +334,7 @@ end

Once your job has been granted its allocated GPUs, you can use the `gpuDevice` function to initialize a specific GPU for use with Matlab functions that can take advantage of GPU architectures. For more information, see the [MathWorks documentation](https://www.mathworks.com/help/parallel-computing/gpu-computing-in-matlab.html) on GPU Computing in Matlab.

- The following slurm script is for submitting a Matlab job that uses 4 of the K80 GPUs in a `parfor` loop. For each GPU requested, the script requests one CPU (`ntasks-per-node`).
+ The following slurm script is for submitting a Matlab job that uses 4 GPUs in a `parfor` loop. For each GPU requested, the script requests one CPU (`ntasks-per-node`).

{{< pull-code file="/static/scripts/matlab_gpu.slurm" lang="no-hightlight" >}}
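Again the script itself is pulled in rather than rendered; a minimal sketch of its resource request, pairing one task with each GPU as described above:

```
#SBATCH --partition=gpu
#SBATCH --gres=gpu:4          # four GPU devices for the parfor workers
#SBATCH --ntasks-per-node=4   # one CPU per requested GPU
```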

4 changes: 1 addition & 3 deletions content/userinfo/rivanna/software/nvhpc.md
@@ -32,9 +32,7 @@ Please use the following values when compiling CUDA code on Rivanna.

| Type | GPU | Architecture | Compute Capability | CUDA Version |
| --- | --- | --- | --- | --- |
- | Data Center |K80 | Kepler | 3.7 | 5 - 11 |
- | |P100 | Pascal | 6.0 | 8+ |
- | |V100 | Volta | 7.0 | 9+ |
+ | Data Center |V100 | Volta | 7.0 | 9+ |
| |A100 | Ampere | 8.0 | 11+ |
| GeForce |RTX2080Ti | Turing | 7.5 | 10+ |
| |RTX3090 | Ampere | 8.6 | 11+ |
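For example, code targeting the V100 (compute capability 7.0) would be built for the sm_70 architecture. The following compile line is illustrative only, with a hypothetical source file:

```
# illustrative nvcc invocation for compute capability 7.0 (V100)
nvcc -gencode arch=compute_70,code=sm_70 saxpy.cu -o saxpy
```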
16 changes: 1 addition & 15 deletions content/userinfo/rivanna/software/pytorch.md
@@ -35,20 +35,6 @@ module spider {{% module-firstversion %}}

{{< module-versions >}}

- ## Compatibility Issues
-
- ### A100
- Versions 1.6 and older are not compatible with the A100 GPU. Deprecated containers are hosted in `/share/resources/containers/singularity/archive`. You may continue to use them on other GPUs by excluding the A100 via the Slurm option
- {{< code-snippet >}}
- -x udc-an28-[1,7],udc-an34-[1,7,13,19],udc-an36-[1,13,19],udc-an37-[1,7,13,19]
- {{< /code-snippet >}}
-
- ### K80
- Version 1.8.1 is not compatible with the K80 GPU. You may use it on other GPUs by excluding all K80s via the Slurm option
- {{< code-snippet >}}
- -x udc-ba25-2[3,7,8],udc-ba26-2[3-6],udc-ba27-2[3-4]
- {{< /code-snippet >}}

# PyTorch Jupyter Notebooks
Jupyter Notebooks can be used for interactive code development and execution of Python scripts and several other codes. PyTorch Jupyter kernels are backed by containers in the corresponding modules.

@@ -64,7 +50,7 @@ Jupyter Notebooks can be used for interactive code development and execution of
To start a JupyterLab session, fill out the resource request webform. To request access to a GPU, verify the correct selection for the following parameters:

1. Under Rivanna Partition, choose "GPU".
- 2. Under Optional GPU Type, choose "NVIDIA K80", "NVIDIA P100", "NVIDIA V100", "NVIDIA A100", "NVIDIA RTX2080", "NVIDIA RTX3090", or leave it as "default".
+ 2. Under Optional GPU Type, choose "NVIDIA V100", "NVIDIA A100", "NVIDIA RTX2080", "NVIDIA RTX3090", or leave it as "default".
3. Click `Launch` to start the session.

## Editing and Running the Notebook
19 changes: 0 additions & 19 deletions content/userinfo/rivanna/software/rapidsai.md
@@ -39,22 +39,3 @@ module spider {{% module-firstversion %}}
```

{{< module-versions >}}

- # Exclude K80 GPU nodes
-
- RAPIDS requires compute capability 6.0+, which means it cannot work on K80 nodes. To exclude them from JupyterLab,
- fill out the form as you normally would and under
- ```
- Optional: Slurm Option
- ```
- put
- {{< code-snippet >}}
- -x udc-ba25-2[3,7,8],udc-ba26-2[3-6],udc-ba27-2[3-4]
- {{< /code-snippet >}}
-
- If you are using a Slurm script, add this line:
- {{< code-snippet >}}
- #SBATCH -C "p100|v100|rtx2080|rtx3090|a100"
- {{< /code-snippet >}}
-
- (The constraint argument requires the `"` character, which is not yet supported on the JupyterLab form.)
2 changes: 1 addition & 1 deletion content/userinfo/rivanna/software/tensorflow.md
@@ -45,7 +45,7 @@ Jupyter Notebooks can be used for interactive code development and execution of
To start a JupyterLab session, fill out the resource request webform. To request access to a GPU, verify the correct selection for the following parameters:

1. Under Rivanna Partition, choose "GPU".
- 2. Under Optional GPU Type, choose "NVIDIA K80", "NVIDIA P100", "NVIDIA V100", "NVIDIA A100", "NVIDIA RTX2080", "NVIDIA RTX3090", or leave it as "default".
+ 2. Under Optional GPU Type, choose "NVIDIA V100", "NVIDIA A100", "NVIDIA RTX2080", "NVIDIA RTX3090", or leave it as "default".
3. Click `Launch` to start the session.

Review our [JupyterLab documentation](/userinfo/rivanna/software/jupyterlab) for more details.
2 changes: 1 addition & 1 deletion static/scripts/gpu_job.slurm
@@ -1,7 +1,7 @@
#!/bin/bash
#SBATCH -A mygroup
#SBATCH --partition=gpu
- #SBATCH --gres=gpu:p100:1
+ #SBATCH --gres=gpu:1
#SBATCH --ntasks=1
#SBATCH --time=12:00:00

2 changes: 1 addition & 1 deletion static/scripts/julia_gpu.slurm
@@ -16,6 +16,6 @@
echo 'slurm allocates gpus ' $CUDA_VISIBLE_DEVICES

module purge
- module load julia/1.5.0 cuda/10.2.89 cudatoolkit/10.1.168-py3.6
+ module load julia/1.5.0 cuda/10.2.89

julia gpuTest1.jl
