jobs conda RAPIDS

This example will show you how to set up and prepare an environment for RAPIDS jobs using conda on Trixie:

1. Create a rapids miniconda environment. Copy and paste each of the next 3 sections into a terminal and execute them individually:

# ONLY USE IF YOU NEED TO REMOVE THE Python ENV (commented for safety)
# ====================================================================
#conda deactivate
#conda remove --name rapids --all

# Load modules and set up the Python ENV
# ======================================
# Some of these modules may not be necessary; you will have to do your
# own testing if you want a minimal module selection.
module purge && \
module load miniconda3-4.8.2-gcc-9.2.0-sbqd2xu && \
module load python-3.8.3-gcc-9.2.0-qxa3ikk && \
module load cuda-10.1.168-gcc-9.2.0-xyykr4t && \
module load gcc-9.2.0-gcc-9.2.0-vzr5o5q && \
module load autoconf-2.69-gcc-9.2.0-qbibewz && \
module load automake-1.16.2-gcc-9.2.0-6m76nfe && \
conda create -n rapids python=3.8 && \
conda activate rapids

# Install RAPIDS in the rapids ENV
# ================================
conda install -c rapidsai -c nvidia -c conda-forge \
    -c defaults rapids=0.15 python=3.8 cudatoolkit=10.1
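
A quick sanity check once the install finishes (a minimal sketch; the grep pattern is only illustrative, and importing cudf generally requires a visible GPU, so run the python line on a GPU node):

# List the RAPIDS-related packages that were resolved
conda list | grep -E 'rapids|cudf|cudatoolkit'
# Confirm cudf imports and report its version (needs a GPU)
python -c "import cudf; print(cudf.__version__)"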

2. Create a test RAPIDS Python script: test_rapids.py

import cudf

# A cuDF Series with a null entry (printed as <NA>)
s = cudf.Series([1, 2, 3, None, 4])
print(s)

# A small DataFrame built from Python lists
df = cudf.DataFrame({'a': list(range(20)),
                     'b': list(reversed(range(20))),
                     'c': list(range(20))
                    })
print(df.head(20))


print()
print()

# Simple UDF
def simple_udf(x):
    if x > 0:
        return x + 5
    else:
        return x - 5


# applymap compiles the UDF (via Numba) and runs it on the GPU
print(df['a'].applymap(simple_udf))

print('... done ...')
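
If you have interactive access to a GPU node, you can sanity-check the script before queueing it. A sketch, reusing the partition and account from the job script in step 3 (the srun flags are assumptions; adjust them to your allocation):

# Request a short interactive session with one GPU (assumed flags)
srun --partition=JobTesting --account=ai4d-bio-04c --gres=gpu:1 \
     --time=00:10:00 --pty bash
# Then, on the compute node:
module load miniconda3-4.8.2-gcc-9.2.0-sbqd2xu
source activate rapids
python test_rapids.py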

3. Create a job submission script: test_rapids.slurm

#!/bin/bash
# This script was modified for RAPIDS testing purposes... 
# Original script : https://github.com/ai4d-iasc/trixie/wiki/Jobs-conda-pytorch
# Specify the partition of the cluster to run on
#SBATCH --partition=JobTesting
# Add your project account code using -A or --account
#SBATCH --account=ai4d-bio-04c
# Specify the time allocated to the job. (30 mins just for kicks)
#SBATCH --time=00:30:00
# Request GPUs for the job. In this case 1 GPU
#SBATCH --gres=gpu:1
# Print out the hostname that the job is running on
hs=$(hostname)
echo -e "host name : $hs"
echo -e "\n"
echo -e "\n"

# Run nvidia-smi to ensure that the job sees the GPUs
/usr/bin/nvidia-smi

echo -e "\n"
echo -e "\n"

# Load the miniconda module on the compute node
module load miniconda3-4.8.2-gcc-9.2.0-sbqd2xu
module load python-3.8.3-gcc-9.2.0-qxa3ikk
module load cuda-10.1.168-gcc-9.2.0-xyykr4t
module load gcc-9.2.0-gcc-9.2.0-vzr5o5q
module load autoconf-2.69-gcc-9.2.0-qbibewz
module load automake-1.16.2-gcc-9.2.0-6m76nfe 
source activate rapids

# Launch our test python file
python test_rapids.py

4. Submit the job to the scheduler:

sbatch test_rapids.slurm

Output will be 'Submitted batch job XXXXX'
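
While the job is queued or running, standard Slurm commands can track it (XXXXX is the job ID from the sbatch output):

squeue -u $USER                                # show your queued/running jobs
sacct -j XXXXX --format=JobID,State,Elapsed    # job state once it is scheduled
tail -f slurm-XXXXX.out                        # follow the job's stdout live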

5. Confirm execution results:

The local directory will contain a file 'slurm-XXXXX.out', which holds the output of the job (stdout).

Output should be:

host name : cn135




Thu Sep 24 17:00:50 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   31C    P0    42W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+




0       1
1       2
2       3
3    <NA>
4       4
dtype: int64

     a   b   c
0    0  19   0
1    1  18   1
2    2  17   2
3    3  16   3
4    4  15   4
5    5  14   5
6    6  13   6
7    7  12   7
8    8  11   8
9    9  10   9
10  10   9  10
11  11   8  11
12  12   7  12
13  13   6  13
14  14   5  14
15  15   4  15
16  16   3  16
17  17   2  17
18  18   1  18
19  19   0  19


0     -5
1      6
2      7
3      8
4      9
5     10
6     11
7     12
8     13
9     14
10    15
11    16
12    17
13    18
14    19
15    20
16    21
17    22
18    23
19    24
Name: a, dtype: int64
... done ...