# jobs conda jupyterlab
ppaul72 edited this page Jun 24, 2020 · 9 revisions
At the end of this tutorial you will have:
- created an `env` directory inside a project directory containing:
  - an environment running Keras using CPU(s)
  - an environment running Keras using GPU(s)
- defined two Slurm jobs that start JupyterLab using the respective environments
```shell
# load the miniconda module
module load miniconda3-4.7.12.1-gcc-9.2.0-j2idqxp
# define a project variable
export MY_PROJECT_ROOT=$HOME/sample_project
# go to the project root
cd $MY_PROJECT_ROOT
# create a directory for all the environments
mkdir env
# create the CPU variant
conda create -c conda-forge -p $MY_PROJECT_ROOT/env/Covid-Net-cpu python=3 jupyterlab imutils opencv \
    matplotlib keras scikit-learn pandas
# create the GPU variant
conda create -c conda-forge -p $MY_PROJECT_ROOT/env/Covid-Net-gpu python=3 jupyterlab imutils opencv \
    matplotlib keras scikit-learn pandas tensorflow-gpu
# create a directory to hold job files
mkdir $MY_PROJECT_ROOT/jobs
```
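The two `conda create` calls can take a while. A quick way to confirm afterwards that both environment prefixes exist is a small check like this (the `check_env` helper below is a sketch written for this tutorial, not part of conda or the cluster tooling):

```shell
# Sketch: verify that both conda environment prefixes were created.
MY_PROJECT_ROOT=${MY_PROJECT_ROOT:-$HOME/sample_project}

check_env() {
    # Prints "<name>: ok" when the prefix directory exists, "<name>: missing" otherwise.
    if [ -d "$MY_PROJECT_ROOT/env/$1" ]; then
        echo "$1: ok"
    else
        echo "$1: missing"
    fi
}

check_env Covid-Net-cpu
check_env Covid-Net-gpu
```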
The CPU job script:

```shell
#!/bin/bash -l
#SBATCH -p TrixieMain
#SBATCH --time=04:00:00                    #### MAXIMUM 48:00:00 on Trixie
#SBATCH --job-name=My_Awesome_Jupyter.cpu  #### Try to be a bit descriptive, or use the comment if you prefer shorter job names
##SBATCH --comment="Comment on job"        #### Optional comment
#SBATCH --mem=5G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --output=%x-%j.out

##### To help debugging
#set -x

export MY_PROJECT_ROOT=$HOME/sample_project
module load miniconda3-4.7.12.1-gcc-9.2.0-j2idqxp
source activate $MY_PROJECT_ROOT/env/Covid-Net-cpu
jupyter-lab --ip="*"
```
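The script has to be saved into the jobs directory before it can be submitted. The filename `jupyter-cpu.job` is an assumption here, mirroring the `jupyter-gpu.job` path used for submission further down; as a sketch, a here-doc (truncated to its first lines — paste in the full script) makes the step reproducible:

```shell
# Write the CPU job script into the jobs directory; only the first lines
# are shown here -- replace the body with the full script from above.
MY_PROJECT_ROOT=${MY_PROJECT_ROOT:-$HOME/sample_project}
mkdir -p "$MY_PROJECT_ROOT/jobs"
cat > "$MY_PROJECT_ROOT/jobs/jupyter-cpu.job" <<'EOF'
#!/bin/bash -l
#SBATCH -p TrixieMain
#SBATCH --time=04:00:00
EOF
```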
The GPU job script (note the extra `--gres` line requesting a GPU):

```shell
#!/bin/bash -l
#SBATCH -p TrixieMain
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00                    #### MAXIMUM 48:00:00 on Trixie
#SBATCH --job-name=My_Awesome_Jupyter.gpu  #### Try to be a bit descriptive, or use the comment if you prefer shorter job names
##SBATCH --comment="Comment on job"        #### Optional comment
#SBATCH --mem=5G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --output=%x-%j.out

##### To help debugging
#set -x

export MY_PROJECT_ROOT=$HOME/sample_project
module load miniconda3-4.7.12.1-gcc-9.2.0-j2idqxp
source activate $MY_PROJECT_ROOT/env/Covid-Net-gpu
jupyter-lab --ip="*"
```
```shell
# start a GPU-enabled Jupyter notebook
sbatch $MY_PROJECT_ROOT/jobs/jupyter-gpu.job
# -> Submitted batch job 4502

# you get a job id back; the log file is named from the job-name and the job id
tail -f My_Awesome_Jupyter.gpu-4502.out
```
Look for the JupyterLab output in the log: it tells you which node the server is running on and which port it is listening on. For example:

```
[...]
To access the notebook, open this file in a browser:
    file:///gpfs/home/paulp/.local/share/jupyter/runtime/nbserver-24429-open.html
Or copy and paste one of these URLs:
    http://cn122:8888/?token=388199bb6ef0cad54ef195f1286301548fc15ec2a39eee3c
 or http://127.0.0.1:8888/?token=388199bb6ef0cad54ef195f1286301548fc15ec2a39eee3c
```
At that point a GPU-enabled Jupyter notebook is running on compute node `cn122` on port 8888. An SSH tunnel `8888:cn122:8888` would let you access the remote notebook from your workstation by connecting to http://127.0.0.1:8888/?token=388199bb6ef0cad54ef195f1286301548fc15ec2a39eee3c
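As a sketch of that tunnel, a tiny helper can build the `ssh` command from the node and port reported in the log. The login host `trixie-login` and the user name are placeholders — substitute your own:

```shell
# Hypothetical helper: build the ssh tunnel command for a given compute
# node and port; "your_user@trixie-login" is a placeholder, not a real host.
tunnel_cmd() {
    node=$1
    port=$2
    echo "ssh -N -L ${port}:${node}:${port} your_user@trixie-login"
}

# for the example job above:
tunnel_cmd cn122 8888
```

Run the printed command from your workstation, then open the `127.0.0.1` URL from the log in your browser.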
Don't forget to release the resources when you are done by canceling your job:

```shell
scancel 4502
```
Once connected, you can verify from a notebook that TensorFlow sees the GPU:

```python
import tensorflow as tf

# display the tensorflow version
print(tf.__version__)

from tensorflow.python.client import device_lib

# list the devices tensorflow can see (the GPU should appear here)
print(device_lib.list_local_devices())
```
💡💡💡
- The Terminal inside JupyterLab runs within your job, so it is safe to use for monitoring or other tasks that could bother others if run on the head node. If you intend to use a lot of resources this way, consider raising the number of cores/CPUs you request, and maybe the RAM as well.
- A nice one-liner to monitor the GPU; this command runs indefinitely, sampling every second and printing the listed parameters:

```shell
nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 1
```

💡💡💡