F.A.Q.#

(Frequently Asked Questions / Foire Aux Questions)


File copying from/to Lab-IA#

If you can log in to Lab-IA via ssh, then you can use rsync over ssh or scp to transfer files.
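
For example, to copy a local directory into your Lab-IA home directory (a sketch: LOGIN and the paths are placeholders, and slurm.lab-ia.fr is the same entry host used in the Jupyter tunnel example below):

# recursive, compressed copy with progress display
rsync -avz --progress ./my_data/ LOGIN@slurm.lab-ia.fr:~/my_data/
# equivalent one-shot copy with scp
scp -r ./my_data LOGIN@slurm.lab-ia.fr:~/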


Access to Internet from Lab-IA#

The cluster is on a protected network; accessing the Internet from within Lab-IA hosts requires going through a specific proxy host, which supports http and https transfers.

The proxy host is webproxy.lab-ia.fr. You can define the environment variables (for example in your ~/.bashrc file), in lowercase and in uppercase:

export http_proxy=http://webproxy.lab-ia.fr:8080
export https_proxy=http://webproxy.lab-ia.fr:8080
export HTTP_PROXY=http://webproxy.lab-ia.fr:8080
export HTTPS_PROXY=http://webproxy.lab-ia.fr:8080

Be sure to write =http://webproxy.lab-ia.fr:8080 on all four lines (not =https...): the proxy itself is reached over http, even for https traffic.

Note: on the slurm.lab-ia.fr host these environment variables are automatically set up for bash.

For specific applications#

Some applications use their own way to provide proxy settings; see their respective documentation. Examples:

git#

git config --global http.proxy http://webproxy.lab-ia.fr:8080

curl#

curl --proxy http://webproxy.lab-ia.fr:8080 my.page.com

wget#

Use environment variables http_proxy/https_proxy.
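
For instance, you can set the variable inline for a single download (the URL here is just a placeholder):

# the inline assignment applies only to this wget invocation
https_proxy=http://webproxy.lab-ia.fr:8080 wget https://example.com/file.tar.gz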

pip#

Package installer for Python.

Use environment variables http_proxy/https_proxy or command-line option:

python3 -m pip install --proxy=http://webproxy.lab-ia.fr:8080 [...]

conda#

Package manager for Anaconda and Miniconda.

Use environment variables HTTP_PROXY/HTTPS_PROXY, or .condarc file:

    proxy_servers:
        http: http://webproxy.lab-ia.fr:8080
        https: http://webproxy.lab-ia.fr:8080

Python installation#

We recommend using Miniconda. You will be able to install the libraries you need with its conda package manager, and to work in dedicated environments. Installing it into your Lab-IA home directory makes the installation available on all computing nodes.
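
A minimal installation sketch (the installer URL is the standard Miniconda one; remember to set the proxy variables described above first):

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
# make conda available in future bash sessions
$HOME/miniconda3/bin/conda init bash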

This tool is not specific to Lab-IA; you can also install it on your own computers for development or testing.

Note: Python 2 has not been developed or supported since January 1, 2020. You are strongly encouraged to move to Python 3.


PyTorch installation#

First, install conda and check how to create and activate environments.

PyTorch 2.0.0 and CUDA v11.8 installation#

conda create -n pytorch2+cuda118
conda activate pytorch2+cuda118
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia

PyTorch 1.13.1 and CUDA v11.7 installation#

conda create -n pytorch1-13-1+cuda117
conda activate pytorch1-13-1+cuda117
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

PyTorch with native Lab-IA CUDA v11.2#

conda create -n pytorch
conda activate pytorch
export CUDA_PATH=/usr/local/cuda-11.2
export PATH=$PATH:/usr/local/cuda-11.2/bin
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/targets/x86_64-linux/lib
conda install -c conda-forge gcc gxx cmake python=3.8 pytorch-gpu

If you want to remove an environment, you can use:

conda deactivate
conda remove --name YOUR_ENV --all

TensorFlow#

First, install conda and check how to create and activate environments.

TensorFlow v1.12#

conda create -n tf1-12
conda activate tf1-12
conda install tensorflow-gpu==1.12.0

TensorFlow v2.4.1#

conda create -n tf2-4.1
conda activate tf2-4.1
conda install tensorflow-gpu

Need Singularity / Docker#

Some computing solutions are provided as prepared installations for Docker or Singularity.

These tools require root access on the hosts. They are not installed (and will not be installed) on Lab-IA. You must find an alternative installation method.


Slurm job script.sh failed to execute#

You may have an error like:

slurmstepd: error: execve(): script.sh: Permission denied
srun: error: n51: task 0: Exited with exit code 13

This is a known Slurm issue. Rename your job script to a name that is not a Unix command.
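
For example (my_job.sh is an arbitrary new name; resubmit with sbatch or srun as before):

mv script.sh my_job.sh
sbatch my_job.sh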


Email reception at the end of jobs#

In your job script you provide your email address, and specify for which events you want an email:

#SBATCH --mail-user=myemail@mydomainname.fr
#SBATCH --mail-type=ALL

Emails will come from the sender:

slurm-job.lab-ia at universite-paris-saclay.fr.

Mail type events can be BEGIN, END, FAIL, REQUEUE, STAGE_OUT and ALL (= all of the previous). They can also be TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80 and ARRAY_TASKS. Multiple events can be specified with a comma separator.
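
For example, to be notified only when a job ends or fails:

#SBATCH --mail-user=myemail@mydomainname.fr
#SBATCH --mail-type=END,FAIL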


Quota#

Quotas are enforced on Lab-IA so that every user has enough space to work. There are two kinds of quotas:

  • User quotas,
  • Project quotas.

Disk quotas are set on home and project directories. Each user has an initial 50GB quota on their home directory, and each project has a 500GB quota.

Quotas can be adjusted upon justified request sent to the Lab-IA staff.

User vs Project quota#

Quotas depend on the group ID (gid) of the files.

On Lab-IA, files created in the user home directory belong to the user. Files created in any project directory belong to the project.

If a user copies a file or a folder from their home directory to a project directory, the copied file will belong to the project.

However, if a user moves a file from their home directory to a project directory, the moved file will still belong to the user and count towards their quota. To transfer the quota accounting of the files to the project, the user must change the group ownership of the files:

chgrp -R project_name project_files_or_directories
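
You can verify the result with ls; the group owner appears in the fourth column:

# -d lists the named entries themselves rather than directory contents
ls -ld project_files_or_directories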

CUDA, libcudnn, libnccl#

On a node, you can list all available CUDA versions with this command:

ls -ld /usr/local/cuda*

You will find:

  • /usr/local/cuda → cuda-11.2
  • /usr/local/cuda-11.2
  • /usr/local/cuda-11.1
  • /usr/local/cuda-11.0
  • /usr/local/cuda-11 → cuda-11.2
  • /usr/local/cuda-10.0
  • /usr/local/cuda-10.1
  • /usr/local/cuda-10.2
  • /usr/local/cuda-9.2

If you don't want to use the default version, set your environment variables accordingly.
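
For example, to use CUDA 10.2 instead of the default symlinked version (a sketch following the same export pattern as in the PyTorch section above):

export CUDA_PATH=/usr/local/cuda-10.2
export PATH=/usr/local/cuda-10.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/targets/x86_64-linux/lib:$LD_LIBRARY_PATH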

Note: check the sub-directories for the libcudnn and libnccl libraries.


Visual Studio#

You can follow this guide


Jupyter Notebook#

You must have Jupyter Notebook installed within your conda Python setup. Do this once in your account (in this example we create a notebook conda environment):

conda create -n notebook
conda activate notebook
conda install -c conda-forge notebook

Open an interactive session on a node. You can use the sgpu command to check available GPUs:

$ sgpu
LIST OF AVAILABLE GPU PER NODE
NAME | AVAIL. GPU | TOTAL GPU COUNT
(...)
 n53  | 0          | 3
 n54  | 0          | 3
 n55  | 1          | 3
 n101 | 0          | 4
 n102 | 0          | 4

In this example, there is a free GPU on n55. To run a 10-hour interactive bash session on it (optionally from a tmux terminal):

srun --time=10:00:00 --gres=gpu:1 --nodelist=n55 --pty bash

Once Slurm has scheduled your job and the bash session has started on the node, activate the conda environment, then start the Jupyter notebook server (select an unused port in the range 1024-65535):

conda activate notebook
jupyter notebook --no-browser --port=8889 --ip=0.0.0.0

When your notebook job is running, set up a tunnel between a port on your computer (here 8888) and the remote notebook process (here n55:8889):

ssh -N -L localhost:8888:n55:8889 LOGIN@slurm.lab-ia.fr

Note: to run this command, your ssh configuration file must be properly set up. Check Getting started - Lab-IA user documentation.

Finally, open a browser to the remote process via the tunnel: http://localhost:8888
