05-Containers

The use of containers is very convenient when working in HPC environments. They can be used to package and run an application, along with its dependencies, in an isolated, predictable and repeatable way.

In this tutorial we are going to make use of containers that meet our needs in terms of software distribution. These can be run on the UI or on the Worker Nodes by means of HTCondor submission.

Note that we are using the former name and syntax of Apptainer, Singularity, for the sake of consistency with the HTCondor submission description files. However, as specified here, everything in this document that pertains to Singularity is also true for the Apptainer container runtime.

The current version installed on Artemisa is:

$ singularity --version
apptainer version 1.1.9-1.el7

Preparation

One of the main advantages of working with containers is the possibility of reusing already available images. We need enough space to download and unpack the container, so we create the necessary directory structure and redirect the temporary and cache directories.

$ mkdir new_tmp
$ mkdir sing_image
$ mkdir sing_cache
$ export TMPDIR=`pwd`/new_tmp
$ export SINGULARITY_CACHEDIR=`pwd`/sing_cache
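
The SINGULARITY_* variable names are still accepted; recent Apptainer releases simply prefer the APPTAINER_* equivalents, as the INFO message in the build log below points out. Optionally, the same values can be exported under both names:

$ export APPTAINER_TMPDIR=$TMPDIR
$ export APPTAINER_CACHEDIR=$SINGULARITY_CACHEDIR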

Then download and build the container image:

$ cd sing_image
$ singularity build tensorflow-latest-gpu  docker://tensorflow/tensorflow:latest-gpu
INFO:    /etc/singularity/ exists; cleanup by system administrator is not complete (see https://apptainer.org/docs/admin/latest/singularity_migration.html)
INFO:    Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred
INFO:    Starting build...
Getting image source signatures
Copying blob de96f27d9487 done
Copying blob 20d547ab5eb5 done
Copying blob aa315b7808f0 done
Copying blob 56e0351b9876 done
...
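
Once the build finishes, the image is available under sing_image. As an optional sanity check, you can execute Python inside the new container and import TensorFlow (the exact version printed depends on the image tag that was pulled):

$ singularity exec sing_image/tensorflow-latest-gpu python3 -c "import tensorflow as tf; print(tf.__version__)"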

Run container in UI

We can run a program in the container using the UI GPU. We are going to use the same Python file (simple_convnet.py) that implements the task from the previous tutorial. Before executing the command, make sure that you are in the correct working directory:

$ ls
new_tmp  simple_convnet.py  sing_cache  sing_image
$ gpurun singularity run --nv -c -H $PWD:/home sing_image/tensorflow-latest-gpu python3 simple_convnet.py
First nonoption argument is "singularity" at argv[1]
Connected
Info: OK 0. Tesla P100-PCIE-12GB [00000000:5E:00.0]
1. Tesla P100-PCIE-12GB [00000000:86:00.0]

Total clients:-4 Running:2 Estimated waiting time:-1500 seconds
GPU reserved:300 seconds granted
GPUID reserved:0 Details: - Device 0. Tesla P100-PCIE-12GB [00000000:5E:00.0] set to compute mode:Exclusive Process
Info: Executing program: singularity
INFO:    /etc/singularity/ exists; cleanup by system administrator is not complete (see https://apptainer.org/docs/admin/latest/singularity_migration.html)
INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (452) bind mounts
2023-09-21 10:59:09.710173: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
2023-09-21 10:59:15.871153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11437 MB memory:  -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:5e:00.0, compute capability: 6.0
Model: "sequential"
_________________________________________________________________
Layer (type)                Output Shape              Param #
=================================================================
conv2d (Conv2D)             (None, 26, 26, 32)        320

max_pooling2d (MaxPooling2  (None, 13, 13, 32)        0
D)

conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496

max_pooling2d_1 (MaxPoolin  (None, 5, 5, 64)          0
g2D)

flatten (Flatten)           (None, 1600)              0

dropout (Dropout)           (None, 1600)              0

dense (Dense)               (None, 10)                16010

=================================================================
Total params: 34826 (136.04 KB)
Trainable params: 34826 (136.04 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/15
2023-09-21 10:59:17.641533: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8600
2023-09-21 10:59:19.305949: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6d53b90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-09-21 10:59:19.306147: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla P100-PCIE-12GB, Compute Capability 6.0
2023-09-21 10:59:19.489174: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:255] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-09-21 10:59:20.503596: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
422/422 [==============================] - 8s 5ms/step - loss: 0.3692 - accuracy: 0.8871 - val_loss: 0.0802 - val_accuracy: 0.9785
Epoch 2/15
422/422 [==============================] - 2s 4ms/step - loss: 0.1125 - accuracy: 0.9654 - val_loss: 0.0619 - val_accuracy: 0.9835
Epoch 3/15
422/422 [==============================] - 2s 4ms/step - loss: 0.0857 - accuracy: 0.9740 - val_loss: 0.0479 - val_accuracy: 0.9882
Epoch 4/15
422/422 [==============================] - 2s 4ms/step - loss: 0.0726 - accuracy: 0.9781 - val_loss: 0.0432 - val_accuracy: 0.9892
Epoch 5/15
422/422 [==============================] - 2s 4ms/step - loss: 0.0634 - accuracy: 0.9803 - val_loss: 0.0391 - val_accuracy: 0.9893
Epoch 6/15
422/422 [==============================] - 2s 4ms/step - loss: 0.0577 - accuracy: 0.9826 - val_loss: 0.0361 - val_accuracy: 0.9903
Epoch 7/15
422/422 [==============================] - 2s 5ms/step - loss: 0.0506 - accuracy: 0.9844 - val_loss: 0.0360 - val_accuracy: 0.9898
Epoch 8/15
422/422 [==============================] - 2s 4ms/step - loss: 0.0477 - accuracy: 0.9849 - val_loss: 0.0334 - val_accuracy: 0.9905
Epoch 9/15
422/422 [==============================] - 2s 5ms/step - loss: 0.0446 - accuracy: 0.9858 - val_loss: 0.0313 - val_accuracy: 0.9913
Epoch 10/15
422/422 [==============================] - 2s 4ms/step - loss: 0.0426 - accuracy: 0.9860 - val_loss: 0.0364 - val_accuracy: 0.9902
Epoch 11/15
422/422 [==============================] - 2s 4ms/step - loss: 0.0394 - accuracy: 0.9876 - val_loss: 0.0290 - val_accuracy: 0.9925
Epoch 12/15
422/422 [==============================] - 1s 4ms/step - loss: 0.0371 - accuracy: 0.9886 - val_loss: 0.0290 - val_accuracy: 0.9920
Epoch 13/15
422/422 [==============================] - 2s 4ms/step - loss: 0.0373 - accuracy: 0.9881 - val_loss: 0.0286 - val_accuracy: 0.9918
Epoch 14/15
422/422 [==============================] - 2s 4ms/step - loss: 0.0354 - accuracy: 0.9886 - val_loss: 0.0286 - val_accuracy: 0.9925
Epoch 15/15
422/422 [==============================] - 2s 4ms/step - loss: 0.0333 - accuracy: 0.9896 - val_loss: 0.0309 - val_accuracy: 0.9920
Test loss: 0.023518385365605354
Test accuracy: 0.9923999905586243

From the last command:

  • gpurun : reserves the UI GPU and executes the program that follows with access to it.

  • singularity : accepts several subcommands; here, run executes the command python3 simple_convnet.py inside the container sing_image/tensorflow-latest-gpu (an interactive alternative is sketched after this list).

  • --nv : enables NVIDIA GPU support in Singularity. This option sets up the container’s environment to use an NVIDIA GPU and the basic CUDA libraries needed to run a CUDA-enabled application.

  • -c : use minimal /dev and empty other directories (e.g. /tmp and $HOME) instead of sharing filesystems from your host.

  • -H : home directory specification. Can either be a src path or src:dest pair. src is the source path of the home directory outside the container and dest overrides the home directory within the container. (default “/lhome/ific/a/artemisa_user”)
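
For exploratory work, the shell subcommand opens an interactive session inside the same environment. A minimal sketch, assuming gpurun can wrap an interactive command in the same way it wraps the run example above (type exit to leave the container and release the GPU):

$ gpurun singularity shell --nv -c -H $PWD:/home sing_image/tensorflow-latest-gpu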

Run container in WN

Now we are going to run the same task, but on a WN. First, we have to prepare the submission description file:

universe = vanilla

executable              = /usr/bin/python3
arguments               = "$ENV(PWD)/simple_convnet.py"

log                     = condor_logs/container_job.log
output                  = condor_logs/container_job.outfile.$(Cluster).$(Process).out
error                   = condor_logs/container_job.errors.$(Cluster).$(Process).err

+SingularityImage = "$ENV(PWD)/sing_image/tensorflow-latest-gpu"
+SingularityBind = "/lustre:/home"

request_gpus = 2

queue

We introduced new commands:

  • +SingularityImage : the container image in which the job will be run.

  • +SingularityBind : allows mounting additional paths inside the container. In this case /lustre, which contains the project disk space, is bound to /home. The home disk space /lhome is mounted by default.

  • Note also that the files referenced in the arguments command are given with their full path ($ENV(PWD) is expanded by condor_submit to the submission directory).
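
Before submitting, make sure that the condor_logs directory referenced by the log, output and error commands exists, since HTCondor does not create it automatically:

$ mkdir -p condor_logs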

Finally, submit the job as usual:

$ condor_submit container_wn.sub
Submitting job(s).
1 job(s) submitted to cluster 578960.
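
While the job is idle or running, its status can be followed with the usual HTCondor tools, for example:

$ condor_q
$ condor_q -nobatch 578960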

When the job is finished, the content of the output file should be the same as the output obtained on the UI.

Summary

Recap

  • Containers can be used to package and run an application, along with its dependencies, in an isolated, predictable and repeatable way.

  • Apptainer (formerly Singularity) is installed on Artemisa.

  • Containers can be run on both the UI and the Worker Nodes.