05-Containers

Containers are very convenient when working in HPC environments. They can be used to package and run an application, along with its dependencies, in an isolated, predictable and repeatable way.

In this tutorial we are going to make use of containers that meet our software distribution needs. These can be run on the UI or on the Worker Nodes by means of HTCondor submission.

Note that, for consistency with the HTCondor submission description files, we use Apptainer's former name and syntax, Singularity. But, as specified here, everything in this document that pertains to Singularity is also true for the Apptainer container runtime.

The version currently installed on Artemisa is:

$ singularity --version
apptainer version 1.2.4-1.el7

Preparation

One of the main advantages of working with containers is the possibility of using images that are already available. We need enough space to download and unpack the container, so we create the necessary directory structure and redirect the temporary and cache directories:

$ mkdir new_tmp
$ mkdir sing_image
$ mkdir sing_cache
$ export TMPDIR=`pwd`/new_tmp
$ export SINGULARITY_CACHEDIR=`pwd`/sing_cache
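
Apptainer prefers the APPTAINER_-prefixed variables over the SINGULARITY_ ones (the build log below warns about this). Optionally, and assuming the same directories as above, they can be exported as well; the SINGULARITY_ variables keep working in any case:

$ export APPTAINER_TMPDIR=`pwd`/new_tmp
$ export APPTAINER_CACHEDIR=`pwd`/sing_cache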

Then download and build the container:

$ cd sing_image
$ singularity build tensorflow-latest-gpu  docker://tensorflow/tensorflow:latest-gpu
INFO:    /etc/singularity/ exists; cleanup by system administrator is not complete (see https://apptainer.org/docs/admin/latest/singularity_migration.html)
INFO:    Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred
INFO:    Starting build...
Getting image source signatures
Copying blob de96f27d9487 done
Copying blob 20d547ab5eb5 done
Copying blob aa315b7808f0 done
Copying blob 56e0351b9876 done
...
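
Once the build finishes, it can be useful to check that the image works before reserving a GPU. A minimal sanity check from the parent directory, assuming the layout created above (without --nv, TensorFlow may print warnings and fall back to the CPU):

$ cd ..
$ singularity exec sing_image/tensorflow-latest-gpu python -c "import tensorflow as tf; print(tf.__version__)"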

Run container in UI

We can run a program in the container making use of the UI GPU. We are going to use the same Python file (simple_convnet.py) that implements the task from the previous tutorial. Before executing the command, make sure that you are in the correct working directory:

$ ls
new_tmp  simple_convnet.py  sing_cache  sing_image
$ gpurun singularity run --nv -c -H $PWD:/home sing_image/tensorflow-latest-gpu python simple_convnet.py

First nonoption argument is "singularity" at argv[1]
Connected
Info: OK 0. Tesla P100-PCIE-12GB [00000000:5E:00.0]
1. Tesla P100-PCIE-12GB [00000000:86:00.0]

Total clients:0 Running:0 Estimated waiting time:0 seconds
GPU reserved:300 seconds granted
GPUID reserved:0 Details: - Device 0. Tesla P100-PCIE-12GB [00000000:5E:00.0] set to compute mode:Exclusive Process
Info: Executing program: singularity
INFO:    /etc/singularity/ exists; cleanup by system administrator is not complete (see https://apptainer.org/docs/admin/latest/singularity_migration.html)
INFO:    underlay of /etc/localtime required more than 50 (92) bind mounts
INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (457) bind mounts
2024-06-27 14:17:53.202899: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
2024-06-27 14:18:01.170144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11437 MB memory:  -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:5e:00.0, compute capability: 6.0
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D)                      │ (None, 26, 26, 32)          │             320 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d (MaxPooling2D)         │ (None, 13, 13, 32)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_1 (Conv2D)                    │ (None, 11, 11, 64)          │          18,496 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_1 (MaxPooling2D)       │ (None, 5, 5, 64)            │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten (Flatten)                    │ (None, 1600)                │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout (Dropout)                    │ (None, 1600)                │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense)                        │ (None, 10)                  │          16,010 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 34,826 (136.04 KB)
 Trainable params: 34,826 (136.04 KB)
 Non-trainable params: 0 (0.00 B)
Epoch 1/15
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1719490683.216546  168056 service.cc:145] XLA service 0x7fb764008bd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1719490683.216684  168056 service.cc:153]   StreamExecutor device (0): Tesla P100-PCIE-12GB, Compute Capability 6.0
2024-06-27 14:18:03.418383: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-06-27 14:18:03.973745: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:465] Loaded cuDNN version 8906
I0000 00:00:1719490689.519179  168056 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
422/422 ━━━━━━━━━━━━━━━━━━━━ 11s 10ms/step - accuracy: 0.7706 - loss: 0.7673 - val_accuracy: 0.9790 - val_loss: 0.0798
Epoch 2/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9643 - loss: 0.1196 - val_accuracy: 0.9858 - val_loss: 0.0556
Epoch 3/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9737 - loss: 0.0854 - val_accuracy: 0.9878 - val_loss: 0.0456
Epoch 4/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9785 - loss: 0.0699 - val_accuracy: 0.9882 - val_loss: 0.0423
Epoch 5/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9818 - loss: 0.0603 - val_accuracy: 0.9895 - val_loss: 0.0399
Epoch 6/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9832 - loss: 0.0539 - val_accuracy: 0.9910 - val_loss: 0.0355
Epoch 7/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9840 - loss: 0.0493 - val_accuracy: 0.9903 - val_loss: 0.0352
Epoch 8/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9865 - loss: 0.0431 - val_accuracy: 0.9915 - val_loss: 0.0326
Epoch 9/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9873 - loss: 0.0398 - val_accuracy: 0.9907 - val_loss: 0.0319
Epoch 10/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9874 - loss: 0.0425 - val_accuracy: 0.9903 - val_loss: 0.0318
Epoch 11/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9874 - loss: 0.0389 - val_accuracy: 0.9917 - val_loss: 0.0334
Epoch 12/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9891 - loss: 0.0348 - val_accuracy: 0.9922 - val_loss: 0.0315
Epoch 13/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9899 - loss: 0.0314 - val_accuracy: 0.9917 - val_loss: 0.0328
Epoch 14/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9891 - loss: 0.0326 - val_accuracy: 0.9922 - val_loss: 0.0318
Epoch 15/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9895 - loss: 0.0324 - val_accuracy: 0.9927 - val_loss: 0.0302
Test loss: 0.024153150618076324
Test accuracy: 0.9911999702453613

From the last command:

  • gpurun : reserves and gives access to the UI GPU.

  • singularity : accepts several commands, such as run, which here executes the command python simple_convnet.py inside the container sing_image/tensorflow-latest-gpu.

  • --nv : enables NVIDIA GPU support in Singularity. This option sets up the container’s environment to use an NVIDIA GPU and the basic CUDA libraries needed to run a CUDA-enabled application (see the quick check after this list).

  • -c : use minimal /dev and empty other directories (e.g. /tmp and $HOME) instead of sharing filesystems from your host.

  • -H : home directory specification. Can either be a src path or src:dest pair. src is the source path of the home directory outside the container and dest overrides the home directory within the container. (default “/lhome/ific/a/artemisa_user”)
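
As a quick check that --nv really exposes the GPU inside the container, nvidia-smi can be run with the same options; a sketch reusing the gpurun reservation described above:

$ gpurun singularity exec --nv -c -H $PWD:/home sing_image/tensorflow-latest-gpu nvidia-smi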

Run container in WN

Now we are going to run the same task, but on a WN. First, we have to prepare the submission description file:

universe = vanilla

executable              = $ENV(PWD)/simple_convnet.py

log                     = condor_logs/container_job.log
output                  = condor_logs/container_job.outfile.$(Cluster).$(Process).out
error                   = condor_logs/container_job.errors.$(Cluster).$(Process).err

+SingularityImage = "$ENV(PWD)/sing_image/tensorflow-latest-gpu"
+SingularityBind = "/lustre:/home"

request_gpus = 2

queue

We introduced new commands:

  • +SingularityImage : the container image inside which the job runs.

  • +SingularityBind : allows mounting additional paths inside the container. In this case /lustre, which contains the project disk space, is bound to /home. The home disk space /lhome is mounted by default.

  • Note that the executable is the Python script simple_convnet.py itself, hence the shebang line #!/usr/bin/env python at the top of the file.

Caution

As the Python file is going to be used as the executable, it must be given execution permissions: chmod +x simple_convnet.py
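
Both points, the shebang line and the execution permission, can be checked quickly from the submit directory before sending the job:

$ head -1 simple_convnet.py
#!/usr/bin/env python
$ chmod +x simple_convnet.py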

Finally, submit the job as usual:

$ condor_submit container_wn.sub
Submitting job(s).
1 job(s) submitted to cluster 578960.

When the job is finished, the content of the output should be the same as in the UI.
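
The job can be monitored with condor_q as usual and, once it finishes, the output file defined in the submit description can be inspected (here using the cluster id from the submission above and process 0):

$ condor_q
$ cat condor_logs/container_job.outfile.578960.0.out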

Summary

Recap

  • Containers can be used to package and run an application, along with its dependencies, in an isolated, predictable and repeatable way.

  • Apptainer (formerly Singularity) is installed on Artemisa.

  • Containers can be run both on the UI and on the Worker Nodes.