05-Containers
Containers are very convenient when working in HPC environments: they can be used to package and run an application, along with its dependencies, in an isolated, predictable and repeatable way.
In this tutorial we are going to use containers that meet our needs in terms of software distribution. They can be run on the UI or on the Worker Nodes by means of HTCondor submission.
Note that, for the sake of clarity with the HTCondor submission description files, we are using the former name of Apptainer, Singularity. But, as specified here, everything in this document that pertains to Singularity is also true for the Apptainer container runtime.
The version currently installed on Artemisa is:
$ singularity --version
apptainer version 1.4.1-1.el9
Preparation
One of the main advantages of working with containers is the possibility of reusing already available images.
Space is required to download and unpack the container, so we build the necessary directory structure and redirect the temporary and cache directories:
$ mkdir new_tmp
$ mkdir sing_image
$ mkdir sing_cache
$ export TMPDIR=`pwd`/new_tmp
$ export SINGULARITY_CACHEDIR=`pwd`/sing_cache
A Singularity Definition File (or “def file” for short) is like a set of blueprints explaining how to build a custom container. It includes specifics about the base OS to build or the base container to start from, as well as software to install, environment variables to set at runtime, files to add from the host system, and container metadata. More on Singularity definition files
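For instance, besides %post, a definition file can copy files from the host, set runtime environment variables and add metadata. The following is only an illustrative sketch (the requirements.txt file and the label values are hypothetical, not part of this tutorial's build):

```
Bootstrap: docker
From: tensorflow/tensorflow:latest-gpu

%files
    # Copy a file from the host into the container at build time (hypothetical)
    requirements.txt /opt/requirements.txt

%post
    # Commands executed inside the container during the build
    pip install -r /opt/requirements.txt

%environment
    # Variables set every time the container runs
    export LC_ALL=C

%labels
    Author artemisa_user
    Version v0.1
```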
A convenient bootstrap agent is docker, which pulls the base image from Docker Hub. Our definition file, artemisatf.def, looks like this:
Bootstrap: docker
From: tensorflow/tensorflow:latest-gpu
%post
pip install --upgrade pip
pip install tensorflow[and-cuda]
Then download and build the container:
$ singularity build sing_image/tensorflow-latest-gpu artemisatf.def
INFO: User not listed in /etc/subuid, trying root-mapped namespace
INFO: The %post section will be run under the fakeroot command
INFO: Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred
INFO: Starting build...
INFO: Fetching OCI image...
54.4MiB / 54.4MiB [==========================================================================] 100 % 52.4 MiB/s 0s
706.6MiB / 706.6MiB [========================================================================] 100 % 52.4 MiB/s 0s
4.4MiB / 4.4MiB [============================================================================] 100 % 52.4 MiB/s 0s
28.2MiB / 28.2MiB [==========================================================================] 100 % 52.4 MiB/s 0s
82.2MiB / 82.2MiB [==========================================================================] 100 % 52.4 MiB/s 0s
2.7GiB / 2.7GiB [============================================================================] 100 % 52.4 MiB/s 0s
37.2MiB / 37.2MiB [==========================================================================] 100 % 52.4 MiB/s 0s
INFO: Extracting OCI image...
INFO: Inserting Apptainer configuration...
INFO: Running post scriptlet
+ pip install --upgrade pip
...
Successfully installed pip-25.1.1
...
+ pip install tensorflow[and-cuda]
...
Downloading nvidia_cublas_cu12-12.5.3.2-py3-none-manylinux2014_x86_64.whl (363.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 363.3/363.3 MB 80.8 MB/s eta 0:00:00
...
Installing collected packages: nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-nvcc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12
Successfully installed nvidia-cublas-cu12-12.5.3.2 nvidia-cuda-cupti-cu12-12.5.82 nvidia-cuda-nvcc-cu12-12.5.82 nvidia-cuda-nvrtc-cu12-12.5.82 nvidia-cuda-runtime-cu12-12.5.82 nvidia-cudnn-cu12-9.3.0.75 nvidia-cufft-cu12-11.2.3.61 nvidia-curand-cu12-10.3.6.82 nvidia-cusolver-cu12-11.6.3.83 nvidia-cusparse-cu12-12.5.1.3 nvidia-nccl-cu12-2.23.4 nvidia-nvjitlink-cu12-12.5.82
..
INFO: Creating SIF file...
[=======================================================================================================] 100 % 0s
INFO: Build complete: sing_image/tensorflow-latest-gpu
Run container in UI
A program can be run in the container using the UI GPU. We are going to use the same Python file
(simple_convnet.py), which implements the task
from the previous tutorial. Before executing the command, make sure that you are in the correct working directory:
$ ls
new_tmp simple_convnet.py sing_cache sing_image
$ gpurun singularity run --nv -c -H $PWD:/home sing_image/tensorflow-latest-gpu python simple_convnet.py
First nonoption argument is "singularity" at argv[1]
Connected
Info: OK 0. Tesla P100-PCIE-12GB [00000000:5E:00.0]
1. Tesla P100-PCIE-12GB [00000000:86:00.0]
Total clients:0 Running:0 Estimated waiting time:0 seconds
GPU reserved:300 seconds granted
GPUID reserved:0 Details: - Device 0. Tesla P100-PCIE-12GB [00000000:5E:00.0] set to compute mode:Exclusive Process
Info: Executing program: singularity
...
3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0]
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
I0000 00:00:1753700269.722412 3506926 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11430 MB memory: -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:5e:00.0, compute capability: 6.0
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D) │ (None, 26, 26, 32) │ 320 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d (MaxPooling2D) │ (None, 13, 13, 32) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_1 (Conv2D) │ (None, 11, 11, 64) │ 18,496 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_1 (MaxPooling2D) │ (None, 5, 5, 64) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten (Flatten) │ (None, 1600) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout (Dropout) │ (None, 1600) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense) │ (None, 10) │ 16,010 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 34,826 (136.04 KB)
Trainable params: 34,826 (136.04 KB)
Non-trainable params: 0 (0.00 B)
Epoch 1/10
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1753700272.405153 3507180 service.cc:152] XLA service 0x7f135c005180 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1753700272.405186 3507180 service.cc:160] StreamExecutor device (0): Tesla P100-PCIE-12GB, Compute Capability 6.0
2025-07-28 12:57:52.470133: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
I0000 00:00:1753700272.675756 3507180 cuda_dnn.cc:529] Loaded cuDNN version 90300
2025-07-28 12:57:53.513894: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{} for conv %cudnn-conv-bias-activation.7 = (f32[128,64,11,11]{3,2,1,0}, u8[0]{0}) custom-call(f32[128,32,13,13]{3,2,1,0} %bitcast.4508, f32[64,32,3,3]{3,2,1,0} %bitcast.4056, f32[64]{0} %bitcast.4568), window={size=3x3}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_1_2/convolution" source_file="/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
I0000 00:00:1753700274.955826 3507180 device_compiler.h:188] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
405/422 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7524 - loss: 0.78362025-07-28 12:57:56.423790: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{} for conv %cudnn-conv-bias-activation.7 = (f32[112,64,11,11]{3,2,1,0}, u8[0]{0}) custom-call(f32[112,32,13,13]{3,2,1,0} %bitcast.4508, f32[64,32,3,3]{3,2,1,0} %bitcast.4056, f32[64]{0} %bitcast.4568), window={size=3x3}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_1_2/convolution" source_file="/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
422/422 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7577 - loss: 0.76702025-07-28 12:57:57.836284: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{} for conv %cudnn-conv-bias-activation.7 = (f32[128,64,11,11]{3,2,1,0}, u8[0]{0}) custom-call(f32[128,32,13,13]{3,2,1,0} %bitcast.497, f32[64,32,3,3]{3,2,1,0} %bitcast.504, f32[64]{0} %bitcast.506), window={size=3x3}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_1_2/convolution" source_file="/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-28 12:57:58.264804: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{} for conv %cudnn-conv-bias-activation.7 = (f32[112,64,11,11]{3,2,1,0}, u8[0]{0}) custom-call(f32[112,32,13,13]{3,2,1,0} %bitcast.497, f32[64,32,3,3]{3,2,1,0} %bitcast.504, f32[64]{0} %bitcast.506), window={size=3x3}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_1_2/convolution" source_file="/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
422/422 ━━━━━━━━━━━━━━━━━━━━ 7s 9ms/step - accuracy: 0.7580 - loss: 0.7661 - val_accuracy: 0.9778 - val_loss: 0.0873
Epoch 2/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9618 - loss: 0.1245 - val_accuracy: 0.9853 - val_loss: 0.0592
Epoch 3/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9732 - loss: 0.0858 - val_accuracy: 0.9863 - val_loss: 0.0517
Epoch 4/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9764 - loss: 0.0752 - val_accuracy: 0.9890 - val_loss: 0.0457
Epoch 5/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9796 - loss: 0.0637 - val_accuracy: 0.9890 - val_loss: 0.0414
Epoch 6/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9821 - loss: 0.0571 - val_accuracy: 0.9895 - val_loss: 0.0387
Epoch 7/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9837 - loss: 0.0522 - val_accuracy: 0.9897 - val_loss: 0.0366
Epoch 8/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9847 - loss: 0.0477 - val_accuracy: 0.9903 - val_loss: 0.0334
Epoch 9/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9855 - loss: 0.0461 - val_accuracy: 0.9910 - val_loss: 0.0347
Epoch 10/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9864 - loss: 0.0438 - val_accuracy: 0.9905 - val_loss: 0.0326
2025-07-28 12:58:10.653726: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{} for conv %cudnn-conv-bias-activation.7 = (f32[32,64,11,11]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,32,13,13]{3,2,1,0} %bitcast.497, f32[64,32,3,3]{3,2,1,0} %bitcast.504, f32[64]{0} %bitcast.506), window={size=3x3}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_1_2/convolution" source_file="/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-28 12:58:11.409734: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{} for conv %cudnn-conv-bias-activation.7 = (f32[16,64,11,11]{3,2,1,0}, u8[0]{0}) custom-call(f32[16,32,13,13]{3,2,1,0} %bitcast.497, f32[64,32,3,3]{3,2,1,0} %bitcast.504, f32[64]{0} %bitcast.506), window={size=3x3}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_1_2/convolution" source_file="/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
Test loss: 0.02863783948123455
Test accuracy: 0.9904999732971191
From the last command:
gpurun: grants access to the UI GPU.
singularity: accepts several commands, like run, to execute a command (here python simple_convnet.py) within the container sing_image/tensorflow-latest-gpu.
--nv: enables NVIDIA GPU support in Singularity. This option sets up the container's environment to use an NVIDIA GPU and the basic CUDA libraries to run a CUDA-enabled application.
-c: use a minimal /dev and empty other directories (e.g. /tmp and $HOME) instead of sharing filesystems from your host.
-H: home directory specification. It can be either a src path or a src:dest pair, where src is the source path of the home directory outside the container and dest overrides the home directory within the container (default "/lhome/ific/a/artemisa_user").
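As a quick sanity check, the parameter counts in the model summary above can be reproduced by hand, assuming 3x3 convolution kernels (consistent with the 28→26 and 13→11 shape changes in the summary):

```python
# Conv2D params = (kernel_h * kernel_w * in_channels + 1) * filters
conv2d   = (3 * 3 * 1 + 1) * 32    # 320
conv2d_1 = (3 * 3 * 32 + 1) * 64   # 18,496
# Dense params = (inputs + 1) * units; Flatten yields 5*5*64 = 1600 inputs
dense    = (5 * 5 * 64 + 1) * 10   # 16,010
total = conv2d + conv2d_1 + dense
print(conv2d, conv2d_1, dense, total)  # 320 18496 16010 34826
```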
Run container in WN
The same task can be run in a WN. First, we have to prepare the
submission description file
universe = vanilla
executable = simple_convnet.py
log = condor_logs/container_job.log
output = condor_logs/container_job.outfile.$(Cluster).$(Process).out
error = condor_logs/container_job.errors.$(Cluster).$(Process).err
+SingularityImage = "$ENV(PWD)/sing_image/tensorflow-latest-gpu"
+SingularityBind = "/lustre:/home"
request_gpus = 1
queue
We introduced new commands:
+SingularityImage: the container image to use.
+SingularityBind: allows mounting additional paths inside the container. In this case /lustre, which contains the project disk space, is bound to /home. The home disk space /lhome is mounted by default.
Caution
Don't forget to grant execution permission:
chmod +x simple_convnet.py
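The execution permission is needed because HTCondor runs simple_convnet.py directly as the executable, so the script must also start with an interpreter line. A minimal sketch of how the script should begin (the print mirrors the version line seen in the output above; the training code itself comes from the previous tutorial):

```python
#!/usr/bin/env python3
# First line of simple_convnet.py: together with the execute permission,
# it lets HTCondor (and the shell) run the script directly.
import sys

print(sys.version)  # the job output starts with the interpreter version
```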
Finally, submit the job as usual:
$ condor_submit container_wn.sub
Submitting job(s).
1 job(s) submitted to cluster 578960.
When the job has finished, the content of the output file should be the same as in the UI run.
Summary
Recap
Containers can be used to package and run an application, along with its dependencies, in an isolated, predictable and repeatable way.
Apptainer (formerly Singularity) is installed on Artemisa.
Containers can be run on both the UI and the Worker Nodes.