01-User Interface development

This tutorial illustrates hands-on development on a User Interface (UI), running a small TensorFlow test program interactively. The gpurun tool is used to gain exclusive access to the UI GPU, and custom Python virtual environments are employed.

Virtual environment setup

From the official Python documentation:

A virtual environment is a Python environment such that the Python interpreter, libraries and scripts installed into it are isolated from those installed in other virtual environments, and (by default) any libraries installed in a “system” Python, i.e., one which is installed as part of your operating system.

In a nutshell, Python virtual environments help decouple and isolate Python installs and associated pip packages. This allows end-users to install and manage their own set of packages that are independent of those provided by the system or used by other projects.

There are different options to manage virtual environments:

  • venv : Python’s default virtual environment management tool. Each virtual environment comes with its own independent set of Python packages installed in its site directories. A virtual environment is created on top of an existing Python installation, known as the virtual environment’s “base” Python, and may optionally be isolated from the packages in the “base” environment, so only those explicitly installed in the virtual environment are available (see the short sketch after this list).

  • Conda : Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments.
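
For reference, this is what a minimal venv workflow looks like. We will not use it in this tutorial, and the path ~/envs/artuto is only an example:

$ python3 -m venv ~/envs/artuto
$ source ~/envs/artuto/bin/activate
(artuto) $ pip install --upgrade pip
(artuto) $ deactivate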

We are going to use conda in this tutorial, following the official Miniconda installation instructions:

$ mkdir -p ~/miniconda3
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
$ rm -rf ~/miniconda3/miniconda.sh

After installing, initialize your newly-installed Miniconda. The following commands initialize for bash and zsh shells:

$ ~/miniconda3/bin/conda init bash
$ ~/miniconda3/bin/conda init zsh
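
conda init appends initialization code to your shell startup file, so it only takes effect in new shells. To use conda in the current session, re-source the startup file (shown here for bash; zsh users would source ~/.zshrc) and check that conda is available:

$ source ~/.bashrc
$ conda --version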

We are ready to create our virtual environment:

$ conda create -y -n artuto

Activate the environment:

$ conda activate artuto
(artuto) $

Now our virtual environment is activated, as the (artuto) prefix in the prompt shows. Let’s install Python in our virtual environment:

(artuto) $ conda install python=3.9
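
To confirm the environment now provides its own interpreter (the exact 3.9.x patch release may differ), check its version and location:

(artuto) $ python --version
(artuto) $ which python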

Install the NVIDIA CUDA Deep Neural Network library (cuDNN):

(artuto) $ conda install cudnn

Also, install the NVIDIA CUDA compiler (nvcc):

(artuto) $ conda install -c nvidia cuda-nvcc
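
A quick sanity check that both packages landed in the environment; the exact versions depend on what conda resolves:

(artuto) $ conda list cudnn
(artuto) $ nvcc --version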

Finally, install TensorFlow through pip:

(artuto) $ pip install tensorflow[and-cuda]
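
You can verify that TensorFlow imports correctly and print its version; note that actually using the GPU still requires gpurun, as the next section shows:

(artuto) $ python -c "import tensorflow as tf; print(tf.__version__)"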

Matrix operation on local GPU

Create the following Python file, matmul.py:

from __future__ import print_function
import tensorflow as tf
from tensorflow.python.client import device_lib

print('You are using Artemisa!')

# Log the device on which each tensor allocation or operation is placed
tf.debugging.set_log_device_placement(True)

# TensorFlow version
print("This is Tensorflow: ", tf.__version__)

# Check the GPUs visible to TensorFlow
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
local_device_protos = device_lib.list_local_devices()
print("List:", [x.name for x in local_device_protos if x.device_type == 'GPU'])

# Create some tensors
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Display the result
tf.print(c)

Try to run it:

(artuto) $ python matmul.py
2025-07-23 12:16:57.204282: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
...
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
You are using Artemisa!
This is Tensorflow:  2.19.0
Num GPUs Available:  2
Traceback (most recent call last):
  File "/lhome/ific/u/user/Artemisa_tutorials/01_ui_dev/matmul.py", line 15, in <module>
    local_device_protos = device_lib.list_local_devices()
  File "/lhome/ific/u/user/.conda/envs/artuto/lib/python3.9/site-packages/tensorflow/python/client/device_lib.py", line 41, in list_local_devices
    _convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: Bad StatusOr access: INTERNAL: CUDA error: : CUDA_ERROR_DEVICE_UNAVAILABLE: CUDA-capable device(s) is/are busy or unavailable

We get a CUDA_ERROR_DEVICE_UNAVAILABLE error. Access to the UI GPU is granted through the gpurun command:

(artuto) $ gpurun python matmul.py
First nonoption argument is "python" at argv[1]
Connected
Info: OK 0. Tesla P100-PCIE-12GB [00000000:5E:00.0]
1. Tesla P100-PCIE-12GB [00000000:86:00.0]

Total clients:0 Running:0 Estimated waiting time:0 seconds
GPU reserved:300 seconds granted
GPUID reserved:0 Details: - Device 0. Tesla P100-PCIE-12GB [00000000:5E:00.0] set to compute mode:Exclusive Process
Info: Executing program: python
2025-07-23 12:22:32.369148: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
...
You are using Artemisa!
This is Tensorflow:  2.19.0
Num GPUs Available:  1
I0000 00:00:1753266156.356457 2865785 gpu_device.cc:2019] Created device /device:GPU:0 with 11430 MB memory:  -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:5e:00.0, compute capability: 6.0
List: ['/device:GPU:0']
I0000 00:00:1753266156.357597 2865785 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11430 MB memory:  -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:5e:00.0, compute capability: 6.0
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
2025-07-23 12:22:36.379720: I tensorflow/core/common_runtime/placer.cc:162] input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
2025-07-23 12:22:36.379748: I tensorflow/core/common_runtime/placer.cc:162] _EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
...
2025-07-23 12:22:36.509174: I tensorflow/core/common_runtime/placer.cc:162] input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
PrintV2: (PrintV2): /job:localhost/replica:0/task:0/device:CPU:0
2025-07-23 12:22:36.509187: I tensorflow/core/common_runtime/placer.cc:162] PrintV2: (PrintV2): /job:localhost/replica:0/task:0/device:CPU:0
2025-07-23 12:22:36.509524: I tensorflow/core/common_runtime/eager/execute.cc:1754] Executing op PrintV2 in device /job:localhost/replica:0/task:0/device:CPU:0
[[22 28]
 [49 64]]

We can see the result of the matrix multiplication at the end of the output.

[[22 28]
[49 64]]
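
This agrees with computing the matrix product by hand; for instance, the top-left entry is 1·1 + 2·3 + 3·5 = 22. The same multiplication can be reproduced on the CPU with NumPy (installed as a TensorFlow dependency):

import numpy as np

# Same operands as in matmul.py
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Plain NumPy matrix multiplication: prints [[22. 28.] [49. 64.]]
print(a @ b)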

Inspecting the log messages, we can see the GPU used (Tesla P100-PCIE-12GB), identified later in the output as GPU:0.

If no other users are requesting the local GPU, the program will be executed immediately. Otherwise, gpurun will print the expected waiting time.

UI GPUs are accessed exclusively: only one program can run on a GPU at a time. Since these GPUs are meant for testing, and to encourage their fair use, execution time is capped at 5 minutes (300 seconds) per request.

Simple classification

Now we are going to run a simple classification task on the well-known MNIST dataset of handwritten digits.

First, let’s install pandas and matplotlib, two widely used Python libraries:

(artuto) $ pip install matplotlib pandas
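
As before, a quick import check confirms both libraries are available in the environment:

(artuto) $ python -c "import pandas, matplotlib; print(pandas.__version__, matplotlib.__version__)"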

Create the following Python file, mnist_class.py:

# Library imports
import tensorflow as tf
import pandas as pd
from tensorflow import keras
import matplotlib.pyplot as plt  # pandas' .plot() uses matplotlib under the hood

# Load the MNIST dataset and convert integer pixel values to floats.
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build a tf.keras.model with layers.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

# Choose an optimizer and loss function to train the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train and evaluate the model  
early_stopping = keras.callbacks.EarlyStopping(
    patience=10,
    min_delta=0.001,
    restore_best_weights=True,
)

history = model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    batch_size=512,
    epochs=500,
    callbacks=[early_stopping],
    verbose=0, # hide the output
)

history_df = pd.DataFrame(history.history)
ax = history_df.loc[0:, ['loss']].plot()
ax.figure.savefig('./loss.png')
ax = history_df.loc[0:, ['accuracy']].plot()
ax.figure.savefig('./accuracy.png')

print(("Best Validation Loss: {:0.4f}" +\
      "\nBest Validation Accuracy: {:0.4f}")\
      .format(history_df['val_loss'].min(),
              history_df['val_accuracy'].max()))

model.evaluate(x_test,  y_test, verbose=2)
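
A common follow-up, not covered further in this tutorial, is to persist the trained model so it can be reloaded later without retraining. A minimal sketch, assuming it is appended to mnist_class.py (the file name mnist.keras is arbitrary):

# Save the trained model in the native Keras format (hypothetical file name)
model.save('mnist.keras')

# Reload it later for inference or further evaluation
restored = keras.models.load_model('mnist.keras')
restored.evaluate(x_test, y_test, verbose=2)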

Now run it on the local GPU with the gpurun command:

(artuto) $ gpurun python mnist_class.py
First nonoption argument is "python" at argv[1]
Connected
Info: OK 0. Tesla P100-PCIE-12GB [00000000:5E:00.0]
1. Tesla P100-PCIE-12GB [00000000:86:00.0]

Total clients:0 Running:0 Estimated waiting time:0 seconds
GPU reserved:300 seconds granted
GPUID reserved:0 Details: - Device 0. Tesla P100-PCIE-12GB [00000000:5E:00.0] set to compute mode:Exclusive Process
Info: Executing program: python
...
I0000 00:00:1753267274.309325 2905055 cuda_dnn.cc:529] Loaded cuDNN version 90300
I0000 00:00:1753267275.454327 2905055 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
Best Validation Loss: 0.0617
Best Validation Accuracy: 0.9817
313/313 - 1s - 3ms/step - accuracy: 0.9806 - loss: 0.0623

Results are shown at the end of the output:

Best Validation Loss: 0.0617
Best Validation Accuracy: 0.9817
313/313 - 1s - 3ms/step - accuracy: 0.9806 - loss: 0.0623

The script also produces two figures:

[Figures: accuracy.png and loss.png, training accuracy and loss per epoch]

Summary

  • Virtual environments are available in both user and project workspaces.

  • The gpurun command is needed to gain exclusive access to the UI GPU.