01-User Interface development

This tutorial illustrates hands-on development on a User Interface (UI): we run a small test program using TensorFlow, executed interactively on the UI. The gpurun tool is used to gain exclusive access to the UI GPU, and custom Python virtual environments are employed.

Virtual environment setup

From the official Python documentation:

A virtual environment is a Python environment such that the Python interpreter, libraries and scripts installed into it are isolated from those installed in other virtual environments, and (by default) any libraries installed in a “system” Python, i.e., one which is installed as part of your operating system.

In a nutshell, Python virtual environments help decouple and isolate Python installs and associated pip packages. This allows end-users to install and manage their own set of packages that are independent of those provided by the system or used by other projects.

There are different options to manage virtual environments:

  • venv : Python's default virtual environment management tool. Each virtual environment comes with its own independent set of Python packages installed in its site directories. A virtual environment is created on top of an existing Python installation, known as the virtual environment's “base” Python, and may optionally be isolated from the packages in the base environment, so only those explicitly installed in the virtual environment are available (a minimal venv session is sketched after this list).

  • Conda : Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer.
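
For comparison, a minimal venv session looks like the following sketch (directory and package names are only placeholders):

$ python3 -m venv ~/envs/artemisa-tuto        # create the environment
$ source ~/envs/artemisa-tuto/bin/activate    # activate it
(artemisa-tuto) $ pip install <some-package>  # packages now install into the venv
(artemisa-tuto) $ deactivate                  # return to the system Python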

We are going to use conda in this tutorial, following the official Miniconda installation instructions:

$ mkdir -p ~/miniconda3
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
$ rm -rf ~/miniconda3/miniconda.sh
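
A quick way to check that the installation succeeded is to ask the freshly installed conda for its version (the exact number depends on the installer you downloaded):

$ ~/miniconda3/bin/conda --version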

After installing, initialize your newly-installed Miniconda. The following commands initialize for bash and zsh shells:

$ ~/miniconda3/bin/conda init bash
$ ~/miniconda3/bin/conda init zsh
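
The init step edits your shell startup file, so reload it (or open a new terminal) for the conda command to be picked up. For bash:

$ source ~/.bashrc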

We are ready to create our virtual environment:

$ conda create -n artemisa-tuto

Activate the environment:

$ conda activate artemisa-tuto
(artemisa-tuto) $
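
For reference, you can list the existing environments or leave the active one at any time:

(artemisa-tuto) $ conda env list      # list all environments; the active one is marked
(artemisa-tuto) $ conda deactivate    # return to the base environment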

Now we are in our virtual environment artemisa-tuto. Let's install Python in the virtual environment:

(artemisa-tuto) $ conda install python=3.6.8
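
You can confirm that the requested interpreter is now the one in use:

(artemisa-tuto) $ python --version
Python 3.6.8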

Install the NVIDIA CUDA Deep Neural Network library (cuDNN):

(artemisa-tuto) $ conda install cudnn

Also, install the NVIDIA CUDA compiler (nvcc):

(artemisa-tuto) $ conda install -c nvidia cuda-nvcc
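
To double-check that both NVIDIA packages landed in the environment, you can query conda and the compiler itself:

(artemisa-tuto) $ conda list cudnn    # shows the installed cuDNN version
(artemisa-tuto) $ nvcc --version      # prints the CUDA compiler release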

Through pip, install TensorFlow:

(artemisa-tuto) $ pip install tensorflow
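
A quick import check confirms the installation; the version printed depends on the release pip resolved for your Python version:

(artemisa-tuto) $ python -c "import tensorflow as tf; print(tf.__version__)"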

Matrix operation on local GPU

Create the following Python file, matmul.py:

from __future__ import print_function
import tensorflow as tf

print('You are using Artemisa!')

# Log the device on which each tensor allocation and operation is placed
tf.debugging.set_log_device_placement(True)

# Tensorflow version
print("This is Tensorflow: ", tf.__version__)

# Check GPUs
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

# Creates some tensors
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Display
tf.print(c)

Try to run it:

(artemisa-tuto) $ python matmul.py
2023-09-06 13:47:46.973134: I tensorflow/core/platform/cpu_feature_guard.cc:182] This
TensorFlow binary is optimized to use available CPU instructions in
performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild
TensorFlow with the appropriate compiler flags.
2023-09-06 13:47:49.524484: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38]
TF-TRT Warning: Could not find TensorRT
You are using Artemisa!
This is Tensorflow:  2.13.0
Num GPUs Available:  2
2023-09-06 13:47:53.904380: F tensorflow/tsl/platform/statusor.cc:33] Attempting to
fetch value instead of handling error INTERNAL: failed initializing StreamExecutor for
CUDA device ordinal 0: INTERNAL: failed call to cuDevicePrimaryCtxRetain:
CUDA_ERROR_DEVICE_UNAVAILABLE: CUDA-capable device(s) is/are busy or unavailable
Aborted

We get a CUDA_ERROR_DEVICE_UNAVAILABLE error. Access to the UI GPU has to be performed through the gpurun command:

(artemisa-tuto) $ gpurun python matmul.py
First nonoption argument is "python" at argv[1]
Connected
Info: OK 0. Tesla P100-PCIE-12GB [00000000:5E:00.0]
1. Tesla P100-PCIE-12GB [00000000:86:00.0]

Total clients:-22 Running:2 Estimated waiting time:-6900 seconds
GPU reserved:300 seconds granted
GPUID reserved:0 Details: - Device 0. Tesla P100-PCIE-12GB [00000000:5E:00.0] set to compute mode:Exclusive Process
Info: Executing program: python
You are using Artemisa!
This is Tensorflow:  2.6.2
Num GPUs Available:  1
2024-06-27 10:53:56.518799: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-27 10:53:56.665732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 11437 MB memory:  -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:5e:00.0, compute capability: 6.0
2024-06-27 10:53:56.668126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11437 MB memory:  -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:5e:00.0, compute capability: 6.0
2024-06-27 10:53:57.017207: I tensorflow/core/common_runtime/eager/execute.cc:1161] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2024-06-27 10:53:57.018977: I tensorflow/core/common_runtime/eager/execute.cc:1161] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2024-06-27 10:53:57.019282: I tensorflow/core/common_runtime/eager/execute.cc:1161] Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
2024-06-27 10:53:57.019489: I tensorflow/core/common_runtime/eager/execute.cc:1161] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2024-06-27 10:53:57.019635: I tensorflow/core/common_runtime/eager/execute.cc:1161] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2024-06-27 10:53:57.019717: I tensorflow/core/common_runtime/eager/execute.cc:1161] Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
2024-06-27 10:53:57.020000: I tensorflow/core/common_runtime/eager/execute.cc:1161] Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
2024-06-27 10:53:57.074219: I tensorflow/core/common_runtime/eager/execute.cc:1161] Executing op StringFormat in device /job:localhost/replica:0/task:0/device:CPU:0
2024-06-27 10:53:57.074528: I tensorflow/core/common_runtime/eager/execute.cc:1161] Executing op PrintV2 in device /job:localhost/replica:0/task:0/device:CPU:0
[[22 28]
 [49 64]]

At the end of the output, we can see the result of the matrix multiplication:

[[22 28]
 [49 64]]

Note that, by inspecting the log messages, we can see which GPU was used (Tesla P100-PCIE-12GB), identified as GPU:0.
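
As a quick sketch, and assuming gpurun forwards the remaining command-line arguments to the program as in the run above, you can list the devices TensorFlow sees without writing a file:

(artemisa-tuto) $ gpurun python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"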

If there are no other user requests for the local GPU, the program is executed immediately; otherwise, gpurun prints the expected waiting time.

UI GPUs are used in exclusive mode: only one program can run on a GPU at a time. To encourage fair use, each program's processing time is capped at 5 minutes.

Simple classification

Now we are going to perform a simple classification task on the well-known MNIST dataset of handwritten digits.
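
MNIST contains 60,000 training images and 10,000 test images, each a 28x28 grayscale image. You can verify the shapes with a one-liner (the first call downloads the dataset; progress output omitted):

(artemisa-tuto) $ python -c "import tensorflow as tf; (xt, yt), (xv, yv) = tf.keras.datasets.mnist.load_data(); print(xt.shape, xv.shape)"
(60000, 28, 28) (10000, 28, 28)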

First, let’s install pandas and matplotlib, two widely used Python libraries:

(artemisa-tuto) $ pip install matplotlib pandas

Create the following Python file, mnist_class.py:

# Library import
import tensorflow as tf
import pandas as pd
from tensorflow import keras
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, safe on a remote session; pandas plots via matplotlib

# Load MNIST dataset. Convert integer to float.
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build a tf.keras.Sequential model by stacking layers.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

# Choose an optimizer and loss function to train the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train and evaluate the model, stopping early once validation stops improving
early_stopping = keras.callbacks.EarlyStopping(
    patience=10,
    min_delta=0.001,
    restore_best_weights=True,
)

history = model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    batch_size=512,
    epochs=500,
    callbacks=[early_stopping],
    verbose=0, # hide the output
)

history_df = pd.DataFrame(history.history)
ax = history_df.loc[0:, ['loss']].plot()
ax.figure.savefig('./loss.png')
ax = history_df.loc[0:, ['accuracy']].plot()
ax.figure.savefig('./accuracy.png')

print(("Best Validation Loss: {:0.4f}" +\
      "\nBest Validation Accuracy: {:0.4f}")\
      .format(history_df['val_loss'].min(),
              history_df['val_accuracy'].max()))

model.evaluate(x_test,  y_test, verbose=2)

Now run it on the local GPU with the gpurun command:

(artemisa-tuto) $ gpurun python mnist_class.py
First nonoption argument is "python" at argv[1]
Connected
Info: OK 0. Tesla P100-PCIE-12GB [00000000:5E:00.0]
1. Tesla P100-PCIE-12GB [00000000:86:00.0]

Total clients:-23 Running:2 Estimated waiting time:-7200 seconds
GPU reserved:300 seconds granted
GPUID reserved:0 Details: - Device 0. Tesla P100-PCIE-12GB [00000000:5E:00.0] set to compute mode:Exclusive Process
Info: Executing program: python
2024-06-27 10:58:47.409372: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-27 10:58:47.524967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11437 MB memory:  -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:5e:00.0, compute capability: 6.0
2024-06-27 10:58:48.487201: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Best Validation Loss: 0.0618
Best Validation Accuracy: 0.9825
313/313 - 0s - loss: 0.0626 - accuracy: 0.9813

Results are shown at the end of the output:

Best Validation Loss: 0.0618
Best Validation Accuracy: 0.9825
313/313 - 0s - loss: 0.0626 - accuracy: 0.9813

The executed code also produces results in the form of plots, saved to accuracy.png and loss.png:

[Figures: accuracy.png and loss.png, the training accuracy and loss curves]
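
Since the UI is usually accessed remotely, one way to view the PNG files is to copy them to your local machine, e.g. with scp (user, hostname and path below are placeholders):

$ scp <user>@<ui-hostname>:<path>/accuracy.png .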

Summary

  • Virtual environments are available in both user and project workspaces.

  • The gpurun command is needed to gain access to the UI GPU.