01-User Interface development

This tutorial illustrates hands-on development on a User Interface (UI) by running a small TensorFlow test program interactively on the UI. The gpurun tool is used to gain exclusive access to the UI GPU, and custom Python virtual environments are employed.

Virtual environment setup

From the official Python documentation:

A virtual environment is a Python environment such that the Python interpreter, libraries and scripts installed into it are isolated from those installed in other virtual environments, and (by default) any libraries installed in a “system” Python, i.e., one which is installed as part of your operating system.

In a nutshell, Python virtual environments help decouple and isolate Python installs and associated pip packages. This allows end-users to install and manage their own set of packages that are independent of those provided by the system or used by other projects.
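 
Whether an environment is active can even be checked from Python itself. A small illustrative check, not part of the tutorial: venv-style environments redirect sys.prefix away from the base interpreter, while conda exposes the active environment's name in the CONDA_DEFAULT_ENV variable.

import os
import sys

# venv/virtualenv: sys.prefix points at the environment, not the base install
print("venv active:", sys.prefix != sys.base_prefix)
# conda: the active environment's name, or None outside any conda environment
print("conda env:", os.environ.get("CONDA_DEFAULT_ENV"))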

There are different options to manage virtual environments:

  • venv : Python's default virtual environment management tool. Each virtual environment has its own independent set of Python packages installed in its site directories. A virtual environment is created on top of an existing Python installation, known as the virtual environment’s “base” Python, and may optionally be isolated from the packages in the base environment, so only those explicitly installed in the virtual environment are available.

  • Conda : Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer.

We are going to use conda in this tutorial, following the official Miniconda installation instructions:

$ mkdir -p ~/miniconda3
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
$ rm -rf ~/miniconda3/miniconda.sh

After installing, initialize your newly-installed Miniconda. The following commands initialize for bash and zsh shells:

$ ~/miniconda3/bin/conda init bash
$ ~/miniconda3/bin/conda init zsh

Now we can create our virtual environment:

$ conda create -n artemisa-tuto

Activate the environment:

$ conda activate artemisa-tuto
(artemisa-tuto) $

Now we are in our virtual environment artemisa-tuto. Install the NVIDIA CUDA Deep Neural Network library (cuDNN):

(artemisa-tuto) $ conda install cudnn

Also, install the NVIDIA CUDA compiler (nvcc):

(artemisa-tuto) $ conda install -c nvidia cuda-nvcc

Install the pip package manager in the conda environment:

(artemisa-tuto) $ conda install pip

Using pip, install TensorFlow:

(artemisa-tuto) $ pip install tensorflow
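 
To quickly verify the installation, a short check like the following can be run. A sketch; note that listing the physical GPUs works on the UI, but actually computing on one requires gpurun, as shown below.

import tensorflow as tf

# Print the installed version and the GPUs TensorFlow can see.
print("TensorFlow:", tf.__version__)
print("GPUs:", tf.config.list_physical_devices('GPU'))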

Matrix operation on the local GPU

Create the following Python file, matmul.py:

from __future__ import print_function
import tensorflow as tf

print('You are using Artemisa!')

# Log the device on which each tensor allocation or operation is placed
tf.debugging.set_log_device_placement(True)

# Tensorflow version
print("This is Tensorflow: ", tf.__version__)

# Check GPUs
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

# Creates some tensors
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Display
tf.print(c)

Try to run it:

(artemisa-tuto) $ python3 matmul.py
2023-09-06 13:47:46.973134: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-06 13:47:49.524484: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
You are using Artemisa!
This is Tensorflow:  2.13.0
Num GPUs Available:  2
2023-09-06 13:47:53.904380: F tensorflow/tsl/platform/statusor.cc:33] Attempting to fetch value instead of handling error INTERNAL: failed initializing StreamExecutor for CUDA device ordinal 0: INTERNAL: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_DEVICE_UNAVAILABLE: CUDA-capable device(s) is/are busy or unavailable
Aborted

We get a CUDA_ERROR_DEVICE_UNAVAILABLE error. Access to the UI GPU has to be performed through the gpurun command:

(artemisa-tuto) $ gpurun python3 matmul.py
First nonoption argument is "python3" at argv[1]
Connected
Info: OK 0. Tesla P100-PCIE-12GB [00000000:5E:00.0]
1. Tesla P100-PCIE-12GB [00000000:86:00.0]

Total clients:-4 Running:2 Estimated waiting time:-1500 seconds
GPU reserved:300 seconds granted
GPUID reserved:0 Details: - Device 0. Tesla P100-PCIE-12GB [00000000:5E:00.0] set to compute mode:Exclusive Process
Info: Executing program: python3
2023-09-06 14:07:01.288113: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-06 14:07:03.368833: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
You are using Artemisa!
This is Tensorflow:  2.13.0
Num GPUs Available:  1
2023-09-06 14:07:07.209176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11437 MB memory:  -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:5e:00.0, compute capability: 6.0
input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.262116: I tensorflow/core/common_runtime/placer.cc:114] input: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.262174: I tensorflow/core/common_runtime/placer.cc:114] _EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.262203: I tensorflow/core/common_runtime/placer.cc:114] output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.700513: I tensorflow/core/common_runtime/eager/execute.cc:1678] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2023-09-06 14:07:07.703438: I tensorflow/core/common_runtime/placer.cc:114] input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
_EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.703487: I tensorflow/core/common_runtime/placer.cc:114] _EagerConst: (_EagerConst): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.703524: I tensorflow/core/common_runtime/placer.cc:114] output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.705416: I tensorflow/core/common_runtime/eager/execute.cc:1678] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
tensor: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.706480: I tensorflow/core/common_runtime/placer.cc:114] tensor: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
shape: (_DeviceArg): /job:localhost/replica:0/task:0/device:CPU:0
2023-09-06 14:07:07.706519: I tensorflow/core/common_runtime/placer.cc:114] shape: (_DeviceArg): /job:localhost/replica:0/task:0/device:CPU:0
Reshape: (Reshape): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.706549: I tensorflow/core/common_runtime/placer.cc:114] Reshape: (Reshape): /job:localhost/replica:0/task:0/device:GPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.706723: I tensorflow/core/common_runtime/placer.cc:114] output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.709486: I tensorflow/core/common_runtime/eager/execute.cc:1678] Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.710985: I tensorflow/core/common_runtime/eager/execute.cc:1678] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.711493: I tensorflow/core/common_runtime/eager/execute.cc:1678] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.711858: I tensorflow/core/common_runtime/eager/execute.cc:1678] Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
a: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.717145: I tensorflow/core/common_runtime/placer.cc:114] a: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
b: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.717244: I tensorflow/core/common_runtime/placer.cc:114] b: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.717304: I tensorflow/core/common_runtime/placer.cc:114] MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
product_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.717383: I tensorflow/core/common_runtime/placer.cc:114] product_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:GPU:0
2023-09-06 14:07:07.718987: I tensorflow/core/common_runtime/eager/execute.cc:1678] Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
inputs_0: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2023-09-06 14:07:07.771976: I tensorflow/core/common_runtime/placer.cc:114] inputs_0: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
StringFormat: (StringFormat): /job:localhost/replica:0/task:0/device:CPU:0
2023-09-06 14:07:07.772064: I tensorflow/core/common_runtime/placer.cc:114] StringFormat: (StringFormat): /job:localhost/replica:0/task:0/device:CPU:0
output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
2023-09-06 14:07:07.772104: I tensorflow/core/common_runtime/placer.cc:114] output_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
2023-09-06 14:07:07.773398: I tensorflow/core/common_runtime/eager/execute.cc:1678] Executing op StringFormat in device /job:localhost/replica:0/task:0/device:CPU:0
input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2023-09-06 14:07:07.774200: I tensorflow/core/common_runtime/placer.cc:114] input: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
PrintV2: (PrintV2): /job:localhost/replica:0/task:0/device:CPU:0
2023-09-06 14:07:07.774285: I tensorflow/core/common_runtime/placer.cc:114] PrintV2: (PrintV2): /job:localhost/replica:0/task:0/device:CPU:0
2023-09-06 14:07:07.775173: I tensorflow/core/common_runtime/eager/execute.cc:1678] Executing op PrintV2 in device /job:localhost/replica:0/task:0/device:CPU:0
[[22 28]
[49 64]]

Now we can see the result of the matrix multiplication:

[[22 28]
[49 64]]

Note that, by inspecting the log messages, we can see which GPU was used (Tesla P100-PCIE-12GB), identified in the logs as GPU:0.
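 
Since the reserved device is exposed as GPU:0, operations can also be pinned to it explicitly. A minimal sketch, not part of the tutorial script:

import tensorflow as tf

# Pin the computation to the GPU reserved by gpurun (visible as GPU:0).
with tf.device('/GPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    tf.print(tf.matmul(a, b))  # prints the same [[22 28] [49 64]] result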

If there are no other user requests for the local GPU, the program is executed immediately; otherwise, gpurun prints the expected waiting time.

UI GPUs are set to exclusive access, so only one program can run on them at a time. To encourage fair use of these GPUs, each program’s processing time is capped at 5 minutes (300 seconds); the sketch below shows one way to keep a training run within that budget.
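 
A minimal sketch, assuming the 300-second grant reported by gpurun: a custom Keras callback that stops training once a wall-clock budget is spent, leaving a margin for startup and evaluation.

import time
from tensorflow import keras

class TimeBudget(keras.callbacks.Callback):
    """Stop training once the given number of wall-clock seconds has elapsed."""
    def __init__(self, seconds):
        super().__init__()
        self.seconds = seconds

    def on_train_begin(self, logs=None):
        self.start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        if time.time() - self.start > self.seconds:
            self.model.stop_training = True

# e.g. pass TimeBudget(240) in a model's callbacks list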

Simple classification

Now we are going to perform a simple classification task on the well-known MNIST dataset.

First, let’s install pandas and matplotlib, two widely used Python libraries:

(artemisa-tuto) $ pip install matplotlib pandas

Create the following Python file, mnist_class.py:

# Library imports
import tensorflow as tf
import pandas as pd
from tensorflow import keras
import matplotlib  # used by pandas as its plotting backend

# Load the MNIST dataset and scale the integer pixel values to floats in [0, 1].
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build a tf.keras Sequential model with layers.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

# Choose an optimizer and loss function to train the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train and evaluate the model  
early_stopping = keras.callbacks.EarlyStopping(
    patience=10,
    min_delta=0.001,
    restore_best_weights=True,
)

history = model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    batch_size=512,
    epochs=500,
    callbacks=[early_stopping],
    verbose=0, # hide the output
)

history_df = pd.DataFrame(history.history)
ax = history_df.loc[0:, ['loss']].plot()
ax.figure.savefig('./loss.png')
ax = history_df.loc[0:, ['accuracy']].plot()
ax.figure.savefig('./accuracy.png')

print(("Best Validation Loss: {:0.4f}" +\
      "\nBest Validation Accuracy: {:0.4f}")\
      .format(history_df['val_loss'].min(),
              history_df['val_accuracy'].max()))

model.evaluate(x_test,  y_test, verbose=2)
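 
As an optional aside (not included in the output below), the trained model could also be used for inference by appending a few lines to the script, e.g.:

import numpy as np

# Hypothetical addition to mnist_class.py: predict the first test digit.
probs = model.predict(x_test[:1], verbose=0)
print("Predicted:", np.argmax(probs, axis=1)[0], "- true label:", y_test[0])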

Now run it on the local GPU with the gpurun command:

(artemisa-tuto) $ gpurun python3 mnist_class.py
First nonoption argument is "python3" at argv[1]
Connected
Info: OK 0. Tesla P100-PCIE-12GB [00000000:5E:00.0]
1. Tesla P100-PCIE-12GB [00000000:86:00.0]

Total clients:-4 Running:2 Estimated waiting time:-1500 seconds
GPU reserved:300 seconds granted
GPUID reserved:0 Details: - Device 0. Tesla P100-PCIE-12GB [00000000:5E:00.0] set to compute mode:Exclusive Process
Info: Executing program: python3
2023-09-07 11:49:43.784418: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-07 11:49:45.724061: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-07 11:49:49.691217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11437 MB memory:  -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:5e:00.0, compute capability: 6.0
2023-09-07 11:49:52.055475: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f0870059d30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-09-07 11:49:52.055551: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla P100-PCIE-12GB, Compute Capability 6.0
2023-09-07 11:49:52.067102: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:255] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-09-07 11:49:52.125806: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8902
2023-09-07 11:49:52.557063: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
Best Validation Loss: 0.0630
Best Validation Accuracy: 0.9817
313/313 - 0s - loss: 0.0637 - accuracy: 0.9806 - 378ms/epoch - 1ms/step

The results are shown at the end of the output:

Best Validation Loss: 0.0630
Best Validation Accuracy: 0.9817
313/313 - 0s - loss: 0.0637 - accuracy: 0.9806 - 378ms/epoch - 1ms/step
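 
As a small follow-up (assuming the history_df from mnist_class.py is still in scope), the epoch at which validation loss was lowest can be recovered directly from the training history:

# Index of the epoch with the lowest validation loss.
best_epoch = history_df['val_loss'].idxmin()
print("Best epoch:", best_epoch,
      "val_loss:", round(history_df['val_loss'].min(), 4))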

The executed code also demonstrates how results can be produced in the form of plots:

[Figures: accuracy.png (training accuracy per epoch) and loss.png (training loss per epoch), as saved by mnist_class.py]
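 
As a possible extension (an assumption, not part of the tutorial script), the training and validation curves can be overlaid in one figure, which makes the early-stopping behaviour visible:

# Assuming the history_df from mnist_class.py: plot both curves together.
ax = history_df[['loss', 'val_loss']].plot(title='Loss per epoch')
ax.set_xlabel('epoch')
ax.figure.savefig('./loss_val.png')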

Summary

  • Virtual environments are available in both user and project workspaces.

  • The gpurun command is needed to gain access to the UI GPU.