02-CPU jobs
ARTEMISA has some resources reserved for CPU-only jobs.
Modern frameworks like TensorFlow or Keras allow the creation of data pipelines that benefit from parallel processing on both CPUs (e.g. data augmentation) and GPUs (e.g. training). Some of these tasks, such as data preparation and augmentation, can be run in parallel using only CPUs, which speeds up the process without occupying GPUs. This section presents a basic example of data preparation using CPU-only resources, as a first step towards building an image classifier. We use the more challenging CIFAR-10 dataset, which consists of 32x32 pixel colour images in 10 classes, split into 50k training and 10k test images.
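The dataset sizes above determine how much memory a CPU job needs. The sketch below is plain arithmetic with NumPy dtypes and is not part of the tutorial's script. It assumes, as is the case for the Keras loader, that the images arrive as uint8, and that dividing by 255.0 (as the script does) produces float64 arrays, eight times larger. footprint_bytes is a hypothetical helper.

```python
import numpy as np

# CIFAR-10 sizes as stated above: 50k training images, 32x32 pixels, 3 colour channels.
N_TRAIN, HEIGHT, WIDTH, CHANNELS = 50_000, 32, 32, 3

def footprint_bytes(n_images, dtype):
    """Bytes needed to hold n_images CIFAR-10 images as a dense array of dtype."""
    return n_images * HEIGHT * WIDTH * CHANNELS * np.dtype(dtype).itemsize

raw = footprint_bytes(N_TRAIN, np.uint8)         # as returned by cifar10.load_data()
scaled64 = footprint_bytes(N_TRAIN, np.float64)  # after x_train / 255.0
scaled32 = footprint_bytes(N_TRAIN, np.float32)  # if cast to float32 before dividing

for name, size in [("uint8", raw), ("float64", scaled64), ("float32", scaled32)]:
    print(f"{name:>8}: {size / 2**20:8.1f} MiB")
```

With these numbers the training set takes about 146 MiB as uint8 and about 1.1 GiB after the float64 rescaling; casting to float32 first (for example x_train.astype("float32") / 255.0) would halve the latter.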
Local CPU execution
First, activate the conda environment created in the first tutorial:
$ conda activate artemisa-tuto
Install the following additional packages:
(artemisa-tuto) $ pip install matplotlib tensorflow-datasets scipy
Save the following Python code as augment_data_cpu.py:
#!/usr/bin/env python3
# https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator
# https://stepup.ai/train_data_augmentation_keras/
import os
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
# Helper function to inspect the first images in a dataset
def visualize_data(images, categories, class_names, file_name):
    fig = plt.figure(figsize=(14, 6))
    fig.patch.set_facecolor('white')
    for i in range(3 * 7):
        plt.subplot(3, 7, i+1)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(images[i])
        class_index = categories[i].argmax()
        plt.xlabel(class_names[class_index])
    fig.savefig(file_name)
# Set CPU as only available physical device (avoid GPU)
print("Set CPU as only available device")
my_devices = tf.config.list_physical_devices(device_type='CPU')
tf.config.set_visible_devices(devices= my_devices)
# CIFAR-10 Dataset
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(class_names)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train / 255.0
y_train = to_categorical(y_train, num_classes)
x_test = x_test / 255.0
y_test = to_categorical(y_test, num_classes)
# The above code first downloads the dataset. The included preprocessing rescales the images into the range
# [0, 1] and converts each label from its class index (an integer from 0 to 9) to a one-hot encoded
# categorical vector.
# Show the first images of the training set
visualize_data(x_train, y_train, class_names, 'train_samples.png')
# Specification of the augmentation parameters:
# width shift: randomly shift the image left or right by up to 3 pixels (3/32 of the width)
# height shift: randomly shift the image up or down by up to 3 pixels (3/32 of the height)
# horizontal flip: randomly flip the image horizontally
width_shift = 3/32
height_shift = 3/32
flip = True
datagen = ImageDataGenerator(
horizontal_flip=flip,
width_shift_range=width_shift,
height_shift_range=height_shift,
)
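# fit() computes dataset statistics; it is only required for featurewise_center,
# featurewise_std_normalization or zca_whitening, so it does not change the output here.
datagen.fit(x_train)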
# output directory
path = "./augmented_data"
try:
    os.mkdir(path)
except OSError:
    print("Creation of the directory %s failed" % path)
else:
    print("Successfully created the directory %s" % path)
my_batch_size = 32
it = datagen.flow(x_train, y_train, shuffle=False, batch_size=my_batch_size,
                  save_to_dir=path, save_prefix='artemisa_ex2')
# Each call to next(it) returns one augmented batch and saves its images to path.
# To create more batches of images, call it repeatedly:
#   repetitions = 100  # num_images_generated = my_batch_size * repetitions
#   for _ in range(repetitions):
#       batch_images, batch_labels = next(it)
batch_images, batch_labels = next(it)
# Show samples augmented data
visualize_data(batch_images, batch_labels, class_names, 'augmented_samples.png')
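The preprocessing described in the script's comments can be reproduced without TensorFlow. This NumPy sketch (preprocess is a hypothetical helper) rescales the pixels to [0, 1] and builds one-hot labels by indexing an identity matrix, which yields the same vectors as to_categorical; unlike the script, it casts to float32 before dividing. The random data only stands in for CIFAR-10.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Rescale uint8 pixels to [0, 1] and one-hot encode integer class labels.

    images: uint8 array of shape (N, 32, 32, 3)
    labels: integer array of shape (N, 1), as returned by cifar10.load_data()
    """
    x = images.astype(np.float32) / 255.0
    # Row i of the identity matrix is the one-hot vector for class i.
    y = np.eye(num_classes, dtype=np.float32)[labels.reshape(-1)]
    return x, y

# Tiny synthetic stand-in for CIFAR-10 (random pixels, made-up labels).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
fake_labels = np.array([[0], [3], [9], [3]])

x, y = preprocess(fake_images, fake_labels)
print(x.min() >= 0.0, x.max() <= 1.0)  # pixel range check
print(y.argmax(axis=1))                # recovers the class indices, as visualize_data does
```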
Now we can run it on the UI without the gpurun tool; the script itself restricts TensorFlow to the CPU.
(artemisa-tuto) $ python augment_data_cpu.py
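The script hides the GPUs through tf.config.set_visible_devices. A common alternative, sketched below as an assumption rather than the tutorial's method, is to hide the GPUs from CUDA before the process starts by setting the CUDA_VISIBLE_DEVICES environment variable to "-1". The helper names cpu_only_env and run_on_cpu are hypothetical.

```python
import os
import subprocess
import sys

def cpu_only_env(base_env=None):
    """Return a copy of an environment in which CUDA exposes no GPUs.

    CUDA_VISIBLE_DEVICES="-1" hides every GPU from CUDA-based libraries such
    as TensorFlow, provided it is set before they initialise.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env

def run_on_cpu(script="augment_data_cpu.py"):
    """Run a Python script in a child process that cannot see any GPU."""
    return subprocess.run([sys.executable, script], env=cpu_only_env(), check=True)
```

Calling run_on_cpu() from the tutorial directory would launch the same script; because the variable is set in a fresh environment copy, the calling process is unaffected.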
The code above implements data augmentation on the CIFAR-10 training set. With the settings above, the single call to next(it) produces one augmented batch of 32 images, which are also saved as PNG files in the augmented_data directory.
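The commented formula num_images_generated = my_batch_size * repetitions holds only while every batch is full. The sketch below uses plain NumPy and a hypothetical iterate_batches helper that mimics the order of flow(..., shuffle=False) but applies no augmentation. For the CIFAR-10 training set, 50,000 images in batches of 32 need 1,563 batches, and the last one holds only 16 images.

```python
import math
import numpy as np

def iterate_batches(x, y, batch_size=32):
    """Yield (images, labels) batches in order, without shuffling.

    The last batch is smaller when len(x) is not a multiple of batch_size.
    """
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

n_train, batch_size = 50_000, 32
print(math.ceil(n_train / batch_size))  # batches needed to cover the set once
print(n_train % batch_size)             # size of the final, partial batch

# Small synthetic check: 100 fake samples in batches of 32.
x = np.zeros((100, 32, 32, 3), dtype=np.float32)
y = np.zeros((100, 10), dtype=np.float32)
sizes = [len(xb) for xb, _ in iterate_batches(x, y, batch_size)]
print(sizes)
```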
The script also generates two figures: a sample of the original training images (train_samples.png) and the same images after augmentation (augmented_samples.png), illustrating the transformations performed.
Figure: original images (train_samples.png)
Figure: augmented images (augmented_samples.png)
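The transformations in the figures can be approximated in plain NumPy. The sketch below is a simplified stand-in, not ImageDataGenerator's implementation: it shifts by whole pixels in [-3, 3] (ImageDataGenerator samples continuous shifts of up to 3/32 of the image size) and fills uncovered pixels with the nearest edge value, similar in spirit to ImageDataGenerator's default fill_mode='nearest'. random_shift_flip is a hypothetical name.

```python
import numpy as np

def random_shift_flip(image, max_shift=3, flip=True, rng=None):
    """Randomly shift an (H, W, C) image and optionally flip it left-right.

    Simplified stand-in for the script's augmentation, NOT ImageDataGenerator:
    shifts are whole pixels in [-max_shift, max_shift] along each axis, and
    uncovered pixels take the value of the nearest edge pixel.
    """
    rng = np.random.default_rng() if rng is None else rng
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = image.shape[:2]
    m = max_shift
    # Pad every side by max_shift with edge values, then crop an H x W window
    # offset so that out[i, j] == image[i - dy, j - dx] inside the image.
    padded = np.pad(image, ((m, m), (m, m), (0, 0)), mode="edge")
    out = padded[m - dy:m - dy + h, m - dx:m - dx + w]
    if flip and rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    return out

# Usage on a batch of fake images with values in [0, 1].
rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32, 3))
augmented = np.stack([random_shift_flip(img, rng=rng) for img in batch])
print(augmented.shape)
```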
Execution in a Worker Node
Now we will run the augmentation process on a remote Worker Node. To do so, prepare the following HTCondor submit description file, augment_data_cpu.sub:
universe = vanilla
executable = augment_data_cpu.sh
arguments =
log = condor_logs/test.log
output = condor_logs/outfile.$(Cluster).$(Process).out
error = condor_logs/errors.$(Cluster).$(Process).err
# Needed to read .bashrc and conda environment
getenv = True
# TestJob CPU
+testJob = True
queue
Note the +testJob = True line, which requests exclusively CPU resources. The submit file also references the executable script augment_data_cpu.sh, which contains:
#!/bin/bash
EXERCISE_ENVIRONMENT="artemisa-tuto"
# Make the conda command available in this non-interactive shell
eval "$(conda shell.bash hook)"
conda activate "$EXERCISE_ENVIRONMENT"
python augment_data_cpu.py
Caution
Don’t forget to give the .sh file execute permission:
chmod +x augment_data_cpu.sh
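The chmod command above is all that is needed. If the job files are prepared from a Python script instead, the same permission change can be made with os.chmod; make_executable below is a hypothetical helper, not part of the tutorial.

```python
import os
import stat

def make_executable(path):
    """Add execute permission for user, group and others, roughly like `chmod +x`.

    Unlike `chmod +x`, this ignores the umask. Returns True if the current
    user can now execute the file.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)
```

For example, make_executable("augment_data_cpu.sh") turns a 0o644 file into 0o755.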
Since the job runs on a remote Worker Node, the same (conda) environment has to be set up there. The first lines of the script do this before invoking the Python interpreter.
Also create the directory for the HTCondor log, output and error files:
(artemisa-tuto) $ mkdir condor_logs
Finally, launch the job through HTCondor:
(artemisa-tuto) $ condor_submit augment_data_cpu.sub
Once the job has finished, the results can be checked in the augmented_data directory. The train_samples.png and augmented_samples.png figures illustrate the augmentation applied to some samples.
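To check the results programmatically, the sketch below lists the PNG files saved by datagen.flow. It assumes Keras' default naming for save_to_dir, which to our knowledge writes files named <save_prefix>_<index>_<random number>.png; list_augmented is a hypothetical helper.

```python
from pathlib import Path

def list_augmented(directory="augmented_data", prefix="artemisa_ex2"):
    """Return the sorted PNG files in directory whose names start with prefix + '_'."""
    return sorted(p for p in Path(directory).glob(f"{prefix}_*.png") if p.is_file())

# Only inspect the directory if the job has already created it.
if Path("augmented_data").is_dir():
    files = list_augmented()
    print(f"{len(files)} augmented images found")
```

With the script as written, one call to next(it) should save 32 files; files from earlier runs are not removed, so the count can be higher.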
Summary
Recap
- ARTEMISA also provides CPU-only resources.
- Add +testJob = True to the submit description file to use CPU-only slots.
- Use getenv = True to keep the current virtual environment when submitting with HTCondor.
- Don’t forget to activate the required virtual environment (i.e. conda activate) before submitting the job.