02-CPU jobs
ARTEMISA has some resources reserved for CPU-only jobs.
Modern frameworks like TensorFlow or Keras allow the creation of data pipelines that combine parallel processing on the CPU (e.g. data augmentation) and on the GPU (e.g. training). Some tasks, such as data preparation and augmentation, can also be run on their own as CPU-only jobs and parallelized across the available CPUs, which speeds up the process.
We propose a basic example of data preparation for an image classifier that requests only CPU resources. The target dataset is the more challenging CIFAR-10 image dataset.
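To illustrate why this kind of work scales with the number of CPUs, here is a minimal sketch that is not part of the tutorial script. It splits a list of images into chunks and flips each chunk in a separate process. The helper names (flip_horizontal, flip_chunk, split_chunks, flip_all_parallel) are hypothetical, and images are plain nested lists so that the example needs only the standard library.

```python
from concurrent.futures import ProcessPoolExecutor
import os


def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def flip_chunk(chunk):
    """Flip every image in one chunk; this is the unit of work per process."""
    return [flip_horizontal(image) for image in chunk]


def split_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal parts."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional item
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def flip_all_parallel(images, workers=None):
    """Flip all images, processing one chunk per worker process."""
    workers = workers or os.cpu_count() or 1
    chunks = split_chunks(images, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(flip_chunk, chunks))
    # Flatten the per-chunk results back into one list, preserving order
    return [image for chunk in results for image in chunk]
```

In a real script, flip_all_parallel should be called from inside an `if __name__ == "__main__":` guard. On a batch node the job may be granted fewer cores than os.cpu_count() reports, so passing workers explicitly is probably the safer choice.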
Local CPU execution
First, activate the conda environment created in the first tutorial:
$ conda activate artuto
We will need the following packages:
(artuto) $ pip install tensorflow-datasets scipy
We are going to run the following Python code, saved as augment_data_cpu.py:
#!/usr/bin/env python3
# https://stepup.ai/train_data_augmentation_keras/
import os
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
# Helper function to inspect the first images in a dataset
def visualize_data(images, categories, class_names, file_name):
    fig = plt.figure(figsize=(14, 6))
    fig.patch.set_facecolor('white')
    for i in range(3 * 7):
        plt.subplot(3, 7, i+1)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(images[i])
        class_index = categories[i].argmax()
        plt.xlabel(class_names[class_index])
    fig.savefig(file_name)
# Hide all GPUs from TensorFlow so that the script runs on the CPU only
os.environ['CUDA_VISIBLE_DEVICES'] = "-1"
# CIFAR-10 Dataset
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(class_names)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train / 255.0
y_train = to_categorical(y_train, num_classes)
x_test = x_test / 255.0
y_test = to_categorical(y_test, num_classes)
# The code above first downloads the dataset. The preprocessing then rescales the images to the range
# [0, 1] and converts each label from a class index (an integer from 0 to 9) to a one-hot encoded
# categorical vector.
# Show the first images of the training set
visualize_data(x_train, y_train, class_names, 'train_samples.png')
# Specification of the augmentation parameters:
# width shift: Randomly shift the image left or right by up to 3 pixels (3/32 of the width)
# height shift: Randomly shift the image up or down by up to 3 pixels (3/32 of the height)
# horizontal flip: Randomly flip the image horizontally.
width_shift = 3/32
height_shift = 3/32
flip = True
datagen = ImageDataGenerator(
    horizontal_flip=flip,
    width_shift_range=width_shift,
    height_shift_range=height_shift,
)
datagen.fit(x_train)
# output directory
path = "./augmented_data"
try:
    os.mkdir(path)
except OSError:
    print("Creation of the directory %s failed" % path)
else:
    print("Successfully created the directory %s " % path)
my_batch_size = 32
it = datagen.flow(x_train, y_train, shuffle=False, batch_size=my_batch_size,
                  save_to_dir=path, save_prefix='artemisa_ex2')
# If we want to iterate and create more batches of images
# num_images_generated = batch_size * repetitions
# repetitions = 100
# for x in range(repetitions):
batch_images, batch_labels = next(it)
# Show samples of the augmented data
visualize_data(batch_images, batch_labels, class_names, 'augmented_samples.png')
Now we can run it on the UI without the gpurun tool, so that only the CPU is used:
(artuto) $ python augment_data_cpu.py
The code above implements data augmentation on the CIFAR-10 dataset, which consists of 32x32 pixel images in 10 classes. The data is split into 50k training images and 10k test images.
The data generated can be found under the created directory ./augmented_data/.
The script also generates two figures with sets of samples: the augmented and corresponding original images,
illustrating the transformations.
Original images
Augmented set
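The two kinds of transformation can be sketched on a single row of pixels. This is a simplified illustration with hypothetical helper names, not the ImageDataGenerator implementation: the generator draws the shift at random within the given range, whereas the sketch applies a fixed shift and repeats the edge value to fill the gap. A shift range of 3/32 on a 32-pixel image corresponds to at most 3 pixels.

```python
def max_shift_pixels(shift_fraction, size):
    """Convert a fractional shift range into a maximum shift in pixels."""
    return round(shift_fraction * size)


def shift_row(row, pixels):
    """Shift a row right (pixels > 0) or left (pixels < 0), repeating the edge value."""
    # Clamp the shift so it never exceeds the row length
    pixels = max(-len(row), min(len(row), pixels))
    if pixels == 0:
        return list(row)
    if pixels > 0:
        return [row[0]] * pixels + list(row[:-pixels])
    return list(row[-pixels:]) + [row[-1]] * (-pixels)


def flip_row(row):
    """Mirror a row left to right."""
    return list(row[::-1])


row = [10, 20, 30, 40, 50]
print(max_shift_pixels(3 / 32, 32))  # 3
print(shift_row(row, 2))             # [10, 10, 10, 20, 30]
print(shift_row(row, -2))            # [30, 40, 50, 50, 50]
print(flip_row(row))                 # [50, 40, 30, 20, 10]
```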
Execution in a Worker Node
Now we will run our augmentation process on a remote Worker Node. We will make use of the following
submit description file, augment_data_cpu.sub:
universe = vanilla
executable = augment_data_cpu.py
log = condor_logs/test.log
output = condor_logs/outfile.$(Cluster).$(Process).out
error = condor_logs/errors.$(Cluster).$(Process).err
getenv = True
queue
The executable referenced in the .sub file is the augment_data_cpu.py script shown above.
Caution
Don't forget to give execute permission to the file referenced by executable:
chmod +x augment_data_cpu.py
Then prepare the directory for the output files:
(artuto) $ mkdir condor_logs
Finally launch the job through HTCondor.
(artuto) $ condor_submit augment_data_cpu.sub
Submitting job(s).
1 job(s) submitted to cluster 904340.
You can check the status of this and the rest of your launched jobs with condor_q:
(artuto) [artemisa_user@mlui01 02_cpu]$ condor_q
-- Schedd: ----.----.--.-- : <---.---.---.---:----?... @ 07/23/25 14:43:35
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
user ID: 904340 7/23 14:43 _ 1 _ 1 904340.0
Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for user: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 79 jobs; 0 completed, 0 removed, 30 idle, 32 running, 17 held, 0 suspended
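When many jobs are in flight it can be handy to read these totals from a script. The following is a minimal sketch with a hypothetical helper, parse_condor_summary, that extracts the counts from one "Total for ..." line. It assumes the line format shown in the output above; other HTCondor versions or configurations may print it differently.

```python
import re

# Job states as they appear in the summary lines shown above
STATES = ("completed", "removed", "idle", "running", "held", "suspended")


def parse_condor_summary(line):
    """Return the job counts from a 'Total for ...' line of condor_q output."""
    match = re.search(r"(\d+) jobs;", line)
    if match is None:
        raise ValueError("not a condor_q summary line: %r" % line)
    counts = {"total": int(match.group(1))}
    pattern = r"(\d+) (%s)" % "|".join(STATES)
    for number, state in re.findall(pattern, line):
        counts[state] = int(number)
    return counts


summary = ("Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, "
           "1 running, 0 held, 0 suspended")
print(parse_condor_summary(summary)["running"])  # 1
```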
Again, it is possible to check the results generated in the ./augmented_data/ directory, along with train_samples.png
and augmented_samples.png figures.
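A quick way to check the results is to count the generated files and compare the number with batch_size * repetitions. The sketch below uses two hypothetical helpers and assumes only that the saved files start with the save_prefix passed to datagen.flow. The script as given draws a single batch of 32 images. Running it again may add further files to the same directory, so the count can exceed the figure for one run.

```python
from pathlib import Path


def count_generated(directory, prefix):
    """Count regular files in `directory` whose name starts with `prefix`."""
    return sum(
        1 for entry in Path(directory).iterdir()
        if entry.is_file() and entry.name.startswith(prefix)
    )


def expected_images(batch_size, repetitions):
    """Number of images produced by `repetitions` batches of `batch_size`."""
    return batch_size * repetitions


output_dir = Path("./augmented_data")
if output_dir.is_dir():
    found = count_generated(output_dir, "artemisa_ex2")
    print(f"{found} files found, {expected_images(32, 1)} expected from one run")
```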
Summary
Recap
ARTEMISA also provides CPU-only resources.
If we don't request a GPU explicitly, only CPU resources are used.
If getenv = True is set, the job executes with the same set of environment variables that the user had at submit time. Don't forget to activate the required virtual environment (i.e. conda activate) before submitting the job.
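Because the job inherits the submit-time environment, a small guard at the top of the job script can catch a forgotten activation early. This is a minimal sketch with hypothetical helper names. It assumes that conda exports the name of the active environment in the CONDA_DEFAULT_ENV variable, and uses the artuto environment from this tutorial as the expected value.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def check_env(expected, environ=None):
    """Raise an error unless the expected conda environment is active."""
    active = active_conda_env(environ)
    if active != expected:
        raise RuntimeError(
            "expected conda environment %r but found %r; "
            "activate it before running condor_submit" % (expected, active)
        )
    return active


# Example with an explicit mapping instead of the real environment
print(check_env("artuto", {"CONDA_DEFAULT_ENV": "artuto"}))  # artuto
```

Calling check_env("artuto") without the second argument inspects the real environment of the running job.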