06-Natural Language Processing
This tutorial shows how to run Natural Language Processing (NLP) tasks on ARTEMISA. It focuses on the operational side and is heavily based on this tutorial from Hugging Face, which you can consult if you need more material.
Setup
First, make sure your conda environment is activated:
$ conda activate artemisa-tuto
Now, install transformers together with its sentencepiece extra:
(artemisa-tuto) $ pip install "transformers[sentencepiece]"
Pipelines
Pipelines allow us to connect a model with its necessary preprocessing and
postprocessing steps, enabling us to directly input any text and get an intelligible answer.
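To make the three stages concrete, here is a toy sketch in plain Python. It is not how transformers is implemented: preprocess, toy_model, toy_pipeline and the word lists are hypothetical stand-ins. Only the general shape is meant to carry over: text becomes tokens, tokens become raw scores, and raw scores become a label and a probability through a softmax.

```python
import math

# Hypothetical label mapping; a real model carries its own mapping.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def preprocess(text):
    """Toy stand-in for a tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def toy_model(tokens):
    """Toy stand-in for a neural network: one raw score (logit) per label."""
    negative_words = {"hate", "bad", "awful"}
    positive_words = {"love", "great", "good"}
    neg = sum(tok.strip(".,!?") in negative_words for tok in tokens)
    pos = sum(tok.strip(".,!?") in positive_words for tok in tokens)
    return [float(neg), float(pos)]


def postprocess(logits):
    """Turn raw scores into a label and a probability with a softmax."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three stages, as a pipeline object does for us."""
    return postprocess(toy_model(preprocess(text)))


print(toy_pipeline("I hate this so much!"))
```

The returned dictionary has the same 'label' and 'score' keys as the real sentiment-analysis results shown in this tutorial, which is why the output of a pipeline can be read without knowing anything about the model behind it.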
Here is a simple script, simple_pipeline.py:
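#!/usr/bin/env python
from transformers import pipeline

# Build a sentiment-analysis pipeline; with no model specified, a default one is used
classifier = pipeline("sentiment-analysis")

# A single string returns a list with one result
res1 = classifier("I've been waiting for a HuggingFace course my whole life.")

# A list of strings returns one result per input, in the same order
res2 = classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

print("Result 1:", res1)
print("Result 2:", res2)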
And the corresponding submission file, simple_pipeline.sub:
universe = vanilla
executable = simple_pipeline.py
log = condor_logs/log.log
output = condor_logs/outfile.out
error = condor_logs/errors.err
notify_user = artemisa.user@ific.uv.es
notification = always
getenv = True
request_gpus = 1
queue
Caution
Because the Python file is used directly as the executable, it must be given execute permission:
chmod +x simple_pipeline.py
Now you can submit the job:
(artemisa-tuto) $ condor_submit simple_pipeline.sub
In the output file (condor_logs/outfile.out) we can see the results of the sentiment analysis:
Result 1: [{'label': 'POSITIVE', 'score': 0.9598046541213989}]
Result 2: [{'label': 'POSITIVE', 'score': 0.9598046541213989}, {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
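Each result is a dictionary, and the results come back in the same order as the inputs. Note that the score is the confidence in the predicted label, so a high score next to NEGATIVE means a confidently negative sentence. As a minimal sketch of how such output could be consumed, the snippet below pairs inputs with results and keeps the confident ones. The helper name confident and the 0.99 threshold are illustrative assumptions; the result values are copied from the output above.

```python
# Inputs and results copied from the job shown above.
texts = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
res2 = [
    {"label": "POSITIVE", "score": 0.9598046541213989},
    {"label": "NEGATIVE", "score": 0.9994558691978455},
]


def confident(texts, results, threshold=0.99):
    """Pair each input with its result; keep those scoring at least `threshold`."""
    return [
        (text, result["label"])
        for text, result in zip(texts, results)
        if result["score"] >= threshold
    ]


for text, label in confident(texts, res2):
    print(f"{label}: {text}")
```

With this threshold only the second sentence is kept, because the first one scores about 0.96.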
Another available pipeline is text generation. The process is analogous.
First, the Python script that performs the task, text_gen.py:
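#!/usr/bin/env python
from transformers import pipeline

# Build a text-generation pipeline; with no model specified, a default one is used
generator = pipeline("text-generation")

# Continue the prompt; the result is a list of dicts with a 'generated_text' key
res = generator("I hope ARTEMISA helps me in")
print(res)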
Give it execute permission with chmod +x text_gen.py
And the submission file, text_gen.sub:
universe = vanilla
executable = text_gen.py
log = condor_logs/log.log
output = condor_logs/outfile.out
error = condor_logs/errors.err
notify_user = artemisa.user@ific.uv.es
notification = always
getenv = True
request_gpus = 1
queue
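Both submission files in this tutorial point to the same output and error paths, so running both jobs can leave you with the results of only one of them. One possible way around this, sketched below, is to generate the submission file from a template and name the log files after the script. The render_submit helper and the per-job file names are assumptions made for this illustration, not part of the ARTEMISA setup; the remaining keys are copied from the files above.

```python
# Template mirroring the submission files above, with per-job file names.
SUBMIT_TEMPLATE = """\
universe = vanilla
executable = {job}.py
log = condor_logs/{job}.log
output = condor_logs/{job}.out
error = condor_logs/{job}.err
notify_user = artemisa.user@ific.uv.es
notification = always
getenv = True
request_gpus = 1
queue
"""


def render_submit(job):
    """Return the text of a submission file for the script <job>.py."""
    return SUBMIT_TEMPLATE.format(job=job)


# Print the file content; redirect it to text_gen.sub to use it.
print(render_submit("text_gen"))
```

The printed text can be saved as text_gen.sub and submitted with condor_submit in the same way as before.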
When the job has finished, enjoy the randomness of the outcome:
[{'generated_text': 'I hope ARTEMISA helps me in my career," he said.\n\nParsons said he is disappointed with Parma\'s decision to delay the transfer for two months. "I will never play in a club for two months in any other'}]
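The randomness most likely comes from sampling: the continuation is drawn word by word from a probability distribution rather than always taking the most probable word. The toy sketch below illustrates the idea in plain Python and shows how a fixed seed makes the result repeatable. The probabilities and the generate helper are made up for the illustration, and a real language model computes a new distribution at every step. For real runs, transformers provides a set_seed function that serves the same purpose.

```python
import random

# Hypothetical next-word probabilities; a real model recomputes them at every step.
NEXT_WORD_PROBS = {"my": 0.5, "the": 0.3, "this": 0.2}


def sample_next_word(rng):
    """Draw one word according to the probabilities above."""
    words = list(NEXT_WORD_PROBS)
    weights = list(NEXT_WORD_PROBS.values())
    return rng.choices(words, weights=weights, k=1)[0]


def generate(prompt, n_words, seed=None):
    """Append n_words sampled words to the prompt; a fixed seed repeats the draw."""
    rng = random.Random(seed)
    picked = [sample_next_word(rng) for _ in range(n_words)]
    return prompt + " " + " ".join(picked)


print(generate("I hope ARTEMISA helps me in", 3))          # may differ between runs
print(generate("I hope ARTEMISA helps me in", 3, seed=0))  # repeats on every run
```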