9. Simple parallelization: farm task and GNU parallel or xargs
Sometimes you do not need to actually parallelize your code, but only to
run it with many parameter combinations. Let’s assume that we have a task
that depends on one parameter and can be executed independently of the
other parameter values. It can be a very complex program, but for now it
will be just a few simple bash instructions that print a value. Save the
following code in a bash script (like script.sh) that will use the stress
command to stress a single core:
# file: script.sh
echo "First arg: ${1}"
stress -t 10 -c 1 # stress one core
echo "Stress test done"
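When it is executed, it prints its first argument and then stresses one core for 10 seconds (the commands below assume the script was saved as codes/script.sh):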
bash codes/script.sh 23
What if we want to execute this task for 4 different arguments? We can just do it sequentially:
date +"%H-%M-%S"
bash codes/script.sh 23
bash codes/script.sh 42
bash codes/script.sh 10
bash codes/script.sh 57
date +"%H-%M-%S"
This takes 40 seconds in total. Remember that this example is very
simple, but assume that the script is a very large task. Then the
sequential run will take four times as long as a single task. What if we
have a machine with four available threads? It would be useful to run
all the commands in parallel. To do so, you might just put them in the
background with the & character at the end. But what will happen if you
need to run 7 different arguments and you have only 4 threads? Then it
would not be optimal to have all of them running at the same time, each
with less than 100% of CPU usage. It would be better to run 4 of them
and, when one of them finishes, launch the next one, and so on. To do
this programmatically, you can use GNU parallel,
https://www.gnu.org/software/parallel/ (check the tutorial in the
documentation section, or the cheatsheet,
https://www.gnu.org/software/parallel/parallel_cheat.pdf). You can
install it with spack install parallel, or load it with spack load
parallel if it is already installed. For our case, it is very useful:
date +"%H-%M-%S"
parallel 'bash codes/script.sh {} ' ::: 23 42 10 57
date +"%H-%M-%S"
08-12-25
First arg: 23
stress: info: [83775] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
First arg: 42
stress: info: [83779] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
First arg: 10
stress: info: [83781] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
First arg: 57
stress: info: [83785] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
08-12-36
Around 10 seconds now! GNU parallel detects the number of cores and launches the processes accordingly, taking care of the job distribution. Read the manual for the many options of this powerful tool, which is used even on large clusters. For instance, try to run 7 processes:
date +"%H-%M-%S"
parallel 'bash codes/script.sh {} ' ::: 23 42 10 57 21 8 83
date +"%H-%M-%S"
08-13-20
First arg: 23
stress: info: [84082] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
First arg: 42
stress: info: [84086] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
First arg: 10
stress: info: [84088] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
First arg: 57
stress: info: [84091] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
First arg: 21
stress: info: [84161] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
First arg: 8
stress: info: [84165] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
First arg: 83
stress: info: [84168] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
Stress test done
08-13-41
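The timestamps of the 7-job run are about 21 seconds apart, which is consistent with roughly two rounds of 10-second tasks. For comparison, here is a minimal sketch of what limiting the number of background jobs by hand, with & and wait, could look like. It is only an illustration under some assumptions: the task function is a hypothetical stand-in for codes/script.sh that sleeps for 1 second instead of calling stress, and max_jobs=4 is an assumed thread count. Note that this simple version waits for a whole batch to finish before launching the next one, while GNU parallel starts a new job as soon as any slot becomes free.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for codes/script.sh: sleep 1 replaces the 10 s stress call.
task() {
    echo "First arg: ${1}"
    sleep 1
    echo "Task ${1} done"
}

max_jobs=4    # assumed number of available threads
running=0
start=$(date +%s)

for arg in 23 42 10 57 21 8 83; do
    task "$arg" &                      # launch the task in the background
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
        wait                           # block until the whole batch has finished
        running=0
    fi
done
wait                                   # wait for the last, partial batch

elapsed=$(( $(date +%s) - start ))
echo "Elapsed: ${elapsed} s"
```

With 1-second tasks this should take roughly 2 seconds (one batch of 4, then one batch of 3) instead of the 7 seconds of a sequential run. If one task in a batch is much slower than the others, the remaining slots stay idle until it finishes, which is exactly the bookkeeping that GNU parallel handles for you.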
You can play with the -j n flag to control how many jobs parallel runs
at the same time. By default it uses all available threads.
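The section title also mentions xargs, which can run a similar task farm. The sketch below is a hedged illustration, not the only way to do it: it assumes a GNU or BSD xargs (the -P flag is an extension and is not part of POSIX), and it uses a 1-second sleep as a stand-in for bash codes/script.sh so that it runs without stress installed. The variable names are made up for this example. The last part repeats the farm with parallel -j 4, and is skipped if GNU parallel is not found.

```shell
#!/usr/bin/env bash
# Sketch: a 7-task farm limited to 4 simultaneous jobs.
# Assumption: each task is a 1 s sleep standing in for "bash codes/script.sh ARG".

args="23 42 10 57 21 8 83"

# xargs: -n 1 passes one argument per command, -P 4 runs up to 4 commands at once.
# The "_" fills $0 of the inner shell, so the argument arrives as $1.
start=$(date +%s)
xargs_output=$(printf '%s\n' $args |
    xargs -n 1 -P 4 sh -c 'echo "First arg: $1"; sleep 1' _)
xargs_elapsed=$(( $(date +%s) - start ))
echo "$xargs_output"
echo "xargs elapsed: ${xargs_elapsed} s"

# GNU parallel: -j 4 caps the number of simultaneous jobs at 4.
# Skipped when GNU parallel is not installed.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    parallel -j 4 'echo "First arg: {}"; sleep 1' ::: $args
fi
```

For the real script, the corresponding commands would be along the lines of xargs -n 1 -P 4 bash codes/script.sh (reading the arguments from standard input) and parallel -j 4 'bash codes/script.sh {}' ::: 23 42 10 57 21 8 83. Unlike GNU parallel, xargs does not group the output of each job, so lines from different jobs may appear interleaved.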