If you love the CLI, run a lot of commands, and write a lot of bash scripts, you
probably know the bash for loop. For example, to count from 1 to 10, we can write
for i in {1..10}; do echo $i; done
Or, if we have an array of items and want to iterate over it, we can write
items=(1 2 3 4 5)
for item in "${items[@]}"; do
  echo "$item"
done
Easy-breezy.
Even though a loop takes a little time to write, it still saves us from
repeating the same command by hand.
But our commands usually do not finish immediately. Sometimes they
take several minutes to complete, yet none of them depends on another. So we
want to run them in parallel, or concurrently.
For example, if we run this
items=({10..15})
for item in "${items[@]}"; do
  sleep $item
done
This takes about 75 seconds to complete (10 + 11 + 12 + 13 + 14 + 15), because
each sleep starts only after the previous one finishes. Can we achieve the same
result in less time? Of course we can, by adding & at the end of our command to
let each job run in the background.
items=({10..15})
for item in "${items[@]}"; do
  sleep $item &
done
[1] 9286
[2] 9287
[3] 9288
[4] 9289
[5] 9290
[6] 9291
But we lose control once we let the jobs go. The shell returns with a success status code immediately, even though our commands haven’t finished yet. To fix that, we can ask bash to wait for all the child processes to finish.
items=({10..15})
pids=()
for item in "${items[@]}"; do
  sleep $item &
  pids+=($!)  # remember the PID of the job we just started
done
for pid in "${pids[@]}"; do
  wait $pid
done
[1] 10348
[2] 10349
[3] 10350
[4] 10351
[5] 10352
[6] 10353
[1] Done sleep $item
[2] Done sleep $item
[3] Done sleep $item
[4] Done sleep $item
[5]- Done sleep $item
[6]+ Done sleep $item
Now it’s better: we can see the shell waiting for all jobs to finish.
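If we don't need the individual PIDs, bash's wait builtin can also be called with no arguments, in which case it waits for every background child of the current shell. Here is a minimal sketch of that variant, using shorter sleeps (an assumption made only so it finishes quickly):

```shell
#!/usr/bin/env bash
# Start three short sleeps in the background, then block until all finish.
start=$SECONDS            # SECONDS is bash's built-in elapsed-time counter
for item in {1..3}; do
  sleep "$item" &
done
wait                      # no arguments: wait for every background child
elapsed=$((SECONDS - start))
echo "all jobs finished after about ${elapsed}s"
```

The elapsed time should be close to the longest sleep (3 seconds) rather than the sum of all three (6 seconds).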
But sometimes we realize the command we have written is incorrect, and we want to
stop the jobs immediately. With the previous code, if we send a SIGINT by pressing
CTRL-C, it only tells wait to stop waiting instead of stopping the actual
jobs. The jobs are still running in the background.
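To actually stop the jobs with plain bash, we would have to trap the signal and kill the children ourselves. Below is a minimal sketch of that idea, meant to be saved and run as a script. The names run_all and cleanup are hypothetical helpers for illustration, not bash built-ins.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: run_all and cleanup are illustrative names only.
# Run this as a script: cleanup calls exit, which would close an
# interactive shell if the functions were defined there.

pids=()

# Kill every background job we started, then exit with status 130
# (the conventional exit status after CTRL-C).
cleanup() {
  kill "${pids[@]}" 2>/dev/null
  exit 130
}

# Run one background sleep per argument and wait for all of them.
run_all() {
  pids=()
  trap cleanup INT TERM
  for item in "$@"; do
    sleep "$item" &
    pids+=($!)
  done
  for pid in "${pids[@]}"; do
    wait "$pid"
  done
  trap - INT TERM   # restore default signal handling once all jobs are done
}

# Usage: run_all {10..15}, then press CTRL-C to stop every sleep at once.
```

That is a fair amount of bookkeeping for something as simple as "run these in parallel and let me cancel them".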
If you have run into these situations before, stop wasting your time writing
for loops in bash. Try GNU parallel instead.
parallel sleep ::: {10..15}
On the left-hand side of ::: is the command we want to run, and on the
right-hand side is our list of items.
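It takes only about 15 seconds to complete all the tasks, the duration of the longest sleep, as long as parallel can run all six jobs at once. By default it runs one job per CPU core, so on a machine with fewer cores, add -j 6 to raise the limit.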
Try adding the --bar option to the previous command, and we even get a
progress bar without having to implement it ourselves.
delays=({1..10})
parallel --bar sleep ::: ${delays[@]}
Now we just need to parallel every command, enjoy a cup of ☕, and let
parallel cook.
We might want to use the output of another command as arguments for parallel.
Instead of declaring an array, we can simply pipe that output to parallel.
echo '
1
2
3
5' | parallel echo this is
By default, parallel treats each full line as a single argument. If you have
a list separated by a different character, you can add the -d <delim> option.
One catch in this case: if your string ends with a newline, that newline
character is also included in the last item.
echo {1..5} | parallel -d' ' echo this is {}.
this is 1.
this is 2.
this is 3.
this is 4.
this is 5
.
So make sure your input is trimmed before piping it to parallel in
this case.
echo -n {1..5} | parallel -d' ' echo hello {} .
hello 1 .
hello 2 .
hello 3 .
hello 4 .
hello 5 .
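The -n flag only helps when the input comes from echo. For output from some other command, one option (sketched here, not the only way) relies on the fact that command substitution strips trailing newlines. The name trim_trailing_newlines is a hypothetical helper introduced for illustration.

```shell
#!/usr/bin/env bash
# trim_trailing_newlines is a hypothetical helper name.
# It reads stdin and prints it back without trailing newlines:
# $(...) drops them, and printf '%s' does not add one.
trim_trailing_newlines() {
  printf '%s' "$(cat)"
}

# echo {1..5} prints "1 2 3 4 5" plus a newline: 10 bytes.
raw_bytes=$(echo {1..5} | wc -c)
# After trimming, only the 9 bytes of "1 2 3 4 5" remain.
trimmed_bytes=$(echo {1..5} | trim_trailing_newlines | wc -c)
echo "raw: $((raw_bytes)) bytes, trimmed: $((trimmed_bytes)) bytes"

# Intended use:
#   echo {1..5} | trim_trailing_newlines | parallel -d' ' echo hello {} .
```

The byte counts make the stray newline visible before it ever reaches parallel.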
You might have noticed that I used {}. If we don’t want to place our item at
the end of the command but somewhere in the middle, followed by some other
options, how can we achieve that? The answer is the {} placeholder.
parallel echo This is {} item ::: {1..5}
But there is more: parallel also supports some useful replacement strings.
Remove the extension:
parallel echo {.} ::: A/B.C
Output
A/B
Remove the path:
parallel echo {/} ::: A/B.C
Output
B.C
Keep only the path:
parallel echo {//} ::: A/B.C
Output
A
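If you already know bash parameter expansion, these three replacement strings map roughly onto it. The sketch below reproduces the outputs above for the same example path; edge cases, such as a path with no slash or no extension, may behave differently in parallel.

```shell
#!/usr/bin/env bash
# Plain-bash approximations of the replacement strings, for comparison only.
path="A/B.C"
no_ext="${path%.*}"    # like {.}  : strip the shortest trailing ".something"
base="${path##*/}"     # like {/}  : strip everything up to the last slash
dir="${path%/*}"       # like {//} : strip the last slash and what follows
echo "$no_ext"   # prints A/B
echo "$base"     # prints B.C
echo "$dir"      # prints A
```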
You can explore more of parallel’s features in its tutorial:
https://www.gnu.org/software/parallel/parallel_tutorial.html
I hope you find this command useful. Let me know how you use it to replace
for loops.