The Bash for loop is useful for cutting down repetitive tasks, but is it the best tool for the job?

If you love the CLI, run a lot of commands, and write a lot of bash scripts, you probably know the bash for loop. For example, to count from 1 to 10, we can write for i in {1..10}; do echo $i; done. Or, if we have an array of items and want to iterate over it, we can write:

items=(1 2 3 4 5)
for item in "${items[@]}"; do
    echo "$item"
done

Easy-breezy.

Even though it takes a little time to write, it still saves us from repeating the same command by hand.
But our commands usually do not finish immediately. Sometimes they take several minutes to complete, yet none of them depends on another. So we want to run them in parallel, or concurrently.

For example, if we run this

items=({10..15})
for item in ${items[@]}; do
    sleep $item
done

This would take us more than 60 seconds to complete: the sleeps run one after another, so the total is 10 + 11 + 12 + 13 + 14 + 15 = 75 seconds. Can we get the same result in less time? Of course we can, by adding & at the end of our command so that each job runs in the background.

items=({10..15})
for item in ${items[@]}; do
    sleep $item &
done
[1] 9286
[2] 9287
[3] 9288
[4] 9289
[5] 9290
[6] 9291

But we lose control once we let the jobs go. The shell returns immediately with a success status code, even though our commands haven’t finished yet. To fix that, we can ask bash to wait for all the child processes to finish.

items=({10..15})
for item in ${items[@]}; do
    sleep $item &
    pids+=("$!")  # remember the PID of the job just started
done

for pid in ${pids[@]}; do
    wait $pid
done
[1] 10348
[2] 10349
[3] 10350
[4] 10351
[5] 10352
[6] 10353
[1]   Done                    sleep $item
[2]   Done                    sleep $item
[3]   Done                    sleep $item
[4]   Done                    sleep $item
[5]-  Done                    sleep $item
[6]+  Done                    sleep $item

Now it’s better, we can see the shell is waiting for all jobs to finish.
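
Waiting on each PID separately also lets us see which jobs failed, because wait returns the exit status of the job it waited for. The following is a minimal sketch, not taken from the commands above: job is a hypothetical stand-in for a real command, rigged so that item 3 fails.

```shell
#!/usr/bin/env bash
# Sketch: run jobs in the background and count how many of them failed.
# "job" is a hypothetical placeholder for a real command.
job() {
    sleep 1
    # Pretend that item 3 fails: the test below is false only for 3.
    [ "$1" -ne 3 ]
}

pids=()
for item in {1..5}; do
    job "$item" &
    pids+=("$!")
done

failed=0
for pid in "${pids[@]}"; do
    # wait PID returns the exit status of that particular job.
    if ! wait "$pid"; then
        failed=$((failed + 1))
    fi
done
echo "failed jobs: $failed"
```

With the rigged job above, this reports one failed job.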

But sometimes we realize that the command we have written is incorrect, and we want to stop the jobs immediately. With the previous code, sending a SIGINT by pressing CTRL-C only tells wait to stop waiting; it does not stop the actual jobs. The jobs are still running in the background.
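
If we stay with plain bash, one way to handle this is a trap that kills the recorded PIDs when the script is interrupted. This is only a sketch of that idea, assuming the loop lives in a script; cleanup is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch: kill the background jobs when the script receives SIGINT.
pids=()

cleanup() {
    # Send SIGTERM to every job we started; errors for jobs that
    # have already finished are silenced.
    kill "${pids[@]}" 2>/dev/null
}
# On CTRL-C, run cleanup and exit with 130, the conventional
# exit status for a process stopped by SIGINT.
trap 'cleanup; exit 130' INT

for item in 1 1 1; do
    sleep "$item" &
    pids+=("$!")
done

# Wait for all background jobs; a trapped signal interrupts this wait.
wait
```

It works, but it is one more piece of bookkeeping that every such loop has to carry.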

If you have run into these situations before, stop wasting your time writing for loops in bash. Try this instead.

parallel sleep ::: {10..15}

On the left-hand side of ::: is the command we want to run, and on the right-hand side is our list of items. It takes only about 15 seconds to complete all the tasks, provided parallel can run all six jobs at once (by default, the number of simultaneous jobs is tied to the number of CPU cores). Try adding the --bar option to the previous command, and we even get a progress bar without having to implement it ourselves.

delays=({1..10})
parallel --bar sleep ::: ${delays[@]}

Now we just need to run every command through parallel, enjoy a cup of ☕, and let parallel cook.
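
The 15-second figure depends on how many jobs parallel runs at once. The -j option (long form --jobs) sets that limit explicitly. Here is a small sketch, assuming GNU parallel is installed; run_limited is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU parallel is installed and on PATH.
# run_limited is a hypothetical helper, not part of parallel itself.
# Usage: run_limited MAX_JOBS DELAY [DELAY...]
run_limited() {
    local jobs=$1
    shift
    # -j N caps the number of jobs that run at the same time.
    parallel -j "$jobs" sleep ::: "$@"
}
```

For example, run_limited 2 {10..15} keeps at most two sleeps running at a time, so it takes noticeably longer than 15 seconds.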

We might want to feed parallel the output of another command. Instead of declaring an array, we can simply pipe that output to parallel.

echo '
1
2
3
5' | parallel  echo this is

By default, parallel treats each full line as a single argument. If your list is separated by a different character, you can add the -d <delim> option. One catch in this case is that if your string ends with a newline, that newline character becomes part of the last item.

echo {1..5} | parallel -d' ' echo this is {}.
this is 1.
this is 2.
this is 3.
this is 4.
this is 5
.

So make sure your input is trimmed before piping it to parallel in this case.

echo -n {1..5}  | parallel -d' ' echo hello {} .
hello 1 .
hello 2 .
hello 3 .
hello 4 .
hello 5 .
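
Another way to sidestep the stray newline, offered as an alternative sketch rather than a replacement for echo -n, is to emit one item per line with printf. The default line-based splitting then applies and -d is not needed. The items function is a hypothetical name.

```shell
#!/usr/bin/env bash
# printf reuses its format for every argument, so this prints
# one item per line, each terminated by exactly one newline.
items() {
    printf '%s\n' {1..5}
}

# With GNU parallel installed, the list can be piped straight in:
#   items | parallel echo hello {} .
items
```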

You might have noticed that I used {}. If we don’t want our item at the end of the command, but somewhere in the middle, followed by other options, how can we achieve that? The answer is the {} placeholder.

parallel echo This is {} item ::: {1..5}

But there is more: parallel also supports some useful replacement strings.

Remove the extension:

parallel echo {.} ::: A/B.C

Output

A/B

Remove the path:

parallel echo {/} ::: A/B.C

Output

B.C

Keep only the path:

parallel echo {//} ::: A/B.C

Output

A
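
If you are used to bash parameter expansion, these three replacement strings map roughly onto expansions you may already know. The sketch below is for comparison only, and the variable names are made up. Treat the mapping as approximate: edge cases, such as a path with no slash in it, may behave differently in parallel.

```shell
#!/usr/bin/env bash
# Rough bash counterparts of the replacement strings for A/B.C.
path='A/B.C'
without_ext=${path%.*}      # like {.}  : drop the shortest ".something" suffix
basename_only=${path##*/}   # like {/}  : drop everything up to the last slash
dir_only=${path%/*}         # like {//} : drop the last slash and what follows

printf '%s\n' "$without_ext" "$basename_only" "$dir_only"
```

For A/B.C this prints A/B, B.C and A, matching the three outputs above.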

There are more of parallel’s features to explore at https://www.gnu.org/software/parallel/parallel_tutorial.html . I hope you find this command useful; let me know how you use it to replace for loops.