Saturday, January 29, 2011

Parallel and Bar

I recently discovered two UNIX command line tools that are really useful. For both, I had been using simple homegrown solutions for a long time to get similar functionality. The first tool is GNU Parallel. As the name implies, it runs many commands in parallel. What's more, it works in a manner that is consistent with xargs. To get an idea of all the cool things you can do with it, I highly recommend looking through the examples in the man page.
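
To give a rough feel for the xargs-style interface, here is a minimal sketch. The sample inputs and the variable names (inputs, xargs_out) are made up for illustration, and the GNU Parallel invocation is left as a comment so the snippet runs even where Parallel is not installed.

```shell
# Three sample inputs, one per line (placeholder data for illustration).
inputs=$(printf '%s\n' alpha beta gamma)

# xargs: run up to 2 jobs at once, one argument per invocation.
# -P is a GNU/BSD extension, not part of POSIX xargs.
xargs_out=$(printf '%s\n' "$inputs" | xargs -P 2 -n 1 echo processed | sort)

# GNU Parallel takes arguments from stdin in the same way; -j sets the
# number of simultaneous jobs:
#   printf '%s\n' "$inputs" | parallel -j 2 echo processed

# Output order is not deterministic with parallel jobs, hence the sort.
printf '%s\n' "$xargs_out"
```

In both cases each input line becomes the final argument of one echo invocation.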

Before finding GNU Parallel, I used a simple shell script to accomplish similar tasks. Of course, my version is much more limited in what it can do: it reads one command per line from stdin and runs it in the background. The number of parallel jobs is controlled by using the bash jobs builtin to determine how many tasks are already running. The complete script is shown below.
#!/bin/bash

# Print usage
function usage {
    local prg=$(basename "$0")
    cat <<USAGE

Usage: $prg [options]

    Options:
    -h                   Print this help message
    -v                   Be verbose with log messages
    -n <tasks>           The number of tasks to run at a time.

USAGE
    exit 1
}

# Check arguments
numProcs=4
verbose="false"
while getopts "n:vh" option; do
    case $option in
        (n) numProcs="$OPTARG" ;;
        (v) verbose="true" ;;
        (h) usage ;;
    esac
done
shift $(($OPTIND-1))

# Read commands from stdin
# (IFS= and -r keep leading whitespace and backslashes in each command intact)
while IFS= read -r line; do

    # Can we run another?
    while [ "$(jobs -r | wc -l)" -ge "$numProcs" ]; do
        sleep 1
    done

    # Run task, logging it first when -v was given
    if [ "$verbose" = "true" ]; then
        echo "Running: $line" >&2
    fi
    sh -c "$line" &
done

# Wait for jobs to finish
wait
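
The polling loop in the script sleeps for a full second whenever the pool is full. With bash 4.3 or newer, the wait -n builtin blocks until any one background job exits, which avoids the delay. The following is only a sketch of that variation, not the script I actually used; run_pool, max_jobs and pool_out are hypothetical names, and it assumes bash rather than a plain POSIX sh.

```shell
#!/bin/bash
# Sketch: run commands read from stdin, at most max_jobs at a time,
# using `wait -n` (bash 4.3+) instead of a sleep-based polling loop.
# run_pool is a hypothetical name chosen for this example.
run_pool() {
    local max_jobs="$1" line
    while IFS= read -r line; do
        # Pool full: block until any one running job exits.
        while [ "$(jobs -r | wc -l)" -ge "$max_jobs" ]; do
            wait -n
        done
        # Detach stdin so the task cannot consume the remaining commands.
        sh -c "$line" </dev/null &
    done
    # Wait for the jobs that are still running.
    wait
}

# Example: three short tasks, two at a time; sorted because the
# completion order of parallel jobs is not deterministic.
pool_out=$(printf '%s\n' 'echo one' 'echo two' 'echo three' | run_pool 2 | sort)
printf '%s\n' "$pool_out"
```

Redirecting each task's stdin from /dev/null is a design choice of this sketch: without it, a task that reads stdin could swallow the rest of the command list.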
The second tool is bar, the command line progress bar. It writes progress information to stderr while copying data from stdin to stdout. If you ever need to copy a big file and want some basic stats so you can see that things are still working, this is the tool for you. The output and options are certainly much nicer than the crude C program I was using:
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <unistd.h>

#define PAGESIZE 4096
#define STDIN    0
#define STDOUT   1

int
main() {
    char buffer[PAGESIZE];
    ssize_t length;
    int counter = 0;  /* number of reads completed so far */
    uint64_t bytesWritten = 0;

    while ((length = read(STDIN, buffer, PAGESIZE)) > 0) {
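        /* write() may accept fewer bytes than requested, so loop until done */
        ssize_t offset = 0;
        while (offset < length) {
            ssize_t n = write(STDOUT, buffer + offset, length - offset);
            if (n < 0) {
                perror("write");
                return EXIT_FAILURE;
            }
            offset += n;
        }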
        bytesWritten += length;
        ++counter;
        if (counter % 10 == 0) {
            fprintf(stderr, "%"PRIu64" bytes\r", bytesWritten);
        }
    }
    fprintf(stderr, "%"PRIu64" bytes\n", bytesWritten);

    return EXIT_SUCCESS;
}

2 comments:

  1. How does bar compare to pv?
    http://www.ivarch.com/programs/pv.shtml

  2. I haven't used pv before, but at first glance there appears to be a lot of overlap. To my knowledge, bar doesn't support multiple instances working in tandem, which seems quite useful.
