Monday, February 21, 2011

zfcat

While working with Java, I have frequently wanted an easy way to cat a file inside a zip archive such as a jar, war, ear, or whatever other name has been cooked up for a zip file with a manifest. For example, when setting up Maven to generate an OSGi bundle, I want to build the jar file and then check that the generated manifest is correct. There is probably an existing command line tool that does this, but I couldn't find one with the little bit of searching I tried, and such a tool is trivial to write. So I wrote a quick "zip file cat" tool called zfcat in Scala:
#!/bin/sh
exec scala "$0" "$@"
!#

import java.io._
import java.util.zip._

object zfcat {
    def copy(in: InputStream, out: OutputStream): Unit = {
        val buffer = new Array[Byte](4096)
        var length = in.read(buffer)
        while (length != -1) {
            out.write(buffer, 0, length)
            length = in.read(buffer)
        }
    }

    def main(args: Array[String]): Unit = {
        if (args.length < 2) {
            // At least one file name is required, so it is not optional here.
            System.err.printf("Usage: zfcat <archive> <file> ...%n")
            sys.exit(1)
        }

        val zfile = new ZipFile(new File(args(0)))
        args.tail.foreach(file => {
            val entry = zfile.getEntry(file)
            if (entry != null) {
                val in = zfile.getInputStream(entry)
                copy(in, System.out)
                in.close()
            } else {
                System.err.printf("Warning: zfcat: %s: no such file%n", file)
            }
        })
    }
}

zfcat.main(args)
This tool worked OK, but it felt really slow compared to other command line tools. Java, and JVM languages such as Scala, seem to be a terrible choice for writing command line tools. The JVM is designed for long-running tasks: it pays a fixed startup cost on every launch, and for a command line tool that usually finishes almost immediately, that cost dominates the run time. So I wrote a new version of zfcat in Go:
package main

import (
    "archive/zip"
    "fmt"
    "io"
    "os"
)

// die prints err to stderr and exits if err is non-nil.
func die(err error) {
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error: %s\n", err)
        os.Exit(1)
    }
}

// copy streams r to w until EOF. It shadows the builtin copy, which this
// program does not otherwise use.
func copy(r io.Reader, w io.Writer) error {
    var buffer [4096]byte
    for {
        nr, er := r.Read(buffer[:])
        // A Read may return data together with an error, so write first.
        if nr > 0 {
            if _, ew := w.Write(buffer[:nr]); ew != nil {
                return ew
            }
        }
        if er == io.EOF {
            return nil
        }
        if er != nil {
            return er
        }
    }
}

func main() {
    if len(os.Args) < 3 {
        fmt.Fprintf(os.Stderr, "Usage: %s <archive> <file> ...\n", os.Args[0])
        os.Exit(1)
    }

    reader, err := zip.OpenReader(os.Args[1])
    die(err)

    var zfiles = make(map[string] *zip.File)
    for i := range reader.File {
        file := reader.File[i]
        zfiles[file.FileHeader.Name] = file
    }

    files := os.Args[2:]
    for i := range files {
        file := zfiles[files[i]]
        if file != nil {
            r, err := file.Open()
            die(err)
            die(copy(r, os.Stdout))
            die(r.Close())
        } else {
            fmt.Fprintf(os.Stderr, "Warning: %s: %s: no such file\n",
                os.Args[0], files[i])
        }
    }
}
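The hand-written copy loop is not strictly needed, since the standard library's io.Copy does the same job. Here is a minimal sketch of that variant, written against current Go rather than the release used for this post. The names buildSampleZip and catEntry are hypothetical helpers invented for the example, and the archive is built in memory only so the sketch runs without a real jar on disk.

```go
package main

import (
    "archive/zip"
    "bytes"
    "fmt"
    "io"
    "os"
)

// buildSampleZip is a hypothetical helper that creates a one-entry archive
// in memory, standing in for a real jar file.
func buildSampleZip() ([]byte, error) {
    var buf bytes.Buffer
    zw := zip.NewWriter(&buf)
    w, err := zw.Create("META-INF/MANIFEST.MF")
    if err != nil {
        return nil, err
    }
    if _, err := io.WriteString(w, "Manifest-Version: 1.0\n"); err != nil {
        return nil, err
    }
    // Close writes the central directory; the archive is invalid without it.
    if err := zw.Close(); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}

// catEntry returns the contents of the named entry, using io.Copy in place
// of a hand-written read/write loop.
func catEntry(archive []byte, name string) (string, error) {
    zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
    if err != nil {
        return "", err
    }
    for _, f := range zr.File {
        if f.Name != name {
            continue
        }
        rc, err := f.Open()
        if err != nil {
            return "", err
        }
        defer rc.Close()
        var out bytes.Buffer
        if _, err := io.Copy(&out, rc); err != nil {
            return "", err
        }
        return out.String(), nil
    }
    return "", fmt.Errorf("%s: no such file", name)
}

func main() {
    archive, err := buildSampleZip()
    if err != nil {
        fmt.Fprintln(os.Stderr, "Error:", err)
        os.Exit(1)
    }
    text, err := catEntry(archive, "META-INF/MANIFEST.MF")
    if err != nil {
        fmt.Fprintln(os.Stderr, "Error:", err)
        os.Exit(1)
    }
    fmt.Print(text)
}
```

Unlike zfcat, this sketch scans the entry list once per lookup instead of building a map first, which is fine for one or two names but would be wasteful for many.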
The Go version of zfcat is fast for short-lived jobs and seems to work great. I compared the times for three versions: (1) Scala run as a script, (2) Scala pre-compiled, and (3) the compiled Go version. The times to cat a small manifest file:
$ time ./zfcat.scala test.jar META-INF/MANIFEST.MF
Manifest-Version: 1.0
Created-By: 1.6.0_22 (Apple Inc.)


real    0m1.588s
user    0m1.076s
sys     0m0.083s
$ time scala zfcat test.jar META-INF/MANIFEST.MF
Manifest-Version: 1.0
Created-By: 1.6.0_22 (Apple Inc.)


real    0m0.635s
user    0m0.822s
sys     0m0.070s
$ time ./zfcat test.jar META-INF/MANIFEST.MF
Manifest-Version: 1.0
Created-By: 1.6.0_22 (Apple Inc.)


real    0m0.028s
user    0m0.003s
sys     0m0.018s
Pre-compiling makes the Scala version about 2.5 times faster (1.588s down to 0.635s), but it still takes over half a second. The Go version takes less than 30ms, more than 20 times faster again. I'm not sure I like Go's use of error return values and having to check them explicitly all the time. The decision to limit the use of exceptions as a control structure was deliberate, but I haven't bothered to read through the designers' arguments for it, and for now it is a minor annoyance. I was also a little disappointed that I couldn't run valgrind: apparently the 6g compiler generates an executable that valgrind doesn't understand. Valgrind might work with executables generated by gccgo, but I didn't try it.
$ valgrind ./zfcat test.jar META-INF/MANIFEST.MF
bad executable (__PAGEZERO is not 4 GB)
valgrind: ./zfcat: cannot execute binary file
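
Coming back to the explicit error checks: one common way to keep them from spreading through main is to put the work in a function that returns an error and to check that result in a single place. The sketch below is only an illustration of that pattern in current Go, not a rewrite of zfcat. The names run, find, and errUsage are hypothetical, and main passes a deliberately incomplete command line so the single check has something to report.

```go
package main

import (
    "archive/zip"
    "errors"
    "fmt"
    "io"
    "os"
)

// errUsage is a hypothetical sentinel for an incomplete command line.
var errUsage = errors.New("usage: zfcat <archive> <file> ...")

// find returns the entry with the given name, or nil if there is none.
func find(files []*zip.File, name string) *zip.File {
    for _, f := range files {
        if f.Name == name {
            return f
        }
    }
    return nil
}

// run does all the work and hands the first failure back to its caller,
// adding context with %w instead of exiting on the spot.
func run(args []string) error {
    if len(args) < 3 {
        return errUsage
    }
    zr, err := zip.OpenReader(args[1])
    if err != nil {
        return fmt.Errorf("open %s: %w", args[1], err)
    }
    defer zr.Close()

    for _, name := range args[2:] {
        f := find(zr.File, name)
        if f == nil {
            // A missing entry is only a warning, as in zfcat.
            fmt.Fprintf(os.Stderr, "Warning: %s: no such file\n", name)
            continue
        }
        rc, err := f.Open()
        if err != nil {
            return fmt.Errorf("open %s: %w", name, err)
        }
        _, err = io.Copy(os.Stdout, rc)
        rc.Close()
        if err != nil {
            return fmt.Errorf("copy %s: %w", name, err)
        }
    }
    return nil
}

func main() {
    // The only place an error is reported. The argument list here is
    // deliberately too short, so this prints the usage error.
    if err := run([]string{"zfcat"}); err != nil {
        fmt.Println("Error:", err)
    }
}
```

In a real tool, main would pass os.Args to run and call os.Exit(1) when it returns an error. The checks are still explicit, but the exit logic lives in one spot and the deferred Close always runs.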
