Wednesday, December 22, 2010

Winter Solstice Lunar Eclipse

It was too cloudy where I was to get a decent view of the recent lunar eclipse. Luckily, William Castleman put up a nice time-lapse video of the event:

Winter Solstice Lunar Eclipse from William Castleman on Vimeo.

Monday, December 6, 2010

Cloudstock Review

I just got home from attending Cloudstock and thought I would write up a brief review. Since the event was free and sponsored by a large number of companies, I was a little concerned that it would be a bunch of companies hawking their crap. There was a fair amount of selling and overall the conference was light on technical information, but a few of the presentations were interesting nonetheless. A full list of sessions is available on the Cloudstock page, though I don't see any way to get access to the slides that were used. I think they were recording the sessions, so maybe they'll be made available at some point.

Building a Scalable Geospatial Database on top of Apache Cassandra - Mike Malone (SimpleGeo)
This talk will explore the real world technical challenges we overcame at SimpleGeo while building a spatial database on top of Apache Cassandra. Cassandra offers simple decentralized operations, no single point of failure, and near-linear horizontal scalability. But Cassandra fell far short of providing the sort of sophisticated spatial queries we need. Our challenge was to bridge that gap.
This was the most interesting talk I attended. The main part of the talk was on how to use a distributed hash table, in particular Cassandra, as a spatial database. The key problem is how to support the needed types of queries including:
  • Exact match: find a particular key
  • Range: find all keys in some interval
  • Proximity: find the nearest neighbors to a key
  • Misc others: reasonable expectation of being able to adapt to new use cases
A typical distributed hash table works well for exact match on a given key. However, it is not particularly well suited to the other use cases. Cassandra uses a partitioning scheme similar to Amazon's Dynamo with the keys positioned along a ring. Furthermore, the partition function can be customized so that the keys will be ordered. For SimpleGeo they used a partition function that provided a Z-order curve. This approach allowed for simple range queries and preserved locality for some points. They experienced two big problems though:
  • Poor locality for some points. This is a general problem with space-filling curves: some points that are close in the n-D space end up much further apart when following the curve. In practice, this means that some searches will be much more expensive than they should be.
  • Non-random distribution of data. The default partition function spreads the data out randomly, which avoids hotspots where many keys fall in the same bucket. Customizing the partition function to preserve order meant that the skew inherent in the dataset showed up as skew across the partitions. In the presentation he showed a photo showing the distribution of lights and the clusters around cities. A similar photo is shown below of Egypt with obvious clustering around the Nile river.
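To make the locality problem concrete, here is a minimal sketch of a Z-order (Morton) encoding in Scala. This is not SimpleGeo's code: the object name ZOrder and the 16-bit grid coordinates are assumptions chosen for illustration. The key is built by interleaving the bits of x and y.

```scala
object ZOrder {
  // Spread the low 16 bits of v so that bit i moves to bit 2*i.
  private def spread(v: Int): Long = {
    var x = v.toLong & 0xFFFFL
    x = (x | (x << 8)) & 0x00FF00FFL
    x = (x | (x << 4)) & 0x0F0F0F0FL
    x = (x | (x << 2)) & 0x33333333L
    x = (x | (x << 1)) & 0x55555555L
    x
  }

  // Interleave the bits: x lands in the even positions, y in the odd ones.
  def encode(x: Int, y: Int): Long = spread(x) | (spread(y) << 1)

  def main(args: Array[String]): Unit = {
    // Adjacent cells that are also adjacent on the curve.
    println(encode(0, 0) + " " + encode(1, 0))
    // Adjacent cells that straddle a quadrant boundary.
    println(encode(7, 0) + " " + encode(8, 0))
  }
}
```

Cells (0, 0) and (1, 0) map to keys 0 and 1, but (7, 0) and (8, 0), which are just as close in the grid, map to keys 21 and 64. A range scan along the curve has to cover everything in between.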
To solve this problem they moved away from space-filling curves to something that looked like a kd-tree stored in the distributed hash. If I understood correctly, each node in the tree is stored as an entry in Cassandra using the standard partitioning scheme. Exact match and range queries can be performed by standard tree searching and traversal, with some caching to avoid problems with hot spots, in particular the root node of the tree. Data skew can be accommodated by splitting a node when it gets too full. The nodes are stored using standard Cassandra, so it avoids the customization that caused tricky problems with the ordering. Proximity queries are handled by first searching for the exact match and then checking the node to restrict the bound further. If the nearest neighbor is found within the same node and the radius is such that neighboring spaces could not have a closer neighbor, then we are done. If it is possible that a neighboring space has points that could match the query, then we go to the parent node and then to siblings until satisfied.
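The idea can be sketched as a small in-memory kd-tree with bucketed leaves. This is only my reading of the approach, not SimpleGeo's implementation: nothing here talks to Cassandra, and the names KdSketch, capacity, build, and nearest are hypothetical. A leaf splits at the median when it grows past the bucket size, and a nearest-neighbor search only visits the sibling when the splitting plane is closer than the best match found so far.

```scala
object KdSketch {
  type Point = (Double, Double)

  sealed trait Node
  case class Leaf(points: Vector[Point]) extends Node
  case class Split(axis: Int, value: Double, left: Node, right: Node) extends Node

  // Hypothetical bucket size; a leaf is split once it holds more than this.
  val capacity = 4

  private def coord(p: Point, axis: Int): Double = if (axis == 0) p._1 else p._2

  private def dist2(a: Point, b: Point): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  def insert(node: Node, p: Point, depth: Int = 0): Node = node match {
    case Split(axis, value, left, right) =>
      if (coord(p, axis) < value) Split(axis, value, insert(left, p, depth + 1), right)
      else Split(axis, value, left, insert(right, p, depth + 1))
    case Leaf(points) =>
      val all = points :+ p
      val axis = depth % 2
      val coords = all.map(q => coord(q, axis)).toArray
      java.util.Arrays.sort(coords)
      val median = coords(coords.length / 2)
      val (lo, hi) = all.partition(q => coord(q, axis) < median)
      // Keep an oversized leaf if the points cannot be separated on this axis.
      if (all.size <= capacity || lo.isEmpty) Leaf(all)
      else Split(axis, median, Leaf(lo), Leaf(hi))
  }

  def build(points: Seq[Point]): Node =
    points.foldLeft(Leaf(Vector.empty): Node)((tree, p) => insert(tree, p))

  def nearest(node: Node, q: Point, best: Option[Point] = None): Option[Point] = node match {
    case Leaf(points) =>
      // Pick the closest of the bucket's points and the best match so far.
      (points ++ best.toList).foldLeft(Option.empty[Point]) { (acc, c) =>
        if (acc.forall(a => dist2(c, q) < dist2(a, q))) Some(c) else acc
      }
    case Split(axis, value, left, right) =>
      val d = coord(q, axis) - value
      val (near, far) = if (d < 0) (left, right) else (right, left)
      val candidate = nearest(near, q, best)
      // Visit the sibling only if it could hold something closer.
      if (candidate.forall(c => d * d < dist2(c, q))) nearest(far, q, candidate)
      else candidate
  }
}
```

Because each node is just a value keyed by its position in the tree, nodes like these could be stored under the default random partitioner, which is what sidesteps the ordering problems described above.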

Overall, a nice presentation with a good progression through their various attempts and clear explanations of the issues they encountered.

Teach a Dog to REST - Brian Mulloy (apigee)
It's been 10 years since Fielding first defined REST. So, where are all the elegant REST APIs? While many claim REST has arrived, many APIs in the wild exhibit arbitrary, productivity-killing deviations from true REST. We'll start with a typical poorly-designed API and iterate it into a well-behaved RESTful API.
Nothing spectacular, but he did have some reasonable advice for constructing APIs and some of the common problems they have seen. This presentation also had more of the sales element with the speaker frequently mentioning the apigee console for learning and playing around with APIs for popular services such as Twitter and LinkedIn. I personally found the speaker to be annoying, e.g. he had a schtick about not knowing how to pronounce idempotent methods that I'm pretty sure was an attempt at self-deprecation to help make the talk more appealing to a non-technical audience. The brief summary is:
  • Be RESTful. The speaker seems to prefer RESTful interfaces over traditional RPC interfaces such as SOAP or JSON-RPC. The primary reasoning is that it leads to greater simplicity and fewer endpoints for the developer. His preferred interface is two URLs per resource: one for a collection, such as /dogs; and one for a specific element, such as /dogs/cujo. I liked his focus on APIs that are easy for developers to understand and to push for conventions that make it easier to reason about how APIs should work. If done right you can guess what the API will be without ever having to look at the documentation.
  • Verbs are bad. Nouns are good. At first you might think he is a subject of Evil King Java, but it is not quite the same. The RESTful model is about managing resources and the argument is that the verbs are already provided as part of the HTTP protocol. So really it is verbs as part of the URL that are bad. URLs should refer to a noun.
  • Plurals are better. Here he is referring to the name for collections and clearly stated that this point was just his opinion. I don't really have a strong preference, but I do agree with him that if a widely used convention was present, it would be much easier to guess what the URL should be for a given API. Plurals also do seem to make it clearer that the response would be a collection instead of a single item.
  • Move complexity after the question mark. The basic idea here was that the messy parts of the API should be made query parameters to the URL. The justification is that there will be some mess and that other locations, such as HTTP headers, are more obscure and difficult to quickly hack together in a browser. Another good point I think he had is that you should try to make the API trivial to start using. The easier it is to play around with an API the more likely it is to get used.
  • Borrow from leading APIs. This goes back to his theme about convention. By following other popular APIs it is more likely your API will be familiar to new developers looking at your system. He also mentioned that in his opinion LinkedIn was currently doing the best at designing clean, easy-to-use APIs for their offerings.
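As a rough illustration of the two-URLs-per-resource convention, the sketch below maps an HTTP method and URL to an action. The object name DogsApi and the exact mapping of methods to actions (POST creates, PUT updates) are my assumptions based on common REST practice, not something taken from the slides. Anything after the question mark is ignored for routing, which is where the messy parts are supposed to live.

```scala
object DogsApi {
  // Describe what a request would do; the URL only ever names a noun.
  def describe(method: String, url: String): String = {
    // Drop the query string: complexity lives after the question mark.
    val path = url.takeWhile(_ != '?')
    val segments = path.split('/').filter(_.nonEmpty).toList
    (method, segments) match {
      case ("GET", List("dogs"))          => "list dogs"
      case ("POST", List("dogs"))         => "create a dog"
      case ("GET", List("dogs", name))    => "show " + name
      case ("PUT", List("dogs", name))    => "update " + name
      case ("DELETE", List("dogs", name)) => "delete " + name
      case _                              => "not part of the API"
    }
  }

  def main(args: Array[String]): Unit = {
    println(describe("GET", "/dogs?color=red"))
    println(describe("DELETE", "/dogs/cujo"))
  }
}
```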
One shortcoming that was brought out and emphasized during the questions at the end was he made no mention of error handling. Overall, ok but a 45 minute session was too long for this talk.

Your API Sucks - Marsh Gardiner (apigee)
We've learned the hard way that websites need great user experiences to survive. So why aren't we being this aggressive with API design? What are the deeper reasons behind why REST killed SOAP? And why aren't all API providers thinking about the truly important issues, making APIs that will be used by people? Come for the hall of shame and stay for the wake-up call.
Boring series of "don't do this" examples. At least the previous speaker bothered to explain why he was pushing for APIs to be a certain way. The speaker reminded me of John Hodgman, but without the humor. Waste of time.

Lunch

They had some pre-made sandwiches for the lunch. I don't make it into San Francisco that often so I decided to eat out instead.

Scaling Your Web App - Sebastian Stadil (Scalr)
Got app? Learn to scale it, with tricks for creating and managing scalable infrastructure on EC2 or elsewhere.
I came in late to this talk. The part I saw was him showing off their UI. Complete waste of time, I might as well have flipped through the tour on their website.

Inside MongoDB - Alvin Richards (mongoDB)
In this talk we'll describe and discuss MongoDB's data format (BSON), the insert path, the query optimizer, auto-sharding, replication, and more. The talk will be of interest to developers interested in MongoDB and looking to learn more about what's going on "under the hood", as well as anyone interested in distributed systems and the design decisions that go into creating a system like MongoDB.
Not a bad introductory overview. You could probably get the same information by spending an hour reading through the mongoDB documentation, but you wouldn't have easy access to someone for questions.

AWS Feedback Session - Jeffrey Barr (Amazon Web Services)
If you are an AWS user and want to ask questions or provide feedback, here's your chance. Senior AWS Evangelist Jeff Barr will be conducting an interactive feedback session on EC2, S3, RDS, and the other services. All of the feedback will be routed directly to the product teams.
This session was only really useful as a more direct way to communicate issues to Amazon. The speaker was quite knowledgeable about the Amazon stack and it's good to see they are eager to get customer feedback. One aspect that came up several times was the poor support for Windows. The two issues I remember were the long delay until new versions of Windows are available to use and, one I found quite amusing, that if you create a snapshot of a Windows VM then apparently the admin password is changed in the original VM.

Hackathon

I skipped the hackathon.

Summary

Not bad for a free event. I heard from others that some of the sessions were worse about just being sales pitches than the ones I attended. Very little technical depth in most of the presentations.

Sunday, November 28, 2010

Automatic Generation of Color Palettes

A problem I've had several times is how to automatically generate colors for use in graphs. The user will be able to select some set of items that should be included and I need to select colors for each item. The requirements are:
  • The color for an item should be easy to distinguish from other items. Of course, it would also be nice if the colors looked decent, but from a functional perspective, the requirement is to be able to distinguish the items in the graph.
  • I need to be able to generate an arbitrary number of colors and avoid a fixed color palette with a predefined number. For practical purposes the number will be limited by the ability to distinguish different colors, but it would be nice for the mechanism to scale gracefully as the number of items increases.
  • The color of the background, for my purposes white, cannot be used.
When I examined a few tools, I found that most worked by having a fixed set of colors. My first attempt was to perform a naive increment of RGB pixel values. A trivial increment works well for shades of gray. This produces a palette like:

Hex  2  4  8  16
000000                                                                
0E0E0E                                                                   
1C1C1C                                                                    
2A2A2A                                                                    
383838                                                                    
464646                                                                    
545454                                                                    
626262                                                                    
707070                                                                    
7E7E7E                                                                    
8C8C8C                                                                    
9A9A9A                                                                    
A8A8A8                                                                    
B6B6B6                                                                    
C4C4C4                                                                    
D2D2D2                                                                    

The problem is that shades of gray become difficult to distinguish with more than a few colors. That is why tools like gnuplot use line patterns and shapes. However, most of my use cases are for graphs shown on a color monitor, so there is no need to limit to grayscale. What happens if we try a naive increment with color? My first attempt was to treat the color as a three-byte integer and simply divide the range by the desired number of colors to get the increment value. Looking at the palette below you can see the results are poor:

Hex  2  4  8  16
000000                                                                    
0FFFF0                                                                    
1FFFE0                                                                    
2FFFD0                                                                    
3FFFC0                                                                    
4FFFB0                                                                    
5FFFA0                                                                    
6FFF90                                                                    
7FFF80                                                                    
8FFF70                                                                    
9FFF60                                                                    
AFFF50                                                                    
BFFF40                                                                    
CFFF30                                                                    
DFFF20                                                                    
EFFF10                                                                    

After looking around for a bit I found that the HSV representation is fairly well suited for this problem. HSV stands for hue, saturation, and value. The color space is represented as a cylinder:

For more background, the paper Color Spaces for Computer Graphics gives a good overview and discusses how the various color spaces were designed with respect to human perception of color. To generate a palette, the saturation and value settings can be fixed. The 360° for the hue can be divided by the desired number of colors and then we just increment the angle for each color. This technique gives a nice palette, but for more than around 8 colors it will be difficult for a person to distinguish some shades.

Hex  2  4  8  16
FF0000                                                                    
FF5F00                                                                    
FFBF00                                                                    
DFFF00                                                                    
7FFF00                                                                    
1FFF00                                                                    
00FF3F                                                                    
00FF9F                                                                    
00FFFF                                                                    
009FFF                                                                    
003FFF                                                                    
1F00FF                                                                    
7F00FF                                                                    
DF00FF                                                                    
FF00BF                                                                    
FF005F                                                                    

The Scala code I used for generating the palettes is shown below.
object Colors {

    import java.awt.Color

    def grayscale(num: Int): Seq[Color] = {
        // Truncate the full range of values to make sure we can distinguish
        // from the color white, i.e., (255, 255, 255).
        val range = 256 - 32

        // Determine how much to increment for each color.
        val delta = range / num
        if (delta == 0) {
            throw new IllegalArgumentException(
                "grayscale can support at most " + range + " colors")
        }

        // Generate the sequence of colors
        (0 until num).map(n => {
            val value = n * delta
            new Color(value, value, value)
        })
    }

    def naiveIncrement(num: Int): Seq[Color] = {
        // Truncate the full range of values to make sure we can distinguish
        // from the color white, i.e., (255, 255, 255).
        val range = 0xFFFFFF - 0xFF

        // Determine how much to increment for each color.
        val delta = range / num
        if (delta == 0) {
            throw new IllegalArgumentException(
                "naive increment can support at most " + range + " colors")
        }

        // Generate the sequence of colors
        (0 until num).map(n => {
            val value = n * delta
            new Color((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)
        })
    }

    def hsv(num: Int): Seq[Color] = {
        // Range is 360 degrees for the hue
        val range = 360.0

        // Determine how much to increment for each color.
        val delta = range / num
        if (delta < 1.0) {
            throw new IllegalArgumentException(
                "hsv can support at most " + range + " colors")
        }
        // Generate the sequence of colors
        (0 until num).map(n => {
            val hue = n * delta
            val h = hue / 60.0
            val x = ((1 - math.abs(h % 2 - 1)) * 255).toInt
            val c = h match {
                case h if 0.0 <= h && h < 1.0 => (255, x, 0)
                case h if 1.0 <= h && h < 2.0 => (x, 255, 0)
                case h if 2.0 <= h && h < 3.0 => (0, 255, x)
                case h if 3.0 <= h && h < 4.0 => (0, x, 255)
                case h if 4.0 <= h && h < 5.0 => (x, 0, 255)
                case h if 5.0 <= h && h < 6.0 => (255, 0, x)
                case _ => (0, 0, 0)
            }
            new Color(c._1, c._2, c._3)
        })
    }

    def main(args: Array[String]): Unit = {
        if (args.length < 2) {
            println("Usage: scala Colors <palette> <num>")
            sys.exit(1)
        }

        // Supported palettes
        val palettes = Map(
            "grayscale" -> grayscale _,
            "naive"     -> naiveIncrement _,
            "hsv"       -> hsv _
        )

        // Generate colors and print
        palettes(args(0))(args(1).toInt).foreach(c => {
            println(c.getRGB.toHexString.toUpperCase.substring(2))
        })
    }
}

Montana Thunderstorm

Cool photo of a supercell in Montana:

Saturday, November 27, 2010

How Cats Lap

It's a bit humbling how little we know about common activities. A recent article in Science talks about how cats drink. This is a topic I have never given much thought, but it seems like something that would have been studied and understood a long time ago. It turns out it has been an open question for some time. A 1940s short film called Quicker'n a Wink captured a cat drinking using one of the earliest high-speed cameras. Luckily the video is now on YouTube:



Before reading the paper or seeing the video included above, I tried to think about how lapping would work. I guessed that a cat would curve the tongue and make a cup to carry the water into the mouth. It turns out this is how dogs drink as shown in the video below:



Cats use a different technique. Like dogs they curve the tongue back, but cats barely touch the surface of the water with their tongue and then quickly withdraw the tongue letting the inertia create a stream of water into the mouth. For more information see:

Friday, November 12, 2010

Jefferson Memorial: panel three contextomy

A friend on Facebook recently posted the following quote attributed to Thomas Jefferson:
God who gave us life gave us liberty. Can the liberties of a nation be secure when we have removed a conviction that these liberties are the gift of God? Indeed I tremble for my country when I reflect that God is just, that his justice cannot sleep forever.
Given Jefferson's record on religion, I was curious what the context was for this quote. I quickly found that this was a truncated version of the quote from panel three of the Jefferson Memorial. The full quote is:
God who gave us life gave us liberty. Can the liberties of a nation be secure when we have removed a conviction that these liberties are the gift of God? Indeed I tremble for my country when I reflect that God is just, that his justice cannot sleep forever. Commerce between master and slave is despotism. Nothing is more certainly written in the book of fate than that these people are to be free. Establish a law for educating the common people. This it is the business of the state and on a general plan.
However, what I found really surprising was how this quote was created. It was assembled from snippets of five different documents authored by Jefferson: A Summary View of the Rights of British America, Notes on the State of Virginia Query XVIII, Jefferson's Autobiography, a letter to George Wythe, and a letter to George Washington (toward the bottom of image 21; the snippet is often shown as a quote, but I couldn't find a text version of the full letter, so I linked to the scanned version from the Library of Congress). The other panels do not seem to be quite as bad, but are also quote mined from various sources.

Why? Quote mining to come up with some new statement doesn't serve as a memorial to Jefferson or his ideas. I could comb through his writings and combine a collection of snippets to express just about any view. It's possible he would have agreed with the sentiment, but the actual statement only reflects the views of whoever cobbled it together. Truly disappointing.

Sunday, October 24, 2010

Apocalypse in 2012! Wait, shouldn't it be 4772?

After quickly losing interest in the movie 2012, I found myself reading about some of the claims for the 2012 phenomenon instead of paying attention to the film. For me the most interesting part was learning more about the Mayan long count calendar. The long count works in a similar way to Unix time in that it is a count from a fixed starting point known as the epoch. Unix time counts the number of seconds since January 1, 1970. The Mayan system counts the number of days since August 11, 3114 BCE.

I had always been under the impression that the 2012 fears were because it represented the end of the Mayan calendar. It turns out this isn't true. Some do seem to think it is the end of the calendar, but others just describe it as the end of a period. It is true that on December 21, 2012 the most significant digit, called the b'ak'tun, will increase by one and the least significant digits will be zero just like hitting 100,000 miles on the odometer of a car. Regardless of whether they think it is the end of the calendar or just of a given cycle, many do go on to claim the date represents the time when some catastrophic event will occur such as galactic alignment, solar flares causing geomagnetic reversal, Nibiru crashing into Earth, etc. Insane nonsense aside, the interesting thing to me was there is an overflow problem with the way the Mayan calendar, at least as I saw it described on Wikipedia, represents dates. Considering the recent problems with overflow such as Y2K and the coming year 2038 problem the obvious question is when will it overflow?

Overflow occurs because of limits in how the values are represented. For Y2K the problem was that many systems used two digits to represent the year. So the year 1900 would be recorded as "00" and 1970 would be recorded as "70". The problem is then how do you represent the year 2000? For Unix systems the time is typically stored as a signed 32-bit integer value. With one bit used for the sign, this means that there are 31 bits for the value to represent the number of seconds since the epoch. Do the math and you find: 2^31 / (60 s/min * 60 min/hr * 24 hr/day * 365.25 day/yr) = 68.05 years. With an epoch of January 1, 1970 the overflow will occur in 2038.
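The arithmetic is easy to check. The short sketch below (the object name UnixOverflow is just for illustration) computes the number of years and asks java.time for the last instant a signed 32-bit counter can represent.

```scala
import java.time.Instant

object UnixOverflow {
  // Largest value of a signed 32-bit counter: 2^31 - 1 seconds.
  val maxSeconds: Long = Int.MaxValue.toLong

  // Seconds in an average year of 365.25 days.
  val secondsPerYear: Double = 60 * 60 * 24 * 365.25

  def yearsUntilOverflow: Double = math.pow(2, 31) / secondsPerYear

  // The last second that can be represented, as a UTC timestamp.
  def lastInstant: Instant = Instant.ofEpochSecond(maxSeconds)

  def main(args: Array[String]): Unit = {
    println(f"$yearsUntilOverflow%.2f years")
    println(lastInstant)
  }
}
```

The last representable second is 2038-01-19T03:14:07Z.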

So how are Mayan dates represented? The Mayan date is represented with five digits. Each digit is base 20, except for the second digit from the right, which is base 18. The Maya had symbols for representing quantities from 0 to 19, similar to how we use 0 to 9 with the decimal system. Do the math (20 * 20 * 20 * 18 * 20) and you find that the Mayan encoding can represent 2,880,000 distinct values. As mentioned earlier, it was used to count the number of days since the epoch of August 11, 3114 BCE. Using 365.25 days per year, the encoding will overflow after around 7885 years, on October 13, 4772 CE. For the purposes of media fear mongering I suppose the more imminent date is useful, but you would think for a movie they could exploit interesting aspects of the actual calendar system for the plot. Of course, with my particular proposal you would have to find some reason why the date rolling over and going back to zero in a calendaring system that nobody uses would cause harm.
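Here is a small sketch of the same calculation. The object name LongCount is hypothetical, and it assumes the epoch is expressed in the proleptic Gregorian calendar, which is what java.time.LocalDate uses (astronomical year -3113 is 3114 BCE). The base-18 digit is the second from the right.

```scala
import java.time.LocalDate

object LongCount {
  // Digit bases from most to least significant.
  val bases: Seq[Int] = Seq(20, 20, 20, 18, 20)

  // Number of distinct day counts the five digits can hold.
  val capacity: Long = bases.map(_.toLong).product

  // August 11, 3114 BCE in the proleptic Gregorian calendar (assumption).
  val epoch: LocalDate = LocalDate.of(-3113, 8, 11)

  // Convert a day count into its five long count digits.
  def digits(days: Long): List[Long] =
    bases.reverse.foldLeft((days, List.empty[Long])) { case ((rest, acc), base) =>
      (rest / base, (rest % base) :: acc)
    }._2

  def toDate(days: Long): LocalDate = epoch.plusDays(days)

  def main(args: Array[String]): Unit = {
    println(capacity)
    println(digits(1872000L).mkString("."))
    println(toDate(1872000L))
    println(toDate(capacity))
  }
}
```

Day 1,872,000 is 13.0.0.0.0 and falls on 2012-12-21, while the count runs out of digits 2,880,000 days after the epoch, on 4772-10-13.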

Thursday, October 14, 2010

Happy Birthday C++

Twenty-five years ago today the first reference guide for C++ was published. Wired has an interview with Bjarne Stroustrup to celebrate the occasion. C++ was the programming language I was taught in high school and was the first language I learned for programming a desktop computer. On an irrelevant tangent, the first programming language that I learned was UserRPL for the HP 48G calculator. I competed in UIL calculator competitions using the HP 32SII and became addicted to RPN making most other calculators unusable. That and my focus on engineering in college made the HP 48G a natural choice when I needed a graphing calculator. As it turns out, one of my first C++ programs was a simple calculator that accepted a postfix expression as input (much easier than processing infix expressions).

Fax to Email

Hilarious video of a guy seeking venture capital for a business that would accept faxes and then manually send out emails based on the form that was faxed. This is 2010 for crying out loud. Better yet you can fax them your username and password and they will fax your emails back to you.

I had to check: Motel 6 lists wifi internet access as an amenity that is available at all of their locations.

Saturday, October 9, 2010

The Go Programming Language, for every page

I have been playing around with the Go Programming Language recently, and one really annoying aspect is that every page seems to have the same title: The Go Programming Language. With many tabs open to different pages from the package documentation, this can make it difficult to see which tab has the content you want.
Luckily, there is already bug 1158 to address the issue. We'll see how long it takes to get fixed.

Hijacking Error Messages

When running tests for a previous post, I was at first surprised that I didn't get an error about not being able to resolve the host. Poking at it, I found that bad host names all resolved to 208.68.139.38. It turns out this is a "feature" from Comcast called Domain Helper. They return an IP to a Comcast search service when a domain name doesn't resolve. Fortunately Comcast does have an easy way to opt out. Even if you grant that it is useful to normal users just running a browser, it can cause numerous problems for other tools that rely on DNS. Sadly, it seems many ISPs perform this kind of hijacking now.

Another example of this sort is the soft 404. Some sites will return a custom result page with an HTTP 200 code instead of the proper error code. The rationale for site owners is that it can provide a better user experience than a generic error page from the web server. However, this is a very poor excuse, as you can return a custom payload even with HTTP error responses. Having the proper error code means that automated tools and programs can correctly interpret the result. Hosting providers can also exploit HTTP errors by configuring the web server to return error pages similar to the Comcast search page to provide advertising and direct traffic back to the provider.

Damn Java Socket Exception Messages

When creating an error message you should think about what information would be useful for understanding what went wrong. This should especially be true if you are creating a library that is likely to be used by many other systems. Providing good error messages up front means that even if the programmers using the library do not check and customize the messages, the user will still get a reasonable result. For some use cases, such as scripting, it can also be useful because it may be a quick one-off program where the goal is to quickly automate a repetitive task or perform some analysis. In this case, having useful default error messages can speed up the initial development so you get your answer faster.

In my case, I was checking logs for a system that crawls pages and hence attempts to resolve and connect to thousands of hosts. This system logs the exceptions, but unfortunately did not provide a customized message when doing so. Analyzing the logs showed that the two most common error messages were: 1) a failure to resolve the hostname and 2) a failure to connect to an HTTP server on the host. An example error message for the first case is java.net.UnknownHostException: some-host-that-does-not-exist. This message is quite useful as the exception name explains the problem and the message tells me the name of the host that could not be resolved. An example message for the second case is java.net.SocketTimeoutException: connect timed out. This message explains the problem, but doesn't give me the crucial information of what it was trying to connect to.

Though this can easily be fixed in the application code, it is disappointing that the default message is so bad. I have noticed that my opinion of a programming language or technology seems to go down steadily the longer I am forced to use it at work. Is the grass greener on one of the other sides? How do other languages, or rather the networking libraries they provide, fare for this use case? I looked at 13 options to see how many would give a decent error message for both use cases. The results were not very encouraging. Only one option, Go, had reasonable messages for both. For the host not found case, 4 options included the hostname. Only two options provided the host and port in the failed to connect case. The results are summarized in the table below the fold with links to the source code and raw error messages.
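For reference, the application-level fix is only a few lines. The sketch below is one way to do it, not the code from the crawler: the names Connect, describe, and open are hypothetical. It wraps whatever the JDK throws in a new IOException whose message includes the host and port, keeping the original exception as the cause.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object Connect {
  // Build a message that says what we were trying to reach.
  def describe(host: String, port: Int, e: IOException): String =
    "failed to connect to " + host + ":" + port + ": " +
      e.getClass.getName + ": " + e.getMessage

  // Open a socket, adding the host and port to any failure.
  def open(host: String, port: Int, timeoutMillis: Int): Socket = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis)
      socket
    } catch {
      // UnknownHostException and SocketTimeoutException are both IOExceptions.
      case e: IOException =>
        socket.close()
        throw new IOException(describe(host, port, e), e)
    }
  }
}
```

With this wrapper a timeout would be logged along the lines of "failed to connect to example.com:80: java.net.SocketTimeoutException: connect timed out" instead of the bare JDK message.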

Sunday, October 3, 2010

Family Planning

It was a popular week for family planning. First there was a TED talk by Mechai Viravaidya discussing the work he has done in Thailand to encourage family planning, in particular, the use of condoms. How Mr. Condom made Thailand a better place:

It is too bad the religious nuts in the U.S. aren't swayed by evidence showing that abstinence education does not work. Then I came across an article in Science, Has China Outgrown The One-Child Policy?, about the effects of the one-child policy on the Chinese population. One issue is a rapidly aging population:
The country has benefited from a "demographic dividend"—a surfeit of young workers born during a 1960s baby boom—that will dry up as China gets old before it gets rich. From 2010 to 2020, the number of Chinese aged 20 to 24 will drop by a whopping 45%, from 125 million to 68 million.
If you look at an age pyramid for China you can see that the population is starting to contract:

I'm not sure what caused the sharp reduction of individuals in their twenties compared to teens and those in their thirties. Here is a similar pyramid for the United States:

The pyramid for Afghanistan shows a more typical pyramid shape:

However, if you look carefully at the age pyramid for China there is another problem. There are considerably more males than females:
China's ratio of male to female births—now 119 boys born for every 100 girls—has been "really intensified by the family-planning policy," says Shuzhuo Li, a demographer at Xi'an Jiaotong University. The gender imbalance is projected to yield 30 million more men than women by 2030, heightening the risk of social instability.
The skew seems to be the result of cultural preferences for a male child leading to practices such as sex-selective abortions. These problems are further compounded by the complexity of the laws:
That decentralized structure, which still stands, has yielded a clunky policy that is comparable in complexity to the U.S. tax code, says Wang. To discourage sex-selective abortion, many provinces allow rural parents whose first child is a girl to try again for a boy, an exception sometimes called the "1.5-child policy." All told, there are 22 exceptions qualifying a couple for more children, ranging from one partner being disabled to one being a miner.
Another similarity to the U.S. tax industry is the huge bureaucracy that has been built up around the policy:
As of 2005, the family-planning bureaucracy had swollen to 509,000 employees, along with 6 million workers who help with implementation. Those stakeholders are "risk-averse," says Wang. "They pay no cost for doing nothing."
It'll be interesting to see how China reacts to these problems in the coming decades.

Thursday, September 30, 2010

Religious Knowledge Survey

The Pew Forum's U.S. Religious Knowledge Survey has been in the news lately. I was disappointed as the results aren't the least bit surprising. The shocking conclusions seem to be:
The survey results are clear: People with higher levels of education tend to be more knowledgeable about religion.
The survey shows that reading and talking about religion are related to higher levels of religious knowledge. People who say they read Scripture at least once a week, for instance, get significantly more questions right on average than those who read Scripture less often. The same pattern is seen in frequency of reading books (besides Scripture) about one’s own faith.
I did find one amusing bit:
And only about one-third of those polled know which famous court trial dealt with whether evolution could be taught in public schools; 31% know this was the Scopes trial, while 36% say it was Brown vs. Board of Education and 3% name the Salem witch trials.
I would really like to see a breakdown of the 3% that thought the Salem witch trials were related to evolution. Unfortunately the raw data doesn't seem to be available yet.

Saturday, July 31, 2010

Miracle of the Herrings

Not being Catholic, I found it strange that the church would bother making such a ridiculous claim. Why not just celebrate Thomas Aquinas for what he did? What is the point of making up some pathetic story and labeling it a miracle? I was curious if this was just made up for television so I tried to find (i.e., searched for a few minutes online, not a serious scholarly effort) an official document from the church on why Thomas Aquinas was declared a saint. I didn't have much luck finding what I was looking for on the www.vatican.va website. Many results came up for Thomas Aquinas, but not what I was looking for.

The best reference I was able to find was The Sanctity and Miracles of St. Thomas Aquinas, but I have no idea about the veracity of the source. It does partially corroborate the story about the herrings:

Asked about miracles - whether he knew of any worked through the merits of Thomas either before or after death - the witness said that when Thomas died his body was buried at first before the high altar, but then the monks, fearing it might be taken from them, transferred it secretly to St. Stephen's Chapel in the same abbey-church. But about seven months later Thomas appeared in a dream to a brother James, who was prior at the time, and said:'Take me back where I was at first.' So they took him back, with due solemnity. (This dream was and still is commonly talked about in the monastery.) And when the tomb was opened a delicious fragrance came out, filling all the chapel and cloister: whereupon the community sang the Mass Os justi meditabitur sapientiam, etc., in honour of Thomas as of a saint; they thought the Mass Pro defunctis hardly suitable for such a man.

All this the witness knew because he was there and saw it for himself; it happened about seven months after Thomas's death; but he could not be sure of the month or the day. Asked who were present, he said 'the whole community'.... Asked who had called him to the place where the fragrance was smelt, he said he himself smelled it; it drew him to where the tomb was.

IX. Asked if he knew of other miracles attributed to brother Thomas, the witness said that he had heard of many; and in particular that when Thomas lay sick in the castle of Maenza and was urged to eat something, he answered, 'I would eat fresh herrings, if I had some.' Now it happened that a pedlar called just then with salted fish. He was asked to open his baskets, and one was found full of fresh herrings, though it had contained only salted fish. But when the herrings were brought to Thomas, he would not eat them.

The witness spoke too of a Master Reginald, a cripple, who was cured at the tomb of brother Thomas.

Asked how he knew of these two miracles, he replied that that about the fish he had from brother William of Tocco, prior of the Friar Preachers at Benevento, who himself had it from several people at Maenza, where the event occurred. The other story he had from brother Octavian (mentioned above) who averred that he had seen it happen. And in the monastery these miracles were common knowledge.
In this story it explicitly states that Thomas Aquinas did not eat the fish as Stephen Fry stated in the video. However, the book Saint Thomas Aquinas: the person and his work (by Jean-Pierre Torrell, the chapter "The Last Months and Death" on page 291 in the version scanned by Google) suggests he may have eaten some of it:
It was there that he fell ill and totally lost his appetite; the doctor called to take care of him—John of Guido, from Piperno—asked what he would like to eat and received a disconcerting response: some fresh herring, which he once enjoyed when he was in the Ile de France. Miraculously, some were found. But according to Tocco, it was the others who ate them, since the patient no longer wanted them. An eyewitness assures us, however, that he ate some of it: de quibus etiam arengis comedit dictus frater Thomas.
History is messy. But in terms of the miracle, it doesn't really make a difference whether or not he ate some of the fish. A NY Times book review, When the Lights Went Out in Europe, gives a similar story about the herrings:
When St. Thomas Aquinas lay dying, in 1274, it was said that he asked for herrings, which were unknown thereabouts. Yet sure enough they soon obligingly turned up at the local fishmongers. Even in the early 14th century, when Thomas's candidacy for sainthood was under investigation, and at least two miracles were required for admission, this unlikely tale did not wash -- not least because it emerged that the witnesses had no way of telling whether what they had seen were herrings or not.
Though I would prefer an official document from the Catholic church, from what I can tell the miracle of the herrings is a real claim made to support the canonization of Thomas Aquinas. It would be nice if the Catholic church had an easily searchable database of all the saints and the records for how they qualified for sainthood. However, if all of the "miracles" are this pathetic, then it is probably better for public relations not to make the information more accessible.

Friday, July 30, 2010

Damn boost::program_options

I was dismayed by the awful help screen on one of our internal tools. The help message looked something like:
$ ./a.out -h 
Allowed options:
  -h [ --help ]                                                               s
                                                                              h
                          ... skipping ...
                                                                              e
                                                                              d
  -c [ --config ] arg (=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa) c
                                                                              o
                                                                              n
                                                                              f
                                                                              i
                                                                              g
                                                                              u
                                                                              r
                                                                              a
                                                                              t
                                                                              i
                                                                              o
                                                                              n
                                                                              f
                                                                              i
                                                                              l
                                                                              e
Of course, this was an internal tool that is no longer being maintained. Poking at the source I found it was based on boost::program_options and was just writing the options_description to stdout. I found that troubling as I have used and recommended this library many times. The root of the problem seemed to be that the default value was based on a file on the system specified by an environment variable. On my system that path is quite long and there was only room for a description that was one character wide. I put together a quick test program to illustrate the problem:
// g++ -I/opt/local/include -L/opt/local/lib boostopt.cpp -lboost_program_options-mt

#include <cstdlib>
#include <iostream>
#include <string>

#include <boost/program_options.hpp>

using namespace std;
using namespace boost::program_options;

int
main(int argc, char **argv) {

    char *file = getenv("CONFIG");
    string config((file == NULL) ? "-" : file);

    options_description desc("Allowed options");
    desc.add_options()
        ("help,h",     "show this help message, some additional text "
                       "that is here for no other reason than to make "
                       "the message wrap when the help is printed")
        ("config,c",   value<string>(&config)->default_value(config),
                       "configuration file")
    ;

    variables_map vm;
    try {
        store(parse_command_line(argc, argv, desc), vm);
        notify(vm);
    } catch (const std::exception &e) {
        cerr << "Error: " << e.what() << endl
             << endl << desc << endl;
        return 1;
    }

    if (vm.count("help")) {
        cerr << desc << endl;
        return 2;
    }

    return 0;
}
However, to my surprise, when I ran the test program I could not reproduce the issue. The output when I supplied a large default value was still sane; it just wrapped the description onto the next line. For example:
$ env CONFIG=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ./a.out -h
Allowed options:
  -h [ --help ]                         show this help message, some additional
                                        text that is here for no other reason 
                                        than to make the message wrap when the 
                                        help is printed
  -c [ --config ] arg (=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)
                                        configuration file
A little research showed the problem was fixed in boost 1.42.0. And sure enough, if I try with boost 1.41.0, the version the internal tool was using, I can easily reproduce the problem:
$ env CONFIG=aaa ./a.out -h
Allowed options:
  -h [ --help ]              show this help message, some additional text that 
                             is here for no other reason than to make the 
                             message wrap when the help is printed
  -c [ --config ] arg (=aaa) configuration file

$ env CONFIG=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ./a.out -h
Allowed options:
  -h [ --help ]                                                              sh
                                                                             ow
                          ... skipping ...
                                                                             te
                                                                             d
  -c [ --config ] arg (=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa) co
                                                                             nf
                                                                             ig
                                                                             ur
                                                                             at
                                                                             io
                                                                             n 
                                                                             fi
                                                                             le

$ env CONFIG=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ./a.out -h 
Allowed options:
  -h [ --help ]                                                               s
                                                                              h
                          ... skipping ...
                                                                              e
                                                                              d
  -c [ --config ] arg (=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa) c
                                                                              o
                                                                              n
                                                                              f
                                                                              i
                                                                              g
                                                                              u
                                                                              r
                                                                              a
                                                                              t
                                                                              i
                                                                              o
                                                                              n
                                                                              f
                                                                              i
                                                                              l
                                                                              e
And luckily for me, there isn't any work needed to fix the tool other than bumping the version of boost it depends on.
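To make the arithmetic behind the squeezed description column concrete, here is a small model of the two behaviours. It is inferred from the output shown above, not taken from the boost source. The 80-column line width, the unused last column, and the half-line cap are assumptions that happen to match that output, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumed default line width; not read from boost.
const std::size_t kLineLength = 80;

// Old behaviour (as seen with boost 1.41.0): the first column is as wide as
// the option text plus one separating space, and the description gets
// whatever is left, minus one column that stays unused at the end of the line.
std::size_t description_width_uncapped(std::size_t option_text_len) {
    std::size_t first_column = option_text_len + 1;
    assert(first_column + 1 < kLineLength);
    return kLineLength - 1 - first_column;
}

// Newer behaviour (as seen with boost 1.42.0): the first column is capped so
// the description always keeps a minimum width. Half the line is an assumption.
std::size_t description_width_capped(
        std::size_t option_text_len,
        std::size_t min_description = kLineLength / 2) {
    std::size_t cap = kLineLength - min_description - 1;
    std::size_t first_column = std::min(option_text_len, cap) + 1;
    return kLineLength - 1 - first_column;
}
```

With the 28-character option text from the CONFIG=aaa run, both functions give 50 columns, which matches the wrapped help text above. Growing the option text to 77 characters leaves a single column in the uncapped model, while the capped model still leaves 39.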