Sunday, October 24, 2010

Apocalypse in 2012! Wait, shouldn't it be 4772?

After quickly losing interest in the movie 2012, I found myself reading about some of the claims for the 2012 phenomenon instead of paying attention to the film. For me the most interesting part was learning more about the Mayan long count calendar. The long count works in a similar way to Unix time in that it is a count from a fixed starting point known as the epoch. Unix time counts the number of seconds since January 1, 1970. The Mayan system counts the number of days since August 11, 3114 BCE.

I had always been under the impression that the 2012 fears were because it represented the end of the Mayan calendar. It turns out this isn't true. Some do seem to think it is the end of the calendar, but others just describe it as the end of a period. It is true that on December 21, 2012 the most significant digit, called the b'ak'tun, will increase by one and the least significant digits will be zero just like hitting 100,000 miles on the odometer of a car. Regardless of whether they think it is the end of the calendar or just of a given cycle, many do go on to claim the date represents the time when some catastrophic event will occur such as galactic alignment, solar flares causing geomagnetic reversal, Nibiru crashing into Earth, etc. Insane nonsense aside, the interesting thing to me was there is an overflow problem with the way the Mayan calendar, at least as I saw it described on Wikipedia, represents dates. Considering the recent problems with overflow such as Y2K and the coming year 2038 problem the obvious question is when will it overflow?

Overflow occurs because of limits in how the values are represented. For Y2K the problem was that many systems used two digits to represent the year. So the year 1900 would be recorded as "00" and 1970 would be recorded as "70". The problem is then how do you represent the year 2000? For Unix systems the time is typically stored as a signed 32-bit integer value. With one bit used for the sign, this means that there are 31 bits for the value to represent the number of seconds since the epoch. Do the math and you find: 2^31 / (60 s/min * 60 min/hr * 24 hr/day * 365.25 day/yr) = 68.05 years. With an epoch of January 1, 1970 the overflow will occur in 2038.
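The arithmetic above is easy to check. Here is a minimal Python sketch, assuming the usual signed 32-bit counter of seconds; the function names are mine, purely for illustration:

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_INT32 = 2**31 - 1  # largest value a signed 32-bit integer can hold

def years_until_overflow():
    """Approximate span of a 31-bit second counter, using 365.25-day years."""
    seconds_per_year = 60 * 60 * 24 * 365.25
    return 2**31 / seconds_per_year

def last_representable_time():
    """Last instant that still fits before the counter wraps."""
    return UNIX_EPOCH + timedelta(seconds=MAX_INT32)

print(round(years_until_overflow(), 2))
print(last_representable_time().isoformat())
```

The second function gives the exact moment rather than just the year, which is the more useful number if you actually have to worry about it.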

So how are Mayan dates represented? The Mayan date is represented with five digits. Each digit is base 20, except for the second-least-significant digit, which is base 18. The Mayans had symbols for representing quantities from 0 to 19, similar to how we use 0 to 9 with the decimal system. Do the math (20 * 20 * 20 * 18 * 20) and you find that the Mayan encoding can represent 2,880,000 distinct values. As mentioned earlier it was used to count the number of days since the epoch of August 11, 3114 BCE. Using 365.25 days per year the encoding will overflow after around 7885 years, on October 13, 4772 CE. For the purposes of media fear mongering I suppose the more imminent date is useful, but you would think for a movie they could exploit interesting aspects of the actual calendar system for the plot. Of course, with my particular proposal you would have to find some reason why the date rolling over and going back to zero in a calendaring system that nobody uses would cause harm.
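The same calculation for the long count can be sketched in Python. Two things here are my assumptions rather than anything stated above: that the base-18 place is the second from the right, and that December 21, 2012 corresponds to 13.0.0.0.0 (1,872,000 days after the epoch). I anchor on the 2012 date because Python's date type cannot represent 3114 BCE. The names are made up for illustration.

```python
from datetime import date, timedelta
from math import prod

# Radix of each place, most significant first:
# b'ak'tun, k'atun, tun, winal, k'in. Only the winal is base 18.
RADICES = (20, 20, 20, 18, 20)
CAPACITY = prod(RADICES)  # number of distinct five-digit values

# Assumed anchor: 13.0.0.0.0 falls on December 21, 2012.
ANCHOR_DATE = date(2012, 12, 21)
ANCHOR_DAYS = 13 * 20 * 20 * 18 * 20

def to_long_count(days):
    """Convert a day count into five long count digits, wrapping on overflow."""
    digits = []
    for radix in reversed(RADICES):
        days, digit = divmod(days, radix)
        digits.append(digit)
    return tuple(reversed(digits))

def overflow_date():
    """Gregorian date on which the five-digit count returns to all zeros."""
    return ANCHOR_DATE + timedelta(days=CAPACITY - ANCHOR_DAYS)

print(CAPACITY)
print(to_long_count(CAPACITY - 1), to_long_count(CAPACITY))
print(overflow_date())
```

Anything past the fifth digit is silently dropped by `to_long_count`, which is exactly the overflow behavior in question.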

Thursday, October 14, 2010

Happy Birthday C++

Twenty-five years ago today the first reference guide for C++ was published. Wired has an interview with Bjarne Stroustrup to celebrate the occasion. C++ was the programming language I was taught in high school and was the first language I learned for programming a desktop computer. On an irrelevant tangent, the first programming language that I learned was UserRPL for the HP 48G calculator. I competed in UIL calculator competitions using the HP 32SII and became addicted to RPN making most other calculators unusable. That and my focus on engineering in college made the HP 48G a natural choice when I needed a graphing calculator. As it turns out, one of my first C++ programs was a simple calculator that accepted a postfix expression as input (much easier than processing infix expressions).

Fax to Email

Hilarious video of a guy seeking venture capital for a business that would accept faxes and then manually send out emails based on the form that was faxed. This is 2010 for crying out loud. Better yet you can fax them your username and password and they will fax your emails back to you.

I had to check: Motel 6 lists wifi internet access as an amenity that is available at all of their locations.

Saturday, October 9, 2010

The Go Programming Language, for every page

I have been playing around with the Go Programming Language recently, and one really annoying aspect is that every page seems to have the same title: The Go Programming Language. With many tabs open to different pages from the package documentation, this can make it difficult to see which tab has the content you want.
Luckily, there is already bug 1158 to address the issue. We'll see how long it takes to get fixed.

Hijacking Error Messages

When running tests for a previous post, I was at first surprised that I didn't get an error about not being able to resolve the host. Poking at it, I found that bad host names all resolved to 208.68.139.38. It turns out this is a "feature" from Comcast called Domain Helper. They return an IP to a Comcast search service when a domain name doesn't resolve. Fortunately Comcast does have an easy way to opt out. Even if you grant that it is useful to normal users just running a browser, it can cause numerous problems for other tools that rely on DNS. Sadly, it seems many ISPs perform this kind of hijacking now.
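One way a tool could notice this kind of hijacking is to look up a few random names that almost certainly do not exist and see whether any of them resolve. This is only a rough Python sketch under that assumption; the function name and the `resolve` parameter are hypothetical, and the resolver is injectable so the logic can be exercised without touching the network.

```python
import socket
import uuid

def detect_nxdomain_hijack(resolve=socket.gethostbyname, probes=3):
    """Return the set of addresses handed back for names that should not exist.

    An empty set means every lookup failed, as it should. A non-empty set
    suggests the resolver substitutes its own address for missing domains.
    """
    answers = set()
    for _ in range(probes):
        # A random 32-character label is very unlikely to be registered.
        name = f"{uuid.uuid4().hex}.com"
        try:
            answers.add(resolve(name))
        except socket.gaierror:
            pass  # the expected outcome for a nonexistent name
    return answers
```

A crawler could run this once at startup and, if it gets addresses back, treat connections to those addresses as failed lookups.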

Another example of this sort is soft 404s. Some sites will return custom result pages with an HTTP 200 code instead of the proper error code. The rationale for site owners is that it can provide a better user experience than a generic error page from the web server. However, this is a very poor excuse as you can return a custom payload even with HTTP error responses. Having the proper error code means that automated tools and programs can correctly interpret the result. Hosting providers can also exploit HTTP errors by configuring the web server to return error pages similar to the Comcast search page to provide advertising and direct traffic back to the provider.
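To illustrate the point that a friendly page and a correct status code are not mutually exclusive, here is a small sketch using Python's standard http.server module. The handler class, the helper function, and the page text are all made up for the example.

```python
import html
from http.server import BaseHTTPRequestHandler

def not_found_response(path):
    """Build a custom error page that still carries the 404 status."""
    page = (
        "<html><body><h1>Sorry, we could not find that page</h1>"
        f"<p>Nothing lives at {html.escape(path)}.</p></body></html>"
    )
    body = page.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return 404, headers, body

class FriendlyNotFoundHandler(BaseHTTPRequestHandler):
    """Toy handler that answers every GET with the custom 404 page."""

    def do_GET(self):
        status, headers, body = not_found_response(self.path)
        self.send_response(status)  # humans see the page, tools see the 404
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it, pass the handler to `http.server.HTTPServer(("127.0.0.1", 8000), FriendlyNotFoundHandler)` and call `serve_forever()`; a browser shows the friendly page while a crawler sees the error code.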

Damn Java Socket Exception Messages

When creating an error message you should think about what information would be useful for understanding what went wrong. This should especially be true if you are creating a library that is likely to be used by many other systems. Providing good error messages up front means that even if the programmers using the library do not check and customize the messages, the user will still get a reasonable result. For some use cases, such as scripting, it can also be useful because it may be a quick one-off program where the goal is to quickly automate a repetitive task or perform some analysis. In this case, having useful default error messages can speed up the initial development so you get your answer faster.

In my case, I was checking logs for a system that crawls pages and hence attempts to resolve and connect to thousands of hosts. This system logs the exceptions, but unfortunately did not provide a customized message when doing so. Analyzing the logs showed that the two most common error messages were: 1) a failure to resolve the hostname and 2) failing to connect to an HTTP server on the host. An example error message for the first case is java.net.UnknownHostException: some-host-that-does-not-exist. This message is quite useful as the exception name explains the problem and the message tells me the name of the host that could not be resolved. An example message for the second case is java.net.SocketTimeoutException: connect timed out. This message explains the problem, but doesn't give me the crucial information of what it was trying to connect to.

Though this can easily be fixed in the application code, it is disappointing that the default message is so bad. I have noticed that my opinion of a programming language or technology seems to go down steadily the longer I am forced to use it at work. Is the grass greener on one of the other sides? How do other languages, or rather the networking libraries they provide, fare for this use case? I looked at 13 options to see how many would give a decent error message for both use cases. The results were not very encouraging. Only one option, Go, had reasonable messages for both. For the host not found case, 4 options included the hostname. Only two options provided the host and port in the failed to connect case. The results are summarized in the table below the fold with links to the source code and raw error messages.
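As for fixing it in the application code, the idea is simply to catch the low-level error and re-raise it with the host and port attached. The crawler in question is Java, but the shape is the same in any language, so here is a rough sketch in Python. `ConnectError`, `connect`, and the `opener` parameter are hypothetical names for illustration, not part of any library.

```python
import socket

class ConnectError(OSError):
    """Connection failure that records which host and port were being contacted."""

def connect(host, port, timeout=5.0, opener=socket.create_connection):
    """Open a TCP connection, adding the target to any error message."""
    try:
        return opener((host, port), timeout=timeout)
    except OSError as exc:
        # socket.timeout and socket.gaierror are both OSError subclasses,
        # so resolution failures and connect timeouts are both covered.
        raise ConnectError(f"could not connect to {host}:{port}: {exc}") from exc
```

With this wrapper a log line for a timeout names the host and port instead of just saying that something, somewhere, timed out, and the original exception is still available as the cause.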

Sunday, October 3, 2010

Family Planning

It was a popular week for family planning. First there was a TED talk by Mechai Viravaidya discussing the work he has done in Thailand to encourage family planning, in particular, the use of condoms. How Mr. Condom made Thailand a better place:

It is too bad the religious nuts in the U.S. aren't swayed by evidence showing that abstinence education does not work. Then I came across an article in Science, Has China Outgrown The One-Child Policy?, about the effects of the one-child policy on the Chinese population. One issue is a rapidly aging population:
The country has benefited from a "demographic dividend"—a surfeit of young workers born during a 1960s baby boom—that will dry up as China gets old before it gets rich. From 2010 to 2020, the number of Chinese aged 20 to 24 will drop by a whopping 45%, from 125 million to 68 million.
If you look at an age pyramid for China you can see that the population is starting to contract:

I'm not sure what caused the sharp reduction of individuals in their twenties compared to teens and those in their thirties. Here is a similar pyramid for the United States:

The pyramid for Afghanistan shows a more typical pyramid shape:

However, if you look carefully at the age pyramid for China there is another problem. There are considerably more males than females:
China's ratio of male to female births—now 119 boys born for every 100 girls—has been "really intensified by the family-planning policy," says Shuzhuo Li, a demographer at Xi'an Jiaotong University. The gender imbalance is projected to yield 30 million more men than women by 2030, heightening the risk of social instability.
The skew seems to be the result of cultural preferences for a male child leading to practices such as sex-selective abortions. These problems are further compounded by the complexity of the laws:
That decentralized structure, which still stands, has yielded a clunky policy that is comparable in complexity to the U.S. tax code, says Wang. To discourage sex-selective abortion, many provinces allow rural parents whose first child is a girl to try again for a boy, an exception sometimes called the "1.5-child policy." All told, there are 22 exceptions qualifying a couple for more children, ranging from one partner being disabled to one being a miner.
Another similarity to the U.S. tax industry is the huge bureaucracy that has been built up around the policy:
As of 2005, the family-planning bureaucracy had swollen to 509,000 employees, along with 6 million workers who help with implementation. Those stakeholders are "risk-averse," says Wang. "They pay no cost for doing nothing."
It'll be interesting to see how China reacts to these problems in the coming decades.