Building a Scalable Geospatial Database on top of Apache Cassandra - Mike Malone (SimpleGeo)
This talk will explore the real world technical challenges we overcame at SimpleGeo while building a spatial database on top of Apache Cassandra. Cassandra offers simple decentralized operations, no single point of failure, and near-linear horizontal scalability. But Cassandra fell far short of providing the sort of sophisticated spatial queries we need. Our challenge was to bridge that gap.
This was the most interesting talk I attended. The main part of the talk was on how to use a distributed hash table, in particular Cassandra, as a spatial database. The key problem is how to support the types of queries a spatial database needs, including:
- Exact match: find a particular key
- Range: find all keys in some interval
- Proximity: find the nearest neighbors to a key
- Misc others: reasonable expectation of being able to adapt to new use cases
The issues they ran into with the space-filling curve approach included:
- Poor locality for some points. This is a general problem with space-filling curves: some points that are close together in the n-dimensional space end up much further apart along the curve. In practice, this means that some searches will be much more expensive than they should be.
- Non-random distribution of data. The default partition function spreads the data out randomly, which avoids hotspots where many keys fall in the same bucket. Customizing the partition function to preserve order exposed the skew inherent in the dataset. In the presentation he showed a photo of the distribution of lights, with clusters around cities. A similar photo is shown below of Egypt with obvious clustering around the Nile river.
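To make the locality problem concrete, here is a minimal sketch. The talk summary above does not say which space-filling curve SimpleGeo used, so this assumes a Z-order (Morton) curve over integer grid cells; the names interleave and range_query and the 16-bit grid size are made up for illustration. A range query over the sorted keys stands in for what an order-preserving partitioner makes possible.

```python
from bisect import bisect_left, bisect_right

BITS = 16  # bits per coordinate; an arbitrary choice for this sketch


def interleave(x: int, y: int, bits: int = BITS) -> int:
    """Z-order (Morton) key: interleave the bits of x and y."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return key


def range_query(sorted_keys, lo, hi):
    """Return every key in the closed interval [lo, hi] from a sorted list."""
    return sorted_keys[bisect_left(sorted_keys, lo):bisect_right(sorted_keys, hi)]


# Two pairs of horizontally adjacent cells, one step apart in 2-D space.
near = abs(interleave(2, 2) - interleave(3, 2))  # 1: neighbours on the curve too
far = abs(interleave(7, 0) - interleave(8, 0))   # 43: the curve jumps quadrants

# On a 4x4 grid, the key interval [4, 7] covers the 2x2 block x in {2, 3}, y in {0, 1}.
keys = sorted(interleave(x, y) for x in range(4) for y in range(4))
block = range_query(keys, 4, 7)
```

The second pair is just as close in space as the first, but a range scan that wants both cells has to cover a much longer stretch of the curve, which is the "more expensive than they should be" case.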
Overall, a nice presentation with a good progression through their various attempts and an explanation of the issues they encountered.
Teach a Dog to REST - Brian Mulloy (apigee)
It's been 10 years since Fielding first defined REST. So, where are all the elegant REST APIs? While many claim REST has arrived, many APIs in the wild exhibit arbitrary, productivity-killing deviations from true REST. We'll start with a typical poorly-designed API and iterate it into a well-behaved RESTful API.
Nothing spectacular, but he did have some reasonable advice for constructing APIs and covered some of the common problems they have seen. This presentation also had more of a sales element, with the speaker frequently mentioning the apigee console for learning and playing around with APIs for popular services such as Twitter and LinkedIn. I personally found the speaker annoying, e.g. he had a schtick about not knowing how to pronounce idempotent methods that I'm pretty sure was an attempt at self-deprecation to make the talk more appealing to a non-technical audience. The brief summary is:
- Be RESTful. The speaker seems to prefer RESTful interfaces over traditional RPC interfaces such as SOAP or JSON-RPC. The primary reasoning is that it leads to greater simplicity and fewer endpoints for the developer. His preferred interface is two URLs per resource: one for a collection, such as /dogs; and one for a specific element, such as /dogs/cujo. I liked his focus on APIs that are easy for developers to understand and his push for conventions that make it easier to reason about how APIs should work. If done right you can guess what the API will be without ever having to look at the documentation.
- Verbs are bad. Nouns are good. At first you might think he is a subject of Evil King Java, but it is not quite the same. The RESTful model is about managing resources, and the argument is that the verbs are already provided as part of the HTTP protocol. So really it is verbs in the URL that are bad. URLs should refer to a noun.
- Plurals are better. Here he is referring to the names for collections, and he clearly stated that this point was just his opinion. I don't really have a strong preference, but I do agree with him that if there were a widely used convention, it would be much easier to guess what the URL should be for a given API. Plurals also seem to make it clearer that the response will be a collection instead of a single item.
- Move complexity after the question mark. The basic idea here was that the messy parts of the API should be made query parameters to the URL. The justification is that there will be some mess and that other locations, such as HTTP headers, are more obscure and difficult to quickly hack together in a browser. Another good point I think he had is that you should try to make the API trivial to start using. The easier it is to play around with an API the more likely it is to get used.
- Borrow from leading APIs. This goes back to his theme about convention. By following other popular APIs it is more likely your API will be familiar to new developers looking at your system. He also mentioned that in his opinion LinkedIn was currently doing the best job of designing clean, easy-to-use APIs for their offerings.
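As a rough illustration of the points above, here is a minimal sketch of the two-URLs-per-resource convention, with the HTTP method supplying the verb and the messy parts pushed into the query string. Only /dogs and /dogs/cujo come from the talk; the handle function, the in-memory dogs store, and the color filter are hypothetical and not anything the speaker showed.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory store standing in for a real backend.
dogs = {"cujo": {"name": "cujo", "color": "brown"}}


def handle(method: str, url: str, body: Optional[dict] = None):
    """Dispatch on HTTP method plus a noun URL; return (status, payload)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    query = parse_qs(parts.query)
    if not segments or segments[0] != "dogs" or len(segments) > 2:
        return 404, None

    if len(segments) == 1:  # collection URL: /dogs
        if method == "GET":
            items = list(dogs.values())
            if "color" in query:  # complexity lives after the question mark
                items = [d for d in items if d["color"] in query["color"]]
            return 200, items
        if method == "POST":
            dogs[body["name"]] = body
            return 201, body
        return 405, None

    name = segments[1]  # element URL: /dogs/<name>
    if name not in dogs:
        return 404, None
    if method == "GET":
        return 200, dogs[name]
    if method == "PUT":
        dogs[name] = body
        return 200, body
    if method == "DELETE":
        del dogs[name]
        return 204, None
    return 405, None
```

With this shape, handle("GET", "/dogs?color=brown") lists the matching dogs and handle("DELETE", "/dogs/cujo") removes one, without a verb ever appearing in a URL.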
Your API Sucks - Marsh Gardiner (apigee)
We've learned the hard way that websites need great user experiences to survive. So why aren't we being this aggressive with API design? What are the deeper reasons behind why REST killed SOAP? And why aren't all API providers thinking about the truly important issues, making APIs that will be used by people? Come for the hall of shame and stay for the wake-up call.
A boring series of "don't do this" examples. At least the previous speaker bothered to explain why he was pushing for APIs to be a certain way. The speaker reminded me of John Hodgman, but without the humor. Waste of time.
Lunch
They had some pre-made sandwiches for the lunch. I don't make it into San Francisco that often so I decided to eat out instead.
Scaling Your Web App - Sebastian Stadil (Scalr)
Got app? Learn to scale it, with tricks for creating and managing scalable infrastructure on EC2 or elsewhere.
I came in late to this talk. The part I saw was him showing off their UI. Complete waste of time; I might as well have flipped through the tour on their website.
Inside MongoDB - Alvin Richards (mongoDB)
In this talk we'll describe and discuss MongoDB's data format (BSON), the insert path, the query optimizer, auto-sharding, replication, and more. The talk will be of interest to developers interested in MongoDB and looking to learn more about what's going on "under the hood", as well as anyone interested in distributed systems and the design decisions that go into creating a system like MongoDB.
Not a bad introductory overview. You could probably get the same information by spending an hour reading through the mongoDB documentation, but you wouldn't have easy access to someone for questions.
AWS Feedback Session - Jeffrey Barr (Amazon Web Services)
If you are an AWS user and want to ask questions or provide feedback, here's your chance. Senior AWS Evangelist Jeff Barr will be conducting an interactive feedback session on EC2, S3, RDS, and the other services. All of the feedback will be routed directly to the product teams.
This session was only really useful as a more direct way to communicate issues to Amazon. The speaker was quite knowledgeable about the Amazon stack and it's good to see they are eager to get customer feedback. One aspect that came up several times was the poor support for Windows. The two issues I remember were the long delay before new versions of Windows are available to use and, one I found quite amusing, that if you create a snapshot of a Windows VM then apparently the admin password is changed in the original VM.
Hackathon
I skipped the hackathon.
Summary
Not bad for a free event. I heard from others that some of the sessions were more blatant sales pitches than the ones I attended. Very little technical depth in most of the presentations.