Sunday, June 7, 2009

When You Know Exactly What You're After: Key-Value Databases

In the beginning...

Back in 1979, Ken Thompson, a programming deity responsible for Unix (and, with Dennis Ritchie, one of the original Unix 'graybeards') and the B language (predecessor to the more familiar C), wrote a program named, in typical Unix terseness, 'dbm'. This was short for 'database manager', which was a nice thing to have on a computer then, as it is now. Essentially, it stored data identified by keys in a file. This sort of data structure is known in Python circles as a dictionary, to Perl programmers as (way back when) an associative array or (more recently) a hash, and in Java and its smarter little brother C#, a Map. It makes for a better way of storing or persisting data (and accessing it) than a generic flatfile approach. What the user doesn't get is a nice declarative interface for pulling data out via queries specifying conditions of interest (date ranges, department, and yadda yadda yadda). If you don't know the key, tough luck.
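To make the model concrete, here's a quick sketch using Python's standard-library dbm module (a direct descendant of Thompson's original; the keys and values are made up):

```python
import dbm

# Store a few values, each identified by a key; dbm persists them to a file.
with dbm.open("people", "c") as db:      # "c": create the file if needed
    db["ken"] = "Unix, B, dbm"
    db["dennis"] = "Unix, C"

# Later (even from another process), pull a value back out -- but only
# if you already know the key. There is no query language here.
with dbm.open("people", "r") as db:
    print(db["ken"].decode())            # -> Unix, B, dbm
```

Note that's the whole interface: put by key, get by key. Everything else is your problem.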

DBM is still here 30 years later, and by here I mean here on my Mac. If I type 'man dbm' at the prompt, I get the man page for it, and so will you, if you are a Mac or Linux user. Although some BSD code made its way into Microsoft code, dbm didn't, so Windows users won't have it. Not that that is a terrible loss in this day and age.

Let a Thousand dbm, Jrs Bloom

As is often the case, dbm spawned a number of successors, a notable one being the Berkeley Database, created at Berkeley as part of the 1986-1994 effort to move from BSD 4.3 to 4.4, and also to move away from AT&T code. In 1996, Netscape asked for some extensions for its needs (an LDAP server and the browser), which led to the formation of the Sleepycat software company, which maintained and sold the embedded database through 2006, when it was acquired by Oracle (recent acquirer of Sun and, with it, another well-known and popular database, MySQL). A visit to the official site these days reveals almost nothing useful; in addition to being stripped of the colorful name (it's now 'Oracle Berkeley DB'), the site is the sort of scrubbed marketing enterprisese Oracle does really well. It is still a viable embedded (no server) database, used by, among other tools, OpenLDAP and the source control tool Subversion. (Firefox, the open-source successor to the Netscape browser, now uses SQLite to store bookmarks, history, and so on.)

Berkeley DB, in addition to being small (the library is about 300K), fast, ACID compliant, and scalable, gives the user four storage options: Hash (as with the old-style dbm, for 'I know what I'm looking for' storage), B-Tree (good for searching ranges), persistent Queue, and Recno (record numbers are the keys, and these are generated automatically). Also interesting is that a developer has the option of turning features like locking or transactions off or on as needed. In general, the prevailing value seems to be giving a developer control over a datastore without a lot of administration overhead, allowing the developer to optimize data access or shoot himself in the foot, depending on the choices she makes (I mixed up the developer's gender there. It happens.)
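To see why you might pick B-Tree over Hash, here's a conceptual sketch in Python (a plain dict and a sorted key list standing in for the two access methods; these are not actual Berkeley DB calls, and the dates are made up):

```python
import bisect

# Hash-style store: exact-key lookup only.
store = {"2009-06-01": "backup", "2009-06-03": "deploy", "2009-06-07": "post"}
print(store["2009-06-03"])                      # fine: we know the key

# B-Tree-style access: keys are kept sorted, so a range query
# ("everything in early June") is just a walk between two positions.
keys = sorted(store)
lo = bisect.bisect_left(keys, "2009-06-01")
hi = bisect.bisect_right(keys, "2009-06-05")
print([(k, store[k]) for k in keys[lo:hi]])
```

With a pure hash, that range query would mean visiting every record.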

One interesting thing I discovered in my investigations of Berkeley DB is the way joins are handled. Joins are very familiar to relational database users, but it's not obvious how one would join tables in a key-value store. First off, Berkeley DB has this thing called the 'secondary index', which gives one a way of accessing data using something other than the key (removing that limitation). For example, suppose you have an employee database, and you want to find employees in a given department. Rather than going through every record in the database, you'd now have the option of looking up employees according to department.
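In key-value terms, a secondary index is just another key-value mapping whose values point back at primary keys. A sketch in Python with made-up employee data (Berkeley DB can maintain such an index for you automatically once you register a key-extraction callback; here we build it by hand):

```python
# Primary store: SSN -> employee record (made-up data).
employees = {
    "111-11-1111": {"name": "Smith", "dept": "sales"},
    "222-22-2222": {"name": "Jones", "dept": "engineering"},
    "333-33-3333": {"name": "Smith", "dept": "engineering"},
}

# Secondary index: department -> set of SSNs. In Berkeley DB this would
# live in a second database associated with the primary.
by_dept = {}
for ssn, rec in employees.items():
    by_dept.setdefault(rec["dept"], set()).add(ssn)

# Now "employees in engineering" is a single lookup, not a full scan.
print(sorted(by_dept["engineering"]))  # -> ['222-22-2222', '333-33-3333']
```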

Using secondary indices, it's possible to join tables. In the best Berkeley DB reference I found, that's done like this (the example below is lifted from that source; I give attribution anyhow):

Consider the following three databases:

  • personnel: key = SSN; data = record containing name, address, phone number, job title
  • lastname: key = last name; data = SSN
  • job: key = job title; data = SSN

Consider the following query:

Return the personnel records of all people named smith with the job
title manager.

This query finds all the records in the primary database (personnel) for which the criteria lastname=smith and job title=manager are true.

Assume that all databases have been properly opened and have the handles: pers_db, name_db, job_db. We also assume that we have an active transaction to which the handle txn refers.

DBC *name_curs, *job_curs, *join_curs;
DBC *carray[3];
DBT key, data;
int ret, tret;

name_curs = NULL;
job_curs = NULL;
join_curs = NULL;
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));

/* Position a cursor on "smith" in the lastname secondary index. */
if ((ret = name_db->cursor(name_db, txn, &name_curs, 0)) != 0)
    goto err;
key.data = "smith";
key.size = sizeof("smith");
if ((ret = name_curs->c_get(name_curs, &key, &data, DB_SET)) != 0)
    goto err;

/* Position a cursor on "manager" in the job title secondary index. */
if ((ret = job_db->cursor(job_db, txn, &job_curs, 0)) != 0)
    goto err;
key.data = "manager";
key.size = sizeof("manager");
if ((ret = job_curs->c_get(job_curs, &key, &data, DB_SET)) != 0)
    goto err;

carray[0] = name_curs;
carray[1] = job_curs;
carray[2] = NULL;

/* Join: iterate the primary records matching both cursors. */
if ((ret = pers_db->join(pers_db, carray, &join_curs, 0)) != 0)
    goto err;
while ((ret = join_curs->c_get(join_curs, &key, &data, 0)) == 0) {
    /* Process record returned in key/data. */
}

/*
 * If we exited the loop because we ran out of records,
 * then it has completed successfully.
 */
if (ret == DB_NOTFOUND)
    ret = 0;

err:
if (join_curs != NULL &&
    (tret = join_curs->c_close(join_curs)) != 0 && ret == 0)
    ret = tret;
if (name_curs != NULL &&
    (tret = name_curs->c_close(name_curs)) != 0 && ret == 0)
    ret = tret;
if (job_curs != NULL &&
    (tret = job_curs->c_close(job_curs)) != 0 && ret == 0)
    ret = tret;

return (ret);

Simple, right? This brings up another point: all the major scripting languages have modules or libraries allowing a developer to accomplish this in a much briefer and more straightforward
manner. Life is too short to program in C when you don't need to.
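For comparison, here's the same join sketched with plain Python dictionaries (made-up data, standing in for the three databases above): the join boils down to intersecting the SSN sets the two secondary indices hand back, then fetching from the primary.

```python
# The three 'databases' from the example, as dictionaries (made-up data).
personnel = {
    "111-11-1111": {"lastname": "smith", "title": "clerk"},
    "222-22-2222": {"lastname": "smith", "title": "manager"},
    "333-33-3333": {"lastname": "jones", "title": "manager"},
}
by_lastname = {"smith": {"111-11-1111", "222-22-2222"},
               "jones": {"333-33-3333"}}
by_title = {"clerk": {"111-11-1111"},
            "manager": {"222-22-2222", "333-33-3333"}}

# The join: intersect the two candidate sets, then hit the primary.
matches = by_lastname["smith"] & by_title["manager"]
for ssn in sorted(matches):
    print(ssn, personnel[ssn])   # the smiths who are managers
```

No cursors, no gotos, no cleanup block. That's the briefer and more straightforward manner I mean.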

A Product Of Fine Japanese Engineering

Berkeley DB, like all successful software packages, has a number of offshoots, and a particularly interesting one is Tokyo Cabinet, which can be an embedded database, or one can use the Tokyo Tyrant server for remote access. Also available: Tokyo Dystopia for full text search. Watch out for future releases, Tokyo Epidemic, Tokyo Martial Law, and Tokyo Zombie Apocalypse.

Tokyo Cabinet was written by Mikio Hirabayashi whilst working for (apparently, it's like Facebook). Hirabayashi is also known for QDBM (yes, part of the DBM family). TC takes some of the ideas in Berkeley DB and pushes them further. Whereas your dbms and Berkeley DBs are primarily APIs for accessing local files, TC ups the ante with the aforementioned Tyrant for remote access (also available: replication, transaction logs, and hot backup). There are also a number of options for your underlying storage (as with Berkeley DB): Hash, B-Tree, Fixed-Length, and one that distinguishes it from Berkeley DB: the Table engine, which mimics an RDBMS, but in a schema-less way. It is somewhat like CouchDB, which we'll eventually get to. The developer can declare indexes on columns and query the table in a SQL-esque manner. While you wouldn't want to use it in a situation where a true RDBMS is what you need, it is definitely handy.
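To give the flavor of the table engine, here's a conceptual Python sketch (not the actual Tokyo Cabinet API, and the data is made up): each record is a bag of columns attached to a primary key, no two records need the same columns, and a query filters on column values.

```python
# Schema-less 'table': primary key -> column map. Note the records
# don't all have the same columns -- that's the schema-less part.
table = {
    "1": {"name": "hirabayashi", "lang": "c"},
    "2": {"name": "smith", "lang": "python", "editor": "vim"},
    "3": {"name": "jones", "lang": "c"},
}

# A query in the spirit of TC's table engine: filter on a column value.
# (In real TC, an index declared on the column makes this fast.)
def query(table, col, value):
    return [pk for pk, rec in table.items() if rec.get(col) == value]

print(sorted(query(table, "lang", "c")))   # -> ['1', '3']
```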

Tokyo Cabinet has developed a loyal following, who praise its speed, flexibility, and ease of use. It also has its detractors. There are bindings for numerous languages, but C, Perl, Ruby, Java, and Lua appear to be best supported: in my research I found people complaining that the pytc module for Python lacks support for some features.

As Tokyo Cabinet has developed some momentum, is fairly well established, and is interesting, I plan to dig into it further. Still, in the next installment, we will cover other databases.

Some fun Tokyo Cabinet links:

The Online Manual

Another Person's Notes on Distributed Key Stores

The Sourceforge Project Page