November 16, 2014

Databases

Our databases are inflexible. They are designed for specific tasks, even when they're intended to be general purpose. They make decisions for us, choosing how consistency, availability, and partition tolerance fit together. As a result, we end up selecting our databases based on the shape of the data we want to store, rather than on their maturity and features.

Normalized databases lend themselves to many different kinds of data, but rigidly enforce a strong consistency policy. Denormalized databases generally relinquish consistency in favor of availability. More specialized data stores like Redis prefer consistency and pure data structures, but ultimately don't scale well, and they sacrifice durability in a wild dash for raw speed.

We need a database that gets out of our way and gives us the flexibility to get the job done, minimizing the resources we spend developing against a slew of databases and then managing them all in production. We need a database that gives us exactly enough control over consistency, availability, and partition tolerance, and that lets us choose how they fit together.

We need to choose which sets of data are versioned, which are strongly consistent, and which use CRDTs for availability. Some data, like configuration data, needs to be highly available for reads but should have consistent writes. Some data needs to scale efficiently with the number of objects, to the point that it needs automatic warehousing. Some data needs reporting complex enough that it must roll up into summarized records automatically, while other data needs implicit two-phase commits. We shouldn't have to implement all these strategies at the application level, and we shouldn't have to implement them over and over again.

We need a better database.
