I gave a talk at AWS re:Invent last week along with Alex DeBrie. The talk was about deploying modern and efficient data models with Amazon DynamoDB. One small part of the talk was about the “single-table” design pattern. In the couple of days since, I have been flooded with questions about this pattern. I’m not really sure what all the hoopla is about, or why there is so much passion and almost religious fervor around this topic.
With an RDBMS, there are clearly defined benefits and drawbacks to normalization (and denormalization). Normalization and denormalization are an exercise in trading off between a well understood set of redundancies and anomalies on one side, and runtime complexity and cost on the other. When you normalize your data, the only mechanism to get it “back together” is a “join”.
If you happen to use a database that doesn’t support joins, or if joins turn out to be expensive, you may prefer to accept the redundancies and anomalies that come with denormalization. This has been a long established pattern, for example in the analytics realm.
The “single-table” design pattern extends traditional RDBMS denormalization in three interesting ways. First, it quite often uses non-atomic datatypes, which the normalized relations of Codd, Date, and others do not allow. Second, it makes use of the flexible schema support in NoSQL databases to commingle data from different entities in a single table. Finally, it uses the data colocation guarantees in NoSQL databases to minimize the number of blocks read, and the number of API calls required, when fetching related data.
Here’s what I think these options look like in practice.
First, this is a normalized schema with three tables. When you want to reconstruct the data, you join the tables. There are primary and foreign key constraints in place to ensure that data is consistent.
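A minimal sketch of that normalized option, using an in-memory SQLite database. The customer/order schema here is illustrative (it wasn’t part of the talk); the point is that reconstructing the data requires a join across all three tables.

```python
# Hypothetical normalized schema: three tables, with primary and foreign
# key constraints, joined back together at read time.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL
);
CREATE TABLE order_items (
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    sku         TEXT NOT NULL,
    quantity    INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1, '2019-12-02')")
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                 [(100, 'SKU-1', 2), (100, 'SKU-2', 1)])

# Reconstructing one order touches all three tables.
rows = conn.execute("""
    SELECT c.name, o.order_date, i.sku, i.quantity
    FROM customers c
    JOIN orders o      ON o.customer_id = c.customer_id
    JOIN order_items i ON i.order_id    = o.order_id
    ORDER BY i.sku
""").fetchall()
```

Each fact lives in exactly one place, so there is no redundancy to keep consistent, but every read that needs the full picture pays for the joins.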
The next option is the fully denormalized structure where data from all tables is “pre-joined” into a single table.
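The denormalized option might look like the sketch below (again with illustrative names): the same data flattened into one wide table, with the customer name and order date repeated on every item row.

```python
# The data "pre-joined" into a single wide table. Reads need no join,
# but the redundant copies are now the application's problem.
pre_joined = [
    {"name": "Ana", "order_date": "2019-12-02", "sku": "SKU-1", "quantity": 2},
    {"name": "Ana", "order_date": "2019-12-02", "sku": "SKU-2", "quantity": 1},
]

# An update anomaly in action: renaming the customer means touching
# every row that carries the redundant copy of the name.
for row in pre_joined:
    if row["name"] == "Ana":
        row["name"] = "Ana Lopez"
```

This is the classic trade: cheap reads in exchange for the redundancies and anomalies that normalization was designed to eliminate.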
The single-table schema is just slightly different: data for all entities is commingled into a single table.
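Here is a rough sketch of that single-table shape, simulated with plain Python so it runs anywhere. The `PK`/`SK` key names and entity layout are illustrative DynamoDB conventions, not mandated by the pattern: heterogeneous items share a partition key, the order item carries a non-atomic list attribute, and one query over the partition fetches the whole colocated collection.

```python
# One "table" holding items of different entity types. Items that share a
# partition key (PK) form a colocated item collection.
table = [
    {"PK": "CUST#1", "SK": "PROFILE",   "type": "customer", "name": "Ana"},
    {"PK": "CUST#1", "SK": "ORDER#100", "type": "order",
     "order_date": "2019-12-02",
     # Non-atomic attribute: the order's items embedded as a list.
     "items": [{"sku": "SKU-1", "qty": 2}, {"sku": "SKU-2", "qty": 1}]},
    {"PK": "CUST#2", "SK": "PROFILE",   "type": "customer", "name": "Bo"},
]

def query(pk):
    """Stand-in for a key-condition Query: one call returns the
    customer and all of their orders together."""
    return [item for item in table if item["PK"] == pk]

collection = query("CUST#1")
```

The read that previously required a three-way join (or multiple round trips) becomes a single fetch over one item collection.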
Application designers and data modelers should look at their specific use cases and decide: either eliminate redundancies and inconsistencies and pay the cost of joins (or of performing the join themselves in multiple steps if the database doesn’t support it), or denormalize and benefit from lower complexity and sometimes lower cost!
The other thing to keep in mind is that nothing in the single-table design pattern requires you to bring all entities into a single table. A design where some entities are combined into a single table coexists perfectly with others that are not.
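A hybrid along those lines might look like this sketch (names are hypothetical): entities that are fetched together share a table, while an entity that is accessed independently keeps its own.

```python
# Customers and their orders are commingled because they are read together;
# the product catalog stays in its own table because it is accessed on its own.
customer_orders = [
    {"PK": "CUST#1", "SK": "PROFILE",   "name": "Ana"},
    {"PK": "CUST#1", "SK": "ORDER#100", "order_date": "2019-12-02"},
]
products = [
    {"PK": "PROD#SKU-1", "description": "Widget"},
    {"PK": "PROD#SKU-2", "description": "Gadget"},
]
```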
What’s all the hoopla about?