If it is, you read it here first!
I posted this article on my other (work related) blog.
I think the future for NoSQL isn’t as bright as a lot of pundits would have you believe. Yes, Yes, I know that MongoDB got a $1.2 billion valuation. Some other things to keep in mind.
- In the heyday of OODBMS, XML DB, and OLAP/MDX, there was similar hype about those technologies.
- Today, more and more NoSQL vendors are trying to build “SQL’isms” into their products. I often hear of people who want a product that has the scalability of NoSQL with transactions and a standard query language. Yes, we have that; it is called a horizontally scalable RDBMS!
Technologies come and technologies go but the underlying trends are worth understanding.
And the trends don’t favor NoSQL.
I just posted an article comparing parallel databases to sharding on the ParElastic blog at http://bit.ly/JaMeVr
It was motivated by the fact that I’ve been asked a couple of times recently how the ParElastic architecture compares with sharding and it occurred to me this past weekend that
“Parallel Database” is a database architecture but sharding is an application architecture
Read the entire blog post here:
A common myth that has been perpetrated is that relational database do not scale beyond two or three nodes. That, and the CAP Theorem are considered to be the reason why relational databases are unscalable and why NoSQL is the only feasible solution!
I ran into a very thought provoking article that makes just this case yesterday. You can read that entire post here. In this post, the author Srinath Perera provides an interesting template for choosing the data store for an application. In it, he makes the case that relational databases do not scale beyond 2 or 5 nodes. He writes,
The low scalability class roughly denotes the limits of RDBMS where they can be scaled by adding few replicas. However, data synchronization is expensive and usually RDBMSs do not scale for more than 2-5 nodes. The “Scalable” class roughly denotes data sharded (partitioned) across many nodes, and high scalability means ultra scalable systems like Google.
In 2002, when I started at Netezza, the first system I worked on (affectionately called Monolith) had almost 100 nodes. The first production class “Mercury” system had 108 nodes (112 nodes, 4 spares). By 2006, the systems had over 650 nodes and more recently much larger systems have been put into production. Yet, people still believe that relational databases don’t scale beyond two or three nodes!
Systems like ParElastic (Elastic Transparent Sharding) can certainly scale to much more than two or three nodes, and I’ve run prototype systems with upto 100 nodes on Amazon EC2!
Srinath’s post does contain an interesting perspective on unstructured and semi-structured data though, one that I think most will generally agree with.