I’ve been following the blockchain ecosystem for some time now largely because it strikes me as yet another distributed database architecture, and I dream about those things.
For some time now, I’ve been wondering what to do after Tesora and blockchain was one of the things I’ve been looking at as a promising technology but I wasn’t seeing it. Of late I’ve been asking people who claim to be devotees at the altar of blockchain what they see as the killer app. All I hear are a large number of low rumbling sounds.
And then I saw this article by Jamie Burke of Convergence.vc and I feel better that I’m not the only one who feels that this emperor is in need of a wardrobe.
Let’s be clear, I absolutely agree that bitcoin is a wonderful use of the blockchain technology and it solves the issue of trust very cleverly through proof of work. I think there is little dispute of elegance of this solution.
But once we go past bitcoin, the applications largely sound and feel like my stomach after eating gas station sushi; they sound horrible and throw me into convulsions of pain.
In his article, Jamie Burke talks of 3d printing based on a blockchain sharded CAD file. I definitely don’t see how blockchain can prevent the double-spend (i.e. buy one Yoda CAD file, print 10,000).
Most of the blockchain ideas I’m seeing are things which are attempting to piggy-back on the hot new buzzword and where blockchain is being used to refer to “secure and encrypted database”. After all, there’s a bunch of crypto involved and there’s data stored there right? so it must be a secure encrypted database.
To which I say, Bullshit!
P.S. Oh, the irony. This blog post references a blog post with a picture labeled “Burke’s Bullshit Cycle”, and the name of this blog is hypecycles.com.
I just posted an article comparing parallel databases to sharding on the ParElastic blog at http://bit.ly/JaMeVr
It was motivated by the fact that I’ve been asked a couple of times recently how the ParElastic architecture compares with sharding and it occurred to me this past weekend that
“Parallel Database” is a database architecture but sharding is an application architecture
Read the entire blog post here:
Read all about it at C Mohan’s blog (cmohan.tumblr.com).
Mohan knows a thing or two about databases. As a matter of fact, keeping track of his database related achievements, publications and citations is in itself a big-data problem.
NonsenSQL, read all about it!
Great article on High Scalability entitled Revisiting Network I/O APIs: The netmap Framework. Get the paper they reference here.
As network performance continues, the bottleneck will become the amount of time spent in moving packets between the wire (hardware) and the application (software) and vice versa. The netamp framework is an interesting approach to address this.
Just reading this article http://www.wireclub.com/development/TqnkQwQ8CxUYTVT90/read describing one companies experiences migrating from SQL Server to MongoDB.
Having read the article, my only question to these folks is “why do it”?
Let’s begin by saying that we should discount all one time costs related to data migration. They are just that, one time migration costs. However monumental, if you believe that the final outcome is going to justify it, grin and bear the cost.
But, once you are in the (promised) MongoDB land, what then?
The things that this author believes that they will miss are:
- query expressiveness
- case insensitive indexes on text fields
Really, and you would still roll the dice in favor of a NoSQL science project. Well, then the benefits must be really really awesome! Let’s go take a look at what those are. Let’s take a look at what those are:
- MongoDB is free
- MongoDB is fast
- Freedom from rigid schemas
- ObjectID’s are expressive and handy
- GridFS for distributed file storage
- Developed in the open
OK, I’m scratching my head now. None of these really blows me away. Let’s look at these one at a time.
MongoDB is fast
- So is PostgreSQL and MySQL
Freedom from rigid schemas
- So are PostgreSQL and MySQL if you put them on the same SSD and multiple HDD’s like you claim you do with MongoDB
ObjectID’s are expressive and handy
- I’ll give you this one, relational databases are kind of “old school” in this department
GridFS for distributed file storage
Developed in the open
- Elastic Transparent Sharding schemes like ParElastic overcome this with Elastic Sequences which give you the same benefits. A half-way decent developer could do this for you with a simple sharded architecture.
- Yes, MongoDB is free and developed in the open like a puppy is “free”. You just told us all the “costs” associated with this “free puppy”
So really, why do people use MongoDB? I know there are good circumstances where MongoDB will whip the pants off any relational database but I submit to you that those are the 1%.
To this day, I believe that the best description of MongoDB is this one:
Mongo DB is web scale
A common myth that has been perpetrated is that relational database do not scale beyond two or three nodes. That, and the CAP Theorem are considered to be the reason why relational databases are unscalable and why NoSQL is the only feasible solution!
I ran into a very thought provoking article that makes just this case yesterday. You can read that entire post here. In this post, the author Srinath Perera provides an interesting template for choosing the data store for an application. In it, he makes the case that relational databases do not scale beyond 2 or 5 nodes. He writes,
The low scalability class roughly denotes the limits of RDBMS where they can be scaled by adding few replicas. However, data synchronization is expensive and usually RDBMSs do not scale for more than 2-5 nodes. The “Scalable” class roughly denotes data sharded (partitioned) across many nodes, and high scalability means ultra scalable systems like Google.
In 2002, when I started at Netezza, the first system I worked on (affectionately called Monolith) had almost 100 nodes. The first production class “Mercury” system had 108 nodes (112 nodes, 4 spares). By 2006, the systems had over 650 nodes and more recently much larger systems have been put into production. Yet, people still believe that relational databases don’t scale beyond two or three nodes!
Systems like ParElastic (Elastic Transparent Sharding) can certainly scale to much more than two or three nodes, and I’ve run prototype systems with upto 100 nodes on Amazon EC2!
Srinath’s post does contain an interesting perspective on unstructured and semi-structured data though, one that I think most will generally agree with.