Old write-up about CAP Theorem

In 2011, Ken Rugg and I were having a number of conversations around CAP Theorem and after much discussion, we came up with the following succinct inequality. It was able to help us much better speak to the issue of what constituted “availability”, “partition tolerance”, and “consistency”. It also confirmed our suspicions that availability and partition tolerance were not simple binary attributes; Yes or No, but rather that they had shades of gray.

So here’s the write-up we prepared at that time.

parelastic-brewers-conjecture

Unfortunately, the six part blog post that we wrote (on parelastic.com) never made it in the transition to the new owners.

More than 99% of Blockchain Use Cases Are Bullshit

I’ve been following the blockchain ecosystem for some time now largely because it strikes me as yet another distributed database architecture, and I dream about those things.

For some time now, I’ve been wondering what to do after Tesora and blockchain was one of the things I’ve been looking at as a promising technology but I wasn’t seeing it. Of late I’ve been asking people who claim to be devotees at the altar of blockchain what they see as the killer app. All I hear are a large number of low rumbling sounds.

And then I saw this article by Jamie Burke of Convergence.vc and I feel better that I’m not the only one who feels that this emperor is in need of a wardrobe.

Let’s be clear, I absolutely agree that bitcoin is a wonderful use of the blockchain technology and it solves the issue of trust very cleverly through proof of work. I think there is little dispute of elegance of this solution.

But once we go past bitcoin, the applications largely sound and feel like my stomach after eating gas station sushi; they sound horrible and throw me into convulsions of pain.

In his article, Jamie Burke talks of 3d printing based on a blockchain sharded CAD file. I definitely don’t see how blockchain can prevent the double-spend (i.e. buy one Yoda CAD file, print 10,000).

Most of the blockchain ideas I’m seeing are things which are attempting to piggy-back on the hot new buzzword and where blockchain is being used to refer to “secure and encrypted database”. After all, there’s a bunch of crypto involved and there’s data stored there right? so it must be a secure encrypted database.

To which I say, Bullshit!

P.S. Oh, the irony. This blog post references a blog post with a picture labeled “Burke’s Bullshit Cycle”, and the name of this blog is hypecycles.com.

Is this the end of NoSQL?

If it is, you read it here first!

I posted this article on my other (work related) blog.

I think the future for NoSQL isn’t as bright as a lot of pundits would have you believe. Yes, Yes, I know that MongoDB got a $1.2 billion valuation. Some other things to keep in mind.
  1. In the heyday of OODBMS, XML DB, and OLAP/MDX, there was similar hype about those technologies.
  2. Today, more and more NoSQL vendors are trying to build “SQL’isms” into their products. I often hear of people who want a product that has the scalability of NoSQL with transactions and a standard query language. Yes, we have that; it is called a horizontally scalable RDBMS!

Technologies come and technologies go but the underlying trends are worth understanding.

And the trends don’t favor NoSQL.

Ingesting data at over 1,000,000 rows/second with MySQL in Amazon’s cloud!

I just posted this article on my other (work related) blog.

http://www.parelastic.com/blog/ingesting-over-1000000-rows-second-mysql-aws-cloud

Just to be clear, this was with standard MySQL, InnoDB, and with machines in Amazon’s cloud (AWS).

The data was inserted using standard SQL INSERT statements and can be queried immediately using SQL as well. All standard database stuff, no NoSQL tomfoolery going on.

This kind of high ingest rate has long been considered to be out of the reach of traditional databases; not at all true.

Why MongoDB and NoSQL make me want to scream

Recently I saw an article on the MongoHQ blog where they described a “slow query” and how to improve the performance.

The problem is this. I have a document with the following fields:

  • ID
  • submitdate
  • status
  • content

And the status can be something like ‘published’, ‘rejected’, ‘in progress’, ‘draft’ etc.,

I want to find all articles with some set of statuses and sorted by submit date.

Apparently the MongoDB solution to this problem (according to their own blog) is to:

  1. create a new field called ‘unpublished_submit_date’
  2. set that field to a ‘null’ value if the document is of an uninteresting status (i.e. published)
  3. set that field to the submitdate if it is an interesting status (i.e not published)
  4. then query on the single column unpublished_submit_date

Really? Really? You’ve got to be kidding me.

For more on this interesting exchange, a response from a MongoDB fanboy, and a follow-up, read my work blog at

http://parelastic.com/blog/more-subject-improving-performance-removing-query-logic

The things people have to do to use NoSQL, boggles the mind!

Comparing parallel databases to sharding

I just posted an article comparing parallel databases to sharding on the ParElastic blog at http://bit.ly/JaMeVr

It was motivated by the fact that I’ve been asked a couple of times recently how the ParElastic architecture compares with sharding and it occurred to me this past weekend that

“Parallel Database” is a database architecture but sharding is an application architecture

Read the entire blog post here:

http://bit.ly/JaMeVr

Scaling MongoDB: A year with MongoDB (Engineering at KiiP)

Here is the synopsis:

  • A year with MongoDB in production
  • Nine months spent in moving 95% of the data off MongoDB and onto PostgreSQL

Over the past 6 months, we’ve “scaled” MongoDB by moving data off of it.

Read the complete article here: http://bit.ly/HIQ8ox