Quick test drive of #amazon #ec2 Provisioned IOPS EBS volumes

After getting the email this morning about the new provisioned IOPS EBS volumes, I took a small test drive.

It is really easy to get yourself a provisioned IOPS volume; when creating the volume there’s a new selection.

One of the things that has long annoyed me about Amazon EC2 network and storage performance is that it is highly variable. The target for provisioned IOPS is exactly in the sweet spot of where I want it to be; database servers.

With provisioned IOPS, it appears that we’re seeing the first semblance of SLA’s or guaranteed quality of service for storage in the cloud. This is huge!

I’ve setup a multi-volume RAID set and am running performance tests and the numbers look good but what I like the most so far is that they are steady. That’s just awesome! More to come as I get the results.

Comparing parallel databases to sharding

I just posted an article comparing parallel databases to sharding on the ParElastic blog at http://bit.ly/JaMeVr

It was motivated by the fact that I’ve been asked a couple of times recently how the ParElastic architecture compares with sharding and it occurred to me this past weekend that

“Parallel Database” is a database architecture but sharding is an application architecture

Read the entire blog post here:

http://bit.ly/JaMeVr

Cloud CPU Cost over the years

Great article by Greg Arnette about the crashing cost of CPU Costs over the years, thanks to the introduction of the cloud.

http://www.gregarnette.com/blog/2011/11/a-brief-history-cloud-cpu-costs-over-the-past-5-years/

Personally, I think the most profound one was in December 2009 with the introduction of “spot pricing”.

Effectively you have an auction for the cost of an instance at any time and so long as the prevailing price is lower than the price you are willing to pay, you get to keep your instance.

Do one thing, and do it awesomely … Gimmebar!

From time to time you see a company come along that offers a simple product or service, and when they launch it just works.

The last time (that I can recall) when this happened was when I first used Dropbox. Download this little app and you got a 2GB drive in the cloud. And it worked on my Windows PC, on my Ubuntu PC, on my Android phone.

It just worked!

That was a while ago. And since then I’ve installed tons of software (and uninstalled 99% of it because it just didn’t work).

Last week I found Gimmebar.

There was no software to install, I just created an account on their web page. And it just worked!

What is Gimmebar? They consider themselves the 5th greatest invention of all time and they call themselves “a data steward”. I don’t know what that means. They also don’t tell you what the other 4 inventions are.

Here is how I would describe Gimmebar.

Gimmebar is a web saving/sharing tool that allows you to save things that you find interesting on the web in a nicely organised personal library in the cloud, and share some of that content with others if you so desire. They have something about saving stuff to your Dropbox account but I haven’t figured all of that out yet.

It has a bookmarklet for your browser, click it and things just get bookmarked and saved into your account.

But, it just worked!

I made a couple of collections, made one of them public and one of them shared.

If you share a collection it automatically gets a URL.

And that URL automatically supports an RSS Feed!

And they also backup your tweets, (I don’t give a crap about that).

So, what’s missing?

  • Some way to import all your stuff (from Google Reader)
  • An Android application (more generally, mobile application for platform of choice …)
  • The default ‘view’ on the collections includes previews; I will have enough crap before long where the preview will be a drag. How about a way to get just a list?
  • Saving a bookmark is right now at least a three click process; once you visit the site, click the bookmarklet and you get a little banner on the bottom of the screen, you click there to indicate whether you want the page to go to your private or public area, then you click the collection you want to store it in. This is functional but not easy to use.

I had one interaction with their support (little feedback tab on their page). Very quick to respond and they answered my question immediately.

On the whole, this feels like my first experience with Dropbox. Give it a shot, I think you’ll like it.

Why? Because Gimmebar set out to do one thing and they did it awesomely. It just worked!

Database scalability myth (again)

A common myth that has been perpetrated is that relational database do not scale beyond two or three nodes. That, and the CAP Theorem are considered to be the reason why relational databases are unscalable and why NoSQL is the only feasible solution!

I ran into a very thought provoking article that makes just this case yesterday. You can read that entire post here. In this post, the author Srinath Perera provides an interesting template for choosing the data store for an application. In it, he makes the case that relational databases do not scale beyond 2 or 5 nodes. He writes,

The low scalability class roughly denotes the limits of RDBMS where they can be scaled by adding few replicas. However, data synchronization is expensive and usually RDBMSs do not scale for more than 2-5 nodes. The “Scalable” class roughly denotes data sharded (partitioned) across many nodes, and high scalability means ultra scalable systems like Google.

In 2002, when I started at Netezza, the first system I worked on (affectionately called Monolith) had almost 100 nodes. The first production class “Mercury” system had 108 nodes (112 nodes, 4 spares). By 2006, the systems had over 650 nodes and more recently much larger systems have been put into production. Yet, people still believe that relational databases don’t scale beyond two or three nodes!

Systems like ParElastic (Elastic Transparent Sharding) can certainly scale to much more than two or three nodes, and I’ve run prototype systems with upto 100 nodes on Amazon EC2!

Srinath’s post does contain an interesting perspective on unstructured and semi-structured data though, one that I think most will generally agree with.