I’ve heard a lot of people explain cloud computing to me and none have been as lucid as this gem.
A common myth that has been perpetrated is that relational database do not scale beyond two or three nodes. That, and the CAP Theorem are considered to be the reason why relational databases are unscalable and why NoSQL is the only feasible solution!
I ran into a very thought provoking article that makes just this case yesterday. You can read that entire post here. In this post, the author Srinath Perera provides an interesting template for choosing the data store for an application. In it, he makes the case that relational databases do not scale beyond 2 or 5 nodes. He writes,
The low scalability class roughly denotes the limits of RDBMS where they can be scaled by adding few replicas. However, data synchronization is expensive and usually RDBMSs do not scale for more than 2-5 nodes. The “Scalable” class roughly denotes data sharded (partitioned) across many nodes, and high scalability means ultra scalable systems like Google.
In 2002, when I started at Netezza, the first system I worked on (affectionately called Monolith) had almost 100 nodes. The first production class “Mercury” system had 108 nodes (112 nodes, 4 spares). By 2006, the systems had over 650 nodes and more recently much larger systems have been put into production. Yet, people still believe that relational databases don’t scale beyond two or three nodes!
Systems like ParElastic (Elastic Transparent Sharding) can certainly scale to much more than two or three nodes, and I’ve run prototype systems with upto 100 nodes on Amazon EC2!
Srinath’s post does contain an interesting perspective on unstructured and semi-structured data though, one that I think most will generally agree with.
I just posted a longish blog post (six parts actually) about the CAP Theorem at the ParElastic blog.
A short write-up about last night’s Big Data Summit appeared on xconomy today.
My thanks to our sponsors, Foley Hoag LLP and the wonderful team at the Emerging Enterprise Center, Infobright, Expressor Software, and Kalido.
Since the announcement of the Boston Big Data Summit on the 2nd of October, we have had a fantastic response. The event sold out two days ago. We figured that we could remove the tables from the room and accommodate more people. And, we sold out again. The response has been fantastic!
If you have registered but you are not going to be able to attend, please contact me and we will make sure that someone on the waiting list is confirmed.
There has been some question about what “Big Data” is. Curt Monash who will be delivering the keynote and moderating the discussion at the event next week writes:
… where “Big Data” evidently is to be construed as anything from a few terabytes on up. (Things are smaller in the Northeast than in California …)
When you catch a fish (whether it is the little fish on the left or the bigger fish on the right), the steps to prepare it for the table are surprisingly similar. You may have more work to do with the big fish and you may use different tools to do it with; but the things are the same.
So, while size influences the situation, it isn’t only about the size!
In my opinion, whether data is “Big” or not is more of a threshold discussion. Data is “Big” if the tools and techniques being used to acquire, cleanse, pre-process, store, process and archive, are either unable to keep up, or are not cost effective.
Yes, everything is bigger in California, even the size of the mess they are in. Now, that is truly a “Big Problem”!
The 50,000 row spreadsheet, the half a terabyte of data in SQL Server, or the 1 trillion row table on a large ADBMS are all, in their own ways, “Big Data” problems.
The user with 50k rows in Excel may not want ( or be able to afford ) a solution with a “real database”, and may resort to splitting the spreadsheet into two sheets. The user with half a terabyte of SQL Server or MySQL data may adopt some home-grown partitioning or sharding technique instead of upgrading to a bigger platform, and the user with a trillion CDR’s may reduce the retention period; but they are all responding to the same basic challenge of “Big Data”.
We now have three panelists:
- Ellen Rubin, Founder & VP Products, Cloudswitch
- Larry Dennison, Ph.D. President and Founder, Lightwolf Technologies
- David Cohen, Chief Architect, Cloud Infrastructure Group, EMC Corporation.
It promises to be a fun evening.
I have some thoughts on subjects for the next meeting, if you have ideas please post a comment here.
Announcement for the kickoff of the Boston “Big Data Summit”. The event will be held on Thursday, October 22nd 2009 at 6pm at the Emerging Enterprise Center at Foley Hoag in Waltham, MA. Register at http://bigdata102209.eventbrite.com
The Boston “Big Data Summit” will be holding its first meeting on Thursday, October 22nd 2009 at 6pm at the Emerging Enterprise Center at Foley Hoag in Waltham, MA.
The Boston area is home to a large number of companies involved in the collection, storage, analysis, data integration, data quality, master data management, and archival of “Big Data”. If you are involved in any of these, then the meeting of the Boston “Big Data Summit” is something you should plan to attend. Save the date!
The first meeting of the group will feature a discussion of “Big Data” and the challenges of “Big Data” analysis in the cloud.
Over 120 people signed up as of October 14th 2009.
There is a waiting list. If you are registered and won’t be able to attend, please contact me so we can allow someone on the wait list to attend instead.
Seating is limited so go online and register for the event at http://bigdata102209.eventbrite.com.
The Boston “Big Data Summit” thanks the Emerging Enterprise Center at Foley Hoag LLP for their support and assistance in organizing this event.
- 6:00 PM to 7:00 PM: Networking
- 7:00 PM to 7:45 PM: Keynote by Curt Monash, President, Monash Research – Leading Industry Analyst will cover The Information Marketplace
- 7:45 PM to 8:30 PM: Panel On Big Data and Cloud Computing moderated by Curt Monash
- 8:30 PM: Conclusion and wrap up.
For more information about the Boston “Big Data Summit” please contact the group administrator at firstname.lastname@example.org
The GMail user interface, while very good and much better than some of the others lacks some useful functionality to make it a complete replacement for a desktop email client like Outlook.
Joe Kissell writes in CIO magazine about the six reasons why desktop email clients still rule. He opines that he would take a desktop email client any day and provides the following reason, and six more:
Well, there is the issue of outages like the one Gmail experienced this week. I like to be able to access my e-mail whenever I want. But beyond that, webmail still lags far behind desktop clients in several key areas.
Much has been written by many on this subject. As long ago as 2005, Cedric pronounced his verdict. Brad Shorr had a more measured comparison that I read before I made the switch about a month ago. Lifehacker pronounced the definitive comparison (IMHO it fell flat, their verdicts were shallow). Rakesh Agarwal presented some good insights and suggestions.
I read all of this and tried to decide what to do about a month ago. Here is a description of my usage.
1. Email Accounts
I have about a dozen. Some are through GMail, some are on domains that I own, one is at Yahoo, one at Hotmail and then there are a smattering of others like aol.com, ZoHo and mail.com. While employed (currently a free agent) I have always had an Exchange Server to look at as well.
2. Email volume
Excluding work related email, I receive about 20 or 30 messages a day (after eliminating SPAM).
I have about 1200 contacts in my address book.
4. Mobile device
I have a Windows Mobile based phone and I use it for calendaring, email and as a telephone. I like to keep my complete contact list on my phone.
5. Access to Email
I am NOT a Power-User who keeps reading email all the time (there are some who will challenge this). If I can read my email on my phone, I’m usually happy. But, I prefer a big screen view when possible.
6. I like to use instant messengers. Since I have accounts on AOL IM, Y!, HotMail and Google, I use a single application that supports all the flavors of IM.
Seems simple enough, right? Think again. Here is why, after migrating entirely to GMail, I have switched back to a desktop client.
1. Google Calendar and Contact Synchronization is a load of crap.
Google does somethings well. GMail (the mail and parts of the interface) are one of these things. They support POP and IMAP, they support consolidation of accounts through POP or IMAP, they allow email to be sent as if from another account. They are far ahead of the rest. With Google Labs you can get a pretty slick interface. But, Calendar and Contact Synchronization really suck.
For example, I start off with 1200 contacts and synchronize my mobile device with Google. How do I do it? By creating an Exchange Server called m.google.com and having it provide Calendar and Contacts. You can read about that here. After synchronizing the two, I had 1240 or so contacts on my phone. Ok, I had sent email to 40 people through GMail who were not in my address book. Great!
Then I changed one persons email address and the wheels came off the train. It tried to synchronize everything and ended up with some errors.
I started off with about 120 entries in my calendar after synchronizing every hour, I now have 270 or so. Well, each time it felt that contacts had been changed, it refreshed them and I now have seventeen appointments tomorrow saying it is someones birthday. Really, do I get extra cake or something?
2. Google Chat and Contact Synchronization don’t work well together.
After synchronizing contacts my Google Chat went to hell in a hand-basket. There’s no way to tell why, I just don’t see anyone in my Google Chat window any more.
Google does some things well. The GMail server side is one of them. As Bing points out, Google Search returns tons of crap (not that Bing does much better). Calendar, Contacts and Chat are still not in the “does well” category.
So, it is back to Outlook Calendar and Contacts and POP Email. I will get all the email to land in my GMail account though, nice backup and all that. But GMail Web interface, bye-bye. Outlook 2007 here I come, again.
The best of both worlds
The stable interface between a phone and Outlook, a stable calendar, contacts and email interface (hangs from time to time but the button still works), and a nice online backup at Google. And, if I’m at a PC other than my own, the web interface works in a pinch.
POP all mail from a variety of accounts into one GMail account and access just that one account from both the telephone and the desktop client. And install that IM client application again.
What do I lose? The threaded message format that GMail has (that I don’t like). Yippie!