The GCE outage on June 2 2019

I happened to notice the GCE outage on June 2 for an odd reason. I have a number of motion-activated cameras that continually stream to a small Raspberry Pi cluster (where TensorFlow does some nifty stuff). This cluster pushes some more serious processing onto GCE. Just as a fail-safe, I have the system also generate an email when it notices an anomaly, some unexplained movement, and so on.

And on June 2nd, this all went dark for a while, and I wasn’t quite sure why. Digging around later, I realized that the issue was that I relied on GCE for the cloud infrastructure, and Gmail for the email. So when GCE had an outage, the whole thing came apart – there’s no resiliency if you have a single-point-of-failure (SPOF) and GCE was my SPOF.

While I was receiving mobile alerts that there was motion, I got no notification(s) on what the cause was. The expected behavior was that I would receive alerts on my mobile device, and explanations as email. For example, the alert would read “Motion detected, camera-5 <time>”. The explanation would be something like “NORMAL: camera-5 motion detected at <time> – timer activated light change”, “NORMAL: camera-3 motion detected at <time> – garage door closed”, or “WARNING: camera-4 motion detected at <time> – unknown pattern”.

I now realize that the reason was that both the email notification and the pattern detection relied on GCE, and that SPOF caused delays in processing and in email notification. OK, so I fixed my error and now use Office365 for email generation so at least I’ll get a warning email.
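The general idea of not depending on a single provider for notifications can be sketched in a few lines. This is a hypothetical illustration, not the setup described above: the function `notify_with_fallback` and the two stand-in senders are invented names, and a real version would wrap actual mail or push clients from providers that do not share infrastructure.

```python
from typing import Callable, List, Sequence

# A sender is any callable that delivers a message or raises on failure.
Sender = Callable[[str], None]


def notify_with_fallback(message: str, senders: Sequence[Sender]) -> int:
    """Try each channel in order; return the index of the first that succeeds."""
    errors: List[Exception] = []
    for index, send in enumerate(senders):
        try:
            send(message)
            return index
        except Exception as exc:  # any failure means: try the next channel
            errors.append(exc)
    raise RuntimeError(f"all {len(senders)} channels failed: {errors}")


# Hypothetical stand-ins for two independent providers.
delivered: List[str] = []


def primary_during_outage(message: str) -> None:
    # Simulates the primary provider being unreachable.
    raise ConnectionError("primary provider unreachable")


def secondary(message: str) -> None:
    # Simulates a successful delivery through the backup channel.
    delivered.append(message)


used = notify_with_fallback(
    "WARNING: camera-4 motion detected - unknown pattern",
    [primary_during_outage, secondary],
)
```

The fallback only helps if the channels really are independent; two senders that both sit on the same cloud are still one SPOF.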

But, I’m puzzled by Google’s blog post about this outage. The summary of that post is that a configuration change that was intended for a small number of servers ended up going to other servers, shit happened, and shit cleanup took longer because the troubleshooting network was the same as the affected network.

So, just as I had a SPOF, Google appears to have had an SPOF. But, why is it that we still have these issues where a configuration change intended for a small number of servers ends up going to a large number of servers?

Wasn’t this the same kind of thing that caused the 2017 Amazon S3 outage?

At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.

Shouldn’t there be a better way to detect the intended scope of a change, and a verification that this is intended? Seems like an opportunity for a different kind of check-and-balance?
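As a rough sketch of what such a check could look like (purely illustrative, and not how Google or Amazon implement their tooling): the operator declares how many servers the change should touch, and the tool refuses to proceed if the resolved target list disagrees or exceeds a fraction of the fleet. The names `verify_scope` and `ScopeError` and the 5% default are assumptions made up for this example.

```python
class ScopeError(Exception):
    """Raised when a change would touch more servers than intended."""


def verify_scope(resolved_targets, expected_count, fleet_size, max_fraction=0.05):
    """Return the number of targets if the change scope looks intended.

    resolved_targets: server names the selector actually matched
    expected_count:   how many servers the operator says should be affected
    fleet_size:       total number of servers in the fleet
    max_fraction:     hard cap on the share of the fleet one change may touch
    """
    actual = len(set(resolved_targets))
    if actual != expected_count:
        raise ScopeError(
            f"operator expected {expected_count} servers, selector matched {actual}"
        )
    if actual > max_fraction * fleet_size:
        raise ScopeError(
            f"{actual} of {fleet_size} servers exceeds the {max_fraction:.0%} cap"
        )
    return actual


# A small, intended change passes the check.
approved = verify_scope(["s1", "s2"], expected_count=2, fleet_size=100)
```

The point of asking for `expected_count` separately is that a mistyped selector and a mistyped count are unlikely to be wrong in the same way.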

Building completely redundant systems sounds like a simple solution but at some point the cost of this becomes exorbitant. So building completely independent control and user networks may seem like the obvious solution but is it cost effective to do that?

Reflections on the (first annual) OpenDev Conference, SFO

Earlier this week, I attended the OpenDev conference in San Francisco, CA.

The conference was focused on the emerging “edge computing” use cases for the cloud. This is an area that is of particular interest, not just from the obvious applicability to my ‘day job’ at Verizon, but also from the fact that it opens up an interesting new set of opportunities for distributed computing applications.

The highlight(s) of the show were two keynotes by M. Satyanarayanan of CMU. Both sessions were video taped and I’m hoping that the videos will be made available soon.

His team is working on some real cool stuff, and he showed off some of their work. The one that I found most fascinating, which most completely illustrates the value of edge computing is the augmented reality application to playing table tennis (which they call ping pong, and I know that annoys a lot of people :))

It was great to hear a user perspective presented by Andrew Mitry of Walmart. With 11,000 stores and an enormous number (2mm??) of employees, their edge computing use-case truly represents the scale at which these systems will have to operate, and the benefits that they can bring to the enterprise.

The conference sessions were very interesting and some of my key takeaways were that:

  • Edge Computing means different things to different people, because the term ‘Edge’ means different things to different applications. In some cases the edge device may be in a data center, in other cases in your houses, and in other cases on top of a lamp post at the end of your street.
  • A common API in orchestrating applications across the entirety of the cloud is very important, but different technologies may be better suited to each location in the cloud. There was a lot of discussion of the value (or lack thereof) of having OpenStack at the edge, and whether it made sense for edge devices to be orchestrated by OpenStack (or not).
  • I think an enormous amount of time was spent on debating whether or not OpenStack could be made to fit on a system with limited resources and I found this discussion to be rather tiring. After all, OpenStack runs fine on a little Raspberry Pi and for a deployment where there will be relatively few OpenStack operations (instance, volume, security group creation, update, deletion) the limited resources at the edge should be more than sufficient.
  • There are different use-cases for edge-computing and NFV/VNF are not the only ones, and while they may be the early movers into this space, they may be unrepresentative of the larger market opportunity presented by the edge.

There is a lot of activity going on in the edge computing space and many of the things we’re doing at Verizon fall into that category. There were several sessions that showcased some of the things that we have been doing; AT&T had a couple of sessions describing their initiatives in the space as well.

There was a very interesting discussion of the edge computing use-cases and the etherpad for that session can be found here.

Some others who attended the session also posted summaries on their blogs. This one from Chris Dent provides a good summary.

A conclusion/wrap-up session identified some clear follow-up activities. The etherpad for that session can be found here.

Quick test drive of #amazon #ec2 Provisioned IOPS EBS volumes

After getting the email this morning about the new provisioned IOPS EBS volumes, I took a small test drive.

It is really easy to get yourself a provisioned IOPS volume; when creating the volume there’s a new selection.

One of the things that has long annoyed me about Amazon EC2 network and storage performance is that it is highly variable. The target for provisioned IOPS is exactly in the sweet spot of where I want it to be; database servers.

With provisioned IOPS, it appears that we’re seeing the first semblance of SLA’s or guaranteed quality of service for storage in the cloud. This is huge!

I’ve set up a multi-volume RAID set and am running performance tests. The numbers look good, but what I like the most so far is that they are steady. That’s just awesome! More to come as I get the results.

Comparing parallel databases to sharding

I just posted an article comparing parallel databases to sharding on the ParElastic blog at http://bit.ly/JaMeVr

It was motivated by the fact that I’ve been asked a couple of times recently how the ParElastic architecture compares with sharding and it occurred to me this past weekend that

“Parallel Database” is a database architecture but sharding is an application architecture

Read the entire blog post here:

http://bit.ly/JaMeVr

Cloud CPU Cost over the years

Great article by Greg Arnette about the crashing cost of CPU over the years, thanks to the introduction of the cloud.

http://www.gregarnette.com/blog/2011/11/a-brief-history-cloud-cpu-costs-over-the-past-5-years/

Personally, I think the most profound one was in December 2009 with the introduction of “spot pricing”.

Effectively you have an auction for the cost of an instance at any time and so long as the prevailing price is lower than the price you are willing to pay, you get to keep your instance.

Do one thing, and do it awesomely … Gimmebar!

From time to time you see a company come along that offers a simple product or service, and when they launch it just works.

The last time (that I can recall) when this happened was when I first used Dropbox. Download this little app and you got a 2GB drive in the cloud. And it worked on my Windows PC, on my Ubuntu PC, on my Android phone.

It just worked!

That was a while ago. And since then I’ve installed tons of software (and uninstalled 99% of it because it just didn’t work).

Last week I found Gimmebar.

There was no software to install, I just created an account on their web page. And it just worked!

What is Gimmebar? They consider themselves the 5th greatest invention of all time and they call themselves “a data steward”. I don’t know what that means. They also don’t tell you what the other 4 inventions are.

Here is how I would describe Gimmebar.

Gimmebar is a web saving/sharing tool that allows you to save things that you find interesting on the web in a nicely organised personal library in the cloud, and share some of that content with others if you so desire. They have something about saving stuff to your Dropbox account but I haven’t figured all of that out yet.

It has a bookmarklet for your browser, click it and things just get bookmarked and saved into your account.

But, it just worked!

I made a couple of collections, made one of them public and one of them shared.

If you share a collection it automatically gets a URL.

And that URL automatically supports an RSS Feed!

And they also back up your tweets (I don’t give a crap about that).

So, what’s missing?

  • Some way to import all your stuff (from Google Reader)
  • An Android application (more generally, mobile application for platform of choice …)
  • The default ‘view’ on the collections includes previews; I will have enough crap before long where the preview will be a drag. How about a way to get just a list?
  • Saving a bookmark is right now at least a three click process; once you visit the site, click the bookmarklet and you get a little banner on the bottom of the screen, you click there to indicate whether you want the page to go to your private or public area, then you click the collection you want to store it in. This is functional but not easy to use.

I had one interaction with their support (little feedback tab on their page). Very quick to respond and they answered my question immediately.

On the whole, this feels like my first experience with Dropbox. Give it a shot, I think you’ll like it.

Why? Because Gimmebar set out to do one thing and they did it awesomely. It just worked!

Database scalability myth (again)

A common myth that has been perpetuated is that relational databases do not scale beyond two or three nodes. That, and the CAP Theorem, are considered to be the reason why relational databases are unscalable and why NoSQL is the only feasible solution!

Yesterday I ran into a very thought-provoking article that makes just this case. You can read that entire post here. In this post, the author Srinath Perera provides an interesting template for choosing the data store for an application. In it, he makes the case that relational databases do not scale beyond 2 to 5 nodes. He writes,

The low scalability class roughly denotes the limits of RDBMS where they can be scaled by adding few replicas. However, data synchronization is expensive and usually RDBMSs do not scale for more than 2-5 nodes. The “Scalable” class roughly denotes data sharded (partitioned) across many nodes, and high scalability means ultra scalable systems like Google.

In 2002, when I started at Netezza, the first system I worked on (affectionately called Monolith) had almost 100 nodes. The first production class “Mercury” system had 108 nodes (112 nodes, 4 spares). By 2006, the systems had over 650 nodes and more recently much larger systems have been put into production. Yet, people still believe that relational databases don’t scale beyond two or three nodes!

Systems like ParElastic (Elastic Transparent Sharding) can certainly scale to much more than two or three nodes, and I’ve run prototype systems with up to 100 nodes on Amazon EC2!

Srinath’s post does contain an interesting perspective on unstructured and semi-structured data though, one that I think most will generally agree with.

All you ever wanted to know about the CAP Theorem but were scared to ask!

I just posted a longish blog post (six parts actually) about the CAP Theorem at the ParElastic blog.

http://www.parelastic.com/database-architectures/an-analysis-of-the-cap-theorem/

-amrith

A Report from Boston’s First “Big Data Summit”

A short write-up about last night’s Big Data Summit appeared on xconomy today.

My thanks to our sponsors, Foley Hoag LLP and the wonderful team at the Emerging Enterprise Center, Infobright, Expressor Software, and Kalido.

Boston Big Data Summit Kickoff, October 22nd 2009

Since the announcement of the Boston Big Data Summit on the 2nd of October, we have had a fantastic response. The event sold out two days ago. We figured that we could remove the tables from the room and accommodate more people. And, we sold out again. The response has been fantastic!

If you have registered but you are not going to be able to attend, please contact me and we will make sure that someone on the waiting list is confirmed.

There has been some question about what “Big Data” is. Curt Monash who will be delivering the keynote and moderating the discussion at the event next week writes:

… where “Big Data” evidently is to be construed as anything from a few terabytes on up.  (Things are smaller in the Northeast than in California …)

When you catch a fish (whether it is the little fish on the left or the bigger fish on the right), the steps to prepare it for the table are surprisingly similar. You may have more work to do with the big fish and you may use different tools to do it with; but the things are the same.

So, while size influences the situation, it isn’t only about the size!

In my opinion, whether data is “Big” or not is more of a threshold discussion. Data is “Big” if the tools and techniques being used to acquire, cleanse, pre-process, store, process and archive, are either unable to keep up, or are not cost effective.

Yes, everything is bigger in California, even the size of the mess they are in. Now, that is truly a “Big Problem”!

The 50,000 row spreadsheet, the half a terabyte of data in SQL Server, or the 1 trillion row table on a large ADBMS are all, in their own ways, “Big Data” problems.

The user with 50k rows in Excel may not want (or be able to afford) a solution with a “real database”, and may resort to splitting the spreadsheet into two sheets. The user with half a terabyte of SQL Server or MySQL data may adopt some home-grown partitioning or sharding technique instead of upgrading to a bigger platform, and the user with a trillion CDRs may reduce the retention period; but they are all responding to the same basic challenge of “Big Data”.

We now have three panelists:

It promises to be a fun evening.

I have some thoughts on subjects for the next meeting, if you have ideas please post a comment here.

Announcing the Boston Big Data Summit

Announcement for the kickoff of the Boston “Big Data Summit”. The event will be held on Thursday, October 22nd 2009 at 6pm at the Emerging Enterprise Center at Foley Hoag in Waltham, MA. Register at http://bigdata102209.eventbrite.com

The Boston “Big Data Summit” will be holding its first meeting on Thursday, October 22nd 2009 at 6pm at the Emerging Enterprise Center at Foley Hoag in Waltham, MA.

The Boston area is home to a large number of companies involved in the collection, storage, analysis, data integration, data quality, master data management, and archival of “Big Data”. If you are involved in any of these, then the meeting of the Boston “Big Data Summit” is something you should plan to attend. Save the date!

The first meeting of the group will feature a discussion of “Big Data” and the challenges of “Big Data” analysis in the cloud.

Over 120 people signed up as of October 14th 2009.

There is a waiting list. If you are registered and won’t be able to attend, please contact me so we can allow someone on the wait list to attend instead.

Seating is limited so go online and register for the event at http://bigdata102209.eventbrite.com.

The Boston “Big Data Summit” thanks the Emerging Enterprise Center at Foley Hoag LLP for their support and assistance in organizing this event.

Agenda Updated

The Boston “Big Data Summit” is being sponsored by Foley Hoag LLP, Infobright, Expressor Software, and Kalido.

For more information about the Boston “Big Data Summit” please contact the group administrator at boston.bigdata@gmail.com


The Boston Big Data Summit is organized by Bob Zurek and Amrith (me) in partnership with the Emerging Enterprise Center at Foley Hoag LLP.


Desktop Email Client vs. GMail: Why desktop mail clients are still better than the GMail interface

The GMail user interface, while very good and much better than some of the others, lacks some useful functionality that would make it a complete replacement for a desktop email client like Outlook.

Joe Kissell writes in CIO magazine about the six reasons why desktop email clients still rule. He opines that he would take a desktop email client any day and provides the following reason, and six more:

Well, there is the issue of outages like the one Gmail experienced this week. I like to be able to access my e-mail whenever I want. But beyond that, webmail still lags far behind desktop clients in several key areas.

Much has been written by many on this subject. As long ago as 2005, Cedric pronounced his verdict. Brad Shorr had a more measured comparison that I read before I made the switch about a month ago. Lifehacker pronounced the definitive comparison (IMHO it fell flat, their verdicts were shallow). Rakesh Agarwal presented some good insights and suggestions.

I read all of this and tried to decide what to do about a month ago. Here is a description of my usage.

My Usage

1. Email Accounts

I have about a dozen. Some are through GMail, some are on domains that I own, one is at Yahoo, one at Hotmail and then there are a smattering of others like aol.com, ZoHo and mail.com. While employed (currently a free agent) I have always had an Exchange Server to look at as well.

2. Email volume

Excluding work related email, I receive about 20 or 30 messages a day (after eliminating SPAM).

3. Contacts

I have about 1200 contacts in my address book.

4. Mobile device

I have a Windows Mobile based phone and I use it for calendaring, email and as a telephone. I like to keep my complete contact list on my phone.

5. Access to Email

I am NOT a Power-User who keeps reading email all the time (there are some who will challenge this). If I can read my email on my phone, I’m usually happy. But, I prefer a big screen view when possible.

6. I like to use instant messengers. Since I have accounts on AOL IM, Y!, HotMail and Google, I use a single application that supports all the flavors of IM.

Seems simple enough, right? Think again. Here is why, after migrating entirely to GMail, I have switched back to a desktop client.

The Problem

1. Google Calendar and Contact Synchronization is a load of crap.

Google does some things well. GMail (the mail and parts of the interface) is one of these things. They support POP and IMAP, they support consolidation of accounts through POP or IMAP, they allow email to be sent as if from another account. They are far ahead of the rest. With Google Labs you can get a pretty slick interface. But, Calendar and Contact Synchronization really suck.

For example, I start off with 1200 contacts and synchronize my mobile device with Google. How do I do it? By creating an Exchange Server called m.google.com and having it provide Calendar and Contacts. You can read about that here. After synchronizing the two, I had 1240 or so contacts on my phone. Ok, I had sent email to 40 people through GMail who were not in my address book. Great!

Then I changed one person’s email address and the wheels came off the train. It tried to synchronize everything and ended up with some errors.

I started off with about 120 entries in my calendar; after synchronizing every hour, I now have 270 or so. Well, each time it felt that contacts had been changed, it refreshed them and I now have seventeen appointments tomorrow saying it is someone’s birthday. Really, do I get extra cake or something?

2. Google Chat and Contact Synchronization don’t work well together.

After synchronizing contacts my Google Chat went to hell in a hand-basket. There’s no way to tell why, I just don’t see anyone in my Google Chat window any more.

Google does some things well. The GMail server side is one of them. As Bing points out, Google Search returns tons of crap (not that Bing does much better). Calendar, Contacts and Chat are still not in the “does well” category.

So, it is back to Outlook Calendar and Contacts and POP Email. I will get all the email to land in my GMail account though, nice backup and all that. But GMail Web interface, bye-bye. Outlook 2007 here I come, again.

The best of both worlds

The stable interface between a phone and Outlook, a stable calendar, contacts and email interface (hangs from time to time but the button still works), and a nice online backup at Google. And, if I’m at a PC other than my own, the web interface works in a pinch.

POP all mail from a variety of accounts into one GMail account and access just that one account from both the telephone and the desktop client. And install that IM client application again.

What do I lose? The threaded message format that GMail has (that I don’t like). Yippie!

Boston Cloud Services meetup yesterday

Summary of the Boston Cloud Services meetup yesterday.

Tsahy Shapsa of aprigo organized the second Boston Cloud Services meetup yesterday. There were two very informative presentations, the first by Riki Fine of EMC on the EMC Atmos project and the second by Craig Halliwell from TwinStrata.

What I learnt was that Atmos was EMC’s entry into the cloud arena. The initial product was a cloud storage offering with some additional functionality over other offerings like Amazon’s. Key product attributes appear to be scalability into the petabytes, policy and object metadata based management, multiple access methods (CIFS/NFS/REST/SOAP), and a common “unified namespace” for the entire managed storage. While the initial offering was for a cloud storage offering, there was a mention of a compute offering in the not too distant future.

In terms of delivery, EMC has set up its own data centers to host some of the Atmos clients. But, they have also partnered with other vendors (AT&T was mentioned) who would provide cloud storage offerings that expose the Atmos API. AT&T’s web page reads

AT&T Synaptic Cloud Storage uses the EMC Atmos™ backend to deliver an enterprise-grade global distribution system. The EMC Atmos™ Web Services API is a Web service that allows developers to enable a customized commodity storage system over the Internet, VPNs, or private MPLS connectivity.

I read this as a departure from the approach being taken by the other vendors. I don’t believe that other offerings (Amazon, Azure, …) provide a standardized API and allow others to offer cloud services compliant to that interface. In effect, I see this as an opportunity to create a marketplace for “plug compatible” cloud storage. Assume that a half dozen more vendors begin to offer Atmos based cloud storage, each offering a different location, SLA and price point; an end user then has the option to pick and choose from that set. To the best of my knowledge, today the best one can do is pick a vendor and then decide where in that vendor’s infrastructure the data would reside.

Atmos also seems to offer some cool resiliency and replication functionality. An application can leverage a collection of Atmos storage providers. Based on policy, an object could be replicated (synchronously or asynchronously) to multiple locations on an Atmos cloud with the options of having some objects only within the firewall and others being replicated outside the firewall.

Enter TwinStrata, who are an Atmos partner. They have a cool iSCSI interface to the Atmos cloud storage. With a couple of clicks of a mouse, they demonstrated the creation of a small Atmos based iSCSI block device. Going over to a Windows Server machine and rescanning disks, they found the newly created volume. A couple of clicks later there was a newly minted “T:” that the application could use, just as it would a piece of local storage. TwinStrata provides some additional caching and ease of use features. We saw the “ease of use” part yesterday. The demo lasted a couple of minutes and took no more than about a dozen mouse clicks. The version that was demo’ed was the iSCSI interface; there was talk of a file system based interface in the near future.

Right now, all of these offerings are expected to be for Tier-3 storage. Over time, there is a belief that T2 and T1 will also use this kind of infrastructure.

Very cool stuff! If you are in the Boston area and are interested in the Cloud paradigm, definitely check out the next event on Sept 23rd.

Pizza and refreshments were provided by Intuit. If you haven’t noticed, the folks from Intuit are doing a magnificent job fostering these kinds of events all over the Boston Area. I have attended several excellent events that they have sponsored. A great big “Thank You” to them!

Finally, a big “Thank You” to Tsahy and Aprigo for arranging this meetup and offering their premises for the meetings.

Not so fast, maybe relational databases aren’t dead!

Maybe the obituary announcing the demise of the relational database was premature!

Much has been written recently about the demise (or in some cases, the impending demise) of the relational database. “Relational databases are dead” writes Savio Rodrigues on July 2nd, I guess I missed the announcement and the funeral in the flood of emails and twitters about another high profile demise.

Some days ago, Michael Stonebraker authored an article with the title, “The End of a DBMS Era (Might be Upon Us)”. In September 2007 he made a similar argument in this article, and also in this 2005 paper with Uğur Çetintemel.

What Michael says here is absolutely true. And, in reality, Savio’s article just has a catchy title (and it worked). The body of the article makes a valid argument that there are some situations where the current “one size fits all” relational database offering that was born in the OLTP days may not be adequate for all data management problems.

So, let’s be perfectly clear about this; the issue isn’t that relational databases are dead. It is that a variety of use cases are pushing the current relational database offerings to their limits.

I must emphasize that I consider relational databases (RDBMS’s) to be those systems that use a relational model (a definition consistent with http://en.wikipedia.org/wiki/Relational_database). As a result, columnar (or vertical) representations, row (or horizontal) representations, systems with hardware acceleration (FPGA’s, …) are all relational databases. There is arguably some confusion in terminology in the rest of this post, especially where I quote others who tend to use the term “Relational Database” more narrowly, so as to create a perception of differentiation between their product (columnar, analytic, …) and the conventional row oriented database which they refer to as an RDBMS.

Tony Bain begins his three part series about the problem with relational databases with an introduction where he says

“The specialist solutions have be slowly cropping up over the last 5 years and now today it wouldn’t be that unusual for an organization to choose a specialist data analytics database platform (such as those offered from Netezza, Greenplum, Vertica, Aster Data or Kickfire) over a generic database platform offered by IBM, Microsoft, Oracle or Sun for housing data for high end analytics.”

While I have some issues with his characterization of “specialist analytic database platforms” as something other than a Relational Database, I assume that he is using the term RDBMS to refer to the commonly available (general purpose) databases that are most often seen in OLTP environments.

I believe that whether you refer to a column oriented architecture (with or without compression), an architecture that uses hardware acceleration (Kickfire, Netezza, …) or a materialized view, you are attempting to address the same underlying issue; I/O is costly and performance is significantly improved when you reduce the I/O cost. Columnar representations significantly reduce I/O cost by not performing DMA on unnecessary columns of data. FPGA’s in Netezza serve a similar purpose; (among other things) they perform projections thereby reducing the amount of data that is DMA’ed. A materialized view with only the required columns (narrow table, thin table) serves the same purpose. In a similar manner (but for different reasons), indexes improve performance by quickly identifying the tuples that need to be DMA’ed.
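A back-of-the-envelope calculation shows why reading only the needed columns matters. This sketch assumes fixed-width columns and ignores compression, page granularity and indexes; the table layout, the column widths and the function name `bytes_scanned` are all invented for illustration.

```python
def bytes_scanned(row_count, column_widths, needed_columns, columnar):
    """Estimate bytes read for a full scan that needs only some columns.

    column_widths:  mapping of column name -> width in bytes (fixed width)
    needed_columns: columns the query actually references
    columnar:       True for a column store (or a narrow materialized view),
                    False for a row store that must read whole rows
    """
    if columnar:
        per_row = sum(column_widths[name] for name in needed_columns)
    else:
        per_row = sum(column_widths.values())
    return row_count * per_row


# Hypothetical table: one million rows, 336 bytes per row.
widths = {"id": 8, "name": 64, "amount": 8, "notes": 256}
rows = 1_000_000

row_store = bytes_scanned(rows, widths, ["amount"], columnar=False)
column_store = bytes_scanned(rows, widths, ["amount"], columnar=True)
ratio = row_store // column_store
```

For a query that touches only `amount`, the row layout reads 336 MB while the columnar layout reads 8 MB, a factor of 42 in this made-up example. Projection in hardware or a thin materialized view aims at the same reduction.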

Notice that all of these solutions fundamentally address one aspect of the problem; how to reduce the cost of I/O. The challenges that are facing databases these days are somewhat different. In addition to huge amounts of data that are being amassed (The Richard Winter article on the subject) there is a much broader variety of things that are being demanded of the repository of that information. For example, there is the “Search” model that has been discussed in a variety of contexts (web, peptide/nucleotide), the stream processing and data warehousing cases that have also received a fair amount of discussion.

Unlike the problem of I/O cost, many of these problems reflect issues with the fundamental structure and semantics of the SQL language. Some of these issues can be addressed with language extensions, User Defined Functions, MapReduce extensions and the like. But none of these address the underlying issue that the language and semantics were defined for a class of problems that we today come to classify as the “OLTP use case”.

Relational databases are not dead; on the contrary, with the huge amounts of information that are being handled, they are more alive than ever before. The SQL language is not dead but it is in need of some improvements. That’s not something new; we’ve seen those in ’92, ’99, … But, more importantly, the reason why the Relational Database and SQL have survived this long is that they are widely used and portable. By being an extensible and descriptive language, SQL has managed to adapt to many of the new requirements that were placed on it.

And if the current problems are significant, two more problems are just around the corner and waiting to rear their ugly heads. The first is the widespread adoption of virtualization and the abstraction of computing resources. In addition to making it much harder to adopt solutions with custom hardware (that cannot be virtualized), it introduces a level of unpredictability in I/O bandwidth, latency and performance. Right along with this, users are going to want the database to live on the cloud. With that will come all the requirements of scalability, ease of use and deployment that one associates with a cloud based offering (not just the deployment model). The second is the fact that users will expect one “solution” to meet a wide variety of demands including the current OLTP and reporting through the real time alerting that today’s “Google/Facebook/Twitter Generation” has come to demand (look-ma-no-silos).

These problems are going to drive a round of innovation, and the NoSQL trend is a good and healthy trend. In the same description of all the NoSQL and analytics alternatives, one should also mention the various vendors who are working on CEP solutions. As a result of all of these efforts, Relational Databases as we know them today (general purpose OLTP optimized, small data volume systems) will evolve into systems capable of managing huge volumes of data in a distributed/cloud/virtualized environment and capable of meeting a broad variety of consumer demands.

The current architectures that we know of (shared disk, shared nothing, shared memory) will need to be reconsidered in a virtualized environment. The architectures of our current databases will also need some changes to address the wide variety of consumer demands. Current optimization techniques will need to be adapted and the underlying data representations will have to change. But, in the end, I believe that the thing that will decide the success or failure of a technology in this area is the extent of its compatibility and integration with the existing SQL language. If the system has a whole new set of semantics and is fundamentally incompatible with SQL, I believe that adoption will be slow. A system that extends SQL and meets these new requirements will do much better.

Relational Databases aren’t dead; the model of “one-size-fits-all” is certainly on shaky ground! There is a convergence between the virtualization/cloud paradigms, the cost and convenience advantages of managing large infrastructures in that model and the business need for large databases.

Fasten your seat-belts because the ride will be rough. But, it is a great time to be in the big-data-management field!


Boston Cloud Services- June Meetup.

Tsahy set up a meetup group for Cloud Services at http://www.meetup.com/Boston-cloud-services/. The first meeting is today; check out the meeting link at Boston Cloud Services- June Meetup.

Location

460 Totten Pond rd
suite 660
Waltham, MA 02451

All,
We have a great agenda for this 1st Boston cloud services meetup! We are also broadcasting live on http://www.stickam.co…

1. Tsahy Shapsa – 15 minutes – a case study of an early stage start-up, and a talk about what it’s like to build a new business nowadays, with all this cloud stuff going around, covering where we’re using cloud/SaaS to run our business, operations, IT etc., where we’re not and why, challenges that we faced / are facing etc. We can have an open discussion on the good, bad & ugly, and I wouldn’t mind taking a few tips from the audience…

2. John Bennett – 30 minutes – will give a talk on separating fact from fiction in the cloud computing market. John is the former marketing director of a cloud integration vendor (SnapLogic), and has been watching this market closely for a couple of years now.
Blog: http://bestrategic.blogspot.com.
bio here: http://www.bennettstr…

3. Mark E. Hodapp – 30 minutes – ‘Competing against Amazon’s EC2’
Mark was Director R&D / CTO at Sun Microsystems, where he led a team of 20 engineers working on an advanced research effort, Project Caroline, a horizontally scalable platform for the development and deployment of Internet services.

No cloud in sight!

The conventional wisdom at the beginning of ’09 was that the economic downturn would catapult cloud adoption but that hasn’t quite happened. This post explores trends and possible reasons for the slow adoption as well as what the future may hold.

A lot has been written in the past few days about Cloud Computing adoption based on a survey by ITIC (http://www.itic-corp.com/). At the time of this writing, I haven’t been able to locate a copy of this report or a link with more details online but most articles referencing this survey quote Laura DiDio as saying,

“An overwhelming 85% majority of corporate customers will not implement a private or public cloud computing infrastructure in 2009 because of fears that cloud providers may not be able to adequately secure sensitive corporate data”.

In another part of the country, structure09 had a lot of discussion about Cloud Computing. Moderating a panel of VCs, Paul Kedrosky asked for a show of hands of VCs who run their business on the cloud. To quote Liz Gannes,

“Let’s just say the hands did not go flying up”.

Elsewhere, a GigaOM report by George Gilbert and Juergen Urbanski concludes that leading storage vendors are planning their innovation around a three year time frame, expecting adoption of new storage technologies to coincide with emergence from the current recession.

My point of view

In the short term, services that are already “networked” will begin to migrate into the cloud. The migration may begin at the individual and SMB end of the market rather than at the Fortune 100. Email and CRM applications will be the poster-children for this wave.

PMCrunch also lists some SMB ERP solutions that will be in this early wave of migration.

But, this wave will primarily target the provision of application services through a different delivery model (application hosted on a remote server instead of a corporate server).

It will be a while before cloud based office applications (word-processing, spreadsheets, presentations) become mainstream. The issue is not so much security as it is network connectivity. The cloud is useless to a person who is not on the network and until ubiquitous high bandwidth network connectivity is available everywhere, and at an accessible and reasonable cost, the cloud platform will not be able to move forward.

We are beginning to see increased adoption in Broadband WiFi or Cellular Data in the US but the costs are still too high and service is still insufficient. Just ask anyone who has tried to get online at one of the many airports and hotels in the US.

Gartner highlights five key attributes of Cloud Computing.

  1. Uses Internet Technologies
  2. Service Based
  3. Metered by Use
  4. Shared
  5. Scalable and Elastic

Note that I have re-ordered them into what I believe is the order in which cloud adoption will progress. The early adoption will be in applications that “Use Internet Technologies” and are “Service Based”; the last attribute to drive adoption will be “Scalable and Elastic”.

As stated above, the early adopters will deploy applications with a clearly defined and “static” set of deliverables in areas that currently require the user to have network connectivity (i.e. do no worse than current, change the application delivery model from in-house to hosted). In parallel, corporations will begin to deploy private clouds for use within their firewalls.

As high bandwidth connectivity becomes more easily available, adoption will increase; currently, I think that is the real limitation.

Data Security will be built along the way, as will best practices on things like Escrow and mechanisms to migrate from one service provider to another.

Two other things that could kick cloud adoption into high gear are

  1. the delivery of a cloud platform from a company like Akamai (why hasn’t this happened yet?)
  2. a mechanism that would allow applications to scale based on load and use the right amount of cloud resource. Applications like web servers can scale based on client demand but this isn’t (yet) the case with other downstream services like databases or mail servers.
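
To make the second item concrete, here is a minimal sketch of one possible load-based scaling rule: keep the average per-instance load near a target by resizing the pool proportionally. The function name, the parameters and the limits are all assumptions made for illustration, not a description of any provider's service. The sketch also sidesteps the hard part for stateful services like databases, which is moving data and connections when the instance count changes.

```python
import math


def desired_instances(
    current: int,
    observed_load: float,
    target_load: float,
    min_instances: int = 1,
    max_instances: int = 20,
) -> int:
    """Proportional scaling rule (illustrative only).

    observed_load and target_load are per-instance figures in the same
    unit, for example requests per second handled by one instance.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    # Total load is current * observed_load; divide by the load we want
    # each instance to carry and round up so we never under-provision.
    raw = math.ceil(current * observed_load / target_load)
    # Clamp to the configured pool limits.
    return max(min_instances, min(max_instances, raw))
```

For example, with 4 instances each handling 900 requests per second against a target of 600, `desired_instances(4, 900, 600)` asks for 6 instances; if the per-instance load falls to 300, the same rule shrinks the pool to 2.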

That’s my point of view, and I’d love to hear yours, especially in the area of companies that are addressing the problem of providing a cloud user the ability to migrate from one provider to another, or mechanisms to dynamically scale services like databases and mail servers.