Boston Big Data Summit Kickoff, October 22nd 2009

Since the announcement of the Boston Big Data Summit on the 2nd of October, we have had a fantastic response. The event sold out two days ago. We figured that we could remove the tables from the room and accommodate more people, and then we sold out again. The response has been fantastic!

If you have registered but you are not going to be able to attend, please contact me and we will make sure that someone on the waiting list is confirmed.

There has been some question about what “Big Data” is. Curt Monash, who will be delivering the keynote and moderating the discussion at the event next week, writes:

… where “Big Data” evidently is to be construed as anything from a few terabytes on up.  (Things are smaller in the Northeast than in California …)

When you catch a fish (whether it is the little fish on the left or the bigger fish on the right), the steps to prepare it for the table are surprisingly similar. You may have more work to do with the big fish, and you may use different tools to do it, but the steps are the same.

So, while size influences the situation, it isn’t only about the size!

In my opinion, whether data is “Big” or not is more of a threshold discussion. Data is “Big” if the tools and techniques being used to acquire, cleanse, pre-process, store, process and archive it are either unable to keep up or are not cost effective.

Yes, everything is bigger in California, even the size of the mess they are in. Now, that is truly a “Big Problem”!

The 50,000 row spreadsheet, the half a terabyte of data in SQL Server, or the 1 trillion row table on a large ADBMS are all, in their own ways, “Big Data” problems.

The user with 50k rows in Excel may not want (or be able to afford) a solution with a “real database”, and may resort to splitting the spreadsheet into two sheets. The user with half a terabyte of SQL Server or MySQL data may adopt some home-grown partitioning or sharding technique instead of upgrading to a bigger platform, and the user with a trillion CDRs may reduce the retention period; but they are all responding to the same basic challenge of “Big Data”.

We now have three panelists:

It promises to be a fun evening.

I have some thoughts on subjects for the next meeting; if you have ideas, please post a comment here.
