Learning from Joost

A lot of interesting articles have emerged in the wake of the recent happenings at Joost.

One post that caught my attention and deserves reading, and re-reading, and then reading one more time just to be sure is Ed Sim’s post, where he writes

raising too much money can be a curse and not a blessing

A lot to learn from in all of these articles.

Boston Cloud Services- June Meetup.


Tsahy set up a meetup group for Cloud Services at http://www.meetup.com/Boston-cloud-services/. The first meeting is today; check out the meeting link at

Boston Cloud Services- June Meetup.

Location

460 Totten Pond Rd
Suite 660
Waltham, MA 02451

All,
We have a great agenda for this 1st Boston cloud services meetup! We'll also be broadcasting live on http://www.stickam.co…

1. Tsahy Shapsa – 15 minutes – a case study of an early-stage start-up and a talk about what it’s like to build a new business nowadays, with all this cloud stuff going around: covering where we’re using cloud/SaaS to run our business, operations, IT, etc., where we’re not and why, and the challenges that we faced / are facing. We can have an open discussion on the good, bad & ugly, and I wouldn’t mind taking a few tips from the audience…

2. John Bennett – 30 minutes – a talk on separating fact from fiction in the cloud computing market. John is the former marketing director of a cloud integration vendor (SnapLogic) and has been watching this market closely for a couple of years now.
Blog: http://bestrategic.blogspot.com.
bio here: http://www.bennettstr…

3. Mark E. Hodapp – 30 minutes – ‘Competing against Amazon’s EC2’
Mark was Director of R&D / CTO at Sun Microsystems, where he led a team of 20 engineers working on an advanced research effort, Project Caroline, a horizontally scalable platform for the development and deployment of Internet services.

Risks of mind controlled vehicles

Some risks of mind controlled vehicles are described.

It is now official: Toyota has announced that it is developing a mind controlled wheelchair. Unless Toyota is planning to take the world by storm by putting large numbers of motorized wheelchairs on the roads, I suspect this technology will soon make its way into their other products, maybe cars?

Of course, this will have some risks:

  1. Cars may inherit their driver’s personality (ouch, there are some people who shouldn’t be allowed to drive)
  2. Cars with hungry drivers will make unpredictable turns into drive through lanes
  3. Cars could inadvertently eject annoying back-seat drivers
  4. Mortality rates for cheating husbands may go up (see http://www.thebostonchannel.com/news/1965115/detail.html, http://www.click2houston.com/news/1576669/detail.html, http://www.freerepublic.com/focus/news/840828/posts and a thousand other such posts)
  5. It would be really hard to administer a driving test! Would the ability to think be a requirement for drivers?

Can you think of other risks? I didn’t think I would come to this conclusion but I kind of like the way things are right now where my car has a mind of its own.

Is TPC-H really a blight upon the industry?

A recap of some posts (Curt Monash, Merv Adrian) about the ParAccel TPC-H 30 TB benchmark numbers.

On June 22, Curt Monash posted an interesting entry on his blog about TPC-H in the wake of an announcement by ParAccel. On the same day, Merv Adrian posted another take on the same subject on his blog.

Let me begin with a couple of disclaimers.

First, I am currently employed by Dataupia and was previously employed at Netezza. I am not affiliated in any way with ParAccel, Sun, the TPC committee, the Toyota Motor Corporation, the EPA, or any other entity related to this discussion. And if you are curious about my affiliations with any other body, just ask.

Second, this blog is my own and does not represent or intend to represent the point of view of my employer, former employer(s), or any other person or entity than myself. Any resemblance to the opinions or points of view of anyone other than myself is entirely coincidental.

As with any other benchmark, TPC-H only serves to illustrate how well or poorly a system was able to process a specified workload. If you happen to run a data warehouse that tracks parts, orders, suppliers, and line items in orders across 25 nations grouped into 5 regions, as the TPC-H specification does, your data warehouse may look something like the one specified in the benchmark. And if your business problems are similar to the twenty-two queries presented in the specification, you can leverage hundreds of person-hours of free tuning advice given to you by the makers of most major databases and hardware.

In that regard, I feel that excellent performance on a published TPC-H benchmark does not guarantee that the same configuration would work well in my data warehouse environment.

But, if I understand correctly, the crux of the argument that Curt makes is that the benchmark configurations are bloated (and he cites the following examples)

  • 43 nodes to run the benchmark at SF 30,000
  • each node has 64 GB of RAM (total of over 2.5TB of RAM)
  • each node has 24 TB of disk (total of over 900TB of disk)

which leads to a “RAM:DATA ratio” of approximately 1:11 and a “DISK:DATA ratio” of approximately 32:1.
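To make the arithmetic behind those ratios explicit, here is a quick back-of-the-envelope calculation using the round numbers cited above; the exact figures in the Full Disclosure Report will differ slightly:

```python
# Back-of-the-envelope ratios from the configuration cited above.
nodes = 43
ram_per_node_gb = 64      # GB of RAM per node
disk_per_node_tb = 24     # TB of raw disk per node
data_tb = 30              # SF 30,000 is roughly 30 TB of raw data

total_ram_tb = nodes * ram_per_node_gb / 1024   # about 2.7 TB
total_disk_tb = nodes * disk_per_node_tb        # about 1,032 TB

print(f"RAM:DATA  is roughly 1:{data_tb / total_ram_tb:.0f}")   # roughly 1:11
print(f"DISK:DATA is roughly {total_disk_tb / data_tb:.0f}:1")  # roughly 34:1 with these round numbers
```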

Let’s look at the DISK:DATA ratio first

What no one seems to have pointed out (and I apologize if I didn’t catch it in the ocean of responses) is that this 32:1 DISK:DATA ratio is the ratio between total disk capacity and data and therefore includes overheads.

First, whether it is in a benchmark context or a real-life situation, one expects data protection in one form or another. The benchmark report indicates that the systems used RAID 0 and RAID 1 for various components. So, at the very least, the number should be approximately 16:1. In addition, the same disk space is also used for the operating system, operating system swap, and temporary table space. Therefore, I don’t know whether it is reasonable to assume that, even with good compression, a system would achieve a 1:1 ratio between data and disk space, but I would like to know more about this.

“By way of contrast, real-life analytic DBMS with good compression often have disk/data ratios of well under 1:1.”
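To put the RAID point in numbers, a rough adjustment of the raw ratio looks like this; the assumption that essentially all of the disk is mirrored is mine, made only to keep the arithmetic simple:

```python
# Rough adjustment of the DISK:DATA ratio for mirroring. The simplifying
# assumption that essentially all of the 43 x 24 TB is RAID 1 mirrored is mine.
raw_disk_tb = 43 * 24             # about 1,032 TB of raw capacity
usable_disk_tb = raw_disk_tb / 2  # mirroring halves the usable capacity
data_tb = 30

print(f"usable DISK:DATA is roughly {usable_disk_tb / data_tb:.0f}:1")  # roughly 17:1
# Operating system, swap, and temporary table space would shave this down further.
```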

Leaving the issue of DISK:DATA ratio aside, one thing that most performance tuning looks at is the number of “spindles”. And, having a large number of spindles is a good thing for performance whether it is in a benchmark or in real life. Given current disk drive prices, it is reasonable to assume that a pre-configured server comes with 500GB drives, as is the case with the Sun system that was used in the ParAccel benchmark. If I were to purchase a server today, I would expect either 500GB drives or 1TB drives. If it were necessary to have a lower DISK:DATA ratio and reducing that ratio had some value in real life, maybe the benchmark could have been conducted with smaller disk drives.

Reading section 5.2 of the Full Disclosure Report, it is clear that the benchmark did not use all 900 or so terabytes of disk. If I understand the math in that section correctly, the benchmark uses the equivalent of 24 drives per node, with 50GB per drive set aside for data. That is a total of approximately 52TB of storage for the database data. That’s pretty respectable! Richard Gostanian, in his comment on Curt’s blog (June 24th, 2009, 7:34 am), indicates that they only needed about 20TB of data. I can’t reconcile the math, but we’re at least in the right ballpark.
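For what it’s worth, here is how I read the section 5.2 math; the per-node drive count and per-drive allocation are my interpretation of that section rather than figures quoted verbatim:

```python
# My reading of the section 5.2 math: 24 drives per node, 50 GB per drive
# set aside for data, across all 43 nodes.
nodes = 43
drives_per_node = 24
data_per_drive_gb = 50

data_capacity_tb = nodes * drives_per_node * data_per_drive_gb / 1000
print(f"Storage set aside for database data: about {data_capacity_tb:.0f} TB")  # about 52 TB
```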

And as for the RAM:DATA ratio of approximately 1:11, I find it hard to understand how the benchmark could have run entirely from RAM, as Curt conjectures.

“And so I conjecture that ParAccel’s latest TPC-H benchmark ran (almost) entirely in RAM as well.”

From my experience in sizing systems, one looks at more things than just the physical disk capacity. One should also consider things like concurrency, query complexity, and expected response times. I’ve been analyzing TPC-H numbers (for an unrelated exercise) and I will post some more information from that analysis over the next couple of weeks.

On the whole, I think TPC-H performance numbers (QPH, $/QPH) are as predictive of system performance in a specific data warehouse implementation as the EPA ratings on cars are of actual mileage that one may see in practice. If available, they may serve as one factor that a buyer could consider in a buying decision. In addition to reviewing the mileage information for a car, I’ll also take a test drive, speak to someone who drives the same car, and if possible rent the same make and model for a weekend to make sure I like it. I wouldn’t rely on just the EPA ratings so why should one assume that a person purchasing a data warehouse would rely solely on TPC-H performance numbers?

As an aside, does anyone want to buy a 2000 Toyota Sienna Mini Van? It is white in color and gave 22.4 mpg over the last 2000 or so miles.

No cloud in sight!

The conventional wisdom at the beginning of ’09 was that the economic downturn would catapult cloud adoption but that hasn’t quite happened. This post explores trends and possible reasons for the slow adoption as well as what the future may hold.

A lot has been written in the past few days about Cloud Computing adoption based on a survey by ITIC (http://www.itic-corp.com/). At the time of this writing, I haven’t been able to locate a copy of this report or a link with more details online but most articles referencing this survey quote Laura DiDio as saying,

“An overwhelming 85% majority of corporate customers will not implement a private or public cloud computing infrastructure in 2009 because of fears that cloud providers may not be able to adequately secure sensitive corporate data”.

In another part of the country, structure09 featured a lot of discussion about Cloud Computing. Moderating a panel of VCs, Paul Kedrosky asked for a show of hands from VCs who run their businesses on the cloud. To quote Liz Gannes,

“Let’s just say the hands did not go flying up”.

Elsewhere, a GigaOM report by George Gilbert and Juergen Urbanski concludes that leading storage vendors are planning their innovation around a three-year time frame, expecting adoption of new storage technologies to coincide with emergence from the current recession.

My point of view

In the short term, services that are already “networked” will begin to migrate into the cloud. The migration may begin at the individual and SMB end of the market rather than at the Fortune 100. Email and CRM applications will be the poster-children for this wave.

PMCrunch also lists some SMB ERP solutions that will be in this early wave of migration.

But, this wave will primarily target the provision of application services through a different delivery model (application hosted on a remote server instead of a corporate server).

It will be a while before cloud based office applications (word-processing, spreadsheets, presentations) become mainstream. The issue is not so much security as it is network connectivity. The cloud is useless to a person who is not on the network and until ubiquitous high bandwidth network connectivity is available everywhere, and at an accessible and reasonable cost, the cloud platform will not be able to move forward.

We are beginning to see increased adoption of broadband WiFi and cellular data in the US, but the costs are still too high and the service is still inadequate. Just ask anyone who has tried to get online at one of the many airports and hotels in the US.

Gartner highlights five key attributes of Cloud Computing.

  1. Uses Internet Technologies
  2. Service Based
  3. Metered by Use
  4. Shared
  5. Scalable and Elastic

Note that I have re-ordered them into what I believe is the order in which cloud adoption will progress. The early adoption will be in applications that “Use Internet Technologies” and are “Service Based”; the last will be “Scalable and Elastic”.

As stated above, the early adopters will deploy applications with a clearly defined and “static” set of deliverables in areas that currently require the user to have network connectivity (i.e. do no worse than current, change the application delivery model from in-house to hosted). In parallel, corporations will begin to deploy private clouds for use within their firewalls.

As high-bandwidth connectivity becomes more easily available, adoption will increase; right now I think that is the real limitation.

Data Security will be built along the way, as will best practices on things like Escrow and mechanisms to migrate from one service provider to another.

Two other things that could kick cloud adoption into high gear are

  1. the delivery of a cloud platform from a company like Akamai (why hasn’t this happened yet?)
  2. a mechanism that would allow applications to scale based on load and use the right amount of cloud resource. Applications like web servers can scale based on client demand, but this isn’t (yet) the case with other downstream services like databases or mail servers (a rough sketch of the idea follows this list).
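As a purely hypothetical illustration of that second point, here is a minimal sketch of a load-based scaling rule for a downstream service. The function, the load metric, and the thresholds are inventions of mine for illustration and do not correspond to any real cloud provider’s API:

```python
# Hypothetical sketch: decide how many instances of a downstream service
# (say, a pool of database read replicas) to run, given an observed load.
# All names and thresholds here are made up for illustration.
def desired_instances(current: int, load_per_instance: float,
                      target_load: float = 0.6,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return an instance count that brings per-instance load near the target."""
    if load_per_instance <= 0:
        return min_instances
    wanted = round(current * load_per_instance / target_load)
    return max(min_instances, min(max_instances, wanted))

# Example: 4 instances each running at 90% of capacity suggests scaling out to 6.
print(desired_instances(current=4, load_per_instance=0.9))
```

The arithmetic is the easy part; the hard part is giving stateful services like databases and mail servers a safe way to add and remove instances on the fly, which is exactly the gap the item above points at.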

That’s my point of view, and I’d love to hear yours especially in the area of companies that are addressing the problem of providing a cloud user the ability to migrate from one provider to another, or mechanisms to dynamically scale services like databases and mail servers.

What’s next in tech? Boston, June 25th 2009

Recap of “What’s next in Tech”, Boston, June 25th 2009

What’s next in tech: Exploring the Growth Opportunities of 2009 and Beyond, June 25th 2009.

Scott Kirsner moderated two panels during the event; the first with three venture capitalists and the second with five entrepreneurs.

The first panel consisted of:

– Michael Greeley, General Partner, Flybridge Capital Partners

– Bijan Sabet, General Partner, Spark Capital

– Neil Sequiera, Managing Director, General Catalyst Partners

The second panel consisted of:

– Mike Dornbrook, COO, Harmonix Music Systems (makers of “Rock Band”)

– Helen Greiner, co-founder of iRobot Corp. and founder of The Droid Works

– Brian Halligan, social media expert and CEO of HubSpot

– Tim Healy, CEO of EnerNOC

– Ellen Rubin, Founder & VP/Product of CloudSwitch

Those who asked questions got a copy of Dan Bricklin’s book, Bricklin on Technology. There was also a DVD of some other book (I don’t know what it was) that was being given away.

A show of hands at the beginning suggested that 100% of the people were optimistic about the recovery and the future. It was a nice discussion, though after leaving the session I didn’t get the feeling that anyone addressed the question “What’s next in tech?” head on. Most of the conversation was about what has already happened in tech, with a lot of discussion about Twitter and Facebook. There were also a lot of questions from the audience about what the panelists’ companies were doing to help young entrepreneurs. And for an audience that was supposed to contain people interested in “what’s next”, no one wanted the DVD of the book; everyone wanted the paper copies. Hmm …

There has been a lot of interesting discussion about the topic and the event. One that caught my eye was Larry Cheng’s blog.

How many outstanding shares in a start-up?

Finding the number of outstanding shares in a start-up.

This question has come up quite often, most recently when I was having lunch with some co-workers last week.

While evaluating an offer at a start-up, have you wondered whether your 10,000 shares were a significant chunk of money or a drop in the ocean?

Have you ever wondered how many shares are outstanding in a start-up you heard about?

This is a non-issue for a public company; the number of outstanding shares is a matter of public record. But not all start-ups want to share this information with you.

I can’t say this for every state in the union, but in MA you can get a pretty good idea by looking at the Annual Report that every organization is required to file. It won’t be current information; the best you can do is get the most recent annual filing, which means you can’t learn anything about a very new company. Still, it is information that I would have liked to have had on some occasions in the past.

The link to search the Corporate Database is http://corp.sec.state.ma.us/corp/corpsearch/corpsearchinput.asp

Enter the name of the company and scroll to the bottom, where you will see a box that allows you to choose “Annual Reports”. Click that and you can read a recent Annual Report, which will have the information you need.

If you find that this doesn’t work or is incomplete, please post your comments here. I’m sure others will appreciate the information.

And if you have information about other states, that is most welcome.