On June 22, Curt Monash posted an interesting entry on his blog about TPC-H in the wake of an announcement by ParAccel. On the same day, Merv Adrian posted another take on the same subject on his blog.
Let me begin with a couple of disclaimers.
First, I am currently employed by Dataupia and was previously employed at Netezza. I am not affiliated in any way with ParAccel, Sun, the TPC committee, the Toyota Motor Corporation, the EPA, or any other entity related to this discussion. And if you are curious about my affiliations with any other body, just ask.
Second, this blog is my own and does not represent, or intend to represent, the point of view of my employer, former employer(s), or any person or entity other than myself. Any resemblance to the opinions or points of view of anyone other than myself is entirely coincidental.
As with any other benchmark, TPC-H only serves to illustrate how well or poorly a system was able to process a specified workload. If you happen to run a data warehouse that tracks parts, orders, suppliers, and lineitems in orders across the 25 nations and 5 regions of the TPC-H specification, your data warehouse may look something like the one specified in the benchmark. And if your business problems are similar to the twenty-two queries presented in the specification, you can leverage hundreds of person-hours of free tuning advice given to you by the makers of most major databases and hardware.
In that regard, I feel that excellent performance on a published TPC-H benchmark does not guarantee that the same configuration would work well in my data warehouse environment.
But, if I understand correctly, the crux of Curt’s argument is that the benchmark configurations are bloated; he cites the following examples:
- 43 nodes to run the benchmark at SF 30,000
- each node has 64 GB of RAM (total of over 2.5TB of RAM)
- each node has 24 TB of disk (total of over 900TB of disk)
which leads to a “RAM:DATA ratio” of approximately 1:11 and a “DISK:DATA ratio” of approximately 32:1.
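To make the arithmetic explicit, here is a quick back-of-the-envelope check of those two ratios, using the rounded per-node figures above (a rough sketch; the exact capacities in the Full Disclosure Report may differ slightly):

```python
# Back-of-the-envelope check of the RAM:DATA and DISK:DATA ratios quoted above.
# The inputs are the rounded per-node figures from Curt's examples, not exact
# capacities from the Full Disclosure Report.
nodes = 43
ram_per_node_gb = 64        # GB of RAM per node
disk_per_node_tb = 24       # TB of raw disk per node
data_tb = 30                # SF 30,000 is roughly 30 TB of raw data

total_ram_tb = nodes * ram_per_node_gb / 1000   # ~2.75 TB
total_disk_tb = nodes * disk_per_node_tb        # ~1,032 TB

print(f"RAM:DATA  ~ 1:{data_tb / total_ram_tb:.0f}")    # ~1:11
print(f"DISK:DATA ~ {total_disk_tb / data_tb:.0f}:1")   # ~34:1 with these rounded inputs,
                                                        # close to the ~32:1 quoted above
```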
Let’s look at the DISK:DATA ratio first.
What no one seems to have pointed out (and I apologize if I didn’t catch it in the ocean of responses) is that this 32:1 DISK:DATA ratio is the ratio of total disk capacity to data, and it therefore includes overheads.
First, whether it is in a benchmark context or a real-life situation, one expects data protection in one form or another. The benchmark report indicates that the systems used RAID 0 and RAID 1 for various components, so, at the very least, the effective number should be approximately 16:1. In addition, the same disk space is also used for the operating system, swap space, and temporary table space. Therefore, I don’t know whether it is reasonable to assume that, even with good compression, a system would achieve a 1:1 ratio between data and disk space, but I would like to know more about this.
“By way of contrast, real-life analytic DBMS with good compression often have disk/data ratios of well under 1:1.”
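Coming back to the overhead point above, here is a small sketch of how that headline ratio shrinks once protection and system space are accounted for. The RAID 1 halving follows from mirroring; the per-node allowance for the operating system, swap, and temp space is purely an illustrative guess on my part, not a figure from the Full Disclosure Report:

```python
# Illustrative decomposition of the headline DISK:DATA ratio. The RAID 1
# halving follows from mirroring; the OS/swap/temp allowance is a guess,
# included only to show that the usable ratio shrinks further.
data_tb = 30
raw_disk_tb = 960                       # roughly the "over 900 TB" figure above
after_mirroring_tb = raw_disk_tb / 2    # RAID 1 halves usable capacity

os_swap_temp_tb = 43 * 1.0              # assume ~1 TB per node for OS, swap, temp (illustrative)
usable_for_data_tb = after_mirroring_tb - os_swap_temp_tb

print(f"Headline ratio   : {raw_disk_tb / data_tb:.0f}:1")          # ~32:1
print(f"After mirroring  : {after_mirroring_tb / data_tb:.0f}:1")   # ~16:1
print(f"Less OS/swap/temp: {usable_for_data_tb / data_tb:.0f}:1")   # still well above 1:1
```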
Leaving the issue of the DISK:DATA ratio aside, one thing that most performance tuning looks at is the number of “spindles”, and having a large number of spindles is a good thing for performance, whether in a benchmark or in real life. Given current disk drive prices, it is reasonable to assume that a pre-configured server comes with 500GB drives, as is the case with the Sun system used in the ParAccel benchmark. If I were to purchase a server today, I would expect either 500GB or 1TB drives. If it were necessary to have a lower DISK:DATA ratio, and reducing that ratio had some value in real life, maybe the benchmark could have been conducted with smaller disk drives.
Reading section 5.2 of the Full Disclosure Report, it is clear that the benchmark did not use all 900 or so terabytes of disk. If I understand the math in that section correctly, the benchmark uses the equivalent of 24 drives per node at 50GB per drive for data, or a total of approximately 52TB of storage set aside for the database data. That’s pretty respectable! Richard Gostanian, in his comment on Curt’s blog (June 24th, 2009, 7:34 am), indicates that they only needed about 20TB for the data. I can’t reconcile the math, but we’re at least in the right ball-park.
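Spelling out the section 5.2 arithmetic as I read it (this is my interpretation of the report, so treat the breakdown as approximate):

```python
# The section 5.2 math as I read it: 24 drives per node, 50 GB per drive
# allocated to database data, across 43 nodes. These figures are my reading
# of the Full Disclosure Report, not numbers copied verbatim from it.
nodes = 43
data_drives_per_node = 24
gb_per_drive_for_data = 50

data_storage_tb = nodes * data_drives_per_node * gb_per_drive_for_data / 1000
print(f"Storage set aside for database data: ~{data_storage_tb:.0f} TB")  # ~52 TB
# Against the 30 TB scale factor, and the ~20 TB Richard Gostanian mentions,
# that is at least the same ball-park, even if the figures don't reconcile exactly.
```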
As for the RAM:DATA ratio of 1:11, I find it hard to understand how the benchmark could have run entirely from RAM, as Curt conjectured.
“And so I conjecture that ParAccel’s latest TPC-H benchmark ran (almost) entirely in RAM as well.”
From my experience in sizing systems, one looks at more things than just the physical disk capacity. One should also consider things like concurrency, query complexity, and expected response times. I’ve been analyzing TPC-H numbers (for an unrelated exercise) and I will post some more information from that analysis over the next couple of weeks.
On the whole, I think TPC-H performance numbers (QphH, $/QphH) are about as predictive of system performance in a specific data warehouse implementation as EPA ratings are of the actual mileage one sees in practice. If available, they may serve as one factor that a buyer could consider in a buying decision. In addition to reviewing the mileage information for a car, I’ll also take a test drive, speak to someone who drives the same car, and, if possible, rent the same make and model for a weekend to make sure I like it. I wouldn’t rely on just the EPA ratings, so why should one assume that a person purchasing a data warehouse would rely solely on TPC-H performance numbers?
As an aside, does anyone want to buy a 2000 Toyota Sienna Mini Van? It is white in color and gave 22.4 mpg over the last 2000 or so miles.
Thanks for a thoughtful analysis. Separate conversations have included assertions that, while prior TPC-H benchmarks did run in memory, this one did not, and that it “surprised” the players, from what I was told, by how well they did.
Hiya.
Yep, my guess on RAM was wrong.
But the number of spindles is way higher than on most comparable systems. E.g., Aster’s new hardware-heavy data warehouse appliance has vastly fewer spindles per terabyte of user data even with compression turned off. Turn it back on and you’re back to utter absurdity again.
And by the way, lots of data warehouse appliances are built with SAS drives of 300 gigs or less.
Throwing huge numbers of spindles at a contrived benchmark, while not technically fraud, is highly misleading. Yet that’s exactly what the TPC encourages.
ParAccel is worse than others in this regard only insofar as it garners attention for a TPC result. Well, there’s also the bit about them writing new code to make the benchmark work and then advertising “load and go” ease of running it …
Cheers,
CAM
Curt,
You may be correct that this benchmark has too many spindles; that isn’t something one can easily glean from the TPC-H results table/xls.
But looking quickly at the other “large” benchmarks, the other 30TB result had 3072 drives, and the first of the 10TB results (http://www.tpc.org/results/individual_results/HP/HP-10TB-080310-tpch.es.pdf) also had 3072 drives.
This is *NOT* an exhaustive analysis, but just the first two results that I looked at.
That may be a good thing to analyse, just as an indication of trends.
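For what it’s worth, here is the rough spindles-per-terabyte arithmetic behind that comparison. The ParAccel/Sun drive count is inferred from the 24TB-per-node and 500GB-per-drive figures in the post (about 48 drives per node), so treat it as an estimate rather than a number from the disclosure report:

```python
# Rough spindles-per-terabyte comparison for the results mentioned above.
# The ParAccel/Sun drive count is inferred (24 TB per node / 500 GB per drive,
# i.e. about 48 drives per node across 43 nodes); the other counts are the
# published figures cited in this thread.
results = [
    ("ParAccel/Sun, 30 TB (inferred)", 43 * 48, 30),
    ("Other 30 TB result",             3072,    30),
    ("First 10 TB result (HP)",        3072,    10),
]

for name, drives, scale_tb in results:
    print(f"{name}: {drives} drives, ~{drives / scale_tb:.0f} spindles per TB of raw data")
```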
Thanks for your comment.
-amrith