Just a quick post to upload three charts that help visualize the numbers that Curt and I have been referring to in our posts (Curt's original post and mine).
The first chart shows the disk-to-data ratio that was mentioned. Note that the X-axis, showing TPC-H scale factor, is a logarithmic scale. The benchmark information shows that the ParAccel solution has in excess of 900TB of storage for the 30TB benchmark, so the ratio is in excess of 30:1.
The second chart shows the memory-to-data ratio. Note that both the X and Y axes are logarithmic scales. The benchmark information shows that the ParAccel solution has 43 nodes and approximately 2.7TB of RAM, so the ratio is approximately 1:11 (or 9%).
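For anyone who wants to check my arithmetic, here is a minimal sketch of the two ratio calculations; all the input figures (30TB scale, 900TB of disk, 43 nodes at 64GB of RAM each) are taken from the post and the benchmark disclosure.

```python
# Sanity-check the two ratios quoted above, using figures from the post.
scale_factor_tb = 30           # TPC-H SF 30,000, i.e. ~30 TB of raw data
disk_tb = 900                  # "in excess of 900TB of storage"
nodes, ram_gb_per_node = 43, 64

disk_to_data = disk_tb / scale_factor_tb           # 30.0 -> "in excess of 30:1"
total_ram_tb = nodes * ram_gb_per_node / 1024      # ~2.69 TB of RAM in total
memory_to_data = total_ram_tb / scale_factor_tb    # ~0.09 -> roughly 1:11, or 9%

print(disk_to_data, round(total_ram_tb, 2), round(memory_to_data, 3))
```

The two printed ratios match the "in excess of 30:1" and "approximately 1:11 (or 9%)" figures above.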
The third chart shows the load time (in hours) for various recorded results. The ParaAccel result indicates a load time of 3.48 hours. Note again that the X-axis is a logarithmic scale.
For easy reading, I have labeled the ParAccel 30TB value on the charts. I have to admit, I don't understand Curt's point, and maybe others share this bewilderment. I think I've captured the numbers correctly; could someone help verify them, please?
If the images above are shown as thumbnails, you may not be able to see the point I’m trying to make. You need to see the bigger images to see the pattern.
In response to an email, I looked at the data again and got the following RANK() information. Of the 151 results available today, the ParAccel 30TB numbers are 58th in Memory to Data and 115th in Disk to Data. It is meaningless to compare load-time ranks without factoring in the scale factor, and I'm not going to bother with that because the sample size at SF=30,000 is too small.
If you are willing to volunteer some of your time to review the spreadsheet with all of this information, I am happy to send you a copy. Just ask!
16 thoughts on “More on TPC-H comparisons”
Can you please explain the relevance and formula for “58th in Memory to Data and 115th in Disk to Data”?
Here’s what I did. And to illustrate, let me use the example of memory to data.
I took the spreadsheet that the TPC gives us; you can get a copy here.
I then went and read each and every TPC result and figured out how many GB of memory the submission involved. Also, as part of the results, vendors have to provide the disk to data ratio.
So, the ParAccel system had 43 nodes, each with 64 GB of memory and disclosed a disk to data ratio of 32.04 (here).
Tabulating all of this information takes a while (a suggestion to the TPC committee: it would be nice if you included total memory, total disk, and load time in your results spreadsheet).
Since I had all the information in a spreadsheet (Excel), I just RANK’ed the data.
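For those without Excel handy, the RANK step amounts to no more than this; the rows below are hypothetical stand-ins for my spreadsheet (only the ~9% ParAccel memory-to-data figure comes from the post).

```python
# A sketch of the RANK() step, with made-up rows in place of my spreadsheet.
# Each tuple: (result name, memory-to-data ratio); higher ratio gets rank 1.
results = [
    ("vendor-a", 0.50),
    ("paraccel-30tb", 0.09),   # the ~9% figure discussed in the post
    ("vendor-b", 0.25),
    ("vendor-c", 0.02),
]

ranked = sorted(results, key=lambda r: r[1], reverse=True)
ranks = {name: position + 1 for position, (name, _) in enumerate(ranked)}

print(ranks["paraccel-30tb"])  # 3 (3rd of 4 in this toy data)
```

The same one-liner sort, applied to the full 151-result table, is what produced the 58th and 115th ranks quoted earlier.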
Why go to all this trouble you ask?
Well, in addition to casting doubt on the TPC-H benchmark as a whole, the suggestion in recent discussions has been that ParAccel is in some way "worse than others". I have no axe to grind either way and couldn't care less whether ParAccel is worse or better than the rest. Nor does it matter to me whether others feel the TPC-H benchmark is useful; I use it for some things and, as others have pointed out, recognize that it has some shortcomings and peculiarities.
But, I was curious about the assertion re: ParAccel.
A total of 151 results are available in the TPC-H spreadsheet. If the ParAccel "RANK" were close to 151, I would have felt that the assertion "was reasonable" (note the use of loosey-goosey language). But the data I've gathered thus far does not bear out the assertion.
Also, I have been wrong in the past. And, I will be wrong in the future. The question is this, am I wrong this time?
The fact of the matter is that you simply can’t buy new small SATA disks. The servers they used were commodity servers which take SATA disks. The same number of spindles on SAS would be much smaller, closer to the ratios of previous benchmarks.
The ratio is just a number with no inherent meaning. They weren’t benchmarking for price/performance anyway. They were benchmarking for the most performance on the hardware they chose to use, and that meant they needed lots of cheap spindles, which today just happen to have large platters too, because they aren’t tiny 2.5″ SAS drives.
Just another $.02 from me.
Also, ask any DBA what he wants for Xmas. Pretty much no matter what database he is administering he will probably answer with considerable enthusiasm “more spindles!”
“We need more spindles” was my constant mantra at AdBrite.
There is no such thing as too many spindles if you are trying to maximize read performance.
And last, if you really want to complain about TPC-H and skewed numbers, look at recent results at 1TB from Oracle/Exadata. There are many queries that return in 0.1 seconds! This is because TPC-H doesn't have enough permutations for the input parameters to the queries, and many results are served wholesale out of the query cache. This artificially inflates their QphH by a very big margin.
Great points, I agree with you up to a point about spindles but definitely agree with you about the drive sizes. As I said earlier in my post, I don’t care what the disk:data ratio is, I just did this little piece of investigating because I was curious about the claim that ParAccel was “worse than others”.
I will point out an interesting article I just read at Storage Sense (http://storage-sense.blogspot.com/2009/06/making-case-for-ssd-right-now.html) where the gent shows that with many fewer spindles one can get the same IOPS.
I’m not suggesting that your assertion is incorrect, just that there are other technologies than SATA. With FC for example, you get much faster (and also much lower capacity) drives. I haven’t thought to compare relative power usage or cooling for the drives. I only highlight the comments on Storage Sense because they contrast performance of SATA against FC.
My point, simply, is that ALL TPC benchmarks run on ridiculous hardware configurations are ridiculous.
If you can prove that ParAccel is average or even a little below average in the ridiculousness of its submission, that in no way disproves my point.
I disagree with the notion that ALL TPC™ benchmarks are run on ridiculous hardware configurations.
Benchmark runs looking to achieve ridiculous performance are run on ridiculous hardware configurations for obvious reasons.
Benchmark runs targeting price/performance run on reasonable hardware and give reasonable (or phenomenal if you are Kickfire) performance results on configurations more likely to show up in your average data center.
FD: I work for Kickfire.
And I may have misread that sentence. That happens a lot around here. I realize that you said pretty much exactly what I did. Oops.
Also, I have no recollection of making the claim that ParAccel was "worse than others", if I am the one you suggest made it.
If it is indeed I, could you please point me to where I made it, so that I may recall the context?
Thanks, yes I would like to understand the context in which you meant this.
“ParAccel is worse than others in this regard only insofar as it garners attention for a TPC result. Well, there’s also the bit about them writing new code to make the benchmark work and then advertising “load and go” ease of running it …”
I suspect that I misunderstand the context in which you make this statement; after all, I'm certain that you follow these things a whole lot more closely than I do.
I think I see what I misunderstood. You wrote:
Throwing huge numbers of spindles at a contrived benchmark, while not technically fraud, is highly misleading. Yet that’s exactly what the TPC encourages.
ParAccel is worse than others in this regard only insofar as it garners attention for a TPC result. Well, there’s also the bit about them writing new code to make the benchmark work and then advertising “load and go” ease of running it …
So, maybe you were talking only about the number of spindles that ParAccel used. In that case, you have a very valid point.
ParAccel ranks 146th in number of spindles. In other words, only five other results had more disk drives in their solutions.
If that is what you meant, I stand corrected.
I said the only thing that made ParAccel worse than other TPC-H submitters was putting out a press release (and unspecified other marketing). I specifically did not say their hardware configuration was worse than others.
That was for two reasons:
1. I didn’t care enough to check. 🙂
2. Once you go way beyond what's commercially reasonable, I don't care how much further you go. Your benchmark is already nonsense.
Curt, thanks for the clarification.
But there is only one other statistically significant entry at 30TB. I don’t think you can compare results between different scale factors in any meaningful way.
I agree. I wasn’t questioning that. For that matter, I did not take exception to the number of disks that ParAccel used in their configuration. And I agree with your point on the fact that given current disk sizes, the amount of total disk space was probably an unavoidable consequence.
I started this post because I understood the suggestion in recent discussions has been that ParAccel is in some way “worse than others” when it came to disk to data or memory to data ratios. These were the things that Curt talked about in his original post and that is probably why I went astray.
I think (Curt can confirm this) I misunderstood the context in which Curt made the comments about ParAccel being “worse than others”. He was (I think) making the context strictly in reference to the number of spindles. And his comment does ring true based on the data that I have reviewed from TPC’s web page.
I'm not passing any judgements on whether the ParAccel configuration is a good one or an excessive one. That wasn't and still isn't my interest. It is worth pointing out that there is only one other entry at 30TB, so four entries at SF less than 30,000 used more disks than ParAccel 🙂