Over the past several months, I have been wondering about the Shared Nothing architecture that seems to be very common in parallel and massively parallel systems (specifically for databases) these days. With the advent of technologies like cheap servers and fast interconnects, there has been a considerable amount of literature that point to an an apparent “consensus” that Shared Nothing is the preferred architecture for parallel databases. One such example is Stonebraker’s 1986 paper (Michael Stonebraker. The case for shared nothing. Database Engineering, 9:4–9, 1986). A reference to this apparent “consensus” can be found in a paper by Dewitt and Gray (Dewitt, Gray Parallel database systems: the future of high performance database systems 1992).
A consensus on parallel and distributed database system architecture has emerged. This architecture is based on a shared-nothing hardware design [STON86] in which processors communicate with one another only by sending messages via an interconnection network.
But two decades or so later, is this still the case, is “Shared Nothing” really the way to go? Are there other, better options that one should consider?
As long ago as 1996, Norman, Zurek and Thanisch (Norman, Zurek and Thanisch. Much ado about shared nothing 1996) made a compelling argument for hybrid systems. But even that was over a decade ago. And the last decade has seen some rather interesting changes. Is the argument proposed in 1996 by Norman et al., still valid? (Updated 2009-08-02: A related article in Computergram in 1995 can be read at http://www.cbronline.com/news/shared_nothing_parallel_database_architecture)
With the advent of clouds and virtualization, doesn’t one need to seriously consider the shared nothing architecture again? Is a shared disk architecture in a cloud even a worthwhile conversation to have? I was reading the other day about shared nothing cluster storage. What? That seems like an contradiction, doesn’t it?
Some interesting questions to ponder:
- In the current state of technology, are the characterizations “Shared Everything”, “Shared Memory”, “Shared Disk” and “Shared Nothing” still sufficient? Do we need additional categories?
- Is Shared Nothing the best way to go (As advocated by Stonebraker in 1986) or is a hybrid system the best way to go (as advocated by Norman et al., in 1996) for a high performance database?
- What one or two significant technology changes could cause a significant impact to the proposed solution?
I’ve been pondering this question for some time now and can’t quite seem to decide which way to lean. But, I’m convinced that the answer to #1 is that we need additional categories based on advances in virtualization. But, I am uncertain about the choice between a Hybrid System and a Shared Nothing system. The inevitable result of advancements in virtualization and clouds and such technologies seem to indicate that Shared Nothing is the way to go. But, Norman and others make a compelling case.
What do you think?