Why CCCC
From X10Wiki
Background notes and motivation
Memory models should support reliable lock-based programming. And they all do. All plausible memory models guarantee sequential consistency for those race-free programs using classic lock/monitor based programming techniques.
But over the past 20 years, people designing concurrent algorithms and their underlying language and platform support have become increasingly dissatisfied with lock-based techniques, for various well-known reasons, and increasingly turn to alternative approaches. As platform support for parallelism increases, so do efforts to further explore and develop such techniques.
There are only a few underlying ideas surrounding non-lock based (and typically "optimistic") techniques. The main one is Concurrent Readability -- the idea that multiple threads may concurrently read data values. You might think of these as generalizations of classic read-write locks, except that the gating that would be provided by such locks is left implicit. It's obvious that if you do not need to block threads attempting to read values, then a program will have greater throughput and fewer opportunities to lock up; so it is unsurprising that many people have pursued techniques somehow based on this idea. A natural corollary to concurrent readability is optimistic updates, usually relying on a compareAndSet operation (or variants), that succeed only if based on up-to-date uninterfered reads. The best cases have further properties of being provably wait-free, which also guarantee non-blocking progress for writes/updates. There is a growing set of algorithmic techniques that occupy the large middle ground here, including those like STM that might or might not someday provide a more general higher-level basis for expressing many such programs.
This raises a problem for memory models that focus on support for lock-based programs, because all techniques relying on concurrent readability intrinsically entail what would be, from a lock-based perspective, data races, for which such models assign either no or limited semantic guarantees. Java pioneered the alternative approach of providing useful semantics for such programs but only if the variables encountering such races are declared as "volatile", or are instances of special Atomic classes. This is carried on (for "atomic" but not "volatile" variables) in current draft C++ memory model. This decision generally works out pretty well from a software engineering perspective because it helps identify those variables that are specifically intended to be accessed across threads. Although there is also increasing sentiment to arrive at models that also cover arbitrary variables that may only sometimes encounter "legitimate" races.
One way to approach memory model abstractions and concrete platform support required for these kinds of modern concurrent programming techniques is to look at some common cases. Among the most common are concurrently readable collections (lists, sets, maps, etc). Here, the goal is to support simple "external" semantics while maximizing "internal" concurrency and avoiding blockage. Thus, for example, a user of a concurrent set should expect that all modifications to a data object performed prior to insertion are seen by any thread accessing that data object via the collection. The internals of the collection may support arbitrary degrees of concurrency that somehow obey this constraint.
Historically, the main algorithmic design rules for ensuring this are based on "linearizability" -- that there is some execution point at which an update is "visible" to readers. To make good on this sort of design rule (and its many less formal variants), a model/language/platform must provide some kind of happens-before rules that, among other things, state the conditions under which, if a reader sees one update, then it must also see its predecessors.
So, reasoning in terms of happens-before rules (rather than global consistency properties) is stock in trade of concurrent algorithm design. This holds whether or not programmers are conscious of this process. In Java, we find that they increasingly are; to the extent that we recently added explicit happens-before guarantees to the specs of all java.util.concurrent classes and utilities, as a way of clarifying guarantees for users.
The next question is: what must these happens-before rules look like to support reliable modern concurrent algorithms? In general, we'd like them (1) to be relatively simple, because programmers will need to bear them in mind (2) to maximize opportunities for concurrent readability, because this the basis of their power; (3) to be efficiently supportable on common multiprocessors; and (4) to support a smooth range of semantics stretching from the as-if-serial semantics of illegitimately racing variables to the stronger guarantees needed for synchronization variables used at encapsulation boundaries.
It would be convenient in terms of unifying classic and modern approaches to concurrent algorithm design if the best answer to this question were that volatile/atomic variables should guarantee sequential consistency. But this turns out not to be the case. While SC is indeed simple, it fails the other criteria: it can artificially limit concurrent readability, it is not in general as efficiently implementable as alternatives, and provides only an abrupt transition from very strong to non-existent or very weak semantics.
We've been exploring a family of models that fare better on these criteria. And in doing so, (re)discovered that reliance on two well-known, well-understood, and well-supported properties form a better basis than SC for modern concurrent programming: Cache coherence, and causal consistency (CCCC). Cache coherence, as supported on every commodity multiprocessors, ensures global total per-variable write orderings (and partially ordered reads). Causal consistency, as also readily supported, guarantees transitivity of update chains among sets of producers and consumers (or writers and readers). The resulting models provide those guarantees required for reliable optimistic programs without adding those not needed.
This work also reveals (or confirms) that reliance on a single consistency model doesn't suffice to support the range of constructions encountered in practice. Some need stronger guarantees than others, that come at some cost. So a small family of them must exist, where each differs from the other in a semantically meaningful way. In the CCCC family, there are at least two members: One (Strong CCCC) that guarantees Load Coherence, and one (Weak CCCC) that does not. Both can be further related on the top end to SC, and on the bottom end to the as-if-serial semantics of "ordinary" variables.

