memory_order_seq_cst. This is the default memory order when using a non-explicit atomic function. It is the strongest guarantee and typically imposes the largest performance penalty. Most hardware memory models, including x86, ARM and POWER, are weaker than this ordering, so the compiler must emit extra synchronization instructions to achieve it.
The Sequentially Consistent memory order establishes Release-Acquire semantics and provides an additional guarantee: that a single total memory order is imposed on all sequentially consistent operations across all threads.
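To make the "default" concrete, here is a minimal sketch showing that the non-explicit atomic_store is the same operation as atomic_store_explicit tagged with memory_order_seq_cst. The helper name store_default_load_explicit is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: writes with the non-explicit function and
 * reads back with the explicit one. Both are Sequentially Consistent. */
static int store_default_load_explicit(int value)
{
    atomic_int x;
    atomic_init(&x, 0);

    /* No memory order given, so memory_order_seq_cst is implied. */
    atomic_store(&x, value);

    /* Spelled out: the same ordering as the call above, made explicit. */
    return atomic_load_explicit(&x, memory_order_seq_cst);
}
```

Any call such as store_default_load_explicit(7) simply returns the value it was given. The point is only that the two spellings request the same ordering.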
Consider a situation where we have two producer and two consumer threads:
// Producer 1
a = 1; // A
atomic_store_explicit(&b, 0, memory_order_release); // B
// Producer 2
a = 2; // C
atomic_store_explicit(&b, 0, memory_order_release); // D
// Consumer 1
c = atomic_load_explicit(&b, memory_order_acquire); // E
d = a; // F
// Consumer 2
c = atomic_load_explicit(&b, memory_order_acquire); // G
d = a; // H
It is possible that Consumer 1 observes A->B->C->D, so that a equals 2, and Consumer 2 observes C->D->A->B, so a equals 1. According to Release-Acquire semantics, the only guarantees are that A is before B and C before D.
If it is important that all our consumer threads agree upon the exact same order of events as one another, this clearly won't do. Sequential Consistency fixes this by guaranteeing that A, B, C and D are given a single ordering relative to one another and that this single order is observed by all threads. This means all of the load and store operations above must be tagged memory_order_seq_cst, because it is only operations tagged as Sequentially Consistent which are given a single total ordering.
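The agreement between consumers can be checked with a small experiment. The sketch below is a variation on the example rather than a copy of it: two writers each set their own atomic flag, and two readers load the flags in opposite orders (often called the "independent reads of independent writes" test). The names x, y, r1 to r4 and the helper functions are assumptions made for illustration, and POSIX threads are assumed to be available.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names: x and y are flags, r1..r4 hold what the readers saw. */
static atomic_int x, y;
static int r1, r2, r3, r4;

static void *write_x(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    return NULL;
}

static void *write_y(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    return NULL;
}

static void *read_x_then_y(void *arg)
{
    (void)arg;
    r1 = atomic_load_explicit(&x, memory_order_seq_cst);
    r2 = atomic_load_explicit(&y, memory_order_seq_cst);
    return NULL;
}

static void *read_y_then_x(void *arg)
{
    (void)arg;
    r3 = atomic_load_explicit(&y, memory_order_seq_cst);
    r4 = atomic_load_explicit(&x, memory_order_seq_cst);
    return NULL;
}

/* Runs the four threads once; returns 1 if the readers disagreed. */
static int readers_disagree_once(void)
{
    void *(*body[4])(void *) = { write_x, write_y, read_x_then_y, read_y_then_x };
    pthread_t tid[4];

    atomic_store(&x, 0);
    atomic_store(&y, 0);
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&tid[i], NULL, body[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);

    /* Reader 1 saw x set before y, while reader 2 saw y set before x. */
    return r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0;
}

/* Repeats the experiment and counts how often the readers disagreed. */
static int count_disagreements(int runs)
{
    int count = 0;
    for (int i = 0; i < runs; i++)
        count += readers_disagree_once();
    return count;
}
```

With every operation tagged memory_order_seq_cst, count_disagreements should return 0 no matter how many runs are requested. If the tags were weakened to release and acquire, the language would no longer rule out a disagreement, although whether one is ever seen depends on the hardware.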
For example, suppose we have the following:
// Producer 1
a = 1; // A
atomic_store_explicit(&b, 0, memory_order_seq_cst); // B
// Producer 2
a = 2; // C
atomic_store_explicit(&b, 0, memory_order_release); // D
// Consumer 1
c = atomic_load_explicit(&b, memory_order_seq_cst); // E
d = a; // F
// Consumer 2
c = atomic_load_explicit(&b, memory_order_seq_cst); // G
d = a; // H
If we know one thread observes A->B->G->H->E->F then we know every other thread will observe the Sequentially Consistent operations B, G and E in this same order. However, D is only a release, so it is not part of the total order. All we require is that C comes before D, and that any load on b which reads the value stored by D also sees C. That means one thread may observe A->B->C->D->G->H->E->F and another thread C->D->A->B->G->H->E->F.
There is one subtle consequence of Release-Acquire that seems unintuitive at first:
// Thread 1
atomic_store_explicit(&a, 1, memory_order_release); // A
c = atomic_load_explicit(&b, memory_order_acquire); // B
// Thread 2
atomic_store_explicit(&b, 1, memory_order_release); // C
d = atomic_load_explicit(&a, memory_order_acquire); // D
Acquire-Release guarantees no operations before a store can be moved after it, and no operations after a load can be moved before it. It says nothing about a load that comes after a store to a different variable. This means it's possible B gets re-ordered ahead of A and D ahead of C. As such, a thread can observe B->D->A->C, where both loads return whatever a and b were initialized to. Neither c nor d would have the value 1, which at first seems perplexing.
We are told Sequential Consistency fixes this problem and guarantees at least one of the two variables is set to 1. But is that true?
// Thread 1
atomic_store_explicit(&a, 1, memory_order_seq_cst); // A
c = atomic_load_explicit(&b, memory_order_seq_cst); // B
// Thread 2
atomic_store_explicit(&b, 1, memory_order_seq_cst); // C
d = atomic_load_explicit(&a, memory_order_seq_cst); // D
Recall that Sequential Consistency provides the same guarantee as Release-Acquire and also imposes a total ordering. But we just saw Release-Acquire can allow a thread to observe B->D->A->C. Surely adding a total order changes nothing about this scenario, except that it ensures everyone sees this ordering rather than possibly only one thread?
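That seems right. So what's going on? Remember, Release-Acquire is spread across two separate tags. The memory_order_release prevents prior reads and writes from being re-ordered after the store. The memory_order_acquire prevents later reads and writes from being re-ordered before the load. Neither tag stops a load from moving ahead of an earlier store to a different variable, which is exactly the re-ordering behind B->D->A->C. But Sequential Consistency is only a single tag, and the total order it imposes must agree with the program order of each thread. A Sequentially Consistent operation therefore cannot be re-ordered with another Sequentially Consistent operation in either direction, much as if it performed both a release and an acquire, in the spirit of memory_order_acq_rel (which itself is only valid on read-modify-write operations and fences).
Because no Sequentially Consistent operation can move past another, thread 1 must observe A->B and thread 2 must observe C->D, and both orderings are part of the single total order. Suppose B reads the initial value of b. Then B comes before C in the total order, which gives A->B->C->D, so D must read 1 from a. By the same argument, if D reads the initial value of a then B must read 1. Thus, we are never in a situation where both c and d have not been updated. At least one of them, and possibly both, will be 1. And whatever order is observed on one thread will be seen by everyone. This means we won't have a scenario where someone sees c as 1 and d non-1, but another sees d as 1 and c non-1.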
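The claim that c and d are never both left at their initial value can be exercised directly. The following is a minimal sketch that wraps the two threads from the example in a harness. The harness functions thread1, thread2 and both_zero_count are hypothetical names, and POSIX threads are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int a, b;   /* shared flags, as in the example */
static int c, d;          /* what each thread read */

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&a, 1, memory_order_seq_cst);   /* A */
    c = atomic_load_explicit(&b, memory_order_seq_cst);   /* B */
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&b, 1, memory_order_seq_cst);   /* C */
    d = atomic_load_explicit(&a, memory_order_seq_cst);   /* D */
    return NULL;
}

/* Runs both threads `runs` times and counts how often c and d were both 0. */
static int both_zero_count(int runs)
{
    int count = 0;
    for (int i = 0; i < runs; i++) {
        pthread_t t1, t2;
        int rc;

        atomic_store(&a, 0);
        atomic_store(&b, 0);
        rc = pthread_create(&t1, NULL, thread1, NULL);
        assert(rc == 0);
        rc = pthread_create(&t2, NULL, thread2, NULL);
        assert(rc == 0);
        (void)rc;
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        if (c == 0 && d == 0)
            count++;
    }
    return count;
}
```

Under Sequential Consistency both_zero_count should always return 0. Swapping the tags for memory_order_release and memory_order_acquire makes a non-zero count permissible, though it may take many runs on real hardware before one shows up.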
To implement Sequential Consistency, most platforms must use a full memory barrier or an instruction with an equivalent effect. A full memory barrier (or fence) is an instruction which does not allow earlier memory operations to be re-ordered after it, nor later memory operations to come before it.
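C exposes a full barrier directly as atomic_thread_fence(memory_order_seq_cst). As a rough sketch of what the barrier buys you, the two-thread example can be rewritten with relaxed loads and stores and an explicit fence between them. The function names here are hypothetical, POSIX threads are assumed, and this is an illustration of the fence rather than a description of what any particular compiler emits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int a, b;
static int c, d;

static void *fenced_thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    /* Full barrier: the store above cannot be re-ordered after the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    c = atomic_load_explicit(&b, memory_order_relaxed);
    return NULL;
}

static void *fenced_thread2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&b, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    d = atomic_load_explicit(&a, memory_order_relaxed);
    return NULL;
}

/* Counts the runs in which both loads returned the initial value 0. */
static int fenced_both_zero_count(int runs)
{
    int count = 0;
    for (int i = 0; i < runs; i++) {
        pthread_t t1, t2;
        int rc;

        atomic_store(&a, 0);
        atomic_store(&b, 0);
        rc = pthread_create(&t1, NULL, fenced_thread1, NULL);
        assert(rc == 0);
        rc = pthread_create(&t2, NULL, fenced_thread2, NULL);
        assert(rc == 0);
        (void)rc;
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        if (c == 0 && d == 0)
            count++;
    }
    return count;
}
```

The two fences are themselves placed in the single total order, so whichever fence comes second guarantees that the load after it sees the other thread's store. fenced_both_zero_count should therefore return 0, just as it would with Sequentially Consistent loads and stores.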
This can become a performance bottleneck if you are performing a lot of Sequentially Consistent operations. This happens often, since the non-explicit atomic functions are Sequentially Consistent and many people avoid the explicit functions because the memory model intimidates them.
Often Release-Acquire semantics are strong enough for what you need. Consider that a mutex provides a Release-Acquire memory order guarantee and not Sequential Consistency.
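To illustrate the mutex remark, here is a minimal sketch of a spinlock built only from an acquire on lock and a release on unlock. It is not how any particular mutex is implemented. The names lock_flag, spin_lock, spin_unlock, worker and run_two_workers are hypothetical, and POSIX threads are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long counter;   /* plain variable, protected by the lock */

static void spin_lock(void)
{
    /* Acquire: nothing in the critical section can move before this point. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire)) {
        /* busy-wait until the flag is cleared */
    }
}

static void spin_unlock(void)
{
    /* Release: nothing in the critical section can move after this point. */
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        spin_lock();
        counter++;
        spin_unlock();
    }
    return NULL;
}

/* Starts two workers and returns the final counter value. */
static long run_two_workers(long increments_each)
{
    pthread_t t1, t2;
    int rc;

    counter = 0;
    rc = pthread_create(&t1, NULL, worker, &increments_each);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, &increments_each);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

No increment is lost even though nothing here is Sequentially Consistent: run_two_workers(10000) returns 20000, because each unlock's release pairs with the next lock's acquire.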