memory_order_release, memory_order_acquire and memory_order_acq_rel. We have seen memory_order_release before. It was used on the store-side of Release-Consume. The new memory_order_acquire tag is used on loads to force Release-Acquire semantics. As the name implies, memory_order_acq_rel specifies both the release and acquire tags for a single operation. We will deal with memory_order_acq_rel at the end. Our main focus is the release and acquire tags.
Release-Acquire semantics guarantee the following. If thread A performs some write operations (which may be non-atomic or relaxed atomic) before performing an atomic store on a variable M with the memory_order_release tag, then all of those writes will be visible, in the same order, to any thread that loads M with the memory_order_acquire tag and observes the value that A stored.
This is much stronger than Release-Consume, which only guarantees visibility of writes that carry a dependency to M and nothing else.
Release-Acquire synchronizes the two threads on a much more complete level. And frankly, a level that is much easier for programmers to reason about.
Let's look at an example:
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdatomic.h>
#include <pthread.h>

static uint32_t a = 1;
static uint32_t b = 2;
static atomic_uint c = 3;

static void *producer(void *arg) {
    a = 4; // A
    b = 5; // B
    atomic_store_explicit(&c, b, memory_order_release); // C
    return NULL;
}

static void *consumer(void *arg) {
    uint32_t cval;
    while (true) {
        cval = atomic_load_explicit(&c, memory_order_acquire);
        if (cval != 3) {
            break;
        }
    }
    uint32_t aval = a;
    uint32_t bval = b;
    printf("Consumer read: a=%u, b=%u, c=%u\n", aval, bval, cval);
    return NULL;
}

int main(int argc, const char **argv) {
    pthread_t thread1, thread2;
    pthread_create(&thread1, NULL, consumer, NULL);
    pthread_create(&thread2, NULL, producer, NULL);
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
    return 0;
}
Let's put this code in a file named release_acquire.c and verify it works:
$ gcc -Wall -o release_acquire release_acquire.c -lpthread
$ ./release_acquire
Consumer read: a=4, b=5, c=5
In this case both a and b are regular non-atomic variables. The producer writes to both of them and then does an atomic store on c. The consumer, on the other hand, atomically loads c in a loop until it sees that the value has changed (indicating the producer has done the store, completing the Release-Acquire transaction). After that point, it reads a and b non-atomically and then prints out all three values.
Because of Release-Acquire, the writes to a and b in the producer become visible to the consumer after the successful load. Both threads see the same order A->B->C. So it is completely safe for the consumer to now read the value of these non-atomic variables directly.
This probably feels wrong to people who are used to thinking of atomics as operations that only provide an atomic guarantee to a specific operation and have no bearing on anything else. But atomics provide synchronization guarantees on top of that, like this.
Notice that operation A does not carry a dependency to the atomic store at all. Yet the consumer is still guaranteed to observe it. If we changed the consumer's tag to memory_order_consume, then it would be guaranteed to see B->C in the original order, but absolutely no guarantees about A would be made. This means such a consumer may read the value of a as 1 or 4, or even in an intermediate state if operation A were carried out as multiple machine instructions. Formally, that unsynchronized read of a would be a data race.
Like Release-Consume, these memory order guarantees are only made between the releasing and acquiring threads. If a thread does not atomically acquire c then memory ordering with respect to the three variables is undefined for it.
It is sometimes helpful to also think of Release-Acquire as forming a memory barrier (often called a memory fence) at the store and load points. Everything before the store cannot be re-ordered after the store because the acquiring thread must observe the same memory order. On the flip side, no operations which occur after the load can be re-ordered before it.
This last point feels redundant. No operations after the load can be re-ordered before it. But this all happens in a single thread, which already appears to execute in program order by default.
I will try to justify this as best I can. Admittedly, I was unable to find clarification here and this is the only reasoning I could come up with on my own.
Consider a read-modify-write atomic operation like atomic_fetch_add, which adds the incoming value to the variable and returns the variable's previous value, all as a single atomic operation. Suppose we had the following (using the _explicit variants, which accept a memory order tag):
// Assumed declarations: atomic_int a; int b, d;
// Thread 1
a = 1; // A
b = 2; // B
int c = atomic_fetch_add_explicit(&a, 1, memory_order_release); // C
b = 3; // D
// Thread 2
if (atomic_load_explicit(&a, memory_order_acquire) != 2)
    d = b;
The release of operation C ensures thread 2 will observe A and B when acquiring C. This means A and B cannot be re-ordered after C. But it is perfectly legal to re-order D before C since the two operations are independent. That is, from the POV of thread 1, nothing appears re-ordered. From thread 2's perspective, the value of d may either be 2 or 3, because it may either observe A->B->C->D or something like A->B->D->C. The only guarantee is that A comes before B, which comes before C. D could be placed anywhere after B.
If we then flipped this to use memory_order_acquire instead, what would happen?
// Thread 1
a = 1; // A
b = 2; // B
int c = atomic_fetch_add_explicit(&a, 1, memory_order_acquire); // C
b = 3; // D
// Thread 2
if (atomic_load_explicit(&a, memory_order_acquire) != 2)
    d = b;
The acquire of operation C ensures operation D does not happen before C. But it makes no guarantees about A and B relative to C. Since C depends on A, this memory order will be preserved by default from thread 1's POV, but B may happen after C. From thread 2's perspective, it may see the value of b as either 2 or whatever its initial state was because B may happen after its load. But it will be guaranteed that operation D has not happened yet, so it will not see the value of b as 3.
This subtlety is the reason why, as far as I can tell, memory_order_acq_rel exists: so that you can have a synchronization point that guarantees, from the vantage point of another thread, that operations before it do not get re-ordered after it and operations after it do not get re-ordered before it. Sticking with our example, the value of d now has one possibility: it must be 2:
// Thread 1
a = 1; // A
b = 2; // B
int c = atomic_fetch_add_explicit(&a, 1, memory_order_acq_rel); // C
b = 3; // D
// Thread 2
if (atomic_load_explicit(&a, memory_order_acquire) != 2)
    d = b;
I am uncertain of this explanation. If anyone can correct me on this point, that would be much appreciated.
For the majority of cases, Release-Acquire semantics are likely all you need. But there is still one more memory order, which is even stronger.