A mistake was identified by Ariel Ben-Yehuda in December 2024 (see https://github.com/rust-lang/unsafe-code-guidelines/issues/548#issuecomment-2549734323 ). We summarize the issue and its fix below.
The paper claims that register promotion is sound. This claim is incorrect as stated.
Consider the following program, together with the annotated outcome
(a = 2 && b = 0 && c = 0):
| Thread 1 | Thread 2 | Thread 3 |
|---|---|---|
e1: a := x_acq // 2 e2: b := y_sc // 0 |
e3: x :=_sc 1 e4: unused := z_na e5: x :=_sc 2 |
e6: y :=_sc 1 e7: c := x_sc // 0 |
In the model as given in the paper, this outcome is disallowed, since it can only be justified by a PSC cycle:
e2 -> e6 (via rb)
e6 -> e7 (via sb)
e7 -> e3 (via rb)
e3 -> e2 (via sb|!=loc from e3 to e4,
hb from e4 to e1,
and sb|!=loc from e1 to e2)
Together these edges form a PSC cycle, which is forbidden by the PSC acyclicity requirement.
However, since z is accessed by only one thread, it should be
possible to promote it into a register (say d), yielding the
program:
| Thread 1 | Thread 2 | Thread 3 |
|---|---|---|
e1: a := x_acq // 2 e2: b := y_sc // 0 |
e3: x :=_sc 1 unused := d e5: x :=_sc 2 |
e6: y :=_sc 1 e7: c := x_sc // 0 |
In this transformed program, the annotated behavior becomes allowed. In
particular, the PSC edge from e3 to e2 can no longer be
derived, since the first sb|!=loc relation used in the original
argument is no longer present.
This is a real issue: compilers do perform such promotions and related transformations for local (non-atomic) reads.
In the full version of the paper, the issue can be traced to Lemma I.12, whose proof states that the result “easily follows from our definitions”.
Our suggested repair is to give up elimination of SC accesses
(see Section 2.1.1) and to remove the sb|!=loc restriction from PSC,
i.e., to use:
(sb \ rmw) ; hb ; (sb \ rmw)
instead of:
sb|!=loc ; hb ; sb|!=loc
Removing rmw edges is needed since the formal model in the paper
(starting from Section 3) represents RMWs using two events (see Z6.U in Figure 7).
This change means that the model no longer supports elimination of repeated SC accesses, e.g.:
x :=_sc 1 ; x :=_sc 2 ~> x :=_sc 2 a := x_sc ; b := x_sc ~> a := x_sc ; b := a
To the best of our knowledge, such transformations are not performed by
mainstream compilers. Moreover, the C/C++ committee was not favorable toward
sb|!=loc, and the solution adopted in the C++20 standard already uses
sb rather than sb|!=loc in the corresponding constraints
(see
https://en.cppreference.com/w/cpp/atomic/memory_order.html
).
With sb \ rmw instead of sb|!=loc, Lemma I.12 should hold
and register promotion is sound again. A mechanized proof would, of course, be
beneficial.
While the C++ committee largely followed the approach proposed in this paper,
the C++20 memory model still exhibits issues at the time of writing this
correction. These stem from additional simplifications made
relative to the paper’s model. In particular, the standard appears to rely on
PSC_strong as described in Remark 4; however, that remark already
notes that this choice breaks the standard mapping to x86. This issue is known to
Hans Boehm, and work is ongoing to align the standard with the paper’s model
(see
https://cplusplus.github.io/LWG/issue3941
).