.. | ..
56 | 56 | smp_mb__{before,after}_atomic()
57 | 57 |
58 | 58 |
| 59 | +TYPES (signed vs unsigned)
| 60 | +-----
| 61 | +
| 62 | +While atomic_t, atomic_long_t and atomic64_t use int, long and s64
| 63 | +respectively (for hysterical raisins), the kernel uses -fno-strict-overflow
| 64 | +(which implies -fwrapv) and defines signed overflow to behave like
| 65 | +2s-complement.
| 66 | +
---|
| 67 | +Therefore, an explicitly unsigned variant of the atomic ops is strictly
| 68 | +unnecessary and we can simply cast; there is no UB.
| 69 | +
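For illustration of the cast argument, a minimal sketch (the wrapper below is
hypothetical, not an existing kernel interface; atomic_add_return() is the
real signed op being wrapped):

	/*
	 * Hypothetical wrapper: with -fno-strict-overflow the signed op
	 * wraps like 2s-complement, so these casts introduce no UB.
	 */
	static inline unsigned int atomic_add_return_unsigned(unsigned int i,
							      atomic_t *v)
	{
		return (unsigned int)atomic_add_return((int)i, v);
	}
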
---|
| 70 | +There was a bug in UBSAN prior to GCC-8 that would generate UB warnings for
| 71 | +signed types.
| 72 | +
| 73 | +With this we also conform to the C/C++ _Atomic behaviour and things like
| 74 | +P1236R1.
| 75 | +
59 | 76 |
---|
60 | 77 | SEMANTICS
61 | 78 | ---------
.. | ..
64 | 81 |
65 | 82 | The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
66 | 83 | implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
---|
67 | | -smp_store_release() respectively.
| 84 | +smp_store_release() respectively. Therefore, if you find yourself only using
| 85 | +the non-RMW operations of atomic_t, you do not in fact need atomic_t at all
| 86 | +and are doing it wrong.
68 | 87 |
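As a sketch of the point above (variable and function names are made up for
illustration): a plain variable with the usual annotations expresses the same
thing without atomic_t:

	/* A flag that is only ever loaded and stored, never RMW'ed. */
	int ready;

	/* writer */
	WRITE_ONCE(ready, 1);		/* instead of atomic_set()  */

	/* reader */
	if (READ_ONCE(ready))		/* instead of atomic_read() */
		do_something();
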
---|
69 | | -The one detail to this is that atomic_set{}() should be observable to the RMW
70 | | -ops. That is:
| 88 | +A note on the implementation of atomic_set{}(): it must not break the
| 89 | +atomicity of the RMW ops. That is:
71 | 90 |
72 | | -	C atomic-set
| 91 | +	C Atomic-RMW-ops-are-atomic-WRT-atomic_set
73 | 92 |
---|
74 | 93 | 	{
75 | | -		atomic_set(v, 1);
| 94 | +		atomic_t v = ATOMIC_INIT(1);
| 95 | +	}
| 96 | +
| 97 | +	P0(atomic_t *v)
| 98 | +	{
| 99 | +		(void)atomic_add_unless(v, 1, 0);
76 | 100 | 	}
77 | 101 |
78 | 102 | 	P1(atomic_t *v)
79 | | -	{
80 | | -		atomic_add_unless(v, 1, 0);
81 | | -	}
82 | | -
83 | | -	P2(atomic_t *v)
84 | 103 | 	{
85 | 104 | 		atomic_set(v, 0);
86 | 105 | 	}
---|
.. | ..
170 | 189 |
171 | 190 | smp_mb__{before,after}_atomic()
172 | 191 |
---|
173 | | -only apply to the RMW ops and can be used to augment/upgrade the ordering
174 | | -inherent to the used atomic op. These barriers provide a full smp_mb().
| 192 | +only apply to the RMW atomic ops and can be used to augment/upgrade the
| 193 | +ordering inherent to the op. These barriers act almost like a full smp_mb():
| 194 | +smp_mb__before_atomic() orders all earlier accesses against the RMW op
| 195 | +itself and all accesses following it, and smp_mb__after_atomic() orders all
| 196 | +later accesses against the RMW op and all accesses preceding it. However,
| 197 | +accesses between the smp_mb__{before,after}_atomic() and the RMW op are not
| 198 | +ordered, so it is advisable to place the barrier right next to the RMW atomic
| 199 | +op whenever possible.
175 | 200 |
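To make the placement advice concrete, a small sketch (X and Y are
illustrative variables, not taken from the document):

	/* Good: barrier immediately adjacent to the RMW op it upgrades. */
	smp_mb__before_atomic();
	atomic_inc(&X);

	/*
	 * Dubious: the store to Y sits between the barrier and the RMW op
	 * and therefore gains no ordering from the barrier.
	 */
	smp_mb__before_atomic();
	WRITE_ONCE(Y, 1);
	atomic_inc(&X);
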
---|
176 | 201 | These helper barriers exist because architectures have varying implicit
177 | 202 | ordering on their SMP atomic primitives. For example our TSO architectures
.. | ..
198 | 223 | 	atomic_dec(&X);
199 | 224 |
---|
200 | 225 | is a 'typical' RELEASE pattern, the barrier is strictly stronger than
201 | | -a RELEASE. Similarly for something like:
| 226 | +a RELEASE because it orders preceding instructions against both the read
| 227 | +and write parts of the atomic_dec(), and against all following instructions
| 228 | +as well. Similarly, something like:
202 | 229 |
---|
203 | 230 | 	atomic_inc(&X);
204 | 231 | 	smp_mb__after_atomic();
.. | ..
206 | 233 | is an ACQUIRE pattern (though very much not typical), but again the barrier is
207 | 234 | strictly stronger than ACQUIRE. As illustrated:
208 | 235 |
---|
209 | | -	C strong-acquire
| 236 | +	C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire
210 | 237 |
211 | 238 | 	{
212 | 239 | 	}
213 | 240 |
214 | | -	P1(int *x, atomic_t *y)
| 241 | +	P0(int *x, atomic_t *y)
215 | 242 | 	{
216 | 243 | 		r0 = READ_ONCE(*x);
217 | 244 | 		smp_rmb();
218 | 245 | 		r1 = atomic_read(y);
219 | 246 | 	}
220 | 247 |
221 | | -	P2(int *x, atomic_t *y)
| 248 | +	P1(int *x, atomic_t *y)
222 | 249 | 	{
223 | 250 | 		atomic_inc(y);
224 | 251 | 		smp_mb__after_atomic();
---|
.. | ..
226 | 253 | 	}
227 | 254 |
228 | 255 | 	exists
229 | | -	(r0=1 /\ r1=0)
| 256 | +	(0:r0=1 /\ 0:r1=0)
230 | 257 |
---|
231 | 258 | This should not happen; but a hypothetical atomic_inc_acquire() --
232 | 259 | (void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
233 | | -since then:
| 260 | +because it would not order the W part of the RMW against the following
| 261 | +WRITE_ONCE. Thus:
234 | 262 |
235 | | -	P1		P2
| 263 | +	P0		P1
236 | 264 |
237 | 265 | 			t = LL.acq *y (0)
238 | 266 | 			t++;