...
smp_mb__{before,after}_atomic()


+TYPES (signed vs unsigned)
+-----
+
+While atomic_t, atomic_long_t and atomic64_t use int, long and s64
+respectively (for hysterical raisins), the kernel uses -fno-strict-overflow
+(which implies -fwrapv) and defines signed overflow to behave like
+2s-complement.
+
+Therefore, an explicitly unsigned variant of the atomic ops is strictly
+unnecessary and we can simply cast; there is no UB.
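+
+For example, a caller that wants unsigned wrap-around semantics can simply
+cast the value; the helper below is only an illustrative sketch (the name is
+made up, it is not existing kernel API):
+
+  /* read an atomic_t as an unsigned 32bit counter */
+  static inline unsigned int foo_count_read(atomic_t *v)
+  {
+    return (unsigned int)atomic_read(v);
+  }
+
+The cast is well defined exactly because the underlying int is guaranteed to
+wrap like 2s-complement, as described above.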
+
+There was a bug in UBSAN prior to GCC-8 that would generate UB warnings for
+signed types.
+
+With this we also conform to the C/C++ _Atomic behaviour and things like
+P1236R1.
+

SEMANTICS
---------
...

The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
-smp_store_release() respectively.
+smp_store_release() respectively. Therefore, if you find yourself only using
+the non-RMW operations of atomic_t, you do not in fact need atomic_t at all
+and are doing it wrong.
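+
+Roughly speaking, the generic versions of these non-RMW ops boil down to no
+more than the following sketch (the real kernel definitions add
+instrumentation and allow per-architecture overrides):
+
+  #define atomic_read(v)            READ_ONCE((v)->counter)
+  #define atomic_set(v, i)          WRITE_ONCE((v)->counter, (i))
+  #define atomic_read_acquire(v)    smp_load_acquire(&(v)->counter)
+  #define atomic_set_release(v, i)  smp_store_release(&(v)->counter, (i))
+
+which is why plain loads and stores of an atomic_t buy nothing over a plain
+int accessed with READ_ONCE()/WRITE_ONCE().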

-The one detail to this is that atomic_set{}() should be observable to the RMW
-ops. That is:
+A note for the implementation of atomic_set{}() is that it must not break the
+atomicity of the RMW ops. That is:

-  C atomic-set
+  C Atomic-RMW-ops-are-atomic-WRT-atomic_set

  {
-    atomic_set(v, 1);
+    atomic_t v = ATOMIC_INIT(1);
+  }
+
+  P0(atomic_t *v)
+  {
+    (void)atomic_add_unless(v, 1, 0);
  }

  P1(atomic_t *v)
-  {
-    atomic_add_unless(v, 1, 0);
-  }
-
-  P2(atomic_t *v)
  {
    atomic_set(v, 0);
  }
...

smp_mb__{before,after}_atomic()

-only apply to the RMW ops and can be used to augment/upgrade the ordering
-inherent to the used atomic op. These barriers provide a full smp_mb().
+only apply to the RMW atomic ops and can be used to augment/upgrade the
+ordering inherent to the op. These barriers act almost like a full smp_mb():
+smp_mb__before_atomic() orders all earlier accesses against the RMW op
+itself and all accesses following it, and smp_mb__after_atomic() orders all
+later accesses against the RMW op and all accesses preceding it. However,
+accesses between the smp_mb__{before,after}_atomic() and the RMW op are not
+ordered, so it is advisable to place the barrier right next to the RMW atomic
+op whenever possible.
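+
+As a usage sketch of that placement advice (the structure and its members are
+made up for illustration):
+
+  WRITE_ONCE(obj->done, 1);   /* ordered before the dec and all later accesses */
+  smp_mb__before_atomic();    /* kept adjacent to the RMW op */
+  atomic_dec(&obj->refs);
+
+rather than letting unrelated accesses slip in between the barrier and the
+atomic_dec(), where their ordering would not be guaranteed.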

These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example our TSO architectures
...
  atomic_dec(&X);

is a 'typical' RELEASE pattern, the barrier is strictly stronger than
-a RELEASE. Similarly for something like:
+a RELEASE because it orders preceding instructions against both the read
+and write parts of the atomic_dec(), and against all following instructions
+as well. Similarly, something like:

  atomic_inc(&X);
  smp_mb__after_atomic();
...
is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:

-  C strong-acquire
+  C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire

  {
  }

-  P1(int *x, atomic_t *y)
+  P0(int *x, atomic_t *y)
  {
    r0 = READ_ONCE(*x);
    smp_rmb();
    r1 = atomic_read(y);
  }

-  P2(int *x, atomic_t *y)
+  P1(int *x, atomic_t *y)
  {
    atomic_inc(y);
    smp_mb__after_atomic();
...
  }

  exists
-  (r0=1 /\ r1=0)
+  (0:r0=1 /\ 0:r1=0)

This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
-since then:
+because it would not order the W part of the RMW against the following
+WRITE_ONCE. Thus:

-  P1                    P2
+  P0                    P1

                        t = LL.acq *y (0)
                        t++;
| 238 | 266 | t++; |
|---|