2023-12-09 95099d4622f8cb224d94e314c7a8e0df60b13f87
kernel/Documentation/atomic_t.txt
@@ -56,6 +56,23 @@
 smp_mb__{before,after}_atomic()
 
 
+TYPES (signed vs unsigned)
+-----
+
+While atomic_t, atomic_long_t and atomic64_t use int, long and s64
+respectively (for hysterical raisins), the kernel uses -fno-strict-overflow
+(which implies -fwrapv) and defines signed overflow to behave like
+2s-complement.
+
+Therefore, an explicitly unsigned variant of the atomic ops is strictly
+unnecessary and we can simply cast, there is no UB.
+
+There was a bug in UBSAN prior to GCC-8 that would generate UB warnings for
+signed types.
+
+With this we also conform to the C/C++ _Atomic behaviour and things like
+P1236R1.
+
 
 SEMANTICS
 ---------
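
The cast described in the TYPES section above can be illustrated with a short
sketch (not part of the patch; the ucount_*() helpers are hypothetical, not
kernel API): an unsigned counter simply wraps atomic_t and casts at the
boundary, relying on the kernel-wide 2s-complement overflow semantics.

  #include <linux/atomic.h>

  /* Hypothetical wrappers, for illustration only. */
  static inline unsigned int ucount_read(const atomic_t *v)
  {
          /* Signed wrap-around is well defined under -fno-strict-overflow. */
          return (unsigned int)atomic_read(v);
  }

  static inline void ucount_add(unsigned int i, atomic_t *v)
  {
          atomic_add((int)i, v);
  }
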
@@ -64,23 +81,25 @@
 
 The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
 implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
-smp_store_release() respectively.
+smp_store_release() respectively. Therefore, if you find yourself only using
+the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
+and are doing it wrong.
 
-The one detail to this is that atomic_set{}() should be observable to the RMW
-ops. That is:
+A note for the implementation of atomic_set{}() is that it must not break the
+atomicity of the RMW ops. That is:
 
-  C atomic-set
+  C Atomic-RMW-ops-are-atomic-WRT-atomic_set
 
   {
-    atomic_set(v, 1);
+    atomic_t v = ATOMIC_INIT(1);
+  }
+
+  P0(atomic_t *v)
+  {
+    (void)atomic_add_unless(v, 1, 0);
   }
 
   P1(atomic_t *v)
-  {
-    atomic_add_unless(v, 1, 0);
-  }
-
-  P2(atomic_t *v)
   {
     atomic_set(v, 0);
   }
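
To make the "only non-RMW ops" point above concrete, a sketch (not part of
the patch; 'ready', publish() and seen() are made-up names): a variable that
is only ever loaded and stored needs no atomic_t, a plain int with
READ_ONCE()/WRITE_ONCE() expresses the same thing.

  #include <linux/compiler.h>

  static int ready;

  static void publish(void)
  {
          WRITE_ONCE(ready, 1);           /* non-RMW store */
  }

  static int seen(void)
  {
          return READ_ONCE(ready);        /* non-RMW load */
  }
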
@@ -170,8 +189,14 @@
 
 smp_mb__{before,after}_atomic()
 
-only apply to the RMW ops and can be used to augment/upgrade the ordering
-inherent to the used atomic op. These barriers provide a full smp_mb().
+only apply to the RMW atomic ops and can be used to augment/upgrade the
+ordering inherent to the op. These barriers act almost like a full smp_mb():
+smp_mb__before_atomic() orders all earlier accesses against the RMW op
+itself and all accesses following it, and smp_mb__after_atomic() orders all
+later accesses against the RMW op and all accesses preceding it. However,
+accesses between the smp_mb__{before,after}_atomic() and the RMW op are not
+ordered, so it is advisable to place the barrier right next to the RMW atomic
+op whenever possible.
 
 These helper barriers exist because architectures have varying implicit
 ordering on their SMP atomic primitives. For example our TSO architectures
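
The placement advice can be sketched as follows (illustration only, not part
of the patch; the obj->dead and obj->ref fields are made up):

  /* Good: the store is before the barrier, so it is ordered against the
   * atomic_dec() and against everything that follows it. */
  WRITE_ONCE(obj->dead, 1);
  smp_mb__before_atomic();
  atomic_dec(&obj->ref);

  /* Bad: the store sits between the barrier and the RMW op, so the barrier
   * does not order it against the atomic_dec() at all. */
  smp_mb__before_atomic();
  WRITE_ONCE(obj->dead, 1);
  atomic_dec(&obj->ref);
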
@@ -198,7 +223,9 @@
   atomic_dec(&X);
 
 is a 'typical' RELEASE pattern, the barrier is strictly stronger than
-a RELEASE. Similarly for something like:
+a RELEASE because it orders preceding instructions against both the read
+and write parts of the atomic_dec(), and against all following instructions
+as well. Similarly, something like:
 
   atomic_inc(&X);
   smp_mb__after_atomic();
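
For comparison, a sketch of the weaker RELEASE-ordered form (not part of the
patch): it orders earlier accesses against the RMW op itself, but not against
the accesses that come after it.

  /* Barrier form: stronger than RELEASE. */
  smp_mb__before_atomic();
  atomic_dec(&X);

  /* RELEASE form: earlier accesses are only ordered against the dec itself. */
  (void)atomic_fetch_dec_release(&X);
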
@@ -206,19 +233,19 @@
 is an ACQUIRE pattern (though very much not typical), but again the barrier is
 strictly stronger than ACQUIRE. As illustrated:
 
-  C strong-acquire
+  C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire
 
   {
   }
 
-  P1(int *x, atomic_t *y)
+  P0(int *x, atomic_t *y)
   {
     r0 = READ_ONCE(*x);
     smp_rmb();
     r1 = atomic_read(y);
   }
 
-  P2(int *x, atomic_t *y)
+  P1(int *x, atomic_t *y)
   {
     atomic_inc(y);
     smp_mb__after_atomic();
@@ -226,13 +253,14 @@
   }
 
   exists
-  (r0=1 /\ r1=0)
+  (0:r0=1 /\ 0:r1=0)
 
 This should not happen; but a hypothetical atomic_inc_acquire() --
 (void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
-since then:
+because it would not order the W part of the RMW against the following
+WRITE_ONCE. Thus:
 
-  P1                    P2
+  P0                    P1
 
                         t = LL.acq *y (0)
                         t++;