2023-12-09 95099d4622f8cb224d94e314c7a8e0df60b13f87
kernel/Documentation/atomic_t.txt
@@ -56,6 +56,23 @@
 smp_mb__{before,after}_atomic()
 
 
+TYPES (signed vs unsigned)
+-----
+
+While atomic_t, atomic_long_t and atomic64_t use int, long and s64
+respectively (for hysterical raisins), the kernel uses -fno-strict-overflow
+(which implies -fwrapv) and defines signed overflow to behave like
+2s-complement.
+
+Therefore, an explicitly unsigned variant of the atomic ops is strictly
+unnecessary and we can simply cast, there is no UB.
+
+There was a bug in UBSAN prior to GCC-8 that would generate UB warnings for
+signed types.
+
+With this we also conform to the C/C++ _Atomic behaviour and things like
+P1236R1.
+
 
 SEMANTICS
 ---------
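
The cast described in the TYPES section above can be illustrated with a short
sketch (not part of the patch; the ucount_*() helpers are hypothetical, not
kernel API): an unsigned counter simply wraps atomic_t and casts at the
boundary, relying on the kernel-wide 2s-complement overflow semantics.

  #include <linux/atomic.h>

  /* Hypothetical wrappers, for illustration only. */
  static inline unsigned int ucount_read(const atomic_t *v)
  {
          /* Signed wrap-around is well defined under -fno-strict-overflow. */
          return (unsigned int)atomic_read(v);
  }

  static inline void ucount_add(unsigned int i, atomic_t *v)
  {
          atomic_add((int)i, v);
  }
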
@@ -64,23 +81,25 @@
 
 The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
 implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
-smp_store_release() respectively.
+smp_store_release() respectively. Therefore, if you find yourself only using
+the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
+and are doing it wrong.
 
-The one detail to this is that atomic_set{}() should be observable to the RMW
-ops. That is:
+A note for the implementation of atomic_set{}() is that it must not break the
+atomicity of the RMW ops. That is:
 
-  C atomic-set
+  C Atomic-RMW-ops-are-atomic-WRT-atomic_set
 
   {
-    atomic_set(v, 1);
+    atomic_t v = ATOMIC_INIT(1);
+  }
+
+  P0(atomic_t *v)
+  {
+    (void)atomic_add_unless(v, 1, 0);
   }
 
   P1(atomic_t *v)
-  {
-    atomic_add_unless(v, 1, 0);
-  }
-
-  P2(atomic_t *v)
   {
     atomic_set(v, 0);
   }
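
To make the "only non-RMW ops" point above concrete, a sketch (not part of
the patch; 'ready', publish() and seen() are made-up names): a variable that
is only ever loaded and stored needs no atomic_t, a plain int with
READ_ONCE()/WRITE_ONCE() expresses the same thing.

  #include <linux/compiler.h>

  static int ready;

  static void publish(void)
  {
          WRITE_ONCE(ready, 1);           /* non-RMW store */
  }

  static int seen(void)
  {
          return READ_ONCE(ready);        /* non-RMW load */
  }
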
@@ -170,8 +189,14 @@
 
 smp_mb__{before,after}_atomic()
 
-only apply to the RMW ops and can be used to augment/upgrade the ordering
-inherent to the used atomic op. These barriers provide a full smp_mb().
+only apply to the RMW atomic ops and can be used to augment/upgrade the
+ordering inherent to the op. These barriers act almost like a full smp_mb():
+smp_mb__before_atomic() orders all earlier accesses against the RMW op
+itself and all accesses following it, and smp_mb__after_atomic() orders all
+later accesses against the RMW op and all accesses preceding it. However,
+accesses between the smp_mb__{before,after}_atomic() and the RMW op are not
+ordered, so it is advisable to place the barrier right next to the RMW atomic
+op whenever possible.
 
 These helper barriers exist because architectures have varying implicit
 ordering on their SMP atomic primitives. For example our TSO architectures
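
The placement advice can be sketched as follows (illustration only, not part
of the patch; the obj->dead and obj->ref fields are made up):

  /* Good: the store is before the barrier, so it is ordered against the
   * atomic_dec() and against everything that follows it. */
  WRITE_ONCE(obj->dead, 1);
  smp_mb__before_atomic();
  atomic_dec(&obj->ref);

  /* Bad: the store sits between the barrier and the RMW op, so the barrier
   * does not order it against the atomic_dec() at all. */
  smp_mb__before_atomic();
  WRITE_ONCE(obj->dead, 1);
  atomic_dec(&obj->ref);
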
@@ -198,7 +223,9 @@
   atomic_dec(&X);
 
 is a 'typical' RELEASE pattern, the barrier is strictly stronger than
-a RELEASE. Similarly for something like:
+a RELEASE because it orders preceding instructions against both the read
+and write parts of the atomic_dec(), and against all following instructions
+as well. Similarly, something like:
 
   atomic_inc(&X);
   smp_mb__after_atomic();
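
For comparison, a sketch of the weaker RELEASE-ordered form (not part of the
patch): it orders earlier accesses against the RMW op itself, but not against
the accesses that come after it.

  /* Barrier form: stronger than RELEASE. */
  smp_mb__before_atomic();
  atomic_dec(&X);

  /* RELEASE form: earlier accesses are only ordered against the dec itself. */
  (void)atomic_fetch_dec_release(&X);
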
@@ -206,19 +233,19 @@
 is an ACQUIRE pattern (though very much not typical), but again the barrier is
 strictly stronger than ACQUIRE. As illustrated:
 
-  C strong-acquire
+  C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire
 
   {
   }
 
-  P1(int *x, atomic_t *y)
+  P0(int *x, atomic_t *y)
   {
     r0 = READ_ONCE(*x);
     smp_rmb();
     r1 = atomic_read(y);
   }
 
-  P2(int *x, atomic_t *y)
+  P1(int *x, atomic_t *y)
   {
     atomic_inc(y);
     smp_mb__after_atomic();
@@ -226,13 +253,14 @@
   }
 
   exists
-  (r0=1 /\ r1=0)
+  (0:r0=1 /\ 0:r1=0)
 
 This should not happen; but a hypothetical atomic_inc_acquire() --
 (void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
-since then:
+because it would not order the W part of the RMW against the following
+WRITE_ONCE. Thus:
 
-  P1                    P2
+  P0                    P1
 
                         t = LL.acq *y (0)
                         t++;