~hc/RK356X_SDK_RELEASE.git

..	..	@@ -36,27 +36,27 @@
36	36	defines calling convention that is compatible with C calling
37	37	convention of the linux kernel on those architectures.
38	38
39		-Q: can multiple return values be supported in the future?
	39	+Q: Can multiple return values be supported in the future?
40	40	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
41	41	A: NO. BPF allows only register R0 to be used as return value.
42	42
43		-Q: can more than 5 function arguments be supported in the future?
	43	+Q: Can more than 5 function arguments be supported in the future?
44	44	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
45	45	A: NO. BPF calling convention only allows registers R1-R5 to be used
46	46	as arguments. BPF is not a standalone instruction set.
47	47	(unlike x64 ISA that allows msft, cdecl and other conventions)
48	48
49		-Q: can BPF programs access instruction pointer or return address?
	49	+Q: Can BPF programs access instruction pointer or return address?
50	50	-----------------------------------------------------------------
51	51	A: NO.
52	52
53		-Q: can BPF programs access stack pointer ?
	53	+Q: Can BPF programs access stack pointer?
54	54	------------------------------------------
55	55	A: NO.
56	56
57	57	Only frame pointer (register R10) is accessible.
58	58	From compiler point of view it's necessary to have stack pointer.
59		-For example LLVM defines register R11 as stack pointer in its
	59	+For example, LLVM defines register R11 as stack pointer in its
60	60	BPF backend, but it makes sure that generated code never uses it.
61	61
62	62	Q: Does C-calling convention diminish possible use cases?
..	..	@@ -66,8 +66,8 @@
66	66	BPF design forces addition of major functionality in the form
67	67	of kernel helper functions and kernel objects like BPF maps with
68	68	seamless interoperability between them. It lets kernel call into
69		-BPF programs and programs call kernel helpers with zero overhead.
70		-As all of them were native C code. That is particularly the case
	69	+BPF programs and programs call kernel helpers with zero overhead,
	70	+as if all of them were native C code. That is particularly the case
71	71	for JITed BPF programs that are indistinguishable from
72	72	native kernel C code.
73	73
..	..	@@ -75,9 +75,9 @@
75	75	------------------------------------------------------------------------
76	76	A: Soft yes.
77	77
78		-At least for now until BPF core has support for
	78	+At least for now, until BPF core has support for
79	79	bpf-to-bpf calls, indirect calls, loops, global variables,
80		-jump tables, read only sections and all other normal constructs
	80	+jump tables, read-only sections, and all other normal constructs
81	81	that C code can produce.
82	82
83	83	Q: Can loops be supported in a safe way?
..	..	@@ -85,8 +85,33 @@
85	85	A: It's not clear yet.
86	86
87	87	BPF developers are trying to find a way to
88		-support bounded loops where the verifier can guarantee that
89		-the program terminates in less than 4096 instructions.
	88	+support bounded loops.
	89	+
	90	+Q: What are the verifier limits?
	91	+--------------------------------
	92	+A: The only limit known to the user space is BPF_MAXINSNS (4096).
	93	+It's the maximum number of instructions that the unprivileged bpf
	94	+program can have. The verifier has various internal limits,
	95	+such as the maximum number of instructions that can be explored during
	96	+program analysis. Currently, that limit is set to 1 million,
	97	+which essentially means that the largest program can consist
	98	+of 1 million NOP instructions. There is a limit to the maximum number
	99	+of subsequent branches, a limit to the number of nested bpf-to-bpf
	100	+calls, a limit to the number of the verifier states per instruction,
	101	+a limit to the number of maps used by the program.
	102	+All these limits can be hit with a sufficiently complex program.
	103	+There are also non-numerical limits that can cause the program
	104	+to be rejected. The verifier used to recognize only pointer + constant
	105	+expressions. Now it can recognize pointer + bounded_register.
	106	+bpf_map_lookup_elem(key) had a requirement that 'key' must be
	107	+a pointer to the stack. Now, 'key' can be a pointer to map value.
	108	+The verifier is steadily getting 'smarter'. The limits are
	109	+being removed. The only way to know that the program is going to
	110	+be accepted by the verifier is to try to load it.
	111	+The bpf development process guarantees that the future kernel
	112	+versions will accept all bpf programs that were accepted by
	113	+the earlier versions.
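
The "pointer + bounded_register" pattern mentioned in this answer can be illustrated with a minimal sketch. This is plain user-space C that only models the shape of the check, not a loadable BPF program; the names `SKETCH_VALUE_SIZE` and `read_value_byte` are hypothetical. Whether the verifier accepts a real program written this way still depends on the kernel version and on the code the compiler generates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size of the buffer being indexed (assumption for this sketch). */
#define SKETCH_VALUE_SIZE 16

/*
 * Model of a "pointer + bounded register" access: the index is compared
 * against a constant bound before it is added to the base pointer, so the
 * resulting access is provably inside the buffer.
 * Returns the byte at value[idx], or -1 when idx is out of range.
 */
static int read_value_byte(const uint8_t *value, uint32_t idx)
{
	if (idx >= SKETCH_VALUE_SIZE)
		return -1;	/* out-of-range index is never dereferenced */

	return value[idx];	/* base pointer + bounded index */
}
```

Without the explicit comparison, the index would be unbounded from an analyzer's point of view, which is the kind of non-numerical limit the answer describes.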
	114	+
90	115
91	116	Instruction level questions
92	117	---------------------------
..	..	@@ -109,16 +134,16 @@
109	134	A: This was necessary to avoid introducing flags into ISA which are
110	135	impossible to make generic and efficient across CPU architectures.
111	136
112		-Q: why BPF_DIV instruction doesn't map to x64 div?
	137	+Q: Why doesn't BPF_DIV instruction map to x64 div?
113	138	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
114	139	A: Because if we picked one-to-one relationship to x64 it would have made
115	140	it more complicated to support on arm64 and other archs. Also it
116	141	needs div-by-zero runtime check.
117	142
118		-Q: why there is no BPF_SDIV for signed divide operation?
	143	+Q: Why is there no BPF_SDIV for signed divide operation?
119	144	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
120	145	A: Because it would be rarely used. llvm errors in such case and
121		-prints a suggestion to use unsigned divide instead
	146	+prints a suggestion to use unsigned divide instead.
122	147
123	148	Q: Why BPF has implicit prologue and epilogue?
124	149	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
..	..	@@ -147,22 +172,41 @@
147	172	CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
148	173	be added to BPF in the future?
149	174
150		-A: NO. The first thing to improve performance on 32-bit archs is to teach
151		-LLVM to generate code that uses 32-bit subregisters. Then second step
152		-is to teach verifier to mark operations where zero-ing upper bits
153		-is unnecessary. Then JITs can take advantage of those markings and
154		-drastically reduce size of generated code and improve performance.
	175	+A: NO.
	176	+
	177	+But some optimizations on zero-ing the upper 32 bits for BPF registers are
	178	+available, and can be leveraged to improve the performance of JITed BPF
	179	+programs for 32-bit architectures.
	180	+
	181	+Starting with version 7, LLVM is able to generate instructions that operate
	182	+on 32-bit subregisters, provided the option -mattr=+alu32 is passed for
	183	+compiling a program. Furthermore, the verifier can now mark the
	184	+instructions for which zero-ing the upper bits of the destination register
	185	+is required, and insert an explicit zero-extension (zext) instruction
	186	+(a mov32 variant). This means that for architectures without zext hardware
	187	+support, the JIT back-ends do not need to clear the upper bits for
	188	+subregisters written by alu32 instructions or narrow loads. Instead, the
	189	+back-ends simply need to support code generation for that mov32 variant,
	190	+and to override bpf_jit_needs_zext() to make it return "true" (in order to
	191	+enable zext insertion in the verifier).
	192	+
	193	+Note that it is possible for a JIT back-end to have partial hardware
	194	+support for zext. In that case, if verifier zext insertion is enabled,
	195	+it could lead to the insertion of unnecessary zext instructions. Such
	196	+instructions could be removed by creating a simple peephole inside the JIT
	197	+back-end: if one instruction has hardware support for zext and if the next
	198	+instruction is an explicit zext, then the latter can be skipped when doing
	199	+the code generation.
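
The peephole described in this note can be sketched roughly as follows. This is a simplified user-space model, not code from any JIT back-end: `struct sketch_insn` and `plan_emission` are hypothetical names, a real back-end operates on `struct bpf_insn`, and the check that both instructions write the same destination register is an assumption added here for safety.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record used only in this sketch. */
struct sketch_insn {
	bool hw_zext;		/* native code for this insn already clears the upper 32 bits */
	bool explicit_zext;	/* verifier-inserted zext (the mov32 variant) */
	int dst_reg;		/* destination register number */
};

/*
 * Decide which instructions need code generation. An explicit zext is
 * skipped when the instruction right before it has hardware zext support
 * and (assumption) writes the same destination register.
 * Sets emit[i] for each instruction and returns how many are emitted.
 */
static size_t plan_emission(const struct sketch_insn *insns, size_t n, bool *emit)
{
	size_t emitted = 0;

	for (size_t i = 0; i < n; i++) {
		bool redundant = i > 0 &&
				 insns[i].explicit_zext &&
				 insns[i - 1].hw_zext &&
				 insns[i - 1].dst_reg == insns[i].dst_reg;

		emit[i] = !redundant;
		if (emit[i])
			emitted++;
	}
	return emitted;
}
```

The sketch only looks one instruction back, which matches the "next instruction" wording of the note and keeps the check cheap enough to run during code generation.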
155	200
156	201	Q: Does BPF have a stable ABI?
157	202	------------------------------
158	203	A: YES. BPF instructions, arguments to BPF programs, set of helper
159	204	functions and their arguments, recognized return codes are all part
160		-of ABI. However when tracing programs are using bpf_probe_read() helper
161		-to walk kernel internal datastructures and compile with kernel
162		-internal headers these accesses can and will break with newer
163		-kernels. The union bpf_attr -> kern_version is checked at load time
164		-to prevent accidentally loading kprobe-based bpf programs written
165		-for a different kernel. Networking programs don't do kern_version check.
	205	+of ABI. However, there is one specific exception for tracing programs
	206	+which are using helpers like bpf_probe_read() to walk kernel internal
	207	+data structures and compile with kernel internal headers. Both of these
	208	+kernel internals are subject to change and can break with newer kernels
	209	+such that the program needs to be adapted accordingly.
166	210
167	211	Q: How much stack space a BPF program uses?
168	212	-------------------------------------------
..	..	@@ -201,17 +245,6 @@
201	245	program is loaded the kernel will print warning message, so
202	246	this helper is only useful for experiments and prototypes.
203	247	Tracing BPF programs are root only.
204		-
205		-Q: bpf_trace_printk() helper warning
206		-------------------------------------
207		-Q: When bpf_trace_printk() helper is used the kernel prints nasty
208		-warning message. Why is that?
209		-
210		-A: This is done to nudge program authors into better interfaces when
211		-programs need to pass data to user space. Like bpf_perf_event_output()
212		-can be used to efficiently stream data via perf ring buffer.
213		-BPF maps can be used for asynchronous data sharing between kernel
214		-and user space. bpf_trace_printk() should only be used for debugging.
215	248
216	249	Q: New functionality via kernel modules?
217	250	----------------------------------------