hc
2023-12-11 d2ccde1c8e90d38cee87a1b0309ad2827f3fd30d
kernel/Documentation/bpf/bpf_design_QA.rst
@@ -36,27 +36,27 @@
 defines calling convention that is compatible with C calling
 convention of the linux kernel on those architectures.
 
-Q: can multiple return values be supported in the future?
+Q: Can multiple return values be supported in the future?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A: NO. BPF allows only register R0 to be used as return value.
 
-Q: can more than 5 function arguments be supported in the future?
+Q: Can more than 5 function arguments be supported in the future?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A: NO. BPF calling convention only allows registers R1-R5 to be used
 as arguments. BPF is not a standalone instruction set.
 (unlike x64 ISA that allows msft, cdecl and other conventions)
 
-Q: can BPF programs access instruction pointer or return address?
+Q: Can BPF programs access instruction pointer or return address?
 -----------------------------------------------------------------
 A: NO.
 
-Q: can BPF programs access stack pointer ?
+Q: Can BPF programs access stack pointer ?
 ------------------------------------------
 A: NO.
 
 Only frame pointer (register R10) is accessible.
 From compiler point of view it's necessary to have stack pointer.
-For example LLVM defines register R11 as stack pointer in its
+For example, LLVM defines register R11 as stack pointer in its
 BPF backend, but it makes sure that generated code never uses it.
 
 Q: Does C-calling convention diminishes possible use cases?
@@ -66,8 +66,8 @@
 BPF design forces addition of major functionality in the form
 of kernel helper functions and kernel objects like BPF maps with
 seamless interoperability between them. It lets kernel call into
-BPF programs and programs call kernel helpers with zero overhead.
-As all of them were native C code. That is particularly the case
+BPF programs and programs call kernel helpers with zero overhead,
+as all of them were native C code. That is particularly the case
 for JITed BPF programs that are indistinguishable from
 native kernel C code.
 
@@ -75,9 +75,9 @@
 ------------------------------------------------------------------------
 A: Soft yes.
 
-At least for now until BPF core has support for
+At least for now, until BPF core has support for
 bpf-to-bpf calls, indirect calls, loops, global variables,
-jump tables, read only sections and all other normal constructs
+jump tables, read-only sections, and all other normal constructs
 that C code can produce.
 
 Q: Can loops be supported in a safe way?
@@ -85,8 +85,33 @@
 A: It's not clear yet.
 
 BPF developers are trying to find a way to
-support bounded loops where the verifier can guarantee that
-the program terminates in less than 4096 instructions.
+support bounded loops.
+
+Q: What are the verifier limits?
+--------------------------------
+A: The only limit known to the user space is BPF_MAXINSNS (4096).
+It's the maximum number of instructions that the unprivileged bpf
+program can have. The verifier has various internal limits.
+Like the maximum number of instructions that can be explored during
+program analysis. Currently, that limit is set to 1 million.
+Which essentially means that the largest program can consist
+of 1 million NOP instructions. There is a limit to the maximum number
+of subsequent branches, a limit to the number of nested bpf-to-bpf
+calls, a limit to the number of the verifier states per instruction,
+a limit to the number of maps used by the program.
+All these limits can be hit with a sufficiently complex program.
+There are also non-numerical limits that can cause the program
+to be rejected. The verifier used to recognize only pointer + constant
+expressions. Now it can recognize pointer + bounded_register.
+bpf_lookup_map_elem(key) had a requirement that 'key' must be
+a pointer to the stack. Now, 'key' can be a pointer to map value.
+The verifier is steadily getting 'smarter'. The limits are
+being removed. The only way to know that the program is going to
+be accepted by the verifier is to try to load it.
+The bpf development process guarantees that the future kernel
+versions will accept all bpf programs that were accepted by
+the earlier versions.
+
 
 Instruction level questions
 ---------------------------
@@ -109,16 +134,16 @@
 A: This was necessary to avoid introducing flags into ISA which are
 impossible to make generic and efficient across CPU architectures.
 
-Q: why BPF_DIV instruction doesn't map to x64 div?
+Q: Why BPF_DIV instruction doesn't map to x64 div?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A: Because if we picked one-to-one relationship to x64 it would have made
 it more complicated to support on arm64 and other archs. Also it
 needs div-by-zero runtime check.
 
-Q: why there is no BPF_SDIV for signed divide operation?
+Q: Why there is no BPF_SDIV for signed divide operation?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A: Because it would be rarely used. llvm errors in such case and
-prints a suggestion to use unsigned divide instead
+prints a suggestion to use unsigned divide instead.
 
 Q: Why BPF has implicit prologue and epilogue?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -147,22 +172,41 @@
 CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
 be added to BPF in the future?
 
-A: NO. The first thing to improve performance on 32-bit archs is to teach
-LLVM to generate code that uses 32-bit subregisters. Then second step
-is to teach verifier to mark operations where zero-ing upper bits
-is unnecessary. Then JITs can take advantage of those markings and
-drastically reduce size of generated code and improve performance.
+A: NO.
+
+But some optimizations on zero-ing the upper 32 bits for BPF registers are
+available, and can be leveraged to improve the performance of JITed BPF
+programs for 32-bit architectures.
+
+Starting with version 7, LLVM is able to generate instructions that operate
+on 32-bit subregisters, provided the option -mattr=+alu32 is passed for
+compiling a program. Furthermore, the verifier can now mark the
+instructions for which zero-ing the upper bits of the destination register
+is required, and insert an explicit zero-extension (zext) instruction
+(a mov32 variant). This means that for architectures without zext hardware
+support, the JIT back-ends do not need to clear the upper bits for
+subregisters written by alu32 instructions or narrow loads. Instead, the
+back-ends simply need to support code generation for that mov32 variant,
+and to overwrite bpf_jit_needs_zext() to make it return "true" (in order to
+enable zext insertion in the verifier).
+
+Note that it is possible for a JIT back-end to have partial hardware
+support for zext. In that case, if verifier zext insertion is enabled,
+it could lead to the insertion of unnecessary zext instructions. Such
+instructions could be removed by creating a simple peephole inside the JIT
+back-end: if one instruction has hardware support for zext and if the next
+instruction is an explicit zext, then the latter can be skipped when doing
+the code generation.
 
 Q: Does BPF have a stable ABI?
 ------------------------------
 A: YES. BPF instructions, arguments to BPF programs, set of helper
 functions and their arguments, recognized return codes are all part
-of ABI. However when tracing programs are using bpf_probe_read() helper
-to walk kernel internal datastructures and compile with kernel
-internal headers these accesses can and will break with newer
-kernels. The union bpf_attr -> kern_version is checked at load time
-to prevent accidentally loading kprobe-based bpf programs written
-for a different kernel. Networking programs don't do kern_version check.
+of ABI. However there is one specific exception to tracing programs
+which are using helpers like bpf_probe_read() to walk kernel internal
+data structures and compile with kernel internal headers. Both of these
+kernel internals are subject to change and can break with newer kernels
+such that the program needs to be adapted accordingly.
 
 Q: How much stack space a BPF program uses?
 -------------------------------------------
@@ -201,17 +245,6 @@
 program is loaded the kernel will print warning message, so
 this helper is only useful for experiments and prototypes.
 Tracing BPF programs are root only.
-
-Q: bpf_trace_printk() helper warning
--------------------------------------
-Q: When bpf_trace_printk() helper is used the kernel prints nasty
-warning message. Why is that?
-
-A: This is done to nudge program authors into better interfaces when
-programs need to pass data to user space. Like bpf_perf_event_output()
-can be used to efficiently stream data via perf ring buffer.
-BPF maps can be used for asynchronous data sharing between kernel
-and user space. bpf_trace_printk() should only be used for debugging.
 
 Q: New functionality via kernel modules?
 ----------------------------------------