| .. | .. |
|---|
| 36 | 36 | defines a calling convention that is compatible with the C calling |
|---|
| 37 | 37 | convention of the Linux kernel on those architectures. |
|---|
| 38 | 38 | |
|---|
| 39 | | -Q: can multiple return values be supported in the future? |
|---|
| 39 | +Q: Can multiple return values be supported in the future? |
|---|
| 40 | 40 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|---|
| 41 | 41 | A: NO. BPF allows only register R0 to be used as the return value. |
|---|
| 42 | 42 | |
|---|
| 43 | | -Q: can more than 5 function arguments be supported in the future? |
|---|
| 43 | +Q: Can more than 5 function arguments be supported in the future? |
|---|
| 44 | 44 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|---|
| 45 | 45 | A: NO. The BPF calling convention only allows registers R1-R5 to be used |
|---|
| 46 | 46 | as arguments. BPF is not a standalone instruction set |
|---|
| 47 | 47 | (unlike the x64 ISA, which allows msft, cdecl and other conventions). |
|---|
| 48 | 48 | |
|---|
| 49 | | -Q: can BPF programs access instruction pointer or return address? |
|---|
| 49 | +Q: Can BPF programs access instruction pointer or return address? |
|---|
| 50 | 50 | ----------------------------------------------------------------- |
|---|
| 51 | 51 | A: NO. |
|---|
| 52 | 52 | |
|---|
| 53 | | -Q: can BPF programs access stack pointer ? |
|---|
| 53 | +Q: Can BPF programs access stack pointer? |
|---|
| 54 | 54 | ------------------------------------------ |
|---|
| 55 | 55 | A: NO. |
|---|
| 56 | 56 | |
|---|
| 57 | 57 | Only the frame pointer (register R10) is accessible. |
|---|
| 58 | 58 | From the compiler's point of view, it is necessary to have a stack pointer. |
|---|
| 59 | | -For example LLVM defines register R11 as stack pointer in its |
|---|
| 59 | +For example, LLVM defines register R11 as the stack pointer in its |
|---|
| 60 | 60 | BPF backend, but it makes sure that generated code never uses it. |
|---|
| 61 | 61 | |
|---|
| 62 | 62 | Q: Does C-calling convention diminish possible use cases? |
|---|
| .. | .. |
|---|
| 66 | 66 | BPF design forces addition of major functionality in the form |
|---|
| 67 | 67 | of kernel helper functions and kernel objects like BPF maps with |
|---|
| 68 | 68 | seamless interoperability between them. It lets the kernel call into |
|---|
| 69 | | -BPF programs and programs call kernel helpers with zero overhead. |
|---|
| 70 | | -As all of them were native C code. That is particularly the case |
|---|
| 69 | +BPF programs and programs call kernel helpers with zero overhead, |
|---|
| 70 | +as if all of them were native C code. That is particularly the case |
|---|
| 71 | 71 | for JITed BPF programs that are indistinguishable from |
|---|
| 72 | 72 | native kernel C code. |
|---|
| 73 | 73 | |
|---|
| .. | .. |
|---|
| 75 | 75 | ------------------------------------------------------------------------ |
|---|
| 76 | 76 | A: Soft yes. |
|---|
| 77 | 77 | |
|---|
| 78 | | -At least for now until BPF core has support for |
|---|
| 78 | +At least for now, until BPF core has support for |
|---|
| 79 | 79 | bpf-to-bpf calls, indirect calls, loops, global variables, |
|---|
| 80 | | -jump tables, read only sections and all other normal constructs |
|---|
| 80 | +jump tables, read-only sections, and all other normal constructs |
|---|
| 81 | 81 | that C code can produce. |
|---|
| 82 | 82 | |
|---|
| 83 | 83 | Q: Can loops be supported in a safe way? |
|---|
| .. | .. |
|---|
| 85 | 85 | A: It's not clear yet. |
|---|
| 86 | 86 | |
|---|
| 87 | 87 | BPF developers are trying to find a way to |
|---|
| 88 | | -support bounded loops where the verifier can guarantee that |
|---|
| 89 | | -the program terminates in less than 4096 instructions. |
|---|
| 88 | +support bounded loops. |
|---|
| 89 | + |
|---|
| 90 | +Q: What are the verifier limits? |
|---|
| 91 | +-------------------------------- |
|---|
| 92 | +A: The only limit known to user space is BPF_MAXINSNS (4096). |
|---|
| 93 | +It's the maximum number of instructions that an unprivileged BPF |
|---|
| 94 | +program can have. The verifier has various internal limits, |
|---|
| 95 | +such as the maximum number of instructions that can be explored during |
|---|
| 96 | +program analysis. Currently, that limit is set to 1 million, |
|---|
| 97 | +which essentially means that the largest program can consist |
|---|
| 98 | +of 1 million NOP instructions. There are also limits on the maximum number |
|---|
| 99 | +of subsequent branches, the number of nested bpf-to-bpf |
|---|
| 100 | +calls, the number of verifier states per instruction, |
|---|
| 101 | +and the number of maps used by the program. |
|---|
| 102 | +All these limits can be hit with a sufficiently complex program. |
|---|
| 103 | +There are also non-numerical limits that can cause the program |
|---|
| 104 | +to be rejected. The verifier used to recognize only pointer + constant |
|---|
| 105 | +expressions. Now it can recognize pointer + bounded_register. |
|---|
| 106 | +bpf_map_lookup_elem(map, key) had a requirement that 'key' must be |
|---|
| 107 | +a pointer to the stack. Now, 'key' can be a pointer to a map value. |
|---|
| 108 | +The verifier is steadily getting 'smarter', and the limits are |
|---|
| 109 | +being removed. The only way to know that the program is going to |
|---|
| 110 | +be accepted by the verifier is to try to load it. |
|---|
| 111 | +The BPF development process guarantees that future kernel |
|---|
| 112 | +versions will accept all BPF programs that were accepted by |
|---|
| 113 | +earlier versions. |
|---|
| 114 | + |
|---|
| 90 | 115 | |
|---|
| 91 | 116 | Instruction level questions |
|---|
| 92 | 117 | --------------------------- |
|---|
| .. | .. |
|---|
| 109 | 134 | A: This was necessary to avoid introducing flags into ISA which are |
|---|
| 110 | 135 | impossible to make generic and efficient across CPU architectures. |
|---|
| 111 | 136 | |
|---|
| 112 | | -Q: why BPF_DIV instruction doesn't map to x64 div? |
|---|
| 137 | +Q: Why doesn't the BPF_DIV instruction map to x64 div? |
|---|
| 113 | 138 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|---|
| 114 | 139 | A: Because if we picked a one-to-one relationship to x64, it would have made |
|---|
| 115 | 140 | it more complicated to support on arm64 and other archs. Also, it |
|---|
| 116 | 141 | needs a div-by-zero runtime check. |
|---|
| 117 | 142 | |
|---|
| 118 | | -Q: why there is no BPF_SDIV for signed divide operation? |
|---|
| 143 | +Q: Why is there no BPF_SDIV for signed divide operation? |
|---|
| 119 | 144 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|---|
| 120 | 145 | A: Because it would be rarely used. LLVM errors out in such a case and |
|---|
| 121 | | -prints a suggestion to use unsigned divide instead |
|---|
| 146 | +prints a suggestion to use unsigned divide instead. |
|---|
| 122 | 147 | |
|---|
| 123 | 148 | Q: Why does BPF have implicit prologue and epilogue? |
|---|
| 124 | 149 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|---|
| .. | .. |
|---|
| 147 | 172 | CPU architectures and 32-bit HW accelerators. Can true 32-bit registers |
|---|
| 148 | 173 | be added to BPF in the future? |
|---|
| 149 | 174 | |
|---|
| 150 | | -A: NO. The first thing to improve performance on 32-bit archs is to teach |
|---|
| 151 | | -LLVM to generate code that uses 32-bit subregisters. Then second step |
|---|
| 152 | | -is to teach verifier to mark operations where zero-ing upper bits |
|---|
| 153 | | -is unnecessary. Then JITs can take advantage of those markings and |
|---|
| 154 | | -drastically reduce size of generated code and improve performance. |
|---|
| 175 | +A: NO. |
|---|
| 176 | + |
|---|
| 177 | +However, some optimizations for zeroing the upper 32 bits of BPF registers are |
|---|
| 178 | +available and can be leveraged to improve the performance of JITed BPF |
|---|
| 179 | +programs for 32-bit architectures. |
|---|
| 180 | + |
|---|
| 181 | +Starting with version 7, LLVM is able to generate instructions that operate |
|---|
| 182 | +on 32-bit subregisters, provided the option -mattr=+alu32 is passed when |
|---|
| 183 | +compiling a program. Furthermore, the verifier can now mark the |
|---|
| 184 | +instructions for which zeroing the upper bits of the destination register |
|---|
| 185 | +is required, and insert an explicit zero-extension (zext) instruction |
|---|
| 186 | +(a mov32 variant). This means that for architectures without zext hardware |
|---|
| 187 | +support, the JIT back-ends do not need to clear the upper bits for |
|---|
| 188 | +subregisters written by alu32 instructions or narrow loads. Instead, the |
|---|
| 189 | +back-ends simply need to support code generation for that mov32 variant, |
|---|
| 190 | +and to override bpf_jit_needs_zext() to make it return "true" (in order to |
|---|
| 191 | +enable zext insertion in the verifier). |
|---|
| 192 | + |
|---|
| 193 | +Note that it is possible for a JIT back-end to have partial hardware |
|---|
| 194 | +support for zext. In that case, if verifier zext insertion is enabled, |
|---|
| 195 | +it could lead to the insertion of unnecessary zext instructions. Such |
|---|
| 196 | +instructions could be removed by creating a simple peephole inside the JIT |
|---|
| 197 | +back-end: if one instruction has hardware support for zext and if the next |
|---|
| 198 | +instruction is an explicit zext, then the latter can be skipped when doing |
|---|
| 199 | +the code generation. |
|---|
| 155 | 200 | |
|---|
| 156 | 201 | Q: Does BPF have a stable ABI? |
|---|
| 157 | 202 | ------------------------------ |
|---|
| 158 | 203 | A: YES. BPF instructions, arguments to BPF programs, the set of helper |
|---|
| 159 | 204 | functions and their arguments, and recognized return codes are all part |
|---|
| 160 | | -of ABI. However when tracing programs are using bpf_probe_read() helper |
|---|
| 161 | | -to walk kernel internal datastructures and compile with kernel |
|---|
| 162 | | -internal headers these accesses can and will break with newer |
|---|
| 163 | | -kernels. The union bpf_attr -> kern_version is checked at load time |
|---|
| 164 | | -to prevent accidentally loading kprobe-based bpf programs written |
|---|
| 165 | | -for a different kernel. Networking programs don't do kern_version check. |
|---|
| 205 | +of the ABI. However, there is one specific exception: tracing programs |
|---|
| 206 | +that use helpers like bpf_probe_read() to walk kernel-internal |
|---|
| 207 | +data structures and are compiled with kernel-internal headers. Both of these |
|---|
| 208 | +kernel internals are subject to change, so such programs can break with newer |
|---|
| 209 | +kernels and need to be adapted accordingly. |
|---|
| 166 | 210 | |
|---|
| 167 | 211 | Q: How much stack space does a BPF program use? |
|---|
| 168 | 212 | ----------------------------------------------- |
|---|
| .. | .. |
|---|
| 201 | 245 | program is loaded, the kernel will print a warning message, so |
|---|
| 202 | 246 | this helper is only useful for experiments and prototypes. |
|---|
| 203 | 247 | Tracing BPF programs are root only. |
|---|
| 204 | | - |
|---|
| 205 | | -Q: bpf_trace_printk() helper warning |
|---|
| 206 | | ------------------------------------- |
|---|
| 207 | | -Q: When bpf_trace_printk() helper is used the kernel prints nasty |
|---|
| 208 | | -warning message. Why is that? |
|---|
| 209 | | - |
|---|
| 210 | | -A: This is done to nudge program authors into better interfaces when |
|---|
| 211 | | -programs need to pass data to user space. Like bpf_perf_event_output() |
|---|
| 212 | | -can be used to efficiently stream data via perf ring buffer. |
|---|
| 213 | | -BPF maps can be used for asynchronous data sharing between kernel |
|---|
| 214 | | -and user space. bpf_trace_printk() should only be used for debugging. |
|---|
| 215 | 248 | |
|---|
| 216 | 249 | Q: New functionality via kernel modules? |
|---|
| 217 | 250 | ---------------------------------------- |
|---|