| .. | .. |
|---|
| 12 | 12 | memory page faults, something otherwise only the kernel code could do. |
|---|
| 13 | 13 | |
|---|
| 14 | 14 | For example userfaults allows a proper and more optimal implementation |
|---|
| 15 | | -of the PROT_NONE+SIGSEGV trick. |
|---|
| 15 | +of the ``PROT_NONE+SIGSEGV`` trick. |
|---|
| 16 | 16 | |
|---|
| 17 | 17 | Design |
|---|
| 18 | 18 | ====== |
|---|
| 19 | 19 | |
|---|
| 20 | | -Userfaults are delivered and resolved through the userfaultfd syscall. |
|---|
| 20 | +Userfaults are delivered and resolved through the ``userfaultfd`` syscall. |
|---|
| 21 | 21 | |
|---|
| 22 | | -The userfaultfd (aside from registering and unregistering virtual |
|---|
| 22 | +The ``userfaultfd`` (aside from registering and unregistering virtual |
|---|
| 23 | 23 | memory ranges) provides two primary functionalities: |
|---|
| 24 | 24 | |
|---|
| 25 | | -1) read/POLLIN protocol to notify a userland thread of the faults |
|---|
| 25 | +1) ``read/POLLIN`` protocol to notify a userland thread of the faults |
|---|
| 26 | 26 | happening |
|---|
| 27 | 27 | |
|---|
| 28 | | -2) various UFFDIO_* ioctls that can manage the virtual memory regions |
|---|
| 29 | | - registered in the userfaultfd that allows userland to efficiently |
|---|
| 28 | +2) various ``UFFDIO_*`` ioctls that can manage the virtual memory regions |
|---|
| 29 | + registered in the ``userfaultfd`` that allows userland to efficiently |
|---|
| 30 | 30 | resolve the userfaults it receives via 1) or to manage the virtual |
|---|
| 31 | 31 | memory in the background |
|---|
| 32 | 32 | |
|---|
| 33 | 33 | The real advantage of userfaults if compared to regular virtual memory |
|---|
| 34 | 34 | management of mremap/mprotect is that the userfaults in all their |
|---|
| 35 | 35 | operations never involve heavyweight structures like vmas (in fact the |
|---|
| 36 | | -userfaultfd runtime load never takes the mmap_sem for writing). |
|---|
| 36 | +``userfaultfd`` runtime load never takes the mmap_lock for writing). |
|---|
| 37 | 37 | |
|---|
| 38 | 38 | Vmas are not suitable for page- (or hugepage) granular fault tracking |
|---|
| 39 | 39 | when dealing with virtual address spaces that could span |
|---|
| 40 | 40 | Terabytes. Too many vmas would be needed for that. |
|---|
| 41 | 41 | |
|---|
| 42 | | -The userfaultfd once opened by invoking the syscall, can also be |
|---|
| 42 | +The ``userfaultfd`` once opened by invoking the syscall, can also be |
|---|
| 43 | 43 | passed using unix domain sockets to a manager process, so the same |
|---|
| 44 | 44 | manager process could handle the userfaults of a multitude of |
|---|
| 45 | 45 | different processes without them being aware about what is going on |
|---|
| 46 | | -(well of course unless they later try to use the userfaultfd |
|---|
| 46 | +(well of course unless they later try to use the ``userfaultfd`` |
|---|
| 47 | 47 | themselves on the same region the manager is already tracking, which |
|---|
| 48 | | -is a corner case that would currently return -EBUSY). |
|---|
| 48 | +is a corner case that would currently return ``-EBUSY``). |
|---|
| 49 | 49 | |
|---|
| 50 | 50 | API |
|---|
| 51 | 51 | === |
|---|
| 52 | 52 | |
|---|
| 53 | | -When first opened the userfaultfd must be enabled invoking the |
|---|
| 54 | | -UFFDIO_API ioctl specifying a uffdio_api.api value set to UFFD_API (or |
|---|
| 55 | | -a later API version) which will specify the read/POLLIN protocol |
|---|
| 56 | | -userland intends to speak on the UFFD and the uffdio_api.features |
|---|
| 57 | | -userland requires. The UFFDIO_API ioctl if successful (i.e. if the |
|---|
| 58 | | -requested uffdio_api.api is spoken also by the running kernel and the |
|---|
| 53 | +When first opened the ``userfaultfd`` must be enabled invoking the |
|---|
| 54 | +``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or |
|---|
| 55 | +a later API version) which will specify the ``read/POLLIN`` protocol |
|---|
| 56 | +userland intends to speak on the ``UFFD`` and the ``uffdio_api.features`` |
|---|
| 57 | +userland requires. The ``UFFDIO_API`` ioctl if successful (i.e. if the |
|---|
| 58 | +requested ``uffdio_api.api`` is spoken also by the running kernel and the |
|---|
| 59 | 59 | requested features are going to be enabled) will return into |
|---|
| 60 | | -uffdio_api.features and uffdio_api.ioctls two 64bit bitmasks of |
|---|
| 60 | +``uffdio_api.features`` and ``uffdio_api.ioctls`` two 64bit bitmasks of |
|---|
| 61 | 61 | respectively all the available features of the read(2) protocol and |
|---|
| 62 | 62 | the generic ioctl available. |
|---|
| 63 | 63 | |
|---|
| 64 | | -The uffdio_api.features bitmask returned by the UFFDIO_API ioctl |
|---|
| 65 | | -defines what memory types are supported by the userfaultfd and what |
|---|
| 66 | | -events, except page fault notifications, may be generated. |
|---|
| 64 | +The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl |
|---|
| 65 | +defines what memory types are supported by the ``userfaultfd`` and what |
|---|
| 66 | +events, except page fault notifications, may be generated: |
|---|
| 67 | 67 | |
|---|
| 68 | | -If the kernel supports registering userfaultfd ranges on hugetlbfs |
|---|
| 69 | | -virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in |
|---|
| 70 | | -uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be |
|---|
| 71 | | -set if the kernel supports registering userfaultfd ranges on shared |
|---|
| 72 | | -memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero |
|---|
| 73 | | -MAP_SHARED, memfd_create, etc). |
|---|
| 68 | +- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events |
|---|
| 69 | + other than page faults are supported. These events are described in more |
|---|
| 70 | + detail below in the `Non-cooperative userfaultfd`_ section. |
|---|
| 74 | 71 | |
|---|
| 75 | | -The userland application that wants to use userfaultfd with hugetlbfs |
|---|
| 76 | | -or shared memory need to set the corresponding flag in |
|---|
| 77 | | -uffdio_api.features to enable those features. |
|---|
| 72 | +- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM`` |
|---|
| 73 | + indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING`` |
|---|
| 74 | + registrations for hugetlbfs and shared memory (covering all shmem APIs, |
|---|
| 75 | + i.e. tmpfs, ``IPCSHM``, ``/dev/zero`` ``MAP_SHARED``, ``memfd_create``, |
|---|
| 76 | + etc) virtual memory areas, respectively. |
|---|
| 78 | 77 | |
|---|
| 79 | | -If the userland desires to receive notifications for events other than |
|---|
| 80 | | -page faults, it has to verify that uffdio_api.features has appropriate |
|---|
| 81 | | -UFFD_FEATURE_EVENT_* bits set. These events are described in more |
|---|
| 82 | | -detail below in "Non-cooperative userfaultfd" section. |
|---|
| 78 | +- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports |
|---|
| 79 | + ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory |
|---|
| 80 | + areas. ``UFFD_FEATURE_MINOR_SHMEM`` is the analogous feature indicating |
|---|
| 81 | + support for shmem virtual memory areas. |
|---|
| 83 | 82 | |
|---|
| 84 | | -Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should |
|---|
| 85 | | -be invoked (if present in the returned uffdio_api.ioctls bitmask) to |
|---|
| 86 | | -register a memory range in the userfaultfd by setting the |
|---|
| 87 | | -uffdio_register structure accordingly. The uffdio_register.mode |
|---|
| 83 | +The userland application should set the feature flags it intends to use |
|---|
| 84 | +when invoking the ``UFFDIO_API`` ioctl, to request that those features be |
|---|
| 85 | +enabled if supported. |
|---|
| 86 | + |
|---|
| 87 | +Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER`` |
|---|
| 88 | +ioctl should be invoked (if present in the returned ``uffdio_api.ioctls`` |
|---|
| 89 | +bitmask) to register a memory range in the ``userfaultfd`` by setting the |
|---|
| 90 | +uffdio_register structure accordingly. The ``uffdio_register.mode`` |
|---|
| 88 | 91 | bitmask will specify to the kernel which kind of faults to track for |
|---|
| 89 | | -the range (UFFDIO_REGISTER_MODE_MISSING would track missing |
|---|
| 90 | | -pages). The UFFDIO_REGISTER ioctl will return the |
|---|
| 91 | | -uffdio_register.ioctls bitmask of ioctls that are suitable to resolve |
|---|
| 92 | +the range. The ``UFFDIO_REGISTER`` ioctl will return the |
|---|
| 93 | +``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve |
|---|
| 92 | 94 | userfaults on the range registered. Not all ioctls will necessarily be |
|---|
| 93 | | -supported for all memory types depending on the underlying virtual |
|---|
| 94 | | -memory backend (anonymous memory vs tmpfs vs real filebacked |
|---|
| 95 | | -mappings). |
|---|
| 95 | +supported for all memory types (e.g. anonymous memory vs. shmem vs. |
|---|
| 96 | +hugetlbfs), or all types of intercepted faults. |
|---|
| 96 | 97 | |
|---|
| 97 | | -Userland can use the uffdio_register.ioctls to manage the virtual |
|---|
| 98 | +Userland can use the ``uffdio_register.ioctls`` to manage the virtual |
|---|
| 98 | 99 | address space in the background (to add or potentially also remove |
|---|
| 99 | | -memory from the userfaultfd registered range). This means a userfault |
|---|
| 100 | +memory from the ``userfaultfd`` registered range). This means a userfault |
|---|
| 100 | 101 | could be triggering just before userland maps in the background the |
|---|
| 101 | 102 | user-faulted page. |
|---|
| 102 | 103 | |
|---|
| 103 | | -The primary ioctl to resolve userfaults is UFFDIO_COPY. That |
|---|
| 104 | | -atomically copies a page into the userfault registered range and wakes |
|---|
| 105 | | -up the blocked userfaults (unless uffdio_copy.mode & |
|---|
| 106 | | -UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to |
|---|
| 107 | | -UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an |
|---|
| 108 | | -half copied page since it'll keep userfaulting until the copy has |
|---|
| 109 | | -finished. |
|---|
| 104 | +Resolving Userfaults |
|---|
| 105 | +-------------------- |
|---|
| 106 | + |
|---|
| 107 | +There are three basic ways to resolve userfaults: |
|---|
| 108 | + |
|---|
| 109 | +- ``UFFDIO_COPY`` atomically copies some existing page contents from |
|---|
| 110 | + userspace. |
|---|
| 111 | + |
|---|
| 112 | +- ``UFFDIO_ZEROPAGE`` atomically zeros the new page. |
|---|
| 113 | + |
|---|
| 114 | +- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page. |
|---|
| 115 | + |
|---|
| 116 | +These operations are atomic in the sense that they guarantee nothing can |
|---|
| 117 | +see a half-populated page, since readers will keep userfaulting until the |
|---|
| 118 | +operation has finished. |
|---|
| 119 | + |
|---|
| 120 | +By default, these wake up userfaults blocked on the range in question. |
|---|
| 121 | +They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates |
|---|
| 122 | +that waking will be done separately at some later time. |
|---|
| 123 | + |
|---|
| 124 | +Which ioctl to choose depends on the kind of page fault, and what we'd |
|---|
| 125 | +like to do to resolve it: |
|---|
| 126 | + |
|---|
| 127 | +- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be |
|---|
| 128 | + resolved by either providing a new page (``UFFDIO_COPY``), or mapping |
|---|
| 129 | + the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map |
|---|
| 130 | + the zero page for a missing fault. With userfaultfd, userspace can |
|---|
| 131 | + decide what content to provide before the faulting thread continues. |
|---|
| 132 | + |
|---|
| 133 | +- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in |
|---|
| 134 | + the page cache). Userspace has the option of modifying the page's |
|---|
| 135 | + contents before resolving the fault. Once the contents are correct |
|---|
| 136 | + (modified or not), userspace asks the kernel to map the page and let the |
|---|
| 137 | + faulting thread continue with ``UFFDIO_CONTINUE``. |
|---|
| 138 | + |
|---|
| 139 | +Notes: |
|---|
| 140 | + |
|---|
| 141 | +- You can tell which kind of fault occurred by examining |
|---|
| 142 | + ``pagefault.flags`` within the ``uffd_msg``, checking for the |
|---|
| 143 | + ``UFFD_PAGEFAULT_FLAG_*`` flags. |
|---|
| 144 | + |
|---|
| 145 | +- None of the page-delivering ioctls default to the range that you |
|---|
| 146 | + registered with. You must fill in all fields for the appropriate |
|---|
| 147 | + ioctl struct including the range. |
|---|
| 148 | + |
|---|
| 149 | +- You get the address of the access that triggered the missing page |
|---|
| 150 | + event out of a struct uffd_msg that you read in the thread from the |
|---|
| 151 | + uffd. You can supply as many pages as you want with these IOCTLs. |
|---|
| 152 | + Keep in mind that unless you used DONTWAKE then the first of any of |
|---|
| 153 | + those IOCTLs wakes up the faulting thread. |
|---|
| 154 | + |
|---|
| 155 | +- Be sure to test for all errors including |
|---|
| 156 | + (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges |
|---|
| 157 | + supplied were incorrect. |
|---|
| 158 | + |
|---|
| 159 | +Write Protect Notifications |
|---|
| 160 | +--------------------------- |
|---|
| 161 | + |
|---|
| 162 | +This is equivalent to (but faster than) using mprotect and a SIGSEGV |
|---|
| 163 | +signal handler. |
|---|
| 164 | + |
|---|
| 165 | +Firstly you need to register a range with ``UFFDIO_REGISTER_MODE_WP``. |
|---|
| 166 | +Instead of using mprotect(2) you use |
|---|
| 167 | +``ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect)`` |
|---|
| 168 | +while ``mode = UFFDIO_WRITEPROTECT_MODE_WP`` |
|---|
| 169 | +in the struct passed in. The range does not default to and does not |
|---|
| 170 | +have to be identical to the range you registered with. You can write |
|---|
| 171 | +protect as many ranges as you like (inside the registered range). |
|---|
| 172 | +Then, in the thread reading from uffd the struct will have |
|---|
| 173 | +``msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP`` set. Now you send |
|---|
| 174 | +``ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect)`` |
|---|
| 175 | +again while ``uffdio_writeprotect.mode`` does not have ``UFFDIO_WRITEPROTECT_MODE_WP`` |
|---|
| 176 | +set. This wakes up the thread which will continue to run with writes. This |
|---|
| 177 | +allows you to do the bookkeeping about the write in the uffd reading |
|---|
| 178 | +thread before the ioctl. |
|---|
| 179 | + |
|---|
| 180 | +If you registered with both ``UFFDIO_REGISTER_MODE_MISSING`` and |
|---|
| 181 | +``UFFDIO_REGISTER_MODE_WP`` then you need to think about the sequence in |
|---|
| 182 | +which you supply a page and undo write protect. Note that there is a |
|---|
| 183 | +difference between writes into a WP area and into a !WP area. The |
|---|
| 184 | +former will have ``UFFD_PAGEFAULT_FLAG_WP`` set, the latter |
|---|
| 185 | +``UFFD_PAGEFAULT_FLAG_WRITE``. The latter did not fail on protection but |
|---|
| 186 | +you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was |
|---|
| 187 | +used. |
|---|
| 110 | 188 | |
|---|
| 111 | 189 | QEMU/KVM |
|---|
| 112 | 190 | ======== |
|---|
| 113 | 191 | |
|---|
| 114 | | -QEMU/KVM is using the userfaultfd syscall to implement postcopy live |
|---|
| 192 | +QEMU/KVM is using the ``userfaultfd`` syscall to implement postcopy live |
|---|
| 115 | 193 | migration. Postcopy live migration is one form of memory |
|---|
| 116 | 194 | externalization consisting of a virtual machine running with part or |
|---|
| 117 | 195 | all of its memory residing on a different node in the cloud. The |
|---|
| 118 | | -userfaultfd abstraction is generic enough that not a single line of |
|---|
| 196 | +``userfaultfd`` abstraction is generic enough that not a single line of |
|---|
| 119 | 197 | KVM kernel code had to be modified in order to add postcopy live |
|---|
| 120 | 198 | migration to QEMU. |
|---|
| 121 | 199 | |
|---|
| 122 | | -Guest async page faults, FOLL_NOWAIT and all other GUP features work |
|---|
| 200 | +Guest async page faults, ``FOLL_NOWAIT`` and all other ``GUP*`` features work |
|---|
| 123 | 201 | just fine in combination with userfaults. Userfaults trigger async |
|---|
| 124 | 202 | page faults in the guest scheduler so those guest processes that |
|---|
| 125 | 203 | aren't waiting for userfaults (i.e. network bound) can keep running in |
|---|
| .. | .. |
|---|
| 132 | 210 | The implementation of postcopy live migration currently uses one |
|---|
| 133 | 211 | single bidirectional socket but in the future two different sockets |
|---|
| 134 | 212 | will be used (to reduce the latency of the userfaults to the minimum |
|---|
| 135 | | -possible without having to decrease /proc/sys/net/ipv4/tcp_wmem). |
|---|
| 213 | +possible without having to decrease ``/proc/sys/net/ipv4/tcp_wmem``). |
|---|
| 136 | 214 | |
|---|
| 137 | 215 | The QEMU in the source node writes all pages that it knows are missing |
|---|
| 138 | 216 | in the destination node, into the socket, and the migration thread of |
|---|
| 139 | | -the QEMU running in the destination node runs UFFDIO_COPY|ZEROPAGE |
|---|
| 140 | | -ioctls on the userfaultfd in order to map the received pages into the |
|---|
| 141 | | -guest (UFFDIO_ZEROCOPY is used if the source page was a zero page). |
|---|
| 217 | +the QEMU running in the destination node runs ``UFFDIO_COPY|ZEROPAGE`` |
|---|
| 218 | +ioctls on the ``userfaultfd`` in order to map the received pages into the |
|---|
| 219 | +guest (``UFFDIO_ZEROPAGE`` is used if the source page was a zero page). |
|---|
| 142 | 220 | |
|---|
| 143 | 221 | A different postcopy thread in the destination node listens with |
|---|
| 144 | | -poll() to the userfaultfd in parallel. When a POLLIN event is |
|---|
| 222 | +poll() to the ``userfaultfd`` in parallel. When a ``POLLIN`` event is |
|---|
| 145 | 223 | generated after a userfault triggers, the postcopy thread read() from |
|---|
| 146 | | -the userfaultfd and receives the fault address (or -EAGAIN in case the |
|---|
| 147 | | -userfault was already resolved and waken by a UFFDIO_COPY|ZEROPAGE run |
|---|
| 224 | +the ``userfaultfd`` and receives the fault address (or ``-EAGAIN`` in case the |
|---|
| 225 | +userfault was already resolved and woken by a ``UFFDIO_COPY|ZEROPAGE`` run |
|---|
| 148 | 226 | by the parallel QEMU migration thread). |
|---|
| 149 | 227 | |
|---|
| 150 | 228 | After the QEMU postcopy thread (running in the destination node) gets |
|---|
| .. | .. |
|---|
| 155 | 233 | (just the time to flush the tcp_wmem queue through the network) the |
|---|
| 156 | 234 | migration thread in the QEMU running in the destination node will |
|---|
| 157 | 235 | receive the page that triggered the userfault and it'll map it as |
|---|
| 158 | | -usual with the UFFDIO_COPY|ZEROPAGE (without actually knowing if it |
|---|
| 236 | +usual with the ``UFFDIO_COPY|ZEROPAGE`` (without actually knowing if it |
|---|
| 159 | 237 | was spontaneously sent by the source or if it was an urgent page |
|---|
| 160 | 238 | requested through a userfault). |
|---|
| 161 | 239 | |
|---|
| .. | .. |
|---|
| 168 | 246 | over it when receiving incoming userfaults. After sending each page of |
|---|
| 169 | 247 | course the bitmap is updated accordingly. It's also useful to avoid |
|---|
| 170 | 248 | sending the same page twice (in case the userfault is read by the |
|---|
| 171 | | -postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration |
|---|
| 249 | +postcopy thread just before ``UFFDIO_COPY|ZEROPAGE`` runs in the migration |
|---|
| 172 | 250 | thread). |
|---|
| 173 | 251 | |
|---|
| 174 | 252 | Non-cooperative userfaultfd |
|---|
| 175 | 253 | =========================== |
|---|
| 176 | 254 | |
|---|
| 177 | | -When the userfaultfd is monitored by an external manager, the manager |
|---|
| 255 | +When the ``userfaultfd`` is monitored by an external manager, the manager |
|---|
| 178 | 256 | must be able to track changes in the process virtual memory |
|---|
| 179 | 257 | layout. Userfaultfd can notify the manager about such changes using |
|---|
| 180 | 258 | the same read(2) protocol as for the page fault notifications. The |
|---|
| 181 | 259 | manager has to explicitly enable these events by setting appropriate |
|---|
| 182 | | -bits in uffdio_api.features passed to UFFDIO_API ioctl: |
|---|
| 260 | +bits in ``uffdio_api.features`` passed to ``UFFDIO_API`` ioctl: |
|---|
| 183 | 261 | |
|---|
| 184 | | -UFFD_FEATURE_EVENT_FORK |
|---|
| 185 | | - enable userfaultfd hooks for fork(). When this feature is |
|---|
| 186 | | - enabled, the userfaultfd context of the parent process is |
|---|
| 262 | +``UFFD_FEATURE_EVENT_FORK`` |
|---|
| 263 | + enable ``userfaultfd`` hooks for fork(). When this feature is |
|---|
| 264 | + enabled, the ``userfaultfd`` context of the parent process is |
|---|
| 187 | 265 | duplicated into the newly created process. The manager |
|---|
| 188 | | - receives UFFD_EVENT_FORK with file descriptor of the new |
|---|
| 189 | | - userfaultfd context in the uffd_msg.fork. |
|---|
| 266 | + receives ``UFFD_EVENT_FORK`` with file descriptor of the new |
|---|
| 267 | + ``userfaultfd`` context in the ``uffd_msg.fork``. |
|---|
| 190 | 268 | |
|---|
| 191 | | -UFFD_FEATURE_EVENT_REMAP |
|---|
| 269 | +``UFFD_FEATURE_EVENT_REMAP`` |
|---|
| 192 | 270 | enable notifications about mremap() calls. When the |
|---|
| 193 | 271 | non-cooperative process moves a virtual memory area to a |
|---|
| 194 | 272 | different location, the manager will receive |
|---|
| 195 | | - UFFD_EVENT_REMAP. The uffd_msg.remap will contain the old and |
|---|
| 273 | + ``UFFD_EVENT_REMAP``. The ``uffd_msg.remap`` will contain the old and |
|---|
| 196 | 274 | new addresses of the area and its original length. |
|---|
| 197 | 275 | |
|---|
| 198 | | -UFFD_FEATURE_EVENT_REMOVE |
|---|
| 276 | +``UFFD_FEATURE_EVENT_REMOVE`` |
|---|
| 199 | 277 | enable notifications about madvise(MADV_REMOVE) and |
|---|
| 200 | | - madvise(MADV_DONTNEED) calls. The event UFFD_EVENT_REMOVE will |
|---|
| 201 | | - be generated upon these calls to madvise. The uffd_msg.remove |
|---|
| 278 | + madvise(MADV_DONTNEED) calls. The event ``UFFD_EVENT_REMOVE`` will |
|---|
| 279 | + be generated upon these calls to madvise(). The ``uffd_msg.remove`` |
|---|
| 202 | 280 | will contain start and end addresses of the removed area. |
|---|
| 203 | 281 | |
|---|
| 204 | | -UFFD_FEATURE_EVENT_UNMAP |
|---|
| 282 | +``UFFD_FEATURE_EVENT_UNMAP`` |
|---|
| 205 | 283 | enable notifications about memory unmapping. The manager will |
|---|
| 206 | | - get UFFD_EVENT_UNMAP with uffd_msg.remove containing start and |
|---|
| 284 | + get ``UFFD_EVENT_UNMAP`` with ``uffd_msg.remove`` containing start and |
|---|
| 207 | 285 | end addresses of the unmapped area. |
|---|
| 208 | 286 | |
|---|
| 209 | | -Although the UFFD_FEATURE_EVENT_REMOVE and UFFD_FEATURE_EVENT_UNMAP |
|---|
| 287 | +Although the ``UFFD_FEATURE_EVENT_REMOVE`` and ``UFFD_FEATURE_EVENT_UNMAP`` |
|---|
| 210 | 288 | are pretty similar, they quite differ in the action expected from the |
|---|
| 211 | | -userfaultfd manager. In the former case, the virtual memory is |
|---|
| 289 | +``userfaultfd`` manager. In the former case, the virtual memory is |
|---|
| 212 | 290 | removed, but the area is not, the area remains monitored by the |
|---|
| 213 | | -userfaultfd, and if a page fault occurs in that area it will be |
|---|
| 291 | +``userfaultfd``, and if a page fault occurs in that area it will be |
|---|
| 214 | 292 | delivered to the manager. The proper resolution for such page fault is |
|---|
| 215 | 293 | to zeromap the faulting address. However, in the latter case, when an |
|---|
| 216 | 294 | area is unmapped, either explicitly (with munmap() system call), or |
|---|
| 217 | 295 | implicitly (e.g. during mremap()), the area is removed and in turn the |
|---|
| 218 | | -userfaultfd context for such area disappears too and the manager will |
|---|
| 296 | +``userfaultfd`` context for such area disappears too and the manager will |
|---|
| 219 | 297 | not get further userland page faults from the removed area. Still, the |
|---|
| 220 | 298 | notification is required in order to prevent manager from using |
|---|
| 221 | | -UFFDIO_COPY on the unmapped area. |
|---|
| 299 | +``UFFDIO_COPY`` on the unmapped area. |
|---|
| 222 | 300 | |
|---|
| 223 | 301 | Unlike userland page faults which have to be synchronous and require |
|---|
| 224 | 302 | explicit or implicit wakeup, all the events are delivered |
|---|
| 225 | 303 | asynchronously and the non-cooperative process resumes execution as |
|---|
| 226 | | -soon as manager executes read(). The userfaultfd manager should |
|---|
| 227 | | -carefully synchronize calls to UFFDIO_COPY with the events |
|---|
| 228 | | -processing. To aid the synchronization, the UFFDIO_COPY ioctl will |
|---|
| 229 | | -return -ENOSPC when the monitored process exits at the time of |
|---|
| 230 | | -UFFDIO_COPY, and -ENOENT, when the non-cooperative process has changed |
|---|
| 231 | | -its virtual memory layout simultaneously with outstanding UFFDIO_COPY |
|---|
| 304 | +soon as manager executes read(). The ``userfaultfd`` manager should |
|---|
| 305 | +carefully synchronize calls to ``UFFDIO_COPY`` with the events |
|---|
| 306 | +processing. To aid the synchronization, the ``UFFDIO_COPY`` ioctl will |
|---|
| 307 | +return ``-ENOSPC`` when the monitored process exits at the time of |
|---|
| 308 | +``UFFDIO_COPY``, and ``-ENOENT``, when the non-cooperative process has changed |
|---|
| 309 | +its virtual memory layout simultaneously with outstanding ``UFFDIO_COPY`` |
|---|
| 232 | 310 | operation. |
|---|
| 233 | 311 | |
|---|
| 234 | 312 | The current asynchronous model of the event delivery is optimal for |
|---|
| 235 | | -single threaded non-cooperative userfaultfd manager implementations. A |
|---|
| 313 | +single threaded non-cooperative ``userfaultfd`` manager implementations. A |
|---|
| 236 | 314 | synchronous event delivery model can be added later as a new |
|---|
| 237 | | -userfaultfd feature to facilitate multithreading enhancements of the |
|---|
| 238 | | -non cooperative manager, for example to allow UFFDIO_COPY ioctls to |
|---|
| 315 | +``userfaultfd`` feature to facilitate multithreading enhancements of the |
|---|
| 316 | +non cooperative manager, for example to allow ``UFFDIO_COPY`` ioctls to |
|---|
| 239 | 317 | run in parallel to the event reception. Single threaded |
|---|
| 240 | 318 | implementations should continue to use the current async event |
|---|
| 241 | 319 | delivery model instead. |
|---|
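The "Resolving Userfaults" and "Write Protect Notifications" text added by this change describes how a handler picks an ioctl from ``pagefault.flags``. A minimal C sketch of that decision follows. The enum and function names are hypothetical and not part of the kernel API, the fallback flag values are copied from the UAPI header for toolchains that lack it, and checking the WP flag before the minor flag is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Use the kernel's UAPI definitions when the header is installed. */
#if defined(__has_include)
#  if __has_include(<linux/userfaultfd.h>)
#    include <linux/userfaultfd.h>
#  endif
#endif

/* Fallbacks for older or missing headers; values mirror the UAPI header. */
#ifndef UFFD_PAGEFAULT_FLAG_WRITE
#define UFFD_PAGEFAULT_FLAG_WRITE (1 << 0)
#endif
#ifndef UFFD_PAGEFAULT_FLAG_WP
#define UFFD_PAGEFAULT_FLAG_WP (1 << 1)
#endif
#ifndef UFFD_PAGEFAULT_FLAG_MINOR
#define UFFD_PAGEFAULT_FLAG_MINOR (1 << 2)
#endif

/* Hypothetical names: the ways a handler can resolve one page fault. */
enum uffd_resolution {
    RESOLVE_COPY_OR_ZEROPAGE, /* missing page: UFFDIO_COPY or UFFDIO_ZEROPAGE */
    RESOLVE_CONTINUE,         /* minor fault: UFFDIO_CONTINUE */
    RESOLVE_UNPROTECT         /* WP fault: UFFDIO_WRITEPROTECT without MODE_WP */
};

/*
 * Map pagefault.flags from a struct uffd_msg to a resolution.
 * A write to a missing page in a non-WP area carries only
 * UFFD_PAGEFAULT_FLAG_WRITE, so it still needs a page supplied.
 */
static enum uffd_resolution classify_pagefault(uint64_t flags)
{
    if (flags & UFFD_PAGEFAULT_FLAG_WP)
        return RESOLVE_UNPROTECT;
    if (flags & UFFD_PAGEFAULT_FLAG_MINOR)
        return RESOLVE_CONTINUE;
    return RESOLVE_COPY_OR_ZEROPAGE;
}

/* Name of the ioctl (or pair of ioctls) that matches a resolution. */
static const char *resolution_ioctl(enum uffd_resolution r)
{
    switch (r) {
    case RESOLVE_UNPROTECT:
        return "UFFDIO_WRITEPROTECT";
    case RESOLVE_CONTINUE:
        return "UFFDIO_CONTINUE";
    case RESOLVE_COPY_OR_ZEROPAGE:
        return "UFFDIO_COPY or UFFDIO_ZEROPAGE";
    }
    assert(0 && "unknown resolution");
    return "?";
}
```

A handler would typically call `classify_pagefault(msg.arg.pagefault.flags)` after reading a `struct uffd_msg` whose `event` field is `UFFD_EVENT_PAGEFAULT`, then fill in the matching ioctl struct, including the range, before issuing the ioctl.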