.. | ..
4 | 4 | Concepts overview
5 | 5 | =================
6 | 6 |
7 | | -The memory management in Linux is complex system that evolved over the
8 | | -years and included more and more functionality to support variety of
| 7 | +The memory management in Linux is a complex system that evolved over the
| 8 | +years and included more and more functionality to support a variety of
9 | 9 | systems from MMU-less microcontrollers to supercomputers. The memory
10 | | -management for systems without MMU is called ``nommu`` and it
| 10 | +management for systems without an MMU is called ``nommu`` and it
11 | 11 | definitely deserves a dedicated document, which hopefully will be
12 | 12 | eventually written. Yet, although some of the concepts are the same,
13 | | -here we assume that MMU is available and CPU can translate a virtual
| 13 | +here we assume that an MMU is available and a CPU can translate a virtual
14 | 14 | address to a physical address.
15 | 15 |
16 | 16 | .. contents:: :local:
.. | ..
21 | 21 | The physical memory in a computer system is a limited resource and
22 | 22 | even for systems that support memory hotplug there is a hard limit on
23 | 23 | the amount of memory that can be installed. The physical memory is not
24 | | -necessary contiguous, it might be accessible as a set of distinct
| 24 | +necessarily contiguous; it might be accessible as a set of distinct
25 | 25 | address ranges. Besides, different CPU architectures, and even
26 | | -different implementations of the same architecture have different view
27 | | -how these address ranges defined.
| 26 | +different implementations of the same architecture have different views
| 27 | +of how these address ranges are defined.
28 | 28 |
29 | 29 | All this makes dealing directly with physical memory quite complex and
30 | 30 | to avoid this complexity a concept of virtual memory was developed.
.. | ..
35 | 35 | protection and controlled sharing of data between processes.
36 | 36 |
37 | 37 | With virtual memory, each and every memory access uses a virtual
38 | | -address. When the CPU decodes the an instruction that reads (or
| 38 | +address. When the CPU decodes an instruction that reads (or
39 | 39 | writes) from (or to) the system memory, it translates the `virtual`
40 | 40 | address encoded in that instruction to a `physical` address that the
41 | 41 | memory controller can understand.
48 | 48 |
49 | 49 | Each physical memory page can be mapped as one or more virtual
50 | 50 | pages. These mappings are described by page tables that allow
51 | | -translation from virtual address used by programs to real address in
52 | | -the physical memory. The page tables organized hierarchically.
| 51 | +translation from a virtual address used by programs to the physical
| 52 | +memory address. The page tables are organized hierarchically.
53 | 53 |
54 | 54 | The tables at the lowest level of the hierarchy contain physical
55 | 55 | addresses of actual pages used by the software. The tables at higher
.. | ..
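
As a rough illustration of the hierarchy described above, the lookup can be
modelled in user space with a two-level table. The layout and the names
(``toy_translate``, ``top_level``) are invented for this sketch; real kernels
use more levels and dedicated page table helpers::

   #include <stdint.h>
   #include <stdio.h>
   #include <stdlib.h>

   #define PAGE_SHIFT 12                      /* 4 KiB pages */
   #define PTRS       512                     /* entries per table level */

   struct pte { uint64_t pfn; int present; }; /* lowest-level entry */
   static struct pte *top_level[PTRS];        /* top-level table */

   /* Walk the two levels: each level is indexed by a slice of the virtual
    * address; a missing entry corresponds to a page fault. */
   static int toy_translate(uint64_t vaddr, uint64_t *paddr)
   {
           struct pte *pt = top_level[(vaddr >> (PAGE_SHIFT + 9)) % PTRS];
           struct pte *pte = pt ? &pt[(vaddr >> PAGE_SHIFT) % PTRS] : NULL;

           if (!pte || !pte->present)
                   return -1;
           *paddr = (pte->pfn << PAGE_SHIFT) |
                    (vaddr & ((1UL << PAGE_SHIFT) - 1));
           return 0;
   }

   int main(void)
   {
           uint64_t vaddr = 0x201234, paddr;
           struct pte *pt = calloc(PTRS, sizeof(struct pte));

           /* Map the virtual page holding vaddr to physical frame 42. */
           pt[(vaddr >> PAGE_SHIFT) % PTRS] =
                   (struct pte){ .pfn = 42, .present = 1 };
           top_level[(vaddr >> (PAGE_SHIFT + 9)) % PTRS] = pt;

           if (toy_translate(vaddr, &paddr) == 0)
                   printf("virtual 0x%jx -> physical 0x%jx\n",
                          (uintmax_t)vaddr, (uintmax_t)paddr);
           return 0;
   }
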
121 | 121 | Many multi-processor machines are NUMA - Non-Uniform Memory Access -
122 | 122 | systems. In such systems the memory is arranged into banks that have
123 | 123 | different access latency depending on the "distance" from the
124 | | -processor. Each bank is referred as `node` and for each node Linux
125 | | -constructs an independent memory management subsystem. A node has it's
| 124 | +processor. Each bank is referred to as a `node` and for each node Linux
| 125 | +constructs an independent memory management subsystem. A node has its
126 | 126 | own set of zones, lists of free and used pages and various statistics
127 | 127 | counters. You can find more details about NUMA in
128 | 128 | :ref:`Documentation/vm/numa.rst <numa>` and in
.. | ..
149 | 149 | call. Usually, the anonymous mappings only define virtual memory areas
150 | 150 | that the program is allowed to access. The read accesses will result
151 | 151 | in creation of a page table entry that references a special physical
152 | | -page filled with zeroes. When the program performs a write, regular
| 152 | +page filled with zeroes. When the program performs a write, a regular
153 | 153 | physical page will be allocated to hold the written data. The page
154 | | -will be marked dirty and if the kernel will decide to repurpose it,
| 154 | +will be marked dirty and if the kernel decides to repurpose it,
155 | 155 | the dirty page will be swapped out.
156 | 156 |
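
This life cycle can be observed from user space with the standard ``mmap(2)``
interface. The snippet below is only a demonstration of the behaviour
described above, not of the kernel internals; error handling is minimal::

   #include <stdio.h>
   #include <string.h>
   #include <sys/mman.h>

   int main(void)
   {
           size_t len = 4096;

           /* Anonymous private mapping: no file backs these pages. */
           char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
           if (p == MAP_FAILED) {
                   perror("mmap");
                   return 1;
           }

           /* A read can be served from the special zero-filled page. */
           printf("byte before write: %d\n", p[0]);

           /* The first write forces allocation of a regular physical page. */
           memset(p, 1, len);
           printf("byte after write:  %d\n", p[0]);

           munmap(p, len);
           return 0;
   }
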
157 | 157 | Reclaim
.. | ..
181 | 181 | The process of freeing the reclaimable physical memory pages and
182 | 182 | repurposing them is called (surprise!) `reclaim`. Linux can reclaim
183 | 183 | pages either asynchronously or synchronously, depending on the state
184 | | -of the system. When system is not loaded, most of the memory is free
185 | | -and allocation request will be satisfied immediately from the free
| 184 | +of the system. When the system is not loaded, most of the memory is free
| 185 | +and allocation requests will be satisfied immediately from the free
186 | 186 | pages supply. As the load increases, the amount of the free pages goes
187 | 187 | down and when it reaches a certain threshold (high watermark), an
188 | 188 | allocation request will awaken the ``kswapd`` daemon. It will
.. | ..
190 | 190 | they contain is available elsewhere, or evict to the backing storage
191 | 191 | device (remember those dirty pages?). As memory usage increases even
192 | 192 | more and reaches another threshold - min watermark - an allocation
193 | | -will trigger the `direct reclaim`. In this case allocation is stalled
| 193 | +will trigger `direct reclaim`. In this case allocation is stalled
194 | 194 | until enough memory pages are reclaimed to satisfy the request.
195 | 195 |
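
The interaction of the watermarks can be summarized with a simplified decision
function. The names below (``toy_zone``, ``toy_alloc_path``) and the numbers
are invented for illustration and do not correspond to the kernel's actual
data structures or code paths::

   #include <stdio.h>

   enum action { FAST_PATH, WAKE_KSWAPD, DIRECT_RECLAIM };

   struct toy_zone {
           unsigned long free_pages;
           unsigned long high_wmark; /* below this, kswapd is woken         */
           unsigned long min_wmark;  /* below this, the caller must reclaim */
   };

   /* Decide how an allocation of nr_pages proceeds, per the text above. */
   static enum action toy_alloc_path(const struct toy_zone *z,
                                     unsigned long nr_pages)
   {
           unsigned long left = z->free_pages > nr_pages ?
                                z->free_pages - nr_pages : 0;

           if (left <= z->min_wmark)
                   return DIRECT_RECLAIM; /* stall until pages are reclaimed */
           if (left <= z->high_wmark)
                   return WAKE_KSWAPD;    /* reclaim continues asynchronously */
           return FAST_PATH;              /* plenty of free pages left */
   }

   int main(void)
   {
           struct toy_zone zone = {
                   .free_pages = 1000, .high_wmark = 800, .min_wmark = 200,
           };
           const char *names[] = { "fast path", "wake kswapd", "direct reclaim" };

           printf("small request:  %s\n", names[toy_alloc_path(&zone, 50)]);
           printf("medium request: %s\n", names[toy_alloc_path(&zone, 400)]);
           printf("large request:  %s\n", names[toy_alloc_path(&zone, 900)]);
           return 0;
   }
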
196 | 196 | Compaction
.. | ..
200 | 200 | fragmented. Although with virtual memory it is possible to present
201 | 201 | scattered physical pages as virtually contiguous range, sometimes it is
202 | 202 | necessary to allocate large physically contiguous memory areas. Such
203 | | -need may arise, for instance, when a device driver requires large
| 203 | +need may arise, for instance, when a device driver requires a large
204 | 204 | buffer for DMA, or when THP allocates a huge page. Memory `compaction`
205 | 205 | addresses the fragmentation issue. This mechanism moves occupied pages
206 | 206 | from the lower part of a memory zone to free pages in the upper part
.. | ..
208 | 208 | together at the beginning of the zone and allocations of large
209 | 209 | physically contiguous areas become possible.
210 | 210 |
211 | | -Like reclaim, the compaction may happen asynchronously in ``kcompactd``
212 | | -daemon or synchronously as a result of memory allocation request.
| 211 | +Like reclaim, the compaction may happen asynchronously in the ``kcompactd``
| 212 | +daemon or synchronously as a result of a memory allocation request.
213 | 213 |
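
The effect of compaction on a fragmented zone can be pictured with a toy
model: occupied page frames are "migrated" from the low end of the zone into
free frames at the high end until the free frames form one contiguous run.
This sketch ignores everything that makes real migration hard (unmovable
pages, reverse mapping updates and so on)::

   #include <stdio.h>

   #define ZONE_PAGES 8

   int main(void)
   {
           /* 1 = occupied page frame, 0 = free page frame */
           int zone[ZONE_PAGES] = { 1, 0, 1, 0, 0, 1, 0, 0 };
           int low = 0, high = ZONE_PAGES - 1;

           while (low < high) {
                   if (!zone[low])       /* nothing to migrate here */
                           low++;
                   else if (zone[high])  /* no free target frame here */
                           high--;
                   else {                /* "migrate" the page upwards */
                           zone[high--] = 1;
                           zone[low++] = 0;
                   }
           }

           for (int i = 0; i < ZONE_PAGES; i++)
                   printf("%d", zone[i]); /* prints 00000111 */
           printf("\n");
           return 0;
   }
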
214 | 214 | OOM killer
215 | 215 | ==========
216 | 216 |
217 | | -It may happen, that on a loaded machine memory will be exhausted. When
218 | | -the kernel detects that the system runs out of memory (OOM) it invokes
219 | | -`OOM killer`. Its mission is simple: all it has to do is to select a
220 | | -task to sacrifice for the sake of the overall system health. The
221 | | -selected task is killed in a hope that after it exits enough memory
222 | | -will be freed to continue normal operation.
| 217 | +It is possible that on a loaded machine memory will be exhausted and the
| 218 | +kernel will be unable to reclaim enough memory to continue to operate. In
| 219 | +order to save the rest of the system, it invokes the `OOM killer`.
| 220 | +
| 221 | +The `OOM killer` selects a task to sacrifice for the sake of the overall
| 222 | +system health. The selected task is killed in a hope that after it exits
| 223 | +enough memory will be freed to continue normal operation.
---|