.. | .. |
13 | 13 | |
14 | 14 | This patchkit implements the necessary infrastructure in the VM. |
15 | 15 | |
16 | | -To quote the overview comment: |
| 16 | +To quote the overview comment:: |
17 | 17 | |
18 | | - * High level machine check handler. Handles pages reported by the |
19 | | - * hardware as being corrupted usually due to a 2bit ECC memory or cache |
20 | | - * failure. |
21 | | - * |
22 | | - * This focusses on pages detected as corrupted in the background. |
23 | | - * When the current CPU tries to consume corruption the currently |
24 | | - * running process can just be killed directly instead. This implies |
25 | | - * that if the error cannot be handled for some reason it's safe to |
26 | | - * just ignore it because no corruption has been consumed yet. Instead |
27 | | - * when that happens another machine check will happen. |
28 | | - * |
29 | | - * Handles page cache pages in various states. The tricky part |
30 | | - * here is that we can access any page asynchronous to other VM |
31 | | - * users, because memory failures could happen anytime and anywhere, |
32 | | - * possibly violating some of their assumptions. This is why this code |
33 | | - * has to be extremely careful. Generally it tries to use normal locking |
34 | | - * rules, as in get the standard locks, even if that means the |
35 | | - * error handling takes potentially a long time. |
36 | | - * |
37 | | - * Some of the operations here are somewhat inefficient and have non |
38 | | - * linear algorithmic complexity, because the data structures have not |
39 | | - * been optimized for this case. This is in particular the case |
40 | | - * for the mapping from a vma to a process. Since this case is expected |
41 | | - * to be rare we hope we can get away with this. |
| 18 | + High level machine check handler. Handles pages reported by the |
| 19 | + hardware as being corrupted usually due to a 2bit ECC memory or cache |
| 20 | + failure. |
| 21 | + |
| 22 | + This focusses on pages detected as corrupted in the background. |
| 23 | + When the current CPU tries to consume corruption the currently |
| 24 | + running process can just be killed directly instead. This implies |
| 25 | + that if the error cannot be handled for some reason it's safe to |
| 26 | + just ignore it because no corruption has been consumed yet. Instead |
| 27 | + when that happens another machine check will happen. |
| 28 | + |
| 29 | + Handles page cache pages in various states. The tricky part |
| 30 | + here is that we can access any page asynchronous to other VM |
| 31 | + users, because memory failures could happen anytime and anywhere, |
| 32 | + possibly violating some of their assumptions. This is why this code |
| 33 | + has to be extremely careful. Generally it tries to use normal locking |
| 34 | + rules, as in get the standard locks, even if that means the |
| 35 | + error handling takes potentially a long time. |
| 36 | + |
| 37 | + Some of the operations here are somewhat inefficient and have non |
| 38 | + linear algorithmic complexity, because the data structures have not |
| 39 | + been optimized for this case. This is in particular the case |
| 40 | + for the mapping from a vma to a process. Since this case is expected |
| 41 | + to be rare we hope we can get away with this. |
42 | 42 | |
43 | 43 | The code consists of a the high level handler in mm/memory-failure.c, |
44 | 44 | a new page poison bit and various checks in the VM to handle poisoned |