| .. | .. | |
|---|---|---|
| 13 | 13 | |
| 14 | 14 | This patchkit implements the necessary infrastructure in the VM. |
| 15 | 15 | |
| 16 | | -To quote the overview comment: |
| | 16 | +To quote the overview comment:: |
| 17 | 17 | |
| 18 | | - * High level machine check handler. Handles pages reported by the |
| 19 | | - * hardware as being corrupted usually due to a 2bit ECC memory or cache |
| 20 | | - * failure. |
| 21 | | - * |
| 22 | | - * This focusses on pages detected as corrupted in the background. |
| 23 | | - * When the current CPU tries to consume corruption the currently |
| 24 | | - * running process can just be killed directly instead. This implies |
| 25 | | - * that if the error cannot be handled for some reason it's safe to |
| 26 | | - * just ignore it because no corruption has been consumed yet. Instead |
| 27 | | - * when that happens another machine check will happen. |
| 28 | | - * |
| 29 | | - * Handles page cache pages in various states. The tricky part |
| 30 | | - * here is that we can access any page asynchronous to other VM |
| 31 | | - * users, because memory failures could happen anytime and anywhere, |
| 32 | | - * possibly violating some of their assumptions. This is why this code |
| 33 | | - * has to be extremely careful. Generally it tries to use normal locking |
| 34 | | - * rules, as in get the standard locks, even if that means the |
| 35 | | - * error handling takes potentially a long time. |
| 36 | | - * |
| 37 | | - * Some of the operations here are somewhat inefficient and have non |
| 38 | | - * linear algorithmic complexity, because the data structures have not |
| 39 | | - * been optimized for this case. This is in particular the case |
| 40 | | - * for the mapping from a vma to a process. Since this case is expected |
| 41 | | - * to be rare we hope we can get away with this. |
| | 18 | + High level machine check handler. Handles pages reported by the |
| | 19 | + hardware as being corrupted usually due to a 2bit ECC memory or cache |
| | 20 | + failure. |
| | 21 | + |
| | 22 | + This focusses on pages detected as corrupted in the background. |
| | 23 | + When the current CPU tries to consume corruption the currently |
| | 24 | + running process can just be killed directly instead. This implies |
| | 25 | + that if the error cannot be handled for some reason it's safe to |
| | 26 | + just ignore it because no corruption has been consumed yet. Instead |
| | 27 | + when that happens another machine check will happen. |
| | 28 | + |
| | 29 | + Handles page cache pages in various states. The tricky part |
| | 30 | + here is that we can access any page asynchronous to other VM |
| | 31 | + users, because memory failures could happen anytime and anywhere, |
| | 32 | + possibly violating some of their assumptions. This is why this code |
| | 33 | + has to be extremely careful. Generally it tries to use normal locking |
| | 34 | + rules, as in get the standard locks, even if that means the |
| | 35 | + error handling takes potentially a long time. |
| | 36 | + |
| | 37 | + Some of the operations here are somewhat inefficient and have non |
| | 38 | + linear algorithmic complexity, because the data structures have not |
| | 39 | + been optimized for this case. This is in particular the case |
| | 40 | + for the mapping from a vma to a process. Since this case is expected |
| | 41 | + to be rare we hope we can get away with this. |
| 42 | 42 | |
| 43 | 43 | The code consists of a the high level handler in mm/memory-failure.c, |
| 44 | 44 | a new page poison bit and various checks in the VM to handle poisoned |