.. | .. |
---|
54 | 54 | Types of errors |
---|
55 | 55 | --------------- |
---|
56 | 56 | |
---|
57 | | -Most mechanisms used on modern systems use use technologies like Hamming |
---|
| 57 | +Most mechanisms used on modern systems use technologies like Hamming |
---|
58 | 58 | Codes that allow error correction when the number of errors on a bit packet |
---|
59 | 59 | is below a threshold. If the number of errors is above, those mechanisms |
---|
60 | 60 | can indicate with a high degree of confidence that an error happened, but |
---|
.. | .. |
---|
156 | 156 | ECC memory |
---|
157 | 157 | ---------- |
---|
158 | 158 | |
---|
159 | | -As mentioned on the previous section, ECC memory has extra bits to be |
---|
160 | | -used for error correction. So, on 64 bit systems, a memory module |
---|
161 | | -has 64 bits of *data width*, and 74 bits of *total width*. So, there are |
---|
162 | | -8 bits extra bits to be used for the error detection and correction |
---|
163 | | -mechanisms. Those extra bits are called *syndrome*\ [#f1]_\ [#f2]_. |
---|
| 159 | +As mentioned in the previous section, ECC memory has extra bits to be |
---|
| 160 | +used for error correction. In the above example, a memory module has |
---|
| 161 | +64 bits of *data width*, and 72 bits of *total width*. The extra 8 |
---|
| 162 | +bits which are used for the error detection and correction mechanisms |
---|
| 163 | +are referred to as the *syndrome*\ [#f1]_\ [#f2]_. |
---|
164 | 164 | |
---|
165 | 165 | So, when the cpu requests the memory controller to write a word with |
---|
166 | 166 | *data width*, the memory controller calculates the *syndrome* in real time, |
---|
.. | .. |
---|
199 | 199 | mode). |
---|
200 | 200 | |
---|
201 | 201 | .. [#f3] For more details about the Machine Check Architecture (MCA), |
---|
202 | | - please read Documentation/x86/x86_64/machinecheck at the Kernel tree. |
---|
| 202 | + please read Documentation/x86/x86_64/machinecheck.rst in the kernel tree. |
---|
203 | 203 | |
---|
204 | 204 | EDAC - Error Detection And Correction |
---|
205 | 205 | ************************************* |
---|
.. | .. |
---|
212 | 212 | purposes. |
---|
213 | 213 | |
---|
214 | 214 | When the subsystem was pushed upstream for the first time, on |
---|
215 | | - Kernel 2.6.16, for the first time, it was renamed to ``EDAC``. |
---|
| 215 | + Kernel 2.6.16, it was renamed to ``EDAC``. |
---|
216 | 216 | |
---|
217 | 217 | Purpose |
---|
218 | 218 | ------- |
---|
.. | .. |
---|
330 | 330 | |
---|
331 | 331 | .. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely |
---|
332 | 332 | used to refer to a memory module, although there are other memory |
---|
333 | | - packaging alternatives, like SO-DIMM, SIMM, etc. Along this document, |
---|
334 | | - and inside the EDAC system, the term "dimm" is used for all memory |
---|
335 | | - modules, even when they use a different kind of packaging. |
---|
| 333 | + packaging alternatives, like SO-DIMM, SIMM, etc. The UEFI |
---|
| 334 | + specification (Version 2.7) defines a memory module in the Common |
---|
| 335 | + Platform Error Record (CPER) section to be an SMBIOS Memory Device |
---|
| 336 | + (Type 17). Throughout this document, and inside the EDAC subsystem, the term |
---|
| 337 | + "dimm" is used for all memory modules, even when they use a |
---|
| 338 | + different kind of packaging. |
---|
336 | 339 | |
---|
337 | 340 | Memory controllers allow for several csrows, with 8 csrows being a |
---|
338 | 341 | typical value. Yet, the actual number of csrows depends on the layout of |
---|
.. | .. |
---|
348 | 351 | +------------+-----------+-----------+ |
---|
349 | 352 | | | ``ch0`` | ``ch1`` | |
---|
350 | 353 | +============+===========+===========+ |
---|
351 | | - | ``csrow0`` | DIMM_A0 | DIMM_B0 | |
---|
352 | | - +------------+ | | |
---|
353 | | - | ``csrow1`` | | | |
---|
| 354 | + | |**DIMM_A0**|**DIMM_B0**| |
---|
354 | 355 | +------------+-----------+-----------+ |
---|
355 | | - | ``csrow2`` | DIMM_A1 | DIMM_B1 | |
---|
356 | | - +------------+ | | |
---|
357 | | - | ``csrow3`` | | | |
---|
| 356 | + | ``csrow0`` | rank0 | rank0 | |
---|
| 357 | + +------------+-----------+-----------+ |
---|
| 358 | + | ``csrow1`` | rank1 | rank1 | |
---|
| 359 | + +------------+-----------+-----------+ |
---|
| 360 | + | |**DIMM_A1**|**DIMM_B1**| |
---|
| 361 | + +------------+-----------+-----------+ |
---|
| 362 | + | ``csrow2`` | rank0 | rank0 | |
---|
| 363 | + +------------+-----------+-----------+ |
---|
| 364 | + | ``csrow3`` | rank1 | rank1 | |
---|
358 | 365 | +------------+-----------+-----------+ |
---|
359 | 366 | |
---|
360 | 367 | In the above example, there are 4 physical slots on the motherboard |
---|
.. | .. |
---|
374 | 381 | Channel, the csrows cross both DIMMs. |
---|
375 | 382 | |
---|
376 | 383 | Memory DIMMs come single or dual "ranked". A rank is a populated csrow. |
---|
377 | | -Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above |
---|
378 | | -will have just one csrow (csrow0). csrow1 will be empty. On the other |
---|
379 | | -hand, when 2 dual ranked DIMMs are similarly placed, then both csrow0 |
---|
380 | | -and csrow1 will be populated. The pattern repeats itself for csrow2 and |
---|
381 | | -csrow3. |
---|
| 384 | +In the example above, 2 dual ranked DIMMs are placed in slots DIMM_A0 and DIMM_B0. Thus, |
---|
| 385 | +both csrow0 and csrow1 are populated. On the other hand, when 2 single |
---|
| 386 | +ranked DIMMs are placed in slots DIMM_A0 and DIMM_B0, then they will |
---|
| 387 | +have just one csrow (csrow0) and csrow1 will be empty. The pattern |
---|
| 388 | +repeats itself for csrow2 and csrow3. Also note that some memory |
---|
| 389 | +controllers don't have any logic to identify the memory module; see the |
---|
| 390 | +``rankX`` directories below. |
---|
382 | 391 | |
---|
383 | 392 | The representation of the above is reflected in the directory |
---|
384 | 393 | tree in EDAC's sysfs interface. Starting in directory |
---|