| .. | .. |
|---|
| 54 | 54 | Types of errors |
|---|
| 55 | 55 | --------------- |
|---|
| 56 | 56 | |
|---|
| 57 | | -Most mechanisms used on modern systems use use technologies like Hamming |
|---|
| 57 | +Most mechanisms used on modern systems use technologies like Hamming |
|---|
| 58 | 58 | Codes that allow error correction when the number of errors on a bit packet |
|---|
| 59 | 59 | is below a threshold. If the number of errors is above, those mechanisms |
|---|
| 60 | 60 | can indicate with a high degree of confidence that an error happened, but |
|---|
| .. | .. |
|---|
| 156 | 156 | ECC memory |
|---|
| 157 | 157 | ---------- |
|---|
| 158 | 158 | |
|---|
| 159 | | -As mentioned on the previous section, ECC memory has extra bits to be |
|---|
| 160 | | -used for error correction. So, on 64 bit systems, a memory module |
|---|
| 161 | | -has 64 bits of *data width*, and 74 bits of *total width*. So, there are |
|---|
| 162 | | -8 bits extra bits to be used for the error detection and correction |
|---|
| 163 | | -mechanisms. Those extra bits are called *syndrome*\ [#f1]_\ [#f2]_. |
|---|
| 159 | +As mentioned in the previous section, ECC memory has extra bits to be |
|---|
| 160 | +used for error correction. In the above example, a memory module has |
|---|
| 161 | +64 bits of *data width*, and 72 bits of *total width*. The extra 8 |
|---|
| 162 | +bits which are used for the error detection and correction mechanisms |
|---|
| 163 | +are referred to as the *syndrome*\ [#f1]_\ [#f2]_. |
|---|
| 164 | 164 | |
|---|
| 165 | 165 | So, when the cpu requests the memory controller to write a word with |
|---|
| 166 | 166 | *data width*, the memory controller calculates the *syndrome* in real time, |
|---|
| .. | .. |
|---|
| 199 | 199 | mode). |
|---|
| 200 | 200 | |
|---|
| 201 | 201 | .. [#f3] For more details about the Machine Check Architecture (MCA), |
|---|
| 202 | | - please read Documentation/x86/x86_64/machinecheck at the Kernel tree. |
|---|
| 202 | + please read Documentation/x86/x86_64/machinecheck.rst in the Kernel tree. |
|---|
| 203 | 203 | |
|---|
| 204 | 204 | EDAC - Error Detection And Correction |
|---|
| 205 | 205 | ************************************* |
|---|
| .. | .. |
|---|
| 212 | 212 | purposes. |
|---|
| 213 | 213 | |
|---|
| 214 | 214 | When the subsystem was pushed upstream for the first time, on |
|---|
| 215 | | - Kernel 2.6.16, for the first time, it was renamed to ``EDAC``. |
|---|
| 215 | + Kernel 2.6.16, it was renamed to ``EDAC``. |
|---|
| 216 | 216 | |
|---|
| 217 | 217 | Purpose |
|---|
| 218 | 218 | ------- |
|---|
| .. | .. |
|---|
| 330 | 330 | |
|---|
| 331 | 331 | .. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely |
|---|
| 332 | 332 | used to refer to a memory module, although there are other memory |
|---|
| 333 | | - packaging alternatives, like SO-DIMM, SIMM, etc. Along this document, |
|---|
| 334 | | - and inside the EDAC system, the term "dimm" is used for all memory |
|---|
| 335 | | - modules, even when they use a different kind of packaging. |
|---|
| 333 | + packaging alternatives, like SO-DIMM, SIMM, etc. The UEFI |
|---|
| 334 | + specification (Version 2.7) defines a memory module in the Common |
|---|
| 335 | + Platform Error Record (CPER) section to be an SMBIOS Memory Device |
|---|
| 336 | + (Type 17). Throughout this document, and inside the EDAC subsystem, the term |
|---|
| 337 | + "dimm" is used for all memory modules, even when they use a |
|---|
| 338 | + different kind of packaging. |
|---|
| 336 | 339 | |
|---|
| 337 | 340 | Memory controllers allow for several csrows, with 8 csrows being a |
|---|
| 338 | 341 | typical value. Yet, the actual number of csrows depends on the layout of |
|---|
| .. | .. |
|---|
| 348 | 351 | +------------+-----------+-----------+ |
|---|
| 349 | 352 | | | ``ch0`` | ``ch1`` | |
|---|
| 350 | 353 | +============+===========+===========+ |
|---|
| 351 | | - | ``csrow0`` | DIMM_A0 | DIMM_B0 | |
|---|
| 352 | | - +------------+ | | |
|---|
| 353 | | - | ``csrow1`` | | | |
|---|
| 354 | + | |**DIMM_A0**|**DIMM_B0**| |
|---|
| 354 | 355 | +------------+-----------+-----------+ |
|---|
| 355 | | - | ``csrow2`` | DIMM_A1 | DIMM_B1 | |
|---|
| 356 | | - +------------+ | | |
|---|
| 357 | | - | ``csrow3`` | | | |
|---|
| 356 | + | ``csrow0`` | rank0 | rank0 | |
|---|
| 357 | + +------------+-----------+-----------+ |
|---|
| 358 | + | ``csrow1`` | rank1 | rank1 | |
|---|
| 359 | + +------------+-----------+-----------+ |
|---|
| 360 | + | |**DIMM_A1**|**DIMM_B1**| |
|---|
| 361 | + +------------+-----------+-----------+ |
|---|
| 362 | + | ``csrow2`` | rank0 | rank0 | |
|---|
| 363 | + +------------+-----------+-----------+ |
|---|
| 364 | + | ``csrow3`` | rank1 | rank1 | |
|---|
| 358 | 365 | +------------+-----------+-----------+ |
|---|
| 359 | 366 | |
|---|
| 360 | 367 | In the above example, there are 4 physical slots on the motherboard |
|---|
| .. | .. |
|---|
| 374 | 381 | Channel, the csrows cross both DIMMs. |
|---|
| 375 | 382 | |
|---|
| 376 | 383 | Memory DIMMs come single or dual "ranked". A rank is a populated csrow. |
|---|
| 377 | | -Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above |
|---|
| 378 | | -will have just one csrow (csrow0). csrow1 will be empty. On the other |
|---|
| 379 | | -hand, when 2 dual ranked DIMMs are similarly placed, then both csrow0 |
|---|
| 380 | | -and csrow1 will be populated. The pattern repeats itself for csrow2 and |
|---|
| 381 | | -csrow3. |
|---|
| 384 | +In the example above, 2 dual ranked DIMMs are placed in slots DIMM_A0 |
|---|
| 385 | +and DIMM_B0. Thus, both csrow0 and csrow1 are populated. On the other |
|---|
| 386 | +hand, when 2 single ranked DIMMs are placed in the same slots, they |
|---|
| 387 | +will have just one csrow (csrow0) and csrow1 will be empty. The pattern |
|---|
| 388 | +repeats itself for csrow2 and csrow3. Also note that some memory |
|---|
| 389 | +controllers don't have any logic to identify the memory module, see |
|---|
| 390 | +``rankX`` directories below. |
|---|
| 382 | 391 | |
|---|
| 383 | 392 | The representation of the above is reflected in the directory |
|---|
| 384 | 393 | tree in EDAC's sysfs interface. Starting in directory |
|---|
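
The ECC memory hunk above corrects the *total width* from 74 to 72 bits for a 64-bit *data width*. The arithmetic behind the 8 extra bits can be sketched as follows, assuming the module uses a Hamming-based single-error-correct, double-error-detect (SECDED) code. That assumption is not stated in the patch, which only mentions "technologies like Hamming Codes", and real memory controllers may use other schemes. The function names are hypothetical and exist only for this illustration.

```python
def secded_check_bits(data_bits: int) -> int:
    """Extra bits needed by a Hamming SECDED code (illustrative assumption).

    A Hamming code needs the smallest r such that 2**r >= data_bits + r + 1,
    so that every single-bit error position (plus "no error") gets a distinct
    check pattern. SECDED adds one overall parity bit on top of that so that
    double-bit errors can be detected.
    """
    if data_bits <= 0:
        raise ValueError("data_bits must be positive")
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # +1 for the overall parity bit


def total_width(data_bits: int) -> int:
    """Data width plus the extra check bits stored alongside it."""
    return data_bits + secded_check_bits(data_bits)


if __name__ == "__main__":
    for width in (8, 32, 64):
        print(width, secded_check_bits(width), total_width(width))
```

Under that assumption, a 64-bit data width needs 7 Hamming check bits plus 1 parity bit, which matches the 8 extra bits and the 72-bit total width given in the corrected text.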