hc
2024-10-12 a5969cabbb4660eab42b6ef0412cbbd1200cf14d
kernel/Documentation/admin-guide/ras.rst
@@ -54,7 +54,7 @@
 Types of errors
 ---------------
 
-Most mechanisms used on modern systems use use technologies like Hamming
+Most mechanisms used on modern systems use technologies like Hamming
 Codes that allow error correction when the number of errors on a bit packet
 is below a threshold. If the number of errors is above, those mechanisms
 can indicate with a high degree of confidence that an error happened, but
@@ -156,11 +156,11 @@
 ECC memory
 ----------
 
-As mentioned on the previous section, ECC memory has extra bits to be
-used for error correction. So, on 64 bit systems, a memory module
-has 64 bits of *data width*, and 74 bits of *total width*. So, there are
-8 bits extra bits to be used for the error detection and correction
-mechanisms. Those extra bits are called *syndrome*\ [#f1]_\ [#f2]_.
+As mentioned in the previous section, ECC memory has extra bits to be
+used for error correction. In the above example, a memory module has
+64 bits of *data width*, and 72 bits of *total width*. The extra 8
+bits which are used for the error detection and correction mechanisms
+are referred to as the *syndrome*\ [#f1]_\ [#f2]_.
 
 So, when the cpu requests the memory controller to write a word with
 *data width*, the memory controller calculates the *syndrome* in real time,
@@ -199,7 +199,7 @@
 mode).
 
 .. [#f3] For more details about the Machine Check Architecture (MCA),
- please read Documentation/x86/x86_64/machinecheck at the Kernel tree.
+ please read Documentation/x86/x86_64/machinecheck.rst at the Kernel tree.
 
 EDAC - Error Detection And Correction
 *************************************
@@ -212,7 +212,7 @@
  purposes.
 
  When the subsystem was pushed upstream for the first time, on
- Kernel 2.6.16, for the first time, it was renamed to ``EDAC``.
+ Kernel 2.6.16, it was renamed to ``EDAC``.
 
 Purpose
 -------
@@ -330,9 +330,12 @@
 
 .. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely
  used to refer to a memory module, although there are other memory
- packaging alternatives, like SO-DIMM, SIMM, etc. Along this document,
- and inside the EDAC system, the term "dimm" is used for all memory
- modules, even when they use a different kind of packaging.
+ packaging alternatives, like SO-DIMM, SIMM, etc. The UEFI
+ specification (Version 2.7) defines a memory module in the Common
+ Platform Error Record (CPER) section to be an SMBIOS Memory Device
+ (Type 17). Along this document, and inside the EDAC subsystem, the term
+ "dimm" is used for all memory modules, even when they use a
+ different kind of packaging.
 
 Memory controllers allow for several csrows, with 8 csrows being a
 typical value. Yet, the actual number of csrows depends on the layout of
@@ -348,13 +351,17 @@
  +------------+-----------+-----------+
  |            |  ``ch0``  |  ``ch1``  |
  +============+===========+===========+
- | ``csrow0`` |  DIMM_A0  |  DIMM_B0  |
- +------------+           |           |
- | ``csrow1`` |           |           |
+ |            |**DIMM_A0**|**DIMM_B0**|
  +------------+-----------+-----------+
- | ``csrow2`` |  DIMM_A1  |  DIMM_B1  |
- +------------+           |           |
- | ``csrow3`` |           |           |
+ | ``csrow0`` |   rank0   |   rank0   |
+ +------------+-----------+-----------+
+ | ``csrow1`` |   rank1   |   rank1   |
+ +------------+-----------+-----------+
+ |            |**DIMM_A1**|**DIMM_B1**|
+ +------------+-----------+-----------+
+ | ``csrow2`` |   rank0   |   rank0   |
+ +------------+-----------+-----------+
+ | ``csrow3`` |   rank1   |   rank1   |
  +------------+-----------+-----------+
 
 In the above example, there are 4 physical slots on the motherboard
@@ -374,11 +381,13 @@
 Channel, the csrows cross both DIMMs.
 
 Memory DIMMs come single or dual "ranked". A rank is a populated csrow.
-Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above
-will have just one csrow (csrow0). csrow1 will be empty. On the other
-hand, when 2 dual ranked DIMMs are similarly placed, then both csrow0
-and csrow1 will be populated. The pattern repeats itself for csrow2 and
-csrow3.
+In the example above 2 dual ranked DIMMs are similarly placed. Thus,
+both csrow0 and csrow1 are populated. On the other hand, when 2 single
+ranked DIMMs are placed in slots DIMM_A0 and DIMM_B0, then they will
+have just one csrow (csrow0) and csrow1 will be empty. The pattern
+repeats itself for csrow2 and csrow3. Also note that some memory
+controllers don't have any logic to identify the memory module, see
+``rankX`` directories below.
 
 The representation of the above is reflected in the directory
 tree in EDAC's sysfs interface. Starting in directory