hc
2024-02-20 102a0743326a03cd1a1202ceda21e175b7d3575c
kernel/tools/perf/Documentation/perf-c2c.txt
@@ -19,8 +19,11 @@
 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
 you to track down the cacheline contentions.
 
-The tool is based on x86's load latency and precise store facility events
-provided by Intel CPUs. These events provide:
+On x86, the tool is based on load latency and precise store facility events
+provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
+with the thresholding feature.
+
+These events provide:
  - memory address of the access
  - type of the access (load and store details)
  - latency (in cycles) of the load access
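Each sampled event carries a data address, and c2c analysis is organized around the cacheline that address falls in. As a rough illustration only (plain Python, not perf code), the sketch below assumes 64-byte cachelines and a hypothetical list of (address, kind) samples. It splits each address into a cacheline address and an offset, then counts records per cacheline and per offset, loosely mirroring the report's "Cacheline" and per-offset views.

```python
from collections import Counter, defaultdict

# Assumption: 64-byte cachelines (common on x86). perf itself may use the
# cacheline size reported by the system; this constant is illustrative.
CACHELINE_SIZE = 64


def split_address(addr, line_size=CACHELINE_SIZE):
    """Return (cacheline address, offset within the line) for a data address.

    Masking the low bits only works for a power-of-two line size."""
    line = addr & ~(line_size - 1)
    return line, addr - line


def group_by_cacheline(samples):
    """Count accesses per cacheline, keyed by (offset, kind).

    `samples` is a hypothetical iterable of (address, kind) pairs,
    where kind is e.g. "load" or "store"."""
    per_line = defaultdict(Counter)
    for addr, kind in samples:
        line, off = split_address(addr)
        per_line[line][(off, kind)] += 1
    return per_line


# Made-up sample addresses: three hit one cacheline, one hits the next.
samples = [(0x1000, "load"), (0x1008, "store"), (0x1008, "load"), (0x1040, "load")]
for line, counts in sorted(group_by_cacheline(samples).items()):
    total = sum(counts.values())
    print(f"{line:#x}: {total} records, {dict(sorted(counts.items()))}")
```

Two accesses at different offsets of the same 64-byte line (0x1000 and 0x1008 here) land in one group, which is what makes contention between unrelated variables sharing a line (false sharing) visible at cacheline granularity.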
@@ -37,7 +40,7 @@
 --------------
 -e::
 --event=::
- Select the PMU event. Use 'perf mem record -e list'
+ Select the PMU event. Use 'perf c2c record -e list'
  to list available events.
 
 -v::
@@ -46,7 +49,7 @@
 
 -l::
 --ldlat::
- Configure mem-loads latency.
+ Configure mem-loads latency. (x86 only)
 
 -k::
 --all-kernel::
@@ -108,6 +111,17 @@
 --display::
  Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
 
+--stitch-lbr::
+ Show callgraph with stitched LBRs, which may give a more complete
+ callgraph. The perf.data file must have been obtained using
+ perf c2c record --call-graph lbr.
+ Disabled by default. In common cases with call stack overflows,
+ it can recreate better call stacks than the default lbr call stack
+ output. But this approach is not foolproof. There can be cases
+ where it creates incorrect call stacks from incorrect matches.
+ The known limitations include exception handling such as
+ setjmp/longjmp, where calls and returns do not match.
+
 C2C RECORD
 ----------
 The perf c2c record command setup options related to HITM cacheline analysis
@@ -119,10 +133,15 @@
 -W,-d,--phys-data,--sample-cpu
 
 Unless specified otherwise with '-e' option, following events are monitored by
-default:
+default on x86:
 
  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P
+
+and the following on PowerPC:
+
+ cpu/mem-loads/
+ cpu/mem-stores/
 
 User can pass any 'perf record' option behind '--' mark, like (to enable
 callchains and system wide monitoring):
@@ -155,34 +174,36 @@
  Cacheline
  - cacheline address (hex number)
 
- Total records
- - sum of all cachelines accesses
-
  Rmt/Lcl Hitm
  - cacheline percentage of all Remote/Local HITM accesses
 
- LLC Load Hitm - Total, Lcl, Rmt
+ LLC Load Hitm - Total, LclHitm, RmtHitm
  - count of Total/Local/Remote load HITMs
 
- Store Reference - Total, L1Hit, L1Miss
- Total - all store accesses
- L1Hit - store accesses that hit L1
- L1Hit - store accesses that missed L1
+ Total records
+ - sum of all cacheline accesses
 
- Load Dram
- - count of local and remote DRAM accesses
-
- LLC Ld Miss
- - count of all accesses that missed LLC
-
- Total Loads
+ Total loads
  - sum of all load accesses
+
+ Total stores
+ - sum of all store accesses
+
+ Store Reference - L1Hit, L1Miss
+ L1Hit - store accesses that hit L1
+ L1Miss - store accesses that missed L1
 
  Core Load Hit - FB, L1, L2
  - count of load hits in FB (Fill Buffer), L1 and L2 cache
 
- LLC Load Hit - Llc, Rmt
- - count of LLC and Remote load hits
+ LLC Load Hit - LlcHit, LclHitm
+ - count of LLC load accesses, including LLC hits and LLC HITMs
+
+ RMT Load Hit - RmtHit, RmtHitm
+ - count of remote load accesses, including remote hits and remote HITMs
+
+ Load Dram - Lcl, Rmt
+ - count of local and remote DRAM accesses
 
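The reworked column descriptions define several totals in terms of finer counters. The sketch below spells out one reading of them, assuming that "includes X and Y" means the column is the sum X + Y and that the Rmt/Lcl Hitm percentage is a cacheline's share of the HITMs summed over all reported cachelines. The `CachelineStats` class, its field names, and the numbers in the example are hypothetical and are not perf's internal data structures.

```python
from dataclasses import dataclass


# Hypothetical per-cacheline counters named after the column labels;
# NOT perf's internal structure.
@dataclass
class CachelineStats:
    lcl_hitm: int = 0   # LclHitm: local load HITMs
    rmt_hitm: int = 0   # RmtHitm: remote load HITMs
    llc_hit: int = 0    # LlcHit: load hits in the LLC
    rmt_hit: int = 0    # RmtHit: remote load hits
    lcl_dram: int = 0   # loads served from local DRAM
    rmt_dram: int = 0   # loads served from remote DRAM

    def llc_load_hitm_total(self):
        # "LLC Load Hitm - Total": local plus remote HITMs
        return self.lcl_hitm + self.rmt_hitm

    def llc_load_hit(self):
        # "LLC Load Hit": LLC hits plus local HITMs
        return self.llc_hit + self.lcl_hitm

    def rmt_load_hit(self):
        # "RMT Load Hit": remote hits plus remote HITMs
        return self.rmt_hit + self.rmt_hitm

    def load_dram(self):
        # "Load Dram": local plus remote DRAM accesses
        return self.lcl_dram + self.rmt_dram


def hitm_percentages(lines):
    """Return (Rmt %, Lcl %) per cacheline: its share of all remote and
    local HITMs across the given cachelines (0.0 when there are none)."""
    tot_rmt = sum(cl.rmt_hitm for cl in lines)
    tot_lcl = sum(cl.lcl_hitm for cl in lines)

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    return [(pct(cl.rmt_hitm, tot_rmt), pct(cl.lcl_hitm, tot_lcl)) for cl in lines]


# Made-up counts for two cachelines, for illustration only.
lines = [
    CachelineStats(lcl_hitm=6, rmt_hitm=3, llc_hit=10, rmt_hit=2, lcl_dram=4, rmt_dram=1),
    CachelineStats(lcl_hitm=2, rmt_hitm=1, llc_hit=5, lcl_dram=0, rmt_dram=3),
]
for cl, (rmt_pct, lcl_pct) in zip(lines, hitm_percentages(lines)):
    print(f"Rmt {rmt_pct:.1f}%  Lcl {lcl_pct:.1f}%  Hitm {cl.llc_load_hitm_total()}  "
          f"LLC hit {cl.llc_load_hit()}  RMT hit {cl.rmt_load_hit()}  DRAM {cl.load_dram()}")
```

Keeping the raw counters and deriving the totals on demand means a column such as "LLC Load Hit" cannot drift out of sync with the LlcHit and LclHitm values it is built from.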
 For each offset in the 2) list we display following data:
188209