.. | .. |
---|
19 | 19 | The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows |
---|
20 | 20 | you to track down the cacheline contentions. |
---|
21 | 21 | |
---|
22 | | -The tool is based on x86's load latency and precise store facility events |
---|
23 | | -provided by Intel CPUs. These events provide: |
---|
| 22 | +On x86, the tool is based on load latency and precise store facility events |
---|
| 23 | +provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling |
---|
| 24 | +with the thresholding feature. |
---|
| 25 | + |
---|
| 26 | +These events provide: |
---|
24 | 27 | - memory address of the access |
---|
25 | 28 | - type of the access (load and store details) |
---|
26 | 29 | - latency (in cycles) of the load access |
---|
.. | .. |
---|
37 | 40 | -------------- |
---|
38 | 41 | -e:: |
---|
39 | 42 | --event=:: |
---|
40 | | - Select the PMU event. Use 'perf mem record -e list' |
---|
| 43 | + Select the PMU event. Use 'perf c2c record -e list' |
---|
41 | 44 | to list available events. |
---|
42 | 45 | |
---|
43 | 46 | -v:: |
---|
.. | .. |
---|
46 | 49 | |
---|
47 | 50 | -l:: |
---|
48 | 51 | --ldlat:: |
---|
49 | | - Configure mem-loads latency. |
---|
| 52 | + Configure mem-loads latency. (x86 only) |
---|
50 | 53 | |
---|
51 | 54 | -k:: |
---|
52 | 55 | --all-kernel:: |
---|
.. | .. |
---|
108 | 111 | --display:: |
---|
109 | 112 | Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default. |
---|
110 | 113 | |
---|
| 114 | +--stitch-lbr:: |
---|
| 115 | + Show callgraph with stitched LBRs, which may give a more complete |
---|
| 116 | + callgraph. The perf.data file must have been obtained using |
---|
| 117 | + perf c2c record --call-graph lbr. |
---|
| 118 | + Disabled by default. In common cases with call stack overflows, |
---|
| 119 | + it can recreate better call stacks than the default lbr call stack |
---|
| 120 | + output. But this approach is not foolproof. There can be cases |
---|
| 121 | + where it creates incorrect call stacks from incorrect matches. |
---|
| 122 | + The known limitations include exception handling; constructs such as |
---|
| 123 | + setjmp/longjmp will have calls/returns that do not match. |
---|
| 124 | + |
---|
111 | 125 | C2C RECORD |
---|
112 | 126 | ---------- |
---|
113 | 127 | The perf c2c record command sets up options related to HITM cacheline analysis |
---|
.. | .. |
---|
119 | 133 | -W,-d,--phys-data,--sample-cpu |
---|
120 | 134 | |
---|
121 | 135 | Unless specified otherwise with the '-e' option, the following events are monitored by |
---|
122 | | -default: |
---|
| 136 | +default on x86: |
---|
123 | 137 | |
---|
124 | 138 | cpu/mem-loads,ldlat=30/P |
---|
125 | 139 | cpu/mem-stores/P |
---|
| 140 | + |
---|
| 141 | +and the following on PowerPC: |
---|
| 142 | + |
---|
| 143 | + cpu/mem-loads/ |
---|
| 144 | + cpu/mem-stores/ |
---|
126 | 145 | |
---|
127 | 146 | User can pass any 'perf record' option behind '--' mark, like (to enable |
---|
128 | 147 | callchains and system wide monitoring): |
---|
.. | .. |
---|
155 | 174 | Cacheline |
---|
156 | 175 | - cacheline address (hex number) |
---|
157 | 176 | |
---|
158 | | - Total records |
---|
159 | | - - sum of all cachelines accesses |
---|
160 | | - |
---|
161 | 177 | Rmt/Lcl Hitm |
---|
162 | 178 | - cacheline percentage of all Remote/Local HITM accesses |
---|
163 | 179 | |
---|
164 | | - LLC Load Hitm - Total, Lcl, Rmt |
---|
| 180 | + LLC Load Hitm - Total, LclHitm, RmtHitm |
---|
165 | 181 | - count of Total/Local/Remote load HITMs |
---|
166 | 182 | |
---|
167 | | - Store Reference - Total, L1Hit, L1Miss |
---|
168 | | - Total - all store accesses |
---|
169 | | - L1Hit - store accesses that hit L1 |
---|
170 | | - L1Hit - store accesses that missed L1 |
---|
| 183 | + Total records |
---|
| 184 | + - sum of all cacheline accesses |
---|
171 | 185 | |
---|
172 | | - Load Dram |
---|
173 | | - - count of local and remote DRAM accesses |
---|
174 | | - |
---|
175 | | - LLC Ld Miss |
---|
176 | | - - count of all accesses that missed LLC |
---|
177 | | - |
---|
178 | | - Total Loads |
---|
| 186 | + Total loads |
---|
179 | 187 | - sum of all load accesses |
---|
| 188 | + |
---|
| 189 | + Total stores |
---|
| 190 | + - sum of all store accesses |
---|
| 191 | + |
---|
| 192 | + Store Reference - L1Hit, L1Miss |
---|
| 193 | + L1Hit - store accesses that hit L1 |
---|
| 194 | + L1Miss - store accesses that missed L1 |
---|
180 | 195 | |
---|
181 | 196 | Core Load Hit - FB, L1, L2 |
---|
182 | 197 | - count of load hits in FB (Fill Buffer), L1 and L2 cache |
---|
183 | 198 | |
---|
184 | | - LLC Load Hit - Llc, Rmt |
---|
185 | | - - count of LLC and Remote load hits |
---|
| 199 | + LLC Load Hit - LlcHit, LclHitm |
---|
| 200 | + - count of LLC load accesses, includes LLC hits and LLC HITMs |
---|
| 201 | + |
---|
| 202 | + RMT Load Hit - RmtHit, RmtHitm |
---|
| 203 | + - count of remote load accesses, includes remote hits and remote HITMs |
---|
| 204 | + |
---|
| 205 | + Load Dram - Lcl, Rmt |
---|
| 206 | + - count of local and remote DRAM accesses |
---|
186 | 207 | |
---|
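As an illustration of how the "Cacheline" column relates to the per-offset data, the sketch below splits sampled data addresses into a cacheline base address and an offset within that line. It is not taken from the perf sources: the 64-byte line size and the helper names `split_address` and `group_by_cacheline` are assumptions made for this example only.

```python
# Minimal sketch, assuming 64-byte cachelines (common on x86; other
# platforms may differ). Helper names are hypothetical, not perf APIs.
CACHELINE_SIZE = 64


def split_address(addr, line_size=CACHELINE_SIZE):
    """Return (cacheline_address, offset) for a sampled data address."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line_size must be a power of two")
    mask = line_size - 1
    # Clearing the low bits gives the line base; keeping them gives the offset.
    return addr & ~mask, addr & mask


def group_by_cacheline(addresses, line_size=CACHELINE_SIZE):
    """Map each cacheline base address to the sorted offsets seen in it."""
    lines = {}
    for addr in addresses:
        base, offset = split_address(addr, line_size)
        lines.setdefault(base, set()).add(offset)
    return {base: sorted(offsets) for base, offsets in lines.items()}


if __name__ == "__main__":
    samples = [0x7F001048, 0x7F001040, 0x7F001080]
    for base, offsets in sorted(group_by_cacheline(samples).items()):
        print(hex(base), [hex(o) for o in offsets])
```

Accesses from different threads that land on the same base address but different offsets are the typical signature of false sharing, which is what the per-offset breakdown helps to locate.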
187 | 208 | For each offset in the 2) list we display following data: |
---|
188 | 209 | |
---|