| .. | .. |
|---|
| 19 | 19 | The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows |
|---|
| 20 | 20 | you to track down the cacheline contentions. |
|---|
| 21 | 21 | |
|---|
| 22 | | -The tool is based on x86's load latency and precise store facility events |
|---|
| 23 | | -provided by Intel CPUs. These events provide: |
|---|
| 22 | +On x86, the tool is based on load latency and precise store facility events |
|---|
| 23 | +provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling |
|---|
| 24 | +with the thresholding feature. |
|---|
| 25 | + |
|---|
| 26 | +These events provide: |
|---|
| 24 | 27 | - memory address of the access |
|---|
| 25 | 28 | - type of the access (load and store details) |
|---|
| 26 | 29 | - latency (in cycles) of the load access |
|---|
| .. | .. |
|---|
| 37 | 40 | -------------- |
|---|
| 38 | 41 | -e:: |
|---|
| 39 | 42 | --event=:: |
|---|
| 40 | | - Select the PMU event. Use 'perf mem record -e list' |
|---|
| 43 | + Select the PMU event. Use 'perf c2c record -e list' |
|---|
| 41 | 44 | to list available events. |
|---|
| 42 | 45 | |
|---|
| 43 | 46 | -v:: |
|---|
| .. | .. |
|---|
| 46 | 49 | |
|---|
| 47 | 50 | -l:: |
|---|
| 48 | 51 | --ldlat:: |
|---|
| 49 | | - Configure mem-loads latency. |
|---|
| 52 | + Configure mem-loads latency. (x86 only) |
|---|
| 50 | 53 | |
|---|
| 51 | 54 | -k:: |
|---|
| 52 | 55 | --all-kernel:: |
|---|
| .. | .. |
|---|
| 108 | 111 | --display:: |
|---|
| 109 | 112 | Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default. |
|---|
| 110 | 113 | |
|---|
| 114 | +--stitch-lbr:: |
|---|
| 115 | + Show the callgraph with stitched LBRs, which may provide a more complete |
|---|
| 116 | + callgraph. The perf.data file must have been obtained using |
|---|
| 117 | + perf c2c record --call-graph lbr. |
|---|
| 118 | + Disabled by default. In common cases with call stack overflows, |
|---|
| 119 | + it can recreate better call stacks than the default lbr call stack |
|---|
| 120 | + output. However, this approach is not foolproof. There can be cases |
|---|
| 121 | + where it creates incorrect call stacks from incorrect matches. |
|---|
| 122 | + Known limitations include exception handling such as |
|---|
| 123 | + setjmp/longjmp, where calls and returns do not match. |
|---|
| 124 | + |
|---|
| 111 | 125 | C2C RECORD |
|---|
| 112 | 126 | ---------- |
|---|
| 113 | 127 | The perf c2c record command setup options related to HITM cacheline analysis |
|---|
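The new `--stitch-lbr` text implies a two-step workflow. The shell sketch below only prints (it does not run) the two commands: recording with `--call-graph lbr`, which the option text says is required, and then reporting with `--stitch-lbr`. The function name `c2c_stitch_lbr_cmds` and the `./workload` argument are hypothetical placeholders used for illustration only.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the record/report command pair
# for stitched-LBR call stacks. Nothing is executed.
#   $1 = workload command (placeholder, e.g. ./workload)
c2c_stitch_lbr_cmds() {
    workload="$1"
    # Step 1: the perf.data file must be recorded with LBR call stacks.
    echo "perf c2c record --call-graph lbr $workload"
    # Step 2: report from that perf.data with LBR stitching enabled.
    echo "perf c2c report --stitch-lbr"
}

c2c_stitch_lbr_cmds ./workload
```

Printing the commands rather than running them keeps the sketch usable on machines without perf or LBR support. Because stitching can produce incorrect stacks, compare the output against a plain `perf c2c report` run.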
| .. | .. |
|---|
| 119 | 133 | -W,-d,--phys-data,--sample-cpu |
|---|
| 120 | 134 | |
|---|
| 121 | 135 | Unless specified otherwise with '-e' option, following events are monitored by |
|---|
| 122 | | -default: |
|---|
| 136 | +default on x86: |
|---|
| 123 | 137 | |
|---|
| 124 | 138 | cpu/mem-loads,ldlat=30/P |
|---|
| 125 | 139 | cpu/mem-stores/P |
|---|
| 140 | + |
|---|
| 141 | +and the following on PowerPC: |
|---|
| 142 | + |
|---|
| 143 | + cpu/mem-loads/ |
|---|
| 144 | + cpu/mem-stores/ |
|---|
| 126 | 145 | |
|---|
| 127 | 146 | User can pass any 'perf record' option behind '--' mark, like (to enable |
|---|
| 128 | 147 | callchains and system wide monitoring): |
|---|
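The per-architecture defaults listed in this hunk can be summarized as a small lookup. The sketch below prints the documented default event strings for a given architecture name. Note the assumptions: the `uname -m`-style names matched here (`x86_64`, `i?86`, `ppc64`, `ppc64le`) are chosen for illustration, `c2c_default_events` is a hypothetical name, and this is not how perf itself selects its events.

```shell
#!/bin/sh
# Hypothetical lookup of the default 'perf c2c record' events documented
# above. Arch-name patterns are illustrative assumptions.
#   $1 = architecture name, e.g. output of 'uname -m'
c2c_default_events() {
    case "$1" in
        x86_64|i?86)
            # Intel load-latency and precise-store events
            echo "cpu/mem-loads,ldlat=30/P"
            echo "cpu/mem-stores/P"
            ;;
        ppc64|ppc64le)
            # PowerPC random instruction sampling with thresholding
            echo "cpu/mem-loads/"
            echo "cpu/mem-stores/"
            ;;
        *)
            echo "no documented default for '$1'" >&2
            return 1
            ;;
    esac
}

c2c_default_events x86_64
```

To see what the local machine actually supports, use `perf c2c record -e list` rather than relying on this table.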
| .. | .. |
|---|
| 155 | 174 | Cacheline |
|---|
| 156 | 175 | - cacheline address (hex number) |
|---|
| 157 | 176 | |
|---|
| 158 | | - Total records |
|---|
| 159 | | - - sum of all cachelines accesses |
|---|
| 160 | | - |
|---|
| 161 | 177 | Rmt/Lcl Hitm |
|---|
| 162 | 178 | - cacheline percentage of all Remote/Local HITM accesses |
|---|
| 163 | 179 | |
|---|
| 164 | | - LLC Load Hitm - Total, Lcl, Rmt |
|---|
| 180 | + LLC Load Hitm - Total, LclHitm, RmtHitm |
|---|
| 165 | 181 | - count of Total/Local/Remote load HITMs |
|---|
| 166 | 182 | |
|---|
| 167 | | - Store Reference - Total, L1Hit, L1Miss |
|---|
| 168 | | - Total - all store accesses |
|---|
| 169 | | - L1Hit - store accesses that hit L1 |
|---|
| 170 | | - L1Hit - store accesses that missed L1 |
|---|
| 183 | + Total records |
|---|
| 184 | + - sum of all cacheline accesses |
|---|
| 171 | 185 | |
|---|
| 172 | | - Load Dram |
|---|
| 173 | | - - count of local and remote DRAM accesses |
|---|
| 174 | | - |
|---|
| 175 | | - LLC Ld Miss |
|---|
| 176 | | - - count of all accesses that missed LLC |
|---|
| 177 | | - |
|---|
| 178 | | - Total Loads |
|---|
| 186 | + Total loads |
|---|
| 179 | 187 | - sum of all load accesses |
|---|
| 188 | + |
|---|
| 189 | + Total stores |
|---|
| 190 | + - sum of all store accesses |
|---|
| 191 | + |
|---|
| 192 | + Store Reference - L1Hit, L1Miss |
|---|
| 193 | + L1Hit - store accesses that hit L1 |
|---|
| 194 | + L1Miss - store accesses that missed L1 |
|---|
| 180 | 195 | |
|---|
| 181 | 196 | Core Load Hit - FB, L1, L2 |
|---|
| 182 | 197 | - count of load hits in FB (Fill Buffer), L1 and L2 cache |
|---|
| 183 | 198 | |
|---|
| 184 | | - LLC Load Hit - Llc, Rmt |
|---|
| 185 | | - - count of LLC and Remote load hits |
|---|
| 199 | + LLC Load Hit - LlcHit, LclHitm |
|---|
| 200 | + - count of LLC load accesses, including LLC hits and LLC HITMs |
|---|
| 201 | + |
|---|
| 202 | + RMT Load Hit - RmtHit, RmtHitm |
|---|
| 203 | + - count of remote load accesses, including remote hits and remote HITMs |
|---|
| 204 | + |
|---|
| 205 | + Load Dram - Lcl, Rmt |
|---|
| 206 | + - count of local and remote DRAM accesses |
|---|
| 186 | 207 | |
|---|
| 187 | 208 | For each offset in the 2) list we display following data: |
|---|
| 188 | 209 | |
|---|