| .. | .. | 
|---|
| 19 | 19 | The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows | 
|---|
| 20 | 20 | you to track down the cacheline contentions. | 
|---|
| 21 | 21 |  | 
|---|
| 22 |  | -The tool is based on x86's load latency and precise store facility events | 
|---|
| 23 |  | -provided by Intel CPUs. These events provide: | 
|---|
|  | 22 | +On x86, the tool is based on load latency and precise store facility events | 
|---|
|  | 23 | +provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling | 
|---|
|  | 24 | +with the thresholding feature. | 
|---|
|  | 25 | + | 
|---|
|  | 26 | +These events provide: | 
|---|
| 24 | 27 | - memory address of the access | 
|---|
| 25 | 28 | - type of the access (load and store details) | 
|---|
| 26 | 29 | - latency (in cycles) of the load access | 
|---|
| .. | .. | 
|---|
| 37 | 40 | -------------- | 
|---|
| 38 | 41 | -e:: | 
|---|
| 39 | 42 | --event=:: | 
|---|
| 40 |  | -	Select the PMU event. Use 'perf mem record -e list' | 
|---|
|  | 43 | +	Select the PMU event. Use 'perf c2c record -e list' | 
|---|
| 41 | 44 | to list available events. | 
|---|
| 42 | 45 |  | 
|---|
| 43 | 46 | -v:: | 
|---|
| .. | .. | 
|---|
| 46 | 49 |  | 
|---|
| 47 | 50 | -l:: | 
|---|
| 48 | 51 | --ldlat:: | 
|---|
| 49 |  | -	Configure mem-loads latency. | 
|---|
|  | 52 | +	Configure mem-loads latency. (x86 only) | 
|---|
| 50 | 53 |  | 
|---|
| 51 | 54 | -k:: | 
|---|
| 52 | 55 | --all-kernel:: | 
|---|
| .. | .. | 
|---|
| 108 | 111 | --display:: | 
|---|
| 109 | 112 | Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default. | 
|---|
| 110 | 113 |  | 
|---|
|  | 114 | +--stitch-lbr:: | 
|---|
|  | 115 | +	Show callgraph with stitched LBRs, which may give a more complete | 
|---|
|  | 116 | +	callgraph. The perf.data file must have been obtained using | 
|---|
|  | 117 | +	perf c2c record --call-graph lbr. | 
|---|
|  | 118 | +	Disabled by default. In common cases with call stack overflows, | 
|---|
|  | 119 | +	it can recreate better call stacks than the default lbr call stack | 
|---|
|  | 120 | +	output. But this approach is not foolproof. There can be cases | 
|---|
|  | 121 | +	where it creates incorrect call stacks from incorrect matches. | 
|---|
|  | 122 | +	Known limitations include exception handling, such as | 
|---|
|  | 123 | +	setjmp/longjmp, where calls and returns do not match. | 
|---|
|  | 124 | + | 
|---|
| 111 | 125 | C2C RECORD | 
|---|
| 112 | 126 | ---------- | 
|---|
| 113 | 127 | The perf c2c record command setup options related to HITM cacheline analysis | 
|---|
| .. | .. | 
|---|
| 119 | 133 | -W,-d,--phys-data,--sample-cpu | 
|---|
| 120 | 134 |  | 
|---|
| 121 | 135 | Unless specified otherwise with '-e' option, following events are monitored by | 
|---|
| 122 |  | -default: | 
|---|
|  | 136 | +default on x86: | 
|---|
| 123 | 137 |  | 
|---|
| 124 | 138 | cpu/mem-loads,ldlat=30/P | 
|---|
| 125 | 139 | cpu/mem-stores/P | 
|---|
|  | 140 | + | 
|---|
|  | 141 | +and the following on PowerPC: | 
|---|
|  | 142 | + | 
|---|
|  | 143 | +  cpu/mem-loads/ | 
|---|
|  | 144 | +  cpu/mem-stores/ | 
|---|
| 126 | 145 |  | 
|---|
| 127 | 146 | User can pass any 'perf record' option behind '--' mark, like (to enable | 
|---|
| 128 | 147 | callchains and system wide monitoring): | 
|---|
| .. | .. | 
|---|
| 155 | 174 | Cacheline | 
|---|
| 156 | 175 | - cacheline address (hex number) | 
|---|
| 157 | 176 |  | 
|---|
| 158 |  | -  Total records | 
|---|
| 159 |  | -  - sum of all cachelines accesses | 
|---|
| 160 |  | - | 
|---|
| 161 | 177 | Rmt/Lcl Hitm | 
|---|
| 162 | 178 | - cacheline percentage of all Remote/Local HITM accesses | 
|---|
| 163 | 179 |  | 
|---|
| 164 |  | -  LLC Load Hitm - Total, Lcl, Rmt | 
|---|
|  | 180 | +  LLC Load Hitm - Total, LclHitm, RmtHitm | 
|---|
| 165 | 181 | - count of Total/Local/Remote load HITMs | 
|---|
| 166 | 182 |  | 
|---|
| 167 |  | -  Store Reference - Total, L1Hit, L1Miss | 
|---|
| 168 |  | -    Total - all store accesses | 
|---|
| 169 |  | -    L1Hit - store accesses that hit L1 | 
|---|
| 170 |  | -    L1Hit - store accesses that missed L1 | 
|---|
|  | 183 | +  Total records | 
|---|
|  | 184 | +  - sum of all cachelines accesses | 
|---|
| 171 | 185 |  | 
|---|
| 172 |  | -  Load Dram | 
|---|
| 173 |  | -  - count of local and remote DRAM accesses | 
|---|
| 174 |  | - | 
|---|
| 175 |  | -  LLC Ld Miss | 
|---|
| 176 |  | -  - count of all accesses that missed LLC | 
|---|
| 177 |  | - | 
|---|
| 178 |  | -  Total Loads | 
|---|
|  | 186 | +  Total loads | 
|---|
| 179 | 187 | - sum of all load accesses | 
|---|
|  | 188 | + | 
|---|
|  | 189 | +  Total stores | 
|---|
|  | 190 | +  - sum of all store accesses | 
|---|
|  | 191 | + | 
|---|
|  | 192 | +  Store Reference - L1Hit, L1Miss | 
|---|
|  | 193 | +    L1Hit - store accesses that hit L1 | 
|---|
|  | 194 | +    L1Miss - store accesses that missed L1 | 
|---|
| 180 | 195 |  | 
|---|
| 181 | 196 | Core Load Hit - FB, L1, L2 | 
|---|
| 182 | 197 | - count of load hits in FB (Fill Buffer), L1 and L2 cache | 
|---|
| 183 | 198 |  | 
|---|
| 184 |  | -  LLC Load Hit - Llc, Rmt | 
|---|
| 185 |  | -  - count of LLC and Remote load hits | 
|---|
|  | 199 | +  LLC Load Hit - LlcHit, LclHitm | 
|---|
|  | 200 | +  - count of LLC load accesses, includes LLC hits and LLC HITMs | 
|---|
|  | 201 | + | 
|---|
|  | 202 | +  RMT Load Hit - RmtHit, RmtHitm | 
|---|
|  | 203 | +  - count of remote load accesses, includes remote hits and remote HITMs | 
|---|
|  | 204 | + | 
|---|
|  | 205 | +  Load Dram - Lcl, Rmt | 
|---|
|  | 206 | +  - count of local and remote DRAM accesses | 
|---|
| 186 | 207 |  | 
|---|
| 187 | 208 | For each offset in the 2) list we display following data: | 
|---|
| 188 | 209 |  | 
|---|