.. | .. |
---|
47 | 47 | param1 and param2 are defined as formats for the PMU in |
---|
48 | 48 | /sys/bus/event_source/devices/<pmu>/format/* |
---|
49 | 49 | |
---|
| 50 | + 'percore' is an event qualifier that sums up the event counts for all |
---|
| 51 | + hardware threads in a core. For example: |
---|
| 52 | + perf stat -A -a -e cpu/event,percore=1/,otherevent ... |
---|
| 53 | + |
---|
50 | 54 | - a symbolically formed event like 'pmu/config=M,config1=N,config2=K/' |
---|
51 | 55 | where M, N, K are numbers (in decimal, hex, octal format). |
---|
52 | 56 | Acceptable values for each of 'config', 'config1' and 'config2' |
---|
.. | .. |
---|
54 | 58 | /sys/bus/event_source/devices/<pmu>/format/* |
---|
55 | 59 | |
---|
56 | 60 | Note that the last two syntaxes support prefix and glob matching in |
---|
57 | | - the PMU name to simplify creation of events accross multiple instances |
---|
| 61 | + the PMU name to simplify creation of events across multiple instances |
---|
58 | 62 | of the same type of PMU in large systems (e.g. memory controller PMUs). |
---|
59 | 63 | Multiple PMU instances are typical for uncore PMUs, so the prefix |
---|
60 | 64 | 'uncore_' is also ignored when performing this match. |
---|
.. | .. |
---|
71 | 75 | --tid=<tid>:: |
---|
72 | 76 | stat events on existing thread id (comma separated list) |
---|
73 | 77 | |
---|
| 78 | +ifdef::HAVE_LIBPFM[] |
---|
| 79 | +--pfm-events events:: |
---|
| 80 | +Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net) |
---|
| 81 | +including support for event filters. For example '--pfm-events |
---|
| 82 | +inst_retired:any_p:u:c=1:i'. More than one event can be passed to the |
---|
| 83 | +option using the comma separator. Hardware (libpfm4) events and generic |
---|
| 84 | +hardware events cannot be mixed within this option; the latter must be |
---|
| 85 | +passed with the -e option. The -e option and this one can be mixed and matched. Events |
---|
| 86 | +can be grouped using the {} notation. |
---|
| 87 | +endif::HAVE_LIBPFM[] |
---|
74 | 88 | |
---|
75 | 89 | -a:: |
---|
76 | 90 | --all-cpus:: |
---|
77 | 91 | system-wide collection from all CPUs (default if no target is specified) |
---|
78 | 92 | |
---|
79 | | --c:: |
---|
80 | | ---scale:: |
---|
81 | | - scale/normalize counter values |
---|
| 93 | +--no-scale:: |
---|
| 94 | + Don't scale/normalize counter values |
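By default, when events are multiplexed, perf stat extrapolates each raw count by the ratio of the time the event was enabled to the time it was actually running; --no-scale reports the raw count instead. The sketch below reproduces that arithmetic with hypothetical numbers. `scale_count` is an illustrative helper, not part of perf.

```shell
#!/bin/sh
# scale_count RAW ENABLED RUNNING
# Hypothetical helper: extrapolate a multiplexed counter value the way
# perf stat does by default (raw * enabled / running). With --no-scale,
# perf stat reports the raw value instead.
scale_count() {
    awk -v raw="$1" -v ena="$2" -v run="$3" 'BEGIN {
        if (run == 0) { print "<not counted>"; exit }  # event never ran
        if (run >= ena) { printf "%d\n", raw; exit }   # no multiplexing
        printf "%d\n", raw * ena / run                 # extrapolated value
    }'
}

# The event ran for 250 of 1000 time units and counted 1000000 cycles.
scale_count 1000000 1000 250   # prints 4000000
```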
---|
82 | 95 | |
---|
83 | 96 | -d:: |
---|
84 | 97 | --detailed:: |
---|
.. | .. |
---|
94 | 107 | |
---|
95 | 108 | -B:: |
---|
96 | 109 | --big-num:: |
---|
97 | | - print large numbers with thousands' separators according to locale |
---|
| 110 | + print large numbers with thousands' separators according to locale. |
---|
| 111 | + Enabled by default. Use "--no-big-num" to disable. |
---|
| 112 | + The default setting can be changed with "perf config stat.big-num=false". |
---|
98 | 113 | |
---|
99 | 114 | -C:: |
---|
100 | 115 | --cpu=:: |
---|
.. | .. |
---|
151 | 166 | If wanting to monitor, say, 'cycles' for a cgroup and also for system wide, this |
---|
152 | 167 | command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'. |
---|
153 | 168 | |
---|
| 169 | +--for-each-cgroup name:: |
---|
| 170 | +Expand the event list for each cgroup in "name" (multiple cgroups can be given, |
---|
| 171 | +separated by commas). This has the same effect as repeating the -e and -G options |
---|
| 172 | +for every combination of event and name. This option cannot be used with the -G/--cgroup option. |
---|
| 173 | + |
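To make the equivalence with repeated -e and -G options concrete, the hypothetical helper below prints one "-e EVENT -G NAME" pair for every combination of event and cgroup name. It only illustrates the described behavior: it assumes event names contain no commas (so `cpu/event=...,umask=.../` style events are not handled) and does not imply the exact order perf uses internally.

```shell
#!/bin/sh
# expand_for_each_cgroup EVENTS NAMES
# Hypothetical helper: print the -e/-G arguments equivalent to
# "-e EVENTS --for-each-cgroup NAMES" as described above.
# Naive comma split: assumes no commas inside individual event specs.
expand_for_each_cgroup() {
    events=$1
    names=$2
    out=""
    old_ifs=$IFS
    IFS=,
    for name in $names; do
        for ev in $events; do
            out="$out -e $ev -G $name"
        done
    done
    IFS=$old_ifs
    printf '%s\n' "${out# }"   # drop the leading space
}

expand_for_each_cgroup cycles,instructions A,B
# prints: -e cycles -G A -e instructions -G A -e cycles -G B -e instructions -G B
```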
---|
154 | 174 | -o file:: |
---|
155 | 175 | --output file:: |
---|
156 | 176 | Print the output into the designated file. |
---|
.. | .. |
---|
165 | 185 | 3>results perf stat --log-fd 3 -- $cmd |
---|
166 | 186 | 3>>results perf stat --log-fd 3 --append -- $cmd |
---|
167 | 187 | |
---|
| 188 | +--control=fifo:ctl-fifo[,ack-fifo]:: |
---|
| 189 | +--control=fd:ctl-fd[,ack-fd]:: |
---|
| 190 | +ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows. |
---|
| 191 | +perf stat listens on the ctl-fd descriptor for commands that control the measurement ('enable': enable events, |
---|
| 192 | +'disable': disable events). Measurements can be started with events disabled using the |
---|
| 193 | +--delay=-1 option. Optionally, a control command completion ('ack\n') is sent to the ack-fd descriptor |
---|
| 194 | +to synchronize with the controlling process. Example of a bash shell script that enables and |
---|
| 195 | +disables events during measurement: |
---|
| 196 | + |
---|
| 197 | + #!/bin/bash |
---|
| 198 | + |
---|
| 199 | + ctl_dir=/tmp/ |
---|
| 200 | + |
---|
| 201 | + ctl_fifo=${ctl_dir}perf_ctl.fifo |
---|
| 202 | + test -p ${ctl_fifo} && unlink ${ctl_fifo} |
---|
| 203 | + mkfifo ${ctl_fifo} |
---|
| 204 | + exec {ctl_fd}<>${ctl_fifo} |
---|
| 205 | + |
---|
| 206 | + ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo |
---|
| 207 | + test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo} |
---|
| 208 | + mkfifo ${ctl_ack_fifo} |
---|
| 209 | + exec {ctl_fd_ack}<>${ctl_ack_fifo} |
---|
| 210 | + |
---|
| 211 | + perf stat -D -1 -e cpu-cycles -a -I 1000 \ |
---|
| 212 | + --control fd:${ctl_fd},${ctl_fd_ack} \ |
---|
| 213 | + -- sleep 30 & |
---|
| 214 | + perf_pid=$! |
---|
| 215 | + |
---|
| 216 | + sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})" |
---|
| 217 | + sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})" |
---|
| 218 | + |
---|
| 219 | + exec {ctl_fd_ack}>&- |
---|
| 220 | + unlink ${ctl_ack_fifo} |
---|
| 221 | + |
---|
| 222 | + exec {ctl_fd}>&- |
---|
| 223 | + unlink ${ctl_fifo} |
---|
| 224 | + |
---|
| 225 | + wait -n ${perf_pid} |
---|
| 226 | + exit $? |
---|
| 227 | + |
---|
| 228 | + |
---|
168 | 229 | --pre:: |
---|
169 | 230 | --post:: |
---|
170 | 231 | Pre and post measurement hooks, e.g.: |
---|
.. | .. |
---|
176 | 237 | Print count deltas every N milliseconds (minimum: 1ms) |
---|
177 | 238 | The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals. Use with caution. |
---|
178 | 239 | example: 'perf stat -I 1000 -e cycles -a sleep 5' |
---|
| 240 | + |
---|
| 241 | +If a metric exists, it is calculated from the counts collected in this interval and printed after #. |
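As a rough illustration of a per-interval metric, the awk sketch below computes instructions per cycle from each interval's own counts rather than from running totals. The `time count event` input format is a simplified assumption, not the real perf stat -I output, and `interval_ipc` is a hypothetical name.

```shell
#!/bin/sh
# interval_ipc: reads hypothetical "time count event" lines (a simplified
# stand-in for interval output, not perf's real format) and prints the
# instructions-per-cycle metric computed from each interval's counts only.
interval_ipc() {
    awk '
        $3 == "instructions" { ins[$1] = $2 + 0 }
        $3 == "cycles"       { cyc[$1] = $2 + 0; order[++n] = $1 }
        END {
            for (i = 1; i <= n; i++) {
                t = order[i]
                if (cyc[t] > 0 && (t in ins))
                    printf "%s # %.2f insn per cycle\n", t, ins[t] / cyc[t]
            }
        }'
}

printf '%s\n' \
    "1.001 2000 cycles" "1.001 3000 instructions" \
    "2.002 4000 cycles" "2.002 2000 instructions" | interval_ipc
# 1.001 # 1.50 insn per cycle
# 2.002 # 0.50 insn per cycle
```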
---|
179 | 242 | |
---|
180 | 243 | --interval-count times:: |
---|
181 | 244 | Print count deltas for fixed number of times. |
---|
.. | .. |
---|
201 | 264 | socket number and the number of online processors on that socket. This is |
---|
202 | 265 | useful to gauge the amount of aggregation. |
---|
203 | 266 | |
---|
| 267 | +--per-die:: |
---|
| 268 | +Aggregate counts per processor die for system-wide mode measurements. This |
---|
| 269 | +is a useful mode to detect imbalance between dies. To enable this mode, |
---|
| 270 | +use --per-die in addition to -a (system-wide). The output includes the |
---|
| 271 | +die number and the number of online processors on that die. This is |
---|
| 272 | +useful to gauge the amount of aggregation. |
---|
| 273 | + |
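The sketch below shows the kind of aggregation --per-die performs: per-CPU counts are summed by die, and the number of CPUs contributing to each die is reported alongside. The `cpu die count` input format and the `aggregate_per_die` helper are assumptions for illustration; perf itself derives each CPU's die from the system topology.

```shell
#!/bin/sh
# aggregate_per_die: reads hypothetical "cpu die count" lines (per-CPU
# counts plus each CPU's die id) and prints one "die <id> cpus <n> count <sum>"
# line per die, roughly the information --per-die reports.
aggregate_per_die() {
    awk '{ sum[$2] += $3; cpus[$2]++ }
         END { for (d in sum) printf "die %s cpus %d count %d\n", d, cpus[d], sum[d] }' |
    sort -k2,2n   # awk array order is unspecified, so sort by die id
}

printf '%s\n' "0 0 100" "1 0 150" "2 1 400" "3 1 380" | aggregate_per_die
# die 0 cpus 2 count 250
# die 1 cpus 2 count 780
```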
---|
204 | 274 | --per-core:: |
---|
205 | 275 | Aggregate counts per physical processor for system-wide mode measurements. This |
---|
206 | 276 | is a useful mode to detect imbalance between physical cores. To enable this mode, |
---|
.. | .. |
---|
| 211 | 281 | Aggregate counts per monitored thread, when monitoring threads (-t option) |
---|
212 | 282 | or processes (-p option). |
---|
213 | 283 | |
---|
| 284 | +--per-node:: |
---|
| 285 | +Aggregate counts per NUMA node for system-wide mode measurements. This |
---|
| 286 | +is a useful mode to detect imbalance between NUMA nodes. To enable this |
---|
| 287 | +mode, use --per-node in addition to -a (system-wide). |
---|
| 288 | + |
---|
214 | 289 | -D msecs:: |
---|
215 | 290 | --delay msecs:: |
---|
216 | | -After starting the program, wait msecs before measuring. This is useful to |
---|
217 | | -filter out the startup phase of the program, which is often very different. |
---|
| 291 | +After starting the program, wait msecs before measuring (-1: start with events |
---|
| 292 | +disabled). This is useful to filter out the startup phase of the program, |
---|
| 293 | +which is often very different. |
---|
218 | 294 | |
---|
219 | 295 | -T:: |
---|
220 | 296 | --transaction:: |
---|
221 | 297 | |
---|
222 | 298 | Print statistics of transactional execution if supported. |
---|
| 299 | + |
---|
| 300 | +--metric-no-group:: |
---|
| 301 | +By default, events to compute a metric are placed in weak groups. The |
---|
| 302 | +group tries to enforce scheduling all or none of the events. The |
---|
| 303 | +--metric-no-group option places events outside of groups, which may |
---|
| 304 | +increase the chance of each event being scheduled and so improve |
---|
| 305 | +accuracy. However, because the events may then not be scheduled together, |
---|
| 306 | +accuracy for metrics like instructions per cycle can be lower, as the two |
---|
| 307 | +events may no longer be measured at the same time. |
---|
| 308 | + |
---|
| 309 | +--metric-no-merge:: |
---|
| 310 | +By default, metric events in different weak groups can be shared if one |
---|
| 311 | +group contains all the events needed by another. In such cases one |
---|
| 312 | +group is eliminated, reducing event multiplexing and ensuring |
---|
| 313 | +that certain groups of metrics sum to 100%. A downside to sharing a |
---|
| 314 | +group is that the shared group may require multiplexing, which lowers accuracy for a |
---|
| 315 | +small group that would not otherwise need multiplexing. This option |
---|
| 316 | +forbids the event merging logic from sharing events between groups and |
---|
| 317 | +may be used to increase accuracy in this case. |
---|
223 | 318 | |
---|
224 | 319 | STAT RECORD |
---|
225 | 320 | ----------- |
---|
.. | .. |
---|
239 | 334 | |
---|
240 | 335 | --per-socket:: |
---|
241 | 336 | Aggregate counts per processor socket for system-wide mode measurements. |
---|
| 337 | + |
---|
| 338 | +--per-die:: |
---|
| 339 | +Aggregate counts per processor die for system-wide mode measurements. |
---|
242 | 340 | |
---|
243 | 341 | --per-core:: |
---|
244 | 342 | Aggregate counts per physical processor for system-wide mode measurements. |
---|
.. | .. |
---|
270 | 368 | For best results it is usually a good idea to use it with interval |
---|
271 | 369 | mode like -I 1000, as the bottleneck of workloads can change often. |
---|
272 | 370 | |
---|
| 371 | +This enables --metric-only, unless overridden with --no-metric-only. |
---|
| 372 | + |
---|
| 373 | +The following restrictions only apply to older Intel CPUs and Atom; |
---|
| 374 | +on newer CPUs (IceLake and later) TopDown can be collected for any thread: |
---|
| 375 | + |
---|
273 | 376 | The top down metrics are collected per core instead of per |
---|
274 | 377 | CPU thread. Per core mode is automatically enabled |
---|
275 | 378 | and -a (global monitoring) is needed, requiring root rights or |
---|
.. | .. |
---|
280 | 383 | echo 0 > /proc/sys/kernel/nmi_watchdog |
---|
281 | 384 | for best results. Otherwise the bottlenecks may be inconsistent |
---|
282 | 385 | on workload with changing phases. |
---|
283 | | - |
---|
284 | | -This enables --metric-only, unless overriden with --no-metric-only. |
---|
285 | 386 | |
---|
286 | 387 | To interpret the results it is usually necessary to know which |
---|
287 | 388 | CPUs the workload runs on. If needed the CPUs can be forced using |
---|
.. | .. |
---|
314 | 415 | |
---|
315 | 416 | Users who want to get the actual value can apply --no-metric-only. |
---|
316 | 417 | |
---|
| 418 | +--all-kernel:: |
---|
| 419 | +Configure all used events to run in kernel space. |
---|
| 420 | + |
---|
| 421 | +--all-user:: |
---|
| 422 | +Configure all used events to run in user space. |
---|
| 423 | + |
---|
| 424 | +--percore-show-thread:: |
---|
| 425 | +The event modifier "percore" sums up the event counts |
---|
| 426 | +for all hardware threads in a core and shows the counts per core. |
---|
| 427 | + |
---|
| 428 | +With the event modifier "percore" enabled, this option also sums up the event |
---|
| 429 | +counts for all hardware threads in a core, but shows the summed counts per |
---|
| 430 | +hardware thread. This is essentially a replacement for the any bit and is |
---|
| 431 | +convenient for post-processing. |
---|
| 432 | + |
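The difference between plain "percore" and --percore-show-thread can be sketched as follows: both sum the counts of the hardware threads that share a core, but the second prints that core sum once per hardware thread. The `cpu core count` input format and the `percore_sum` helper are hypothetical.

```shell
#!/bin/sh
# percore_sum MODE: reads hypothetical "cpu core count" lines and sums the
# counts of all hardware threads sharing a core. MODE "core" prints one line
# per core (like plain percore); MODE "thread" prints the core sum next to
# every hardware thread (like --percore-show-thread).
percore_sum() {
    awk -v mode="$1" '
        {
            if (!($2 in sum)) order[++n] = $2   # remember first-seen core order
            cpu[NR] = $1; core[NR] = $2
            sum[$2] += $3
        }
        END {
            if (mode == "thread") {
                for (i = 1; i <= NR; i++)
                    printf "CPU%s %d\n", cpu[i], sum[core[i]]
            } else {
                for (i = 1; i <= n; i++)
                    printf "core%s %d\n", order[i], sum[order[i]]
            }
        }'
}

# CPUs 0 and 4 share core 0; CPUs 1 and 5 share core 1.
input='0 0 10
4 0 20
1 1 30
5 1 40'
printf '%s\n' "$input" | percore_sum core     # core0 30, core1 70
printf '%s\n' "$input" | percore_sum thread   # CPU0 30, CPU4 30, CPU1 70, CPU5 70
```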
---|
| 433 | +--summary:: |
---|
| 434 | +Print a summary for interval mode (-I). |
---|
| 435 | + |
---|
317 | 436 | EXAMPLES |
---|
318 | 437 | -------- |
---|
319 | 438 | |
---|