.. | .. |
---|
89 | 89 | - socket: processor socket number the task ran at the time of sample |
---|
90 | 90 | - srcline: filename and line number executed at the time of sample. The |
---|
91 | 91 | DWARF debugging info must be provided. |
---|
92 | | - - srcfile: file name of the source file of the same. Requires dwarf |
---|
| 92 | + - srcfile: file name of the source file of the samples. Requires dwarf |
---|
93 | 93 | information. |
---|
94 | 94 | - weight: Event specific weight, e.g. memory latency or transaction |
---|
95 | 95 | abort cost. This is the global weight. |
---|
96 | 96 | - local_weight: Local weight version of the weight above. |
---|
97 | 97 | - cgroup_id: ID derived from cgroup namespace device and inode numbers. |
---|
| 98 | + - cgroup: cgroup pathname in the cgroupfs. |
---|
98 | 99 | - transaction: Transaction abort flags. |
---|
99 | 100 | - overhead: Overhead percentage of sample |
---|
100 | 101 | - overhead_sys: Overhead percentage of sample running in system mode |
---|
.. | .. |
---|
105 | 106 | guest machine |
---|
106 | 107 | - sample: Number of sample |
---|
107 | 108 | - period: Raw number of event count of sample |
---|
| 109 | + - time: Separate the samples by time stamp with the resolution specified by |
---|
| 110 | + --time-quantum (default 100ms). Specify it together with the overhead key, and before it. |
---|
108 | 111 | |
---|
109 | 112 | By default, comm, dso and symbol keys are used. |
---|
110 | 113 | (i.e. --sort comm,dso,symbol) |
---|
.. | .. |
---|
125 | 128 | |
---|
126 | 129 | And default sort keys are changed to comm, dso_from, symbol_from, dso_to |
---|
127 | 130 | and symbol_to, see '--branch-stack'. |
---|
| 131 | + |
---|
| 132 | + When the sort key symbol is specified, columns "IPC" and "IPC Coverage" |
---|
| 133 | + are enabled automatically. Column "IPC" reports the average IPC per function |
---|
| 134 | + and column "IPC Coverage" reports the percentage of instructions with |
---|
| 135 | + sampled IPC in this function. IPC means Instructions Per Cycle. If it's low, |
---|
| 136 | + it indicates there may be a performance bottleneck when the function is |
---|
| 137 | + executed, such as a memory access bottleneck. If a function has high overhead |
---|
| 138 | + and low IPC, it's worth further analyzing it to optimize its performance. |
---|
128 | 139 | |
---|
129 | 140 | If the --mem-mode option is used, the following sort keys are also available |
---|
130 | 141 | (incompatible with --branch-stack): |
---|
.. | .. |
---|
244 | 255 | Usually more convenient to use --branch-history for this. |
---|
245 | 256 | |
---|
246 | 257 | value can be: |
---|
247 | | - - percent: diplay overhead percent (default) |
---|
| 258 | + - percent: display overhead percent (default) |
---|
248 | 259 | - period: display event period |
---|
249 | 260 | - count: display event count |
---|
250 | 261 | |
---|
.. | .. |
---|
357 | 368 | --objdump=<path>:: |
---|
358 | 369 | Path to objdump binary. |
---|
359 | 370 | |
---|
| 371 | +--prefix=PREFIX:: |
---|
| 372 | +--prefix-strip=N:: |
---|
| 373 | + Remove first N entries from source file path names in executables |
---|
| 374 | + and add PREFIX. This allows displaying source code compiled on systems |
---|
| 375 | + with a different file system layout. |
---|
| 376 | + |
---|
360 | 377 | --group:: |
---|
361 | 378 | Show event group information together. It forces group output also |
---|
362 | 379 | if there are no groups defined in data file. |
---|
| 380 | + |
---|
| 381 | +--group-sort-idx:: |
---|
| 382 | + Sort the output by the event at the index n in group. If n is invalid, |
---|
| 383 | + sort by the first event. It can support multiple groups with different |
---|
| 384 | + numbers of events. WARNING: This should be used on grouped events. |
---|
363 | 385 | |
---|
364 | 386 | --demangle:: |
---|
365 | 387 | Demangle symbol names to human readable form. It's enabled by default, |
---|
.. | .. |
---|
402 | 424 | |
---|
403 | 425 | --time:: |
---|
404 | 426 | Only analyze samples within given time window: <start>,<stop>. Times |
---|
405 | | - have the format seconds.microseconds. If start is not given (i.e., time |
---|
| 427 | + have the format seconds.nanoseconds. If start is not given (i.e. time |
---|
406 | 428 | string is ',x.y') then analysis starts at the beginning of the file. If |
---|
407 | | - stop time is not given (i.e, time string is 'x.y,') then analysis goes |
---|
408 | | - to end of file. |
---|
| 429 | + stop time is not given (i.e. time string is 'x.y,') then analysis goes |
---|
| 430 | + to end of file. Multiple ranges can be separated by spaces, which |
---|
| 431 | + requires the argument to be quoted, e.g. --time "1234.567,1234.789 1235," |
---|
409 | 432 | |
---|
410 | | - Also support time percent with multiple time range. Time string is |
---|
| 433 | + Also support time percent with multiple time ranges. Time string is |
---|
411 | 434 | 'a%/n,b%/m,...' or 'a%-b%,c%-d%,...'. |
---|
412 | 435 | |
---|
413 | 436 | For example: |
---|
.. | .. |
---|
426 | 449 | Select from 0% to 10% and 30% to 40% slices: |
---|
427 | 450 | |
---|
428 | 451 | perf report --time 0%-10%,30%-40% |
---|
| 452 | + |
---|
| 453 | +--switch-on EVENT_NAME:: |
---|
| 454 | + Only consider events after this event is found. |
---|
| 455 | + |
---|
| 456 | + This can be useful for measuring a workload only after some initialization |
---|
| 457 | + phase is over, e.g. by inserting a perf probe at that point and then using this |
---|
| 458 | + option with that probe. |
---|
| 459 | + |
---|
| 460 | +--switch-off EVENT_NAME:: |
---|
| 461 | + Stop considering events after this event is found. |
---|
| 462 | + |
---|
| 463 | +--show-on-off-events:: |
---|
| 464 | + Show the --switch-on/off events too. This has no effect in 'perf report' now |
---|
| 465 | + but probably we'll make the default not to show the switch-on/off events |
---|
| 466 | + on the --group mode and if there is only one event besides the off/on ones, |
---|
| 467 | + go straight to the histogram browser, just like 'perf report' with no events |
---|
| 468 | + explicitly specified does. |
---|
429 | 469 | |
---|
430 | 470 | --itrace:: |
---|
431 | 471 | Options for decoding instruction tracing data. The options are: |
---|
.. | .. |
---|
448 | 488 | This option extends the perf report to show reference callgraphs, |
---|
449 | 489 | which are collected by the reference event, in events without callgraphs. |
---|
450 | 490 | |
---|
| 491 | +--stitch-lbr:: |
---|
| 492 | + Show callgraph with stitched LBRs, which may give a more complete |
---|
| 493 | + callgraph. The perf.data file must have been obtained using |
---|
| 494 | + perf record --call-graph lbr. |
---|
| 495 | + Disabled by default. In common cases with call stack overflows, |
---|
| 496 | + it can recreate better call stacks than the default lbr call stack |
---|
| 497 | + output. But this approach is not foolproof. There can be cases |
---|
| 498 | + where it creates incorrect call stacks from incorrect matches. |
---|
| 499 | + The known limitations include exception handling such as |
---|
| 500 | + setjmp/longjmp, where calls and returns will not match. |
---|
| 501 | + |
---|
451 | 502 | --socket-filter:: |
---|
452 | 503 | Only report the samples on the processor socket that match with this filter |
---|
| 504 | + |
---|
| 505 | +--samples=N:: |
---|
| 506 | + Save N individual samples for each histogram entry to show context in the perf |
---|
| 507 | + report TUI browser. |
---|
453 | 508 | |
---|
454 | 509 | --raw-trace:: |
---|
455 | 510 | When displaying traceevent output, do not use print fmt or plugins. |
---|
.. | .. |
---|
469 | 524 | Please note that not all mmaps are stored, options affecting which ones |
---|
470 | 525 | are stored include 'perf record --data', for instance. |
---|
471 | 526 | |
---|
| 527 | +--ns:: |
---|
| 528 | + Show time stamps in nanoseconds. |
---|
| 529 | + |
---|
472 | 530 | --stats:: |
---|
473 | 531 | Display overall events statistics without any further processing. |
---|
474 | 532 | (like the one at the end of the perf report -D command) |
---|
.. | .. |
---|
486 | 544 | The period/hits keywords set the base the percentage is computed |
---|
487 | 545 | on - the samples period or the number of samples (hits). |
---|
488 | 546 | |
---|
| 547 | +--time-quantum:: |
---|
| 548 | + Configure time quantum for time sort key. Default 100ms. |
---|
| 549 | + Accepts s, us, ms, ns units. |
---|
| 550 | + |
---|
| 551 | +--total-cycles:: |
---|
| 552 | + When --total-cycles is specified, all blocks are sorted by |
---|
| 553 | + 'Sampled Cycles%'. This is useful to concentrate on the globally hottest |
---|
| 554 | + blocks. In the output, there are some new columns: |
---|
| 555 | + |
---|
| 556 | + 'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles |
---|
| 557 | + 'Sampled Cycles' - block sampled cycles aggregation |
---|
| 558 | + 'Avg Cycles%' - block average sampled cycles / sum of total block average |
---|
| 559 | + sampled cycles |
---|
| 560 | + 'Avg Cycles' - block average sampled cycles |
---|
| 561 | + |
---|
489 | 562 | include::callchain-overhead-calculation.txt[] |
---|
490 | 563 | |
---|
491 | 564 | SEE ALSO |
---|
492 | 565 | -------- |
---|
493 | | -linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1] |
---|
| 566 | +linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1], |
---|
| 567 | +linkperf:perf-intel-pt[1] |
---|
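The four columns described for --total-cycles are simple ratios, and a small sketch may make the definitions easier to check by hand. The Python below is only an illustration of the arithmetic as worded in the option text. It is not perf's implementation: the function name `cycle_columns`, the input layout (a mapping from block name to a list of sampled cycle counts) and the assumption that a block's average is the mean over its samples are all hypothetical.

```python
def cycle_columns(blocks):
    """Compute the --total-cycles style columns for each block.

    blocks: hypothetical mapping of block name -> list of sampled cycle
    counts for that block (assumed layout, not perf's data format).
    """
    # 'Sampled Cycles': block sampled cycles aggregation
    sampled = {name: sum(c) for name, c in blocks.items()}
    # 'Avg Cycles': block average sampled cycles (assumed: mean per sample)
    avg = {name: sum(c) / len(c) for name, c in blocks.items()}

    total_sampled = sum(sampled.values())  # total sampled cycles
    total_avg = sum(avg.values())          # sum of all block averages

    rows = []
    for name in blocks:
        rows.append({
            "block": name,
            "sampled_cycles_pct": 100.0 * sampled[name] / total_sampled,
            "sampled_cycles": sampled[name],
            "avg_cycles_pct": 100.0 * avg[name] / total_avg,
            "avg_cycles": avg[name],
        })

    # Hottest blocks first, i.e. sorted by 'Sampled Cycles%'
    rows.sort(key=lambda r: r["sampled_cycles_pct"], reverse=True)
    return rows


# Made-up sample data: block "a" was sampled twice, block "b" once.
rows = cycle_columns({"a": [10, 30], "b": [60]})
for row in rows:
    print(row)
```

With this made-up input, block "b" sorts first because it accounts for 60 of the 100 sampled cycles, even though it was sampled only once.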