| .. | .. |
|---|
| 89 | 89 | - socket: processor socket number the task ran at the time of sample |
|---|
| 90 | 90 | - srcline: filename and line number executed at the time of sample. The |
|---|
| 91 | 91 | DWARF debugging info must be provided. |
|---|
| 92 | | - - srcfile: file name of the source file of the same. Requires dwarf |
|---|
| 92 | + - srcfile: file name of the source file of the samples. Requires dwarf |
|---|
| 93 | 93 | information. |
|---|
| 94 | 94 | - weight: Event specific weight, e.g. memory latency or transaction |
|---|
| 95 | 95 | abort cost. This is the global weight. |
|---|
| 96 | 96 | - local_weight: Local weight version of the weight above. |
|---|
| 97 | 97 | - cgroup_id: ID derived from cgroup namespace device and inode numbers. |
|---|
| 98 | + - cgroup: cgroup pathname in the cgroupfs. |
|---|
| 98 | 99 | - transaction: Transaction abort flags. |
|---|
| 99 | 100 | - overhead: Overhead percentage of sample |
|---|
| 100 | 101 | - overhead_sys: Overhead percentage of sample running in system mode |
|---|
| .. | .. |
|---|
| 105 | 106 | guest machine |
|---|
| 106 | 107 | - sample: Number of sample |
|---|
| 107 | 108 | - period: Raw number of event count of sample |
|---|
| 109 | + - time: Separate the samples by time stamp with the resolution specified by |
|---|
| 110 | + --time-quantum (default 100ms). Specify it together with the overhead key and place it before overhead. |
|---|
| 108 | 111 | |
|---|
| 109 | 112 | By default, comm, dso and symbol keys are used. |
|---|
| 110 | 113 | (i.e. --sort comm,dso,symbol) |
|---|
| .. | .. |
|---|
| 125 | 128 | |
|---|
| 126 | 129 | And default sort keys are changed to comm, dso_from, symbol_from, dso_to |
|---|
| 127 | 130 | and symbol_to, see '--branch-stack'. |
|---|
| 131 | + |
|---|
| 132 | + When the sort key symbol is specified, columns "IPC" and "IPC Coverage" |
|---|
| 133 | + are enabled automatically. Column "IPC" reports the average IPC per function |
|---|
| 134 | + and column "IPC Coverage" reports the percentage of instructions with |
|---|
| 135 | + sampled IPC in this function. IPC means Instructions Per Cycle. If it's low, |
|---|
| 136 | + it indicates there may be a performance bottleneck when the function is |
|---|
| 137 | + executed, such as a memory access bottleneck. If a function has high overhead |
|---|
| 138 | + and low IPC, it's worth further analyzing it to optimize its performance. |
|---|
| 128 | 139 | |
|---|
| 129 | 140 | If the --mem-mode option is used, the following sort keys are also available |
|---|
| 130 | 141 | (incompatible with --branch-stack): |
|---|
| .. | .. |
|---|
| 244 | 255 | Usually more convenient to use --branch-history for this. |
|---|
| 245 | 256 | |
|---|
| 246 | 257 | value can be: |
|---|
| 247 | | - - percent: diplay overhead percent (default) |
|---|
| 258 | + - percent: display overhead percent (default) |
|---|
| 248 | 259 | - period: display event period |
|---|
| 249 | 260 | - count: display event count |
|---|
| 250 | 261 | |
|---|
| .. | .. |
|---|
| 357 | 368 | --objdump=<path>:: |
|---|
| 358 | 369 | Path to objdump binary. |
|---|
| 359 | 370 | |
|---|
| 371 | +--prefix=PREFIX:: |
|---|
| 372 | +--prefix-strip=N:: |
|---|
| 373 | + Remove first N entries from source file path names in executables |
|---|
| 374 | + and add PREFIX. This allows displaying source code compiled on systems |
|---|
| 375 | + with a different file system layout. |
|---|
| 376 | + |
|---|
| 360 | 377 | --group:: |
|---|
| 361 | 378 | Show event group information together. It forces group output also |
|---|
| 362 | 379 | if there are no groups defined in data file. |
|---|
| 380 | + |
|---|
| 381 | +--group-sort-idx:: |
|---|
| 382 | + Sort the output by the event at index n in the group. If n is invalid, |
|---|
| 383 | + sort by the first event. It supports multiple groups with different |
|---|
| 384 | + numbers of events. WARNING: This should be used on grouped events. |
|---|
| 363 | 385 | |
|---|
| 364 | 386 | --demangle:: |
|---|
| 365 | 387 | Demangle symbol names to human readable form. It's enabled by default, |
|---|
| .. | .. |
|---|
| 402 | 424 | |
|---|
| 403 | 425 | --time:: |
|---|
| 404 | 426 | Only analyze samples within given time window: <start>,<stop>. Times |
|---|
| 405 | | - have the format seconds.microseconds. If start is not given (i.e., time |
|---|
| 427 | + have the format seconds.nanoseconds. If start is not given (i.e. time |
|---|
| 406 | 428 | string is ',x.y') then analysis starts at the beginning of the file. If |
|---|
| 407 | | - stop time is not given (i.e, time string is 'x.y,') then analysis goes |
|---|
| 408 | | - to end of file. |
|---|
| 429 | + stop time is not given (i.e. time string is 'x.y,') then analysis goes |
|---|
| 430 | + to end of file. Multiple ranges can be separated by spaces, which |
|---|
| 431 | + requires the argument to be quoted e.g. --time "1234.567,1234.789 1235," |
|---|
| 409 | 432 | |
|---|
| 410 | | - Also support time percent with multiple time range. Time string is |
|---|
| 433 | + Also support time percent with multiple time ranges. Time string is |
|---|
| 411 | 434 | 'a%/n,b%/m,...' or 'a%-b%,c%-d%,...'. |
|---|
| 412 | 435 | |
|---|
| 413 | 436 | For example: |
|---|
| .. | .. |
|---|
| 426 | 449 | Select from 0% to 10% and 30% to 40% slices: |
|---|
| 427 | 450 | |
|---|
| 428 | 451 | perf report --time 0%-10%,30%-40% |
|---|
| 452 | + |
|---|
| 453 | +--switch-on EVENT_NAME:: |
|---|
| 454 | + Only consider events after this event is found. |
|---|
| 455 | + |
|---|
| 456 | + This is useful for measuring a workload only after some initialization |
|---|
| 457 | + phase is over, e.g. by inserting a perf probe at that point and then using this |
|---|
| 458 | + option with that probe. |
|---|
| 459 | + |
|---|
| 460 | +--switch-off EVENT_NAME:: |
|---|
| 461 | + Stop considering events after this event is found. |
|---|
| 462 | + |
|---|
| 463 | +--show-on-off-events:: |
|---|
| 464 | + Show the --switch-on/off events too. This has no effect in 'perf report' now |
|---|
| 465 | + but probably we'll make the default not to show the switch-on/off events |
|---|
| 466 | + on the --group mode and if there is only one event besides the off/on ones, |
|---|
| 467 | + go straight to the histogram browser, just like 'perf report' with no events |
|---|
| 468 | + explicitly specified does. |
|---|
| 429 | 469 | |
|---|
| 430 | 470 | --itrace:: |
|---|
| 431 | 471 | Options for decoding instruction tracing data. The options are: |
|---|
| .. | .. |
|---|
| 448 | 488 | This option extends the perf report to show reference callgraphs, |
|---|
| 449 | 489 | which collected by reference event, in no callgraph event. |
|---|
| 450 | 490 | |
|---|
| 491 | +--stitch-lbr:: |
|---|
| 492 | + Show callgraph with stitched LBRs, which may have more complete |
|---|
| 493 | + callgraph. The perf.data file must have been obtained using |
|---|
| 494 | + perf record --call-graph lbr. |
|---|
| 495 | + Disabled by default. In common cases with call stack overflows, |
|---|
| 496 | + it can recreate better call stacks than the default lbr call stack |
|---|
| 497 | + output. But this approach is not foolproof. There can be cases |
|---|
| 498 | + where it creates incorrect call stacks from incorrect matches. |
|---|
| 499 | + The known limitations include exception handling such as |
|---|
| 500 | + setjmp/longjmp, where calls and returns will not match. |
|---|
| 501 | + |
|---|
| 451 | 502 | --socket-filter:: |
|---|
| 452 | 503 | Only report the samples on the processor socket that match with this filter |
|---|
| 504 | + |
|---|
| 505 | +--samples=N:: |
|---|
| 506 | + Save N individual samples for each histogram entry to show context in perf |
|---|
| 507 | + report tui browser. |
|---|
| 453 | 508 | |
|---|
| 454 | 509 | --raw-trace:: |
|---|
| 455 | 510 | When displaying traceevent output, do not use print fmt or plugins. |
|---|
| .. | .. |
|---|
| 469 | 524 | Please note that not all mmaps are stored, options affecting which ones |
|---|
| 470 | 525 | are include 'perf record --data', for instance. |
|---|
| 471 | 526 | |
|---|
| 527 | +--ns:: |
|---|
| 528 | + Show time stamps in nanoseconds. |
|---|
| 529 | + |
|---|
| 472 | 530 | --stats:: |
|---|
| 473 | 531 | Display overall events statistics without any further processing. |
|---|
| 474 | 532 | (like the one at the end of the perf report -D command) |
|---|
| .. | .. |
|---|
| 486 | 544 | The period/hits keywords set the base the percentage is computed |
|---|
| 487 | 545 | on - the samples period or the number of samples (hits). |
|---|
| 488 | 546 | |
|---|
| 547 | +--time-quantum:: |
|---|
| 548 | + Configure time quantum for time sort key. Default 100ms. |
|---|
| 549 | + Accepts s, us, ms, ns units. |
|---|
| 550 | + |
|---|
| 551 | +--total-cycles:: |
|---|
| 552 | + When --total-cycles is specified, it supports sorting for all blocks by |
|---|
| 553 | + 'Sampled Cycles%'. This is useful to concentrate on the globally hottest |
|---|
| 554 | + blocks. In output, there are some new columns: |
|---|
| 555 | + |
|---|
| 556 | + 'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles |
|---|
| 557 | + 'Sampled Cycles' - block sampled cycles aggregation |
|---|
| 558 | + 'Avg Cycles%' - block average sampled cycles / sum of total block average |
|---|
| 559 | + sampled cycles |
|---|
| 560 | + 'Avg Cycles' - block average sampled cycles |
|---|
| 561 | + |
|---|
| 489 | 562 | include::callchain-overhead-calculation.txt[] |
|---|
| 490 | 563 | |
|---|
| 491 | 564 | SEE ALSO |
|---|
| 492 | 565 | -------- |
|---|
| 493 | | -linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1] |
|---|
| 566 | +linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1], |
|---|
| 567 | +linkperf:perf-intel-pt[1] |
|---|
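
The new `time` sort key described in this patch separates samples by time stamp at the resolution given by --time-quantum (default 100ms). The following is a minimal Python sketch of that idea, not perf's implementation. It assumes that bucketing rounds each time stamp down to a multiple of the quantum, and the names `time_bucket` and `group_by_time` are hypothetical helpers introduced only for illustration.

```python
# Illustrative sketch only: assumes a sample's bucket is its time stamp
# rounded down to a multiple of the quantum. Not taken from perf's sources.

DEFAULT_QUANTUM_NS = 100_000_000  # 100ms, the documented default


def time_bucket(timestamp_ns, quantum_ns=DEFAULT_QUANTUM_NS):
    """Round a sample time stamp (in ns) down to the start of its quantum."""
    if quantum_ns <= 0:
        raise ValueError("quantum must be positive")
    return timestamp_ns - (timestamp_ns % quantum_ns)


def group_by_time(samples, quantum_ns=DEFAULT_QUANTUM_NS):
    """Count samples per (time bucket, symbol) pair.

    `samples` is an iterable of (timestamp_ns, symbol) tuples.
    """
    counts = {}
    for timestamp_ns, symbol in samples:
        key = (time_bucket(timestamp_ns, quantum_ns), symbol)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With the default quantum, two samples of the same symbol taken 10ns apart fall into the same row, while a sample taken 100ms later starts a new row. A smaller quantum yields more, finer-grained rows.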