hc
2024-02-20 102a0743326a03cd1a1202ceda21e175b7d3575c
kernel/tools/perf/Documentation/perf-report.txt
....@@ -89,12 +89,13 @@
8989 - socket: processor socket number the task ran at the time of sample
9090 - srcline: filename and line number executed at the time of sample. The
9191 DWARF debugging info must be provided.
92
- - srcfile: file name of the source file of the same. Requires dwarf
92
+ - srcfile: file name of the source file of the samples. Requires dwarf
9393 information.
9494 - weight: Event specific weight, e.g. memory latency or transaction
9595 abort cost. This is the global weight.
9696 - local_weight: Local weight version of the weight above.
9797 - cgroup_id: ID derived from cgroup namespace device and inode numbers.
98
+ - cgroup: cgroup pathname in the cgroupfs.
9899 - transaction: Transaction abort flags.
99100 - overhead: Overhead percentage of sample
100101 - overhead_sys: Overhead percentage of sample running in system mode
....@@ -105,6 +106,8 @@
105106 guest machine
106107 - sample: Number of sample
107108 - period: Raw number of event count of sample
109
+ - time: Separate the samples by time stamp with the resolution specified by
110
+ --time-quantum (default 100ms). Specify with overhead and before it.
108111
109112 By default, comm, dso and symbol keys are used.
110113 (i.e. --sort comm,dso,symbol)
....@@ -125,6 +128,14 @@
125128
126129 And default sort keys are changed to comm, dso_from, symbol_from, dso_to
127130 and symbol_to, see '--branch-stack'.
131
+
132
+ When the sort key symbol is specified, columns "IPC" and "IPC Coverage"
133
+ are enabled automatically. Column "IPC" reports the average IPC per function
134
+ and column "IPC Coverage" reports the percentage of instructions with
135
+ sampled IPC in this function. IPC means Instructions Per Cycle. If it's low,
136
+ it indicates there may be a performance bottleneck when the function is
137
+ executed, such as a memory access bottleneck. If a function has high overhead
138
+ and low IPC, it's worth further analyzing it to optimize its performance.
128139
129140 If the --mem-mode option is used, the following sort keys are also available
130141 (incompatible with --branch-stack):
....@@ -244,7 +255,7 @@
244255 Usually more convenient to use --branch-history for this.
245256
246257 value can be:
247
- - percent: diplay overhead percent (default)
258
+ - percent: display overhead percent (default)
248259 - period: display event period
249260 - count: display event count
250261
....@@ -357,9 +368,20 @@
357368 --objdump=<path>::
358369 Path to objdump binary.
359370
371
+--prefix=PREFIX::
372
+--prefix-strip=N::
373
+ Remove first N entries from source file path names in executables
374
+ and add PREFIX. This allows displaying source code compiled on systems
375
+ with a different file system layout.
376
+
360377 --group::
361378 Show event group information together. It forces group output also
362379 if there are no groups defined in data file.
380
+
381
+--group-sort-idx::
382
+ Sort the output by the event at the index n in group. If n is invalid,
383
+ sort by the first event. It can support multiple groups with different
384
+ numbers of events. WARNING: This should be used on grouped events.
363385
364386 --demangle::
365387 Demangle symbol names to human readable form. It's enabled by default,
....@@ -402,12 +424,13 @@
402424
403425 --time::
404426 Only analyze samples within given time window: <start>,<stop>. Times
405
- have the format seconds.microseconds. If start is not given (i.e., time
427
+ have the format seconds.nanoseconds. If start is not given (i.e. time
406428 string is ',x.y') then analysis starts at the beginning of the file. If
407
- stop time is not given (i.e, time string is 'x.y,') then analysis goes
408
- to end of file.
429
+ stop time is not given (i.e. time string is 'x.y,') then analysis goes
430
+ to end of file. Multiple ranges can be separated by spaces, which
431
+ requires the argument to be quoted e.g. --time "1234.567,1234.789 1235,"
409432
410
- Also support time percent with multiple time range. Time string is
433
+ Also support time percent with multiple time ranges. Time string is
411434 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'.
412435
413436 For example:
....@@ -426,6 +449,23 @@
426449 Select from 0% to 10% and 30% to 40% slices:
427450
428451 perf report --time 0%-10%,30%-40%
452
+
453
+--switch-on EVENT_NAME::
454
+ Only consider events after this event is found.
455
+
456
+ This may be interesting to measure a workload only after some initialization
457
+ phase is over, e.g. insert a perf probe at that point and then use this
458
+ option with that probe.
459
+
460
+--switch-off EVENT_NAME::
461
+ Stop considering events after this event is found.
462
+
463
+--show-on-off-events::
464
+ Show the --switch-on/off events too. This has no effect in 'perf report' now
465
+ but probably we'll make the default not to show the switch-on/off events
466
+ on the --group mode and if there is only one event besides the off/on ones,
467
+ go straight to the histogram browser, just like 'perf report' with no events
468
+ explicitly specified does.
429469
430470 --itrace::
431471 Options for decoding instruction tracing data. The options are:
....@@ -448,8 +488,23 @@
448488 This option extends the perf report to show reference callgraphs,
449489 which collected by reference event, in no callgraph event.
450490
491
+--stitch-lbr::
492
+ Show callgraph with stitched LBRs, which may have more complete
493
+ callgraph. The perf.data file must have been obtained using
494
+ perf record --call-graph lbr.
495
+ Disabled by default. In common cases with call stack overflows,
496
+ it can recreate better call stacks than the default lbr call stack
497
+ output. But this approach is not foolproof. There can be cases
498
+ where it creates incorrect call stacks from incorrect matches.
499
+ The known limitations include exception handling, such as
500
+ setjmp/longjmp, where calls and returns do not match.
501
+
451502 --socket-filter::
452503 Only report the samples on the processor socket that match with this filter
504
+
505
+--samples=N::
506
+ Save N individual samples for each histogram entry to show context in perf
507
+ report tui browser.
453508
454509 --raw-trace::
455510 When displaying traceevent output, do not use print fmt or plugins.
....@@ -469,6 +524,9 @@
469524 Please note that not all mmaps are stored, options affecting which ones
470525 are include 'perf record --data', for instance.
471526
527
+--ns::
528
+ Show time stamps in nanoseconds.
529
+
472530 --stats::
473531 Display overall events statistics without any further processing.
474532 (like the one at the end of the perf report -D command)
....@@ -486,8 +544,24 @@
486544 The period/hits keywords set the base the percentage is computed
487545 on - the samples period or the number of samples (hits).
488546
547
+--time-quantum::
548
+ Configure time quantum for time sort key. Default 100ms.
549
+ Accepts s, us, ms, ns units.
550
+
551
+--total-cycles::
552
+ When --total-cycles is specified, it supports sorting for all blocks by
553
+ 'Sampled Cycles%'. This is useful to concentrate on the globally hottest
554
+ blocks. In output, there are some new columns:
555
+
556
+ 'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles
557
+ 'Sampled Cycles' - block sampled cycles aggregation
558
+ 'Avg Cycles%' - block average sampled cycles / sum of total block average
559
+ sampled cycles
560
+ 'Avg Cycles' - block average sampled cycles
561
+
489562 include::callchain-overhead-calculation.txt[]
490563
491564 SEE ALSO
492565 --------
493
-linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1]
566
+linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1],
567
+linkperf:perf-intel-pt[1]