~hc/RK356X_SDK_RELEASE.git

..	..	@@ -47,6 +47,10 @@
47	47	param1 and param2 are defined as formats for the PMU in
48	48	/sys/bus/event_source/devices/<pmu>/format/*
49	49
	50	+ 'percore' is an event qualifier that sums up the event counts for both
	51	+ hardware threads in a core. For example:
	52	+ perf stat -A -a -e cpu/event,percore=1/,otherevent ...
	53	+
50	54	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
51	55	where M, N, K are numbers (in decimal, hex, octal format).
52	56	Acceptable values for each of 'config', 'config1' and 'config2'
..	..	@@ -54,7 +58,7 @@
54	58	/sys/bus/event_source/devices/<pmu>/format/*
55	59
56	60	Note that the last two syntaxes support prefix and glob matching in
57		- the PMU name to simplify creation of events accross multiple instances
	61	+ the PMU name to simplify creation of events across multiple instances
58	62	of the same type of PMU in large systems (e.g. memory controller PMUs).
59	63	Multiple PMU instances are typical for uncore PMUs, so the prefix
60	64	'uncore_' is also ignored when performing this match.
..	..	@@ -71,14 +75,23 @@
71	75	--tid=<tid>::
72	76	stat events on existing thread id (comma separated list)
73	77
	78	+ifdef::HAVE_LIBPFM[]
	79	+--pfm-events events::
	80	+Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
	81	+including support for event filters. For example '--pfm-events
	82	+inst_retired:any_p:u:c=1:i'. More than one event can be passed to the
	83	+option using the comma separator. Hardware events and generic hardware
	84	+events cannot be mixed together. The latter must be used with the -e
	85	+option. The -e option and this one can be mixed and matched. Events
	86	+can be grouped using the {} notation.
	87	+endif::HAVE_LIBPFM[]
74	88
75	89	-a::
76	90	--all-cpus::
77	91	system-wide collection from all CPUs (default if no target is specified)
78	92
79		--c::
80		---scale::
81		- scale/normalize counter values
	93	+--no-scale::
	94	+ Don't scale/normalize counter values
82	95
83	96	-d::
84	97	--detailed::
..	..	@@ -94,7 +107,9 @@
94	107
95	108	-B::
96	109	--big-num::
97		- print large numbers with thousands' separators according to locale
	110	+ print large numbers with thousands' separators according to locale.
	111	+ Enabled by default. Use "--no-big-num" to disable.
	112	+ Default setting can be changed with "perf config stat.big-num=false".
98	113
99	114	-C::
100	115	--cpu=::
..	..	@@ -151,6 +166,11 @@
151	166	If wanting to monitor, say, 'cycles' for a cgroup and also for system wide, this
152	167	command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
153	168
	169	+--for-each-cgroup name::
	170	+Expand event list for each cgroup in "name" (allow multiple cgroups separated
	171	+by comma). This has the same effect as repeating the -e and -G options for
	172	+each event x name. This option cannot be used with the -G/--cgroup option.
	173	+
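As a rough illustration of the equivalence described for --for-each-cgroup, the sketch below prints the -e/-G arguments that the option would stand in for. The helper name expand_for_each_cgroup is hypothetical and not part of perf, and it assumes that -G expects one cgroup name per event, in order. It only builds a string and never invokes perf.

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): print the repeated -e/-G
# arguments that "--for-each-cgroup" is described as replacing.
# $1 = comma-separated event list, $2 = comma-separated cgroup list.
# The body runs in a subshell so the IFS change stays local.
expand_for_each_cgroup() (
    events=$1
    cgroups=$2
    out=""
    IFS=,
    for c in $cgroups; do
        # Assumption: -G takes one cgroup name per event,
        # so repeat the cgroup name once for every event.
        glist=""
        for e in $events; do
            glist="${glist:+$glist,}$c"
        done
        out="${out:+$out }-e $events -G $glist"
    done
    printf '%s\n' "$out"
)

expand_for_each_cgroup cycles,instructions A,B
# prints: -e cycles,instructions -G A,A -e cycles,instructions -G B,B
```
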
154	174	-o file::
155	175	--output file::
156	176	Print the output into the designated file.
..	..	@@ -165,6 +185,47 @@
165	185	3>results perf stat --log-fd 3 -- $cmd
166	186	3>>results perf stat --log-fd 3 --append -- $cmd
167	187
	188	+--control=fifo:ctl-fifo[,ack-fifo]::
	189	+--control=fd:ctl-fd[,ack-fd]::
	190	+ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
	191	+Listen on the ctl-fd descriptor for commands that control measurement ('enable': enable events,
	192	+'disable': disable events). Measurements can be started with events disabled using the
	193	+--delay=-1 option. Optionally send control command completion ('ack\n') to the ack-fd descriptor
	194	+to synchronize with the controlling process. Example of a bash shell script to enable and
	195	+disable events during measurements:
	196	+
	197	+ #!/bin/bash
	198	+
	199	+ ctl_dir=/tmp/
	200	+
	201	+ ctl_fifo=${ctl_dir}perf_ctl.fifo
	202	+ test -p ${ctl_fifo} && unlink ${ctl_fifo}
	203	+ mkfifo ${ctl_fifo}
	204	+ exec {ctl_fd}<>${ctl_fifo}
	205	+
	206	+ ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
	207	+ test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
	208	+ mkfifo ${ctl_ack_fifo}
	209	+ exec {ctl_fd_ack}<>${ctl_ack_fifo}
	210	+
	211	+ perf stat -D -1 -e cpu-cycles -a -I 1000 \
	212	+ --control fd:${ctl_fd},${ctl_fd_ack} \
	213	+ -- sleep 30 &
	214	+ perf_pid=$!
	215	+
	216	+ sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
	217	+ sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
	218	+
	219	+ exec {ctl_fd_ack}>&-
	220	+ unlink ${ctl_ack_fifo}
	221	+
	222	+ exec {ctl_fd}>&-
	223	+ unlink ${ctl_fifo}
	224	+
	225	+ wait -n ${perf_pid}
	226	+ exit $?
	227	+
	228	+
168	229	--pre::
169	230	--post::
170	231	Pre and post measurement hooks, e.g.:
..	..	@@ -176,6 +237,8 @@
176	237	Print count deltas every N milliseconds (minimum: 1ms)
177	238	The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals. Use with caution.
178	239	example: 'perf stat -I 1000 -e cycles -a sleep 5'
	240	+
	241	+If a metric exists, it is calculated from the counts generated in this interval and printed after #.
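To make the per-interval calculation concrete, here is a minimal sketch of how a ratio metric such as instructions per cycle could be derived from the counts of a single interval. The function name interval_ipc and the numbers are hypothetical, not real measurements and not perf's internal code; the point is only that each interval's metric uses that interval's deltas, not running totals.

```sh
#!/bin/sh
# Hypothetical helper: compute instructions per cycle for one interval.
# $1 = instructions counted in the interval
# $2 = cycles counted in the interval
interval_ipc() {
    awk -v insn="$1" -v cyc="$2" 'BEGIN { printf "%.2f\n", insn / cyc }'
}

# Made-up deltas for two consecutive intervals:
interval_ipc 1500 1000   # prints 1.50
interval_ipc 800 1000    # prints 0.80
```
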
179	242
180	243	--interval-count times::
181	244	Print count deltas for fixed number of times.
..	..	@@ -201,6 +264,13 @@
201	264	socket number and the number of online processors on that socket. This is
202	265	useful to gauge the amount of aggregation.
203	266
	267	+--per-die::
	268	+Aggregate counts per processor die for system-wide mode measurements. This
	269	+is a useful mode to detect imbalance between dies. To enable this mode,
	270	+use --per-die in addition to -a (system-wide). The output includes the
	271	+die number and the number of online processors on that die. This is
	272	+useful to gauge the amount of aggregation.
	273	+
204	274	--per-core::
205	275	Aggregate counts per physical processor for system-wide mode measurements. This
206	276	is a useful mode to detect imbalance between physical cores. To enable this mode,
..	..	@@ -211,15 +281,40 @@
211	281	Aggregate counts per monitored threads, when monitoring threads (-t option)
212	282	or processes (-p option).
213	283
	284	+--per-node::
	285	+Aggregate counts per NUMA node for system-wide mode measurements. This
	286	+is a useful mode to detect imbalance between NUMA nodes. To enable this
	287	+mode, use --per-node in addition to -a (system-wide).
	288	+
214	289	-D msecs::
215	290	--delay msecs::
216		-After starting the program, wait msecs before measuring. This is useful to
217		-filter out the startup phase of the program, which is often very different.
	291	+After starting the program, wait msecs before measuring (-1: start with events
	292	+disabled). This is useful to filter out the startup phase of the program,
	293	+which is often very different.
218	294
219	295	-T::
220	296	--transaction::
221	297
222	298	Print statistics of transactional execution if supported.
	299	+
	300	+--metric-no-group::
	301	+By default, events to compute a metric are placed in weak groups. The
	302	+group tries to enforce scheduling all or none of the events. The
	303	+--metric-no-group option places events outside of groups and may
	304	+increase the chance of the event being scheduled - leading to more
	305	+accuracy. However, as events may not be scheduled together, accuracy
	306	+for metrics like instructions per cycle can be lower - as both events
	307	+may no longer be measured at the same time.
	308	+
	309	+--metric-no-merge::
	310	+By default metric events in different weak groups can be shared if one
	311	+group contains all the events needed by another. In such cases one
	312	+group will be eliminated reducing event multiplexing and making it so
	313	+that certain groups of metrics sum to 100%. A downside to sharing a
	314	+group is that the group may require multiplexing and so accuracy for a
	315	+small group that need not have multiplexing is lowered. This option
	316	+forbids the event merging logic from sharing events between groups and
	317	+may be used to increase accuracy in this case.
223	318
224	319	STAT RECORD
225	320	-----------
..	..	@@ -239,6 +334,9 @@
239	334
240	335	--per-socket::
241	336	Aggregate counts per processor socket for system-wide mode measurements.
	337	+
	338	+--per-die::
	339	+Aggregate counts per processor die for system-wide mode measurements.
242	340
243	341	--per-core::
244	342	Aggregate counts per physical processor for system-wide mode measurements.
..	..	@@ -270,6 +368,11 @@
270	368	For best results it is usually a good idea to use it with interval
271	369	mode like -I 1000, as the bottleneck of workloads can change often.
272	370
	371	+This enables --metric-only, unless overridden with --no-metric-only.
	372	+
	373	+The following restrictions only apply to older Intel CPUs and Atom;
	374	+on newer CPUs (IceLake and later) TopDown can be collected for any thread:
	375	+
273	376	The top down metrics are collected per core instead of per
274	377	CPU thread. Per core mode is automatically enabled
275	378	and -a (global monitoring) is needed, requiring root rights or
..	..	@@ -280,8 +383,6 @@
280	383	echo 0 > /proc/sys/kernel/nmi_watchdog
281	384	for best results. Otherwise the bottlenecks may be inconsistent
282	385	on workload with changing phases.
283		-
284		-This enables --metric-only, unless overriden with --no-metric-only.
285	386
286	387	To interpret the results it is usually needed to know on which
287	388	CPUs the workload runs on. If needed the CPUs can be forced using
..	..	@@ -314,6 +415,24 @@
314	415
315	416	Users who wants to get the actual value can apply --no-metric-only.
316	417
	418	+--all-kernel::
	419	+Configure all used events to run in kernel space.
	420	+
	421	+--all-user::
	422	+Configure all used events to run in user space.
	423	+
	424	+--percore-show-thread::
	425	+The event modifier "percore" sums up the event counts
	426	+for all hardware threads in a core and shows the counts per core.
	427	+
	428	+This option, with event modifier "percore" enabled, also sums up the event
	429	+counts for all hardware threads in a core but shows the summed counts per
	430	+hardware thread. This is essentially a replacement for the any bit and is
	431	+convenient for post processing.
	432	+
	433	+--summary::
	434	+Print summary for interval mode (-I).
	435	+
317	436	EXAMPLES
318	437	--------
319	438