path: root/tools
Age | Commit message | Author
2016-05-11 | perf scripting python: Use Py_FatalError instead of die() | Arnaldo Carvalho de Melo
It probably is equivalent, but that seems to be the "pythonic" way of dying? Anyway, one less die() in the tools/perf codebase. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Chris Phlipot <cphlipot0@gmail.com> Link: http://lkml.kernel.org/n/tip-nlzgepdv2818zs4e7faif9tu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-11 | Merge tag 'perf-core-for-mingo-20160510' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core | Ingo Molnar
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

- Recording 'dwarf' callchains do not need DWARF unwinding support (He Kuang)
- Print recently added perf_event_attr.write_backward bit flag in -vv verbose mode (Arnaldo Carvalho de Melo)
- Fix incorrect python db-export error message in 'perf script' (Chris Phlipot)
- Fix handling of zero-length symbols (Chris Phlipot)
- perf stat: Scale values by unit before metrics (Andi Kleen)

Infrastructure changes:

- Rewrite strbuf not to die(), making tools using it to check its return value instead (Masami Hiramatsu)
- Support reading from backward ring buffer, add a 'perf test' entry for it (Wang Nan)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-11 | Merge branch 'perf/urgent' into perf/core, to pick up fixes | Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-11 | perf diff: Fix duplicated output column | Namhyung Kim
The commit b97511c5bc94 ("perf tools: Add overhead/overhead_children keys defaults via string") moved initialization of column headers but failed to check the sort__mode. As 'perf diff' doesn't call perf_hpp__init(), setup_overhead() should not be called either.

Before:

  # Baseline    Delta  Children  Overhead  Shared Object        Symbol
  # ........  .......  ........  ........  ...................  .......................
  #
      28.48%  -28.47%    28.48%    28.48%  [kernel.vmlinux]     [k] intel_idle
      11.51%  -11.47%    11.51%    11.51%  libxul.so            [.] 0x0000000001a360f7
       3.49%   -3.49%     3.49%     3.49%  [kernel.vmlinux]     [k] generic_exec_single
       2.91%   -2.89%     2.91%     2.91%  libdbus-1.so.3.8.11  [.] 0x000000000000cdc2
       2.86%   -2.85%     2.86%     2.86%  libxcb.so.1.1.0      [.] 0x000000000000c890
       2.44%   -2.39%     2.44%     2.44%  [kernel.vmlinux]     [k] perf_event_aux_ctx

After:

  # Baseline    Delta  Shared Object        Symbol
  # ........  .......  ...................  .......................
  #
      28.48%  -28.47%  [kernel.vmlinux]     [k] intel_idle
      11.51%  -11.47%  libxul.so            [.] 0x0000000001a360f7
       3.49%   -3.49%  [kernel.vmlinux]     [k] generic_exec_single
       2.91%   -2.89%  libdbus-1.so.3.8.11  [.] 0x000000000000cdc2
       2.86%   -2.85%  libxcb.so.1.1.0      [.] 0x000000000000c890
       2.44%   -2.39%  [kernel.vmlinux]     [k] perf_event_aux_ctx

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: <stable@vger.kernel.org> # 4.5+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: b97511c5bc94 ("perf tools: Add overhead/overhead_children keys defaults via string")
Link: http://lkml.kernel.org/r/1462890384-12486-2-git-send-email-acme@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10 | perf tools: Remove xrealloc and ALLOC_GROW | Masami Hiramatsu
Remove unused xrealloc() and ALLOC_GROW() from libperf. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160510054801.6158.6204.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-10 | perf help: Do not use ALLOC_GROW in add_cmd_list | Masami Hiramatsu
Replace ALLOC_GROW with normal realloc code in add_cmd_list() so that it can handle errors directly. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160510054752.6158.30562.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-10 | perf pmu: Make pmu_formats_string to check return value of strbuf | Masami Hiramatsu
Make pmu_formats_string() check the return value of the strbuf APIs so that it can detect errors in them. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160510054744.6158.37810.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-10 | perf header: Make topology checkers to check return value of strbuf | Masami Hiramatsu
Make the topology checkers check the return value of the strbuf APIs so that they can detect errors in them. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160510054735.6158.98650.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-10 | perf tools: Make alias handler to check return value of strbuf | Masami Hiramatsu
Make the alias handler and sq_quote_argv() check the return value of the strbuf APIs. sq_quote_argv() itself calls die(), but this fix handles strbuf failure as a special case and returns to the caller, since the caller, handle_alias(), also has to check the return value of other strbuf APIs and those checks can be merged into one if() statement. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160510054725.6158.84597.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-10 | perf help: Make check_emacsclient_version to check strbuf APIs | Masami Hiramatsu
Make check_emacsclient_version() check the return value of the strbuf APIs so that it can handle errors in strbuf. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160510054716.6158.11755.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-10 | perf probe: Check the return value of strbuf APIs | Masami Hiramatsu
Check the return value of strbuf APIs in perf-probe related code, so that it can handle errors in strbuf. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160510054707.6158.69861.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-10 | perf tools: Rewrite strbuf not to die() | Masami Hiramatsu
Rewrite the strbuf implementation to use neither die() nor xrealloc(). Instead of calling die(), most of the API now returns an error code, or 0 on success. Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160510054658.6158.24080.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-09 | perf symbols: Fix handling of zero-length symbols. | Chris Phlipot
This change introduces a fix to symbols__find, so that it is able to find symbols of length zero (where start == end).

The current code has the following problem:

- The current implementation of symbols__find is unable to find any symbols of length zero.
- The db-export framework explicitly creates zero length symbols at locations where no symbol currently exists.

The combination of the two above behaviors results in behavior similar to the example below.

1. addr_location is created for a sample, but the symbol cannot be resolved.
2. db export creates an "unknown" symbol of length zero at that address and inserts it into the dso.
3. A new sample comes in at the same address, but symbols__find is unable to find the zero length symbol, so it is still unresolved.
4. db export sees the symbol is unresolved, and allocates a duplicate symbol, even though it already did this in step 2.

This behavior continues every time an address without symbol information is seen, which causes a very large number of these symbols to be allocated.

The effect of this fix can be observed by looking at the contents of an exported database before/after the fix (generated with scripts/python/export-to-postgresql.py).

Ex.

BEFORE THE CHANGE:

  example_db=# select count(*) from symbols;
   count
  --------
   900213
  (1 row)

  example_db=# select count(*) from symbols where symbols.name='unknown';
   count
  --------
   897355
  (1 row)

  example_db=# select count(*) from symbols where symbols.name!='unknown';
   count
  -------
    2858
  (1 row)

AFTER THE CHANGE:

  example_db=# select count(*) from symbols;
   count
  -------
   25217
  (1 row)

  example_db=# select count(*) from symbols where name='unknown';
   count
  -------
   22359
  (1 row)

  example_db=# select count(*) from symbols where name!='unknown';
   count
  -------
    2858
  (1 row)

Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1462612620-25008-1-git-send-email-cphlipot0@gmail.com
[ Moved the test to later in the rb_tree tests, as this is not the likely case ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
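The lookup problem in the commit above can be illustrated with a minimal Python sketch. This is not the perf code (symbols__find is C and walks an rb-tree); the `Symbol` tuple, the `find_symbol` function, its `match_zero_length` switch, and the addresses are all hypothetical. The sketch assumes the end address is exclusive for ordinary symbols, so a symbol with start == end can only match through the special case.

```python
from collections import namedtuple

# Hypothetical stand-in for a perf symbol: an address range plus a name.
Symbol = namedtuple("Symbol", "start end name")


def find_symbol(symbols, addr, match_zero_length=True):
    """Binary-search a list of non-overlapping symbols sorted by start."""
    lo, hi = 0, len(symbols) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        sym = symbols[mid]
        if addr < sym.start:
            hi = mid - 1
        elif addr < sym.end:
            # Ordinary symbol: start <= addr < end.
            return sym
        elif match_zero_length and sym.start == sym.end == addr:
            # Zero-length symbol: it can only match its own address.
            return sym
        else:
            lo = mid + 1
    return None


# Made-up symbol table with one zero-length "unknown" entry.
symbols = [
    Symbol(0x1000, 0x1100, "main"),
    Symbol(0x2000, 0x2000, "unknown"),
    Symbol(0x3000, 0x3040, "helper"),
]

found = find_symbol(symbols, 0x2000)
missed = find_symbol(symbols, 0x2000, match_zero_length=False)
```

With `match_zero_length=False` the lookup at 0x2000 returns `None`, which corresponds to step 3 in the list above: the caller would then create yet another "unknown" symbol at the same address.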
2016-05-09 | perf evsel: Print state of perf_event_attr.write_backward | Arnaldo Carvalho de Melo
Now we can see if it is set when using verbose mode in various tools, such as 'perf test':

  # perf test -vv back
  45: Test backward reading from ring buffer                   :
  --- start ---
  <SNIP>
  ------------------------------------------------------------
  perf_event_attr:
    type                             2
    size                             112
    config                           0x98
    { sample_period, sample_freq }   1
    sample_type                      IP|TID|TIME|CPU|PERIOD|RAW
    disabled                         1
    mmap                             1
    comm                             1
    task                             1
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    write_backward                   1
  ------------------------------------------------------------
  sys_perf_event_open: pid 20911  cpu -1  group_fd -1  flags 0x8
  <SNIP>
  ---- end ----
  Test backward reading from ring buffer: Ok
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-kxv05kv9qwl5of7rzfeiiwbv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-09 | perf tests: Add test to check backward ring buffer | Wang Nan
This test checks reading from the backward ring buffer. Test result: # ~/perf test 'ring buffer' 45: Test backward reading from ring buffer : Ok The test case is a while loop which calls prctl(PR_SET_NAME) multiple times. Each prctl should issue 2 events: one PERF_RECORD_SAMPLE, one PERF_RECORD_COMM. The first round creates a relatively large ring buffer (256 pages), which can hold all events. Read from it and check the count of each type of event. The second round creates a small ring buffer (1 page) and makes it overwritable. Check the correctness of the buffer. Signed-off-by: Wang Nan <wangnan0@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1462758471-89706-3-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-09 | perf tools: Support reading from backward ring buffer | Wang Nan
perf_evlist__mmap_read_backward() is introduced for reading from a backward ring buffer. Since the direction for reading such a ring buffer is different from the direction in which the kernel writes to it, and since the user needs to fetch the most recent record from it, perf_evlist__mmap_read_catchup() is introduced to move the reading pointer to the end of the buffer. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1462758471-89706-2-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-09 | perf script: Fix incorrect python db-export error message | Chris Phlipot
The error message printed when attempting and failing to create the call path root incorrectly references the call return process. This change fixes the message to properly reference the failure to create the call path root. Signed-off-by: Chris Phlipot <cphlipot0@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1462612620-25008-2-git-send-email-cphlipot0@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-09 | perf stat: Scale values by unit before metrics | Andi Kleen
Scale values by unit before passing them to the metrics printing functions. This is needed for TopDown, because it needs to scale the slots correctly by pipeline width / SMTness. For existing metrics it shouldn't make any difference, as those generally use events that don't have any units. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1462489447-31832-8-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-09 | perf callchain: Recording 'dwarf' callchains do not need DWARF unwinding support | He Kuang
There is no need to check for DWARF unwinding support when using the 'dwarf' callchain record method, as this will only ask the kernel to collect stack dumps for later DWARF CFI processing, which can be done on another machine, where the support for DWARF unwinding needs to be present. Signed-off-by: He Kuang <hekuang@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1462525154-125656-2-git-send-email-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-08 | tools: bpf_jit_disasm: check for klogctl failure | Colin Ian King
klogctl can fail and return a negative length, so check for this and return NULL to avoid passing a (size_t)-1 to malloc. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 | perf trace: Move futex_op beautifier to tools/perf/trace/beauty/ | Arnaldo Carvalho de Melo
To reduce the size of builtin-trace.c. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-vb8dpy7bptkf219q5c25ulfp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-06 | perf trace: Move open_flags beautifier to tools/perf/trace/beauty/ | Arnaldo Carvalho de Melo
To reduce the size of builtin-trace.c. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-jt293541hv9od7gqw6lilioh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-06 | perf trace: Move signum beautifier to tools/perf/trace/beauty/ | Arnaldo Carvalho de Melo
To reduce the size of builtin-trace.c. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-qecqxwwtreio6eaatfv58yq5@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-06 | perf stat: Add extra output of counter values with -vv | Andi Kleen
Add debug output of raw counter values per CPU when perf stat -vv is specified, together with their cpu numbers. This is very useful to debug problems with per core counters, where we can normally only see aggregated values. v2: Make it depend on -vv, not -v Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1461787251-6702-12-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-06 | perf script: Update export-to-postgresql to support callchain export | Chris Phlipot
Update the export-to-postgresql.py to support the newly introduced callchain export.

Callchains are added into the existing call_paths table and can now be associated with samples when the "callchains" commandline option is used with the script.

Ex.:

  $ perf script -s export-to-postgresql.py example_db all callchains

Includes the following changes to enable callchain export via the python export APIs:

- Add the "callchains" commandline option, which is used to enable callchain export by setting the perf_db_export_callchains global
- Add perf_db_export_callchains checks for call_path table creation and population.
- Add call_path_id to samples_table to conform with the new API

Example usage and output using a small test app:

test_app.c:

  volatile int x = 0;
  void inc_x_loop()
  {
  	int i;
  	for(i=0; i<100000000; i++)
  		x++;
  }

  void a()
  {
  	inc_x_loop();
  }

  void b()
  {
  	inc_x_loop();
  }

  int main()
  {
  	a();
  	b();
  	return 0;
  }

Example usage:

  $ gcc -g -O0 test_app.c
  $ perf record --call-graph=dwarf ./a.out
  [ perf record: Woken up 77 times to write data ]
  [ perf record: Captured and wrote 19.373 MB perf.data (2404 samples) ]

  $ perf script -s scripts/python/export-to-postgresql.py example_db all callchains

  $ psql example_db

  example_db=#
  SELECT
    (SELECT name FROM symbols WHERE id = cps.symbol_id) as symbol,
    (SELECT name FROM symbols WHERE id =
      (SELECT symbol_id from call_paths where id = cps.parent_id))
      as parent_symbol,
    sum(period) as event_count
  FROM samples join call_paths as cps on call_path_id = cps.id
  GROUP BY cps.id,evsel_id
  ORDER BY event_count DESC
  LIMIT 5;

        symbol      |      parent_symbol       | event_count
  ------------------+--------------------------+-------------
   inc_x_loop       | a                        |   734250982
   inc_x_loop       | b                        |   731028057
   unknown          | unknown                  |     1335858
   task_tick_fair   | scheduler_tick           |     1238842
   update_wall_time | tick_do_update_jiffies64 |      650373
  (5 rows)

The above data shows total "self time" in cycles for each call path that was sampled. It is intended to demonstrate how it accounts separately for the two ways to reach the "inc_x_loop" function (via "a" and "b").

Recursive common table expressions can also be used to get cumulative time spent in a function, but that is beyond the scope of this basic example.

Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1461831551-12213-7-git-send-email-cphlipot0@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
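The commit message above mentions that recursive common table expressions can give cumulative time per function. A rough sketch of that idea follows. It uses Python's standard sqlite3 module instead of PostgreSQL so that it is self-contained, a heavily reduced schema (only the columns used by the query in the commit message), made-up rows, and the assumption that a root call path has parent_id 0. None of this is taken from export-to-postgresql.py itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE call_paths (id INTEGER PRIMARY KEY, parent_id INTEGER, symbol_id INTEGER);
CREATE TABLE samples (id INTEGER PRIMARY KEY, call_path_id INTEGER, period INTEGER);

INSERT INTO symbols VALUES (1, 'main'), (2, 'a'), (3, 'b'), (4, 'inc_x_loop');
-- Made-up call paths: main, main->a, main->b, main->a->inc_x_loop, main->b->inc_x_loop
INSERT INTO call_paths VALUES (1, 0, 1), (2, 1, 2), (3, 1, 3), (4, 2, 4), (5, 3, 4);
-- Made-up samples: (id, call_path_id, period)
INSERT INTO samples VALUES (1, 4, 700), (2, 5, 300), (3, 2, 10);
""")

# Walk from each sample's call path up to the root, so that every
# ancestor on the path is credited with the sample's period.
CUMULATIVE_SQL = """
WITH RECURSIVE ancestors(sample_id, path_id, period) AS (
    SELECT s.id, s.call_path_id, s.period FROM samples s
    UNION ALL
    SELECT a.sample_id, cp.parent_id, a.period
    FROM ancestors a JOIN call_paths cp ON cp.id = a.path_id
    WHERE cp.parent_id != 0
)
SELECT sym.name, SUM(a.period)
FROM ancestors a
JOIN call_paths cp ON cp.id = a.path_id
JOIN symbols sym ON sym.id = cp.symbol_id
GROUP BY sym.name
ORDER BY sym.name
"""

cumulative = dict(conn.execute(CUMULATIVE_SQL).fetchall())
```

In this toy data `inc_x_loop` accumulates 1000 (700 via `a` plus 300 via `b`), while `a` accumulates 710 because it also has 10 of self time. Note that with recursive functions a symbol can appear more than once on a single path and would be counted once per occurrence by this query.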
2016-05-06 | perf script: Expose usage of the callchain db export via the python api | Chris Phlipot
This change allows python scripts to be able to utilize the recent changes to the db export api allowing the export of call_paths derived from sampled callchains. These call paths are also now associated with the samples from which they were derived.

- This feature is enabled by setting "perf_db_export_callchains" to true
- When enabled, samples that have callchain information will have the callchains exported via call_path_table
- The call_path_id field is added to sample_table to enable association of samples with the corresponding callchain stored in the call paths table. A call_path_id of 0 will be exported if there is no corresponding callchain.
- When "perf_db_export_callchains" and "perf_db_export_calls" are both set to True, the call path root data structure will be shared. This prevents duplicating of data and call path ids that would result from building two separate call path trees in memory.
- The call_return_processor structure definition was relocated to the header file to make its contents visible to db-export.c. This enables the sharing of call path trees between the two features, as mentioned above.

This change is visible to python scripts using the python db export api.

The change is backwards compatible with scripts written against the previous API, assuming that the scripts model the sample_table function after the one in the export-to-postgresql.py script by allowing for additional arguments to be added in the future, i.e. using *x as the final argument of the sample_table function.

Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1461831551-12213-6-git-send-email-cphlipot0@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
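The backward-compatibility point above (using *x as the final argument) can be shown with a small sketch. The three named parameters below are hypothetical placeholders, not the real field list that perf passes to sample_table, and the two handler functions are invented for illustration only.

```python
def sample_table(sample_id, evsel_id, period, *x):
    """Hypothetical handler: the named parameters are the fields this
    script cares about; *x absorbs any fields appended by newer perf."""
    return (sample_id, evsel_id, period)


def strict_sample_table(sample_id, evsel_id, period):
    """Same handler without *x: it breaks as soon as a field is added."""
    return (sample_id, evsel_id, period)


# Call as an older perf would (three fields) and as a newer one would
# (an extra trailing field, standing in here for call_path_id).
old_style = sample_table(1, 7, 1000)
new_style = sample_table(1, 7, 1000, 42)

try:
    strict_sample_table(1, 7, 1000, 42)
    strict_failed = False
except TypeError:
    strict_failed = True
```

The handler with `*x` returns the same result for both calls, while the strict variant raises `TypeError` when the extra field arrives.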
2016-05-06 | perf script: Add call path id to exported sample in db export | Chris Phlipot
The exported sample now contains a reference to the call_path_id that represents its callchain. While callchains themselves are nice to have, being able to associate them with samples makes them much more useful, and can allow for such things as determining how much cumulative time is spent in a particular function. This information is normally possible to get from the call return processor. However, when doing normal sampling, call/return information is not available, which makes it necessary to associate samples directly with call paths. This commit includes changes to the db-export layer to make this information available for subsequent patches in this change set, but by itself, does not make any changes visible to the user. Signed-off-by: Chris Phlipot <cphlipot0@gmail.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1461831551-12213-5-git-send-email-cphlipot0@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-06 | perf script: Enable db export to output sampled callchains | Chris Phlipot
This change enables the db export api to export callchains. This is accomplished by adding callchains obtained from samples to the call_path_root structure and exporting them via the current call path export API.

While the current API does support exporting call paths, this is not supported when sampling. This commit addresses that missing feature by allowing the export of call paths when callchains are present in samples.

Summary:

- This feature is activated by initializing the call_path_root member inside the db_export structure to a non-null value.
- Callchains are resolved with thread__resolve_callchain() and then stored and exported by adding a call path under call path root.
- Symbol and DSO for each callchain node are exported via db_ids_from_al()

This commit puts in place infrastructure to be used by subsequent commits, and by itself, does not introduce any user-visible changes.

Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1461831551-12213-4-git-send-email-cphlipot0@gmail.com
[ Made adjustments suggested by Adrian Hunter, see thread via this cset's Link: tag ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-06 | perf tools: Refactor code to move call path handling out of thread-stack | Chris Phlipot
Move the call path handling code out of thread-stack.c and thread-stack.h to allow other components that are not part of thread-stack to create call paths.

Summary:

- Create call-path.c and call-path.h and add them to the build.
- Move all call path related code out of thread-stack.c and thread-stack.h and into call-path.c and call-path.h.
- A small subset of structures and functions are now visible through call-path.h, which is required for thread-stack.c to continue to compile.

This change is a prerequisite for subsequent patches in this change set and by itself contains no user-visible changes.

Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1461831551-12213-3-git-send-email-cphlipot0@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-06 | perf callchain: Fix incorrect ordering of entries | Chris Phlipot
The existing implementation of thread__resolve_callchain can, under certain circumstances, assemble callchain entries in the incorrect order.

The callchain entries are resolved incorrectly for a sample when all of the following conditions are met:

1. callchain_param.order is set to ORDER_CALLER
2. thread__resolve_callchain_sample is able to resolve callchain entries for the sample.
3. unwind__get_entries is also able to resolve callchain entries for the sample.

The fix is accomplished by reversing the order in which thread__resolve_callchain_sample and unwind__get_entries are called when callchain_param.order is set to ORDER_CALLER.

Unwind specific code from thread__resolve_callchain is also moved into a new static function to improve readability of the fix.

How to Reproduce the Existing Bug:

Modifying perf script to print call trees in the opposite order, or applying the remaining patches from this series and comparing the results output from export-to-postgresql.py, are the easiest ways to see the bug; however, it can still be seen in current builds using perf report.

Here is how I can reproduce the bug using perf report:

  # perf record --call-graph=dwarf stress -c 1 -t 5

When I run this command:

  # perf report --call-graph=flat,0,0,callee

This callchain, containing kernel (handle_irq_event, etc) and userspace samples (__libc_start_main, etc), is contained in the output, which looks correct (callee order):

  gen8_irq_handler
  handle_irq_event_percpu
  handle_irq_event
  handle_edge_irq
  handle_irq
  do_IRQ
  ret_from_intr
  __random
  rand
  0x558f2a04dded
  0x558f2a04c774
  __libc_start_main
  0x558f2a04dcd9

Now run this command using caller order:

  # perf report --call-graph=flat,0,0,caller

It is expected to see the exact reverse of the above when using caller order (with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the bottom) in the output, but it is nowhere to be found.

Instead you see this:

  ret_from_intr
  do_IRQ
  handle_irq
  handle_edge_irq
  handle_irq_event
  handle_irq_event_percpu
  gen8_irq_handler
  0x558f2a04dcd9
  __libc_start_main
  0x558f2a04c774
  0x558f2a04dded
  rand
  __random

Notice how internally the kernel symbols are reversed and the user space symbols are reversed, but the kernel symbols still appear above the user space symbols.

If this patch is applied and the command is re-run, you will see the expected output (with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the bottom):

  0x558f2a04dcd9
  __libc_start_main
  0x558f2a04c774
  0x558f2a04dded
  rand
  __random
  ret_from_intr
  do_IRQ
  handle_irq
  handle_edge_irq
  handle_irq_event
  handle_irq_event_percpu
  gen8_irq_handler

Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1461831551-12213-2-git-send-email-cphlipot0@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
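The ordering bug described in this commit can be modelled in a few lines of Python. This is a simplified model, not the perf code: the `assemble` function and its `fixed` switch are hypothetical, and the two lists simply stand in for the kernel and user space parts of the callchain that, per the commit message, come from thread__resolve_callchain_sample and unwind__get_entries.

```python
# Callee-order chains from the example above: innermost frame first.
kernel_callee = [
    "gen8_irq_handler", "handle_irq_event_percpu", "handle_irq_event",
    "handle_edge_irq", "handle_irq", "do_IRQ", "ret_from_intr",
]
user_callee = [
    "__random", "rand", "0x558f2a04dded", "0x558f2a04c774",
    "__libc_start_main", "0x558f2a04dcd9",
]


def assemble(kernel, user, caller_order, fixed=True):
    """Join the two partial chains into one callchain."""
    if not caller_order:
        # Callee order: kernel frames are innermost, so they come first.
        return kernel + user
    kernel_rev, user_rev = kernel[::-1], user[::-1]
    if fixed:
        # Caller order needs the parts swapped as well as reversed.
        return user_rev + kernel_rev
    # Old behaviour: each part reversed, but kernel still placed first.
    return kernel_rev + user_rev


callee = assemble(kernel_callee, user_callee, caller_order=False)
buggy = assemble(kernel_callee, user_callee, caller_order=True, fixed=False)
correct = assemble(kernel_callee, user_callee, caller_order=True)
```

Here `correct` is the exact reverse of `callee`, while `buggy` starts with `ret_from_intr`, matching the incorrect output quoted in the commit message.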
2016-05-06 | perf trace: Do not print raw args list for syscalls with no args | Arnaldo Carvalho de Melo
The test to check if the arg format had been read from the syscall:sys_enter_name/format file was looking at the list of non-common fields, and if that was empty, it would think it had failed to read it, because it doesn't exist, for instance, for the clone() syscall. So instead, before dumping the raw syscall args list, check IS_ERR(sc->tp_format); if that is true, then an attempt was made to read the format file and failed, in which case dump the raw arg list values. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ls7pmdqb2xy9339vdburwvnk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05 | perf evlist: Rename variable in perf_mmap__read() | Wang Nan
In perf_mmap__read(), give better names to pointers. The original names 'old' and 'head' relate directly to pointers in the ring buffer control page. For a backward ring buffer, the meaning of the 'head' pointer is not 'the first byte of free space', but 'the first byte of the last record'. To reduce confusion, rename 'old' to 'start' and 'head' to 'end'. 'start' -> 'end' is the direction the records should be read from. Change parameter order. Change 'overwrite' to 'check_messup'. When reading from 'head', there is no need to check messup for a backward ring buffer. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1461723563-67451-3-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05 | perf evlist: Extract perf_mmap__read() | Wang Nan
Extract the event reader from perf_evlist__mmap_read() into perf_mmap__read(). A future commit will feed it with manually computed 'head' and 'old' pointers. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1461723563-67451-2-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05 | perf symbols: Fix kallsyms perf test on ppc64le | Naveen N. Rao
ppc64le functions have a Global Entry Point (GEP) and a Local Entry Point (LEP). While placing a probe, we always prefer the LEP since it catches function calls through both the GEP and the LEP. In order to do this, we fixup the function entry points during elf symbol table lookup to point to the LEPs. This works, but breaks 'perf test kallsyms' since the symbols loaded from the symbol table (pointing to the LEP) do not match the symbols in kallsyms. To fix this, we do not adjust all the symbols during symbol table load. Instead, we note down st_other in a newly introduced arch-specific member of perf symbol structure, and later use this to adjust the probe trace point. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: Mark Wielaard <mjw@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/6be7c2b17e370100c2f79dd444509df7929bdd3e.1460451721.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05 | perf powerpc: Fix kprobe and kretprobe handling with kallsyms on ppc64le | Naveen N. Rao
Until now, we treated probe point offsets as offsets from the LEP. However, userspace applications (objdump/readelf) always show disassembly and offsets from the function GEP. This is confusing to the user, as we end up probing at an address different from what the user expects when looking at the function disassembly with readelf/objdump. Fix this by changing how we modify the probe address with perf. If only the function name is provided, we assume the user needs the LEP. Otherwise, if an offset is specified, we assume that the user knows the exact address to probe based on the function disassembly, and so we just place the probe at that offset from the GEP. Finally, kretprobe was also broken with kallsyms as we were trying to specify an offset. This patch also fixes that issue. Reported-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: Mark Wielaard <mjw@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/75df860aad8216bf4b9bcd10c6351ecc0e3dee54.1460451721.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
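The rule described above (name only means LEP, explicit offset means GEP plus offset) can be sketched as follows. The macros repeat the ELFv2 st_other decoding as defined in glibc's <elf.h> so that the example builds on any host; the function `probe_address` and its parameters are hypothetical and only illustrate the decision, they are not taken from the patch.

```c
#include <stdint.h>

/* st_other decoding as in glibc's <elf.h>, repeated here so the sketch
 * does not depend on the host's headers. */
#define STO_PPC64_LOCAL_BIT   5
#define STO_PPC64_LOCAL_MASK  (7 << STO_PPC64_LOCAL_BIT)
#define PPC64_LOCAL_ENTRY_OFFSET(other) \
    (((1 << (((other) & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT)) >> 2) << 2)

/*
 * Hypothetical helper: pick the address to probe.
 *   gep      - address of the function's Global Entry Point
 *   st_other - st_other byte noted down from the ELF symbol
 *   offset   - offset requested by the user, 0 if only a name was given
 */
static uint64_t probe_address(uint64_t gep, unsigned char st_other,
                              uint64_t offset)
{
    if (offset == 0)   /* name only: probe at the LEP */
        return gep + PPC64_LOCAL_ENTRY_OFFSET(st_other);
    return gep + offset;   /* explicit offset: counted from the GEP */
}
```

For a symbol whose st_other encodes an 8-byte local entry offset (0x60), a name-only probe on a function at 0x1000 lands at 0x1008, while a request for offset 4 lands at 0x1004, matching what objdump shows.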
2016-05-05perf hists: Move sort__has_comm into struct perf_hpp_listJiri Olsa
Now that sort dimensions are private to struct hists, the dimension booleans need to be hists-specific as well. Move sort__has_comm into struct perf_hpp_list. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1462276488-26683-8-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf hists: Move sort__has_thread into struct perf_hpp_listJiri Olsa
Now that sort dimensions are private to struct hists, the dimension booleans need to be hists-specific as well. Move sort__has_thread into struct perf_hpp_list. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1462276488-26683-7-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf hists: Move sort__has_socket into struct perf_hpp_listJiri Olsa
Now that sort dimensions are private to struct hists, the dimension booleans need to be hists-specific as well. Move sort__has_socket into struct perf_hpp_list. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1462276488-26683-6-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf hists: Move sort__has_dso into struct perf_hpp_listJiri Olsa
Now that sort dimensions are private to struct hists, the dimension booleans need to be hists-specific as well. Move sort__has_dso into struct perf_hpp_list. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1462276488-26683-5-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf hists: Move sort__has_sym into struct perf_hpp_listJiri Olsa
Now that sort dimensions are private to struct hists, the dimension booleans need to be hists-specific as well. Move sort__has_sym into struct perf_hpp_list. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1462276488-26683-4-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf hists: Move sort__has_parent into struct perf_hpp_listJiri Olsa
Now that sort dimensions are private to struct hists, the dimension booleans need to be hists-specific as well. Move sort__has_parent into struct perf_hpp_list. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1462276488-26683-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf hists: Move sort__need_collapse into struct perf_hpp_listJiri Olsa
Now that sort dimensions are private to struct hists, the dimension booleans need to be hists-specific as well. Move sort__need_collapse into struct perf_hpp_list. Add the hists__has macro to easily access this information for a struct hists object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1462276488-26683-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
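To make the idea of per-hists flags concrete, here is a minimal sketch of what moving the booleans into struct perf_hpp_list and reading them through an accessor macro could look like. The struct layouts are simplified stand-ins, the field names are assumptions derived from the sort__has_* names in this series, and the helper `hists__needs_collapse` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in: one flag per former sort__has_* / sort__need_collapse
 * global. Field names are assumptions for this sketch. */
struct perf_hpp_list {
    bool need_collapse;
    bool parent;
    bool sym;
    bool dso;
    bool socket;
    bool thread;
    bool comm;
};

/* Simplified stand-in: each hists object points at its own list. */
struct hists {
    struct perf_hpp_list *hpp_list;
};

/* Accessor in the spirit of hists__has: read a flag of the list attached
 * to one particular hists object instead of consulting a global. */
#define hists__has(h, field) ((h)->hpp_list->field)

/* Hypothetical convenience wrapper showing the macro in use. */
static bool hists__needs_collapse(const struct hists *h)
{
    assert(h != NULL && h->hpp_list != NULL);
    return hists__has(h, need_collapse);
}
```

Two hists objects with different lists can then disagree, for example `hists__has(&a, sym)` being true while `hists__has(&b, sym)` is false, which a single global boolean could not express.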
2016-05-05perf tools powerpc: Add support for generating bpf prologueNaveen N. Rao
Generalize existing macros to serve the purpose. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Wang Nan <wangnan0@huawei.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1462461799-17518-1-git-send-email-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf trace: Do not show the runtime_ms for a thread when not collecting itArnaldo Carvalho de Melo
That field is only updated when we use the "sched:sched_stat_runtime" tracepoint, and so far that is only done when we use the '--stat' command line option. Without it we get just zeros, confusing the users.

Without this patch:

  # trace -a -s sleep 1
  <SNIP>
   qemu-system-x86 (9931), 468 events, 9.6%, 0.000 msec

     syscall     calls     total       min       avg       max stddev
                          (msec)    (msec)    (msec)    (msec)    (%)
     ---------- ------ --------- --------- --------- --------- ------
     ppoll          98   982.374     0.000    10.024    29.983 12.65%
     write          34     0.401     0.005     0.012     0.027  5.49%
     ioctl         102     0.347     0.002     0.003     0.007  3.08%

   firefox (10871), 1856 events, 38.2%, 0.000 msec

                          (msec)    (msec)    (msec)    (msec)    (%)
     ---------- ------ --------- --------- --------- --------- ------
     poll          395   934.873     0.000     2.367    17.120 11.51%
     recvmsg       395     0.988     0.001     0.003     0.021  4.20%
     read          106     0.460     0.002     0.004     0.007  3.17%
     futex          24     0.108     0.001     0.004     0.010 10.05%
     mmap            2     0.041     0.016     0.021     0.026 23.92%
     write           6     0.027     0.004     0.004     0.005  2.52%

After this patch that ', 0.000 msec' gets suppressed when --stat is not in use.

Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-p7emqrsw7900tdkg43v9l1e1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf trace: Sort syscalls stats by msecs in --summaryArnaldo Carvalho de Melo
  # trace -a -s sleep 1
  <SNIP>
   Xorg (1965), 788 events, 19.0%, 0.000 msec

     syscall            calls     total       min       avg       max stddev
                                 (msec)    (msec)    (msec)    (msec)    (%)
     --------------- -------- --------- --------- --------- --------- ------
     select                89   731.038     0.000     8.214   175.218 36.71%
     ioctl                 22     0.661     0.010     0.030     0.072 10.43%
     writev                42     0.253     0.002     0.006     0.011  5.94%
     recvmsg               60     0.185     0.001     0.003     0.009  5.90%
     setitimer             60     0.127     0.001     0.002     0.006  6.14%
     read                  52     0.102     0.001     0.002     0.005  8.55%
     rt_sigprocmask        45     0.092     0.001     0.002     0.023 23.65%
     poll                  12     0.021     0.001     0.002     0.003  7.21%
     epoll_wait            12     0.019     0.001     0.002     0.002  2.71%

   firefox (10871), 1080 events, 26.1%, 0.000 msec

     syscall            calls     total       min       avg       max stddev
                                 (msec)    (msec)    (msec)    (msec)    (%)
     --------------- -------- --------- --------- --------- --------- ------
     poll                 240   979.562     0.000     4.082    17.132 11.33%
     recvmsg              240     0.532     0.001     0.002     0.007  3.69%
     read                  60     0.303     0.003     0.005     0.029  8.50%

Suggested-by: Milian Wolff <milian.wolff@kdab.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-52kdkuyxihq0kvc0n2aalhay@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf trace: Sort summary output by number of eventsArnaldo Carvalho de Melo
  # trace -a -s sleep 1 |& grep events | tail
   gmain (1733), 34 events, 1.0%, 0.000 msec
   hexchat (9765), 46 events, 1.4%, 0.000 msec
   ssh (11109), 80 events, 2.4%, 0.000 msec
   sleep (32631), 81 events, 2.4%, 0.000 msec
   qemu-system-x86 (10021), 272 events, 8.2%, 0.000 msec
   Xorg (1965), 322 events, 9.7%, 0.000 msec
   SoftwareVsyncTh (10922), 366 events, 11.1%, 0.000 msec
   gnome-shell (2231), 446 events, 13.5%, 0.000 msec
   qemu-system-x86 (9931), 468 events, 14.1%, 0.000 msec
   firefox (10871), 1098 events, 33.2%, 0.000 msec
  [root@jouet ~]#

Suggested-by: Milian Wolff <milian.wolff@kdab.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ye4cnprhfeiq32ar4lt60dqs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf tools: Add template for generating rbtree resort classArnaldo Carvalho de Melo
Sometimes we want to sort an existing rbtree by a different key. Introduce a template for that, which only needs to be given the rbtree root and the number of entries in it. To do that, a new rbtree is created with extra space for each entry, where possibly pre-calculated keys are stored to be used in the resort process and also later, when using the newly sorted rbtree. Please check the following two changesets to see it in use for resorting stats for threads and their syscalls in 'perf trace --summary'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-9l6e1q34lmf3wwdeewstyakg@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
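The core idea (allocate one resort entry per original entry, store a pre-calculated key next to a pointer to the original, then order by that key) can be sketched without the rbtree machinery. Note the substitution: this example uses a plain array and qsort() in place of a second rbtree, and every type and function name in it is hypothetical rather than part of the template added by this commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-syscall statistics, originally ordered by id. */
struct syscall_stats {
    int id;
    unsigned long nr_calls;
    double avg_msec;
};

/* Resort entry: pointer to the original plus the extra space for the key. */
struct resort_entry {
    const struct syscall_stats *stats;
    double total_msec;   /* pre-calculated once, reused while sorting and after */
};

/* Order by total time, largest first. */
static int cmp_total_desc(const void *a, const void *b)
{
    const struct resort_entry *ea = a, *eb = b;

    if (ea->total_msec < eb->total_msec)
        return 1;
    if (ea->total_msec > eb->total_msec)
        return -1;
    return 0;
}

/*
 * Build a resorted view of 'nr' entries. Knowing the entry count up front
 * allows a single allocation. Returns NULL on allocation failure; the
 * caller frees the result.
 */
static struct resort_entry *resort_by_total(const struct syscall_stats *in,
                                            size_t nr)
{
    struct resort_entry *out;
    size_t i;

    assert(in != NULL || nr == 0);
    out = calloc(nr ? nr : 1, sizeof(*out));
    if (out == NULL)
        return NULL;
    for (i = 0; i < nr; i++) {
        out[i].stats = &in[i];
        out[i].total_msec = in[i].nr_calls * in[i].avg_msec;
    }
    qsort(out, nr, sizeof(*out), cmp_total_desc);
    return out;
}
```

The original entries are left untouched, so the existing ordering stays usable while the resorted view drives the summary output.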
2016-05-05perf machine: Introduce number of threads memberArnaldo Carvalho de Melo
To be used, for instance, for pre-allocating an rb_tree array for sorting by other keys besides the current pid one. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ja0ifkwue7ttjhbwijn6g6eu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28perf tests: Do not use sizeof on pointer typeVaishali Thakkar
Using sizeof on a pointer returns the size of the pointer itself (the word size), not the size of the object it points to, which can lead to allocating a buffer much smaller than needed. So do not use sizeof on a pointer type here. Note that this has no effect at run time because 'dsos' is a pointer to a pointer. Problem found using Coccinelle. Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1461862017-23358-1-git-send-email-vaishali.thakkar@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
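A small sketch of the pitfall, and of why it is harmless for a pointer to a pointer. The types and helper names below are made up for illustration; only the variable name 'dsos' comes from the commit message.

```c
#include <stddef.h>
#include <stdlib.h>

struct record {          /* hypothetical payload, clearly larger than a pointer */
    char name[64];
    long value;
};

struct dso;              /* opaque stand-in for the perf type */

/* Correct idiom: sizeof(*recs) is the size of one struct record. */
static struct record *alloc_records(size_t n)
{
    struct record *recs = calloc(n, sizeof(*recs));

    return recs;
}

/* Element size a buggy 'calloc(n, sizeof(recs))' would use: one pointer. */
static size_t buggy_elem_size(void)
{
    struct record *recs = NULL;

    return sizeof(recs);   /* sizeof does not evaluate its operand */
}

/* For 'struct dso **dsos' the element is itself a pointer, so
 * sizeof(*dsos) and sizeof(dsos) are the same size in practice. */
static size_t dsos_elem_size(void)
{
    struct dso **dsos = NULL;

    return sizeof(*dsos);
}
```

Writing `sizeof(*ptr)` keeps the allocation correct even if the pointed-to type changes later, which is why the idiom is preferred even where the two spellings happen to give the same number.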