Commit message | Author | Age | Files | Lines
* perf evlist: Factor out a function to propagate maps for a single evsel | Adrian Hunter | 2015-09-15 | 1 | -22/+27
| Subsequent fixes will need a function that just propagates maps for a single evsel, so factor it out.
|
| Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| Acked-by: Jiri Olsa <jolsa@kernel.org>
| Cc: Kan Liang <kan.liang@intel.com>
| Link: http://lkml.kernel.org/r/1441699142-18905-11-git-send-email-adrian.hunter@intel.com
| [ Moved them to before perf_evlist__add() to avoid having to move it in the next patch ]
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Make create_maps() use set_maps() | Adrian Hunter | 2015-09-15 | 1 | -9/+10
| Since there is a function to set maps, perf_evlist__create_maps() should use it.
|
| Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| Acked-by: Jiri Olsa <jolsa@kernel.org>
| Cc: Kan Liang <kan.liang@intel.com>
| Link: http://lkml.kernel.org/r/1441699142-18905-10-git-send-email-adrian.hunter@intel.com
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Make set_maps() more resilient | Adrian Hunter | 2015-09-15 | 1 | -4/+15
| Make perf_evlist__set_maps() more resilient by allowing for the possibility that one or another of the maps isn't being changed and therefore should not be "put".
|
| Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| Acked-by: Jiri Olsa <jolsa@kernel.org>
| Cc: Kan Liang <kan.liang@intel.com>
| Link: http://lkml.kernel.org/r/1441699142-18905-9-git-send-email-adrian.hunter@intel.com
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
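The resilience rule above can be sketched in C. This is a minimal, self-contained model, not perf's code: `struct map`, `map_new()`, `map_put()` and `evlist_set_maps()` are hypothetical stand-ins for the refcounted cpu/thread maps and `perf_evlist__set_maps()`, assuming the setter takes over the caller's reference for any map that actually changes.

```c
#include <stdlib.h>

/* Hypothetical stand-in for perf's refcounted cpu_map / thread_map. */
struct map {
	int refcnt;
};

static struct map *map_new(void)
{
	struct map *m = malloc(sizeof(*m));

	if (m)
		m->refcnt = 1;
	return m;
}

/* Like free(), "put" accepts NULL, so callers need no NULL check. */
static void map_put(struct map *m)
{
	if (m && --m->refcnt == 0)
		free(m);
}

struct evlist {
	struct map *cpus;
	struct map *threads;
};

/*
 * Takes over the caller's reference for any map that changes. Passing the
 * map the evlist already holds means "leave it alone": it is neither put
 * nor reassigned, so it cannot be freed out from under the evlist.
 */
static void evlist_set_maps(struct evlist *evlist, struct map *cpus,
			    struct map *threads)
{
	if (cpus != evlist->cpus) {
		map_put(evlist->cpus);
		evlist->cpus = cpus;
	}
	if (threads != evlist->threads) {
		map_put(evlist->threads);
		evlist->threads = threads;
	}
}
```

Had the setter put `evlist->cpus` unconditionally, calling it with the current cpus map and a new threads map would free that cpus map while the evlist still pointed at it.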
* perf evsel: Add own_cpus member | Adrian Hunter | 2015-09-15 | 4 | -3/+8
| perf_evlist__propagate_maps() cannot easily tell if an evsel has its own cpu map. To make that simpler, keep a copy of the PMU cpu map and adjust the propagation logic accordingly.
|
| Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| Acked-by: Jiri Olsa <jolsa@kernel.org>
| Cc: Kan Liang <kan.liang@intel.com>
| Link: http://lkml.kernel.org/r/1441699142-18905-8-git-send-email-adrian.hunter@intel.com
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Fix missing thread_map__put in propagate_maps() | Adrian Hunter | 2015-09-15 | 1 | -0/+1
| perf_evlist__propagate_maps() incorrectly assumes evsel->threads is NULL before reassigning it, but it won't be NULL when perf_evlist__set_maps() is used to set different (or NULL) maps. Thus thread_map__put() must be used, which works even if evsel->threads is NULL.
|
| Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| Acked-by: Jiri Olsa <jolsa@kernel.org>
| Cc: Kan Liang <kan.liang@intel.com>
| Link: http://lkml.kernel.org/r/1441699142-18905-7-git-send-email-adrian.hunter@intel.com
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Fix splice_list_tail() not setting evlist | Adrian Hunter | 2015-09-15 | 3 | -12/+9
| Commit d49e46950772 ("perf evsel: Add a backpointer to the evlist a evsel is in") updated perf_evlist__add() but not perf_evlist__splice_list_tail(). This illustrates that it is better if perf_evlist__splice_list_tail() calls perf_evlist__add() instead of duplicating the logic, so do that. This will also simplify a subsequent fix for propagating maps.
|
| Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| Acked-by: Jiri Olsa <jolsa@kernel.org>
| Cc: Kan Liang <kan.liang@intel.com>
| Link: http://lkml.kernel.org/r/1441699142-18905-6-git-send-email-adrian.hunter@intel.com
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
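A sketch of the refactoring idea, with hypothetical simplified types: a plain singly linked list stands in for the kernel-style `list_head` that perf uses, and `evlist_add()` / `evlist_splice_list_tail()` only model `perf_evlist__add()` / `perf_evlist__splice_list_tail()`. Because splicing goes through the add helper, the backpointer is set in exactly one place.

```c
#include <stddef.h>

struct evlist;

struct evsel {
	struct evsel *next;
	struct evlist *evlist;	/* backpointer to the owning evlist */
	int idx;
};

struct evlist {
	struct evsel *head;
	struct evsel *tail;
	int nr_entries;
};

/* The single place that knows how to link an evsel into an evlist. */
static void evlist_add(struct evlist *evlist, struct evsel *evsel)
{
	evsel->next = NULL;
	evsel->evlist = evlist;
	evsel->idx = evlist->nr_entries++;
	if (evlist->tail)
		evlist->tail->next = evsel;
	else
		evlist->head = evsel;
	evlist->tail = evsel;
}

/* Splicing reuses evlist_add() instead of duplicating its bookkeeping. */
static void evlist_splice_list_tail(struct evlist *evlist, struct evsel *list)
{
	while (list) {
		struct evsel *next = list->next;	/* save: add() clears it */

		evlist_add(evlist, list);
		list = next;
	}
}
```

Any field later added to `evlist_add()` (like the backpointer) is then picked up by splicing automatically.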
* perf evlist: Add has_user_cpus member | Adrian Hunter | 2015-09-15 | 2 | -5/+7
| Subsequent patches will need to call perf_evlist__propagate_maps() without reference to a "target". Add evlist->has_user_cpus to record whether the user has specified which cpus to target (and therefore whether that list of cpus should override the default settings for a selected event, i.e. whether the cpu maps should be propagated).
|
| Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| Acked-by: Jiri Olsa <jolsa@kernel.org>
| Cc: Kan Liang <kan.liang@intel.com>
| Link: http://lkml.kernel.org/r/1441699142-18905-5-git-send-email-adrian.hunter@intel.com
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
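Taken together, the `own_cpus`, `has_user_cpus` and missing-put commits suggest a per-evsel propagation rule along these lines. This is a hedged sketch with hypothetical types (`struct map`, `map_get()`, `map_put()` and `evlist_propagate_maps_one()` are illustrative names, not perf's): the user's CPU list wins when given, otherwise an evsel keeps its PMU map, and the old threads map is always put before reassignment.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted map; get/put are both NULL-safe. */
struct map {
	int refcnt;
};

static struct map *map_new(void)
{
	struct map *m = malloc(sizeof(*m));

	if (m)
		m->refcnt = 1;
	return m;
}

static struct map *map_get(struct map *m)
{
	if (m)
		m->refcnt++;
	return m;
}

static void map_put(struct map *m)
{
	if (m && --m->refcnt == 0)
		free(m);
}

struct evlist {
	struct map *cpus;
	struct map *threads;
	bool has_user_cpus;	/* the user gave an explicit CPU list */
};

struct evsel {
	struct map *cpus;	/* maps actually used to open the event */
	struct map *threads;
	struct map *own_cpus;	/* PMU-provided cpu map, or NULL */
};

static void evlist_propagate_maps_one(struct evlist *evlist,
				      struct evsel *evsel)
{
	if (!evsel->own_cpus || evlist->has_user_cpus) {
		/* No PMU map, or the user's list overrides it. */
		map_put(evsel->cpus);
		evsel->cpus = map_get(evlist->cpus);
	} else if (evsel->cpus != evsel->own_cpus) {
		/* Fall back to the PMU's own cpu map. */
		map_put(evsel->cpus);
		evsel->cpus = map_get(evsel->own_cpus);
	}
	/* evsel->threads may already be set, so put it before reassigning. */
	map_put(evsel->threads);
	evsel->threads = map_get(evlist->threads);
}
```

Calling the function twice on the same evsel leaves every refcount balanced, which is the property the missing `thread_map__put` fix restores.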
* perf evlist: Remove redundant validation from propagate_maps() | Adrian Hunter | 2015-09-15 | 2 | -16/+10
| The validation checks that the values that were just assigned actually got assigned, i.e. the error can never happen.
|
| Subsequent patches will call this code in places where errors are not being returned. Changing those code paths to return this non-existent error is counter-productive, so just remove it.
|
| That in turn means perf_evlist__set_maps() no longer needs to return an error; callers aren't checking it anyway, so remove that too.
|
| Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| Acked-by: Jiri Olsa <jolsa@kernel.org>
| Cc: Kan Liang <kan.liang@intel.com>
| Link: http://lkml.kernel.org/r/1441699142-18905-4-git-send-email-adrian.hunter@intel.com
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Simplify set_maps() logic | Adrian Hunter | 2015-09-15 | 1 | -6/+2
| There is no need to check for NULL when "putting" evlist->maps and evlist->threads, because the "put" functions already do that.
|
| Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| Acked-by: Jiri Olsa <jolsa@kernel.org>
| Cc: Kan Liang <kan.liang@intel.com>
| Link: http://lkml.kernel.org/r/1441699142-18905-3-git-send-email-adrian.hunter@intel.com
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Simplify propagate_maps() logic | Adrian Hunter | 2015-09-15 | 1 | -3/+2
| If evsel->cpus is to be reassigned, then the current value must be "put", which works even if it is NULL. Simplify the current logic by moving the "put" next to the assignment.
|
| Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| Acked-by: Jiri Olsa <jolsa@kernel.org>
| Cc: Kan Liang <kan.liang@intel.com>
| Link: http://lkml.kernel.org/r/1441699142-18905-2-git-send-email-adrian.hunter@intel.com
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf top: Fix segfault pressing -> with no hist entries | Wang Nan | 2015-09-14 | 1 | -1/+11
| 'perf top' segfaults with the following operation:
|
|   # perf top -e page-faults -p 11400    # 11400 never generates page-fault
|
| Then, on the resulting empty interface, press the right arrow key:
|
|   # ./perf top -e page-faults -p 11400
|   perf: Segmentation fault
|   -------- backtrace --------
|   ./perf[0x535428]
|   /lib64/libc.so.6(+0x3545f)[0x7f0dd360745f]
|   ./perf[0x531d46]
|   ./perf(perf_evlist__tui_browse_hists+0x96)[0x5340d6]
|   ./perf[0x44ba2f]
|   /lib64/libpthread.so.0(+0x81d0)[0x7f0dd49dc1d0]
|   /lib64/libc.so.6(clone+0x6c)[0x7f0dd36b90dc]
|
| The bug resides in perf_evsel__hists_browse(): in the above circumstance browser->selection can be NULL, but the code after skip_annotation doesn't consider that. This patch fixes it by checking browser->selection before fetching browser->selection->map.
|
| Signed-off-by: Wang Nan <wangnan0@huawei.com>
| Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| Cc: Namhyung Kim <namhyung@kernel.org>
| Cc: Zefan Li <lizefan@huawei.com>
| Cc: pi3orama@163.com
| Link: http://lkml.kernel.org/r/1442226235-117265-1-git-send-email-wangnan0@huawei.com
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
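The shape of the fix, sketched with hypothetical, heavily simplified types (the real browser structures carry much more state and are not reproduced here): check `selection` before following it to `map`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the hists browser types. */
struct map {
	const char *dso_name;
};

struct map_symbol {
	struct map *map;
};

struct hist_browser {
	struct map_symbol *selection;	/* NULL when there are no entries */
};

/*
 * With no hist entries nothing is selected, so selection is NULL and
 * must be checked before selection->map is fetched.
 */
static struct map *browser_selected_map(const struct hist_browser *browser)
{
	return browser->selection ? browser->selection->map : NULL;
}
```

Callers then treat a NULL map as "no map-specific actions available" rather than dereferencing it.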
* Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent | Ingo Molnar | 2015-09-14 | 1 | -2/+2
|\
| | Pull perf/urgent fix from Arnaldo Carvalho de Melo:
| |
| | - The values of _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN (sysconf(3)) were being read from perf.data files in the inverse of the order in which they are written; fix it. (Arnaldo Carvalho de Melo)
| |
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * perf header: Fixup reading of HEADER_NRCPUS feature | Arnaldo Carvalho de Melo | 2015-09-13 | 1 | -2/+2
|/
|   The original patch introducing this header wrote the number of CPUs available and online in one order and then swapped those values when reading; fix it.
|
|   Before:
|
|     # perf record usleep 1
|     # perf report --header-only | grep 'nrcpus \(online\|avail\)'
|     # nrcpus online : 4
|     # nrcpus avail : 4
|     # echo 0 > /sys/devices/system/cpu/cpu2/online
|     # perf record usleep 1
|     # perf report --header-only | grep 'nrcpus \(online\|avail\)'
|     # nrcpus online : 4
|     # nrcpus avail : 3
|     # echo 0 > /sys/devices/system/cpu/cpu1/online
|     # perf record usleep 1
|     # perf report --header-only | grep 'nrcpus \(online\|avail\)'
|     # nrcpus online : 4
|     # nrcpus avail : 2
|
|   After the fix, bringing the CPUs back online:
|
|     # perf report --header-only | grep 'nrcpus \(online\|avail\)'
|     # nrcpus online : 2
|     # nrcpus avail : 4
|     # echo 1 > /sys/devices/system/cpu/cpu2/online
|     # perf record usleep 1
|     # perf report --header-only | grep 'nrcpus \(online\|avail\)'
|     # nrcpus online : 3
|     # nrcpus avail : 4
|     # echo 1 > /sys/devices/system/cpu/cpu1/online
|     # perf record usleep 1
|     # perf report --header-only | grep 'nrcpus \(online\|avail\)'
|     # nrcpus online : 4
|     # nrcpus avail : 4
|
|   Acked-by: Namhyung Kim <namhyung@kernel.org>
|   Cc: Adrian Hunter <adrian.hunter@intel.com>
|   Cc: Borislav Petkov <bp@suse.de>
|   Cc: David Ahern <dsahern@gmail.com>
|   Cc: Frederic Weisbecker <fweisbec@gmail.com>
|   Cc: Jiri Olsa <jolsa@kernel.org>
|   Cc: Kan Liang <kan.liang@intel.com>
|   Cc: Stephane Eranian <eranian@google.com>
|   Cc: Wang Nan <wangnan0@huawei.com>
|   Fixes: fbe96f29ce4b ("perf tools: Make perf.data more self-descriptive (v8)")
|   Link: http://lkml.kernel.org/r/20150911153323.GP23511@kernel.org
|   Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
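The bug class here is a reader that consumes fields in a different order than the writer produced them. A minimal sketch, assuming (as the "Before" output implies) that the writer emits the configured count before the online count; `struct nrcpus_env`, `write_nrcpus()` and `read_nrcpus()` are hypothetical and work on an in-memory buffer rather than a perf.data file.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the HEADER_NRCPUS feature data. */
struct nrcpus_env {
	uint32_t nr_cpus_avail;		/* from _SC_NPROCESSORS_CONF */
	uint32_t nr_cpus_online;	/* from _SC_NPROCESSORS_ONLN */
};

/* Writer: configured ("avail") count first, then the online count. */
static void write_nrcpus(unsigned char buf[8], const struct nrcpus_env *env)
{
	memcpy(buf, &env->nr_cpus_avail, sizeof(env->nr_cpus_avail));
	memcpy(buf + 4, &env->nr_cpus_online, sizeof(env->nr_cpus_online));
}

/* Reader: must consume the fields in exactly the order they were written. */
static void read_nrcpus(const unsigned char buf[8], struct nrcpus_env *env)
{
	memcpy(&env->nr_cpus_avail, buf, sizeof(env->nr_cpus_avail));
	memcpy(&env->nr_cpus_online, buf + 4, sizeof(env->nr_cpus_online));
}
```

Swapping the two destinations in `read_nrcpus()` reproduces the "Before" symptom: the online count stays at the configured value while "avail" drops as CPUs go offline.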
* perf/x86/intel: Fix constraint access | Peter Zijlstra | 2015-09-13 | 1 | -1/+4
| Sasha reported that we can get here with .idx==-1, and cpuc->event_constraints unallocated.
|
| Suggested-by: Stephane Eranian <eranian@google.com>
| Reported-by: Sasha Levin <sasha.levin@oracle.com>
| Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| Cc: Linus Torvalds <torvalds@linux-foundation.org>
| Cc: Peter Zijlstra <peterz@infradead.org>
| Cc: Thomas Gleixner <tglx@linutronix.de>
| Cc: <stable@vger.kernel.org>
| Fixes: b371b5943178 ("perf/x86: Fix event/group validation")
| Signed-off-by: Ingo Molnar <mingo@kernel.org>
* perf/x86/intel/bts: Set event->hw.itrace_started in pmu::start to match the new logic | Alexander Shishkin | 2015-09-11 | 1 | -0/+1
| Since event->hw.itrace_started is now set in pmu::start() to signal the beginning of the trace, do so also in the intel_bts driver.
|
| Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
| Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| Cc: Linus Torvalds <torvalds@linux-foundation.org>
| Cc: Peter Zijlstra <peterz@infradead.org>
| Cc: Thomas Gleixner <tglx@linutronix.de>
| Cc: acme@infradead.org
| Cc: adrian.hunter@intel.com
| Cc: hpa@zytor.com
| Link: http://lkml.kernel.org/r/1437140050-23363-4-git-send-email-alexander.shishkin@linux.intel.com
| Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent | Ingo Molnar | 2015-09-04 | 4 | -5/+5
|\
| | Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
| |
| | - In some cases where perf_event.fork.{pid,tid} should be used we were instead using perf_event.comm.{pid,tid}, which is not a problem for the 'pid' case, which sits in the same place in these union perf_event members, but comm.tid sits where fork.ppid is, oops.
| |
| |   These cases were considered as (potentially) problematic:
| |
| |   - 'perf script' with !sample_id_all, i.e. only on old kernels without perf_event_attr.sample_id_all.
| |
| |   - intel_pt could be affected when decoding without timestamps, as the exit event is only used to flush out data which anyway gets flushed at the end of the session.
| |
| |   - intel_bts also uses the exit event to flush data, which would probably not cause errors as it would get flushed at the end of the session instead.
| |
| |   Fix it. (Adrian Hunter)
| |
| | - Because the compiler checks are relaxed for bison-generated files, we missed updating one parse_events_add_pmu() caller when this function had its prototype changed; fix it. (Jiri Olsa)
| |
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * perf tools: Fix use of wrong event when processing exit events | Adrian Hunter | 2015-09-02 | 3 | -4/+4
| | In a couple of cases the 'comm' member of 'union event' has been used instead of the correct member ('fork') when processing exit events.
| |
| | In the cases where it has been used incorrectly, only the 'pid' and 'tid' are affected. The 'pid' value would be correct anyway because it is in the same position in 'comm' and 'fork' events, but the 'tid' would have been incorrectly assigned from 'ppid'.
| |
| | However, for exit events, the kernel puts the current task in 'ppid' and 'ptid', which is the same as the exiting task. That is, 'ppid' == 'pid', and if the task is not multi-threaded, 'pid' == 'tid', i.e. the data goes wrong only when tracing multi-threaded programs.
| |
| | It is hard to find an example of how this would produce an error in practice. There are 3 occurrences of the fix:
| |
| |   1. perf script is only affected if !sample_id_all, which only happens on old kernels.
| |
| |   2. intel_pt is only affected when decoding without timestamps and would probably still decode correctly: the exit event is only used to flush out data, which anyway gets flushed at the end of the session.
| |
| |   3. intel_bts also uses the exit event to flush data, which would probably not cause errors as it would get flushed at the end of the session instead.
| |
| | Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| | Cc: Jiri Olsa <jolsa@redhat.com>
| | Link: http://lkml.kernel.org/r/1439888825-27708-1-git-send-email-adrian.hunter@intel.com
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
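Why `comm.tid` picks up `ppid` is easiest to see from the layouts. The structs below are simplified stand-ins, assumed to match the field order of perf's comm and fork records (`pid`, `tid` versus `pid`, `ppid`, `tid`, `ptid`); `exit_event_tid()` and `exit_event_tid_wrong()` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified record layouts (assumed to mirror perf's comm/fork events). */
struct event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

struct comm_event {
	struct event_header header;
	uint32_t pid, tid;
	char comm[16];
};

struct fork_event {
	struct event_header header;
	uint32_t pid, ppid;
	uint32_t tid, ptid;
	uint64_t time;
};

union event {
	struct event_header header;
	struct comm_event comm;
	struct fork_event fork;
};

/* The bug: comm.tid overlays fork.ppid, not fork.tid. */
static uint32_t exit_event_tid_wrong(const union event *ev)
{
	return ev->comm.tid;
}

/* The fix: exit events use the fork layout, so read them through .fork. */
static uint32_t exit_event_tid(const union event *ev)
{
	return ev->fork.tid;
}
```

For a thread 101 exiting from process 100, the kernel reports `ppid` == `pid` == 100, so the wrong accessor returns 100 instead of 101; for a single-threaded task both happen to agree, which is why the bug was hard to observe.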
| * perf tools: Fix parse_events_add_pmu caller | Jiri Olsa | 2015-09-02 | 1 | -1/+1
|/
|   The following commit changed the parse_events_add_pmu interface:
|
|     36adec85a86f perf tools: Change parse_events_add_pmu interface
|
|   but forgot to change one caller. Because of the relaxed compilation rules for the bison parser, the compiler did not warn about that.
|
|   Signed-off-by: Jiri Olsa <jolsa@kernel.org>
|   Cc: Raphael Beamonte <raphael.beamonte@gmail.com>
|   Cc: David Ahern <dsahern@gmail.com>
|   Cc: Matt Fleming <matt@codeblueprint.co.uk>
|   Cc: Namhyung Kim <namhyung@kernel.org>
|   Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
|   Cc: Steven Rostedt <rostedt@goodmis.org>
|   Fixes: 36adec85a86f ("perf tools: Change parse_events_add_pmu interface")
|   Link: http://lkml.kernel.org/r/1441180605-24737-2-git-send-email-jolsa@kernel.org
|   Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent | Ingo Molnar | 2015-09-02 | 7 | -26/+31
|\
| | Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
| |
| | - Fix link time error with sample_reg_masks on non-x86. (Stephane Eranian)
| |
| | - Fix potential array out of bounds access. (Wang Nan)
| |
| | - Fix Intel PT instruction decoder dependency problem. (Wang Nan)
| |
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * perf tools: Fix link time error with sample_reg_masks on non x86 | Stephane Eranian | 2015-09-01 | 3 | -23/+27
| | This patch makes perf compile on non-x86 platforms by defining a weak symbol for sample_reg_masks[] in util/perf_regs.c.
| |
| | The patch also moves the REG() and REG_END() macros into the util/perf_regs.h header file. The macros are renamed to SMPL_REG/SMPL_REG_END to avoid clashes with other header files.
| |
| | Signed-off-by: Stephane Eranian <eranian@google.com>
| | Acked-by: Jiri Olsa <jolsa@kernel.org>
| | Cc: Adrian Hunter <adrian.hunter@intel.com>
| | Cc: Andi Kleen <ak@linux.intel.com>
| | Cc: David Ahern <dsahern@gmail.com>
| | Cc: Kan Liang <kan.liang@intel.com>
| | Cc: Namhyung Kim <namhyung@kernel.org>
| | Cc: Peter Zijlstra <peterz@infradead.org>
| | Link: http://lkml.kernel.org/r/1441099814-26783-1-git-send-email-eranian@google.com
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
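A minimal sketch of the weak-default technique, assuming GCC/Clang's `__attribute__((weak))`. The `SMPL_REG`/`SMPL_REG_END` bodies and the `count_sample_regs()` helper are illustrative, not copied from perf; an architecture that provides its own strong `sample_reg_masks[]` would replace the empty default at link time.

```c
#include <stddef.h>
#include <stdint.h>

struct sample_reg {
	const char *name;
	uint64_t mask;
};

/* Illustrative table-building macros: one entry per register bit. */
#define SMPL_REG(n, b)	{ .name = #n, .mask = 1ULL << (b) }
#define SMPL_REG_END	{ .name = NULL }

/*
 * Generic fallback: an empty, NULL-terminated table. On architectures
 * without register sampling this weak definition satisfies the linker;
 * a strong definition elsewhere takes precedence over it.
 */
__attribute__((weak)) const struct sample_reg sample_reg_masks[] = {
	SMPL_REG_END
};

/* Number of entries before the NULL-name terminator. */
static size_t count_sample_regs(const struct sample_reg *regs)
{
	size_t n = 0;

	while (regs[n].name)
		n++;
	return n;
}
```

Code that walks `sample_reg_masks[]` then simply finds no registers on such platforms instead of failing to link.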
| * perf build: Fix Intel PT instruction decoder dependency problem | Wang Nan | 2015-09-01 | 1 | -0/+1
| | I hit the following build error randomly:
| |
| |   ...
| |   /bin/sh: /path/to/kernel/buildperf/util/intel-pt-decoder/inat-tables.c: No such file or directory
| |   ...
| |   LINK /path/to/kernel/buildperf/plugin_mac80211.so
| |   LINK /path/to/kernel/buildperf/plugin_kmem.so
| |   LINK /path/to/kernel/buildperf/plugin_xen.so
| |   LINK /path/to/kernel/buildperf/plugin_hrtimer.so
| |   In file included from util/intel-pt-decoder/intel-pt-insn-decoder.c:25:0:
| |   util/intel-pt-decoder/inat.c:24:25: fatal error: inat-tables.c: No such file or directory
| |    #include "inat-tables.c"
| |                            ^
| |   compilation terminated.
| |   make[4]: *** [/path/to/kernel/buildperf/util/intel-pt-decoder/intel-pt-insn-decoder.o] Error 1
| |   make[4]: *** Waiting for unfinished jobs....
| |   LINK /path/to/kernel/buildperf/plugin_function.so
| |
| | This is caused by tools/perf/util/intel-pt-decoder/Build, which tries to generate $(OUTPUT)util/intel-pt-decoder/inat-tables.c automatically but forgets to ensure that the $(OUTPUT)util/intel-pt-decoder directory exists.
| |
| | This patch fixes it by adding $(call rule_mkdir) like other similar rules.
| |
| | Signed-off-by: Wang Nan <wangnan0@huawei.com>
| | Acked-by: Adrian Hunter <adrian.hunter@intel.com>
| | Acked-by: Jiri Olsa <jolsa@kernel.org>
| | Cc: Zefan Li <lizefan@huawei.com>
| | Cc: pi3orama@163.com
| | Link: http://lkml.kernel.org/r/1441087005-107540-1-git-send-email-wangnan0@huawei.com
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf dwarf: Fix potential array out of bounds access | Wang Nan | 2015-09-01 | 3 | -3/+3
|/
|   There is a problem in the dwarf-regs.c files for sh, sparc and x86 where it is possible to make an out-of-bounds array access when searching for register names.
|
|   This patch fixes it by replacing '<=' with '<', so that when the register number == XXX_MAX_REGS, get_arch_regstr() returns NULL.
|
|   Signed-off-by: Wang Nan <wangnan0@huawei.com>
|   Reviewed-by: Matt Fleming <matt@console-pimps.org>
|   Acked-by: Jiri Olsa <jolsa@kernel.org>
|   Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
|   Cc: David S. Miller <davem@davemloft.net>
|   Cc: Zefan Li <lizefan@huawei.com>
|   Cc: pi3orama@huawei.com
|   Link: http://lkml.kernel.org/r/1441078184-105038-1-git-send-email-wangnan0@huawei.com
|   Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
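The off-by-one in miniature: with a table of `MAX` entries, valid indices run from 0 to `MAX - 1`, so the guard must be `<`. The table contents and `ARCH_MAX_REGS` below are hypothetical, not the real sh, sparc or x86 tables.

```c
#include <stddef.h>

#define ARCH_MAX_REGS 4	/* hypothetical table size */

static const char *const arch_regstr_table[ARCH_MAX_REGS] = {
	"%ax", "%dx", "%cx", "%bx",
};

/*
 * Valid indices are 0 .. ARCH_MAX_REGS - 1. With '<=' instead of '<',
 * n == ARCH_MAX_REGS would read one element past the end of the table.
 */
static const char *get_arch_regstr(unsigned int n)
{
	return (n < ARCH_MAX_REGS) ? arch_regstr_table[n] : NULL;
}
```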
* Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent | Ingo Molnar | 2015-09-01 | 17 | -13/+193
|\
| | Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
| |
| | User visible changes:
| |
| | - Add the ability to select which registers to record, to reduce the size of perf.data files, and also allow printing the registers in 'perf script': (Stephane Eranian)
| |
| |     # perf record --intr-regs=AX,SP usleep 1
| |     [ perf record: Woken up 1 times to write data ]
| |     [ perf record: Captured and wrote 0.016 MB perf.data (8 samples) ]
| |     # perf script -F ip,sym,iregs | tail -5
| |     ffffffff8105f42a native_write_msr_safe AX:0xf SP:0xffff8802629c3c00
| |     ffffffff8105f42a native_write_msr_safe AX:0xf SP:0xffff8802629c3c00
| |     ffffffff81761ac0 _raw_spin_lock AX:0xffff8801bfcf8020 SP:0xffff8802629c3ce8
| |     ffffffff81202bf8 __vma_adjust_trans_huge AX:0x7ffc75200000 SP:0xffff8802629c3b30
| |     ffffffff8122b089 dput AX:0x101 SP:0xffff8802629c3c78
| |     #
| |
| | Infrastructure changes:
| |
| | - Open event on evsel cpus and threads. (Kan Liang)
| |
| | - Add new bpf API to get name from a BPF object. (Wang Nan)
| |
| | Build fixes:
| |
| | - Fix build on powerpc broken by pt/bts. (Adrian Hunter)
| |
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * perf record: Add ability to name registers to record | Stephane Eranian | 2015-08-31 | 7 | -5/+89
| | This patch modifies the -I/--intr-regs option to enable passing the names of the registers to sample on interrupt. Registers can be specified by their symbolic names. For instance on x86, --intr-regs=ax,si.
| |
| | The motivation is to reduce the size of the perf.data file and the overhead of sampling by only collecting the registers useful to a specific analysis. For instance, value profiling might sample only the registers used to pass arguments to functions.
| |
| | With no parameter, --intr-regs still records all possible registers based on the architecture.
| |
| | To name registers, it is necessary to use the long form of the option, i.e., --intr-regs:
| |
| |   $ perf record --intr-regs=si,di,r8,r9 .....
| |
| | To record any possible registers:
| |
| |   $ perf record -I .....
| |   $ perf report --intr-regs ...
| |
| | To display the registers, one can use perf report -D.
| |
| | To list the available registers:
| |
| |   $ perf record --intr-regs=\?
| |   available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15
| |
| | Signed-off-by: Stephane Eranian <eranian@google.com>
| | Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | Cc: Adrian Hunter <adrian.hunter@intel.com>
| | Cc: Andi Kleen <ak@linux.intel.com>
| | Cc: David Ahern <dsahern@gmail.com>
| | Cc: Jiri Olsa <jolsa@redhat.com>
| | Cc: Kan Liang <kan.liang@intel.com>
| | Cc: Namhyung Kim <namhyung@kernel.org>
| | Cc: Peter Zijlstra <peterz@infradead.org>
| | Link: http://lkml.kernel.org/r/1441039273-16260-4-git-send-email-eranian@google.com
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf/x86: Add list of register names | Stephane Eranian | 2015-08-31 | 3 | -0/+38
| | This patch adds a way to locate a register identifier (PERF_REG_X86_*) based on its name, e.g., AX. This will be used by a subsequent patch to improve the flexibility of perf record.
| |
| | Signed-off-by: Stephane Eranian <eranian@google.com>
| | Cc: Adrian Hunter <adrian.hunter@intel.com>
| | Cc: Andi Kleen <ak@linux.intel.com>
| | Cc: David Ahern <dsahern@gmail.com>
| | Cc: Jiri Olsa <jolsa@redhat.com>
| | Cc: Kan Liang <kan.liang@intel.com>
| | Cc: Namhyung Kim <namhyung@kernel.org>
| | Cc: Peter Zijlstra <peterz@infradead.org>
| | Link: http://lkml.kernel.org/r/1441039273-16260-3-git-send-email-eranian@google.com
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
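Name-based selection such as `--intr-regs=ax,si` boils down to a case-insensitive table lookup plus comma splitting. A hedged sketch: the table is a hypothetical subset whose bit positions are assumed to follow the PERF_REG_X86_* order, and `parse_regs()` is an illustration, not perf's option parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sample_reg {
	const char *name;	/* upper-case register name */
	uint64_t mask;
};

/* Hypothetical subset; bit positions assumed to match PERF_REG_X86_*. */
static const struct sample_reg x86_regs[] = {
	{ "AX", 1ULL << 0 }, { "BX", 1ULL << 1 }, { "CX", 1ULL << 2 },
	{ "DX", 1ULL << 3 }, { "SI", 1ULL << 4 }, { "DI", 1ULL << 5 },
	{ "BP", 1ULL << 6 }, { "SP", 1ULL << 7 }, { NULL, 0 },
};

/* Case-insensitive match of the len-byte token s against reg. */
static int name_matches(const char *reg, const char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (reg[i] == '\0' || toupper((unsigned char)s[i]) != reg[i])
			return 0;
	return reg[len] == '\0';
}

static uint64_t reg_mask_by_name(const char *s, size_t len)
{
	const struct sample_reg *r;

	for (r = x86_regs; r->name; r++)
		if (name_matches(r->name, s, len))
			return r->mask;
	return 0;
}

/* Parses e.g. "ax,si" into a bitmask; returns -1 on an unknown name. */
static int parse_regs(const char *list, uint64_t *mask)
{
	*mask = 0;
	while (*list) {
		size_t len = strcspn(list, ",");
		uint64_t m = reg_mask_by_name(list, len);

		if (!m)
			return -1;	/* unknown or empty register name */
		*mask |= m;
		list += len;
		if (*list == ',')
			list++;
	}
	return 0;
}
```

For example, `"AX,SP"` yields `0x81` and `"ax,si"` yields `0x11`; the resulting mask is what would be requested from the kernel for interrupt-time register sampling.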
| * perf script: Enable printing of interrupted machine state | Stephane Eranian | 2015-08-31 | 2 | -2/+31
| | This patch adds the output of the interrupted machine state (iregs) to perf script. It presents them as NAME:VALUE so this is easy to parse during post processing.
| |
| | To capture the interrupted machine state:
| |
| |   $ perf record -I ....
| |
| | To display iregs, use the -F option:
| |
| |   $ perf script -F ip,iregs
| |   40afc2 AX:0x6c5770 BX:0x1e CX:0x5f4d80a DX:0x101010101010101 SI:0x1
| |
| | Signed-off-by: Stephane Eranian <eranian@google.com>
| | Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | Cc: Adrian Hunter <adrian.hunter@intel.com>
| | Cc: Andi Kleen <ak@linux.intel.com>
| | Cc: David Ahern <dsahern@gmail.com>
| | Cc: Jiri Olsa <jolsa@redhat.com>
| | Cc: Kan Liang <kan.liang@intel.com>
| | Cc: Namhyung Kim <namhyung@kernel.org>
| | Cc: Peter Zijlstra <peterz@infradead.org>
| | Link: http://lkml.kernel.org/r/1441039273-16260-2-git-send-email-eranian@google.com
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
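A sketch of the NAME:VALUE output, assuming the sampled values arrive packed, one per set bit of the register mask in ascending bit order. `format_iregs()` and its name table are hypothetical, and the exact spacing of perf script's own output may differ.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical name table indexed by bit position (x86 order assumed). */
static const char *const reg_names[] = {
	"AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP",
};

/*
 * values[] holds one entry per set bit of mask, lowest bit first. Writes
 * space-separated "NAME:0xVALUE" pairs into buf and returns the length
 * written, or -1 if buf is too small. Mask bits beyond the table are ignored.
 */
static int format_iregs(char *buf, size_t size, uint64_t mask,
			const uint64_t *values)
{
	size_t off = 0;
	size_t bit;

	if (size == 0)
		return -1;
	buf[0] = '\0';
	for (bit = 0; bit < sizeof(reg_names) / sizeof(reg_names[0]); bit++) {
		int n;

		if (!(mask & (1ULL << bit)))
			continue;
		n = snprintf(buf + off, size - off, "%s%s:0x%" PRIx64,
			     off ? " " : "", reg_names[bit], *values++);
		if (n < 0 || (size_t)n >= size - off)
			return -1;	/* output would be truncated */
		off += (size_t)n;
	}
	return (int)off;
}
```

With mask `0x81` (AX and SP) and values `0xf` and `0xffff8802629c3c00`, this produces `AX:0xf SP:0xffff8802629c3c00`, the same pairs shown in the merge summary's example.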
| * perf evlist: Open event on evsel cpus and threads | Kan Liang | 2015-08-31 | 2 | -1/+5
| | An evsel may have different cpus and threads than the evlist it is in. Use its own cpus and threads when opening the evsel in 'perf record'.
| |
| | Signed-off-by: Kan Liang <kan.liang@intel.com>
| | Cc: Jiri Olsa <jolsa@kernel.org>
| | Link: http://lkml.kernel.org/r/1440138194-17001-1-git-send-email-kan.liang@intel.com
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * bpf tools: New API to get name from a BPF object | Wang Nan | 2015-08-31 | 3 | -5/+26
| | Before this patch there was no way to connect a loaded BPF object to its source file. This makes applying perf's '--filter' to BPF objects harder, because perf loads all programs together, but the '--filter' setting is per object.
| |
| | The API of bpf_object__open_buffer() is changed to allow passing a name. Fortunately, at this time there's only one user of it (perf test LLVM), so we change that too.
| |
| | Signed-off-by: Wang Nan <wangnan0@huawei.com>
| | Cc: Alexei Starovoitov <ast@plumgrid.com>
| | Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
| | Cc: Daniel Borkmann <daniel@iogearbox.net>
| | Cc: David Ahern <dsahern@gmail.com>
| | Cc: He Kuang <hekuang@huawei.com>
| | Cc: Jiri Olsa <jolsa@kernel.org>
| | Cc: Kaixu Xia <xiakaixu@huawei.com>
| | Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
| | Cc: Namhyung Kim <namhyung@kernel.org>
| | Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | Cc: Zefan Li <lizefan@huawei.com>
| | Cc: pi3orama@163.com
| | Link: http://lkml.kernel.org/r/1440742821-44548-2-git-send-email-wangnan0@huawei.com
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf tools: Fix build on powerpc broken by pt/bts | Adrian Hunter | 2015-08-31 | 2 | -0/+4
| | It is theoretically possible to process perf.data files that were created on x86 and contain Intel PT or Intel BTS data on any other architecture, which is why the pt/bts code can cause build errors on powerpc.
| |
| | The errors were:
| |
| |   util/intel-pt-decoder/intel-pt-insn-decoder.c: In function ‘intel_pt_insn_decoder’:
| |   util/intel-pt-decoder/intel-pt-insn-decoder.c:138:3: error: switch missing default case [-Werror=switch-default]
| |      switch (insn->immediate.nbytes) {
| |      ^
| |   cc1: all warnings being treated as errors
| |
| |   linux-acme.git/tools/perf/perf-obj/libperf.a(libperf-in.o): In function `intel_pt_synth_branch_sample':
| |   sources/linux-acme.git/tools/perf/util/intel-pt.c:871: undefined reference to `tsc_to_perf_time'
| |   linux-acme.git/tools/perf/perf-obj/libperf.a(libperf-in.o): In function `intel_pt_sample':
| |   sources/linux-acme.git/tools/perf/util/intel-pt.c:915: undefined reference to `tsc_to_perf_time'
| |   sources/linux-acme.git/tools/perf/util/intel-pt.c:962: undefined reference to `tsc_to_perf_time'
| |   linux-acme.git/tools/perf/perf-obj/libperf.a(libperf-in.o): In function `intel_pt_process_event':
| |   sources/linux-acme.git/tools/perf/util/intel-pt.c:1454: undefined reference to `perf_time_to_tsc'
| |
| | Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
| | Cc: Jiri Olsa <jolsa@redhat.com>
| | Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
| | Cc: Wang Nan <wangnan0@huawei.com>
| | Cc: Zefan Li <lizefan@huawei.com>
| | Cc: pi3orama@163.com
| | Link: http://lkml.kernel.org/r/1441046384-28663-1-git-send-email-adrian.hunter@intel.com
| | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
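The first error comes from `-Werror=switch-default`, which rejects a `switch` with no `default` label. A hedged sketch of the pattern using a hypothetical helper (not perf's instruction decoder) that reads a 1-, 2- or 4-byte little-endian immediate and reports any other size as an error.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode a little-endian immediate of 1, 2 or 4
 * bytes. The explicit default case satisfies -Wswitch-default and turns
 * an unexpected size into an error instead of leaving *out unset.
 */
static int read_immediate(const uint8_t *p, int nbytes, int32_t *out)
{
	switch (nbytes) {
	case 1:
		*out = (int8_t)p[0];
		break;
	case 2:
		*out = (int16_t)(p[0] | (p[1] << 8));
		break;
	case 4:
		*out = (int32_t)((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
				 ((uint32_t)p[2] << 16) |
				 ((uint32_t)p[3] << 24));
		break;
	default:
		return -1;	/* unsupported immediate size */
	}
	return 0;
}
```

The remaining undefined-reference errors are a separate, link-time issue and are not addressed by this pattern.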
* | Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2015-08-31 | 8 | -75/+62
|\ \
| | | Pull NOHZ updates from Ingo Molnar:
| | |
| | | "The main changes, mostly written by Frederic Weisbecker, include:
| | |
| | | - Fix some jiffies-based cputime assumptions. (No real harm, because the concerned code isn't used by full dynticks.)
| | |
| | | - Simplify jiffies <-> usecs conversions. Remove dead code.
| | |
| | | - Remove early hacks in the nohz full code that avoided messing up idle nohz internals. Now nohz integrates full and idle well, and such hacks have become needless.
| | |
| | | - Restart the nohz full tick from irq exit. (A simplification and a preparation for future optimization of the scheduler kick to nohz full.)
| | |
| | | - Code cleanups.
| | |
| | | - Tile driver isolation enhancement on top of nohz. (Chris Metcalf)"
| | |
| | | * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
| | |   nohz: Remove useless argument on tick_nohz_task_switch()
| | |   nohz: Move tick_nohz_restart_sched_tick() above its users
| | |   nohz: Restart nohz full tick from irq exit
| | |   nohz: Remove idle task special case
| | |   nohz: Prevent tilegx network driver interrupts
| | |   alpha: Fix jiffies based cputime assumption
| | |   apm32: Fix cputime == jiffies assumption
| | |   jiffies: Remove HZ > USEC_PER_SEC special case
| * \ Merge branch 'timers/nohz-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/nohz | Ingo Molnar | 2015-07-31 | 8 | -75/+62
| |\ \
| | | | Pull NOHZ updates from Frederic Weisbecker:
| | | |
| | | | - Fix some jiffies-based cputime assumptions. No real harm, because the concerned code isn't used by full dynticks.
| | | |
| | | | - Simplify jiffies <-> usecs conversions. Remove dead code.
| | | |
| | | | - Remove early hacks in the nohz full code that avoided messing up idle nohz internals. Now nohz integrates full and idle well, and such hacks have become needless.
| | | |
| | | | - Restart the nohz full tick from irq exit. A simplification and a preparation for future optimization of the scheduler kick to nohz full.
| | | |
| | | | - Simple code cleanups.
| | | |
| | | | - Tile driver isolation enhancement on top of nohz.
| | | |
| | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | nohz: Remove useless argument on tick_nohz_task_switch() | Frederic Weisbecker | 2015-07-29 | 3 | -6/+6
| | | | Leftover from early code.
| | | |
| | | | Cc: Christoph Lameter <cl@linux.com>
| | | | Cc: Ingo Molnar <mingo@kernel.org>
| | | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | | Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
| | | | Cc: Rik van Riel <riel@redhat.com>
| | | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | | Cc: Viresh Kumar <viresh.kumar@linaro.org>
| | | | Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| | * | nohz: Move tick_nohz_restart_sched_tick() above its users | Frederic Weisbecker | 2015-07-29 | 1 | -18/+16
| | | | Fix the function declaration/definition dance.
| | | |
| | | | Reviewed-by: Rik van Riel <riel@redhat.com>
| | | | Cc: Christoph Lameter <cl@linux.com>
| | | | Cc: Ingo Molnar <mingo@kernel.org>
| | | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | | Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
| | | | Cc: Rik van Riel <riel@redhat.com>
| | | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | | Cc: Viresh Kumar <viresh.kumar@linaro.org>
| | | | Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| | * | nohz: Restart nohz full tick from irq exit | Frederic Weisbecker | 2015-07-29 | 2 | -32/+10
| | | | Restart the tick when necessary from the irq exit path. It makes nohz full more flexible, simplifies the related IPIs and doesn't add significant overhead on irq exit.
| | | |
| | | | In the longer term, it will allow us to piggyback the nohz kick on the scheduler IPI instead of sending a dedicated IPI that often doubles the scheduler IPI on task wakeup. This will require more changes, though, including a careful review of resched_curr() callers to account for nohz full needs.
| | | |
| | | | Reviewed-by: Rik van Riel <riel@redhat.com>
| | | | Cc: Christoph Lameter <cl@linux.com>
| | | | Cc: Ingo Molnar <mingo@kernel.org>
| | | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | | Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
| | | | Cc: Rik van Riel <riel@redhat.com>
| | | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | | Cc: Viresh Kumar <viresh.kumar@linaro.org>
| | | | Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| | * | nohz: Remove idle task special caseFrederic Weisbecker2015-07-291-5/+3
| | | | In nohz full's early days, idle dynticks and full dynticks weren't well integrated, and we couldn't risk full dynticks calls on idle without messing up the tick idle statistics. This is why we prevented that from happening. Nowadays full dynticks and idle dynticks are better integrated and interact without known issues. So let's remove that special case. Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| | * | nohz: Prevent tilegx network driver interruptsChris Metcalf2015-07-292-1/+12
| | | | Normally the tilegx networking shim sends irqs to all the cores to distribute the load of processing incoming-packet interrupts, so that you can get to multiple Gb/s of inbound traffic. However, in nohz_full mode we don't want to interrupt the nohz_full cores by default, so we limit the set of cores we use to only the online housekeeping cores. To make client code easier to read, we introduce a new nohz_full accessor, housekeeping_cpumask(), which returns a pointer to the housekeeping_mask if nohz_full is enabled, and otherwise returns the cpu_possible_mask. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
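A minimal userspace sketch of the accessor's logic, assuming a cpumask can be modelled as a 64-bit bitmap (one bit per CPU). All `_model` names are hypothetical stand-ins for the kernel's nohz_full state, not kernel APIs, and `irq_target_cpus()` is an illustrative client in the spirit of the tilegx change, not the driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_model_t;

/* Hypothetical stand-ins for the nohz_full configuration. */
static bool nohz_full_enabled_model = false;
static cpumask_model_t housekeeping_mask_model;
static cpumask_model_t cpu_possible_mask_model;

/* Housekeeping CPUs when nohz_full is enabled, otherwise every possible CPU. */
static const cpumask_model_t *housekeeping_cpumask_model(void)
{
	if (nohz_full_enabled_model)
		return &housekeeping_mask_model;
	return &cpu_possible_mask_model;
}

/* Hypothetical client: spread packet IRQs only over online housekeeping CPUs. */
static cpumask_model_t irq_target_cpus(cpumask_model_t online)
{
	return *housekeeping_cpumask_model() & online;
}
```

The point of the accessor is that the client never branches on nohz_full itself: it always intersects with one mask, and the accessor decides which mask that is.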
| | * | alpha: Fix jiffies based cputime assumptionFrederic Weisbecker2015-07-291-4/+9
| | | | That code wrongly assumes that cputime_t wraps jiffies_t. Let's use the correct accessors/mutators. In practice there should be no harm yet because alpha currently only supports tick-based cputime accounting, which is always jiffies based. Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Christoph Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| | * | apm32: Fix cputime == jiffies assumptionFrederic Weisbecker2015-07-291-1/+1
| | | | That code wrongly assumes that cputime_t wraps jiffies_t. Let's use the correct accessors/mutators. No real harm now, as that code can't be used with full dynticks. Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| | * | jiffies: Remove HZ > USEC_PER_SEC special caseFrederic Weisbecker2015-07-292-11/+8
| | | | HZ never goes much beyond 1000 and a bit. And if we ever reach one tick per microsecond, we might have a problem. Let's stop maintaining this special case and just leave a paranoid check. Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
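A sketch of the conversion that remains once the HZ > USEC_PER_SEC branch is gone, assuming a tick rate of 250 and using a C11 `_Static_assert` as the compile-time paranoid check. The `_MODEL` constants and `jiffies_to_usecs_model()` are hypothetical; for the non-exact case this sketch uses plain 64-bit division rather than whatever scaling the kernel's own `jiffies_to_usecs()` applies.

```c
#include <stdint.h>

/* Assumed tick rate for this sketch; the kernel takes HZ from its config. */
#define HZ_MODEL 250u
#define USEC_PER_SEC_MODEL 1000000u

/* The "paranoid check": refuse to build with more than one tick per microsecond. */
_Static_assert(HZ_MODEL <= USEC_PER_SEC_MODEL, "HZ > USEC_PER_SEC is not supported");

/* Convert a jiffies count to microseconds with no HZ > USEC_PER_SEC branch. */
static uint64_t jiffies_to_usecs_model(uint64_t j)
{
	if (USEC_PER_SEC_MODEL % HZ_MODEL == 0)
		return (USEC_PER_SEC_MODEL / HZ_MODEL) * j; /* exact: whole usecs per tick */
	return (j * USEC_PER_SEC_MODEL) / HZ_MODEL;         /* general case, truncates */
}
```

With HZ_MODEL at 250, one tick is 4000 microseconds, so the exact branch is taken.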
* | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2015-08-311-0/+8
|\ \ \ \  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
| | | | | Pull scheduler fix from Ingo Molnar: "This is a leftover scheduler fix from the v4.2 cycle"
| | | | | * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
| | | | |   sched: Fix cpu_active_mask/cpu_online_mask race
| * | | | sched: Fix cpu_active_mask/cpu_online_mask raceJan H. Schönherr2015-08-251-0/+8
| | | | | There is a race condition in SMP bootup code, which may result in
| | | | |
| | | | |     WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
| | | | |     workqueue_cpu_up_callback()
| | | | | or
| | | | |     kernel BUG at kernel/smpboot.c:135!
| | | | |
| | | | | It can be triggered with a bit of luck in Linux guests running on
| | | | | busy hosts.
| | | | |
| | | | |     CPU0                            CPUn
| | | | |     ====                            ====
| | | | |     _cpu_up()
| | | | |       __cpu_up()
| | | | |                                     start_secondary()
| | | | |                                       set_cpu_online()
| | | | |                                         cpumask_set_cpu(cpu,
| | | | |                                           to_cpumask(cpu_online_bits));
| | | | |       cpu_notify(CPU_ONLINE)
| | | | |         <do stuff, see below>
| | | | |                                         cpumask_set_cpu(cpu,
| | | | |                                           to_cpumask(cpu_active_bits));
| | | | |
| | | | | During the various CPU_ONLINE callbacks CPUn is online but not
| | | | | active. Several things can go wrong at that point, depending on the
| | | | | scheduling of tasks on CPU0.
| | | | |
| | | | | Variant 1:
| | | | |
| | | | |     cpu_notify(CPU_ONLINE)
| | | | |       workqueue_cpu_up_callback()
| | | | |         rebind_workers()
| | | | |           set_cpus_allowed_ptr()
| | | | |
| | | | |   This call fails because it requires an active CPU; rebind_workers()
| | | | |   ends with a warning:
| | | | |
| | | | |     WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
| | | | |     workqueue_cpu_up_callback()
| | | | |
| | | | | Variant 2:
| | | | |
| | | | |     cpu_notify(CPU_ONLINE)
| | | | |       smpboot_thread_call()
| | | | |         smpboot_unpark_threads()
| | | | |          ..
| | | | |           __kthread_unpark()
| | | | |             __kthread_bind()
| | | | |             wake_up_state()
| | | | |              ..
| | | | |               select_task_rq()
| | | | |                 select_fallback_rq()
| | | | |
| | | | |   The ->wake_cpu of the unparked thread is not allowed, making a call
| | | | |   to select_fallback_rq() necessary. Then, select_fallback_rq() cannot
| | | | |   find an allowed, active CPU and promptly resets the allowed CPUs, so
| | | | |   that the task in question ends up on CPU0.
| | | | |
| | | | |   When those unparked tasks are eventually executed, they run
| | | | |   immediately into a BUG:
| | | | |
| | | | |     kernel BUG at kernel/smpboot.c:135!
| | | | |
| | | | | Just changing the order in which the online/active bits are set (and
| | | | | adding some memory barriers) would solve the two issues above.
| | | | | However, it would change the order of operations back to the one
| | | | | before commit 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()"),
| | | | | thus reintroducing that particular problem.
| | | | |
| | | | | Going further back into history, we have at least the following
| | | | | commits touching this topic:
| | | | | - commit 2baab4e90495 ("sched: Fix select_fallback_rq() vs cpu_active/cpu_online")
| | | | | - commit 5fbd036b552f ("sched: Cleanup cpu_active madness")
| | | | |
| | | | | Together, these give us the following non-working solutions:
| | | | | - secondary CPU sets active before online, because active is assumed
| | | | |   to be a subset of online;
| | | | | - secondary CPU sets online before active, because the primary CPU
| | | | |   assumes that an online CPU is also active;
| | | | | - secondary CPU sets online and waits for primary CPU to set active,
| | | | |   because it might deadlock.
| | | | |
| | | | | Commit 875ebe940d77 ("powerpc/smp: Wait until secondaries are active & online")
| | | | | introduces an arch-specific solution to this arch-independent problem.
| | | | |
| | | | | Now, go for a more general solution without explicit waiting and
| | | | | simply set active twice: once on the secondary CPU after online was
| | | | | set and once on the primary CPU after online was seen.
| | | | |
| | | | | Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Cc: Anton Blanchard <anton@samba.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Joerg Roedel <jroedel@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Wilson <msw@amazon.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()") Link: http://lkml.kernel.org/r/1439408156-18840-1-git-send-email-jschoenh@amazon.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
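The problematic interleaving and the fix can be replayed deterministically in a small single-threaded model. This is a sketch under stated assumptions: each CPU's mask bits are reduced to two booleans, the `cpun_*`/`cpu0_*` helpers and `replay_bad_interleaving()` are hypothetical, and memory ordering (which the real code has to get right) is ignored. It illustrates why setting active a second time on the primary is harmless: the store is idempotent, so the secondary's later store changes nothing.

```c
#include <stdbool.h>

/* Hypothetical model of one secondary CPU's bits in the online/active masks. */
struct cpu_bits_model {
	bool online;
	bool active;
};

/* CPUn, first half of set_cpu_online(): the online bit becomes visible. */
static void cpun_set_online(struct cpu_bits_model *c)
{
	c->online = true;
}

/* CPUn, second half: the active bit, which may land after CPU0's callbacks. */
static void cpun_set_active(struct cpu_bits_model *c)
{
	c->active = true;
}

/* CPU0 runs the CPU_ONLINE callbacks. with_fix models the primary setting
 * active once more after seeing online. Returns whether the callbacks saw
 * an active CPU (which set_cpus_allowed_ptr() needs). */
static bool cpu0_run_online_callbacks(struct cpu_bits_model *c, bool with_fix)
{
	if (with_fix && c->online)
		c->active = true; /* idempotent: CPUn may set it again later */
	return c->active;
}

/* Replay the bad interleaving: CPU0's callbacks run between CPUn's two stores. */
static bool replay_bad_interleaving(bool with_fix)
{
	struct cpu_bits_model c = { false, false };

	cpun_set_online(&c);
	bool seen_active = cpu0_run_online_callbacks(&c, with_fix);
	cpun_set_active(&c);
	return seen_active && c.online && c.active;
}
```

Without the second store the callbacks observe an online but inactive CPU; with it they always observe an active one, and no waiting is involved on either side.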
* | | | | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2015-08-3137-959/+762
|\ \ \ \ \  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
| | | | | | Pull scheduler updates from Ingo Molnar:
| | | | | |
| | | | | |  "The biggest change in this cycle is the rewrite of the main SMP load
| | | | | |   balancing metric: the CPU load/utilization. The main goal was to make
| | | | | |   the metric more precise and more representative - see the changelog
| | | | | |   of this commit for the gory details:
| | | | | |
| | | | | |     9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
| | | | | |
| | | | | |   It is done in a way that significantly reduces complexity of the code:
| | | | | |
| | | | | |     5 files changed, 249 insertions(+), 494 deletions(-)
| | | | | |
| | | | | |   and the performance testing results are encouraging. Nevertheless we
| | | | | |   need to keep an eye on potential regressions, since this potentially
| | | | | |   affects every SMP workload in existence.
| | | | | |
| | | | | |   This work comes from Yuyang Du.
| | | | | |
| | | | | |   Other changes:
| | | | | |
| | | | | |    - SCHED_DL updates. (Andrea Parri)
| | | | | |    - Simplify architecture callbacks by removing finish_arch_switch(). (Peter Zijlstra et al)
| | | | | |    - cputime accounting: guarantee stime + utime == rtime. (Peter Zijlstra)
| | | | | |    - optimize idle CPU wakeups some more - inspired by Facebook server loads. (Mike Galbraith)
| | | | | |    - stop_machine fixes and updates. (Oleg Nesterov)
| | | | | |    - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra)
| | | | | |    - sched/numa tweaks. (Srikar Dronamraju)
| | | | | |    - misc fixes and small cleanups"
| | | | | |
| | | | | | * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
| | | | | |   sched/deadline: Fix comment in enqueue_task_dl()
| | | | | |   sched/deadline: Fix comment in push_dl_tasks()
| | | | | |   sched: Change the sched_class::set_cpus_allowed() calling context
| | | | | |   sched: Make sched_class::set_cpus_allowed() unconditional
| | | | | |   sched: Fix a race between __kthread_bind() and sched_setaffinity()
| | | | | |   sched: Ensure a task has a non-normalized vruntime when returning back to CFS
| | | | | |   sched/numa: Fix NUMA_DIRECT topology identification
| | | | | |   tile: Reorganize _switch_to()
| | | | | |   sched, sparc32: Update scheduler comments in copy_thread()
| | | | | |   sched: Remove finish_arch_switch()
| | | | | |   sched, tile: Remove finish_arch_switch
| | | | | |   sched, sh: Fold finish_arch_switch() into switch_to()
| | | | | |   sched, score: Remove finish_arch_switch()
| | | | | |   sched, avr32: Remove finish_arch_switch()
| | | | | |   sched, MIPS: Get rid of finish_arch_switch()
| | | | | |   sched, arm: Remove finish_arch_switch()
| | | | | |   sched/fair: Clean up load average references
| | | | | |   sched/fair: Provide runnable_load_avg back to cfs_rq
| | | | | |   sched/fair: Remove task and group entity load when they are dead
| | | | | |   sched/fair: Init cfs_rq's sched_entity load average
| | | | | |   ...
| * | | | | sched/deadline: Fix comment in enqueue_task_dl()Andrea Parri2015-08-121-1/+1
| | | | | | The "dl_boosted" flag is set by comparing *absolute* deadlines (cf. rt_mutex_setprio()). Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438782979-9057-2-git-send-email-parri.andrea@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
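Comparing absolute deadlines means comparing two points in time, which the deadline scheduler does with a wraparound-safe signed difference. A minimal sketch, assuming 64-bit nanosecond timestamps: `dl_time_before_model()` mirrors the idea of the kernel's `dl_time_before()` helper, while `should_boost_model()` is a hypothetical simplification that ignores the other conditions rt_mutex_setprio() checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if absolute time a is before absolute time b, even across u64 wraparound. */
static bool dl_time_before_model(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Hypothetical simplification: boost the lock owner when the top waiter's
 * absolute deadline is earlier than the owner's own absolute deadline. */
static bool should_boost_model(uint64_t owner_deadline, uint64_t waiter_deadline)
{
	return dl_time_before_model(waiter_deadline, owner_deadline);
}
```

The signed-difference form is why a deadline just before the counter wraps still compares as earlier than one just after it.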
| * | | | | sched/deadline: Fix comment in push_dl_tasks()Andrea Parri2015-08-121-1/+1
| | | | | | The comment is "misleading"; fix it by adapting a comment from push_rt_tasks(). Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438782979-9057-1-git-send-email-parri.andrea@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | sched: Change the sched_class::set_cpus_allowed() calling contextPeter Zijlstra2015-08-123-81/+26
| | | | | | Change the calling context of sched_class::set_cpus_allowed() such that we can assume the task is inactive. This allows us to easily make changes that affect accounting done by enqueue/dequeue. This does in fact completely remove set_cpus_allowed_rt() and greatly reduces set_cpus_allowed_dl(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.667516139@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | sched: Make sched_class::set_cpus_allowed() unconditionalPeter Zijlstra2015-08-127-18/+36
| | | | | | Give every class a set_cpus_allowed() method; this enables a small optimization in the RT and DL implementations by avoiding a double cpumask_weight() call. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.614517487@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | sched: Fix a race between __kthread_bind() and sched_setaffinity()Peter Zijlstra2015-08-125-18/+52
| | | | | | Because sched_setscheduler() checks p->flags & PF_NO_SETAFFINITY
| | | | | | without locks, a caller might observe an old value and race with the
| | | | | | set_cpus_allowed_ptr() call from __kthread_bind() and effectively undo
| | | | | | it:
| | | | | |
| | | | | |     __kthread_bind()
| | | | | |       do_set_cpus_allowed()
| | | | | |                                 <SYSCALL>
| | | | | |                                   sched_setaffinity()
| | | | | |                                     if (p->flags & PF_NO_SETAFFINITY)
| | | | | |                                     set_cpus_allowed_ptr()
| | | | | |       p->flags |= PF_NO_SETAFFINITY
| | | | | |
| | | | | | Fix the bug by putting everything under the regular scheduler locks.
| | | | | |
| | | | | | This also closes a hole in the serialization of
| | | | | | task_struct::{nr_,}cpus_allowed.
| | | | | |
| | | | | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.545640346@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
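The essence of the fix, checking the flag and updating the mask under the same lock the binder holds, can be sketched in userspace with a pthread mutex. Everything here is hypothetical: `struct task_model`, the flag value and the two helpers stand in for task_struct, PF_NO_SETAFFINITY, __kthread_bind() and sched_setaffinity(), and the single mutex stands in for the scheduler's locks.

```c
#include <pthread.h>
#include <stdint.h>

#define PF_NO_SETAFFINITY_MODEL 0x1u /* hypothetical flag value */

/* Hypothetical task model: flags and allowed-CPU mask guarded by one lock. */
struct task_model {
	pthread_mutex_t lock; /* stands in for the scheduler's locks */
	unsigned int flags;
	uint64_t cpus_allowed;
};

/* Binding: set the mask and the flag in one critical section. */
static void kthread_bind_model(struct task_model *p, uint64_t mask)
{
	pthread_mutex_lock(&p->lock);
	p->cpus_allowed = mask;
	p->flags |= PF_NO_SETAFFINITY_MODEL;
	pthread_mutex_unlock(&p->lock);
}

/* setaffinity: test the flag under the same lock, so it cannot undo a bind.
 * Returns 0 on success, -1 if the task refuses affinity changes. */
static int setaffinity_model(struct task_model *p, uint64_t mask)
{
	int ret = 0;

	pthread_mutex_lock(&p->lock);
	if (p->flags & PF_NO_SETAFFINITY_MODEL)
		ret = -1;
	else
		p->cpus_allowed = mask;
	pthread_mutex_unlock(&p->lock);
	return ret;
}
```

Because the check and the write now share one critical section, the window in the diagram above (check passes, bind completes, stale write lands) cannot occur.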
| * | | | | sched: Ensure a task has a non-normalized vruntime when returning back to CFSByungchul Park2015-08-121-2/+17
| | | | | | Current code ensures that a task has a normalized vruntime when switching away from the fair class, but it does not ensure that the task has a non-normalized vruntime when switching back to the fair class. Here is an example that breaks this consistency:
| | | | | |  1. a task is in the fair class and !queued
| | | | | |  2. it changes its class to the RT class (still !queued)
| | | | | |  3. it changes its class back to the fair class (still !queued)
| | | | | | Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439197375-27927-1-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
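A minimal model of the round trip the fix restores, assuming a single runqueue whose min_vruntime can advance while the task sits in the RT class. `struct se_model` and both helpers are hypothetical; the real switched_from_fair() also looks at the task's state, and this sketch keeps only the !queued condition from the example in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a CFS entity's vruntime across class switches. */
struct se_model {
	uint64_t vruntime;
	bool queued;
};

/* Leaving the fair class while !queued: store vruntime relative to min_vruntime. */
static void switched_from_fair_model(struct se_model *se, uint64_t min_vruntime)
{
	if (!se->queued)
		se->vruntime -= min_vruntime;
}

/* Returning to the fair class while !queued: add min_vruntime back so the
 * value is absolute (non-normalized) again. This is the step that was missing. */
static void switched_to_fair_model(struct se_model *se, uint64_t min_vruntime)
{
	if (!se->queued)
		se->vruntime += min_vruntime;
}
```

The task keeps its 100-unit lag relative to min_vruntime across the detour through RT, even though min_vruntime itself moved from 900 to 950 in the test.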
| * | | | | sched/numa: Fix NUMA_DIRECT topology identificationAravind Gopalakrishnan2015-08-121-1/+3
| | | | | | Systems whose nodes are all at most 1 hop apart should be identified as 'NUMA_DIRECT'. However, the scheduler incorrectly identifies them as 'NUMA_BACKPLANE'. This is because 'n' is assigned to sched_max_numa_distance, but the code (mis)interprets it to mean 'number of hops'. Rik had actually used sched_domains_numa_levels for detecting a 'NUMA_DIRECT' topology: http://marc.info/?l=linux-kernel&m=141279712429834&w=2 But that was changed when he removed the hops table in the subsequent version: http://marc.info/?l=linux-kernel&m=141353106106771&w=2 Fix the issue here. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439256048-3748-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
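A simplified model of the classification, assuming a flattened n x n SLIT-style distance table whose diagonal holds the local distance. The helpers are hypothetical: "levels" here counts distinct remote distances, which is this sketch's reading of sched_domains_numa_levels, and the buggy variant reproduces the described mistake of comparing the maximum distance against a hop count. It is not the kernel's topology-detection code.

```c
#include <stdbool.h>

#define MAX_LEVELS_MODEL 64

/* Count distinct remote (off-diagonal) distances in a flattened n x n table. */
static int numa_levels_model(const int *dist, int n)
{
	int seen[MAX_LEVELS_MODEL];
	int levels = 0;

	for (int i = 0; i < n; i++) {
		for (int j = 0; j < n; j++) {
			if (i == j)
				continue; /* skip the local distance */
			int d = dist[i * n + j];
			bool dup = false;
			for (int k = 0; k < levels; k++) {
				if (seen[k] == d) {
					dup = true;
					break;
				}
			}
			if (!dup && levels < MAX_LEVELS_MODEL)
				seen[levels++] = d;
		}
	}
	return levels;
}

/* Fixed check: one remote level means every node is one hop away. */
static bool is_numa_direct_model(const int *dist, int n)
{
	return numa_levels_model(dist, n) <= 1;
}

/* Buggy check: treats the maximum distance value as if it were a hop count. */
static bool is_numa_direct_buggy_model(const int *dist, int n)
{
	int max = 0;

	for (int i = 0; i < n * n; i++)
		if (dist[i] > max)
			max = dist[i];
	return max <= 1;
}
```

For a fully connected table with local distance 10 and remote distance 20, the fixed check reports a direct topology while the buggy one does not, matching the misidentification the message describes.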
| * | | | | tile: Reorganize _switch_to()Chris Metcalf2015-08-082-12/+6
| | | | | | Move the simulator bits into finish_arch_post_lock_switch() and properly call __switch_to() from _switch_to(). Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: <efault@gmx.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438783412-10990-1-git-send-email-cmetcalf@ezchip.com [ Made it a delta to: fe363adb9225 ("sched, tile: Remove finish_arch_switch"). ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
OpenPOWER on IntegriCloud