summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | tracing: Make tracing_set_clock() non-staticTom Zanussi2018-03-102-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow tracing code outside of trace.c to access tracing_set_clock(). Some applications may require a particular clock in order to function properly, such as latency calculations. Also, add an accessor returning the current clock string. Link: http://lkml.kernel.org/r/6d1c53e9ee2163f54e1849f5376573f54f0e6009.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add 'last error' error facility for hist triggersTom Zanussi2018-03-101-14/+150
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the addition of variables and actions, it's become necessary to provide more detailed error information to users about syntax errors. Add a 'last error' facility accessible via the erroring event's 'hist' file. Reading the hist file after an error will display more detailed information about what went wrong, if information is available. This extended error information will be available until the next hist trigger command for that event. # echo xxx > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger echo: write error: Invalid argument # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist ERROR: Couldn't yyy: zzz Last command: xxx Also add specific error messages for variable and action errors. Link: http://lkml.kernel.org/r/64e9c422fc8aeafcc2f7a3b4328c0cffe7969129.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add hist trigger support for variable reference aliasesTom Zanussi2018-03-101-7/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for alias=$somevar where alias can be used as onmatch.xxx($alias). Aliases are a way of creating a new name for an existing variable, for flexibly in making naming more clear in certain cases. For example in the below the user perhaps feels that using $new_lat in the synthetic event invocation is opaque or doesn't fit well stylistically with previous triggers, so creates an alias of $new_lat named $latency and uses that in the call instead: # echo 'hist:keys=next_pid:new_lat=common_timestamp.usecs' > /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=pid:latency=$new_lat: onmatch(sched.sched_switch).wake2($latency,pid)' > /sys/kernel/debug/tracing/events/synthetic/wake1/trigger Link: http://lkml.kernel.org/r/ef20a65d921af3a873a6f1e8c71407c926d5586f.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add cpu field for hist triggersTom Zanussi2018-03-101-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A common key to use in a histogram is the cpuid - add a new cpu 'synthetic' field named 'cpu' for that purpose. Link: http://lkml.kernel.org/r/89537645bfc957e0d76e2cacf5f0ada88691a6cc.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Allow whitespace to surround hist trigger filterTom Zanussi2018-03-101-5/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing code only allows for one space before and after the 'if' specifying the filter for a hist trigger. Add code to make that more permissive as far as whitespace goes. Specifically, we want to allow spaces in the trigger itself now that we have additional syntax (onmatch/onmax) where spaces are more natural e.g. spaces after commas in param lists. Link: http://lkml.kernel.org/r/1053090c3c308d4f431accdeb59dff4b511d4554.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add 'onmax' hist trigger action supportTom Zanussi2018-03-101-35/+296
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an 'onmax(var).save(field,...)' hist trigger action which is invoked whenever an event exceeds the current maximum. The end result is that the trace event fields or variables specified as the onmax.save() params will be saved if 'var' exceeds the current maximum for that hist trigger entry. This allows context from the event that exhibited the new maximum to be saved for later reference. When the histogram is displayed, additional fields displaying the saved values will be printed. As an example the below defines a couple of hist triggers, one for sched_wakeup and another for sched_switch, keyed on pid. Whenever a sched_wakeup occurs, the timestamp is saved in the entry corresponding to the current pid, and when the scheduler switches back to that pid, the timestamp difference is calculated. If the resulting latency exceeds the current maximum latency, the specified save() values are saved: # echo 'hist:keys=pid:ts0=common_timestamp.usecs \ if comm=="cyclictest"' >> \ /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger # echo 'hist:keys=next_pid:\ wakeup_lat=common_timestamp.usecs-$ts0:\ onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \ if next_comm=="cyclictest"' >> \ /sys/kernel/debug/tracing/events/sched/sched_switch/trigger When the histogram is displayed, the max value and the saved values corresponding to the max are displayed following the rest of the fields: # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist { next_pid: 3728 } hitcount: 199 \ max: 123 next_comm: cyclictest prev_pid: 0 \ prev_prio: 120 prev_comm: swapper/3 { next_pid: 3730 } hitcount: 1321 \ max: 15 next_comm: cyclictest prev_pid: 0 \ prev_prio: 120 prev_comm: swapper/1 { next_pid: 3729 } hitcount: 1973\ max: 25 next_comm: cyclictest prev_pid: 0 \ prev_prio: 120 prev_comm: swapper/0 Totals: Hits: 3493 Entries: 3 Dropped: 0 Link: http://lkml.kernel.org/r/006907f71b1e839bb059337ec3c496f84fcb71de.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add 'onmatch' hist trigger action supportTom Zanussi2018-03-101-13/+475
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an 'onmatch(matching.event).<synthetic_event_name>(param list)' hist trigger action which is invoked with the set of variables or event fields named in the 'param list'. The result is the generation of a synthetic event that consists of the values contained in those variables and/or fields at the time the invoking event was hit. As an example the below defines a simple synthetic event using a variable defined on the sched_wakeup_new event, and shows the event definition with unresolved fields, since the sched_wakeup_new event with the testpid variable hasn't been defined yet: # echo 'wakeup_new_test pid_t pid; int prio' >> \ /sys/kernel/debug/tracing/synthetic_events # cat /sys/kernel/debug/tracing/synthetic_events wakeup_new_test pid_t pid; int prio The following hist trigger both defines a testpid variable and specifies an onmatch() trace action that uses that variable along with a non-variable field to generate a wakeup_new_test synthetic event whenever a sched_wakeup_new event occurs, which because of the 'if comm == "cyclictest"' filter only happens when the executable is cyclictest: # echo 'hist:testpid=pid:keys=$testpid:\ onmatch(sched.sched_wakeup_new).wakeup_new_test($testpid, prio) \ if comm=="cyclictest"' >> \ /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger Creating and displaying a histogram based on those events is now just a matter of using the fields and new synthetic event in the tracing/events/synthetic directory, as usual: # echo 'hist:keys=pid,prio:sort=pid,prio' >> \ /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger Link: http://lkml.kernel.org/r/8c2a574bcb7530c876629c901ecd23911b14afe8.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add support for 'field variables'Tom Zanussi2018-03-101-1/+530
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users should be able to directly specify event fields in hist trigger 'actions' rather than being forced to explicitly create a variable for that purpose. Add support allowing fields to be used directly in actions, which essentially does just that - creates 'invisible' variables for each bare field specified in an action. If a bare field refers to a field on another (matching) event, it even creates a special histogram for the purpose (since variables can't be defined on an existing histogram after histogram creation). Here's a simple example that demonstrates both. Basically the onmatch() action creates a list of variables corresponding to the parameters of the synthetic event to be generated, and then uses those values to generate the event. So for the wakeup_latency synthetic event 'call' below the first param, $wakeup_lat, is a variable defined explicitly on sched_switch, where 'next_pid' is just a normal field on sched_switch, and prio is a normal field on sched_waking. Since the mechanism works on variables, those two normal fields just have 'invisible' variables created internally for them. In the case of 'prio', which is on another event, we actually need to create an additional hist trigger and define the invisible variable on that, since once a hist trigger is defined, variables can't be added to it later. echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> /sys/kernel/debug/tracing/synthetic_events echo 'hist:keys=pid:ts0=common_timestamp.usecs >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio) >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger Link: http://lkml.kernel.org/r/8e8dcdac1ea180ed7a3689e1caeeccede9dc42b3.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add support for 'synthetic' eventsTom Zanussi2018-03-101-2/+893
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Synthetic events are user-defined events generated from hist trigger variables saved from one or more other events. To define a synthetic event, the user writes a simple specification consisting of the name of the new event along with one or more variables and their type(s), to the tracing/synthetic_events file. For instance, the following creates a new event named 'wakeup_latency' with 3 fields: lat, pid, and prio: # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \ /sys/kernel/debug/tracing/synthetic_events Reading the tracing/synthetic_events file lists all the currently-defined synthetic events, in this case the event we defined above: # cat /sys/kernel/debug/tracing/synthetic_events wakeup_latency u64 lat; pid_t pid; int prio At this point, the synthetic event is ready to use, and a histogram can be defined using it: # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \ /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger The new event is created under the tracing/events/synthetic/ directory and looks and behaves just like any other event: # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency enable filter format hist id trigger Although a histogram can be defined for it, nothing will happen until an action tracing that event via the trace_synth() function occurs. The trace_synth() function is very similar to all the other trace_* invocations spread throughout the kernel, except in this case the trace_ function and its corresponding tracepoint isn't statically generated but defined by the user at run-time. How this can be automatically hooked up via a hist trigger 'action' is discussed in a subsequent patch. Link: http://lkml.kernel.org/r/c68df2284b7d172669daf9be29db62ad49bbc559.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> [fix noderef.cocci warnings, sizeof pointer for kcalloc of event->fields] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add hist trigger action hookTom Zanussi2018-03-101-2/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a hook for executing extra actions whenever a histogram entry is added or updated. The default 'action' when a hist entry is added to a histogram is to update the set of values associated with it. Some applications may want to perform additional actions at that point, such as generate another event, or compare and save a maximum. Add a simple framework for doing that; specific actions will be implemented on top of it in later patches. Link: http://lkml.kernel.org/r/9482ba6a3eaf5ca6e60954314beacd0e25c05b24.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add variable reference handling to hist triggersTom Zanussi2018-03-104-16/+656
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the necessary infrastructure to allow the variables defined on one event to be referenced in another. This allows variables set by a previous event to be referenced and used in expressions combining the variable values saved by that previous event and the event fields of the current event. For example, here's how a latency can be calculated and saved into yet another variable named 'wakeup_lat': # echo 'hist:keys=pid,prio:ts0=common_timestamp ... # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ... In the first event, the event's timetamp is saved into the variable ts0. In the next line, ts0 is subtracted from the second event's timestamp to produce the latency. Further users of variable references will be described in subsequent patches, such as for instance how the 'wakeup_lat' variable above can be displayed in a latency histogram. Link: http://lkml.kernel.org/r/b1d3e6975374e34d501ff417c20189c3f9b2c7b8.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add hist_field 'type' fieldTom Zanussi2018-03-101-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Future support for synthetic events requires hist_field 'type' information, so add a field for that. Also, make other hist_field attribute usage consistent (size, is_signed, etc). Link: http://lkml.kernel.org/r/3fd12a2e86316b05151ba0d7c68268e780af2c9d.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Pass tracing_map_elt to hist_field accessor functionsTom Zanussi2018-03-101-34/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some accessor functions, such as for variable references, require access to a corrsponding tracing_map_elt. Add a tracing_map_elt param to the function signature and update the accessor functions accordingly. Link: http://lkml.kernel.org/r/e0f292b068e9e4948da1d5af21b5ae0efa9b5717.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Generalize per-element hist trigger dataTom Zanussi2018-03-101-33/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up until now, hist triggers only needed per-element support for saving 'comm' data, which was saved directly as a private data pointer. In anticipation of the need to save other data besides 'comm', add a new hist_elt_data struct for the purpose, and switch the current 'comm'-related code over to that. Link: http://lkml.kernel.org/r/4502c338c965ddf5fc19fb1ec4764391e001ed4b.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add simple expression support to hist triggersTom Zanussi2018-03-101-74/+413
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for simple addition, subtraction, and unary expressions (-(expr) and expr, where expr = b-a, a+b, a+b+c) to hist triggers, in order to support a minimal set of useful inter-event calculations. These operations are needed for calculating latencies between events (timestamp1-timestamp0) and for combined latencies (latencies over 3 or more events). In the process, factor out some common code from key and value parsing. Link: http://lkml.kernel.org/r/9a9308ead4fe32a433d9c7e95921fb798394f6b2.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> [kbuild test robot fix, add static to parse_atom()] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> [ Replaced '//' comments with normal /* */ comments ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Move get_hist_field_flags()Tom Zanussi2018-03-101-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move get_hist_field_flags() to make it more easily accessible for new code (and keep the move separate from new functionality). Link: http://lkml.kernel.org/r/32470f0a7047ec7a6e84ba5ec89d6142cc6ede7d.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Account for variables in named trigger compatibilityTom Zanussi2018-03-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Named triggers must also have the same set of variables in order to be considered compatible - update the trigger match test to account for that. The reason for this requirement is that named triggers with variables are meant to allow one or more events to set the same variable. Link: http://lkml.kernel.org/r/a17eae6328a99917f9d5c66129c9fcd355279ee9.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add variable support to hist triggersTom Zanussi2018-03-101-39/+331
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for saving the value of a current event's event field by assigning it to a variable that can be read by a subsequent event. The basic syntax for saving a variable is to simply prefix a unique variable name not corresponding to any keyword along with an '=' sign to any event field. Both keys and values can be saved and retrieved in this way: # echo 'hist:keys=next_pid:vals=$ts0:ts0=common_timestamp ... # echo 'hist:timer_pid=common_pid:key=$timer_pid ...' If a variable isn't a key variable or prefixed with 'vals=', the associated event field will be saved in a variable but won't be summed as a value: # echo 'hist:keys=next_pid:ts1=common_timestamp:... Multiple variables can be assigned at the same time: # echo 'hist:keys=pid:vals=$ts0,$b,field2:ts0=common_timestamp,b=field1 ... Multiple (or single) variables can also be assigned at the same time using separate assignments: # echo 'hist:keys=pid:vals=$ts0:ts0=common_timestamp:b=field1:c=field2 ... Variables set as above can be used by being referenced from another event, as described in a subsequent patch. Link: http://lkml.kernel.org/r/fc93c4944d9719dbcb1d0067be627d44e98e2adc.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Baohong Liu <baohong.liu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add usecs modifier for hist trigger timestampsTom Zanussi2018-03-103-8/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Appending .usecs onto a common_timestamp field will cause the timestamp value to be in microseconds instead of the default nanoseconds. A typical latency histogram using usecs would look like this: # echo 'hist:keys=pid,prio:ts0=common_timestamp.usecs ... # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 ... This also adds an external trace_clock_in_ns() to trace.c for the timestamp conversion. Link: http://lkml.kernel.org/r/4e813705a170b3e13e97dc3135047362fb1a39f3.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add hist_data member to hist_fieldTom Zanussi2018-03-101-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow hist_data access via hist_field. Some users of hist_fields require or will require more access to the associated hist_data. Link: http://lkml.kernel.org/r/d04cd0768f5228ebb4ac0ba4a847bc4d14d4826f.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add per-element variable support to tracing_mapTom Zanussi2018-03-102-0/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to allow information to be passed between trace events, add support for per-element variables to tracing_map. This provides a means for histograms to associate a value or values with an entry when it's saved or updated, and retrieved by a subsequent event occurrences. Variables can be set using tracing_map_set_var() and read using tracing_map_read_var(). tracing_map_var_set() returns true or false depending on whether or not the variable has been set or not, which is important for event-matching applications. tracing_map_read_var_once() reads the variable and resets it to the 'unset' state, implementing read-once variables, which are also important for event-matching uses. Link: http://lkml.kernel.org/r/7fa001108252556f0c6dd9d63145eabfe3370d1a.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add hist trigger timestamp supportTom Zanussi2018-03-101-23/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for a timestamp event field. This is actually a 'pseudo-' event field in that it behaves like it's part of the event record, but is really part of the corresponding ring buffer event. To make use of the timestamp field, users can specify "common_timestamp" as a field name for any histogram. Note that this doesn't make much sense on its own either as either a key or value, but needs to be supported even so, since follow-on patches will add support for making use of this field in time deltas. The common_timestamp 'field' is not a bona fide event field - so you won't find it in the event description - but rather it's a synthetic field that can be used like a real field. Note that the use of this field requires the ring buffer be put into 'absolute timestamp' mode, which saves the complete timestamp for each event rather than an offset. This mode will be enabled if and only if a histogram makes use of the "common_timestamp" field. Link: http://lkml.kernel.org/r/97afbd646ed146e26271f3458b4b33e16d7817c2.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Baohong Liu <baohong.liu@intel.com> [kasan use-after-free fix] Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Break out hist trigger assignment parsingTom Zanussi2018-03-101-21/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will make it easier to add variables, and makes the parsing code cleaner regardless. Link: http://lkml.kernel.org/r/e574b3291bbe15e35a4dfc87e5395aa715701c98.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add ring buffer event param to hist field functionsTom Zanussi2018-03-101-15/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some events such as timestamps require access to a ring_buffer_event struct; add a param so that hist field functions can access that. Link: http://lkml.kernel.org/r/2ff4af18e72b6002eb86b26b2a7f39cef7d1dfe4.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Give event triggers access to ring_buffer_eventTom Zanussi2018-03-103-26/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ring_buffer event can provide a timestamp that may be useful to various triggers - pass it into the handlers for that purpose. Link: http://lkml.kernel.org/r/6de592683b59fa70ffa5d43d0109896623fc1367.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add timestamp_mode trace fileTom Zanussi2018-03-101-0/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new option flag indicating whether or not the ring buffer is in 'absolute timestamp' mode. Currently this is only set/unset by hist triggers that make use of a common_timestamp. As such, there's no reason to make this writeable for users - its purpose is only to allow users to determine unequivocally whether or not the ring buffer is in that mode (although absolute timestamps can coexist with the normal delta timestamps, when the ring buffer is in absolute mode, timestamps written while absolute mode is in effect take up more space in the buffer, and are not as efficient). Link: http://lkml.kernel.org/r/e8aa7b1cde1cf15014e66545d06ac6ef2ebba456.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | ring-buffer: Redefine the unimplemented RINGBUF_TYPE_TIME_STAMPTom Zanussi2018-03-101-28/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RINGBUF_TYPE_TIME_STAMP is defined but not used, and from what I can gather was reserved for something like an absolute timestamp feature for the ring buffer, if not a complete replacement of the current time_delta scheme. This code redefines RINGBUF_TYPE_TIME_STAMP to implement absolute time stamps. Another way to look at it is that it essentially forces extended time_deltas for all events. The motivation for doing this is to enable time_deltas that aren't dependent on previous events in the ring buffer, making it feasible to use the ring_buffer_event timetamps in a more random-access way, for purposes other than serial event printing. To set/reset this mode, use tracing_set_timestamp_abs() from the previous interface patch. Link: http://lkml.kernel.org/r/477b362dba1ce7fab9889a1a8e885a62c472f041.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | ring-buffer: Add interface for setting absolute time stampsTom Zanussi2018-03-103-1/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define a new function, tracing_set_time_stamp_abs(), which can be used to enable or disable the use of absolute timestamps rather than time deltas for a trace array. Only the interface is added here; a subsequent patch will add the underlying implementation. Link: http://lkml.kernel.org/r/ce96119de44c7fe0ee44786d15254e9b493040d3.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Baohong Liu <baohong.liu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Remove code which merges duplicatesVedang Patel2018-03-103-95/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We now have the logic to detect and remove duplicates in the tracing_map hash table. The code which merges duplicates in the histogram is redundant now. So, modify this code just to detect duplicates. The duplication detection code is still kept to ensure that any rare race condition which might cause duplicates does not go unnoticed. Link: http://lkml.kernel.org/r/55215cf59e2674391bdaf772fdafc4c393352b03.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | | | | tracing: Add support to detect and avoid duplicatesVedang Patel2018-03-101-5/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A duplicate in the tracing_map hash table is when 2 different entries have the same key and, as a result, the key_hash. This is possible due to a race condition in the algorithm. This race condition is inherent to the algorithm and not a bug. This was fine because, until now, we were only interested in the sum of all the values related to a particular key (the duplicates are dealt with in tracing_map_sort_entries()). But, with the inclusion of variables[1], we are interested in individual values. So, it will not be clear what value to choose when there are duplicates. So, the duplicates need to be removed. The duplicates can occur in the code in the following scenarios: - A thread is in the process of adding a new element. It has successfully executed cmpxchg() and inserted the key. But, it is still not done acquiring the trace_map_elt struct, populating it and storing the pointer to the struct in the value field of tracing_map hash table. If another thread comes in at this time and wants to add an element with the same key, it will not see the current element and add a new one. - There are multiple threads trying to execute cmpxchg at the same time, one of the threads will succeed and the others will fail. The ones which fail will go ahead increment 'idx' and add a new element there creating a duplicate. This patch detects and avoids the first condition by asking the thread which detects the duplicate to loop one more time. There is also a possibility of infinite loop if the thread which is trying to insert goes to sleep indefinitely and the one which is trying to insert a new element detects a duplicate. Which is why, the thread loops for map_size iterations before returning NULL. The second scenario is avoided by preventing the threads which failed cmpxchg() from incrementing idx. This way, they will loop around and check if the thread which succeeded in executing cmpxchg() had the same key. [1] http://lkml.kernel.org/r/cover.1498510759.git.tom.zanussi@linux.intel.com Link: http://lkml.kernel.org/r/e178e89ec399240331d383bd5913d649713110f4.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2018-04-092-14/+22
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) The sockmap code has to free socket memory on close if there is corked data, from John Fastabend. 2) Tunnel names coming from userspace need to be length validated. From Eric Dumazet. 3) arp_filter() has to take VRFs properly into account, from Miguel Fadon Perlines. 4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti. 5) Missing idr_remove() in u32_delete_key(), from Cong Wang. 6) More syzbot stuff. Several use of uninitialized value fixes all over, from Eric Dumazet. 7) Do not leak kernel memory to userspace in sctp, also from Eric Dumazet. 8) Discard frames from unused ports in DSA, from Andrew Lunn. 9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas Falcon. 10) Do not access dp83640 PHY registers prematurely after reset, from Esben Haabendal. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) vhost-net: set packet weight of tx polling to 2 * vq size net: thunderx: rework mac addresses list to u64 array inetpeer: fix uninit-value in inet_getpeer dp83640: Ensure against premature access to PHY registers after reset devlink: convert occ_get op to separate registration ARM: dts: ls1021a: Specify TBIPA register address net/fsl_pq_mdio: Allow explicit speficition of TBIPA address ibmvnic: Do not reset CRQ for Mobility driver resets ibmvnic: Fix failover case for non-redundant configuration ibmvnic: Fix reset scheduler error handling ibmvnic: Zero used TX descriptor counter on reset ibmvnic: Fix DMA mapping mistakes tipc: use the right skb in tipc_sk_fill_sock_diag() sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6 net: dsa: Discard frames from unused ports sctp: do not leak kernel memory to user space soreuseport: initialise timewait reuseport field ipv4: fix uninit-value in ip_route_output_key_hash_rcu() dccp: initialize ireq->ir_mark net: fix uninit-value in __hw_addr_add_ex() ...
| * \ \ \ \ \ \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2018-04-082-14/+22
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2018-04-09 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Two sockmap fixes: i) fix a potential warning when a socket with pending cork data is closed by freeing the memory right when the socket is closed instead of seeing still outstanding memory at garbage collector time, ii) fix a NULL pointer deref in case of duplicates release calls, so make sure to only reset the sk_prot pointer when it's in a valid state to do so, both from John. 2) Fix a compilation warning in bpf_prog_attach_check_attach_type() by moving the function under CONFIG_CGROUP_BPF ifdef since only used there, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | kernel/bpf/syscall: fix warning defined but not usedAnders Roxell2018-04-041-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There will be a build warning -Wunused-function if CONFIG_CGROUP_BPF isn't defined, since the only user is inside #ifdef CONFIG_CGROUP_BPF: kernel/bpf/syscall.c:1229:12: warning: ‘bpf_prog_attach_check_attach_type’ defined but not used [-Wunused-function] static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Current code moves function bpf_prog_attach_check_attach_type inside ifdef CONFIG_CGROUP_BPF. Fixes: 5e43f899b03a ("bpf: Check attach type at prog load time") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * | | | | | | | bpf: sockmap, duplicates release calls may NULL sk_protJohn Fastabend2018-04-041-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to have multiple ULP tcp_release call paths in flight if a sock is closed and simultaneously being removed from the sockmap control path. The result would be setting the sk_prot to the saved values on the first iteration and then on the second iteration setting the value to NULL. This patch resolves this by ensuring we only reset the sk_prot pointer if we have a valid saved state to set. Fixes: 4f738adba30a7 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * | | | | | | | bpf: sockmap, free memory on sock close with cork dataJohn Fastabend2018-04-041-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a socket with pending cork data is closed we do not return the memory to the socket until the garbage collector free's the psock structure. The garbage collector though can run after the sock has completed its close operation. If this ordering happens the sock code will through a WARN_ON because there is still outstanding memory accounted to the sock. To resolve this ensure we return memory to the sock when a socket is closed. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Fixes: 91843d540a13 ("bpf: sockmap, add msg_cork_bytes() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | | | | | | | | Merge branch 'next-general' of ↵Linus Torvalds2018-04-071-3/+3
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull general security layer updates from James Morris: - Convert security hooks from list to hlist, a nice cleanup, saving about 50% of space, from Sargun Dhillon. - Only pass the cred, not the secid, to kill_pid_info_as_cred and security_task_kill (as the secid can be determined from the cred), from Stephen Smalley. - Close a potential race in kernel_read_file(), by making the file unwritable before calling the LSM check (vs after), from Kees Cook. * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security: convert security hooks to use hlist exec: Set file unwritable before LSM check usb, signal, security: only pass the cred, not the secid, to kill_pid_info_as_cred and security_task_kill
| * \ \ \ \ \ \ \ \ \ Merge tag 'v4.16-rc6' into next-generalJames Morris2018-03-2322-98/+149
| |\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge to Linux 4.16-rc6 at the request of Jarkko, for his TPM updates.
| * | | | | | | | | | | usb, signal, security: only pass the cred, not the secid, to ↵Stephen Smalley2018-03-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kill_pid_info_as_cred and security_task_kill commit d178bc3a708f39cbfefc3fab37032d3f2511b4ec ("user namespace: usb: make usb urbs user namespace aware (v2)") changed kill_pid_info_as_uid to kill_pid_info_as_cred, saving and passing a cred structure instead of uids. Since the secid can be obtained from the cred, drop the secid fields from the usb_dev_state and async structures, and drop the secid argument to kill_pid_info_as_cred. Replace the secid argument to security_task_kill with the cred. Update SELinux, Smack, and AppArmor to use the cred, which avoids the need for Smack and AppArmor to use a secid at all in this hook. Further changes to Smack might still be required to take full advantage of this change, since it should now be possible to perform capability checking based on the supplied cred. The changes to Smack and AppArmor have only been compile-tested. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.morris@microsoft.com>
* | | | | | | | | | | | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2018-04-061-0/+1
|\ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull fw_cfg, vhost updates from Michael Tsirkin: "This cleans up the qemu fw cfg device driver. On top of this, vmcore is dumped there on crash to help debugging with kASLR enabled. Also included are some fixes in vhost" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: add vsock compat ioctl vhost: fix vhost ioctl signature to build with clang fw_cfg: write vmcoreinfo details crash: export paddr_vmcoreinfo_note() fw_cfg: add DMA register fw_cfg: add a public uapi header fw_cfg: handle fw_cfg_read_blob() error fw_cfg: remove inline from fw_cfg_read_blob() fw_cfg: fix sparse warnings around FW_CFG_FILE_DIR read fw_cfg: fix sparse warning reading FW_CFG_ID fw_cfg: fix sparse warnings with fw_cfg_file fw_cfg: fix sparse warnings in fw_cfg_sel_endianness() ptr_ring: fix build
| * | | | | | | | | | | | crash: export paddr_vmcoreinfo_note()Marc-André Lureau2018-03-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following patch is going to use the symbol from the fw_cfg module, to call the function and write the note location details in the vmcoreinfo entry, so qemu can produce dumps with the vmcoreinfo note. CC: Andrew Morton <akpm@linux-foundation.org> CC: Hari Bathini <hbathini@linux.vnet.ibm.com> CC: Tony Luck <tony.luck@intel.com> CC: Vivek Goyal <vgoyal@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Gabriel Somlo <somlo@cmu.edu> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | | | | | | | | | | | | Merge tag 'audit-pr-20180403' of ↵Linus Torvalds2018-04-065-52/+94
|\ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "We didn't have anything to send for v4.16, but we're back with a little more than usual for v4.17. Eleven patches in total, most fall into the small fix category, but there are three non-trivial changes worth calling out: - the audit entry filter is being removed after deprecating it for quite a while (years of no one really using it because it turns out to be not very practical) - created our own version of "__mutex_owner()" because the locking folks were upset we were using theirs - improved our handling of kernel command line parameters to make them more forgiving - we fixed auditing of symlink operations Everything passes the audit-testsuite and as of a few minutes ago it merges well with your tree" * tag 'audit-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: add refused symlink to audit_names audit: remove path param from link denied function audit: link denied should not directly generate PATH record audit: make ANOM_LINK obey audit_enabled and audit_dummy_context audit: do not panic on invalid boot parameter audit: track the owner of the command mutex ourselves audit: return on memory error to avoid null pointer dereference audit: bail before bug check if audit disabled audit: deprecate the AUDIT_FILTER_ENTRY filter audit: session ID should not set arch quick field pointer audit: update bugtracker and source URIs
| * | | | | | | | | | | | | audit: remove path param from link denied functionRichard Guy Briggs2018-03-211-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 45b578fe4c3cade6f4ca1fc934ce199afd857edc ("audit: link denied should not directly generate PATH record") the need for the struct path *link parameter was removed. Remove the now useless struct path argument. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
| * | | | | | | | | | | | | audit: link denied should not directly generate PATH recordRichard Guy Briggs2018-03-081-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Audit link denied events generate duplicate PATH records which disagree in different ways from symlink and hardlink denials. audit_log_link_denied() should not directly generate PATH records. See: https://github.com/linux-audit/audit-kernel/issues/21 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
| * | | | | | | | | | | | | audit: make ANOM_LINK obey audit_enabled and audit_dummy_contextRichard Guy Briggs2018-03-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Audit link denied events emit disjointed records when audit is disabled. No records should be emitted when audit is disabled. See: https://github.com/linux-audit/audit-kernel/issues/21 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
| * | | | | | | | | | | | | audit: do not panic on invalid boot parameterGreg Edwards2018-03-061-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you pass in an invalid audit boot parameter value, e.g. "audit=off", the kernel panics very early in boot before the regular console is initialized. Unless you have earlyprintk enabled, there is no indication of what the problem is on the console. Convert the panic() calls to pr_err(), and leave auditing enabled if an invalid parameter value was passed in. Modify the parameter to also accept "on" or "off" as valid values, and update the documentation accordingly. Signed-off-by: Greg Edwards <gedwards@ddn.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
| * | | | | | | | | | | | | audit: track the owner of the command mutex ourselvesPaul Moore2018-02-233-16/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Evidently the __mutex_owner() function was never intended for use outside the core mutex code, so build a thing locking wrapper around the mutex code which allows us to track the mutex owner. One, arguably positive, side effect is that this allows us to hide the audit_cmd_mutex inside of kernel/audit.c behind the lock/unlock functions. Reported-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
| * | | | | | | | | | | | | audit: return on memory error to avoid null pointer dereferenceRichard Guy Briggs2018-02-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there is a memory allocation error when trying to change an audit kernel feature value, the ignored allocation error will trigger a NULL pointer dereference oops on subsequent use of that pointer. Return instead. Passes audit-testsuite. See: https://github.com/linux-audit/audit-kernel/issues/76 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: not necessary (other funcs check for NULL), but a good practice] Signed-off-by: Paul Moore <paul@paul-moore.com>
| * | | | | | | | | | | | | audit: bail before bug check if audit disabledRichard Guy Briggs2018-02-151-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If audit is disabled, who cares if there is a bug indicating syscall in process or names already recorded. Bail immediately on audit disabled. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
| * | | | | | | | | | | | | audit: deprecate the AUDIT_FILTER_ENTRY filterRichard Guy Briggs2018-02-152-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The audit entry filter has been long deprecated with userspace support finally removed in audit-v2.6.7 and plans to remove kernel support have existed since kernel-v2.6.31. Remove it. Since removing the audit entry filter, test for early return before setting up any context state. Passes audit-testsuite. See: https://github.com/linux-audit/audit-kernel/issues/6 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
| * | | | | | | | | | | | | audit: session ID should not set arch quick field pointerRichard Guy Briggs2018-02-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bug was introduced in 8fae47705685fcaa75a1fe4c8c3e18300a702979 ("audit: add support for session ID user filter") See: https://github.com/linux-audit/audit-kernel/issues/4 When setting a session ID filter, the session ID filter field overwrote the quick pointer reference to the arch field, potentially causing the arch field to be misinterpreted. Passes audit-testsuite. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
OpenPOWER on IntegriCloud