summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* timers: Reinitialize per cpu bases on hotplugThomas Gleixner2017-12-292-2/+17
| | | | | | | | | | | | | | | | | | | | | | | The timer wheel bases are not (re)initialized on CPU hotplug. That leaves them with a potentially stale clk and next_expiry valuem, which can cause trouble then the CPU is plugged. Add a prepare callback which forwards the clock, sets next_expiry to far in the future and reset the control flags to a known state. Set base->must_forward_clk so the first timer which is queued will try to forward the clock to current jiffies. Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel") Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
* timers: Use deferrable base independent of base::nohz_activeAnna-Maria Gleixner2017-12-291-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | During boot and before base::nohz_active is set in the timer bases, deferrable timers are enqueued into the standard timer base. This works correctly as long as base::nohz_active is false. Once it base::nohz_active is set and a timer which was enqueued before that is accessed the lock selector code choses the lock of the deferred base. This causes unlocked access to the standard base and in case the timer is removed it does not clear the pending flag in the standard base bitmap which causes get_next_timer_interrupt() to return bogus values. To prevent that, the deferrable timers must be enqueued in the deferrable base, even when base::nohz_active is not set. Those deferrable timers also need to be expired unconditional. Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel") Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Cc: rt@linutronix.de Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de
* Merge tag 'pm-4.15-rc6' of ↵Linus Torvalds2017-12-292-1/+14
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "This fixes a schedutil cpufreq governor regression from the 4.14 cycle that may cause a CPU idleness check to return incorrect results in some cases which leads to suboptimal decisions (Joel Fernandes)" * tag 'pm-4.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: schedutil: Use idle_calls counter of the remote CPU
| * cpufreq: schedutil: Use idle_calls counter of the remote CPUJoel Fernandes2017-12-282-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the recent remote cpufreq callback work, its possible that a cpufreq update is triggered from a remote CPU. For single policies however, the current code uses the local CPU when trying to determine if the remote sg_cpu entered idle or is busy. This is incorrect. To remedy this, compare with the nohz tick idle_calls counter of the remote CPU. Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks) Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Joel Fernandes <joelaf@google.com> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | Merge tag 'trace-v4.15-rc4' of ↵Linus Torvalds2017-12-272-10/+15
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "While doing tests on tracing over the network, I found that the packets were getting corrupted. In the process I found three bugs. One was the culprit, but the other two scared me. After deeper investigation, they were not as major as I thought they were, due to a signed compared to an unsigned that prevented a negative number from doing actual harm. The two bigger bugs: - Mask the ring buffer data page length. There are data flags at the high bits of the length field. These were not cleared via the length function, and the length could return a negative number. (Although the number returned was unsigned, but was assigned to a signed number) Luckily, this value was compared to PAGE_SIZE which is unsigned and kept it from entering the path that could have caused damage. - Check the page usage before reusing the ring buffer reader page. TCP increments the page ref when passing the page off to the network. The page is passed back to the ring buffer for use on free. But the page could still be in use by the TCP stack. Minor bugs: - Related to the first bug. No need to clear out the unused ring buffer data before sending to user space. It is now done by the ring buffer code itself. - Reset pointers after free on error path. There were some cases in the error path that pointers were freed but not set to NULL, and could have them freed again, having a pointer freed twice" * tag 'trace-v4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix possible double free on failure of allocating trace buffer tracing: Fix crash when it fails to alloc ring buffer ring-buffer: Do no reuse reader page if still in use tracing: Remove extra zeroing out of the ring buffer page ring-buffer: Mask out the info bits when returning buffer page length
| * tracing: Fix possible double free on failure of allocating trace bufferSteven Rostedt (VMware)2017-12-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jing Xia and Chunyan Zhang reported that on failing to allocate part of the tracing buffer, memory is freed, but the pointers that point to them are not initialized back to NULL, and later paths may try to free the freed memory again. Jing and Chunyan fixed one of the locations that does this, but missed a spot. Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com Cc: stable@vger.kernel.org Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Reported-by: Jing Xia <jing.xia@spreadtrum.com> Reported-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Fix crash when it fails to alloc ring bufferJing Xia2017-12-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Double free of the ring buffer happens when it fails to alloc new ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured. The root cause is that the pointer is not set to NULL after the buffer is freed in allocate_trace_buffers(), and the freeing of the ring buffer is invoked again later if the pointer is not equal to Null, as: instance_mkdir() |-allocate_trace_buffers() |-allocate_trace_buffer(tr, &tr->trace_buffer...) |-allocate_trace_buffer(tr, &tr->max_buffer...) // allocate fail(-ENOMEM),first free // and the buffer pointer is not set to null |-ring_buffer_free(tr->trace_buffer.buffer) // out_free_tr |-free_trace_buffers() |-free_trace_buffer(&tr->trace_buffer); //if trace_buffer is not null, free again |-ring_buffer_free(buf->buffer) |-rb_free_cpu_buffer(buffer->buffers[cpu]) // ring_buffer_per_cpu is null, and // crash in ring_buffer_per_cpu->pages Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com Cc: stable@vger.kernel.org Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Signed-off-by: Jing Xia <jing.xia@spreadtrum.com> Signed-off-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * ring-buffer: Do no reuse reader page if still in useSteven Rostedt (VMware)2017-12-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To free the reader page that is allocated with ring_buffer_alloc_read_page(), ring_buffer_free_read_page() must be called. For faster performance, this page can be reused by the ring buffer to avoid having to free and allocate new pages. The issue arises when the page is used with a splice pipe into the networking code. The networking code may up the page counter for the page, and keep it active while sending it is queued to go to the network. The incrementing of the page ref does not prevent it from being reused in the ring buffer, and this can cause the page that is being sent out to the network to be modified before it is sent by reading new data. Add a check to the page ref counter, and only reuse the page if it is not being used anywhere else. Cc: stable@vger.kernel.org Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Remove extra zeroing out of the ring buffer pageSteven Rostedt (VMware)2017-12-271-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | The ring_buffer_read_page() takes care of zeroing out any extra data in the page that it returns. There's no need to zero it out again from the consumer. It was removed from one consumer of this function, but read_buffers_splice_read() did not remove it, and worse, it contained a nasty bug because of it. Cc: stable@vger.kernel.org Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * ring-buffer: Mask out the info bits when returning buffer page lengthSteven Rostedt (VMware)2017-12-271-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two info bits were added to the "commit" part of the ring buffer data page when returned to be consumed. This was to inform the user space readers that events have been missed, and that the count may be stored at the end of the page. What wasn't handled, was the splice code that actually called a function to return the length of the data in order to zero out the rest of the page before sending it up to user space. These data bits were returned with the length making the value negative, and that negative value was not checked. It was compared to PAGE_SIZE, and only used if the size was less than PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an unsigned compare, meaning the negative size value did not end up causing a large portion of memory to be randomly zeroed out. Cc: stable@vger.kernel.org Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds2017-12-231-2/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 PTI preparatory patches from Thomas Gleixner: "Todays Advent calendar window contains twentyfour easy to digest patches. The original plan was to have twenty three matching the date, but a late fixup made that moot. - Move the cpu_entry_area mapping out of the fixmap into a separate address space. That's necessary because the fixmap becomes too big with NRCPUS=8192 and this caused already subtle and hard to diagnose failures. The top most patch is fresh from today and cures a brain slip of that tall grumpy german greybeard, who ignored the intricacies of 32bit wraparounds. - Limit the number of CPUs on 32bit to 64. That's insane big already, but at least it's small enough to prevent address space issues with the cpu_entry_area map, which have been observed and debugged with the fixmap code - A few TLB flush fixes in various places plus documentation which of the TLB functions should be used for what. - Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for more than sysenter now and keeping the name makes backtraces confusing. - Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(), which is only invoked on fork(). - Make vysycall more robust. - A few fixes and cleanups of the debug_pagetables code. Check PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the C89 initialization of the address hint array which already was out of sync with the index enums. - Move the ESPFIX init to a different place to prepare for PTI. - Several code moves with no functional change to make PTI integration simpler and header files less convoluted. - Documentation fixes and clarifications" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit init: Invoke init_espfix_bsp() from mm_init() x86/cpu_entry_area: Move it out of the fixmap x86/cpu_entry_area: Move it to a separate unit x86/mm: Create asm/invpcid.h x86/mm: Put MMU to hardware ASID translation in one place x86/mm: Remove hard-coded ASID limit checks x86/mm: Move the CR3 construction functions to tlbflush.h x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what x86/mm: Remove superfluous barriers x86/mm: Use __flush_tlb_one() for kernel memory x86/microcode: Dont abuse the TLB-flush interface x86/uv: Use the right TLB-flush API x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation x86/mm/64: Improve the memory map documentation x86/ldt: Prevent LDT inheritance on exec x86/ldt: Rework locking arch, mm: Allow arch_dup_mmap() to fail x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode ...
| * | arch, mm: Allow arch_dup_mmap() to failThomas Gleixner2017-12-221-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to sanitize the LDT initialization on x86 arch_dup_mmap() must be allowed to fail. Fix up all instances. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: dan.j.williams@intel.com Cc: hughd@google.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | locking/barriers: Convert users of lockless_dereference() to READ_ONCE()Will Deacon2017-12-173-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Note, this is a Git cherry-pick of the following commit: 506458efaf15 ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()") ... for easier x86 PTI code testing and back-porting. ] READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it can be used instead of lockless_dereference() without any change in semantics. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2017-12-211-117/+166
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller" "What's a holiday weekend without some networking bug fixes? [1] 1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function calls, from Daniel Borkmann. 2) Fix regression from errata limiting change to marvell PHY driver, from Zhao Qiang. 3) Fix u16 overflow in SCTP, from Xin Long. 4) Fix potential memory leak during bridge newlink, from Nikolay Aleksandrov. 5) Fix BPF selftest build on s390, from Hendrik Brueckner. 6) Don't append to cfg80211 automatically generated certs file, always write new ones from scratch. From Thierry Reding. 7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai. 8) Fix hang on tg3 MTU change with certain chips, from Brian King. 9) Add stall detection to arc emac driver and reset chip when this happens, from Alexander Kochetkov. 10) Fix MTU limitng in GRE tunnel drivers, from Xin Long. 11) Fix stmmac timestamping bug due to mis-shifting of field. From Fredrik Hallenberg. 12) Fix metrics match when deleting an ipv4 route. The kernel sets some internal metrics bits which the user isn't going to set when it makes the delete request. From Phil Sutter. 13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix from Yelena Krivosheev. 14) Fix double free and memory corruption in get_net_ns_by_id, from Eric W. Biederman. 15) Flush ipv4 FIB tables in the reverse order. Some tables can share their actual backing data, in particular this happens for the MAIN and LOCAL tables. We have to kill the LOCAL table first, because it uses MAIN's backing memory. Fix from Ido Schimmel. 16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann Horn, and Alexei Starovoitov. 17) Make changes to ipv6 autoflowlabel sysctl really propagate to sockets, unless the socket has set the per-socket value explicitly. From Shaohua Li. 18) Fix leaks and double callback invocations of zerocopy SKBs, from Willem de Bruijn" [1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits) skbuff: skb_copy_ubufs must release uarg even without user frags skbuff: orphan frags before zerocopy clone net: reevalulate autoflowlabel setting after sysctl setting openvswitch: Fix pop_vlan action for double tagged frames ipv6: Honor specified parameters in fibmatch lookup bpf: do not allow root to mangle valid pointers selftests/bpf: add tests for recent bugfixes bpf: fix integer overflows bpf: don't prune branches when a scalar is replaced with a pointer bpf: force strict alignment checks for stack pointers bpf: fix missing error return in check_stack_boundary() bpf: fix 32-bit ALU op verification bpf: fix incorrect tracking of register size truncation bpf: fix incorrect sign extension in check_alu_op() bpf/verifier: fix bounds calculation on BPF_RSH ipv4: Fix use-after-free when flushing FIB tables s390/qeth: fix error handling in checksum cmd callback tipc: remove joining group member from congested list selftests: net: Adding config fragment CONFIG_NUMA=y nfp: bpf: keep track of the offloaded program ...
| * | bpf: do not allow root to mangle valid pointersAlexei Starovoitov2017-12-211-68/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not allow root to convert valid pointers into unknown scalars. In particular disallow: ptr &= reg ptr <<= reg ptr += ptr and explicitly allow: ptr -= ptr since pkt_end - pkt == length 1. This minimizes amount of address leaks root can do. In the future may need to further tighten the leaks with kptr_restrict. 2. If program has such pointer math it's likely a user mistake and when verifier complains about it right away instead of many instructions later on invalid memory access it's easier for users to fix their progs. 3. when register holding a pointer cannot change to scalar it allows JITs to optimize better. Like 32-bit archs could use single register for pointers instead of a pair required to hold 64-bit scalars. 4. reduces architecture dependent behavior. Since code: r1 = r10; r1 &= 0xff; if (r1 ...) will behave differently arm64 vs x64 and offloaded vs native. A significant chunk of ptr mangling was allowed by commit f1174f77b50c ("bpf/verifier: rework value tracking") yet some of it was allowed even earlier. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | bpf: fix integer overflowsAlexei Starovoitov2017-12-211-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were various issues related to the limited size of integers used in the verifier: - `off + size` overflow in __check_map_access() - `off + reg->off` overflow in check_mem_access() - `off + reg->var_off.value` overflow or 32-bit truncation of `reg->var_off.value` in check_mem_access() - 32-bit truncation in check_stack_boundary() Make sure that any integer math cannot overflow by not allowing pointer math with large values. Also reduce the scope of "scalar op scalar" tracking. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | bpf: don't prune branches when a scalar is replaced with a pointerJann Horn2017-12-211-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This could be made safe by passing through a reference to env and checking for env->allow_ptr_leaks, but it would only work one way and is probably not worth the hassle - not doing it will not directly lead to program rejection. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | bpf: force strict alignment checks for stack pointersJann Horn2017-12-211-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Force strict alignment checks for stack pointers because the tracking of stack spills relies on it; unaligned stack accesses can lead to corruption of spilled registers, which is exploitable. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | bpf: fix missing error return in check_stack_boundary()Jann Horn2017-12-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent indirect stack accesses at non-constant addresses, which would permit reading and corrupting spilled pointers. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | bpf: fix 32-bit ALU op verificationJann Horn2017-12-211-11/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 32-bit ALU ops operate on 32-bit values and have 32-bit outputs. Adjust the verifier accordingly. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | bpf: fix incorrect tracking of register size truncationJann Horn2017-12-211-17/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Properly handle register truncation to a smaller size. The old code first mirrors the clearing of the high 32 bits in the bitwise tristate representation, which is correct. But then, it computes the new arithmetic bounds as the intersection between the old arithmetic bounds and the bounds resulting from the bitwise tristate representation. Therefore, when coerce_reg_to_32() is called on a number with bounds [0xffff'fff8, 0x1'0000'0007], the verifier computes [0xffff'fff8, 0xffff'ffff] as bounds of the truncated number. This is incorrect: The truncated number could also be in the range [0, 7], and no meaningful arithmetic bounds can be computed in that case apart from the obvious [0, 0xffff'ffff]. Starting with v4.14, this is exploitable by unprivileged users as long as the unprivileged_bpf_disabled sysctl isn't set. Debian assigned CVE-2017-16996 for this issue. v2: - flip the mask during arithmetic bounds calculation (Ben Hutchings) v3: - add CVE number (Ben Hutchings) Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | bpf: fix incorrect sign extension in check_alu_op()Jann Horn2017-12-211-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Distinguish between BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit) and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit); only perform sign extension in the first case. Starting with v4.14, this is exploitable by unprivileged users as long as the unprivileged_bpf_disabled sysctl isn't set. Debian assigned CVE-2017-16995 for this issue. v3: - add CVE number (Ben Hutchings) Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | bpf/verifier: fix bounds calculation on BPF_RSHEdward Cree2017-12-211-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Incorrect signed bounds were being computed. If the old upper signed bound was positive and the old lower signed bound was negative, this could cause the new upper signed bound to be too low, leading to security issues. Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> [jannh@google.com: changed description to reflect bug impact] Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2017-12-181-0/+6
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2017-12-17 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a corner case in generic XDP where we have non-linear skbs but enough tailroom in the skb to not miss to linearizing there, from Song. 2) Fix BPF JIT bugs in s390x and ppc64 to not recache skb data when BPF context is not skb, from Daniel. 3) Fix a BPF JIT bug in sparc64 where recaching skb data after helper call would use the wrong register for the skb, from Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | bpf: guarantee r1 to be ctx in case of bpf_helper_changes_pkt_dataDaniel Borkmann2017-12-151-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some JITs don't cache skb context on stack in prologue, so when LD_ABS/IND is used and helper calls yield bpf_helper_changes_pkt_data() as true, then they temporarily save/restore skb pointer. However, the assumption that skb always has to be in r1 is a bit of a gamble. Right now it turned out to be true for all helpers listed in bpf_helper_changes_pkt_data(), but lets enforce that from verifier side, so that we make this a guarantee and bail out if the func proto is misconfigured in future helpers. In case of BPF helper calls from cBPF, bpf_helper_changes_pkt_data() is completely unrelevant here (since cBPF is context read-only) and therefore always false. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2017-12-171-12/+17
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single bugfix which prevents arbitrary sigev_notify values in posix-timers" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-timer: Properly check sigevent->sigev_notify
| * | | posix-timer: Properly check sigevent->sigev_notifyThomas Gleixner2017-12-151-12/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | timer_create() specifies via sigevent->sigev_notify the signal delivery for the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD and (SIGEV_SIGNAL | SIGEV_THREAD_ID). The sanity check in good_sigevent() is only checking the valid combination for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is not set it accepts any random value. This has no real effects on the posix timer and signal delivery code, but it affects show_timer() which handles the output of /proc/$PID/timers. That function uses a string array to pretty print sigev_notify. The access to that array has no bound checks, so random sigev_notify cause access beyond the array bounds. Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID masking from various code pathes as SIGEV_NONE can never be set in combination with SIGEV_THREAD_ID. Reported-by: Eric Biggers <ebiggers3@gmail.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: stable@vger.kernel.org
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2017-12-152-7/+14
|\ \ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Clamp timeouts to INT_MAX in conntrack, from Jay Elliot. 2) Fix broken UAPI for BPF_PROG_TYPE_PERF_EVENT, from Hendrik Brueckner. 3) Fix locking in ieee80211_sta_tear_down_BA_sessions, from Johannes Berg. 4) Add missing barriers to ptr_ring, from Michael S. Tsirkin. 5) Don't advertise gigabit in sh_eth when not available, from Thomas Petazzoni. 6) Check network namespace when delivering to netlink taps, from Kevin Cernekee. 7) Kill a race in raw_sendmsg(), from Mohamed Ghannam. 8) Use correct address in TCP md5 lookups when replying to an incoming segment, from Christoph Paasch. 9) Add schedule points to BPF map alloc/free, from Eric Dumazet. 10) Don't allow silly mtu values to be used in ipv4/ipv6 multicast, also from Eric Dumazet. 11) Fix SKB leak in tipc, from Jon Maloy. 12) Disable MAC learning on OVS ports of mlxsw, from Yuval Mintz. 13) SKB leak fix in skB_complete_tx_timestamp(), from Willem de Bruijn. 14) Add some new qmi_wwan device IDs, from Daniele Palmas. 15) Fix static key imbalance in ingress qdisc, from Jiri Pirko. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) net: qcom/emac: Reduce timeout for mdio read/write net: sched: fix static key imbalance in case of ingress/clsact_init error net: sched: fix clsact init error path ip_gre: fix wrong return value of erspan_rcv net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support pkt_sched: Remove TC_RED_OFFLOADED from uapi net: sched: Move to new offload indication in RED net: sched: Add TCA_HW_OFFLOAD net: aquantia: Increment driver version net: aquantia: Fix typo in ethtool statistics names net: aquantia: Update hw counters on hw init net: aquantia: Improve link state and statistics check interval callback net: aquantia: Fill in multicast counter in ndev stats from hardware net: aquantia: Fill ndev stat couters from hardware net: aquantia: Extend stat counters to 64bit values net: aquantia: Fix hardware DMA stream overload on large MRRS net: aquantia: Fix actual speed capabilities reporting sock: free skb in skb_complete_tx_timestamp on error s390/qeth: update takeover IPs after configuration change s390/qeth: lock IP table while applying takeover changes ...
| * | | bpf: add schedule points to map alloc/freeEric Dumazet2017-12-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While using large percpu maps, htab_map_alloc() can hold cpu for hundreds of ms. This patch adds cond_resched() calls to percpu alloc/free call sites, all running in process context. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | bpf: fix corruption on concurrent perf_event_output callsDaniel Borkmann2017-12-121-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When tracing and networking programs are both attached in the system and both use event-output helpers that eventually call into perf_event_output(), then we could end up in a situation where the tracing attached program runs in user context while a cls_bpf program is triggered on that same CPU out of softirq context. Since both rely on the same per-cpu perf_sample_data, we could potentially corrupt it. This can only ever happen in a combination of the two types; all tracing programs use a bpf_prog_active counter to bail out in case a program is already running on that CPU out of a different context. XDP and cls_bpf programs by themselves don't have this issue as they run in the same context only. Therefore, split both perf_sample_data so they cannot be accessed from each other. Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data") Reported-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Song Liu <songliubraving@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | | | Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2017-12-152-627/+38
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Misc fixes: - Fix a S390 boot hang that was caused by the lock-break logic. Remove lock-break to begin with, as review suggested it was unreasonably fragile and our confidence in its continued good health is lower than our confidence in its removal. - Remove the lockdep cross-release checking code for now, because of unresolved false positive warnings. This should make lockdep work well everywhere again. - Get rid of the final (and single) ACCESS_ONCE() straggler and remove the API from v4.15. - Fix a liblockdep build warning" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/lib/lockdep: Add missing declaration of 'pr_cont()' checkpatch: Remove ACCESS_ONCE() warning compiler.h: Remove ACCESS_ONCE() tools/include: Remove ACCESS_ONCE() tools/perf: Convert ACCESS_ONCE() to READ_ONCE() locking/lockdep: Remove the cross-release locking checks locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y locking/core: Fix deadlock during boot on systems with GENERIC_LOCKBREAK
| * | | | locking/lockdep: Remove the cross-release locking checksIngo Molnar2017-12-121-617/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y), while it found a number of old bugs initially, was also causing too many false positives that caused people to disable lockdep - which is arguably a worse overall outcome. If we disable cross-release by default but keep the code upstream then in practice the most likely outcome is that we'll allow the situation to degrade gradually, by allowing entropy to introduce more and more false positives, until it overwhelms maintenance capacity. Another bad side effect was that people were trying to work around the false positives by uglifying/complicating unrelated code. There's a marked difference between annotating locking operations and uglifying good code just due to bad lock debugging code ... This gradual decrease in quality happened to a number of debugging facilities in the kernel, and lockdep is pretty complex already, so we cannot risk this outcome. Either cross-release checking can be done right with no false positives, or it should not be included in the upstream kernel. ( Note that it might make sense to maintain it out of tree and go through the false positives every now and then and see whether new bugs were introduced. ) Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=yWill Deacon2017-12-121-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_GENERIC_LOCKBEAK=y, locking structures grow an extra int ->break_lock field which is used to implement raw_spin_is_contended() by setting the field to 1 when waiting on a lock and clearing it to zero when holding a lock. However, there are a few problems with this approach: - There is a write-write race between a CPU successfully taking the lock (and subsequently writing break_lock = 0) and a waiter waiting on the lock (and subsequently writing break_lock = 1). This could result in a contended lock being reported as uncontended and vice-versa. - On machines with store buffers, nothing guarantees that the writes to break_lock are visible to other CPUs at any particular time. - READ_ONCE/WRITE_ONCE are not used, so the field is potentially susceptible to harmful compiler optimisations, Consequently, the usefulness of this field is unclear and we'd be better off removing it and allowing architectures to implement raw_spin_is_contended() by providing a definition of arch_spin_is_contended(), as they can when CONFIG_GENERIC_LOCKBREAK=n. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1511894539-7988-3-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/core: Fix deadlock during boot on systems with GENERIC_LOCKBREAKWill Deacon2017-12-121-4/+4
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: a8a217c22116 ("locking/core: Remove {read,spin,write}_can_lock()") removed the definition of raw_spin_can_lock(), causing the GENERIC_LOCKBREAK spin_lock() routines to poll the ->break_lock field when waiting on a lock. This has been reported to cause a deadlock during boot on s390, because the ->break_lock field is also set by the waiters, and can potentially remain set indefinitely if no other CPUs come in to take the lock after it has been released. This patch removes the explicit spinning on ->break_lock from the waiters, instead relying on the outer trylock() operation to determine when the lock is available. Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: a8a217c22116 ("locking/core: Remove {read,spin,write}_can_lock()") Link: http://lkml.kernel.org/r/1511894539-7988-2-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2017-12-152-12/+18
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two fixes: a crash fix for an ARM SoC platform, and kernel-doc warnings fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Do not pull from current CPU if only one CPU to pull sched/core: Fix kernel-doc warnings after code movement
| * | | | sched/rt: Do not pull from current CPU if only one CPU to pullSteven Rostedt2017-12-151-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Wagner reported a crash on the BeagleBone Black SoC. This is a single CPU architecture, and does not have a functional arch_send_call_function_single_ipi() implementation which can crash the kernel if that is called. As it only has one CPU, it shouldn't be called, but if the kernel is compiled for SMP, the push/pull RT scheduling logic now calls it for irq_work if the one CPU is overloaded, it can use that function to call itself and crash the kernel. Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system only has a single CPU. But SCHED_FEAT is a constant if sched debugging is turned off. Another fix can also be used, and this should also help with normal SMP machines. That is, do not initiate the pull code if there's only one RT overloaded CPU, and that CPU happens to be the current CPU that is scheduling in a lower priority task. Even on a system with many CPUs, if there's many RT tasks waiting to run on a single CPU, and that CPU schedules in another RT task of lower priority, it will initiate the PULL logic in case there's a higher priority RT task on another CPU that is waiting to run. But if there is no other CPU with waiting RT tasks, it will initiate the RT pull logic on itself (as it still has RT tasks waiting to run). This is a wasted effort. Not only does this help with SMP code where the current CPU is the only one with RT overloaded tasks, it should also solve the issue that Daniel encountered, because it will prevent the PULL logic from executing, as there's only one CPU on the system, and the check added here will cause it to exit the RT pull code. Reported-by: Daniel Wagner <wagi@monom.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: 4bdced5c9 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | sched/core: Fix kernel-doc warnings after code movementRandy Dunlap2017-12-111-11/+11
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following kernel-doc warnings after code restructuring: ../kernel/sched/core.c:5113: warning: No description found for parameter 't' ../kernel/sched/core.c:5113: warning: Excess function parameter 'interval' description in 'sched_rr_get_interval' get rid of set_fs()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: abca5fc535a3e ("sched_rr_get_interval(): move compat to native, Link: http://lkml.kernel.org/r/995c6ded-b32e-bbe4-d9f5-4d42d121aff1@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge tag 'trace-v4.15-rc1' of ↵Linus Torvalds2017-12-144-32/+20
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various fix-ups: - comment fixes - build fix - better memory alloction (don't use NR_CPUS) - configuration fix - build warning fix - enhanced callback parameter (to simplify users of trace hooks) - give up on stack tracing when RCU isn't watching (it's a lost cause)" * tag 'trace-v4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have stack trace not record if RCU is not watching tracing: Pass export pointer as argument to ->write() ring-buffer: Remove unused function __rb_data_page_index() tracing: make PREEMPTIRQ_EVENTS depend on TRACING tracing: Allocate mask_str buffer dynamically tracing: always define trace_{irq,preempt}_{enable_disable} tracing: Fix code comments in trace.c
| * | | | tracing: Have stack trace not record if RCU is not watchingSteven Rostedt (VMware)2017-12-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The stack tracer records a stack dump whenever it sees a stack usage that is more than what it ever saw before. This can happen at any function that is being traced. If it happens when the CPU is going idle (or other strange locations), RCU may not be watching, and in this case, the recording of the stack trace will trigger a warning. There's been lots of efforts to make hacks to allow stack tracing to proceed even if RCU is not watching, but this only causes more issues to appear. Simply do not trace a stack if RCU is not watching. It probably isn't a bad stack anyway. Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | tracing: Pass export pointer as argument to ->write()Felipe Balbi2017-12-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By passing an export descriptor to the write function, users don't need to keep a global static pointer and can rely on container_of() to fetch their own structure. Link: http://lkml.kernel.org/r/20170602102025.5140-1-felipe.balbi@linux.intel.com Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Chunyan Zhang <zhang.chunyan@linaro.org> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | ring-buffer: Remove unused function __rb_data_page_index()Matthias Kaehlcke2017-12-041-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the following warning when building with clang: kernel/trace/ring_buffer.c:1842:1: error: unused function '__rb_data_page_index' [-Werror,-Wunused-function] Link: http://lkml.kernel.org/r/20170518001415.5223-1-mka@chromium.org Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | tracing: make PREEMPTIRQ_EVENTS depend on TRACINGArnd Bergmann2017-12-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_TRACING is disabled, the new preemptirq events tracer produces a build failure: In file included from kernel/trace/trace_irqsoff.c:17:0: kernel/trace/trace.h: In function 'trace_test_and_set_recursion': kernel/trace/trace.h:542:28: error: 'struct task_struct' has no member named 'trace_recursion' Adding an explicit dependency avoids the broken configuration. Link: http://lkml.kernel.org/r/20171103104031.270375-1-arnd@arndb.de Fixes: d59158162e03 ("tracing: Add support for preempt and irq enable/disable events") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | tracing: Allocate mask_str buffer dynamicallyChangbin Du2017-12-041-20/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The default NR_CPUS can be very large, but actual possible nr_cpu_ids usually is very small. For my x86 distribution, the NR_CPUS is 8192 and nr_cpu_ids is 4. About 2 pages are wasted. Most machines don't have so many CPUs, so define a array with NR_CPUS just wastes memory. So let's allocate the buffer dynamically when need. With this change, the mutext tracing_cpumask_update_lock also can be removed now, which was used to protect mask_str. Link: http://lkml.kernel.org/r/1512013183-19107-1-git-send-email-changbin.du@intel.com Fixes: 36dfe9252bd4c ("ftrace: make use of tracing_cpumask") Cc: stable@vger.kernel.org Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | | | tracing: Fix code comments in trace.cChunyu Hu2017-12-041-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Naming in code comments for tracing_snapshot, tracing_snapshot_alloc and trace_pid_filter_add_remove_task don't match the real function names. And latency_trace has been removed from tracing directory. Fix them. Link: http://lkml.kernel.org/r/1508394753-20887-1-git-send-email-chuhu@redhat.com Fixes: cab5037 ("tracing/ftrace: Enable snapshot function trigger") Fixes: 886b5b7 ("tracing: remove /debug/tracing/latency_trace") Signed-off-by: Chunyu Hu <chuhu@redhat.com> [ Replaced /sys/kernel/debug/tracing with /sys/kerne/tracing ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | | | | arch: define weak abort()Sudip Mukherjee2017-12-141-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc toggle -fisolate-erroneous-paths-dereference (default at -O2 onwards) isolates faulty code paths such as null pointer access, divide by zero etc. If gcc port doesnt implement __builtin_trap, an abort() is generated which causes kernel link error. In this case, gcc is generating abort due to 'divide by zero' in lib/mpi/mpih-div.c. Currently 'frv' and 'arc' are failing. Previously other arch was also broken like m32r was fixed by commit d22e3d69ee1a ("m32r: fix build failure"). Let's define this weak function which is common for all arch and fix the problem permanently. We can even remove the arch specific 'abort' after this is done. Link: http://lkml.kernel.org/r/1513118956-8718-1-git-send-email-sudipm.mukherjee@gmail.com Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | kernel: make groups_sort calling a responsibility group_info allocatorsThiago Rafael Becker2017-12-142-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In testing, we found that nfsd threads may call set_groups in parallel for the same entry cached in auth.unix.gid, racing in the call of groups_sort, corrupting the groups for that entry and leading to permission denials for the client. This patch: - Make groups_sort globally visible. - Move the call to groups_sort to the modifiers of group_info - Remove the call to groups_sort from set_groups Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NeilBrown <neilb@suse.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | kcov: fix comparison callback signatureDmitry Vyukov2017-12-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a silly copy-paste bug. We truncated u32 args to u16. Link: http://lkml.kernel.org/r/20171207101134.107168-1-dvyukov@google.com Fixes: ded97d2c2b2c ("kcov: support comparison operands collection") Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: syzkaller@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'for-4.15-fixes' of ↵Linus Torvalds2017-12-112-4/+8
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Prateek posted a couple patches to fix a deadlock involving cpuset and workqueue. It unfortunately caused a different deadlock and the recent workqueue hotplug simplification removed the original deadlock, so Prateek's two patches are reverted for now. - The new stat code was missing u64_stats initialization. Fixed. - Doc and other misc changes * 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: add warning about RT not being supported on cgroup2 Revert "cgroup/cpuset: remove circular dependency deadlock" Revert "cpuset: Make cpuset hotplug synchronous" cgroup: properly init u64_stats debug cgroup: use task_css_set instead of rcu_dereference cpuset: Make cpuset hotplug synchronous cgroup/cpuset: remove circular dependency deadlock
| * | | | | Revert "cgroup/cpuset: remove circular dependency deadlock"Tejun Heo2017-12-041-30/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit aa24163b2ee5c92120e32e99b5a93143a0f4258e. This and the following commit led to another circular locking scenario and the scenario which is fixed by this commit no longer exists after e8b3f8db7aad ("workqueue/hotplug: simplify workqueue_offline_cpu()") which removes work item flushing from hotplug path. Revert it for now. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | Revert "cpuset: Make cpuset hotplug synchronous"Tejun Heo2017-12-043-20/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 1599a185f0e6113be185b9fb809c621c73865829. This and the previous commit led to another circular locking scenario and the scenario which is fixed by this commit no longer exists after e8b3f8db7aad ("workqueue/hotplug: simplify workqueue_offline_cpu()") which removes work item flushing from hotplug path. Revert it for now. Signed-off-by: Tejun Heo <tj@kernel.org>
OpenPOWER on IntegriCloud