summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* s390/dasd: Simplify code in format logicJan Höppner2016-03-071-27/+6
| | | | | | | | | | | | | | | | | Currently, dasd_format is calling the format logic of a DASD discipline with PAV enabled. If that fails with an error code of -EAGAIN the value of retries is decremented and the discipline function is called with PAV turned off. The loop is supposed to try this up to 255 times until success. However, -EAGAIN can only occur once here and therefore the loop will never reach the 255 retries. So, replace the unnecessarily complicated loop logic and simply try again without PAV enabled in case of an -EAGAIN error. Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/dasd: Improve dasd format codeJan Höppner2016-03-072-58/+59
| | | | | | | | | | | | | - Make sure a calling function can rely on data in fdata by resetting to its initial values - Move special treatment for track 0 and 1 to dasd_eckd_build_format - Replace dangerous backward goto with a loop logic - Add define for number that specifies the maximum amount of CCWs per request and is used for format_step calculation - Remove unused variable Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/percpu: remove this_cpu_cmpxchg_double_4Heiko Carstens2016-03-021-1/+0
| | | | | | | | | | | | | | | | | git commit 26f15caaf993 ("s390/cmpxchg: simplify cmpxchg_double") removed support for cmpxchg_double for two consecutive four byte values, for which it would generate a cds instruction. However I forgot to remove the corresponding define in our percpu header file, which means that this_cpu_cmpxchg_double would now incorrectly generate a cdsg instruction if being used on a double four byte location. Therefore remove the percpu define as well. There is currently no user and therefore no bug fixed with this. Obviously any such user could and should simply use cmpxchg. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/cpumf: Improve guest detection heuristicsChristian Borntraeger2016-03-022-3/+7
| | | | | | | | | | | | | | | | | | | | | | commit e22cf8ca6f75 ("s390/cpumf: rework program parameter setting to detect guest samples") requires guest changes to get proper guest/host. We can do better: We can use the primary asn value, which is set on all Linux variants to compare this with the host pp value. We now have the following cases: 1. Guest using PP host sample: gpp == 0, asn == hpp --> host guest sample: gpp != 0 --> guest 2. Guest not using PP host sample: gpp == 0, asn == hpp --> host guest sample: gpp == 0, asn != hpp --> guest As soon as the host no longer sets CR4, we must back out this heuristics - let's add a comment in switch_to. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/fault: merge report_user_fault implementationsHeiko Carstens2016-03-023-26/+12
| | | | | | | | | We have two close to identical report_user_fault functions. Add a parameter to one and get rid of the other one in order to reduce code duplication. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/dis: use correct escape sequence for '%' characterHeiko Carstens2016-03-021-12/+5
| | | | | | | | | | | | | | | | The double escape character sequence introduced with commit 272fa59ccb4f ("s390/dis: Fix handling of format specifiers") is not necessary anymore since commit 561e10300269 ("s390/dis: Fix printing of the register numbers"). Instead this now generates an extra '%' character: lg %%r1,160(%%r11) So fix this and basically revert 272fa59ccb4f. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/kvm: simplify set_guest_storage_keyMartin Schwidefsky2016-03-021-17/+0
| | | | | | | | | | | | | | | | | | | | | | | | Git commit ab3f285f227fec62868037e9b1b1fd18294a83b8 "KVM: s390/mm: try a cow on read only pages for key ops" added a fixup_user_fault to set_guest_storage_key force a copy on write if the page is mapped read-only. This is supposed to fix the problem of differing storage keys for shared mappings, e.g. the empty_zero_page. But if the storage key is set before the pte is mapped the storage key update is done on the pgste. A later fault will happily map the shared page with the key from the pgste. Eventually git commit 2faee8ff9dc6f4bfe46f6d2d110add858140fb20 "s390/mm: prevent and break zero page mappings in case of storage keys" fixed this problem for the empty_zero_page. The commit makes sure that guests enabled for storage keys will not use the empty_zero_page at all. As the call to fixup_user_fault in set_guest_storage_key depends on the order of the storage key operation vs. the fault that maps the pte it does not really fix anything. Just remove it. Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/oprofile: add z13/z13s model numbersHeiko Carstens2016-02-241-0/+1
| | | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390: add z13s model number to z13 elf platformHeiko Carstens2016-02-242-4/+5
| | | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/mm: correct comment about segment table entriesMartin Schwidefsky2016-02-231-5/+5
| | | | | | | | | The comment describing the bit encoding for segment table entries is incorrect in regard to the read and write bits. The segment read bit is 0x0002 and write is 0x0001, not the other way around. Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/dumpstack: merge all four stack tracersHeiko Carstens2016-02-237-218/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have four different stack tracers of which three had bugs. So it's time to merge them to a single stack tracer which allows to specify a call back function which will be called for each step. This patch changes behavior a bit: - the "nosched" and "in_sched_functions" check within save_stack_trace_tsk did work only for the last stack frame within a context. Now it considers the check for each stack frame like it should. - both the oprofile variant and the perf_events variant did save a return address twice if a zero back chain was detected, which indicates an interrupt frame. The new dump_trace function will call the oprofile and perf_events backends with the psw address that is contained within the corresponding pt_regs structure instead. - the original show_trace and save_context_stack functions did already use the psw address of the pt_regs structure if a zero back chain was detected. However now we ignore the psw address if it is a user space address. After all we trace the kernel stack and not the user space stack. This way we also get rid of the garbage user space address in case of warnings and / or panic call traces. So this should make life easier since now there is only one stack tracer left which we can break. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/mm: remove unnecessary indirection with pgste_update_allMartin Schwidefsky2016-02-231-13/+12
| | | | | | | The first parameter of pgste_update_all is a pointer to a pte. Simplify the code by passing the pte value. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/dasd: fix incorrect locking order for LCU device add/removeStefan Haberland2016-02-231-83/+152
| | | | | | | | | | | | | | | | The correct lock order for LCU lock and cdev lock is to take the cdev lock first and afterwards the LCU lock. This is caused by the fact that LCU functions are called in an interrupt context with the cdev lock implicitly hold by CIO. To assure the right locking order but also be able to iterate over devices in a LCU introduce a trylock block that can be called with the device lock for one device hold and then takes the LCU lock and try to lock all devices accounted to this LCU. Afterwards all devices and the LCU itself are locked. Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/dumpstack: use bit fields to decode psw mask in show_registers()Heiko Carstens2016-02-231-13/+4
| | | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/dumpstack: add missing ri bit to show_registers() outputHeiko Carstens2016-02-231-1/+2
| | | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390: add current_stack_pointer() helper functionHeiko Carstens2016-02-234-16/+20
| | | | | | | | | | Implement current_stack_pointer() helper function and use it everywhere, instead of having several different inline assembly variants. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/stacktrace: use nosched instead of savesched parameterHeiko Carstens2016-02-231-6/+6
| | | | | | | | Use the inversed "nosched" logic like all other architectures. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/pageattr: do a single TLB flush for change_page_attrMartin Schwidefsky2016-02-231-5/+3
| | | | | | | | | | | | | | | | | The change of the access rights for an address range in the kernel address space is currently done with a loop of IPTE + a store of the modified PTE. Between the IPTE and the store the PTE will be invalid, this intermediate state can cause problems with concurrent accesses. Consider a change of a kernel area from read-write to read-only, a concurrent reader of that area should be fine but with the invalid PTE it might get an unexpected exception. Remove the IPTEs for each PTE and do a global flush after all PTEs have been modified. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/xor: optimized xor routing using the XC instructionMartin Schwidefsky2016-02-233-2/+155
| | | | | Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/pci: remove pdev pointer from arch dataSebastian Ott2016-02-236-23/+33
| | | | | | | | | | | | | | | | | | | | For each PCI function we need to maintain arch specific data in struct zpci_dev which also contains a pointer to struct pci_dev. When a function is registered or deregistered (which is triggered by PCI common code) we need to adjust that pointer which could interfere with the machine check handler (triggered by FW) using zpci_dev->pdev. Since multiple instances of the same pdev could exist at a time this can't be solved with locking. Fix that by ditching the pdev pointer and use a bus walk to reach struct pci_dev (only one instance of a pdev can be registered at the bus at a time). Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/fpu: signals vs. floating point control registerMartin Schwidefsky2016-02-221-0/+2
| | | | | | | | | | | | | git commit 904818e2f229f3d94ec95f6932a6358c81e73d78 "s390/kernel: introduce fpu-internal.h with fpu helper functions" introduced the fpregs_store / fp_regs_load helper. These function fail to save and restore the floating pointer control registers. The effect is that the FPC is not correctly handled on signal delivery and signal return. Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* s390/compat: correct restore of high gprs on signal returnMartin Schwidefsky2016-02-221-1/+1
| | | | | | | | | | | | | | | | git commit 8070361799ae1e3f4ef347bd10f0a508ac10acfb "s390: add support for vector extension" broke 31-bit compat processes in regard to signal handling. The restore_sigregs_ext32() function is used to restore the additional elements from the user space signal frame. Among the additional elements are the upper registers halves for 64-bit register support for 31-bit processes. The copy_from_user that is used to retrieve the high-gprs array from the user stack uses an incorrect length, 8 bytes instead of 64 bytes. This causes incorrect upper register halves to get loaded. Cc: stable@vger.kernel.org # 3.8+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* Linux 4.5-rc5v4.5-rc5Linus Torvalds2016-02-201-1/+1
|
* Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2016-02-2017-134/+500
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This is unusually large, partly due to the EFI fixes that prevent accidental deletion of EFI variables through efivarfs that may brick machines. These fixes are somewhat involved to maintain compatibility with existing install methods and other usage modes, while trying to turn off the 'rm -rf' bricking vector. Other fixes are for large page ioremap()s and for non-temporal user-memcpy()s" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix vmalloc_fault() to handle large pages properly hpet: Drop stale URLs x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable lib/ucs2_string: Correct ucs2 -> utf8 conversion efi: Add pstore variables to the deletion whitelist efi: Make efivarfs entries immutable by default efi: Make our variable validation list include the guid efi: Do variable name validation tests in utf8 efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version lib/ucs2_string: Add ucs2 -> utf8 helper functions
| * x86/mm: Fix vmalloc_fault() to handle large pages properlyToshi Kani2016-02-181-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A kernel page fault oops with the callstack below was observed when a read syscall was made to a pmem device after a huge amount (>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64 system: BUG: unable to handle kernel paging request at ffff880840000ff8 IP: vmalloc_fault+0x1be/0x300 PGD c7f03a067 PUD 0 Oops: 0000 [#1] SM Call Trace: __do_page_fault+0x285/0x3e0 do_page_fault+0x2f/0x80 ? put_prev_entity+0x35/0x7a0 page_fault+0x28/0x30 ? memcpy_erms+0x6/0x10 ? schedule+0x35/0x80 ? pmem_rw_bytes+0x6a/0x190 [nd_pmem] ? schedule_timeout+0x183/0x240 btt_log_read+0x63/0x140 [nd_btt] : ? __symbol_put+0x60/0x60 ? kernel_read+0x50/0x80 SyS_finit_module+0xb9/0xf0 entry_SYSCALL_64_fastpath+0x1a/0xa4 Since v4.1, ioremap() supports large page (pud/pmd) mappings in x86_64 and PAE. vmalloc_fault() however assumes that the vmalloc range is limited to pte mappings. vmalloc faults do not normally happen in ioremap'd ranges since ioremap() sets up the kernel page tables, which are shared by user processes. pgd_ctor() sets the kernel's PGD entries to user's during fork(). When allocation of the vmalloc ranges crosses a 512GB boundary, ioremap() allocates a new pud table and updates the kernel PGD entry to point it. If user process's PGD entry does not have this update yet, a read/write syscall to the range will cause a vmalloc fault, which hits the Oops above as it does not handle a large page properly. Following changes are made to vmalloc_fault(). 64-bit: - No change for the PGD sync operation as it handles large pages already. - Add pud_huge() and pmd_huge() to the validation code to handle large pages. - Change pud_page_vaddr() to pud_pfn() since an ioremap range is not directly mapped (while the if-statement still works with a bogus addr). - Change pmd_page() to pmd_pfn() since an ioremap range is not backed by struct page (while the if-statement still works with a bogus addr). 32-bit: - No change for the sync operation since the index3 PGD entry covers the entire vmalloc range, which is always valid. (A separate change to sync PGD entry is necessary if this memory layout is changed regardless of the page size.) - Add pmd_huge() to the validation code to handle large pages. This is for completeness since vmalloc_fault() won't happen in ioremap'd ranges as its PGD entry is always valid. Reported-by: Henning Schild <henning.schild@siemens.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: <stable@vger.kernel.org> # 4.1+ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * hpet: Drop stale URLsMichael S. Tsirkin2016-02-173-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Looks like the HPET spec at intel.com got moved. It isn't hard to find so drop the link, just mention the revision assumed. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/1455145462-3877-1-git-send-email-mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in ↵Toshi Kani2016-02-171-4/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __copy_user_nocache() Data corruption issues were observed in tests which initiated a system crash/reset while accessing BTT devices. This problem is reproducible. The BTT driver calls pmem_rw_bytes() to update data in pmem devices. This interface calls __copy_user_nocache(), which uses non-temporal stores so that the stores to pmem are persistent. __copy_user_nocache() uses non-temporal stores when a request size is 8 bytes or larger (and is aligned by 8 bytes). The BTT driver updates the BTT map table, which entry size is 4 bytes. Therefore, updates to the map table entries remain cached, and are not written to pmem after a crash. Change __copy_user_nocache() to use non-temporal store when a request size is 4 bytes. The change extends the current byte-copy path for a less-than-8-bytes request, and does not add any overhead to the regular path. Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com> Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: <stable@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com [ Small readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/uaccess/64: Make the __copy_user_nocache() assembly code more readableToshi Kani2016-02-171-41/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add comments to __copy_user_nocache() to clarify its procedures and alignment requirements. Also change numeric branch target labels to named local labels. No code changed: arch/x86/lib/copy_user_64.o: text data bss dec hex filename 1239 0 0 1239 4d7 copy_user_64.o.before 1239 0 0 1239 4d7 copy_user_64.o.after md5: 58bed94c2db98c1ca9a2d46d0680aaae copy_user_64.o.before.asm 58bed94c2db98c1ca9a2d46d0680aaae copy_user_64.o.after.asm Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: <stable@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: brian.boylston@hpe.com Cc: dan.j.williams@intel.com Cc: linux-nvdimm@lists.01.org Cc: micah.parrish@hpe.com Cc: ross.zwisler@linux.intel.com Cc: vishal.l.verma@intel.com Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com [ Small readability edits and added object file comparison. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Merge tag 'efi-urgent' of ↵Ingo Molnar2016-02-162-7/+8
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI bug fixes from Matt Fleming: * Fix bugs in our code that converts ucs2 strings to utf8 where we unintentionally drop bits from the original string (Jason Andryuk) * Add the efi-pstore variables to the variable whitelist so that users can continue to delete them via efivarfs without needing to manipulate the immutable flag (Matt Fleming) Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * lib/ucs2_string: Correct ucs2 -> utf8 conversionJason Andryuk2016-02-161-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comparisons should be >= since 0x800 and 0x80 require an additional bit to store. For the 3 byte case, the existing shift would drop off 2 more bits than intended. For the 2 byte case, there should be 5 bits bits in byte 1, and 6 bits in byte 2. Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Cc: Peter Jones <pjones@redhat.com> Cc: Matthew Garrett <mjg59@coreos.com> Cc: "Lee, Chun-Yi" <jlee@suse.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
| | * efi: Add pstore variables to the deletion whitelistMatt Fleming2016-02-161-0/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Laszlo explains why this is a good idea, 'This is because the pstore filesystem can be backed by UEFI variables, and (for example) a crash might dump the last kilobytes of the dmesg into a number of pstore entries, each entry backed by a separate UEFI variable in the above GUID namespace, and with a variable name according to the above pattern. Please see "drivers/firmware/efi/efi-pstore.c". While this patch series will not prevent the user from deleting those UEFI variables via the pstore filesystem (i.e., deleting a pstore fs entry will continue to delete the backing UEFI variable), I think it would be nice to preserve the possibility for the sysadmin to delete Linux-created UEFI variables that carry portions of the crash log, *without* having to mount the pstore filesystem.' There's also no chance of causing machines to become bricked by deleting these variables, which is the whole purpose of excluding things from the whitelist. Use the LINUX_EFI_CRASH_GUID guid and a wildcard '*' for the match so that we don't have to update the string in the future if new variable name formats are created for crash dump variables. Reported-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Peter Jones <pjones@redhat.com> Tested-by: Peter Jones <pjones@redhat.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: "Lee, Chun-Yi" <jlee@suse.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
| * Merge tag 'efi-urgent' of ↵Ingo Molnar2016-02-1612-83/+383
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fixes from Matt Fleming: * Prevent accidental deletion of EFI variables through efivarfs that may brick machines. We use a whitelist of known-safe variables to allow things like installing distributions to work out of the box, and instead restrict vendor-specific variable deletion by making non-whitelist variables immutable (Peter Jones) Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * efi: Make efivarfs entries immutable by defaultPeter Jones2016-02-109-41/+258
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "rm -rf" is bricking some peoples' laptops because of variables being used to store non-reinitializable firmware driver data that's required to POST the hardware. These are 100% bugs, and they need to be fixed, but in the mean time it shouldn't be easy to *accidentally* brick machines. We have to have delete working, and picking which variables do and don't work for deletion is quite intractable, so instead make everything immutable by default (except for a whitelist), and make tools that aren't quite so broad-spectrum unset the immutable flag. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
| | * efi: Make our variable validation list include the guidPeter Jones2016-02-103-22/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All the variables in this list so far are defined to be in the global namespace in the UEFI spec, so this just further ensures we're validating the variables we think we are. Including the guid for entries will become more important in future patches when we decide whether or not to allow deletion of variables based on presence in this list. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
| | * efi: Do variable name validation tests in utf8Peter Jones2016-02-101-11/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Actually translate from ucs2 to utf8 before doing the test, and then test against our other utf8 data, instead of fudging it. Signed-off-by: Peter Jones <pjones@redhat.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
| | * efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad versionPeter Jones2016-02-102-23/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Translate EFI's UCS-2 variable names to UTF-8 instead of just assuming all variable names fit in ASCII. Signed-off-by: Peter Jones <pjones@redhat.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
| | * lib/ucs2_string: Add ucs2 -> utf8 helper functionsPeter Jones2016-02-102-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds ucs2_utf8size(), which tells us how big our ucs2 string is in bytes, and ucs2_as_utf8, which translates from ucs2 to utf8.. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
* | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2016-02-202-3/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A handful of CPU hotplug related fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Plug potential memory leak in CPU_UP_PREPARE perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state perf/core: Remove bogus UP_CANCELED hotplug state perf/x86/amd/uncore: Plug reference leak
| * | | perf/core: Plug potential memory leak in CPU_UP_PREPAREThomas Gleixner2016-02-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If CPU_UP_PREPARE is called it is not guaranteed, that a previously allocated and assigned hash has been freed already, but perf_event_init_cpu() unconditionally allocates and assignes a new hash if the swhash is referenced. By overwriting the pointer the existing hash is not longer accessible. Verify that there is no hash assigned on this cpu before allocating and assigning a new one. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160209201007.843269966@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug stateThomas Gleixner2016-02-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If CPU_DOWN_PREPARE fails the perf hotplug notifier is called for CPU_DOWN_FAILED and calls perf_event_init_cpu(), which checks whether the swhash is referenced. If yes it allocates a new hash and stores the pointer in the per cpu data structure. But at this point the cpu is still online, so there must be a valid hash already. By overwriting the pointer the existing hash is not longer accessible. Remove the CPU_DOWN_FAILED state, as there is nothing to (re)allocate. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160209201007.763417379@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | perf/core: Remove bogus UP_CANCELED hotplug stateThomas Gleixner2016-02-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If CPU_UP_PREPARE fails the perf hotplug code calls perf_event_exit_cpu(), which is a pointless exercise. The cpu is not online, so the smp function calls return -ENXIO. So the result is a list walk to call noops. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160209201007.682184765@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | perf/x86/amd/uncore: Plug reference leakThomas Gleixner2016-02-171-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the error path of amd_uncore_cpu_up_prepare() the newly allocated uncore struct is freed, but the percpu pointer still references it. Set it to NULL. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1602162302170.19512@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge tag 'powerpc-4.5-3' of ↵Linus Torvalds2016-02-2014-7/+91
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix build error on 32-bit with checkpoint restart from Aneesh Kumar - Fix dedotify for binutils >= 2.26 from Andreas Schwab - Don't trace hcalls on offline CPUs from Denis Kirjanov - eeh: Fix stale cached primary bus from Gavin Shan - eeh: Fix stale PE primary bus from Gavin Shan - mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V - ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy * tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/ioda: Set "read" permission when "write" is set powerpc/mm: Fix Multi hit ERAT cause by recent THP update powerpc/powernv: Fix stale PE primary bus powerpc/eeh: Fix stale cached primary bus powerpc/pseries: Don't trace hcalls on offline CPUs powerpc: Fix dedotify for binutils >= 2.26 powerpc/book3s_32: Fix build error with checkpoint restart
| * | | | powerpc/ioda: Set "read" permission when "write" is setAlexey Kardashevskiy2016-02-171-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quite often drivers set only "write" permission assuming that this includes "read" permission as well and this works on plenty of platforms. However IODA2 is strict about this and produces an EEH when "read" permission is not set and reading happens. This adds a workaround in the IODA code to always add the "read" bit when the "write" bit is set. Fixes: 10b35b2b7485 ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE") Cc: stable@vger.kernel.org # 4.2+ Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/mm: Fix Multi hit ERAT cause by recent THP updateAneesh Kumar K.V2016-02-154-0/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With ppc64 we use the deposited pgtable_t to store the hash pte slot information. We should not withdraw the deposited pgtable_t without marking the pmd none. This ensure that low level hash fault handling will skip this huge pte and we will handle them at upper levels. Recent change to pmd splitting changed the above in order to handle the race between pmd split and exit_mmap. The race is explained below. Consider following race: CPU0 CPU1 shrink_page_list() add_to_swap() split_huge_page_to_list() __split_huge_pmd_locked() pmdp_huge_clear_flush_notify() // pmd_none() == true exit_mmap() unmap_vmas() zap_pmd_range() // no action on pmd since pmd_none() == true pmd_populate() As result the THP will not be freed. The leak is detected by check_mm(): BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512 The above required us to not mark pmd none during a pmd split. The fix for ppc is to clear the huge pte of _PAGE_USER, so that low level fault handling code skip this pte. At higher level we do take ptl lock. That should serialze us against the pmd split. Once the lock is acquired we do check the pmd again using pmd_same. That should always return false for us and hence we should retry the access. We do the pmd_same check in all case after taking plt with THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and huge_pmd_set_accessed) Also make sure we wait for irq disable section in other cpus to finish before flipping a huge pte entry with a regular pmd entry. Code paths like find_linux_pte_or_hugepte depend on irq disable to get a stable pte_t pointer. A parallel thp split need to make sure we don't convert a pmd pte to a regular pmd entry without waiting for the irq disable section to finish. Fixes: eef1b3ba053a ("thp: implement split_huge_pmd()") Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/powernv: Fix stale PE primary busGavin Shan2016-02-153-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When PCI bus is unplugged during full hotplug for EEH recovery, the platform PE instance (struct pnv_ioda_pe) isn't released and it dereferences the stale PCI bus that has been released. It leads to kernel crash when referring to the stale PCI bus. This fixes the issue by correcting the PE's primary bus when it's oneline at plugging time, in pnv_pci_dma_bus_setup() which is to be called by pcibios_fixup_bus(). Cc: stable@vger.kernel.org # v4.1+ Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reported-by: Pradipta Ghosh <pradghos@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/eeh: Fix stale cached primary busGavin Shan2016-02-154-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When PE is created, its primary bus is cached to pe->bus. At later point, the cached primary bus is returned from eeh_pe_bus_get(). However, we could get stale cached primary bus and run into kernel crash in one case: full hotplug as part of fenced PHB error recovery releases all PCI busses under the PHB at unplugging time and recreate them at plugging time. pe->bus is still dereferencing the PCI bus that was released. This adds another PE flag (EEH_PE_PRI_BUS) to represent the validity of pe->bus. pe->bus is updated when its first child EEH device is online and the flag is set. Before unplugging in full hotplug for error recovery, the flag is cleared. Fixes: 8cdb2833 ("powerpc/eeh: Trace PCI bus from PE") Cc: stable@vger.kernel.org #v3.11+ Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reported-by: Pradipta Ghosh <pradghos@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/pseries: Don't trace hcalls on offline CPUsDenis Kirjanov2016-02-151-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a cpu is hotplugged while the hcall trace points are active, it's possible to hit a warning from RCU due to the trace points calling into RCU from an offline cpu, eg: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 Make the hypervisor tracepoints conditional by using TRACE_EVENT_FN_COND. Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc: Fix dedotify for binutils >= 2.26Andreas Schwab2016-02-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since binutils 2.26 BFD is doing suffix merging on STRTAB sections. But dedotify modifies the symbol names in place, which can also modify unrelated symbols with a name that matches a suffix of a dotted name. To remove the leading dot of a symbol name we can just increment the pointer into the STRTAB section instead. Backport to all stables to avoid breakage when people update their binutils - mpe. Cc: stable@vger.kernel.org Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/book3s_32: Fix build error with checkpoint restartAneesh Kumar K.V2016-01-311-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In file included from mm/vmscan.c:54:0: include/linux/swapops.h: In function ‘pte_to_swp_entry’: include/linux/swapops.h:69:2: error: implicit declaration of function ‘pte_swp_soft_dirty’ [-Werror=implicit-function-declaration] if (pte_swp_soft_dirty(pte)) ^ include/linux/swapops.h:70:3: error: implicit declaration of function ‘pte_swp_clear_soft_dirty’ [-Werror=implicit-function-declaration] pte = pte_swp_clear_soft_dirty(pte); We support soft dirty tracking only with book3s 64 for now. So change the Kconfig dependency accordingly. Also CHECKPOINT_RESTORE feature is not really dependent on SOFT_DIRTY. We track the dependency between MEM_SOFT_DIRTY and ARCH_SOFT_DIRTY through headers Fixes: 7207f43665b8 ("powerpc/mm: Add page soft dirty tracking") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
OpenPOWER on IntegriCloud