summaryrefslogtreecommitdiffstats
path: root/arch/s390/kernel
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | Merge tag 'nmiforkvm' of ↵Martin Schwidefsky2017-06-283-11/+89
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into features Pull kvm patches from Christian Borntraeger: "s390,kvm: provide plumbing for machines checks when running guests" This provides the basic plumbing for handling machine checks when running guests
| | * | | | KVM: s390: Backup the guest's machine check infoQingFeng Hao2017-06-271-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a machine check happens in the guest, related mcck info (mcic, external damage code, ...) is stored in the vcpu's lowcore on the host. Then the machine check handler's low-level part is executed, followed by the high-level part. If the high-level part's execution is interrupted by a new machine check happening on the same vcpu on the host, the mcck info in the lowcore is overwritten with the new machine check's data. If the high-level part's execution is scheduled to a different cpu, the mcck info in the lowcore is uncertain. Therefore, for both cases, the further reinjection to the guest will use the wrong data. Let's backup the mcck info in the lowcore to the sie page for further reinjection, so that the right data will be used. Add new member into struct sie_page to store related machine check's info of mcic, failing storage address and external damage code. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | | s390/nmi: s390: New low level handling for machine check happening in guestQingFeng Hao2017-06-273-11/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the logic to check if the machine check happens when the guest is running. If yes, set the exit reason -EINTR in the machine check's interrupt handler. Refactor s390_do_machine_check to avoid panicing the host for some kinds of machine checks which happen when guest is running. Reinject the instruction processing damage's machine checks including Delayed Access Exception instead of damaging the host if it happens in the guest because it could be caused by improper update on TLB entry or other software case and impacts the guest only. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | | | | s390/fpu: export save_fpu_regs for all configsMartin Schwidefsky2017-06-131-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The save_fpu_regs function is a general API that is supposed to be usable for modules as well. Remove the #ifdef that hides the symbol for CONFIG_KVM=n. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390/kvm: avoid global config of vm.alloc_pgste=1Martin Schwidefsky2017-06-131-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The system control vm.alloc_pgste is used to control the size of the page tables, either 2K or 4K. The idea is that a KVM host sets the vm.alloc_pgste control to 1 which causes *all* new processes to run with 4K page tables. For a non-kvm system the control should stay off to save on memory used for page tables. Trouble is that distributions choose to set the control globally to be able to run KVM guests. This wastes memory on non-KVM systems. Introduce the PT_S390_PGSTE ELF segment type to "mark" the qemu executable with it. All executables with this (empty) segment in its ELF phdr array will be started with 4K page tables. Any executable without PT_S390_PGSTE will run with the default 2K page tables. This removes the need to set vm.alloc_pgste=1 for a KVM host and minimizes the waste of memory for page tables. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390: rename struct psw_bits membersHeiko Carstens2017-06-123-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename a couple of the struct psw_bits members so it is more obvious for what they are good. Initially I thought using the single character names from the PoP would be sufficient and obvious, but admittedly that is not true. The current implementation is not easy to use, if one has to look into the source file to figure out which member represents the 'per' bit (which is the 'r' member). Therefore rename the members to sane names that are identical to the uapi psw mask defines: r -> per i -> io e -> ext t -> dat m -> mcheck w -> wait p -> pstate Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390: rename psw_bits enumsHeiko Carstens2017-06-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The address space enums that must be used when modifying the address space part of a psw with the psw_bits() macro can easily be confused with the psw defines that are used to mask and compare directly the mask part of a psw. We have e.g. PSW_AS_PRIMARY vs PSW_ASC_PRIMARY. To avoid confusion rename the PSW_AS_* enums to PSW_BITS_AS_*. In addition also rename the PSW_AMODE_* enums, so they also follow the same naming scheme: PSW_BITS_AMODE_*. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390/ipl: revert Load Normal semantics for LPAR CCW-type re-IPLHeiko Carstens2017-06-121-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts the two commits 7afbeb6df2aa ("s390/ipl: always use load normal for CCW-type re-IPL") 0f7451ff3ab8 ("s390/ipl: use load normal for LPAR re-ipl") The two commits did not take into account that behavior of standby memory changes fundamentally if the re-IPL method is changed from Load Clear to Load Normal. In case of the old re-IPL clear method all memory that was initially in standby state will be put into standby state again within the re-IPL process. Or in other words: memory that was brought online before a re-IPL will be offline again after a reboot. Given that we use different re-IPL methods depending on the hypervisor and CCW-type vs SCSI re-IPL it is not easy to tell in advance when and why memory will stay online or will be offline after a re-IPL. This does also have other side effects, since memory that is online from the beginning will be in ZONE_NORMAL by default vs ZONE_MOVABLE for memory that is offline. Therefore, before the change, a user could online and offline memory easily since standby memory was always in ZONE_NORMAL. After the change, and a re-IPL, this depended on which memory parts were online before the re-IPL. From a usability point of view the current behavior is more than suboptimal. Therefore revert these changes until we have a better solution and get back to a consistent behavior. The bad thing about this is that the time required for a re-IPL will be significantly increased for configurations with several 100GB or 1TB of memory. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390/dumpstack: remove raw stack dumpHeiko Carstens2017-06-121-25/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove raw stack dumps that are printed before call traces in case of a warning, or the 'l' sysrq trigger (show a stack backtrace for all active CPUs). Besides that a raw stack dump should not be shown for the 'l' sysrq trigger the value of the dump is close to zero. That's also why we don't print it in case of a panic since ages anymore. That this is still printed on warnings is just a leftover. So get rid of this completely. The following won't be printed anymore with this change: Stack: 00000000bbc4fbc8 00000000bbc4fc58 0000000000000003 0000000000000000 00000000bbc4fcf8 00000000bbc4fc70 00000000bbc4fc70 0000000000000020 000000007fe00098 00000000bfe8be00 00000000bbc4fe94 000000000000000a 000000000000000c 00000000bbc4fcc0 0000000000000000 0000000000000000 000000000095b930 0000000000113366 00000000bbc4fc58 00000000bbc4fca0 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390/perf: fix null string in perf list pmu commandThomas Richter2017-06-121-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Command 'perf list pmu' displays events which contain an invalid string "(null)=xxx", where xxx is the pmu event name, for example: cpum_cf/AES_BLOCKED_CYCLES,(null)=AES_BLOCKED_CYCLES/ This is not correct, the invalid string should not be displayed at all. It is caused by an obsolete term in the sysfs attribute file for each s390 CPUMF counter event. Reading from the sysfs file also displays the event name. Fix this by omitting the event name. This patch makes s390 CPUMF sysfs files consistent with other plattforms. This is an interface change between user and kernel but does not break anything. Reading from a counter event sysfs file should only list terms mentioned in the /sys/bus/event_source/devices/<cpumf>/format directory. Name is not listed. Reported-by: Zvonko Kosic <zvonko.kosic@de.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390/ptrace: guarded storage regset for the current taskMartin Schwidefsky2017-06-121-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The regset functions for guarded storage are supposed to work on the current task as well. For task == current add the required load and store instructions for the guarded storage control block. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390/smp: fix false positive kmemleak of mcesa data structureChristian Borntraeger2017-06-121-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I get number of CPUs - 1 kmemleak hits like unreferenced object 0x37ec6f000 (size 1024): comm "swapper/0", pid 1, jiffies 4294937330 (age 889.690s) hex dump (first 32 bytes): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk backtrace: [<000000000034a848>] kmem_cache_alloc+0x2b8/0x3d0 [<00000000001164de>] __cpu_up+0x456/0x488 [<000000000016f60c>] bringup_cpu+0x4c/0xd0 [<000000000016d5d2>] cpuhp_invoke_callback+0xe2/0x9e8 [<000000000016f3c6>] cpuhp_up_callbacks+0x5e/0x110 [<000000000016f988>] _cpu_up+0xe0/0x158 [<000000000016faf0>] do_cpu_up+0xf0/0x110 [<0000000000dae1ee>] smp_init+0x126/0x130 [<0000000000d9bd04>] kernel_init_freeable+0x174/0x2e0 [<000000000089fc62>] kernel_init+0x2a/0x148 [<00000000008adce2>] kernel_thread_starter+0x6/0xc [<00000000008adcdc>] kernel_thread_starter+0x0/0xc [<ffffffffffffffff>] 0xffffffffffffffff The pointer of this data structure is stored in the prefix page of that CPU together with some extra bits ORed into the the low bits. Mark the data structure as non-leak. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390/vdso: use _install_special_mapping to establish vdsoMartin Schwidefsky2017-06-121-27/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch to the improved _install_special_mapping function to install the vdso mapping. This has two advantages, the arch_vma_name function is not needed anymore and the vdso vma still has its name after its memory location has been changed with mremap. Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390/cputime: simplify account_system_index_scaledMartin Schwidefsky2017-06-121-9/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The account_system_index_scaled gets two cputime values, a raw value derived from CPU timer deltas and a scaled value. The scaled value is always calculated from the raw value, the code can be simplified by moving the scale_vtime call into account_system_index_scaled. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | | | s390: add missing header includes for type checkingHeiko Carstens2017-06-121-0/+1
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add missing include statements to make sure that prototypes match implementation. As reported by sparse: arch/s390/crypto/arch_random.c:18:1: warning: symbol 's390_arch_random_available' was not declared. Should it be static? arch/s390/kernel/traps.c:279:13: warning: symbol 'trap_init' was not declared. Should it be static? Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | | | | Merge tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuidLinus Torvalds2017-07-031-1/+1
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull uuid subsystem from Christoph Hellwig: "This is the new uuid subsystem, in which Amir, Andy and I have started consolidating our uuid/guid helpers and improving the types used for them. Note that various other subsystems have pulled in this tree, so I'd like it to go in early. UUID/GUID summary: - introduce the new uuid_t/guid_t types that are going to replace the somewhat confusing uuid_be/uuid_le types and make the terminology fit the various specs, as well as the userspace libuuid library. (me, based on a previous version from Amir) - consolidated generic uuid/guid helper functions lifted from XFS and libnvdimm (Amir and me) - conversions to the new types and helpers (Amir, Andy and me)" * tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid: (34 commits) ACPI: hns_dsaf_acpi_dsm_guid can be static mmc: sdhci-pci: make guid intel_dsm_guid static uuid: Take const on input of uuid_is_null() and guid_is_null() thermal: int340x_thermal: fix compile after the UUID API switch thermal: int340x_thermal: Switch to use new generic UUID API acpi: always include uuid.h ACPI: Switch to use generic guid_t in acpi_evaluate_dsm() ACPI / extlog: Switch to use new generic UUID API ACPI / bus: Switch to use new generic UUID API ACPI / APEI: Switch to use new generic UUID API acpi, nfit: Switch to use new generic UUID API MAINTAINERS: add uuid entry tmpfs: generate random sb->s_uuid scsi_debug: switch to uuid_t nvme: switch to uuid_t sysctl: switch to use uuid_t partitions/ldm: switch to use uuid_t overlayfs: use uuid_t instead of uuid_be fs: switch ->s_uuid to uuid_t ima/policy: switch to use uuid_t ...
| * | | | S390/sysinfo: use uuid_is_null instead of opencoding itChristoph Hellwig2017-06-051-1/+1
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | And switch to use uuid_t instead of the old uuid_be type. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
* | | | arch: remove unused macro/function thread_saved_pc()Tobias Klauser2017-06-281-25/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only user of thread_saved_pc() in non-arch-specific code was removed in commit 8243d5597793 ("sched/core: Remove pointless printout in sched_show_task()"). Remove the implementations as well. Some architectures use thread_saved_pc() in their arch-specific code. Leave their thread_saved_pc() intact. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | s390/ipl: revert Load Normal semantics for LPAR CCW-type re-IPLHeiko Carstens2017-06-141-6/+1
| |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts the two commits 7afbeb6df2aa ("s390/ipl: always use load normal for CCW-type re-IPL") 0f7451ff3ab8 ("s390/ipl: use load normal for LPAR re-ipl") The two commits did not take into account that behavior of standby memory changes fundamentally if the re-IPL method is changed from Load Clear to Load Normal. In case of the old re-IPL clear method all memory that was initially in standby state will be put into standby state again within the re-IPL process. Or in other words: memory that was brought online before a re-IPL will be offline again after a reboot. Given that we use different re-IPL methods depending on the hypervisor and CCW-type vs SCSI re-IPL it is not easy to tell in advance when and why memory will stay online or will be offline after a re-IPL. This does also have other side effects, since memory that is online from the beginning will be in ZONE_NORMAL by default vs ZONE_MOVABLE for memory that is offline. Therefore, before the change, a user could online and offline memory easily since standby memory was always in ZONE_NORMAL. After the change, and a re-IPL, this depended on which memory parts were online before the re-IPL. From a usability point of view the current behavior is more than suboptimal. Therefore revert these changes until we have a better solution and get back to a consistent behavior. The bad thing about this is that the time required for a re-IPL will be significantly increased for configurations with several 100GB or 1TB of memory. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | | s390/kvm: do not rely on the ILC on kvm host protection faulsChristian Borntraeger2017-05-171-6/+13
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | For most cases a protection exception in the host (e.g. copy on write or dirty tracking) on the sie instruction will indicate an instruction length of 4. Turns out that there are some corner cases (e.g. runtime instrumentation) where this is not necessarily true and the ILC is unpredictable. Let's replace our 4 byte rewind_pad with 3 byte nops to prepare for all possible ILCs. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2017-05-164-8/+33
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: - convert the debug feature to refcount_t - reduce the copy size for strncpy_from_user - 8 bug fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/virtio: change virtio_feature_desc:features type to __le32 s390: convert debug_info.ref_count from atomic_t to refcount_t s390: move _text symbol to address higher than zero s390/qdio: increase string buffer size s390/ccwgroup: increase string buffer size s390/topology: let topology_mnest_limit() return unsigned char s390/uaccess: use sane length for __strncpy_from_user() s390/uprobes: fix compile for !KPROBES s390/ftrace: fix compile for !MODULES s390/cputime: fix incorrect system time
| * s390: convert debug_info.ref_count from atomic_t to refcount_tElena Reshetova2017-05-111-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390: move _text symbol to address higher than zeroHeiko Carstens2017-05-091-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The perf tool assumes that kernel symbols are never present at address zero. In fact it assumes if functions that map symbols to addresses return zero, that the symbol was not found. Given that s390's _text symbol historically is located at address zero this yields at least a couple of false errors and warnings in one of perf's test cases about not present symbols ("perf test 1"). To fix this simply move the _text symbol to address 0x200, just behind the initial psw and channel program located at the beginning of the kernel image. This is now hard coded within the linker script. I tried a nicer solution which moves the initial psw and channel program into an own section. However that would move the symbols within the "real" head.text section to different addresses, since the ".org" statements within head.S are relative to the head.text section. If there is a new section in front, everything else will be moved. Alternatively I could have adjusted all ".org" statements. But this current solution seems to be the easiest one, since nobody really cares where the _text symbol is actually located. Reported-by: Zvonko Kosic <zkosic@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/ftrace: fix compile for !MODULESHeiko Carstens2017-05-031-0/+4
| | | | | | | | | | | | | | | | | | | | | | Fix this compile error if CONFIG_MODULES is disabled: arch/s390/built-in.o: In function `ftrace_plt_init': arch/s390/kernel/ftrace.o:(.init.text+0x34cc): undefined reference to `module_alloc' Reported-by: Rob Landley <rob@landley.net> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/cputime: fix incorrect system timeMartin Schwidefsky2017-05-031-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git commit c5328901aa1db134 "[S390] entry[64].S improvements" removed the update of the exit_timer lowcore field from the critical section cleanup of the .Lsysc_restore/.Lsysc_done and .Lio_restore/.Lio_done blocks. If the PSW is updated by the critical section cleanup to point to user space again, the interrupt entry code will do a vtime calculation after the cleanup completed with an exit_timer value which has *not* been updated. Due to this incorrect system time deltas are calculated. If an interrupt occured with an old PSW between .Lsysc_restore/.Lsysc_done or .Lio_restore/.Lio_done update __LC_EXIT_TIMER with the system entry time of the interrupt. Cc: stable@vger.kernel.org # 3.3+ Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | s390: use set_memory.h headerLaura Abbott2017-05-083-1/+3
| | | | | | | | | | | | | | | | | | | | | | set_memory_* functions have moved to set_memory.h. Switch to this explicitly Link: http://lkml.kernel.org/r/1488920133-27229-5-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | cpumask: make "nr_cpumask_bits" unsignedAlexey Dobriyan2017-05-081-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bit searching functions accept "unsigned long" indices but "nr_cpumask_bits" is "int" which is signed, so inevitable sign extensions occur on x86_64. Those MOVSX are #1 MOVSX bloat by number of uses across whole kernel. Change "nr_cpumask_bits" to unsigned, this number can't be negative after all. It allows to do implicit zero-extension on x86_64 without MOVSX. Change signed comparisons into unsigned comparisons where necessary. Other uses looks fine because it is either argument passed to a function or comparison is already unsigned. Net win on allyesconfig type of kernel: ~2.8 KB (!) add/remove: 0/0 grow/shrink: 8/725 up/down: 93/-2926 (-2833) function old new delta xen_exit_mmap 691 735 +44 qstat_read 426 440 +14 __cpufreq_cooling_register 1678 1687 +9 trace_rb_cpu_prepare 447 455 +8 vermagic 54 60 +6 nfp_driver_version 54 60 +6 rcu_torture_stats_print 1147 1151 +4 find_next_push_cpu 267 269 +2 xen_irq_resume 961 960 -1 ... init_vp_index 946 906 -40 od_set_powersave_bias 328 281 -47 power_cpu_exit 193 139 -54 arch_show_interrupts 3538 3484 -54 select_idle_sibling 1558 1471 -87 Total: Before=158358910, After=158356077, chg -0.00% Same arguments apply to "nr_cpu_ids" but I haven't yet found enough courage to delve into this issue (and proper fix may require new type "cpu_t" which is whole separate story). Link: http://lkml.kernel.org/r/20170309205322.GA1728@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2017-05-021-1/+29
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatch updates from Jiri Kosina: - a per-task consistency model is being added for architectures that support reliable stack dumping (extending this, currently rather trivial set, is currently in the works). This extends the nature of the types of patches that can be applied by live patching infrastructure. The code stems from the design proposal made [1] back in November 2014. It's a hybrid of SUSE's kGraft and RH's kpatch, combining advantages of both: it uses kGraft's per-task consistency and syscall barrier switching combined with kpatch's stack trace switching. There are also a number of fallback options which make it quite flexible. Most of the heavy lifting done by Josh Poimboeuf with help from Miroslav Benes and Petr Mladek [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz - module load time patch optimization from Zhou Chengming - a few assorted small fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add missing printk newlines livepatch: Cancel transition a safe way for immediate patches livepatch: Reduce the time of finding module symbols livepatch: make klp_mutex proper part of API livepatch: allow removal of a disabled patch livepatch: add /proc/<pid>/patch_state livepatch: change to a per-task consistency model livepatch: store function sizes livepatch: use kstrtobool() in enabled_store() livepatch: move patching functions into patch.c livepatch: remove unnecessary object loaded check livepatch: separate enabled and patched states livepatch/s390: add TIF_PATCH_PENDING thread flag livepatch/s390: reorganize TIF thread flag bits livepatch/powerpc: add TIF_PATCH_PENDING thread flag livepatch/x86: add TIF_PATCH_PENDING thread flag livepatch: create temporary klp_update_patch_state() stub x86/entry: define _TIF_ALLWORK_MASK flags explicitly stacktrace/x86: add function for detecting reliable stack traces
| * livepatch/s390: add TIF_PATCH_PENDING thread flagMiroslav Benes2017-03-081-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update a task's patch state when returning from a system call or user space interrupt, or after handling a signal. This greatly increases the chances of a patch operation succeeding. If a task is I/O bound, it can be patched when returning from a system call. If a task is CPU bound, it can be patched when returning from an interrupt. If a task is sleeping on a to-be-patched function, the user can send SIGSTOP and SIGCONT to force it to switch. Since there are two ways the syscall can be restarted on return from a signal handling process, it is important to clear the flag before do_signal() is called. Otherwise we could miss the migration if we used SIGSTOP/SIGCONT procedure or fake signal to migrate patching blocking tasks. If we place our hook to sysc_work label in entry before TIF_SIGPENDING is evaluated we kill two birds with one stone. The task is correctly migrated in all return paths from a syscall. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | Merge branch 'for-linus' of ↵Linus Torvalds2017-05-0224-142/+761
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - three merges for KVM/s390 with changes for vfio-ccw and cpacf. The patches are included in the KVM tree as well, let git sort it out. - add the new 'trng' random number generator - provide the secure key verification API for the pkey interface - introduce the z13 cpu counters to perf - add a new system call to set up the guarded storage facility - simplify TASK_SIZE and arch_get_unmapped_area - export the raw STSI data related to CPU topology to user space - ... and the usual churn of bug-fixes and cleanups. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (74 commits) s390/crypt: use the correct module alias for paes_s390. s390/cpacf: Introduce kma instruction s390/cpacf: query instructions use unique parameters for compatibility with KMA s390/trng: Introduce s390 TRNG device driver. s390/crypto: Provide s390 specific arch random functionality. s390/crypto: Add new subfunctions to the cpacf PRNO function. s390/crypto: Renaming PPNO to PRNO. s390/pageattr: avoid unnecessary page table splitting s390/mm: simplify arch_get_unmapped_area[_topdown] s390/mm: make TASK_SIZE independent from the number of page table levels s390/gs: add regset for the guarded storage broadcast control block s390/kvm: Add use_cmma field to mm_context_t s390/kvm: Add PGSTE manipulation functions vfio: ccw: improve error handling for vfio_ccw_mdev_remove vfio: ccw: remove unnecessary NULL checks of a pointer s390/spinlock: remove compare and delay instruction s390/spinlock: use atomic primitives for spinlocks s390/cpumf: simplify detection of guest samples s390/pci: remove forward declaration s390/pci: increase the PCI_NR_FUNCTIONS default ...
| * | s390/gs: add regset for the guarded storage broadcast control blockMartin Schwidefsky2017-04-211-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | The guarded storage interface allows to register a control block for each thread that is activated with the guarded storage broadcast event. To retrieve the complete state of a process from the kernel a register set for the stored broadcast control block is required. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/spinlock: remove compare and delay instructionMartin Schwidefsky2017-04-121-13/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CAD instruction never worked quite as expected for the spinlock code. It has been disabled by default with git commit 61b0b01686d48220, if the "cad" kernel parameter is specified it is enabled for both user space and the spinlock code. Leave the option to enable the instruction for user space but remove it from the spinlock code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/cpumf: simplify detection of guest samplesMartin Schwidefsky2017-04-053-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are three different code levels in regard to the identification of guest samples. They differ in the way the LPP instruction is used. 1) Old kernels without the LPP instruction. The guest program parameter is always zero. 2) Newer kernels load the process pid into the program parameter with LPP. The guest program parameter is non-zero if the guest executes in a process != idle. 3) The latest kernels load ((1UL << 31) | pid) with LPP to make the value non-zero even for the idle task. The guest program parameter is non-zero if the guest is running. All kernels load the process pid to CR4 on context switch. The CPU sampling code uses the value in CR4 to decide between guest and host samples in case the guest program parameter is zero. The three cases: 1) CR4==pid, gpp==0 2) CR4==pid, gpp==pid 3) CR4==pid, gpp==((1UL << 31) | pid) The load-control instruction to load the pid into CR4 is expensive and the goal is to remove it. To distinguish the host CR4 from the guest pid for the idle process the maximum value 0xffff for the PASN is used. This adds a fourth case for a guest OS with an updated kernel: 4) CR4==0xffff, gpp=((1UL << 31) | pid) The host kernel will have CR4==0xffff and will use (gpp!=0 || CR4!==0xffff) to identify guest samples. This works nicely with all 4 cases, the only possible issue would be a guest with an old kernel (gpp==0) and a process pid of 0xffff. Well, don't do that.. Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390: use 64-bit lctlg to load task pid to cr4 on context switchMartin Schwidefsky2017-04-051-1/+3
| | | | | | | | | | | | | | | | | | | | | The 32-bit lctl instruction is quite a bit slower than the 64-bit counter part lctlg. Use the faster instruction. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/cpum_cf: make hw_perf_event_update() a void functionHendrik Brueckner2017-03-311-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return code of hw_perf_event_update() is not evaluated by its callers. Hence, simplify the function by removing the return code. Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/cpum_cf: correct variable naming (cleanup)Hendrik Brueckner2017-03-311-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | Make clear that the event definitions relate to the counter facility (cf) and not to the sampling facility (sf). Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/cpum_cf: add IBM z13 counter event namesHendrik Brueckner2017-03-311-7/+123
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the event names for the IBM z13/z13s specific CPU-MF counters. Also improve the merging of the generic and model specific events so that their sysfs attribute definitions completely reside in memory. Hence, flagging the generic event attribute definitions as initdata too. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/cpum_cf: add support for the MT-diagnostic counter set (z13)Hendrik Brueckner2017-03-311-25/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | Complete the IBM z13 support and support counters from the MT-diagnostic counter set. Note that this counter set is available only if SMT is enabled. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/cpum_cf: cleanup event/counter validationHendrik Brueckner2017-03-311-30/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The validate_event() function just checked for reserved counters in particular CPU-MF counter sets. Because the number of counters in counter sets vary among different hardware models, remove the explicit check to tolerate new models. Reserved counters are not accounted and, thus, will return zero. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/cpum_cf: update counter numbers to ecctr limitsHendrik Brueckner2017-03-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Use the highest counter number that can be specified for the ecctr (extract CPU counter) instruction for perf. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/kdump: Add final noteMichael Holzheu2017-03-281-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since linux v3.14 with commit 38dfac843cb6d7be1 ("vmcore: prevent PT_NOTE p_memsz overflow during header update") on s390 we get the following message in the kdump kernel: Warning: Exceeded p_memsz, dropping PT_NOTE entry n_namesz=0x6b6b6b6b, n_descsz=0x6b6b6b6b The reason for this is that we don't create a final zero note in the ELF header which the proc/vmcore code uses to find out the end of the notes section (see also kernel/kexec_core.c:final_note()). It still worked on s390 by chance because we (most of the time?) have the byte pattern 0x6b6b6b6b after the notes section which also makes the notes parsing code stop in update_note_header_size_elf64() because 0x6b6b6b6b is interpreded as note size: if ((real_sz + sz) > max_sz) { pr_warn("Warning: Exceeded p_memsz, dropping P ...); break; } So fix this and add the missing final note to the ELF header. We don't have to adjust the memory size for ELF header ("alloc_size") because the new ELF note still fits into the 0x1000 base memory. Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/facilities: get rid of __ASSEMBLY__ in facility header fileHeiko Carstens2017-03-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need for the __ASSEMBLY__ ifdefery anymore since the architecture level set code that deals with facility bits was converted to C in the meantime. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/sysinfo: provide remaining stsi information via debugfsHeiko Carstens2017-03-221-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide the remaining stsi information via debugfs files. This also might be useful for debugging purposes. Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/sysinfo,topology: provide raw stsi 15,1,x data via debugfsHeiko Carstens2017-03-222-1/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide the raw stsi 15,1,x data contents via debugfs. This makes it much easier to debug unexpected scheduling domains on machines that provide cpu topology information. Therefore this file adds a new 's390/stsi' debugfs directory with a file for each possible topology nesting level that is allowed by the architecture. The files will be created regardless if the machine supports all, or any, level. If a level is not supported, or no data is available, user space can recognize this with a -EINVAL error code when trying to read such data. In addition a 'topology' symlink is created that points to the file that contains the data that is used to create the scheduling domains. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/debugfs: introduce top-level 's390' directoryHeiko Carstens2017-03-222-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a top-level 's390' directory which should be used when adding new s390 specific debug feature files and/or directories. This makes hopefully sure that the contents of the s390 directory will be a bit more structured. Right now we have a couple of top-level files where it is not easy to tell to which subsystem they belong to. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/dump_stack: use control program identification stringHeiko Carstens2017-03-221-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If running within a level 3 hypervisor, the hypervisor provides a SYSIB block which contains a control program indentifier string. Use this string instead of the simple KVM and z/VM strings only. In case of z/VM this provides addtional information: the z/VM version. The new string looks similar to this: Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/dump_stack: remove whitespace from arch descriptionHeiko Carstens2017-03-221-5/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arch description provided for the "Hardware name:" contains lots of extra whitespace due to the way the SYSIB contents are defined (strings aren't zero terminated). This looks a bit odd and therefore remove the extra whitespace characters. This also gives the opportunity to add more information, if required, without hitting the magic 80 characters per line limit. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/sysinfo: allow compiler warnings againHeiko Carstens2017-03-221-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow compiler warnings again for the sysinfo file. Compiler warnings were disabled when the bogomips calculation with math-emu code was introduced ("[S390] Calibrate delay and bogomips."). Since that code is gone, we can enable warnings again. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/topology: fix typo in early topology codeHeiko Carstens2017-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use MACHINE_FLAG_TOPOLOGY instead of MACHINE_HAS_TOPOLOGY when clearing the bit that indicates if the machine provides topology information (and if it should be used). Currently works anyway. Fixes: 68cc795d1933 ("s390/topology: make "topology=off" parameter work") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/topology: get rid of core mask arrayHeiko Carstens2017-03-221-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use a single long value instead of a single element array to represent the core mask. The array is a leftover from 32/31 bit code so we were able to use bitops helper functions. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
OpenPOWER on IntegriCloud