summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2015-11-1216-264/+365
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull second batch of kvm updates from Paolo Bonzini: "Four changes: - x86: work around two nasty cases where a benign exception occurs while another is being delivered. The endless stream of exceptions causes an infinite loop in the processor, which not even NMIs or SMIs can interrupt; in the virt case, there is no possibility to exit to the host either. - x86: support for Skylake per-guest TSC rate. Long supported by AMD, the patches mostly move things from there to common arch/x86/kvm/ code. - generic: remove local_irq_save/restore from the guest entry and exit paths when context tracking is enabled. The patches are a few months old, but we discussed them again at kernel summit. Andy will pick up from here and, in 4.5, try to remove it from the user entry/exit paths. - PPC: Two bug fixes, see merge commit 370289756becc for details" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: x86: rename update_db_bp_intercept to update_bp_intercept KVM: svm: unconditionally intercept #DB KVM: x86: work around infinite loop in microcode when #AC is delivered context_tracking: avoid irq_save/irq_restore on guest entry and exit context_tracking: remove duplicate enabled check KVM: VMX: Dump TSC multiplier in dump_vmcs() KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded KVM: VMX: Enable and initialize VMX TSC scaling KVM: x86: Use the correct vcpu's TSC rate to compute time scale KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc() KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset() KVM: x86: Replace call-back compute_tsc_offset() with a common function KVM: x86: Replace call-back set_tsc_khz() with a common function KVM: x86: Add a common TSC scaling function KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch KVM: x86: Collect information for setting TSC scaling ratio KVM: x86: declare a few variables as __read_mostly KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common KVM: PPC: Book3S HV: Don't dynamically split core when already split ...
| * Merge branch 'kvm-ppc-fixes' of ↵Paolo Bonzini2015-11-122-9/+13
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD "Paolo, I have two fixes for HV KVM which I would like to have included in v4.4-rc1. The first one is a fix for a bug identified by Red Hat which causes occasional guest crashes. The second one fixes a bug which causes host stalls and timeouts under certain circumstances when the host is configured for static 2-way micro-threading mode."
| | * KVM: PPC: Book3S HV: Don't dynamically split core when already splitPaul Mackerras2015-11-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In static micro-threading modes, the dynamic micro-threading code is supposed to be disabled, because subcores can't make independent decisions about what micro-threading mode to put the core in - there is only one micro-threading mode for the whole core. The code that implements dynamic micro-threading checks for this, except that the check was missed in one case. This means that it is possible for a subcore in static 2-way micro-threading mode to try to put the core into 4-way micro-threading mode, which usually leads to stuck CPUs, spinlock lockups, and other stalls in the host. The problem was in the can_split_piggybacked_subcores() function, which should always return false if the system is in a static micro-threading mode. This fixes the problem by making can_split_piggybacked_subcores() use subcore_config_ok() for its checks, as subcore_config_ok() includes the necessary check for the static micro-threading modes. Credit to Gautham Shenoy for working out that the reason for the hangs and stalls we were seeing was that we were trying to do dynamic 4-way micro-threading while we were in static 2-way mode. Fixes: b4deba5c41e9 Cc: vger@stable.kernel.org # v4.3 Signed-off-by: Paul Mackerras <paulus@samba.org>
| | * KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup failsPaul Mackerras2015-11-061-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When handling a hypervisor data or instruction storage interrupt (HDSI or HISI), we look up the SLB entry for the address being accessed in order to translate the effective address to a virtual address which can be looked up in the guest HPT. This lookup can occasionally fail due to the guest replacing an SLB entry without invalidating the evicted SLB entry. In this situation an ERAT (effective to real address translation cache) entry can persist and be used by the hardware even though there is no longer a corresponding SLB entry. Previously we would just deliver a data or instruction storage interrupt (DSI or ISI) to the guest in this case. However, this is not correct and has been observed to cause guests to crash, typically with a data storage protection interrupt on a store to the vmemmap area. Instead, what we do now is to synthesize a data or instruction segment interrupt. That should cause the guest to reload an appropriate entry into the SLB and retry the faulting instruction. If it still faults, we should find an appropriate SLB entry next time and be able to handle the fault. Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | KVM: x86: rename update_db_bp_intercept to update_bp_interceptPaolo Bonzini2015-11-104-4/+4
| | | | | | | | | | | | | | | | | | | | | Because #DB is now intercepted unconditionally, this callback only operates on #BP for both VMX and SVM. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: svm: unconditionally intercept #DBPaolo Bonzini2015-11-101-11/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is needed to avoid the possibility that the guest triggers an infinite stream of #DB exceptions (CVE-2015-8104). VMX is not affected: because it does not save DR6 in the VMCS, it already intercepts #DB unconditionally. Reported-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: work around infinite loop in microcode when #AC is deliveredEric Northup2015-11-103-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was found that a guest can DoS a host by triggering an infinite stream of "alignment check" (#AC) exceptions. This causes the microcode to enter an infinite loop where the core never receives another interrupt. The host kernel panics pretty quickly due to the effects (CVE-2015-5307). Signed-off-by: Eric Northup <digitaleric@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | context_tracking: avoid irq_save/irq_restore on guest entry and exitPaolo Bonzini2015-11-102-28/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | guest_enter and guest_exit must be called with interrupts disabled, since they take the vtime_seqlock with write_seq{lock,unlock}. Therefore, it is not necessary to check for exceptions, nor to save/restore the IRQ state, when context tracking functions are called by guest_enter and guest_exit. Split the body of context_tracking_entry and context_tracking_exit out to __-prefixed functions, and use them from KVM. Rik van Riel has measured this to speed up a tight vmentry/vmexit loop by about 2%. Cc: Andy Lutomirski <luto@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | context_tracking: remove duplicate enabled checkPaolo Bonzini2015-11-102-16/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All calls to context_tracking_enter and context_tracking_exit are already checking context_tracking_is_enabled, except the context_tracking_user_enter and context_tracking_user_exit functions left in for the benefit of assembly calls. Pull the check up to those functions, by making them simple wrappers around the user_enter and user_exit inline functions. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Rik van Riel <riel@redhat.com> Acked-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: VMX: Dump TSC multiplier in dump_vmcs()Haozhong Zhang2015-11-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | This patch enhances dump_vmcs() to dump the value of TSC multiplier field in VMCS. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSCHaozhong Zhang2015-11-101-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | This patch makes kvm-intel to return a scaled host TSC plus the TSC offset when handling guest readings to MSR_IA32_TSC. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: VMX: Setup TSC scaling ratio when a vcpu is loadedHaozhong Zhang2015-11-101-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes kvm-intel module to load TSC scaling ratio into TSC multiplier field of VMCS when a vcpu is loaded, so that TSC scaling ratio can take effect if VMX TSC scaling is enabled. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: VMX: Enable and initialize VMX TSC scalingHaozhong Zhang2015-11-102-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | This patch exhances kvm-intel module to enable VMX TSC scaling and collects information of TSC scaling ratio during initialization. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: Use the correct vcpu's TSC rate to compute time scaleHaozhong Zhang2015-11-101-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | This patch makes KVM use virtual_tsc_khz rather than the host TSC rate as vcpu's TSC rate to compute the time scale if TSC scaling is enabled. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()Haozhong Zhang2015-11-104-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | Both VMX and SVM scales the host TSC in the same way in call-back read_l1_tsc(), so this patch moves the scaling logic from call-back read_l1_tsc() to a common function kvm_read_l1_tsc(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()Haozhong Zhang2015-11-104-22/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For both VMX and SVM, if the 2nd argument of call-back adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale it first. This patch moves this common TSC scaling logic to its caller adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to adjust_tsc_offset_guest(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: Replace call-back compute_tsc_offset() with a common functionHaozhong Zhang2015-11-104-20/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | Both VMX and SVM calculate the tsc-offset in the same way, so this patch removes the call-back compute_tsc_offset() and replaces it with a common function kvm_compute_tsc_offset(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: Replace call-back set_tsc_khz() with a common functionHaozhong Zhang2015-11-105-59/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | Both VMX and SVM propagate virtual_tsc_khz in the same way, so this patch removes the call-back set_tsc_khz() and replaces it with a common function. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: Add a common TSC scaling functionHaozhong Zhang2015-11-105-45/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VMX and SVM calculate the TSC scaling ratio in a similar logic, so this patch generalizes it to a common TSC scaling function. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> [Inline the multiplication and shift steps into mul_u64_u64_shr. Remove BUG_ON. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_archHaozhong Zhang2015-11-103-16/+20
| | | | | | | | | | | | | | | | | | | | | | | | This patch moves the field of TSC scaling ratio from the architecture struct vcpu_svm to the common struct kvm_vcpu_arch. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: Collect information for setting TSC scaling ratioHaozhong Zhang2015-11-103-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The number of bits of the fractional part of the 64-bit TSC scaling ratio in VMX and SVM is different. This patch makes the architecture code to collect the number of fractional bits and other related information into variables that can be accessed in the common code. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: declare a few variables as __read_mostlyPaolo Bonzini2015-11-102-9/+7
| | | | | | | | | | | | | | | | | | | | | These include module parameters and variables that are set by kvm_x86_ops->hardware_setup. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_commonPaolo Bonzini2015-11-104-21/+10
| |/ | | | | | | | | | | | | | | | | They are exactly the same, except that handle_mmio_page_fault has an unused argument and a call to WARN_ON. Remove the unused argument from the callers, and move the warning to (the former) handle_mmio_page_fault_common. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge tag 'pm+acpi-4.4-rc1-2' of ↵Linus Torvalds2015-11-1235-209/+405
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management and ACPI updates from Rafael Wysocki: "The only new feature in this batch is support for the ACPI _CCA device configuration object, which it a pre-requisite for future ACPI PCI support on ARM64, but should not affect the other architectures. The rest is fixes and cleanups, mostly in cpufreq (including intel_pstate), the Operating Performace Points (OPP) framework and tools (cpupower and turbostat). Specifics: - Support for the ACPI _CCA configuration object intended to tell the OS whether or not a bus master device supports hardware managed cache coherency and a new set of functions to allow drivers to check the cache coherency support for devices in a platform firmware interface agnostic way (Suravee Suthikulpanit, Jeremy Linton). - ACPI backlight quirks for ESPRIMO Mobile M9410 and Dell XPS L421X (Aaron Lu, Hans de Goede). - Fixes for the arm_big_little and s5pv210-cpufreq cpufreq drivers (Jon Medhurst, Nicolas Pitre). - kfree()-related fixup for the recently introduced CPPC cpufreq frontend (Markus Elfring). - intel_pstate fix reducing kernel log noise on systems where P-states are managed by hardware (Prarit Bhargava). - intel_pstate maintainers information update (Srinivas Pandruvada). - cpufreq core optimization related to the handling of delayed work items used by governors (Viresh Kumar). - Locking fixes and cleanups of the Operating Performance Points (OPP) framework (Viresh Kumar). - Generic power domains framework cleanups (Lina Iyer). - cpupower tool updates (Jacob Tanenbaum, Sriram Raghunathan, Thomas Renninger). - turbostat tool updates (Len Brown)" * tag 'pm+acpi-4.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits) PCI: ACPI: Add support for PCI device DMA coherency PCI: OF: Move of_pci_dma_configure() to pci_dma_configure() of/pci: Fix pci_get_host_bridge_device leak device property: ACPI: Remove unused DMA APIs device property: ACPI: Make use of the new DMA Attribute APIs device property: Adding DMA Attribute APIs for Generic Devices ACPI: Adding DMA Attribute APIs for ACPI Device device property: Introducing enum dev_dma_attr ACPI: Honor ACPI _CCA attribute setting cpufreq: CPPC: Delete an unnecessary check before the function call kfree() PM / OPP: Add opp_rcu_lockdep_assert() to _find_device_opp() PM / OPP: Hold dev_opp_list_lock for writers PM / OPP: Protect updates to list_dev with mutex PM / OPP: Propagate error properly from dev_pm_opp_set_sharing_cpus() cpufreq: s5pv210-cpufreq: fix wrong do_div() usage MAINTAINERS: update for intel P-state driver Creating a common structure initialization pattern for struct option cpupower: Enable disabled Cstates if they are below max latency cpupower: Remove debug message when using cpupower idle-set -D switch cpupower: cpupower monitor reports uninitialized values for offline cpus ...
| * \ Merge branch 'pm-tools'Rafael J. Wysocki2015-11-1211-63/+88
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-tools: Creating a common structure initialization pattern for struct option cpupower: Enable disabled Cstates if they are below max latency cpupower: Remove debug message when using cpupower idle-set -D switch cpupower: cpupower monitor reports uninitialized values for offline cpus tools/power turbostat: bugfix: print MAX_NON_TURBO_RATIO tools/power turbostat: simplify Bzy_MHz calculation
| | * \ Merge branch 'turbostat' of ↵Rafael J. Wysocki2015-11-111-12/+18
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-tools Pull turbostat changes for v4.4 from Len Brown. * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: bugfix: print MAX_NON_TURBO_RATIO tools/power turbostat: simplify Bzy_MHz calculation
| | | * | tools/power turbostat: bugfix: print MAX_NON_TURBO_RATIOLen Brown2015-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=6 lock=0) should print all 7 bits of MAX_NON_TURBO_RATIO (in decimal): MSR_TURBO_ACTIVATION_RATIO: 0x00000016 (MAX_NON_TURBO_RATIO=22 lock=0) Signed-off-by: Len Brown <len.brown@intel.com>
| | | * | tools/power turbostat: simplify Bzy_MHz calculationLen Brown2015-10-191-11/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bzy_MHz = TSC_delta*tsc_tweak/APERF_delta/MPERF_delta/measurement_interval becomes Bzy_MHz = base_mhz/APERF_delta/MPERF_delta on systems which support MSR_NHM_PLATFORM_INFO. base_mhz is calculated directly from the base_ratio reported in MSR_NHM_PLATFORM_INFO * bclk, and bclk is discovered via MSR or cpuid. This reduces the dependency of Bzy_MHz calculation on the TSC. Previously, there were 4 TSC readings required in each caculation, the raw TSC delta combined with the measurement_interval. This also removes the "tsc_tweak" correction factor used when TSC runs on a different base clock from the CPU's bclk. After this change, tsc_tweak is used only for %Busy. The end-result should be a Bzy_MHz result slightly less prone to jitter. Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | Creating a common structure initialization pattern for struct optionSriram Raghunathan2015-11-027-35/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch tries to creates a common structure initialization within the cpupower tool. Previously the ``struct option`` was initialized using `designated initializer` technique which was not needed. There were conflicting initialization methods seen with bench/main.c & others. Signed-off-by: Sriram Raghunathan <sriram@marirs.net.in> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | cpupower: Enable disabled Cstates if they are below max latencyThomas Renninger2015-11-022-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpupower idle-set -D <latency> currently only disables all C-states that have a higher latency than the specified <latency>. But if deep sleep states were already disabled and have a lower latency, they should get enabled again. For example: This call: cpupower idle-set -D 30 disables all C-states with a higher or equal latency than 30. If one then calls: cpupower idle-set -D 100 C-states with a latency between 30-99 will get enabled again with this patch now. It is ensured that only C-states with a latency of 100 and higher are disabled. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | cpupower: Remove debug message when using cpupower idle-set -D switchThomas Renninger2015-11-021-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | cpupower: cpupower monitor reports uninitialized values for offline cpusJacob Tanenbaum2015-11-022-9/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [root@hp-dl980g7-02 linux]# cpupower monitor ... 5472| 0| 1|******|******|******|******|| 0.00| 0.00| 0.00| 0.00| 0.00 *is offline 10567| 0| 159|******|******|******|******|| 0.00| 0.00| 0.00| 0.00| 0.00 *is offline 1661206560|859272560| 150|******|******|******|******|| 0.00| 0.00| 0.00| 0.00| 0.00 *is offline 1661206560|943093104| 140|******|******|******|******|| 0.00| 0.00| 0.00| 0.00| 0.00 *is offline because of this cpupower also holds the incorrect value for the number of physical packages in the machine Changed cpupower to initialize the values of an offline cpu's socket and core to -1, warn the user that one or more cpus is/are offline and not print statistics for offline cpus. This fix hides offlined cores where topology cannot be accessed. With a recent kernel patch suggested from Prarit Bhargava it may be possible that soft offlined cores' topology can still be parsed. This patch would then show which cores in which package/socket are offline, when sane toplogoy information is available. Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | Merge branch 'pm-domains'Rafael J. Wysocki2015-11-121-11/+10
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-domains: PM / Domains: Allocate memory outside domain locks PM / Domains: Remove dev->driver check for runtime PM
| | * | | | PM / Domains: Allocate memory outside domain locksLina Iyer2015-11-021-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for supporting IRQ-safe domains, allocate domain data outside the domain locks. These functions are not called in an atomic context, so we can always allocate memory using GFP_KERNEL. By allocating memory before the locks, we can safely lock the domain using spinlocks instead of mutexes. Reviewed-by: Kevin Hilman <khilman@linaro.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Lina Iyer <lina.iyer@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | PM / Domains: Remove dev->driver check for runtime PMLina Iyer2015-11-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove check for driver of a device, for runtime PM. Device may be suspended without an explicit driver. This check seems to be vestigial and incorrect in the current context. Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Lina Iyer <lina.iyer@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | Merge branch 'pm-cpufreq'Rafael J. Wysocki2015-11-075-26/+46
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-cpufreq: cpufreq: s5pv210-cpufreq: fix wrong do_div() usage MAINTAINERS: update for intel P-state driver cpufreq: governor: Quit work-handlers early if governor is stopped intel_pstate: decrease number of "HWP enabled" messages cpufreq: arm_big_little: fix frequency check when bL switcher is active
| | * | | | | cpufreq: s5pv210-cpufreq: fix wrong do_div() usageNicolas Pitre2015-11-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is wrong to use do_div() with 32-bit dividends (unsigned long is 32 bits on 32-bit architectures). Signed-off-by: Nicolas Pitre <nico@linaro.org> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | MAINTAINERS: update for intel P-state driverSrinivas Pandruvada2015-11-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Srinivas Pandruvada and Len Brown as maintainers and remove Kristen Carlson Accardi from the list of maintainers. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | cpufreq: governor: Quit work-handlers early if governor is stoppedViresh Kumar2015-11-021-10/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gov_queue_work() acquires cpufreq_governor_lock to allow cpufreq_governor_stop() to drain delayed work items possibly scheduled on CPUs that share the policy with a CPU being taken offline. However, the same goal may be achieved in a more straightforward way if the policy pointer in the struct cpu_dbs_info matching the policy CPU is reset upfront by cpufreq_governor_stop() under the timer_mutex belonging to it and checked against NULL, under the same lock, at the beginning of dbs_timer(). In that case every instance of dbs_timer() run for a struct cpu_dbs_info sharing the policy pointer in question after cpufreq_governor_stop() has started will notice that that pointer is NULL and bail out immediately without queuing up any new work items. In turn, gov_cancel_work() called by cpufreq_governor_stop() before destroying timer_mutex will wait for all of the delayed work items currently running on the CPUs sharing the policy to drop the mutex, so it may be destroyed safely. Make cpufreq_governor_stop() and dbs_timer() work as described and modify gov_queue_work() so it does not acquire cpufreq_governor_lock any more. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | intel_pstate: decrease number of "HWP enabled" messagesPrarit Bhargava2015-11-021-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When booting an HWP enabled system the kernel displays one "HWP enabled" message for each cpu. The messages are superfluous since HWP is globally enabled across all CPUs. This patch also adds an informational message when HWP is disabled via intel_pstate=no_hwp. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | cpufreq: arm_big_little: fix frequency check when bL switcher is activeJon Medhurst \(Tixy\)2015-11-021-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check for correct frequency being set in bL_cpufreq_set_rate is broken when the big.LITTLE switcher is active, for two reasons. 1. The 'new_rate' variable gets overwritten before the test by the code calculating the frequency of the old cluster. 2. The frequency returned by bL_cpufreq_get_rate will be the virtual frequency, not the actual one the intended version of new_rate contains. This means the function always returns an error causing an endless stream of: "cpufreq: __target_index: Failed to change cpu frequency: -5" As the intent is to check for errors that clk_set_rate doesn't report lets move the check to immediately after that and directly use clk_get_rate, rather than the arm_big_little helpers which only confuse matters. Also, update the comment to be hopefully clearer about the purpose of the code. Fixes: 0a95e630b49a (cpufreq: arm_big_little: check if the frequency is set correctly) Signed-off-by: Jon Medhurst <tixy@linaro.org> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Michael Turquette <mturquette@baylibre.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | Merge branch 'pm-opp'Rafael J. Wysocki2015-11-073-19/+41
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-opp: PM / OPP: Add opp_rcu_lockdep_assert() to _find_device_opp() PM / OPP: Hold dev_opp_list_lock for writers PM / OPP: Protect updates to list_dev with mutex PM / OPP: Propagate error properly from dev_pm_opp_set_sharing_cpus() PM / OPP: Parse all power-supply related bindings together PM / OPP: Rename routines specific to old bindings with _v1 PM / OPP: Improve print messages with pr_fmt
| | * | | | | | PM / OPP: Add opp_rcu_lockdep_assert() to _find_device_opp()Viresh Kumar2015-11-061-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | _find_device_opp() should be called with rcu-read lock or dev_opp_list_lock held. Add the opp_rcu_lockdep_assert() check to make sure caller have taken appropriate locks. Fix comment over the routine as well. Suggested-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | PM / OPP: Hold dev_opp_list_lock for writersViresh Kumar2015-11-061-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Writers need to update OPP device and their list with dev_opp_list_lock mutex held, which was missed at few places. Fix it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.3 <stable@vger.kernel.org> # 4.3 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | PM / OPP: Protect updates to list_dev with mutexViresh Kumar2015-11-063-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev_opp_list_lock is used everywhere to protect device and OPP lists, but dev_pm_opp_set_sharing_cpus() is missed somehow. And instead we used rcu-lock, which wouldn't help here as we are adding a new list_dev. This also fixes a problem where we have called kzalloc(..., GFP_KERNEL) from within rcu-lock, which isn't allowed as kzalloc can sleep when called with GFP_KERNEL. With CONFIG_DEBUG_ATOMIC_SLEEP set, we get following lockdep-splat: include/linux/rcupdate.h:578 Illegal context switch in RCU read-side critical section! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 5 locks held by swapper/0/1: #0: (&dev->mutex){......}, at: [<c02f68f4>] __driver_attach+0x48/0x98 #1: (&dev->mutex){......}, at: [<c02f6904>] __driver_attach+0x58/0x98 #2: (cpu_hotplug.lock){++++++}, at: [<c00249d0>] get_online_cpus+0x40/0xb0 #3: (subsys mutex#5){+.+.+.}, at: [<c02f4f8c>] subsys_interface_register+0x44/0xdc #4: (rcu_read_lock){......}, at: [<c0305c80>] dev_pm_opp_set_sharing_cpus+0x0/0x1e4 stack backtrace: CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 4.3.0-rc7-00047-g81f5932958a8 #59 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [<c0016874>] (unwind_backtrace) from [<c001355c>] (show_stack+0x10/0x14) [<c001355c>] (show_stack) from [<c022553c>] (dump_stack+0x94/0xbc) [<c022553c>] (dump_stack) from [<c004904c>] (___might_sleep+0x24c/0x298) [<c004904c>] (___might_sleep) from [<c00f07e4>] (kmem_cache_alloc+0xe8/0x164) [<c00f07e4>] (kmem_cache_alloc) from [<c0305354>] (_add_list_dev+0x30/0x58) [<c0305354>] (_add_list_dev) from [<c0305d50>] (dev_pm_opp_set_sharing_cpus+0xd0/0x1e4) [<c0305d50>] (dev_pm_opp_set_sharing_cpus) from [<c040eda4>] (cpufreq_init+0x4cc/0x62c) [<c040eda4>] (cpufreq_init) from [<c040a964>] (cpufreq_online+0xbc/0x73c) [<c040a964>] (cpufreq_online) from [<c02f4fe0>] (subsys_interface_register+0x98/0xdc) [<c02f4fe0>] (subsys_interface_register) from [<c040a640>] (cpufreq_register_driver+0x110/0x17c) [<c040a640>] (cpufreq_register_driver) from [<c040ef64>] (dt_cpufreq_probe+0x60/0x8c) [<c040ef64>] (dt_cpufreq_probe) from [<c02f8084>] (platform_drv_probe+0x44/0xa4) [<c02f8084>] (platform_drv_probe) from [<c02f67c0>] (driver_probe_device+0x208/0x2f4) [<c02f67c0>] (driver_probe_device) from [<c02f6940>] (__driver_attach+0x94/0x98) [<c02f6940>] (__driver_attach) from [<c02f4c1c>] (bus_for_each_dev+0x68/0x9c) Reported-by: Michael Turquette <mturquette@baylibre.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.3 <stable@vger.kernel.org> # 4.3 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | PM / OPP: Propagate error properly from dev_pm_opp_set_sharing_cpus()Viresh Kumar2015-11-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are returning 0 even in case of errors, fix it. Fixes: 8d4d4e98acd6 ("PM / OPP: Add helpers for initializing CPU OPPs") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.3 <stable@vger.kernel.org> # 4.3 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | PM / OPP: Parse all power-supply related bindings togetherViresh Kumar2015-11-021-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move all DT parsing for the power supplies to a single function, rather than keeping them at separate places. This will help manage things properly. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | PM / OPP: Rename routines specific to old bindings with _v1Viresh Kumar2015-11-021-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clearly distinguish routines based on what version of bindings they parse. We have already postfixed routines properly with _v2 for new bindings. Postfix the older ones now with _v1. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | PM / OPP: Improve print messages with pr_fmtViresh Kumar2015-11-022-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To identify OPP core's print messages easily, prefix them with KBUILD_MODNAME. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | | | | |
| | \ \ \ \ \ \
| *-. \ \ \ \ \ \ Merge branches 'acpi-video' and 'acpi-cppc'Rafael J. Wysocki2015-11-073-14/+76
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpi-video: ACPI / video: only register backlight for LCD device ACPI / video: Add a quirk to force acpi-video backlight on Dell XPS L421X * acpi-cppc: cpufreq: CPPC: Delete an unnecessary check before the function call kfree()
OpenPOWER on IntegriCloud