<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-op-linux/arch/x86/kernel/cpu/mce, branch master</title>
<subtitle>Talos™ II Linux sources for OpenPOWER</subtitle>
<id>https://git.raptorcs.com/git/talos-op-linux/atom?h=master</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-op-linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/'/>
<updated>2020-01-28T20:46:42+00:00</updated>
<entry>
<title>Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-01-28T20:46:42+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-01-28T20:46:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=c0275ae758f8ce72306da200b195d1e1d66d0a8d'/>
<id>urn:sha1:c0275ae758f8ce72306da200b195d1e1d66d0a8d</id>
<content type='text'>
Pull x86 cpu-features updates from Ingo Molnar:
 "The biggest change in this cycle was a large series from Sean
  Christopherson to clean up the handling of VMX features. This both
  fixes bugs/inconsistencies and makes the code more coherent and
  future-proof.

  There are also two cleanups and a minor TSX syslog messages
  enhancement"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/cpu: Remove redundant cpu_detect_cache_sizes() call
  x86/cpu: Print "VMX disabled" error message iff KVM is enabled
  KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs
  perf/x86: Provide stubs of KVM helpers for non-Intel CPUs
  KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits
  KVM: VMX: Check for full VMX support when verifying CPU compatibility
  KVM: VMX: Use VMX feature flag to query BIOS enabling
  KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR
  x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured
  x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl()
  x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*
  x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
  x86/vmx: Introduce VMX_FEATURES_*
  x86/cpu: Clear VMX feature flag if VMX is not fully enabled
  x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization
  x86/centaur: Use common IA32_FEAT_CTL MSR initialization
  x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked
  x86/intel: Initialize IA32_FEAT_CTL MSR at boot
  tools/x86: Sync msr-index.h from kernel sources
  selftests, kvm: Replace manual MSR defs with common msr-index.h
  ...
</content>
</entry>
<entry>
<title>Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-01-27T17:19:35+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-01-27T17:19:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=30f5a75640998900d995e099e060e920e72790b2'/>
<id>urn:sha1:30f5a75640998900d995e099e060e920e72790b2</id>
<content type='text'>
Pull RAS updates from Borislav Petkov:

 - Misc fixes to the MCE code all over the place, by Jan H. Schönherr.

 - Initial support for AMD F19h and other cleanups to amd64_edac, by
   Yazen Ghannam.

 - Other small cleanups.

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/mce_amd: Make fam_ops static global
  EDAC/amd64: Drop some family checks for newer systems
  EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh
  x86/amd_nb: Add Family 19h PCI IDs
  EDAC/mce_amd: Always load on SMCA systems
  x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType
  x86/mce: Fix use of uninitialized MCE message string
  x86/mce: Fix mce=nobootlog
  x86/mce: Take action on UCNA/Deferred errors again
  x86/mce: Remove mce_inject_log() in favor of mce_log()
  x86/mce: Pass MCE message to mce_panic() on failed kernel recovery
  x86/mce/therm_throt: Mark throttle_active_work() as __maybe_unused
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType</title>
<updated>2020-01-16T16:09:02+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2020-01-10T01:56:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=89a76171bf50bd20d44338408b8c09433c302956'/>
<id>urn:sha1:89a76171bf50bd20d44338408b8c09433c302956</id>
<content type='text'>
Add support for a new version of the Load Store unit bank type as
indicated by its McaType value, which will be present in future SMCA
systems.

Add the new (HWID, MCATYPE) tuple. Reuse the same name, since this is
logically the same to the user.

Also, add the new error descriptions to edac_mce_amd.

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20200110015651.14887-2-Yazen.Ghannam@amd.com
</content>
</entry>
<entry>
<title>x86/mce/therm_throt: Do not access uninitialized therm_work</title>
<updated>2020-01-15T10:31:33+00:00</updated>
<author>
<name>Chuansheng Liu</name>
<email>chuansheng.liu@intel.com</email>
</author>
<published>2020-01-07T00:41:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=978370956d2046b19313659ce65ed12d5b996626'/>
<id>urn:sha1:978370956d2046b19313659ce65ed12d5b996626</id>
<content type='text'>
It is relatively easy to trigger the following boot splat on an Ice Lake
client platform. The call stack is like:

  kernel BUG at kernel/timer/timer.c:1152!

  Call Trace:
  __queue_delayed_work
  queue_delayed_work_on
  therm_throt_process
  intel_thermal_interrupt
  ...

The reason is that a CPU's thermal interrupt is enabled prior to
executing its hotplug onlining callback which will initialize the
throttling workqueues.

Such a race can lead to therm_throt_process() accessing an uninitialized
therm_work, leading to the above BUG at a very early bootup stage.

Therefore, unmask the thermal interrupt vector only after having setup
the workqueues completely.

 [ bp: Heavily massage commit message and correct comment formatting. ]

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Chuansheng Liu &lt;chuansheng.liu@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Acked-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200107004116.59353-1-chuansheng.liu@intel.com
</content>
</entry>
<entry>
<title>x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked</title>
<updated>2020-01-13T16:47:18+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>sean.j.christopherson@intel.com</email>
</author>
<published>2019-12-21T04:44:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=6d527cebfa04ba4792be9e79e0d7cab22ab6c377'/>
<id>urn:sha1:6d527cebfa04ba4792be9e79e0d7cab22ab6c377</id>
<content type='text'>
WARN if the IA32_FEAT_CTL MSR is somehow left unlocked now that CPU
initialization unconditionally locks the MSR.

Signed-off-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20191221044513.21680-6-sean.j.christopherson@intel.com
</content>
</entry>
<entry>
<title>x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR</title>
<updated>2020-01-13T16:23:08+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>sean.j.christopherson@intel.com</email>
</author>
<published>2019-12-21T04:44:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=32ad73db7fc5fe7eebafdab3b528f99ab8498e3f'/>
<id>urn:sha1:32ad73db7fc5fe7eebafdab3b528f99ab8498e3f</id>
<content type='text'>
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation.  Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.

Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.

Opportunistically, fix a few other annoyances with the defines:

  - Relocate the bit defines so that they immediately follow the MSR
    define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
  - Add whitespace around the block of feature control defines to make
    it clear they're all related.
  - Use BIT() instead of manually encoding the bit shift.
  - Use "VMX" instead of "VMXON" to match the SDM.
  - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
    be consistent with the kernel's verbiage used for all other feature
    control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
    likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
    the (literal) one-off usage of _ON, the SDM is simply "wrong".

Signed-off-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
</content>
</entry>
<entry>
<title>x86/mce: Fix use of uninitialized MCE message string</title>
<updated>2020-01-13T09:07:56+00:00</updated>
<author>
<name>Jan H. Schönherr</name>
<email>jschoenh@amazon.de</email>
</author>
<published>2020-01-03T15:07:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=7a8bc2b0462eaca0072d1f7f4ddc749fcb8a773c'/>
<id>urn:sha1:7a8bc2b0462eaca0072d1f7f4ddc749fcb8a773c</id>
<content type='text'>
The function mce_severity() is not required to update its msg argument.
In fact, mce_severity_amd() does not, which makes mce_no_way_out()
return uninitialized data, which may be used later for printing.

Assuming that implementations of mce_severity() either always or never
update the msg argument (which is currently the case), it is sufficient
to initialize the temporary variable in mce_no_way_out().

While at it, avoid printing a useless "Unknown".

Signed-off-by: Jan H. Schönherr &lt;jschoenh@amazon.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20200103150722.20313-4-jschoenh@amazon.de
</content>
</entry>
<entry>
<title>x86/mce: Fix mce=nobootlog</title>
<updated>2020-01-13T09:07:56+00:00</updated>
<author>
<name>Jan H. Schönherr</name>
<email>jschoenh@amazon.de</email>
</author>
<published>2020-01-03T15:07:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=90454e49593845e4f1cd162601249450b30898f3'/>
<id>urn:sha1:90454e49593845e4f1cd162601249450b30898f3</id>
<content type='text'>
Since commit

  8b38937b7ab5 ("x86/mce: Do not enter deferred errors into the generic
		 pool twice")

the mce=nobootlog option has become mostly ineffective (after being only
slightly ineffective before), as the code is taking actions on MCEs left
over from boot when they have a usable address.

Move the check for MCP_DONTLOG a bit outward to make it effective again.

Also, since commit

  011d82611172 ("RAS: Add a Corrected Errors Collector")

the two branches of the remaining "if" at the bottom of machine_check_poll()
do same. Unify them.

Signed-off-by: Jan H. Schönherr &lt;jschoenh@amazon.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20200103150722.20313-3-jschoenh@amazon.de
</content>
</entry>
<entry>
<title>x86/mce: Take action on UCNA/Deferred errors again</title>
<updated>2020-01-13T09:07:23+00:00</updated>
<author>
<name>Jan H. Schönherr</name>
<email>jschoenh@amazon.de</email>
</author>
<published>2020-01-03T15:07:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=8438b84ab42d9a67df33633258e0865c5761d2d4'/>
<id>urn:sha1:8438b84ab42d9a67df33633258e0865c5761d2d4</id>
<content type='text'>
Commit

  fa92c5869426 ("x86, mce: Support memory error recovery for both UCNA
		and Deferred error in machine_check_poll")

added handling of UCNA and Deferred errors by adding them to the ring
for SRAO errors.

Later, commit

  fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors")

switched storage from the SRAO ring to the unified pool that is still
in use today. In order to only act on the intended errors, a filter
for MCE_AO_SEVERITY is used -- effectively removing handling of
UCNA/Deferred errors again.

Extend the severity filter to include UCNA/Deferred errors again.
Also, generalize the naming of the notifier from SRAO to UC to capture
the extended scope.

Note, that this change may cause a message like the following to appear,
as the same address may be reported as SRAO and as UCNA:

 Memory failure: 0x5fe3284: already hardware poisoned

Technically, this is a return to previous behavior.

Signed-off-by: Jan H. Schönherr &lt;jschoenh@amazon.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Acked-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200103150722.20313-2-jschoenh@amazon.de
</content>
</entry>
<entry>
<title>x86/mce: Remove mce_inject_log() in favor of mce_log()</title>
<updated>2019-12-17T09:26:41+00:00</updated>
<author>
<name>Jan H. Schönherr</name>
<email>jschoenh@amazon.de</email>
</author>
<published>2019-12-10T00:07:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=81736abd5578b1dbdf95eb714da45787054198ba'/>
<id>urn:sha1:81736abd5578b1dbdf95eb714da45787054198ba</id>
<content type='text'>
The mutex in mce_inject_log() became unnecessary with commit

  5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver"),

though the original reason for its presence only vanished with commit

  7298f08ea887 ("x86/mcelog: Get rid of RCU remnants").

Drop the mutex. And as that makes mce_inject_log() identical to mce_log(),
get rid of the former in favor of the latter.

Signed-off-by: Jan H. Schönherr &lt;jschoenh@amazon.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: linux-edac &lt;linux-edac@vger.kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: x86-ml &lt;x86@kernel.org&gt;
Link: https://lkml.kernel.org/r/20191210000733.17979-7-jschoenh@amazon.de
</content>
</entry>
</feed>
