| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nehalem and upper chipsets provide an special device that has corrected memory
error counters detected with registered dimms. This device is only seen if
there are registered memories plugged.
After this patch, on a machine fully equiped with RDIMM's, it will use the
Device 3 function 2 to count corrected errors instead on relying at mcelog.
For unregistered DIMMs, it will keep the old behavior, counting errors
via mcelog.
This patch were developed together with Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
From: Keith Mannthey <kmannth@us.ibm.com>
Simple correction to a shift value.
ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c
This correctly identifies the state of the ECC at the machine.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
| |
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
| |
As Nehalem has a different binding to EDAC API, and its own different
error injection code, documents it.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
There were two stupid error injection bugs introduced by wrong
cut-and-paste: one at socket store, and another at the error inject
register. The last one were causing the code to not work at all.
While here, adds debug messages to allow seeing what registers are being
set while sending error injection.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
i7core_get_devices() were preparet to get just the first found device of each type.
Due to that, on Xeon 55xx, only socket 1 were retrived.
Rework i7core_get_devices() to clean it and to properly support Xeon 55xx.
While here, fix a small typo.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Xeon55xx fails to probe with this error message:
EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1660: MC: drivers/edac/i7core_edac.c: i7core_init()
EDAC i7core: Device not found: dev 00:00.0 PCI ID 8086:2c41
i7core_edac: probe of 0000:00:14.0 failed with error -22
This is due to the fact that, on Xeon35xx (and i7core), device 00.0 has
PCI ID 8086:2c40.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
m->bank is not related to the memory bank but, instead, to the MCA Error
register bank. Fix it accordingly. While here, improves the comments for
Nehalem bank.
A later fix is needed, in order to get bank/rank information from MCA
error log.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Enriches mcelog error by using the encoded information at MCE status and
misc registers (IA32_MCx_STATUS, IA32_MCx_MISC).
Some fixes are still needed here, in order to properly fill the EDAC
fields.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some Nehalem architectures have more than one MC socket. Socket 0 is
located at bus 255.
Currently, it is using up to 2 sockets, but increasing it to a larger
number is just a matter of increasing MAX_SOCKETS definition.
This seems to be required for properly support of Xeon 55xx.
Still needs testing with Xeon 55xx.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code changes the detection procedure of i7core_edac. Instead of
directly probing for MC registers, it probes for another register found
on Nehalem. If found, it tries to pick the first MC PCI BUS. This should
work fine with Xeon 35xx, but, on Xeon 55xx, this is at bus 254 and 255
that are not properly detected by the non-legacy PCI methods.
The new detection code scans specifically at buses 254 and 255 for the
Xeon 55xx devices.
This code has not tested yet. After working, a change at the code will
be needed, since the i7core is not yet ready for working with 2 sets of
MC.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
| |
This patch adds a probing code that seeks for an specific pci bus. It
still needs testing, but it is hoped that this will help to identify the
memory controller with Xeon 55xx series.
Signed-off-by: Aristeu Sergio <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The public Intel Xeon 5500 volume 2 datasheet describes, on page 53,
session 2.6.7 a register that can lock/unlock Memory Controller the
configuration register, called MC_CFG_CONTROL.
Adds support for it in the hope that software error injection would
work. With my tests with Xeon 35xx, there's still something missing.
With a program that does sequencial bit writes at dev 0.0, sometimes, it
produces error injection, after unblocking the MC_CFG_CONTROL (and,
sometimes, it just locks my testing machine).
I'll try later to discover by trial and error what's the register that
solves this issue on Xeon 35xx.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds a glue code to allow i7core to work with mcelog. With the glue,
i7core registers itself on edac_mce. At mce, when an error is detected,
it calls all registered drivers (in this case, i7core), for EDAC error
handling.
TODO: It currently just prints the MCE error log using about the same
format as mce panic messages. The error message should be enhanced
with mcelog userspace info and converted into the proper EDAC format,
to feed the EDAC error counts.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
| |
Since mcelog is bool, edac_mce glue should also be bool, or otherwise
will not work.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
| |
edac_mce module is an interface module that gets mcelog data and
forwards to any registered edac module that expects to receive data via
mce.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
| |
csrows is still fake, since we can't identify its representation with
Nehalem registers.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now, memory size is properly displayed:
EDAC i7core: DOD Max limits: DIMMS: 2, 1-ranked, 8-banked
EDAC i7core: DOD Max rows x colums = 0x4000 x 0x400
EDAC i7core: Memory channel configuration:
EDAC i7core: Ch0 phy rd0, wr0 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: dimm 1 (0x00001288) 1024 Mb offset: 4, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: Ch1 phy rd1, wr1 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: Ch2 phy rd3, wr3 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
Still, as the way to retrieve csrows info is not known, it does a
mapping of what's available to csrows basic unit at edac core.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
| |
Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
| |
Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
| |
Properly check the number of channels and improve probing error detection
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function appears only on Xeon 5500 datasheet. Yet, testing with a
Xeon 3503 showed that this is also implemented on other Nehalem
processors.
At the first read, MC_TEST_ERR_RCV1 and MC_TEST_ERR_RCV0 can contain any
value. Modify CE error logic to update the error count only after the
second read.
An alternative approach would be to do a write at rcv0 and rcv1
registers, but it seemed better to keep they untouched, since BIOS might
eventually assume that they are exclusive for their usage.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
| |
There are some locking troubles with edac_core: if you don't declare an
edac_check, module may suffer from soft lock.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
| |
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now, it will try to register on all supported Memory Controller
functions.
It should be noticed that dev3, function 2 is present only on chips with
Registered DIMM's, according to the datasheet. So, the driver doesn't
return -ENODEV is all functions but this one were successfully
registered and enabled:
EDAC i7core: Registered device 8086:2c18 fn=3 0
EDAC i7core: Registered device 8086:2c19 fn=3 1
EDAC i7core: Device not found: PCI ID 8086:2c1a (dev 3, func 2)
EDAC i7core: Registered device 8086:2c1c fn=3 4
EDAC i7core: Registered device 8086:2c20 fn=4 0
EDAC i7core: Registered device 8086:2c21 fn=4 1
EDAC i7core: Registered device 8086:2c22 fn=4 2
EDAC i7core: Registered device 8086:2c23 fn=4 3
EDAC i7core: Registered device 8086:2c28 fn=5 0
EDAC i7core: Registered device 8086:2c29 fn=5 1
EDAC i7core: Registered device 8086:2c2a fn=5 2
EDAC i7core: Registered device 8086:2c2b fn=5 3
EDAC i7core: Registered device 8086:2c30 fn=6 0
EDAC i7core: Registered device 8086:2c31 fn=6 1
EDAC i7core: Registered device 8086:2c32 fn=6 2
EDAC i7core: Registered device 8086:2c33 fn=6 3
EDAC i7core: Driver loaded.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
| |
This patch were co-authored with Aristeu Rozanski.
Signed-off-by: Aristeu Sergio <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Implements set_inject_error() with the low-level code needed to inject
memory errors at Nehalem, and adds some sysfs nodes to allow error injection
The next patch will add an API for error injection.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This driver is meant to support i7 core/i7core extreme desktop
processors and Xeon 35xx/55xx series with integrated memory controller.
It is likely that it can be expanded in the future to work with other
processor series based at the same Memory Controller design.
For now, it has just a few MCH status reads.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
| |
|
|\
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: don't needlessly skip PAGE_USER test for Fsl booke
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The bypassing of this test is a leftover from 2.4 vintage
kernels, and is no longer appropriate, or even used by KGDB.
Currently KGDB uses probe_kernel_write() for all access to
memory via the KGDB core, so it can simply be deleted.
This fixes CVE-2010-1446.
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Wufei <fei.wu@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
|
|\ \
| | |
| | |
| | |
| | | |
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: add a shrinker to background inode reclaim
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
On low memory boxes or those with highmem, kernel can OOM before the
background reclaims inodes via xfssyncd. Add a shrinker to run inode
reclaim so that it inode reclaim is expedited when memory is low.
This is more complex than it needs to be because the VM folk don't
want a context added to the shrinker infrastructure. Hence we need
to add a global list of XFS mount structures so the shrinker can
traverse them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|\ \
| | |
| | |
| | |
| | |
| | | |
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
exofs: Fix "add bdi backing to mount session" fall out
fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The patch: add bdi backing to mount session
(b3d0ab7e60d1865bb6f6a79a77aaba22f2543236)
Has a bug in the placement of the bdi member at
struct exofs_sb_info. The layout member must be kept
last.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| | |
When CONFIG_BLOCK is set, it ends up getting backing-dev.h included.
But for !CONFIG_BLOCK, it isn't so lucky. The proper thing to do is
include <linux/backing-dev.h> directly from the file it's used from,
so do that.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|