path: root/drivers/edac/amd64_edac.c
Commit message (Author; Age; Files; Lines)
...
* EDAC, amd64: Reserve correct PCI devices on AMD Fam17h (Yazen Ghannam; 2016-11-28; 1 file; -18/+69)
| | | | | | | | | | | | | | | | | Fam17h needs PCI device functions 0 and 6 instead of 1 and 2 as on older systems. Update struct amd64_pvt to hold the new functions and reserve them if on Fam17h. Also, allocate an array of UMC structs within our newly allocated PVT struct. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-11-git-send-email-Yazen.Ghannam@amd.com [ init_one_instance() error handling, shorten lines, unbreak >80 cols lines. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Add AMD Fam17h family type and ops (Yazen Ghannam; 2016-11-24; 1 file; -0/+44)
| | | | | | | | | | | | | Add a family type and associated ops for Fam17h. Define a struct to hold all the UMC registers that we need. Make this a part of struct amd64_pvt in order to maximize code reuse in the rest of the driver. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Extend ecc_enabled() to Fam17h (Yazen Ghannam; 2016-11-24; 1 file; -10/+40)
| | | | | | | | | | | | | | Update the ecc_enabled() function to work on Fam17h. This entails reading a different set of registers and using the SMN (System Management Network) rather than PCI devices. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com [ Fixup ecc_en assignment and get_umc_base(). ] Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Don't force-enable ECC checking on newer systems (Yazen Ghannam; 2016-11-23; 1 file; -3/+8)
| | | | | | | | | | | | | | It's not recommended for the OS to try and force-enable ECC checking. This is considered a firmware task since it includes memory training, etc, so don't change ECC settings on Fam17h or newer systems and inform the user. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479850816-1595-1-git-send-email-Yazen.Ghannam@amd.com [ Put the "forcing" message in an else branch. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Add Deferred Error type (Yazen Ghannam; 2016-11-21; 1 file; -0/+2)
| | | | | | | | | | | | Currently, deferred errors are classified as correctable in EDAC. Add a new error type for deferred errors so that they are correctly reported to the user. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Rename __log_bus_error() to be more specific (Yazen Ghannam; 2016-11-21; 1 file; -2/+2)
| | | | | | | | | | | | | We only use __log_bus_error() to log DRAM ECC errors, so let's change the name to reflect this. We'll also use this function for DRAM ECC errors on Fam17h, but we'll call it from a different function than decode_bus_error(). Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-6-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Change target of pci_name from F2 to F3 (Yazen Ghannam; 2016-11-21; 1 file; -1/+1)
| | | | | | | | | | | | AMD Fam17h will not be using PCI function 2 for EDAC, but will continue to use function 3. So let's get the name of F3 instead of F2 to support Fam17h and previous families. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-5-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Autoload module using x86_cpu_id (Yazen Ghannam; 2016-09-21; 1 file; -0/+9)
| | | | | | | | | Reinstate driver autoloading now that PCI dependency is gone. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1473984445-1726-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64: Fix channel decode on Fam15hMod60h systems (Yazen Ghannam; 2016-08-08; 1 file; -3/+12)
| | | | | | | | | | | | Fam15hMod60h systems are using the channel decode of Fam15hMod30h which gives incorrect results. Fam15hMod60h systems should use the generic channel decode method plus a couple more cases. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1470236355-30039-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64_edac: Init opstate at the proper time during init (Borislav Petkov; 2016-06-16; 1 file; -2/+2)
| | | | | | | It is useless to do it if we're loaded on unsupported hardware so do that only after we have detected at least 1 supported AMD northbridge. Signed-off-by: Borislav Petkov <bp@suse.de>
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial (Linus Torvalds; 2016-05-17; 1 file; -1/+1)
|\
| | Pull trivial tree updates from Jiri Kosina.
| |
| | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits)
| |   gitignore: fix wording
| |   mfd: ab8500-debugfs: fix "between" in printk
| |   memstick: trivial fix of spelling mistake on management
| |   cpupowerutils: bench: fix "average"
| |   treewide: Fix typos in printk
| |   IB/mlx4: printk fix
| |   pinctrl: sirf/atlas7: fix printk spelling
| |   serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/
| |   w1: comment spelling s/minmum/minimum/
| |   Blackfin: comment spelling s/divsor/divisor/
| |   metag: Fix misspellings in comments.
| |   ia64: Fix misspellings in comments.
| |   hexagon: Fix misspellings in comments.
| |   tools/perf: Fix misspellings in comments.
| |   cris: Fix misspellings in comments.
| |   c6x: Fix misspellings in comments.
| |   blackfin: Fix misspelling of 'register' in comment.
| |   avr32: Fix misspelling of 'definitions' in comment.
| |   treewide: Fix typos in printk
| |   Doc: treewide : Fix typos in DocBook/filesystem.xml
| |   ...
| * treewide: Fix typos in printk (Masanari Iida; 2016-04-18; 1 file; -1/+1)
| | This patch fixes spelling typos found in printk within various parts of the kernel sources.
| |
| | Signed-off-by: Masanari Iida <standby24x7@gmail.com>
| | Acked-by: Randy Dunlap <rdunlap@infradead.org>
| | Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | EDAC, amd64_edac: Drop pci_register_driver() use (Borislav Petkov; 2016-05-09; 1 file; -82/+43)
| | - remove homegrown instances counting.
| | - take F3 PCI device from amd_nb caching instead of F2 which was used with the PCI core.
| |
| | With those changes, the driver doesn't need to register a PCI driver and relies on the northbridges caching which we do anyway on AMD.
| |
| | Signed-off-by: Borislav Petkov <bp@suse.de>
| | Cc: Yazen Ghannam <yazen.ghannam@amd.com>
* | EDAC, amd64_edac: Issue driver banner only on success (Borislav Petkov; 2016-04-27; 1 file; -2/+2)
|/ | | | | | | ... and don't mislead users into thinking that the driver has loaded successfully. Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr() (Dan Carpenter; 2016-01-25; 1 file; -1/+1)
| | | | | | | | | | | | | | | dct_sel_base_off is declared as a u64 but we're only using the lower 32 bits because of a shift wrapping bug. This can possibly truncate the upper 16 bits of DctSelBaseOffset[47:26], causing us to misdecode the CS row. Fixes: c8e518d5673d ('amd64_edac: Sanitize f10_get_base_addr_offset') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20160120095451.GB19898@mwanda Signed-off-by: Borislav Petkov <bp@suse.de>
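The truncation described in this commit can be reproduced outside the kernel. The following is a minimal sketch, not the driver's code: both helper names are hypothetical, and the mask and shift are assumptions chosen so that register bits [31:10] land in address bits [47:26], matching the DctSelBaseOffset[47:26] field named above.

```c
#include <stdint.h>

/*
 * Hypothetical illustration of the bug: the shift is evaluated in 32-bit
 * arithmetic, so the top 16 bits fall off before the value is widened.
 */
static uint64_t base_off_truncated(uint32_t dct_sel_hi)
{
    uint32_t narrow = (dct_sel_hi & 0xFFFFFC00u) << 16;

    return narrow; /* widened too late: upper bits are already gone */
}

/* Hypothetical illustration of the fix: widen first, then shift. */
static uint64_t base_off_widened(uint32_t dct_sel_hi)
{
    return (uint64_t)(dct_sel_hi & 0xFFFFFC00u) << 16;
}
```

For an input of 0xFFFFFC00, the first helper keeps only 0xFC000000, while the second preserves the full 0xFFFFFC000000.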
* Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds; 2015-11-03; 1 file; -3/+3)
|\
| | Pull RAS changes from Ingo Molnar:
| |  "The main system reliability related changes were from x86, but also some generic RAS changes:
| |
| |    - AMD MCE error injection subsystem enhancements. (Aravind Gopalakrishnan)
| |    - Fix MCE and CPU hotplug interaction bug. (Ashok Raj)
| |    - kcrash bootup robustness fix. (Baoquan He)
| |    - kcrash cleanups. (Borislav Petkov)
| |    - x86 microcode driver rework: simplify it by unmodularizing it and other cleanups. (Borislav Petkov)"
| |
| | * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
| |   x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()
| |   x86/mce: Add a Scalable MCA vendor flags bit
| |   MAINTAINERS: Unify the microcode driver section
| |   x86/microcode/intel: Move #ifdef DEBUG inside the function
| |   x86/microcode/amd: Remove maintainers from comments
| |   x86/microcode: Remove modularization leftovers
| |   x86/microcode: Merge the early microcode loader
| |   x86/microcode: Unmodularize the microcode driver
| |   x86/mce: Fix thermal throttling reporting after kexec
| |   kexec/crash: Say which char is the unrecognized
| |   x86/setup/crash: Check memblock_reserve() retval
| |   x86/setup/crash: Cleanup some more
| |   x86/setup/crash: Remove alignment variable
| |   x86/setup: Cleanup crashkernel reservation functions
| |   x86/amd_nb, EDAC: Rename amd_get_node_id()
| |   x86/setup: Do not reserve crashkernel high memory if low reservation failed
| |   x86/microcode/amd: Do not overwrite final patch levels
| |   x86/microcode/amd: Extract current patch level read to a function
| |   x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC
| |   x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts
| |   ...
| * x86/amd_nb, EDAC: Rename amd_get_node_id() (Aravind Gopalakrishnan; 2015-10-21; 1 file; -3/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function doesn't give us the "Node ID" as the function name suggests. Rather, it receives a PCI device as argument, checks the available F3 PCI device IDs in the system and returns the index of the matching Bus/Device IDs. Rename it to amd_pci_dev_to_node_id(). No functional change is introduced. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | EDAC, amd64_edac: Extend scrub rate support to F15hM60h (Aravind Gopalakrishnan; 2015-09-29; 1 file; -10/+25)
|/ | | | | | | | | | | | | | | | | | The scrub rate control register has moved to function 2 in PCI config space and is at a different offset on family 0x15, models 0x60 and later. The minimum recommended scrub rate has also changed. (Refer to D18F2x1c9_dct[1:0][DramScrub] in Fam15hM60h BKDG). Adjust set_scrub_rate() and get_scrub_rate() functions to accommodate this. Tested on F15hM60h, Fam15h, models 00h-0fh and Fam10h systems. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1443440593-2316-2-git-send-email-Aravind.Gopalakrishnan@amd.com [ Cleanup conditionals. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* amd64_edac: enforce synchronous probe (Luis R. Rodriguez; 2015-05-20; 1 file; -0/+1)
| While testing asynchronous PCI probe on this driver I noticed it failed because the driver checks if any of the PCI devices have been bound to the driver after registering it, which obviously does not work if probing is asynchronous.
|
| While patches and discussions on how the driver should behave are ongoing, let's enforce synchronous probe for this driver for now.
|
| Reviewed-by: Tejun Heo <tj@kernel.org>
| Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
| Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* EDAC, amd64_edac: Get rid of per-node driver instances (Borislav Petkov; 2015-02-23; 1 file; -20/+13)
| | | | | | | ... and do the proper thing using EDAC core facilities. Cc: Daniel J Blueman <daniel@numascale.com> Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC: amd64: Use static attribute groups (Takashi Iwai; 2015-02-23; 1 file; -36/+11)
| | | | | | | | | | | Instead of calling device_create_file() and device_remove_file() manually, pass the static attribute groups with the new edac_mc_add_mc_with_groups(). The conditional creation of inject sysfs files is done by a proper is_visible callback. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-4-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, amd64_edac: Prevent OOPS with >16 memory controllers (Daniel J Blueman; 2015-02-17; 1 file; -2/+8)
| When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16), the kernel fatally dereferences unallocated structures, see splat below; this occurs on at least NumaConnect systems. Fix by checking if a memory controller info structure was found.
|
|   BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
|   IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
|   PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
|   Oops: 0000 [#2] SMP
|   Modules linked in:
|   CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1
|   Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b 01/28/2015
|   task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
|   RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
|   RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
|   RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
|   RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
|   RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
|   R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
|   R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
|   FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
|   CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
|   CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
|   Stack:
|    0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
|    000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
|    ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
|   Call Trace:
|    <IRQ>
|    ? vprintk_default
|    ? printk
|    amd_decode_mce
|    notifier_call_chain
|    atomic_notifier_call_chain
|    mce_log
|    machine_check_poll
|    mce_timer_fn
|    ? mce_cpu_restart
|    call_timer_fn.isra.29
|    run_timer_softirq
|    __do_softirq
|    irq_exit
|    smp_apic_timer_interrupt
|    apic_timer_interrupt
|    <EOI>
|    ? down_read_trylock
|    __do_page_fault
|    ? __schedule
|    do_page_fault
|    page_fault
|
| Signed-off-by: Daniel J Blueman <daniel@numascale.com>
| Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com
| Cc: stable@vger.kernel.org
| [ Boris: massage commit message ]
| Signed-off-by: Borislav Petkov <bp@suse.de>
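The shape of the fix ("checking if a memory controller info structure was found") can be sketched in plain C. This is an illustrative stand-in, not the driver's code: `MAX_MCS`, `struct mc_info`, `mc_table`, `find_mc()` and `log_dram_error()` are all hypothetical names, with `MAX_MCS` standing in for the EDAC_MAX_MCS limit of 16 mentioned above.

```c
#include <stddef.h>

#define MAX_MCS 16 /* assumption: mirrors the EDAC_MAX_MCS limit quoted above */

/* Hypothetical per-controller record. */
struct mc_info {
    int id;
    unsigned long ce_count;
};

/* Slots stay NULL for controllers that were never registered. */
static struct mc_info *mc_table[MAX_MCS];

/* Return the record for node_id, or NULL if none exists. */
static struct mc_info *find_mc(int node_id)
{
    if (node_id < 0 || node_id >= MAX_MCS)
        return NULL;
    return mc_table[node_id];
}

/* Return 0 if the error was accounted, -1 if no controller was found. */
static int log_dram_error(int node_id)
{
    struct mc_info *mci = find_mc(node_id);

    if (!mci) /* the missing check: bail out instead of dereferencing NULL */
        return -1;

    mci->ce_count++;
    return 0;
}
```

An error reported for a node beyond the table, or for a slot that was never populated, is then dropped with an error code rather than crashing in the decode path.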
* amd64_edac: Build module on x86-32 (Tomasz Pala; 2014-11-05; 1 file; -0/+5)
| By popular demand, enable amd64_edac on 32-bit too.
|
| Boris:
| - update Kconfig text.
| - add a warning on load which states that 32-bit configurations are unsupported.
|
| Signed-off-by: Tomasz Pala <gotar@polanet.pl>
| Link: http://lkml.kernel.org/r/20141102102212.GA7034@polanet.pl
| Signed-off-by: Borislav Petkov <bp@suse.de>
* amd64_edac: Add F15h M60h support (Aravind Gopalakrishnan; 2014-10-30; 1 file; -79/+176)
| This patch adds support for ECC error decoding for F15h M60h processor. Aside from the usual changes, the patch adds support for some new features in the processor:
| - DDR4(unbuffered, registered); LRDIMM DDR3 support
|   - relevant debug messages have been modified/added to report these memory types
| - new dbam_to_cs mappers
|   - if (F15h M60h && LRDIMM); we need a 'multiplier' value to find cs_size. This multiplier value is obtained from the per-dimm DCSM register. So, change the interface to accept a 'cs_mask_nr' value to facilitate this calculation
| - switch-casing determine_memory_type()
|   - done to cleanse the function of too many if-else statements and improve readability
|   - This is now called early in read_mc_regs() to cache dram_type
|
| Misc cleanup:
| - amd64_pci_table[] is condensed by using PCI_VDEVICE macro.
|
| Testing details:
| Tested the patch by injecting 'ECC' type errors using mce_amd_inj and error decoding works fine.
|
| Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
| Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com
| [ Boris: determine_memory_type() cleanups ]
| Signed-off-by: Borislav Petkov <bp@suse.de>
* amd64_edac: Modify usage of amd64_read_dct_pci_cfg() (Aravind Gopalakrishnan; 2014-09-23; 1 file; -66/+80)
| Rationale behind this change:
| - F2x1xx addresses were stopped from being mapped explicitly to DCT1 from F15h (OR) onwards. They use _dct[0:1] mechanism to access the registers. So we should move away from using address ranges to select DCT for these families.
| - On newer processors, the address ranges used to indicate DCT1 (0x140, 0x1a0) have different meanings than what is assumed currently.
|
| Changes introduced:
| - amd64_read_dct_pci_cfg() now takes in dct value and uses it for 'selecting the dct'
| - Update usage of the function. Keep in mind that different families have specific handling requirements
| - Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different from amd64_read_pci_cfg()
| - Move the k8 specific check to amd64_read_pci_cfg
| - Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg()
| - Remove now needless .read_dct_pci_cfg
|
| Testing:
| - Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG' and mce_amd_inj
| - driver obtains info from F2x registers and caches it in pvt structures correctly
| - ECC decoding works fine
|
| Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
| Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.com
| Signed-off-by: Borislav Petkov <bp@suse.de>
* amd64_edac: Add support for newer F16h models (Aravind Gopalakrishnan; 2014-02-27; 1 file; -0/+24)
| | | | | | | | | | | Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC turned on using mce_amd_inj module and the patch works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com Tested-by: Arindam Nath <Arindam.Nath@amd.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de>
* amd64_edac: Fix logic to determine channel for F15 M30h processors (Aravind Gopalakrishnan; 2014-02-07; 1 file; -3/+11)
| Update the current channel selection logic to include F15h M30h memory controllers.
|
| Refer to the F15h M30h BKDG, D18F2x110[7:6] (DRAM Controller Select Low) (Link: http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf)
|
| Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
| Link: http://lkml.kernel.org/r/1390338216-3873-1-git-send-email-Aravind.Gopalakrishnan@amd.com
| Signed-off-by: Borislav Petkov <bp@suse.de>
* amd64_edac: Remove "amd64" prefix from static functions (Borislav Petkov; 2013-12-15; 1 file; -62/+56)
| | | | | | | No need for the namespace tagging there. Cleanup setup_pci_device while at it. Signed-off-by: Borislav Petkov <bp@suse.de>
* amd64_edac: Simplify code around decode_bus_error (Borislav Petkov; 2013-12-15; 1 file; -9/+4)
| | | | | | Drop wrapper function and prefixes. Signed-off-by: Borislav Petkov <bp@suse.de>
* amd64_edac: Mark amd64_decode_bus_error as static (Rashika Kheria; 2013-12-15; 1 file; -1/+1)
| This patch marks the function amd64_decode_bus_error() as static because it is not used outside of amd64_edac.c.
|
| It also eliminates the following warning:
|
|   drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes]
|
| Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
| Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.com
| Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro (Jingoo Han; 2013-12-06; 1 file; -1/+1)
| | | | | | | | | | Currently, there is no other bus that has something like this macro for their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com [ Boris: swap commit message with better one. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* amd64_edac: Fix condition to verify max channels allowed for F15 M30h (Aravind Gopalakrishnan; 2013-12-06; 1 file; -1/+1)
| | | | | | | | | | | | | | | | | | | The value returned from 'f15_m30h_determine_channel' will always be 0x3 max. The condition (channel > 4 || channel < 0) works as hardware never returns a value of 4, but it leads to static checker analysis errors like http://marc.info/?l=linux-edac&m=138607615131951&w=2. Fix that. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain [ Boris: massage commit message a bit. ] Signed-off-by: Borislav Petkov <bp@suse.de>
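The off-by-one in the quoted condition is easy to see in isolation. The sketch below is an assumption-laden illustration, not the driver's code: `NUM_CHANNELS` and `channel_valid()` are hypothetical names, and the table size of 4 is inferred only from the statement above that the returned value is at most 0x3.

```c
#include <stdbool.h>

#define NUM_CHANNELS 4 /* assumption: valid channel indices are 0..3 */

/*
 * A check written as (channel > 4 || channel < 0) would still accept 4,
 * which is one past the last valid index. Comparing against the count
 * rejects it.
 */
static bool channel_valid(int channel)
{
    return channel >= 0 && channel < NUM_CHANNELS;
}
```

Static checkers flag the looser form because they reason about what the comparison allows, not about what the hardware happens to return.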
* bitops: Introduce a more generic BITMASK macro (Chen, Gong; 2013-10-21; 1 file; -22/+24)
| | | | | | | | | | | | | | | | GENMASK is used to create a contiguous bitmask([hi:lo]). It is implemented twice in current kernel. One is in EDAC driver, the other is in SiS/XGI FB driver. Move it to a more generic place for other usage. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Winischhofer <thomas@winischhofer.net> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
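A contiguous [hi:lo] bitmask of the kind this commit describes can be built as follows. This is a sketch and not the kernel's macro: `genmask_u64()` is a hypothetical function name, and the two-shift formulation is just one common way to avoid shifting by the full word width when hi is 63 and lo is 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a mask with bits lo..hi (inclusive) set.
 * Hypothetical helper; requires lo <= hi <= 63.
 */
static uint64_t genmask_u64(unsigned int hi, unsigned int lo)
{
    assert(lo <= hi && hi < 64);

    /* Clear everything below lo, then everything above hi. */
    return (~UINT64_C(0) << lo) & (~UINT64_C(0) >> (63 - hi));
}
```

For example, `genmask_u64(47, 26)` selects the bit range [47:26], which is the kind of register field the EDAC driver extracts from address values.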
* amd64_edac: Fix incorrect wraparounds (Aravind Gopalakrishnan; 2013-08-27; 1 file; -5/+6)
| | | | | | | | | | | | | | dct_base and dct_limit obtain 32 bit register values when they read their respective pci config space registers. A left shift beyond 32 bits will cause them to wrap around. Similar case for chan_addr as can be seen from the bug report (link below). In the patch, we rectify this by casting chan_addr to u64 and by comparing dct_base and dct_limit against properly shifted sys_addr in order to compare the correct bits. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountain Signed-off-by: Borislav Petkov <bp@suse.de>
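The second half of this fix, comparing against a shifted sys_addr instead of shifting the 32-bit register values left, can be sketched as below. Everything here is illustrative: the function names are hypothetical, and the 27-bit granularity (`DCT_ADDR_SHIFT`) is an assumption made for the example rather than a value taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define DCT_ADDR_SHIFT 27 /* assumption: base/limit fields count 128 MiB units */

/* Wrapping form: the 32-bit fields overflow when scaled up to byte addresses. */
static bool in_dct_range_wrapping(uint64_t sys_addr, uint32_t dct_base,
                                  uint32_t dct_limit)
{
    uint32_t base = dct_base << DCT_ADDR_SHIFT;
    uint32_t limit = dct_limit << DCT_ADDR_SHIFT;

    return sys_addr >= base && sys_addr <= limit;
}

/* Fixed form: scale the 64-bit address down and compare like with like. */
static bool in_dct_range(uint64_t sys_addr, uint32_t dct_base,
                         uint32_t dct_limit)
{
    uint64_t unit = sys_addr >> DCT_ADDR_SHIFT;

    return unit >= dct_base && unit <= dct_limit;
}
```

With a base of 16 units and a limit of 48 units, an address at 4 GiB lies inside the range, but the wrapping form rejects it because the scaled limit no longer fits in 32 bits.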
* amd64_edac: Correct erratum 505 range (Borislav Petkov; 2013-08-27; 1 file; -4/+4)
| | | | | | | | Basically we want to cover all 0x0-0xf models, i.e. Orochi and later. Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de>
* Merge tag 'edac_for_3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras (Ingo Molnar; 2013-08-15; 1 file; -1/+8)
|\
| | Pull RAS/EDAC updates from Boris Petkov:
| |
| | "An amd64_edac fix for single channel configurations + trivial cleanups courtesy of Jingoo Han."
| |
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * amd64_edac: Fix single-channel setups (Borislav Petkov; 2013-07-29; 1 file; -1/+8)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It can happen that configurations are running in a single-channel mode even with a dual-channel memory controller, by, say, putting the DIMMs only on the one channel and leaving the other empty. This causes a problem in init_csrows which implicitly assumes that when the second channel is enabled, i.e. channel 1, the struct dimm hierarchy will be present. Which is not. So always allocate two channels unconditionally. This provides for the nice side effect that the data structures are initialized so some day, when memory hotplug is supported, it should just work out of the box when all of a sudden a second channel appears. Reported-and-tested-by: Roger Leigh <rleigh@debian.org> Signed-off-by: Borislav Petkov <bp@suse.de>
* | amd64_edac: Get rid of boot_cpu_data accesses (Borislav Petkov; 2013-08-12; 1 file; -47/+43)
| | | | | | | | | | | | | | | | | | Now that we cache (family, model, stepping) locally, use them instead of boot_cpu_data. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>
* | amd64_edac: Add ECC decoding support for newer F15h models (Aravind Gopalakrishnan; 2013-08-12; 1 file; -30/+216)
| | On newer models, support has been included for up to 4 DCTs; however, only DCT0 and DCT3 are currently configured (cf. BKDG Section 2.10). Also, the algorithm for routing DRAM requests is different for F15h M30h. Thus it is cleaner to use a brand new function rather than adding quirks to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM Routing Requests" in the BKDG for further info.
| |
| | Tested on Fam15h M30h with ECC turned on using the mce_amd_inj facility and verified to be functionally correct.
| |
| | While at it, verify whether the erratum workarounds for E505 and E637 still hold. From email conversations within AMD, the current status of the errata is:
| |
| | * Erratum 505: fixed in model 0x1, stepping 0x1 and later.
| | * Erratum 637: not fixed.
| |
| | Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
| | [ Cleanups, corrections ]
| | Signed-off-by: Borislav Petkov <bp@suse.de>
|/
* amd64_edac: Add Family 16h support (Aravind Gopalakrishnan; 2013-04-19; 1 file; -1/+64)
| | | | | | | | | | | Add code to handle DRAM ECC errors decoding for Fam16h. Tested on Fam16h with ECC turned on using the mce_amd_inj facility and works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Boris: cleanups and clarifications ] Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC: Merge mci.mem_is_per_rank with mci.csbased (Mauro Carvalho Chehab; 2013-03-16; 1 file; -1/+0)
| Both mci.mem_is_per_rank and mci.csbased denote the same thing: the memory controller is csrows based. Merge both fields into one.
|
| There's no need for the driver to actually fill it, as the core detects it by checking if one of the layers has the csrows type as part of the memory hierarchy:
|
|   if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
|           per_rank = true;
|
| Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
| Signed-off-by: Borislav Petkov <bp@suse.de>
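The detection quoted in this commit message amounts to a scan over the controller's layer descriptions. A self-contained sketch, assuming simplified stand-in types: `enum layer_type`, `struct layer` and `is_csrow_based()` are hypothetical and only loosely mirror the kernel's names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the EDAC layer types; not the kernel definitions. */
enum layer_type {
    LAYER_BRANCH,
    LAYER_CHANNEL,
    LAYER_SLOT,
    LAYER_CHIP_SELECT,
};

struct layer {
    enum layer_type type;
    unsigned int size;
};

/* A controller is treated as csrow-based if any layer is a chip select. */
static bool is_csrow_based(const struct layer *layers, size_t n_layers)
{
    for (size_t i = 0; i < n_layers; i++) {
        if (layers[i].type == LAYER_CHIP_SELECT)
            return true;
    }
    return false;
}
```

Because the answer is derived from the layer list, a driver has no separate flag to keep in sync, which is the point of merging the two fields.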
* amd64_edac: Correct DIMM sizes (Mauro Carvalho Chehab; 2013-03-16; 1 file; -5/+9)
| | | | | | | | | | | We were filling the csrow size with a wrong value. 16a528ee3975 ("EDAC: Fix csrow size reported in sysfs") tried to address the issue. It fixed the report with the old API but not with the new one. Correct it for the new API too. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [ make it a per-csrow accounting regardless of ->channel_count ] Signed-off-by: Borislav Petkov <bp@suse.de>
* Merge tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp (Linus Torvalds; 2013-02-20; 1 file; -105/+0)
|\
| | Pull EDAC updates from Borislav Petkov:
| |  "Mostly AMD's side of EDAC. It is basically a new family enablement stuff: AMD F16h MCE decoding enablement from Jacob Shin. The rest is trivial cleanups."
| |
| | * tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
| |   mpc85xx_edac: Fix typo
| |   EDAC, MCE, AMD: Remove unneeded exports
| |   EDAC, MCE, AMD: Add MCE decoding support for Family 16h
| |   EDAC, MCE, AMD: Make MC2 decoding per-family
| |   amd64_edac: Remove dead code
| * amd64_edac: Remove dead code (Borislav Petkov; 2013-01-22; 1 file; -105/+0)
| | | | | | | | | | | | | | | | 5e2af0c09e60 ("edac: Don't initialize csrow's first_page & friends when not needed") removed useless initialization of variables but left in the functions which did that. They're unused now so drop them. Signed-off-by: Borislav Petkov <bp@alien8.de>
* | amd64_edac: Fix type usage in NB IDs and memory ranges (Daniel J Blueman; 2013-01-10; 1 file; -13/+13)
| | | | | | | | | | | | | | | | | | | | | | Use appropriate types for northbridge IDs and memory ranges. Mark immutable data const and keep within compilation unit on related structures. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1354265060-22956-2-git-send-email-daniel@numascale-asia.com [Boris: Drop arg change to node_to_amd_nb] Signed-off-by: Borislav Petkov <bp@alien8.de>
* | amd64_edac: Fix PCI function lookup (Daniel J Blueman; 2013-01-10; 1 file; -34/+38)
| | | | | | | | | | | | | | | | | | | | | | | | Fix locating sibling memory controller PCI functions by using the correct PCI domain and use a northbridge descriptor only if found. We need to at least warn if it wasn't found so that it gets fixed and we don't go off with wrong results. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1354265060-22956-1-git-send-email-daniel@numascale-asia.com [Boris: remove wrong comment, sanitize code and warn if NB desc lookup fails] Signed-off-by: Borislav Petkov <bp@alien8.de>
* | x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id (Daniel J Blueman; 2013-01-10; 1 file; -2/+3)
| | | | | | | | | | | | | | | | | | Change amd_get_nb_id to return u16 to support >255 memory controllers, and related consistency fixes. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1353997932-8475-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Borislav Petkov <bp@alien8.de>
* | x86, AMD, NB: Add multi-domain support (Daniel J Blueman; 2013-01-10; 1 file; -3/+3)
|/ | | | | | | | | | | Fix get_node_id to match northbridge IDs from the array of detected ones, allowing multi-server support such as with Numascale's NumaConnect, renaming to 'amd_get_node_id' for consistency. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com [Boris: shorten lines to fit 80 cols] Signed-off-by: Borislav Petkov <bp@alien8.de>
* Drivers: edac: remove __dev* attributes. (Greg Kroah-Hartman; 2013-01-03; 1 file; -4/+4)
| | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Olof Johansson <olof@lixom.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* EDAC: Fix csrow size reported in sysfs (Borislav Petkov; 2012-11-28; 1 file; -0/+1)
| | | | | | | | | | On csrow-based memory controllers, we combine the csrow size from both channels and there's no need to do that again in csrow_size_show which leads to double the size of a csrow. Fix it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>