summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/include/asm
Commit message (Collapse)AuthorAgeFilesLines
* powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIESMichael Ellerman2018-08-071-4/+3
| | | | | | To just EXCEPTION_RELON_PROLOG(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIES_1Michael Ellerman2018-08-071-10/+10
| | | | | | | | | | The EXCEPTION_RELON_PROLOG_PSERIES_1() macro does the same job as EXCEPTION_PROLOG_2 (which we just recently created), except for "RELON" (relocation on) exceptions. So rename it as such. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Remove PSERIES from the NORI macrosMichael Ellerman2018-08-071-5/+5
| | | | Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES_1 to EXCEPTION_PROLOG_2Michael Ellerman2018-08-071-10/+10
| | | | | | | | | | | | | As with the other patches in this series, we are removing the "PSERIES" from the name as it's no longer meaningful. In this case it's not simply a case of removing the "PSERIES" as that would result in a clash with the existing EXCEPTION_PROLOG_1. Instead we name this one EXCEPTION_PROLOG_2, as it's usually used in sequence after 0 and 1. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES_OOL to STD_RELON_EXCEPTION_OOLMichael Ellerman2018-08-072-2/+2
| | | | Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES to STD_RELON_EXCEPTIONMichael Ellerman2018-08-072-2/+2
| | | | Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Rename STD_EXCEPTION_PSERIES_OOL to STD_EXCEPTION_OOLMichael Ellerman2018-08-072-2/+2
| | | | Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Rename STD_EXCEPTION_PSERIES to STD_EXCEPTIONMichael Ellerman2018-08-072-2/+2
| | | | | | | | | | | | | | The "PSERIES" in STD_EXCEPTION_PSERIES is to differentiate the macros from the legacy iSeries versions, which are called STD_EXCEPTION_ISERIES. It is not anything to do with pseries vs powernv or powermac etc. We removed the legacy iSeries code in 2012, in commit 8ee3e0d69623x ("powerpc: Remove the main legacy iSerie platform code"). So remove "PSERIES" from the macros. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_RELON_PROLOG_PSERIES()Michael Ellerman2018-08-071-2/+1
| | | | | | | | | EXCEPTION_RELON_PROLOG_PSERIES() only has two users, STD_RELON_EXCEPTION_PSERIES() and STD_RELON_EXCEPTION_HV() both of which "call" SET_SCRATCH0(), so just move SET_SCRATCH0() into EXCEPTION_RELON_PROLOG_PSERIES(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES()Michael Ellerman2018-08-071-2/+1
| | | | | | | | EXCEPTION_PROLOG_PSERIES() only has two users, STD_EXCEPTION_PSERIES() and STD_EXCEPTION_HV() both of which "call" SET_SCRATCH0(), so just move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/lib: Implement strlen() in assembly for PPC32Christophe Leroy2018-08-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic implementation of strlen() reads strings byte per byte. This patch implements strlen() in assembly based on a read of entire words, in the same spirit as what some other arches and glibc do. On a 8xx the time spent in strlen is reduced by 3/4 for long strings. strlen() selftest on an 8xx provides the following values: Before the patch (ie with the generic strlen() in lib/string.c): len 256 : time = 1.195055 len 016 : time = 0.083745 len 008 : time = 0.046828 len 004 : time = 0.028390 After the patch: len 256 : time = 0.272185 ==> 78% improvment len 016 : time = 0.040632 ==> 51% improvment len 008 : time = 0.033060 ==> 29% improvment len 004 : time = 0.029149 ==> 2% degradation On a 832x: Before the patch: len 256 : time = 0.236125 len 016 : time = 0.018136 len 008 : time = 0.011000 len 004 : time = 0.007229 After the patch: len 256 : time = 0.094950 ==> 60% improvment len 016 : time = 0.013357 ==> 26% improvment len 008 : time = 0.010586 ==> 4% improvment len 004 : time = 0.008784 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: Defer the logging of rtas error to irq work queue.Mahesh Salgaonkar2018-08-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rtas_log_buf is a buffer to hold RTAS event data that are communicated to kernel by hypervisor. This buffer is then used to pass RTAS event data to user through proc fs. This buffer is allocated from vmalloc (non-linear mapping) area. On Machine check interrupt, register r3 points to RTAS extended event log passed by hypervisor that contains the MCE event. The pseries machine check handler then logs this error into rtas_log_buf. The rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a page fault (vector 0x300) while accessing it. Since machine check interrupt handler runs in NMI context we can not afford to take any page fault. Page faults are not honored in NMI context and causes kernel panic. Apart from that, as Nick pointed out, pSeries_log_error() also takes a spin_lock while logging error which is not safe in NMI context. It may endup in deadlock if we get another MCE before releasing the lock. Fix this by deferring the logging of rtas error to irq work queue. Current implementation uses two different buffers to hold rtas error log depending on whether extended log is provided or not. This makes bit difficult to identify which buffer has valid data that needs to logged later in irq work. Simplify this using single buffer, one per paca, and copy rtas log to it irrespective of whether extended log is provided or not. Allocate this buffer below RMA region so that it can be accessed in real mode mce handler. Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt") Cc: stable@vger.kernel.org # v4.14+ Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/xive: Remove xive_kexec_teardown_cpu()Benjamin Herrenschmidt2018-08-071-1/+0
| | | | | | | It's identical to xive_teardown_cpu() so just use the latter Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s/radix: tlb do not flush on page size when fullmmNicholas Piggin2018-08-071-0/+3
| | | | | | | | | When the mm is being torn down there will be a full PID flush so there is no need to flush the TLB on page size changes. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Add support to enable sensor groupsShilpasri G Bhat2018-07-312-0/+3
| | | | | | | | | | Adds support to enable/disable a sensor group at runtime. This can be used to select the sensor groups that needs to be copied to main memory by OCC. Sensor groups like power, temperature, current, voltage, frequency, utilization can be enabled/disabled at runtime. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powernv/cpuidle: Use parsed device tree values for cpuidle_initAkshay Adiga2018-07-311-0/+2
| | | | | | | | | | | Export pnv_idle_states and nr_pnv_idle_states so that its accessible to cpuidle driver. Use properties from pnv_idle_states structure for powernv cpuidle_init. Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powernv/cpuidle: Parse dt idle properties into global structureAkshay Adiga2018-07-311-0/+11
| | | | | | | | | | | | | | Device-tree parsing happens twice, once while deciding idle state to be used for hotplug and once during cpuidle init. Hence, parsing the device tree and caching it will reduce code duplication. Parsing code has been moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition to the properties in the device tree the number of available states is also required. Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Don't report PUDs as memory leaks when using kmemleakMichael Ellerman2018-07-301-2/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Paul Menzel reported that kmemleak was producing reports such as: unreferenced object 0xc0000000f8b80000 (size 16384): comm "init", pid 1, jiffies 4294937416 (age 312.240s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d997deb7>] __pud_alloc+0x80/0x190 [<0000000087f2e8a3>] move_page_tables+0xbac/0xdc0 [<00000000091e51c2>] shift_arg_pages+0xc0/0x210 [<00000000ab88670c>] setup_arg_pages+0x22c/0x2a0 [<0000000060871529>] load_elf_binary+0x41c/0x1648 [<00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280 [<0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940 [<000000005f953a6e>] sys_execve+0x58/0x70 [<000000009700a858>] system_call+0x5c/0x70 Indicating that a PUD was being leaked. However what's really happening is that kmemleak is not able to recognise the references from the PGD to the PUD, because they are not fully qualified pointers. We can confirm that in xmon, eg: Find the task struct for pid 1 "init": 0:mon> P task_struct ->thread.ksp PID PPID S P CMD c0000001fe7c0000 c0000001fe803960 1 0 S 13 systemd Dump virtual address 0 to find the PGD: 0:mon> dv 0 c0000001fe7c0000 pgd @ 0xc0000000f8b01000 Dump the memory of the PGD: 0:mon> d c0000000f8b01000 c0000000f8b01000 00000000f8b90000 0000000000000000 |................| c0000000f8b01010 0000000000000000 0000000000000000 |................| c0000000f8b01020 0000000000000000 0000000000000000 |................| c0000000f8b01030 0000000000000000 00000000f8b80000 |................| ^^^^^^^^^^^^^^^^ There we can see the reference to our supposedly leaked PUD. But because it's missing the leading 0xc, kmemleak won't recognise it. We can confirm it's still in use by translating an address that is mapped via it: 0:mon> dv 7fff94000000 c0000001fe7c0000 pgd @ 0xc0000000f8b01000 pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <-- pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000 pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000 ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386 Maps physical address = 0x00000001d5ce0000 Flags = Accessed Dirty Read Write The fix is fairly simple. We need to tell kmemleak to ignore PUD allocations and never report them as leaks. We can also tell it not to scan the PGD, because it will never find pointers in there. However it will still notice if we allocate a PGD and then leak it. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: split asm/tlbflush.hChristophe Leroy2018-07-304-81/+94
| | | | | | | | | | Split asm/tlbflush.h into: asm/nohash/tlbflush.h asm/book3s/32/tlbflush.h asm/book3s/64/tlbflush.h (already existing) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: remove unnecessary inclusion of asm/tlbflush.hChristophe Leroy2018-07-302-2/+1
| | | | | | | | | | asm/tlbflush.h is only needed for: - using functions xxx_flush_tlb_xxx() - using MMU_NO_CONTEXT - including asm-generic/pgtable.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/44x: remove page.h from mmu-44x.hChristophe Leroy2018-07-301-5/+4
| | | | | | | mmu-44x.h doesn't need asm/page.h if PAGE_SHIFT are replaced by CONFIG_PPC_XX_PAGES Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/nohash: fix hash related comments in pgtable.hChristophe Leroy2018-07-302-18/+4
| | | | | Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: fix includes in asm/processor.hChristophe Leroy2018-07-302-3/+3
| | | | | | | Remove superflous includes and add missing ones Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/book3s: Remove PPC_PIN_SIZEChristophe Leroy2018-07-302-6/+1
| | | | | | | PPC_PIN_SIZE is specific to the 44x and is defined in mmu.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: declare set_breakpoint() staticChristophe Leroy2018-07-301-1/+0
| | | | | | | set_breakpoint() is only used in process.c so make it static Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: remove superflous inclusions of asm/fixmap.hChristophe Leroy2018-07-301-2/+0
| | | | | | | Files not using fixmap consts or functions don't need asm/fixmap.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: clean inclusions of asm/feature-fixups.hChristophe Leroy2018-07-309-4/+6
| | | | | | | | files not using feature fixup don't need asm/feature-fixups.h files using feature fixup need asm/feature-fixups.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: clean the inclusion of stringify.hChristophe Leroy2018-07-305-3/+3
| | | | | | | Only include linux/stringify.h is files using __stringify() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: move ASM_CONST and stringify_in_c() into asm-const.hChristophe Leroy2018-07-3028-24/+43
| | | | | | | | | This patch moves ASM_CONST() and stringify_in_c() into dedicated asm-const.h, then cleans all related inclusions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: asm-compat.h should include asm-const.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/405: move PPC405_ERR77 in asm-405.hChristophe Leroy2018-07-309-15/+25
| | | | | Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: remove unneeded inclusions of cpu_has_feature.hChristophe Leroy2018-07-303-3/+0
| | | | | | | Files not using cpu_has_feature() don't need cpu_has_feature.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: remove kdump.h from page.hChristophe Leroy2018-07-301-1/+0
| | | | | | | page.h doesn't need kdump.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: implement opal_put_chars_atomicNicholas Piggin2018-07-241-0/+1
| | | | | | | | | | | | The RAW console does not need writes to be atomic, so relax opal_put_chars to be able to do partial writes, and implement an _atomic variant which does not take a spinlock. This API is used in xmon, so the less locking that is used, the better chance there is that a crash can be debugged. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Implement and use opal_flush_consoleNicholas Piggin2018-07-241-0/+1
| | | | | | | | | | | | | | | | | | | A new console flushing firmware API was introduced to replace event polling loops, and implemented in opal-kmsg with affddff69c55e ("powerpc/powernv: Add a kmsg_dumper that flushes console output on panic"), to flush the console in the panic path. The OPAL console driver has other situations where interrupts are off and it needs to flush the console synchronously. These still use a polling loop. So move the opal-kmsg flush code to opal_flush_console, and use the new function in opal-kmsg and opal_put_chars. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64: enhance memcmp() with VMX instruction for long bytes comparisionSimon Guo2018-07-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch add VMX primitives to do memcmp() in case the compare size is equal or greater than 4K bytes. KSM feature can benefit from this. Test result with following test program(replace the "^>" with ""): ------ ># cat tools/testing/selftests/powerpc/stringloops/memcmp.c >#include <malloc.h> >#include <stdlib.h> >#include <string.h> >#include <time.h> >#include "utils.h" >#define SIZE (1024 * 1024 * 900) >#define ITERATIONS 40 int test_memcmp(const void *s1, const void *s2, size_t n); static int testcase(void) { char *s1; char *s2; unsigned long i; s1 = memalign(128, SIZE); if (!s1) { perror("memalign"); exit(1); } s2 = memalign(128, SIZE); if (!s2) { perror("memalign"); exit(1); } for (i = 0; i < SIZE; i++) { s1[i] = i & 0xff; s2[i] = i & 0xff; } for (i = 0; i < ITERATIONS; i++) { int ret = test_memcmp(s1, s2, SIZE); if (ret) { printf("return %d at[%ld]! should have returned zero\n", ret, i); abort(); } } return 0; } int main(void) { return test_harness(testcase, "memcmp"); } ------ Without this patch (but with the first patch "powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()." in the series): 4.726728762 seconds time elapsed ( +- 3.54%) With VMX patch: 4.234335473 seconds time elapsed ( +- 2.63%) There is ~+10% improvement. Testing with unaligned and different offset version (make s1 and s2 shift random offset within 16 bytes) can archieve higher improvement than 10%.. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: add vcmpequd/vcmpequb ppc instruction macroSimon Guo2018-07-241-0/+11
| | | | | | | | | | Some old tool chains don't know about instructions like vcmpequd. This patch adds .long macro for vcmpequd and vcmpequb, which is a preparation to optimize ppc64 memcmp with VMX instructions. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm/hash: Add hpte_get_old_v and use that instead of opencodingAneesh Kumar K.V2018-07-241-0/+10
| | | | | | | No functional change Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP configAneesh Kumar K.V2018-07-241-3/+10
| | | | | | | | We do this only with VMEMMAP config so that our page_to_[nid/section] etc are not impacted. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: NMI IPI make NMI IPIs fully sychronousNicholas Piggin2018-07-241-1/+0
| | | | | | | | | | | | | | | There is an asynchronous aspect to smp_send_nmi_ipi. The caller waits for all CPUs to call in to the handler, but it does not wait for completion of the handler. This is a needless complication, so remove it and always wait synchronously. The synchronous wait allows the caller to easily time out and clear the wait for completion (zero nmi_ipi_busy_count) in the case of badly behaved handlers. This would have prevented the recent smp_send_stop NMI IPI bug from causing the system to hang. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closelyNicholas Piggin2018-07-241-4/+6
| | | | | | | | | | | | | | | | | | | When the masked interrupt handler clears MSR[EE] for an interrupt in the PACA_IRQ_MUST_HARD_MASK set, it does not set PACA_IRQ_HARD_DIS. This makes them get out of synch. With that taken into account, it's only low level irq manipulation (and interrupt entry before reconcile) where they can be out of synch. This makes the code less surprising. It also allows the IRQ replay code to rely on the IRQ_HARD_DIS value and not have to mtmsrd again in this case (e.g., for an external interrupt that has been masked). The bigger benefit might just be that there is not such an element of surprise in these two bits of state. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pkeys: make protection key 0 less specialRam Pai2018-07-241-6/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Applications need the ability to associate an address-range with some key and latter revert to its initial default key. Pkey-0 comes close to providing this function but falls short, because the current implementation disallows applications to explicitly associate pkey-0 to the address range. Lets make pkey-0 less special and treat it almost like any other key. Thus it can be explicitly associated with any address range, and can be freed. This gives the application more flexibility and power. The ability to free pkey-0 must be used responsibily, since pkey-0 is associated with almost all address-range by default. Even with this change pkey-0 continues to be slightly more special from the following point of view. (a) it is implicitly allocated. (b) it is the default key assigned to any address-range. (c) its permissions cannot be modified by userspace. NOTE: (c) is specific to powerpc only. pkey-0 is associated by default with all pages including kernel pages, and pkeys are also active in kernel mode. If any permission is denied on pkey-0, the kernel running in the context of the application will be unable to operate. Tested on powerpc. Signed-off-by: Ram Pai <linuxram@us.ibm.com> [mpe: Drop #define PKEY_0 0 in favour of plain old 0] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pkeys: key allocation/deallocation must not change pkey registersRam Pai2018-07-241-11/+0
| | | | | | | | | | | | | | | | | | | | Key allocation and deallocation has the side effect of programming the UAMOR/AMR/IAMR registers. This is wrong, since its the responsibility of the application and not that of the kernel, to modify the permission on the key. Do not modify the pkey registers at key allocation/deallocation. This patch also fixes a bug where a sys_pkey_free() resets the UAMOR bits of the key, thus making its permissions unmodifiable from user space. Later if the same key gets reallocated from a different thread this thread will no longer be able to change the permissions on the key. Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge branch 'topic/ppc-kvm' into nextMichael Ellerman2018-07-197-77/+20
|\ | | | | | | | | | | | | | | | | | | | | | | | | Merge in some commits we're sharing with the KVM tree. I manually propagated the change from commit d3d4ffaae439 ("powerpc/powernv/ioda2: Reduce upper limit for DMA window size") into pci-ioda-tce.c. Conflicts: arch/powerpc/include/asm/cputable.h arch/powerpc/platforms/powernv/pci-ioda.c arch/powerpc/platforms/powernv/pci.h
| * powerpc/powernv/ioda: Allocate indirect TCE levels on demandAlexey Kardashevskiy2018-07-161-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment we allocate the entire TCE table, twice (hardware part and userspace translation cache). This normally works as we normally have contigous memory and the guest will map entire RAM for 64bit DMA. However if we have sparse RAM (one example is a memory device), then we will allocate TCEs which will never be used as the guest only maps actual memory for DMA. If it is a single level TCE table, there is nothing we can really do but if it a multilevel table, we can skip allocating TCEs we know we won't need. This adds ability to allocate only first level, saving memory. This changes iommu_table::free() to avoid allocating of an extra level; iommu_table::set() will do this when needed. This adds @alloc parameter to iommu_table::exchange() to tell the callback if it can allocate an extra level; the flag is set to "false" for the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns H_TOO_HARD. This still requires the entire table to be counted in mm::locked_vm. To be conservative, this only does on-demand allocation when the usespace cache table is requested which is the case of VFIO. The example math for a system replicating a powernv setup with NVLink2 in a guest: 16GB RAM mapped at 0x0 128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000 the table to cover that all with 64K pages takes: (((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB If we allocate only necessary TCE levels, we will only need: (((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect levels). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/powernv: Add indirect levels to it_userspaceAlexey Kardashevskiy2018-07-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to support sparse memory and therefore huge chunks of DMA windows do not need to be mapped. If a DMA window big enough to require 2 or more indirect levels, and a DMA window is used to map all RAM (which is a default case for 64bit window), we can actually save some memory by not allocation TCE for regions which we are not going to map anyway. The hardware tables alreary support indirect levels but we also keep host-physical-to-userspace translation array which is allocated by vmalloc() and is a flat array which might use quite some memory. This converts it_userspace from vmalloc'ed array to a multi level table. As the format becomes platform dependend, this replaces the direct access to it_usespace with a iommu_table_ops::useraddrptr hook which returns a pointer to the userspace copy of a TCE; future extension will return NULL if the level was not allocated. This should not change non-KVM handling of TCE tables and it_userspace will not be allocated for non-KVM tables. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * KVM: PPC: Make iommu_table::it_userspace big endianAlexey Kardashevskiy2018-07-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | We are going to reuse multilevel TCE code for the userspace copy of the TCE table and since it is big endian, let's make the copy big endian too. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/64s: Remove POWER9 DD1 supportNicholas Piggin2018-07-166-69/+11
| | | | | | | | | | | | | | | | | | | | | | POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc: Remove Power8 DD1 from cputableJoel Stanley2018-07-121-8/+7
| | | | | | | | | | | | | | | | | | This was added to support an early version of Power8 that did not have working doorbells. These machines were not publicly available, and all of the internal users have long since upgraded. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | Revert "powerpc/powernv: Add support for the cxl kernel api on the real phb"Alastair D'Silva2018-07-021-7/+0
| | | | | | | | | | | | | | | | | | | | Remove abandonned capi support for the Mellanox CX4. This reverts commit 4361b03430d685610e5feea3ec7846e8b9ae795f. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/eeh: Avoid misleading message "EEH: no capable adapters found"Mauro S. M. Rodrigues2018-07-021-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to recent refactoring in EEH in: commit b9fde58db7e5 ("powerpc/powernv: Rework EEH initialization on powernv") a misleading message was seen in the kernel message buffer: [ 0.108431] EEH: PowerNV platform initialized [ 0.589979] EEH: No capable adapters found This happened due to the removal of the initialization delay for powernv platform. Even though the EEH infrastructure for the devices is eventually initialized and still works just fine the eeh device probe step is postponed in order to assure the PEs are created. Later pnv_eeh_post_init does the probe devices job but at that point the message was already shown right after eeh_init flow. This patch introduces a new flag EEH_POSTPONED_PROBE to represent that temporary state and avoid the message mentioned above and showing the follow one instead: [ 0.107724] EEH: PowerNV platform initialized [ 4.844825] EEH: PCI Enhanced I/O Error Handling Enabled Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Tested-by:Venkat Rao B <vrbagal1@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
OpenPOWER on IntegriCloud