| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Some macros use RA where when RA=R0 the values is 0, so make this
the enforced mnemonic in the macro.
Idea suggested by Andreas Schwab.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
| |
R0 is special since it'll be 0.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Enforce the use of R0-R31 in macros where possible now we have all the
fixes in.
R0-R31 macros are removed here so that can't be used anymore. They
should not be defined anywhere.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
| |
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now have ___PPC_RA/B/S/T we can use it in some places. These are
places where we can't use the existing defines which will soon enforce
R0-R31 usage.
The macros being changed here are being used in inline asm, which
can't convert to enforce the R0-R31 usage.
bpf_jit uses a mix of both generated and non-generated with the same
code, so just convert all these to use the ___PPC_R versions which
won't enforce R usage later.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
| |
These are currently the same as __PPC_RA/B/S/T but we'll wrap them
soon.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
| |
We need to do this so we can enforce the name of a and b in called
macros PPC_RA/B later.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
| |
move lbz/stbciz to ppc-opcode.h.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
| |
We are going to use these later and convert r0 to %r0 etc.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This provides the low-level support for MMIO emulation in Book3S HV
guests. When the guest tries to map a page which is not covered by
any memslot, that page is taken to be an MMIO emulation page. Instead
of inserting a valid HPTE, we insert an HPTE that has the valid bit
clear but another hypervisor software-use bit set, which we call
HPTE_V_ABSENT, to indicate that this is an absent page. An
absent page is treated much like a valid page as far as guest hcalls
(H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
an absent HPTE doesn't need to be invalidated with tlbie since it
was never valid as far as the hardware is concerned.
When the guest accesses a page for which there is an absent HPTE, it
will take a hypervisor data storage interrupt (HDSI) since we now set
the VPM1 bit in the LPCR. Our HDSI handler for HPTE-not-present faults
looks up the hash table and if it finds an absent HPTE mapping the
requested virtual address, will switch to kernel mode and handle the
fault in kvmppc_book3s_hv_page_fault(), which at present just calls
kvmppc_hv_emulate_mmio() to set up the MMIO emulation.
This is based on an earlier patch by Benjamin Herrenschmidt, but since
heavily reworked.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An implementation of a code generator for BPF programs to speed up packet
filtering on PPC64, inspired by Eric Dumazet's x86-64 version.
Filter code is generated as an ABI-compliant function in module_alloc()'d mem
with stackframe & prologue/epilogue generated if required (simple filters don't
need anything more than an li/blr). The filter's local variables, M[], live in
registers. Supports all BPF opcodes, although "complicated" loads from negative
packet offsets (e.g. SKF_LL_OFF) are not yet supported.
There are a couple of further optimisations left for future work; many-pass
assembly with branch-reach reduction and a register allocator to push M[]
variables into volatile registers would improve the code quality further.
This currently supports big-endian 64-bit PowerPC only (but is fairly simple
to port to PPC32 or LE!).
Enabled in the same way as x86-64:
echo 1 > /proc/sys/net/core/bpf_jit_enable
Or, enabled with extra debug output:
echo 2 > /proc/sys/net/core/bpf_jit_enable
Signed-off-by: Matt Evans <matt@ozlabs.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The DSCR (aka Data Stream Control Register) is supported on some
server PowerPC chips and allow some control over the prefetch
of data streams.
This patch allows the value to be specified per thread by emulating
the corresponding mfspr and mtspr instructions. Children of such
threads inherit the value. Other threads use a default value that
can be specified in sysfs - /sys/devices/system/cpu/dscr_default.
If a thread starts with non default value in the sysfs entry,
all children threads inherit this non default value even if
the sysfs value is changed later.
Signed-off-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
| |
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
| |
Wakeup comes from the system reset handler with a potential loss of
the non-hypervisor CPU state. We save the non-volatile state on the
stack and a pointer to it in the PACA, which the system reset handler
uses to restore things
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
| |
The popcnt instructions went into binutils relatively recently. As with a
number of other instructions, create macros and hardcode them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extends the emulate_step() function to handle a large proportion
of the Book I instructions implemented on current 64-bit server
processors. The aim is to handle all the load and store instructions
used in the kernel, plus all of the instructions that appear between
l[wd]arx and st[wd]cx., so this handles the Altivec/VMX lvx and stvx
and the VSX lxv2dx and stxv2dx instructions (implemented in POWER7).
The new code can emulate user mode instructions, and checks the
effective address for a load or store if the saved state is for
user mode. It doesn't handle little-endian mode at present.
For floating-point, Altivec/VMX and VSX instructions, it checks
that the saved MSR has the enable bit for the relevant facility
set, and if so, assumes that the FP/VMX/VSX registers contain
valid state, and does loads or stores directly to/from the
FP/VMX/VSX registers, using assembly helpers in ldstfp.S.
Instructions supported now include:
* Loads and stores, including some but not all VMX and VSX instructions,
and lmw/stmw
* Atomic loads and stores (l[dw]arx, st[dw]cx.)
* Arithmetic instructions (add, subtract, multiply, divide, etc.)
* Compare instructions
* Rotate and mask instructions
* Shift instructions
* Logical instructions (and, or, xor, etc.)
* Condition register logical instructions
* mtcrf, cntlz[wd], exts[bhw]
* isync, sync, lwsync, ptesync, eieio
* Cache operations (dcbf, dcbst, dcbt, dcbtst)
The overflow-checking arithmetic instructions are not included, but
they appear not to be ever used in C code.
This uses decimal values for the minor opcodes in the switch statements
because that is what appears in the Power ISA specification, thus it is
easier to check that they are correct if they are in decimal.
If this is used to single-step an instruction where a data breakpoint
interrupt occurred, then there is the possibility that the instruction
is a lwarx or ldarx. In that case we have to be careful not to lose the
reservation until we get to the matching st[wd]cx., or we'll never make
forward progress. One alternative is to try to arrange that we can
return from interrupts and handle data breakpoint interrupts without
losing the reservation, which means not using any spinlocks, mutexes,
or atomic ops (including bitops). That seems rather fragile. The
other alternative is to emulate the larx/stcx and all the instructions
in between. This is why this commit adds support for a wide range
of integer instructions.
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
|
|
|
|
|
|
|
|
|
| |
e500v1/v2 based chips will treat any reserved field being set in an
opcode as illegal. Thus always setting the hint in the opcode is
a bad idea.
Anton should be kept away from the powerpc opcode map.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|
|
|
|
|
|
| |
This patch implements the lwarx/ldarx hint bit for bit locks.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recent versions of the PowerPC architecture added a hint bit to the larx
instructions to differentiate between an atomic operation and a lock operation:
> 0 Other programs might attempt to modify the word in storage addressed by EA
> even if the subsequent Store Conditional succeeds.
>
> 1 Other programs will not attempt to modify the word in storage addressed by
> EA until the program that has acquired the lock performs a subsequent store
> releasing the lock.
To avoid a binutils dependency this patch create macros for the extended lwarx
format and uses it in the spinlock code. To test this change I used a simple
test case that acquires and releases a global pthread mutex:
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
On a 32 core POWER6, running 32 test threads we spend almost all our time in
the futex spinlock code:
94.37% perf [kernel] [k] ._raw_spin_lock
|
|--99.95%-- ._raw_spin_lock
| |
| |--63.29%-- .futex_wake
| |
| |--36.64%-- .futex_wait_setup
Which is a good test for this patch. The results (in lock/unlock operations per
second) are:
before: 1538203 ops/sec
after: 2189219 ops/sec
An improvement of 42%
A 32 core POWER7 improves even more:
before: 1279529 ops/sec
after: 2282076 ops/sec
An improvement of 78%
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
| |
This adds the opcode definitions to ppc-opcode.h for the two instructions
tlbivax and tlbsrx. as defined by Book3E 2.06
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the PowerPC 2.06 tlbie mnemonics and keeps backwards
compatibilty for CPUs before 2.06.
Only useful for bare metal systems.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
| |
Cleans up the VSX load/store instructions by moving them into
ppc-opcode.h.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
| |
Make macros more braces happy.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
| |
This reverts commit e9965577406a2148ade97b5e0ce7c448b4ba4ef6. Our HW
guys were able to fix this so it never sees the light of day.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|
|
|
|
|
|
|
| |
During the ISA 2.06 development the opcode for tlbilx changed and some
early implementations used to old opcode. Add support for a MMU_FTR
fixup to deal with this.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|
|
|
|
|
|
|
| |
The tlbilx opcode was not matching the Power ISA 2.06 arch spec.
The old opcode was an early suggested opcode that changed during the
2.06 architecture process.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|
Create a new header that becomes a single location for defining PowerPC
opcodes used by code that is either generationg instructions
at runtime (fixups, debug, etc.), emulating instructions, or just
compiling instructions old assemblers don't know about.
We currently don't handle the floating point emulation or alignment decode
as both are better handled by the specific decode support they already
have.
Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions
since older assemblers don't know about them.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|