| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
| |
llvm-svn: 214863
|
| |
|
|
|
|
| |
for them.
llvm-svn: 214862
|
| |
|
|
|
|
|
|
|
|
|
| |
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
llvm-svn: 214838
|
| |
|
|
| |
llvm-svn: 214826
|
| |
|
|
| |
llvm-svn: 214825
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR. Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.
This patch and a companion patch for Clang correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end. Several test cases are also modified to reflect the
now-correct LLVM IR.
llvm-svn: 214800
|
| |
|
|
| |
llvm-svn: 214797
|
| |
|
|
| |
llvm-svn: 214784
|
| |
|
|
|
|
| |
information and update all callers. No functional change.
llvm-svn: 214781
|
| |
|
|
| |
llvm-svn: 214769
|
| |
|
|
|
|
| |
systems it represents.
llvm-svn: 214755
|
| |
|
|
| |
llvm-svn: 214754
|
| |
|
|
| |
llvm-svn: 214733
|
| |
|
|
| |
llvm-svn: 214731
|
| |
|
|
|
|
| |
Move the test cases for them into separate files.
llvm-svn: 214724
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments. As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.
This fixes another regression in llvmpipe when vector support is
enabled.
Reviewed by Bill Schmidt.
llvm-svn: 214718
|
| |
|
|
|
|
|
|
|
|
| |
I ran into some test failures where common code changed vector division
by constant into a multiply-high operation (MULHU). But these are not
implemented by the back-end, so we failed to recognize the insn.
Fixed by marking MULHU/MULHS as Expand for vector types.
llvm-svn: 214716
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch refactors code generation of vector comparisons.
This fixes a wrong code-gen bug for ISD::SETGE for floating-point types,
and improves generated code for vector comparisons in general.
Specifically, the patch moves all logic deciding how to implement vector
comparisons into getVCmpInst, which gets two extra boolean outputs
indicating to its caller whether its needs to swap the input operands
and/or negate the result of the comparison. Apart from implementing
these two modifications as directed by getVCmpInst, there is no need
to ever implement vector comparisons in any other manner; in particular,
there is never a need to perform two separate comparisons (e.g. one for
equal and one for greater-than, as code used to do before this patch).
Reviewed by Bill Schmidt.
llvm-svn: 214714
|
| |
|
|
| |
llvm-svn: 214640
|
| |
|
|
| |
llvm-svn: 214639
|
| |
|
|
|
|
|
|
|
| |
when let can do the same thing. Keep the 64bit variants as codegen-only.
While they have a different register class, the encoding is the same for
32bit and 64bit mode. Having both present would otherwise confuse the
disassembler.
llvm-svn: 214636
|
| |
|
|
|
|
|
|
|
| |
so that we can use it to get the old-style JIT out of the subtarget.
This code should be removed when the old-style JIT is removed
(imminently).
llvm-svn: 214560
|
| |
|
|
|
|
|
|
|
| |
Found by inspection while looking at PR20280: code would mark slots
in the parameter save area where a byval parameter is passed as
"immutable". This is not correct since code is allowed to modify
byval parameters in place in the parameter save area.
llvm-svn: 214517
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
regular loads
Altivec vector loads on PowerPC have an interesting property: They always load
from an aligned address (by rounding down the address actually provided if
necessary). In order to generate an actual unaligned load, you can generate two
load instructions, one with the original address, one offset by one vector
length, and use a special permutation to extract the bytes desired.
When this was originally implemented, I generated these two loads using regular
ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with
this:
The alignment of a load does not contribute to its identity, and SDNodes
are uniqued. So, imagine that we have some unaligned load, L1, that is not
aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned).
Further imagine that there had already existed a load (L1+16)(unaligned) with
the same chain operand as the load L1. When (L1+16)(aligned) is created as part
of the lowering of L1, this load *is* also the (L1+16)(unaligned) node, just
now marked as aligned (because the new alignment overwrites the old). But the
original users of (L1+16)(unaligned) now get the data intended for the
permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists
to get its own permutation-based expansion. This was PR19991.
A second potential problem has to do with the MMOs on these loads, which can be
used by AA during instruction scheduling to break chain-based dependencies. If
the new "aligned" loads get the MMO from the original unaligned load, this does
not represent the fact that it will load data from below the original address.
Normally, this would not matter, but this load might be combined with another
load pair for a previous vector, and then the dependency on the otherwise-
ignored lower bytes can matter.
To fix both problems, instead of generating the necessary loads using regular
ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are
provided with MMOs with a conservative address range.
Unfortunately, I no longer have a failing test case (since PR19991 was
reported, other changes in CodeGen have forced this bug back into hiding it
again). Nevertheless, this should fix the underlying problem.
llvm-svn: 214481
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When generating unaligned vector loads, we need to search for other loads or
stores nearby offset by one vector width. If we find one, then we know that we
can safely generate another aligned load at that address. Otherwise, we must
generate the next load using an offset of the vector width minus one byte (so
we don't read off the end of the allocation if the base unaligned address
happened to be aligned at runtime). We had previously done this using only
other vector loads and stores, but did not consider the PowerPC-specific vector
load/store intrinsics. Now we'll also consider vector intrinsics. By itself,
this change is a feature enhancement, but is a necessary step toward fixing the
underlying problem behind PR19991.
llvm-svn: 214469
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.
This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.
llvm-svn: 214449
|
| |
|
|
| |
llvm-svn: 214363
|
| |
|
|
| |
llvm-svn: 214354
|
| |
|
|
| |
llvm-svn: 214339
|
| |
|
|
| |
llvm-svn: 214332
|
| |
|
|
| |
llvm-svn: 214297
|
| |
|
|
|
|
| |
file. Add support for accessing SPRG 4 to 7 on BookE.
llvm-svn: 214295
|
| |
|
|
| |
llvm-svn: 214256
|
| |
|
|
| |
llvm-svn: 214250
|
| |
|
|
| |
llvm-svn: 214244
|
| |
|
|
| |
llvm-svn: 214238
|
| |
|
|
| |
llvm-svn: 214234
|
| |
|
|
| |
llvm-svn: 214196
|
| |
|
|
| |
llvm-svn: 214181
|
| |
|
|
| |
llvm-svn: 214147
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While LLVM now supports both ELFv1 and ELFv2 ABIs, their use is currently
hard-coded via the target triple: powerpc64-linux is always ELFv1, while
powerpc64le-linux is always ELFv2.
These are of course the most common scenarios, but in principle it is
possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on
little-endian systems (and GCC does support that), and there are some
special use cases for that (e.g. certain Linux kernel versions could
only be built using ELFv1 on LE).
This patch implements the LLVM side of supporting this. As precedent
on other platforms suggests, ABI options are passed to the back-end as
features. Thus, this patch implements two features "elfv1" and "elfv2"
that select the desired ABI if present. (If not, the LLVM uses the
same default rules as now.)
llvm-svn: 214072
|
| |
|
|
|
|
|
|
|
|
| |
Rename to allowsMisalignedMemoryAccess.
On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.
llvm-svn: 214055
|
| |
|
|
|
|
| |
Patch by Justin Hibbits!
llvm-svn: 213960
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because the PowerPC vmrgh* and vmrgl* instructions have a built-in
big-endian bias, it is necessary to swap their inputs in little-endian
mode when using them to implement a vector shuffle. This was
previously missed in the vector LE implementation.
There was already logic to distinguish between unary and "normal"
vmrg* vector shuffles, so this patch extends that logic to use a third
option: "swapped" vmrg* vector shuffles that are used for little
endian in place of the "normal" ones.
I've updated the vec-shuffle-le.ll test to check for the expected
register ordering on the generated instructions.
This bug was discovered when testing the LE and ELFv2 patches for
safety if they were backported to 3.4. A different vectorization
decision was made in 3.4 than on mainline trunk, and that exposed the
problem. I've verified this fix takes care of that issue.
llvm-svn: 213915
|
| |
|
|
| |
llvm-svn: 213899
|
| |
|
|
| |
llvm-svn: 213856
|
| |
|
|
| |
llvm-svn: 213832
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds infrastructure support for passing array types
directly. These can be used by the front-end to pass aggregate
types (coerced to an appropriate array type). The details of the
array type being used inform the back-end about ABI-relevant
properties. Specifically, the array element type encodes:
- whether the parameter should be passed in FPRs, VRs, or just
GPRs/stack slots (for float / vector / integer element types,
respectively)
- what the alignment requirements of the parameter are when passed in
GPRs/stack slots (8 for float / 16 for vector / the element type
size for integer element types) -- this corresponds to the
"byval align" field
Using the infrastructure provided by this patch, a companion patch
to clang will enable two features:
- In the ELFv2 ABI, pass (and return) "homogeneous" floating-point
or vector aggregates in FPRs and VRs (this is similar to the ARM
homogeneous aggregate ABI)
- As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates
that fit fully in registers without using the "byval" mechanism
The patch uses the functionArgumentNeedsConsecutiveRegisters callback
to encode that special treatment is required for all directly-passed
array types. The isInConsecutiveRegs / isInConsecutiveRegsLast bits set
as a results are then used to implement the required size and alignment
rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc.
As a related change, the ABI routines have to be modified to support
passing floating-point types in GPRs. This is necessary because with
homogeneous aggregates of 4-byte float type we can now run out of FPRs
*before* we run out of the 64-byte argument save area that is shadowed
by GPRs. Any extra floating-point arguments that no longer fit in FPRs
must now be passed in GPRs until we run out of those too.
Note that there was already code to pass floating-point arguments in
GPRs used with vararg parameters, which was done by writing the argument
out to the argument save area first and then reloading into GPRs. The
patch re-implements this, however, in favor of code packing float arguments
directly via extension/truncation, BITCAST, and BUILD_PAIR operations.
This is required to support the ELFv2 ABI, since we cannot unconditionally
write to the argument save area (which the caller might not have allocated).
The change does, however, affect ELFv1 varags routines too; but even here
the overall effect should be advantageous: Instead of loading the argument
into the FPR, then storing the argument to the stack slot, and finally
reloading the argument from the stack slot into a GPR, the new code now
just loads the argument into the FPR, and subsequently loads the argument
into the GPR (via BITCAST). That BITCAST might imply a save/reload from
a stack temporary (in which case we're no worse than before); but it
might be implemented more efficiently in some cases.
The final part of the patch enables up to 8 FPRs and VRs for argument
return in PPCCallingConv.td; this is required to support returning
ELFv2 homogeneous aggregates. (Note that this doesn't affect other ABIs
since LLVM wil only look for which register to use if the parameter is
marked as "direct" return anyway.)
Reviewed by Hal Finkel.
llvm-svn: 213493
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI
would represent a saved CR word (holding CR fields CR2, CR3, and CR4)
using just a single CFI record refering to CR2. In ELFv2 instead,
each of the CR fields is represented by its own CFI record. The
advantage is that the compiler can now chose to save just a single
(or two) CR fields instead of all of them, if those are the only ones
that actually need saving. That can lead to more efficient code using
mf(o)crf instead of the (slow) mfcr instruction.
Note that this patch does not (yet) implement this more efficient
code generation, but it does implement the part that is required to
be ABI compliant: creating multiple CFI records if multiple CR fields
are saved.
Reviewed by Hal Finkel.
llvm-svn: 213492
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by
eliminating two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be
present in certain cases (it remains mandatory in functions with
variable arguments, and functions that have any parameter that is
passed on the stack)
The following patch implements this required changes:
- reducing the linkage area, and associated relocation of the TOC save
slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
callers of these routines to pass in the isELFv2ABI flag).
- (partially) handling the case where the parameter save are is optional
This latter part requires some extra explanation: Currently, we still
always allocate the parameter save area when *calling* a function.
That is certainly always compliant with the ABI, but may cause code to
allocate stack unnecessarily. This can be addressed by a follow-on
optimization patch.
On the *callee* side, in LowerFormalArguments, we *must* track
correctly whether the ABI guarantees that the caller has allocated
the parameter save area for our use, and the patch does so. However,
there is one complication: the code that handles incoming "byval"
arguments will currently *always* write to the parameter save area,
because it has to force incoming register arguments to the stack since
it must return an *address* to implement the byval semantics.
To fix this, the patch changes the LowerFormalArguments code to write
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the argument save area in those cases where that area
is not present.
Reviewed by Hal Finkel.
llvm-svn: 213490
|