| Commit message | Author | Age | Files | Lines |
| |
The fix for PR39865 took care of some of the handling for half precision
but it missed a number of issues that still exist. This patch fixes the
remaining issues that cause crashes in the PPC back end.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45776
Differential revision: https://reviews.llvm.org/D79283
(cherry picked from commit 1a493b0fa556a07c728862c3c3f70bfd8683bef0)
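A minimal LLVM IR sketch (editorial illustration, not from the original commit; the function name is made up) of the kind of half-precision operation this fix covers:
```
define double @half_to_double(half %h) {
entry:
  %d = fpext half %h to double
  ret double %d
}
```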
| |
Only PPC seems to be using it, and it only checks some simple cases without
distinguishing between FP types. Just switch to using LLT to simplify its use
from GlobalISel.
| |
We use the 'o' suffix to indicate record-form instructions
(as it is similar to the dot '.' in the mnemonic?).
This was fine before, as we did not support the XO-form.
However, with https://reviews.llvm.org/D66902,
we now have XO-form support.
It becomes confusing to still use 'o' for record form,
and it is weird to have something like 'Oo'.
This patch renames all 'o' instructions to use '_rec' instead.
It also renames `isDot` to `isRecordForm`.
Reviewed By: #powerpc, hfinkel, nemanjai, steven.zhang, lkail
Differential Revision: https://reviews.llvm.org/D70758
| |
semantics of ISD::BSWAP
The custom node PPCISD::XXREVERSE has exactly the same semantics as the generic node ISD::BSWAP.
We should clean it up, since we have combine rules for bswap in the base class but none for xxreverse.
Differential Revision: https://reviews.llvm.org/D70657
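A minimal LLVM IR sketch (editorial illustration, not from the original commit) of a per-element byte swap that now goes through the generic ISD::BSWAP path:
```
declare <8 x i16> @llvm.bswap.v8i16(<8 x i16>)

; Byte-swaps each i16 element of the vector.
define <8 x i16> @swap_elements(<8 x i16> %v) {
entry:
  %r = call <8 x i16> @llvm.bswap.v8i16(<8 x i16> %v)
  ret <8 x i16> %r
}
```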
| |
Extends the descriptor-based indirect call support for 32-bit codegen,
and enables indirect calls for AIX.
In-depth Description:
In a function descriptor based ABI, a function pointer points at a
descriptor structure as opposed to the function's entry point. The
descriptor takes the form of 3 pointers: 1 for the function's entry
point, 1 for the TOC anchor of the module containing the function
definition, and 1 for the environment pointer:
struct FunctionDescriptor {
void *EntryPoint;
void *TOCAnchor;
void *EnvironmentPointer;
};
An indirect call has several steps of loading the information from
the descriptor into the proper registers for setting up the call. Namely
it has to:
1) Save the caller's TOC pointer into the TOC save slot in the linkage
area, and then load the callee's TOC pointer into the TOC register
(GPR 2 on AIX).
2) Load the function descriptor's entry point into the count register.
3) Load the environment pointer into the environment pointer register
(GPR 11 on AIX).
4) Perform the call by branching on count register.
5) Restore the caller's TOC pointer after returning from the indirect call.
A couple of important caveats to the above:
- There is no way to directly load a value from memory into the count register.
Instead we populate the count register by loading the entry point address into
a gpr and then moving the gpr to the count register.
- The TOC restore has to come immediately after the branch on count register
instruction (i.e., the 1st instruction executed after we return from the
call). This is an implementation limitation. We could, in theory, schedule
the restore elsewhere as long as no uses of the TOC pointer fall in between
the call and the restore; however, to keep it simple, we insert a pseudo
instruction that represents both the indirect branch instruction and the
load instruction that restores the caller's TOC from the linkage area. As
they flow through the compiler as a single pseudo instruction, nothing can be
inserted between them and the caller's TOC is then valid at any use.
Differential Revision: https://reviews.llvm.org/D70724
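A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of an indirect call that this lowering handles — on AIX the pointer refers to a function descriptor rather than the entry point itself:
```
; The call through %fp expands to the descriptor-load / mtctr / bctrl /
; TOC-restore sequence described above.
define void @call_indirect(void ()* %fp) {
entry:
  call void %fp()
  ret void
}
```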
| |
This patch adds LowerFormalArguments_AIX; support is added for lowering
int, float, and double formal arguments into general-purpose and
floating-point registers only.
The AIX calling convention test cases have been redone to test caller
and callee functionality in the same lit test.
Patch by Zarko Todorovski!
Differential Revision: https://reviews.llvm.org/D69578
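A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of the argument kinds covered:
```
; i32 goes to a GPR; float and double go to FPRs.
define double @callee(i32 %i, float %f, double %d) {
entry:
  %fe = fpext float %f to double
  %sum = fadd double %fe, %d
  ret double %sum
}
```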
| |
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
| |
For example:

```
long long test(long long a, long long b) {
  if (a << b > 0)
    return b;
  if (a << b < 0)
    return a;
  return a * b;
}
```

Produces:

```
        sld. 5, 3, 4
        ble 0, .LBB0_2
        mr 3, 4
        blr
.LBB0_2:                                # %if.end
        cmpldi 5, 0
        li 5, 1
        isel 4, 4, 5, 2
        mulld 3, 4, 3
        blr
```
But the compare (cmpldi 5, 0) is redundant and can be removed (CR0 already
contains the result of that comparison).
The root cause of this is that LLVM converts signed comparisons into equality
comparisons based on dominance. Equality comparisons are unsigned by default, so
we get either a record-form or a cmp (without the 'l' for logical) feeding a cmpl.
That is the situation we want to avoid here.
Differential Revision: https://reviews.llvm.org/D60506
| |
VSX provides floating point minimum and maximum instructions that conform
to IEEE semantics. This legalizes the respective nodes and emits VSX code
for them. Furthermore, on Power9 cores we have xsmaxcdp and xsmincdp
instructions that conform to language semantics for the conditional operator
even in the presence of NaNs.
Differential revision: https://reviews.llvm.org/D62993
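A minimal LLVM IR sketch (editorial illustration, not from the original commit; the function name is made up) of IEEE min/max nodes that now legalize to VSX instructions:
```
declare double @llvm.minnum.f64(double, double)
declare double @llvm.maxnum.f64(double, double)

; Clamp %x into [%lo, %hi] using IEEE-semantics min/max.
define double @clamp(double %x, double %lo, double %hi) {
entry:
  %t = call double @llvm.maxnum.f64(double %x, double %lo)
  %r = call double @llvm.minnum.f64(double %t, double %hi)
  ret double %r
}
```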
| |
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.
The X86 use of the function seems questionable to me. It checks hasFP
before frame lowering.
llvm-svn: 373292
| |
llvm-svn: 373081
| |
We currently produce a load, followed by (possibly a move for integers and) a
splat as separate instructions. VSX has always had a splatting load for
doublewords, but as of Power9, we have it for words as well. This patch just
exploits these instructions.
Differential revision: https://reviews.llvm.org/D63624
llvm-svn: 372139
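A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of a load whose value is splatted to all lanes, the pattern these splatting loads cover:
```
define <2 x double> @splat_load(double* %p) {
entry:
  %v = load double, double* %p
  %ins = insertelement <2 x double> undef, double %v, i32 0
  %splat = shufflevector <2 x double> %ins, <2 x double> undef, <2 x i32> zeroinitializer
  ret <2 x double> %splat
}
```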
| |
Add the missing piece of r372029.
Somehow when the patch for review D61961 was committed, only the test case
went in and the code didn't. This of course caused all kinds of build bot
breaks.
This patch just adds the code for that patch.
Author: Lei Huang
Differential revision: https://reviews.llvm.org/D61961
llvm-svn: 372043
| |
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Reviewed By: courbet
Subscribers: wuzish, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67386
llvm-svn: 371511
| |
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two, but `MachineFunction` and `MachineBasicBlock` were dealing with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power-of-two alignments; internally these values are interpreted as log2(align). This patch updates the documentation.
- `MachineFunction` exposes `align-all-functions`, also interpreted as a power-of-two alignment; internally this value is interpreted as log2(align). This patch updates the documentation.
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
| |
This patch implements global address lowering for 32/64-bit with small/large code models.
1. For the 32-bit large code model on AIX, there are newly added pseudo opcodes LWZtocL & ADDIStocHA32; their support on the MC layer will be provided by future patches.
2. The default code model on AIX should be the small code model.
3. Since AIX does not have a medium code model, report_fatal_error when users specify it.
Differential Revision: https://reviews.llvm.org/D63547
llvm-svn: 368744
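A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of a global access whose address is materialized through the TOC, with the instruction sequence depending on the code model:
```
@gv = global i32 42

define i32 @read_gv() {
entry:
  %v = load i32, i32* @gv
  ret i32 %v
}
```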
| |
by using big-endian load/store
In PowerPC, there is an instruction to load a vector in big-endian element order on a little-endian target.
So we can combine a vector load + reverse into a big-endian load to eliminate the swap instruction.
Likewise, combine a vector reverse + store into a big-endian store.
Differential Revision: https://reviews.llvm.org/D65063
llvm-svn: 367516
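A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of the load-plus-reverse pattern this combine folds into a single element-reversed load:
```
define <4 x i32> @load_reversed(<4 x i32>* %p) {
entry:
  %v = load <4 x i32>, <4 x i32>* %p
  %r = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i32> %r
}
```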
| |
llvm-svn: 367388
| |
big-endian load/store
In PowerPC, there is an instruction to load a vector in big-endian element order on a little-endian target.
So we can combine a vector load + reverse into a big-endian load to eliminate the swap instruction.
Likewise, combine a vector reverse + store into a big-endian store.
llvm-svn: 367382
| |
it's only used in load/store
Replace a float load/store pair with an integer load/store pair when the value is only used in loads/stores,
because float load/store instructions cost more cycles than integer loads/stores.
A typical scenario is a call passing more than 13 float arguments; we need to pass the extra arguments on the stack.
So we need a load/store pair to do such a memory operation if the variable is a global variable.
Differential Revision: https://reviews.llvm.org/D64195
llvm-svn: 366775
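A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of a float value that is only loaded and stored, so an integer load/store pair can replace the float pair:
```
@src = global float 0.0
@dst = global float 0.0

; %v is never used in arithmetic, only moved between memory locations.
define void @copy_float() {
entry:
  %v = load float, float* @src
  store float %v, float* @dst
  ret void
}
```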
| |
Summary:
Pointed out in a comment for D49754, register spilling will currently
spill SPE registers at almost any offset. However, the instructions
`evstdd` and `evldd` require a) 8-byte alignment, and b) a limit of 256
(unsigned) bytes from the base register, as the offset must fix into a
5-bit offset, which ranges from 0-31 (indexed in double-words).
The update to the register spill test is taken partially from the test
case shown in D49754.
Additionally, as pointed out by Kei Thomsen, globals will currently use
evldd/evstdd even though the offset isn't known at compile time, and so
may exceed the 8-bit (unsigned) offset permitted. This fixes that as well,
by forcing globals to always use evlddx/evstddx.
Part of the patch contributed by Kei Thomsen.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54409
llvm-svn: 366318
| |
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.
Out of the 4 targets for which I added tests, all seem to be OK with inc-of-add for scalars,
but only X86 prefers that same pattern for vectors.
Here I'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64, ARM, and PowerPC overrides to prefer inc-of-add only for scalars.
Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel
Reviewed By: efriedma
Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64090
llvm-svn: 365010
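A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of the two equivalent forms — both compute x + y + 1:
```
define i64 @inc_of_add(i64 %x, i64 %y) {
entry:
  %a = add i64 %x, %y
  %r = add i64 %a, 1        ; inc-of-add form
  ret i64 %r
}

define i64 @sub_of_not(i64 %x, i64 %y) {
entry:
  %n = xor i64 %y, -1       ; ~y
  %r = sub i64 %x, %n       ; x - ~y == x + y + 1
  ret i64 %r
}
```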
| |
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types. This is all handled by the target-independent layer. However,
this is not optimal when splitting or reforming the doubles, as it
pushes the value to the stack and reloads it on either side.
For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:
evstdd 5, X(1)
lwz 3, X(1)
lwz 4, X+4(1)
Likewise, to form a double into r5 from args in r3 and r4:
stw 3, X(1)
stw 4, X+4(1)
evldd 5, X(1)
This optimizes the fence to use SPE instructions. Now, to pass a double
to a function:
mr 4, 5
evmergehi 3, 5, 5
And to form a double into r5 from args in r3 and r4:
evmergelo 5, 3, 4
This is comparable to the way that gcc generates the double splits.
This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54583
llvm-svn: 363526
| |
(PR42123)
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
| |
llvm-svn: 362718
| |
Summary:
This patch implements call lowering for calls without parameters
on AIX as initial support.
Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma
Differential Revision: https://reviews.llvm.org/D61948
llvm-svn: 361669
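A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of a parameterless call, the initial case supported on AIX:
```
declare void @callee()

define void @caller() {
entry:
  call void @callee()
  ret void
}
```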
| |
Differential Revision: https://reviews.llvm.org/D62173
llvm-svn: 361346
| |
Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32.
e.g., for the following IR:
```
%0 = load <2 x float>, <2 x float>* %Ptr, align 8
%1 = fpext <2 x float> %0 to <2 x double>
ret <2 x double> %1
```

Pre custom lowering:

```
        ld r3, 0(r3)
        mtvsrd f0, r3
        xxswapd vs34, vs0
        xscvspdpn f0, vs0
        xxsldwi vs1, vs34, vs34, 3
        xscvspdpn f1, vs1
        xxmrghd vs34, vs0, vs1
```

After custom lowering:

```
        lfd f0, 0(r3)
        xxmrghw vs0, vs0, vs0
        xvcvspdp vs34, vs0
```
Differential Revision: https://reviews.llvm.org/D57857
llvm-svn: 360429
| |
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering, which calls getOptimalMemOpType.
This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.
Differential Revision: https://reviews.llvm.org/D59785
llvm-svn: 359537
| |
Using initial-exec TLS variables is a reasonable performance
optimisation for system libraries. Use the correct PIC mechanism to get
hold of the GOT to avoid text relocations.
Differential Revision: https://reviews.llvm.org/D61026
llvm-svn: 359146
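A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of an initial-exec TLS access — with PIC the GOT is now located without text relocations:
```
@tls_var = external thread_local(initialexec) global i32

define i32 @read_tls() {
entry:
  %v = load i32, i32* @tls_var
  ret i32 %v
}
```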
| |
llvm-svn: 357981
| |
in place
A shift and add/sub sequence is faster than a multiply by a constant. Because
the latency of a multiply is not huge, we only consider the following
worthwhile patterns.
```
(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))
```
The latency of a multiply is subtarget-dependent, so we need to consult the
subtarget to determine whether to do such a transformation. The data type is
also considered, since the multiply latency differs across types.
Differential Revision: https://reviews.llvm.org/D58950
llvm-svn: 357233
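A minimal LLVM IR sketch (editorial illustration, not from the original commit; the function name is made up) instantiating the first pattern above:
```
; 9 == 2^3 + 1, so this can lower as (add (shl x, 3), x).
define i64 @mul_by_9(i64 %x) {
entry:
  %r = mul i64 %x, 9
  ret i64 %r
}
```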
| |
This allows better code size for aarch64 floating point materialization
in a future patch.
Reviewers: evandro
Differential Revision: https://reviews.llvm.org/D58690
llvm-svn: 356389
| |
The PowerPC code generator currently scalarizes vector truncates that would fit in a vector register, resulting in vector extracts, scalar operations, and vector merges. This patch custom lowers a vector truncate that would fit in a register to a vector shuffle instead.
Differential Revision: https://reviews.llvm.org/D56507
llvm-svn: 353724
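A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of a truncate whose result fits in a vector register, now lowered to a shuffle instead of per-element extracts and merges:
```
define <4 x i16> @trunc_v4i32(<4 x i32> %v) {
entry:
  %t = trunc <4 x i32> %v to <4 x i16>
  ret <4 x i16> %t
}
```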
| |
Move the CC analysis implementation to its own .cpp file instead of
duplicating it and artificially using functions in PPCISelLowering.cpp
and PPCFastISel.cpp. Follow-up to the same change done for X86, ARM, and
AArch64.
llvm-svn: 352444
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
| |
Summary:
PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC
does not have vector CR selects; that is, PowerPC does not support
scalar-condition selects on vectors.
In addition to implementing this hook, isSelectSupported() should return false
when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects
are converted into branch sequences.
Reviewed By: steven.zhang, hfinkel
Differential Revision: https://reviews.llvm.org/D55754
llvm-svn: 349727
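A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of the ScalarCondVectorVal case — a scalar i1 condition selecting between vector values, now turned into a branch sequence:
```
define <4 x i32> @scalar_cond_select(i1 %c, <4 x i32> %a, <4 x i32> %b) {
entry:
  %r = select i1 %c, <4 x i32> %a, <4 x i32> %b
  ret <4 x i32> %r
}
```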
| |
For types v4i32/v8i16/v16i8, do the following transforms:
(vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
(vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)
Differential Revision: https://reviews.llvm.org/D55812
llvm-svn: 349599
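A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) instantiating the first pattern above — an unsigned absolute difference that becomes vabsd:
```
define <4 x i32> @absd(<4 x i32> %a, <4 x i32> %b) {
entry:
  %cmp = icmp ugt <4 x i32> %a, %b
  %ab  = sub <4 x i32> %a, %b
  %ba  = sub <4 x i32> %b, %a
  %r   = select <4 x i1> %cmp, <4 x i32> %ab, <4 x i32> %ba
  ret <4 x i32> %r
}
```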
| |
Improve the current vec_abs support on P9: generate the ISD::ABS node for vector
types, combine the ABS node into a VABSD node for some special cases to make use
of the P9 VABSD* instructions, and custom lower to vsub (vneg later) + vmax when
there is no combine opportunity.
Differential Revision: https://reviews.llvm.org/D54783
llvm-svn: 349437
| |
is accessed as got-indirect or not.
In theory, we should let the PPC target determine how to lower the TOC entry for globals,
and PPCTargetLowering requires this query to do some optimization for TOC_Entry.
Differential Revision: https://reviews.llvm.org/D54925
llvm-svn: 348108
| |
an MVT instead of an EVT. NFC
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.
llvm-svn: 346180
| |
When both operands are bool, short, int, long, or long long, add the following optimizations:
1. 0-x == y --> x+y ==0
2. 0-x != y --> x+y != 0
Review: nemanjai
Differential Revision: https://reviews.llvm.org/D53360
llvm-svn: 345366
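A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) instantiating the first rule — 0-x == y becomes x+y == 0:
```
define i1 @neg_eq(i64 %x, i64 %y) {
entry:
  %neg = sub i64 0, %x
  %cmp = icmp eq i64 %neg, %y
  ret i1 %cmp
}
```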
| |
At present a v2i16 -> v2f64 convert is implemented by extracts to scalar,
scalar converts, and merge back into a vector. Use vector converts instead,
with the int data permuted into the proper position and extended if necessary.
Patch by RolandF.
Differential revision: https://reviews.llvm.org/D53346
llvm-svn: 345361
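A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up, and the signedness is chosen arbitrarily) of the convert now done with a vector convert after permuting/extending the i16 data:
```
define <2 x double> @conv_v2i16(<2 x i16> %v) {
entry:
  %r = uitofp <2 x i16> %v to <2 x double>
  ret <2 x double> %r
}
```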
| |
Add support to allow bit-casting from f128 to i128 and then
extracting 64 bits from the result.
Differential Revision: https://reviews.llvm.org/D49507
llvm-svn: 345053
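A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of the bit-cast-and-extract this supports:
```
; Bit-cast f128 to i128 and extract the low 64 bits.
define i64 @low_bits(fp128 %x) {
entry:
  %i  = bitcast fp128 %x to i128
  %lo = trunc i128 %i to i64
  ret i64 %lo
}
```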
| |
This is the PPC-specific non-controversial part of
https://reviews.llvm.org/D44548 that simply enables this combine for PPC
since PPC has these instructions.
This commit will allow the target-independent portion to be truly target
independent.
llvm-svn: 344077
| |
On the ppc64le platform, if the IR has the following form,

```
define i64 @addze1(i64 %x, i64 %z) local_unnamed_addr #0 {
entry:
  %cmp = icmp ne i64 %z, CONSTANT   ; (-32767 <= CONSTANT <= 32768)
  %conv1 = zext i1 %cmp to i64
  %add = add nsw i64 %conv1, %x
  ret i64 %add
}
```

we can optimize it to the form below:

```
add X, (zext(setne Z, C))
  --> addze X, (addic Z, -1)               when C == 0
  --> addze X, (addic (addi Z, -C), -1)    when -32768 <= -C <= 32767 && C != 0
```
Patch By: HLJ2009 (Li Jia He)
Differential Revision: https://reviews.llvm.org/D51403
Reviewed By: Nemanjai
llvm-svn: 341634
| |
The internal benchmark failure reported by Google was due to a missing
check for the result type for the sign-extend and shift DAG. This commit
adds the check and re-commits the patch.
llvm-svn: 340734
| |
immediate instruction" due to it causing a compiler crash on valid.
This reverts commit r340016, testcase forthcoming.
llvm-svn: 340315
| |
Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli
extend sign and shift immediate instruction.
Patch by RolandF.
Differential revision: https://reviews.llvm.org/D49879
llvm-svn: 340016
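A minimal LLVM IR sketch (editorial illustration, not from the original commit; names are made up) of the sign-extend-then-shift pattern that folds into extswsli:
```
define i64 @ext_shift(i32 %x) {
entry:
  %ext = sext i32 %x to i64
  %shl = shl i64 %ext, 3
  ret i64 %shl
}
```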
| |
BuildSDIV/BuildUDIV/etc.
The vector contains the SDNodes that these functions create. The number of nodes is always small, so we should use SmallVector to avoid a heap allocation.
llvm-svn: 338329