| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
Change the default for PowerPC LE to -fno-PIC.
Differential Revision: https://reviews.llvm.org/D53383
llvm-svn: 348298
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 348109
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the instruction selection
Summary:
A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code
includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the
most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative
so for signed comparisons vs. unsigned ones, the input operands just need to be swapped.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54825
llvm-svn: 347831
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
selection
Add the following test case for the ISD::BR_CC node in the instruction selection
define i64 @testi64slt(i64 %c1, i64 %c2, i64 %c3, i64 %c4, i64 %a1, i64 %a2) #0 {
entry:
%cmp1 = icmp eq i64 %c3, %c4
%cmp3tmp = icmp eq i64 %c1, %c2
%cmp3 = icmp slt i1 %cmp3tmp, %cmp1
br i1 %cmp3, label %iftrue, label %iffalse
iftrue:
ret i64 %a1
iffalse:
ret i64 %a2
}
The data type i64 can be replaced by i32, i64, float, double
And condition codes can be replaced by: SETEQ, SETEN, SELT, SETLE, SETGT, SETGE,SETULT, SETULE, SSETGT, and SETUGE
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54824
llvm-svn: 347828
|
|
|
|
|
|
|
|
|
|
|
|
| |
SplitVecOp_TruncateHelper for FP_TO_SINT/UINT.
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512.
This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.
Differential Revision: https://reviews.llvm.org/D54906
llvm-svn: 347593
|
|
|
|
|
|
|
|
| |
This reverts commits r347532. Forget add the option
-mtriple powerpc64-unknown-linux-gnu. So other platform is error except
for PowerPC.
llvm-svn: 347534
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 347532
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'),
but that would not matter without this patch because DAGCombiner was
reversing that transform. I think we need this transform in the backend
regardless of what happens in IR to catch cases where the shift-xor
is formed late from GEP or other ops.
https://rise4fun.com/Alive/NC1
Name: shl
Pre: (-1 << C2) == C1
%shl = shl i8 %x, C2
%r = xor i8 %shl, C1
=>
%not = xor i8 %x, -1
%r = shl i8 %not, C2
Name: shr
Pre: (-1 u>> C2) == C1
%sh = lshr i8 %x, C2
%r = xor i8 %sh, C1
=>
%not = xor i8 %x, -1
%r = lshr i8 %not, C2
https://bugs.llvm.org/show_bug.cgi?id=39657
llvm-svn: 347478
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have efficient codegen on P9 for lowering bswap that involves moving
the value into a vector reg and moving it back. However, the check under
which we custom lowered it did not adequately reflect the actual requirements.
It required only that the subtarget be an implementation of ISA 3.0 since all
compliant implementations have to provide the vector instructions.
However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9
(i.e. don't emit vector code, don't have to save vector regs for context
switch). So we should require the correct features for this lowering.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39334
llvm-svn: 347376
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling,
because we can still get same latency due to default values.
With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.
This will has impact on the count of RetiredMOps, affects the Pending/Available Queue,
then causing different scheduling or suboptimal scheduling further.
This patch is for STWU/STWUX (IIC_LdStStoreUpd ) for P8.
Since there are already multiple IIC for store update, this patch also merge
IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU
IIC_LdStSTDUX to IIC_LdStSTUX
and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference.
Differential Revision: https://reviews.llvm.org/D54700
llvm-svn: 347311
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch add a STWU testcase for scheduling check.
Currently P7/P8 which use itineraries are missing IIC_LdStStoreUpd,
We use CHECK-ITIN prefix to check P7/P8, then use default for P9 (and future).
We will fix the missing itineraries of IIC_LdStStoreUpd in following patch,
and update this testcase to show the scheduling difference only there.
Differential Revision: https://reviews.llvm.org/D54699
llvm-svn: 347310
|
|
|
|
|
|
|
|
|
|
| |
Turns out that there was no check for a store that truncates down
to a single byte when combining a (store (bswap...)) into a byte-swapping
store. This patch just adds that check.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.
llvm-svn: 347288
|
|
|
|
|
|
|
|
|
| |
This NFC patch just adds test cases for conversions that currently
require scalarization of vectors. An updcoming patch will change
the legalization for these and it is more suitable on the review
to show the diferences in code gen rather than just the new code gen.
llvm-svn: 347090
|
|
|
|
|
|
| |
This reverts commit r347069
llvm-svn: 347076
|
|
|
|
|
|
|
|
| |
Set -fno-PIC as the default option.
Differential Revision: https://reviews.llvm.org/D53383
llvm-svn: 347069
|
|
|
|
|
|
|
|
|
|
| |
To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding,
which is legalized at type-legalization phase. Use xxsel to match vselect if vsx is open, or use vsel.
Differential Revision: https://reviews.llvm.org/D49531
llvm-svn: 346824
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Power9, we don't have patterns to select the following intrinsics:
llvm.ppc.vsx.stxvw4x.be
llvm.ppc.vsx.stxvd2x.be
This patch adds support for these.
Differential Revision: https://reviews.llvm.org/D53581
llvm-svn: 346148
|
|
|
|
|
|
|
|
|
|
| |
From the gcc manual, we can see that the specific limit of wi inline asm is “FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS”. The link is https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Machine-Constraints.html#Machine-Constraints. We should accept this constraint.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D53265
llvm-svn: 345810
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The debug-use flag must be set exactly for uses on DBG_VALUEs. This is
so obvious that it can be trivially inferred while parsing. This will
reduce noise when printing while omitting an information that has little
value to the user.
The parser will keep recognizing the flag for compatibility with old
`.mir` files.
Differential Revision: https://reviews.llvm.org/D53903
llvm-svn: 345671
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Relex hard coded registers and stack frame sizes
- Some test cleanups
- Change phi-dbg.ll to match on mir output after phi elimination instead
of going through the whole codegen pipeline.
This is in preparation for https://reviews.llvm.org/D52010
I'm committing all the test changes upfront that work before and after
independently.
llvm-svn: 345532
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, for this node:
vector int test(int a, int b, int c, int d) {
return (vector int) { a, b, c, d };
}
we get this on Power9:
mtvsrdd 34, 5, 3
mtvsrdd 35, 6, 4
vmrgow 2, 3, 2
and this on Power8:
mtvsrwz 0, 3
mtvsrwz 1, 5
mtvsrwz 2, 4
mtvsrwz 3, 6
xxmrghd 34, 1, 0
xxmrghd 35, 3, 2
vmrgow 2, 3, 2
This can be improved to this on LE Power9:
rldimi 3, 4, 32, 0
rldimi 5, 6, 32, 0
mtvsrdd 34, 5, 3
and this on LE Power8
rldimi 3, 4, 32, 0
rldimi 5, 6, 32, 0
mtvsrd 34, 3
mtvsrd 35, 5
xxpermdi 34, 35, 34, 0
This patch updates the TD pattern to generate the optimized sequence for both
Power8 and Power9 on LE and BE.
Differential Revision: https://reviews.llvm.org/D53494
llvm-svn: 345414
|
|
|
|
|
|
|
|
|
|
|
| |
For both operands are bool, short, int, long, long long, add the following optimization.
1. 0-x == y --> x+y ==0
2. 0-x != y --> x+y != 0
Review: nemanjai
Differential Revision: https://reviews.llvm.org/D53360
llvm-svn: 345366
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
combineSetCC
For both operands are bool, short, int, long, long long, add the following optimization test case.
1. 0-x == y --> x+y ==0
2. 0-x != y --> x+y != 0
Review: nemanjai
Differential Revision: https://reviews.llvm.org/D53358
llvm-svn: 345365
|
|
|
|
|
|
|
|
|
|
|
|
| |
At present a v2i16 -> v2f64 convert is implemented by extracts to scalar,
scalar converts, and merge back into a vector. Use vector converts instead,
with the int data permuted into the proper position and extended if necessary.
Patch by RolandF.
Differential revision: https://reviews.llvm.org/D53346
llvm-svn: 345361
|
|
|
|
|
|
|
|
|
| |
Add support to allow bit-casting from f128 to i128 and then
extracting 64 bits from the result.
Differential Revision: https://reviews.llvm.org/D49507
llvm-svn: 345053
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of
the corresponding X-Forms since they only target the Altivec registers.
Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only
load into the upper 32 VSX registers. Similarly with the remaining affected
instructions.
There is currently no way that I can see to trigger the bug, but as we add other
ways of exploiting these instructions, there may very well be instances that do.
This is an NFC patch in practical terms since the changes it introduces can not
be triggered without an MIR test.
Differential revision: https://reviews.llvm.org/D53323
llvm-svn: 344894
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current BitPermutationSelector generates a code to build a value by tracking two types of bits: ConstZero and Variable.
ConstZero means a bit we need to mask off and Variable is a bit we copy from an input value.
This patch add third type of bits VariableKnownToBeZero caused by AssertZext node or zero-extending load node.
VariableKnownToBeZero means a bit comes from an input value, but it is known to be already zero. So we do not need to mask them.
VariableKnownToBeZero enhances flexibility to group bits, since we can avoid redundant masking for these bits.
This patch also renames "HasZero" to "NeedMask" since now we may skip masking even when we have zeros (of type VariableKnownToBeZero).
Differential Revision: https://reviews.llvm.org/D48025
llvm-svn: 344347
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Correct offset calculation in load-store forwarding for big-endian
targets.
Reviewers: rnk, RKSimon, waltl
Subscribers: sdardis, nemanjai, hiraditya, jrtc27, atanasyan, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D53147
llvm-svn: 344272
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Extend analysis forwarding loads from preceeding stores to work with
extended loads and truncated stores to the same address so long as the
load is fully subsumed by the store.
Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll test are
deleted as they've no longer seem to be relevant.
Reviewers: RKSimon, rnk, kparzysz, javed.absar
Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D49200
llvm-svn: 344142
|
|
|
|
|
|
|
| |
An upcoming patch will change the codegen for these patterns. This test case is
added now so that the patch can show the differences in codegen.
llvm-svn: 344112
|
|
|
|
|
|
|
|
|
|
| |
For ISD::SIGN_EXTEND_INREG operation of v2i16 and v2i8 types will cause assert because they are registered as custom operation.
So that the type legalization phase will enter the custom hook, which do not handle ISD::SIGN_EXTEND_INREG operation and fall throw into unreachable assert.
Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52449
llvm-svn: 344109
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already do the following combines:
(bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X
(bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X
When the target has "bit preserving fp logic". This patch just extends it
to also combine:
(bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X)
As some targets have fnabs and even those that don't can efficiently lower
both the fabs and the fneg.
Differential revision: https://reviews.llvm.org/D44548
llvm-svn: 344093
|
|
|
|
|
|
|
| |
This just adds the test case so that the different code gen is clearly visible
when the DAG Combine lands.
llvm-svn: 344091
|
|
|
|
|
|
|
|
|
|
| |
This is the PPC-specific non-controversial part of
https://reviews.llvm.org/D44548 that simply enables this combine for PPC
since PPC has these instructions.
This commit will allow the target-independent portion to be truly target
independent.
llvm-svn: 344077
|
|
|
|
| |
llvm-svn: 344037
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are occasionally instances where AADB rewrites registers in such a way
that a reg-reg copy becomes a self-copy. Such an instruction is obviously
redundant and can be removed. This patch does precisely that.
Note that this will not remove various nop's that we insert (which are
themselves just self-copies). The reason those are left alone is that all of
them have their own opcodes (that just encode to a self-copy).
What prompted this patch is the fact that these self-copies sometimes end up
using registers that make the instruction a priority-setting nop, thereby
having a significant effect on performance.
Differential revision: https://reviews.llvm.org/D52432
llvm-svn: 344036
|
|
|
|
|
|
|
|
|
|
|
| |
Going from XForm Load to DSForm Load requires that the immediate be 4 byte
aligned.
If we are not aligned we must leave the load as LDX (XForm).
This bug is causing a compile-time failure in the benchmark h264ref.
Differential Revision: https://reviews.llvm.org/D51988
llvm-svn: 343525
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When PR16508 was solved (in rL185363) a regression test was
added as test/CodeGen/PowerPC/2013-07-01-PHIElimBug.ll.
I discovered that the test case no longer reproduced the
scenario from PR16508. This problem could have been amended
by adding an extra RUN line with "-O1" (or possibly "-O0"),
but instead I added a mir-reproducer
test/CodeGen/PowerPC/2013-07-01-PHIElimBug.mir
to get a reproducer that is less sensitive to changes in
earlier passes (including O-level).
While being at it I also corrected a code comment in
PHIElimination::EliminatePHINodes that has been incorrect
since the related bugfix from rL185363.
Reviewers: MatzeB, hfinkel
Reviewed By: MatzeB
Subscribers: nemanjai, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D52553
llvm-svn: 343416
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a check to optimize conditional branch (BC and BCn) based on a constant set by CRSET or CRUNSET.
Other optimizers, such as block placement, may generate such code and hence
I do this at the very end of the optimization in pre-emit peephole pass.
A conditional branch based on a constant is eliminated or converted into unconditional branch.
Also CRSET/CRUNSET is eliminated if the condition code register is not used
by instruction other than the branch to be optimized.
Differential Revision: https://reviews.llvm.org/D52345
llvm-svn: 343100
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added
__builtin_vsx_scalar_extract_expq
__builtin_vsx_scalar_insert_exp_qp
Builtins should behave the same way as in GCC.
Differential Revision: https://reviews.llvm.org/D48185
llvm-svn: 342910
|
|
|
|
|
|
|
|
|
|
|
| |
gcc uses operand modifier 'x' in inline asm for VSX registers.
Without this modifier, instructions which use VSX numbering for their
operands are printed as VMX registers. This patch adds support for the
operand modifier 'x'.
Differential Revision: https://reviews.llvm.org/D52244
llvm-svn: 342882
|
|
|
|
|
|
|
|
|
|
| |
Building a vector out of multiple loads can be converted to a load of the vector type if the loads are consecutive.
But the special condition is that the element number is 1, such as <1 x i128>. So just early exit to fix the assert.
Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52072
llvm-svn: 342611
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up to the previous patch that eliminated some of the rotates.
With this addition, we will also emit the record-form andis.
This patch increases the number of record-form rotates we eliminate by
more than 70%.
Differential revision: https://reviews.llvm.org/D44897
llvm-svn: 342478
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions.
When optimizing compares, we handle the former in order to eliminate the compare
instruction but not the latter. This patch just adds the latter to the set of
instructions we optimize.
The reason these instructions need to be handled separately is that they are not
part of the RecFormRel map (since they don't have a non-record-form). The
missing "and-immediate-shifted" is just an oversight in the initial
implementation.
Differential revision: https://reviews.llvm.org/D51353
llvm-svn: 342472
|
|
|
|
| |
llvm-svn: 342463
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling,
because we can still get same latency due to default values.
With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.
This will has impact on the count of RetiredMOps, affects the Pending/Available Queue,
then causing different scheduling or suboptimal scheduling further.
Patch By: jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D52040
llvm-svn: 342441
|
|
|
|
|
|
|
|
|
| |
This patch add a mulld testcase for scheduling check.
Patch By: jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D52039
llvm-svn: 342440
|
|
|
|
|
|
|
|
| |
This patch fixes calculating address of label for non-pic ppc64.
Differential Revision: https://reviews.llvm.org/D50965
llvm-svn: 342368
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Integer types smaller than i32 must be extended to i32 by default.
The feature "crbits" introduced at r202451 handles i1 as a special case,
but it did not extend properly.
The caller was, therefore, passing i1 stack arguments by writing 0/1 to
the first byte of the 4-byte stack object and callee was
reading the first byte for the value.
"crbits" is enabled if the optimization level is greater than 1,
which is very common in "release builds".
Such discrepancies with ABI specification also introduces
potential incompatibility with programs or libraries
built with other compilers e.g. GCC.
Fixes PR38661
Reviewers: hfinkel, cuviper
Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D51108
llvm-svn: 342288
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On the ppc64le platform, if ir has the following form,
define i64 @addze1(i64 %x, i64 %z) local_unnamed_addr #0 {
entry:
%cmp = icmp ne i64 %z, CONSTANT (-32767 <= CONSTANT <= 32768)
%conv1 = zext i1 %cmp to i64
%add = add nsw i64 %conv1, %x
ret i64 %add
}
we can optimize it to the form below.
when C == 0
--> addze X, (addic Z, -1))
/
add X, (zext(setne Z, C))--
\ when -32768 <= -C <= 32767 && C != 0
--> addze X, (addic (addi Z, -C), -1)
Patch By: HLJ2009 (Li Jia He)
Differential Revision: https://reviews.llvm.org/D51403
Reviewed By: Nemanjai
llvm-svn: 341634
|