| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 263159
|
|
|
|
|
|
|
|
| |
ZERO_EXTEND_VECTOR_INREG"
This caused PR26870.
llvm-svn: 262935
|
|
|
|
|
|
| |
Problem not hit by any in tree target.
llvm-svn: 262852
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The divrem combine assumed the type of the div/rem is simple, which isn't
necessarily true. This probably worked fine until r250825, since it only
saw legal types, but now breaks when it runs as a pre-type-legalization
combine.
This fixes PR26835.
Differential Revision: http://reviews.llvm.org/D17878
llvm-svn: 262746
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262738
|
|
|
|
|
|
|
|
| |
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 262599
|
|
|
|
|
|
| |
This reverts commit r262507, which broke some ARM buildbots.
llvm-svn: 262594
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262507
|
|
|
|
| |
llvm-svn: 262446
|
|
|
|
|
|
|
|
|
|
|
| |
On AMDGPU where operations i64 operations are often bitcasted to v2i32
and back, this pattern shows up regularly where it breaks some
expected combines on i64, such as load width reducing.
This fixes some test failures in a future commit when i64 loads
are changed to promote.
llvm-svn: 262397
|
|
|
|
|
|
|
|
|
| |
This reverts commit r262316.
It seems that my change breaks an out-of-tree chromium buildbot, so
I'm reverting this in order to investigate the situation further.
llvm-svn: 262387
|
|
|
|
|
|
|
| |
This reduces the number of bitcast nodes and generally cleans up the
DAG when bitcasting between integers and vectors everywhere.
llvm-svn: 262358
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generate a GPR width result.
The majority of the code changes in the Mips back-end fix the wrong
assumption that the result of SETCC nodes always produce an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.
Reviewers: dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D10970
llvm-svn: 262316
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case where op = add, y = base_ptr, and x = offset, this
transform:
(op y, (op x, c1)) -> (op (op x, y), c1)
breaks the canonical form of add by putting the base pointer in the
second operand and the offset in the first.
This fix is important for the R600 target, because for some address
spaces the base pointer and the offset are stored in separate register
classes. The old pattern caused the ISel code for matching addressing
modes to put the base pointer and offset in the wrong register classes,
which required no-trivial code transformations to fix.
llvm-svn: 262148
|
|
|
|
|
|
| |
This is OK for +0 since compares to +/-0 give the same result.
llvm-svn: 262125
|
|
|
|
| |
llvm-svn: 261437
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17229
llvm-svn: 260901
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch skips DAG combine of fp_round (fp_round x) if it results in
an fp_round from f80 to f16.
fp_round from f80 to f16 always generates an expensive (and as yet,
unimplemented) libcall to __truncxfhf2. This prevents selection of
native f16 conversion instructions from f32 or f64. Moreover, the first
(value-preserving) fp_round from f80 to either f32 or f64 may become a
NOP in platforms like x86.
Reviewers: ab
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D17221
llvm-svn: 260769
|
|
|
|
| |
llvm-svn: 260316
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch consists of two parts: a performance fix in DAGCombiner.cpp
and a correctness fix in SelectionDAG.cpp.
The test case tests the bug that's uncovered by the performance fix, and
fixed by the correctness fix.
The performance fix keeps the containers required by the
hasPredecessorHelper (which is a lazy DFS) and reuse them. Since
hasPredecessorHelper is called in a loop, the overall efficiency reduced
from O(n^2) to O(n), where n is the number of SDNodes.
The correctness fix keeps iterating the neighbor list even if it's time
to early return. It will return after finishing adding all neighbors to
Worklist, so that no neighbors are discarded due to the original early
return.
llvm-svn: 259691
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.
Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB
Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry
Differential Revision: http://reviews.llvm.org/D16291
llvm-svn: 259387
|
|
|
|
|
|
|
| |
These sets perform linear searching in small mode so it is never a good
idea to use SmallSize/N bigger than 32.
llvm-svn: 259283
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
findBetterNeighborChains does not handle volatile or indexed stores.
However, it did not check when adding stores to ChainedStores.
Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D16463
llvm-svn: 259024
|
|
|
|
|
|
| |
Make use of DAG.getBitcast and use clang-format to reduce number of lines (and make it more readable).
llvm-svn: 258644
|
|
|
|
|
|
|
|
| |
This reapplies r258296 and r258366, and also fixes an existing bug in
SelectionDAG.cpp's isMemSrcFromString, neglecting to account for the
offset in a GlobalAddressSDNode, which is uncovered by those patches.
llvm-svn: 258482
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts r258296 and the follow up r258366. With this change, we
miscompiled the following program on Windows:
#include <string>
#include <iostream>
static const char kData[] = "asdf jkl;";
int main() {
std::string s(kData + 3, sizeof(kData) - 3);
std::cout << s << '\n';
}
llvm-svn: 258465
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SelectionDAG previously missed opportunities to fold constants into
GlobalAddresses in several areas. For example, given `(add (add GA, c1), y)`, it
would often reassociate to `(add (add GA, y), c1)`, missing the opportunity to
create `(add GA+c, y)`. This isn't often visible on targets such as X86 which
effectively reassociate adds in their complex address-mode folding logic,
however it is currently visible on WebAssembly since it currently has very
simple address mode folding code that doesn't reassociate anything.
This patch fixes this by making SelectionDAG fold offsets into GlobalAddresses
at the same times that it folds constants together, so that it doesn't miss any
opportunities to perform such folding.
Differential Revision: http://reviews.llvm.org/D16090
llvm-svn: 258296
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bug was introduced with changes for x86-64 fp128:
http://reviews.llvm.org/rL254653
I don't know why an x86 change is here, so I'll follow up in:
http://reviews.llvm.org/D15134
Should fix:
https://llvm.org/bugs/show_bug.cgi?id=26070
llvm-svn: 257200
|
|
|
|
| |
llvm-svn: 257192
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
offsets.
In an inbounds getelementptr, when an index produces a constant non-negative
offset to add to the base, the add can be assumed to not have unsigned overflow.
This relies on the assumption that addresses can't occupy more than half the
address space, which isn't possible in C because it wouldn't be possible to
represent the difference between the start of the object and one-past-the-end
in a ptrdiff_t.
Setting the NoUnsignedWrap flag is theoretically useful in general, and is
specifically useful to the WebAssembly backend, since it permits stronger
constant offset folding.
Differential Revision: http://reviews.llvm.org/D15544
llvm-svn: 256890
|
|
|
|
|
|
|
|
|
|
|
|
| |
x)) combines for ppc_fp128, since signbit computation is more
complicated.
Discussion thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html
Patch by Tim Shen!
llvm-svn: 255305
|
|
|
|
| |
llvm-svn: 254968
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Almost all these changes are conditioned and only apply to the new
x86-64 f128 type configuration, which will be enabled in a follow up
patch. They are required together to make new f128 work. If there is
any error, we should fix or revert them as a whole.
These changes should have no impact to current configurations.
* Relax type legalization checks to accept new f128 type configuration,
whose TypeAction is TypeSoftenFloat, not TypeLegal, but also has
TLI.isTypeLegal true.
* Relax GetSoftenedFloat to return in some cases f128 type SDValue,
which is TLI.isTypeLegal but not "softened" to i128 node.
* Allow customized FABS, FNEG, FCOPYSIGN on new f128 type configuration,
to generate optimized bitwise operators for libm functions.
* Enhance related Lower* functions to handle f128 type.
* Enhance DAGTypeLegalizer::run, SoftenFloatResult, and related functions
to keep new f128 type in register, and convert f128 operators to library calls.
* Fix Combiner, Emitter, Legalizer routines that did not handle f128 type.
* Add ExpandConstant to handle i128 constants, ExpandNode
to handle ISD::Constant node.
* Add one more parameter to getCommonSubClass and firstCommonClass,
to guarantee that returned common sub class will contain the specified
simple value type.
This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.
* Fix infinite loop in getTypeLegalizationCost when f128 is the value type.
* Fix printOperand to handle null operand.
* Enhance ISD::BITCAST node to handle f128 constant.
* Expand new f128 type for BR_CC, SELECT_CC, SELECT, SETCC nodes.
* Enhance X86AsmPrinter to emit f128 values in comments.
Differential Revision: http://reviews.llvm.org/D15134
llvm-svn: 254653
|
|
|
|
| |
llvm-svn: 254260
|
|
|
|
| |
llvm-svn: 254242
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
Reviewers: MatzeB, andreadb, tstellarAMD
Subscribers: arsenm, jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D14945
llvm-svn: 254085
|
|
|
|
| |
llvm-svn: 253823
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When MergeConsecutiveStores() combines two loads and two stores into
wider loads and stores, the chain users of both of the original loads
must be transfered to the new load, because it may be that a chain
user only depends on one of the loads.
New test case: test/CodeGen/SystemZ/dag-combine-01.ll
Reviewed by James Y Knight.
Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=25310#c6
llvm-svn: 253779
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In particular, this makes the code for 64-bit compares on 32-bit targets
much more efficient.
Example:
define i32 @test_slt(i64 %a, i64 %b) {
entry:
%cmp = icmp slt i64 %a, %b
br i1 %cmp, label %bb1, label %bb2
bb1:
ret i32 1
bb2:
ret i32 2
}
Before this patch:
test_slt:
movl 4(%esp), %eax
movl 8(%esp), %ecx
cmpl 12(%esp), %eax
setae %al
cmpl 16(%esp), %ecx
setge %cl
je .LBB2_2
movb %cl, %al
.LBB2_2:
testb %al, %al
jne .LBB2_4
movl $1, %eax
retl
.LBB2_4:
movl $2, %eax
retl
After this patch:
test_slt:
movl 4(%esp), %eax
movl 8(%esp), %ecx
cmpl 12(%esp), %eax
sbbl 16(%esp), %ecx
jge .LBB1_2
movl $1, %eax
retl
.LBB1_2:
movl $2, %eax
retl
Differential Revision: http://reviews.llvm.org/D14496
llvm-svn: 253572
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Don't fold
(zext (and (load x), cst)) -> (and (zextload x), (zext cst))
if
(and (load x) cst)
will match as a zextload already and has additional users.
For example, the following IR:
%load = load i32, i32* %ptr, align 8
%load16 = and i32 %load, 65535
%load64 = zext i32 %load16 to i64
store i32 %load16, i32* %dst1, align 4
store i64 %load64, i64* %dst2, align 8
used to produce the following aarch64 code:
ldr w8, [x0]
and w9, w8, #0xffff
and x8, x8, #0xffff
str w9, [x1]
str x8, [x2]
but with this change produces the following aarch64 code:
ldrh w8, [x0]
str w8, [x1]
str x8, [x2]
Reviewers: resistor, mcrosier
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14340
llvm-svn: 252789
|
|
|
|
| |
llvm-svn: 252775
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was suggested in:
http://reviews.llvm.org/D13956
and is a follow-on to:
http://reviews.llvm.org/rL252515
http://reviews.llvm.org/rL252519
This lets us remove logically equivalent/duplicated code from DAGCombiner and X86ISelDAGToDAG.
A corresponding function for IR instructions already exists in ValueTracking.
llvm-svn: 252539
|
|
|
|
|
|
|
|
|
|
|
|
| |
extload
Reviewers: resistor, arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13805
llvm-svn: 252349
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) PR25154. This is basically a repeat of PR18102, which was fixed in
r200201, and broken again by r234430. The latter changed which of the
store nodes was merged into from the first to the last. Thus, we now
also need to prefer merging a later store at a given address into the
target node, instead of an earlier one.
2) While investigating that, I also realized I'd introduced a bug in
r236850. There, I removed a check for alignment -- not realizing that
nothing except the alignment check was ensuring that none of the stores
were overlapping! This is a really bogus way to ensure there's no
aliased stores.
A better solution to both of these issues is likely to always use the
code added in the 'if (UseAA)' branches which rearrange the chain based
on a more principled analysis. I'll look into whether that can be used
always, but in the interest of getting things back to working, I think a
minimal change makes sense.
llvm-svn: 251816
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a usage of the IR-level fast-math-flags now that they are propagated to SDNodes.
This was originally part of D8900.
Removing the global 'enable-unsafe-fp-math' checks will require auto-upgrade and
possibly other changes.
Differential Revision: http://reviews.llvm.org/D9708
llvm-svn: 251450
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When taking the remainder of a value divided by a constant, visitREM()
attempts to convert the REM to a longer but faster sequence of instructions.
This conversion calls combine() on a speculative DIV instruction. Commit
rL250825 may cause this combine() to return a DIVREM, corrupting nearby nodes.
Flow eventually hits unreachable().
This patch adds a test case and a check to prevent visitREM() from trying
to convert the REM instruction in cases where a DIVREM is possible.
See http://reviews.llvm.org/D14035
llvm-svn: 251373
|
|
|
|
|
|
|
|
| |
We don't need a mask of a rotation result to be a constant splat - any constant scalar/vector can be usefully folded.
Followup to D13851.
llvm-svn: 251197
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.
This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future.
Differential Revision: http://reviews.llvm.org/D13851
llvm-svn: 251188
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we fold "mul ((add x, c1), c1)" -> "add ((mul x, c2), c1*c2)", we bail if (add x, c1) has multiple
users which would result in an extra add instruction.
In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add.
I also added the capability of doing the existing optimization with non-splatted vectors (splatted also works).
Differential Revision: http://reviews.llvm.org/D13740
llvm-svn: 251028
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DAGCombiner.
Summary:
In addition to moving the code over, this patch amends the DIV,REM -> DIVREM
combining to run on all affected nodes at once: if the nodes are converted
to DIVREM one at a time, then the resulting DIVREM may get legalized by the
backend into something target-specific that we won't be able to recognize
and correlate with the remaining nodes.
The motivation is to "prepare terrain" for D13862: when we set DIV and REM
to be legalized to libcalls, instead of the DIVREM, we otherwise lose the
ability to combine them together. To prevent this, we need to take the
DIV,REM -> DIVREM combining out of the lowering stage.
Reviewers: RKSimon, eli.friedman, rengolin
Subscribers: john.brawn, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13733
llvm-svn: 250825
|