| Commit message | Author | Age | Files | Lines |
|
sse4.2 support.
This should fix the buildbots.
Original commit message:
[DAGCombiner] Slice a big load into two loads when the elements are next to each
other in memory and the target has paired loads and performs post-isel load
combining.
E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32
into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired loads and
performs post-isel load combining.
One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.
<rdar://problem/14477220>
llvm-svn: 192476
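
For a sense of the source-level shape this targets, a hypothetical C++ function
(name and types illustrative only) that lowers to the i64 load plus truncations
shown above; with paired loads available it can become two adjacent i32 loads:
#include <cstdint>
uint32_t sum_halves(const uint64_t *addr) {
  uint64_t a = *addr;                // a = load i64* addr
  uint32_t b = (uint32_t)a;          // b = trunc i64 a to i32
  uint32_t d = (uint32_t)(a >> 32);  // c = lshr i64 a, 32; d = trunc i64 c to i32
  return b + d;
}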
|
on ubuntu.
llvm-svn: 192474
|
other in memory and the target has paired loads and performs post-isel load
combining.
E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32
into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired loads and
performs post-isel load combining.
One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.
<rdar://problem/14477220>
llvm-svn: 192471
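
A minimal sketch of how a target might advertise paired loads through the hook
named above; "MyTargetLowering" is a hypothetical target, and the exact
parameter list (loaded type plus a required-alignment out-parameter) is an
assumption about this revision of the API:
bool MyTargetLowering::hasPairedLoad(EVT LoadedType,
                                     unsigned &RequiredAlignment) const {
  // Hypothetical target: pairs of 32-bit loads can be combined into a single
  // paired-load instruction when 4-byte aligned.
  if (LoadedType == MVT::i32) {
    RequiredAlignment = 4;
    return true;
  }
  return false;
}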
|
DAGCombiner::visitFP_EXTEND will apply the following transformation:
fold (fpext (load x)) -> (fpext (fptrunc (extload x)))
but the implementation did not handle indexed loads (pre/post inc.) and did
not specifically ignore them either (unlike extending loads, which it
already ignored), causing an assert when the transformation was applied to an
indexed load. This is the minimal fix for correctness (the transformation is
simply skipped for indexed loads).
Unfortunately, I don't have an in-tree test case.
llvm-svn: 191989
|
or not. The corresponding DAG patterns, handled by the "DAGCombiner::MatchRotate"
function in DAGCombiner.cpp, are as follows:
Pattern 1
// fold (or (shl (*ext x), (*ext y)),
// (srl (*ext x), (*ext (sub 32, y)))) ->
// (*ext (rotl x, y))
// fold (or (shl (*ext x), (*ext y)),
// (srl (*ext x), (*ext (sub 32, y)))) ->
// (*ext (rotr x, (sub 32, y)))
Pattern 2
// fold (or (shl (*ext x), (*ext (sub 32, y))),
// (srl (*ext x), (*ext y))) ->
// (*ext (rotl x, y))
// fold (or (shl (*ext x), (*ext (sub 32, y))),
// (srl (*ext x), (*ext y))) ->
// (*ext (rotr x, (sub 32, y)))
llvm-svn: 191905
|
This change fixes the problem reported in PR17380 and re-adds the DAGCombine
transformation, ensuring that the value types are always legal if the
transformation is triggered after legalization has taken place.
Added the test case from PR17380.
llvm-svn: 191509
|
llvm-svn: 191438
|
(shl (zext (shr A, X)), X) => (zext (shl (shr A, X), X)).
The rule only triggers when there are no other uses of the
zext to avoid materializing more instructions.
This helps the DAGCombiner understand that the shl/shr
sequence can then be converted into an and instruction.
llvm-svn: 191393
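
A worked illustration with hypothetical values (A of type i32, X = 24): the
rule rewrites (shl (zext (shr A, 24)), 24) to (zext (shl (shr A, 24), 24)),
and the inner shl/shr pair then folds to (and A, 0xFF000000). Both functions
below compute the same value.
#include <cstdint>
uint64_t before_combine(uint32_t A) { return (uint64_t)(A >> 24) << 24; } // zext, then shl
uint64_t after_combine(uint32_t A)  { return (uint64_t)(A & 0xFF000000u); } // and, then zext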
|
No functionality change, lots of indentation changes.
llvm-svn: 191303
|
llvm-svn: 191214
|
splitting too."
This reverts commit r191130.
llvm-svn: 191138
|
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is too wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result,
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask for the given target. This mask usually has
the same size as the VSELECT return type (except for Intel KNL). Now the type
legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to successfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
llvm-svn: 191130
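
A hypothetical source shape that yields such a VSELECT/SETCC pair on a vector
type wider than the target's native vector width (function name and types
illustrative only):
void vmax16(const float *a, const float *b, float *out) {
  for (int i = 0; i < 16; ++i)
    out[i] = a[i] > b[i] ? a[i] : b[i]; // may vectorize to a setcc + vselect (MAX) pattern
}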
|
These violations were introduced in r191049
llvm-svn: 191059
|
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.
This commit extends the DAGCombiner so that the pattern
(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))
is folded into
([az]ext (rotl x, y))
The matching is restricted to aext and zext because in these cases the upper
bits are either undefined or known. A test case is included.
This fixes PR16726.
llvm-svn: 191049
|
There is a buildbot failure. Need to investigate this.
llvm-svn: 191048
|
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.
This commit extends the DAGCombiner so that the pattern
(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))
is folded into
([az]ext (rotl x, y))
The matching is restricted to aext and zext because in these cases the upper
bits are either undefined or known. A test case is included.
This fixes PR16726.
llvm-svn: 191045
|
a power of 2 but differ in bit width.
PR17283.
llvm-svn: 191000
|
DAGCombiner::isAlias can be called with SrcValue1 or SrcValue2 null, and we
can't use AA in this case (if we try, then the casting code in AA will assert).
llvm-svn: 190763
|
This uses the TargetSubtargetInfo::useAA() function to control the defaults of
the -combiner-alias-analysis and -combiner-global-alias-analysis options.
llvm-svn: 189564
|
llvm-svn: 189526
|
We want to convert code like (or (srl N, 8), (shl N, 8)) into (srl (bswap N),
const), but this is only valid if the bits above 16 in the source pattern are
0; the checks we were doing for this were slightly wrong before.
llvm-svn: 189348
|
If we have a binary operation like ISD::ADD, we can set the result type
equal to the result type of one of its operands rather than using
TargetLowering::getPointerTy().
Also, any use of DAG.getIntPtrConstant(C) as an operand for a binary
operation can be replaced with:
DAG.getConstant(C, OtherOperand.getValueType());
llvm-svn: 189227
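
An illustrative sketch of the idiom, assuming the SelectionDAG APIs of this
period; the helper name and its surrounding combine are hypothetical:
static SDValue buildOffsetAdd(SelectionDAG &DAG, SDLoc DL,
                              SDValue Base, uint64_t C) {
  // Instead of typing the node with TLI.getPointerTy() and creating the
  // constant with DAG.getIntPtrConstant(C), reuse the other operand's type.
  EVT VT = Base.getValueType();
  SDValue Off = DAG.getConstant(C, VT);
  return DAG.getNode(ISD::ADD, DL, VT, Base, Off);
}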
|
The small utility function that pattern-matches Base + Index +
Offset patterns for loads and stores failed to recognize the base
pointer for loads/stores from/into an array at offset 0 inside a
loop. As a result, DAGCombiner::MergeConsecutiveStores was not able
to merge all stores.
This commit fixes the issue by adding an additional pattern match
and also a test case.
Reviewer: Nadav
llvm-svn: 188936
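
A hypothetical source shape for the missed case (names illustrative only):
consecutive stores into an array at offset 0 inside a loop, whose shared base
pointer the Base + Index + Offset matcher previously failed to recognize.
void zero_pairs(int *a, int n) {
  for (int i = 0; i < n; i += 2) {
    a[i] = 0;     // Base = a, Index = i, Offset = 0
    a[i + 1] = 0; // Base = a, Index = i, Offset = 4 (adjacent store)
  }
}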
|
llvm-svn: 188442
|
A common idiom is to use zero and all-ones as sentinel values and to
check for both in a single conditional ("x != 0 && x != (unsigned)-1").
That generates code, for i32, like:
testl %edi, %edi
setne %al
cmpl $-1, %edi
setne %cl
andb %al, %cl
With this transform, we generate the simpler:
incl %edi
cmpl $1, %edi
seta %al
Similar improvements for other integer sizes and on other platforms. In
general, combining the two setcc instructions into one is better.
rdar://14689217
llvm-svn: 188315
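
The idiom quoted above, plus why the combined form is equivalent (function name
illustrative only): adding 1 maps the sentinels 0 and (unsigned)-1 onto 0 and 1,
so a single unsigned "x + 1 > 1" comparison rejects exactly those two values.
bool is_valid(unsigned x) {
  return x != 0 && x != (unsigned)-1;
  // equivalently: return x + 1 > 1;  (one increment and one compare)
}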
|
llvm-svn: 187961
|
This virtual function can be implemented by targets to specify the type
to use for the index operand of INSERT_VECTOR_ELT, EXTRACT_VECTOR_ELT,
INSERT_SUBVECTOR, and EXTRACT_SUBVECTOR. The default implementation returns
the result of TargetLowering::getPointerTy().
The previous code was using TargetLowering::getPointerTy() for vector
indices, because this is guaranteed to be legal on all targets. However,
using TargetLowering::getPointerTy() can be a problem for targets with
pointer sizes that differ across address spaces. On such targets,
when vectors need to be loaded or stored to an address space other than the
default 'zero' address space (which is the address space assumed by
TargetLowering::getPointerTy()), having an index that
is a different size than the pointer can lead to inefficient
pointer calculations (e.g. 64-bit adds for a 32-bit address space).
There is no intended functionality change with this patch.
llvm-svn: 187748
|
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN
The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
optimizations.
llvm-svn: 187396
|
This commit also implements these functions for R600 and removes a test
case that was relying on the buggy behavior.
llvm-svn: 187007
|
size.
llvm-svn: 186274
|
llvm-svn: 186243
|
llvm-svn: 186032
|
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:
1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.
2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.
3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.
The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.
llvm-svn: 185956
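
A minimal sketch of a target override of this hook, written against the
pre-rename name quoted in the message (the renamed spelling is not given
here); the target class, the EVT parameter, and the per-type checks are
assumptions:
bool MyTargetLowering::isFMAFasterThanMulAndAdd(EVT VT) const {
  if (!VT.isSimple())
    return false;
  switch (VT.getSimpleVT().SimpleTy) {
  case MVT::f32:
  case MVT::f64:
    return true;  // hypothetical target with hardware FMA for f32/f64
  default:
    return false; // e.g. fp128 keeps using a separate multiply and add
  }
}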
|
When folding sub x, x (and other similar constructs), where x is a vector, the
result is a vector of zeros. After type legalization, make sure that the input
zero elements have a legal type. This type may be larger than the result's
vector element type.
This was another bug found by llvm-stress.
llvm-svn: 185949
|
after return, etc. No functionality change.
llvm-svn: 185893
|
llvm-svn: 185780
|
ReduceLoadWidth unconditionally drops extensions from loads. Limit it to the
case when all of the bits the extension would otherwise produce are dropped by
the shrink. It would be possible to shrink the load in more cases by merging
the extensions, but this isn't trivial and it is a very rare case. I left a TODO
for that case.
Fixes PR16551.
llvm-svn: 185755
|
DAGCombiner was counting all uses of a load node when considering whether it's
worth combining into a zextload. Really, it wants to ignore the chain and just
count real uses.
rdar://problem/13896307
llvm-svn: 185419
|
llvm-svn: 184933
|
shift/xor/sub when it is possible. Fixed a bug in SDIV where the constant operand is not a splat constant vector.
llvm-svn: 184931
|
When (srl (anyextend x), c) is folded into (anyextend (srl x, c)), the
high bits are not cleared. Add an 'and' to clear them.
llvm-svn: 184575
|
redundant checks...
This doesn't really affect performance, since all the relevant calls are transparent, but it is clearer.
llvm-svn: 184027
|
llvm-svn: 184012
|
llvm-svn: 184008
|
FADD/FMUL combinations; also improve accuracy of comments
llvm-svn: 183993
|
Change SelectionDAG::getXXXNode() interfaces as well as call sites of
these functions to pass in SDLoc instead of DebugLoc.
llvm-svn: 182703
|
llvm-svn: 182680
|
llvm-svn: 182180
|
|
Use EVT::changeExtendedVectorElementTypeToInteger instead of duplicating
what it does.
llvm-svn: 182165
|
llvm-svn: 181721
|