| Commit message | Author | Age | Files | Lines |
llvm-svn: 187961
This virtual function can be implemented by targets to specify the type
to use for the index operand of INSERT_VECTOR_ELT, EXTRACT_VECTOR_ELT,
INSERT_SUBVECTOR, and EXTRACT_SUBVECTOR. The default implementation
returns the result of TargetLowering::getPointerTy().
The previous code used TargetLowering::getPointerTy() for vector
indices, because this is guaranteed to be legal on all targets. However,
using TargetLowering::getPointerTy() can be a problem for targets whose
pointer sizes differ across address spaces. On such targets, when
vectors need to be loaded or stored to an address space other than the
default 'zero' address space (which is the address space assumed by
TargetLowering::getPointerTy()), having an index that is a different
size than the pointer can lead to inefficient pointer calculations
(e.g. 64-bit adds for a 32-bit address space).
There is no intended functionality change with this patch.
llvm-svn: 187748
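
As a sketch only (not the committed code), a target override of this hook
might look like the following, assuming the 2013-era parameterless
signature; MyTargetLowering is a hypothetical target:

// Hypothetical override; assumes the era's signature
// "virtual MVT getVectorIdxTy() const" (the default returns
// getPointerTy()).
MVT MyTargetLowering::getVectorIdxTy() const {
  // A 32-bit index avoids 64-bit adds when addressing a 32-bit
  // address space, even if other address spaces use 64-bit pointers.
  return MVT::i32;
}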
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN
The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
optimizations.
llvm-svn: 187396
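
A sketch of the kind of one-use guard this describes; the SelectionDAG
accessors are real, but the surrounding combine context is assumed:

// Inside a hypothetical visitINSERT_VECTOR_ELT-style combine:
SDValue InVec = N->getOperand(0);
if (InVec.getOpcode() == ISD::BUILD_VECTOR && !InVec.hasOneUse())
  return SDValue(); // reuse the existing build_vector; don't make a new one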
This commit also implements these functions for R600 and removes a test
case that was relying on the buggy behavior.
llvm-svn: 187007
size.
llvm-svn: 186274
llvm-svn: 186243
llvm-svn: 186032
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:
1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.
2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.
3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.
The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.
llvm-svn: 185956
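
A sketch of how a target might implement the renamed hook; the shape
follows the issues listed above, but hasFMA() is a placeholder subtarget
query, not a specific in-tree implementation:

bool MyTargetLowering::isFMAFasterThanFMulAndFAdd(EVT VT) const {
  if (!Subtarget->hasFMA())   // placeholder query (issue 1)
    return false;
  // Answer for the scalar type so vectors that split or scalarize to
  // supported types (issue 3) also report true.
  VT = VT.getScalarType();
  return VT == MVT::f32 || VT == MVT::f64; // excludes fp128 (issue 2)
}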
When folding sub x, x (and other similar constructs), where x is a vector, the
result is a vector of zeros. After type legalization, make sure that the input
zero elements have a legal type. This type may be larger than the result's
vector element type.
This was another bug found by llvm-stress.
llvm-svn: 185949
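
A sketch of the shape of such a fix, assuming 2013-era SelectionDAG
signatures (getConstant later gained an SDLoc parameter) and assumed
surrounding context:

// Build the zero with the type the legalizer chose for the element,
// which may be wider than the result vector's element type.
EVT EltVT = N->getValueType(0).getVectorElementType();
EVT LegalSVT = TLI.getTypeToTransformTo(*DAG.getContext(), EltVT);
SDValue Zero = DAG.getConstant(0, LegalSVT);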
after return, etc. No functionality change.
llvm-svn: 185893
llvm-svn: 185780
ReduceLoadWidth unconditionally drops extensions from loads. Limit it to
the case when all of the bits the extension would otherwise produce are
dropped by the shrink. It would be possible to shrink the load in more
cases by merging the extensions, but this isn't trivial and covers a
very rare case; I left a TODO for it.
Fixes PR16551.
llvm-svn: 185755
DAGCombiner was counting all uses of a load node when considering whether it's
worth combining into a zextload. Really, it wants to ignore the chain and just
count real uses.
rdar://problem/13896307
llvm-svn: 185419
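
A sketch of counting only non-chain uses; the iterator API is real, the
surrounding combine context is assumed:

// Result 0 of a LoadSDNode is the loaded value; result 1 is the chain.
unsigned RealUses = 0;
for (SDNode::use_iterator UI = Load->use_begin(), UE = Load->use_end();
     UI != UE; ++UI)
  if (UI.getUse().getResNo() == 0) // ignore uses of the chain
    ++RealUses;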
llvm-svn: 184933
shift/xor/sub when it is possible. Fixed a bug in SDIV where the constant operand is not a splat constant vector.
llvm-svn: 184931
When (srl (anyextend x), c) is folded into (anyextend (srl x, c)), the
high bits are not cleared. Add an 'and' to clear them.
llvm-svn: 184575
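
A standalone demonstration (not the committed code) of why the 'and' is
needed: model any_extend with junk in the high bits and check that
masking recovers the narrow shift:

#include <cassert>
#include <cstdint>

int main() {
  uint16_t x = 0x8001;
  unsigned c = 4;
  // any_extend leaves the high half undefined; model it as junk.
  uint32_t anyext  = 0xDEAD0000u | x;
  uint32_t shifted = anyext >> c;              // junk leaks toward the low bits
  uint32_t masked  = shifted & (0xFFFFu >> c); // the added 'and'
  assert((uint16_t)masked == (uint16_t)(x >> c));
  return 0;
}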
redundant checks...
This doesn't really affect performance, since all the relevant calls are transparent, but it is clearer.
llvm-svn: 184027
llvm-svn: 184012
llvm-svn: 184008
FADD/FMUL combinations; also improve accuracy of comments
llvm-svn: 183993
Change SelectionDAG::getXXXNode() interfaces as well as call sites of
these functions to pass in SDLoc instead of DebugLoc.
llvm-svn: 182703
llvm-svn: 182680
llvm-svn: 182180
Use EVT::changeExtendedVectorElementTypeToInteger instead of
open-coding the same logic.
llvm-svn: 182165
llvm-svn: 181721
llvm-svn: 181682
(and)) into (and (not)).
PR15948.
llvm-svn: 181597
Fold (xor (and x, y), y) -> (and (not x), y)
This removes an opportunity for a constant to appear twice.
llvm-svn: 181395
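
The identity can be checked exhaustively over a small domain (standalone
check, not part of the patch):

#include <cassert>
#include <cstdint>

// Verify (x & y) ^ y == ~x & y for all 8-bit x, y.
int main() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      assert((uint8_t)((x & y) ^ y) == (uint8_t)(~x & y));
  return 0;
}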
(Would sometimes optimize away concats used to extend a vector with undef values)
llvm-svn: 181186
Optimize CONCAT_VECTOR nodes that merge EXTRACT_SUBVECTOR values that extract from the same vector.
rdar://13402653
PR15866
llvm-svn: 180871
into account some previously missed cases (PRE_DEC addressing mode, the offset and base address are swapped, etc.). This should fix PR15581.
llvm-svn: 180609
scalars.
This already helps SSE2 x86 a lot because it lacks an efficient way to
represent a vector select. The long term goal is to enable the backend to match
a canonicalized pattern into a single instruction (e.g. vabs or pabs).
llvm-svn: 180597
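
Illustrative source (not from the patch) of the kind of scalar select
pattern this targets; once canonicalized to a vector select after
vectorization, a loop like this can match a single pabs/vabs:

void abs_v(int *a, int n) {
  for (int i = 0; i < n; ++i)
    a[i] = a[i] < 0 ? -a[i] : a[i]; // select canonicalized to vselect
}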
VSETCC without first checking the target's vector boolean contents.
This exposed an issue with PowerPC AltiVec where it appears it was setting the wrong vector boolean contents. The included change
fixes the PowerPC tests, and was OK'd by Hal.
llvm-svn: 180129
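
A sketch of the added check, assuming the 2013-era
getBooleanContents(bool isVec) signature and assumed context:

// Only fold assuming all-ones true values if the target says so.
if (TLI.getBooleanContents(/*isVec=*/true) !=
    TargetLowering::ZeroOrNegativeOneBooleanContent)
  return SDValue();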
llvm-svn: 179939
possible.
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.
The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
ldr.w r9, [sp]
vmov d17, r3, r9
vmov d16, r1, r2
vst1.8 {d16, d17}, [r0]
bx lr
Differential Revision: http://llvm-reviews.chandlerc.com/D647
llvm-svn: 179106
This helps on architectures where i8 and i16 are not legal but we have
byte and short loads/stores, allowing us to merge copies like the one
below on ARM.
copy(char *a, char *b, int n) {
  do {
    int t0 = a[0];
    int t1 = a[1];
    b[0] = t0;
    b[1] = t1;
    /* rest of the loop elided in the original message */
  } while (--n > 0);
}
radar://13536387
llvm-svn: 178546
We would also like to merge sequences that involve a variable index like in the
example below.
int index = *idx++;
int i0 = c[index+0];
int i1 = c[index+1];
b[0] = i0;
b[1] = i1;
By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.
The dag for the code above will look something like:
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i8 load %index))))
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i32 add (i32 signextend (i8 load %index))
(i32 1)))))
The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.
(load (i64 add (i64 copyfromreg %c)
               (i64 signextend (add (i8 load %index)
                                    (i8 1)))))
vs
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i32 add (i32 signextend (i8 load %index))
(i32 1)))))
radar://13536387
llvm-svn: 178483
that case.
Fixes the crash reported in PR15608.
llvm-svn: 178429
- Handle the case where the result of 'insert_subvect' is bitcasted
before 'extract_subvec'. This removes the redundant insertf128/extractf128
pair on unaligned 256-bit vector load/store on vectors of non-64-bit integers.
llvm-svn: 177945
For instance, the following transformation will be disabled:
x + x + x => 3.0f * x
The problem with these transformations is that they introduce an FP
constant, which the following Instruction-Selection pass cannot handle.
Reviewed by Nadav, thanks a lot!
rdar://13445387
llvm-svn: 177933
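
A source-level illustration of the now-disabled reassociation:

// Under the new rule this stays as two fadds; turning it into
// 3.0f * x would materialize an FP constant that the affected
// target's instruction selection cannot handle.
float triple(float x) { return x + x + x; }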
llvm-svn: 176881
LegalizeDAG.cpp uses the value of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.
v2:
- Expand more BR_CC value types for NVPTX
v3:
- Expand correct BR_CC value types for Hexagon, Mips, and XCore.
llvm-svn: 176694
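
A sketch of checking BR_CC legality against the comparison operand's
type, mirroring what LegalizeDAG does (surrounding names assumed):

// Query with the setcc operand type, not the branch's own type.
if (!TLI.isOperationLegalOrCustom(ISD::BR_CC, LHS.getValueType()))
  return SDValue();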
A legal BUILD_VECTOR goes in and gets constant folded into another legal
BUILD_VECTOR so we don't lose any legality here. The problematic PPC
optimization that made this check necessary was fixed recently.
llvm-svn: 175759
(2xi32) (truncate ((2xi64) bitcast (buildvector i32 a, i32 x, i32 b, i32 y)))
can be folded into a (2xi32) (buildvector i32 a, i32 b).
Such a DAG would cause unnecessary vdup instructions followed by vmovn
instructions.
We generate this code on ARM NEON for a setcc olt, 2xf64, 2xf64. For example, in
the vectorized version of the code below.
double A[N];
double B[N];
void test_double_compare_to_double() {
int i;
for(i=0;i<N;i++)
A[i] = (double)(A[i] < B[i]);
}
radar://13191881
Fixes bug 15283.
llvm-svn: 175670
llvm-svn: 175190
post-operand legalization.
llvm-svn: 175149
DAGCombiner::ReduceLoadWidth was converting (trunc i32 (shl i64 v, 32))
into (shl i32 v, 32), which is undef. To prevent this, check the shift
count against the final result size.
Patch by: Kevin Schoedel
Reviewed by: Nadav Rotem
llvm-svn: 174972
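
The shape of the guard (sketch with assumed names):

// Don't narrow the load if the shift amount consumes the entire
// narrowed type; (shl i32 v, 32) is undefined.
if (ShAmt >= NarrowVT.getSizeInBits())
  return SDValue();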
Sorry for the lack of a test case. I tried writing one for i386, as I know selects are illegal on that target, but they are actually considered legal by isel and expanded later.
I can't see any target that triggers this, but checking the legality of a node before forming it is general goodness.
llvm-svn: 174934
Previously, even when a pre-increment load or store was generated,
we often needed to keep a copy of the original base register for use
with other offsets. If all of these offsets are constants (including
the offset which was combined into the addressing mode), then this is
clearly unnecessary. This change adjusts these other offsets to use the
new incremented address.
llvm-svn: 174746
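
A hypothetical source-level illustration of the offset rewriting (not
from the patch):

// If p[1] becomes a pre-increment store, the later stores can be
// addressed off the incremented base instead of keeping the old one:
void f(int *p) {
  p[1] = 0; // pre-increment: the base register becomes p + 1
  p[2] = 1; // rewritten as offset 1 from the new base
  p[3] = 2; // rewritten as offset 2 from the new base
}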
differentiate between the alignment of the base pointer of a load and
the overall alignment of the load. This caused infinite loops in DAG
combine with the original application of this patch.
ORIGINAL COMMIT LOG:
When the target-independent DAGCombiner inferred a higher alignment for a load,
it would replace the load with one with the higher alignment. However, it did
not place the new load in the worklist, which prevented later DAG combines in
the same phase (for example, target-specific combines) from ever seeing it.
This patch corrects that oversight, and updates some tests whose output changed
due to slightly different DAGCombine outputs.
llvm-svn: 174431
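
The essence of the correction, as a one-line sketch (the worklist method
in trees of that era was spelled AddToWorkList; NewLoad is assumed):

// After replacing the load with a better-aligned one, re-add it so
// later combines in the same phase can visit it.
AddToWorkList(NewLoad.getNode());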