|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| ... |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235977 | 
| | 
| 
| 
| 
| 
| 
| 
| | All other patches, including tests will follow.
http://reviews.llvm.org/D7665
llvm-svn: 235970 | 
| | 
| 
| 
| 
| 
| 
| 
| | Fixed issue with the combine of CONCAT_VECTOR of 2 BUILD_VECTOR nodes - the optimisation wasn't ensuring that the scalar operands of both nodes were the same type/size for implicit truncation.
Test case spotted by Patrik Hagglund
llvm-svn: 235371 | 
| | 
| 
| 
| 
| 
| | Fix for test case found by James Molloy - TRUNCATE of constant build vectors can be more simply achieved by simply replacing with a new build vector node with the truncated value type - no need to touch the scalar operands at all.
llvm-svn: 235079 | 
| | 
| 
| 
| | llvm-svn: 234764 | 
| | 
| 
| 
| 
| 
| | Same as r234255, but for lib/CodeGen and lib/Target.
llvm-svn: 234258 | 
| | 
| 
| 
| 
| 
| | Differential Revision: http://reviews.llvm.org/D8715
llvm-svn: 234179 | 
| | 
| 
| 
| | llvm-svn: 234106 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | As a follow-up to r234021, assert that a debug info intrinsic variable's
`MDLocalVariable::getInlinedAt()` always matches the
`MDLocation::getInlinedAt()` of its `!dbg` attachment.
The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), but I'll let these assertions bake for a while
first.
If you have an out-of-tree backend that just broke, you're probably
attaching the wrong `DebugLoc` to a `DBG_VALUE` instruction.  The one
you want is the location that was attached to the corresponding
`@llvm.dbg.declare` or `@llvm.dbg.value` call that you started with.
llvm-svn: 234038 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The existing code in getMemsetValue only handled integer-preferred types when
the fill value was not a constant. Make this more robust in two ways:
  1. If the preferred type is a floating-point value, do the mul-splat trick on
     the corresponding integer type and then bitcast.
  2. If the preferred type is a vector, do the mul-splat trick on one vector
     element, and then build a vector out of them.
Fixes PR22754 (although, we should also turn off use of vector types at -O0).
llvm-svn: 233749 | 
| | 
| 
| 
| 
| 
| | Update lib/CodeGen (and lib/Target) to use the new `DebugLoc` API.
llvm-svn: 233582 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch adds supports for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.
It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.
Differential Revision: http://reviews.llvm.org/D8593
llvm-svn: 233224 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | as sitofp
While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref):
%X = sitofp i8 -1 to double ; yields double:-1.0
%Y = uitofp i8 -1 to double ; yields double:255.0
The vector constant folding was always using sitofp:
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
This patch fixes this so that the correct opcode is used for sitofp and uitofp.
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0>
Differential Revision: http://reviews.llvm.org/D8560
llvm-svn: 233033 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.
I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.
I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | We have an increasing number of cases where we are creating commuted shuffle masks - all implementing nearly the same code.
This patch adds a static helper function - ShuffleVectorSDNode::commuteMask() and replaces a number of cases to use it.
Differential Revision: http://reviews.llvm.org/D8139
llvm-svn: 231581 | 
| | 
| 
| 
| | llvm-svn: 230973 | 
| | 
| 
| 
| | llvm-svn: 230972 | 
| | 
| 
| 
| | llvm-svn: 230948 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is a follow-on patch to:
http://reviews.llvm.org/D7093
That patch canonicalized constant splats as build_vectors, 
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.
This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283
The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...
This improves an existing x86 AVX test and changes codegen in an ARM test.
Differential Revision: http://reviews.llvm.org/D7389
llvm-svn: 229511 | 
| | 
| 
| 
| 
| 
| | Same functionality, but hoists the vector growth out of the loop.
llvm-svn: 229500 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | directly into blends of the splats.
These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.
Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.
llvm-svn: 229308 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)
Also, add `Function::getFnStackAlignment()`, and canonicalize:
getAttributes().getStackAlignment(AttributeSet::FunctionIndex)
  => getFnStackAlignment()
llvm-svn: 229208 | 
| | 
| 
| 
| 
| 
| | TSI is not guaranteed be non-null in SelectionDAG.
llvm-svn: 227397 | 
| | 
| 
| 
| | llvm-svn: 227119 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This fixes a regression introduced by r226816.
When replacing a splat shuffle node with a constant build_vector,
make sure that the new build_vector has a valid number of elements.
Thanks to Patrik Hagglund for reporting this problem and providing a
small reproducible.
llvm-svn: 227002 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: When trying to constant fold an FMA in the DAG, getNode()
fails to fold the FMA if an operand is not finite. In this case this
patch allows the constant folding if !TLI->hasFloatingPointExceptions()
Reviewers: resistor
Reviewed By: resistor
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D6912
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 226901 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | v2: use getZExtValue
    add missing break
    codestyle
v3: add few more comments
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226880 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
Differential Revision: http://reviews.llvm.org/D7093
Fixed recommit of r226811.
llvm-svn: 226816 | 
| | 
| 
| 
| | llvm-svn: 226814 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
Differential Revision: http://reviews.llvm.org/D7093
llvm-svn: 226811 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The problem occurs when after vectorization we have type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations, it will be submitted separately.
llvm-svn: 226808 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Now that the source and destination types can be specified,
allow doing an expansion that doesn't use an EXTLOAD of the
result type. Try to do a legal extload to an intermediate type
and extend that if possible.
This generalizes the special case custom lowering of extloads
R600 has been using to work around this problem.
This also happens to fix a bug that would incorrectly use more
aligned loads than should be used.
llvm-svn: 225925 | 
| | 
| 
| 
| | llvm-svn: 225836 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads
when the first load was offset from a base address. 
This patch recognizes that pattern and subtracts the offset before comparing
the second load to see if it is consecutive.
The codegen change in the new test case improves from:
vmovsd	32(%rdi), %xmm0
vmovsd	48(%rdi), %xmm1 
vmovhpd	56(%rdi), %xmm1, %xmm1
vmovhpd	40(%rdi), %xmm0, %xmm0
vinsertf128	$1, %xmm1, %ymm0, %ymm0
To:
vmovups	32(%rdi), %ymm0
An existing test case is also improved from:
vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovsd	24(%rdi), %xmm2
vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
vmovhpd	8(%rdi), %xmm1, %xmm3
To:
vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovhpd	24(%rdi), %xmm0, %xmm0
vmovhpd	8(%rdi), %xmm1, %xmm1
This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ).
Differential Revision: http://reviews.llvm.org/D6642
llvm-svn: 224379 | 
| | 
| 
| 
| 
| 
| | parity with F32 and F64.
llvm-svn: 223760 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.
Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 223348 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp
llvm-svn: 222936 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 222632 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.
I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.
Differential Revision: http://reviews.llvm.org/D5699
llvm-svn: 222340 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
llvm-svn: 222334 | 
| | 
| 
| 
| | llvm-svn: 221854 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | What would happen before that commit is that the SDDbgValues associated with
a deallocated SDNode would be marked Invalidated, but SDDbgInfo would keep
a map entry keyed by the SDNode pointer pointing to this list of invalidated
SDDbgNodes. As the memory gets reused, the list might get wrongly associated
with another new SDNode. As the SDDbgValues are cloned when they are transfered,
this can lead to an exponential number of SDDbgValues being produced during
DAGCombine like in http://llvm.org/bugs/show_bug.cgi?id=20893
Note that the previous behavior wasn't really buggy as the invalidation made
sure that the SDDbgValues won't be used. This commit can be considered a
memory optimization and as such is really hard to validate in a unit-test.
llvm-svn: 221709 | 
| | 
| 
| 
| | llvm-svn: 219588 | 
| | 
| 
| 
| 
| 
| | a cached TLI instance.
llvm-svn: 219342 | 
| | 
| 
| 
| 
| 
| | inside init rather than have it passed in as an argument.
llvm-svn: 219270 | 
| | 
| 
| 
| 
| 
| | during init rather than construction time.
llvm-svn: 219262 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787 | 
| | 
| 
| 
| 
| 
| | "Move the complex address expression out of DIVariable and into an extra"
llvm-svn: 218782 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
llvm-svn: 218778 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The code in SelectionDAG::getMemset for some reason assumes the value passed to
memset is an i32. This breaks the generated code for targets that only have
registers smaller than 32 bits because the value might get split into multiple
registers by the calling convention. See the test for the MSP430 target included
in the patch for an example.
This patch ensures that nothing is assumed about the type of the value. Instead,
the type is taken from the selected overload of the llvm.memset intrinsic.
llvm-svn: 216716 |