| Commit message | Author | Age | Files | Lines |
caused by a defect in isBuildVectorAllZeros().
llvm-svn: 211567
We handle this by spilling the whole thing to the stack and doing the
insertion as a store.
PR19492. This happens in real code because the vectorizer creates v2i128 when AVX is enabled.
llvm-svn: 211435
Summary:
With this patch, range metadata can be added to call/invoke instructions,
including IntrinsicInst. Previously, it could only be added to load instructions.
Rename computeKnownBitsLoad to computeKnownBitsFromRangeMetadata because
range metadata is no longer used only by loads.
Update the language reference to reflect this change.
Test Plan:
Add several tests in range-2.ll to confirm the verifier is happy with
having range metadata on call/invoke.
Add two tests in AddOverFlow.ll to confirm annotating range metadata to
call/invoke can benefit InstCombine.
Reviewers: meheff, nlewycky, reames, hfinkel, eliben
Reviewed By: eliben
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4187
llvm-svn: 211281
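A minimal C++ sketch (not from the patch itself) of attaching range metadata to a call; the helper name and the [0, 10) range are illustrative:

  #include "llvm/ADT/APInt.h"
  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/MDBuilder.h"
  using namespace llvm;

  // Annotate CI with !range [0, 10); computeKnownBitsFromRangeMetadata can then
  // prove that the high bits of the returned value are zero.
  static void annotateCallRange(CallInst &CI) {
    MDBuilder MDB(CI.getContext());
    MDNode *Range = MDB.createRange(APInt(32, 0), APInt(32, 10));
    CI.setMetadata(LLVMContext::MD_range, Range);
  }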
It looks like there are two versions of LowerCallTo here: the
SelectionDAGBuilder one is designed to operate on LLVM IR, and the
TargetLowering one handles the case where everything is already at the DAG level.
Previously, only the SelectionDAGBuilder variant could handle demoting
an impossible return to sret semantics (before delegating to the
TargetLowering version), but this functionality is also useful for
certain libcalls (e.g. 128-bit operations on 32-bit x86). So this
commit moves the sret handling down a level.
rdar://problem/17242889
llvm-svn: 211155
llvm-svn: 211108
It's valid to use FP_TO_SINT when asking for a smaller type (e.g. all
"unsigned int16" values fit into a "signed int32"), but the reverse
isn't true.
Unfortunately, I'm not aware of any architecture with asymmetric
FP_TO_SINT and FP_TO_UINT handling, and the logic happens to
work in the symmetric case, so I can't write a test for this.
llvm-svn: 210986
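A hedged sketch of the asymmetry (the helper and parameter names are illustrative, not the code this commit touched): when widening an fp-to-int result, FP_TO_SINT is safe for an unsigned source because every narrower unsigned value fits in the wider signed type, but a signed source must never be switched to FP_TO_UINT:

  static SDValue widenFPToInt(SelectionDAG &DAG, const TargetLowering &TLI,
                              SDValue Src, EVT WideVT, bool SrcIsSigned,
                              const SDLoc &DL) {
    unsigned Opc = ISD::FP_TO_UINT;
    // A signed conversion stays signed; an unsigned one may use FP_TO_SINT on
    // the strictly wider result type if that is what the target supports.
    if (SrcIsSigned || TLI.isOperationLegalOrCustom(ISD::FP_TO_SINT, WideVT))
      Opc = ISD::FP_TO_SINT;
    return DAG.getNode(Opc, DL, WideVT, Src);
  }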
so make it take one. Fix up all users accordingly.
llvm-svn: 210948
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.
As a result, cmpxchg instructions must return a flag indicating
success in addition to the iN value they previously loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }"; the
i1 member is 1 when the store succeeded.
At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.
By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.
Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.
Summary for out of tree users:
------------------------------
+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.
llvm-svn: 210903
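A short sketch of the backend opt-in mentioned above, as it would appear in a hypothetical target's TargetLowering constructor (the MVT chosen is illustrative):

  // Keep the two-result node instead of letting it be expanded back to the
  // plain ATOMIC_CMP_SWAP during legalization; the custom lowering must then
  // produce both the loaded value and the i1 success flag.
  setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i32, Custom);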
Add branch weights to branch instructions, so that subsequent passes can
optimize based on them (e.g. basic block ordering).
llvm-svn: 210863
This commit adds MachineMemOperands to load and store instructions. This allows
the peephole optimizer to fold load instructions. Unfortunately, the peephole
optimizer currently doesn't run at -O0.
llvm-svn: 210858
y)) for vectors"
This reverts commit r210540, adds a testcase for the regression it
caused, and marks the R600 test it was supposed to fix as XFAIL.
llvm-svn: 210792
This implements target-independent FastISel lowering for the stackmap intrinsic.
llvm-svn: 210742
version of TargetLowering/Machine from there on the way to avoiding
TargetMachine in TargetLowering.
llvm-svn: 210579
Add more instruction-specific statistics about failing intrinsic calls during
FastISel.
llvm-svn: 210556
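Such counters use LLVM's standard statistics facility; a small illustration (the statistic name here is made up, not necessarily one the commit added):

  #define DEBUG_TYPE "isel"
  #include "llvm/ADT/Statistic.h"

  STATISTIC(NumFastIselFailIntrinsicCall,
            "Number of intrinsic calls FastISel failed to lower");

  // Bump the counter wherever the intrinsic-call fast path bails out.
  static void noteIntrinsicSelectionFailure() { ++NumFastIselFailIntrinsicCall; }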
The SelectionDAG had a special case for ISD::SELECT_CC, where it would
allow targets to specify:
setOperationAction(ISD::SELECT_CC, MVT::Other, Expand);
to indicate that they wanted to expand ISD::SELECT_CC for all types.
This wasn't applied correctly everywhere, and it makes writing new
DAG patterns with ISD::SELECT_CC difficult.
llvm-svn: 210541
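For reference, the wildcard form versus its explicit per-type equivalent, as written inside a target's TargetLowering constructor (the loop shown is illustrative):

  setOperationAction(ISD::SELECT_CC, MVT::Other, Expand); // old "all types" special case
  for (MVT VT : MVT::all_valuetypes())                    // explicit per-type equivalent
    setOperationAction(ISD::SELECT_CC, VT, Expand);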
vectors
This prevents a future commit from regressing:
test/CodeGen/R600/setcc-equivalent.ll
llvm-svn: 210540
This consolidates code from the Hexagon, R600, and XCore targets.
No functionality change intended.
llvm-svn: 210539
This patch adds new target specific combine rules to identify horizontal
add/sub idioms from BUILD_VECTOR dag nodes.
This patch also teaches the DAGCombiner how to canonicalize sequences of
insert_vector_elt dag nodes according to the following rule:
(insert_vector_elt (insert_vector_elt A, I0), I1) ->
(insert_vector_elt (insert_vector_elt A, I1), I0)
This new canonicalization rule only triggers if the inner insert_vector_elt
dag node has exactly one use; also, both indices must be known constants,
and I1 < I0.
This last rule made it possible to write a simpler algorithm to identify
horizontal add/sub patterns because now we don't have to worry about the
ordering of insert_vector_elt dag nodes.
llvm-svn: 210477
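A rough C++ sketch of the canonicalization rule (the helper name is made up and details are simplified relative to the real DAGCombiner code):

  static SDValue canonicalizeInsertPair(SDNode *N, SelectionDAG &DAG) {
    // N is (insert_vector_elt Inner, OuterElt, OuterIdx), where Inner is
    // another insert_vector_elt with a single use.
    SDValue Inner = N->getOperand(0);
    if (Inner.getOpcode() != ISD::INSERT_VECTOR_ELT || !Inner.hasOneUse())
      return SDValue();
    auto *OuterIdx = dyn_cast<ConstantSDNode>(N->getOperand(2));
    auto *InnerIdx = dyn_cast<ConstantSDNode>(Inner.getOperand(2));
    // Only swap when both indices are constants and the outer index is the
    // smaller one; distinct indices make the two inserts commute.
    if (!OuterIdx || !InnerIdx ||
        OuterIdx->getZExtValue() >= InnerIdx->getZExtValue())
      return SDValue();
    EVT VT = N->getValueType(0);
    SDLoc DL(N);
    SDValue NewInner = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VT,
                                   Inner.getOperand(0), N->getOperand(1),
                                   N->getOperand(2));
    return DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VT, NewInner,
                       Inner.getOperand(1), Inner.getOperand(2));
  }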
This patch modifies SelectionDAGBuilder to construct SDNodes with associated
NoSignedWrap, NoUnsignedWrap and Exact flags coming from IR BinaryOperator
instructions.
Added a new SDNode type called 'BinaryWithFlagsSDNode' to allow accessing
nsw/nuw/exact flags during codegen.
Patch by Marcello Maggioni.
llvm-svn: 210467
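A small example of consulting these flags in a combine; note that current LLVM exposes them through SDNode::getFlags() rather than a dedicated BinaryWithFlagsSDNode class, and the helper name is illustrative:

  // True when an ISD::ADD carries both no-wrap flags from the IR instruction,
  // i.e. the addition is known not to overflow in either sense.
  static bool addCannotWrap(const SDNode *N) {
    if (N->getOpcode() != ISD::ADD)
      return false;
    SDNodeFlags Flags = N->getFlags();
    return Flags.hasNoSignedWrap() && Flags.hasNoUnsignedWrap();
  }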
a TargetMachine since the only thing it wants is DataLayout.
llvm-svn: 210366
DAG cycle detection is only enabled with ENABLE_EXPENSIVE_CHECKS. However we
can run it just before we would crash in order to provide more informative
diagnostics.
Now in addition to the "Overran sorted position" message we also get the Node
printed if a cycle was detected.
Tested by building several configs: Debug+Assert, Debug+Assert+Check (this is
ENABLE_EXPENSIVE_CHECKS), Release+Assert and Release. Also verified that the
AssignTopologicalOrder assert produces the expected results.
llvm-svn: 209977
Pass the DAG down to checkForCycles from all callers where we have it. This
allows target-specific nodes to be printed properly.
Also print some missing newlines.
llvm-svn: 209976
legalization when promoting nodes with illegal vector type.
This patch teaches the backend how to simplify/canonicalize dag node
sequences normally introduced by the backend when promoting certain dag nodes
with illegal vector type.
This patch adds two new combine rules:
1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
(shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)
2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) ->
(shuffle (BINOP A, B), Undef, <Mask>).
Both rules are only triggered on the type-legalized DAG.
In particular, rule 1 is a target-specific combine rule that attempts
to sink a bitconvert into the operands of a binary operation.
Rule 2 is a target-independent rule that attempts to move a shuffle
immediately after a binary operation.
llvm-svn: 209930
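A hedged sketch of rule 2 (an illustrative helper, not the actual combine entry point); the real code additionally checks that the opcode is an element-wise binary operation that is legal for the type:

  static SDValue combineBinOpOfShuffles(SDNode *N, SelectionDAG &DAG) {
    auto *S0 = dyn_cast<ShuffleVectorSDNode>(N->getOperand(0));
    auto *S1 = dyn_cast<ShuffleVectorSDNode>(N->getOperand(1));
    if (!S0 || !S1 || !S0->getOperand(1).isUndef() ||
        !S1->getOperand(1).isUndef() || S0->getMask() != S1->getMask())
      return SDValue();
    EVT VT = N->getValueType(0);
    SDLoc DL(N);
    // Perform the binary operation on the un-shuffled inputs, then shuffle once.
    SDValue BinOp = DAG.getNode(N->getOpcode(), DL, VT, S0->getOperand(0),
                                S1->getOperand(0));
    return DAG.getVectorShuffle(VT, DL, BinOp, DAG.getUNDEF(VT), S0->getMask());
  }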
Summary:
If both vector args to vselect are concat_vectors and the condition is
constant and picks half a vector from each argument, convert the vselect
into a concat_vectors.
Added a test.
ConvertSelectToConcatVector assumes it doesn't get vselects with
arguments such as <undef, undef, true, true>; those get taken
care of by the checks above its call.
Reviewers: nadav, delena, grosbach, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3916
llvm-svn: 209929
Patch by Andrey Kuharev.
llvm-svn: 209902
Unordered is strictly weaker than monotonic, so if the latter doesn't have any
barriers then the former certainly shouldn't.
rdar://problem/16548260
llvm-svn: 209901
llvm-svn: 209798
An address-only use of an extract element of a load can be simplified to a
load. Without this, the result of the extract element is spilled to the
stack so that an address is available.
llvm-svn: 209788
No test because no in-tree target changes the bitwidth of the
setcc type depending on the bitwidth of the compared type.
Patch by Ke Bai.
llvm-svn: 209771
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how they are accessed (linkage,
visibility, dll storage).
Clang still has to be updated to expose this feature to C.
llvm-svn: 209759
value is live"
This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux
self-hosting. Because nearly every regression test triggers a segfault, I hope
this will be easy to fix.
llvm-svn: 209747
Cortex-M4 only has single-precision floating point support, so any LLVM
"double" type will have been split into 2 i32s by now. Fortunately, the
consecutive-register framework turns out to be precisely what's needed to
reconstruct the double and follow AAPCS-VFP correctly!
rdar://problem/17012966
llvm-svn: 209650
llvm-svn: 209202
bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
llvm-svn: 209123
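For illustration, the byte-level shuffle a v4i32 bswap reduces to, which is exactly what vrev/pshufb/vperm can match (the array name is just for exposition):

  // Reverse the four bytes inside each 32-bit lane of a 16-byte vector.
  static const int BSwapV4i32ByteMask[16] = {3,  2,  1,  0,  7,  6,  5,  4,
                                             11, 10, 9,  8,  15, 14, 13, 12};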
This is mostly a mechanical change converting all the call sites to the newer
chained-function construction pattern. This removes the horrible 15-parameter
constructor for the CallLoweringInfo in favour of setting properties of the call
via chained functions. No functional change beyond the removal of the old
constructors is intended.
llvm-svn: 209082
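The shape of the chained pattern, wrapped in an illustrative helper (names and the C calling convention are placeholders):

  static std::pair<SDValue, SDValue>
  emitSimpleCall(SelectionDAG &DAG, const TargetLowering &TLI, const SDLoc &DL,
                 SDValue Chain, SDValue Callee, Type *RetTy,
                 TargetLowering::ArgListTy Args) {
    TargetLowering::CallLoweringInfo CLI(DAG);
    CLI.setDebugLoc(DL)
        .setChain(Chain)
        .setCallee(CallingConv::C, RetTy, Callee, std::move(Args));
    return TLI.LowerCallTo(CLI); // {returned value, output chain}
  }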
This is a preliminary step to help ease the construction of CallLoweringInfo.
Changing the construction to a chained-function pattern requires that the
parameter be nullable. However, rather than copying the vector, store a pointer
instead of a reference to permit late binding of the arguments.
llvm-svn: 209080
llvm-svn: 209040
computeKnownBits, consolidate them into one assert at the end of
computeKnownBits itself.
llvm-svn: 208876
inappropriate since it lost its Mask parameter in r154011.
llvm-svn: 208811
in r154011.
llvm-svn: 208757
any targets so I couldn't find a test case to trigger it.
The problem occurs when a non-i1 setcc is inverted. For example, an 'i8 = setcc' gets 'xor 0xff' to invert it, which is clearly wrong when the boolean contents are ZeroOrOne.
This patch introduces getLogicalNOT and updates SetCC legalisation to use it.
Reviewed by Hal Finkel.
llvm-svn: 208641
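A sketch of the idea behind getLogicalNOT (simplified; the helper name and structure are illustrative): the inversion constant has to depend on the target's boolean contents rather than always being all-ones:

  static SDValue logicalNOT(SelectionDAG &DAG, const TargetLowering &TLI,
                            SDValue Val, EVT VT, const SDLoc &DL) {
    // ZeroOrOne booleans are inverted with xor 1; xor with all-ones is only
    // right for ZeroOrNegativeOne booleans (or for genuine i1 values).
    SDValue TrueVal =
        TLI.getBooleanContents(VT) == TargetLowering::ZeroOrOneBooleanContent
            ? DAG.getConstant(1, DL, VT)
            : DAG.getAllOnesConstant(DL, VT);
    return DAG.getNode(ISD::XOR, DL, VT, Val, TrueVal);
  }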
Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.
This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part). See the testcase.
In order to spot the dead load, we need to revisit it when SimplifyDemandedBits
decides that the value of the load is masked off. This is the
CommitTargetLoweringOpt piece.
I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.
Fixes <rdar://problem/16031651>
llvm-svn: 208640
llvm-svn: 208598
We must validate the value type in TLI::getRegisterByName, because if we
don't and the wrong type is used with the IR intrinsic, then we'll assert
(because we won't be able to find a valid register class with which to
construct the requested copy operation). For PPC64, additionally, the type
information is necessary to decide between the 64-bit register and the 32-bit
subregister.
No functionality change.
llvm-svn: 208508
When using the ARM AAPCS, HFAs (Homogeneous Floating-point Aggregates) must
be passed in a block of consecutive floating-point registers, or on the stack.
This means that unused floating-point registers cannot be back-filled with
part of an HFA; however, this can currently happen. This patch, along with the
corresponding clang patch (http://reviews.llvm.org/D3083), prevents this.
llvm-svn: 208413
When reducing the bitwidth of a comparison against a constant, the
original setcc's result type was used, which was incorrect.
No test since I don't think any other in tree targets change the
bitwidth of the setcc type depending on the bitwidth of the compared
type.
llvm-svn: 208236
| |
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
llvm-svn: 208104
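On the source-language side, a front end exposes this through the GNU global register variable extension; a minimal illustration (the register name is target-specific, and Clang turns reads of such a variable into llvm.read_register):

  // Bind a variable to the stack pointer; "sp" is the AArch64/ARM spelling,
  // other targets use their own name (e.g. "rsp" on x86-64).
  register unsigned long StackPtr asm("sp");

  unsigned long currentStackPointer() {
    return StackPtr; // lowered to a call to llvm.read_register
  }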
llvm-svn: 207871
llvm-svn: 207850
vector VT but scalar values.
llvm-svn: 207835