| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
patches that aren't yet checked in.
llvm-svn: 247011
|
| |
|
|
|
|
|
|
| |
Added tests for encoding.
Differential Revision: http://reviews.llvm.org/D12061
llvm-svn: 247010
|
| |
|
|
| |
llvm-svn: 247008
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: And define them to have noop casts with address spaces 0-255.
Reviewers: pekka.jaaskelainen
Subscribers: pekka.jaaskelainen, llvm-commits
Differential Revision: http://reviews.llvm.org/D12678
llvm-svn: 246990
|
| |
|
|
|
|
|
|
| |
Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer.
Differential Revision: http://reviews.llvm.org/D10683
llvm-svn: 246981
|
| |
|
|
| |
llvm-svn: 246949
|
| |
|
|
|
|
| |
Only works for avx512f (dq) targets so far - need to add avx512bw tests once char/short shifts are supported.
llvm-svn: 246943
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In searching for a fix for the underlying code-quality bug highlighted by
r246937 (that SDAG simplification can lead to us generating an ISD::OR node
with a constant zero LHS), I ran across this:
We generically canonicalize commutative binary-operation nodes in SDAG getNode
so that, if only one operand is a constant, it will be on the RHS. However, we
were doing this only after a bunch of constant-based simplification checks that
all assume this canonical form (that any constant will be on the RHS). Moving
the operand-swapping canonicalization prior to these checks seems like the
right thing to do (and, as it turns out, causes SDAG to completely fold away the
computation in test/CodeGen/ARM/2012-11-14-subs_carry.ll, just like InstCombine
would do).
llvm-svn: 246938
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To commute a trivial rlwimi instructions (meaning one with a full mask and zero
shift), we'd need to ability to form an all-zero mask (instead of an all-one
mask) using rlwimi. We can't represent this, however, and we'll miscompile code
if we try.
The code quality problem that this highlights (that SDAG simplification can
lead to us generating an ISD::OR node with a constant zero LHS) will be fixed
as a follow-up.
Fixes PR24719.
llvm-svn: 246937
|
| |
|
|
| |
llvm-svn: 246927
|
| |
|
|
| |
llvm-svn: 246922
|
| |
|
|
| |
llvm-svn: 246921
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from
an input pattern that looks like this:
and(or(x, c1), c2)
but the associated logic does not work if there are bits that are 1 in c1 but 0
in c2 (these are normally canonicalized away, but that can't happen if the 'or'
has other users. Make sure we abort the transformation if such bits are
discovered.
Fixes PR24704.
llvm-svn: 246900
|
| |
|
|
| |
llvm-svn: 246863
|
| |
|
|
|
|
| |
Using generic neon syntax to avoid test failure on apple platforms.
llvm-svn: 246833
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When forming permutation-based unaligned vector loads, we need to know whether
it is valid to read ahead of the requested address by a full vector length.
Doing so is more efficient (and allows for more CSE with later loads), but
could trigger a page fault if invalid. To determine validity, we look for other
loads in the same block that access the relevant address range.
The relevant point here is that we need to do this as part of the process of
forming permutation-based vector loads, and this happens quite early in the
SDAG pipeline - specifically before many of the address calculations are fully
canonicalized. As a result, we need to try harder to recognize base+offset
address computations, because they still might appear as chain of adds
(base+offset+offset, for example). To account for this, we'll look through
chains of adds, accumulating the constant offsets.
llvm-svn: 246813
|
| |
|
|
|
|
|
|
|
|
|
| |
If you compute the MMO offset using unsigned arithmetic, you end up with a
large positive offset instead of a small negative one. In theory, this could
cause bad instruction-scheduling decisions later.
I noticed this by inspection from the debug output, and using that for the
regression test is the best I can do right now.
llvm-svn: 246805
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In vectorized add reduction code, the final "reduce" step is sub-optimal.
This change wll combine :
ext v1.16b, v0.16b, v0.16b, #8
add v0.4s, v1.4s, v0.4s
dup v1.4s, v0.s[1]
add v0.4s, v1.4s, v0.4s
into
addv s0, v0.4s
PR21371
http://reviews.llvm.org/D12325
Patch by Jun Bum Lim <junbuml@codeaurora.org>!
llvm-svn: 246790
|
| |
|
|
| |
llvm-svn: 246785
|
| |
|
|
|
|
|
|
| |
This reverts commit r246769.
This appears to have broken Multisource/Benchmarks/tramp3d-v4.
llvm-svn: 246782
|
| |
|
|
| |
llvm-svn: 246781
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time
we have a merged access candidate. Without this patch, we were generating unaligned
16-byte (SSE) memops for x86 targets where those accesses are slow.
This change was mentioned in:
http://reviews.llvm.org/D10662 and
http://reviews.llvm.org/D10905
and will help solve PR21711.
Differential Revision: http://reviews.llvm.org/D12573
llvm-svn: 246771
|
| |
|
|
|
|
|
|
|
|
|
| |
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
llvm-svn: 246769
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This intrinsic can be used to extract a pointer to the exception caught by
a given catchpad. Its argument has token type and must be a `catchpad`.
Also clarify ExtendingLLVM documentation regarding overloaded intrinsics.
Reviewers: majnemer, andrew.w.kaylor, sanjoy, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12533
llvm-svn: 246752
|
| |
|
|
|
|
|
|
|
|
| |
vpconflictq, vpconflictd
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11931
llvm-svn: 246750
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to accept (and even test, and generate) 16-byte alignment
for 32-byte nontemporal stores, but they require 32-byte alignment,
per SDM. Found by inspection.
Instead of hardcoding 16 in the patfrag, check for natural alignment.
Also fix the autoupgrade and the various tests.
Also, use explicit -mattr instead of -mcpu: I stared at the output
several minutes wondering why I get 2x movntps for the unaligned
case (which is the ideal output, but needs some work: see FIXME),
until I remembered corei7-avx implies +slow-unaligned-mem-32.
llvm-svn: 246733
|
| |
|
|
|
| |
Also: add a missing test for movntiq.
llvm-svn: 246730
|
| |
|
|
|
|
|
|
|
|
| |
I'm adding a regression test to better cover code generation for unaligned
vector loads and stores, but there's no functional change to the code
generation here. There is an improvement to the cost model for unaligned vector
loads and stores, mostly for QPX (for which we were not previously accounting
for the permutation-based loads), and the cost model implementation is cleaner.
llvm-svn: 246712
|
| |
|
|
|
|
|
|
| |
This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch.
Differential Revision: http://reviews.llvm.org/D12342
llvm-svn: 246692
|
| |
|
|
|
|
|
|
| |
This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch.
Differential Revision: http://reviews.llvm.org/D12343
llvm-svn: 246691
|
| |
|
|
|
|
|
|
|
|
|
| |
LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the
TableGen-generated matching code, and it does this by testing the same
predicates used by the TableGen files. Unfortunately, when we added new
P8Altivec-only predicates, we started universally testing them in
LowerVECTOR_SHUFFLE, and if then matched when targeting a system prior to a P8,
we'd end up with a selection failure.
llvm-svn: 246675
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a continuation of the fix from:
http://reviews.llvm.org/D10662
and discussion in:
http://reviews.llvm.org/D12154
Here, we distinguish slow unaligned SSE (128-bit) accesses from slow unaligned
scalar (64-bit and under) accesses. Other lowering (eg, getOptimalMemOpType)
assumes that unaligned scalar accesses are always ok, so this changes
allowsMisalignedMemoryAccesses() to match that behavior.
Differential Revision: http://reviews.llvm.org/D12543
llvm-svn: 246658
|
| |
|
|
|
|
|
|
|
| |
add byte shift left/right
add SAD - compute sum of absolute differences
Differential Revision: http://reviews.llvm.org/D12479
llvm-svn: 246654
|
| |
|
|
|
|
|
|
|
|
| |
instructions
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11593
llvm-svn: 246642
|
| |
|
|
|
|
|
|
| |
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11709
llvm-svn: 246640
|
| |
|
|
|
|
|
|
| |
Enabled DAG pattern lowering for SKX with DQI predicate.
Differential Revision: http://reviews.llvm.org/D12550
llvm-svn: 246625
|
| |
|
|
|
|
|
|
|
| |
Vector 'getelementptr' with scalar base is an opportunity for gather/scatter intrinsic to generate a better sequence.
While looking for uniform base, we want to use the scalar base pointer of GEP, if exists.
Differential Revision: http://reviews.llvm.org/D11121
llvm-svn: 246622
|
| |
|
|
|
|
|
|
| |
Patch by Dylan McKay!
Differential Revision: http://reviews.llvm.org/D12099
llvm-svn: 246615
|
| |
|
|
|
|
|
|
|
| |
The code introduced in r244314 assumed that EXTRACT_VECTOR_ELT only
takes constant indices, but it does accept variables.
Bail out for those: we can't use them, as the shuffles we want to
reconstruct do require constant masks.
llvm-svn: 246594
|
| |
|
|
|
|
|
|
|
|
| |
This matches the ARM behavior. In both cases, the register is part
of the optional Performance Monitors extension, so, add the feature,
and enable it for the A-class processors we support.
Differential Revision: http://reviews.llvm.org/D12425
llvm-svn: 246555
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D12526
llvm-svn: 246551
|
| |
|
|
| |
llvm-svn: 246547
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Interleaved access lowering removes a memory operation and a
sequence of vector shuffles and replaces it with a series of
memory operations. This should be always beneficial.
This pass in only enabled on ARM/AArch64.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D12145
llvm-svn: 246540
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generated in lowering switch.
Currently, when edge weights are assigned to edges that are created when lowering switch statement, the weight on the edge to default statement (let's call it "default weight" here) is not considered. We need to distribute this weight properly. However, without value profiling, we have no idea how to distribute it. In this patch, I applied the heuristic that this weight is evenly distributed to successors.
For example, given a switch statement with cases 1,2,3,5,10,11,20, and every edge from switch to each successor has weight 10. If there is a binary search tree built to test if n < 10, then its two out-edges will have weight 4x10+10/2 = 45 and 3x10 + 10/2 = 35 respectively (currently they are 40 and 30 without considering the default weight). Each distribution (which is 5 here) will be stored in each SwitchWorkListItem for further distribution.
There are some exceptions:
For a jump table header which doesn't have any edge to default statement, we don't distribute the default weight to it.
For a bit test header which covers a contiguous range and hence has no edges to default statement, we don't distribute the default weight to it.
When the branch checks a single value or a contiguous range with no edge to default statement, we don't distribute the default weight to it.
In other cases, the default weight is evenly distributed to successors.
Differential Revision: http://reviews.llvm.org/D12418
llvm-svn: 246522
|
| |
|
|
| |
llvm-svn: 246514
|
| |
|
|
| |
llvm-svn: 246513
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.
The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.
The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:
(setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
(which is possible for some combinations of (op1, op2))
If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.
To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).
Fixes PR24636.
llvm-svn: 246507
|
| |
|
|
|
|
|
|
|
| |
Summary: This handles all load/store operations that WebAssembly defines, and handles those necessary for C++ such as i1. I left a FIXME for outstanding features which aren't required for now.
Reviewers: sunfish
Subscribers: jfb, llvm-commits, dschuff
llvm-svn: 246500
|
| |
|
|
| |
llvm-svn: 246485
|
| |
|
|
| |
llvm-svn: 246481
|