Summary:
Most CPUs implementing AES fusion require instruction pairs of the form
  AESE Vn, _
  AESMC Vn, Vn
and
  AESD Vn, _
  AESIMC Vn, Vn
The constraint is added to AES(I)MC instructions that use the result of an
AES(E|D) instruction by means of AES(I)MCTrr pseudo instructions, which
constrain the source and destination registers to be the same.
A nice side effect of this change is that now all possible pairs are
scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll
test case.
I had to update aes_load_store. The version I added initially was very
reduced, and with the new constraint AESE/AESMC could not be scheduled
back-to-back. I updated the test to be more realistic while still exposing
the same scheduling problem as the initial test case.
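For illustration (not part of the patch itself), the fusable pair is what a
single AES round written with the ACLE crypto intrinsics typically lowers to,
assuming a target with the AES extension enabled:
  // One AES encryption round: AESE on (state, key) followed by AESMC on the
  // AESE result. With the new Trr constraint the AESMC reads and writes the
  // same register, matching the pairs that fusion-capable CPUs expect.
  #include <arm_neon.h>

  uint8x16_t aes_round(uint8x16_t state, uint8x16_t round_key) {
      uint8x16_t x = vaeseq_u8(state, round_key);  // AESE Vn, Vm
      return vaesmcq_u8(x);                        // AESMC Vn, Vn
  }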
Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga
Reviewed By: t.p.northover, evandro
Subscribers: aemerson, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35299
llvm-svn: 309495
Summary:
This change gives a 0.25% speedup in execution time, a 0.82% improvement
in benchmark scores, and a 0.20% increase in binary size on a Cortex-A53.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite and a range of proprietary suites.
Reviewers: t.p.northover, aadg, silviu.baranga, mcrosier, rengolin
Reviewed By: rengolin
Subscribers: grimar, davide, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35568
llvm-svn: 309494
This patch is in two parts:
1 - Replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with
SelectionDAG::GetDemandedBits to more aggressively determine the lower bits
used by BT.
2 - Update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the
demanded bits are only in the non-extended portion, peek through, demand the
bits from the source value, and then ANY_EXTEND the result if a match was
found.
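A rough sketch of the part-2 condition, using plain integers rather than the
SelectionDAG APIs (illustrative names only):
  // The any-extend can be looked through only when every demanded bit lies
  // inside the narrower source type; nothing is demanded from the extended
  // upper portion.
  #include <cstdint>

  bool demandedBitsFitInSource(uint64_t demandedMask, unsigned srcBits) {
      uint64_t srcMask = srcBits >= 64 ? ~0ULL : ((1ULL << srcBits) - 1);
      return (demandedMask & ~srcMask) == 0;
  }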
Differential Revision: https://reviews.llvm.org/D35896
llvm-svn: 309486
Checking the encoding is insufficient, since there can now be global or
scratch instructions.
llvm-svn: 309472
Also refine the flat check to respect the flat-for-global feature; the
constant fallback should check global handling, not MUBUF specifically.
llvm-svn: 309471
llvm-svn: 309470
llvm-svn: 309447
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.
rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951
llvm-svn: 309426
The conditional tail call logic did the wrong thing when both
destinations of a conditional branch were the same:
  BB#1: derived from LLVM BB %entry
      Live Ins: %EFLAGS
      Predecessors according to CFG: BB#0
          JE_1 <BB#5>, %EFLAGS<imp-use,kill>
          JMP_1 <BB#5>
  BB#5: derived from LLVM BB %sw.epilog
      Predecessors according to CFG: BB#1
          TCRETURNdi64 <ga:@mergeable_conditional_tailcall>, 0, ...
We would fold the JE_1 to a TCRETURNdi64cc, and then remove our BB#5
successor. Then BB#5 would be deleted as it had no predecessors, leaving
a dangling "JMP_1 <BB#5>" reference behind to cause assertions later.
This patch checks that both conditional branch destinations are
different before doing the transform. The standard branch folding logic
is able to remove both the JMP_1 and the JE_1, and for my test case we
end up forming a better conditional tail call later.
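A minimal sketch of the added guard, with an illustrative type rather than
LLVM's MachineBasicBlock API:
  // Only fold the conditional branch into a conditional tail call when its
  // taken and fallthrough destinations differ; otherwise removing the
  // "redundant" successor deletes the block both edges point at.
  struct Block;

  bool canFoldIntoConditionalTailCall(const Block *TakenDest,
                                      const Block *FallthroughDest) {
      return TakenDest != FallthroughDest;
  }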
Fixes PR33980
llvm-svn: 309422
This allows handling of a lot more of the interesting
cases in Blender. Most of the large functions unlikely
to be inlined have this pattern.
This is a special case for what clang emits for OpenCL 3-element vectors.
Annoyingly, these are emitted as <3 x elt>* pointers but accessed with
<4 x elt>-wide operations.
This also needs to handle cases where a struct containing
a single vector is used.
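The layout fact this relies on can be shown with clang's vector extension
(an illustration, not code from the patch):
  // A 3-element vector occupies the storage of a 4-element vector, so the
  // <4 x elt>-wide accesses emitted for OpenCL float3 stay within the
  // object's allocated size.
  typedef float float3 __attribute__((ext_vector_type(3)));
  typedef float float4 __attribute__((ext_vector_type(4)));

  static_assert(sizeof(float3) == sizeof(float4),
                "float3 is padded out to the size of a float4");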
llvm-svn: 309419
It is better to return arguments directly in registers if we are making a
call, rather than introducing expensive stack usage. In a sample compile of
one of Blender's many kernel variants, this fires on roughly 20 different
functions. A future improvement may be to recognize simple cases where the
pointer is indexing a small array. This also fails when the store to the out
argument is in a separate block from the return, which happens in a few of
the Blender functions. This should probably also be using MemorySSA, which
might help with that.
I'm not sure this is correct as a FunctionPass, but
MemoryDependenceAnalysis seems to not work with a ModulePass.
I'm also not sure where it should run. I think it should run before
DeadArgumentElimination, so maybe either EP_CGSCCOptimizerLate or
EP_ScalarOptimizerLate.
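The shape being targeted looks roughly like this (a hypothetical example of
the pattern, not Blender code):
  // The callee returns its result through an "out" pointer, forcing the
  // caller to materialize a stack slot; returning the value directly would
  // keep it in registers.
  struct Vec4 { float x, y, z, w; };

  static void compute(Vec4 *out, float t) {
      out->x = t;
      out->y = t * 2.0f;
      out->z = 0.0f;
      out->w = 1.0f;
  }

  Vec4 caller(float t) {
      Vec4 tmp;          // stack temporary only needed to receive the result
      compute(&tmp, t);
      return tmp;
  }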
llvm-svn: 309416
Eventually we may want to allow a pair of GPRs but absolutely nothing in the
entire world is ready for that yet.
llvm-svn: 309404
We need to pass something to functions for this to work.
It isn't derivable just from the kernarg segment pointer
because the implicit arguments are placed after the
kernel arguments.
Also fixes a missing test for the intrinsic.
llvm-svn: 309398
This patch enables choosing how the thread local storage pointer is
accessed (like '-mtp' in gcc).
Differential Revision: https://reviews.llvm.org/D34408
llvm-svn: 309381
llvm-svn: 309375
llvm-svn: 309374
The ARM Runtime ABI document (IHI0043) defines the AEABI floating-point
helper functions in section 4.1.2, "The floating-point helper functions".
The functions listed in this section must always use the base AAPCS calling
convention.
This test generates calls to all the helper functions that LLVM supports
and checks that the base AAPCS calling convention has been used. We test
the equivalent of -mfloat-abi=soft, -mfloat-abi=softfp and -mfloat-abi=hard
with an FPU that supports single and double precision, and one that only
supports double precision.
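For example (illustrative source, not one of the new test inputs), with
-mfloat-abi=soft a plain double addition is lowered to a call to the
__aeabi_dadd helper, which must receive its operands per the base AAPCS:
  // Built for ARM with -mfloat-abi=soft, this addition becomes a libcall to
  // __aeabi_dadd; the double operands travel in r0-r3 rather than in VFP
  // registers.
  double add_doubles(double a, double b) {
      return a + b;
  }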
Differential Revision: https://reviews.llvm.org/D35904
llvm-svn: 309371
The code assumed that unclobbered/unspilled callee saved registers are
unused in the function. This is not true for callee saved registers that are
also used to pass parameters such as swiftself.
rdar://33401922
llvm-svn: 309350
The X86 tail call eligibility logic was correct when it was written, but
the addition of inalloca and argument copy elision broke its
assumptions. It was assuming that fixed stack objects were immutable.
Currently, we aim to emit a tail call if no arguments have to be
re-arranged in memory. This code would trace the outgoing argument
values back to check if they are loads from an incoming stack object.
If the stack argument is immutable, then we won't need to store it back
to the stack when we tail call.
Fortunately, stack objects track their mutability, so we can just make
the obvious check to fix the bug.
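Sketched in isolation (illustrative field names, not the MachineFrameInfo
API), the corrected condition amounts to:
  // An outgoing argument is already in place for a tail call only if it is a
  // load from a fixed incoming stack object that is also immutable; objects
  // created for inalloca or argument copy elision are fixed but mutable.
  struct IncomingStackObject { bool IsFixed; bool IsImmutable; };

  bool argumentAlreadyInPlace(const IncomingStackObject &Obj) {
      return Obj.IsFixed && Obj.IsImmutable;
  }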
This was http://crbug.com/749826
llvm-svn: 309343
The (seldom-used) TBI-aware optimization had a typo lying dormant since it
was first introduced in r252573: when asking for demanded bits, it told TLI
that it was running after legalization, when the opposite was true.
This is an important piece of information that the demanded-bits analysis
uses to make assumptions about the node. r301019 added such an assumption,
which was broken by the TBI combine.
Instead, pass the correct flags to TLO.
llvm-svn: 309323
llvm-svn: 309315
Improve DAGTypeLegalizer::convertMask's isSETCCorConvertedSETCC assertion to properly check for any mixture of SETCC or BUILD_VECTOR of constants, or a logical mask op of them.
llvm-svn: 309302
Differential Revision: https://reviews.llvm.org/D35839
llvm-svn: 309298
Differential Revision: https://reviews.llvm.org/D35834
llvm-svn: 309269
llvm-svn: 309266
Differential Revision: https://reviews.llvm.org/D35886
llvm-svn: 309262
In optimizeCompareInstr, a compare instruction is eliminated by using a
record-form instruction if possible.
If the branch instruction that uses the result of the compare has a static
branch hint, the optimization does not happen.
This patch makes the optimization happen regardless of the branch hint by
splitting the branch hint from the branch condition before checking the
predicate to identify the possible optimizations.
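Conceptually (hypothetical encoding, not the actual PPC::Predicate values),
the predicate comparison now looks only at the condition part:
  // Treat a predicate as condition bits plus optional static-hint bits and
  // compare by condition only, so a branch hint no longer blocks folding the
  // compare into a record-form instruction.
  struct Predicate { unsigned Condition; unsigned Hint; };

  bool sameCondition(const Predicate &A, const Predicate &B) {
      return A.Condition == B.Condition;  // hints are intentionally ignored
  }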
Differential Revision: https://reviews.llvm.org/D35801
llvm-svn: 309255
Currently SI_IF results in an s_and_saveexec_b64 followed by an s_xor_b64.
The xor is used to extract only the changed bits. In the case of a simple
if region where the only use of that value is in the SI_END_CF that
restores the old exec mask, we can omit the xor and perform an or of
the exec mask with the original exec value saved by the
s_and_saveexec_b64.
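As a plain-integer sketch of the mask algebra (not GCN assembly): because the
masked exec is a subset of the saved exec, or-ing the saved value back in
restores exactly what the xor-based sequence did.
  #include <cstdint>
  #include <cassert>

  void simple_if(uint64_t &exec, uint64_t cond) {
      const uint64_t saved = exec;   // s_and_saveexec_b64: save old exec...
      exec &= cond;                  // ...and mask it with the condition

      // ... lanes enabled in 'exec' run the 'then' block here ...

      const uint64_t via_xor = exec | (saved ^ exec);  // old SI_END_CF input
      const uint64_t via_or  = exec | saved;           // new form, xor omitted
      assert(via_xor == via_or && via_or == saved);    // both restore 'saved'
      exec = via_or;                 // s_or_b64 exec, exec, <saved>
  }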
Differential Revision: https://reviews.llvm.org/D35861
llvm-svn: 309185
Differential Revision: http://reviews.llvm.org/D35146
llvm-svn: 309178
These are not actually uniform values except in kernels.
llvm-svn: 309172
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D35127
llvm-svn: 309165
throughput can't be calculated.
Differential Revision: https://reviews.llvm.org/D35831
llvm-svn: 309156
llvm-svn: 309139
llvm-svn: 309138
llvm-svn: 309137
llvm-svn: 309136
llvm-svn: 309124
Summary:
Adding support for combining power-of-2-strided build_vectors where the
first build_vector operand is extracted from a non-zero index.
Example:
  v4i32 build_vector((extract_elt V, 1),
                     (extract_elt V, 3),
                     (extract_elt V, 5),
                     (extract_elt V, 7))
  -->
  v4i32 truncate (bitcast (shuffle<1,u,3,u,5,u,7,u> V, u) to v4i64)
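A standalone sketch of the index pattern being matched (an illustrative
helper, not the DAGCombiner code):
  // The extract indices must form first + i*stride for a power-of-two
  // stride; with this change 'first' may be non-zero, so 1,3,5,7 matches
  // with first = 1 and stride = 2.
  #include <cstddef>
  #include <vector>

  bool isPow2StridedExtract(const std::vector<unsigned> &Indices) {
      if (Indices.size() < 2)
          return false;
      const unsigned Stride = Indices[1] - Indices[0];
      if (Stride == 0 || (Stride & (Stride - 1)) != 0)
          return false;
      for (std::size_t I = 2; I < Indices.size(); ++I)
          if (Indices[I] != Indices[0] + I * Stride)
              return false;
      return true;
  }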
Reviewers: delena, RKSimon, guyblank
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35700
llvm-svn: 309108
Test on 32/64 bit targets where appropriate
llvm-svn: 309107
llvm-svn: 309104
llvm-svn: 309103
llvm-svn: 309102
A G_GLOBAL_VALUE is basically a pointer, so it should live in the GPR.
llvm-svn: 309101
Cleaned up triple settings, added 32-bit/64-bit targets where useful, added broadcast comments
llvm-svn: 309100
llvm-svn: 309099
llvm-svn: 309098
Removed unused KNL checks and triple settings, added broadcast comments
llvm-svn: 309097
Tidied up triples and checks.
llvm-svn: 309095
llvm-svn: 309093
llvm-svn: 309090