llvm-svn: 295873
This reverts commit r295867.
llvm-svn: 295871
Differential Revision: http://reviews.llvm.org/D30232
llvm-svn: 295867
Summary:
Extend AArch64RedundantCopyElimination to catch cases where the register
that is known to be zero is COPY'd in the predecessor block. Before
this change, this pass would catch cases like:

  CBZW %W0, <BB#1>
  BB#1:
  %W0 = COPY %WZR // removed

After this change, cases like the one below are also caught:

  %W0 = COPY %W1
  CBZW %W1, <BB#1>
  BB#1:
  %W0 = COPY %WZR // removed

This change results in a 4% increase in static copies removed by this
pass when compiling the llvm test-suite. It also fixes regressions
caused by doing post-RA copy propagation (a separate change to be put up
for review shortly).

Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D30113

llvm-svn: 295863
LLVM CodeGen emits references to external symbols that are never declared at
the LLVM IR level, so they have no declared signature. However, WebAssembly
requires that all functions be declared with signatures. This patch adds a
table providing signatures for known runtime libcalls; subsequent patches
will use it to emit declarations for such functions.

llvm-svn: 295857
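
A minimal sketch of what such a libcall-signature table could look like, in
C++ (the type and function names here are invented for illustration and are
not the actual table added by the patch):

    #include <map>
    #include <string>
    #include <vector>

    enum class WasmType { I32, I64, F32, F64 };

    struct WasmSignature {
      std::vector<WasmType> Results; // return types
      std::vector<WasmType> Params;  // parameter types
    };

    // Maps a runtime-libcall symbol name to its WebAssembly signature.
    const std::map<std::string, WasmSignature> &getLibcallSignatures() {
      static const std::map<std::string, WasmSignature> Table = {
          {"__addsf3", {{WasmType::F32}, {WasmType::F32, WasmType::F32}}},
          {"__adddf3", {{WasmType::F64}, {WasmType::F64, WasmType::F64}}},
          {"__fixdfsi", {{WasmType::I32}, {WasmType::F64}}},
      };
      return Table;
    }

A consumer would look up the symbol name emitted by CodeGen and, on a hit,
emit a function declaration with the recorded signature.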
llvm-svn: 295856
llvm-svn: 295855
llvm-svn: 295850
into masks.

Minor optimization: don't create temporary mask APInts that are just going
to be OR'd into the accumulated masks; insert the bits directly instead.
llvm-svn: 295848
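
A minimal sketch of the pattern being avoided, assuming LLVM's APInt (the
variable and function names are hypothetical):

    #include "llvm/ADT/APInt.h"
    using llvm::APInt;

    void accumulateRange(APInt &Mask, unsigned Lo, unsigned Hi) {
      // Before: materialize a temporary mask, then OR it in.
      //   APInt Tmp = APInt::getBitsSet(Mask.getBitWidth(), Lo, Hi);
      //   Mask |= Tmp;
      // After: set the bits on the accumulator directly, no temporary.
      for (unsigned I = Lo; I != Hi; ++I)
        Mask.setBit(I);
    }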
separately. NFCI.
llvm-svn: 295845
checking
llvm-svn: 295830
llvm-svn: 295827
Summary: Initial implementation of X86InstructionSelector. Handles selection
of COPY and G_ADD/G_SUB with GPR, GPR operands.

Reviewers: qcolombet, rovka, zvi, ab

Reviewed By: rovka

Subscribers: mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29816

llvm-svn: 295824
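
An illustrative sketch of the shape of such a selector hook (simplified,
LLVM-internal code; the helper shown is hypothetical, and the real
implementation also checks register banks and operand sizes before picking
an opcode):

    // Sketch only: map generic opcodes to target instructions in place.
    bool selectImpl(llvm::MachineInstr &I, const llvm::TargetInstrInfo &TII) {
      switch (I.getOpcode()) {
      case llvm::TargetOpcode::G_ADD:
        I.setDesc(TII.get(llvm::X86::ADD32rr)); // then constrain reg classes
        return true;
      case llvm::TargetOpcode::G_SUB:
        I.setDesc(TII.get(llvm::X86::SUB32rr));
        return true;
      default:
        return false; // not handled here
      }
    }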
The pass tries to fix a spill of LR that turns out to be unnecessary, so it
removes the tPOP but forgets to remove the corresponding tPUSH. This causes
the stack to be misaligned upon returning from the function. Thus, remove
the tPUSH as well in this case.
Differential Revision: https://reviews.llvm.org/D30207
llvm-svn: 295816
Change integer memory operands to FP memory operands for some FP
instructions.
Differential Revision: https://reviews.llvm.org/D30201
llvm-svn: 295813
This patch adds missing sched classes for Thumb2 instructions.
These have been missing so far, and as a consequence, machine
scheduler models for individual sub-targets have tended to be
larger than they needed to be. This should make it easier and
faster to write schedulers for ARM sub-targets in the future.
Reviewer: Diana Picus
Differential Revision: https://reviews.llvm.org/D29953
llvm-svn: 295811
when available
This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The
legacy intrinsics now lower to this node, as do the AVX-512 masked
intrinsics when the rounding mode is CUR_DIRECTION.

I've merged a copy of the tablegen multiclass avx512_fp_scalar into
avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION
appearing as a rounding mode for X86ISD::FADD_ROUND and others.
Differential revision: https://reviews.llvm.org/D30186
llvm-svn: 295810
This just adds the basic skeleton for supporting a new object file format.
All of the actual encoding will be implemented in followup patches.
Differential Revision: https://reviews.llvm.org/D26722
llvm-svn: 295803
Convert llvm.SI.packf16 test uses
llvm-svn: 295797
llvm-svn: 295789
Change the implementation to use max instead of add. min/max/med3 do not
flush denormals regardless of the mode, so it is OK to use them whether or
not denormals are enabled. Also allow using clamp with f16, and use
knowledge of dx10_clamp.
llvm-svn: 295788
values.

The original code only used vector loads/stores for explicit vector
arguments. It could also do more loads/stores than necessary (e.g. v5f32
would touch 8 f32 values). Aggregate types were loaded one element at a
time, even the vectors contained within.

This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling the thrust test
suite and manually checking the differences between the PTX generated by
llvm with and without the patch.

General algorithm:
* ComputePTXValueVTs() flattens an input/output argument into a flat list
  of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create a vectorization plan,
  returned as an array of flags marking the boundaries of the vectorized
  loads/stores; scalars are represented as 1-element vectors (see the
  sketch after this entry).
* The code that generates loads/stores implements a simple state machine
  that constructs a vector according to the plan.

Differential Revision: https://reviews.llvm.org/D30011

llvm-svn: 295784
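
A self-contained sketch of the kind of plan VectorizePTXValueVTs() produces
(hypothetical helper; the real code also takes types, alignment, and legal
vector widths into account):

    #include <cstdint>
    #include <vector>

    // Flags[i] == true marks scalar i as the first lane of a new vectorized
    // load/store; a run of 'false' entries extends the current vector, and
    // scalars end up as 1-element vectors.
    std::vector<bool> buildVectorPlan(const std::vector<uint64_t> &Offsets,
                                      uint64_t EltBytes, unsigned MaxLanes) {
      std::vector<bool> Flags(Offsets.size(), false);
      unsigned Lanes = 0;
      for (std::size_t I = 0; I < Offsets.size(); ++I) {
        // Extend the current vector only while elements stay contiguous and
        // the vector still has room for another lane.
        bool Extends = I > 0 && Lanes < MaxLanes &&
                       Offsets[I] == Offsets[I - 1] + EltBytes;
        if (!Extends) {
          Flags[I] = true; // start a new (possibly 1-element) vector
          Lanes = 0;
        }
        ++Lanes;
      }
      return Flags;
    }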
llvm-svn: 295783
llvm-svn: 295777
If both instructions are wildcards, the result can be a crash.
llvm-svn: 295776
The address of an alias of a global with an offset is incorrectly lowered
as the address of the global (i.e., the offset is ignored).
llvm-svn: 295762
llvm-svn: 295754
Before frame offsets are calculated, try to eliminate the frame indexes
used by SGPR spills. Then we can delete them afterwards.

I think for now we can be sure that no other instruction will be re-using
the same frame indexes. It should be easy to notice if this assumption ever
breaks, since everything asserts if it tries to use a dead frame index
later.

The unused emergency stack slot seems to still be left behind, so an
additional 4 bytes are still wasted.
llvm-svn: 295753
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64. The new code is more general, in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider
(see the sketch after this entry).

This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.

The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact that
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used. One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set, to avoid an infinite loop of hoisting and sinking
the same 'and'.

This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, mcrosier, sebpop, llvm-commits

Differential Revision: https://reviews.llvm.org/D28813
llvm-svn: 295746
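
A minimal sketch of the user scan described above (hypothetical helper; the
real pass combines this with a target hook that decides which masks are
worth sinking):

    #include "llvm/IR/Constants.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // True if every user of AndI is an equality icmp against zero, i.e. the
    // 'and' only feeds (icmp eq/ne X, 0) style conditions.
    static bool allUsersAreZeroCompares(const Instruction &AndI) {
      for (const User *U : AndI.users()) {
        const auto *Cmp = dyn_cast<ICmpInst>(U);
        if (!Cmp || !Cmp->isEquality())
          return false;
        const auto *C = dyn_cast<ConstantInt>(Cmp->getOperand(1));
        if (!C || !C->isZero())
          return false;
      }
      return true;
    }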
The reference appears never to have been updated.
llvm-svn: 295739
As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ.
llvm-svn: 295733
PC isn't allowed in the source operand of t2MOVr, so change the register
class to one without PC. SP handling is slightly trickier and changes
depending on whether we're in ARMv8, so do that in
checkTargetMatchPredicate.
Differential Revision: https://reviews.llvm.org/D30199
llvm-svn: 295732
This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns.
llvm-svn: 295723
Differential Revision: https://reviews.llvm.org/D30189
llvm-svn: 295718
For now, we hardcode a BLX instruction, and generate an ADJCALLSTACKDOWN/UP pair
with amount 0.
llvm-svn: 295716
on Sandy Bridge and later Intel CPUs
Summary:
Sandy Bridge and later CPUs have better throughput when using SHLD to
implement a rotate than with the normal rotate instructions. Additionally,
it saves one uop and avoids a partial flag update dependency.

This patch implements this change on any Sandy Bridge or later processor
without BMI2 instructions. With BMI2 we will use RORX as we currently do.
Reviewers: zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30181
llvm-svn: 295697
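
A small C++ model (not the backend code) of why SHLD can stand in for a
rotate: SHLD dst, src, c computes (dst << c) | (src >> (32 - c)), so with
dst == src it is exactly a left rotate by c:

    #include <cstdint>

    // C++ model of SHLD r32, r32, imm8.
    uint32_t shld(uint32_t Dst, uint32_t Src, unsigned C) {
      C %= 32;
      return C == 0 ? Dst : (Dst << C) | (Src >> (32 - C));
    }

    // rotl(x, c) == shld(x, x, c): both halves come from the same register.
    uint32_t rotl(uint32_t X, unsigned C) { return shld(X, X, C); }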
llvm-svn: 295695
in some patterns.
llvm-svn: 295693
llvm-svn: 295691
I will add one more use for this in a later change.
llvm-svn: 295685
broadcast loads where the passthru operand is not operand 0.
llvm-svn: 295673
Pull out the repeated code for the extraction index operand and the source
vector value type.

Use the isNullConstant helper to check for a zero extraction index.
llvm-svn: 295670
There used to be a check in the IRTranslator that prevented us from having to
deal with atomic loads/stores. That check has been removed in r294993 and the
AArch64 backend was updated accordingly. This commit does the same thing for the
ARM backend.
In general, in the ARM backend we introduce fences during the atomic expand
pass, so we don't have to worry about atomics, *except* for the 32-bit ARMv8
target, which handles atomics more like AArch64. Since we don't want to worry
about that yet, just bail out of instruction selection if we find any atomic
loads.
llvm-svn: 295662
It's more profitable to go through memory (1 cycle throughput) than to use
a VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to implement
EXTRACT_VECTOR_ELT with a variable index.

The IACA tool was used to get a performance estimate
(https://software.intel.com/en-us/articles/intel-architecture-code-analyzer).
For example, for the var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from
vector-shuffle-variable-128.ll I get 26 cycles vs. 79 cycles.

The VINSERT node is also removed, since we don't need it any more.

Differential Revision: https://reviews.llvm.org/D29690
llvm-svn: 295660
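
A rough C++ model of the "go through memory" lowering (illustrative only;
the actual lowering emits a vector store to a stack slot followed by a
scalar load from it):

    #include <cstdint>
    #include <cstring>

    // Extract a byte from a 16-byte vector at a runtime index by spilling
    // the vector to a stack temporary and loading one element back.
    int8_t extractVariable(const int8_t (&Vec)[16], unsigned Idx) {
      alignas(16) int8_t Slot[16];          // stack "spill slot"
      std::memcpy(Slot, Vec, sizeof(Slot)); // vector store
      return Slot[Idx & 15];                // scalar load at variable index
    }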
Use v8i64 ASHR instructions if we don't have VLX.
Differential Revision: https://reviews.llvm.org/D28537
llvm-svn: 295656
Use tablegen to autogenerate isBranchtarget helper functions. This is a cleanup
that removes almost identical functions that differ only in a few constants.
Differential Revision: https://reviews.llvm.org/D30160
llvm-svn: 295649
all AVX instructions with the new value.

Add the WIG value to all AVX instructions that ignore the W-bit in their
encoding, instead of giving them the default value of 0.

This patch is needed for follow-up work on the EVEX2VEX pass (replacing
EVEX-encoded instructions with their corresponding VEX versions when
possible).
Differential Revision: https://reviews.llvm.org/D29876
llvm-svn: 295643
passthru isn't operand 0.
llvm-svn: 295640
patterns.
llvm-svn: 295638
that aren't in operand 2.
llvm-svn: 295634