| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
llvm-svn: 296060
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables support for .f16x2 operations.
Added new register type Float16x2.
Added support for .f16x2 instructions.
Added handling of vectorized loads/stores of v2f16 values.
Differential Revision: https://reviews.llvm.org/D30057
Differential Revision: https://reviews.llvm.org/D30310
llvm-svn: 296032
|
| |
|
|
|
|
|
|
|
|
|
| |
FastISel wasn't checking the isFPOnlySP subtarget feature before emitting
double-precision operations, so it got completely invalid CodeGen for doubles
on Cortex-M4F.
The normal ISel testing wasn't spectacular either so I added a second RUN line
to improve that while I was in the area.
llvm-svn: 296031
|
| |
|
|
| |
llvm-svn: 296026
|
| |
|
|
|
|
| |
The TLS slot did not exist back then.
llvm-svn: 296014
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Having more fine-grained information on the specific construct that
caused us to fallback is valuable for large-scale data collection.
We still have the fallback warning, that's also used for FastISel.
We still need to remove the fallback warning, and teach FastISel to also
emit remarks (it currently has a combination of the warning, stats, and
debug prints: the remarks could unify all three).
The abort-on-fallback path could also be better handled using remarks:
one could imagine a "-Rpass-error", analoguous to "-Werror", which would
promote missed/failed remarks to errors. It's not clear whether that
would be useful for other remarks though, so we're not there yet.
llvm-svn: 296013
|
| |
|
|
|
|
|
|
|
|
| |
If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.
Differential Revision: https://reviews.llvm.org/D29835
llvm-svn: 296009
|
| |
|
|
| |
llvm-svn: 295997
|
| |
|
|
|
|
|
|
| |
Hit on ASICs that support 16bit instructions.
Differential Revision: https://reviews.llvm.org/D30281
llvm-svn: 295990
|
| |
|
|
| |
llvm-svn: 295981
|
| |
|
|
|
|
|
|
| |
Introduce a common ValueHandler for call returns and formal arguments, and
inherit two different versions for handling the differences (at the moment the
only difference is the way physical registers are marked as used).
llvm-svn: 295973
|
| |
|
|
|
|
|
|
| |
Add support for lowering calls with parameters than can fit into regs. Use the
same ValueHandler that we used for function returns, but rename it to match its
new, extended purpose.
llvm-svn: 295971
|
| |
|
|
|
|
|
|
| |
class of their first input when creating node in fast-isel.
(Quick fix to buildbot failure after rL295940 commit).
llvm-svn: 295970
|
| |
|
|
|
|
|
|
|
|
| |
The ARMConstantIslandPass didn't have support for handling accesses to
constant island objects through ARM::t2LDRBpci instructions. This adds
support for that.
This fixes PR31997.
llvm-svn: 295964
|
| |
|
|
|
|
|
|
|
|
| |
between VEX/EVEX versions.
AVX versions of the converts work on f32/f64 types, while AVX512 version work on vectors.
Differential Revision: https://reviews.llvm.org/D29988
llvm-svn: 295940
|
| |
|
|
|
|
|
| |
This is the pattern that falls out of the instruction's
definition if offset == 0.
llvm-svn: 295912
|
| |
|
|
| |
llvm-svn: 295908
|
| |
|
|
|
|
|
|
|
|
|
| |
The manual is unclear on the details of this. It's not
clear to me if denormals are not allowed with clamp,
or if that is only omod. Not allowing denorms for
fp16 or fp64 isn't useful so I also question if that
is really a restriction. Same with whether this is valid
without IEEE mode enabled.
llvm-svn: 295905
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D30232
llvm-svn: 295904
|
| |
|
|
| |
llvm-svn: 295899
|
| |
|
|
|
|
| |
Fixes the build.
llvm-svn: 295895
|
| |
|
|
| |
llvm-svn: 295892
|
| |
|
|
|
|
|
|
|
| |
This should avoid reporting any stack needs to be allocated in the
case where no stack is truly used. An unused stack slot is still
left around in other cases where there are real stack objects
but no spilling occurs.
llvm-svn: 295891
|
| |
|
|
|
|
| |
Patch by Harsha Jagasia.
llvm-svn: 295879
|
| |
|
|
|
|
| |
Fixes not adjusting using new intrinsics with chains.
llvm-svn: 295878
|
| |
|
|
|
|
|
|
|
| |
This allows us to ensure that 0 is never a valid pointer
to a user object, and ensures that the offset is always legal
without needing a register to access it. This comes at the cost
of usable offsets and wasted stack space.
llvm-svn: 295877
|
| |
|
|
| |
llvm-svn: 295873
|
| |
|
|
|
|
| |
This reverts commit r295867.
llvm-svn: 295871
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D30232
llvm-svn: 295867
|
| |
|
|
| |
llvm-svn: 295864
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Extend AArch64RedundantCopyElimination to catch cases where the register
that is known to be zero is COPY'd in the predecessor block. Before
this change, this pass would catch cases like:
CBZW %W0, <BB#1>
BB#1:
%W0 = COPY %WZR // removed
After this change, cases like the one below are also caught:
%W0 = COPY %W1
CBZW %W1, <BB#1>
BB#1:
%W0 = COPY %WZR // removed
This change results in a 4% increase in static copies removed by this
pass when compiling the llvm test-suite. It also fixes regressions
caused by doing post-RA copy propagation (a separate change to be put up
for review shortly).
Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D30113
llvm-svn: 295863
|
| |
|
|
|
|
|
| |
llc mir output goes to stdout nowadays, so the 2>&1 is not necessary
anymore for most tests.
llvm-svn: 295859
|
| |
|
|
| |
llvm-svn: 295850
|
| |
|
|
|
|
|
|
|
|
|
| |
r295336 causes a bootstrapped clang to fail for many compilations on
powerpc BE. See
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315
for example.
Reverting as per the developer's request.
llvm-svn: 295849
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Initial implementation for X86InstructionSelector. Handle selection COPY and G_ADD/G_SUB gpr, gpr .
Reviewers: qcolombet, rovka, zvi, ab
Reviewed By: rovka
Subscribers: mgorny, dberris, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D29816
llvm-svn: 295824
|
| |
|
|
| |
llvm-svn: 295819
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The pass tries to fix a spill of LR that turns out to be unnecessary.
So it removes the tPOP but forgets to remove tPUSH.
This causes the stack be misaligned upon returning the function.
Thus, remove the tPUSH as well in this case.
Differential Revision: https://reviews.llvm.org/D30207
llvm-svn: 295816
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds missing sched classes for Thumb2 instructions.
This has been missing so far, and as a consequence, machine
scheduler models for individual sub-targets have tended to
be larger than they needed to be. These patches should help
write schedulers better and faster in the future
for ARM sub-targets.
Reviewer: Diana Picus
Differential Revision: https://reviews.llvm.org/D29953
llvm-svn: 295811
|
| |
|
|
|
|
|
|
|
|
|
|
| |
when available
This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION.
I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others.
Differential revision: https://reviews.llvm.org/D30186
llvm-svn: 295810
|
| |
|
|
|
|
| |
Convert llvm.SI.packf16 test uses
llvm-svn: 295797
|
| |
|
|
|
|
| |
Merge some of the old, smaller tests into more complete versions.
llvm-svn: 295792
|
| |
|
|
| |
llvm-svn: 295789
|
| |
|
|
|
|
|
|
|
|
|
| |
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.
Also allow using clamp with f16, and use knowledge
of dx10_clamp.
llvm-svn: 295788
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
values.
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.
This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling thrust
test suite and manually checking the differences between PTX
generated by llvm with and without the patch.
General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list
of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan
which returns an array of flags marking boundaries of vectorized
load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
that constructs a vector according to the plan.
Differential Revision: https://reviews.llvm.org/D30011
llvm-svn: 295784
|
| |
|
|
|
|
|
| |
Add test case from https://reviews.llvm.org/D28698 that was somehow lost in
transit.
llvm-svn: 295775
|
| |
|
|
|
|
|
| |
Add test case from https://reviews.llvm.org/D28491 that was somehow lost in
transit.
llvm-svn: 295774
|
| |
|
|
|
|
| |
Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset).
llvm-svn: 295762
|
| |
|
|
| |
llvm-svn: 295757
|
| |
|
|
| |
llvm-svn: 295755
|
| |
|
|
| |
llvm-svn: 295754
|