| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.
llvm-svn: 293915
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds support for BEQL and BNEL macros with immediate operands.
Patch by: Srdjan Obucina
Reviewers: dsanders, zoran.jovanovic, vkalintiris, sdardis, obucina, seanbruno
Differential Revision: https://reviews.llvm.org/D17040
llvm-svn: 293905
|
|
|
|
|
|
|
| |
(Copied from the fp-conv-10.ll test to SystemZISelLowering.cpp)
Review: Ulrich Weigand
llvm-svn: 293900
|
|
|
|
|
|
| |
Patch by Colin LeMahieu.
llvm-svn: 293899
|
|
|
|
| |
llvm-svn: 293894
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled.
Recommiting after fixing X86 inc/dec chain bug.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293893
|
|
|
|
|
|
|
|
| |
Merging Load-add-store pattern into a increment op previously dropped
the load's chain from the instructions dependence if the store is
chained to a TokenFactor.
llvm-svn: 293892
|
|
|
|
|
|
|
|
|
| |
It is important to change the ArgInfo's type from pointer to integer, otherwise
the CC assign function won't know what to do. Instead of hacking it up, we use
ComputeValueVTs and introduce some of the helpers that we will need later on for
lowering more complex types.
llvm-svn: 293889
|
|
|
|
|
|
|
|
| |
Allow unknown types in TLI.getValueType, otherwise we get asserts for certain
types that we do not support yet (instead of returning that we don't support
them and falling through the normal error path).
llvm-svn: 293888
|
|
|
|
|
|
|
| |
Make it legal to load pointer values. Also check that pointers are assigned
to the GPR reg bank by default.
llvm-svn: 293886
|
|
|
|
|
|
|
|
|
|
| |
This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain).
So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc.
Differential Revision: https://reviews.llvm.org/D28810
llvm-svn: 293880
|
|
|
|
| |
llvm-svn: 293873
|
|
|
|
|
|
|
|
| |
This moves creation of SUBV_BROADCAST and merging of adjacent loads that are being inserted together.
This is a step towards removing legalizing of INSERT_SUBVECTOR except for vXi1 cases.
llvm-svn: 293872
|
|
|
|
| |
llvm-svn: 293862
|
|
|
|
|
|
|
|
|
|
|
| |
The operand types were defined to fit the fp16_to_fp node, which
has the half as an integer type. v_cvt_f32_f16 does support
source modifiers, so change this to have an FP type and modifiers.
For targets without legal f16, this requires recognizing the
bit operations and trying to produce them.
llvm-svn: 293857
|
|
|
|
|
|
|
| |
After marking a 32bit register and all its super registers the 64bit
register does not need to be marked again.
llvm-svn: 293855
|
|
|
|
| |
llvm-svn: 293851
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D28689
llvm-svn: 293844
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers has to be adjusted for bigger workgroups.
For example a workgroup of size 256 already occupies 4 waves just by
itself. Given that all numbers of LDS use in the compiler are per
workgroup, occupancy shall be multiplied by 4 in this case. Each 64
workitems still limited by the same number, but 4 subrgoups 64 workitems
each can afford 4 times more LDS to get the same occupancy.
In addition change initializes LDS size in the subtarget to a real value
for SI+ targets. This is required since LDS size is a variable in these
calculations.
Differential Revision: https://reviews.llvm.org/D29423
llvm-svn: 293837
|
|
|
|
|
|
| |
other minor fixes (NFC).
llvm-svn: 293836
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: LTO requires the debug-info-for-profiling to be a function attribute.
Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl
Reviewed By: mehdi_amini, dblaikie, aprantl
Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D29203
llvm-svn: 293833
|
|
|
|
| |
llvm-svn: 293809
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GAS assembler supports the ".set bopt" directive but according
to the sources it doesn't do anything. It's supposed to optimize
branches by filling the delay slot of a branch with it's target.
This patch teaches the MIPS asm parser to accept both and warn in
the case of 'bopt' that the bopt directive is unsupported.
This resolves PR/31841.
Thanks to Sean Bruno for reporting the issue!
llvm-svn: 293798
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch moves some helper functions related to interleaved access
vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like
to use these functions in a follow-on patch that improves interleaved load and
store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was
already duplicated there and has been removed.
Differential Revision: https://reviews.llvm.org/D29398
llvm-svn: 293788
|
|
|
|
| |
llvm-svn: 293777
|
|
|
|
|
|
|
|
|
|
|
|
| |
These were simply preserving the flags of the original operation,
which was too conservative in most cases and incorrect for mul.
nsw/nuw may be needed for some combines to cleanup messes when
intermediate sext_inregs are introduced later.
Tested valid combinations with alive.
llvm-svn: 293776
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DebugInfoDWARFTests is the only user so far which initializes the
MCObjectStreamer without initializing the ASMParser. The MIPS backend
relies on the ASMParser to initialize the MipsABIInfo object and to
update the target streamer with it. This should turn the mips buildbots
green.
Reviewers: atanasyan, zoran.jovanovic
Differential Revision: https://reviews.llvm.org/D28025
llvm-svn: 293772
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The the following instructions:
- LD/LWZ (expanded from sjLj pseudo-instructions)
- LXVL/LXVLL vector loads
- STXVL/STXVLL vector stores
all require G8RC_NO0X class registers for RA.
Differential Revision: https://reviews.llvm.org/D29289
Committed for Lei Huang
llvm-svn: 293769
|
|
|
|
|
|
| |
These are identical apart from the extra SSE41 guard for PINSRB.
llvm-svn: 293766
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add both cores to the target parser and TableGen. Test that eabi
attributes are set correctly for both cores. Additionally, test the
absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively.
Committed on behalf of Sanne Wouda.
Reviewers : rengolin, olista01.
Differential Revision: https://reviews.llvm.org/D29073
llvm-svn: 293761
|
|
|
|
| |
llvm-svn: 293744
|
|
|
|
|
|
|
|
|
|
| |
loads/stores of integer types.
For SSE we use fp because of the smaller encoding, but that doesn't apply to AVX. So just do the natural thing so we don't have to explain why we aren't. We can't do this for 256-bit loads/stores since integer loads and stores aren't available in AVX1 so we need fallback patterns since the integer types are legal.
This doesn't affect any tests because execution domain fixing freely converts the instructions anyway. Honestly, we could probably rely on it for the SSE size optimization too.
llvm-svn: 293743
|
|
|
|
|
|
|
|
|
| |
This feature enables the fusion of such operations on Cortex A57, as
recommended in its Software Optimisation Guide, sections 4.14 and 4.15.
Differential revision: https://reviews.llvm.org/D28698
llvm-svn: 293739
|
|
|
|
|
|
|
|
|
|
| |
This feature enables the fusion of such operations on Cortex A57, as
recommended in its Software Optimisation Guide, section 4.13, and on Exynos
M1.
Differential revision: https://reviews.llvm.org/D28491
llvm-svn: 293738
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.
In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.
Differential revision: https://reviews.llvm.org/D28489
llvm-svn: 293737
|
|
|
|
|
|
| |
other minor fixes (NFC).
llvm-svn: 293729
|
|
|
|
|
|
| |
Use a more specific subtarget check and combine hasOneUse checks
llvm-svn: 293726
|
|
|
|
| |
llvm-svn: 293717
|
|
|
|
|
|
|
|
|
|
| |
x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as
desired.
Verified that the particular nvptx approximate instructions here do in
fact return 0 for x = 0.
llvm-svn: 293713
|
|
|
|
|
|
|
|
| |
Technically, this is actually changes the expression and the original
assert was "wrong", but since the conjunction is with true, it doesn't
matter in this case.
llvm-svn: 293709
|
|
|
|
|
|
|
|
|
|
|
| |
@ABS8 can be applied to symbols which appear as immediate operands to
instructions that have a 8-bit immediate form for that operand. It causes
the assembler to use the 8-bit form and an 8-bit relocation (e.g. R_386_8
or R_X86_64_8) for the symbol.
Differential Revision: https://reviews.llvm.org/D28688
llvm-svn: 293667
|
|
|
|
| |
llvm-svn: 293654
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Insert calls to __fentry__ at function entry.
Reviewers: hfinkel, craig.topper
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D28000
llvm-svn: 293648
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
For some reason instructions are being inserted in the wrong order with some
builds. I'm not sure why this is happening.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D29325
llvm-svn: 293639
|
|
|
|
| |
llvm-svn: 293637
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Requires class overrides the target requirements of an instruction,
rather than adding to them, so all ARM instructions need to include the
IsARM predicate when they have overwitten requirements.
This caused the swp and swpb instructions to be allowed in thumb mode
assembly, and the ARM encoding of CDP to be selected in codegen (which
is different for conditional instructions).
Differential Revision: https://reviews.llvm.org/D29283
llvm-svn: 293634
|
|
|
|
| |
llvm-svn: 293631
|
|
|
|
|
|
| |
These can appear during shuffle combining.
llvm-svn: 293628
|
|
|
|
|
|
|
|
|
|
| |
Also add the ability to recognise PINSR(Vex, 0, Idx).
Targets shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat.
The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch.
llvm-svn: 293627
|
|
|
|
|
|
|
|
|
|
|
| |
Just adds the vmr (Vector Move Register) mnemonic for the VOR instruction in
the PPC back end.
Committing on behalf of brunoalr (Bruno Rosa).
Differential Revision: https://reviews.llvm.org/D29133
llvm-svn: 293626
|