| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
vec2, idx1), idx1) -> (blendi vec2, vec1).
llvm-svn: 294112
|
|
|
|
|
|
| |
Based on similar code for INSERT_VECTOR_ELT.
llvm-svn: 294110
|
|
|
|
| |
llvm-svn: 294105
|
|
|
|
|
|
| |
Extra tests for D29399
llvm-svn: 294101
|
|
|
|
|
|
|
|
|
| |
The code missed to check implicit operands of COPY instructions for
defs/uses.
Differential Revision: https://reviews.llvm.org/D29522
llvm-svn: 294088
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was originally introduced in r278321 to work around correctness
problems in the ExecutionDepsFix pass; Probably also to keep the
performance benefits of breaking the false dependencies which of course
also affect undef operands.
ExecutionDepsFix has been improved here recently (see for example
r278321) so we should not need this exception any longer.
Differential Revision: https://reviews.llvm.org/D29525
llvm-svn: 294087
|
|
|
|
| |
llvm-svn: 294082
|
|
|
|
| |
llvm-svn: 294080
|
|
|
|
|
|
|
|
|
|
|
| |
An assert occurs when calling SlotIndexes::getInstructionIndex with
a DBG_VALUE instruction because the function expects an instruction
with a slot index. However, there is no slot index for a DBG_VALUE
instruction.
Differential Revision: https://reviews.llvm.org/D29048
llvm-svn: 294070
|
|
|
|
| |
llvm-svn: 294038
|
|
|
|
| |
llvm-svn: 294031
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This re-applies commit r292189, reverted in r292191.
SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.
Use TLI instead, removing another heap of redundant code.
This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to a few tests:
- calling a non-variadic function via a variadic prototype isn't OK;
it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter; the SDAG code accepts any integer
type, which meant using i32 on x86_64 worked.
- a handful of SystemZ tests check the SDAG support for lax prototype
checking: Ulrich agrees on removing them.
I don't think it's worth supporting any of these (IMO) invalid
testcases. Instead, fix them to be more meaningful.
llvm-svn: 294028
|
|
|
|
| |
llvm-svn: 294022
|
|
|
|
|
|
|
|
| |
into a target shuffle.
Correctly flagging upper elements as undef.
llvm-svn: 294020
|
|
|
|
|
|
| |
Make it clear these tests sign-extend the comparison result. Some patterns zero-extend to a bool result that we still need to handle.
llvm-svn: 294018
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D29477
llvm-svn: 294011
|
|
|
|
|
|
|
|
|
|
|
|
| |
ISD::DELETED_NODE && "NodeToMatch was removed partway through
selection"' failed.
NodeToMatch can be modified during matching, but code does not handle
this situation.
Differential Revision: https://reviews.llvm.org/D29292
llvm-svn: 294003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.
Reviewers: rengolin, jmolloy, rovka, olista01
Reviewed By: olista01
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D29020
llvm-svn: 294000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
llc would hit a fatal error for errors in inline assembly. The
diagnostics message is now printed.
Reviewers: rengolin, MatzeB, javed.absar, anemet
Reviewed By: anemet
Subscribers: jyknight, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D29408
llvm-svn: 293999
|
|
|
|
| |
llvm-svn: 293972
|
|
|
|
| |
llvm-svn: 293968
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D29131
llvm-svn: 293964
|
|
|
|
|
|
| |
In multi-use cases this can save a few instructions.
llvm-svn: 293962
|
|
|
|
|
|
|
|
| |
DAG combine.
On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI.
llvm-svn: 293944
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the second in the series of patches to enable adding
of machine sched-models for ARM processors easier and compact.
This patch focuses on integer instructions and adds missing
sched definitions.
Reviewers: rovka, rengolin
Differential Revision: https://reviews.llvm.org/D29127
llvm-svn: 293935
|
|
|
|
| |
llvm-svn: 293925
|
|
|
|
|
|
| |
Requested by @silvas
llvm-svn: 293916
|
|
|
|
|
|
|
|
|
| |
UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.
llvm-svn: 293915
|
|
|
|
| |
llvm-svn: 293907
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled.
Recommiting after fixing X86 inc/dec chain bug.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293893
|
|
|
|
|
|
|
|
|
| |
It is important to change the ArgInfo's type from pointer to integer, otherwise
the CC assign function won't know what to do. Instead of hacking it up, we use
ComputeValueVTs and introduce some of the helpers that we will need later on for
lowering more complex types.
llvm-svn: 293889
|
|
|
|
|
|
|
| |
Make it legal to load pointer values. Also check that pointers are assigned
to the GPR reg bank by default.
llvm-svn: 293886
|
|
|
|
|
|
| |
Check that all scalars are loaded into the GPR by default.
llvm-svn: 293883
|
|
|
|
|
|
|
|
|
|
| |
This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain).
So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc.
Differential Revision: https://reviews.llvm.org/D28810
llvm-svn: 293880
|
|
|
|
| |
llvm-svn: 293862
|
|
|
|
|
|
| |
def list for VZEROALL/VZEROUPPER. YMM16-YMM31 should also be defs.
llvm-svn: 293861
|
|
|
|
| |
llvm-svn: 293860
|
|
|
|
|
|
|
|
|
|
|
| |
The operand types were defined to fit the fp16_to_fp node, which
has the half as an integer type. v_cvt_f32_f16 does support
source modifiers, so change this to have an FP type and modifiers.
For targets without legal f16, this requires recognizing the
bit operations and trying to produce them.
llvm-svn: 293857
|
|
|
|
| |
llvm-svn: 293851
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D28689
llvm-svn: 293844
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers has to be adjusted for bigger workgroups.
For example a workgroup of size 256 already occupies 4 waves just by
itself. Given that all numbers of LDS use in the compiler are per
workgroup, occupancy shall be multiplied by 4 in this case. Each 64
workitems still limited by the same number, but 4 subrgoups 64 workitems
each can afford 4 times more LDS to get the same occupancy.
In addition change initializes LDS size in the subtarget to a real value
for SI+ targets. This is required since LDS size is a variable in these
calculations.
Differential Revision: https://reviews.llvm.org/D29423
llvm-svn: 293837
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently the test implicit-null-checks.mir crashes if we run llc with
-enable-implicit-null-checks -start-before implicit-null-checks
options. Change fixes the RET instruction causing the crash.
Patch by Serguei Katkov!
Reviewers: sanjoy, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29390
llvm-svn: 293789
|
|
|
|
|
|
|
|
|
|
|
|
| |
These were simply preserving the flags of the original operation,
which was too conservative in most cases and incorrect for mul.
nsw/nuw may be needed for some combines to cleanup messes when
intermediate sext_inregs are introduced later.
Tested valid combinations with alive.
llvm-svn: 293776
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change allows a re-order of two intructions if their uses
are overlapped.
Patch by Serguei Katkov!
Reviewers: reames, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29120
llvm-svn: 293775
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The the following instructions:
- LD/LWZ (expanded from sjLj pseudo-instructions)
- LXVL/LXVLL vector loads
- STXVL/STXVLL vector stores
all require G8RC_NO0X class registers for RA.
Differential Revision: https://reviews.llvm.org/D29289
Committed for Lei Huang
llvm-svn: 293769
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add both cores to the target parser and TableGen. Test that eabi
attributes are set correctly for both cores. Additionally, test the
absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively.
Committed on behalf of Sanne Wouda.
Reviewers : rengolin, olista01.
Differential Revision: https://reviews.llvm.org/D29073
llvm-svn: 293761
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.
In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.
Differential revision: https://reviews.llvm.org/D28489
llvm-svn: 293737
|
|
|
|
|
|
|
|
|
|
|
| |
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well, subject to some simple frequency calculations.
Differential Revision: https://reviews.llvm.org/D28583
llvm-svn: 293716
|
|
|
|
|
|
|
|
|
|
| |
x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as
desired.
Verified that the particular nvptx approximate instructions here do in
fact return 0 for x = 0.
llvm-svn: 293713
|
|
|
|
|
|
|
| |
Otherwise bad things happen if the basic block order isn't trivial after an
invoke.
llvm-svn: 293679
|