| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.
Differential Revision: https://reviews.llvm.org/D30708
llvm-svn: 297716
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This simple optimization has been split out of https://reviews.llvm.org/D30400
Reviewers: efriedma, jmolloy
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30829
llvm-svn: 297682
|
|
|
|
|
|
|
| |
We used to hit an unreachable in getRegBankFromRegClass when dealing with the
stack pointer. This commit adds support for the GPRsp reg class.
llvm-svn: 297621
|
|
|
|
|
|
|
|
|
| |
Loop over the ARM decode tables; this is a clean-up to reduce some code
duplication.
Differential Revision: https://reviews.llvm.org/D30814
llvm-svn: 297608
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(defined in ARMInstrInfo.td)
Reviewers: grosbach, rengolin, jmolloy
Reviewed By: jmolloy
Subscribers: aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D30782
llvm-svn: 297456
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE].
Summary:
This allows for some simplification because the combines
are no longer limited to just one go at the node before
it gets legalized into an ARM target-specific one.
Reviewers: jmolloy, rogfer01
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30401
llvm-svn: 297453
|
|
|
|
|
|
|
|
|
|
|
|
| |
same as already done for ARM and Thumb2.
Reviewers: jmolloy, rogfer01, efriedma
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30400
llvm-svn: 297443
|
|
|
|
|
|
|
|
|
| |
Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added a MC test for
vcmp that was actually missing.
Differential Revision: https://reviews.llvm.org/D30745
llvm-svn: 297376
|
|
|
|
|
|
|
|
|
|
|
| |
The check for LSL #0 in an IT block was checking if operand 4 was zero, but
operand 4 is the condition code operand so it was actually checking for LSLEQ.
Fix this by checking operand 3, which really is the immediate operand, and add
some tests.
Differential Revision: https://reviews.llvm.org/D30692
llvm-svn: 297142
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
other""
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.
The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.
Differential Revision: https://reviews.llvm.org/D30645
llvm-svn: 297137
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
isn't live.
Summary: Previously, it had always been materialized as a push/pop sequence.
Reviewers: labrinea, jroelofs
Reviewed By: jroelofs
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30648
llvm-svn: 297134
|
|
|
|
|
|
|
| |
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.
llvm-svn: 297100
|
|
|
|
| |
llvm-svn: 296901
|
|
|
|
|
|
|
|
|
|
| |
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.
llvm-svn: 296862
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation
as LastOp.
This patch fixes some cases where we would move stores to the wrong
insert point.
Re-commit with a fix to increment NumMove in the right place.
Differential Revision: https://reviews.llvm.org/D30124
llvm-svn: 296815
|
|
|
|
|
|
|
|
|
| |
After r296750, we're able to match interleaved accesses having types wider than
128 bits. This patch updates the associated TTI costs.
Differential Revision: https://reviews.llvm.org/D29675
llvm-svn: 296751
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch teaches (ARM|AArch64)ISelLowering.cpp to match illegal vector types
to interleaved access intrinsics as long as the types are multiples of the
vector register width. A "wide" access will now be mapped to multiple
interleave intrinsics similar to the way in which non-interleaved accesses with
illegal types are legalized into multiple accesses. I'll update the associated
TTI costs (in getInterleavedMemoryOpCost) as a follow-on.
Differential Revision: https://reviews.llvm.org/D29466
llvm-svn: 296750
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Original commit message:
[ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
This patch fixes some cases where we would sink stores for no reason.
llvm-svn: 296718
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
This patch fixes some cases where we would sink stores for no reason.
Differential Revision: https://reviews.llvm.org/D30124
llvm-svn: 296708
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code starts from the high end of the sorted vector of offsets, and
works backwards: it tries to find contiguous offsets, process them, then
pops them from the end of the vector. Most of the code agrees with this
order of processing, but one loop doesn't: it instead processes elements
from the low end of the vector (which are nodes with unrelated offsets).
Fix that loop to process the correct elements.
This has a few implications. One, we don't incorrectly return early when
processing multiple groups of offsets in the same block (which allows
rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert
point for loads, so they're correctly sorted (which affects the
scheduling of vldm-liveness.ll). I think it might also impact some of
the heuristics slightly.
Differential Revision: https://reviews.llvm.org/D30368
llvm-svn: 296701
|
|
|
|
|
|
| |
Apparently I forgot to run it after fixing up some things...
llvm-svn: 296634
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lower i1, i8 and i16 call parameters by extending them before storing them on
the stack. Also make sure we encode the correct, extended size in the
corresponding memory operand, and that we compute the correct stack size in the
end.
The latter is a bit more complicated because we used to compute the stack size
in the getStackAddress method, based on the Size and Offset of the parameters.
However, if the last parameter is sign extended, we'd be using the wrong,
non-extended size, and we'd end up with a smaller stack than we need to hold the
extended value. Instead of hacking this up based on the value of Size in
getStackAddress, we move our stack size handling logic to assignArg, where we
have access to the CCState which knows everything we could possibly want to know
about the stack. This way we don't need to duplicate any knowledge or resort to
any ugly hacks.
On this same occasion, update the IRTranslator test to check the sizes of the
stores everywhere, not just for sign extended paramteres.
llvm-svn: 296631
|
|
|
|
|
|
|
|
|
|
|
|
| |
This parsing code was incorrectly checking for invalid characters, so an
invalid instruction like:
msr spsr_w, r0
would be emitted as:
msr spsr_cxsf, r0
Differential revision: https://reviews.llvm.org/D30462
llvm-svn: 296607
|
|
|
|
|
|
|
|
|
|
|
| |
This prevents generating stm r1!, {r0, r1} on Thumb1, where value
stored for r1 is UNKONWN.
Patch by Zhaoshi Zheng.
Differential Revision: https://reviews.llvm.org/D27910
llvm-svn: 296538
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296476
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lower i32, float and double parameters that need to live on the stack. This
boils down to creating some G_GEPs starting from the stack pointer and storing
the values there. During the process we also keep track of the stack size and
use the final value in the ADJCALLSTACKDOWN/UP instructions.
We currently assert for smaller types, since they usually require extensions.
They will be handled in a separate patch.
llvm-svn: 296473
|
|
|
|
|
|
| |
Put it into a register by means of a MOVi.
llvm-svn: 296471
|
|
|
|
|
|
|
| |
Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register
operand.
llvm-svn: 296469
|
|
|
|
| |
llvm-svn: 296468
|
|
|
|
|
|
| |
At this point, G_GEP is just an add, so we treat it exactly like a G_ADD.
llvm-svn: 296462
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Thumb2, instructions which write to the PC are UNPREDICTABLE if they are in
an IT block but not the last instruction in the block.
Previously, we only diagnosed this for LDM instructions, this patch extends the
diagnostic to cover all of the relevant instructions.
Differential Revision: https://reviews.llvm.org/D30398
llvm-svn: 296459
|
|
|
|
|
|
| |
This should be the same as the mapping for G_ADD etc.
llvm-svn: 296455
|
|
|
|
|
|
|
| |
At the moment we're only interested in GEPs for putting call parameters on the
stack, so we'll stick to 32-bit offsets.
llvm-svn: 296452
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of the condition
The transform in question claims to be doing:
// fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c))
...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node
for the sext/zext patterns.
This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(),
so I was seeing infinite loops with my draft of a patch applied.
The changes in select_const.ll look positive (less instructions). The change in arm-and-tst-peephole.ll
is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's
not affected by this code change.
Differential Revision:
https://reviews.llvm.org/D30355
llvm-svn: 296389
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we handle this correctly in arm, but in thumb we don't which leads to
an unpredictable instruction being emitted for LSL #0 in an IT block and SP not
being permitted in some cases when it should be.
For the thumb2 LSL we can handle this by making LSL #0 an alias of MOV in the
.td file, but for thumb1 we need to handle it in checkTargetMatchPredicate to
get the IT handling right. We also need to adjust the handling of
MOV rd, rn, LSL #0 to avoid generating the 16-bit encoding in an IT block. We
should also adjust it to allow SP in the same way that it is allowed in
MOV rd, rn, but I haven't done that here because it looks like it would take
quite a lot of work to get right.
Additionally correct the selection of the 16-bit shift instructions in
processInstruction, where it was checking if the two registers were equal when
it should have been checking if they were low. It appears that previously this
code was never executed and the 16-bit encoding was selected by default, but
the other changes I've done here have somehow made it start being used.
Differential Revision: https://reviews.llvm.org/D30294
llvm-svn: 296342
|
|
|
|
|
|
|
|
| |
UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.
llvm-svn: 296279
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296252
|
|
|
|
|
|
| |
Same as selecting G_LOAD.
llvm-svn: 296122
|
|
|
|
|
|
| |
Same as the ones for loads.
llvm-svn: 296115
|
|
|
|
|
|
| |
Allow the same types that we allow for loads.
llvm-svn: 296108
|
|
|
|
|
|
| |
This reverts commit r296103 because the test broke on one of the bots. Sorry!
llvm-svn: 296104
|
|
|
|
|
|
| |
Allow the same types that we allow for loads.
llvm-svn: 296103
|
|
|
|
|
|
|
|
|
|
|
| |
FastISel wasn't checking the isFPOnlySP subtarget feature before emitting
double-precision operations, so it got completely invalid CodeGen for doubles
on Cortex-M4F.
The normal ISel testing wasn't spectacular either so I added a second RUN line
to improve that while I was in the area.
llvm-svn: 296031
|
|
|
|
|
|
|
|
| |
Introduce a common ValueHandler for call returns and formal arguments, and
inherit two different versions for handling the differences (at the moment the
only difference is the way physical registers are marked as used).
llvm-svn: 295973
|
|
|
|
|
|
|
|
| |
Add support for lowering calls with parameters than can fit into regs. Use the
same ValueHandler that we used for function returns, but rename it to match its
new, extended purpose.
llvm-svn: 295971
|
|
|
|
|
|
|
|
|
|
| |
The ARMConstantIslandPass didn't have support for handling accesses to
constant island objects through ARM::t2LDRBpci instructions. This adds
support for that.
This fixes PR31997.
llvm-svn: 295964
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pass tries to fix a spill of LR that turns out to be unnecessary.
So it removes the tPOP but forgets to remove tPUSH.
This causes the stack be misaligned upon returning the function.
Thus, remove the tPUSH as well in this case.
Differential Revision: https://reviews.llvm.org/D30207
llvm-svn: 295816
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds missing sched classes for Thumb2 instructions.
This has been missing so far, and as a consequence, machine
scheduler models for individual sub-targets have tended to
be larger than they needed to be. These patches should help
write schedulers better and faster in the future
for ARM sub-targets.
Reviewer: Diana Picus
Differential Revision: https://reviews.llvm.org/D29953
llvm-svn: 295811
|
|
|
|
|
|
|
|
|
| |
This just adds the basic skeleton for supporting a new object file format.
All of the actual encoding will be implemented in followup patches.
Differential Revision: https://reviews.llvm.org/D26722
llvm-svn: 295803
|