| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
| |
The effect of doing so is that the LoopPassManager is not disrupted when this pass is mixed with other loop passes. This should substantially help locality of access, and it avoids the cost of computing PostDom.
The assumption here is that the full GuardWidening (which does use PostDom) is run as a canonicalization before loop opts and that this version is just catching cases exposed by other loop passes (e.g. LoopPredication, IndVarSimplify, LoopUnswitch, etc.).
llvm-svn: 331094
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
D42479 (rL329525) enabled the SDIV combine for non-splat vector divisors
whose elements are powers of two. But when the vector contains a 1, the
generated instruction sequence shifts a value by its full bit width,
which is undefined
(https://github.com/llvm-mirror/llvm/blob/c64f4dbfe31e509f9c1092b951e524b056245af8/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L6000-L6006).
In particular, on architectures that do not support vector instructions,
each element of the vector is computed separately using scalar
operations, and the resulting value is undef for the lanes whose divisor
is 1.
(An all-1s vector is fine; only vectors that mix 1 with other values are
affected.)
Reviewers: RKSimon, jgravelle-google
Subscribers: jfb, dschuff, sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D46161
llvm-svn: 331092
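To make the failure mode concrete, here is a minimal sketch of the usual shift-based expansion of a signed divide by 2^k, emulated on 32-bit values. The helper names (`lshr`, `ashr`, `bias_shift_amount`, `sdiv_pow2`) are hypothetical and the sequence is an assumption about the general technique, not a copy of the DAGCombiner code. The point is only that for a divisor of 1 (k = 0) the bias shift amount equals the bit width.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(u):
    """Interpret the low BITS bits of u as a two's-complement value."""
    u &= MASK
    return u - (1 << BITS) if u >> (BITS - 1) else u


def checked_shift_amount(amt):
    """Reject shift amounts that a fixed-width shift does not define."""
    if not 0 <= amt < BITS:
        raise ValueError(f"shift by {amt} is undefined for {BITS}-bit values")
    return amt


def lshr(u, amt):
    """Logical shift right on a BITS-wide value."""
    return (u & MASK) >> checked_shift_amount(amt)


def ashr(u, amt):
    """Arithmetic shift right on a BITS-wide value."""
    return (to_signed(u) >> checked_shift_amount(amt)) & MASK


def bias_shift_amount(k):
    """Shift used to turn the sign mask into the rounding bias 2**k - 1."""
    return BITS - k


def sdiv_pow2(x, k):
    """Truncating signed division of x by 2**k using only shifts and an add."""
    u = x & MASK
    sign = ashr(u, BITS - 1)                 # all ones if x is negative
    bias = lshr(sign, bias_shift_amount(k))  # k == 0 would shift by BITS
    return to_signed(ashr(u + bias, k))
```

With this sketch, `sdiv_pow2(-7, 1)` gives -3, while `sdiv_pow2(5, 0)` raises because the bias shift would be by 32 bits. A lane whose divisor is 1 therefore needs separate handling.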
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if the mask intrinsics are also used in the same basic block.
Summary:
Previously the flag intrinsics always used the index instructions even if a mask instruction also exists.
To fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs into a single node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction.
Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs.
I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions.
Reviewers: chandlerc, RKSimon, spatel
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46202
llvm-svn: 331091
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
For local variables the first DW_OP_deref is consumed by turning the
location kind into a memory location, but that only makes sense for
values that are in a register to begin with, which cannot happen for
global variables that are attached to a symbol.
rdar://problem/39741860
This reapplies r330970 after fixing an uncovered bug in r331086 and
working around the situation caused by it.
llvm-svn: 331090
|
| |
|
|
| |
llvm-svn: 331088
|
| |
|
|
|
|
|
|
|
|
| |
Now local value sinking only scans and numbers instructions added
between the current flush point and the last flush point. This ensures
that ISel is overall linear in the size of the BB.
Fixes PR37010 and re-enables local value sinking by default.
llvm-svn: 331087
|
| |
|
|
|
|
|
|
|
| |
This patch adds support for fragment expressions to
TryToShrinkGlobalToBoolean(), which were previously just dropped.
Thanks to Reid Kleckner for providing me a reproducer!
llvm-svn: 331086
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the `LHS` and `RHS` matchers:
1. match `RHS` matcher to the `first` operand of binary operator,
2. and then match `LHS` matcher to the `second` operand of binary operator.
This works ok.
But it complicates writing of commutative matchers, where one would like to match
(`m_Value()`) the value on one side, and use (`m_Specific()`) it on the other side.
This is additionally complicated by the fact that `m_Specific()` stores the `Value *`,
not `Value **`, so it won't work at all out of the box.
The last problem is trivially solved by adding a new `m_c_Specific()` that stores the
`Value **`, not `Value *`. I'm choosing to add a new matcher, not change the existing
one because i guess all the current users are ok with existing behavior,
and this additional pointer indirection may have performance drawbacks.
Also, i'm storing pointer, not reference, because for some mysterious-to-me reason
it did not work with the reference.
The first one appears trivial, too.
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the ~~`LHS` and `RHS` matchers~~ **operands**:
1. match ~~`RHS`~~ **`LHS`** matcher to the ~~`first`~~ **`second`** operand of binary operator,
2. and then match ~~`LHS`~~ **`RHS`** matcher to the ~~`second`~~ **`first`** operand of binary operator.
Surprisingly, `$ ninja check-llvm` still passes with this.
But i expect the bots will disagree..
The motivational unittest is included.
I'd like to use this in D45664.
Reviewers: spatel, craig.topper, arsenm, RKSimon
Reviewed By: craig.topper
Subscribers: xbolva00, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D45828
llvm-svn: 331085
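The difference between swapping matchers and swapping operands can be sketched with a toy matcher. Everything here is hypothetical illustration (`Capture`, `Deferred`, `Not`, and the tuple encoding of operands are assumptions, not LLVM's PatternMatch API): `Capture` plays the role of `m_Value()`, and `Deferred` plays the role of a specific-value matcher that reads the captured value at match time through an extra indirection.

```python
class Capture:
    """Binds whatever it is matched against (stand-in for a value capture)."""

    def __init__(self):
        self.value = None

    def match(self, v):
        self.value = v
        return True


class Deferred:
    """Matches only the value currently held by a Capture (read at match time)."""

    def __init__(self, capture):
        self.capture = capture

    def match(self, v):
        return self.capture.value is not None and v == self.capture.value


class Not:
    """Matches the tuple ('not', x) and forwards x to a sub-matcher."""

    def __init__(self, sub):
        self.sub = sub

    def match(self, v):
        return isinstance(v, tuple) and len(v) == 2 and v[0] == "not" and self.sub.match(v[1])


def match_swapping_matchers(lhs, rhs, operands):
    """Old order: on retry, the RHS matcher runs first, before LHS has rebound."""
    a, b = operands
    return (lhs.match(a) and rhs.match(b)) or (rhs.match(a) and lhs.match(b))


def match_swapping_operands(lhs, rhs, operands):
    """New order: on retry, LHS still runs first, now on the second operand."""
    a, b = operands
    return (lhs.match(a) and rhs.match(b)) or (lhs.match(b) and rhs.match(a))


def build_pattern():
    """Pattern 'X op not(X)' for a commutative op; returns (lhs, rhs)."""
    x = Capture()
    return x, Not(Deferred(x))
```

For the operands `(("not", "x"), "x")`, the operand-swapping version matches because the capture is rebound to `"x"` before the deferred check runs. The matcher-swapping version fails, since the deferred check runs against the stale binding left by the first attempt.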
|
| |
|
|
| |
llvm-svn: 331084
|
| |
|
|
|
|
|
|
|
|
| |
to a struct
Some of the bots were failing in a different way to the others. These were
unable to compare tuples. Fix this by changing to a struct, thereby avoiding
the quirks of tuples.
llvm-svn: 331081
|
| |
|
|
| |
llvm-svn: 331080
|
| |
|
|
|
|
|
|
|
|
|
| |
We currently have a hard-to-solve analysis problem around the order of instructions within a potentially throwing block. We can't cheaply determine whether a given instruction is before the first potential throw in the block. While we're working on that in the background, special case the first instruction within the header.
Why this particular special case? Well, headers are guaranteed to execute if the loop does, and it turns out we tend to produce this form in practice.
In a follow-on patch, I intend to extend LICM with an alternate approach which works for any instruction in the header before the first throw, but this is the best I can come up with for other users of the analysis (such as store promotion).
Note: I can't show the difference in the analysis result since we're ORing in the expensive instruction walk used by SCEV. Using the full walk is not suitable for a general solution.
llvm-svn: 331079
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Only allow a single unique .symver alias per symbol. This matches the
behavior of gas. I noticed that we ignored multiple mismatched symver
directives looking at https://reviews.llvm.org/D45798
Reviewers: pcc, tejohnson, espindola
Reviewed By: pcc
Subscribers: emaste, arichardson, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D45845
llvm-svn: 331078
|
| |
|
|
|
|
|
|
| |
Summary:
Common up some obviously copy/pasted code in
InnerLoopVectorizer::vectorizeMemoryInstruction
llvm-svn: 331076
|
| |
|
|
| |
llvm-svn: 331074
|
| |
|
|
|
|
|
|
| |
Extend the live-in check to all aliased registers so that we can
allow sinking Copy instructions when only an implicit def is in the
successor's live-ins.
llvm-svn: 331072
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MachineMemOperand
Summary:
Currently only the memory size is supported but others can be added as
needed.
narrowScalar for G_LOAD and G_STORE now correctly updates the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar
Reviewed By: aemerson
Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45466
llvm-svn: 331071
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Use RegUnits to track register aliases in PostRASink and AArch64LoadStoreOptimizer.
Reviewers: thegameg, mcrosier, gberry, qcolombet, sebpop, MatzeB, t.p.northover, javed.absar
Reviewed By: thegameg, sebpop
Subscribers: javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D45695
llvm-svn: 331066
|
| |
|
|
|
|
|
|
| |
scheduler classes
This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed.
llvm-svn: 331065
|
| |
|
|
| |
llvm-svn: 331061
|
| |
|
|
|
|
|
|
| |
The idea is to have a pass which performs the same transformation as GuardWidening, but can be run within a loop pass manager without disrupting the pass manager structure. As demonstrated by the test case, this doesn't quite get there because of issues with post dom, but it is a good step in the right direction. The motivation is purely to reduce compile time since we can now preserve locality during the loop walk.
This patch only includes a legacy pass. A follow up will add a new style pass as well.
llvm-svn: 331060
|
| |
|
|
| |
llvm-svn: 331055
|
| |
|
|
|
|
| |
This removes all the HADD/HSUB PS/PD InstRW overrides.
llvm-svn: 331054
|
| |
|
|
| |
llvm-svn: 331052
|
| |
|
|
|
|
| |
This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides.
llvm-svn: 331051
|
| |
|
|
|
|
|
|
|
|
|
| |
These branches were previously unanalyzable and unselectable. Add them and
recognize how to generate their inverses.
Reviewers: smaksimovic, atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D46113
llvm-svn: 331050
|
| |
|
|
| |
llvm-svn: 331048
|
| |
|
|
| |
llvm-svn: 331047
|
| |
|
|
|
|
|
|
|
|
| |
Put the first ldp at the end, so that the load-store optimizer can run
and merge the ldp and the add into a post-index ldp.
This didn't work in case no frame was needed and resulted in code size
regressions.
llvm-svn: 331044
|
| |
|
|
|
|
|
|
| |
If the MachineInstr uses a custom inserter and is then erased after
instruction selection, there is no use for mapping it to a sched class.
Review: Ulrich Weigand
llvm-svn: 331040
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently support LCSSA PHI nodes in the outer loop exit, if their
incoming values do not come from the outer loop latch or if the
outer loop latch has a single predecessor. In that case, the outer loop latch
will be executed only if the inner loop gets executed. If we have multiple
predecessors for the outer loop latch, it may be executed even if the inner
loop does not get executed.
This is a first step to support the case described in
https://bugs.llvm.org/show_bug.cgi?id=30472
Reviewers: efriedma, karthikthecool, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D43237
llvm-svn: 331037
|
| |
|
|
|
|
|
|
|
| |
This adds IR intrinsics for the AArch64 dot-product instructions introduced in
v8.2-A.
Differential Revision: https://reviews.llvm.org/D46107
llvm-svn: 331036
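The commit message does not spell out what the instructions compute, so the following is only a rough model of an unsigned dot product as I understand it: each 32-bit accumulator lane adds the sum of four products of 8-bit elements. The function names `udot_lane` and `udot` are hypothetical and are not the names of the new intrinsics.

```python
def udot_lane(acc, a_bytes, b_bytes):
    """Accumulate the dot product of four unsigned 8-bit pairs into a 32-bit lane."""
    if len(a_bytes) != 4 or len(b_bytes) != 4:
        raise ValueError("each lane consumes exactly four 8-bit elements")
    total = acc + sum(x * y for x, y in zip(a_bytes, b_bytes))
    return total & 0xFFFFFFFF  # wrap to the 32-bit lane width


def udot(acc, a, b):
    """Apply udot_lane to every 32-bit lane; a and b hold 4 * len(acc) bytes."""
    if len(a) != 4 * len(acc) or len(b) != 4 * len(acc):
        raise ValueError("operand length must be four times the accumulator length")
    return [
        udot_lane(acc[i], a[4 * i:4 * i + 4], b[4 * i:4 * i + 4])
        for i in range(len(acc))
    ]
```

For example, `udot([0, 10], [1, 2, 3, 4, 255, 255, 255, 255], [1, 1, 1, 1, 255, 255, 255, 255])` accumulates 1+2+3+4 into the first lane and 4 * 255 * 255 into the second.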
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since PTX has grown a <2 x half> datatype, vectorization has become more
important. The late LoadStoreVectorizer intentionally only does loads
and stores, but now arithmetic has to be vectorized for optimal
throughput too.
This is still very limited. SLP vectorization happily creates <2 x half>
if it's a legal type, but there's still a lot of register moving
happening to get that fed into a vectorized store. Overall it's a small
performance win from reducing the number of arithmetic instructions.
I haven't really checked what the loop vectorizer does to PTX code, the
cost model there might need some more tweaks. I didn't see it causing
harm though.
Differential Revision: https://reviews.llvm.org/D46130
llvm-svn: 331035
|
| |
|
|
|
|
| |
entry. NFCI.
llvm-svn: 331034
|
| |
|
|
|
|
|
|
|
|
| |
This patch stops the compiler from fusing fmul and fadd/fsub into
fmadd/fmsub by default. Instead, the -fp-contract=fast option can
be used when such behavior is desired.
Differential Revision: https://reviews.llvm.org/D46057
llvm-svn: 331033
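Why fusing is not a pure performance choice can be shown numerically: a fused multiply-add rounds once, while a separate multiply and add round twice. The sketch below models the fused result with exact rational arithmetic rather than a hardware instruction, so `fused_multiply_add` and `separate_multiply_add` are illustrative helpers, not a model of any particular target.

```python
from fractions import Fraction


def separate_multiply_add(a, b, c):
    """Round after the multiply and again after the add (no contraction)."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """Compute a * b + c exactly, then round once to double precision."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs chosen so that a * b = 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
```

Here `separate_multiply_add(a, b, c)` yields 0.0, while `fused_multiply_add(a, b, c)` yields -2**-60, so enabling contraction can change observable results.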
|
| |
|
|
|
|
|
|
|
| |
This adds IR intrinsics for the ARM dot-product instructions introduced in
v8.2-A.
Differential revision: https://reviews.llvm.org/D46106
llvm-svn: 331032
|
| |
|
|
|
|
|
|
|
| |
Back when the R52 schedule was added in rL286949, there was no way
to enable machine schedules in ARM for specific cores. Since then a
target feature has been added. This enables the feature for R52,
removing the need to manually specify compiler flags.
llvm-svn: 331027
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The value tracking analysis uses function alignment to infer that the
least significant bits of function pointers are known to be zero.
Unfortunately, this is not correct for ARM targets: the least
significant bit of a function pointer stores the ARM/Thumb state
information (i.e., the LSB is set for Thumb functions and cleared for
ARM functions).
The original approach (https://reviews.llvm.org/D44781) introduced a
new field for function pointer alignment in the DataLayout structure
to address this. But it seems unlikely that optimizations based on
function pointer alignment would bring much benefit in practice to
justify the additional maintenance burden, so this patch simply
assumes that function pointer alignment is always unknown.
Reviewers: javed.absar, efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, llvm-commits, hfinkel, rogfer01
Differential Revision: https://reviews.llvm.org/D46110
llvm-svn: 331025
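A small numeric sketch of the problem described above, assuming a 4-byte-aligned function address and hypothetical helper names: inferring known-zero low bits from the alignment gives the wrong answer once the Thumb state bit is set in the pointer.

```python
def low_bits_mask(alignment):
    """Bits that alignment alone would claim are zero (alignment is a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return alignment - 1


def thumb_function_pointer(code_address):
    """Pointer value for a Thumb function: the code address with the LSB set."""
    return code_address | 1


def arm_function_pointer(code_address):
    """Pointer value for an ARM function: the LSB stays clear."""
    return code_address & ~1


code_address = 0x8000  # hypothetical 4-byte-aligned function address
mask = low_bits_mask(4)
```

For the ARM pointer, `arm_function_pointer(code_address) & mask` is 0 as the alignment predicts, but `thumb_function_pointer(code_address) & mask` is 1. Folding that expression to 0 based on function alignment would therefore miscompile Thumb code.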
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This includes
Instructions: tlbginv, tlbginvf, tlbgp, tlbgr, tlbgwi, tlbgwr, hypcall
mfgc0, mtgc0, mfhgc0, mthgc0, dmfgc0, dmtgc0,
Assembler directives: .set virt, .set novirt, .module virt, .module novirt
Attribute: virt
.MIPS.abiflags: VZ (0x100)
Patch by Vladimir Stefanovic.
Differential Revision: https://reviews.llvm.org/D44905
llvm-svn: 331024
|
| |
|
|
|
|
|
|
|
| |
Reviewers: sanjoy, mkazantsev
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46175
llvm-svn: 331022
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add new umin creation method which accepts a list of operands.
SCEV does not represent umin, which is required in getExact, so
it transforms umin to umax with not. As a result, the transformation of a
tree of max into a max with several operands does not work.
We just use the newly introduced method to create umin from several operands.
Reviewers: sanjoy, mkazantsev
Reviewed By: sanjoy
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46047
llvm-svn: 331015
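The "umin as umax with not" encoding mentioned above relies on the identity umin(a, b) = ~umax(~a, ~b) on fixed-width unsigned values. The sketch below checks it on 8-bit values and shows that a flat multi-operand form agrees with a nested tree. The helper names are hypothetical and this is not SCEV code.

```python
BITS = 8
MASK = (1 << BITS) - 1


def bit_not(x):
    """Bitwise not on a BITS-wide unsigned value."""
    return ~x & MASK


def umin_via_umax(*operands):
    """Unsigned min of several operands, expressed as not(umax(not(op)...))."""
    if not operands:
        raise ValueError("need at least one operand")
    return bit_not(max(bit_not(op) for op in operands))


def umin_nested(a, b, c):
    """The same value built as a tree of two-operand umins."""
    return umin_via_umax(umin_via_umax(a, b), c)
```

For instance, `umin_via_umax(3, 200, 17)` and `umin_nested(3, 200, 17)` both give 3. Building the flat form directly avoids having to look through the intermediate not operations of the nested form.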
|
| |
|
|
|
|
| |
This reverts r331002 due to sanitizer bot breakage.
llvm-svn: 331011
|
| |
|
|
|
|
|
|
|
| |
It doesn't unwind, and the wrong marking leads to the creation of an
.eh_frame section when it isn't necessary.
Differential Revision: https://reviews.llvm.org/D46082
llvm-svn: 331008
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.
I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.
Differential Revision: https://reviews.llvm.org/D46091
llvm-svn: 331007
|
| |
|
|
| |
llvm-svn: 331006
|
| |
|
|
|
|
|
|
|
|
| |
Summary: Also test for symbol information in test/MC/WebAssembly/debug-info.ll.
Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46160
llvm-svn: 331005
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: If the file stream argument is not captured and its source is fopen, we can replace IO calls with unlocked IO ("_unlocked" function variants) to gain better speed.
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 331002
|
| |
|
|
|
|
|
|
| |
The LLVM commit introduces a crash in LLVM's instruction selection.
I filed http://llvm.org/PR37260 with the test case.
llvm-svn: 330997
|
| |
|
|
|
|
| |
This reverts commit r330970 while investigating bot breakage.
llvm-svn: 330993
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Simplify integer add expression X % C0 + (( X / C0 ) % C1) * C0 to
X % (C0 * C1). This is a common pattern seen in code generated by the XLA
GPU backend.
Add test cases for this new optimization.
Patch by Bixia Zheng!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: efriedma, craig.topper, lebedev.ri, llvm-commits, jlebar
Differential Revision: https://reviews.llvm.org/D45976
llvm-svn: 330992
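The identity behind this simplification can be checked by brute force. The sketch below assumes non-negative X and positive constants, and it ignores the overflow conditions on C0 * C1 that a fixed-width implementation has to verify. The function names are illustrative only.

```python
def expanded(x, c0, c1):
    """X % C0 + ((X / C0) % C1) * C0, with integer division."""
    return x % c0 + ((x // c0) % c1) * c0


def simplified(x, c0, c1):
    """X % (C0 * C1)."""
    return x % (c0 * c1)


def identity_holds(limit, c0, c1):
    """Check the identity for every x in [0, limit)."""
    return all(expanded(x, c0, c1) == simplified(x, c0, c1) for x in range(limit))
```

The reason it holds: writing X = q * C0 + r and q = q2 * C1 + r2 gives X = q2 * C0 * C1 + (r2 * C0 + r), and r2 * C0 + r is always smaller than C0 * C1.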
|