| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
from InstCombine
In the process, use the existing masked.load combine which is slightly stronger, and handles a mix of zero and undef elements in the mask.
llvm-svn: 358913
|
|
|
|
|
|
|
|
| |
In InstCombine, we use an idiom of "store i1 true, i1 undef" to indicate we've found a path which we've proven unreachable. We can't actually insert the unreachable instruction since that would require changing the CFG. We leave that to simplifycfg later.
This just factors out that idiom creation so we don't duplicate the same mostly undocument idiom creation in multiple places.
llvm-svn: 358600
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a constant shift amount is used, then only some of the LHS/RHS
operand bits are demanded and we may be able to simplify based on
that. InstCombineSimplifyDemanded already had the necessary support
for that, we just weren't calling it with fshl/fshr as root.
In particular, this allows us to relax some masked funnel shifts
into simple shifts, as shown in the tests.
Patch by Shawn Landden.
Differential Revision: https://reviews.llvm.org/D60660
llvm-svn: 358515
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Enable some of the existing size optimizations for cold code under PGO.
A ~5% code size saving in big internal app under PGO.
The way it gets BFI/PSI is discussed in the RFC thread
http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html
Note it doesn't currently touch loop passes.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59514
llvm-svn: 358422
|
|
|
|
|
|
|
|
|
|
|
|
| |
ssubo X, C is equivalent to saddo X, -C. Make the transformation in
InstCombine and allow the logic implemented for saddo to fold prior
usages of add nsw or sub nsw with constants.
Patch by Dan Robertson.
Differential Revision: https://reviews.llvm.org/D60061
llvm-svn: 358099
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First step towards removing the MOVMSK intrinsics completely - this patch expands MOVMSK to the pattern:
e.g. PMOVMSKB(v16i8 x):
%cmp = icmp slt <16 x i8> %x, zeroinitializer
%int = bitcast <16 x i8> %cmp to i16
%res = zext i16 %int to i32
Which is correctly handled by ISel and FastIsel (give or take an annoying movzx move....): https://godbolt.org/z/rkrSFW
Differential Revision: https://reviews.llvm.org/D60256
llvm-svn: 357909
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Fixes PR41337
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60148
llvm-svn: 357564
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Fixes PR41273
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60096
llvm-svn: 357521
|
|
|
|
| |
llvm-svn: 356642
|
|
|
|
|
|
|
|
| |
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.
Differential Revision: https://reviews.llvm.org/D57247
llvm-svn: 356590
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fold add nuw and uadd.with.overflow with constants if the
addition does not overflow.
Part of https://bugs.llvm.org/show_bug.cgi?id=38146.
Patch by Dan Robertson.
Differential Revision: https://reviews.llvm.org/D59471
llvm-svn: 356584
|
|
|
|
| |
llvm-svn: 356536
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
transforms
Follow-up to:
rL356338
rL356369
We can calculate an arbitrary vector constant minus the bitwidth, so there's
no need to limit this transform to scalars and splats.
llvm-svn: 356372
|
|
|
|
|
|
|
|
|
|
|
| |
Follow-up to:
rL356338
Rotates are a special case of funnel shift where the 2 input operands
are the same value, but that does not need to be a restriction for the
canonicalization when the shift amount is a constant.
llvm-svn: 356369
|
|
|
|
|
|
|
|
|
|
|
| |
This was noted as a backend problem:
https://bugs.llvm.org/show_bug.cgi?id=41057
...and subsequently fixed for x86:
rL356121
But we should canonicalize these in IR for the benefit of all targets
and improve IR analysis such as CSE.
llvm-svn: 356338
|
|
|
|
|
|
|
|
|
|
|
|
| |
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.
The result is that we replace a vector gep with at least one undef in each lane with a undef. We can also do the same for vector intrinsics. Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.
Differential Revision: https://reviews.llvm.org/D57468
llvm-svn: 356293
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bitwidth
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.
We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.
Differential Revision: https://reviews.llvm.org/D59374
llvm-svn: 356192
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.
Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.
This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.
llvm-svn: 355981
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fold `add nsw` and `sadd.with.overflow` with constants if the addition
does not overflow.
Part of https://bugs.llvm.org/show_bug.cgi?id=38146.
Patch by Dan Robertson.
Differential Revision: https://reviews.llvm.org/D58881
llvm-svn: 355530
|
|
|
|
| |
llvm-svn: 353630
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D57174
llvm-svn: 352914
|
|
|
|
|
|
|
|
|
| |
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57172
llvm-svn: 352911
|
|
|
|
|
|
|
|
|
| |
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57170
llvm-svn: 352909
|
|
|
|
|
|
|
|
|
| |
An unused variable problem was introduced with rL352870
and stubbed out with rL352871, but we can make a better
fix by actually using the local variable in code rather
than just the assert.
llvm-svn: 352873
|
|
|
|
| |
llvm-svn: 352871
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we can reduce the x86-specific intrinsic to the generic op, it allows existing
simplifications and value tracking folds. AFAICT, this always results in identical
x86 codegen in the non-reduced case...which should be true because we semi-generically
(too aggressively IMO) convert to llvm.uadd.with.overflow in CGP, so the DAG/isel must
already combine/lower this intrinsic as expected.
This isn't quite what was requested in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but we want to have these kinds of folds early for efficiency and to enable greater
simplifications. For the case in the bug report where we have:
_addcarry_u64(0, ahi, 0, &ahi)
...this gets completely simplified away in IR.
Differential Revision: https://reviews.llvm.org/D57453
llvm-svn: 352870
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57494
llvm-svn: 352771
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.
rdar://32212419
Differential revision: https://reviews.llvm.org/D56761
llvm-svn: 352664
|
|
|
|
|
|
|
|
|
|
| |
The point is that this simplifies integration of new intrinsics into SimplifiedDemandedVectorElts, and ensures we don't miss any existing ones.
This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output. This is due to order of transforms w/in instcombine resulting in two slightly different fixed points. That's something we should fix, but isn't a problem w/this patch per se.
Differential Revision: https://reviews.llvm.org/D57398
llvm-svn: 352653
|
|
|
|
|
|
|
|
| |
This causes a couple of changes in the upgrade tests as signed/unsigned eq/ne are equivalent and we constant fold true/false codes, these changes are the same as what we already do for avx512 cmp/ucmp.
Noticed while cleaning up vector integer comparison costs for PR40376.
llvm-svn: 351697
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
InstCombine is able to transform mem transfer instrinsic to alone store or store/load pair.
It might result in generation of unaligned atomic load/store which later in backend
will be transformed to libcall. It is not an evident gain and it is better to keep intrinsic as is
and handle it at backend.
Reviewers: reames, anna, apilipenko, mkazantsev
Reviewed By: reames
Subscribers: t.p.northover, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D56582
llvm-svn: 351295
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.
And we can also get the mask from and i1, or i1, xor i1.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52060
llvm-svn: 351150
|
|
|
|
|
|
|
|
|
|
|
|
| |
intrinsics (llvm)
This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.
Clang counterpart: https://reviews.llvm.org/D55890
Differential Revision: https://reviews.llvm.org/D55894
llvm-svn: 349892
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.
This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).
This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.
The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.
Differential Revision: https://reviews.llvm.org/D52116
llvm-svn: 349725
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM
This has the potential to create less-than-8-bit scalar types as shown in
some of the test diffs, but it looks like the backend knows how to deal
with that in these patterns. This is the simple part of the fix suggested in:
https://bugs.llvm.org/show_bug.cgi?id=39927
Differential Revision: https://reviews.llvm.org/D55529
llvm-svn: 348862
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also
support non-splat vector constants. This is done by generalizing
the implementation of the isNotMinSignedValue() helper to return
true for constants that are non-splat, but don't contain any
signed min elements.
Differential Revision: https://reviews.llvm.org/D55011
llvm-svn: 348072
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Combine
sat(sat(X + C1) + C2) -> sat(X + (C1+C2))
and
sat(sat(X - C1) - C2) -> sat(X - (C1+C2))
if the sign of C1 and C2 matches.
In the unsigned case we can compute C1+C2 with saturating arithmetic,
and InstSimplify will reduce this just to the saturation value. For
the signed case, we cannot perform the simplification if the result
of the addition overflows.
This change is part of https://reviews.llvm.org/D54534.
llvm-svn: 347773
|
|
|
|
|
|
|
|
|
| |
Canonicalize ssub.sat(X, C) to ssub.sat(X, -C) if C is constant and
not signed minimum. This will help further optimizations to apply.
This change is part of https://reviews.llvm.org/D54534.
llvm-svn: 347772
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If ValueTracking can determine that the add/sub can newer overflow,
replace it with the corresponding nuw/nsw add/sub.
Additionally, for the unsigned case, if ValueTracking determines
that the add/sub always overflows, replace the result with the
saturation value.
This change is part of https://reviews.llvm.org/D54534.
llvm-svn: 347770
|
|
|
|
|
|
|
|
|
| |
If a saturating add intrinsic has one constant argument, make sure
it is on the RHS. This will simplify further transformations.
This change is part of https://reviews.llvm.org/D54534.
llvm-svn: 347769
|
|
|
|
| |
llvm-svn: 347604
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following simplifications are implemented:
* `fshl(X, 0, C) -> shl X, C%BW`
* `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0)
* `fshl(0, X, C) -> lshr X, BW-C%BW`
* `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0)
* `fshr(X, 0, C) -> shl X, (BW-C%BW)`
* `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0)
* `fshr(0, X, C) -> lshr X, C%BW`
* `fshr(undef, X, C) -> lshr, X, C%BW` (assuming undef = 0)
The simplification is only performed if the shift amount C is constant,
because we can explicitly compute C%BW and BW-C%BW in this case.
Differential Revision: https://reviews.llvm.org/D54778
llvm-svn: 347505
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The shift amount of a funnel shift is modulo the scalar bitwidth:
http://llvm.org/docs/LangRef.html#llvm-fshl-intrinsic
...so we can use demanded bits analysis on that operand to simplify it
when we have a power-of-2 bitwidth.
This is another step towards canonicalizing {shift/shift/or} to the
intrinsics in IR.
Differential Revision: https://reviews.llvm.org/D54478
llvm-svn: 346814
|
|
|
|
|
|
| |
Noticed via inspection. Appears to be largely innocious in practice, but slight code change could have resulted in either visit order dependent missed optimizations or infinite loops. May be a minor compile time problem today.
llvm-svn: 346698
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When the 3rd argument to these intrinsics is zero, lowering them
to shift instructions produces poison values, since we end up with
shift amounts equal to the number of bits in the shifted value. This
means we can only lower these intrinsics if we can prove that the
3rd argument is not zero.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: bnieuwenhuizen, jvesely, wdng, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D53739
llvm-svn: 346422
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, spatel
Reviewed By: spatel
Subscribers: lebedev.ri, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D53774
llvm-svn: 345751
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Depends on D52765
Reviewers: aheejin, dschuff
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52766
llvm-svn: 344799
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
by `getTerminator()` calls instead be declared as `Instruction`.
This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).
llvm-svn: 344502
|