| Commit message | Author | Age | Files | Lines |
|
| |
When multiple guard intrinsics are merged into one, currently the
result of eraseInstFromFunction() is returned -- however, this
should only be done if the current instruction is being removed.
In this case we're removing a different instruction and should
instead report that the current one has been modified by returning it.
For this test case, this reduces the number of instcombine iterations
from 5 to 2 (the minimum possible).
Differential Revision: https://reviews.llvm.org/D72558
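A minimal sketch of the return convention at issue (the helper name and plumbing are illustrative, not the patch's code): an InstCombine visitor returns the result of eraseInstFromFunction() only when erasing the instruction currently being visited; when it erases some other instruction, it returns the visited one to mark it as modified.
    #include "llvm/IR/IntrinsicInst.h"
    #include "llvm/Transforms/InstCombine/InstCombiner.h"

    // Illustrative helper: merge NextGuard's condition into CurGuard.
    llvm::Instruction *mergeGuards(llvm::InstCombiner &IC,
                                   llvm::IntrinsicInst &CurGuard,
                                   llvm::IntrinsicInst &NextGuard) {
      // ... combine the two guard conditions into CurGuard here ...
      // We erase the *other* instruction, not the one being visited:
      IC.eraseInstFromFunction(NextGuard);
      // So report the visited instruction as modified by returning it,
      // instead of returning the result of eraseInstFromFunction().
      return &CurGuard;
    }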
|
| |
Add constant folding and simplification support for pdep and pext.
The instructions use a mask to either pack disjoint bits together (pext) or spread bits to disjoint locations (pdep). If the mask is all 0s, then no bits are extracted or deposited. If the mask is all 1s, then the source value is written to the result, since no compression or expansion happens. Otherwise, if both the source and the mask are constant, we can walk the bits in the source/mask and calculate the result.
There are other, crazier things we could do, like computeKnownBits, or turning pext into shift/and if only a single contiguous range of bits is extracted.
Fixes PR44389
Differential Revision: https://reviews.llvm.org/D71952
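A scalar reference for the bit-walk described above, as a standalone sketch (not the patch's own code):
    #include <cstdint>
    #include <cstdio>

    // pext: pack the source bits selected by mask into the low bits.
    uint32_t pext32(uint32_t src, uint32_t mask) {
      uint32_t result = 0, out = 0;
      for (unsigned bit = 0; bit < 32; ++bit)
        if (mask & (1u << bit)) {
          if (src & (1u << bit))
            result |= 1u << out;
          ++out;
        }
      return result;
    }

    // pdep: spread the low source bits to the positions selected by mask.
    uint32_t pdep32(uint32_t src, uint32_t mask) {
      uint32_t result = 0, in = 0;
      for (unsigned bit = 0; bit < 32; ++bit)
        if (mask & (1u << bit)) {
          if (src & (1u << in))
            result |= 1u << bit;
          ++in;
        }
      return result;
    }

    int main() {
      printf("%x\n", pext32(0x12345678, 0));    // all-0s mask -> 0
      printf("%x\n", pext32(0x12345678, ~0u));  // all-1s mask -> source
      printf("%x\n", pdep32(0xF, 0xF0F0));      // deposits into bits 4-7 -> f0
    }
With both operands constant, the fold amounts to evaluating exactly this loop at compile time.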
|
| |
This is another optimization suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
|
| |
Intrinsic has incorrect argument type!
i32 (i32*)* @llvm.setjmp
*wipes tear*
|
| |
This is another optimization suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
|
| |
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
|
| |
Constructor invocations such as `APFloat(APFloat::IEEEdouble(), 0.0)`
may seem like they accept a FP (floating point) value, but the overload
they reach is actually the `integerPart` one, not a `float` or `double`
overload (which only exists when `fltSemantics` isn't passed).
This may lead to possible loss of data, by the conversion from `float`
or `double` to `integerPart`.
To prevent future mistakes, a new constructor overload is added that
accepts any FP value and is marked `delete`, to prevent its usage.
Fixes PR34095.
Differential Revision: https://reviews.llvm.org/D70425
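The idiom, reduced to a standalone sketch (the types here are stand-ins for fltSemantics/APFloat, purely illustrative):
    #include <cstdint>

    struct Sem {};  // stand-in for fltSemantics

    struct Float {  // stand-in for APFloat
      // Integer bit-pattern constructor: an FP literal like 0.0 would
      // previously convert to integerPart and silently land here.
      Float(const Sem &, uint64_t bits) { (void)bits; }
      // Deleted overload: passing any FP value is now a compile error.
      Float(const Sem &, double) = delete;
    };

    int main() {
      Sem s;
      Float ok(s, uint64_t{42});  // fine: explicitly an integer
      // Float bad(s, 0.0);       // error: use of deleted constructor
      (void)ok;
    }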
|
| |
Summary:
If a user writing C code using the ACLE MVE intrinsics generates a
predicate and then complements it, then the resulting IR will use the
`pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit
integer; complement that integer; and convert back. This will generate
machine code that moves the predicate out of the `P0` register,
complements it in an integer GPR, and moves it back in again.
This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement
of the original predicate vector, which we can already instruction-
select as the VPNOT instruction which complements P0 in place.
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70484
|
| |
If the sign of the sign argument is known (this could be extended to use ValueTracking),
then we can use fneg+fabs to clear/set the sign bit of the magnitude argument.
http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic
This transform is already done in DAGCombiner, but we can do it sooner in IR as
suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
We have effectively no analysis for copysign in IR, so we are taking the unusual step
of increasing the number of IR instructions for the negative constant case.
Differential Revision: https://reviews.llvm.org/D70792
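The equivalence being exploited, as a runnable illustration (the libm calls stand in for the IR intrinsics):
    #include <cmath>
    #include <cstdio>

    int main() {
      double mag = -3.5;
      // Sign argument known negative: copysign(mag, sgn) == -fabs(mag).
      // (Known positive would instead give fabs(mag).)
      printf("%g\n", std::copysign(mag, -1.0));  // -3.5
      printf("%g\n", -std::fabs(mag));           // -3.5
    }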
|
| |
Patch by Chris Ye!
Differential Revision: https://reviews.llvm.org/D68004
|
| |
If you're writing C code using the ACLE MVE intrinsics that passes the
result of a vcmp as input to a predicated intrinsic, e.g.
    mve_pred16_t pred = vcmpeqq(v1, v2);
    v_out = vaddq_m(v_inactive, v3, v4, pred);
then clang's codegen for the compare intrinsic will create calls to
`@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an
`mve_pred16_t` integer representation, and then the next intrinsic
will call `@llvm.arm.mve.pred.i2v` to convert it straight back again.
This will be visible in the generated code as a `vmrs`/`vmsr` pair that
moves the predicate value pointlessly out of `p0` and back into it again.
To prevent that, I've added InstCombine rules to remove round trips of
the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine
about the known and demanded bits of those intrinsics. As a result,
you now get just the generated code you wanted:
    vpt.u16 eq, q1, q2
    vaddt.u16 q0, q3, q4
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70313
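The shape of the round-trip fold, sketched with LLVM's PatternMatch utilities (the helper name and surrounding plumbing are illustrative, not the patch itself):
    #include "llvm/IR/IntrinsicInst.h"
    #include "llvm/IR/IntrinsicsARM.h"
    #include "llvm/IR/PatternMatch.h"

    using namespace llvm;
    using namespace llvm::PatternMatch;

    // v2i(i2v(x)) -> x; the symmetric i2v(v2i(x)) case is analogous.
    Value *foldPredRoundTrip(IntrinsicInst &II) {
      Value *X;
      if (II.getIntrinsicID() == Intrinsic::arm_mve_pred_v2i &&
          match(II.getArgOperand(0),
                m_Intrinsic<Intrinsic::arm_mve_pred_i2v>(m_Value(X))))
        return X;  // replace the call with the original i32 value
      return nullptr;
    }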
|
| |
Avoids warnings in Release builds. NFC.
|
| |
The MVE VADC instruction reads and writes the carry bit at bit 29 of
the FPSCR register. The corresponding ACLE intrinsic is specified to
work with an integer in which the carry bit is stored at bit 0. So if
a user writes a code sequence in C that passes the carry from one VADC
to the next, like this,
    s0 = vadcq_u32(a0, b0, &carry);
    s1 = vadcq_u32(a1, b1, &carry);
then clang will generate IR for each of those operations that shifts
the carry bit up into bit 29 before the VADC, and after it, shifts it
back down and masks off all but the low bit. But in this situation
what you really wanted was two consecutive VADC instructions, so that
the second one directly reads the value left in FPSCR by the first,
without wasting several instructions on pointlessly clearing the other
flag bits in between.
This commit explains to InstCombine that the other bits of the flags
operand don't matter, and adds a test that demonstrates that all the
code between the two VADC instructions can be optimized away as a
result.
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67162
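The bit layout involved, as a standalone illustration of why only one bit of the flags operand is demanded:
    #include <cstdint>
    #include <cstdio>

    int main() {
      // ACLE models the carry as bit 0 of an unsigned int, but VADC reads
      // and writes it at bit 29 of FPSCR. Between two VADCs, only bit 29
      // of the flags word matters, so the shift/mask glue is dead code.
      unsigned carry = 1;            // ACLE form, bit 0
      uint32_t fpscr = carry << 29;  // what the first VADC writes
      carry = (fpscr >> 29) & 1;     // what the C code recovers
      printf("%u\n", carry);         // 1
    }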
|
| |
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69303
llvm-svn: 375499
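For context, a minimal sketch of what the Align type provides over a raw unsigned (illustrative usage, assuming the header introduced in D64790):
    #include "llvm/Support/Alignment.h"
    #include <cstdint>

    void example() {
      llvm::Align A(16);  // must be a power of two by construction
      uint64_t Addr = 0x1234;
      // isAligned replaces ad-hoc "Addr % Align == 0" checks.
      bool Ok = llvm::isAligned(A, Addr);  // false: 0x1234 % 16 != 0
      (void)Ok;
    }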
|
| |
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69302
llvm-svn: 375498
|
| |
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69253
llvm-svn: 375419
|
| |
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, bollu, jdoerfert
Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D68268
llvm-svn: 373595
|
| |
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, jdoerfert
Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D68142
llvm-svn: 373195
|
| |
As @reames pointed out post-commit, rL371518 adds additional rounding
in some cases, when doing constant folding of the multiplication.
This breaks a guarantee llvm.fma makes and must be avoided.
This patch reapplies rL371518, but splits off the simplifications not
requiring rounding from SimplifyFMulInst as SimplifyFMAFMul.
Reviewers: spatel, lebedev.ri, reames, scanon
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D67434
llvm-svn: 372899
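A standalone demonstration of the rounding guarantee at stake (the fused form rounds once, the split form twice):
    #include <cmath>
    #include <cstdio>

    int main() {
      double a = 1.0 + 0x1p-30;
      double c = -(1.0 + 0x1p-29);
      // Exact a*a = 1 + 2^-29 + 2^-60; the 2^-60 term is lost if the
      // multiply is rounded separately before the add.
      printf("%a\n", std::fma(a, a, c));  // 0x1p-60: single rounding
      printf("%a\n", a * a + c);          // 0x0p+0:  double rounding
    }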
|
| |
If we generate the gc.relocate, and then later prove two arguments to the statepoint are equivalent, we should canonicalize the gc.relocate to the form we would have produced if this had been known before rewriting.
llvm-svn: 372771
|
| |
"Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size."
llvm-svn: 372647
|
| |
Summary:
Motivation:
- If we can fold it to strdup, we should (strndup does more things than strdup).
- Annotation mechanism (works well for strdup).
strdup and strndup are part of C2x (currently POSIX functions), so we should optimize them.
Reviewers: efriedma, jdoerfert
Reviewed By: jdoerfert
Subscribers: lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67679
llvm-svn: 372636
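The fold's precondition, shown with the POSIX functions themselves (a usage illustration, not the transform code):
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int main() {
      const char *s = "hello";
      // With a known bound n >= strlen(s), strndup(s, n) copies the whole
      // string, so it behaves exactly like the simpler strdup(s).
      char *a = strndup(s, 100);
      char *b = strdup(s);
      printf("%s %s\n", a, b);  // hello hello
      free(a);
      free(b);
    }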
|
| |
llvm-svn: 372098
|
| |
This introduces additional rounding error in some cases. See D67434.
This reverts r371518 (git commit 18a1f0818b659cee13865b4fad2648d85984a4ed)
llvm-svn: 371634
|
| |
llvm-svn: 371602
|
| |
This allows us to fold fmas that multiply by 0.0. Also, the
multiply-by-1.0 case is handled there as well. The fneg/fabs cases
are not handled by SimplifyFMulInst, so we need to keep them.
Reviewers: spatel, anemet, lebedev.ri
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D67351
llvm-svn: 371518
|
| |
llvm-svn: 370217
|
| |
Summary:
Example:
    define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 {
    entry:
      %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6
      ret i8* %call
    }
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: aaron.ballman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66651
llvm-svn: 370168
|
| |
Reviewers: eugenis
Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D66695
llvm-svn: 369979
|
| |
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.
Most callers had an obvious memory operation handy to provide this type, but
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.
llvm-svn: 365468
|
| |
Prefer the more exact intrinsic to remove a use of the input value
and possibly make further transforms easier (we will still need
to match patterns with funnel-shift of wider types as pieces of
bswap, especially if we want to canonicalize to funnel-shift with
constant shift amount). Discussed in D46760.
llvm-svn: 364187
|
| |
Summary: Signedness does not change the number of trailing zeros.
Reviewers: spatel, lebedev.ri, nikic
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D63546
llvm-svn: 364064
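Why the fold is safe, as a runnable check (using C++20's std::countr_zero as a stand-in for cttz):
    #include <bit>
    #include <cstdint>
    #include <cstdio>

    int main() {
      // Extending a nonzero value, signed or unsigned, only adds bits at
      // the top, so cttz(ext x) == cttz(x).
      int8_t x = -0x40;                                // bit pattern 0xC0
      uint32_t z = static_cast<uint8_t>(x);            // zext: 0x000000C0
      uint32_t s = static_cast<uint32_t>(int32_t{x});  // sext: 0xFFFFFFC0
      printf("%d %d\n", std::countr_zero(z), std::countr_zero(s));  // 6 6
    }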
|
| |
Summary: Signedness does not change the number of trailing zeros.
Reviewers: spatel, lebedev.ri, nikic
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63534
llvm-svn: 363951
|
| |
llvm-svn: 363587
|
| |
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.
llvm-svn: 363406
|
| |
Differential Revision: https://reviews.llvm.org/D63301
llvm-svn: 363339
|
| |
When the byval attribute has a type, it must match the pointee type of
any parameter; but InstCombine was not updating the attribute when
folding casts of various kinds away.
llvm-svn: 362643
|
| |
Based on the overflow direction information added in D62463, we can
now fold always overflowing signed saturating add/sub to signed min/max.
Differential Revision: https://reviews.llvm.org/D62544
llvm-svn: 362006
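Reference semantics on i8, to show why an always-overflowing saturating add is a constant (a sketch, not the InstCombine code):
    #include <cstdint>
    #include <cstdio>

    int8_t sadd_sat8(int8_t a, int8_t b) {
      int r = a + b;                      // promoted to int, never wraps
      if (r > INT8_MAX) return INT8_MAX;  // always-overflow-high -> max
      if (r < INT8_MIN) return INT8_MIN;  // always-overflow-low  -> min
      return static_cast<int8_t>(r);
    }

    int main() {
      // If value tracking proves a >= 100 and b >= 100, every execution
      // takes the first branch, so the call folds to INT8_MAX.
      printf("%d\n", sadd_sat8(100, 120));  // 127
    }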
|
| |
Reduce duplication and make it easier to handle signed
always-overflows conditions in the future.
llvm-svn: 361863
|
| |
In order to fold an always overflowing signed saturating add/sub,
we need to know in which direction the always overflow occurs.
This patch splits up AlwaysOverflows into AlwaysOverflowsLow and
AlwaysOverflowsHigh to pass through this information (but it is
not used yet).
Differential Revision: https://reviews.llvm.org/D62463
llvm-svn: 361858
|
| |
Instead pass binary op and signedness. The extra enum only makes
things more complicated in this case.
llvm-svn: 361720
|
| |
Stop turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions. For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.
We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.
I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.
llvm-svn: 361425
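For reference, the layout of the SSE4.1 ROUND* immediate that makes 0x9/0xA the right encodings:
    #include <cstdio>

    int main() {
      // imm8 bits 1:0 select the mode (00 nearest, 01 down, 10 up,
      // 11 truncate), bit 2 selects MXCSR rounding instead, and bit 3
      // (0x8) suppresses the precision exception -- required to match
      // the libm-like semantics of llvm.floor/llvm.ceil.
      const unsigned FloorImm = 0x9;  // down + no precision exception
      const unsigned CeilImm = 0xA;   // up   + no precision exception
      printf("floor=%#x ceil=%#x\n", FloorImm, CeilImm);
    }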
|
| |
llvm-svn: 360051
|
| |
truncate. NFCI.
This has no effect on constant folding but will be useful when we expand non-saturating PACKSS/PACKUS intrinsics.
llvm-svn: 359191
|
| |
llvm-svn: 359177
|
| |
llvm-svn: 359163
|
| |
llvm-svn: 359160
|
| |
folding. NFCI.
This patch rewrites the existing PACKSS/PACKUS constant folding code to use a generic expansion.
This is a first NFCI step toward expanding PACKSS/PACKUS intrinsics which are acting as non-saturating truncations (although technically the expansion could be used in all cases - but we'll probably want to be conservative).
llvm-svn: 359111
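Per-lane reference for the saturation being expanded (a scalar sketch of one PACKSS lane, not the vector code):
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    // PACKSS narrows i16 to i8 with signed saturation.
    int8_t packss_lane(int16_t v) {
      return static_cast<int8_t>(std::clamp<int16_t>(v, INT8_MIN, INT8_MAX));
    }

    int main() {
      printf("%d %d %d\n", packss_lane(300), packss_lane(-300),
             packss_lane(7));  // 127 -128 7
    }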
|
| |
Convert a masked.load of a dereferenceable address to an unconditional load
If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select(cond, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable.
Differential Revision: https://reviews.llvm.org/D59703
llvm-svn: 359000
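The transform's shape, sketched with IRBuilder (names and the caller's dereferenceability check are illustrative, not the patch's code):
    #include "llvm/IR/IRBuilder.h"

    // Given a masked.load whose pointer is known dereferenceable, emit a
    // plain load and blend with the passthrough per lane.
    llvm::Value *speculateMaskedLoad(llvm::IRBuilder<> &B, llvm::Type *VecTy,
                                     llvm::Value *Ptr, llvm::Align Alignment,
                                     llvm::Value *Mask, llvm::Value *Passthru) {
      llvm::Value *L = B.CreateAlignedLoad(VecTy, Ptr, Alignment, "unmasked");
      return B.CreateSelect(Mask, L, Passthru);  // select(mask, load, passthru)
    }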
|
| |
If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path, or a noop. In either case, it is valid to simply remove the store.
The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead.
Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store *is undefined*, and prune the path. Consensus in the review was that the more aggressive form might be a good follow on change at a later date.
Differential Revision: https://reviews.llvm.org/D60659
llvm-svn: 358919
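The legality check, sketched against the AliasAnalysis API (illustrative; the patch's actual plumbing differs):
    #include "llvm/Analysis/AliasAnalysis.h"
    #include "llvm/Analysis/MemoryLocation.h"
    #include "llvm/IR/Instructions.h"

    // A store to memory known to be constant must store the value already
    // there (or be dynamically dead), so it can simply be removed.
    bool isRemovableStore(llvm::StoreInst &SI, llvm::AAResults &AA) {
      llvm::MemoryLocation Loc = llvm::MemoryLocation::get(&SI);
      return AA.pointsToConstantMemory(Loc);
    }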
|