floating point to integer conversion intrinsics.
llvm-svn: 289639

add/sub/mul/div/max/min intrinsics better.
Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking.
llvm-svn: 289636
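To illustrate the fold, an editor's sketch in LLVM IR (using @llvm.x86.sse.min.ss as a stand-in for any of the affected scalar intrinsics): the instruction writes only element 0 and passes elements 1-3 through from its first operand, so when element 0 of the result is never demanded the call can be removed.

  ; only element 1 of the result is demanded; min.ss writes only element 0
  %r = call <4 x float> @llvm.x86.sse.min.ss(<4 x float> %a, <4 x float> %b)
  %e = extractelement <4 x float> %r, i32 1
  ; elements 1-3 pass through from %a, so this becomes
  %e = extractelement <4 x float> %a, i32 1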
SimplifyDemandedVectorElts.
Now we pass a modified version of DemandedElts to each operand and we calculate undef elts correctly.
llvm-svn: 289632

intrinsics more correctly.
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0.
Also calculate UndefElts correctly.
Simplify InstCombineCalls for these intrinsics to just call SimplifyDemandedVectorElts for the call instruction to reuse this support.
llvm-svn: 289629

min/max/cmp intrinsics more correctly.
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused.
Also calculate UndefElts correctly.
Simplify InstCombineCalls for these intrinsics to just call SimplifyDemandedVectorElts for the call instruction to reuse this support.
llvm-svn: 289628

intrinsics correctly.
Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed.
Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support.
llvm-svn: 289523

scalar FMA intrinsics.
These intrinsics don't read the upper bits of their second and third inputs so we can try to simplify them.
llvm-svn: 289372
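A sketch of what this enables, assuming the scalar FMA intrinsic @llvm.x86.fma.vfmadd.sd with three <2 x double> operands (the exact intrinsic names vary across LLVM versions): stores to the upper element of the second or third operand are never observed by the call.

  ; the upper element of %c is never read by the scalar FMA
  %c1 = insertelement <2 x double> %c, double %x, i32 1
  %r = call <2 x double> @llvm.x86.fma.vfmadd.sd(<2 x double> %a,
         <2 x double> %b, <2 x double> %c1)
  ; -> the insertelement is dead and %c can be used directly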
scalar cmp intrinsics with masking and rounding.
These intrinsics don't read the upper elements of their first and second input. These are slightly different from the SSE version, which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead.
llvm-svn: 289371

elements for scalar add,div,mul,sub,max,min intrinsics with masking and rounding.
These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well.
llvm-svn: 289370

to match 128 and 256-bit.
llvm-svn: 289354

shufflevector if the indices are constant.
llvm-svn: 289348

We were a little sloppy with adding tailcall markers. Be more
consistent by using setTailCallKind instead of setTailCall.
llvm-svn: 287955

for variable shift with 16-bit elements.
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional intrinsics to the switches.
llvm-svn: 287316

Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto-upgrade them to extractelements, a scalar operation, and then an insertelement. This matches the sequence used by clang's intrinsic file.
Reviewers: zvi, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26660
llvm-svn: 287083
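For example (an editor's sketch; @llvm.x86.sse.add.ss is representative of the removed intrinsics), the auto-upgrade rewrites

  %r = call <4 x float> @llvm.x86.sse.add.ss(<4 x float> %a, <4 x float> %b)

into the extract/scalar-op/insert sequence:

  %a0 = extractelement <4 x float> %a, i32 0
  %b0 = extractelement <4 x float> %b, i32 0
  %s = fadd float %a0, %b0
  %r = insertelement <4 x float> %a, float %s, i32 0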
AVX-512 variable shift intrinsics.
llvm-svn: 286755

shift by immediate and shift by single value.
This does not include support for the AVX-512 variable shifts. That will be coming in a future patch.
llvm-svn: 286739

logic.
This fixes a similar issue to the one already fixed by r280804
(reviewed in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.
llvm-svn: 280807

operands of extrq/extrqi intrinsic calls.
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi.
The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElement may return nullptr if the constant element
cannot be retrieved. So, all the dyn_casts can potentially fail. This is what
happens for example if a constexpr value is passed in input to an extrq/extrqi
intrinsic call.
This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.
Added reproducible test cases to x86-sse4a.ll.
Differential Revision: https://reviews.llvm.org/D24256
llvm-svn: 280804
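A sketch of the kind of input that used to trip the assertion (hypothetical example; any constant expression in the mask operand behaves the same way, and @g is an assumed external global):

  @g = external global i8

  ; the mask operand is a ConstantExpr rather than a ConstantVector, so
  ; Constant::getAggregateElement returns nullptr and the old
  ; dyn_cast<ConstantInt> was unsafe
  %r = call <2 x i64> @llvm.x86.sse4a.extrq(<2 x i64> %a,
         <16 x i8> bitcast (<2 x i64> <i64 ptrtoint (i8* @g to i64), i64 0>
                            to <16 x i8>))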
memcpy with ld/st.
When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.
Differential Revision: https://reviews.llvm.org/D23499
llvm-svn: 280617
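Concretely (a sketch; the memcpy intrinsic of that era still took an explicit alignment argument, and %sp/%dp stand for the suitably bitcast source and destination pointers):

  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %d, i8* %s, i32 4, i32 4, i1 false),
        !llvm.mem.parallel_loop_access !0
  ; after the patch, the expanded load/store pair keeps the metadata
  %v = load i32, i32* %sp, align 4, !llvm.mem.parallel_loop_access !0
  store i32 %v, i32* %dp, align 4, !llvm.mem.parallel_loop_access !0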
llvm-svn: 280615

This allows more of the OCML builtin library to be
constant folded.
llvm-svn: 280586

Summary: Also add popcount(n) == bitsize(n) -> n == -1 transformation.
Reviewers: majnemer, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23134
llvm-svn: 279141
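The added transformation in IR form (a sketch for i32, where the bit width is 32):

  %p = call i32 @llvm.ctpop.i32(i32 %n)
  %c = icmp eq i32 %p, 32    ; popcount(n) == bitsize(n)
  ; ->
  %c = icmp eq i32 %n, -1    ; true iff all bits of %n are set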
Follow up to r278902. I had missed "fall through", with a space.
llvm-svn: 278970

Differential revision: https://reviews.llvm.org/D23291
llvm-svn: 278364

llvm-svn: 278342

llvm-svn: 278340

llvm-svn: 278339

Note that this fold really belongs in InstSimplify.
Refactoring here anyway as an intermediate step because
there's a planned addition to this function in D23134.
Differential Revision: https://reviews.llvm.org/D23223
llvm-svn: 277883

llvm-svn: 277792

Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.
This is possible for code like this:
  for (int i = 0; i < 3; i++) {
    int x;
    p = &x;
  }
"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.
PR27453
Reviewers: eugenis
Differential Revision: https://reviews.llvm.org/D22842
llvm-svn: 277072

This reverts commits r277068 r277067 r277066.
llvm-svn: 277071

Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.
This is possible for code like this:
  for (int i = 0; i < 3; i++) {
    int x;
    p = &x;
  }
"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.
PR27453
Reviewers: eugenis
Differential Revision: https://reviews.llvm.org/D22842
llvm-svn: 277068

llvm-svn: 277067

llvm-svn: 277066

We were able to fold masked loads with an all-ones mask to a normal
load. However, we couldn't turn a masked load whose mask mixes
ones and undefs into a normal load.
llvm-svn: 275380
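A sketch of the new fold (the mangled name of the masked-load intrinsic varies by LLVM version):

  %v = call <4 x float> @llvm.masked.load.v4f32(<4 x float>* %p, i32 4,
         <4 x i1> <i1 true, i1 undef, i1 true, i1 true>, <4 x float> undef)
  ; a mask mixing ones and undefs can be treated as all-ones:
  %v = load <4 x float>, <4 x float>* %p, align 4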
This transform doesn't require any new instructions; it can safely live
in InstSimplify.
llvm-svn: 275344

This actually uncovered a surprisingly large chain of ultimately unused
TLI args.
From what I can gather, this argument is a remnant of when
isKnownNonNull would look at the TLI directly.
The current approach seems to be that InferFunctionAttrs runs early in
the pipeline and uses TLI to annotate the TLI-dependent non-null
information as return attributes.
This also removes the dependence of functionattrs on TLI altogether.
llvm-svn: 274455

Specific instances of intrinsic calls may want to be convergent, such
as certain register reads, even though the intrinsic declaration is not.
llvm-svn: 273188

for one of the signatures of CreateShuffleVector. This better emphasises that you can't use it for the "-1 as undef" behavior.
llvm-svn: 272491

As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.
llvm-svn: 272126

native shifts
Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit).
If the shift amount is constant we can sometimes convert these instructions to native shifts:
1 - if all shift amounts are in range then the conversion is trivial.
2 - out of range arithmetic shifts can be clamped to (bitwidth - 1) (a legal shift amount) before conversion.
3 - logical shifts just return zero if all elements have out of range shift amounts.
In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts.
Differential Revision: http://reviews.llvm.org/D19675
llvm-svn: 271996
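Two of the three cases as IR sketches:

  ; case 1 - all shift amounts in range: convert to a native vector shift
  %r = call <4 x i32> @llvm.x86.avx2.psllv.d(<4 x i32> %v,
         <4 x i32> <i32 1, i32 2, i32 3, i32 4>)
  ; -> %r = shl <4 x i32> %v, <i32 1, i32 2, i32 3, i32 4>

  ; case 3 - logical shift with every amount out of range: result is zero
  %z = call <4 x i32> @llvm.x86.avx2.psrlv.d(<4 x i32> %v,
         <4 x i32> <i32 32, i32 32, i32 32, i32 32>)
  ; -> %z is replaced by zeroinitializer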
This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions.
The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs.
Differential Revision: http://reviews.llvm.org/D20998
llvm-svn: 271990
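For instance (a sketch of the SSE case):

  ; the sign bits of the constant elements are 1,0,1,0
  %m = call i32 @llvm.x86.sse.movmsk.ps(<4 x float>
         <float -1.0, float 1.0, float -2.0, float 3.0>)
  ; -> %m = 5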
them. Auto upgrade to native unaligned store instructions.
llvm-svn: 271236
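The upgrade replaces the intrinsic with a plain unaligned store; a sketch with one of the old SSE2 store intrinsics:

  call void @llvm.x86.sse2.storeu.dq(i8* %p, <16 x i8> %v)
  ; auto-upgraded to
  %q = bitcast i8* %p to <16 x i8>*
  store <16 x i8> %v, <16 x i8>* %q, align 1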
intrinsics with generic IR (llvm)
This patch removes the (V)PMOVSX and (V)PMOVZX sign/zero extension llvm intrinsics and auto-upgrades them to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX some time ago, so much of that implementation can be reused.
Reapplied now that the companion patch (D20684), which removes/auto-upgrades the clang intrinsics, has been committed.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 271131
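The upgrade pattern as an IR sketch for one SSE4.1 variant:

  %r = call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %a)
  ; becomes a shuffle of the low half plus a zext
  %lo = shufflevector <8 x i16> %a, <8 x i16> undef,
          <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %r = zext <4 x i16> %lo to <4 x i32>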
extension intrinsics with generic IR (llvm)
llvm-svn: 270976

generic IR (llvm)
This patch removes the (V)PMOVSX and (V)PMOVZX sign/zero extension llvm intrinsics and auto-upgrades them to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX some time ago, so much of that implementation can be reused.
A companion patch (D20684) removes/auto-upgrades the clang intrinsics.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 270973

the same places that the SSE/SSE2 storeu intrinsics appear.
I don't really know how to test this. Just seemed like we should be consistent.
llvm-svn: 270819

When a va_start or va_copy is immediately followed by a va_end (ignoring
debug information or other start/end in between), then it is safe to
remove the pair. As this code shares some commonalities with the lifetime
markers, this has been factored to helper functions.
This InstCombine pattern kicks in 3 times when running the LLVM test
suite.
llvm-svn: 269033
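In IR the removable pattern looks like this (sketch):

  call void @llvm.va_start(i8* %ap)
  ; no va_arg, va_copy or other use of %ap in between (debug intrinsics
  ; are ignored)
  call void @llvm.va_end(i8* %ap)
  ; -> both calls are deleted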
accept UNDEF elements.
llvm-svn: 268206

UNDEF elements.
llvm-svn: 268204