… false logic (PR27869)

By moving this transform to InstSimplify from InstCombine, we sidestep the problem/question
raised by PR27869:
https://llvm.org/bugs/show_bug.cgi?id=27869
...where InstCombine turns an icmp+zext into a shift, causing us to miss the fold.
Credit to David Majnemer for a draft patch of the changes to InstructionSimplify.cpp.
Differential Revision: http://reviews.llvm.org/D21512
llvm-svn: 273200

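For context, a minimal sketch (hypothetical function, not from the patch) of the icmp+zext-to-shift canonicalization the message refers to:

define i32 @zext_icmp_sketch(i32 %x) {
  ; InstCombine canonicalizes this sign-bit test...
  %cmp = icmp slt i32 %x, 0
  %z = zext i1 %cmp to i32
  ; ...into a shift of the sign bit: lshr i32 %x, 31
  ret i32 %z
}
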
Specific instances of intrinsic calls may want to be convergent, such
as certain register reads, but the intrinsic declaration is not.
llvm-svn: 273188

This seems to be causing an infinite loop / crash in instcombine
on some bots.
llvm-svn: 273069

Reapply r272987. Condition should be in terms of the destination type,
and the flags should not be copied.
llvm-svn: 273045

The motivating example for this transform is similar to D20774, where bitcasts interfere
with a single cmp/select sequence, but in this case we have 2 uses of each bitcast to
produce min and max ops:

define void @minmax_bc_store(<4 x float> %a, <4 x float> %b, <4 x float>* %ptr1, <4 x float>* %ptr2) {
  %cmp = fcmp olt <4 x float> %a, %b
  %bc1 = bitcast <4 x float> %a to <4 x i32>
  %bc2 = bitcast <4 x float> %b to <4 x i32>
  %sel1 = select <4 x i1> %cmp, <4 x i32> %bc1, <4 x i32> %bc2
  %sel2 = select <4 x i1> %cmp, <4 x i32> %bc2, <4 x i32> %bc1
  %bc3 = bitcast <4 x float>* %ptr1 to <4 x i32>*
  store <4 x i32> %sel1, <4 x i32>* %bc3
  %bc4 = bitcast <4 x float>* %ptr2 to <4 x i32>*
  store <4 x i32> %sel2, <4 x i32>* %bc4
  ret void
}

With this patch, we move the selects up to use the input args, which allows getting rid of
all of the bitcasts:

define void @minmax_bc_store(<4 x float> %a, <4 x float> %b, <4 x float>* %ptr1, <4 x float>* %ptr2) {
  %cmp = fcmp olt <4 x float> %a, %b
  %sel1.v = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
  %sel2.v = select <4 x i1> %cmp, <4 x float> %b, <4 x float> %a
  store <4 x float> %sel1.v, <4 x float>* %ptr1, align 16
  store <4 x float> %sel2.v, <4 x float>* %ptr2, align 16
  ret void
}

The asm for x86 SSE then improves from:

  movaps  %xmm0, %xmm2
  cmpltps %xmm1, %xmm2
  movaps  %xmm2, %xmm3
  andnps  %xmm1, %xmm3
  movaps  %xmm2, %xmm4
  andnps  %xmm0, %xmm4
  andps   %xmm2, %xmm0
  orps    %xmm3, %xmm0
  andps   %xmm1, %xmm2
  orps    %xmm4, %xmm2
  movaps  %xmm0, (%rdi)
  movaps  %xmm2, (%rsi)

to:

  movaps  %xmm0, %xmm2
  minps   %xmm1, %xmm2
  maxps   %xmm0, %xmm1
  movaps  %xmm2, (%rdi)
  movaps  %xmm1, (%rsi)

The TODO comments show that we're limiting this transform only to vectors and only to bitcasts
because we need to improve other transforms or risk creating worse codegen.
Differential Revision: http://reviews.llvm.org/D21190
llvm-svn: 273011

This reverts commit r272987.
This might be causing crashes on some bots.
llvm-svn: 272990

llvm-svn: 272987

The original check for load CSE or store-to-load forwarding is wrong
when the forwarded stored value happens to be a load.
Ref https://github.com/JuliaLang/julia/issues/16894
Differential Revision: http://reviews.llvm.org/D21271
Patch by Yichao Yu!
llvm-svn: 272868

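For illustration, a minimal sketch (hypothetical function, not from the patch) of store-to-load forwarding where the forwarded value is itself a load:

define i32 @forward_sketch(i32* %p, i32* %q) {
  %v = load i32, i32* %q
  store i32 %v, i32* %p
  ; store-to-load forwarding can replace this load with %v,
  ; even though %v itself came from a load
  %w = load i32, i32* %p
  ret i32 %w
}
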
… for one of the signatures of CreateShuffleVector. This better emphasises that you can't use it for the -1-as-undef behavior.
llvm-svn: 272491

llvm-svn: 272199

llvm-svn: 272196

llvm-svn: 272193

llvm-svn: 272191

Avoids unnecessary copies. All changes audited & pass tests with asan.
No functional change intended.
llvm-svn: 272190

As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.
llvm-svn: 272126

… native shifts

Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit).
If the shift amount is constant, we can sometimes convert these instructions to native shifts:
1 - if all shift amounts are in range, the conversion is trivial.
2 - out of range arithmetic shifts can be clamped to (bitwidth - 1) (a legal shift amount) before conversion.
3 - logical shifts just return zero if all elements have out of range shift amounts.
In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift, or as an UNDEF in the 'all out of range' zero-constant special case for logical shifts.
Differential Revision: http://reviews.llvm.org/D19675
llvm-svn: 271996

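As an illustration of case 1 above, a minimal sketch (hypothetical function, not from the patch) of a per-element logical shift with constant, in-range amounts that can become a native IR shift:

declare <4 x i32> @llvm.x86.avx2.psrlv.d(<4 x i32>, <4 x i32>)

define <4 x i32> @psrlv_const_sketch(<4 x i32> %v) {
  ; all amounts are in range, so this is expected to simplify to:
  ;   lshr <4 x i32> %v, <i32 0, i32 1, i32 2, i32 3>
  %r = call <4 x i32> @llvm.x86.avx2.psrlv.d(<4 x i32> %v, <4 x i32> <i32 0, i32 1, i32 2, i32 3>)
  ret <4 x i32> %r
}
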
This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions.
The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs.
Differential Revision: http://reviews.llvm.org/D20998
llvm-svn: 271990

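A minimal sketch of the zero-input case (hypothetical function, not from the patch):

declare i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8>)

define i32 @movmsk_zero_sketch() {
  ; an all-zero vector has no sign bits set, so the call is expected to fold to 0
  %r = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> zeroinitializer)
  ret i32 %r
}
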
scalarizePHI only looked for phis that have exactly two uses - the "latch"
use, and an extract. Unfortunately, we cannot assume all equivalent extracts
are CSE'd, since InstCombine itself may create an extract which is a duplicate
of an existing one. This extends it to handle several distinct extracts from
the same index.
This should fix at least some of the performance regressions from PR27988.
Differential Revision: http://reviews.llvm.org/D20983
llvm-svn: 271961

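A rough sketch (hypothetical function, not from the patch) of the shape involved: a vector phi whose uses are the loop latch plus more than one extract of the same index:

define float @scalarize_phi_sketch(<4 x float> %init, i32 %n) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %vec = phi <4 x float> [ %init, %entry ], [ %vec.next, %loop ]
  ; two extracts of the same element; they may not have been CSE'd yet
  %e1 = extractelement <4 x float> %vec, i32 0
  %e2 = extractelement <4 x float> %vec, i32 0
  %sum = fadd float %e1, %e2
  %vec.next = insertelement <4 x float> %vec, float %sum, i32 0
  %i.next = add i32 %i, 1
  %cond = icmp slt i32 %i.next, %n
  br i1 %cond, label %loop, label %exit

exit:
  ret float %sum
}
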
In r271810 ( http://reviews.llvm.org/rL271810 ), I loosened the check
above this to work for any Constant rather than ConstantInt. AFAICT,
that part makes sense if we can determine that the shrunken/extended
constant remained equal. But it doesn't make sense for this later
transform where we assume that the constant DID change.
This could assert for a ConstantExpr:
https://llvm.org/bugs/show_bug.cgi?id=28011
And it could be wrong for a vector as shown in the added regression test.
llvm-svn: 271908

Since FoldOpIntoPhi speculates the binary operation into potentially each
of the predecessors of the PHI node (pulling it out of arbitrary control
dependence in the process), we can apply FoldOpIntoPhi only if we know the
operation doesn't have UB.
This also brings up an interesting profitability question -- the way it
is written today, commonIRemTransforms will hoist out work from
dynamically dead code into code that will execute at runtime.  Perhaps
that isn't the best canonicalization?
Fixes PR27968.
llvm-svn: 271857

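A rough sketch of the hazard (hypothetical code, not the PR27968 reproducer): speculating the rem into a predecessor makes it execute on paths where the original program never evaluated it:

define i32 @fold_into_phi_sketch(i32 %x, i32 %d, i1 %flag) {
entry:
  br i1 %flag, label %checked, label %other

checked:
  ; this guard is what makes the urem below safe on the path through %checked
  %ok = icmp ne i32 %d, 0
  br i1 %ok, label %do.rem, label %exit

other:
  br label %do.rem

do.rem:
  %p = phi i32 [ %d, %checked ], [ 7, %other ]
  ; hoisting this urem into %checked would evaluate 'urem %x, %d' even when
  ; %d == 0 (the %checked to %exit path), introducing UB the original program
  ; did not have
  %r = urem i32 %x, %p
  br label %exit

exit:
  %res = phi i32 [ %r, %do.rem ], [ 0, %checked ]
  ret i32 %res
}
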
llvm-svn: 271843

llvm-svn: 271839

Change the name of the ICmpInst to 'ICmp' and the Constant (was a ConstantInt) to 'C',
so that it's hopefully clearer that 'CI' refers to CastInst in this context.
While we're scrubbing, fix the documentation comment and use 'auto' with 'dyn_cast'.
llvm-svn: 271817

This is step 1 of unknown towards fixing PR28001:
https://llvm.org/bugs/show_bug.cgi?id=28001
llvm-svn: 271810

llvm-svn: 271807

llvm-svn: 271804

Add the MMX implementation to the SimplifyDemandedUseBits SSE/AVX MOVMSK support added in D19614.
Requires a minor tweak, as llvm.x86.mmx.pmovmskb takes an x86_mmx argument - so we have to be explicit about the implied v8i8 vector type.
llvm-svn: 271789

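A minimal sketch (hypothetical function, and assuming the MMX mask occupies only the low 8 bits of the i32 result) of the kind of demanded-bits fact this enables:

declare i32 @llvm.x86.mmx.pmovmskb(x86_mmx)

define i32 @mmx_movmsk_sketch(x86_mmx %a) {
  ; pmovmskb on an 8-byte MMX value produces an 8-bit mask, so the upper
  ; bits of %m are known zero and the 'and' is expected to be removable
  %m = call i32 @llvm.x86.mmx.pmovmskb(x86_mmx %a)
  %r = and i32 %m, 255
  ret i32 %r
}
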
There was concern that creating bitcasts for the simpler potential select pattern:

define <2 x i64> @vecBitcastOp1(<4 x i1> %cmp, <2 x i64> %a) {
  %a2 = add <2 x i64> %a, %a
  %sext = sext <4 x i1> %cmp to <4 x i32>
  %bc = bitcast <4 x i32> %sext to <2 x i64>
  %and = and <2 x i64> %a2, %bc
  ret <2 x i64> %and
}

might lead to worse code for some targets, so this patch matches the larger
patterns seen in the test cases.

The motivating example for this patch is this IR produced via SSE intrinsics in C:

define <2 x i64> @gibson(<2 x i64> %a, <2 x i64> %b) {
  %t0 = bitcast <2 x i64> %a to <4 x i32>
  %t1 = bitcast <2 x i64> %b to <4 x i32>
  %cmp = icmp sgt <4 x i32> %t0, %t1
  %sext = sext <4 x i1> %cmp to <4 x i32>
  %t2 = bitcast <4 x i32> %sext to <2 x i64>
  %and = and <2 x i64> %t2, %a
  %neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
  %neg2 = bitcast <4 x i32> %neg to <2 x i64>
  %and2 = and <2 x i64> %neg2, %b
  %or = or <2 x i64> %and, %and2
  ret <2 x i64> %or
}

For an AVX target, this is currently:

  vpcmpgtd  %xmm1, %xmm0, %xmm2
  vpand     %xmm0, %xmm2, %xmm0
  vpandn    %xmm1, %xmm2, %xmm1
  vpor      %xmm1, %xmm0, %xmm0
  retq

With this patch, it becomes:

  vpmaxsd   %xmm1, %xmm0, %xmm0

Differential Revision: http://reviews.llvm.org/D20774
llvm-svn: 271676

This is effectively a revert of:
http://reviews.llvm.org/rL249702 - [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886)
and:
http://reviews.llvm.org/rL249701 - [ValueTracking] teach computeKnownBits that a fabs() clears sign bits
and a reimplementation as a DAG combine for targets that have IEEE754-compliant fabs/fneg instructions.
This is intended to resolve the objections raised on the dev list:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098154.html
and:
https://llvm.org/bugs/show_bug.cgi?id=24886#c4
In the interest of patch minimalism, I've only partly enabled AArch64. PowerPC, MIPS, x86 and others can be enabled later.
Differential Revision: http://reviews.llvm.org/D19391
llvm-svn: 271573

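For reference, a minimal sketch (hypothetical function, not from the patch) of the sign-bit masking pattern that rL249702 used to turn into a fabs() call:

define float @fabs_mask_sketch(float %x) {
  ; clearing the sign bit of the float bit pattern is equivalent to fabs(%x)
  %i = bitcast float %x to i32
  %m = and i32 %i, 2147483647
  %f = bitcast i32 %m to float
  ret float %f
}
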
This is effectively NFC because we already do this transform after r175380:
http://reviews.llvm.org/rL175380
and also via foldBoolSextMaskToSelect().
This change should just make it a bit more efficient to match the pattern.
The original guard was added in r95058:
http://reviews.llvm.org/rL95058
A sampling of codegen for current in-tree targets shows no problems. This
makes sense given that we're already producing the vector selects via the
other transforms.
llvm-svn: 271554

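For context, a rough sketch of the bool-sext masking pattern that becomes a select; this is an assumption based on the name foldBoolSextMaskToSelect, not text from the patch:

define <4 x i32> @bool_sext_mask_sketch(<4 x i1> %b, <4 x i32> %x) {
  ; and(sext(bool), x) keeps %x where %b is true and yields 0 where it is false,
  ; i.e. it can be rewritten as:
  ;   select <4 x i1> %b, <4 x i32> %x, <4 x i32> zeroinitializer
  %s = sext <4 x i1> %b to <4 x i32>
  %and = and <4 x i32> %s, %x
  ret <4 x i32> %and
}
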
This adds support to the backend to actually support SjLj EH as an exception
model.  This is *NOT* the default model, and requires explicitly opting into it
from the frontend.  GCC supports this model, and for MinGW it can still be enabled
via the `--using-sjlj-exceptions` option.
Addresses PR27749!
llvm-svn: 271244

… them. Auto upgrade to native unaligned store instructions.
llvm-svn: 271236

… intrinsics with generic IR (llvm)

This patch removes the llvm VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX some time ago, so much of that implementation can be reused.
Reapplied now that the companion patch (D20684), which removes/auto-upgrades the clang intrinsics, has been committed.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 271131

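A minimal sketch of the kind of auto-upgrade described, using one representative intrinsic; the function name and the assumed intrinsic signature are illustrative, not taken from the patch:

declare <8 x i32> @llvm.x86.avx2.pmovsxwd(<8 x i16>)

define <8 x i32> @pmovsx_upgrade_sketch(<8 x i16> %a) {
  ; expected upgrade: %r = sext <8 x i16> %a to <8 x i32>
  %r = call <8 x i32> @llvm.x86.avx2.pmovsxwd(<8 x i16> %a)
  ret <8 x i32> %r
}
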
We need to enhance the pattern matching on these to look through bitcasts.
llvm-svn: 271051

… extension intrinsics with generic IR (llvm)
llvm-svn: 270976

… generic IR (llvm)

This patch removes the llvm VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX some time ago, so much of that implementation can be reused.
A companion patch (D20684) removes/auto-upgrades the clang intrinsics.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 270973

Fixes PR27824.
Differential Revision: http://reviews.llvm.org/D20591.
llvm-svn: 270853

… the same places that the SSE/SSE2 storeu intrinsics appear.
I don't really know how to test this. Just seemed like we should be consistent.
llvm-svn: 270819

Also, rename recognizeBitReverseOrBSwapIdiom to recognizeBSwapOrBitReverseIdiom,
so the ordering of the MatchBSwaps and MatchBitReversals arguments is
consistent with the function name.
llvm-svn: 270715

When an aggregate contains an opaque type, its size cannot be
determined. This triggers an "Invalid GetElementPtrInst indices for type" assert
in function checkGEPType. The fix suppresses the conversion in this case.
http://reviews.llvm.org/D20319
llvm-svn: 270479

llvm-svn: 270372

… address and result of load instructions

This patch fixes https://llvm.org/bugs/show_bug.cgi?id=27703.
If there is a sequence of one or more load instructions where each loaded value is used as the address of a later load, and a bitcast is necessary to change the value type, don't optimize it.
llvm-svn: 270135

It doesn't matter if the comparison is unsigned; the inc/dec is always signed.
llvm-svn: 269831

This bug was introduced in r269728 and is the likely cause of many stage 2 ubsan bot failures.
I'll add a test in a follow-up commit assuming this fixes things properly.
llvm-svn: 269797

Fixes PR27786.
llvm-svn: 269757

llvm-svn: 269729

… icmp (PR27756)

Fix a bug introduced with rL269426:
[InstCombine] canonicalize* LE/GE vector integer comparisons to LT/GT (PR26701, PR26819)
We were assuming that a ConstantDataVector / ConstantVector / ConstantAggregateZero operand of
an ICMP was composed of ConstantInt elements, but it might have ConstantExpr or UndefValue
elements. Handle those appropriately.
Also, refactor this function to join the scalar and vector paths and eliminate the switches.
Differential Revision: http://reviews.llvm.org/D20289
llvm-svn: 269728

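A minimal sketch (hypothetical function, not from the patch) of a vector compare whose constant operand contains an undef element, the kind of operand the earlier code mishandled:

define <2 x i1> @icmp_undef_elt_sketch(<2 x i32> %x) {
  ; the constant vector mixes a ConstantInt element with an undef element
  %cmp = icmp sle <2 x i32> %x, <i32 7, i32 undef>
  ret <2 x i1> %cmp
}
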
llvm-svn: 269494

Summary: Rename DataLayout::getLargestLegalIntTypeSize to DataLayout::getLargestLegalIntTypeSizeInBits() to prevent mistakes similar to the one fixed in r269433.
Reviewers: joker.eph, mcrosier
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20248
llvm-svn: 269456

Enhancement to: http://reviews.llvm.org/rL269426
With discussion in: http://reviews.llvm.org/D17859
This should complete the fixes for PR26701 and PR26819:
https://llvm.org/bugs/show_bug.cgi?id=26701
https://llvm.org/bugs/show_bug.cgi?id=26819
llvm-svn: 269439

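A minimal sketch (hypothetical function, not from the patch) of the LE-to-LT canonicalization on a vector splat constant, with the constant adjusted to keep the comparison equivalent:

define <4 x i1> @sle_to_slt_sketch(<4 x i32> %x) {
  ; expected canonical form: icmp slt <4 x i32> %x, <i32 8, i32 8, i32 8, i32 8>
  %cmp = icmp sle <4 x i32> %x, <i32 7, i32 7, i32 7, i32 7>
  ret <4 x i1> %cmp
}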