path: root/llvm/test/Transforms
Commit message (Author, Date; files changed, lines -/+)
...
* [InstCombine] dropRedundantMaskingOfLeftShiftInput(): propagate undef shift amounts (Roman Lebedev, 2019-10-07; 5 files, -5/+5)

  Summary: When we do `ConstantExpr::getZExt()`, that "extends" `undef` to `0`, which means that for patterns a/b we'd assume that we must not produce any bits for that channel, while in reality we simply didn't care about that channel - i.e. we don't need to mask it.

  Reviewers: spatel
  Reviewed By: spatel
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D68239

  llvm-svn: 373960
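  A minimal lane-wise sketch (illustrative, not one of the updated tests): with an undef shift amount in one mask lane, that lane was never constrained, so the fold may keep undef for it instead of pessimistically treating it as 0.

  ```
  define <2 x i32> @sketch(<2 x i32> %x) {
    ; (%x & (-1 >> maskNbits)) << shiftNbits, with an undef amount in lane 1
    %mask = lshr <2 x i32> <i32 -1, i32 -1>, <i32 1, i32 undef>
    %masked = and <2 x i32> %x, %mask
    %r = shl <2 x i32> %masked, <i32 1, i32 undef>
    ret <2 x i32> %r
  }
  ```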
* [SampleFDO] Add compression support for any section in ExtBinary profile format (Wei Mi, 2019-10-07; 3 files, -2/+125)

  Previously the ExtBinary profile format only supported zlib compression for the profile symbol list. This patch extends compression support to any section; the user can select some or all of the sections to compress. In an experiment with a 45M profile in ExtBinary format, compressing the name table reduced its size to 24M, and compressing all sections reduced it to 11M.

  Differential Revision: https://reviews.llvm.org/D68253

  llvm-svn: 373914
* [LoopVectorize] add test that asserted after cost model change (PR43582); NFC (Sanjay Patel, 2019-10-07; 1 file, -0/+127)

  llvm-svn: 373913
* Revert "[SLP] avoid reduction transform on patterns that the backend can ↵Martin Storsjo2019-10-071-104/+52
| | | | | | | | | | load-combine" This reverts SVN r373833, as it caused a failed assert "Non-zero loop cost expected" on building numerous projects, see PR43582 for details and reproduction samples. llvm-svn: 373882
* [InstCombine] fold fneg disguised as select+fmul (PR43497) (Sanjay Patel, 2019-10-06; 1 file, -12/+16)

  Extends rL373230 and solves the motivating bug (although in a narrow way): https://bugs.llvm.org/show_bug.cgi?id=43497

  llvm-svn: 373851
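  A sketch of the shape being folded (illustrative only; the real fold also requires appropriate fast-math flags, mirroring the integer version in rL373230 below):

  ```
  define float @fneg_sel(i1 %cond, float %x) {
    %sel = select i1 %cond, float -1.0, float 1.0
    %r = fmul float %sel, %x
    ; => (given the needed FMF)
    ;   %n = fsub float -0.0, %x
    ;   %r = select i1 %cond, float %n, float %x
    ret float %r
  }
  ```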
* [InstCombine] add fast-math-flags for better test coverage; NFC (Sanjay Patel, 2019-10-06; 1 file, -4/+4)

  llvm-svn: 373848
* [InstCombine] don't assume 'inbounds' for bitcast pointer to GEP transform (PR43501) (Sanjay Patel, 2019-10-06; 5 files, -13/+48)

  https://bugs.llvm.org/show_bug.cgi?id=43501
  We can't declare a GEP 'inbounds' in general. But we may salvage that information if we have known dereferenceable bytes on the source pointer.

  Differential Revision: https://reviews.llvm.org/D68244

  llvm-svn: 373847
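  A minimal sketch of the pattern (hypothetical function, not from the changed tests):

  ```
  define i32* @first_elt([4 x i32]* %p) {
    ; %g stands in for a 'bitcast [4 x i32]* %p to i32*'; 'inbounds' may only
    ; be added here if %p is known to have enough dereferenceable bytes
    %g = getelementptr [4 x i32], [4 x i32]* %p, i64 0, i64 0
    ret i32* %g
  }
  ```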
* [SLP] avoid reduction transform on patterns that the backend can load-combine (Sanjay Patel, 2019-10-05; 1 file, -52/+104)

  I don't see an ideal solution to these 2 related, potentially large, perf regressions:
  https://bugs.llvm.org/show_bug.cgi?id=42708
  https://bugs.llvm.org/show_bug.cgi?id=43146

  We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load combine opportunities requires that it recognize patterns that could be combined later, but not do the optimization itself (it's not a vector combine anyway, so it's probably out of scope for SLP).

  Here, we add a scalar cost model adjustment with a conservative pattern match and cost summation for a multi-instruction sequence that can probably be reduced later. This should prevent SLP from creating a vector reduction unless that sequence is extremely cheap.

  In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like:

    movbe rax, qword ptr [rdi]

  or:

    mov rax, qword ptr [rdi]

  Not some (half) vector monstrosity as we currently do using SLP:

    vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
    vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
    movzx eax, byte ptr [rdi]
    movzx ecx, byte ptr [rdi + 5]
    shl rcx, 40
    movzx edx, byte ptr [rdi + 6]
    shl rdx, 48
    or rdx, rcx
    movzx ecx, byte ptr [rdi + 7]
    shl rcx, 56
    or rcx, rdx
    or rcx, rax
    vextracti128 xmm1, ymm0, 1
    vpor xmm0, xmm0, xmm1
    vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
    vpor xmm0, xmm0, xmm1
    vmovq rax, xmm0
    or rax, rcx
    vzeroupper
    ret

  Differential Revision: https://reviews.llvm.org/D67841

  llvm-svn: 373833
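  For reference, a minimal sketch (not taken from the test diff) of the scalar shape the backend can later combine into a single wide load - zext'd adjacent byte loads, shifted and OR'd together:

  ```
  define i16 @load16_le(i8* %p) {
    %p1 = getelementptr inbounds i8, i8* %p, i64 1
    %b0 = load i8, i8* %p
    %b1 = load i8, i8* %p1
    %z0 = zext i8 %b0 to i16
    %z1 = zext i8 %b1 to i16
    %s1 = shl i16 %z1, 8
    ; little-endian: equivalent to one 'load i16' from %p
    %r = or i16 %s1, %z0
    ret i16 %r
  }
  ```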
* Invalidate assumption cache before outlining. (Aditya Kumar, 2019-10-04; 1 file, -0/+12)

  Subscribers: llvm-commits
  Tags: #llvm
  Reviewers: compnerd, vsk, sebpop, fhahn, tejohnson
  Reviewed by: vsk
  Differential Revision: https://reviews.llvm.org/D68478

  llvm-svn: 373807
* [InstCombine] Fold 'icmp eq/ne (?trunc (lshr/ashr %x, bitwidth(x)-1)), 0' -> 'icmp sge/slt %x, 0' (Roman Lebedev, 2019-10-04; 2 files, -13/+9)

  We do indeed already get it right in some cases, but only transitively, with one-use restrictions. Since we only need to produce a single comparison, it makes sense to match the pattern directly: https://rise4fun.com/Alive/kPg

  llvm-svn: 373802
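  A sketch of one instance (illustrative): the sign bit survives the trunc, so we can compare %x directly.

  ```
  define i1 @sign_test(i64 %x) {
    %s = ashr i64 %x, 63
    %t = trunc i64 %s to i32
    %r = icmp eq i32 %t, 0
    ; => %r = icmp sge i64 %x, 0
    ret i1 %r
  }
  ```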
* [InstCombine] Right-shift shift amount reassociation with truncation (PR43564, PR42391) (Roman Lebedev, 2019-10-04; 3 files, -89/+33)

  Initially (D65380) I believed that if we have rightshift-trunc-rightshift, we can't do any folding. But as it usually happens, I was wrong.
  https://rise4fun.com/Alive/GEw
  https://rise4fun.com/Alive/gN2O

  In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have this very sequence of two right shifts separated by a trunc. And it "just" so happens that we can apparently fold the pattern if the total shift amount is either 0, or equal to the bitwidth of the innermost widest shift - i.e. if we are left with only the original sign bit. Which is exactly what is wanted there.

  llvm-svn: 373801
* [NFC][InstCombine] Autogenerate shift.ll test (Roman Lebedev, 2019-10-04; 1 file, -114/+114)

  llvm-svn: 373800
* [NFC][InstCombine] Autogenerate icmp-shr-lt-gt.ll test (Roman Lebedev, 2019-10-04; 1 file, -88/+89)

  llvm-svn: 373799
* [NFC][InstCombine] Tests for bit test via highest sign-bit extract (w/ trunc) (PR43564) (Roman Lebedev, 2019-10-04; 1 file, -0/+182)

  https://rise4fun.com/Alive/x5IS

  llvm-svn: 373798
* [NFC][InstCombine] Tests for right-shift shift amount reassociation (w/ trunc) (PR43564, PR42391) (Roman Lebedev, 2019-10-04; 3 files, -34/+402)

  https://rise4fun.com/Alive/GEw

  llvm-svn: 373797
* [InstCombine] add tests for fneg disguised as fmul; NFC (Sanjay Patel, 2019-10-04; 1 file, -0/+74)

  llvm-svn: 373788
* [FPEnv] Strict FP tests should use the requisite function attributes. (Kevin P. Neal, 2019-10-04; 3 files, -6/+15)

  A set of function attributes is required in any function that uses constrained floating point intrinsics. None of our tests used these attributes; this patch fixes that.

  These tests have been tested against the IR verifier changes in D68233.

  Reviewed by: andrew.w.kaylor, cameron.mcinally, uweigand
  Approved by: andrew.w.kaylor
  Differential Revision: https://reviews.llvm.org/D67925

  llvm-svn: 373761
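  A minimal sketch of the shape involved (assuming the strictfp attribute is among what the D68233-era verifier checks; the exact required attribute set may differ):

  ```
  define double @fadd_constrained(double %a, double %b) strictfp {
    %r = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
    ret double %r
  }

  declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
  ```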
* LowerTypeTests: Rename local functions to avoid collisions with identically named functions in ThinLTO modules. (Peter Collingbourne, 2019-10-03; 1 file, -0/+15)

  Without this we can encounter link errors or incorrect behaviour at runtime as a result of the wrong function being referenced.

  Differential Revision: https://reviews.llvm.org/D67945

  llvm-svn: 373678
* [MemorySSA] Don't hoist stores if interfering uses (as calls) exist. (Alina Sbirlea, 2019-10-03; 1 file, -1/+1)

  llvm-svn: 373674
* [NFC][InstCombine] Some tests for sub-of-negatible pattern (Roman Lebedev, 2019-10-03; 1 file, -0/+292)

  As we have previously established, `sub` is an outcast and should be considered non-canonical iff it can be converted to `add`. It can be converted to `add` if its second operand can be negated. So far we mostly only do that for constants and negation itself, but we should be more thorough.

  llvm-svn: 373597
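  One simple instance of the pattern (illustrative, not copied from the new tests): when the second operand is itself a negation, the sub can become an add.

  ```
  define i32 @sub_of_negatible(i32 %x, i32 %y) {
    %negy = sub i32 0, %y
    %r = sub i32 %x, %negy
    ; => %r = add i32 %x, %y
    ret i32 %r
  }
  ```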
* [InstCombine] Bypass high bit extract before variable sign-extension (PR43523) (Roman Lebedev, 2019-10-02; 1 file, -26/+17)

  https://rise4fun.com/Alive/8BY - valid for lshr+trunc+variable sext
  https://rise4fun.com/Alive/7jk - the variable sext can be redundant
  https://rise4fun.com/Alive/Qslu - 'exact'-ness of the first shift can be preserved
  https://rise4fun.com/Alive/IF63 - without trunc we could view this as the more general "drop redundant mask before right-shift", but let's handle it here for now
  https://rise4fun.com/Alive/iip - likewise, without trunc, the variable sext can be redundant.

  There are more patterns for sure - e.g. we can have 'lshr' as the final shift, but that might be best handled by some more generic transform, e.g. "drop redundant masking before right-shift" (PR42456).

  I'm singling out this sext patch because you can only extract high bits with `*shr` (unlike abstract bit masking), and I *know* this fold is wanted by existing code. I don't believe there is much to review here, so I'm going to opt into post-review mode here.

  https://bugs.llvm.org/show_bug.cgi?id=43523

  llvm-svn: 373542
* [NFC][InstCombine] Add tests for 'variable sext of variable high bit extract' pattern (PR43523) (Roman Lebedev, 2019-10-02; 1 file, -0/+584)

  https://bugs.llvm.org/show_bug.cgi?id=43523

  llvm-svn: 373541
* [InstCombine] Transform bcopy to memmove (David Bolvansky, 2019-10-02; 1 file, -0/+25)

  bcopy is still widely used, mainly in network apps. Sadly, LLVM has no optimizations for bcopy, but there are some for memmove. Since bcopy == memmove, it is profitable to transform bcopy to memmove and get the current memmove optimizations for free.

  llvm-svn: 373537
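  A sketch of the transform (illustrative; note that bcopy's operand order is (src, dst, n), the reverse of memmove's):

  ```
  declare void @bcopy(i8* nocapture readonly, i8* nocapture, i64)
  declare void @llvm.memmove.p0i8.p0i8.i64(i8*, i8*, i64, i1)

  define void @copy(i8* %src, i8* %dst, i64 %n) {
    call void @bcopy(i8* %src, i8* %dst, i64 %n)
    ; => call void @llvm.memmove.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %n, i1 false)
    ret void
  }
  ```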
* [SLP] add test for vectorization of different widths (PR28457); NFC (Sanjay Patel, 2019-10-02; 1 file, -0/+105)

  llvm-svn: 373483
* [InstCombine] Precommit tests for D68265 (Florian Hahn, 2019-10-02; 1 file, -2/+204)

  llvm-svn: 373458
* [InstSimplify] fold fma/fmuladd with a NaN or undef operand (Sanjay Patel, 2019-10-02; 1 file, -24/+12)

  This is intended to be similar to the constant folding results from D67446 and earlier, but not all operands are constant in these tests, so the responsibility for folding is left to InstSimplify.

  Differential Revision: https://reviews.llvm.org/D67721

  llvm-svn: 373455
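  A sketch of one of the folds (illustrative): any NaN operand makes the result NaN, and per this patch an undef operand simplifies the same way.

  ```
  define float @fma_nan(float %x, float %z) {
    %r = call float @llvm.fma.f32(float %x, float 0x7FF8000000000000, float %z)
    ; => float 0x7FF8000000000000 (NaN)
    ret float %r
  }

  declare float @llvm.fma.f32(float, float, float)
  ```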
* [InstCombine] Deal with -(trunc(X >>u 63)) -> trunc(X >>s 63) (Roman Lebedev, 2019-10-01; 1 file, -17/+12)

  Identical to its trunc-less variant; just pretend to hoist the trunc, and everything else still holds: https://rise4fun.com/Alive/JRU

  llvm-svn: 373364
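  In IR terms (illustrative):

  ```
  define i32 @negate_extracted_sign(i64 %x) {
    %sh = lshr i64 %x, 63
    %t = trunc i64 %sh to i32
    %r = sub i32 0, %t
    ; => %s = ashr i64 %x, 63
    ;    %r = trunc i64 %s to i32
    ret i32 %r
  }
  ```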
* [InstCombine] Preserve 'exact' in -(X >>u 31) -> (X >>s 31) fold (Roman Lebedev, 2019-10-01; 1 file, -2/+2)

  https://rise4fun.com/Alive/yR4

  llvm-svn: 373363
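  In IR terms (illustrative):

  ```
  define i32 @negate_sign_bit(i32 %x) {
    %sh = lshr exact i32 %x, 31
    %r = sub i32 0, %sh
    ; => %r = ashr exact i32 %x, 31
    ret i32 %r
  }
  ```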
* [NFC][InstCombine] (Better) tests for sign-bit-smearing pattern (Roman Lebedev, 2019-10-01; 2 files, -0/+279)

  https://rise4fun.com/Alive/JRU
  https://rise4fun.com/Alive/yR4 <- we can preserve 'exact'

  llvm-svn: 373362
* [IndVars] An implementation of loop predication without a need for speculation (Philip Reames, 2019-10-01; 1 file, -0/+790)

  This patch implements a variation of a well-known technique for JIT compilers - we have an implementation in tree as LoopPredication - but with an interesting twist. This version does not assume the ability to execute a path which wasn't taken in the original program (such as a guard or widenable.condition intrinsic). The benefit is that this works for arbitrary IR from any frontend (including C/C++/Fortran). The tradeoff is that it's restricted to read-only loops without implicit exits.

  This builds on SCEV, and can thus eliminate the loop-varying portion of any early exit where all exits are understandable by SCEV. A key advantage is that fixing deficiencies exposed in SCEV - I already found one while writing test cases - will also benefit all of full redundancy elimination (and most other loop transforms).

  I haven't seen anything in the literature which quite matches this. Given that, I'm not entirely sure that keeping the name "loop predication" is helpful. Anyone have suggestions for a better name? This is analogous to partial redundancy elimination - since we remove the condition flowing around the backedge - and has some parallels to our existing transforms which try to make conditions invariant in loops.

  Factoring-wise, I chose to put this in IndVarSimplify since it's generally applicable to all workloads. I could split this off into its own pass, but we'd then probably want to add that new pass every place we use IndVars. One solid argument for splitting it off into its own pass is that this transform is "too good": it breaks a huge number of existing IndVars test cases, as they tend to be simple read-only loops. At the moment, I've left it off by default, but if we add this to IndVars and enable it, we'll have to update around 20 test files to add side effects or disable this transform.

  The near-term plan is to fuzz this extensively while off by default, reflect and discuss the factoring issue mentioned just above, and then enable it by default. I also need to give some thought to supporting widenable conditions in this framing.

  Differential Revision: https://reviews.llvm.org/D67408

  llvm-svn: 373351
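  A sketch of the kind of loop this targets (hypothetical example, not from the new test file): a read-only loop whose early range-check exit is SCEV-analyzable, so the loop-varying check `%i u< %len` can be replaced by a single loop-invariant check.

  ```
  define i32 @sum(i32* %a, i32 %n, i32 %len) {
  entry:
    br label %loop

  loop:                               ; read-only loop, two analyzable exits
    %i = phi i32 [ 0, %entry ], [ %i.next, %latch ]
    %acc = phi i32 [ 0, %entry ], [ %acc.next, %latch ]
    %inbounds = icmp ult i32 %i, %len
    br i1 %inbounds, label %latch, label %exit

  latch:
    %gep = getelementptr inbounds i32, i32* %a, i32 %i
    %v = load i32, i32* %gep
    %acc.next = add i32 %acc, %v
    %i.next = add nuw nsw i32 %i, 1
    %cont = icmp slt i32 %i.next, %n
    br i1 %cont, label %loop, label %exit

  exit:
    %res = phi i32 [ %acc, %loop ], [ %acc.next, %latch ]
    ret i32 %res
  }
  ```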
* Revert [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX) (David Bolvansky, 2019-10-01; 2 files, -9/+8)

  Seems to be slower than memcpy + strlen.

  llvm-svn: 373335
* [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX) (David Bolvansky, 2019-10-01; 2 files, -8/+9)

  llvm-svn: 373333
* [InstCombine] Expand the simplification of log() (Evandro Menezes, 2019-09-30; 1 file, -34/+28)

  Expand the simplification of special cases of `log()` to include `log2()` and `log10()`, as well as the intrinsics and more types.

  Differential revision: https://reviews.llvm.org/D67199

  llvm-svn: 373261
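  An illustrative instance, assuming the log-of-exp family of special cases now also fires for the log2 intrinsic under fast-math:

  ```
  define float @log2_exp2(float %x) {
    %p = call fast float @llvm.exp2.f32(float %x)
    %r = call fast float @llvm.log2.f32(float %p)
    ; => %x (log2(exp2(x)) == x, given fast-math)
    ret float %r
  }

  declare float @llvm.exp2.f32(float)
  declare float @llvm.log2.f32(float)
  ```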
* [FunctionAttrs] Added noalias for memccpy/mempcpy arguments (David Bolvansky, 2019-09-30; 1 file, -2/+2)

  llvm-svn: 373251
* [NFC][InstCombine] Redundant-left-shift-input-masking: add some more undef tests (Roman Lebedev, 2019-09-30; 5 files, -0/+115)

  llvm-svn: 373248
* [PGO] Don't group COMDAT variables for compiler generated profile variables in ELF (Rong Xu, 2019-09-30; 1 file, -3/+2)

  With this patch, compiler generated profile variables will have their own COMDAT name for the ELF format, which syncs the behavior with COFF. Tested with a clang PGO bootstrap. This shows a modest reduction in object sizes for ELF.

  Differential Revision: https://reviews.llvm.org/D68041

  llvm-svn: 373241
* [InstCombine] fold negate disguised as select+mul (Sanjay Patel, 2019-09-30; 1 file, -8/+12)

  Name: negate if true
    %sel = select i1 %cond, i32 -1, i32 1
    %r = mul i32 %sel, %x
  =>
    %m = sub i32 0, %x
    %r = select i1 %cond, i32 %m, i32 %x

  Name: negate if false
    %sel = select i1 %cond, i32 1, i32 -1
    %r = mul i32 %sel, %x
  =>
    %m = sub i32 0, %x
    %r = select i1 %cond, i32 %x, i32 %m

  https://rise4fun.com/Alive/Nlh

  llvm-svn: 373230
* [InstCombine] add tests for negate disguised as mul; NFC (Sanjay Patel, 2019-09-30; 1 file, -0/+74)

  llvm-svn: 373222
* [SSP] [1/3] Revert "StackProtector: Use PointerMayBeCaptured" (Paul Robinson, 2019-09-30; 2 files, -141/+0)

  "Captured" and "relevant to Stack Protector" are not the same thing.

  This reverts commit f29366b1f594f48465c5a2754bcffac6d70fd0b1, aka r363169.

  Differential Revision: https://reviews.llvm.org/D67842

  llvm-svn: 373216
* [DivRemPairs] Don't assert that we won't ever get expanded-form rem pairs in different BB's (PR43500) (Roman Lebedev, 2019-09-29; 1 file, -0/+36)

  If we happen to have the same div in two basic blocks, and in one of those we also happen to have the rem part, we'd match the div-rem pair, but the wrong ones. So let's drop the overly-ambiguous assert.

  Fixes https://bugs.llvm.org/show_bug.cgi?id=43500

  llvm-svn: 373167
* [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") (Alexey Bataev, 2019-09-29; 21 files, -1263/+829)

  Initially the SLP vectorizer replaced all going-to-be-vectorized instructions with Undef values. This may break ScalarEvolution and cause a crash. Reworked the SLP vectorizer so that it does not replace vectorized instructions by UndefValue anymore. Instead, vectorized instructions are marked for deletion inside the BoUpSLP class and deleted upon class destruction.

  Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel
  Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy
  Differential Revision: https://reviews.llvm.org/D29641

  llvm-svn: 373166
* [SampleFDO] Create a separate flag profile-accurate-for-symsinlist to handle profile symbol list (Wei Mi, 2019-09-27; 3 files, -18/+47)

  Currently many existing users of profile-sample-accurate want to reduce code size as much as possible. Their use cases are different from the scenario the profile symbol list tries to handle - the major motivation for adding the profile symbol list is to get the major memory/code-size savings without introducing performance regressions. So to keep the behavior of profile-sample-accurate unchanged, we think decoupling these two things and using a new flag to control the handling of the profile symbol list may be better. When profile-sample-accurate and the new flag profile-accurate-for-symsinlist are both present, profile-sample-accurate takes precedence since it is a user assertion.

  Differential Revision: https://reviews.llvm.org/D68047

  llvm-svn: 373133
* [NFC][PhaseOrdering] Add end-to-end tests for the 'two shifts by sext' problem (Roman Lebedev, 2019-09-27; 1 file, -0/+125)

  We start with two separate sext's, but EarlyCSE runs before InstCombine, so when we get them, they are a single sext, and we just ignore that. Likewise, if we had a single sext, we don't do anything there.

  llvm-svn: 373115
* [InstSimplify] add tests for fma/fmuladd with undef operand; NFC (Sanjay Patel, 2019-09-27; 1 file, -0/+54)

  llvm-svn: 373109
* [InstCombine] Simplify shift-by-sext to shift-by-zext (Roman Lebedev, 2019-09-27; 2 files, -13/+13)

  Summary: This is valid for any `sext` bitwidth pair:
  ```
  Processing /tmp/opt.ll..
  ----------------------------------------
  %signed = sext %y
  %r = shl %x, %signed
  ret %r
  =>
  %unsigned = zext %y
  %r = shl %x, %unsigned
  ret %r
  %signed = sext %y

  Done: 2016
  Optimization is correct!
  ```
  (This isn't so for funnel shifts; there it's illegal for e.g. i6->i7.)

  Main motivation is the C++ semantics:
  ```
  int shl(int a, char b) { return a << b; }
  ```
  ends up as
  ```
  %3 = sext i8 %1 to i32
  %4 = shl i32 %0, %3
  ```
  https://godbolt.org/z/0jgqUq
  which is, as this shows, too pessimistic.

  There is another problem here - we can only do the fold if sext is one-use. But we can trivially have cases where several shifts have the same sext shift amount. This should be resolved later.

  Reviewers: spatel, nikic, RKSimon
  Reviewed By: spatel
  Subscribers: efriedma, hiraditya, nlopes, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D68103

  llvm-svn: 373106
* [SLPVectorizer][X86] Regenerate arith-fp tests (Simon Pilgrim, 2019-09-27; 1 file, -0/+40)

  llvm-svn: 373063
* [NFC][InstCombine] Revisit shift-by-signext tests (Roman Lebedev, 2019-09-27; 1 file, -18/+86)

  llvm-svn: 373055
* Revert "[LoopInfo] Limit the iterations to check whether a loop has dedicatedWei Mi2019-09-271-102/+0
| | | | | | | | | | | exits" Get a better approach in https://reviews.llvm.org/D68107 to solve the problem. Revert the initial patch and will commit the new one soon. This reverts commit rL372990. llvm-svn: 373044
* Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") (Jordan Rupprecht, 2019-09-26; 20 files, -300/+1263)

  This reverts r372626 (git commit 6a278d9073bdc158d31d4f4b15bbe34238f22c18)

  llvm-svn: 373019
* [LoopFusion] Add ability to fuse guarded loops (Kit Barton, 2019-09-26; 1 file, -0/+67)

  Summary: This patch extends the current capabilities in loop fusion to fuse guarded loops (as defined in https://reviews.llvm.org/D63885). It adds the necessary safety checks to ensure that it is safe to fuse guarded loops (control flow equivalent, no intervening code, and same guard conditions). It also provides an alternative method to perform the actual fusion of guarded loops. The mechanics of fusing guarded loops are slightly different from fusing non-guarded loops, so I opted to keep them as separate methods. I will be cleaning this up in later patches and hope to converge on a single method to fuse both guarded and non-guarded loops, but for now I think the review will be easier with them separate.

  Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65464

  llvm-svn: 373018