summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms
Commit message (Collapse)AuthorAgeFilesLines
...
* [LVI][CVP] LazyValueInfoImpl::solveBlockValueBinaryOp(): use no-wrap flags ↵Roman Lebedev2019-10-231-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from `add` op Summary: This was suggested in https://reviews.llvm.org/D69277#1717210 In this form (this is what was suggested, right?), the results aren't staggering (especially since given LVI cross-block focus) this does catch some things (as per test-suite), but not too much: | statistic | old | new | delta | % change | | correlated-value-propagation.NumAddNSW | 4981 | 4982 | 1 | 0.0201% | | correlated-value-propagation.NumAddNW | 12125 | 12126 | 1 | 0.0082% | | correlated-value-propagation.NumCmps | 1199 | 1202 | 3 | 0.2502% | | correlated-value-propagation.NumDeadCases | 112 | 111 | -1 | -0.8929% | | correlated-value-propagation.NumMulNSW | 275 | 278 | 3 | 1.0909% | | correlated-value-propagation.NumMulNUW | 1323 | 1326 | 3 | 0.2268% | | correlated-value-propagation.NumMulNW | 1598 | 1604 | 6 | 0.3755% | | correlated-value-propagation.NumNSW | 7158 | 7167 | 9 | 0.1257% | | correlated-value-propagation.NumNUW | 13304 | 13310 | 6 | 0.0451% | | correlated-value-propagation.NumNW | 20462 | 20477 | 15 | 0.0733% | | correlated-value-propagation.NumOverflows | 4 | 7 | 3 | 75.0000% | | correlated-value-propagation.NumPhis | 15366 | 15381 | 15 | 0.0976% | | correlated-value-propagation.NumSExt | 6273 | 6277 | 4 | 0.0638% | | correlated-value-propagation.NumShlNSW | 1172 | 1171 | -1 | -0.0853% | | correlated-value-propagation.NumShlNUW | 2793 | 2794 | 1 | 0.0358% | | correlated-value-propagation.NumSubNSW | 730 | 736 | 6 | 0.8219% | | correlated-value-propagation.NumSubNUW | 2044 | 2046 | 2 | 0.0978% | | correlated-value-propagation.NumSubNW | 2774 | 2782 | 8 | 0.2884% | | instcount.NumAddInst | 277586 | 277569 | -17 | -0.0061% | | instcount.NumAndInst | 66056 | 66054 | -2 | -0.0030% | | instcount.NumBrInst | 709147 | 709146 | -1 | -0.0001% | | instcount.NumCallInst | 528579 | 528576 | -3 | -0.0006% | | instcount.NumExtractValueInst | 18307 | 18301 | -6 | -0.0328% | | instcount.NumOrInst | 102660 | 102665 | 5 | 0.0049% | | instcount.NumPHIInst | 318008 | 318007 | -1 | -0.0003% | | instcount.NumSelectInst | 46373 | 46370 | -3 | -0.0065% | | instcount.NumSExtInst | 79496 | 79488 | -8 | -0.0101% | | instcount.NumShlInst | 40654 | 40657 | 3 | 0.0074% | | instcount.NumTruncInst | 62251 | 62249 | -2 | -0.0032% | | instcount.NumZExtInst | 68211 | 68221 | 10 | 0.0147% | | instcount.TotalBlocks | 843910 | 843909 | -1 | -0.0001% | | instcount.TotalInsts | 7387448 | 7387423 | -25 | -0.0003% | Reviewers: nikic, reames Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69321
* [NFC][LVI][CVP] Tests where pre-specified `add` no-wrap flags could be used ↵Roman Lebedev2019-10-221-0/+136
| | | | | | | by LVI There's `ConstantRange::addWithNoWrap()`, LVI could use it to further constrain the range, if an `add` already has some no-wrap flags specified.
* [InstCombine] Signed saturation patternsDavid Green2019-10-221-124/+30
| | | | | | | | | | | | This adds an instcombine matcher for code that attempts to perform signed saturating arithmetic by casting to a higher type. Unsigned cases are already matched, this adds extra matches for the more complex signed cases, which involves matching the min(max(add a b)) nodes with proper extends to ensure legality. Differential Revision: https://reviews.llvm.org/D68651 llvm-svn: 375505
* [InstCombine] Signed saturation tests. NFCDavid Green2019-10-221-0/+597
| | | | llvm-svn: 375503
* [CVP] No-wrap deduction for `shl`Roman Lebedev2019-10-212-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is the last `OverflowingBinaryOperator` for which we don't deduce flags. D69217 taught `ConstantRange::makeGuaranteedNoWrapRegion()` about it. The effect is better than of the `mul` patch (D69203): | statistic | old | new | delta | % change | | correlated-value-propagation.NumAddNUW | 7145 | 7144 | -1 | -0.0140% | | correlated-value-propagation.NumAddNW | 12126 | 12125 | -1 | -0.0082% | | correlated-value-propagation.NumAnd | 443 | 446 | 3 | 0.6772% | | correlated-value-propagation.NumNSW | 5986 | 7158 | 1172 | 19.5790% | | correlated-value-propagation.NumNUW | 10512 | 13304 | 2792 | 26.5601% | | correlated-value-propagation.NumNW | 16498 | 20462 | 3964 | 24.0272% | | correlated-value-propagation.NumShlNSW | 0 | 1172 | 1172 | | | correlated-value-propagation.NumShlNUW | 0 | 2793 | 2793 | | | correlated-value-propagation.NumShlNW | 0 | 3965 | 3965 | | | instcount.NumAShrInst | 13824 | 13790 | -34 | -0.2459% | | instcount.NumAddInst | 277584 | 277586 | 2 | 0.0007% | | instcount.NumAndInst | 66061 | 66056 | -5 | -0.0076% | | instcount.NumBrInst | 709153 | 709147 | -6 | -0.0008% | | instcount.NumICmpInst | 483709 | 483708 | -1 | -0.0002% | | instcount.NumSExtInst | 79497 | 79496 | -1 | -0.0013% | | instcount.NumShlInst | 40691 | 40654 | -37 | -0.0909% | | instcount.NumSubInst | 61997 | 61996 | -1 | -0.0016% | | instcount.NumZExtInst | 68208 | 68211 | 3 | 0.0044% | | instcount.TotalBlocks | 843916 | 843910 | -6 | -0.0007% | | instcount.TotalInsts | 7387528 | 7387448 | -80 | -0.0011% | Reviewers: nikic, reames, sanjoy, timshen Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69277 llvm-svn: 375455
* [NFC][CVP] Add `shl` no-wrap deduction test coverageRoman Lebedev2019-10-211-0/+378
| | | | llvm-svn: 375441
* Pre-commit test cases for D64713.Jay Foad2019-10-212-0/+64
| | | | llvm-svn: 375418
* [MemCpyOpt] Fixing Incorrect Code Motion while Handling Aggregate Type ValuesSam Elliott2019-10-211-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When MemCpyOpt is handling aggregate type values, if an instruction (let's call it P) between the targeting load (L) and store (S) clobbers the source pointer of L, it will try to hoist S before P. This process will also hoist S's data dependency instructions. However, the current implementation has a bug that if one of S's dependency instructions is //also// a user of P, MemCpyOpt will not prevent it from being hoisted above P and cause a use-before-define error. For example, in the newly added test file (i.e. `aggregate-type-crash.ll`), it will try to hoist both `store %my_struct %1, %my_struct* %3` and its dependent, `%3 = bitcast i8* %2 to %my_struct*`, above `%2 = call i8* @my_malloc(%my_struct* %0)`. Creating the following BB: ``` entry: %1 = bitcast i8* %4 to %my_struct* %2 = bitcast %my_struct* %1 to i8* %3 = bitcast %my_struct* %0 to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %2, i8* align 4 %3, i64 8, i1 false) %4 = call i8* @my_malloc(%my_struct* %0) ret void ``` Where there is a use-before-define error between `%1` and `%4`. Update: The compiler for the Pony Programming Language [also encounter the same bug](https://github.com/ponylang/ponyc/issues/3140) Patch by Min-Yih Hsu (myhsu) Reviewers: eugenis, pcc, dblaikie, dneilson, t.p.northover, lattner Reviewed By: eugenis Subscribers: lenary, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66060 llvm-svn: 375403
* [CVP] Deduce no-wrap on `mul`Roman Lebedev2019-10-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: `ConstantRange::makeGuaranteedNoWrapRegion()` knows how to deal with `mul` since rL335646, there is exhaustive test coverage. This is already used by CVP's `processOverflowIntrinsic()`, and by SCEV's `StrengthenNoWrapFlags()` That being said, currently, this doesn't help much in the end: | statistic | old | new | delta | percentage | | correlated-value-propagation.NumMulNSW | 4 | 275 | 271 | 6775.00% | | correlated-value-propagation.NumMulNUW | 4 | 1323 | 1319 | 32975.00% | | correlated-value-propagation.NumMulNW | 8 | 1598 | 1590 | 19875.00% | | correlated-value-propagation.NumNSW | 5715 | 5986 | 271 | 4.74% | | correlated-value-propagation.NumNUW | 9193 | 10512 | 1319 | 14.35% | | correlated-value-propagation.NumNW | 14908 | 16498 | 1590 | 10.67% | | instcount.NumAddInst | 275871 | 275869 | -2 | 0.00% | | instcount.NumBrInst | 708234 | 708232 | -2 | 0.00% | | instcount.NumMulInst | 43812 | 43810 | -2 | 0.00% | | instcount.NumPHIInst | 316786 | 316784 | -2 | 0.00% | | instcount.NumTruncInst | 62165 | 62167 | 2 | 0.00% | | instcount.NumUDivInst | 2528 | 2526 | -2 | -0.08% | | instcount.TotalBlocks | 842995 | 842993 | -2 | 0.00% | | instcount.TotalInsts | 7376486 | 7376478 | -8 | 0.00% | (^ test-suite plain, tests still pass) Reviewers: nikic, reames, luqmana, sanjoy, timshen Reviewed By: reames Subscribers: hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69203 llvm-svn: 375396
* [InstCombine] Allow values with multiple users in SimplifyDemandedVectorEltsPiotr Sobczak2019-10-211-4/+71
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Allow for ignoring the check for a single use in SimplifyDemandedVectorElts to be able to simplify operands if DemandedElts is known to contain the union of elements used by all users. It is a responsibility of a caller of SimplifyDemandedVectorElts to supply correct DemandedElts. Simplify a series of extractelement instructions if only a subset of elements is used. Reviewers: reames, arsenm, majnemer, nhaehnle Reviewed By: nhaehnle Subscribers: wdng, jvesely, nhaehnle, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67345 llvm-svn: 375395
* [IR] Fix mayReadFromMemory() for writeonly callsYevgeny Rouban2019-10-211-0/+15
| | | | | | | | | | | | | | Current implementation of Instruction::mayReadFromMemory() returns !doesNotAccessMemory() which is !ReadNone. This does not take into account that the writeonly attribute also indicates that the call does not read from memory. The patch changes the predicate to !doesNotReadMemory() that reflects the intended behavior. Differential Revision: https://reviews.llvm.org/D69086 llvm-svn: 375389
* [Attributor] Teach AANoCapture to use information in-flight more aggressivelyJohannes Doerfert2019-10-215-13/+40
| | | | | | | | | AAReturnedValues, AAMemoryBehavior, and AANoUnwind, can provide information that helps during the tracking or even justifies no-capture. We now use this information and enable no-capture in some test cases designed a long while a ago for these cases. llvm-svn: 375382
* [IndVars] Eliminate loop exits with equivalent exit countsPhilip Reames2019-10-203-8/+40
| | | | | | | | | | | | We can end up with two loop exits whose exit counts are equivalent, but whose textual representation is different and non-obvious. For the sub-case where we have a series of exits which dominate one another (common), eliminate any exits which would iterate *after* a previous exit on the exiting iteration. As noted in the TODO being removed, I'd always thought this was a good idea, but I've now seen this in a real workload as well. Interestingly, in review, Nikita pointed out there's let another oppurtunity to leverage SCEV's reasoning. If we kept track of the min of dominanting exits so far, we could discharge exits with EC >= MDE. This is less powerful than the existing transform (since later exits aren't considered), but potentially more powerful for any case where SCEV can prove a >= b, but neither a == b or a > b. I don't have an example to illustrate that oppurtunity, but won't be suprised if we find one and return to handle that case as well. Differential Revision: https://reviews.llvm.org/D69009 llvm-svn: 375379
* [InstCombine] conditional sign-extend of high-bit-extract: 'or' pattern.Roman Lebedev2019-10-201-3/+3
| | | | | | | | | | | | | | In this pattern, all the "magic" bits that we'd `add` are all high sign bits, and in the value we'd be adding to they are all unset, not unexpectedly, so we can have an `or` there: https://rise4fun.com/Alive/ups It is possible that `haveNoCommonBitsSet()` should be taught about this pattern so that we never have an `add` variant, but the reasoning would need to be recursive (because of that `select`), so i'm not really sure that would be worth it just yet. llvm-svn: 375378
* [NFC][InstCombine] conditional sign-extend of high-bit-extract: 'and' pat. ↵Roman Lebedev2019-10-201-0/+99
| | | | | | | | | | | can be 'or' pattern. In this pattern, all the "magic" bits that we'd add are all high sign bits, and in the value we'd be adding to they are all unset, not unexpectedly, so we can have an `or` there: https://rise4fun.com/Alive/ups llvm-svn: 375377
* [InstCombine] Fold uadd.sat(a, b) == 0 and usub.sat(a, b) == 0Nikita Popov2019-10-201-10/+8
| | | | | | | | | | | | | This adds folds for comparing uadd.sat/usub.sat with zero: * uadd.sat(a, b) == 0 => a == 0 && b == 0 => (a | b) == 0 * usub.sat(a, b) == 0 => a <= b And inverted forms for !=. Differential Revision: https://reviews.llvm.org/D69224 llvm-svn: 375374
* [InstCombine] Add tests for uadd/sub.sat(a, b) == 0; NFCNikita Popov2019-10-201-0/+44
| | | | llvm-svn: 375372
* [InstCombine] Shift amount reassociation in shifty sign bit test (PR43595)Roman Lebedev2019-10-201-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This problem consists of several parts: * Basic sign bit extraction - `trunc? (?shr %x, (bitwidth(x)-1))`. This is trivial, and easy to do, we have a fold for it. * Shift amount reassociation - if we have two identical shifts, and we can simplify-add their shift amounts together, then we likely can just perform them as a single shift. But this is finicky, has one-use restrictions, and shift opcodes must be identical. But there is a super-pattern where both of these work together. to produce sign bit test from two shifts + comparison. We do indeed already handle this in most cases. But since we get that fold transitively, it has one-use restrictions. And what's worse, in this case the right-shifts aren't required to be identical, and we can't handle that transitively: If the total shift amount is bitwidth-1, only a sign bit will remain in the output value. But if we look at this from the perspective of two shifts, we can't fold - we can't possibly know what bit pattern we'd produce via two shifts, it will be *some* kind of a mask produced from original sign bit, but we just can't tell it's shape: https://rise4fun.com/Alive/cM0 https://rise4fun.com/Alive/9IN But it will *only* contain sign bit and zeros. So from the perspective of sign bit test, we're good: https://rise4fun.com/Alive/FRz https://rise4fun.com/Alive/qBU Superb! So the simplest solution is to extend `reassociateShiftAmtsOfTwoSameDirectionShifts()` to also have a sudo-analysis mode that will ignore extra-uses, and will only check whether a) those are two right shifts and b) they end up with bitwidth(x)-1 shift amount and return either the original value that we sign-checking, or null. This does not have any functionality change for the existing `reassociateShiftAmtsOfTwoSameDirectionShifts()`. All that being said, as disscussed in the review, this yet again increases usage of instsimplify in instcombine as utility. Some day that may need to be reevaluated. https://bugs.llvm.org/show_bug.cgi?id=43595 Reviewers: spatel, efriedma, vsk Reviewed By: spatel Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68930 llvm-svn: 375371
* [SampleFDO] Add profile remapping support for profile on-demand loading usedWei Mi2019-10-181-1/+5
| | | | | | | | | | | | | | | | | | | | by ExtBinary format profile Profile on-demand loading was added for ExtBinary format profile in rL374233, but currently profile on-demand loading doesn't work well with profile remapping. The patch adds the support. Suppose a function in the current module has outline instance in the profile. The function name in the module is different from the name of the outline instance, but remapper knows the two names are equal. When loading profile on-demand, the outline instance has to be loaded with remapper's help. At the same time SampleProfileReaderItaniumRemapper is changed from a proxy of SampleProfileReader to a helper member in SampleProfileReader. Differential Revision: https://reviews.llvm.org/D68901 llvm-svn: 375295
* [NFC][CVP] Some tests for `mul` no-wrap deductionRoman Lebedev2019-10-181-0/+175
| | | | llvm-svn: 375285
* [CVP] After proving that @llvm.with.overflow()/@llvm.sat() don't overflow, ↵Roman Lebedev2019-10-181-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | also try to prove other no-wrap Summary: CVP, unlike InstCombine, does not run till exaustion. It only does a single pass. When dealing with those special binops, if we prove that they can safely be demoted into their usual binop form, we do set the no-wrap we deduced. But when dealing with usual binops, we try to deduce both no-wraps. So if we convert e.g. @llvm.uadd.with.overflow() to `add nuw`, we won't attempt to check whether it can be `add nuw nsw`. This patch proposes to call `processBinOp()` on newly-created binop, which is identical to what we do for div/rem already. Reviewers: nikic, spatel, reames Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69183 llvm-svn: 375273
* [NFC][CVP] Add @llvm.*.sat tests where we could prove both no-overflowsRoman Lebedev2019-10-181-8/+100
| | | | llvm-svn: 375260
* [InstCombine] Fix miscompile bug in canEvaluateShuffledBjorn Pettersson2019-10-181-5/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Add restrictions in canEvaluateShuffled to prevent that we for example transform %0 = insertelement <2 x i16> undef, i16 %a, i32 0 %1 = srem <2 x i16> %0, <i16 2, i16 1> %2 = shufflevector <2 x i16> %1, <2 x i16> undef, <2 x i32> <i32 undef, i32 0> into %1 = insertelement <2 x i16> undef, i16 %a, i32 1 %2 = srem <2 x i16> %1, <i16 undef, i16 2> as having an undef denominator makes the srem undefined (for all vector elements). Fixes: https://bugs.llvm.org/show_bug.cgi?id=43689 Reviewers: spatel, lebedev.ri Reviewed By: spatel, lebedev.ri Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69038 llvm-svn: 375208
* [InstCombine] Pre-commit of test case showing miscompile bug in ↵Bjorn Pettersson2019-10-181-0/+101
| | | | | | | | | | | | | | | | | | | | canEvaluateShuffled Adding the reproducer from https://bugs.llvm.org/show_bug.cgi?id=43689, showing that instcombine is doing a bad transform. It transforms %0 = insertelement <2 x i16> undef, i16 %a, i32 0 %1 = srem <2 x i16> %0, <i16 2, i16 1> %2 = shufflevector <2 x i16> %1, <2 x i16> undef, <2 x i32> <i32 undef, i32 0> into %1 = insertelement <2 x i16> undef, i16 %a, i32 1 %2 = srem <2 x i16> %1, <i16 undef, i16 2> The undef denominator makes the whole srem undefined. llvm-svn: 375207
* [NFC][InstCombine] Tests for "fold variable mask before variable ↵Roman Lebedev2019-10-1711-0/+2430
| | | | | | | | shift-of-trunc" (PR42563) https://bugs.llvm.org/show_bug.cgi?id=42563 llvm-svn: 375135
* [LoopIdiom] BCmp: check, not assert that loop exits exit out of the loop ↵Roman Lebedev2019-10-171-0/+468
| | | | | | | | | | | | | | | (PR43687) We can't normally stumble into that assertion because a tautological *conditional* `br` in loop body is required, one that always branches to loop latch. But that should have been always folded to an unconditional branch before we get it. But that is not guaranteed if the pass is run standalone. So let's just promote the assertion into a proper check. Fixes https://bugs.llvm.org/show_bug.cgi?id=43687 llvm-svn: 375100
* Reland: Dead Virtual Function EliminationOliver Stannard2019-10-179-0/+749
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove dead virtual functions from vtables with replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up correctly. Original commit message: Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 375094
* [Analysis] Don't assume that unsigned overflow can't happen in EmitGEPOffset ↵Mikhail Maltsev2019-10-175-25/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (PR42699) Summary: Currently when computing a GEP offset using the function EmitGEPOffset for the following instruction getelementptr inbounds i32, i32* %p, i64 %offs we get mul nuw i64 %offs, 4 Unfortunately we cannot assume that unsigned wrapping won't happen here because %offs is allowed to be negative. Making such assumptions can lead to miscompilations: see the new test test24_neg_offs in InstCombine/icmp.ll. Without the patch InstCombine would generate the following comparison: icmp eq i64 %offs, 4611686018427387902; 0x3ffffffffffffffe Whereas the correct value to compare with is -2. This patch replaces the NUW flag with NSW in the multiplication instructions generated by EmitGEPOffset and adjusts the test suite. https://bugs.llvm.org/show_bug.cgi?id=42699 Reviewers: chandlerc, craig.topper, ostannard, lebedev.ri, spatel, efriedma, nlopes, aqjune Reviewed By: lebedev.ri Subscribers: reames, lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68342 llvm-svn: 375089
* [DAGCombine][ARM] Enable extending masked loadsSam Parker2019-10-171-3/+139
| | | | | | | | | | | Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085
* Remove a stale comment, noted in post commit review for rL375038Philip Reames2019-10-161-1/+0
| | | | llvm-svn: 375040
* [IndVars] Fix a miscompile in off-by-default loop predication implementationPhilip Reames2019-10-161-9/+4
| | | | | | | | | | | | The problem is that we can have two loop exits, 'a' and 'b', where 'a' and 'b' would exit at the same iteration, 'a' precedes 'b' along some path, and 'b' is predicated while 'a' is not. In this case (see the previously submitted test case), we causing the loop to exit through 'b' whereas it should have exited through 'a'. This only applies to loop exits where the exit counts are not provably inequal, but that isn't as much of a restriction as it appears. If we could order the exit counts, we'd have already removed one of the two exits. In theory, we might be able to prove inequality w/o ordering, but I didn't really explore that piece. Instead, I went for the obvious restriction and ensured we didn't predicate exits following non-predicateable exits. Credit goes to Evgeny Brevnov for figuring out the problematic case. Fuzzing probably also found it (failures seen), but due to some silly infrastructure problems I hadn't gotten to the results before Evgeny hand reduced it from a benchmark (he manually enabled the transform). Once this is fixed, I'll try to filter through the fuzzer failures to see if there's anything additional lurking. Differential Revision https://reviews.llvm.org/D68956 llvm-svn: 375038
* [SLP] avoid reduction transform on patterns that the backend can ↵Sanjay Patel2019-10-161-52/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | load-combine (2nd try) The 1st attempt at this modified the cost model in a bad way to avoid the vectorization, but that caused problems for other users (the loop vectorizer) of the cost model. I don't see an ideal solution to these 2 related, potentially large, perf regressions: https://bugs.llvm.org/show_bug.cgi?id=42708 https://bugs.llvm.org/show_bug.cgi?id=43146 We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load combine opportunities requires that it recognizes patterns that could be combined later, but not do the optimization itself ( it's not a vector combine anyway, so it's probably out-of-scope for SLP). Here, we add a cost-independent bailout with a conservative pattern match for a multi-instruction sequence that can probably be reduced later. In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like: movbe rax, qword ptr [rdi] or: mov rax, qword ptr [rdi] Not some (half) vector monstrosity as we currently do using SLP: vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,.. vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0] movzx eax, byte ptr [rdi] movzx ecx, byte ptr [rdi + 5] shl rcx, 40 movzx edx, byte ptr [rdi + 6] shl rdx, 48 or rdx, rcx movzx ecx, byte ptr [rdi + 7] shl rcx, 56 or rcx, rdx or rcx, rax vextracti128 xmm1, ymm0, 1 vpor xmm0, xmm0, xmm1 vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1] vpor xmm0, xmm0, xmm1 vmovq rax, xmm0 or rax, rcx vzeroupper ret Differential Revision: https://reviews.llvm.org/D67841 llvm-svn: 375025
* [InstCombine][AMDGPU] Fix crash with v3i16/v3f16 buffer intrinsicsPiotr Sobczak2019-10-161-0/+45
| | | | | | | | | | | | | | | | | | | | Summary: This is something of a workaround to avoid a crash later on in type legalizer (WidenVectorResult()). Also added some f16 tests, including a non-working v3f16 case with a FIXME. Reviewers: arsenm, tpr, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68865 llvm-svn: 374993
* Revert "[HardwareLoops] Optimisation remarks"Sjoerd Meijer2019-10-162-13/+2
| | | | | | | | while I investigate the PPC build bot failures. This reverts commit ad763751565b9663bc338fa2ca5ade86c6ca22ec. llvm-svn: 374992
* [HardwareLoops] Optimisation remarksSjoerd Meijer2019-10-162-2/+13
| | | | | | | | | | | | This adds the initial plumbing to support optimisation remarks in the IR hardware-loop pass. I have left a todo in a comment where we can improve the reporting, and will iterate on that now that we have this initial support in. Differential Revision: https://reviews.llvm.org/D68579 llvm-svn: 374980
* [NewGVN] Check that call has an access.Alina Sbirlea2019-10-151-0/+42
| | | | | | | | | | Check that a call has an attached MemoryAccess before calling getClobbering on the instruction. If no access is attached, the instruction does not access memory. Resolves PR43441. llvm-svn: 374920
* [InstCombine] fold a shifted bool zext to a select (2nd try)Sanjay Patel2019-10-152-8/+15
| | | | | | | | | | | | | | | | | | | | The 1st attempt at rL374828 inserted the code at the wrong position (outside of the constant-shift-amount block). Trying again with an additional test to verify const-ness. For a constant shift amount, add the following fold. shl (zext (i1 X)), ShAmt --> select (X, 1 << ShAmt, 0) https://rise4fun.com/Alive/IZ9 Fixes PR42257. Based on original patch by @zvi (Zvi Rackover) Differential Revision: https://reviews.llvm.org/D63382 llvm-svn: 374886
* Revert [SROA] Reuse existing lifetime markers if possibleDavid L. Jones2019-10-151-69/+0
| | | | | | | | This reverts r374692 (git commit 92694eba933ef4ea0b1b6377809ff266df37d61b) Reproducer sent to commit thread on llvm-commits. llvm-svn: 374859
* Revert [InstCombine] fold a shifted bool zext to a selectSanjay Patel2019-10-142-4/+8
| | | | | | This reverts r374828 (git commit 1f40f15d54aac06421448b6de131231d2d78bc75) due to bot breakage llvm-svn: 374851
* [MemorySSA] Update for partial unswitch.Alina Sbirlea2019-10-149-0/+106
| | | | | | | | Update MSSA for blocks cloned when doing partial unswitching. Enable additional testing with MSSA. Resolves PR43641. llvm-svn: 374850
* Revert "Dead Virtual Function Elimination"Jorge Gorbe Moya2019-10-149-749/+0
| | | | | | This reverts commit 9f6a873268e1ad9855873d9d8007086c0d01cf4f. llvm-svn: 374844
* [InstCombine] fold a shifted bool zext to a selectSanjay Patel2019-10-142-8/+4
| | | | | | | | | | | | | | | For a constant shift amount, add the following fold. shl (zext (i1 X)), ShAmt --> select (X, 1 << ShAmt, 0) https://rise4fun.com/Alive/IZ9 Fixes PR42257. Based on original patch by @zvi (Zvi Rackover) Differential Revision: https://reviews.llvm.org/D63382 llvm-svn: 374828
* [InstCombine] add tests for select/shift transforms; NFCSanjay Patel2019-10-142-0/+59
| | | | | | A transform proposal for the shift form is in D63382. llvm-svn: 374818
* [Tests] Add a test demonstrating a miscompile in the off-by-default ↵Philip Reames2019-10-141-0/+75
| | | | | | | | | | loop-pred transform Credit goes to Evgeny Brevnov for figuring out the problematic case. Fuzzing probably also found it (lots of failures), but due to some silly infrastructure problems I hadn't gotten to the results before Evgeny hand reduced it from a benchmark. llvm-svn: 374812
* [LoopIdiom] BCmp: loop exit count must not be wider than size_t that `bcmp` ↵Roman Lebedev2019-10-141-0/+59
| | | | | | | | | | | | takes As reported by Joerg Sonnenberger in IRC, for 32-bit systems, where pointer and size_t are 32-bit, if you use 64-bit-wide variable in the loop, you could end up with loop exit count being of the type wider than the size_t. Now, i'm not sure if we can produce `bcmp` from that (just truncate?), but we certainly should not assert/miscompile. llvm-svn: 374811
* [Tests] Add a few more tests for idioms with FP induction variablesPhilip Reames2019-10-141-0/+231
| | | | llvm-svn: 374807
* [CostModel][X86] Add CTLZ scalar costsSimon Pilgrim2019-10-141-142/+160
| | | | | | | | Add specific scalar costs for CTLZ instructions, we can't discriminate between CTLZ and CTLZ_ZERO_UNDEF so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets we might still be underestimating it. For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs. llvm-svn: 374786
* Reapply r374743 with a fix for the ocaml bindingJoerg Sonnenberger2019-10-145-154/+122
| | | | | | | | | | | | | | | | | | | Add a pass to lower is.constant and objectsize intrinsics This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 llvm-svn: 374784
* [IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperatorCameron McInally2019-10-144-17/+17
| | | | | | | | Reapply r374240 with fix for Ocaml test, namely Bindings/OCaml/core.ml. Differential Revision: https://reviews.llvm.org/D61675 llvm-svn: 374782
* [CostModel][X86] Add CTPOP scalar costs (PR43656)Simon Pilgrim2019-10-141-22/+46
| | | | | | | | Add specific scalar costs for ctpop instructions, these are based on the llvm-mca's SLM throughput numbers (the oldest model we have). For targets supporting POPCNT, we provide overrides that assume 1cy costs. llvm-svn: 374775
OpenPOWER on IntegriCloud