summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than ↵Craig Topper2019-08-143-208/+262
| | | | | | | | | | | | 128-bit inputs Now that we legalize by widening, the element types here won't change. Previously these were modeled as the elements being widened and then the instruction might become an AND or SHL/ASHR pair. But now they'll become something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG. For AVX2, when the destination type is legal its clear the cost should be 1 since we have extend instructions that can produce 256 bit vectors from less than 128 bit vectors. I'm a little less sure about AVX1 costs, but I think the ones I changed were definitely too high, but they might still be too high. Differential Revision: https://reviews.llvm.org/D66169 llvm-svn: 368858
* [SimplifyCFG] Add "safe abs" test from CMSIS DSP 'abs_with_clamp()'Roman Lebedev2019-08-141-0/+34
| | | | | | | | | With -phi-node-folding-threshold=3 this branch would get flattened into select. See https://reviews.llvm.org/D65148#1629010 llvm-svn: 368847
* [InstCombine][NFC] Autogenerate checks in adjust-for-minmax.llRoman Lebedev2019-08-141-82/+82
| | | | | | Being affected by WIP patch. llvm-svn: 368807
* [LV] Fold-tail flagDorit Nuzman2019-08-141-1/+19
| | | | | | | | | | | This is the compiler-flag equivalent of the Predicate pragma (https://reviews.llvm.org/D65197), to direct the vectorizer to fold the remainder-loop into the main-loop using predication. Differential Revision: https://reviews.llvm.org/D66108 Reviewers: Ayal, hsaito, fhahn, SjoerdMeije llvm-svn: 368801
* Revert '[LICM] Make Loop ICM profile aware' and 'Fix pass dependency for LICM'David L. Jones2019-08-141-8/+2
| | | | | | | This reverts r368526 (git commit 7e71aa24bc0788690fea7f0d7eab400c6a784deb) This reverts r368542 (git commit cb5a90fd314a7914cf293797bb4fd7a6841052cf) llvm-svn: 368800
* Support swifterror in coroutine lowering.John McCall2019-08-141-0/+143
| | | | | | | | | | The support for swifterror allocas should work in all lowerings. The support for swifterror arguments only really works in a lowering with prototypes where you can ensure that the prototype also has a swifterror argument; I'm not really sure how it could possibly be made to work in the switch lowering. llvm-svn: 368795
* Update for optimizer changes.John McCall2019-08-141-1/+1
| | | | | | rdar://37352868 llvm-svn: 368794
* In coro.retcon lowering, don't explode if the optimizer messes around with ↵John McCall2019-08-141-0/+20
| | | | | | the linkage of the prototype or the exact types of the yielded values. llvm-svn: 368793
* Fix a use-after-free in the coro.alloca treatment.John McCall2019-08-141-0/+28
| | | | llvm-svn: 368792
* Add intrinsics for doing frame-bound dynamic allocations within a coroutine.John McCall2019-08-141-0/+219
| | | | | | | These rely on having an allocator provided to the coroutine and thus, for now, only work in retcon lowerings. llvm-svn: 368791
* Generalize llvm.coro.suspend.retcon to allow an arbitrary number of ↵John McCall2019-08-146-39/+220
| | | | | | arguments to be passed back to the continuation function. llvm-svn: 368789
* Extend coroutines to support a "returned continuation" lowering.John McCall2019-08-145-3/+388
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A quick contrast of this ABI with the currently-implemented ABI: - Allocation is implicitly managed by the lowering passes, which is fine for frontends that are fine with assuming that allocation cannot fail. This assumption is necessary to implement dynamic allocas anyway. - The lowering attempts to fit the coroutine frame into an opaque, statically-sized buffer before falling back on allocation; the same buffer must be provided to every resume point. A buffer must be at least pointer-sized. - The resume and destroy functions have been combined; the continuation function takes a parameter indicating whether it has succeeded. - Conversely, every suspend point begins its own continuation function. - The continuation function pointer is directly returned to the caller instead of being stored in the frame. The continuation can therefore directly destroy the frame when exiting the coroutine instead of having to leave it in a defunct state. - Other values can be returned directly to the caller instead of going through a promise allocation. The frontend provides a "prototype" function declaration from which the type, calling convention, and attributes of the continuation functions are taken. - On the caller side, the frontend can generate natural IR that directly uses the continuation functions as long as it prevents IPO with the coroutine until lowering has happened. In combination with the point above, the frontend is almost totally in charge of the ABI of the coroutine. - Unique-yield coroutines are given some special treatment. llvm-svn: 368788
* [SimplifyLibCalls] Add noalias from known callsitesDavid Bolvansky2019-08-1314-38/+118
| | | | | | | | | | | | | | | | | | Summary: Should be fine for memcpy, strcpy, strncpy. Reviewers: jdoerfert, efriedma Reviewed By: jdoerfert Subscribers: uenoku, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66135 llvm-svn: 368724
* [ValueTracking] Improve reverse assumption inferenceNikita Popov2019-08-131-2/+2
| | | | | | | | | | | | | | | Use isGuaranteedToTransferExecutionToSuccessor() instead of isSafeToSpeculativelyExecute() when seeing whether we can propagate the information in an assume backwards in isValidAssumeForContext(). The latter is more general - it also allows arbitrary loads/stores - and is also the condition we want: if our assume is guaranteed to execute, its condition not holding would be UB. Original patch by arielb1. Differential Revision: https://reviews.llvm.org/D37215 llvm-svn: 368723
* [NFC] Revisited/updated testsDavid Bolvansky2019-08-131-4/+22
| | | | llvm-svn: 368722
* [SLC] Improve dereferenceable bytes annotationDavid Bolvansky2019-08-131-1/+1
| | | | llvm-svn: 368715
* [InstCombine] Non-canonical clamp-like pattern handlingRoman Lebedev2019-08-133-155/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Given a pattern like: ``` %old_cmp1 = icmp slt i32 %x, C2 %old_replacement = select i1 %old_cmp1, i32 %target_low, i32 %target_high %old_x_offseted = add i32 %x, C1 %old_cmp0 = icmp ult i32 %old_x_offseted, C0 %r = select i1 %old_cmp0, i32 %x, i32 %old_replacement ``` it can be rewritten as more canonical pattern: ``` %new_cmp1 = icmp slt i32 %x, -C1 %new_cmp2 = icmp sge i32 %x, C0-C1 %new_clamped_low = select i1 %new_cmp1, i32 %target_low, i32 %x %r = select i1 %new_cmp2, i32 %target_high, i32 %new_clamped_low ``` Iff `-C1 s<= C2 s<= C0-C1` Also, `ULT` predicate can also be `UGE`; or `UGT` iff `C0 != -1` (+invert result) Also, `SLT` predicate can also be `SGE`; or `SGT` iff `C2 != INT_MAX` (+invert result) If `C1 == 0`, then all 3 instructions must be one-use; else at most either `%old_cmp1` or `%old_x_offseted` can have extra uses. NOTE: if we could reuse `%old_cmp1` as one of the comparisons we'll have to build, this could be less limiting. So there are two icmp's, each one with 3 predicate variants, so there are 9 fold variants: | | ULT | UGE | UGT | | SLT | https://rise4fun.com/Alive/yIJ | https://rise4fun.com/Alive/5BfN | https://rise4fun.com/Alive/INH | | SGE | https://rise4fun.com/Alive/hd8 | https://rise4fun.com/Alive/Abk | https://rise4fun.com/Alive/PlzS | | SGT | https://rise4fun.com/Alive/VYG | https://rise4fun.com/Alive/oMY | https://rise4fun.com/Alive/KrzC | {F9730206} This fold was brought up in https://reviews.llvm.org/D65148#1603922 by @dmgreen, and is needed to unblock that patch. This patch requires D65530. Reviewers: spatel, nikic, xbolva00, dmgreen Reviewed By: spatel Subscribers: hiraditya, llvm-commits, dmgreen Tags: #llvm Differential Revision: https://reviews.llvm.org/D65765 llvm-svn: 368687
* [InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if ↵Roman Lebedev2019-08-132-43/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | all users are freely invertible Summary: This is rather unconventional.. As the comment there says, we don't have much folds for xor-of-icmps, we try to turn them into an and-of-icmps, for which we have plenty of folds. But if the ICmp we need to invert is not single-use - we give up. As discussed in https://reviews.llvm.org/D65148#1603922, we may have a non-canonical CLAMP pattern, with bit match and select-of-threshold that we'll potentially clamp. As it can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`, out of all 8 variations of the pattern, only two are **not** canonicalized into the variant with and+icmp instead of bit math. The reason is because the ICmp we need to invert is not single-use - we give up. We indeed can't perform this fold at will, the general rule is that we should not increase instruction count in InstCombine, But we wouldn't end up increasing instruction count if we can adapt every other user to the inverted value. This way the `not` we create **will** get folded, and in the end the instruction count did not increase. For that, of course, we need to look at the users of a Value, which is again rather unconventional for InstCombine :S Thus i'm proposing to be a little bit more insistive in `foldXorOfICmps()`. The alternatives would be to not create that `not`, but add duplicate code to manually invert all users; or to add some even less general combine to handle some more specific pattern[s]. Reviewers: spatel, nikic, RKSimon, craig.topper Reviewed By: spatel Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65530 llvm-svn: 368685
* [SimplifyLibCalls] Add dereferenceable bytes from known callsitesDavid Bolvansky2019-08-1319-344/+808
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: int mm(char *a, char *b) { return memcmp(a,b,16); } Currently: define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 { entry: %call = tail call i32 @memcmp(i8* %a, i8* %b, i64 16) ret i32 %call } After patch: define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 { entry: %call = tail call i32 @memcmp(i8* dereferenceable(16) %a, i8* dereferenceable(16) %b, i64 16) ret i32 %call } Reviewers: jdoerfert, efriedma Reviewed By: jdoerfert Subscribers: javed.absar, spatel, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66079 llvm-svn: 368657
* [NFC][InstCombine] Non-canonical clamp pattern: non-canonical predicate testsRoman Lebedev2019-08-131-0/+43
| | | | | | | | | | We can't handle 'uge' case because we can't ever get it, there needs to be extra use on that compare or else it will be canonicalized, but because of extra use we can't handle it. 'sge' case we can have. llvm-svn: 368656
* [InstCombine] add tests for scalar-select-of-vectors; NFCSanjay Patel2019-08-121-21/+82
| | | | llvm-svn: 368583
* [InstCombine] x /c fabs(x) -> copysign(1.0, x)David Bolvansky2019-08-121-11/+28
| | | | | | | | | | | | | | | | | | Summary: x / fabs(x) -> copysign(1.0, x) fabs(x) / x -> copysign(1.0, x) Reviewers: spatel, foad, RKSimon, efriedma Reviewed By: spatel Subscribers: lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65898 llvm-svn: 368570
* [InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): avoid ↵Roman Lebedev2019-08-121-0/+12
| | | | | | | | | | | constantexpr pitfail (PR42962) Instead of matching value and then blindly casting to BinaryOperator just to get the opcode, just match instruction and do no cast. Fixes https://bugs.llvm.org/show_bug.cgi?id=42962 llvm-svn: 368554
* [MVE] Don't try to unroll vectorised MVE loopsDavid Green2019-08-111-0/+127
| | | | | | | | | | | | | | | Due to the nature of the beat system in the MVE architecture, along with tail predication and low-overhead loops, unrolling has less benefit compared to normal loops. You can not, for example, hide the latency of a load with other instructions as you can for scalar code. Preventing unrolling also makes the code easier to read and reason about. So if a loop contains vector code, don't enable the runtime unrolling. At least for the time being. Differential Revision: https://reviews.llvm.org/D65803 llvm-svn: 368530
* [ARM] Permit auto-vectorization using MVEDavid Green2019-08-111-0/+5
| | | | | | | | | | | | | With enough codegen complete, we can now correctly report the number and size of vector registers for MVE, allowing auto vectorisation. This also allows FP auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE FP arithmetic and parity between scalar and vector FP behaviour. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63728 llvm-svn: 368529
* [LICM] Make Loop ICM profile awareWenlei He2019-08-111-2/+8
| | | | | | | | | | | | | | | | | | | | | Summary: Hoisting/sinking instruction out of a loop isn't always beneficial. Hoisting an instruction from a cold block inside a loop body out of the loop could hurt performance. This change makes Loop ICM profile aware - it now checks block frequency to make sure hoisting/sinking anly moves instruction to colder block. Test Plan: ninja check Reviewers: asbirlea, sanjoy, reames, nikic, hfinkel, vsk Reviewed By: asbirlea Subscribers: fhahn, vsk, davidxl, xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65060 llvm-svn: 368526
* [NFC][InstCombine] Tests for shift amount reassociation in bittest with ↵Roman Lebedev2019-08-101-0/+475
| | | | | | | | | | | | | truncated shl (PR42399) trunc-of-shl: https://rise4fun.com/Alive/zGx https://rise4fun.com/Alive/sl0L I.e. no extra legality check needed. https://bugs.llvm.org/show_bug.cgi?id=42399 llvm-svn: 368520
* [InstCombine] Shift amount reassociation in bittest: relax one-use check ↵Roman Lebedev2019-08-101-6/+4
| | | | | | | | | | when shifting constant If one of the values being shifted is a constant, since the new shift amount is known-constant, the new shift will end up being constant-folded so, we don't need that one-use restriction then. llvm-svn: 368519
* [InstCombine] Shift amount reassociation in bittest: drop pointless one-use ↵Roman Lebedev2019-08-101-6/+6
| | | | | | | | | | restriction That one-use restriction is not needed for correctness - we have already ensured that one of the shifts will go away, so we know we won't increase the instruction count. So there is no need for that restriction. llvm-svn: 368518
* [NFC][InstCombine] Tests for shift amount reassociation in bittest with ↵Roman Lebedev2019-08-101-10/+67
| | | | | | shift of const llvm-svn: 368517
* [Reassociate] try harder to convert negative FP constants to positiveSanjay Patel2019-08-103-86/+85
| | | | | | | | | | | | | | | | | | | | | | This is an extension of a transform that tries to produce positive floating-point constants to improve canonicalization (and hopefully lead to more reassociation and CSE). The original patches were: D4904 D5363 (rL221721) But as the test diffs show, these were limited to basic patterns by walking from an instruction to its single user rather than recursively moving up the def-use sequence. No fast-math is required here because we're only rearranging implicit FP negations in intermediate ops. A motivating bug is: https://bugs.llvm.org/show_bug.cgi?id=32939 Differential Revision: https://reviews.llvm.org/D65954 llvm-svn: 368512
* cfi-icall: Allow the jump table to be optionally made non-canonical.Peter Collingbourne2019-08-092-5/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The default behavior of Clang's indirect function call checker will replace the address of each CFI-checked function in the output file's symbol table with the address of a jump table entry which will pass CFI checks. We refer to this as making the jump table `canonical`. This property allows code that was not compiled with ``-fsanitize=cfi-icall`` to take a CFI-valid address of a function, but it comes with a couple of caveats that are especially relevant for users of cross-DSO CFI: - There is a performance and code size overhead associated with each exported function, because each such function must have an associated jump table entry, which must be emitted even in the common case where the function is never address-taken anywhere in the program, and must be used even for direct calls between DSOs, in addition to the PLT overhead. - There is no good way to take a CFI-valid address of a function written in assembly or a language not supported by Clang. The reason is that the code generator would need to insert a jump table in order to form a CFI-valid address for assembly functions, but there is no way in general for the code generator to determine the language of the function. This may be possible with LTO in the intra-DSO case, but in the cross-DSO case the only information available is the function declaration. One possible solution is to add a C wrapper for each assembly function, but these wrappers can present a significant maintenance burden for heavy users of assembly in addition to adding runtime overhead. For these reasons, we provide the option of making the jump table non-canonical with the flag ``-fno-sanitize-cfi-canonical-jump-tables``. When the jump table is made non-canonical, symbol table entries point directly to the function body. Any instances of a function's address being taken in C will be replaced with a jump table address. This scheme does have its own caveats, however. It does end up breaking function address equality more aggressively than the default behavior, especially in cross-DSO mode which normally preserves function address equality entirely. Furthermore, it is occasionally necessary for code not compiled with ``-fsanitize=cfi-icall`` to take a function address that is valid for CFI. For example, this is necessary when a function's address is taken by assembly code and then called by CFI-checking C code. The ``__attribute__((cfi_jump_table_canonical))`` attribute may be used to make the jump table entry of a specific function canonical so that the external code will end up taking a address for the function that will pass CFI checks. Fixes PR41972. Differential Revision: https://reviews.llvm.org/D65629 llvm-svn: 368495
* [NFC] Added tests for D65898David Bolvansky2019-08-091-0/+24
| | | | llvm-svn: 368447
* [GlobalOpt] prevent crashing on large integer types (PR42932)Sanjay Patel2019-08-091-0/+23
| | | | | | | | | | | This is a minimal fix (copy the predicate for the assert) to prevent the crashing seen in: https://bugs.llvm.org/show_bug.cgi?id=42932 ...when converting a constant integer of arbitrary width to uint64_t. Differential Revision: https://reviews.llvm.org/D65970 llvm-svn: 368437
* [InstSimplify] Report "Changed" also when only deleting dead instructionsBjorn Pettersson2019-08-091-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Make sure that we report that changes has been made by InstSimplify also in situations when only trivially dead instructions has been removed. If for example a call is removed the call graph must be updated. Bug seem to have been introduced by llvm-svn r367173 (commit 02b9e45a7e4b81), since the code in question was rewritten in that commit. Reviewers: spatel, chandlerc, foad Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65973 llvm-svn: 368401
* [InstCombine][NFC] Added comments about constants in tests for pow->exp2 foldDavid Bolvansky2019-08-082-0/+2
| | | | llvm-svn: 368360
* [LICM] Support unary FNeg in LICMCameron McInally2019-08-081-2/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D65908 llvm-svn: 368350
* Add llvm.licm.disable metadataTim Corringham2019-08-081-0/+33
| | | | | | | | | | | | | | For some targets the LICM pass can result in sub-optimal code in some cases where it would be better not to run the pass, but it isn't always possible to suppress the transformations heuristically. Where the front-end has insight into such cases it is beneficial to attach loop metadata to disable the pass - this change adds the llvm.licm.disable metadata to enable that. Differential Revision: https://reviews.llvm.org/D64557 llvm-svn: 368296
* [Reassociate] add more tests with negative FP constants; NFCSanjay Patel2019-08-081-0/+332
| | | | llvm-svn: 368290
* [ScalarizeMaskedMemIntrin] Add test case for expanding scatter.Craig Topper2019-08-071-0/+64
| | | | | | | This pass expands 6 intrinsics, but we only had test for 5 of them. llvm-svn: 368234
* [Attributor] Provide easier checkForallReturnedValues functionalityJohannes Doerfert2019-08-072-3/+3
| | | | | | | | | | | | | | | | | | | Summary: So far, whenever one wants to look at returned values, one had to deal with the AAReturnedValues and potentially with the AAIsDead attribute. In the same spirit as other checkForAllXXX methods, we add this functionality now to the Attributor. By adopting the use sites we got better results when return instructions were dead. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65733 llvm-svn: 368222
* [LoopVectorize][X86] Clamp interleave factor if we have a known constant ↵Craig Topper2019-08-071-15/+35
| | | | | | | | | | | | trip count that is less than VF*interleave If we know the trip count, we should make sure the interleave factor won't cause the vectorized loop to exceed it. Improves one of the cases from PR42674 Differential Revision: https://reviews.llvm.org/D65896 llvm-svn: 368215
* [NFC][LICM] Pre-commit test for unary FNeg support in LICM.Cameron McInally2019-08-071-0/+23
| | | | llvm-svn: 368211
* [NFC] Fixed newly added testsDavid Bolvansky2019-08-071-6/+9
| | | | llvm-svn: 368201
* [NFC] Added tests for x/fabs(X) foldDavid Bolvansky2019-08-071-0/+75
| | | | llvm-svn: 368200
* [LoopVectorize][X86] Add test case for missed vectorization from PR42674.Craig Topper2019-08-071-0/+41
| | | | | | | We do end vectorizing the code, but use an interleave factor that is too high and causes the vector code to be dead. llvm-svn: 368197
* [ValueTracking] When calculating known bits for integer abs, make sure we're ↵Craig Topper2019-08-071-1/+6
| | | | | | | | | | | | | | | looking at a negate and not just any instruction with the nsw flag set. The matchSelectPattern code can match patterns like (x >= 0) ? x : -x for absolute value. But it can also match ((x-y) >= 0) ? (x-y) : (y-x). If the latter form was matched we can only use the nsw flag if its set on both subtracts. This match makes sure we're looking at the former case only. Differential Revision: https://reviews.llvm.org/D65692 llvm-svn: 368195
* Recommit r367901 "[X86] Enable ↵Craig Topper2019-08-076-147/+246
| | | | | | | | | | | | | | | | | | | | | | | | | | | -x86-experimental-vector-widening-legalization by default." The assert that caused this to be reverted should be fixed now. Original commit message: This patch changes our defualt legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the elements widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 368183
* [InstCombine] Add a TODO commentJay Foad2019-08-071-0/+1
| | | | llvm-svn: 368176
* [InstCombine] Propagate fast math flags through selectsJay Foad2019-08-071-11/+4
| | | | | | | | | | | | | | | | | Summary: In SimplifySelectsFeedingBinaryOp, propagate fast math flags from the outer op into both arms of the new select, to take advantage of simplifications that require fast math flags. Reviewers: mcberg2017, majnemer, spatel, arsenm, xbolva00 Subscribers: wdng, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65658 llvm-svn: 368175
OpenPOWER on IntegriCloud