summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* InstCombine: Fold fabs on select of constantsMatt Arsenault2017-01-031-0/+12
| | | | llvm-svn: 290913
* [InstCombine] use 'match' to reduce code bloat; NFCISanjay Patel2017-01-031-15/+11
| | | | | | | | | | | I wrote this patch before seeing the comment in: https://reviews.llvm.org/D27114 ...that suggests we should actually be canonicalizing the other way. So just in case we decide this is the right way, we might as well have a cleaner implementation. llvm-svn: 290912
* InstCombine: Add fma with constant transformsMatt Arsenault2017-01-031-3/+17
| | | | | | DAGCombine already does these. llvm-svn: 290860
* InstCombine: Add fma + fabs/fneg transformsMatt Arsenault2017-01-031-0/+30
| | | | | | | fma (fneg x), (fneg y), z -> fma x, y, z fma (fabs x), (fabs x), z -> fma x, x, z llvm-svn: 290859
* [InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and ↵Craig Topper2016-12-311-0/+2
| | | | | | | | llvm.x86.avx512.vcomi.ss don't use the upper elements of their input. This was already done for the SSE/SSE2 version of the intrinsics. llvm-svn: 290776
* [InstCombine][AVX-512] When turning intrinsics with masking into native IR, ↵Craig Topper2016-12-301-9/+20
| | | | | | | | don't emit a select if the mask is known to be all ones. This saves InstCombine the burden of having to optimize the select later. llvm-svn: 290774
* [InstCombine][X86] Add DemandedElts support for 512-bit PMULDQ/PMULUDQ ↵Craig Topper2016-12-271-1/+3
| | | | | | | | | | instructions PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use. This builds on r290554 which added supported for 128 and 256-bit. llvm-svn: 290582
* [AVX-512][InstCombine] Teach InstCombine to turn masked scalar ↵Craig Topper2016-12-271-37/+42
| | | | | | | | add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION. An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed. llvm-svn: 290566
* [AVX-512][InstCombine] Teach InstCombine to turn packed add/sub/mul/div with ↵Craig Topper2016-12-271-0/+44
| | | | | | rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION. llvm-svn: 290559
* [InstCombine][X86] Add DemandedElts support for PMULDQ/PMULUDQ instructionsSimon Pilgrim2016-12-261-0/+15
| | | | | | | | PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use. Differential Revision: https://reviews.llvm.org/D28119 llvm-svn: 290554
* [AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with ↵Craig Topper2016-12-261-3/+49
| | | | | | | | | | | | | | | | | rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION. Summary: I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon. I'll do the same thing for packed add/sub/mul/div in a future patch. Reviewers: delena, RKSimon, zvi, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27879 llvm-svn: 290535
* [AVX-512][InstCombine] Teach InstCombine to converted masked vpermv ↵Craig Topper2016-12-251-4/+50
| | | | | | | | | | | | | | | | | intrinsics into shufflevector instructions Summary: This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants. We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that. Reviewers: zvi, delena, spatel, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27825 llvm-svn: 290530
* [Analysis] Centralize objectsize lowering logic.George Burgess IV2016-12-201-10/+5
| | | | | | | | | We're currently doing nearly the same thing for @llvm.objectsize in three different places: two of them are missing checks for overflow, and one of them could subtly break if InstCombine gets much smarter about removing alloc sites. Seems like a good idea to not do that. llvm-svn: 290214
* Revert @llvm.assume with operator bundles (r289755-r289757)Daniel Jasper2016-12-191-83/+11
| | | | | | | This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086
* [AVX-512][InstCombine] Add masked scalar FMA intrinsics to ↵Craig Topper2016-12-151-0/+10
| | | | | | SimplifyDemandedVectorElts. llvm-svn: 289759
* Remove the AssumptionCacheHal Finkel2016-12-151-11/+11
| | | | | | | | | After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756
* Make processing @llvm.assume more efficient by using operand bundlesHal Finkel2016-12-151-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | There was an efficiency problem with how we processed @llvm.assume in ValueTracking (and other places). The AssumptionCache tracked all of the assumptions in a given function. In order to find assumptions relevant to computing known bits, etc. we searched every assumption in the function. For ValueTracking, that means that we did O(#assumes * #values) work in InstCombine and other passes (with a constant factor that can be quite large because we'd repeat this search at every level of recursion of the analysis). Several of us discussed this situation at the last developers' meeting, and this implements the discussed solution: Make the values that an assume might affect operands of the assume itself. To avoid exposing this detail to frontends and passes that need not worry about it, I've used the new operand-bundle feature to add these extra call "operands" in a way that does not affect the intrinsic's signature. I think this solution is relatively clean. InstCombine adds these extra operands based on what ValueTracking, LVI, etc. will need and then those passes need only search the users of the values under consideration. This should fix the computational-complexity problem. At this point, no passes depend on the AssumptionCache, and so I'll remove that as a follow-up change. Differential Revision: https://reviews.llvm.org/D27259 llvm-svn: 289755
* [X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar ↵Craig Topper2016-12-141-1/+17
| | | | | | floating point to integer conversion intrinsics. llvm-svn: 289639
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle masked scalar ↵Craig Topper2016-12-141-20/+2
| | | | | | | | add/sub/mul/div/max/min intrinsics better. Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking. llvm-svn: 289636
* [X86][InstCombine] Handle scalar fmadd intrinsics correctly in ↵Craig Topper2016-12-141-8/+8
| | | | | | | | SimplifyDemandedVectorElts. Now we pass a modified version of DemandedElts to each operand and we calculate undef elts correctly. llvm-svn: 289632
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar round ↵Craig Topper2016-12-141-27/+2
| | | | | | | | | | | | intrinsics more correctly. Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0. Also calculate UndefElts correctly. Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support. llvm-svn: 289629
* [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar ↵Craig Topper2016-12-141-17/+6
| | | | | | | | | | | | min/max/cmp intrinsics more correctly. Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Also calculate UndefElts correctly. Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support. llvm-svn: 289628
* [X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar ↵Craig Topper2016-12-131-0/+13
| | | | | | | | | | intrinsics correctly. Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed. Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support. llvm-svn: 289523
* [X86][InstCombine] Teach InstCombineCalls to simplify demanded elements for ↵Craig Topper2016-12-111-0/+8
| | | | | | | | scalar FMA intrinsics. These intrinsics don't read the upper bits of their second and third inputs so we can try to simplify them. llvm-svn: 289372
* [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded for ↵Craig Topper2016-12-111-1/+3
| | | | | | | | scalar cmp intrinsics with masking and rounding. These intrinsics don't read the upper elements of their first and second input. These are slightly different the the SSE version which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead. llvm-svn: 289371
* [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded ↵Craig Topper2016-12-111-0/+31
| | | | | | | | elements for scalar add,div,mul,sub,max,min intrinsics with masking and rounding. These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well. llvm-svn: 289370
* [AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls ↵Craig Topper2016-12-111-10/+10
| | | | | | to match 128 and 256-bit. llvm-svn: 289354
* [X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a ↵Craig Topper2016-12-111-2/+3
| | | | | | shufflevector if the indices are constant. llvm-svn: 289348
* Replace some callers of setTailCall with setTailCallKindDavid Majnemer2016-11-251-6/+5
| | | | | | | We were a little sloppy with adding tailcall markers. Be more consistent by using setTailCallKind instead of setTailCall. llvm-svn: 287955
* [InstCombine][AVX-512] Teach InstCombineCalls how to handle the intrinsics ↵Craig Topper2016-11-181-0/+18
| | | | | | | | for variable shift with 16-bit elements. This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches. llvm-svn: 287316
* [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmulCraig Topper2016-11-161-8/+0
| | | | | | | | | | | | Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file. Reviewers: zvi, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26660 llvm-svn: 287083
* [InstCombine][AVX-512] Teach InstCombineCalls to handle the new unmasked ↵Craig Topper2016-11-131-4/+18
| | | | | | AVX-512 variable shift intrinsics. llvm-svn: 286755
* [InstCombine][AVX-512] Expand vector shift handling to work on the AVX-512 ↵Craig Topper2016-11-131-1/+45
| | | | | | | | shift by immediate and shift by single value. This does not include support for the AVX-512 variable shifts. That will be coming in a future patch. llvm-svn: 286739
* [InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining ↵Andrea Di Biagio2016-09-071-3/+3
| | | | | | | | | | | logic. This fixes a similar issue to the one already fixed by r280804 (revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts in the extrq/extrqi combining logic. However, it turns out that even the insertq/insertqi logic was affected by the same problem. llvm-svn: 280807
* [InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the ↵Andrea Di Biagio2016-09-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | operands of extrq/extrqi intrinsic calls. This patch fixes an assertion failure caused by unsafe dynamic casts on the constant operands of sse4a intrinsic calls to extrq/extrqi The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently checks if the input operands are constants. Internally, that logic relies on dyn_casts of values returned by calls to method Constant::getAggregateElement. However, method getAggregateElemet may return nullptr if the constant element cannot be retrieved. So, all the dyn_casts can potentially fail. This is what happens for example if a constexpr value is passed in input to an extrq/extrqi intrinsic call. This patch fixes the problem by using a dyn_cast_or_null (instead of a simple dyn_cast) on the result of each call to Constant::getAggregateElement. Added reproducible test cases to x86-sse4a.ll. Differential Revision: https://reviews.llvm.org/D24256 llvm-svn: 280804
* [InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacingDorit Nuzman2016-09-041-0/+6
| | | | | | | | | | | | memcpy with ld/st. When InstCombine replaces a memcpy with loads+stores it does not copy over the llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes that. Differential Revision: https://reviews.llvm.org/D23499 llvm-svn: 280617
* Test commit.Dorit Nuzman2016-09-041-0/+1
| | | | llvm-svn: 280615
* AMDGPU: Do basic folding of class intrinsicMatt Arsenault2016-09-031-0/+79
| | | | | | | This allows more of the OCML builtin library to be constant folded. llvm-svn: 280586
* Make cltz and cttz zero undef when the operand cannot be zero in InstCombineAmaury Sechet2016-08-181-5/+20
| | | | | | | | | | | | Summary: Also add popcount(n) == bitsize(n) -> n == -1 transformation. Reviewers: majnemer, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23134 llvm-svn: 279141
* Replace a few more "fall through" comments with LLVM_FALLTHROUGHJustin Bogner2016-08-171-1/+1
| | | | | | Follow up to r278902. I had missed "fall through", with a space. llvm-svn: 278970
* Fix some Clang-tidy modernize and Include What You Use warnings.Eugene Zelenko2016-08-111-11/+39
| | | | | | Differential revision: https://reviews.llvm.org/D23291 llvm-svn: 278364
* fix comment; NFCSanjay Patel2016-08-111-2/+3
| | | | llvm-svn: 278342
* use auto* with dyn_cast ; NFCSanjay Patel2016-08-111-2/+1
| | | | llvm-svn: 278340
* getParent()->getParent() == getFunction() ; NFCSanjay Patel2016-08-111-2/+1
| | | | llvm-svn: 278339
* [InstCombine] refactor ctlz/cttz folds (NFCI)Sanjay Patel2016-08-051-34/+33
| | | | | | | | | | Note that this fold really belongs in InstSimplify. Refactoring here anyway as an intermediate step because there's a planned addition to this function in D23134. Differential Revision: https://reviews.llvm.org/D23223 llvm-svn: 277883
* InstCombine: Replace some never-null pointers with references. NFCJustin Bogner2016-08-051-24/+25
| | | | llvm-svn: 277792
* Do not remove empty lifetime.start/lifetime.end rangesVitaly Buka2016-07-281-0/+5
| | | | | | | | | | | | | | | | | | | | | | | Summary: Asan stack-use-after-scope check should poison alloca even if there is no access between start and end. This is possible for code like this: for (int i = 0; i < 3; i++) { int x; p = &x; } "Loop Invariant Code Motion" will move "p = &x;" out of the loop, making start/end range empty. PR27453 Reviewers: eugenis Differential Revision: https://reviews.llvm.org/D22842 llvm-svn: 277072
* Should be committed as one CL.Vitaly Buka2016-07-281-5/+0
| | | | | | This reverts commits r277068 r277067 r277066. llvm-svn: 277071
* Do not remove empty lifetime.start/lifetime.end rangesVitaly Buka2016-07-281-8/+5
| | | | | | | | | | | | | | | | | | | | | | | Summary: Asan stack-use-after-scope check should poison alloca even if there is no access between start and end. This is possible for code like this: for (int i = 0; i < 3; i++) { int x; p = &x; } "Loop Invariant Code Motion" will move "p = &x;" out of the loop, making start/end range empty. PR27453 Reviewers: eugenis Differential Revision: https://reviews.llvm.org/D22842 llvm-svn: 277068
* manedVitaly Buka2016-07-281-6/+8
| | | | llvm-svn: 277067
OpenPOWER on IntegriCloud