path: root/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
Commit message (author, age, files, lines)
...
* [APInt] Use lshrInPlace to replace lshr where possible (Craig Topper, 2017-04-18, 1 file, -2/+3)
| This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.
| This adds an lshrInPlace(const APInt &) version as well.
| Differential Revision: https://reviews.llvm.org/D32155
| llvm-svn: 300566
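The difference between the two call styles can be sketched with a toy type. This is only an illustration: ToyInt is a hypothetical 64-bit stand-in, not llvm::APInt, and it assumes shift amounts below 64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an arbitrary-precision integer; NOT llvm::APInt.
struct ToyInt {
  std::uint64_t Bits;

  // Returns a shifted copy and leaves *this untouched.
  ToyInt lshr(unsigned Amt) const {
    assert(Amt < 64 && "toy type only models shifts below 64");
    return ToyInt{Bits >> Amt};
  }

  // Shifts the object itself, so no temporary copy is created.
  void lshrInPlace(unsigned Amt) {
    assert(Amt < 64 && "toy type only models shifts below 64");
    Bits >>= Amt;
  }
};
```

When the caller writes `X = X.lshr(N)`, the copy is immediately thrown away, which is the pattern the patch replaces with `X.lshrInPlace(N)`.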
* [X86][X86 intrinsics]Folding cmp(sub(a,b),0) into cmp(a,b) optimization (Michael Zuckerman, 2017-04-16, 1 file, -0/+31)
| This patch adds a new optimization (folding cmp(sub(a,b),0) into cmp(a,b)) to the instCombineCall pass; it was written specifically for X86 CMP intrinsics.
| Differential Revision: https://reviews.llvm.org/D31398
| llvm-svn: 300422
* [IR] Make paramHasAttr to use arg indices instead of attr indices (Reid Kleckner, 2017-04-14, 1 file, -1/+1)
| This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.
| Previously we were testing return value attributes with index 0, so I introduced hasReturnAttr() for that use case.
| llvm-svn: 300367
* [IR] Make getParamAttributes take argument numbers, not ArgNo+1 (Reid Kleckner, 2017-04-13, 1 file, -16/+19)
| Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1, Kind) everywhere.
| The fact that the AttributeList index for an argument is ArgNo+1 should be a hidden implementation detail.
| NFC
| llvm-svn: 300272
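The indexing convention these two commits hide can be modelled with a small toy class. This is a sketch under stated assumptions: ToyAttributeList and its raw-index methods are hypothetical and only mimic the layout described in the messages (return attributes at index 0, argument ArgNo at index ArgNo+1); it is not the LLVM class.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model of an attribute list; NOT llvm::AttributeList.
// Slot 0 holds return attributes, slot ArgNo + 1 holds argument ArgNo.
class ToyAttributeList {
  std::map<unsigned, std::set<std::string>> Slots;

public:
  static constexpr unsigned ReturnIndex = 0;

  // Raw accessors that expose the internal slot numbering.
  void addAttributeAtIndex(unsigned Index, const std::string &Kind) {
    Slots[Index].insert(Kind);
  }
  bool hasAttributeAtIndex(unsigned Index, const std::string &Kind) const {
    auto It = Slots.find(Index);
    return It != Slots.end() && It->second.count(Kind) != 0;
  }

  // Argument-number based helper: the "+ 1" appears only here.
  bool hasParamAttribute(unsigned ArgNo, const std::string &Kind) const {
    return hasAttributeAtIndex(ArgNo + 1, Kind);
  }

  // Separate helper for the return value, instead of passing index 0.
  bool hasReturnAttr(const std::string &Kind) const {
    return hasAttributeAtIndex(ReturnIndex, Kind);
  }
};
```

Callers that use only the two helpers never see the off-by-one, so the internal layout can change without touching them.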
* [InstCombine] Fix !prof metadata preservation for invokes (Reid Kleckner, 2017-04-13, 1 file, -18/+16)
| Summary: Bug noticed by inspection.
| Extend the test to handle invokes as well as calls, and rewrite it to not depend on the inliner and other passes.
| Also simplify the call site replacement code with CallSite, similar to what I did to dead arg elimination and arg promotion (rL300235 and rL300229).
| Reviewers: danielcdh, davidxl
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D32041
| llvm-svn: 300251
* [InstCombine] Simplify attribute code with new AttributeList::get NFC (Reid Kleckner, 2017-04-13, 1 file, -31/+20)
| llvm-svn: 300230
* [IR] Take func, ret, and arg attrs separately in AttributeList::get (Reid Kleckner, 2017-04-13, 1 file, -11/+7)
| This seems like a much more natural API, based on Derek Schuff's comments on r300015. It further hides the implementation detail of AttributeList that function attributes come last and appear at index ~0U, which is easy for the user to screw up.
| git diff says it saves code as well: 97 insertions(+), 137 deletions(-)
| This also makes it easier to change the implementation, which I want to do next.
| llvm-svn: 300153
* [IR] Add AttributeSet to hide AttributeSetNode* again, NFC (Reid Kleckner, 2017-04-12, 1 file, -4/+3)
| Summary: For now, it just wraps AttributeSetNode*. Eventually, it will hold AvailableAttrs as an inline bitset, and adding and removing enum attributes will be super cheap.
| This sinks AttributeSetNode back down to lib/IR/AttributeImpl.h.
| Reviewers: pete, chandlerc
| Subscribers: llvm-commits, jfb
| Differential Revision: https://reviews.llvm.org/D31940
| llvm-svn: 300014
* Reland "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies" (Reid Kleckner, 2017-04-10, 1 file, -30/+19)
| This re-lands r299875.
| I introduced a bug in Clang code responsible for replacing K&R, no prototype declarations with a real function definition with a prototype. The bug was here:
|     // Collect any return attributes from the call.
|   - if (oldAttrs.hasAttributes(llvm::AttributeList::ReturnIndex))
|   -   newAttrs.push_back(llvm::AttributeList::get(newFn->getContext(),
|   -                                               oldAttrs.getRetAttributes()));
|   + newAttrs.push_back(oldAttrs.getRetAttributes());
| Previously getRetAttributes() carried AttributeList::ReturnIndex in its AttributeList. Now that we return the AttributeSetNode* directly, it no longer carries that index, and we call this overload with a single node:
|   AttributeList::get(LLVMContext&, ArrayRef<AttributeSetNode*>)
| That aborted with an assertion on x86_32 targets. I added an explicit triple to the test and added CHECKs to help find issues like this in the future sooner.
| llvm-svn: 299899
* Revert "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies" (Reid Kleckner, 2017-04-10, 1 file, -19/+30)
| This reverts r299875. A Linux bot came back with a test failure:
| http://bb.pgr.jp/builders/test-clang-i686-linux-RA/builds/741/steps/test_clang/logs/Clang%20%3A%3A%20CodeGen__2006-05-19-SingleEltReturn.c
| llvm-svn: 299878
* [IR] Make AttributeSetNode public, avoid temporary AttributeList copies (Reid Kleckner, 2017-04-10, 1 file, -30/+19)
| Summary: AttributeList::get(Fn|Ret|Param)Attributes no longer creates a temporary AttributeList just to hide the AttributeSetNode type.
| I've also added a factory method to create AttributeLists from a parallel array of AttributeSetNodes. I think this simplifies construction of AttributeLists when rewriting function prototypes. Previously we would test if a particular index had attributes, and conditionally add a temporary attribute list to a vector. Now the attribute set vector is parallel to the argument vector that these passes already construct.
| My long term vision is to wrap AttributeSetNode* inside an AttributeSet type that holds the enum attributes, but that will come in a follow up change.
| I haven't done any performance measurements for this change because profiling hasn't shown that any of the affected code is hot.
| Reviewers: pete, chandlerc, sanjoy, hfinkel
| Reviewed By: pete
| Subscribers: jfb, llvm-commits
| Differential Revision: https://reviews.llvm.org/D31198
| llvm-svn: 299875
* Do not translate rint into nearbyint, but truncate it like nearbyint. (Joerg Sonnenberger, 2017-03-31, 1 file, -0/+1)
| A common way to implement nearbyint is by fiddling with the floating point environment and calling rint. This is used at least by the BSD libm and musl. As such, canonicalizing the latter to the former will create infinite loops for libm and generally pessimize performance, at least when the generic C versions are used.
| This change preserves the rint in the libcall translation and also handles the domain truncation logic, so that rint with float argument will be reduced to rintf etc.
| llvm-svn: 299247
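As a rough illustration of the pattern this message describes, the C++ sketch below builds a nearbyint-like function on top of rint by saving and restoring the floating-point environment. The name toy_nearbyint is made up for this example, and real libm implementations differ in detail.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical nearbyint built on rint; not taken from any particular libm.
double toy_nearbyint(double X) {
  std::fenv_t Env;
  // Save the current environment and clear the exception flags.
  std::feholdexcept(&Env);
  // rint rounds in the current rounding mode and may raise FE_INEXACT.
  double R = std::rint(X);
  // Restore the saved environment, discarding any flag rint raised.
  std::fesetenv(&Env);
  return R;
}
```

If a compiler canonicalized the rint call above into a nearbyint call, a libm written this way would end up calling itself, which is the infinite loop the commit avoids.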
* Fix the InstCombine to reserve the VP metadata and sets correct call count. (Dehao Chen, 2017-03-31, 1 file, -0/+6)
| Summary: Currently the VP metadata is dropped when InstCombine converts a call to a direct call. This patch converts the VP metadata to branch_weights so that its hotness is recorded.
| Reviewers: eraman, davidxl
| Reviewed By: davidxl
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D31344
| llvm-svn: 299228
* Spelling mistakes in comments. NFCI. (Simon Pilgrim, 2017-03-30, 1 file, -1/+1)
| Based on corrections mentioned in patch for clang for PR27635
| llvm-svn: 299072
* AMDGPU: Fold rcp/rsq of undef to undef (Matt Arsenault, 2017-03-24, 1 file, -2/+15)
| llvm-svn: 298725
* Rename AttributeSet to AttributeList (Reid Kleckner, 2017-03-21, 1 file, -35/+35)
| Summary: This class is a list of AttributeSetNodes corresponding to the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name.
| Rename AttributeSetImpl to AttributeListImpl to follow suit.
| It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value.
| Reviewers: sanjoy, javed.absar, chandlerc, pete
| Reviewed By: pete
| Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
| Differential Revision: https://reviews.llvm.org/D31102
| llvm-svn: 298393
* AMDGPU: Fold icmp/fcmp into icmp intrinsic (Matt Arsenault, 2017-03-13, 1 file, -0/+87)
| The typical use is a library vote function which compares to 0. Fold the user condition into the intrinsic.
| llvm-svn: 297650
* Remove sometimes faulty rewrite of memcpy in instcombine. (Mikael Holmen, 2017-03-01, 1 file, -55/+12)
| Summary: Solves PR 31990.
| The bad rewrite could replace a memcpy of one word with store i4 -1 while it should actually be store i8 -1.
| Hopefully opt and llc have improved enough that the original optimization done by the code isn't needed anymore.
| One already existing testcase is affected. It originally tested that the memcpy was replaced with load double, but since we now remove that rewrite it will be load i64 instead.
| Patch suggestion by Eli Friedman.
| Reviewers: eli.friedman, majnemer, efriedma
| Reviewed By: efriedma
| Subscribers: efriedma, llvm-commits
| Differential Revision: https://reviews.llvm.org/D30254
| llvm-svn: 296585
* AMDGPU: Basic folds for fmed3 intrinsic (Matt Arsenault, 2017-02-27, 1 file, -0/+76)
| Constant fold, canonicalize constants to RHS, reduce to minnum/maxnum when inputs are nan/undef.
| llvm-svn: 296409
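For readers unfamiliar with fmed3, the constant-fold case amounts to taking the median of three values. The sketch below shows one way to compute that for ordinary inputs; toyMed3 is a hypothetical helper, and the NaN/undef reductions mentioned in the message are deliberately not modelled.

```cpp
#include <algorithm>
#include <cassert>

// Median of three for non-NaN inputs only; hypothetical helper, not LLVM code.
double toyMed3(double A, double B, double C) {
  // The median is the larger of min(A, B) and min(max(A, B), C).
  return std::max(std::min(A, B), std::min(std::max(A, B), C));
}
```

The result does not depend on argument order, which is also why moving constant operands to the right-hand side is a safe canonicalization for a median.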
* AMDGPU: Replace disabled exp inputs with undef (Matt Arsenault, 2017-02-23, 1 file, -0/+28)
| llvm-svn: 295914
* AMDGPU: Add replacement bfe intrinsics (Matt Arsenault, 2017-02-22, 1 file, -0/+73)
| llvm-svn: 295899
* AMDGPU: Add cvt.pkrtz intrinsic (Matt Arsenault, 2017-02-22, 1 file, -0/+25)
| Convert llvm.SI.packf16 test uses
| llvm-svn: 295797
* InstCombine: Canonicalize fast fmuladd to fmul + fadd (Matt Arsenault, 2017-02-16, 1 file, -1/+14)
| llvm-svn: 295353
* [AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus intrinsics like it does 128/256-bit. (Craig Topper, 2017-02-16, 1 file, -2/+4)
| llvm-svn: 295294
* [InstComobineCalls] Fix buildbot failures after r294453. (Igor Laevsky, 2017-02-08, 1 file, -1/+1)
| Some targets don't support uint64_t options. Change type to unsigned.
| Differential Revision: https://reviews.llvm.org/D28909
| llvm-svn: 294461
* [InstCombineCalls] Unfold element atomic memcpy instruction (Igor Laevsky, 2017-02-08, 1 file, -0/+81)
| Differential Revision: https://reviews.llvm.org/D28909
| llvm-svn: 294453
* [InstCombineCalls] Remove zero length atomic memcpy intrinsics (Igor Laevsky, 2017-02-08, 1 file, -0/+6)
| Differential Revision: https://reviews.llvm.org/D28909
| llvm-svn: 294452
* [InstCombine] Allow InstCombine to merge adjacent guards (Sanjoy Das, 2017-02-01, 1 file, -6/+14)
| Summary: If there are two adjacent guards with different conditions, we can remove one of them and fold its condition into the condition of the other one. This patch allows InstCombine to merge them using the following pattern:
|   guard(a); guard(b) -> guard(a & b).
| Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy
| Reviewed By: sanjoy
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D29378
| llvm-svn: 293778
* [Instcombine] Combine consecutive identical fences (Davide Italiano, 2017-01-31, 1 file, -0/+9)
| Differential Revision: https://reviews.llvm.org/D29314
| llvm-svn: 293661
* [NVPTX] [InstCombine] Add llvm_unreachable to appease MSVC. (Justin Lebar, 2017-01-27, 1 file, -0/+1)
| llvm-svn: 293253
* [NVPTX] Fix use-after-stack-free bug in InstCombineCalls. (Justin Lebar, 2017-01-27, 1 file, -1/+1)
| Introduced in r293244.
| llvm-svn: 293251
* [NVPTX] Upgrade NVVM intrinsics in InstCombineCalls. (Justin Lebar, 2017-01-27, 1 file, -0/+250)
| Summary: There are many NVVM intrinsics that we can't entirely get rid of, but that nonetheless often correspond to target-generic LLVM intrinsics.
| For example, if flush denormals to zero (ftz) is enabled, we can convert @llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32. On the other hand, if ftz is disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a non-ftz PTX instruction. In this case, we can, however, simplify the non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32.
| These transformations are particularly useful because they let us constant fold instructions that appear in libdevice, the bitcode library that ships with CUDA and essentially functions as its libm.
| Reviewers: tra
| Subscribers: hfinkel, majnemer, llvm-commits
| Differential Revision: https://reviews.llvm.org/D28794
| llvm-svn: 293244
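The ftz rule in this message can be condensed into a one-line predicate. The sketch below is only a reading of the commit text: canReplaceWithGenericIntrinsic is a hypothetical name, and the case "non-ftz intrinsic while ftz is enabled" is not spelled out in the message, so rejecting it here is an assumption.

```cpp
#include <cassert>

// Hypothetical predicate; not the helper used in InstCombineCalls.cpp.
// IntrinsicIsFtz: the NVVM intrinsic is a ".ftz" variant.
// FtzEnabled: flush-denormals-to-zero is enabled, so the generic intrinsic
//             would be lowered to an ftz PTX instruction.
bool canReplaceWithGenericIntrinsic(bool IntrinsicIsFtz, bool FtzEnabled) {
  // The generic intrinsic keeps the original semantics only when its
  // lowering flushes denormals exactly when the NVVM intrinsic does.
  return IntrinsicIsFtz == FtzEnabled;
}
```

Under this reading, @llvm.nvvm.ceil.ftz.f maps to @llvm.ceil.f32 only with ftz enabled, and @llvm.nvvm.ceil.f maps to it only with ftz disabled.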
* Revert a couple of InstCombine/Guard checkins (Sanjoy Das, 2017-01-26, 1 file, -29/+0)
| This change reverts:
|   r293061: "[InstCombine] Canonicalize guards for NOT OR condition"
|   r293058: "[InstCombine] Canonicalize guards for AND condition"
| They miscompile cases like:
| ```
| declare void @llvm.experimental.guard(i1, ...)
| define void @test_guard_not_or(i1 %A, i1 %B) {
|   %C = or i1 %A, %B
|   %D = xor i1 %C, true
|   call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ]
|   ret void
| }
| ```
| because they do not transfer the `i32 20, i32 30` parameters to newly created guard instructions.
| llvm-svn: 293227
* [X86] Add demanded elts support for the inputs to pclmul intrinsic (Craig Topper, 2017-01-26, 1 file, -0/+38)
| This intrinsic uses bit 0 and bit 4 of an immediate argument to determine which bits of its inputs to read. This patch uses this information to simplify the demanded elements of the input vectors.
| Differential Revision: https://reviews.llvm.org/D28979
| llvm-svn: 293151
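How the immediate narrows the demanded elements can be sketched as follows. This assumes the usual PCLMULQDQ convention, in which bit 0 selects the 64-bit half of the first operand and bit 4 selects the half of the second; the struct and function names are hypothetical and the masks are plain integers rather than LLVM's APInt.

```cpp
#include <cassert>
#include <cstdint>

// Demanded-element masks for the two <2 x i64> operands.
// Bit i of a mask is set when element i of that operand is read.
struct ToyPclmulDemanded {
  unsigned Op0Mask;
  unsigned Op1Mask;
};

// Hypothetical helper; not the code added by the patch.
ToyPclmulDemanded pclmulDemandedElts(std::uint8_t Imm) {
  unsigned Op0Elt = Imm & 1;        // bit 0 picks the half of operand 0
  unsigned Op1Elt = (Imm >> 4) & 1; // bit 4 picks the half of operand 1
  return ToyPclmulDemanded{1u << Op0Elt, 1u << Op1Elt};
}
```

Only one element of each input is ever read, so the other element can be treated as not demanded when simplifying the operands.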
* [InstCombine] Canonicalize guards for NOT OR condition (Artur Pilipenko, 2017-01-25, 1 file, -0/+12)
| This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
| Reviewed By: apilipenko
| Differential Revision: https://reviews.llvm.org/D29075
| Patch by Maxim Kazantsev.
| llvm-svn: 293061
* [InstCombine][SSE] Add support for PACKSS/PACKUS constant folding (Simon Pilgrim, 2017-01-25, 1 file, -0/+94)
| Differential Revision: https://reviews.llvm.org/D28949
| llvm-svn: 293060
* [InstCombine] Canonicalize guards for AND condition (Artur Pilipenko, 2017-01-25, 1 file, -0/+17)
| This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
| Reviewed By: apilipenko
| Differential Revision: https://reviews.llvm.org/D29074
| Patch by Maxim Kazantsev.
| llvm-svn: 293058
* [InstCombine] Allow InstrCombine to remove one of adjacent guards if they are equivalent (Artur Pilipenko, 2017-01-25, 1 file, -0/+10)
| This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
| Reviewed By: majnemer, apilipenko
| Differential Revision: https://reviews.llvm.org/D29071
| Patch by Maxim Kazantsev.
| llvm-svn: 293056
* [InstCombine][X86] MULDQ/MULUDQ undef -> zero (Simon Pilgrim, 2017-01-24, 1 file, -1/+1)
| Added early out for single undef input. We were already supporting (and testing) this in the constant folding code; we just do it quicker now.
| Drop undef handling from demanded elts code now that we handle it fully in InstCombiner::visitCallInst.
| llvm-svn: 292913
* SimplifyLibCalls: Replace more unary libcalls with intrinsics (Matt Arsenault, 2017-01-23, 1 file, -1/+8)
| llvm-svn: 292855
* [InstCombine][X86] Add MULDQ/MULUDQ constant folding support (Simon Pilgrim, 2017-01-23, 1 file, -3/+40)
| llvm-svn: 292793
* [InstCombine][X86] MULDQ/MULUDQ undef -> zero (Simon Pilgrim, 2017-01-23, 1 file, -2/+2)
| Match generic mul behaviour so that <X x i64> multiply and muldq/muludq pattern act the same
| llvm-svn: 292784
* [InstCombine][X86] Add MULDQ/MULUDQ undef handling (Simon Pilgrim, 2017-01-20, 1 file, -0/+15)
| llvm-svn: 292627
* [InstCombine] Remove unnecessary intrinsics demanded elts handling (Simon Pilgrim, 2017-01-18, 1 file, -22/+2)
| As discussed on D28777 - we don't need to handle 'all element' shuffles inside InstCombiner::visitCallInst as InstCombiner::SimplifyDemandedVectorElts will do everything we need.
| llvm-svn: 292365
* [InstCombine][X86][AVX] Add DemandedElts support for VPERMILPD/VPERMILPS instructions (Simon Pilgrim, 2017-01-17, 1 file, -1/+11)
| Simplify a vpermilvar shuffle mask based on the elements of the mask that are actually demanded.
| llvm-svn: 292209
* SimplifyLibCalls: Replace fabs libcalls with intrinsics (Matt Arsenault, 2017-01-17, 1 file, -0/+12)
| Add the missing fabs(fpext) optimization that worked with the call, and also fix it creating a second fpext when there were multiple uses.
| llvm-svn: 292172
* [InstCombine][SSE] Add DemandedElts support for PSHUFB instructions (Simon Pilgrim, 2017-01-16, 1 file, -1/+11)
| Simplify a pshufb shuffle mask based on the elements of the mask that are actually demanded.
| Differential Revision: https://reviews.llvm.org/D28745
| llvm-svn: 292101
* Make processing @llvm.assume more efficient - Add affected values to the assumption cache (Hal Finkel, 2017-01-11, 1 file, -0/+3)
| Here's my second try at making @llvm.assume processing more efficient. My previous attempt, which leveraged operand bundles, r289755, didn't end up working: it did make assume processing more efficient but eliminating the assumption cache made ephemeral value computation too expensive. This is a more-targeted change.
| We'll keep the assumption cache, but extend it to keep a map of affected values (i.e. values about which an assumption might provide some information) to the corresponding assumption intrinsics. This allows ValueTracking and LVI to find assumptions relevant to the value being queried without scanning all assumptions in the function. The fact that ValueTracking started doing O(number of assumptions in the function) work, for every known-bits query, has become prohibitively expensive in some cases.
| As discussed during the review, this is a pragmatic fix that, longer term, will likely be replaced by a more-principled solution (perhaps based on an extended SSA form).
| Differential Revision: https://reviews.llvm.org/D28459
| llvm-svn: 291671
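The data-structure idea in this message, a map from affected values to the assumptions that mention them, can be sketched with standard containers. Everything here is hypothetical: ToyAssumptionCache, ValueId and AssumeId are stand-ins chosen for the example and do not reflect the actual AssumptionCache interface.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using ValueId = std::string; // stand-in for a pointer to an IR value
using AssumeId = int;        // stand-in for an assume intrinsic call

// Toy cache; NOT llvm::AssumptionCache.
class ToyAssumptionCache {
  std::vector<AssumeId> AllAssumes;
  std::unordered_map<ValueId, std::vector<AssumeId>> Affected;

public:
  // Record an assumption together with the values it says something about.
  void registerAssumption(AssumeId A, const std::vector<ValueId> &Values) {
    AllAssumes.push_back(A);
    for (const ValueId &V : Values)
      Affected[V].push_back(A);
  }

  // Look up only the assumptions relevant to V, without scanning them all.
  const std::vector<AssumeId> &assumptionsFor(const ValueId &V) const {
    static const std::vector<AssumeId> Empty;
    auto It = Affected.find(V);
    return It == Affected.end() ? Empty : It->second;
  }
};
```

A query about one value then costs a hash lookup plus the assumptions that actually mention it, rather than work proportional to every assumption in the function.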
* InstCombine: Set operands instead of creating new call (Matt Arsenault, 2017-01-10, 1 file, -10/+6)
| llvm-svn: 291612
* InstCombine: Fold cos(-x) -> cos(x) (Matt Arsenault, 2017-01-04, 1 file, -0/+14)
| Also cos(fabs(x)) -> cos(x)
| llvm-svn: 291022
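Both folds rest on cosine being an even function. A quick numeric check, written as a hypothetical helper with a small tolerance so it does not depend on a particular libm being bit-exact, might look like this:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical check that cos(-x) and cos(fabs(x)) agree with cos(x).
bool cosIgnoresSign(double X) {
  const double Tol = 1e-15; // tolerance is an assumption, chosen for safety
  double Ref = std::cos(X);
  return std::fabs(std::cos(-X) - Ref) <= Tol &&
         std::fabs(std::cos(std::fabs(X)) - Ref) <= Tol;
}
```

Since the sign of the argument never affects the result, the negation or fabs feeding the call can be dropped.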