path: root/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
Commit log (newest first); each entry shows: commit message (author, date, files changed, lines -removed/+added)
...
* [InstCombine] simplify masked load intrinsics with all ones or zeros masks (Sanjay Patel, 2016-02-01, 1 file, -0/+30)
    A masked load with a zero mask means there's no load. A masked load with an
    allOnes mask means it's a normal vector load.
    Differential Revision: http://reviews.llvm.org/D16691
    llvm-svn: 259369
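    [Note: a minimal IR sketch of the two folds, assuming the LLVM 3.8-era
    llvm.masked.load signature; value names are illustrative, not from the patch.]
        ; zero mask: no lanes are loaded, so the call folds to the passthru operand
        %r0 = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %p, i32 4, <4 x i1> zeroinitializer, <4 x i32> %passthru)
        ;   --> replaced by %passthru
        ; all-ones mask: every lane is loaded, so it becomes a plain vector load
        %r1 = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %p, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> %passthru)
        ;   --> replaced by: load <4 x i32>, <4 x i32>* %p, align 4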
* add helper function for minnum/maxnum; NFC (Sanjay Patel, 2016-01-31, 1 file, -74/+80)
    llvm-svn: 259326
* function names start with a lower case letter; NFC (Sanjay Patel, 2016-01-29, 1 file, -25/+25)
    llvm-svn: 259264
* fix formatting; NFC (Sanjay Patel, 2016-01-29, 1 file, -4/+8)
    llvm-svn: 259262
* less indenting; NFCI (Sanjay Patel, 2016-01-28, 1 file, -107/+109)
    llvm-svn: 259002
* AMDGPU: Rename intrinsics to use amdgcn prefix (Matt Arsenault, 2016-01-22, 1 file, -1/+1)
    The intrinsic target prefix should match the target name as it appears in the
    triple. This is not yet complete, but gets most of the important ones.
    llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for
    compatibility for now.
    llvm-svn: 258557
* don't repeat function names in comments; NFC (Sanjay Patel, 2016-01-20, 1 file, -19/+14)
    llvm-svn: 258360
* 80-cols; NFC (Sanjay Patel, 2016-01-20, 1 file, -2/+2)
    llvm-svn: 258323
* remove outdated comment; NFC (Sanjay Patel, 2016-01-19, 1 file, -4/+0)
    llvm-svn: 258147
* GlobalValue: use getValueType() instead of getType()->getPointerElementType(). (Manuel Jacob, 2016-01-16, 1 file, -2/+1)
    Reviewers: mjacob
    Subscribers: jholewinski, arsenm, dsanders, dblaikie
    Patch by Eduard Burtescu.
    Differential Revision: http://reviews.llvm.org/D16260
    llvm-svn: 257999
* [Statepoints] Refactor GCRelocateOperands into an intrinsic wrapper. NFC. (Manuel Jacob, 2016-01-05, 1 file, -2/+1)
    Summary: This commit renames GCRelocateOperands to GCRelocateInst and makes it
    an intrinsic wrapper, similar to e.g. MemCpyInst. Also, all users of
    GCRelocateOperands were changed to use the new intrinsic wrapper instead.
    Reviewers: sanjoy, reames
    Subscribers: reames, sanjoy, llvm-commits
    Differential Revision: http://reviews.llvm.org/D15762
    llvm-svn: 256811
* getParent() ^ 3 == getModule(); NFCI (Sanjay Patel, 2015-12-14, 1 file, -3/+3)
    llvm-svn: 255511
* [AttributeSet] Overload AttributeSet::addAttribute to reduce compile time (Akira Hatanaka, 2015-12-02, 1 file, -7/+14)
    The new overloaded function is used when an attribute is added to a large
    number of slots of an AttributeSet (for example, to function parameters). This
    is much faster than calling AttributeSet::addAttribute once per slot, because
    AttributeSet::getImpl (which calls FoldingSet::FindNodeOrInsertPos) is called
    only once per function instead of once per slot. With this commit, clang
    compiles a file which used to take over 22 minutes in just 13 seconds.
    rdar://problem/23581000
    Differential Revision: http://reviews.llvm.org/D15085
    llvm-svn: 254491
* [OperandBundles] Extract duplicated code into a helper function, NFC (Sanjoy Das, 2015-11-25, 1 file, -5/+1)
    llvm-svn: 254047
* [InstCombine] Don't drop operand bundles (Sanjoy Das, 2015-11-25, 1 file, -3/+10)
    Reviewers: majnemer
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D14857
    llvm-svn: 254046
* Revert "Change memcpy/memset/memmove to have dest and source alignments." (Pete Cooper, 2015-11-19, 1 file, -16/+13)
    This reverts commit r253511. This likely broke the bots in
    http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
    http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787
    llvm-svn: 253543
* Change memcpy/memset/memmove to have dest and source alignments. (Pete Cooper, 2015-11-18, 1 file, -13/+16)
    Note, this was reviewed (and more details are in)
    http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
    These intrinsics currently have an explicit alignment argument which is
    required to be a constant integer. It represents the alignment of the source
    and dest, and so must be the minimum of those.
    This change allows source and dest to each have their own alignments by using
    the alignment attribute on their arguments. The alignment argument itself is
    removed.
    There are a few places in the code for which the code needs to be checked by an
    expert as to whether using only src/dest alignment is safe. For those places,
    they currently take the minimum of src/dest alignments which matches the
    current behaviour.
    For example, code which used to read:
        call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
    will now read:
        call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
    For out of tree owners, I was able to strip alignment from calls using sed by replacing:
        (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
    with:
        $1i1 false)
    and similarly for memmove and memcpy. I then added back in alignment to test
    cases which needed it.
    A similar commit will be made to clang which actually has many differences in
    alignment as now IRBuilder can generate different source/dest alignments on
    calls.
    In IRBuilder itself, a new argument was added. Instead of calling:
        CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
    you now call:
        CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
    There is a temporary class (IntegerAlignment) which takes the source alignment
    and rejects implicit conversion from bool. This is to prevent isVolatile here
    from passing its default parameter to the source alignment.
    Note, changes in future can now be made to codegen. I didn't change anything
    here, but this change should enable better memcpy code sequences.
    Reviewed by Hal Finkel.
    llvm-svn: 253511
* [InstCombine] Add trivial folding (bitreverse (bitreverse x)) -> x (James Molloy, 2015-11-12, 1 file, -0/+10)
    There are plenty more instcombines we could probably do with bitreverse, but
    this seems like a very obvious and trivial starting point and was brought up
    by Hal in his review.
    llvm-svn: 252879
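    [Note: a minimal sketch of the fold on i32, using the generic llvm.bitreverse
    intrinsic; value names are illustrative.]
        %a = call i32 @llvm.bitreverse.i32(i32 %x)
        %b = call i32 @llvm.bitreverse.i32(i32 %a)
        ;   --> uses of %b are replaced with %x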
* [InstCombine] SSE4A constant folding and conversion to shuffles. (Simon Pilgrim, 2015-10-17, 1 file, -53/+270)
    This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:
    1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
    2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if it's useful)
    3 - Constant folding support
    4 - Add zeroinitializer handling
    Differential Revision: http://reviews.llvm.org/D13348
    llvm-svn: 250609
* InstCombine: Remove ilist iterator implicit conversions, NFC (Duncan P. N. Exon Smith, 2015-10-13, 1 file, -7/+7)
    Stop relying on implicit conversions of ilist iterators in LLVMInstCombine.
    No functionality change intended.
    llvm-svn: 250183
* [InstCombine][SSE4A] Remove broken INSERTQI range combining optimization (Simon Pilgrim, 2015-10-13, 1 file, -45/+4)
    As discussed in D13348 - the INSERTQI range combining code is wrong in that it
    confuses the insertion bit index with an extraction bit index. The remaining
    legal combines are very unlikely (especially once we've converted to shuffles
    in D13348) so I'm removing the optimization.
    llvm-svn: 250160
* [InstCombine][X86][XOP] Combine XOP integer vector comparisons to native IR (Simon Pilgrim, 2015-10-11, 1 file, -0/+53)
    We now have lowering support for XOP PCOM/PCOMU instructions.
    llvm-svn: 249977
* [InstCombine] Remove trivially empty lifetime start/end ranges. (Arnaud A. de Grandmaison, 2015-10-01, 1 file, -0/+23)
    Summary: Some passes may open up opportunities for optimizations, leaving empty
    lifetime start/end ranges. For example, with the following code:
        void foo(char *, char *);
        void bar(int Size, bool flag) {
          for (int i = 0; i < Size; ++i) {
            char text[1];
            char buff[1];
            if (flag)
              foo(text, buff); // BBFoo
          }
        }
    the loop unswitch pass will create 2 versions of the loop, one with flag==true,
    and the other one with flag==false, but always leaving the BBFoo basic block,
    with lifetime ranges covering the scope of the for loop. Simplify CFG will then
    remove BBFoo in the case where flag==false, but will leave the lifetime
    markers.
    This patch teaches InstCombine to remove trivially empty lifetime marker
    ranges, that is ranges ending right after they were started (ignoring debug
    info or other lifetime markers in the range).
    This fixes PR24598: excessive compile time after r234581.
    Reviewers: reames, chandlerc
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D13305
    llvm-svn: 249018
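    [Note: a rough IR sketch of the kind of empty range left behind, assuming the
    2015-era lifetime intrinsic signature; names are illustrative.]
        %text = alloca [1 x i8]
        %p = getelementptr inbounds [1 x i8], [1 x i8]* %text, i64 0, i64 0
        call void @llvm.lifetime.start(i64 1, i8* %p)
        call void @llvm.lifetime.end(i64 1, i8* %p)
        ;   --> nothing happens between the markers, so InstCombine can drop both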
* [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant. (Andrea Di Biagio, 2015-09-30, 1 file, -0/+41)
    This patch teaches InstCombiner how to convert an SSSE3/AVX2 byte shuffle to a
    builtin shuffle if the mask is constant. Converting byte shuffle intrinsic
    calls to builtin shuffles can help find more opportunities for combining
    shuffles later on in selection dag. We may end up with byte shuffles with
    constant masks as the result of inlining.
    Differential Revision: http://reviews.llvm.org/D13252
    llvm-svn: 248913
* [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR (Simon Pilgrim, 2015-09-23, 1 file, -6/+0)
    This patch removes the x86.sse41.pmovsx* intrinsics, provides a suitable
    upgrade path and updates relevant tests to sign extend a subvector instead.
    LLVM counterpart to D12835
    Differential Revision: http://reviews.llvm.org/D13002
    llvm-svn: 248368
* [InstCombine] Use SimplifyDemandedVectorEltsLow helper function. NFCI. (Simon Pilgrim, 2015-09-19, 1 file, -17/+8)
    Use the SimplifyDemandedVectorEltsLow helper function introduced in D12680.
    llvm-svn: 248089
* [InstCombine] Added vector demanded bits support for SSE4A EXTRQ/INSERTQ instructions (Simon Pilgrim, 2015-09-17, 1 file, -0/+73)
    The SSE4A instructions EXTRQ/INSERTQ only use the lower 64-bits (or less) for
    many of their input vector operands and all of them have undefined upper
    64-bits results.
    Differential Revision: http://reviews.llvm.org/D12680
    llvm-svn: 247934
* [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite (Chen Li, 2015-09-14, 1 file, -2/+2)
    Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when
    checking nullness of passing arguments at callsite. In this way it can handle
    cases where the argument does not have a nonnull attribute but has a dominating
    null check from the CFG. It also adds assertions in isKnownNonNull() and
    isKnownNonNullFromDominatingCondition() to make sure the value checked is of
    pointer type (as defined in the LLVM documentation). These assertions might
    trip failures in things which are not covered under llvm/test, but fixes
    should be pretty obvious.
    Reviewers: reames
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D12779
    llvm-svn: 247587
* Fixed unused variable warning. (Simon Pilgrim, 2015-09-12, 1 file, -2/+2)
    llvm-svn: 247505
* [InstCombine] CVTPH2PS Vector Demanded Elements + Constant Folding (Simon Pilgrim, 2015-09-12, 1 file, -0/+47)
    Improved InstCombine support for CVTPH2PS (F16C half-to-float conversion):
        <4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>)
    only uses the bottom 4 i16 elements for the conversion. Added constant folding
    support.
    Differential Revision: http://reviews.llvm.org/D12731
    llvm-svn: 247504
* Revert "[InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite" (Mehdi Amini, 2015-09-11, 1 file, -1/+1)
    This reverts commit r247356.
    Breaks test/Transforms/InstCombine/pr8547.ll with:
        Wrong types for attribute: byval inalloca nest noalias nocapture nonnull readnone readonly sret dereferenceable(1) dereferenceable_or_null(1)
        %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str, i64 0, i64 0), i32 nonnull %conv2) #0
        LLVM ERROR: Broken function found, compilation aborted!
    From: Mehdi Amini <mehdi.amini@apple.com>
    llvm-svn: 247371
* [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite (Chen Li, 2015-09-10, 1 file, -1/+1)
    Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when
    checking nullness of passing arguments at callsite. In this way it can handle
    cases where the argument does not have a nonnull attribute but has a dominating
    null check from the CFG.
    Reviewers: reames
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D12779
    llvm-svn: 247356
* [InstCombineCalls] Use isKnownNonNullAt() to check nullness of gc.relocate return value (Chen Li, 2015-09-10, 1 file, -1/+1)
    Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when
    checking nullness of the gc.relocate return value. In this way it can handle
    cases where the relocated value does not have a nonnull attribute but has a
    dominating null check from the CFG.
    Reviewers: reames
    Subscribers: llvm-commits, sanjoy
    Differential Revision: http://reviews.llvm.org/D12772
    llvm-svn: 247353
* [InstCombine] SSE/AVX vector shifts demanded shift amount bits (Simon Pilgrim, 2015-08-13, 1 file, -27/+84)
    Most SSE/AVX (non-constant) vector shift instructions only use the lower
    64-bits of the 128-bit shift amount vector operand; this patch calls
    SimplifyDemandedVectorElts to optimize for this.
    I had to refactor some of my recent InstCombiner work on the vector shifts to
    avoid quite a bit of duplicate code; as a result, SimplifyX86immshift now
    (re)decodes the type of shift.
    Differential Revision: http://reviews.llvm.org/D11938
    llvm-svn: 244872
* unused variable warning fix. (Simon Pilgrim, 2015-08-12, 1 file, -1/+1)
    llvm-svn: 244725
* [InstCombine] Move SSE/AVX vector blend folding to instcombiner (Simon Pilgrim, 2015-08-12, 1 file, -4/+15)
    As discussed in D11886, this patch moves the SSE/AVX vector blend folding to
    instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove
    this completely).
    InstCombiner already had partial support for this; I just had to add support
    for zero (ConstantAggregateZero) masks and also the case where both selection
    inputs were the same (allowing us to ignore the mask).
    I also moved all the relevant combine tests into InstCombine/blend_x86.ll
    Differential Revision: http://reviews.llvm.org/D11934
    llvm-svn: 244723
* [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner (Simon Pilgrim, 2015-08-10, 1 file, -7/+31)
    As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic
    shift-by-constant folding to InstCombine to match the logical shift
    implementations.
    Differential Revision: http://reviews.llvm.org/D11886
    llvm-svn: 244495
* [InstCombine] Fix SSE2/AVX2 vector logical shift by constant (Simon Pilgrim, 2015-08-07, 1 file, -16/+39)
    This patch fixes the sse2/avx2 vector shift by constant instcombine call to
    correctly deal with the fact that the shift amount is formed from the entire
    lower 64 bits and not just the lowest element as it currently assumes.
    e.g.
        %1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)
    In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to
    shift by 64424509455 ((15 << 32) | 15) - giving a zero result.
    In addition, this review also recognizes shift-by-zero from a
    ConstantAggregateZero type (PR23821).
    Differential Revision: http://reviews.llvm.org/D11760
    llvm-svn: 244341
* Fixed line endings. (Simon Pilgrim, 2015-08-05, 1 file, -72/+72)
    llvm-svn: 244021
* [InstCombine] Moved SSE vector shift constant folding into its own helper function. NFCI. (Simon Pilgrim, 2015-08-04, 1 file, -80/+72)
    This will make some upcoming bugfixes + improvements easier to manage.
    llvm-svn: 243962
* fix formatting; NFC (Sanjay Patel, 2015-07-28, 1 file, -2/+1)
    llvm-svn: 243424
* Fixed signed/unsigned comparison warning. (Simon Pilgrim, 2015-07-27, 1 file, -1/+1)
    llvm-svn: 243306
* [InstCombine][X86][SSE] Replace sign/zero extension intrinsics with native IR (Simon Pilgrim, 2015-07-27, 1 file, -14/+45)
    Now that we are generating sane codegen for vector sext/zext nodes on SSE
    targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and
    pmovzx intrinsics with the equivalent native IR code.
    Differential Revision: http://reviews.llvm.org/D11503
    llvm-svn: 243303
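    [Note: a rough sketch of the kind of rewrite this enables, shown for the
    128-bit pmovsxwd case; the intrinsic name/types are assumed from that era and
    value names are illustrative.]
        %r = call <4 x i32> @llvm.x86.sse41.pmovsxwd(<8 x i16> %v)
        ;   --> roughly becomes a subvector extract followed by a native sext:
        %lo = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
        %r  = sext <4 x i16> %lo to <4 x i32>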
* [InstCombine][SSE4A] Standardized references to Length/Width and Index/Start to match AMD docs. NFCI. (Simon Pilgrim, 2015-07-25, 1 file, -34/+31)
    llvm-svn: 243226
* Reapply 239795 - [InstCombine] Propagate non-null facts to call parameters (Philip Reames, 2015-06-16, 1 file, -0/+18)
    The original change broke clang side tests. I will be submitting those
    momentarily. This change includes post commit feedback on the original change
    from Pete Cooper.
    Original Submission comments: If a parameter to a function is known non-null,
    use the existing parameter attributes to record that fact at the call site.
    This has no optimization benefit by itself - that I know of - but is an
    enabling change for http://reviews.llvm.org/D9129.
    Differential Revision: http://reviews.llvm.org/D9132
    llvm-svn: 239849
* Revert 239795 (Philip Reames, 2015-06-16, 1 file, -18/+0)
    I forgot to update some clang test cases. I'll fix and resubmit tomorrow.
    llvm-svn: 239800
* [InstCombine] Propagate non-null facts to call parameters (Philip Reames, 2015-06-16, 1 file, -0/+18)
    If a parameter to a function is known non-null, use the existing parameter
    attributes to record that fact at the call site. This has no optimization
    benefit by itself - that I know of - but is an enabling change for
    http://reviews.llvm.org/D9129.
    Differential Revision: http://reviews.llvm.org/D9132
    llvm-svn: 239795
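    [Note: a minimal before/after sketch of the attribute propagation; the callee
    and value names are made up for illustration.]
        ; %p has a dominating null check (or other proof of non-nullness) here
        call void @use(i8* %p)
        ;   --> call void @use(i8* nonnull %p)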
* [InstSimplify] Handle some overflow intrinsics in InstSimplify (David Majnemer, 2015-05-22, 1 file, -0/+6)
    This change does a few things:
    - Move some InstCombine transforms to InstSimplify
    - Run SimplifyCall from within InstCombine::visitCallInst
    - Teach InstSimplify to fold [us]mul_with_overflow(X, undef) to 0.
    llvm-svn: 237995
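    [Note: a sketch of the last fold only, assuming the standard
    llvm.umul.with.overflow signature; the signed variant is analogous and value
    names are illustrative.]
        %r = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 undef)
        ; undef may be chosen as 0, so the product is 0 and no overflow occurs:
        ;   --> %r folds to { i32 0, i1 false }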
* [RewriteStatepointsForGC] Fix a bug on creating gc_relocate for pointer to vector of pointers (Sanjoy Das, 2015-05-11, 1 file, -4/+9)
    Summary: In the RewriteStatepointsForGC pass, we create a gc_relocate intrinsic
    for each relocated pointer, and the gc_relocate has the same type as the
    pointer. During the creation of the gc_relocate intrinsic, llvm needs to mangle
    its type. However, llvm does not support mangling of all possible types.
    RewriteStatepointsForGC will hit an assertion failure when it tries to create a
    gc_relocate for a pointer to a vector of pointers because mangling for vector
    of pointers is not supported.
    This patch changes the way the RewriteStatepointsForGC pass creates
    gc_relocate. For each relocated pointer, we erase the type of pointers and
    create a unified gc_relocate of type i8 addrspace(1)*. Then a bitcast is
    inserted to convert the gc_relocate to the correct type. In this way,
    gc_relocate does not need to deal with different types of pointers and the
    unsupported type mangling is no longer a problem. This change would also ease
    further merging when LLVM erases types of pointers and introduces a unified
    pointer type.
    Some minor changes are also introduced to the gc_relocate related parts in
    InstCombineCalls, CodeGenPrepare, and Verifier accordingly.
    Patch by Chen Li!
    Reviewers: reames, AndyAyers, sanjoy
    Reviewed By: sanjoy
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D9592
    llvm-svn: 237009
* [InstCombine/PowerPC] Fix single-precision QPX load/store replacement (Hal Finkel, 2015-05-11, 1 file, -5/+10)
    The QPX single-precision load/store intrinsics have implied
    truncation/extension from/to the declared value type of <4 x double> to the
    memory type of <4 x float>. When we can prove the alignment of the pointer
    argument, and thus replace the intrinsic with a regular load or store, we need
    to load or store the correct data type (<4 x float>) instead of (<4 x double>).
    llvm-svn: 236973