summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize/X86
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"Adam Nemet2016-04-141-95/+20
| | | | | | | | This reverts commit r266086. It breaks the LTO build of gcc in SPEC2000. llvm-svn: 266282
* Support arbitrary addrspace pointers in masked load/store intrinsicsArtur Pilipenko2016-04-121-20/+95
| | | | | | | | | | | | | | This is a resubmittion of 263158 change. This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 266086
* [DebugInfo/Test] Add CU as required.Davide Italiano2016-04-111-0/+2
| | | | llvm-svn: 265999
* Loop vectorization with uniform loadElena Demikhovsky2016-04-101-0/+47
| | | | | | | | | Vectorization cost of uniform load wasn't correctly calculated. As a result, a simple loop that loads a uniform value wasn't vectorized. Differential Revision: http://reviews.llvm.org/D18940 llvm-svn: 265901
* [LoopVectorize] Register cloned assumptionsDavid Majnemer2016-04-081-0/+32
| | | | | | | | | | | | InstCombine cannot effectively remove redundant assumptions without them registered in the assumption cache. The vectorizer can create identical assumptions but doesn't register them with the cache, resulting in slower compile times because InstCombine tries to reason about a lot more assumptions. Fix this by registering the cloned assumptions. llvm-svn: 265800
* Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA ↵Silviu Baranga2016-04-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and LV This re-commits r265535 which was reverted in r265541 because it broke the windows bots. The problem was that we had a PointerIntPair which took a pointer to a struct allocated with new. The problem was that new doesn't provide sufficient alignment guarantees. This pattern was already present before r265535 and it just happened to work. To fix this, we now separate the PointerToIntPair from the ExitNotTakenInfo struct into a pointer and a bool. Original commit message: Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265786
* Revert r265535 until we know how we can fix the bots Silviu Baranga2016-04-061-2/+1
| | | | llvm-svn: 265541
* [SCEV] Introduce a guarded backedge taken count and use it in LAA and LVSilviu Baranga2016-04-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265535
* [SLPVectorizer] Vectorizing the libm sqrt to llvm's sqrt intrinsic requires nnanDavid Majnemer2016-04-061-1/+25
| | | | | | | | | | | | | | To quote the langref "Unlike sqrt in libm, however, llvm.sqrt has undefined behavior for negative numbers other than -0.0 (which allows for better optimization, because there is no need to worry about errno being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt." This means that it's unsafe to replace sqrt with llvm.sqrt unless the call is annotated with nnan. Thanks to Hal Finkel for pointing this out! llvm-svn: 265521
* [SLPVectorizer] Vectorize libcalls of sqrtDavid Majnemer2016-04-061-25/+1
| | | | | | | We didn't realize that we could transform the libcall into a vectorized intrinsic. llvm-svn: 265493
* [DebugInfo] Fix tests so that each subprogram belongs to a CU.Davide Italiano2016-04-051-0/+6
| | | | llvm-svn: 265490
* testcase gardening: update the emissionKind enum to the new syntax. (NFC)Adrian Prantl2016-04-012-2/+2
| | | | llvm-svn: 265081
* [LoopVectorize] Don't vectorize loops when everything will be scalarizedHal Finkel2016-03-301-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example: LV: The Smallest and Widest types: 32 / 32 bits. LV: The Widest register is: 256 bits. LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Scalar loop costs: 3. LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 2 costs: 2. LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 4 costs: 1. ... LV: Selecting VF: 8. LV: The target has 32 registers LV(REG): Calculating max register usage: LV(REG): At #0 Interval # 0 LV(REG): At #1 Interval # 1 LV(REG): At #2 Interval # 2 LV(REG): At #4 Interval # 1 LV(REG): At #5 Interval # 1 LV(REG): VF = 8 The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their cost (aside from the uniform ones) are indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do. The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There's aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem with this is that our register-pressure heuristic does not account for scalarization. While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more-basic problem: The job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF. This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit. Differential Revision: http://reviews.llvm.org/D18537 llvm-svn: 264904
* Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"Matthias Braun2016-03-221-95/+20
| | | | | | | | | | | This commit broke LTO builds. Reverting it to unbreak the bots while the issue is investigated. See also: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html This reverts r263158 llvm-svn: 264088
* Support arbitrary addrspace pointers in masked load/store intrinsicsArtur Pilipenko2016-03-101-20/+95
| | | | | | | | | | | | This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 263158
* [x86] fix cost model inaccuracy for vector memory opsSanjay Patel2016-03-091-2/+2
| | | | | | | | | | | The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge. Differential Revision: http://reviews.llvm.org/D18000 llvm-svn: 263069
* add a test RUN to show unexpected behaviorSanjay Patel2016-03-091-7/+10
| | | | llvm-svn: 263037
* Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage ↵Hans Wennborg2016-02-192-72/+1
| | | | | | | | calculator by ignoring specific instructions." It caused PR26509. llvm-svn: 261368
* Create masked gather and scatter intrinsics in Loop Vectorizer.Elena Demikhovsky2016-02-172-1/+238
| | | | | | | | | Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access. The feature is enabled on AVX-512 target. Differential Revision: http://reviews.llvm.org/D15690 llvm-svn: 261140
* AVX1 : Enable vector masked_load/store to AVX1.Igor Breger2016-01-251-30/+28
| | | | | | | | Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q). Differential Revision: http://reviews.llvm.org/D16528 llvm-svn: 258675
* [LoopVectorizer] Refine loop vectorizer's register usage calculator by ↵Cong Hou2015-12-152-1/+72
| | | | | | | | | | | | | | | | | | | | | | ignoring specific instructions. (This is the third attempt to check in this patch, and the first two are r255454 and r255460. The once failed test file reg-usage.ll is now moved to test/Transform/LoopVectorize/X86 directory with target datalayout and target triple indicated.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255691
* Revert r255460, which still causes test failures on some platforms.Cong Hou2015-12-131-1/+1
| | | | | | Further investigation on the failures is ongoing. llvm-svn: 255463
* [LoopVectorizer] Refine loop vectorizer's register usage calculator by ↵Cong Hou2015-12-131-1/+1
| | | | | | | | | | | | | | | | | | | | ignoring specific instructions. (This is the second attempt to check in this patch: REQUIRES: asserts is added to reg-usage.ll now.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255460
* Revert r255454 as it leads to several test failers on buildbots.Cong Hou2015-12-131-1/+1
| | | | llvm-svn: 255456
* [LoopVectorizer] Refine loop vectorizer's register usage calculator by ↵Cong Hou2015-12-131-1/+1
| | | | | | | | | | | | | | | | | ignoring specific instructions. LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255454
* Pointers in Masked Load, Store, Gather, Scatter intrinsicsElena Demikhovsky2015-11-191-0/+142
| | | | | | | | | | The masked intrinsics support all integer and floating point data types. I added the pointer type to this list. Added tests for CodeGen and for Loop Vectorizer. Updated the Language Reference. Differential Revision: http://reviews.llvm.org/D14150 llvm-svn: 253544
* DI: Reverse direction of subprogram -> function edge.Peter Collingbourne2015-11-054-16/+16
| | | | | | | | | | | | | | | | | | | | | | | Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO. This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field. Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR. Differential Revision: http://reviews.llvm.org/D14265 llvm-svn: 252219
* LoopVectorizer - skip 'bitcast' between GEP and load.Elena Demikhovsky2015-11-031-86/+85
| | | | | | | | | | | | Skipping 'bitcast' in this case allows to vectorize load: %arrayidx = getelementptr inbounds double*, double** %in, i64 %indvars.iv %tmp53 = bitcast double** %arrayidx to i64* %tmp54 = load i64, i64* %tmp53, align 8 Differential Revision http://reviews.llvm.org/D14112 llvm-svn: 251907
* Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using ↵Cong Hou2015-11-022-4/+50
| | | | | | | | | | | | larger vectorization factor. To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers. This is the second attempt to submit this patch. The first attempt got a test failure on ARM. This patch is updated to try to fix the failure (more specifically, by handling the case when VF=1). Differential revision: http://reviews.llvm.org/D8943 llvm-svn: 251850
* Revert the revision 251592 as it fails a test on some platforms.Cong Hou2015-10-292-50/+4
| | | | llvm-svn: 251617
* Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using ↵Cong Hou2015-10-292-4/+50
| | | | | | | | larger vectorization factor. To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers. llvm-svn: 251592
* Revert r251291, "Loop Vectorizer - skipping "bitcast" before GEP"NAKAMURA Takumi2015-10-271-85/+86
| | | | | | | It causes miscompilation of llvm/lib/ExecutionEngine/Interpreter/Execution.cpp. See also PR25324. llvm-svn: 251436
* Loop Vectorizer - skipping "bitcast" before GEPElena Demikhovsky2015-10-261-86/+85
| | | | | | | | | | Vectorization of memory instruction (Load/Store) is possible when the pointer is coming from GEP. The GEP analysis allows to estimate the profit. In some cases we have a "bitcast" between GEP and memory instruction. I added code that skips the "bitcast". http://reviews.llvm.org/D13886 llvm-svn: 251291
* [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatibleChandler Carruth2015-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | with the new pass manager, and no longer relying on analysis groups. This builds essentially a ground-up new AA infrastructure stack for LLVM. The core ideas are the same that are used throughout the new pass manager: type erased polymorphism and direct composition. The design is as follows: - FunctionAAResults is a type-erasing alias analysis results aggregation interface to walk a single query across a range of results from different alias analyses. Currently this is function-specific as we always assume that aliasing queries are *within* a function. - AAResultBase is a CRTP utility providing stub implementations of various parts of the alias analysis result concept, notably in several cases in terms of other more general parts of the interface. This can be used to implement only a narrow part of the interface rather than the entire interface. This isn't really ideal, this logic should be hoisted into FunctionAAResults as currently it will cause a significant amount of redundant work, but it faithfully models the behavior of the prior infrastructure. - All the alias analysis passes are ported to be wrapper passes for the legacy PM and new-style analysis passes for the new PM with a shared result object. In some cases (most notably CFL), this is an extremely naive approach that we should revisit when we can specialize for the new pass manager. - BasicAA has been restructured to reflect that it is much more fundamentally a function analysis because it uses dominator trees and loop info that need to be constructed for each function. All of the references to getting alias analysis results have been updated to use the new aggregation interface. All the preservation and other pass management code has been updated accordingly. The way the FunctionAAResultsWrapperPass works is to detect the available alias analyses when run, and add them to the results object. This means that we should be able to continue to respect when various passes are added to the pipeline, for example adding CFL or adding TBAA passes should just cause their results to be available and to get folded into this. The exception to this rule is BasicAA which really needs to be a function pass due to using dominator trees and loop info. As a consequence, the FunctionAAResultsWrapperPass directly depends on BasicAA and always includes it in the aggregation. This has significant implications for preserving analyses. Generally, most passes shouldn't bother preserving FunctionAAResultsWrapperPass because rebuilding the results just updates the set of known AA passes. The exception to this rule are LoopPass instances which need to preserve all the function analyses that the loop pass manager will end up needing. This means preserving both BasicAAWrapperPass and the aggregating FunctionAAResultsWrapperPass. Now, when preserving an alias analysis, you do so by directly preserving that analysis. This is only necessary for non-immutable-pass-provided alias analyses though, and there are only three of interest: BasicAA, GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is preserved when needed because it (like DominatorTree and LoopInfo) is marked as a CFG-only pass. I've expanded GlobalsAA into the preserved set everywhere we previously were preserving all of AliasAnalysis, and I've added SCEVAA in the intersection of that with where we preserve SCEV itself. One significant challenge to all of this is that the CGSCC passes were actually using the alias analysis implementations by taking advantage of a pretty amazing set of loop holes in the old pass manager's analysis management code which allowed analysis groups to slide through in many cases. Moving away from analysis groups makes this problem much more obvious. To fix it, I've leveraged the flexibility the design of the new PM components provides to just directly construct the relevant alias analyses for the relevant functions in the IPO passes that need them. This is a bit hacky, but should go away with the new pass manager, and is already in many ways cleaner than the prior state. Another significant challenge is that various facilities of the old alias analysis infrastructure just don't fit any more. The most significant of these is the alias analysis 'counter' pass. That pass relied on the ability to snoop on AA queries at different points in the analysis group chain. Instead, I'm planning to build printing functionality directly into the aggregation layer. I've not included that in this patch merely to keep it smaller. Note that all of this needs a nearly complete rewrite of the AA documentation. I'm planning to do that, but I'd like to make sure the new design settles, and to flesh out a bit more of what it looks like in the new pass manager first. Differential Revision: http://reviews.llvm.org/D12080 llvm-svn: 247167
* DI: Require subprogram definitions to be distinctDuncan P. N. Exon Smith2015-08-284-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | As a follow-up to r246098, require `DISubprogram` definitions (`isDefinition: true`) to be 'distinct'. Specifically, add an assembler check, a verifier check, and bitcode upgrading logic to combat testcase bitrot after the `DIBuilder` change. While working on the testcases, I realized that test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its purpose was to check for a corner case in PR22792 where two subprogram definitions match exactly and share the same metadata node. The new verifier check, requiring that subprogram definitions are 'distinct', precludes that possibility. I updated almost all the IR with the following script: git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' | grep -v test/Bitcode | xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/' Likely some variant of would work for out-of-tree testcases. llvm-svn: 246327
* Improve vectorization diagnostic messages and extend vectorize(enable) pragma.Tyler Nowicki2015-08-271-1/+1
| | | | | | | | | | | | | | | | This patch changes the analysis diagnostics produced when loops with floating-point recurrences or memory operations are identified. The new messages say "cannot prove it is safe to reorder * operations; allow reordering by specifying #pragma clang loop vectorize(enable)". Depending on the type of diagnostic the message will include additional options such as ffast-math or __restrict__. This patch also allows the vectorize(enable) pragma to override the low pointer memory check threshold. When the hint is given a higher threshold is used. See the clang patch for the options produced for each diagnostic. llvm-svn: 246187
* Cleanup test whitespace or lack thereof. NFC.Chad Rosier2015-08-141-4/+4
| | | | llvm-svn: 245065
* Make fp vectorization test X86 specified to avoid cost-model related ↵Tyler Nowicki2015-08-101-0/+104
| | | | | | problems on arm-thumb and hexagon. llvm-svn: 244505
* Modify diagnostic messages to clearly indicate the why interleaving wasn't done.Tyler Nowicki2015-08-103-3/+117
| | | | | | Sometimes interleaving is not beneficial, as determined by the cost-model and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done. llvm-svn: 244485
* DI: Disallow uniquable DICompileUnitsDuncan P. N. Exon Smith2015-08-031-1/+1
| | | | | | | | | | | | | | | | | | Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s. The backend is liable to start relying on that (if it hasn't already), so make uniquable `DICompileUnit`s illegal and automatically upgrade old bitcode. This is a nice cleanup, since we can remove an unnecessary `DenseSet` (and the associated uniquing info) from `LLVMContextImpl`. Almost all the testcases were updated with this script: git grep -e '= !DICompileUnit' -l -- test | grep -v test/Bitcode | xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,' I imagine something similar should work for out-of-tree testcases. llvm-svn: 243885
* Roll forward r243250Jingyue Wu2015-07-261-3/+3
| | | | | | | | | r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build slaves, but I couldn't reproduce this failure locally. Probably a false positive as I saw this test was broken by r243246 or r243247 too but passed later without people fixing anything. llvm-svn: 243253
* Revert r243250Jingyue Wu2015-07-261-3/+3
| | | | | | breaks tests llvm-svn: 243251
* [TTI/CostModel] improve TTI::getGEPCost and use it in ↵Jingyue Wu2015-07-261-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CostModel::getInstructionCost Summary: This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider addressing modes. It now returns TCC_Free when the GEP can be completely folded to an addresing mode. I started this patch as I refactored SLSR. Function isGEPFoldable looks common and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost. Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best testing bed seems CostModel, but its getInstructionCost method invokes getAddressComputationCost for GEPs which provides very coarse estimation. So this patch also makes getInstructionCost call the updated getGEPCost for GEPs. This change inevitably breaks some tests because the cost model changes, but nothing looks seriously wrong -- if we believe the new cost model is the right way to go, these tests should be updated. This patch is not perfect yet -- the comments in some tests need to be updated. I want to know whether this is a right approach before fixing those details. Reviewers: chandlerc, hfinkel Subscribers: aschwaighofer, llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D9819 llvm-svn: 243250
* Renamed some uses of unroll to interleave in the vectorizer.Tyler Nowicki2015-07-111-2/+2
| | | | llvm-svn: 241971
* Correct a typo for a LoopVectorize testDavid Majnemer2015-06-301-1/+1
| | | | | | I forgot to specify the correct pass. llvm-svn: 241054
* [LoopVectorize] Pointer indicies may be wider than the pointerDavid Majnemer2015-06-271-0/+20
| | | | | | | | | | | If we are dealing with a pointer induction variable, isInductionPHI gives back a step value of Stride / size of pointer. However, we might be indexing with a legal type wider than the pointer width. Handle this by inserting casts where appropriate instead of crashing. This fixes PR23954. llvm-svn: 240877
* NFC - Test case invokes llc on a file rather than redirected from a file.Nemanja Ivanovic2015-05-151-1/+1
| | | | | | | | This has caused some local failures. Updating the test case to be more like the majority of the similar test cases. Committing on behalf of Hubert Tong (hstong@ca.ibm.com). llvm-svn: 237449
* Populate list of vectorizable functions for Accelerate library.Michael Zolotukhin2015-05-071-0/+450
| | | | | | | | | | | | | | | | | | | Summary: This patch adds majority of supported by Accelerate library functions to the list of vectorizable functions. The full list of available vector functions could be found here: https://developer.apple.com/library/mac/documentation/Performance/Conceptual/vecLib/index.html Test Plan: Unit tests are added. Reviewers: hfinkel, aschwaighofer, nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9543 llvm-svn: 236747
* [X86] Disable loop unrolling in loop vectorization pass when VF is 1.Wei Mi2015-05-061-1/+3
| | | | | | | | | | | | | The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture, by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce the cost of overflow check, memory boundary check and extra prologue/epilogue code when regular unroller will unroll the loop another time. Disable it when VF==1 remove the unnecessary cost on x86. The same can be done for other platforms after verifying interleaving/memory bound checking to be not perf critical on those platforms. Differential Revision: http://reviews.llvm.org/D9515 llvm-svn: 236613
* IR: Give 'DI' prefix to debug info metadataDuncan P. N. Exon Smith2015-04-292-35/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Finish off PR23080 by renaming the debug info IR constructs from `MD*` to `DI*`. The last of the `DIDescriptor` classes were deleted in r235356, and the last of the related typedefs removed in r235413, so this has all baked for about a week. Note: If you have out-of-tree code (like a frontend), I recommend that you get everything compiling and tests passing with the *previous* commit before updating to this one. It'll be easier to keep track of what code is using the `DIDescriptor` hierarchy and what you've already updated, and I think you're extremely unlikely to insert bugs. YMMV of course. Back to *this* commit: I did this using the rename-md-di-nodes.sh upgrade script I've attached to PR23080 (both code and testcases) and filtered through clang-format-diff.py. I edited the tests for test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns were off-by-three. It should work on your out-of-tree testcases (and code, if you've followed the advice in the previous paragraph). Some of the tests are in badly named files now (e.g., test/Assembler/invalid-mdcompositetype-missing-tag.ll should be 'dicompositetype'); I'll come back and move the files in a follow-up commit. llvm-svn: 236120
OpenPOWER on IntegriCloud