summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/Vectorize
Commit message (Collapse)AuthorAgeFilesLines
* SLPVectorizer: add a second limit for the number of alias checks.Erik Eckstein2015-01-221-21/+49
| | | | | | | | Even with the current limit on the number of alias checks, the containing loop has quadratic complexity. This begins to hurt for blocks containing > 1K load/store instructions. This commit introduces a limit for the loop count. It reduces the runtime for such very large blocks. llvm-svn: 226792
* Fixed a bug in masked load/store in reversed loop.Elena Demikhovsky2015-01-221-0/+2
| | | | | | | | | Added a test. The bug was submitted to bugzilla: http://llvm.org/bugs/show_bug.cgi?id=22225 llvm-svn: 226791
* Fix Operandreorder logic in SLPVectorizer to generate longer vectorizable chain.Karthik Bhat2015-01-201-102/+211
| | | | | | | | | | This patch fixes 2 issues in reorderInputsAccordingToOpcode 1) AllSameOpcodeLeft and AllSameOpcodeRight was being calculated incorrectly resulting in code not being vectorized in few cases. 2) Adds logic to reorder operands if we get longer chain of consecutive loads enabling vectorization. Handled the same for cases were we have AltOpcode. Thanks Michael for inputs and review. Review: http://reviews.llvm.org/D6677 llvm-svn: 226547
* SLPVectorizer: limit the number of alias checks to reduce the runtime.Erik Eckstein2015-01-191-1/+16
| | | | | | | | In case of blocks with many memory-accessing instructions, alias checking can take lot of time (because calculating the memory dependencies has quadratic complexity). I chose a limit which resulted in no changes when running the benchmarks. llvm-svn: 226439
* [PM] Now that LoopInfo isn't in the Pass type hierarchy, it is muchChandler Carruth2015-01-181-11/+11
| | | | | | | | | | | | cleaner to derive from the generic base. Thise removes a ton of boiler plate code and somewhat strange and pointless indirections. It also remove a bunch of the previously needed friend declarations. To fully remove these, I also lifted the verify logic into the generic LoopInfoBase, which seems good anyways -- it is generic and useful logic even for the machine side. llvm-svn: 226385
* [PM] Split the LoopInfo object apart from the legacy pass, creatingChandler Carruth2015-01-172-7/+7
| | | | | | | | | | a LoopInfoWrapperPass to wire the object up to the legacy pass manager. This switches all the clients of LoopInfo over and paves the way to port LoopInfo to the new pass manager. No functionality change is intended with this iteration. llvm-svn: 226373
* Replace size method call of containers to empty method where appropriateAlexander Kornienko2015-01-153-4/+4
| | | | | | | | | | | | | | | | This patch was generated by a clang tidy checker that is being open sourced. The documentation of that checker is the following: /// The emptiness of a container should be checked using the empty method /// instead of the size method. It is not guaranteed that size is a /// constant-time function, and it is generally more efficient and also shows /// clearer intent to use empty. Furthermore some containers may implement the /// empty method but not implement the size method. Using empty whenever /// possible makes it easier to switch to another container in the future. Patch by Gábor Horváth! llvm-svn: 226161
* [PM] Separate the TargetLibraryInfo object from the immutable pass.Chandler Carruth2015-01-152-2/+4
| | | | | | | | | | | | | | The pass is really just a means of accessing a cached instance of the TargetLibraryInfo object, and this way we can re-use that object for the new pass manager as its result. Lots of delta, but nothing interesting happening here. This is the common pattern that is developing to allow analyses to live in both the old and new pass manager -- a wrapper pass in the old pass manager emulates the separation intrinsic to the new pass manager between the result and pass for analyses. llvm-svn: 226157
* Update libdeps since TLI was moved from Target to Analysis in r226078.NAKAMURA Takumi2015-01-151-1/+1
| | | | llvm-svn: 226126
* reapply: SLPVectorizer: Cache results from memory alias checking.Erik Eckstein2015-01-141-21/+71
| | | | | | | | | | This speeds up the dependency calculations for blocks with many load/store/call instructions. Beside the improved runtime, there is no functional change. Compared to the original commit, this re-applied commit contains a bug fix which ensures that there are no incorrect collisions in the alias cache. llvm-svn: 225977
* Fix a wrong comment in LoopVectorize.Hao Liu2015-01-141-5/+5
| | | | | | | | I.E. more than two -> exactly two Fix a typo function name in LoopVectorize. I.E. collectStrideAcccess() -> collectStrideAccess() llvm-svn: 225935
* Fix non-determinism issue in SLPJulien Lerouge2015-01-131-1/+1
| | | | | | | | | | | | | | | The issue was introduced in r214638: + for (auto &BSIter : BlocksSchedules) { + scheduleBlock(BSIter.second.get()); + } Because BlocksSchedules is a DenseMap with BasicBlock* keys, blocks are scheduled in non-deterministic order, resulting in unpredictable IR. Patch by Daniel Reynaud! llvm-svn: 225821
* Revert "SLPVectorizer: Cache results from memory alias checking."Erik Eckstein2015-01-131-49/+19
| | | | | | The alias cache has a problem of incorrect collisions in case a new instruction is allocated at the same address as a previously deleted instruction. llvm-svn: 225790
* SLPVectorizer: Cache results from memory alias checking.Erik Eckstein2015-01-131-19/+49
| | | | | | | This speeds up the dependency calculations for blocks with many load/store/call instructions. Beside the improved runtime, there is no functional change. llvm-svn: 225786
* Update comment.Michael Zolotukhin2015-01-091-2/+2
| | | | llvm-svn: 225553
* Remove duplicating code. NFC.Michael Zolotukhin2015-01-091-2/+2
| | | | | | The removed condition is checked in the previous loop. llvm-svn: 225542
* Assumption that "VectorizedValue" will always be an Instruction is not correct.Suyog Sarda2015-01-091-2/+1
| | | | | | | | | | | | | | | | | | | | It can be a constant or a vector argument. ex : define i32 @hadd(<4 x i32> %a) #0 { entry: %vecext = extractelement <4 x i32> %a, i32 0 %vecext1 = extractelement <4 x i32> %a, i32 1 %add = add i32 %vecext, %vecext1 %vecext2 = extractelement <4 x i32> %a, i32 2 %add3 = add i32 %add, %vecext2 %vecext4 = extractelement <4 x i32> %a, i32 3 %add5 = add i32 %add3, %vecext4 ret i32 %add5 } llvm-svn: 225517
* Fixed a bug in memory dependence checking module of loop vectorization. The ↵Jiangning Liu2015-01-051-48/+57
| | | | | | | | | | | | | | | | | | | | | following loop should not be vectorized with current algorithm. {code} // loop body ... = a[i] (1) ... = a[i+1] (2) ....... a[i+1] = .... (3) a[i] = ... (4) {code} The algorithm tries to collect memory access candidates from AliasSetTracker, and then check memory dependences one another. The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked 'write' entry in Accesses if only 'write' exists. This is incorrect and the consequence is it ignored all read access, and finally some RAW and WAR dependence are missed. For the case given above, if we ignore two reads, the dependence between (1) and (3) would not be able to be captured, and finally this loop will be incorrectly vectorized. The fix simply inserts a new loop to find all entries in Accesses. Since it will skip most of all other memory accesses by checking the Value pointer at the very beginning of the loop, it should not increase compile-time visibly. llvm-svn: 225159
* [PM] Split the AssumptionTracker immutable pass into two separate APIs:Chandler Carruth2015-01-042-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | a cache of assumptions for a single function, and an immutable pass that manages those caches. The motivation for this change is two fold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the *many* utility functions that deal in the assumptions to live in both pass manager worlds by creating an separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics. Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator. For now, this adds boiler plate to the various passes, but this kind of boiler plate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably. llvm-svn: 225131
* Some code improvements in Masked Load/Store. Elena Demikhovsky2014-12-301-21/+10
| | | | | | No functional changes. llvm-svn: 224986
* Masked Load/Store - Changed the order of parameters in intrinsics.Elena Demikhovsky2014-12-251-15/+5
| | | | | | | No functional changes. The documentation is coming. llvm-svn: 224829
* [BBVectorize] Remove two more redundant assignments.Tilmann Scheller2014-12-191-2/+0
| | | | | | Found by the Clang static analyzer. llvm-svn: 224590
* [BBVectorize] Remove redundant assignment.Tilmann Scheller2014-12-191-1/+0
| | | | | | Found by the Clang static analyzer. llvm-svn: 224589
* [LoopVectorize] Remove redundant assignment.Tilmann Scheller2014-12-191-1/+0
| | | | | | Found by the Clang static analyzer. llvm-svn: 224587
* Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders ↵Suyog Sarda2014-12-171-24/+2
| | | | | | | | | | them for bundling into vector of loads, and vectorizes it." This was re-ordering floating point data types resulting in mismatch in output. llvm-svn: 224424
* Masked Load and Store Intrinsics in loop vectorizer.Elena Demikhovsky2014-12-161-21/+100
| | | | | | | | | | The loop vectorizer optimizes loops containing conditional memory accesses by generating masked load and store intrinsics. This decision is target dependent. http://reviews.llvm.org/D6527 llvm-svn: 224334
* Loop Vectorizer minor changes in the code - Elena Demikhovsky2014-12-141-3/+3
| | | | | | | | some comments, function names, identation. Reviewed here: http://reviews.llvm.org/D6527 llvm-svn: 224218
* This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling ↵Suyog Sarda2014-12-121-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into vector of loads, and vectorizes it. Test case : float hadd(float* a) { return (a[0] + a[1]) + (a[2] + a[3]); } AArch64 assembly before patch : ldp s0, s1, [x0] ldp s2, s3, [x0, #8] fadd s0, s0, s1 fadd s1, s2, s3 fadd s0, s0, s1 ret AArch64 assembly after patch : ldp d0, d1, [x0] fadd v0.2s, v0.2s, v1.2s faddp s0, v0.2s ret Reviewed Link : http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248531.html llvm-svn: 224119
* Remove redundant variable.Michael Zolotukhin2014-12-091-4/+2
| | | | | | | Tested by adding assert(LoopVectorPreHeader == VecPreheader) on LLVM test suite and SPECs. llvm-svn: 223847
* IR: Split Metadata from ValueDuncan P. N. Exon Smith2014-12-091-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do *not* have a `Type`. - `MDNode`'s operands are all `Metadata *` (instead of `Value *`). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode*`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode *N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode *N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode *N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the *only* class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802
* LoopVectorize: Remove unnecessary RAUWDuncan P. N. Exon Smith2014-12-031-2/+0
| | | | | | | | | | Remove an unnecessary `MDNode::replaceAllUsesWith()`. In the preceding line, `TheLoop->setLoopID()` visits all backedges and sets the new loop ID. This sufficiently updates the loop metadata. Metadata RAUW is going away as part of PR21532. llvm-svn: 223210
* PR21302. Vectorize only bottom-tested loops.Michael Zolotukhin2014-12-021-0/+9
| | | | | | rdar://problem/18886083 llvm-svn: 223171
* Revert "Masked Vector Load and Store Intrinsics."Duncan P. N. Exon Smith2014-11-281-83/+15
| | | | | | | | | | | This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936
* Masked Vector Load and Store Intrinsics.Elena Demikhovsky2014-11-231-15/+83
| | | | | | | | | | | | | | Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632
* Vectorize a reduction chain feeding into a 'return' statement.Suyog Sarda2014-11-191-0/+15
| | | | | | | | | e.x return (a[0]+b[0]) + (a[1]+b[1]) Differential Revision: http://reviews.llvm.org/D6227 llvm-svn: 222364
* Update SetVector to rely on the underlying set's insert to return a ↵David Blaikie2014-11-192-6/+7
| | | | | | | | | | | | | pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334
* IR: Make MDString::getName() privateDuncan P. N. Exon Smith2014-11-131-1/+1
| | | | | | | | | | Hide the fact that `MDString`'s string is stored in `Value::Name` -- that's going to change soon. Update the only in-tree client that was using it instead of `Value::getString()`. Part of PR21532. llvm-svn: 221951
* Revert "IR: MDNode => Value"Duncan P. N. Exon Smith2014-11-112-4/+4
| | | | | | | | | | | | | | | | | Instead, we're going to separate metadata from the Value hierarchy. See PR21532. This reverts commit r221375. This reverts commit r221373. This reverts commit r221359. This reverts commit r221167. This reverts commit r221027. This reverts commit r221024. This reverts commit r221023. This reverts commit r220995. This reverts commit r220994. llvm-svn: 221711
* LoopVectorize: Don't assume pointees are sizedDavid Majnemer2014-11-071-1/+7
| | | | | | | | | | A pointer's pointee might not be sized: the pointee could be a function. Report this as IK_NoInduction when calculating isInductionVariable. This fixes PR21508. llvm-svn: 221501
* IR: MDNode => Value: Instruction::getAllMetadataOtherThanDebugLoc()Duncan P. N. Exon Smith2014-11-032-3/+3
| | | | | | | Change `Instruction::getAllMetadataOtherThanDebugLoc()` from a vector of `MDNode` to one of `Value`. Part of PR21433. llvm-svn: 221167
* IR: MDNode => Value: Instruction::getMetadata()Duncan P. N. Exon Smith2014-11-011-1/+1
| | | | | | | | | | Change `Instruction::getMetadata()` to return `Value` as part of PR21433. Update most callers to use `Instruction::getMDNode()`, which wraps the result in a `cast_or_null<MDNode>`. llvm-svn: 221024
* Correctly update dom-tree after loop vectorizer.Michael Zolotukhin2014-10-311-1/+1
| | | | llvm-svn: 221009
* Reformat partially, where I touched for whitespace changes.NAKAMURA Takumi2014-10-281-2/+5
| | | | llvm-svn: 220773
* Untabify and whitespace cleanups.NAKAMURA Takumi2014-10-281-5/+5
| | | | llvm-svn: 220771
* LoopVectorize: Simplify code. No functionality change.Benjamin Kramer2014-10-221-19/+7
| | | | llvm-svn: 220405
* Add minnum / maxnum intrinsicsMatt Arsenault2014-10-211-0/+2
| | | | | | | | | | | | These are named following the IEEE-754 names for these functions, rather than the libm fmin / fmax to avoid possible ambiguities. Some languages may implement something resembling fmin / fmax which return NaN if either operand is to propagate errors. These implement the IEEE-754 semantics of returning the other operand if either is a NaN representing missing data. llvm-svn: 220341
* [SLPVectorize] Basic ephemeral-value awarenessHal Finkel2014-10-151-3/+30
| | | | | | | | | | | | | | The SLP vectorizer should not vectorize ephemeral values. These are used to express information to the optimizer, and vectorizing them does not lead to faster code (because the ephemeral values are dropped prior to code generation, vectorized or not), and obscures the information the instructions are attempting to communicate (the logic that interprets the arguments to @llvm.assume generically does not understand vectorized conditions). Also, uses by ephemeral values are free (because they, and the necessary extractelement instructions, will be dropped prior to code generation). llvm-svn: 219816
* No need to cache this unused variable.Eric Christopher2014-10-141-3/+1
| | | | | | Patch by Ehsan Akhgari. llvm-svn: 219749
* [LoopVectorize] Ignore @llvm.assume for cost estimates and legalityHal Finkel2014-10-141-3/+32
| | | | | | | | | | | | | | A few minor changes to prevent @llvm.assume from interfering with loop vectorization. First, treat @llvm.assume like the lifetime intrinsics, which are scalarized (but don't otherwise interfere with the legality checking). Second, ignore the cost of ephemeral instructions in the loop (these will go away anyway during CodeGen). Alignment assumptions and other uses of @llvm.assume can often end up inside of loops that should be vectorized (this is not uncommon for assumptions generated by __attribute__((align_value(n))), for example). llvm-svn: 219741
* [SCEV] Fix one more caller blindly passing the latch to SCEV'sChandler Carruth2014-10-111-2/+1
| | | | | | | | | | | getSmallConstantTripCount even when it isn't the exiting block. I missed this in my first audit, very sorry. This was found in LNT and elsewhere. I don't have a test case, but it was completely obvious from inspection that this was the problem. I'll see if I can reduce a test case, but I'm not really hopeful, and the value seems quite low. llvm-svn: 219562
OpenPOWER on IntegriCloud