summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/LoopVectorize
Commit message (Collapse)AuthorAgeFilesLines
...
* [LoopVectorize] Teach Loop Vectorizor about interleaved memory accesses.Hao Liu2015-06-082-20/+484
| | | | | | | | | | | | | | | | | | | | | | | | Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector. E.g. for (i = 0; i < N; i+=2) { a = A[i]; // load of even element b = A[i+1]; // load of odd element ... // operations on a, b, c, d A[i] = c; // store of even element A[i+1] = d; // store of odd element } The loads of even and odd elements are identified as an interleave load group, which will be transfered into vectorized IRs like: %wide.vec = load <8 x i32>, <8 x i32>* %ptr %vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6> %vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7> The stores of even and odd elements are identified as an interleave store group, which will be transfered into vectorized IRs like: %interleaved.vec = shufflevector <4 x i32> %vec.even, %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7> store <8 x i32> %interleaved.vec, <8 x i32>* %ptr This optimization is currently disabled by defaut. To try it by adding '-enable-interleaved-mem-accesses=true'. llvm-svn: 239291
* [LoopVectorize] Don't crash on zero-sized types in isInductionPHIDavid Majnemer2015-06-051-0/+27
| | | | | | | | | isInductionPHI wants to calculate the stride based on the pointee size. However, this is not possible when the pointee is zero sized. This fixes PR23763. llvm-svn: 239143
* NFC - Test case invokes llc on a file rather than redirected from a file.Nemanja Ivanovic2015-05-151-1/+1
| | | | | | | | This has caused some local failures. Updating the test case to be more like the majority of the similar test cases. Committing on behalf of Hubert Tong (hstong@ca.ibm.com). llvm-svn: 237449
* Populate list of vectorizable functions for Accelerate library.Michael Zolotukhin2015-05-071-0/+450
| | | | | | | | | | | | | | | | | | | Summary: This patch adds majority of supported by Accelerate library functions to the list of vectorizable functions. The full list of available vector functions could be found here: https://developer.apple.com/library/mac/documentation/Performance/Conceptual/vecLib/index.html Test Plan: Unit tests are added. Reviewers: hfinkel, aschwaighofer, nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9543 llvm-svn: 236747
* [X86] Disable loop unrolling in loop vectorization pass when VF is 1.Wei Mi2015-05-062-1/+40
| | | | | | | | | | | | | The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture, by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce the cost of overflow check, memory boundary check and extra prologue/epilogue code when regular unroller will unroll the loop another time. Disable it when VF==1 remove the unnecessary cost on x86. The same can be done for other platforms after verifying interleaving/memory bound checking to be not perf critical on those platforms. Differential Revision: http://reviews.llvm.org/D9515 llvm-svn: 236613
* IR: Give 'DI' prefix to debug info metadataDuncan P. N. Exon Smith2015-04-298-129/+129
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Finish off PR23080 by renaming the debug info IR constructs from `MD*` to `DI*`. The last of the `DIDescriptor` classes were deleted in r235356, and the last of the related typedefs removed in r235413, so this has all baked for about a week. Note: If you have out-of-tree code (like a frontend), I recommend that you get everything compiling and tests passing with the *previous* commit before updating to this one. It'll be easier to keep track of what code is using the `DIDescriptor` hierarchy and what you've already updated, and I think you're extremely unlikely to insert bugs. YMMV of course. Back to *this* commit: I did this using the rename-md-di-nodes.sh upgrade script I've attached to PR23080 (both code and testcases) and filtered through clang-format-diff.py. I edited the tests for test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns were off-by-three. It should work on your out-of-tree testcases (and code, if you've followed the advice in the previous paragraph). Some of the tests are in badly named files now (e.g., test/Assembler/invalid-mdcompositetype-missing-tag.ll should be 'dicompositetype'); I'll come back and move the files in a follow-up commit. llvm-svn: 236120
* DebugInfo: Fix bad debug info for compile units and typesDuncan P. N. Exon Smith2015-03-272-2/+2
| | | | | | | | | | | | | | | | Fix debug info in these tests, which started failing with a WIP patch to verify compile units and types. The problems look like they were all caused by bitrot. They fell into these categories: - Using `!{i32 0}` instead of `!{}`. - Using `!{null}` instead of `!{}`. - Using `!MDExpression()` instead of `!{}`. - Using `!8` instead of `!{!8}`. - `file:` references that pointed at `MDCompileUnit`s instead of the same `MDFile` as the compile unit. - `file:` references that were numerically off-by-one or (off-by-ten). llvm-svn: 233415
* TLI: Add addVectorizableFunctionsFromVecLib.Michael Zolotukhin2015-03-171-0/+182
| | | | | | | | Also, add several entries to vectorizable functions table, and corresponding tests. The table isn't complete, it'll be populated later. Review: http://reviews.llvm.org/D8131 llvm-svn: 232531
* TTI: Honour cost model for estimating cost of vector-intrinsic and calls.Michael Zolotukhin2015-03-171-18/+10
| | | | | Review: http://reviews.llvm.org/D8096 llvm-svn: 232528
* Verifier: Check debug info intrinsic argumentsDuncan P. N. Exon Smith2015-03-152-8/+8
| | | | | | | | | | | | | | | | | | | Verify that debug info intrinsic arguments are valid. (These checks will not recurse through the full debug info graph, so they don't need to be cordoned of in `DebugInfoVerifier`.) With those checks in place, changing the `DbgIntrinsicInst` accessors to downcast to `MDLocalVariable` and `MDExpression` is natural (added isa specializations in `Metadata.h` to support this). Added tests to `test/Verifier` for the new -verify checks, and fixed the debug info in all the in-tree tests. If you have out-of-tree testcases that have started to fail to -verify, hopefully the verify checks are helpful. The most likely problem is that the expression argument is `!{}` (instead of `!MDExpression()`). llvm-svn: 232296
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-03-135-66/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gep operator Similar to gep (r230786) and load (r230794) changes. Similar migration script can be used to update test cases, which successfully migrated all of LLVM and Polly, but about 4 test cases needed manually changes in Clang. (this script will read the contents of stdin and massage it into stdout - wrap it in the 'apply.sh' script shown in previous commits + xargs to apply it over a large set of test cases) import fileinput import sys import re rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL) def conv(match): line = match.group(1) line += match.group(4) line += ", " line += match.group(2) return line line = sys.stdin.read() off = 0 for match in re.finditer(rep, line): sys.stdout.write(line[off:match.start()]) sys.stdout.write(conv(match)) off = match.end() sys.stdout.write(line[off:]) llvm-svn: 232184
* Introduce runtime unrolling disable matadata and use it to mark the scalar ↵Kevin Qin2015-03-092-2/+4
| | | | | | | | | | | loop from vectorization. Runtime unrolling is an expensive optimization which can bring benefit only if the loop is hot and iteration number is relatively large enough. For some loops, we know they are not worth to be runtime unrolled. The scalar loop from vectorization is one of the cases. llvm-svn: 231631
* Do not restrict interleaved unrolling to small loops, depending on the target.Olivier Sallenave2015-03-061-0/+73
| | | | llvm-svn: 231528
* DebugInfo: Move new hierarchy into placeDuncan P. N. Exon Smith2015-03-038-87/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the specialized metadata nodes for the new debug info hierarchy into place, finishing off PR22464. I've done bootstraps (and all that) and I'm confident this commit is NFC as far as DWARF output is concerned. Let me know if I'm wrong :). The code changes are fairly mechanical: - Bumped the "Debug Info Version". - `DIBuilder` now creates the appropriate subclass of `MDNode`. - Subclasses of DIDescriptor now expect to hold their "MD" counterparts (e.g., `DIBasicType` expects `MDBasicType`). - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp` for printing comments. - Big update to LangRef to describe the nodes in the new hierarchy. Feel free to make it better. Testcase changes are enormous. There's an accompanying clang commit on its way. If you have out-of-tree debug info testcases, I just broke your build. - `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to update all the IR testcases. - Unfortunately I failed to find way to script the updates to CHECK lines, so I updated all of these by hand. This was fairly painful, since the old CHECKs are difficult to reason about. That's one of the benefits of the new hierarchy. This work isn't quite finished, BTW. The `DIDescriptor` subclasses are almost empty wrappers, but not quite: they still have loose casting checks (see the `RETURN_FROM_RAW()` macro). Once they're completely gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I also expect to make a few schema changes now that it's easier to reason about everything. llvm-svn: 231082
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-27113-771/+771
| | | | | | | | | | | | | | | | | | | | | | | | load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-27124-716/+716
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float*> %x, ... ->getelementptr float, <4 x float*> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. update.py: import fileinput import sys import re ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))") normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name *.ll | xargs ./apply.sh From llvm/src/tools/clang: find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll | xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786
* [x86,sdag] Two interrelated changes to the x86 and sdag code.Chandler Carruth2015-02-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as *legal* so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the *hilarious* deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. llvm-svn: 229835
* [LoopAccesses] Modify test to also check symbolic strides with memchecksAdam Nemet2015-02-181-3/+9
| | | | | | | | | See the comment in the code. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229627
* Reformat test case to be easier to migrate to typeless pointers.David Blaikie2015-02-151-1/+4
| | | | llvm-svn: 229275
* Check interleaving without relying on debug output.Olivier Sallenave2015-02-131-3/+14
| | | | llvm-svn: 229027
* llvm/test/Transforms/LoopVectorize/PowerPC/small-loop-rdx.ll REQUIRES ↵NAKAMURA Takumi2015-02-131-0/+1
| | | | | | +Asserts due to -debug. llvm-svn: 228989
* Change max interleave factor to 12 for POWER7 and POWER8.Olivier Sallenave2015-02-121-0/+35
| | | | llvm-svn: 228973
* Update test case.Benjamin Kramer2015-02-121-2/+2
| | | | llvm-svn: 228956
* Move the target specific test case arbitrary-induction-step.ll to ↵Hao Liu2015-01-301-0/+0
| | | | | | test/Transforms/LoopVectorize/AArch64 folder. llvm-svn: 227561
* [LoopVectorize] Induction variables: support arbitrary constant step.Hao Liu2015-01-303-4/+153
| | | | | | | | | | Previously, only -1 and +1 step values are supported for induction variables. This patch extends LV to support arbitrary constant steps. Initial patch by Alexey Volkov. Some bug fixes are added in the following version. Differential Revision: http://reviews.llvm.org/D6051 and http://reviews.llvm.org/D7193 llvm-svn: 227557
* Fixed a bug in masked load/store in reversed loop.Elena Demikhovsky2015-01-221-0/+82
| | | | | | | | | Added a test. The bug was submitted to bugzilla: http://llvm.org/bugs/show_bug.cgi?id=22225 llvm-svn: 226791
* IR: Move MDLocation into placeDuncan P. N. Exon Smith2015-01-148-42/+42
| | | | | | | | | | | | | | | | | | | | This commit moves `MDLocation`, finishing off PR21433. There's an accompanying clang commit for frontend testcases. I'll attach the testcase upgrade script I used to PR21433 to help out-of-tree frontends/backends. This changes the schema for `DebugLoc` and `DILocation` from: !{i32 3, i32 7, !7, !8} to: !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8) Note that empty fields (line/column: 0 and inlinedAt: null) don't get printed by the assembly writer. llvm-svn: 226048
* IR: Add 'distinct' MDNodes to bitcode and assemblyDuncan P. N. Exon Smith2015-01-082-4/+4
| | | | | | | | | | | | | | | | | | Propagate whether `MDNode`s are 'distinct' through the other types of IR (assembly and bitcode). This adds the `distinct` keyword to assembly. Currently, no one actually calls `MDNode::getDistinct()`, so these nodes only get created for: - self-references, which are never uniqued, and - nodes whose operands are replaced that hit a uniquing collision. The concept of distinct nodes is still not quite first-class, since distinct-ness doesn't yet survive across `MapMetadata()`. Part of PR22111. llvm-svn: 225474
* Fix broken test from r225159.Michael Kuperstein2015-01-051-1/+1
| | | | llvm-svn: 225164
* Fixed a bug in memory dependence checking module of loop vectorization. The ↵Jiangning Liu2015-01-051-0/+26
| | | | | | | | | | | | | | | | | | | | | following loop should not be vectorized with current algorithm. {code} // loop body ... = a[i] (1) ... = a[i+1] (2) ....... a[i+1] = .... (3) a[i] = ... (4) {code} The algorithm tries to collect memory access candidates from AliasSetTracker, and then check memory dependences one another. The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked 'write' entry in Accesses if only 'write' exists. This is incorrect and the consequence is it ignored all read access, and finally some RAW and WAR dependence are missed. For the case given above, if we ignore two reads, the dependence between (1) and (3) would not be able to be captured, and finally this loop will be incorrectly vectorized. The fix simply inserts a new loop to find all entries in Accesses. Since it will skip most of all other memory accesses by checking the Value pointer at the very beginning of the loop, it should not increase compile-time visibly. llvm-svn: 225159
* Masked Load and Store Intrinsics in loop vectorizer.Elena Demikhovsky2014-12-161-0/+420
| | | | | | | | | | The loop vectorizer optimizes loops containing conditional memory accesses by generating masked load and store intrinsics. This decision is target dependent. http://reviews.llvm.org/D6527 llvm-svn: 224334
* IR: Make metadata typeless in assemblyDuncan P. N. Exon Smith2014-12-1525-272/+272
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that `Metadata` is typeless, reflect that in the assembly. These are the matching assembly changes for the metadata/value split in r223802. - Only use the `metadata` type when referencing metadata from a call intrinsic -- i.e., only when it's used as a `Value`. - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode` when referencing it from call intrinsics. So, assembly like this: define @foo(i32 %v) { call void @llvm.foo(metadata !{i32 %v}, metadata !0) call void @llvm.foo(metadata !{i32 7}, metadata !0) call void @llvm.foo(metadata !1, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{metadata !3}, metadata !0) ret void, !bar !2 } !0 = metadata !{metadata !2} !1 = metadata !{i32* @global} !2 = metadata !{metadata !3} !3 = metadata !{} turns into this: define @foo(i32 %v) { call void @llvm.foo(metadata i32 %v, metadata !0) call void @llvm.foo(metadata i32 7, metadata !0) call void @llvm.foo(metadata i32* @global, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{!3}, metadata !0) ret void, !bar !2 } !0 = !{!2} !1 = !{i32* @global} !2 = !{!3} !3 = !{} I wrote an upgrade script that handled almost all of the tests in llvm and many of the tests in cfe (even handling many `CHECK` lines). I've attached it (or will attach it in a moment if you're speedy) to PR21532 to help everyone update their out-of-tree testcases. This is part of PR21532. llvm-svn: 224257
* PR21302. Vectorize only bottom-tested loops.Michael Zolotukhin2014-12-021-0/+31
| | | | | | rdar://problem/18886083 llvm-svn: 223171
* Apply loop-rotate to several vectorizer tests.Michael Zolotukhin2014-12-023-181/+151
| | | | | | | | Such loops shouldn't be vectorized due to the loops form. After applying loop-rotate (+simplifycfg) the tests again start to check what they are intended to check. llvm-svn: 223170
* Revert "Masked Vector Load and Store Intrinsics."Duncan P. N. Exon Smith2014-11-284-334/+0
| | | | | | | | | | | This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936
* Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisonsMatt Arsenault2014-11-241-8/+8
| | | | llvm-svn: 222705
* Masked Vector Load and Store Intrinsics.Elena Demikhovsky2014-11-234-0/+334
| | | | | | | | | | | | | | Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632
* Addition to r216371 (SLP and Loop Vectorization) and r218607 whereSuyog Sarda2014-11-111-0/+31
| | | | | | | | | | cost model for signed division by power of 2 was improved for AArch64. The revision r218607 missed test case for Loop Vectorization. Adding it in this revision. Differential Revision: http://reviews.llvm.org/D6181 llvm-svn: 221674
* LoopVectorize: Don't assume pointees are sizedDavid Majnemer2014-11-071-0/+24
| | | | | | | | | | A pointer's pointee might not be sized: the pointee could be a function. Report this as IK_NoInduction when calculating isInductionVariable. This fixes PR21508. llvm-svn: 221501
* Correctly update dom-tree after loop vectorizer.Michael Zolotukhin2014-10-311-0/+142
| | | | llvm-svn: 221009
* Add minnum / maxnum intrinsicsMatt Arsenault2014-10-211-0/+56
| | | | | | | | | | | | These are named following the IEEE-754 names for these functions, rather than the libm fmin / fmax to avoid possible ambiguities. Some languages may implement something resembling fmin / fmax which return NaN if either operand is to propagate errors. These implement the IEEE-754 semantics of returning the other operand if either is a NaN representing missing data. llvm-svn: 220341
* [LoopVectorize] Ignore @llvm.assume for cost estimates and legalityHal Finkel2014-10-141-0/+100
| | | | | | | | | | | | | | A few minor changes to prevent @llvm.assume from interfering with loop vectorization. First, treat @llvm.assume like the lifetime intrinsics, which are scalarized (but don't otherwise interfere with the legality checking). Second, ignore the cost of ephemeral instructions in the loop (these will go away anyway during CodeGen). Alignment assumptions and other uses of @llvm.assume can often end up inside of loops that should be vectorized (this is not uncommon for assumptions generated by __attribute__((align_value(n))), for example). llvm-svn: 219741
* Revert "Revert "DI: Fold constant arguments into a single MDString""Duncan P. N. Exon Smith2014-10-038-79/+79
| | | | | | | | | | | | | | | | | | | | | | This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010
* Revert "DI: Fold constant arguments into a single MDString"Duncan P. N. Exon Smith2014-10-028-79/+79
| | | | | | This reverts commit r218914 while I investigate some bots. llvm-svn: 218918
* DI: Fold constant arguments into a single MDStringDuncan P. N. Exon Smith2014-10-028-79/+79
| | | | | | | | | | | | | This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 218914
* Move the complex address expression out of DIVariable and into an extraAdrian Prantl2014-10-012-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787
* Revert r218778 while investigating buldbot breakage.Adrian Prantl2014-10-012-12/+12
| | | | | | "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782
* Move the complex address expression out of DIVariable and into an extraAdrian Prantl2014-10-012-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778
* Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option ↵Sanjay Patel2014-09-1083-92/+92
| | | | | | | | | | | names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528
* Add a convenience method to copy wrapping, exact, and fast-math flags (NFC).Sanjay Patel2014-09-011-0/+24
| | | | | | | | | | | | | | The loop vectorizer preserves wrapping, exact, and fast-math properties of scalar instructions. This patch adds a convenience method to make that operation easier because we need to do this in the loop vectorizer, SLP vectorizer, and possibly other places. Although this is a 'no functional change' patch, I've added a testcase to verify that the exact flag is preserved by the loop vectorizer. The wrapping and fast-math flags are already checked in existing testcases. Differential Revision: http://reviews.llvm.org/D5138 llvm-svn: 216886
OpenPOWER on IntegriCloud