path: root/llvm/test/CodeGen/NVPTX
Commit message | Author | Age | Files | Lines
...
* [NVPTX] Emits "generic()" depending on the original address space | Jingyue Wu | 2015-04-24 | 1 | -0/+2
  Summary:
  Fixes a bug in the NVPTX codegen. The code used to miss necessary "generic()" on aggregates of addrspacecasts.

  Test Plan: addrspacecast-gvar.ll
  Reviewers: eliben, jholewinski
  Reviewed By: jholewinski
  Subscribers: jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D9130
  llvm-svn: 235689
* [opaque pointer type] Add textual IR support for explicit type parameter to the call instruction | David Blaikie | 2015-04-16 | 1 | -1/+1
  See r230786 and r230794 for similar changes to gep and load respectively.

  Call is a bit different because it often doesn't have a single explicit type - usually the type is deduced from the arguments, and just the return type is explicit. In those cases there's no need to change the IR.

  When that's not the case, the IR usually contains the pointer type of the first operand - but since typed pointers are going away, that representation is insufficient so I'm just stripping the "pointerness" of the explicit type away.

  This does make the IR a bit weird - it /sort of/ reads like the type of the first operand: "call void () %x(" but %x is actually of type "void ()*" and will eventually be just of type "ptr". But this seems not too bad and I don't think it would benefit from repeating the type ("void (), void () * %x(" and then eventually "void (), ptr %x(") as has been done with gep and load.

  This also has a side benefit: since the explicit type is no longer a pointer, there's no ambiguity between an explicit type and a function that returns a function pointer. Previously this case needed an explicit type (eg: a function returning a void() function was written as "call void () () * @x(" rather than "call void () * @x(" because of the ambiguity between a function returning a pointer to a void() function and a function returning void).

  No ambiguity means even function pointer return types can just be written alone, without writing the whole function's type.

  This leaves /only/ the varargs case where the explicit type is required.

  Given the special type syntax in call instructions, the regex-fu used for migration was a bit more involved in its own unique way (as every one of these is) so here it is. Use it in conjunction with the apply.sh script and associated find/xargs commands I've provided in r230786 to migrate your out of tree tests. Do let me know if any of this doesn't cover your cases & we can iterate on a more general script/regexes to help others with out of tree tests.

  About 9 test cases couldn't be automatically migrated - half of those were functions returning function pointers, where I just had to manually delete the function argument types now that we didn't need an explicit function type there. The other half were typedefs of function types used in calls - just had to manually drop the * from those.

      import fileinput
      import sys
      import re

      pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
      addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
      func_end = re.compile(r"(?:void.*|\)\s*)\*$")

      def conv(match, line):
        if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
          return line
        return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]

      for line in sys.stdin:
        sys.stdout.write(conv(re.search(pat, line), line))

  llvm-svn: 235145
* Revert revisions r234755, r234759, r234760 | Jan Vesely | 2015-04-13 | 1 | -1/+3
  Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)"
  Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO"
  Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB"

  Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll on hexagon, nvptx, and r600. Revert while I investigate.

  llvm-svn: 234768
* LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB | Jan Vesely | 2015-04-13 | 1 | -3/+1
  v2: consider BooleanContents when processing overflow

  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewers: resistor, jholewinsky (nvidia parts)
  Differential Revision: http://reviews.llvm.org/D6340
  llvm-svn: 234755
* [NVPTX] Associate a minimum PTX version for each SM architecture | Justin Holewinski | 2015-03-30 | 9 | -12/+13
  When a new SM architecture is introduced, it is only supported by the current PTX version and later. Make sure we are using at least the minimum PTX version for the target architecture.

  This also removes support for PTX ISA < 3.2.

  llvm-svn: 233583
* [NVPTX] Add options for PTX 4.1/4.2 and SM 3.2/3.7/5.2/5.3 | Justin Holewinski | 2015-03-30 | 4 | -0/+24
  llvm-svn: 233575
* Add support for __nvvm_reflect changes in libdevice in CUDA-7.0 | Artem Belevich | 2015-03-19 | 1 | -0/+31
  Summary:
  CUDA 7.0's libdevice uses slightly different IR to call __nvvm_reflect and that triggers an assertion in nvvm_reflect optimization pass.

  This change allows nvvm_reflect pass to deal with both old and new ways to pass an argument to __nvvm_reflect.

  Test Plan: ninja check-all
  Reviewers: eliben, echristo
  Subscribers: jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D8399
  llvm-svn: 232732
* [opaque pointer type] Add textual IR support for explicit type parameter to gep operator | David Blaikie | 2015-03-13 | 3 | -5/+5
  Similar to gep (r230786) and load (r230794) changes.

  Similar migration script can be used to update test cases, which successfully migrated all of LLVM and Polly, but about 4 test cases needed manual changes in Clang.

  (this script will read the contents of stdin and massage it into stdout - wrap it in the 'apply.sh' script shown in previous commits + xargs to apply it over a large set of test cases)

      import fileinput
      import sys
      import re

      rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

      def conv(match):
        line = match.group(1)
        line += match.group(4)
        line += ", "
        line += match.group(2)
        return line

      line = sys.stdin.read()
      off = 0
      for match in re.finditer(rep, line):
        sys.stdout.write(line[off:match.start()])
        sys.stdout.write(conv(match))
        off = match.end()
      sys.stdout.write(line[off:])

  llvm-svn: 232184
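  As a small illustration of what the gep-operator script above does, the sketch below applies the same regex to one constant expression. The wrapper name migrate_gep_operators and the sample global are assumptions made for this example only; the regex is copied from the commit message, and re.sub with a callback is used in place of the original finditer loop.

```python
import re

# Regex copied verbatim from the commit message above.
REP = re.compile(
    r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)",
    re.MULTILINE | re.DOTALL,
)


def migrate_gep_operators(text):
    """Insert the explicit pointee type into every gep constant expression.

    Hypothetical helper: group 1 is 'getelementptr [inbounds] (',
    group 4 is the pointee type, group 2 is the original pointer type.
    """
    return REP.sub(lambda m: m.group(1) + m.group(4) + ", " + m.group(2), text)


if __name__ == "__main__":
    old = "@s = global i8* getelementptr inbounds ([4 x i8]* @str, i32 0, i32 0)"
    print(migrate_gep_operators(old))
```

  Text that contains no gep operator passes through unchanged, which is why the original script could be run over whole test directories.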
* [NVPTXAsmPrinter] do not print .align on function headers | Jingyue Wu | 2015-03-12 | 1 | -0/+7
  Summary:
  PTX does not allow .align directives on function headers.

  Fixes PR21551.

  Test Plan: test/Codegen/NVPTX/function-align.ll
  Reviewers: eliben, jholewinski
  Reviewed By: eliben, jholewinski
  Subscribers: llvm-commits, eliben, jpienaar, jholewinski
  Differential Revision: http://reviews.llvm.org/D8274
  llvm-svn: 232004
* [opaque pointer type] Add textual IR support for explicit type parameter to load instruction | David Blaikie | 2015-02-27 | 28 | -108/+108
  Essentially the same as the GEP change in r230786.

  A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278)

      import fileinput
      import sys
      import re

      pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

      for line in sys.stdin:
        sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

  Reviewers: rafael, dexonsmith, grosser
  Differential Revision: http://reviews.llvm.org/D7649
  llvm-svn: 230794
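  To make the effect of the load migration concrete, here is a minimal sketch that runs the regex from the commit message on individual IR lines. The function name migrate_load and the sample lines are assumptions for illustration; only the pattern and the replacement string come from the commit.

```python
import re

# Pattern and replacement copied verbatim from the commit message above.
LOAD_PAT = re.compile(
    r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)"
)


def migrate_load(line):
    """Rewrite 'load T* %p' as 'load T, T* %p' (hypothetical wrapper)."""
    # \1 = everything up to and including the loaded type, \2 = the type,
    # \3 = optional address space, \4 = the pointer operand and the rest.
    return LOAD_PAT.sub(r"\1, \2\3*\4", line)


if __name__ == "__main__":
    for old in (
        "  %v = load i32* %p\n",
        "  %v = load float addrspace(1)* %p\n",
        "  ret void\n",
    ):
        print(migrate_load(old), end="")
```

  The address space stays attached to the pointer type, not to the new explicit type, matching the convention described for the getelementptr change.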
* [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction | David Blaikie | 2015-02-27 | 11 | -31/+31
  One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type.

  This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions.

  * This doesn't modify gep operators, only instructions (operators will be handled separately)

  * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes.

  * geps of vectors are transformed as:
      getelementptr <4 x float*> %x, ...
      ->getelementptr float, <4 x float*> %x, ...
    Then, once the opaque pointer type is introduced, this will ultimately look like:
      getelementptr float, <4 x ptr> %x
    with the unambiguous interpretation that it is a vector of pointers to float.

  * address spaces remain on the pointer, not the type:
      getelementptr float addrspace(1)* %x
      ->getelementptr float, float addrspace(1)* %x
    Then, eventually:
      getelementptr float, ptr addrspace(1) %x

  Importantly, the massive amount of test case churn has been automated by some crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files.

  update.py:

      import fileinput
      import sys
      import re

      ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
      normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

      def conv(match, line):
        if not match:
          return line
        line = match.groups()[0]
        if len(match.groups()[5]) == 0:
          line += match.groups()[2]
        line += match.groups()[3]
        line += ", "
        line += match.groups()[1]
        line += "\n"
        return line

      for line in sys.stdin:
        if line.find("getelementptr ") == line.find("getelementptr inbounds"):
          if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
            line = conv(re.match(ibrep, line), line)
        elif line.find("getelementptr ") != line.find("getelementptr ("):
          line = conv(re.match(normrep, line), line)
        sys.stdout.write(line)

  apply.sh:

      for name in "$@"
      do
        python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
        rm -f "$name.tmp"
      done

  The actual commands:

  From llvm/src:
      find test/ -name *.ll | xargs ./apply.sh
  From llvm/src/tools/clang:
      find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
  From llvm/src/tools/polly:
      find test/ -name *.ll | xargs ./apply.sh

  After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out).

  The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases.

  Reviewers: rafael, dexonsmith, grosser
  Differential Revision: http://reviews.llvm.org/D7636
  llvm-svn: 230786
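  The vector and address-space rules listed in this commit can be checked with a short sketch. It reuses normrep and the conv logic from update.py, so it covers only the plain (non-inbounds) instruction form; the wrapper name migrate_gep and the sample lines are assumptions for illustration, not part of the original script.

```python
import re

# normrep copied from update.py in the commit message above.
NORMREP = re.compile(
    r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))"
)


def migrate_gep(line):
    """Add the explicit source element type to one gep instruction line.

    Hypothetical wrapper around the conv() logic of update.py; lines that
    do not match are returned untouched.
    """
    match = NORMREP.match(line)
    if not match:
        return line
    prefix, operand, vec_prefix, elem_ty, _addrspace, vec_close = match.groups()
    out = prefix
    # A '<N x ' prefix without a closing '>' after the '*' means the operand
    # is a pointer to a vector, so the vector prefix belongs to the type.
    if len(vec_close) == 0:
        out += vec_prefix
    out += elem_ty + ", " + operand + "\n"
    return out


if __name__ == "__main__":
    print(migrate_gep("  %q = getelementptr <4 x float*> %x, <4 x i64> %i\n"), end="")
    print(migrate_gep("  %p = getelementptr float addrspace(1)* %x, i64 1\n"), end="")
```

  For a vector of pointers the inserted type is the scalar element type (float), while the address space is left on the pointer operand, as the bullet list above describes.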
* [NVPTX] Emit .pragma "nounroll" for loops marked with nounroll | Jingyue Wu | 2015-02-01 | 1 | -0/+37
  Summary:
  CUDA driver can unroll loops when jit-compiling PTX. To prevent the CUDA driver from unrolling a loop marked with llvm.loop.unroll.disable, we need to emit .pragma "nounroll" at the header of that loop.

  This patch also extracts getting unroll metadata from loop ID metadata into a shared helper function.

  Test Plan: test/CodeGen/NVPTX/nounroll.ll
  Reviewers: eliben, meheff, jholewinski
  Reviewed By: jholewinski
  Subscribers: jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D7041
  llvm-svn: 227703
* [NVPTX] Generate a more optimal sequence for select of i1 | Justin Holewinski | 2015-01-26 | 1 | -0/+14
  Instead of creating a pattern like "(p && a) || ((!p) && b)", just expand the i8 operands to i32 and perform the selp on them.

  Fixes PR22246

  llvm-svn: 227123
* [NVPTX] Handle floating-point conversion patterns that are not explicitly ordered or unordered | Justin Holewinski | 2015-01-26 | 1 | -0/+62
  Fixes PR22322

  llvm-svn: 227117
* Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding. | Olivier Sallenave | 2015-01-14 | 2 | -0/+50
  llvm-svn: 225987
* [NVPTX] Fix bugs related to isSingleValueType | Jingyue Wu | 2014-12-17 | 2 | -0/+25
  Summary:
  With isSingleValueType starting to treat vector types as single-value types, code that uses this interface needs to be updated.

  Test Plan:
  vector-global.ll
  nvcl-param-align.ll

  Reviewers: jholewinski
  Reviewed By: jholewinski
  Subscribers: llvm-commits, meheff, eliben, jholewinski
  Differential Revision: http://reviews.llvm.org/D6573
  llvm-svn: 224440
* IR: Make metadata typeless in assembly | Duncan P. N. Exon Smith | 2014-12-15 | 18 | -37/+37
  Now that `Metadata` is typeless, reflect that in the assembly. These are the matching assembly changes for the metadata/value split in r223802.

  - Only use the `metadata` type when referencing metadata from a call intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode` when referencing it from call intrinsics.

  So, assembly like this:

      define @foo(i32 %v) {
        call void @llvm.foo(metadata !{i32 %v}, metadata !0)
        call void @llvm.foo(metadata !{i32 7}, metadata !0)
        call void @llvm.foo(metadata !1, metadata !0)
        call void @llvm.foo(metadata !3, metadata !0)
        call void @llvm.foo(metadata !{metadata !3}, metadata !0)
        ret void, !bar !2
      }
      !0 = metadata !{metadata !2}
      !1 = metadata !{i32* @global}
      !2 = metadata !{metadata !3}
      !3 = metadata !{}

  turns into this:

      define @foo(i32 %v) {
        call void @llvm.foo(metadata i32 %v, metadata !0)
        call void @llvm.foo(metadata i32 7, metadata !0)
        call void @llvm.foo(metadata i32* @global, metadata !0)
        call void @llvm.foo(metadata !3, metadata !0)
        call void @llvm.foo(metadata !{!3}, metadata !0)
        ret void, !bar !2
      }
      !0 = !{!2}
      !1 = !{i32* @global}
      !2 = !{!3}
      !3 = !{}

  I wrote an upgrade script that handled almost all of the tests in llvm and many of the tests in cfe (even handling many `CHECK` lines). I've attached it (or will attach it in a moment if you're speedy) to PR21532 to help everyone update their out-of-tree testcases.

  This is part of PR21532.

  llvm-svn: 224257
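  The simplest part of this syntax change, the numbered metadata definitions at the bottom of the example, can be sketched as follows. This is not the upgrade script attached to PR21532; it is a hypothetical, deliberately narrow helper (upgrade_definition is an assumed name) that only handles one-line definitions of the form `!N = metadata !{...}` and leaves call-site operands alone.

```python
import re

# Matches a one-line numbered metadata definition in the old syntax,
# e.g. '!0 = metadata !{metadata !2}'. Input lines carry no trailing newline.
DEF_RE = re.compile(r"(!\d+ = )metadata (!\{.*\})")


def upgrade_definition(line):
    """Drop the 'metadata' type keyword from a metadata definition line.

    Assumption: the node contains no metadata strings with the text
    'metadata ' inside them; a real upgrade tool has to be more careful.
    """
    match = DEF_RE.fullmatch(line)
    if not match:
        return line  # not a definition (e.g. a call instruction): untouched
    return match.group(1) + match.group(2).replace("metadata ", "")


if __name__ == "__main__":
    for old in (
        "!0 = metadata !{metadata !2}",
        "!1 = metadata !{i32* @global}",
        "!3 = metadata !{}",
    ):
        print(upgrade_definition(old))
```

  Call-site operands such as `metadata !{i32 %v}` need different handling (the wrapper node is removed but the `metadata` keyword stays), which is one reason the one-line-per-definition canonicalization in r224002 was done first.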
* IR: Canonicalize metadata formatting, NFC | Duncan P. N. Exon Smith | 2014-12-11 | 1 | -10/+3
  Canonicalize formatting of metadata to make it easier to upgrade via scripts -- in particular, one line per metadata definition makes it more `sed`-able.

  This is preparation for changing the assembly syntax for metadata [1].

  [1]: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248449.html

  llvm-svn: 224002
* [NVPTX] Do not emit .weak symbols for NVPTX | Jingyue Wu | 2014-12-01 | 1 | -1/+7
  Summary:
  ".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the weak directive in MCAsmPrinter customizable, and disables emitting ".weak" symbols for NVPTX.

  Test Plan: weak-linkage.ll
  Reviewers: jholewinski
  Reviewed By: jholewinski
  Subscribers: majnemer, jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D6455
  llvm-svn: 223077
* [NVPTX] Add NVPTXLowerStructArgs pass | Justin Holewinski | 2014-11-05 | 1 | -0/+24
  This works around the limitation that PTX does not allow .param space loads/stores with arbitrary pointers.

  If a function has a by-val struct ptr arg, say foo(%struct.x *byval %d), then add the following instructions to the first basic block:

      %temp = alloca %struct.x, align 8
      %tt1 = bitcast %struct.x * %d to i8 *
      %tt2 = llvm.nvvm.cvt.gen.to.param %tt2
      %tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) *
      %tv = load %struct.x addrspace(101) * %tempd
      store %struct.x %tv, %struct.x * %temp, align 8

  The above code allocates some space in the stack and copies the incoming struct from param space to local space. Then replace all occurrences of %d by %temp.

  Fixes PR21465.

  llvm-svn: 221377
* [NVPTX] aligned byte-buffers for vector return types | Jingyue Wu | 2014-10-25 | 1 | -0/+14
  Summary:
  Fixes PR21100 which is caused by inconsistency between the declared return type and the expected return type at the call site. The new behavior is consistent with nvcc and the NVPTXTargetLowering::getPrototype function.

  Test Plan: test/Codegen/NVPTX/vector-return.ll
  Reviewers: jholewinski
  Reviewed By: jholewinski
  Subscribers: llvm-commits, meheff, eliben, jholewinski
  Differential Revision: http://reviews.llvm.org/D5612
  llvm-svn: 220607
* [MachineSink] Use the real post dominator tree | Jingyue Wu | 2014-10-15 | 1 | -0/+40
  Summary:
  Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo. The old heuristics caused instructions to sink unnecessarily, and might create register pressure.

  This is the second try of the fix. The first one (D4814) caused a performance regression due to failing to sink instructions out of loops (PR21115). This patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower one regardless of whether the target block post-dominates the source.

  Thanks Alexey Volkov for reporting PR21115!

  Test Plan:
  Added a NVPTX codegen test to verify that our change prevents the backend from over-sinking. It also shows the unnecessary register pressure caused by over-sinking.

  Added an X86 test to verify we can sink instructions out of loops regardless of the dominance relationship. This test is reduced from Alexey's test in PR21115.

  Updated an affected test in X86.

  Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime performance. Results are attached separately in the review thread.

  Reviewers: Jiangning, resistor, hfinkel
  Reviewed By: hfinkel
  Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski
  Differential Revision: http://reviews.llvm.org/D5633
  llvm-svn: 219773
* Revert r216862 due to a performance regression | Jingyue Wu | 2014-10-01 | 1 | -40/+0
  Reported by Alexey Volkov in PR21115

  llvm-svn: 218771
* [MachineSink] Use the real post dominator tree | Jingyue Wu | 2014-09-01 | 1 | -0/+40
  Summary:
  Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree. The old heuristics caused instructions to sink unnecessarily, and might create register pressure.

  Test Plan:
  Added a NVPTX codegen test to verify that our change is in effect. It also shows the unnecessary register pressure caused by over-sinking. Updated affected tests in AArch64 and X86.

  Reviewers: eliben, meheff, Jiangning
  Reviewed By: Jiangning
  Subscribers: jholewinski, aemerson, mcrosier, llvm-commits
  Differential Revision: http://reviews.llvm.org/D4814
  llvm-svn: 216862
* [NVPTX] Make the alignment an explicit argument to ldu/ldg | Jingyue Wu | 2014-08-29 | 3 | -21/+13
  Summary:
  Instead of specifying the alignment as metadata which may be destroyed by transformation passes, make the alignment the second argument to ldu/ldg intrinsic calls.

  Test Plan:
  ldu-ldg.ll
  ldu-i8.ll
  ldu-reg-plus-offset.ll

  Reviewers: eliben, meheff, jholewinski
  Reviewed By: meheff, jholewinski
  Subscribers: jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D5093
  llvm-svn: 216731
* [NVPTX] Add some extra tests for mul.wide to test non-power-of-two source types | Justin Holewinski | 2014-07-23 | 1 | -0/+22
  llvm-svn: 213794
* [NVPTX] mul.wide generation works for any smaller integer source types, not just the next smaller power of two | Justin Holewinski | 2014-07-23 | 1 | -0/+22
  llvm-svn: 213784
* [NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled | Justin Holewinski | 2014-07-23 | 1 | -9/+18
  With optimizations disabled, we disable the isel patterns for mul.wide; but we were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE ISD nodes in DAGCombine if the optimization level is not zero.

  llvm-svn: 213773
* Add some tests for NVPTX lowering of cmpxchg | Eli Bendersky | 2014-07-21 | 1 | -0/+14
  llvm-svn: 213586
* Add tests for atomic adds on floats. | Eli Bendersky | 2014-07-18 | 1 | -0/+27
  llvm-svn: 213406
* Use CHECK-LABEL where appropriate in this test. | Eli Bendersky | 2014-07-18 | 1 | -18/+18
  llvm-svn: 213398
* NVPTX: support fpext/fptrunc to and from f16. | Tim Northover | 2014-07-18 | 1 | -0/+40
  llvm-svn: 213377
* CodeGen: soften f16 type by default instead of marking legal. | Tim Northover | 2014-07-18 | 1 | -0/+30
  Actual support for softening f16 operations is still limited, and can be added when it's needed. But Soften is much closer to being a useful thing to try than keeping it Legal when no registers can actually hold such values.

  Longer term, we probably want something between Soften and Promote semantics for most targets, it'll be more efficient to promote the 4 basic operations to f32 than libcall them.

  llvm-svn: 213372
* NVPTX: support direct f16 <-> f64 conversions via intrinsics. | Tim Northover | 2014-07-18 | 1 | -0/+45
  Clang may well start emitting these soon, and while it may not be directly relevant for OpenCL or GLSL, the instructions were just sitting there waiting to be used.

  llvm-svn: 213356
* [NVPTX] Improve handling of FP fusion | Justin Holewinski | 2014-07-17 | 5 | -5/+41
  We now consider the FPOpFusion flag when determining whether to fuse ops. We also explicitly emit add.rn when fusion is disabled to prevent ptxas from fusing the operations on its own.

  llvm-svn: 213287
* [NVPTX] Add missing .v4 qualifier on vector store instruction | Justin Holewinski | 2014-07-17 | 1 | -0/+12
  llvm-svn: 213276
* [NVPTX] Flag surface/texture query instructions with IsTexSurfQuery | Justin Holewinski | 2014-07-17 | 1 | -0/+103
  Also, add some tests to make sure we can handle surface/texture queries on both Fermi and Kepler+.

  llvm-svn: 213268
* [NVPTX] Add more surface/texture intrinsics, including CUDA unified texture fetch | Justin Holewinski | 2014-07-17 | 4 | -2/+143
  This also uses TSFlags to mark machine instructions that are surface/texture accesses, as well as the vector width for surface operations. This is used to simplify some of the switch statements that need to detect surface/texture instructions.

  llvm-svn: 213256
* [NVPTX] Honor alignment on vector loads/stores | Justin Holewinski | 2014-07-16 | 1 | -0/+77
  We were not considering the stated alignment on vector loads/stores, leading us to generate vector instructions even when we do not have sufficient alignment.

  Now, for IR like:

      %1 = load <4 x float>, <4 x float>* %ptr, align 4

  we will generate correct, conservative PTX like:

      ld.f32 ... [%ptr]
      ld.f32 ... [%ptr+4]
      ld.f32 ... [%ptr+8]
      ld.f32 ... [%ptr+12]

  Or if we have an alignment of 8 (for example), we can generate code like:

      ld.v2.f32 ... [%ptr]
      ld.v2.f32 ... [%ptr+8]

  llvm-svn: 213186
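  The relationship between the stated alignment and the emitted instructions can be modelled in a few lines. This is only a sketch of the rule implied by the two examples above, assuming that a chunk is used when its total size does not exceed the alignment; the function names are hypothetical and this is not the backend's actual lowering code.

```python
def split_vector_load(num_elts, elt_bytes, align):
    """Return (vector_width, byte_offset) pairs for one vector load.

    Assumed rule: pick the widest chunk among v4, v2, scalar whose size
    fits within the stated alignment and evenly divides the vector.
    """
    width = 1
    for cand in (4, 2):
        if cand <= num_elts and num_elts % cand == 0 and cand * elt_bytes <= align:
            width = cand
            break
    step = width * elt_bytes
    return [(width, off) for off in range(0, num_elts * elt_bytes, step)]


def to_ptx(pieces, ty="f32"):
    """Format the pieces in the style used by the commit message."""
    lines = []
    for width, off in pieces:
        op = f"ld.{ty}" if width == 1 else f"ld.v{width}.{ty}"
        addr = "[%ptr]" if off == 0 else f"[%ptr+{off}]"
        lines.append(f"{op} ... {addr}")
    return lines


if __name__ == "__main__":
    for align in (4, 8, 16):
        print(f"align {align}:")
        for line in to_ptx(split_vector_load(4, 4, align)):
            print("  " + line)
```

  For a <4 x float> load this yields four scalar loads at align 4 and two ld.v2.f32 at align 8, matching the PTX shown in the commit message; the align 16 case (a single ld.v4.f32) follows from the same assumed rule.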
* [NVPTX] Rename registers %fl -> %fd and %rl -> %rd | Justin Holewinski | 2014-07-16 | 16 | -134/+134
  This matches the internal behavior of NVIDIA tools like libnvvm.

  llvm-svn: 213168
* [NVPTX] Add reflect intrinsic (better than matching by function name) | Justin Holewinski | 2014-06-27 | 1 | -0/+14
  Also clean up some of the logic in NVVMReflect.cpp while we're messing around in there.

  llvm-svn: 211948
* [NVPTX] Add 'b' asm constraint | Justin Holewinski | 2014-06-27 | 1 | -0/+7
  llvm-svn: 211946
* [NVPTX] Error out if initializer is given for variable in an address space that does not support initialization | Justin Holewinski | 2014-06-27 | 1 | -0/+5
  llvm-svn: 211943
* [NVPTX] Add support for .managed variables for UVM | Justin Holewinski | 2014-06-27 | 1 | -0/+11
  llvm-svn: 211942
* [NVPTX] Emit .weak linkage for link_once, weak, available_externally, and common linkage | Justin Holewinski | 2014-06-27 | 1 | -0/+9
  llvm-svn: 211941
* [NVPTX] Fix handling of ldg/ldu intrinsics. | Justin Holewinski | 2014-06-27 | 3 | -5/+47
  The address space of the pointer must be global (1) for these intrinsics. There must also be alignment metadata attached to the intrinsic calls, e.g.

      %val = tail call i32 @llvm.nvvm.ldu.i.global.i32.p1i32(i32 addrspace(1)* %ptr), !align !0

      !0 = metadata !{i32 4}

  llvm-svn: 211939
* [NVPTX] Clean up argument lowering code and properly handle alignment for structs and vectors | Justin Holewinski | 2014-06-27 | 1 | -0/+13
  llvm-svn: 211938
* [NVPTX] Add support for [SHL,SRA,SRL]_PARTS | Justin Holewinski | 2014-06-27 | 1 | -0/+38
  llvm-svn: 211936
* [NVPTX] Implement fma and imad contraction as target DAGCombiner patterns | Justin Holewinski | 2014-06-27 | 2 | -0/+46
  This also introduces DAGCombiner patterns for mul.wide to multiply two smaller integers and produce a larger integer.

  llvm-svn: 211935
* [NVPTX] Add support for efficient rotate instructions on SM 3.2+ | Justin Holewinski | 2014-06-27 | 1 | -0/+58
  llvm-svn: 211934