summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/NVPTX
Commit message (Collapse)AuthorAgeFilesLines
* [NVPTX] Switch nvptx-use-infer-addrspace to true.Justin Lebar2016-08-192-3/+6
| | | | | | | | | | | | | | | Summary: This switches us to use a different, more powerful algorithm for address space inference. I've tested this locally and it seems to work great. Once we're more confident in it, we can remove the old pass altogether. Reviewers: jingyue Subscribers: llvm-commits, tra, jholewinski Differential Revision: https://reviews.llvm.org/D23694 llvm-svn: 279317
* [NVPTX] Use untyped (.b) integer registers in PTX.Artem Belevich2016-08-121-0/+69
| | | | | | | | | | | | This bring LLVM-generated PTX closer to what nvcc generates and avoids triggering issues in ptxas. For instance, ptxas does not accept .s16 (or .u16) registers as operands for .fp16 instructions. Differential Revision: https://reviews.llvm.org/D23460 llvm-svn: 278568
* [NVPTX] remove unnecessary named metadata update that happens to break debug ↵Artem Belevich2016-08-021-0/+64
| | | | | | | | | | info. Also added test case to verify IR changes done by NVPTXGenericToNVVM pass. Differential Revision: https://reviews.llvm.org/D22837 llvm-svn: 277520
* Fix NVPTX/call-with-alloca-buffer.ll after r276777.Justin Lebar2016-07-261-4/+2
| | | | | | | r276777 makes InstSimplify stronger, letting it see through some unnecessary addrspace casts. llvm-svn: 276786
* [NVPTX] Enable the load-store vectorizer on nvptx.Justin Lebar2016-07-201-0/+17
| | | | | | | | | | Reviewers: tra Subscribers: jholewinski, arsenm, asbirlea Differential Revision: https://reviews.llvm.org/D22592 llvm-svn: 276196
* [NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC.Artem Belevich2016-07-201-1/+1
| | | | | | | | After r276153 the pass applies to both kernels and regular functions. Differential Revision: https://reviews.llvm.org/D22583 llvm-svn: 276189
* [NVPTX] deal with all aggregate return types.Artem Belevich2016-07-202-14/+43
| | | | | | | | Fixes a crash in llvm_unreachable when a function has array return type. Differential Revision: https://reviews.llvm.org/D22524 llvm-svn: 276154
* [NVPTX] Improve lowering of byval args of device functions.Artem Belevich2016-07-202-8/+26
| | | | | | | | | | | | Avoid unnecessary spills of byval arguments of device functions to local space on SASS level and subsequent pointer conversion to generic address space that follows. Instead, make a local copy in IR, provide a way to access arguments directly, and let LLVM optimize the copy away when possible. Differential Review: https://reviews.llvm.org/D21421 llvm-svn: 276153
* [NVPTX] Make sure we adjust alignment at all call sitesArtem Belevich2016-07-181-0/+11
| | | | | | | .. including calls from kernel functions that were ignored by mistake before. llvm-svn: 275920
* [NVPTX] Force minimum alignment of 4 for byval arguments of device-side ↵Artem Belevich2016-07-181-0/+8
| | | | | | | | | | | | | | | | functions. Taking address of a byval variable in PTX is legal, but currently runs into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042). Work around the issue by enforcing minimum alignment on byval arguments of device functions. The change is a no-op on SASS level for sm_3x where ptxas already aligns local copy by at least 4. Differential Revision: https://reviews.llvm.org/D22428 llvm-svn: 275893
* NVPTX: Remove the legacy ptx intrinsicsJustin Bogner2016-07-072-95/+95
| | | | | | | | | | | | - Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but not all of these registers were already accessible via the nvvm name. - Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0. There's a fair amount of code motion here, but it's all very mechanical. llvm-svn: 274769
* [NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.Justin Lebar2016-07-063-0/+15
| | | | | | | | | | Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D22068 llvm-svn: 274674
* NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0Justin Bogner2016-07-064-14/+14
| | | | | | | Everywhere where cuda.syncthreads or __syncthreads is used, use the properly namespaced nvvm.barrier0 instead. llvm-svn: 274664
* Revert r273313 "[NVPTX] Improve lowering of byval args of device functions."Artem Belevich2016-06-292-26/+8
| | | | | | The change causes llvm crash in some unoptimized builds. llvm-svn: 274163
* Only emit extension for zeroext/signext arguments if type is < 32 bitsJustin Holewinski2016-06-271-0/+26
| | | | | | | | | | Reviewers: jingyue, jlebar Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D21756 llvm-svn: 273922
* [NVPTX] Improve lowering of byval args of device functions.Artem Belevich2016-06-212-8/+26
| | | | | | | | | | | | | | | | | | | | | Avoid unnecessary spills of such vars to local space on SASS level and pointer space conversion. Instead, make a local copy with appropriate addrspacecasts and let LLVM optimize them away when possible. This allows loading value of the argument using [symbol+offset] instead of converting argument to general space pointer and using it for indexing (which also implicitly converts param space pointer to local space one on SASS level and triggers copying of argument into local space in the process). This reduces call overhead, uses less registers and reduces overall SASS size by 2-4%. Differential Review: http://reviews.llvm.org/D21421 llvm-svn: 273313
* [NVPTX] Add intrinsics for shfl instructions.Justin Lebar2016-06-091-0/+90
| | | | | | | | | | | | | | | Summary: Currently clang emits these instructions via inline (volatile) asm in the CUDA headers. Switching to intrinsics will let the optimizer reason across calls to these intrinsics. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D21160 llvm-svn: 272298
* [NVPTX] Added NVVMIntrRange passArtem Belevich2016-05-261-13/+54
| | | | | | | | | | | | NVVMIntrRange adds !range metadata to calls of NVVM intrinsics that return values within known limited range. This allows LLVM to generate optimal code for indexing arrays based on tid/ctaid which is a frequently used pattern in CUDA code. Differential Revision: http://reviews.llvm.org/D20644 llvm-svn: 270872
* [NVPTX] Fix sign/zero-extending ldg/ldu instruction selectionJustin Holewinski2016-05-021-0/+34
| | | | | | | | | | | | | | | | | Summary: We don't have sign-/zero-extending ldg/ldu instructions defined, so we need to emulate them with explicit CVTs. We were originally handling the i8 case, but not any other cases. Fixes PR26185 Reviewers: jingyue, jlebar Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D19615 llvm-svn: 268272
* [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.Adrian Prantl2016-04-151-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446
* Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX ↵Justin Lebar2016-04-151-0/+24
| | | | | | target. llvm-svn: 266403
* [NVPTX] Handle ldg created from sign-/zero-extended loadJustin Holewinski2016-04-051-0/+57
| | | | | | | | | | Reviewers: jingyue Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D18053 llvm-svn: 265389
* [NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect.Justin Lebar2016-04-011-0/+13
| | | | | | | | | | | | | | | | | | | Summary: Previously the NVVMReflect pass would read its configuration from command-line flags or a static configuration given to the pass at instantiation time. This doesn't quite work for clang's use-case. It needs to pass a value for __CUDA_FTZ down on a per-module basis. We use a module flag for this, so the NVVMReflect pass needs to be updated to read said flag. Reviewers: tra, rnk Subscribers: cfe-commits, jholewinski Differential Revision: http://reviews.llvm.org/D18672 llvm-svn: 265090
* testcase gardening: update the emissionKind enum to the new syntax. (NFC)Adrian Prantl2016-04-011-1/+1
| | | | llvm-svn: 265081
* [NVPTX] Adds a new address space inference pass.Jingyue Wu2016-03-201-10/+69
| | | | | | | | | | | | | | | | | | | Summary: The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable to convert the address space of a pointer induction variable. This patch adds a new pass called NVPTXInferAddressSpaces that overcomes that limitation using a fixed-point data-flow analysis (see the file header comments for details). The new pass is experimental and not enabled by default. Users can turn it on by setting the -nvptx-use-infer-addrspace flag of llc. Reviewers: jholewinski, tra, jlebar Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D17965 llvm-svn: 263916
* [NVPTX] Use different, convergent MIs for convergent calls.Justin Lebar2016-03-011-0/+27
| | | | | | | | | | | | | | | | | | | | | | | Summary: Calls sometimes need to be convergent. This is already handled at the LLVM IR level, but it also needs to be handled at the MI level. Ideally we'd propagate convergence from instructions, down through the selection DAG, and into MIs. But this is Hard, and would affect optimizations in the SDNs -- right now only SDNs with two operands have any flags at all. Instead, here's a much simpler hack: Add new opcodes for NVPTX for convergent calls, and generate these when lowering convergent LLVM calls. Reviewers: jholewinski Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits Differential Revision: http://reviews.llvm.org/D17423 llvm-svn: 262373
* Don't tail-duplicate blocks that contain convergent instructions.Justin Lebar2016-02-221-0/+45
| | | | | | | | | | | | | | | Summary: Convergent instrs shouldn't be made control-dependent on other values, but this is basically the whole point of tail duplication. So just bail if we see a convergent instruction. Reviewers: iteratee Subscribers: jholewinski, jhen, hfinkel, tra, jingyue, llvm-commits Differential Revision: http://reviews.llvm.org/D17320 llvm-svn: 261540
* [NVPTX] Test that MachineSink won't sink across llvm.cuda.syncthreads.Justin Lebar2016-02-171-0/+23
| | | | | | | | | | | | | | | Summary: The syncthreads MI is modeled as mayread/maywrite -- convergence doesn't even come into play here. Nonetheless this property is highly implicit in the tablegen files, so a test seems appropriate. Reviewers: jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D17319 llvm-svn: 261114
* [NVPTX] Annotate call machine instructions as calls.Justin Lebar2016-02-171-0/+23
| | | | | | | | | | | | | | | | | Summary: Otherwise we'll try to do unsafe optimizations on these MIs, such as sinking loads below calls. (I suspect that this is not the only bug in the NVPTX instruction tablegen files; I need to comb through them.) Reviewers: jholewinski, tra Subscribers: jingyue, jhen, llvm-commits Differential Revision: http://reviews.llvm.org/D17315 llvm-svn: 261113
* llvm/test/CodeGen/NVPTX/debug-file-loc.ll: Tweak expressions for dos path.NAKAMURA Takumi2016-02-111-2/+2
| | | | llvm-svn: 260623
* [NVPTX] emit .file directives for files referenced by subprograms.Artem Belevich2016-02-111-0/+44
| | | | | | | | .. so .loc directives referring to those files work correctly. Differential Revision: http://reviews.llvm.org/D17086 llvm-svn: 260557
* [NVPTX] Disable performance optimizations when OptLevel==NoneJingyue Wu2016-02-041-0/+12
| | | | | | | | | | Reviewers: jholewinski, tra, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16874 llvm-svn: 259749
* [CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.Justin Lebar2016-01-303-0/+23
| | | | | | | | | | | | Summary: Previously we'd just silently skip these. Reviewers: tra, jholewinski Subscribers: llvm-commits, jhen, echristo, Differential Revision: http://reviews.llvm.org/D16739 llvm-svn: 259279
* [CUDA] Die gracefully when trying to output an LLVM alias.Justin Lebar2016-01-231-0/+7
| | | | | | | | | | | | | | Summary: Previously, we would just output "foo = bar" in the assembly, and then ptxas would choke. Now we die before emitting any invalid code. Reviewers: echristo Subscribers: jholewinski, llvm-commits, jhen, tra Differential Revision: http://reviews.llvm.org/D16490 llvm-svn: 258638
* [NVPTX] expand mul_lohi to mul_lo and mul_hiJingyue Wu2016-01-221-0/+24
| | | | | | | | | | | | Summary: Fixes PR26186. Reviewers: grosser, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16479 llvm-svn: 258536
* [NVPTX] Do not emit .hidden or .protected directives as they are not allowed ↵Artem Belevich2016-01-151-0/+16
| | | | | | by PTX. llvm-svn: 257961
* Have a single way for creating unique value names.Rafael Espindola2015-11-221-2/+2
| | | | | | | | | | | We had two code paths. One would create names like "foo.1" and the other names like "foo1". For globals it is important to use "foo.1" to help C++ name demangling. For locals there is no strong reason to go one way or the other so I kept the most common mangling (foo1). llvm-svn: 253804
* Revert "Change memcpy/memset/memmove to have dest and source alignments."Pete Cooper2015-11-191-8/+8
| | | | | | | | | | This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543
* Change memcpy/memset/memmove to have dest and source alignments.Pete Cooper2015-11-181-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511
* [NVPTX] Let NVPTX backend detect integer min and max patterns.Bjarke Hammersholt Roune2015-08-261-0/+307
| | | | | | | | | | | | | | Summary: Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support. Reviewers: jholewinski, meheff, jingyue Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D12377 llvm-svn: 246107
* [NVPTX] Allow undef value as global initializerJingyue Wu2015-08-221-0/+12
| | | | | | | | | | | | | | | | | | Summary: __shared__ variable may now emit undef value as initializer, do not throw error on that. Test Plan: test/CodeGen/NVPTX/global-addrspace.ll Patch by Xuetian Weng Reviewers: jholewinski, tra, jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D12242 llvm-svn: 245785
* Use 32-bit divides instead of 64-bit divides where possible.Mark Heffernan2015-08-111-0/+80
| | | | | | | | | For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason is that many index computations are carried out in 64-bits but never actually exceed the capacity of a 32-bit word. llvm-svn: 244684
* SelectionDAG: Prefer to combine multiplication with less uses for fmaJingyue Wu2015-08-111-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For example: s6 = s0*s5; s2 = s6*s6 + s6; ... s4 = s6*s3; We notice that it is possible for s2 is folded to fma (s0, s5, fmul (s6 s6)). This only happens when Aggressive is true, otherwise hasOneUse() check already prevents from folding the multiplication with more uses. Test Plan: test/CodeGen/NVPTX/fma-assoc.ll Patch by Xuetian Weng Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm Subscribers: arsenm, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D11855 llvm-svn: 244649
* Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCIJonathan Roelofs2015-08-101-1/+1
| | | | | | | I looked into adding a warning / error for this to FileCheck, but there doesn't seem to be a good way to avoid it triggering on the instances of it in RUN lines. llvm-svn: 244481
* [NVPTX] Use LDG for pointer induction variables.Bjarke Hammersholt Roune2015-08-051-1/+55
| | | | | | | | More specifically, make NVPTXISelDAGToDAG able to emit cached loads (LDG) for pointer induction variables. Also fix latent bug where LDG was not restricted to kernel functions. I believe that this could not be triggered so far since we do not currently infer that a pointer is global outside a kernel function, and only loads of global pointers are considered for cached loads. llvm-svn: 244166
* [NVPTX] allow register copy between float and intJingyue Wu2015-08-011-0/+224
| | | | | | | | | | | | | | | Summary: Fixes PR24303. With Bruno's WIP (D11197) on PeepholeOptimizer, across-class register copying (e.g. i32 to f32) becomes possible. Enhance NVPTXInstrInfo::copyPhysReg to handle these cases. Reviewers: jholewinski Subscribers: eliben, jholewinski, llvm-commits, bruno Differential Revision: http://reviews.llvm.org/D11622 llvm-svn: 243839
* [NVPTX] convert pointers in byval kernel arguments to globalJingyue Wu2015-07-311-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: For example, in struct S { int *x; int *y; }; __global__ void foo(S s) { int *b = s.y; // use b } "b" is guaranteed to point to global. NVPTX should emit ld.global/st.global for accessing "b". Reviewers: jholewinski Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11505 llvm-svn: 243790
* Roll forward r242871Jingyue Wu2015-07-291-0/+40
| | | | | | | r242871 missed one place that should be guarded with isPhysicalReg. This patch fixes that. llvm-svn: 243555
* Temporarily revert r242871Jingyue Wu2015-07-291-24/+0
| | | | | | PR24299 llvm-svn: 243522
* [BranchFolding] do not iterate the aliases of virtual registersJingyue Wu2015-07-221-0/+24
| | | | | | | | | | | | | | | | | Summary: MCRegAliasIterator only works for physical registers. So, do not run it on virtual registers. With this issue fixed, we can resurrect the BranchFolding pass in NVPTX backend. Reviewers: jholewinski, bkramer Subscribers: henryhu, meheff, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11174 llvm-svn: 242871
OpenPOWER on IntegriCloud