path: root/llvm/test/CodeGen/NVPTX
Commit message | Author | Age | Files | Lines
...
* [NVPTX] Remove i8 register class. PTX support for i8 (.b8, .u8, .s8) is rather poor and we're better off just ignoring it and letting LLVM expand all i8 ops out to i16.
  Author: Justin Holewinski | Age: 2013-06-28 | Files: 6 | Lines: -44/+44
  llvm-svn: 185174
* [NVPTX] Add support for vectorized function return values
  Author: Justin Holewinski | Age: 2013-06-28 | Files: 1 | Lines: -0/+10
  llvm-svn: 185173
* [NVPTX] Clean up handling of formal arguments and enable generation of vector parameter loads
  Author: Justin Holewinski | Age: 2013-06-28 | Files: 1 | Lines: -4/+2
  llvm-svn: 185172
* [NVPTX] Add support for selecting CUDA vs OCL mode based on triple
  Author: Justin Holewinski | Age: 2013-06-21 | Files: 5 | Lines: -7/+11
  IR for CUDA should use "nvptx[64]-nvidia-cuda", and IR for NV OpenCL should use "nvptx[64]-nvidia-nvcl"
  llvm-svn: 184579
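The triple convention described in this commit can be sketched as a small classifier. This is only an illustration under stated assumptions: `DriverMode` and `modeFromTriple` are hypothetical names, and the sketch inspects only the last component of the triple string; it is not the backend's actual selection code.

```cpp
#include <string>

// Hypothetical illustration only; not the NVPTX backend's real code.
enum class DriverMode { CUDA, OpenCL, Unknown };

// Classify a target triple of the form "<arch>-<vendor>-<os>" by its OS field.
DriverMode modeFromTriple(const std::string &triple) {
  // The OS component is whatever follows the last '-'.
  const std::string::size_type dash = triple.rfind('-');
  if (dash == std::string::npos)
    return DriverMode::Unknown;
  const std::string os = triple.substr(dash + 1);
  if (os == "cuda")
    return DriverMode::CUDA;   // e.g. "nvptx64-nvidia-cuda"
  if (os == "nvcl")
    return DriverMode::OpenCL; // e.g. "nvptx-nvidia-nvcl"
  return DriverMode::Unknown;
}
```

With this sketch, both "nvptx-nvidia-cuda" and "nvptx64-nvidia-cuda" classify as CUDA, matching the "nvptx[64]" notation in the commit message.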
* [NVPTX] Remove old CONST_NOT_GEN address space that is not being used anymore and causes constants to be emitted in the global address space
  Author: Justin Holewinski | Age: 2013-06-10 | Files: 1 | Lines: -0/+10
  llvm-svn: 183652
* [NVPTX] Re-enable support for virtual registers in the final output
  Author: Justin Holewinski | Age: 2013-05-31 | Files: 2 | Lines: -35/+35
  Now that 3.3 is branched, we are re-enabling virtual registers to help iron out bugs before the next release.
  Some of the post-RA passes do not play well with virtual registers, so we disable them for now. The needed functionality of the PrologEpilogInserter pass is copied to a new backend-specific NVPTXPrologEpilog pass.
  The test for this commit is not breaking the existing tests.
  llvm-svn: 182998
* [NVPTX] Fix case where a sext load of an i1 type may produce an ld.u1 instead of an ld.u8.
  Author: Justin Holewinski | Age: 2013-05-30 | Files: 1 | Lines: -0/+14
  llvm-svn: 182924
* [NVPTX] Add @llvm.nvvm.sqrt.f() intrinsic
  Author: Justin Holewinski | Age: 2013-05-21 | Files: 1 | Lines: -0/+7
  llvm-svn: 182394
* [NVPTX] Fix mis-use of CurrentFnSym in NVPTXAsmPrinter. This was causing a symbol name error in the output PTX.
  Author: Justin Holewinski | Age: 2013-05-20 | Files: 1 | Lines: -0/+37
  llvm-svn: 182298
* [NVPTX] Add GenericToNVVM IR converter to better handle idiomatic LLVM IR inputs
  Author: Justin Holewinski | Age: 2013-05-20 | Files: 1 | Lines: -0/+25
  This converter currently only handles global variables in address space 0. For these variables, they are promoted to address space 1 (global memory), and all uses are updated to point to the result of a cvta.global instruction on the new variable.
  The motivation for this is address space 0 global variables are illegal since we cannot declare variables in the generic address space. Instead, we place the variables in address space 1 and explicitly convert the pointer to address space 0. This is primarily intended to help new users who expect to be able to place global variables in the default address space.
  llvm-svn: 182254
* [NVPTX] Fix i1 kernel parameters and global variables. ABI rules say we need to use .u8 for i1 parameters for kernels.
  Author: Justin Holewinski | Age: 2013-05-20 | Files: 2 | Lines: -0/+37
  llvm-svn: 182253
* [NVPTX] Remove support for SM < 2.0. This was never fully supported anyway.
  Author: Justin Holewinski | Age: 2013-03-30 | Files: 16 | Lines: -170/+1
  llvm-svn: 178417
* [NVPTX] Add NVVMReflect pass to allow compile-time selection of specific code paths.
  Author: Justin Holewinski | Age: 2013-03-30 | Files: 1 | Lines: -0/+34
  This allows us to write code like:

    if (__nvvm_reflect("FOO"))
      // Do something
    else
      // Do something else

  and compile into a library, then give "FOO" a value at kernel compile-time so the check becomes a no-op.
  llvm-svn: 178416
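The idea behind the reflect check can be modeled in ordinary C++. This is a rough host-side sketch, not the pass itself: `ReflectValues`, `resolveReflect`, and `selectPath` are hypothetical names, the map stands in for the values supplied at kernel compile time, and treating an unset name as 0 is an assumption of this sketch.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the name/value pairs given at kernel compile time.
using ReflectValues = std::map<std::string, int>;

// Model of resolving __nvvm_reflect("NAME") to a constant.
// Assumption of this sketch: names with no supplied value resolve to 0.
int resolveReflect(const ReflectValues &values, const std::string &name) {
  const auto it = values.find(name);
  return it == values.end() ? 0 : it->second;
}

// "Library" code written once. After the reflect value is fixed, the branch
// condition is a constant, so only one of the two paths is ever taken.
const char *selectPath(const ReflectValues &values) {
  if (resolveReflect(values, "FOO"))
    return "foo path";
  return "default path";
}
```

In the real pass the substitution happens on the IR, so later optimizations can delete the dead branch; the sketch only shows the selection logic.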
* [NVPTX] Fix handling of vector arguments
  Author: Justin Holewinski | Age: 2013-03-24 | Files: 1 | Lines: -0/+27
  llvm-svn: 177847
* Propagate DAG node ordering during type legalization and instruction selection
  Author: Justin Holewinski | Age: 2013-03-20 | Files: 3 | Lines: -8/+71
  A node's ordering is only propagated during legalization if (a) the new node does not have an ordering (is not a CSE'd node), or (b) the new node has an ordering that is higher than the node being legalized.
  llvm-svn: 177465
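The two-part propagation rule can be written as a single predicate. The following is a minimal model, not the SelectionDAG code: `propagateOrdering` and `kNoOrdering` are hypothetical names, and representing "no ordering" as 0 is an assumption of this sketch.

```cpp
// Assumption of this sketch: 0 means "no ordering has been assigned".
constexpr unsigned kNoOrdering = 0;

// Returns the ordering the new node should carry after legalization.
// The legalized node's ordering is propagated only when
//   (a) the new node has no ordering yet, or
//   (b) the new node's ordering is higher than the legalized node's.
unsigned propagateOrdering(unsigned newNodeOrder, unsigned legalizedOrder) {
  if (newNodeOrder == kNoOrdering || newNodeOrder > legalizedOrder)
    return legalizedOrder;
  return newNodeOrder; // an existing, lower ordering (a CSE'd node) is kept
}
```

Under this model a CSE'd node that already has a lower ordering keeps it, so a node never moves later than its earliest user requires.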
* [NVPTX] Disable vector registers
  Author: Justin Holewinski | Age: 2013-02-12 | Files: 1 | Lines: -0/+66
  Vectors were being manually scalarized by the backend. Instead, let the target-independent code do all of the work. The manual scalarization was from a time before good target-independent support for scalarization in LLVM. However, this forces us to specially-handle vector loads and stores, which we can turn into PTX instructions that produce/consume multiple operands.
  llvm-svn: 174968
* [NVPTX] Remove NoCapture from address space conversion intrinsics. NoCapture is not valid in this case, and was causing incorrect optimizations.
  Author: Justin Holewinski | Age: 2013-02-11 | Files: 1 | Lines: -0/+21
  llvm-svn: 174896
* [NVPTX] Fix crash with unnamed struct arguments
  Author: Justin Holewinski | Age: 2012-12-05 | Files: 1 | Lines: -0/+5
  Patch by Eric Holk
  llvm-svn: 169418
* Teach the legalizer how to handle operands for VSELECT nodes
  Author: Justin Holewinski | Age: 2012-11-29 | Files: 1 | Lines: -0/+16
  If we need to split the operand of a VSELECT, it must be the mask operand. We split the entire VSELECT operand with EXTRACT_SUBVECTOR.
  llvm-svn: 168883
* Allow targets to prefer TypeSplitVector over TypePromoteInteger when computing the legalization method for vectors
  Author: Justin Holewinski | Age: 2012-11-29 | Files: 1 | Lines: -0/+19
  For some targets, it is desirable to prefer scalarizing <N x i1> instead of promoting to a larger legal type, such as <N x i32>.
  llvm-svn: 168882
* [NVPTX] Order global variables in def-use order before emitting them in the final assembly
  Author: Justin Holewinski | Age: 2012-11-16 | Files: 1 | Lines: -0/+20
  llvm-svn: 168198
* [NVPTX] Implement custom lowering of loads/stores for i1
  Author: Justin Holewinski | Age: 2012-11-14 | Files: 1 | Lines: -0/+26
  Loads from i1 become loads from i8 followed by trunc
  Stores to i1 become zext to i8 followed by store to i8
  Fixes PR13291
  llvm-svn: 167948
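The effect of this lowering can be mimicked with plain byte accesses. This is a sketch of the semantics only: `loadI1` and `storeI1` are hypothetical helper names, `bool` stands in for i1, and `std::uint8_t` stands in for i8; the actual lowering operates on SelectionDAG nodes.

```cpp
#include <cstdint>

// Load of i1: load a full i8, then truncate to i1.
// Truncation to one bit keeps only the least significant bit.
bool loadI1(const std::uint8_t *addr) {
  const std::uint8_t byte = *addr; // i8 load
  return (byte & 1u) != 0;         // trunc i8 -> i1
}

// Store of i1: zero-extend to i8, then store the full byte.
void storeI1(std::uint8_t *addr, bool value) {
  const std::uint8_t byte = static_cast<std::uint8_t>(value ? 1 : 0); // zext i1 -> i8
  *addr = byte;                                                       // i8 store
}
```

Because the store zero-extends, the upper seven bits of the byte are always written as 0, and a later truncating load reads back the same one-bit value.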
* [NVPTX] Add more precise PTX/SM target attributes
  Author: Justin Holewinski | Age: 2012-11-12 | Files: 10 | Lines: -0/+60
  Each SM and PTX version is modeled as a subtarget feature/CPU. Additionally, PTX 3.1 is added as the default PTX version to be out-of-the-box compatible with CUDA 5.0.

  Available CPUs for this target:
    sm_10 - Select the sm_10 processor.
    sm_11 - Select the sm_11 processor.
    sm_12 - Select the sm_12 processor.
    sm_13 - Select the sm_13 processor.
    sm_20 - Select the sm_20 processor.
    sm_21 - Select the sm_21 processor.
    sm_30 - Select the sm_30 processor.
    sm_35 - Select the sm_35 processor.

  Available features for this target:
    ptx30 - Use PTX version 3.0.
    ptx31 - Use PTX version 3.1.
    sm_10 - Target SM 1.0.
    sm_11 - Target SM 1.1.
    sm_12 - Target SM 1.2.
    sm_13 - Target SM 1.3.
    sm_20 - Target SM 2.0.
    sm_21 - Target SM 2.1.
    sm_30 - Target SM 3.0.
    sm_35 - Target SM 3.5.

  llvm-svn: 167699
* [NVPTX] Use ABI alignment for parameters when alignment is not specified.
  Author: Justin Holewinski | Age: 2012-11-09 | Files: 1 | Lines: -0/+25
  Affects SM 2.0+. Fixes bug 13324.
  llvm-svn: 167646
* Add llvm.fabs intrinsic.
  Author: Peter Collingbourne | Age: 2012-05-28 | Files: 1 | Lines: -0/+21
  llvm-svn: 157594
* [NVPTX] Add a new test case for the newly-enabled call handling
  Author: Justin Holewinski | Age: 2012-05-25 | Files: 1 | Lines: -0/+26
  NV_CONTRIB
  llvm-svn: 157485
* This patch adds a new NVPTX back-end to LLVM which supports code generation for NVIDIA PTX 3.0.
  Author: Justin Holewinski | Age: 2012-05-04 | Files: 17 | Lines: -0/+1994
  This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
  The new target machines are:
    nvptx (old ptx32) => 32-bit PTX
    nvptx64 (old ptx64) => 64-bit PTX
  The sources are based on the internal NVIDIA NVPTX back-end, and contain more functionality than the current PTX back-end currently provides.
  NV_CONTRIB
  llvm-svn: 156196