rather poor and we're better off just ignoring it and letting LLVM expand all i8 ops out to i16.
llvm-svn: 185174
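If "expand all i8 ops out to i16" is read as integer type promotion, the sketch below models an 8-bit add carried out in a 16-bit register and then truncated back to 8 bits. The function name `add_i8_via_i16` is hypothetical, and the model ignores how LLVM actually rewrites the selection DAG. It shows that the low 8 bits of the wide result match ordinary 8-bit wraparound arithmetic, so promotion loses nothing.

```python
def add_i8_via_i16(a: int, b: int) -> int:
    """Hypothetical model: add two i8 values using a 16-bit intermediate."""
    # Promote: zero-extend each 8-bit operand into a 16-bit value.
    wide_a = a & 0xFF
    wide_b = b & 0xFF
    # Perform the add at 16-bit width (the mask models a 16-bit register).
    wide_sum = (wide_a + wide_b) & 0xFFFF
    # Truncate back to i8: only the low 8 bits are meaningful.
    return wide_sum & 0xFF


print(add_i8_via_i16(200, 100))  # 300 wraps to 44 in 8 bits
```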
llvm-svn: 185173
vector parameter loads
llvm-svn: 185172
IR for CUDA should use "nvptx[64]-nvidia-cuda", and IR for NV OpenCL should use "nvptx[64]-nvidia-nvcl"
llvm-svn: 184579
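The bracket notation "nvptx[64]" means the 64-bit suffix is optional. A minimal sketch that spells out the four resulting triples; `nvptx_triple` is a hypothetical helper for illustration, not an LLVM API.

```python
def nvptx_triple(is_64bit: bool, runtime: str) -> str:
    """Build an NVPTX target triple (hypothetical helper).

    runtime: "cuda" for CUDA IR, "nvcl" for NVIDIA OpenCL IR.
    """
    if runtime not in ("cuda", "nvcl"):
        raise ValueError(f"unknown runtime: {runtime!r}")
    arch = "nvptx64" if is_64bit else "nvptx"
    return f"{arch}-nvidia-{runtime}"


for bits64 in (False, True):
    for rt in ("cuda", "nvcl"):
        print(nvptx_triple(bits64, rt))
```

This prints nvptx-nvidia-cuda, nvptx-nvidia-nvcl, nvptx64-nvidia-cuda and nvptx64-nvidia-nvcl.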
anymore and causes constants to be emitted in the global address space
llvm-svn: 183652
Now that 3.3 is branched, we are re-enabling virtual registers to help
iron out bugs before the next release. Some of the post-RA passes do
not play well with virtual registers, so we disable them for now. The
needed functionality of the PrologEpilogInserter pass is copied to a
new backend-specific NVPTXPrologEpilog pass.
The test for this commit is that the existing tests do not break.
llvm-svn: 182998
ld.u1 instead of an ld.u8.
llvm-svn: 182924
llvm-svn: 182394
symbol name error in the output PTX.
llvm-svn: 182298
This converter currently only handles global variables in address space 0. These
variables are promoted to address space 1 (global memory), and all uses are
updated to point to the result of a cvta.global instruction on the new variable.
The motivation is that global variables in address space 0 are illegal, since we
cannot declare variables in the generic address space. Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.
llvm-svn: 182254
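The conversion described above can be modeled abstractly. The sketch below is not the LLVM pass: it represents globals and their uses as plain records, and the names `GlobalVar`, `Use`, `promote_generic_globals` and the `.generic` suffix are all hypothetical. It shows the two steps: move address-space-0 globals to address space 1, then redirect every use to a generic pointer produced by cvta.global.

```python
from dataclasses import dataclass

GENERIC = 0  # address space 0: generic
GLOBAL = 1   # address space 1: global memory


@dataclass
class GlobalVar:
    name: str
    addrspace: int


@dataclass
class Use:
    user: str     # instruction that uses the variable
    operand: str  # name of the value it refers to


def promote_generic_globals(global_vars, uses):
    """Move address-space-0 globals to address space 1 and redirect every use
    to a generic pointer produced by a cvta.global on the promoted variable.
    Returns the (textual) cvta instructions that were created."""
    new_insts = []
    redirected = {}
    for g in global_vars:
        if g.addrspace == GENERIC:
            g.addrspace = GLOBAL
            generic_ptr = f"{g.name}.generic"
            new_insts.append(f"{generic_ptr} = cvta.global {g.name}")
            redirected[g.name] = generic_ptr
    for u in uses:
        if u.operand in redirected:
            u.operand = redirected[u.operand]
    return new_insts
```

Globals already in address space 1 are left alone, and so are their uses; only variables that started in the generic space gain a cvta.global.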
need to use .u8 for i1 parameters for kernels.
llvm-svn: 182253
llvm-svn: 178417
specific code paths.
This allows us to write code like:
    if (__nvvm_reflect("FOO"))
      // Do something
    else
      // Do something else
and compile into a library, then give "FOO" a value at kernel
compile-time so the check becomes a no-op.
llvm-svn: 178416
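The real reflect pass works on LLVM IR. As a rough textual model only, the sketch below replaces each `__nvvm_reflect("NAME")` call with an integer constant chosen at kernel compile time. The helper `apply_reflect` is hypothetical, and folding unlisted names to 0 is an assumption of this sketch. Once the condition is a constant, ordinary dead-code elimination can drop the untaken branch, which is why the check becomes a no-op.

```python
import re

# Matches a call such as __nvvm_reflect("FOO") and captures the name.
REFLECT_CALL = re.compile(r'__nvvm_reflect\("([^"]*)"\)')


def apply_reflect(source: str, values: dict) -> str:
    """Replace each __nvvm_reflect("NAME") call with a constant.

    Names present in `values` get their assigned integer; all other
    names fold to 0 (a simplifying assumption of this sketch).
    """
    return REFLECT_CALL.sub(lambda m: str(values.get(m.group(1), 0)), source)


library_code = 'if (__nvvm_reflect("FOO")) fast(); else slow();'
print(apply_reflect(library_code, {"FOO": 1}))  # if (1) fast(); else slow();
print(apply_reflect(library_code, {}))          # if (0) fast(); else slow();
```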
llvm-svn: 177847
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.
llvm-svn: 177465
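The propagation rule above reduces to a small predicate. In this sketch an ordering is an `int`, and `None` stands for "no ordering" (a node that was not CSE'd). The function name `propagate_order` is hypothetical, not the SelectionDAG API. When the rule fires, the new node takes the legalized node's ordering. Otherwise it keeps its own, lower ordering.

```python
from typing import Optional


def propagate_order(new_order: Optional[int], legalized_order: int) -> Optional[int]:
    """Return the ordering the new node ends up with after legalization.

    The legalized node's ordering is propagated only if (a) the new node has
    no ordering yet, or (b) the new node's ordering is higher.
    """
    if new_order is None or new_order > legalized_order:
        return legalized_order
    return new_order
```

In effect a CSE'd node that already appeared earlier in the program keeps its earlier position instead of being pushed later.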
Vectors were being manually scalarized by the backend. Instead,
let the target-independent code do all of the work. The manual
scalarization was from a time before good target-independent support
for scalarization in LLVM. However, this forces us to specially handle
vector loads and stores, which we can turn into PTX instructions that
produce/consume multiple operands.
llvm-svn: 174968
is not valid in this case, and was causing incorrect optimizations.
llvm-svn: 174896
Patch by Eric Holk
llvm-svn: 169418
If we need to split the operand of a VSELECT, it must be the mask operand. We
split the entire VSELECT operand with EXTRACT_SUBVECTOR.
llvm-svn: 168883
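Why splitting works can be checked with a list-based model of vectors. In this sketch `vselect`, `extract_subvector` and `split_vselect` are hypothetical stand-ins for the DAG nodes, and the lane count is assumed even. Splitting the mask with EXTRACT_SUBVECTOR, alongside the data operands, and concatenating the two half-width selects gives the same lanes as the original VSELECT.

```python
def vselect(mask, a, b):
    """Element-wise select: a[i] where mask[i] is true, otherwise b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def extract_subvector(vec, start, length):
    """Model of EXTRACT_SUBVECTOR: take `length` lanes starting at `start`."""
    return vec[start:start + length]


def split_vselect(mask, a, b):
    """Split one VSELECT into two half-width VSELECTs and concatenate.

    The mask is split with extract_subvector exactly like the data
    operands. Assumes an even number of lanes.
    """
    half = len(mask) // 2
    lo = vselect(extract_subvector(mask, 0, half),
                 extract_subvector(a, 0, half),
                 extract_subvector(b, 0, half))
    hi = vselect(extract_subvector(mask, half, half),
                 extract_subvector(a, half, half),
                 extract_subvector(b, half, half))
    return lo + hi
```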
computing the legalization method for vectors
For some targets, it is desirable to prefer scalarizing <N x i1> instead of promoting to a larger legal type, such as <N x i32>.
llvm-svn: 168882
final assembly
llvm-svn: 168198
Loads from i1 become loads from i8 followed by trunc
Stores to i1 become zext to i8 followed by store to i8
Fixes PR13291
llvm-svn: 167948
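The i1 memory lowering above can be modeled with a byte array standing in for memory. The helper names `store_i1` and `load_i1` are hypothetical. A store zero-extends the boolean to a full byte. A load reads the byte and truncates to its low bit.

```python
def store_i1(memory: bytearray, addr: int, value: bool) -> None:
    """Store to i1: zext i1 -> i8, then store the byte."""
    memory[addr] = 1 if value else 0


def load_i1(memory: bytearray, addr: int) -> bool:
    """Load from i1: load i8, then trunc to i1 (keep only the low bit)."""
    return bool(memory[addr] & 1)


mem = bytearray(4)
store_i1(mem, 2, True)
print(mem[2], load_i1(mem, 2))  # 1 True
```

Because the load truncates, a byte such as 0xFE reads back as false: only bit 0 carries the i1 value.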
Each SM and PTX version is modeled as a subtarget feature/CPU. Additionally,
PTX 3.1 is added as the default PTX version to be out-of-the-box compatible
with CUDA 5.0.
Available CPUs for this target:
  sm_10 - Select the sm_10 processor.
  sm_11 - Select the sm_11 processor.
  sm_12 - Select the sm_12 processor.
  sm_13 - Select the sm_13 processor.
  sm_20 - Select the sm_20 processor.
  sm_21 - Select the sm_21 processor.
  sm_30 - Select the sm_30 processor.
  sm_35 - Select the sm_35 processor.
Available features for this target:
  ptx30 - Use PTX version 3.0.
  ptx31 - Use PTX version 3.1.
  sm_10 - Target SM 1.0.
  sm_11 - Target SM 1.1.
  sm_12 - Target SM 1.2.
  sm_13 - Target SM 1.3.
  sm_20 - Target SM 2.0.
  sm_21 - Target SM 2.1.
  sm_30 - Target SM 3.0.
  sm_35 - Target SM 3.5.
llvm-svn: 167699
Affects SM 2.0+. Fixes bug 13324.
llvm-svn: 167646
llvm-svn: 157594
NV_CONTRIB
llvm-svn: 157485
for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
The new target machines are:
  nvptx (old ptx32) => 32-bit PTX
  nvptx64 (old ptx64) => 64-bit PTX
The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end
provides.
NV_CONTRIB
llvm-svn: 156196