| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The BasicBlockManager is potentially broken and should not be used.
Replace all uses of the BasicBlockPass with a FunctionBlockPass+loop on
blocks.
Reviewers: chandlerc
Subscribers: jholewinski, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68234
llvm-svn: 373254
|
|
|
|
|
|
|
| |
Other files were not relying on these transitive includes, so I'm
submitting this change separately.
llvm-svn: 362403
|
|
|
|
|
|
|
| |
I also fixed all other files that were including NVPTX.h and were
relying on transitive includes.
llvm-svn: 362402
|
|
|
|
|
|
|
|
| |
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360729
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch adds a possibility to make library calls on NVPTX.
An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.
Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.
Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.
Patch by Denys Zariaiev!
Differential Revision: https://reviews.llvm.org/D34708
llvm-svn: 350069
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
libdevice in recent CUDA versions relies on __nvvm_reflect() to select
GPU-specific bitcode. This patch addresses the requirement.
Reviewers: jlebar
Subscribers: jholewinski, sanjoy, hiraditya, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D50207
llvm-svn: 338908
|
|
|
|
| |
llvm-svn: 293579
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously there were three ways to inform the NVVMReflect pass whether
you wanted to flush denormals to zero:
* An LLVM command-line option
* Parameters to the NVVMReflect constructor
* Metadata on the module itself.
This change removes the first two, leaving only the third.
The motivation for this change, aside from simplifying things, is that
we want LLVM to be aware of whether it's operating in FTZ mode, so other
passes can use this information. Ideally we'd have a target-generic
piece of metadata on the module. This change moves us in that
direction.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28700
llvm-svn: 292068
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Only scalar half-precision operations are supported at the moment.
- Adds general support for 'half' type in NVPTX.
- fp16 math operations are supported on sm_53+ GPUs only
(can be disabled with --nvptx-no-f16-math).
- Type conversions to/from fp16 are supported on all GPU variants.
- On GPU variants that do not have full fp16 support (or if it's disabled),
fp16 operations are promoted to fp32 and results are converted back
to fp16 for storage.
Differential Revision: https://reviews.llvm.org/D28540
llvm-svn: 291956
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass. We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.
Reviewers: tra
Subscribers: llvm-commits, jholewinski, jingyue, mgorny
Differential Revision: https://reviews.llvm.org/D26165
llvm-svn: 285642
|
|
|
|
|
|
|
|
| |
This avoids "static initialization order fiasco"
Differential Revision: https://reviews.llvm.org/D25412
llvm-svn: 283702
|
|
|
|
|
|
|
|
| |
After r276153 the pass applies to both kernels and regular functions.
Differential Revision: https://reviews.llvm.org/D22583
llvm-svn: 276189
|
|
|
|
|
|
|
|
|
|
|
|
| |
NVVMIntrRange adds !range metadata to calls of NVVM intrinsics
that return values within known limited range.
This allows LLVM to generate optimal code for indexing arrays
based on tid/ctaid which is a frequently used pattern in CUDA code.
Differential Revision: http://reviews.llvm.org/D20644
llvm-svn: 270872
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently it's a module pass. Make it a function pass so that we can
move it to PassManagerBuilder's EP_EarlyAsPossible extension point,
which only accepts function passes.
Reviewers: rnk
Subscribers: tra, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D18615
llvm-svn: 264919
|
|
|
|
|
|
|
|
| |
the cpp file.
No functionality change intended.
llvm-svn: 264861
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable
to convert the address space of a pointer induction variable. This patch adds a
new pass called NVPTXInferAddressSpaces that overcomes that limitation using a
fixed-point data-flow analysis (see the file header comments for details).
The new pass is experimental and not enabled by default. Users can turn
it on by setting the -nvptx-use-infer-addrspace flag of llc.
Reviewers: jholewinski, tra, jlebar
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D17965
llvm-svn: 263916
|
|
|
|
|
|
| |
I left helpers that look useful for debugging alone. NFC.
llvm-svn: 250410
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch first change the register that holds local address for stack
frame to %SPL. Then the new NVPTXPeephole pass will try to scan the
following pattern
%vreg0<def> = LEA_ADDRi64 <fi#0>, 4
%vreg1<def> = cvta_to_local %vreg0
and transform it into
%vreg1<def> = LEA_ADDRi64 %VRFrameLocal, 4
Patched by Xuetian Weng
Test Plan: test/CodeGen/NVPTX/local-stack-frame.ll
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: eliben, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10549
llvm-svn: 240587
|
|
|
|
|
|
| |
Apparently, the style needs to be agreed upon first.
llvm-svn: 240390
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is done by first adding two additional instructions to convert the
alloca returned address to local and convert it back to generic. Then
replace all uses of alloca instruction with the converted generic
address. Then we can rely NVPTXFavorNonGenericAddrSpace pass to combine
the generic addresscast and the corresponding Load, Store, Bitcast, GEP
Instruction together.
Patched by Xuetian Weng (xweng@google.com).
Test Plan: test/CodeGen/NVPTX/lower-alloca.ll
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: meheff, broune, eliben, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10483
llvm-svn: 239964
|
|
|
|
|
|
|
|
|
| |
NVPTXISelDAGToDAG translates "addrspacecast to param" to
NVPTX::nvvm_ptr_gen_to_param
Added an llc test in bug21465.
llvm-svn: 239100
|
|
|
|
|
|
| |
llc crashed for NVPTX backend
llvm-svn: 239094
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a
pointer in the global address space. This change, along with
NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.*
and st.global.* for accessing kernel pointer arguments.
Minor changes:
1. refactor: extract function convertToPointerInAddrSpace
2. fix a bug in the test case in bug21465.ll
Test Plan: lower-kernel-ptr-arg.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: jholewinski
Subscribers: wengxt, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10154
llvm-svn: 239082
|
|
|
|
|
|
|
| |
their definitions, but forgot to clean up all the declarations which are
in different files.
llvm-svn: 227698
|
|
|
|
| |
llvm-svn: 223310
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It currently only implements hasBranchDivergence, and will be extended
in later diffs.
Split from D6188.
Test Plan: make check-all
Reviewers: jholewinski
Reviewed By: jholewinski
Subscribers: llvm-commits, meheff, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D6195
llvm-svn: 221619
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This works around the limitation that PTX does not allow .param space
loads/stores with arbitrary pointers.
If a function has a by-val struct ptr arg, say foo(%struct.x *byval %d), then
add the following instructions to the first basic block :
%temp = alloca %struct.x, align 8
%tt1 = bitcast %struct.x * %d to i8 *
%tt2 = llvm.nvvm.cvt.gen.to.param %tt2
%tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) *
%tv = load %struct.x addrspace(101) * %tempd
store %struct.x %tv, %struct.x * %temp, align 8
The above code allocates some space in the stack and copies the incoming
struct from param space to local space. Then replace all occurences of %d
by %temp.
Fixes PR21465.
llvm-svn: 221377
|
|
|
|
|
|
|
|
|
|
| |
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
llvm-svn: 215558
|
|
|
|
|
|
| |
This commit adds intrinsics and codegen support for the surface read/write and texture read instructions that take an explicit sampler parameter. Codegen operates on image handles at the PTX level, but falls back to direct replacement of handles with kernel arguments if image handles are not enabled. Note that image handles are explicitly disabled for all target architectures in this change (to be enabled later).
llvm-svn: 205907
|
|
|
|
|
|
|
|
|
| |
Removes unnecessary casts from non-generic address spaces to the generic address
space for certain code patterns.
Patch by Jingyue Wu.
llvm-svn: 205571
|
|
|
|
|
|
|
|
| |
This is a more thorough fix for the issue than r203483. An IR pass will run
before NVPTX codegen to make sure there are no invalid symbol names that can't
be consumed by the ptxas assembler.
llvm-svn: 205212
|
|
|
|
| |
llvm-svn: 186844
|
|
|
|
|
|
|
|
| |
instructions from their patterns
Test case is no breakage
llvm-svn: 185175
|
|
|
|
|
|
| |
IR for CUDA should use "nvptx[64]-nvidia-cuda", and IR for NV OpenCL should use "nvptx[64]-nvidia-nvcl"
llvm-svn: 184579
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that 3.3 is branched, we are re-enabling virtual registers to help
iron out bugs before the next release. Some of the post-RA passes do
not play well with virtual registers, so we disable them for now. The
needed functionality of the PrologEpilogInserter pass is copied to a
new backend-specific NVPTXPrologEpilog pass.
The test for this commit is not breaking the existing tests.
llvm-svn: 182998
|
|
|
|
| |
llvm-svn: 182297
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This converter currently only handles global variables in address space 0. For
these variables, they are promoted to address space 1 (global memory), and all
uses are updated to point to the result of a cvta.global instruction on the new
variable.
The motivation for this is address space 0 global variables are illegal since we
cannot declare variables in the generic address space. Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.
llvm-svn: 182254
|
|
|
|
|
|
|
| |
Hopefully this resolves any outstanding style issues and gives us
an automated way of ensuring we conform to the style guidelines.
llvm-svn: 178415
|
|
|
|
|
|
|
|
|
|
|
| |
Vectors were being manually scalarized by the backend. Instead,
let the target-independent code do all of the work. The manual
scalarization was from a time before good target-independent support
for scalarization in LLVM. However, this forces us to specially-handle
vector loads and stores, which we can turn into PTX instructions that
produce/consume multiple operands.
llvm-svn: 174968
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
|
|
|
|
|
|
|
|
|
|
| |
missed in the first pass because the script didn't yet handle include
guards.
Note that the script is now able to handle all of these headers without
manual edits. =]
llvm-svn: 169224
|
|
|
|
| |
llvm-svn: 158013
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In file included from ../lib/Target/NVPTX/VectorElementize.cpp:53:
../lib/Target/NVPTX/NVPTX.h:44:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default]
default: assert(0 && "Unknown condition code");
^
1 warning generated.
The prevailing pattern in LLVM is to not use a default label, and instead to
use llvm_unreachable to denote that the switch in fact covers all return paths
from the function.
llvm-svn: 156209
|
|
for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX
The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.
NV_CONTRIB
llvm-svn: 156196
|