| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 281493
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
With this change (plus some changes to prevent !invariant from being
clobbered within llvm), clang will be able to model the __ldg CUDA
builtin as an invariant load, rather than as a target-specific llvm
intrinsic. This will let the optimizer play with these loads --
specifically, we should be able to vectorize them in the load-store
vectorizer.
Reviewers: tra
Subscribers: jholewinski, hfinkel, llvm-commits, chandlerc
Differential Revision: https://reviews.llvm.org/D23477
llvm-svn: 281152
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously these only worked via NVPTX-specific intrinsics.
This change will allow us to convert these target-specific intrinsics
into the general LLVM versions, allowing existing LLVM passes to reason
about their behavior.
It also gets us some minor codegen improvements as-is, from situations
where we canonicalize code into one of these llvm intrinsics.
Reviewers: majnemer
Subscribers: llvm-commits, jholewinski, tra
Differential Revision: https://reviews.llvm.org/D24300
llvm-svn: 281092
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.
This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.
Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.
Differential Revision: http://reviews.llvm.org/D23736
llvm-svn: 279602
|
|
|
|
|
|
|
| |
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.
llvm-svn: 279580
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.
This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.
Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.
Differential Revision: http://reviews.llvm.org/D23736
llvm-svn: 279564
|
|
|
|
|
|
|
|
|
|
| |
MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.
This reverts commit r279502.
llvm-svn: 279503
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.
This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.
Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.
Differential Revision: http://reviews.llvm.org/D23736
llvm-svn: 279502
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This switches us to use a different, more powerful algorithm for address
space inference. I've tested this locally and it seems to work great.
Once we're more confident in it, we can remove the old pass altogether.
Reviewers: jingyue
Subscribers: llvm-commits, tra, jholewinski
Differential Revision: https://reviews.llvm.org/D23694
llvm-svn: 279317
|
|
|
|
|
|
|
|
|
|
| |
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.
Differential Revision: https://reviews.llvm.org/D23597
llvm-svn: 279129
|
|
|
|
|
|
| |
Follow up to r278902. I had missed "fall through", with a space.
llvm-svn: 278970
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bring LLVM-generated PTX closer to what nvcc generates and avoids
triggering issues in ptxas.
For instance, ptxas does not accept .s16 (or .u16) registers as operands
for .fp16 instructions.
Differential Revision: https://reviews.llvm.org/D23460
llvm-svn: 278568
|
|
|
|
|
|
|
|
|
| |
If the result of the find is only used to compare against end(), just
use is_contained instead.
No functionality change is intended.
llvm-svn: 278433
|
|
|
|
|
|
|
|
|
|
| |
info.
Also added test case to verify IR changes done by NVPTXGenericToNVVM pass.
Differential Revision: https://reviews.llvm.org/D22837
llvm-svn: 277520
|
|
|
|
|
|
|
|
|
|
|
| |
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
|
|
|
|
|
|
|
| |
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.
llvm-svn: 277017
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: tra
Subscribers: jholewinski, arsenm, asbirlea
Differential Revision: https://reviews.llvm.org/D22592
llvm-svn: 276196
|
|
|
|
|
|
|
|
| |
After r276153 the pass applies to both kernels and regular functions.
Differential Revision: https://reviews.llvm.org/D22583
llvm-svn: 276189
|
|
|
|
|
|
|
|
| |
Fixes a crash in llvm_unreachable when a function has array return type.
Differential Revision: https://reviews.llvm.org/D22524
llvm-svn: 276154
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid unnecessary spills of byval arguments of device functions to
local space on SASS level and subsequent pointer conversion to generic
address space that follows. Instead, make a local copy in IR, provide
a way to access arguments directly, and let LLVM optimize the copy away
when possible.
Differential Review: https://reviews.llvm.org/D21421
llvm-svn: 276153
|
|
|
|
|
|
|
| |
.. including calls from kernel functions that were
ignored by mistake before.
llvm-svn: 275920
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
functions.
Taking address of a byval variable in PTX is legal, but currently runs
into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.
The change is a no-op on SASS level for sm_3x where ptxas already aligns
local copy by at least 4.
Differential Revision: https://reviews.llvm.org/D22428
llvm-svn: 275893
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.
Reviewers: tstellarAMD, mcrosier
Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai
Differential Revision: https://reviews.llvm.org/D22409
llvm-svn: 275564
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the NVPTX backend, mainly by preferring MachineInstr&
over MachineInstr* when a pointer isn't nullable and using range-based
for loops.
There was one piece of questionable code in
NVPTXInstrInfo::AnalyzeBranch, where a condition checked a pointer
converted from an iterator for nullptr. Since this case is impossible
(moreover, the code above guarantees that the iterator is valid), I
removed the check when I changed the pointer to a reference.
Despite that case, there should be no functionality change here.
llvm-svn: 274931
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but
not all of these registers were already accessible via the nvvm
name.
- Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0.
There's a fair amount of code motion here, but it's all very
mechanical.
llvm-svn: 274769
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D22068
llvm-svn: 274674
|
|
|
|
|
|
|
| |
Everywhere where cuda.syncthreads or __syncthreads is used, use the
properly namespaced nvvm.barrier0 instead.
llvm-svn: 274664
|
|
|
|
|
|
|
| |
The intrinsics here use nvvm, but the builtins and tablegen variable
names were using ptx. Stick to the modern names here.
llvm-svn: 274662
|
|
|
|
|
|
|
|
| |
where possible.
No functionality change intended.
llvm-svn: 274431
|
|
|
|
|
|
|
| |
MC doesn't really care about CodeGen stuff, so this was just
complicating target initialization.
llvm-svn: 274258
|
|
|
|
|
|
| |
The change causes llvm crash in some unoptimized builds.
llvm-svn: 274163
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: jingyue, jlebar
Subscribers: jholewinski
Differential Revision: http://reviews.llvm.org/D21756
llvm-svn: 273922
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid unnecessary spills of such vars to local space on SASS level and
pointer space conversion.
Instead, make a local copy with appropriate addrspacecasts and let
LLVM optimize them away when possible.
This allows loading value of the argument using [symbol+offset]
instead of converting argument to general space pointer and using it
for indexing (which also implicitly converts param space pointer to
local space one on SASS level and triggers copying of argument into
local space in the process).
This reduces call overhead, uses less registers and reduces overall
SASS size by 2-4%.
Differential Review: http://reviews.llvm.org/D21421
llvm-svn: 273313
|
|
|
|
|
|
|
|
| |
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
llvm-svn: 272512
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers. Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D21160
llvm-svn: 272298
|
|
|
|
|
|
|
| |
Avoids unnecessary copies. All changes audited & pass tests with asan.
No functional change intended.
llvm-svn: 272190
|
|
|
|
|
|
|
|
| |
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.
llvm-svn: 272126
|
|
|
|
|
|
| |
No functionality change intended, maybe a tiny performance improvement.
llvm-svn: 270997
|
|
|
|
|
|
|
| |
clang-tidy's performance-unnecessary-copy-initialization with some manual
fixes. No functional changes intended.
llvm-svn: 270988
|
|
|
|
|
|
|
| |
Fixes build error on windows where MSVC does not
support list initialization inside member initializer list.
llvm-svn: 270877
|
|
|
|
|
|
|
|
|
|
|
|
| |
NVVMIntrRange adds !range metadata to calls of NVVM intrinsics
that return values within known limited range.
This allows LLVM to generate optimal code for indexing arrays
based on tid/ctaid which is a frequently used pattern in CUDA code.
Differential Revision: http://reviews.llvm.org/D20644
llvm-svn: 270872
|
|
|
|
|
|
|
|
|
|
|
|
| |
analyses.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D20585
llvm-svn: 270790
|
|
|
|
|
|
|
|
|
|
|
|
| |
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.
llvm-svn: 269988
|
|
|
|
|
|
|
|
|
|
| |
- Where we were returning a node before, call ReplaceNode instead.
- Where we would return null to fall back to another selector, rename
the method to try* and return a bool for success.
Part of llvm.org/pr26808.
llvm-svn: 269483
|
|
|
|
|
|
| |
This is similar to how getName is handled.
llvm-svn: 269218
|
|
|
|
|
|
|
|
| |
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into an own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.
llvm-svn: 269011
|
|
|
|
|
|
|
| |
Previously it was just "// inline asm", which made it tricky to read
code with lots of inline assembly.
llvm-svn: 268994
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.
We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.
Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.
llvm-svn: 268693
|