| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Avoid an unnecessary conversion operation when using the result of
ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction
we run returns an i32.
(Previously if we used the value as an i32, we'd do an unnecessary
zext+trunc.)
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28721
llvm-svn: 292302
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D28720
llvm-svn: 292301
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
* Disable "ctlz speculation", which inserts a branch on every ctlz(x) which
has defined behavior on x == 0 to check whether x is, in fact zero.
* Add DAG patterns that avoid re-truncating or re-expanding the result
of the 16- and 64-bit ctz instructions.
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D28719
llvm-svn: 292299
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously there were three ways to inform the NVVMReflect pass whether
you wanted to flush denormals to zero:
* An LLVM command-line option
* Parameters to the NVVMReflect constructor
* Metadata on the module itself.
This change removes the first two, leaving only the third.
The motivation for this change, aside from simplifying things, is that
we want LLVM to be aware of whether it's operating in FTZ mode, so other
passes can use this information. Ideally we'd have a target-generic
piece of metadata on the module. This change moves us in that
direction.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28700
llvm-svn: 292068
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Only scalar half-precision operations are supported at the moment.
- Adds general support for 'half' type in NVPTX.
- fp16 math operations are supported on sm_53+ GPUs only
(can be disabled with --nvptx-no-f16-math).
- Type conversions to/from fp16 are supported on all GPU variants.
- On GPU variants that do not have full fp16 support (or if it's disabled),
fp16 operations are promoted to fp32 and results are converted back
to fp16 for storage.
Differential Revision: https://reviews.llvm.org/D28540
llvm-svn: 291956
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
allowed.
Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32
instruction even when unsafe FP math was not allowed.
Clang-generated IR is not affected by this as it uses precise sin/cos
from CUDA's libdevice when unsafe math is disabled.
Differential Revision: https://reviews.llvm.org/D28619
llvm-svn: 291936
|
|
|
|
|
|
|
|
|
|
|
| |
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.
See https://reviews.llvm.org/D28057 for the whole discussion.
Differential Revision: https://reviews.llvm.org/D28556
llvm-svn: 291891
|
|
|
|
|
|
|
|
|
|
|
|
| |
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq.
In case if the real operands bitwidth <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
|
|
|
|
|
|
| |
other minor fixes (NFC).
llvm-svn: 291490
|
|
|
|
|
|
|
| |
The range metadata inserted by NVVMIntrRange is pessimistic, range
metadata already present could be more precise.
llvm-svn: 290294
|
|
|
|
| |
llvm-svn: 289747
|
|
|
|
|
|
|
|
|
|
|
| |
I've chosen to remove NVPTXInstrInfo::CanTailMerge but not
NVPTXInstrInfo::isLoadInstr and isStoreInstr (which are also dead)
because while the latter two are reasonably useful utilities, the former
cannot be used safely: It relies on successful address space inference
to identify writes to shared memory, but addrspace inference is a
best-effort thing.
llvm-svn: 289740
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D27638
llvm-svn: 289729
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously they were defined as a 2D char array in a header file. This
is kind of overkill -- we can let the linker lay out these strings
however it pleases. While we're at it, we might as well just inline
these constants where they're used, as each of them is used only once.
Also move NVPTXUtilities.{h,cpp} into namespace llvm.
Reviewers: tra
Subscribers: jholewinski, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D27636
llvm-svn: 289728
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.
Differential Revision: https://reviews.llvm.org/D26671
llvm-svn: 289647
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: beanz, lattner, jlebar
Subscribers: jholewinski, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D26235
llvm-svn: 285832
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass. We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.
Reviewers: tra
Subscribers: llvm-commits, jholewinski, jingyue, mgorny
Differential Revision: https://reviews.llvm.org/D26165
llvm-svn: 285642
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In isel, transform
Num % Den
into
Num - (Num / Den) * Den
if the result of Num / Den is already available.
Reviewers: tra
Subscribers: hfinkel, llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D26090
llvm-svn: 285461
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
take a GlobalObject.
These functions are about classifying a global which will actually be
emitted, so it does not make sense for them to take a GlobalValue which may
for example be an alias.
Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to
look through aliases before using TargetLoweringObjectFile interfaces. These
are functional changes but all appear to be bug fixes.
Differential Revision: https://reviews.llvm.org/D25917
llvm-svn: 285006
|
|
|
|
| |
llvm-svn: 284975
|
|
|
|
|
|
|
|
|
|
| |
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.
No functionality change intended.
llvm-svn: 284721
|
|
|
|
|
|
|
|
| |
This avoids "static initialization order fiasco"
Differential Revision: https://reviews.llvm.org/D25412
llvm-svn: 283702
|
|
|
|
| |
llvm-svn: 283515
|
|
|
|
| |
llvm-svn: 283004
|
|
|
|
|
|
| |
line or the other commented out place.
llvm-svn: 282673
|
|
|
|
|
|
|
|
| |
These are only available on sm_60+ GPUs.
Differential Revision: https://reviews.llvm.org/D24943
llvm-svn: 282607
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before.
Reviewers: eliben, jpienaar
Subscribers: jlebar, vchuravy, cfe-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D9168
llvm-svn: 282045
|
|
|
|
|
|
| |
was "used" but not used.
llvm-svn: 281749
|
|
|
|
|
|
| |
TLOF API accordingly.
llvm-svn: 281708
|
|
|
|
| |
llvm-svn: 281535
|
|
|
|
|
|
|
| |
analyzeBranch was renamed to use lowercase first, rename
the related set to match.
llvm-svn: 281506
|
|
|
|
|
|
|
|
|
| |
The main change is to return the code size from
InsertBranch/RemoveBranch.
Patch mostly by Tim Northover
llvm-svn: 281505
|
|
|
|
| |
llvm-svn: 281493
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
With this change (plus some changes to prevent !invariant from being
clobbered within llvm), clang will be able to model the __ldg CUDA
builtin as an invariant load, rather than as a target-specific llvm
intrinsic. This will let the optimizer play with these loads --
specifically, we should be able to vectorize them in the load-store
vectorizer.
Reviewers: tra
Subscribers: jholewinski, hfinkel, llvm-commits, chandlerc
Differential Revision: https://reviews.llvm.org/D23477
llvm-svn: 281152
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously these only worked via NVPTX-specific intrinsics.
This change will allow us to convert these target-specific intrinsics
into the general LLVM versions, allowing existing LLVM passes to reason
about their behavior.
It also gets us some minor codegen improvements as-is, from situations
where we canonicalize code into one of these llvm intrinsics.
Reviewers: majnemer
Subscribers: llvm-commits, jholewinski, tra
Differential Revision: https://reviews.llvm.org/D24300
llvm-svn: 281092
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.
This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.
Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.
Differential Revision: http://reviews.llvm.org/D23736
llvm-svn: 279602
|
|
|
|
|
|
|
| |
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.
llvm-svn: 279580
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.
This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.
Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.
Differential Revision: http://reviews.llvm.org/D23736
llvm-svn: 279564
|
|
|
|
|
|
|
|
|
|
| |
MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.
This reverts commit r279502.
llvm-svn: 279503
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.
This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.
Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.
Differential Revision: http://reviews.llvm.org/D23736
llvm-svn: 279502
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This switches us to use a different, more powerful algorithm for address
space inference. I've tested this locally and it seems to work great.
Once we're more confident in it, we can remove the old pass altogether.
Reviewers: jingyue
Subscribers: llvm-commits, tra, jholewinski
Differential Revision: https://reviews.llvm.org/D23694
llvm-svn: 279317
|
|
|
|
|
|
|
|
|
|
| |
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.
Differential Revision: https://reviews.llvm.org/D23597
llvm-svn: 279129
|
|
|
|
|
|
| |
Follow up to r278902. I had missed "fall through", with a space.
llvm-svn: 278970
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bring LLVM-generated PTX closer to what nvcc generates and avoids
triggering issues in ptxas.
For instance, ptxas does not accept .s16 (or .u16) registers as operands
for .fp16 instructions.
Differential Revision: https://reviews.llvm.org/D23460
llvm-svn: 278568
|
|
|
|
|
|
|
|
|
| |
If the result of the find is only used to compare against end(), just
use is_contained instead.
No functionality change is intended.
llvm-svn: 278433
|
|
|
|
|
|
|
|
|
|
| |
info.
Also added test case to verify IR changes done by NVPTXGenericToNVVM pass.
Differential Revision: https://reviews.llvm.org/D22837
llvm-svn: 277520
|
|
|
|
|
|
|
|
|
|
|
| |
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
|
|
|
|
|
|
|
| |
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.
llvm-svn: 277017
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: tra
Subscribers: jholewinski, arsenm, asbirlea
Differential Revision: https://reviews.llvm.org/D22592
llvm-svn: 276196
|