path: root/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
Commit message | Author | Age | Files | Lines
...
* [NVPTX] do not run DCE after SLSR and SeparateConstOffsetFromGEP | Jingyue Wu | 2015-04-21 | 1 | -10/+4
  Summary: With D9096 and D9101, there's no need to run DCE after SLSR and SeparateConstOffsetFromGEP.
  Test Plan: no regression
  Reviewers: jholewinski, meheff
  Subscribers: jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D9172
  llvm-svn: 235415
* Simplify the query for a subtarget in the NVPTX pass manager. | Eric Christopher | 2015-03-21 | 1 | -2/+1
  llvm-svn: 232876
* Move the DataLayout to the generic TargetMachine, making it mandatory. | Mehdi Amini | 2015-03-12 | 1 | -3/+4
  Summary: I don't know why every single backend had to redeclare its own DataLayout. There was a virtual getDataLayout() on the common base TargetMachine, and the default implementation returned nullptr. It was not clear from this that we could assume at the call site that a DataLayout will be available with each Target.
  Now getDataLayout() is no longer virtual and returns a pointer to the DataLayout member of the common base TargetMachine. I plan to turn it into a reference in a future patch.
  The only backend that didn't have a DataLayout previously was the CPPBackend. It now initializes the default DataLayout. This commit is NFC for all the other backends.
  Test Plan: clang+llvm ninja check-all
  Reviewers: echristo
  Subscribers: jfb, jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D8243
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 231987
* NVPTX: move NVPTXAllocaHoisting into the cpp file | Benjamin Kramer | 2015-03-10 | 1 | -0/+2
  Also initialize without using static initialization.
  llvm-svn: 231822
* Remove all use of is64bit off of NVPTXSubtarget and clean up code | Eric Christopher | 2015-02-19 | 1 | -1/+1
  accordingly. This changes the constructors of a number of classes that don't need to know the subtarget's 64-bitness.
  llvm-svn: 229787
* Migrate the NVPTX backend asm printer to a per function subtarget. | Eric Christopher | 2015-02-19 | 1 | -3/+6
  This involved moving two non-subtarget dependent features (64-bitness and the driver interface) to the NVPTX target machine and updating the uses (or migrating around the subtarget use for ease of review). Otherwise use the cached subtarget or create a default subtarget based on the TargetMachine cpu and feature string for the module level assembler emission.
  llvm-svn: 229785
* [PM] Remove the old 'PassManager.h' header file at the top level of | Chandler Carruth | 2015-02-13 | 1 | -1/+1
  LLVM's include tree and the use of using declarations to hide the 'legacy' namespace for the old pass manager.
  This undoes the primary modules-hostile change I made to keep out-of-tree targets building. I sent an email inquiring about whether this would be reasonable to do at this phase and people seemed fine with it, so making it a reality. This should allow us to start bootstrapping with modules to a certain extent along with making it easier to mix and match headers in general.
  The updates to any code for users of LLVM are very mechanical. Switch from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h". Qualify the types which now produce compile errors with "legacy::". The most common ones are "PassManager", "PassManagerBase", and "FunctionPassManager".
  llvm-svn: 229094
* Add straight-line strength reduction to LLVM | Jingyue Wu | 2015-02-03 | 1 | -0/+1
  Summary: Straight-line strength reduction (SLSR) is implemented in GCC but not yet in LLVM. It has proven to effectively simplify statements derived from an unrolled loop, and can potentially benefit many other cases too. For example, LLVM unrolls

    #pragma unroll
    for (int i = 0; i < 3; ++i) {
      sum += foo((b + i) * s);
    }

  into

    sum += foo(b * s);
    sum += foo((b + 1) * s);
    sum += foo((b + 2) * s);

  However, no optimizations yet reduce the internal redundancy of the three expressions:

    b * s
    (b + 1) * s
    (b + 2) * s

  With SLSR, LLVM can optimize these three expressions into:

    t1 = b * s
    t2 = t1 + s
    t3 = t2 + s

  This commit is only an initial step towards implementing a series of such optimizations. I will implement more (see TODO in the file commentary) in the near future. This optimization is enabled for the NVPTX backend for now. However, I am more than happy to push it to the standard optimization pipeline after more thorough performance tests.
  Test Plan: test/StraightLineStrengthReduce/slsr.ll
  Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick
  Reviewed By: jholewinski, atrick
  Subscribers: karthikthecool, jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D7310
  llvm-svn: 228016
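The rewrite described in the SLSR commit message can be checked with a small standalone sketch. This is not the LLVM pass itself; the function names `naiveProducts` and `reducedProducts` are made up for illustration, and the code only shows that the three expressions and their strength-reduced forms agree.

```cpp
#include <array>

// Hypothetical helper (not part of LLVM): the three expressions as the
// unrolled loop computes them, one multiplication each.
std::array<long, 3> naiveProducts(long b, long s) {
  return {b * s, (b + 1) * s, (b + 2) * s};
}

// Hypothetical helper: the strength-reduced form from the commit message,
// one multiplication followed by two additions.
std::array<long, 3> reducedProducts(long b, long s) {
  long t1 = b * s;   // b * s
  long t2 = t1 + s;  // (b + 1) * s
  long t3 = t2 + s;  // (b + 2) * s
  return {t1, t2, t3};
}
```

Both functions return the same values for any inputs that do not overflow; the second trades two multiplications for two additions.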
* [multiversion] Switch all of the targets over to use the | Chandler Carruth | 2015-02-01 | 1 | -2/+3
  TargetIRAnalysis access path directly rather than implementing getTTI.
  This even removes getTTI from the interface. It's more efficient for each target to just register a precise callback that creates their specific TTI.
  As part of this, all of the targets which are building their subtargets individually per-function now build their TTI instance with the function and thus look up the correct subtarget and cache it. NVPTX, R600, and XCore currently don't leverage this functionality, but it's trivial for them to add it now.
  llvm-svn: 227735
* [PM] Switch the TargetMachine interface from accepting a pass manager | Chandler Carruth | 2015-01-31 | 1 | -2/+3
  base which it adds a single analysis pass to, to instead return the type erased TargetTransformInfo object constructed for that TargetMachine.
  This removes all of the pass variants for TTI. There is now a single TTI *pass* in the Analysis layer. All of the Analysis <-> Target communication is through the TTI's type erased interface itself. While the diff is large here, it is nothing more than code motion to make types available in a header file for use in a different source file within each target.
  I've tried to keep all the doxygen comments and file boilerplate in line with this move, but let me know if I missed anything.
  With this in place, the next step to making TTI work with the new pass manager is to introduce a really simple new-style analysis that produces a TTI object via a callback into this routine on the target machine. Once we have that, we'll have the building blocks necessary to accept a function argument as well.
  llvm-svn: 227685
* [PM] Change the core design of the TTI analysis to use a polymorphic | Chandler Carruth | 2015-01-31 | 1 | -4/+0
  type erased interface and a single analysis pass rather than an extremely complex analysis group.

  The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR.

  I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting it into the right form.

  There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because the chaining-based delegation meant nothing checked for this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here.

  The other reason of course is that this is a *much* more natural fit for the new pass manager. This will lay the groundwork for a type-erased per-function info object that can look up the correct subtarget and even cache it.

  Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below.

  The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;]

  Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least:

  1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to *just* do what we need: build and return a TTI object that we can then insert into the pass pipeline.
  2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager.
  3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2.
  4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase.
  5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward.
  6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target.

  Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly.

  Differential Revision: http://reviews.llvm.org/D7293
  llvm-svn: 227669
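As a rough illustration of the "type erased interface" idea this commit message describes, here is a minimal C++ sketch of the general concept/model idiom. It is not LLVM's actual TargetTransformInfo code: the names `TargetInfo`, `GpuLikeImpl`, and `DefaultImpl` are hypothetical, and the single query `hasBranchDivergence` is borrowed from the NVPTX TTI entry in this log only as an example method.

```cpp
#include <memory>
#include <utility>

// Hypothetical implementations; neither inherits from a common base class.
struct GpuLikeImpl { bool hasBranchDivergence() const { return true; } };
struct DefaultImpl { bool hasBranchDivergence() const { return false; } };

// Hypothetical value type that erases the concrete implementation type.
class TargetInfo {
  // Internal polymorphic interface, invisible to clients.
  struct Concept {
    virtual ~Concept() = default;
    virtual bool hasBranchDivergence() const = 0;
  };
  // Wraps any type T that provides the required member functions.
  template <typename T> struct Model final : Concept {
    explicit Model(T Impl) : Impl(std::move(Impl)) {}
    bool hasBranchDivergence() const override {
      return Impl.hasBranchDivergence();
    }
    T Impl;
  };
  std::unique_ptr<Concept> Self;

public:
  template <typename T>
  explicit TargetInfo(T Impl)
      : Self(std::make_unique<Model<T>>(std::move(Impl))) {}

  // Clients call a plain member function; dispatch happens inside.
  bool hasBranchDivergence() const { return Self->hasBranchDivergence(); }
};
```

The verbosity the message complains about is visible even here: every query has to be spelled three times (in `Concept`, in `Model`, and in the public wrapper).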
* Move DataLayout back to the TargetMachine from TargetSubtargetInfo | Eric Christopher | 2015-01-26 | 1 | -0/+12
  derived classes.
  Since global data alignment, layout, and mangling is often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be laid out and mangled consistently if the subtarget changes on a per function basis. Prior to this all targets(*) have had subtarget dependent code moved out and onto the TargetMachine.
  *One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget dependent features.
  llvm-svn: 227113
* [CodeGen] Add print and verify pass after each MachineFunctionPass by default | Matthias Braun | 2014-12-11 | 1 | -6/+3
  Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnoticed when no verification is performed.
  To make this change practical I added the possibility to explicitly disable verification. I used this option in all places where no verification was performed previously (because a lot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits.
  This is the 2nd attempt at this after realizing that PassManager::add() may actually delete the pass.
  llvm-svn: 224059
* This reverts commit r224043 and r224042. | Rafael Espindola | 2014-12-11 | 1 | -3/+6
  check-llvm was failing.
  llvm-svn: 224045
* [CodeGen] Add print and verify pass after each MachineFunctionPass by default | Matthias Braun | 2014-12-11 | 1 | -6/+3
  Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnoticed when no verification is performed.
  To make this change practical I added the possibility to explicitly disable verification. I used this option in all places where no verification was performed previously (because a lot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits.
  llvm-svn: 224042
* Add out of line virtual destructors to all LLVMTargetMachine subclasses | Reid Kleckner | 2014-11-20 | 1 | -0/+2
  These recently all grew a unique_ptr<TargetLoweringObjectFile> member in r221878. When anyone calls a virtual method of a class, clang-cl requires all virtual methods to be semantically valid. This includes the implicit virtual destructor, which triggers instantiation of the unique_ptr destructor, which fails because the type being deleted is incomplete.
  This is just part of the ongoing saga of PR20337, which is affecting Blink as well. Because the MSVC ABI doesn't have key functions, we end up referencing the vtable and implicit destructor on any virtual call through a class. We don't actually end up emitting the dtor, so it'd be good if we could avoid this unneeded type completion work.
  llvm-svn: 222480
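The pattern this commit relies on can be sketched in a few lines of standalone C++. The class names `Machine` and `LoweringInfo` are hypothetical stand-ins (for a target machine and for TargetLoweringObjectFile), and a single file is used where the real code would be split into a header and a .cpp file.

```cpp
#include <memory>

// Only forward-declared at this point, so the type is incomplete
// (hypothetical stand-in for TargetLoweringObjectFile).
struct LoweringInfo;

class Machine {
public:
  Machine();
  // Declared here but defined out of line, so the unique_ptr destructor is
  // only instantiated where LoweringInfo is a complete type.
  virtual ~Machine();
  virtual int value() const;

private:
  std::unique_ptr<LoweringInfo> Info;
};

// Everything below would normally live in the .cpp file.
struct LoweringInfo {
  int Value = 7;
};

Machine::Machine() : Info(std::make_unique<LoweringInfo>()) {}
Machine::~Machine() = default;  // LoweringInfo is complete here
int Machine::value() const { return Info->Value; }
```

Leaving the destructor implicit would instead require `LoweringInfo` to be complete wherever the compiler decides to define the destructor, which is the failure mode the message describes for clang-cl.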
* This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively | Aditya Nandakumar | 2014-11-13 | 1 | -0/+2
  llvm-svn: 221878
* [NVPTX] Add an NVPTX-specific TargetTransformInfo | Jingyue Wu | 2014-11-10 | 1 | -0/+8
  Summary: It currently only implements hasBranchDivergence, and will be extended in later diffs. Split from D6188.
  Test Plan: make check-all
  Reviewers: jholewinski
  Reviewed By: jholewinski
  Subscribers: llvm-commits, meheff, eliben, jholewinski
  Differential Revision: http://reviews.llvm.org/D6195
  llvm-svn: 221619
* [NVPTX] Add NVPTXLowerStructArgs pass | Justin Holewinski | 2014-11-05 | 1 | -0/+2
  This works around the limitation that PTX does not allow .param space loads/stores with arbitrary pointers.
  If a function has a by-val struct ptr arg, say foo(%struct.x *byval %d), then add the following instructions to the first basic block:

    %temp = alloca %struct.x, align 8
    %tt1 = bitcast %struct.x * %d to i8 *
    %tt2 = llvm.nvvm.cvt.gen.to.param %tt1
    %tempd = bitcast i8 addrspace(101) * %tt2 to %struct.x addrspace(101) *
    %tv = load %struct.x addrspace(101) * %tempd
    store %struct.x %tv, %struct.x * %temp, align 8

  The above code allocates some space in the stack and copies the incoming struct from param space to local space. Then replace all occurrences of %d by %temp.
  Fixes PR21465.
  llvm-svn: 221377
* [NVPTX] Directly control the Machine SSA passes that are invoked for NVPTX. | Justin Holewinski | 2014-06-27 | 1 | -0/+41
  NVPTX is a bit special in the optimizations it requires, so this gives us better control over the backend optimization pipeline.
  llvm-svn: 211927
* Move NVPTX subtarget dependent variables from the target machine | Eric Christopher | 2014-06-27 | 1 | -14/+1
  to the subtarget.
  llvm-svn: 211860
* Remove unnecessary caching of the TargetMachine on NVPTXFrameLowering. | Eric Christopher | 2014-06-27 | 1 | -1/+1
  Adjust the constructor accordingly.
  llvm-svn: 211846
* Remove caching of the target machine in NVPTXInstrInfo and | Eric Christopher | 2014-06-27 | 1 | -1/+1
  update constructor accordingly.
  llvm-svn: 211840
* Remove comment that duplicated information in the constructor | Eric Christopher | 2014-06-27 | 1 | -6/+6
  that it's after.
  llvm-svn: 211839
* Have TargetSelectionDAGInfo take a DataLayout initializer rather than | Eric Christopher | 2014-06-06 | 1 | -1/+1
  a TargetMachine since the only thing it wants is DataLayout.
  llvm-svn: 210366
* Add an optimization that does CSE in a group of similar GEPs. | Eli Bendersky | 2014-05-01 | 1 | -4/+17
  This optimization merges the common part of a group of GEPs, so we can compute each pointer address by adding a simple offset to the common part.
  The optimization is currently only enabled for the NVPTX backend, where it has a large payoff on some benchmarks.
  Review: http://reviews.llvm.org/D3462
  Patch by Jingyue Wu.
  llvm-svn: 207783
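The arithmetic behind "common part plus a simple offset" can be sketched with plain index math. This is only an analogy in C++, not the LLVM pass; the row-major 2D indexing scenario and the names `directIndex`, `commonPart`, and `constantOffset` are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: flat index of a[i + di][j + dj] in a row-major array
// with `cols` columns, computed from scratch for every access.
std::size_t directIndex(std::size_t i, std::size_t j, std::size_t di,
                        std::size_t dj, std::size_t cols) {
  return (i + di) * cols + (j + dj);
}

// Hypothetical helper: the part shared by every access in the group.
// It depends on the variables i and j and is computed once.
std::size_t commonPart(std::size_t i, std::size_t j, std::size_t cols) {
  return i * cols + j;
}

// Hypothetical helper: the per-access part. With constant di and dj (and a
// known `cols`) this folds to a constant that is added to the common part.
std::size_t constantOffset(std::size_t di, std::size_t dj, std::size_t cols) {
  return di * cols + dj;
}
```

For a group of accesses such as a[i][j], a[i][j+1], and a[i+1][j+2], `commonPart` is evaluated once and each address needs only one extra addition.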
* [C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. NVPTX edition | Craig Topper | 2014-04-29 | 1 | -8/+8
  llvm-svn: 207505
* [C++] Use 'nullptr'. Target edition. | Craig Topper | 2014-04-25 | 1 | -1/+1
  llvm-svn: 207197
* [C++11] Replace OwningPtr with std::unique_ptr in places where it doesn't break the API. | Benjamin Kramer | 2014-04-21 | 1 | -1/+0
  No functionality change.
  llvm-svn: 206740
* [NVPTX] Add preliminary intrinsics and codegen support for textures/surfaces | Justin Holewinski | 2014-04-09 | 1 | -0/+8
  This commit adds intrinsics and codegen support for the surface read/write and texture read instructions that take an explicit sampler parameter. Codegen operates on image handles at the PTX level, but falls back to direct replacement of handles with kernel arguments if image handles are not enabled. Note that image handles are explicitly disabled for all target architectures in this change (to be enabled later).
  llvm-svn: 205907
* Optimize away unnecessary address casts. | Eli Bendersky | 2014-04-03 | 1 | -0/+9
  Removes unnecessary casts from non-generic address spaces to the generic address space for certain code patterns.
  Patch by Jingyue Wu.
  llvm-svn: 205571
* Fix for PR19099 - NVPTX produces invalid symbol names. | Eli Bendersky | 2014-03-31 | 1 | -0/+3
  This is a more thorough fix for the issue than r203483. An IR pass will run before NVPTX codegen to make sure there are no invalid symbol names that can't be consumed by the ptxas assembler.
  llvm-svn: 205212
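A name-cleaning step of the kind this commit describes might look like the sketch below. The commit message does not say which characters are rejected or how they are rewritten, so the choice of '.' and '@' as offending characters and of `_$_` as the replacement are assumptions for illustration, as is the function name `cleanUpName`.

```cpp
#include <string>

// Hypothetical helper: returns a copy of Name in which every character
// assumed to be invalid for the assembler is replaced by a fixed token.
// The set {'.', '@'} and the token "_$_" are illustrative assumptions.
std::string cleanUpName(const std::string &Name) {
  std::string Valid;
  for (char C : Name) {
    if (C == '.' || C == '@')
      Valid += "_$_";
    else
      Valid += C;
  }
  return Valid;
}
```

Running such a rewrite over the IR before codegen, rather than patching names at emission time, keeps every later reference to a symbol consistent with its definition.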
* Removes the NVPTXSplitBBatBar pass. | Eli Bendersky | 2014-03-24 | 1 | -2/+0
  This pass is a historic remnant and actually causes less efficient code to be generated in some cases.
  llvm-svn: 204620
* Switch all uses of LLVM_OVERRIDE to just use 'override' directly. | Craig Topper | 2014-03-02 | 1 | -1/+1
  llvm-svn: 202621
* [cleanup] Move the Dominators.h and Verifier.h headers into the IR | Chandler Carruth | 2014-01-13 | 1 | -1/+1
  directory. These passes are already defined in the IR library, and it doesn't make any sense to have the headers in Analysis.
  Long term, I think there is going to be a much better way to divide these matters. The dominators code should be fully separated into the abstract graph algorithm and have that put in Support where it becomes obvious that even Clang's CFGBlock's can use it. Then the verifier can manually construct dominance information from the Support-driven interface while the Analysis library can provide a pass which both caches, reconstructs, and supports a nice update API.
  But those are very long term, and so I don't want to leave the really confusing structure until that day arrives.
  llvm-svn: 199082
* [PM] Rename the IR printing pass header to a more generic and correct | Chandler Carruth | 2014-01-12 | 1 | -1/+1
  name to match the source file which I got earlier. Update the include sites. Also modernize the comments in the header to use the more recommended doxygen style.
  llvm-svn: 199041
* Move the LLVM IR asm writer header files into the IR directory, as they | Chandler Carruth | 2014-01-07 | 1 | -1/+1
  are part of the core IR library in order to support dumping and other basic functionality.
  Rename the 'Assembly' include directory to 'AsmParser' to match the library name and the only functionality left there -- printing has been in the core IR library for quite some time.
  Update all of the #includes to match.
  All of this started because I wanted to have the layering in good shape before I started adding support for printing LLVM IR using the new pass infrastructure, and commandline support for the new pass infrastructure.
  llvm-svn: 198688
* The preferred alignment defaults to the abi alignment. Omit if it is the same. | Rafael Espindola | 2013-12-16 | 1 | -2/+2
  llvm-svn: 197400
* Don't duplicate the DataLayout defaults for integer, floats and vectors. | Rafael Espindola | 2013-12-16 | 1 | -3/+1
  llvm-svn: 197398
* On DataLayout, omit the default of p:64:64:64. | Rafael Espindola | 2013-12-16 | 1 | -3/+1
  llvm-svn: 197397
* Refactor NVPTX's computeDataLayout. | Rafael Espindola | 2013-12-14 | 1 | -4/+9
  No functionality change.
  llvm-svn: 197312
* Turn NVPTXSubtarget::getDataLayout into a static function. | Rafael Espindola | 2013-12-14 | 1 | -1/+11
  No functionality change.
  llvm-svn: 197311
* [NVPTX] Blacklist TailDuplicate pass | Justin Holewinski | 2013-11-11 | 1 | -0/+1
  This causes issues with virtual registers. We will likely need to fix TailDuplicate in the future, or introduce a new version that plays nicely with vregs.
  llvm-svn: 194373
* Assert on duplicate registration. Don't depend on function pointer equality. | Rafael Espindola | 2013-10-16 | 1 | -3/+0
  Before this patch we would assert when building llvm as multiple shared libraries (cmake's BUILD_SHARED_LIBS). The problem was the line

    if (T.AsmStreamerCtorFn == Target::createDefaultAsmStreamer)

  which returns false because of -fvisibility-inlines-hidden. It is easy to fix just this one case, but I decided to try to also make the registration more strict.
  It looks like the old logic for ignoring followup registration was just a temporary hack that outlived its usefulness. This patch converts the ifs to asserts, fixes the few cases that were registering twice and makes sure all the asserts compare with null.
  Thanks to Joerg for reporting the problem and reviewing the patch.
  llvm-svn: 192803
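The stricter registration scheme described here can be sketched as follows. This is a simplified model, not LLVM's TargetRegistry: `TargetEntry`, `registerAsmStreamer`, and the integer-returning constructor functions are hypothetical, and the null-means-default fallback in `createAsmStreamer` is an assumption about how a default could be supplied without comparing function pointers.

```cpp
#include <cassert>

// Hypothetical constructor-function type and two example constructors.
using StreamerCtor = int (*)();
int defaultStreamer() { return 0; }
int customStreamer() { return 1; }

// Hypothetical registry entry. The slot starts out null instead of being
// pre-filled with the address of a default function.
struct TargetEntry {
  StreamerCtor AsmStreamerCtorFn = nullptr;

  // Assumed fallback: a null slot means "use the default".
  int createAsmStreamer() const {
    return AsmStreamerCtorFn ? AsmStreamerCtorFn() : defaultStreamer();
  }
};

// Registration asserts that the slot is still null, so a second
// registration is a programming error rather than being silently ignored.
void registerAsmStreamer(TargetEntry &T, StreamerCtor Fn) {
  assert(!T.AsmStreamerCtorFn && "duplicate registration");
  T.AsmStreamerCtorFn = Fn;
}
```

Comparing only against null avoids the problem in the message: the address of an inline function can differ between shared libraries when inline symbols are hidden, so an equality test against a "default" function pointer is not reliable.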
* [NVPTX] Switch from StrongPHIElimination to PHIElimination in NVPTXTargetMachine, and add some missing optimization passes to addOptimizedRegAlloc | Justin Holewinski | 2013-10-11 | 1 | -2/+22
  Fixes PR17529
  llvm-svn: 192445
* NVPTX: Don't even create a regalloc if we're not going to use it. | Benjamin Kramer | 2013-05-31 | 1 | -2/+7
  Fixes a leak found by valgrind.
  llvm-svn: 183031
* [NVPTX] Re-enable support for virtual registers in the final output | Justin Holewinski | 2013-05-31 | 1 | -0/+27
  Now that 3.3 is branched, we are re-enabling virtual registers to help iron out bugs before the next release.
  Some of the post-RA passes do not play well with virtual registers, so we disable them for now. The needed functionality of the PrologEpilogInserter pass is copied to a new backend-specific NVPTXPrologEpilog pass.
  The test for this commit is not breaking the existing tests.
  llvm-svn: 182998
* Move passes from namespace llvm into anonymous namespaces. Sort includes while there. | Benjamin Kramer | 2013-05-23 | 1 | -2/+2
  llvm-svn: 182594
* [NVPTX] Add GenericToNVVM IR converter to better handle idiomatic LLVM IR inputs | Justin Holewinski | 2013-05-20 | 1 | -0/+8
  This converter currently only handles global variables in address space 0. These variables are promoted to address space 1 (global memory), and all uses are updated to point to the result of a cvta.global instruction on the new variable.
  The motivation for this is that address space 0 global variables are illegal since we cannot declare variables in the generic address space. Instead, we place the variables in address space 1 and explicitly convert the pointer to address space 0. This is primarily intended to help new users who expect to be able to place global variables in the default address space.
  llvm-svn: 182254
* Remove the MachineMove class. | Rafael Espindola | 2013-05-13 | 1 | -1/+3
  It was just a less powerful and more confusing version of MCCFIInstruction. A side effect is that, since MCCFIInstruction uses dwarf register numbers, calls to getDwarfRegNum are pushed out, which should allow further simplifications.
  I left the MachineModuleInfo::addFrameMove interface unchanged since this patch was already fairly big.
  llvm-svn: 181680