summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/NVPTX
Commit message (Collapse)AuthorAgeFilesLines
...
* NVPTX: move NVPTXAllocaHoisting into the cpp fileBenjamin Kramer2015-03-103-34/+35
| | | | | | Also initialize without using static initialization. llvm-svn: 231822
* NVPTX: Remove copy of LLVMInitializeNVPTXAsmPrinter.Benjamin Kramer2015-03-101-7/+0
| | | | | | | | If anyone is using this for some strange reason, LLVMInitializeNVPTXAsmPrinter does exactly the same thing and is what other LLVM tools are calling. llvm-svn: 231810
* Remove incredibly confusing isBaseAddressKnownZero.Rafael Espindola2015-03-101-1/+0
| | | | | | | | | | | | | | | When referring to a symbol in a dwarf section on ELF we should use .long foo instead of .long foo - .debug_something because ELF is unaware of the content of the sections and therefore needs relocations. This has nothing to do with optimizing a -0. llvm-svn: 231751
* Use a better name for compile unit labels.Rafael Espindola2015-03-101-1/+0
| | | | | | | | | | | | | They mark the start of a compile unit, so name them .Lcu_*. Using Section->getLabelBeginName() makes it looks like they mark the start of the section. While at it, switch to createTempSymbol to avoid collisions with labels created in inline assembly. Not sure if a "don't crash" test is worth it. With this getLabelBeginName is dead, delete it. llvm-svn: 231750
* DataLayout is mandatory, update the API to reflect it with references.Mehdi Amini2015-03-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first attempt at doing that. This patch is not exactly NFC as for instance some places were passing a nullptr instead of the DataLayout, possibly just because there was a default value on the DataLayout argument to many functions in the API. Even though it is not purely NFC, there is no change in the validation. I turned as many pointer to DataLayout to references, this helped figuring out all the places where a nullptr could come up. I had initially a local version of this patch broken into over 30 independant, commits but some later commit were cleaning the API and touching part of the code modified in the previous commits, so it seemed cleaner without the intermediate state. Test Plan: Reviewers: echristo Subscribers: llvm-commits From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231740
* TableGen: Use 'enum : uint64_t' for feature flags to fix -WmicrosoftReid Kleckner2015-03-091-0/+2
| | | | | | | | | | | | | clang-cl would warn that this value is not representable in 'int': enum { FeatureX = 1ULL << 31 }; All MS enums are 'ints' unless otherwise specified, so we have to use an explicit type. The AMDGPU target just hit 32 features, triggering this warning. Now that we have C++11 strong enum types, we can also eliminate the 'const uint64_t' codepath from tablegen and just use 'enum : uint64_t'. llvm-svn: 231697
* Delete dead code. NFC.Rafael Espindola2015-03-091-1/+0
| | | | llvm-svn: 231682
* Move unreferenced passes into the cpp fileBenjamin Kramer2015-03-092-27/+25
| | | | | | NFC. llvm-svn: 231661
* Simplify expressions involving boolean constants with clang-tidyDavid Blaikie2015-03-091-1/+1
| | | | | | | | Patch by Richard (legalize at xmission dot com). Differential Revision: http://reviews.llvm.org/D8154 llvm-svn: 231617
* Make DataLayout Non-Optional in the ModuleMehdi Amini2015-03-043-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: DataLayout keeps the string used for its creation. As a side effect it is no longer needed in the Module. This is "almost" NFC, the string is no longer canonicalized, you can't rely on two "equals" DataLayout having the same string returned by getStringRepresentation(). Get rid of DataLayoutPass: the DataLayout is in the Module The DataLayout is "per-module", let's enforce this by not duplicating it more than necessary. One more step toward non-optionality of the DataLayout in the module. Make DataLayout Non-Optional in the Module Module->getDataLayout() will never returns nullptr anymore. Reviewers: echristo Subscribers: resistor, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D7992 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231270
* Remove useless .debug_macinfo section setup.Paul Robinson2015-03-022-4/+0
| | | | llvm-svn: 231001
* NVPTX: Remove dead code.Benjamin Kramer2015-03-023-116/+0
| | | | | | | Fun fact: This file was never referenced since the initial checkin of the NVPTX backend. llvm-svn: 230957
* Convert push_back loops into append calls.Benjamin Kramer2015-02-281-3/+1
| | | | | | No functionality change intended. llvm-svn: 230849
* getRegForInlineAsmConstraint wants to use TargetRegisterInfo forEric Christopher2015-02-262-3/+5
| | | | | | | | | a lookup, pass that in rather than use a naked call to getSubtargetImpl. This involved passing down and around either a TargetMachine or TargetRegisterInfo. Update all callers/definitions around the targets and SelectionDAG. llvm-svn: 230699
* Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.Eric Christopher2015-02-261-1/+1
| | | | | | | | | This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass which uses it for register class iteration. This required passing a subtarget into a few target specific initializations of TargetLowering. llvm-svn: 230583
* Demote vectors to arrays. No functionality change.Benjamin Kramer2015-02-191-29/+11
| | | | llvm-svn: 229861
* Avoid using a self-referential initializer and fix up uses.Eric Christopher2015-02-191-3/+3
| | | | llvm-svn: 229790
* Remove all use of is64bit off of NVPTXSubtarget and clean up codeEric Christopher2015-02-1912-89/+61
| | | | | | | accordingly. This changes the constructors of a number of classes that don't need to know the subtarget's 64-bitness. llvm-svn: 229787
* Remove all use of getDrvInterface off of NVPTXSubtarget and cleanEric Christopher2015-02-194-51/+23
| | | | | | | up code accordingly. Delete code that was checking for all cases of an enum. llvm-svn: 229786
* Migrate the NVPTX backend asm printer to a per function subtarget.Eric Christopher2015-02-196-58/+74
| | | | | | | | | | | This involved moving two non-subtarget dependent features (64-bitness and the driver interface) to the NVPTX target machine and updating the uses (or migrating around the subtarget use for ease of review). Otherwise use the cached subtarget or create a default subtarget based on the TargetMachine cpu and feature string for the module level assembler emission. llvm-svn: 229785
* NVPTX: Canonicalize access to function attributes, NFCDuncan P. N. Exon Smith2015-02-141-3/+1
| | | | | | | | | | | | Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229260
* [PM] Remove the old 'PassManager.h' header file at the top level ofChandler Carruth2015-02-133-3/+3
| | | | | | | | | | | | | | | | | | | | LLVM's include tree and the use of using declarations to hide the 'legacy' namespace for the old pass manager. This undoes the primary modules-hostile change I made to keep out-of-tree targets building. I sent an email inquiring about whether this would be reasonable to do at this phase and people seemed fine with it, so making it a reality. This should allow us to start bootstrapping with modules to a certain extent along with making it easier to mix and match headers in general. The updates to any code for users of LLVM are very mechanical. Switch from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h". Qualify the types which now produce compile errors with "legacy::". The most common ones are "PassManager", "PassManagerBase", and "FunctionPassManager". llvm-svn: 229094
* MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line ↵Benjamin Kramer2015-02-121-3/+3
| | | | | | | | with countTrailingZeros Update all callers. llvm-svn: 228930
* Add straight-line strength reduction to LLVMJingyue Wu2015-02-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Straight-line strength reduction (SLSR) is implemented in GCC but not yet in LLVM. It has proven to effectively simplify statements derived from an unrolled loop, and can potentially benefit many other cases too. For example, LLVM unrolls #pragma unroll foo (int i = 0; i < 3; ++i) { sum += foo((b + i) * s); } into sum += foo(b * s); sum += foo((b + 1) * s); sum += foo((b + 2) * s); However, no optimizations yet reduce the internal redundancy of the three expressions: b * s (b + 1) * s (b + 2) * s With SLSR, LLVM can optimize these three expressions into: t1 = b * s t2 = t1 + s t3 = t2 + s This commit is only an initial step towards implementing a series of such optimizations. I will implement more (see TODO in the file commentary) in the near future. This optimization is enabled for the NVPTX backend for now. However, I am more than happy to push it to the standard optimization pipeline after more thorough performance tests. Test Plan: test/StraightLineStrengthReduce/slsr.ll Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick Reviewed By: jholewinski, atrick Subscribers: karthikthecool, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7310 llvm-svn: 228016
* Remove usernames from TODOs, NFCJingyue Wu2015-02-031-3/+2
| | | | | | making the style consistent with the rest llvm-svn: 227991
* Resurrect the assertion removed by r227717Jingyue Wu2015-02-021-2/+1
| | | | | | | | | | | | | | Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*. Test Plan: no regression Reviewers: mkuper Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7327 llvm-svn: 227853
* [multiversion] Switch the TTI queries from TargetMachine to SubtargetChandler Carruth2015-02-011-7/+8
| | | | | | | | | | | | | | | | | | | now that we have a correct and cached subtarget specific to the function. Also, finish providing a cached per-function subtarget in the core LLVMTargetMachine -- that layer hadn't switched over yet. The only use of the TargetMachine was to re-lookup a subtarget for a particular function to work around the fact that TTI was immutable. Now that it is per-function and we haved a cached subtarget, use it. This still leaves a few interfaces with real warts on them where we were passing Function objects through the TTI interface. I'll remove these and clean their usage up in subsequent commits now that this isn't necessary. llvm-svn: 227738
* [multiversion] Remove the cached TargetMachine pointer from theChandler Carruth2015-02-011-3/+10
| | | | | | | | | | | | | | | | intermediate TTI implementation template and instead query up to the derived class for both the TargetMachine and the TargetLowering. Most of the derived types had a TLI cached already and there is no need to store a less precisely typed target machine pointer. This will in turn make it much cleaner to look up the TLI via a per-function subtarget instead of the generic subtarget, and it will pave the way toward pulling the subtarget used for unroll preferences into the same form once we are *always* using the function to look up the correct subtarget. llvm-svn: 227737
* [multiversion] Switch all of the targets over to use theChandler Carruth2015-02-012-3/+4
| | | | | | | | | | | | | | | | TargetIRAnalysis access path directly rather than implementing getTTI. This even removes getTTI from the interface. It's more efficient for each target to just register a precise callback that creates their specific TTI. As part of this, all of the targets which are building their subtargets individually per-function now build their TTI instance with the function and thus look up the correct subtarget and cache it. NVPTX, R600, and XCore currently don't leverage this functionality, but its trivial for them to add it now. llvm-svn: 227735
* [multiversion] Remove a false freedom to leave the TargetMachine pointerChandler Carruth2015-02-011-3/+2
| | | | | | | | | | | | | | | | | null. For some reason some of the original TTI code supported a null target machine. This seems to have been legacy, and I made matters worse when refactoring this code by spreading that pattern further through the various targets. The TargetMachine can't actually be null, and it doesn't make sense to support that use case. I've now consistently removed it and removed all of the code trying to cope with that situation. This is probably good, as several targets *didn't* cope with it being null despite the null default argument in their constructors. =] llvm-svn: 227734
* [NVPTX] Emit .pragma "nounroll" for loops marked with nounrollJingyue Wu2015-02-012-0/+48
| | | | | | | | | | | | | | | | | | | | | | | Summary: CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA driver from unrolling a loop marked with llvm.loop.unroll.disable is not unrolled by CUDA driver, we need to emit .pragma "nounroll" at the header of that loop. This patch also extracts getting unroll metadata from loop ID metadata into a shared helper function. Test Plan: test/CodeGen/NVPTX/nounroll.ll Reviewers: eliben, meheff, jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7041 llvm-svn: 227703
* [PM] Remove a bunch of stale TTI creation method declarations. I nukedChandler Carruth2015-02-011-1/+0
| | | | | | | their definitions, but forgot to clean up all the declarations which are in different files. llvm-svn: 227698
* [PM] Switch the TargetMachine interface from accepting a pass managerChandler Carruth2015-01-314-60/+73
| | | | | | | | | | | | | | | | | | | | | | | base which it adds a single analysis pass to, to instead return the type erased TargetTransformInfo object constructed for that TargetMachine. This removes all of the pass variants for TTI. There is now a single TTI *pass* in the Analysis layer. All of the Analysis <-> Target communication is through the TTI's type erased interface itself. While the diff is large here, it is nothing more that code motion to make types available in a header file for use in a different source file within each target. I've tried to keep all the doxygen comments and file boilerplate in line with this move, but let me know if I missed anything. With this in place, the next step to making TTI work with the new pass manager is to introduce a really simple new-style analysis that produces a TTI object via a callback into this routine on the target machine. Once we have that, we'll have the building blocks necessary to accept a function argument as well. llvm-svn: 227685
* [PM] Change the core design of the TTI analysis to use a polymorphicChandler Carruth2015-01-312-54/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is *much* more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to *just* do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669
* Migrate a bare getSubtarget call to query the MachineFunctionEric Christopher2015-01-301-3/+3
| | | | | | for the target dependent one. llvm-svn: 227542
* Migrate NVPTXISelLowering to take the subtarget that it's dependentEric Christopher2015-01-303-18/+19
| | | | | | upon as an argument and store/use that in the entire function. llvm-svn: 227541
* Remove unused argument.Eric Christopher2015-01-301-6/+5
| | | | llvm-svn: 227539
* Migrate NVPTXISelDAGToDAG's getSubtarget to a runOnMachineFunctionEric Christopher2015-01-303-52/+56
| | | | | | version. Update NVPTXInstrInfo accordingly. llvm-svn: 227538
* [LPM] Stop using the string based preservation API. It is anChandler Carruth2015-01-282-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | abomination. For starters, this API is incredibly slow. In order to lookup the name of a pass it must take a memory fence to acquire a pointer to the managed static pass registry, and then potentially acquire locks while it consults this registry for information about what passes exist by that name. This stops the world of LLVMs in your process no matter how little they cared about the result. To make this more joyful, you'll note that we are preserving many passes which *do not exist* any more, or are not even analyses which one might wish to have be preserved. This means we do all the work only to say "nope" with no error to the user. String-based APIs are a *bad idea*. String-based APIs that cannot produce any meaningful error are an even worse idea. =/ I have a patch that simply removes this API completely, but I'm hesitant to commit it as I don't really want to perniciously break out-of-tree users of the old pass manager. I'd rather they just have to migrate to the new one at some point. If others disagree and would like me to kill it with fire, just say the word. =] llvm-svn: 227294
* [NVPTX] Generate a more optimal sequence for select of i1Justin Holewinski2015-01-263-5/+23
| | | | | | | | | Instead of creating a pattern like "(p && a) || ((!p) && b)", just expand the i8 operands to i32 and perform the selp on them. Fixes PR22246 llvm-svn: 227123
* [NVPTX] Handle floating-point conversion patterns that are not explicitly ↵Justin Holewinski2015-01-261-6/+13
| | | | | | | | ordered or unordered Fixes PR22322 llvm-svn: 227117
* Move DataLayout back to the TargetMachine from TargetSubtargetInfoEric Christopher2015-01-265-24/+23
| | | | | | | | | | | | | | | | | | | derived classes. Since global data alignment, layout, and mangling is often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be layed out and mangled consistently if the subtarget changes on a per function basis. Prior to this all targets(*) have had subtarget dependent code moved out and onto the TargetMachine. *One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget dependent features. llvm-svn: 227113
* std::unique_ptrify the MCStreamer argument to createAsmPrinterDavid Blaikie2015-01-181-2/+2
| | | | llvm-svn: 226414
* Update libdeps in NVPTXCodeGen, since r225944.NAKAMURA Takumi2015-01-141-1/+1
| | | | llvm-svn: 226055
* Override the TLI callback enableAggressiveFMAFusion and return true. Indeed, ↵Olivier Sallenave2015-01-141-0/+2
| | | | | | fmul, fmadd and fadd nodes cost the same number of cycles, so we can enable more combining heuristics to produce more fmadd nodes. llvm-svn: 225984
* [cleanup] Re-sort all the #include lines in LLVM usingChandler Carruth2015-01-146-8/+8
| | | | | | | | | | | utils/sort_includes.py. I clearly haven't done this in a while, so more changed than usual. This even uncovered a missing include from the InstrProf library that I've added. No functionality changed here, just mechanical cleanup of the include order. llvm-svn: 225974
* NVPTX: Use MapMetadata() instead of custom/stale/untested logicDuncan P. N. Exon Smith2015-01-141-41/+10
| | | | | | | | | Copy the `GVMap` over to a standard `ValueToValueMapTy` so that we can reuse the `MapMetadata()` logic. Unfortunately the `GVMap` can't just be replaced, since `MapMetadata()` likes to modify the map, but at least this will prevent NVPTX from bitrotting. llvm-svn: 225944
* NVPTX: Remove bogus remap logic for global variable address spacesDuncan P. N. Exon Smith2015-01-141-12/+1
| | | | | | | The comment is incorrect, and the code mangles debug info. Remove the bad logic, which wasn't tested anyway. llvm-svn: 225943
* [SelectionDAG] Allow targets to specify legality of extloads' resultAhmed Bougacha2015-01-081-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | type (in addition to the memory type). The *LoadExt* legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421
* [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.Ahmed Bougacha2015-01-071-3/+1
| | | | | | | | A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. llvm-svn: 225392
OpenPOWER on IntegriCloud