bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[NVPTX] Added a feature to use short pointers for const/local/shared AS.	Artem Belevich	2018-05-09	1	-4/+14
\| \| \| \| \| \| \| \| \| \| \| \|	Const/local/shared address spaces are all < 4GB and we can always use 32-bit pointers to access them. This has substantial performance impact on kernels that uses shared memory for intermediary results. The feature is disabled by default. Differential Revision: https://reviews.llvm.org/D46147 llvm-svn: 331941
*	[CodeGen]Add NoVRegs property on PostRASink and ShrinkWrap	Jun Bum Lim	2018-04-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs property, so now the MachineFunctionPass can enforce this check. These passes are disabled in NVPTX & WebAssembly. Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier Reviewed By: dschuff, thegameg Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D45183 llvm-svn: 329095
*	[NVPTX] Enable StructuredCFG for NVPTX	Tim Shen	2018-03-30	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Make NVPTX require structured CFG. Added a temporary flag to "roll back" the behavior for easy deployment. Combined with D45008, this fixes several internal Nvidia GPU test failures that we suspect to be ptxas miscompiles (PR27738). Reviewers: jlebar Subscribers: jholewinski, sanjoy, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D45070 llvm-svn: 328885
*	Split MachineLICM into EarlyMachineLICM and MachineLICM; NFC	Matthias Braun	2018-01-19	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	This avoids playing games with pseudo pass IDs and avoids using an unreliable MRI::isSSA() check to determine whether register allocation has happened. Note that this renames: - MachineLICMID -> EarlyMachineLICM - PostRAMachineLICMID -> MachineLICMID to be consistent with the EarlyTailDuplicate/TailDuplicate naming. llvm-svn: 322927
*	(Re-landing) Expose a TargetMachine::getTargetTransformInfo function	Sanjoy Das	2017-12-22	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Re-land r321234. It had to be reverted because it broke the shared library build. The shared library build broke because there was a missing LLVMBuild dependency from lib/Passes (which calls TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can tell, this problem was always there but was somehow masked before (perhaps because TargetMachine::getTargetIRAnalysis was a virtual function). Original commit message: This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target. See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this. Reviewers: echristo, MatzeB, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D41464 llvm-svn: 321375
*	Revert "Expose a TargetMachine::getTargetTransformInfo function"	Sanjoy Das	2017-12-21	1	-3/+4
\| \| \| \| \| \|	This reverts commit r321234. It breaks the -DBUILD_SHARED_LIBS=ON build. llvm-svn: 321243
*	Expose a TargetMachine::getTargetTransformInfo function	Sanjoy Das	2017-12-21	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target. See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this. Reviewers: echristo, MatzeB, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D41464 llvm-svn: 321234
*	Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine"	Matthias Braun	2017-10-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Reverting to investigate layering effects of MCJIT not linking libCodeGen but using TargetMachine::getNameWithPrefix() breaking the lldb bots. This reverts commit r315633. llvm-svn: 315637
*	TargetMachine: Merge TargetMachine and LLVMTargetMachine	Matthias Braun	2017-10-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge LLVMTargetMachine into TargetMachine. - There is no in-tree target anymore that just implements TargetMachine but not LLVMTargetMachine. - It should still be possible to stub out all the various functions in case a target does not want to use lib/CodeGen - This simplifies the code and avoids methods ending up in the wrong interface. Differential Revision: https://reviews.llvm.org/D38489 llvm-svn: 315633
*	Delete Default and JITDefault code models	Rafael Espindola	2017-08-03	1	-8/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IMHO it is an antipattern to have a enum value that is Default. At any given piece of code it is not clear if we have to handle Default or if has already been mapped to a concrete value. In this case in particular, only the target can do the mapping and it is nice to make sure it is always done. This deletes the two default enum values of CodeModel and uses an explicit Optional<CodeModel> when it is possible that it is unspecified. llvm-svn: 309911
*	[NVPTX] Add lowering of i128 params.	Artem Belevich	2017-07-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers: * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition" * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1]) [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) llvm-svn: 308675
*	Reverting r307326 because it breaks clang tests.	Michael Kuperstein	2017-07-06	1	-1/+1
\| \| \| \|	llvm-svn: 307334
*	[NVPTX] Add lowering of i128 params.	Michael Kuperstein	2017-07-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers: * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition" * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1]) [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) llvm-svn: 307326
*	Sort the remaining #include lines in include/... and lib/....	Chandler Carruth	2017-06-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
*	TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC	Matthias Braun	2017-05-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	TargetPassConfig is not useful for targets that do not use the CodeGen library, so we may just as well store a pointer to an LLVMTargetMachine instead of just to a TargetMachine. While at it, also change the constructor to take a reference instead of a pointer as the TM must not be nullptr. llvm-svn: 304247
*	NVPTX: Move InferAddressSpaces to generic code	Matt Arsenault	2017-01-31	1	-3/+1
\| \| \| \|	llvm-svn: 293579
*	Replace addEarlyAsPossiblePasses callback with adjustPassManager	Stanislav Mekhanoshin	2017-01-26	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change introduces adjustPassManager target callback giving a target an opportunity to tweak PassManagerBuilder before pass managers are populated. This generalizes and replaces addEarlyAsPossiblePasses target callback. In particular that can be used to add custom passes to extension points other than EP_EarlyAsPossible. Differential Revision: https://reviews.llvm.org/D28336 llvm-svn: 293189
*	[NVPTX] Fix some Clang-tidy modernize and Include What You Use warnings; ↵	Eugene Zelenko	2017-01-09	1	-23/+14
\| \| \| \| \| \|	other minor fixes (NFC). llvm-svn: 291490
*	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.	Justin Lebar	2016-10-31	1	-16/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This has been replaced by the NVPTXInferAddressSpaces pass. We've had the new one as the default with the old one accessible via a flag for some months now, and we've had no problems. Reviewers: tra Subscribers: llvm-commits, jholewinski, jingyue, mgorny Differential Revision: https://reviews.llvm.org/D26165 llvm-svn: 285642
*	Move the global variables representing each Target behind accessor function	Mehdi Amini	2016-10-09	1	-2/+2
\| \| \| \| \| \| \| \|	This avoids "static initialization order fiasco" Differential Revision: https://reviews.llvm.org/D25412 llvm-svn: 283702
*	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses	Matthias Braun	2016-08-24	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Re-apply this patch, hopefully I will get away without any warnings in the constructor now. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279602
*	Revert r279564. It introduces undefined behavior (binding a reference to a	Richard Smith	2016-08-23	1	-0/+1
\| \| \| \| \| \| \|	dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes -Werror builds (including several buildbots) to fail. llvm-svn: 279580
*	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses	Matthias Braun	2016-08-23	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Re-apply this commit with the deletion of a MachineFunction delegated to a separate pass to avoid use after free when doing this directly in AsmPrinter. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279564
*	Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove ↵	Matthias Braun	2016-08-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	MachineFunctionAnalysis => Enable (Machine)ModulePasses" Reverting while tracking down a use after free. This reverts commit r279502. llvm-svn: 279503
*	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses	Matthias Braun	2016-08-23	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279502
*	[NVPTX] Switch nvptx-use-infer-addrspace to true.	Justin Lebar	2016-08-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This switches us to use a different, more powerful algorithm for address space inference. I've tested this locally and it seems to work great. Once we're more confident in it, we can remove the old pass altogether. Reviewers: jingyue Subscribers: llvm-commits, tra, jholewinski Differential Revision: https://reviews.llvm.org/D23694 llvm-svn: 279317
*	[NVPTX] Enable the load-store vectorizer on nvptx.	Justin Lebar	2016-07-20	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	Reviewers: tra Subscribers: jholewinski, arsenm, asbirlea Differential Revision: https://reviews.llvm.org/D22592 llvm-svn: 276196
*	[NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC.	Artem Belevich	2016-07-20	1	-5/+5
\| \| \| \| \| \| \| \|	After r276153 the pass applies to both kernels and regular functions. Differential Revision: https://reviews.llvm.org/D22583 llvm-svn: 276189
*	[NVPTX] Added NVVMIntrRange pass	Artem Belevich	2016-05-26	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	NVVMIntrRange adds !range metadata to calls of NVVM intrinsics that return values within known limited range. This allows LLVM to generate optimal code for indexing arrays based on tid/ctaid which is a frequently used pattern in CUDA code. Differential Revision: http://reviews.llvm.org/D20644 llvm-svn: 270872
*	Delete Reloc::Default.	Rafael Espindola	2016-05-18	1	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \|	Having an enum member named Default is quite confusing: Is it distinct from the others? This patch removes that member and instead uses Optional<Reloc> in places where we have a user input that still hasn't been maped to the default value, which is now clear has no be one of the remaining 3 options. llvm-svn: 269988
*	CodeGen: Move TargetPassConfig from Passes.h to an own header; NFC	Matthias Braun	2016-05-10	1	-0/+1
\| \| \| \| \| \| \| \|	Many files include Passes.h but only a fraction needs to know about the TargetPassConfig class. Move it into an own header. Also rename Passes.cpp to TargetPassConfig.cpp while we are at it. llvm-svn: 269011
*	[NVPTX] Run NVVMReflect at the beginning of IR passes.	Justin Lebar	2016-04-27	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently the NVVMReflect pass is run at the beginning of our backend passes. But really, it should be run as early as possible, as it's simply resolving an "if" statement in code. So copy it into TargetMachine::addEarlyAsPossiblePasses. We still run it at the beginning of the backend passes, since it's needed for correctness when lowering to nvptx. (Specifically, NVVMReflect changes each call to the __nvvm_reflect function or llvm.nvvm.reflect intrinsic into an integer constant, based on the pass's configuration. Clearly we miss many optimization opportunities if we perform this transformation at the beginning of codegen.) Reviewers: rnk Subscribers: tra, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D18616 llvm-svn: 267765
*	[NVPTX] Fix some usages of CodeGenOpt::None.	Jingyue Wu	2016-04-26	1	-5/+9
\| \| \| \| \| \| \| \| \| \|	NVPTXLowerKernelArgs is required for correctness, so it should not be guarded by CodeGenOpt::None. NVPTXPeephole is optimization only, so it should be skipped when CodeGenOpt::None. llvm-svn: 267619
*	Disable the PatchableFunction pass for NVPTX & Wasm	Sanjoy Das	2016-04-19	1	-0/+1
\| \| \| \| \| \| \|	PatchableFunction requires AllVRegsAllocated that these targets don't provide. llvm-svn: 266720
*	Introduce MachineFunctionProperties and the AllVRegsAllocated property	Derek Schuff	2016-03-28	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MachineFunctionProperties represents a set of properties that a MachineFunction can have at particular points in time. Existing examples of this idea are MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which will eventually be switched to use this mechanism. This change introduces the AllVRegsAllocated property; i.e. the property that all virtual registers have been allocated and there are no VReg operands left. With this mechanism, passes can declare that they require a particular property to be set, or that they set or clear properties by implementing e.g. MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class verifies that the requirements are met, and handles the setting and clearing based on the delcarations. Passes can also directly query and update the current properties of the MF if they want to have conditional behavior. This change annotates the target-independent post-regalloc passes; future changes will also annotate target-specific ones. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D18421 llvm-svn: 264593
*	[NVPTX] Adds a new address space inference pass.	Jingyue Wu	2016-03-20	1	-8/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable to convert the address space of a pointer induction variable. This patch adds a new pass called NVPTXInferAddressSpaces that overcomes that limitation using a fixed-point data-flow analysis (see the file header comments for details). The new pass is experimental and not enabled by default. Users can turn it on by setting the -nvptx-use-infer-addrspace flag of llc. Reviewers: jholewinski, tra, jlebar Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D17965 llvm-svn: 263916
*	[PM] Port GVN to the new pass manager, wire it up, and teach a couple of	Chandler Carruth	2016-03-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tests to run GVN in both modes. This is mostly the boring refactoring just like SROA and other complex transformation passes. There is some trickiness in that GVN's ValueNumber class requires hand holding to get to compile cleanly. I'm open to suggestions about a better pattern there, but I tried several before settling on this. I was trying to balance my desire to sink as much implementation detail into the source file as possible without introducing overly many layers of abstraction. Much like with SROA, the design of this system is made somewhat more cumbersome by the need to support both pass managers without duplicating the significant state and logic of the pass. The same compromise is struck here. I've also left a FIXME in a doxygen comment as the GVN pass seems to have pretty woeful documentation within it. I'd like to submit this with the FIXME and let those more deeply familiar backfill the information here now that we have a nice place in an interface to put that kind of documentaiton. Differential Revision: http://reviews.llvm.org/D18019 llvm-svn: 263208
*	[NVPTX] Disable performance optimizations when OptLevel==None	Jingyue Wu	2016-02-04	1	-21/+36
\| \| \| \| \| \| \| \| \| \|	Reviewers: jholewinski, tra, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16874 llvm-svn: 259749
*	constify the Function parameter to the TTI creation callback and	Eric Christopher	2015-09-16	1	-1/+1
\| \| \| \| \| \|	propagate to all callers/users/etc. llvm-svn: 247864
*	[NVPTX] Added run NVVMReflect pass to NVPTX back-end.	Artem Belevich	2015-09-08	1	-0/+1
\| \| \| \| \| \| \| \| \|	The pass is needed to remove __nvvm_reflect calls when we link in libdevice bitcode that comes with CUDA. Differential Revision: http://reviews.llvm.org/D11663 llvm-svn: 247072
*	Roll forward r242871	Jingyue Wu	2015-07-29	1	-1/+0
\| \| \| \| \| \| \|	r242871 missed one place that should be guarded with isPhysicalReg. This patch fixes that. llvm-svn: 243555
*	Temporarily revert r242871	Jingyue Wu	2015-07-29	1	-0/+1
\| \| \| \| \| \|	PR24299 llvm-svn: 243522
*	[NVPTX] run LSR before straight-line optimizations	Jingyue Wu	2015-07-23	1	-5/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Straight-line optimizations can simplify the loop body and make LSR's cost analysis more precise. This significantly improves several Eigen3 CUDA benchmarks. With this change, EigenContractionKernel runs up to 40% faster (https://bitbucket.org/eigen/eigen/src/753ceee5f206ff7dde9f6a41a5a420749fc9406f/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-502). EigenConvolutionKernel2D runs up to 10% faster (https://bitbucket.org/eigen/eigen/src/753ceee5f206ff7dde9f6a41a5a420749fc9406f/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h?at=default#cl-605). I have some difficulties writing small tests that benefit from this reordering due to a seemingly issue with LSR (being discussed at http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088244.html). See the review thread for the compilation time impact of GVN. Reviewers: eliben, jholewinski Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11304 llvm-svn: 242982
*	[BranchFolding] do not iterate the aliases of virtual registers	Jingyue Wu	2015-07-22	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: MCRegAliasIterator only works for physical registers. So, do not run it on virtual registers. With this issue fixed, we can resurrect the BranchFolding pass in NVPTX backend. Reviewers: jholewinski, bkramer Subscribers: henryhu, meheff, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11174 llvm-svn: 242871
*	[NVPTX] enable SpeculativeExecution in NVPTX	Jingyue Wu	2015-07-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: SpeculativeExecution enables a series straight line optimizations (such as SLSR and NaryReassociate) on conditional code. For example, if (...) ... b * s ... if (...) ... (b + 1) * s ... speculative execution can hoist b * s and (b + 1) * s from then-blocks, so that we have ... b * s ... if (...) ... ... (b + 1) * s ... if (...) ... Then, SLSR can rewrite (b + 1) * s to (b * s + s) because after speculative execution b * s dominates (b + 1) * s. The performance impact of this change is significant. It speeds up the benchmarks running EigenFloatContractionKernelInternal16x16 (https://bitbucket.org/eigen/eigen/src/ba68f42fa69e4f43417fe1e52669d4dd5d2b3bee/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-526) by roughly 2%. Some internal benchmarks that have the above code pattern are improved by up to 40%. No significant slowdowns are observed on Eigen CUDA microbenchmarks. Reviewers: jholewinski, broune, eliben Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11201 llvm-svn: 242437
*	Correct lowering of memmove in NVPTX	Eli Bendersky	2015-07-16	1	-8/+10
\| \| \| \| \| \| \| \| \| \|	This fixes https://llvm.org/bugs/show_bug.cgi?id=24056 Also a bit of refactoring along the way. Differential Revision: http://reviews.llvm.org/D11220 llvm-svn: 242413
*	Make TargetTransformInfo keeping a reference to the Module DataLayout	Mehdi Amini	2015-07-09	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DataLayout is no longer optional. It was initialized with or without a DataLayout, and the DataLayout when supplied could have been the one from the TargetMachine. Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11021 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241774
*	[NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass	Jingyue Wu	2015-07-01	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Offset of frame index is calculated by NVPTXPrologEpilogPass. Before that the correct offset of stack objects cannot be obtained, which leads to wrong offset if there are more than 2 frame objects. This patch move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check VRFrame register instead, and try to remove the VRFrame if there is no usage after NVPTXPeephole pass. Patched by Xuetian Weng. Test Plan: Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the offset calculation based on SP and SPL. Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10853 llvm-svn: 241185
*	Add NVPTXPeephole pass to reduce unnecessary address cast	Jingyue Wu	2015-06-24	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch first change the register that holds local address for stack frame to %SPL. Then the new NVPTXPeephole pass will try to scan the following pattern %vreg0<def> = LEA_ADDRi64 <fi#0>, 4 %vreg1<def> = cvta_to_local %vreg0 and transform it into %vreg1<def> = LEA_ADDRi64 %VRFrameLocal, 4 Patched by Xuetian Weng Test Plan: test/CodeGen/NVPTX/local-stack-frame.ll Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: eliben, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10549 llvm-svn: 240587
*	Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address	Jingyue Wu	2015-06-17	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is done by first adding two additional instructions to convert the alloca returned address to local and convert it back to generic. Then replace all uses of alloca instruction with the converted generic address. Then we can rely NVPTXFavorNonGenericAddrSpace pass to combine the generic addresscast and the corresponding Load, Store, Bitcast, GEP Instruction together. Patched by Xuetian Weng (xweng@google.com). Test Plan: test/CodeGen/NVPTX/lower-alloca.ll Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: meheff, broune, eliben, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10483 llvm-svn: 239964