bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[NVPTX] Use different, convergent MIs for convergent calls.	Justin Lebar	2016-03-01	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Calls sometimes need to be convergent. This is already handled at the LLVM IR level, but it also needs to be handled at the MI level. Ideally we'd propagate convergence from instructions, down through the selection DAG, and into MIs. But this is Hard, and would affect optimizations in the SDNs -- right now only SDNs with two operands have any flags at all. Instead, here's a much simpler hack: Add new opcodes for NVPTX for convergent calls, and generate these when lowering convergent LLVM calls. Reviewers: jholewinski Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits Differential Revision: http://reviews.llvm.org/D17423 llvm-svn: 262373
*	Don't tail-duplicate blocks that contain convergent instructions.	Justin Lebar	2016-02-22	1	-0/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Convergent instrs shouldn't be made control-dependent on other values, but this is basically the whole point of tail duplication. So just bail if we see a convergent instruction. Reviewers: iteratee Subscribers: jholewinski, jhen, hfinkel, tra, jingyue, llvm-commits Differential Revision: http://reviews.llvm.org/D17320 llvm-svn: 261540
*	[NVPTX] Test that MachineSink won't sink across llvm.cuda.syncthreads.	Justin Lebar	2016-02-17	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The syncthreads MI is modeled as mayread/maywrite -- convergence doesn't even come into play here. Nonetheless this property is highly implicit in the tablegen files, so a test seems appropriate. Reviewers: jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D17319 llvm-svn: 261114
*	[NVPTX] Annotate call machine instructions as calls.	Justin Lebar	2016-02-17	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Otherwise we'll try to do unsafe optimizations on these MIs, such as sinking loads below calls. (I suspect that this is not the only bug in the NVPTX instruction tablegen files; I need to comb through them.) Reviewers: jholewinski, tra Subscribers: jingyue, jhen, llvm-commits Differential Revision: http://reviews.llvm.org/D17315 llvm-svn: 261113
*	llvm/test/CodeGen/NVPTX/debug-file-loc.ll: Tweak expressions for dos path.	NAKAMURA Takumi	2016-02-11	1	-2/+2
\| \| \| \|	llvm-svn: 260623
*	[NVPTX] emit .file directives for files referenced by subprograms.	Artem Belevich	2016-02-11	1	-0/+44
\| \| \| \| \| \| \| \|	.. so .loc directives referring to those files work correctly. Differential Revision: http://reviews.llvm.org/D17086 llvm-svn: 260557
*	[NVPTX] Disable performance optimizations when OptLevel==None	Jingyue Wu	2016-02-04	1	-0/+12
\| \| \| \| \| \| \| \| \| \|	Reviewers: jholewinski, tra, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16874 llvm-svn: 259749
*	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	Justin Lebar	2016-01-30	3	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Previously we'd just silently skip these. Reviewers: tra, jholewinski Subscribers: llvm-commits, jhen, echristo Differential Revision: http://reviews.llvm.org/D16739 llvm-svn: 259279
*	[CUDA] Die gracefully when trying to output an LLVM alias.	Justin Lebar	2016-01-23	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Previously, we would just output "foo = bar" in the assembly, and then ptxas would choke. Now we die before emitting any invalid code. Reviewers: echristo Subscribers: jholewinski, llvm-commits, jhen, tra Differential Revision: http://reviews.llvm.org/D16490 llvm-svn: 258638
*	[NVPTX] expand mul_lohi to mul_lo and mul_hi	Jingyue Wu	2016-01-22	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fixes PR26186. Reviewers: grosser, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16479 llvm-svn: 258536
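The idea behind splitting a combined low/high multiply into two separate multiplies can be illustrated outside LLVM. This is a minimal sketch, assuming 32-bit unsigned operands; `mulLo`, `mulHi`, and `mulLoHi` are hypothetical names chosen for illustration, not LLVM or PTX identifiers. It shows that the two halves, computed independently, carry the same information as the full 64-bit product.

```cpp
#include <cstdint>

// Low 32 bits of the 64-bit product (the result a mul.lo-style operation gives).
constexpr std::uint32_t mulLo(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint32_t>(static_cast<std::uint64_t>(a) * b);
}

// High 32 bits of the 64-bit product (the result a mul.hi-style operation gives).
constexpr std::uint32_t mulHi(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
}

// Recombining both halves reproduces the full product, which is why a
// combined low/high multiply can be replaced by the two separate ones.
constexpr std::uint64_t mulLoHi(std::uint32_t a, std::uint32_t b) {
  return (static_cast<std::uint64_t>(mulHi(a, b)) << 32) | mulLo(a, b);
}
```

For example, `mulHi(0xFFFFFFFFu, 2u)` is 1 and `mulLo(0xFFFFFFFFu, 2u)` is `0xFFFFFFFE`.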
*	[NVPTX] Do not emit .hidden or .protected directives as they are not allowed by PTX.	Artem Belevich	2016-01-15	1	-0/+16
\| \| \| \| \| \|	llvm-svn: 257961
*	Have a single way for creating unique value names.	Rafael Espindola	2015-11-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	We had two code paths. One would create names like "foo.1" and the other names like "foo1". For globals it is important to use "foo.1" to help C++ name demangling. For locals there is no strong reason to go one way or the other so I kept the most common mangling (foo1). llvm-svn: 253804
*	Revert "Change memcpy/memset/memmove to have dest and source alignments."	Pete Cooper	2015-11-19	1	-8/+8
\| \| \| \| \| \| \| \| \| \|	This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543
*	Change memcpy/memset/memmove to have dest and source alignments.	Pete Cooper	2015-11-18	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. 
Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511
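The sed substitution quoted in the commit message above can be tried out in isolation. The following is a minimal sketch using `std::regex`, assuming the pattern is a capture group followed by the old `i32 <align>, ` argument; `stripMemsetAlign` is a hypothetical helper name, not part of LLVM. It only handles `llvm.memset` call lines whose final arguments are the alignment and `i1 false`, mirroring the quoted example.

```cpp
#include <regex>
#include <string>

// Drops the trailing "i32 <align>, " argument from an llvm.memset call line
// and leaves everything else unchanged. Lines that do not match the pattern
// are returned as-is. Hypothetical helper for illustration only.
std::string stripMemsetAlign(const std::string &line) {
  static const std::regex pattern(
      R"((call.*llvm\.memset.*)i32 [0-9]*, i1 false\))");
  // "$1" re-emits everything captured before the alignment argument.
  return std::regex_replace(line, pattern, "$1i1 false)");
}
```

Because the capture group is greedy, only the last `i32 <n>, ` before `i1 false)` is removed, so the size argument survives.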
*	[NVPTX] Let NVPTX backend detect integer min and max patterns.	Bjarke Hammersholt Roune	2015-08-26	1	-0/+307
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support. Reviewers: jholewinski, meheff, jingyue Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D12377 llvm-svn: 246107
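As an illustration of the kind of idiom involved, integer min and max are commonly written as a compare followed by a select. The sketch below is an assumption about the general shape of such patterns, not a statement of exactly which forms the commit matches; the function names are hypothetical. It also shows why signedness matters: the same bit pattern gives different results under signed and unsigned comparison, so the two cases need distinct operations.

```cpp
#include <cstdint>

// Signed compare + select: the shape of a signed integer minimum.
constexpr std::int32_t sminPattern(std::int32_t a, std::int32_t b) {
  return a < b ? a : b;
}

// Unsigned compare + select: the shape of an unsigned integer minimum.
constexpr std::uint32_t uminPattern(std::uint32_t a, std::uint32_t b) {
  return a < b ? a : b;
}

// Signed compare + select with the operands swapped: a signed maximum.
constexpr std::int32_t smaxPattern(std::int32_t a, std::int32_t b) {
  return a > b ? a : b;
}
```

Here `sminPattern(-1, 1)` yields -1, while `uminPattern(0xFFFFFFFFu, 1u)`, which has the same bits in its first operand, yields 1.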
*	[NVPTX] Allow undef value as global initializer	Jingyue Wu	2015-08-22	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: __shared__ variable may now emit undef value as initializer, do not throw error on that. Test Plan: test/CodeGen/NVPTX/global-addrspace.ll Patch by Xuetian Weng Reviewers: jholewinski, tra, jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D12242 llvm-svn: 245785
*	Use 32-bit divides instead of 64-bit divides where possible.	Mark Heffernan	2015-08-11	1	-0/+80
\| \| \| \| \| \| \| \| \|	For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason is that many index computations are carried out in 64-bits but never actually exceed the capacity of a 32-bit word. llvm-svn: 244684
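One way such narrowing can work is a runtime check on the operands. The commit text does not spell out the mechanism, so the following is only a sketch under that assumption, with hypothetical names `fitsIn32` and `udivNarrow`. It assumes unsigned operands and a nonzero divisor.

```cpp
#include <cstdint>

// True when the upper 32 bits of v are all zero.
constexpr bool fitsIn32(std::uint64_t v) { return (v >> 32) == 0; }

// Divides n by d (d must be nonzero). When both operands fit in 32 bits,
// the division is done at 32-bit width; otherwise it falls back to the
// full 64-bit divide. OR-ing the operands lets one check cover both.
constexpr std::uint64_t udivNarrow(std::uint64_t n, std::uint64_t d) {
  return fitsIn32(n | d)
             ? static_cast<std::uint64_t>(static_cast<std::uint32_t>(n) /
                                          static_cast<std::uint32_t>(d))
             : n / d;
}
```

Both paths give the same quotient; the narrow path only pays off where 32-bit division is cheaper than 64-bit division, as the commit reports for NVPTX index computations.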
*	SelectionDAG: Prefer to combine multiplication with less uses for fma	Jingyue Wu	2015-08-11	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For example: s6 = s0*s5; s2 = s6*s6 + s6; ... s4 = s6*s3; We notice that it is possible for s2 to be folded to fma (s0, s5, fmul (s6 s6)). This only happens when Aggressive is true; otherwise the hasOneUse() check already prevents folding a multiplication that has more uses. Test Plan: test/CodeGen/NVPTX/fma-assoc.ll Patch by Xuetian Weng Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm Subscribers: arsenm, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D11855 llvm-svn: 244649
*	Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI	Jonathan Roelofs	2015-08-10	1	-1/+1
\| \| \| \| \| \| \|	I looked into adding a warning / error for this to FileCheck, but there doesn't seem to be a good way to avoid it triggering on the instances of it in RUN lines. llvm-svn: 244481
*	[NVPTX] Use LDG for pointer induction variables.	Bjarke Hammersholt Roune	2015-08-05	1	-1/+55
\| \| \| \| \| \| \| \|	More specifically, make NVPTXISelDAGToDAG able to emit cached loads (LDG) for pointer induction variables. Also fix latent bug where LDG was not restricted to kernel functions. I believe that this could not be triggered so far since we do not currently infer that a pointer is global outside a kernel function, and only loads of global pointers are considered for cached loads. llvm-svn: 244166
*	[NVPTX] allow register copy between float and int	Jingyue Wu	2015-08-01	1	-0/+224
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fixes PR24303. With Bruno's WIP (D11197) on PeepholeOptimizer, across-class register copying (e.g. i32 to f32) becomes possible. Enhance NVPTXInstrInfo::copyPhysReg to handle these cases. Reviewers: jholewinski Subscribers: eliben, jholewinski, llvm-commits, bruno Differential Revision: http://reviews.llvm.org/D11622 llvm-svn: 243839
*	[NVPTX] convert pointers in byval kernel arguments to global	Jingyue Wu	2015-07-31	1	-2/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For example, in struct S { int *x; int *y; }; __global__ void foo(S s) { int *b = s.y; // use b } "b" is guaranteed to point to global. NVPTX should emit ld.global/st.global for accessing "b". Reviewers: jholewinski Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11505 llvm-svn: 243790
*	Roll forward r242871	Jingyue Wu	2015-07-29	1	-0/+40
\| \| \| \| \| \| \|	r242871 missed one place that should be guarded with isPhysicalReg. This patch fixes that. llvm-svn: 243555
*	Temporarily revert r242871	Jingyue Wu	2015-07-29	1	-24/+0
\| \| \| \| \| \|	PR24299 llvm-svn: 243522
*	[BranchFolding] do not iterate the aliases of virtual registers	Jingyue Wu	2015-07-22	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: MCRegAliasIterator only works for physical registers. So, do not run it on virtual registers. With this issue fixed, we can resurrect the BranchFolding pass in NVPTX backend. Reviewers: jholewinski, bkramer Subscribers: henryhu, meheff, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11174 llvm-svn: 242871
*	[NVPTX] make load on global readonly memory to use ldg	Jingyue Wu	2015-07-20	1	-0/+210
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As described in [1], ld.global.nc may be used to load memory by nvcc when __restrict__ is used and the compiler can detect whether the read-only data cache is safe to use. This patch will try to check whether ldg is safe to use and use it to replace ld.global when possible. This change can improve the performance by 18~29% on affected kernels (ratt_kernel and rwdot_kernel) in S3D benchmark of shoc [2]. Patched by Xuetian Weng. [1] http://docs.nvidia.com/cuda/kepler-tuning-guide/#read-only-data-cache [2] https://github.com/vetter/shoc Test Plan: test/CodeGen/NVPTX/load-with-non-coherent-cache.ll Reviewers: jholewinski, jingyue Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D11314 llvm-svn: 242713
*	Use inbounds GEPs for memcpy and memset lowering	Eli Bendersky	2015-07-17	1	-5/+5
\| \| \| \| \| \|	Follow-up on discussion in http://reviews.llvm.org/D11220 llvm-svn: 242542
*	Correct lowering of memmove in NVPTX	Eli Bendersky	2015-07-16	1	-22/+96
\| \| \| \| \| \| \| \| \| \|	This fixes https://llvm.org/bugs/show_bug.cgi?id=24056 Also a bit of refactoring along the way. Differential Revision: http://reviews.llvm.org/D11220 llvm-svn: 242413
*	Actually support volatile memcpys in NVPTX lowering	Eli Bendersky	2015-07-10	1	-0/+13
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D11091 llvm-svn: 241914
*	[NVPTX] declare no vector registers	Jingyue Wu	2015-07-10	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without this patch, LoopVectorizer in certain cases (see loop-vectorize.ll) produces code with complex control flow which hurts later optimizations. Since NVPTX doesn't have vector registers in LLVM's sense (NVPTXTTI::getRegisterBitWidth(true) == 32), we for now declare no vector registers to effectively disable loop vectorization. Reviewers: jholewinski Subscribers: jingyue, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11089 llvm-svn: 241884
*	Add tests for the NVPTXLowerAggrCopies pass.	Eli Bendersky	2015-07-08	1	-0/+34
\| \| \| \| \| \| \|	Note: not testing memmove lowering for now, as it's broken [see https://llvm.org/bugs/show_bug.cgi?id=24056] llvm-svn: 241736
*	[NVPTX] expand extload/truncstore for vectors of floats	Jingyue Wu	2015-07-01	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: According to PTX ISA: For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type. So, the ISA does not support load with extending/store with truncation for floating-point numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating-point numbers, but vectors of floating-point numbers are not taken care of. As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to an extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like ld.v2.f32 {%fd3, %fd4}, [%rd2] This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated. Patched by Gang Hu. Test Plan: Test case attached. Reviewers: jingyue Reviewed By: jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D10876 llvm-svn: 241191
*	[NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass	Jingyue Wu	2015-07-01	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Offset of frame index is calculated by NVPTXPrologEpilogPass. Before that the correct offset of stack objects cannot be obtained, which leads to a wrong offset if there are more than 2 frame objects. This patch moves NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check the VRFrame register instead, and try to remove the VRFrame if it has no uses after the NVPTXPeephole pass. Patched by Xuetian Weng. Test Plan: Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the offset calculation based on SP and SPL. Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10853 llvm-svn: 241185
*	[NVPTX] Fix issue introduced in D10321	Jingyue Wu	2015-06-30	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Really check if %SP is not used in other places, instead of checking only for exactly one non-dbg use. Patched by Xuetian Weng. Test Plan: @foo4 in test/CodeGen/NVPTX/local-stack-frame.ll, create a case that SP will appear twice. Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: llvm-commits, sfantao, jholewinski Differential Revision: http://reviews.llvm.org/D10844 llvm-svn: 241099
*	Force relocation mode to be default, regardless of what is passed to the backend.	Samuel Antao	2015-06-30	1	-0/+15
\| \| \| \| \| \|	llvm-svn: 241081
*	[NVPTX] noop when kernel pointers are already global	Jingyue Wu	2015-06-26	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Some front ends make kernel pointers global already. In that case, handlePointerParams does nothing. Test Plan: more tests in lower-kernel-ptr-arg.ll Reviewers: grosser Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10779 llvm-svn: 240849
*	Add NVPTXPeephole pass to reduce unnecessary address cast	Jingyue Wu	2015-06-24	2	-6/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch first changes the register that holds the local address for the stack frame to %SPL. Then the new NVPTXPeephole pass will try to scan for the following pattern %vreg0<def> = LEA_ADDRi64 <fi#0>, 4 %vreg1<def> = cvta_to_local %vreg0 and transform it into %vreg1<def> = LEA_ADDRi64 %VRFrameLocal, 4 Patched by Xuetian Weng Test Plan: test/CodeGen/NVPTX/local-stack-frame.ll Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: eliben, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10549 llvm-svn: 240587
*	[NVPTX] Added missing test case for llvm.nvvm.sqrt.f NVPTX intrinsic	Artem Belevich	2015-06-23	1	-0/+2
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D10663 llvm-svn: 240437
*	Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address	Jingyue Wu	2015-06-17	2	-2/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is done by first adding two additional instructions to convert the address returned by alloca to local and convert it back to generic. Then replace all uses of the alloca instruction with the converted generic address. Then we can rely on the NVPTXFavorNonGenericAddrSpaces pass to combine the generic addrspacecast and the corresponding Load, Store, Bitcast, GEP Instruction together. Patched by Xuetian Weng (xweng@google.com). Test Plan: test/CodeGen/NVPTX/lower-alloca.ll Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: meheff, broune, eliben, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10483 llvm-svn: 239964
*	Reapply 239795 - [InstCombine] Propagate non-null facts to call parameters	Philip Reames	2015-06-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	The original change broke clang side tests. I will be submitting those momentarily. This change includes post commit feedback on the original change from Pete Cooper. Original Submission comments: If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129. Differential Revision: http://reviews.llvm.org/D9132 llvm-svn: 239849
*	Revert 239795	Philip Reames	2015-06-16	1	-1/+1
\| \| \| \| \| \|	I forgot to update some clang test cases. I'll fix and resubmit tomorrow. llvm-svn: 239800
*	[InstCombine] Propagate non-null facts to call parameters	Philip Reames	2015-06-16	1	-1/+1
\| \| \| \| \| \| \| \|	If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129. Differential Revision: http://reviews.llvm.org/D9132 llvm-svn: 239795
*	[NVPTX] fix a crash bug in NVPTXFavorNonGenericAddrSpaces	Jingyue Wu	2015-06-09	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We used to assume V->RAUW only modifies the operand list of V's user. However, if V and V's user are Constants, RAUW may replace and invalidate V's user entirely. This patch fixes the above issue by letting the caller replace the operand instead of calling RAUW on Constants. Test Plan: @nested_const_expr and @rauw in access-non-generic.ll Reviewers: broune, jholewinski Reviewed By: broune, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10345 llvm-svn: 239435
*	The constant initialization for globals in NVPTX is generated as an	Samuel Antao	2015-06-09	1	-0/+23
\| \| \| \| \| \| \| \| \| \|	array of bytes. The generation of these byte arrays was expecting the host to be little endian, which prevents big endian hosts from being used in the generation of the PTX code. This patch fixes the problem by changing the way the bytes are extracted so that it works for both little and big endian hosts. llvm-svn: 239412
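The commit message does not say how the bytes are extracted after the fix. One common host-independent technique is to shift and mask the value rather than reinterpret its in-memory representation; the sketch below illustrates that technique only, and `toLittleEndianBytes` is a hypothetical name. It assumes a 32-bit value whose bytes should be emitted least-significant first.

```cpp
#include <array>
#include <cstdint>

// Returns the bytes of v, least-significant byte first. Shifts and masks
// operate on the numeric value, not on how the host stores it in memory,
// so the result is identical on little- and big-endian hosts.
constexpr std::array<std::uint8_t, 4> toLittleEndianBytes(std::uint32_t v) {
  return {{static_cast<std::uint8_t>(v & 0xFF),
           static_cast<std::uint8_t>((v >> 8) & 0xFF),
           static_cast<std::uint8_t>((v >> 16) & 0xFF),
           static_cast<std::uint8_t>((v >> 24) & 0xFF)}};
}
```

By contrast, copying the object representation with `memcpy` would yield a host-dependent byte order, which is the kind of assumption the commit describes removing.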
*	[NVPTX] run SROA after NVPTXFavorNonGenericAddrSpaces	Jingyue Wu	2015-06-09	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This cleans up most allocas NVPTXLowerKernelArgs emits for byval parameters. Test Plan: makes bug21465.ll stronger to verify no redundant local load/store. Reviewers: eliben, jholewinski Reviewed By: eliben, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10322 llvm-svn: 239368
*	[NVPTX] roll forward r239082	Jingyue Wu	2015-06-04	6	-18/+44
\| \| \| \| \| \| \| \| \|	NVPTXISelDAGToDAG translates "addrspacecast to param" to NVPTX::nvvm_ptr_gen_to_param Added an llc test in bug21465. llvm-svn: 239100
*	Revert r239082	Jingyue Wu	2015-06-04	6	-40/+18
\| \| \| \| \| \|	llc crashed for NVPTX backend llvm-svn: 239094
*	[NVPTX] kernel pointer arguments point to the global address space	Jingyue Wu	2015-06-04	6	-18/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a pointer in the global address space. This change, along with NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.* and st.global.* for accessing kernel pointer arguments. Minor changes: 1. refactor: extract function convertToPointerInAddrSpace 2. fix a bug in the test case in bug21465.ll Test Plan: lower-kernel-ptr-arg.ll Reviewers: eliben, meheff, jholewinski Reviewed By: jholewinski Subscribers: wengxt, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10154 llvm-svn: 239082
*	[NVPTXFavorNonGenericAddrSpaces] recursively trace into GEP and BitCast	Jingyue Wu	2015-05-29	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch allows NVPTXFavorNonGenericAddrSpaces to remove addrspacecast from longer chains consisting of GEPs and BitCasts. For example, it can now optimize %0 = addrspacecast [10 x float] addrspace(3)* @a to [10 x float]* %1 = gep [10 x float]* %0, i64 0, i64 %i %2 = bitcast float* %1 to i32* %3 = load i32* %2 ; emits ld.u32 to %0 = gep [10 x float] addrspace(3)* @a, i64 0, i64 %i %1 = bitcast float addrspace(3)* %0 to i32 addrspace(3)* %3 = load i32 addrspace(3)* %1 ; emits ld.shared.f32 Test Plan: @ld_int_from_global_float in access-non-generic.ll Reviewers: broune, eliben, jholewinski, meheff Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10074 llvm-svn: 238574
*	[NVPTX] Handle addrspacecast constant expressions in aggregate initializers	Justin Holewinski	2015-04-28	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	We need to track if an AddrSpaceCast expression was seen when generating an MCExpr for a ConstantExpr. This change introduces a custom lowerConstant method to the NVPTX asm printer that will create NVPTXGenericMCSymbolRefExpr nodes at the appropriate places to encode the information that a given symbol needs to be casted to a generic address. llvm-svn: 236000