bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[Verifier] Remove the -verify-debug-info cl::opt	Vedant Kumar	2017-11-02	1	-1/+1
This cl::opt has been dead for a while. It's no longer possible to run the verifier without also verifying debug info. llvm-svn: 317288
*	Improve clamp recognition in ValueTracking.	Artur Gainullin	2017-10-27	1	-0/+9
Summary: ValueTracking did not recognize all variations of clamp. Handling of selects with swapped true and false values was added to fix this problem. The first patch was reverted because it caused a miscompile in the NVPTX target. Added corresponding test cases. Reviewers: spatel, majnemer, efriedma, reames Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D39240 llvm-svn: 316795
*	[NVPTX] allow address space inference for volatile loads/stores.	Artem Belevich	2017-10-24	1	-0/+97
If a particular target supports volatile memory access operations, we can avoid AS casting to generic AS. Currently it's only enabled in NVPTX for loads and stores that access global & shared AS. Differential Revision: https://reviews.llvm.org/D39026 llvm-svn: 316495
*	[NVPTX] Implemented wmma intrinsics and instructions.	Artem Belevich	2017-10-12	1	-0/+201
WMMA = "Warp Level Matrix Multiply-Accumulate". These are the new instructions introduced in PTX6.0 and available on sm_70 GPUs. Differential Revision: https://reviews.llvm.org/D38645 llvm-svn: 315601
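WMMA computes D = A × B + C on a matrix tile, with the work spread across the 32 threads of a warp. As a rough single-threaded reference for *what* is computed (not how the intrinsics or PTX express it), here is a sketch in C. The row-major layout, `float` elements, and the name `wmma_reference` are assumptions for illustration; the real instructions take f16 input fragments distributed across the warp and use fixed tile shapes (for example m16n16k16).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for one warp-level MMA tile:
 * D[m][n] = sum_k A[m][k] * B[k][n] + C[m][n], all matrices row-major. */
static void wmma_reference(size_t M, size_t N, size_t K,
                           const float *A, const float *B,
                           const float *C, float *D) {
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            float acc = C[m * N + n];            /* start from the accumulator */
            for (size_t k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            D[m * N + n] = acc;
        }
    }
}
```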
*	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.	Artem Belevich	2017-09-26	1	-0/+117
Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314223
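As a reading aid for `match.any.sync` and `match.all.sync`, the C sketch below simulates one warp as a 32-element array. It assumes the semantics described in the PTX ISA manual: match.any gives each lane the mask of lanes in `membermask` holding the same value as itself, and match.all gives `membermask` if every participating lane holds the same value, else 0 (with an optional predicate). The function names and the array model are hypothetical and ignore exited threads and the synchronization itself.

```c
#include <assert.h>
#include <stdint.h>

#define WARP_SIZE 32

/* Hypothetical model of match.any.sync.b32 as seen by one lane: the set of
 * lanes in membermask whose value equals this lane's value. */
static uint32_t match_any(const uint32_t vals[WARP_SIZE], uint32_t membermask,
                          int lane) {
    uint32_t result = 0;
    for (int j = 0; j < WARP_SIZE; ++j)
        if (((membermask >> j) & 1u) && vals[j] == vals[lane])
            result |= 1u << j;
    return result;
}

/* Hypothetical model of match.all.sync.b32: membermask if every lane in
 * membermask holds the same value, otherwise 0; *pred mirrors the
 * optional predicate output. */
static uint32_t match_all(const uint32_t vals[WARP_SIZE], uint32_t membermask,
                          int lane, int *pred) {
    for (int j = 0; j < WARP_SIZE; ++j) {
        if (((membermask >> j) & 1u) && vals[j] != vals[lane]) {
            *pred = 0;
            return 0;
        }
    }
    *pred = 1;
    return membermask;
}
```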
*	Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.", rL314135.	Justin Lebar	2017-09-25	1	-117/+0
Causing assertion failures on macos:

> Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"),
> function getOperand, file
> /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h,
> line 835.

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/ llvm-svn: 314142
*	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.	Artem Belevich	2017-09-25	1	-0/+117
Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314135
*	[NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins.	Artem Belevich	2017-09-21	2	-0/+97
Differential Revision: https://reviews.llvm.org/D38148 llvm-svn: 313898
*	[NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.	Artem Belevich	2017-09-20	1	-0/+94
Differential Revision: https://reviews.llvm.org/D38090 llvm-svn: 313820
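`shfl.sync` exchanges register values between lanes of a warp. The sketch below models two of its modes, `down` and `bfly`, over a 32-element array, under simplifying assumptions: every lane participates, the whole warp is one segment, and an out-of-range source lane yields the lane's own value. The function names are hypothetical, and the sketch ignores the real instruction's clamp/segment operand and `membermask`. The butterfly reduction at the end is a common usage pattern, shown only as an example.

```c
#include <assert.h>
#include <stdint.h>

#define WARP_SIZE 32

/* Hypothetical model of shfl.sync.down: lane i reads lane i + delta and
 * keeps its own value when the source lane is outside the warp. */
static void shfl_down(const uint32_t in[WARP_SIZE], uint32_t out[WARP_SIZE],
                      unsigned delta) {
    for (unsigned i = 0; i < WARP_SIZE; ++i) {
        unsigned src = i + delta;
        out[i] = src < WARP_SIZE ? in[src] : in[i];
    }
}

/* Hypothetical model of shfl.sync.bfly: lane i reads lane (i XOR mask). */
static void shfl_bfly(const uint32_t in[WARP_SIZE], uint32_t out[WARP_SIZE],
                      unsigned mask) {
    for (unsigned i = 0; i < WARP_SIZE; ++i)
        out[i] = in[(i ^ mask) % WARP_SIZE];
}

/* Warp-wide sum via butterfly exchanges: after log2(32) = 5 rounds every
 * lane holds the total; lane 0's copy is returned. */
static uint32_t warp_sum_bfly(const uint32_t in[WARP_SIZE]) {
    uint32_t cur[WARP_SIZE], tmp[WARP_SIZE];
    for (unsigned i = 0; i < WARP_SIZE; ++i) cur[i] = in[i];
    for (unsigned m = WARP_SIZE / 2; m > 0; m >>= 1) {
        shfl_bfly(cur, tmp, m);
        for (unsigned i = 0; i < WARP_SIZE; ++i) cur[i] += tmp[i];
    }
    return cur[0];
}
```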
*	[CUDA] Added rudimentary support for CUDA-9 and sm_70.	Artem Belevich	2017-09-07	1	-0/+5
For now CUDA-9 is not included in the list of CUDA versions clang searches for, so the path to CUDA-9 must be explicitly passed via --cuda-path=. On the LLVM side, NVPTX added the sm_70 GPU type, which bumps the required PTX version to 6.0 but is otherwise equivalent to sm_62 at the moment. Differential Revision: https://reviews.llvm.org/D37576 llvm-svn: 312734
*	Canonicalize the representation of empty an expression in DIGlobalVariableExpression	Adrian Prantl	2017-08-30	1	-2/+2
This change simplifies code that has to deal with DIGlobalVariableExpression and mirrors how we treat DIExpressions in debug info intrinsics. Before this change there were two ways of representing empty expressions on globals, a nullptr and an empty !DIExpression(). If someone needs to upgrade out-of-tree testcases: `perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>` will catch 95%. llvm-svn: 312144
*	[NVPTX] Add lowering of i128 params.	Artem Belevich	2017-07-20	3	-0/+93
The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers:

* in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])

[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) llvm-svn: 308675
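Lowering an i128 parameter "the same way as aggregates of size 16 bytes" means treating the value as 16 individual bytes in parameter space. The C sketch below shows that byte view using the GCC/Clang `unsigned __int128` extension (not standard C). The helper names are hypothetical, and least-significant-byte-first order is assumed, matching a little-endian target such as NVPTX; explicit shifts are used so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension */

/* Split a 128-bit integer into 16 bytes, least-significant byte first,
 * mirroring a ".param .b8 _[16]" view on a little-endian target. */
static void u128_to_bytes(u128 v, uint8_t out[16]) {
    for (int i = 0; i < 16; ++i)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Reassemble the 16 bytes into the original value. */
static u128 u128_from_bytes(const uint8_t in[16]) {
    u128 v = 0;
    for (int i = 15; i >= 0; --i)
        v = (v << 8) | in[i];
    return v;
}
```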
*	Extend memcpy expansion in Transform/Utils to handle wider operand types.	Sean Fertile	2017-07-07	1	-0/+61
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and match the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 llvm-svn: 307346
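The shape of such an expansion for an unknown-size copy can be sketched in C: a main loop moving chunks of a wider operand type, then a residual loop for the leftover bytes. This is a sketch in the spirit of the change, not the pass itself, which operates on IR. The 4-byte `uint32_t` loop operand is an assumption standing in for whatever the target's TTI callbacks would choose, and `memcpy_expanded` is a hypothetical name. With the default int8 operand type, the whole copy degenerates to the byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memcpy expansion with a 4-byte loop operand and a 1-byte
 * residual loop; dst and src must not overlap. */
static void memcpy_expanded(uint8_t *dst, const uint8_t *src, size_t len) {
    size_t wide_bytes = (len / sizeof(uint32_t)) * sizeof(uint32_t);
    size_t i = 0;
    for (; i < wide_bytes; i += sizeof(uint32_t)) {
        uint32_t w;
        /* fixed-size memcpy models one unaligned-safe wide load and store */
        memcpy(&w, src + i, sizeof w);
        memcpy(dst + i, &w, sizeof w);
    }
    for (; i < len; ++i)  /* residual bytes that don't fill a full word */
        dst[i] = src[i];
}
```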
*	Reverting r307326 because it breaks clang tests.	Michael Kuperstein	2017-07-06	3	-93/+0
llvm-svn: 307334
*	[NVPTX] Add lowering of i128 params.	Michael Kuperstein	2017-07-06	3	-0/+93
The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers:

* in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])

[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) llvm-svn: 307326
*	Recommit "r306541 - Add zero-length check to memcpy/memset load store loop expansion""	Teresa Johnson	2017-07-01	1	-0/+4
With fix for use-after-free errors. We can't add the new branch and remove the old one until we are done with the Builder constructed for the block. llvm-svn: 306937
*	Revert "r306541 - Add zero-length check to memcpy/memset load store loop expansion"	Daniel Jasper	2017-06-30	1	-4/+0
Segfaults in non-optimized builds. I'll get a stack trace and a reproducer to Teresa. llvm-svn: 306793
*	Add zero-length check to memcpy/memset load store loop expansion	Teresa Johnson	2017-06-28	1	-0/+4
Summary: I was testing using this expansion logic in other cases besides NVPTX, and found some runtime failures due to the lack of a check for a zero length memcpy/memset before the loop. There is already such a check in the memmove expansion code though. Reviewers: hfinkel Subscribers: jholewinski, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D34707 llvm-svn: 306541
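Why the check matters: a loop expansion of this kind is bottom-tested, so its body runs once before the length is ever compared. A C analogue makes the failure concrete. The function names are hypothetical, and the C loop is only an illustration of the IR shape: without the guard, a zero-length call still writes one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bottom-tested copy loop, like the IR expansion: the body executes at
 * least once, so len == 0 still copies one byte. */
static size_t copy_loop_unguarded(uint8_t *dst, const uint8_t *src, size_t len) {
    size_t i = 0;
    do {
        dst[i] = src[i];
        ++i;
    } while (i < len);
    return i;  /* number of bytes actually written */
}

/* The same loop with a zero-length check placed in front of it. */
static size_t copy_loop_guarded(uint8_t *dst, const uint8_t *src, size_t len) {
    if (len == 0)
        return 0;
    return copy_loop_unguarded(dst, src, len);
}
```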
*	Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB."	Hans Wennborg	2017-05-18	1	-1/+1
This also reverts follow-ups r303292 and r303298. It broke some Chromium tests under MSan, and apparently also internal tests at Google. llvm-svn: 303369
*	Only enable LiveRangeShrink for x86.	Dehao Chen	2017-05-17	2	-4/+4
Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for architectures with high register pressure. Reviewers: MatzeB, qcolombet Reviewed By: qcolombet Subscribers: jholewinski, jyknight, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33294 llvm-svn: 303292
*	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)	Simon Pilgrim	2017-05-15	9	-3616/+3618
Follow up to D33147. NVPTXTargetLowering::LowerCall was trusting the default argument values. Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33189 llvm-svn: 303082
*	[NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146)	Simon Pilgrim	2017-05-12	3	-4/+3
This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33147 llvm-svn: 302942
*	Add LiveRangeShrink pass to shrink live range within BB.	Dehao Chen	2017-05-12	3	-5/+5
Summary: LiveRangeShrink pass moves an instruction right after the definition within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB. Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb Reviewed By: MatzeB, andreadb Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D32563 llvm-svn: 302938
*	[SelectionDAG] Improve support for promotion of <1 x fX> floating point argument types (PR31088)	Simon Pilgrim	2017-05-02	1	-0/+15
PR31088 demonstrated that we were assuming that only integers require promotion from <1 x iX> types, when in fact float types may require it as well - in this case half floats. This patch adds support for extension/truncation for both integer and float types. Differential Revision: https://reviews.llvm.org/D32391 llvm-svn: 301910
*	Add address space mangling to lifetime intrinsics	Matt Arsenault	2017-04-10	1	-4/+4
In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876
*	[SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.	Jonas Paulsson	2017-03-16	1	-10/+3
Don't scalarize VSELECT->SETCC when operands/results need to be widened, or when the types of the SETCC operands are different from those of the VSELECT. (VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled. The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is no longer needed and has been removed. Updated tests:

* test/CodeGen/ARM/vuzp.ll
* test/CodeGen/NVPTX/f16x2-instructions.ll
* test/CodeGen/X86/2011-10-19-widen_vselect.ll
* test/CodeGen/X86/2011-10-21-widen-cmp.ll
* test/CodeGen/X86/psubus.ll
* test/CodeGen/X86/vselect-pcmp.ll

Review: Eli Friedman, Simon Pilgrim https://reviews.llvm.org/D29489 llvm-svn: 297930
*	[NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors.	Artem Belevich	2017-03-07	2	-0/+72
Differential Revision: https://reviews.llvm.org/D30672 llvm-svn: 297198
*	[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)	Taewook Oh	2017-03-02	1	-2/+2
Summary: Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', the SDLoc of the newly created SDValue 't5' follows the SDLoc of 't4', not 't1'. However, as the opcode of the newly created SDValue is 'setcc', it makes more sense to take the DebugLoc from 't1' than from 't4'. For the code below

```
extern int bar();
extern int baz();

int foo(int x, int y) {
  if (x != y)
    return bar();
  else
    return baz();
}
```

the following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:

```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
  tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
  tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
  %cmp = icmp ne i32 %x, %y, !dbg !14
  br i1 %cmp, label %if.then, label %if.else, !dbg !16

if.then:                                          ; preds = %entry
  %call = tail call i32 (...) @bar() #3, !dbg !17
  br label %return, !dbg !18

if.else:                                          ; preds = %entry
  %call1 = tail call i32 (...) @baz() #3, !dbg !19
  br label %return, !dbg !20

return:                                           ; preds = %if.else, %if.then
  %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
  ret i32 %retval.0, !dbg !21
}

!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)
```

As you can see, in the 'entry' block, the 'icmp' instruction and the 'br' instruction have different debug locations. However, with the current implementation, there's no distinction between the debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc', 'xor' and 'brcond' in SelectionDAG, where the SDLoc of 'setcc' follows the debug location of 'icmp' but the SDLoc of 'xor' and 'brcond' follows the debug location of the 'br' instruction, and the SDLoc of 'xor' overwrites the SDLoc of 'setcc' when they are folded. This patch addresses this issue. Reviewers: atrick, bogner, andreadb, craig.topper, aprantl Reviewed By: andreadb Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D29813 llvm-svn: 296825
*	[DAGCombiner] add missing folds for scalar select of {-1,0,1}	Sanjay Patel	2017-02-24	1	-1/+1
The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to: https://reviews.llvm.org/rL75531 https://reviews.llvm.org/rL159230 The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in this patch). I.e., the proposed folds in this patch already exist in InstCombine today: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151 As this patch shows, most targets generate better machine code for simple ext/add/not ops rather than a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization:

1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2. Have InstCombine canonicalize in the other direction (create more selects).

Differential Revision: https://reviews.llvm.org/D30180 llvm-svn: 296137
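The identities behind replacing "a select of constants" with "ext/add/not ops" can be checked directly in C, with an `int` that is 0 or 1 standing in for an i1 condition. The exact set of folds added by D30180 is not reproduced here; the four pairs below are illustrative identities of the kind the message describes, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* c is 0 or 1, modelling an i1 condition; results model i32 values. */
static int32_t sel_m1_0(int c)    { return c ? -1 : 0; }
static int32_t sext_c(int c)      { return -(int32_t)c; }         /* sext i1 c */

static int32_t sel_0_m1(int c)    { return c ? 0 : -1; }
static int32_t zext_add_m1(int c) { return (int32_t)c - 1; }      /* zext c, add -1 */

static int32_t sel_1_0(int c)     { return c ? 1 : 0; }
static int32_t zext_c(int c)      { return (int32_t)c; }          /* zext i1 c */

static int32_t sel_0_1(int c)     { return c ? 0 : 1; }
static int32_t zext_not_c(int c)  { return (int32_t)(c ^ 1); }    /* zext (not c) */
```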
*	[NVPTX] Added support for .f16x2 instructions.	Artem Belevich	2017-02-23	4	-36/+1629
This patch enables support for .f16x2 operations. Added new register type Float16x2. Added support for .f16x2 instructions. Added handling of vectorized loads/stores of v2f16 values. Differential Revision: https://reviews.llvm.org/D30057 Differential Revision: https://reviews.llvm.org/D30310 llvm-svn: 296032
*	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	Artem Belevich	2017-02-21	8	-36/+964
Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g. v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within. This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling the thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch. General algorithm:

* ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create a vectorization plan, which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan.

Differential Revision: https://reviews.llvm.org/D30011 llvm-svn: 295784
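The vectorization-plan step can be sketched in C. This is a simplified, hypothetical take on what the message attributes to VectorizePTXValueVTs(): given the flattened scalars' byte offsets and sizes, it tries to group 4 and then 2 adjacent same-sized elements into one access when the group is contiguous, its start offset is a multiple of the access size, and the access size does not exceed the parameter alignment. Group boundaries are marked with FIRST/LAST flags, and a scalar is a group of one. The flag names, function name, and exact rules are assumptions; the real code works on EVTs and handles more cases.

```c
#include <assert.h>
#include <stddef.h>

enum { PVF_INNER = 0, PVF_FIRST = 1, PVF_LAST = 2, PVF_SCALAR = PVF_FIRST | PVF_LAST };

/* Hypothetical sketch: fill flags[] with a vectorization plan for n
 * flattened scalars described by byte offsets and byte sizes. */
static void plan_vectorization(const size_t *offsets, const size_t *sizes,
                               size_t n, size_t param_align, int *flags) {
    static const size_t widths[] = {4, 2};  /* try the widest group first */
    size_t i = 0;
    while (i < n) {
        size_t group = 1;
        for (size_t w = 0; w < 2 && group == 1; ++w) {
            size_t num = widths[w];
            size_t access = sizes[i] * num;
            if (i + num > n || access > param_align || offsets[i] % access != 0)
                continue;
            int ok = 1;  /* same size and contiguous offsets for the group */
            for (size_t k = 1; k < num && ok; ++k)
                ok = sizes[i + k] == sizes[i] &&
                     offsets[i + k] == offsets[i] + k * sizes[i];
            if (ok)
                group = num;
        }
        if (group == 1) {
            flags[i++] = PVF_SCALAR;
            continue;
        }
        flags[i] = PVF_FIRST;
        for (size_t k = 1; k + 1 < group; ++k)
            flags[i + k] = PVF_INNER;
        flags[i + group - 1] = PVF_LAST;
        i += group;
    }
}
```

With five floats and 16-byte alignment, this plan is one 4-wide access plus one scalar, touching exactly five values rather than the eight the message cites for the old v5f32 handling.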
*	[NVPTX] Add tests that invariant vector loads get lowered to ld.global.nc.	Justin Lebar	2017-02-04	1	-0/+24
llvm-svn: 294082
*	[NVPTX] Enable combineRepeatedFPDivisors for NVPTX.	Justin Lebar	2017-02-03	1	-0/+44
Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D29477 llvm-svn: 294011
*	NVPTX: Fix not preserving volatile when expanding memset	Matt Arsenault	2017-02-02	1	-0/+13
llvm-svn: 293851
*	[NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).	Justin Lebar	2017-01-31	2	-5/+7
x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as desired. Verified that the particular nvptx approximate instructions here do in fact return 0 for x = 0. llvm-svn: 293713
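The zero case follows from IEEE-754 arithmetic: rsqrt(0) is +inf, so x*rsqrt(x) computes 0 * inf = NaN, while 1/rsqrt(x) computes 1/inf = 0. A small C check of this, with a hypothetical software `rsqrt_approx` (the classic bit-trick estimate refined by Newton-Raphson steps) standing in for the hardware rsqrt.approx instruction; it is not NVPTX's implementation.

```c
#include <assert.h>
#include <math.h>    /* INFINITY and isnan are macros; no libm call is made */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rsqrt.approx; assumes x >= 0. */
static float rsqrt_approx(float x) {
    if (x == 0.0f)
        return INFINITY;                    /* rsqrt(0) = +inf */
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);       /* initial estimate */
    float y;
    memcpy(&y, &bits, sizeof y);
    for (int i = 0; i < 3; ++i)
        y = y * (1.5f - 0.5f * x * y * y);  /* Newton-Raphson refinement */
    return y;
}

static float sqrt_via_mul(float x)   { return x * rsqrt_approx(x); }     /* 0 * inf = NaN */
static float sqrt_via_recip(float x) { return 1.0f / rsqrt_approx(x); }  /* 1 / inf = 0 */
```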
*	[DAGCombine] require UnsafeFPMath for re-association of addition	Nicolai Haehnle	2017-01-31	1	-5/+6
Summary: The affected transforms all implicitly use associativity of addition, for which we usually require unsafe math to be enabled. The "Aggressive" flag is only meant to convey information about the performance of the fused ops relative to a fmul+fadd sequence. Fixes Bug 31626. Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD Subscribers: jholewinski, nemanjai, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D28675 llvm-svn: 293635
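The reason re-association needs unsafe-math permission is that floating-point addition is not associative: regrouping a sum can change the result. A minimal C demonstration with values chosen for illustration, assuming IEEE-754 single precision and strict floating-point semantics (no `-ffast-math`). Here -1e20f + 1.0f rounds back to -1e20f, so the right-grouped sum loses the 1.0f entirely.

```c
#include <assert.h>

/* Same three operands, two groupings. */
static float sum_left(float a, float b, float c)  { return (a + b) + c; }
static float sum_right(float a, float b, float c) { return a + (b + c); }
```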
*	[NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate.	Justin Lebar	2017-01-31	3	-18/+219
Summary: This lets us lower to sqrt.approx and rsqrt.approx under more circumstances.

* Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32, when fast-math is enabled. Previously, we would only emit it for calls to @llvm.nvvm.sqrt.f. (With this patch we no longer emit sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on instcombine to simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.)
* Now we emit the ftz version of rsqrt.approx when ftz is enabled. Previously, we only emitted rsqrt.approx when ftz was disabled.

Reviewers: hfinkel Subscribers: llvm-commits, tra, jholewinski Differential Revision: https://reviews.llvm.org/D28508 llvm-svn: 293605
*	NVPTX: Move InferAddressSpaces to generic code	Matt Arsenault	2017-01-31	2	-3/+3
llvm-svn: 293579
*	NVPTX: Refactor NVPTXInferAddressSpaces to check TTI	Matt Arsenault	2017-01-30	1	-1/+2
Add a new TTI hook for getting the generic address space value. llvm-svn: 293563
*	[NVPTX] Add intrinsics to support named barriers.	Arpith Chacko Jacob	2017-01-28	1	-0/+40
Support for barrier synchronization between a subset of threads in a CTA through one of sixteen explicitly specified barriers. These intrinsics are not directly exposed in CUDA but are critical for forthcoming support of OpenMP on NVPTX GPUs. The intrinsics allow the synchronization of an arbitrary (multiple of 32) number of threads in a CTA at one of 16 distinct barriers. The two intrinsics added are as follows: `call void @llvm.nvvm.barrier.n(i32 10)` waits for all threads in a CTA to arrive at named barrier #10. `call void @llvm.nvvm.barrier(i32 15, i32 992)` waits for 992 threads in a CTA to arrive at barrier #15. Detailed descriptions of these intrinsics are available in the PTX manual. http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions Reviewers: hfinkel, jlebar Differential Revision: https://reviews.llvm.org/D17657 llvm-svn: 293384
*	Fix some broken CHECK lines.	Benjamin Kramer	2017-01-22	1	-1/+1
The colon is important. llvm-svn: 292761
*	[NVPTX] Add explicit check for llvm.sqrt.f32 to intrinsics.ll.	Justin Lebar	2017-01-21	1	-0/+8
Test-only change. llvm-svn: 292690
*	[NVPTX] Fix lowering of fp16 ISD::FNEG.	Artem Belevich	2017-01-19	1	-0/+15
There's no neg.f16 instruction, so negation has to be done via subtraction from zero. Differential Revision: https://reviews.llvm.org/D28876 llvm-svn: 292452
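Negation via subtraction from zero has one subtlety worth seeing concretely: the sign of zero. Under IEEE-754 round-to-nearest, +0 - (+0) is +0, whereas negating +0 should give -0; subtracting from -0 gets both zeros right. The C sketch below uses `float` as a stand-in for f16 (plain C has no portable half type) and makes no claim about which zero constant the NVPTX lowering actually uses; the function names are hypothetical.

```c
#include <assert.h>
#include <math.h>  /* signbit is a macro; no libm call is made */

static float neg_direct(float x)       { return -x; }
static float neg_sub_pos_zero(float x) { return 0.0f - x; }   /* +0 - x */
static float neg_sub_neg_zero(float x) { return -0.0f - x; }  /* -0 - x */
```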
*	[NVPTX] Support global variables of integer type larger than i64.	Justin Lebar	2017-01-18	1	-0/+9
Reviewers: tra, majnemer Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28825 llvm-svn: 292316
*	[NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.	Justin Lebar	2017-01-18	1	-7/+127
Summary: This change also lets us use max.{s,u}16. There's a vague warning in a test about this maybe being less efficient, but I could not come up with a case where the resulting SASS (sm_35 or sm_60) was different with or without max.{s,u}16. It's true that nvcc seems to emit only max.{s,u}32, but even ptxas 7.0 seems to have no problem generating efficient SASS from max.{s,u}16 (the casts up to i32 and back down to i16 seem to be implicit and nops, happening via register aliasing). In the absence of evidence, better to have fewer special cases, emit more straightforward code, etc. In particular, if a new GPU has 16-bit min/max instructions, we want to be able to use them. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28732 llvm-svn: 292304
*	[NVPTX] Lower integer absolute value idiom to abs instruction.	Justin Lebar	2017-01-18	1	-0/+31
Summary: Previously we lowered it literally, to shifts and xors. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28722 llvm-svn: 292303
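The shift-and-xor form of the integer absolute value idiom, which was previously emitted literally, looks like this in C. The sketch computes in `uint32_t` to avoid signed-overflow undefined behaviour for INT32_MIN, and it assumes `>>` on a negative `int32_t` is an arithmetic shift (implementation-defined in C, but what GCC and Clang do). Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free |x|: mask is all ones for negative x and zero otherwise, so
 * (x ^ mask) - mask is either ~x + 1 (that is, -x) or x itself. */
static uint32_t abs_shift_xor(int32_t x) {
    uint32_t mask = (uint32_t)(x >> 31);  /* assumes arithmetic shift */
    return ((uint32_t)x ^ mask) - mask;
}

/* Reference version with a branch, also in unsigned arithmetic. */
static uint32_t abs_branch(int32_t x) {
    return x < 0 ? 0u - (uint32_t)x : (uint32_t)x;
}
```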
*	[NVPTX] Improve lowering of llvm.ctpop.	Justin Lebar	2017-01-18	1	-0/+54
Summary: Avoid an unnecessary conversion operation when using the result of ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction we run returns an i32. (Previously if we used the value as an i32, we'd do an unnecessary zext+trunc.) Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28721 llvm-svn: 292302
*	[NVPTX] Add lowering for llvm.bitreverse.	Justin Lebar	2017-01-18	1	-0/+16
Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28720 llvm-svn: 292301
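Without a dedicated instruction, reversing the bits of a 32-bit value takes a sequence of mask-and-shift swaps; PTX provides `brev.b32` and `brev.b64` for this. For comparison, here is a C version of the classic swap-based approach. It is a sketch of the general technique, not the exact sequence LLVM's generic expansion produces.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bits of x by swapping progressively larger groups:
 * adjacent bits, bit pairs, nibbles, bytes, then 16-bit halves. */
static uint32_t bitreverse32(uint32_t x) {
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}
```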
*	[NVPTX] Fix function names in ctlz.ll test. Test-only change.	Justin Lebar	2017-01-18	1	-20/+20
Looks like a copy/paste mistake, all the functions in ctlz.ll were named "ctpop". llvm-svn: 292300
*	[NVPTX] Improve lowering of llvm.ctlz.	Justin Lebar	2017-01-18	1	-18/+106
Summary:

* Disable "ctlz speculation", which inserts a branch on every ctlz(x) which has defined behavior on x == 0 to check whether x is, in fact, zero.
* Add DAG patterns that avoid re-truncating or re-expanding the result of the 16- and 64-bit ctz instructions.

Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28719 llvm-svn: 292299
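In C terms, "ctlz speculation" is the guard below. When ctlz(0) must be defined (returning the bit width), a branch tests for zero before using a count that is assumed to be valid only for nonzero input. If the hardware count is already defined for zero, the branch is pure overhead; PTX documents the `clz.b32` result as ranging from 0 to 32 inclusive. Both function names are hypothetical, and the loop-based count is only a stand-in for the hardware instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a hardware clz that is defined for zero (returns 32). */
static int clz32_total(uint32_t x) {
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
        ++n;
    return n;
}

/* What speculation produces: an explicit zero check guarding the count,
 * as if the underlying operation were undefined for x == 0. */
static int clz32_speculated(uint32_t x) {
    if (x == 0)
        return 32;
    return clz32_total(x);
}
```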