bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[SDAG] Remove the reliance on MI's allocation strategy for	Chandler Carruth	2018-08-14	1	-5/+3
	`MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be surprised at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740
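The single-pointer optimization described in this message can be illustrated with a minimal, self-contained sketch. This is not the code from the commit: `MemOperand` and `MemRefList` are hypothetical stand-ins, and a plain heap array replaces whatever allocator the `SelectionDAG` actually uses. The sketch only shows the idea of storing one pointer inline and allocating an array only when two or more pointers are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a memory operand; not the LLVM class.
struct MemOperand {
  int Id;
};

// Hypothetical container: a single pointer is stored inline, and an
// array is allocated only when there are two or more pointers.
class MemRefList {
  MemOperand *Single = nullptr;          // used when Count == 1
  std::unique_ptr<MemOperand *[]> Many;  // used when Count > 1
  std::size_t Count = 0;

public:
  void assign(MemOperand *const *Refs, std::size_t N) {
    Many.reset();
    Single = nullptr;
    Count = N;
    if (N == 1) {
      Single = Refs[0];  // common case: no allocation at all
      return;
    }
    if (N > 1) {
      Many = std::make_unique<MemOperand *[]>(N);
      for (std::size_t I = 0; I != N; ++I)
        Many[I] = Refs[I];
    }
  }

  std::size_t size() const { return Count; }
  bool usesHeap() const { return Many != nullptr; }

  MemOperand *operator[](std::size_t I) const {
    assert(I < Count && "index out of range");
    return Count == 1 ? Single : Many[I];
  }
};
```

Assigning a one-element list leaves `usesHeap()` false, which is the case the message expects to dominate in practice.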
*	AMDGPU: Use splat vectors for undefs when folding canonicalize	Matt Arsenault	2018-08-12	1	-5/+20
	If one of the elements is undef, use the canonicalized constant from the other element instead of 0. Splat vectors are more useful for other optimizations, such as matching vector clamps. This was breaking on clamps of half3 from the undef 4th component. llvm-svn: 339512
*	AMDGPU: Fix packing undef parts of build_vector	Matt Arsenault	2018-08-12	1	-4/+13
	llvm-svn: 339511
*	AMDGPU: More canonicalized operations	Matt Arsenault	2018-08-10	1	-0/+7
	llvm-svn: 339464
*	AMDGPU: Combine and of seto/setuo and fp_class	Matt Arsenault	2018-08-10	1	-0/+23
	Clear the nan (or non-nan) test bits from the mask. llvm-svn: 339462
*	AMDGPU: Match isfinite pattern to class instructions	Matt Arsenault	2018-08-10	1	-3/+13
	llvm-svn: 339460
*	AMDGPU: Add LLVM_FALLTHROUGH	Matt Arsenault	2018-08-10	1	-0/+2
	llvm-svn: 339458
*	AMDGPU: Error more gracefully on libcalls	Matt Arsenault	2018-08-08	1	-0/+3
	I think this is the only situation where the callsite will have a null instruction. llvm-svn: 339271
*	AMDGPU: Fix shifts for i128	Matt Arsenault	2018-08-08	1	-0/+4
	llvm-svn: 339270
*	AMDGPU: cvt_pk_rtz_f16 canonicalizes	Matt Arsenault	2018-08-06	1	-0/+13
	llvm-svn: 339078
*	AMDGPU: Handle some vector operations in isCanonicalized	Matt Arsenault	2018-08-06	1	-0/+20
	llvm-svn: 339077
*	AMDGPU: Push fcanonicalize through partially constant build_vector	Matt Arsenault	2018-08-06	1	-1/+37
	This usually avoids some re-packing code, and may help find canonical sources. llvm-svn: 339072
*	AMDGPU: Refactor fcanonicalize combine	Matt Arsenault	2018-08-06	1	-36/+30
	This will make more complex combines easier. llvm-svn: 339070
*	AMDGPU: Treat more custom operations as canonicalizing	Matt Arsenault	2018-08-06	1	-1/+17
	Everything should quiet, and I think everything should flush. I assume the min3/med3/max3 follow the same rules as regular min/max for flushing, which should at least be conservatively correct. There are still more operations that need to be handled. llvm-svn: 339065
*	AMDGPU: Conversions always produce canonical results	Matt Arsenault	2018-08-06	1	-7/+2
	Not sure why this was checking for denormals for f16. My interpretation of the IEEE standard is conversions should produce a canonical result, and the ISA manual says denormals are created when appropriate. llvm-svn: 339064
*	AMDGPU: Fix implementation of isCanonicalized	Matt Arsenault	2018-08-06	1	-46/+73
	If denormals are enabled, denormals are canonical. Also fix a few other issues. minnum/maxnum are supposed to canonicalize. Temporarily improve workaround for the instruction behavior change in gfx9. Handle selects and fcopysign. The tests were also largely broken, since they were checking for a flush used on some targets after the store of the result. llvm-svn: 339061
*	DAG: Enhance isKnownNeverNaN	Matt Arsenault	2018-08-03	1	-9/+2
	Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits. llvm-svn: 338910
*	[AMDGPU] Minor change to d16 buffer load implementation	Tim Renouf	2018-08-02	1	-17/+7
	Summary: By not reconstructing the operand list of the SDNode, this change makes it easier to add the forthcoming new tbuffer and buffer intrinsics. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49995 Change-Id: I0cb79ef0801532645d7dd954a6d7355139db7b38 llvm-svn: 338784
*	AMDGPU: Fix scalarizing v4f16 fcanonicalize	Matt Arsenault	2018-08-02	1	-0/+2
	llvm-svn: 338714
*	AMDGPU: Improve hack for packing conversion ops	Matt Arsenault	2018-08-01	1	-2/+10
	Mutate the node type during selection when it doesn't matter. This avoids an intermediate bitcast node on targets with legal i16/f16. Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16, which I assume are OK. llvm-svn: 338619
*	AMDGPU: Partially fix handling of packed amdgpu_ps arguments	Matt Arsenault	2018-08-01	1	-62/+9
	Fixes annoying limitations when writing tests. Also remove more leftover code for manually scalarizing arguments and return values. llvm-svn: 338618
*	[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero	Ryan Taylor	2018-08-01	1	-2/+17
	Summary: Add _L to _LZ image intrinsic table mapping to TableGen. In ISelLowering, check whether the image intrinsic has an lod and whether it is equal to zero; if so, remove the lod and change the opcode to the equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523
*	AMDGPU: Add clamp bit to dot intrinsics	Konstantin Zhuravlyov	2018-08-01	1	-3/+6
	Differential Revision: https://reviews.llvm.org/D49874 llvm-svn: 338470
*	AMDGPU: Break 64-bit arguments into 32-bit pieces	Matt Arsenault	2018-07-31	1	-3/+16
	llvm-svn: 338421
*	AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls	Matt Arsenault	2018-07-31	1	-2/+24
	This improves code for the same reasons as scalarizing 32-bit element vectors. llvm-svn: 338418
*	AMDGPU: Scalarize vector argument types to calls	Matt Arsenault	2018-07-31	1	-31/+15
	When lowering calling conventions, prefer to decompose vectors into the constituent register types. This avoids artificial constraints to satisfy a wide super-register. This improves code quality because now optimizations don't need to deal with the super-register constraint. For example, the immediate folding code doesn't deal with 4 component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work. This also avoids the need for the shader input processing code to manually split vector types. llvm-svn: 338416
*	AMDGPU: Don't handle FP16_TO_FP in isCanonicalized	Matt Arsenault	2018-07-31	1	-4/+0
	This needs more special handling to do correctly. Fixes test in subsequent commit. llvm-svn: 338381
*	AMDGPU: Fold undef fcanonicalize to qNaN	Matt Arsenault	2018-07-31	1	-2/+10
	We could choose a free 0 for this, but this matches the behavior for fmul undef, 1.0. Also, the NaN use is more useful for folding use operations although if it's not eliminated it is more expensive in terms of code size. llvm-svn: 338376
*	AMDGPU: Stop wasting argument registers with v3i32/v3f32	Matt Arsenault	2018-07-28	1	-0/+46
	SelectionDAGBuilder widens v3i32/v3f32 arguments to v4i32/v4f32, which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines. llvm-svn: 338197
*	Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"	Matt Arsenault	2018-07-20	1	-6/+3
	Reverts r337079 with fix for msan error. llvm-svn: 337535
*	[AMDGPU] [AMDGPU] Support a fdot2 pattern.	Farhana Aleen	2018-07-16	1	-0/+79
	Summary: Optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198
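The arithmetic shape of the pattern above can be sketched in plain C++. This is only an illustration under stated assumptions: `Vec2` is a hypothetical stand-in that holds `float` values in place of a `v2f16`, and `dot2Reference` is a straightforward reference formula, not the hardware instruction, whose exact rounding behavior is not described in this log.

```cpp
#include <cmath>

// Hypothetical stand-in for a v2f16 value; components are held as float
// after the (float) conversions shown in the pattern.
struct Vec2 {
  float x, y;
};

// The nested-fma form that the combine looks for.
float nestedFma(Vec2 s0, Vec2 s1, float z) {
  return std::fma(s0.x, s1.x, std::fma(s0.y, s1.y, z));
}

// Reference two-element dot product with accumulator; rounding may differ
// from the fused form for inputs that are not exactly representable.
float dot2Reference(Vec2 s0, Vec2 s1, float z) {
  return s0.x * s1.x + s0.y * s1.y + z;
}
```

For inputs whose products and sums are exactly representable, both functions return the same value, which is the equivalence the pattern relies on.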
*	Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"	Evgeniy Stepanov	2018-07-14	1	-3/+6
	This reverts commit r337021.
	WARNING: MemorySanitizer: use-of-uninitialized-value
	#0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
	#1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
	#2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
	#3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
	#4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
	#5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
	#6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const, llvm::raw_ostream&, char const) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
	#7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
	#8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26
	[...]
	Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
	#0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192
	llvm-svn: 337079
*	AMDGPU: Properly handle shader inputs with split arguments	Matt Arsenault	2018-07-13	1	-12/+27
	This needs to refer to arguments by their original argument index, not the argument split index which depends on what the type splitting decides to do. Also avoid incrementing PSInputNum for each split piece. llvm-svn: 337022
*	AMDGPU: Fix handling of alignment padding in DAG argument lowering	Matt Arsenault	2018-07-13	1	-6/+3
	This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021
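The kind of padding computation this message refers to can be sketched as follows. This is a generic illustration, not the code from the commit: `ArgType`, `alignUp`, and `computeArgOffsets` are hypothetical names, and the sketch assumes each argument's size and alignment have already been derived from its original (unsplit) type.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one argument, taken from its original type
// rather than from any legalized register pieces.
struct ArgType {
  std::uint64_t Size;   // bytes occupied by the argument
  std::uint64_t Align;  // required alignment in bytes (power of two)
};

// Round Value up to the next multiple of Align.
std::uint64_t alignUp(std::uint64_t Value, std::uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Value + Align - 1) & ~(Align - 1);
}

// Compute the byte offset of each argument, inserting alignment padding
// before an argument whenever the running offset is not suitably aligned.
std::vector<std::uint64_t> computeArgOffsets(const std::vector<ArgType> &Args) {
  std::vector<std::uint64_t> Offsets;
  std::uint64_t Offset = 0;
  for (const ArgType &A : Args) {
    Offset = alignUp(Offset, A.Align);
    Offsets.push_back(Offset);
    Offset += A.Size;
  }
  return Offsets;
}
```

With a 1-byte argument followed by an 8-byte-aligned one, the second argument lands at offset 8 rather than 1, and arguments after it stay correct because the running offset accounts for the padding.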
*	AMDGPU: Refactor Subtarget classes	Tom Stellard	2018-07-11	1	-29/+29
	Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
*	AMDGPU: Separate R600 and GCN TableGen files	Tom Stellard	2018-06-28	1	-11/+57
	Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
*	AMDGPU: Remove MFI::ABIArgOffset	Matt Arsenault	2018-06-28	1	-3/+2
	We have too many mechanisms for tracking the various offsets used for kernel arguments, so remove one. There's still a lot of confusion with these because there are two different "implicit" argument areas located at the beginning and end of the kernarg segment. Additionally, the offset was determined based on the memory size of the split element types. This would break in a future commit where v3i32 is decomposed into separate i32 pieces. llvm-svn: 335830
*	AMDGPU: Error on calls from graphics shaders	Matt Arsenault	2018-06-28	1	-0/+7
	In principle nothing should stop these from working, but work is necessary to create an ABI for dealing with the stack related registers. llvm-svn: 335829
*	[AMDGPU] Convert rcp to rcp_iflag	Stanislav Mekhanoshin	2018-06-27	1	-1/+21
	If the source of an rcp instruction is the result of any conversion from an integer, convert it into an rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742
*	[AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic	Stanislav Mekhanoshin	2018-06-26	1	-0/+3
	This intrinsic selects v_mad_f32 regardless of fp32 denorm support. Differential Revision: https://reviews.llvm.org/D48573 llvm-svn: 335654
*	AMDGPU: Remove commented out code	Matt Arsenault	2018-06-25	1	-2/+0
	llvm-svn: 335486
*	AMDGPU: Remove old-style image intrinsics	Nicolai Haehnle	2018-06-21	1	-303/+0
	Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 llvm-svn: 335231
*	AMDGPU: Select MIMG instructions manually in SITargetLowering	Nicolai Haehnle	2018-06-21	1	-28/+249
	Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually. Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand. This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage. Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6 Reviewers: arsenm, rampitec, rtaylor, tstellar Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48017 llvm-svn: 335228
*	AMDGPU: Refactor MIMG instruction TableGen using generic tables	Nicolai Haehnle	2018-06-21	1	-3/+1
	Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural. This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes. Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48016 llvm-svn: 335227
*	AMDGPU: Use generic tables instead of SearchableTable	Nicolai Haehnle	2018-06-21	1	-3/+3
	Summary: Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48014 Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641 llvm-svn: 335226
*	AMDGPU: Turn D16 for MIMG instructions into a regular operand	Nicolai Haehnle	2018-06-21	1	-2/+9
	Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies. We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instruction loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use. We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion. The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which chooses the correct variant for other MIMG instructions is extended to cover gather4 as well. As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction). While we're at it, delete a whole bunch of dead legacy TableGen code. Change-Id: I89b02c2841c06f95e662541433e597f5d4553978 Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47434 llvm-svn: 335222
*	[AMDGPU] setcc (select cc, CT, CF), CF, eq \| ne -> xor cc, -1 \| cc	Stanislav Mekhanoshin	2018-06-16	1	-17/+43
	This is the common case in the BE when we serialize a condition and then rematerialize it. Use either the original or the inverted condition. Differential Revision: https://reviews.llvm.org/D48246 llvm-svn: 334882
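The fold named in the subject line can be checked with a small truth-table sketch. This is an illustration only: the function names are hypothetical, the condition is modeled as a C++ `bool` in place of an i1, and the sketch assumes CT and CF are distinct values, since the folded forms are only equivalent in that case.

```cpp
#include <cassert>

// select cc, CT, CF
int selectValue(bool cc, int ct, int cf) { return cc ? ct : cf; }

// Unfolded forms: compare the selected value against CF.
bool unfoldedEq(bool cc, int ct, int cf) {
  assert(ct != cf && "sketch assumes distinct select operands");
  return selectValue(cc, ct, cf) == cf;
}

bool unfoldedNe(bool cc, int ct, int cf) {
  assert(ct != cf && "sketch assumes distinct select operands");
  return selectValue(cc, ct, cf) != cf;
}

// Folded forms: "eq" becomes the inverted condition (xor cc, -1 on an i1),
// and "ne" becomes the original condition.
bool foldedEq(bool cc) { return !cc; }
bool foldedNe(bool cc) { return cc; }
```

For both values of the condition, the folded and unfolded forms agree, so the select and the compare can be dropped in favor of the condition itself or its inversion.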
*	AMDGPU: Add combine for short vector extract_vector_elts	Matt Arsenault	2018-06-15	1	-1/+42
	Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. llvm-svn: 334836
*	AMDGPU: Make v4i16/v4f16 legal	Matt Arsenault	2018-06-15	1	-51/+155
	Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835
*	AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering	Tom Stellard	2018-06-13	1	-0/+66
	Summary: The code that handles ISD::Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway so we can safely move it to SITargetLowering. Reviewers: alex-t, arsenm, nhaehnle Reviewed By: arsenm Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46298 llvm-svn: 334607