path: root/llvm/lib/CodeGen
Commit message | Author | Age | Files | Lines
* [LTO] Add option to emit assembly from LTOCodeGenerator | Tobias Edler von Koch | 2015-11-19 | 1 | -7/+8

  This adds a new API, LTOCodeGenerator::setFileType, to choose the output
  file format for LTO CodeGen. A corresponding change to use this new API
  from llvm-lto and a test case is coming in a separate commit.

  Differential Revision: http://reviews.llvm.org/D14554

  llvm-svn: 253622
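  For illustration only (not part of the original commit), a minimal C++
  sketch of how a driver such as llvm-lto might use the new API; apart from
  setFileType itself, the surrounding names and the enum spelling are
  assumptions:

    LTOCodeGenerator CodeGen(Context);
    CodeGen.addModule(Mod); // merged module to compile
    // Emit an assembly (.s) file instead of the default object file:
    CodeGen.setFileType(TargetMachine::CGFT_AssemblyFile);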
* [WinEH] Disable most forms of demotion | Reid Kleckner | 2015-11-19 | 1 | -117/+5

  Now that the register allocator knows about the barriers on funclet entry
  and exit, testing has shown that this is unnecessary. We still demote PHIs
  on unsplittable blocks due to the differences between the IR CFG and the
  Machine CFG.

  llvm-svn: 253619
* Expand subregisters in MachineFrameInfo::getPristineRegs | Krzysztof Parzyszek | 2015-11-19 | 1 | -4/+3

  http://reviews.llvm.org/D14719

  llvm-svn: 253600
* [CGP] despeculate expensive cttz/ctlz intrinsics | Sanjay Patel | 2015-11-19 | 1 | -0/+84

  This is another step towards allowing SimplifyCFG to speculate harder, but
  then have CGP clean things up if the target doesn't like it.

  Previous patches in this series:
  http://reviews.llvm.org/D12882
  http://reviews.llvm.org/D13297

  D13297 should catch most expensive ops, but speculation of cttz/ctlz
  requires special handling because of weirdness in the intrinsic definition
  for handling a zero input (that definition can probably be blamed on x86).

  For example, if we have the usual speculated-by-select expensive op
  pattern like this:

    %tobool = icmp eq i64 %A, 0
    %0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true)   ; is_zero_undef == true
    %cond = select i1 %tobool, i64 64, i64 %0
    ret i64 %cond

  There's an instcombine that will turn it into:

    %0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 false)  ; is_zero_undef == false

  This CGP patch is looking for that case and despeculating it back into:

  entry:
    %tobool = icmp eq i64 %A, 0
    br i1 %tobool, label %cond.end, label %cond.true

  cond.true:
    %0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true)   ; is_zero_undef == true
    br label %cond.end

  cond.end:
    %cond = phi i64 [ %0, %cond.true ], [ 64, %entry ]
    ret i64 %cond

  This unfortunately may lead to poorer codegen (see the changes in the
  existing x86 test), but if we increase speculation in SimplifyCFG (the
  next step in this patch series), then we should avoid those kinds of
  cases in the first place.

  The need for this patch was originally mentioned here:
  http://reviews.llvm.org/D7506
  with follow-up here:
  http://reviews.llvm.org/D7554

  Differential Revision: http://reviews.llvm.org/D14630

  llvm-svn: 253573
* X86: More efficient legalization of wide integer compares | Hans Wennborg | 2015-11-19 | 5 | -1/+79

  In particular, this makes the code for 64-bit compares on 32-bit targets
  much more efficient.

  Example:

    define i32 @test_slt(i64 %a, i64 %b) {
    entry:
      %cmp = icmp slt i64 %a, %b
      br i1 %cmp, label %bb1, label %bb2
    bb1:
      ret i32 1
    bb2:
      ret i32 2
    }

  Before this patch:

    test_slt:
      movl 4(%esp), %eax
      movl 8(%esp), %ecx
      cmpl 12(%esp), %eax
      setae %al
      cmpl 16(%esp), %ecx
      setge %cl
      je .LBB2_2
      movb %cl, %al
    .LBB2_2:
      testb %al, %al
      jne .LBB2_4
      movl $1, %eax
      retl
    .LBB2_4:
      movl $2, %eax
      retl

  After this patch:

    test_slt:
      movl 4(%esp), %eax
      movl 8(%esp), %ecx
      cmpl 12(%esp), %eax
      sbbl 16(%esp), %ecx
      jge .LBB1_2
      movl $1, %eax
      retl
    .LBB1_2:
      movl $2, %eax
      retl

  Differential Revision: http://reviews.llvm.org/D14496

  llvm-svn: 253572
* Revert "Change memcpy/memset/memmove to have dest and source alignments."Pete Cooper2015-11-192-36/+32
| | | | | | | | | | This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543
* Change memcpy/memset/memmove to have dest and source alignments. | Pete Cooper | 2015-11-18 | 2 | -32/+36

  Note, this was reviewed (and more details are in)
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

  These intrinsics currently have an explicit alignment argument which is
  required to be a constant integer. It represents the alignment of the
  source and dest, and so must be the minimum of those.

  This change allows source and dest to each have their own alignments by
  using the alignment attribute on their arguments. The alignment argument
  itself is removed.

  There are a few places in the code for which the code needs to be checked
  by an expert as to whether using only src/dest alignment is safe. For
  those places, they currently take the minimum of src/dest alignments,
  which matches the current behaviour.

  For example, code which used to read:

    call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)

  will now read:

    call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

  For out of tree owners, I was able to strip alignment from calls using sed
  by replacing:

    (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)

  with:

    $1i1 false)

  and similarly for memmove and memcpy. I then added back in alignment to
  test cases which needed it.

  A similar commit will be made to clang which actually has many differences
  in alignment as now IRBuilder can generate different source/dest
  alignments on calls.

  In IRBuilder itself, a new argument was added. Instead of calling:

    CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)

  you now call:

    CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

  There is a temporary class (IntegerAlignment) which takes the source
  alignment and rejects implicit conversion from bool. This is to prevent
  isVolatile here from passing its default parameter to the source
  alignment.

  Note, changes in future can now be made to codegen. I didn't change
  anything here, but this change should enable better memcpy code sequences.

  Reviewed by Hal Finkel.

  llvm-svn: 253511
* [DAGCombiner] Vector constant folding for comparisons | Simon Pilgrim | 2015-11-18 | 1 | -6/+14

  This patch adds support for vector constant folding of integer/float
  comparisons.

  This requires FoldConstantVectorArithmetic to support scalar constant
  operands (in this case ISD::CONDCODE). In future we should be able to
  support other scalar constant types as necessary (and possibly start
  calling FoldConstantVectorArithmetic for all node creations).

  Differential Revision: http://reviews.llvm.org/D14683

  llvm-svn: 253504
* [PGO] Value profiling support | Betul Buyukkurt | 2015-11-18 | 1 | -1/+2

  This change introduces an instrumentation intrinsic instruction for value
  profiling purposes, the lowering of the instrumentation intrinsic, and
  raw reader updates. The raw profile data files for llvm-profdata testing
  are updated.

  llvm-svn: 253484
* [SelectionDAGBuilder] Make sure DemoteReg ends up in right reg-class. | Jonas Paulsson | 2015-11-18 | 1 | -1/+2

  The virtual register containing the address for a returned value on the
  stack should in the DAG be represented with a CopyFromReg node and not a
  Register node. Otherwise, InstrEmitter will not make sure that it ends up
  in the right register class for the target instruction. SystemZ needs
  this, because the reg class for address registers is a subset of the
  general 64 bit register class.

  test/SystemZ/CodeGen/args-07.ll and args-04.ll updated to run with
  -verify-machineinstrs.

  Reviewed by Hal Finkel.

  llvm-svn: 253461
* Stop producing .data.rel sections. | Rafael Espindola | 2015-11-18 | 2 | -12/+5

  If a section is rw, it is irrelevant if the dynamic linker will write to
  it or not.

  It looks like llvm implemented this because gcc was doing it. It looks
  like gcc implemented this in the hope that it would put all the relocated
  items close together and speed up the dynamic linker.

  There are two problems with this:

  * It doesn't work. Both bfd and gold will map .data.rel to .data and
    concatenate the input sections in the order they are seen.
  * If we want a feature like that, it can be implemented directly in the
    linker since it knows where the dynamic relocations are.

  llvm-svn: 253436
* Remove a redundant assertion in MachineBasicBlock.cpp. NFC. | Cong Hou | 2015-11-18 | 1 | -1/+0

  llvm-svn: 253426
* Remove redundant code in MachineBasicBlock.cpp. NFC. | Cong Hou | 2015-11-18 | 1 | -28/+8

  llvm-svn: 253425
* Improving edge probabilities computation when choosing the best successor in machine block placement. | Cong Hou | 2015-11-18 | 1 | -13/+43

  When looking for the best successor from the outer loop for a block
  belonging to an inner loop, the edge probability computation can be
  improved so that edges in the inner loop are ignored. For example, suppose
  we are building chains for the non-loop part of the following code, and
  looking for B1's best successor. Assume the true body is very hot, then B3
  should be the best candidate. However, because of the existence of the
  back edge from B1 to B0, the probability from B1 to B3 can be very small,
  preventing B3 from being its successor. In this patch, when computing the
  probability of the edge from B1 to B3, the weight on the back edge B1->B0
  is ignored, so that B1->B3 will have 100% probability; a sketch of this
  computation follows the example.

    if (...)
      do {
        B0;
        ... // some branches
        B1;
      } while(...);
    else
      B2;
    B3;

  Differential revision: http://reviews.llvm.org/D10825

  llvm-svn: 253414
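  As a purely illustrative sketch (not the committed code), the adjusted
  probability can be computed by renormalizing over only the edges that
  leave the inner loop; isBackEdge and getEdgeWeight are hypothetical
  helpers here:

    BranchProbability probIgnoringLoopEdges(const MachineBasicBlock *BB,
                                            const MachineBasicBlock *Succ) {
      uint32_t Num = 0, Denom = 0;
      for (const MachineBasicBlock *S : BB->successors()) {
        if (isBackEdge(BB, S)) // drop edges staying inside the inner loop
          continue;
        uint32_t W = getEdgeWeight(BB, S);
        Denom += W;
        if (S == Succ)
          Num = W;
      }
      // With the back edge B1->B0 dropped, B1->B3 becomes 100%.
      return BranchProbability(Num, Denom);
    }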
* Generalize ownership/passing semantics to allow dsymutil to own abbreviations via unique_ptr | David Blaikie | 2015-11-18 | 1 | -12/+5

  While still allowing CodeGen/AsmPrinter in llvm to own them using a bump
  ptr allocator. (It might be nice to replace the pointers there with
  something that at least automatically calls their dtors, if that's
  necessary/useful, rather than having it done explicitly; I think a typed
  BumpPtrAllocator already does this, or maybe a unique_ptr with a custom
  deleter, etc.)

  llvm-svn: 253409
* [WinEH] Move WinEHFuncInfo from MachineModuleInfo to MachineFunction | Reid Kleckner | 2015-11-17 | 6 | -70/+48

  Summary:
  Now that there is a one-to-one mapping from MachineFunction to
  WinEHFuncInfo, we don't need to use a DenseMap to select the right
  WinEHFuncInfo for the current funclet.

  The main challenge here is that X86WinEHStatePass is an IR pass that
  doesn't have access to the MachineFunction. I gave it its own
  WinEHFuncInfo object that it uses to calculate state numbers, which it
  then throws away. As long as nobody creates or removes EH pads between
  this pass and SDAG construction, we will get the same state numbers.

  The other thing X86WinEHStatePass does is to mark the EH registration
  node. Instead of communicating which alloca was the registration through
  WinEHFuncInfo, I added the llvm.x86.seh.ehregnode intrinsic. This
  intrinsic generates no code and simply marks the alloca in use.

  Reviewers: JCTremoulet

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D14668

  llvm-svn: 253378
* Lower statepoints with multi-def targets. | Pat Gavlin | 2015-11-17 | 1 | -7/+11

  Statepoint lowering currently expects that the target method of a
  statepoint only defines a single value. This precludes using statepoints
  with ABIs that return values in multiple registers (e.g. the SysV AMD64
  ABI). This change adds support for lowering statepoints with multi-def
  targets.

  llvm-svn: 253339
* Use TargetRegisterInfo for printing MachineOperand register comments | Dan Gohman | 2015-11-17 | 1 | -7/+15

  Several places in AsmPrinter.cpp print comments describing MachineOperand
  registers using MCRegisterInfo, which uses MCOperand-oriented names. This
  doesn't work for targets that use virtual registers exclusively, as
  WebAssembly does, since virtual registers are represented and printed
  differently.

  This patch preserves what seems to be the spirit of r229978, avoiding the
  use of TM.getSubtargetImpl(), while still using MachineOperand-oriented
  printing for MachineOperands.

  Differential Revision: http://reviews.llvm.org/D14709

  llvm-svn: 253338
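  A sketch of the virtual-register-aware printing this implies (illustrative
  only, not the committed code):

    void printRegComment(raw_ostream &OS, unsigned Reg,
                         const TargetRegisterInfo *TRI) {
      if (TargetRegisterInfo::isVirtualRegister(Reg))
        OS << "%vreg" << TargetRegisterInfo::virtReg2Index(Reg);
      else
        OS << TRI->getName(Reg); // physical registers keep their MC names
    }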
* Drop prelink support. | Rafael Espindola | 2015-11-17 | 3 | -40/+17

  The way prelink used to work was:

  * The compiler decides if a given section only has relocations that are
    known to point to the same DSO. If so, it names it
    .data.rel.ro.local<something>.
  * The static linker puts all of these together.
  * The prelinker program assigns addresses to each library and resolves
    the local relocations.

  There are many problems with this:

  * It is incompatible with address space randomization.
  * The information passed by the compiler is redundant. The linker knows
    if a given relocation is in the same DSO or not. It could sort by that
    if so desired.
  * There are newer ways of speeding up DSO loading (gnu hash for example).
  * Even if we want to implement this again in the compiler, the previous
    implementation is pretty broken. It talks about relocations that are
    "resolved by the static linker". If they are resolved, there are none
    left for the prelinker. What one needs to track is if an expression
    will require only dynamic relocations that point to the same DSO.

  At this point it looks like the prelinker is a historical curiosity. For
  example, fedora has retired it because it failed to build for two
  releases
  (http://pkgs.fedoraproject.org/cgit/prelink.git/commit/?id=eb43100a8331d91c801ee3dcdb0a0bb9babfdc1f)

  This patch removes support for it. That is, it stops printing the
  ".local" sections.

  llvm-svn: 253280
* Assume lane masks are always precise | Matthias Braun | 2015-11-17 | 2 | -48/+17

  Allowing imprecise lane masks in the case of more than 32 subregister
  lanes led to some tricky corner cases, and I would need another bugfix for
  another one. Instead, I'd rather declare lane masks as precise and let
  tablegen abort if we do not have enough bits.

  This does not affect any in-tree target; even AMDGPU only needs 16 lanes
  at the moment. If the 32 lanes turn out to be a problem in the future,
  then we can easily change the LaneBitmask typedef to uint64_t.

  Differential Revision: http://reviews.llvm.org/D14557

  llvm-svn: 253279
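  The escape hatch mentioned above is a one-line change, as a sketch:

    typedef uint32_t LaneBitmask;    // today: at most 32 subregister lanes
    // typedef uint64_t LaneBitmask; // future option if more are needed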
* [DIBuilder] Make createReferenceType take size and align | Keno Fischer | 2015-11-16 | 1 | -1/+3

  Summary:
  Since we're passing references to dbg.value as pointers, we need to have
  the frontend properly declare their sizes and alignments (as it already
  does for regular pointers) in preparation for my upcoming patch to have
  the verifier check that the sizes agree.

  Also augment the backend logic that skips actually emitting this
  information into DWARF such that it also handles reference types.

  Reviewers: aprantl, dexonsmith, dblaikie

  Subscribers: dblaikie, llvm-commits

  Differential Revision: http://reviews.llvm.org/D14275

  llvm-svn: 253186
* Reduce the size of MCRelaxableFragment. | Akira Hatanaka | 2015-11-14 | 2 | -9/+4

  MCRelaxableFragment previously kept a copy of MCSubtargetInfo and MCInst
  to enable re-encoding the MCInst later during relaxation. A copy of
  MCSubtargetInfo (instead of a reference or pointer) was needed because
  the feature bits could be modified by the parser.

  This commit replaces the MCSubtargetInfo copy in MCRelaxableFragment with
  a constant reference to MCSubtargetInfo. The copies of MCSubtargetInfo
  are kept in MCContext, and the target parsers are now responsible for
  asking MCContext to provide a copy whenever the feature bits of
  MCSubtargetInfo have to be toggled.

  With this patch, I saw a 4% reduction in peak memory usage when I
  compiled verify-uselistorder.lto.bc using llc.

  rdar://problem/21736951

  Differential Revision: http://reviews.llvm.org/D14346

  llvm-svn: 253127
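  A sketch of the parser-side pattern this describes; treat the exact
  method names as assumptions:

    // Ask MCContext for a mutable copy before toggling feature bits, so
    // fragments can safely hold a const reference to an unchanging STI.
    MCSubtargetInfo &STICopy = getContext().getSubtargetCopy(getSTI());
    STICopy.ToggleFeature(FeatureBits);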
* [ShrinkWrapping] Disable the optimization for functions with a sanitize-like attribute. | Quentin Colombet | 2015-11-14 | 1 | -1/+8

  Even if the target supports shrink-wrapping, the prologue and epilogue
  must not move because a crash can happen anywhere and sanitizers need to
  be able to unwind from the PC of the crash.

  llvm-svn: 253116
* MachineScheduler: Print initial pressure in debug dump | Matthias Braun | 2015-11-13 | 1 | -0/+7

  llvm-svn: 253097
* MachineScheduler: Improve debug output for "only one node in readyset" | Matthias Braun | 2015-11-13 | 1 | -2/+2

  When there is only one node left in the ready queue and it is picked,
  call the reason "ONLY1" instead of "NOCAND".

  llvm-svn: 253096
* [SDAG] Fix expansion of BITREVERSE | James Molloy | 2015-11-13 | 1 | -3/+5

  Richard Trieu noted that UBSan detected an overflowing shift, and the
  obvious fix caused a crash.

  What was happening was that the shiftee (1U) was indeed too small for the
  possible range of shifts it had to handle, but also we were using
  "VT.getSizeInBits()" to get the maximum type bitwidth, when we wanted
  "VT.getScalarSizeInBits()" to get the vector lane size instead of the
  entire vector size.

  Use an APInt for the shift and VT.getScalarSizeInBits().

  llvm-svn: 253023
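  The shape of the fix, as an illustrative sketch:

    unsigned Sz = VT.getScalarSizeInBits();  // lane width, not vector width
    APInt Bit = APInt::getOneBitSet(Sz, I);  // safe for any lane size,
                                             // unlike (1U << I)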
* [ImplicitNulls] Add some clarifying comments; NFC | Sanjoy Das | 2015-11-13 | 1 | -1/+25

  llvm-svn: 253020
* Revert "Remove unnecessary call to getAllocatableRegClass"Tom Stellard2015-11-121-8/+4
| | | | | | | | | | | | | This reverts commit r252565. This also includes the revert of the commit mentioned below in order to avoid breaking tests in AMDGPU: Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64" This reverts commit r252674. llvm-svn: 252956
* [ImplicitNulls] Fix wrapping by breaking up a condition, NFC | Sanjoy Das | 2015-11-12 | 1 | -4/+4

  llvm-svn: 252947
* [ImplicitNull] Extract out a HazardDetector class, NFC | Sanjoy Das | 2015-11-12 | 1 | -53/+96

  This will make later functional changes easier to follow.

  llvm-svn: 252946
* [ShrinkWrap] Fix a typo in a comment. | Quentin Colombet | 2015-11-12 | 1 | -1/+1

  llvm-svn: 252918
* [ShrinkWrap] Make sure we do not mess up with EH funclet lowering. | Quentin Colombet | 2015-11-12 | 1 | -1/+11

  ShrinkWrapping does not understand exception handling constraints for
  now, so make sure we do not mess with them by aborting on functions that
  use EH funclets.

  llvm-svn: 252917
* [WinEH] Fix problem with removing an element from a SetVector while iterating. | Andrew Kaylor | 2015-11-12 | 1 | -11/+5

  Patch provided by Yaron Keren. (Thanks!)

  llvm-svn: 252913
* [SDAG] Introduce a new BITREVERSE node along with a corresponding LLVM intrinsic | James Molloy | 2015-11-12 | 7 | -2/+64

  Several backends have instructions to reverse the order of bits in an
  integer. Conceptually matching such patterns is similar to @llvm.bswap,
  and it was mentioned in http://reviews.llvm.org/D14234 that it would be
  best if these patterns were matched in InstCombine instead of
  reimplemented in every different target.

  This patch introduces an intrinsic @llvm.bitreverse.i* that operates
  similarly to @llvm.bswap. For plumbing purposes there is also a new ISD
  node ISD::BITREVERSE, with simple expansion and promotion support.

  The intention is that InstCombine's BSWAP detection logic will be
  extended to support BITREVERSE too, and @llvm.bitreverse intrinsics
  emitted (if the backend supports lowering it efficiently).

  llvm-svn: 252878
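  For illustration (not part of the commit), emitting the new intrinsic
  from C++ with IRBuilder might look like this, assuming an i32 operand:

    Function *Decl = Intrinsic::getDeclaration(M, Intrinsic::bitreverse,
                                               Builder.getInt32Ty());
    Value *Rev = Builder.CreateCall(Decl, X); // reverses the bits of X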
* LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization | Matthias Braun | 2015-11-12 | 1 | -73/+145

  - Factor out code to query and modify the sign bit of a floating-point
    value as an integer. This also works if none of the target's integer
    types is big enough to hold all bits of the floating-point value.

  - Legalize FABS(x) as FCOPYSIGN(x, 0.0) if FCOPYSIGN is available,
    otherwise perform bit manipulation on the sign bit. The previous code
    used "x >u 0 ? x : -x" which is incorrect for x being -0.0! It also
    takes 34 instructions on ARM Cortex-M4. With this patch we only
    require 5:

      vldr d0, LCPI0_0
      vmov r2, r3, d0
      lsrs r2, r3, #31
      bfi r1, r2, #31, #1
      bx lr

    (This could be further improved if the compiler would recognize that
    r2, r3 is zero.)

  - Only lower FCOPYSIGN(x, y) = sign(x) ? -FABS(x) : FABS(x) if FABS is
    available, otherwise perform bit manipulation on the sign bit.

  - Perform the sign(x) test by masking out the sign bit and comparing
    with 0 rather than shifting the sign bit to the highest position and
    testing for "<s 0". For x86 copysignl (on 80bit values) this gets us:

      testl $32768, %eax

    rather than:

      shlq $48, %rax
      sets %al
      testb %al, %al

  Differential Revision: http://reviews.llvm.org/D11172

  llvm-svn: 252839
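  The "bit manipulation on the sign bit" idea, sketched in C++ for doubles
  (illustrative only; the legalizer performs this on integer-typed DAG
  nodes, not via memcpy):

    // Needs <cstdint> and <cstring>.
    uint64_t XBits, YBits;
    std::memcpy(&XBits, &X, sizeof X);
    std::memcpy(&YBits, &Y, sizeof Y);
    // copysign(X, Y): everything but bit 63 from X, bit 63 from Y.
    XBits = (XBits & ~(1ULL << 63)) | (YBits & (1ULL << 63));
    std::memcpy(&X, &XBits, sizeof X);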
* [WinEH] Don't forward branches across empty EH pad BBs | Reid Kleckner | 2015-11-11 | 2 | -1/+2

  For really simple SEH catchpads, we tried to forward the invoke unwind
  edge across the empty block.

  llvm-svn: 252822
* [DAGCombiner] Improve zextload optimization. | Geoff Berry | 2015-11-11 | 1 | -22/+72

  Summary:
  Don't fold (zext (and (load x), cst)) -> (and (zextload x), (zext cst))
  if (and (load x) cst) will match as a zextload already and has additional
  users.

  For example, the following IR:

    %load = load i32, i32* %ptr, align 8
    %load16 = and i32 %load, 65535
    %load64 = zext i32 %load16 to i64
    store i32 %load16, i32* %dst1, align 4
    store i64 %load64, i64* %dst2, align 8

  used to produce the following aarch64 code:

    ldr w8, [x0]
    and w9, w8, #0xffff
    and x8, x8, #0xffff
    str w9, [x1]
    str x8, [x2]

  but with this change produces the following aarch64 code:

    ldrh w8, [x0]
    str w8, [x1]
    str x8, [x2]

  Reviewers: resistor, mcrosier

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D14340

  llvm-svn: 252789
* Add target preference for GatherAllAliases max depth | Matt Arsenault | 2015-11-11 | 2 | -1/+2

  llvm-svn: 252775
* clang-format lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp | Dehao Chen | 2015-11-11 | 1 | -7/+8

  llvm-svn: 252769
* Emit discriminator for inlined callsites. | Dehao Chen | 2015-11-11 | 1 | -0/+2

  Summary: Inlined callsites need to be emitted in debug info so that
  sample profile can be annotated to the correct inlined instance.

  Reviewers: dnovillo, dblaikie

  Subscribers: dblaikie, llvm-commits

  Differential Revision: http://reviews.llvm.org/D14511

  llvm-svn: 252768
* MachineInstr: addRegisterDefReadUndef() => setRegisterDefReadUndef() | Matthias Braun | 2015-11-11 | 2 | -3/+3

  This way we can not only add but also remove read undef flags.

  llvm-svn: 252678
* TableGen: Emit LaneMask for register classes without subregisters as ~0u | Matthias Braun | 2015-11-10 | 1 | -10/+13

  This makes it slightly easier to handle classes with and without
  subregisters uniformly.

  llvm-svn: 252671
* less indent; NFCI | Sanjay Patel | 2015-11-10 | 1 | -46/+47

  llvm-svn: 252643
* LegalizeDAG: Implement promote for scalar_to_vector | Matt Arsenault | 2015-11-10 | 1 | -0/+28

  This allows avoiding the default Expand behavior which introduces stack
  usage. Bitcast the scalar and replace the missing elements with undef.

  This is covered by existing tests and used by a future commit which makes
  64-bit vectors legal types on AMDGPU.

  llvm-svn: 252632
* LegalizeDAG: Implement promote for insert_vector_elt | Matt Arsenault | 2015-11-10 | 1 | -1/+52

  This is covered by existing tests and used by a future commit which makes
  64-bit vectors legal types on AMDGPU.

  llvm-svn: 252631
* LegalizeDAG: Implement promote for extract_vector_elt | Matt Arsenault | 2015-11-10 | 1 | -4/+58

  This is for AMDGPU to implement v2i64 extract as an extract of half of a
  v4i32.

  This is covered by existing tests and used by a future commit which makes
  64-bit vectors legal types on AMDGPU.

  llvm-svn: 252630
* add 'MustReduceDepth' as an objective/cost-metric for the MachineCombiner | Sanjay Patel | 2015-11-10 | 1 | -29/+53

  This is one of the problems noted in PR25016:
  https://llvm.org/bugs/show_bug.cgi?id=25016
  and:
  http://lists.llvm.org/pipermail/llvm-dev/2015-October/090998.html

  The spilling problem is independent and not addressed by this patch.

  The MachineCombiner was doing reassociations that don't improve or even
  worsen the critical path. This is caused by inclusion of the "slack"
  factor when calculating the critical path of the original code sequence.
  If we don't add that, then we have a more conservative cost comparison of
  the old code sequence vs. a new sequence. The more liberal calculation
  must be preserved, however, for the AArch64 MULADD patterns because
  benchmark regressions were observed without that.

  The two failing test cases now have identical asm that does what we want:

    a + b + c + d ---> (a + b) + (c + d)

  Differential Revision: http://reviews.llvm.org/D13417

  llvm-svn: 252616
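  The cost comparison described above, as an illustrative sketch (the names
  are not the committed ones):

    // Conservative mode ignores slack: a reassociation only counts as an
    // improvement if it strictly shortens the depth of the new root.
    bool Improved = MustReduceDepth
                        ? NewRootDepth < OldRootDepth
                        : NewRootDepth < OldRootDepth + Slack;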
* Support for emitting inline stack probes | Andy Ayers | 2015-11-10 | 1 | -0/+3

  For CoreCLR on Windows, stack probes must be emitted as inline sequences
  that probe successive stack pages between the current stack limit and the
  desired new stack pointer location.

  This implements support for the inline expansion on x64.

  For in-body alloca probes, expansion is done during instruction lowering.
  For prolog probes, a stub call is initially emitted during prolog
  creation, and expanded after epilog generation, to avoid complications
  that arise when introducing new machine basic blocks during prolog and
  epilog creation.

  Added a new test case, modified an existing one to exclude non-x64
  coreclr (for now).

  llvm-svn: 252578
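  The probing idea, sketched in C++ (illustrative only; the real expansion
  is emitted as x64 machine instructions):

    // Touch each page between the old stack limit and the new stack
    // pointer so the OS can commit guard pages one at a time.
    const uintptr_t PageSize = 0x1000;
    for (uintptr_t P = OldLimit - PageSize; P >= NewSP; P -= PageSize)
      *reinterpret_cast<volatile char *>(P) = 0; // probe one page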
* Remove unnecessary call to getAllocatableRegClass | Matt Arsenault | 2015-11-10 | 1 | -4/+8

  I'm not sure what the point of this was. I'm not sure why you would ever
  define an instruction that produces an unallocatable register class. No
  tests fail with this removed, and it seems like it should be a verifier
  error to define such an instruction.

  This was problematic for AMDGPU because it would make bad decisions by
  arbitrarily changing the register class when unsetting isAllocatable for
  VS_32/VS_64, which is currently set as a workaround to this problem.

  AMDGPU uses the VS_32/VS_64 register classes to represent operands which
  can use either VGPRs or SGPRs. When isAllocatable is unset for these,
  this would need to pick either the SGPR or VGPR class and insert either a
  copy we don't want, or an illegal copy we would need to deal with later.
  A semi-arbitrary register class ordering decision is made in tablegen,
  which resulted in always picking a VGPR class because it happens to have
  more registers than the SGPR register class. We really just want to use
  whatever register class the original register had.

  llvm-svn: 252565
* MachineVerifier: Streamline live interval related error reporting | Matthias Braun | 2015-11-09 | 1 | -90/+93

  Simply perform additional report_context() calls after a report() instead
  of adding more and more overloaded variations of report(). Also improve
  several instances where information was output in an ad-hoc way, probably
  because no matching report() overload was available.

  llvm-svn: 252552