path: root/llvm/lib/Target
Commit message    Author    Age    Files    Lines
* ADT: correctly report isMSVCEnvironment for windows itaniumSaleem Abdulrasool2014-11-172-2/+2
| | | | | | | The itanium environment on Windows uses MSVC and is an MSVC environment. Report this correctly. llvm-svn: 222180
* R600/SI: Don't copy flags when extracting subregMatt Arsenault2014-11-171-6/+8
| | | | | | | | | This was resulting in use of a register after a kill. For some reason this showed up as a problem in many tests when moving the SIFixSGPRCopies pass closer to instruction selection. llvm-svn: 222175
* R600/SI: Assume SIFixSGPRCopies makes changesMatt Arsenault2014-11-171-1/+2
| | | | | | I'm not sure if this was breaking anything. llvm-svn: 222174
* [X86] Use ADD/SUB instead of INC/DEC for Haswell and Broadwell CPUsAlexey Volkov2014-11-171-2/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D5934 llvm-svn: 222141
* [Thumb1] Re-write emitThumbRegPlusImmediateOliver Stannard2014-11-171-136/+134
    This was motivated by a bug which caused code like this to be miscompiled:

        declare void @take_ptr(i8*)
        define void @test() {
          %addr1.32 = alloca i8
          %addr2.32 = alloca i32, i32 1028
          call void @take_ptr(i8* %addr1.32)
          ret void
        }

    This was emitting the following assembly to get the value of %addr1.32:

        add r0, sp, #1020
        add r0, r0, #8

    However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this could not be assembled. The generated object file contained this, resulting in r0 holding SP+8 rather than SP+1028:

        add r0, sp, #1020
        add r0, sp, #8

    This function looked like it could have caused miscompilations for other combinations of registers and offsets (though I don't think it is currently called with these), and the heuristic it used did not match the emitted code in all cases.

    llvm-svn: 222125
* Convert some EVTs to MVTs where only a SimpleValueType is needed.Craig Topper2014-11-161-1/+1
| | | | llvm-svn: 222109
* [x86] Remove two redundant isel patterns. Their equivalent already exists in ↵Craig Topper2014-11-161-5/+0
| | | | | | the instruction pattern. llvm-svn: 222094
* [X86][SSE] Improve legal SHUFP and PSHUFD shuffle matchingSimon Pilgrim2014-11-151-8/+19
| | | | | | | | | | Updated X86TargetLowering::isShuffleMaskLegal to match SHUFP masks with commuted inputs and PSHUFD masks that reference the second input. As part of this I've refactored isPSHUFDMask to work in a more general manner and allow it to match against either the first or second input vector. Differential Revision: http://reviews.llvm.org/D6287 llvm-svn: 222087
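As background for the matching change above: a 4-lane shuffle mask can be checked against either source vector simply by testing which index range its elements fall in. The following is a minimal standalone C++ sketch of that idea; the helper name and mask encoding are illustrative only and are not the code in X86ISelLowering.

    #include <array>
    #include <cassert>

    // Illustrative sketch: indices 0-3 select lanes of the first source vector,
    // 4-7 select lanes of the second, and -1 means the lane is undef.
    static bool isSingleSourcePSHUFDMask(const std::array<int, 4> &Mask,
                                         bool SecondSource) {
      int Lo = SecondSource ? 4 : 0;
      for (int Elt : Mask) {
        if (Elt < 0)
          continue;                     // undef lanes match anything
        if (Elt < Lo || Elt >= Lo + 4)
          return false;                 // lane comes from the other source
      }
      return true;
    }

    int main() {
      assert(isSingleSourcePSHUFDMask({2, 3, 0, 1}, /*SecondSource=*/false));
      assert(isSingleSourcePSHUFDMask({6, 7, 4, 5}, /*SecondSource=*/true));
      assert(!isSingleSourcePSHUFDMask({0, 1, 4, 5}, /*SecondSource=*/false));
      return 0;
    }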
* R600: Permute operands when selecting legacy min/maxMatt Arsenault2014-11-151-6/+9
| | | | | | | | | | This gets the correct NaN behavior based on the compare type the hardware uses. This now passes the new piglit test I have for this on SI. Add stricter tests for the operand order. llvm-svn: 222079
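For context on why operand order matters here: a "legacy" min built from an ordered compare and a select is not symmetric once NaN is involved, because the compare is false for NaN and the select then yields the second operand. A small standalone illustration of that asymmetry (not the R600 selection code):

    #include <cmath>
    #include <cstdio>

    // A compare-and-select minimum: returns the second operand whenever the
    // ordered comparison fails, which is what happens when either input is NaN.
    static float minLegacy(float a, float b) { return a < b ? a : b; }

    int main() {
      float nan = std::nanf("");
      std::printf("min(nan, 1.0) = %f\n", minLegacy(nan, 1.0f));  // prints 1.000000
      std::printf("min(1.0, nan) = %f\n", minLegacy(1.0f, nan));  // prints nan
      return 0;
    }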
* R600: Fix 64-bit integer divisionTom Stellard2014-11-151-2/+2
| | | | | | | | This fixes a failure in one of the oclconform tests. Patch by: Jan Vesely llvm-svn: 222073
* R600: Factor i64 UDIVREM lowering into its own functionTom Stellard2014-11-153-68/+84
| | | | | | | | This is so it could potentially be used by SI. However, the current implementation does not always produce correct results, so the IntegerDivisionPass is being used instead. llvm-svn: 222072
* Rename EH related stuff to be more preciseReid Kleckner2014-11-142-8/+7
| | | | | | | | | | | | | | | | | | | | Summary: The current "WinEH" exception handling type is more about Itanium-style LSDA tables layered on top of the Windows native unwind info format instead of .eh_frame tables or EHABI unwind info. Use the name "ItaniumWinEH" to better reflect the hybrid nature of the design. Also rename isExceptionHandlingDWARF to usesItaniumLSDAForExceptions, since the LSDA is part of the Itanium C++ ABI document, and not the DWARF standard. Reviewers: echristo Subscribers: llvm-commits, compnerd Differential Revision: http://reviews.llvm.org/D6279 llvm-svn: 222062
* ARM: refactor .cfi_def_cfa_offset emission.Tim Northover2014-11-141-97/+126
| | | | | | | | | | We used to track quite a few "adjusted" offsets through the FrameLowering code to account for changes in the prologue instructions as we went and allow the emission of correct CFA annotations. However, we were missing a couple of cases and the code was almost impenetrable. It's easier to just add any stack-adjusting instruction to a list and emit them together. llvm-svn: 222057
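The bookkeeping idea described above can be pictured with a small standalone sketch (hypothetical names; not the ARM FrameLowering code): record each stack-adjusting prologue instruction as it is emitted, then derive every .cfi_def_cfa_offset from a running total instead of threading adjusted offsets through the code.

    #include <cstdio>
    #include <vector>

    struct StackAdjustment {
      const char *Inst;  // textual form of the prologue instruction
      int Bytes;         // how far it moves SP away from the CFA
    };

    int main() {
      // Gathered while emitting the prologue.
      std::vector<StackAdjustment> Prologue = {
          {"push {r4, r5, r11, lr}", 16},
          {"sub  sp, sp, #24", 24},
      };

      // Emitted together afterwards: the CFA offset is just the running sum.
      int CfaOffset = 0;
      for (const StackAdjustment &A : Prologue) {
        CfaOffset += A.Bytes;
        std::printf("\t%s\n\t.cfi_def_cfa_offset %d\n", A.Inst, CfaOffset);
      }
      return 0;
    }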
* ARM: correctly calculate the offset of FP in its push.Tim Northover2014-11-141-2/+7
| | | | | | | | | | | When we folded the DPR alignment gap into a push, we weren't noting the extra distance from the beginning of the push to the FP, and so FP ended up pointing at an incorrect offset. The .cfi_def_cfa_offset directives are still wrong in this case, but I think that can be improved by refactoring. llvm-svn: 222056
* R600/SI: Mark s_movk_i32 as rematerializableTom Stellard2014-11-141-0/+2
| | | | llvm-svn: 222037
* R600/SI: Fix spilling of m0 registerTom Stellard2014-11-141-1/+9
| | | | | | | If we have spilled the value of the m0 register, then we need to restore it with v_readlane_b32 to a regular sgpr, because v_readlane_b32 can't write to m0. llvm-svn: 222036
* R600/SI: Combine min3/max3 instructionsMatt Arsenault2014-11-146-10/+135
| | | | llvm-svn: 222032
* R600/SI: Fix verifier error from a branch on IMPLICIT_DEFMatt Arsenault2014-11-141-0/+8
| | | | | | SIILowerI1Copies wasn't correctly handling this case. llvm-svn: 222020
* Fix unused variable warning without assertsMatt Arsenault2014-11-141-0/+2
| | | | llvm-svn: 222017
* R600/SI: Match integer min / max instructionsMatt Arsenault2014-11-142-29/+86
| | | | llvm-svn: 222015
* R600/SI: Use S_BFE_I64 for 64-bit sext_inregMatt Arsenault2014-11-145-17/+82
| | | | llvm-svn: 222012
* [AVX512] Add 512b masked integer shift by immediate patterns.Cameron McInally2014-11-142-29/+21
| | | | llvm-svn: 222002
* R600/SI: Fix assembly names for exec_hi and exec_loTom Stellard2014-11-141-2/+2
| | | | llvm-svn: 221995
* R600/SI: Start implementing an assemblerTom Stellard2014-11-1413-30/+413
| | | | | | | This was done using the Sparc and PowerPC AsmParsers as guides. So far it is very simple and only supports sopp instructions. llvm-svn: 221994
* [PowerPC] Add VSX builtins for vec_divBill Schmidt2014-11-141-0/+6
| | | | | | | | | This patch adds builtin support for xvdivdp and xvdivsp, along with a test case. Straightforward stuff. There's a companion patch for Clang. llvm-svn: 221983
* R600/SI: Make constant array staticMatt Arsenault2014-11-141-1/+1
| | | | llvm-svn: 221965
* X86: use getConstant rather than getTargetConstant behind BUILD_VECTOR.Tim Northover2014-11-141-7/+7
| | | | | | | | | | | | getTargetConstant should only be used when you can guarantee the instruction selected will be able to cope with the raw value. BUILD_VECTOR is rather too generic for this so we should use getConstant instead. In that case, an instruction can still consume the constant, but if it doesn't it'll be materialised through its own round of ISel. Should fix PR21352. llvm-svn: 221961
* Fix build of Mips code with MSVC by using our macro instead of ↵Reid Kleckner2014-11-142-5/+4
| | | | | | __attribute__((unused)) directly llvm-svn: 221956
* First stage of call lowering for Mips fast-iselReed Kotler2014-11-133-2/+319
    Summary:
    This has most of what is needed for Mips fast-isel call lowering for O32. What is missing I will add in the next patch, because this patch is already too large. It should not be doing anything wrong, but it will punt on some cases that it is basically capable of handling.

    The mechanism for passing parameters on the stack is there, but I have not enabled it yet; for now this also serves to avoid some of the stranger cases of O32 register passing that I have not fully checked and that still have some issues.

    The Mips O32 ABI rules are very complicated as far as how data is passed in floating-point and integer registers. However, there is a way to think about this all very simply, and this implementation reflects that. Basically, the ABI rules are written as if everything is passed on the stack and aligned as such. Once that is conceptually done, it is nearly trivial to reassign those locations to registers, and then all the complexity disappears. So I have told TableGen that all the data is passed on the stack, and during the lowering I fix this up by assigning to registers as per the ABI doc. This has been my approach, and you can line up what I did with the ABI document and see one-to-one what is going on.

    Test Plan: callabi.ll

    Reviewers: dsanders

    Reviewed By: dsanders

    Subscribers: jholewinski, echristo, ahatanak, llvm-commits, rfuhler

    Differential Revision: http://reviews.llvm.org/D5714

    llvm-svn: 221948
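The "everything is passed on the stack first" view described above can be illustrated with a toy standalone model (a sketch of the idea only, not the fast-isel code; it handles just word and doubleword integer arguments): compute naturally aligned stack offsets for each argument, then rename any slot that falls within the first 16 bytes to the corresponding O32 argument register.

    #include <cstdio>
    #include <vector>

    int main() {
      // Argument sizes in bytes, e.g. for (i32, i64, i32).
      std::vector<unsigned> ArgSizes = {4, 8, 4};
      const char *IntArgRegs[] = {"$a0", "$a1", "$a2", "$a3"};

      unsigned Offset = 0;
      for (unsigned Size : ArgSizes) {
        Offset = (Offset + Size - 1) & ~(Size - 1);  // natural alignment
        if (Offset >= 16)
          std::printf("arg at offset %u -> stack\n", Offset);
        else if (Size <= 4)
          std::printf("arg at offset %u -> %s\n", Offset, IntArgRegs[Offset / 4]);
        else  // an 8-byte argument occupies an even/odd register pair
          std::printf("arg at offset %u -> %s:%s\n", Offset,
                      IntArgRegs[Offset / 4], IntArgRegs[Offset / 4 + 1]);
        Offset += Size;
      }
      return 0;
    }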
* R600/SI: Fix fmin_legacy / fmax_legacy matching for SIMatt Arsenault2014-11-135-33/+75
| | | | | | select_cc is expanded on SI, so this was never matched. llvm-svn: 221941
* We can get the TLOF from the TargetMachine - so constructor no longer ↵Aditya Nandakumar2014-11-1312-12/+12
| | | | | | requires TargetLoweringObjectFile to be passed. llvm-svn: 221926
* [FastISel][AArch64] Don't bail during simple GEP instruction selection.Juergen Ributzka2014-11-131-0/+23
| | | | | | | | | | | | | The generic FastISel code would bail, because it can't emit a sign-extend for AArch64. This copies the code over and uses AArch64-specific emit functions. This is not ideal and 'computeAddress' should handle this, so it can fold the address computation into the memory operation. I plan to clean up 'computeAddress' anyway, so I will add that in a future commit. Related to rdar://problem/18962471. llvm-svn: 221923
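To see why the sign-extend matters for the address computation: a GEP with a 32-bit index on a 64-bit target must scale and add the index as a signed value, so a negative index cannot simply be zero-extended. A standalone illustration of the address arithmetic (not the FastISel code):

    #include <cstdint>
    #include <cstdio>

    // Address of base[Index] for elements of EltSize bytes: the i32 index is
    // sign-extended to 64 bits before scaling, so negative indices work.
    static uint64_t gepAddress(uint64_t Base, int32_t Index, uint64_t EltSize) {
      return Base + static_cast<int64_t>(Index) * EltSize;
    }

    int main() {
      std::printf("%#llx\n", (unsigned long long)gepAddress(0x1000, -2, 4));  // 0xff8
      std::printf("%#llx\n", (unsigned long long)gepAddress(0x1000, 3, 4));   // 0x100c
      return 0;
    }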
* R600/SI: Use s_movk_i32Matt Arsenault2014-11-133-2/+17
| | | | llvm-svn: 221922
* R600/SI: Fix definition for s_cselect_b32Matt Arsenault2014-11-132-3/+7
| | | | | | | | | | | These were directly using the old base instruction class, and specifying the wrong register classes for operands. The operands can be the other special inputs besides SGPRs. The op name was also being directly used for the asm string, so this was printed without any operands. llvm-svn: 221921
* R600: Fix assert on empty functionMatt Arsenault2014-11-131-1/+0
| | | | | If a function is just an unreachable, this would hit a "this is not a MachO target" assertion because of setting HasSubsectionsViaSymbols. llvm-svn: 221920
* R600: Error on initializer for LDS.Matt Arsenault2014-11-131-2/+21
| | | | | | Also give a proper error for other address spaces. llvm-svn: 221917
* R600/SI: Get rid of FCLAMP_SI pseudoMatt Arsenault2014-11-134-25/+16
| | | | | | | It's not necessary. Also use complex patterns to allow src modifier usage. llvm-svn: 221916
* R600/SI: Allow commuting with src2_modifiersMatt Arsenault2014-11-131-5/+0
| | | | llvm-svn: 221911
* R600/SI: Allow commuting some 3 op instructionsMatt Arsenault2014-11-131-3/+27
| | | | | | | | | | | | | e.g. v_mad_f32 a, b, c -> v_mad_f32 b, a, c This simplifies matching v_madmk_f32. This looks somewhat surprising, but it appears to be OK to do this. We can commute src0 and src1 in all of these instructions, and that's all that appears to matter. llvm-svn: 221910
* ARM: allow constpool entry to be moved to the user's block in all cases.Tim Northover2014-11-131-1/+7
| | | | | | | | | | | | | | | Normally entries can only move to a lower address, but when that wasn't viable, the user's block was considered anyway. Unfortunately, it went via createNewWater which wasn't designed to handle the case where there's already an island after the block. Unfortunately, the test we have is slow and fragile, and I couldn't reduce it to anything sane even with the @llvm.arm.space intrinsic. The test change here is recreating the previous one after the change. rdar://problem/18545506 llvm-svn: 221905
* ARM: avoid duplicating branches during constant islands.Tim Northover2014-11-131-6/+10
| | | | | | We were using a naive heuristic to determine whether a basic block already had an unconditional branch at the end. This mostly corresponded to reality (assuming branches got optimised) because there's not much point in a branch to the next block, but it could go wrong. llvm-svn: 221904
* ARM: add @llvm.arm.space intrinsic for testing ConstantIslands.Tim Northover2014-11-133-0/+10
| | | | | | | | Creating tests for the ConstantIslands pass is very difficult, since it depends on precise layout details. Having the ability to precisely inject a number of bytes into the stream helps greatly. llvm-svn: 221903
* [Hexagon]Colin LeMahieu2014-11-131-4/+4
| | | | | | NFC Renaming reserved identifier. llvm-svn: 221898
* AVX-512: SINT_TO_FP cost model and some bugfixesElena Demikhovsky2014-11-132-4/+25
| | | | | | | Checked some corner cases, for example translation of <8 x i1> to <8 x double> llvm-svn: 221883
* This patch changes the ownership of TLOF from TargetLoweringBase to ↵Aditya Nandakumar2014-11-1336-59/+136
| | | | | | TargetMachine so that different subtargets could share the TLOF effectively llvm-svn: 221878
* [x86] Teach the vector shuffle lowering to make a more nuanced decisionChandler Carruth2014-11-131-12/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | between splitting a vector into 128-bit lanes and recombining them vs. decomposing things into single-input shuffles and a final blend. This handles a large number of cases in AVX1 where the cross-lane shuffles would be much more expensive to represent even though we end up with a fast blend at the root. Instead, we can do a better job of shuffling in a single lane and then inserting it into the other lanes. This fixes the remaining bits of Halide's regression captured in PR21281 for AVX1. However, the bug persists in AVX2 because I've made this change reasonably conservative. The cases where it makes sense in AVX2 to split into 128-bit lanes are much more rare because we can often do full permutations across all elements of the 256-bit vector. However, the particular test case in PR21281 is an example of one of the rare cases where it is *always* better to work in a single 128-bit lane. I'm going to try to teach the logic to detect and form the good code even in AVX2 next, but it will need to use a separate heuristic. Finally, there is one pesky regression here where we previously would craftily use vpermilps in AVX1 to shuffle both high and low halves at the same time. We no longer pull that off, and not for any really good reason. Ultimately, I think this is just another missing nuance to the selection heuristic that I'll try to add in afterward, but this change already seems strictly worth doing considering the magnitude of the improvements in common matrix math shuffle patterns. As always, please let me know if this causes a surprising regression for you. llvm-svn: 221861
* [x86] Don't form overly fragmented blends when splitting andChandler Carruth2014-11-131-2/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | re-combining shuffles because nothing was available in the wider vector type. The key observation (which I've put in the comments for future maintainers) is that at this point, no further combining is really possible. And so even though these shuffles trivially could be combined, we need to actually do that as we produce them when producing them this late in the lowering. This fixes another (huge) part of the Halide vector shuffle regressions. As it happens, this was already well covered by the tests, but I hadn't noticed how bad some of these got. The specific patterns that turn directly into unpckl/h patterns were occurring *many* times in common vector processing code. There are still more problems here sadly, but trying to incrementally tease them apart and it looks like this is the core of the problem in the splitting logic. There is some chance of regression here, you can see it in the test changes. Specifically, where we stop forming pshufb in some cases, it is possible that pshufb was in fact faster. Intel "says" that pshufb is slower than the instruction sequences replacing it. llvm-svn: 221852
* [FastISel][AArch64] Optimize select when one of the operands is a 'true' or ↵Juergen Ributzka2014-11-131-0/+61
| | | | | | | | | | | 'false' value. Optimize selects of i1 in the presence of 'true' and 'false' operands to simple logic operations. This fixes rdar://problem/18960150. llvm-svn: 221848
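The folds involved can be summarised for i1 values as: select c, true, b is c or b; select c, a, false is c and a; select c, false, b is (not c) and b; select c, a, true is (not c) or a. A standalone sketch that checks these identities exhaustively (illustrative only, not the FastISel code):

    #include <cassert>

    // Reference semantics of an i1 select, used to verify the logic-op folds.
    static bool selectI1(bool C, bool A, bool B) { return C ? A : B; }

    int main() {
      for (int c = 0; c <= 1; ++c)
        for (int x = 0; x <= 1; ++x) {
          assert(selectI1(c, true, x) == (c | x));    // select c, true, x  -> or
          assert(selectI1(c, x, false) == (c & x));   // select c, x, false -> and
          assert(selectI1(c, false, x) == (!c & x));  // select c, false, x -> andn
          assert(selectI1(c, x, true) == (!c | x));   // select c, x, true  -> orn
        }
      return 0;
    }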
* [FastISel][AArch64] Fold the cmp into the select when possible.Juergen Ributzka2014-11-131-0/+54
| | | | | | | | | This folds the compare emission into the select emission when possible, so we can directly use the flags and don't have to emit a separate compare. Related to rdar://problem/18960150. llvm-svn: 221847
* [FastISel][AArch64] Extend 'select' lowering to support also i1 to i16.Juergen Ributzka2014-11-131-34/+46
| | | | | | Related to rdar://problem/18960150. llvm-svn: 221846