done for 128-bit.
llvm-svn: 156375
llvm-svn: 156324
single use.
rdar://11360370
llvm-svn: 156316
This patch will optimize -(x != 0) on X86
FROM
cmpl $0x01,%edi
sbbl %eax,%eax
notl %eax
TO
negl %edi
sbbl %eax, %eax
In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
rdar://10961709
llvm-svn: 156312
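For illustration, a minimal C function that should exercise this pattern (the
function name is hypothetical, not from the patch):

  /* -(x != 0) yields an all-ones mask (-1) when x is nonzero, else 0.
     With this patch it compiles to negl + sbbl instead of
     cmpl + sbbl + notl. */
  int mask_if_nonzero(int x) {
      return -(x != 0);
  }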
Patch by Jack Carter.
llvm-svn: 156294
Patch by Jack Carter.
llvm-svn: 156293
Patch by Jack Carter.
llvm-svn: 156292
Patch by Jack Carter.
llvm-svn: 156285
Patch by Jack Carter.
llvm-svn: 156284
Patch by Jack Carter.
llvm-svn: 156283
llvm-svn: 156282
llvm-svn: 156281
Patch by Jack Carter.
llvm-svn: 156280
from the previous 2 patches.
Patch by Jack Carter.
llvm-svn: 156279
The primitive conservative heuristic seems to give a slight overall
improvement while not regressing stuff. Make it available to wider
testing. If you notice any speed regressions (or significant code
size regressions) let me know!
llvm-svn: 156258
This came up when a change in block placement formed a cmov and slowed down a
hot loop by 50%:
ucomisd (%rdi), %xmm0
cmovbel %edx, %esi
cmov is a really bad choice in this context because it doesn't get branch
prediction. If we emit it as a branch, an out-of-order CPU can do a better job
(if the branch is predicted right) and avoid waiting for the slow load+compare
instruction to finish. Of course it won't help if the branch is unpredictable,
but those are really rare in practice.
This patch uses a dumb conservative heuristic: it turns all cmovs that have one
use and a direct memory operand into branches. cmovs usually save some code
size, so we disable the transform in -Os mode. In-order architectures are
unlikely to benefit either; those are covered by the
"predictableSelectIsExpensive" flag.
It would be better to reuse branch probability info here, but BPI doesn't
support select instructions currently. It would make sense to use the same
heuristics as the if-converter pass, which does the opposite direction of this
transform.
Test suite shows a small improvement here and there on corei7-level machines,
but the actual results depend a lot on the used microarchitecture. The
transformation is currently disabled by default and available by passing the
-enable-cgp-select2branch flag to the code generator.
Thanks to Chandler for the initial test case and to Evan Cheng for providing
me with comments and test-suite numbers that were more stable than mine :)
llvm-svn: 156234
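For illustration, a C sketch of the kind of select this heuristic targets: the
compare reads straight from memory and its single-use result feeds the select
(names are hypothetical):

  /* A cmov here must wait for the slow load + compare to finish;
     a correctly predicted branch does not. */
  int pick(const double *p, double x, int a, int b) {
      return (x <= *p) ? a : b;
  }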
for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX
The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end provides.
NV_CONTRIB
llvm-svn: 156196
16-bit encoding of CMN instructions.
llvm-svn: 156195
llvm-svn: 156156
This patch creates and optimizes packets as per Hexagon ISA rules.
llvm-svn: 156109
lower half correctly. Missed in r155982.
llvm-svn: 156059
to catch cases like:
%reg1024<def> = MOV r1
%reg1025<def> = MOV r0
%reg1026<def> = ADD %reg1024, %reg1025
r0 = MOV %reg1026
By commuting the ADD, the coalescer can eliminate all of the copies. However,
there was a bug in the heuristic where it ended up commuting the ADD in:
%reg1024<def> = MOV r0
%reg1025<def> = MOV 0
%reg1026<def> = ADD %reg1024, %reg1025
r0 = MOV %reg1026
That provided no benefit and instead ensured the last MOV would not be coalesced.
rdar://11355268
llvm-svn: 156048
just like it now knows for FMULs.
llvm-svn: 156029
llvm-svn: 156023
The commit is intended to fix rdar://10961709.
But it is the root cause of PR12720.
Revert it for now.
llvm-svn: 155992
for AsmPrinter.
llvm-svn: 155982
PR10799
llvm-svn: 155954
This patch will optimize the following cases on X86
(a > b) ? (a-b) : 0
(a >= b) ? (a-b) : 0
(b < a) ? (a-b) : 0
(b <= a) ? (a-b) : 0
FROM
movl %edi, %ecx
subl %esi, %ecx
cmpl %edi, %esi
movl $0, %eax
cmovll %ecx, %eax
TO
xorl %eax, %eax
subl %esi, %edi
cmovll %eax, %edi
movl %edi, %eax
rdar://10734411
llvm-svn: 155919
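For illustration, the "subtract, clamped to zero" idiom in C; all four
comparison spellings above should lower the same way (the function name is
hypothetical):

  /* Returns a - b when a > b, and 0 otherwise. */
  int sub_or_zero(int a, int b) {
      return (a > b) ? (a - b) : 0;
  }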
llvm-svn: 155912
This patch will optimize -(x != 0) on X86
FROM
cmpl $0x01,%edi
sbbl %eax,%eax
notl %eax
TO
negl %edi
sbbl %eax, %eax
llvm-svn: 155853
llvm-svn: 155840
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention in
X86TargetLowering::LowerCall and is now checked properly in
X86FastISel::DoSelectCall as well.
(this time, actually commit what was reviewed!)
llvm-svn: 155825
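For illustration, a minimal C sketch of the structure-return case in question
(the type and names are hypothetical; fastcc is an LLVM-level calling
convention chosen by the compiler, not spelled in C source):

  /* On x86-32 this return value does not fit in registers, so it is
     returned through a hidden sret pointer argument. Under the default
     convention the callee pops that pointer and the caller re-pushes it;
     under fastcc the pointer is passed in a register and the caller must
     not adjust the stack. */
  struct Point3 { int x, y, z; };

  struct Point3 make_point(int x, int y, int z) {
      struct Point3 p = { x, y, z };
      return p;
  }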
ARM BUILD_VECTORs created after type legalization cannot use i8 or i16
operands, since those types are not legal. Instead use i32 operands, which
will be implicitly truncated by the BUILD_VECTOR to match the element type.
llvm-svn: 155824
This time, also fix the caller of AddGlue to properly handle
incomplete chains. AddGlue had failure modes, but shamefully hid them
from its caller. Its luck ran out.
Fixes rdar://11314175: BuildSchedUnits assert.
llvm-svn: 155749
llvm-svn: 155746
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention in
X86TargetLowering::LowerCall and is now checked properly in
X86FastISel::DoSelectCall as well.
llvm-svn: 155745
This definitely caused a regression with ARM -mno-thumb.
llvm-svn: 155743
x == -y --> x+y == 0
x != -y --> x+y != 0
On x86, the generated code goes from
negl %esi
cmpl %esi, %edi
je .LBB0_2
to
addl %esi, %edi
je .L4
This case is correctly handled for ARM with "cmn".
Patch by Manman Ren.
rdar://11245199
PR12545
llvm-svn: 155739
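For illustration, a minimal C function this change canonicalizes (the function
name is hypothetical):

  /* x == -y becomes (x + y) == 0, saving the explicit negation; the
     equivalence always holds in two's-complement arithmetic. */
  int is_negation(int x, int y) {
      return x == -y;
  }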
llvm-svn: 155732
<rdar://problem/11325085>.
llvm-svn: 155724
pre-PentiumPro architectures.
* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.
Fixes PR6679. Patch by Christoph Erhardt!
llvm-svn: 155704
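For illustration, an ordinary floating-point comparison in C; when targeting a
pre-PentiumPro CPU (no FUCOMI), it should now lower to the FUCOM + FNSTSW +
SAHF sequence described above (the function name is hypothetical):

  /* Compares two doubles; on pre-PentiumPro x87 the result travels
     FPSW -> %ax -> %ah -> EFLAGS via FNSTSW and SAHF. */
  int double_less(double a, double b) {
      return a < b;
  }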
Bridge after r155618.
llvm-svn: 155696
llvm-svn: 155686
instructions.
- However, it does support dmb, dsb, isb, mrs, and msr.
rdar://11331541
llvm-svn: 155685
DAGCombine strangeness may result in multiple loads from the same
offset. They both may try to glue themselves to another load. We could
insist that the redundant loads glue themselves to each other, but the
better fix is to bail out of bad gluing at the time we detect it.
Fixes rdar://11314175: BuildSchedUnits assert.
llvm-svn: 155668
On some cores it's a bad idea for performance to mix VFP and NEON instructions,
and since these patterns are NEON anyway, the NEON load should be used.
llvm-svn: 155630
the feature set of v7a. This comes about if the user specifies something like
-arch armv7 -mcpu=cortex-m3. We shouldn't be generating instructions such as
uxtab in this case.
rdar://11318438
llvm-svn: 155601
llvm-svn: 155589
a failure if run on an Intel Atom with post-RA instruction scheduling.
llvm-svn: 155587
llvm-svn: 155522