bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	AVX-512: fixed a bug in fp_to_uint pattern on KNL	Elena Demikhovsky	2016-03-29	1	-151/+499
\| \| \| \| \| \| \| \| \|	Fixed fp_to_uint instruction selection on KNL. One pattern was missing for <4 x double> to <4 x i32> Differential Revision: http://reviews.llvm.org/D18512 llvm-svn: 264701
*	[PowerPC] Refactor popcnt[dw] target features	Hal Finkel	2016-03-29	1	-0/+2
\| \| \| \| \| \| \| \| \|	Instead of using two feature bits, one to indicate the availability of the popcnt[dw] instructions, and another to indicate whether or not they're fast, use a single enum. This allows more consistent control via target attribute strings, and via Clang's command line. llvm-svn: 264690
*	[Codegen] Decrease minimum jump table density.	Kyle Butt	2016-03-29	8	-30/+111
\| \| \| \| \| \| \| \| \| \| \|	The minimum density for both optsize and non-optsize functions is now controlled by options: -sparse-jump-table-density (default 10) for non-optsize functions and -dense-jump-table-density (default 40) for optsize functions, which matches the current default. This improves several benchmarks at Google at the cost of a small code-size increase. For code compiled with -Os, the old behavior continues. llvm-svn: 264689
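A minimal sketch of what a percentage density threshold means for a jump table, assuming density is the number of case values divided by the size of the value range they span. The function name `isDenseEnough` and its parameters are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true when `numCases` case values spread over
// the inclusive range [low, high] reach at least `minDensityPercent` percent
// density. Assumes the range and case count are small enough not to overflow.
bool isDenseEnough(uint64_t numCases, int64_t low, int64_t high,
                   unsigned minDensityPercent) {
    assert(low <= high);
    uint64_t range =
        static_cast<uint64_t>(high) - static_cast<uint64_t>(low) + 1;
    // Compare numCases / range >= minDensityPercent / 100 without division.
    return numCases * 100 >= range * minDensityPercent;
}
```

With five cases spread over the values 0..39, the density is 12.5%: this passes a threshold of 10 but fails a threshold of 40.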
*	fix checks: _DAG -> -DAG	Sanjay Patel	2016-03-28	2	-4/+4
\| \| \| \|	llvm-svn: 264676
*	fix CHECK_NEXT -> CHECK-NEXT	Sanjay Patel	2016-03-28	2	-2/+2
\| \| \| \|	llvm-svn: 264674
*	fix CHECK_DAG -> CHECK-DAG	Sanjay Patel	2016-03-28	2	-4/+4
\| \| \| \|	llvm-svn: 264673
*	fix CHECK_NEXT -> CHECK-NEXT	Sanjay Patel	2016-03-28	1	-4/+4
\| \| \| \|	llvm-svn: 264672
*	fix CHECK_LABEL -> CHECK-LABEL	Sanjay Patel	2016-03-28	1	-16/+16
\| \| \| \|	llvm-svn: 264671
*	trailing whitespace	Sanjay Patel	2016-03-28	1	-315/+315
\| \| \| \|	llvm-svn: 264670
*	[X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op ↵	Simon Pilgrim	2016-03-28	4	-214/+119
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	for all their scalar elements. If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors. The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent. It's not in our interest to start making a general-purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least. Differential Revision: http://reviews.llvm.org/D18492 llvm-svn: 264666
*	fix CHECK_NEXT -> CHECK-NEXT	Sanjay Patel	2016-03-28	1	-15/+15
\| \| \| \|	llvm-svn: 264661
*	MIRParser: Add %subreg.xxx syntax for subregister index operands	Matthias Braun	2016-03-28	2	-0/+58
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D18279 llvm-svn: 264608
*	CodeGen: Correct specification of PHI nodes	Matthias Braun	2016-03-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	They do have a def machine operand. Fixing the definition is necessary for an upcoming patch. Differential Revision: http://reviews.llvm.org/D18384 llvm-svn: 264607
*	[AArch64] Do not lower scalar sdiv/udiv to a shifts + mul sequence when ↵	Haicheng Wu	2016-03-28	1	-0/+45
\| \| \| \| \| \| \| \|	optimizing for minsize Mimic what x86 does when optimizing sdiv/udiv for minsize. llvm-svn: 264606
*	[PowerPC] On the A2, popcnt[dw] are very slow	Hal Finkel	2016-03-28	1	-4/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The A2 cores support the popcntw/popcntd instructions, but they're microcoded, and slower than our default software emulation. Specifically, popcnt[dw] take approximately 74 cycles, whereas our software emulation takes only 24-28 cycles. I've added a new target feature to indicate a slow popcnt[dw], instead of just removing the existing target feature from the a2/a2q processor models, because: 1. This allows us to return more accurate information via the TTI interface (I recognize that this currently makes no practical difference) 2. Is hopefully easier to understand (it allows the core's features to match its manual while still having the desired effect). llvm-svn: 264600
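For context, a software population count can be written with a few shifts, masks and one multiply. The sketch below is the well-known SWAR bit trick for 32-bit values. It is only an illustration of the idea and is not claimed to be the exact sequence the PowerPC backend emits; the name `popcount32` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts set bits in a 32-bit value without a
// hardware popcount instruction.
uint32_t popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                 // sums of bit pairs
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // sums of nibbles
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // sums of bytes
    return (v * 0x01010101u) >> 24;                   // add the four bytes
}
```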
*	Introduce MachineFunctionProperties and the AllVRegsAllocated property	Derek Schuff	2016-03-28	3	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MachineFunctionProperties represents a set of properties that a MachineFunction can have at particular points in time. Existing examples of this idea are MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which will eventually be switched to use this mechanism. This change introduces the AllVRegsAllocated property; i.e. the property that all virtual registers have been allocated and there are no VReg operands left. With this mechanism, passes can declare that they require a particular property to be set, or that they set or clear properties by implementing e.g. MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class verifies that the requirements are met, and handles the setting and clearing based on the declarations. Passes can also directly query and update the current properties of the MF if they want to have conditional behavior. This change annotates the target-independent post-regalloc passes; future changes will also annotate target-specific ones. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D18421 llvm-svn: 264593
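The require/set/clear protocol described above can be sketched with a small self-contained mock. This is not the LLVM class: `PropertySet`, `PassDecl` and `runPass` are hypothetical names, and the mock reports an unmet requirement by returning false where a real base class might assert instead.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical property list; only AllVRegsAllocated is named in the commit.
enum class Property : std::size_t { AllVRegsAllocated = 0, Count };

class PropertySet {
    std::bitset<static_cast<std::size_t>(Property::Count)> bits_;

public:
    PropertySet &set(Property p) {
        bits_.set(static_cast<std::size_t>(p));
        return *this;
    }
    bool has(Property p) const {
        return bits_.test(static_cast<std::size_t>(p));
    }
    // True when every property in `required` is also present here.
    bool containsAll(const PropertySet &required) const {
        return (required.bits_ & ~bits_).none();
    }
    void add(const PropertySet &other) { bits_ |= other.bits_; }
    void remove(const PropertySet &other) { bits_ &= ~other.bits_; }
};

// What a pass declares about the function properties it touches.
struct PassDecl {
    PropertySet required, set, cleared;
};

// Verifies the requirements, then applies the declared set/clear effects.
bool runPass(PropertySet &functionProps, const PassDecl &pass) {
    if (!functionProps.containsAll(pass.required))
        return false; // requirement not met; the pass must not run
    // (the pass body would run here)
    functionProps.add(pass.set);
    functionProps.remove(pass.cleared);
    return true;
}
```

In this mock, a pass that requires AllVRegsAllocated is rejected until some earlier pass has declared that it sets the property.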
*	AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions	Tom Stellard	2016-03-28	3	-8/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fix yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589
*	[Hexagon] Improve handling of unaligned vector loads and stores	Krzysztof Parzyszek	2016-03-28	1	-0/+31
\| \| \| \|	llvm-svn: 264584
*	[Hexagon] Only use restore functions for single register at -Oz	Krzysztof Parzyszek	2016-03-28	1	-0/+42
\| \| \| \|	llvm-svn: 264581
*	[lanai] Add Lanai backend.	Jacques Pienaar	2016-03-28	13	-0/+746
\| \| \| \| \| \| \| \| \| \|	Add the Lanai backend to lib/Target. General Lanai backend discussion on llvm-dev thread "[RFC] Lanai backend" (http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html). Differential Revision: http://reviews.llvm.org/D17011 llvm-svn: 264578
*	AVX-512: Fixed ICMP instruction selection for i1 operands	Elena Demikhovsky	2016-03-28	1	-21/+111
\| \| \| \| \| \| \| \| \| \|	ICMP instruction selection fails on SKX and KNL for i1 operand. I use XOR to resolve: (A == B) is equivalent to (A xor B) == 0 Differential Revision: http://reviews.llvm.org/D18511 llvm-svn: 264566
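The identity used here is easy to check exhaustively for one-bit values. A small sketch, with `bool` standing in for i1 and a hypothetical function name:

```cpp
#include <cassert>

// Hypothetical helper: equality of two one-bit values expressed through XOR,
// mirroring (A == B) <=> ((A xor B) == 0).
bool icmpEqViaXor(bool a, bool b) {
    return (a ^ b) == 0;
}
```

All four input combinations agree with plain `a == b`.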
*	[PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based ↵	Hal Finkel	2016-03-27	1	-4/+191
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	loop legality Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc function calls (fmax[f], etc.) generally map to function calls after lowering. For some vector types with QPX at least, however, we can legally lower these, and we don't need to prohibit CTR-based loops on their account. It turned out, however, that the logic that checked the opcodes associated with intrinsics was broken (it would set the Opcode variable, but that variable was later checked only if set for some otherwise-external function call). This fixes the latter problem and adds the FMAX/MINNUM mappings. llvm-svn: 264532
*	[X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets	Simon Pilgrim	2016-03-26	2	-352/+106
\| \| \| \| \| \|	Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization llvm-svn: 264517
*	[X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets	Simon Pilgrim	2016-03-26	2	-694/+52
\| \| \| \| \| \| \| \|	Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264512
*	[X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors	Simon Pilgrim	2016-03-26	4	-5135/+695
\| \| \| \| \| \| \| \|	Currently this is mainly to prevent scalarization of integer division by constants. Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264511
*	[X86][SSE] Added v64i8 vector integer multiply tests	Simon Pilgrim	2016-03-26	1	-0/+946
\| \| \| \|	llvm-svn: 264510
*	[X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 ↵	Simon Pilgrim	2016-03-26	1	-18/+7
\| \| \| \| \| \| \| \|	multiplies. Only pre-AVX512BW targets need to split v32i8 vectors. llvm-svn: 264509
*	[PowerPC] Disable the CTR optimization in the presence of {min,max}num	David Majnemer	2016-03-26	1	-0/+44
\| \| \| \| \| \| \| \| \|	The minnum and maxnum intrinsics get lowered to libcalls which invalidates the CTR optimization. This fixes PR27083. llvm-svn: 264508
*	[X86][SSE] Refreshed vector integer multiply tests	Simon Pilgrim	2016-03-26	1	-121/+485
\| \| \| \| \| \| \| \|	Add all 256-bit vector tests. Added AVX512F/AVX512BW test targets. Renamed tests to something more meaningful. llvm-svn: 264507
*	[X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr	David Majnemer	2016-03-25	1	-0/+29
\| \| \| \| \| \| \| \| \|	We forgot to add the second machine operand to our ADJCALLSTACKDOWN, resulting in crashes in PEI. This fixes PR27071. llvm-svn: 264465
*	[MachineCopyPropagation] Expose more dead copies across instructions with ↵	Jun Bum Lim	2016-03-25	1	-0/+67
\| \| \| \| \| \| \| \| \| \| \|	regmasks When encountering instructions with regmasks, instead of cleaning up all the elements in MaybeDeadCopies map, remove only the instructions erased. By keeping more instructions in MaybeDeadCopies, this change will expose more dead copies across instructions with regmasks. llvm-svn: 264462
*	Prevent construction of cycle in DAG store merge	Nirav Dave	2016-03-25	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When merging stores in DAGCombiner, add a check to ensure that no dependencies exist that would cause the construction of a cycle in our DAG. This may happen if one store has a data dependence on another instruction (e.g. a load) which itself has a (chain) dependence on another store being merged. These stores cannot be merged safely and doing so results in a cycle that is discovered in LegalizeDAG. This test is only done in cases where alias analysis is used (UseAA) as non-AA store merge candidates will be merged logically after all loads which have been checked to not alias. Reviewers: ahatanak, spatel, niravd, arsenm, hfinkel, tstellarAMD, jyknight Subscribers: llvm-commits, tberghammer, danalbert, srhines Differential Revision: http://reviews.llvm.org/D18336 llvm-svn: 264461
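The unsafe situation described above can be modelled as a reachability question on a dependence graph: starting from the value one candidate store writes, can we reach another candidate store? The sketch below uses a plain adjacency list rather than SelectionDAG types; `DepGraph`, `StoreCandidate` and `mergeWouldCreateCycle` are hypothetical names, and the actual check in DAGCombiner may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// deps[n] lists the nodes that node n depends on (data or chain operands).
using DepGraph = std::vector<std::vector<std::size_t>>;

// A store being considered for merging: its own node and its stored value.
struct StoreCandidate {
    std::size_t node;
    std::size_t value;
};

// Depth-first search along dependence edges from `from` towards `target`.
static bool reaches(const DepGraph &deps, std::size_t from,
                    std::size_t target, std::vector<bool> &seen) {
    if (from == target) return true;
    if (seen[from]) return false;
    seen[from] = true;
    for (std::size_t operand : deps[from])
        if (reaches(deps, operand, target, seen)) return true;
    return false;
}

// True if the value stored by one candidate depends, directly or
// transitively, on another candidate store, so that merging would
// make the merged node depend on itself.
bool mergeWouldCreateCycle(const DepGraph &deps,
                           const std::vector<StoreCandidate> &stores) {
    for (const StoreCandidate &a : stores)
        for (const StoreCandidate &b : stores) {
            if (a.node == b.node) continue;
            std::vector<bool> seen(deps.size(), false);
            if (reaches(deps, a.value, b.node, seen)) return true;
        }
    return false;
}
```

For example, if store B writes the result of a load whose chain operand is store A, then A and B are reported as unsafe to merge.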
*	ARM: maintain BB ordering when expanding WIN__DBZCHK	Saleem Abdulrasool	2016-03-25	2	-25/+64
\| \| \| \| \| \| \| \| \| \| \| \| \|	It is possible to have a fallthrough MBB prior to MBB placement. The original addition of the BB would result in reordering the BB as not preceding the successor. Because of the fallthrough nature of the BB, we could end up executing incorrect code or even a constant pool island! Insert the spliced BB into the same location to avoid that. Thanks to Tim Northover for invaluable hints and Fiora for the discussion on what may have been occurring! llvm-svn: 264454
*	[X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsize	Hans Wennborg	2016-03-25	2	-1/+89
\| \| \| \| \| \| \| \| \| \| \| \|	64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes, respectively, whereas and/or with 8-bit immediate is only three bytes. Since these instructions imply an additional memory read (which the CPU could elide, but we don't think it does), restrict these patterns to minsize functions. Differential Revision: http://reviews.llvm.org/D18374 llvm-svn: 264440
*	X86: Use push-pop for materializing 8-bit immediates for minsize (take 2)	Hans Wennborg	2016-03-25	3	-102/+217
\| \| \| \| \| \| \| \| \|	This is the same as r255936, with added logic for avoiding clobbering of the red zone (PR26023). Differential Revision: http://reviews.llvm.org/D18246 llvm-svn: 264375
*	ARM: fix optimised division on WoA	Saleem Abdulrasool	2016-03-25	1	-3/+32
\| \| \| \| \| \| \| \| \|	We did not have an explicit branch to the continuation BB. When the check was hoisted, this could permit control flow to fall through into the division trap. Add the explicit branch to the continuation basic block to ensure that code execution is correct. llvm-svn: 264370
*	CXX TLS: collect return blocks after SelectAllBasicBlocks.	Manman Ren	2016-03-24	1	-0/+16
\| \| \| \| \| \| \| \| \| \|	It is incorrect to get the corresponding MBB for a ReturnInst before SelectAllBasicBlocks since SelectAllBasicBlocks can change the correspondence between a ReturnInst and the MBB it is in. PR27062 llvm-svn: 264358
*	Lower varargs correctly in deopt bundle lowering	Sanjoy Das	2016-03-24	1	-0/+17
\| \| \| \| \| \| \|	Earlier we were ignoring varargs in LowerCallSiteWithDeoptBundle because populateCallLoweringInfo does not set CallLoweringInfo::IsVarArg. llvm-svn: 264354
*	LiveInterval: Fix Distribute() failing on liveranges with unused VNInfos	Matthias Braun	2016-03-24	1	-0/+53
\| \| \| \| \| \|	This fixes http://llvm.org/PR26991 llvm-svn: 264345
*	Finish the incomplete 'd' inline asm constraint support for PPC by	Eric Christopher	2016-03-24	1	-1/+31
\| \| \| \| \| \|	making sure we give it a register and mark it as a register constraint. llvm-svn: 264340
*	Reorder check lines, comments in test and remove unnecessary IR.	Eric Christopher	2016-03-24	1	-17/+15
\| \| \| \|	llvm-svn: 264339
*	Match call and target calling conventions in test	Sanjoy Das	2016-03-24	1	-3/+4
\| \| \| \| \| \|	Fixes an issue in rL264329. llvm-svn: 264337
*	Add lowering support for llvm.experimental.deoptimize	Sanjoy Das	2016-03-24	1	-0/+86
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Only adds support for "naked" calls to llvm.experimental.deoptimize. Support for round-tripping through RewriteStatepointsForGC will come as a separate patch (should be simpler than this one). Reviewers: reames Subscribers: sanjoy, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18429 llvm-svn: 264329
*	[Hexagon] Add support for run-time stack overflow checking	Krzysztof Parzyszek	2016-03-24	1	-0/+44
\| \| \| \| \| \|	Patch by Sundeep Kushwaha. llvm-svn: 264328
*	[Hexagon] Generate PIC-specific versions of save/restore routines	Krzysztof Parzyszek	2016-03-24	1	-0/+69
\| \| \| \| \| \| \| \| \| \| \| \|	In PIC mode, the registers R14, R15 and R28 are reserved for use by the PLT handling code. This causes all functions to clobber these registers. While this is not new for regular function calls, it does also apply to save/restore functions, which do not follow the standard ABI conventions with respect to the volatile/non-volatile registers. Patch by Jyotsna Verma. llvm-svn: 264324
*	[Statepoints] Fix yet another issue around gc pointer uniqueing	Sanjoy Das	2016-03-24	1	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	Given that StatepointLowering now uniques derived pointers before putting them in the per-statepoint spill map, we may end up with missing entries for derived pointers when we visit a gc.relocate on a pointer that was de-duplicated away. Fix this by keeping two maps, one mapping gc pointers to their de-duplicated values, and one mapping a de-duplicated value to the slot it is spilled in. llvm-svn: 264320
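The two-map arrangement can be sketched as follows, with strings standing in for IR values and integers for spill slots. `SpillTracker` and its members are hypothetical names and do not reflect the StatepointLowering data structures.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct SpillTracker {
    // Map 1: any gc pointer -> the de-duplicated value that represents it.
    std::unordered_map<std::string, std::string> canonicalOf;
    // Map 2: de-duplicated value -> the slot it is spilled in.
    std::unordered_map<std::string, int> slotOf;

    // Record that `canonical` survived de-duplication and lives in `slot`.
    void recordSpill(const std::string &canonical, int slot) {
        canonicalOf[canonical] = canonical;
        slotOf[canonical] = slot;
    }

    // Record that `ptr` was de-duplicated away in favour of `canonical`.
    void recordDuplicate(const std::string &ptr,
                         const std::string &canonical) {
        canonicalOf[ptr] = canonical;
    }

    // Look up the slot for any pointer, duplicate or not; -1 if unknown.
    int slotFor(const std::string &ptr) const {
        auto c = canonicalOf.find(ptr);
        if (c == canonicalOf.end()) return -1;
        auto s = slotOf.find(c->second);
        return s == slotOf.end() ? -1 : s->second;
    }
};
```

With a single map keyed only by the surviving value, a lookup for a pointer that was de-duplicated away would find nothing; the extra indirection is what lets that lookup succeed.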
*	Remove unnecessary redirect from test	Sanjoy Das	2016-03-24	1	-2/+2
\| \| \| \|	llvm-svn: 264308
*	AVX-512: Generate KTEST instead of TEST for i1 vectors	Elena Demikhovsky	2016-03-24	1	-2/+111
\| \| \| \| \| \| \| \| \| \| \| \|	KTEST instruction may be used instead of TEST in this case: %int_sel3 = bitcast <8 x i1> %sel3 to i8 %res = icmp eq i8 %int_sel3, zeroinitializer br i1 %res, label %L2, label %L1 Differential Revision: http://reviews.llvm.org/D18444 llvm-svn: 264298
*	CodeGen: extend RHS when splitting ATOMIC_CMP_SWAP_WITH_SUCCESS.	Tim Northover	2016-03-24	1	-2/+52
\| \| \| \| \| \| \| \| \| \| \| \| \|	If the operation's type has been promoted during type legalization, we need to account for the fact that the high bits of the comparison operand are likely unspecified. The LHS is usually zero-extended, but MIPS sign extends it, so we have to be slightly careful. Patch by Simon Dardis. llvm-svn: 264296
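A small illustration of why the comparison operand has to be extended, assuming an 8-bit compare-and-swap promoted to 32 bits on a target where the loaded value is zero-extended. Both function names are hypothetical. A sign-extending target would need the matching sign extension instead of the mask.

```cpp
#include <cassert>
#include <cstdint>

// `loaded` is an 8-bit value zero-extended to 32 bits.
// `expected` is the promoted comparison operand; its high 24 bits are
// unspecified after type promotion.

// Comparing the raw promoted operand can fail because of garbage high bits.
bool naiveSuccess(uint32_t loaded, uint32_t expected) {
    return loaded == expected;
}

// Zero-extending the operand from 8 bits first gives the intended result.
bool extendedSuccess(uint32_t loaded, uint32_t expected) {
    return loaded == (expected & 0xFFu);
}
```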
*	Remove unsafe AssertZext after promoting result of FP_TO_FP16	Pirama Arumuga Nainar	2016-03-24	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Some target lowerings of FP_TO_FP16, for instance ARM's vcvtb.f16.f32 instruction, do not guarantee that the top 16 bits are zeroed out. Remove the unsafe AssertZext and add tests to exercise this. Reviewers: jmolloy, sbaranga, kristof.beyls, aadg Subscribers: llvm-commits, srhines, aemerson Differential Revision: http://reviews.llvm.org/D18426 llvm-svn: 264285