bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	MachineScheduler: Export function to construct "default" scheduler.	Matthias Braun	2016-11-28	10	-51/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This makes the createGenericSchedLive() function that constructs the default scheduler available for the public API. This should help when you want to get a scheduler and the default list of DAG mutations. This also shrinks the list of default DAG mutations: {Load\|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer added by default. Targets can easily add them if they need them. It also makes it easier for targets to add alternative/custom macrofusion or clustering mutations while staying with the default createGenericSchedLive(). It also saves the callback back and forth in TargetInstrInfo::enableClusterLoads()/enableClusterStores(). Differential Revision: https://reviews.llvm.org/D26986 llvm-svn: 288057
*	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition ↵	Stanislav Mekhanoshin	2016-11-28	3	-9/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	copies Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32 and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a propagation of source SGPR pair in place of v_cmp is implemented. Additional side effect of this is that we may consume fewer VGPRs at a cost of more SGPRs when multiple conditions need to be held, and that is a clear win in most cases. Differential Revision: https://reviews.llvm.org/D26114 llvm-svn: 288053
*	Revert r287553: [CodeGenPrep] Skip merging empty case blocks	Joerg Sonnenberger	2016-11-28	1	-135/+32
\| \| \| \| \| \| \|	It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line 670 ("Expected irreducible CFG"). llvm-svn: 288052
*	[StructurizeCFG] Use range-based for loops.	Justin Lebar	2016-11-28	1	-72/+51
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D27000 llvm-svn: 288051
*	[StructurizeCFG] Refactor NearestCommonDominator.	Justin Lebar	2016-11-28	1	-56/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As far as I can tell, doing our own computations in NearestCommonDominator is a false optimization -- DomTree will build up what appears to be exactly this data when it decides it's worthwhile. Moreover, by building the cache ourselves, we cannot take advantage of the cache that the domtree might have available. In addition, I am not convinced of the correctness of the original code. In particular, setting ResultIndex = 1 on the first addBlock instead of setting it to 0 is quite fishy. Similarly, it's not clear to me that setting IndexMap[Node] = 0 for every node as we walk up the tree finding a common parent is correct. But rather than ponder over these questions, I'd rather just make the code do the obviously-correct thing. This patch also changes the NearestCommonDominator API a bit, improving the names and getting rid of the boolean parameter in addBlock -- see http://jlebar.com/2011/12/16/Boolean_parameters_to_API_functions_considered_harmful..html Reviewers: arsenm Subscribers: aemerson, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26998 llvm-svn: 288050
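As an illustration of the "obviously-correct thing" this message refers to, the following is a minimal sketch of finding a nearest common dominator by walking up an immediate-dominator tree. It is not the patch's code and does not use LLVM's DominatorTree API; the `idom` and `depth` tables and the block names are hypothetical inputs made up for the example.

```python
def nearest_common_dominator(idom, depth, a, b):
    """Walk up the immediate-dominator tree until both blocks meet."""
    while a != b:
        if depth[a] < depth[b]:
            a, b = b, a  # keep `a` as the deeper (or equally deep) block
        a = idom[a]      # step the deeper block one level towards the root
    return a

# Hypothetical dominator tree: entry dominates left, right and merge;
# left dominates inner.
idom = {"entry": None, "left": "entry", "right": "entry",
        "merge": "entry", "inner": "left"}
depth = {"entry": 0, "left": 1, "right": 1, "merge": 1, "inner": 2}

ncd = nearest_common_dominator(idom, depth, "inner", "right")
```

No cache is kept here on purpose: the message argues that a hand-rolled cache duplicates what the dominator tree can already provide.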
*	[X86][SSE] Add initial support for combining (V)PMOVZX with shuffles.	Simon Pilgrim	2016-11-28	1	-0/+9
\| \| \| \|	llvm-svn: 288049
*	[GVN, OptDiag] Include the value that is forwarded in load elimination	Adam Nemet	2016-11-28	2	-7/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This requires some changes to the opt-diag API. Hal and I discussed this at the Dev Meeting and came up with a streaming delimiter (setExtraArgs) to solve this. Arguments after this delimiter are only included in the optimization records and not in the remarks printed in the compiler output. (Note how in the test the content of the YAML file changes but the remarks on the compiler output don't.) This implements the green GVN message with a bug fix at line http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446 The fix is that now we properly include the constant value in the message: "load of type i32 eliminated in favor of 7" Differential Revision: https://reviews.llvm.org/D26489 llvm-svn: 288047
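The delimiter idea described above can be modelled with a small toy class. This is only a sketch of the concept, not the opt-diag API: the `Remark` class, its method names, and where the delimiter sits in the example message are all assumptions made for illustration.

```python
class Remark:
    """Toy model of a remark whose trailing arguments are record-only."""

    def __init__(self):
        self.args = []
        self.first_extra = None  # index of the first record-only argument

    def add(self, text):
        self.args.append(str(text))
        return self  # allow chaining, loosely mimicking a stream

    def set_extra_args(self):
        # Everything added after this call goes to the record only.
        self.first_extra = len(self.args)
        return self

    def printed_message(self):
        # Slicing with None keeps all arguments when no delimiter was set.
        return "".join(self.args[:self.first_extra])

    def record(self):
        return "".join(self.args)

remark = (Remark().add("load of type i32 eliminated")
          .set_extra_args().add(" in favor of ").add(7))
```

In this model `printed_message()` stops at the delimiter while `record()` keeps the full text, which mirrors the note that the YAML content changes while the compiler output does not.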
*	[GVN] Basic optimization remark support	Adam Nemet	2016-11-28	2	-3/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Follow-on patches will add more interesting cases. The goal of this patch-set is to get the GVN messages printed in opt-viewer from Dhrystone as was presented in my Dev Meeting talk. This is the optimization view for the function (the last remark in the function has a bug which is fixed in this series): http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430 Differential Revision: https://reviews.llvm.org/D26488 llvm-svn: 288046
*	[x86] fix formatting; NFC	Sanjay Patel	2016-11-28	1	-16/+14
\| \| \| \|	llvm-svn: 288045
*	[LTO] Move finishOptimizationRemarks after codegen	Adam Nemet	2016-11-28	1	-2/+2
\| \| \| \| \| \|	This addresses the comment D26832. llvm-svn: 288041
*	[X86][SSE] Added support for combining bit-shifts with shuffles.	Simon Pilgrim	2016-11-28	1	-5/+57
\| \| \| \| \| \| \| \|	Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining. Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations. llvm-svn: 288040
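To make the first sentence concrete, here is a minimal sketch of how a shift by whole bytes corresponds to a shuffle mask. It assumes a whole-vector left shift on a 16-byte little-endian vector; the `ZERO` sentinel and the helper names are hypothetical and unrelated to the actual 'getFauxShuffleMask' implementation.

```python
ZERO = -1  # hypothetical sentinel meaning "this output byte is zero"

def byte_shift_left_mask(num_bytes, shift_bytes):
    """Shuffle mask equivalent to shifting a little-endian vector left by whole bytes."""
    return [ZERO if i < shift_bytes else i - shift_bytes
            for i in range(num_bytes)]

def apply_shuffle(src, mask):
    """Build the output by picking source bytes (or zero) according to the mask."""
    return [0 if m == ZERO else src[m] for m in mask]

src = list(range(1, 17))            # 16 distinct byte values
mask = byte_shift_left_mask(16, 3)  # shift left by 3 bytes (24 bits)
shuffled = apply_shuffle(src, mask)

# Reference result: the same shift done on the 128-bit integer value.
value = int.from_bytes(bytes(src), "little")
shifted = (value << 24) & ((1 << 128) - 1)
```

Because the shift is expressed as a mask, it can be composed with neighbouring shuffles in the same way as any other mask.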
*	Test commit	Daniel Cederman	2016-11-28	1	-1/+0
\| \| \| \|	llvm-svn: 288036
*	Revert "[DAG] Improve loads-from-store forwarding to handle TokenFactor"	Nirav Dave	2016-11-28	1	-13/+2
\| \| \| \| \| \|	This reverts commit r287773 which caused issues with ppc64le builds. llvm-svn: 288035
*	[SystemZ] Fix build bot fallout from r288030	Ulrich Weigand	2016-11-28	1	-1/+0
\| \| \| \| \| \| \|	Remove unused variable that came in due to a copy-and-paste bug and caused build bot failures. llvm-svn: 288033
*	[SystemZ] Support execution hint instructions	Ulrich Weigand	2016-11-28	14	-6/+158
\| \| \| \| \| \| \| \| \|	This adds assembler support for the instructions provided by the execution-hint facility (NIAI and BP(R)P). This required adding support for the new relocation types for 12-bit and 24-bit PC-relative offsets used by the BP(R)P instructions. llvm-svn: 288031
*	[SystemZ] Support load-and-trap instructions	Ulrich Weigand	2016-11-28	9	-7/+116
\| \| \| \| \| \| \|	This adds support for the instructions provided with the load-and-trap facility. llvm-svn: 288030
*	[SystemZ] Add remaining branch instructions	Ulrich Weigand	2016-11-28	8	-32/+153
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds assembler support for the remaining branch instructions: the non-relative branch on count variants, and all variants of branch on index. The only one of those that can be readily exploited for code generation is BRCTH (branch on count using a high 32-bit register as count). To use it, however, it is necessary to also introduce a new CHIMux pseudo to allow comparisons of a 32-bit value against a short immediate to go into a high register as well (implemented via CHI/CIH). This causes a bit of codegen changes overall, but those have proven to be neutral (or even beneficial) in performance measurements. llvm-svn: 288029
*	[SystemZ] Improve use of conditional instructions	Ulrich Weigand	2016-11-28	15	-147/+616
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch moves formation of LOC-type instructions from (late) IfConversion to the early if-conversion pass, and in some cases additionally creates them directly from select instructions during DAG instruction selection. To make early if-conversion work, the patch implements the canInsertSelect / insertSelect callbacks. It also implements the commuteInstructionImpl and FoldImmediate callbacks to enable generation of the full range of LOC instructions. Finally, the patch adds support for all instructions of the load-store-on-condition-2 facility, which allows using LOC instructions also for high registers. Due to the use of the GRX32 register class to enable high registers, we now also have to handle the cases where there are still no single hardware instructions (conditional move from a low register to a high register or vice versa). These are converted back to a branch sequence after register allocation. Since the expandRAPseudos callback is not allowed to create new basic blocks, this requires a simple new pass, modelled after the ARM/AArch64 ExpandPseudos pass. Overall, this patch causes significantly more LOC-type instructions to be used, and results in a measurable performance improvement. llvm-svn: 288028
*	[PM] Remove weird marking of invalidated analyses as "preserved".	Chandler Carruth	2016-11-28	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This never made a lot of sense. They've been invalidated for one IR unit but they aren't really preserved in any normal sense. It seemed like it would be an elegant way of communicating to outer IR units that pass managers and adaptors had already handled invalidation, but we've since ended up adding sets that model this more clearly: we're now using the 'AllAnalysesOn<IRUnitT>' set to handle cases where the trick of "preserving" invalidated analyses didn't work. This patch moves to rely on that technique exclusively and removes the cumbersome API aspect of updating the preserved set when doing invalidation. This in turn will simplify a number of upcoming patches. This has a side benefit of exposing a number of places where we were failing to mark the 'AllAnalysesOn<IRUnitT>' set as preserved. This patch fixes those, and with those fixes shouldn't change any observable behavior. llvm-svn: 288023
*	[ThreadPool] Rollback recent changes until I figure out the breakage.	Davide Italiano	2016-11-28	1	-2/+4
\| \| \| \|	llvm-svn: 288018
*	[ThreadPool] Simplify the interface. NFCI.	Davide Italiano	2016-11-28	1	-4/+2
\| \| \| \| \| \| \|	The callers don't use the return value. Found by Michael Spencer. llvm-svn: 288016
*	Revert "Improve error handling in YAML parsing"	Mehdi Amini	2016-11-28	1	-11/+9
\| \| \| \| \| \|	This reverts commit r288014, the unittest isn't passing llvm-svn: 288015
*	Improve error handling in YAML parsing	Mehdi Amini	2016-11-28	1	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Some scanner errors were not checked and reported by the parser. Fix PR30934 Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu> Differential Revision: https://reviews.llvm.org/D26419 llvm-svn: 288014
*	[X86][FMA4] Remove isCommutable from FMA4 scalar intrinsics. They aren't ↵	Craig Topper	2016-11-27	1	-1/+0
\| \| \| \| \| \|	commutable as operand 0 should pass its upper bits through to the output. llvm-svn: 288011
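Taking the description above at face value, the following toy model shows why swapping the multiplicands is not safe when operand 0 also supplies the upper elements of the result. The lane layout, integer values, and the `scalar_fma` helper are assumptions for illustration, not the intrinsic's real definition.

```python
def scalar_fma(a, b, c):
    """Lane 0 is a*b+c; the upper lanes are passed through from operand 0."""
    return [a[0] * b[0] + c[0]] + list(a[1:])

a = [2, 10, 11, 12]
b = [3, 20, 21, 22]
c = [4, 0, 0, 0]

original = scalar_fma(a, b, c)  # upper lanes come from a
swapped = scalar_fma(b, a, c)   # upper lanes now come from b
```

The low lane is the same in both results, but the upper lanes differ, so treating the operation as commutable would change the output vector.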
*	[X86][FMA] Add missing Predicates qualifier around scalar FMA intrinsic ↵	Craig Topper	2016-11-27	1	-6/+8
\| \| \| \| \| \|	patterns. llvm-svn: 288010
*	[X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions.	Craig Topper	2016-11-27	1	-0/+20
\| \| \| \|	llvm-svn: 288009
*	[X86] Add SHL by 1 to the load folding tables.	Craig Topper	2016-11-27	1	-0/+4
\| \| \| \| \| \|	I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and just represent the relationships. llvm-svn: 288007
*	[X86][SSE] Add support for combining target shuffles to 128/256-bit ↵	Simon Pilgrim	2016-11-27	1	-49/+22
\| \| \| \| \| \|	PSLL/PSRL bit shifts llvm-svn: 288006
*	[InstSimplify] allow integer vector types to use computeKnownBits	Sanjay Patel	2016-11-27	2	-7/+7
\| \| \| \| \| \| \| \|	Note that the non-splat lshr+lshr test folded, but that does not work in general. Something is missing or wrong in computeKnownBits as the non-splat shl+shl test still shows. llvm-svn: 288005
*	[AVX-512] Add integer and fp unpck instructions to load folding tables.	Craig Topper	2016-11-27	1	-0/+108
\| \| \| \|	llvm-svn: 288004
*	[X86][SSE] Split lowerVectorShuffleAsShift ready for combines. NFCI.	Simon Pilgrim	2016-11-27	1	-31/+60
\| \| \| \| \| \|	Moved most of matching code into matchVectorShuffleAsShift to share with target shuffle combines (in a future commit). llvm-svn: 288003
*	[X86] Add TB_NO_REVERSE to entries in the load folding table where the ↵	Craig Topper	2016-11-27	1	-188/+206
\| \| \| \| \| \| \| \| \| \|	instruction's load size is smaller than the register size. If we were to unfold these, the load size would be increased to the register size. This is not safe to do since the enlarged load can do things like cross a page boundary into a page that doesn't exist. I probably missed some instructions, but this should be a large portion of them. llvm-svn: 288001
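The page-boundary concern can be checked with simple address arithmetic. This is a sketch under the assumption of 4096-byte pages; the address and the `crosses_page` helper are made up for the example.

```python
PAGE_SIZE = 4096  # assumed page size

def crosses_page(addr, size):
    """True if a load of `size` bytes at `addr` touches more than one page."""
    return addr // PAGE_SIZE != (addr + size - 1) // PAGE_SIZE

addr = 0x1FFC                   # 4 bytes before the end of a page
narrow = crosses_page(addr, 4)  # the original 4-byte load
wide = crosses_page(addr, 16)   # the same load enlarged to a 16-byte register
```

Here the 4-byte load stays inside its page while the enlarged 16-byte load spills into the next one, which may not be mapped.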
*	fix formatting; NFC	Sanjay Patel	2016-11-27	1	-13/+15
\| \| \| \|	llvm-svn: 287997
*	[AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables.	Craig Topper	2016-11-27	1	-0/+84
\| \| \| \|	llvm-svn: 287995
*	[X86] Remove alignment restrictions from load folding table for some ↵	Craig Topper	2016-11-27	1	-13/+13
\| \| \| \| \| \| \| \|	instructions that don't have a restriction. Most of these are the SSE4.1 PMOVZX/PMOVSX instructions which all read less than 128-bits. The only other was PMOVUPD which by definition is an unaligned load. llvm-svn: 287991
*	[X86] Remove hasOneUse check that is redundant with the one in ↵	Craig Topper	2016-11-26	1	-2/+0
\| \| \| \| \| \|	IsProfitableToFold. llvm-svn: 287987
*	[X86] Fix the zero extending load detection in ↵	Craig Topper	2016-11-26	1	-11/+12
\| \| \| \| \| \| \| \|	X86DAGToDAGISel::selectScalarSSELoad to pass the load node to IsProfitableToFold and IsLegalToFold. Previously we were passing the SCALAR_TO_VECTOR node. llvm-svn: 287986
*	[X86] Simplify control flow. NFCI	Craig Topper	2016-11-26	1	-3/+2
\| \| \| \|	llvm-svn: 287985
*	[X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load ↵	Craig Topper	2016-11-26	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	from being folded multiple times. Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied. Reviewers: RKSimon, zvi, delena, spatel Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D26790 llvm-svn: 287983
*	[InstCombine] don't drop metadata in FoldOpIntoSelect()	Sanjay Patel	2016-11-26	1	-3/+3
\| \| \| \|	llvm-svn: 287980
*	add optional param to copy metadata when creating selects; NFC	Sanjay Patel	2016-11-26	1	-7/+3
\| \| \| \| \| \| \| \| \| \| \|	There are other spots where we can use this; we're currently dropping metadata in some places, and there are proposed changes where we will want to propagate metadata. IRBuilder's CreateSelect() already has a parameter like this, so this change makes the regular 'Create' API line up with that. llvm-svn: 287976
*	[AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables.	Craig Topper	2016-11-26	1	-0/+36
\| \| \| \|	llvm-svn: 287975
*	[AVX-512] Add masked 128/256-bit integer add/sub instructions to load ↵	Craig Topper	2016-11-26	1	-0/+64
\| \| \| \| \| \|	folding tables. llvm-svn: 287974
*	[AVX-512] Add masked 512-bit integer add/sub instructions to load folding ↵	Craig Topper	2016-11-26	1	-0/+31
\| \| \| \| \| \|	tables. llvm-svn: 287972
*	[AVX-512] Teach LowerFormalArguments to use the extended register class when ↵	Craig Topper	2016-11-26	1	-4/+4
\| \| \| \| \| \|	available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change. llvm-svn: 287971
*	[AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables.	Craig Topper	2016-11-26	1	-0/+8
\| \| \| \|	llvm-svn: 287970
*	AMDGPU/SI: Use float as the operand type for amdgcn.interp intrinsics	Tom Stellard	2016-11-26	2	-2/+4
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26724 llvm-svn: 287962
*	[X86][XOP] Add a reversed reg/reg form for VPROT instructions.	Craig Topper	2016-11-26	1	-0/+7
\| \| \| \| \| \|	The W bit distinguishes which operand is the memory operand. But if the mod bits are 3 then the memory operand is a register and there are two possible encodings. We already did this correctly for several other XOP instructions. llvm-svn: 287961
*	[X86] Add SSE, AVX, and AVX2 version of MOVDQU to the load/store folding ↵	Craig Topper	2016-11-26	1	-0/+6
\| \| \| \| \| \| \| \|	tables for consistency. Not sure this is truly needed but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents. So this just makes it complete. llvm-svn: 287960
*	[AVX-512] Put the AVX-512 sections of the load folding tables into mostly ↵	Craig Topper	2016-11-25	1	-365/+373
\| \| \| \| \| \|	alphabetical order. This is consistent with the older sections of the table. NFC llvm-svn: 287956