bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[PPC] Add intrinsics for vector extract word and vector insert word.	Sean Fertile	2016-12-09	1	-0/+18
	Revision: https://reviews.llvm.org/D26547 llvm-svn: 289227
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵	Nirav Dave	2016-12-09	6	-115/+89
	UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2016-12-09	6	-89/+115
	enabled. Retrying after fixing overly aggressive load-store forwarding optimization. Simplify Consecutive Merge Store Candidate Search. Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289221
*	[PowerPC] Improvements for BUILD_VECTOR Vol. 4	Nemanja Ivanovic	2016-12-06	3	-22/+4876
	This is the final patch in the series of patches that improves BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations to remove redundant instructions. It also adds a large test case which encompasses a large set of code patterns that build vectors - this test case was the motivator for this series of patches. Differential Revision: https://reviews.llvm.org/D26066 llvm-svn: 288800
*	DAG: Fold out out of bounds insert_vector_elt	Matt Arsenault	2016-12-03	1	-25/+32
	getNode already prevents formation of out of bounds constant extract_vector_elts. Do the same for insert_vector_elt. llvm-svn: 288603
*	Revert https://reviews.llvm.org/rL287679	Nemanja Ivanovic	2016-11-29	5	-83/+59
	This commit caused some miscompiles that did not show up on any of the bots. Reverting until we can investigate the cause of those failures. llvm-svn: 288214
*	[PowerPC] Improvements for BUILD_VECTOR Vol. 1	Nemanja Ivanovic	2016-11-29	3	-20/+14
	This patch corresponds to review: https://reviews.llvm.org/D25912 This is the first patch in a series of 4 that improve the lowering and combining for BUILD_VECTOR nodes on PowerPC. llvm-svn: 288152
*	Revert "[DAG] Improve loads-from-store forwarding to handle TokenFactor"	Nirav Dave	2016-11-28	3	-33/+21
	This reverts commit r287773 which caused issues with ppc64le builds. llvm-svn: 288035
*	[DAG] Improve loads-from-store forwarding to handle TokenFactor	Nirav Dave	2016-11-23	3	-21/+33
	Forward store values to matching loads down through token factors. Factored from D14834. Reviewers: jyknight, hfinkel Subscribers: hfinkel, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D26080 llvm-svn: 287773
*	[PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE	Nemanja Ivanovic	2016-11-22	5	-59/+83
	This patch corresponds to review: https://reviews.llvm.org/D26861 It also fixes PR30730. Committing on behalf of Lei Huang. llvm-svn: 287679
*	[Power9] Add patterns for vnegd, vnegw	Ehsan Amiri	2016-11-18	1	-0/+22
	Exploit new instructions by adding patterns to .td file. https://reviews.llvm.org/D26551 llvm-svn: 287334
*	[PPC][DAGCombine] Convert SETCC to subtract when the result is zero extended	Ehsan Amiri	2016-11-18	1	-0/+96
	When we see a SETCC whose only users are zero extend operations, we can replace it with a subtraction. This results in doing all calculations in GPRs and avoids CR use. Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are ways that this can be extended, for example to signed condition codes. In that case we will be introducing additional sign extend instructions, so more careful profitability analysis may be required. Another direction to extend this is for equal, not equal conditions. Also when users of SETCC are any_ext or sign_ext, we might be able to do something similar. llvm-svn: 287329
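The ULT case of the idea above can be modelled in a few lines. This is a minimal sketch in Python, assuming 32-bit operands that are already zero-extended into 64-bit registers; the function name `zext_setult_via_sub` is hypothetical, and the exact instruction sequence the patch emits is not shown in this log.

```python
MASK64 = (1 << 64) - 1

def zext_setult_via_sub(a: int, b: int) -> int:
    """Model zext(a <u b) for 32-bit unsigned a and b without a compare."""
    # Assumption: both operands are already zero-extended to 64 bits.
    assert 0 <= a < (1 << 32) and 0 <= b < (1 << 32)
    diff = (a - b) & MASK64  # 64-bit wrapping subtract, as in a GPR
    return diff >> 63        # the sign bit is 1 exactly when a < b
```

Because the operands occupy only the low 32 bits, the 64-bit difference cannot overflow, so its sign bit alone carries the comparison result.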
*	Always use relative jump table encodings on PowerPC64.	Joerg Sonnenberger	2016-11-16	1	-8/+27
	For the default, small and medium code model, use the existing difference from the jump table towards the label. For all other code models, set up the picbase and use the difference between the picbase and the block address. Overall, this results in smaller data tables at the expense of one or two more arithmetic operations at the jump site. Given that we only create jump tables with a lot more than two entries, it is a net win in size. For larger code models the assumption remains that individual functions are no larger than 2GB. Differential Revision: https://reviews.llvm.org/D26336 llvm-svn: 287059
*	vector load store with length (left justified) llvm portion	Zaara Syeda	2016-11-15	1	-0/+46
	llvm-svn: 286993
*	[PowerPC] Implement BE VSX load/store builtins - llvm portion.	Tony Jiang	2016-11-15	1	-0/+48
	This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE, they behave exactly the same as vec_xl and vec_xst, therefore they are simply implemented by defining a matching macro. On LE, they are implemented by defining new builtins and intrinsics. For int/float/long long/double, it is just a load (lxvw4x/lxvd2x) or store (stxvw4x/stxvd2x). For char/char/short, we also need some extra shuffling before or after calling the builtins to get the desired BE order. For int128, simply call vec_xl or vec_xst. llvm-svn: 286967
*	[PPC] Add intrinsic mapping to the xscvhpsp instruction	Sean Fertile	2016-11-14	1	-0/+11
	Add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to Single-Precision' instruction. Differential review: https://reviews.llvm.org/D26536 llvm-svn: 286862
*	[PPC] add intrinsics for vec extract exp/significand and vec test data class.	Sean Fertile	2016-11-14	1	-0/+71
	Differential Revision: https://reviews.llvm.org/D26272 llvm-svn: 286829
*	[PowerPC] Add remaining vector permute builtins in altivec.h - LLVM portion	Nemanja Ivanovic	2016-11-11	1	-0/+70
	This patch corresponds to review: https://reviews.llvm.org/D26480 Adds all the intrinsics used for various permute builtins that will be added to altivec.h. llvm-svn: 286638
*	[PowerPC] Add vector conversion builtins to altivec.h - LLVM portion	Nemanja Ivanovic	2016-11-11	1	-0/+82
	This patch corresponds to review: https://reviews.llvm.org/D26307 Adds all the intrinsics used for various conversion builtins that will be added to altivec.h. These are type conversions between various types of vectors. llvm-svn: 286596
*	ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps()	Matthias Braun	2016-11-11	1	-1/+5
	addSchedBarrierDeps() is supposed to add use operands to the ExitSU node. The current implementation adds uses for call/barrier instructions and the MBB live-outs in all other cases. The use operands of conditional jump instructions were missed. Also added code to macrofusion to set the latencies between nodes to zero to avoid problems with the fusing nodes lingering around in the pending list now. Differential Revision: https://reviews.llvm.org/D25140 llvm-svn: 286544
*	[PowerPC] Implement vector shift builtins - llvm portion	Nemanja Ivanovic	2016-11-01	1	-0/+23
	This patch corresponds to review https://reviews.llvm.org/D26095. Committing on behalf of Tony Jiang. llvm-svn: 285681
*	[PPC] add absolute difference altivec instructions and matching intrinsics	Nemanja Ivanovic	2016-10-31	1	-0/+40
	This patch corresponds to review https://reviews.llvm.org/D26072. Committing on behalf of Sean Fertile. llvm-svn: 285627
*	Implement vector count leading/trailing bytes with zero lsb and vector parity	Nemanja Ivanovic	2016-10-28	1	-0/+55
	builtins - llvm portion. This patch corresponds to review https://reviews.llvm.org/D26003. Committing on behalf of Zaara Syeda. llvm-svn: 285434
*	[PPC] Adding the removed testcase again	Ehsan Amiri	2016-10-27	1	-0/+71
	This testcase was originally part of r284995, but I put it in the wrong directory, so I removed it. Before adding it back I made some small enhancements. I also changed the assertions a little, to take into account the impact of some changes made since the code review was done. This is similar to changes done for another testcase in the original commit. See: https://reviews.llvm.org/D23614#577749 Basically, instead of vxor we now generate xxlxor in some cases, which is better. llvm-svn: 285333
*	[PowerPC] - No SExt/ZExt needed for count trailing zeros	Nemanja Ivanovic	2016-10-27	1	-0/+54
	This patch corresponds to review: https://reviews.llvm.org/D25896 It just eliminates the redundant ZExt after a count trailing zeros instruction. llvm-svn: 285267
*	Do not assume that FP vector operands are never legalized by expanding	Nemanja Ivanovic	2016-10-26	1	-0/+74
	This patch ensures that if a floating point vector operand is legalized by expanding, it is legalized through the stack rather than by calling DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand is a non-integer type. This fixes PR 30715. llvm-svn: 285231
*	[PowerPC] Implement vec_insert_exp builtins - llvm portion	Nemanja Ivanovic	2016-10-26	1	-0/+24
	This revision corresponds to review: https://reviews.llvm.org/D25957. Committing on behalf of Zaara Syeda. llvm-svn: 285225
*	[PPC] Generate positive FP zero using xor insn instead of loading from ↵	Ehsan Amiri	2016-10-24	3	-10/+10
	constant area. https://reviews.llvm.org/D23614 Currently we load +0.0 from the constant area. It can instead be generated using an XOR instruction. llvm-svn: 284995
*	[PPC] Better codegen for AND, ANY_EXT, SRL sequence	Ehsan Amiri	2016-10-24	1	-0/+29
	https://reviews.llvm.org/D24924 This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targeted fix for this special pattern. The pattern is generated by the target-independent DAG combiner, so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling them are discussed in the code review. llvm-svn: 284983
*	[DAG] optimize negation of bool	Sanjay Patel	2016-10-19	1	-7/+6
	Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG. With the mask, this should be no worse than 2 shifts. The mask can be eliminated in some cases, so that should be better than 2 shifts. This change exposed some missing folds related to negation: https://reviews.llvm.org/rL284239 https://reviews.llvm.org/rL284395 There may be others, so please let me know if you see any regressions. Differential Revision: https://reviews.llvm.org/D25485 llvm-svn: 284611
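The equivalence between the two legalizations can be checked with a small model. This is a sketch in Python, assuming a 32-bit register width; both function names are hypothetical and only illustrate the arithmetic, not the DAG nodes the combiner actually builds.

```python
MASK32 = (1 << 32) - 1

def sext_i1_two_shifts(x: int) -> int:
    """Sign-extend bit 0 using shift-left 31 then arithmetic shift-right 31."""
    shifted = (x << 31) & MASK32
    # An arithmetic right shift by 31 replicates the top bit into every bit.
    return MASK32 if (shifted >> 31) else 0

def sext_i1_mask_negate(x: int) -> int:
    """Sign-extend bit 0 using a mask followed by a negation."""
    return (-(x & 1)) & MASK32
```

When the value is already known to be 0 or 1, the `x & 1` step is redundant, which corresponds to the case where the mask can be eliminated.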
*	PowerPC: specify full triple to avoid different Darwin asm syntax.	Tim Northover	2016-10-14	1	-1/+1
	llvm-svn: 284281
*	[PowerPC] add tests for PR30661	Sanjay Patel	2016-10-14	1	-0/+26
	llvm-svn: 284279
*	[PPC] Shorter sequence to load 64bit constant with same hi/lo words	Guozhi Wei	2016-10-14	1	-0/+11
	This is a patch to implement pr30640. When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into the high word of the same register. This optimization caused a failure of test case bperm.ll because of a non-optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and ORs them together. This optimization lowers the cost of loading the 64bit constant mask used in the AND method, and causes a different code sequence. But the ROTATE method is actually better in this test case. The reason is that in the ROTATE method the final OR operation can be avoided, since rldimi can insert the rotated bits into the target register directly. So this patch also enhances SelectAndParts64 to prefer the ROTATE method when the two methods have the same cost and there are multiple bit groups that need to be ORed together. Differential Revision: https://reviews.llvm.org/D25521 llvm-svn: 284276
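The rldimi trick described above can be modelled directly. This is a sketch in Python of the rotate-and-insert semantics with IBM bit numbering (bit 0 is the most significant bit); the helper names are hypothetical, `ibm_mask` only handles the non-wrapping case, and the instructions that materialize the low word are left out.

```python
MASK64 = (1 << 64) - 1

def rotl64(x: int, n: int) -> int:
    """Rotate a 64-bit value left by n bits."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64

def ibm_mask(mb: int, me: int) -> int:
    """Bits mb..me set, IBM numbering. Assumption: mb <= me (no wrap-around)."""
    assert 0 <= mb <= me <= 63
    return ((1 << (me - mb + 1)) - 1) << (63 - me)

def rldimi(ra: int, rs: int, sh: int, mb: int) -> int:
    """Rotate rs left by sh and insert it into ra under mask mb..(63 - sh)."""
    m = ibm_mask(mb, 63 - sh)
    return (rotl64(rs, sh) & m) | (ra & ~m & MASK64)

def load_const_same_halves(c: int) -> int:
    """Build a 64-bit constant whose high and low words are equal."""
    lo = c & 0xFFFFFFFF
    assert c == ((lo << 32) | lo), "high and low words must match"
    reg = lo  # stands in for the short sequence that loads the low word
    return rldimi(reg, reg, 32, 0)
```

The same insert-under-mask behaviour is why the ROTATE method can skip the final OR: the rotated bit group lands in the target register without disturbing the bits outside the mask.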
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵	Nirav Dave	2016-10-13	6	-115/+89
	UseAA is enabled." This reverts commit r284151 which appears to be triggering LTO failures on Hexagon llvm-svn: 284157
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2016-10-13	6	-89/+115
	enabled. Retrying after upstream changes. Simplify Consecutive Merge Store Candidate Search. Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 284151
*	[PPCMIPeephole] Fix splat elimination	Tim Shen	2016-10-12	1	-0/+24
	Summary: In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation: "B = Splat A; C = Splat B" => "C = Splat A", because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not: "B = Splat A; C = Splat B" => "B = Splat A; C = COPY B". Fixes PR30663. Reviewers: echristo, iteratee, kbarton, nemanjai Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D25493 llvm-svn: 283961
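The safe form of the rewrite can be illustrated on a toy instruction list. This is a sketch in Python, assuming SSA-style instructions represented as `(dest, opcode, src)` tuples and ignoring splat lane operands; `rewrite_redundant_splats` is a hypothetical name, not the function in PPCMIPeephole.

```python
def rewrite_redundant_splats(instrs):
    """Turn a splat of an already-splatted value into a COPY.

    The first splat is kept untouched, so any use of its result that sits
    between the two splats still sees a defined value.
    """
    splatted = set()  # destinations known to hold a splat result
    result = []
    for dest, opcode, src in instrs:
        if opcode == "Splat" and src in splatted:
            result.append((dest, "COPY", src))
        else:
            result.append((dest, opcode, src))
        if opcode == "Splat":
            splatted.add(dest)
    return result
```

For example, `[("B", "Splat", "A"), ("X", "Use", "B"), ("C", "Splat", "B")]` keeps the definition and the intermediate use of `B` and only changes the last entry to `("C", "COPY", "B")`.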
*	Codegen: Tail-duplicate during placement.	Kyle Butt	2016-10-11	4	-11/+188
	The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement. In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places. This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, especially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough. Issue from previous rollback fixed, and a new test was added for that case as well. Issue was worklist/scheduling/taildup issue in layout. Issue from 2nd rollback fixed, with 2 additional tests. Issue was tail merging/loop info/tail-duplication causing issue with loops that share a header block. Issue with early tail-duplication of blocks that branch to a fallthrough predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll Differential revision: https://reviews.llvm.org/D18226 llvm-svn: 283934
*	Revert "Codegen: Tail-duplicate during placement."	Daniel Jasper	2016-10-11	4	-188/+11
	This reverts commit r283842. test/CodeGen/X86/tail-dup-repeat.ll causes an llc crash with our internal testing. I'll share a link with you. llvm-svn: 283857
*	Codegen: Tail-duplicate during placement.	Kyle Butt	2016-10-11	4	-11/+188
	The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement. In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places. This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, especially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough. Issue from previous rollback fixed, and a new test was added for that case as well. Issue was worklist/scheduling/taildup issue in layout. Issue from 2nd rollback fixed, with 2 additional tests. Issue was tail merging/loop info/tail-duplication causing issue with loops that share a header block. Issue with early tail-duplication of blocks that branch to a fallthrough predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll Differential revision: https://reviews.llvm.org/D18226 llvm-svn: 283842
*	Revert "Codegen: Tail-duplicate during placement."	Kyle Butt	2016-10-08	3	-123/+11
	This reverts commit 71c312652c10f1855b28d06697c08d47e7a243e4. llvm-svn: 283647
*	Codegen: Tail-duplicate during placement.	Kyle Butt	2016-10-07	3	-11/+123
	The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement. In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places. This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, especially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough. Issue from previous rollback fixed, and a new test was added for that case as well. Issue was worklist/scheduling/taildup issue in layout. Issue from 2nd rollback fixed, with 2 additional tests. Issue was tail merging/loop info/tail-duplication causing issue with loops that share a header block. Differential revision: https://reviews.llvm.org/D18226 llvm-svn: 283619
*	[DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs	Michael Kuperstein	2016-10-06	1	-5/+4
	This generalizes the build_vector -> vector_shuffle combine to support any number of inputs. The idea is to create a binary tree of shuffles, where the first layer performs pairwise shuffles of the input vectors placing each input element into the correct lane, and the rest of the tree blends these shuffles together. This doesn't try to be smart and create any sort of "optimal" shuffles. The assumption is that even a "poor" shuffle sequence is better than extracting and inserting the elements one by one. Differential Revision: https://reviews.llvm.org/D24683 llvm-svn: 283480
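The binary tree of shuffles described above can be sketched on plain lists. This is a minimal model in Python, assuming each output lane is described by a hypothetical `(input_index, lane)` pair and that at least one input vector is given; the `UNDEF` sentinel and the function name are illustrative and do not mirror the DAG combiner's data structures.

```python
UNDEF = object()  # marks a lane that a partial shuffle leaves undefined

def build_vector_via_shuffles(inputs, elements):
    """Build the output vector with pairwise shuffles, then blend them.

    inputs:   list of equal-length source vectors (at least one)
    elements: one (input_index, lane) pair per output lane
    """
    # First layer: one shuffle per pair of inputs, placing every element
    # that comes from that pair directly into its final lane.
    layer = []
    for first in range(0, len(inputs), 2):
        pair = (first, first + 1)
        layer.append([inputs[src][lane] if src in pair else UNDEF
                      for src, lane in elements])
    # Remaining layers: blend neighbouring partial results until one is left.
    while len(layer) > 1:
        blended = []
        for i in range(0, len(layer), 2):
            if i + 1 < len(layer):
                blended.append([a if a is not UNDEF else b
                                for a, b in zip(layer[i], layer[i + 1])])
            else:
                blended.append(layer[i])  # odd one out moves up unchanged
        layer = blended
    return layer[0]
```

With three inputs `[1, 2, 3, 4]`, `[5, 6, 7, 8]`, `[9, 10, 11, 12]` and elements `[(2, 0), (0, 1), (1, 3), (0, 0)]`, the first layer yields two partial vectors and a single blend produces `[9, 2, 8, 1]`.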
*	Revert "Codegen: Tail-duplicate during placement."	Kyle Butt	2016-10-05	3	-123/+11
	This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc. Issue with loop info and block removal revealed by polly. I have a fix for this issue already in another patch; I'll re-roll this together with that fix, and a test case. llvm-svn: 283292
*	Codegen: Tail-duplicate during placement.	Kyle Butt	2016-10-04	3	-11/+123
	The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement. In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places. This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, especially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough. Issue from previous rollback fixed, and a new test was added for that case as well. Differential revision: https://reviews.llvm.org/D18226 llvm-svn: 283274
*	[Target] move reciprocal estimate settings from TargetOptions to TargetLowering	Sanjay Patel	2016-10-04	1	-15/+31
	The motivation for the change is that we can't have pseudo-global settings for codegen living in TargetOptions because that doesn't work with LTO. Ideally, these reciprocal attributes will be moved to the instruction-level via FMF, metadata, or something else. But making them function attributes is at least an improvement over the current state. The ingredients of this patch are: Remove the reciprocal estimate command-line debug option. Add TargetRecip to TargetLowering. Remove TargetRecip from TargetOptions. Clean up the TargetRecip implementation to work with this new scheme. Set the default reciprocal settings in TargetLoweringBase (everything is off). Update the PowerPC defaults, users, and tests. Update the x86 defaults, users, and tests. Note that if this patch needs to be reverted, the related clang patch checked in at r283251 should be reverted too. Differential Revision: https://reviews.llvm.org/D24816 llvm-svn: 283252
*	[Power9] Exploit D-Form VSX Scalar memory ops that target full VSX register set	Nemanja Ivanovic	2016-10-04	14	-33/+259
	This patch corresponds to review: The newly added VSX D-Form (register + offset) memory ops target the upper half of the VSX register set. The existing ones target the lower half. In order to unify these and have the ability to target all the VSX registers using D-Form operations, this patch defines Pseudo-ops for the loads/stores which are expanded post-RA. The expansion then chooses the correct opcode based on the register that was allocated for the operation. llvm-svn: 283212
*	Fix a test case failure on Apple PPC.	Nemanja Ivanovic	2016-10-04	1	-4/+4
	llvm-svn: 283191
*	[Power9] Part-word VSX integer scalar loads/stores and sign extend instructions	Nemanja Ivanovic	2016-10-04	15	-238/+1307
	This patch corresponds to review: https://reviews.llvm.org/D23155 This patch removes the VSHRC register class (based on D20310) and adds exploitation of the Power9 sub-word integer loads into VSX registers as well as vector sign extensions. The new instructions are useful for a few purposes: Int to Fp conversions of 1 or 2-byte values loaded from memory; Building vectors of 1 or 2-byte integers with values loaded from memory; Storing individual 1 or 2-byte elements from integer vectors. This patch implements all of those uses. llvm-svn: 283190
*	Revert "Codegen: Tail-duplicate during placement."	Kyle Butt	2016-10-04	3	-123/+11
	This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2. Causing crashes on aarch64 build. llvm-svn: 283172
*	Codegen: Tail-duplicate during placement.	Kyle Butt	2016-10-04	3	-11/+123
	The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement. In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places. This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, especially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough. llvm-svn: 283164