bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵	Nirav Dave	2017-01-26	1	-0/+10
\| \| \| \| \| \| \| \|	UseAA is enabled." This reverts commit r293184 which is failing in LTO builds llvm-svn: 293188
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2017-01-26	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293184
*	AMDGPU: Fold fneg into round instructions	Matt Arsenault	2017-01-26	1	-1/+7
\| \| \| \|	llvm-svn: 293127
*	AMDGPU: Propagate fast math flags in fneg combines	Matt Arsenault	2017-01-23	1	-3/+3
\| \| \| \| \| \|	Can't for fma/mad since it seems they can't have flags currently. llvm-svn: 292818
*	AMDGPU/R600: Serialize vector trunc stores to private AS	Jan Vesely	2017-01-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Add DUMMY_CHAIN SDNode to denote stores of interest Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915 Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411 Differential Revision: https://reviews.llvm.org/D27964 llvm-svn: 292651
*	AMDGPU: Disable some fneg combines unless nsz	Matt Arsenault	2017-01-19	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \|	For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem it also applies there. fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either. llvm-svn: 292473
*	AMDGPU: Skip fneg/select combine if it can fold into other	Matt Arsenault	2017-01-12	1	-29/+40
\| \| \| \|	llvm-svn: 291792
*	AMDGPU: Fold free fneg into sin	Matt Arsenault	2017-01-12	1	-1/+5
\| \| \| \|	llvm-svn: 291790
*	AMDGPU: Fold fneg into fmul_legacy	Matt Arsenault	2017-01-12	1	-2/+5
\| \| \| \|	llvm-svn: 291784
*	AMDGPU: Fold fneg into rcp	Matt Arsenault	2017-01-12	1	-1/+7
\| \| \| \|	llvm-svn: 291779
*	AMDGPU: Fold fneg into fp_round	Matt Arsenault	2017-01-12	1	-2/+18
\| \| \| \|	llvm-svn: 291778
*	AMDGPU: Fold fneg into fp_extend	Matt Arsenault	2017-01-12	1	-0/+14
\| \| \| \|	llvm-svn: 291777
*	AMDGPU: Fold fneg into fma or fmad	Matt Arsenault	2017-01-12	1	-0/+24
\| \| \| \| \| \|	Patch mostly by Fiona Glaser llvm-svn: 291733
*	AMDGPU: Fold fneg into fmul	Matt Arsenault	2017-01-12	1	-0/+17
\| \| \| \| \| \|	Patch mostly by Fiona Glaser llvm-svn: 291732
*	AMDGPU: Fold fneg into fadd	Matt Arsenault	2017-01-12	1	-0/+60
\| \| \| \| \| \|	Patch mostly by Fiona Glaser llvm-svn: 291731
*	AMDGPU: Pull fneg/fabs out of a select	Matt Arsenault	2017-01-11	1	-0/+74
\| \| \| \| \| \|	Allows better source modifier usage. llvm-svn: 291729
*	AMDGPU: Add tests for HasMultipleConditionRegisters	Matt Arsenault	2017-01-10	1	-0/+7
\| \| \| \| \| \|	This was enabled without many specific tests or the comment. llvm-svn: 291586
*	AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes	Jan Vesely	2017-01-06	1	-10/+0
\| \| \| \| \| \| \| \|	This will make transition to SCRATCH_MEMORY easier Differential Revision: https://reviews.llvm.org/D24746 llvm-svn: 291279
*	AMDGPU/SI: Implement sendmsghalt intrinsic	Jan Vesely	2017-01-04	1	-0/+1
\| \| \| \| \| \| \| \|	v2: expose using amdgcn prefix Differential Revision: https://reviews.llvm.org/D23511 llvm-svn: 290977
*	AMDGPU: Invert cmp + select with constant	Matt Arsenault	2016-12-22	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \|	Canonicalize a select with a constant to the false side. This enables more instruction shrinking opportunities since an inline immediate can be used for the false side of v_cndmask_b32_e32. This seems to usually be better but causes some code size regressions in some tests. llvm-svn: 290372
*	AMDGPU: Update isFPImmLegal for f16	Matt Arsenault	2016-12-22	1	-1/+2
\| \| \| \| \| \|	I don't think this matters because ConstantFP is legal. llvm-svn: 290299
*	AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.*	Tom Stellard	2016-12-20	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle, mareko Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27834 llvm-svn: 290184
*	AMDGPU: Fix asserting on returned tail calls	Matt Arsenault	2016-12-15	1	-2/+4
\| \| \| \|	llvm-svn: 289868
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵	Nirav Dave	2016-12-14	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	UseAA is enabled." Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2016-12-14	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled. Retrying after fixing after removing load-store factoring through token factors in favor of improved token factor operand pruning Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. Whem merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289659
*	Replace APFloatBase static fltSemantics data members with getter functions	Stephan Bergmann	2016-12-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647
*	AMDGPU: Fix i128 mul	Matt Arsenault	2016-12-09	1	-0/+4
\| \| \| \|	llvm-svn: 289231
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵	Nirav Dave	2016-12-09	1	-0/+10
\| \| \| \| \| \| \| \|	UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2016-12-09	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled. Retrying after fixing overly aggressive load-store forwarding optimization. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. Whem merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289221
*	AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.	Tom Stellard	2016-12-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch By: Wei Ding Summary: This patch fixes the fdiv precision issues. Reviewers: b-sumner, cfang, wdng, arsenm Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26424 llvm-svn: 288879
*	AMDGPU: Refactor exp instructions	Matt Arsenault	2016-12-05	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Structure the definitions a bit more like the other classes. The main change here is to split EXP with the done bit set to a separate opcode, so we can set mayLoad = 1 so that it won't be reordered before the other exp stores, since this has the special constraint that if the done bit is set then this should be the last exp in she shader. Previously all exp instructions were inferred to have unmodeled side effects. llvm-svn: 288695
*	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition ↵	Stanislav Mekhanoshin	2016-11-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	copies Codegen prepare sinks comparisons close to a user is we have only one register for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32 and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a propagation of source SGPR pair in place of v_cmp is implemented. Additional side effect of this is that we may consume less VGPRs at a cost of more SGPRs in case if holding of multiple conditions is needed, and that is a clear win in most cases. Differential Revision: https://reviews.llvm.org/D26114 llvm-svn: 288053
*	AMDGPU: Fix f16 fabs/fneg	Matt Arsenault	2016-11-15	1	-4/+3
\| \| \| \|	llvm-svn: 286931
*	[AMDGPU] Add f16 support (VI+)	Konstantin Zhuravlyov	2016-11-13	1	-1/+55
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
*	Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate ↵	Stanislav Mekhanoshin	2016-11-11	1	-1/+0
\| \| \| \| \| \| \| \|	condition copies" This reverts commit r286171, it breaks piglit test fs-discard-exit-2 llvm-svn: 286530
*	[DAG Combiner] Fix the native computation of the Newton series for reciprocals	Evandro Menezes	2016-11-10	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \|	The generic infrastructure to compute the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly consider this condition if returned by a target. This patch addresses the issues to allow a target to compute the series on its own. Differential revision: https://reviews.llvm.org/D22975 llvm-svn: 286523
*	AMDGPU: Add VI i16 support	Tom Stellard	2016-11-10	1	-3/+24
\| \| \| \| \| \| \| \|	Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464
*	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition ↵	Stanislav Mekhanoshin	2016-11-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	copies Codegen prepare sinks comparisons close to a user is we have only one register for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a forward propagation of a v_cmp 64 bit result to an user is implemented. Additional side effect of this is that we may consume less VGPRs in a cost of more SGPRs in case if holding of multiple conditions is needed, and that is a clear win in most cases. llvm-svn: 286171
*	Revert "AMDGPU: Add VI i16 support"	Tom Stellard	2016-11-04	1	-24/+3
\| \| \| \| \| \|	This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995
*	AMDGPU: Add VI i16 support	Tom Stellard	2016-11-03	1	-3/+24
\| \| \| \| \| \| \| \|	Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939
*	[AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32	Konstantin Zhuravlyov	2016-11-01	1	-13/+11
\| \| \| \| \| \| \| \| \| \| \|	This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716
*	AMDGPU: Fix buildbots broken by r285704	Tom Stellard	2016-11-01	1	-2/+1
\| \| \| \|	llvm-svn: 285711
*	AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64	Tom Stellard	2016-11-01	1	-0/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I wanted to implement this as a target independent expansion, however when targets say they want to expand FP_TO_FP16 what they actually want is the unsafe math expansion when possible and expansion to a libcall in all other cases. The only way to make this work as a target independent would be to add logic to target's TargetLowering construction to mark theses nodes as Expand when LegalizeDAG can use the unsafe expansion and mark them as LibCall when it cannot. I think this would be possible, but I think it would be too fragile and complex as it would require targets to keep their expansion logic up to date with the code in LegalizeDAG. Reviewers: bogner, ab, t.p.northover, arsenm Subscribers: wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25999 llvm-svn: 285704
*	[AMDGPU] Expand vector mulhu/mulhs	Valery Pykhtin	2016-11-01	1	-0/+2
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D26077 llvm-svn: 285684
*	AMDGPU/SI: Fix crash caused by r284267	Tom Stellard	2016-10-21	1	-2/+3
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25782 llvm-svn: 284875
*	[Target] remove TargetRecip class; 2nd try	Sanjay Patel	2016-10-20	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef. This version also renames a pair of functions: getRecipEstimateDivEnabled() getRecipEstimateSqrtEnabled() as suggested by Eric Christopher. original commit msg: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284746
*	revert r284495: [Target] remove TargetRecip class	Sanjay Patel	2016-10-18	1	-4/+6
\| \| \| \| \| \|	There's something wrong with the StringRef usage while parsing the attribute string. llvm-svn: 284513
*	[Target] remove TargetRecip class; move reciprocal estimate isel ↵	Sanjay Patel	2016-10-18	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	functionality to TargetLowering This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284495
*	AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations	Tom Stellard	2016-10-14	1	-13/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff) Reviewers: arsenm, nhaehnle Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24672 llvm-svn: 284267
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵	Nirav Dave	2016-10-13	1	-0/+10
\| \| \| \| \| \| \| \| \|	UseAA is enabled." This reverts commit r284151 which appears to be triggering a LTO failures on Hexagon llvm-svn: 284157