path: root/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
Commit log, most recent first. Each entry gives the commit title, author, date, and diffstat (files changed, -deleted/+added lines).
...
* AMDGPU: Refactor argument lowering (Matt Arsenault, 2017-04-11, 1 file, -5/+18)
  Split into smaller functions and prepare for handling non-entry functions.
  llvm-svn: 299998
* AMDGPU: Stop using CCAssignToRegWithShadow (Matt Arsenault, 2017-04-06, 1 file, -0/+31)
  It does not do what it was being used for here, and required working around in LowerFormalArguments.
  llvm-svn: 299667
* AMDGPU: Remove llvm.SI.vs.load.input (Matt Arsenault, 2017-04-03, 1 file, -1/+0)
  llvm-svn: 299391
* AMDGPU: Remove legacy bfe intrinsics (Matt Arsenault, 2017-04-03, 1 file, -27/+0)
  llvm-svn: 299372
* AMDGPU: Remove unnecessary ands when f16 is legal (Matt Arsenault, 2017-03-31, 1 file, -2/+8)
  Add a new node that acts as a fancy bitcast from f16 operations to i32 and implicitly zeroes the high 16 bits of the result. Alternatively, we could try making v2f16 legal and canonicalizing on build_vectors.
  llvm-svn: 299246
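  Illustration (not from the commit; the function name is hypothetical): the bit-level contract such a node provides, modeled in plain C++. The raw f16 bits land in the low half of an i32 with the high half guaranteed zero, which is what makes later masking ands redundant.

    #include <cstdint>

    // Model of the node's semantics: zero-extend the raw f16 bits into
    // an i32. Bits 16-31 are known zero, so (result & 0xFFFF) == result.
    uint32_t f16BitsToI32(uint16_t halfBits) {
      return static_cast<uint32_t>(halfBits);
    }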
* [DAGCombiner] Add vector demanded elements support to ComputeNumSignBits (Simon Pilgrim, 2017-03-31, 1 file, -1/+2)
  Currently ComputeNumSignBits returns the minimum number of sign bits across all elements of vector data, when we may only be interested in one or some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about.

  The original ComputeNumSignBits implementation calls through with a DemandedElts mask demanding all elements, to match current behaviour. Scalar types set this to 1.

  So far only BUILD_VECTOR and EXTRACT_VECTOR_ELT are supported; all other nodes default to demanding all elements, but can be updated in due course.

  Followup to D25691.
  Differential Revision: https://reviews.llvm.org/D31311
  llvm-svn: 299219
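  A conceptual model of the DemandedElts idea (plain C++, not the LLVM implementation; names are illustrative): only elements whose bit is set in the demanded mask contribute to the minimum sign-bit count.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Count how many high bits of v equal its sign bit (at least 1).
    unsigned signBits(int32_t v) {
      unsigned n = 1;
      while (n < 32 && (((v >> (31 - n)) & 1) == ((v >> 31) & 1)))
        ++n;
      return n;
    }

    // Minimum sign-bit count over the demanded elements only; a scalar
    // caller would pass a mask of 1.
    unsigned computeNumSignBits(const std::vector<int32_t> &elts,
                                uint64_t demandedElts) {
      unsigned minBits = 32;
      for (size_t i = 0; i < elts.size(); ++i)
        if (demandedElts & (1ull << i))
          minBits = std::min(minBits, signBits(elts[i]));
      return minBits;
    }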
* [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode (Simon Pilgrim, 2017-03-31, 1 file, -1/+1)
  Follow-up to D25691: this sets up the plumbing necessary to support vector demanded elements in known-bits calculations for target nodes.
  Differential Revision: https://reviews.llvm.org/D31249
  llvm-svn: 299201
* [AMDGPU] Tidy up computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI. (Simon Pilgrim, 2017-03-29, 1 file, -13/+6)
  Based on a comment in D31249.
  llvm-svn: 298991
* [AMDGPU] Get address space mapping by target triple environment (Yaxun Liu, 2017-03-27, 1 file, -8/+6)
  Since we introduced the target triple environments amdgiz and amdgizcl, the address space values are no longer fixed enums; we have to decide them by target triple.

  The basic idea is to use a struct AMDGPUAS to represent the address space values. Values that do not depend on the target triple are static const members, so they occupy no extra memory and are equivalent to compile-time constants. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage, or added as a member of a pass and created at the beginning of its run* function.

  Differential Revision: https://reviews.llvm.org/D31284
  llvm-svn: 298846
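  A hedged sketch of the struct pattern the commit describes (field names and values are made up for illustration, not the actual AMDGPUAS definition): triple-dependent address spaces are ordinary members filled in from the triple, while fixed ones are static constants with no per-instance storage cost.

    struct AMDGPUASExample {
      unsigned FlatAS;    // value depends on the triple environment
      unsigned PrivateAS; // value depends on the triple environment
      static const unsigned LocalAS = 3; // fixed, compile-time constant
    };

    // Cheap to build on the fly at the point of use, as the commit notes.
    AMDGPUASExample getAS(bool isAmdgizEnv) {
      AMDGPUASExample AS;
      AS.FlatAS = isAmdgizEnv ? 0 : 4;    // illustrative values only
      AS.PrivateAS = isAmdgizEnv ? 5 : 0; // illustrative values only
      return AS;
    }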
* AMDGPU: Implement f16 fround (Matt Arsenault, 2017-03-24, 1 file, -13/+18)
  llvm-svn: 298730
* AMDGPU: Rename SI_RETURN (Matt Arsenault, 2017-03-21, 1 file, -2/+3)
  This is used for a specific type of return to a shader part's epilog code. Rename it to avoid confusion with a true call's return.
  llvm-svn: 298452
* AMDGPU: Cleanup control flow intrinsics (Matt Arsenault, 2017-03-17, 1 file, -0/+3)
  Move backend-internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list.

  It's surprising this was working before: fdiv.fast had the wrong number of parameters, the control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declarations and were closer to the types used for the patterns.

  The brcond lowering was changing the types, so introduce new nodes for those.
  llvm-svn: 298119
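  For reference, a minimal sketch of the API shape being adopted (the intrinsic chosen is only an example, and header layout varies across LLVM versions): Intrinsic::getDeclaration builds the correctly typed declaration from the intrinsic ID, instead of a hand-assembled type list.

    #include "llvm/IR/Intrinsics.h"
    #include "llvm/IR/Module.h"
    using namespace llvm;

    // Fetch (or create) the declaration with the signature the intrinsic
    // tables define, avoiding mismatches like the fdiv.fast one mentioned.
    Function *getFdivFastDecl(Module &M) {
      return Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_fdiv_fast);
    }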
* AMDGPU: Fix unnecessary ands when packing f16 vectors (Matt Arsenault, 2017-03-15, 1 file, -3/+16)
  computeKnownBits didn't handle fp_to_fp16, so it could not report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0.
  llvm-svn: 297873
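  A conceptual known-bits view of that target node (plain C++ model, not the LLVM KnownBits API): once the high 16 bits are known zero, an and with 0xFFFF is provably a no-op and can be removed.

    #include <cstdint>

    struct KnownBitsModel {
      uint32_t Zero; // bits known to be 0
      uint32_t One;  // bits known to be 1
    };

    // For the f16-in-i32 node: low 16 bits unknown (the f16 payload),
    // high 16 bits known zero by the node's definition.
    KnownBitsModel knownBitsForF16Node() {
      return {0xFFFF0000u, 0x00000000u};
    }

    // An and is redundant when it only clears bits already known zero.
    bool andIsRedundant(KnownBitsModel K, uint32_t mask) {
      return (~mask & ~K.Zero) == 0;
    }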
* AMDGPU: Constant fold rcp node (Matt Arsenault, 2017-03-08, 1 file, -2/+12)
  When doing the arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs.
  llvm-svn: 297248
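  The fold itself, modeled on plain doubles (a sketch; the real code folds ConstantFPSDNode operands in the DAG):

    #include <optional>

    // rcp(c) with a compile-time-known operand becomes the constant 1/c,
    // so no reciprocal instruction is left behind after arcp rewrites.
    std::optional<double> foldRcp(std::optional<double> constOperand) {
      if (!constOperand)
        return std::nullopt; // non-constant input: keep the rcp node
      return 1.0 / *constOperand;
    }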
* AMDGPU: Support v2i16/v2f16 packed operations (Matt Arsenault, 2017-02-27, 1 file, -3/+8)
  llvm-svn: 296396
* AMDGPU: Replace FMAD with FMA when denormals are enabled. (Wei Ding, 2017-02-24, 1 file, -1/+5)
  Differential Revision: http://reviews.llvm.org/D29958
  llvm-svn: 296186
* AMDGPU: Add cvt.pkrtz intrinsic (Matt Arsenault, 2017-02-22, 1 file, -0/+1)
  Convert llvm.SI.packf16 test uses.
  llvm-svn: 295797
* AMDGPU: Remove llvm.AMDGPU.clamp intrinsic (Matt Arsenault, 2017-02-21, 1 file, -11/+0)
  llvm-svn: 295789
* AMDGPU: Redefine clamp node as clamp 0.0-1.0 (Matt Arsenault, 2017-02-21, 1 file, -14/+45)
  Change the implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use them whether or not denormals are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp.
  llvm-svn: 295788
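  A sketch of the clamp the node now denotes (plain C++ model; the med3 connection comes from the commit's own note that min/max/med3 do not flush denormals):

    #include <algorithm>

    // clamp01(x) equals med3(x, 0.0, 1.0) for non-NaN x: the median of
    // the three values is exactly min(max(x, 0), 1). The hardware
    // dx10_clamp mode additionally sends NaN to 0.0, which this plain
    // C++ model does not.
    float clamp01(float x) {
      return std::min(std::max(x, 0.0f), 1.0f);
    }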
* AMDGPU: Add pass to expand memcpy/memmove/memset (Matt Arsenault, 2017-02-09, 1 file, -4/+5)
  llvm-svn: 294635
* AMDGPU: Fold fneg into fmin/fmax_legacy (Matt Arsenault, 2017-02-03, 1 file, -2/+24)
  llvm-svn: 293972
* AMDGPU: Fold fneg into fminnum/fmaxnum (Matt Arsenault, 2017-02-03, 1 file, -0/+30)
  llvm-svn: 293968
* AMDGPU: Check if users of fneg can fold mods (Matt Arsenault, 2017-02-02, 1 file, -4/+64)
  In multi-use cases this can save a few instructions.
  llvm-svn: 293962
* Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." (Nirav Dave, 2017-02-02, 1 file, -0/+10)
  This reverts commit r293893, which is miscompiling lua on ARM and breaking bootstrap on x86-windows.
  llvm-svn: 293915
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2017-02-02, 1 file, -10/+0)
  Recommitting after fixing an X86 inc/dec chain bug.

  Simplify the consecutive merge-store candidate search. Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis that only checks for parallel stores through the chain subgraph. This cleanly separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and of the generated code (except perhaps for some ARM cases where we correctly construct wider loads but then promote them to float operations, which requires more expensive constant generation). Some minor peephole optimizations deal with the improved SubDAG shapes (listed below).

  Additional minor changes:
  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Removes chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence.
  6. Forwards load-store values through TokenFactors containing {CopyToReg, CopyFromReg} values.
  7. Adds a peephole to convert a build_vector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll).
  8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are generated. This can be removed once appropriate checks are added.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes, as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll. Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  llvm-svn: 293893
* AMDGPU: Use source modifiers with f16->f32 conversions (Matt Arsenault, 2017-02-02, 1 file, -0/+42)
  The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them.
  llvm-svn: 293857
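  A sketch of the bit-operation recognition mentioned for targets without legal f16 (plain C++ over the raw f16 bits; the masks are the standard IEEE half sign-bit patterns):

    #include <cstdint>

    // On the 16 raw bits of an IEEE half, flipping the sign bit is fneg
    // and clearing it is fabs, so integer xor/and with these masks can
    // be re-expressed as FP source modifiers on the conversion.
    uint16_t f16NegBits(uint16_t bits) { return bits ^ 0x8000; }
    uint16_t f16AbsBits(uint16_t bits) { return bits & 0x7FFF; }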
* AMDGPU: Cleanup fmin/fmax legacy function (Matt Arsenault, 2017-02-01, 1 file, -11/+8)
  Use a more specific subtarget check and combine the hasOneUse checks.
  llvm-svn: 293726
* Re-commit AMDGPU/GlobalISel: Add support for simple shaders (Tom Stellard, 2017-01-30, 1 file, -0/+6)
  Fix the build when GlobalISel is disabled, and fix a warning.

  Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.
  Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm
  Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris
  Differential Revision: https://reviews.llvm.org/D26730
  llvm-svn: 293551
* Revert "AMDGPU/GlobalISel: Add support for simple shaders" (Tom Stellard, 2017-01-30, 1 file, -6/+0)
  This reverts commit r293503. Revert while I investigate some of the buildbot failures.
  llvm-svn: 293509
* AMDGPU/GlobalISel: Add support for simple shaders (Tom Stellard, 2017-01-30, 1 file, -0/+6)
  Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.
  Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm
  Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris
  Differential Revision: https://reviews.llvm.org/D26730
  llvm-svn: 293503
* Cleanup dump() functions. (Matthias Braun, 2017-01-28, 1 file, -1/+1)
  We had various variants of defining dump() functions in LLVM. Normalize them; this consistently implements the conventions discussed in http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

  For reference:
  - Public headers should just declare the dump() method, but not use LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP).
  - The definition of a dump method should follow the pattern set off as code below.
  llvm-svn: 293359
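  The definition pattern the commit message spells out, set off as code here for readability:

    #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
    LLVM_DUMP_METHOD
    void MyClass::dump() {
      // print stuff to dbgs()...
    }
    #endif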
* Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." (Nirav Dave, 2017-01-26, 1 file, -0/+10)
  This reverts commit r293184, which is failing in LTO builds.
  llvm-svn: 293188
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2017-01-26, 1 file, -10/+0)
  Simplify the consecutive merge-store candidate search. Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis that only checks for parallel stores through the chain subgraph. This cleanly separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and of the generated code (except perhaps for some ARM cases where we correctly construct wider loads but then promote them to float operations, which requires more expensive constant generation). Some minor peephole optimizations deal with the improved SubDAG shapes (listed below).

  Additional minor changes:
  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Removes chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence.
  6. Forwards load-store values through TokenFactors containing {CopyToReg, CopyFromReg} values.
  7. Adds a peephole to convert a build_vector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll).
  8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are generated. This can be removed once appropriate checks are added.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes, as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll. Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  llvm-svn: 293184
* AMDGPU: Fold fneg into round instructions (Matt Arsenault, 2017-01-26, 1 file, -1/+7)
  llvm-svn: 293127
* AMDGPU: Propagate fast math flags in fneg combines (Matt Arsenault, 2017-01-23, 1 file, -3/+3)
  This can't be done for fma/mad, since it seems they can't carry flags currently.
  llvm-svn: 292818
* AMDGPU/R600: Serialize vector trunc stores to private AS (Jan Vesely, 2017-01-20, 1 file, -0/+1)
  Add a DUMMY_CHAIN SDNode to denote stores of interest.
  Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915
  Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411
  Differential Revision: https://reviews.llvm.org/D27964
  llvm-svn: 292651
* AMDGPU: Disable some fneg combines unless nsz (Matt Arsenault, 2017-01-19, 1 file, -0/+6)
  For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem, it also applies there. fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either.
  llvm-svn: 292473
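  A small self-contained demonstration of the signed-zero hazard (compile without fast-math to observe it):

    #include <cmath>
    #include <cstdio>

    int main() {
      float x = 1.0f, y = -1.0f;        // x == -y
      float a = -(x + y);               // -(+0.0) == -0.0
      float b = (-x) + (-y);            // -1.0 + 1.0 == +0.0
      // Both print 0, but the sign bits differ: 1 vs 0. The rewrite is
      // therefore only safe under no-signed-zeros (nsz) fast-math.
      std::printf("%d %d\n", std::signbit(a), std::signbit(b));
      return 0;
    }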
* AMDGPU: Skip fneg/select combine if it can fold into other (Matt Arsenault, 2017-01-12, 1 file, -29/+40)
  llvm-svn: 291792
* AMDGPU: Fold free fneg into sin (Matt Arsenault, 2017-01-12, 1 file, -1/+5)
  llvm-svn: 291790
* AMDGPU: Fold fneg into fmul_legacy (Matt Arsenault, 2017-01-12, 1 file, -2/+5)
  llvm-svn: 291784
* AMDGPU: Fold fneg into rcp (Matt Arsenault, 2017-01-12, 1 file, -1/+7)
  llvm-svn: 291779
* AMDGPU: Fold fneg into fp_round (Matt Arsenault, 2017-01-12, 1 file, -2/+18)
  llvm-svn: 291778
* AMDGPU: Fold fneg into fp_extend (Matt Arsenault, 2017-01-12, 1 file, -0/+14)
  llvm-svn: 291777
* AMDGPU: Fold fneg into fma or fmad (Matt Arsenault, 2017-01-12, 1 file, -0/+24)
  Patch mostly by Fiona Glaser.
  llvm-svn: 291733
* AMDGPU: Fold fneg into fmul (Matt Arsenault, 2017-01-12, 1 file, -0/+17)
  Patch mostly by Fiona Glaser.
  llvm-svn: 291732
* AMDGPU: Fold fneg into fadd (Matt Arsenault, 2017-01-12, 1 file, -0/+60)
  Patch mostly by Fiona Glaser.
  llvm-svn: 291731
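  A hedged sketch, in the shape of a DAG combine (simplified and freestanding; not the actual AMDGPUISelLowering code), of the fadd rewrite: pushing the fneg onto the operands, where each can become a free source modifier.

    #include "llvm/CodeGen/SelectionDAG.h"
    using namespace llvm;

    // fneg(fadd(a, b)) -> fadd(fneg(a), fneg(b)); only valid under
    // no-signed-zeros, as the r292473 entry above explains.
    SDValue distributeFnegOverFAdd(SelectionDAG &DAG, const SDLoc &DL,
                                   EVT VT, SDValue FAdd) {
      SDValue NegLHS = DAG.getNode(ISD::FNEG, DL, VT, FAdd.getOperand(0));
      SDValue NegRHS = DAG.getNode(ISD::FNEG, DL, VT, FAdd.getOperand(1));
      return DAG.getNode(ISD::FADD, DL, VT, NegLHS, NegRHS);
    }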
* AMDGPU: Pull fneg/fabs out of a select (Matt Arsenault, 2017-01-11, 1 file, -0/+74)
  Allows better source modifier usage.
  llvm-svn: 291729
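  A plain C++ model of the select rewrite (not the DAG code; the fabs case is analogous):

    // Hoisting a negation common to both select arms:
    //   select(c, fneg(a), fneg(b)) -> fneg(select(c, a, b))
    // One fneg remains, and it can fold into its user as a source modifier.
    float pullFnegFromSelect(bool c, float a, float b) {
      return -(c ? a : b); // same value as c ? -a : -b
    }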
* AMDGPU: Add tests for HasMultipleConditionRegisters (Matt Arsenault, 2017-01-10, 1 file, -0/+7)
  This was enabled without many specific tests or an explanatory comment.
  llvm-svn: 291586
* AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes (Jan Vesely, 2017-01-06, 1 file, -10/+0)
  This will make the transition to SCRATCH_MEMORY easier.
  Differential Revision: https://reviews.llvm.org/D24746
  llvm-svn: 291279
* AMDGPU/SI: Implement sendmsghalt intrinsic (Jan Vesely, 2017-01-04, 1 file, -0/+1)
  v2: expose using the amdgcn prefix.
  Differential Revision: https://reviews.llvm.org/D23511
  llvm-svn: 290977