bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[AMDGPU] Add intrinsics for tbuffer load and store	David Stuttard	2017-06-22	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031
*	AMDGPU: Cleanup CreateLiveInRegister	Matt Arsenault	2017-06-19	1	-7/+14
\| \| \| \|	llvm-svn: 305748
*	Sort the remaining #include lines in include/... and lib/....	Chandler Carruth	2017-06-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
*	[AMDGPU] Combine and (srl) into shl (bfe)	Stanislav Mekhanoshin	2017-05-23	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb Where nb is a number of trailing zeroes in mask. It replaces two instructions with two and BFE is generally a more expensive one. However this is only done if we are selecting a byte or word at an aligned boundary which results in a proper SDWA operand pattern. It is only done if SDWA is supported. TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has be moved into a VGPR. Differential Revision: https://reviews.llvm.org/D33455 llvm-svn: 303681
*	[AMDGPU] Convert shl (add) into add (shl)	Stanislav Mekhanoshin	2017-05-23	1	-2/+40
\| \| \| \| \| \| \| \| \| \| \|	shl (or\|add x, c2), c1 => or\|add (shl x, c1), (c2 << c1) This allows to fold a constant into an address in some cases as well as to eliminate second shift if the expression is used as an address and second shift is a result of a GEP. Differential Revision: https://reviews.llvm.org/D33432 llvm-svn: 303641
*	[AMDGPU] Narrow lshl from 64 to 32 bit if possible	Stanislav Mekhanoshin	2017-05-22	1	-11/+33
\| \| \| \| \| \| \| \| \|	Turn expensive 64 bit shift into 32 bit if shift does not overflow int: shl (ext x) => zext (shl x) Differential Revision: https://reviews.llvm.org/D33367 llvm-svn: 303569
*	AMDGPU: Start defining a calling convention	Matt Arsenault	2017-05-17	1	-28/+91
\| \| \| \| \| \| \| \|	Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308
*	[KnownBits] Add bit counting methods to KnownBits struct and use them where ↵	Craig Topper	2017-05-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	possible This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 llvm-svn: 302925
*	AMDGPU: Pull fneg out of extract_vector_elt	Matt Arsenault	2017-05-11	1	-1/+7
\| \| \| \| \| \| \|	This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813
*	[KnownBits] Add wrapper methods for setting and clear all bits in the ↵	Craig Topper	2017-05-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	underlying APInts in KnownBits. This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown. Differential Revision: https://reviews.llvm.org/D32637 llvm-svn: 302262
*	AMDGPU: Add AMDGPU_HS calling convention	Marek Olsak	2017-05-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32644 llvm-svn: 301930
*	AMDGPU: Add new amdgcn.init.exec intrinsics	Marek Olsak	2017-04-28	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 llvm-svn: 301677
*	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and ↵	Craig Topper	2017-04-28	1	-15/+13
\| \| \| \| \| \| \| \| \| \| \| \|	simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620
*	AMDGPU: Move trap lowering to DAG	Matt Arsenault	2017-04-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. llvm-svn: 301206
*	[AArch64] Improve code generation for logical instructions taking	Akira Hatanaka	2017-04-21	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300932 and r300930, which was causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress. This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 301019
*	Revert r300932 and r300930.	Akira Hatanaka	2017-04-21	1	-3/+2
\| \| \| \| \| \| \| \| \|	It seems that r300930 was creating an infinite loop in dag-combine when compling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c llvm-svn: 300940
*	[AArch64] Improve code generation for logical instructions taking	Akira Hatanaka	2017-04-21	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300930
*	Revert "[AArch64] Improve code generation for logical instructions taking"	Akira Hatanaka	2017-04-20	1	-3/+2
\| \| \| \| \| \| \| \|	This reverts r300913. This broke bots. llvm-svn: 300916
*	[AArch64] Improve code generation for logical instructions taking	Akira Hatanaka	2017-04-20	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300913
*	AMDGPU: Refactor argument lowering	Matt Arsenault	2017-04-11	1	-5/+18
\| \| \| \| \| \| \|	Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998
*	AMDGPU: Stop using CCAssignToRegWithShadow	Matt Arsenault	2017-04-06	1	-0/+31
\| \| \| \| \| \| \|	This does not do what it is attempting to use it for and requires working around in LowerFormalArguments. llvm-svn: 299667
*	AMDGPU: Remove llvm.SI.vs.load.input	Matt Arsenault	2017-04-03	1	-1/+0
\| \| \| \|	llvm-svn: 299391
*	AMDGPU: Remove legacy bfe intrinsics	Matt Arsenault	2017-04-03	1	-27/+0
\| \| \| \|	llvm-svn: 299372
*	AMDGPU: Remove unnecessary ands when f16 is legal	Matt Arsenault	2017-03-31	1	-2/+8
\| \| \| \| \| \| \| \| \| \|	Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result. Alternatively could try making v2f16 legal and canonicalizing on build_vectors. llvm-svn: 299246
*	[DAGCombiner] Add vector demanded elements support to ComputeNumSignBits	Simon Pilgrim	2017-03-31	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course. Followup to D25691. Differential Revision: https://reviews.llvm.org/D31311 llvm-svn: 299219
*	[DAGCombiner] Add vector demanded elements support to ↵	Simon Pilgrim	2017-03-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201
*	[AMDGPU] Tidy up ↵	Simon Pilgrim	2017-03-29	1	-13/+6
\| \| \| \| \| \| \| \|	computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI. Based on comment in D31249. llvm-svn: 298991
*	[AMDGPU] Get address space mapping by target triple environment	Yaxun Liu	2017-03-27	1	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
*	AMDGPU: Implement f16 fround	Matt Arsenault	2017-03-24	1	-13/+18
\| \| \| \|	llvm-svn: 298730
*	AMDGPU: Rename SI_RETURN	Matt Arsenault	2017-03-21	1	-2/+3
\| \| \| \| \| \| \| \|	This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452
*	AMDGPU: Cleanup control flow intrinsics	Matt Arsenault	2017-03-17	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list. It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those. llvm-svn: 298119
*	AMDGPU: Fix unnecessary ands when packing f16 vectors	Matt Arsenault	2017-03-15	1	-3/+16
\| \| \| \| \| \| \| \| \|	computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873
*	AMDGPU: Constant fold rcp node	Matt Arsenault	2017-03-08	1	-2/+12
\| \| \| \| \| \| \|	When doing arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs. llvm-svn: 297248
*	AMDGPU: Support v2i16/v2f16 packed operations	Matt Arsenault	2017-02-27	1	-3/+8
\| \| \| \|	llvm-svn: 296396
*	AMDGPU : Replace FMAD with FMA when denormals are enabled.	Wei Ding	2017-02-24	1	-1/+5
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186
*	AMDGPU: Add cvt.pkrtz intrinsic	Matt Arsenault	2017-02-22	1	-0/+1
\| \| \| \| \| \|	Convert llvm.SI.packf16 test uses llvm-svn: 295797
*	AMDGPU: Remove llvm.AMDGPU.clamp intrinsic	Matt Arsenault	2017-02-21	1	-11/+0
\| \| \| \|	llvm-svn: 295789
*	AMDGPU: Redefine clamp node as clamp 0.0-1.0	Matt Arsenault	2017-02-21	1	-14/+45
\| \| \| \| \| \| \| \| \| \| \|	Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
*	AMDGPU: Add pass to expand memcpy/memmove/memset	Matt Arsenault	2017-02-09	1	-4/+5
\| \| \| \|	llvm-svn: 294635
*	AMDGPU: Fold fneg into fmin/fmax_legacy	Matt Arsenault	2017-02-03	1	-2/+24
\| \| \| \|	llvm-svn: 293972
*	AMDGPU: Fold fneg into fminnum/fmaxnum	Matt Arsenault	2017-02-03	1	-0/+30
\| \| \| \|	llvm-svn: 293968
*	AMDGPU: Check if users of fneg can fold mods	Matt Arsenault	2017-02-02	1	-4/+64
\| \| \| \| \| \|	In multi-use cases this can save a few instructions. llvm-svn: 293962
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵	Nirav Dave	2017-02-02	1	-0/+10
\| \| \| \| \| \| \| \| \|	UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2017-02-02	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled. Recommiting after fixing X86 inc/dec chain bug. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293893
*	AMDGPU: Use source modifiers with f16->f32 conversions	Matt Arsenault	2017-02-02	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \|	The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. llvm-svn: 293857
*	AMDGPU: Cleanup fmin/fmax legacy function	Matt Arsenault	2017-02-01	1	-11/+8
\| \| \| \| \| \|	Use a more specific subtarget check and combine hasOneUse checks llvm-svn: 293726
*	Re-commit AMDGPU/GlobalISel: Add support for simple shaders	Tom Stellard	2017-01-30	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix build when global-isel is disabled and fix a warning. Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293551
*	Revert "AMDGPU/GlobalISel: Add support for simple shaders"	Tom Stellard	2017-01-30	1	-6/+0
\| \| \| \| \| \| \| \|	This reverts commit r293503. Revert while I investigate some of the buildbot failures. llvm-svn: 293509
*	AMDGPU/GlobalISel: Add support for simple shaders	Tom Stellard	2017-01-30	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293503
*	Cleanup dump() functions.	Matthias Braun	2017-01-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We had various variants of defining dump() functions in LLVM. Normalize them (this should just consistently implement the things discussed in http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html For reference: - Public headers should just declare the dump() method but not use LLVM_DUMP_METHOD or #if !defined(NDEBUG) \|\| defined(LLVM_ENABLE_DUMP) - The definition of a dump method should look like this: #if !defined(NDEBUG) \|\| defined(LLVM_ENABLE_DUMP) LLVM_DUMP_METHOD void MyClass::dump() { // print stuff to dbgs()... } #endif llvm-svn: 293359