bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[AMDGPU] Generate range metadata for workitem id	Stanislav Mekhanoshin	2017-04-12	1	-3/+3
	If the workgroup size is known, inform LLVM about the range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102
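	The idea behind the range metadata can be sketched as follows. This is only an illustration, not the code from D31804: the function names and the tuple representation of the workgroup size are hypothetical, and it assumes a half-open `[lo, hi)` range convention like the one LLVM `!range` metadata uses.

```python
def workitem_id_range(workgroup_size, dim):
    """Half-open range [lo, hi) of the local id in dimension `dim`.

    `workgroup_size` is a hypothetical (x, y, z) tuple, or None when the
    size is not known at compile time (no range can be attached then).
    """
    if workgroup_size is None:
        return None
    # A local id is always smaller than the workgroup size in its dimension.
    return (0, workgroup_size[dim])


def local_size_range(workgroup_size, dim):
    """Half-open range [lo, hi) of the local size query in dimension `dim`."""
    if workgroup_size is None:
        return None
    # With an exactly known size the query can return only that one value.
    size = workgroup_size[dim]
    return (size, size + 1)
```

	With a known size of (64, 1, 1), the local id in dimension 0 lies in [0, 64) and the local size query in dimension 0 can only return 64.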
*	[AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing.	Kannan Narayanan	2017-04-12	1	-1/+11
	Based on comments in https://reviews.llvm.org/D31161. llvm-svn: 300023
*	[AMDGPU] Add A5 to data layout for amdgiz environment	Yaxun Liu	2017-04-11	1	-1/+1
	Differential Revision: https://reviews.llvm.org/D31589 llvm-svn: 299964
*	[AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passes	Sam Kolton	2017-04-07	1	-5/+5
	Summary: The difference between PreRegAlloc() and MachineSSAOptimization() is that the former runs even at the -O0 optimization level. In my understanding, SIShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled. With this change the order of passes will not change. Reviewers: arsenm, vpykhtin, rampitec Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31705 llvm-svn: 299757
*	[AMDGPU] Temporarily change constant address space from 4 to 2	Yaxun Liu	2017-04-06	1	-1/+1
	Our final address space mapping will make the constant address space 4 to match nvptx. However, for now we make it 2 to avoid unnecessary work in FE/BE/devlib on intrinsics returning constant pointers. Differential Revision: https://reviews.llvm.org/D31770 llvm-svn: 299690
*	[AMDGPU] Resubmit SDWA peephole: enable by default	Sam Kolton	2017-04-06	1	-1/+1
	Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299654
*	Revert r299536. [AMDGPU] SDWA peephole: enable by default.	Ivan Krasin	2017-04-05	1	-1/+1
	Reason: breaks multiple bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988 http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173 Original Review URL: https://reviews.llvm.org/D31671 llvm-svn: 299583
*	[AMDGPU] SDWA peephole: enable by default	Sam Kolton	2017-04-05	1	-1/+1
	Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299536
*	[AMDGPU] Add GlobalOpt parameter to Always Inliner pass	Stanislav Mekhanoshin	2017-03-30	1	-1/+1
	If set to false, the pass does not remove global aliases. With this parameter set to false it should be safe to run the pass before linking. Differential Revision: https://reviews.llvm.org/D31489 llvm-svn: 299108
*	[AMDGPU] Split -amdgpu-early-inline-all option	Stanislav Mekhanoshin	2017-03-28	1	-3/+13
	Previously it was covered by the internalization option. It turns out we cannot run the internalizer in the FE, as it breaks separate compilation tests. Thus the early inliner gets its own option. Differential Revision: https://reviews.llvm.org/D31429 llvm-svn: 298935
*	[AMDGPU] Get address space mapping by target triple environment	Yaxun Liu	2017-03-27	1	-0/+2
	As we introduced the target triple environments amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which do not depend on the target triple, use static const members, so that they don't occupy extra memory space and are equivalent to compile-time constants. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as a member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
*	[AMDGPU] Switch data layout by triple environment amdgiz	Yaxun Liu	2017-03-25	1	-1/+6
	Switch data layout by the target triple environments amdgiz and amdgizcl, which indicate the use of an address space mapping in which the generic address space is 0. amdgiz is for non-OpenCL environments where the generic address space is 0. amdgizcl is for OpenCL environments where the generic address space is 0. Differential Revision: https://reviews.llvm.org/D31211 llvm-svn: 298758
*	AMDGPU: Unify divergent function exits.	Matt Arsenault	2017-03-24	1	-0/+5
	StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that unifies only divergent exit nodes, skipping exit nodes with uniform branch sources. llvm-svn: 298729
*	[AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline	Stanislav Mekhanoshin	2017-03-24	1	-1/+24
	Previously it was added only to the BE. Differential Revision: https://reviews.llvm.org/D31323 llvm-svn: 298721
*	[AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler	Valery Pykhtin	2017-03-21	1	-0/+25
	Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368
*	[AMDGPU] SDWA peephole optimization pass.	Sam Kolton	2017-03-21	1	-0/+10
	Summary: First iteration of the SDWA peephole. This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:
	'''
	V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
	V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
	V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
	'''
	Into:
	'''
	V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
	'''
	Pass structure:
	1. Iterate over the machine instructions in a basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match a machine instruction into either a source or a destination SDWA operand. E.g. '''V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to the source SDWA operand '''%vreg1 src_sel:WORD_1'''.
	2. Iterate over the found SDWA operands and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0'''.
	3. Iterate over all potential instructions and check if they can be converted into SDWA.
	4. Convert instructions to SDWA.
	This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing done). There are several ways this pass can be improved:
	1. Make this pass work on the whole function, not only a basic block. As far as I can see, this can be done right now without changes to the pass.
	2. Introduce more SDWA patterns.
	3. Introduce mnemonics to limit when SDWA patterns should apply.
	Reviewers: vpykhtin, alex-t, arsenm, rampitec Subscribers: wdng, nhaehnle, mgorny Differential Revision: https://reviews.llvm.org/D30038 llvm-svn: 298365
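	The four steps of the pass structure can be sketched on a toy instruction list. This is a minimal illustration under stated assumptions, not the LLVM implementation: the `Inst` class, the `sdwa_peephole` function and the single-use check are hypothetical, the block is assumed to be in SSA form, and only 16-bit shifts and `V_ADD_I32` are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Inst:
    op: str
    dst: str
    srcs: list  # register names (str) or immediates (int)
    sel: dict = field(default_factory=dict)


def sdwa_peephole(block, convertible=("V_ADD_I32",)):
    # Count register uses so a shift is folded only when nothing else needs it.
    uses = {}
    for inst in block:
        for s in inst.srcs:
            if isinstance(s, str):
                uses[s] = uses.get(s, 0) + 1

    # Step 1: match shift instructions to source/destination SDWA operands.
    src_pat, dst_pat = {}, {}
    for inst in block:
        if len(inst.srcs) == 2 and inst.srcs[0] == 16 and isinstance(inst.srcs[1], str):
            val = inst.srcs[1]
            if inst.op == "V_LSHRREV_B32" and uses.get(inst.dst, 0) == 1:
                src_pat[inst.dst] = (val, inst)   # reading dst == reading WORD_1 of val
            elif inst.op == "V_LSHLREV_B32" and uses.get(val, 0) == 1:
                dst_pat[val] = (inst.dst, inst)   # val's producer may write WORD_1 of dst

    # Steps 2-4: find candidate instructions, check them, and convert.
    replaced, dead = {}, set()
    for inst in block:
        if inst.op not in convertible:
            continue
        srcs, sel, folded = [], {}, []
        for i, s in enumerate(inst.srcs):
            if isinstance(s, str) and s in src_pat:
                orig, shift = src_pat[s]
                srcs.append(orig)
                sel[f"src{i}_sel"] = "WORD_1"
                folded.append(shift)
            else:
                srcs.append(s)
                sel[f"src{i}_sel"] = "DWORD"
        dst = inst.dst
        if dst in dst_pat:
            dst, shift = dst_pat[dst]
            sel["dst_sel"] = "WORD_1"
            sel["dst_unused"] = "UNUSED_PAD"
            folded.append(shift)
        if folded:
            replaced[id(inst)] = Inst(inst.op + "_sdwa", dst, srcs, sel)
            dead.update(id(f) for f in folded)

    # Drop the folded shifts and substitute the converted instructions.
    return [replaced.get(id(i), i) for i in block if id(i) not in dead]


block = [
    Inst("V_LSHRREV_B32", "%vreg0", [16, "%vreg1"]),
    Inst("V_ADD_I32", "%vreg2", ["%vreg0", "%vreg3"]),
    Inst("V_LSHLREV_B32", "%vreg4", [16, "%vreg2"]),
]
result = sdwa_peephole(block)
```

	On the three-instruction example from the commit message, `result` holds a single `V_ADD_I32_sdwa` writing `%vreg4` from `%vreg1` and `%vreg3`. The single-use restriction is a simplification chosen for the sketch so that deleting the shifts is trivially safe.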
*	[AMDGPU] Run always inliner early in opt	Konstantin Zhuravlyov	2017-03-20	1	-0/+1
	Differential Revision: https://reviews.llvm.org/D31141 llvm-svn: 298281
*	Revert "[AMDGPU] Run always inliner early in opt"	Konstantin Zhuravlyov	2017-03-20	1	-1/+0
	This reverts commit r297958, it breaks device-libs build. llvm-svn: 298239
*	[AMDGPU] Add address space based alias analysis pass	Stanislav Mekhanoshin	2017-03-17	1	-0/+16
	This is a direct port of the HSAILAliasAnalysis pass, just cleaned up for style and renamed. Differential Revision: https://reviews.llvm.org/D31103 llvm-svn: 298172
*	Only unswitch loops with uniform conditions	Stanislav Mekhanoshin	2017-03-17	1	-0/+2
	Loop unswitching can be extremely harmful for a SIMT target. If the hoisted condition is not uniform, a SIMT machine will execute both clones of a loop sequentially. Therefore LoopUnswitch checks if the condition is non-divergent. Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets, a new option is added to avoid unneeded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available and we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior, and this field is set from a target. Differential Revision: https://reviews.llvm.org/D30796 llvm-svn: 298104
*	[AMDGPU] Run always inliner early in opt	Stanislav Mekhanoshin	2017-03-16	1	-0/+1
	We can mark functions to always inline early in the opt. Since we do not have call support this early inlining creates opportunities for inter-procedural optimizations which would not occur otherwise. Differential Revision: https://reviews.llvm.org/D31016 llvm-svn: 297958
*	AMDGPU: Merge initial gfx9 support	Matt Arsenault	2017-02-18	1	-1/+2
	llvm-svn: 295554
*	[AMDGPU] Revert failed scheduling	Stanislav Mekhanoshin	2017-02-15	1	-2/+1
	This patch reverts a region's scheduling to the original untouched state if we have decreased occupancy. In addition it switches to using the TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing limits which were just passed by. We are going to stay with the best schedule so we do not need to tolerate worsened scheduling anymore. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206
*	AMDGPU: Add pass to expand memcpy/memmove/memset	Matt Arsenault	2017-02-09	1	-0/+3
	llvm-svn: 294635
*	AMDGPU: Enable InferAddressSpaces	Matt Arsenault	2017-02-08	1	-0/+1
	llvm-svn: 294408
*	Re-commit AMDGPU/GlobalISel: Add support for simple shaders	Tom Stellard	2017-01-30	1	-6/+29
	Fix build when global-isel is disabled and fix a warning. Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293551
*	[AMDGPU] Internalize non-kernel symbols	Stanislav Mekhanoshin	2017-01-30	1	-2/+33
	Since we have no call support and late linking, we can produce code only for used symbols. This saves compilation time, size of the final executable, and size of any intermediate dumps. Run the Internalize pass early in the opt pipeline, followed by the global DCE pass. To enable it, the RT can pass the -amdgpu-internalize-symbols option. Differential Revision: https://reviews.llvm.org/D29214 llvm-svn: 293549
*	AMDGPU: Run AMDGPUCodeGenPrepare after inlining	Matt Arsenault	2017-01-30	1	-9/+9
	With leaf functions, this makes nonsensical decisions based on the uniformity of the arguments. llvm-svn: 293525
*	Revert "AMDGPU/GlobalISel: Add support for simple shaders"	Tom Stellard	2017-01-30	1	-27/+6
	This reverts commit r293503. Revert while I investigate some of the buildbot failures. llvm-svn: 293509
*	AMDGPU/GlobalISel: Add support for simple shaders	Tom Stellard	2017-01-30	1	-6/+27
	Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293503
*	[AMDGPU] Turn AMDGPUUnifyMetadata back into module pass	Stanislav Mekhanoshin	2017-01-27	1	-1/+1
	With the adjustPassManager interface it is now possible to use custom early module passes. Differential Revision: https://reviews.llvm.org/D29189 llvm-svn: 293300
*	Replace addEarlyAsPossiblePasses callback with adjustPassManager	Stanislav Mekhanoshin	2017-01-26	1	-2/+7
	This change introduces adjustPassManager target callback giving a target an opportunity to tweak PassManagerBuilder before pass managers are populated. This generalizes and replaces addEarlyAsPossiblePasses target callback. In particular that can be used to add custom passes to extension points other than EP_EarlyAsPossible. Differential Revision: https://reviews.llvm.org/D28336 llvm-svn: 293189
*	AMDGPU: Implement early ifcvt target hooks.	Matt Arsenault	2017-01-25	1	-0/+14
	Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out-of-order CPUs, which we do not want, so it skips many cases we want. llvm-svn: 293016
*	[AMDGPU] Add VGPR copies post regalloc fix pass	Stanislav Mekhanoshin	2017-01-24	1	-0/+2
	Regalloc creates COPY instructions which do not formally use VALU. That results in v_mov instructions displaced after exec mask modification. One pass which does this is SIOptimizeExecMasking, but potentially it can be done by other passes too. This patch adds a pass immediately after regalloc to add an implicit exec use operand to all VGPR copy instructions. Differential Revision: https://reviews.llvm.org/D28874 llvm-svn: 292956
*	[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).	Eugene Zelenko	2016-12-12	1	-17/+26
	llvm-svn: 289475
*	[AMDGPU] Add amdgpu-unify-metadata pass	Stanislav Mekhanoshin	2016-12-08	1	-0/+6
	Multiple metadata values for records such as opencl.ocl.version, llvm.ident and similar are created after linking several modules. For some of them, notably opencl.ocl.version, this creates a semantic problem because we cannot tell which version of OpenCL the composite module conforms to. Moreover, such repetitions of identical values often create a huge list of unneeded metadata, which grows bitcode size both in memory and stored on disk. It can go up to several Mb when linked against our OpenCL library. Lastly, such long lists obscure reading of dumped IR. The pass unifies metadata after linking. Differential Revision: https://reviews.llvm.org/D25381 llvm-svn: 289092
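	The effect of unifying metadata after linking can be sketched as below. This is an illustration only: the `unify_metadata` function and the dictionary representation are hypothetical, and the policy of keeping the highest OpenCL version is an assumption made for the sketch, since the commit message does not say how conflicting versions are resolved.

```python
def unify_metadata(records):
    """Collapse repeated metadata values gathered from several linked modules.

    `records` maps a named-metadata record (e.g. "llvm.ident") to the list of
    values contributed by each linked module.
    """
    unified = {}
    for name, values in records.items():
        if name == "opencl.ocl.version":
            # Assumption for this sketch: keep only the highest (major, minor).
            unified[name] = [max(values)]
        else:
            # Drop identical repeats while preserving first-seen order.
            seen, kept = set(), []
            for value in values:
                if value not in seen:
                    seen.add(value)
                    kept.append(value)
            unified[name] = kept
    return unified


linked = {
    "opencl.ocl.version": [(1, 2), (2, 0), (1, 2)],
    "llvm.ident": ["clang A", "clang A", "clang B"],
}
unified = unify_metadata(linked)
```

	After the call, each record holds one entry per distinct value instead of one entry per linked module, which is the size reduction the commit message describes.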
*	[AMDGPU] Scalarization of global uniform loads.	Alexander Timofeev	2016-12-08	1	-0/+10
	Summary: LC can currently select a scalar load for a uniform memory access based only on the read-only memory address space. This restriction originated from the fact that in HW prior to VI the vector and scalar caches are not coherent. With MemoryDependenceAnalysis we can check that the memory location corresponding to the memory operand of the LOAD is not clobbered along all paths from the function entry. Reviewers: rampitec, tstellarAMD, arsenm Subscribers: wdng, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D26917 llvm-svn: 289076
*	AMDGPU: Don't require structured CFG	Matt Arsenault	2016-12-06	1	-2/+3
	The structured CFG is just an aid to inserting exec mask modification instructions; once that is done we don't really need it anymore. We also do not analyze blocks with terminators that modify exec, so this should only be impacting true branches. llvm-svn: 288744
*	MachineScheduler: Export function to construct "default" scheduler.	Matthias Braun	2016-11-28	1	-8/+10
	This makes the createGenericSchedLive() function that constructs the default scheduler available for the public API. This should help when you want to get a scheduler and the default list of DAG mutations. This also shrinks the list of default DAG mutations: {Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer added by default. Targets can easily add them if they need them. It also makes it easier for targets to add alternative/custom macrofusion or clustering mutations while staying with the default createGenericSchedLive(). It also saves the callback back and forth in TargetInstrInfo::enableClusterLoads()/enableClusterStores(). Differential Revision: https://reviews.llvm.org/D26986 llvm-svn: 288057
*	Revert "AMDGPU: Enable ConstrainCopy DAG mutation"	Konstantin Zhuravlyov	2016-11-17	1	-3/+0
	This reverts commit r287146. This breaks a few conformance tests. llvm-svn: 287233
*	AMDGPU: Enable ConstrainCopy DAG mutation	Matt Arsenault	2016-11-16	1	-0/+3
	This fixes a probably unintended divergence from the default scheduler behavior. llvm-svn: 287146
*	AMDGPU: Enable store clustering	Matt Arsenault	2016-11-15	1	-1/+8
	Also respect the TII hook for these like the generic code does in case we want a flag later to disable this. llvm-svn: 287021
*	Move the global variables representing each Target behind accessor function	Mehdi Amini	2016-10-09	1	-2/+2
	This avoids the "static initialization order fiasco". Differential Revision: https://reviews.llvm.org/D25412 llvm-svn: 283702
*	BranchRelaxation: Support expanding unconditional branches	Matt Arsenault	2016-10-06	1	-0/+1
	AMDGPU needs to expand unconditional branches in a new block with an indirect branch. llvm-svn: 283464
*	[AMDGPU] Pass optimization level to SelectionDAGISel	Konstantin Zhuravlyov	2016-10-03	1	-1/+1
	llvm-svn: 283133
*	[AMDGPU] Do not run scalar optimization passes at "-O0"	Konstantin Zhuravlyov	2016-09-30	1	-2/+2
	Differential Revision: https://reviews.llvm.org/D25055 llvm-svn: 282873
*	AMDGPU: Partially fix control flow at -O0	Matt Arsenault	2016-09-29	1	-5/+16
	Fixes to allow spilling all registers at the end of the block to work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec. This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch. llvm-svn: 282667
*	AMDGPU: Run LoadStoreVectorizer pass by default	Matt Arsenault	2016-09-09	1	-1/+1
	llvm-svn: 281112
*	AMDGPU/SI: Implement a custom MachineSchedStrategy	Tom Stellard	2016-08-29	1	-1/+15
	Summary: GCNSchedStrategy re-uses most of GenericScheduler; it just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default; to enable it you need to pass -misched=gcn to llc.
	Shader DB stats: 32464 shaders in 17874 tests
	Totals:
	SGPRS: 1542846 -> 1643125 (6.50 %)
	VGPRS: 1005595 -> 904653 (-10.04 %)
	Spilled SGPRs: 29929 -> 27745 (-7.30 %)
	Spilled VGPRs: 334 -> 352 (5.39 %)
	Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
	Code Size: 36688188 -> 37034900 (0.95 %) bytes
	LDS: 1913 -> 1913 (0.00 %) blocks
	Max Waves: 254101 -> 265125 (4.34 %)
	Wait states: 0 -> 0 (0.00 %)
	Totals from affected shaders:
	SGPRS: 1338220 -> 1438499 (7.49 %)
	VGPRS: 886221 -> 785279 (-11.39 %)
	Spilled SGPRs: 29869 -> 27685 (-7.31 %)
	Spilled VGPRs: 334 -> 352 (5.39 %)
	Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
	Code Size: 34315716 -> 34662428 (1.01 %) bytes
	LDS: 1551 -> 1551 (0.00 %) blocks
	Max Waves: 188127 -> 199151 (5.86 %)
	Wait states: 0 -> 0 (0.00 %)
	Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995
*	AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler	Tom Stellard	2016-08-29	1	-11/+1
	Summary: The SILoadStoreOptimizer can now look ahead more than one instruction when looking for instructions to merge, which greatly improves the number of loads/stores that we are able to merge. Moving the pass before scheduling avoids increasing register pressure after the scheduler, so that the scheduler's register pressure estimates will be more accurate. It also gives more consistent results, since it is no longer affected by minor scheduling changes. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23814 llvm-svn: 279991