path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
...
* [DAGCombiner] Add vector demanded elements support to ComputeNumSignBits | Simon Pilgrim | 2017-03-31 | 2 | -2/+4
  Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.
  This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
  I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.
  Followup to D25691.
  Differential Revision: https://reviews.llvm.org/D31311
  llvm-svn: 299219
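The demanded-elements idea described above can be illustrated with a small model. This is only a sketch in Python, not LLVM's C++ implementation: the function names are hypothetical, the vector is a list of integer constants standing in for a BUILD_VECTOR, and returning 1 when no element is demanded is a conservative choice made for this sketch.

```python
def num_sign_bits(value: int, width: int) -> int:
    """Count the leading bits of a width-bit value that equal its sign bit."""
    value &= (1 << width) - 1
    sign = (value >> (width - 1)) & 1
    count = 0
    for bit in range(width - 1, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count


def build_vector_sign_bits(elements, width: int, demanded_elts: int) -> int:
    """Minimum sign-bit count over the elements selected by the demanded_elts bitmask.

    Bit i of demanded_elts set means element i is demanded (hypothetical encoding).
    """
    if demanded_elts == 0:
        return 1  # assumption of this sketch: nothing demanded, so claim nothing
    result = width
    for index, element in enumerate(elements):
        if (demanded_elts >> index) & 1:
            result = min(result, num_sign_bits(element, width))
    return result
```

With the 16-bit elements `[1, -1, 0x7000]`, demanding all three elements is limited by `0x7000`, while demanding only the first two ignores it and yields a larger sign-bit count.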
* [AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns | Sam Kolton | 2017-03-31 | 4 | -43/+75
  Previously the compiler often extracted common immediates into a specific register, e.g.:
  ```
  %vreg0 = S_MOV_B32 0xff;
  %vreg2 = V_AND_B32_e32 %vreg0, %vreg1
  %vreg4 = V_AND_B32_e32 %vreg0, %vreg3
  ```
  Because of this the SDWA peephole failed to find SDWA-convertible patterns. E.g. in the previous example this could be converted into 2 SDWA src operands:
  ```
  SDWA src: %vreg2 src_sel:BYTE_0
  SDWA src: %vreg4 src_sel:BYTE_0
  ```
  With this change the peephole checks if an operand is either an immediate or a register that is a copy of an immediate.
  llvm-svn: 299202
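The check described in this message (an operand is either an immediate or a register that is a copy of an immediate) can be sketched as follows. This is a Python model under stated assumptions, not the pass's code: `resolve_immediate`, `match_byte0_source` and the `definitions` map are hypothetical names, and the sketch returns the masked input register with its selector, which may differ from how the real pass records the operand.

```python
def resolve_immediate(operand, definitions):
    """Return the immediate behind an operand, or None.

    operand is an int (immediate) or a register name; definitions maps a
    register name to (opcode, source) for its single defining instruction.
    """
    if isinstance(operand, int):
        return operand
    definition = definitions.get(operand)
    if definition is None:
        return None
    opcode, source = definition
    if opcode == "S_MOV_B32" and isinstance(source, int):
        return source  # register is a plain copy of an immediate
    return None


def match_byte0_source(opcode, src0, src1, definitions):
    """Match an AND with 0xff as a BYTE_0 source selection of the other operand."""
    if opcode != "V_AND_B32_e32":
        return None
    if resolve_immediate(src0, definitions) == 0xFF:
        return (src1, "BYTE_0")
    if resolve_immediate(src1, definitions) == 0xFF:
        return (src0, "BYTE_0")
    return None
```

Looking through the `S_MOV_B32` is what lets both `V_AND_B32_e32` instructions in the example match even though neither carries the `0xff` literal directly.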
* [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode | Simon Pilgrim | 2017-03-31 | 2 | -1/+2
  Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.
  Differential Revision: https://reviews.llvm.org/D31249
  llvm-svn: 299201
* AMDGPU: Rename isKernel | Matt Arsenault | 2017-03-30 | 3 | -6/+22
  What we really want to do is distinguish functions that may be called by other functions, and graphics shaders are not called kernels.
  llvm-svn: 299140
* AMDGPU: Add all atomicrmw fields to atomic.inc/dec | Matt Arsenault | 2017-03-30 | 1 | -2/+5
  Add scope, order, isVolatile
  llvm-svn: 299122
* [AMDGPU] Add GlobalOpt parameter to Always Inliner pass | Stanislav Mekhanoshin | 2017-03-30 | 3 | -7/+11
  If set to false it does not remove global aliases. With this parameter set to false it should be safe to run the pass before link.
  Differential Revision: https://reviews.llvm.org/D31489
  llvm-svn: 299108
* [AMDGPU] Tidy up computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI. | Simon Pilgrim | 2017-03-29 | 1 | -13/+6
  Based on comment in D31249.
  llvm-svn: 298991
* [AMDGPU] Boost unroll threshold for loops reading local memory | Stanislav Mekhanoshin | 2017-03-28 | 1 | -30/+72
  This is less important than increasing the threshold for private memory, but still brings performance improvements in a wide range of tests. Unrolling more for local memory serves three purposes: it allows combining ds operations if the offset becomes static, saves registers used for offsets in the case of static offsets, and allows better lds latency hiding.
  Differential Revision: https://reviews.llvm.org/D31412
  llvm-svn: 298948
* [AMDGPU] Fix recorded region boundaries in max-occupancy scheduler | Stanislav Mekhanoshin | 2017-03-28 | 2 | -17/+7
  It is incorrect to record region boundaries before scheduling, because they may change after scheduling. As a result the second pass may see fewer instructions to schedule than it should.
  Differential Revision: https://reviews.llvm.org/D31434
  llvm-svn: 298945
* [AMDGPU] Split -amdgpu-early-inline-all option | Stanislav Mekhanoshin | 2017-03-28 | 1 | -3/+13
  Previously it was covered by the internalization. It turns out we cannot run the internalizer in the FE, as it breaks separate compilation tests. Thus the early inliner gets its own option.
  Differential Revision: https://reviews.llvm.org/D31429
  llvm-svn: 298935
* [AMDGPU] Update SI scheduler colorHighLatenciesGroups | Valery Pykhtin | 2017-03-28 | 2 | -22/+100
  Depends on rL298896: MachineScheduler/ScheduleDAG: Add support for GetSubGraph
  Patch by Axel Davy (axel.davy@normalesup.org)
  Differential revision: https://reviews.llvm.org/D30152
  llvm-svn: 298902
* [AMDGPU] SISched: Detect dependency types between blocks | Valery Pykhtin | 2017-03-27 | 2 | -26/+39
  Patch by Axel Davy (axel.davy@normalesup.org)
  Differential revision: https://reviews.llvm.org/D30153
  llvm-svn: 298872
* [AMDGPU] SISched: Update colorEndsAccordingToDependencies | Valery Pykhtin | 2017-03-27 | 1 | -0/+14
  Patch by Axel Davy (axel.davy@normalesup.org)
  Differential revision: https://reviews.llvm.org/D30150
  llvm-svn: 298861
* [AMDGPU] Fix SI scheduler LiveOut Refcount issue | Valery Pykhtin | 2017-03-27 | 2 | -0/+26
  Patch by Axel Davy (axel.davy@normalesup.org)
  Differential revision: https://reviews.llvm.org/D30145
  llvm-svn: 298857
* [AMDGPU][MC] Fix for Bug 28207 + LIT tests | Dmitry Preobrazhensky | 2017-03-27 | 5 | -17/+95
  Enabled clamp and omod for v_cvt_* opcodes which have src0 of an integer type
  Reviewers: vpykhtin, arsenm
  Differential Revision: https://reviews.llvm.org/D31327
  llvm-svn: 298852
* [AMDGPU] Get address space mapping by target triple environment | Yaxun Liu | 2017-03-27 | 39 | -290/+446
  As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple.
  The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which do not depend on the target triple, use static const members, so that they don't occupy extra memory space and are equivalent to a compile time constant.
  Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as a member to a pass and created at the beginning of the run* function.
  Differential Revision: https://reviews.llvm.org/D31284
  llvm-svn: 298846
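The design described above (class-level constants for triple-independent address spaces, per-instance values for the rest) can be modelled roughly as below. This is a Python sketch, not the AMDGPUAS struct itself: the class name, field names and factory function are hypothetical, and every numeric value other than "generic (flat) address space is 0 under amdgiz/amdgizcl" is an illustrative placeholder rather than something stated in this log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressSpaces:
    # Triple-independent values: plain class attributes, shared by all
    # instances (placeholder numbers, chosen for illustration only).
    GLOBAL_ADDRESS = 1
    CONSTANT_ADDRESS = 2
    LOCAL_ADDRESS = 3

    # Triple-dependent values: stored per instance.
    flat_address: int
    private_address: int


def address_spaces_for(environment: str) -> AddressSpaces:
    """Build the mapping for a target triple environment string."""
    if environment in ("amdgiz", "amdgizcl"):
        # Generic (flat) address space is 0 in these environments.
        return AddressSpaces(flat_address=0, private_address=5)
    # Placeholder values for any other environment.
    return AddressSpaces(flat_address=4, private_address=0)
```

Because the object holds only two integers, constructing it at the point of use, as the message suggests, costs very little.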
* [AMDGPU] Switch data layout by triple environment amdgiz | Yaxun Liu | 2017-03-25 | 1 | -1/+6
  Switch data layout by target triple environment amdgiz and amdgizcl, which indicate the use of an address space mapping in which the generic address space is 0.
  amdgiz is for a non-OpenCL environment where the generic address space is 0.
  amdgizcl is for an OpenCL environment where the generic address space is 0.
  Differential Revision: https://reviews.llvm.org/D31211
  llvm-svn: 298758
* AMDGPU: Fix annotating loops with nested loop conditions | Matt Arsenault | 2017-03-24 | 1 | -9/+21
  If the branch condition for a loop was a phi which itself was fed from a phi from a loop, it isn't safe to try to delete the phi until after the loop is handled.
  llvm-svn: 298737
* AMDGPU: Implement f16 fround | Matt Arsenault | 2017-03-24 | 3 | -14/+20
  llvm-svn: 298730
* AMDGPU: Unify divergent function exits. | Matt Arsenault | 2017-03-24 | 7 | -15/+254
  StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that skips exit nodes with uniform branch sources and unifies only the remaining exit nodes.
  llvm-svn: 298729
* [AMDGPU] Fold V_CNDMASK with identical source operands | Stanislav Mekhanoshin | 2017-03-24 | 1 | -0/+29
  Such instructions sometimes appear after lowering and folding.
  Differential Revision: https://reviews.llvm.org/D31318
  llvm-svn: 298723
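The fold named in the title rests on a simple identity: a select whose two value operands are the same yields that value regardless of the condition. A minimal Python sketch follows, assuming a hypothetical tuple encoding `(opcode, dst, src0, src1, cond)` and assuming the replacement is a plain `COPY` (the message does not say what the instruction is rewritten to).

```python
def fold_cndmask(inst):
    """Rewrite a conditional select with identical sources into a copy.

    inst is a hypothetical tuple (opcode, dst, src0, src1, cond); any
    instruction that does not match is returned unchanged.
    """
    if len(inst) == 5 and inst[0].startswith("V_CNDMASK_B32"):
        _, dst, src0, src1, _cond = inst
        if src0 == src1:
            # Both arms are the same value, so the condition is irrelevant.
            return ("COPY", dst, src0)
    return inst
```

An instruction such as `("V_CNDMASK_B32_e64", "%vreg2", "%vreg1", "%vreg1", "%vreg0")` becomes `("COPY", "%vreg2", "%vreg1")`, while a select with two distinct sources is left alone.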
* [AMDGPU] Rename Kind to ValueKind in metadata to be consistent | Konstantin Zhuravlyov | 2017-03-24 | 2 | -2/+2
  llvm-svn: 298722
* [AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline | Stanislav Mekhanoshin | 2017-03-24 | 1 | -1/+24
  Previously it was added only to the BE.
  Differential Revision: https://reviews.llvm.org/D31323
  llvm-svn: 298721
* [AMDGPU] Don't enforce constexpr, there are still old standard libraries around that don't have a constexpr std::pair. | Benjamin Kramer | 2017-03-24 | 1 | -4/+4
  llvm-svn: 298719
* [AMDGPU] Remove double map lookups in SI scheduler | Valery Pykhtin | 2017-03-24 | 1 | -25/+8
  Patch by Axel Davy (axel.davy@normalesup.org)
  Differential revision: https://reviews.llvm.org/D30382
  llvm-svn: 298718
* [AMDGPU] Fix SGPR usage count in SI scheduler | Valery Pykhtin | 2017-03-24 | 1 | -2/+2
  Patch by Axel Davy (axel.davy@normalesup.org)
  Differential revision: https://reviews.llvm.org/D30149
  llvm-svn: 298710
* [AMDGPU] Add a new line after a debug message | Valery Pykhtin | 2017-03-24 | 1 | -0/+1
  Patch by Axel Davy (axel.davy@normalesup.org)
  Differential revision: https://reviews.llvm.org/D30146
  llvm-svn: 298708
* Don't build up std::vectors with constant sizes when an array suffices. | Benjamin Kramer | 2017-03-24 | 1 | -2/+6
  NFC.
  llvm-svn: 298701
* [AMDGPU] Do not emit isa info as code object metadata | Konstantin Zhuravlyov | 2017-03-22 | 7 | -141/+20
  - It was decided to expose this information through other means (rocr)
  Differential Revision: https://reviews.llvm.org/D30970
  llvm-svn: 298560
* [AMDGPU] Emit kernel debug properties as code object metadata | Konstantin Zhuravlyov | 2017-03-22 | 3 | -9/+110
  Differential Revision: https://reviews.llvm.org/D30969
  llvm-svn: 298558
* [AMDGPU] Emit kernel code properties as code object metadata | Konstantin Zhuravlyov | 2017-03-22 | 7 | -45/+174
  - These are not required for low level runtime
  Differential Revision: https://reviews.llvm.org/D29949
  llvm-svn: 298556
* [AMDGPU] Restructure code object metadata creation | Konstantin Zhuravlyov | 2017-03-22 | 13 | -883/+1134
  - Rename runtime metadata -> code object metadata
  - Make metadata not flow
  - Switch enums to use ScalarEnumerationTraits
  - Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc
  - Introduce in-memory representation for attributes
  - Code object metadata streamer
  - Create metadata for isa and printf during EmitStartOfAsmFile
  - Create metadata for kernel during EmitFunctionBodyStart
  - Finalize and emit metadata to .note during EmitEndOfAsmFile
  - Other minor improvements/bug fixes
  Differential Revision: https://reviews.llvm.org/D29948
  llvm-svn: 298552
* [AMDGPU] Fix bug 31610 | Konstantin Zhuravlyov | 2017-03-22 | 2 | -7/+6
  Differential Revision: https://reviews.llvm.org/D31258
  llvm-svn: 298551
* [AMDGPU][MC] Fix for Bug 28204 + LIT tests | Dmitry Preobrazhensky | 2017-03-22 | 1 | -8/+22
  Fixed v_mad_i64_i32/u64_u32 encoding
  Reviewers: artem.tamazov
  Differential Revision: https://reviews.llvm.org/D30828
  llvm-svn: 298502
* AMDGPU: Remove hasSideEffects from SI_RETURN_TO_EPILOG | Matt Arsenault | 2017-03-21 | 1 | -1/+0
  llvm-svn: 298454
* AMDGPU: Rename SI_RETURN | Matt Arsenault | 2017-03-21 | 8 | -16/+32
  This is used for a specific type of return to a shader part's epilog code. Rename it to avoid confusion with a true call's return.
  llvm-svn: 298452
* Let llvm.objectsize be conservative with null pointers | George Burgess IV | 2017-03-21 | 1 | -2/+2
  This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null.
  This fixes PR23277.
  Differential Revision: https://reviews.llvm.org/D28494
  llvm-svn: 298430
* AMDGPU: Buffer descriptor changes for GFX9 | Marek Olsak | 2017-03-21 | 1 | -7/+13
  Reviewers: arsenm
  Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr
  Differential Revision: https://reviews.llvm.org/D31158
  llvm-svn: 298397
* AMDGPU: Always use VGPR indexing on GFX9 | Marek Olsak | 2017-03-21 | 3 | -3/+7
  Reviewers: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr
  Differential Revision: https://reviews.llvm.org/D31157
  llvm-svn: 298396
* Rename AttributeSet to AttributeList | Reid Kleckner | 2017-03-21 | 2 | -5/+5
  Summary:
  This class is a list of AttributeSetNodes corresponding to the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name.
  Rename AttributeSetImpl to AttributeListImpl to follow suit.
  It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value.
  Reviewers: sanjoy, javed.absar, chandlerc, pete
  Reviewed By: pete
  Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
  Differential Revision: https://reviews.llvm.org/D31102
  llvm-svn: 298393
* AMDGPU: Fix not including v2i16/v2f16 in register class | Matt Arsenault | 2017-03-21 | 1 | -1/+1
  llvm-svn: 298390
* AMDGPU: Fix asserting on 0 dmask for image intrinsics | Matt Arsenault | 2017-03-21 | 1 | -0/+58
  Fold these to undef during lowering so users get eliminated.
  llvm-svn: 298387
* [AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler | Valery Pykhtin | 2017-03-21 | 10 | -3/+1474
  Differential revision: https://reviews.llvm.org/D31046
  llvm-svn: 298368
* [ADMGPU] SDWA peephole optimization pass. | Sam Kolton | 2017-03-21 | 7 | -1/+720
  Summary:
  First iteration of SDWA peephole.
  This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:
  '''
  V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
  V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
  V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
  '''
  Into:
  '''
  V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
  '''
  Pass structure:
  1. Iterate over machine instructions in a basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match a machine instruction into either a source or a destination SDWA operand. E.g. '''V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''.
  2. Iterate over the found SDWA operands and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0'''.
  3. Iterate over all potential instructions and check if they can be converted into SDWA.
  4. Convert instructions to SDWA.
  This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing done).
  There are several ways this pass can be improved:
  1. Make this pass work on the whole function, not only a basic block. As far as I can see this can be done right now without changes to the pass.
  2. Introduce more SDWA patterns
  3. Introduce mnemonics to limit when SDWA patterns should apply
  Reviewers: vpykhtin, alex-t, arsenm, rampitec
  Subscribers: wdng, nhaehnle, mgorny
  Differential Revision: https://reviews.llvm.org/D30038
  llvm-svn: 298365
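The four steps of the pass structure can be traced on the example from this message with a toy model. This is a Python sketch under stated assumptions, not the pass itself: the `Inst` and `Sdwa` tuples and all function names are hypothetical, only shifts by 16 and `V_ADD_I32_e32` are handled, and a shift is matched only when its register has exactly one use in the block, a simplification made here so the shift can be dropped safely.

```python
from collections import namedtuple

Inst = namedtuple("Inst", "opcode dst src0 src1")
Sdwa = namedtuple("Sdwa", "opcode dst src0 src1 dst_sel dst_unused src0_sel src1_sel")


def use_count(block, reg):
    """Number of times reg is read as a source operand in the block."""
    return sum((inst.src0 == reg) + (inst.src1 == reg) for inst in block)


def sdwa_peephole(block):
    # Step 1: match 16-bit shifts to source/destination SDWA operands.
    src_words = {}  # shifted register -> register read with src_sel:WORD_1
    dst_words = {}  # register fed to the shift -> register written with dst_sel:WORD_1
    for inst in block:
        if inst.src0 != 16:
            continue
        if inst.opcode == "V_LSHRREV_B32_e32" and use_count(block, inst.dst) == 1:
            src_words[inst.dst] = inst.src1
        elif inst.opcode == "V_LSHLREV_B32_e32" and use_count(block, inst.src1) == 1:
            dst_words[inst.src1] = inst.dst

    # Steps 2-4: find candidate users, check them, and convert.
    converted, absorbed = {}, set()
    for index, inst in enumerate(block):
        if inst.opcode != "V_ADD_I32_e32":
            continue
        dst, dst_sel = inst.dst, "DWORD"
        srcs, sels = [inst.src0, inst.src1], ["DWORD", "DWORD"]
        for i, src in enumerate(srcs):
            if src in src_words:
                absorbed.add(src)  # the right shift's result is no longer needed
                srcs[i], sels[i] = src_words[src], "WORD_1"
        if inst.dst in dst_words:
            dst, dst_sel = dst_words[inst.dst], "WORD_1"
            absorbed.add(dst)  # the left shift is folded into the add
        if dst_sel != "DWORD" or "WORD_1" in sels:
            converted[index] = Sdwa("V_ADD_I32_sdwa", dst, srcs[0], srcs[1],
                                    dst_sel, "UNUSED_PAD", sels[0], sels[1])

    # Emit the block without the shifts that were folded away.
    result = []
    for index, inst in enumerate(block):
        if index in converted:
            result.append(converted[index])
        elif inst.src0 == 16 and inst.dst in absorbed:
            continue
        else:
            result.append(inst)
    return result
```

Feeding the three-instruction example through `sdwa_peephole` leaves a single `V_ADD_I32_sdwa` writing `%vreg4` from `%vreg1` and `%vreg3` with `dst_sel` `WORD_1`, `src0_sel` `WORD_1` and `src1_sel` `DWORD`, matching the converted form quoted in the message.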
* [AMDGPU] Run always inliner early in opt | Konstantin Zhuravlyov | 2017-03-20 | 1 | -0/+1
  Differential Revision: https://reviews.llvm.org/D31141
  llvm-svn: 298281
* [AMDGPU][MC] Fix for Bugs 28201, 28199, 28170 + LIT tests | Dmitry Preobrazhensky | 2017-03-20 | 1 | -8/+34
  This fix enables sp3 abs modifier with constants
  Reviewers: artem.tamazov
  Differential Revision: https://reviews.llvm.org/D30825
  llvm-svn: 298265
* [AMDGPU][MC] Fix for Bugs 28200, 28202 + LIT tests | Dmitry Preobrazhensky | 2017-03-20 | 2 | -25/+108
  Fixed several related issues with VOP3 fp modifiers.
  Reviewers: artem.tamazov
  Differential Revision: https://reviews.llvm.org/D30821
  llvm-svn: 298255
* Revert "[AMDGPU] Run always inliner early in opt" | Konstantin Zhuravlyov | 2017-03-20 | 1 | -1/+0
  This reverts commit r297958, it breaks device-libs build.
  llvm-svn: 298239
* Fix MSVC warning: "switch statement contains 'default' but no 'case' labels". NFCI. | Simon Pilgrim | 2017-03-19 | 1 | -4/+1
  llvm-svn: 298225
* [AMDGPU] Add address space based alias analysis pass | Stanislav Mekhanoshin | 2017-03-17 | 5 | -0/+223
  This is a direct port of the HSAILAliasAnalysis pass, just cleaned up for style and renamed.
  Differential Revision: https://reviews.llvm.org/D31103
  llvm-svn: 298172