summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Enable some f32 fadd/fsub combines for f16Matt Arsenault2016-12-221-7/+12
| | | | llvm-svn: 290308
* AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16Matt Arsenault2016-12-221-0/+2
| | | | llvm-svn: 290307
* AMDGPU: Allow rcp and rsq usage with f16Matt Arsenault2016-12-222-4/+8
| | | | llvm-svn: 290302
* AMDGPU: Custom lower f16 fdivMatt Arsenault2016-12-222-1/+22
| | | | llvm-svn: 290301
* AMDGPU: Implement f16 fcanonicalizeMatt Arsenault2016-12-223-0/+9
| | | | llvm-svn: 290300
* AMDGPU: Update isFPImmLegal for f16Matt Arsenault2016-12-221-1/+2
| | | | | | I don't think this matters because ConstantFP is legal. llvm-svn: 290299
* AMDGPU/SI: Fix file headerTom Stellard2016-12-211-1/+1
| | | | llvm-svn: 290265
* [AMDGPU] Garbage collect dead code. NFCI.Davide Italiano2016-12-211-15/+0
| | | | llvm-svn: 290249
* AMDGPU: Allow 16-bit types in inline asm constraintsMatt Arsenault2016-12-201-0/+2
| | | | llvm-svn: 290193
* AMDGPU: Don't add same instruction multiple times to worklistMatt Arsenault2016-12-201-1/+7
| | | | | | | | | When the instruction is processed the first time, it may be deleted resulting in crashes. While the new test adds the same user to the worklist twice, this particular case doesn't crash but I'm not sure why. llvm-svn: 290191
* AMDGPU/SI: Make a function constTom Stellard2016-12-202-4/+3
| | | | llvm-svn: 290185
* AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.*Tom Stellard2016-12-206-6/+77
| | | | | | | | | | Reviewers: arsenm, nhaehnle, mareko Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27834 llvm-svn: 290184
* AMDGPU/SI: Add a MachineMemOperand to MIMG instructionsTom Stellard2016-12-204-6/+57
| | | | | | | | | | | | | | | Summary: Without a MachineMemOperand, the scheduler was assuming MIMG instructions were ordered memory references, so no loads or stores could be reordered across them. Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27536 llvm-svn: 290179
* [AMDGPU] When unifying metadata, add operands to named metadata individuallyKonstantin Zhuravlyov2016-12-191-3/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D27725 llvm-svn: 290114
* AMDGPU: [AMDGPU] Assembler: add .hsa_code_object_metadata directive for ↵Sam Kolton2016-12-194-72/+143
| | | | | | | | | | | | | | | | | | | | | | | | functime metadata V2.0 Summary: Added pair of directives .hsa_code_object_metadata/.end_hsa_code_object_metadata. Between them user can put YAML string that would be directly put to the generated note. E.g.: ''' .hsa_code_object_metadata { amd.MDVersion: [ 2, 0 ] } .end_hsa_code_object_metadata ''' Based on D25046 Reviewers: vpykhtin, nhaustov, yaxunl, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, mgorny, tony-tye Differential Revision: https://reviews.llvm.org/D27619 llvm-svn: 290097
* AMDGPU: Fix name for v_ashrrev_i16Matt Arsenault2016-12-161-3/+3
| | | | llvm-svn: 289967
* AMDGPU: Select branch on undef to uniform scc branchMatt Arsenault2016-12-153-0/+21
| | | | llvm-svn: 289877
* AMDGPU: Fix asserting on returned tail callsMatt Arsenault2016-12-151-2/+4
| | | | llvm-svn: 289868
* AMDGPU: Assembler support for vintrp instructionsMatt Arsenault2016-12-153-6/+108
| | | | llvm-svn: 289866
* Fix for regression after Global Load Scalarization patchAlexander Timofeev2016-12-151-1/+2
| | | | llvm-svn: 289822
* Extract LaneBitmask into a separate typeKrzysztof Parzyszek2016-12-151-1/+2
| | | | | | | | | | | | Specifically avoid implicit conversions from/to integral types to avoid potential errors when changing the underlying type. For example, a typical initialization of a "full" mask was "LaneMask = ~0u", which would result in a value of 0x00000000FFFFFFFF if the type was extended to uint64_t. Differential Revision: https://reviews.llvm.org/D27454 llvm-svn: 289820
* fix gcc warning about a superfluous ;Nico Weber2016-12-141-1/+1
| | | | llvm-svn: 289705
* Fix build failure due to r289674 on certain systemsYaxun Liu2016-12-141-1/+0
| | | | | | Removed a useless include which caused conflict. llvm-svn: 289700
* AMDGPU: Emit runtime metadata version 2 as YAMLYaxun Liu2016-12-147-403/+550
| | | | | | Differential Revision: https://reviews.llvm.org/D25046 llvm-svn: 289674
* AMDGPU: Make AllocationPriority of SGPRs higher than VGPRsMatt Arsenault2016-12-141-11/+13
| | | | | | | | Since SGPRs should spill to VGPRs, they should be allocated first. I don't think this is sufficient for SGPRs to always spill to VGPRs though. llvm-svn: 289671
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-12-141-0/+10
| | | | | | | | | | UseAA is enabled." Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667
* AMDGPU: Change vintrp printingMatt Arsenault2016-12-144-6/+37
| | | | llvm-svn: 289664
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2016-12-141-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Retrying after fixing after removing load-store factoring through token factors in favor of improved token factor operand pruning Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. Whem merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289659
* Replace APFloatBase static fltSemantics data members with getter functionsStephan Bergmann2016-12-142-9/+9
| | | | | | | | | | | | | At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647
* [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What ↵Eugene Zelenko2016-12-128-95/+127
| | | | | | You Use warnings; other minor fixes (NFC). llvm-svn: 289475
* AMDGPU: llvm.amdgcn.interp.mov is a source of divergenceNicolai Haehnle2016-12-121-0/+1
| | | | | | | | | | | | | | Summary: While the result is constant across a single primitive, each pixel shader wave can have pixels from multiple primitives. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27572 llvm-svn: 289447
* AMDGPU: Fix asan errors when folding operandsMatt Arsenault2016-12-101-2/+2
| | | | | | | This was failing when trying to fold immediates into operand 1 of a phi, which only has one statically known operand. llvm-svn: 289337
* AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecastsMatt Arsenault2016-12-101-1/+8
| | | | | | | The users of the addrspacecast were having their types incorrectly changed, producing invalid bitcasts between address spaces. llvm-svn: 289307
* AMDGPU: Fix handling of 16-bit immediatesMatt Arsenault2016-12-1019-222/+741
| | | | | | | | | | | | | | | | | | Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306
* AMDGPU: Fix vintrp disassemblyMatt Arsenault2016-12-102-12/+12
| | | | llvm-svn: 289292
* AMDGPU: Change vintrp printing to better match scMatt Arsenault2016-12-102-12/+15
| | | | | | | Some of the immediates need to be printed differently eventually. llvm-svn: 289291
* [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What ↵Eugene Zelenko2016-12-0913-134/+215
| | | | | | You Use warnings; other minor fixes (NFC). llvm-svn: 289282
* AMDGPU/SI: Remove XNACK feature from CIMarek Olsak2016-12-091-2/+1
| | | | | | | | | | | | Summary: CI doesn't have XNACK. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27175 llvm-svn: 289263
* AMDGPU/SI: Don't reserve XNACK when it's disabledMarek Olsak2016-12-092-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This frees 2 additional scalar registers. These are results from all of my 3 patches combined: Polaris: Spilled SGPRs: 2231 -> 1517 (-32.00 %) Tonga: Spilled SGPRs: 3829 -> 2608 (-31.89 %) Spilled VGPRs: 100 -> 84 (-16.00 %) Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader limited to 64 VGPRs. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27151 llvm-svn: 289262
* AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objectsMarek Olsak2016-12-093-8/+22
| | | | | | | | | | | | Summary: This frees 2 scalar registers. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27150 llvm-svn: 289261
* AMDGPU/SI: Allow using SGPRs 96-101 on VIMarek Olsak2016-12-094-20/+42
| | | | | | | | | | | | | | | Summary: There is no point in setting SGPRS=104, because VI allocates SGPRs in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs for general purposes. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27149 llvm-svn: 289260
* AMDGPU: Fix isTypeDesirableForOp for i16Matt Arsenault2016-12-091-4/+16
| | | | | | This should do nothing for targets without i16. llvm-svn: 289235
* AMDGPU: Fix i128 mulMatt Arsenault2016-12-091-0/+4
| | | | llvm-svn: 289231
* AMDGPU: Allow TBA, TMA, TTMP* registers with SMEM instructionsMatt Arsenault2016-12-091-2/+2
| | | | | | Fixes assembler regressions. llvm-svn: 289230
* AMDGPU: Clean up instruction bitsMatt Arsenault2016-12-092-98/+117
| | | | | | | | | Sort the instruction bits by type and make sure there is one for each format. Also cleanup namespaces. llvm-svn: 289229
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-12-091-0/+10
| | | | | | | | UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2016-12-091-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Retrying after fixing overly aggressive load-store forwarding optimization. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. Whem merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289221
* AMDGPU/SI: Don't mark VINTRP instructions as mayLoadTom Stellard2016-12-091-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: These instructions technically do read from memory, but the memory is considered to be out of bounds for normal load/store instructions. shader-db stats: SGPRS: 1416075 -> 1413323 (-0.19 %) VGPRS: 867413 -> 863935 (-0.40 %) Spilled SGPRs: 1409 -> 1354 (-3.90 %) Spilled VGPRs: 63 -> 63 (0.00 %) Private memory VGPRs: 880 -> 880 (0.00 %) Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread Code Size: 37889052 -> 37897340 (0.02 %) bytes LDS: 2147 -> 2147 (0.00 %) blocks Max Waves: 279243 -> 280369 (0.40 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, mareko, arsenm Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27593 llvm-svn: 289219
* AMDGPU: Select i16 instructions to VOP3 formsMatt Arsenault2016-12-091-10/+10
| | | | | | | | | | | | | | These were selecting directly to the VOP2 form instead of VOP3 like the i32 instructions. Fixes regressions in future commits where an immediate isn't folded because it was initially used for the second operand. Because uniform 16-bit operations are promoted to i32, it's difficult to get a simple testcase where this matters. Fold failures in SIFoldOperands here tend to be hidden by commute and fold in SIShrinkInstructions. llvm-svn: 289189
* AMDGPU: Make f16 ConstantFP legalMatt Arsenault2016-12-083-16/+14
| | | | | | | | | | | | | | Not having this legal led to combine failures, resulting in dumb things like bitcasts of constants not being folded away. The only reason I'm leaving the v_mov_b32 hack that f32 already uses is to avoid madak formation test regressions. PeepholeOptimizer has an ordering issue where the immediate fold attempt is into the sgpr->vgpr copy instead of the actual use. Running it twice avoids that problem. llvm-svn: 289096
OpenPOWER on IntegriCloud