path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* [AMDGPU][llvm-mc] Predefined symbols to access register counts ↵Artem Tamazov2016-12-271-7/+56
| | | | | | | | | | | | | | | | | | | | | | | (.kernel.{v|s}gpr_count) The feature allows for conditional assembly, filling the entries of .amd_kernel_code_t, etc. Symbols are defined with value 0 at the beginning of each kernel scope. After each register usage, the respective symbol is set to: value = max( value, ( register index + 1 ) ) Thus, at the end of the scope the value represents the count of used registers. A kernel scope begins at an .amdgpu_hsa_kernel directive and ends at the next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also a dummy scope that spans from the beginning of the source file until the first .amdgpu_hsa_kernel. Test added. Differential Revision: https://reviews.llvm.org/D27859 llvm-svn: 290608
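The update rule in the message above can be sketched as follows. This is a minimal Python model, not the llvm-mc implementation: the class `KernelScopeModel` and its method names are hypothetical. Only the symbol names, the reset to 0 at each `.amdgpu_hsa_kernel`, and the `max(value, index + 1)` rule come from the commit message.

```python
class KernelScopeModel:
    """Toy model of the .kernel.vgpr_count / .kernel.sgpr_count symbols."""

    def __init__(self):
        # The dummy scope before the first .amdgpu_hsa_kernel also starts at 0.
        self.symbols = {".kernel.vgpr_count": 0, ".kernel.sgpr_count": 0}

    def begin_kernel(self):
        # Models an .amdgpu_hsa_kernel directive: a new scope resets both symbols.
        for name in self.symbols:
            self.symbols[name] = 0

    def use_register(self, kind, index):
        # kind is "v" or "s"; index is the register number (v3 -> index 3).
        if kind not in ("v", "s"):
            raise ValueError("kind must be 'v' or 's'")
        name = f".kernel.{kind}gpr_count"
        self.symbols[name] = max(self.symbols[name], index + 1)


scope = KernelScopeModel()
scope.begin_kernel()
scope.use_register("v", 3)   # highest VGPR used is v3
scope.use_register("v", 1)   # a lower index does not shrink the count
scope.use_register("s", 7)   # highest SGPR used is s7
counts_first_kernel = dict(scope.symbols)

scope.begin_kernel()         # the next kernel scope starts from zero again
counts_after_reset = dict(scope.symbols)
```

Because the symbol tracks the highest index plus one rather than the number of distinct registers touched, a kernel that uses only v3 still reports a count of 4 in this model.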
* [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructionsSam Kolton2016-12-273-6/+37
| | | | | | | | | | Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28051 llvm-svn: 290599
* AMDGPU: split ret/noret patterns for global atomicsJan Vesely2016-12-233-22/+52
| | | | | | Differential Revision: https://reviews.llvm.org/D27989 llvm-svn: 290435
* Enable '-Wstring-conversion' and fix some bad asserts that it helpedChandler Carruth2016-12-231-1/+1
| | | | | | | | find. Notable is the assert in NewGVN which had no effect because of the bug. llvm-svn: 290400
* AMDGPU: Invert cmp + select with constantMatt Arsenault2016-12-221-0/+19
| | | | | | | | | | | Canonicalize a select with a constant to the false side. This enables more instruction shrinking opportunities since an inline immediate can be used for the false side of v_cndmask_b32_e32. This seems to usually be better but causes some code size regressions in some tests. llvm-svn: 290372
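A rough illustration of the canonicalization described above, assuming integer compare predicates only (floating-point predicates need ordered/unordered handling that this sketch ignores). The tuple representation and the names `canonicalize_select` and `evaluate` are hypothetical and are not taken from the patch.

```python
import operator

# Integer predicates and their logical inverses.
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}
OPS = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
       "ge": operator.ge, "gt": operator.gt, "le": operator.le}


def is_const(value):
    # In this toy IR, ints are constants and strings are variable names.
    return isinstance(value, int)


def canonicalize_select(pred, lhs, rhs, true_val, false_val):
    """select(cmp pred, K, x) -> select(cmp inverse(pred), x, K)."""
    if is_const(true_val) and not is_const(false_val):
        return (INVERSE[pred], lhs, rhs, false_val, true_val)
    return (pred, lhs, rhs, true_val, false_val)


def evaluate(select, env):
    # Reference interpreter used to check that the rewrite preserves meaning.
    pred, lhs, rhs, true_val, false_val = select

    def value_of(v):
        return env[v] if isinstance(v, str) else v

    taken = OPS[pred](value_of(lhs), value_of(rhs))
    return value_of(true_val) if taken else value_of(false_val)


original = ("lt", "a", "b", 7, "x")          # select (a < b), 7, x
canonical = canonicalize_select(*original)   # select (a >= b), x, 7
```

Both forms compute the same value for every input. The rewrite only moves the constant to the false side, which is where the commit message says an inline immediate can be used.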
* AMDGPU: Use i16 for i16 shift amountMatt Arsenault2016-12-222-8/+10
| | | | llvm-svn: 290351
* AMDGPU: Fix missing 16-bit cmpx instructionsMatt Arsenault2016-12-221-0/+39
| | | | llvm-svn: 290349
* AMDGPU: Use i16 comparison instructionsMatt Arsenault2016-12-222-5/+43
| | | | llvm-svn: 290348
* AMDGPU: Fixed '!NodePtr->isKnownSentinel()' assertMatt Arsenault2016-12-221-17/+4
| | | | | | | | Caused by dereferencing end iterator when trying to const cast the iterator. Patch by Martin Sherburn llvm-svn: 290347
* [AMDGPU] Add pseudo SDWA instructionsSam Kolton2016-12-225-85/+159
| | | | | | | | | | | | Summary: This is needed for later SDWA support in CodeGen. Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27412 llvm-svn: 290338
* [AMDGPU] Disassembler: fix for disassembling v_mac_f32/16_dpp/sdwaSam Kolton2016-12-224-5/+26
| | | | | | | | | | Summary: Real instruction should copy constraints from pseudo instruction. This allows auto-generated disassembler to correctly process tied operands. Reviewers: nhaustov, vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27847 llvm-svn: 290336
* AMDGPU: Fix missing commute table entries for cmpxMatt Arsenault2016-12-221-4/+4
| | | | | | No tests because these aren't currently used anywhere. llvm-svn: 290316
* AMDGPU: Swap order of operands in fadd/fsub combineMatt Arsenault2016-12-221-4/+4
| | | | | | | FMA is canonicalized to constant in the middle operand. Do the same so fmad matches and avoid an extra combine step. llvm-svn: 290313
* AMDGPU: Check fast math flags in fadd/fsub combinesMatt Arsenault2016-12-222-7/+15
| | | | llvm-svn: 290312
* AMDGPU: Form more FMAs if fusion is allowedMatt Arsenault2016-12-222-30/+46
| | | | | | | Extend the existing fadd/fsub->fmad combines to produce FMA if allowed. llvm-svn: 290311
* AMDGPU: Move combines into separate functionsMatt Arsenault2016-12-222-152/+174
| | | | llvm-svn: 290309
* AMDGPU: Enable some f32 fadd/fsub combines for f16Matt Arsenault2016-12-221-7/+12
| | | | llvm-svn: 290308
* AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16Matt Arsenault2016-12-221-0/+2
| | | | llvm-svn: 290307
* AMDGPU: Allow rcp and rsq usage with f16Matt Arsenault2016-12-222-4/+8
| | | | llvm-svn: 290302
* AMDGPU: Custom lower f16 fdivMatt Arsenault2016-12-222-1/+22
| | | | llvm-svn: 290301
* AMDGPU: Implement f16 fcanonicalizeMatt Arsenault2016-12-223-0/+9
| | | | llvm-svn: 290300
* AMDGPU: Update isFPImmLegal for f16Matt Arsenault2016-12-221-1/+2
| | | | | | I don't think this matters because ConstantFP is legal. llvm-svn: 290299
* AMDGPU/SI: Fix file headerTom Stellard2016-12-211-1/+1
| | | | llvm-svn: 290265
* [AMDGPU] Garbage collect dead code. NFCI.Davide Italiano2016-12-211-15/+0
| | | | llvm-svn: 290249
* AMDGPU: Allow 16-bit types in inline asm constraintsMatt Arsenault2016-12-201-0/+2
| | | | llvm-svn: 290193
* AMDGPU: Don't add same instruction multiple times to worklistMatt Arsenault2016-12-201-1/+7
| | | | | | | | | When the instruction is processed the first time, it may be deleted resulting in crashes. While the new test adds the same user to the worklist twice, this particular case doesn't crash but I'm not sure why. llvm-svn: 290191
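The hazard described above (an item queued twice, deleted on first processing, then visited again) is commonly avoided with a worklist that tracks pending items in a set. The sketch below shows that general idea only. `UniqueWorklist` is a hypothetical name, and the actual patch may solve the problem differently.

```python
from collections import deque


class UniqueWorklist:
    """FIFO worklist that refuses to queue an item that is already pending."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def push(self, item):
        # Returns False when the item is already queued, so it is never
        # processed twice (the second visit could touch a deleted object).
        if item in self._pending:
            return False
        self._pending.add(item)
        self._queue.append(item)
        return True

    def pop(self):
        item = self._queue.popleft()
        self._pending.discard(item)
        return item

    def __len__(self):
        return len(self._queue)


worklist = UniqueWorklist()
first_push = worklist.push("user_inst")
second_push = worklist.push("user_inst")   # same user reached via another operand
queued = len(worklist)
```

An item may be pushed again after it has been popped, which keeps the usual fixed-point behavior of a worklist while still preventing duplicate pending entries.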
* AMDGPU/SI: Make a function constTom Stellard2016-12-202-4/+3
| | | | llvm-svn: 290185
* AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.*Tom Stellard2016-12-206-6/+77
| | | | | | | | | | Reviewers: arsenm, nhaehnle, mareko Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27834 llvm-svn: 290184
* AMDGPU/SI: Add a MachineMemOperand to MIMG instructionsTom Stellard2016-12-204-6/+57
| | | | | | | | | | | | | | | Summary: Without a MachineMemOperand, the scheduler was assuming MIMG instructions were ordered memory references, so no loads or stores could be reordered across them. Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27536 llvm-svn: 290179
* [AMDGPU] When unifying metadata, add operands to named metadata individuallyKonstantin Zhuravlyov2016-12-191-3/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D27725 llvm-svn: 290114
* AMDGPU: [AMDGPU] Assembler: add .hsa_code_object_metadata directive for ↵Sam Kolton2016-12-194-72/+143
| | | | | | | | | | | | | | | | | | | | | | | | runtime metadata V2.0 Summary: Added a pair of directives, .hsa_code_object_metadata/.end_hsa_code_object_metadata. Between them the user can put a YAML string that is copied directly into the generated note. E.g.: ''' .hsa_code_object_metadata { amd.MDVersion: [ 2, 0 ] } .end_hsa_code_object_metadata ''' Based on D25046 Reviewers: vpykhtin, nhaustov, yaxunl, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, mgorny, tony-tye Differential Revision: https://reviews.llvm.org/D27619 llvm-svn: 290097
* AMDGPU: Fix name for v_ashrrev_i16Matt Arsenault2016-12-161-3/+3
| | | | llvm-svn: 289967
* AMDGPU: Select branch on undef to uniform scc branchMatt Arsenault2016-12-153-0/+21
| | | | llvm-svn: 289877
* AMDGPU: Fix asserting on returned tail callsMatt Arsenault2016-12-151-2/+4
| | | | llvm-svn: 289868
* AMDGPU: Assembler support for vintrp instructionsMatt Arsenault2016-12-153-6/+108
| | | | llvm-svn: 289866
* Fix for regression after Global Load Scalarization patchAlexander Timofeev2016-12-151-1/+2
| | | | llvm-svn: 289822
* Extract LaneBitmask into a separate typeKrzysztof Parzyszek2016-12-151-1/+2
| | | | | | | | | | | | Specifically avoid implicit conversions from/to integral types to avoid potential errors when changing the underlying type. For example, a typical initialization of a "full" mask was "LaneMask = ~0u", which would result in a value of 0x00000000FFFFFFFF if the type was extended to uint64_t. Differential Revision: https://reviews.llvm.org/D27454 llvm-svn: 289820
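The widening problem mentioned above can be reproduced with plain integer arithmetic, and a wrapper type that refuses to mix with raw integers shows why a separate type helps. This is a Python analogy under stated assumptions: the class `LaneMask` and its methods are hypothetical stand-ins, not the C++ type introduced by the commit.

```python
MASK32 = 0xFFFFFFFF

# "~0u" evaluated as a 32-bit unsigned value ...
all_lanes_32 = ~0 & MASK32
# ... keeps its numeric value when zero-extended to 64 bits,
# so the upper 32 lanes are silently left clear.
widened = all_lanes_32


class LaneMask:
    """Lane mask that never converts implicitly to or from an integer."""

    WIDTH = 64

    def __init__(self, bits=0):
        if not 0 <= bits < (1 << self.WIDTH):
            raise ValueError("mask does not fit in WIDTH bits")
        self._bits = bits

    @classmethod
    def get_all(cls):
        # The "full" mask is derived from WIDTH, not from a literal like ~0u.
        return cls((1 << cls.WIDTH) - 1)

    @classmethod
    def get_none(cls):
        return cls(0)

    def any(self):
        return self._bits != 0

    def as_integer(self):
        # Conversions to an integer must be requested explicitly.
        return self._bits

    def __and__(self, other):
        if not isinstance(other, LaneMask):
            return NotImplemented   # mask & 1 raises TypeError
        return LaneMask(self._bits & other._bits)

    def __eq__(self, other):
        if not isinstance(other, LaneMask):
            return NotImplemented
        return self._bits == other._bits


full = LaneMask.get_all()
```

With the wrapper, changing `WIDTH` updates every "all lanes" mask consistently, and accidental arithmetic with raw integers fails loudly instead of producing a truncated mask.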
* fix gcc warning about a superfluous ;Nico Weber2016-12-141-1/+1
| | | | llvm-svn: 289705
* Fix build failure due to r289674 on certain systemsYaxun Liu2016-12-141-1/+0
| | | | | | Removed a useless include which caused conflict. llvm-svn: 289700
* AMDGPU: Emit runtime metadata version 2 as YAMLYaxun Liu2016-12-147-403/+550
| | | | | | Differential Revision: https://reviews.llvm.org/D25046 llvm-svn: 289674
* AMDGPU: Make AllocationPriority of SGPRs higher than VGPRsMatt Arsenault2016-12-141-11/+13
| | | | | | | | Since SGPRs should spill to VGPRs, they should be allocated first. I don't think this is sufficient for SGPRs to always spill to VGPRs though. llvm-svn: 289671
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-12-141-0/+10
| | | | | | | | | | UseAA is enabled." Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667
* AMDGPU: Change vintrp printingMatt Arsenault2016-12-144-6/+37
| | | | llvm-svn: 289664
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2016-12-141-10/+0
| enabled.
| Retrying after removing load-store factoring through token factors in favor of improved token factor operand pruning.
| Simplify Consecutive Merge Store Candidate Search
| Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic.
| When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited.
| This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).
| Additional Minor Changes:
| 1. Finishes removing unused AliasLoad code
| 2. Unifies the chain aggregation in the merged stores across code paths
| 3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
| 4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.
| This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
| Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations.
| Noteworthy tests:
| CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.
| CodeGen/AArch64/arm64-memset-inline.ll -
| CodeGen/AArch64/arm64-stur.ll -
| CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.
| CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?
| CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.
| CodeGen/X86/dag-merge-fast-accesses.ll -
| CodeGen/X86/MergeConsecutiveStores.ll -
| CodeGen/X86/stores-merging.ll -
| CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores
| CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls
| CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores
| CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.
| CodeGen/X86/vector-idiv.ll -
| CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But it looks like the code got better due to the memory operations being recognized as non-aliasing.
| CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.
| Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
| Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
| Differential Revision: https://reviews.llvm.org/D14834
| llvm-svn: 289659
* Replace APFloatBase static fltSemantics data members with getter functionsStephan Bergmann2016-12-142-9/+9
| | | | | | | | | | | | | At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647
* [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What ↵Eugene Zelenko2016-12-128-95/+127
| | | | | | You Use warnings; other minor fixes (NFC). llvm-svn: 289475
* AMDGPU: llvm.amdgcn.interp.mov is a source of divergenceNicolai Haehnle2016-12-121-0/+1
| | | | | | | | | | | | | | Summary: While the result is constant across a single primitive, each pixel shader wave can have pixels from multiple primitives. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27572 llvm-svn: 289447
* AMDGPU: Fix asan errors when folding operandsMatt Arsenault2016-12-101-2/+2
| | | | | | | This was failing when trying to fold immediates into operand 1 of a phi, which only has one statically known operand. llvm-svn: 289337
* AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecastsMatt Arsenault2016-12-101-1/+8
| | | | | | | The users of the addrspacecast were having their types incorrectly changed, producing invalid bitcasts between address spaces. llvm-svn: 289307
* AMDGPU: Fix handling of 16-bit immediatesMatt Arsenault2016-12-1019-222/+741
| | | | | | | | | | | | | | | | | | Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306
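To see why legality cannot be decided from the bit width alone, compare how the same value is encoded as a 16-bit and as a 32-bit float. The sketch below is a simplified model, not the backend's logic. The set of inline constants used here (integers from -16 to 64 and the floats 0.5, 1.0, 2.0, 4.0 with their negatives) is an assumption made for illustration, and `is_inline_constant` is a hypothetical helper.

```python
import struct

# Assumed inline float constants for this illustration.
INLINE_FLOATS = (0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 4.0, -4.0)


def f32_bits(value):
    # IEEE-754 single-precision bit pattern as an unsigned integer.
    return struct.unpack("<I", struct.pack("<f", value))[0]


def f16_bits(value):
    # IEEE-754 half-precision bit pattern as an unsigned integer.
    return struct.unpack("<H", struct.pack("<e", value))[0]


def is_inline_constant(bits, operand_type):
    """Decide legality from the operand *type*, not just its size."""
    width = 16 if operand_type in ("i16", "f16") else 32
    # Reinterpret the raw bits as a signed integer of the operand's width.
    signed = bits - (1 << width) if bits >> (width - 1) else bits
    if -16 <= signed <= 64:
        return True
    if operand_type == "f16":
        return bits in {f16_bits(v) for v in INLINE_FLOATS}
    if operand_type == "f32":
        return bits in {f32_bits(v) for v in INLINE_FLOATS}
    return False


one_as_f16 = f16_bits(1.0)   # 0x3C00
one_as_f32 = f32_bits(1.0)   # 0x3F800000
```

In this model the pattern 0x3C00 is an inline constant for an f16 operand but not for an f32 operand, even though it fits in a 32-bit register either way. That is the kind of mismatch that arises when operand size is used as a proxy for operand type.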