path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* [AMDGPU][mc] Add support for buffer_load_dwordx3, buffer_store_dwordx3.Artem Tamazov2016-10-071-0/+10
| | | | | | | | | Partially fixes Bug 28232. Lit tests added. Differential Revision: https://reviews.llvm.org/D25367 llvm-svn: 283567
* [AMDGPU] Assembler: support v_mac_f32 DPP and SDWA. Move getNamedOperandIdx to AMDGPUBaseInfo.h | Sam Kolton | 2016-10-07 | 8 | -54/+139
  Reviewers: artem.tamazov, tstellarAMD
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D25084
  llvm-svn: 283560
* [AMDGPU] AMDGPUCodeGenPrepare: remove extra ';'Konstantin Zhuravlyov2016-10-071-1/+1
| | | | llvm-svn: 283558
* [AMDGPU] Promote uniform (i1, i16] operations to i32Konstantin Zhuravlyov2016-10-071-97/+101
| | | | | | Differential Revision: https://reviews.llvm.org/D25302 llvm-svn: 283555
* AMDGPU: Fix use-after-free in SIOptimizeExecMaskingNicolai Haehnle2016-10-071-1/+4
  Summary: There was a bug with sequences like

    s_mov_b64 s[0:1], exec
    s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
    ...
    s_mov_b64_term exec, s[2:3]

  because s[2:3] was defined and used in the same instruction, ending up with SaveExecInst inside OtherUseInsts.

  Note that the test case also exposes an unrelated bug.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028
  Reviewers: tstellarAMD, arsenm
  Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
  Differential Revision: https://reviews.llvm.org/D25306
  llvm-svn: 283528
* Target: Remove unused patterns and transforms. NFC.Peter Collingbourne2016-10-073-33/+0
| | | | llvm-svn: 283515
* AMDGPU: Don't fold undef uses or copies with implicit usesMatt Arsenault2016-10-061-4/+22
| | | | llvm-svn: 283476
* AMDGPU: Remove scheduling info from si_mask_branchMatt Arsenault2016-10-061-0/+2
| | | | llvm-svn: 283475
* AMDGPU: Remove leftover implicit operands when folding immediatesMatt Arsenault2016-10-061-7/+26
| | | | | | | | When constant folding an operation to a copy or an immediate mov, the implicit uses/defs of the old instruction were left behind, e.g. replacing v_or_b32 left the implicit exec use on the new copy. llvm-svn: 283471
* Reapply "AMDGPU: Support using tablegened MC pseudo expansions"Matt Arsenault2016-10-065-44/+75
| | | | | | Fix bad merge llvm-svn: 283470
* Revert "AMDGPU: Support using tablegened MC pseudo expansions"Matt Arsenault2016-10-065-68/+44
| | | | llvm-svn: 283469
* AMDGPU: Support using tablegened MC pseudo expansionsMatt Arsenault2016-10-065-44/+68
| | | | | | Make the necessary refactorings to make use of PseudoInstExpansion llvm-svn: 283467
* BranchRelaxation: Support expanding unconditional branchesMatt Arsenault2016-10-069-17/+271
| | | | | | | AMDGPU needs to expand unconditional branches in a new block with an indirect branch. llvm-svn: 283464
* [AMDGPU] Disassembler: print label names in branch instructionsSam Kolton2016-10-063-66/+156
| | | | | | | | | | | | | Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table. Initialize MCObjectFileInfo with some default values. Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D24802 llvm-svn: 283450
* AMDGPU: Partially fix reported code size for some instructionsMatt Arsenault2016-10-064-4/+8
| | | | | | | | These ones need to have the size on the pseudo instruction set for getInstSizeInBytes to work correctly. These also have a statically known size. llvm-svn: 283437
* [AMDGPU] Promote uniform i16 bitreverse intrinsic to i32Konstantin Zhuravlyov2016-10-061-11/+65
| | | | | | Differential Revision: https://reviews.llvm.org/D25121 llvm-svn: 283415
* AMDGPU: Do not re-use tmpreg in spill/restore loweringMatthias Braun2016-10-051-2/+2
| | | | | | | | | The register scavenging code does not support multiple definitions of the same vreg. Differential Revision: https://reviews.llvm.org/D25220 llvm-svn: 283369
* AMDGPU: Refactor indirect vector loweringMatt Arsenault2016-10-041-36/+42
| | | | | | | Allow inserting multiple instructions in the expanded loop. llvm-svn: 283177
* AMDGPU: Factor SGPR spilling into separate functionsMatt Arsenault2016-10-042-129/+166
| | | | llvm-svn: 283175
* [AMDGPU] Pass optimization level to SelectionDAGISelKonstantin Zhuravlyov2016-10-033-8/+11
| | | | llvm-svn: 283133
* [AMDGPU] Sign extend AShr when promoting (instead of zero extending)Konstantin Zhuravlyov2016-10-031-2/+2
| | | | llvm-svn: 283130
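
To illustrate why an arithmetic shift right needs sign extension when a 16-bit operation is promoted to 32 bits, here is a minimal Python sketch. It is not the LLVM implementation; the helper names (`to_signed`, `ashr_i16`, `ashr_promoted`) are hypothetical and only model the bit arithmetic.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def ashr_i16(x16, shift):
    """Reference result: arithmetic shift right performed directly on 16 bits."""
    return (to_signed(x16, 16) >> shift) & 0xFFFF


def ashr_promoted(x16, shift, extend):
    """Model promoting the operand to 32 bits, shifting, then truncating to 16 bits.

    `extend` is "sext" (sign extend) or "zext" (zero extend); both names are
    labels chosen for this sketch.
    """
    x16 &= 0xFFFF
    if extend == "sext":
        wide = to_signed(x16, 16)  # upper bits are copies of the sign bit
    elif extend == "zext":
        wide = x16                 # upper bits are zero
    else:
        raise ValueError("extend must be 'sext' or 'zext'")
    # Python's >> on ints is an arithmetic shift, matching AShr on the wide value.
    return (wide >> shift) & 0xFFFF


if __name__ == "__main__":
    x = 0x8000  # -32768 as a 16-bit value
    print(hex(ashr_i16(x, 1)))               # reference
    print(hex(ashr_promoted(x, 1, "sext")))  # matches the reference
    print(hex(ashr_promoted(x, 1, "zext")))  # sign bit is lost
```

With zero extension the promoted value is non-negative, so the shift brings in zeros where the 16-bit operation would have replicated the sign bit.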
* AMDGPU: Fix typoMatt Arsenault2016-10-031-1/+1
| | | | llvm-svn: 283108
* Add new target hooks for LoadStoreVectorizerVolkan Keles2016-10-032-2/+2
| | | | | | | | | | | | Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D24727 llvm-svn: 283099
* [AMDGPU] Remove unused variables from SIOptimizeExecMaskingKonstantin Zhuravlyov2016-10-031-3/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D25110 llvm-svn: 283087
* Use StringRef in Pass/PassManager APIs (NFC)Mehdi Amini2016-10-0130-56/+34
| | | | llvm-svn: 283004
* Revert "AMDGPU: Don't use offen if it is 0"Mehdi Amini2016-10-012-100/+14
| | | | | | | This reverts commit r282999. Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038 llvm-svn: 283003
* AMDGPU: Don't use offen if it is 0Matt Arsenault2016-10-012-14/+100
| | | | | | This removes many re-initializations of a base register to 0. llvm-svn: 282999
* [AMDGPU] Choose VMCNT, EXPCNT, LGKMCNT masks and shifts based on the isa versionKonstantin Zhuravlyov2016-09-305-16/+69
| | | | | | Differential Revision: https://reviews.llvm.org/D24973 llvm-svn: 282877
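
As a rough illustration of what selecting masks and shifts per ISA version means, the sketch below packs and unpacks the three counters from a layout table. The layout shown is an assumption for illustration (vmcnt in bits 3:0, expcnt in bits 6:4, lgkmcnt in bits 11:8); the names `Field`, `ASSUMED_LAYOUT`, `encode_waitcnt`, and `decode_waitcnt` are hypothetical and do not come from the commit.

```python
from collections import namedtuple

# A bit field inside the s_waitcnt immediate: position and width.
Field = namedtuple("Field", ["shift", "width"])

# Assumed layout for illustration only. A different ISA version would
# supply a different table with its own shifts and widths.
ASSUMED_LAYOUT = {
    "vmcnt": Field(shift=0, width=4),
    "expcnt": Field(shift=4, width=3),
    "lgkmcnt": Field(shift=8, width=4),
}


def field_mask(field):
    """Unshifted mask covering `field.width` bits."""
    return (1 << field.width) - 1


def encode_waitcnt(layout, **counters):
    """Pack the named counters into one immediate using `layout`."""
    imm = 0
    for name, value in counters.items():
        field = layout[name]
        if not 0 <= value <= field_mask(field):
            raise ValueError(f"{name}={value} does not fit in {field.width} bits")
        imm |= value << field.shift
    return imm


def decode_waitcnt(layout, imm):
    """Extract every counter described by `layout` from the immediate."""
    return {
        name: (imm >> field.shift) & field_mask(field)
        for name, field in layout.items()
    }


if __name__ == "__main__":
    # Wait for all vector memory operations; leave the other counters at
    # their maximum so they impose no wait.
    imm = encode_waitcnt(ASSUMED_LAYOUT, vmcnt=0, expcnt=7, lgkmcnt=15)
    print(hex(imm), decode_waitcnt(ASSUMED_LAYOUT, imm))
```

Keeping the layout in a table rather than in hard-coded constants is the design point: callers stay the same while the table changes with the ISA version.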
* [AMDGPU] Ask subtarget if waitcnt instruction is needed before barrier instruction | Konstantin Zhuravlyov | 2016-09-30 | 2 | -2/+9
  Differential Revision: https://reviews.llvm.org/D24985
  llvm-svn: 282875
* [AMDGPU] Do not run scalar optimization passes at "-O0"Konstantin Zhuravlyov2016-09-301-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D25055 llvm-svn: 282873
* AMDGPU: Use unsigned compare for eq/neMatt Arsenault2016-09-306-19/+19
| | | | | | | | | | For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one. llvm-svn: 282832
* AMDGPU: Partially fix control flow at -O0Matt Arsenault2016-09-297-21/+426
| | | | | | | | | | | | | | | Fixes to allow spilling all registers at the end of the block work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec. This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch. llvm-svn: 282667
* [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions | Konstantin Zhuravlyov | 2016-09-28 | 2 | -3/+238
  Differential Revision: https://reviews.llvm.org/D24125
  llvm-svn: 282624
* Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." | Nirav Dave | 2016-09-28 | 1 | -0/+10
  This reverts commit r282600 due to test failures with MCJIT
  llvm-svn: 282604
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. | Nirav Dave | 2016-09-28 | 1 | -10/+0

  Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).

  Additional Minor Changes:
  1. Finishes removing unused AliasLoad code
  2. Unifies the chain aggregation in the merged stores across code paths
  3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.

  Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations

  Noteworthy tests:

  CodeGen/AArch64/argument-blocks.ll -
    It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.

  CodeGen/AArch64/arm64-memset-inline.ll -
  CodeGen/AArch64/arm64-stur.ll -
  CodeGen/ARM/memset-inline.ll -
    The backend now generates *worse* code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently.

  CodeGen/AArch64/merge-store.ll -
    Improved, but there still seems to be an extraneous vector insert from an element to itself?

  CodeGen/PowerPC/ppc64-align-long-double.ll -
    Worse code emitted in this case, due to the improved store->load forwarding.

  CodeGen/X86/dag-merge-fast-accesses.ll -
  CodeGen/X86/MergeConsecutiveStores.ll -
  CodeGen/X86/stores-merging.ll -
  CodeGen/Mips/load-store-left-right.ll -
    Restored correct merging of non-aligned stores

  CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
    Improved. Correctly merges buffer_store_dword calls

  CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
    Improved. Sidesteps loading a stored value and merges two stores

  CodeGen/X86/pr18023.ll -
    This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.

  CodeGen/X86/vector-idiv.ll -
  CodeGen/X86/vector-lzcnt-128.ll -
    It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing.

  CodeGen/X86/win32-eh.ll -
    Both loads of the securitycookie are now merged.

  CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
    This test appears to work but no longer exhibits the spill behavior.

  Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
  Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel
  Differential Revision: https://reviews.llvm.org/D14834
  llvm-svn: 282600
* [AMDGPU] Enable changing instprinter's behavior based on the per-function subtarget | Konstantin Zhuravlyov | 2016-09-27 | 3 | -132/+214
  This is a prerequisite for coming waitcnt changes
  Differential Revision: https://reviews.llvm.org/D24939
  llvm-svn: 282489
* AMDGPU/SI: Don't crash on anonymous GlobalValuesTom Stellard2016-09-263-7/+14
| | | | | | | | | | | | | | Summary: We need to call AsmPrinter::getNameWithPrefix() in order to handle anonymous GlobalValues (e.g. @0, @1). Reviewers: arsenm, b-sumner Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D24865 llvm-svn: 282420
* Revert "[AMDGPU] Disassembler: print label names in branch instructions"Sam Kolton2016-09-263-156/+66
| | | | | | This reverts commit 6c6dbe625263ec9fcf8de0df27263cf147cde550. llvm-svn: 282396
* [AMDGPU] Disassembler: print label names in branch instructionsSam Kolton2016-09-263-66/+156
| | | | | | | | | | | | Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table. Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D24802 llvm-svn: 282394
* [AMDGPU] Fix for bz30427: wrong MTBUF encoding on VIValery Pykhtin2016-09-231-6/+10
| | | | | | Differential revision: https://reviews.llvm.org/D24875 llvm-svn: 282296
* [AMDGPU] Refactor VOP1 and VOP2 instruction TD definitionsValery Pykhtin2016-09-2311-1691/+1379
| | | | | | Differential revision: https://reviews.llvm.org/D24738 llvm-svn: 282234
* AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_sizeTom Stellard2016-09-233-1/+25
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D24835 llvm-svn: 282223
* [AMDGPU][mc] Add support for absolute expressions in DPP modifiers.Artem Tamazov2016-09-221-35/+22
| | | | | | | | | Also added range checking for DPP attributes. Assembler tests added as well. Differential Revision: https://reviews.llvm.org/D24755 llvm-svn: 282145
* [AMDGPU][mc] Add support for ds_add_[rtn_]f32.Artem Tamazov2016-09-211-0/+5
| | | | | | | | | Lit tests added. Resolves https://github.com/RadeonOpenCompute/hcc/issues/122. Differential Revision: https://reviews.llvm.org/D24765 llvm-svn: 282086
* GlobalISel: pass Function to lowerFormalArguments directly (NFC).Tim Northover2016-09-212-7/+5
| | | | | | | | The only implementation that exists immediately looks it up anyway, and the information is needed to handle various parameter attributes (stored on the function itself). llvm-svn: 282068
* [AMDGPU] Assembler: remove unused AMDGPUMCObjectWriter.Sam Kolton2016-09-211-25/+0
| | | | | | | | | | | | Summary: It is replaced by AMDGPUELFObjectWriter Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl Differential Revision: https://reviews.llvm.org/D24654 llvm-svn: 282065
* [AMDGPU] Refactor VOP3 instruction TD definitionsValery Pykhtin2016-09-206-373/+448
| | | | | | Differential revision: https://reviews.llvm.org/D24664 llvm-svn: 281965
* [AMDGPU] Refactor VOPC instruction TD definitionsValery Pykhtin2016-09-196-648/+1118
| | | | | | Differential Revision: https://reviews.llvm.org/D24546 llvm-svn: 281903
* [AMDGPU] Fix s_branch with -1 offsetSam Kolton2016-09-191-5/+2
  Summary: When the target of an s_branch instruction is the instruction itself, the backend should emit offset -1, but it emits 0 instead.

  '''
  label:
  s_branch label // should emit [0xff,0xff,0x82,0xbf]
  '''

  Tom, Matt: why are we adjusting fixup values in applyFixup() method instead of processFixup()? processFixup() is calling adjustFixupValue() but does nothing with its result.

  Reviewers: vpykhtin, artem.tamazov, tstellarAMD
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl
  Differential Revision: https://reviews.llvm.org/D24671
  llvm-svn: 281896
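
The expected bytes in the message above can be reproduced with a small Python sketch. It rests on two assumptions that the commit message does not state: the 16-bit offset counts 4-byte words relative to the instruction following the branch, and `SOPP_S_BRANCH_BASE` (a name invented here) is the fixed upper half of the s_branch encoding, inferred from the `0x82,0xbf` bytes quoted in the commit.

```python
import struct

# Assumed fixed bits of an s_branch instruction word, inferred from the
# bytes [.., .., 0x82, 0xbf] quoted in the commit message.
SOPP_S_BRANCH_BASE = 0xBF820000


def branch_simm16(branch_addr, target_addr):
    """Return the 16-bit offset field for a branch at `branch_addr`.

    Assumption: the offset is measured in 4-byte words from the address of
    the instruction after the branch (branch_addr + 4).
    """
    delta = target_addr - (branch_addr + 4)
    if delta % 4 != 0:
        raise ValueError("target is not 4-byte aligned relative to the branch")
    words = delta // 4
    if not -0x8000 <= words <= 0x7FFF:
        raise ValueError("branch target out of range for a 16-bit offset")
    return words & 0xFFFF  # two's-complement encoding of the signed offset


def encode_s_branch(branch_addr, target_addr):
    """Return the little-endian bytes of the encoded instruction as a list."""
    word = SOPP_S_BRANCH_BASE | branch_simm16(branch_addr, target_addr)
    return list(struct.pack("<I", word))


if __name__ == "__main__":
    # A branch whose target is itself: the offset is -1 word, not 0.
    print([hex(b) for b in encode_s_branch(0x100, 0x100)])
```

Under these assumptions, an offset of 0 would instead name the instruction after the branch, which is the wrong encoding the commit describes.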
* AMDGPU: Fix broken FrameIndex handlingMatt Arsenault2016-09-175-99/+19
| | | | | | | | | | | | | | | | | We were trying to avoid using a FrameIndex operand in non-pointer operands in a convoluted way, and would break because of using TargetFrameIndex. The TargetFrameIndex should only be used in the case where it makes sense to fold it as part of the addressing mode, otherwise it requires materialization like a normal constant. This wasn't working reliably and failed in the added testcase, hitting the assert when processing the frame index. The TargetFrameIndex was coming from trying to produce an AssertZext limiting the maximum stack size. I'm not sure this was correct to begin with, because it is apparently possible to have a single workitem dispatch that requires all 4G of private memory. llvm-svn: 281824