summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* fix trivial typos, NFCHiroshi Inoue2017-06-271-2/+2
| | | | | | succesor -> successor llvm-svn: 306393
* AMDGPU: M0 operands to spill/restore opcodes are deadNicolai Haehnle2017-06-271-2/+2
| | | | | | | | | | | | | | | | | Summary: With scalar stores, M0 is clobbered and therefore marked as implicitly defined. However, it is also dead. This fixes an assertion when the Greedy Register Allocator decides to optimize a spill/restore pair away again (via tryHintsRecoloring). Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33319 llvm-svn: 306375
* AMDGPU: Setup SP/FP in callee function prolog/epilogMatt Arsenault2017-06-263-2/+78
| | | | llvm-svn: 306312
* AMDGPU/GlobalISel: Mark 32-bit G_SHL as legalTom Stellard2017-06-261-0/+2
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34589 llvm-svn: 306298
* AMDGPU: Whitespace fixesMatt Arsenault2017-06-264-6/+6
| | | | llvm-svn: 306265
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handlingMatt Arsenault2017-06-266-30/+34
| | | | | | | | | | | | | | This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264
* Remove a processFixupValue hack.Rafael Espindola2017-06-242-35/+32
| | | | | | | | | | | The intention of processFixupValue is not to redefine the semantics of MCExpr. It is odd enough that a expression lowers to a PCRel MCExpr or not depending on what it looks like. At least it is a local hack now. I left a fix for anyone trying to figure out what producers should be producing a different expression. llvm-svn: 306200
* Remove redundant argument.Rafael Espindola2017-06-241-2/+2
| | | | llvm-svn: 306189
* Move Value adjustment to applyFixup. NFC.Rafael Espindola2017-06-231-2/+1
| | | | llvm-svn: 306178
* ARM: move some logic from processFixupValue to applyFixup.Rafael Espindola2017-06-231-2/+4
| | | | | | | | | | | | processFixupValue is called on every relaxation iteration. applyFixup is only called once at the very end. applyFixup is then the correct place to do last minute changes and value checks. While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped because of thumb2. We now do it again, but use the thumb2 range. llvm-svn: 306177
* AMDGPU/GlobalISel: Mark 32-bit G_AND as legalTom Stellard2017-06-231-0/+1
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34349 llvm-svn: 306112
* [AMDGPU] Add intrinsics for tbuffer load and store - build error fixDavid Stuttard2017-06-221-2/+1
| | | | | | | Variable was unused in non-debug build (used in assert) causing compile time warning and eventual build failure llvm-svn: 306034
* [AMDGPU] Add intrinsics for tbuffer load and storeDavid Stuttard2017-06-228-121/+535
| | | | | | | | | | | | | | | Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031
* [AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit ↵Sam Kolton2017-06-221-11/+15
| | | | | | | | | | | | | | | | encoding Summary: Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA. There are no real instructions for them, but there are pseudo instructions. Reviewers: arsenm, vpykhtin, cfang Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34403 llvm-svn: 305999
* [AMDGPU] SDWA: add support for GFX9 in peephole passSam Kolton2017-06-226-39/+127
| | | | | | | | | | | | | | | | Summary: Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers. Added several subtarget features for GFX9 SDWA. This diff also contains changes from D34026. Depends D34026 Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34241 llvm-svn: 305986
* [AMDGPU] Add FP_CLASS to the add/setcc combineStanislav Mekhanoshin2017-06-211-1/+3
| | | | | | | | This is one of the nodes which also compile as v_cmp_*. Differential Revision: https://reviews.llvm.org/D34485 llvm-svn: 305970
* Use a MutableArrayRef. NFC.Rafael Espindola2017-06-211-4/+4
| | | | llvm-svn: 305968
* [AMDGPU] Combine add and adde, sub and subeStanislav Mekhanoshin2017-06-212-9/+81
| | | | | | | | | If one of the arguments of adde/sube is zero we can fold another add/sub into it. Differential Revision: https://reviews.llvm.org/D34374 llvm-svn: 305964
* [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setccStanislav Mekhanoshin2017-06-214-0/+59
| | | | | | | | | This simplification allows to avoid generating v_cndmask_b32 to serialize condition code between compare and use. Differential Revision: https://reviews.llvm.org/D34300 llvm-svn: 305962
* [AMDGPU][MC][GFX9] Corrected VOP3P relevant code to fix disassembler failuresDmitry Preobrazhensky2017-06-214-11/+6
| | | | | | | | | | See Bug 33509: https://bugs.llvm.org//show_bug.cgi?id=33509 Reviewers: Sam Kolton, Artem Tamazov, Valery Pykhtin Differential Revision: https://reviews.llvm.org/D34360 llvm-svn: 305923
* [AMDGPU][MC] Corrected V_*QSAD* instructions to check that dest register is ↵Dmitry Preobrazhensky2017-06-213-5/+84
| | | | | | | | | | | | different than any of the src See Bug 33279: https://bugs.llvm.org//show_bug.cgi?id=33279 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D34003 llvm-svn: 305915
* [AMDGPU] SDWA: merge VI and GFX9 pseudo instructionsSam Kolton2017-06-2115-281/+323
| | | | | | | | | | | | Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9. Reviewers: dp, arsenm, vpykhtin Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov Differential Revision: https://reviews.llvm.org/D34026 llvm-svn: 305886
* AMDGPU: Allow vectorization of packed typesMatt Arsenault2017-06-202-8/+20
| | | | llvm-svn: 305844
* [AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32Stanislav Mekhanoshin2017-06-201-0/+2
| | | | | | | | | If there is an immediate operand we shall not shrink V_SUBB_U32 and V_ADDC_U32, it does not fit e32 encoding. Differential Revison: https://reviews.llvm.org/D34291 llvm-svn: 305840
* AMDGPU: Start adding global_* instructionsMatt Arsenault2017-06-206-6/+106
| | | | llvm-svn: 305838
* AMDGPU: Do operand folding in program orderMatt Arsenault2017-06-201-5/+3
| | | | | | | | | Before it was possible to partially fold use instructions before the defs. After the xor is folded into a copy, the same mov can end up in the fold list twice, so on the second attempt it will fail expecting to see a register to fold. llvm-svn: 305821
* AMDGPU: Preserve undef when folding register operandsMatt Arsenault2017-06-201-0/+2
| | | | | | | | If the source was a copy of an undef register, this would produce a read of an undefined register which is a verifier error. llvm-svn: 305816
* [AMDGPU] Eliminate SGPR to VGPR copy when possibleStanislav Mekhanoshin2017-06-201-0/+30
| | | | | | | | SGPRs are generally cheaper, so try to use them over VGPRs. Differential Revision: https://reviews.llvm.org/D34130 llvm-svn: 305815
* AMDGPU: Fix crash with undef vreg input operandMatt Arsenault2017-06-201-1/+1
| | | | llvm-svn: 305814
* AMDGPU: Fix scratch wave offset relative FI expansionMatt Arsenault2017-06-191-9/+20
| | | | | | | | The offset may not be an inline immediate, so this needs to be materialized into a register. The post-RA run of SIShrinkInstructions is able to fold it later if it can. llvm-svn: 305761
* [AMDGPU] Add infer address spaces pass before SROAStanislav Mekhanoshin2017-06-191-0/+8
| | | | | | | | | It adds it for the target after inlining but before SROA where we can get most out of it. Differential Revision: https://reviews.llvm.org/D34366 llvm-svn: 305759
* AMDGPU: Cleanup CreateLiveInRegisterMatt Arsenault2017-06-195-34/+45
| | | | llvm-svn: 305748
* AMDGPU/GlobalISel: Mark G_BITCAST s32 <--> <2 x s16> legalTom Stellard2017-06-191-0/+7
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34129 llvm-svn: 305692
* [AMDGPU] Testing commit access only, no real changeAlfred Huang2017-06-151-1/+1
| | | | llvm-svn: 305523
* DivergencyAnalysis patch for reviewAlexander Timofeev2017-06-153-1/+15
| | | | llvm-svn: 305494
* [AMDGPU] Remove now dead defaultOffsetS13(). NFCI.Davide Italiano2017-06-131-5/+0
| | | | | | Fixes the GCC7 build with -Werror. llvm-svn: 305329
* AMDGPU/GlobalISel: Mark 32-bit G_ADD as legalTom Stellard2017-06-121-0/+2
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D33992 llvm-svn: 305232
* AMDGPU: Don't add same implicit use multiple timesMatt Arsenault2017-06-121-4/+2
| | | | | | | For the last component, the same register use was added as an implicit use and another implicit kill use. llvm-svn: 305205
* AMDGPU: Teach isLegalAddressingMode about flat offsetsMatt Arsenault2017-06-121-3/+11
| | | | | | | Also fix reporting r+r as a valid addressing mode without offsets. llvm-svn: 305203
* AMDGPU: Start selecting flat instruction offsetsMatt Arsenault2017-06-122-18/+42
| | | | llvm-svn: 305201
* AMDGPU: Verify that flat offsets aren't used pre-GFX9Matt Arsenault2017-06-121-2/+11
| | | | | | | For convenience the operand is always present in the instruction, but it isn't valid to use except on GFX9. llvm-svn: 305200
* AMDGPU: Start adding offset fields to flat instructionsMatt Arsenault2017-06-125-25/+94
| | | | llvm-svn: 305194
* Const correctness for TTI::getRegisterBitWidthDaniel Neilson2017-06-122-2/+2
| | | | | | | | | | | | | | Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation. Reviewers: chandlerc, rnk, reames Reviewed By: reames Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D33903 llvm-svn: 305189
* AMDGPU : Fix ISA Version Definitions.Wei Ding2017-06-104-27/+99
| | | | | | Differential Revision: http://reviews.llvm.org/D28531 llvm-svn: 305137
* [AMDGPU] Add intrinsics for alignbit and alignbyte instructionsStanislav Mekhanoshin2017-06-091-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D34046 llvm-svn: 305098
* [AMDGPU] Fix for issue in alloca to vector promotion passDavid Stuttard2017-06-091-6/+12
| | | | | | | | | | | | | | | Summary: Alloca promotion pass not dealing with non-canonical input Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical) Also added some test cases in non-canonical form to check that it no longer crashes Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31710 llvm-svn: 305079
* AMDGPU: Work around build special casing .inc filesMatt Arsenault2017-06-083-1/+7
| | | | | | | It complains because it assumes these were autogenerated files in the source directory. llvm-svn: 305005
* AMDGPU: Use correct register names in inline assemblyMatt Arsenault2017-06-083-0/+410
| | | | | | Fixes using physical registers in inline asm from clang. llvm-svn: 305004
* [AMDGPU] Force qsads instrs to use different dest register than source registersMark Searles2017-06-081-0/+5
| | | | | | | | The V_MQSAD_PK_U16_U8, V_QSAD_PK_U16_U8, and V_MQSAD_U32_U8 take more than 1 pass in hardware. For these three instructions, the destination registers must be different than all sources, so that the first pass does not overwrite sources for the following passes. Differential Revision: https://reviews.llvm.org/D33783 llvm-svn: 304998
* [AMDGPU][MC] Corrected error message for s_waitcnt helpersDmitry Preobrazhensky2017-06-071-12/+16
| | | | | | | | | | See Bug 32711: https://bugs.llvm.org//show_bug.cgi?id=32711 Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D33781 llvm-svn: 304922
OpenPOWER on IntegriCloud