summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* Extend memcpy expansion in Transform/Utils to handle wider operand types.Sean Fertile2017-07-071-2/+11
| | | | | | | | | | | Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and matches the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 llvm-svn: 307346
* AMDGPU: Add macro fusion schedule DAG mutationMatt Arsenault2017-07-064-0/+86
| | | | | | Try to increase opportunities to shrink vcc uses. llvm-svn: 307313
* AMDGPU: Minor cleanup of shrinking logicMatt Arsenault2017-07-061-8/+4
| | | | llvm-svn: 307312
* [AMDGPU] Always use rcp + mul with fast mathStanislav Mekhanoshin2017-07-062-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast. An fdiv instruction can also have fast math flag either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both metadata and fast flag: %div = fdiv fast float %v, %0, !fpmath !12 !12 = !{float 2.500000e+00} Current implementation ignores fast flag and favors metadata. An instruction with just fast flag would be lowered to a fastest rcp + mul, but that never happen on practice because of described mutual clang and BE behavior. This change allows an "fdiv fast" to be always lowered as rcp + mul. Differential Revision: https://reviews.llvm.org/D34844 llvm-svn: 307308
* [Constants] If we already have a ConstantInt*, prefer to use ↵Craig Topper2017-07-061-1/+1
| | | | | | | | isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne. llvm-svn: 307292
* [AMDGPU] Move GISel accessor initialization from TargetMachine to Subtarget.Quentin Colombet2017-07-052-48/+50
| | | | | | NFC llvm-svn: 307186
* [AMDGPU] Switch scalarize global loads ON by defaultAlexander Timofeev2017-07-041-1/+1
| | | | | | Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097
* [AMDGPU] Fix latency of MIMG instructionsMarek Olsak2017-07-041-0/+1
| | | | | | Patch by cwabbott (Connor Abbott). llvm-svn: 307081
* Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"NAKAMURA Takumi2017-07-041-1/+1
| | | | | | | | | It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054
* [AMDGPU] Switch scalarize global loads ON by defaultAlexander Timofeev2017-07-031-1/+1
| | | | | | Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026
* AMDGPU: Add operand target flags serializationMatt Arsenault2017-07-022-0/+26
| | | | llvm-svn: 306995
* fix trivial typos; NFCHiroshi Inoue2017-07-021-1/+1
| | | | | | suport -> support llvm-svn: 306968
* AMDGPU: Remove SITypeRewriterMatt Arsenault2017-06-284-159/+0
| | | | | | | This was an old workaround for using v16i8 in some old intrinsics for resource descriptors. llvm-svn: 306603
* [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.Geoff Berry2017-06-282-2/+3
| | | | | | | | | | Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554
* [AMDGPU] Add pattern for v_alignbit_b32 with immediateStanislav Mekhanoshin2017-06-282-3/+6
| | | | | | | | If immediate in shift is less than 32 we can use alignbit too. Differential Revision: https://reviews.llvm.org/D34729 llvm-svn: 306500
* [AMDGPU] Add 2 new alignbit patternsStanislav Mekhanoshin2017-06-271-0/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D34655 llvm-svn: 306449
* [AMDGPU] Simplify setcc (sext from i1 b), -1|0, ccStanislav Mekhanoshin2017-06-271-1/+29
| | | | | | | | | | | Depending on the compare code that can be either an argument of sext or negate of it. This helps to avoid v_cndmask_b64 instruction for sext. A reversed value can be further simplified and folded into its parent comparison if possible. Differential Revision: https://reviews.llvm.org/D34545 llvm-svn: 306446
* [AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0Stanislav Mekhanoshin2017-06-271-2/+28
| | | | | | | | | | Also factored out function to check if a boolean is an already deserialized value which does not require v_cndmask_b32 to be loaded. Added binary logical operators to its check. Differential Revision: https://reviews.llvm.org/D34500 llvm-svn: 306439
* [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructionsSam Kolton2017-06-276-33/+43
| | | | | | | | | | | | | | Summary: 1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it. 2. There were several problems with support of VOPC instructions in SDWA peephole pass. Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye Differential Revision: https://reviews.llvm.org/D34626 llvm-svn: 306413
* fix trivial typos, NFCHiroshi Inoue2017-06-271-2/+2
| | | | | | succesor -> successor llvm-svn: 306393
* AMDGPU: M0 operands to spill/restore opcodes are deadNicolai Haehnle2017-06-271-2/+2
| | | | | | | | | | | | | | | | | Summary: With scalar stores, M0 is clobbered and therefore marked as implicitly defined. However, it is also dead. This fixes an assertion when the Greedy Register Allocator decides to optimize a spill/restore pair away again (via tryHintsRecoloring). Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33319 llvm-svn: 306375
* AMDGPU: Setup SP/FP in callee function prolog/epilogMatt Arsenault2017-06-263-2/+78
| | | | llvm-svn: 306312
* AMDGPU/GlobalISel: Mark 32-bit G_SHL as legalTom Stellard2017-06-261-0/+2
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34589 llvm-svn: 306298
* AMDGPU: Whitespace fixesMatt Arsenault2017-06-264-6/+6
| | | | llvm-svn: 306265
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handlingMatt Arsenault2017-06-266-30/+34
| | | | | | | | | | | | | | This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264
* Remove a processFixupValue hack.Rafael Espindola2017-06-242-35/+32
| | | | | | | | | | | The intention of processFixupValue is not to redefine the semantics of MCExpr. It is odd enough that a expression lowers to a PCRel MCExpr or not depending on what it looks like. At least it is a local hack now. I left a fix for anyone trying to figure out what producers should be producing a different expression. llvm-svn: 306200
* Remove redundant argument.Rafael Espindola2017-06-241-2/+2
| | | | llvm-svn: 306189
* Move Value adjustment to applyFixup. NFC.Rafael Espindola2017-06-231-2/+1
| | | | llvm-svn: 306178
* ARM: move some logic from processFixupValue to applyFixup.Rafael Espindola2017-06-231-2/+4
| | | | | | | | | | | | processFixupValue is called on every relaxation iteration. applyFixup is only called once at the very end. applyFixup is then the correct place to do last minute changes and value checks. While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped because of thumb2. We now do it again, but use the thumb2 range. llvm-svn: 306177
* AMDGPU/GlobalISel: Mark 32-bit G_AND as legalTom Stellard2017-06-231-0/+1
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34349 llvm-svn: 306112
* [AMDGPU] Add intrinsics for tbuffer load and store - build error fixDavid Stuttard2017-06-221-2/+1
| | | | | | | Variable was unused in non-debug build (used in assert) causing compile time warning and eventual build failure llvm-svn: 306034
* [AMDGPU] Add intrinsics for tbuffer load and storeDavid Stuttard2017-06-228-121/+535
| | | | | | | | | | | | | | | Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031
* [AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit ↵Sam Kolton2017-06-221-11/+15
| | | | | | | | | | | | | | | | encoding Summary: Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA. There are no real instructions for them, but there are pseudo instructions. Reviewers: arsenm, vpykhtin, cfang Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34403 llvm-svn: 305999
* [AMDGPU] SDWA: add support for GFX9 in peephole passSam Kolton2017-06-226-39/+127
| | | | | | | | | | | | | | | | Summary: Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers. Added several subtarget features for GFX9 SDWA. This diff also contains changes from D34026. Depends D34026 Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34241 llvm-svn: 305986
* [AMDGPU] Add FP_CLASS to the add/setcc combineStanislav Mekhanoshin2017-06-211-1/+3
| | | | | | | | This is one of the nodes which also compile as v_cmp_*. Differential Revision: https://reviews.llvm.org/D34485 llvm-svn: 305970
* Use a MutableArrayRef. NFC.Rafael Espindola2017-06-211-4/+4
| | | | llvm-svn: 305968
* [AMDGPU] Combine add and adde, sub and subeStanislav Mekhanoshin2017-06-212-9/+81
| | | | | | | | | If one of the arguments of adde/sube is zero we can fold another add/sub into it. Differential Revision: https://reviews.llvm.org/D34374 llvm-svn: 305964
* [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setccStanislav Mekhanoshin2017-06-214-0/+59
| | | | | | | | | This simplification allows to avoid generating v_cndmask_b32 to serialize condition code between compare and use. Differential Revision: https://reviews.llvm.org/D34300 llvm-svn: 305962
* [AMDGPU][MC][GFX9] Corrected VOP3P relevant code to fix disassembler failuresDmitry Preobrazhensky2017-06-214-11/+6
| | | | | | | | | | See Bug 33509: https://bugs.llvm.org//show_bug.cgi?id=33509 Reviewers: Sam Kolton, Artem Tamazov, Valery Pykhtin Differential Revision: https://reviews.llvm.org/D34360 llvm-svn: 305923
* [AMDGPU][MC] Corrected V_*QSAD* instructions to check that dest register is ↵Dmitry Preobrazhensky2017-06-213-5/+84
| | | | | | | | | | | | different than any of the src See Bug 33279: https://bugs.llvm.org//show_bug.cgi?id=33279 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D34003 llvm-svn: 305915
* [AMDGPU] SDWA: merge VI and GFX9 pseudo instructionsSam Kolton2017-06-2115-281/+323
| | | | | | | | | | | | Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9. Reviewers: dp, arsenm, vpykhtin Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, artem.tamazov Differential Revision: https://reviews.llvm.org/D34026 llvm-svn: 305886
* AMDGPU: Allow vectorization of packed typesMatt Arsenault2017-06-202-8/+20
| | | | llvm-svn: 305844
* [AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32Stanislav Mekhanoshin2017-06-201-0/+2
| | | | | | | | | If there is an immediate operand we shall not shrink V_SUBB_U32 and V_ADDC_U32, it does not fit e32 encoding. Differential Revison: https://reviews.llvm.org/D34291 llvm-svn: 305840
* AMDGPU: Start adding global_* instructionsMatt Arsenault2017-06-206-6/+106
| | | | llvm-svn: 305838
* AMDGPU: Do operand folding in program orderMatt Arsenault2017-06-201-5/+3
| | | | | | | | | Before it was possible to partially fold use instructions before the defs. After the xor is folded into a copy, the same mov can end up in the fold list twice, so on the second attempt it will fail expecting to see a register to fold. llvm-svn: 305821
* AMDGPU: Preserve undef when folding register operandsMatt Arsenault2017-06-201-0/+2
| | | | | | | | If the source was a copy of an undef register, this would produce a read of an undefined register which is a verifier error. llvm-svn: 305816
* [AMDGPU] Eliminate SGPR to VGPR copy when possibleStanislav Mekhanoshin2017-06-201-0/+30
| | | | | | | | SGPRs are generally cheaper, so try to use them over VGPRs. Differential Revision: https://reviews.llvm.org/D34130 llvm-svn: 305815
* AMDGPU: Fix crash with undef vreg input operandMatt Arsenault2017-06-201-1/+1
| | | | llvm-svn: 305814
* AMDGPU: Fix scratch wave offset relative FI expansionMatt Arsenault2017-06-191-9/+20
| | | | | | | | The offset may not be an inline immediate, so this needs to be materialized into a register. The post-RA run of SIShrinkInstructions is able to fold it later if it can. llvm-svn: 305761
* [AMDGPU] Add infer address spaces pass before SROAStanislav Mekhanoshin2017-06-191-0/+8
| | | | | | | | | It adds it for the target after inlining but before SROA where we can get most out of it. Differential Revision: https://reviews.llvm.org/D34366 llvm-svn: 305759
OpenPOWER on IntegriCloud