summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* Extend memcpy expansion in Transform/Utils to handle wider operand types.Sean Fertile2017-07-071-0/+12
| | | | | | | | | | | Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and matches the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 llvm-svn: 307346
* AMDGPU: Add macro fusion schedule DAG mutationMatt Arsenault2017-07-0614-305/+422
| | | | | | Try to increase opportunities to shrink vcc uses. llvm-svn: 307313
* AMDGPU: Remove unnecessary IR from MIR testsMatt Arsenault2017-07-067-358/+113
| | | | llvm-svn: 307311
* [AMDGPU] Always use rcp + mul with fast mathStanislav Mekhanoshin2017-07-062-43/+32
| | | | | | | | | | | | | | | | | | | | | | | | | Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast. An fdiv instruction can also have fast math flag either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both metadata and fast flag: %div = fdiv fast float %v, %0, !fpmath !12 !12 = !{float 2.500000e+00} Current implementation ignores fast flag and favors metadata. An instruction with just fast flag would be lowered to a fastest rcp + mul, but that never happen on practice because of described mutual clang and BE behavior. This change allows an "fdiv fast" to be always lowered as rcp + mul. Differential Revision: https://reviews.llvm.org/D34844 llvm-svn: 307308
* [RegisterCoalescer] Fix for SubRange join unreachableDavid Stuttard2017-07-061-0/+162
| | | | | | | | | | | | | | | | Summary: During remat, some subranges might end up having invalid segments which caused problems for later coalescing. Added in a check to remove segments that are invalidated as part of the remat. See http://llvm.org/PR33524 Subscribers: MatzeB, qcolombet Differential Revision: https://reviews.llvm.org/D34391 llvm-svn: 307247
* [AMDGPU] Switch scalarize global loads ON by defaultAlexander Timofeev2017-07-04141-559/+789
| | | | | | Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097
* Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"NAKAMURA Takumi2017-07-04140-788/+558
| | | | | | | | | It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054
* [AMDGPU] Switch scalarize global loads ON by defaultAlexander Timofeev2017-07-03140-558/+788
| | | | | | Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026
* Fix ODR violations due to abuse of LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTORRichard Smith2017-06-301-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | This is a short-term fix for PR33650 aimed to get the modules build bots green again. Remove all the places where we use the LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR macros to try to locally specialize a global template for a global type. That's not how C++ works. Instead, we now centrally define how to format vectors of fundamental types and of string (std::string and StringRef). We use flow formatting for the former cases, since that's the obvious right thing to do; in the latter case, it's less clear what the right choice is, but flow formatting is really bad for some cases (due to very long strings), so we pick block formatting. (Many of the cases that were using flow formatting for strings are improved by this change.) Other than the flow -> block formatting change for some vectors of strings, this should result in no functionality change. Differential Revision: https://reviews.llvm.org/D34907 Corresponding updates to clang, clang-tools-extra, and lld to follow. llvm-svn: 306878
* AMDGPU: Remove SITypeRewriterMatt Arsenault2017-06-2813-405/+401
| | | | | | | This was an old workaround for using v16i8 in some old intrinsics for resource descriptors. llvm-svn: 306603
* Fold fneg and fabs like multiplicationsStanislav Mekhanoshin2017-06-281-0/+37
| | | | | | | | | | | Given no NaNs and no signed zeroes it folds: (fmul X, (select (fcmp X > 0.0), -1.0, 1.0)) -> (fneg (fabs X)) (fmul X, (select (fcmp X > 0.0), 1.0, -1.0)) -> (fabs X) Differential Revision: https://reviews.llvm.org/D34579 llvm-svn: 306592
* [AMDGPU] Add pattern for v_alignbit_b32 with immediateStanislav Mekhanoshin2017-06-283-16/+42
| | | | | | | | If immediate in shift is less than 32 we can use alignbit too. Differential Revision: https://reviews.llvm.org/D34729 llvm-svn: 306500
* Allow to truncate left shift with non-constant shift amountStanislav Mekhanoshin2017-06-282-69/+74
| | | | | | | | | | | That is pretty common for clang to produce code like (shl %x, (and %amt, 31)). In this situation we can still perform trunc (shl) into shl (trunc) conversion given the known value range of shift amount. Differential Revision: https://reviews.llvm.org/D34723 llvm-svn: 306499
* [AMDGPU] Add 2 new alignbit patternsStanislav Mekhanoshin2017-06-271-0/+142
| | | | | | Differential Revision: https://reviews.llvm.org/D34655 llvm-svn: 306449
* [AMDGPU] Simplify setcc (sext from i1 b), -1|0, ccStanislav Mekhanoshin2017-06-271-0/+292
| | | | | | | | | | | Depending on the compare code that can be either an argument of sext or negate of it. This helps to avoid v_cndmask_b64 instruction for sext. A reversed value can be further simplified and folded into its parent comparison if possible. Differential Revision: https://reviews.llvm.org/D34545 llvm-svn: 306446
* RenameIndependentSubregs: Fix infinite loopMatt Arsenault2017-06-271-0/+19
| | | | | | | | Apparently this replacement can really be substituting the same as the original register. Avoid restarting the loop when there's been no change in the register uses. llvm-svn: 306441
* [AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0Stanislav Mekhanoshin2017-06-272-0/+47
| | | | | | | | | | Also factored out function to check if a boolean is an already deserialized value which does not require v_cndmask_b32 to be loaded. Added binary logical operators to its check. Differential Revision: https://reviews.llvm.org/D34500 llvm-svn: 306439
* [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructionsSam Kolton2017-06-272-1/+447
| | | | | | | | | | | | | | Summary: 1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it. 2. There were several problems with support of VOPC instructions in SDWA peephole pass. Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye Differential Revision: https://reviews.llvm.org/D34626 llvm-svn: 306413
* AMDGPU: M0 operands to spill/restore opcodes are deadNicolai Haehnle2017-06-271-0/+22
| | | | | | | | | | | | | | | | | Summary: With scalar stores, M0 is clobbered and therefore marked as implicitly defined. However, it is also dead. This fixes an assertion when the Greedy Register Allocator decides to optimize a spill/restore pair away again (via tryHintsRecoloring). Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33319 llvm-svn: 306375
* ScheduleDAGInstrs: Fix fixupKills() adding too many kill flags.Matthias Braun2017-06-271-0/+45
| | | | | | | | | Remove invalid shortcut in fixupKills(): A register needs to be marked live even when we are not adding a kill flag. This is because a partially live register must not get a kill flags, but it still needs to be fully marked live when walking backwards. llvm-svn: 306352
* RenameIndependentSubregs: Fix iterator problemMatt Arsenault2017-06-261-0/+67
| | | | | | | | | | Fixes bug 33597. Use of substituteRegister in the tied operand case messes up the register use iterator, causing some uses to be left unprocessed. llvm-svn: 306333
* AMDGPU: Setup SP/FP in callee function prolog/epilogMatt Arsenault2017-06-262-3/+32
| | | | llvm-svn: 306312
* AMDGPU/GlobalISel: Mark 32-bit G_SHL as legalTom Stellard2017-06-261-0/+18
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34589 llvm-svn: 306298
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handlingMatt Arsenault2017-06-262-0/+59
| | | | | | | | | | | | | | This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264
* Add missing %s to RUN line.Rafael Espindola2017-06-241-1/+1
| | | | llvm-svn: 306199
* Test the object file creation too.Rafael Espindola2017-06-241-0/+10
| | | | | | | This should *really* be a llvm-mc test, but the parser is broken. See PR33579 for the parser bug. llvm-svn: 306198
* AMDGPU/GlobalISel: Mark 32-bit G_AND as legalTom Stellard2017-06-231-0/+22
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34349 llvm-svn: 306112
* [AMDGPU] Add intrinsics for tbuffer load and storeDavid Stuttard2017-06-228-25/+275
| | | | | | | | | | | | | | | Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031
* [AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit ↵Sam Kolton2017-06-221-0/+61
| | | | | | | | | | | | | | | | encoding Summary: Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA. There are no real instructions for them, but there are pseudo instructions. Reviewers: arsenm, vpykhtin, cfang Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34403 llvm-svn: 305999
* [AMDGPU] SDWA: add support for GFX9 in peephole passSam Kolton2017-06-226-77/+220
| | | | | | | | | | | | | | | | Summary: Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers. Added several subtarget features for GFX9 SDWA. This diff also contains changes from D34026. Depends D34026 Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34241 llvm-svn: 305986
* [AMDGPU] Add FP_CLASS to the add/setcc combineStanislav Mekhanoshin2017-06-211-0/+36
| | | | | | | | This is one of the nodes which also compile as v_cmp_*. Differential Revision: https://reviews.llvm.org/D34485 llvm-svn: 305970
* [AMDGPU] Combine add and adde, sub and subeStanislav Mekhanoshin2017-06-211-0/+80
| | | | | | | | | If one of the arguments of adde/sube is zero we can fold another add/sub into it. Differential Revision: https://reviews.llvm.org/D34374 llvm-svn: 305964
* [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setccStanislav Mekhanoshin2017-06-211-0/+43
| | | | | | | | | This simplification allows to avoid generating v_cndmask_b32 to serialize condition code between compare and use. Differential Revision: https://reviews.llvm.org/D34300 llvm-svn: 305962
* [AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32Stanislav Mekhanoshin2017-06-201-0/+101
| | | | | | | | | If there is an immediate operand we shall not shrink V_SUBB_U32 and V_ADDC_U32, it does not fit e32 encoding. Differential Revison: https://reviews.llvm.org/D34291 llvm-svn: 305840
* AMDGPU: Do operand folding in program orderMatt Arsenault2017-06-201-0/+47
| | | | | | | | | Before it was possible to partially fold use instructions before the defs. After the xor is folded into a copy, the same mov can end up in the fold list twice, so on the second attempt it will fail expecting to see a register to fold. llvm-svn: 305821
* RegisterScavenging: Followup to r305625Matthias Braun2017-06-202-21/+21
| | | | | | | | | | | | | This does some improvements/cleanup to the recently introduced scavengeRegisterBackwards() functionality: - Rewrite findSurvivorBackwards algorithm to use the existing LiveRegUnit::accumulateBackward() code. This also avoids the Available and Candidates bitset and just need 1 LiveRegUnit instance (= 1 bitset). - Pick registers in allocation order instead of register number order. llvm-svn: 305817
* AMDGPU: Preserve undef when folding register operandsMatt Arsenault2017-06-201-0/+6
| | | | | | | | If the source was a copy of an undef register, this would produce a read of an undefined register which is a verifier error. llvm-svn: 305816
* [AMDGPU] Eliminate SGPR to VGPR copy when possibleStanislav Mekhanoshin2017-06-206-8/+349
| | | | | | | | SGPRs are generally cheaper, so try to use them over VGPRs. Differential Revision: https://reviews.llvm.org/D34130 llvm-svn: 305815
* AMDGPU: Fix crash with undef vreg input operandMatt Arsenault2017-06-201-0/+21
| | | | llvm-svn: 305814
* AMDGPU: Fix scratch wave offset relative FI expansionMatt Arsenault2017-06-191-7/+47
| | | | | | | | The offset may not be an inline immediate, so this needs to be materialized into a register. The post-RA run of SIShrinkInstructions is able to fold it later if it can. llvm-svn: 305761
* [AMDGPU] Add infer address spaces pass before SROAStanislav Mekhanoshin2017-06-191-0/+10
| | | | | | | | | It adds it for the target after inlining but before SROA where we can get most out of it. Differential Revision: https://reviews.llvm.org/D34366 llvm-svn: 305759
* AMDGPU/GlobalISel: Mark G_BITCAST s32 <--> <2 x s16> legalTom Stellard2017-06-191-0/+23
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34129 llvm-svn: 305692
* RegScavenging: Add scavengeRegisterBackwards()Matthias Braun2017-06-173-45/+50
| | | | | | | | | | | | | | | | | | | Re-apply r276044/r279124/r305516. Fixed a problem where we would refuse to place spills as the very first instruciton of a basic block and thus artifically increase pressure (test in test/CodeGen/PowerPC/scavenging.mir:spill_at_begin) This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags. This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode. Differential Revision: http://reviews.llvm.org/D21885 llvm-svn: 305625
* Revert "RegScavenging: Add scavengeRegisterBackwards()"Matthias Braun2017-06-163-50/+45
| | | | | | | | | Revert because of reports of some PPC input starting to spill when it was predicted that it wouldn't and no spillslot was reserved. This reverts commit r305516. llvm-svn: 305566
* RegScavenging: Add scavengeRegisterBackwards()Matthias Braun2017-06-153-45/+50
| | | | | | | | | | | | | | | | | | Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64 problems reported in the stage2 build last time, which I cannot reproduce right now. This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags. This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode. Differential Revision: http://reviews.llvm.org/D21885 llvm-svn: 305516
* DivergencyAnalysis patch for reviewAlexander Timofeev2017-06-152-0/+54
| | | | llvm-svn: 305494
* AMDGPU/GlobalISel: Mark 32-bit G_ADD as legalTom Stellard2017-06-121-0/+22
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D33992 llvm-svn: 305232
* AMDGPU: Teach isLegalAddressingMode about flat offsetsMatt Arsenault2017-06-121-7/+116
| | | | | | | Also fix reporting r+r as a valid addressing mode without offsets. llvm-svn: 305203
* AMDGPU: Start selecting flat instruction offsetsMatt Arsenault2017-06-123-54/+174
| | | | llvm-svn: 305201
* AMDGPU: Start adding offset fields to flat instructionsMatt Arsenault2017-06-128-66/+66
| | | | llvm-svn: 305194
OpenPOWER on IntegriCloud