path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
* Merging r340959: | Hans Wennborg | 2018-09-04 | 1 | -14/+25
| | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r340959 | mareko | 2018-08-29 22:03:00 +0200 (Wed, 29 Aug 2018) | 9 lines AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes Summary: This fixes GPU hangs with OpenGL bindless handle arithmetic. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51203 ------------------------------------------------------------------------ llvm-svn: 341351
* Merging r340417: | Hans Wennborg | 2018-08-30 | 1 | -0/+12
| | | | | | | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r340417 | hakzsam | 2018-08-22 18:08:48 +0200 (Wed, 22 Aug 2018) | 14 lines AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range". v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS v4: - fix compilation issues - fix out of bounds access v3: use static_assert() v2: add a very simple test for 32-bit addr space Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630 ------------------------------------------------------------------------ llvm-svn: 341041
* Merging r340416: | Hans Wennborg | 2018-08-30 | 1 | -0/+12
| | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r340416 | hakzsam | 2018-08-22 18:08:43 +0200 (Wed, 22 Aug 2018) | 8 lines AMDGPU: fix existing alias rules for constant and global Constant and global may alias, also one rules table wasn't ordered correctly. Pinpointed by Matt. v2: add a test with swapped parameters ------------------------------------------------------------------------ llvm-svn: 341040
* Merging r339600: | Hans Wennborg | 2018-08-14 | 1 | -0/+30
| | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r339600 | scott.linder | 2018-08-13 20:44:21 +0200 (Mon, 13 Aug 2018) | 8 lines [CodeGen] Fix assert in SelectionDAG::computeKnownBits Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR when zero extending the demanded elements mask if it is already as long as the source vector. Differential Revision: https://reviews.llvm.org/D49574 ------------------------------------------------------------------------ llvm-svn: 339664
* Merging r339190: | Hans Wennborg | 2018-08-08 | 1 | -3/+90
| | | | | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r339190 | jvesely | 2018-08-07 23:54:37 +0200 (Tue, 07 Aug 2018) | 12 lines AMDGPU: Remove broken i16 ternary patterns Fixup test to check for GCN prefix These patterns always zero extend the result even though it might need sign extension. This has been broken since the addition of i16 support. It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence: v_mad_i16 v2, v10, v9, v11 v_med3_i32 v2, v2, v8, v7 Fixes mad_sat(char) piglit on VI. Differential Revision: https://reviews.llvm.org/D49836 ------------------------------------------------------------------------ llvm-svn: 339235
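The failure mode described in r339190 can be reproduced with a small model. This is a sketch only: the helper names are hypothetical, and it assumes the 16-bit multiply-add leaves its result zero-extended in a 32-bit register, as the commit message states, before a 32-bit signed median clamps it to the `char` range.

```python
def mad_i16_zext(a, b, c):
    """Model a 16-bit multiply-add whose result is zero-extended to 32 bits."""
    return (a * b + c) & 0xFFFF


def sext16(value):
    """Reinterpret the low 16 bits of value as a signed 16-bit integer."""
    return value - 0x10000 if value & 0x8000 else value


def med3_i32(a, b, c):
    """Median of three signed values, standing in for v_med3_i32."""
    return sorted((a, b, c))[1]


raw = mad_i16_zext(-10, 20, 0)            # -200 stored as 0xFF38
broken = med3_i32(raw, -128, 127)         # clamps the zero-extended value
fixed = med3_i32(sext16(raw), -128, 127)  # clamps the sign-extended value
print(raw, broken, fixed)
```

With the zero-extended input the clamp saturates to the upper bound, whereas sign extension yields the lower bound expected for a negative product.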
* Merging r338610: | Hans Wennborg | 2018-08-07 | 2 | -18/+10
| | | | | | | | | | | | ------------------------------------------------------------------------ r338610 | jvesely | 2018-08-01 20:36:07 +0200 (Wed, 01 Aug 2018) | 3 lines AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8) ------------------------------------------------------------------------ llvm-svn: 339105
* [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero | Ryan Taylor | 2018-08-01 | 1 | -0/+113
| | | | | | | | | | | | | | | Summary: Add _L to _LZ image intrinsic table mapping to table gen. In ISelLowering check if image intrinsic has lod and if it's equal to zero, if so remove lod and change opcode to equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523
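The rewrite described above can be sketched as a table lookup followed by dropping the lod operand. Everything here is illustrative: the table contents, the `fold_zero_lod` helper, and the representation of operands as a Python list are assumptions, not the TableGen mapping or the ISelLowering code from the patch.

```python
# Hypothetical excerpt of an _L -> _LZ opcode mapping.
L_TO_LZ = {
    "image_sample_l": "image_sample_lz",
    "image_gather4_l": "image_gather4_lz",
}


def fold_zero_lod(opcode, operands, lod_index):
    """Return (opcode, operands) with a constant-zero lod folded away.

    Non-constant operands are modelled as strings, so they never
    compare equal to 0.0 and the instruction is left unchanged.
    """
    lz_opcode = L_TO_LZ.get(opcode)
    if lz_opcode is None or operands[lod_index] != 0.0:
        return opcode, list(operands)
    return lz_opcode, operands[:lod_index] + operands[lod_index + 1:]


print(fold_zero_lod("image_sample_l", ["s", "t", 0.0], 2))
print(fold_zero_lod("image_sample_l", ["s", "t", "v3"], 2))
```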
* AMDGPU: Add clamp bit to dot intrinsics | Konstantin Zhuravlyov | 2018-08-01 | 7 | -35/+155
| | | | | | Differential Revision: https://reviews.llvm.org/D49874 llvm-svn: 338470
* AMDGPU: Split amdgcn/r600 fminnum/fmaxnum tests | Matt Arsenault | 2018-07-31 | 4 | -443/+667
| | | | | | | R600 breaks on too many things to usefully test changes with ieee_mode on vs. off. llvm-svn: 338435
* AMDGPU: Break 64-bit arguments into 32-bit pieces | Matt Arsenault | 2018-07-31 | 1 | -7/+43
| | | | llvm-svn: 338421
* AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls | Matt Arsenault | 2018-07-31 | 3 | -13/+71
| | | | | | | This improves code for the same reasons as scalarizing 32-bit element vectors. llvm-svn: 338418
* AMDGPU: Scalarize vector argument types to calls | Matt Arsenault | 2018-07-31 | 3 | -32/+71
| | | | | | | | | | | | | | | | | When lowering calling conventions, prefer to decompose vectors into the constituent register types. This avoids artificial constraints to satisfy a wide super-register. This improves code quality because now optimizations don't need to deal with the super-register constraint. For example the immediate folding code doesn't deal with 4 component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work. This also avoids the need for the shader input processing code to manually split vector types. llvm-svn: 338416
* DAG: Fix PromoteFloatResult for fcanonicalize | Matt Arsenault | 2018-07-31 | 1 | -83/+101
| | | | llvm-svn: 338382
* AMDGPU: Fold undef fcanonicalize to qNaN | Matt Arsenault | 2018-07-31 | 1 | -0/+9
| | | | | | | | | | We could choose a free 0 for this, but this matches the behavior for fmul undef, 1.0. Also, the NaN use is more useful for folding use operations although if it's not eliminated it is more expensive in terms of code size. llvm-svn: 338376
* AMDGPU: Fix test check line bugs | Matt Arsenault | 2018-07-31 | 3 | -23/+32
| | | | llvm-svn: 338374
* AMDGPU: Reduce code size with fcanonicalize (fneg x) | Matt Arsenault | 2018-07-30 | 4 | -48/+71
| | | | | | | | When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifiers. llvm-svn: 338244
* AMDGPU: Make fneg combine handle fcanonicalize | Matt Arsenault | 2018-07-30 | 1 | -0/+21
| | | | llvm-svn: 338243
* AMDGPU: Force skip over s_sendmsg and exp instructions | Nicolai Haehnle | 2018-07-30 | 3 | -1/+84
| | | | | | | | | | | | | | | | | | | | | Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235
* AMDGPU: Stop wasting argument registers with v3i32/v3f32 | Matt Arsenault | 2018-07-28 | 5 | -5/+170
| | | | | | | | | | SelectionDAGBuilder widens v3i32/v3f32 arguments to v4i32/v4f32, which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines. llvm-svn: 338197
* AMDGPU: Stop trying to extend arguments for clover | Matt Arsenault | 2018-07-28 | 6 | -265/+539
| | | | | | | This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary. llvm-svn: 338193
* AMDGPU/R600: Add MOV instructions to BFE patterns | Jan Vesely | 2018-07-27 | 1 | -0/+175
| | | | | | | | | R600 can't handle immediates for BFE; these will be eliminated later. Fixes powr/pow regressions on r600 since r334817. Differential Revision: https://reviews.llvm.org/D49641 llvm-svn: 338127
* AMDGPU: Fix code size for return_to_epilog pseudo | Matt Arsenault | 2018-07-27 | 1 | -0/+6
| | | | llvm-svn: 338113
* AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types | Tom Stellard | 2018-07-27 | 1 | -0/+17
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49624 llvm-svn: 338102
* [AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits | Scott Linder | 2018-07-26 | 1 | -0/+213
| | | | | | | | | | Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory. Differential Revision: https://reviews.llvm.org/D49448 llvm-svn: 338060
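A rough model of the offset handling described above follows. It is a sketch under stated assumptions: the immediate field is treated as a 12-bit unsigned value, the wave size defaults to 64, the whole offset moves to the scalar offset when it does not fit, and `split_spill_offset` is a hypothetical helper rather than a function from the patch.

```python
MAX_IMM_OFFSET = (1 << 12) - 1  # largest value a 12-bit unsigned field can hold


def split_spill_offset(offset, wave_size=64):
    """Return (immediate_offset, soffset_addend) for a VGPR spill.

    When the per-lane offset fits in the immediate field it is used as is.
    Otherwise it is added to soffset, scaled by the wave size, to account
    for scratch memory being swizzled across lanes.
    """
    if 0 <= offset <= MAX_IMM_OFFSET:
        return offset, 0
    return 0, offset * wave_size


print(split_spill_offset(4092))
print(split_spill_offset(4096))
```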
* [AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion | Stanislav Mekhanoshin | 2018-07-25 | 1 | -0/+43
| | | | | | Differential Revision: https://reviews.llvm.org/D49761 llvm-svn: 337938
* AMDGPU/GlobalISel: Legalize G_INSERT | Tom Stellard | 2018-07-24 | 1 | -0/+123
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49601 llvm-svn: 337798
* Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering" | Matt Arsenault | 2018-07-20 | 4 | -21/+206
| | | | | | Reverts r337079 with fix for msan error. llvm-svn: 337535
* More fixes for subreg join failure in RegCoalescer | Tim Renouf | 2018-07-17 | 1 | -0/+319
| | | | | | | | | | | | | | | | | | | | | | Summary: Part of the adjustCopiesBackFrom method wasn't correctly dealing with SubRange intervals when updating. 2 changes. The first to ensure that bogus SubRange Segments aren't propagated when encountering Segments of the form [1234r, 1234d:0) when preparing to merge value numbers. These can be removed in this case. The second forces a shrinkToUses call if SubRanges end on the copy index (instead of just the parent register). V2: Addressed review comments, plus MIR test instead of ll test Subscribers: MatzeB, qcolombet, nhaehnle Differential Revision: https://reviews.llvm.org/D40308 Change-Id: I1d2b2b4beea802fce11da01edf71feb2064aab05 llvm-svn: 337273
* [DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT | Simon Pilgrim | 2018-07-17 | 1 | -29/+19
| | | | | | | | If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops. Differential Revision: https://reviews.llvm.org/D49262 llvm-svn: 337258
* [AMDGPU] Support a fdot2 pattern. | Farhana Aleen | 2018-07-16 | 1 | -0/+232
| | | | | | | | | | | | | | | Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198
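The equivalence behind this pattern can be checked numerically. The sketch below is only a model: `fma` here is an unfused multiply-add, so it does not capture the rounding behaviour of a real fused operation or of f16 inputs, and the inputs are chosen to be exactly representable so that both forms agree bit for bit.

```python
def fma(a, b, c):
    """Unfused stand-in for fused multiply-add (rounding is not modelled)."""
    return a * b + c


def nested_fma(s0, s1, z):
    """fma(S0.x, S1.x, fma(S0.y, S1.y, z)) on two-element vectors."""
    return fma(s0[0], s1[0], fma(s0[1], s1[1], z))


def fdot2(s0, s1, z):
    """Two-element dot product plus accumulator."""
    return s0[0] * s1[0] + s0[1] * s1[1] + z


s0, s1, z = (1.5, -2.0), (4.0, 0.25), 3.0
print(nested_fma(s0, s1, z), fdot2(s0, s1, z))
```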
* run post-RA hazard recognizer pass late | Mark Searles | 2018-07-16 | 3 | -9/+51
| | | | | | | | | | | Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has the adverse side effect that any consecutive S_NOP 0's emitted by the hazard recognizer will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 llvm-svn: 337154
* [DAGCombiner] extend(ifpositive(X)) -> shift-right (not X) | Sanjay Patel | 2018-07-15 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is almost the same as an existing IR canonicalization in instcombine, so I'm assuming this is a good early generic DAG combine too. The motivation comes from reduced bit-hacking for select-of-constants in IR after rL331486. We want to restore that functionality in the DAG as noted in the commit comments for that change and the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html The PPC and AArch tests show that those targets are already doing something similar. x86 will be neutral in the minimal case and generally better when this pattern is extended with other ops as shown in the signbit-shift.ll tests. Note the asymmetry: we don't include the (extend (ifneg X)) transform because it already exists in SimplifySelectCC(), and that is verified in the later unchanged tests in the signbit-shift.ll files. Without the 'not' op, the general transform to use a shift is always a win because that's a single instruction. Alive proofs: https://rise4fun.com/Alive/ysli Name: if pos, get -1 %c = icmp sgt i16 %x, -1 %r = sext i1 %c to i16 => %n = xor i16 %x, -1 %r = ashr i16 %n, 15 Name: if pos, get 1 %c = icmp sgt i16 %x, -1 %r = zext i1 %c to i16 => %n = xor i16 %x, -1 %r = lshr i16 %n, 15 Differential Revision: https://reviews.llvm.org/D48970 llvm-svn: 337130
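The two Alive proofs quoted in the commit message can also be checked exhaustively for i16. This is an illustrative brute-force check, not the DAG combine itself; the function names are hypothetical, and signed i16 values are modelled as Python integers in the range -32768 to 32767.

```python
BITS = 16


def sext_ifpos(x):
    """sext(icmp sgt x, -1): all ones (-1) when x is non-negative, else 0."""
    return -1 if x > -1 else 0


def zext_ifpos(x):
    """zext(icmp sgt x, -1): 1 when x is non-negative, else 0."""
    return 1 if x > -1 else 0


def ashr_not(x):
    """ashr(xor x, -1), 15: Python's >> on a signed int is arithmetic."""
    return (~x) >> (BITS - 1)


def lshr_not(x):
    """lshr(xor x, -1), 15: mask to 16 bits first so the shift is logical."""
    return ((~x) & 0xFFFF) >> (BITS - 1)


all_i16 = range(-(1 << (BITS - 1)), 1 << (BITS - 1))
sext_ok = all(sext_ifpos(x) == ashr_not(x) for x in all_i16)
zext_ok = all(zext_ifpos(x) == lshr_not(x) for x in all_i16)
print(sext_ok, zext_ok)
```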
* [AMDGPU] adjusted test checks because minnum with NaN gets simplified | Sanjay Patel | 2018-07-15 | 1 | -4/+5
| | | | | | | | This was improved with rL337127, but I missed the failure in this test. I'm not sure what the expected result will be, so I've generalized it and added a FIXME comment. llvm-svn: 337128
* Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering" | Evgeniy Stepanov | 2018-07-14 | 4 | -206/+21
| | | | | | | | | | | | | | | | | | | | | | This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079
* AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnum and @llvm.maxnum | Tom Stellard | 2018-07-13 | 2 | -0/+131
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46172 llvm-svn: 337056
* AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp | Tom Stellard | 2018-07-13 | 1 | -0/+33
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45882 llvm-svn: 337046
* AMDGPU: Fix handling of alignment padding in DAG argument lowering | Matt Arsenault | 2018-07-13 | 4 | -21/+206
| | | | | | | | | | | | | | | | | This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021
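Computing argument offsets from the raw type, as the message describes, amounts to aligning a running offset before placing each argument. The sketch below is a simplification under assumptions: arguments are given as hypothetical `(size, alignment)` pairs in bytes rather than IR types, and it does not consult a real DataLayout.

```python
def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def arg_offsets(args):
    """Return the byte offset of each (size, alignment) argument in order."""
    offsets = []
    offset = 0
    for size, alignment in args:
        offset = align_to(offset, alignment)  # insert padding if needed
        offsets.append(offset)
        offset += size
    return offsets


# An i8, then an 8-byte-aligned i64, then an i16.
print(arg_offsets([(1, 1), (8, 8), (2, 2)]))
```

The second argument lands at offset 8 rather than 1, which is the padding that is lost when offsets are taken from already-legalized register types.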
* AMDGPU: Fix assert in truncate combine with vectors | Matt Arsenault | 2018-07-12 | 1 | -0/+27
| | | | | | | The piece above probably has the same problem, but I need to try to come up with a test for it. llvm-svn: 336935
* [CodeGen] Emit more precise AssertZext/AssertSext nodes. | Eli Friedman | 2018-07-11 | 1 | -1/+1
| | | | | | | | | | | | This is marginally helpful for removing redundant extensions, and the code is easier to read, so it seems like an all-around win. In the new test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an AssertZext i2, which allows the extension of the return value to be eliminated. Differential Revision: https://reviews.llvm.org/D49004 llvm-svn: 336868
* [FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests | Joel E. Denny | 2018-07-11 | 38 | -121/+121
| | | | | | | | | | | | | | | | | See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47171 This commit drops that patch's changes to: llvm/test/CodeGen/NVPTX/f16x2-instructions.ll llvm/test/CodeGen/NVPTX/param-load-store.ll For some reason, the DOS line endings there prevent me from committing via the monorepo. A follow-up commit (not via the monorepo) will finish the patch. llvm-svn: 336843
* AMDGPU: Make hidden argument metadata consistent with | Konstantin Zhuravlyov | 2018-07-10 | 3 | -46/+262
| | | | | | | | amdgpu-implicitarg-num-bytes attribute Differential Revision: https://reviews.llvm.org/D49096 llvm-svn: 336697
* AMDGPU/NFC: Fix typo in test name | Konstantin Zhuravlyov | 2018-07-10 | 1 | -0/+0
| | | | | | | hsa-metadata-enqueu-kernel.ll -> hsa-metadata-enqueue-kernel.ll llvm-svn: 336689
* Reapply "AMDGPU: Force inlining if LDS global address is used" | Matt Arsenault | 2018-07-10 | 3 | -2/+87
| | | | | | This reverts commit r336623 llvm-svn: 336675
* Revert "AMDGPU: Force inlining if LDS global address is used" | Vlad Tsyrklevich | 2018-07-10 | 3 | -87/+2
| | | | | | | This reverts commit r336587, it was causing test failures on the sanitizer bots. llvm-svn: 336623
* RenameIndependentSubregs: Fix handling of undef tied operands | Mark Searles | 2018-07-09 | 1 | -0/+18
| | | | | | | | | When updating a tied operand pair, ensure that only that pair is updated. Differential Revision: https://reviews.llvm.org/D49052 llvm-svn: 336593
* AMDGPU: Force inlining if LDS global address is used | Matt Arsenault | 2018-07-09 | 3 | -2/+87
| | | | | | | | | | These won't work for the foreseeable future. These aren't allowed from OpenCL, but IPO optimizations can make them appear. Also directly set the attributes on functions, regardless of the linkage, rather than cloning functions like before. llvm-svn: 336587
* AMDGPU: Don't use spir_kernel in a test | Matt Arsenault | 2018-07-05 | 1 | -3/+2
| | | | | | Also use verify-machineinstrs. llvm-svn: 336374
* AMDGPU/GlobalISel: Implement custom kernel arg lowering | Matt Arsenault | 2018-07-05 | 2 | -20/+789
| | | | | | | | | | | | | Avoid using allocateKernArg / AssignFn. We do not want any of the type splitting properties of normal calling convention lowering. For now at least this exists alongside the IR argument lowering pass. This is necessary to handle struct padding correctly while some arguments are still skipped by the IR argument lowering pass. llvm-svn: 336373
* [AMDGPU] Add VALU to V_INTERP Instructions | Ryan Taylor | 2018-07-05 | 1 | -0/+19
| | | | | | | | | | | | Wait states are not properly being inserted after buffer_store for v_interp instructions. Add VALU to V_INTERP instructions so that the GCNHazardRecognizer can check and insert the appropriate wait states when needed. Differential Revision: https://reviews.llvm.org/D48772 Change-Id: Id540c9b074fc69b5c1de6b182276aa089c74aa64 llvm-svn: 336339
* AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal. | Tom Stellard | 2018-06-30 | 1 | -0/+20
| | | | | | | | | | | | | | | | | | | Summary: We could split sizes that are not power of two into smaller sized G_IMPLICIT_DEF instructions, but this ends up generating G_MERGE_VALUES instructions which we then have to handle in the instruction selector. Since G_IMPLICIT_DEF is really a no-op it's easier just to keep everything that can fit into a register legal. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48777 llvm-svn: 336041