summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Enable absolute expression initializer for amd_kernel_code_t fields.Valery Pykhtin2016-06-234-23/+26
| | | | | | Differential Revision: http://reviews.llvm.org/D21380 llvm-svn: 273561
* [AMDGPU] Remove exit-on-error in test (PR27761)Diana Picus2016-06-232-2/+5
| | | | | | | | | | | | | | | | | The exit-on-error flag was necessary in order to avoid an assertion when handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize. We can avoid the assertion by creating some dummy nodes. This enables us to remove the exit-on-error flag on the first 2 run lines (SI), but on the third run line (R600) we would run into another assertion when trying to reserve indirect registers. This patch also replaces that assertion with an early exit from the function. Fixes PR27761. Differential Revision: http://reviews.llvm.org/D20852 llvm-svn: 273550
* AMDGPU: readlane/writelane do not read execMatt Arsenault2016-06-232-2/+26
| | | | llvm-svn: 273525
* AMDGPU: Fix liveness when expanding m0 loopMatt Arsenault2016-06-222-23/+67
| | | | llvm-svn: 273514
* AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32Changpeng Fang2016-06-221-0/+12
| | | | | | | | Reviewers: tstellarAMD, arsenm Differential Revision: http://reviews.llvm.org/D21533 llvm-svn: 273496
* AMDGPU: Run verifier after 2nd run of SIShrinkInstructionsMatt Arsenault2016-06-221-1/+1
| | | | llvm-svn: 273469
* AMDGPU: Fix verifier errors in SILowerControlFlowMatt Arsenault2016-06-2210-133/+217
| | | | | | | | | | | | | The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predecessors. Split the blocks up to avoid this and introduce new pseudo instructions for branches taken with exec masking. Also use a pseudo instead of emitting s_endpgm and erasing it in the special case of a non-void return. llvm-svn: 273467
* AMDGPU: Make FrameLowering stack alignment 16Matt Arsenault2016-06-221-3/+4
| | | | | | | We don't need it to be that high. The natural alignment for a single workitem's stack is 16. llvm-svn: 273448
* AMDGPU: Fix gcc warningsMatt Arsenault2016-06-224-197/+60
| | | | | | | Mostly removing dead code. Apparently gcc's warning for unused functions is better llvm-svn: 273363
* Delete more dead code.Rafael Espindola2016-06-213-191/+0
| | | | | | Found by gcc 6. llvm-svn: 273322
* AMDGPU: Add implicitarg.ptr intrinsic.Jan Vesely2016-06-213-11/+24
| | | | | | | | Points to the start of implicit arguments (appended after explicit arguments) Differential Revision: http://reviews.llvm.org/D20297 llvm-svn: 273317
* Delete some dead code.Rafael Espindola2016-06-213-25/+0
| | | | | | Found by gcc 6. llvm-svn: 273303
* AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32Matt Arsenault2016-06-201-16/+13
| | | | | | | | | The implicit operand is added by the initial instruction construction, so this was adding an additional vcc use. The original one was missing the undef flag the original condition had, so the verifier would complain. llvm-svn: 273182
* AMDGPU: Fold more custom nodes to undefMatt Arsenault2016-06-201-11/+40
| | | | | | | | | | | This will help sneak undefs past GVN into the DAG for some tests. Also add missing intrinsic for rsq_legacy, even though the node was already selected to the instruction. Also start passing the debug location to intrinsic errors. llvm-svn: 273181
* Generalize DiagnosticInfoStackSize to support other limitsMatt Arsenault2016-06-201-3/+11
| | | | | | | Backends may want to report errors on resources other than stack size. llvm-svn: 273177
* AMDGPU: Use correct method for determining instruction sizeMatt Arsenault2016-06-201-2/+4
| | | | llvm-svn: 273172
* AMDGPU: Add support for R_AMDGPU_REL32 relocationsTom Stellard2016-06-202-1/+8
| | | | | | | | | | Reviewers: arsenm, kzhuravl, rafael Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21401 llvm-svn: 273168
* AMDGPU: Emit R_AMDGPU_ABS32_{HI,LO} for scratch buffer relocationsTom Stellard2016-06-201-4/+15
| | | | | | | | | | Reviewers: arsenm, rafael, kzhuravl Subscribers: rafael, arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21400 llvm-svn: 273166
* Reformat blank lines.NAKAMURA Takumi2016-06-203-11/+0
| | | | llvm-svn: 273131
* Untabify.NAKAMURA Takumi2016-06-203-9/+6
| | | | llvm-svn: 273129
* AMDGPU: Fix kernel argument alignment impacting stack sizeMatt Arsenault2016-06-184-14/+29
| | | | | | | | Don't use AllocateStack because kernel arguments have nothing to do with the stack. The ensureMaxAlignment call was still changing the stack alignment. llvm-svn: 273080
* AMDGPU: Temporarily select trap to s_endpgmMatt Arsenault2016-06-173-0/+21
| | | | | | | | | | | | This should select to s_trap, but that requires additonal work to setup and enable the trap handler. For now emit s_endpgm so bugpoint stops getting stuck on the unsupported call to abort. Emit a warning that this will only terminate the wave and not really trap. llvm-svn: 273062
* AMDGPU/SI: Simplify code in SITargetLowering::LowerGlobalAddress()Tom Stellard2016-06-171-1/+1
| | | | | | This change were suggested in http://reviews.llvm.org/D21154. llvm-svn: 273059
* AMDGPU: Remove llvm.SI.tid intrinsicMatt Arsenault2016-06-173-9/+0
| | | | | | Mesa doesn't emit this for llvm >= 3.8 anymore. llvm-svn: 273050
* AMDGPU/SI: Propagate the Kill flag in storeRegToStackSlot and ↵Changpeng Fang2016-06-163-15/+29
| | | | | | | | | | eliminateFrameIndex Reviewers: arsenm, tstellarAMD Differential Revision: http://reviews.llvm.org/21438 llvm-svn: 272958
* AMDGPU: Fix maximum instruction size for amdgcnMatt Arsenault2016-06-161-1/+3
| | | | | | | This was causing the conservative estimate of inline asm size to be twice as big as expected. llvm-svn: 272956
* AMDGPU: Add v_mad 16-bit instructions definition.Wei Ding2016-06-162-0/+11
| | | | | | Differential Revision: http://reviews.llvm.org/D21362 llvm-svn: 272919
* [AMDGPU] Fix few coding style issues. NFC.Valery Pykhtin2016-06-152-23/+23
| | | | llvm-svn: 272785
* AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsicsNicolai Haehnle2016-06-151-13/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This fixes two related bugs. First, the generic optimization passes unfortunately generate negative constant offsets but the hardware treats SOffset as an unsigned value. Second, there is a hardware bug on SI and CI, where address clamping in MUBUF instructions does not work correctly when SOffset is larger than the buffer size. This patch works around this bug by never using SOffset. An alternative workaround would be to do the clamping manually when SOffset is too large, but generating the required code sequence during instruction selection would be rather involved, and in any case the resulting code would probably be worse. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21326 llvm-svn: 272761
* AMDGPU/SI: Correctly encode constant expressionsTom Stellard2016-06-151-9/+23
| | | | | | | | | | | | | | | Summary: We we have an MCConstantExpr, we can encode it directly into the instruction instead of emitting fixups. Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21236 Change-Id: I88b3edf288d48e65c5d705fc4850d281f8e36948 llvm-svn: 272750
* AMDGPU/AsmParser: Add support for parsing symbol operandsTom Stellard2016-06-151-2/+46
| | | | | | | | | | | | | | Summary: We can now reference symbols directly in operands, like this: s_mov_b32 s0, global Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21038 llvm-svn: 272748
* AMDGPU: Run pointer optimization passesMatt Arsenault2016-06-151-7/+46
| | | | llvm-svn: 272736
* IR: Introduce local_unnamed_addr attribute.Peter Collingbourne2016-06-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a local_unnamed_addr attribute is attached to a global, the address is known to be insignificant within the module. It is distinct from the existing unnamed_addr attribute in that it only describes a local property of the module rather than a global property of the symbol. This attribute is intended to be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. It is possible to exclude a global from the symbol table if three things are true: - This attribute is present on every instance of the global (which means that the normal rule that the global must have a unique address can be broken without being observable by the program by performing comparisons against the global's address) - The global has linkonce_odr linkage (which means that each linkage unit must have its own copy of the global if it requires one, and the copy in each linkage unit must be the same) - It is a constant or a function (which means that the program cannot observe that the unique-address rule has been broken by writing to the global) Although this attribute could in principle be computed from the module contents, LTO clients (i.e. linkers) will normally need to be able to compute this property as part of symbol resolution, and it would be inefficient to materialize every module just to compute it. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html for earlier discussion. Part of the fix for PR27553. Differential Revision: http://reviews.llvm.org/D20348 llvm-svn: 272709
* AMDGPU/SI: Refactor fixup handling for constant addrspace variablesTom Stellard2016-06-1413-47/+76
| | | | | | | | | | | | | | | | | | | | | | Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Re-commit this after fixing a bug where we were trying to use a reference to a Triple object that had already been destroyed. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21154 llvm-svn: 272705
* Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables"Tom Stellard2016-06-1413-73/+50
| | | | | | This reverts commit r272675. llvm-svn: 272677
* AMDGPU/SI: Refactor fixup handling for constant addrspace variablesTom Stellard2016-06-1413-50/+73
| | | | | | | | | | | | | | | | | | | Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21154 llvm-svn: 272675
* [AMDGPU][llvm-mc] Predefined symbols to access -mcpu from the assembly ↵Artem Tamazov2016-06-141-2/+15
| | | | | | | | | | | | source (.option.machine_version...) The feature allows for conditional assembly etc. TODO: make those symbols read-only. Test added. Differential Revision: http://reviews.llvm.org/D21238 llvm-svn: 272673
* AMDGPU/SI: Set INDEX_STRIDE for scratch coalescingMarek Olsak2016-06-132-3/+6
| | | | | | | | | | | | | | | | | Summary: Mesa and other users must set this to enable coalescing: - STRIDE = 0 - SWIZZLE_ENABLE = 1 This makes one particular compute shader 8x faster. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21136 llvm-svn: 272556
* AMDGPU: Fix post-RA verifier errors with trackLivenessAfterRegAllocMatt Arsenault2016-06-131-14/+16
| | | | | | | The condition reg of the cndmask_b64 expansion can't be killed by the first one, and the implicit super register implicit def is needed. llvm-svn: 272554
* Move instances of std::function.Benjamin Kramer2016-06-122-5/+5
| | | | | | Or replace with llvm::function_ref if it's never stored. NFC intended. llvm-svn: 272513
* Pass DebugLoc and SDLoc by const ref.Benjamin Kramer2016-06-1212-138/+109
| | | | | | | | This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512
* AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocationsTom Stellard2016-06-102-1/+7
| | | | | | | | | | | | | | | Summary: We need to set the fixup type to FK_Data_4 for the SCRATCH_RSRC_DWORD[01] symbols, since these require absolute relocations, and fixup_si_rodata is for relative relocations. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21153 llvm-svn: 272417
* [AMDGPU] AsmParser: Support for sext() modifier in SDWA. Some code cleaning ↵Sam Kolton2016-06-105-257/+348
| | | | | | | | | | | | | | | | | | in AMDGPUOperand. Summary: sext() modifier is supported in SDWA instructions only for integer operands. Spec is unclear should integer operands support abs and neg modifiers with sext - for now they are not supported. Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support sext() modifier. Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers. Code cleaning in AMDGPUOperand class: organize method in groups (render-, predicate-methods...). Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20968 llvm-svn: 272384
* AMDGPU: Fix trailing whitespaceMatt Arsenault2016-06-1010-40/+40
| | | | llvm-svn: 272364
* AMDGPU: v_cndmask_b32 does not def vccMatt Arsenault2016-06-101-2/+2
| | | | | | Fixes verifier errors after SIShrinkInstructions. llvm-svn: 272351
* AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permuteTom Stellard2016-06-101-2/+2
| | | | | | | | | | | | | | | Summary: This fixes a bug with ds_*permute instructions where if it was passed a constant address, then the offset operand would get assigned a register operand instead of an immediate. Reviewers: scchan, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19994 llvm-svn: 272349
* AMDGPU/SI: Use common topological sort algorithm in SIScheduleDAGMITom Stellard2016-06-091-64/+3
| | | | | | | | | | Reviewers: arsenm, axeldavy Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19823 llvm-svn: 272346
* AMDGPU: Fix flat atomicsMatt Arsenault2016-06-094-41/+90
| | | | | | | | The flat atomics could already be selected, but only when using flat instructions for global memory. Add patterns for flat addresses. llvm-svn: 272345
* AMDGPU: Fix i64 global cmpxchgMatt Arsenault2016-06-094-40/+82
| | | | | | | | | | This was using extract_subreg sub0 to extract the low register of the result instead of sub0_sub1, producing an invalid copy. There doesn't seem to be a way to use the compound subreg indices in tablegen since those are generated, so manually select it. llvm-svn: 272344
* AMDGPU: Run verifer after insert waits passMatt Arsenault2016-06-091-1/+1
| | | | llvm-svn: 272338
OpenPOWER on IntegriCloud