summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Add v_mad 16-bit instructions definition.Wei Ding2016-06-162-0/+11
| | | | | | Differential Revision: http://reviews.llvm.org/D21362 llvm-svn: 272919
* [AMDGPU] Fix few coding style issues. NFC.Valery Pykhtin2016-06-152-23/+23
| | | | llvm-svn: 272785
* AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsicsNicolai Haehnle2016-06-151-13/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This fixes two related bugs. First, the generic optimization passes unfortunately generate negative constant offsets but the hardware treats SOffset as an unsigned value. Second, there is a hardware bug on SI and CI, where address clamping in MUBUF instructions does not work correctly when SOffset is larger than the buffer size. This patch works around this bug by never using SOffset. An alternative workaround would be to do the clamping manually when SOffset is too large, but generating the required code sequence during instruction selection would be rather involved, and in any case the resulting code would probably be worse. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21326 llvm-svn: 272761
* AMDGPU/SI: Correctly encode constant expressionsTom Stellard2016-06-151-9/+23
| | | | | | | | | | | | | | | Summary: We we have an MCConstantExpr, we can encode it directly into the instruction instead of emitting fixups. Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21236 Change-Id: I88b3edf288d48e65c5d705fc4850d281f8e36948 llvm-svn: 272750
* AMDGPU/AsmParser: Add support for parsing symbol operandsTom Stellard2016-06-151-2/+46
| | | | | | | | | | | | | | Summary: We can now reference symbols directly in operands, like this: s_mov_b32 s0, global Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21038 llvm-svn: 272748
* AMDGPU: Run pointer optimization passesMatt Arsenault2016-06-151-7/+46
| | | | llvm-svn: 272736
* IR: Introduce local_unnamed_addr attribute.Peter Collingbourne2016-06-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a local_unnamed_addr attribute is attached to a global, the address is known to be insignificant within the module. It is distinct from the existing unnamed_addr attribute in that it only describes a local property of the module rather than a global property of the symbol. This attribute is intended to be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. It is possible to exclude a global from the symbol table if three things are true: - This attribute is present on every instance of the global (which means that the normal rule that the global must have a unique address can be broken without being observable by the program by performing comparisons against the global's address) - The global has linkonce_odr linkage (which means that each linkage unit must have its own copy of the global if it requires one, and the copy in each linkage unit must be the same) - It is a constant or a function (which means that the program cannot observe that the unique-address rule has been broken by writing to the global) Although this attribute could in principle be computed from the module contents, LTO clients (i.e. linkers) will normally need to be able to compute this property as part of symbol resolution, and it would be inefficient to materialize every module just to compute it. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html for earlier discussion. Part of the fix for PR27553. Differential Revision: http://reviews.llvm.org/D20348 llvm-svn: 272709
* AMDGPU/SI: Refactor fixup handling for constant addrspace variablesTom Stellard2016-06-1413-47/+76
| | | | | | | | | | | | | | | | | | | | | | Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Re-commit this after fixing a bug where we were trying to use a reference to a Triple object that had already been destroyed. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21154 llvm-svn: 272705
* Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables"Tom Stellard2016-06-1413-73/+50
| | | | | | This reverts commit r272675. llvm-svn: 272677
* AMDGPU/SI: Refactor fixup handling for constant addrspace variablesTom Stellard2016-06-1413-50/+73
| | | | | | | | | | | | | | | | | | | Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21154 llvm-svn: 272675
* [AMDGPU][llvm-mc] Predefined symbols to access -mcpu from the assembly ↵Artem Tamazov2016-06-141-2/+15
| | | | | | | | | | | | source (.option.machine_version...) The feature allows for conditional assembly etc. TODO: make those symbols read-only. Test added. Differential Revision: http://reviews.llvm.org/D21238 llvm-svn: 272673
* AMDGPU/SI: Set INDEX_STRIDE for scratch coalescingMarek Olsak2016-06-132-3/+6
| | | | | | | | | | | | | | | | | Summary: Mesa and other users must set this to enable coalescing: - STRIDE = 0 - SWIZZLE_ENABLE = 1 This makes one particular compute shader 8x faster. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21136 llvm-svn: 272556
* AMDGPU: Fix post-RA verifier errors with trackLivenessAfterRegAllocMatt Arsenault2016-06-131-14/+16
| | | | | | | The condition reg of the cndmask_b64 expansion can't be killed by the first one, and the implicit super register implicit def is needed. llvm-svn: 272554
* Move instances of std::function.Benjamin Kramer2016-06-122-5/+5
| | | | | | Or replace with llvm::function_ref if it's never stored. NFC intended. llvm-svn: 272513
* Pass DebugLoc and SDLoc by const ref.Benjamin Kramer2016-06-1212-138/+109
| | | | | | | | This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512
* AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocationsTom Stellard2016-06-102-1/+7
| | | | | | | | | | | | | | | Summary: We need to set the fixup type to FK_Data_4 for the SCRATCH_RSRC_DWORD[01] symbols, since these require absolute relocations, and fixup_si_rodata is for relative relocations. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21153 llvm-svn: 272417
* [AMDGPU] AsmParser: Support for sext() modifier in SDWA. Some code cleaning ↵Sam Kolton2016-06-105-257/+348
| | | | | | | | | | | | | | | | | | in AMDGPUOperand. Summary: sext() modifier is supported in SDWA instructions only for integer operands. Spec is unclear should integer operands support abs and neg modifiers with sext - for now they are not supported. Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support sext() modifier. Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers. Code cleaning in AMDGPUOperand class: organize method in groups (render-, predicate-methods...). Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20968 llvm-svn: 272384
* AMDGPU: Fix trailing whitespaceMatt Arsenault2016-06-1010-40/+40
| | | | llvm-svn: 272364
* AMDGPU: v_cndmask_b32 does not def vccMatt Arsenault2016-06-101-2/+2
| | | | | | Fixes verifier errors after SIShrinkInstructions. llvm-svn: 272351
* AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permuteTom Stellard2016-06-101-2/+2
| | | | | | | | | | | | | | | Summary: This fixes a bug with ds_*permute instructions where if it was passed a constant address, then the offset operand would get assigned a register operand instead of an immediate. Reviewers: scchan, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19994 llvm-svn: 272349
* AMDGPU/SI: Use common topological sort algorithm in SIScheduleDAGMITom Stellard2016-06-091-64/+3
| | | | | | | | | | Reviewers: arsenm, axeldavy Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19823 llvm-svn: 272346
* AMDGPU: Fix flat atomicsMatt Arsenault2016-06-094-41/+90
| | | | | | | | The flat atomics could already be selected, but only when using flat instructions for global memory. Add patterns for flat addresses. llvm-svn: 272345
* AMDGPU: Fix i64 global cmpxchgMatt Arsenault2016-06-094-40/+82
| | | | | | | | | | This was using extract_subreg sub0 to extract the low register of the result instead of sub0_sub1, producing an invalid copy. There doesn't seem to be a way to use the compound subreg indices in tablegen since those are generated, so manually select it. llvm-svn: 272344
* AMDGPU: Run verifer after insert waits passMatt Arsenault2016-06-091-1/+1
| | | | llvm-svn: 272338
* AMDGPU: Remove incorrect assertionMatt Arsenault2016-06-091-4/+0
| | | | | | | I'm still not sure under what circumstances the offset here is non-0, but private memory is not limited to 27-bits. llvm-svn: 272337
* AMDGPU: Properly initialize SIShrinkInstructionsMatt Arsenault2016-06-093-8/+6
| | | | llvm-svn: 272336
* AMDGPU/SI: Fix 32-bit fdiv loweringWei Ding2016-06-091-16/+53
| | | | | | | | | We were using the fast fdiv lowering for all division, implementation of IEEE754 fdiv is added. http://reviews.llvm.org/D20557 llvm-svn: 272292
* [AMDGPU] Disassembler: Support for sdwa instructionsSam Kolton2016-06-091-1/+5
| | | | | | | | | | Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21129 llvm-svn: 272255
* AMDGPU: Add amdgpu-ps-wqm-outputs function attributesNicolai Haehnle2016-06-071-2/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The presence of this attribute indicates that VGPR outputs should be computed in whole quad mode. This will be used by Mesa for prolog pixel shaders, so that derivatives can be taken of shader inputs computed by the prolog, fixing a bug. The generated code could certainly be improved: if a prolog pixel shader is used (which isn't common in modern OpenGL - they're used for gl_Color, polygon stipples, and forcing per-sample interpolation), Mesa will use this attribute unconditionally, because it has to be conservative. So WQM may be used in the prolog when it isn't really needed, and furthermore a silly back-and-forth switch is likely to happen at the boundary between prolog and main shader parts. Fixing this is a bit involved: we'd first have to add a mechanism by which LLVM writes the WQM-related input requirements to the main shader part binary, and then Mesa specializes the prolog part accordingly. At that point, we may as well just compile a monolithic shader... Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130 Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D20839 llvm-svn: 272063
* Revert "Differential Revision: http://reviews.llvm.org/D20557"Eric Christopher2016-06-071-55/+17
| | | | | | | | | | | | | | | | Author: Wei Ding <wei.ding2@amd.com> Date: Tue Jun 7 19:04:44 2016 +0000 Differential Revision: http://reviews.llvm.org/D20557 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044 91177308-0d34-0410-b5e6-96231b3b80d8 as it was breaking the bots. This reverts commit r272044. llvm-svn: 272056
* Differential Revision: http://reviews.llvm.org/D20557Wei Ding2016-06-071-17/+55
| | | | llvm-svn: 272044
* AMDGPU: Add function for getting instruction sizeMatt Arsenault2016-06-062-0/+51
| | | | llvm-svn: 271936
* AMDGPU: Fix constantexpr addrspacecastsMatt Arsenault2016-06-062-4/+71
| | | | | | | | If we had a constant group address space cast the queue pointer wasn't enabled for the function, resulting in a crash on noreg later. llvm-svn: 271935
* [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when ↵Artem Tamazov2016-06-062-14/+76
| | | | | | | | | | | | src2 == VCC. Another step for unification llvm assembler/disassembler with sp3. Besides, CodeGen output is a bit improved, thus changes in CodeGen tests. Assembler/Disassembler tests updated/added. Differential Revision: http://reviews.llvm.org/D20796 llvm-svn: 271900
* [test/AMDGPU] Square-braced-syntax for registers: add macro test/example.Artem Tamazov2016-06-031-19/+45
| | | | | | | | | | Test added as per discussion in http://reviews.llvm.org/D20588. The macro is just a demonstration, useless in practice. Coding style fixes. Differential Revision: http://reviews.llvm.org/D20797 llvm-svn: 271675
* [AMDGPU] Assembler: More tests for SDWA instructions. Fix for SDWA float ↵Sam Kolton2016-06-032-16/+23
| | | | | | | | | | | | | | modifiers. Summary: Depends on D20625 Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20674 llvm-svn: 271662
* [AMDGPU] Assembler: Custom converters for SDWA instructions. Support for ↵Sam Kolton2016-06-032-62/+162
| | | | | | | | | | | | | | | | _dpp and _sdwa suffixes in mnemonics. Summary: Added custom converters for SDWA instruction to support optional operands and modifiers. Support for _dpp and _sdwa suffixes that allows to force DPP or SDWA encoding for instructions. Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20625 llvm-svn: 271655
* AMDGPU: Handle flat in getMemOpBaseRegImmOfsMatt Arsenault2016-06-021-0/+7
| | | | | | | It can still report the base register, and the uses give up when it fails. llvm-svn: 271575
* AMDGPU: Cleanup load testsMatt Arsenault2016-06-021-0/+14
| | | | | | | | | There are a lot of different kinds of loads to test for, and these were scattered around inconsistently with some redundancy. Try to comprehensively test all loads in a consistent way. llvm-svn: 271571
* AMDGPU: Temporary fix for broken store combineMatt Arsenault2016-06-021-0/+2
| | | | llvm-svn: 271567
* AMDGPU: Fix crashes on unknown processor nameMatt Arsenault2016-06-024-7/+9
| | | | | | | | | | | | | | If the processor name failed to parse for amdgcn, the resulting output would have R600 ISA in it. If the processor name was missing or invalid for R600, the wavefront size would not be set and there would be crashes from missing itinerary data. Fixes crashes in future commit caused by dividing by the unset/0 wavefront size. llvm-svn: 271561
* AMDGPU: Fix incorrectly setting kill flag when copying register tuplesMatt Arsenault2016-06-021-1/+1
| | | | | | | This fixes some verifier errors when trackLivenessAfterRegAlloc is enabled. llvm-svn: 271446
* AMDGPU: SIDebuggerInsertNops preserves CFGMatt Arsenault2016-06-022-0/+6
| | | | | | | This saves an additional run of the DominatorTree and MachineLoopInfo llvm-svn: 271444
* AMDGPU: Remove unused address spaceMatt Arsenault2016-05-312-12/+10
| | | | | | Also return a single StringRef instead of building a string. llvm-svn: 271296
* AMDGPU: Fix trailing whitespaceMatt Arsenault2016-05-281-5/+5
| | | | llvm-svn: 271081
* AMDGPU: Add fract intrinsicMatt Arsenault2016-05-283-36/+21
| | | | | | | | | Remove broken patterns matching it. This was matching the unsafe math pattern and expanding the fix for the buggy instruction from the pattern. The problems are also on CI. Remove the workarounds and only use fract with unsafe math or from the intrinsic. llvm-svn: 271078
* [AMDGPU][llvm-mc] Square-braced-syntax for registers - make ":expr2" optional.Artem Tamazov2016-05-271-6/+10
| | | | | | | | | | | | | | | | | Register numbers may be specified as assembly-time expressions. This feature can be useful in macros and alike. However, expressions are supported within sqare braces only. Sqare braces were initially intended to support specifying of multiple (pairs/quads...) registers. Syntax like v[8:8] which specifies single register is also supported. That allows expressions but looks a bit unnatural. This change supports syntax REG[EXPR]. Tests added. Differential Revision: http://reviews.llvm.org/D20588 llvm-svn: 270990
* Avoid some copies by using const references.Benjamin Kramer2016-05-271-3/+1
| | | | | | | clang-tidy's performance-unnecessary-copy-initialization with some manual fixes. No functional changes intended. llvm-svn: 270988
* Apply clang-tidy's misc-static-assert where it makes sense.Benjamin Kramer2016-05-271-12/+6
| | | | | | | Also fold conditions into assert(0) where it makes sense. No functional change intended. llvm-svn: 270982
* AMDGPU/SI: Enable load-store-opt by default.Changpeng Fang2016-05-261-1/+1
| | | | | | | | | | Summary: Enable load-store-opt by default, and update LIT tests. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D20694 llvm-svn: 270894
OpenPOWER on IntegriCloud