summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Support v2i16/v2f16 packed operationsMatt Arsenault2017-02-2711-63/+378
| | | | llvm-svn: 296396
* AMDGPU: Add some of the new gfx9 VOP3 instructionsMatt Arsenault2017-02-271-0/+12
| | | | llvm-svn: 296382
* AMDGPU: Support inlineasm for packed instructionsMatt Arsenault2017-02-271-1/+42
| | | | | | | Add packed types as legal so they may be used with inlineasm. Keep all operations expanded for now. llvm-svn: 296379
* AMDGPU: Don't fold immediate if clamp/omod are setMatt Arsenault2017-02-272-8/+13
| | | | | | | Doesn't fix any practical problems because clamp/omod are currently folded after peephole optimizer. llvm-svn: 296375
* AMDGPU: Fold omod into instructionsMatt Arsenault2017-02-273-6/+146
| | | | llvm-svn: 296372
* AMDGPU: Add f16 to shader calling conventionsMatt Arsenault2017-02-271-3/+3
| | | | | | Mostly useful for writing tests for f16 features. llvm-svn: 296370
* AMDGPU: Add VOP3P instruction formatMatt Arsenault2017-02-2723-86/+879
| | | | | | | | Add a few non-VOP3P but instructions related to packed. Includes hack with dummy operands for the benefit of the assembler llvm-svn: 296368
* [AMDGPU] Runtime metadata fixes:Konstantin Zhuravlyov2017-02-275-32/+79
| | | | | | | | | | | - Verify that runtime metadata is actually valid runtime metadata when assembling, otherwise we could accept the following when assembling, but ocl runtime will reject it: .amdgpu_runtime_metadata { amd.MDVersion: [ 2, 1 ], amd.RandomUnknownKey, amd.IsaInfo: ... - Make IsaInfo optional, and always emit it. Differential Revision: https://reviews.llvm.org/D30349 llvm-svn: 296324
* AMDGPU : Replace FMAD with FMA when denormals are enabled.Wei Ding2017-02-244-1/+20
| | | | | | Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186
* Revert "Correct register pressure calculation in presence of subregs"Stanislav Mekhanoshin2017-02-242-22/+0
| | | | | | | | This reverts commit r296009. It broke one out of tree target and also does not account for all partial lines added or removed when calculating PressureDiff. llvm-svn: 296182
* [AMDGPU] Shut the warning "getRegUnitWeight hides overload...". NFC.Stanislav Mekhanoshin2017-02-231-0/+2
| | | | | | | Clang issues warning about hidden overload. That was intended, so add "using AMDGPUGenRegisterInfo::getRegUnitWeight;" to mute it. llvm-svn: 296021
* Correct register pressure calculation in presence of subregsStanislav Mekhanoshin2017-02-232-0/+20
| | | | | | | | | | If a subreg is used in an instruction it counts as a whole superreg for the purpose of register pressure calculation. This patch corrects improper register pressure calculation by examining operand's lane mask. Differential Revision: https://reviews.llvm.org/D29835 llvm-svn: 296009
* AMDGPU/SI: Fix trunc i16 patternJan Vesely2017-02-232-6/+5
| | | | | | | | Hit on ASICs that support 16bit instructions. Differential Revision: https://reviews.llvm.org/D30281 llvm-svn: 295990
* LoadStoreVectorizer: Split even sized illegal chains properlyMatt Arsenault2017-02-232-0/+36
| | | | | | | | | | | | | | | | | | | | Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933
* AMDGPU: Add another BFE patternMatt Arsenault2017-02-233-39/+52
| | | | | | | This is the pattern that falls out of the instruction's definition if offset == 0. llvm-svn: 295912
* AMDGPU: Use clamp with f64Matt Arsenault2017-02-223-7/+11
| | | | llvm-svn: 295908
* AMDGPU: Fold FP clamp as modifier bitMatt Arsenault2017-02-226-6/+89
| | | | | | | | | | | The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled. llvm-svn: 295905
* AMDGPU : Update TrapCode based on Trap Handler ABI.Wei Ding2017-02-224-13/+17
| | | | | | Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904
* AMDGPU: Add replacement bfe intrinsicsMatt Arsenault2017-02-221-0/+6
| | | | llvm-svn: 295899
* AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPRMatt Arsenault2017-02-221-36/+55
| | | | | | | | | This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. llvm-svn: 295891
* AMDGPU: Don't look at chain users when adjusting writemaskMatt Arsenault2017-02-221-0/+4
| | | | | | Fixes not adjusting using new intrinsics with chains. llvm-svn: 295878
* AMDGPU: Always allocate emergency stack slot at offset 0Matt Arsenault2017-02-221-5/+19
| | | | | | | | | This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877
* AMDGPU: Change exp with compr bit printingMatt Arsenault2017-02-221-3/+11
| | | | llvm-svn: 295873
* Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."Wei Ding2017-02-224-16/+12
| | | | | | This reverts commit r295867. llvm-svn: 295871
* AMDGPU : Update TrapCode based on Trap Handler ABI.Wei Ding2017-02-224-12/+16
| | | | | | Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867
* AMDGPU: Add cvt.pkrtz intrinsicMatt Arsenault2017-02-227-5/+56
| | | | | | Convert llvm.SI.packf16 test uses llvm-svn: 295797
* AMDGPU: Remove llvm.AMDGPU.clamp intrinsicMatt Arsenault2017-02-212-13/+0
| | | | llvm-svn: 295789
* AMDGPU: Redefine clamp node as clamp 0.0-1.0Matt Arsenault2017-02-2112-29/+163
| | | | | | | | | | | Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
* AMDGPU: Formatting fixesMatt Arsenault2017-02-211-4/+5
| | | | llvm-svn: 295783
* AMDGPU: Remove llvm.AMDGPU.flbit intrinsicMatt Arsenault2017-02-212-4/+0
| | | | llvm-svn: 295754
* AMDGPU: Don't use stack space for SGPR->VGPR spillsMatt Arsenault2017-02-218-90/+225
| | | | | | | | | | | | | | | | Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753
* AMDGPU: Fix assembler subtarget predicate for gfx9Matt Arsenault2017-02-183-1/+13
| | | | | | This was accepting GFX9 instructions on VI. llvm-svn: 295557
* AMDGPU: Fix disassembly of aperture registersMatt Arsenault2017-02-181-0/+5
| | | | llvm-svn: 295555
* AMDGPU: Merge initial gfx9 supportMatt Arsenault2017-02-1818-41/+239
| | | | llvm-svn: 295554
* AMDGPU/R600: Assert on infinite loop in EmitClauseMarkersJan Vesely2017-02-181-3/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D29792 llvm-svn: 295539
* AMDGPU: Fix crashes on invalid icmp/fcmp intrinsicsMatt Arsenault2017-02-171-5/+9
| | | | llvm-svn: 295489
* AMDGPU: Remove llvm.AMDGPU.cube intrinsicMatt Arsenault2017-02-163-25/+1
| | | | llvm-svn: 295359
* AMDGPU: Remove llvm.AMDGPU.rsq intrinsicMatt Arsenault2017-02-162-6/+0
| | | | llvm-svn: 295358
* AMDGPU: Remove llvm.SI.sendmsgMatt Arsenault2017-02-162-6/+3
| | | | llvm-svn: 295270
* AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsicsMatt Arsenault2017-02-163-50/+3
| | | | | | Update test uses with expansion in terms of new intrinsics. llvm-svn: 295269
* AMDGPU: Remove dead node definitionsMatt Arsenault2017-02-151-10/+0
| | | | llvm-svn: 295247
* AMDGPU: Consolidate sendmsg/sendmsghalt handling and testsMatt Arsenault2017-02-151-7/+4
| | | | llvm-svn: 295244
* AMDGPU: Replace assert with report_fatal_errorMatt Arsenault2017-02-151-1/+2
| | | | | | Also use a more refined condition. llvm-svn: 295239
* [AMDGPU] Revert failed schedulingStanislav Mekhanoshin2017-02-153-37/+106
| | | | | | | | | | | | | | This patch reverts region's scheduling to the original untouched state in case if we have have decreased occupancy. In addition it switches to use TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing limits which were just passed by. We are going to stay with the best schedule so we do not need to tolerate worsened scheduling anymore. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206
* [AMDGPU] Fix MaxWorkGroupsPerCU for large workgroupsStanislav Mekhanoshin2017-02-151-1/+5
| | | | | | | | | | This patch corrects the maximum workgroups per CU if we have big workgroups (more than 128). This calculation contributes to the occupancy calculation in respect to LDS size. Differential Revision: https://reviews.llvm.org/D29974 llvm-svn: 295134
* Revert "[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track"Alexander Timofeev2017-02-142-3/+4
| | | | | | This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907. llvm-svn: 295054
* [MC] Fix some Clang-tidy modernize and Include What You Use warnings; other ↵Eugene Zelenko2017-02-142-15/+33
| | | | | | | | minor fixes (NFC). Same changes in files affected by reduced MC headers dependencies. llvm-svn: 295009
* AMDGPU::expandMemIntrinsicUses(): Fix an uninitialized variable. This ↵NAKAMURA Takumi2017-02-121-1/+1
| | | | | | function returned true or undef. llvm-svn: 294895
* AMDGPU: Fix trailing whitespaceMatt Arsenault2017-02-105-14/+13
| | | | llvm-svn: 294694
* AMDGPU : Add trap handler support.Wei Ding2017-02-1010-24/+99
| | | | | | Differential Revision: http://reviews.llvm.org/D26010 llvm-svn: 294692
OpenPOWER on IntegriCloud