summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU : Replace FMAD with FMA when denormals are enabled.Wei Ding2017-02-244-1/+20
| | | | | | Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186
* Revert "Correct register pressure calculation in presence of subregs"Stanislav Mekhanoshin2017-02-242-22/+0
| | | | | | | | This reverts commit r296009. It broke one out of tree target and also does not account for all partial lines added or removed when calculating PressureDiff. llvm-svn: 296182
* [AMDGPU] Shut the warning "getRegUnitWeight hides overload...". NFC.Stanislav Mekhanoshin2017-02-231-0/+2
| | | | | | | Clang issues warning about hidden overload. That was intended, so add "using AMDGPUGenRegisterInfo::getRegUnitWeight;" to mute it. llvm-svn: 296021
* Correct register pressure calculation in presence of subregsStanislav Mekhanoshin2017-02-232-0/+20
| | | | | | | | | | If a subreg is used in an instruction it counts as a whole superreg for the purpose of register pressure calculation. This patch corrects improper register pressure calculation by examining operand's lane mask. Differential Revision: https://reviews.llvm.org/D29835 llvm-svn: 296009
* AMDGPU/SI: Fix trunc i16 patternJan Vesely2017-02-232-6/+5
| | | | | | | | Hit on ASICs that support 16bit instructions. Differential Revision: https://reviews.llvm.org/D30281 llvm-svn: 295990
* LoadStoreVectorizer: Split even sized illegal chains properlyMatt Arsenault2017-02-232-0/+36
| | | | | | | | | | | | | | | | | | | | Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933
* AMDGPU: Add another BFE patternMatt Arsenault2017-02-233-39/+52
| | | | | | | This is the pattern that falls out of the instruction's definition if offset == 0. llvm-svn: 295912
* AMDGPU: Use clamp with f64Matt Arsenault2017-02-223-7/+11
| | | | llvm-svn: 295908
* AMDGPU: Fold FP clamp as modifier bitMatt Arsenault2017-02-226-6/+89
| | | | | | | | | | | The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled. llvm-svn: 295905
* AMDGPU : Update TrapCode based on Trap Handler ABI.Wei Ding2017-02-224-13/+17
| | | | | | Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904
* AMDGPU: Add replacement bfe intrinsicsMatt Arsenault2017-02-221-0/+6
| | | | llvm-svn: 295899
* AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPRMatt Arsenault2017-02-221-36/+55
| | | | | | | | | This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. llvm-svn: 295891
* AMDGPU: Don't look at chain users when adjusting writemaskMatt Arsenault2017-02-221-0/+4
| | | | | | Fixes not adjusting using new intrinsics with chains. llvm-svn: 295878
* AMDGPU: Always allocate emergency stack slot at offset 0Matt Arsenault2017-02-221-5/+19
| | | | | | | | | This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877
* AMDGPU: Change exp with compr bit printingMatt Arsenault2017-02-221-3/+11
| | | | llvm-svn: 295873
* Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."Wei Ding2017-02-224-16/+12
| | | | | | This reverts commit r295867. llvm-svn: 295871
* AMDGPU : Update TrapCode based on Trap Handler ABI.Wei Ding2017-02-224-12/+16
| | | | | | Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867
* AMDGPU: Add cvt.pkrtz intrinsicMatt Arsenault2017-02-227-5/+56
| | | | | | Convert llvm.SI.packf16 test uses llvm-svn: 295797
* AMDGPU: Remove llvm.AMDGPU.clamp intrinsicMatt Arsenault2017-02-212-13/+0
| | | | llvm-svn: 295789
* AMDGPU: Redefine clamp node as clamp 0.0-1.0Matt Arsenault2017-02-2112-29/+163
| | | | | | | | | | | Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
* AMDGPU: Formatting fixesMatt Arsenault2017-02-211-4/+5
| | | | llvm-svn: 295783
* AMDGPU: Remove llvm.AMDGPU.flbit intrinsicMatt Arsenault2017-02-212-4/+0
| | | | llvm-svn: 295754
* AMDGPU: Don't use stack space for SGPR->VGPR spillsMatt Arsenault2017-02-218-90/+225
| | | | | | | | | | | | | | | | Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753
* AMDGPU: Fix assembler subtarget predicate for gfx9Matt Arsenault2017-02-183-1/+13
| | | | | | This was accepting GFX9 instructions on VI. llvm-svn: 295557
* AMDGPU: Fix disassembly of aperture registersMatt Arsenault2017-02-181-0/+5
| | | | llvm-svn: 295555
* AMDGPU: Merge initial gfx9 supportMatt Arsenault2017-02-1818-41/+239
| | | | llvm-svn: 295554
* AMDGPU/R600: Assert on infinite loop in EmitClauseMarkersJan Vesely2017-02-181-3/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D29792 llvm-svn: 295539
* AMDGPU: Fix crashes on invalid icmp/fcmp intrinsicsMatt Arsenault2017-02-171-5/+9
| | | | llvm-svn: 295489
* AMDGPU: Remove llvm.AMDGPU.cube intrinsicMatt Arsenault2017-02-163-25/+1
| | | | llvm-svn: 295359
* AMDGPU: Remove llvm.AMDGPU.rsq intrinsicMatt Arsenault2017-02-162-6/+0
| | | | llvm-svn: 295358
* AMDGPU: Remove llvm.SI.sendmsgMatt Arsenault2017-02-162-6/+3
| | | | llvm-svn: 295270
* AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsicsMatt Arsenault2017-02-163-50/+3
| | | | | | Update test uses with expansion in terms of new intrinsics. llvm-svn: 295269
* AMDGPU: Remove dead node definitionsMatt Arsenault2017-02-151-10/+0
| | | | llvm-svn: 295247
* AMDGPU: Consolidate sendmsg/sendmsghalt handling and testsMatt Arsenault2017-02-151-7/+4
| | | | llvm-svn: 295244
* AMDGPU: Replace assert with report_fatal_errorMatt Arsenault2017-02-151-1/+2
| | | | | | Also use a more refined condition. llvm-svn: 295239
* [AMDGPU] Revert failed schedulingStanislav Mekhanoshin2017-02-153-37/+106
| | | | | | | | | | | | | | This patch reverts region's scheduling to the original untouched state in case if we have have decreased occupancy. In addition it switches to use TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing limits which were just passed by. We are going to stay with the best schedule so we do not need to tolerate worsened scheduling anymore. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206
* [AMDGPU] Fix MaxWorkGroupsPerCU for large workgroupsStanislav Mekhanoshin2017-02-151-1/+5
| | | | | | | | | | This patch corrects the maximum workgroups per CU if we have big workgroups (more than 128). This calculation contributes to the occupancy calculation in respect to LDS size. Differential Revision: https://reviews.llvm.org/D29974 llvm-svn: 295134
* Revert "[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track"Alexander Timofeev2017-02-142-3/+4
| | | | | | This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907. llvm-svn: 295054
* [MC] Fix some Clang-tidy modernize and Include What You Use warnings; other ↵Eugene Zelenko2017-02-142-15/+33
| | | | | | | | minor fixes (NFC). Same changes in files affected by reduced MC headers dependencies. llvm-svn: 295009
* AMDGPU::expandMemIntrinsicUses(): Fix an uninitialized variable. This ↵NAKAMURA Takumi2017-02-121-1/+1
| | | | | | function returned true or undef. llvm-svn: 294895
* AMDGPU: Fix trailing whitespaceMatt Arsenault2017-02-105-14/+13
| | | | llvm-svn: 294694
* AMDGPU : Add trap handler support.Wei Ding2017-02-1010-24/+99
| | | | | | Differential Revision: http://reviews.llvm.org/D26010 llvm-svn: 294692
* [AMDGPU] Override PSet for M0Stanislav Mekhanoshin2017-02-102-0/+10
| | | | | | | | | | | | This change returns empty PSet list for M0 register. Otherwise its PSet as defined by tablegen is SReg_32. This results in incorrect register pressure calculation every time an instruction uses M0. Such uses count as SReg_32 PSet and inadequately increase pressure on SGPRs. Differential Revision: https://reviews.llvm.org/D29798 llvm-svn: 294691
* AMDGPU: Add pass to expand memcpy/memmove/memsetMatt Arsenault2017-02-095-4/+136
| | | | llvm-svn: 294635
* [AMDGPU] Calculate number of min/max SGPRs/VGPRs for WavesPerEU instead of ↵Konstantin Zhuravlyov2017-02-092-68/+31
| | | | | | | | using switch statement Differential Revision: https://reviews.llvm.org/D29741 llvm-svn: 294627
* Drop graph_ prefixDaniel Berlin2017-02-091-2/+2
| | | | llvm-svn: 294621
* GraphTraits: Add range versions of graph traits functions (graph_nodes, ↵Daniel Berlin2017-02-091-12/+5
| | | | | | | | | | | | | | | | | | | | | | | | | graph_children, inverse_graph_nodes, inverse_graph_children). Summary: Convert all obvious node_begin/node_end and child_begin/child_end pairs to range based for. Sending for review in case someone has a good idea how to make graph_children able to be inferred. It looks like it would require changing GraphTraits to be two argument or something. I presume inference does not happen because it would have to check every GraphTraits in the world to see if the noderef types matched. Note: This change was 3-staged with clang as well, which uses Dominators/etc from LLVM. Reviewers: chandlerc, tstellarAMD, dblaikie, rsmith Subscribers: arsenm, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D29767 llvm-svn: 294620
* [AMDGPU] Implement register pressure callbacksStanislav Mekhanoshin2017-02-082-0/+37
| | | | | | | | | | | | | | | | | | | Implement getRegPressureLimit and getRegPressureSetLimit callbacks in SIRegisterInfo. This makes standard converge scheduler to behave almost the same as GCNScheduler, sometime slightly better sometimes a bit worse. In gerenal that is also possible to switch GCNScheduler to use these callbacks instead of getMaxWaves(), which also makes GCNScheduler slightly better on some tests and slightly worse on another. A big win is behavior with converge scheduler. Note, these are used not only by scheduling, but in places like MachineLICM. Differential Revision: https://reviews.llvm.org/D29700 llvm-svn: 294518
* [AMDGPU][NFC] Assign IsaInfo to reference variable in order to shorten long ↵Konstantin Zhuravlyov2017-02-081-16/+13
| | | | | | lines llvm-svn: 294454
* [AMDGPU] Add target information that is required by tools to metadataKonstantin Zhuravlyov2017-02-0813-260/+616
| | | | | | Differential Revision: https://reviews.llvm.org/D28760#fb670e28 llvm-svn: 294449
OpenPOWER on IntegriCloud