path: root/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Commit message | Author | Age | Files | Lines
...
* [AArch64] Improve code generation for logical instructions taking immediate operands. | Akira Hatanaka | 2017-04-21 | 1 | -1/+1
    This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction.
    This recommits r300932 and r300930, which were causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress.
    This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation.
    rdar://problem/18231627
    Differential Revision: https://reviews.llvm.org/D5591
    llvm-svn: 301019
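The looping bug described in the entry above comes down to a contract: a combine callback must report "changed" only when it really rewrote the node, otherwise a driver that re-queues on "changed" never terminates. The following is a minimal sketch of that contract, not the LLVM implementation. `ImmNode`, `optimizeImmSketch`, and `runToFixpoint` are hypothetical names, and the "clear undemanded bits" rewrite is a placeholder for the real immediate-encoding logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a DAG node carrying a logical-instruction immediate.
struct ImmNode {
  uint64_t Imm;
};

// Returns true only if the node was actually modified.
// Mask is all ones over the operand width; Demanded marks the bits users read.
bool optimizeImmSketch(ImmNode &N, uint64_t Demanded, uint64_t Mask) {
  // Early exit mirroring the fix: nothing to rewrite for all zeros or all ones.
  if (N.Imm == 0 || N.Imm == Mask)
    return false;
  // Placeholder rewrite (assumption): drop bits nobody demands.
  uint64_t NewImm = N.Imm & Demanded;
  if (NewImm == N.Imm)
    return false; // no change, so report no progress
  N.Imm = NewImm;
  return true;
}

// Toy driver: revisit the node while progress is reported.
// The cap only exists so a misbehaving callback shows up as Cap iterations.
int runToFixpoint(ImmNode &N, uint64_t Demanded, uint64_t Mask, int Cap = 1000) {
  int Steps = 0;
  while (Steps < Cap && optimizeImmSketch(N, Demanded, Mask))
    ++Steps;
  return Steps;
}
```

With this shape, an all-ones immediate is left alone on the first visit, and any other immediate is rewritten at most once before the driver stops.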
* Revert r300932 and r300930. | Akira Hatanaka | 2017-04-21 | 1 | -1/+1
    It seems that r300930 was creating an infinite loop in dag-combine when compiling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c
    llvm-svn: 300940
* [AArch64] Improve code generation for logical instructions taking immediate operands. | Akira Hatanaka | 2017-04-21 | 1 | -1/+1
    This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction.
    This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering.
    rdar://problem/18231627
    Differential Revision: https://reviews.llvm.org/D5591
    llvm-svn: 300930
* AMDGPU: Custom lower illegal small select types | Matt Arsenault | 2017-04-19 | 1 | -0/+29
    Promote them to i32 vectors to avoid unpacking and re-packing the vectors.
    llvm-svn: 300754
* AMDGPU: Fix invalid copies when copying i1 to phys reg | Matt Arsenault | 2017-04-12 | 1 | -2/+28
    Insert a VReg_1 virtual register so the i1 workaround pass can handle it.
    llvm-svn: 300113
* AMDGPU: Refactor SIMachineFunctionInfo slightly | Matt Arsenault | 2017-04-11 | 1 | -1/+1
    Prepare for handling non-entry functions.
    llvm-svn: 299999
* AMDGPU: Refactor argument lowering | Matt Arsenault | 2017-04-11 | 1 | -256/+318
    Split into smaller functions and prepare for handling non-entry functions.
    llvm-svn: 299998
* AMDGPU/GFX9: Fix shared and private aperture queries | Konstantin Zhuravlyov | 2017-04-06 | 1 | -11/+23
    Differential Revision: https://reviews.llvm.org/D31786
    llvm-svn: 299727
* AMDGPU: Replace fp16SrcZerosHighBits with a whitelist | Matt Arsenault | 2017-04-06 | 1 | -4/+50
    FCOPYSIGN is lowered to bit operations which don't clear the high bits.
    llvm-svn: 299708
* AMDGPU: Stop using CCAssignToRegWithShadow | Matt Arsenault | 2017-04-06 | 1 | -11/+0
    This does not do what it is attempting to use it for and requires working around in LowerFormalArguments.
    llvm-svn: 299667
* [AMDGPU] Eliminate barrier if workgroup size is not greater than wavefront size | Stanislav Mekhanoshin | 2017-04-06 | 1 | -0/+11
    If a workgroup size is known to be not greater than wavefront size, the s_barrier instruction is not needed since all threads are guaranteed to come to the same point at the same time.
    Differential Revision: https://reviews.llvm.org/D31731
    llvm-svn: 299659
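The condition in the entry above can be written as a one-line predicate. This is only a sketch of the stated rule, assuming the maximum workgroup size is known at compile time; `needsBarrier` and its parameter names are hypothetical and do not come from the LLVM source.

```cpp
#include <cassert>

// Hypothetical helper: true when s_barrier must still be emitted.
// If the whole workgroup fits in a single wavefront, its threads already
// run in lockstep, so the barrier can be dropped.
bool needsBarrier(unsigned MaxWorkGroupSize, unsigned WavefrontSize) {
  return MaxWorkGroupSize > WavefrontSize;
}
```

For example, under this rule a 64-thread workgroup on a 64-wide wavefront needs no barrier, while a 256-thread workgroup does.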
* AMDGPU: Remove legacy export intrinsic | Matt Arsenault | 2017-04-04 | 1 | -23/+0
    llvm-svn: 299444
* AMDGPU: Remove llvm.SI.vs.load.input | Matt Arsenault | 2017-04-03 | 1 | -5/+0
    llvm-svn: 299391
* AMDGPU: Remove legacy bfe intrinsics | Matt Arsenault | 2017-04-03 | 1 | -3/+7
    llvm-svn: 299372
* AMDGPU: Remove unnecessary ands when f16 is legal | Matt Arsenault | 2017-03-31 | 1 | -0/+39
    Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result. Alternatively could try making v2f16 legal and canonicalizing on build_vectors.
    llvm-svn: 299246
* AMDGPU: Add all atomicrmw fields to atomic.inc/dec | Matt Arsenault | 2017-03-30 | 1 | -2/+5
    Add scope, order, isVolatile
    llvm-svn: 299122
* [AMDGPU] Get address space mapping by target triple environment | Yaxun Liu | 2017-03-27 | 1 | -75/+72
    As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple.
    The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which do not depend on the target triple, use static const members, so that they don't occupy extra memory space and are equivalent to a compile time constant.
    Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as a member to a pass and created at the beginning of the run* function.
    Differential Revision: https://reviews.llvm.org/D31284
    llvm-svn: 298846
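The design described in the entry above (compile-time constants for triple-independent values, per-instance fields for the rest) can be sketched as follows. This is an illustration of the pattern only: `AddrSpaceSketch`, `makeAddrSpaces`, the field names, and every numeric value are assumptions chosen for the example, not the contents of the real AMDGPUAS struct.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the pattern described in the commit message.
struct AddrSpaceSketch {
  // Triple-independent value: a static const member takes no per-object
  // storage and behaves like a compile-time constant. Value is a placeholder.
  static const unsigned Global = 1;

  // Triple-dependent values: decided when the struct is constructed.
  unsigned Flat;
  unsigned Private;
};

// Build the mapping from the triple's environment component.
// The numbers below are placeholders for illustration.
AddrSpaceSketch makeAddrSpaces(const std::string &Environment) {
  const bool IsGiz = Environment == "amdgiz" || Environment == "amdgizcl";
  AddrSpaceSketch AS;
  AS.Flat = IsGiz ? 0u : 4u;
  AS.Private = IsGiz ? 5u : 0u;
  return AS;
}
```

Because the struct holds only a couple of integers, constructing it at each point of use, as the commit message suggests, costs next to nothing.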
* AMDGPU: Implement f16 fround | Matt Arsenault | 2017-03-24 | 1 | -0/+1
    llvm-svn: 298730
* AMDGPU: Rename SI_RETURN | Matt Arsenault | 2017-03-21 | 1 | -1/+1
    This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return.
    llvm-svn: 298452
* AMDGPU: Always use VGPR indexing on GFX9 | Marek Olsak | 2017-03-21 | 1 | -2/+2
    Reviewers: arsenm
    Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr
    Differential Revision: https://reviews.llvm.org/D31157
    llvm-svn: 298396
* AMDGPU: Fix asserting on 0 dmask for image intrinsics | Matt Arsenault | 2017-03-21 | 1 | -0/+58
    Fold these to undef during lowering so users get eliminated.
    llvm-svn: 298387
* AMDGPU: Cleanup control flow intrinsics | Matt Arsenault | 2017-03-17 | 1 | -25/+18
    Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list.
    It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those.
    llvm-svn: 298119
* AMDGPU: Allow sinking of addressing modes for atomic_inc/dec | Matt Arsenault | 2017-03-15 | 1 | -5/+22
    llvm-svn: 297913
* AMDGPU: Re-use TM.getNullPointerValue | Matt Arsenault | 2017-03-13 | 1 | -10/+8
    llvm-svn: 297662
* AMDGPU: Treat 0 as private null pointer in addrspacecast lowering | Matt Arsenault | 2017-03-13 | 1 | -7/+14
    llvm-svn: 297658
* AMDGPU: Remove packf16 intrinsic | Matt Arsenault | 2017-03-11 | 1 | -5/+0
    llvm-svn: 297557
* AMDGPU: Use v_med3_{f16|i16|u16} | Matt Arsenault | 2017-02-27 | 1 | -17/+16
    llvm-svn: 296401
* AMDGPU: Support v2i16/v2f16 packed operations | Matt Arsenault | 2017-02-27 | 1 | -6/+69
    llvm-svn: 296396
* AMDGPU: Support inlineasm for packed instructions | Matt Arsenault | 2017-02-27 | 1 | -1/+42
    Add packed types as legal so they may be used with inlineasm. Keep all operations expanded for now.
    llvm-svn: 296379
* AMDGPU: Use clamp with f64 | Matt Arsenault | 2017-02-22 | 1 | -5/+8
    llvm-svn: 295908
* AMDGPU : Update TrapCode based on Trap Handler ABI. | Wei Ding | 2017-02-22 | 1 | -2/+2
    Differential Revision: http://reviews.llvm.org/D30232
    llvm-svn: 295904
* AMDGPU: Add replacement bfe intrinsics | Matt Arsenault | 2017-02-22 | 1 | -0/+6
    llvm-svn: 295899
* AMDGPU: Don't look at chain users when adjusting writemask | Matt Arsenault | 2017-02-22 | 1 | -0/+4
    Fixes not adjusting using new intrinsics with chains.
    llvm-svn: 295878
* Revert "AMDGPU : Update TrapCode based on Trap Handler ABI." | Wei Ding | 2017-02-22 | 1 | -1/+1
    This reverts commit r295867.
    llvm-svn: 295871
* AMDGPU : Update TrapCode based on Trap Handler ABI. | Wei Ding | 2017-02-22 | 1 | -1/+1
    Differential Revision: http://reviews.llvm.org/D30232
    llvm-svn: 295867
* AMDGPU: Add cvt.pkrtz intrinsic | Matt Arsenault | 2017-02-22 | 1 | -4/+41
    Convert llvm.SI.packf16 test uses
    llvm-svn: 295797
* AMDGPU: Redefine clamp node as clamp 0.0-1.0 | Matt Arsenault | 2017-02-21 | 1 | -3/+77
    Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled.
    Also allow using clamp with f16, and use knowledge of dx10_clamp.
    llvm-svn: 295788
* AMDGPU: Formatting fixes | Matt Arsenault | 2017-02-21 | 1 | -4/+5
    llvm-svn: 295783
* AMDGPU: Remove llvm.AMDGPU.flbit intrinsic | Matt Arsenault | 2017-02-21 | 1 | -1/+0
    llvm-svn: 295754
* AMDGPU: Don't use stack space for SGPR->VGPR spills | Matt Arsenault | 2017-02-21 | 1 | -0/+2
    Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after.
    I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later.
    The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted.
    llvm-svn: 295753
* AMDGPU: Merge initial gfx9 support | Matt Arsenault | 2017-02-18 | 1 | -1/+8
    llvm-svn: 295554
* AMDGPU: Fix crashes on invalid icmp/fcmp intrinsics | Matt Arsenault | 2017-02-17 | 1 | -5/+9
    llvm-svn: 295489
* AMDGPU: Remove llvm.AMDGPU.rsq intrinsic | Matt Arsenault | 2017-02-16 | 1 | -1/+0
    llvm-svn: 295358
* AMDGPU: Remove llvm.SI.sendmsg | Matt Arsenault | 2017-02-16 | 1 | -4/+3
    llvm-svn: 295270
* AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsics | Matt Arsenault | 2017-02-16 | 1 | -25/+0
    Update test uses with expansion in terms of new intrinsics.
    llvm-svn: 295269
* AMDGPU: Consolidate sendmsg/sendmsghalt handling and tests | Matt Arsenault | 2017-02-15 | 1 | -7/+4
    llvm-svn: 295244
* AMDGPU: Fix trailing whitespace | Matt Arsenault | 2017-02-10 | 1 | -6/+5
    llvm-svn: 294694
* AMDGPU : Add trap handler support. | Wei Ding | 2017-02-10 | 1 | -18/+41
    Differential Revision: http://reviews.llvm.org/D26010
    llvm-svn: 294692
* AMDGPU: Generalize matching of v_med3_f32 | Matt Arsenault | 2017-01-31 | 1 | -0/+3
    I think this is safe as long as no inputs are known to ever be nans.
    Also add an intrinsic for fmed3 to be able to handle all safe math cases.
    llvm-svn: 293598
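The med3 matching mentioned in the entry above relies on a simple identity: for NaN-free inputs with lo <= hi, min(max(x, lo), hi) is the median of x, lo, and hi. The sketch below checks that identity in plain C++; `med3` and `clampPattern` are hypothetical helper names, and this is not the pattern-matching code from the backend.

```cpp
#include <algorithm>
#include <cassert>

// Median of three values, written with min/max only.
// Assumes none of the inputs is NaN.
float med3(float A, float B, float C) {
  return std::max(std::min(A, B), std::min(std::max(A, B), C));
}

// The min(max(x, lo), hi) shape that can be matched to a median
// when Lo <= Hi and no input is NaN.
float clampPattern(float X, float Lo, float Hi) {
  return std::min(std::max(X, Lo), Hi);
}
```

The NaN restriction matters because min and max do not order NaN against ordinary numbers, so the two expressions are no longer guaranteed to agree once a NaN is involved.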
* AMDGPU: Make i32 uaddo/usubo legal | Matt Arsenault | 2017-01-30 | 1 | -0/+3
    llvm-svn: 293514