summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/setcc-opt.ll
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Fix bundle schedulingStanislav Mekhanoshin2020-01-091-8/+8
| | | | | | | Bundles coming to scheduler considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487
* [AMDGPU] Preliminary patch for divergence driven instruction selection. ↵Alexander Timofeev2018-09-111-1/+1
| | | | | | | | | Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928
* AMDGPU: Add pass to lower kernel arguments to loadsMatt Arsenault2018-06-261-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem alltogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-211-20/+20
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* Enable FeatureFlatForGlobal on Volcanic IslandsMatt Arsenault2017-01-241-1/+1
| | | | | | | | | | | This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982
* AMDGPU: Use i16 comparison instructionsMatt Arsenault2016-12-221-2/+6
| | | | llvm-svn: 290348
* AMDGPU: Use unsigned compare for eq/neMatt Arsenault2016-09-301-12/+12
| | | | | | | | | | For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one. llvm-svn: 282832
* AMDGPU: Support commuting with immediate in src0Matt Arsenault2016-09-081-1/+1
| | | | llvm-svn: 280970
* AMDGPU: Run pointer optimization passesMatt Arsenault2016-06-151-51/+38
| | | | llvm-svn: 272736
* AMDGPU/SI: Enable the post-ra schedulerTom Stellard2016-04-301-6/+6
| | | | | | | | | | | | | | Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
* AMDGPU/SI: Enable lanemask tracking in mischedTom Stellard2016-03-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increase register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876
* AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputsTom Stellard2016-01-201-0/+54
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15035 llvm-svn: 258256
* DAGCombiner: Check shouldReduceLoadWidth before combining (and (load), x) -> ↵Tom Stellard2015-11-061-6/+12
| | | | | | | | | | | | extload Reviewers: resistor, arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13805 llvm-svn: 252349
* AMDGPU/SI: Better handle s_wait insertionTom Stellard2015-08-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can wait on either VM, EXP or LGKM. The waits are independent. Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the ones we need for the next instruction. Here's an example of subtle perf reduction this patch solves: This is without the patch: buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen s_load_dwordx4 s[44:47], s[8:9], 0xc s_waitcnt lgkmcnt(0) buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen s_load_dwordx4 s[48:51], s[8:9], 0x10 s_waitcnt vmcnt(1) buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen The s_waitcnt vmcnt(1) is useless. The reason it is added is because the last buffer_load_format_xyzw needs s[44:47], which was issued by the first s_load_dwordx4. It waits for all VM before that call to have finished. Internally after every instruction, 3 counters (for VM, EXP and LGTM) are updated after every instruction. For example buffer_load_format_xyzw will increase the VM counter, and s_load_dwordx4 the LGKM one. Without the patch, for every defined register, the current 3 counters are stored, and are used to know how long to wait when an instruction needs the register. Because of that, the s[44:47] counter includes that to use the register you need to wait for the previous buffer_load_format_xyzw. Instead this patch stores only the counters that matter for the register, and puts zero for the other ones, since we don't need any wait for them. Patch by: Axel Davy Differential Revision: http://reviews.llvm.org/D11883 llvm-svn: 245755
* R600 -> AMDGPU renameTom Stellard2015-06-131-0/+236
llvm-svn: 239657
OpenPOWER on IntegriCloud