path: root/llvm/test/CodeGen/AMDGPU/fcanonicalize.f16.ll
Commit message | Author | Age | Files | Lines
* DAG: Handle odd vector sizes in calling conv splitting | Matt Arsenault | 2018-09-10 | 1 | -4/+3
  This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal-sized pieces. Fixes not splitting v3i16/v3f16 into two registers for AMDGPU. This will also allow fixing the ABI for 16-bit vectors in a future commit so that it's the same for all subtargets.
  llvm-svn: 341801
* AMDGPU: Use splat vectors for undefs when folding canonicalize | Matt Arsenault | 2018-08-12 | 1 | -8/+59
  If one of the elements is undef, use the canonicalized constant from the other element instead of 0. Splat vectors are more useful for other optimizations, such as matching vector clamps. This was breaking on clamps of half3 from the undef 4th component.
  llvm-svn: 339512
* AMDGPU: Push fcanonicalize through partially constant build_vector | Matt Arsenault | 2018-08-06 | 1 | -0/+173
  This usually avoids some re-packing code, and may help find canonical sources.
  llvm-svn: 339072
* DAG: Fix vector widening fcanonicalize | Matt Arsenault | 2018-08-02 | 1 | -0/+20
  llvm-svn: 338715
* AMDGPU: Fix scalarizing v4f16 fcanonicalize | Matt Arsenault | 2018-08-02 | 1 | -0/+19
  llvm-svn: 338714
* DAG: Fix PromoteFloatResult for fcanonicalize | Matt Arsenault | 2018-07-31 | 1 | -83/+101
  llvm-svn: 338382
* AMDGPU: Reduce code size with fcanonicalize (fneg x) | Matt Arsenault | 2018-07-30 | 1 | -44/+67
  When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifiers.
  llvm-svn: 338244
* AMDGPU: Make v2i16/v2f16 legal on VI | Matt Arsenault | 2018-05-22 | 1 | -11/+6
  This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9.
  Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work.
  llvm-svn: 332953
* [AMDGPU] Enabled v2.16 literals for VOP3P | Stanislav Mekhanoshin | 2018-04-17 | 1 | -1/+1
  Literal encoding needs op_sel_hi to select low 16 bit in this case.
  Differential Revision: https://reviews.llvm.org/D45745
  llvm-svn: 330230
* AMDGPU/GCN: Bring processors in sync with AMDGPUUsage | Konstantin Zhuravlyov | 2017-12-08 | 1 | -1/+1
  - Add gfx704
  - Change bonaire to gfx704
  - Remove gfx804
  - Remove gfx901
  - Remove gfx903
  Differential Revision: https://reviews.llvm.org/D40046
  llvm-svn: 320194
* [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole. | Sam Kolton | 2017-12-04 | 1 | -4/+3
  Summary:
  Reviewers: arsenm, vpykhtin, rampitec
  Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D37817
  llvm-svn: 319662
* [AMDGPU] Use v_pk_max_f16 for fcanonicalize | Stanislav Mekhanoshin | 2017-09-06 | 1 | -5/+5
  Differential Revision: https://reviews.llvm.org/D37325
  llvm-svn: 312676
* [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize | Stanislav Mekhanoshin | 2017-09-06 | 1 | -6/+5
  Differential Revision: https://reviews.llvm.org/D37522
  llvm-svn: 312660
* [AMDGPU] Use v_max_f* for fcanonicalize | Stanislav Mekhanoshin | 2017-08-30 | 1 | -21/+17
  If denorms are not flushed we can use max instead of multiplication by 1. For double that is simply faster, while for float and half it is shorter, because mul uses constant bus and VOP3.
  Differential Revision: https://reviews.llvm.org/D36856
  llvm-svn: 312095
* [AMDGPU] Switch scalarize global loads ON by default | Alexander Timofeev | 2017-07-04 | 1 | -4/+14
  Differential revision: https://reviews.llvm.org/D34407
  llvm-svn: 307097
* Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default" | NAKAMURA Takumi | 2017-07-04 | 1 | -14/+4
  It broke a testcase.
  Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll
  llvm-svn: 307054
* [AMDGPU] Switch scalarize global loads ON by default | Alexander Timofeev | 2017-07-03 | 1 | -4/+14
  Differential revision: https://reviews.llvm.org/D34407
  llvm-svn: 307026
* [AMDGPU] Untangle SDWA pass from SIShrinkInstructions | Stanislav Mekhanoshin | 2017-06-03 | 1 | -3/+3
  Remove dependency of SDWA pass on SIShrinkInstructions. The goal is to move SDWA even higher in the stack to avoid second run of MachineLICM, MachineCSE and SIFoldOperands.
  Also added handling to preserve original src modifiers.
  Differential Revision: https://reviews.llvm.org/D33860
  llvm-svn: 304665
* [AMDGPU] Allow SDWA in instructions with immediates and SGPRs | Stanislav Mekhanoshin | 2017-05-30 | 1 | -10/+11
  The encoding does not allow SDWA in an instruction with scalar operands, either literals or SGPRs. It is, however, possible to copy these operands into a VGPR first.
  Several copies of the value are produced if multiple SDWA conversions were done. To clean up, MachineLICM (to hoist copies out of loops), MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace an SGPR-to-VGPR copy with an immediate copy directly into the VGPR) runs are added after the SDWA pass.
  Differential Revision: https://reviews.llvm.org/D33583
  llvm-svn: 304219
* AMDGPU: Temporarily disable packed inlinable literals (v2f16, v2i16) | Konstantin Zhuravlyov | 2017-04-21 | 1 | -1/+1
  Differential Revision: https://reviews.llvm.org/D32361
  llvm-svn: 301028
* [AMDGPU] Resubmit SDWA peephole: enable by default | Sam Kolton | 2017-04-06 | 1 | -1/+1
  Reviewers: vpykhtin, rampitec, arsenm
  Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D31671
  llvm-svn: 299654
* Revert r299536. [AMDGPU] SDWA peephole: enable by default. | Ivan Krasin | 2017-04-05 | 1 | -1/+1
  Reason: breaks multiple bots:
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173
  Original Review URL: https://reviews.llvm.org/D31671
  llvm-svn: 299583
* [AMDGPU] SDWA peephole: enable by default | Sam Kolton | 2017-04-05 | 1 | -1/+1
  Reviewers: vpykhtin, rampitec, arsenm
  Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D31671
  llvm-svn: 299536
* AMDGPU: Remove unnecessary ands when f16 is legal | Matt Arsenault | 2017-03-31 | 1 | -10/+14
  Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result.
  Alternatively could try making v2f16 legal and canonicalizing on build_vectors.
  llvm-svn: 299246
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel | Matt Arsenault | 2017-03-21 | 1 | -42/+42
  Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel.
  Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel).
  llvm-svn: 298444
* AMDGPU: Support v2i16/v2f16 packed operations | Matt Arsenault | 2017-02-27 | 1 | -21/+37
  llvm-svn: 296396
* AMDGPU: Use source mods with fcanonicalize | Matt Arsenault | 2017-01-31 | 1 | -0/+35
  llvm-svn: 293654
* Enable FeatureFlatForGlobal on Volcanic Islands | Matt Arsenault | 2017-01-24 | 1 | -1/+1
  This switches to the workaround that HSA defaults to for the mesa path.
  This should be applied to the 4.0 branch.
  Patch by Vedran Miletić <vedran@miletic.net>
  llvm-svn: 292982
* AMDGPU: Combine fp16/fp64 subtarget features | Matt Arsenault | 2017-01-23 | 1 | -10/+10
  The same control register controls both, and are set to the same defaults. Keep the old names around as aliases.
  llvm-svn: 292837
* DAG: Allow legalization of fcanonicalize vector types | Matt Arsenault | 2017-01-23 | 1 | -0/+214
  llvm-svn: 292814
* AMDGPU: Implement f16 fcanonicalize | Matt Arsenault | 2016-12-22 | 1 | -0/+172
  llvm-svn: 290300