path: root/llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
Commit message | Author | Age | Files | Lines
* [AMDGPU] Fix for vector element insertion | Tim Corringham | 2019-02-01 | 1 | -1/+1
  Summary: Incorrect code was generated when lowering insertelement operations for vectors with 8 or 16 bit elements. The value being inserted was not adjusted for the position of the element within the 32 bit word and so only the low element within each 32 bit word could receive the intended value. Fixed by simply replicating the value to each element of a congruent vector before the mask and or operation used to update the intended element. A number of affected LIT tests have been updated appropriately.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: llvm-commits, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D57588
  llvm-svn: 352885
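The fix described in the commit above can be modeled with plain integer arithmetic. The following is only an illustrative sketch for 16-bit elements packed into a 32-bit word; the function names are hypothetical and this is not the LLVM lowering code itself. It contrasts the described bug (the value is not positioned for the target lane) with the described fix (the value is replicated to every lane before the mask and or).

```python
WORD_MASK = 0xFFFFFFFF  # model a 32-bit register


def insert_elt16_buggy(word, value, idx):
    """Hypothetical model of the bug: the value stays in the low lane."""
    mask = (0xFFFF << (16 * idx)) & WORD_MASK
    # For idx == 1 the value never overlaps the mask, so zero is inserted.
    return (word & ~mask & WORD_MASK) | (value & mask)


def insert_elt16_fixed(word, value, idx):
    """Hypothetical model of the fix: splat the value, then mask and or."""
    mask = (0xFFFF << (16 * idx)) & WORD_MASK
    splat = (value & 0xFFFF) * 0x00010001  # copy the value into both lanes
    return (word & ~mask & WORD_MASK) | (splat & mask)


if __name__ == "__main__":
    word, value = 0x11112222, 0xABCD
    print(hex(insert_elt16_buggy(word, value, 1)))  # high lane is lost
    print(hex(insert_elt16_fixed(word, value, 1)))  # high lane is updated
```

Both versions agree for the low lane (idx 0), which matches the commit's observation that only the low element of each 32-bit word received the intended value.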
* DAG: Change behavior of fminnum/fmaxnum nodes | Matt Arsenault | 2018-10-22 | 1 | -30/+47
  Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit.
  llvm-svn: 344914
* [AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed | Alexander Timofeev | 2018-09-11 | 1 | -4/+8
  Differential revision: https://reviews.llvm.org/D51734
  Reviewers: rampitec
  llvm-svn: 341928
* AMDGPU: Improve extract_vector_elt reduction combine | Matt Arsenault | 2018-08-15 | 1 | -1/+1
  Handle fmul, fsub and preserve flags. Also really test minnum/maxnum reductions. The existing tests were only checking from minnum/maxnum matched from a fast math compare and select which is not the same.
  llvm-svn: 339820
* AMDGPU: More canonicalized operations | Matt Arsenault | 2018-08-10 | 1 | -0/+22
  llvm-svn: 339464
* AMDGPU: cvt_pk_rtz_f16 canonicalizes | Matt Arsenault | 2018-08-06 | 1 | -0/+11
  llvm-svn: 339078
* AMDGPU: Handle some vector operations in isCanonicalized | Matt Arsenault | 2018-08-06 | 1 | -0/+84
  llvm-svn: 339077
* AMDGPU: Conversions always produce canonical results | Matt Arsenault | 2018-08-06 | 1 | -1/+35
  Not sure why this was checking for denormals for f16. My interpretation of the IEEE standard is conversions should produce a canonical result, and the ISA manual says denormals are created when appropriate.
  llvm-svn: 339064
* AMDGPU: Fix implementation of isCanonicalized | Matt Arsenault | 2018-08-06 | 1 | -48/+238
  If denormals are enabled, denormals are canonical. Also fix a few other issues. minnum/maxnum are supposed to canonicalize. Temporarily improve workaround for the instruction behavior change in gfx9. Handle selects and fcopysign.
  The tests were also largely broken, since they were checking for a flush used on some targets after the store of the result.
  llvm-svn: 339061
* DAG: Enhance isKnownNeverNaN | Matt Arsenault | 2018-08-03 | 1 | -1/+4
  Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits.
  llvm-svn: 338910
* AMDGPU: Reduce code size with fcanonicalize (fneg x) | Matt Arsenault | 2018-07-30 | 1 | -1/+1
  When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifiers.
  llvm-svn: 338244
* [AMDGPU] adjusted test checks because minnum with NaN gets simplified | Sanjay Patel | 2018-07-15 | 1 | -4/+5
  This was improved with rL337127, but I missed the failure in this test. I'm not sure what the expected result will be, so I've generalized it and added a FIXME comment.
  llvm-svn: 337128
* AMDGPU/GCN: Bring processors in sync with AMDGPUUsage | Konstantin Zhuravlyov | 2017-12-08 | 1 | -2/+2
  - Add gfx704
  - Change bonaire to gfx704
  - Remove gfx804
  - Remove gfx901
  - Remove gfx903
  Differential Revision: https://reviews.llvm.org/D40046
  llvm-svn: 320194
* AMDGPU: Fix -enable-var-scope violations | Matt Arsenault | 2017-11-12 | 1 | -13/+13
  llvm-svn: 318004
* Fix CodeGen/AMDGPU/fcanonicalize-elimination.ll on FreeBSD 11.0 | Alexander Richardson | 2017-10-25 | 1 | -0/+4
  Summary: On FreeBSD11.0 the FileCheck NOT string "1.0" will be matched by `.amd_amdgpu_isa "amdgcn-unknown-freebsd11.0--gfx802"` at the end of the file. Add a CHECK for that directive to avoid failing the test.
  Reviewers: rampitec, kzhuravl
  Reviewed By: rampitec, kzhuravl
  Subscribers: emaste, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, krytarowski
  Differential Revision: https://reviews.llvm.org/D39306
  llvm-svn: 316616
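The failure mode in the commit above comes from a negative check scanning every line up to the next positive match, or to the end of the output when there is none. The sketch below is a simplified model of that behavior and not FileCheck itself; `check_not_fails` and the sample output lines are hypothetical.

```python
def check_not_fails(output_lines, forbidden, stop_marker=None):
    """Return True if `forbidden` appears before `stop_marker`.

    With stop_marker=None the scan runs to the end of the output,
    which models a negative check with no following positive check.
    """
    for line in output_lines:
        if stop_marker is not None and stop_marker in line:
            return False  # region closed before the forbidden text was seen
        if forbidden in line:
            return True
    return False


# Assumed tail of the generated assembly on a FreeBSD 11.0 host.
tail = [
    "s_endpgm",
    '.amd_amdgpu_isa "amdgcn-unknown-freebsd11.0--gfx802"',
]

if __name__ == "__main__":
    # Without a closing check, "1.0" is found inside "freebsd11.0".
    print(check_not_fails(tail, "1.0"))
    # A positive check on the directive bounds the scanned region.
    print(check_not_fails(tail, "1.0", stop_marker=".amd_amdgpu_isa"))
```

Matching the directive explicitly, as the commit does, ends the region the negative check covers before the target triple is reached.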
* [AMDGPU] Use v_max_f* for fcanonicalize | Stanislav Mekhanoshin | 2017-08-30 | 1 | -5/+10
  If denorms are not flushed we can use max instead of multiplication by 1. For double that is simply faster, while for float and half it is shorter, because mul uses constant bus and VOP3.
  Differential Revision: https://reviews.llvm.org/D36856
  llvm-svn: 312095
* AMDGPU: Start selecting global instructions | Matt Arsenault | 2017-07-29 | 1 | -35/+35
  llvm-svn: 309470
* [AMDGPU] fcaninicalize optimization for GFX9+ | Stanislav Mekhanoshin | 2017-07-13 | 1 | -6/+56
  Since GFX9 supports denorm modes for v_min_f32/v_max_f32, it is possible to further optimize fcanonicalize and remove it if applied to min/max given their operands are known not to be an sNaN or that sNaNs are not supported.
  Additionally we can remove fcanonicalize if denorms are supported for the VT and we know that its argument is never a NaN.
  Differential Revision: https://reviews.llvm.org/D35335
  llvm-svn: 307976
* [AMDGPU] fcanonicalize elimination optimization | Stanislav Mekhanoshin | 2017-07-12 | 1 | -0/+487
  We are using multiplication by 1.0 to flush denormals and quiet sNaNs. It is possible to omit this multiplication if the source of the fcanonicalize instruction is known to be flushed/quieted, i.e. if it comes from another instruction known to do the normalization and we are using IEEE mode to quiet sNaNs.
  Differential Revision: https://reviews.llvm.org/D35218
  llvm-svn: 307848
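What "flush denormals and quiet sNaNs" means for a 32-bit float can be shown on raw bit patterns. This is a software model written for illustration under the IEEE 754 binary32 layout, not the AMDGPU lowering; `canonicalize_f32_bits` and its `flush_denormals` parameter are hypothetical names.

```python
QUIET_BIT = 0x00400000  # most significant mantissa bit of binary32
SIGN_BIT = 0x80000000


def canonicalize_f32_bits(bits, flush_denormals=True):
    """Model canonicalization of one binary32 value given as an integer."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x007FFFFF
    if exponent == 0xFF and mantissa != 0:
        return bits | QUIET_BIT  # NaN: make sure it is quiet
    if flush_denormals and exponent == 0 and mantissa != 0:
        return bits & SIGN_BIT  # denormal: flush to a signed zero
    return bits  # everything else is already canonical


if __name__ == "__main__":
    for pattern in (0x7F800001, 0x00000001, 0x80000001, 0x3F800000):
        print(hex(pattern), "->", hex(canonicalize_f32_bits(pattern)))
```

In this model the operation is idempotent: applying it to an already canonical value changes nothing, which is the property that lets a redundant fcanonicalize be dropped when its source is known to be flushed and quieted. With `flush_denormals=False`, denormals pass through unchanged, matching the later note that denormals are canonical when they are enabled.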