path: root/llvm/test/CodeGen/R600
...
* R600/SI: Fix broken test (Tom Stellard, 2014-08-11; 1 file changed, -3/+5)
  llvm-svn: 215395
* R600/SI: Custom lower CONCAT_VECTORS (Tom Stellard, 2014-08-09; 1 file changed, -2/+1)
  This will lower them using register copies rather than loads and stores to the stack.
  llvm-svn: 215270
* R600/SI: Update concat_vectors.ll to check for scratch usage (Tom Stellard, 2014-08-09; 1 file changed, -0/+36)
  These tests were using SI-NOT: MOVREL to make sure concat vectors weren't being lowered to stack loads and stores, but we are using scratch buffers for the stack now instead of registers, so we need to add an additional SI-NOT check for scratch buffers. With this change I was able to uncover one broken test which will be fixed in a future commit.
  llvm-svn: 215269
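  For illustration, the check pattern described above might look like the following sketch (the function name, prefixes, and exact scratch mnemonic are assumptions, not quoted from the actual test):

      ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
      ; No indirect-addressing moves and no scratch-buffer stores should appear.
      ; SI-NOT: MOVREL
      ; SI-NOT: BUFFER_STORE_DWORD
      define void @test_concat_v2i32(<4 x i32> addrspace(1)* %out, <2 x i32> %a, <2 x i32> %b) {
        %concat = shufflevector <2 x i32> %a, <2 x i32> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
        store <4 x i32> %concat, <4 x i32> addrspace(1)* %out
        ret void
      }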
* R600: Cleanup fadd and fsub tests (Matt Arsenault, 2014-08-06; 2 files changed, -78/+100)
  llvm-svn: 214991
* R600: Increase nearby load scheduling threshold. (Matt Arsenault, 2014-08-06; 1 file changed, -30/+35)
  This partially fixes weird-looking load scheduling in the memcpy test. The load clustering doesn't seem particularly smart, but this method seems to be partially deprecated, so it might not be worth trying to fix.
  llvm-svn: 214943
* R600/SI: Implement areLoadsFromSameBasePtr (Matt Arsenault, 2014-08-06; 14 files changed, -70/+118)
  This currently has a noticeable effect on the kernel argument loads. LDS and global loads are more problematic, I think because of how copies are currently inserted to ensure that the address is a VGPR.
  llvm-svn: 214942
* R600/SI: Update MUBUF assembly string to match AMD proprietary compiler (Tom Stellard, 2014-08-05; 5 files changed, -20/+20)
  llvm-svn: 214866
* R600/SI: Avoid generating REGISTER_LOAD instructions. (Tom Stellard, 2014-08-05; 1 file changed, -1/+2)
  SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads.
  llvm-svn: 214865
* R600/SI: Fix extra whitespace in asm str (Matt Arsenault, 2014-08-03; 1 file changed, -0/+15)
  This slipped in in r214467, so something like V_MOV_B32_e32 v0, ... is now printed with 2 spaces between the instruction name and first operand.
  llvm-svn: 214660
* R600: Cleanup fneg tests (Matt Arsenault, 2014-08-02; 2 files changed, -94/+101)
  llvm-svn: 214612
* Revert "R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp"Tom Stellard2014-08-011-2/+1
| | | | | | | | This reverts commit r214566. I did not mean to commit this yet. llvm-svn: 214572
* R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp (Tom Stellard, 2014-08-01; 1 file changed, -1/+2)
  SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads.
  llvm-svn: 214566
* R600: Cleanup test (Matt Arsenault, 2014-08-01; 1 file changed, -46/+78)
  Remove -CHECKs, use multiple prefixes, name values, and also test the @llvm.fabs version.
  llvm-svn: 214525
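  A minimal sketch of the modernized style this describes (function name, prefixes, and check lines are illustrative assumptions, in the era's typed-pointer IR):

      ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
      ; RUN: llc -march=r600 -mcpu=cypress < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

      declare float @llvm.fabs.f32(float) nounwind readnone

      ; FUNC-LABEL: @fabs_f32
      ; SI: V_AND_B32
      define void @fabs_f32(float addrspace(1)* %out, float %in) {
        %fabs = call float @llvm.fabs.f32(float %in)
        store float %fabs, float addrspace(1)* %out
        ret void
      }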
* R600/SI: Do abs/neg folding with ComplexPatterns (Tom Stellard, 2014-08-01; 5 files changed, -15/+12)
  Abs/neg folding has moved out of foldOperands and into the instruction selection phase using complex patterns. As a consequence of this change, we now prefer to select the 64-bit encoding for most instructions and the modifier operands have been dropped from integer VOP3 instructions.
  llvm-svn: 214467
* R600/SI: Fold immediates when shrinking instructions (Tom Stellard, 2014-08-01; 1 file changed, -0/+14)
  This will prevent us from using extra MOV instructions once we prefer selecting 64-bit instructions.
  llvm-svn: 214464
* R600/SI: Fix incorrect commute operation in shrink instructions pass (Tom Stellard, 2014-08-01; 1 file changed, -0/+41)
  We were commuting the instruction but still shrinking it using the original opcode.
  NOTE: This is a candidate for the 3.5 branch.
  llvm-svn: 214463
* R600: Modernize work item intrinsics test (Jan Vesely, 2014-07-31; 1 file changed, -81/+82)
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
  llvm-svn: 214451
* R600: Modernize test (Matt Arsenault, 2014-07-28; 1 file changed, -22/+23)
  llvm-svn: 214108
* R600/SI: Implement getOptimalMemOpType (Matt Arsenault, 2014-07-28; 1 file changed, -0/+358)
  The default guess uses i32. This needs an address space argument to really do the right thing in all cases.
  llvm-svn: 214104
* Add alignment value to allowsUnalignedMemoryAccess (Matt Arsenault, 2014-07-27; 1 file changed, -0/+17)
  Rename to allowsMisalignedMemoryAccess. On R600, 8- and 16-byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code.
  llvm-svn: 214055
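  As a sketch of the pattern this helps (a hypothetical test, not from the commit): a 16-byte vector load aligned only to its 4-byte element type can now stay a single access instead of being split:

      define void @load_v4i32_align4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
        ; 16-byte access with only 4-byte (element) alignment; with
        ; allowsMisalignedMemoryAccesses returning true it is not split.
        %v = load <4 x i32> addrspace(1)* %in, align 4
        store <4 x i32> %v, <4 x i32> addrspace(1)* %out, align 4
        ret void
      }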
* R600/SI: Fix broken test. (Matt Arsenault, 2014-07-26; 1 file changed, -2/+18)
  There was no check prefix for the instruction lines. Match what is emitted, although I'm pretty sure it is incorrect.
  llvm-svn: 214035
* [SDAG] When performing post-legalize DAG combining, run the legalizer over each node in the worklist prior to combining (Chandler Carruth, 2014-07-26; 3 files changed, -30/+29)
  This allows the combiner to produce new nodes which need to go back through legalization. This is particularly useful when generating operands to target specific nodes in a post-legalize DAG combine where the operands are significantly easier to express as pre-legalized operations. My immediate use case will be PSHUFB formation where we need to build a constant shuffle mask with a build_vector node.
  This also refactors the relevant functionality in the legalizer to support this, and updates relevant tests. I've spoken to the R600 folks and these changes look like improvements to them. The avx512 change needs to be investigated; I suspect there is a disagreement between the legalizer and the DAG combiner there, but it seems a minor issue so leaving it to be re-evaluated after this patch.
  Differential Revision: http://reviews.llvm.org/D4564
  llvm-svn: 214020
* [SDAG] Introduce a combined set to the DAG combiner which tracks nodes which have successfully round-tripped through the combine phase (Chandler Carruth, 2014-07-24; 2 files changed, -6/+6)
  Use this to ensure all operands to DAG nodes are visited by the combiner, even if they are only added during the combine phase. This is critical to have the combiner reach nodes that are *introduced* during combining. Previously these would sometimes be visited and sometimes not be visited based on whether they happened to end up on the worklist or not. Now we always run them through the combiner.
  This fixes quite a few bad codegen test cases lurking in the suite while also being more principled. Among these, the TLS code generation is particularly exciting for programs that have this in the critical path, like TSan-instrumented binaries (although I think they engineer to use a different TLS that is faster anyways).
  I've tried to check for compile-time regressions here by running llc over a merged (but not LTO-ed) clang bitcode file and observed at most a 3% slowdown in llc. Given that this is essentially a worst case (none of opt or clang are running at this phase) I think this is tolerable. The actual LTO case should be even less costly, and the cost in normal compilation should be negligible.
  With this combining logic, it is possible to re-legalize as we combine, which is necessary to implement PSHUFB formation on x86 as a post-legalize DAG combine (my ultimate goal).
  Differential Revision: http://reviews.llvm.org/D4638
  llvm-svn: 213898
* R600: Add FMA instructions for Evergreen (Matt Arsenault, 2014-07-24; 2 files changed, -44/+47)
  llvm-svn: 213882
* R600: Match rcp node on pre-SI (Matt Arsenault, 2014-07-24; 1 file changed, -29/+11)
  llvm-svn: 213844
* R600: Fix LowerSDIV24 (Matt Arsenault, 2014-07-24; 1 file changed, -0/+120)
  Use ComputeNumSignBits instead of checking for i8 / i16, which only worked when AMDIL was lying about having legal i8 / i16. If an integer is known to fit in 24 bits, we can do division faster with float ops.
  llvm-svn: 213843
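  A sketch of an input that takes the fast path (a hypothetical test in the era's IR): the shl/ashr pairs make the top bits copies of the sign bit, so ComputeNumSignBits can prove both operands fit in 24 bits:

      define void @sdiv24_i32(i32 addrspace(1)* %out, i32 %x, i32 %y) {
        %num.sext = shl i32 %x, 8
        %num = ashr i32 %num.sext, 8
        %den.sext = shl i32 %y, 8
        %den = ashr i32 %den.sext, 8
        ; Both operands are known to fit in 24 bits, so the divide can be
        ; lowered with float ops (roughly trunc(fdiv(sitofp, sitofp)) plus
        ; a rounding fixup) instead of the full integer expansion.
        %q = sdiv i32 %num, %den
        store i32 %q, i32 addrspace(1)* %out
        ret void
      }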
* [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions (Chandler Carruth, 2014-07-23; 2 files changed, -3/+3)
  The old behavior could cause arbitrarily bad memory usage in the DAG combiner if there was heavy traffic of adding nodes already on the worklist to it. This commit switches the DAG combine worklist to work the same way as the instcombine worklist where we null-out removed entries and only add new entries to the worklist. My measurements of codegen time show a slight improvement. The memory utilization is unsurprisingly dominated by other factors (the IR and DAG itself I suspect).
  This change results in subtle, frustrating churn in the particular order in which DAG combines are applied, which causes a number of minor regressions where we fail to match a pattern previously matched by accident. AFAICT, all of these should be using AddToWorklist directly or should be written in a less brittle way. None of the changes seem drastically bad, and a few of the changes seem distinctly better.
  A major change required to make this work is to significantly harden the way in which the DAG combiner handles nodes which become dead (zero-uses). Previously, we relied on the ability to "priority-bump" them on the combine worklist to achieve recursive deletion of these nodes and ensure that the frontier of remaining live nodes all were added to the worklist. Instead, I've introduced a routine to just implement that precise logic with no indirection. It is a significantly simpler operation than that of the combiner worklist proper. I suspect this will also fix some other problems with the combiner.
  I think the x86 changes are really minor and uninteresting, but the avx512 change at least is hiding a "regression" (despite the test case being just noise, not testing some performance invariant) that might be looked into. Not sure if any of the others impact specific "important" code paths, but they didn't look terribly interesting to me, or the changes were really minor. The consensus in review is to fix any regressions that show up after the fact here.
  Thanks to the other reviewers for checking the output on other architectures. There is a specific regression on ARM that Tim already has a fix prepped to commit.
  Differential Revision: http://reviews.llvm.org/D4616
  llvm-svn: 213727
* R600/SI: Add instruction shrinking pass (Tom Stellard, 2014-07-21; 5 files changed, -5/+5)
  This pass converts 64-bit instructions to 32-bit when possible.
  llvm-svn: 213561
* R600/SI: Clean up some of the unused REGISTER_{LOAD,STORE} code (Tom Stellard, 2014-07-21; 2 files changed, -17/+9)
  There are a few more cleanups to do, but I ran into some problems with ext loads and trunc stores when I tried to change some of the vector loads and stores from custom to legal, so I wasn't able to get rid of everything.
  llvm-svn: 213552
* R600/SI: Use scratch memory for large private arrays (Tom Stellard, 2014-07-21; 5 files changed, -37/+57)
  llvm-svn: 213551
* R600/SI: Store constant initializer data in constant memory (Tom Stellard, 2014-07-21; 2 files changed, -4/+9)
  This implements a solution for constant initializers suggested by Vadim Girlin, where we store the data after the shader code and then use the S_GETPC instruction to compute its address. This saves us the trouble of creating a new buffer for constant data and then having to pass the pointer to the kernel via user SGPRs or the input buffer.
  llvm-svn: 213530
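  A hypothetical kernel that would exercise this path (the addrspace(2) constant address space is an assumption based on the R600 convention of the time):

      @lut = addrspace(2) constant [4 x i32] [i32 0, i32 1, i32 2, i32 3]

      define void @load_init(i32 addrspace(1)* %out, i32 %idx) {
        ; The initializer data is emitted after the shader code; its
        ; runtime address is computed with S_GETPC plus an offset.
        %ptr = getelementptr [4 x i32] addrspace(2)* @lut, i32 0, i32 %idx
        %val = load i32 addrspace(2)* %ptr
        store i32 %val, i32 addrspace(1)* %out
        ret void
      }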
* R600/SI: Use VALU for i1 XOR (Tom Stellard, 2014-07-21; 1 file changed, -1/+1)
  llvm-svn: 213528
* R600: Add missing test for concat_vectors (Matt Arsenault, 2014-07-20; 1 file changed, -0/+249)
  llvm-svn: 213473
* R600/SI: Remove dead code and add missing tests. (Matt Arsenault, 2014-07-20; 2 files changed, -10/+39)
  This probably was killed by some generic DAGCombiner improvements in checking the TargetBooleanContents instead of just 1.
  llvm-svn: 213471
* R600/SI: implement range reduction for sin/cos (Matt Arsenault, 2014-07-19; 1 file changed, -2/+20)
  These instructions can only take a limited input range, and return the constant value 1 out of range. We should do range reduction to be able to process arbitrary values. Use a FRACT instruction after normalization to achieve this. Also add a test for constant folding with the lowered code with unsafe-fp-math enabled.
  v2: use DAG lowering instead of intrinsic, adapt test
  v3: calculate constant, fold pattern into instruction definition
  v4: misc style fixes, add sin-fold testcase, cosmetics
  Patch by Grigori Goronzy
  llvm-svn: 213458
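  The reduction described amounts to sin(x) = sin_hw(fract(x * 1/(2*pi))), where the hardware sin expects a normalized input in [0, 1). A sketch of a test input (names and expected-lowering comments are assumptions):

      declare float @llvm.sin.f32(float) nounwind readnone

      ; Expected lowering, roughly: V_MUL_F32 by 0x3E22F983 (1/(2*pi)),
      ; V_FRACT_F32 to reduce into [0, 1), then V_SIN_F32.
      define void @sin_f32(float addrspace(1)* %out, float %x) {
        %sin = call float @llvm.sin.f32(float %x)
        store float %sin, float addrspace(1)* %out
        ret void
      }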
* R600: support fpext/fptrunc operations to and from f16. (Tim Northover, 2014-07-18; 1 file changed, -0/+31)
  llvm-svn: 213376
* CodeGen: soften f16 type by default instead of marking legal. (Tim Northover, 2014-07-18; 1 file changed, -0/+30)
  Actual support for softening f16 operations is still limited, and can be added when it's needed. But Soften is much closer to being a useful thing to try than keeping it Legal when no registers can actually hold such values.
  Longer term, we probably want something between Soften and Promote semantics for most targets; it'll be more efficient to promote the 4 basic operations to f32 than libcall them.
  llvm-svn: 213372
* R600: rename misleading fp16 test. (Tim Northover, 2014-07-18; 1 file changed, -2/+2)
  This test is actually going in the opposite direction to what the filename and function name suggested.
  llvm-svn: 213358
* R600: support f16 -> f64 conversion intrinsic. (Tim Northover, 2014-07-18; 1 file changed, -0/+14)
  Unfortunately, we don't seem to have a direct truncation, but the extension can be legally split into two operations so we should support that.
  llvm-svn: 213357
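  A sketch of the kind of test this enables (the overloaded intrinsic mangling shown here is an assumption):

      declare double @llvm.convert.from.fp16.f64(i16) nounwind readnone

      define void @extend_f16_to_f64(double addrspace(1)* %out, i16 %in) {
        ; Legally split into two extends: f16 -> f32, then f32 -> f64.
        %ext = call double @llvm.convert.from.fp16.f64(i16 %in)
        store double %ext, double addrspace(1)* %out
        ret void
      }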
* CodeGen: extend f16 conversions to permit types > float. (Tim Northover, 2014-07-17; 2 files changed, -4/+4)
  This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too.
  During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here.
  Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here.
  llvm-svn: 213248
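  For the truncate direction, which is the one that strictly needs this, a hedged sketch (intrinsic name assumed):

      declare i16 @llvm.convert.to.fp16.f64(double) nounwind readnone

      define i16 @trunc_f64_to_f16(double %in) {
        ; Must be a single f64 -> f16 operation; going through f32 first
        ; would round twice and break strict IEEE semantics.
        %fp16 = call i16 @llvm.convert.to.fp16.f64(double %in)
        ret i16 %fp16
      }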
* R600/SI: Allow using f32 rcp / rsq when denormals not handled. (Matt Arsenault, 2014-07-15; 3 files changed, -17/+14)
  These are precise enough to use for OpenCL unless denormals are handled.
  llvm-svn: 213107
* R600/SI: Fix select on i1 (Matt Arsenault, 2014-07-15; 4 files changed, -8/+48)
  llvm-svn: 213096
* R600/SI: Implement less wrong f32 fdiv (Matt Arsenault, 2014-07-15; 3 files changed, -35/+65)
  Assuming single precision denormals and accurate sqrt/div are not reported, this passes the OpenCL conformance test.
  llvm-svn: 213089
* R600: Implement zero undef variants of ctlz/cttz (Jan Vesely, 2014-07-15; 2 files changed, -0/+26)
  v2: use ffbh/l if available
  v3: Rebase on top of Matt's SI patches
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Tom Stellard <tom@stellard.net>
  llvm-svn: 213072
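  A sketch of the zero-undef form (function and argument names are illustrative):

      declare i32 @llvm.ctlz.i32(i32, i1) nounwind readnone

      define void @ctlz_zero_undef_i32(i32 addrspace(1)* %out, i32 %val) {
        ; The "i1 true" flag makes the result undef for a zero input, so
        ; the hardware find-first-bit result can be used without a
        ; guarding select for the zero case.
        %ctlz = call i32 @llvm.ctlz.i32(i32 %val, i1 true)
        store i32 %ctlz, i32 addrspace(1)* %out
        ret void
      }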
* R600: Add dag combine for copy of an illegal type. (Matt Arsenault, 2014-07-15; 3 files changed, -12/+184)
  This helps avoid redundant instructions to unpack and repack the vectors. Ideally we could recognize that pattern and eliminate it. Currently v4i8 and other small element type vectors are scalarized, so this has the added bonus of avoiding that.
  llvm-svn: 213031
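  The v4i8 case it mentions looks roughly like this (a hypothetical test):

      define void @copy_v4i8(<4 x i8> addrspace(1)* %out, <4 x i8> addrspace(1)* %in) {
        ; Without the combine the value is scalarized into per-element
        ; extracts and re-inserts; with it, the copy stays a single i32.
        %v = load <4 x i8> addrspace(1)* %in
        store <4 x i8> %v, <4 x i8> addrspace(1)* %out
        ret void
      }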
* R600: Add denormal handling subtarget features. (Matt Arsenault, 2014-07-14; 1 file changed, -4/+23)
  llvm-svn: 213018
* R600/SI: Default to no single precision denormals. (Matt Arsenault, 2014-07-14; 1 file changed, -1/+1)
  llvm-svn: 213017
* R600: Run more tests with promote alloca disabled. (Matt Arsenault, 2014-07-13; 4 files changed, -22/+57)
  Re-run tests changed in r211110 to test both paths. Also fix broken check line.
  llvm-svn: 212895
* R600: Run private-memory test with and without alloca promote (Matt Arsenault, 2014-07-13; 1 file changed, -24/+33)
  The unpromoted path still needs to be tested since we can't always promote to using LDS.
  llvm-svn: 212894
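  Testing both paths typically means two RUN lines differing only in the subtarget feature (the +/-promote-alloca feature name and check prefixes are assumptions here):

      ; RUN: llc -march=r600 -mcpu=SI -mattr=+promote-alloca < %s | FileCheck -check-prefix=SI-PROMOTE %s
      ; RUN: llc -march=r600 -mcpu=SI -mattr=-promote-alloca < %s | FileCheck -check-prefix=SI-ALLOCA %s

      define void @private_array(i32 addrspace(1)* %out, i32 %idx) {
        ; A small private array: promotable to LDS/registers on one path,
        ; lowered to scratch memory on the other.
        %buf = alloca [4 x i32]
        %p0 = getelementptr [4 x i32]* %buf, i32 0, i32 0
        store i32 5, i32* %p0
        %p1 = getelementptr [4 x i32]* %buf, i32 0, i32 %idx
        %val = load i32* %p1
        store i32 %val, i32 addrspace(1)* %out
        ret void
      }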
* R600: Add missing tests for some intrinsics (Matt Arsenault, 2014-07-12; 7 files changed, -5/+109)
  llvm-svn: 212870