summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/CodeGenPrepare/NVPTX
Commit message (Collapse)AuthorAgeFilesLines
* Revert "Temporarily Revert "Add basic loop fusion pass.""Eric Christopher2019-04-176-0/+396
| | | | | | | | The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass."Eric Christopher2019-04-176-396/+0
| | | | | | | | As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546
* [BypassSlowDivision] Improve our handling of divisions by constantsSanjoy Das2017-12-041-0/+77
| | | | | | | | | | | | | | | | | | | (This reapplies r314253. r314253 was reverted on r314482 because of a correctness regression on P100, but that regression was identified to be something else.) Summary: Don't bail out on constant divisors for divisions that can be narrowed without introducing control flow . This gives us a 32 bit multiply instead of an emulated 64 bit multiply in the generated PTX assembly. Reviewers: jlebar Subscribers: jholewinski, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38265 llvm-svn: 319677
* Revert "[BypassSlowDivision] Improve our handling of divisions by constants"Sanjoy Das2017-09-291-77/+0
| | | | | | | This reverts commit r314253. It causes a miscompile on P100 in an internal benchmark. Reverting while I investigate. llvm-svn: 314482
* [BypassSlowDivision] Improve our handling of divisions by constantsSanjoy Das2017-09-261-0/+77
| | | | | | | | | | | | | | | Summary: Don't bail out on constant divisors for divisions that can be narrowed without introducing control flow . This gives us a 32 bit multiply instead of an emulated 64 bit multiply in the generated PTX assembly. Reviewers: jlebar Subscribers: jholewinski, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38265 llvm-svn: 314253
* [BypassSlowDivision] Do not bypass division of hash-like valuesNikolai Bozhenov2017-04-021-0/+121
| | | | | | | | | | | | | | | | | Disable bypassing if one of the operands looks like a hash value. Slow division often occurs in hashtable implementations and fast division is never taken there because a hash value is extremely unlikely to have enough upper bits set to zero. A value is considered to be hash-like if it is produced by 1) XOR operation 2) Multiplication by a constant wider than the shorter type 3) PHI node with all incoming values being hash-like Differential Revision: https://reviews.llvm.org/D28200 llvm-svn: 299329
* [BypassSlowDivision] Use ValueTracking to simplify run-time checksNikolai Bozhenov2017-03-021-0/+95
| | | | | | | | | | | | | | | | | | | | | | | | ValueTracking is used for more thorough analysis of operands. Based on the analysis, either run-time checks can be simplified (e.g. check only one operand instead of two) or the transformation can be avoided. For example, it is quite often the case that a divisor is promoted from a shorter type and run-time checks for it are redundant. With additional compile-time analysis of values, two special cases naturally arise and are addressed by the patch: 1) Both operands are known to be short enough. Then, the long division can be simply replaced with a short one without CFG modification. 2) If a division is unsigned and the dividend is known to be short then the long division is not needed at all. Because if the divisor is too big for short division then the quotient is obviously zero (and the remainder is equal to the dividend). Actually, the division is not needed when (divisor > dividend). Differential Revision: https://reviews.llvm.org/D29897 llvm-svn: 296832
* [CodeGenPrepare] Don't sink non-cheap addrspacecasts.Justin Lebar2016-11-211-0/+21
| | | | | | | | | | | | | | | | | | | | Summary: Previously, CGP would unconditionally sink addrspacecast instructions, even going so far as to sink them into a loop. Now we check that the cast is "cheap", as defined by TLI. We introduce a new "is-cheap" function to TLI rather than using isNopAddrSpaceCast because some GPU platforms want the ability to ask for non-nop casts to be sunk. Reviewers: arsenm, tra Subscribers: jholewinski, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26923 llvm-svn: 287591
* [BypassSlowDivision] Handle division by constant numerators better.Justin Lebar2016-11-161-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: We don't do BypassSlowDivision when the denominator is a constant, but we do do it when the numerator is a constant. This patch makes two related changes to BypassSlowDivision when the numerator is a constant: * If the numerator is too large to fit into the bypass width, don't bypass slow division (because we'll never run the smaller-width code). * If we bypass slow division where the numerator is a constant, don't OR together the numerator and denominator when determining whether both operands fit within the bypass width. We need to check only the denominator. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26699 llvm-svn: 287062
* Add missing lit.local.cfg to llvm/test/Transforms/CodeGenPrepare/NVPTX.Justin Lebar2016-10-281-0/+2
| | | | llvm-svn: 285464
* Don't leave unused divs/rems sitting around in BypassSlowDivision.Justin Lebar2016-10-281-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This "pass" eagerly creates div and rem instructions even when only one is needed -- it relies on a later pass (machine DCE?) to clean them up. This is problematic not just from a cleanliness perspective (this pass is running during CodeGenPrepare, so should leave the IR in a better state), but it also creates a problem for instruction selection. If we always have a div+rem, isel will always select a divrem instruction (if possible), even when a single div or rem would do. Specifically, in NVPTX, we want to compute rem from the output of div, if available. But if a div is not available, we want to leave the rem alone. This transformation is overeager if div is always available. Because this code runs as part of CodeGenPrepare, it's nontrivial to write a test for this change. But this will effectively be tested by a later patch which adds the aforementioned change to NVPTX isel. Reviewers: tra Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26088 llvm-svn: 285460
* Don't claim the udiv created in BypassSlowDivision is exact.Justin Lebar2016-10-281-0/+16
Summary: In BypassSlowDivision's short-dividend path, we would create e.g. udiv exact i32 %a, %b "exact" here means that we are asserting that %a is a multiple of %b. But we have no reason to believe this must be true -- this is just a bug, as far as I can tell. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D26097 llvm-svn: 285459
OpenPOWER on IntegriCloud