summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/fma-combine.ll
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Fix -enable-var-scope violationsMatt Arsenault2017-11-121-3/+5
| | | | llvm-svn: 318004
* DAG: Fold fma (fneg x), K, y -> fma x, -K, yMatt Arsenault2017-10-271-0/+46
| | | | llvm-svn: 316753
* AMDGPU: Allow SIShrinkInstructions to work in non-SSAMatt Arsenault2017-07-101-14/+14
| | | | | | | | Immediates can be folded as long as the immediate is a vreg. Also undo commuting instructions if it didn't fold an immediate. llvm-svn: 307575
* [AMDGPU] Switch scalarize global loads ON by defaultAlexander Timofeev2017-07-041-3/+3
| | | | | | Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097
* Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"NAKAMURA Takumi2017-07-041-3/+3
| | | | | | | | | It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054
* [AMDGPU] Switch scalarize global loads ON by defaultAlexander Timofeev2017-07-031-3/+3
| | | | | | Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-211-26/+26
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* [DAGCombine] require UnsafeFPMath for re-association of additionNicolai Haehnle2017-01-311-7/+19
| | | | | | | | | | | | | | | | | | | Summary: The affected transforms all implicitly use associativity of addition, for which we usually require unsafe math to be enabled. The "Aggressive" flag is only meant to convey information about the performance of the fused ops relative to a fmul+fadd sequence. Fixes Bug 31626. Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD Subscribers: jholewinski, nemanjai, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D28675 llvm-svn: 293635
* [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by defaultNicolai Haehnle2016-12-021-18/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When X = 0 and Y = inf, the original code produces inf, but the transformed code produces nan. So this transform (and its relatives) should only be used when the no-infs-fp-math flag is explicitly enabled. Also disable the transform using fmad (intermediate rounding) when unsafe-math is not enabled, since it can reduce the precision of the result; consider this example with binary floating point numbers with two bits of mantissa: x = 1.01 y = 111 x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step) x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero) The example relies on rounding towards zero at least in the second step. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578 Reviewers: RKSimon, tstellarAMD, spatel, arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26602 llvm-svn: 288506
* AMDGPU/SI: Implement a custom MachineSchedStrategyTom Stellard2016-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: GCNSchedStrategy re-uses most of GenericScheduler, it's just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default, to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995
* AMDGPU: Add volatile to test loads and storesMatt Arsenault2016-04-121-59/+59
| | | | | | | | When the memory vectorizer is enabled, these tests break. These tests don't really care about the memory instructions, and it's easier to write check lines with the unmerged loads. llvm-svn: 266071
* AMDGPU: Remove some old intrinsic uses from testsMatt Arsenault2016-02-111-13/+13
| | | | llvm-svn: 260493
* [DAGCombiner] Improve FMA support for interpolation patternsSimon Pilgrim2015-09-211-0/+200
| | | | | | | | | | This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents. This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))) Differential Revision: http://reviews.llvm.org/D13003 llvm-svn: 248210
* R600 -> AMDGPU renameTom Stellard2015-06-131-0/+368
llvm-svn: 239657
OpenPOWER on IntegriCloud