path: root/llvm/test/CodeGen/X86/fp-fast.ll
Commit message | Author | Age | Files | Lines
* Make utils/update_llc_test_checks.py note that the assertions are autogenerated. | James Y Knight | 2015-11-23 | 1 | -0/+1
  Also update existing test cases which appear to be generated by it and weren't modified (other than addition of the header) by rerunning it.
  llvm-svn: 253917
* [x86] generalize reassociation optimization in machine combiner to 2 instructions | Sanjay Patel | 2015-06-23 | 1 | -78/+0
  Currently ( D10321, http://reviews.llvm.org/rL239486 ), we can use the machine combiner pass to reassociate the following sequence to reduce the critical path:

    A = ? op ?
    B = A op X
    C = B op Y
    -->
    A = ? op ?
    B = X op Y
    C = A op B

  'op' is currently limited to x86 AVX scalar FP adds (with fast-math on), but in theory, it could be any associative math/logic op (see TODO in code comment).

  This patch generalizes the pattern match to ignore the instruction that defines 'A'. So instead of a sequence of 3 adds, we now only need to find 2 dependent adds and decide if it's worth reassociating them.

  This generalization has a compile-time cost because we can now match more instruction sequences and we rely more heavily on the machine combiner to discard sequences where reassociation doesn't improve the critical path.

  For example, in the new test case:

    A = M div N
    B = A add X
    C = B add Y

  We'll match 2 reassociation patterns, but this transform doesn't reduce the critical path:

    A = M div N
    B = A add Y
    C = B add X

  We need the combiner to reject that pattern but select this:

    A = M div N
    B = X add Y
    C = B add A

  Differential Revision: http://reviews.llvm.org/D10460
  llvm-svn: 240361
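The critical-path argument in the commit above can be checked with a small model. This is only a sketch: the latencies are made-up numbers, `critical_path` is a hypothetical helper, and the real MachineCombiner uses the target's scheduling model rather than anything this simple.

```python
# Hypothetical latencies in cycles; real values depend on the target CPU.
LATENCY = {"div": 10, "add": 3}


def critical_path(instrs):
    """Return the cycle at which the last result becomes available.

    instrs is a list of (dest, op, src1, src2). Any source that is not
    defined by an earlier instruction is an input that is ready at cycle 0.
    """
    ready = {}
    for dest, op, a, b in instrs:
        start = max(ready.get(a, 0), ready.get(b, 0))
        ready[dest] = start + LATENCY[op]
    return max(ready.values())


# The three sequences from the commit message.
original = [("A", "div", "M", "N"), ("B", "add", "A", "X"), ("C", "add", "B", "Y")]
swapped = [("A", "div", "M", "N"), ("B", "add", "A", "Y"), ("C", "add", "B", "X")]
reassociated = [("A", "div", "M", "N"), ("B", "add", "X", "Y"), ("C", "add", "B", "A")]

for name, seq in [("original", original), ("swapped", swapped),
                  ("reassociated", reassociated)]:
    print(name, critical_path(seq))
```

Under these assumed latencies, the swapped form is no shorter than the original because both adds still wait on the divide, while the reassociated form lets `X add Y` run in parallel with the divide.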
* [x86] Add a reassociation optimization to increase ILP via the MachineCombiner pass | Sanjay Patel | 2015-06-10 | 1 | -0/+78
  This is a reimplementation of D9780 at the machine instruction level rather than the DAG.

  Use the MachineCombiner pass to reassociate scalar single-precision AVX additions (just a starting point; see the TODO comments) to increase ILP when it's safe to do so.

  The code is closely based on the existing MachineCombiner optimization that is implemented for AArch64.

  This patch should not cause the kind of spilling tragedy that led to the reversion of r236031.

  Differential Revision: http://reviews.llvm.org/D10321
  llvm-svn: 239486
* Semantically revert r236031, which is not a good idea for in-order targets. | Owen Anderson | 2015-04-30 | 1 | -42/+0
  At the least it should be guarded by some kind of target hook.

  It also introduced catastrophic compile time and code quality regressions on some out of tree targets (test case still being reduced/sanitized).

  Sanjay agreed with reverting this patch until these issues can be resolved.

  llvm-svn: 236199
* transform fadd chains to increase parallelism | Sanjay Patel | 2015-04-28 | 1 | -0/+43
  This is a compromise: with this simple patch, we should always handle a chain of exactly 3 operations optimally, but we're not generating the optimal balanced binary tree for a longer sequence.

  In general, this transform will reduce the dependency chain for a sequence of instructions using N operands from a worst case N-1 dependent operations to N/2 dependent operations. The optimal balanced binary tree would reduce the chain to log2(N).

  The trade-off for not dealing with longer sequences is:
  (1) we have less complexity in the compiler,
  (2) we avoid unknown compile-time blowup calculating a balanced tree, and
  (3) we don't need to worry about the increased register pressure required to parallelize longer sequences.

  It also seems unlikely that we would ever encounter really long strings of dependent ops like that in the wild, but I'm not sure how to verify that speculation.

  FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math and this patch.

  We can extend this patch to cover other associative operations such as fmul, fmax, fmin, integer add, integer mul.

  This is a partial fix for:
    https://llvm.org/bugs/show_bug.cgi?id=17305
  and if extended:
    https://llvm.org/bugs/show_bug.cgi?id=21768
    https://llvm.org/bugs/show_bug.cgi?id=23116
  The issue also came up in:
    http://reviews.llvm.org/D8941

  Differential Revision: http://reviews.llvm.org/D9232
  llvm-svn: 236031
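The N-1, N/2 and log2(N) figures quoted above can be reproduced by measuring the depth of three expression shapes. This is a sketch under an assumption: the `paired` shape below (pair adjacent operands, then chain the pair sums serially) is one reading of how the N/2 figure arises, not a transcription of the patch. All function names are hypothetical.

```python
def depth(tree):
    """Length of the longest chain of dependent operations in a tree."""
    if not isinstance(tree, tuple):
        return 0  # a leaf operand needs no operation
    return 1 + max(depth(tree[0]), depth(tree[1]))


def serial(items):
    """((a + b) + c) + d ...: every add depends on the previous one."""
    tree = items[0]
    for item in items[1:]:
        tree = (tree, item)
    return tree


def paired(operands):
    """Add adjacent operands in pairs, then chain the pair sums serially."""
    pairs = [
        (operands[i], operands[i + 1]) if i + 1 < len(operands) else operands[i]
        for i in range(0, len(operands), 2)
    ]
    return serial(pairs)


def balanced(operands):
    """Fully balanced binary tree of adds."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return (balanced(operands[:mid]), balanced(operands[mid:]))


ops = list("abcdefgh")  # N = 8 operands
print(depth(serial(ops)), depth(paired(ops)), depth(balanced(ops)))
```

For four operands (a chain of exactly 3 operations), the paired and balanced shapes have the same depth, which matches the claim that the short case is handled optimally.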
* use update_llc_test_checks.py to tighten checking; remove unnecessary CPU param | Sanjay Patel | 2015-04-23 | 1 | -54/+43
  llvm-svn: 235604
* Force CPU type to unbreak unit tests on Haswell machines. | Juergen Ributzka | 2013-11-30 | 1 | -1/+1
  llvm-svn: 195971
* Start using CHECK-LABEL in some tests. | Stephen Lin | 2013-07-12 | 1 | -11/+11
  llvm-svn: 186163
* SelectionDAG: Fix incorrect condition checks in some cases of folding FADD/FMUL combinations; also improve accuracy of comments | Stephen Lin | 2013-06-14 | 1 | -4/+73
  llvm-svn: 183993
* Test case hygiene. | Benjamin Kramer | 2013-03-09 | 1 | -1/+1
  llvm-svn: 176772
* test/CodeGen/X86/fp-fast.ll: Add +avx. | NAKAMURA Takumi | 2012-11-01 | 1 | -1/+1
  llvm-svn: 167207
* Add a few more simple fast-math constant propagations and cancellations. | Owen Anderson | 2012-11-01 | 1 | -0/+20
  llvm-svn: 167200
* llvm/test/CodeGen/X86/fp-fast.ll: Suppress FMA4 on AMD Bulldozer host, corresponding to r162999. | NAKAMURA Takumi | 2012-09-01 | 1 | -1/+1
  llvm-svn: 163041
* Try to make this test more generic to unbreak buildbots. | Owen Anderson | 2012-08-30 | 1 | -9/+9
  llvm-svn: 162958
* Teach the DAG combiner to turn chains of FADDs (x+x+x+x+...) into FMULs by constants. | Owen Anderson | 2012-08-30 | 1 | -0/+37
  This is only enabled in unsafe FP math mode, since it does not preserve rounding effects for all such constants.
  llvm-svn: 162956
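The rounding caveat in the commit above can be seen outside the compiler. The sketch below uses Python floats (IEEE double precision) rather than the types the test file exercises, and `chain_add` is a hypothetical helper. It adds x to itself one step at a time, rounding after every add, and compares that with a single multiply, which rounds once.

```python
def chain_add(x, n):
    """Compute x + x + ... + x (n terms) left to right, rounding each step."""
    acc = x
    for _ in range(n - 1):
        acc = acc + x
    return acc


x = 0.1

# Two and three terms agree with the multiply for this input.
print(chain_add(x, 2) == 2 * x)
print(chain_add(x, 3) == 3 * x)

# Ten terms accumulate rounding error that the single multiply does not.
print(chain_add(x, 10), 10 * x)
```

Because the two forms can differ in the last bit, rewriting the add chain as a multiply changes the observable result for some constants, which is consistent with the commit restricting the transform to unsafe FP math mode.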