boolean flag to an enum: { Fast, Standard, Strict } (default = Standard).
This option controls the creation by optimizations of fused FP ops that store
intermediate results in higher precision than IEEE allows (e.g. FMAs). The
behavior of this option is intended to match the behavior specified by a
soon-to-be-introduced frontend flag: '-ffuse-fp-ops'.
Fast mode - allows formation of fused FP ops whenever they're profitable.
Standard mode - allows fusion only for 'blessed' FP ops. At present the only
blessed op is the fmuladd intrinsic. In the future more blessed ops may be
added.
Strict mode - allows fusion only when it can be proven that the excess
precision won't affect the result.
Note: This option only controls formation of fused ops by the optimizers. Fused
operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic)
will always be honored, regardless of the value of this option.
Internally, TargetOptions::AllowExcessFPPrecision has been replaced by
TargetOptions::AllowFPOpFusion.
llvm-svn: 158956
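As a standalone illustration of the excess precision involved (this snippet is
not part of the patch and only assumes C++11's std::fma), a fused multiply-add
keeps the product exact internally and can therefore produce a different
result than the separately rounded operations:

  #include <cmath>
  #include <cstdio>

  int main() {
    // a*b is exactly 1 - 2^-54, which rounds to 1.0 in double precision.
    double eps = 1.0 / (1 << 27); // 2^-27, exactly representable
    double a = 1.0 + eps, b = 1.0 - eps, c = -1.0;
    double separate = a * b + c;      // the multiply rounds first: 0.0
    double fused = std::fma(a, b, c); // keeps a*b exact: -2^-54
    printf("separate = %g, fused = %g\n", separate, fused);
    return 0;
  }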
|
llvm-svn: 158927
|
node is removed. Sorry, no test case; found it by inspection of the code.
llvm-svn: 158839
|
The test case for this will come with the PPC indexed preinc loads commit.
llvm-svn: 158822
|
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either the
AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
OR the
UnsafeFPMath option (-enable-unsafe-fp-math)
is set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).
If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.
llvm-svn: 158757
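To make this concrete, a target hook override might look like the sketch
below. 'FooTargetLowering' is a hypothetical out-of-tree target, the hook name
and signature are taken from the message above, and the snippet assumes the
usual LLVM target scaffolding rather than compiling standalone:

  // Advertise fast FMAs for f32/f64 so the new combines may form ISD::FMA
  // nodes of those types (patterns matching ISD::FMA to the target's FMA
  // instructions must be added separately).
  class FooTargetLowering : public TargetLowering {
  public:
    virtual bool isFMAFasterThanMulAndAdd(EVT VT) const {
      return VT == MVT::f32 || VT == MVT::f64;
    }
  };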
|
llvm-svn: 158467
|
wrote and the usual LLVM convention.
llvm-svn: 157708
|
operands of an FMA node.
llvm-svn: 157707
|
When a combine twiddles an extract_vector, care should be taken to preserve
the type of the index operand. No luck extracting a reasonable testcase,
unfortunately.
rdar://11391009
llvm-svn: 156419
|
llvm-svn: 156324
|
just like it now knows for FMULs.
llvm-svn: 156029
|
llvm-svn: 156023
|
llvm-svn: 155309
|
Instead of passing listener pointers to RAUW, let SelectionDAG itself
keep a linked list of interested listeners.
This makes it possible to have multiple listeners active at once, like
RAUWUpdateListener was already doing. It also makes it possible to
register listeners up the call stack without controlling all RAUW calls
below.
DAGUpdateListener uses an RAII pattern to add itself to the SelectionDAG
list of active listeners.
llvm-svn: 155248
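A sketch of the resulting usage pattern, assuming the post-patch interface
(listener type and hooks as described by this commit; 'DeletedNodeCounter' and
'replaceAndCount' are hypothetical names):

  // Counts SDNodes deleted while the listener is in scope.
  struct DeletedNodeCounter : public SelectionDAG::DAGUpdateListener {
    unsigned NumDeleted;
    explicit DeletedNodeCounter(SelectionDAG &DAG)
        : SelectionDAG::DAGUpdateListener(DAG), NumDeleted(0) {}
    virtual void NodeDeleted(SDNode *N, SDNode *E) { ++NumDeleted; }
  };

  void replaceAndCount(SelectionDAG &DAG, SDValue From, SDValue To) {
    DeletedNodeCounter Counter(DAG);  // constructor links it into the DAG list
    DAG.ReplaceAllUsesWith(From, To); // no listener argument threaded through
    // Counter's destructor unlinks it again when this scope exits.
  }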
|
llvm-svn: 154786
|
Fix a dagcombine optimization which assumes that the vsetcc result type is always
of the same size as the compared values. This is true for SSE/AVX/NEON but not
for all targets.
llvm-svn: 154490
|
multiplication by a denormal, and some tests checking that.
llvm-svn: 154431
|
llvm-svn: 154414
|
| |
always
of the same size as the compared values. This is true for SSE/AVX/NEON but not
for all targets.
llvm-svn: 154397
|
This fixes PR12516 and uncovers one weird problem in legalize (worked around).
llvm-svn: 154394
|
not fit in an i64.
llvm-svn: 154364
|
llvm-svn: 154322
|
some checks to allow better early out.
llvm-svn: 154309
|
llvm-svn: 154308
|
llvm-svn: 154307
|
happen.
llvm-svn: 154305
|
llvm-svn: 154297
|
when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly. There is already an IR level transform that does
that, and it does it more carefully.
llvm-svn: 154296
|
shuffle node because it could introduce new shuffle nodes that were not
supported efficiently by the target.
2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
second shuffle reverses the transformation of the first shuffle.
llvm-svn: 154266
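To illustrate what "the second shuffle reverses the transformation of the
first shuffle" means, here is a small standalone scalar model (illustration
only, not DAG code): a second shuffle applying the inverse of the first
shuffle's mask makes the pair an identity, so it can be folded away.

  #include <cstdio>

  int main() {
    int x[4] = {10, 20, 30, 40};
    int mask[4] = {2, 0, 3, 1}; // first shuffle: out[i] = in[mask[i]]
    int inv[4];                 // second shuffle: the inverse permutation
    for (int i = 0; i != 4; ++i)
      inv[mask[i]] = i;
    int tmp[4], out[4];
    for (int i = 0; i != 4; ++i)
      tmp[i] = x[mask[i]];
    for (int i = 0; i != 4; ++i)
      out[i] = tmp[inv[i]];
    for (int i = 0; i != 4; ++i)
      printf("%d ", out[i]);    // prints "10 20 30 40": the original vector
    return 0;
  }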
|
reciprocal if converting to the reciprocal is exact. Do it even if inexact
when -ffast-math is enabled. This substantially speeds up ac.f90 from the
Polyhedron benchmarks.
llvm-svn: 154265
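A standalone illustration of the exactness condition (not from the patch): 1/4
is exactly representable, so x/4 and x*0.25 are always bit-identical and the
rewrite is unconditionally safe, whereas 1/3 is inexact, so the rewritten form
can be off by one ulp and requires -ffast-math.

  #include <cstdio>

  int main() {
    const float QuarterRecip = 0.25f;     // exact reciprocal
    const float ThirdRecip = 1.0f / 3.0f; // inexact reciprocal
    for (float x = 1.0f; x <= 100.0f; x += 1.0f) {
      if (x / 4.0f != x * QuarterRecip)
        printf("exact case mismatch at %g (never happens)\n", x);
      if (x / 3.0f != x * ThirdRecip)
        printf("x = %g: x/3 = %.9g but x*(1/3) = %.9g\n",
               x, x / 3.0f, x * ThirdRecip);
    }
    return 0;
  }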
|
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
llvm-svn: 154011
|
operations, and prevent the DAGCombiner from turning them into bitwise operations if they do.
llvm-svn: 153901
|
shuffles.
Do not try to optimize swizzles of shuffles if the source shuffle has more than
a single user, except when the source shuffle is also a swizzle.
llvm-svn: 153864
|
1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
(and also scalar_to_vector).
2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))
3. Optimize swizzles of shuffles: shuff(shuff(x, y), undef) -> shuff(x, y).
4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.
Code which was previously compiled to this:
  movd (%rsi), %xmm0
  movdqa .LCPI0_0(%rip), %xmm2
  pshufb %xmm2, %xmm0
  movd (%rdi), %xmm1
  pshufb %xmm2, %xmm1
  pxor %xmm0, %xmm1
  pshufb .LCPI0_1(%rip), %xmm1
  movd %xmm1, (%rdi)
  ret
Now compiles to this:
  movl (%rsi), %eax
  xorl %eax, (%rdi)
  ret
llvm-svn: 153848
|
llvm-svn: 153513
|
users of the final load to the worklist too. Needed by changes I'm preparing to make to the X86 backend.
llvm-svn: 153078
|
llvm-svn: 153035
|
add the new node to the worklist because there is a potential for further optimizations.
llvm-svn: 152784
|
Transform:
(fsub x, (fadd x, y)) -> (fneg y) and
(fsub x, (fadd y, x)) -> (fneg y)
if 'unsafe math' is specified.
<rdar://problem/7540295>
llvm-svn: 152777
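A standalone demonstration of why this folding is gated on 'unsafe math'
(illustration only, not from the patch): the intermediate fadd rounds, so the
folded and unfolded forms can disagree.

  #include <cstdio>

  int main() {
    float x = 1.0e8f, y = 1.0f;
    float unfolded = x - (x + y); // x + y rounds back to 1.0e8f, so this is 0
    float folded = -y;            // the transformed form: exactly -1
    printf("unfolded = %g, folded = %g\n", unfolded, folded);
    return 0;
  }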
|
that would trigger the truncation case.
llvm-svn: 152678
|
(i16 load $addr+c*sizeof(i16)) and replace uses of (i32 vextract) with the
i16 load. It should issue an extload instead: (i32 extload $addr+c*sizeof(i16)).
rdar://11035895
llvm-svn: 152675
|
llvm-svn: 152454
|
performance regression (due to increased register pressure from overly aggressive pre-inc formation).
llvm-svn: 152162
|
providing a default expansion (FADD+FNEG), and teaching DAGCombine not to form FSUBs post-legalize if they are not legal.
llvm-svn: 152079
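The expansion is exact because IEEE negation only flips the sign bit, so
FADD(a, FNEG(b)) rounds identically to FSUB(a, b); a quick standalone spot
check (illustration only; NaNs are excluded since they compare unequal to
themselves):

  #include <cstdio>

  int main() {
    float vals[] = {1.5f, -0.0f, 1e-40f, 3.0e38f, -2.75f};
    for (float a : vals)
      for (float b : vals)
        if (a - b != a + (-b))
          printf("mismatch for a = %g, b = %g\n", a, b); // never triggers
    return 0;
  }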
|
converted to zeroexts.
llvm-svn: 150957
|
llvm-svn: 150670
|
N) for all operations. This fixes a horrible worst case with lots of nodes where 99% of the time was being spent in std::remove.
llvm-svn: 150669
|
generate a shuffle node from two vectors of different types.
llvm-svn: 150383
|
v8i8 -> v8i32 on AVX machines. The codegen often scalarizes ANY_EXTEND nodes.
The DAGCombiner has two optimizations that can mitigate the problem. First,
if all of the operands of a BUILD_VECTOR node are extracted from ZEXT/ANYEXT
nodes, then it is possible to create a new simplified BUILD_VECTOR which uses
UNDEFS/ZERO values to eliminate the scalar ZEXT/ANYEXT nodes.
Second, another dag combine optimization lowers BUILD_VECTOR into a shuffle
vector instruction.
In the case of zext v8i8->v8i32 on AVX, a value in an XMM register is to be
shuffled into a wide YMM register.
This patch modifies the second optimization and allows the creation of
shuffle vectors even when the newly generated vector and the original vector
from which we extract the values are of different types.
llvm-svn: 150340
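For reference, a function of the kind this combine targets, written with the
GCC/Clang vector extensions (hypothetical example, not from the patch);
compiled with AVX enabled it performs the zext v8i8 -> v8i32 widening
discussed above:

  // Each i8 lane is zero-extended into the corresponding i32 lane.
  typedef unsigned char V8x8 __attribute__((vector_size(8)));
  typedef unsigned int V8x32 __attribute__((vector_size(32)));

  V8x32 zext8to32(V8x8 In) {
    V8x32 Out;
    for (int i = 0; i != 8; ++i)
      Out[i] = In[i];
    return Out;
  }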
|
llvm-svn: 149823