| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
single/double adds
llvm-svn: 244403
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This commit implements the initial serialization of the machine operand target
flags. It extends the 'TargetInstrInfo' class to add two new methods that help
to provide text based serialization for the target flags.
This commit can serialize only the X86 target flags, and the target flags for
the other targets will be serialized in the follow-up commits.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 244185
|
| |
|
|
|
|
|
|
| |
Problem pointed out by Michael Hordijk."
I mistakenly committed the patch for D6629, and was trying to commit another. Reverting until it gets proper signoff.
llvm-svn: 244121
|
| |
|
|
|
|
| |
pointed out by Michael Hordijk.
llvm-svn: 244120
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).
Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.
This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
Differential Revision: http://reviews.llvm.org/D11734
llvm-svn: 243994
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the commentary for D11660, I wasn't sure if it was alright to create new
integer machine instructions without also creating the implicit EFLAGS operand.
From what I can see, the implicit operand is always created by the MachineInstrBuilder
based on the instruction type, so we don't have to do that explicitly. However, in
reviewing the debug output, I noticed that the operand was not marked as 'dead'.
The machine combiner should do that to preserve future optimization opportunities
that may be checking for that dead EFLAGS operand themselves.
Differential Revision: http://reviews.llvm.org/D11696
llvm-svn: 243990
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Add i16, i32, i64 imul machine instructions to the list of reassociation
candidates.
A new bit of logic is needed to handle integer instructions: they have an
implicit EFLAGS operand, so we have to make sure it's dead in order to do
any reassociation with integer ops.
Differential Revision: http://reviews.llvm.org/D11660
llvm-svn: 243756
|
| |
|
|
|
|
|
|
| |
instruction-type check; NFC
This makes it simpler to add instruction types that don't depend on fast-math.
llvm-svn: 243596
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy
when targeting 32-bit x86 and causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826
https://llvm.org/bugs/show_bug.cgi?id=22371#c25
The quick fix is to simply remove the scalar FP logical instructions from the load folding table
in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs,
fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP
logical instructions (because that's all x86 gives us anyway). That lets us do the load folding
legally.
Differential Revision: http://reviews.llvm.org/D11477
llvm-svn: 243361
|
| |
|
|
|
|
|
|
|
|
| |
Adds pushes to the folding tables.
This also required a fix to the TD definition, since the memory forms of
the push instructions did not have the right mayLoad/mayStore flags.
Differential Revision: http://reviews.llvm.org/D11340
llvm-svn: 243010
|
| |
|
|
|
|
|
|
|
|
| |
canFoldMemoryOperand is not actually used anywhere in the codebase - all existing users instead call foldMemoryOperand directly when they wish to fold and can correctly deduce what they need from the return value.
This patch removes the canFoldMemoryOperand base function and the target implementations; only x86 had a real (bit-rotted) implementation, although AMDGPU had a preparatory stub that had never needed to be completed.
Differential Revision: http://reviews.llvm.org/D11331
llvm-svn: 242638
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MOVSDto64rr and MOV64toSDrr are defined to convert between FR64 (%xmm)
<-> GR64 registers, not VR64 (%mm) <-> GR64. This is wrong.
I found this by inspection and could not find a suitable testcase for it
since (1) we don't handle MMX bitcasts in Peephole optimizer as to
generate COPYs that (2) could be expanded back to the appropriate x86
instruction in ExpandPostRA.
Switch to use the appropriate instructions: MMX_MOVD64from64rr and
MMX_MOVD64to64rr here.
llvm-svn: 242191
|
| |
|
|
|
|
| |
multiplies
llvm-svn: 241873
|
| |
|
|
| |
llvm-svn: 241871
|
| |
|
|
|
|
| |
multiplies
llvm-svn: 241752
|
| |
|
|
| |
llvm-svn: 241671
|
| |
|
|
| |
llvm-svn: 241592
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Extend the reassociation optimization of http://reviews.llvm.org/rL240361 (D10460)
to SSE scalar FP SP adds in addition to AVX scalar FP SP adds.
With the 'switch' in place, we can trivially add other opcodes and test cases in
future patches.
Differential Revision: http://reviews.llvm.org/D10975
llvm-svn: 241515
|
| |
|
|
|
|
| |
Apparently, the style needs to be agreed upon first.
llvm-svn: 240390
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions
Currently ( D10321, http://reviews.llvm.org/rL239486 ), we can use the machine combiner pass
to reassociate the following sequence to reduce the critical path:
A = ? op ?
B = A op X
C = B op Y
-->
A = ? op ?
B = X op Y
C = A op B
'op' is currently limited to x86 AVX scalar FP adds (with fast-math on), but in theory, it could
be any associative math/logic op (see TODO in code comment).
This patch generalizes the pattern match to ignore the instruction that defines 'A'. So instead of
a sequence of 3 adds, we now only need to find 2 dependent adds and decide if it's worth
reassociating them.
This generalization has a compile-time cost because we can now match more instruction sequences
and we rely more heavily on the machine combiner to discard sequences where reassociation doesn't
improve the critical path.
For example, in the new test case:
A = M div N
B = A add X
C = B add Y
We'll match 2 reassociation patterns, but this transform doesn't reduce the critical path:
A = M div N
B = A add Y
C = B add X
We need the combiner to reject that pattern but select this:
A = M div N
B = X add Y
C = B add A
Differential Revision: http://reviews.llvm.org/D10460
llvm-svn: 240361
|
| |
|
|
| |
llvm-svn: 240342
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The _Int instructions are special, in that they operate on the full
VR128 instead of FR32. The load folding then looks at MOVSS, at the
user, and bails out when it sees a size mismatch.
What we really know is that the rm_Int instructions don't load the
higher lanes, so folding is fine.
This happens for the straightforward intrinsic code, e.g.:
_mm_add_ss(a, _mm_load_ss(p));
Fixes PR23349.
Differential Revision: http://reviews.llvm.org/D10554
llvm-svn: 240326
|
| |
|
|
|
|
|
| |
This was suggested as part of D10460, but it's independent of
any functional change.
llvm-svn: 240192
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
NFC: no one uses AnalyzeBranchPredicate yet.
Add TargetInstrInfo::AnalyzeBranchPredicate and implement for x86. A
later change adding support for page-fault based implicit null checks
depends on this.
Reviewers: reames, ab, atrick
Reviewed By: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10200
llvm-svn: 239742
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
TargetInstrInfo::getLdStBaseRegImmOfs to
TargetInstrInfo::getMemOpBaseRegImmOfs and implement for x86. The
implementation only handles a few easy cases now and will be made more
sophisticated in the future.
This is NFCI: the only user of `getLdStBaseRegImmOfs` (now
`getmemOpBaseRegImmOfs`) is `LoadClusterMotion` and `LoadClusterMotion`
is disabled for x86.
Reviewers: reames, ab, MatzeB, atrick
Reviewed By: MatzeB, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10199
llvm-svn: 239741
|
| |
|
|
|
|
|
|
|
| |
This will use Itinieraries if available, but will also work if just a
MCSchedModel is available.
Differential Revision: http://reviews.llvm.org/D10428
llvm-svn: 239658
|
| |
|
|
| |
llvm-svn: 239553
|
| |
|
|
| |
llvm-svn: 239497
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MachineCombiner pass
This is a reimplementation of D9780 at the machine instruction level rather than the DAG.
Use the MachineCombiner pass to reassociate scalar single-precision AVX additions (just a
starting point; see the TODO comments) to increase ILP when it's safe to do so.
The code is closely based on the existing MachineCombiner optimization that is implemented
for AArch64.
This patch should not cause the kind of spilling tragedy that led to the reversion of r236031.
Differential Revision: http://reviews.llvm.org/D10321
llvm-svn: 239486
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This was a longstanding FIXME and is a necessary precursor to cases
where foldOperandImpl may have to create more than one instruction
(e.g. to constrain a register class). This is the split out NFC changes from
D6262.
Reviewers: pete, ributzka, uweigand, mcrosier
Reviewed By: mcrosier
Subscribers: mcrosier, ted, llvm-commits
Differential Revision: http://reviews.llvm.org/D10174
llvm-svn: 239336
|
| |
|
|
|
|
|
|
|
| |
Implemented DAG lowering for all these forms.
Added tests for DAG lowering and encoding.
Differential Revision: http://reviews.llvm.org/D10310
llvm-svn: 239300
|
| |
|
|
| |
llvm-svn: 239257
|
| |
|
|
| |
llvm-svn: 237726
|
| |
|
|
|
|
| |
MCOperand::Create*() methods renamed to MCOperand::create*().
llvm-svn: 237275
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Finish the job that was abandoned in D6958 following the refactoring in
http://reviews.llvm.org/rL230221:
1. Uncomment the intrinsic def for the AVX r_Int instruction.
2. Add missing r_Int entries to the load folding tables; there are already
tests that check these in "test/Codegen/X86/fold-load-unops.ll", so I
haven't added any more in this patch.
3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ).
So instead of this:
movaps %xmm0, %xmm1
rcpss %xmm1, %xmm1
movss %xmm1, %xmm0
We should now get:
rcpss %xmm0, %xmm0
And instead of this:
vsqrtss %xmm0, %xmm0, %xmm1
vblendps $1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3]
We should now get:
vsqrtss %xmm0, %xmm0, %xmm0
Differential Revision: http://reviews.llvm.org/D9504
llvm-svn: 236740
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
We don't need codegen-only intrinsic instructions for the vector forms of these instructions.
This makes the reciprocal estimate instruction lowering identical to how we handle normal
square roots: (V)SQRTPS / (V)SQRTPD.
No existing regression tests fail with this patch.
Differential Revision: http://reviews.llvm.org/D9301
llvm-svn: 236013
|
| |
|
|
|
|
|
|
|
|
| |
This is the AVX extension of r235014:
http://llvm.org/viewvc/llvm-project?view=revision&revision=235014
Review:
http://reviews.llvm.org/D8691
llvm-svn: 235210
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This is a 1-line patch (with a TODO for AVX because that will affect
even more regression tests) that lets us substitute the appropriate
64-bit store for the float/double/int domains.
It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and
0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice.
Differential Revision: http://reviews.llvm.org/D8691
llvm-svn: 235014
|
| |
|
|
| |
llvm-svn: 234013
|
| |
|
|
| |
llvm-svn: 234008
|
| |
|
|
|
|
|
| |
classes. Use a Triple instead and simplify a lot of the querying
logic to use lookups on the Triple.
llvm-svn: 232071
|
| |
|
|
|
|
| |
No functionality change intended.
llvm-svn: 230849
|
| |
|
|
| |
llvm-svn: 230846
|
| |
|
|
|
|
|
|
|
| |
All of the cases were just appending from random access iterators to a
vector. Using insert/append can grow the vector to the perfect size
directly and moves the growing out of the loop. No intended functionalty
change.
llvm-svn: 230845
|
| |
|
|
|
|
|
|
|
|
| |
Reapply r230248.
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.
llvm-svn: 230499
|
| |
|
|
|
|
| |
This reverts commit r230226 since it breaks win buildbots.
llvm-svn: 230248
|
| |
|
|
|
|
|
|
| |
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.
llvm-svn: 230226
|
| |
|
|
|
|
| |
Suggestion by Simon Pilgrim
llvm-svn: 229574
|
| |
|
|
|
|
|
| |
This is paraphrased from Simon Pilgrim's comment in:
http://reviews.llvm.org/D7492
llvm-svn: 229566
|