| Commit message (Collapse) | Author | Age | Files | Lines |
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in https://reviews.llvm.org/D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332648
|
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in https://reviews.llvm.org/D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332640
|
llvm-svn: 332616
|
For integer ALU instructions taking eflags as an input (ADC/SBB/ADCX/ADOX)
llvm-svn: 332605
|
llvm-svn: 332540
|
As suggested by Fabian on PR37441, use PSHUFLW to extend shift amount types for use with PSRAD/PSRLD to reduce register pressure.
Some of this would ideally be done by combineTargetShuffle, but it's tricky to do as most of the shuffles share inputs.
Differential Revision: https://reviews.llvm.org/D46959
llvm-svn: 332524
|
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we make those fixes.
llvm-svn: 332501
|
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we make those fixes.
llvm-svn: 332500
|
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we make those fixes.
llvm-svn: 332499
|
32-bit targets
As i64 types are not legal on 32-bit targets, insert these into a suitable zero vector and use the packed vXi64<->FP conversion instructions instead.
Fixes PR3163.
Differential Revision: https://reviews.llvm.org/D43441
llvm-svn: 332498
|
llvm-svn: 332486
|
llvm-svn: 332484
|
We weren't correctly splatting the offset shift
llvm-svn: 332435
|
llvm-svn: 332410
|
specially handle SETB_C* pseudo instructions.
Summary:
While the logic here is somewhat similar to the arithmetic lowering, it
is different enough that it made sense to have its own function.
I actually tried a bunch of different optimizations here and none worked
well so I gave up and just always do the arithmetic based lowering.
Looking at code from the PR test case, we actually pessimize a bunch of
code when generating these. Because SETB_C* pseudo instructions clobber
EFLAGS, we end up creating a bunch of copies of EFLAGS to feed multiple
SETB_C* pseudos from a single set of EFLAGS. This in turn causes the
lowering code to ruin all the clever code generation that SETB_C* was
hoping to achieve. None of this is needed. Whenever we're generating
multiple SETB_C* instructions from a single set of EFLAGS we should
instead generate a single maximally wide one and extract subregs for all
the different desired widths. That would result in substantially better
code generation. But this patch doesn't attempt to address that.
The test case from the PR is included as well as more directed testing
of the specific lowering pattern used for these pseudos.
Reviewers: craig.topper
Subscribers: sanjoy, mcrosier, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D46799
llvm-svn: 332389
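The "single maximally wide one and extract subregs" idea in r332389 above can be illustrated with a small model. This is a sketch only, assuming SETB_C* materializes all ones when the carry flag is set and zero otherwise (as `sbb reg, reg` does); the helper names `setb_c` and `extract_low` are hypothetical and are not LLVM APIs.

```python
def setb_c(carry: bool, width: int) -> int:
    """Model of a SETB_C pseudo: all ones in `width` bits if carry is set, else 0."""
    return (1 << width) - 1 if carry else 0


def extract_low(value: int, width: int) -> int:
    """Model of reading a narrower subregister: keep only the low `width` bits."""
    return value & ((1 << width) - 1)


# One 64-bit materialization serves every narrower width, so the flags
# only need to be consumed once.
for carry in (False, True):
    wide = setb_c(carry, 64)
    for width in (8, 16, 32):
        assert extract_low(wide, width) == setb_c(carry, width)
```

Because every narrower result is just the low bits of the widest one, a single flag-consuming instruction would be enough in this model, which is why copying EFLAGS for each pseudo is unnecessary.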
|
BtVer2 - Fixes schedules for (V)CVTPS2PD instructions
A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up, but I wanted to get the classes in first.
llvm-svn: 332376
|
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups.
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.
It should provide a superset of the functionality from D46563 - the extra tests show propagation and
codegen diffs for fcmp, vecreduce, and FP libcalls.
The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently,
so that shouldn't make any difference.
Differential Revision: https://reviews.llvm.org/D46854
llvm-svn: 332358
|
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency
Allows us to remove the WriteCvtF2FSt conversion store class
llvm-svn: 332357
|
Summary:
Detection of new unsigned saturation downconvert patterns was implemented in
X86 codegen:
(truncate (smin (smax (x, C1), C2)) to dest_type),
where C1 >= 0 and C2 is unsigned max of destination type.
(truncate (smax (smin (x, C2), C1)) to dest_type)
where C1 >= 0, C2 is unsigned max of destination type and C1 <= C2.
These two patterns are equivalent to:
(truncate (umin (smax(x, C1), unsigned_max_of_dest_type)) to dest_type)
Reviewers: RKSimon
Subscribers: llvm-commits, a.elovikov
Differential Revision: https://reviews.llvm.org/D45315
llvm-svn: 332336
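The equivalence claimed in r332336 above can be checked on sample values. This is a minimal sketch, assuming a 32-bit source type and an 8-bit destination type; the widths, the sample values, and the function names `pattern1`, `pattern2`, and `canonical` are illustrative choices, not part of the patch.

```python
SRC_BITS = 32                      # assumed source width
DEST_BITS = 8                      # assumed destination width
UMAX_DEST = (1 << DEST_BITS) - 1   # C2: unsigned max of the destination type


def to_unsigned(value: int, bits: int = SRC_BITS) -> int:
    """Reinterpret a signed value as its unsigned two's complement bit pattern."""
    return value & ((1 << bits) - 1)


def umin(a: int, b: int) -> int:
    """Unsigned minimum over SRC_BITS-wide bit patterns."""
    return min(to_unsigned(a), to_unsigned(b))


def truncate(value: int, bits: int = DEST_BITS) -> int:
    """Keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)


def pattern1(x: int, c1: int) -> int:
    # (truncate (smin (smax (x, C1), C2)))
    return truncate(min(max(x, c1), UMAX_DEST))


def pattern2(x: int, c1: int) -> int:
    # (truncate (smax (smin (x, C2), C1))), valid when C1 <= C2
    return truncate(max(min(x, UMAX_DEST), c1))


def canonical(x: int, c1: int) -> int:
    # (truncate (umin (smax (x, C1), unsigned_max_of_dest_type)))
    return truncate(umin(max(x, c1), UMAX_DEST))


samples = [-(1 << 31), -7, -1, 0, 1, 100, 255, 256, 300, (1 << 31) - 1]
for c1 in (0, 3, 255):             # all satisfy 0 <= C1 <= C2
    for x in samples:
        assert pattern1(x, c1) == pattern2(x, c1) == canonical(x, c1)
```

The key step is that smax(x, C1) with C1 >= 0 is never negative, so the signed and unsigned minimum against C2 agree.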
|
match current clang codegen.
llvm-svn: 332326
|
intrinsics.
llvm-svn: 332271
|
MMX was missing and YMM was tagged as a fp nt store
llvm-svn: 332269
|
related intrinsics.
llvm-svn: 332214
|
uitofp+insertelement instead.
llvm-svn: 332206
|
instructions under AVX512.
This matches what we do for sint_to_fp.
llvm-svn: 332205
|
_mm_cvtu32_ss, and _mm_cvtu64_ss.
llvm-svn: 332204
|
llvm-svn: 332198
|
Noticed by Simon Pilgrim.
llvm-svn: 332197
|
instructions.
llvm-svn: 332189
|
llvm-svn: 332188
|
llvm-svn: 332187
|
clang has used for a very long time.
llvm-svn: 332186
|
with an older intrinsic and a select.
This is what clang already uses.
llvm-svn: 332170
|
used by clang.
llvm-svn: 332146
|
ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to
reflect a simplification to the load (ExtLoad).
Based on my reading, ExtendSetCCUses may create new nodes to extend a
constant attached to a SETCC. It also creates fresh SETCC nodes which
refer to any updated operands.
ISTM that the location applied to the new constant and SETCC nodes
should be the same as the location of the ExtLoad.
This was suggested by Adrian in https://reviews.llvm.org/D45995.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46216
llvm-svn: 332119
|
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.
The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46158
llvm-svn: 332118
|
We still need to handle mmx/xmm moves as 'decode-only' no-pipe instructions
llvm-svn: 332109
|
Summary: This recursive step can overflow the stack.
Reviewers: djokov, petarj
Subscribers: mcrosier, jlebar, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D46671
llvm-svn: 332101
|
Fixes a SNB issue that was missing vlddqu/vmovntdqa ymm instructions
llvm-svn: 332094
|
from r331958.
Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.
So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.
We may be able to drop some of the old patterns, but I leave that for a future patch.
llvm-svn: 332049
|
WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes
Split off XMM classes from the default (MMX) classes.
llvm-svn: 331999
|
With nnan, there's no need for the masked merge / blend
sequence (that probably costs much more than the min/max
instruction).
Somewhere between clang 5.0 and 6.0, we started producing
these intrinsics for fmax()/fmin() in C source instead of
libcalls or fcmp/select. The backend wasn't prepared for
that, so we regressed perf in those cases.
Note: it's possible that other targets have similar problems
as seen here.
Noticed while investigating PR37403 and related bugs:
https://bugs.llvm.org/show_bug.cgi?id=37403
The IR FMF propagation cases still don't work. There's
a proposal that might fix those cases in D46563.
llvm-svn: 331992
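Why nnan removes the need for the blend in r331992 above can be seen in a scalar model. This is a simplified sketch, assuming the x86 MAX instructions return their second operand unless the first is strictly greater (so any NaN input selects the second operand) and ignoring signed zeros; the operand order and the function names are illustrative assumptions, not the actual lowering code.

```python
import math


def x86_max(a: float, b: float) -> float:
    """Simplified per-lane model of x86 MAX: a if a > b, otherwise b.

    Any comparison with NaN is false, so a NaN input selects b.
    """
    return a if a > b else b


def fmaxnum_with_blend(x: float, y: float) -> float:
    """fmax() semantics (prefer the non-NaN operand) via max plus a blend."""
    t = x86_max(y, x)                   # yields x whenever either input is NaN
    return y if math.isnan(x) else t    # blend: fix up the case where x is NaN


def fmaxnum_nnan(x: float, y: float) -> float:
    """With nnan the inputs are assumed never to be NaN, so the blend is dead."""
    return x86_max(y, x)


nan = float("nan")
assert fmaxnum_with_blend(nan, 2.0) == 2.0
assert fmaxnum_with_blend(2.0, nan) == 2.0
for x, y in [(1.0, 3.0), (3.0, 1.0), (-2.5, -2.5)]:
    assert fmaxnum_with_blend(x, y) == fmaxnum_nnan(x, y) == max(x, y)
```

In this model the two versions differ only when the first input is NaN, which nnan rules out.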
|
llvm-svn: 331989
|
Clang 6.0 was updated to create these intrinsics rather than
libcalls or fcmp/select, but the backend wasn't prepared to
handle that optimally.
This bug is not the primary reason for PR37403:
https://bugs.llvm.org/show_bug.cgi?id=37403
...but it's probably more important for x86 perf.
llvm-svn: 331988
|
Summary:
The combine in rebuildSetCC may be combined into another
node, leaving our references stale. Keep a handle on
it to avoid stale references.
Fixes PR36602.
Reviewers: dbabokin, RKSimon, eli.friedman, davide
Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D46404
llvm-svn: 331985
|
Reviewers: craig.topper, RKSimon
Reviewed By: craig.topper, RKSimon
Differential Revision: https://reviews.llvm.org/D46539
llvm-svn: 331961
|
The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its
value as a (small) unsigned integer, so it is incorrect to widen it
in any way other than by zero-extending it.
G_SHL was using G_ANYEXT and G_ASHR was using G_SEXT (which is correct for
their destination and first source operands, but not for the "number of
bits to shift" operand).
Generally, shifts aren't as similar to regular binary operations as it
might seem: for instance, they are neither commutative nor associative, and
the second source operand usually requires special treatment.
Reviewers: bogner, javed.absar, aivchenk, rovka
Reviewed By: bogner
Subscribers: igorb, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46413
llvm-svn: 331926
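The problem fixed in r331926 above can be shown with plain integers. This is a minimal sketch, assuming a 3-bit shift amount widened to 8 bits; the widths and the helper names `zext`, `sext`, and `as_unsigned` are illustrative and are not GlobalISel APIs.

```python
def zext(value: int, from_bits: int) -> int:
    """Zero-extend: keep the low bits, upper bits become 0."""
    return value & ((1 << from_bits) - 1)


def sext(value: int, from_bits: int) -> int:
    """Sign-extend: replicate the top bit of the narrow value."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign


def as_unsigned(value: int, bits: int) -> int:
    """View a signed value as an unsigned bit pattern of the given width."""
    return value & ((1 << bits) - 1)


amount = 0b100                          # "shift by 4" held in a 3-bit type

good = zext(amount, 3)                  # still 4 after widening
bad = as_unsigned(sext(amount, 3), 8)   # top bit replicated: 0b11111100

assert good == 4
assert bad == 252
# An any-extend leaves the upper bits unspecified, so it can produce
# either of these values (or others) and is just as unsafe here.
```

Only the zero-extended amount still means "shift by 4"; the sign-extended one asks for a shift far larger than the widened type's bit width.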
|
call getBitcast if it's an fp->int or int->fp conversion even when before legalize ops.
Previously if !LegalOperations we would blindly call getBitcast and hope that getNode would constant fold it. But if the conversion is between a vector and a scalar, getNode has no simplification.
This means we would just get back the original N. We would then return that N which would make the caller of visitBITCAST think that we used CombineTo and did our own worklist management. This prevents target specific optimizations from being called for vector/scalar bitcasts until after legal operations.
llvm-svn: 331896
|
MOVNTPD/MOVNTPS should be WriteFStore
Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default; its cost gets increased but is still nowhere near the real cost of that nasty instruction.
llvm-svn: 331864
|
all zeros vXi1 vector.
llvm-svn: 331847
|