| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
| |
Some more refactoring, like registering the IT Block pass, less cryptic
variable names, and some simplification of loops.
Differential Revision: https://reviews.llvm.org/D63419
llvm-svn: 363666
|
| |
|
|
|
|
|
| |
Do not emit a copy if the source and destination registers are the same.
Review: Ulrich Weigand
llvm-svn: 363665
|
| |
|
|
|
|
|
|
| |
GCNScheduleDAGMILive.
Differential revision: https://reviews.llvm.org/D62401
llvm-svn: 363661
|
| |
|
|
|
|
|
|
|
|
| |
First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal.
In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep......
Differential Revision: https://reviews.llvm.org/D63326
llvm-svn: 363655
|
| |
|
|
|
|
|
|
| |
function. NFCI
Preliminary step for D59909
llvm-svn: 363645
|
| |
|
|
|
|
|
|
|
|
|
|
| |
instructions.
The isel patterns for these use a bitcast and load/store, but
DAG combine should have canonicalized those away.
For the purposes of the memory folding table these opcodes can be
replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes.
llvm-svn: 363644
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt.
Use the new versions in patterns that previously used a COPY_TO_REGCLASS
to VR128. These patterns expect the upper bits to be zero. The
current set up appears to work, but I'm not sure we should be
enforcing upper bits being zero through a COPY_TO_REGCLASS.
I wanted to flip the arrangement and use a COPY_TO_REGCLASS to
FR32/FR64 for the patterns that need an f32/f64 result, but that
complicated fastisel and globalisel.
I've been doing some experiments with reducing some isel patterns
and ended up in a situation where I had a
(SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our
post-isel peephole was unable to avoid using an instruction for
the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128
instruction removes the COPY_TO_REGCLASS that was breaking this.
llvm-svn: 363643
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
block.
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.
The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.
One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced attempt to prevent code size regressions when localizing these.
Overall, these changes improve CTMark code size on arm64 by 1.2%.
Full code size results:
Program baseline new diff
------------------------------------------------------------------------------
test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6%
test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6%
test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4%
test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2%
test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2%
test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9%
test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0%
test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0%
Geomean difference -1.2%
Differential Revision: https://reviews.llvm.org/D63303
llvm-svn: 363632
|
| |
|
|
|
|
|
|
| |
VMOVSSZmrk/VMOVSDZmrk.
Removes COPY_TO_REGCLASS from some patterns.
llvm-svn: 363630
|
| |
|
|
|
|
|
|
| |
types are allowed here. NFC
Make it clear that only integer type with i32 or smaller elements shoudl get to this part of the code.
llvm-svn: 363629
|
| |
|
|
|
|
|
|
|
| |
This is part of the approved D63204 pending parent revision.
This small change is in fact a part of the VOP2b legalization which
does not technically belong to wave32 support, so extracted
separately.
llvm-svn: 363625
|
| |
|
|
|
|
|
|
|
| |
AMDGPUPropagateAttributes will not work on function bitcatsts,
so move AMDGPUFixFunctionBitcasts before it.
Differential Revision: https://reviews.llvm.org/D63455
llvm-svn: 363614
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The purpose of the padding is to guard against stale code being
fetched into the instruction cache by the lowest level prefetching.
We're generating relocatable ELF here, and so the padding should
arguably be added by the linker. This is in fact what Mesa does.
This also fixes multi-part shaders for Mesa.
Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82
Reviewers: arsenm, rampitec, t-tye
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63427
llvm-svn: 363602
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Basically porting over the behaviour in AArch64ISelLowering to GISel. See
emitComparison for reference.
When we have something like this:
```
lhs = G_SUB 0, y
...
G_ICMP lhs, rhs
```
We can fold away the G_SUB and produce a cmn instead, given that we produce
the same value in NZCV.
Add a test showing that the transformation works, and also showing that we
don't perform the transformation when it's unsafe.
Also factor out the CSet emission into emitCSetForICMP.
Differential Revision: https://reviews.llvm.org/D63163
llvm-svn: 363596
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
register form requires 64-bit mode, but the memory form does not.
We don't know if its safe to unfold if we're in 32-bit mode.
This is simlar to what was done to some load opcodes in r363523.
I think its pretty unlikely we will try to unfold these anyway so
I don't think this is testable.
llvm-svn: 363595
|
| |
|
|
|
|
| |
If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64).
llvm-svn: 363592
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D63206
llvm-svn: 363588
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pass works in two modes:
Mode 1: Just set attributes starting from kernels. This can work at
the very beginning of opt and llc pipeline, but cannot clone functions
because it must be a function pass.
Mode 2: Actually clone functions for new attributes. This can only work
after all function passes in the opt pipeline because it has to be a
module pass.
Differential Revision: https://reviews.llvm.org/D63208
llvm-svn: 363586
|
| |
|
|
|
|
| |
If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it.
llvm-svn: 363582
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.
This adds two new functions:
bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;
to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.
This fixes https://llvm.org/PR40759
Differential Revision: https://reviews.llvm.org/D61764
llvm-svn: 363581
|
| |
|
|
|
|
|
|
| |
I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.
llvm-svn: 363579
|
| |
|
|
| |
llvm-svn: 363578
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D63207
llvm-svn: 363577
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60640
llvm-svn: 363576
|
| |
|
|
|
|
|
|
| |
This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment.
This is necessary for an upcoming patch to split under-aligned non-temporal vector loads.
llvm-svn: 363570
|
| |
|
|
|
|
|
|
|
|
| |
For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway.
First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets.
Differential Revision: https://reviews.llvm.org/D63246
llvm-svn: 363564
|
| |
|
|
|
|
|
| |
Even if the target doesn't have flat instructions, addrspace(0) is
still flat. It just happens to not work.
llvm-svn: 363561
|
| |
|
|
|
|
| |
Tests will be in future commits when new intrinsics are handled here.
llvm-svn: 363559
|
| |
|
|
|
|
|
| |
Use separate enums for each kind, avoid repeating overloads, and add
missing classof implementation.
llvm-svn: 363558
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The HardwareLoops pass finds exit blocks with a scevable exit count.
If the target specifies to update the loop counter in a register,
through a phi, we need to ensure that the exit block is a latch so
that we can insert the phi with the correct value for the incoming
edge.
Differential Revision: https://reviews.llvm.org/D63336
llvm-svn: 363556
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some GEPs were not being split, presumably because that split would just be
undone by the DAGCombiner. Not performing those splits can prevent important
optimizations, such as preventing the element indices / member offsets from
being (partially) folded into load/store instruction immediates. This patch:
- Makes the splits also occur in the cases where the base address and the GEP
are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.
Differential Revision: https://reviews.llvm.org/D60294
llvm-svn: 363544
|
| |
|
|
| |
llvm-svn: 363543
|
| |
|
|
|
|
| |
after D63265
llvm-svn: 363535
|
| |
|
|
| |
llvm-svn: 363534
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.
This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.
Reviewers: arsenm, thegameg, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60137
llvm-svn: 363533
|
| |
|
|
|
|
| |
Forgot to remove file!
llvm-svn: 363532
|
| |
|
|
|
|
| |
Forgot to add file!
llvm-svn: 363531
|
| |
|
|
|
|
|
|
|
| |
Create the ARMBasicBlockUtils class for tracking and querying basic
blocks sizes so we can use them when generating low-overhead loops.
Differential Revision: https://reviews.llvm.org/D63265
llvm-svn: 363530
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types. This is all handled by the target-independent layer. However,
this is not optimal when splitting or reforming the doubles, as it
pushes to the stack and loads from, on either side.
For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:
evstdd 5, X(1)
lwz 3, X(1)
lwz 4, X+4(1)
Likewise, to form a double into r5 from args in r3 and r4:
stw 3, X(1)
stw 4, X+4(1)
evldd 5, X(1)
This optimizes the fence to use SPE instructions. Now, to pass a double
to a function:
mr 4, 5
evmergehi 3, 5, 5
And to form a double into r5 from args in r3 and r4:
evmergelo 5, 3, 4
This is comparable to the way that gcc generates the double splits.
This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54583
llvm-svn: 363526
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
from uses the REX prefix, but the memory form does not.
It would not be safe to unfold the memory form the register form
without checking that we are compiling for 64-bit mode.
This probaby isn't a real functional issue since we are unlikely
to unfold any of these instructions since they don't have any
tied registers, aren't commutable, and don't have any inputs
other than the address.
llvm-svn: 363523
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We will use absolute relocations for LDS symbols.
Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61492
llvm-svn: 363517
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.
The generated assembly is now more explicit about the kind of relocation
that is to be used.
Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61491
llvm-svn: 363516
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539
Reviewers: rampitec, arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62486
Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0
llvm-svn: 363514
|
| |
|
|
|
|
|
|
|
| |
This is cpp source part of wave32 support, excluding overriden
getRegClass().
Differential Revision: https://reviews.llvm.org/D63351
llvm-svn: 363513
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is similar logic/motivation to the select splitting in D62969.
In D63233, the pattern changes so that we no longer have an extract_subvector of vselect,
but the operands of the select are still being concatenated.
The closest case is represented in either the first or last test diffs here - we have an
extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions.
I think that's the right trade-off for most AVX1 targets.
In the example based on PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx).
Differential Revision: https://reviews.llvm.org/D63364
llvm-svn: 363508
|
| |
|
|
|
|
|
|
| |
sources
Insert the shorter vector source into an undef vector of the longer vector source's type.
llvm-svn: 363507
|
| |
|
|
|
|
| |
rootsize. NFCI.
llvm-svn: 363501
|
| |
|
|
|
|
|
|
|
|
| |
shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles
Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well.
Fixes PR34380
llvm-svn: 363500
|
| |
|
|
|
|
| |
This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0)
llvm-svn: 363499
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
the loop will depend on the hotness check and other logic in alignBlocks.
The old code will only align hot loop to 32 bytes when the LoopSize larger than
16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop
to 32 bytes not only for the hot loop whose size is 16~32 bytes.
Reviewed By: steven.zhang, jsji
Differential Revision: https://reviews.llvm.org/D61228
llvm-svn: 363495
|