| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to rL362909:
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.
I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fsub because they have the
same operand.
llvm-svn: 362943
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Arm v8.1-M supports the VMOV instructions that move a half-precision
value to and from a GPR, but not if the GPR is SP or PC.
To fix this, I've changed those instructions to use the rGPR register
class instead of GPR. rGPR always excludes PC, and it excludes SP
except in the presence of the HasV8Ops target feature (i.e. Arm v8-A).
So the effect is that VMOV.F16 to and from PC is now illegal
everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A
cores (which I believe is all as it should be).
Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard
Reviewed By: ostannard
Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60704
llvm-svn: 362942
|
| |
|
|
|
|
|
|
| |
This is to support the unary FNeg instruction.
Differential Revision: https://reviews.llvm.org/D62881
llvm-svn: 362941
|
| |
|
|
|
|
|
| |
PR42179:
https://bugs.llvm.org/show_bug.cgi?id=42179
llvm-svn: 362937
|
| |
|
|
| |
llvm-svn: 362933
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RawContentSection::Size optional
This is a follow-up for D62809.
Content and Size fields should be optional as was discussed in comments
of the D62809's thread. With that, we can describe a specific string table and
symbol table sections in a more correct way and also show appropriate errors.
The patch adds lots of test cases where the behavior is described in details.
Differential revision: https://reviews.llvm.org/D62957
llvm-svn: 362931
|
| |
|
|
|
|
|
|
|
|
|
| |
This option allows loops with small max trip counts to be fully unrolled. This
can help with code like the remainder loops from manually unrolled loops like
those that appear in the cmsis dsp library. We would apparently previously
runtime unroll them with the default unroll count (4).
Differential Revision: https://reviews.llvm.org/D63064
llvm-svn: 362928
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Variable's stack location can stretch longer than it should. If a
variable is placed at the stack in a some nested basic block its range
can be calculated to be up to the next occurrence of the variable's
DBG_VALUE, or up to the end of the function, thus covering a basic
blocks that should not be included in the variable’s location range.
This happens because the DbgEntityHistoryCalculator ends register
locations at the end of a basic block only if the variable’s location
register has been changed throughout the function, which is not the
case for the register used to reference stack objects.
This patch also tries to produce a single value location if the location
list builder managed to merge all the locations into one.
Reviewers: aprantl, dstenb, jmorse
Reviewed By: aprantl, dstenb, jmorse
Subscribers: djtodoro, ivanbaev, asowda
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D61600
llvm-svn: 362923
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
several narrow stores
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D62897
llvm-svn: 362921
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sign_extend for eq/ne if the input is truncated from a type with enough sign its.
Summary:
Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes.
This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix a cmp di, 32767 would get, but it avoids a length changing prefix.
Reviewers: RKSimon, spatel, xbolva00
Reviewed By: xbolva00
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63032
llvm-svn: 362920
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also due to a late optimization in preprocessiseldag we had to handle (fpextend (loadf32)) under optsize.
This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62710
llvm-svn: 362919
|
| |
|
|
|
|
|
|
|
|
|
| |
This is fix for https://bugs.llvm.org/show_bug.cgi?id=41336
Reviewers: jdoerfert
Reviewed by: jdoerfert
Differential Revision: https://reviews.llvm.org/D63045
llvm-svn: 362918
|
| |
|
|
| |
llvm-svn: 362915
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
scalar_to_vector/extract_vector_elts to reduce isel patterns.
Previously we did the equivalent operation in isel patterns with
COPY_TO_REGCLASS operations to transition. By inserting
scalar_to_vetors and extract_vector_elts before isel we can
allow each piece to be selected individually and accomplish the
same final result.
I ideally we'd use vector operations earlier in lowering/combine,
but that looks to be more difficult.
The scalar-fp-to-i64.ll changes are because we have a pattern for
using movlpd for store+extract_vector_elt. While an f64 store
uses movsd. The encoding sizes are the same.
llvm-svn: 362914
|
| |
|
|
|
|
|
| |
This reverts commit f4fc01f8dd3a5dfd2060d1ad0df6b90e8351ddf7.
It caused a 3-4x slowdown when doing thinlto links, PR42210.
llvm-svn: 362913
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: craig.topper, spatel, RKSimon, bkramer
Reviewed By: spatel
Subscribers: javed.absar, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63004
llvm-svn: 362912
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
A precondition 'x != 0' was forgotten by me:
https://rise4fun.com/Alive/JFNP
https://rise4fun.com/Alive/jHvL
These 4 folds with non-constants could be re-enabled,
but for now let's go for the simplest solution.
https://bugs.llvm.org/show_bug.cgi?id=42198
llvm-svn: 362911
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.
I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fneg (fsub) because they
have the same operand.
This works around the most glaring FMF logical inconsistency cited
in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
llvm-svn: 362909
|
| |
|
|
| |
llvm-svn: 362904
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is another step towards correcting our usage of fast-math-flags when applied on an fcmp.
In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of
the fcmp. But I'm leaving that clause in until we're more confident that we can stop
relying on fcmp's FMF.
By using the more general "isKnownNeverNaN()", we gain a simplification shown on the
tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN).
On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction
in addition to the FMF on the fcmp.
This is a continuation of D62979 / rL362879.
llvm-svn: 362903
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.
The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.
First step: https://reviews.llvm.org/D54839
Fixes llvm.org/PR38917
This is fixed recommit of r361356 after PowerPC64 multistage build failure.
llvm-svn: 362901
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pointers that are in-bounds (either through dereferenceable_or_null or
thorough a getelementptr inbounds) cannot be captured with a comparison
against null. There is no way to construct a pointer that is still in
bounds but also NULL.
This helps safe languages that insert null checks before load/store
instructions. Without this patch, almost all pointers would be
considered captured even for simple loads. With this patch, an icmp with
null will not be seen as escaping as long as certain conditions are met.
There was a lot of discussion about this patch. See the Phabricator
thread for detals.
Differential Revision: https://reviews.llvm.org/D60047
llvm-svn: 362900
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: llvm-commits, jbhateja
Reviewed By: jbhateja
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63055
llvm-svn: 362898
|
| |
|
|
| |
llvm-svn: 362897
|
| |
|
|
|
|
|
|
| |
A simple re-use of the immediate operand matcher and renderer functions.
rdar://43795178
llvm-svn: 362896
|
| |
|
|
|
|
|
| |
We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from
selecting the store and extractelt independently.
llvm-svn: 362895
|
| |
|
|
|
|
|
|
| |
X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns.
Similar was done for vectors in r362535. Removes about 1200 bytes from the isel table.
llvm-svn: 362894
|
| |
|
|
|
|
|
|
| |
combines. NFCI.
Same codegen, only differ by the oneuse limit for the sextload case.
llvm-svn: 362880
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is 1 step towards correcting our usage of fast-math-flags when applied on an fcmp.
In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of
the fcmp. But I'm leaving that clause in until we're more confident that we can stop
relying on fcmp's FMF.
By using the more general "isKnownNeverNaN()", we gain a simplification shown on the
tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN).
On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction
in addition to the FMF on the fcmp.
I'll update the 'ult' case below here as a follow-up assuming no problems here.
Differential Revision: https://reviews.llvm.org/D62979
llvm-svn: 362879
|
| |
|
|
|
|
|
|
|
|
|
| |
Types such as float and i64's do not have legal loads in Thumb1, but will still
be loaded with a LDR (or potentially multiple LDR's). As such we can treat the
cost of addressing mode calculations the same as an i32 and get some optimisation
benefits.
Differential Revision: https://reviews.llvm.org/D62968
llvm-svn: 362874
|
| |
|
|
|
|
|
|
|
|
| |
Now with MVE being added, we can add the vector addressing mode costs for it.
These are generally imm7 multiplied by the size of the type being loaded /
stored.
Differential Revision: https://reviews.llvm.org/D62967
llvm-svn: 362873
|
| |
|
|
|
|
|
|
|
|
| |
The fp16 version of VLDR takes a imm8 multiplied by 2. This updates the costs
to account for those, and adds extra testing. It is dependant upon hasFPRegs16
as this is what the load/store instructions require.
Differential Revision: https://reviews.llvm.org/D62966
llvm-svn: 362872
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are starting to add an entirely separate vector architecture to the ARM
backend. To do that we need at least some separation between the existing NEON
and the new MVE code. This patch just goes through the Neon patterns and
ensures that they are predicated on HasNEON, giving MVE a stable place to start
from.
No tests yet as this is largely an NFC, and we don't have the other target that
will treat any of these intructions as legal.
Differential Revision: https://reviews.llvm.org/D62945
llvm-svn: 362870
|
| |
|
|
| |
llvm-svn: 362869
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.
Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).
Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).
Common code changes:
* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.
* VirtRegMap is passed as an argument to foldMemoryOperand().
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888
llvm-svn: 362868
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
component.
In order for GlobalISel to re-use the significant amount of analysis and
optimization code in SDAG's switch lowering, we first have to extract it and
create an interface to be used by both frameworks.
No test changes as it's NFC.
Differential Revision: https://reviews.llvm.org/D62745
llvm-svn: 362857
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Move some code around, in preparation for later fixes
to the non-integral addrspace handling (D59661)
Patch By Jameson Nash <jameson@juliacomputing.com>
Reviewed By: reames, loladiro
Differential Revision: https://reviews.llvm.org/D59729
llvm-svn: 362853
|
| |
|
|
| |
llvm-svn: 362852
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
time.
Summary:
The cleanup in D62751 introduced a compile-time regression due to the way DT updates are performed.
Add all insert edges then all delete edges in DTU to match the previous compile time.
Compile time on the test provided by @mstorsjo before and after this patch on my machine:
113.046s vs 35.649s
Repro: clang -target x86_64-w64-mingw32 -c -O3 glew-preproc.c; on https://martin.st/temp/glew-preproc.c.
Reviewers: kuhar, NutshellySima, mstorsjo
Subscribers: jlebar, mstorsjo, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62981
llvm-svn: 362839
|
| |
|
|
| |
llvm-svn: 362837
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Translate `llvm.assume`, `llvm.var.annotation` and `llvm.sideeffect` to nothing
as they have no effect on CodeGen.
Reviewers: qcolombet, aditya_nandakumar, dsanders, paquette, aemerson, arsenm
Reviewed By: arsenm
Subscribers: hiraditya, wdng, rovka, kristof.beyls, javed.absar, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63022
llvm-svn: 362834
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Re-land r360675 after it was reverted in r360770.
This was reported in:
https://llvm.org/reports/scan-build/
Based on feedback in:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190513/652286.html
Reviewers: RKSimon, efriedma
Reviewed By: RKSimon, efriedma
Subscribers: eli.friedman, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62767
llvm-svn: 362833
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
rather than two callbacks.
The asynchronous lookup API (which the synchronous lookup API wraps for
convenience) used to take two callbacks: OnResolved (called once all requested
symbols had an address assigned) and OnReady to be called once all requested
symbols were safe to access). This patch updates the asynchronous lookup API to
take a single 'OnComplete' callback and a required state (SymbolState) to
determine when the callback should be made. This simplifies the common use case
(where the client is interested in a specific state) and will generalize neatly
as new states are introduced to track runtime initialization of symbols.
Clients who were making use of both callbacks in a single query will now need to
issue two queries (one for SymbolState::Resolved and another for
SymbolState::Ready). Synchronous lookup API clients who were explicitly passing
the WaitOnReady argument will now need neeed to pass a SymbolState instead (for
'WaitOnReady == true' use SymbolState::Ready, for 'WaitOnReady == false' use
SymbolState::Resolved). Synchronous lookup API clients who were using default
arugment values should see no change.
llvm-svn: 362832
|
| |
|
|
| |
llvm-svn: 362825
|
| |
|
|
|
|
| |
NFCI.
llvm-svn: 362820
|
| |
|
|
|
|
|
|
|
| |
AFAIK, this is only currently called by TTI, but it could be
used from instcombine or CGP to help solve problems like:
https://bugs.llvm.org/show_bug.cgi?id=37428
https://bugs.llvm.org/show_bug.cgi?id=42174
llvm-svn: 362810
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we call checkResourceLimit in bumpCycle or bumpNode, and we
know the resource count has just reached the limit (the equations
are equal). We should return true to mark that we are resource
limited for next schedule, or else we might continue to schedule
in favor of latency for 1 more schedule and create a schedule that
actually overbook the resource.
When we call checkResourceLimit to estimate the resource limite before
scheduling, we don't need to return true even if the equations are
equal, as it shouldn't limit the schedule for it .
Differential Revision: https://reviews.llvm.org/D62345
llvm-svn: 362805
|
| |
|
|
| |
llvm-svn: 362802
|
| |
|
|
|
|
|
|
|
|
|
| |
This could fail, which looked concerning. However nothing was actually
using the results of this. I assume this was intended to use the
anti-feature of analyzeBranch of removing instructions, but wasn't
actually calling it with AllowModify = true.
Fixes bug 42162.
llvm-svn: 362800
|
| |
|
|
| |
llvm-svn: 362799
|