| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Variable's stack location can stretch longer than it should. If a
variable is placed at the stack in a some nested basic block its range
can be calculated to be up to the next occurrence of the variable's
DBG_VALUE, or up to the end of the function, thus covering a basic
blocks that should not be included in the variable’s location range.
This happens because the DbgEntityHistoryCalculator ends register
locations at the end of a basic block only if the variable’s location
register has been changed throughout the function, which is not the
case for the register used to reference stack objects.
This patch also tries to produce a single value location if the location
list builder managed to merge all the locations into one.
Reviewers: aprantl, dstenb, jmorse
Reviewed By: aprantl, dstenb, jmorse
Subscribers: djtodoro, ivanbaev, asowda
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D61600
llvm-svn: 362923
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
several narrow stores
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D62897
llvm-svn: 362921
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: craig.topper, spatel, RKSimon, bkramer
Reviewed By: spatel
Subscribers: javed.absar, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63004
llvm-svn: 362912
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.
The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.
First step: https://reviews.llvm.org/D54839
Fixes llvm.org/PR38917
This is fixed recommit of r361356 after PowerPC64 multistage build failure.
llvm-svn: 362901
|
| |
|
|
| |
llvm-svn: 362897
|
| |
|
|
|
|
|
|
| |
combines. NFCI.
Same codegen, only differ by the oneuse limit for the sextload case.
llvm-svn: 362880
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.
Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).
Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).
Common code changes:
* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.
* VirtRegMap is passed as an argument to foldMemoryOperand().
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888
llvm-svn: 362868
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
component.
In order for GlobalISel to re-use the significant amount of analysis and
optimization code in SDAG's switch lowering, we first have to extract it and
create an interface to be used by both frameworks.
No test changes as it's NFC.
Differential Revision: https://reviews.llvm.org/D62745
llvm-svn: 362857
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Translate `llvm.assume`, `llvm.var.annotation` and `llvm.sideeffect` to nothing
as they have no effect on CodeGen.
Reviewers: qcolombet, aditya_nandakumar, dsanders, paquette, aemerson, arsenm
Reviewed By: arsenm
Subscribers: hiraditya, wdng, rovka, kristof.beyls, javed.absar, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63022
llvm-svn: 362834
|
| |
|
|
| |
llvm-svn: 362825
|
| |
|
|
|
|
| |
NFCI.
llvm-svn: 362820
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we call checkResourceLimit in bumpCycle or bumpNode, and we
know the resource count has just reached the limit (the equations
are equal). We should return true to mark that we are resource
limited for next schedule, or else we might continue to schedule
in favor of latency for 1 more schedule and create a schedule that
actually overbook the resource.
When we call checkResourceLimit to estimate the resource limite before
scheduling, we don't need to return true even if the equations are
equal, as it shouldn't limit the schedule for it .
Differential Revision: https://reviews.llvm.org/D62345
llvm-svn: 362805
|
| |
|
|
|
|
|
|
|
|
|
| |
This could fail, which looked concerning. However nothing was actually
using the results of this. I assume this was intended to use the
anti-feature of analyzeBranch of removing instructions, but wasn't
actually calling it with AllowModify = true.
Fixes bug 42162.
llvm-svn: 362800
|
| |
|
|
|
|
| |
Removed unused (in non-debug builds) variable.
llvm-svn: 362775
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
Takes as a single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
Takes the maximum number of elements processed in an iteration of
the loop body and subtracts this from the total count. Returns
false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
Takes the number of elements remaining to be processed as well as
the maximum numbe of elements processed in an iteration of the loop
body. Returns the updated number of elements remaining.
llvm-svn: 362774
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Incorrect Debug Variable Range was calculated while "COMPUTING LIVE DEBUG VARIABLES" stage.
Range for Debug Variable("i") computed according to current state of instructions
inside of basic block. But Register Allocator creates new instructions which were not taken
into account when Live Debug Variables computed. In the result DBG_VALUE instruction for
the "i" variable was put after these newly inserted instructions. This is incorrect.
Debug Value for the loop counter should be inserted before any loop instruction.
Differential Revision: https://reviews.llvm.org/D62650
llvm-svn: 362750
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
* A function descriptor (Name)
* A function entry point (.Name)
The descriptor structure on AIX is the same as those in the ELF V1 ABI:
* The address of the entry point of the function.
* The TOC base address for the function.
* The environment pointer.
The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".
Which symbol gets referenced depends on the context:
* Taking the address of the function references the descriptor symbol.
* Calling the function references the entry point symbol.
(2) Speaking of implementation on AIX, for direct function call target, we
create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to
replace original TargetGlobalAddress SDNode. Then down the path, we can
take advantage of this MCSymbol.
Patch by: Xiangling_L
Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara
Differential Revision: https://reviews.llvm.org/D62532
llvm-svn: 362735
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
handling (PR42123)
This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores:
1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another.
2 - The merged load\store node must retain the non-temporal flag.
Differential Revision: https://reviews.llvm.org/D62910
llvm-svn: 362723
|
| |
|
|
|
|
| |
Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s
llvm-svn: 362695
|
| |
|
|
|
|
|
|
| |
Select G_FFLOOR and G_FCEIL for MIPS32.
Differential Revision: https://reviews.llvm.org/D62901
llvm-svn: 362688
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
|
| |
|
|
|
|
|
|
|
|
|
| |
Most parts of LLVM don't care whether the byval type is derived from an
explicit Attribute or from the parameter's pointee type, so it makes
sense for the main access function to just return the right value.
The very few users who do care (only BitcodeReader so far) can find out
how it's specified by accessing the Attribute directly.
llvm-svn: 362642
|
| |
|
|
| |
llvm-svn: 362622
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of passing around fast-math-flags as a parameter, we can set those
using an IRBuilder guard object. This is no-functional-change-intended.
The motivation is to eventually fix the vectorizers to use and set the
correct fast-math-flags for reductions. Examples of that not behaving as
expected are:
https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast')
https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0)
D61802 (should be able to reduce with IR-level FMF)
Differential Revision: https://reviews.llvm.org/D62272
llvm-svn: 362612
|
| |
|
|
|
|
| |
Will be used more in an upcoming patch.
llvm-svn: 362595
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An argument that is return by a function but bit-casted before can still
be annotated as "returned". Make sure we do not crash for this case.
Reviewers: sunfish, stephenwlin, niravd, arsenm
Subscribers: wdng, hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59917
llvm-svn: 362546
|
| |
|
|
|
|
|
| |
The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots.
Reverting it to bring the bots back to green.
llvm-svn: 362539
|
| |
|
|
|
|
|
|
|
|
| |
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial.
Fixes PR42118
Differential Revision: https://reviews.llvm.org/D62828
llvm-svn: 362533
|
| |
|
|
|
|
|
|
|
|
|
| |
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.
Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.
llvm-svn: 362507
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR41952)
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet.
As far as i can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.
This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/vMd3
Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma
Reviewed By: RKSimon
Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62774
llvm-svn: 362488
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
All changes except ARM look **great**.
https://rise4fun.com/Alive/R2M
The regression `test/CodeGen/ARM/addsubcarry-promotion.ll`
is recovered fully by D62392 + D62450.
Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma
Reviewed By: efriedma
Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62266
llvm-svn: 362487
|
| |
|
|
|
|
|
|
|
|
|
|
| |
As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits.
The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.
It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.
Differential Revision: https://reviews.llvm.org/D62777
llvm-svn: 362486
|
| |
|
|
|
|
|
|
| |
comments. NFCI.
Pre-commit requested for D62777.
llvm-svn: 362485
|
| |
|
|
|
|
|
|
| |
Follow up to D62807.
Differential Revision: https://reviews.llvm.org/D62811
llvm-svn: 362483
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
several narrow stores
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D61843
llvm-svn: 362472
|
| |
|
|
| |
llvm-svn: 362448
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.
Reviewers: qcolombet, spatel
Reviewed By: qcolombet
Subscribers: nemanjai, jsji
Differential Revision: https://reviews.llvm.org/D62552
llvm-svn: 362439
|
| |
|
|
|
|
|
|
|
|
| |
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.
Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.
llvm-svn: 362436
|
| |
|
|
|
|
|
|
| |
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction
Differential Revision: https://reviews.llvm.org/D62807
llvm-svn: 362397
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
newly deduced location
When LiveDebugValues deduces new variable's location from spill, restore or
register copy instruction it should close old variable's location. Otherwise
we can have multiple block output locations for same variable. That could lead
to inserting two DBG_VALUEs for same variable to the beginning of the successor
block which results to ignoring of first DBG_VALUE.
Reviewers: aprantl, jmorse, wolfgangp, dstenb
Reviewed By: aprantl
Subscribers: probinson, asowda, ivanbaev, petarj, djtodoro
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D62196
llvm-svn: 362373
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
visitTokenFactor.
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.
Reviewers: niravd, spatel, craig.topper, rupprecht
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D62633
llvm-svn: 362350
|
| |
|
|
|
|
| |
Similar to what was done for masked load and gather.
llvm-svn: 362342
|
| |
|
|
|
|
|
|
|
| |
mask. Fix bug in ScalarizeMaskedMemIntrin
Need to cast only to Constant instead of ConstantVector to allow
ConstantAggregateZero.
llvm-svn: 362341
|
| |
|
|
|
|
| |
Similar to what was recently done for gathers in r362015.
llvm-svn: 362337
|
| |
|
|
|
|
|
|
|
|
| |
bitcast(insert_subvector(x,y),c2)
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.
Differential Revision: https://reviews.llvm.org/D59188
llvm-svn: 362324
|
| |
|
|
|
|
|
|
|
|
|
|
| |
undefs + truncation (PR41020)
Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot.
PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe.
Differential Revision: https://reviews.llvm.org/D62783
llvm-svn: 362323
|
| |
|
|
|
|
|
|
| |
in analysis.
These might have been replaced in multiple use cases.
llvm-svn: 362322
|
| |
|
|
|
|
|
|
| |
SimplifyDemandedBits. NFCI.
Helps with debugging as we recurse between them.
llvm-svn: 362321
|
| |
|
|
|
|
|
|
|
|
| |
The results of the dyn_casts were immediately dereferenced on the next line
so they had better not be null.
I don't think there's any way for these dyn_casts to fail, so use a cast
of adding null check.
llvm-svn: 362315
|
| |
|
|
|
|
|
|
|
|
| |
MachineInstr::print
Over a year ago, MachineInstr gained a fourth boolean parameter that occurs
before the TII pointer. When this happened, several places started accidentally
passing TII into this boolean parameter instead of the TII parameter.
llvm-svn: 362312
|