| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
IntrusiveRefCntPtr"
Breaks Clang's use of bitcode. Reverting until I have a fix to go with
it there.
This reverts commit r291006.
llvm-svn: 291007
|
IntrusiveRefCntPtr
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).
llvm-svn: 291006
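The size trade-off described above can be illustrated with a minimal sketch. `RefCounted`, `IntrusivePtr`, and `Node` are hypothetical stand-ins written for this example, not LLVM's actual classes; the point is only that an intrusive handle is a single pointer because the count lives inside the object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for an intrusively ref-counted base class.
struct RefCounted {
  int RefCount = 0;
};

// Hypothetical minimal intrusive pointer: the handle holds one raw pointer
// and the reference count is stored in the pointee itself.
template <typename T> class IntrusivePtr {
  T *Obj = nullptr;
  void retain() { if (Obj) ++Obj->RefCount; }
  void release() { if (Obj && --Obj->RefCount == 0) delete Obj; }

public:
  IntrusivePtr() = default;
  explicit IntrusivePtr(T *P) : Obj(P) { retain(); }
  IntrusivePtr(const IntrusivePtr &O) : Obj(O.Obj) { retain(); }
  IntrusivePtr &operator=(IntrusivePtr O) { std::swap(Obj, O.Obj); return *this; }
  ~IntrusivePtr() { release(); }
  T *get() const { return Obj; }
};

struct Node : RefCounted {
  int Value = 0;
};

// Handle sizes, for comparison. The exact size of std::shared_ptr is
// implementation-defined; it is commonly two pointers.
constexpr std::size_t intrusiveHandleSize() { return sizeof(IntrusivePtr<Node>); }
constexpr std::size_t sharedHandleSize() { return sizeof(std::shared_ptr<Node>); }

// Copying a handle bumps the count stored in the object.
int refCountAfterCopy() {
  IntrusivePtr<Node> A(new Node);
  IntrusivePtr<Node> B = A;
  return A.get()->RefCount;
}
```

Note that the intrusive variant requires `Node` to derive from the counting base, which is the coupling between ownership and type that the message argues against.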
|
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:
1. When determining tail-call eligibility. If we make a tail call (i.e.
directly branch to a function) then there is no place for the linker to add
a TOC restoration.
2. When determining when we need to add a nop instruction after a call.
Likewise, if there is a possibility that the linker might need to add a
TOC restoration after a call, then we need to put a nop after the call
(the bl instruction).
First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.
Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).
There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.
Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.
Differential Revision: https://reviews.llvm.org/D27231
llvm-svn: 291003
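The idea of deciding both questions from one predicate can be sketched as follows. This is a simplified, conservative model and not the PowerPC backend's code: `FuncInfo` and all function names here are hypothetical, and real linkage, visibility, and section handling has more cases.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of a function for this sketch.
struct FuncInfo {
  std::string Object;  // object file that provides the definition
  std::string Section; // section the definition is placed in
  bool IsDefinition;   // false for a mere declaration
  bool IsWeak;         // weak: the linker may substitute a strong definition
  bool InComdat;       // COMDAT: the linker may keep a copy from elsewhere
};

// True only when caller and callee are known to use the same TOC base.
// Anything uncertain is treated as "not known" (conservative).
bool knownToShareTOC(const FuncInfo &Caller, const FuncInfo &Callee) {
  if (!Callee.IsDefinition || Callee.IsWeak)
    return false;
  if (Caller.InComdat || Callee.InComdat)
    return false;
  return Caller.Object == Callee.Object && Caller.Section == Callee.Section;
}

// The single underlying fact both decisions depend on.
bool mayNeedTOCRestore(const FuncInfo &Caller, const FuncInfo &Callee) {
  return !knownToShareTOC(Caller, Callee);
}

// (1) A tail call leaves no place for the linker to restore the TOC.
bool tocAllowsTailCall(const FuncInfo &Caller, const FuncInfo &Callee) {
  return !mayNeedTOCRestore(Caller, Callee);
}

// (2) A nop after the bl gives the linker a slot for the restoration.
bool needsNopAfterCall(const FuncInfo &Caller, const FuncInfo &Callee) {
  return mayNeedTOCRestore(Caller, Callee);
}
```

Because both decisions are derived from `mayNeedTOCRestore`, they cannot disagree, which is the first problem the message describes.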
|
we can pinpoint performance issues.
llvm-svn: 291002
|
Reported by David Binderman and ack'ed by Teresa on IRC.
PR: 31527
llvm-svn: 291000
|
instead. NFC.
llvm-svn: 290999
|
Fixes PR31529.
llvm-svn: 290998
|
Fixes PR31528
llvm-svn: 290995
|
v2: expose using amdgcn prefix
Differential Revision: https://reviews.llvm.org/D23511
llvm-svn: 290977
|
common inst"
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer). The
debugloc is now only set when the instruction is not a call, as this causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).
Original commit message:
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Original review: https://reviews.llvm.org/D27590
llvm-svn: 290973
|
Actual codegen is much better than the extract+insert patterns that were assumed.
llvm-svn: 290962
|
This CPU type was not previously recognized by LLVM which led to emitting
poor (and sometimes incorrect) code in some JIT workloads on such a machine.
llvm-svn: 290961
|
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode.
llvm-svn: 290956
|
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.
In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)
Reviewers: aprantl, MatzeB, mkuper
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D27688
llvm-svn: 290955
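The principle behind this fix (codegen decisions must not see debug values) can be sketched with a small model. `Instr` and `prevNonDebug` are hypothetical names for this example and do not reflect the MachineInstr API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction: an opcode plus a debug-value flag.
struct Instr {
  int Opcode;
  bool IsDebugValue;
};

// Returns the index of the nearest non-debug instruction strictly before
// Pos, or -1 if there is none. Debug values are skipped so that their
// presence cannot change which instruction is inspected.
int prevNonDebug(const std::vector<Instr> &Block, std::size_t Pos) {
  assert(Pos <= Block.size() && "position out of range");
  for (std::size_t I = Pos; I > 0; --I)
    if (!Block[I - 1].IsDebugValue)
      return static_cast<int>(I - 1);
  return -1;
}
```

With this helper, a block compiled with and without debug info yields the same "previous" opcode, so a merge decision based on it is identical in both builds.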
|
Summary:
The InlineSpiller was accessing the DominatorTreeBase directly
through the public data member DT in the MachineDominatorTree.
This is not a good idea as the "cached" information in
SplitCriticalEdges is not applied before the access.
The DominatorTreeBase must be accessed through the member
function getBase() in MachineDominatorTree.
The fault was introduced in r266162.
I think the public data member DT in the MachineDominatorTree
should have been made private in the original code (r215576)
that introduced the concept of lazily updating the
MachineDominatorTree information from
MachineBasicBlock::SplitCriticalEdge().
Patch by Karl-Johan Karlsson <karl-johan.karlsson@ericsson.com>
Reviewers: wmi, qcolombet
Subscribers: llvm-commits, bjope, uabelho
Differential Revision: https://reviews.llvm.org/D27983
llvm-svn: 290950
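Why a lazily updated structure should hide its raw data behind an accessor can be sketched as follows. `LazyTree`, `recordSplit`, and the edge representation are hypothetical; only the name `getBase()` is borrowed from the message, and this is not MachineDominatorTree's implementation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical lazily updated structure: changes are recorded cheaply and
// only applied when somebody asks for the underlying data.
class LazyTree {
  std::vector<std::pair<int, int>> Edges;   // information already applied
  std::vector<std::pair<int, int>> Pending; // recorded but not yet applied

  void applyPending() {
    Edges.insert(Edges.end(), Pending.begin(), Pending.end());
    Pending.clear();
  }

public:
  // Record a change without updating Edges immediately.
  void recordSplit(int From, int To) { Pending.emplace_back(From, To); }

  // The only way to reach the data: pending changes are flushed first,
  // so callers can never observe a stale view.
  const std::vector<std::pair<int, int>> &getBase() {
    applyPending();
    return Edges;
  }
};
```

Keeping `Edges` private makes the stale-read bug described above impossible to write, which matches the message's remark that the data member should have been private from the start.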
|
Reviewers: sdardis, vkalintiris
Subscribers: jaydeep, slthakur, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D27841
llvm-svn: 290949
|
INT_{U}COMIS{S|D} instructions
Replacing the memory operand in the intrinsic versions of the comis/ucomis instructions from f128mem to ssmem/sdmem accordingly.
Differential Revision: https://reviews.llvm.org/D28138
llvm-svn: 290948
|
In some cases it's more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.
This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or a one-use constant we can fold).
Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.
I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.
Differential Revision: https://reviews.llvm.org/D28219
llvm-svn: 290947
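The combine is sound because truncation commutes with modular multiplication, addition, and the bitwise operations. A scalar sketch of that identity follows; the function names are hypothetical, and whether the rewrite is profitable is a separate cost question that this example does not model.

```cpp
#include <cassert>
#include <cstdint>

// TRUNC(MUL(X, Y)): multiply in 64 bits, then keep the low 32 bits.
uint32_t mulThenTrunc(uint64_t X, uint64_t Y) {
  return static_cast<uint32_t>(X * Y);
}

// MUL(TRUNC(X), TRUNC(Y)): truncate first, multiply in 32 bits.
uint32_t truncThenMul(uint64_t X, uint64_t Y) {
  return static_cast<uint32_t>(X) * static_cast<uint32_t>(Y);
}

// The same pair for ADD.
uint32_t addThenTrunc(uint64_t X, uint64_t Y) {
  return static_cast<uint32_t>(X + Y);
}
uint32_t truncThenAdd(uint64_t X, uint64_t Y) {
  return static_cast<uint32_t>(X) + static_cast<uint32_t>(Y);
}

// The same pair for XOR (AND and OR behave the same way bit by bit).
uint32_t xorThenTrunc(uint64_t X, uint64_t Y) {
  return static_cast<uint32_t>(X ^ Y);
}
uint32_t truncThenXor(uint64_t X, uint64_t Y) {
  return static_cast<uint32_t>(X) ^ static_cast<uint32_t>(Y);
}
```

The low 32 bits of each result depend only on the low 32 bits of the inputs, so both orders agree for every input.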
|
subvector insertion from the lowest subvector of one of the sources.
These are best handled with a vinsert32x4 or vinsert64x2 instruction.
llvm-svn: 290946
|
We don't need two loops and we can safely assume and hardcode the size of the widened mask.
llvm-svn: 290942
|
This will be used to YAMLify parts of the module summary.
Differential Revision: https://reviews.llvm.org/D28014
llvm-svn: 290935
|
It is possible to perform a left shift before zero extending if the
shift would only shift out zeros.
llvm-svn: 290928
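The condition can be checked on a small scalar model, using an 8-bit value zero-extended to 32 bits. The widths and function names are assumptions chosen for this example; the commit itself does not state them.

```cpp
#include <cassert>
#include <cstdint>

// True if shifting the 8-bit value X left by C discards only zero bits.
bool shiftDropsOnlyZeros(uint8_t X, unsigned C) {
  assert(C < 8 && "shift amount must be less than the narrow width");
  return (X >> (8 - C)) == 0;
}

// zext(shl X, C): shift in the narrow type, then zero-extend.
uint32_t shiftThenExtend(uint8_t X, unsigned C) {
  return static_cast<uint32_t>(static_cast<uint8_t>(X << C));
}

// shl(zext X, C): zero-extend first, then shift in the wide type.
uint32_t extendThenShift(uint8_t X, unsigned C) {
  return static_cast<uint32_t>(X) << C;
}
```

When `shiftDropsOnlyZeros` holds, the two orders give the same result; when it does not, the narrow shift loses bits that the wide shift keeps.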
|
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))
This is only possible if C2 is negative and C2 is greater than or equal to negative C1.
llvm-svn: 290927
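A scalar sketch of this fold, again with an 8-bit inner add zero-extended to 32 bits. The widths and all names are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// The stated preconditions: the inner add does not wrap (nuw), C2 is
// negative, and C2 >= -C1.
bool foldIsValid(uint8_t X, uint8_t C1, int32_t C2) {
  bool NoUnsignedWrap =
      static_cast<unsigned>(X) + static_cast<unsigned>(C1) <= 0xFFu;
  return NoUnsignedWrap && C2 < 0 && C2 >= -static_cast<int32_t>(C1);
}

// (add (zext (add nuw X, C1)), C2)
uint32_t originalForm(uint8_t X, uint8_t C1, int32_t C2) {
  uint8_t Inner = static_cast<uint8_t>(X + C1);
  return static_cast<uint32_t>(Inner) + static_cast<uint32_t>(C2);
}

// (zext (add nuw X, C1+C2))
uint32_t foldedForm(uint8_t X, uint8_t C1, int32_t C2) {
  uint8_t Sum = static_cast<uint8_t>(static_cast<int32_t>(C1) + C2);
  return static_cast<uint32_t>(static_cast<uint8_t>(X + Sum));
}
```

For example, with X = 200, C1 = 50, C2 = -20 both forms compute 230. With X = 5, C1 = 50, C2 = -60 the bound on C2 is violated: the original wraps in 32 bits while the folded form wraps in 8 bits, so the results differ.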
|
warnings; other minor fixes (NFC).
llvm-svn: 290925
|
As per post-commit review for r289993 (D27775), we can only safely
import a type as a decl if it has an Identifier, as the Name alone
is not enough to be unique across modules.
llvm-svn: 290915
|
llvm-svn: 290913
|
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.
So just in case we decide this is the right way, we might as well
have a cleaner implementation.
llvm-svn: 290912
|
As Pete points out in r290905, CallSite lets us avoid duplicating this!
llvm-svn: 290909
|
Allows LLVM to build with LLVM_USE_OPROFILE=True.
Patch by Mark Dewing. Thanks Mark!
llvm-svn: 290908
|
Use getReturnedArgOperand() instead of rolling our own. Note that it's
equivalent because there can only be one 'returned' operand.
The existing code was also incorrect: there already was awkward logic to
ignore callee/EH blocks, but operands can now also be operand bundles,
in which case we'll look for non-existent parameter attributes.
Unfortunately, this isn't observable in-tree, as it only crashes when
exercising the regular call lowering logic with operand bundles.
Still, this is a nice small cleanup anyway.
llvm-svn: 290905
|
llvm-svn: 290899
|
Provide distinct contents for semBogus and semPPCDoubleDouble in order
to prevent compilers from collapsing them to a single memory address,
since we heavily rely on every semantic having a distinct address.
This happens if an insecure optimization that collapses identical values
is enabled. As a result, APFloats of semBogus are indistinguishable from
semPPCDoubleDouble, and whenever the move constructor is used, the old
value begins to be incorrectly recognized as a semPPCDoubleDouble.
Since the values in semPPCDoubleDouble are not used anywhere,
we can easily solve this issue by altering the value of one of the
fields, thereby ensuring that the collapse cannot occur.
Differential Revision: https://reviews.llvm.org/D28112
llvm-svn: 290896
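The address-identity pattern at issue can be sketched as follows. The struct layout, field names, and the concrete values are hypothetical and chosen only to show two constants whose contents differ in one field; they are not taken from APFloat.

```cpp
#include <cassert>

// Hypothetical semantics descriptor; identity is decided by address.
struct Semantics {
  int MaxExponent;
  int MinExponent;
  unsigned Precision;
  unsigned SizeInBits;
};

// If these two objects had byte-identical contents, a toolchain that folds
// identical constants could place them at one address and break the
// address comparison below. Differing in one field rules that out.
static const Semantics SemBogus = {0, 0, 0, 0};
static const Semantics SemDoubleDoubleLike = {-1, 0, 0, 0};

// Identity check by address, as the message describes.
bool isBogus(const Semantics *S) { return S == &SemBogus; }

// Field-by-field comparison, used here only to show the contents differ.
bool sameContents(const Semantics &A, const Semantics &B) {
  return A.MaxExponent == B.MaxExponent && A.MinExponent == B.MinExponent &&
         A.Precision == B.Precision && A.SizeInBits == B.SizeInBits;
}
```

Standard C++ already guarantees distinct addresses for distinct objects; the differing field only protects against toolchains that fold identical data regardless.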
|
to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle.
llvm-svn: 290872
|
inserts and avoid calling isShuffleEquivalent on a widened mask.
llvm-svn: 290871
|
corresponding to 256-bit subvector inserts.
llvm-svn: 290870
|
instructions.
llvm-svn: 290869
|
llvm-svn: 290867
|
llvm-svn: 290866
|
select a masked operation.
llvm-svn: 290865
|
shufflevectors. There are some codegen problems here that I'll try to fix in future commits.
llvm-svn: 290864
|
shufflevectors. This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal.
Hopefully we can improve that in future patches.
llvm-svn: 290863
|
DAGCombine already does these.
llvm-svn: 290860
|
fma (fneg x), (fneg y), z -> fma x, y, z
fma (fabs x), (fabs x), z -> fma x, x, z
llvm-svn: 290859
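Both identities hold because the exact product is unchanged: negating both factors, or taking the absolute value of a squared factor, does not alter x*y before the single rounding. A sketch using `std::fma` from `<cmath>` (the wrapper names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// fma (fneg x), (fneg y), z
double fmaNegNeg(double X, double Y, double Z) { return std::fma(-X, -Y, Z); }

// fma x, y, z
double fmaPlain(double X, double Y, double Z) { return std::fma(X, Y, Z); }

// fma (fabs x), (fabs x), z
double fmaAbsAbs(double X, double Z) {
  return std::fma(std::fabs(X), std::fabs(X), Z);
}

// fma x, x, z
double fmaSquare(double X, double Z) { return std::fma(X, X, Z); }
```

Note that the second rule needs the same operand in both positions: `fma (fabs x), (fabs y), z` is not equal to `fma x, y, z` in general.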
|
Summary:
No need to have this per-architecture. While there, unify 32-bit ARM's
behaviour with what changed elsewhere and start function names lowercase
as per the coding standards. Individual entry emission code goes to the
entry's own class.
Fully tested on amd64, cross-builds on both ARMs and PowerPC.
Reviewers: dberris
Subscribers: aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D28209
llvm-svn: 290858
|
llvm-svn: 290848
|
llvm-svn: 290844
|
Summary:
Regardless of how the loop body weight is distributed, we should preserve
the total loop body weight, i.e. the same weight should reach the body of the loop
or its duplicates in the peeled and unpeeled cases.
Reviewers: mkuper, davidxl, anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28179
llvm-svn: 290833
|
llvm-svn: 290828
|
llvm-svn: 290827
|
The checks were improved with:
https://reviews.llvm.org/rL290194
llvm-svn: 290826
|