| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IntrusiveRefCntPtr""
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).
This recommits 291006, reverted in r291007.
llvm-svn: 291016
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero
extended. For example, given double 65534.0, without legalization:
fp-to-uint16: 65534.0 -> 0xfffe
With the legalization:
fp-to-sint32: 65534.0 -> 0x0000fffe
Without this patch, legalization wrongly emits a signed extend assertion,
which is consumed by later icmp instruction, and cause miscompile.
Note that the floating point value must be in [0, 65535), otherwise the
behavior is undefined.
This patch reverts r279223 behavior and adds more tests and
documentations.
In PR29041's context, James Molloy mentioned that:
We don't need to mask because conversion from float->uint8_t is
undefined if the integer part of the float value is not representable in
uint8_t. Therefore we can assume this doesn't happen!
which is totally true and good, because fptoui is documented clearly to
have undefined behavior when overflow/underflow happens. We should take
the advantage of this behavior so that we can save unnecessary mask
instructions.
Reviewers: jmolloy, nadav, echristo, kbarton
Subscribers: mehdi_amini, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28284
llvm-svn: 291015
|
| |
|
|
| |
llvm-svn: 291013
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Instead of matching:
(a + i) + 1 -> (a + i, undef, 1)
Now it matches:
(a + i) + 1 -> (a, i, 1)
Reviewers: rengolin
Differential Revision: http://reviews.llvm.org/D26367
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
|
| |
|
|
| |
llvm-svn: 291010
|
| |
|
|
| |
llvm-svn: 291009
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Fix early-exit analysis for memory operation pairing when operations are
not emitted in ascending order.
Reviewers: mcrosier, t.p.northover
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D28251
llvm-svn: 291008
|
| |
|
|
|
|
|
|
|
|
|
| |
IntrusiveRefCntPtr"
Breaks Clang's use of bitcode. Reverting until I have a fix to go with
it there.
This reverts commit r291006.
llvm-svn: 291007
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IntrusiveRefCntPtr
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).
llvm-svn: 291006
|
| |
|
|
|
|
|
|
|
|
| |
std::shared_ptr/make_shared
The intrusive nature of the reference counting is not required/used
here, so simplify the ownership model to make the code easier to
understand.
llvm-svn: 291005
|
| |
|
|
| |
llvm-svn: 291004
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:
1. When determining tail-call eligibility. If we make a tail call (i.e.
directly branch to a function) then there is no place for the linker to add
a TOC restoration.
2. When determining when we need to add a nop instruction after a call.
Likewise, if there is a possibility that the linker might need to add a
TOC restoration after a call, then we need to put a nop after the call
(the bl instruction).
First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.
Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).
There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.
Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.
Differential Revision: https://reviews.llvm.org/D27231
llvm-svn: 291003
|
| |
|
|
|
|
| |
we can pinpoint performance issues.
llvm-svn: 291002
|
| |
|
|
| |
llvm-svn: 291001
|
| |
|
|
|
|
|
| |
Reported by David Binderman and ack'ed by Teresa on IRC.
PR: 31527
llvm-svn: 291000
|
| |
|
|
|
|
| |
instead. NFC.
llvm-svn: 290999
|
| |
|
|
|
|
| |
Fixes PR31529.
llvm-svn: 290998
|
| |
|
|
|
|
| |
Fixes PR31528
llvm-svn: 290995
|
| |
|
|
|
|
|
| |
Without this CHECK line, we may not detect incorrectly detected additional
regions at the end of the region tree.
llvm-svn: 290994
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RefCountedBase
This roughly matches the semantics of std::enable_shared_from_this - that it
does not dictate the ownership model of all users, but constrains those users
taking advantage of the intrusive nature to do so only when there's a guarantee
that that's the ownership model being used for the object being passed.
Reviewers: jlebar
Differential Revision: https://reviews.llvm.org/D28245
llvm-svn: 290987
|
| |
|
|
| |
llvm-svn: 290980
|
| |
|
|
|
|
|
|
| |
v2: expose using amdgcn prefix
Differential Revision: https://reviews.llvm.org/D23511
llvm-svn: 290977
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This test case has been reduced from test/Analysis/RegionInfo/mix_1.ll and
provides us with a minimal example of a test case which caused problems while
working on an improved version of the RegionInfo analysis. We upstream this
test case, as it certainly can be helpful in future debugging and optimization
tests.
Test case reduced by Pratik Bhatu <cs12b1010@iith.ac.in>
llvm-svn: 290974
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
common inst"
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer). The
debugloc is now only set when the instruction is not a call, as this causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).
Original commit message:
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Original review: https://reviews.llvm.org/D27590
llvm-svn: 290973
|
| |
|
|
|
|
| |
The check script will use var names before they are declared, which filecheck doesn't like.
llvm-svn: 290971
|
| |
|
|
|
|
| |
Missed var name
llvm-svn: 290970
|
| |
|
|
| |
llvm-svn: 290969
|
| |
|
|
|
|
|
|
|
|
| |
These tests are missing a target triple and the -m elf_x86_64 gold option,
which makes them fail on non-x86 targets.
Differential revision: https://reviews.llvm.org/D28285
Reviewers: tejohnson
llvm-svn: 290965
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Change llvm-link to use the FunctionImporter handling, instead of
manually invoking the Linker. We still need to load the module
in llvm-link to do the desired testing for invalid import requests
(weak functions), and to get the GUID (in case the function is local).
Also change the drop-debug-info test to use llvm-link so that importing
is forced (in order to test debug info handling) and independent of
import logic changes.
Reviewers: mehdi_amini
Subscribers: mgorny, llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D28277
llvm-svn: 290964
|
| |
|
|
|
|
| |
Before it wasn't checking anything.
llvm-svn: 290963
|
| |
|
|
|
|
| |
Actual codegen is much better than the extract+insert patterns that was assumed.
llvm-svn: 290962
|
| |
|
|
|
|
|
| |
This CPU type was not previously recognized by LLVM which led to emitting
poor (and sometimes incorrect) code in some JIT workloads on such a machine.
llvm-svn: 290961
|
| |
|
|
|
|
| |
Inspired by r290953 + grep -R 'CHCEK'.
llvm-svn: 290958
|
| |
|
|
|
|
| |
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode.
llvm-svn: 290956
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.
In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)
Reviewers: aprantl, MatzeB, mkuper
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D27688
llvm-svn: 290955
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This just removes the usage of llvm::reverse and llvm::seq. That makes
it harder to handle the empty case correctly and so I've also added
a test there.
This is just a shot in the dark at what might be behind the buildbot
failures. I can't reproduce any issues locally including with ASan...
I feel like I'm missing something...
llvm-svn: 290954
|
| |
|
|
|
|
|
|
|
|
| |
to FileCheck.
Fortunately, it passes. =]
Spotted in review by Bob Wilson!
llvm-svn: 290953
|
| |
|
|
|
|
|
|
|
|
|
| |
This is both convenient and more efficient as we can skip any
intermediate reallocation of the vector.
This usage pattern came up in a subsequent patch on the pass manager,
but it seems generically useful so I factored it out and added unittests
here.
llvm-svn: 290952
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The InlineSpiller was accessing the DominatorTreeBase directly
through the public data member DT in the MachineDominatorTree.
This is not a good idea as the "cached" information in
SplitCriticalEdges is not applied before the access.
The DominatorTreeBase must be accessed through the member
function getBase() in MachineDominatorTree.
The fault was introduced in r266162.
I think the public data member DT in the MachineDominatorTree
should have been made private in the original code (r215576)
that introduced the concept of lazily updating the
MachineDominatorTree information from
MachineBasicBlock::SplitCriticalEdge().
Patch by Karl-Johan Karlsson <karl-johan.karlsson@ericsson.com>
Reviewers: wmi, qcolombet
Subscribers: llvm-commits, bjope, uabelho
Differential Revision: https://reviews.llvm.org/D27983
llvm-svn: 290950
|
| |
|
|
|
|
|
|
|
| |
Reviewers: sdardis, vkalintiris
Subscribers: jaydeep, slthakur, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D27841
llvm-svn: 290949
|
| |
|
|
|
|
|
|
|
|
| |
INT_{U}COMIS{S|D} instructions
Replacing the memory operand in the intrinsic versions of the comis/ucomis instrucions from f128mem to ssmem/sdmem accordingly.
Differential Revision: https://reviews.llvm.org/D28138
llvm-svn: 290948
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.
This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold).
Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.
I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.
Differential Revision: https://reviews.llvm.org/D28219
llvm-svn: 290947
|
| |
|
|
|
|
|
|
| |
subvector insertion from the lowest subvector of one of the sources.
These are best handled with a vinsert32x4 or vinsert64x2 instruction.
llvm-svn: 290946
|
| |
|
|
|
|
| |
subvector insert instructions.
llvm-svn: 290945
|
| |
|
|
| |
llvm-svn: 290944
|
| |
|
|
|
|
| |
in preparation for a future change that needs these features.
llvm-svn: 290943
|
| |
|
|
|
|
| |
We don't need two loops and we can safely assume assume and hardcode the size of the widened mask.
llvm-svn: 290942
|
| |
|
|
|
|
|
|
| |
This will be used to YAMLify parts of the module summary.
Differential Revision: https://reviews.llvm.org/D28014
llvm-svn: 290935
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
has the following byte offsets:
0-7: Address
8-11: Line
12-13: Column
14-15: File
16-19: Isa
20-23: Discriminator
24+: bit fields
The packing is fine until the "Isa" field, which is an 8-bit int that occupies 4 bytes. We can instead move Discriminator into the 16-19 slot, and pack Isa into the 20-23 range along with the bit fields:
0-7: Address
8-11: Line
12-13: Column
14-15: File
16-19: Discriminator
20-23: Isa + bit fields
This layout is only 24 bytes. This 25% reduction in size may seem small but a large binary can have line tables with thousands of rows stored in a vector.
Patch by Simon Que!
Differential Revision: https://reviews.llvm.org/D27961
llvm-svn: 290931
|
| |
|
|
| |
llvm-svn: 290929
|