| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
| |
llvm-svn: 289911
|
| |
|
|
|
|
| |
other minor fixes (NFC).
llvm-svn: 289907
|
| |
|
|
|
|
| |
This reverts commit 289902 while investigating bot berakage.
llvm-svn: 289906
|
| |
|
|
| |
llvm-svn: 289903
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.
Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:
(1) The DIGlobalVariable should describe the source level variable,
not how to get to its location.
(2) It makes it unsafe/hard to update the expressions when we call
replaceExpression on the DIGLobalVariable.
(3) It makes it impossible to represent a global variable that is in
more than one location (e.g., a variable with multiple
DW_OP_LLVM_fragment-s). We also moved away from attaching the
DIExpression to DILocalVariable for the same reasons.
<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769
llvm-svn: 289902
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.
All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.
Differential Revision: https://reviews.llvm.org/D27324
llvm-svn: 289899
|
| |
|
|
|
|
|
| |
This feature now gates such stores after r289845. Thus the Exynos
processors now need this feature.
llvm-svn: 289898
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
imported from its module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27687
llvm-svn: 289896
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D14590
llvm-svn: 289894
|
| |
|
|
|
|
|
|
| |
Post-commit review feedback from Adrian Prantl.
Hopefully this fixes that up :)
llvm-svn: 289892
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The IRTranslator uses an additional block before the LLVM-IR entry block
to perform all the ABI lowering and the constant hoisting. Thus, this
block is the actual entry block and it falls through the LLVM-IR entry
block. However, with such representation, we end up with two basic
blocks that are not maximal.
Therefore, this patch adds a bit of canonicalization by merging both the
LLVM-IR entry block and the ABI lowering/constants hoisting into one
block, making the resulting block more likely to be maximal (indeed the
LLVM-IR entry block might not have been maximal).
llvm-svn: 289891
|
| |
|
|
|
|
|
|
|
| |
locations on any instructions
This seems more consistent, and helps tidy up/simplify some other code
in this change.
llvm-svn: 289889
|
| |
|
|
|
|
|
|
| |
Lowering to llvm.cttz() will result in constant folding anyway
if the argument to ffs is a constant. Pointed out by Eli for
fls() in D14590.
llvm-svn: 289888
|
| |
|
|
|
|
|
|
|
|
|
| |
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.
Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .
Differential Revision: https://reviews.llvm.org/D27793
llvm-svn: 289882
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: The relocation is missing mask so an address that has non-zero bits in 47:43 may overwrite the register number. (Frequently shows up as target register changed to `xzr`....)
Reviewers: t.p.northover, lhames
Subscribers: davide, aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D27609
llvm-svn: 289880
|
| |
|
|
| |
llvm-svn: 289877
|
| |
|
|
|
|
|
|
| |
The code change for D27687 accidentally got committed along with the
main change in r289843. Revert it temporarily, so that I can recommit it
along with its test as intended.
llvm-svn: 289875
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.
Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.
Differential Revision: https://reviews.llvm.org/D27787
llvm-svn: 289874
|
| |
|
|
|
|
|
|
|
| |
This used to be allowed before r289402 by default (before r289402 you
could have TBAA metadata on any instruction), and while I'm not sure
that it helps, it does sound reasonable enough to not fail the verifier
and we have out-of-tree users who use this.
llvm-svn: 289872
|
| |
|
|
|
|
| |
This should have been removed with r288446.
llvm-svn: 289871
|
| |
|
|
| |
llvm-svn: 289868
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
threshold (NFC)
Summary:
Thin link efficiency improvement. After adding an importing candidate to
the worklist we might have later added it again with a higher threshold.
Skip it when popped from the worklist if we recorded a higher threshold
than the current worklist entry, it will get processed again at the
higher threshold when that entry is popped.
This required adding the summary's GUID to the worklist, so that it can
be used to query the recorded highest threshold for it when we pop from the
worklist.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27696
llvm-svn: 289867
|
| |
|
|
| |
llvm-svn: 289866
|
| |
|
|
|
|
|
|
|
| |
This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".
Differential Revision: https://reviews.llvm.org/D27814
llvm-svn: 289863
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
then/else branch. NFC.
Now that a new API to merge debug locations has been committed at r289661 (see
review D26256 for more details), we can use it to "improve" the code added by
revision r280995.
Instead of nulling the debugloc of a commoned instruction, we use the 'merged'
debug location. At the moment, this is just a no functional change since
function `DILocation::getMergedLocation()` is just a stub and would always
return a null location.
Differential Revision: https://reviews.llvm.org/D27804
llvm-svn: 289862
|
| |
|
|
|
|
|
|
|
|
| |
The assert could potentially fire (though no cases have been
encountered), so just check that the instruction we're handling
specially for rematerialization only has one def to begin with.
Reviewed by Wei Mi over email.
llvm-svn: 289861
|
| |
|
|
|
|
|
|
| |
It seems pointless to add a resource to an archive because it won't have
any symbols to link against (and link.exe doesn't have an equivalent of
--whole-archive), but lib.exe allows it for some reason.
llvm-svn: 289859
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns.
This patch won't solve the example that was attached to that thread, so something else still needs fixing.
The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.
Corresponding changes for smax/umin/umax to follow.
Differential Revision: https://reviews.llvm.org/D27531
llvm-svn: 289855
|
| |
|
|
|
|
|
|
| |
MachineLegalizer used to be the name of both the class and the member,
causing GCC errors. r276522 fixed that by renaming the member to just
'Legalizer'. The 'class' workaround isn't necessary anymore; drop it.
llvm-svn: 289848
|
| |
|
|
|
|
|
|
|
|
|
| |
This is the 256-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837
This patch is based on the draft by @sroland (Roland Scheidegger) that is
attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885
llvm-svn: 289846
|
| |
|
|
|
|
|
|
|
| |
This patch checks that the SlowMisaligned128Store subtarget feature is set
when penalizing such stores in getMemoryOpCost.
Differential Revision: https://reviews.llvm.org/D27677
llvm-svn: 289845
|
| |
|
|
|
|
|
| |
It's brittle, and Doxygen already picks the overriden method's comment
anyway.
llvm-svn: 289844
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.
Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.
llvm-svn: 289843
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a tiny patch with a big pile of test changes.
This partially fixes PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885
My motivating case looks like this:
- vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
- vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
- vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
+ vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
And this happens several times in the diffs. For chips with domain-crossing penalties,
the instruction count and size reduction should usually overcome any potential
domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such
as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so
using shufps is a pure win.
So the test case diffs all appear to be improvements except one test in
vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate
zero elements and one test in combine-sra.ll where multiple uses prevent the expected
shuffle combining.
Differential Revision: https://reviews.llvm.org/D27692
llvm-svn: 289837
|
| |
|
|
|
|
| |
As discussed on D27692
llvm-svn: 289834
|
| |
|
|
|
|
|
|
| |
common inst"
Reverting as it is causing buildbot failures (address sanitizer).
llvm-svn: 289833
|
| |
|
|
|
|
|
|
| |
sections specially.
Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation).
llvm-svn: 289832
|
| |
|
|
|
|
| |
Avoid duplicating instructions in the int32/int64 domains.
llvm-svn: 289830
|
| |
|
|
|
|
|
|
|
|
|
| |
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Differential Revision: https://reviews.llvm.org/D27590
llvm-svn: 289828
|
| |
|
|
|
|
|
|
| |
Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.
Differential Revision: https://reviews.llvm.org/D27684
llvm-svn: 289825
|
| |
|
|
| |
llvm-svn: 289822
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Specifically avoid implicit conversions from/to integral types to
avoid potential errors when changing the underlying type. For example,
a typical initialization of a "full" mask was "LaneMask = ~0u", which
would result in a value of 0x00000000FFFFFFFF if the type was extended
to uint64_t.
Differential Revision: https://reviews.llvm.org/D27454
llvm-svn: 289820
|
| |
|
|
| |
llvm-svn: 289819
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A number of new patterns for simplifying and/xor of icmp:
(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.
(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
%s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.
llvm-svn: 289813
|
| |
|
|
|
|
| |
Incorrect 'undef' mask index matching meant that broadcast shuffles could be detected as reverse shuffles
llvm-svn: 289811
|
| |
|
|
|
|
|
|
|
|
| |
In some situations, the BUILD_VECTOR node that builds a v18i8 vector by
a splat of an i8 constant will end up with signed 8-bit values and other
situations, it'll end up with unsigned ones. Handle both situations.
Fixes PR31340.
llvm-svn: 289804
|
| |
|
|
|
|
| |
This also refactors some common code into the 'GetTypeName' method.
llvm-svn: 289803
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is essentially a recommit of r285893, but with a correctness fix. The
problem of the original commit was that this:
bic r5, r7, #31
cbz r5, .LBB2_10
got rewritten into:
lsrs r5, r7, #5
beq .LBB2_10
The result in destination register r5 is not the same and this is incorrect
when r5 is not dead. So this fix includes checking the uses of the AND
destination register. And also, compared to the original commit, some regression
tests didn't need changing anymore because of this extra check.
For completeness, this was the original commit message:
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more
efficient instruction selection if the bitmask is one consecutive sequence of
set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and
set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and
set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit
into the sign bit with one LSLS and change the condition query from NE/EQ to
MI/PL (we could also implement this by shifting into the carry bit and
branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower
zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two
16-bit instructions but can elide the CMP and doesn't require materializing a
complex immediate, so is also a win.
Differential Revision: https://reviews.llvm.org/D27761
llvm-svn: 289794
|
| |
|
|
|
|
|
| |
This allows the instrumention hook functions to do better
pretty-printing.
llvm-svn: 289793
|
| |
|
|
| |
llvm-svn: 289788
|