| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
idiom.
r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...
This DAG combine has a bad crash in it that is quite hard to trigger
sadly -- it relies on sneaking code with UB through the SDAG build and
into this particular combine. I've responded to the original commit with
a test case that reproduces it.
However, the code also has other problems that will require substantial
changes to address and so I'm going ahead and reverting it for now. This
should unblock us and perhaps others that are hitting the crash in the
wild and will let a fresh patch with updated approach come in cleanly
afterward.
Sorry for any trouble or disruption!
llvm-svn: 289916
|
|
|
|
|
|
| |
This reverts commit 289902 while investigating bot berakage.
llvm-svn: 289906
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.
Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:
(1) The DIGlobalVariable should describe the source level variable,
not how to get to its location.
(2) It makes it unsafe/hard to update the expressions when we call
replaceExpression on the DIGLobalVariable.
(3) It makes it impossible to represent a global variable that is in
more than one location (e.g., a variable with multiple
DW_OP_LLVM_fragment-s). We also moved away from attaching the
DIExpression to DILocalVariable for the same reasons.
<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769
llvm-svn: 289902
|
|
|
|
|
|
|
|
|
| |
Removing sensitivity to scheduling (by using CHECK-DAG instead of CHECK) and
some other minor corrections.
In preparation to commit Power9 processor model.
llvm-svn: 289900
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.
All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.
Differential Revision: https://reviews.llvm.org/D27324
llvm-svn: 289899
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
imported from its module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27687
llvm-svn: 289896
|
|
|
|
| |
llvm-svn: 289895
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D14590
llvm-svn: 289894
|
|
|
|
| |
llvm-svn: 289893
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The IRTranslator uses an additional block before the LLVM-IR entry block
to perform all the ABI lowering and the constant hoisting. Thus, this
block is the actual entry block and it falls through the LLVM-IR entry
block. However, with such representation, we end up with two basic
blocks that are not maximal.
Therefore, this patch adds a bit of canonicalization by merging both the
LLVM-IR entry block and the ABI lowering/constants hoisting into one
block, making the resulting block more likely to be maximal (indeed the
LLVM-IR entry block might not have been maximal).
llvm-svn: 289891
|
|
|
|
|
|
|
|
|
| |
locations on any instructions
This seems more consistent, and helps tidy up/simplify some other code
in this change.
llvm-svn: 289889
|
|
|
|
|
|
|
|
|
|
|
| |
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.
Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .
Differential Revision: https://reviews.llvm.org/D27793
llvm-svn: 289882
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: The relocation is missing mask so an address that has non-zero bits in 47:43 may overwrite the register number. (Frequently shows up as target register changed to `xzr`....)
Reviewers: t.p.northover, lhames
Subscribers: davide, aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D27609
llvm-svn: 289880
|
|
|
|
| |
llvm-svn: 289877
|
|
|
|
|
|
|
|
|
| |
Needed due to change to require datalayout (r289719).
Found this in my own testing, maybe there aren't any bots using a v1.12
gold yet.
llvm-svn: 289876
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.
Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.
Differential Revision: https://reviews.llvm.org/D27787
llvm-svn: 289874
|
|
|
|
|
|
|
|
|
| |
This used to be allowed before r289402 by default (before r289402 you
could have TBAA metadata on any instruction), and while I'm not sure
that it helps, it does sound reasonable enough to not fail the verifier
and we have out-of-tree users who use this.
llvm-svn: 289872
|
|
|
|
|
|
|
|
|
| |
This test is currently sensitive to scheduling. Using CHECK-DAG allows us to
preserve the main purpose of the test and remove this sensivity.
In preparation to commit Power9 processor model.
llvm-svn: 289869
|
|
|
|
| |
llvm-svn: 289868
|
|
|
|
| |
llvm-svn: 289866
|
|
|
|
|
|
|
|
|
| |
This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".
Differential Revision: https://reviews.llvm.org/D27814
llvm-svn: 289863
|
|
|
|
|
|
|
|
| |
It seems pointless to add a resource to an archive because it won't have
any symbols to link against (and link.exe doesn't have an equivalent of
--whole-archive), but lib.exe allows it for some reason.
llvm-svn: 289859
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns.
This patch won't solve the example that was attached to that thread, so something else still needs fixing.
The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.
Corresponding changes for smax/umin/umax to follow.
Differential Revision: https://reviews.llvm.org/D27531
llvm-svn: 289855
|
|
|
|
|
|
|
|
|
|
|
| |
This is the 256-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837
This patch is based on the draft by @sroland (Roland Scheidegger) that is
attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885
llvm-svn: 289846
|
|
|
|
|
|
|
|
|
| |
This patch checks that the SlowMisaligned128Store subtarget feature is set
when penalizing such stores in getMemoryOpCost.
Differential Revision: https://reviews.llvm.org/D27677
llvm-svn: 289845
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.
Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.
llvm-svn: 289843
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a tiny patch with a big pile of test changes.
This partially fixes PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885
My motivating case looks like this:
- vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
- vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
- vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
+ vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
And this happens several times in the diffs. For chips with domain-crossing penalties,
the instruction count and size reduction should usually overcome any potential
domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such
as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so
using shufps is a pure win.
So the test case diffs all appear to be improvements except one test in
vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate
zero elements and one test in combine-sra.ll where multiple uses prevent the expected
shuffle combining.
Differential Revision: https://reviews.llvm.org/D27692
llvm-svn: 289837
|
|
|
|
|
|
| |
As discussed on D27692
llvm-svn: 289834
|
|
|
|
|
|
|
|
| |
common inst"
Reverting as it is causing buildbot failures (address sanitizer).
llvm-svn: 289833
|
|
|
|
|
|
|
|
| |
sections specially.
Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation).
llvm-svn: 289832
|
|
|
|
|
|
|
|
|
|
|
| |
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Differential Revision: https://reviews.llvm.org/D27590
llvm-svn: 289828
|
|
|
|
|
|
|
|
| |
Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.
Differential Revision: https://reviews.llvm.org/D27684
llvm-svn: 289825
|
|
|
|
| |
llvm-svn: 289822
|
|
|
|
| |
llvm-svn: 289819
|
|
|
|
| |
llvm-svn: 289817
|
|
|
|
|
|
| |
This reverts commit ee709f8988653a0334fbf100cdbbdd83a3933347.
llvm-svn: 289814
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A number of new patterns for simplifying and/xor of icmp:
(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.
(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
%s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.
llvm-svn: 289813
|
|
|
|
|
|
| |
Incorrect 'undef' mask index matching meant that broadcast shuffles could be detected as reverse shuffles
llvm-svn: 289811
|
|
|
|
| |
llvm-svn: 289807
|
|
|
|
|
|
|
|
|
|
| |
In some situations, the BUILD_VECTOR node that builds a v18i8 vector by
a splat of an i8 constant will end up with signed 8-bit values and other
situations, it'll end up with unsigned ones. Handle both situations.
Fixes PR31340.
llvm-svn: 289804
|
|
|
|
|
|
| |
This also refactors some common code into the 'GetTypeName' method.
llvm-svn: 289803
|
|
|
|
| |
llvm-svn: 289800
|
|
|
|
| |
llvm-svn: 289798
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is essentially a recommit of r285893, but with a correctness fix. The
problem of the original commit was that this:
bic r5, r7, #31
cbz r5, .LBB2_10
got rewritten into:
lsrs r5, r7, #5
beq .LBB2_10
The result in destination register r5 is not the same and this is incorrect
when r5 is not dead. So this fix includes checking the uses of the AND
destination register. And also, compared to the original commit, some regression
tests didn't need changing anymore because of this extra check.
For completeness, this was the original commit message:
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more
efficient instruction selection if the bitmask is one consecutive sequence of
set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and
set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and
set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit
into the sign bit with one LSLS and change the condition query from NE/EQ to
MI/PL (we could also implement this by shifting into the carry bit and
branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower
zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two
16-bit instructions but can elide the CMP and doesn't require materializing a
complex immediate, so is also a win.
Differential Revision: https://reviews.llvm.org/D27761
llvm-svn: 289794
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
GAS already allows flags for sections to be specified directly as a
numeric value. This functionality is particularly useful for setting
processor or application-specific values that may not be directly
supported or understood by LLVM. This patch allows LLVM to use numeric
section flag values verbatim if specified by the assembly file.
Reviewers: grosbach, rafael, t.p.northover, rengolin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27451
llvm-svn: 289785
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements execute-only support for ARM code generation, which
prevents the compiler from generating data accesses to code sections.
The following changes are involved:
* Add the CodeGen option "-arm-execute-only" to the ARM code generator.
* Add the clang flag "-mexecute-only" as well as the GCC-compatible
alias "-mpure-code" to enable this option.
* When enabled, literal pools are replaced with MOVW/MOVT instructions,
with VMOV used in addition for floating-point literals. As the MOVT
instruction is required, execute-only support is only available in
Thumb mode for targets supporting ARMv8-M baseline or Thumb2.
* Jump tables are placed in data sections when in execute-only mode.
* The execute-only text section is assigned section ID 0, and is
marked as unreadable with the SHF_ARM_PURECODE flag with symbol 'y'.
This also overrides selection of ELF sections for globals.
llvm-svn: 289784
|
|
|
|
| |
llvm-svn: 289779
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes an issue with MachineBlockPlacement due to a badly timed call
to `analyzeBranch` with `AllowModify` set to true. The timeline is as
follows:
1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls
`TailDup.shouldTailDuplicate` on its argument, which in turn calls
`analyzeBranch` with `AllowModify` set to true.
2. This `analyzeBranch` call edits the terminator sequence of the block
based on the physical layout of the machine function, turning an
unanalyzable non-fallthrough block to a unanalyzable fallthrough
block. Normally MBP bails out of rearranging such blocks, but this
block was unanalyzable non-fallthrough (and thus rearrangeable) the
first time MBP looked at it, and so it goes ahead and decides where
it should be placed in the function.
3. When placing this block MBP fails to analyze and thus update the
block in keeping with the new physical layout.
Concretely, before (1) we have something like:
```
LBL0:
< unknown terminator op that may branch to LBL1 >
jmp LBL1
LBL1:
... A
LBL2:
... B
```
In (2), analyze branch simplifies this to
```
LBL0:
< unknown terminator op that may branch to LBL2 >
;; jmp LBL1 <- redundant jump removed
LBL1:
... A
LBL2:
... B
```
In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2
after the first block since that is profitable.
```
LBL0:
< unknown terminator op that may branch to LBL2 >
;; jmp LBL1 <- redundant jump
LBL2:
... B
LBL1:
... A
```
and the program now has incorrect behavior (we no longer fall-through
from `LBL0` to `LBL1`) because MBP can no longer edit LBL0.
There are several possible solutions, but I went with removing the teeth
off of the `analyzeBranch` calls in TailDuplicator. That makes thinking
about the result of these calls easier, and breaks nothing in the lit
test suite.
I've also added some bookkeeping to the MachineBlockPlacement pass and
used that to write an assert that would have caught this.
Reviewers: chandlerc, gberry, MatzeB, iteratee
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D27783
llvm-svn: 289764
|
|
|
|
|
|
| |
SimplifyDemandedVectorElts.
llvm-svn: 289759
|
|
|
|
|
|
|
|
|
| |
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
|