| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
addresses are comparable. NFCI.
llvm-svn: 307055
|
|
|
|
|
|
|
|
| |
The patch makes SoftenFloatResult/Operand logic just the same as all other legalization routines have: SoftenFloatResult() now fills the SoftenFloats map and SoftenFloatOperand() perform all needed replacements. This prevents softening mashinery from leaving stale entries in SoftenFloats map (that resulted in errors during the legalize type checking) and clarifies softening. The patch replaces https://reviews.llvm.org/D29265.
Differential Revision: https://reviews.llvm.org/D31946
llvm-svn: 307053
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add a combine for creating a truncate to replace a build_vector composed of extracts with
indices that form a stride-2^N series.
Example:
v8i32 V = ...
v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6))
-->
v4i32 truncate (bitcast V to v4i64)
Related discussion in llvm-dev about canonicalizing shuffles to
truncates in LLVM IR:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html.
Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena
Reviewed By: delena
Subscribers: guyblank, delena, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D34077
llvm-svn: 307036
|
|
|
|
| |
llvm-svn: 307004
|
|
|
|
|
|
| |
prevent a crash if the type isn't a simple VT.
llvm-svn: 306950
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
removePartialredunduncy and in WorkList
Summary:
removePartialRedundency optimization introduces a state in the
RegisterCoalescer where an instruction pointed to in the WorkList
is deleted from the MBB and then removed from the ErasedList.
This patch updates the ErasedList to be used globally by not erasing
erased Instructions from it to solve the problem.
The patch also accounts for the case where an Instruction was previously
deleted and the same memory was reused by BuildMI to create a new instruction.
Reviewers: kparzysz, qcolombet
Reviewed By: qcolombet
Subscribers: MatzeB, qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D34902
llvm-svn: 306915
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add an option to prevent diagnostics that do not meet a minimum hotness
threshold from being output. When generating optimization remarks for
large codebases with a ton of cold code paths, this option can be used
to limit the optimization remark output at a reasonable size. Discussion of
this change can be read here:
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html
Reviewers: anemet, davidxl, hfinkel
Reviewed By: anemet
Subscribers: qcolombet, javed.absar, fhahn, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34867
llvm-svn: 306912
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the instructions at the beginning of the block have no location,
we're better off using the location of the first instruction in the
current basic block. At the very least, that instruction post-dominates
this one, whereas if we don't emit a .cv_loc directive, we end up using
the potentially invalid location that falls through from the previous
block.
We could probably do better here by emitting some kind of ".cv_loc end"
directive that stops the line table entry of the previous .cv_loc
directive from bleeding out of its basic block. This would improve the
line table when an entire MBB has no valid location info.
llvm-svn: 306889
|
|
|
|
|
|
|
|
|
| |
It looks like there are two target-independent but not GISel instructions that
need legalization, IMPLICIT_DEF and PHI. These are already anomalies since
their operands have important LLTs attached, so to make things more uniform it
seems like a good idea to add generic variants. Starting with G_IMPLICIT_DEF.
llvm-svn: 306875
|
|
|
|
|
|
|
| |
I'm tired of seeing this:
.globl "?Test@@YAXXZ" # -- Begin function ^A?Test@@YAXXZ
llvm-svn: 306855
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
To enable profile hotness information in diagnostics output, Clang takes
the option `-fdiagnostics-show-hotness` -- that's "diagnostics", with an
"s" at the end. Clang also defines `CodeGenOptions::DiagnosticsWithHotness`.
LLVM, on the other hand, defines
`LLVMContext::getDiagnosticHotnessRequested` -- that's "diagnostic", not
"diagnostics". It's a small difference, but it's confusing, typo-inducing, and
frustrating.
Add a new method with the spelling "diagnostics", and "deprecate" the
old spelling.
Reviewers: anemet, davidxl
Reviewed By: anemet
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D34864
llvm-svn: 306848
|
|
|
|
|
|
|
| |
This reverts commit r306819 which appears be exposing underlying
issues in a stage1 ppc64be build
llvm-svn: 306820
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.
Tests of note:
* test/CodeGen/X86/build-vector* - Improved.
* test/CodeGen/BPF/undef.ll - Improved store alignment allows an
additional store merge
* test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
case we already do not handle well. Here, the DAG is improved, but
scheduling causes a code size degradation.
Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D34472
llvm-svn: 306819
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In r301116, a custom lowering needed to be introduced to be able to
legalize 8 and 16-bit divisions on ARM targets without a division
instruction, since 2-step legalization (WidenScalar from 8 bit to 32
bit, then Libcall the 32-bit division) doesn't work.
This fixes this and makes this kind of multi-step legalization, where
first the size of the type needs to be changed and then some action is
needed that doesn't require changing the size of the type,
straighforward to specify.
Differential Revision: https://reviews.llvm.org/D32529
llvm-svn: 306806
|
|
|
|
|
|
|
|
| |
Reviewer: dblaikie
Differential revision: https://reviews.llvm.org/D34765
llvm-svn: 306771
|
|
|
|
|
|
| |
https://reviews.llvm.org/D34837
llvm-svn: 306766
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If there is a chain of instructions formulating a recurrence, commuting operands can help removing a redundant copy. In the following example code,
```
BB#1: ; Loop Header
%vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
...
BB#6: ; Loop Latch
%vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
%vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0
%vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10
CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
%vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
JL_1 <BB#1>, %EFLAGS<imp-use,kill>
```
Existing two-address generation pass generates following code:
```
BB#1:
%vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
...
BB#6:
Predecessors according to CFG: BB#5 BB#4
%vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
%vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1
%vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0
%vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
%vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
%vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
JL_1 <BB#1>, %EFLAGS<imp-use,kill>
JMP_1 <BB#7>
```
This is suboptimal because the assembly code generated has a redundant copy at the end of #BB6 to feed %vreg13 to BB#1:
```
.LBB0_6:
addl %esi, %edi
addl %ebx, %edi
cmpl $10, %edi
movl %edi, %esi
jl .LBB0_1
```
This redundant copy can be elimiated by making instructions in the recurrence chain to compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 to compute %vreg10. With that change, code after two-address generation becomes
```
BB#1:
%vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
...
BB#6: derived from LLVM BB %bb7
Predecessors according to CFG: BB#5 BB#4
%vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
%vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0
%vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1
%vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
%vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
%vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
JL_1 <BB#1>, %EFLAGS<imp-use,kill>
JMP_1 <BB#7>
```
and the final assembly does not have redundant copy:
```
.LBB0_6:
addl %edi, %eax
addl %ebx, %eax
cmpl $10, %eax
jl .LBB0_1
```
Reviewers: qcolombet, MatzeB, wmi
Reviewed By: wmi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31821
llvm-svn: 306758
|
|
|
|
|
|
|
| |
This reverts commit r305389. This broke chromium builds, so reverting
while I investigate further.
llvm-svn: 306741
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Arguably non-integral pointers probably shouldn't show up here at all,
but since the backend doesn't complain and this takes valid (according
to the Verifier) IR and makes it invalid, make sure not to introduce
any inttoptr instructions if we're dealing with non-integral pointers.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D33110
llvm-svn: 306737
|
|
|
|
| |
llvm-svn: 306716
|
|
|
|
|
|
|
|
|
|
|
| |
Relanding after restricting equalBaseIndex to not erroneuosly consider
a FrameIndices stemming from alloca from being comparable as its
offset is set post-selectionDAG.
Pull FrameIndex comparision reasoning from DAGCombiner::isAlias to
general BaseIndexOffset.
llvm-svn: 306688
|
|
|
|
|
|
|
|
|
|
| |
I am 99% sure that this breaks the PPC ASAN build bot:
http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio
If it doesn't go back to green, we can recommit (and fix the original
commit message at the same time :) ).
llvm-svn: 306676
|
|
|
|
|
|
|
|
|
|
|
| |
Given no NaNs and no signed zeroes it folds:
(fmul X, (select (fcmp X > 0.0), -1.0, 1.0)) -> (fneg (fabs X))
(fmul X, (select (fcmp X > 0.0), 1.0, -1.0)) -> (fabs X)
Differential Revision: https://reviews.llvm.org/D34579
llvm-svn: 306592
|
|
|
|
| |
llvm-svn: 306557
|
|
|
|
| |
llvm-svn: 306553
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CFI instructions that set appropriate cfa offset and cfa register are now
inserted in emitEpilogue() in X86FrameLowering.
Majority of the changes in this patch:
1. Ensure that CFI instructions do not affect code generation.
2. Enable maintaining correct information about cfa offset and cfa register
in a function when basic blocks are reordered, merged, split, duplicated.
These changes are target independent and described below.
Changed CFI instructions so that they:
1. are duplicable
2. are not counted as instructions when tail duplicating or tail merging
3. can be compared as equal
Add information to each MachineBasicBlock about cfa offset and cfa register
that are valid at its entry and exit (incoming and outgoing CFI info). Add
support for updating this information when basic blocks are merged, split,
duplicated, created. Add a verification pass (CFIInfoVerifier) that checks
that outgoing cfa offset and register of predecessor blocks match incoming
values of their successors.
Incoming and outgoing CFI information is used by a late pass
(CFIInstrInserter) that corrects CFA calculation rule for a basic block if
needed. That means that additional CFI instructions get inserted at basic
block beginning to correct the rule for calculating CFA. Having CFI
instructions in function epilogue can cause incorrect CFA calculation rule
for some basic blocks. This can happen if, due to basic block reordering,
or the existence of multiple epilogue blocks, some of the blocks have wrong
cfa offset and register values set by the epilogue block above them.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D18046
llvm-svn: 306529
|
|
|
|
|
|
| |
This reverts commit r306498 which appears to cause a compilrt-rt test failures
llvm-svn: 306501
|
|
|
|
|
|
|
|
|
|
|
| |
That is pretty common for clang to produce code like
(shl %x, (and %amt, 31)). In this situation we can still perform
trunc (shl) into shl (trunc) conversion given the known value
range of shift amount.
Differential Revision: https://reviews.llvm.org/D34723
llvm-svn: 306499
|
|
|
|
|
|
|
| |
Pull FrameIndex comparision reasoning from DAGCombiner::isAlias to
general BaseIndexOffset.
llvm-svn: 306498
|
|
|
|
| |
llvm-svn: 306485
|
|
|
|
| |
llvm-svn: 306481
|
|
|
|
|
|
|
| |
Also add IRTranslator support.
https://reviews.llvm.org/D34710
llvm-svn: 306475
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in D34071, there are some IR optimization opportunities that could be
handled by normal IR passes if this expansion wasn't happening so late in CGP.
Regardless of that, it seems wasteful to knowingly produce suboptimal IR here,
so I'm proposing this change:
%s = sub i32 %x, %y
%r = icmp ne %s, 0
=>
%r = icmp ne %x, %y
Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.
The fact that the PowerPC backend doesn't eliminate the 'subf.' might be
something for PPC folks to investigate separately.
Differential Revision: https://reviews.llvm.org/D34416
llvm-svn: 306471
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this check, COPY instructions can actually be one of the generic casts
in disguise. That's confusing and bad.
At some point during ISel this restriction has to be relaxed since the fully
selected instructions will usually use COPY for those purposes. Right now I
think it's possible that relaxation occurs during RegBankSelect (hence the
change there). I'm not convinced that's where it belongs long-term though.
llvm-svn: 306470
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D34640
llvm-svn: 306466
|
|
|
|
| |
llvm-svn: 306452
|
|
|
|
|
|
|
|
| |
Apparently this replacement can really be substituting the
same as the original register. Avoid restarting the loop
when there's been no change in the register uses.
llvm-svn: 306441
|
|
|
|
|
|
|
| |
This was a clean-up suggestion from:
https://reviews.llvm.org/D34005
llvm-svn: 306438
|
|
|
|
|
|
|
|
|
| |
- DenseMap should be faster than std::map
- Use the `InsertRes = insert() if (!InsertRes.inserted)` pattern rather
than the `if (!X.contains(...)) { X.insert(...); }` to save one map
lookup.
llvm-svn: 306436
|
|
|
|
|
|
|
|
|
|
|
|
| |
assetion failure
When SelectionDAG merges consecutive stores and loads in MergeConsecutiveStores, it does not set dereferenceable flag for a created load instruction. This results in an assertion failure if SelectionDAG commonizes this load instruction with other load instructions, as well as it may miss optimization opportunities.
This patch sat dereferenceable flag for the newly created load instruction if all the load instructions to be merged are dereferenceable.
Differential Revision: https://reviews.llvm.org/D34679
llvm-svn: 306404
|
|
|
|
|
|
| |
succesor -> successor
llvm-svn: 306393
|
|
|
|
|
|
|
|
|
| |
Remove invalid shortcut in fixupKills(): A register needs to be marked
live even when we are not adding a kill flag. This is because a
partially live register must not get a kill flags, but it still needs to
be fully marked live when walking backwards.
llvm-svn: 306352
|
|
|
|
|
|
|
|
|
|
|
|
| |
truncation and extension match.
This fixes PR33368.
Reviewer: rksimon
Differential Revision: https://reviews.llvm.org/D34069
llvm-svn: 306345
|
|
|
|
|
|
| |
warnings; other minor fixes (NFC).
llvm-svn: 306341
|
|
|
|
|
|
|
|
|
|
| |
Fixes bug 33597.
Use of substituteRegister in the tied operand case messes
up the register use iterator, causing some uses to be left
unprocessed.
llvm-svn: 306333
|
|
|
|
|
|
|
| |
This is the dual problem to legalizing G_INSERTs so most of the code and
testing was cribbed from there.
llvm-svn: 306328
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Also added a comment.
Pulled out of https://reviews.llvm.org/D34099.
Reviewers: iteratee
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34388
llvm-svn: 306279
|
|
|
|
|
|
|
|
| |
Revert "[MBP] do not rotate loop if it creates extra branch"
It breaks the sanitizer build bots. Need to fix this.
llvm-svn: 306276
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.
We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.
Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one
So if C is not a predecessor of H then we introduce extra branch.
This change actually prohibits rotation of the loop if both true
1) Best Exit has next element in chain as successor.
2) Last element in chain is not a predecessor of first element of chain.
Reviewers: iteratee, xur
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34271
llvm-svn: 306272
|
|
|
|
|
|
|
|
|
| |
The compiler fails with assertion during legalization of SETCC for <3 x i8> operands.
The result is extended to <4 x i8> and then truncated <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>.
Differential Revision: https://reviews.llvm.org/D34503
llvm-svn: 306242
|