| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
| |
Add support for:
* i1 add
* i1 function arguments, if passed through registers
* i1 returns, with ABI signext/zeroext
Differential Revision: https://reviews.llvm.org/D27706
llvm-svn: 293035
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At the moment, this means supporting the signext/zeroext attribute on the return
type of the function. For function arguments, signext/zeroext should be handled
by the caller, so there's nothing for us to do until we start lowering calls.
Note that this does not include support for other extensions (i8 to i16), those
will be added later.
Differential Revision: https://reviews.llvm.org/D27705
llvm-svn: 293034
|
|
|
|
|
|
|
|
|
|
|
| |
If dominator tree has no roots, the pass that calculates it is
likely to be skipped. It occures, for instance, in the case of
entities with linkage available_externally. Do not run tree
verification in such case.
Differential Revision: https://reviews.llvm.org/D28767
llvm-svn: 293033
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A patch to enable the llvm-xray graph subcommand to color edges and
vertices based on statistics and to annotate vertices with statistics.
Depends on D27243
Reviewers: dblaikie, dberris
Reviewed By: dberris
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D28225
llvm-svn: 293031
|
|
|
|
|
|
|
|
|
|
|
| |
Enable the next form (intel style):
"mov <reg64>, <largeImm>"
which is should be available,
where <largeImm> stands for immediates which exceed the range of a singed 32bit integer
Differential Revision: https://reviews.llvm.org/D28988
llvm-svn: 293030
|
|
|
|
|
|
| |
Thumb is not supported yet, so bail out early.
llvm-svn: 293029
|
|
|
|
| |
llvm-svn: 293028
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Conservatively disable sinking and merging inline-asm instructions as doing so
can potentially create arguments that cannot satisfy the inline-asm constraints.
For example, SimplifyCFG used to do the following transformation:
(before)
if.then:
%0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8)
br label %if.end
if.else:
%1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6)
br label %if.end
(after)
%.sink = select i1 %tobool, i32 6, i32 8
%0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink)
This would result in a crash in the backend since only immediate integer operands
are permitted for constraint "n".
rdar://problem/30110806
Differential Revision: https://reviews.llvm.org/D29111
llvm-svn: 293025
|
|
|
|
|
|
|
|
| |
clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.
llvm-svn: 293024
|
|
|
|
| |
llvm-svn: 293023
|
|
|
|
|
|
|
| |
FIXME: Demangler could behave along not host but target.
For example, assume host=mingw, target=msc.
llvm-svn: 293021
|
|
|
|
| |
llvm-svn: 293019
|
|
|
|
| |
llvm-svn: 293018
|
|
|
|
|
|
|
|
|
|
| |
r291662.
I found root class should be instantiated for variadic tempate to instantiate static member explicitly.
This will fix failures in mingw DLL build.
llvm-svn: 293017
|
|
|
|
|
|
|
|
|
|
|
|
| |
Leave early ifcvt disabled for now since there are some
shader-db regressions.
This causes some immediate improvements, but could be better.
The cost checking that the pass does is based on critical path
length for out of order CPUs which we do not want so it skips out
on many cases we want.
llvm-svn: 293016
|
|
|
|
| |
llvm-svn: 293013
|
|
|
|
|
|
|
|
|
|
|
| |
Looks like our cmake goop for handling .inc->td dependencies doesn't
track the .td files.
This manifests as cmake complaining about missing files since r293009.
Force a rerun to avoid that.
llvm-svn: 293012
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
loops.
We do this by reconstructing the newly added loops after the unroll
completes to avoid threading pass manager details through all the mess
of the unrolling infrastructure.
I've enabled some extra assertions in the LPM to try and catch issues
here and enabled a bunch of unroller tests to try and make sure this is
sane.
Currently, I'm manually running loop-simplify when needed. That should
go away once it is folded into the LPM infrastructure.
Differential Revision: https://reviews.llvm.org/D28848
llvm-svn: 293011
|
|
|
|
|
|
|
|
| |
This surprisingly isn't NFC because there are patterns to select GPR
sub to SUBSWrr (rather than SUBWrr/rs); SUBS is later optimized to
SUB if NZCV is dead. From ISel's perspective, both are fine.
llvm-svn: 293010
|
|
|
|
| |
llvm-svn: 293009
|
|
|
|
|
|
|
|
| |
and UNSUPPORTED"
This reverts the revert in r292942.
llvm-svn: 293007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When we decide that the result of the invoke instruction need to be spilled, we need to insert the spill into a block that is on the normal edge coming out of the invoke instruction. (Prior to this change the code would insert the spill immediately after the invoke instruction, which breaks the IR, since invoke is a terminator instruction).
In the following example, we will split the edge going into %cont and insert the spill there.
```
%r = invoke double @print(double 0.0) to label %cont unwind label %pad
cont:
%0 = call i8 @llvm.coro.suspend(token none, i1 false)
switch i8 %0, label %suspend [i8 0, label %resume
i8 1, label %cleanup]
resume:
call double @print(double %r)
```
Reviewers: majnemer
Reviewed By: majnemer
Subscribers: mehdi_amini, llvm-commits, EricWF
Differential Revision: https://reviews.llvm.org/D29102
llvm-svn: 293006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1].
Patch By: Dave Airlie
Reviewers: nhaehnle, arsenm, tstellarAMD
Reviewed By: arsenm
Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D25428
llvm-svn: 293000
|
|
|
|
|
|
| |
other minor fixes (NFC).
llvm-svn: 292996
|
|
|
|
|
|
|
|
|
|
|
| |
There was a bug here where we were using p0 instead of s32 for the
selector type in the landingpad. Instead of hardcoding these types we
should get the types from the landingpad instruction directly.
Note that we replicate an assert from SDAG here to only support
two-valued landingpads.
llvm-svn: 292995
|
|
|
|
|
|
|
|
|
|
|
| |
for CPU_SUBTYPE_ARM_V7S and CPU_SUBTYPE_ARM_V7K.
For these two cpusubtypes they should default to a cortex-a7 CPU
to give proper disassembly without a -mcpu= flag.
rdar://27431703
llvm-svn: 292993
|
|
|
|
|
|
| |
other minor fixes (NFC).
llvm-svn: 292988
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sequence like this:
v_cmpx_le_f32_e32 vcc, 0, v0
s_branch BB0_30
s_cbranch_execnz BB0_30
; BB#29:
exp null off, off, off, off done vm
s_endpgm
BB0_30:
; %endif110
is likely wrong. The s_branch instruction will unconditionally jump
to BB0_30 and the skip block (exp done + endpgm) inserted for
performing the kill instruction will never be executed. This results
in a GPU hang with Star Ruler 2.
The s_branch instruction is added during the "Control Flow Optimizer"
pass which seems to re-organize the basic blocks, and we assume
that SI_KILL_TERMINATOR is always the last instruction inside a
basic block. Thus, after inserting a skip block we just go to the
next BB without looking at the subsequent instructions after the
kill, and the s_branch op is never removed.
Instead, we should remove the unconditional out branches and let
skip the two instructions if the exec mask is non-zero.
This patch fixes the GPU hang and doesn't introduce any regressions
with "make check".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019
Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 292985
|
|
|
|
| |
llvm-svn: 292984
|
|
|
|
|
|
| |
other minor fixes (NFC).
llvm-svn: 292983
|
|
|
|
|
|
|
|
|
|
|
| |
This switches to the workaround that HSA defaults to
for the mesa path.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 292982
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation.
Reviewers: xur, davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29040
llvm-svn: 292979
|
|
|
|
|
|
|
|
|
|
|
| |
When demangling a CV-qualified function type with a final reference type
parameter, we would treat the reference type parameter as a r-value ref
accidentally. This would result in the improper decoration of the
function type itself.
Resolves PR31741!
llvm-svn: 292976
|
|
|
|
|
|
|
| |
Rather than hard-coding magic values of 1, 2, 4 (bit-field), use an enum
to name the values. NFC.
llvm-svn: 292975
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reason: broke ASAN bots with a global buffer overflow.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/2291
Each test contains 20-30K test cases but takes only several (from 4 to 10)
seconds to complete on average machine. The tests cover the majority of
AMDGPU Gfx7/Gfx8 instructions, including many dark corners, and intended
to quickly find out if something is broken.
llvm-svn: 292974
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
GVNHoist performs all the optimizations that MLSM does to loads, in a
more general way, and in a faster time bound (MLSM is N^3 in most
cases, N^4 in a few edge cases).
This disables the load portion.
Note that the way ld_hoist_st_sink.ll is written makes one think that
the loads should be moved to the while.preheader block, but
1. Neither MLSM nor GVNHoist do it (they both move them to identical places).
2. MLSM couldn't possibly do it anyway, as the while.preheader block
is not the head of the diamond, while.body is. (GVNHoist could do it
if it was legal).
3. At a glance, it's not legal anyway because the in-loop load
conflict with the in-loop store, so the loads must stay in-loop.
I am happy to update the test to use update_test_checks so that
checking is tighter, just was going to do it as a followup.
Note that i can find no particular benefit to the store portion on any
real testcase/benchmark i have (even size-wise). If we really still
want it, i am happy to commit to writing a targeted store sinker, just
taking the code from the MemorySSA port of MergedLoadStoreMotion
(which is N^2 worst case, and N most of the time).
We can do what it does in a much better time bound.
We also should be both hoisting and sinking stores, not just sinking
them, anyway, since whether we should hoist or sink to merge depends
basically on luck of the draw of where the blockers are placed.
Nonetheless, i have left it alone for now.
Reviewers: chandlerc, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29079
llvm-svn: 292971
|
|
|
|
|
|
|
|
|
|
| |
Differential Revision:
http://reviews.llvm.org/D28970
Reviewer:
Matt
llvm-svn: 292966
|
|
|
|
|
|
|
|
|
|
| |
When demangling a CV-qualified function type with a final parameter with
a reference type, we would insert the CV qualification on the parameter
rather than the function, and in the process adjust the insertion point
by one extra, splitting the type name. This avoids doing so, even
though the attribution is still incorrect.
llvm-svn: 292965
|
|
|
|
| |
llvm-svn: 292959
|
|
|
|
|
|
|
|
|
|
| |
Summary: As per title. This will add the instructiions we are interested in in the worklist.
Reviewers: mehdi_amini, majnemer, andreadb
Differential Revision: https://reviews.llvm.org/D29081
llvm-svn: 292957
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Regalloc creates COPY instructions which do not formally use VALU.
That results in v_mov instructions displaced after exec mask modification.
One pass which do it is SIOptimizeExecMasking, but potentially it can be
done by other passes too.
This patch adds a pass immediately after regalloc to add implicit exec
use operand to all VGPR copy instructions.
Differential Revision: https://reviews.llvm.org/D28874
llvm-svn: 292956
|
|
|
|
|
|
|
| |
In order to follow the pattern of the existing 'slow-misaligned-128store'
option, rename the option 'no-quad-ldst-pairs' to 'slow-paired-128'.
llvm-svn: 292954
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is in keeping with LLVM convention. The classes are InstPrinters, but the library is ${target}AsmPrinter.
This patch is in response to bryant pointing out to me that Lanai was the only backend deviating from convention here. Thanks!
Reviewers: jpienaar, bryant
Subscribers: mgorny, jgosnell, llvm-commits
Differential Revision: https://reviews.llvm.org/D29043
llvm-svn: 292953
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I was surprised to see that we're missing icmp folds based on 'add nsw' in InstCombine,
but we should handle the InstSimplify cases first because that could make the InstCombine
code simpler.
Here are Alive-based proofs for the logic:
Name: add_neg_constant
Pre: C1 < 0 && (C2 > ((1<<(width(C1)-1)) + C1))
%a = add nsw i7 %x, C1
%b = icmp sgt %a, C2
=>
%b = false
Name: add_pos_constant
Pre: C1 > 0 && (C2 < ((1<<(width(C1)-1)) + C1 - 1))
%a = add nsw i6 %x, C1
%b = icmp slt %a, C2
=>
%b = false
Name: nuw
Pre: C1 u>= C2
%a = add nuw i11 %x, C1
%b = icmp ult %a, C2
=>
%b = false
Differential Revision: https://reviews.llvm.org/D29053
llvm-svn: 292952
|
|
|
|
| |
llvm-svn: 292950
|
|
|
|
|
|
|
|
|
| |
Also fixes a much worse bug where we emitted the wrong gap size for the
def range uncovered by the test for this issue.
Fixes PR31726.
llvm-svn: 292949
|
|
|
|
|
|
| |
The comment talked about replacing vpmovzxwd+vpslld+vpsrad with vpmovsxwd - which isn't valid as we're sign extending a <8 x i1> bool vector not an all/nobits <8 x i16>
llvm-svn: 292948
|
|
|
|
| |
llvm-svn: 292946
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When conditional branches with complex conditions are split into
multiple branches in SelectionDAGBuilder::FindMergedConditions, also
handle inverted conditions. These may sometimes appear without having
been optimized by InstCombine when CodeGenPrepare decides to sink and
duplicate cmp instructions, causing them to have only one use. This
problem can be increased by e.g. GVNHoist hiding more cmps from
InstCombine by combining equivalent cmps from different blocks.
For example codegen X & !(Y | Z) as:
jmp_if_X TmpBB
jmp FBB
TmpBB:
jmp_if_notY Tmp2BB
jmp FBB
Tmp2BB:
jmp_if_notZ TBB
jmp FBB
Reviewers: bogner, MatzeB, qcolombet
Subscribers: llvm-commits, hiraditya, mcrosier, sebpop
Differential Revision: https://reviews.llvm.org/D28380
llvm-svn: 292944
|
|
|
|
|
|
|
|
|
|
|
| |
and UNSUPPORTED"
After r292904 llvm-lit fails to emit the test results in the XML format for
Apple's internal buildbots.
rdar://30164800
llvm-svn: 292942
|