| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
for now.
llvm-svn: 295327
|
|
|
|
| |
llvm-svn: 295326
|
|
|
|
| |
llvm-svn: 295321
|
|
|
|
|
|
|
|
| |
load combine"
This change causes some of AMDGPU and PowerPC tests to fail.
llvm-svn: 295316
|
|
|
|
|
|
|
|
|
|
| |
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295314
|
|
|
|
|
|
|
|
| |
Since they're only used for passing around double precision floating point
values into the general purpose registers, we'll lower them to VMOVDRR and
VMOVRRD.
llvm-svn: 295310
|
|
|
|
|
|
| |
Just use VADDD if available, bail out if not.
llvm-svn: 295309
|
|
|
|
| |
llvm-svn: 295308
|
|
|
|
|
|
|
| |
Support G_SEQUENCE and G_EXTRACT as needed for passing double precision floating
point values in the soft-fp float mode.
llvm-svn: 295306
|
|
|
|
|
|
|
| |
Also add mappings for single and double precision FP, and use them for G_FADD
and G_LOAD.
llvm-svn: 295302
|
|
|
|
|
|
|
|
| |
For now we just mark them as legal all the time and let the other passes bail
out if they can't handle it. In the future, we'll want to move more of the
brains into the legalizer.
llvm-svn: 295300
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the hard float calling convention, we just use the D registers.
For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.
For pure soft float targets, we still bail out because we don't support the
libcalls yet.
llvm-svn: 295295
|
|
|
|
|
|
| |
intrinsics like it does 128/256-bit.
llvm-svn: 295294
|
|
|
|
|
|
|
|
| |
intrinsics with select instructions. For 512-bit add new unmasked intrinsics.
The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO.
llvm-svn: 295290
|
|
|
|
| |
llvm-svn: 295276
|
|
|
|
| |
llvm-svn: 295273
|
|
|
|
| |
llvm-svn: 295270
|
|
|
|
|
|
| |
Update test uses with expansion in terms of new intrinsics.
llvm-svn: 295269
|
|
|
|
| |
llvm-svn: 295268
|
|
|
|
| |
llvm-svn: 295265
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.
Differential Revision: https://reviews.llvm.org/D29856
llvm-svn: 295262
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D30008
llvm-svn: 295260
|
|
|
|
|
|
|
|
| |
Uses a Custom implementation because the slot sizes being a multiple of the
pointer size isn't really universal, even for the architectures that do have a
simple "void *" va_list.
llvm-svn: 295255
|
|
|
|
|
|
|
| |
Since (say) i128 and [16 x i8] map to the same type in generic MIR, we also
need to attach the required alignment info.
llvm-svn: 295254
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes PR 31921
Summary:
Predicateinfo requires an ugly workaround to try to avoid literal
struct types due to the intrinsic mangling not being implemented.
This workaround actually does not work in all cases (you can hit the
assert by bootstrapping with -print-predicateinfo), and can't be made
to work without DFS'ing the type (IE copying getMangledStr and using a
version that detects if it would crash).
Rather than do that, i just implemented the mangling. It seems
simple, since they are unified structurally.
Looking at the overloaded-mangling testcase we have, it actually turns
out the gc intrinsics will *also* crash if you try to use a literal
struct. Thus, the testcase added fails before this patch, and works
after, without needing to resort to predicateinfo.
Reviewers: chandlerc, davide
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D29925
llvm-svn: 295253
|
|
|
|
| |
llvm-svn: 295247
|
|
|
|
| |
llvm-svn: 295246
|
|
|
|
| |
llvm-svn: 295244
|
|
|
|
|
|
| |
other minor fixes (NFC).
llvm-svn: 295243
|
|
|
|
|
|
| |
Tests will be included with future commit.
llvm-svn: 295242
|
|
|
|
| |
llvm-svn: 295241
|
|
|
|
|
|
| |
Also use a more refined condition.
llvm-svn: 295239
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In rL291613, the section name was interned in LLVMContext. However,
this broke the ability to remove the section from a GlobalObject,
because it tried to intern empty strings, which is not allowed.
Fix that and add an appropriate regression test.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D29795
llvm-svn: 295238
|
|
|
|
| |
llvm-svn: 295237
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cl::opt and verify from releaseMemory().
This is a short term solution to the problem that many passes currently fail
to update the assumption cache. In the long term the verifier should not
be controllable with a flag. We should either fix all passes to correctly
update the assumption cache and enable the verifier unconditionally or
somehow arrange for the assumption list to be updated automatically by passes.
Differential Revision: https://reviews.llvm.org/D30003
llvm-svn: 295236
|
|
|
|
|
|
| |
Minor performance speedup - if any call to getShuffleScalarElt fails to get a result, don't both calling for the remaining elements as EltsFromConsecutiveLoads will fail anyhow.
llvm-svn: 295235
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They are register promoted by ISel and so it makes no sense to treat them as
memory.
Inserting calls to the thread sanitizer would also generate invalid IR.
You would hit:
"swifterror value can only be loaded and stored from, or as a swifterror
argument!"
llvm-svn: 295230
|
|
|
|
|
|
|
|
|
|
|
| |
am_ldrlit diverged from am_brcond in r207105, but kept the OtherVT
operand type. It made sense for branch targets, as those are
represented as MVT::Other in SDAG. But loads operate on pointers.
This shouldn't have an observable effect on any in-tree code, but helps
make the patterns consistent for external users.
llvm-svn: 295229
|
|
|
|
| |
llvm-svn: 295228
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add a field to LTO::Config, CGFileType, to select the file type to emit (object
or assembly). This is useful for testing and to implement -save-temps.
Reviewers: tejohnson, mehdi_amini, pcc
Reviewed By: mehdi_amini
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D29475
llvm-svn: 295226
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:
A B
|\ /|
| \ / |
| X |
| / \ |
|/ \|
C D
would be laid out A; B->C ; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a a group of 2 or more
predecessor blocks that all have the same successors.
because of this we can tail duplicate to extend existing trellises.
As an example consider the following CFG:
B D F H
/ \ / \ / \ / \
A---C---E---G---Ret
Where A,C,E,G are all small (Currently 2 instructions).
The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.
The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),F(G),G,H,ret
define void @straight_test(i32 %tag) {
entry:
br label %test1
test1: ; A
%tagbit1 = and i32 %tag, 1
%tagbit1eq0 = icmp eq i32 %tagbit1, 0
br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
call void @a()
br label %test2
test2: ; C
%tagbit2 = and i32 %tag, 2
%tagbit2eq0 = icmp eq i32 %tagbit2, 0
br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
call void @b()
br label %test3
test3: ; E
%tagbit3 = and i32 %tag, 4
%tagbit3eq0 = icmp eq i32 %tagbit3, 0
br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
call void @c()
br label %test4
test4: ; G
%tagbit4 = and i32 %tag, 8
%tagbit4eq0 = icmp eq i32 %tagbit4, 0
br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
call void @d()
br label %exit
exit:
ret void
}
here is the layout after D27742:
straight_test: # @straight_test
; ... Prologue elided
; BB#0: # %entry ; A (merged with test1)
; ... More prologue elided
mr 30, 3
andi. 3, 30, 1
bc 12, 1, .LBB0_2
; BB#1: # %test2 ; C
rlwinm. 3, 30, 0, 30, 30
beq 0, .LBB0_3
b .LBB0_4
.LBB0_2: # %optional1 ; B (copy of C)
bl a
nop
rlwinm. 3, 30, 0, 30, 30
bne 0, .LBB0_4
.LBB0_3: # %test3 ; E
rlwinm. 3, 30, 0, 29, 29
beq 0, .LBB0_5
b .LBB0_6
.LBB0_4: # %optional2 ; D (copy of E)
bl b
nop
rlwinm. 3, 30, 0, 29, 29
bne 0, .LBB0_6
.LBB0_5: # %test4 ; G
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
b .LBB0_7
.LBB0_6: # %optional3 ; F (copy of G)
bl c
nop
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
.LBB0_7: # %optional4 ; H
bl d
nop
.LBB0_8: # %exit ; Ret
ld 30, 96(1) # 8-byte Folded Reload
addi 1, 1, 112
ld 0, 16(1)
mtlr 0
blr
The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layouts
of such trellises, and decrease the cost calculation for tail-duplication
accordingly.
This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive, which is that it handles the case where there are long strings
of skipped blocks much better than the original layout. Both layouts
handle runs of executed blocks equally well. Branch prediction also
improves if there is any correlation between subsequent optional blocks.
Here is the resulting concrete layout:
straight_test: # @straight_test
; BB#0: # %entry ; A (merged with test1)
mr 30, 3
andi. 3, 30, 1
bc 12, 1, .LBB0_4
; BB#1: # %test2 ; C
rlwinm. 3, 30, 0, 30, 30
bne 0, .LBB0_5
.LBB0_2: # %test3 ; E
rlwinm. 3, 30, 0, 29, 29
bne 0, .LBB0_6
.LBB0_3: # %test4 ; G
rlwinm. 3, 30, 0, 28, 28
bne 0, .LBB0_7
b .LBB0_8
.LBB0_4: # %optional1 ; B (Copy of C)
bl a
nop
rlwinm. 3, 30, 0, 30, 30
beq 0, .LBB0_2
.LBB0_5: # %optional2 ; D (Copy of E)
bl b
nop
rlwinm. 3, 30, 0, 29, 29
beq 0, .LBB0_3
.LBB0_6: # %optional3 ; F (Copy of G)
bl c
nop
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
.LBB0_7: # %optional4 ; H
bl d
nop
.LBB0_8: # %exit
Differential Revision: https://reviews.llvm.org/D28522
llvm-svn: 295223
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D29975
llvm-svn: 295220
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They are register promoted by ISel and so it makes no sense to treat them as
memory.
Inserting calls to the thread sanitizer would also generate invalid IR.
You would hit:
"swifterror value can only be loaded and stored from, or as a swifterror
argument!"
llvm-svn: 295215
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.
This fixes PR311956.
Differential Revision: https://reviews.llvm.org/D29961
llvm-svn: 295213
|
|
|
|
|
|
|
|
| |
shuffle combining
Only do this for integer types currently - floats types (in particular insertps) load folding often fails with this.
llvm-svn: 295208
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reverts region's scheduling to the original untouched state
in case if we have have decreased occupancy.
In addition it switches to use TargetRegisterInfo occupancy callback
for pressure limits instead of gradually increasing limits which were
just passed by. We are going to stay with the best schedule so we do
not need to tolerate worsened scheduling anymore.
Differential Revision: https://reviews.llvm.org/D29971
llvm-svn: 295206
|
|
|
|
|
|
|
|
|
| |
This reverts commit r294617.
We fail on an assert while trying to get a condition from an
unconditional branch.
llvm-svn: 295200
|
|
|
|
| |
llvm-svn: 295185
|
|
|
|
| |
llvm-svn: 295181
|
|
|
|
| |
llvm-svn: 295179
|