If we have a callee cleanup convention, the callee is going to pop the
arguments off the stack, not push them on.
llvm-svn: 200566
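A hedged illustration of callee cleanup (not from the commit): under a
callee-cleanup convention such as 32-bit x86 __stdcall, the callee pops
its own arguments on return.

  // MSVC-style C++ sketch: on 32-bit x86 this returns with 'ret 8',
  // so the callee itself pops its two 4-byte arguments.
  int __stdcall add(int a, int b) { return a + b; }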
MSVC always places the 'this' parameter for a method first. The
implicit 'sret' pointer for methods always comes second. We already
implement this for __thiscall by putting sret parameters on the stack,
but __cdecl methods require putting both parameters on the stack in the
opposite order.
Using a special calling convention allows frontends to keep the sret
parameter first, which avoids breaking lots of assumptions in LLVM and
Clang.
Fixes PR15768 with the corresponding change in Clang.
Reviewers: ributzka, majnemer
Differential Revision: http://llvm-reviews.chandlerc.com/D2663
llvm-svn: 200561
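A minimal C++ sketch of the ordering described above (the struct and
method names are invented for illustration):

  // For a __cdecl method that returns a struct by value, MSVC passes
  // 'this' first and the hidden sret (return-slot) pointer second, both
  // on the stack -- the reverse of the order LLVM and Clang normally
  // assume for sret.
  struct Big { int data[4]; };
  struct S {
    Big __cdecl get(); // callee sees [this][sret] on the stack
  };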
This instruction is only available on Mips64 cores that implement the MSA ASE.
llvm-svn: 200543
loop vectorizer to not do so when runtime pointer checks are needed and
share code with the new (not yet enabled) load/store saturation runtime
unrolling. Also ensure that we only consider the runtime checks when the
loop hasn't already been vectorized. If it has, the runtime check cost
has already been paid.
I've fleshed out a test case to cover the scalar unrolling as well as
the vector unrolling, and commented clearly on why we are or aren't
following the pattern.
llvm-svn: 200530
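A sketch of the kind of loop where runtime pointer checks arise (an
assumed example, not from the commit): the vectorized version is guarded
by an overlap check because A and B may alias.

  // The vector loop only runs if [A, A+N) and [B, B+N) are disjoint;
  // emitting and executing that check is the runtime cost discussed
  // above.
  void scale(float *A, float *B, int N) {
    for (int i = 0; i < N; ++i)
      A[i] = 2.0f * B[i];
  }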
TSFlags. This greatly simplifies the switch statements in the disassembler tables and the code emitters.
llvm-svn: 200522
had special handling anyway, and this enables a future patch.
llvm-svn: 200520
for VEX-encoded instructions too. This allows 32-bit addressing to work in 64-bit mode.
llvm-svn: 200517
for VEX-encoded instructions too. This allows 32-bit addressing to work in 64-bit mode.
llvm-svn: 200516
The entry block of a function starts with all the static allocas. The change
in r195513 splits the block before those allocas, which has the effect of
turning them into dynamic allocas. That breaks all sorts of things. Change to
split after the initial allocas, and also add a comment explaining why the
block is split.
llvm-svn: 200515
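A minimal sketch of the splitting rule described above, against the LLVM
C++ API (the helper name is invented; this is not the patch itself):

  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // Split the entry block *after* its leading static allocas; splitting
  // before them would turn them into dynamic allocas.
  static BasicBlock *splitAfterAllocas(BasicBlock &Entry) {
    BasicBlock::iterator IP = Entry.begin();
    while (isa<AllocaInst>(&*IP)) // stops at the first non-alloca
      ++IP;
    return Entry.splitBasicBlock(IP, "entry.split");
  }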
llvm-svn: 200509
when the input is a concat_vectors and the insert replaces one of the
concat halves:
Lower half: fold (insert_subvector (concat_vectors X, Y), Z) -> (concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) -> (concat_vectors X, Z)
This can be seen with the following IR:
define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x float> %v3) {
  %1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x float> %1, <4 x float> %v3, i8 0)
The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.
Using AVX, without the patch this generates two vinsertf128 instructions:
vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0
With the patch this is optimized into:
vinsertf128 $1, %xmm1, %ymm2, %ymm0
Patch by Robert Lougher.
llvm-svn: 200506
they're not legal.
llvm-svn: 200503
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
The previous attempt, r200431, was reverted in r200434 because of
two test case failures. I modified my patch a little, but forgot
to re-run "make check-all".
The test case CodeGen/ARM/lsr-unfolded-offset.ll is updated because of
the patch's impact on branch probability, which causes changes in
spill placement.
llvm-svn: 200502
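A source-level analogy for the transform whose weights are being fixed
(illustrative C++ only, not the actual CodeGen implementation):

  void work();

  // One branch on a combined condition...
  void before(bool a, bool b) {
    if (a || b)
      work();
  }

  // ...becomes two branches; both new edges need weights that together
  // preserve the original probability of reaching work().
  void after(bool a, bool b) {
    if (a)
      work();
    else if (b)
      work();
  }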
This doesn't set errno, so this should be OK.
Also update the documentation to explicitly state
that errno is not set.
llvm-svn: 200501
These should end up (in ELF) as R_X86_64_32S relocs, not R_X86_64_32.
Kill the horrid and incomplete special case and FIXME in
EncodeInstruction() and set things up so it can infer the signedness
from the ImmType just like it can the size and whether it's PC-relative.
llvm-svn: 200495
v8i16, v16i8 types.
llvm-svn: 200491
.secidx target
llvm-svn: 200490
COFF has only one symbol table.
MachO has an LC_DYSYMTAB, but that is not a symbol table, just extra info about
the one symbol table (LC_SYMTAB).
IR (coming soon) also has only one table.
llvm-svn: 200488
stackmap/patchpoint intrinsic.
Re-applying the patch, but this time without using AsmPrinter methods.
Reviewed by Andy
llvm-svn: 200481
Broken in r200388.
llvm-svn: 200466
llvm-svn: 200465
llvm-svn: 200461
SSE. Use predicates instead.
llvm-svn: 200458
llvm-svn: 200455
The SWAP instruction only exists in a 32-bit variant, but the 64-bit
atomic swap can be implemented in terms of CASX, like the other atomic
rmw primitives.
llvm-svn: 200453
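A generic C++ sketch of building swap from compare-and-swap, the same
idea used here for the 64-bit case via CASX (illustrative, not the
backend lowering):

  #include <atomic>

  // 'Old' is refreshed on every failed compare_exchange, so the loop
  // exits exactly when the exchange succeeds and returns the prior value.
  long atomicSwap(std::atomic<long> &A, long NewVal) {
    long Old = A.load();
    while (!A.compare_exchange_weak(Old, NewVal)) {
      // retry with the freshly observed value in Old
    }
    return Old;
  }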
The .object_arch directive indicates an alternative architecture to be specified
in the object file. The directive does *not* affect the enabled feature bits
for object file generation. This is particularly useful when the code
performs runtime detection and would like to indicate a lower architecture
than the actual instructions used as the requirement.
llvm-svn: 200451
.movsp is an ARM unwinding directive that indicates to the unwinder that a
register contains an offset from the current stack pointer. If the offset is
unspecified, it defaults to zero.
llvm-svn: 200449
This enhances the ARMAsmParser to handle .tlsdescseq directives. This is a
slightly special relocation: we must be able to generate it, but not consume
it, in assembly. The relocation is meant to assist the linker in generating a
TLS descriptor sequence. The ELF target streamer is enhanced to append
additional fixups into the current segment, and that is used to emit the new
R_ARM_TLS_DESCSEQ relocations.
llvm-svn: 200448
Add support for tlsdesc relocations, which are part of the ABI but marked as
experimental. These relocations permit the linker to perform TLS reference
optimizations.
llvm-svn: 200447
This adds support for TLS CALL relocations. TLS CALL relocations are used to
tell the linker to generate appropriate entries to resolve TLS references
via an appropriate function invocation (e.g. __tls_get_addr(PLT)).
In order to accommodate the linker relaxation of the TLS access model for the
references (GD/LD -> IE, IE -> LE), the relocation addend must be incomplete.
This requires that the partial in-place value is also incomplete (i.e. 0). We
simply avoid the offset value calculation at the time of the fixup adjustment
in the ARM assembler backend.
llvm-svn: 200446
stackmap/patchpoint intrinsic."
This reverts commit r200444 to unbreak buildbots.
llvm-svn: 200445
stackmap/patchpoint intrinsic.
Reviewed by Andy
llvm-svn: 200444
None of the object file formats reported an error on iterator increment. In
retrospect, that is not too surprising: no object format stores symbols or
sections in a linked list or other structure that requires chasing pointers.
As a consequence, all error checking can be done in begin() and end().
This reduces the text segment of bin/llvm-readobj on my machine from 521233
to 518526 bytes.
llvm-svn: 200442
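A sketch of what iteration looks like once increments cannot fail
(method names follow the current llvm::object API as an assumption; the
spelling at the time of this commit differed slightly):

  #include "llvm/Object/ObjectFile.h"
  using namespace llvm::object;

  // A plain iterator walk: error checking is confined to constructing
  // begin()/end(), not to each ++.
  static unsigned countSymbols(const ObjectFile &Obj) {
    unsigned N = 0;
    for (symbol_iterator I = Obj.symbol_begin(), E = Obj.symbol_end();
         I != E; ++I)
      ++N;
    return N;
  }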
triple'
This incorporates a couple of fixes reviewed at http://llvm-reviews.chandlerc.com/D2651
llvm-svn: 200440
llvm-svn: 200434
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
llvm-svn: 200431
This commit only handles IfConvertTriangle. To update the edge weights
of a successor, one interface is added to MachineBasicBlock:
/// Set successor weight of a given iterator.
setSuccWeight(succ_iterator I, uint32_t weight)
The existing test case test/CodeGen/Thumb2/v8_IT_5.ll is updated: since
we now correctly update the edge weights, the cold block is placed at
the end of the function and we jump to it.
llvm-svn: 200428
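A hedged usage sketch for the interface added above (the surrounding
setup is assumed; the interface is as quoted in the commit message):

  #include "llvm/CodeGen/MachineBasicBlock.h"
  using namespace llvm;

  // Set the weight of MBB's first successor edge; assumes MBB has at
  // least one successor and was built with edge weights.
  static void reweightFirstSucc(MachineBasicBlock *MBB, uint32_t Weight) {
    MBB->setSuccWeight(MBB->succ_begin(), Weight);
  }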
when we create the subprogram DIE.
llvm-svn: 200426
are relative to in the compile unit. Currently let's just use 0...
Thanks to Greg Clayton for the catch!
llvm-svn: 200425
module since there's no range guarantee that we could make given
output order. This also fixes up the testcases that have multiple
CUs to have the correct range offset.
llvm-svn: 200422
output ordering.
llvm-svn: 200421
llvm-svn: 200420
The Linux kernel makes use of a GAS `feature' which substitutes nothing
for macro arguments which aren't specified.
Proper support for this kind of macro argument necessitated a cleanup of
the differences between `GAS' and `Darwin' dialect macro processing.
Differential Revision: http://llvm-reviews.chandlerc.com/D2634
llvm-svn: 200409
This can still be overridden by explicitly setting a value requirement on the
alias option, but by default it should be the same.
PR18649
llvm-svn: 200407
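A small sketch of the default being fixed, using llvm::cl (option names
are invented for illustration):

  #include "llvm/Support/CommandLine.h"
  using namespace llvm;

  // The alias "-v" now inherits the value requirement of "-verbose" by
  // default, so both spellings parse the same way.
  static cl::opt<bool> Verbose("verbose", cl::desc("Verbose output"));
  static cl::alias VerboseA("v", cl::desc("Alias for -verbose"),
                            cl::aliasopt(Verbose));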
Also replaces the test case for r180790 (support for absolute non-extern
relocs) with a more robust version.
<rdar://problem/15864721>
llvm-svn: 200404
llvm-svn: 200403
llvm-svn: 200401
This instruction is only available on Mips64 cores
that implement the MSA ASE.
llvm-svn: 200400
These instructions are only available on Mips64 cores
that implement the MSA ASE.
llvm-svn: 200398
preserve loop simplify of enclosing loops.
The problem here starts with LoopRotation, which ends up cloning code out
of the latch into the new preheader it is building. This can create
a new edge from the preheader into the exit block of the loop which
breaks LoopSimplify form. The code tries to fix this by splitting the
critical edge between the latch and the exit block to get a new exit
block that only the latch dominates. This sadly isn't sufficient.
The exit block may be an exit block for multiple nested loops. When we
clone an edge from the latch of the inner loop to the new preheader
being built in the outer loop, we create an exiting edge from the outer
loop to this exit block. Despite breaking the LoopSimplify form for the
inner loop, this is fine for the outer loop. However, when we split the
edge from the inner loop to the exit block, we create a new block which
is in neither the inner nor outer loop as the new exit block. This is
a predecessor to the old exit block, and so the split itself takes the
outer loop out of LoopSimplify form. We need to split every edge
entering the exit block from inside a loop nested more deeply than the
exit block in order to preserve all of the loop simplify constraints.
Once we try to do that, a problem with splitting critical edges
surfaces. Previously, we took a very brute-force approach to updating
LoopSimplify form by re-computing it for all exit blocks. We don't need to
do this, and doing this much will sometimes, but not always, overlap with the
LoopRotate bug fix. Instead, the code needs to specifically handle the
cases which can start to violate LoopSimplify -- they aren't that
common. We need to see if the destination of the split edge was a loop
exit block in simplified form for the loop of the source of the edge.
For this to be true, all the predecessors need to be in the exact same
loop as the source of the edge being split. If the dest block was
originally in this form, we have to split all of the edges back into
this loop to recover it. The old mechanism of doing this was
conservatively correct because at least *one* of the exiting blocks it
rewrote was the DestBB and so the DestBB's predecessors were fixed. But
this is a much more targeted way of doing it. Making it targeted is
important, because ballooning the set of edges touched prevents
LoopRotate from being able to split edges *it* needs to split to
preserve loop simplify in a coherent way -- the critical edge splitting
would sometimes find some of the edges in need of splitting, but not
others.
Many, *many* thanks to Nick for his help reducing these test cases
mightily, and for lots of help with the analysis here, as this one was
quite tricky to track down.
llvm-svn: 200393
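A sketch of the invariant at the heart of the fix (an assumed helper
written against the LLVM C++ API, not code from the patch):

  #include "llvm/Analysis/LoopInfo.h"
  #include "llvm/IR/CFG.h"
  using namespace llvm;

  // An exit block is a simplified exit for loop L only if every one of
  // its predecessors lies in exactly L -- the condition used to decide
  // which edges entering the exit block must be re-split.
  static bool isSimplifiedLoopExit(const BasicBlock *BB, const Loop *L,
                                   LoopInfo &LI) {
    for (const_pred_iterator PI = pred_begin(BB), E = pred_end(BB);
         PI != E; ++PI)
      if (LI.getLoopFor(*PI) != L)
        return false;
    return true;
  }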