| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
| |
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.
llvm-svn: 248857
|
| |
|
|
|
|
|
|
|
|
| |
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow. However, we'd insert code to
initialize the return address after stack adjustments. Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.
llvm-svn: 248839
|
| |
|
|
|
|
|
|
|
| |
Add support to the indexed instrprof reader and writer for the format
that will be used for value profiling.
Patch by Betul Buyukkurt, with minor modifications.
llvm-svn: 248833
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.
In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.
When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.
One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)
Differential Revision: http://reviews.llvm.org/D12681
llvm-svn: 248832
|
| |
|
|
| |
llvm-svn: 248825
|
| |
|
|
|
|
|
|
|
|
|
| |
Summary:
Funclets have been turned into functions by the time they hit the object
file. Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.
Differential Revision: http://reviews.llvm.org/D13261
llvm-svn: 248824
|
| |
|
|
|
|
| |
first letter into upper case. NFC.
llvm-svn: 248821
|
| |
|
|
|
|
| |
Change lookup functions to const functions.
llvm-svn: 248818
|
| |
|
|
| |
llvm-svn: 248817
|
| |
|
|
| |
llvm-svn: 248814
|
| |
|
|
|
|
| |
Change lookup functions to const functions.
llvm-svn: 248810
|
| |
|
|
|
|
|
|
|
| |
This patch corresponds to review:
http://reviews.llvm.org/D13191
Back end portion of the fifth round of additions to altivec.h.
llvm-svn: 248809
|
| |
|
|
|
|
|
|
|
| |
The immediate in the load/store should be scaled by the size of the memory
operation, not the size of the register being loaded/stored. This change gets
us one step closer to forming LDPSW instructions. This change also enables
pre- and post-indexing for halfword and byte loads and stores.
llvm-svn: 248804
|
| |
|
|
|
|
|
|
|
|
| |
thresholds
On some of our benchmarks this change shows about 50% compile time improvement without any noticeable performance difference.
Differential Revision: http://reviews.llvm.org/D13248
llvm-svn: 248801
|
| |
|
|
| |
llvm-svn: 248800
|
| |
|
|
|
|
|
|
| |
If a PHI starts at a non-negative constant, monotonically increases
(only adds of a constant are supported at the moment) and that add
does not wrap, then the PHI is known never to be zero.
llvm-svn: 248796
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.
Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have
%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)
the x86 back-end emits:
movaps 32(%ecx), %xmm2
movaps (%ecx), %xmm0
movaps 16(%ecx), %xmm1
movaps 48(%ecx), %xmm3
subl $20, %esp <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards
movaps %xmm3, (%esp) <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl $0, 16(%esp)
calll __bar
To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:
subl $32, %esp
movaps %xmm3, (%esp)
Differential Revision: http://reviews.llvm.org/D12337
llvm-svn: 248786
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.
This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector).
This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.
Differential Revision: http://reviews.llvm.org/D12935
llvm-svn: 248784
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default.
Reviewers: broune, silvas, dnovillo, reames
Subscribers: davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D11605
llvm-svn: 248777
|
| |
|
|
|
|
| |
It is described in LLVMBuild.txt.
llvm-svn: 248771
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Place new and update dbg.declare calls immediately after the
corresponding alloca.
Current code in replaceDbgDeclareForAlloca puts the new dbg.declare
at the end of the basic block. LLVM codegen has problems emitting
debug info in a situation when dbg.declare appears after all uses of
the variable. This usually kinda works for inlining and ASan (two
users of this function) but not for SafeStack (see the pending change
in http://reviews.llvm.org/D13178).
llvm-svn: 248769
|
| |
|
|
|
|
|
| |
There are always more physical registers and register units so the
previous behaviour was correct but we can do with less memory.
llvm-svn: 248767
|
| |
|
|
|
|
|
| |
Previously we were hijacking the old LandingPadInfo data structures to
communicate our state numbers. Now we don't need that anymore.
llvm-svn: 248763
|
| |
|
|
| |
llvm-svn: 248754
|
| |
|
|
| |
llvm-svn: 248750
|
| |
|
|
| |
llvm-svn: 248746
|
| |
|
|
| |
llvm-svn: 248745
|
| |
|
|
|
|
|
|
|
|
|
|
| |
`ScalarEvolution::isImpliedCondOperandsViaNoOverflow` tries to cast the
operand type of the comparison it is given to an `IntegerType`. This is
incorrect because it could actually be simplifying a comparison between
two pointers. Switch it to using `getTypeSizeInBits` instead, which
does the right thing for both pointers and integers.
Fixed PR24956.
llvm-svn: 248743
|
| |
|
|
| |
llvm-svn: 248742
|
| |
|
|
|
|
|
|
| |
When used recursively, this would set the kill flag
on the intermediate step from first splitting
x16 to x8.
llvm-svn: 248741
|
| |
|
|
| |
llvm-svn: 248740
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The splitting of > 4 dword SMRD instructions
if using an offset in an SGPR instead of an immediate
was not setting the destination register,
resulting an an instruction missing an operand
which would assert later.
Test will be included in a following commit
which fixes a related issue.
llvm-svn: 248739
|
| |
|
|
|
|
|
|
|
|
| |
mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)
Differential Revision: http://reviews.llvm.org/D11370
llvm-svn: 248735
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch by Jake VanAdrighem!
Summary:
Fix the way we sort the llvm.used and llvm.compiler.used members.
This bug seems to have been introduced in rL183756 through a set of improper casts to GlobalValue*. In subsequent patches this problem was missed and transformed into a getName call on a ConstantExpr.
Reviewers: silvas
Subscribers: silvas, llvm-commits
Differential Revision: http://reviews.llvm.org/D12851
llvm-svn: 248728
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Use a worklist, not a recursive approach, to avoid needless
revisitation and being repeatedly forced to jump back to the
start of the BB if a handle is invalidated.
2. Only insert operands to the worklist if they become unused
after a dead instruction is removed, so we don’t have to
visit them again in most cases.
3. Use a SmallSetVector to track the worklist.
4. Instead of pre-initting the SmallSetVector like in
DeadCodeEliminationPass, only put things into the worklist
if they have to be revisited after the first run-through.
This minimizes how much the actual SmallSetVector gets used,
which saves a lot of time.
llvm-svn: 248727
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The P5600 is an out-of-order, superscalar implementation of the MIPS32R5
architecture.
The scheduler has a few missing details (see the 'Tricky Instructions'
section and some quirks of the P5600 are deliberately omitted due to
implementation difficulty and low chance of significant benefit (e.g. the
predicate on P5600WriteEitherALU). However, testing on SingleSource is
showing significant performance benefits on some apps (seven in the 10-30%
range) and only one significant regression (12%) when
-pre-RA-sched=linearize is given. Without -pre-RA-sched=linearize the
results are more variable. Some do even better (up to 55% improvement) but
increased numbers of copies are slowing others down (up to 12%).
Overall, the scheduler as it currently stands is a 2.4% win with
-pre-RA-sched=linearize and a 2.7% win without -pre-RA-sched=linearize.
I'm sure we can improve on this further.
For completeness, the FPGA this was tested on shows some failures with and
without the P5600 scheduler. These appear to be scheduling related since
the two test runs have fairly different sets of failing tests even after
accounting for other factors (e.g. spurious connection failures) however
it's not P5600 specific since we also get some for the generic scheduler.
Reviewers: vkalintiris
Subscribers: mpf, llvm-commits, atrick, vkalintiris
Differential Revision: http://reviews.llvm.org/D12193
llvm-svn: 248725
|
| |
|
|
|
|
|
|
| |
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D12853
llvm-svn: 248721
|
| |
|
|
|
|
|
|
|
|
| |
This was split off of http://reviews.llvm.org/D13040 to make it easier to test the correctness of the implication logic. For the moment, this only handles a single easy case which shows up when eliminating and combining range checks. In the (near) future, I plan to extend this for other cases which show up in range checks, but I wanted to make those changes incrementally once the framework was in place.
At the moment, the implication logic will be used by three places. One in InstSimplify (this review) and two in SimplifyCFG (http://reviews.llvm.org/D13040 & http://reviews.llvm.org/D13070). Can anyone think of other locations this style of reasoning would make sense?
Differential Revision: http://reviews.llvm.org/D13074
llvm-svn: 248719
|
| |
|
|
|
|
|
|
|
| |
Originally, debug intrinsics and annotation intrinsics may prevent
the loop to be rerolled, now they are ignored.
Differential Revision: http://reviews.llvm.org/D13150
llvm-svn: 248718
|
| |
|
|
| |
llvm-svn: 248716
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D10539
llvm-svn: 248706
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
supportsTailCall() has two callers. Both of them double-check isThumb1Only(),
and refuse to proceed with tail-calling in that case.
Therefore, it makes sense to move this check to
ARMSubtarget::initSubtargetFeatures, where SupportsTailCall is initialized;
and to eliminate the extra checks at the call sites.
Following a review comment, added an "assert(supportsTailCall())"
in IsEligibleForTailCall.
NFC.
llvm-svn: 248703
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When AA is being used, non-aliasing stores are canonicalized to use the same
chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of
this by looking only as users of a store's chain operand. However, user
iteration is not result-number specific, we need to check that the use is as a
chain operand, and not via some other operand. It is certainly possible to have
another potentially-aliasing store, which shares the first's base pointer, and
uses the first's chain's node via some other operand.
Failure to catch this situation caused, at least in the included test case, an
assert later because the relative sequence-number ordering caused later
replacement to create a cycle in the DAG.
llvm-svn: 248698
|
| |
|
|
| |
llvm-svn: 248693
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When llvm declarations have argument names, it's helpful to actually
print those names when debugging. Arguably, it'd be nice to print them
all the time, but that would mean the IR we output wouldn't round trip
through bitcode, which doesn't store the names.
Make the varous print() methods in AsmWriter optionally print "for
debug" and set that flag in the dump() methods. The only thing this
does differently for now is print the argument names in declarations.
llvm-svn: 248692
|
| |
|
|
| |
llvm-svn: 248691
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Before this change `HasSameValue` would return true for distinct
`alloca` instructions if they happened to be allocating the same
type (`alloca` instructions are not specified as reading memory). This
change adds an explicit whitelist of instruction types for which
"identical" instructions compute the same value.
Fixes PR24952.
llvm-svn: 248690
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766
We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:
#include <stdbool.h>
bool switchy(char x1, char x2, char condition) {
bool conditionMet = false;
switch (condition) {
case 0: conditionMet = (x1 == x2); break;
case 1: conditionMet = (x1 <= x2); break;
}
return conditionMet;
}
bool switchy2(char x1, char x2, char condition) {
switch (condition) {
case 0: return (x1 == x2);
case 1: return (x1 <= x2);
}
return false;
}
As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it,
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.
Differential Revision: http://reviews.llvm.org/D12866
llvm-svn: 248689
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.
Reviewers: andrew.w.kaylor, majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13152
llvm-svn: 248677
|
| |
|
|
|
|
|
|
|
|
|
| |
llvm::format compiles down to snprintf which has no defined rounding for
floating point arguments, and MSVC has implemented it differently from
what the BSD libcs and glibc do. Try to emulate the glibc rounding
behavior to avoid changing tests.
While there simplify code a bit and move trivial methods inline.
llvm-svn: 248665
|