llvm-svn: 248933
Previously, the index was constrained to the size of the memory operation for
no apparent reason. This change removes that constraint so that we can form
pre-index instructions with any valid offset.
llvm-svn: 248931
Same strategy as simplifyInstructionsInBlock. ~1/3 less time
on my test suite. This pass doesn't have many in-tree users,
but getting rid of an O(N^2) worst case and making it cleaner
should at least make it a viable alternative to ADCE, since
it's now consistently somewhat faster.
llvm-svn: 248927
Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable it
until the problem can be fixed.
llvm-svn: 248924
This is an addition to rL248917.
llvm-svn: 248923
As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121,
TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK.
In particular, there were no tests for TrustZone being supported in these
architectures.
The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes
his test case which hadn't really been testing what it was claiming to test.
Differential Revision: http://reviews.llvm.org/D13236
llvm-svn: 248921
Usually large blocks are not a problem. But if a large block (> 10k instructions)
contains many (potential) chains of vector instructions, and those chains are
spread over a wide range of instructions, then scheduling becomes a compile time problem.
This change introduces a limit on the accumulated scheduling region size of a block.
For real-world functions this limit will never be exceeded (it's about 10x larger than
the maximum value seen in the test-suite and the external test suite).
llvm-svn: 248917
llvm-svn: 248914
shuffles if the shuffle mask is constant.
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.
Converting byte shuffle intrinsic calls to builtin shuffles can help find
more opportunities for combining shuffles later on in the SelectionDAG.
We may end up with byte shuffles with constant masks as the result of inlining.
Differential Revision: http://reviews.llvm.org/D13252
llvm-svn: 248913
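The conversion described above can be sketched as follows. This is a minimal model, not the InstCombiner code: it assumes the 128-bit SSSE3 form only (the AVX2 form works per 128-bit lane and is not modelled), and the names `pshufb`, `mask_to_shuffle_indices`, and `generic_shuffle` are hypothetical. A mask byte with its high bit set produces zero, which the sketch expresses by pointing the shuffle index into a second, all-zero operand.

```python
from typing import List

ZERO_LANE = 16  # first element of the second (all-zero) shuffle operand


def pshufb(src: List[int], mask: List[int]) -> List[int]:
    """Reference semantics of a 128-bit byte shuffle."""
    assert len(src) == 16 and len(mask) == 16
    # High bit set: the lane becomes zero. Otherwise the low four bits
    # of the mask byte select a source byte.
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]


def mask_to_shuffle_indices(mask: List[int]) -> List[int]:
    """Translate a constant byte-shuffle mask into two-operand indices."""
    return [ZERO_LANE if m & 0x80 else m & 0x0F for m in mask]


def generic_shuffle(a: List[int], b: List[int], indices: List[int]) -> List[int]:
    """Two-operand shuffle: indices 0..15 pick from a, 16..31 from b."""
    both = a + b
    return [both[i] for i in indices]
```

With a constant mask, `generic_shuffle(src, [0] * 16, mask_to_shuffle_indices(mask))` yields the same bytes as `pshufb(src, mask)`, which is why the intrinsic call can be replaced by a plain shuffle.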
Building a plugin against an installed LLVM toolchain using
add_llvm_loadable_module (in the documented manner) doesn't work, because nothing
sets the *_OUTPUT_INTDIR variables, which causes an error when
set_output_directory is called. Making those arguments optional (so that the
default output directory is used) fixes this.
Differential Revision: http://reviews.llvm.org/D13215
llvm-svn: 248911
Summary:
As per Duncan's review for D12536, I extracted the sub-byte bit aligned
reading and writing code into lib/Support, and generalized it. Added calls from
BackpatchWord. Also added unittests.
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13189
llvm-svn: 248897
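As an illustration of what sub-byte, bit-aligned reading and writing involves, here is a minimal sketch. It is not the lib/Support code: the function names `write_bits` and `read_bits` are hypothetical, and the sketch assumes a little-endian bit order (bit 0 of the value lands at the lowest bit offset).

```python
def write_bits(buf: bytearray, bit_offset: int, width: int, value: int) -> None:
    """Store the low `width` bits of `value` starting at `bit_offset`."""
    assert 0 <= value < (1 << width)
    for i in range(width):
        byte, bit = divmod(bit_offset + i, 8)
        if (value >> i) & 1:
            buf[byte] |= 1 << bit
        else:
            # Clear only the target bit; neighbouring bits are preserved.
            buf[byte] &= ~(1 << bit) & 0xFF


def read_bits(buf: bytes, bit_offset: int, width: int) -> int:
    """Load `width` bits starting at `bit_offset`."""
    value = 0
    for i in range(width):
        byte, bit = divmod(bit_offset + i, 8)
        value |= ((buf[byte] >> bit) & 1) << i
    return value
```

Backpatching a 32-bit word at an offset that is not a multiple of eight then amounts to one `write_bits(buf, offset, 32, word)` call that leaves the surrounding bits untouched.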
Reviewed By: reames, hfinkel
Differential Revision: http://reviews.llvm.org/D12958
llvm-svn: 248892
vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM NEON intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,
<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)
to
<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)
This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.
Differential Revision: http://reviews.llvm.org/D12985
llvm-svn: 248887
When using LLVMConfig.cmake from an installed toolchain in order to build a
loadable pass using add_llvm_loadable_module, LLVM_ENABLE_PLUGINS and
LLVM_PLUGIN_EXT must be set. Also set LLVM_DEFINITIONS to its actual value.
Differential Revision: http://reviews.llvm.org/D13214
llvm-svn: 248884
shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.
Additionally, Excavator cores (bdver4) support both XOP and AVX2, meaning that codegen should use the AVX2 shifts when it can and fall back to XOP in other cases.
Differential Revision: http://reviews.llvm.org/D8690
llvm-svn: 248878
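The sign-controlled direction of the XOP shifts can be modelled per element as below. This is a sketch under assumptions: the function name `xop_shift` is hypothetical, a 32-bit element width is chosen for the default, and shift amounts are restricted to magnitudes smaller than the element width because the sketch does not try to model hardware behaviour for larger counts.

```python
def xop_shift(value: int, amount: int, bits: int = 32, arithmetic: bool = False) -> int:
    """Shift one element: positive amounts go left, negative amounts go right."""
    assert abs(amount) < bits, "larger counts are not modelled in this sketch"
    mask = (1 << bits) - 1
    value &= mask
    if amount >= 0:
        return (value << amount) & mask
    if arithmetic and value >> (bits - 1):
        # Reinterpret as a negative number so that >> replicates the sign bit.
        value -= 1 << bits
    return (value >> -amount) & mask
```

Because one node covers both directions, a left/right pair cannot be mapped one-to-one onto the existing shift nodes, which matches the reasoning given above for adding new X86ISD nodes.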
llvm-svn: 248872
http://reviews.llvm.org/D13145
llvm-svn: 248870
Support hierarchical sample profile format.
llvm-svn: 248865
llvm-svn: 248863
llvm-svn: 248859
to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.
This is a candidate for stable llvm 3.7.
Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 248858
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.
llvm-svn: 248857
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow. However, we'd insert code to
initialize the return address after stack adjustments. Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.
llvm-svn: 248839
Add support to the indexed instrprof reader and writer for the format
that will be used for value profiling.
Patch by Betul Buyukkurt, with minor modifications.
llvm-svn: 248833
The HHVM calling convention, hhvmcc, is used by the HHVM JIT for
functions in the translation cache. We currently support using the LLVM back
end to generate code for X86-64 and may support other architectures in the
future.
In the HHVM calling convention, any GP register can be used to pass and
return values, with the exception of R12, which is reserved for the
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.
When we enter the translation cache via an hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.
One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to the standard C calling
convention, with the exception of the first argument, which is passed in RBP
(before we use RDI, RSI, etc.).
Differential Revision: http://reviews.llvm.org/D12681
llvm-svn: 248832
llvm-svn: 248827
llvm-svn: 248825
Summary:
Funclets have been turned into functions by the time they hit the object
file. Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.
Differential Revision: http://reviews.llvm.org/D13261
llvm-svn: 248824
PDB files have a lot of noise in them, with hundreds (or thousands)
of symbols from system libraries and compiler generated types. If
you're only looking for a specific type, this can be problematic.
This CL allows you to display *only* types, variables, or compilands
matching a particular pattern. These filters can even be combined
with exclude filters. Include-only filters are given priority, so
that first the set of items to display is limited only to those that
match the include filters, and then the set of exclude filters is
applied to those. If there are no include filters specified, then
it means "display everything".
llvm-svn: 248822
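The include/exclude ordering described above can be sketched as follows. The function name `filter_names` is hypothetical, and treating each filter as a Python regular expression is an assumption made for the sketch; the commit message only says "pattern".

```python
import re
from typing import Iterable, List, Sequence


def filter_names(names: Iterable[str],
                 include: Sequence[str],
                 exclude: Sequence[str]) -> List[str]:
    """Apply include-only filters first, then exclude filters."""

    def matches_any(name: str, patterns: Sequence[str]) -> bool:
        return any(re.search(p, name) for p in patterns)

    # An empty include list means "display everything".
    kept = [n for n in names if not include or matches_any(n, include)]
    # Exclude filters are applied only to what survived the include step.
    return [n for n in kept if not matches_any(n, exclude)]
```

For example, including `^My` and excluding `Helper$` would keep `MyClass` but drop both `MyClassHelper` and every name that does not start with `My`.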
first letter into upper case. NFC.
llvm-svn: 248821
Change lookup functions to const functions.
llvm-svn: 248818
llvm-svn: 248817
llvm-svn: 248814
directories; other minor cleanups.
Patch by Eugene Zelenko!
Differential Revision: http://reviews.llvm.org/D13172
llvm-svn: 248811
Change lookup functions to const functions.
llvm-svn: 248810
This patch corresponds to review:
http://reviews.llvm.org/D13191
Back end portion of the fifth round of additions to altivec.h.
llvm-svn: 248809
The immediate in the load/store should be scaled by the size of the memory
operation, not the size of the register being loaded/stored. This change gets
us one step closer to forming LDPSW instructions. This change also enables
pre- and post-indexing for halfword and byte loads and stores.
llvm-svn: 248804
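The scaling rule can be illustrated with a small sketch. The helper name `scaled_immediate` is hypothetical and the sketch makes no claim about encodable ranges; it only shows which size the byte offset is divided by. A sign-extending word load (the LDPSW case mentioned above) reads 4 bytes into an 8-byte register, so the two sizes differ.

```python
def scaled_immediate(byte_offset: int, mem_op_bytes: int) -> int:
    """Scale a byte offset by the size of the memory operation."""
    if byte_offset % mem_op_bytes != 0:
        raise ValueError("offset is not a multiple of the memory operation size")
    return byte_offset // mem_op_bytes


# A 4-byte load into an 8-byte register, 8 bytes past the base address.
MEM_OP_BYTES = 4
REGISTER_BYTES = 8
correct = scaled_immediate(8, MEM_OP_BYTES)    # scaled by the memory size
mistaken = scaled_immediate(8, REGISTER_BYTES)  # scaled by the register size
```

Here `correct` and `mistaken` differ by a factor of two, which is the kind of discrepancy the change removes.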
thresholds
On some of our benchmarks this change shows about 50% compile time improvement without any noticeable performance difference.
Differential Revision: http://reviews.llvm.org/D13248
llvm-svn: 248801
llvm-svn: 248800
Currently LLVM_COMPILER_IS_GCC_COMPATIBLE is set as a side-effect of determining
the stdlib to use in HandleLLVMStdlib, which causes problems when attempting to
use AddLLVM from an installed LLVM toolchain, as HandleLLVMStdlib is not used.
Move the setting of this variable into DetermineGCCCompatible and include that
from both AddLLVM and HandleLLVMStdlib.
Differential Revision: http://reviews.llvm.org/D13216
llvm-svn: 248798
If a PHI starts at a positive (non-negative and non-zero) constant, monotonically
increases (only adds of a constant are supported at the moment) and that add
does not wrap, then the PHI is known never to be zero.
llvm-svn: 248796
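The reasoning can be checked with a small model. This is a sketch, not the ValueTracking code: the names `phi_known_nonzero` and `phi_values` are hypothetical, and the sketch requires a strictly positive start, since a PHI that starts at zero is zero on its first iteration.

```python
from typing import Iterator


def phi_known_nonzero(start: int, step: int, add_cannot_wrap: bool) -> bool:
    """Conservative check: positive start, non-decreasing, add never wraps."""
    return start > 0 and step >= 0 and add_cannot_wrap


def phi_values(start: int, step: int, bits: int, count: int) -> Iterator[int]:
    """Values taken by the PHI when the add DOES wrap at `bits` bits."""
    mask = (1 << bits) - 1
    value = start & mask
    for _ in range(count):
        yield value
        value = (value + step) & mask
```

The second helper shows why the no-wrap condition matters: an 8-bit PHI starting at 1 and incremented by 1 reaches zero once the add wraps around.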
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.
Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have
%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)
the x86 back-end emits:
movaps 32(%ecx), %xmm2
movaps (%ecx), %xmm0
movaps 16(%ecx), %xmm1
movaps 48(%ecx), %xmm3
subl $20, %esp <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards
movaps %xmm3, (%esp) <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl $0, 16(%esp)
calll __bar
To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple of the maximal alignment seen
during its computation. With this change we get proper alignment:
subl $32, %esp
movaps %xmm3, (%esp)
Differential Revision: http://reviews.llvm.org/D12337
llvm-svn: 248786
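The adjustment from 20 to 32 bytes in the example above is a round-up to the next multiple of the maximal alignment. A minimal sketch of that arithmetic, with a hypothetical helper name `align_to` and assuming power-of-two alignments:

```python
def align_to(size: int, alignment: int) -> int:
    """Round `size` up to the next multiple of a power-of-two `alignment`."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (size + alignment - 1) & ~(alignment - 1)


# 16 bytes for the vector argument plus 4 bytes for the i32 argument,
# with a maximal alignment of 16 seen while computing the displacement.
adjusted = align_to(20, 16)
```

Here `adjusted` is 32, matching the `subl $32, %esp` in the corrected output.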
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.
This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases where the vectors alias cleanly (i.e. the number of elements in one vector is an exact multiple of the number of elements in the other).
This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.
Differential Revision: http://reviews.llvm.org/D12935
llvm-svn: 248784
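How demanded elements can be carried through a bitcast when the element counts alias cleanly is sketched below. This is an illustrative model, not the SimplifyDemandedVectorElts code: the function name `demanded_source_elts` and the list-of-booleans representation are assumptions.

```python
from typing import List


def demanded_source_elts(demanded: List[bool], src_count: int) -> List[bool]:
    """Map demanded elements of a bitcast result back onto its source vector."""
    dst_count = len(demanded)
    if src_count == dst_count:
        return list(demanded)
    if src_count % dst_count == 0:
        # Each result element covers `ratio` consecutive source elements.
        ratio = src_count // dst_count
        return [demanded[i // ratio] for i in range(src_count)]
    if dst_count % src_count == 0:
        # Each source element covers `ratio` result elements; it is
        # demanded if any of them is.
        ratio = dst_count // src_count
        return [any(demanded[i * ratio:(i + 1) * ratio]) for i in range(src_count)]
    raise ValueError("vectors do not alias cleanly")
```

For the SSE shift case mentioned above, a `<2 x i64>` value bitcast to four 32-bit elements with only result element 2 demanded maps back to demanding source element 1 alone.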
llvm-svn: 248783
Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default.
Reviewers: broune, silvas, dnovillo, reames
Subscribers: davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D11605
llvm-svn: 248777
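The hot/cold decision described in the summary can be sketched as follows. The function names and the integer frequency representation are hypothetical; only the 20% default, the off-by-default switch, and the "trivial unswitches only in cold regions" rule come from the text above.

```python
def is_cold(block_freq: int, entry_freq: int, cold_percent: int = 20) -> bool:
    """A block is cold if its frequency is below cold_percent of the entry's."""
    return block_freq * 100 < entry_freq * cold_percent


def frequency_allows_unswitch(block_freq: int, entry_freq: int,
                              is_trivial: bool,
                              use_block_frequency: bool = False) -> bool:
    """Whether the block-frequency heuristic permits this unswitch."""
    if not use_block_frequency:
        return True  # feature is off by default: behave as before
    if is_cold(block_freq, entry_freq):
        return is_trivial  # cold regions: trivial unswitches only
    return True  # hot regions: behave as before
```

The sketch only models the new frequency gate; any other limits the pass applies to non-trivial unswitching are outside its scope.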
It is described in LLVMBuild.txt.
llvm-svn: 248771
Place new and updated dbg.declare calls immediately after the
corresponding alloca.
Current code in replaceDbgDeclareForAlloca puts the new dbg.declare
at the end of the basic block. LLVM codegen has problems emitting
debug info in a situation when dbg.declare appears after all uses of
the variable. This usually kinda works for inlining and ASan (two
users of this function) but not for SafeStack (see the pending change
in http://reviews.llvm.org/D13178).
llvm-svn: 248769
There are always more physical registers and register units so the
previous behaviour was correct but we can do with less memory.
llvm-svn: 248767
Previously we were hijacking the old LandingPadInfo data structures to
communicate our state numbers. Now we don't need that anymore.
llvm-svn: 248763
llvm-svn: 248754