| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
In particular, this means we emit non-external symbols defined to
variables, such as aliases or absolute addresses.
This is needed to implement /safeseh, and it appears there was some
confusion about what symbols to emit previously.
llvm-svn: 190888
|
| |
|
|
| |
llvm-svn: 190887
|
| |
|
|
| |
llvm-svn: 190886
|
| |
|
|
|
|
|
|
|
|
| |
Some of this code is no longer necessary since int<->ptr casts are no
longer occur as of r187444.
This also fixes handling vectors of pointers, and adds a bunch of new
testcases for vectors and address spaces.
llvm-svn: 190885
|
| |
|
|
|
|
|
|
| |
Documenting a design choice to generate only medium model sequences for TLS
addresses at this time. Small and large code models could be supported if
necessary.
llvm-svn: 190883
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Large code model on PPC64 requires creating and referencing TOC entries when
using the addis/ld form of addressing. This was not being done in all cases.
The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this. Two
test cases are also modified to reflect this requirement.
Fast-isel was not creating correct code for loading floating-point constants
using large code model. This also requires the addis/ld form of addressing.
Previously we were using the addis/lfd shortcut which is only applicable to
medium code model. One test case is modified to reflect this requirement.
llvm-svn: 190882
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Upcoming SLP vectorization improvements will want to be able to estimate costs
of horizontal reductions. Add infrastructure to support this.
We model reductions as a series of (shufflevector,add) tuples ultimately
followed by an extractelement. For example, for an add-reduction of <4 x float>
we could generate the following sequence:
(v0, v1, v2, v3)
\ \ / /
\ \ /
+ +
(v0+v2, v1+v3, undef, undef)
\ /
((v0+v2) + (v1+v3), undef, undef)
%rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
<4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
%bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
%rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
<4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
%r = extractelement <4 x float> %bin.rdx8, i32 0
This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
that will allow clients to ask for the cost of such a reduction (as backends
might generate more efficient code than the cost of the individual instructions
summed up). This interface is excercised by the CostModel analysis pass which
looks for reduction patterns like the one above - starting at extractelements -
and if it sees a matching sequence will call the cost model interface.
We will also support a second form of pairwise reduction that is well supported
on common architectures (haddps, vpadd, faddp).
(v0, v1, v2, v3)
\ / \ /
(v0+v1, v2+v3, undef, undef)
\ /
((v0+v1)+(v2+v3), undef, undef, undef)
%rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
<4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
%rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
<4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
%bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
%rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
<4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
%rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
<4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
%r = extractelement <4 x float> %bin.rdx.1, i32 0
llvm-svn: 190876
|
| |
|
|
|
|
|
|
|
| |
We can't insert an insertelement after an invoke. We would have to split a
critical edge. So when we see a phi node that uses an invoke we just give up.
radar://14990770
llvm-svn: 190871
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
other in memory.
The motivation was to get rid of truncate and shift right instructions that get
in the way of paired load or floating point load.
E.g.,
Consider the following example:
struct Complex {
float real;
float imm;
};
When accessing a complex, llvm was generating a 64-bits load and the imm field
was obtained by a trunc(lshr) sequence, resulting in poor code generation, at
least for x86.
The idea is to declare that two load instructions is the canonical form for
loading two arithmetic type, which are next to each other in memory.
Two scalar loads at a constant offset from each other are pretty
easy to detect for the sorts of passes that like to mess with loads.
<rdar://problem/14477220>
llvm-svn: 190870
|
| |
|
|
| |
llvm-svn: 190869
|
| |
|
|
| |
llvm-svn: 190866
|
| |
|
|
|
|
|
|
| |
Add llvm.x86.* intrinsics for all of the Intel SHA Extensions instructions, as
well as tests. Also remove mayLoad and hasSideEffects, which can be inferred
from the instruction patterns.
llvm-svn: 190864
|
| |
|
|
|
|
| |
10%-20% speedup for use-after-return
llvm-svn: 190863
|
| |
|
|
| |
llvm-svn: 190862
|
| |
|
|
|
|
|
|
|
|
|
| |
Wrong cast operation.
MergeFunctions emits Bitcast instead of pointer-to-integer operation.
Patch fixes MergeFunctions::writeThunk function. It replaces
unconditional Bitcast creation with "Value* createCast(...)" method, that
checks operand types and selects proper instruction.
See unit-test as example.
llvm-svn: 190859
|
| |
|
|
| |
llvm-svn: 190851
|
| |
|
|
| |
llvm-svn: 190850
|
| |
|
|
| |
llvm-svn: 190849
|
| |
|
|
| |
llvm-svn: 190839
|
| |
|
|
|
|
|
|
|
|
|
| |
When a truncate node defines a legal vector type but uses an illegal
vector type, the legalization process was splitting the vector until
<1 x vector> type, but then it was failing to scalarize the node because
it did not know how to handle TRUNCATE.
<rdar://problem/14989896>
llvm-svn: 190830
|
| |
|
|
|
|
|
| |
A DBG_VALUE is register-indirect iff the first operand is a register
_and_ the second operand is an immediate.
llvm-svn: 190821
|
| |
|
|
|
|
|
|
|
|
|
|
| |
If there are no legal integers, assume 1 byte.
This makes more sense than using the pointer size as
a guess for the maximum GPR width.
It is conceivable to want to use some 64-bit pointers
on a target where 64-bit integers aren't legal.
llvm-svn: 190817
|
| |
|
|
| |
llvm-svn: 190813
|
| |
|
|
|
|
|
|
|
| |
Fast-isel generates a COPY_TO_REGCLASS for widening f32 to f64, which
is a nop on PPC64. This is needed to keep the register class system
happy, but on the fast-isel path it is not removed before emit as it
is for DAG select. Ignore this op when emitting instructions.
llvm-svn: 190795
|
| |
|
|
|
|
|
|
|
|
|
|
| |
We would have to compute the pre increment value, either by computing it on
every loop iteration or by splitting the edge out of the loop and inserting a
computation for it there.
For now, just give up vectorizing such loops.
Fixes PR17179.
llvm-svn: 190790
|
| |
|
|
| |
llvm-svn: 190782
|
| |
|
|
|
|
| |
Test cases are added.
llvm-svn: 190780
|
| |
|
|
| |
llvm-svn: 190779
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The port originally had special patterns for extload, mapping them to the
same instructions as sextload. It seemed neater to have patterns that
match "an extension that is allowed to be signed" and "an extension that
is allowed to be unsigned".
This was originally meant to be a clean-up, but it does improve the handling
of promoted integers a little, as shown by args-06.ll.
llvm-svn: 190777
|
| |
|
|
| |
llvm-svn: 190775
|
| |
|
|
|
|
|
|
|
| |
Previous discussion:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/063909.html
Differential Revision: http://llvm-reviews.chandlerc.com/D1191
llvm-svn: 190773
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a re-commit of r190764, with an extra check to make sure that we're not
performing the transformation on illegal types (a small test case has been
added for this as well).
Original commit message:
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190771
|
| |
|
|
| |
llvm-svn: 190770
|
| |
|
|
| |
llvm-svn: 190769
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The '?' flag uses the last section group if the last had a section
group. We treat combining an explicit section group and the '?' as a
hard error.
This fixes PR17198.
Reviewers: rafael, bkramer
Reviewed By: bkramer
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1686
llvm-svn: 190768
|
| |
|
|
|
|
|
|
|
|
|
| |
For alignment purposes, the instruction array will always have an even
number of entries, with the final entry potentially unused (in which
case the array will be one longer than indicated by the count of unwind
codes field).
Reviewed by Anton Korobeynikov, Charles Davis and Nico Rieck.
llvm-svn: 190767
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
data structures.
The Win64 EH data structures must be of type IMAGE_REL_AMD64_ADDR32NB
instead of IMAGE_REL_AMD64_ADDR32. This is easiely achieved by adding
the VK_COFF_IMGREL32 modifier to the symbol reference.
Change also references to start and end of the SEH range of a function
as offsets to start of the function.
Reviewed by Jim Grosbach, Charles Davis and Nico Rieck.
llvm-svn: 190766
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is causing test-suite failures.
Original commit message:
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190765
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190764
|
| |
|
|
|
|
|
| |
DAGCombiner::isAlias can be called with SrcValue1 or SrcValue2 null, and we
can't use AA in this case (if we try, then the casting code in AA will assert).
llvm-svn: 190763
|
| |
|
|
|
|
|
| |
so it can be better used for general interoperability testing between mips32
and mips16.
llvm-svn: 190762
|
| |
|
|
| |
llvm-svn: 190759
|
| |
|
|
|
|
|
| |
Also assembly/disassembly tests, and for sha256rnds2, aliases with an explicit
xmm0 dependency.
llvm-svn: 190754
|
| |
|
|
| |
llvm-svn: 190750
|
| |
|
|
| |
llvm-svn: 190749
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass was based on the previous (essentially unused) profiling
infrastructure and the assumption that by ordering the basic blocks at
the IR level in a particular way, the correct layout would happen in the
end. This sometimes worked, and mostly didn't. It also was a really
naive implementation of the classical paper that dates from when branch
predictors were primarily directional and when loop structure wasn't
commonly available. It also didn't factor into the equation
non-fallthrough branches and other machine level details.
Anyways, for all of these reasons and more, I wrote
MachineBlockPlacement, which completely supercedes this pass. It both
uses modern profile information infrastructure, and actually works. =]
llvm-svn: 190748
|
| |
|
|
| |
llvm-svn: 190746
|
| |
|
|
| |
llvm-svn: 190745
|
| |
|
|
| |
llvm-svn: 190744
|
| |
|
|
|
|
|
|
|
|
| |
This was somewhat tricky because ~PrettyStackTraceEntry() may run after
llvm_shutdown() has been called. This is rare and only happens for a common idiom
used in the main() functions of command-line tools. This works around the idiom by
skipping the stack clean-up if the PrettyStackTraceHead ManagedStatic is not
constructed (i.e. llvm_shutdown() has been called).
llvm-svn: 190730
|