The loop optimizers were assuming that scales > 1 were OK. I think this
is actually a bug in TargetLoweringBase::isLegalAddressingMode(): it
seems to be trying to reject anything that isn't r+i or r+r, but it has
no default case for scales other than 0, 1 or 2. Implementing the hook
for z does mean that z can no longer be used to test any change there, though.
llvm-svn: 187497
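To make the r+i / r+r point concrete, here is a minimal standalone sketch of such a check. This is an illustration of the intent rather than the actual SystemZ hook: the AddrMode struct is redefined locally and only mirrors the fields of LLVM's TargetLowering::AddrMode that matter here.

```cpp
#include <cstdint>

// Local stand-in for the relevant fields of TargetLowering::AddrMode.
// Only Scale matters for the r+i / r+r point; the other fields are
// shown for context.
struct AddrMode {
  bool HasBaseReg = false; // is a base register present?
  int64_t BaseOffs = 0;    // constant displacement
  int64_t Scale = 0;       // 0 = no index register, 1 = unscaled index
};

// Accept only reg, reg+imm (r+i) and reg+reg (r+r) forms; any scaled
// index register is rejected explicitly rather than falling through to
// a default that happens to accept it.
bool isLegalAddressingModeSketch(const AddrMode &AM) {
  return AM.Scale == 0 || AM.Scale == 1;
}
```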
Extend r187495 to conditional loads. I split this out because the
easiest way seemed to be to force a particular operand order in
SystemZISelDAGToDAG.cpp.
llvm-svn: 187496
System z branches have a mask to select which of the 4 CC values should
cause the branch to be taken. We can invert a branch by inverting the mask.
However, not all instructions can produce all 4 CC values, so inverting
the branch like this can lead to some oddities. For example, integer
comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater).
If an integer EQ is reversed to NE before instruction selection,
the branch will test for 1 or 2. If instead the branch is reversed
after instruction selection (by inverting the mask), it will test for
1, 2 or 3. Both are correct, but the second isn't really canonical.
This patch therefore keeps track of which CC values are possible
and uses this when inverting a mask.
Although this is mostly cosmetic, it fixes undefined behavior
for the CIJNLH in branch-08.ll. Another fix would have been
to mask out bit 0 when generating the fused compare and branch,
but the point of this patch is that we shouldn't need to do that
in the first place.
The patch also makes it easier to reuse CC results from other instructions.
llvm-svn: 187495
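The bookkeeping can be pictured with a small standalone sketch (my own illustration; the real code uses SystemZ's CC-mask constants and a different bit order): a branch mask is a 4-bit set of CC values, and inversion is taken relative to the CC values the producing instruction can actually set.

```cpp
#include <cstdint>

// Sketch only: bit n stands for CC value n (the real encoding orders
// the bits differently, which doesn't affect the idea).
using CCMask = uint8_t;

// Invert a branch mask, but only within the CC values that the
// instruction producing CC can generate.  An integer compare produces
// CC 0, 1 or 2, so ValidCC = 0b0111 and inverting "equal" (0b0001)
// yields "less or greater" (0b0110) instead of the non-canonical
// 0b1110 that a plain complement would give.
CCMask invertCCMask(CCMask Mask, CCMask ValidCC) {
  return static_cast<CCMask>(~Mask & ValidCC);
}
```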
r187116 moved compare-and-branch generation from the instruction-selection
pass to the peephole optimizer (via optimizeCompare). It turns out that even
this is a bit too early. Fused compare-and-branch instructions don't
interact well with predication, where a CC result is needed. They also
make it harder to reuse the CC side-effects of earlier instructions
(not yet implemented, but the subject of a later patch).
Another problem was that the AnalyzeBranch family of routines wasn't
handling fused compares and branches, so we weren't able to reverse the
fused form in cases where we would reverse a separate branch. This could have
been fixed by extending AnalyzeBranch, but given the other problems,
I've instead moved the fusing to the long-branch pass, which is also
responsible for the opposite transformation: splitting out-of-range
compares and branches into separate compares and long branches.
I've added a test for the AnalyzeBranch problem. A test for the
predication problem is included in the next patch, which fixes a bug
in the choice of CC mask.
llvm-svn: 187494
r186399 aggressively used the RISBG instruction for immediate ANDs,
both because it can handle some values that AND IMMEDIATE can't,
and because it allows the destination register to be different from
the source. I realized later while implementing the distinct-ops
support that it would be better to leave the choice up to
convertToThreeAddress() instead. The AND IMMEDIATE form is shorter
and is less likely to be cracked.
Using RISBG is a problem for 32-bit ANDs, though, because we assume that
all 32-bit operations leave the high word untouched, whereas RISBG used
in this way will either clear the high word or copy it from the source
register. The patch uses the z196 instruction RISBLG for this case instead.
This means that z10 will be restricted to NILL, NILH and NILF for
32-bit ANDs, but I think that should be OK for now. Although we're
using z10 as the base architecture, the optimization work is going
to be focused more on z196 and zEC12.
llvm-svn: 187492
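A small standalone model of the high-word issue (an illustration under the simplifications noted in the comments, not the backend code): a 32-bit AND IMMEDIATE leaves the upper 32 bits of the 64-bit register alone, while a rotate-and-insert used for the same AND does not.

```cpp
#include <cstdint>

// 32-bit AND immediate (NILF-style): only the low word changes; the
// high word of the 64-bit register is untouched, which is what the
// backend assumes for every 32-bit operation.
uint64_t and32Imm(uint64_t Reg, uint32_t Imm) {
  uint64_t High = Reg & 0xFFFFFFFF00000000ULL;
  return High | (static_cast<uint32_t>(Reg) & Imm);
}

// The same AND expressed as a 64-bit rotate-and-insert that keeps only
// selected low-word bits: here the high word comes out zero (the
// three-address form instead copies it from the source register), so it
// breaks the "32-bit ops leave the high word alone" assumption.
uint64_t and32ViaRotateInsert(uint64_t Reg, uint32_t Imm) {
  return static_cast<uint64_t>(static_cast<uint32_t>(Reg) & Imm);
}
```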
All insertf*/extractf* functions were replaced with insert/extract, since we have both insertf and inserti forms.
Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors.
Added lowering for EXTRACT/INSERT subvector for 512-bit vectors.
Added a test.
llvm-svn: 187491
Intel X86 assembler syntax.
Patch by Richard Mitton.
llvm-svn: 187476
llvm-svn: 187442
llvm-svn: 187435
llvm-svn: 187421
When simplifying (or (and B A) (and C ~A)) to (VBSL A B C), ensure that the
bitwidths of the second operands to both ANDs match before comparing the
negations of the values.
Split the check on the values of the second operands to the two ANDs. Move the
cast and variable declaration slightly higher to make the code slightly easier to follow.
Bug-Id: 16700
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 187404
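For reference, the identity the combine relies on, as a tiny self-contained sketch (plain 32-bit integers standing in for the NEON vectors; VBSL computes exactly this bitwise select):

```cpp
#include <cassert>
#include <cstdint>

// (or (and B A) (and C ~A)): take bit i from B where A has a 1 and from
// C where A has a 0.  The combine only fires when the second AND's mask
// is the bitwise negation of the first and both masks have equal width.
uint32_t bitwiseSelect(uint32_t A, uint32_t B, uint32_t C) {
  return (B & A) | (C & ~A);
}

int main() {
  // High half comes from B, low half from C.
  assert(bitwiseSelect(0xFFFF0000u, 0x12345678u, 0x9ABCDEF0u) == 0x1234DEF0u);
  return 0;
}
```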
build_vector is lowered to REG_SEQUENCE, which is something the register
allocator does a good job at optimizing.
llvm-svn: 187397
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN
The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
optimizations.
llvm-svn: 187396
llvm-svn: 187375
llvm-svn: 187362
Win64 uses CharPtrBuiltinVaList instead of X86_64ABIBuiltinVaList like
other 64-bit targets.
llvm-svn: 187355
The patch also adds the VFP4 feature to Cortex-A15 and fixes the DontUseFusedMAC predicate so that we can still generate vmla.f32 instructions on non-Darwin targets with VFP4.
llvm-svn: 187349
Also always add DIType, DISubprogram and DIGlobalVariable to the list
in DebugInfoFinder without checking them, so we can verify them later
on.
llvm-svn: 187285
llvm-svn: 187260
Patch by Sasa Stankovic.
llvm-svn: 187244
We used to call Verify before adding a DICompileUnit to the list; now we
remove the check and always add the DICompileUnit to the list in DebugInfoFinder,
so we can verify the units later on.
llvm-svn: 187237
instructions "beqz", "bnez" and "move", when possible.
beq $2, $zero, $L1 => beqz $2, $L1
bne $2, $zero, $L1 => bnez $2, $L1
or $2, $3, $zero => move $2, $3
llvm-svn: 187229
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
Attempt to fix the buildbots by making the X86 test I just added platform-independent.
llvm-svn: 187202
This reverts commit 187198. It broke the bots.
The soft float test probably needs a -triple because of name differences.
On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of
"vroundss $1, %xmm0, %xmm0, %xmm0".
llvm-svn: 187201
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
llvm-svn: 187198
structure, not just a pointer. This implements that and thus fixes va_copy
on PPC32. Fixes #15286. Both bug and patch by Florian Zeitz!
llvm-svn: 187158
Make sure the context field of DIType is an MDNode.
Fix testing cases to make them pass the verifier.
llvm-svn: 187150
Approval in here http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/064169.html
llvm-svn: 187145
The previous change to local live range allocation also suppressed
eviction of local ranges. In rare cases, this could result in more
expensive register choices. This commit actually revives a feature
that I added long ago: check if live ranges can be reassigned before
eviction. But now it only happens in rare cases of evicting a local
live range because another local live range wants a cheaper register.
The benefit is improved code size for some benchmarks on x86 and armv7.
I measured no significant compile time increase and performance
changes are noise.
llvm-svn: 187140
Also avoid locals evicting locals just because they want a cheaper register.
Problem: MI Sched knows exactly how many registers we have and assumes
they can be colored. In cases where we have large blocks, usually from
unrolled loops, greedy coloring fails. This is a source of
"regressions" from the MI Scheduler on x86. I noticed this issue on
x86 where we have long chains of two-address defs in the same live
range. It's easy to see this in matrix multiplication benchmarks like
IRSmk and even the unit test misched-matmul.ll.
A fundamental difference between the LLVM register allocator and
conventional graph coloring is that in our model a live range can't
discover its neighbors; it can only verify its neighbors. That's why
we initially went for greedy coloring and added eviction to deal with
the hard cases. However, for singly defined and two-address live
ranges, we can optimally color without visiting neighbors simply by
processing the live ranges in instruction order.
Other beneficial side effects:
It is much easier to understand and debug regalloc for large blocks
when the live ranges are allocated in order. Yes, global allocation is
still very confusing, but it's nice to be able to comprehend what
happened locally.
Heuristics could be added to bias register assignment based on
instruction locality (think late register pairing, banks...).
Intuitively, this will make some test cases that are on the threshold
of register pressure more stable.
llvm-svn: 187139
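To illustrate why in-order processing is enough for such ranges, here is a toy left-edge-style sketch (my own model with invented names, not the allocator's code): live ranges are simple intervals with one def, processed in instruction order, and each one just takes any register that is no longer live.

```cpp
#include <algorithm>
#include <vector>

// Toy model: each local live range is a half-open interval [Start, End)
// over instruction numbers, with a single def at Start.
struct LiveRange { int Start; int End; int Reg = -1; };

// Walk the ranges in instruction (def) order, recycle registers whose
// ranges have already ended, and hand out any register that is free.
// For such interval-shaped ranges this "left edge" scheme colors
// optimally, which is the intuition behind allocating local ranges in
// order rather than by spill weight.
void colorInOrder(std::vector<LiveRange> &Ranges, int NumRegs) {
  std::sort(Ranges.begin(), Ranges.end(),
            [](const LiveRange &A, const LiveRange &B) { return A.Start < B.Start; });
  std::vector<int> RegFreeAt(NumRegs, 0); // instruction at which each reg becomes free
  for (LiveRange &LR : Ranges) {
    for (int R = 0; R < NumRegs; ++R) {
      if (RegFreeAt[R] <= LR.Start) {     // register no longer live here
        LR.Reg = R;
        RegFreeAt[R] = LR.End;
        break;
      }
    }
    // LR.Reg == -1 would mean genuine overpressure: spilling or eviction,
    // which is outside the scope of this sketch.
  }
}
```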
llvm-svn: 187130
Better to have tests run even on non-AArch64 platforms.
llvm-svn: 187128
Before the patch we took advantage of the fact that the compare and
branch are glued together in the selection DAG and fused them together
(where possible) while emitting them. This seemed to work well in practice.
However, fusing the compare so early makes it harder to remove redundant
compares in cases where CC already has a suitable value. This patch
therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of
functions instead.
No behavioral change intended, but it paves the way for a later patch.
llvm-svn: 187116
llvm-svn: 187113
As with the stores, these instructions can trap when the condition is false,
so they are only used for things like (cond ? x : *ptr).
llvm-svn: 187112
These instructions are allowed to trap even if the condition is false,
so for now they are only used for "*ptr = (cond ? x : *ptr)"-style
constructs.
llvm-svn: 187111
Make sure the context and type fields are MDNodes. We will generate
verification errors if those fields are non-empty strings.
Fix testing cases to make them pass the verifier.
llvm-svn: 187106
There's no need to specify a flag to omit frame pointer elimination on non-leaf
nodes...(Honestly, I can't parse that option out.) Use the function attribute
stuff instead.
llvm-svn: 187093
llvm-svn: 187083
Prior to this patch, IfConverter could widen the set of cases in which a
sequence of instructions was executed, because of the way it uses nested
predicates. This results in incorrect execution.
For instance, let A be a basic block that flows conditionally into B, and let B
be a predicated block.
B can be predicated with A.BrToBPredicate into A iff B.Predicate is less
"permissive" than A.BrToBPredicate, i.e., iff A.BrToBPredicate subsumes
B.Predicate.
The IfConverter was checking the opposite: that B.Predicate subsumes
A.BrToBPredicate.
<rdar://problem/14379453>
llvm-svn: 187071
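A standalone sketch of the subsumption direction (illustrative only: predicates are modeled here as sets of conditions forming a conjunction, whereas IfConverter queries target hooks):

```cpp
#include <algorithm>
#include <set>
#include <string>

// Model a predicate as the conjunction of the conditions it contains;
// more conditions means a more restrictive (less "permissive") guard.
using Predicate = std::set<std::string>;

// P subsumes Q when Q is at least as restrictive as P, i.e. Q contains
// every condition of P, so whenever Q allows execution, P does too.
bool subsumes(const Predicate &P, const Predicate &Q) {
  return std::includes(Q.begin(), Q.end(), P.begin(), P.end());
}

// Legality of predicating B into A with A's branch-to-B predicate:
// A.BrToBPredicate must subsume B.Predicate.  The pre-patch bug checked
// subsumes(BPredicate, BrToBPredicate), i.e. the opposite direction.
bool canPredicateBIntoA(const Predicate &BrToBPredicate,
                        const Predicate &BPredicate) {
  return subsumes(BrToBPredicate, BPredicate);
}
```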
Improve the Finder to handle context of a DIVariable used by DbgValueInst.
Fix testing cases to make them pass the verifier.
llvm-svn: 187052
llvm-svn: 187049
llvm-svn: 187016
This commit also implements these functions for R600 and removes a test
case that was relying on the buggy behavior.
llvm-svn: 187007
These are really the same address space in hardware. The only
difference is that CONSTANT_ADDRESS uses a special cache for faster
access. When we are unable to use the constant kcache for some reason
(e.g. smaller types or lack of indirect addressing) then the instruction
selector must use GLOBAL_ADDRESS loads instead.
llvm-svn: 187006
Improve the Finder to handle context of a DIVariable.
If Scope is a DICompileUnit, add it to the list of CUs.
llvm-svn: 187003
When vectors are built from a single value, the ARM lowering issues a
scalar_to_vector node.
This node is then always morphed into a move from the general purpose unit to
the vector unit.
When the value comes from a load, this can be simplified into a vector load to
the right lane.
This patch changes the lowering of insert_vector_elt to expose a
vector-friendly pattern in this situation.
This is a step toward fixing <rdar://problem/14170854>.
llvm-svn: 186999
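As an illustration (my own example, not taken from the patch, and it only builds for ARM targets with NEON): the kind of source the new lowering benefits, written with an explicit lane load so the value goes straight from memory into the vector register rather than through a general-purpose register.

```cpp
#include <arm_neon.h>

// Insert a loaded float into lane 0 of a 128-bit vector.  Before the
// patch this tended to become a scalar load plus a GPR-to-NEON move;
// a lane-wise vector load (vld1.32 {dN[0]}) avoids the trip through the
// general purpose unit.
float32x4_t insertLoadedLane0(float32x4_t Vec, const float *Ptr) {
  return vld1q_lane_f32(Ptr, Vec, 0);
}
```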
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186923
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186922
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186921