| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
Extend r187495 to conditional loads. I split this out because the
easiest way seemed to be to force a particular operand order in
SystemZISelDAGToDAG.cpp.
llvm-svn: 187496
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
System z branches have a mask to select which of the 4 CC values should
cause the branch to be taken. We can invert a branch by inverting the mask.
However, not all instructions can produce all 4 CC values, so inverting
the branch like this can lead to some oddities. For example, integer
comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater).
If an integer EQ is reversed to NE before instruction selection,
the branch will test for 1 or 2. If instead the branch is reversed
after instruction selection (by inverting the mask), it will test for
1, 2 or 3. Both are correct, but the second isn't really canonical.
This patch therefore keeps track of which CC values are possible
and uses this when inverting a mask.
Although this is mostly cosmestic, it fixes undefined behavior
for the CIJNLH in branch-08.ll. Another fix would have been
to mask out bit 0 when generating the fused compare and branch,
but the point of this patch is that we shouldn't need to do that
in the first place.
The patch also makes it easier to reuse CC results from other instructions.
llvm-svn: 187495
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r187116 moved compare-and-branch generation from the instruction-selection
pass to the peephole optimizer (via optimizeCompare). It turns out that even
this is a bit too early. Fused compare-and-branch instructions don't
interact well with predication, where a CC result is needed. They also
make it harder to reuse the CC side-effects of earlier instructions
(not yet implemented, but the subject of a later patch).
Another problem was that the AnalyzeBranch family of routines weren't
handling compares and branches, so we weren't able to reverse the fused
form in cases where we would reverse a separate branch. This could have
been fixed by extending AnalyzeBranch, but given the other problems,
I've instead moved the fusing to the long-branch pass, which is also
responsible for the opposite transformation: splitting out-of-range
compares and branches into separate compares and long branches.
I've added a test for the AnalyzeBranch problem. A test for the
predication problem is included in the next patch, which fixes a bug
in the choice of CC mask.
llvm-svn: 187494
|
| |
|
|
| |
llvm-svn: 187493
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r186399 aggressively used the RISBG instruction for immediate ANDs,
both because it can handle some values that AND IMMEDIATE can't,
and because it allows the destination register to be different from
the source. I realized later while implementing the distinct-ops
support that it would be better to leave the choice up to
convertToThreeAddress() instead. The AND IMMEDIATE form is shorter
and is less likely to be cracked.
This is a problem for 32-bit ANDs because we assume that all 32-bit
operations will leave the high word untouched, whereas RISBG used in
this way will either clear the high word or copy it from the source
register. The patch uses the z196 instruction RISBLG for this instead.
This means that z10 will be restricted to NILL, NILH and NILF for
32-bit ANDs, but I think that should be OK for now. Although we're
using z10 as the base architecture, the optimization work is going
to be focused more on z196 and zEC12.
llvm-svn: 187492
|
| |
|
|
|
|
|
|
|
| |
All insertf*/extractf* functions replaced with insert/extract since we have insertf and inserti forms.
Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors.
Added lowering for EXTRACT/INSERT subvector for 512-bit vectors.
Added a test.
llvm-svn: 187491
|
| |
|
|
|
|
| |
The next patch will make use of RISBLG for codegen.
llvm-svn: 187490
|
| |
|
|
| |
llvm-svn: 187477
|
| |
|
|
|
|
|
|
| |
Intel X86 assembler syntax.
Patch by Richard Mitton.
llvm-svn: 187476
|
| |
|
|
| |
llvm-svn: 187472
|
| |
|
|
|
|
| |
Patch by Richard Mitton.
llvm-svn: 187471
|
| |
|
|
|
|
| |
No functionality change.
llvm-svn: 187469
|
| |
|
|
|
|
| |
No functionality change.
llvm-svn: 187468
|
| |
|
|
| |
llvm-svn: 187443
|
| |
|
|
|
|
| |
turns "bal" into "bgezal".
llvm-svn: 187440
|
| |
|
|
|
|
|
|
| |
register i7 as a live-in if current function's return address is taken.
This revision fixes PR16269.
llvm-svn: 187433
|
| |
|
|
| |
llvm-svn: 187421
|
| |
|
|
|
|
| |
instructions have been added to test files.
llvm-svn: 187410
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When simplifying a (or (and B A) (and C ~A)) to a (VBSL A B C) ensure that the
bitwidth of the second operands to both ands match before comparing the negation
of the values.
Split the check of the value of the second operands to the ands. Move the cast
and variable declaration slightly higher to make it slightly easier to follow.
Bug-Id: 16700
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 187404
|
| |
|
|
| |
llvm-svn: 187402
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first of many upcoming patches for PowerPC fast
instruction selection support. This patch implements the minimum
necessary for a functional (but extremely limited) FastISel pass. It
allows the table-generated portions of the selector to be created and
used, but in most cases selection will fall back to the DAG selector.
None of the block terminator instructions are implemented yet, and
most interesting instructions require some special handling.
Therefore there aren't any new test cases with this patch. There will
be quite a few tests coming with future patches.
This patch adds the make/CMake support for the new code (including
tablegen -gen-fast-isel) and creates the FastISel object for PPC64 ELF
only. It instantiates the necessary virtual functions
(TargetSelectInstruction, TargetMaterializeConstant,
TargetMaterializeAlloca, tryToFoldLoadIntoMI, and FastLowerArguments),
but of these, only TargetMaterializeConstant contains any useful
implementation. This is present since the table-generated code
requires the ability to materialize integer constants for some
instructions.
This patch has been tested by building and running the
projects/test-suite code with -O0. All tests passed with the
exception of a couple of long-running tests that time out using -O0
code generation.
llvm-svn: 187399
|
| |
|
|
|
|
|
| |
build_vector is lowered to REG_SEQUENCE, which is something the register
allocator does a good job at optimizing.
llvm-svn: 187397
|
| |
|
|
| |
llvm-svn: 187371
|
| |
|
|
|
|
|
|
|
| |
32-bit symbols have "_" as global prefix, but when forming the name of
COMDAT sections this prefix is ignored. The current behavior assumes that
this prefix is always present which is not the case for 64-bit and names
are truncated.
llvm-svn: 187356
|
| |
|
|
|
|
|
| |
Win64 uses CharPtrBuiltinVaList instead of X86_64ABIBuiltinVaList like
other 64-bit targets.
llvm-svn: 187355
|
| |
|
|
|
|
| |
patch also adds the VFP4 feature to Cortex-A15 and fixes the DontUseFusedMAC predicate so that we can still generate vmla.f32 instructions on non-darwin targets with VFP4.
llvm-svn: 187349
|
| |
|
|
| |
llvm-svn: 187348
|
| |
|
|
|
|
|
| |
Added 512-bit operands printing.
Added instruction formats for KNL instructions.
llvm-svn: 187324
|
| |
|
|
| |
llvm-svn: 187320
|
| |
|
|
| |
llvm-svn: 187319
|
| |
|
|
|
|
|
|
|
| |
The tests !defined(__ppc__) && !defined(__powerpc__) are not needed
or helpful when verifying that code is being compiled for a 64-bit
target. The simpler test provided by this revision is sufficient to
tell if the target is 64-bit.
llvm-svn: 187318
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
do in the SDag when lowering references to the GOT: use
ARMConstantPoolSymbol rather than creating a dummy global variable. The
computation of the alignment still feels weird (it uses IR types and
datalayout) but it preserves the exact previous behavior. This change
fixes the memory leak of the global variable detected on the valgrind
leak checking bot.
Thanks to Benjamin Kramer for pointing me at ARMConstantPoolSymbol to
handle this use case.
llvm-svn: 187303
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
me) should start watching this bot more as its catching lots of bugs.
The fix here is to not construct the global if we aren't going to need
it. That's cheaper anyways, and globals have highly predictable types in
practice. I've added an assert to catch skew between our manual testing
of the type and the actual type just for paranoia's sake.
Note that this pattern is actually fine in most globals because when you
build a global with a module it automatically is moved to be owned by
that module. But here, we're in isel and don't really want to do that.
The solution of not creating a global is simpler anyways.
llvm-svn: 187302
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
than once, and the second time through we leaked memory. Found thanks to
the vg-leak bot, but I can't locally reproduce it with valgrind. The
debugger confirms that it is in fact leaking here.
This whole code is totally gross. Why is initialize being called on each
runOnFunction??? Why aren't these OwningPtr<>s, and why aren't their
lifetimes better defined? Anyways, this is just a surgical change to
help out the leak checking bots.
llvm-svn: 187299
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
conditions
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches. The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
|
| |
|
|
|
|
| |
This reverts commit r187248. It broke many bots.
llvm-svn: 187254
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both GCC and LLVM will implicitly define __ppc__ and __powerpc__ for
all PowerPC targets, whether 32- or 64-bit. They will both implicitly
define __ppc64__ and __powerpc64__ for 64-bit PowerPC targets, and not
for 32-bit targets. We cannot be sure that all other possible
compilers used to compile Clang/LLVM define both __ppc__ and
__powerpc__, for example, so it is best to check for both when relying
on either inside the Clang/LLVM code base.
This patch makes sure we always check for both variants. In addition,
it fixes one unnecessary check in lib/Target/PowerPC/PPCJITInfo.cpp.
(At least one of __ppc__ and __powerpc__ should always be defined when
compiling for a PowerPC target, no matter which compiler is used, so
testing for them is unnecessary.)
There are some places in the compiler that check for other variants,
like __POWERPC__ and _POWER, and I have left those in place. There is
no need to add them elsewhere. This seems to be in Apple-specific
code, and I won't take a chance on breaking it.
There is no intended change in behavior; thus, no test cases are
added.
llvm-svn: 187248
|
| |
|
|
|
|
| |
Patch by Sasa Stankovic.
llvm-svn: 187244
|
| |
|
|
|
|
| |
register operands.
llvm-svn: 187242
|
| |
|
|
|
|
| |
operands.
llvm-svn: 187238
|
| |
|
|
| |
llvm-svn: 187234
|
| |
|
|
|
|
|
|
|
| |
to have register FCC0 (the first floating point condition code register) in
their Uses/Defs list.
No intended functionality change.
llvm-svn: 187233
|
| |
|
|
|
|
|
|
| |
needed. The generic method printOperand will do.
No functionality change.
llvm-svn: 187231
|
| |
|
|
|
|
|
|
|
|
| |
instructions "beqz", "bnez" and "move", when possible.
beq $2, $zero, $L1 => beqz $2, $L1
bne $2, $zero, $L1 => bnez $2, $L1
or $2, $3, $zero => move $2, $3
llvm-svn: 187229
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
Attempt to fix the buildbots by making the X86 test I just added platform independent
llvm-svn: 187202
|
| |
|
|
|
|
|
|
|
|
| |
This reverts commit 187198. It broke the bots.
The soft float test probably needs a -triple because of name differences.
On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of
"vroundss $1, %xmm0, %xmm0, %xmm0".
llvm-svn: 187201
|
| |
|
|
|
|
|
|
|
|
|
|
| |
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
llvm-svn: 187198
|
| |
|
|
| |
llvm-svn: 187195
|
| |
|
|
| |
llvm-svn: 187193
|
| |
|
|
|
|
| |
in a subsequent patch.
llvm-svn: 187187
|