| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This adds the same checks that were added in r264593 to all
target-specific passes that run after register allocation.
Reviewers: qcolombet
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D18525
llvm-svn: 265313
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tail call branch instruction might disappear
Bug Pattern:
# BB#0: # %entry
cmpldi 3, 0
beq- 0, .LBB0_2
# BB#1: # %exit
lwz 4, 0(3)
#TC_RETURNd8 LVComputationKind 0
.LBB0_2: # %cond.false
mflr 0
std 0, 16(1)
stdu 1, -96(1)
.Ltmp0:
.cfi_def_cfa_offset 96
.Ltmp1:
.cfi_offset lr, 16
bl __assert_fail
nop
The branch instruction for tail call return is not generated, because the
shrink-wrapping pass choosing a new Restore Point: %cond.false, so %exit
block is not sent to emitEpilogue, that's why the branch is not generated.
Thanks Kit's opinions!
Reviewers: nemanjai hfinkel tjablin kbarton
http://reviews.llvm.org/D17606
llvm-svn: 265112
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Chapter 3 of the QPX manual states that, "Scalar floating-point load
instructions, defined in the Power ISA, cause a replication of the source data
across all elements of the target register." Thus, if we have a load followed
by a QPX splat (from the first lane), the splat is redundant. This adds a late
MI-level pass to remove the redundant splats in some of these cases
(specifically when both occur in the same basic block).
This optimization is scheduled just prior to post-RA scheduling. It can't happen
before anything that might replace the load with some already-computed quantity
(i.e. store-to-load forwarding).
llvm-svn: 265047
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.
It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.
Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.
Differential Revision: http://reviews.llvm.org/D18627
llvm-svn: 265036
|
| |
|
|
|
|
|
|
| |
http://reviews.llvm.org/D18097
Initial support does not include any patterns to generate this instructions
llvm-svn: 265031
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PPCSimplifyAddress contains this code:
IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context)
: Type::getInt64Ty(*Context));
to determine the type to be used for an index register, if one needs
to be created. However, the "VT" here is the type of the data being
loaded or stored, *not* the type of an address. This means that if
a data element of type i32 is accessed using an index that does not
not fit into 32 bits, a wrong address is computed here.
Note that PPCFastISel is only ever used on 64-bit currently, so the type
of an address is actually *always* MVT::i64. Other parts of the code,
even in this same PPCSimplifyAddress routine, already rely on that fact.
Thus, this patch changes the code to simply unconditionally use
Type::getInt64Ty(*Context) as OffsetTy.
llvm-svn: 265023
|
| |
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
http://reviews.llvm.org/D18032
This patch provides asm implementation for the following instructions:
lwat, ldat, stwat, stdat, ldmx, mcrxrx
llvm-svn: 265022
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fast isel pass currently emits a COPY_TO_REGCLASS node to convert
from a F4RC to a F8RC register class during conversion of a
floating-point number to integer. There is actually no support in the
common code instruction printers to emit COPY_TO_REGCLASS nodes, so the
PowerPC back-end has special code there to simply ignore
COPY_TO_REGCLASS.
This is correct *if and only if* the source and destination registers of
COPY_TO_REGCLASS are the same (except for the different register class).
But nothing guarantees this to be the case, and if the register
allocator does end up allocating source and destination to different
registers after all, the back-end simply generates incorrect code. I've
included a test case that shows such incorrect code generation.
However, it seems that COPY_TO_REGCLASS is actually not intended to be
used at the MI layer at all. It is used during SelectionDAG, but always
lowered to a plain COPY before emitting MI. Other back-end's fast isel
passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong
for the PowerPC back-end to emit it here.
This patch changes the PowerPC back-end to directly emit COPY instead of
COPY_TO_REGCLASS and removes the special handling in the instruction
printers.
Differential Revision: http://reviews.llvm.org/D18605
llvm-svn: 265020
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When dealing with complex<float>, and similar structures with two
single-precision floating-point numbers, especially when such things are being
passed around by value, we'll sometimes end up loading both float values by
extracting them from one 64-bit integer load. It looks like this:
t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
t16: i64 = srl t13, Constant:i32<32>
t17: i32 = truncate t16
t18: f32 = bitcast t17
t19: i32 = truncate t13
t20: f32 = bitcast t19
The problem, especially before the P8 where those bitcasts aren't legal (and
get expanded via the stack), is that it would have been better to use two
floating-point loads directly. Here we add a target-specific DAGCombine to do
just that. In short, we turn:
ld 3, 0(5)
stw 3, -8(1)
rldicl 3, 3, 32, 32
stw 3, -4(1)
lfs 3, -4(1)
lfs 0, -8(1)
into:
lfs 3, 4(5)
lfs 0, 0(5)
llvm-svn: 264988
|
| |
|
|
|
|
|
|
|
|
|
|
| |
These checks are redundant and can be removed
Reviewers: hans
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D18564
llvm-svn: 264872
|
| |
|
|
|
|
|
| |
After the previous change, this can now be overridden centrally in the
pass.
llvm-svn: 264807
|
| |
|
|
|
|
|
|
|
| |
Instead of using two feature bits, one to indicate the availability of the
popcnt[dw] instructions, and another to indicate whether or not they're fast,
use a single enum. This allows more consistent control via target attribute
strings, and via Clang's command line.
llvm-svn: 264690
|
| |
|
|
|
|
|
| |
This should say that we could do unaligned vector loads on the P7 using VSX
instructions, not that we should.
llvm-svn: 264683
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The A2 cores support the popcntw/popcntd instructions, but they're microcoded,
and slower than our default software emulation. Specifically, popcnt[dw] take
approximately 74 cycles, whereas our software emulation takes only 24-28
cycles.
I've added a new target feature to indicate a slow popcnt[dw], instead of just
removing the existing target feature from the a2/a2q processor models, because:
1. This allows us to return more accurate information via the TTI interface
(I recognize that this currently makes no practical difference)
2. Is hopefully easier to understand (it allows the core's features to match
its manual while still having the desired effect).
llvm-svn: 264600
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements the following altivec instructions:
- Decimal Convert From/to National/Zoned/Signed-QWord:
bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq.
- Decimal Copy-Sign/Set-Sign:
bcdcpsgn. bcdsetsgn.
- Decimal Shift/Unsigned-Shift/Shift-and-Round:
bcds. bcdus. bcdsr.
- Decimal (Unsigned) Truncate:
bcdtrunc. bcdutrunc.
Total 13 instructions
Thanks Amehsan's advice! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan
http://reviews.llvm.org/D17838
llvm-svn: 264568
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
min/max, reverse, permute, splat
This change implements the following vsx instructions:
- Scalar Insert/Extract
xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp
- Vector Insert/Extract
xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
xxextractuw xxinsertw
- Scalar/Vector Test Data Class
xststdcdp xststdcsp xststdcqp
xvtstdcdp xvtstdcsp
- Maximum/Minimum
xsmaxcdp xsmaxjdp
xsmincdp xsminjdp
- Vector Byte-Reverse/Permute/Splat
xxbrd xxbrh xxbrq xxbrw
xxperm xxpermr
xxspltib
30 instructions
Thanks Nemanja for invaluable discussion! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan
http://reviews.llvm.org/D16842
llvm-svn: 264567
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change implements the following vsx instructions:
- quad-precision move
xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp
- quad-precision fp-arithmetic
xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o)
xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o)
22 instructions
Thanks Nemanja and Kit for careful review and invaluable discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan
http://reviews.llvm.org/D16110
llvm-svn: 264565
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
loop legality
Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc
function calls (fmax[f], etc.) generally map to function calls after lowering.
For some vector types with QPX at least, however, we can legally lower these,
and we don't need to prohibit CTR-based loops on their account.
It turned out, however, that the logic that checked the opcodes associated with
intrinsics was broken (it would set the Opcode variable, but that variable was
later checked only if set for some otherwise-external function call.
This fixes the latter problem and adds the FMAX/MINNUM mappings.
llvm-svn: 264532
|
| |
|
|
|
|
|
|
|
| |
The minnum and maxnum intrinsics get lowered to libcalls which
invalidates the CTR optimization.
This fixes PR27083.
llvm-svn: 264508
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sign, negate, parity, shift/rotate, mul10
This change implements the following vector operations:
- vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw
- vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d
- vnegd vnegw
- vprtybd vprtybq vprtybw
- vbpermd vpermr
- vrlwnm vrlwmi vrldnm vrldmi vslv vsrv
- vmul10cuq vmul10uq vmul10ecuq vmul10euq
28 instructions
Thanks Nemanja, Kit for invaluable hints and discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan
Phabricator: http://reviews.llvm.org/D15887
llvm-svn: 264504
|
| |
|
|
|
|
| |
making sure we give it a register and mark it as a register constraint.
llvm-svn: 264340
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
http://reviews.llvm.org/D17711
It disables direct moves on these operations in 32-bit mode since the patterns
assume 64-bit registers. The final patch is slightly different from the
Phabricator review as the bitcast operations needed to be disabled in 32-bit
mode as well. This fixes PR26617.
llvm-svn: 264282
|
| |
|
|
|
|
|
| |
Add Word rotates to the list of instructions that are zero extending.
This allows them to be used in dot form to compare with zero.
llvm-svn: 264183
|
| |
|
|
| |
llvm-svn: 263775
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.
More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.
Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.
llvm-svn: 263753
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D17600
llvm-svn: 263727
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both
'__sync' libcalls and '__atomic' libcalls, and this function is for
the '__sync' ones.
- getInsertFencesForAtomic() has been replaced with
shouldInsertFencesForAtomic(Instruction), so that the decision can be
made per-instruction. This functionality will be used soon.
- emitLeadingFence/emitTrailingFence are no longer called if
shouldInsertFencesForAtomic returns false, and thus don't need to
check the condition themselves.
llvm-svn: 263665
|
| |
|
|
| |
llvm-svn: 263453
|
| |
|
|
| |
llvm-svn: 263448
|
| |
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
http://reviews.llvm.org/D17712
We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).
llvm-svn: 263338
|
| |
|
|
|
|
|
| |
This has to be committed before the FE changes
Phabricator: http://reviews.llvm.org/D17837
llvm-svn: 263035
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and scalar
We follow the comments mentioned in http://reviews.llvm.org/D16842#344378 to
implement this new patch.
This patch implements the following vsx instructions:
Vector load/store:
lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx
stxv stxvb16x stxvh8x stxvl stxvll stxvx
Scalar load/store:
lxsd lxssp lxsibzx lxsihzx
stxsd stxssp stxsibx stxsihx
21 instructions
Phabricator: http://reviews.llvm.org/D16919
llvm-svn: 262906
|
| |
|
|
| |
llvm-svn: 262891
|
| |
|
|
| |
llvm-svn: 262573
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change implements the following vector operations:
- Vector Compare Not Equal
- vcmpneb(.) vcmpneh(.) vcmpnew(.)
- vcmpnezb(.) vcmpnezh(.) vcmpnezw(.)
- Vector Extract Unsigned
- vextractub vextractuh vextractuw vextractd
- vextublx vextubrx vextuhlx vextuhrx vextuwlx vextuwrx
- Vector Insert
- vinsertb vinserth vinsertw vinsertd
26 instructions.
Phabricator: http://reviews.llvm.org/D15916
llvm-svn: 262392
|
| |
|
|
| |
llvm-svn: 262386
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:
- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model
Typical steps necessary to complete a model:
- Ensure all pseudo instructions that are expanded before machine
scheduling (usually everything handled with EmitYYY() functions in
XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.
Differential Revision: http://reviews.llvm.org/D17747
llvm-svn: 262384
|
| |
|
|
|
|
|
|
|
|
| |
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592
This fix includes both an update to how we handle the "generic" CPU on LE
systems as well as Anton's fix for the Fast Isel issue.
llvm-svn: 262233
|
| |
|
|
|
|
|
|
| |
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null. Slowly inching
toward being able to fix PR26753.
llvm-svn: 262149
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are
never null, so this cleans up the API a bit. It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).
At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.
llvm-svn: 262115
|
| |
|
|
|
|
|
|
| |
Currently we always expand ISD::FNEG. For v4f32 and v2f64 vector types VSX has
native support for this opcode
Phabricator: http://reviews.llvm.org/D17647
llvm-svn: 262079
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change implements the following vsx instructions:
Quad/Double-Precision Compare:
xscmpoqp xscmpuqp
xscmpexpdp xscmpexpqp
xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp
xvcmpnedp(.) xvcmpnesp(.)
Quad-Precision Floating-Point Conversion
xscvqpdp(o) xscvdpqp
xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz xscvsdqp xscvudqp
xscvdphp xscvhpdp xvcvhpsp xvcvsphp
xsrqpi xsrqpix xsrqpxp
28 instructions
Phabricator: http://reviews.llvm.org/D16709
llvm-svn: 262068
|
| |
|
|
| |
llvm-svn: 261640
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest). All of these
functions require non-null parameters already, so references are more
clear. As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.
No functionality change intended.
llvm-svn: 261605
|
| |
|
|
|
|
|
|
| |
This is what was meant to be in the initial commit to fix this bug. The
parens were missing. This commit also adds a test case for the bug and
has undergone full testing on PPC and X86.
llvm-svn: 261546
|
| |
|
|
| |
llvm-svn: 261527
|
| |
|
|
|
|
|
| |
I mistook BitVector::empty() to mean BitVector::count() == 0 and it does
not. Corrected the issue with the fix for PR26500.
llvm-svn: 261525
|
| |
|
|
| |
llvm-svn: 261524
|
| |
|
|
|
|
|
|
| |
The patch has a necessary call to a function inside an assert. Which is fine
when you have asserts turned on. Not so much when they're off. Sorry about
the regression.
llvm-svn: 261447
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
http://reviews.llvm.org/D17294
It ensures that whatever block we are emitting the prologue/epilogue into, we
have the necessary scratch registers. It takes away the hard-coded register
numbers for use as scratch registers as registers that are guaranteed to be
available in the function prologue/epilogue are not guaranteed to be available
within the function body. Since we shrink-wrap, the prologue/epilogue may end
up in the function body.
llvm-svn: 261441
|