| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 196362
|
| |
|
|
| |
llvm-svn: 196360
|
| |
|
|
| |
llvm-svn: 196334
|
| |
|
|
|
|
|
|
|
| |
this completes the basic port of ARM constant islands to Mips16.
More testing, code review, cleanup is in order but basically everything
seems to be working. A bug in gas is preventing some of the runtime
testing but I hope to resolve this soon.
llvm-svn: 196331
|
| |
|
|
|
|
|
|
|
|
| |
Unlike msvc, when handling a thiscall + sret gcc will
* Put the sret in %ecx
* Put the this pointer is (%esp)
This fixes, for example, calling stringstream::str.
llvm-svn: 196312
|
| |
|
|
|
|
| |
Testcase added.
llvm-svn: 196269
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The backend converts 64-bit ORs into subreg moves if the upper 32 bits
of one operand and the low 32 bits of the other are known to be zero.
It then tries to peel away redundant ANDs from the upper 32 bits.
Since AND masks are canonicalized to exclude known-zero bits,
the test ORs the mask and the known-zero bits together before
checking for redundancy. The problem was that it was using the
wrong node when checking for known-zero bits, so could drop ANDs
that were still needed.
llvm-svn: 196267
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
- The fix to PR17631 fixes part of the cases where 'vzeroupper' should
not be issued before 'call' insn. There're other cases where helper
calls will be inserted not limited to epilog. These helper calls do
not follow the standard calling convention and won't clobber any YMM
registers. (So far, all call conventions will clobber any or part of
YMM registers.)
This patch enhances the previous fix to cover more cases 'vzerosupper' should
not be inserted by checking if that function call won't clobber any YMM
registers and skipping it if so.
llvm-svn: 196261
|
| |
|
|
|
|
| |
E.g. int64x1_t vcvt_s64_f64(float64x1_t a) -> FCVTZS Dd, Dn
llvm-svn: 196210
|
| |
|
|
|
|
|
|
| |
from VFP instructions.
E.g. float64x1_t vadd_f64(float64x1_t a, float64x1_t b) -> FADD Dd, Dn, Dm.
llvm-svn: 196208
|
| |
|
|
| |
llvm-svn: 196203
|
| |
|
|
|
|
| |
E.g. "float32_t vaddv_f32(float32x2_t a)" to be matched into "faddp s0, v1.2s".
llvm-svn: 196198
|
| |
|
|
|
|
| |
and friends.
llvm-svn: 196192
|
| |
|
|
|
|
| |
vmull_high_n_s16 and friends.
llvm-svn: 196190
|
| |
|
|
|
|
| |
These targets have special asm printers that don't use these.
llvm-svn: 196187
|
| |
|
|
|
|
|
|
|
|
| |
PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer
which did only one thing: filtered out nodes in EmitInstruction for which
DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo
instructions. As far as I can tell, this is no longer true, and so we can use
ScoreboardHazardRecognizer directly.
llvm-svn: 196171
|
| |
|
|
|
|
| |
No functionality change.
llvm-svn: 196170
|
| |
|
|
| |
llvm-svn: 196169
|
| |
|
|
| |
llvm-svn: 196168
|
| |
|
|
|
|
| |
Patch by Ana Pazos!
llvm-svn: 196151
|
| |
|
|
| |
llvm-svn: 196121
|
| |
|
|
|
|
| |
This allows it to be used in TargetLoweringObjectFileImpl.cpp.
llvm-svn: 196117
|
| |
|
|
|
|
|
|
| |
This makes the code a little more idiomatic.
No change in behaviour.
llvm-svn: 196113
|
| |
|
|
|
|
|
|
|
| |
MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm.
Keeping parts of the old asm printer just to print inline asm to a string that
we then parse back looks like a hack.
llvm-svn: 196111
|
| |
|
|
| |
llvm-svn: 196102
|
| |
|
|
| |
llvm-svn: 196094
|
| |
|
|
|
|
|
|
|
|
|
|
| |
eliminateFrameIndex() has been reworked to handle both small & large frames
with either a FP or SP.
An additional Slot is required for Scavenging spills when not using FP for large frames.
Reworked the handling of Register Scavenging.
Whether we are using an FP or not, whether it is a large frame or not,
and whether we are using a large code model or not are now independent.
llvm-svn: 196091
|
| |
|
|
|
|
|
|
|
|
|
|
| |
These are used by MachO only at the moment, and (much like the existing
MOVW/MOVT set) work around the fact that the labels used in the actual
instructions often contain PC-dependent components, which means that repeatedly
materialising the same global can't be CSEed.
With small modifications, it could be adapted to how ELF finds the address of
_GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there.
llvm-svn: 196090
|
| |
|
|
| |
llvm-svn: 196089
|
| |
|
|
| |
llvm-svn: 196088
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using large code model:
Global objects larger than 'CodeModelLargeSize' bytes are placed in sections named with a trailing ".large"
The folded global address of such objects are lowered into the const pool.
During inspection it was noted that LowerConstantPool() was using a default offset of zero.
A fix was made, but due to only offsets of zero being generated, testing only verifies the change is not detrimental.
Correct the flags emitted for explicitly specified sections.
We assume the size of the object queried by getSectionForConstant() is never greater than CodeModelLargeSize.
To handle greater than CodeModelLargeSize, changes to AsmPrinter would be required.
llvm-svn: 196087
|
| |
|
|
|
|
|
|
| |
Large frame offsets are loaded from the ConstantPool.
Where possible, offsets are encoded using the smaller MKMSK instruction.
Large frame offsets can only be used when there is a frame-pointer.
llvm-svn: 196085
|
| |
|
|
| |
llvm-svn: 196084
|
| |
|
|
| |
llvm-svn: 196068
|
| |
|
|
| |
llvm-svn: 196067
|
| |
|
|
| |
llvm-svn: 196065
|
| |
|
|
| |
llvm-svn: 196063
|
| |
|
|
|
|
|
| |
The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so
this should be a nop.
llvm-svn: 196059
|
| |
|
|
|
|
|
|
|
|
|
| |
Previously, we clobbered callee-saved registers when folding an "add
sp, #N" into a "pop {rD, ...}" instruction. This change checks whether
a register we're going to add to the "pop" could actually be live
outside the function before doing so and should fix the issue.
This should fix PR18081.
llvm-svn: 196046
|
| |
|
|
|
|
|
|
|
| |
- Actually abort when an error occurred.
- Check that the frontend lookup worked when parsing length/size/type operators.
Tested by a clang test. PR18096.
llvm-svn: 196044
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a scheduling model for the POWER7 (P7) core, and enables the
machine-instruction scheduler when targeting the P7. Scheduling for the P7,
like earlier ooo PPC cores, requires considering both dispatch group hazards,
and functional unit resources and latencies. These are both modeled in a
combined itinerary. Dispatch group formation is still handled by the post-RA
scheduler (which still needs to be updated for the P7, but nevertheless does a
pretty good job).
One interesting aspect of this change is that I've also enabled to use of AA
duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark
results seem to support this decision (see below), and while this is normally
useful for in-order cores, and not for ooo cores like the P7, I think that the
dispatch slot hazards are enough like in-order resources to make the AA useful.
Test suite significant performance differences (where negative is a speedup,
and positive is a regression) vs. the current situation:
MultiSource/Benchmarks/BitBench/drop3/drop3
with AA: N/A
without AA: -28.7614% +/- 19.8356%
(significantly against AA)
MultiSource/Benchmarks/FreeBench/neural/neural
with AA: -17.7406% +/- 11.2712%
without AA: N/A
(significantly in favor of AA)
MultiSource/Benchmarks/SciMark2-C/scimark2
with AA: -11.2079% +/- 1.80543%
without AA: -11.3263% +/- 2.79651%
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
with AA: -41.8649% +/- 17.0053%
without AA: -34.5256% +/- 23.7072%
MultiSource/Benchmarks/mafft/pairlocalalign
with AA: 25.3016% +/- 17.8614%
without AA: 38.6629% +/- 14.9391%
(significantly in favor of AA)
MultiSource/Benchmarks/sim/sim
with AA: N/A
without AA: 13.4844% +/- 7.18195%
(significantly in favor of AA)
SingleSource/Benchmarks/BenchmarkGame/Large/fasta
with AA: 15.0664% +/- 6.70216%
without AA: 12.7747% +/- 8.43043%
SingleSource/Benchmarks/BenchmarkGame/puzzle
with AA: 82.2713% +/- 26.3567%
without AA: 75.7525% +/- 41.1842%
SingleSource/Benchmarks/Misc/flops-2
with AA: -37.1621% +/- 20.7964%
without AA: -35.2342% +/- 20.2999%
(significantly in favor of AA)
These are 99.5% confidence intervals from 5 runs per configuration. Regarding
the choice to turn on AA during CodeGen, of these results, four seem
significantly in favor of using AA, and one seems significantly against. I'm
not making this decision based on these numbers alone, but these results
seem consistent with results I have from other tests, and so I think that, on
balance, using AA is a win.
llvm-svn: 195981
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for adding scheduling definitions for the POWER7, split some PPC
itinerary classes so that the P7's latencies and hazards can be better
described. For the most part, this means differentiating indexed from non-index
pre-increment loads and stores. Also, differentiate single from
double-precision sqrt.
No functionality change intended (except for a more-specific latency for
single-precision sqrt on the A2).
llvm-svn: 195980
|
| |
|
|
| |
llvm-svn: 195975
|
| |
|
|
|
|
|
|
|
| |
lowering FrameIndex.
This prevents the compiler from emitting invalid ld.[bhwd]'s and st.[bhwd]'s
when the stack frame is between 512 and 32,768 bytes in size.
llvm-svn: 195973
|
| |
|
|
|
|
| |
No functional change. An if-statement has been split into two nested if-statements.
llvm-svn: 195972
|
| |
|
|
|
|
|
|
|
|
|
|
| |
in constant islands for Mips16. We introdcuce JalB16 as a synomnym
for Jal16. It makes it easier to read and is also necessary because
Jal16 is a call instruction but JalB16 is being used as a branch.
Various parts of LLVM will not work properly even in this late stage of
the backend if we use what was declared as a call instruction to function
as a branch. For one, basic block labels may not get emitted in some
situations.
llvm-svn: 195968
|
| |
|
|
| |
llvm-svn: 195967
|
| |
|
|
| |
llvm-svn: 195965
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On the PPC A2, instructions are only issued after their input operands are
ready. Model this by specifying that input operands are read at dispatch (0
cycles after issue). This changes all input operand latencies from 1 to 0.
Significant test-suite performance changes (these are 99.5% confidence
intervals on 6 runs for both before and after):
speedups:
MultiSource/Benchmarks/sim/sim
-1.21915% +/- 0.175063%
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt
-1.23946% +/- 1.05133%
SingleSource/Benchmarks/Misc/flops-2
-1.24237% +/- 0.681362%
MultiSource/Applications/JM/lencod/lencod
-1.33992% +/- 0.757498%
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt
-1.51802% +/- 1.21468%
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt
-2.18818% +/- 1.28605%
MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt
-2.21977% +/- 1.19499%
SingleSource/Benchmarks/BenchmarkGame/spectral-norm
-2.29822% +/- 0.671871%
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl
-2.40975% +/- 0.355931%
SingleSource/Benchmarks/Misc/fp-convert
-2.41899% +/- 1.04751%
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl
-2.50349% +/- 0.126765%
SingleSource/Benchmarks/Misc/flops-3
-3.00214% +/- 0.700795%
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt
-3.56995% +/- 3.2929%
MultiSource/Applications/sgefa/sgefa
-4.24908% +/- 2.00413%
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk
-18.1294% +/- 3.96489%
regressions:
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
1.03249% +/- 0.178547%
MultiSource/Applications/hexxagon/hexxagon
1.16597% +/- 0.285235%
MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt
1.39576% +/- 1.07855%
SingleSource/Benchmarks/Misc-C++/stepanov_v1p2
1.71539% +/- 0.173182%
MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1
1.90013% +/- 0.866472%
MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl
2.39854% +/- 1.05914%
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl
2.4402% +/- 0.817904%
MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl
5.87997% +/- 3.3172%
MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc
9.02643% +/- 5.79591%
MultiSource/Benchmarks/VersaBench/bmm/bmm
10.3517% +/- 1.227%
Obviously, there are data points on both sides of this; but I think, overall,
this supports making the change.
llvm-svn: 195951
|
| |
|
|
|
|
|
| |
Some of the older PPC processor definitions don't have associated
SchedMachineModels; correct this for the PPC440.
llvm-svn: 195949
|