| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
| |
Several of the 64-bit fixed-point instructions with immediate operands were
using the 32-bit (i32) operand nodes instead of the corresponding 64-bit (i64)
operand definitions (u16imm instead of u16imm64, for example).
This error has had no effect so far, but would have caused type-checking
violations with an upcoming change.
llvm-svn: 198356
|
|
|
|
| |
llvm-svn: 197878
|
|
|
|
| |
llvm-svn: 197875
|
|
|
|
| |
llvm-svn: 197824
|
|
|
|
|
|
| |
Clang was already getting this right.
llvm-svn: 197694
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tests for the disassembler were adapted from the encoder tests, and for the
most part, the output from the disassembler matches that encoder-test inputs.
There are some places where more-informative mnemonics could be produced
(notably for the branch instructions), and those cases are noted in the tests
with FIXMEs.
Future work includes:
- Generating more-informative mnemonics when possible (this may also be done
in the printer).
- Remove the dependence on positional "numbered" operand-to-variable mapping
(for both encoding and decoding).
- Internally using 64-bit instruction variants in 64-bit mode (if this turns
out to matter).
llvm-svn: 197693
|
|
|
|
|
|
|
|
|
|
| |
This patch adds -f64:32:64 to 32 bit ppc darwin since a f64 inside a
structure are only 32 bit aligned.
The patch also drop -f128:64:128 from all ppc darwin, since f128 is
128 bit aligned.
llvm-svn: 197574
|
|
|
|
|
|
|
|
| |
Thanks for Iain Sandoe for testing this with the original gcc.
Clang was already getting this right.
llvm-svn: 197572
|
|
|
|
|
|
|
|
|
|
|
|
| |
The instruction definitions in the PPC backend have a number of variants
defined for the same instruction to represent differences between 64-bit and
32-bit semantics. In order to generate a disassembler for the PPC backend, we
need to mark all but one of these as CodeGen only.
No functionality change intended; this is prep work for PPC disassembly
support.
llvm-svn: 197535
|
|
|
|
|
|
| |
This will be tested from clang.
llvm-svn: 197501
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
coalesces normal copies.
Without this, MachineCSE is powerless to handle redundant operations with truncated source operands.
This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled:
%vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
%vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
%vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>
Test case: cse-add-with-overflow.ll.
This exposed an existing bug in
PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case:
PowerPC/crash.ll.
llvm-svn: 197465
|
|
|
|
| |
llvm-svn: 197464
|
|
|
|
| |
llvm-svn: 197400
|
|
|
|
| |
llvm-svn: 197397
|
|
|
|
|
|
|
| |
PowerPC now has an asm parser (and has for many months now); indicate this in
PowerPC/LLVMBuild.txt.
llvm-svn: 197393
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a base implementation of the powerpc-apple-darwin asm parser dialect.
* Enables infrastructure (essentially isDarwin()) and fixes up the parsing of asm directives to separate out ELF and MachO/Darwin additions.
* Enables parsing of {r,f,v}XX as register identifiers.
* Enables parsing of lo16() hi16() and ha16() as modifiers.
The changes to the test case are from David Fang (fangism).
llvm-svn: 197324
|
|
|
|
| |
llvm-svn: 197249
|
|
|
|
|
|
| |
Amend a comment.
llvm-svn: 197237
|
|
|
|
| |
llvm-svn: 197136
|
|
|
|
| |
llvm-svn: 197100
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Aside from a few minor latency corrections, the major change here is a new
hazard recognizer which focuses on better dispatch-group formation on the
POWER7. As with the PPC970's hazard recognizer, the most important thing it
does is avoid load-after-store hazards within the same dispatch group. It uses
the POWER7's special dispatch-group-terminating nop instruction (instead of
inserting multiple regular nop instructions). This new hazard recognizer makes
use of the scheduling dependency graph itself, built using AA information, to
robustly detect the possibility of load-after-store hazards.
significant test-suite performance changes (the error bars are 99.5% confidence
intervals based on 5 test-suite runs both with and without the change --
speedups are negative):
speedups:
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
-0.55171% +/- 0.333168%
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl
-17.5576% +/- 14.598%
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
-29.5708% +/- 7.09058%
MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt
-34.9471% +/- 11.4391%
SingleSource/Benchmarks/BenchmarkGame/puzzle
-25.1347% +/- 11.0104%
SingleSource/Benchmarks/Misc/flops-8
-17.7297% +/- 9.79061%
SingleSource/Benchmarks/Shootout-C++/ary3
-35.5018% +/- 23.9458%
SingleSource/Regression/C/uint64_to_float
-56.3165% +/- 25.4234%
SingleSource/UnitTests/Vectorizer/gcc-loops
-18.5309% +/- 6.8496%
regressions:
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000
18.351% +/- 12.156%
SingleSource/Benchmarks/Shootout-C++/methcall
27.3086% +/- 14.4733%
llvm-svn: 197099
|
|
|
|
|
|
|
|
|
| |
For one predicate to subsume another, they must both check the same condition
register. Failure to check this prerequisite was causing miscompiles.
Fixes PR18003.
llvm-svn: 197089
|
|
|
|
| |
llvm-svn: 196988
|
|
|
|
| |
llvm-svn: 196987
|
|
|
|
|
|
| |
.weak_def_can_be_hidden was not yet supported by the system assembler
llvm-svn: 196970
|
|
|
|
|
|
| |
I'll prune redundant deps in LLVMBuild.txt, later.
llvm-svn: 196881
|
|
|
|
|
|
|
|
|
|
|
|
| |
getSymbolWithGlobalValueBase use is to create a name of a new symbol based
on the name of an existing GV. Assert that and then remove the last call
to pass true to isImplicitlyPrivate.
This gives the mangler API a 1:1 mapping from GV to names, which is what we
need to drop the mangler dependency on the target (and use an extended
datalayout instead).
llvm-svn: 196472
|
|
|
|
|
|
|
| |
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.
llvm-svn: 196471
|
|
|
|
|
|
|
|
|
|
| |
PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer
which did only one thing: filtered out nodes in EmitInstruction for which
DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo
instructions. As far as I can tell, this is no longer true, and so we can use
ScoreboardHazardRecognizer directly.
llvm-svn: 196171
|
|
|
|
|
|
| |
No functionality change.
llvm-svn: 196170
|
|
|
|
|
|
| |
This allows it to be used in TargetLoweringObjectFileImpl.cpp.
llvm-svn: 196117
|
|
|
|
|
|
|
|
|
| |
MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm.
Keeping parts of the old asm printer just to print inline asm to a string that
we then parse back looks like a hack.
llvm-svn: 196111
|
|
|
|
| |
llvm-svn: 196065
|
|
|
|
|
|
|
| |
The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so
this should be a nop.
llvm-svn: 196059
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a scheduling model for the POWER7 (P7) core, and enables the
machine-instruction scheduler when targeting the P7. Scheduling for the P7,
like earlier ooo PPC cores, requires considering both dispatch group hazards,
and functional unit resources and latencies. These are both modeled in a
combined itinerary. Dispatch group formation is still handled by the post-RA
scheduler (which still needs to be updated for the P7, but nevertheless does a
pretty good job).
One interesting aspect of this change is that I've also enabled to use of AA
duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark
results seem to support this decision (see below), and while this is normally
useful for in-order cores, and not for ooo cores like the P7, I think that the
dispatch slot hazards are enough like in-order resources to make the AA useful.
Test suite significant performance differences (where negative is a speedup,
and positive is a regression) vs. the current situation:
MultiSource/Benchmarks/BitBench/drop3/drop3
with AA: N/A
without AA: -28.7614% +/- 19.8356%
(significantly against AA)
MultiSource/Benchmarks/FreeBench/neural/neural
with AA: -17.7406% +/- 11.2712%
without AA: N/A
(significantly in favor of AA)
MultiSource/Benchmarks/SciMark2-C/scimark2
with AA: -11.2079% +/- 1.80543%
without AA: -11.3263% +/- 2.79651%
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
with AA: -41.8649% +/- 17.0053%
without AA: -34.5256% +/- 23.7072%
MultiSource/Benchmarks/mafft/pairlocalalign
with AA: 25.3016% +/- 17.8614%
without AA: 38.6629% +/- 14.9391%
(significantly in favor of AA)
MultiSource/Benchmarks/sim/sim
with AA: N/A
without AA: 13.4844% +/- 7.18195%
(significantly in favor of AA)
SingleSource/Benchmarks/BenchmarkGame/Large/fasta
with AA: 15.0664% +/- 6.70216%
without AA: 12.7747% +/- 8.43043%
SingleSource/Benchmarks/BenchmarkGame/puzzle
with AA: 82.2713% +/- 26.3567%
without AA: 75.7525% +/- 41.1842%
SingleSource/Benchmarks/Misc/flops-2
with AA: -37.1621% +/- 20.7964%
without AA: -35.2342% +/- 20.2999%
(significantly in favor of AA)
These are 99.5% confidence intervals from 5 runs per configuration. Regarding
the choice to turn on AA during CodeGen, of these results, four seem
significantly in favor of using AA, and one seems significantly against. I'm
not making this decision based on these numbers alone, but these results
seem consistent with results I have from other tests, and so I think that, on
balance, using AA is a win.
llvm-svn: 195981
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for adding scheduling definitions for the POWER7, split some PPC
itinerary classes so that the P7's latencies and hazards can be better
described. For the most part, this means differentiating indexed from non-index
pre-increment loads and stores. Also, differentiate single from
double-precision sqrt.
No functionality change intended (except for a more-specific latency for
single-precision sqrt on the A2).
llvm-svn: 195980
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On the PPC A2, instructions are only issued after their input operands are
ready. Model this by specifying that input operands are read at dispatch (0
cycles after issue). This changes all input operand latencies from 1 to 0.
Significant test-suite performance changes (these are 99.5% confidence
intervals on 6 runs for both before and after):
speedups:
MultiSource/Benchmarks/sim/sim
-1.21915% +/- 0.175063%
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt
-1.23946% +/- 1.05133%
SingleSource/Benchmarks/Misc/flops-2
-1.24237% +/- 0.681362%
MultiSource/Applications/JM/lencod/lencod
-1.33992% +/- 0.757498%
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt
-1.51802% +/- 1.21468%
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt
-2.18818% +/- 1.28605%
MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt
-2.21977% +/- 1.19499%
SingleSource/Benchmarks/BenchmarkGame/spectral-norm
-2.29822% +/- 0.671871%
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl
-2.40975% +/- 0.355931%
SingleSource/Benchmarks/Misc/fp-convert
-2.41899% +/- 1.04751%
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl
-2.50349% +/- 0.126765%
SingleSource/Benchmarks/Misc/flops-3
-3.00214% +/- 0.700795%
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt
-3.56995% +/- 3.2929%
MultiSource/Applications/sgefa/sgefa
-4.24908% +/- 2.00413%
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk
-18.1294% +/- 3.96489%
regressions:
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
1.03249% +/- 0.178547%
MultiSource/Applications/hexxagon/hexxagon
1.16597% +/- 0.285235%
MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt
1.39576% +/- 1.07855%
SingleSource/Benchmarks/Misc-C++/stepanov_v1p2
1.71539% +/- 0.173182%
MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1
1.90013% +/- 0.866472%
MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl
2.39854% +/- 1.05914%
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl
2.4402% +/- 0.817904%
MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl
5.87997% +/- 3.3172%
MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc
9.02643% +/- 5.79591%
MultiSource/Benchmarks/VersaBench/bmm/bmm
10.3517% +/- 1.227%
Obviously, there are data points on both sides of this; but I think, overall,
this supports making the change.
llvm-svn: 195951
|
|
|
|
|
|
|
| |
Some of the older PPC processor definitions don't have associated
SchedMachineModels; correct this for the PPC440.
llvm-svn: 195949
|
|
|
|
|
|
|
|
| |
The operand latencies for loads and stores in the PPC440 itinerary were wrong
(the store operands are all inputs, and the "with update" (pre-increment)
instructions need a latency for the additional output).
llvm-svn: 195948
|
|
|
|
|
|
|
|
|
|
|
|
| |
The operand latencies for the PPC440 should be specified relative to dispatch,
not relative to the initial fetch-and-decode stages. Because most instructions
(ignoring bypass) wait in dispatch until their operands are ready, this is
modeled as reading input operands "at dispatch" (0 cycles after issue), and so
every input and output operand has 4 cycles subtracted from it.
This could alter scheduling slightly, but I don't expect a large effect.
llvm-svn: 195947
|
|
|
|
|
|
|
|
|
|
| |
Modeling the fetch and decode units in the PPC440 itinerary does not add
anything to the hazard detection capability (and so modeling them just wastes
compile time).
No functionality change intended.
llvm-svn: 195946
|
|
|
|
|
|
|
|
|
|
| |
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTaleGen and intrinsics_gen.
Explicit add_dependencies() have been pruned under lib/Target.
llvm-svn: 195929
|
|
|
|
|
|
|
|
|
| |
CommonTableGen.
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.
llvm-svn: 195927
|
|
|
|
|
|
| |
sets them.
llvm-svn: 195921
|
|
|
|
| |
llvm-svn: 195911
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of sharing functional unit names between the various PPC itineraries,
give each core its own unit names prefixed with the core name. This follows
the convention used by other backends (such as ARM), and removes a non-obvious
ordering dependency between the various PPCSchedule*.td files.
No functionality change intended.
llvm-svn: 195908
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.
Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.
No functionality change intended.
llvm-svn: 195890
|
|
|
|
| |
llvm-svn: 195884
|
|
|
|
| |
llvm-svn: 195807
|
|
|
|
|
|
|
| |
The instruction definitions incorrectly specified that popcntd and popcntw have
record forms; they do not. This mistake was causing invalid code generation.
llvm-svn: 195272
|