bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[PPC] Use the correct immediate operands on 64-bit instructions	Hal Finkel	2014-01-02	2	-12/+12
	Several of the 64-bit fixed-point instructions with immediate operands were using the 32-bit (i32) operand nodes instead of the corresponding 64-bit (i64) operand definitions (u16imm instead of u16imm64, for example). This error has had no effect so far, but would have caused type-checking violations with an upcoming change. llvm-svn: 198356
*	Use r2 when encoding tls on ppc32. Fixes PR18305.	Roman Divacky	2013-12-22	1	-1/+2
	llvm-svn: 197878
*	Add some comments.	Roman Divacky	2013-12-22	1	-0/+2
	llvm-svn: 197875
*	Implement initial-exec TLS for PPC32.	Roman Divacky	2013-12-20	5	-13/+58
	llvm-svn: 197824
*	Long doubles are required to be aligned to 128 bits and svr4 32 bits.	Rafael Espindola	2013-12-19	1	-4/+0
	Clang was already getting this right. llvm-svn: 197694
*	Add a disassembler to the PowerPC backend	Hal Finkel	2013-12-19	11	-3/+361
	The tests for the disassembler were adapted from the encoder tests, and for the most part, the output from the disassembler matches the encoder-test inputs. There are some places where more-informative mnemonics could be produced (notably for the branch instructions), and those cases are noted in the tests with FIXMEs. Future work includes:
	- Generating more-informative mnemonics when possible (this may also be done in the printer).
	- Removing the dependence on positional "numbered" operand-to-variable mapping (for both encoding and decoding).
	- Internally using 64-bit instruction variants in 64-bit mode (if this turns out to matter).
	llvm-svn: 197693
*	Fix f64 and f128 for ppc-darwin.	Rafael Espindola	2013-12-18	1	-1/+3
	This patch adds -f64:32:64 to 32-bit ppc darwin, since an f64 inside a structure is only 32-bit aligned. The patch also drops -f128:64:128 from all ppc darwin, since f128 is 128-bit aligned. llvm-svn: 197574
*	On ppc32-darwin, an i64 inside a structure can have 32-bit alignment.	Rafael Espindola	2013-12-18	1	-1/+2
	Thanks to Iain Sandoe for testing this with the original gcc. Clang was already getting this right. llvm-svn: 197572
*	Eliminate PPC instruction decoding ambiguities	Hal Finkel	2013-12-17	2	-36/+47
	The instruction definitions in the PPC backend have a number of variants defined for the same instruction to represent differences between 64-bit and 32-bit semantics. In order to generate a disassembler for the PPC backend, we need to mark all but one of these as CodeGen only. No functionality change intended; this is prep work for PPC disassembly support. llvm-svn: 197535
*	Fix the pointer size for the PS3 datalayout.	Rafael Espindola	2013-12-17	1	-2/+5
	This will be tested from clang. llvm-svn: 197501
*	Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies.	Andrew Trick	2013-12-17	1	-2/+8
	Without this, MachineCSE is powerless to handle redundant operations with truncated source operands. This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled:
	    %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
	    %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
	    %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>
	Test case: cse-add-with-overflow.ll. This exposed an existing bug in PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case: PowerPC/crash.ll.
	llvm-svn: 197465
*	whitespace	Andrew Trick	2013-12-17	1	-3/+3
	llvm-svn: 197464
*	The preferred alignment defaults to the abi alignment. Omit if it is the same.	Rafael Espindola	2013-12-16	1	-1/+1
	llvm-svn: 197400
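The rule in the commit title above (an entry's preferred alignment defaults to its ABI alignment, so it can be omitted when the two match) can be sketched as follows. This is a minimal illustration, not the code from the commit: the function names are hypothetical, and it assumes non-pointer entries of the form `<type><size>:<abi>[:<pref>]`. Pointer entries have a different shape and are left untouched.

```python
def omit_default_preferred(spec: str) -> str:
    """Drop the preferred alignment from one entry when it equals the ABI alignment.

    Hypothetical helper for illustration only.
    """
    # Pointer entries ("p:<size>:<abi>:<pref>") use a different field order; skip them.
    if spec.startswith("p"):
        return spec
    parts = spec.split(":")
    # Non-pointer entries look like "<type><size>:<abi>:<pref>", e.g. "i64:64:64".
    if len(parts) == 3 and parts[1] == parts[2]:
        return ":".join(parts[:2])
    return spec


def simplify_layout(layout: str) -> str:
    """Apply the rule to every '-'-separated entry of a datalayout string."""
    return "-".join(omit_default_preferred(s) for s in layout.split("-"))


# "i64:64:64" shrinks to "i64:64"; "f64:32:64" keeps its distinct preferred alignment.
print(simplify_layout("E-i64:64:64-f64:32:64-n32"))  # E-i64:64-f64:32:64-n32
```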
*	On DataLayout, omit the default of p:64:64:64.	Rafael Espindola	2013-12-16	1	-4/+2
	llvm-svn: 197397
*	Set has_asmparser in PowerPC/LLVMBuild.txt	Hal Finkel	2013-12-16	1	-0/+1
	PowerPC now has an asm parser (and has for many months now); indicate this in PowerPC/LLVMBuild.txt. llvm-svn: 197393
*	[Powerpc darwin] AsmParser Base implementation.	Iain Sandoe	2013-12-14	1	-13/+134
	This is a base implementation of the powerpc-apple-darwin asm parser dialect.
	* Enables infrastructure (essentially isDarwin()) and fixes up the parsing of asm directives to separate out ELF and MachO/Darwin additions.
	* Enables parsing of {r,f,v}XX as register identifiers.
	* Enables parsing of lo16() hi16() and ha16() as modifiers.
	The changes to the test case are from David Fang (fangism).
	llvm-svn: 197324
*	Assume defaults to produce smaller datalayout strings.	Rafael Espindola	2013-12-13	1	-12/+2
	llvm-svn: 197249
*	test commit.	Iain Sandoe	2013-12-13	1	-1/+1
	Amend a comment. llvm-svn: 197237
*	typo in comment	Gabor Greif	2013-12-12	1	-2/+2
	llvm-svn: 197136
*	Remove unused multiclass from PPCInstrInfo.td	Hal Finkel	2013-12-12	1	-14/+0
	llvm-svn: 197100
*	Improve instruction scheduling for the PPC POWER7	Hal Finkel	2013-12-12	7	-3/+337
	Aside from a few minor latency corrections, the major change here is a new hazard recognizer which focuses on better dispatch-group formation on the POWER7. As with the PPC970's hazard recognizer, the most important thing it does is avoid load-after-store hazards within the same dispatch group. It uses the POWER7's special dispatch-group-terminating nop instruction (instead of inserting multiple regular nop instructions). This new hazard recognizer makes use of the scheduling dependency graph itself, built using AA information, to robustly detect the possibility of load-after-store hazards.
	Significant test-suite performance changes (the error bars are 99.5% confidence intervals based on 5 test-suite runs both with and without the change -- speedups are negative):
	speedups:
	    MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.55171% +/- 0.333168%
	    MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl -17.5576% +/- 14.598%
	    MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl -29.5708% +/- 7.09058%
	    MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt -34.9471% +/- 11.4391%
	    SingleSource/Benchmarks/BenchmarkGame/puzzle -25.1347% +/- 11.0104%
	    SingleSource/Benchmarks/Misc/flops-8 -17.7297% +/- 9.79061%
	    SingleSource/Benchmarks/Shootout-C++/ary3 -35.5018% +/- 23.9458%
	    SingleSource/Regression/C/uint64_to_float -56.3165% +/- 25.4234%
	    SingleSource/UnitTests/Vectorizer/gcc-loops -18.5309% +/- 6.8496%
	regressions:
	    MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 18.351% +/- 12.156%
	    SingleSource/Benchmarks/Shootout-C++/methcall 27.3086% +/- 14.4733%
	llvm-svn: 197099
*	Fix the PPC subsumes-predicate check	Hal Finkel	2013-12-11	1	-0/+4
	For one predicate to subsume another, they must both check the same condition register. Failure to check this prerequisite was causing miscompiles. Fixes PR18003. llvm-svn: 197089
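The prerequisite described above can be sketched in a few lines. This is an illustrative model only, not the backend's code: the `Predicate` type, the `COVERS` table and the register names are hypothetical. The point it shows is that the condition-register comparison has to come before any comparison of condition codes.

```python
from typing import NamedTuple


class Predicate(NamedTuple):
    cond: str  # condition code, e.g. "LT", "LE", "EQ", "GE", "GT"
    cr: str    # condition register being tested, e.g. "cr0"


# Hypothetical table: which narrower conditions each condition covers.
COVERS = {"LE": {"LT", "EQ"}, "GE": {"GT", "EQ"}}


def subsumes(p1: Predicate, p2: Predicate) -> bool:
    """Return True if p1 holds whenever p2 holds."""
    # Prerequisite: both predicates must test the same condition register.
    if p1.cr != p2.cr:
        return False
    if p1.cond == p2.cond:
        return True
    return p2.cond in COVERS.get(p1.cond, set())


print(subsumes(Predicate("LE", "cr0"), Predicate("LT", "cr0")))  # True
print(subsumes(Predicate("LE", "cr0"), Predicate("LT", "cr1")))  # False
```

Without the register check, the second call would wrongly report subsumption, which is the kind of mistake the commit message ties to miscompiles.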
*	Prune redundant dependencies in LLVMBuild.txt.	NAKAMURA Takumi	2013-12-11	1	-1/+1
	llvm-svn: 196988
*	Move PPC's getDataLayoutString out of line and document it better.	Rafael Espindola	2013-12-11	2	-17/+39
	llvm-svn: 196987
*	on darwin<10, fallback to .weak_definition (PPC,X86)	David Fang	2013-12-10	3	-3/+12
	.weak_def_can_be_hidden was not yet supported by the system assembler llvm-svn: 196970
*	Add proper dependencies to LLVMBuild.txt in llvm/lib.	NAKAMURA Takumi	2013-12-10	1	-1/+1
	I'll prune redundant deps in LLVMBuild.txt, later. llvm-svn: 196881
*	Remove the isImplicitlyPrivate argument of getNameWithPrefix.	Rafael Espindola	2013-12-05	1	-1/+1
	getSymbolWithGlobalValueBase is used to create the name of a new symbol based on the name of an existing GV. Assert that and then remove the last call that passes true to isImplicitlyPrivate. This gives the mangler API a 1:1 mapping from GV to names, which is what we need to drop the mangler dependency on the target (and use an extended datalayout instead). llvm-svn: 196472
*	Correct word hyphenations	Alp Toker	2013-12-05	1	-1/+1
	This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471
*	Remove PPCScoreboardHazardRecognizer	Hal Finkel	2013-12-02	3	-41/+2
	PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer which did only one thing: filtered out nodes in EmitInstruction for which DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo instructions. As far as I can tell, this is no longer true, and so we can use ScoreboardHazardRecognizer directly. llvm-svn: 196171
*	Refactor the setting of PrivateGlobalPrefix.	Rafael Espindola	2013-12-02	1	-1/+0
	No functionality change. llvm-svn: 196170
*	Move getSymbolWithGlobalValueBase to TargetLoweringObjectFile.	Rafael Espindola	2013-12-02	1	-3/+3
	This allows it to be used in TargetLoweringObjectFileImpl.cpp. llvm-svn: 196117
*	Remove dead code.	Rafael Espindola	2013-12-02	1	-24/+0
	MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm. Keeping parts of the old asm printer just to print inline asm to a string that we then parse back looks like a hack. llvm-svn: 196111
*	Change the default of AsmWriterClassName and isMCAsmWriter.	Rafael Espindola	2013-12-02	1	-7/+1
	llvm-svn: 196065
*	Refactor for clarity and efficiency.	Rafael Espindola	2013-12-02	1	-23/+22
	The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so this should be a nop. llvm-svn: 196059
*	Add a scheduling model (with itinerary) for the PPC POWER7	Hal Finkel	2013-11-30	4	-2/+390
	This adds a scheduling model for the POWER7 (P7) core, and enables the machine-instruction scheduler when targeting the P7. Scheduling for the P7, like earlier ooo PPC cores, requires considering both dispatch group hazards, and functional unit resources and latencies. These are both modeled in a combined itinerary. Dispatch group formation is still handled by the post-RA scheduler (which still needs to be updated for the P7, but nevertheless does a pretty good job).
	One interesting aspect of this change is that I've also enabled the use of AA during CodeGen for the P7 (just as it is for the embedded cores). The benchmark results seem to support this decision (see below), and while this is normally useful for in-order cores, and not for ooo cores like the P7, I think that the dispatch slot hazards are enough like in-order resources to make the AA useful.
	Significant test-suite performance differences (where negative is a speedup, and positive is a regression) vs. the current situation:
	    MultiSource/Benchmarks/BitBench/drop3/drop3 with AA: N/A without AA: -28.7614% +/- 19.8356% (significantly against AA)
	    MultiSource/Benchmarks/FreeBench/neural/neural with AA: -17.7406% +/- 11.2712% without AA: N/A (significantly in favor of AA)
	    MultiSource/Benchmarks/SciMark2-C/scimark2 with AA: -11.2079% +/- 1.80543% without AA: -11.3263% +/- 2.79651%
	    MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt with AA: -41.8649% +/- 17.0053% without AA: -34.5256% +/- 23.7072%
	    MultiSource/Benchmarks/mafft/pairlocalalign with AA: 25.3016% +/- 17.8614% without AA: 38.6629% +/- 14.9391% (significantly in favor of AA)
	    MultiSource/Benchmarks/sim/sim with AA: N/A without AA: 13.4844% +/- 7.18195% (significantly in favor of AA)
	    SingleSource/Benchmarks/BenchmarkGame/Large/fasta with AA: 15.0664% +/- 6.70216% without AA: 12.7747% +/- 8.43043%
	    SingleSource/Benchmarks/BenchmarkGame/puzzle with AA: 82.2713% +/- 26.3567% without AA: 75.7525% +/- 41.1842%
	    SingleSource/Benchmarks/Misc/flops-2 with AA: -37.1621% +/- 20.7964% without AA: -35.2342% +/- 20.2999% (significantly in favor of AA)
	These are 99.5% confidence intervals from 5 runs per configuration. Regarding the choice to turn on AA during CodeGen, of these results, four seem significantly in favor of using AA, and one seems significantly against. I'm not making this decision based on these numbers alone, but these results seem consistent with results I have from other tests, and so I think that, on balance, using AA is a win.
	llvm-svn: 195981
*	Split some PPC itinerary classes	Hal Finkel	2013-11-30	11	-31/+154
	In preparation for adding scheduling definitions for the POWER7, split some PPC itinerary classes so that the P7's latencies and hazards can be better described. For the most part, this means differentiating indexed from non-indexed pre-increment loads and stores. Also, differentiate single from double-precision sqrt. No functionality change intended (except for a more-specific latency for single-precision sqrt on the A2). llvm-svn: 195980
*	Adjust PPC A2 input operand latencies	Hal Finkel	2013-11-29	1	-52/+52
	On the PPC A2, instructions are only issued after their input operands are ready. Model this by specifying that input operands are read at dispatch (0 cycles after issue). This changes all input operand latencies from 1 to 0.
	Significant test-suite performance changes (these are 99.5% confidence intervals on 6 runs for both before and after):
	speedups:
	    MultiSource/Benchmarks/sim/sim -1.21915% +/- 0.175063%
	    MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -1.23946% +/- 1.05133%
	    SingleSource/Benchmarks/Misc/flops-2 -1.24237% +/- 0.681362%
	    MultiSource/Applications/JM/lencod/lencod -1.33992% +/- 0.757498%
	    MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt -1.51802% +/- 1.21468%
	    MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt -2.18818% +/- 1.28605%
	    MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt -2.21977% +/- 1.19499%
	    SingleSource/Benchmarks/BenchmarkGame/spectral-norm -2.29822% +/- 0.671871%
	    MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl -2.40975% +/- 0.355931%
	    SingleSource/Benchmarks/Misc/fp-convert -2.41899% +/- 1.04751%
	    MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl -2.50349% +/- 0.126765%
	    SingleSource/Benchmarks/Misc/flops-3 -3.00214% +/- 0.700795%
	    MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt -3.56995% +/- 3.2929%
	    MultiSource/Applications/sgefa/sgefa -4.24908% +/- 2.00413%
	    MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk -18.1294% +/- 3.96489%
	regressions:
	    MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl 1.03249% +/- 0.178547%
	    MultiSource/Applications/hexxagon/hexxagon 1.16597% +/- 0.285235%
	    MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt 1.39576% +/- 1.07855%
	    SingleSource/Benchmarks/Misc-C++/stepanov_v1p2 1.71539% +/- 0.173182%
	    MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1 1.90013% +/- 0.866472%
	    MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl 2.39854% +/- 1.05914%
	    MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl 2.4402% +/- 0.817904%
	    MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl 5.87997% +/- 3.3172%
	    MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc 9.02643% +/- 5.79591%
	    MultiSource/Benchmarks/VersaBench/bmm/bmm 10.3517% +/- 1.227%
	Obviously, there are data points on both sides of this; but I think, overall, this supports making the change.
	llvm-svn: 195951
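The benchmark figures in this commit message are mean changes with a 99.5% confidence half-width. The message does not state how "significant" was decided. One plausible reading, assumed here purely for illustration, is that a change counts as significant when its interval excludes zero. The helper names below are hypothetical.

```python
def is_significant(mean_pct: float, half_width_pct: float) -> bool:
    """Assumed criterion: the interval mean +/- half_width does not contain zero."""
    return abs(mean_pct) > half_width_pct


def classify(mean_pct: float, half_width_pct: float) -> str:
    """Label a result using the log's convention that negative means speedup."""
    if not is_significant(mean_pct, half_width_pct):
        return "not significant"
    return "speedup" if mean_pct < 0 else "regression"


# Two entries taken from the commit message above.
print(classify(-1.21915, 0.175063))  # sim/sim -> speedup
print(classify(10.3517, 1.227))      # VersaBench/bmm -> regression
```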
*	Create a PPC440 SchedMachineModel	Hal Finkel	2013-11-29	2	-6/+20
	Some of the older PPC processor definitions don't have associated SchedMachineModels; correct this for the PPC440. llvm-svn: 195949
*	Fixup PPC440 load/store operand latencies	Hal Finkel	2013-11-29	1	-19/+19
	The operand latencies for loads and stores in the PPC440 itinerary were wrong (the store operands are all inputs, and the "with update" (pre-increment) instructions need a latency for the additional output). llvm-svn: 195948
*	Adjust PPC440 operand latencies	Hal Finkel	2013-11-29	1	-54/+54
	The operand latencies for the PPC440 should be specified relative to dispatch, not relative to the initial fetch-and-decode stages. Because most instructions (ignoring bypass) wait in dispatch until their operands are ready, this is modeled as reading input operands "at dispatch" (0 cycles after issue), and so every input and output operand has 4 cycles subtracted from it. This could alter scheduling slightly, but I don't expect a large effect. llvm-svn: 195947
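The adjustment described above is a uniform shift of every operand cycle. A minimal sketch follows, assuming operand cycles are kept as a plain list per itinerary entry. The function name and the sample entry are hypothetical; only the 4-cycle offset comes from the commit message.

```python
# Offset removed from every operand, per the commit message above.
FETCH_DECODE_CYCLES = 4


def rebase_to_dispatch(operand_cycles: list[int]) -> list[int]:
    """Re-express operand cycles relative to dispatch instead of initial fetch."""
    return [cycles - FETCH_DECODE_CYCLES for cycles in operand_cycles]


# Hypothetical entry: one result available at cycle 6, two inputs read at cycle 4.
print(rebase_to_dispatch([6, 4, 4]))  # [2, 0, 0]
```

Inputs that were read at cycle 4 become reads "at dispatch" (cycle 0), which matches the wording of the message.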
*	Don't model the fetch and decode units for the PPC440	Hal Finkel	2013-11-29	1	-180/+61
	Modeling the fetch and decode units in the PPC440 itinerary does not add anything to the hazard detection capability (and so modeling them just wastes compile time). No functionality change intended. llvm-svn: 195946
*	[CMake] Let add_public_tablegen_target() provide intrinsics_gen, too.	NAKAMURA Takumi	2013-11-28	1	-2/+0
	I think, in principle, intrinsics_gen may be added explicitly. That said, it can be added incidentally, since each target already has dependencies on llvm-tblgen. Almost all source files depend on both CommonTableGen and intrinsics_gen. Explicit add_dependencies() calls have been pruned under lib/Target. llvm-svn: 195929
*	[CMake] Make add_public_tablegen_target responsible for providing the dependency on CommonTableGen.	NAKAMURA Takumi	2013-11-28	5	-9/+1
	add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS. LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope. llvm-svn: 195927
*	[CMake] Prune include_directories() in llvm/lib/Target. add_llvm_target() sets them.	NAKAMURA Takumi	2013-11-28	3	-7/+0
	llvm-svn: 195921
*	Use the mangler consistently instead of using getGlobalPrefix directly.	Rafael Espindola	2013-11-28	2	-5/+4
	llvm-svn: 195911
*	Don't share functional units among the PPC itineraries	Hal Finkel	2013-11-28	9	-1261/+1364
	Instead of sharing functional unit names between the various PPC itineraries, give each core its own unit names prefixed with the core name. This follows the convention used by other backends (such as ARM), and removes a non-obvious ordering dependency between the various PPCSchedule*.td files. No functionality change intended. llvm-svn: 195908
*	Add IIC_ prefix to PPC instruction-class names	Hal Finkel	2013-11-27	13	-2355/+2366
	This adds the IIC_ prefix to the instruction itinerary class names, giving the PPC backend a naming convention for itinerary classes that is more consistent with that used by the X86 and ARM backends. Instruction scheduling in the PPC backend needs a bunch of cleanup and improvement (especially for the ooo cores). This is just a preliminary step. No functionality change intended. llvm-svn: 195890
*	Don't set GlobalPrefix to the default value.	Rafael Espindola	2013-11-27	1	-1/+0
	llvm-svn: 195884
*	Fix comment in PPCA2Model	Hal Finkel	2013-11-27	1	-1/+1
	llvm-svn: 195807
*	PPC popcnt[dw] do not have record forms	Hal Finkel	2013-11-20	1	-6/+6
	The instruction definitions incorrectly specified that popcntd and popcntw have record forms; they do not. This mistake was causing invalid code generation. llvm-svn: 195272