bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[PowerPC] Add support for the CMPB instruction	Hal Finkel	2015-01-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Newer POWER cores, and the A2, support the cmpb instruction. This instruction compares its operands, treating each of the 8 bytes in the GPRs separately, returning a 'mask' result of 0 (for false) or -1 (for true) in each byte. Code generation support is added, in the form of a PPCISelDAGToDAG DAG-preprocessing routine, that recognizes patterns close to what the instruction computes (either exactly, or related by a constant masking operation), and generates the cmpb instruction (along with any necessary constant masking operation). This can be expanded if use cases arise. llvm-svn: 225106
*	[PowerPC 4/4] Enable little-endian support for VSX.	Bill Schmidt	2014-12-09	1	-7/+0
\| \| \| \| \| \| \| \|	With the foregoing three patches, VSX instructions can be used for little endian. This patch removes the restriction that prevented this, and re-enables the test cases from the first three patches. llvm-svn: 223792
*	Remove redundant calls to isMaterializable.	Rafael Espindola	2014-11-01	1	-3/+1
\| \| \| \| \| \| \| \| \| \|	This removes calls to isMaterializable in the following cases: * It was redundant with a call to isDeclaration now that isDeclaration returns the correct answer for materializable functions. * It was followed by a call to Materialize. Just call Materialize and check EC. llvm-svn: 221050
*	[PowerPC] Reduce names from Power8Vector to P8Vector	Bill Schmidt	2014-10-10	1	-2/+2
\| \| \| \| \| \|	Per Hal Finkel's review, improving typability of some variable names. llvm-svn: 219514
*	[PowerPC] Add feature for Power8 vector extensions	Bill Schmidt	2014-10-10	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current VSX feature for PowerPC specifies availability of the VSX instructions added with the 2.06 architecture version. With 2.07, the architecture adds new instructions to both the Category:Vector and Category:VSX instruction sets. Additionally, unaligned vector storage operations have improved performance. This patch adds a feature to provide access to the new instructions and performance capabilities of Power8. For compatibility with GCC, the feature is controlled via a new -mpower8-vector switch, and the feature causes the __POWER8_VECTOR__ builtin define to be generated by the preprocessor. There is a companion patch for cfe being committed at the same time. llvm-svn: 219501
*	[PowerPC] Modern Book-E cores support sync	Hal Finkel	2014-10-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Older Book-E cores, such as the PPC 440, support only msync (which has the same encoding as sync 0), but not any of the other sync forms. Newer Book-E cores, however, do support sync, and for performance reasons we should allow the use of the more-general form. This refactors msync use into its own feature group so that it applies by default only to older Book-E cores (of the relevant cores, we only have definitions for the PPC440/450 currently). llvm-svn: 218923
*	constify the TargetMachine argument used in the subtarget and	Eric Christopher	2014-10-01	1	-1/+1
\| \| \| \| \| \|	lowering constructors. llvm-svn: 218832
*	Now that the optimization level is adjusting the feature string	Eric Christopher	2014-10-01	1	-3/+2
\| \| \| \| \| \|	before we hit the subtarget, remove the constructor parameter. llvm-svn: 218817
*	Rework the PPC TargetMachine so that the non-function specific	Eric Christopher	2014-10-01	1	-25/+3
\| \| \| \| \| \| \|	overrides happen at TargetMachine creation and not on every subtarget creation. llvm-svn: 218805
*	Remove resetSubtargetFeatures as it is unused.	Eric Christopher	2014-09-03	1	-18/+2
\| \| \| \|	llvm-svn: 217071
*	Reinstate "Nuke the old JIT."	Eric Christopher	2014-09-02	1	-14/+1
\| \| \| \| \| \| \| \|	Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 216982
*	Initialize PPC DataLayout based on the Triple only.	Eric Christopher	2014-08-09	1	-10/+9
\| \| \| \|	llvm-svn: 215281
*	Remove extraneous 64-bit argument to the PPC TargetMachine constructor	Eric Christopher	2014-08-09	1	-2/+4
\| \| \| \| \| \|	and update initialization. llvm-svn: 215280
*	Temporarily Revert "Nuke the old JIT." as it's not quite ready to	Eric Christopher	2014-08-07	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \|	be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate. Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reverts commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 215154
*	Nuke the old JIT.	Rafael Espindola	2014-08-07	1	-14/+1
\| \| \| \| \| \| \| \| \|	I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start. Thanks to Lang Hames for making MCJIT a good replacement! llvm-svn: 215111
*	Add first bunch of SPE instructions. As they overlap with Altivec, mark	Joerg Sonnenberger	2014-08-07	1	-0/+1
\| \| \| \| \| \| \|	them as parser-only until the disassembler is extended to handle predicates properly. llvm-svn: 215102
*	Add support for m[ft][di]bat[ul] instructions.	Joerg Sonnenberger	2014-08-04	1	-0/+1
\| \| \| \|	llvm-svn: 214731
*	Add features for PPC 4xx and e500/e500mc instructions.	Joerg Sonnenberger	2014-08-04	1	-0/+2
\| \| \| \| \| \|	Move the test cases for them into separate files. llvm-svn: 214724
*	[PowerPC] Support ELFv1/ELFv2 ABI selection via features	Ulrich Weigand	2014-07-28	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While LLVM now supports both ELFv1 and ELFv2 ABIs, their use is currently hard-coded via the target triple: powerpc64-linux is always ELFv1, while powerpc64le-linux is always ELFv2. These are of course the most common scenarios, but in principle it is possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on little-endian systems (and GCC does support that), and there are some special use cases for that (e.g. certain Linux kernel versions could only be built using ELFv1 on LE). This patch implements the LLVM side of supporting this. As precedent on other platforms suggests, ABI options are passed to the back-end as features. Thus, this patch implements two features "elfv1" and "elfv2" that select the desired ABI if present. (If not, the LLVM uses the same default rules as now.) llvm-svn: 214072
*	Move Post RA Scheduling flag bit into SchedMachineModel	Sanjay Patel	2014-07-15	1	-16/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactoring; no functional changes intended Removed PostRAScheduler bits from subtargets (X86, ARM). Added PostRAScheduler bit to MCSchedModel class. This bit is set by a CPU's scheduling model (if it exists). Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses. Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!). Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling. Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values. Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. c. PPC overrides the CPU's postRA settings by enabling postRA for everything. d. X86 is the only target that actually has postRA specified via sched model info. Differential Revision: http://reviews.llvm.org/D4217 llvm-svn: 213101
*	add ppc64/pwr8 as target	Will Schmidt	2014-06-26	1	-0/+1
\| \| \| \| \| \| \|	includes handling DIR_PWR8 where appropriate The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later. llvm-svn: 211779
*	Move the PPCSelectionDAGInfo off the TargetMachine and onto the	Eric Christopher	2014-06-12	1	-1/+1
\| \| \| \| \| \|	subtarget. llvm-svn: 210854
*	Move PPCTargetLowering off of the TargetMachine and onto the subtarget.	Eric Christopher	2014-06-12	1	-3/+4
\| \| \| \|	llvm-svn: 210852
*	Move PPCJITInfo off of the TargetMachine and onto the subtarget.	Eric Christopher	2014-06-12	1	-1/+1
\| \| \| \| \| \| \|	Needed to migrate a few functions around to avoid circular header dependencies. llvm-svn: 210845
*	Move PPCInstrInfo off of the target machine and onto the subtarget.	Eric Christopher	2014-06-12	1	-1/+1
\| \| \| \|	llvm-svn: 210839
*	Move DataLayout from the PPCTargetMachine to the subtarget.	Eric Christopher	2014-06-12	1	-1/+37
\| \| \| \|	llvm-svn: 210824
*	Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use	Eric Christopher	2014-06-12	1	-4/+9
\| \| \| \| \| \| \| \|	the initializeSubtargetDependencies code to obtain an initialized subtarget and migrate a couple of subtarget using functions to the .cpp file to avoid circular includes. llvm-svn: 210822
*	[PPC64LE] Temporarily disable VSX support in little-endian mode	Bill Schmidt	2014-06-05	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	This is a preliminary patch for the PowerPC64LE support. In stage 1 of the vector support, we will support the VMX (Altivec) instruction set, but will not yet support the VSX instructions. This is merely a staging issue to provide functional vector support as soon as possible. llvm-svn: 210271
*	Save the optimization level the subtarget was created with in a	Eric Christopher	2014-05-13	1	-15/+11
\| \| \| \| \| \| \| \| \| \|	member variable and sink the initialization of crbits into the subtarget feature reset code. No functional change, but this refactor will be used in a future commit. llvm-svn: 208726
*	[cleanup] Lift using directives, DEBUG_TYPE definitions, and even some	Chandler Carruth	2014-04-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	system headers above the includes of generated '.inc' files that actually contain code. In a few targets this was already done pretty consistently, but it wasn't done really consistently anywhere. It is strictly cleaner IMO and necessary in a bunch of places where the DEBUG_TYPE is referenced from the generated code. Consistency with the necessary places trumps. Hopefully the build bots are OK with the movement of intrin.h... llvm-svn: 206838
*	[Modules] Make Support/Debug.h modular. This requires it to not change	Chandler Carruth	2014-04-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	behavior based on other files defining DEBUG_TYPE, which means it cannot define DEBUG_TYPE at all. This is actually better IMO as it forces folks to define relevant DEBUG_TYPEs for their files. However, it requires all files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't already. I've updated all such files in LLVM and will do the same for other upstream projects. This still leaves one important change in how LLVM uses the DEBUG_TYPE macro going forward: we need to only define the macro after header files have been #include-ed. Previously, this wasn't possible because Debug.h required the macro to be pre-defined. This commit removes that. By defining DEBUG_TYPE after the includes two things are fixed: - Header files that need to provide a DEBUG_TYPE for some inline code can do so by defining the macro before their inline code and undef-ing it afterward so the macro does not escape. - We no longer have rampant ODR violations due to including headers with different DEBUG_TYPE definitions. This may be mostly an academic violation today, but with modules these types of violations are easy to check for and potentially very relevant. Where necessary to suppor headers with DEBUG_TYPE, I have moved the definitions below the includes in this commit. I plan to move the rest of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big enough. The comments in Debug.h, which were hilariously out of date already, have been updated to reflect the recommended practice going forward. llvm-svn: 206822
*	[PowerPC] Initial support for the VSX instruction set	Hal Finkel	2014-03-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VSX is an ISA extension supported on the POWER7 and later cores that enhances floating-point vector and scalar capabilities. Among other things, this adds <2 x double> support and generally helps to reduce register pressure. The interesting part of this ISA feature is the register configuration: there are 64 new 128-bit vector registers, the 32 of which are super-registers of the existing 32 scalar floating-point registers, and the second 32 of which overlap with the 32 Altivec vector registers. This makes things like vector insertion and extraction tricky: this can be free but only if we force a restriction to the right register subclass when needed. A new "minipass" PPCVSXCopy takes care of this (although it could do a more-optimal job of it; see the comment about unnecessary copies below). Please note that, currently, VSX is not enabled by default when targeting anything because it is not yet ready for that. The assembler and disassembler are fully implemented and tested. However: - CodeGen support causes miscompiles; test-suite runtime failures: MultiSource/Benchmarks/FreeBench/distray/distray MultiSource/Benchmarks/McCat/08-main/main MultiSource/Benchmarks/Olden/voronoi/voronoi MultiSource/Benchmarks/mafft/pairlocalalign MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4 SingleSource/Benchmarks/CoyoteBench/almabench SingleSource/Benchmarks/Misc/matmul_f64_4x4 - The lowering currently falls back to using Altivec instructions far more than it should. Worse, there are some things that are scalarized through the stack that shouldn't be. - A lot of unnecessary copies make it past the optimizers, and this needs to be fixed. - Many more regression tests are needed. Normally, I'd fix these things prior to committing, but there are some students and other contributors who would like to work this, and so it makes sense to move this development process upstream where it can be subject to the regular code-review procedures. llvm-svn: 203768
*	Add CR-bit tracking to the PowerPC backend for i1 values	Hal Finkel	2014-02-28	1	-2/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change enables tracking i1 values in the PowerPC backend using the condition register bits. These bits can be treated on PowerPC as separate registers; individual bit operations (and, or, xor, etc.) are supported. Tracking booleans in CR bits has several advantages: - Reduction in register pressure (because we no longer need GPRs to store boolean values). - Logical operations on booleans can be handled more efficiently; we used to have to move all results from comparisons into GPRs, perform promoted logical operations in GPRs, and then move the result back into condition register bits to be used by conditional branches. This can be very inefficient, because the throughput of these CR <-> GPR moves have high latency and low throughput (especially when other associated instructions are accounted for). - On the POWER7 and similar cores, we can increase total throughput by using the CR bits. CR bit operations have a dedicated functional unit. Most of this is more-or-less mechanical: Adjustments were needed in the calling-convention code, support was added for spilling/restoring individual condition-register bits, and conditional branch instruction definitions taking specific CR bits were added (plus patterns and code for generating bit-level operations). This is enabled by default when running at -O2 and higher. For -O0 and -O1, where the ability to debug is more important, this feature is disabled by default. Individual CR bits do not have assigned DWARF register numbers, and storing values in CR bits makes them invisible to the debugger. It is critical, however, that we don't move i1 values that have been promoted to larger values (such as those passed as function arguments) into bit registers only to quickly turn around and move the values back into GPRs (such as happens when values are returned by functions). A pair of target-specific DAG combines are added to remove the trunc/extends in: trunc(binary-ops(binary-ops(zext(x), zext(y)), ...) and: zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...) In short, we only want to use CR bits where some of the i1 values come from comparisons or are used by conditional branches or selects. To put it another way, if we can do the entire i1 computation in GPRs, then we probably should (on the POWER7, the GPR-operation throughput is higher, and for all cores, the CR <-> GPR moves are expensive). POWER7 test-suite performance results (from 10 runs in each configuration): SingleSource/Benchmarks/Misc/mandel-2: 35% speedup MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown MultiSource/Applications/lemon/lemon: 8% slowdown llvm-svn: 202451
*	Re-sort all of the includes with ./utils/sort_includes.py so that	Chandler Carruth	2014-01-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	subsequent changes are easier to review. About to fix some layering issues, and wanted to separate out the necessary churn. Also comment and sink the include of "Windows.h" in three .inc files to match the usage in Memory.inc. llvm-svn: 198685
*	Add a scheduling model (with itinerary) for the PPC POWER7	Hal Finkel	2013-11-30	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a scheduling model for the POWER7 (P7) core, and enables the machine-instruction scheduler when targeting the P7. Scheduling for the P7, like earlier ooo PPC cores, requires considering both dispatch group hazards, and functional unit resources and latencies. These are both modeled in a combined itinerary. Dispatch group formation is still handled by the post-RA scheduler (which still needs to be updated for the P7, but nevertheless does a pretty good job). One interesting aspect of this change is that I've also enabled to use of AA duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark results seem to support this decision (see below), and while this is normally useful for in-order cores, and not for ooo cores like the P7, I think that the dispatch slot hazards are enough like in-order resources to make the AA useful. Test suite significant performance differences (where negative is a speedup, and positive is a regression) vs. the current situation: MultiSource/Benchmarks/BitBench/drop3/drop3 with AA: N/A without AA: -28.7614% +/- 19.8356% (significantly against AA) MultiSource/Benchmarks/FreeBench/neural/neural with AA: -17.7406% +/- 11.2712% without AA: N/A (significantly in favor of AA) MultiSource/Benchmarks/SciMark2-C/scimark2 with AA: -11.2079% +/- 1.80543% without AA: -11.3263% +/- 2.79651% MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt with AA: -41.8649% +/- 17.0053% without AA: -34.5256% +/- 23.7072% MultiSource/Benchmarks/mafft/pairlocalalign with AA: 25.3016% +/- 17.8614% without AA: 38.6629% +/- 14.9391% (significantly in favor of AA) MultiSource/Benchmarks/sim/sim with AA: N/A without AA: 13.4844% +/- 7.18195% (significantly in favor of AA) SingleSource/Benchmarks/BenchmarkGame/Large/fasta with AA: 15.0664% +/- 6.70216% without AA: 12.7747% +/- 8.43043% SingleSource/Benchmarks/BenchmarkGame/puzzle with AA: 82.2713% +/- 26.3567% without AA: 75.7525% +/- 41.1842% SingleSource/Benchmarks/Misc/flops-2 with AA: -37.1621% +/- 20.7964% without AA: -35.2342% +/- 20.2999% (significantly in favor of AA) These are 99.5% confidence intervals from 5 runs per configuration. Regarding the choice to turn on AA during CodeGen, of these results, four seem significantly in favor of using AA, and one seems significantly against. I'm not making this decision based on these numbers alone, but these results seem consistent with results I have from other tests, and so I think that, on balance, using AA is a win. llvm-svn: 195981
*	Mark PPC MFTB and DST (and friends) as deprecated	Hal Finkel	2013-09-12	1	-0/+2
\| \| \| \| \| \| \| \|	Use the new instruction deprecation feature to mark mftb (now replaced with mfspr) and dst (along with the other Altivec cache control instructions) as deprecated when targeting cores supporting at least ISA v2.03. llvm-svn: 190605
*	PPC: Enable aggressive anti-dependency breaking	Hal Finkel	2013-09-12	1	-11/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Aggressive anti-dependency breaking is enabled by default for all PPC cores. This provides a general speedup on the P7 and other platforms (among other factors, the instruction group formation for the non-embedded PPC cores is done during post-RA scheduling). In order to do this safely, the incompatibility between uses of the MFOCRF instruction and anti-dependency breaking are resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed FIXME, the problem was that MFOCRF's output is sensitive to the identify of the source register, and always paired with a shift to undo this effect. Because anti-dependency breaking is unaware of this hidden dependency of the shift amount on the source register of the MFOCRF instruction, changing that register must be inhibited. Two test cases were adjusted: The SjLj test was made more insensitive to register choices and scheduling; the saveCR test disabled anti-dependency breaking because part of what it is testing is proper register reuse. llvm-svn: 190587
*	Enable MI scheduling (and CodeGen AA) by default for embedded PPC cores	Hal Finkel	2013-09-11	1	-0/+39
\| \| \| \| \| \| \|	For embedded PPC cores (especially the A2 core), using the MI scheduler with AA is far superior to the other scheduling options. llvm-svn: 190558
*	Add the PPC fcpsgn instruction	Hal Finkel	2013-08-19	1	-0/+1
\| \| \| \| \| \| \| \| \|	Modern PPC cores support a floating-point copysign instruction, and we can use this to lower the FCOPYSIGN node (which is created from calls to the libm copysign function). A couple of extra patterns are necessary because the operand types of FCOPYSIGN need not agree. llvm-svn: 188653
*	[PowerPC] Support powerpc64le as a syntax-checking target.	Bill Schmidt	2013-07-26	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch provides basic support for powerpc64le as an LLVM target. However, use of this target will not actually generate little-endian code. Instead, use of the target will cause the correct little-endian built-in defines to be generated, so that code that tests for __LITTLE_ENDIAN__, for example, will be correctly parsed for syntax-only testing. Code generation will otherwise be the same as powerpc64 (big-endian), for now. The patch leaves open the possibility of creating a little-endian PowerPC64 back end, but there is no immediate intent to create such a thing. The LLVM portions of this patch simply add ppc64le coverage everywhere that ppc64 coverage currently exists. There is nothing of any import worth testing until such time as little-endian code generation is implemented. In the corresponding Clang patch, there is a new test case variant to ensure that correct built-in defines for little-endian code are generated. llvm-svn: 187179
*	PPC: Refactoring to support subtarget feature changing	Hal Finkel	2013-07-15	1	-37/+62
\| \| \| \| \| \| \| \| \|	This change mirrors the changes that were made to the X86 and ARM targets to support subtarget feature changing. As indicated in r182899, the mechanism is still undergoing revision, and so as with the X86 and ARM targets, there is no test case yet (there is no effective functionality change). llvm-svn: 186357
*	Use PPC reciprocal estimates with Newton iteration in fast-math mode	Hal Finkel	2013-04-03	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When unsafe FP math operations are enabled, we can use the fre[s] and frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together with some Newton iteration, in order to quickly generate floating-point division and sqrt results. All of these instructions are separately optional, and so each has its own feature flag (except for the Altivec instructions, which are covered under the existing Altivec flag). Doing this is not only faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these computations to be pipelined with other computations in order to hide their overall latency. I've also added a couple of missing fnmsub patterns which turned out to be missing (but are necessary for good code generation of the Newton iterations). Altivec needs a similar fix, but that will probably be more complicated because fneg is expanded for Altivec's v4f32. llvm-svn: 178617
*	Add more PPC floating-point conversion instructions	Hal Finkel	2013-04-01	1	-0/+1
\| \| \| \| \| \| \| \| \|	The P7 and A2 have additional floating-point conversion instructions which allow a direct two-instruction sequence (plus load/store) to convert from all combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores, only some combinations were directly available). llvm-svn: 178480
*	Add the PPC lfiwax instruction	Hal Finkel	2013-03-31	1	-0/+1
\| \| \| \| \| \| \| \| \|	This instruction is available on modern PPC64 CPUs, and is now used to improve the SINT_TO_FP lowering (by eliminating the need for the separate sign extension instruction and decreasing the amount of needed stack space). llvm-svn: 178446
*	Add PPC FP rounding instructions fri[mnpz]	Hal Finkel	2013-03-29	1	-0/+1
\| \| \| \| \| \| \| \| \|	These instructions are available on the P5x (and later) and on the A2. They implement the standard floating-point rounding operations (floor, trunc, etc.). One caveat: frin (round to nearest) does not implement "ties to even", and so is only enabled in fast-math mode. llvm-svn: 178337
*	Add the PPC64 ldbrx/stdbrx instructions	Hal Finkel	2013-03-28	1	-0/+1
\| \| \| \| \| \| \| \|	These are 64-bit load/store with byte-swap, and available on the P7 and the A2. Like the similar instructions for 16- and 32-bit words, these are matched in the target DAG-combine phase against load/store-bswap pairs. llvm-svn: 178276
*	Add the PPC64 popcntd instruction	Hal Finkel	2013-03-28	1	-0/+1
\| \| \| \| \| \| \|	PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and tell TTI about it so that popcount-loop recognition will know about it. llvm-svn: 178233
*	PPC QPX requires a 32-byte aligned stack	Hal Finkel	2013-01-30	1	-0/+6
\| \| \| \| \| \| \|	On systems which support the QPX vector instructions, the stack must be 32-byte aligned. llvm-svn: 173993
*	Initialize hasQPX in PPCSubtarget	Hal Finkel	2013-01-30	1	-0/+1
\| \| \| \| \| \|	This should have gone in with r173973. llvm-svn: 173984
*	Move all of the header files which are involved in modelling the LLVM IR	Chandler Carruth	2013-01-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	into their new header subdirectory: include/llvm/IR. This matches the directory structure of lib, and begins to correct a long standing point of file layout clutter in LLVM. There are still more header files to move here, but I wanted to handle them in separate commits to make tracking what files make sense at each layer easier. The only really questionable files here are the target intrinsic tablegen files. But that's a battle I'd rather not fight today. I've updated both CMake and Makefile build systems (I think, and my tests think, but I may have missed something). I've also re-sorted the includes throughout the project. I'll be committing updates to Clang, DragonEgg, and Polly momentarily. llvm-svn: 171366