bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[ARM][MVE] Enable masked gathers from vector of pointers	Anna Welker	2020-01-08	1	-0/+2
\| \| \| \| \| \| \| \|	Adds a pass to the ARM backend that takes a v4i32 gather and transforms it into a call to MVE's masked gather intrinsics. Differential Revision: https://reviews.llvm.org/D71743
*	[CodeGen] Move ARMCodegenPrepare to TypePromotion	Sam Parker	2019-12-03	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert ARMCodeGenPrepare into a generic type promotion pass by: - Removing the insertion of arm specific intrinsics to handle narrow types as we weren't using this. - Removing ARMSubtarget references. - Now query a generic TLI object to know which types should be promoted and what they should be promoted to. - Move all codegen tests into Transforms folder and testing using opt and not llc, which is how they should have been written in the first place... The pass searches up from icmp operands in an attempt to safely promote types so we can avoid generating unnecessary unsigned extends during DAG ISel. Differential Revision: https://reviews.llvm.org/D69556
*	[ARM] MVE Tail Predication	Sam Parker	2019-09-06	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MVE and LOB extensions of Armv8.1m can be combined to enable 'tail predication' which removes the need for a scalar remainder loop after vectorization. Lane predication is performed implicitly via a system register. The effects of predication is described in Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points being: - For vector operations that perform reduction across the vector and produce a scalar result, whether the value is accumulated or not. - For non-load instructions, the predicate flags determine if the destination register byte is updated with the new value or if the previous value is preserved. - For vector store instructions, whether the store occurs or not. - For vector load instructions, whether the value that is loaded or whether zeros are written to that element of the destination register. This patch implements a pass that takes a hardware loop, containing masked vector instructions, and converts it something that resembles an MVE tail predicated loop. Currently, if we had code generation, we'd generate a loop in which the VCTP would generate the predicate and VPST would then setup the value of VPR.PO. The loads and stores would be placed in VPT blocks so this is not tail predication, but normal VPT predication with the predicate based upon a element counting induction variable. Further work needs to be done to finally produce a true tail predicated loop. Because only the loads and stores are predicated, in both the LLVM IR and MIR level, we will restrict support to only lane-wise operations (no horizontal reductions). We will perform a final check on MIR during loop finalisation too. Another restriction, specific to MVE, is that all the vector instructions need operate on the same number of elements. This is because predication is performed at the byte level and this is set on entry to the loop, or by the VCTP instead. Differential Revision: https://reviews.llvm.org/D65884 llvm-svn: 371179
*	[ARM] DLS/LE low-overhead loop code generation	Sam Parker	2019-06-25	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce three pseudo instructions to be used during DAG ISel to represent v8.1-m low-overhead loops. One maps to set_loop_iterations while loop_decrement_reg is lowered to two, so that we can separate the decrement and branching operations. The pseudo instructions are expanded pre-emission, where we can still decide whether we actually want to generate a low-overhead loop, in a new pass: ARMLowOverheadLoops. The pass currently bails, reverting to an sub, icmp and br, in the cases where a call or stack spill/restore happens between the decrement and branching instructions, or if the loop is too large. Differential Revision: https://reviews.llvm.org/D63476 llvm-svn: 364288
*	[ARM] Some Thumb2ITBlock clean ups. NFC	Sjoerd Meijer	2019-06-18	1	-0/+1
\| \| \| \| \| \| \| \| \|	Some more refactoring, like registering the IT Block pass, less cryptic variable names, and some simplification of loops. Differential Revision: https://reviews.llvm.org/D63419 llvm-svn: 363666
*	[ARM] Extract some code from ARMConstantIslandPass	Sam Parker	2019-06-17	1	-5/+0
\| \| \| \| \| \| \| \| \|	Create the ARMBasicBlockUtils class for tracking and querying basic blocks sizes so we can use them when generating low-overhead loops. Differential Revision: https://reviews.llvm.org/D63265 llvm-svn: 363530
*	[ARM] MVE VPT Block Pass	Sjoerd Meijer	2019-06-14	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Initial commit of a new pass to create vector predication blocks, called VPT blocks, that are supported by the Armv8.1-M MVE architecture. This is a first naive implementation. I.e., for 2 consecutive predicated instructions I1 and I2, for example, it will generate 2 VPT blocks: VPST I1 VPST I2 A more optimal implementation would obviously put instructions in the same VPT block when they are predicated on the same condition and when it is allowed to do this: VPTT I1 I2 We will address this optimisation with follow up patches when the groundwork is in. Creating VPT Blocks is very similar to IT Blocks, which is the reason I added this to Thumb2ITBlocks.cpp. This allows reuse of the def use analysis that we need for the more optimal implementation. VPT blocks cannot be nested in IT blocks, and vice versa, and so these 2 passes cannot interact with each other. Instructions allowed in VPT blocks must be MVE instructions that are marked as VPT compatible. Differential Revision: https://reviews.llvm.org/D63247 llvm-svn: 363370
*	Update the file headers across all of the LLVM projects in the monorepo	Chandler Carruth	2019-01-19	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
*	[ARM] ARMCodeGenPrepare backend pass	Sam Parker	2018-07-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Arm specific codegen prepare is implemented to perform type promotion on icmp operands, which can enable the removal of uxtb and uxth (unsigned extend) instructions. This is possible because performing type promotion before ISel alleviates this duty from the DAG builder which has to perform legalisation, but has a limited view on data ranges. The pass visits any instruction operand of an icmp and creates a worklist to traverse the use-def tree to determine whether the values can simply be promoted. Our concern is values in the registers overflowing the narrow (i8, i16) data range, so instructions marked with nuw can be promoted easily. For add and sub instructions, we are able to use the parallel dsp instructions to operate on scalar data types and avoid overflowing bits. Underflowing adds and subs are also permitted when the result is only used by an unsigned icmp. Differential Revision: https://reviews.llvm.org/D48832 llvm-svn: 337687
*	[ARM] Parallel DSP Pass	Sjoerd Meijer	2018-06-28	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of this pass is to do some straightforward IR pattern matching to create ACLE DSP intrinsics, which map on these 32-bit SIMD operations. Currently, only the SMLAD instruction gets recognised. This instruction performs two multiplications with 16-bit operands, and stores the result in an accumulator. We will follow this up with patches to recognise SMLAD in more cases, and also to generate other DSP instructions (like e.g. SADD16). Patch by: Sam Parker and Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D48128 llvm-svn: 335850
*	[ARM] Register the Thumb2SizeReducePass. NFC	David Green	2017-12-19	1	-0/+1
\| \| \| \| \| \|	Also adds a simple test case. llvm-svn: 321072
*	[ARM] Register ARMExpandPseudo pass.	Eli Friedman	2017-09-05	1	-0/+1
\| \| \| \| \| \| \| \|	This allows -run-pass etc. to refer to it. (Split off from D35156.) llvm-svn: 312587
*	[ARM] GlobalISel: Use TableGen instruction selector	Diana Picus	2017-05-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Emit and use the TableGen instruction selector for ARM. At the moment, this allows us to remove the hand-written code for selecting G_SDIV and G_UDIV. Future commits will focus on increasing the code coverage for it and removing more dead code from the current instruction selector. llvm-svn: 301905
*	[ARM] GlobalISel: Get rid of ARMInstructionSelector.h. NFC.	Diana Picus	2017-04-28	1	-0/+6
\| \| \| \| \| \| \| \| \|	Declare the ARMInstructionSelector in an anonymous namespace, to make it more in line with the other targets which were migrated to this in r299637 in order to avoid TableGen'erated headers being included in non-GlobalISel builds. llvm-svn: 301632
*	[ARM] Register ConstantIslands with the pass manager	James Molloy	2017-02-13	1	-0/+1
\| \| \| \| \| \| \|	This allows us to use -stop-before/-stop-after/-run-pass - we can now write .mir tests. llvm-svn: 294948
*	[ARM] Fix some Clang-tidy modernize and Include What You Use warnings; other ↵	Eugene Zelenko	2017-01-26	1	-6/+6
\| \| \| \| \| \|	minor fixes (NFC). llvm-svn: 293229
*	This refactoring of ARM machine block size computation creates two utility	Sjoerd Meijer	2016-07-22	1	-0/+5
\| \| \| \| \| \| \| \| \|	functions so that the size computation is available not only in ConstantIslands but in other passes as well. Differential Revision: https://reviews.llvm.org/D22640 llvm-svn: 276399
*	ARM: Initialize LoadStore passes in TargetMachine	Matthias Braun	2016-07-16	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Initializing them in LLVMInitializeARMTarget() makes them visible early enough for "llc -run-pass usage". This required the pass to be renamed from "arm-load-store-opt" to "arm-ldst-opt", because there already exists an arm-load-store-opt cl::opt switch which would now clash with the passname getting added as a switch in opt. On the bright side the pass name now matches the DEBUG_TYPE name. Renamed "arm-prera-load-store-opt" to "arm-repra-ldst-opt" as well for consistency. llvm-svn: 275661
*	ARM/ELF: Better codegen for global variable addresses.	Peter Collingbourne	2015-10-26	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In PIC mode we were previously computing global variable addresses (or GOT entry addresses) by adding the PC, the PC-relative GOT displacement and the GOT-relative symbol/GOT entry displacement. Because the latter two displacements are fixed, we ended up performing one more addition than necessary. This change causes us to compute addresses using a single PC-relative displacement, resulting in a shorter code sequence. This reduces code size by about 4% in a recent build of Chromium for Android. As a result of this change we no longer need to compute the GOT base address in the ARM backend, which allows us to remove the Global Base Reg pass and SDAG lowering for the GOT. We also now no longer use the GOT when addressing a symbol which is known to be defined in the same linkage unit. Specifically, the symbol must have either hidden visibility or a strong definition in the current module in order to not use the the GOT. This is a change from the previous behaviour where we would use the GOT to address externally visible symbols defined in the same module. I think the only cases where this could matter are cases involving symbol interposition, but we don't really support that well anyway. Differential Revision: http://reviews.llvm.org/D13650 llvm-svn: 251322
*	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)	Alexander Kornienko	2015-06-23	1	-1/+1
\| \| \| \| \| \|	Apparently, the style needs to be agreed upon first. llvm-svn: 240390
*	Fixed/added namespace ending comments using clang-tidy. NFC	Alexander Kornienko	2015-06-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-,llvm-namespace-comment -header-filter='llvm/.\|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137
*	[ARM] Pass a callback to FunctionPass constructors to enable skipping execution	Akira Hatanaka	2015-06-08	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	on a per-function basis. Previously some of the passes were conditionally added to ARM's pass pipeline based on the target machine's subtarget. This patch makes changes to add those passes unconditionally and execute them conditonally based on the predicate functor passed to the pass constructors. This enables running different sets of passes for different functions in the module. rdar://problem/20542263 Differential Revision: http://reviews.llvm.org/D8717 llvm-svn: 239325
*	[ARM] Remove unused declaration. NFC.	Ahmed Bougacha	2015-02-16	1	-1/+0
\| \| \| \| \| \| \|	GlobalMerge was moved to lib/CodeGen a while ago, and is no longer called "ARMGlobalMerge". llvm-svn: 229448
*	[PM] Remove a bunch of stale TTI creation method declarations. I nuked	Chandler Carruth	2015-02-01	1	-3/+0
\| \| \| \| \| \| \|	their definitions, but forgot to clean up all the declarations which are in different files. llvm-svn: 227698
*	Reinstate "Nuke the old JIT."	Eric Christopher	2014-09-02	1	-5/+0
\| \| \| \| \| \| \| \|	Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 216982
*	Canonicalize header guards into a common format.	Benjamin Kramer	2014-08-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful) Changes made by clang-tidy with minor tweaks. llvm-svn: 215558
*	Temporarily Revert "Nuke the old JIT." as it's not quite ready to	Eric Christopher	2014-08-07	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \|	be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate. Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reverts commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 215154
*	Nuke the old JIT.	Rafael Espindola	2014-08-07	1	-5/+0
\| \| \| \| \| \| \| \| \|	I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start. Thanks to Lang Hames for making MCJIT a good replacement! llvm-svn: 215111
*	Atomics: promote ARM's IR-based atomics pass to CodeGen.	Tim Northover	2014-04-17	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Still only 32-bit ARM using it at this stage, but the promotion allows direct testing via opt and is a reasonably self-contained patch on the way to switching ARM64. At this point, other targets should be able to make use of it without too much difficulty if they want. (See ARM64 commit coming soon for an example). llvm-svn: 206485
*	ARM: expand atomic ldrex/strex loops in IR	Tim Northover	2014-04-03	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous situation where ATOMIC_LOAD_WHATEVER nodes were expanded at MachineInstr emission time had grown to be extremely large and involved, to account for the subtly different code needed for the various flavours (8/16/32/64 bit, cmpxchg/add/minmax). Moving this transformation into the IR clears up the code substantially, and makes future optimisations much easier: 1. an atomicrmw followed by using the new value can be more efficient. As an IR pass, simple CSE could handle this efficiently. 2. Making use of cmpxchg success/failure orderings only has to be done in one (simpler) place. 3. The common "cmpxchg; did we store?" idiom can be exposed to optimisation. I intend to gradually improve this situation within the ARM backend and make sure there are no hidden issues before moving the code out into CodeGen to be shared with (at least ARM64/AArch64, though I think PPC & Mips could benefit too). llvm-svn: 205525
*	Remove duplicated DMB instructions	Renato Golin	2014-04-02	1	-0/+1
\| \| \| \| \| \| \| \| \|	ARM specific optimiztion, finding places in ARM machine code where 2 dmbs follow one another, and eliminating one of them. Patch by Reinoud Elhorst. llvm-svn: 205409
*	Prune includes in ARM target.	Craig Topper	2014-03-22	1	-4/+3
\| \| \| \|	llvm-svn: 204548
*	Adding an A15 specific optimization pass for interactions between S/D/Q ↵	Silviu Baranga	2013-03-15	1	-0/+1
\| \| \| \| \| \|	registers. The pass handles all the required transformations pre-regalloc. llvm-svn: 177169
*	ARM: Fix a few copy-paste errors.	Jim Grosbach	2013-01-07	1	-1/+1
\| \| \| \| \| \|	s/X86/ARM/ llvm-svn: 171789
*	Switch TargetTransformInfo from an immutable analysis pass that requires	Chandler Carruth	2013-01-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a TargetMachine to construct (and thus isn't always available), to an analysis group that supports layered implementations much like AliasAnalysis does. This is a pretty massive change, with a few parts that I was unable to easily separate (sorry), so I'll walk through it. The first step of this conversion was to make TargetTransformInfo an analysis group, and to sink the nonce implementations in ScalarTargetTransformInfo and VectorTargetTranformInfo into a NoTargetTransformInfo pass. This allows other passes to add a hard requirement on TTI, and assume they will always get at least on implementation. The TargetTransformInfo analysis group leverages the delegation chaining trick that AliasAnalysis uses, where the base class for the analysis group delegates to the previous analysis pass, allowing all but tho NoFoo analysis passes to only implement the parts of the interfaces they support. It also introduces a new trick where each pass in the group retains a pointer to the top-most pass that has been initialized. This allows passes to implement one API in terms of another API and benefit when some other pass above them in the stack has more precise results for the second API. The second step of this conversion is to create a pass that implements the TargetTransformInfo analysis using the target-independent abstractions in the code generator. This replaces the ScalarTargetTransformImpl and VectorTargetTransformImpl classes in lib/Target with a single pass in lib/CodeGen called BasicTargetTransformInfo. This class actually provides most of the TTI functionality, basing it upon the TargetLowering abstraction and other information in the target independent code generator. The third step of the conversion adds support to all TargetMachines to register custom analysis passes. This allows building those passes with access to TargetLowering or other target-specific classes, and it also allows each target to customize the set of analysis passes desired in the pass manager. The baseline LLVMTargetMachine implements this interface to add the BasicTTI pass to the pass manager, and all of the tools that want to support target-aware TTI passes call this routine on whatever target machine they end up with to add the appropriate passes. The fourth step of the conversion created target-specific TTI analysis passes for the X86 and ARM backends. These passes contain the custom logic that was previously in their extensions of the ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces. I separated them into their own file, as now all of the interface bits are private and they just expose a function to create the pass itself. Then I extended these target machines to set up a custom set of analysis passes, first adding BasicTTI as a fallback, and then adding their customized TTI implementations. The fourth step required logic that was shared between the target independent layer and the specific targets to move to a different interface, as they no longer derive from each other. As a consequence, a helper functions were added to TargetLowering representing the common logic needed both in the target implementation and the codegen implementation of the TTI pass. While technically this is the only change that could have been committed separately, it would have been a nightmare to extract. The final step of the conversion was just to delete all the old boilerplate. This got rid of the ScalarTargetTransformInfo and VectorTargetTransformInfo classes, all of the support in all of the targets for producing instances of them, and all of the support in the tools for manually constructing a pass based around them. Now that TTI is a relatively normal analysis group, two things become straightforward. First, we can sink it into lib/Analysis which is a more natural layer for it to live. Second, clients of this interface can depend on it always being available which will simplify their code and behavior. These (and other) simplifications will follow in subsequent commits, this one is clearly big enough. Finally, I'm very aware that much of the comments and documentation needs to be updated. As soon as I had this working, and plausibly well commented, I wanted to get it committed and in front of the build bots. I'll be doing a few passes over documentation later if it sticks. Commits to update DragonEgg and Clang will be made presently. llvm-svn: 171681
*	[arm-fast-isel] Add support for ELF PIC.	Jush Lu	2012-09-27	1	-0/+1
\| \| \| \| \| \| \|	This is a preliminary step towards ELF support; currently ARMFastISel hasn't been used for ELF object files yet. llvm-svn: 164759
*	Reorder includes to match coding standards. Fix an issue or two exposed by that.	Craig Topper	2012-03-17	1	-2/+0
\| \| \| \|	llvm-svn: 152978
*	comment fix ARM.h	Jia Liu	2012-02-19	1	-1/+1
\| \| \| \|	llvm-svn: 150904
*	Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, ↵	Jia Liu	2012-02-18	1	-1/+1
\| \| \| \| \| \|	MSP430, PPC, PTX, Sparc, X86, XCore. llvm-svn: 150878
*	Delete NEONMoveFix, now unused.	Jakob Stoklund Olesen	2011-09-29	1	-1/+0
\| \| \| \|	llvm-svn: 140773
*	Move to ISelLowering.	Bill Wendling	2011-09-29	1	-1/+0
\| \| \| \|	llvm-svn: 140754
*	This is the start of the new SjLj EH preparation pass, which will replace the	Bill Wendling	2011-09-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	current IR-level pass. The old SjLj EH pass has some problems, especially with the new EH model. Most significantly, it violates some of the new restrictions the new model has. For instance, the 'dispatch' table wants to jump to the landing pad, but we cannot allow that because only an invoke's unwind edge can jump to a landing pad. This requires us to mangle the code something awful. In addition, we need to keep the now dead landingpad instructions around instead of CSE'ing them because the DWARF emitter uses that information (they are dead because no control flow edge will execute them - the control flow edge from an invoke's unwind is superceded by the edge coming from the dispatch). Basically, this pass belongs not at the IR level where SSA is king, but at the code-gen level, where we have more flexibility. llvm-svn: 140646
*	Code clean up.	Evan Cheng	2011-07-25	1	-6/+0
\| \| \| \|	llvm-svn: 135954
*	Sink ARM mc routines into MCTargetDesc.	Evan Cheng	2011-07-23	1	-13/+1
\| \| \| \|	llvm-svn: 135825
*	Next round of MC refactoring. This patch factor MC table instantiations, MC	Evan Cheng	2011-07-14	1	-2/+1
\| \| \| \| \| \|	registeration and creation code into XXXMCDesc libraries. llvm-svn: 135184
*	- Eliminate MCCodeEmitter's dependency on TargetMachine. It now uses MCInstrInfo	Evan Cheng	2011-07-11	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \|	and MCSubtargetInfo. - Added methods to update subtarget features (used when targets automatically detect subtarget features or switch modes). - Teach X86Subtarget to update MCSubtargetInfo features bits since the MCSubtargetInfo layer can be shared with other modules. - These fixes .code 16 / .code 32 support since mode switch is updated in MCSubtargetInfo so MC code emitter can do the right thing. llvm-svn: 134884
*	Add missing header.	Jim Grosbach	2011-06-22	1	-0/+1
\| \| \| \|	llvm-svn: 133640
*	Move ARMMachObjectWriter to its own file.	Jim Grosbach	2011-06-22	1	-0/+7
\| \| \| \| \| \|	Just tidy up a bit. No functional change. llvm-svn: 133638
*	Making use of VFP / NEON floating point multiply-accumulate / subtraction is	Evan Cheng	2010-12-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	difficult on current ARM implementations for a few reasons. 1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause additional pipeline stall. So it's frequently better to single codegen vmul + vadd. 2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart. 3. A vmla followed vmla is a special case. Obvious issuing back to back RAW vmla + vmla is very bad. But this isn't ideal either: vmul vadd vmla Instead, we want to expand the second vmla: vmla vmul vadd Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster. Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough but it isn't the optimial solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable. A. Add missing isel predicates which cause vmla to be codegen'ed. B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla. C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case). D. Add ARM hazard recognizer to model the vmla / vmls hazards. E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards. Work in progress, only A+B are enabled. llvm-svn: 120960
*	Move the ARMAsmPrinter class defintiion into a header file.	Jim Grosbach	2010-12-01	1	-3/+3
\| \| \| \|	llvm-svn: 120551