summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [multiversion] Switch all of the targets over to use theChandler Carruth2015-02-011-2/+4
| | | | | | | | | | | | | | | | TargetIRAnalysis access path directly rather than implementing getTTI. This even removes getTTI from the interface. It's more efficient for each target to just register a precise callback that creates their specific TTI. As part of this, all of the targets which are building their subtargets individually per-function now build their TTI instance with the function and thus look up the correct subtarget and cache it. NVPTX, R600, and XCore currently don't leverage this functionality, but its trivial for them to add it now. llvm-svn: 227735
* [PM] Switch the TargetMachine interface from accepting a pass managerChandler Carruth2015-01-311-2/+3
| | | | | | | | | | | | | | | | | | | | | | | base which it adds a single analysis pass to, to instead return the type erased TargetTransformInfo object constructed for that TargetMachine. This removes all of the pass variants for TTI. There is now a single TTI *pass* in the Analysis layer. All of the Analysis <-> Target communication is through the TTI's type erased interface itself. While the diff is large here, it is nothing more that code motion to make types available in a header file for use in a different source file within each target. I've tried to keep all the doxygen comments and file boilerplate in line with this move, but let me know if I missed anything. With this in place, the next step to making TTI work with the new pass manager is to introduce a really simple new-style analysis that produces a TTI object via a callback into this routine on the target machine. Once we have that, we'll have the building blocks necessary to accept a function argument as well. llvm-svn: 227685
* [PM] Change the core design of the TTI analysis to use a polymorphicChandler Carruth2015-01-311-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is *much* more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to *just* do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669
* Remove a few getSubtarget calls in AArch64 pass manager initialization.Eric Christopher2015-01-301-2/+2
| | | | llvm-svn: 227531
* Move DataLayout back to the TargetMachine from TargetSubtargetInfoEric Christopher2015-01-261-0/+7
| | | | | | | | | | | | | | | | | | | derived classes. Since global data alignment, layout, and mangling is often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be layed out and mangled consistently if the subtarget changes on a per function basis. Prior to this all targets(*) have had subtarget dependent code moved out and onto the TargetMachine. *One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget dependent features. llvm-svn: 227113
* Enable MachineVerifier in debug mode for X86, ARM, AArch64, Mips.Matthias Braun2014-12-111-5/+5
| | | | llvm-svn: 224075
* [CodeGen] Add print and verify pass after each MachineFunctionPass by defaultMatthias Braun2014-12-111-17/+13
| | | | | | | | | | | | | | | | | | | Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnotice when no verification is performed. To make this change practical I added the possibility to explicitely disable verification. I used this option on all places where no verification was performed previously (because alot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. This is the 2nd attempt at this after realizing that PassManager::add() may actually delete the pass. llvm-svn: 224059
* This reverts commit r224043 and r224042.Rafael Espindola2014-12-111-8/+12
| | | | | | check-llvm was failing. llvm-svn: 224045
* Enable machineverifier in debug mode for X86, ARM, AArch64, MipsMatthias Braun2014-12-111-5/+5
| | | | llvm-svn: 224043
* [CodeGen] Add print and verify pass after each MachineFunctionPass by defaultMatthias Braun2014-12-111-17/+13
| | | | | | | | | | | | | | | | Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnotice when no verification is performed. To make this change practical I added the possibility to explicitely disable verification. I used this option on all places where no verification was performed previously (because alot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. llvm-svn: 224042
* Add out of line virtual destructors to all LLVMTargetMachine subclassesReid Kleckner2014-11-201-0/+2
| | | | | | | | | | | | | | | | | These recently all grew a unique_ptr<TargetLoweringObjectFile> member in r221878. When anyone calls a virtual method of a class, clang-cl requires all virtual methods to be semantically valid. This includes the implicit virtual destructor, which triggers instantiation of the unique_ptr destructor, which fails because the type being deleted is incomplete. This is just part of the ongoing saga of PR20337, which is affecting Blink as well. Because the MSVC ABI doesn't have key functions, we end up referencing the vtable and implicit destructor on any virtual call through a class. We don't actually end up emitting the dtor, so it'd be good if we could avoid this unneeded type completion work. llvm-svn: 222480
* [AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on ↵Hao Liu2014-11-191-0/+18
| | | | | | | | | | | | AArch64 backend. SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE and LICM. By enabling these passes we can have better address calculations and generate a better addressing mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57. Reviewed in http://reviews.llvm.org/D5864. llvm-svn: 222331
* This patch changes the ownership of TLOF from TargetLoweringBase to ↵Aditya Nandakumar2014-11-131-0/+12
| | | | | | TargetMachine so that different subtargets could share the TLOF effectively llvm-svn: 221878
* AArch64: enable Cortex-A57 FP balancing on Cortex-A53.Tim Northover2014-10-281-1/+2
| | | | | | | | | | Benchmarks have shown that it's harmless to the performance there, and having a unified set of passes between the two cores where possible helps big.LITTLE deployment. Patch by Z. Zheng. llvm-svn: 220744
* [PBQP] Teach PassConfig to tell if the default register allocator is used.Arnaud A. de Grandmaison2014-10-211-13/+2
| | | | | | | | This enables targets to adapt their pass pipeline to the register allocator in use. For example, with the AArch64 backend, using PBQP with the cortex-a57, the FPLoadBalancing pass is no longer necessary. llvm-svn: 220321
* [AArch64] Add workaround for Cortex-A53 erratum (835769)Bradley Smith2014-10-131-0/+7
| | | | | | | | | | | | | | | | | | | | Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. This patch implements such work-around in the backend, enabled via the option -aarch64-fix-cortex-a53-835769. The work-around code generation is not enabled by default. llvm-svn: 219603
* [PBQP] Replace PBQPBuilder with composable constraints (PBQPRAConstraint).Lang Hames2014-10-091-1/+1
| | | | | | | | | | | | | | | | This patch removes the PBQPBuilder class and its subclasses and replaces them with a composable constraints class: PBQPRAConstraint. This allows constraints that are only required for optimisation (e.g. coalescing, soft pairing) to be mixed and matched. This patch also introduces support for target writers to supply custom constraints for their targets by overriding a TargetSubtargetInfo method: std::unique_ptr<PBQPRAConstraints> getCustomPBQPConstraints() const; This patch should have no effect on allocations. llvm-svn: 219421
* Add subtarget caches to aarch64, arm, ppc, and x86.Eric Christopher2014-10-061-1/+28
| | | | | | | | | These will make it easier to test further changes to the code generation and optimization pipelines as those are moved to subtargets initialized with target feature and target cpu. llvm-svn: 219106
* [AArch64] Don't enable the post-RA MI scheduler at OptNone.Chad Rosier2014-09-121-1/+2
| | | | | | Hopefully, this will appease the bots. llvm-svn: 217712
* [AArch64] Enable post-RA MI scheduler.Chad Rosier2014-09-121-1/+3
| | | | | | | Phabricator Revision: http://reviews.llvm.org/D5278 Patch by Sanjin Sijaric! llvm-svn: 217693
* [AArch64] Add experimental PBQP supportArnaud A. de Grandmaison2014-09-101-2/+14
| | | | | | | | | | This adds target specific support for using the PBQP register allocator on the AArch64, for the A57 cpu. By default, the PBQP allocator is not used, unless explicitely required on the command line with "-aarch64-pbqp". llvm-svn: 217504
* [AArch64] Add pass to enable additional comparison optimizations by CSE.Jiangning Liu2014-09-051-0/+7
| | | | | | | | | | | | | | | | | | Patched by Sergey Dmitrouk. This pass tries to make consecutive compares of values use same operands to allow CSE pass to remove duplicated instructions. For this it analyzes branches and adjusts comparisons with immediate values by converting: GE -> GT GT -> GE LT -> LE LE -> LT and adjusting immediate values appropriately. It basically corrects two immediate values towards each other to make them equal. llvm-svn: 217220
* Rename AtomicExpandLoadLinked into AtomicExpandRobin Morisset2014-08-211-1/+1
| | | | | | | | | | | AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of a group that aim at making it more target-independent. See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html for details The command line option is "atomic-expand" llvm-svn: 216231
* [AArch64] Run a peephole pass right after AdvSIMD pass.Quentin Colombet2014-08-211-1/+5
| | | | | | | | | The AdvSIMD pass may produce copies that are not coalescer-friendly. The peephole optimizer knows how to fix that as demonstrated in the test case. <rdar://problem/12702965> llvm-svn: 216200
* [AArch64] Add an FP load balancing pass for Cortex-A57James Molloy2014-08-081-0/+4
| | | | | | | | | | | | For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations. This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU. Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness). This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected. llvm-svn: 215199
* MachineCombiner Pass for selecting faster instruction sequence on AArch64Gerolf Hoflehner2014-08-071-0/+6
| | | | | | | | | | Re-commit of r214832,r21469 with a work-around that avoids the previous problem with gcc build compilers The work-around is to use SmallVector instead of ArrayRef of basic blocks in preservesResourceLen()/MachineCombiner.cpp llvm-svn: 215151
* [AArch64] Add a testcase for r214957.James Molloy2014-08-061-1/+8
| | | | llvm-svn: 214965
* Revert "r214832 - MachineCombiner Pass for selecting faster instruction"Kevin Qin2014-08-051-6/+0
| | | | | | | It broke compiling of most Benchmark and internal test, as clang got clashed by segmentation fault or assertion. llvm-svn: 214845
* MachineCombiner Pass for selecting faster instructionGerolf Hoflehner2014-08-051-0/+6
| | | | | | | | | | | sequence on AArch64 Re-commit of r214669 without changes to test cases LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and LLVM:: CodeGen/AArch64/dp-3source.ll This resolves the reported compfails of the original commit. llvm-svn: 214832
* Revert "r214669 - MachineCombiner Pass for selecting faster instruction"Kevin Qin2014-08-041-6/+0
| | | | | | This commit broke "make check" for several hours, so get it reverted. llvm-svn: 214697
* MachineCombiner Pass for selecting faster instructionGerolf Hoflehner2014-08-031-0/+6
| | | | | | | | | | | | | | | | | | | | | | sequence - AArch64 target support This patch turns off madd/msub generation in the DAGCombiner and generates them in the MachineCombiner instead. It replaces the original code sequence with the combined sequence when it is beneficial to do so. When there is no machine model support it always generates the madd/msub instruction. This is true also when the objective is to optimize for code size: when the combined sequence is shorter is always chosen and does not get evaluated. When there is a machine model the combined instruction sequence is evaluated for critical path and resource length using machine trace metrics and the original code sequence is replaced when it is determined to be faster. rdar://16319955 llvm-svn: 214669
* Run sort_includes.py on the AArch64 backend.Benjamin Kramer2014-07-251-1/+1
| | | | | | No functionality change. llvm-svn: 213938
* AArch64: remove "arm64_be" support in favour of "aarch64_be".Tim Northover2014-07-231-3/+1
| | | | | | | | | There really is no arm64_be: it was a useful fiction to test big-endian support while both backends existed in parallel, but now the only platform that uses the name (iOS) doesn't have a big-endian variant, let alone one called "arm64_be". llvm-svn: 213748
* AArch64: Re-enable AArch64AddressTypePromotionDuncan P. N. Exon Smith2014-07-021-0/+2
| | | | | | | | | | | This reverts commits r212189 and r212190. While this pass was accidentally disabled (until r212073), r205437 slipped in a use of `auto` that should have been `auto&`. This fixes PR20188. llvm-svn: 212201
* AArch64: Temporarily disable AArch64AddressTypePromotionDuncan P. N. Exon Smith2014-07-021-2/+0
| | | | | | | Temporarily disable AArch64AddressTypePromotion, which was effectively re-enabled in r212073 and r212075, while I look into PR20188. llvm-svn: 212189
* Move AArch64TargetLowering to AArch64Subtarget.Eric Christopher2014-06-101-1/+1
| | | | | | | This currently necessitates a TargetMachine for the TargetLowering constructor and TLOF. llvm-svn: 210605
* Move AArch64InstrInfo to AArch64Subtarget.Eric Christopher2014-06-101-2/+1
| | | | llvm-svn: 210599
* Move AArch64SelectionDAGInfo down to the subtarget.Eric Christopher2014-06-101-1/+1
| | | | llvm-svn: 210557
* Have AArch64SelectionDAGInfo take a DataLayout parameter ratherEric Christopher2014-06-101-1/+1
| | | | | | than a TargetMachine. llvm-svn: 210554
* Move DataLayout onto the AArch64 subtarget.Eric Christopher2014-06-101-7/+0
| | | | llvm-svn: 210552
* Move AArch64FrameLowering into the subtarget.Eric Christopher2014-06-101-1/+1
| | | | llvm-svn: 210549
* Remove the uses of AArch64TargetMachine and AArch64Subtarget fromEric Christopher2014-06-101-2/+1
| | | | | | AArch64FrameLowering. llvm-svn: 210548
* ARM & AArch64: make use of common cmpxchg idioms after expansionTim Northover2014-05-301-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | The C and C++ semantics for compare_exchange require it to return a bool indicating success. This gets mapped to LLVM IR which follows each cmpxchg with an icmp of the value loaded against the desired value. When lowered to ldxr/stxr loops, this extra comparison is redundant: its results are implicit in the control-flow of the function. This commit makes two changes: it replaces that icmp with appropriate PHI nodes, and then makes sure earlyCSE is called after expansion to actually make use of the opportunities revealed. I've also added -{arm,aarch64}-enable-atomic-tidy options, so that existing fragile tests aren't perturbed too much by the change. Many of them either rely on undef/unreachable too pervasively to be restored to something well-defined (particularly while making sure they test the same obscure assert from many years ago), or depend on a particular CFG shape, which is disrupted by SimplifyCFG. rdar://problem/16227836 llvm-svn: 209883
* AArch64/ARM64: move ARM64 into AArch64's placeTim Northover2014-05-241-0/+208
| | | | | | | | | | | | | | | This commit starts with a "git mv ARM64 AArch64" and continues out from there, renaming the C++ classes, intrinsics, and other target-local objects for consistency. "ARM64" test directories are also moved, and tests that began their life in ARM64 use an arm64 triple, those from AArch64 use an aarch64 triple. Both should be equivalent though. This finishes the AArch64 merge, and everyone should feel free to continue committing as normal now. llvm-svn: 209577
* AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64.Tim Northover2014-05-241-121/+0
| | | | | | | | | | | | | | | | I'm doing this in two phases for a better "git blame" record. This commit removes the previous AArch64 backend and redirects all functionality to ARM64. It also deduplicates test-lines and removes orphaned AArch64 tests. The next step will be "git mv ARM64 AArch64" and rewire most of the tests. Hopefully LLVM is still functional, though it would be even better if no-one ever had to care because the rename happens straight afterwards. llvm-svn: 209576
* [C++11] Add 'override' keywords and remove 'virtual'. Additionally add ↵Craig Topper2014-04-291-2/+2
| | | | | | 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. AArch64 edition llvm-svn: 207510
* [AArch64] Enable global merge pass.Jiangning Liu2014-04-221-0/+9
| | | | llvm-svn: 206861
* Add AArch64 big endian Target (aarch64_be)Christian Pirker2014-02-241-4/+26
| | | | llvm-svn: 202024
* [AArch64] Add support for TargetTransformInfo Analysis.Chad Rosier2014-02-201-0/+8
| | | | llvm-svn: 201793
* Test commit - remove the new line to ↵Christian Pirker2014-02-191-1/+0
| | | | | | lib/Target/AArch64/AArch64TargetMachine.cpp. llvm-svn: 201698
OpenPOWER on IntegriCloud