path: root/llvm/lib/Target/AArch64/AArch64InstrInfo.h
Commit message | Author | Age | Files | Lines
...
* [MachineOutliner] AArch64: Avoid saving + restoring LR if possible | Jessica Paquette | 2017-09-27 | 1 | -8/+7
    This commit allows the outliner to avoid saving and restoring the link register on AArch64 when it is dead within an entire class of candidates.

    This introduces changes to the way the outliner interfaces with the target. For example, the target now interfaces with the outliner using a MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and getOutliningFrameOverhead.

    This also improves several comments on the outliner's cost model.

    https://reviews.llvm.org/D36721
    llvm-svn: 314341
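The idea of picking a cheaper cost description when the link register is dead can be sketched in self-contained C++. This is only an illustration of the message above, not the LLVM interface: `OutlinerInfoSketch`, `CandidateSketch`, `chooseInfoSketch`, and the instruction counts are all hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-class cost information a target hands to
// the outliner. Field names and values are assumptions for illustration.
struct OutlinerInfoSketch {
  unsigned CallOverhead;  // instructions emitted at each call site
  unsigned FrameOverhead; // instructions added to the outlined function
};

// Hypothetical candidate: only records whether LR is live at the call site.
struct CandidateSketch {
  bool LRLive;
};

// If LR is dead for every candidate in the class, the call needs no
// save/restore around it; otherwise fall back to the more expensive form.
OutlinerInfoSketch chooseInfoSketch(const std::vector<CandidateSketch> &Class) {
  assert(!Class.empty() && "a candidate class needs at least one member");
  for (const CandidateSketch &C : Class)
    if (C.LRLive)
      return {3, 1}; // assumed: save LR + call + restore LR, then a return
  return {1, 1};     // assumed: a bare call, then a return
}
```

A single candidate with a live LR forces the whole class onto the expensive form, which matches the "entire class of candidates" wording in the commit message.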
* Allow target to decide when to cluster loads/stores in misched | Stanislav Mekhanoshin | 2017-09-13 | 1 | -1/+2
    When clustering loads or stores, MachineScheduler checks whether the base pointers point to the same memory. This check compares the base registers of two memory instructions. This works fine when instructions have a separate offset operand. If they require a fully calculated pointer, such instructions can never be clustered under this logic.

    Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it.

    Differential Revision: https://reviews.llvm.org/D37698
    llvm-svn: 313208
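A minimal C++ sketch of a target-side policy that receives the base registers and makes the clustering decision itself. The names `MemOpSketch` and `shouldClusterMemOpsSketch`, the cluster-size limit, and the adjacency rule are assumptions for illustration; they are not the signature or policy of the real hook.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a memory operation as seen by the hook.
struct MemOpSketch {
  unsigned BaseReg; // register holding the base address
  int64_t Offset;   // offset in units of the access size (assumed scaled)
};

// The target, not the scheduler, compares base registers and decides.
bool shouldClusterMemOpsSketch(const MemOpSketch &First,
                               const MemOpSketch &Second,
                               unsigned NumClustered) {
  assert(NumClustered >= 1 && "cluster must already contain one operation");
  const unsigned MaxClusterSize = 4; // assumed limit, not from the source
  if (NumClustered >= MaxClusterSize)
    return false;
  if (First.BaseReg != Second.BaseReg)
    return false; // a different target could choose another policy here
  return Second.Offset == First.Offset + 1; // only adjacent accesses
}
```

A target whose instructions take a fully calculated pointer could replace the base-register comparison with its own test, which is the flexibility the commit message describes.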
* [AArch64] Adjust the cost model for Exynos M1 and M2 | Evandro Menezes | 2017-08-28 | 1 | -0/+3
    Add new predicate to more accurately model the cost of arithmetic and logical operations shifted left.

    Differential revision: https://reviews.llvm.org/D37151
    llvm-svn: 311943
* [MachineOutliner] NFC: Change IsTailCall to a call class + frame class | Jessica Paquette | 2017-07-29 | 1 | -50/+103
    This commit
    - Removes IsTailCall and replaces it with a target-defined unsigned
    - Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
    - Adds a call class + frame class classification to OutlinedFunction and Candidate respectively

    This accomplishes a couple of things. Firstly, we don't need the notion of *tail call* in the general outlining algorithm. Secondly, we can now have different "outlining classes" for each candidate within a set of candidates. This will make it easy to add new ways to outline sequences for certain targets and dynamically choose an appropriate cost model for a sequence depending on the context that the sequence lives in.

    Ultimately, this should get us closer to being able to do something like, say, avoid saving the link register when outlining AArch64 instructions.

    llvm-svn: 309475
* [MachineOutliner] NFC: Split up getOutliningBenefit | Jessica Paquette | 2017-07-28 | 1 | -2/+4
    This is some more cleanup in preparation for some actual functional changes. This splits getOutliningBenefit into two cost functions: getOutliningCallOverhead and getOutliningFrameOverhead. These functions return the number of instructions that would be required to call a specific function and the number of instructions that would be required to construct a frame for a specific function. The actual outlining benefit logic is moved into the outliner, which calls these functions.

    The goal of refactoring getOutliningBenefit is to:
    - Get us closer to getting rid of the IsTailCall flag
    - Further split up "target-specific" things and "general algorithm" things

    llvm-svn: 309356
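How an outliner might combine the two overheads into a benefit can be sketched as follows. The formula is an assumed model (instructions before outlining minus instructions after), and `OutliningCostsSketch` and `outliningBenefitSketch` are hypothetical names; the commit message does not state the exact computation.

```cpp
#include <cassert>

// Hypothetical per-target costs, counted in instructions.
struct OutliningCostsSketch {
  unsigned CallOverhead;  // instructions needed at each call site
  unsigned FrameOverhead; // instructions needed to build the outlined frame
};

// Assumed model: instructions saved by outlining a sequence of SeqLen
// instructions that occurs Occurrences times. Returns 0 when outlining
// would not shrink the code.
unsigned outliningBenefitSketch(unsigned SeqLen, unsigned Occurrences,
                                const OutliningCostsSketch &Costs) {
  assert(SeqLen > 0 && Occurrences > 0 && "empty candidate");
  unsigned NotOutlined = SeqLen * Occurrences;
  unsigned Outlined =
      Costs.CallOverhead * Occurrences + SeqLen + Costs.FrameOverhead;
  return NotOutlined > Outlined ? NotOutlined - Outlined : 0;
}
```

Under this model, a 5-instruction sequence occurring 4 times with a call overhead of 1 and a frame overhead of 1 costs 20 instructions inline and 10 outlined, for a benefit of 10.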
* Remove unused function from AArch64 backend (NFC) | Adrian Prantl | 2017-07-27 | 1 | -4/+0
    llvm-svn: 309336
* fix typos in comments; NFC | Hiroshi Inoue | 2017-07-16 | 1 | -1/+1
    llvm-svn: 308126
* [AArch64][Falkor] Avoid HW prefetcher tag collisions (step 1) | Geoff Berry | 2017-07-14 | 1 | -0/+10
    Summary: This patch is the first step in reducing HW prefetcher instruction tag collisions in inner loops for Falkor. It adds a pass that annotates IR loads with metadata to indicate that they are known to be strided loads, and adds a target lowering hook that translates this metadata to a target-specific MachineMemOperand flag.

    A follow-on change will use this MachineMemOperand flag to rewrite instructions to reduce tag collisions.

    Reviewers: mcrosier, t.p.northover
    Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
    Differential Revision: https://reviews.llvm.org/D34963
    llvm-svn: 308059
* [MIR] Add support for printing and parsing target MMO flags | Geoff Berry | 2017-07-13 | 1 | -0/+2
    Summary: Add target hooks for printing and parsing target MMO flags. Targets may override getSerializableMachineMemOperandTargetFlags() to return a mapping from string to flag value for target MMO values that should be serialized/parsed in MIR output.

    Add implementation of this hook for AArch64 SuppressPair MMO flag.

    Reviewers: bogner, hfinkel, qcolombet, MatzeB
    Subscribers: mcrosier, javed.absar, llvm-commits
    Differential Revision: https://reviews.llvm.org/D34962
    llvm-svn: 307877
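The "mapping from string to flag value" can be pictured as a small table that serves both the printer and the parser. The sketch below is self-contained C++ and does not use LLVM types; `FlagNameSketch`, the bit value, the helper names, and the flag string `"aarch64-suppress-pair"` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical table entry pairing a target flag bit with its textual name.
struct FlagNameSketch {
  uint16_t Flag;
  const char *Name;
};

// Assumed bit position and spelling; only the table idea comes from the text.
const uint16_t SuppressPairSketch = 1u << 6;
const FlagNameSketch SerializableFlagsSketch[] = {
    {SuppressPairSketch, "aarch64-suppress-pair"},
};

// Printer direction: flag value to name, or nullptr if not serializable.
const char *printFlagSketch(uint16_t Flag) {
  for (const FlagNameSketch &E : SerializableFlagsSketch)
    if (E.Flag == Flag)
      return E.Name;
  return nullptr;
}

// Parser direction: name to flag value; returns false for unknown names.
bool parseFlagSketch(const char *Name, uint16_t &Flag) {
  assert(Name && "null flag name");
  for (const FlagNameSketch &E : SerializableFlagsSketch)
    if (std::strcmp(E.Name, Name) == 0) {
      Flag = E.Flag;
      return true;
    }
  return false;
}
```

Keeping one table for both directions means a flag that prints always parses back to the same value.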
* Doxygen formatting. NFCI | Joel Jones | 2017-07-10 | 1 | -2/+2
    llvm-svn: 307597
* [AArch64] Prefer Bcc to CBZ/CBNZ/TBZ/TBNZ when NZCV flags can be set for "free". | Chad Rosier | 2017-06-23 | 1 | -0/+38
    This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a conditional branch (Bcc), when the NZCV flags can be set for "free". This is preferred on targets that have more flexibility when scheduling Bcc instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are equal). This can reduce register pressure and is also the default behavior for GCC.

    A few examples:

      add w8, w0, w1       -> cmn w0, w1        ; CMN is an alias of ADDS.
      cbz w8, .LBB_2       -> b.eq .LBB0_2      ; single def/use of w8 removed.

      add w8, w0, w1       -> adds w8, w0, w1   ; w8 has multiple uses.
      cbz w8, .LBB1_2      -> b.eq .LBB1_2

      sub w8, w0, w1       -> subs w8, w0, w1   ; w8 has multiple uses.
      tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2

    In looking at all current sub-target machine descriptions, this transformation appears to be either positive or neutral.

    Differential Revision: https://reviews.llvm.org/D34220.
    llvm-svn: 306144
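The choice between the first two examples depends on whether the ADD result has other uses. A minimal C++ sketch of that decision, working on assembly text purely for illustration; `flagSettingAddSketch` and its use-count parameter are hypothetical and do not reflect how the actual pass is written.

```cpp
#include <cassert>
#include <string>

// Picks the flag-setting replacement for "add Dst, Lhs, Rhs" when the only
// reason to keep Dst may be a following cbz/cbnz on it.
// NumUsesOfDst counts every use of Dst, including the branch itself.
std::string flagSettingAddSketch(const std::string &Dst,
                                 const std::string &Lhs,
                                 const std::string &Rhs,
                                 unsigned NumUsesOfDst) {
  assert(NumUsesOfDst >= 1 && "the compare-and-branch itself is a use");
  if (NumUsesOfDst == 1)
    return "cmn " + Lhs + ", " + Rhs; // result unused: drop the def of Dst
  return "adds " + Dst + ", " + Lhs + ", " + Rhs; // keep Dst for other uses
}
```

In both cases the following `cbz` would become `b.eq`, as in the examples above; only the single-use case frees the destination register.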
* [AArch64][Falkor] Refine sched details for LSLfast/ASRfast. | Geoff Berry | 2017-05-23 | 1 | -1/+1
    llvm-svn: 303682
* Re-commit r301040 "X86: Don't emit zero-byte functions on Windows" | Hans Wennborg | 2017-04-21 | 1 | -1/+1
    In addition to the original commit, tighten the condition for when to pad empty functions to COFF Windows. This avoids running into problems when targeting e.g. Win32 AMDGPU, which caused test failures when this was committed initially.

    llvm-svn: 301047
* Revert r301040 "X86: Don't emit zero-byte functions on Windows" | Hans Wennborg | 2017-04-21 | 1 | -1/+1
    This broke almost all bots. Reverting while fixing.

    llvm-svn: 301041
* X86: Don't emit zero-byte functions on Windows | Hans Wennborg | 2017-04-21 | 1 | -1/+1
    Empty functions can lead to duplicate entries in the Guard CF Function Table of a binary due to multiple functions sharing the same RVA, causing the kernel to refuse to load that binary.

    We had a terrific bug due to this in Chromium.

    It turns out we were already doing this for Mach-O in certain situations. This patch expands the code for that in AsmPrinter::EmitFunctionBody() and renames TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it seems it was used for not just Mach-O anyway.

    Differential Revision: https://reviews.llvm.org/D32330
    llvm-svn: 301040
* [AArch64] Refine Falkor Machine Model - Part 3 | Balaram Makam | 2017-04-08 | 1 | -1/+3
    This concludes the refinements to Falkor Machine Model. It includes SchedPredicates for immediate zero and LSL Fast. Forwarding logic is also modeled for vector multiply and accumulate only.

    llvm-svn: 299810
* [Outliner] Add outliner for AArch64 | Jessica Paquette | 2017-03-17 | 1 | -0/+34
    This commit adds the necessary target hooks for outlining in AArch64. It also refactors the switch statement used in `getMemOpBaseRegImmOfsWidth` into a more general function, `getMemOpInfo`. This allows the outliner to share that code without copying and pasting it.

    The AArch64 outliner can be run using -mllvm -enable-machine-outliner, as with the X86-64 outliner.

    The test for this pass verifies that the outliner does, in fact, outline functions, fixes up the stack accesses properly, and can correctly generate a tail call. In the future, this test should be replaced with a MIR test, so that we can properly test immediate offset overflows in fixed-up instructions.

    llvm-svn: 298162
* TargetInstrInfo: Provide default implementation of isTailCall(). | Matthias Braun | 2017-03-16 | 1 | -2/+0
    In fact, this default implementation should be the only implementation; keep it virtual for now to accommodate targets that don't model flags correctly.

    Differential Revision: https://reviews.llvm.org/D30747
    llvm-svn: 297980
* [CodeGen] Move MacroFusion to the target | Evandro Menezes | 2017-02-01 | 1 | -3/+0
    This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target.

    In AArch64, it also expands the fusion to all instruction pairs in a scheduling block, beyond just among the predecessors of the branch at the end.

    Differential revision: https://reviews.llvm.org/D28489
    llvm-svn: 293737
* [XRay][AArch64] More staging for tail call support in XRay on AArch64 - in LLVM | Serge Rogatch | 2017-01-25 | 1 | -0/+2
    Summary: This patch prepares more for tail call support in XRay. Until the logging part supports tail calls, this is just staging, so it seems LLVM part is mostly ready with this patch.

    Related: https://reviews.llvm.org/D28948 (compiler-rt)

    Reviewers: dberris, rengolin
    Reviewed By: dberris
    Subscribers: llvm-commits, iid_iunknown, aemerson
    Differential Revision: https://reviews.llvm.org/D28947
    llvm-svn: 293080
* [AArch64] Fold some filled/spilled subreg COPYs | Geoff Berry | 2017-01-05 | 1 | -0/+4
    Summary: Extend AArch64 foldMemoryOperandImpl() to handle folding spills of subreg COPYs with read-undef defs like:

      %vreg0:sub_32<def,read-undef> = COPY %WZR; GPR64:%vreg0

    by widening the spilled physical source reg and generating:

      STRXui %XZR <fi#0>

    as well as folding fills of similar COPYs like:

      %vreg0:sub_32<def,read-undef> = COPY %vreg1; GPR64:%vreg0, GPR32:%vreg1

    by generating:

      %vreg0:sub_32<def,read-undef> = LDRWui <fi#0>

    Reviewers: MatzeB, qcolombet
    Subscribers: aemerson, rengolin, mcrosier, llvm-commits
    Differential Revision: https://reviews.llvm.org/D27425
    llvm-svn: 291180
* MachineScheduler: Export function to construct "default" scheduler. | Matthias Braun | 2016-11-28 | 1 | -6/+2
    This makes the createGenericSchedLive() function that constructs the default scheduler available for the public API. This should help when you want to get a scheduler and the default list of DAG mutations.

    This also shrinks the list of default DAG mutations: {Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer added by default. Targets can easily add them if they need them. It also makes it easier for targets to add alternative/custom macrofusion or clustering mutations while staying with the default createGenericSchedLive(). It also saves the callback back and forth in TargetInstrInfo::enableClusterLoads()/enableClusterStores().

    Differential Revision: https://reviews.llvm.org/D26986
    llvm-svn: 288057
* AArch64: Move remaining target specific BranchRelaxation bits to TII | Matt Arsenault | 2016-10-06 | 1 | -4/+5
    llvm-svn: 283458
* Finish renaming remaining analyzeBranch functions | Matt Arsenault | 2016-09-14 | 1 | -2/+2
    llvm-svn: 281535
* Make analyzeBranch family of instruction names consistent | Matt Arsenault | 2016-09-14 | 1 | -1/+1
    analyzeBranch was renamed to use lowercase first; rename the related set to match.

    llvm-svn: 281506
* AArch64: Use TTI branch functions in branch relaxation | Matt Arsenault | 2016-09-14 | 1 | -2/+4
    The main change is to return the code size from InsertBranch/RemoveBranch.

    Patch mostly by Tim Northover

    llvm-svn: 281505
* [AArch64] Re-factor code shared by AArch64LoadStoreOpt and AArch64InstrInfo. | Geoff Berry | 2016-08-12 | 1 | -0/+32
    This re-factoring could cause the following slight changes in generated code, though none were observed during testing:

    - MachineScheduler could decide not to cluster some loads/stores if there are other load/stores with non-pairable opcodes that have the same base register and offset as a pairable set of load/stores. One case of different MachineScheduler pairing did show up in my testing, but it wasn't due to this issue, but due to BaseMemOpClusterMutation::clusterNeighboringMemOps() being unstable w.r.t. the order it considers memory operations. See PR28942.
    - The ImplicitNullChecks optimization could be done for more load/store opcodes. This optimization isn't done for C/C++ code, so it didn't show up in my testing.

    Reviewers: mcrosier, t.p.northover
    Subscribers: aemerson, rengolin, mcrosier, llvm-commits
    Differential Revision: https://reviews.llvm.org/D23365
    llvm-svn: 278515
* AArch64: BranchRelaxtion cleanups | Matt Arsenault | 2016-08-02 | 1 | -0/+6
    Move some logic into TII.

    llvm-svn: 277430
* TargetInstrInfo: add virtual function getInstSizeInBytes | Sjoerd Meijer | 2016-07-29 | 1 | -1/+1
    This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of subclasses already implement.

    Differential Revision: https://reviews.llvm.org/D22885
    llvm-svn: 277126
* TargetInstrInfo: rename GetInstSizeInBytes to getInstSizeInBytes. NFC | Sjoerd Meijer | 2016-07-28 | 1 | -1/+1
    Differential Revision: https://reviews.llvm.org/D22925
    llvm-svn: 276997
* [AArch64] Mark various *Info classes as 'final'. NFC. | Ahmed Bougacha | 2016-07-27 | 1 | -1/+1
    llvm-svn: 276874
* Rename AnalyzeBranch* to analyzeBranch*. | Jacques Pienaar | 2016-07-15 | 1 | -1/+1
    Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.

    Reviewers: tstellarAMD, mcrosier
    Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai
    Differential Revision: https://reviews.llvm.org/D22409
    llvm-svn: 275564
* [CodeGen] Refactor MachineMemOperand::Flags's target-specific flags. | Justin Lebar | 2016-07-14 | 1 | -6/+0
    Summary: Make the target-specific flags in MachineMemOperand::Flags real, bona fide enum values. This simplifies users, prevents various constants from going out of sync, and avoids the false sense of security provided by declaring static members in classes and then forgetting to define them inside of cpp files.

    Reviewers: MatzeB
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D22372
    llvm-svn: 275451
* CodeGen: Use MachineInstr& in TargetInstrInfo, NFC | Duncan P. N. Exon Smith | 2016-06-30 | 1 | -32/+32
    This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement.

    Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary.

    This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader.

    As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753.

    Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on.

    llvm-svn: 274189
* Pass DebugLoc and SDLoc by const ref. | Benjamin Kramer | 2016-06-12 | 1 | -9/+11
    This used to be free; copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended.

    llvm-svn: 272512
* [foldMemoryOperand()] Pass LiveIntervals to enable liveness check. | Jonas Paulsson | 2016-05-10 | 1 | -1/+2
    SystemZ (and probably other targets as well) can fold a memory operand by changing the opcode into a new instruction that as a side-effect also clobbers the CC-reg.

    In order to do this, liveness of that reg must first be checked. When LIS is passed, getRegUnit() can be called on it and the right LiveRange is computed on demand.

    Reviewed by Matthias Braun.
    http://reviews.llvm.org/D19861
    llvm-svn: 269026
* Cleanup comments. NFC. | Chad Rosier | 2016-05-02 | 1 | -1/+1
    llvm-svn: 268236
* [MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098) | Gerolf Hoflehner | 2016-04-24 | 1 | -0/+5
    The original patch caused crashes because it could dereference a null pointer for SelectionDAGTargetInfo for targets that do not define it.

    Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are:

    - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64.
    - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The target decides whether the machine combiner should optimize for throughput or latency.
    - Support for fmadd, f(n)msub, fmla, fmls patterns
    - On by default at O3 ffast-math

    llvm-svn: 267328
* Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64 | Daniel Sanders | 2016-04-22 | 1 | -5/+0
    It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others.

    llvm-svn: 267127
* [MachineCombiner] Support for floating-point FMA on ARM64 | Gerolf Hoflehner | 2016-04-22 | 1 | -0/+5
    Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are:

    - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64.
    - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The target decides whether the machine combiner should optimize for throughput or latency.
    - Support for fmadd, f(n)msub, fmla, fmls patterns
    - On by default at O3 ffast-math

    llvm-svn: 267098
* [AArch64][CodeGen] Fix of PR27158: incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr | Evgeny Astigeevich | 2016-04-21 | 1 | -2/+2
    AArch64InstrInfo::optimizeCompareInstr has bug PR27158, which causes generation of incorrect code. A compare instruction is substituted with another instruction which does not produce the same flags as the original compare instruction.

    This patch contains:
    1. Fix of the bug.
    2. A regression test in MIR.
    3. A new test to check that SUBS is replaced by SUB.

    Differential Revision: http://reviews.llvm.org/D18838
    llvm-svn: 266969
* [MachineScheduler]Add support for store clustering | Jun Bum Lim | 2016-04-15 | 1 | -1/+3
    Perform store clustering just like load clustering. This change adds StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now.

    This change also adds support for unscaled stores, which were not handled in getMemOpBaseRegImmOfs().

    llvm-svn: 266437
* [AArch64][CodeGen] NFC refactor AArch64InstrInfo::optimizeCompareInstr to prepare it for fixing a bug in it | Evgeny Astigeevich | 2016-04-06 | 1 | -0/+2
    AArch64InstrInfo::optimizeCompareInstr has a bug which causes generation of incorrect code (PR#27158). The patch refactors the function to simplify reviewing the fix of the bug.

    1. Function name 'modifiesConditionCode' is changed to 'areCFlagsAccessedBetweenInstrs' to reflect that the function can check modifying accesses, reading accesses or both.
    2. Function 'AArch64InstrInfo::optimizeCompareInstr'
       - Documented the function
       - Cmp_NZCV is renamed to DeadNZCVIdx to reflect that it is an operand index of dead NZCV
       - The code for the case of substituting CmpInstr is put into separate functions, the main one being 'substituteCmpInstr'.

    Differential Revision: http://reviews.llvm.org/D18609
    llvm-svn: 265531
* [AArch64] Enable more load clustering in the MI Scheduler. | Chad Rosier | 2016-03-18 | 1 | -0/+3
    This patch adds unscaled loads and sign-extend loads to the TII getMemOpBaseRegImmOfs API, which is used to control clustering in the MI scheduler. This is done to create more opportunities for load pairing. I've also added the scaled LDRSWui instruction, which was missing from the scaled instructions. Finally, I've added support in shouldClusterLoads for clustering adjacent sext and zext loads that too can be paired by the load/store optimizer.

    Differential Revision: http://reviews.llvm.org/D18048
    llvm-svn: 263819
* [AArch64] Move helper functions into TII, so they can be reused elsewhere. NFC. | Chad Rosier | 2016-03-09 | 1 | -0/+6
    llvm-svn: 263032
* [AArch64] Minor cleanup/remove redundant code. NFC. | Chad Rosier | 2016-03-09 | 1 | -1/+1
    llvm-svn: 263024
* [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC. | Chad Rosier | 2016-03-09 | 1 | -1/+1
    http://reviews.llvm.org/D17967
    llvm-svn: 263021
* [AArch64 MachineCombine] Enhance/Add support for general reassociation to reduce the critical path | Haicheng Wu | 2016-01-07 | 1 | -1/+3
    Allow fadd/fmul to be reassociated in aarch64.

    llvm-svn: 257024
* replace MachineCombinerPattern namespace and enum with enum class; NFCI | Sanjay Patel | 2015-11-05 | 1 | -2/+2
    Also, remove an enum hack where enum values were used as indexes into an array. We may want to make this a real class to allow pattern-based queries/customization (D13417).

    llvm-svn: 252196
* MIR Serialization: Serialize the operand's bit mask target flags. | Alex Lorenz | 2015-08-18 | 1 | -0/+8
    This commit adds support for bit mask target flag serialization to the MIR printer and the MIR parser. It also adds support for the machine operand's target flag serialization to the AArch64 target.

    Reviewers: Duncan P. N. Exon Smith

    llvm-svn: 245383