path: root/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
* Revert "Merge memtag instructions with adjacent stack slots."Evgenii Stepanov2020-01-081-20/+4
| | | | | | | | | | | | *** Bad machine code: Tied use must be a register *** - function: stg_alloca17 - basic block: %bb.0 entry (0x20076710580) - instruction: early-clobber %0:gpr64common, early-clobber %1:gpr64sp = STGloop 272, %stack.0.a :: (store 272 into %ir.a, align 16) - operand 3: %stack.0.a http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/21481/steps/test-check-all/logs/stdio This reverts commit b675a7628ce6a21b1e4a71c079a67badfb8b073d.
* Merge memtag instructions with adjacent stack slots. (Evgenii Stepanov, 2020-01-08; 1 file, -4/+20)

  Summary: Detect a run of memory tagging instructions for adjacent stack frame slots, and replace them with a shorter instruction sequence:
  * replace STG + STG with ST2G
  * replace STGloop + STGloop with STGloop

  This code needs to run when stack slot offsets are already known, but before FrameIndex operands in STG instructions are eliminated; that's the reason for the new hook in PrologueEpilogue.

  This change modifies STGloop and STZGloop pseudos to take the size as an immediate integer operand, and the base address as an FI operand when possible. This is needed to simplify recognizing an STGloop instruction as operating on a stack slot post-regalloc.

  This improves memtag code size by ~0.25%, and it looks like an additional ~0.1% is possible by rearranging the stack frame such that consecutive STG instructions reference adjacent slots (patch pending).

  Reviewers: pcc, ostannard
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70286

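  A minimal sketch of the STG + STG rewrite described above, for two adjacent 16-byte slots (register and offsets illustrative, not taken from the patch):

      stg  x9, [sp, #16]    // tag 16-byte slot A
      stg  x9, [sp, #32]    // tag the adjacent slot B

  becomes

      st2g x9, [sp, #16]    // a single ST2G tags both granules
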
* Lower TAGPstack with negative offset to SUBG. (Evgenii Stepanov, 2020-01-06; 1 file, -2/+4)

  Summary: This never really occurs in the current codegen, so only a MIR test is possible.

  Reviewers: ostannard, pcc
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D72123

* [AArch64ExpandPseudos] Preserve renamable state when expanding MOVi64 & co. (Florian Hahn, 2019-11-12; 1 file, -2/+6)

  If the MOVi operand was renamable, the operands of the expanded instructions are also renamable.

  Reviewers: thegameg, samparker, zatrazz
  Reviewed By: thegameg
  Differential Revision: https://reviews.llvm.org/D70061

* [AArch64] Stackframe accesses to SVE objects. (Sander de Smalen, 2019-10-14; 1 file, -1/+1)

  Materialize accesses to SVE frame objects from SP or FP, whichever is available and beneficial. This patch still assumes the objects are pre-allocated. The automatic layout of SVE objects within the stackframe will be added in a separate patch.

  Reviewers: greened, cameron.mcinally, efriedma, rengolin, thegameg, rovka
  Reviewed By: cameron.mcinally
  Differential Revision: https://reviews.llvm.org/D67749
  llvm-svn: 374772

* AArch64: fix EXPENSIVE_CHECKS for arm64_32. (Tim Northover, 2019-09-13; 1 file, -1/+1)

  For some reason I'd decided to mark the end-result of a GOT load as dead. It's clearly not (necessarily).

  llvm-svn: 371883

* AArch64: support arm64_32, an ILP32 slice for watchOS. (Tim Northover, 2019-09-12; 1 file, -4/+18)

  This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM. FastISel is mostly disabled for now since it would generate incorrect code for ILP32.

  llvm-svn: 371722

* [aarch64] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM (Daniel Sanders, 2019-08-12; 1 file, -15/+15)

  Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible).

  Manual fixups in:
  - AArch64InstrInfo.cpp: genFusedMultiply() now takes a Register* instead of unsigned*
  - AArch64LoadStoreOptimizer.cpp: ternary operator was ambiguous between Register/MCRegister. Settled on Register.

  Depends on D65919

  Reviewers: aemerson
  Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
  Tags: #llvm
  Differential Revision for full review was: https://reviews.llvm.org/D65962
  llvm-svn: 368628

* [AArch64] NFC: Add generic StackOffset to describe scalable offsets. (Sander de Smalen, 2019-08-06; 1 file, -3/+4)

  To support spilling/filling of scalable vectors we need a more generic representation of a stack offset than simply 'int'. For this we introduce the StackOffset struct, which comprises multiple offsets sized by their respective MVTs. Byte-offsets will thus be a simple tuple such as { offset, MVT::i8 }. Adding two byte-offsets will result in a byte offset { offsetA + offsetB, MVT::i8 }. When two offsets have different types, we can canonicalise them to use the same MVT, as long as their runtime sizes are guaranteed to have the same size-ratio as they would have at compile-time.

  When we have both scalable- and fixed-size objects on the stack, we can create an offset that is:

    ({ offset_fixed, MVT::i8 } + { offset_scalable, MVT::nxv1i8 })

  The struct also contains a getForFrameOffset() method that is specific to AArch64 and decomposes the frame-offset to be used directly in instructions that operate on the stack or index into the stack.

  Note: This patch adds StackOffset as an AArch64-only concept, but we would like to make this a generic concept/struct that is supported by all interfaces that take or return stack offsets (currently as 'int'). Since that would be a bigger change that is currently pending on D32530 landing, we thought it makes sense to first show/prove the concept in the AArch64 target before proposing to roll this out further.

  Reviewers: thegameg, rovka, t.p.northover, efriedma, greened
  Reviewed By: rovka, greened
  Differential Revision: https://reviews.llvm.org/D61435
  llvm-svn: 368024

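  The concept, as a minimal C++ sketch (member names illustrative, not the exact upstream API; the real struct carries MVT-typed parts as described above):

      struct StackOffset {
        int64_t Bytes = 0;         // fixed part:    { Bytes, MVT::i8 }
        int64_t ScalableBytes = 0; // scalable part: { ScalableBytes, MVT::nxv1i8 }

        // Same-typed parts add independently:
        // { a, i8 } + { b, i8 } == { a + b, i8 }.
        StackOffset &operator+=(const StackOffset &RHS) {
          Bytes += RHS.Bytes;
          ScalableBytes += RHS.ScalableBytes;
          return *this;
        }
      };
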
* AArch64: Add a tagged-globals backend feature. (Peter Collingbourne, 2019-07-31; 1 file, -0/+17)

  This feature instructs the backend to allow locally defined global variable addresses to contain a pointer tag in bits 56-63 that will be ignored by the hardware (i.e. TBI), but may be used by an instrumentation pass such as HWASAN. It works by adding a MOVK instruction to the regular ADRP/ADD sequence that sets bits 48-63 to the corresponding bits of the global, with the linker bounds check disabled on the ADRP instruction to prevent the tag from causing a link failure.

  This implementation of the feature omits the MOVK when loading from or storing to a global, which is sufficient for TBI. If the same approach is extended to MTE, assuming that 0 is not configured as a catch-all tag, we will most likely also need the MOVK in this case in order to avoid a tag mismatch.

  Differential Revision: https://reviews.llvm.org/D65364
  llvm-svn: 367475

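  The resulting materialization sequence, roughly (exact relocation specifiers elided; #tag is a placeholder for the linker-resolved bits 48-63 of the global):

      adrp x0, global            // bits 12-47, linker bounds check disabled
      movk x0, #tag, lsl #48     // bits 48-63: the pointer tag
      add  x0, x0, :lo12:global  // bits 0-11
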
* Basic codegen for MTE stack tagging. (Evgeniy Stepanov, 2019-07-17; 1 file, -0/+102)

  Implement IR intrinsics for stack tagging. Generated code is very unoptimized for now. Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp, are used to implement a tagged stack frame pointer in a virtual register.

  Differential Revision: https://reviews.llvm.org/D64172
  llvm-svn: 366360

* [AArch64] Allow -mattr=tpidr-el[1|2|3] (Oliver Stannard, 2019-03-21; 1 file, -0/+6)

  Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base register, rather than the default TPIDR_EL0.

  Patch by Philip Derrin!

  Differential revision: https://reviews.llvm.org/D54685
  llvm-svn: 356657

* [AArch64] Refactor floating point materialization. NFC (Adhemerval Zanella, 2019-03-18; 1 file, -463/+41)

  This splits the logic of actual instruction emission away from the logic that figures out the appropriate sequence in AArch64ExpandPseudo::expandMOVImm. The new function AArch64_IMM::expandMOVImm, which returns the list of instructions to materialize the immediate constant, is implemented in a separate unit because it will be used in a subsequent patch to optimize floating point materialization.

  Reviewers: efriedma
  Differential Revision: https://reviews.llvm.org/D58915
  llvm-svn: 356387

* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license (Chandler Carruth, 2019-01-19; 1 file, -4/+3)

  We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.

  Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.

  llvm-svn: 351636

* [AArch64] Add Tiny Code Model for AArch64 (David Green, 2018-08-22; 1 file, -25/+44)

  This adds the plumbing for the Tiny code model for the AArch64 backend. Instead of loading addresses through the normal ADRP;ADD pair used in the Small model, it uses a single ADR. The 21-bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other.

  This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible.

  Differential Revision: https://reviews.llvm.org/D49673
  llvm-svn: 340397

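  Side by side (assuming the symbol is in range for each model):

      // Small model: ADRP + ADD, +/-4GB range
      adrp x0, symbol
      add  x0, x0, :lo12:symbol

      // Tiny model: a single ADR, +/-1MB range
      adr  x0, symbol
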
* [AArch64] Improve orr+movk sequences for MOVi64imm. (Eli Friedman, 2018-05-24; 1 file, -115/+96)

  The existing code has three different ways to try to lower a 64-bit immediate to the sequence ORR+MOVK. The result is messy: it misses some possible sequences, and the order of the checks means we sometimes emit two MOVKs when we only need one. Instead, just use a simple loop to try all possible two-instruction ORR+MOVK sequences.

  Differential Revision: https://reviews.llvm.org/D47176
  llvm-svn: 333218

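  A C++ sketch of such an exhaustive search, not the exact upstream loop (emitORR/emitMOVK are hypothetical stand-ins for the pass's BuildMI calls; AArch64_AM::isLogicalImmediate is the real encodability check from AArch64AddressingModes.h):

      // Try every 16-bit chunk position: if replacing that chunk yields a
      // valid logical (ORR-encodable) immediate, emit ORR and then patch
      // the original chunk back in with a single MOVK.
      for (unsigned Shift = 0; Shift != 64; Shift += 16) {
        uint64_t Chunk = (Imm >> Shift) & 0xFFFF;
        for (uint64_t Fill : {UINT64_C(0x0000), UINT64_C(0xFFFF)}) {
          uint64_t OrrImm = (Imm & ~(0xFFFFULL << Shift)) | (Fill << Shift);
          if (OrrImm != Imm && AArch64_AM::isLogicalImmediate(OrrImm, 64)) {
            emitORR(OrrImm);        // hypothetical helper
            emitMOVK(Chunk, Shift); // hypothetical helper
            return true;
          }
        }
      }
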
* Remove \brief commands from doxygen comments. (Adrian Prantl, 2018-05-01; 1 file, -14/+14)

  We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers in our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all.

  Patch produced by:

    for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

  Differential Revision: https://reviews.llvm.org/D46290
  llvm-svn: 331272

* [AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str (Martin Storsjo, 2018-03-12; 1 file, -0/+10)

  Differential Revision: https://reviews.llvm.org/D44355
  llvm-svn: 327316

* Fix a bunch more layering of CodeGen headers that are in Target (David Blaikie, 2017-11-17; 1 file, -1/+1)

  All these headers already depend on CodeGen headers, so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around).

  llvm-svn: 318490

* Insert IMPLICIT_DEFS for undef uses in tail merging (Matthias Braun, 2017-09-06; 1 file, -12/+10)

  Tail merging can convert an undef use into a normal one when creating a common tail. Doing so can make the register live out from a block which previously contained the undef use. To keep the liveness up-to-date, insert IMPLICIT_DEFs in such blocks when necessary.

  To enable this patch, the computeLiveIns() function, which used to compute live-ins for a block and set them immediately, is split into new functions:
  - computeLiveIns() just computes the live-ins in a LivePhysRegs set.
  - addLiveIns() applies the live-ins to a block's live-in list.
  - computeAndAddLiveIns() is a convenience function combining the other two functions and behaving like computeLiveIns() before this patch.

  Based on a patch by Krzysztof Parzyszek <kparzysz@codeaurora.org>

  Differential Revision: https://reviews.llvm.org/D37034
  llvm-svn: 312668

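  A usage sketch of the split API from LivePhysRegs.h (TRI is the TargetRegisterInfo, MBB a MachineBasicBlock; details of the surrounding pass omitted):

      LivePhysRegs LiveRegs(TRI);
      computeLiveIns(LiveRegs, MBB);   // (1) compute into the set only
      addLiveIns(MBB, LiveRegs);       // (2) apply to MBB's live-in list
      // ...or both steps at once, matching the old behaviour:
      computeAndAddLiveIns(LiveRegs, MBB);
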
* Revert "[AArch64] Simplify AES*Tied pseudo expansion (NFC)."Tim Northover2017-08-031-3/+10
| | | | | | | | | | This reverts commit r309821. My suggestion was wrong because it left the MachineOperands tied which confused the verifier. Since there's no easy way to untie operands, the original BuildMI solution is probably best. llvm-svn: 309962
* [AArch64] Simplify AES*Tied pseudo expansion (NFC). (Florian Hahn, 2017-08-02; 1 file, -10/+3)

  Summary: Suggested by @t.p.northover in https://bugs.llvm.org/show_bug.cgi?id=34015.

  Reviewers: javed.absar, t.p.northover, rengolin
  Reviewed By: t.p.northover
  Subscribers: aemerson, kristof.beyls, llvm-commits, t.p.northover
  Differential Revision: https://reviews.llvm.org/D36223
  llvm-svn: 309821

* [AArch64] Tie source and destination operands for AESMC/AESIMC. (Florian Hahn, 2017-07-29; 1 file, -0/+12)

  Summary: Most CPUs implementing AES fusion require instruction pairs of the form

    AESE  Vn, _
    AESMC Vn, Vn

  and

    AESD   Vn, _
    AESIMC Vn, Vn

  The constraint is added to AES(I)MC instructions which use the result of an AES(E|D) instruction by using AES(I)MCTrr pseudo instructions, which constrain the source and destination registers to be the same.

  A nice side effect of this change is that now all possible pairs are scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll test case.

  I had to update aes_load_store. The version I added initially was very reduced and, with the new constraint, AESE/AESMC could not be scheduled back-to-back. I updated the test to be more realistic and still expose the same scheduling problem as the initial test case.

  Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga
  Reviewed By: t.p.northover, evandro
  Subscribers: aemerson, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D35299
  llvm-svn: 309495

* [AArch64] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). (Eugene Zelenko, 2017-07-25; 1 file, -10/+30)

  llvm-svn: 309062

* Sort the remaining #include lines in include/... and lib/.... (Chandler Carruth, 2017-06-06; 1 file, -1/+1)

  I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.

  I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch.

  This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.

  Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).

  llvm-svn: 304787

* AArch64: Fix cmpxchg O0 expansion (Matthias Braun, 2017-05-26; 1 file, -58/+61)

  - Rewrite livein calculation to use the computeLiveIns() helper function. This is slightly less efficient but easier to reason about and doesn't unnecessarily add pristine and reserved registers[1].
  - Zero the status register at the beginning of the loop to make sure it has a defined value.
  - Remove kill flags of values that need to stay alive throughout the loop.

  [1] An upcoming commit of mine will tighten the MachineVerifier to catch these.

  llvm-svn: 304048

* LivePhysRegs: Rework constructor + documentation; NFC (Matthias Braun, 2017-05-26; 1 file, -2/+2)

  - Take a reference instead of a pointer to a TRI that cannot be nullptr.
  - Improve documentation comments.

  llvm-svn: 304038

* AArch64: lower "fence singlethread" to a pure compiler barrier.Tim Northover2017-04-201-0/+1
| | | | | | | | Single-threaded fences aren't required to provide any synchronization with other processing elements so there's no need for a DMB. They should still be a barrier for compiler optimizations though. llvm-svn: 300905
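  The IR in question, in the syntax of this commit's vintage (later releases spell the scope as syncscope("singlethread")):

      fence singlethread seq_cst   ; compiler barrier only, no DMB emitted
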
* [AArch64][Fuchsia] Allow -mcmodel=kernel for --target=aarch64-fuchsia (Petr Hosek, 2017-04-04; 1 file, -1/+6)

  This mode is just like -mcmodel=small except that it moves the thread pointer from TPIDR_EL0 to TPIDR_EL1.

  Patch by Roland McGrath.

  Differential Revision: https://reviews.llvm.org/D31624
  llvm-svn: 299462

* [AArch64] Mark mrs of TPIDR_EL0 (thread pointer) as not having side effects. (Chad Rosier, 2017-03-27; 1 file, -0/+8)

  Among other things, this allows Machine LICM to hoist a costly 'mrs' instruction from within a loop.

  Differential Revision: http://reviews.llvm.org/D31151
  llvm-svn: 298851

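  The instruction in question simply reads the EL0 thread-pointer register:

      mrs x0, TPIDR_EL0   // no side effects, so LICM may hoist it out of loops
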
* [AArch64] Generate literals by the little end (Evandro Menezes, 2017-01-18; 1 file, -6/+6)

  ARM seems to prefer that long literals be formed from their little end, in order to promote the fusion of the instruction pairs MOV/MOVK and MOVK/MOVK on Cortex-A57 and others (v. "Cortex A57 Software Optimisation Guide", section 4.14).

  Differential revision: https://reviews.llvm.org/D28697
  llvm-svn: 292422

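  For example, materializing a 64-bit literal little end first (value illustrative):

      movz x0, #0xdef0              // bits 0-15
      movk x0, #0x9abc, lsl #16     // bits 16-31: MOV/MOVK pair may fuse
      movk x0, #0x5678, lsl #32     // MOVK/MOVK pair may fuse
      movk x0, #0x1234, lsl #48
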
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFC (Diana Picus, 2017-01-13; 1 file, -21/+19)

  Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just one operand. See https://reviews.llvm.org/D28057 for the whole discussion.

  Differential Revision: https://reviews.llvm.org/D28556
  llvm-svn: 291891

* AArch64: Enable post-ra liveness updates (Matthias Braun, 2016-12-16; 1 file, -1/+6)

  Differential Revision: https://reviews.llvm.org/D27559
  llvm-svn: 290014

* AArch64: fix 128-bit cmpxchg at -O0 (again, again). (Tim Northover, 2016-12-01; 1 file, -6/+14)

  This time the issue is fortunately just a simple mistake rather than a horrible design spectre. I thought SUBS/SBCS provided sufficient NZCV flags for comparing two 64-bit values, but they don't.

  The fix is slightly clunkier in AArch64 because we can't use conditional execution to emit a pair of CMPs. Traditionally an "icmp ne i128" would map to an EOR/EOR/ORR/CBNZ, but that uses more registers so it's easier to go with a CSET/CINC/CBNZ combination. Slightly less efficient, but this is -O0 anyway.

  Thanks to Anton Korobeynikov for pointing out the issue.

  llvm-svn: 288418

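  The CSET/CINC/CBNZ shape for "icmp ne i128", roughly (register and label names illustrative):

      cmp  x0, x2        // compare low halves
      cset w8, ne        // w8 = 1 if the low halves differ
      cmp  x1, x3        // compare high halves
      cinc w8, w8, ne    // +1 if the high halves differ
      cbnz w8, .Lne      // nonzero => the 128-bit values differ
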
* Use StringRef in Pass/PassManager APIs (NFC) (Mehdi Amini, 2016-10-01; 1 file, -3/+1)

  llvm-svn: 283004

* [AArch64] Register AArch64LoadStoreOptimizer so it can be run by llc -run-pass. NFCI. (Geoff Berry, 2016-07-20; 1 file, -4/+0)

  llvm-svn: 276193

* Move helper classes into anonymous namespaces. NFC. (Benjamin Kramer, 2016-05-15; 1 file, -1/+1)

  llvm-svn: 269591

* livePhysRegs: Pass MBB by reference in addLive{Ins|Outs}(); NFC (Matthias Braun, 2016-05-03; 1 file, -2/+2)

  The block must not be nullptr for the addLiveIns()/addLiveOuts() functions.

  llvm-svn: 268340

* LivePhysRegs: Automatically determine presence of pristine regs. (Matthias Braun, 2016-05-03; 1 file, -2/+2)

  Remove the AddPristinesAndCSRs parameters from addLiveIns()/addLiveOuts().

  We need to respect pristine registers after prologue/epilogue insertion. Seeing that we got this wrong in at least two commits already, we should rather pay the small price to query MachineFrameInfo for it.

  There are three cases that did not set AddPristineAndCSRs to true even after register allocation:
  - ExecutionDepsFix: live-out registers are used as a hint that the register is used soon. This is not true for pristine registers, so use the new addLiveOutsNoPristines() to maintain this behaviour.
  - SystemZShortenInst: not setting AddPristineAndCSRs to true looks like a bug; it should do the right thing automatically now.
  - StackMapLivenessAnalysis: not adding pristine registers looks like a bug to me. Added a FIXME comment, but maintain the current behaviour as a change may need to be coordinated with GC runtimes.

  llvm-svn: 268336

* [AArch64] Set AddPristinesAndCSRs to expandCMP_SWAP LivePhysRegs. (Ahmed Bougacha, 2016-04-27; 1 file, -2/+2)

  We run after PEI. Found via inspection; no obvious testcase.

  Follow-up to r266339.

  llvm-svn: 267780

* [AArch64] Set correct successors in CMPXCHG pseudo expansion. (Ahmed Bougacha, 2016-04-27; 1 file, -2/+4)

  transferSuccessors() would make LoadCmpBB a successor of DoneBB, whereas it should be a successor of the original MBB.

  Follow-up to r266339. Unfortunately, it's tricky to catch this in the verifier.

  llvm-svn: 267779

* AArch64: expand cmpxchg after regalloc at -O0. (Tim Northover, 2016-04-14; 1 file, -3/+201)

  FastRegAlloc works only at the basic-block level and spills all live-out registers. Unfortunately, for a stack-based cmpxchg near the spill slots, this can perpetually clear the exclusive monitor, which means the cmpxchg will never succeed.

  I believe the only way to handle this within LLVM is by expanding the loop post-regalloc. We don't want this in general because it severely limits the optimisations that can be done, so we limit this to -O0 compilations.

  It's an ugly hack, and about the one good point in the whole mess is that we can treat all cmpxchg operations in the most naive way possible (seq_cst, no clrex faff) without affecting correctness.

  Should fix PR25526.

  llvm-svn: 266339

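  The expanded loop has the classic load-exclusive/store-exclusive shape; a sketch for a 32-bit seq_cst cmpxchg (registers and labels illustrative):

      .Lretry:
        ldaxr w8, [x0]        // load-acquire exclusive the current value
        cmp   w8, w1          // equal to the expected value?
        b.ne  .Ldone          // no: fail
        stlxr w9, w2, [x0]    // try to store-release the new value
        cbnz  w9, .Lretry     // store lost exclusivity: retry
      .Ldone:

  A spill or reload between the LDAXR and the STLXR clears the monitor, making the STLXR fail every time; hence the post-regalloc expansion at -O0.
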
* AArch64: avoid clobbering SP for dead MOVimm pseudos. (Tim Northover, 2016-04-01; 1 file, -1/+8)

  We were producing ORR, which actually defines a GPR32sp rather than a GPR32.

  Should fix PR23209.

  llvm-svn: 265198

* [AArch64] Register (existing) AArch64ExpandPseudo pass with LLVM pass manager. (Chad Rosier, 2015-08-05; 1 file, -2/+13)

  Summary: Among other things, this allows -print-after-all/-print-before-all to dump IR around this pass.

  llvm-svn: 244046

* Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) (Alexander Kornienko, 2015-06-23; 1 file, -1/+1)

  Apparently, the style needs to be agreed upon first.

  llvm-svn: 240390

* Fixed/added namespace ending comments using clang-tidy. NFC (Alexander Kornienko, 2015-06-19; 1 file, -1/+1)

  The patch is generated using this command:

    tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
      -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
      llvm/lib/

  Thanks to Eugene Kosov for the original patch!

  llvm-svn: 240137

* Transfer implicit operands when expanding the RET_ReallyLR pseudo instruction. (Juergen Ributzka, 2015-03-30; 1 file, -3/+6)

  When we expand the RET_ReallyLR pseudo instruction we also need to transfer the implicit operands. The return register is an implicit operand, and without it the liveness calculation generates an incorrect live-out set for the patchpoint.

  This fixes rdar://problem/19068476.

  llvm-svn: 233635

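  A C++ sketch of the fix in the expansion, using the generic MachineInstrBuilder API (the exact upstream code may differ):

      MachineInstrBuilder MIB =
          BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(AArch64::RET))
              .addReg(AArch64::LR);
      // Carry over implicit operands such as the return register so the
      // liveness calculation sees them on the expanded instruction.
      MIB.copyImplicitOps(MI);
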
* MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros (Benjamin Kramer, 2015-02-12; 1 file, -2/+2)

  Update all callers.

  llvm-svn: 228930

* Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier, and have the DAG use that to do the same lookup. (Eric Christopher, 2014-08-05; 1 file, -2/+1)

  This can be used in the future for TargetMachine-based caching lookups from the MachineFunction easily.

  Update the MIPS subtarget switching machinery to update this pointer at the same time it runs.

  llvm-svn: 214838

* Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. (Eric Christopher, 2014-08-04; 1 file, -1/+3)

  llvm-svn: 214781