summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [ShrinkWrap] Add (a simplified version) of shrink-wrapping.Quentin Colombet2015-05-051-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks. As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. ** Context ** Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. ** Motivating example ** Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call. ** Proposed Solution ** This patch introduces a new machine pass that perform the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hack to properly avoid frequently executed point. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overwritten on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block. ** Design Decisions ** 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save point and restore point, the impacted component would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore point instead of one pointer. - PEI: Should loop over the save point and restore point. Anyhow, at least for this first iteration, I do not believe this is interesting to support the complex cases. We should revisit that when we motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507
* In preparation for moving ARM's TargetRegisterInfo to the TargetMachineEric Christopher2015-03-121-7/+7
| | | | | | | merge Thumb1RegisterInfo and Thumb2RegisterInfo. This will enable us to match the TargetMachine for our TargetRegisterInfo classes. llvm-svn: 232117
* Have getCalleeSavedRegs take a non-null MachineFunction all theEric Christopher2015-03-111-1/+1
| | | | | | | | time. The target independent code was passing in one all the time and targets weren't checking validity before using. Update a few calls to pass in a MachineFunction where necessary. llvm-svn: 231970
* ARM: simplify and extend byval handlingTim Northover2015-03-111-4/+2
| | | | | | | | | | | | | | | | | | | The main issue being fixed here is that APCS targets handling a "byval align N" parameter with N > 4 were miscounting what objects were where on the stack, leading to FrameLowering setting the frame pointer incorrectly and clobbering the stack. But byval handling had grown over many years, and had multiple layers of cruft trying to compensate for each other and calculate padding correctly. This only really needs to be done once, in the HandleByVal function. Elsewhere should just do what it's told by that call. I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits byvals with the correct C ABI alignment), which simplified HandleByVal. rdar://20095672 llvm-svn: 231959
* Migrate ARM except for TTI, AsmPrinter, and frame loweringEric Christopher2015-01-291-20/+13
| | | | | | away from getSubtargetImpl. llvm-svn: 227399
* Thumb1 frame lowering: Mark CFI instructions with the FrameSetup flag.Adrian Prantl2014-12-221-7/+14
| | | | | | | | | | | | | Followup to r224294: ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions. Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in the middle of the function prologue would cause debug info to end the prologue too early and worse, attach the line number of the CFI instruction, which incidentally is often 0. llvm-svn: 224743
* Re-apply r214881: Fix return sequence on armv4 thumbJonathan Roelofs2014-08-051-19/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts r214893, re-applying r214881 with the test case relaxed a bit to satiate the build bots. POP on armv4t cannot be used to change thumb state (unilke later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214928
* Revert r214881 because it broke lots of build-botsJonathan Roelofs2014-08-051-59/+19
| | | | llvm-svn: 214893
* Fix return sequence on armv4 thumbJonathan Roelofs2014-08-051-19/+59
| | | | | | | | | | | | | | | | | | | | | | | | | POP on armv4t cannot be used to change thumb state (unilke later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214881
* Have MachineFunction cache a pointer to the subtarget to make lookupsEric Christopher2014-08-051-13/+8
| | | | | | | | | | | shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838
* Remove the TargetMachine forwards for TargetSubtargetInfo basedEric Christopher2014-08-041-13/+24
| | | | | | information and update all callers. No functional change. llvm-svn: 214781
* Move the frame lowering constructors out of line to avoid circularEric Christopher2014-06-261-0/+3
| | | | | | includes. llvm-svn: 211798
* Make consistent use of MCPhysReg instead of uint16_t throughout the tree.Craig Topper2014-04-041-2/+2
| | | | llvm-svn: 205610
* Replace PROLOG_LABEL with a new CFI_INSTRUCTION.Rafael Espindola2014-03-071-32/+31
| | | | | | | | | | | | | | | | | | | | | | | The old system was fairly convoluted: * A temporary label was created. * A single PROLOG_LABEL was created with it. * A few MCCFIInstructions were created with the same label. The semantics were that the cfi instructions were mapped to the PROLOG_LABEL via the temporary label. The output position was that of the PROLOG_LABEL. The temporary label itself was used only for doing the mapping. The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to one by holding an index into the CFI instructions of this function. I did consider removing MMI.getFrameInstructions completelly and having CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non trivial constructors and destructors and are somewhat big, so the this setup is probably better. The net result is that we don't create temporary labels that are never used. llvm-svn: 203204
* Fix clang -Werror build break due to mismatched sign comparison.David Blaikie2014-03-051-1/+1
| | | | | | Originally committed in r202985. llvm-svn: 202992
* ARM: Correctly align arguments after a byval struct is passed on the stackOliver Stannard2014-03-051-9/+15
| | | | llvm-svn: 202985
* [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.Benjamin Kramer2014-03-021-3/+3
| | | | | | Remove the old functions. llvm-svn: 202636
* Generate the DWARF stack frame decode operations in the function prologue ↵Artyom Skrobov2014-02-141-14/+90
| | | | | | | | for ARM/Thumb functions. Patch by Keith Walker! llvm-svn: 201423
* ARM: correctly determine final tBX_LR in Thumb1 functionsTim Northover2014-01-141-3/+3
| | | | | | | | | | | | | The changes caused by folding an sp-adjustment into a "pop" previously disrupted the forward search for the final real instruction in a terminating block. This switches to a backward search (skipping debug instrs). This fixes PR18399. Patch by Zhaoshi. llvm-svn: 199266
* ARM MachO: sort out isTargetDarwin/isTargetIOS/... checks.Tim Northover2014-01-061-1/+1
| | | | | | | | | | | | | | | | | | The ARM backend has been using most of the MachO related subtarget checks almost interchangeably, and since the only target it's had to run on has been IOS (which is all three of MachO, Darwin and IOS) it's worked out OK so far. But we'd like to support embedded targets under the "*-*-none-macho" triple, which means everything starts falling apart and inconsistent behaviours emerge. This patch should pick a reasonably sensible set of behaviours for the new triple (and any others that come along, with luck). Some choices were debatable (notably FP == r7 or r11), but we can revisit those later when deficiencies become apparent. llvm-svn: 198617
* ARM: decide whether to use movw/movt based on "minsize" attribute.Tim Northover2013-12-021-3/+3
| | | | llvm-svn: 196102
* ARM: fix bug in -Oz stack adjustment foldingTim Northover2013-12-011-7/+0
| | | | | | | | | | | Previously, we clobbered callee-saved registers when folding an "add sp, #N" into a "pop {rD, ...}" instruction. This change checks whether a register we're going to add to the "pop" could actually be live outside the function before doing so and should fix the issue. This should fix PR18081. llvm-svn: 196046
* ARM: fold prologue/epilogue sp updates into push/pop for code sizeTim Northover2013-11-081-4/+11
| | | | | | | | | | | | | | | | | | ARM prologues usually look like: push {r7, lr} sub sp, sp, #4 If code size is extremely important, this can be optimised to the single instruction: push {r6, r7, lr} where we don't actually care about the contents of r6, but pushing it subtracts 4 from sp as a side effect. This should implement such a conversion, predicated on the "minsize" function attribute (-Oz) since I've yet to find any code it actually makes faster. llvm-svn: 194264
* ARM: remove unnecessary state-tracking during frame lowering.Tim Northover2013-11-041-8/+4
| | | | | | | | | | | | | | | | | | | | | ResolveFrameIndex had what appeared to be a very nasty hack for when the frame-index referred to a callee-saved register. In this case it "adjusted" the offset so that the address was correct if (and only if) the MachineInstr immediately followed the respective push. This "worked" for all forms of GPR & DPR but was only ever used to set the frame pointer itself, and once this was put in a more sensible location the entire state-tracking machinery it relied on became redundant. So I stripped it. The only wrinkle is that "add r7, sp, #0" might theoretically be slower (need an actual ALU slot) compared to "mov r7, sp" so I added a micro-optimisation that also makes emitARMRegUpdate and emitT2RegUpdate also work when NumBytes == 0. No test changes since there shouldn't be any functionality change. llvm-svn: 194025
* PR15868 fix.Stepan Dyatkovskiy2013-05-201-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduction: In case when stack alignment is 8 and GPRs parameter part size is not N*8: we add padding to GPRs part, so part's last byte must be recovered at address K*8-1. We need to do it, since remained (stack) part of parameter starts from address K*8, and we need to "attach" "GPRs head" without gaps to it: Stack: |---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes... [ [padding] [GPRs head] ] [ ------ Tail passed via stack ------ ... FIX: Note, once we added padding we need to correct *all* Arg offsets that are going after padded one. That's why we need this fix: Arg offsets were never corrected before this patch. See new test-cases included in patch. We also don't need to insert padding for byval parameters that are stored in GPRs only. We need pad only last byval parameter and only in case it outsides GPRs and stack alignment = 8. Though, stack area, allocated for recovered byval params, must satisfy "Size mod 8 = 0" restriction. This patch reduces stack usage for some cases: We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be "packed" with alignment 4 in some cases. llvm-svn: 182237
* Refactoring patch.Stepan Dyatkovskiy2013-04-301-7/+7
| | | | | | | | | | | | 1. VarArgStyleRegisters: functionality that emits "store" instructions for byval regs moved out into separated method "StoreByValRegs". Before this patch VarArgStyleRegisters had confused use-cases. It was used for both variadic functions and for regular functions with byval parameters. In last case it created new stack-frame and registered it as VarArg frame, that is wrong. This patch replaces VarArgsStyleRegisters usage for byval parameters with StoreByValRegs method. 2. In ARMMachineFunctionInfo, "get/setVarArgsRegSaveSize" was renamed to "get/setArgRegsSaveSize". By the same reason. Sometimes it was used for variadic functions, and sometimes for byval parameters in regular functions. Actually, this property means the size of registers, that keeps arguments, and thats why it was renamed. 3. In ARMISelLowering.cpp, ARMTargetLowering class, in methods computeRegArea and StoreByValRegs, VARegXXXXXX was renamed to ArgRegsXXXXXX still by the same reasons. llvm-svn: 180774
* Move the eliminateCallFramePseudoInstr method from TargetRegisterInfoEli Bendersky2013-02-211-0/+35
| | | | | | | | | | | | | | | to TargetFrameLowering, where it belongs. Incidentally, this allows us to delete some duplicated (and slightly different!) code in TRI. There are potentially other layering problems that can be cleaned up as a result, or in a similar manner. The refactoring was OK'd by Anton Korobeynikov on llvmdev. Note: this touches the target interfaces, so out-of-tree targets may be affected. llvm-svn: 175788
* Fix thumbv5e frame lowering assertion failure.Logan Chien2013-02-201-3/+6
| | | | | | | | | | | | | | It is possible that frame pointer is not found in the callee saved info, thus FramePtrSpillFI may be incorrect if we don't check the result of hasFP(MF). Besides, if we enable the stack coloring algorithm, there will be an assertion to ensure the slot is live. But in the test case, %var1 is not live in the prologue of the function, and we will get the assertion failure. Note: There is similar code in ARMFrameLowering.cpp. llvm-svn: 175616
* Add an MF argument to MI::copyImplicitOps().Jakob Stoklund Olesen2012-12-201-2/+2
| | | | | | | | | This function is often used to decorate dangling instructions, so a context reference is required to allocate memory for the operands. Also add a corresponding MachineInstrBuilder method. llvm-svn: 170797
* Reorder includes to match coding standards. Fix an issue or two exposed by that.Craig Topper2012-03-171-1/+0
| | | | llvm-svn: 152978
* Use uint16_t to store registers in callee saved register tables to reduce ↵Craig Topper2012-03-041-3/+3
| | | | | | size of static data. llvm-svn: 151996
* Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, ↵Jia Liu2012-02-181-1/+1
| | | | | | MSP430, PPC, PTX, Sparc, X86, XCore. llvm-svn: 150878
* Don't forget to transfer implicit uses of return instruction.Evan Cheng2012-01-081-2/+5
| | | | llvm-svn: 147752
* Copy implicit defs (e.g. r0) when changing tBX_RET to tPOP_RET. This bug isEvan Cheng2012-01-071-0/+1
| | | | | | | exposed with an upcoming change will would delete the copy to return register because there is no use! It's amazing anything works. llvm-svn: 147715
* Fix more places which should be checking for iOS, not darwin.Evan Cheng2012-01-041-1/+1
| | | | llvm-svn: 147513
* Revert 142337. Thumb1 still doesn't support dynamic stack realignment. :(Chad Rosier2011-10-201-26/+4
| | | | llvm-svn: 142557
* Add support for dynamic stack realignment when in thumb1 mode.Chad Rosier2011-10-181-4/+26
| | | | | | rdar://10288916 llvm-svn: 142337
* Thumb1 does not support dynamic stack realignment.Chad Rosier2011-10-151-0/+5
| | | | | | | | | | | rdar://10288916 is tracking this fix. In the past, instcombine and other passes were promoting alloca alignment past the natural alignment, resulting in dynamic stack realignment. Lang's work now prevents this from happening (LLVM commit r141599). Now that this really shouldn't happen report a fatal error rather than silently generate bad code. llvm-svn: 142028
* Revert r140924 "Attempt to fix dynamic stack realignment for thumb1 functions."Chad Rosier2011-10-011-21/+0
| | | | | | to appease nightly testers. Not quite there yet. llvm-svn: 140953
* Attempt to fix dynamic stack realignment for thumb1 functions. It is in fact Chad Rosier2011-10-011-0/+21
| | | | | | | | useful if an optimization assumes the stack has been realigned. Credit to Eli for his assistance. rdar://10043857 llvm-svn: 140924
* Tidy up a few 80 column violations.Jim Grosbach2011-09-131-1/+1
| | | | llvm-svn: 139636
* Thumb1 ADD/SUB SP instructions are predicable in Thumb2 mode.Jim Grosbach2011-08-241-2/+2
| | | | | | | | | Add the predicate operand to the instructions. Update the back end accordingly where the instructions are used. Restrict the SP operands to actually only be SP, as otherwise these break assembly parsing for the normal instruction variants. llvm-svn: 138445
* Make tBX_RET and tBX_RET_vararg predicable.Jim Grosbach2011-07-081-2/+2
| | | | | | | | | | The normal tBX instruction is predicable, so there's no reason the pseudos for using it as a return shouldn't be. Gives us some nice code-gen improvements as can be seen by the test changes. In particular, several tests now have to disable if-conversion because it works too well and defeats the test. llvm-svn: 134746
* Refact ARM Thumb1 tMOVr instruction family.Jim Grosbach2011-06-301-3/+3
| | | | | | | | | | Merge the tMOVr, tMOVgpr2tgpr, tMOVtgpr2gpr, and tMOVgpr2gpr instructions into tMOVr. There's no need to keep them separate. Giving the tMOVr instruction the proper GPR register class for its operands is sufficient to give the register allocator enough information to do the right thing directly. llvm-svn: 134204
* Thumb1 register to register MOV instruction is predicable.Jim Grosbach2011-06-301-5/+8
| | | | | | | | | Fix a FIXME and allow predication (in Thumb2) for the T1 register to register MOV instructions. This allows some better codegen with if-conversion (as seen in the test updates), plus it lays the groundwork for pseudo-izing the tMOVCC instructions. llvm-svn: 134197
* Refactor away tSpill and tRestore pseudos in ARM backend.Jim Grosbach2011-06-291-1/+1
| | | | | | | The tSpill and tRestore instructions are just copies of the tSTRspi and tLDRspi instructions, respectively. Just use those directly instead. llvm-svn: 134092
* Fix coordination for using R4 in Thumb1 as a scratch for SP restore.Jim Grosbach2011-06-131-2/+2
| | | | | | | | The logic for reserving R4 for use as a scratch needs to match that for actually using it. Also, it's not necessary for immediate <=508, so adjust the value checked. llvm-svn: 132934
* Implement frame unwinding information emission for Thumb1. Not finished yet ↵Anton Korobeynikov2011-03-051-11/+17
| | | | | | because there is no way given the constpool index to examine the actual entry: the reason is clones inserted by constant island pass, which are not tracked at all! The only connection is done during asmprinting time via magic label names which is really gross and needs to be eventually fixed. llvm-svn: 127104
* Preliminary support for ARM frame save directives emission via MI flags.Anton Korobeynikov2011-03-051-4/+4
| | | | | | | This is just very first approximation how the stuff should be done (e.g. ARM-only for now). More to follow. llvm-svn: 127101
* Teach frame lowering to ignore debug values after the terminators.Jakob Stoklund Olesen2011-01-131-1/+1
| | | | llvm-svn: 123399
OpenPOWER on IntegriCloud