summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/ARM
Commit message (Collapse)AuthorAgeFilesLines
...
* [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tablesJames Molloy2016-11-013-15/+185
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment] The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2 After: add r0, pc adr r1, .LJTI0_0 ldrb r0, [r0, #6] ldr r0, [r0, r1] lsls r0, r0, #1 mov pc, r0 add pc, r0 => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2 After: lsls r4, r4, #1 adr r1, .LJTI0_0 add r4, pc ldr r0, [r0, r1] ldrh r4, [r4, #6] mov pc, r0 lsls r4, r4, #1 add pc, r4 => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) llvm-svn: 285690
* CodeGen: further loosen -O0 CG for WoA divisionSaleem Abdulrasool2016-10-311-5/+13
| | | | | | | | | | | | Generate the slowest possible codepath for noopt CodeGen. Even trying to be clever with the negated jump can cause out-of-range jumps. Use a wide branch instead. Although the code is modelled simplistically, the later optimizations would recombine the branching into `cbz` if possible. This re-enables the previous optimization as well as hopefully gives us working code in all cases. Addresses PR30356! llvm-svn: 285649
* ARM: ensure that the Windows DBZ check is in rangeSaleem Abdulrasool2016-10-272-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Windows ARM target expects the compiler to emit a division-by-zero check. The check would use the form of: cmp r?, #0 cbz .Ltrap b .Lbody .Lbody: ... .Ltrap: udf #249 @ __brkdiv0 This works great most of the time. However, if the body of the function is greater than 127 bytes, the branch target limitation of cbz becomes an issue. This occurs in the unoptimized code generation cases sometimes (like in compiler-rt). Since this is a matter of correctness, possibly pay a small penalty instead. We now form this slightly differently: cbnz .Lbody udf #249 @ __brkdiv0 .Lbody: ... The positive case is through the branch instead of being the next instruction. However, because of the basic block layout, the negated branch is going to be a short distance always (2 bytes away, after the inserted __brkdiv0). The new t__brkdiv0 instruction is required to explicitly mark the instruction as a terminator as the generic UDF instruction is not a terminator. Addresses PR30532! llvm-svn: 285312
* [ARM] Predicate UMAAL selection on hasDSP.Sam Parker2016-10-272-2/+3
| | | | | | | | | | | | UMAAL is a DSP instruction and it is not available on thumbv7m (Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong CHECK prefix in longMAC.ll test. Patch by Vadzim Dambrouski. Differential Revision: https://reviews.llvm.org/D25890 llvm-svn: 285278
* ARM: don't rely on push/pop reglists being in order when folding SP adjust.Tim Northover2016-10-261-8/+19
| | | | | | | | | | | | | | | | | | It would be a very nice invariant to rely on, but unfortunately it doesn't necessarily hold (and the causes of mis-sorted reglists appear to be quite varied) so to be robust the frame lowering code can't assume that the first register in the list is also the first one that actually gets pushed. Should fix an issue where we were turning something like: push {r8, r4, r7, lr} sub sp, #24 into nonsense like: push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr} llvm-svn: 285232
* CodeGen/Passes: Pass MachineFunction as functor arg; NFCMatthias Braun2016-10-241-4/+4
| | | | | | | | Passing a MachineFunction as argument is more natural and avoids an unnecessary round-trip through the logic determining the correct Subtarget because MachineFunction already has a reference anyway. llvm-svn: 285039
* Revert r284580+r284917. ("Synthesize TBB/TBH instructions")Eli Friedman2016-10-243-180/+14
| | | | | | | The optimization has correctness issues, so reverting for now to fix tests on thumb1 targets. llvm-svn: 284993
* [ARM] Fix crash in ConstantIslandsJames Molloy2016-10-221-1/+3
| | | | | | tPCRelJT may not be the first instruction in a block. Check that instead of dereferencing a broken iterator. llvm-svn: 284917
* Do a sweep over move ctors and remove those that are identical to the default.Benjamin Kramer2016-10-201-7/+0
| | | | | | | | | | All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. llvm-svn: 284721
* Reapply r284571 (with the new tests fixed).Sjoerd Meijer2016-10-191-14/+15
| | | | llvm-svn: 284588
* [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tablesJames Molloy2016-10-193-14/+178
| | | | | | | | | | | | | | | | | | | | | | | | | | The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2 After: add r0, pc adr r1, .LJTI0_0 ldrb r0, [r0, #6] ldr r0, [r0, r1] lsls r0, r0, #1 mov pc, r0 add pc, r0 => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2 After: lsls r4, r4, #1 adr r1, .LJTI0_0 add r4, pc ldr r0, [r0, r1] ldrh r4, [r4, #6] mov pc, r0 lsls r4, r4, #1 add pc, r4 => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) llvm-svn: 284580
* Revert of r284571 because of failing tests.Sjoerd Meijer2016-10-191-17/+14
| | | | llvm-svn: 284572
* Checking FP function attribute values and adding more build attribute tests.Sjoerd Meijer2016-10-191-14/+17
| | | | | | | | | | This renames the function for checking FP function attribute values and also adds more build attribute tests (which are in separate files because build attributes are set per file). Differential Revision: https://reviews.llvm.org/D25625 llvm-svn: 284571
* Improve ARM lowering for "icmp <2 x i64> eq".Eli Friedman2016-10-181-6/+21
| | | | | | | | | The custom lowering is pretty straightforward: basically, just AND together the two halves of a <4 x i32> compare. Differential Revision: https://reviews.llvm.org/D25713 llvm-svn: 284536
* [ARM] Assign cost of scaling for Cortex-R52Javed Absar2016-10-181-1/+2
| | | | | | | | | | | | This patch assigns cost of the scaling used in addressing for Cortex-R52. On Cortex-R52 a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. Differential Revision: http://reviews.llvm.org/D25670 Reviewer: jmolloy llvm-svn: 284460
* [XRay] Support for for tail calls for ARM no-ThumbDean Michael Berris2016-10-183-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds simplified support for tail calls on ARM with XRay instrumentation. Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program #include <cstdio> #include <cassert> #include <xray/xray_interface.h> [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { std::printf("In fC()\n"); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { std::printf("In fB()\n"); fC(); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { std::printf("In fA()\n"); fB(); } // Avoid infinite recursion in case the logging function is instrumented (so calls logging // function again). [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) { printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret)); } int main(int argc, char* argv[]) { __xray_set_handler(simplyPrint); printf("Patching...\n"); __xray_patch(); fA(); printf("Unpatching...\n"); __xray_unpatch(); fA(); return 0; } gives the following output: Patching... XRay: functionId=3 type=0. In fA() XRay: functionId=3 type=1. XRay: functionId=2 type=0. In fB() XRay: functionId=2 type=1. XRay: functionId=1 type=0. XRay: functionId=1 type=1. In fC() Unpatching... In fA() In fB() In fC() So for function fC() the exit sled seems to be called too much before function exit: before printing In fC(). Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988 ). Differential Revision: https://reviews.llvm.org/D25030 llvm-svn: 284456
* [ArmFastISel] Kill dead code. NFCI.Davide Italiano2016-10-161-35/+0
| | | | llvm-svn: 284320
* Tidy the calls to getCurrentSection().first -> getCurrentSectionOnly to helpEric Christopher2016-10-141-3/+3
| | | | | | readability a bit. llvm-svn: 284202
* [ARM]: Assign cost of scaling used in addressing mode for ARM coresJaved Absar2016-10-134-2/+29
| | | | | | | | | | | | | | | | | | | | | This patch assigns cost of the scaling used in addressing. On many ARM cores, a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. For instance: LDR R0, [R1, R2 LSL #2] LDR R0, [R1, -R2 LSL #2] Above, (1) takes less cycles than (2). By assigning appropriate scaling factor cost, we enable the LLVM to make the right trade-offs in the optimization and code-selection phase. Differential Revision: http://reviews.llvm.org/D24857 Reviewers: jmolloy, rengolin llvm-svn: 284127
* Re-land "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"Reid Kleckner2016-10-113-24/+375
| | | | | | | | | Reverts r283938 to reinstate r283867 with a fix. The original change had an ArrayRef referring to a destroyed temporary initializer list. Use plain C arrays instead. llvm-svn: 283942
* Revert "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"Reid Kleckner2016-10-113-369/+24
| | | | | | | | | | | | | | | | | | This reverts r283867. This appears to be an infinite loop: while (HiRegToSave != AllHighRegs.end() && CopyReg != AllCopyRegs.end()) { if (HiRegsToSave.count(*HiRegToSave)) { ... CopyReg = findNextOrderedReg(++CopyReg, CopyRegs, AllCopyRegs.end()); HiRegToSave = findNextOrderedReg(++HiRegToSave, HiRegsToSave, AllHighRegs.end()); } } llvm-svn: 283938
* ARMMachineFunctionInfo.cpp: Add an initializer of ↵NAKAMURA Takumi2016-10-111-2/+2
| | | | | | | | ARMFunctionInfo::ReturnRegsCount in the explicit ctor. It caused crash since r283867. llvm-svn: 283909
* Reformat.NAKAMURA Takumi2016-10-111-4/+4
| | | | llvm-svn: 283908
* Silence unused warning in non-assert builds.Daniel Jasper2016-10-111-0/+1
| | | | llvm-svn: 283899
* [Thumb] Save/restore high registers in Thumb1 pro/epiloguesOliver Stannard2016-10-113-24/+368
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The high registers are not allocatable in Thumb1 functions, but they could still be used by inline assembly, so we need to save and restore the callee-saved high registers (r8-r11) in the prologue and epilogue. This is complicated by the fact that the Thumb1 push and pop instructions cannot access these registers. Therefore, we have to move them down into low registers before pushing, and move them back after popping into low registers. In most functions, we will have low registers that are also being pushed/popped, which we can use as the temporary registers for saving/restoring the high registers. However, this is not guaranteed, so we may need to push some extra low registers to ensure that the high registers can be saved/restored. For correctness, it would be sufficient to use just one low register, but if we have enough low registers available then we only need one push/pop instruction, rather than one per high register. We can also use the argument/return registers when they are not live, and the link register when saving (but not restoring), reducing the number of extra registers we need to push. There are still a few extreme edge cases where we need two push/pop instructions, because not enough low registers can be made live in the prologue or epilogue. In addition to the regression tests included here, I've also tested this using a script to generate functions which clobber different combinations of registers, have different numbers of argument and return registers (including variadic arguments), allocate different fixed sized objects on the stack, and do or don't use variable sized allocas and the __builtin_return_address intrinsic (all of which affect the available registers in the prologue and epilogue). I ran these functions in a test harness which verifies that all of the callee-saved registers are correctly preserved. Differential Revision: https://reviews.llvm.org/D24228 llvm-svn: 283867
* [ARM] Fix registers clobbered by SjLj EH on soft-float targetsOliver Stannard2016-10-114-2/+15
| | | | | | | | | | | | | | | | | | | Currently, the Int_eh_sjlj_dispatchsetup intrinsic is marked as clobbering all registers, including floating-point registers that may not be present on the target. This is technically true, as we could get linked against code that does use the FP registers, but that will not actually work, as the soft-float code cannot save and restore the FP registers. SjLj exception handling can only work correctly if either all or none of the code is built for a target with FP registers. Therefore, we can assume that, when Int_eh_sjlj_dispatchsetup is compiled for a soft-float target, it is only going to be linked against other soft-float code, and so only clobbers the general-purpose registers. This allows us to check that no non-savable registers are clobbered when generating the prologue/epilogue. Differential Revision: https://reviews.llvm.org/D25180 llvm-svn: 283866
* Revert r283690, "MC: Remove unused entities."Peter Collingbourne2016-10-101-1/+1
| | | | llvm-svn: 283814
* [ARM] Fix invalid VLDM/VSTM access when targeting Big Endian with NEONAlexandros Lamprineas2016-10-101-2/+12
| | | | | | | | | | | | | | | The instructions VLDM/VSTM can only access word-aligned memory locations and produce alignment fault if the condition is not met. The compiler currently generates VLDM/VSTM for v2f64 load/store regardless the alignment of the memory access. Instead, if a v2f64 load/store is not word-aligned, the compiler should generate VLD1/VST1. For each non double-word-aligned VLD1/VST1, a VREV instruction should be generated when targeting Big Endian. Differential Revision: https://reviews.llvm.org/D25281 llvm-svn: 283763
* Move the global variables representing each Target behind accessor functionMehdi Amini2016-10-097-36/+54
| | | | | | | | This avoids "static initialization order fiasco" Differential Revision: https://reviews.llvm.org/D25412 llvm-svn: 283702
* MC: Remove unused entities.Peter Collingbourne2016-10-091-1/+1
| | | | llvm-svn: 283691
* Target: Remove unused entities.Peter Collingbourne2016-10-091-1/+1
| | | | llvm-svn: 283690
* Turn cl::values() (for enum) from a vararg function to using C++ variadic ↵Mehdi Amini2016-10-082-4/+2
| | | | | | | | | | | | | | | template The core of the change is supposed to be NFC, however it also fixes what I believe was an undefined behavior when calling: va_start(ValueArgs, Desc); with Desc being a StringRef. Differential Revision: https://reviews.llvm.org/D25342 llvm-svn: 283671
* [ARM]: add missing switch case for cortex-r52Javed Absar2016-10-071-0/+1
| | | | | | | Adds a missing switch case for handling cortex-r52 in init-subtarget-features. llvm-svn: 283551
* [ARM] Reapply: Use __rt_div functions for divrem on WindowsMartin Storsjo2016-10-071-21/+45
| | | | | | | | | | | | | | | | | | | | | | | | Reapplying r283383 after revert in r283442. The additional fix is a getting rid of a stray space in a function name, in the refactoring part of the commit. This avoids falling back to calling out to the GCC rem functions (__moddi3, __umoddi3) when targeting Windows. The __rt_div functions have flipped the two arguments compared to the __aeabi_divmod functions. To match MSVC, we emit a check for division by zero before actually calling the library function (even if the library function itself also might do the same check). Not all calls to __rt_div functions for division are currently merged with calls to the same function with the same parameters for the remainder. This is more wasteful than a div + mls as before, but avoids calls to __moddi3. Differential Revision: https://reviews.llvm.org/D25332 llvm-svn: 283550
* [ARM]: Add Cortex-R52 target to LLVMJaved Absar2016-10-073-4/+24
| | | | | | | This patch adds Cortex-R52, the new ARM real-time processor, to LLVM. Cortex-R52 implements the ARMv8-R architecture. llvm-svn: 283542
* [ARM] Don't convert switches to lookup tables of pointers with ROPI/RWPIOliver Stannard2016-10-071-0/+10
| | | | | | | | | | | | With the ROPI and RWPI relocation models we can't always have pointers to global data or functions in constant data, so don't try to convert switches into lookup tables if any value in the lookup table would require a relocation. We can still safely emit lookup tables of other values, such as simple constants. Differential Revision: https://reviews.llvm.org/D24462 llvm-svn: 283530
* Use StringRef in ARMELFStreamer (NFC)Mehdi Amini2016-10-071-2/+2
| | | | llvm-svn: 283529
* Use StringReg in TargetParser APIs (NFC)Mehdi Amini2016-10-071-1/+1
| | | | llvm-svn: 283527
* Revert "[ARM] Use __rt_div functions for divrem on Windows"Diana Picus2016-10-061-45/+21
| | | | | | | | | | This reverts commit r283383 because it broke some of the bots: undefined reference to ` __aeabi_uldivmod' It affected (at least) clang-cmake-armv7-a15-selfhost, clang-cmake-armv7-a15-selfhost and clang-native-arm-lnt. llvm-svn: 283442
* [ARM] Constant pool promotion - fix alignment calculationJames Molloy2016-10-061-1/+1
| | | | | | | | | Global variables are GlobalValues, so they have explicit alignment. Querying DataLayout for the alignment was incorrect. Testcase added. llvm-svn: 283423
* [ARM] Use __rt_div functions for divrem on WindowsMartin Storsjo2016-10-051-21/+45
| | | | | | | | | | | | | | | | | | | | This avoids falling back to calling out to the GCC rem functions (__moddi3, __umoddi3) when targeting Windows. The __rt_div functions have flipped the two arguments compared to the __aeabi_divmod functions. To match MSVC, we emit a check for division by zero before actually calling the library function (even if the library function itself also might do the same check). Not all calls to __rt_div functions for division are currently merged with calls to the same function with the same parameters for the remainder. This is more wasteful than a div + mls as before, but avoids calls to __moddi3. Differential Revision: https://reviews.llvm.org/D24076 llvm-svn: 283383
* [Thumb] Don't try and emit LDRH/LDRB from the constant poolJames Molloy2016-10-051-0/+1
| | | | | | | | This is not a valid encoding - these instructions cannot do PC-relative addressing. The underlying problem here is of whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. This didn't realise TargetConstantPool was actually possible, so didn't handle it. llvm-svn: 283323
* Use StringRef in ARMConstantPool APIs (NFC)Mehdi Amini2016-10-053-16/+15
| | | | llvm-svn: 283293
* Consistent fp denormal mode names. NFC.Sjoerd Meijer2016-10-041-2/+2
| | | | | | | | | This fixes the inconsistency of the fp denormal option names: in LLVM this was DenormalType, but in Clang this is DenormalMode which seems better. Differential Revision: https://reviews.llvm.org/D24906 llvm-svn: 283192
* [ARM] Code size optimisation to lower udiv+urem to udiv+mls instead of aSjoerd Meijer2016-10-031-1/+19
| | | | | | | | | | | | | | | library call to __aeabi_uidivmod. This is an improved implementation of r280808, see also D24133, that got reverted because isel was stuck in a loop. That was caused by the optimisation incorrectly triggering on i64 ints, which shouldn't happen because there is no 64bit hwdiv support; that put isel's type legalization and this optimisation in a loop. A native ARM compiler and testing now shows that this is fixed. Patch mostly by Pablo Barrio. Differential Revision: https://reviews.llvm.org/D25077 llvm-svn: 283098
* Use StringRef in Datalayout API (NFC)Mehdi Amini2016-10-011-1/+1
| | | | llvm-svn: 283013
* Revert "Use StringRef in Datalayout API (NFC)"Mehdi Amini2016-10-011-1/+1
| | | | | | This reverts commit r283009. Bots are broken. llvm-svn: 283011
* Use StringRef in Datalayout API (NFC)Mehdi Amini2016-10-011-1/+1
| | | | llvm-svn: 283009
* Use StringRef in Pass/PassManager APIs (NFC)Mehdi Amini2016-10-0110-19/+11
| | | | llvm-svn: 283004
* [ARM] Promote small global constants to constant poolsJames Molloy2016-09-268-6/+275
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a constant is unamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of: ldr r0, .CPI0 bl printf bx lr .CPI0: &format_string format_string: .asciz "hello, world!\n" We can emit: adr r0, .CPI0 bl printf bx lr .CPI0: .asciz "hello, world!\n" This can cause significant code size savings when many small strings are used in one function (4 bytes per string). This recommit contains fixes for a nasty bug related to fast-isel fallback - because fast-isel doesn't know about this optimization, if it runs and emits references to a string that we inline (because fast-isel fell back to SDAG) we will end up with an inlined string and also an out-of-line string, and we won't emit the out-of-line string, causing backend failures. It also contains fixes for emitting .text relocations which made the sanitizer bots unhappy. llvm-svn: 282387
OpenPOWER on IntegriCloud