path: root/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
* [Thumb1] Move padding earlier when synthesizing TBBs off of the PC (James Molloy, 2016-11-07; 1 file, +8/-0)

When the base register (the register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding. If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset.

In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4.

llvm-svn: 286107
* [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables (James Molloy, 2016-11-01; 1 file, +80/-0)

[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]

The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.

It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.

TBB example:

    Before:                    After:
      lsls r0, r0, #2            add  r0, pc
      adr  r1, .LJTI0_0          ldrb r0, [r0, #6]
      ldr  r0, [r0, r1]          lsls r0, r0, #1
      mov  pc, r0                add  pc, r0

    => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

    Before:                    After:
      lsls r0, r4, #2            lsls r4, r4, #1
      adr  r1, .LJTI0_0          add  r4, pc
      ldr  r0, [r0, r1]          ldrh r4, [r4, #6]
      mov  pc, r0                lsls r4, r4, #1
                                 add  pc, r4

    => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)

llvm-svn: 285690
* Revert r284580+r284917 ("Synthesize TBB/TBH instructions") (Eli Friedman, 2016-10-24; 1 file, +0/-77)

The optimization has correctness issues, so reverting for now to fix tests on thumb1 targets.

llvm-svn: 284993
* Reapply r284571 (with the new tests fixed). (Sjoerd Meijer, 2016-10-19; 1 file, +15/-14)

llvm-svn: 284588
* [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables (James Molloy, 2016-10-19; 1 file, +77/-0)

The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.

It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.

TBB example:

    Before:                    After:
      lsls r0, r0, #2            add  r0, pc
      adr  r1, .LJTI0_0          ldrb r0, [r0, #6]
      ldr  r0, [r0, r1]          lsls r0, r0, #1
      mov  pc, r0                add  pc, r0

    => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

    Before:                    After:
      lsls r0, r4, #2            lsls r4, r4, #1
      adr  r1, .LJTI0_0          add  r4, pc
      ldr  r0, [r0, r1]          ldrh r4, [r4, #6]
      mov  pc, r0                lsls r4, r4, #1
                                 add  pc, r4

    => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)

llvm-svn: 284580
* Revert of r284571 because of failing tests. (Sjoerd Meijer, 2016-10-19; 1 file, +14/-17)

llvm-svn: 284572
* Checking FP function attribute values and adding more build attribute tests. (Sjoerd Meijer, 2016-10-19; 1 file, +17/-14)

This renames the function for checking FP function attribute values and also adds more build attribute tests (which are in separate files because build attributes are set per file).

Differential Revision: https://reviews.llvm.org/D25625

llvm-svn: 284571
* [XRay] Support for tail calls for ARM no-Thumb (Dean Michael Berris, 2016-10-18; 1 file, +3/-0)

This patch adds simplified support for tail calls on ARM with XRay instrumentation.

Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program

    #include <cstdio>
    #include <cassert>
    #include <xray/xray_interface.h>

    [[clang::xray_always_instrument]] void __attribute__((noinline)) fC() {
      std::printf("In fC()\n");
    }

    [[clang::xray_always_instrument]] void __attribute__((noinline)) fB() {
      std::printf("In fB()\n");
      fC();
    }

    [[clang::xray_always_instrument]] void __attribute__((noinline)) fA() {
      std::printf("In fA()\n");
      fB();
    }

    // Avoid infinite recursion in case the logging function is instrumented
    // (so it calls the logging function again).
    [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) {
      printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
    }

    int main(int argc, char *argv[]) {
      __xray_set_handler(simplyPrint);
      printf("Patching...\n");
      __xray_patch();
      fA();
      printf("Unpatching...\n");
      __xray_unpatch();
      fA();
      return 0;
    }

gives the following output:

    Patching...
    XRay: functionId=3 type=0.
    In fA()
    XRay: functionId=3 type=1.
    XRay: functionId=2 type=0.
    In fB()
    XRay: functionId=2 type=1.
    XRay: functionId=1 type=0.
    XRay: functionId=1 type=1.
    In fC()
    Unpatching...
    In fA()
    In fB()
    In fC()

So for function fC() the exit sled seems to be called too early, before the function actually exits (before "In fC()" is printed). Debugging shows that this happens because printf from fC is also called as a tail call: first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988).

Differential Revision: https://reviews.llvm.org/D25030

llvm-svn: 284456
* Move the global variables representing each Target behind accessor functions (Mehdi Amini, 2016-10-09; 1 file, +4/-4)

This avoids the "static initialization order fiasco".

Differential Revision: https://reviews.llvm.org/D25412

llvm-svn: 283702
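For illustration, a minimal sketch of the accessor-function pattern this refers to (the names here are hypothetical, not taken from the patch): a function-local static is constructed on first use, so its initialization order relative to other globals no longer matters.

    // Sketch only: expose a lazily-constructed singleton instead of a global
    // variable whose construction order across translation units is unspecified.
    struct Target { /* target hooks, names, etc. */ };

    Target &getTheExampleTarget() {
      static Target TheExampleTarget; // constructed on first call
      return TheExampleTarget;
    }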
* [ARM]: Add Cortex-R52 target to LLVM (Javed Absar, 2016-10-07; 1 file, +4/-2)

This patch adds Cortex-R52, the new ARM real-time processor, to LLVM. Cortex-R52 implements the ARMv8-R architecture.

llvm-svn: 283542
* Use StringRef in ARMConstantPool APIs (NFC) (Mehdi Amini, 2016-10-05; 1 file, +1/-1)

llvm-svn: 283293
* Consistent fp denormal mode names. NFC. (Sjoerd Meijer, 2016-10-04; 1 file, +2/-2)

This fixes the inconsistency of the fp denormal option names: in LLVM this was DenormalType, but in Clang this is DenormalMode, which seems better.

Differential Revision: https://reviews.llvm.org/D24906

llvm-svn: 283192
* Use StringRef in Datalayout API (NFC) (Mehdi Amini, 2016-10-01; 1 file, +1/-1)

llvm-svn: 283013
* Revert "Use StringRef in Datalayout API (NFC)"Mehdi Amini2016-10-011-1/+1
| | | | | | This reverts commit r283009. Bots are broken. llvm-svn: 283011
* Use StringRef in Datalayout API (NFC) (Mehdi Amini, 2016-10-01; 1 file, +1/-1)

llvm-svn: 283009
* [ARM] Promote small global constants to constant pools (James Molloy, 2016-09-26; 1 file, +33/-0)

If a constant is unnamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of:

        ldr r0, .CPI0
        bl printf
        bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

        adr r0, .CPI0
        bl printf
        bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because fast-isel doesn't know about this optimization, if it runs and emits references to a string that we inline (because fast-isel fell back to SDAG) we will end up with an inlined string and also an out-of-line string, and we won't emit the out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer bots unhappy.

llvm-svn: 282387
* Revert "[ARM] Promote small global constants to constant pools"James Molloy2016-09-231-33/+0
| | | | | | This reverts commit r282241. It caused http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19882. llvm-svn: 282249
* [ARM] Promote small global constants to constant pools (James Molloy, 2016-09-23; 1 file, +33/-0)

If a constant is unnamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of:

        ldr r0, .CPI0
        bl printf
        bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

        adr r0, .CPI0
        bl printf
        bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because fast-isel doesn't know about this optimization, if it runs and emits references to a string that we inline (because fast-isel fell back to SDAG) we will end up with an inlined string and also an out-of-line string, and we won't emit the out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer bots unhappy.

llvm-svn: 282241
* Revert r281715, it caused PR30475 (Nico Weber, 2016-09-21; 1 file, +0/-13)

llvm-svn: 282076
* [XRay] ARM 32-bit no-Thumb support in LLVM (Dean Michael Berris, 2016-09-19; 1 file, +9/-0)

This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter. This is one of 3 commits to different repositories of the XRay ARM port. The other 2 are:

    https://reviews.llvm.org/D23932 (Clang test)
    https://reviews.llvm.org/D23933 (compiler-rt)

Differential Revision: https://reviews.llvm.org/D23931

llvm-svn: 281878
* [ARM] Promote small global constants to constant pools (James Molloy, 2016-09-16; 1 file, +13/-0)

If a constant is unnamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of:

        ldr r0, .CPI0
        bl printf
        bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

        adr r0, .CPI0
        bl printf
        bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because fast-isel doesn't know about this optimization, if it runs and emits references to a string that we inline (because fast-isel fell back to SDAG) we will end up with an inlined string and also an out-of-line string, and we won't emit the out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer bots unhappy.

llvm-svn: 281715
* Move the Mangler from the AsmPrinter down to TLOF and clean up the TLOF API accordingly. (Eric Christopher, 2016-09-16; 1 file, +3/-3)

llvm-svn: 281708
* Revert "[ARM] Promote small global constants to constant pools"Evgeniy Stepanov2016-09-151-13/+0
| | | | | | This reverts r281604, which adds text relocations to ARM binaries. llvm-svn: 281645
* [ARM] Promote small global constants to constant pools (James Molloy, 2016-09-15; 1 file, +13/-0)

If a constant is unnamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of:

        ldr r0, .CPI0
        bl printf
        bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

        adr r0, .CPI0
        bl printf
        bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because fast-isel doesn't know about this optimization, if it runs and emits references to a string that we inline (because fast-isel fell back to SDAG) we will end up with an inlined string and also an out-of-line string, and we won't emit the out-of-line string, causing backend failures.

llvm-svn: 281604
* Revert "[ARM] Promote small global constants to constant pools"Evgeniy Stepanov2016-09-141-13/+0
| | | | | | | | Breaks Android tests by introducing text relocations to ARM binaries. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/25362/steps/run%20asan%20lit%20tests%20%5Barm%2Fbullhead-userdebug%2FMTC20F%5D/logs/stdio llvm-svn: 281526
* [ARM] Promote small global constants to constant pools (James Molloy, 2016-09-14; 1 file, +13/-0)

If a constant is unnamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of:

        ldr r0, .CPI0
        bl printf
        bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

        adr r0, .CPI0
        bl printf
        bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one function (4 bytes per string).

llvm-svn: 281484
* Revert "[ARM] Promote small global constants to constant pools"James Molloy2016-09-131-13/+0
| | | | | | This reverts commit r281314. Speculatively revert as it's possible this caused linker errors: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19656 llvm-svn: 281327
* [ARM] Add ".code 32" to functions in the ARM instruction setPablo Barrio2016-09-131-1/+2
| | | | | | | | | | | | | | | | | | | Before, only Thumb functions were marked as ".code 16". These ".code x" directives are effective until the next directive of its kind is encountered. Therefore, in code with interleaved ARM and Thumb functions, it was possible to declare a function as ARM and end up with a Thumb function after assembly. A test has been added. An existing test has also been fixed to take this change into account. Reviewers: aschwaighofer, t.p.northover, jmolloy, rengolin Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D24337 llvm-svn: 281324
* [ARM] Promote small global constants to constant pools (James Molloy, 2016-09-13; 1 file, +13/-0)

If a constant is unnamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of:

        ldr r0, .CPI0
        bl printf
        bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

        adr r0, .CPI0
        bl printf
        bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one function (4 bytes per string).

llvm-svn: 281314
* Revert "[ARM] Promote small global constants to constant pools"James Molloy2016-09-121-13/+0
| | | | | | This reverts commit r281213. It made a bot go bang: http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-full/builds/14625 llvm-svn: 281228
* [ARM] Promote small global constants to constant pools (James Molloy, 2016-09-12; 1 file, +13/-0)

If a constant is unnamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of:

        ldr r0, .CPI0
        bl printf
        bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

        adr r0, .CPI0
        bl printf
        bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one function (4 bytes per string).

llvm-svn: 281213
* Revert "[XRay] ARM 32-bit no-Thumb support in LLVM"Renato Golin2016-09-081-9/+0
| | | | | | | | | | And associated commits, as they broke the Thumb bots. This reverts commit r280935. This reverts commit r280891. This reverts commit r280888. llvm-svn: 280967
* [XRay] ARM 32-bit no-Thumb support in LLVM (Dean Michael Berris, 2016-09-08; 1 file, +9/-0)

This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter. This is one of 3 commits to different repositories of the XRay ARM port. The other 2 are:

    1. https://reviews.llvm.org/D23932 (Clang test)
    2. https://reviews.llvm.org/D23933 (compiler-rt)

Differential Revision: https://reviews.llvm.org/D23931

llvm-svn: 280888
* Setting fp trapping mode and denormal type (Sjoerd Meijer, 2016-09-02; 1 file, +24/-16)

This is an improvement of r280246 and calculates compatibility of function attributes in a better way.

Differential Revision: https://reviews.llvm.org/D24070

llvm-svn: 280534
* Clang patch r280064 introduced ways to set the FP exceptions and denormal types (Sjoerd Meijer, 2016-08-31; 1 file, +31/-10)

This is the LLVM counterpart and it adds options that map onto FP exceptions and denormal build attributes, allowing better fp math library selections.

Differential Revision: https://reviews.llvm.org/D24070

llvm-svn: 280246
* Replace "fallthrough" comments with LLVM_FALLTHROUGHJustin Bogner2016-08-171-1/+1
| | | | | | | This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902
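As an illustration of the pattern this commit applies (the enum and function below are hypothetical, not from the patch), the macro replaces a free-form comment with an annotation the compiler can check:

    #include "llvm/Support/Compiler.h"

    enum Kind { KindA, KindB };

    int classify(Kind K) {
      int Score = 0;
      switch (K) {
      case KindA:
        Score += 1;
        LLVM_FALLTHROUGH; // previously just a "// fall through" comment
      case KindB:
        Score += 2;
        break;
      }
      return Score;
    }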
* [ARM] Add support for embedded position-independent code (Oliver Stannard, 2016-08-08; 1 file, +25/-10)

This patch adds support for some new relocation models to the ARM backend:

- Read-only position independence (ROPI): Code and read-only data is accessed PC-relative. The offsets between all code and RO data sections are known at static link time. This does not affect read-write data.

- Read-write position independence (RWPI): Read-write data is accessed relative to the static base register (r9). The offsets between all writeable data sections are known at static link time. This does not affect read-only data.

These two modes are independent (they specify how different objects should be addressed), so they can be used individually or together. They are otherwise the same as the "static" relocation model, and are not compatible with SysV-style PIC using a global offset table.

These modes are normally used by bare-metal systems or systems with small real-time operating systems. They are designed to avoid the need for a dynamic linker; the only initialisation required is setting r9 to an appropriate value for RWPI code.

I have only added support to SelectionDAG, not FastISel, because FastISel is currently disabled for bare-metal targets where these modes would be used.

Differential Revision: https://reviews.llvm.org/D23195

llvm-svn: 278015
* ARM: support high registers in __builtin_longjmp on WoA (Saleem Abdulrasool, 2016-07-08; 1 file, +32/-3)

Windows on ARM uses a pure thumb-2 environment. This means that it can select a high register when doing a __builtin_longjmp. We would use a tLDRi which would truncate the register to a low register. Use a t2LDRi12 to get the full register file access. Tweak the code to just load into PC, as that is an interworking branch on all supported cores anyways.

llvm-svn: 274815
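For context (this usage sketch is illustrative and not part of the patch), __builtin_setjmp/__builtin_longjmp are the lightweight GCC-style builtins whose lowering the commit adjusts:

    // The buffer for the builtin forms is a fixed array of five pointer-sized slots.
    static void *JumpBuffer[5];

    static void bail() { __builtin_longjmp(JumpBuffer, 1); } // never returns normally

    int run() {
      if (__builtin_setjmp(JumpBuffer) == 0) {
        bail();       // transfers control back to the setjmp site
        return 0;     // not reached
      }
      return 1;       // taken after the longjmp
    }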
* Don't pass a Reloc::Model to GVIsIndirectSymbol. (Rafael Espindola, 2016-06-28; 1 file, +2/-2)

It already has access to it. While at it, rename it to isGVIndirectSymbol.

llvm-svn: 274023
* Move isPositionIndependent up to AsmPrinter. (Rafael Espindola, 2016-06-27; 1 file, +0/-4)

Use it in ppc too.

llvm-svn: 273877
* Add support for musl-libc on ARM Linux. (Rafael Espindola, 2016-06-24; 1 file, +2/-1)

Patch by Lei Zhang!

llvm-svn: 273726
* Define an isPositionIndependent helper for ARMAsmPrinter. NFC. (Rafael Espindola, 2016-06-21; 1 file, +6/-2)

llvm-svn: 273261
* Don't print (PLT) on arm. (Rafael Espindola, 2016-06-16; 1 file, +0/-2)

The R_ARM_PLT32 relocation is deprecated and is not produced by MC. This means that the code being deleted is dead from the .o point of view and was making the .s more confusing.

llvm-svn: 272909
* ARM: correct TLS access on WoA (Saleem Abdulrasool, 2016-06-07; 1 file, +2/-0)

TLS access requires an offset from the TLS index. The index itself is the section-relative distance of the symbol. For ARM, the relevant relocation (IMAGE_REL_ARM_SECREL) is applied as a constant. This means that the value may not be an immediate and must be lowered into a constant pool. This offset will not be base relocated. We were previously emitting the actual address of the symbol, which would be base relocated and would therefore be the value offset by the ImageBase + TLS Offset.

llvm-svn: 271974
* ARM: clang-format a couple of switches, add comments (Saleem Abdulrasool, 2016-06-07; 1 file, +10/-5)

clang-format a couple of switches in preparation for a future change. Add some enumeration comments.

llvm-svn: 271973
* Use StringRef::startswith instead of find(...) == 0. (Benjamin Kramer, 2016-05-27; 1 file, +1/-1)

It's faster and easier to read.

llvm-svn: 271018
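A minimal sketch of the difference (the function name here is hypothetical, not from the patch): find may scan the whole string looking for the substring before its result is compared with 0, whereas startswith only examines the prefix.

    #include "llvm/ADT/StringRef.h"

    bool startsWithDot(llvm::StringRef Name) {
      // Before: return Name.find(".") == 0;   // may search the entire string
      return Name.startswith(".");             // checks only the prefix
    }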
* Avoid some copies by using const references. (Benjamin Kramer, 2016-05-27; 1 file, +1/-1)

clang-tidy's performance-unnecessary-copy-initialization with some manual fixes. No functional changes intended.

llvm-svn: 270988
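A hypothetical example of the kind of change that check suggests (names are illustrative, not from the patch): binding a const reference to a value that is only read avoids constructing a copy.

    #include <string>

    const std::string &pickLonger(const std::string &A, const std::string &B) {
      return A.size() >= B.size() ? A : B;
    }

    std::size_t useIt(const std::string &A, const std::string &B) {
      // Before: std::string Longer = pickLonger(A, B);     // copies the string
      const std::string &Longer = pickLonger(A, B);          // no copy, read-only use
      return Longer.size();
    }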
* Simplify handling of hidden stub. (Rafael Espindola, 2016-05-17; 1 file, +2/-16)

Since r207518 they are printed exactly like non-hidden stubs on x86 and since r207517 on ARM. This means we can use a single set for all stubs in those platforms.

llvm-svn: 269776
* ARM: support export directives for Windows (Saleem Abdulrasool, 2016-05-14; 1 file, +23/-0)

It seems that cl will emit the export directives for Windows ARM targets. The fact that it did this had originally been missed and this functionality was never implemented. This makes it possible to rely solely on the source code for indicating what the exported interfaces are and brings us more compatibility with cl.

llvm-svn: 269574
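For illustration (the function below is hypothetical, not from the patch), this is the kind of source-level annotation whose export directive the backend now emits, so the exported symbol does not have to be listed separately in a .def file:

    // When compiled for a Windows on ARM target, the dllexport attribute drives
    // the export directive written into the object file.
    extern "C" __declspec(dllexport) int answer(void) { return 42; }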
* ARM: put extern __thread stubs in a special section. (Tim Northover, 2016-04-25; 1 file, +18/-2)

The linker needs to know that the symbols are thread-local to do its job properly.

llvm-svn: 267473
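A small hypothetical example of the situation described (not taken from the patch): referencing a thread-local variable defined in another translation unit is what produces the indirection stub that the linker must recognize as thread-local.

    // This TU only declares the thread-local variable; the definition lives in
    // another object or library, so the access goes through a stub.
    extern __thread int Counter;

    int nextValue() { return ++Counter; }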