path: root/llvm/test/CodeGen/ARM
Commit message | Author | Age | Files | Lines
...
* Add DAG optimisation for FP16_TO_FP | Oliver Stannard | 2015-08-24 | 1 | -0/+40
  The FP16_TO_FP node only uses the bottom 16 bits of its input, so the
  following pattern can be optimised by removing the AND:

    (FP16_TO_FP (AND op, 0xffff)) -> (FP16_TO_FP op)

  This is a common pattern for ARM targets when functions have __fp16
  arguments, as they are passed as floats (so that they get passed in the
  correct registers), but then bitcast and truncated to ignore the top 16 bits.

  llvm-svn: 245832
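The reason the AND is redundant can be seen in a small software model of the conversion. The following is only a sketch: `fp16ToFp` is a hypothetical stand-in for the FP16_TO_FP node, not LLVM code, and it assumes IEEE half-precision layout in the low 16 bits of the input.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical model of FP16_TO_FP: converts the IEEE half stored in the low
// 16 bits of `bits` to a float. The top 16 bits are discarded up front, which
// is why an explicit (AND op, 0xffff) before the node changes nothing.
float fp16ToFp(std::uint32_t bits) {
  std::uint32_t h = bits & 0xffffu;
  std::uint32_t sign = (h >> 15) & 1u;
  std::uint32_t exp = (h >> 10) & 0x1fu;
  std::uint32_t frac = h & 0x3ffu;
  std::uint32_t out;
  if (exp == 0) {
    if (frac == 0) {
      out = sign << 31; // signed zero
    } else {
      // Subnormal half: shift the fraction up until the implicit bit appears.
      int e = -1;
      do {
        frac <<= 1;
        ++e;
      } while ((frac & 0x400u) == 0);
      frac &= 0x3ffu;
      out = (sign << 31) | (static_cast<std::uint32_t>(127 - 15 - e) << 23) |
            (frac << 13);
    }
  } else if (exp == 0x1fu) {
    out = (sign << 31) | 0x7f800000u | (frac << 13); // infinity or NaN
  } else {
    out = (sign << 31) | ((exp - 15 + 127) << 23) | (frac << 13); // normal
  }
  float f;
  std::memcpy(&f, &out, sizeof f);
  return f;
}
```

With this model, `fp16ToFp(x)` and `fp16ToFp(x & 0xffff)` agree for every 32-bit `x`, which is the property the DAG combine relies on.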
* [ARM] Use AEABI helpers for i64 div and rem | Scott Douglass | 2015-08-24 | 1 | -13/+58
  Differential Revision: http://reviews.llvm.org/D12232
  llvm-svn: 245830
* [ARM] Fix MachO CPU Subtype selection | Vedant Kumar | 2015-08-21 | 1 | -0/+68
  Differential Revision: http://reviews.llvm.org/D12040
  llvm-svn: 245744
* [DAGCombiner] Fold together mul and shl when both are by a constant | John Brawn | 2015-08-21 | 1 | -0/+77
  This is intended to improve code generation for GEPs, as the index value is
  shifted by the element size and in GEPs of multi-dimensional arrays the index
  of higher dimensions is multiplied by the lower dimension size.

  Differential Revision: http://reviews.llvm.org/D12197
  llvm-svn: 245689
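As a rough illustration of why such a fold is valid, the sketch below compares a shift followed by a multiply with a single multiply by a pre-shifted constant. The function names are hypothetical, and the exact operand order that the combine matches is an assumption; the log only says that a mul and a shl by constants are folded together.

```cpp
#include <cstdint>

// Unfolded form: shift by the constant c1, then multiply by the constant c2,
// as an index computation for a multi-dimensional array might do.
constexpr std::uint32_t shlThenMul(std::uint32_t x, unsigned c1, std::uint32_t c2) {
  return (x << c1) * c2;
}

// Folded form: one multiply by the combined constant (c2 << c1).
// Unsigned arithmetic wraps modulo 2^32, so both forms agree for c1 < 32.
constexpr std::uint32_t foldedMul(std::uint32_t x, unsigned c1, std::uint32_t c2) {
  return x * (c2 << c1);
}

static_assert(shlThenMul(7, 2, 12) == foldedMul(7, 2, 12),
              "shift-then-multiply equals multiply by the shifted constant");
```

The folded form needs one instruction instead of two, since `c2 << c1` is computed at compile time.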
* [ARM] Add instruction selection patterns for vmin/vmax | Silviu Baranga | 2015-08-19 | 2 | -8/+200
  Summary:
  The mid-end was generating vector smin/smax/umin/umax nodes, but we were
  using vbsl to generate the code. This adds the vmin/vmax patterns and a test
  to check that we are now generating vmin/vmax instructions.

  Reviewers: rengolin, jmolloy
  Subscribers: aemerson, rengolin, llvm-commits
  Differential Revision: http://reviews.llvm.org/D12105
  llvm-svn: 245439
* Split ARM and AArch64 emutls.ll test | Chih-Hung Hsieh | 2015-08-19 | 1 | -0/+288
  Differential Revision: http://reviews.llvm.org/D12127
  llvm-svn: 245399
* Align SP adjustment in function getSPAdjust | Guozhi Wei | 2015-08-17 | 1 | -0/+47
  This commit adds a new function TargetFrameLowering::alignSPAdjust and calls
  it from TargetInstrInfo::getSPAdjust. It fixes PR24142.

  llvm-svn: 245253
* [ARM] Fix crash when targeting CPU without NEON | James Molloy | 2015-08-17 | 1 | -0/+1
  We emulate a scalar vmin/vmax with NEON instructions as they don't exist in
  the VFP ISA. So only mark these as legal when NEON is available.

  Found here: https://code.google.com/p/chromium/issues/detail?id=521671

  llvm-svn: 245231
* Generate FMINNAN/FMINNUM/FMAXNAN/FMAXNUM from SDAGBuilder. | James Molloy | 2015-08-17 | 1 | -1/+1
  These only get generated if the target supports them. If one of the variants
  is not legal and the other is, and it is safe to do so, the other variant
  will be emitted.

  For example on AArch32 (V8), we have scalar fminnm but not fmin.

  Fix up a couple of tests while we're here - one now produces better code, and
  the other was just plain wrong to start with.

  llvm-svn: 245196
* Revert "[ARM] Fix MachO CPU Subtype selection" | Renato Golin | 2015-08-14 | 1 | -68/+0
  This reverts commit r245081, as it breaks many builds.

  llvm-svn: 245086
* [ARM] Fix MachO CPU Subtype selection | Vedant Kumar | 2015-08-14 | 1 | -0/+68
  This patch makes the Darwin ARM backend take advantage of TargetParser. It
  also teaches TargetParser about ARMV7K for the first time. This makes target
  triple parsing more consistent across llvm.

  Differential Revision: http://reviews.llvm.org/D11996
  llvm-svn: 245081
* [ARM] Rejig vmax tests a bit | James Molloy | 2015-08-13 | 2 | -245/+509
  They rely on global fast-math options, but soon ISel will rely only on
  fast-math flags on the instructions themselves. Rip the fast checks out into
  their own file so we can mark their instructions as fast.

  llvm-svn: 244914
* [ARM] Reorganise and simplify thumb-1 load/store selection | John Brawn | 2015-08-13 | 1 | -21/+550
  Other than PC-relative loads/stores, the patterns that match the various
  load/store addressing modes have the same complexity, so the order in which
  they are matched is the order in which they appear in the .td file.

  Rearrange the instruction definitions in ARMInstrThumb.td, and make use of
  AddedComplexity for PC-relative loads, so that the instruction matching order
  is the order that results in the simplest selection logic. This also makes
  register-offset load/store be selected when it should, as previously it was
  only selected for too-large immediate offsets.

  Differential Revision: http://reviews.llvm.org/D11800
  llvm-svn: 244882
* Redo "Make global aliases have symbol size equal to their type" | John Brawn | 2015-08-12 | 2 | -0/+23
  r242520 was reverted in r244313 as the expected behaviour of the alias
  attribute in C is that the alias has the same size as the aliasee. However we
  can re-introduce adding the size on the alias when the aliasee does not, from
  a source code or object perspective, exist as a discrete entity. This happens
  when the aliasee is not a symbol, or when that symbol is private.

  Differential Revision: http://reviews.llvm.org/D11943
  llvm-svn: 244752
* [GlobalMerge] Use private linkage for MergedGlobals variables | John Brawn | 2015-08-11 | 3 | -15/+13
  Other objects can never reference the MergedGlobals symbol so external
  linkage is never needed. Using private instead of internal linkage means the
  object is more similar to what it looks like when global merging is not
  enabled, with the only difference being that the merged variables are
  addressed indirectly relative to the start of the section they are in.

  Also add aliases for merged variables with internal linkage, as this also
  makes the object more like what it is when they are not merged.

  Differential Revision: http://reviews.llvm.org/D11942
  llvm-svn: 244615
* Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI | Jonathan Roelofs | 2015-08-10 | 1 | -1/+1
  I looked into adding a warning / error for this to FileCheck, but there
  doesn't seem to be a good way to avoid it triggering on the instances of it
  in RUN lines.

  llvm-svn: 244481
* [ARM] Update ReconstructShuffle to handle mismatched types | Silviu Baranga | 2015-08-07 | 2 | -0/+109
  Summary:
  Port the ReconstructShuffle function from AArch64 to ARM to handle mismatched
  incoming types in the BUILD_VECTOR node.

  This fixes an outstanding FIXME in the ReconstructShuffle code.

  Reviewers: t.p.northover, rengolin
  Subscribers: aemerson, llvm-commits, rengolin
  Differential Revision: http://reviews.llvm.org/D11720
  llvm-svn: 244314
* Revert "Make global aliases have symbol size equal to their type" | John Brawn | 2015-08-07 | 2 | -22/+0
  This reverts r242520, as it caused pr24379. Also removes part of the test
  added by r243874 that checks the size of alias symbols.

  llvm-svn: 244313
* Fix possible infinite loop in shrink wrapping when searching for save/restore points. | Kit Barton | 2015-08-06 | 1 | -0/+28
  There is an infinite loop that can occur in Shrink Wrapping while searching
  for the Save/Restore points.

  Part of this search checks whether the save/restore points are located in
  different loop nests and if so, uses the (post) dominator trees to find the
  immediate (post) dominator blocks. However, if the current block does not
  have any immediate (post) dominators then this search will result in an
  infinite loop. This can occur in code containing an infinite loop.

  The modification checks whether the immediate (post) dominator is different
  from the current save/restore block. If it is not, then the search terminates
  and the current location is not considered as a valid save/restore point for
  shrink wrapping.

  Phabricator: http://reviews.llvm.org/D11607
  llvm-svn: 244247
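The termination check described above can be sketched on a toy dominator tree. Everything in this example is hypothetical (the `idom` and `inLoop` vectors, the function name, and the convention that a block with no distinct immediate dominator maps to itself); it is not the ShrinkWrap pass, only the shape of the fix.

```cpp
#include <vector>

// Toy model: idom[b] is the immediate dominator of block b. For this sketch a
// block with no distinct immediate dominator maps to itself. inLoop[b] says
// whether block b sits inside a loop and is therefore not an acceptable
// save point.
// Returns the first block on the dominator chain that is outside any loop,
// or -1 if the walk stops making progress.
int findSavePoint(const std::vector<int>& idom,
                  const std::vector<bool>& inLoop, int start) {
  int cur = start;
  while (inLoop[cur]) {
    int next = idom[cur];
    if (next == cur)
      return -1; // same block again: give up rather than loop forever
    cur = next;
  }
  return cur;
}
```

Without the `next == cur` test, a function whose entry block is itself inside an infinite loop would keep revisiting the same block.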
* ARMISelDAGToDAG.cpp had this self-contradictory code: | Artyom Skrobov | 2015-08-05 | 1 | -1/+1
    return StringSwitch<int>(Flags)
            .Case("g", 0x1)
            .Case("nzcvq", 0x2)
            .Case("nzcvqg", 0x3)
            .Default(-1);
    ...
    // The _g and _nzcvqg versions are only valid if the DSP extension is
    // available.
    if (!Subtarget->hasThumb2DSP() && (Mask & 0x2))
      return -1;

  ARMARM confirms that the comment is right, and the code was wrong.

  llvm-svn: 244029
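The contradiction is easier to see in a standalone version. The log does not show the patched line, so the sketch below assumes the fix is to test bit 0x1, which is the bit shared by the "g" and "nzcvqg" encodings; `hasDSP` is a hypothetical stand-in for `Subtarget->hasThumb2DSP()`, and plain string comparison replaces `StringSwitch`.

```cpp
#include <string>

// Mask encoding taken from the commit message.
int flagsToMask(const std::string& flags) {
  if (flags == "g") return 0x1;
  if (flags == "nzcvq") return 0x2;
  if (flags == "nzcvqg") return 0x3;
  return -1;
}

// Returns the mask, or -1 if the flags are unknown or need a DSP extension
// that is not available.
int checkedMask(const std::string& flags, bool hasDSP) {
  int mask = flagsToMask(flags);
  if (mask == -1)
    return -1;
  // "g" (0x1) and "nzcvqg" (0x3) both carry bit 0x1. Testing bit 0x2 instead
  // would reject "nzcvq" and let "g" through, the opposite of the comment.
  if (!hasDSP && (mask & 0x1))
    return -1;
  return mask;
}
```

With this check, `checkedMask("nzcvq", false)` yields 0x2 while `checkedMask("g", false)` and `checkedMask("nzcvqg", false)` are rejected, matching the comment in the original code.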
* ARM: support windows division routines | Saleem Abdulrasool | 2015-08-04 | 1 | -1/+38
  This adds the software division routines for the Windows RTABI. These are not
  expected to be used often though, as most modern Windows ARM capable targets
  support hardware division. In the case that the target CPU doesn't support
  hardware division, this will be the fallback.

  llvm-svn: 243952
* DI: Disallow uniquable DICompileUnits | Duncan P. N. Exon Smith | 2015-08-03 | 20 | -20/+20
  Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s. The
  backend is liable to start relying on that (if it hasn't already), so make
  uniquable `DICompileUnit`s illegal and automatically upgrade old bitcode.
  This is a nice cleanup, since we can remove an unnecessary `DenseSet` (and
  the associated uniquing info) from `LLVMContextImpl`.

  Almost all the testcases were updated with this script:

    git grep -e '= !DICompileUnit' -l -- test |
      grep -v test/Bitcode |
      xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,'

  I imagine something similar should work for out-of-tree testcases.

  llvm-svn: 243885
* ARM: prefer allocating VFP regs at stride 4 on Darwin. | Tim Northover | 2015-08-03 | 2 | -4/+34
  This is necessary for WatchOS support, where the compact unwind format
  assumes this kind of layout. For now we only want this on Swift-like CPUs
  though, where it's been the Xcode behaviour for ages. Also, since it can
  expand the prologue we don't want it at -Oz.

  llvm-svn: 243884
* [ARM] Make GlobalMerge merge extern globals by default | John Brawn | 2015-08-03 | 1 | -0/+48
  Enabling merging of extern globals appears to be generally either beneficial
  or harmless. On some benchmark suites (on Cortex-M4F, Cortex-A9, and
  Cortex-A57) it gives improvements in the 1-5% range, but in the rest the
  overall effect is zero.

  Differential Revision: http://reviews.llvm.org/D10966
  llvm-svn: 243874
* Be less conservative about forming IT blocks. | James Molloy | 2015-08-03 | 1 | -14/+10
  In http://reviews.llvm.org/rL215382, IT forming was made more conservative
  under the belief that a flag-setting instruction was unpredictable inside an
  IT block on ARMv6M.

  But actually, ARMv6M doesn't even support IT blocks so that's impossible. In
  the ARMARM for v7M, v7AR and v8AR it states that the semantics of such an
  instruction change inside an IT block - it doesn't set the flags. So actually
  it is fine to use one inside an IT block as long as the flags register is
  dead afterwards.

  This gives significant performance improvements in a variety of MPEG based
  workloads.

  Differential revision: http://reviews.llvm.org/D11680
  llvm-svn: 243869
* DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable | Duncan P. N. Exon Smith | 2015-07-31 | 19 | -113/+113
  Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags, using
  `DW_TAG_variable` in their place. Stop exposing the `tag:` field at all in
  the assembly format for `DILocalVariable`.

  Most of the testcase updates were generated by the following sed script:

    find test/ -name "*.ll" -o -name "*.mir" |
    xargs grep -l 'DILocalVariable' |
    xargs sed -i '' \
      -e 's/tag: DW_TAG_arg_variable, //' \
      -e 's/tag: DW_TAG_auto_variable, //'

  There were only a handful of tests in `test/Assembly` that I needed to update
  by hand.

  (Note: a follow-up could change `DILocalVariable::DILocalVariable()` to set
  the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable` (as
  appropriate), instead of having that logic magically in the backend in
  `DbgVariable`. I've added a FIXME to that effect.)

  llvm-svn: 243774
* [ARM] Lower modulo operation to generate __aeabi_divmod on Android | Sumanth Gundapaneni | 2015-07-31 | 1 | -0/+2
  For a modulo (remainder) operation,

    clang -target armv7-none-linux-gnueabi generates "__modsi3"
    clang -target armv7-none-eabi generates "__aeabi_idivmod"
    clang -target armv7-linux-androideabi generates "__modsi3"

  Android bionic libc doesn't provide a __modsi3; instead it provides a
  "__aeabi_idivmod". This patch fixes the LLVM ARMISelLowering to generate the
  correct call whenever there is a modulo operation.

  Differential Revision: http://reviews.llvm.org/D11661
  llvm-svn: 243717
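To illustrate what lowering a remainder to a combined divide-and-modulo helper means, here is a portable model. It is only a sketch: `modelIDivMod` and `loweredSRem` are hypothetical names, and the struct stands in for the quotient/remainder pair that a helper such as `__aeabi_idivmod` hands back; it is not the runtime routine itself.

```cpp
// Quotient and remainder returned together, as a divmod-style helper does.
struct IDivModResult {
  int quot;
  int rem;
};

// Portable model of a signed divmod helper (hypothetical; the real routine
// lives in the runtime library). The caller must ensure d != 0.
IDivModResult modelIDivMod(int n, int d) {
  return {n / d, n % d};
}

// What "n % d" becomes after lowering: call the helper, keep the remainder.
int loweredSRem(int n, int d) {
  return modelIDivMod(n, d).rem;
}
```

A target whose C library only ships the combined helper can therefore serve both `/` and `%` from the same call, which is why picking the right symbol name matters for Android.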
* [ARM] Define subtarget feature strict-align. | Akira Hatanaka | 2015-07-28 | 5 | -73/+64
  This commit defines subtarget feature strict-align and uses it instead of
  cl::opt -arm-strict-align to decide whether strict alignment should be
  forced. Also, remove the logic that was checking the OS and architecture, as
  clang is now responsible for setting strict-align based on the command line
  options specified and the target architecture and OS.

  rdar://problem/21529937

  http://reviews.llvm.org/D11470
  llvm-svn: 243493
* Move unit tests to target specific directories. | Chih-Hung Hsieh | 2015-07-28 | 1 | -0/+61
  Differential Revision: http://reviews.llvm.org/D10522
  llvm-svn: 243454
* Implement target independent TLS compatible with glibc's emutls.c. | Chih-Hung Hsieh | 2015-07-28 | 3 | -21/+113
  The 'common' section TLS is not implemented. Current C/C++ TLS variables are
  not placed in common section. DWARF debug info to get the address of TLS
  variables is not generated yet.

  clang and driver changes in http://reviews.llvm.org/D10524

  Added -femulated-tls flag to select the emulated TLS model, which will be
  used for old targets like Android that do not support ELF TLS models.

  Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
  function to convert a SDNode of TLS variable address to a function call to
  __emutls_get_address.

  Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel for
  TLSModel::Emulated. Although all targets supporting ELF TLS models are
  enhanced, emulated TLS model has been tested only for Android ELF targets.

  Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
  emulated TLS variables.

  Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variables.
  TODO: Add proper DIE for emulated TLS variables.

  Added new unit tests with emulated TLS.

  Differential Revision: http://reviews.llvm.org/D10522
  llvm-svn: 243438
* DI/Verifier: Fix argument bitrot in DILocalVariable | Duncan P. N. Exon Smith | 2015-07-24 | 5 | -13/+13
  Add a verifier check that `DILocalVariable`s of tag `DW_TAG_arg_variable`
  always have a non-zero 'arg:' field, and those of tag `DW_TAG_auto_variable`
  always have a zero 'arg:' field. These are the only configurations that are
  properly understood by the backend.

  (Also, fix the bad examples in LangRef and test/Assembler, and fix the bug in
  Kaleidoscope Ch8.)

  A large number of testcases seem to have bitrotted their way forward from
  some ancient version of the debug info hierarchy that didn't have `arg:`
  parameters. If you have out-of-tree testcases that start failing in the
  verifier and you don't care enough to get the `arg:` right, you may have some
  luck just calling:

    sed -e 's/, arg: 0/, arg: 1/'

  or some such, but I hand-updated the ones in tree.

  llvm-svn: 243183
* [ARM] - Fix lowering of shufflevectors in AArch32 | Luke Cheeseman | 2015-07-24 | 4 | -0/+94
  Some shufflevectors are currently being incorrectly lowered in the AArch32
  backend, as the existing checks for detecting the NEON operations from the
  shufflevector instruction expect the shuffle mask and the vector operands to
  be of the same length.

  This is not always the case as the mask may be twice as long as the operand;
  here only the lower half of the shufflemask gets checked, so provided the
  lower half of the shufflemask looks like a vector transpose (or even is just
  all -1 for undef) then the intrinsics may get incorrectly lowered into a
  vector transpose (VTRN) instruction.

  This patch fixes this by accommodating both cases and adds regression tests.

  Differential Revision: http://reviews.llvm.org/D11407
  llvm-svn: 243103
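A simplified mask check shows what "accommodating both cases" can look like. This is a sketch under stated assumptions, not the backend's code: `isTransposeMask` is a hypothetical name, the mask is a plain vector of lane indices with -1 meaning undef, and a double-length mask is assumed to describe both transpose results back to back (even lanes first, odd lanes second).

```cpp
#include <cstddef>
#include <vector>

// Returns true if `mask` looks like a vector transpose of two operands with
// numElts lanes each. The mask may have numElts entries (one result) or
// 2 * numElts entries (both results concatenated); every half is checked.
bool isTransposeMask(const std::vector<int>& mask, std::size_t numElts) {
  if (numElts == 0 || numElts % 2 != 0)
    return false;
  if (mask.size() != numElts && mask.size() != 2 * numElts)
    return false;
  for (std::size_t base = 0; base < mask.size(); base += numElts) {
    // which == 0 picks the even lanes of each operand, which == 1 the odd.
    std::size_t which = (mask.size() == 2 * numElts)
                            ? base / numElts
                            : (mask[base] == 0 ? 0 : 1);
    for (std::size_t j = 0; j < numElts; j += 2) {
      int lo = mask[base + j];
      int hi = mask[base + j + 1];
      if (lo >= 0 && static_cast<std::size_t>(lo) != j + which)
        return false;
      if (hi >= 0 && static_cast<std::size_t>(hi) != j + numElts + which)
        return false;
    }
  }
  return true;
}
```

A check that stopped after the first `numElts` entries would accept `{0, 4, 2, 6, 0, 1, 2, 3}` for four-lane operands even though its upper half is not a transpose; looping over both halves rejects it.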
* When lowering vector shifts a check is performed to see if the value to shift by is an immediate. | Luke Cheeseman | 2015-07-24 | 1 | -0/+13
  In this check the value is negated and stored in an int64_t. The value can be
  -2^63, yet the negated result cannot be stored in an int64_t, and this gives
  some undefined behaviour causing failures. The negation is only necessary
  when the value is within a certain range, so -2^63 should never need to be
  negated. This patch makes that change and adds a regression test.

  Differential Revision: http://reviews.llvm.org/D11408
  llvm-svn: 243100
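The idea of negating only after a range check can be sketched as follows. The function name, parameters, and the exact range are assumptions made for illustration; the log does not show the patched code.

```cpp
#include <cstdint>

// Hypothetical check for a negated shift immediate: accepts cnt only when
// -elemBits <= cnt <= -1 and then reports the positive shift amount.
// The range test runs first, so -cnt is never evaluated for INT64_MIN,
// whose negation does not fit in an int64_t (signed overflow is undefined
// behaviour in C++).
bool isNegatedShiftImm(std::int64_t cnt, std::int64_t elemBits,
                       std::int64_t& amount) {
  if (cnt >= -elemBits && cnt <= -1) {
    amount = -cnt; // safe: cnt is a small negative number here
    return true;
  }
  return false;
}
```

Negating first and range-checking afterwards would compute `-INT64_MIN` before discovering that the value was out of range anyway.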
* [ARM] Make the frame lowering code ready for shrink-wrapping. | Quentin Colombet | 2015-07-22 | 1 | -0/+536
  Shrink-wrapping can now be tested on ARM with -enable-shrink-wrap.

  Related to <rdar://problem/20821730>

  llvm-svn: 242908
* [ARM] Define subtarget feature "reserve-r9", which is used to decide whether register r9 should be reserved. | Akira Hatanaka | 2015-07-21 | 2 | -2/+2
  This recommits r242737, which broke bots because the number of subtarget
  features went over the limit of 64.

  This change is needed because we cannot use a backend option to set cl::opt
  "arm-reserve-r9" when doing LTO.

  Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to
  reserve r9 should make changes to add subtarget feature "reserve-r9" to the
  IR.

  rdar://problem/21529937

  Differential Revision: http://reviews.llvm.org/D11320
  llvm-svn: 242756
* ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common code | Matthias Braun | 2015-07-21 | 1 | -4/+52
  Re-apply of r241928 which had to be reverted because of the r241926 revert.

  This commit factors out common code from MergeBaseUpdateLoadStore() and
  MergeBaseUpdateLSMultiple() and introduces a new function
  MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a
  strd/ldrd instruction into an strd/ldrd instruction with writeback where
  possible.

  Differential Revision: http://reviews.llvm.org/D10676
  llvm-svn: 242743
* ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2 | Matthias Braun | 2015-07-21 | 5 | -12/+34
  Re-apply r241926 with an additional check that r13 and r15 are not used for
  LDRD/STRD. See http://llvm.org/PR24190. This also already includes the fix
  from r241951.

  Differential Revision: http://reviews.llvm.org/D10623
  llvm-svn: 242742
* Revert r242737. | Akira Hatanaka | 2015-07-20 | 2 | -2/+2
  This caused builds to fail with the following error message:

    error:Too many subtarget features! Bump MAX_SUBTARGET_FEATURES.

  llvm-svn: 242740
* [ARM] Define subtarget feature "reserve-r9", which is used to decide whether register r9 should be reserved. | Akira Hatanaka | 2015-07-20 | 2 | -2/+2
  This change is needed because we cannot use a backend option to set cl::opt
  "arm-reserve-r9" when doing LTO.

  Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to
  reserve r9 should make changes to add subtarget feature "reserve-r9" to the
  IR.

  rdar://problem/21529937

  Differential Revision: http://reviews.llvm.org/D11320
  llvm-svn: 242737
* Revert "ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2" | Matthias Braun | 2015-07-20 | 5 | -34/+12
  This reverts commit r241926. This caused http://llvm.org/PR24190

  llvm-svn: 242735
* Revert "ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common code" | Matthias Braun | 2015-07-20 | 1 | -52/+4
  This reverts commit r241928. This caused http://llvm.org/PR24190

  llvm-svn: 242734
* [ARM] Refactor the prologue/epilogue emission to be more robust. | Quentin Colombet | 2015-07-20 | 2 | -25/+30
  This is the first step toward supporting shrink-wrapping for this target.

  The changes could be summarized by these items:
  - Expand the tail-call return as part of the expand pseudo pass.
  - Get rid of the assumptions that the epilogue is the exit block:
    * Do not assume which registers are free in the epilogue. (This indirectly
      improves the lowering of the code for the segmented stacks, see the test
      cases.)
    * Take into account that the basic block can be empty.

  Related to <rdar://problem/20821730>

  llvm-svn: 242714
* ARM: Enable MachineScheduler and disable PostRAScheduler for swift. | Matthias Braun | 2015-07-17 | 6 | -31/+31
  Reapply r242500 now that the swift schedmodel includes LDRLIT.

  This is mostly done to disable the PostRAScheduler, which optimizes for
  instruction latencies and so isn't a good fit for out-of-order architectures.
  This also allows leaving out the itinerary table in swift in favor of the
  SchedModel ones.

  This change leads to performance improvements/regressions by as much as 10%
  in some benchmarks; in fact we lose 0.4% performance over the llvm-testsuite
  for reasons that appear to be unknown or out of the compiler's control.
  rdar://20803802 documents the investigation of these effects.

  While it is probably a good idea to perform the same switch for the other ARM
  out-of-order CPUs, I limited this change to swift as I cannot perform the
  benchmark verification on the other CPUs.

  Differential Revision: http://reviews.llvm.org/D10513
  llvm-svn: 242588
* Revert "ARM: Enable MachineScheduler and disable PostRAScheduler for swift." | Adam Nemet | 2015-07-17 | 6 | -31/+31
  This reverts commit r242500.

  It broke some internal tests and Matthias asked me to revert it while he is
  investigating.

  llvm-svn: 242553
* Make global aliases have symbol size equal to their type | John Brawn | 2015-07-17 | 1 | -0/+19
  This is mainly for the benefit of GlobalMerge, so that an alias into a
  MergedGlobals variable has the same size as the original non-merged variable.

  Differential Revision: http://reviews.llvm.org/D10837
  llvm-svn: 242520
* ARM: Enable MachineScheduler and disable PostRAScheduler for swift. | Matthias Braun | 2015-07-17 | 6 | -31/+31
  This is mostly done to disable the PostRAScheduler, which optimizes for
  instruction latencies and so isn't a good fit for out-of-order architectures.
  This also allows leaving out the itinerary table in swift in favor of the
  SchedModel ones.

  This change leads to performance improvements/regressions by as much as 10%
  in some benchmarks; in fact we lose 0.4% performance over the llvm-testsuite
  for reasons that appear to be unknown or out of the compiler's control.
  rdar://20803802 documents the investigation of these effects.

  While it is probably a good idea to perform the same switch for the other ARM
  out-of-order CPUs, I limited this change to swift as I cannot perform the
  benchmark verification on the other CPUs.

  Differential Revision: http://reviews.llvm.org/D10513
  llvm-svn: 242500
* Fix __builtin_setjmp in combination with sjlj exception handling. | Matthias Braun | 2015-07-16 | 1 | -0/+113
  llvm.eh.sjlj.setjmp was used as part of the SjLj exception handling style but
  is also used in clang to implement __builtin_setjmp. The ARM backend needs to
  output additional dispatch tables for the SjLj exception handling style;
  these tables however can't be emitted if llvm.eh.sjlj.setjmp is simply used
  for __builtin_setjmp and no actual landing pad blocks exist.

  To solve this issue a new llvm.eh.sjlj.setup_dispatch intrinsic is
  introduced, which is used instead of llvm.eh.sjlj.setjmp in the SjLj
  exception handling lowering, so we can differentiate between the case where
  we actually need to set up a dispatch table and the case where we just need
  the __builtin_setjmp semantic.

  Differential Revision: http://reviews.llvm.org/D9313
  llvm-svn: 242481
* Revert "Add missing load/store flags to thumb2 instructions." | Pete Cooper | 2015-07-16 | 1 | -1/+1
  This reverts commit r242300.

  This is causing buildbot failures which we are investigating. I'll reapply
  once we know what's going on, but for now want to get the bots green.

  llvm-svn: 242428
* [ARM] Define a subtarget feature that is used to avoid using movt/movw pairs for 32-bit immediates. | Akira Hatanaka | 2015-07-16 | 2 | -5/+50
  This change is needed to avoid emitting movt/movw pairs when doing LTO and do
  so on a per-function basis.

  Out-of-tree projects currently using cl::opt option -arm-use-movt=0 or false
  to avoid emitting movt/movw pairs should make changes to add subtarget
  feature "+no-movt" (see the changes made to clang in r242368).

  rdar://problem/21529937

  Differential Revision: http://reviews.llvm.org/D11026
  llvm-svn: 242369
* Add missing load/store flags to thumb2 instructions. | Pete Cooper | 2015-07-15 | 1 | -1/+1
  These were the cause of a verifier error when building 7zip with
  -verify-machineinstrs. Running 'make check' with the verifier triggered the
  same error on the test here, so I've updated the test to run the verifier on
  one of its runs instead of adding a new one.

  While looking at this code, there was a stale comment that these instructions
  were only used for disassembly. This probably used to be the case, but they
  are now used in the 'ARM load / store optimization pass' too.

  llvm-svn: 242300