summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/ARM
Commit message (Collapse)AuthorAgeFilesLines
...
* ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2Matthias Braun2015-07-215-12/+34
| | | | | | | | | | Re-apply r241926 with an additional check that r13 and r15 are not used for LDRD/STRD. See http://llvm.org/PR24190. This also already includes the fix from r241951. Differential Revision: http://reviews.llvm.org/D10623 llvm-svn: 242742
* Revert r242737.Akira Hatanaka2015-07-202-2/+2
| | | | | | | | This caused builds to fail with the following error message: error:Too many subtarget features! Bump MAX_SUBTARGET_FEATURES. llvm-svn: 242740
* [ARM] Define subtarget feature "reserve-r9", which is used to decideAkira Hatanaka2015-07-202-2/+2
| | | | | | | | | | | | | | | | | whether register r9 should be reserved. This change is needed because we cannot use a backend option to set cl::opt "arm-reserve-r9" when doing LTO. Out-of-tree projects currently using cl::opt option "-arm-reserve-r9" to reserve r9 should make changes to add subtarget feature "reserve-r9" to the IR. rdar://problem/21529937 Differential Revision: http://reviews.llvm.org/D11320 llvm-svn: 242737
* Revert "ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2"Matthias Braun2015-07-205-34/+12
| | | | | | This reverts commit r241926. This caused http://llvm.org/PR24190 llvm-svn: 242735
* Revert "ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common code"Matthias Braun2015-07-201-52/+4
| | | | | | This reverts commit r241928. This caused http://llvm.org/PR24190 llvm-svn: 242734
* [ARM] Refactor the prologue/epilogue emission to be more robust.Quentin Colombet2015-07-202-25/+30
| | | | | | | | | | | | | | | | This is the first step toward supporting shrink-wrapping for this target. The changes could be summarized by these items: - Expand the tail-call return as part of the expand pseudo pass. - Get rid of the assumptions that the epilogue is the exit block: * Do not assume which registers are free in the epilogue. (This indirectly improve the lowering of the code for the segmented stacks, see the test cases.) * Take into account that the basic block can be empty. Related to <rdar://problem/20821730> llvm-svn: 242714
* ARM: Enable MachineScheduler and disable PostRAScheduler for swift.Matthias Braun2015-07-176-31/+31
| | | | | | | | | | | | | | | | | | | | | | | Reapply r242500 now that the swift schedmodel includes LDRLIT. This is mostly done to disable the PostRAScheduler which optimizes for instruction latencies which isn't a good fit for out-of-order architectures. This also allows to leave out the itinerary table in swift in favor of the SchedModel ones. This change leads to performance improvements/regressions by as much as 10% in some benchmarks, in fact we loose 0.4% performance over the llvm-testsuite for reasons that appear to be unknown or out of the compilers control. rdar://20803802 documents the investigation of these effects. While it is probably a good idea to perform the same switch for the other ARM out-of-order CPUs, I limited this change to swift as I cannot perform the benchmark verification on the other CPUs. Differential Revision: http://reviews.llvm.org/D10513 llvm-svn: 242588
* Revert "ARM: Enable MachineScheduler and disable PostRAScheduler for swift."Adam Nemet2015-07-176-31/+31
| | | | | | | | | This reverts commit r242500. It broke some internal tests and Matthias asked me to revert it while he is investigating. llvm-svn: 242553
* Make global aliases have symbol size equal to their typeJohn Brawn2015-07-171-0/+19
| | | | | | | | | | This is mainly for the benefit of GlobalMerge, so that an alias into a MergedGlobals variable has the same size as the original non-merged variable. Differential Revision: http://reviews.llvm.org/D10837 llvm-svn: 242520
* ARM: Enable MachineScheduler and disable PostRAScheduler for swift.Matthias Braun2015-07-176-31/+31
| | | | | | | | | | | | | | | | | | | | | This is mostly done to disable the PostRAScheduler which optimizes for instruction latencies which isn't a good fit for out-of-order architectures. This also allows to leave out the itinerary table in swift in favor of the SchedModel ones. This change leads to performance improvements/regressions by as much as 10% in some benchmarks, in fact we loose 0.4% performance over the llvm-testsuite for reasons that appear to be unknown or out of the compilers control. rdar://20803802 documents the investigation of these effects. While it is probably a good idea to perform the same switch for the other ARM out-of-order CPUs, I limited this change to swift as I cannot perform the benchmark verification on the other CPUs. Differential Revision: http://reviews.llvm.org/D10513 llvm-svn: 242500
* Fix __builtin_setjmp in combination with sjlj exception handling.Matthias Braun2015-07-161-0/+113
| | | | | | | | | | | | | | | | | | | llvm.eh.sjlj.setjmp was used as part of the SjLj exception handling style but is also used in clang to implement __builtin_setjmp. The ARM backend needs to output additional dispatch tables for the SjLj exception handling style, these tables however can't be emitted if llvm.eh.sjlj.setjmp is simply used for __builtin_setjmp and no actual landing pad blocks exist. To solve this issue a new llvm.eh.sjlj.setup_dispatch intrinsic is introduced which is used instead of llvm.eh.sjlj.setjmp in the SjLj exception handling lowering, so we can differentiate between the case where we actually need to setup a dispatch table and the case where we just need the __builtin_setjmp semantic. Differential Revision: http://reviews.llvm.org/D9313 llvm-svn: 242481
* Revert "Add missing load/store flags to thumb2 instructions."Pete Cooper2015-07-161-1/+1
| | | | | | | | | | This reverts commit r242300. This is causing buildbot failures which we are investigating. I'll reapply once we know whats going on, but for now want to get the bots green. llvm-svn: 242428
* [ARM] Define a subtarget feature that is used to avoid using movt/movwAkira Hatanaka2015-07-162-5/+50
| | | | | | | | | | | | | | | | | pairs for 32-bit immediates. This change is needed to avoid emitting movt/movw pairs when doing LTO and do so on a per-function basis. Out-of-tree projects currently using cl::opt option -arm-use-movt=0 or false to avoid emitting movt/movw pairs should make changes to add subtarget feature "+no-movt" (see the changes made to clang in r242368). rdar://problem/21529937 Differential Revision: http://reviews.llvm.org/D11026 llvm-svn: 242369
* Add missing load/store flags to thumb2 instructions.Pete Cooper2015-07-151-1/+1
| | | | | | | | | | | | | These were the cause of a verifier error when building 7zip with -verify-machineinstrs. Running 'make check' with the verifier triggered the same error on the test here so i've updated the test to run the verifier on one of its runs instead of adding a new one. While looking at this code, there was a stale comment that these instructions were only used for disassembly. This probably used to be the case, but they are now used in the 'ARM load / store optimization pass' too. llvm-svn: 242300
* ARM: add at least one real test for r242123.Tim Northover2015-07-141-0/+10
| | | | | | | | The ones committed were orthogonal to the change and would have passed before that revision. What it *did* do was prevent an assertion failure when generating object files. llvm-svn: 242166
* PrologEpilogInserter: Rewrite API to determine callee save regsiters.Matthias Braun2015-07-141-4/+2
| | | | | | | | | | | | | | | | This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan(): - Rename the function to determineCalleeSaves() - Pass a bitset of callee saved registers by reference, thus avoiding the function-global PhysRegUsed bitset in MachineRegisterInfo. - Without PhysRegUsed the implementation is fine tuned to not save physcial registers which are only read but never modified. Related to rdar://21539507 Differential Revision: http://reviews.llvm.org/D10909 llvm-svn: 242165
* Generate correct asm info for mingw and cygwin ARM targets.Yaron Keren2015-07-144-12/+36
| | | | | | | | | http://reviews.llvm.org/D11075 Patch by Martell Malone Reviewed by Reid Kleckner llvm-svn: 242123
* ARM: Fix cttz expansion on vector types.Logan Chien2015-07-133-11/+473
| | | | | | | | | | | | The 64/128-bit vector types are legal if NEON instructions are available. However, there was no matching patterns for @llvm.cttz.*() intrinsics and result in fatal error. This commit fixes the problem by lowering cttz to: a. ctpop((x & -x) - 1) b. width - ctlz(x & -x) - 1 llvm-svn: 242037
* [ARM] Add support for nest attribute using r12Renato Golin2015-07-121-0/+21
| | | | | | | | | | | | | | | | Register r12 ('ip') is used by GCC for this purpose and hence is used here. As discussed on the GCC mailing list, the register choice is an ABI issue and so choosing the same register as GCC means __builtin_call_with_static_chain is compatible. A similar patch has just gone in the AArch64 backend, so this is just the ARM counterpart, following the same discussion. Patch by Stephen Cross. llvm-svn: 241996
* ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common codeMatthias Braun2015-07-101-4/+52
| | | | | | | | | | | | This commit factors out common code from MergeBaseUpdateLoadStore() and MergeBaseUpdateLSMultiple() and introduces a new function MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a strd/ldrd instruction into an strd/ldrd instruction with writeback where possible. Differential Revision: http://reviews.llvm.org/D10676 llvm-svn: 241928
* ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2Matthias Braun2015-07-105-12/+34
| | | | | | Differential Revision: http://reviews.llvm.org/D10623 llvm-svn: 241926
* ARMLoadStoreOptimizer: Rewrite LDM/STM matching logic.Matthias Braun2015-07-101-2/+2
| | | | | | | | | | | | | | | | | | | | | This improves the logic in several ways and is a preparation for followup patches: - First perform an analysis and create a list of merge candidates, then transform. This simplifies the code in that you have don't have to care to much anymore that you may be holding iterators to MachineInstrs that get removed. - Analyze/Transform basic blocks in reverse order. This allows to use LivePhysRegs to find free registers instead of the RegisterScavenger. The RegisterScavenger will become less precise in the future as it relies on the deprecated kill-flags. - Return the newly created node in MergeOps so there's no need to look around in the schedule to find it. - Rename some MBBI iterators to InsertBefore to make their role clear. - General code cleanup. Differential Revision: http://reviews.llvm.org/D10140 llvm-svn: 241920
* Fix test case to unbreak build.Akira Hatanaka2015-07-071-9/+18
| | | | | | | | This commit changes the target arch to fix the test case commited in r241566 that was failing on ninja-x64-msvc-RA-centos6. Also add checks to make sure the callee's address is loaded to blx's operand. llvm-svn: 241588
* [ARM] Define a subtarget feature and use it to decide whether long calls shouldAkira Hatanaka2015-07-075-9/+49
| | | | | | | | | | | | | | | | | be emitted. This is needed to enable ARM long calls for LTO and enable and disable it on a per-function basis. Out-of-tree projects currently using EnableARMLongCalls to emit long calls should start passing "+long-calls" to the feature string (see the changes made to clang in r241565). rdar://problem/21529937 Differential Revision: http://reviews.llvm.org/D9364 llvm-svn: 241566
* llvm/test/CodeGen/ARM/fnattr-trap.ll: Add -mtriple, to appease targeting ↵NAKAMURA Takumi2015-07-031-2/+2
| | | | | | | | *-win32. LLVM ERROR: CPU: 'generic' does not support ARM mode execution! llvm-svn: 241329
* Use function attribute "trap-func-name" and remove TargetOptions::TrapFuncName.Akira Hatanaka2015-07-021-0/+40
| | | | | | | | | | | | | | | | | | This commit changes normal isel and fast isel to read the user-defined trap function name from function attribute "trap-func-name" attached to llvm.trap or llvm.debugtrap instead of from TargetOptions::TrapFuncName. This is needed to use clang's command line option "-ftrap-function" for LTO and enable changing the trap function name on a per-call-site basis. Out-of-tree projects currently using TargetOptions::TrapFuncName to specify the trap function name should attach attribute "trap-func-name" to the call sites of llvm.trap and llvm.debugtrap instead. rdar://problem/21225723 Differential Revision: http://reviews.llvm.org/D10832 llvm-svn: 241305
* ARM: add correct kill flags when combining stm instructionsTim Northover2015-06-291-0/+43
| | | | | | | | | When the store sequence being combined actually stores the base register, we should not mark it as killed until the end. rdar://21504262 llvm-svn: 241003
* [ARM]: Extend -mfpu options for half-precision and vfpv3xdJaved Absar2015-06-291-1/+14
| | | | | | | | | | | | | | | Some of the the permissible ARM -mfpu options, which are supported in GCC, are currently not present in llvm/clang.This patch adds the options: 'neon-fp16', 'vfpv3-fp16', 'vfpv3-d16-fp16', 'vfpv3xd' and 'vfpv3xd-fp16. These are related to half-precision floating-point and single precision. Reviewers: rengolin, ranjeet.singh Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10645 llvm-svn: 240930
* [ARM] Cortex-R5 is not VFPOnlySPJaved Absar2015-06-261-1/+1
| | | | | | | | | | | | | This patch fixes the error in ARM.td which stated that Cortex-R5 floating point unit can do only single precision, when it can do double as well. Reviewers: rengolin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10769 llvm-svn: 240799
* [ARM] Cortex-R4F is not VFPOnlySPJaved Absar2015-06-261-1/+1
| | | | | | | | | | | | | Cortex-R4F TRM states that fpu supports both single and double precision. This patch corrects the information in ARM.td file and corresponding test. Reviewers: rengolin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10763 llvm-svn: 240776
* [ARM] Lower interleaved memory accesses to vldN/vstN intrinsics.Hao Liu2015-06-261-0/+204
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch also adds a function to calculate the cost of interleaved memory accesses. E.g. Lower an interleaved load: %wide.vec = load <8 x i32>, <8 x i32>* %ptr, align 4 %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6> %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7> into: %vld2 = { <4 x i32>, <4 x i32> } call llvm.arm.neon.vld2(%ptr, 4) %vec0 = extractelement { <4 x i32>, <4 x i32> } %vld2, i32 0 %vec1 = extractelement { <4 x i32>, <4 x i32> } %vld2, i32 1 E.g. Lower an interleaved store: %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11> store <12 x i32> %i.vec, <12 x i32>* %ptr, align 4 into: %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> v1, <0, 1, 2, 3> %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> v1, <4, 5, 6, 7> %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> v1, <8, 9, 10, 11> call void llvm.arm.neon.vst3(%ptr, %sub.v0, %sub.v1, %sub.v2, 4) Differential Revision: http://reviews.llvm.org/D10533 llvm-svn: 240755
* Fix mismatched architectures in testMatthias Braun2015-06-261-1/+1
| | | | llvm-svn: 240745
* ARMLoadStoreOptimizer: Fix errata 602117 handling and make testcase actually ↵Matthias Braun2015-06-241-14/+15
| | | | | | | | | | test for it This fixes PR23912 Differential Revision: http://reviews.llvm.org/D10620 llvm-svn: 240582
* [ARM] Look through concat when lowering in-place shuffles (VZIP, ..)Ahmed Bougacha2015-06-193-241/+32
| | | | | | | | | | | | | | | | | | | | | | | | | Currently, we canonicalize shuffles that produce a result larger than their operands with: shuffle(concat(v1, undef), concat(v2, undef)) -> shuffle(concat(v1, v2), undef) because we can access quad vectors (see PerformVECTOR_SHUFFLECombine). This is useful in the general case, but there are special cases where native shuffles produce larger results: the two-result ops. We can look through the concat when lowering them: shuffle(concat(v1, v2), undef) -> concat(VZIP(v1, v2):0, :1) This lets us generate the native shuffles instead of scalarizing to dozens of VMOVs. Differential Revision: http://reviews.llvm.org/D10424 llvm-svn: 240118
* [ARM] Add D-sized vtrn/vuzp/vzip tests, and cleanup. NFC.Ahmed Bougacha2015-06-193-79/+819
| | | | llvm-svn: 240114
* Fix "the the" in comments.Eric Christopher2015-06-191-1/+1
| | | | llvm-svn: 240112
* Move the personality function from LandingPadInst to FunctionDavid Majnemer2015-06-1724-72/+63
| | | | | | | | | | | | | | | | | | | The personality routine currently lives in the LandingPadInst. This isn't desirable because: - All LandingPadInsts in the same function must have the same personality routine. This means that each LandingPadInst beyond the first has an operand which produces no additional information. - There is ongoing work to introduce EH IR constructs other than LandingPadInst. Moving the personality routine off of any one particular Instruction and onto the parent function seems a lot better than have N different places a personality function can sneak onto an exceptional function. Differential Revision: http://reviews.llvm.org/D10429 llvm-svn: 239940
* [ARM] Disabling vfp4 should disable fp16John Brawn2015-06-121-3/+3
| | | | | | | | | | | ARMTargetParser::getFPUFeatures should disable fp16 whenever it disables vfp4, as otherwise something like -mcpu=cortex-a7 -mfpu=none leaves us with fp16 enabled (though the only effect that will have is a wrong build attribute). Differential Revision: http://reviews.llvm.org/D10397 llvm-svn: 239599
* Add explicit -mtriple=arm-unknown to ↵NAKAMURA Takumi2015-06-091-3/+3
| | | | | | llvm/test/CodeGen/ARM/disable-tail-calls.ll, to satisfy *-win32. llvm-svn: 239442
* Remove DisableTailCalls from TargetOptions and the code in resetTargetOptionsAkira Hatanaka2015-06-091-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | that was resetting it. Remove the uses of DisableTailCalls in subclasses of TargetLowering and use the value of function attribute "disable-tail-calls" instead. Also, unconditionally add pass TailCallElim to the pipeline and check the function attribute at the start of runOnFunction to disable the pass on a per-function basis. This is part of the work to remove TargetMachine::resetTargetOptions, and since DisableTailCalls was the last non-fast-math option that was being reset in that function, we should be able to remove the function entirely after the work to propagate IR-level fast-math flags to DAG nodes is completed. Out-of-tree users should remove the uses of DisableTailCalls and make changes to attach attribute "disable-tail-calls"="true" or "false" to the functions in the IR. rdar://problem/13752163 Differential Revision: http://reviews.llvm.org/D10099 llvm-svn: 239427
* [ARM] Pass a callback to FunctionPass constructors to enable skipping executionAkira Hatanaka2015-06-081-0/+22
| | | | | | | | | | | | | | | | on a per-function basis. Previously some of the passes were conditionally added to ARM's pass pipeline based on the target machine's subtarget. This patch makes changes to add those passes unconditionally and execute them conditonally based on the predicate functor passed to the pass constructors. This enables running different sets of passes for different functions in the module. rdar://problem/20542263 Differential Revision: http://reviews.llvm.org/D8717 llvm-svn: 239325
* [ARM] Add support for -sp- FPUs and FPU none to TargetParserJohn Brawn2015-06-051-3/+3
| | | | | | | | | | These are added mainly for the benefit of clang, but this also means that they are now allowed in .fpu directives and we emit the correct .fpu directive when single-precision-only is used. Differential Revision: http://reviews.llvm.org/D10238 llvm-svn: 239151
* ARM: Thumb2 LDRD/STRD supports independent input/output regsMatthias Braun2015-06-032-6/+24
| | | | | | | | | | | | | | | | The existing code would unnecessarily break LDRD/STRD apart with non-adjacent registers, on thumb2 this is not necessary. Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore as there is not reason to set register hints anymore, changing that is something for a future patch however. Differential Revision: http://reviews.llvm.org/D9694 Recommiting after the revert in r238821, the buildbot still failed with the patch removed so there seems to be another reason for the breakage. llvm-svn: 238935
* Revert "ARM: Thumb2 LDRD/STRD supports independent input/output regs"Renato Golin2015-06-022-24/+6
| | | | | | | | | This reverts commit r238795, as it broke the Thumb2 self-hosting buildbot. Since self-hosting issues with Clang are hard to investigate, I'm taking the liberty to revert now, so we can investigate it offline. llvm-svn: 238821
* ARM: Thumb2 LDRD/STRD supports independent input/output regsMatthias Braun2015-06-012-6/+24
| | | | | | | | | | | | | The existing code would unnecessarily break LDRD/STRD apart with non-adjacent registers, on thumb2 this is not necessary. Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore as there is not reason to set register hints anymore, changing that is something for a future patch however. Differential Revision: http://reviews.llvm.org/D9694 llvm-svn: 238795
* Re-commit of r238201 with fix for building with shared libraries.Luke Cheeseman2015-06-015-2/+301
| | | | llvm-svn: 238739
* ARM: recommit r237590: allow jump tables to be placed as constant islands.Tim Northover2015-05-313-2/+94
| | | | | | | | | | | | | | | The original version didn't properly account for the base register being modified before the final jump, so caused miscompilations in Chromium and LLVM. I've fixed this and tested with an LLVM self-host (I don't have the means to build & test Chromium). The general idea remains the same: in pathological cases jump tables can be too far away from the instructions referencing them (like other constants) so they need to be movable. Should fix PR23627. llvm-svn: 238680
* Revert "Re-commit changes in r237579 with fix for bug breaking windows builds."Diego Novillo2015-05-265-301/+2
| | | | | | | This reverts commit r238201 to fix linking problems in x86 Linux http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150525/278413.html llvm-svn: 238223
* Re-commit changes in r237579 with fix for bug breaking windows builds.Luke Cheeseman2015-05-265-2/+301
| | | | llvm-svn: 238201
* Stop resetting NoFramePointerElim in TargetMachine::resetTargetOptions.Akira Hatanaka2015-05-231-0/+25
| | | | | | | | | | | | | | This is part of the work to remove TargetMachine::resetTargetOptions. In this patch, instead of updating global variable NoFramePointerElim in resetTargetOptions, its use in DisableFramePointerElim is replaced with a call to TargetFrameLowering::noFramePointerElim. This function determines on a per-function basis if frame pointer elimination should be disabled. There is no change in functionality except that cl:opt option "disable-fp-elim" can now override function attribute "no-frame-pointer-elim". llvm-svn: 238080
OpenPOWER on IntegriCloud