summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/ARM
Commit message (Collapse)AuthorAgeFilesLines
* Make global aliases have symbol size equal to their typeJohn Brawn2015-07-171-0/+19
| | | | | | | | | | This is mainly for the benefit of GlobalMerge, so that an alias into a MergedGlobals variable has the same size as the original non-merged variable. Differential Revision: http://reviews.llvm.org/D10837 llvm-svn: 242520
* ARM: Enable MachineScheduler and disable PostRAScheduler for swift.Matthias Braun2015-07-176-31/+31
| | | | | | | | | | | | | | | | | | | | | This is mostly done to disable the PostRAScheduler which optimizes for instruction latencies which isn't a good fit for out-of-order architectures. This also allows to leave out the itinerary table in swift in favor of the SchedModel ones. This change leads to performance improvements/regressions by as much as 10% in some benchmarks, in fact we loose 0.4% performance over the llvm-testsuite for reasons that appear to be unknown or out of the compilers control. rdar://20803802 documents the investigation of these effects. While it is probably a good idea to perform the same switch for the other ARM out-of-order CPUs, I limited this change to swift as I cannot perform the benchmark verification on the other CPUs. Differential Revision: http://reviews.llvm.org/D10513 llvm-svn: 242500
* Fix __builtin_setjmp in combination with sjlj exception handling.Matthias Braun2015-07-161-0/+113
| | | | | | | | | | | | | | | | | | | llvm.eh.sjlj.setjmp was used as part of the SjLj exception handling style but is also used in clang to implement __builtin_setjmp. The ARM backend needs to output additional dispatch tables for the SjLj exception handling style, these tables however can't be emitted if llvm.eh.sjlj.setjmp is simply used for __builtin_setjmp and no actual landing pad blocks exist. To solve this issue a new llvm.eh.sjlj.setup_dispatch intrinsic is introduced which is used instead of llvm.eh.sjlj.setjmp in the SjLj exception handling lowering, so we can differentiate between the case where we actually need to setup a dispatch table and the case where we just need the __builtin_setjmp semantic. Differential Revision: http://reviews.llvm.org/D9313 llvm-svn: 242481
* Revert "Add missing load/store flags to thumb2 instructions."Pete Cooper2015-07-161-1/+1
| | | | | | | | | | This reverts commit r242300. This is causing buildbot failures which we are investigating. I'll reapply once we know whats going on, but for now want to get the bots green. llvm-svn: 242428
* [ARM] Define a subtarget feature that is used to avoid using movt/movwAkira Hatanaka2015-07-162-5/+50
| | | | | | | | | | | | | | | | | pairs for 32-bit immediates. This change is needed to avoid emitting movt/movw pairs when doing LTO and do so on a per-function basis. Out-of-tree projects currently using cl::opt option -arm-use-movt=0 or false to avoid emitting movt/movw pairs should make changes to add subtarget feature "+no-movt" (see the changes made to clang in r242368). rdar://problem/21529937 Differential Revision: http://reviews.llvm.org/D11026 llvm-svn: 242369
* Add missing load/store flags to thumb2 instructions.Pete Cooper2015-07-151-1/+1
| | | | | | | | | | | | | These were the cause of a verifier error when building 7zip with -verify-machineinstrs. Running 'make check' with the verifier triggered the same error on the test here so i've updated the test to run the verifier on one of its runs instead of adding a new one. While looking at this code, there was a stale comment that these instructions were only used for disassembly. This probably used to be the case, but they are now used in the 'ARM load / store optimization pass' too. llvm-svn: 242300
* ARM: add at least one real test for r242123.Tim Northover2015-07-141-0/+10
| | | | | | | | The ones committed were orthogonal to the change and would have passed before that revision. What it *did* do was prevent an assertion failure when generating object files. llvm-svn: 242166
* PrologEpilogInserter: Rewrite API to determine callee save regsiters.Matthias Braun2015-07-141-4/+2
| | | | | | | | | | | | | | | | This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan(): - Rename the function to determineCalleeSaves() - Pass a bitset of callee saved registers by reference, thus avoiding the function-global PhysRegUsed bitset in MachineRegisterInfo. - Without PhysRegUsed the implementation is fine tuned to not save physcial registers which are only read but never modified. Related to rdar://21539507 Differential Revision: http://reviews.llvm.org/D10909 llvm-svn: 242165
* Generate correct asm info for mingw and cygwin ARM targets.Yaron Keren2015-07-144-12/+36
| | | | | | | | | http://reviews.llvm.org/D11075 Patch by Martell Malone Reviewed by Reid Kleckner llvm-svn: 242123
* ARM: Fix cttz expansion on vector types.Logan Chien2015-07-133-11/+473
| | | | | | | | | | | | The 64/128-bit vector types are legal if NEON instructions are available. However, there was no matching patterns for @llvm.cttz.*() intrinsics and result in fatal error. This commit fixes the problem by lowering cttz to: a. ctpop((x & -x) - 1) b. width - ctlz(x & -x) - 1 llvm-svn: 242037
* [ARM] Add support for nest attribute using r12Renato Golin2015-07-121-0/+21
| | | | | | | | | | | | | | | | Register r12 ('ip') is used by GCC for this purpose and hence is used here. As discussed on the GCC mailing list, the register choice is an ABI issue and so choosing the same register as GCC means __builtin_call_with_static_chain is compatible. A similar patch has just gone in the AArch64 backend, so this is just the ARM counterpart, following the same discussion. Patch by Stephen Cross. llvm-svn: 241996
* ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common codeMatthias Braun2015-07-101-4/+52
| | | | | | | | | | | | This commit factors out common code from MergeBaseUpdateLoadStore() and MergeBaseUpdateLSMultiple() and introduces a new function MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a strd/ldrd instruction into an strd/ldrd instruction with writeback where possible. Differential Revision: http://reviews.llvm.org/D10676 llvm-svn: 241928
* ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2Matthias Braun2015-07-105-12/+34
| | | | | | Differential Revision: http://reviews.llvm.org/D10623 llvm-svn: 241926
* ARMLoadStoreOptimizer: Rewrite LDM/STM matching logic.Matthias Braun2015-07-101-2/+2
| | | | | | | | | | | | | | | | | | | | | This improves the logic in several ways and is a preparation for followup patches: - First perform an analysis and create a list of merge candidates, then transform. This simplifies the code in that you have don't have to care to much anymore that you may be holding iterators to MachineInstrs that get removed. - Analyze/Transform basic blocks in reverse order. This allows to use LivePhysRegs to find free registers instead of the RegisterScavenger. The RegisterScavenger will become less precise in the future as it relies on the deprecated kill-flags. - Return the newly created node in MergeOps so there's no need to look around in the schedule to find it. - Rename some MBBI iterators to InsertBefore to make their role clear. - General code cleanup. Differential Revision: http://reviews.llvm.org/D10140 llvm-svn: 241920
* Fix test case to unbreak build.Akira Hatanaka2015-07-071-9/+18
| | | | | | | | This commit changes the target arch to fix the test case commited in r241566 that was failing on ninja-x64-msvc-RA-centos6. Also add checks to make sure the callee's address is loaded to blx's operand. llvm-svn: 241588
* [ARM] Define a subtarget feature and use it to decide whether long calls shouldAkira Hatanaka2015-07-075-9/+49
| | | | | | | | | | | | | | | | | be emitted. This is needed to enable ARM long calls for LTO and enable and disable it on a per-function basis. Out-of-tree projects currently using EnableARMLongCalls to emit long calls should start passing "+long-calls" to the feature string (see the changes made to clang in r241565). rdar://problem/21529937 Differential Revision: http://reviews.llvm.org/D9364 llvm-svn: 241566
* llvm/test/CodeGen/ARM/fnattr-trap.ll: Add -mtriple, to appease targeting ↵NAKAMURA Takumi2015-07-031-2/+2
| | | | | | | | *-win32. LLVM ERROR: CPU: 'generic' does not support ARM mode execution! llvm-svn: 241329
* Use function attribute "trap-func-name" and remove TargetOptions::TrapFuncName.Akira Hatanaka2015-07-021-0/+40
| | | | | | | | | | | | | | | | | | This commit changes normal isel and fast isel to read the user-defined trap function name from function attribute "trap-func-name" attached to llvm.trap or llvm.debugtrap instead of from TargetOptions::TrapFuncName. This is needed to use clang's command line option "-ftrap-function" for LTO and enable changing the trap function name on a per-call-site basis. Out-of-tree projects currently using TargetOptions::TrapFuncName to specify the trap function name should attach attribute "trap-func-name" to the call sites of llvm.trap and llvm.debugtrap instead. rdar://problem/21225723 Differential Revision: http://reviews.llvm.org/D10832 llvm-svn: 241305
* ARM: add correct kill flags when combining stm instructionsTim Northover2015-06-291-0/+43
| | | | | | | | | When the store sequence being combined actually stores the base register, we should not mark it as killed until the end. rdar://21504262 llvm-svn: 241003
* [ARM]: Extend -mfpu options for half-precision and vfpv3xdJaved Absar2015-06-291-1/+14
| | | | | | | | | | | | | | | Some of the the permissible ARM -mfpu options, which are supported in GCC, are currently not present in llvm/clang.This patch adds the options: 'neon-fp16', 'vfpv3-fp16', 'vfpv3-d16-fp16', 'vfpv3xd' and 'vfpv3xd-fp16. These are related to half-precision floating-point and single precision. Reviewers: rengolin, ranjeet.singh Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10645 llvm-svn: 240930
* [ARM] Cortex-R5 is not VFPOnlySPJaved Absar2015-06-261-1/+1
| | | | | | | | | | | | | This patch fixes the error in ARM.td which stated that Cortex-R5 floating point unit can do only single precision, when it can do double as well. Reviewers: rengolin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10769 llvm-svn: 240799
* [ARM] Cortex-R4F is not VFPOnlySPJaved Absar2015-06-261-1/+1
| | | | | | | | | | | | | Cortex-R4F TRM states that fpu supports both single and double precision. This patch corrects the information in ARM.td file and corresponding test. Reviewers: rengolin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10763 llvm-svn: 240776
* [ARM] Lower interleaved memory accesses to vldN/vstN intrinsics.Hao Liu2015-06-261-0/+204
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch also adds a function to calculate the cost of interleaved memory accesses. E.g. Lower an interleaved load: %wide.vec = load <8 x i32>, <8 x i32>* %ptr, align 4 %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6> %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7> into: %vld2 = { <4 x i32>, <4 x i32> } call llvm.arm.neon.vld2(%ptr, 4) %vec0 = extractelement { <4 x i32>, <4 x i32> } %vld2, i32 0 %vec1 = extractelement { <4 x i32>, <4 x i32> } %vld2, i32 1 E.g. Lower an interleaved store: %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11> store <12 x i32> %i.vec, <12 x i32>* %ptr, align 4 into: %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> v1, <0, 1, 2, 3> %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> v1, <4, 5, 6, 7> %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> v1, <8, 9, 10, 11> call void llvm.arm.neon.vst3(%ptr, %sub.v0, %sub.v1, %sub.v2, 4) Differential Revision: http://reviews.llvm.org/D10533 llvm-svn: 240755
* Fix mismatched architectures in testMatthias Braun2015-06-261-1/+1
| | | | llvm-svn: 240745
* ARMLoadStoreOptimizer: Fix errata 602117 handling and make testcase actually ↵Matthias Braun2015-06-241-14/+15
| | | | | | | | | | test for it This fixes PR23912 Differential Revision: http://reviews.llvm.org/D10620 llvm-svn: 240582
* [ARM] Look through concat when lowering in-place shuffles (VZIP, ..)Ahmed Bougacha2015-06-193-241/+32
| | | | | | | | | | | | | | | | | | | | | | | | | Currently, we canonicalize shuffles that produce a result larger than their operands with: shuffle(concat(v1, undef), concat(v2, undef)) -> shuffle(concat(v1, v2), undef) because we can access quad vectors (see PerformVECTOR_SHUFFLECombine). This is useful in the general case, but there are special cases where native shuffles produce larger results: the two-result ops. We can look through the concat when lowering them: shuffle(concat(v1, v2), undef) -> concat(VZIP(v1, v2):0, :1) This lets us generate the native shuffles instead of scalarizing to dozens of VMOVs. Differential Revision: http://reviews.llvm.org/D10424 llvm-svn: 240118
* [ARM] Add D-sized vtrn/vuzp/vzip tests, and cleanup. NFC.Ahmed Bougacha2015-06-193-79/+819
| | | | llvm-svn: 240114
* Fix "the the" in comments.Eric Christopher2015-06-191-1/+1
| | | | llvm-svn: 240112
* Move the personality function from LandingPadInst to FunctionDavid Majnemer2015-06-1724-72/+63
| | | | | | | | | | | | | | | | | | | The personality routine currently lives in the LandingPadInst. This isn't desirable because: - All LandingPadInsts in the same function must have the same personality routine. This means that each LandingPadInst beyond the first has an operand which produces no additional information. - There is ongoing work to introduce EH IR constructs other than LandingPadInst. Moving the personality routine off of any one particular Instruction and onto the parent function seems a lot better than have N different places a personality function can sneak onto an exceptional function. Differential Revision: http://reviews.llvm.org/D10429 llvm-svn: 239940
* [ARM] Disabling vfp4 should disable fp16John Brawn2015-06-121-3/+3
| | | | | | | | | | | ARMTargetParser::getFPUFeatures should disable fp16 whenever it disables vfp4, as otherwise something like -mcpu=cortex-a7 -mfpu=none leaves us with fp16 enabled (though the only effect that will have is a wrong build attribute). Differential Revision: http://reviews.llvm.org/D10397 llvm-svn: 239599
* Add explicit -mtriple=arm-unknown to ↵NAKAMURA Takumi2015-06-091-3/+3
| | | | | | llvm/test/CodeGen/ARM/disable-tail-calls.ll, to satisfy *-win32. llvm-svn: 239442
* Remove DisableTailCalls from TargetOptions and the code in resetTargetOptionsAkira Hatanaka2015-06-091-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | that was resetting it. Remove the uses of DisableTailCalls in subclasses of TargetLowering and use the value of function attribute "disable-tail-calls" instead. Also, unconditionally add pass TailCallElim to the pipeline and check the function attribute at the start of runOnFunction to disable the pass on a per-function basis. This is part of the work to remove TargetMachine::resetTargetOptions, and since DisableTailCalls was the last non-fast-math option that was being reset in that function, we should be able to remove the function entirely after the work to propagate IR-level fast-math flags to DAG nodes is completed. Out-of-tree users should remove the uses of DisableTailCalls and make changes to attach attribute "disable-tail-calls"="true" or "false" to the functions in the IR. rdar://problem/13752163 Differential Revision: http://reviews.llvm.org/D10099 llvm-svn: 239427
* [ARM] Pass a callback to FunctionPass constructors to enable skipping executionAkira Hatanaka2015-06-081-0/+22
| | | | | | | | | | | | | | | | on a per-function basis. Previously some of the passes were conditionally added to ARM's pass pipeline based on the target machine's subtarget. This patch makes changes to add those passes unconditionally and execute them conditonally based on the predicate functor passed to the pass constructors. This enables running different sets of passes for different functions in the module. rdar://problem/20542263 Differential Revision: http://reviews.llvm.org/D8717 llvm-svn: 239325
* [ARM] Add support for -sp- FPUs and FPU none to TargetParserJohn Brawn2015-06-051-3/+3
| | | | | | | | | | These are added mainly for the benefit of clang, but this also means that they are now allowed in .fpu directives and we emit the correct .fpu directive when single-precision-only is used. Differential Revision: http://reviews.llvm.org/D10238 llvm-svn: 239151
* ARM: Thumb2 LDRD/STRD supports independent input/output regsMatthias Braun2015-06-032-6/+24
| | | | | | | | | | | | | | | | The existing code would unnecessarily break LDRD/STRD apart with non-adjacent registers, on thumb2 this is not necessary. Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore as there is not reason to set register hints anymore, changing that is something for a future patch however. Differential Revision: http://reviews.llvm.org/D9694 Recommiting after the revert in r238821, the buildbot still failed with the patch removed so there seems to be another reason for the breakage. llvm-svn: 238935
* Revert "ARM: Thumb2 LDRD/STRD supports independent input/output regs"Renato Golin2015-06-022-24/+6
| | | | | | | | | This reverts commit r238795, as it broke the Thumb2 self-hosting buildbot. Since self-hosting issues with Clang are hard to investigate, I'm taking the liberty to revert now, so we can investigate it offline. llvm-svn: 238821
* ARM: Thumb2 LDRD/STRD supports independent input/output regsMatthias Braun2015-06-012-6/+24
| | | | | | | | | | | | | The existing code would unnecessarily break LDRD/STRD apart with non-adjacent registers, on thumb2 this is not necessary. Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore as there is not reason to set register hints anymore, changing that is something for a future patch however. Differential Revision: http://reviews.llvm.org/D9694 llvm-svn: 238795
* Re-commit of r238201 with fix for building with shared libraries.Luke Cheeseman2015-06-015-2/+301
| | | | llvm-svn: 238739
* ARM: recommit r237590: allow jump tables to be placed as constant islands.Tim Northover2015-05-313-2/+94
| | | | | | | | | | | | | | | The original version didn't properly account for the base register being modified before the final jump, so caused miscompilations in Chromium and LLVM. I've fixed this and tested with an LLVM self-host (I don't have the means to build & test Chromium). The general idea remains the same: in pathological cases jump tables can be too far away from the instructions referencing them (like other constants) so they need to be movable. Should fix PR23627. llvm-svn: 238680
* Revert "Re-commit changes in r237579 with fix for bug breaking windows builds."Diego Novillo2015-05-265-301/+2
| | | | | | | This reverts commit r238201 to fix linking problems in x86 Linux http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150525/278413.html llvm-svn: 238223
* Re-commit changes in r237579 with fix for bug breaking windows builds.Luke Cheeseman2015-05-265-2/+301
| | | | llvm-svn: 238201
* Stop resetting NoFramePointerElim in TargetMachine::resetTargetOptions.Akira Hatanaka2015-05-231-0/+25
| | | | | | | | | | | | | | This is part of the work to remove TargetMachine::resetTargetOptions. In this patch, instead of updating global variable NoFramePointerElim in resetTargetOptions, its use in DisableFramePointerElim is replaced with a call to TargetFrameLowering::noFramePointerElim. This function determines on a per-function basis if frame pointer elimination should be disabled. There is no change in functionality except that cl:opt option "disable-fp-elim" can now override function attribute "no-frame-pointer-elim". llvm-svn: 238080
* Revert r237590, "ARM: allow jump tables to be placed as constant islands."Peter Collingbourne2015-05-212-42/+2
| | | | | | | Caused a miscompile of the Android port of Chromium, details forthcoming. llvm-svn: 237972
* [Target/ARM] Only enable OptimizeBarrierPass at -O1 and above.Davide Italiano2015-05-202-1/+16
| | | | | | | | | | Ideally this is going to be and LLVM IR pass (shared, among others with AArch64), but for the time being just enable it if consumers ask us for optimization and not unconditionally. Discussed with Tim Northover on IRC. llvm-svn: 237837
* ARM: allow jump tables to be placed as constant islands.Tim Northover2015-05-182-2/+42
| | | | | | | | | | | | | | | | | Previously, they were forced to immediately follow the actual branch instruction. This was usually OK (the LEAs actually accessing them got emitted nearby, and weren't usually separated much afterwards). Unfortunately, a sufficiently nasty phi elimination dumps many instructions right before the basic block terminator, and this can increase the range too much. This patch frees them up to be placed as usual by the constant islands pass, and consequently has to slightly modify the form of TBB/TBH tables to refer to a PC-relative label at the final jump. The other jump table formats were already position-independent. rdar://20813304 llvm-svn: 237590
* Revert r237579, as it broke windows buildbotsOliver Stannard2015-05-185-301/+2
| | | | llvm-svn: 237583
* [LLVM - ARM/AArch64] Add ACLE special register intrinsicsOliver Stannard2015-05-185-2/+301
| | | | | | | | | | | | | | | | | | | This patch implements LLVM support for the ACLE special register intrinsics in section 10.1, __arm_{w,r}sr{,p,64}. This patch is intended to lower the read/write_register instrinsics, used to implement the special register intrinsics in the clang patch for special register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor registers in AArch32 and AArch64. This is done by inspecting the register string passed to the intrinsic and then lowering to the appropriate instruction. Patch by Luke Cheeseman. Differential Revision: http://reviews.llvm.org/D9699 llvm-svn: 237579
* [CodeGen] Use standard -not gnueabi- naming for f16 libcalls on Darwin.Ahmed Bougacha2015-05-141-3/+3
| | | | | | | | Other targets probably should as well. Since r237161, compiler-rt has both, but I don't see why anything other than gnueabi would use a gnueabi naming scheme. llvm-svn: 237324
* CodeGen: ignore DEBUG_VALUE nodes in KILL taggingSaleem Abdulrasool2015-05-121-0/+88
| | | | | | | DEBUG_VALUE nodes do not take part in code generation. Ignore them when performing KILL updates. Addresses PR23486. llvm-svn: 237211
* Changed renaming of local symbols by inserting a dot vefore the numeric suffix.Sunil Srivastava2015-05-122-4/+4
| | | | | | | One code change and several test changes to match that details in http://reviews.llvm.org/D9481 llvm-svn: 237150
OpenPOWER on IntegriCloud