summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/ARM
Commit message (Collapse)AuthorAgeFilesLines
* Recommit r265547, and r265610,r265639,r265657 on top of it, plusWei Mi2016-04-131-0/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | two fixes with one about error verify-regalloc reported, and another about live range update of phi after rematerialization. r265547: Replace analyzeSiblingValues with new algorithm to fix its compile time issue. The patch is to solve PR17409 and its duplicates. analyzeSiblingValues is a N x N complexity algorithm where N is the number of siblings generated by reg splitting. Although it causes siginificant compile time issue when N is large, it is also important for performance since it removes redundent spills and enables rematerialization. To solve the compile time issue, the patch removes analyzeSiblingValues and replaces it with lower cost alternatives containing two parts. The first part creates a new spill hoisting method in postOptimization of register allocation. It does spill hoisting at once after all the spills are generated instead of inside every instance of selectOrSplit. The second part queries the define expr of the original register for rematerializaiton and keep it always available during register allocation even if it is already dead. It deletes those dead instructions only in postOptimization. With the two parts in the patch, it can remove analyzeSiblingValues without sacrificing performance. Patches on top of r265547: r265610 "Fix the compare-clang diff error introduced by r265547." r265639 "Fix the sanitizer bootstrap error in r265547." r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]" Differential Revision: http://reviews.llvm.org/D15302 Differential Revision: http://reviews.llvm.org/D18934 Differential Revision: http://reviews.llvm.org/D18935 Differential Revision: http://reviews.llvm.org/D18936 llvm-svn: 266162
* CodeGen: Clear the MFI's save and restore point after PrologEpilogInserterJustin Bogner2016-04-121-0/+27
| | | | | | | | | | This state is no longer useful and not guaranteed to be valid in later codegen passes. For example, see the added test, which would print a savepoint of %bb.-1 without this change, and crashes with a use-after-free error under ASan if you apply the recycling allocator patch from llvm.org/PR26808. llvm-svn: 266150
* ARM: use r7 as the frame-pointer on all MachO targets.Tim Northover2016-04-112-8/+10
| | | | | | | | | | | | This is better for a few reasons: + It matches the other tooling for iOS. + It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't use ARM mode). + It leads to infinitesimally smaller code (0.2%, yay!). rdar://25369506 llvm-svn: 266003
* Swift Calling Convention: swifterror target support.Manman Ren2016-04-111-0/+381
| | | | | | Differential Revision: http://reviews.llvm.org/D18716 llvm-svn: 265997
* More upgrading of old- and very-old-style debug info in testcases.Adrian Prantl2016-04-112-3/+3
| | | | llvm-svn: 265953
* Add trailing colons to labels in a test.James Y Knight2016-04-081-12/+12
| | | | | | | This will avoid matching on the FILENAME if it happened to contain, say, "f4" anywhere in the file path. llvm-svn: 265837
* Revert r265817Colin LeMahieu2016-04-084-8/+8
| | | | | | lld tests need to be addressed. llvm-svn: 265822
* [llvm-objdump] Printing hex instead of dec by defaultColin LeMahieu2016-04-084-8/+8
| | | | | | Differential Revision: http://reviews.llvm.org/D18770 llvm-svn: 265817
* [ARM] Enable SMLAW[B|T] and SMLUW[B|T] instruction selectionSam Parker2016-04-082-26/+105
| | | | | | | | | | Added ISelDAGToDAG functions to enable selection of the smlawb, smlawt, smulwb and smulwt instructions for the ARM backend. Also updated the smul CodeGen test and removed the smulw one. Differential Revision: http://reviews.llvm.org/D18892 llvm-svn: 265793
* Swift Calling Convention: swiftcc for ARM.Manman Ren2016-04-052-0/+201
| | | | | | Differential Revision: http://reviews.llvm.org/D18769 llvm-svn: 265482
* Don't delete empty preheaders in CodeGenPrepare if it would create a ↵Chuang-Yu Cheng2016-04-052-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | critical edge Presently, CodeGenPrepare deletes all nearly empty (only phi and branch) basic blocks. This pass can delete loop preheaders which frequently creates critical edges. A preheader can be a convenient place to spill registers to the stack. If the entrance to a loop body is a critical edge, then spills may occur in the loop body rather than immediately before it. This patch protects loop preheaders from deletion in CodeGenPrepare even if they are nearly empty. Since the patch alters the CFG, it affects a large number of test cases. In most cases, the changes are merely cosmetic (basic blocks have different names or instruction orders change slightly). I am somewhat concerned about the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not deleted, then the MIPS backend does not take advantage of a branch delay slot. Consequently, I would like some close review by a MIPS expert. The patch also partially subsumes D16893 from George Burgess IV. George correctly notes that CodeGenPrepare does not actually preserve the dominator tree. I think the dominator tree was usually not valid when CodeGenPrepare ran, but I am using LoopInfo to mark preheaders, so the dominator tree is now always valid before CodeGenPrepare. Author: Tom Jablin (tjablin) Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng http://reviews.llvm.org/D16984 llvm-svn: 265397
* ARM, AArch64, X86: Check preserved registers for tail calls.Matthias Braun2016-04-041-0/+22
| | | | | | | | | | | | | | | | We can only perform a tail call to a callee that preserves all the registers that the caller needs to preserve. This situation happens with calling conventions like preserver_mostcc or cxx_fast_tls. It was explicitely handled for fast_tls and failing for preserve_most. This patch generalizes the check to any calling convention. Related to rdar://24207743 Differential Revision: http://reviews.llvm.org/D18680 llvm-svn: 265329
* Add missing emissionKind flags to the DICompileUnits of several old testcases.Adrian Prantl2016-04-011-1/+1
| | | | llvm-svn: 265192
* testcase gardening: update the emissionKind enum to the new syntax. (NFC)Adrian Prantl2016-04-0118-18/+18
| | | | llvm-svn: 265081
* Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.Adrian Prantl2016-03-317-7/+7
| | | | | | | | | | | | | This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder into DICompileUnit. DIBuilder is not the right place for this enum to live in — a metadata consumer should not have to include DIBuilder.h. I also added a Verifier check that checks that the emission kind of a DICompileUnit is actually legal. http://reviews.llvm.org/D18612 <rdar://problem/25427165> llvm-svn: 265077
* [ARM] Expand v1i64 and v2i64 ctpop.Benjamin Kramer2016-03-311-0/+16
| | | | | | | The default is legal, which results in 'Cannot select' errors. This is triggered during selfhost due to a recent cost model change. llvm-svn: 265040
* Swift Calling Convention: add swiftself attribute.Manman Ren2016-03-291-0/+32
| | | | | | Differential Revision: http://reviews.llvm.org/D17866 llvm-svn: 264754
* [Codegen] Decrease minimum jump table density.Kyle Butt2016-03-291-1/+1
| | | | | | | | | | | Minimum density for both optsize and non optsize are now options -sparse-jump-table-density (default 10) for non optsize functions -dense-jump-table-density (default 40) for optsize functions, which matches the current default. This improves several benchmarks at google at the cost of a small codesize increase. For code compiled with -Os, the old behavior continues llvm-svn: 264689
* fix checks: *_DAG -> *-DAGSanjay Patel2016-03-281-1/+1
| | | | llvm-svn: 264676
* ARM: maintain BB ordering when expanding WIN__DBZCHKSaleem Abdulrasool2016-03-252-25/+64
| | | | | | | | | | | | | It is possible to have a fallthrough MBB prior to MBB placement. The original addition of the BB would result in reordering the BB as not preceding the successor. Because of the fallthrough nature of the BB, we could end up executing incorrect code or even a constant pool island! Insert the spliced BB into the same location to avoid that. Thanks to Tim Northover for invaluable hints and Fiora for the discussion on what may have been occurring! llvm-svn: 264454
* ARM: fix optimised division on WoASaleem Abdulrasool2016-03-251-3/+32
| | | | | | | | | We did not have an explicit branch to the continuation BB. When the check was hoisted, this could permit control follow to fall through into the division trap. Add the explicit branch to the continuation basic block to ensure that code execution is correct. llvm-svn: 264370
* Remove unsafe AssertZext after promoting result of FP_TO_FP16Pirama Arumuga Nainar2016-03-241-0/+12
| | | | | | | | | | | | | | | Summary: Some target lowerings of FP_TO_FP16, for instance ARM's vcvtb.f16.f32 instruction, do not guarantee that the top 16 bits are zeroed out. Remove the unsafe AssertZext and add tests to exercise this. Reviewers: jmolloy, sbaranga, kristof.beyls, aadg Subscribers: llvm-commits, srhines, aemerson Differential Revision: http://reviews.llvm.org/D18426 llvm-svn: 264285
* CodeGen: check return types match when emitting tail call to builtin.Tim Northover2016-03-221-0/+37
| | | | | | | | | | | We were just completely ignoring the types when determining whether we could safely emit a libcall as a tail call. This is clearly wrong. Theoretically, we could dig deeper looking for incidental matches (much like the generic code in Analysis.cpp does), but it's probably not worth it for the few libcalls that exist. llvm-svn: 264084
* ARM: Better codegen for 64-bit compares.Peter Collingbourne2016-03-213-144/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces a custom lowering for ISD::SETCCE (introduced in r253572) that allows us to emit a short code sequence for 64-bit compares. Before: push {r7, lr} cmp r0, r2 mov.w r0, #0 mov.w r12, #0 it hs movhs r0, #1 cmp r1, r3 it ge movge.w r12, #1 it eq moveq r12, r0 cmp.w r12, #0 bne .LBB1_2 @ BB#1: @ %bb1 bl f pop {r7, pc} .LBB1_2: @ %bb2 bl g pop {r7, pc} After: push {r7, lr} subs r0, r0, r2 sbcs.w r0, r1, r3 bge .LBB1_2 @ BB#1: @ %bb1 bl f pop {r7, pc} .LBB1_2: @ %bb2 bl g pop {r7, pc} Saves around 80KB in Chromium's libchrome.so. Some notes on this patch: - I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I introduced (nothing else needs them). However, they are necessary in order to avoid poor codegen, and they seem similar to existing combines in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to (brcond Compare)). - No support for Thumb-1. This is in principle possible, but we'd need to implement ARMISD::SUBE for Thumb-1. Differential Revision: http://reviews.llvm.org/D15256 llvm-svn: 263962
* [ARM] Add Cortex-A32 supportRenato Golin2016-03-211-0/+35
| | | | | | | | Adding Cortex-A32 as an available target in the ARM backend. Patch by Sam Parker. llvm-svn: 263956
* [DAGCombine] Catch the case where extract_vector_elt can cause an any_ext ↵Silviu Baranga2016-03-211-1/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | while processing AND SDNodes Summary: extract_vector_elt can cause an implicit any_ext if the types don't match. When processing the following pattern: (and (extract_vector_elt (load ([non_ext|any_ext|zero_ext] V))), c) DAGCombine was ignoring the possible extend, and sometimes removing the AND even though it was required to maintain some of the bits in the result to 0, resulting in a miscompile. This change fixes the issue by limiting the transformation only to cases where the extract_vector_elt doesn't perform the implicit extend. Reviewers: t.p.northover, jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18247 llvm-svn: 263935
* [CXX_FAST_TLS] Fix issues in ARM.Manman Ren2016-03-181-4/+16
| | | | | | | | | We need to be careful on which registers can be explicitly handled via copies. Prologue, Epilogue use physical registers and if one belongs to the set of CSRsViaCopy, it will no longer be CSRed, since PEI overwrites it after the explicit copies. llvm-svn: 263857
* [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.Manman Ren2016-03-181-0/+13
| | | | | | | Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller and callee have mismatched calling conventions. llvm-svn: 263856
* [CXX_FAST_TLS] fix issues with O0 on ARM, AArch64 and X86.Manman Ren2016-03-181-1/+49
| | | | | | | Since at O0, explicit copies via SplitCSR may not be removed even if they are unnecessary, we choose not to use SplitCSR at O0. llvm-svn: 263855
* ARM: stop asserting on weird <3 x Ty> vectors in ISelLowering.Tim Northover2016-03-172-0/+16
| | | | llvm-svn: 263741
* ARM: Revert SVN r253865, 254158, fix windows divisionSaleem Abdulrasool2016-03-173-81/+95
| | | | | | | | | | | | | | | | | | | | | | The two changes together weakened the test and caused a regression with division handling in MSVC mode. They were applied to avoid an assertion being triggered in the block frequency analysis. However, the underlying problem was simply being masked rather than solved properly. Address the actual underlying problem and revert the changes. Rather than analyze the cause of the assertion, the division failure was assumed to be an overflow. The underlying issue was a subtle bug in the BB construction in the emission of the div-by-zero check (WIN__DBZCHK). We did not construct the proper successor information in the basic blocks, nor did we update the PHIs associated with the basic block when we split them. This would result in assertions being triggered in the block frequency analysis pass. Although the original tests are being removed, the tests themselves performed very little in terms of validation but merely tested that we did not assert when generating code. Update this with new tests that actually ensure that we do not regress on the code generation. llvm-svn: 263714
* [ARM] Cortex-R8 supportAlexandros Lamprineas2016-03-101-0/+31
| | | | | | | | | | | This patch adds Cortex-R8 to Target Parser and TableGen. It also adds CodeGen tests for the build attributes. Patch by Pablo Barrio. Differential Revision: http://reviews.llvm.org/D17925 llvm-svn: 263132
* ARM: follow up improvements for SVN r263118Saleem Abdulrasool2016-03-101-0/+1
| | | | | | | | | | | | | | The initial change was insufficiently complete for always getting the semantics of __builtin_longjmp correct. The builtin is translated into a `tInt_eh_sjlj_longjmp` DAG node. This node set R7 as clobbered. However, the code would then follow up with a clobber of R11. I had failed to notice the imp-def,kill on R7 in the isel. Unfortunately, it seems that it is not possible to conditionalise the Defs list via an !if. Instead, construct a new parallel WIN node and prefer that when targeting windows. This ensures that we now both correctly model the __builtin_longjmp as well as construct the frame in a more ABI conformant manner. llvm-svn: 263123
* ARM: correct __builtin_longjmp on WoASaleem Abdulrasool2016-03-101-0/+16
| | | | | | | | WoA uses r11 as the FP even though it is a pure thumb-2 environment in contrast to AAPCS which states r7. This adjusts __builtin_longjmp to not clobber r7 and to properly restore the frame pointer on execution. llvm-svn: 263118
* [ARM] Merging 64-bit divmod lib calls into oneRenato Golin2016-03-041-1/+18
| | | | | | | | | | | | | | | | | | | | | When div+rem calls on the same arguments are found, the ARM back-end merges the two calls into one __aeabi_divmod call for up to 32-bits values. However, for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't merging the calls, and thus calling ldivmod twice and spilling the temporary results, which generated pretty bad code. This patch legalises 64-bit lib calls for divmod, so that now all the spilling and the second call are gone. It also relaxes the DivRem combiner a bit on the legal type check, since it was already checking for isLegalOrCustom on every value, so the extra check for isTypeLegal was redundant. Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make sure we only emit valid types or the ones that were explicitly marked as custom. Now, passing check-all and test-suite on x86, ARM and AArch64. This patch fixes PR17193 (and a long time FIXME in the tests). llvm-svn: 262738
* llvm/test/CodeGen/ARM/rem_crash.ll: Avoid unsupported targets to specify ↵NAKAMURA Takumi2016-03-031-1/+1
| | | | | | | | | | explicit triple. We will see it for targeting win32; LLVM ERROR: CPU: 'generic' does not support ARM mode execution! llvm-svn: 262668
* Making rem_crash.ll target-specificRenato Golin2016-03-031-0/+257
| | | | | | | | | | This test failed in some ARM bots after a divmod change because it was running on a native llc, instead of targeted one. This makes sure the test is target-specific (as intended), and also copies to ARM and AArch64 directories. If it is also supposed to work on other architectures, I'll leave as an exercise to the respective maintainers. llvm-svn: 262620
* Revert "[ARM] Merging 64-bit divmod lib calls into one"Renato Golin2016-03-031-3/+1
| | | | | | This reverts commit r262507, which broke some ARM buildbots. llvm-svn: 262594
* [ARM] Merging 64-bit divmod lib calls into oneRenato Golin2016-03-021-1/+3
| | | | | | | | | | | | | | | | | When div+rem calls on the same arguments are found, the ARM back-end merges the two calls into one __aeabi_divmod call for up to 32-bits values. However, for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't merging the calls, and thus calling ldivmod twice and spilling the temporary results, which generated pretty bad code. This patch legalises 64-bit lib calls for divmod, so that now all the spilling and the second call are gone. It also relaxes the DivRem combiner a bit on the legal type check, since it was already checking for isLegalOrCustom on every value, so the extra check for isTypeLegal was redundant. This patch fixes PR17193 (and a long time FIXME in the tests). llvm-svn: 262507
* ARM: Introduce conservative load/store optimization modeMatthias Braun2016-03-022-21/+32
| | | | | | | | | | | | | | | | | | | | | | | | Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which means that unaligned loads/stores do not trap and even extensive testing will not catch these bugs. However the multi/double variants are not affected by this bit and will still trap. In effect a more aggressive load/store optimization will break existing (bad) code. These bugs do not necessarily manifest in the broken code where the misaligned pointer is formed but often later in perfectly legal code where it is accessed. This means recompiling system libraries (which have no alignment bugs) with a newer compiler will break existing applications (with alignment bugs) that worked before. So (under protest) I implemented this safe mode which limits the formation of multi/double operations to cases that are not affected by user code (stack operations like spills/reloads) or cases where the normal operations trap anyway (floating point load/stores). It is disabled by default. Differential Revision: http://reviews.llvm.org/D17015 llvm-svn: 262504
* ARM: sink atomic release barrier as far as possible into cmpxchg.Tim Northover2016-02-224-46/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | DMB instructions can be expensive, so it's best to avoid them if possible. In atomicrmw operations there will always be an attempted store so a release barrier is always needed, but in the cmpxchg case we can delay the DMB until we know we'll definitely try to perform a store (and so need release semantics). In the strong cmpxchg case this isn't quite free: we must duplicate the LDREX instructions to skip the barrier on subsequent iterations. The basic outline becomes: ldrex rOld, [rAddr] cmp rOld, rDesired bne Ldone dmb Lloop: strex rRes, rNew, [rAddr] cbz rRes Ldone ldrex rOld, [rAddr] cmp rOld, rDesired beq Lloop Ldone: So we'll skip this version for strong operations in "minsize" functions. llvm-svn: 261568
* [RegAllocFast] Properly track the physical register definitions on calls.Quentin Colombet2016-02-201-1/+3
| | | | | | PR26485 llvm-svn: 261384
* [SjLjEHPrepare] Don't grab pointers to functions in doInitializationDavid Majnemer2016-02-191-0/+31
| | | | | | | | | | Certain optimization passes (like globaldce) can prune function declaration that SjLjEHPrepare assumed would exit when it'd runOnFunction. This fixes PR26669. llvm-svn: 261303
* When printing MIR, output to errs() rather than outs().Justin Lebar2016-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | Summary: Without this, this command $ llvm-run llc -stop-after machine-cp -o - <( echo '' ) outputs an error, because we close stdout twice -- once when closing the file opened for "-o", and again when closing outs(). Also clarify in the outs() definition that you can't ever call it if you want to open your own raw_fd_ostream on stdout. Reviewers: jroelofs, tstellarAMD Subscribers: jholewinski, qcolombet, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D17422 llvm-svn: 261286
* Fix some erroneous lit test failures due to unlucky name of working directory.Mitch Bodart2016-02-173-7/+16
| | | | | | Differential Revision: http://reviews.llvm.org/D17044 llvm-svn: 261104
* ARM: support TLS for WoASaleem Abdulrasool2016-02-031-0/+143
| | | | | | | | | | | Add support for TLS access for Windows on ARM. This generates a similar access to MSVC for ARM. The changes to the tablegen data is needed to support loading an external symbol global that is not for a call. The adjustments to the DAG to DAG transforms are needed to preserve the 32-bit move. llvm-svn: 259676
* [ARM] Move GNUEABI divmod to __aeabi_divmod*Renato Golin2016-02-031-47/+2
| | | | | | | | | | The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores which happens to be a lot faster than __divsi3/__modsi3 when the core has hardware divide instructions. Do the same here. Fixes PR26450. llvm-svn: 259657
* Removed FeatureVFPOnlySP from the Cortex-R7 processor modelSjoerd Meijer2016-02-021-2/+1
| | | | | | | | | description and changed the regression test accordingly. The default configuration of a Cortex-R7 is to implement the VFPv3-D16 architecture and the feature line as it was is too restrictive. llvm-svn: 259480
* ARM: don't mangle DAG constant if it has more than one useTim Northover2016-01-291-0/+17
| | | | | | | | | | | | | | | | The basic optimisation was to convert (mul $LHS, $complex_constant) into roughly "(shl (mul $LHS, $simple_constant), $simple_amt)" when it was expected to be cheaper. The original logic checks that the mul only has one use (since we're mangling $complex_constant), but when used in even more complex addressing modes there may be an outer addition that can pick up the wrong value too. I *think* the ARM addressing-mode problem is actually unreachable at the moment, but that depends on complex assessments of the profitability of pre-increment addressing modes so I've put a real check in there instead of an assertion. llvm-svn: 259228
* [ARM] Emit trap instruction using .inst directiveAlexandros Lamprineas2016-01-292-20/+49
| | | | | | | | | | The trap instruction is emitted as a data-in-text rather than an instruction. This patch uses the .inst directive for emitting trap. Differential Revision: http://reviews.llvm.org/D16684 llvm-svn: 259182
OpenPOWER on IntegriCloud