path: root/llvm/test/CodeGen/ARM
Commit message (Author, Date, Files changed, Lines -removed/+added)
...
* [ARM] GlobalISel: Add support for G_SUB (Diana Picus, 2017-04-18, 5 files, -0/+329)
  Support G_SUB throughout the GlobalISel pipeline. It is exactly the same as G_ADD, nothing fancy. llvm-svn: 300546
* [ARM] Check for correct HW div when lowering divmod (Diana Picus, 2017-04-18, 1 file, -0/+37)
  For subtargets that use the custom lowering for divmod, e.g. gnueabi, we used to check if the subtarget has hardware divide and then lower to a div-mul-sub sequence if true, or to a libcall if false. However, judging by the usage of hasDivide vs hasDivideInARMMode, it seems that hasDivide only refers to Thumb. For instance, in the ARMTargetLowering constructor, the code that specifies whether to use libcalls for (S|U)DIV looks like this:
    bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide()
                                          : Subtarget->hasDivideInARMMode();
  In the case of divmod for arm-gnueabi, using only hasDivide() to determine what to do means that instead of lowering to __aeabi_idivmod to get the remainder, we lower to div-mul-sub and then further lower the div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but not in Thumb, we generate a libcall instead of using it (this is not an issue in practice since AFAICT none of the cores that we support have hardware divide in ARM but not Thumb).
  This patch fixes the code dealing with custom lowering to take into account the mode (Thumb or ARM) when deciding whether or not hardware division is available.
  Differential Revision: https://reviews.llvm.org/D32005
  llvm-svn: 300536
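  A minimal IR sketch of the kind of case this affects (hypothetical function, not taken from the commit's test file; assuming an arm-gnueabi triple where __aeabi_idivmod is available):
    define i32 @remainder(i32 %a, i32 %b) {
      ; On arm-gnueabi without usable hardware divide, this should lower to a
      ; single call to __aeabi_idivmod (which returns quotient and remainder),
      ; not to a div-mul-sub sequence whose div then becomes __aeabi_idiv.
      %r = srem i32 %a, %b
      ret i32 %r
    }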
* GlobalISel: Allow legalizing G_FADD to a libcall (Diana Picus, 2017-04-11, 2 files, -6/+112)
  Use the same handling in the generic legalizer code as for the other libcalls (G_FREM, G_FPOW). Enable it on ARM for float and double so we can test it. llvm-svn: 299931
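  For illustration, a small IR function that exercises this path (hypothetical example; on a soft-float AEABI target the add is expected to become a runtime call such as __aeabi_fadd):
    define float @add_floats(float %x, float %y) {
      ; With GlobalISel on a soft-float ARM subtarget, G_FADD has no legal
      ; instruction and is legalized to a floating-point add libcall.
      %sum = fadd float %x, %y
      ret float %sum
    }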
* [SelectionDAG] Check CALLSEQ_BEGIN nodes in DelayForLiveRegs (Sam Parker, 2017-04-11, 1 file, -0/+136)
  A fix for the bug reported in PR30911. The issue arises when multiple CALLSEQ_BEGIN nodes are unscheduled as the last node to be unscheduled will gain access to the CallResource register. But when a node is being picked, only CALLSEQ_END nodes are checked against the CallResource and have their chains evaluated. This then means that other CALLSEQ_BEGIN nodes can be scheduled before the existing call sequence has been finalised. This patch adds a check against the FrameSetup nodes in DelayForLiveRegs to prevent this from happening.
  Differential Revision: https://reviews.llvm.org/D31536
  llvm-svn: 299926
* [ARM, x86] add tests to show possible improvement for bool math; NFC (Sanjay Patel, 2017-04-10, 1 file, -0/+32)
  llvm-svn: 299897
* Add address space mangling to lifetime intrinsics (Matt Arsenault, 2017-04-10, 3 files, -11/+11)
  In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876
* [ARM] GlobalISel: Support G_FPOW for float and double (Diana Picus, 2017-04-10, 2 files, -3/+114)
  Legalize to a libcall. llvm-svn: 299841
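  A hypothetical IR snippet of the sort this legalization handles (not from the commit itself; the pow intrinsic is expected to end up as a call to a runtime routine such as powf):
    declare float @llvm.pow.f32(float, float)

    define float @power(float %base, float %exp) {
      ; G_FPOW has no ARM instruction, so the legalizer turns it into a libcall.
      %p = call float @llvm.pow.f32(float %base, float %exp)
      ret float %p
    }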
* [ARM] Prefer BIC over BFC in ARM mode. (Eli Friedman, 2017-04-07, 7 files, -19/+25)
  BIC is generally faster, and it can put the output in a different register from the input. We already do this in Thumb2 mode; not sure why the equivalent fix never got applied to ARM mode.
  Differential Revision: https://reviews.llvm.org/D31797
  llvm-svn: 299803
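  A minimal sketch of the kind of bit-clearing pattern involved (hypothetical function; clearing a contiguous bitfield like this could be selected as either BFC or BIC with the inverted mask):
    define i32 @clear_byte1(i32 %x) {
      ; Clears bits 8-15; in ARM mode this is now preferred as BIC rather than BFC.
      %r = and i32 %x, -65281   ; -65281 == 0xFFFF00FF
      ret i32 %r
    }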
* [ARM] GlobalISel: Test hard float properly (Diana Picus, 2017-04-07, 1 file, -16/+26)
  It turns out -float-abi=hard doesn't set the hard float calling convention for libcalls. We need to use a hard float triple instead (e.g. gnueabihf). llvm-svn: 299761
* [ARM] GlobalISel: Support frem for 64-bit values (Diana Picus, 2017-04-07, 2 files, -0/+58)
  Legalize to a libcall. llvm-svn: 299756
* [ARM] GlobalISel: Support frem for 32-bit values (Diana Picus, 2017-04-07, 2 files, -0/+48)
  Legalize to a libcall. On this occasion, also start allowing soft float subtargets. For the moment G_FREM is the only legal floating point operation for them. llvm-svn: 299753
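  For reference, a tiny IR example of the operation being legalized (hypothetical; frem has no ARM instruction and is expected to become a call to the C library's fmodf):
    define float @remainder_f32(float %x, float %y) {
      ; G_FREM on 32-bit floats is legalized to a libcall (fmodf).
      %r = frem float %x, %y
      ret float %r
    }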
* Turn on -addr-sink-using-gep by default. (Eli Friedman, 2017-04-06, 1 file, -1/+0)
  The new codepath has been in the tree for years, and there isn't any reason to use two codepaths here.
  Differential Revision: https://reviews.llvm.org/D30596
  llvm-svn: 299723
* [SelectionDAG] [ARM CodeGen] Fix chain information of LowerMUL (Huihui Zhang, 2017-04-06, 1 file, -0/+115)
  In LowerMUL, the chain information is not preserved for the newly created Load SDNode. For example, if a Store aliases with one of the operands of the Mul, the Load for that operand needs to be scheduled before the Store. The dependence is recorded in the chain of the Store, in the TokenFactor. However, when lowering MUL, the SDNodes for the new Loads for VMULL are not updated in the TokenFactor for the Store. Thus the chain is not preserved for the lowered VMULL. llvm-svn: 299701
* Revert "[ARM] Add Kryo to available targets"Yi Kong2017-04-061-1/+0
| | | | | | | | This reverts commit 942d6e6f58bf7e63810dd7cbcbce1fdfa5ebc6d4. Build breakage. llvm-svn: 299689
* [SDAG] Fix visitAND optimization to deal with vector extract case again. (Nirav Dave, 2017-04-06, 1 file, -0/+22)
  Summary: Fix case elided by rL298920. Fixes PR32545.
  Reviewers: eli.friedman, RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D31759
  llvm-svn: 299688
* [ARM] Add Kryo to available targets (Yi Kong, 2017-04-06, 1 file, -0/+1)
  Summary: Host CPU detection now supports Kryo, so we need to recognize it in ARM target.
  Reviewers: mcrosier, t.p.northover, rengolin, echristo, srhines
  Reviewed By: t.p.northover, echristo
  Subscribers: aemerson
  Differential Revision: https://reviews.llvm.org/D31775
  llvm-svn: 299674
* [DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401) (Sanjay Patel, 2017-04-05, 1 file, -14/+9)
  This is a generic combine enabled via target hook to reduce icmp logic as discussed in: https://bugs.llvm.org/show_bug.cgi?id=32401
  It's likely that other targets will want to enable this hook for scalar transforms, and there are probably other patterns that can use bitwise logic to reduce comparisons.
  Note that we are missing an IR canonicalization for these patterns, and we will probably prefer the pair-of-compares form in IR (shorter, more likely to fold).
  Differential Revision: https://reviews.llvm.org/D31483
  llvm-svn: 299542
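  As a sketch of the icmp-logic reduction being discussed (hypothetical IR, not the exact test from the patch):
    define i1 @both_zero(i32 %a, i32 %b) {
      ; and-of-seteq: (%a == 0) & (%b == 0) can be rewritten with bitwise
      ; logic plus a single compare: (%a | %b) == 0.
      %c1 = icmp eq i32 %a, 0
      %c2 = icmp eq i32 %b, 0
      %r = and i1 %c1, %c2
      ret i1 %r
    }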
* add/move codegen tests for and/or of setcc; NFC (Sanjay Patel, 2017-04-03, 2 files, -36/+79)
  llvm-svn: 299396
* [CodeGen] clean up and add tests for scalar and-of-setcc; NFC (Sanjay Patel, 2017-03-29, 2 files, -14/+36)
  https://bugs.llvm.org/show_bug.cgi?id=32401
  llvm-svn: 299034
* Improve machine schedulers for in-order processors (Javed Absar, 2017-03-27, 1 file, -0/+86)
  This patch enables schedulers to specify instructions that cannot be issued with any other instructions. It also fixes BeginGroup/EndGroup.
  Reviewed by: Andrew Trick
  Differential Revision: https://reviews.llvm.org/D30744
  llvm-svn: 298885
* [ARM] Fix mixup between Lo and Hi in SMLALBB formation. (Eli Friedman, 2017-03-25, 1 file, -84/+84)
  llvm-svn: 298752
* [ARM] Fix computeKnownBits for ARMISD::CMOV (Pirama Arumuga Nainar, 2017-03-23, 1 file, -0/+19)
  Summary: The true and false operands for the CMOV are operands 0 and 1. ARMISelLowering.cpp::computeKnownBits was looking at operands 1 and 2 instead. This can cause CMOV instructions to be incorrectly folded into BFI if the value set by the CMOV is another CMOV, whose known bits are computed incorrectly. This patch fixes the issue and adds a test case.
  Reviewers: kristof.beyls, jmolloy
  Subscribers: llvm-commits, aemerson, srhines, rengolin
  Differential Revision: https://reviews.llvm.org/D31265
  llvm-svn: 298624
* [GlobalISel] Fix shufflevector tests (Volkan Keles, 2017-03-21, 1 file, -25/+25)
  clang-lld-x86_64-2stage fails because of the order of the instructions. `CHECK-DAG` directives should fix the problem. llvm-svn: 298367
* [GlobalISel] Translate shufflevector (Volkan Keles, 2017-03-21, 1 file, -0/+80)
  Reviewers: qcolombet, aditya_nandakumar, t.p.northover, javed.absar, ab, dsanders
  Reviewed By: javed.absar
  Subscribers: dberris, rovka, llvm-commits, kristof.beyls
  Differential Revision: https://reviews.llvm.org/D30962
  llvm-svn: 298347
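  A small IR example of the operation being translated (hypothetical input; how the IRTranslator represents it in generic MIR is not shown here):
    define <4 x i32> @interleave_lo(<4 x i32> %a, <4 x i32> %b) {
      ; Interleaves the low halves of %a and %b.
      %s = shufflevector <4 x i32> %a, <4 x i32> %b,
                         <4 x i32> <i32 0, i32 4, i32 1, i32 5>
      ret <4 x i32> %s
    }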
* [ARM] Fix PR32130: Handle promotion of zero sized constants. (Vadzim Dambrouski, 2017-03-20, 1 file, -0/+9)
  The special case of zero sized values was previously not handled correctly. This patch handles this by not promoting if the size is zero.
  Patch by Tim Neumann.
  Differential Revision: https://reviews.llvm.org/D31116
  llvm-svn: 298320
* [GlobalISel] Use the correct calling conv for calls (Diana Picus, 2017-03-20, 1 file, -0/+17)
  This commit adds a parameter that lets us pass in the calling convention of the call to CallLowering::lowerCall. This allows us to handle situations where the calling convention of the callee is different from that of the caller.
  Differential Revision: https://reviews.llvm.org/D31039
  llvm-svn: 298254
* [GlobalISel] Don't select trivially dead instructions. (Ahmed Bougacha, 2017-03-19, 1 file, -2/+8)
  Folding instructions when selecting can cause them to become dead. Don't select these dead instructions (if they don't have other side effects, and don't define physical registers). Preserve existing tests by adding COPYs. In some tests, the G_CONSTANT vregs never get constrained to a class: the only use of the vreg was folded into another instruction, so the G_CONSTANT, now dead, never gets selected. llvm-svn: 298224
* [SelectionDAG] Remove redundant stores more aggressively. (Eli Friedman, 2017-03-17, 1 file, -2/+1)
  Handle TokenFactors more aggressively in SDValue::reachesChainWithoutSideEffects. This isn't really a very effective change anymore because of other changes to chain handling, but it's a cheap check, and the expanded comments are still useful. It might be possible to loosen the hasOneUse() requirement with a deeper analysis, but a naive implementation of that check would be expensive.
  Differential Revision: https://reviews.llvm.org/D29845
  llvm-svn: 298156
* [ARM] Use alias analysis in ARMPreAllocLoadStoreOpt. (Eli Friedman, 2017-03-17, 2 files, -0/+58)
  This allows the optimization to rearrange loads and stores more aggressively. This doesn't really affect performance, but it helps codesize.
  Differential Revision: https://reviews.llvm.org/D30839
  llvm-svn: 298021
* PR32288: More efficient encoding for DWARF expr subregister access. (Adrian Prantl, 2017-03-16, 2 files, -3/+1)
  Citing http://bugs.llvm.org/show_bug.cgi?id=32288: the DWARF generated by LLVM includes this location:
    0x55 0x93 0x04   DW_OP_reg5 DW_OP_piece(4)
  whereas GCC's DWARF is simply 0x55 (DW_OP_reg5), without the DW_OP_piece. I believe it's reasonable to assume the DWARF consumer knows which part of a register logically holds the value (low bytes, high bytes, how many bytes, etc) for a primitive value like an integer. This patch gets rid of the redundant DW_OP_piece when a subregister is at offset 0. It also adds previously missing subregister masking when a subregister is followed by another operation. (This reapplies r297960 with two additional testcase updates.)
  rdar://problem/31069390
  https://reviews.llvm.org/D31010
  llvm-svn: 297965
* [SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.Jonas Paulsson2017-03-161-13/+9
| | | | | | | | | | | | | | | | | | | | | | | | Don't scalarize VSELECT->SETCC when operands/results needs to be widened, or when the type of the SETCC operands are different from those of the VSELECT. (VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled. The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is no longer needed and has been removed. Updated tests: test/CodeGen/ARM/vuzp.ll test/CodeGen/NVPTX/f16x2-instructions.ll test/CodeGen/X86/2011-10-19-widen_vselect.ll test/CodeGen/X86/2011-10-21-widen-cmp.ll test/CodeGen/X86/psubus.ll test/CodeGen/X86/vselect-pcmp.ll Review: Eli Friedman, Simon Pilgrim https://reviews.llvm.org/D29489 llvm-svn: 297930
* ARM: avoid clobbering register in v6 jump-table expansion. (Tim Northover, 2017-03-15, 1 file, -0/+384)
  If we got unlucky with register allocation and actual constpool placement, we could end up producing a tTBB_JT with an index that's already been clobbered. Technically, we might be able to fix this situation up with a MOV, but I think the constant islands pass is complex enough without having to deal with more weird edge-cases. llvm-svn: 297871
* [ARM] Enable SMLAL[B|T] isel (Sam Parker, 2017-03-15, 1 file, -41/+205)
  Enable the selection of the 64-bit signed multiply accumulate instructions which operate on 16-bit operands. These are enabled for ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb architectures.
  Differential Revision: https://reviews.llvm.org/D30044
  llvm-svn: 297809
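  A hedged sketch of the kind of source pattern these instructions target (hypothetical function; a signed 16x16 multiply accumulated into a 64-bit value is the shape SMLALBB is meant to cover):
    define i64 @smlal_like(i16 %a, i16 %b, i64 %acc) {
      ; Sign-extend two 16-bit operands, multiply, and accumulate into 64 bits.
      %a32 = sext i16 %a to i32
      %b32 = sext i16 %b to i32
      %mul = mul i32 %a32, %b32
      %mul64 = sext i32 %mul to i64
      %sum = add i64 %acc, %mul64
      ret i64 %sum
    }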
* [ARM] Move SMULW[B|T] isel to DAG Combine (Sam Parker, 2017-03-14, 1 file, -0/+29)
  Create nodes for smulwb and smulwt and move their selection from DAGToDAG to DAG combine. smlawb and smlawt can then be selected using tablegen. Added some helper functions to detect shift patterns as well as a wrapper around SimplifyDemandedBits. Added a couple of extra tests.
  Differential Revision: https://reviews.llvm.org/D30708
  llvm-svn: 297716
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2017-03-14, 6 files, -121/+94)
  Recommitting with compile time improvements. Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
  * Simplify Consecutive Merge Store Candidate Search
  Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as it separates the non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited.
  This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which require more expensive constant generation).
  Some minor peephole optimizations to deal with improved SubDAG shapes (listed below).
  Additional Minor Changes:
    1. Finishes removing unused AliasLoad code.
    2. Unifies the chain aggregation in the merged stores across code paths.
    3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
    4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
    5. Removes chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence.
    6. Forwards load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values.
    7. Adds a peephole to convert a buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll).
    8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll. Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well.
  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  llvm-svn: 297695
* [ARM] GlobalISel: Support SP in regbankselect (Diana Picus, 2017-03-13, 1 file, -0/+35)
  We used to hit an unreachable in getRegBankFromRegClass when dealing with the stack pointer. This commit adds support for the GPRsp reg class. llvm-svn: 297621
* [DAGCombine] Simplify ISD::AND in GetDemandedBits. (Eli Friedman, 2017-03-08, 1 file, -49/+28)
  This helps in cases involving bitfields where an AND is exposed by legalization.
  Differential Revision: https://reviews.llvm.org/D30472
  llvm-svn: 297249
* SjLjEHPrepare: Fix the pass for swifterror arguments (Arnold Schwaighofer, 2017-03-07, 1 file, -0/+27)
  We cannot leave the identity copies 'select true, arg, undef' that this pass inserts for arguments to simplify handling of values on swifterror arguments. swifterror arguments have restrictions on their uses.
  rdar://30839288
  llvm-svn: 297197
* [ARM] Reapply r296865 "[ARM] fpscr read/write intrinsics not aware of each other" (Ranjeet Singh, 2017-03-07, 1 file, -0/+44)
  The original patch r296865 was reverted as it broke the chromium builds for Android (https://bugs.llvm.org/show_bug.cgi?id=32134); this patch reapplies r296865 with a fix to make sure it doesn't cause the build regression. The problem was that intrinsic selection on int_arm_get_fpscr was failing in ISel. This was because the code to manually select this intrinsic still thought it was the version with no side-effects (INTRINSIC_WO_CHAIN), which is wrong as it doesn't semantically match the definition in the tablegen code, which says it does have side-effects. I've fixed this by updating the intrinsic type to INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on Hans' original reproducer.
  Differential Revision: https://reviews.llvm.org/D30645
  llvm-svn: 297137
* In Thumb1, materialize a move between low registers as a `movs`, if CPSR isn't live. (Artyom Skrobov, 2017-03-07, 3 files, -27/+28)
  Summary: Previously, it had always been materialized as a push/pop sequence.
  Reviewers: labrinea, jroelofs
  Reviewed By: jroelofs
  Subscribers: llvm-commits, rengolin
  Differential Revision: https://reviews.llvm.org/D30648
  llvm-svn: 297134
* GlobalISel: restrict G_EXTRACT instruction to just one operand. (Tim Northover, 2017-03-06, 4 files, -12/+24)
  A bit more painful than G_INSERT because it was more widely used, but this should simplify the handling of extract operations in most locations. llvm-svn: 297100
* Revert r296865 "[ARM] fpscr read/write intrinsics not aware of each other" (Hans Wennborg, 2017-03-03, 1 file, -23/+0)
  It caused PR32134: "Cannot select: intrinsic %llvm.arm.get.fpscr". llvm-svn: 296926
* [ARM] fpscr read/write intrinsics not aware of each other (Ranjeet Singh, 2017-03-03, 1 file, -0/+23)
  The intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr read and write to the fpscr (Floating-Point Status and Control Register) register. A bug exists in the __builtin_arm_get_fpscr intrinsic definition in llvm which treats this intrinsic as IntrNoMem, which means it's not a memory access and doesn't have any other side-effects. Having this property on this intrinsic means that various optimizations can be done on it, such as common sub-expression elimination with other reads. This can cause issues if there has been a write to this register, e.g.
    void foo(int *p) {
      p[0] = __builtin_arm_get_fpscr();
      __builtin_arm_set_fpscr(1);
      p[1] = __builtin_arm_get_fpscr();
    }
  In the above example the second read is currently CSE'd into the first read. This is because llvm isn't aware that the write done by __builtin_arm_set_fpscr affects the same register that __builtin_arm_get_fpscr reads from. To fix this problem I've removed the property IntrNoMem so that __builtin_arm_get_fpscr is treated as a memory access.
  Differential Revision: https://reviews.llvm.org/D30542
  llvm-svn: 296865
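  At the IR level, the pattern from the C example above looks roughly like this (hedged sketch; the intrinsic declarations match the ARM intrinsics referenced in the message, but the function itself is hypothetical):
    declare i32 @llvm.arm.get.fpscr()
    declare void @llvm.arm.set.fpscr(i32)

    define void @foo(i32* %p) {
      ; With IntrNoMem on llvm.arm.get.fpscr, %v1 and %v2 could be CSE'd even
      ; though the set.fpscr call in between changes the register.
      %v1 = call i32 @llvm.arm.get.fpscr()
      store i32 %v1, i32* %p
      call void @llvm.arm.set.fpscr(i32 1)
      %p1 = getelementptr i32, i32* %p, i32 1
      %v2 = call i32 @llvm.arm.get.fpscr()
      store i32 %v2, i32* %p1
      ret void
    }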
* [SDAG] Revert r296476 (and r296486, r296668, r296690). (Chandler Carruth, 2017-03-03, 6 files, -81/+102)
  This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862
* [ARM] Fix insert point for store rescheduling. (Eli Friedman, 2017-03-02, 2 files, -13/+57)
  In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would move stores to the wrong insert point.
  Re-commit with a fix to increment NumMove in the right place.
  Differential Revision: https://reviews.llvm.org/D30124
  llvm-svn: 296815
* [DAGCombiner] avoid assertion when folding binops with opaque constants (Sanjay Patel, 2017-03-02, 1 file, -0/+51)
  This bug was introduced with: https://reviews.llvm.org/rL296699
  There may be a way to loosen the restriction, but for now just bail out on any opaque constant. The tests show that opacity is target-specific. This goes back to cost calculations in ConstantHoisting based on TTI->getIntImmCost().
  llvm-svn: 296768
* Revert r296708; causing test failures on ARM hosts. (Eli Friedman, 2017-03-02, 1 file, -19/+13)
  Original commit message:
  [ARM] Fix insert point for store rescheduling.
  In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would sink stores for no reason.
  llvm-svn: 296718
* [ARM] Fix insert point for store rescheduling. (Eli Friedman, 2017-03-01, 1 file, -13/+19)
  In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would sink stores for no reason.
  Differential Revision: https://reviews.llvm.org/D30124
  llvm-svn: 296708
* [ARM] Check correct instructions for load/store rescheduling. (Eli Friedman, 2017-03-01, 3 files, -16/+142)
  This code starts from the high end of the sorted vector of offsets, and works backwards: it tries to find contiguous offsets, processes them, then pops them from the end of the vector. Most of the code agrees with this order of processing, but one loop doesn't: it instead processes elements from the low end of the vector (which are nodes with unrelated offsets). Fix that loop to process the correct elements.
  This has a few implications. One, we don't incorrectly return early when processing multiple groups of offsets in the same block (which allows rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert point for loads, so they're correctly sorted (which affects the scheduling of vldm-liveness.ll). I think it might also impact some of the heuristics slightly.
  Differential Revision: https://reviews.llvm.org/D30368
  llvm-svn: 296701
* [DAGCombiner] fold binops with constant into select-of-constants (Sanjay Patel, 2017-03-01, 1 file, -7/+5)
  This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect().
  I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).
  The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105
  Differential Revision: https://reviews.llvm.org/D30502
  llvm-svn: 296699
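  A small illustrative IR example of the fold described above (hypothetical; the binop with a constant is folded into the select of constants):
    define i32 @add_into_select(i1 %cond) {
      ; (select %cond, 2, 5) + 10 can be folded to select %cond, 12, 15.
      %sel = select i1 %cond, i32 2, i32 5
      %res = add i32 %sel, 10
      ret i32 %res
    }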