path: root/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
* [SDAG] Remove the reliance on MI's allocation strategy for (Chandler Carruth, 2018-08-14; 1 file, -31/+22)

  `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array.

  Prior to this change, the memory management was deeply confusing here -- the way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does.

  This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer.

  As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different, which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be *surprised* at where that occurs. =]

  Differential Revision: https://reviews.llvm.org/D50680
  llvm-svn: 339740
* [ARM] Adjust AND immediates to make them cheaper to select. (Eli Friedman, 2018-08-10; 1 file, -0/+5)

  LLVM normally prefers to minimize the number of bits set in an AND immediate, but that doesn't always match the available ARM instructions. In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer a two-instruction sequence movs+ands or movs+bics.

  Some potential improvements are outlined in ARMTargetLowering::targetShrinkDemandedConstant, but it seems to work pretty well already.

  The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction due to a larger-than-expected mask. (It's orthogonal, in some sense, but as far as I can tell it's either impossible or nearly impossible to reproduce the bug without this change.)

  According to my testing, this seems to consistently improve codesize by a small amount by forming bic more often for ISD::AND with an immediate.

  Differential Revision: https://reviews.llvm.org/D50030
  llvm-svn: 339472
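  The constant-shaping idea can be illustrated with a small standalone sketch (an assumed heuristic for illustration only, not the actual targetShrinkDemandedConstant implementation): widen an AND mask so that it is unchanged on the bits the user actually demands, but the new constant is cheap for Thumb1 (0xFF/0xFFFF for uxtb/uxth, or the complement of an 8-bit immediate for movs+bics).

    #include <cstdint>
    #include <cstdio>

    // Try to replace 'mask' with a cheaper-to-materialize constant that agrees
    // with it on all demanded bits. The candidate list is illustrative only.
    bool widenAndMask(uint32_t &mask, uint32_t demanded) {
      const uint32_t candidates[] = {0xFFu, 0xFFFFu, ~0xFFu};
      for (uint32_t c : candidates) {
        if ((mask & demanded) == (c & demanded) && c != mask) {
          mask = c;
          return true;
        }
      }
      return false;
    }

    int main() {
      uint32_t m = 0x3F;                       // would otherwise need movs+ands
      if (widenAndMask(m, 0x3F))               // only bits 0-5 are demanded downstream
        std::printf("widened to %#x\n", m);    // 0xff, selectable as a single uxtb
    }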
* [ARM] FP16: codegen support for VTRN (Sjoerd Meijer, 2018-08-09; 1 file, -0/+2)

  Differential Revision: https://reviews.llvm.org/D50454
  llvm-svn: 339340
* [ARM] FP16: support vector zip and unzip (Sjoerd Meijer, 2018-08-03; 1 file, -0/+4)

  This is addressing PR38404.

  Differential Revision: https://reviews.llvm.org/D50186
  llvm-svn: 338835
* Remove trailing space (Fangrui Song, 2018-07-30; 1 file, -4/+4)

  sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

  llvm-svn: 338293
* [ARM] Assert that ARMDAGToDAGISel creates valid UBFX/SBFX nodes. (Eli Friedman, 2018-06-28; 1 file, -0/+4)

  We don't ever check these again (unless you're using -fno-integrated-as), so make sure the extracted bits are well-defined.

  I don't think it's possible to trigger any of the assertions on trunk, but it's difficult to prove. (The first one depends on DAGCombine to minimize the number of set bits in AND masks; I think the others are mathematically impossible to hit.)

  llvm-svn: 335931
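  As a rough standalone illustration of the invariant being asserted (an assumed form, not the exact asserts added here), a UBFX-style extract is only well-defined when the lsb/width operands describe a field that lies entirely within the 32-bit register:

    #include <cassert>

    // Extract 'width' bits starting at 'lsb', as ARM UBFX does; the asserts
    // mirror the architectural constraints lsb in [0,31], width in [1, 32-lsb].
    unsigned ubfx(unsigned x, unsigned lsb, unsigned width) {
      assert(width >= 1 && width <= 32 && "width out of range");
      assert(lsb <= 31 && lsb + width <= 32 && "field runs past bit 31");
      return (x >> lsb) & (width == 32 ? ~0u : ((1u << width) - 1));
    }

    int main() { assert(ubfx(0xABCD1234u, 8, 8) == 0x12); }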
* [NEON] Support vldNq intrinsics in AArch32 (LLVM part) (Ivan A. Kosarev, 2018-06-27; 1 file, -55/+145)

  This patch adds support for the q versions of the dup (load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for example.

  Currently, non-q versions of the dup intrinsics are implemented in clang by generating IR that first loads the elements of the structure into the first lane with the lane (to-single-lane) intrinsics, and then propagates it to the other lanes. There are at least two problems with this approach. First, there are no double-spaced to-single-lane byte-element instructions. For example, there is no such instruction as 'vld2.8 { d0[0], d2[0] }, [r0]'. That means we cannot rely on the to-single-lane intrinsics and instructions to implement the q versions of the dup intrinsics. Note that to-all-lanes instructions do support all sizes of data items, including bytes.

  The second problem with the current approach is that we need a separate vdup instruction to propagate the structure to each lane. So for vld4q_dup_f16() we would need four vdup instructions in addition to the initial vld instruction.

  This patch introduces dup LLVM intrinsics and reworks handling of the currently supported (non-q) NEON dup intrinsics to expand them into those LLVM intrinsics, thus eliminating the need for using to-single-lane intrinsics and instructions.

  Additionally, this patch adds support for u64 and s64 dup NEON intrinsics. These are marked as AArch64-only in the ARM NEON Reference, but it seems there are no reasons to not support them in AArch32 mode. Please correct me if that is wrong.

  That's what we generate with this patch applied:

  vld2q_dup_f16:
    vld2.16 {d0[], d2[]}, [r0]
    vld2.16 {d1[], d3[]}, [r0]

  vld3q_dup_f16:
    vld3.16 {d0[], d2[], d4[]}, [r0]
    vld3.16 {d1[], d3[], d5[]}, [r0]

  vld4q_dup_f16:
    vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
    vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

  Differential Revision: https://reviews.llvm.org/D48439
  llvm-svn: 335733
* ARM: convert ORR instructions to ADD where possible on Thumb. (Tim Northover, 2018-06-20; 1 file, -0/+10)

  Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a 3-address encoding and a wider range of immediates. So, particularly when optimizing for code size (but it doesn't make things worse elsewhere), it's beneficial to select an ORR operation as an ADD if we know overflow won't occur.

  This is made even better by LLVM's penchant for putting operations in canonical form by converting the other way.

  llvm-svn: 335119
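  The soundness condition is easy to demonstrate in isolation (a standalone check, not LLVM code): when the two operands share no set bits there are no carries, so OR and ADD produce identical results and the more flexible ADD encodings can be used.

    #include <cassert>
    #include <cstdint>

    int main() {
      uint32_t x = 0xF0, y = 0x0F;   // disjoint bit patterns
      assert((x & y) == 0);          // precondition: no bit is set in both
      assert((x | y) == (x + y));    // hence ORR and ADD agree
    }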
* [NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part) (Ivan A. Kosarev, 2018-06-10; 1 file, -3/+46)

  We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics.

  Differential Revision: https://reviews.llvm.org/D47447
  llvm-svn: 334361
* [ARM] Allow CMPZ transforms even if the input has multiple uses. (Eli Friedman, 2018-06-08; 1 file, -1/+1)

  It looks like this got left in by accident in r289794; I can't think of any reason this check would be necessary. (Maybe it was meant to be a check that the AND has one use? But we check that a few lines earlier.)

  Differential Revision: https://reviews.llvm.org/D47921
  llvm-svn: 334322
* [NEON] Support VLD1xN intrinsics in AArch32 mode (LLVM part) (Ivan A. Kosarev, 2018-06-02; 1 file, -3/+46)

  We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics.

  Differential Revision: https://reviews.llvm.org/D47120
  llvm-svn: 333825
* Revert r333819 "[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)" (Ivan A. Kosarev, 2018-06-02; 1 file, -46/+3)

  The LLVM part was committed instead of the Clang part.

  Differential Revision: https://reviews.llvm.org/D47121
  llvm-svn: 333824
* [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part) (Ivan A. Kosarev, 2018-06-02; 1 file, -3/+46)

  We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics.

  Differential Revision: https://reviews.llvm.org/D47121
  llvm-svn: 333819
* Remove \brief commands from doxygen comments. (Adrian Prantl, 2018-05-01; 1 file, -8/+8)

  We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers in our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all.

  Patch produced by

    for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

  Differential Revision: https://reviews.llvm.org/D46290
  llvm-svn: 331272
* [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" (Nirav Dave, 2018-03-19; 1 file, -1/+1)

  Reland ISel cycle checking improvements after simplifying node id invariant traversal and correcting typo.

  llvm-svn: 327898
* [ARM] Support for v4f16 and v8f16 vectors (Sjoerd Meijer, 2018-03-19; 1 file, -0/+2)

  This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which use v4f16 and v8f16 vector operands and return values. All the moving parts are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16 intrinsic. In a follow-up patch the rest of the intrinsics and tests will be added.

  Differential Revision: https://reviews.llvm.org/D44538
  llvm-svn: 327839
* Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172""Nirav Dave2018-03-171-1/+1
| | | | | | as it times out building test-suite on PPC. llvm-svn: 327778
* [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" (Nirav Dave, 2018-03-17; 1 file, -1/+1)

  Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal.

  llvm-svn: 327777
* Revert: r327172 "Correct load-op-store cycle detection analysis" (Nirav Dave, 2018-03-10; 1 file, -1/+1)

  r327171 "Improve Dependency analysis when doing multi-node Instruction Selection"
  r327170 "[DAG] Enforce stricter NodeId invariant during Instruction selection"

  Reverting patch as NodeId invariant change is causing pathological increases in compile time on PPC.

  llvm-svn: 327197
* [DAG] Enforce stricter NodeId invariant during Instruction selection (Nirav Dave, 2018-03-09; 1 file, -1/+1)

  Instruction Selection makes use of the topological ordering of nodes by node id (a node's operands have smaller node ids than it) when doing cycle detection. During selection we may violate this property, as a selection of multiple nodes may induce a use dependence (and thus a node id restriction) between two unrelated nodes. If a selected node has an unselected successor, this may allow us to miss a cycle and accept an invalid selection.

  This patch fixes this by marking all unselected successors of a selected node with a negated node id. We avoid pruning on such negative ids but can still reconstruct the original id for pruning.

  In-tree targets have been updated to replace DAG-level replacements with ISel-level ones which enforce this property.

  This preemptively fixes PR36312 before the triggering commit r324359 relands.

  Reviewers: craig.topper, bogner, jyknight
  Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya

  Differential Revision: https://reviews.llvm.org/D43198
  llvm-svn: 327170
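  The id trick itself is simple enough to show in isolation (a sketch of the encoding described above, not the exact LLVM helpers): negating an id takes it out of the range the pruning logic considers, yet the original value stays recoverable.

    #include <cassert>

    int invalidateId(int id) { return -(id + 1); }                // mark as "do not prune"
    int originalId(int id)   { return id < -1 ? -(id + 1) : id; } // undo the marking

    int main() {
      int marked = invalidateId(7);
      assert(marked == -8);              // negative, so topological pruning skips it
      assert(originalId(marked) == 7);   // but the original id can be reconstructed
    }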
* [ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB (Florian Hahn, 2018-03-02; 1 file, -16/+19)

  Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is broken due to 2 separate bugs:

  1) VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing rules to expand them to non-pseudo MIR. These are selected for ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD.

  2) Selection of the right VLD/VST instruction is broken for load and store of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called with the MIR opcode for fixed writeback (i.e. increment is access size) and call getVLDSTRegisterUpdateOpcode() to select an opcode with register writeback if the base register update is of a different size. Since getVLDSTRegisterUpdateOpcode() only knows about VLD1/VLD2/VST1/VST2, the call is currently conditional on the number of elements in the vector. However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for loads and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not updated, which later led to a fixed-writeback instruction being constructed with an extra operand for the register writeback.

  This patch addresses the two issues as follows:
  - It adds the necessary mapping from VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register to VLD1d64Twb_register and VLD1d64Qwb_register respectively. Like for the existing _fixed variants, the cost of these is bumped for unaligned access.
  - It changes the logic in SelectVLD and SelectVST to call isVLDfixed and isVSTfixed respectively to decide whether the opcode should be updated. It also reworks the logic and comments for pushing the writeback offset operand and the r0 operand to clarify the logic: the writeback offset needs to be pushed if it's a register writeback, and r0 needs to be pushed if not and the instruction is a VLD1/VLD2/VST1/VST2.

  Reviewers: rengolin, t.p.northover, samparker
  Reviewed By: samparker

  Patch by Thomas Preud'homme <thomas.preudhomme@arm.com>

  Differential Revision: https://reviews.llvm.org/D42970
  llvm-svn: 326570
* [ARM] Armv8.2-A FP16 code generation (part 1/3) (Sjoerd Meijer, 2018-01-26; 1 file, -10/+39)

  This is the groundwork for Armv8.2-A FP16 code generation.

  Clang passes and returns _Float16 values as floats, together with the required bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318. We will implement half-precision argument passing/returning lowering in the ARM backend soon, but for now this means that this:

    _Float16 sub(_Float16 a, _Float16 b) {
      return a + b;
    }

  gets lowered to this:

    define float @sub(float %a.coerce, float %b.coerce) {
    entry:
      %0 = bitcast float %a.coerce to i32
      %tmp.0.extract.trunc = trunc i32 %0 to i16
      %1 = bitcast i16 %tmp.0.extract.trunc to half
      <SNIP>
      %add = fadd half %1, %3
      <SNIP>
    }

  When FullFP16 is *not* supported, we don't make f16 a legal type, and we get legalization for "free", i.e. nothing changes and everything works as before. And also f16 argument passing/returning is handled.

  When FullFP16 is supported, we do make f16 a legal type, and have 2 places that we need to patch up: f16 argument passing and returning, which involves minor tweaks to avoid unnecessary code generation for some bitcasts.

  As a "demonstrator" that this works for the different FP16, FullFP16, softfp modes, etc., I've added match rules to the VSUB instruction description showing that we can codegen this instruction from IR, but more importantly, also to some conversion instructions. These conversions were causing issues before in the FP16 and FullFP16 cases. I've also added match rules to the VLDRH and VSTRH descriptions, so that we can actually compile the entire half-precision sub code example above. This showed that these loads and stores had the wrong addressing mode specified: AddrMode5 instead of AddrMode5FP16, which turned out not to be implemented at all, so that has also been added.

  This is the minimal patch that shows all the different moving parts. In patch 2/3 I will add some efficient lowering of bitcasts, and in patch 3/3 I will add the remaining Armv8.2-A FP16 instruction descriptions.

  Thanks to Sam Parker and Oliver Stannard for their help and reviews!

  Differential Revision: https://reviews.llvm.org/D38315
  llvm-svn: 323512
* [ARM] Fix erroneous availability of SMMLS for Armv7-M (Andre Vieira, 2018-01-12; 1 file, -1/+1)

  Differential Revision: https://reviews.llvm.org/D41855
  llvm-svn: 322360
* Fix a bunch more layering of CodeGen headers that are in Target (David Blaikie, 2017-11-17; 1 file, -2/+2)

  All these headers already depend on CodeGen headers, so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around).

  llvm-svn: 318490
* [ARM] Split Arm jump table branch into i12 and rs suffixed versions (Momchil Velikov, 2017-11-15; 1 file, -167/+0)

  This is a refactoring/cleanup of the Arm `addrmode2` operand class. The patch removes it completely.

  Differential Revision: https://reviews.llvm.org/D39832
  llvm-svn: 318291
* [ARM] Tidy up banked registers encoding (Javed Absar, 2017-08-03; 1 file, -35/+4)

  Moves encoding (SYSm) information of banked registers to ARMSystemRegister.td, where it rightly belongs and forms a single point of reference in the code.

  Reviewed by: @fhahn, @rovka, @olista01
  Differential Revision: https://reviews.llvm.org/D36219
  llvm-svn: 309910
* [ARM] Unify handling of M-Class system registers (Javed Absar, 2017-07-19; 1 file, -89/+11)

  This patch cleans up and fixes issues in the M-Class system register handling:

  1. It defines the system registers and the encoding (SYSm values) in one place: a new ARMSystemRegister.td using SearchableTable, thereby removing the hand-coded values which existed in multiple places.

  2. Some system registers, e.g. BASEPRI_MAX_NS, which do not exist were being allowed! Ref: ARMv6/7/8M architecture reference manual.

  Reviewed by: @t.p.northover, @olist01, @john.brawn
  Differential Revision: https://reviews.llvm.org/D35209
  llvm-svn: 308456
* [ARM] Allow rematerialization of ARM Thumb literal pool loads (Sam Parker, 2017-07-14; 1 file, -3/+17)

  Constants are crucial for code size in the ARM Thumb-1 instruction set. The 16-bit instruction size often does not offer enough space for immediate arguments. This means that additional instructions are frequently used to load constants into registers. Since constants are hoisted, this can lead to significant register spillage if they are used multiple times in a single function. This can be avoided by rematerialization, i.e. recomputing a constant instead of reloading it from the stack. This patch fixes the rematerialization of literal pool loads in the ARM Thumb instruction set.

  Patch by Philip Ginsbach

  Differential Revision: https://reviews.llvm.org/D33936
  llvm-svn: 308004
* ARM: avoid handing a deleted node back to TableGen during ISel. (Tim Northover, 2017-05-02; 1 file, -0/+4)

  When we replaced the multiplicand, the destination node might already exist. When that happens the original gets CSEd and deleted. However, it's actually used as the offset so nonsense is produced.

  Should fix PR32726.

  llvm-svn: 301983
* ARM: handle post-indexed NEON ops where the offset isn't the access width. (Tim Northover, 2017-04-20; 1 file, -9/+22)

  Before, we assumed that any ConstantInt offset was precisely the access width, so we could use the "[rN]!" form. ISelLowering only ever created that kind, but further simplification during combining could lead to unexpected constants and incorrect codegen.

  Should fix PR32658.

  llvm-svn: 300878
* Fix use-after-frees on memory allocated in a Recycler. (Benjamin Kramer, 2017-04-20; 1 file, -3/+3)

  These will become asan errors once the patch lands that poisons the memory after free. The x86 change is a hack, but I don't see how to solve this properly at the moment.

  llvm-svn: 300867
* [ARM] Use TableGen patterns to select vtbl. NFC. (Eli Friedman, 2017-04-19; 1 file, -91/+0)

  Differential Revision: https://reviews.llvm.org/D32103
  llvm-svn: 300749
* [ARM] Replace some C++ selection code with TableGen patterns. NFC. (Eli Friedman, 2017-03-14; 1 file, -57/+0)

  Differential Revision: https://reviews.llvm.org/D30794
  llvm-svn: 297768
* [ARM] Move SMULW[B|T] isel to DAG Combine (Sam Parker, 2017-03-14; 1 file, -142/+0)

  Create nodes for smulwb and smulwt and move their selection from DAGToDAG to DAG combine. smlawb and smlawt can then be selected using tablegen. Added some helper functions to detect shift patterns as well as a wrapper around SimplifyDemandedBits. Added a couple of extra tests.

  Differential Revision: https://reviews.llvm.org/D30708
  llvm-svn: 297716
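  For reference, the semantics the new smulwb/smulwt nodes model (a standalone sketch of the ARM instruction behaviour, not LLVM code): multiply a 32-bit operand by the signed bottom or top halfword of the other operand and keep the top 32 bits of the 48-bit product.

    #include <cassert>
    #include <cstdint>

    int32_t smulwb(int32_t a, int32_t b) {   // b's bottom halfword, sign-extended
      return (int32_t)(((int64_t)a * (int16_t)(b & 0xFFFF)) >> 16);
    }
    int32_t smulwt(int32_t a, int32_t b) {   // b's top halfword, sign-extended
      return (int32_t)(((int64_t)a * (int16_t)((uint32_t)b >> 16)) >> 16);
    }

    int main() {
      assert(smulwb(1 << 16, 5) == 5);         // (65536 * 5) >> 16
      assert(smulwt(1 << 16, 5 << 16) == 5);   // top halfword of b is 5
    }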
* Refactor the multiply-accumulate combines to act on (Artyom Skrobov, 2017-03-10; 1 file, -32/+0)

  ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE].

  Summary: This allows for some simplification because the combines are no longer limited to just one go at the node before it gets legalized into an ARM target-specific one.

  Reviewers: jmolloy, rogfer01
  Subscribers: aemerson, llvm-commits, rengolin

  Differential Revision: https://reviews.llvm.org/D30401
  llvm-svn: 297453
* In Thumb1 mode, the custom lowering for ARMISD::CMPZ could never emit tADDi3 (Artyom Skrobov, 2017-02-17; 1 file, -17/+14)

  Reviewers: jmolloy, t.p.northover
  Reviewed By: t.p.northover
  Subscribers: t.p.northover, aemerson, rengolin, llvm-commits

  Differential Revision: https://reviews.llvm.org/D30097
  llvm-svn: 295478
* [ARM] Fix incorrect mask bits in MSR encoding for write_register intrinsic (John Brawn, 2017-02-10; 1 file, -10/+6)

  In the encoding of system registers in the M-class MSR instruction, the mask bits should be 2 for registers that don't take a _<bits> qualifier (the instruction is unpredictable otherwise), and should also be 2 if the register takes a _<bits> qualifier but it's not present, as no _<bits> is an alias for _nzcvq.

  Differential Revision: https://reviews.llvm.org/D29828
  llvm-svn: 294762
* [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently. (Eli Friedman, 2016-12-16; 1 file, -16/+58)

  Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up.

  Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

  Recommitting with a fix to avoid forming vld1dup if the type of the load doesn't match the type of the vdup (see https://llvm.org/bugs/show_bug.cgi?id=31404).

  Differential Revision: https://reviews.llvm.org/D27694
  llvm-svn: 289972
* Revert r289703, it caused PR31404. (Nico Weber, 2016-12-16; 1 file, -58/+16)

  llvm-svn: 289923
* [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently (Sjoerd Meijer, 2016-12-15; 1 file, -3/+136)

  This is essentially a recommit of r285893, but with a correctness fix. The problem of the original commit was that this:

    bic r5, r7, #31
    cbz r5, .LBB2_10

  got rewritten into:

    lsrs r5, r7, #5
    beq .LBB2_10

  The result in destination register r5 is not the same and this is incorrect when r5 is not dead. So this fix includes checking the uses of the AND destination register. And also, compared to the original commit, some regression tests didn't need changing anymore because of this extra check.

  For completeness, this was the original commit message:

  For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

  1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
  2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
  3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
  4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

  1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.

  Differential Revision: https://reviews.llvm.org/D27761
  llvm-svn: 289794
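  The "one consecutive sequence of set bits" condition quoted above is easy to check directly with compiler builtins (a standalone check, not the LLVM implementation):

    #include <cassert>
    #include <cstdint>

    bool isContiguousMask(uint32_t bm) {
      if (bm == 0) return false;   // clz/ctz are undefined for 0
      return 32 - __builtin_clz(bm) - __builtin_ctz(bm) == __builtin_popcount(bm);
    }

    int main() {
      assert(isContiguousMask(0x0000001F));   // touches the LSB: one LSLS (case 1)
      assert(isContiguousMask(0xFFE00000));   // touches the MSB: one LSRS (case 2)
      assert(isContiguousMask(0x00010000));   // popcount == 1: shift into sign bit (case 3)
      assert(isContiguousMask(0x000000E0));   // interior run: LSLS + LSRS (case 4)
      assert(!isContiguousMask(0x00000101));  // two separate runs: not handled
    }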
* [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently. (Eli Friedman, 2016-12-14; 1 file, -16/+58)

  Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up.

  Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

  Differential Revision: https://reviews.llvm.org/D27694
  llvm-svn: 289703
* Remove a redundant condition found by PVS-Studio. (Chandler Carruth, 2016-11-03; 1 file, -2/+2)

  Filed http://llvm.org/PR30897 to teach Clang to warn on this kind of stuff.

  llvm-svn: 285945
* Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"James Molloy2016-11-031-133/+4
| | | | | | This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 . llvm-svn: 285912
* [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently (James Molloy, 2016-11-03; 1 file, -4/+133)

  This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.

  For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

  1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
  2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
  3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
  4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

  1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.

  llvm-svn: 285893
* [ARM] Predicate UMAAL selection on hasDSP. (Sam Parker, 2016-10-27; 1 file, -1/+2)

  UMAAL is a DSP instruction and it is not available on thumbv7m (Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong CHECK prefix in longMAC.ll test.

  Patch by Vadzim Dambrouski.

  Differential Revision: https://reviews.llvm.org/D25890
  llvm-svn: 285278
* [Thumb] Don't try and emit LDRH/LDRB from the constant pool (James Molloy, 2016-10-05; 1 file, -0/+1)

  This is not a valid encoding - these instructions cannot do PC-relative addressing.

  The underlying problem here is a whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. This didn't realise TargetConstantPool was actually possible, so didn't handle it.

  llvm-svn: 283323
* Use StringRef in Pass/PassManager APIs (NFC) (Mehdi Amini, 2016-10-01; 1 file, -3/+1)

  llvm-svn: 283004
* ARM: check alignment before transforming ldr -> ldm (or similar). (Tim Northover, 2016-09-19; 1 file, -8/+24)

  ldm and stm instructions always require 4-byte alignment on the pointer, but we weren't checking this before trying to reduce code size by replacing a post-indexed load/store with them. Unfortunately, we were also dropping this information in DAG ISel too, but that's easy enough to fix.

  llvm-svn: 281893
* getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI (Sanjay Patel, 2016-09-14; 1 file, -2/+2)

  llvm-svn: 281495
* Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"James Molloy2016-09-141-133/+4
| | | | | | This reverts commit r281323. It caused chromium test failures and a selfhost failure. llvm-svn: 281451