path: root/llvm/lib/Target/ARM
...
* [ARM] Add patterns for CTLZ on MVE (Oliver Cruickshank, 2019-09-16, 2 files, +10/-0)

  The CTLZ intrinsic can use the VCLZ instruction on MVE, which produces
  better results than expanding.

  llvm-svn: 371999
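  For illustration, a minimal IR sketch of the intrinsic this lowering
  matches (the vector width is picked arbitrarily, not taken from the
  commit):

      ; Count leading zeros on each lane of a 128-bit vector of i32.
      ; The i1 argument states whether a zero input gives an undefined
      ; result (false = fully defined).
      declare <4 x i32> @llvm.ctlz.v4i32(<4 x i32>, i1)

      define <4 x i32> @clz_lanes(<4 x i32> %x) {
        %r = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %x, i1 false)
        ret <4 x i32> %r
      }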
* [ARM] Fold VCMP into VPT (David Green, 2019-09-16, 2 files, +118/-18)

  MVE has VPT instructions, which perform the duties of both a VCMP and a
  VPST in a single instruction, performing the compare and starting the
  VPT block in one. This teaches the MVEVPTBlockPass to fold them,
  searching back through the basic block for a valid VCMP and creating
  the VPT from its operands. There are some changes to the VPT
  instructions to accommodate this, altering the order of the operands to
  match the VCMP better, and changing P0 register defs to be VPR defs, as
  is used in other places.

  Differential Revision: https://reviews.llvm.org/D66577

  llvm-svn: 371982
* [ARM] Masked loads and stores (David Green, 2019-09-15, 4 files, +137/-0)

  Masked loads and stores fit naturally with MVE, the instructions being
  easily predicated. This adds lowering for the simple cases of masked
  loads and stores. It does not yet deal with widening/narrowing or
  pre/post inc, and so is currently behind an option.

  The LLVM masked load intrinsic will accept a "passthru" value, dictating
  the values used for the zero masked lanes. In MVE the instructions write
  0 to the zero predicated lanes, so we need to match a passthru that
  isn't 0 (or undef) with a select instruction to pull in the correct
  data after the load.

  Differential Revision: https://reviews.llvm.org/D67186

  llvm-svn: 371932
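  A sketch of the passthru pattern described above, assuming the
  typed-pointer intrinsic mangling of this era (function and value names
  are illustrative):

      ; MVE's predicated load writes 0 to masked-off lanes, so a non-zero
      ; passthru has to be pulled in by the select that this lowering
      ; matches; passing %pt to the intrinsic expresses the same contract.
      declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

      define <4 x i32> @load_with_passthru(<4 x i32>* %p, <4 x i1> %m, <4 x i32> %pt) {
        %l = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4, <4 x i1> %m, <4 x i32> %pt)
        ret <4 x i32> %l
      }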
* [ARM] Add earlyclobber for cross beat MVE instructions (David Green, 2019-09-13, 1 file, +39/-40)

  rL367544 added @earlyclobbers for the MVE VREV64 instruction. This adds
  the same for a number of other 32-bit instructions that are similarly
  unpredictable if the destination equals the source (due to the cross
  beat nature of the instructions). This includes:

    VCADD.f32
    VCADD.i32
    VCMUL.f32
    VHCADD.s32
    VMULLT/B.s/u32
    VQDMLADH{X}.s32
    VQRDMLADH{X}.s32
    VQDMLSDH{X}.s32
    VQRDMLSDH{X}.s32
    VQDMULLT/B.s32 with Qm and Rm

  No tests here as this would require intrinsics (or very interesting
  codegen) to manifest. The tests will follow naturally as the intrinsics
  are added.

  Differential Revision: https://reviews.llvm.org/D67462

  llvm-svn: 371838
* [ARM] Add support for MVE vmaxv and vminv (Sam Tebbs, 2019-09-13, 3 files, +35/-2)

  This patch adds vecreduce_smax, vecreduce_umax, vecreduce_smin and
  vecreduce_umin, and selection for vmaxv and vminv.

  Differential Revision: https://reviews.llvm.org/D66413

  llvm-svn: 371827
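  The corresponding IR, using the experimental reduction intrinsic names
  of this period (later renamed to llvm.vector.reduce.*); a sketch, not
  code from the patch:

      ; A signed-max reduction across all lanes, selectable to VMAXV.
      declare i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32>)

      define i32 @max_across(<4 x i32> %v) {
        %r = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> %v)
        ret i32 %r
      }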
* [Alignment] Move OffsetToAlignment to Alignment.h (Guillaume Chatelet, 2019-09-12, 1 file, +4/-4)

  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet, JDevlieghere, alexshap, rupprecht, jhenderson

  Subscribers: sdardis, nemanjai, hiraditya, kbarton, jakehehrlich, jrtc27, MaskRay, atanasyan, jsji, seiya, cfe-commits, llvm-commits

  Tags: #clang, #llvm

  Differential Revision: https://reviews.llvm.org/D67499

  llvm-svn: 371742
* [Alignment][NFC] use llvm::Align for AsmPrinter::EmitAlignment (Guillaume Chatelet, 2019-09-11, 1 file, +7/-7)

  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet

  Subscribers: dschuff, sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D67443

  llvm-svn: 371616
* [Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing (Guillaume Chatelet, 2019-09-11, 2 files, +7/-6)

  Summary: This catches malformed mir files which specify alignment as
  log2 instead of pow2. See https://reviews.llvm.org/D65945 for reference.

  This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet

  Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D67433

  llvm-svn: 371608
* [Alignment] Use Align for TargetLowering::MinStackArgumentAlignment (Guillaume Chatelet, 2019-09-10, 1 file, +1/-1)

  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet

  Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D67288

  llvm-svn: 371498
* [ARM] Fix loads and stores for predicate vectors (David Green, 2019-09-09, 2 files, +65/-18)

  These predicate vectors can usually be loaded and stored with a single
  instruction, a VSTR_P0. However this instruction will store the entire
  P0 predicate, 16 bits, zero-extended to 32 bits, with each lane of the
  v4i1/v8i1/v16i1 representing 4/2/1 bits.

  As far as I understand, when llvm says "store this v4i1", it really does
  need to store 4 bits (or 8, that being the size of a byte, with the
  bottom 4 as the interesting bits). For example a bitcast from a v8i1 to
  an i8 is defined as a store followed by a load, which is how the code is
  expanded.

  So this instead lowers the v4i1/v8i1 load/store through some shuffles to
  get the bits into the correct positions. This, as you might imagine, is
  not as efficient as a single instruction. But I believe it is needed for
  correctness. v16i1 equally should not load/store 32 bits, only storing
  the 16 bits of data. Stack loads/stores are still using the VSTR_P0 (as
  can be seen by the test not changing). This is fine as they are
  self-consistent; it is only "externally observable loads/stores" (from
  our point of view) that need to be corrected.

  Differential revision: https://reviews.llvm.org/D67085

  llvm-svn: 371419
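  The externally observable case in question, as IR (a sketch):

      ; Storing a v4i1 must write only the 4 predicate bits (padded to a
      ; byte), not the full 32-bit P0 contents, so this store cannot be a
      ; bare VSTR_P0.
      define void @store_pred(<4 x i1> %p, <4 x i1>* %ptr) {
        store <4 x i1> %p, <4 x i1>* %ptr
        ret void
      }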
* [ARM] Remove some spurious MVE reduction instructions. (Simon Tatham, 2019-09-09, 1 file, +80/-79)

  The family of 'dual-accumulating' vector multiply-add instructions
  (VMLADAV, VMLALDAV and VRMLALDAVH) can all operate on both signed and
  unsigned integer types, and they all have an 'exchange' variant (with an
  X in the name) that modifies which pairs of vector lanes in the two
  inputs are multiplied together. But there's a clause in the spec that
  says that the X variants don't operate on unsigned integer types, only
  signed. You can have X, or unsigned, or neither, but not both.

  We didn't notice that clause when we implemented the MC support for
  these instructions, so LLVM believes that things like VMLADAVX.U8 do
  exist, contradicting the spec. Here I fix that by conditioning them out
  in Tablegen.

  In order to do that, I've reversed the nesting order of the Tablegen
  multiclasses for those instructions. Previously, the innermost
  multiclass generated the X and not-X variants, and the one outside that
  generated the A and not-A variants. Now X is done by the outer
  multiclass, which allows me to bypass that one when I only want the two
  not-X variants.

  Changing the multiclass nesting order also changes the names of the
  instruction ids unless I make a special effort not to. I decided that
  while I was changing them anyway I'd make them look nicer; so now the
  instructions have names like MVE_VMLADAVs32 or MVE_VMLADAVaxs32, instead
  of cumbersome _noacc_noexch suffixes.

  The corresponding multiply-subtract instructions are unaffected. Those
  don't accept unsigned types at all, either in the spec or in LLVM.

  Reviewers: ostannard, dmgreen

  Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D67214

  llvm-svn: 371405
* [ARM][MVE] VCTP instruction selection (Sam Parker, 2019-09-09, 1 file, +9/-0)

  Add codegen support for vctp{8,16,32}.

  Differential Revision: https://reviews.llvm.org/D67344

  llvm-svn: 371395
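  A sketch of the intrinsic being selected; the spelling below follows
  later upstream naming (llvm.arm.mve.vctp32), so treat the exact name at
  the time of this commit as an assumption:

      ; VCTP returns a predicate with the first %n lanes set, the basis of
      ; MVE tail predication.
      declare <4 x i1> @llvm.arm.mve.vctp32(i32)

      define <4 x i1> @first_n_lanes(i32 %n) {
        %p = call <4 x i1> @llvm.arm.mve.vctp32(i32 %n)
        ret <4 x i1> %p
      }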
* [ARM] Prevent generating NEON stack accesses under MVE. (David Green, 2019-09-09, 1 file, +8/-4)

  We should not be generating Neon stack loads/stores even for these large
  registers.

  No test here because my understanding is we will only generate these
  QQPR regs for intrinsics and VLDn's. The tests will follow once those
  are available.

  Differential revision: https://reviews.llvm.org/D67169

  llvm-svn: 371386
* [ARM][MVE] Decoding of uqrshl and sqrshl accepts unpredictable encodings (Oliver Stannard, 2019-09-09, 2 files, +8/-0)

  Specify the Unpredictable bits, and return softfails when appropriate.

  Patch by Mark Murray!

  Differential revision: https://reviews.llvm.org/D66939

  llvm-svn: 371374
* [ARM][ParallelDSP] Fix for sext input (Sam Parker, 2019-09-09, 1 file, +9/-3)

  The incoming accumulator value can be discovered through a sext, in
  which case there will be a mismatch between the input and the result.
  So sign extend the accumulator input if we're performing a 64-bit mac.

  Differential Revision: https://reviews.llvm.org/D67220

  llvm-svn: 371370
* [ARM] Remove declaration of unimplemented function. NFC. (David Green, 2019-09-08, 1 file, +0/-2)

  llvm-svn: 371331
* Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI. (Simon Pilgrim, 2019-09-07, 1 file, +1/-1)

  llvm-svn: 371302
* Change TargetLibraryInfo analysis passes to always require Function (Teresa Johnson, 2019-09-07, 1 file, +1/-1)

  Summary: This is the first change to enable the TLI to be built
  per-function so that -fno-builtin* handling can be migrated to use
  function attributes. See discussion on D61634 for background. This is an
  enabler for fixing handling of these options for LTO, for example.

  This change should not affect behavior, as the provided function is not
  yet used to build a specifically per-function TLI, but rather enables
  that migration.

  Most of the changes were very mechanical, e.g. passing a Function to the
  legacy analysis pass's getTLI interface, or in Module level cases,
  adding a callback. This is similar to the way the per-function TTI
  analysis works.

  There was one place where we were looking for builtins but not in the
  context of a specific function. See FindCXAAtExit in
  lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
  could provide the wrong behavior in some corner cases. Suggestions
  welcome.

  Reviewers: chandlerc, hfinkel

  Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D66428

  llvm-svn: 371284
* [ARM] Add patterns for VSUB with q and r registers (Oliver Cruickshank, 2019-09-06, 1 file, +9/-0)

  Added patterns for VSUB to support q and r registers, which reduces
  pressure on q registers.

  llvm-svn: 371231
* [ARM] Add patterns for VADD with q and r registers (Oliver Cruickshank, 2019-09-06, 1 file, +9/-0)

  Added support for VADD to use q and r registers, which reduces pressure
  on q registers.

  llvm-svn: 371230
* [ARM] Add patterns for VMUL with q and r registers (Oliver Cruickshank, 2019-09-06, 1 file, +9/-0)

  Added support for VMUL to use an r register, which reduces pressure on
  the q registers.

  llvm-svn: 371229
* [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection (Sam Tebbs, 2019-09-06, 1 file, +48/-10)

  This patch sinks add/mul(shufflevector(insertelement())) into the basic
  block in which they are used so that they can then be selected together.
  This is useful for various MVE instructions, such as vmla and others
  that take R registers.

  Loop tests have been added to the vmla test file to make sure vmlas are
  generated in loops.

  Differential revision: https://reviews.llvm.org/D66295

  llvm-svn: 371218
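  The sunk pattern, as IR (a sketch; names are illustrative):

      ; insertelement + shufflevector builds a splat of the scalar %s; once
      ; sunk next to the mul, instruction selection can use the variant
      ; that reads %s straight from an R register.
      define <4 x i32> @mul_by_splat(<4 x i32> %v, i32 %s) {
        %ins = insertelement <4 x i32> undef, i32 %s, i32 0
        %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
        %r = mul <4 x i32> %v, %splat
        ret <4 x i32> %r
      }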
* [Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment (Guillaume Chatelet, 2019-09-06, 1 file, +2/-1)

  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet

  Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D67278

  llvm-svn: 371210
* [Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment (Guillaume Chatelet, 2019-09-06, 1 file, +2/-1)

  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet

  Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D67229

  llvm-svn: 371200
* [ARM] MVE Tail Predication (Sam Parker, 2019-09-06, 4 files, +476/-1)

  The MVE and LOB extensions of Armv8.1m can be combined to enable 'tail
  predication', which removes the need for a scalar remainder loop after
  vectorization. Lane predication is performed implicitly via a system
  register. The effects of predication are described in Section B5.6.3 of
  the Armv8.1-M Arch Reference Manual, the key points being:
  - For vector operations that perform reduction across the vector and
    produce a scalar result, whether the value is accumulated or not.
  - For non-load instructions, the predicate flags determine if the
    destination register byte is updated with the new value or if the
    previous value is preserved.
  - For vector store instructions, whether the store occurs or not.
  - For vector load instructions, whether the value that is loaded or
    whether zeros are written to that element of the destination register.

  This patch implements a pass that takes a hardware loop, containing
  masked vector instructions, and converts it into something that
  resembles an MVE tail predicated loop (see the sketch after this entry).
  Currently, if we had code generation, we'd generate a loop in which the
  VCTP would generate the predicate and VPST would then set up the value
  of VPR.P0. The loads and stores would be placed in VPT blocks, so this
  is not tail predication, but normal VPT predication with the predicate
  based upon an element-counting induction variable. Further work needs to
  be done to finally produce a true tail predicated loop.

  Because only the loads and stores are predicated, at both the LLVM IR
  and MIR levels we will restrict support to only lane-wise operations (no
  horizontal reductions). We will perform a final check on MIR during loop
  finalisation too.

  Another restriction, specific to MVE, is that all the vector
  instructions need to operate on the same number of elements. This is
  because predication is performed at the byte level and this is set on
  entry to the loop, or by the VCTP instead.

  Differential Revision: https://reviews.llvm.org/D65884

  llvm-svn: 371179
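  A sketch of the masked, element-counting loop shape described above
  (intrinsic spellings and all names are assumptions, not code from the
  patch; the loop assumes %N > 0):

      declare <4 x i1> @llvm.arm.mve.vctp32(i32)
      declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)
      declare void @llvm.masked.store.v4i32.p0v4i32(<4 x i32>, <4 x i32>*, i32, <4 x i1>)

      define void @double_elems(i32* %src, i32* %dst, i32 %N) {
      entry:
        br label %loop

      loop:
        %elems = phi i32 [ %N, %entry ], [ %rem, %loop ]
        %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
        ; Predicate covering the remaining elements (at most 4 lanes active).
        %mask = call <4 x i1> @llvm.arm.mve.vctp32(i32 %elems)
        %s = getelementptr i32, i32* %src, i32 %i
        %sv = bitcast i32* %s to <4 x i32>*
        %v = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %sv, i32 4, <4 x i1> %mask, <4 x i32> undef)
        %v2 = add <4 x i32> %v, %v
        %d = getelementptr i32, i32* %dst, i32 %i
        %dv = bitcast i32* %d to <4 x i32>*
        call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %v2, <4 x i32>* %dv, i32 4, <4 x i1> %mask)
        %rem = sub i32 %elems, 4
        %i.next = add i32 %i, 4
        %more = icmp sgt i32 %rem, 0
        br i1 %more, label %loop, label %exit

      exit:
        ret void
      }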
* [ARM] Add support for the s,j,x,N,O inline asm constraints (David Candler, 2019-09-05, 1 file, +3/-3)

  A number of inline assembly constraints are currently supported by LLVM,
  but rejected as invalid by Clang:

  Target independent constraints:
    s: An integer constant, but allowing only relocatable values

  ARM specific constraints:
    j: An immediate integer between 0 and 65535 (valid for MOVW)
    x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7,
       or q0-q3
    N: An immediate integer between 0 and 31 (Thumb1 only)
    O: An immediate integer which is a multiple of 4 between -508 and 508
       (Thumb1 only)

  This patch adds support to Clang for the missing constraints along with
  some checks to ensure that the constraints are used with the correct
  target and Thumb mode, and that immediates are within valid ranges (at
  least where possible). The constraints are already implemented in LLVM,
  but just a couple of minor corrections to checks (V8M Baseline includes
  MOVW so should work with 'j'; 'N' and 'O' shouldn't be valid in Thumb2)
  so that Clang and LLVM are in line with each other and the
  documentation.

  Differential Revision: https://reviews.llvm.org/D65863

  Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a

  llvm-svn: 371079
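  As an illustration of one of these constraints used from IR (a sketch;
  the patch itself is about the Clang-side validation):

      ; 'j' requires a 0..65535 immediate, i.e. one a MOVW can encode.
      define i32 @movw_imm() {
        %r = call i32 asm "movw $0, $1", "=r,j"(i32 65535)
        ret i32 %r
      }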
* [ARM] Fixup the creation of VPT blocks (David Green, 2019-09-05, 1 file, +20/-15)

  This attempts to just fix the creation of VPT blocks, fixing up the
  iterating, which instructions are considered in the bundle, and making
  sure that we do not overrun the end of the block.

  Differential Revision: https://reviews.llvm.org/D67219

  llvm-svn: 371064
* [LLVM][Alignment] Make functions using log of alignment explicit (Guillaume Chatelet, 2019-09-05, 6 files, +28/-29)

  Summary: This patch renames functions that take or return alignment as
  log2, to help with the transition to llvm::Align. The renaming makes it
  explicit that we deal with log(alignment) instead of a power of two
  alignment.

  A few renames uncovered dubious assignments:
  - MirParser/MirPrinter was expecting powers of two but MachineFunction
    and MachineBasicBlock were dealing with log2(align). This patch fixes
    it and updates the documentation.
  - MachineBlockPlacement exposes two flags (align-all-blocks and
    align-all-nofallthru-blocks) supposedly interpreted as power of two
    alignments; internally these values are interpreted as log2(align).
    This patch updates the documentation.
  - MachineFunction exposes align-all-functions, also interpreted as power
    of two alignment; internally this value is interpreted as log2(align).
    This patch updates the documentation.

  Reviewers: lattner, thegameg, courbet

  Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D65945

  llvm-svn: 371045
* [ARM][ParallelDSP] SExt mul for accumulation (Sam Parker, 2019-09-04, 1 file, +14/-5)

  For any unpaired muls, we accumulate them as an input to the reduction.
  Check the type of the mul and perform a sext if the existing accumulator
  input type is not the same.

  Differential Revision: https://reviews.llvm.org/D66993

  llvm-svn: 370851
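  The accumulation shape in question, sketched in IR (names illustrative):

      ; An unpaired i32 mul feeding a 64-bit reduction: the product is
      ; sign-extended so its type matches the accumulator input.
      define i64 @acc_unpaired_mul(i32 %a, i32 %b, i64 %acc) {
        %mul = mul nsw i32 %a, %b
        %mul.sext = sext i32 %mul to i64
        %sum = add i64 %acc, %mul.sext
        ret i64 %sum
      }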
* [GlobalISel][CallLowering] Add support for splitting types according to calling conventions. (Amara Emerson, 2019-09-03, 1 file, +6/-5)

  On AArch64, s128 types have to be split into s64 GPRs when passed as
  arguments. This change adds the generic support in call lowering for
  dealing with multiple registers, for incoming and outgoing args.

  Support for splitting for return types not yet implemented.

  Differential Revision: https://reviews.llvm.org/D66180

  llvm-svn: 370822
* [MC] Pass through .code16/32/64 and .syntax unified for COFF (Reid Kleckner, 2019-09-03, 1 file, +0/-10)

  These flags should simply be passed through to the target, which will do
  the right thing. Add an MC/X86 test that uses these directives with the
  three primary object file formats and shows that they disassemble the
  same everywhere.

  There is a missing test for .code32 on Windows ARM, since I'm not sure
  exactly how to construct one.

  Fixes PR43203

  llvm-svn: 370805
* [ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands (David Green, 2019-09-03, 1 file, +2/-2)

  The code here seems to date back to r134705, when tablegen lowering was
  first being added. I don't believe that we need to include CPSR implicit
  operands on the MCInst. This now works more like other backends (like
  AArch64), where all implicit registers are skipped.

  This allows the AliasInst for CSEL's to match correctly, as can be seen
  in the test changes.

  Differential revision: https://reviews.llvm.org/D66703

  llvm-svn: 370745
* [ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise (David Green, 2019-09-03, 4 files, +75/-30)

  This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it
  can also be used in ISel Lowering, adding codesize values to the
  computed costs, to be able to compare either approximate instruction
  counts or codesize costs.

  It also adds a HasLowerConstantMaterializationCost, which compares the
  ConstantMaterializationCost of two values, returning true if the first
  is smaller either in instruction count/codesize, or falling back to the
  other in the case that they are equal.

  This is used in constant CSEL lowering to invert the predicate if the
  opposite is easier to materialise.

  Differential revision: https://reviews.llvm.org/D66701

  llvm-svn: 370741
* [ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions. (David Green, 2019-09-03, 6 files, +92/-1)

  Arm 8.1-M adds a number of related CSEL instructions, including CSINC,
  CSNEG and CSINV. These choose between two values given the content in
  CPSR and a condition, performing an increment, negation or inverse of
  the false value.

  This adds some selection for them, either from constant values or
  patterns. It does not include CSEL directly, which is currently not
  always making code better. It is still useful, but we will have to check
  more carefully where it should and shouldn't be used.

  Code by Ranjeet Singh and Simon Tatham, with some modifications from me.

  Differential revision: https://reviews.llvm.org/D66483

  llvm-svn: 370739
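  The select shape that can map onto CSINC, for example (a sketch):

      ; Choose %t, or %f incremented by one, based on the condition; with
      ; 8.1-M this can become a single CSINC instead of materialising both
      ; values separately.
      define i32 @csinc_shape(i1 %c, i32 %t, i32 %f) {
        %finc = add i32 %f, 1
        %r = select i1 %c, i32 %t, i32 %finc
        ret i32 %r
      }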
* [ARM] Fix MVE ldst offset ranges (David Green, 2019-09-03, 1 file, +18/-19)

  We were using isShiftedInt<7, Shift>(RHSC) to detect the ranges of
  offsets to fold into MVE loads/stores. The instructions actually take a
  7-bit unsigned integer which is either added or subtracted, so something
  more like isShiftedUInt<7, Shift>(abs(RHSC)) is needed.

  Instead I've changed this to use the isScaledConstantInRange method, the
  same as in SelectT2AddrModeImm7Offset used by pre/post inc, which seemed
  to already be getting this correct.

  Differential revision: https://reviews.llvm.org/D66997

  llvm-svn: 370731
* [ARM][MVE] Decoding of VMSR doesn't diagnose some unpredictable encodings (Oliver Stannard, 2019-09-03, 1 file, +29/-25)

  Decoding of VMSR doesn't diagnose some unpredictable encodings, as the
  unpredictable bits are not correctly set.

  Diff-reduce this instruction's internals WRT VMRS so I can see the
  differences better. Mostly this is s/src/Rt/g.

  Fill in the "should-be-(0)" bits.

  Designate the Unpredictable{} bits for both VMRS and VMSR.

  Patch by Mark Murray!

  Differential revision: https://reviews.llvm.org/D66938

  llvm-svn: 370729
* Bug fix on function epilog optimization (ARM backend) (Oliver Stannard, 2019-09-03, 1 file, +3/-2)

  To save an 'add sp,#val' instruction by adding registers to the final
  pop instruction, the first register transferred by this pop instruction
  needs to be found. If the function to be optimized has a non-void return
  value, the operand list contains r0 (implicit), which prevents the
  optimization from taking place. Therefore implicit register references
  should be skipped in the search loop, because these registers are never
  popped from the stack.

  Patch by Rainer Herbertz (rOptimizer)!

  Differential revision: https://reviews.llvm.org/D66730

  llvm-svn: 370728
* [ARM] Select vmla (Sam Tebbs, 2019-09-03, 1 file, +15/-0)

  This patch adds vmla selection.

  Differential revision: https://reviews.llvm.org/D66297

  llvm-svn: 370704
* [ARM] Use MQPR not QPR for MVE registers (David Green, 2019-09-02, 3 files, +98/-96)

  We should be using MQPR, and if we don't we can get COPYs and PHIs
  created for QPR. These get folded into instructions, failing
  verification checks.

  Differential revision: https://reviews.llvm.org/D66214

  llvm-svn: 370676
* [ARM] Remove MVE masked loads/stores (David Green, 2019-09-01, 3 files, +0/-127)

  These were never enabled correctly and are causing other problems.
  Taking them out for the moment, whilst we work on the issues.

  This reverts r370329.

  llvm-svn: 370607
* [ARM] MVE Masked loads and stores (David Green, 2019-08-29, 3 files, +127/-0)

  Masked loads and stores fit naturally with MVE, the instructions being
  easily predicated. This adds lowering for the simple cases of masked
  loads and stores. It does not yet deal with widening/narrowing or
  pre/post inc.

  The llvm masked load intrinsic will accept a "passthru" value, dictating
  the values used for the zero masked lanes. In MVE the instructions write
  0 to the zero predicated lanes, so we need to match a passthru that
  isn't 0 (or undef) with a select instruction to pull in the correct data
  after the load.

  We also need to do something with unaligned loads/stores. Currently this
  uses a similar method used in big endian, using a VLDRB.8 (and
  potentially a VREV in BE). This does mean that the predicate mask is
  converted from, for example, a v4i1 to a v16i1. The VLDR instructions
  are defined as using the first bit of the relevant mask lane, so this
  could potentially load different results if the predicate is a little
  odd. As the input is a v4i1 however, I believe this is OK and all the
  bits required should be set in the predicate, making the VLDRB.8 load
  the same data.

  Differential Revision: https://reviews.llvm.org/D66534

  llvm-svn: 370329
* [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall (Shiva Chen, 2019-08-28, 1 file, +2/-2)

  The patch fixes the issue that RV64 didn't clear the upper bits when
  returning a complex floating value with the lp64 ABI.

      float _Complex complex_add(float _Complex a, float _Complex b)
      {
        return a + b;
      }

      RealResult = zero_extend(RealA + RealB)
      ImageResult = ImageA + ImageB
      Return (RealResult | (ImageResult << 32))

  The patch introduces the shouldExtendTypeInLibCall target hook to
  suppress the AssertZext generation when lowering floating LibCalls.

  Thanks to Eli's comments from the Bugzilla:
  https://bugs.llvm.org/show_bug.cgi?id=42820

  Differential Revision: https://reviews.llvm.org/D65497

  llvm-svn: 370275
* [TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles (Amaury Sechet, 2019-08-28, 1 file, +4/-5)

  Summary: There are at least two ways to express the same shuffle.
  Various pieces of code explicitly check for both options, but other
  places do not when they would benefit from doing it. This patch
  refactors the codebase to use buildLegalVectorShuffle in order to make
  that behavior more consistent.

  Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

  Subscribers: javed.absar, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D66804

  llvm-svn: 370190
* [ARM] Move MVEVPTBlockPass to a separate file. NFC (David Green, 2019-08-28, 3 files, +173/-143)

  This just pulls the MVEVPTBlockPass into a separate file, as opposed to
  being wrapped up in Thumb2ITBlockPass.

  Differential revision: https://reviews.llvm.org/D66579

  llvm-svn: 370187
* [MVE] VMOVX patterns (David Green, 2019-08-28, 1 file, +6/-2)

  This adds fp16 VMOVX patterns, using the same patterns as rL362482 with
  some adjustments for MVE. It allows us to move fp16 registers without
  going into and out of gprs.

  VMOVX is able to move the top bits from a fp16 in a fp reg into the
  bottom bits of another register, zeroing the rest. This can be used for
  odd MVE register lanes. The top bits are not read by fp16 instructions,
  so no move is required there if we are dealing with even lanes.

  Differential revision: https://reviews.llvm.org/D66793

  llvm-svn: 370184
* [ARM][ParallelDSP] Change search for muls (Sam Parker, 2019-08-28, 1 file, +185/-166)

  rL369567 reverted a couple of recent changes made to ARMParallelDSP
  because of a miscompilation error: PR43073.

  The issue stemmed from an underlying bug that was caused by adding muls
  into a reduction before it was proved that they could be executed in
  parallel with another mul.

  Most of the changes here are from the previously reverted commits. The
  additional changes that have been made are:
  1) The Search function now doesn't insert any muls into the Reduction
     object. That now happens once the search has successfully finished.
  2) For any muls added into the reduction but that weren't paired, we
     accumulate their values as an input into the smlad.

  Differential Revision: https://reviews.llvm.org/D66660

  llvm-svn: 370171
* [MC] Minor cleanup to MCFixup::Kind handling. NFC. (Sam Clegg, 2019-08-23, 3 files, +6/-6)

  Prefer `MCFixupKind` where possible and add getTargetKind() to convert
  to `unsigned` when needed, rather than scattering cast operators around
  the place.

  Differential Revision: https://reviews.llvm.org/D59890

  llvm-svn: 369720
* Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 (Sam Tebbs, 2019-08-22, 1 file, +7/-6)

  The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the
  changes from the above patch.

  This reverts commit cd53ff6, reapplying 7c6b229.

  llvm-svn: 369638
* Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32" (Hans Wennborg, 2019-08-22, 1 file, +6/-7)

  It broke the bots, see e.g.
  http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/

  > This patch fixes shifts by a 128/256 bit shift amount. It also fixes
  > codegen for shifts of 32 by delegating to LLVM's default optimisation
  > instead of emitting a long shift.
  >
  > Tests that used to generate long shifts of 32 are updated to check for
  > the more optimised codegen.
  >
  > Differential revision: https://reviews.llvm.org/D66519
  >
  > llvm-svn: 369626

  llvm-svn: 369636
* [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 (Sam Tebbs, 2019-08-22, 1 file, +7/-6)

  This patch fixes shifts by a 128/256 bit shift amount. It also fixes
  codegen for shifts of 32 by delegating to LLVM's default optimisation
  instead of emitting a long shift.

  Tests that used to generate long shifts of 32 are updated to check for
  the more optimised codegen.

  Differential revision: https://reviews.llvm.org/D66519

  llvm-svn: 369626