bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[MC] Add parameter `Address` to MCInstPrinter::printInst	Fangrui Song	2020-01-06	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172
*	[ARM] Use the correct opcodes for Thumb2 segmented stack frame lowering	David Green	2020-01-06	1	-2/+4
\| \| \| \| \| \| \| \| \|	The segmented stack lowering code appears to be using ARM opcodes under Thumb2. The MRC opcode will be the same for Thumb and ARM, but t2LDR seems wrong. Either way, using the correct thumb vs arm opcodes is more correct. Differential Revision: https://reviews.llvm.org/D72074
*	[ARM] Use correct TRAP opcode for thumb in FastISel	David Green	2020-01-06	1	-2/+6
\| \| \| \| \| \| \| \| \|	We were previously unconditionally using the ARM::TRAP opcode, even under Thumb. My understanding is that these are essentially the same thing (they both result in a trap under Thumb), but the ARM::TRAP opcode is marked as requiring IsARM, so it is more correct to use ARM::tTRAP. Differential Revision: https://reviews.llvm.org/D72075
*	[ARM,MVE] Fix many signedness errors in MVE intrinsics.	Simon Tatham	2020-01-06	1	-28/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Running an end-to-end test last week I noticed that a lot of the ACLE intrinsics that operate differently on vectors of signed and unsigned integers were ending up generating the signed version of the instruction unconditionally. This is because the IR intrinsics had no way to distinguish signed from unsigned: the LLVM type system just calls them both `v8i16` (or whatever), so you need either separate intrinsics for signed and unsigned, or a flag parameter that tells ISel which one to choose. This patch fixes all the problems of that kind that I've noticed, by adding an i32 flag parameter to many of the IR intrinsics which is set to 1 for unsigned (matching the existing practice in cases where we got it right), and conditioning all the isel patterns on that flag. So the fundamental change is in `IntrinsicsARM.td`, changing the low-level IR intrinsics API; there are knock-on changes in `arm_mve.td` (adjusting code gen for the ACLE intrinsics to use the modified API) and in `ARMInstrMVE.td` (adjusting isel to expect the new unsigned flags). The rest of this patch is boringly updating tests. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72270
*	[ARM,MVE] Generate the right instruction for vmaxnmq_m_f16.	Simon Tatham	2020-01-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Due to a copy-paste error in the isel patterns, the predicated version of this intrinsic was expanding to the `VMAXNMT.F32` instruction instead of `VMAXNMT.F16`. Similarly for vminnm. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72269
*	[NFC] Fix trivial typos in comments	James Henderson	2020-01-06	5	-5/+5
\| \| \| \| \| \| \| \|	Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72143 Patch by Kazuaki Ishizaki.
*	[ARM][MVE] More MVETailPredication debug messages. NFC.	Sjoerd Meijer	2020-01-06	2	-65/+96
\| \| \| \| \| \| \| \| \| \|	I've added a few more debug messages to MVETailPredication because I wanted to trace better which instructions are added/removed. And while I was at it, I factored out one function which I thought was clearer, and have added some comments to describe better the flow between MVETailPredication and ARMLowOverheadLoops. Differential Revision: https://reviews.llvm.org/D71549
*	[MC][ARM] Delete MCSection::HasData and move SHF_ARM_PURECODE logic to ↵	Fangrui Song	2020-01-05	1	-2/+5
\| \| \| \| \| \| \| \|	ARMELFObjectWriter::addTargetSectionFlags This simplifies the generic interface and also makes SHF_ARM_PURECODE more robust (fixes a TODO). Inspecting MCDataFragment contents covers more cases than MCObjectStreamer::EmitBytes.
*	[ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors	David Green	2020-01-05	5	-18/+45
\| \| \| \| \| \| \| \| \| \| \|	This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing the target independent code to handle more folds in more situations (for example if the fast math flags are present, but the global AllowFPOpFusion option isnt). It also splits apart the HasSlowFPVMLx into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately if needed. Differential Revision: https://reviews.llvm.org/D72139
*	[ARM] Fill in FP16 FMA patterns	David Green	2020-01-05	1	-0/+21
\| \| \| \| \| \|	This adds fp16 variants of all the fma patterns in the ARM backend. Differential Revision: https://reviews.llvm.org/D72138
*	Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)."	Florian Hahn	2020-01-04	1	-1/+1
\| \| \| \| \|	This reverts commit 51ef53f3bd23559203fe9af82ff2facbfedc1db3, as it breaks some bots.
*	[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).	Florian Hahn	2020-01-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537
*	GlobalISel: Add type argument to getRegBankFromRegClass	Matt Arsenault	2020-01-03	2	-4/+5
\| \| \| \| \| \|	AMDGPU can't unambiguously go back from the selected instruction register class to the register bank without knowing if this was used in a boolean context.
*	Move tail call disabling code to target independent code	Reid Kleckner	2020-01-03	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118
*	[ARM][NFC] Move tail predication checks	Sam Parker	2020-01-03	1	-69/+76
\| \| \| \| \|	Extract the tail predication validation checks out into their own LowOverHeadLoop method.
*	[DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG ↵	QingShan Zhang	2020-01-03	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \|	for vector type as 'expand' instead of 'legal' For now, we didn't set the default operation action for SIGN_EXTEND_INREG for vector type, which is 0 by default, that is legal. However, most target didn't have native instructions to support this opcode. It should be set as expand by default, as what we did for ANY_EXTEND_VECTOR_INREG. Differential Revision: https://reviews.llvm.org/D70000
*	DAG: Use TargetConstant for FENCE operands	Matt Arsenault	2020-01-02	1	-1/+1
\|
*	[ARM][Thumb][FIX] Add unwinding information to t4	Diogo Sampaio	2019-12-30	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add missing part of patch D71361. Now that the stack-frame can be operated using a addw/subw instruction, they should appear in the unwinding list. Reviewers: dmgreen, efriedma Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72000
*	[ARM] Sink splat to ICmp	David Green	2019-12-30	2	-2/+3
\| \| \| \| \| \| \| \| \|	This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the register forms of instructions to be selected more often. It does not add FCmp yet as the results look a little odd, trying to keep the register in an float reg and having to move it back to a GPR. Differential Revision: https://reviews.llvm.org/D70997
*	[ARM][THUMB2] Allow emitting T3 types of add and sub	Diogo Sampaio	2019-12-30	1	-42/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch allows to emit thumb2 add and sub instructions with 12 bit immediates in the emitT2RegPlusImmediate function. - Splitting parts of the D70680 Reviewers: eli.friedman, olista01, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71361
*	[SelectionDAG] Disallow indirect "i" constraint	Fangrui Song	2019-12-29	1	-4/+0
\| \| \| \| \| \| \| \| \|	This allows us to delete InlineAsm::Constraint_i workarounds in SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and TargetLowering::getInlineAsmMemConstraint overrides. They were introduced to X86 in r237517 to prevent crashes for constraints like "=*imr". They were later copied to other targets.
*	[ARM] [Windows] Use COFF stubs for calls to extern_weak functions	Martin Storsjö	2019-12-23	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range). Instead check the shouldAssumeDSOLocal method and load the address from a COFF stub. This matches what was done for X86 in 6bf108d77a3c. Differential Revision: https://reviews.llvm.org/D71720
*	Revert "[ARM] Improve codegen of volatile load/store of i64"	Victor Campos	2019-12-20	6	-158/+6
\| \| \| \|	This reverts commit bbcf1c3496ce2bd1ed87e8fb15ad896e279633ce.
*	[ARM][MVE] Fixes for tail predication.	Sam Parker	2019-12-20	3	-12/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1) Fix an issue with the incorrect value being used for the number of elements being passed to [d\|w]lstp. We were trying to check that the value was available at LoopStart, but this doesn't consider that the last instruction in the block could also define the register. Two helpers have been added to RDA for this. 2) Insert some code to now try to move the element count def or the insertion point so that we can perform more tail predication. 3) Related to (1), the same off-by-one could prevent us from generating a low-overhead loop when a mov lr could have been the last instruction in the block. 4) Fix up some instruction attributes so that not all the low-overhead loop instructions are labelled as branches and terminators - as this is not true for dls/dlstp. Differential Revision: https://reviews.llvm.org/D71609
*	[ARM][MVE] Tail predicate in the presence of vcmp	Sam Parker	2019-12-20	3	-76/+270
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Record the discovered VPT blocks while checking for validity and, for now, only handle blocks that begin with VPST and not VPT. We're now allowing more than one instruction to define vpr, but each block must somehow be predicated using the vctp. This leaves us with several scenarios which need fixing up: 1) A VPT block with is only predicated by the vctp and has no internal vpr defs. 2) A VPT block which is only predicated by the vctp but has an internal vpr def. 3) A VPT block which is predicated upon the vctp as well as another vpr def. 4) A VPT block which is not predicated upon a vctp, but contains it and all instructions within the block are predicated upon in. The changes needed are, for: 1) The easy one, just remove the vpst and unpredicate the instructions in the block. 2) Remove the vpst and unpredicate the instructions up to the internal vpr def. Need insert a new vpst to predicate the remaining instructions. 3) No nothing. 4) The vctp will be inside a vpt and the instruction will be removed, so adjust the size of the mask on the vpst. Differential Revision: https://reviews.llvm.org/D71107
*	[ARM][MVE] Tail predicate bottom/top muls.	Sam Parker	2019-12-20	1	-0/+3
\| \| \| \| \| \|	Add VMULL and VQDMULL variants to our tail predication white list. Differential Revision: https://reviews.llvm.org/D71465
*	[ARM] Improve codegen of volatile load/store of i64	Victor Campos	2019-12-19	6	-6/+158
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. Reviewers: dmgreen, efriedma, john.brawn Reviewed By: efriedma Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072
*	[NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter]	Anna Welker	2019-12-18	1	-0/+4
\| \| \| \| \| \| \|	Add an extra parameter so alignment can be taken under consideration in gather/scatter legalization. Differential Revision: https://reviews.llvm.org/D71610
*	[ARM] Move MVE opcode helper functions to ARMBaseInstrInfo. NFC.	Sjoerd Meijer	2019-12-16	3	-108/+118
\| \| \| \| \| \| \| \|	In ARMLowOverheadLoops.cpp, MVETailPredication.cpp, and MVEVPTBlock.cpp we have quite a few helper functions all looking at the opcodes of MVE instructions. This moves all these utility functions to ARMBaseInstrInfo. Diferential Revision: https://reviews.llvm.org/D71426
*	[ARM] Fix in ICE when retrieving the number of micro-ops for vlldm/vlstm	Momchil Velikov	2019-12-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	The big switch in `ARMBaseInstrInfo::getNumMicroOps` is missing cases for `VLLDM` and `VLSTM`, which are currently defined with itineraries having a dynamic count of micro-ops. Assuming an optimistic case in which these instruction do not actually perform loads or stores, and with the idea that Armv8-m cores are supposed to use the new style scheduling models, this patch just sets the itinerary for those two instructions to `NoItinerary`. Differential Revision: https://reviews.llvm.org/D71266
*	[ARM][MVE] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds ↵	Fangrui Song	2019-12-13	1	-3/+4
\| \| \| \|	after D71062
*	[ARM][MVE] Make VPT invalid for tail predication	Sam Parker	2019-12-13	1	-3/+0
\| \| \| \| \| \| \| \| \|	We've been marking VPT incompatible instructions as invalid for tail predication too, though this may not strictly be true. VPT are incompatible and, unless its the first predicate def in a loop, they shouldn't be compatible for tail predication either. Differential Revision: https://reviews.llvm.org/D71410
*	[ARM][MVE] Add vector reduction intrinsics with two vector operands	Mikhail Maltsev	2019-12-13	2	-40/+251
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds intrinsics for the following MVE instructions: * VABAV * VMLADAV, VMLSDAV * VMLALDAV, VMLSLDAV * VRMLALDAVH, VRMLSLDAVH Each of the above 4 groups has a corresponding new LLVM IR intrinsic, since the instructions cannot be easily represented using general-purpose IR operations. Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM Reviewed By: MarkMurrayARM Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71062
*	[ARM][MVE] Add intrinsics for more immediate shifts.	Simon Tatham	2019-12-13	3	-73/+182
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This fills in the remaining shift operations that take a single vector input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and `vshll[bt]` families. `vshll[bt]` (which shifts each input lane left into a double-width output lane) is the most interesting one. There are separate MC instruction ids for shifting by exactly the input lane width and shifting by less than that, because the instruction encoding is so completely different for the lane-width special case. So I had to write two sets of patterns to match based on the immediate shift count, which involved adding a ComplexPattern matcher to avoid the general-case pattern accidentally matching the special case too. For that family I've made sure to add an llc codegen test for both versions of each instruction. I'm experimenting with a new strategy for parametrising the isel patterns for all these instructions: adding extra fields to the relevant `Instruction` subclass itself, which are ignored by the Tablegen backends that generate the MC data, but can be retrieved from each instance of that instruction subclass when it's passed as a template parameter to the multiclass that generates its isel patterns. A nice effect of that is that I can fill in those informational fields using `let` blocks, rather than having to type them out once per instruction at `defm` time. (As a result, quite a lot of existing instruction `def`s are reindented by this patch, so it's clearer to read with whitespace changes ignored.) Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71458
*	[ARM] Add custom strict fp conversion lowering when non-strict is custom	John Brawn	2019-12-13	1	-33/+64
\| \| \| \| \| \| \| \| \| \| \| \|	We have custom lowering for operations converting to/from floating-point types when we don't have hardware support for those types, and this doesn't interact well with the target-independent legalization of the strict versions of these operations. Fix this by adding similar custom lowering of the strict versions. This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics test, with the remaining failures due to poor instruction selection. Differential Revision: https://reviews.llvm.org/D71127
*	[NFC] Use EVT instead of bool for getSetCCInverse()	Alex Richardson	2019-12-13	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void )0x12033091e < (void )0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917
*	Revert "[ARM][MVE] findVCMPToFoldIntoVPS. NFC."	Sjoerd Meijer	2019-12-13	1	-28/+30
\| \| \| \| \| \| \| \|	This reverts commit 9468e3334ba54fbb1b209aaec662d7375451fa1f. There's a test that doesn't like this change. The RDA analysis gets invalided by changes in the block, which is not taken into account. Revert while I work on a fix for this.
*	[ARM][MVE][Intrinsics] Add _x() variants of my _m() intrinsics.	Mark Murray	2019-12-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Better use of multiclass is used, and this helped find some existing bugs in the predicated VMULL* intrinsics, which are now fixed. The refactored VMULL[TB]Q_(INT\|POLY)_M() intrinsics were discovered to have an argument ("inactive") with incorrect type, and this required a fix that is included in this whole patch. The argument "inactive" should have been the same width (per vector element) as the return type of the intrinsic, but was not in the case where the return type was double the element width of the input types. To assist in testing the multiclassing , and to thwart further gremlins, the unit tests are improved in scope. The .ll tests are all generated by a small bit of throw-away scripting from the corresponding .c tests, and as such the diffs are large and nasty. Look at the file rather than the diff. Reviewers: dmgreen, miyuki, ostannard, simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71421
*	[ARM][MVE] findVCMPToFoldIntoVPS. NFC.	Sjoerd Meijer	2019-12-12	1	-30/+28
\| \| \| \| \| \| \|	This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can reimplement findVCMPToFoldIntoVPS with just a few calls to RDA. Differential Revision: https://reviews.llvm.org/D71330
*	[ARM][MVE] Sink vector shift operand	Sam Parker	2019-12-12	1	-3/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recommit e0b966643fc2. sub instructions were being generated for the negated value, and for some reason they were the register only ones. I think the problem was because I was grabbing the 'zero' from vmovimm, which is a target constant. Now I'm just generating a new Constant zero and so rsb instructions are now generated. Original commit message: The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching. Differential Revision: https://reviews.llvm.org/D70841
*	Revert "[ARM][MVE] Sink vector shift operand"	Sam Parker	2019-12-12	2	-28/+3
\| \| \| \| \| \|	This reverts commit e0b966643fc2030442ffbae9b677247be697673b. Instruction selection is failing with expensive checks.
*	[ARM][MVE] Sink vector shift operand	Sam Parker	2019-12-12	2	-3/+28
\| \| \| \| \| \| \| \|	The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching. Differential Revision: https://reviews.llvm.org/D70841
*	[IR] Split out target specific intrinsic enums into separate headers	Reid Kleckner	2019-12-11	5	-10/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320
*	Rename TTI::getIntImmCost for instructions and intrinsics	Reid Kleckner	2019-12-11	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Soon Intrinsic::ID will be a plain integer, so this overload will not be possible. Rename both overloads to ensure that downstream targets observe this as a build failure instead of a runtime failure. Split off from D71320 Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D71381
*	[ARM][LowOverheadLoops] Remove dead loop update instructions.	Sjoerd Meijer	2019-12-11	1	-2/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop update instructions, which in our case are mostly SUBS instructions. To support this, some helper functions were added to MachineLoopUtils and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses before a particular loop instruction, respectively. This is a first version that removes a SUBS instruction when there are no other uses inside and outside the loop block, but there are some more interesting cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which shows that there is room for improvement. For example, we can't handle this case yet: .. dlstp.32 lr, r2 .LBB0_1: mov r3, r2 subs r2, #4 vldrh.u32 q2, [r1], #8 vmov q1, q0 vmla.u32 q0, q2, r0 letp lr, .LBB0_1 @ %bb.2: vctp.32 r3 .. which is a lot more tricky because r2 is not only used by the subs, but also by the mov to r3, which is used outside the low-overhead loop by the vctp instruction, and that requires a bit of a different approach, and I will follow up on this. Differential Revision: https://reviews.llvm.org/D71007
*	[ARM][MVE] Add intrinsics for immediate shifts. (reland)	Simon Tatham	2019-12-11	1	-20/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which shift every lane of a vector left or right by a compile-time immediate. They mostly work by expanding to the IR `shl`, `lshr` and `ashr` operations, with their second operand being a vector splat of the immediate. There's a fiddly special case, though. ACLE specifies that the immediate in `vshrq_n` can take values up to //and including// the bit size of the vector lane. But LLVM IR thinks that shifting right by the full size of the lane is UB, and feels free to replace the `lshr` with an `undef` half way through the optimization pipeline. Hence, to keep this legal in source code, I have to detect it at codegen time. Logical (unsigned) right shifts by the element size are handled by simply emitting the zero vector; arithmetic ones are converted into a shift of one bit less, which will always give the same output. In order to do that check, I also had to enhance the tablegen MveEmitter so that it can cope with converting a builtin function's operand into a bare integer to pass to a code-generating subfunction. Previously the only bare integers it knew how to handle were flags generated from within `arm_mve.td`. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: dmgreen, MarkMurrayARM Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71065
*	[ARM][MVE] Refactor complex vector intrinsics [NFCI]	Mikhail Maltsev	2019-12-10	2	-197/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch refactors instruction selection of the complex vector addition, multiplication and multiply-add intrinsics, so that it is now based on TableGen patterns rather than C++ code. It also changes the first parameter (halving vs non-halving) of the arm_mve_vcaddq IR intrinsic to match the corresponding instruction encoding, hence it requires some changes in the tests. The patch addresses David's comment in https://reviews.llvm.org/D71190 Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM Reviewed By: dmgreen Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71245
*	Revert "[ARM][MVE] Add intrinsics for immediate shifts."	Eric Christopher	2019-12-09	1	-32/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	and two follow-on commits: one warning fix and one functionality. As it's breaking at least the lto bot: http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio This reverts commits: 8d70f3c933a5b81a87a5ab1af0e3e98ee2cd7c67 ff4dceef9201c5ae3924e92f6955977f243ac71d d97b3e3e65cd77a81b39732af84a1a4229e95091
*	[ARM][MVE][Intrinsics] Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, ↵	Mark Murray	2019-12-09	1	-68/+184
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	VQDMULHQ, VQRDMULHQ intrinsics. Summary: Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen, miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71198
*	[ARM][MVE][Intrinsics] Add VMULL[BT]Q_(INT\|POLY) intrinsics.	Mark Murray	2019-12-09	2	-38/+96
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add VMULL[BT]Q_(INT\|POLY) intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71066