bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[ARM][LowOverheadLoops] Allow all MVE instrs.	Sam Parker	2020-01-14	2	-0/+494
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have a whitelist of instructions that we allow when tail predicating, since these are trivial ones that we've deemed need no special handling. Now change ARMLowOverheadLoops to allow the non-trivial instructions if they're contained within a valid VPT block. A valid block is one that is predicated upon the VCTP, so we know that these non-trivial instructions will still behave as expected once the implicit predication is used instead. This also fixes a previous test failure. Differential Revision: https://reviews.llvm.org/D72509
*	[ARM][LowOverheadLoops] Change predicate inspection	Sam Parker	2020-01-14	1	-0/+230
\| \| \| \| \| \| \| \| \| \|	Use the already provided helper function to get the operand type so that we can detect whether the vpr is being used as a predicate or not. Also use existing helpers to get the predicate indices when converting the vpt blocks. This enables us to support both types of vpr predicate operand. Differential Revision: https://reviews.llvm.org/D72504
*	[ARM][MVE] Disallow VPSEL for tail predication	Sam Parker	2020-01-14	5	-0/+1183
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to the current way that we collect predicated instructions, we can't easily handle vpsel in tail predicated loops. There are a couple of issues: 1) It will use the VPR as a predicate operand, but doesn't have to be inside a VPT block, which means we can assert while building up the VPT block because we don't find another VPST to begin a new one. 2) VPSEL still requires a VPR operand even after tail predicating, which means we can't remove it unless there is another instruction, such as vcmp, that can provide the VPR def. The first issue should be a relatively simple fix in the logic of the LowOverheadLoops pass, whereas the second will require us to represent the 'implicit' tail predication with an explicit value. Differential Revision: https://reviews.llvm.org/D72629
*	[ARM][MVE] MVE-I should not be disabled by -mfpu=none	Momchil Velikov	2020-01-09	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Architecturally, it's allowed to have MVE-I without an FPU, thus -mfpu=none should not disable MVE-I, or moves to/from FP-registers. This patch removes `+/-fpregs` from features unconditionally added to target feature list, depending on FPU and moves the logic to Clang driver, where the negative form (`-fpregs`) is conditionally added to the target features list for the cases of `-mfloat-abi=soft`, or `-mfpu=none` without either `+mve` or `+mve.fp`. Only the negative form is added by the driver, the positive one is derived from other features in the backend. Differential Revision: https://reviews.llvm.org/D71843
*	Revert "[ARM][LowOverheadLoops] Update liveness info"	Sam Parker	2020-01-09	13	-103/+108
\| \| \| \| \| \| \|	This reverts commit e93e0d413f3afa1df5c5f88df546bebcd1183155. There are some ordering problems on some of the buildbots which need investigating.
*	[ARM][LowOverheadLoops] Update liveness info	Sam Parker	2020-01-09	13	-108/+103
\| \| \| \| \| \| \| \|	After expanding the pseudo instructions, update the liveness info. We do this in a post-order traversal of the loop, including its exit blocks and preheader(s). Differential Revision: https://reviews.llvm.org/D72131
*	[NFC][ARM] Update tests	Sam Parker	2020-01-08	6	-142/+283
\| \| \| \|	Run the update_mir_test on some of the low-overhead loop tests.
*	[DAGCombine][X86][Thumb2/LowOverheadLoops] `A - (A & C)` -> `A & (~C)` fold (PR44448)	Roman Lebedev	2020-01-03	3	-33/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While we do manage to fold integer-typed IR in middle-end, we can't do that for the main motivational case of pointers. There is @llvm.ptrmask() intrinsic which may or may not be helpful, but I'm not sure it is fully considered canonical yet, not everything is fully aware of it likely. Name: PR44448 ptr - (ptr & C) -> ptr & (~C) %bias = and i32 %ptr, C %r = sub i32 %ptr, %bias => %r = and i32 %ptr, ~C See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499
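The fold described in the commit above rests on a bit-level identity: `A & C` only contains bits that are already set in `A`, so subtracting it never borrows and simply clears those bits. The sketch below is an illustration only, not code from the patch; it models 32-bit wrap-around arithmetic in Python, and the names `MASK32`, `sub_of_masked` and `and_of_not` are hypothetical helpers introduced here.

```python
MASK32 = 0xFFFFFFFF  # assumption: model i32 arithmetic with wrap-around


def sub_of_masked(a, c):
    """Original pattern: A - (A & C), truncated to 32 bits."""
    return (a - (a & c)) & MASK32


def and_of_not(a, c):
    """Folded pattern: A & (~C), truncated to 32 bits."""
    return a & (~c & MASK32)


# Spot-check the identity on a handful of values, including edge cases.
samples = [0, 1, 15, 0x1237, 0x80000000, 0xFFFFFFFF, 0xDEADBEEF]
for a in samples:
    for c in samples:
        assert sub_of_masked(a, c) == and_of_not(a, c)

# The pointer-style motivating case: clear the low four bits of an address.
aligned = and_of_not(0x1237, 15)
```

With `C = 15`, both forms round the address `0x1237` down to `0x1230`, which is the pointer-alignment shape the commit message calls the main motivational case.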
*	[ARM][NFC] Update MIR test	Sam Parker	2020-01-03	1	-23/+40
\|
*	[ARM] Add +mve feature to mve tests. NFC	David Green	2020-01-01	3	-3/+3
\|
*	[ARM] Sink splat to ICmp	David Green	2019-12-30	2	-146/+151
\| \| \| \| \| \| \| \| \|	This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the register forms of instructions to be selected more often. It does not add FCmp yet as the results look a little odd, trying to keep the register in an float reg and having to move it back to a GPR. Differential Revision: https://reviews.llvm.org/D70997
*	[ARM][MVE] Fixes for tail predication.	Sam Parker	2019-12-20	4	-0/+540
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1) Fix an issue with the incorrect value being used for the number of elements being passed to [d\|w]lstp. We were trying to check that the value was available at LoopStart, but this doesn't consider that the last instruction in the block could also define the register. Two helpers have been added to RDA for this. 2) Insert some code to now try to move the element count def or the insertion point so that we can perform more tail predication. 3) Related to (1), the same off-by-one could prevent us from generating a low-overhead loop when a mov lr could have been the last instruction in the block. 4) Fix up some instruction attributes so that not all the low-overhead loop instructions are labelled as branches and terminators - as this is not true for dls/dlstp. Differential Revision: https://reviews.llvm.org/D71609
*	[ARM][MVE] Tail predicate in the presence of vcmp	Sam Parker	2019-12-20	4	-82/+981
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Record the discovered VPT blocks while checking for validity and, for now, only handle blocks that begin with VPST and not VPT. We're now allowing more than one instruction to define vpr, but each block must somehow be predicated using the vctp. This leaves us with several scenarios which need fixing up: 1) A VPT block which is only predicated by the vctp and has no internal vpr defs. 2) A VPT block which is only predicated by the vctp but has an internal vpr def. 3) A VPT block which is predicated upon the vctp as well as another vpr def. 4) A VPT block which is not predicated upon a vctp, but contains it and all instructions within the block are predicated upon it. The changes needed are, for: 1) The easy one, just remove the vpst and unpredicate the instructions in the block. 2) Remove the vpst and unpredicate the instructions up to the internal vpr def. Need to insert a new vpst to predicate the remaining instructions. 3) Do nothing. 4) The vctp will be inside a vpt and the instruction will be removed, so adjust the size of the mask on the vpst. Differential Revision: https://reviews.llvm.org/D71107
*	[ARM][LowOverheadLoops] Remove dead loop update instructions.	Sjoerd Meijer	2019-12-11	6	-6/+516
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop update instructions, which in our case are mostly SUBS instructions. To support this, some helper functions were added to MachineLoopUtils and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses before a particular loop instruction, respectively. This is a first version that removes a SUBS instruction when there are no other uses inside and outside the loop block, but there are some more interesting cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which shows that there is room for improvement. For example, we can't handle this case yet: .. dlstp.32 lr, r2 .LBB0_1: mov r3, r2 subs r2, #4 vldrh.u32 q2, [r1], #8 vmov q1, q0 vmla.u32 q0, q2, r0 letp lr, .LBB0_1 @ %bb.2: vctp.32 r3 .. which is a lot more tricky because r2 is not only used by the subs, but also by the mov to r3, which is used outside the low-overhead loop by the vctp instruction, and that requires a bit of a different approach, and I will follow up on this. Differential Revision: https://reviews.llvm.org/D71007
*	[ARM] Enable MVE masked loads and stores	David Green	2019-12-09	4	-4/+4
\| \| \| \| \| \| \|	With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does. Differential Revision: https://reviews.llvm.org/D70968
*	[MBP] Avoid tail duplication if it can't bring benefit	Guozhi Wei	2019-12-06	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases where duplication doesn't increase fallthrough, or it brings too much size overhead. To overcome these two issues in function canTailDuplicateUnplacedPreds I add two checks: 1) make sure there is at least one duplication in the current work set; 2) the number of duplications should not exceed the number of successors. The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain. Differential Revision: https://reviews.llvm.org/D64376
*	[ARM,MVE] Rename and clean up VCTP IR intrinsics.	Simon Tatham	2019-12-02	5	-63/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: D65884 added a set of Arm IR intrinsics for the MVE VCTP instruction, to use in tail predication. But the 64-bit one doesn't work properly: its predicate type is `<2 x i1>` / `v2i1`, which isn't a legal MVE type (due to not having a full set of instructions that manipulate it usefully). The test of `vctp64` in `basic-tail-pred.ll` goes through `opt` fine, as the test expects, but if you then feed it to `llc` it causes a type legality failure at isel time. The usual workaround we've been using in the rest of the MVE intrinsics family is to bodge `v2i1` into `v4i1`. So I've adjusted the `vctp64` IR intrinsic to do that, and completely removed the code (and test) that uses that intrinsic for 64-bit tail predication. That will allow me to add isel rules (upcoming in D70485) that actually generate the VCTP64 instruction. Also renamed all four of these IR intrinsics so that they have `mve` in the name, since its absence was confusing. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: MarkMurrayARM Subscribers: samparker, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70592
*	[ARM][MVE][Intrinsics] Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests.	Mark Murray	2019-11-27	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70546
*	[Codegen][ARM] Add addressing modes from masked loads and stores	David Green	2019-11-26	4	-80/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MVE has a basic symmetry between its normal loads/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do. To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores. This involves: - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase. - Extending the IndexedModeActions from 8bits to 16bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads. - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way. - The ARM backend is then adjusted to make use of these indexed masked loads/stores. - The X86 backend is adjusted to hopefully be no functional changes. Differential Revision: https://reviews.llvm.org/D70176
*	[ARM][ReachingDefs] Remove dead code in loloops.	Sam Parker	2019-11-26	4	-124/+30
\| \| \| \| \| \| \| \| \| \| \| \|	Add some more helper functions to ReachingDefs to query the uses of a given MachineInstr and also to query whether two MachineInstrs use the same def of a register. For Arm, while tail-predicating, these helpers are used in the low-overhead loops to remove the dead code that calculates the number of loop iterations. Differential Revision: https://reviews.llvm.org/D70240
*	[ARM][ReachingDefs] RDA in LoLoops	Sam Parker	2019-11-26	2	-0/+305
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add several new methods to ReachingDefAnalysis: - getReachingMIDef, instead of returning an integer, return the MachineInstr that produces the def. - getInstFromId, return a MachineInstr for which the given integer corresponds to. - hasSameReachingDef, return whether two MachineInstr use the same def of a register. - isRegUsedAfter, return whether a register is used after a given MachineInstr. These methods have been used in ARMLowOverhead to replace searching for uses/defs. Differential Revision: https://reviews.llvm.org/D70009
*	[ARM][ConstantIslands] Correct block size update	Sam Parker	2019-11-26	1	-0/+451
\| \| \| \| \| \| \| \| \|	When inserting a non-decrementing LE, the basic block was being resized to take into consideration that a tCMP and tBcc had been combined into one T1 instruction. This is not true in the LE case where we generate a CBN?Z and an LE. Differential Revision: https://reviews.llvm.org/D70536
*	[ARM][MVE] Enable narrow vectors for tail pred	Sam Parker	2019-11-19	3	-266/+837
\| \| \| \| \| \| \| \|	Remove the restriction, from the mve tail predication pass, that all masked vector instructions need to be 128 bits. This allows us to support extending loads and truncating stores. Differential Revision: https://reviews.llvm.org/D69946
*	[ARM][MVE] Tail predication conversion	Sam Parker	2019-11-19	3	-52/+163
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch modifies ARMLowOverheadLoops to convert a predicated vector low-overhead loop into a tail-predicated one. This is currently a very basic conversion, with the following restrictions: - Operates only on single block loops. - The loop can only contain a single vctp instruction. - No other instructions can write to the vpr. - We only allow a subset of the mve instructions in the loop. TODO: Pass the number of elements, not the number of iterations to dlstp/wlstp. Differential Revision: https://reviews.llvm.org/D69945
*	[ARM] Use isFMAFasterThanFMulAndFAdd for MVE	David Green	2019-11-04	1	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Arm backend will usually return false for isFMAFasterThanFMulAndFAdd, where both the fused VFMA.f32 and a non-fused VMLA.f32 are usually available for scalar code. For MVE we don't have the non-fused version though. It makes more sense for isFMAFasterThanFMulAndFAdd to return true, allowing us to simplify some of the existing ISel patterns. The tests here are that none of the existing tests failed, and so we are still selecting VFMA and VFMS. The one test that changed shows we can now select from fast math flags, as opposed to just relying on the isFMADLegalForFAddFSub option. Differential Revision: https://reviews.llvm.org/D69115
*	[ARM][MVE] Change VPST to use, not def, VPR	Sam Parker	2019-10-17	3	-65/+153
\| \| \| \| \| \| \| \|	Unlike VPT, VPST just uses the current value of VPR.P0. Differential Revision: https://reviews.llvm.org/D69037 llvm-svn: 375087
*	[DAGCombine][ARM] Enable extending masked loads	Sam Parker	2019-10-17	1	-618/+202
\| \| \| \| \| \| \| \| \| \| \|	Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085
*	[NFC][ARM][MVE] More tests	Sam Parker	2019-10-01	1	-0/+602
\| \| \| \| \| \|	Add some tail predication tests with fast math. llvm-svn: 373331
*	[NFC][ARM][MVE] More tests	Sam Parker	2019-09-30	1	-0/+2014
\| \| \| \| \| \|	Add some loop tests that cover different float operations and types. llvm-svn: 373192
*	[ARM][MVE] Change VCTP operand	Sam Parker	2019-09-30	8	-46/+63
\| \| \| \| \| \| \| \| \| \| \| \|	The VCTP instruction will calculate the predicate mask based upon the number of elements that need to be processed. I had inserted the sub before the vctp intrinsic and supplied it as the operand, but this is incorrect as the phi should directly feed the vctp. The sub is calculating the value for the next iteration. Differential Revision: https://reviews.llvm.org/D67921 llvm-svn: 373188
*	[NFC][ARM] Add some tail-predication tests	Sam Parker	2019-09-27	1	-0/+1757
\| \| \| \| \| \|	Use different data types for some simple loops. llvm-svn: 373064
*	[ARM][MVE] Remove old tail predicates	Sam Parker	2019-09-23	3	-0/+609
\| \| \| \| \| \| \| \| \| \| \|	Remove any predicate that we replace with a vctp intrinsic, and try to remove their operands too. Also look into the exit block to see if there's any duplicates of the predicates that we've replaced and clone the vctp to be used there instead. Differential Revision: https://reviews.llvm.org/D67709 llvm-svn: 372567
*	[ARM][LowOverheadLoops] Use subs during revert.	Sam Parker	2019-09-23	4	-11/+10
\| \| \| \| \| \| \| \| \| \|	Check whether there are any uses or defs between the LoopDec and LoopEnd. If there's not, then we can use a subs to set the cpsr and skip generating a cmp. Differential Revision: https://reviews.llvm.org/D67801 llvm-svn: 372560
*	[ARM][LowOverheadLoops] Use tBcc when reverting	Sam Parker	2019-09-23	4	-5/+5
\| \| \| \| \| \| \| \| \|	Check the branch target ranges and use a tBcc instead of t2Bcc when we can. Differential Revision: https://reviews.llvm.org/D67796 llvm-svn: 372557
*	[ARM] Fix for buildbots	Sam Parker	2019-09-19	1	-26/+24
\| \| \| \| \| \|	I had missed that massive.mir also needed updating. llvm-svn: 372303
*	[ARM] Fix for buildbots	Sam Parker	2019-09-17	4	-51/+97
\| \| \| \| \| \| \|	Add --verifymachineinstrs and update the remaining low overhead loop tests. llvm-svn: 372121
*	[ARM][LowOverheadLoops] Add LR def safety check	Sam Parker	2019-09-17	14	-261/+599
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Converting the *LoopStart pseudo instructions into DLS/WLS results in LR being defined. These instructions were inserted on the assumption that LR would already contain the loop counter because a mov is introduced during ISel as the consumers in the loop can only use LR. That assumption proved wrong! So perform a safety check, finding an appropriate place to insert the DLS/WLS instructions or revert if this isn't possible. Differential Revision: https://reviews.llvm.org/D67539 llvm-svn: 372111
*	[ARM] LE support in ConstantIslands	Sam Parker	2019-09-17	4	-0/+744
\| \| \| \| \| \| \| \| \| \| \| \| \|	The low-overhead branch extension provides a loop-end 'LE' instruction that performs no decrement nor compare, it just jumps backwards. This patch modifies the constant islands pass to try to insert LE instructions in place of a Thumb2 conditional branch, instead of shrinking it. This only happens if a cmp can be converted to a cbn/z and used to exit the loop. Differential Revision: https://reviews.llvm.org/D67404 llvm-svn: 372085
*	[Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing	Guillaume Chatelet	2019-09-11	15	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This catches malformed mir files which specify alignment as log2 instead of pow2. See https://reviews.llvm.org/D65945 for reference. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67433 llvm-svn: 371608
*	[ARM] MVE Tail Predication	Sam Parker	2019-09-06	7	-0/+1505
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MVE and LOB extensions of Armv8.1m can be combined to enable 'tail predication' which removes the need for a scalar remainder loop after vectorization. Lane predication is performed implicitly via a system register. The effects of predication are described in Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points being: - For vector operations that perform reduction across the vector and produce a scalar result, whether the value is accumulated or not. - For non-load instructions, the predicate flags determine if the destination register byte is updated with the new value or if the previous value is preserved. - For vector store instructions, whether the store occurs or not. - For vector load instructions, whether the value is loaded or whether zeros are written to that element of the destination register. This patch implements a pass that takes a hardware loop, containing masked vector instructions, and converts it into something that resembles an MVE tail predicated loop. Currently, if we had code generation, we'd generate a loop in which the VCTP would generate the predicate and VPST would then set up the value of VPR.P0. The loads and stores would be placed in VPT blocks so this is not tail predication, but normal VPT predication with the predicate based upon an element counting induction variable. Further work needs to be done to finally produce a true tail predicated loop. Because only the loads and stores are predicated, in both the LLVM IR and MIR level, we will restrict support to only lane-wise operations (no horizontal reductions). We will perform a final check on MIR during loop finalisation too. Another restriction, specific to MVE, is that all the vector instructions need to operate on the same number of elements. 
This is because predication is performed at the byte level and this is set on entry to the loop, or by the VCTP instead. Differential Revision: https://reviews.llvm.org/D65884 llvm-svn: 371179
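The load and store effects listed in the commit above can be pictured with a small scalar model. This is a rough sketch under stated assumptions, not the pass or the hardware behaviour in full: it assumes four 32-bit lanes per vector, models only lane-wise loads, adds and stores, and every name in it (`LANES`, `vctp`, `masked_load`, `masked_store`, `add_arrays`) is a hypothetical helper introduced for illustration.

```python
LANES = 4  # assumption: a 128-bit vector holding four 32-bit elements


def vctp(remaining, lanes=LANES):
    """Model of the VCTP predicate: lane i is active while i < remaining."""
    return [i < remaining for i in range(lanes)]


def masked_load(mem, base, mask):
    """Predicated load: inactive lanes read as zero and never touch memory."""
    return [mem[base + i] if active else 0 for i, active in enumerate(mask)]


def masked_store(mem, base, values, mask):
    """Predicated store: inactive lanes do not write."""
    for i, active in enumerate(mask):
        if active:
            mem[base + i] = values[i]


def add_arrays(a, b):
    """Element-wise a + b with no scalar remainder loop."""
    n = len(a)
    out = [None] * n
    remaining = n  # element-counting induction variable
    base = 0
    while remaining > 0:
        mask = vctp(remaining)  # predicate from the current element count
        va = masked_load(a, base, mask)
        vb = masked_load(b, base, mask)
        masked_store(out, base, [x + y for x, y in zip(va, vb)], mask)
        remaining -= LANES  # value for the next iteration
        base += LANES
    return out


result = add_arrays([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60])
```

With six elements the second iteration runs with only two active lanes, so the tail is handled by the same vector loop body and no out-of-bounds element is read or written.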
*	[ARM][LowOverheadLoops] Fix generated code for "revert".	Eli Friedman	2019-08-15	3	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two issues: 1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't really have any visible effect at the moment, but it might matter in the future. 2. The t2CMPri generated for t2WhileLoopStart might need to use a register that isn't LR. My team found this because we have a patch to track register liveness late in the pass pipeline. I'll look into upstreaming it to help catch issues like this earlier. Differential Revision: https://reviews.llvm.org/D66243 llvm-svn: 369069
*	[ARM][LowOverheadLoops] Revert after read/write	Sam Parker	2019-08-07	2	-0/+256
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we check whether LR is stored/loaded to/from in between the loop decrement and loop end pseudo instructions. There are two problems here: - It relies on all load/store instructions being labelled as such in tablegen. - Actually any use of loop decrement is troublesome because the value doesn't exist! So we need to check for any read/write of LR that occurs between the two instructions and revert if we find anything. Differential Revision: https://reviews.llvm.org/D65792 llvm-svn: 368130
*	[ARM][LowOverheadLoops] Revert non-header LE target	Sam Parker	2019-07-30	1	-0/+255
\| \| \| \| \| \| \| \| \|	Revert the hardware loop upon finding a LoopEnd that doesn't target the loop header, instead of asserting a failure. Differential Revision: https://reviews.llvm.org/D65268 llvm-svn: 367296
*	[ARM][LowOverheadLoops] Add CPSR defs	Sam Parker	2019-07-26	12	-496/+417
\| \| \| \| \| \| \| \| \| \|	Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair, so add an implicit def to these pseudo instructions in case that WLS and LE aren't generated. Differential Revision: https://reviews.llvm.org/D65275 llvm-svn: 367089
*	[ARM][LowOverheadLoops] Fix branch target codegen	Sam Parker	2019-07-23	1	-0/+513
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While lowering test.set.loop.iterations, it wasn't checked how the brcond was using the result and so the wls could branch to the loop preheader instead of not entering it. The same was true for loop.decrement.reg. So brcond and br_cc are now lowered manually when using the hwloop intrinsics. During this we now check whether the result has been negated and whether we're using SETEQ or SETNE and 0 or 1. We can then figure out which basic block the WLS and LE should be targeting. Differential Revision: https://reviews.llvm.org/D64616 llvm-svn: 366809
*	[ARM][LowOverheadLoops] Revert remaining pseudos	Sam Parker	2019-07-22	1	-0/+170
\| \| \| \| \| \| \| \| \| \| \|	ARMLowOverheadLoops would assert a failure if it did not find all the pseudo instructions that comprise the hardware loop. Instead of doing this, iterate through all the instructions of the function and revert any remaining pseudo instructions that haven't been converted. Differential Revision: https://reviews.llvm.org/D65080 llvm-svn: 366691
*	[ARM][LowOverheadLoops] Correct offset checking	Sam Parker	2019-07-11	3	-2/+425
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch addresses a couple of problems: 1) The maximum supported offset of LE is -4094. 2) The offset of WLS also needs to be checked, this uses a maximum positive offset of 4094. The use of BasicBlockUtils has been changed because the block offsets weren't being initialised, but the isBBInRange checks both positive and negative offsets. ARMISelLowering has been tweaked because the test case presented another pattern that we weren't supporting. llvm-svn: 365749
*	[ARM] WLS/LE Code Generation	Sam Parker	2019-07-01	10	-0/+1507
	Backend changes to enable WLS/LE low-overhead loops for armv8.1-m: 1) Use TTI to communicate to the HardwareLoop pass that we should try to generate intrinsics that guard the loop entry, as well as setting the loop trip count. 2) Lower the BRCOND that uses said intrinsic to an Arm specific node: ARMWLS. 3) ISelDAGToDAG the node to a new pseudo instruction: t2WhileLoopStart. 4) Add support in ArmLowOverheadLoops to handle the new pseudo instruction. Differential Revision: https://reviews.llvm.org/D63816 llvm-svn: 364733