bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Reland "[ARM] push LR before __gnu_mcount_nc"	Jian Cai	2019-08-16	1	-0/+41
\| \| \| \| \| \| \| \|	This relands r369147 with fixes to unit tests. https://reviews.llvm.org/D65019 llvm-svn: 369173
*	[ARM] Preserve liveness in ARMConstantIslands.	Eli Friedman	2019-08-16	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	We currently don't use liveness information after this point, but it can be useful to catch bugs using -verify-machineinstrs, and optimizations could potentially use this information in the future. Differential Revision: https://reviews.llvm.org/D66319 llvm-svn: 369162
*	Revert "[ARM] push LR before __gnu_mcount_nc"	Jian Cai	2019-08-16	1	-41/+0
\| \| \| \| \| \|	This reverts commit f4cf3b959333f62b7a7b2d7771f7010c9d8da388. llvm-svn: 369149
*	[ARM] push LR before __gnu_mcount_nc	Jian Cai	2019-08-16	1	-0/+41
\| \| \| \| \| \| \| \| \|	Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of the stack on ARM32. Differential Revision: https://reviews.llvm.org/D65019 llvm-svn: 369147
*	[GlobalISel] CSEMIRBuilder: Add support for G_GEP	Volkan Keles	2019-08-15	2	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds G_GEP to `shouldCSEOpc` so that it can be CSEd. It also refactors `translateGetElementPtr` by replacing `createGenericVirtualRegister` calls with types. Reviewers: aditya_nandakumar, arsenm, dsanders, paquette, aemerson Reviewed By: aditya_nandakumar Subscribers: wdng, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66316 llvm-svn: 369070
*	GlobalISel: Add more verifier checks for G_SHUFFLE_VECTOR	Matt Arsenault	2019-08-13	1	-0/+11
\| \| \| \|	llvm-svn: 368705
*	GlobalISel: Change representation of shuffle masks	Matt Arsenault	2019-08-13	1	-9/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently shufflemasks get emitted as any other constant, and you end up with a bunch of virtual registers of G_CONSTANT with a G_BUILD_VECTOR. The AArch64 selector then asserts on anything that doesn't fit this pattern. This isn't an ideal representation, and should avoid legalization and have fewer opportunities for a representational error. Rather than invent a new shuffle mask operand type, similar to what ShuffleVectorSDNode does, just track the original IR Constant mask operand. I don't completely like the idea of adding another link to the IR, but MIR is already quite dependent on IR constants already, and this will allow sharing the shuffle mask utility functions with the IR. llvm-svn: 368704
*	Revert r368276 "[TargetLowering] SimplifyDemandedBits - call ↵	Hans Wennborg	2019-08-13	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT" This introduced a false positive MemorySanitizer warning about use of uninitialized memory in a vectorized crc function in Chromium. That suggests maybe something is not right with this transformation. See https://crbug.com/992853#c7 for a reproducer. This also reverts the follow-up commits r368307 and r368308 which depended on this. > This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. > > In particular this helps remove some unnecessary scalar->vector->scalar patterns. > > The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue. > > Differential Revision: https://reviews.llvm.org/D65887 llvm-svn: 368660
*	Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode"	Hans Wennborg	2019-08-12	6	-21/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It caused assertions to fire when building Chromium: lib/CodeGen/LiveDebugValues.cpp:331: bool {anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion `Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed. See https://crbug.com/992871#c3 for how to reproduce. > Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. > > To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. > > Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368579
*	[globalisel] Add G_SEXT_INREG	Daniel Sanders	2019-08-09	2	-29/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487
*	[ARM][ParallelDSP] Replace SExt uses	Sam Parker	2019-08-09	3	-2/+492
\| \| \| \| \| \| \| \| \| \| \|	As loads are combined and widened, we replaced their sext users operands whereas we should have been replacing the uses of the sext. I've added a load of tests, with only a few of them originally causing assertion failures, the rest improve pattern coverage. Differential Revision: https://reviews.llvm.org/D65740 llvm-svn: 368404
*	[MBP] Disable aggressive loop rotate in plain mode	Guozhi Wei	2019-08-08	6	-20/+21
\| \| \| \| \| \| \| \| \| \|	Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368339
*	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits ↵	Simon Pilgrim	2019-08-08	1	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	for ISD::EXTRACT_VECTOR_ELT This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. In particular this helps remove some unnecessary scalar->vector->scalar patterns. The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue. Differential Revision: https://reviews.llvm.org/D65887 llvm-svn: 368276
*	Reland: Fix and test inter-procedural register allocation for ARM	Oliver Stannard	2019-08-05	1	-0/+202
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add an explicit construction of the ArrayRef, gcc 5 and earlier don't seem to select the ArrayRef constructor which takes a C array when the construction is implicit. Original commit message: - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves with a null RegScavenger. Simply not updating the register scavenger is fine because IPRA only cares about the SavedRegs vector, the acutal code of the function has already been generated at this point. - Add a new hook to TargetRegisterInfo to get the set of registers which can be clobbered inside a call, even if the compiler can see both sides, by linker-generated code. Differential revision: https://reviews.llvm.org/D64908 llvm-svn: 367819
*	Revert Fix and test inter-procedural register allocation for ARM	Douglas Yung	2019-08-02	1	-202/+0
\| \| \| \| \| \| \| \|	This reverts r367669 (git commit f6b00c279a5587a25876752a6ecd8da0bed959dc) This was breaking a build bot http://lab.llvm.org:8011/builders/netbsd-amd64/builds/21233 llvm-svn: 367731
*	[IPRA][ARM] Disable no-CSR optimisation for ARM	Oliver Stannard	2019-08-02	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \|	This optimisation isn't generally profitable for ARM, because we can save/restore many registers in the prologue and epilogue using the PUSH and POP instructions, but mostly use individual LDR/STR instructions for other spills. Differential revision: https://reviews.llvm.org/D64910 llvm-svn: 367670
*	Fix and test inter-procedural register allocation for ARM	Oliver Stannard	2019-08-02	1	-0/+202
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves with a null RegScavenger. Simply not updating the register scavenger is fine because IPRA only cares about the SavedRegs vector, the acutal code of the function has already been generated at this point. - Add a new hook to TargetRegisterInfo to get the set of registers which can be clobbered inside a call, even if the compiler can see both sides, by linker-generated code. Differential revision: https://reviews.llvm.org/D64908 llvm-svn: 367669
*	[ARM] Regenerate BSWAP16 tests	Simon Pilgrim	2019-08-01	1	-18/+19
\| \| \| \|	llvm-svn: 367543
*	[ARM][ParallelDSP] Convert to function pass	Sam Parker	2019-07-31	3	-3/+81
\| \| \| \| \| \| \| \|	Run across a whole function, visiting each basic block one at a time. Differential Revision: https://reviews.llvm.org/D65324 llvm-svn: 367389
*	[ARM] Regenerate rotation tests	Simon Pilgrim	2019-07-29	1	-5/+8
\| \| \| \|	llvm-svn: 367214
*	Regenerate UXTB tests	Simon Pilgrim	2019-07-27	1	-30/+45
\| \| \| \|	llvm-svn: 367179
*	[TargetLowering] SimplifyMultipleUseDemandedBits - add SIGN_EXTEND_INREG ↵	Simon Pilgrim	2019-07-26	1	-8/+7
\| \| \| \| \| \|	support. llvm-svn: 367096
*	[ARM][ParallelDSP] Regenerate multi-use-loads.ll test checks	Simon Pilgrim	2019-07-26	1	-40/+312
\| \| \| \|	llvm-svn: 367094
*	[CodeGen] Don't resolve the stack protector frame accesses until PEI	Francis Visoiu Mistrih	2019-07-25	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, stack protector loads and stores are resolved during LocalStackSlotAllocation (if the pass needs to run). When this is the case, the base register assigned to the frame access is going to be one of the vregs created during LocalStackSlotAllocation. This means that we are keeping a pointer to the stack protector slot, and we're using this pointer to load and store to it. In case register pressure goes up, we may end up spilling this pointer to the stack, which can be a security concern. Instead, leave it to PEI to resolve the frame accesses. In order to do that, we make all stack protector accesses go through frame index operands, then PEI will resolve this using an offset from sp/fp/bp. Differential Revision: https://reviews.llvm.org/D64759 llvm-svn: 367068
*	[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold	Roman Lebedev	2019-07-24	2	-1042/+735
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955
*	[ARM] Make sure that the constant pool does not keep in the middle of an IT ↵	Simi Pallipurath	2019-07-24	1	-0/+107
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	block. This change make sure that llvm does not emit an invalid IT block by putting the constant pool in the middle of an IT block. We have code to try to avoid putting a constant island in the middle of an IT block, but it only works if we see an IT between the one currently referencing CPE and possible insertion point. If the first instruction we look at is the VLDRD after the IT , we never see the IT and does not realize that the instruction doing the load could be in an IT block itself. Differential Revision: https://reviews.llvm.org/D64621 Change-Id: I24cecb37cded75e8992870bd997f6226853bd920 llvm-svn: 366905
*	[ARM][ParallelDSP] Fix pointer operand reordering	Sam Parker	2019-07-24	1	-0/+84
\| \| \| \| \| \| \| \| \| \| \|	While combining two loads into a single load, we often need to reorder the pointer operands for the new load. This reordering was broken in the cases where there was a chain of values that built up the pointer. Differential Revision: https://reviews.llvm.org/D65193 llvm-svn: 366881
*	[IPRA][ARM] Make use of the "returned" parameter attribute	Oliver Stannard	2019-07-22	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \|	ARM has code to recognise uses of the "returned" function parameter attribute which guarantee that the value passed to the function in r0 will be returned in r0 unmodified. IPRA replaces the regmask on call instructions, so needs to be told about this to avoid reverting the optimisation. Differential revision: https://reviews.llvm.org/D64986 llvm-svn: 366669
*	[ARM] Move MVE VPT block tests into the Thumb2 directory. NFC	David Green	2019-07-21	6	-492/+0
\| \| \| \|	llvm-svn: 366655
*	[MachineCSE][MachinePRE] Avoid hoisting code from code regions into hot BBs.	Kai Luo	2019-07-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Current PRE hoists common computations into CMBB = DT->findNearestCommonDominator(MBB, MBB1). However, if CMBB is in a hot loop body, we might get performance degradation. Differential Revision: https://reviews.llvm.org/D64394 llvm-svn: 366570
*	Don't update NoTrappingFPMath and FPDenormalMode in resetTargetOptions	Oliver Stannard	2019-07-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We'd like to remove this whole function, because these are properties of functions, not the target as a whole. These two are easy to remove because they are only used for emitting ARM build attributes, which expects them to represent the defaults for the whole module, not just the last function generated. This is needed to get correct build attributes when using IPRA on ARM, because IPRA causes resetTargetOptions to get called before ARMAsmPrinter::emitAttributes. Differential revision: https://reviews.llvm.org/D64929 llvm-svn: 366562
*	[IPRA] Don't rely on non-exact function definitions	Oliver Stannard	2019-07-19	1	-0/+44
\| \| \| \| \| \| \| \| \|	If a function definition is not exact, then the linker could select a differently-compiled version of it, which could use different registers. https://reviews.llvm.org/D64909 llvm-svn: 366557
*	[ARM][DAGCOMBINE][FIX] PerformVMOVRRDCombine	Diogo N. Sampaio	2019-07-18	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: PerformVMOVRRDCombine ommits adding a offset of 4 to the PointerInfo, when converting a f64 = load[M] to {i32, i32} = {load[M], load[M + 4]} Which would allow the machine scheduller to break dependencies with the second load. - pr42638 Reviewers: eli.friedman, dmgreen, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64870 llvm-svn: 366423
*	[PEI] Don't re-allocate a pre-allocated stack protector slot	Francis Visoiu Mistrih	2019-07-17	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The LocalStackSlotPass pre-allocates a stack protector and makes sure that it comes before the local variables on the stack. We need to make sure that later during PEI we don't re-allocate a new stack protector slot. If that happens, the new stack protector slot will end up being after the local variables that it should be protecting. Therefore, we would have two slots assigned for two different stack protectors, one at the top of the stack, and one at the bottom. Since PEI will overwrite the assigned slot for the stack protector, the load that is used to compare the value of the stack protector will use the slot assigned by PEI, which is wrong. For this, we need to check if the object is pre-allocated, and re-use that pre-allocated slot. Differential Revision: https://reviews.llvm.org/D64757 llvm-svn: 366371
*	[CodeGen] Add stack protector tests where the guard gets re-assigned	Francis Visoiu Mistrih	2019-07-17	1	-0/+14
\| \| \| \| \| \|	In preparation of a fix, add tests for multiple backends. llvm-svn: 366370
*	[ARM] Adjust how NEON shifts are lowered	David Green	2019-07-15	2	-40/+53
\| \| \| \| \| \| \| \| \| \| \| \|	This adjusts the way that we lower NEON shifts to use a DAG target node, not via a neon intrinsic. This is useful for handling MVE shifts operations in the same the way. It also renames some of the immediate shift nodes for consistency, and moves some of the processing of immediate shifts into LowerShift allowing it to capture more cases. Differential Revision: https://reviews.llvm.org/D64426 llvm-svn: 366051
*	[ARM][ParallelDSP] Change the search for smlads	Sam Parker	2019-07-11	2	-1/+154
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two functional changes have been made here: - Now search up from any add instruction to find the chains of operations that we may turn into a smlad. This allows the generation of a smlad which doesn't accumulate into a phi. - The search function has been corrected to stop it falsely searching up through an invalid path. The bulk of the changes have been making the Reduction struct a class and making it more C++y with getters and setters. Differential Revision: https://reviews.llvm.org/D61780 llvm-svn: 365740
*	Move three folds for FADD, FSUB and FMUL in the DAG combiner away from ↵	Michael Berg	2019-07-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unsafe to more aligned checks that reflect context Summary: Unsafe does not map well alone for each of these three cases as it is missing NoNan context when accessed directly with clang. I have migrated the fold guards to reflect the expectations of handing nan and zero contexts directly (NoNan, NSZ) and some tests with it. Unsafe does include NSZ, however there is already precedent for using the target option directly to reflect that context. Reviewers: spatel, wristow, hfinkel, craig.topper, arsenm Reviewed By: arsenm Subscribers: michele.scandale, wdng, javed.absar Differential Revision: https://reviews.llvm.org/D64450 llvm-svn: 365679
*	[ARM] Add support for MSVC stack cookie checking	Martin Storsjo	2019-07-07	1	-0/+20
\| \| \| \| \| \| \| \|	Heavily based on the same for AArch64, from SVN r346469. Differential Revision: https://reviews.llvm.org/D64292 llvm-svn: 365283
*	RegUsageInfoCollector: Skip AMDGPU entry point functions	Matt Arsenault	2019-07-05	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm not sure if it's worth it or not to add a hook to disable the pass for an arbitrary function. This pass is taking up to 5% of compile time in tiny programs by iterating through all of the physical registers in every register class. This pass should be rewritten in terms of regunits. For now, skip doing anything for entry point functions. The vast majority of functions in the real world aren't callable, so just not running this will give the majority of the benefit. llvm-svn: 365255
*	[ARM] Favour PL/MI over GE/LT when possible	David Green	2019-07-04	7	-193/+160
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The arm condition codes for GE is N==V (and for LT is N!=V). If the source of flags cannot set V (overflow), such as a cmp against #0, then we can use the simpler PL and MI conditions that only check N. As these PL/MI conditions are simpler than GE/LT, other passes like the peephole optimiser can have a better time optimising away the redundant CMPs. The exception is the VSEL instruction, which cannot take the PL code, so there the transform favours GE. Differential Revision: https://reviews.llvm.org/D64160 llvm-svn: 365117
*	[ARM] Added testing for D64160. NFC	David Green	2019-07-04	4	-156/+437
\| \| \| \| \| \| \|	Adds some extra vsel testing and regenerates long shift and saturation bitop tests. llvm-svn: 365116
*	[ARM] Thumb2: favor R4-R7 over R12/LR in allocation order when opt for minsize	Oliver Stannard	2019-07-03	2	-1/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For Thumb2, we prefer low regs (costPerUse = 0) to allow narrow encoding. However, current allocation order is like: R0-R3, R12, LR, R4-R11 As a result, a lot of instructs that use R12/LR will be wide instrs. This patch changes the allocation order to: R0-R7, R12, LR, R8-R11 for thumb2 and -Osize. In most cases, there is no extra push/pop instrs as they will be folded into existing ones. There might be slight performance impact due to more stack usage, so we only enable it when opt for min size. https://reviews.llvm.org/D30324 llvm-svn: 365014
*	[Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457)	Roman Lebedev	2019-07-03	2	-265/+250
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 \| PR42457 ]]. In middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars, but only X86 prefer that same pattern for vectors. Here i'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010
*	[SCEV][LSR] Prevent using undefined value in binops	Eugene Leviant	2019-07-03	1	-0/+251
\| \| \| \| \| \| \| \| \| \| \|	On some occasions ReuseOrCreateCast may convert previously expanded value to undefined. That value may be passed by SCEVExpander as an argument to InsertBinop making IV chain undefined. Differential revision: https://reviews.llvm.org/D63928 llvm-svn: 365009
*	CodeGen: Set hasSideEffects = 0 on BUNDLE	Matt Arsenault	2019-07-03	1	-14/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	The BUNDLE itself should not have side effects, and this is a property of instructions inside the bundle. The hasProperty check already searches for any member instructions, which was pointless since it was overridden by this bit. Allows me to distinguish bundles that have side effects vs. do not in a future patch. Also fixes an unnecessary scheduling barrier in the bundle AMDGPU uses to get PC relative addresses. llvm-svn: 364984
*	[NFC][Codegen][X86][AArch64][ARM][PowerPC] Recommit: Add test coverage for ↵	Roman Lebedev	2019-07-02	2	-0/+1103
\| \| \| \| \| \| \| \| \| \| \|	"add-of-inc" vs "sub-of-not" I initially committed it with --check-prefix instead of --check-prefixes (again, shame on me, and utils/update_*.py not complaining!) and did not have a moment to understand the failure, so i reverted it initially in rL64939. llvm-svn: 364945
*	Revert "[NFC][Codegen][X86][AArch64][ARM][PowerPC] Add test coverage for ↵	Roman Lebedev	2019-07-02	2	-1103/+0
\| \| \| \| \| \| \| \| \| \|	"add-of-inc" vs "sub-of-not"" Some test failures i don't have a moment to investigate. This reverts commit r364930. llvm-svn: 364939
*	[NFC][Codegen][X86][AArch64][ARM][PowerPC] Add test coverage for ↵	Roman Lebedev	2019-07-02	2	-0/+1103
\| \| \| \| \| \| \| \| \| \|	"add-of-inc" vs "sub-of-not" As it is pointed out in https://reviews.llvm.org/D63992, before we get to pick canonical variant in middle-end we should ensure best codegen in backend. llvm-svn: 364930
*	[ARM] Stop using scalar FP instructions in integer-only MVE mode.	Simon Tatham	2019-07-02	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If you compile with `-mattr=+mve` (enabling integer MVE instructions but not floating-point ones), then the scalar FP //registers// exist and it's legal to move things in and out of them, load and store them, but it's not legal to do arithmetic on them. In D60708, the calls to `addRegisterClass` in ARMISelLowering that enable use of the scalar FP registers became conditionalised on `Subtarget->hasFPRegs()` instead of `Subtarget->hasVFP2Base()`, so that loads, stores and moves of those registers would work. But I didn't realise that that would also enable all the operations on those types by default. Now, if the target doesn't have basic VFP, we follow up those `addRegisterClass` calls by turning back off all the nontrivial operations you can perform on f32 and f64. That causes several knock-on failures, which are fixed by allowing the `VMOVDcc` and `VMOVScc` instructions to be selected even if all you have is `HasFPRegs`, and adjusting several checks for 'is this a double in a single-precision-only world?' to the more general 'is this any FP type we can't do arithmetic on?'. Between those, the whole of the `float-ops.ll` and `fp16-instructions.ll` tests can now run in MVE-without-FP mode and generate correct-looking code. One odd side effect is that I had to relax the check lines in that test so that they permit test functions like `add_f` to be generated as tailcalls to software FP library functions, instead of ordinary calls. Doing that is entirely legal, but the mystery is why this is the first RUN line that's needed the relaxation: on the usual kind of non-FP target, no tailcalls ever seem to be generated. Going by the llc messages, I think `SoftenFloatResult` must be perturbing the code generation in some way, but that's as much as I can guess. Reviewers: dmgreen, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63938 llvm-svn: 364909