path: root/llvm/test/CodeGen/AArch64
Commit message · Author · Age · Files · Lines
...
* [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT (REAPPLIED)  (Simon Pilgrim, 2020-01-04, 1 file, -9/+9)

    This patch attempts to peek through vectors based on the demanded bits/elts of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. In particular this helps remove some unnecessary scalar->vector->scalar patterns.

    The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen is due to be refactored in the near term, so this isn't considered a major issue.

    Reapplied after reversion at rL368660 due to PR42982, which was fixed at rGca7fdd41bda0.

    Differential Revision: https://reviews.llvm.org/D65887

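    As an illustration, a minimal IR sketch (a hypothetical reduced test, not taken from the patch) of the scalar->vector->scalar round-trip this lets the combiner see through; the lane-0 extract only demands lane 0 of the vector add, so the insert/extract pair can be elided:

        define i32 @scalar_roundtrip(i32 %x) {
          %v = insertelement <4 x i32> undef, i32 %x, i32 0
          %a = add <4 x i32> %v, %v
          %e = extractelement <4 x i32> %a, i32 0
          ret i32 %e
        }
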
* Move tail call disabling code to target independent code  (Reid Kleckner, 2020-01-03, 1 file, -0/+10)

    When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD).

    There's no major functionality change, except for targets that never implemented this check.

    This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015).

    Reviewers: echristo, MaskRay

    Differential Revision: https://reviews.llvm.org/D72118

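    For reference, a minimal IR sketch (hypothetical function names) showing the attribute this check keys off; with it set, a `tail call` is emitted as an ordinary call:

        declare void @callee()

        define void @caller() "disable-tail-calls"="true" {
          tail call void @callee()
          ret void
        }
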
* [AArch64][test] Merge arm64-$i.ll Linux tests into $i.ll  (Fangrui Song, 2020-01-03, 9 files, -1694/+64)

    Reviewed By: dmgreen

    Differential Revision: https://reviews.llvm.org/D72061

* [DAGCombiner][X86][AArch64] Generalize `A-(A&B)`->`A&(~B)` fold (PR44448)  (Roman Lebedev, 2020-01-03, 2 files, -10/+5)

    The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should/can just be 'A - (A & B)' -> 'A & (~B)'. Even if we don't manage to fold `~` into B, we have likely formed an `ANDN` node. Also, this way there are fewer similar-but-duplicate folds.

    Name: X - (X & Y) -> X & (~Y)
      %o = and i32 %X, %Y
      %r = sub i32 %X, %o
    =>
      %n = xor i32 %Y, -1
      %r = and i32 %X, %n
    https://rise4fun.com/Alive/kOUl

    See https://bugs.llvm.org/show_bug.cgi?id=44448
    https://reviews.llvm.org/D71499

* [NFC][X86][AArch64] Add 'A - (A & B)' pattern tests (PR44448)  (Roman Lebedev, 2020-01-03, 1 file, -0/+107)

    The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should just be 'A - (A & B)' -> 'A & (~B)'.

    Name: X - (X & Y) -> X & (~Y)
      %o = and i32 %X, %Y
      %r = sub i32 %X, %o
    =>
      %n = xor i32 %Y, -1
      %r = and i32 %X, %n
    https://rise4fun.com/Alive/kOUl

    See https://bugs.llvm.org/show_bug.cgi?id=44448
    https://reviews.llvm.org/D71499

* [DAGCombine][X86][AArch64] 'A - (A & (B - 1))' -> 'A & (0 - B)' fold (PR44448)  (Roman Lebedev, 2020-01-03, 1 file, -11/+8)

    While we do manage to fold integer-typed IR in the middle-end, we can't do that for the main motivational case of pointers. There is the @llvm.ptrmask() intrinsic, which may or may not be helpful, but I'm not sure it is considered fully canonical yet, and likely not everything is aware of it.

    Name: ptr - (ptr & (alignment-1)) -> ptr & (0 - alignment)
      %mask = add i64 %alignment, -1
      %bias = and i64 %ptr, %mask
      %r = sub i64 %ptr, %bias
    =>
      %highbitmask = sub i64 0, %alignment
      %r = and i64 %ptr, %highbitmask
    https://rise4fun.com/Alive/ZVdp

    See https://bugs.llvm.org/show_bug.cgi?id=44448
    https://reviews.llvm.org/D71499

* [NFC][DAGCombine][X86][AArch64] Tests for 'A - (A & (B - 1))' pattern (PR44448)  (Roman Lebedev, 2020-01-03, 1 file, -0/+153)

    Name: ptr - (ptr & (alignment-1)) -> ptr & (0 - alignment)
      %mask = add i64 %alignment, -1
      %bias = and i64 %ptr, %mask
      %r = sub i64 %ptr, %bias
    =>
      %highbitmask = sub i64 0, %alignment
      %r = and i64 %ptr, %highbitmask
    https://rise4fun.com/Alive/ZVdp

    The main motivational pattern involves pointer-typed values, so this transform can't really be done in the middle-end.

    See https://bugs.llvm.org/show_bug.cgi?id=44448
    https://reviews.llvm.org/D71499

* Change dbg-*-tag-offset tests to use llvm-dwarfdump.  (Evgenii Stepanov, 2020-01-02, 2 files, -19/+18)

    Reviewers: dblaikie
    Subscribers: llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D72023

* [AArch64][SVE] Gather loads: pass 32 bit unpacked offsets as nxv2i32  (Andrzej Warzynski, 2020-01-02, 2 files, -204/+204)

    Summary:
    Currently 32 bit unpacked offsets are passed as nxv2i64. However, as pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead would improve consistency with:
      * how other arguments are treated
      * how scatter stores are implemented
    This patch makes sure that 32 bit unpacked offsets are passed as nxv2i32 instead of nxv2i64.

    Reviewers: sdesmalen, efriedma
    Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71724

* DAG: Stop trying to fold FP -(x-y) -> y-x in getNode with nsz  (Matt Arsenault, 2019-12-31, 1 file, -3/+2)

    This was increasing the number of instructions when fsub was legalized on AMDGPU with no signed zeros enabled. This fold should be guarded by hasOneUse, and I don't think getNode should be doing that.

    The same fold is already done as a regular combine through isNegatibleForFree. Avoiding one PPC regression does require duplicating the fold, even though isNegatibleForFree does this combine already (and properly checks hasOneUse). In the regression, the outer fneg has nsz but the fsub operand does not. isNegatibleForFree only sees the operand, and doesn't see that it's used from an nsz context. An nsz parameter needs to be added and threaded through isNegatibleForFree to avoid this.

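    For context, a minimal IR sketch (hypothetical function name) of the fold in question, (fneg (fsub x, y)) -> (fsub y, x) under nsz:

        define float @neg_sub(float %x, float %y) {
          %s = fsub nsz float %x, %y
          %n = fneg nsz float %s
          ret float %n
        }
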
* [AArch64] add test for fsub+fneg; NFC  (Sanjay Patel, 2019-12-31, 1 file, -0/+16)

    D72015 proposes to restrict the current behavior.

* Migrate function attribute "no-frame-pointer-elim"="false" to ↵Fangrui Song2019-12-2411-18/+18
| | | | "frame-pointer"="none" as cleanups after D56351
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" ↵Fangrui Song2019-12-2430-34/+34
| | | | as cleanups after D56351
* [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support  (Sanjay Patel, 2019-12-23, 1 file, -16/+4)

    This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine.

    Earlier patches that dealt with regressions initially exposed by this patch:
      rG5e5e99c041e4
      rG0b38af89e2c0

    Patch by: @RKSimon (Simon Pilgrim)

    Differential Revision: https://reviews.llvm.org/D63815

* [AArch64] [Windows] Use COFF stubs for calls to extern_weak functions  (Martin Storsjö, 2019-12-23, 3 files, -17/+42)

    As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range).

    Improve the classifyGlobalFunctionReference method to set MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in AArch64TargetLowering::LowerCall to use the return value from classifyGlobalFunctionReference for these cases.

    Add code in both AArch64FastISel and GlobalISel/IRTranslator to bail out for function calls to extern weak functions on windows, to let SelectionDAG handle them.

    This matches what was done for X86 in 6bf108d77a3c.

    Differential Revision: https://reviews.llvm.org/D71721

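    A minimal IR sketch (hypothetical symbol names) of the kind of call site this affects; the callee may resolve to address zero at link time, so a direct PC-relative branch cannot be used:

        declare extern_weak void @maybe_missing()

        define void @call_weak() {
          call void @maybe_missing()
          ret void
        }
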
* [AArch64] match splat of bitcasted extract subvector to DUPLANE  (Sanjay Patel, 2019-12-22, 1 file, -8/+8)

    This is another potential regression exposed by D63815. Here we peek through a bitcast to find an extract subvector and scale the splat offset based on that:

      splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'

    Differential Revision: https://reviews.llvm.org/D71672

* [AArch64] Respect reserved registers while renaming in LdSt opt.  (Florian Hahn, 2019-12-21, 1 file, -0/+89)

    We cannot pick reserved registers as rename registers.

    Fixes https://bugs.llvm.org/show_bug.cgi?id=44358

* [AArch64][SVE] Replace integer immediate intrinsics with splat vector variant  (Danilo Carvalho Grael, 2019-12-20, 3 files, -285/+335)

    Summary: Replace the integer immediate intrinsics with splat vector variants so they can be applied as optimizations for the C/C++ intrinsics.

    Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin
    Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71614

* [AArch64][SVE] Correct intrinsics and patterns for logical predicate instructions  (Paul Walker, 2019-12-20, 3 files, -386/+80)

    In general, SVE intrinsics are considered predicated and merging, with everything else having suitable decoration. For predicated zeroing operations (like the predicate logical instructions) we use the "_z" suffix. After this change all intrinsics use their expected names (i.e. orr instead of or, and eor instead of xor).

    I've removed intrinsics and patterns for condition code setting instructions as that data is not returned as part of the intrinsic. The expectation is to ask for a cc flag explicitly. For example:

      a = and_z(pg, p1, p2)
      cc = ptest_<flag>(pg, a)

    with the code generator expected to use "s" variants of instructions when available.

    Differential Revision: https://reviews.llvm.org/D71715

* [AArch64] add more tests for extract-bitcast-splat; NFC  (Sanjay Patel, 2019-12-20, 1 file, -2/+44)

    Goes with D71672 - we should be able to handle casting to a wider type as well as casting to a narrower type.

* [AArch64][SVE] Fold constant multiply of element count  (Cullen Rhodes, 2019-12-20, 1 file, -0/+72)

    Summary:
    E.g.
      %0 = tail call i64 @llvm.aarch64.sve.cntw(i32 31)
      %mul = mul i64 %0, <const>
    should emit:
      cntw x0, all, mul #<const>
    for <const> in the range 1-16.

    Patch by Kerry McLaughlin

    Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma
    Reviewed By: sdesmalen
    Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71014

* [AArch64][SVE] Add intrinsics for saturating scalar arithmetic  (Andrzej Warzynski, 2019-12-20, 4 files, -0/+1188)

    Summary:
    The following intrinsics are added:
      * @llvm.aarch64.sve.sqdec{b|h|w|d|p}
      * @llvm.aarch64.sve.sqinc{b|h|w|d|p}
      * @llvm.aarch64.sve.uqdec{b|h|w|d|p}
      * @llvm.aarch64.sve.uqinc{b|h|w|d|p}
    For every intrinsic there are scalar variants (with an n32 or n64 suffix) and vector variants (no suffix).

    Reviewers: sdesmalen, rengolin, efriedma
    Reviewed By: sdesmalen, efriedma
    Subscribers: eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71252

* Recommit "[AArch64][SVE] Add permutation and selection intrinsics"Cullen Rhodes2019-12-201-0/+1636
| | | | | | | | Recommit 23c28c40436143006be740533375c036d11c92cd (reverted in dcb48f50bdfa0fa47b62d089b6ed999d857fc9f8) with a fix for an assert "Request for a fixed size on a scalable object" being triggered in `LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the TypeSize object.
* [AArch64][SVE] Add intrinsics for binary narrowing operations  (Andrzej Warzynski, 2019-12-20, 1 file, -0/+512)

    Summary:
    The following intrinsics for binary narrowing shift right operations are added:
      * @llvm.aarch64.sve.shrnb
      * @llvm.aarch64.sve.uqshrnb
      * @llvm.aarch64.sve.sqshrnb
      * @llvm.aarch64.sve.sqshrunb
      * @llvm.aarch64.sve.uqrshrnb
      * @llvm.aarch64.sve.sqrshrnb
      * @llvm.aarch64.sve.sqrshrunb
      * @llvm.aarch64.sve.shrnt
      * @llvm.aarch64.sve.uqshrnt
      * @llvm.aarch64.sve.sqshrnt
      * @llvm.aarch64.sve.sqshrunt
      * @llvm.aarch64.sve.uqrshrnt
      * @llvm.aarch64.sve.sqrshrnt
      * @llvm.aarch64.sve.sqrshrunt

    Reviewers: sdesmalen, rengolin, efriedma
    Reviewed By: efriedma
    Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71552

* Revert "[AArch64][SVE] Add permutation and selection intrinsics"Cullen Rhodes2019-12-191-1636/+0
| | | | | | | | | | | This reverts commit 23c28c40436143006be740533375c036d11c92cd. It caused build failures in the following expensive checks builders: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1295 http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/700 Reverting for now whilst I figure what the issue is.
* [AArch64][SVE] Add permutation and selection intrinsics  (Cullen Rhodes, 2019-12-19, 1 file, -0/+1636)

    Summary:
    Adds the following intrinsics:
      * @llvm.aarch64.sve.clasta
      * @llvm.aarch64.sve.clasta_n
      * @llvm.aarch64.sve.clastb
      * @llvm.aarch64.sve.clastb_n
      * @llvm.aarch64.sve.compact
      * @llvm.aarch64.sve.ext
      * @llvm.aarch64.sve.lasta
      * @llvm.aarch64.sve.lastb
      * @llvm.aarch64.sve.rev
      * @llvm.aarch64.sve.splice
      * @llvm.aarch64.sve.tbl
      * @llvm.aarch64.sve.trn1
      * @llvm.aarch64.sve.trn2
      * @llvm.aarch64.sve.uzp1
      * @llvm.aarch64.sve.uzp2
      * @llvm.aarch64.sve.zip1
      * @llvm.aarch64.sve.zip2

    Reviewers: sdesmalen, efriedma, dancgr, mgudim, huntergr, rengolin
    Reviewed By: sdesmalen, efriedma
    Subscribers: kmclaughlin, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71401

* [AArch64][SVE] Implement pfirst and pnext intrinsics  (Cullen Rhodes, 2019-12-19, 1 file, -0/+65)

    Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally
    Reviewed By: cameron.mcinally
    Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71472

* [AArch64][SVE] Implement ptrue intrinsic  (Cullen Rhodes, 2019-12-19, 1 file, -0/+42)

    Reviewers: sdesmalen, eli.friedman, dancgr, mgudim, cameron.mcinally, huntergr, efriedma
    Reviewed By: sdesmalen
    Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71457

* Revert "[AArch64][SVE] Replace integer immediate intrinsics with splat ↵Danilo Carvalho Grael2019-12-183-335/+285
| | | | | | | | | | vector variant" This reverts commit 830e08b98bcb427136443093c282b25328137cf0 and eb1857ce0da481caf82271e6d0c9fc745dfab26f. This commit leads to an unexpected failure on test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll. The review will need more changes before its re-commited.
* [AArch64][SVE] Fix gather scatter dag combine test.  (Danilo Carvalho Grael, 2019-12-18, 1 file, -10/+6)

* [AArch64][SVE] Replace integer immediate intrinsics with splat vector variant  (Danilo Carvalho Grael, 2019-12-18, 2 files, -275/+329)

    Summary: Replace the integer immediate intrinsics with splat vector variants so they can be applied as optimizations for the C/C++ intrinsics.

    Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin
    Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71614

* [AArch64] add tests for bitcasted DUPLANE; NFC  (Sanjay Patel, 2019-12-18, 1 file, -0/+40)

    See D63815 for context/motivation.

* [AArch64] update test checks; NFC  (Sanjay Patel, 2019-12-18, 1 file, -491/+221)

    The common prefix reduces a bunch of replication; not sure why it didn't happen before.

* [AArch64] match fcvtl2 with bitcasted extract  (Sanjay Patel, 2019-12-18, 1 file, -124/+45)

    This should eliminate a regression seen in D63815. If we are FP extending the high half extract of a vector, we should be able to peek through a bitcast sitting between the extract and extend.

    This replaces tablegen patterns with a more general DAG to DAG override, so we can handle any casted type.

    Differential Revision: https://reviews.llvm.org/D71515

* [AArch64] Improve codegen of volatile load/store of i128  (Victor Campos, 2019-12-18, 2 files, -4/+119)

    Summary: Instead of generating two i64 instructions for each load or store of a volatile i128 value (two LDRs or STRs), now emit a single LDP or STP.

    Reviewers: labrinea, t.p.northover, efriedma
    Reviewed By: efriedma
    Subscribers: kristof.beyls, hiraditya, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D69559

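    A minimal IR sketch (hypothetical function name) of the kind of access that now lowers to a single LDP instead of two LDRs:

        define i128 @load_i128(i128* %p) {
          %v = load volatile i128, i128* %p, align 16
          ret i128 %v
        }
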
* [AArch64] Enable clustering memory accesses to fixed stack objects  (Jay Foad, 2019-12-18, 4 files, -26/+52)

    Summary:
    r347747 added support for clustering mem ops with FI base operands including support for fixed stack objects in shouldClusterFI, but apparently this was never tested.

    This patch fixes shouldClusterFI to work with scaled as well as unscaled load/store instructions, and fixes the ordering of memory ops in MemOpInfo::operator< to ensure that memory addresses always increase, regardless of which direction the stack grows.

    Subscribers: MatzeB, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71334

* [AArch64][GlobalISel]: Fix a crash in GlobalISel in dealing with 16-bit uadd.with.overflow  (Xiaoqing Wu, 2019-12-17, 1 file, -0/+36)

    Summary: AArch64 doesn't support uadd.with.overflow.i16 natively. This change adds a legalization rule to convert the 32-bit add result to 16-bit. This should fix PR43981.

    Reviewers: arsenm, qcolombet, paquette, aemerson
    Reviewed By: paquette
    Subscribers: wdng, rovka, kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71587

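    A minimal IR reproducer sketch (hypothetical function name) for the intrinsic being legalized:

        declare { i16, i1 } @llvm.uadd.with.overflow.i16(i16, i16)

        define i1 @overflows(i16 %a, i16 %b) {
          %res = call { i16, i1 } @llvm.uadd.with.overflow.i16(i16 %a, i16 %b)
          %ov = extractvalue { i16, i1 } %res, 1
          ret i1 %ov
        }
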
* [FPEnv][LegalizeTypes][LegalizeDAG][AArch64] Few fixes/improvements for legalizing fp<->int conversion nodes  (Craig Topper, 2019-12-17, 1 file, -0/+51)

    This started with adding a test to get code coverage on ScalarizeVecOp_UnaryOp_StrictFP by copying an existing AArch64 test and using constrained sitofp/uitofp intrinsics.

    This found 3 separate issues:
      - ScalarizeVecOp_UnaryOp_StrictFP needs to do its own replacement because the caller can't handle replacing multiple results.
      - Missing integer promotion support for sitofp/uitofp.
      - Chain result not always assigned in ExpandLegalINT_TO_FP.

    Committing them together so I can add the test case.

* [SDAG] adjust isNegatibleForFree calculation to avoid crashing  (Sanjay Patel, 2019-12-17, 1 file, -0/+18)

    This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally.

    We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect that everything in an expression that is negatible to be handled in the corresponding code within getNegatedExpression().

    This patch adds a hack to work-around the case where we probably no longer detect that either multiply operand of an FMA isNegatibleForFree which is assumed to be true when we started rewriting the expression.

    Differential Revision: https://reviews.llvm.org/D70975

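    An illustrative IR sketch (hypothetical function name) of the FMA-with-fneg shape involved; whether each multiply operand is free to negate is what the cost phase estimates:

        declare double @llvm.fma.f64(double, double, double)

        define double @neg_fma(double %a, double %b, double %c) {
          %m = call double @llvm.fma.f64(double %a, double %b, double %c)
          %n = fneg double %m
          ret double %n
        }
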
* Revert "[SDAG] remove use restriction in isNegatibleForFree() when called ↵Sanjay Patel2019-12-171-18/+0
| | | | | | | from getNegatedExpression()" This reverts commit 36b1232ec5f370ab9fe8fcff0458d2fca5ca9b7f. Need to adjust commit message - that was a leftover from the earlier version.
* [SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()  (Sanjay Patel, 2019-12-17, 1 file, -0/+18)

    This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally.

    We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect that everything in an expression that is negatible to be handled in the corresponding code within getNegatedExpression().

    This patch adds a hack to work-around the case where we probably no longer detect that either multiply operand of an FMA isNegatibleForFree which is assumed to be true when we started rewriting the expression.

    Differential Revision: https://reviews.llvm.org/D70975

* [AArch64] add tests for fcvtl2; NFC  (Sanjay Patel, 2019-12-17, 1 file, -8/+183)

* PostRA Machine Sink should take care of COPY defining register that is a sub-register by another COPY source operand  (alex-t, 2019-12-17, 2 files, -8/+8)

    Differential Revision: https://reviews.llvm.org/D71132

* Fix assertion failure in getMemOperandWithOffsetWidth  (Kristof Beyls, 2019-12-17, 1 file, -0/+65)

    This fixes an assertion failure that triggers inside getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr that is not a memory operation.

    Different backends implement getMemOperandWithOffset differently: some return false on non-memory MachineInstrs, others assert. The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck relies on getMemOperandWithOffset to return false on non-memory MachineInstrs, instead of asserting.

    This patch updates the documentation on getMemOperandWithOffset that it should return false on any MachineInstr it cannot handle, instead of asserting. It also adapts the in-tree backends accordingly where necessary.

    Differential Revision: https://reviews.llvm.org/D71359

* [AArch64][SVE] Add patterns for logical immediate operations.  (Danilo Carvalho Grael, 2019-12-16, 1 file, -0/+122)

    Summary:
    Add pattern matching for the following SVE logical vector and immediate instructions:
      - and/bic, orr/orn, eor/eon.

    Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin
    Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71483

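    A minimal IR sketch (hypothetical function name and immediate value) of the splatted-immediate form these patterns match; the splat is expressed with the usual insertelement + shufflevector idiom on scalable vectors:

        define <vscale x 4 x i32> @and_imm(<vscale x 4 x i32> %a) {
          %head = insertelement <vscale x 4 x i32> undef, i32 255, i32 0
          %splat = shufflevector <vscale x 4 x i32> %head, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
          %r = and <vscale x 4 x i32> %a, %splat
          ret <vscale x 4 x i32> %r
        }
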
* Reland [AArch64][MachineOutliner] Return address signing for outlined functions  (David Tellenbach, 2019-12-16, 12 files, -0/+1031)

    Summary:
    Reland after fixing a bug that allowed outlining of SP modifying instructions that invalidated return address signing.

    During AArch64 frame lowering instructions to enable return address signing are inserted into functions if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions.

    This patch introduces the following changes:
      1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions.
      2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions.
      3. If an outlining candidate would outline instructions that modify sp in a way that invalidates return address signing, outlining is disabled for that particular candidate.
      4. If all candidate functions agree on the signing scope, signing key and their support for v8.3 features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions.

    Reviewers: ostannard, paquette
    Reviewed By: ostannard
    Subscribers: kristof.beyls, hiraditya, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D70635

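    For reference, a minimal IR sketch (hypothetical function name) of the function attributes expressing the signing scope and key that candidate functions must agree on, per the list above:

        define void @signed_fn() "sign-return-address"="non-leaf" "sign-return-address-key"="a_key" {
          ret void
        }
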
* [AArch64][SVE2] Add intrinsics for binary narrowing operations  (Andrzej Warzynski, 2019-12-16, 1 file, -0/+278)

    Summary:
    The following intrinsics for binary narrowing add and sub operations are added:
      * @llvm.aarch64.sve.addhnb
      * @llvm.aarch64.sve.addhnt
      * @llvm.aarch64.sve.raddhnb
      * @llvm.aarch64.sve.raddhnt
      * @llvm.aarch64.sve.subhnb
      * @llvm.aarch64.sve.subhnt
      * @llvm.aarch64.sve.rsubhnb
      * @llvm.aarch64.sve.rsubhnt

    Reviewers: sdesmalen, rengolin, efriedma
    Reviewed By: sdesmalen, efriedma
    Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71424

* [AArch64] Enable emission of stack maps for non-Mach-O binaries on AArch64.  (Kristof Beyls, 2019-12-16, 1 file, -0/+492)

    The emission of stack maps in AArch64 binaries has been disabled for all binary formats except Mach-O since rL206610, probably mistakenly, as far as I can tell. This patch reverts this to its intended state.

    Differential Revision: https://reviews.llvm.org/D70069

    Patch by Loic Ottet.

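    A minimal IR sketch (hypothetical function name and stackmap ID) that exercises stack map emission via the experimental intrinsic:

        declare void @llvm.experimental.stackmap(i64, i32, ...)

        define void @record(i64 %v) {
          call void (i64, i32, ...) @llvm.experimental.stackmap(i64 1, i32 0, i64 %v)
          ret void
        }
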
* [AArch64][SVE] Add intrinsics for scatter stores  (Andrzej Warzynski, 2019-12-16, 5 files, -0/+702)

    Summary:
    This patch adds the following SVE intrinsics for scatter stores:
      * 64-bit offsets:
        * @llvm.aarch64.sve.st1.scatter (unscaled)
        * @llvm.aarch64.sve.st1.scatter.index (scaled)
      * 32-bit unscaled offsets:
        * @llvm.aarch64.sve.st1.scatter.uxtw (zero-extended offset)
        * @llvm.aarch64.sve.st1.scatter.sxtw (sign-extended offset)
      * 32-bit scaled offsets:
        * @llvm.aarch64.sve.st1.scatter.uxtw.index (zero-extended offset)
        * @llvm.aarch64.sve.st1.scatter.sxtw.index (sign-extended offset)
      * vector base + immediate:
        * @llvm.aarch64.sve.st1.scatter.imm

    Reviewers: rengolin, efriedma, sdesmalen
    Reviewed By: efriedma, sdesmalen
    Subscribers: kmclaughlin, eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
    Tags: #llvm

    Differential Revision: https://reviews.llvm.org/D71074

* [DAG] Add SimplifyDemandedBits support for BSWAP  (Sanjay Patel, 2019-12-15, 1 file, -6/+9)

    This exposes a shortcoming for AArch64, and that is tracked by PR40881: https://bugs.llvm.org/show_bug.cgi?id=40881

    Patch by: @RKSimon (Simon Pilgrim)

    Differential Revision: https://reviews.llvm.org/D58017

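    A minimal IR sketch (hypothetical function name) of a case where only some bytes of a bswap result are demanded, which SimplifyDemandedBits can now exploit:

        declare i32 @llvm.bswap.i32(i32)

        define i32 @bswap_lowbyte(i32 %x) {
          ; only the low byte of the swapped value (the original high byte) is demanded
          %b = call i32 @llvm.bswap.i32(i32 %x)
          %r = and i32 %b, 255
          ret i32 %r
        }
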