bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[Attributor] Move pass after InstCombine to futher eliminate null pointer checks	Dávid Bolvanský	2019-11-27	1	-0/+47
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: PR44149 Reviewers: jdoerfert Subscribers: mehdi_amini, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70737
*	[x86] make SLM extract vector element more expensive than default	Sanjay Patel	2019-11-27	7	-849/+1308
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607
*	[InstCombine] add tests for copysign; NFC	Sanjay Patel	2019-11-27	1	-0/+41
\|
*	[Attributor] Handle special case when offset equals zero in nonnull deduction	Hideto Ueno	2019-11-27	1	-4/+2
\|
*	Revert "Revert "As a follow-up to my initial mail to llvm-dev here's a first ↵	Eric Christopher	2019-11-26	3	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pass at the O1 described there."" This reapplies: 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4 Original commit message: As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410 This reverts commit c9ddb02659e3ece7a0d9d6b4dac7ceea4ae46e6d.
*	[InstSimplify] fold copysign with same args to the arg	Sanjay Patel	2019-11-26	1	-4/+2
\| \| \| \| \| \| \|	This is correct for any value including NaN/inf. We don't have this fold directly in the backend either, but x86 manages to get it after converting things to bitops.
*	[InstSimplify] add tests for copysign; NFC	Sanjay Patel	2019-11-26	1	-0/+21
\|
*	[ConstFolding] move tests for copysign; NFC	Sanjay Patel	2019-11-26	1	-49/+0
\| \| \| \|	InstCombine doesn't have any transforms for copysign currently.
*	[InferFuncAttributes][Attributor] add tests for 'dereferenceable'; NFC	Sanjay Patel	2019-11-26	1	-0/+28
\| \| \| \| \| \| \|	Pulling a couple of extra tests out of D64258 before abandoning in favor of D70714
*	[InstCombine] Optimize some memccpy calls to memcpy/null	Dávid Bolvanský	2019-11-26	1	-15/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: return memccpy(d, "helloworld", 'r', 20) => return memcpy(d, "helloworld", 8 /* pos of 'r' in string */), d + 8 Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68089
*	[Attributor] Track a GEP Instruction in align deduction	Hideto Ueno	2019-11-26	4	-7/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch enables us to track GEP instruction in align deduction. If a pointer `B` is defined as `A+Offset` and known to have alignment `C`, there exists some integer Q such that ``` A + Offset = C * Q = B ``` So we can say that the maximum power of two which is a divisor of gcd(Offset, C) is an alignment. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70392
*	Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at ↵	Muhammad Omair Javaid	2019-11-26	3	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the O1 described there." This reverts commit 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4. This commit introduced 9 new failures on lldb buildbot host at http://lab.llvm.org:8014/builders/lldb-aarch64-ubuntu Following tests were failing: lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq1/TestAmbiguousTailCallSeq1.py lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py lldb-api :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py lldb-api :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py lldb-api :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py lldb-api :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py lldb-api :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py lldb-api :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py lldb-api :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py lldb-api :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410
*	As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 ↵	Eric Christopher	2019-11-25	3	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410
*	Recommit f0c2a5a "[LV] Generalize conditions for sinking instrs for first ↵	Florian Hahn	2019-11-24	2	-0/+290
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	order recurrences." This version contains 2 fixes for reported issues: 1. Make sure we do not try to sink terminator instructions. 2. Make sure we bail out, if we try to sink an instruction that needs to stay in place for another recurrence. Original message: If the recurrence PHI node has a single user, we can sink any instruction without side effects, given that all users are dominated by the instruction computing the incoming value of the next iteration ('Previous'). We can sink instructions that may cause traps, because that only causes the trap to occur later, but not on any new paths. With the relaxed check, we also have to make sure that we do not have a direct cycle (meaning PHI user == 'Previous), which indicates a reduction relation, which potentially gets missed by ReductionDescriptor. As follow-ups, we can also sink stores, iff they do not alias with other instructions we move them across and we could also support sinking chains of instructions and multiple users of the PHI. Fixes PR43398. Reviewers: hsaito, dcaballe, Ayal, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D69228
*	[LoopInterchange] Adjust assertions when updating successors.	Florian Hahn	2019-11-24	1	-0/+145
\| \| \| \| \| \| \| \| \| \| \|	Currently the assertion in updateSuccessor is overly strict in some cases and overly relaxed in other cases. For branches to the inner and outer loop preheader it is too strict, because they can either be unconditional branches or conditional branches with duplicate targets. Both cases are fine and we can allow updating multiple successors. On the other hand, we have to at least update one successor. This patch adds such an assertion.
*	[InstCombine] remove identity shuffle simplification for mask with undefs	Sanjay Patel	2019-11-24	10	-28/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	And simultaneously enhance SimplifyDemandedVectorElts() to rcognize that pattern. That preserves some of the old optimizations in IR. Given a shuffle that includes undef elements in an otherwise identity mask like: define <4 x float> @shuffle(<4 x float> %arg) { %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3> ret <4 x float> %shuf } We were simplifying that to the input operand. But as discussed in PR43958: https://bugs.llvm.org/show_bug.cgi?id=43958 ...that means that per-vector-element poison that would be stopped by the shuffle can now leak to the result. Also note that we still have (and there are tests for) the same transform with no undef elements in the mask (a fully-defined identity mask). I don't think there's any controversy about that case - it's a valid transform under any interpretation of shufflevector/undef/poison. Looking at a few of the diffs into codegen, I don't see any difference in final asm. So depending on your perspective, that's good (no real loss of optimization power) or bad (poison exists in the DAG, so we only partially fixed the bug). Differential Revision: https://reviews.llvm.org/D70246
*	Revert "[InlineCost] Fix infinite loop in indirect call evaluation"	Ehud Katz	2019-11-23	1	-55/+0
\| \| \| \| \| \| \|	This reverts commit 854e956219e78cb8d7ef3b021d7be6b5d6b6af04. It broke tests: Transforms/Inline/redundant-loads.ll Transforms/SampleProfile/inline-callee-update.ll
*	[InlineCost] Fix infinite loop in indirect call evaluation	Ehud Katz	2019-11-23	1	-0/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently every time we encounter an indirect call of a known function, we try to evaluate the inline cost of that function. In case of a recursion, that evaluation never stops. The solution presented is to evaluate only the indirect call of the function, while any further indirect calls (of a known function) will be treated just as direct function calls, which, actually, never tries to evaluate the call. Fixes PR35469. Differential Revision: https://reviews.llvm.org/D69349
*	[InstCombine] Fix call guard difference with dbg	Davide Italiano	2019-11-22	1	-0/+1
\| \| \| \| \| \|	Patch by Chris Ye! Differential Revision: https://reviews.llvm.org/D68004
*	[SLP] Enhance SLPVectorizer to vectorize vector aggregate	Anton Afanasyev	2019-11-22	1	-14/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Vector aggregate is homogeneous aggregate of vectors like `{ <2 x float>, <2 x float> }`. This patch allows `findBuildAggregate()` to consider vector aggregates as well as scalar ones. For instance, `{ <2 x float>, <2 x float> }` maps to `<4 x float>`. Fixes vector part of llvm.org/PR42022 Reviewers: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70068
*	[SLP][Test] Precommit tests for D70068 and D70587. NFC.	Anton Afanasyev	2019-11-22	1	-0/+288
\|
*	[JumpThreading] Use profile data even with the new pass manager	Kazu Hirata	2019-11-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without this patch, the jump threading pass ignores profiling data whenever we invoke the pass with the new pass manager. Specifically, JumpThreadingPass::run calls runImpl with class variable HasProfileData always set to false. In turn, runImpl sets HasProfileData to false again: HasProfileData = HasProfileData_; In the end, we don't use profiling data at all with the new pass manager. This patch fixes the problem by passing F.hasProfileData() to runImpl. The bug appears to have been introduced at: https://reviews.llvm.org/D41461 which removed local variable HasProfileData in JumpThreadingPass::run even though there was one more use left in the same function. As a result, the remaining use ended referring to the class variable instead. Note that F.hasProfileData is an extremely lightweight function, so I don't see the need to cache its result. Once this patch is approved, I'm planning to stop caching the result of F.hasProfileData in runOnFunction. Reviewers: wmi, eli.friedman Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70509
*	[LoopPred] Robustly handle partially unswitched loops	Philip Reames	2019-11-21	1	-0/+111
\| \| \| \|	We may end up with a case where we have a widenable branch above the loop, but not all widenable branches within the loop have been removed. Since a widenable branch inhibit SCEVs ability to reason about exit counts (by design), we have a tradeoff between effectiveness of this optimization and allowing future widening of the branches within the loop. LoopPred is thought to be one of the most important optimizations for range check elimination, so let's pay the cost.
*	Clang-trunk Generates Wrong Debug values with -O1	Vedant Kumar	2019-11-21	1	-0/+126
\| \| \| \| \| \| \| \| \| \| \| \| \|	Bit-Tracking Dead Code Elimination (bdce) do not mark dbg.value as undef after deleting instruction. which shows invalid state of variable in debugger. This patches fixes this by marking the dbg.value as undef which depends on dead instruction. This fixes https://bugs.llvm.org/show_bug.cgi?id=41925 Patch by kamlesh kumar! Differential Revision: https://reviews.llvm.org/D70040
*	Broaden the definition of a "widenable branch"	Philip Reames	2019-11-21	4	-0/+386
\| \| \| \| \| \| \| \| \| \| \| \|	As a reminder, a "widenable branch" is the pattern "br i1 (and i1 X, WC()), label %taken, label %untaken" where "WC" is the widenable condition intrinsics. The semantics of such a branch (derived from the semantics of WC) is that a new condition can be added into the condition arbitrarily without violating legality. Broaden the definition in two ways: Allow swapped operands to the br (and X, WC()) form Allow widenable branch w/trivial condition (i.e. true) which takes form of br i1 WC() The former is just general robustness (e.g. for X = non-instruction this is what instcombine produces). The later is specifically important as partial unswitching of a widenable range check produces exactly this form above the loop. Differential Revision: https://reviews.llvm.org/D70502
*	[LV] PreferPredicateOverEpilog respecting option	Sjoerd Meijer	2019-11-21	1	-0/+18
\| \| \| \| \| \| \| \|	Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not to predicate the loop. Differential Revision: https://reviews.llvm.org/D70382
*	Precommit tests for forthcoming widenable.condition transforms	Philip Reames	2019-11-20	1	-0/+156
\|
*	Temporarily Revert "[SLP] allow forming 2-way reduction patterns" and update ↵	Eric Christopher	2019-11-20	2	-19/+19
\| \| \| \| \| \| \| \| \| \|	testcases. After speaking with Sanjay - seeing a number of miscompiles and working on tracking down a testcase. None of the follow on patches seem to have helped so far. This reverts commit 8a0aa5310bccbb42d16d11db090419fcefdd1376.
*	Temporarily Revert "Temporarily Revert "[SLP] allow forming 2-way reduction ↵	Eric Christopher	2019-11-20	1	-10/+9
\| \| \| \| \| \| \| \|	patterns"" as there were testcase changes after that need to also be reverted. This reverts commit cd8748a15f2d18861b3548eb26ed2b52e5ee50b4.
*	Temporarily Revert "[SLP] allow forming 2-way reduction patterns"	Eric Christopher	2019-11-20	1	-9/+10
\| \| \| \| \| \| \| \|	After speaking with Sanjay - seeing a number of miscompiles and working on tracking down a testcase. None of the follow on patches seem to have helped so far. This reverts commit 7ff57705ba196ce649d6034614b3b9df57e1f84f.
*	[SLP] reduce duplicate CHECK lines in tests; NFC	Sanjay Patel	2019-11-20	1	-392/+208
\|
*	[tests] Autogen a test to eliminate spurious diff from following patch	Philip Reames	2019-11-19	1	-20/+20
\|
*	[GuardWidening] Remove WidenFrequentBranches transform	Philip Reames	2019-11-19	1	-820/+0
\| \| \| \|	This code has never been enabled. While it is tested, it's complicating some refactoring. If we decide to re-implement this, doing it in SimplifyCFG would probably make more sense anyways.
*	[LoopPred] Generalize profitability check to handle unswitch output	Philip Reames	2019-11-19	1	-0/+81
\| \| \| \|	Unswitch (and other loop transforms) like to generate loop exit blocks with unconditional successors, and phi nodes (LCSSA, or simple multiple exiting blocks sharing an exit). Generalize the "likely very rare exit" check slightly to handle this form.
*	[ValueTracking] Add a basic version of isKnownNonInfinity and use it to ↵	Benjamin Kramer	2019-11-19	1	-3/+36
\| \| \| \|	detect more NoNaNs
*	llvm/ObjCARC: Eliminate inlined AutoreleaseRV calls	Duncan P. N. Exon Smith	2019-11-19	2	-2/+293
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pair up inlined AutoreleaseRV calls with their matching RetainRV or ClaimRV. - RetainRV cancels out AutoreleaseRV. Delete both instructions. - ClaimRV is a peephole for RetainRV+Release. Delete AutoreleaseRV and replace ClaimRV with Release. This avoids problems where more aggressive inlining triggers memory regressions. This patch is happy to skip over non-callable instructions and non-ARC intrinsics looking for the pair. It is likely sound to also skip over opaque function calls, but that's harder to reason about, and it's not relevant to the goal here: if there's an opaque function call splitting up a pair, it's very unlikely that a handshake would have happened dynamically without inlining. Note that this patch also subsumes the previous logic that looked backwards from ReleaseRV. https://reviews.llvm.org/D70370 rdar://problem/46509586
*	[SLP] fix miscompile on min/max reductions with extra uses (PR43948) (2nd try)	Sanjay Patel	2019-11-19	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 1st attempt was reverted because it revealed an existing bug where we could produce invalid IR (use of value before definition). That should be fixed with: rG39de82ecc9c2 The bug manifests as replacing a reduction operand with an undef value. The problem appears to be limited to cases where a min/max reduction has extra uses of the compare operand to the select. In the general case, we are tracking "ExternallyUsedValues" and an "IgnoreList" of the reduction operations, but those may not apply to the final compare+select in a min/max reduction. For that, we use replaceAllUsesWith (RAUW) to ensure that the new vectorized reduction values are transferred to all subsequent users. Differential Revision: https://reviews.llvm.org/D70148
*	[ARM] MVE interleaving load and stores.	David Green	2019-11-19	2	-224/+312
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore. The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases. We may need some other future adjustments, such as VLD4 take up half the available registers so should maybe cost more. This patch should get the basics in though. Differential Revision: https://reviews.llvm.org/D69392
*	[ARM] Add and update a lot of VLDn tests. NFC	David Green	2019-11-19	2	-509/+1693
\|
*	[SLP] fix insertion point for min/max reduction	Sanjay Patel	2019-11-19	1	-1/+1
\| \| \| \| \| \|	As discussed in D70148 (and caused a revert of the original commit): if we insert at the select, then we can produce invalid IR because the replacement for the compare may have uses before the select.
*	[SLP] add test for reduction miscompile; NFC	Sanjay Patel	2019-11-19	1	-0/+30
\| \| \| \|	See D70148 for discussion.
*	[AMDGPU] Tune inlining parameters for AMDGPU target (part 2)	dfukalov	2019-11-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Most of IR instructions got better code size estimations after commit 47a5c36b. So default parameters values should be updated to improve inlining and unrolling for the target. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70391
*	Temporarily revert "[SLP] fix miscompile on min/max reductions with extra ↵	Eric Christopher	2019-11-18	2	-2/+2
\| \| \| \| \| \| \| \|	uses (PR43948)" as it causes an ICE on valid. A testcase was followed up on the original thread. This reverts commit a3e61946c5bd7bdfab15af76b292e52d6ffa27f7.
*	[SLP] reduce duplicated check lines in tests; NFC	Sanjay Patel	2019-11-18	1	-313/+163
\|
*	[LoopPred/WC] Use a dominating widenable condition to remove analyze loop exits	Philip Reames	2019-11-18	1	-0/+758
\| \| \| \| \| \| \| \| \| \| \| \|	This implements a version of the predicateLoopExits transform from IndVarSimplify extended to exploit widenable conditions - and thus be much wider in scope of legality. The code structure ends up being almost entirely different, so I chose to duplicate this into the LoopPredication pass instead of trying to reuse the code in the IndVars. The core notions of the transform are as follows: If we have a widenable condition which controls entry into the loop, we're allowed to widen it arbitrarily. Given that, it's simply a profitability question as to what conditions to fold into the widenable branch. To avoid pass ordering issues, we want to avoid widening cases that would otherwise be dischargeable. Or... widen in a form which can still be discharged. Thus, we phrase the transform as selecting one analyzeable exit from the set of analyzeable exits to keep. This avoids creating pass ordering complexities. Since none of the above proves that we actually exit through our analyzeable exits - we might exit through something else entirely - we limit ourselves to cases where a) the latch is analyzeable and b) the latch is predicted taken, and c) the exit being removed is statically cold. Differential Revision: https://reviews.llvm.org/D69830
*	[ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i.	Simon Tatham	2019-11-18	1	-0/+236
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If you're writing C code using the ACLE MVE intrinsics that passes the result of a vcmp as input to a predicated intrinsic, e.g. mve_pred16_t pred = vcmpeqq(v1, v2); v_out = vaddq_m(v_inactive, v3, v4, pred); then clang's codegen for the compare intrinsic will create calls to `@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an `mve_pred16_t` integer representation, and then the next intrinsic will call `@llvm.arm.mve.pred.i2v` to convert it straight back again. This will be visible in the generated code as a `vmrs`/`vmsr` pair that move the predicate value pointlessly out of `p0` and back into it again. To prevent that, I've added InstCombine rules to remove round trips of the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine about the known and demanded bits of those intrinsics. As a result, you now get just the generated code you wanted: vpt.u16 eq, q1, q2 vaddt.u16 q0, q3, q4 Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70313
*	[InstCombine] prevent crashing/assert on shift constant expression (PR44028)	Sanjay Patel	2019-11-17	1	-0/+17
\| \| \| \| \|	The binary operator cast implies an instruction, but the matcher for shift does not: https://bugs.llvm.org/show_bug.cgi?id=44028
*	[ConstantFold] Handle identity folds at top of ConstantFoldBinaryInst	Florian Hahn	2019-11-17	1	-14/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we miss folds with undef and identity values for binary ops that do not fold to undef in general. We can generalize the identity simplifications and do them before checking for undef in particular. Alive checks: * OR - https://rise4fun.com/Alive/8OsK * AND - https://rise4fun.com/Alive/e3tE This will also allow us to remove some now redundant cases throughout the function, but I would like to do this as follow-up. That should make tracking down potential issues easier. Reviewers: spatel, RKSimon, lebedev.ri Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D70169
*	[Attributor] Use nofree argument attribute for heap-to-stack conversion	Stefan Stipanovic	2019-11-17	1	-1/+19
\| \| \| \| \| \| \| \|	Reviewers: jdoerfert, uenoku Subscribers: Differential Revision: https://reviews.llvm.org/D70140
*	[SimplifyCFG] propagate fast-math-flags (FMF) from phi to select	Sanjay Patel	2019-11-17	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Similar to/extension of D70208 (rGee0882bdf866), but this one may finally allow closing motivating bugs. This is another step towards having FMF apply only to FP values rather than those + fcmp. See PR38086 for one of the original discussions/motivations: https://bugs.llvm.org/show_bug.cgi?id=38086 And the test here is derived from PR39535: https://bugs.llvm.org/show_bug.cgi?id=39535 Currently, we lose FMF when converting any phi to select in SimplifyCFG. There are a small number of similar changes needed to correct within SimplifyCFG, so it should be quick to patch this pass up. FMF was extended to select and phi with: D61917 D67564