bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics.	Simon Pilgrim	2017-05-07	2	-24/+24
\| \| \| \| \| \|	Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets. llvm-svn: 302378
*	[SCEV] createAddRecFromPHI: Optimize for the most common case.	Michael Zolotukhin	2017-05-03	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The existing implementation creates a symbolic SCEV expression every time we analyze a phi node and then has to remove it, when the analysis is finished. This is very expensive, and in most of the cases it's also unnecessary. According to the data I collected, ~60-70% of analyzed phi nodes (measured on SPEC) have the following form: PN = phi(Start, OP(Self, Constant)) Handling such cases separately significantly speeds this up. Reviewers: sanjoy, pete Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32663 llvm-svn: 302096
*	Support arbitrary address space pointers in masked gather/scatter intrinsics.	Elad Cohen	2017-05-03	2	-21/+21
\| \| \| \| \| \| \| \| \| \| \| \|	Fixes PR31789 - When loop-vectorize tries to use these intrinsics for a non-default address space pointer we fail with a "Calling a function with a bad singature!" assertion. This patch solves this by adding the 'vector of pointers' argument as an overloaded type which will determine the address space. Differential revision: https://reviews.llvm.org/D31490 llvm-svn: 302018
*	Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts	Sanjoy Das	2017-05-01	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \|	In cases where an instruction (a call site, say) is RAUW'ed with some other value (this is possible via the `returned` attribute, for instance), we want the slot in UnknownInsts to point to the original Instruction we wanted to track, not the value it got replaced by. Fixes PR32587. This relands r301426. llvm-svn: 301814
*	Rename isKnownNotFullPoison to programUndefinedIfPoison; NFC	Sanjoy Das	2017-04-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: programUndefinedIfPoison makes more sense, given what the function does; and I'm about to add a function with a name similar to isKnownNotFullPoison (so do the rename to avoid confusion). Reviewers: broune, majnemer, bjarke.roune Reviewed By: broune Subscribers: mcrosier, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30444 llvm-svn: 301776
*	Reverts commit r301424, r301425 and r301426	Sanjoy Das	2017-04-26	1	-25/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Commits were: "Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts" "Add a new WeakVH value handle; NFC" "Rename WeakVH to WeakTrackingVH; NFC" The changes assumed pointers are 8 byte aligned on all architectures. llvm-svn: 301429
*	Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts	Sanjoy Das	2017-04-26	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In cases where an instruction (a call site, say) is RAUW'ed with some other value (this is possible via the `returned` attribute, amongst other things), we want the slot in UnknownInsts to point to the original Instruction we wanted to track, not the value it got replaced by. Fixes PR32587. Reviewers: davide Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D32268 llvm-svn: 301426
*	[IVUsers] Don't bail out of normalizing non-affine add recs	Sanjoy Das	2017-04-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In a previous change I changed SCEV's normalization / denormalization to work with non-affine add recs. So the bailout in IVUsers can be removed. Reviewers: atrick, efriedma Reviewed By: atrick Subscribers: davide, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D32105 llvm-svn: 301298
*	[SCEV] Fix exponential time complexity by caching	Sanjoy Das	2017-04-24	1	-0/+57
\| \| \| \|	llvm-svn: 301149
*	Revert r300746 (SCEV analysis for or instructions).	Eli Friedman	2017-04-20	1	-38/+0
\| \| \| \| \| \| \| \|	There have been multiple reports of this causing problems: a compile-time explosion on the LLVM testsuite, and a stack overflow for an opencl kernel. llvm-svn: 300928
*	[SCEV] Make SCEV or modeling more aggressive.	Eli Friedman	2017-04-19	1	-0/+38
\| \| \| \| \| \| \| \| \| \|	Use haveNoCommonBitsSet to figure out whether an "or" instruction is equivalent to addition. This handles more cases than just checking for a constant on the RHS. Differential Revision: https://reviews.llvm.org/D32239 llvm-svn: 300746
*	AMDGPU: Change DivergenceAnalysis for function arguments	Matt Arsenault	2017-04-19	1	-1/+26
\| \| \| \| \| \|	Stop assuming all functions are kernels. llvm-svn: 300719
*	[BPI] Use metadata info before any other heuristics	Serguei Katkov	2017-04-17	1	-0/+225
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Metadata potentially is more precise than any heuristics we use, so it makes sense to use first metadata info if it is available. However it makes sense to examine it against other strong heuristics like unreachable one. If edge coming to unreachable block has higher probability then it is expected by unreachable heuristic then we use heuristic and remaining probability is distributed among other reachable blocks equally. An example where metadata might be more strong then unreachable heuristic is as follows: it is possible that there are two branches and for the branch A metadata says that its probability is (0, 2^25). For the branch B the probability is (1, 2^25). So the expectation is that first edge of B is hotter than first edge of A because first edge of A did not executed at least once. If first edge of A points to the unreachable block then using the unreachable heuristics we'll set the probability for A to (1, 2^20) and now edge of A becomes hotter than edge of B. This is unexpected behavior. This fixed the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214 Reviewers: sanjoy, junbuml, vsk, chandlerc Reviewed By: chandlerc Subscribers: llvm-commits, reames, davidxl Differential Revision: https://reviews.llvm.org/D30631 llvm-svn: 300440
*	[Analysis] Support bitreverse in -demanded-bits pass	Brian Gesiak	2017-04-13	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: * Add a bitreverse case in the demanded bits analysis pass. * Add tests for the bitreverse (and bswap) intrinsic in the demanded bits pass. * Add a test case to the BDCE tests: that manipulations to high-order bits are eliminated once the bits are reversed and then right-shifted. Reviewers: mkuper, jmolloy, hfinkel, trentxintong Reviewed By: jmolloy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31857 llvm-svn: 300215
*	Remove readnone from invariant.group.barrier	Piotr Padlewski	2017-04-12	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Readnone attribute would cause CSE of two barriers with the same argument, which is invalid by example: struct Base { virtual int foo() { return 42; } }; struct Derived1 : Base { int foo() override { return 50; } }; struct Derived2 : Base { int foo() override { return 100; } }; void foo() { Base *x = new Base{}; new (x) Derived1{}; int a = std::launder(x)->foo(); new (x) Derived2{}; int b = std::launder(x)->foo(); } Here 2 calls of std::launder will produce @llvm.invariant.group.barrier, which would be merged into one call, causing devirtualization to devirtualize second call into Derived1::foo() instead of Derived2::foo() Reviewers: chandlerc, dberlin, hfinkel Subscribers: llvm-commits, rsmith, amharc Differential Revision: https://reviews.llvm.org/D31531 llvm-svn: 300101
*	[SystemZ] Fix target specific tests	Renato Golin	2017-04-12	1	-0/+2
\| \| \| \|	llvm-svn: 300078
*	[SystemZ] Updated test fp-cast.ll	Jonas Paulsson	2017-04-12	1	-58/+58
\| \| \| \| \| \|	This did not get included in the previous commit for SystemZ cost functions. llvm-svn: 300053
*	[SystemZ] TargetTransformInfo cost functions implemented.	Jonas Paulsson	2017-04-12	13	-0/+8096
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052
*	[BPI] Refactor post domination calculation and simple fix for ColdCall	Serguei Katkov	2017-04-12	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Collection of PostDominatedByUnreachable and PostDominatedByColdCall have been split out of heuristics itself. Update of the data happens now for each basic block (before update for PostDominatedByColdCall might be skipped if unreachable or matadata heuristic handled this basic block). This separation allows re-ordering of heuristics without loosing the post-domination information. Reviewers: sanjoy, junbuml, vsk, chandlerc, reames Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31701 llvm-svn: 300029
*	MemorySSA: Move to Analysis, from Transforms/Utils. It's used as	Daniel Berlin	2017-04-11	22	-0/+1510
\| \| \| \| \| \| \| \|	Analysis, it has Analysis passes, and once NewGVN is made an Analysis, this removes the cross dependency from Analysis to Transform/Utils. NFC. llvm-svn: 299980
*	Add address space mangling to lifetime intrinsics	Matt Arsenault	2017-04-10	2	-7/+7
\| \| \| \| \| \|	In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876
*	[ScalarEvolution] Re-enable Predicate implication from operations	Max Kazantsev	2017-03-31	2	-0/+381
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch rL298481 was reverted due to crash on clang-with-lto-ubuntu build. The reason of the crash was type mismatch between either a or b and RHS in the following situation: LHS = sext(a +nsw b) > RHS. This is quite rare, but still possible situation. Normally we need to cast all {a, b, RHS} to their widest type. But we try to avoid creation of new SCEV that are not constants to avoid initiating recursive analysis that can take a lot of time and/or cache a bad value for iterations number. To deal with this, in this patch we reject this case and will not try to analyze it if the type of sum doesn't match with the type of RHS. In this situation we don't need to create any non-constant SCEVs. This patch also adds an assertion to the method IsProvedViaContext so that we could fail on it and not go further into range analysis etc (because in some situations these analyzes succeed even when the passed arguments have wrong types, what should not normally happen). The patch also contains a fix for a problem with too narrow scope of the analysis caused by wrong usage of predicates in recursive invocations. The regression test on the said failure: test/Analysis/ScalarEvolution/implied-via-addition.ll Reviewers: reames, apilipenko, anna, sanjoy Reviewed By: sanjoy Subscribers: mzolotukhin, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D31238 llvm-svn: 299205
*	AMDGPU: Add all atomicrmw fields to atomic.inc/dec	Matt Arsenault	2017-03-30	1	-12/+12
\| \| \| \| \| \|	Add scope, order, isVolatile llvm-svn: 299122
*	Revert "[ScalarEvolution] Re-enable Predicate implication from operations"	Max Kazantsev	2017-03-24	2	-358/+0
\| \| \| \| \| \| \| \|	This reverts commit rL298690 Causes failures on clang. llvm-svn: 298693
*	[ScalarEvolution] Re-enable Predicate implication from operations	Max Kazantsev	2017-03-24	2	-0/+358
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch rL298481 was reverted due to crash on clang-with-lto-ubuntu build. The reason of the crash was type mismatch between either a or b and RHS in the following situation: LHS = sext(a +nsw b) > RHS. This is quite rare, but still possible situation. Normally we need to cast all {a, b, RHS} to their widest type. But we try to avoid creation of new SCEV that are not constants to avoid initiating recursive analysis that can take a lot of time and/or cache a bad value for iterations number. To deal with this, in this patch we reject this case and will not try to analyze it if the type of sum doesn't match with the type of RHS. In this situation we don't need to create any non-constant SCEVs. This patch also adds an assertion to the method IsProvedViaContext so that we could fail on it and not go further into range analysis etc (because in some situations these analyzes succeed even when the passed arguments have wrong types, what should not normally happen). The patch also contains a fix for a problem with too narrow scope of the analysis caused by wrong usage of predicates in recursive invocations. The regression test on the said failure: test/Analysis/ScalarEvolution/implied-via-addition.ll llvm-svn: 298690
*	Model ashr(shl(x, n), m) as mul(x, 2^(n-m)) when n > m	Zhaoshi Zheng	2017-03-23	2	-0/+128
\| \| \| \| \| \| \| \| \| \| \| \| \|	Given below case: %y = shl %x, n %z = ashr %y, m when n = m, SCEV models it as sext(trunc(x)). This patch tries to handle the case where n > m by using sext(mul(trunc(x), 2^(n-m)))) as the SCEV expression. llvm-svn: 298631
*	[LVI] Add an LVI printer pass to capture test LVI cache after transformations	Anna Thomas	2017-03-22	1	-0/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Adding a printer pass for printing the LVI cache values after transformations that use LVI. This will help us in identifying cases where LVI invariants are violated, or transforms that leave LVI in an incorrect state. Right now, I have added two test cases to show that the printer pass is working. I will be adding more test cases in a later change, once this change is checked in upstream. Reviewers: reames, dberlin, sanjoy, apilipenko Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30790 llvm-svn: 298542
*	Revert "[ScalarEvolution] Predicate implication from operations"	Max Kazantsev	2017-03-22	1	-334/+0
\| \| \| \| \| \| \| \|	This reverts commit rL298481 Fails clang-with-lto-ubuntu build. llvm-svn: 298489
*	[ScalarEvolution] Predicate implication from operations	Max Kazantsev	2017-03-22	1	-0/+334
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows SCEV predicate analysis to prove implication of some expression predicates from context predicates related to arguments of those expressions. It introduces three new rules: For addition: (A >X && B >= 0) \|\| (B >= 0 && A > X) ===> (A + B) > X. For division: (A > X) && (0 < B <= X + 1) ===> (A / B > 0). (A > X) && (-B <= X < 0) ===> (A / B >= 0). Using these rules, SCEV is able to prove facts like "if X > 1 then X / 2 > 0". They can also be combined with the same context, to prove more complex expressions like "if X > 1 then X/2 + 1 > 1". Diffirential Revision: https://reviews.llvm.org/D30887 Reviewed by: sanjoy llvm-svn: 298481
*	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel	Matt Arsenault	2017-03-21	16	-107/+107
\| \| \| \| \| \| \| \| \| \| \| \|	Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
*	[ConstantFolding] Fix to prevent constant folding having to repeatedly scan ↵	David Green	2017-03-21	1	-0/+73
\| \| \| \| \| \| \| \| \| \| \| \| \|	operands. NFCI After the loop unroll threshold was increased in r295538, very large constant expressions can be created. This prevents them from having to be recursively scanned, leading to a compile time blow-up. Differential Revision: https://reviews.llvm.org/D30689 llvm-svn: 298356
*	[SCEV] Fix trip multiple calculation	Eli Friedman	2017-03-20	1	-0/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If loop bound containing calculations like min(a,b), the Scalar Evolution API getSmallConstantTripMultiple returns 4294967295 "-1" as the trip multiple. The problem is that, SCEV use -1 * umax to represent umin. The multiple constant -1 was returned, and the logic of guarding against huge trip counts was skipped. Because -1 has 32 active bits. The fix attempt to factor more general cases. First try to get the greatest power of two divisor of trip count expression. In case overflow happens, the trip count expression is still divisible by the greatest power of two divisor returned. Returns 1 if not divisible by 2. Patch by Huihui Zhang <huihuiz@codeaurora.org> Differential Revision: https://reviews.llvm.org/D30840 llvm-svn: 298301
*	[SCEV] Compute affine range in another way to avoid bitwidth extending.	Michael Zolotukhin	2017-03-16	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This approach has two major advantages over the existing one: 1. We don't need to extend bitwidth in our computations. Extending bitwidth is a big issue for compile time as we often end up working with APInts wider than 64bit, which is a slow case for APInt. 2. When we zero extend a wrapped range, we lose some information (we replace the range with [0, 1 << src bit width)). Thus, avoiding such extensions better preserves information. Correctness testing: I ran 'ninja check' with assertions that the new implementation of getRangeForAffineAR gives the same results as the old one (this functionality is not present in this patch). There were several failures - I inspected them manually and found out that they all are caused by the fact that we're returning more accurate results now (see bullet (2) above). Without such assertions 'ninja check' works just fine, as well as SPEC2006. Compile time testing: CTMark/Os: - mafft/pairlocalalign -16.98% - tramp3d-v4/tramp3d-v4 -12.72% - lencod/lencod -11.51% - Bullet/bullet -4.36% - ClamAV/clamscan -3.66% - 7zip/7zip-benchmark -3.19% - sqlite3/sqlite3 -2.95% - SPASS/SPASS -2.74% - Average -5.81% Performance testing: The changes are expected to be neutral for runtime performance. Reviewers: sanjoy, atrick, pete Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30477 llvm-svn: 297992
*	[BasicTTIImpl] Bugfix in getIntrinsicInstrCost()	Jonas Paulsson	2017-03-16	1	-0/+66
\| \| \| \| \| \| \| \| \|	Don't call getScalarizationOverhead(RetTy, true, false) if RetTy is void type. Review: Hal Finkel https://reviews.llvm.org/D31024 llvm-svn: 297954
*	[X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars	Simon Pilgrim	2017-03-15	1	-28/+24
\| \| \| \| \| \|	Prep work for PR31810 llvm-svn: 297876
*	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved	Jonas Paulsson	2017-03-14	1	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705
*	[ConstantFold] Fix defect in constant folding computation for GEP	Javed Absar	2017-03-08	1	-0/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the array indexes are all determined by GVN to be constants, a call is made to constant-folding to optimize/simplify the address computation. The constant-folding, however, makes a mistake in that it sometimes reads back stale Idxs instead of NewIdxs, that it re-computed in previous iteration. This leads to incorrect addresses coming out of constant-folding to GEP. A test case is included. The error is only triggered when indexes have particular patterns that the stale/new index updates interplay matters. Reviewers: Daniel Berlin Differential Revision: https://reviews.llvm.org/D30642 llvm-svn: 297317
*	Fix minor typo introduce in r297014	Tobias Grosser	2017-03-06	1	-1/+1
\| \| \| \|	llvm-svn: 297020
*	New Test-Case for Region Analysis	Tobias Grosser	2017-03-06	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While working on improvements to region info analysis, this test case caused an incorrect region bb2 => bb3 to be detected. Reviewers: grosser Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in> Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30652 llvm-svn: 297014
*	New Test-Case for Region Analysis	Tobias Grosser	2017-03-05	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While working on improvements to the region info analysis, this test case caused an incorrect region 1 => 2 to be detected. It is incorrect because entry has an outgoing edge to 3. This is interesting because 1 dom 2 and 2 pdom 1, which should have been enough to prevent incoming forward edges into the region and outgoing forward edges from the region. Reviewers: grosser Subscribers: llvm-commits Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in> Differential Revision: https://reviews.llvm.org/D30603 llvm-svn: 296988
*	Revert "Fix PR 24415 (at least), by making our post-dominator tree behavior ↵	Tobias Grosser	2017-03-02	11	-113/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sane." and also "clang-format GenericDomTreeConstruction.h, since the current formatting makes it look like their is a bug in the loop indentation, and there is not" This reverts commit r296535. There are still some open design questions which I would like to discuss. I revert this for Daniel (who gave the OK), as he is on vacation. llvm-svn: 296812
*	[BasicAA] Take attributes into account when requesting modref info for a ↵	Igor Laevsky	2017-03-01	1	-0/+42
\| \| \| \| \| \| \| \|	call site Differential Revision: https://reviews.llvm.org/D29989 llvm-svn: 296617
*	Fix PR 24415 (at least), by making our post-dominator tree behavior sane.	Daniel Berlin	2017-02-28	11	-31/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently, our post-dom tree tries to ignore and remove the effects of infinite loops. It fails miserably at this, because it tries to do it ahead of time, and thus can only detect self-loops, and any other type of infinite loop, it pretends doesn't exist at all. This can, in a bunch of cases, lead to wrong answers and a completely empty post-dom tree. Wrong answer: ``` declare void foo() define internal void @f() { entry: br i1 undef, label %bb35, label %bb3.i bb3.i: call void @foo() br label %bb3.i bb35.loopexit3: br label %bb35 bb35: ret void } ``` We get: ``` Inorder PostDominator Tree: [1] <<exit node>> {0,7} [2] %bb35 {1,6} [3] %bb35.loopexit3 {2,3} [3] %entry {4,5} ``` This is a trivial modification of the testcase for PR 6047 Note that we pretend bb3.i doesn't exist. We also pretend that bb35 post-dominates entry. While it's true that it does not exit in a theoretical sense, it's not really helpful to try to ignore the effect and pretend that bb35 post-dominates entry. Worse, we pretend the infinite loop does nothing (it's usually considered a side-effect), and doesn't even exist, even when it calls a function. Sadly, this makes it impossible to use when you are trying to move code safely. All compilers also create virtual or real single exit nodes (including us), and connect infinite loops there (which this patch does). In fact, others have worked around our behavior here, to the point of building their own post-dom trees: https://zneak.github.io/fcd/2016/02/17/structuring.html and pointing out the region infrastructure is near-useless for them with postdom in this state :( Completely empty post-dom tree: ``` define void @spam() #0 { bb: br label %bb1 bb1: ; preds = %bb1, %bb br label %bb1 bb2: ; No predecessors! ret void } ``` Printing analysis 'Post-Dominator Tree Construction' for function 'foo': =============================-------------------------------- Inorder PostDominator Tree: [1] <<exit node>> {0,1} :( (note that even if you ignore the effects of infinite loops, bb2 should be present as an exit node that post-dominates nothing). This patch changes post-dom to properly handle infinite loops and does root finding during calculation to prevent empty tress in such cases. We match gcc's (and the canonical theoretical) behavior for infinite loops (find the backedge, connect it to the exit block). Testcases coming as soon as i finish running this on a ton of random graphs :) Reviewers: chandlerc, davide Subscribers: bryant, llvm-commits Differential Revision: https://reviews.llvm.org/D29705 llvm-svn: 296535
*	[ValueTracking] Make poison propagation more aggressive	Sanjoy Das	2017-02-22	3	-17/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Motivation: fix PR31181 without regression (the actual fix is still in progress). However, the actual content of PR31181 is not relevant here. This change makes poison propagation more aggressive in the following cases: 1. poision * Val == poison, for any Val. In particular, this changes existing intentional and documented behavior in these two cases: a. Val is 0 b. Val is 2^k * N 2. poison << Val == poison, for any Val 3. getelementptr is poison if any input is poison I think all of these are justified (and are axiomatically true in the new poison / undef model): 1a: we need poison * 0 to be poison to allow transforms like these: A * (B + C) ==> A * B + A * C If poison * 0 were 0 then the above transform could not be allowed since e.g. we could have A = poison, B = 1, C = -1, making the LHS poison * (1 + -1) = poison * 0 = 0 and the RHS poison * 1 + poison * -1 = poison + poison = poison 1b: we need e.g. poison * 4 to be poison since we want to allow A * 4 ==> A + A + A + A If poison * 4 were a value with all of their bits poison except the last four; then we'd not be able to do this transform since then if A were poison the LHS would only be "partially" poison while the RHS would be "full" poison. 2: Same reasoning as (1b), we'd like have the following kinds transforms be legal: A << 1 ==> A + A Reviewers: majnemer, efriedma Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D30185 llvm-svn: 295809
*	[PPC] Give unaligned memory access lower cost on processor that supports it	Guozhi Wei	2017-02-17	2	-1/+27
\| \| \| \| \| \| \| \| \| \| \| \|	Newer ppc supports unaligned memory access, it reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost. This patch fixes pr31492. Differential Revision: https://reviews.llvm.org/D28630 This is resubmit of r292680, which was reverted by r293092. The internal application failures were actually caused by a source code bug. llvm-svn: 295506
*	AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsics	Matt Arsenault	2017-02-16	1	-22/+0
\| \| \| \| \| \|	Update test uses with expansion in terms of new intrinsics. llvm-svn: 295269
*	[SCEV] Cache results during GetMinTrailingZeros query	Igor Laevsky	2017-02-14	1	-0/+63
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D29759 llvm-svn: 295060
*	[ValueTracking] use nonnull argument attribute to eliminate null checks	Sanjay Patel	2017-02-12	1	-7/+50
\| \| \| \| \| \| \| \| \| \| \|	Enhancing value tracking's analysis of null-ness was suggested in D27855, so here's a first attempt at that. This is part of solving: https://llvm.org/bugs/show_bug.cgi?id=28430 Differential Revision: https://reviews.llvm.org/D28204 llvm-svn: 294897
*	[LV/LoopAccess] Check statically if an unknown dependence distance can be	Dorit Nuzman	2017-02-12	2	-4/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	proven larger than the loop-count This fixes PR31098: Try to resolve statically data-dependences whose compile-time-unknown distance can be proven larger than the loop-count, instead of resorting to runtime dependence checking (which are not always possible). For vectorization it is sufficient to prove that the dependence distance is >= VF; But in some cases we can prune unknown dependence distances early, and even before selecting the VF, and without a runtime test, by comparing the distance against the loop iteration count. Since the vectorized code will be executed only if LoopCount >= VF, proving distance >= LoopCount also guarantees that distance >= VF. This check is also equivalent to the Strong SIV Test. Reviewers: mkuper, anemet, sanjoy Differential Revision: https://reviews.llvm.org/D28044 llvm-svn: 294892
*	[X86] Add costs for non-AVX512 single-source permutation integer shuffles	Michael Kuperstein	2017-02-02	1	-20/+20
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D29416 llvm-svn: 293932