bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[BasicAA] Make BasicAA a cfg pass.	Alina Sbirlea	2020-06-23	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Part of the changes in D44564 made BasicAA not CFG only due to it using PhiAnalysisValues which may have values invalidated. Subsequent patches (rL340613) appear to have addressed this limitation. BasicAA should not be invalidated by non-CFG-altering passes. A concrete example is MemCpyOpt which preserves CFG, but we are testing it invalidates BasicAA. llvm-dev RFC: https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM Reviewers: john.brawn, sebpop, hfinkel, brzycki Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74353 (cherry picked from commit 0cecafd647ccd9d0acc5968d4d6e80c1cbdee275)
*	[SCEV] accurate range for addrecexpr with nuw flag	Zheng Chen	2020-01-12	1	-2/+2
\| \| \| \| \| \| \| \| \|	If an addrecexpr has the nuw flag, its value should never be less than its start value, and the start value is not required to be a SCEVConstant. Reviewed By: nikic, sanjoy Differential Revision: https://reviews.llvm.org/D71690
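The range reasoning in this commit can be illustrated with a small sketch. This is not the SCEV implementation; the function names and the use of plain Python integers in place of a constant range are assumptions made for illustration. The idea is that an unsigned add recurrence which never wraps can only grow, so the smallest value the start can take is a lower bound for the whole recurrence, even when the start is not a constant.

```python
def addrec_nuw_unsigned_range(start_min, bit_width):
    """Inclusive unsigned range of {start,+,step}<nuw> (illustrative only).

    start_min is the smallest unsigned value the start expression can take.
    """
    max_value = (1 << bit_width) - 1
    if not 0 <= start_min <= max_value:
        raise ValueError("start_min does not fit in bit_width bits")
    # Without unsigned wrap the recurrence never drops below its start.
    return (start_min, max_value)


def simulate_nuw(start, step, bit_width, iterations):
    """Enumerate values of the recurrence, stopping before an unsigned wrap."""
    max_value = (1 << bit_width) - 1
    values = []
    value = start
    for _ in range(iterations):
        values.append(value)
        if value + step > max_value:
            break  # the next add would wrap, which nuw rules out
        value += step
    return values
```

Every value produced by `simulate_nuw` falls inside the range returned by `addrec_nuw_unsigned_range` for the same start.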
*	[SCEV] more accurate range for addrecexpr with nsw flag.	Zheng Chen	2020-01-11	1	-4/+4
\| \| \| \| \| \|	Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D72436
*	[SCEV] [NFC] add more test cases for range of addrecexpr with nsw flag	Zheng Chen	2020-01-10	1	-4/+65
\|
*	[SCEV] [NFC] add testcase for constant range for addrecexpr with nsw flag	Zheng Chen	2020-01-09	1	-0/+19
\|
*	[CostModel][X86] Add missing scalar i64->f32 uitofp costs	Simon Pilgrim	2020-01-06	1	-11/+11
\|
*	Migrate function attribute "no-frame-pointer-elim"="false" to ↵	Fangrui Song	2019-12-24	5	-5/+5
\| \| \| \|	"frame-pointer"="none" as cleanups after D56351
*	Migrate function attribute "no-frame-pointer-elim-non-leaf" to ↵	Fangrui Song	2019-12-24	1	-2/+2
\| \| \| \|	"frame-pointer"="non-leaf" as cleanups after D56351
*	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" ↵	Fangrui Song	2019-12-24	7	-11/+11
\| \| \| \|	as cleanups after D56351
*	[SCEV] add testcase for get accurate range for addrecexpr with nuw flag	czhengsz	2019-12-22	1	-0/+20
\|
*	[DDG] Data Dependence Graph - Ordinals	Bardia Mahjour	2019-12-19	3	-226/+227
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch associates ordinal numbers with the DDG Nodes, allowing the builder to order nodes within a pi-block in program order. The algorithm works by simply assuming the order in which the BBList is fed into the builder. The builder already relies on the blocks being in program order so that it can compute the dependencies correctly. Similarly, the order of instructions in their parent basic blocks determines their program order. Authored By: bmahjour Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert Reviewed By: Meinersbur Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack Tags: #llvm Differential Revision: https://reviews.llvm.org/D70986
*	[SCEV] NFC - add testcase for get accurate range for AddExpr	czhengsz	2019-12-19	1	-0/+21
\|
*	[AMDGPU] Implemented fma cost analysis	Stanislav Mekhanoshin	2019-12-18	1	-0/+120
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D71676
*	[AMDGPU] Fixed cost model for packed 16 bit ops	Stanislav Mekhanoshin	2019-12-17	7	-102/+258
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D71622
*	[BasicAA] Use GEP as context for computeKnownBits in aliasGEP.	Florian Hahn	2019-12-12	1	-0/+116
\| \| \| \| \| \| \| \| \| \| \| \| \|	In order to use assumptions, computeKnownBits needs a context instruction. We can use the GEP, if it is an instruction. We already pass the assumption cache, but it cannot be used without a context instruction. Reviewers: anemet, asbirlea, hfinkel, spatel Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D71264
*	[ValueTracking] Pointer is known nonnull after load/store	Danila Kutenin	2019-12-11	1	-8/+3
\| \| \| \| \| \| \| \| \| \| \|	If the pointer was loaded/stored before the null check, the check is redundant and can be removed. For now the optimizers do not remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG. The patch allows more nonnull constraints to be used. Also, it found one more optimization in a PowerPC test. This is my first llvm review, so I am open to any comments. Differential Revision: https://reviews.llvm.org/D71177
*	[ValueTracking] Add tests for non-null check after load/store; NFC	Danila Kutenin	2019-12-11	1	-17/+110
\| \| \| \|	Tests for D71177.
*	[DA] Improve dump to show source and sink of the dependence	Bardia Mahjour	2019-12-11	1	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The current da printer shows the dependence without indicating which instructions are being considered as the src vs dst. It also silently ignores call instructions, despite the fact that they create confused dependence edges to other memory instructions. This patch addresses these two issues plus a couple of minor non-functional improvements. Authored By: bmahjour Reviewer: dmgreen, fhahn, philip.pfaffe, chandlerc Reviewed By: dmgreen, fhahn Tags: #llvm Differential Revision: https://reviews.llvm.org/D71088
*	[ConstantFold][SVE] Fix constant folding for shufflevector.	Eli Friedman	2019-12-09	1	-0/+11
\| \| \| \| \| \| \|	Don't try to fold away shuffles which can't be folded. Fix creation of shufflevector constant expressions. Differential Revision: https://reviews.llvm.org/D71147
*	[ARM] Teach the Arm cost model that a Shift can be folded into other ↵	David Green	2019-12-09	1	-18/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instruction that the shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966
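The single-user rule described in this commit can be sketched as follows. This is a toy model, not the ARM cost model code: the function name, the representation of users as a list of opcode strings, and the base cost of 1 are all assumptions for illustration. Only the list of IR opcodes comes from the commit message.

```python
# IR opcodes named in the commit message as able to absorb a shift operand.
FOLDABLE_USER_OPCODES = {"add", "sub", "and", "or", "xor", "icmp"}


def shift_cost(user_opcodes, base_cost=1):
    """Return an illustrative cost for a shl given the opcodes of its users.

    The shift is treated as free only when it has exactly one user and that
    user is an instruction the shift can be folded into.
    """
    if len(user_opcodes) == 1 and user_opcodes[0] in FOLDABLE_USER_OPCODES:
        return 0
    return base_cost
```

For the example in the message, the `shl` has a single `and` user, so `shift_cost(["and"])` is 0, while a shift feeding two instructions keeps its base cost.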
*	[ARM] Additional tests and minor formatting. NFC	David Green	2019-12-09	1	-0/+96
\| \| \| \| \| \|	This adds some extra cost model tests for shifts, and does some minor adjustments to some Neon code to make it clear as to what it applies to. Both NFC.
*	[x86] add cost model special-case for insert/extract from element 0	Sanjay Patel	2019-12-06	4	-62/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a follow-up to D70607 where we made any extract element on SLM more costly than default. But that is pessimistic for extract from element 0 because that corresponds to x86 movd/movq instructions. These generally have >1 cycle latency, but they are probably implemented as single uop instructions. Note that no vectorization tests are affected by this change. Also, no targets besides SLM are affected because those are falling through to the default cost of 1 anyway. But this will become visible/important if we add more specializations via cost tables. Differential Revision: https://reviews.llvm.org/D71023
*	[ConstantFold][SVE] Skip scalable vectors in ↵	Huihui Zhang	2019-12-05	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ConstantFoldInsertElementInstruction. Summary: Should not constant fold insertelement instruction for scalable vector type. Reviewers: huntergr, sdesmalen, spatel, levedev.ri, apazos, efriedma, willlovett Reviewed By: efriedma, spatel Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70985
*	[DDG] Data Dependence Graph - Topological Sort (Memory Leak Fix)	Bardia Mahjour	2019-12-03	4	-349/+356
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This fixes the memory leak in bec37c3fc766a7b97f8c52c181c325fd47b75259 and re-delivers the reverted patch. In this patch the DDG DAG is sorted topologically to put the nodes in the graph in the order that would satisfy all dependencies. This helps transformations that would like to generate code based on the DDG. Since the DDG is a DAG a reverse-post-order traversal would give us the topological ordering. This patch also sorts the basic blocks passed to the builder based on program order to ensure that the dependencies are computed in the correct direction. Authored By: bmahjour Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert Reviewed By: Meinersbur Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack Tags: #llvm Differential Revision: https://reviews.llvm.org/D70609
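The statement that a reverse post-order traversal of a DAG yields a topological ordering can be sketched as below. This is a generic illustration, not the DDG builder code; the dictionary-of-successors graph representation and the function name are assumptions.

```python
def topological_order(succs, roots):
    """Return the nodes reachable from roots in reverse post-order.

    succs maps each node to a list of its successors and must describe a DAG.
    """
    visited = set()
    post_order = []

    def visit(node):
        visited.add(node)
        for succ in succs.get(node, []):
            if succ not in visited:
                visit(succ)
        # A node is appended only after everything it points to.
        post_order.append(node)

    for root in roots:
        if root not in visited:
            visit(root)
    return post_order[::-1]
```

In the returned list every node appears before all of its successors, which is the ordering a code generator walking the graph would need.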
*	Reland "b19ec1eb3d0c [BPI] Improve unreachable/ColdCall heuristics to handle ↵	Taewook Oh	2019-12-02	2	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \| \|	loops." Summary: b19ec1eb3d0c has been reverted because of the test failures with PowerPC targets. This patch addresses the issues from the previous commit. Test Plan: ninja check-all. Confirmed that CodeGen/PowerPC/pr36292.ll and CodeGen/PowerPC/sms-cpy-1.ll pass Subscribers: llvm-commits
*	Autogenerate test/Analysis/ValueTracking/non-negative-phi-bits.ll test	Roman Lebedev	2019-12-02	1	-1/+1
\| \| \| \|	Forgot to stage this change into 0f22e783a038b6983f0fe161eef6cf2add3a4156 commit.
*	[PowerPC] Separate Features that are known to be Power9 specific from Future CPU	Stefan Pintilie	2019-11-27	1	-0/+16
\| \| \| \| \| \| \| \|	The Power 9 CPU has some features that are unlikely to be passed on to future versions of the CPU. This patch separates this out so that future CPU does not inherit them. Differential Revision: https://reviews.llvm.org/D70466
*	Revert b19ec1eb3d0c	taewookoh	2019-11-27	2	-44/+0
\| \| \| \| \| \|	Summary: This reverts commit b19ec1eb3d0c as it fails powerpc tests Subscribers: llvm-commits
*	[x86] make SLM extract vector element more expensive than default	Sanjay Patel	2019-11-27	4	-255/+1197
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607
*	[BPI] Improve unreachable/ColdCall heuristics to handle loops.	Taewook Oh	2019-11-27	2	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: While updatePostDominatedByUnreachable attempts to find basic blocks that are post-dominated by unreachable blocks, it currently cannot handle loops precisely, because it doesn't use the actual post dominator tree analysis but relies on heuristics of visiting basic blocks in post-order. More precisely, when the entire loop is post-dominated by the unreachable block, the current algorithm fails to detect the entire loop as post-dominated by the unreachable block: when the algorithm reaches the loop latch, it cannot tell that all its successors (including the loop header) will "eventually" be post-dominated by the unreachable block, because the algorithm hasn't visited the loop header yet. This makes BPI for the loop latch assume that loop backedges are taken with 100% probability. Because of this, block frequency info sometimes marks virtually dead loops (which are post-dominated by unreachable blocks) super hot, because a 100% backedge-taken probability makes the loop iteration count the max value. updatePostDominatedByColdCall has the exact same problem as well. To address this problem, this patch makes PostDominatedByUnreachable/PostDominatedByColdCall be computed with the actual post-dominator tree. Reviewers: skatkov, chandlerc, manmanren Reviewed By: skatkov Subscribers: manmanren, vsk, apilipenko, Carrot, qcolombet, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70104
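The loop case described in this commit can be illustrated with a small sketch. It does not use a post-dominator tree as the patch does; as a simplified stand-in it uses backward reachability, treating a block as post-dominated by unreachable when it can reach an unreachable-terminated block but cannot reach any normally exiting block. The CFG representation and all names here are assumptions for illustration.

```python
def reachable_backward(succs, targets):
    """Return all blocks from which some block in targets can be reached."""
    preds = {block: [] for block in succs}
    for block, block_succs in succs.items():
        for succ in block_succs:
            preds[succ].append(block)
    seen = set(targets)
    worklist = list(targets)
    while worklist:
        block = worklist.pop()
        for pred in preds[block]:
            if pred not in seen:
                seen.add(pred)
                worklist.append(pred)
    return seen


def post_dominated_by_unreachable(succs, unreachable_blocks):
    """Blocks whose every terminating path ends in an unreachable block.

    succs maps every block (including exits) to its list of successors.
    """
    unreachable_blocks = set(unreachable_blocks)
    exits = {block for block, block_succs in succs.items() if not block_succs}
    normal_exits = exits - unreachable_blocks
    can_exit_normally = reachable_backward(succs, normal_exits)
    can_hit_unreachable = reachable_backward(succs, unreachable_blocks)
    return can_hit_unreachable - can_exit_normally
```

Because the result is computed over the whole graph rather than in a single post-order visit, the loop header and latch of a loop that can only leave through an unreachable block are both classified, regardless of visiting order.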
*	[ConstFolding] move tests for copysign; NFC	Sanjay Patel	2019-11-26	1	-0/+53
\| \| \| \|	InstCombine doesn't have any transforms for copysign currently.
*	Revert "[DDG] Data Dependence Graph - Topological Sort"	Bardia Mahjour	2019-11-25	4	-356/+349
\| \| \| \| \| \|	Revert for now to look into the failures on x86 This reverts commit bec37c3fc766a7b97f8c52c181c325fd47b75259.
*	[DDG] Data Dependence Graph - Topological Sort	bmahjour	2019-11-25	4	-349/+356
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In this patch the DDG DAG is sorted topologically to put the nodes in the graph in the order that would satisfy all dependencies. This helps transformations that would like to generate code based on the DDG. Since the DDG is a DAG a reverse-post-order traversal would give us the topological ordering. This patch also sorts the basic blocks passed to the builder based on program order to ensure that the dependencies are computed in the correct direction. Authored By: bmahjour Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert Reviewed By: Meinersbur Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack Tags: #llvm Differential Revision: https://reviews.llvm.org/D70609
*	[Tests] Autogenerate a bunch of SCEV trip count tests for readability. Will ↵	Philip Reames	2019-11-21	9	-260/+443
\| \| \| \|	likely merge some of these files soon.
*	[SCEV] Add a mode to skip classification when printing analysis	Philip Reames	2019-11-21	1	-113/+1
\| \| \| \|	For the various trip-count tests, the classification isn't useful and makes the auto-generated tests super verbose. By skipping it, we make the auto-gen tests closer to the manually written ones. Up next: auto-genning a bunch of the existing tests.
*	[SCEV] Be robust against IR generated by simple-loop-unswitch	Philip Reames	2019-11-21	1	-48/+64
\| \| \| \| \| \|	Simple loop unswitch likes to leave around unsimplified and/or/xors. SCEV today bails out on these idioms which is unfortunate in general, and specifically for the unswitch interaction. Differential Revision: https://reviews.llvm.org/D70459
*	[MemorySSA] Moving at the end often means before terminator.	Alina Sbirlea	2019-11-20	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \|	Moving accesses in MemorySSA at InsertionPlace::End, when an instruction is moved into a block, almost always means insert at the end of the block, but before the block terminator. This matters when the block terminator is a MemoryAccess itself (an invoke), and the insertion must be done before the terminator for the update to be correct. Insert an additional position: InsertionPlace:BeforeTerminator and update current usages where this applies. Resolves PR44027.
*	[MemorySSA] Update analysis when the terminator is a memory instruction.	Alina Sbirlea	2019-11-20	1	-0/+63
\| \| \| \| \|	Update MemorySSA when moving the terminator instruction, as that may be a memory touching instruction. Resolves PR44029.
*	Precommit test showing opportunity when computing exit tests of unsimplified IR	Philip Reames	2019-11-19	1	-0/+461
\| \| \| \|	If we partially unswitch a loop, we leave around the (and i1 X, true) or (or i1 X, false) forms. At the moment, this inhibits SCEV's ability to compute trip counts; patch forthcoming.
*	AMDGPU: Split test functions to avoid dependency on subtarget	Matt Arsenault	2019-11-19	1	-57/+155
\| \| \| \| \|	Prepare this test for moving the denormal setting out of the subtarget features.
*	[ConstantFold] Handle identity folds at top of ConstantFoldBinaryInst	Florian Hahn	2019-11-17	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we miss folds with undef and identity values for binary ops that do not fold to undef in general. We can generalize the identity simplifications and do them before checking for undef in particular. Alive checks: * OR - https://rise4fun.com/Alive/8OsK * AND - https://rise4fun.com/Alive/e3tE This will also allow us to remove some now redundant cases throughout the function, but I would like to do this as follow-up. That should make tracking down potential issues easier. Reviewers: spatel, RKSimon, lebedev.ri Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D70169
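The ordering issue in this commit (checking identity operands before the generic undef handling) can be sketched for `or` and `and` as below. This is a toy model over Python integers, not ConstantFoldBinaryInstruction. The `UNDEF` sentinel, the function name, the fixed bit width, and the generic undef results used here (0 for `and`, all-ones for `or`, undef when both operands are undef) are assumptions for illustration.

```python
UNDEF = object()  # hypothetical stand-in for an undef constant


def fold_binop(op, lhs, rhs, bits=8):
    """Fold 'and'/'or' on constants, trying identity folds before undef."""
    all_ones = (1 << bits) - 1
    identity = {"or": 0, "and": all_ones}[op]

    # Identity folds first: X op identity == X, even when X is undef.
    if rhs is not UNDEF and rhs == identity:
        return lhs
    if lhs is not UNDEF and lhs == identity:
        return rhs

    # Generic undef handling, only reached when no identity operand exists.
    if lhs is UNDEF and rhs is UNDEF:
        return UNDEF
    if lhs is UNDEF or rhs is UNDEF:
        return 0 if op == "and" else all_ones

    return (lhs & rhs) if op == "and" else (lhs | rhs)
```

With this order, `fold_binop("or", UNDEF, 0)` stays undef instead of being forced to all-ones by the generic rule.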
*	[ConstantFold] Add some tests for binops with constants and undefs.	Florian Hahn	2019-11-17	1	-0/+50
\| \| \| \|	Precommit tests for D70169.
*	[LoopCacheAnalysis]: Fix assertion failure during cost computation	Rachel Craik	2019-11-15	1	-0/+35
\| \| \| \| \| \| \| \|	Ensure the stride and trip count have the same type before multiplying them during reference cost calculation Reviewed By: jdoefert Differential Revision: https://reviews.llvm.org/D70192
*	Remove commented out CHECK-NEXT to try and appease ↵	Simon Pilgrim	2019-11-13	1	-1/+0
\| \| \| \|	llvm-clang-x86_64-expensive-checks-win buildbot
*	[X86] Remove setOperationAction for FP_TO_SINT v8i16.	Craig Topper	2019-11-12	1	-12/+12
\| \| \| \| \| \| \| \|	This is no longer needed after widening legalization as we custom legalize v8i8 ourselves. Added entries to the cost model, but bumped the cost slightly to account for the truncate shuffle that wasn't costed before.
*	[GlobalsAA] Reenable test.	Alina Sbirlea	2019-11-12	1	-3/+1
\|
*	Temporarily disable test.	Alina Sbirlea	2019-11-12	1	-0/+2
\|
*	[GlobalsAA] Restrict ModRef result if any internal method has its address taken.	Alina Sbirlea	2019-11-12	3	-0/+147
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If there are any internal methods whose address was taken, conclude there is nothing known in relation of any other internal method and a global. Reviewers: nlopes, sanjoy.google Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69690
*	[DDG] Data Dependence Graph - Pi Block	bmahjour	2019-11-08	4	-301/+342
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds Pi Blocks to the DDG. A pi-block represents a group of DDG nodes that are part of a strongly-connected component of the graph. Replacing all the SCCs with pi-blocks results in an acyclic representation of the DDG. For example if we have: {a -> b}, {b -> c, d}, {c -> a} the cycle a -> b -> c -> a is abstracted into a pi-block "p" as follows: {p -> d} with "p" containing: {a -> b}, {b -> c}, {c -> a} In this implementation the edges between nodes that are part of the pi-block are preserved. The crossing edges (edges where one end of the edge is in the set of nodes belonging to an SCC and the other end is outside that set) are replaced with corresponding edges to/from the pi-block node instead. Authored By: bmahjour Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert Reviewed By: Meinersbur Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack Tag: #llvm Differential Revision: https://reviews.llvm.org/D68827
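The pi-block construction described in this commit can be sketched on the example graph from the message. This is a generic illustration using Tarjan's SCC algorithm, not the DDG builder; the graph representation, function names, and the `pi0`-style naming of the new nodes are assumptions.

```python
def strongly_connected_components(succs):
    """Tarjan's algorithm; succs maps every node to a list of successors."""
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for succ in succs[node]:
            if succ not in index:
                visit(succ)
                low[node] = min(low[node], low[succ])
            elif succ in on_stack:
                low[node] = min(low[node], index[succ])
        if low[node] == index[node]:
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(frozenset(component))

    for node in succs:
        if node not in index:
            visit(node)
    return components


def build_pi_blocks(succs):
    """Collapse every multi-node SCC into a pi-block node.

    Returns (outer_edges, members): the acyclic outer graph, and for each
    pi-block the set of original nodes it contains. Edges between members
    of the same pi-block stay inside it and are not part of outer_edges.
    """
    owner = {node: node for node in succs}
    members = {}
    cyclic = [c for c in strongly_connected_components(succs) if len(c) > 1]
    for number, component in enumerate(cyclic):
        name = f"pi{number}"
        members[name] = component
        for node in component:
            owner[node] = name

    outer_edges = {}
    for node, node_succs in succs.items():
        source = owner[node]
        outer_edges.setdefault(source, set())
        for succ in node_succs:
            if owner[succ] != source:
                # Crossing edge: redirect it to/from the pi-block node.
                outer_edges[source].add(owner[succ])
    return outer_edges, members
```

For the graph `{a -> b}, {b -> c, d}, {c -> a}`, the cycle collapses into one pi-block whose only outgoing edge leads to `d`.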
*	[CostModel] Fixed isExtractSubvectorMask for undef index off end	Tim Renouf	2019-11-08	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ShuffleVectorInst::isExtractSubvectorMask, introduced in [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368) erroneously thought that %340 = shufflevector <4 x float> %339, <4 x float> undef, <3 x i32> <i32 2, i32 3, i32 undef> is a subvector extract, even though it goes off the end of the parent vector with the undef index. That then caused an assert in BasicTTIImplBase::getExtractSubvectorOverhead. This commit fixes that, by not considering the above a subvector extract. Differential Revision: https://reviews.llvm.org/D70005 Change-Id: I87b8b00b24bef19ffc9a1b82ef4eca3b8a246eaf
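The bounds check this commit describes can be sketched as follows. This is a simplified model, not ShuffleVectorInst::isExtractSubvectorMask: the function name, the use of `None` for an undef mask element, and the return convention are assumptions for illustration. The point is that undef lanes still occupy positions, so the whole mask, not just its defined elements, must fit inside the source vector.

```python
def extract_subvector_index(mask, num_src_elts):
    """Return the start index if mask is a subvector extract, else None.

    mask holds source element indices, with None standing for undef.
    """
    if len(mask) >= num_src_elts:
        return None  # an extract must be narrower than its source
    base = None
    for position, element in enumerate(mask):
        if element is None:
            continue  # undef lanes match anything
        if element >= num_src_elts:
            return None  # selects from the second shuffle operand
        offset = element - position
        if base is not None and offset != base:
            return None  # defined elements are not consecutive
        base = offset
    if base is None or base < 0:
        return None
    # The fix: trailing undef lanes must not run off the end of the source.
    if base + len(mask) > num_src_elts:
        return None
    return base
```

The mask `<2, 3, undef>` on a 4-element source starts at index 2 and would need elements 2 through 4, so it is rejected.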