bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Add support for TFE/LWE in image intrinsics	David Stuttard	2018-11-29	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts:
	1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values.
	2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done).
	3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used.
	4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value).
	5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized, which is required for sparse texture support.
	6. Disable the inst_combine optimization in the presence of tfe/lwe (re-enabling it and handling these cases correctly is a later TODO).
	There's an additional fix now to avoid a dmask=0. For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one.
	The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows:
	%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
	%v.vec = extractvalue {<4 x float>, i32} %v, 0
	%v.err = extractvalue {<4 x float>, i32} %v, 1
	Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871
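The dmask fix described in this commit can be illustrated with a small standalone sketch. The helper names `adjustDMask` and `numResultVGPRs` are hypothetical and are not taken from SIISelLowering.cpp; the sketch also assumes 32-bit result channels (no packed d16 data), which the commit message does not discuss.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM's code: with TFE/LWE enabled, an empty dmask
// is forced to 1 so that one data VGPR precedes the status VGPR.
inline std::uint32_t adjustDMask(std::uint32_t DMask, bool TFEOrLWE) {
  assert(DMask <= 0xF && "dmask selects at most four channels");
  return (TFEOrLWE && DMask == 0) ? 1u : DMask;
}

// Result registers: one per enabled dmask channel, plus one for the TFE/LWE
// status value. Assumes 32-bit channels (illustrative simplification).
inline unsigned numResultVGPRs(std::uint32_t DMask, bool TFEOrLWE) {
  const std::uint32_t Mask = adjustDMask(DMask, TFEOrLWE);
  const unsigned Channels = static_cast<unsigned>(std::bitset<4>(Mask).count());
  return Channels + (TFEOrLWE ? 1u : 0u);
}
```

With these definitions, a tfe-only use (`dmask == 0`) yields two result registers, matching the "two vgpr result with tfe in the second one" described above, while `dmask == 15` with tfe yields five.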
*	Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply ↵	Martin Storsjo	2018-11-29	2	-1362/+9
\| \| \| \| \| \| \| \| \| \| \|	r347190 "Make LICM able to hoist phis" with fix" This reverts commits r347776 and r347778. The first one, r347776, caused significant compile time regressions for certain input files, see PR39836 for details. llvm-svn: 347867
*	[CVP] auto-generate complete test checks; NFC	Sanjay Patel	2018-11-29	5	-289/+1009
\| \| \| \|	llvm-svn: 347866
*	[NFC] Add two XFAIL tests from PR39783	Max Kazantsev	2018-11-29	2	-0/+279
\| \| \| \|	llvm-svn: 347845
*	[LoopStrengthReduce] ComplexityLimit as an option	Sam Parker	2018-11-29	2	-0/+120
\| \| \| \| \| \| \| \|	Convert ComplexityLimit into a command line value. Differential Revision: https://reviews.llvm.org/D54899 llvm-svn: 347843
*	[Inliner] Modify the merging of min-legal-vector-width attribute to better ↵	Craig Topper	2018-11-29	1	-1/+16
\| \| \| \| \| \| \| \| \| \|	handle when the caller or callee don't have the attribute. Lack of an attribute means that the function hasn't been checked for what vector width it requires. So if the caller or the callee doesn't have the attribute we should make sure the combined function after inlining does not have the attribute. If the caller already doesn't have the attribute we can just avoid adding it. Otherwise if the callee doesn't have the attribute just remove the caller's attribute. llvm-svn: 347841
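The merging rule in this commit can be sketched as a pure function over optional widths (C++17). The names `Width` and `mergeMinLegalVectorWidth` are illustrative only. Taking the larger width when both functions carry the attribute is an assumption about the pre-existing behavior; the commit message only spells out the cases where an attribute is missing.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// std::nullopt models "attribute absent", i.e. the function has not been
// checked for the vector width it requires.
using Width = std::optional<unsigned>;

// Illustrative model of the merge performed when Callee is inlined into Caller.
inline Width mergeMinLegalVectorWidth(Width Caller, Width Callee) {
  if (!Caller)
    return std::nullopt;              // caller has no attribute: do not add one
  if (!Callee)
    return std::nullopt;              // callee unknown: drop the caller's attribute
  return std::max(*Caller, *Callee);  // assumed: keep the larger requirement
}

// The three cases from the commit message, plus the both-present case.
inline void checkMergeExamples() {
  assert(mergeMinLegalVectorWidth(256u, 512u) == Width(512u));
  assert(!mergeMinLegalVectorWidth(std::nullopt, 512u).has_value());
  assert(!mergeMinLegalVectorWidth(256u, std::nullopt).has_value());
}
```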
*	[Inliner] Add test for merging of min-legal-vector-width function attribute.	Craig Topper	2018-11-29	1	-0/+29
\| \| \| \| \| \|	This should have been added in r337844, but apparently I failed to 'git add' the file. llvm-svn: 347840
*	[DebugInfo] IR/Bitcode changes for DISubprogram flags.	Paul Robinson	2018-11-28	1	-1/+1
\| \| \| \| \| \| \| \| \|	Packing the flags into one bitcode word will save effort in adding new flags in the future. Differential Revision: https://reviews.llvm.org/D54755 llvm-svn: 347806
*	[DebugInfo] Give inlinable calls DILocs (PR39807)	Jeremy Morse	2018-11-28	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In PR39807 we incorrectly handle circumstances where calls are common'd from conditional blocks into the parent BB. Calls that can be inlined must always have DebugLocs, however we strip them during commoning, which the IR verifier asserts on. Fix this by using applyMergedLocation: it will perform the same DebugLoc stripping of conditional Locs, but will also generate an unknown location DebugLoc that satisfies the requirement for inlinable calls to always have locations. Some of the prior logic for selecting a DebugLoc is now likely redundant; I'll generate a follow-up to remove it (involves editing more regression tests). Differential Revision: https://reviews.llvm.org/D54997 llvm-svn: 347782
*	[LICM] Enable control flow hoisting by default	John Brawn	2018-11-28	2	-11/+13
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D54949 llvm-svn: 347778
*	[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix	John Brawn	2018-11-28	1	-0/+1351
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit caused failures because it failed to correctly handle cases where we hoist a phi, then hoist a use of that phi, then have to rehoist that use. We need to make sure that we rehoist the use to _after_ the hoisted phi, which we do by always rehoisting to the immediate dominator instead of just rehoisting everything to the original preheader. An option is also added to control whether control flow is hoisted, which is off in this commit but will be turned on in a subsequent commit. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347776
*	[InstCombine] Combine saturating add/sub with constant operands	Nikita Popov	2018-11-28	1	-52/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Combine sat(sat(X + C1) + C2) -> sat(X + (C1+C2)) and sat(sat(X - C1) - C2) -> sat(X - (C1+C2)) if the sign of C1 and C2 matches. In the unsigned case we can compute C1+C2 with saturating arithmetic, and InstSimplify will reduce this just to the saturation value. For the signed case, we cannot perform the simplification if the result of the addition overflows. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347773
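The conditions on this fold can be checked exhaustively at a small bit width. The following sketch models the saturating add intrinsics on 8-bit integers (helper names are illustrative, not LLVM APIs) and covers only the add case; the sub case is analogous.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit models of llvm.uadd.sat and llvm.sadd.sat.
constexpr std::uint8_t uaddSat(std::uint8_t A, std::uint8_t B) {
  const unsigned S = unsigned(A) + unsigned(B);
  return static_cast<std::uint8_t>(S > 255u ? 255u : S);
}
constexpr std::int8_t saddSat(std::int8_t A, std::int8_t B) {
  const int S = int(A) + int(B);
  return static_cast<std::int8_t>(S > 127 ? 127 : (S < -128 ? -128 : S));
}

// Unsigned: C1+C2 is itself computed with saturating arithmetic.
inline bool unsignedFoldHolds() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C1 = 0; C1 < 256; ++C1)
      for (unsigned C2 = 0; C2 < 256; ++C2) {
        const auto x = static_cast<std::uint8_t>(X);
        const auto c1 = static_cast<std::uint8_t>(C1);
        const auto c2 = static_cast<std::uint8_t>(C2);
        if (uaddSat(uaddSat(x, c1), c2) != uaddSat(x, uaddSat(c1, c2)))
          return false;
      }
  return true;
}

// Signed: fold only when C1 and C2 have the same sign and C1+C2 fits in i8.
inline bool signedFoldHolds() {
  for (int X = -128; X < 128; ++X)
    for (int C1 = -128; C1 < 128; ++C1)
      for (int C2 = -128; C2 < 128; ++C2) {
        const int Sum = C1 + C2;
        if ((C1 < 0) != (C2 < 0) || Sum < -128 || Sum > 127)
          continue;  // the fold is not applied in these cases
        const auto x = static_cast<std::int8_t>(X);
        const auto c1 = static_cast<std::int8_t>(C1);
        const auto c2 = static_cast<std::int8_t>(C2);
        if (saddSat(saddSat(x, c1), c2) != saddSat(x, static_cast<std::int8_t>(Sum)))
          return false;
      }
  return true;
}

// Mixed signs break the fold: sat(sat(127 + 1) + -1) is 126, sat(127 + 0) is 127.
inline void checkMixedSignCounterexample() {
  assert(saddSat(saddSat(127, 1), -1) == 126);
  assert(saddSat(127, 0) == 127);
}
```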
*	[InstCombine] Canonicalize ssub.sat to sadd.sat	Nikita Popov	2018-11-28	1	-28/+28
\| \| \| \| \| \| \| \| \|	Canonicalize ssub.sat(X, C) to sadd.sat(X, -C) if C is constant and not signed minimum. This will help further optimizations to apply. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347772
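Why the signed minimum is excluded can be seen from an exhaustive 8-bit check. This is a sketch with illustrative helper names, not the InstCombine implementation: a saturating signed subtraction of C equals a saturating signed addition of -C for every C whose negation is representable.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int8_t clampToI8(int V) {
  return static_cast<std::int8_t>(V > 127 ? 127 : (V < -128 ? -128 : V));
}
// 8-bit models of llvm.sadd.sat and llvm.ssub.sat.
constexpr std::int8_t saddSat(std::int8_t A, std::int8_t B) {
  return clampToI8(int(A) + int(B));
}
constexpr std::int8_t ssubSat(std::int8_t A, std::int8_t B) {
  return clampToI8(int(A) - int(B));
}

// True if ssub.sat(X, C) == sadd.sat(X, -C) for every X and every C other
// than the signed minimum (-128), whose negation does not fit in i8.
inline bool canonicalizationHolds() {
  for (int X = -128; X < 128; ++X)
    for (int C = -127; C < 128; ++C) {
      const auto x = static_cast<std::int8_t>(X);
      const auto c = static_cast<std::int8_t>(C);
      const auto negC = static_cast<std::int8_t>(-C);
      if (ssubSat(x, c) != saddSat(x, negC))
        return false;
    }
  return true;
}

// For C == -128, negation wraps back to -128, so the two forms disagree.
inline void checkSignedMinimumIsExcluded() {
  assert(ssubSat(0, -128) == 127);
  assert(saddSat(0, -128) == -128);
}
```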
*	[ValueTracking] Determine always-overflow condition for unsigned sub	Nikita Popov	2018-11-28	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Always-overflow was already determined for unsigned addition, but not subtraction. This patch establishes parity. This allows us to perform some additional simplifications for signed saturating subtractions. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347771
*	[InstCombine] Use known overflow information for saturating add/sub	Nikita Popov	2018-11-28	1	-18/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	If ValueTracking can determine that the add/sub can never overflow, replace it with the corresponding nuw/nsw add/sub. Additionally, for the unsigned case, if ValueTracking determines that the add/sub always overflows, replace the result with the saturation value. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347770
*	[InstCombine] Canonicalize const arg for saturating adds	Nikita Popov	2018-11-28	1	-6/+6
\| \| \| \| \| \| \| \| \|	If a saturating add intrinsic has one constant argument, make sure it is on the RHS. This will simplify further transformations. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347769
*	[SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized.	Alexey Bataev	2018-11-28	1	-0/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If the original reduction root instruction was vectorized, it might be removed from the tree. It means that the insertion point may become invalidated and the whole vectorization of the reduction leads to the incorrect output result. The ReductionRoot instruction must be marked as externally used so it could not be removed. Otherwise it might cause inconsistency with the cost model and we may end up with too optimistic optimization. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54955 llvm-svn: 347759
*	[InstCombine] Add tests for saturating add/sub; NFC	Nikita Popov	2018-11-27	1	-0/+669
\| \| \| \| \| \|	These are baseline tests for D54534. llvm-svn: 347700
*	[PartialInliner] Make PHIs free in cost computation.	Florian Hahn	2018-11-27	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	InlineCost also treats them as free and the current implementation can cause assertion failures if PHI nodes are moved outside the region from entry BBs to the region. It also updates the code to use the instructionsWithoutDebug iterator. Reviewers: davidxl, davide, vsk, graham-yiu-huawei Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D54748 llvm-svn: 347683
*	Add missing REQUIRES: asserts	Max Kazantsev	2018-11-27	1	-0/+1
\| \| \| \|	llvm-svn: 347644
*	[LoopSimplifyCFG] Fix corner case with duplicating successors	Max Kazantsev	2018-11-27	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \|	Fixes a bug where Phi inputs were not updated for the only live successor when that successor appears more than once in the block's list of successors. Thanks @uabelho for finding this. Differential Revision: https://reviews.llvm.org/D54849 Reviewed By: anna llvm-svn: 347640
*	[InstCombine] add tests for rotate/bswap equality; NFC	Sanjay Patel	2018-11-27	1	-0/+23
\| \| \| \|	llvm-svn: 347618
*	[ICP] Remove incompatible attributes at indirect-call promoted callsites.	Xin Tong	2018-11-26	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Remove incompatible attributes at indirect-call promoted callsites; not removing them results in at least an IR verification error. Reviewers: davidxl, xur, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54913 llvm-svn: 347605
*	Revert "[TTI] Reduction costs only need to include a single extract element ↵	Fedor Sergeev	2018-11-26	3	-133/+282
\| \| \| \| \| \| \| \| \| \|	cost" This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body. llvm-svn: 347541
*	[IPSCCP] Use input operand instead of OriginalOp for ssa_copy.	Florian Hahn	2018-11-25	1	-0/+50
\| \| \| \| \| \| \| \| \| \| \|	OriginalOp of a Predicate refers to the original IR value, before renaming. While solving in IPSCCP, we have to use the operand of the ssa_copy instead, to avoid missing updates for nested conditions on the same IR value. Fixes PR39772. llvm-svn: 347524
*	[InstCombine] Determine demanded and known bits for funnel shifts	Nikita Popov	2018-11-24	1	-19/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Support funnel shifts in InstCombine demanded bits simplification. If the shift amount is constant, we can determine both the demanded bits of the operands, as well as the known bits of the result. If one of the operands has no demanded bits, it will be replaced by undef and the funnel shift will be simplified into a simple shift due to the simplifications added in D54778. Differential Revision: https://reviews.llvm.org/D54869 llvm-svn: 347515
*	Revert unapproved commit	Joel Jones	2018-11-24	1	-1077/+0
\| \| \| \|	llvm-svn: 347511
*	[AArch64] Enable libm vectorized functions via SLEEF	Joel Jones	2018-11-24	1	-0/+1077
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This changeset is modeled after Intel's submission for SVML. It enables trigonometry functions vectorization via SLEEF: http://sleef.org/. * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF. * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF. * A comprehensive test case is included in this changeset. * In a separate changeset (for clang), a new vectorization library argument is added to -fveclib: -fveclib=SLEEF. Trigonometry functions that are vectorized by sleef: acos asin atan atanh cos cosh exp exp2 exp10 lgamma log10 log2 log sin sinh sqrt tan tanh tgamma Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D53927 llvm-svn: 347510
*	[InstCombine] Simplify funnel shift with zero/undef operand to shift	Nikita Popov	2018-11-23	1	-9/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following simplifications are implemented: * `fshl(X, 0, C) -> shl X, C%BW` * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0) * `fshl(0, X, C) -> lshr X, BW-C%BW` * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0) * `fshr(X, 0, C) -> shl X, BW-C%BW` * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0) * `fshr(0, X, C) -> lshr X, C%BW` * `fshr(undef, X, C) -> lshr X, C%BW` (assuming undef = 0) The simplification is only performed if the shift amount C is constant, because we can explicitly compute C%BW and BW-C%BW in this case. Differential Revision: https://reviews.llvm.org/D54778 llvm-svn: 347505
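The zero-operand folds listed in this commit can be checked exhaustively at BW = 8. The sketch below models the funnel shifts on 8-bit values (function names are illustrative). For the two folds that shift by BW-C%BW, it skips C%BW == 0, since a shift by the full bit width is not a valid shift amount; how the actual patch treats that case is not stated in the commit message.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned BW = 8;  // bit width used for the exhaustive check

// fshl: concatenate A:B (A in the high half), shift left by C%BW, keep the high half.
constexpr std::uint8_t fshl(std::uint8_t A, std::uint8_t B, unsigned C) {
  const unsigned Concat = (unsigned(A) << BW) | B;
  return static_cast<std::uint8_t>((Concat << (C % BW)) >> BW);
}
// fshr: concatenate A:B, shift right by C%BW, keep the low half.
constexpr std::uint8_t fshr(std::uint8_t A, std::uint8_t B, unsigned C) {
  const unsigned Concat = (unsigned(A) << BW) | B;
  return static_cast<std::uint8_t>(Concat >> (C % BW));
}

// Checks the four zero-operand folds for every 8-bit X and a range of C
// values that exercises the modulo.
inline bool zeroOperandFoldsHold() {
  for (unsigned V = 0; V < 256; ++V)
    for (unsigned C = 0; C < 4 * BW; ++C) {
      const auto X = static_cast<std::uint8_t>(V);
      const unsigned S = C % BW;
      if (fshl(X, 0, C) != static_cast<std::uint8_t>(V << S))  // shl X, C%BW
        return false;
      if (fshr(0, X, C) != static_cast<std::uint8_t>(V >> S))  // lshr X, C%BW
        return false;
      if (S == 0)
        continue;  // BW-C%BW would equal BW, which is out of range for a shift
      if (fshl(0, X, C) != static_cast<std::uint8_t>(V >> (BW - S)))  // lshr X, BW-C%BW
        return false;
      if (fshr(X, 0, C) != static_cast<std::uint8_t>(V << (BW - S)))  // shl X, BW-C%BW
        return false;
    }
  return true;
}
```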
*	[NFC] Add test that demonstrates buggy behavior on term folding of ↵	Max Kazantsev	2018-11-23	1	-0/+29
\| \| \| \| \| \|	LoopSimplifyCFG llvm-svn: 347488
*	Disable LoopSimplifyCFG terminator folding by default	Max Kazantsev	2018-11-23	2	-6/+6
\| \| \| \|	llvm-svn: 347486
*	[LoopSimplifyCFG] Don't delete LCSSA Phis	Max Kazantsev	2018-11-23	1	-0/+52
\| \| \| \| \| \| \| \| \| \| \| \|	When removing edges, we also update Phi inputs and may end up removing a Phi if it has only one input. We should not do it for edges that leave the current loop because these Phis are LCSSA Phis and need to be preserved. Thanks @dmgreen for finding this! Differential Revision: https://reviews.llvm.org/D54841 llvm-svn: 347484
*	[NFC] Add verification flags to tests	Max Kazantsev	2018-11-23	1	-3/+3
\| \| \| \|	llvm-svn: 347483
*	[InstCombine] Add tests for funnel shift with zero operand; NFC	Nikita Popov	2018-11-21	1	-0/+36
\| \| \| \| \| \|	These are additional baseline tests for D54778. llvm-svn: 347414
*	[MergeFuncs] Generate alias instead of thunk if possible	Nikita Popov	2018-11-21	1	-0/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MergeFunctions pass was originally intended to emit aliases instead of thunks where possible (unnamed_addr). However, for a long time this functionality was behind a flag hardcoded to false, bitrotted and was eventually removed in r309313. Originally the functionality was first disabled in r108417 due to lack of support for aliases in Mach-O. I believe that this is no longer the case nowadays, but I am not really familiar with this area. In the interest of being conservative, this patch reintroduces the aliasing functionality behind a default disabled -mergefunc-use-aliases flag. Differential Revision: https://reviews.llvm.org/D53285 llvm-svn: 347407
*	[PM] Port Scalarizer to the new pass manager.	Mikael Holmen	2018-11-21	10	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch by: markus (Markus Lavin) Reviewers: chandlerc, fedor.sergeev Reviewed By: fedor.sergeev Subscribers: llvm-commits, Ka-Ka, bjope Differential Revision: https://reviews.llvm.org/D54695 llvm-svn: 347392
*	[NFC] More complex tests for LoopSimplifyCFG	Max Kazantsev	2018-11-21	1	-0/+345
\| \| \| \|	llvm-svn: 347384
*	[NFC] Add some sophisticated tests on LoopSimplifyCFG	Max Kazantsev	2018-11-21	1	-47/+724
\| \| \| \|	llvm-svn: 347381
*	[LVI] run transfer function for binary operator even when the RHS isn't a ↵	John Regehr	2018-11-21	1	-0/+100
\| \| \| \| \| \| \| \| \| \| \| \| \|	constant LVI was symbolically executing binary operators only when the RHS was constant, missing the case where we have a ConstantRange for the RHS, but not an actual constant. Tested using check-all and by bootstrapping. Compile time is not impacted measurably. Differential Revision: https://reviews.llvm.org/D19859 llvm-svn: 347379
*	[InstCombine] add tests for funnel shifts; NFC	Sanjay Patel	2018-11-20	1	-0/+177
\| \| \| \| \| \| \| \|	These are included in D54666, so adding them first with baseline results. Patch by: @nikic (Nikita Popov) llvm-svn: 347333
*	[InstSimplify] fold funnel shifts with undef operands	Sanjay Patel	2018-11-20	1	-8/+4
\| \| \| \| \| \| \| \|	Splitting these off from the D54666. Patch by: nikic (Nikita Popov) llvm-svn: 347332
*	[InstSimplify] add tests for funnel shift with undef operands; NFC	Sanjay Patel	2018-11-20	1	-0/+40
\| \| \| \| \| \| \| \| \|	These are part of D54666, so adding them here before the patch to show the baseline (currently unoptimized) results. Patch by: @nikic (Nikita Popov) llvm-svn: 347331
*	[InstructionSimplify] Add support for saturating add/sub	Sanjay Patel	2018-11-20	1	-82/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for saturating add/sub in InstructionSimplify. In particular, the following simplifications are supported: sat(X + 0) -> X sat(X + undef) -> -1 sat(X uadd MAX) -> MAX (and commutative variants) sat(X - 0) -> X sat(X - X) -> 0 sat(X - undef) -> 0 sat(undef - X) -> 0 sat(0 usub X) -> 0 sat(X usub MAX) -> 0 Patch by: @nikic (Nikita Popov) Differential Revision: https://reviews.llvm.org/D54532 llvm-svn: 347330
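The simplifications in this commit that do not involve undef can be confirmed by brute force at 8 bits. This is a sketch with illustrative helper names; the undef cases are omitted because they depend on IR semantics that plain integers cannot model.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int8_t clampToI8(int V) {
  return static_cast<std::int8_t>(V > 127 ? 127 : (V < -128 ? -128 : V));
}
// 8-bit models of the four saturating intrinsics.
constexpr std::uint8_t uaddSat(std::uint8_t A, std::uint8_t B) {
  const unsigned S = unsigned(A) + unsigned(B);
  return static_cast<std::uint8_t>(S > 255u ? 255u : S);
}
constexpr std::uint8_t usubSat(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>(A > B ? A - B : 0);
}
constexpr std::int8_t saddSat(std::int8_t A, std::int8_t B) {
  return clampToI8(int(A) + int(B));
}
constexpr std::int8_t ssubSat(std::int8_t A, std::int8_t B) {
  return clampToI8(int(A) - int(B));
}

// Checks each listed identity for every 8-bit value.
inline bool simplificationsHold() {
  for (int V = 0; V < 256; ++V) {
    const auto U = static_cast<std::uint8_t>(V);
    const auto S = static_cast<std::int8_t>(V - 128);
    if (uaddSat(U, 0) != U || saddSat(S, 0) != S) return false;          // sat(X + 0) -> X
    if (uaddSat(U, 255) != 255 || uaddSat(255, U) != 255) return false;  // sat(X uadd MAX) -> MAX
    if (usubSat(U, 0) != U || ssubSat(S, 0) != S) return false;          // sat(X - 0) -> X
    if (usubSat(U, U) != 0 || ssubSat(S, S) != 0) return false;          // sat(X - X) -> 0
    if (usubSat(0, U) != 0) return false;                                // sat(0 usub X) -> 0
    if (usubSat(U, 255) != 0) return false;                              // sat(X usub MAX) -> 0
  }
  return true;
}
```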
*	[LoopSink] Add preheader to alias set	Guozhi Wei	2018-11-20	1	-0/+37
\| \| \| \| \| \| \| \| \| \|	This patch fixes PR39695. The original LoopSink only considers memory aliasing in the loop body. But PR39695 shows that instructions following the sink candidate in the preheader should also be checked. This is a conservative patch: it simply adds the whole preheader block to the alias set. It may lose some optimization opportunities, but I think that is very rare because: (1) in the most common case, a store and load to the same address, the load should already be optimized away; (2) the preheader is usually not very large. Differential Revision: https://reviews.llvm.org/D54659 llvm-svn: 347325
*	[PatternMatch] Handle undef vectors consistently	Sanjay Patel	2018-11-20	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the issue noticed in D54532. The problem is that cst_pred_ty-based matchers like m_Zero() currently do not match scalar undefs (as expected), but do match vector undefs. This may lead to optimization inconsistencies in rare cases. There is only one existing test for which output changes, reverting the change from D53205. The reason here is that vector fsub undef, %x is no longer matched as an m_FNeg(). While I think that the new output is technically worse than the previous one, it is consistent with scalar, and I don't think it's really important either way (generally that undef should have been folded away prior to reassociation.) I've also added another test case for this issue based on InstructionSimplify. It took some effort to find that one, as in most cases undef folds are either checked first -- and in the cases where they aren't it usually happens to not make a difference in the end. This is the only case I was able to come up with. Prior to this patch the test case simplified to undef in the scalar case, but zeroinitializer in the vector case. Patch by: @nikic (Nikita Popov) Differential Revision: https://reviews.llvm.org/D54631 llvm-svn: 347318
*	Recommit "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches ↵	Max Kazantsev	2018-11-20	1	-9/+149
\| \| \| \| \| \| \| \| \| \| \|	and switches" The initial version of patch lacked Phi nodes updates in destinations of removed edges. This version contains this update and tests on this situation. Differential Revision: https://reviews.llvm.org/D54021 llvm-svn: 347289
*	Revert "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches ↵	Benjamin Kramer	2018-11-19	1	-49/+9
\| \| \| \| \| \| \| \|	and switches" This reverts commits r347183 & r347184. Crashes while building libxml. llvm-svn: 347260
*	[InstCombine] Set debug loc on `mergeStoreIntoSuccessor` phi	Vedant Kumar	2018-11-19	1	-0/+26
\| \| \| \| \| \| \| \| \|	Assigning a merged debug location to the `mergeStoreIntoSuccessor` phi improves backtrace quality. Fixes llvm.org/PR38083. llvm-svn: 347257
*	Revert "[LICM] Make LICM able to hoist phis"	Benjamin Kramer	2018-11-19	2	-1175/+9
\| \| \| \| \| \|	This reverts commit r347190. llvm-svn: 347225
*	[LV] Avoid vectorizing unsafe dependencies in uniform address	Anna Thomas	2018-11-19	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220