bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Revert "Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of ↵	Dmitri Gribenko	2019-09-10	1	-0/+218
\| \| \| \| \| \| \| \| \|	CodeGen into opt pipeline."" This reverts commit r371502, it broke tests (clang/test/CodeGenCXX/auto-var-init.cpp). llvm-svn: 371507
*	Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into ↵	Clement Courbet	2019-09-10	1	-218/+0
\| \| \| \| \| \| \| \|	opt pipeline." With a fix for sanitizer breakage (see explanation in D60318). llvm-svn: 371502
*	[PowerPC][NFC] Avoid checking non-relevant .cfi instructions	Jinsong Ji	2019-08-30	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is brought up in https://reviews.llvm.org/D64662?id=209923#inline-599490 CFI information are non-relevant to quite some testcases, we should get rid of checking them when its unecessary. This patch avoid generating cfi info in testcases that are not testing prolog/epilog or exception handling. Reviewers: kbarton, hfinkel, nemanjai, #powerpc Reviewed By: hfinkel Subscribers: MaskRay, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67016 llvm-svn: 370505
*	Revert "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into ↵	Clement Courbet	2019-06-26	1	-0/+220
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	opt pipeline." Breaks sanitizers: libFuzzer :: cxxstring.test libFuzzer :: memcmp.test libFuzzer :: recommended-dictionary.test libFuzzer :: strcmp.test libFuzzer :: value-profile-mem.test libFuzzer :: value-profile-strcmp.test llvm-svn: 364416
*	[ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.	Clement Courbet	2019-06-26	1	-220/+0
\| \| \| \| \| \| \| \| \|	This allows later passes (in particular InstCombine) to optimize more cases. One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0. llvm-svn: 364412
*	[DAGCombiner] form 'not' ops ahead of shifts (PR39657)	Sanjay Patel	2018-11-22	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops. https://rise4fun.com/Alive/NC1 Name: shl Pre: (-1 << C2) == C1 %shl = shl i8 %x, C2 %r = xor i8 %shl, C1 => %not = xor i8 %x, -1 %r = shl i8 %not, C2 Name: shr Pre: (-1 u>> C2) == C1 %sh = lshr i8 %x, C2 %r = xor i8 %sh, C1 => %not = xor i8 %x, -1 %r = lshr i8 %not, C2 https://bugs.llvm.org/show_bug.cgi?id=39657 llvm-svn: 347478
*	[PowerPC] Replace the Post RA List Scheduler with the Machine Scheduler	Stefan Pintilie	2018-07-04	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \|	We want to run the Machine Scheduler instead of the List Scheduler after RA. Checked with a performance run on a Power 9 machine with SPEC 2006 and while some benchmarks improved and others degraded the geomean was slightly improved with the Machine Scheduler. Differential Revision: https://reviews.llvm.org/D45265 llvm-svn: 336295
*	Generalize MergeBlockIntoPredecessor. Replace uses of ↵	Alina Sbirlea	2018-06-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MergeBasicBlockIntoOnlyPred. Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?) 2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken Uses of MergeBasicBlockIntoOnlyPred that need to be replaced: 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges. 2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll 3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement. Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D48202 llvm-svn: 335183
*	[CodeGen] Unify MBB reference format in both MIR and debug output	Francis Visoiu Mistrih	2017-12-04	1	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . $ -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665
*	[PowerPC] Recommit r314244 with refactoring and off by default	Nemanja Ivanovic	2017-11-30	1	-14/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This re-commits everything that was pulled in r314244. The transformation is off by default (patch to enable it to follow). The code is refactored to have a single entry-point and provide fine-grained control over patterns that it selects. This patch also fixes the bugs in the original code. Everything that failed with the original patch has been re-tested with this patch (with the transformation turned on). So the patch to turn this on is soon to follow. Differential Revision: https://reviews.llvm.org/D38575 llvm-svn: 319434
*	[PowerPC] Eliminate sign- and zero-extensions if already sign- or zero-extended	Hiroshi Inoue	2017-10-16	1	-10/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enables redundant sign- and zero-extension elimination in PowerPC MI Peephole pass. If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated. One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in PPC ELF ABI. For example of the following simple code, two extsw instructions are generated before the invocation of int_func and before the return. With this patch, both extsw are eliminated. void int_func(int); void ii_test(int a) { if (a & 1) return int_func(a); } Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling the LLVM+CLANG. Differential Revision: https://reviews.llvm.org/D31319 llvm-svn: 315888
*	[MC] Suppress .Lcfi labels when emitting textual assembly	Reid Kleckner	2017-10-10	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This suppresses the generation of .Lcfi labels in our textual assembler. It was annoying that this generated cascading .Lcfi labels: llc foo.ll -o - \| llvm-mc \| llvm-mc After three trips through MCAsmStreamer, we'd have three labels in the output when none are necessary. We should only bother creating the labels and frame data when making a real object file. This supercedes D38605, which moved the entire .seh_ implementation into MCObjectStreamer. This has the advantage that we do more checking when emitting textual assembly, as a minor efficiency cost. Outputting textual assembly is not performance critical, so this shouldn't matter. Reviewers: majnemer, MatzeB Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D38638 llvm-svn: 315259
*	[PowerPC] Reverting sequence of patches for elimination of comparison ↵	Nemanja Ivanovic	2017-09-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instructions In the past while, I've committed a number of patches in the PowerPC back end aimed at eliminating comparison instructions. However, this causes some failures in proprietary source and these issues are not observed in SPEC or any open source packages I've been able to run. As a result, I'm pulling the entire series and will refactor it to: - Have a single entry point for easy control - Have fine-grained control over which patterns we transform A side-effect of this is that test cases for these patches (and modified by them) are XFAIL-ed. This is a temporary measure as it is counter-productive to remove/modify these test cases and then have to modify them again when the refactored patch is recommitted. The failure will be investigated in parallel to the refactoring effort and the recommit will either have a fix for it or will leave this transformation off by default until the problem is resolved. llvm-svn: 314244
*	[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLT/SETGT	Nemanja Ivanovic	2017-09-23	1	-4/+3
\| \| \| \| \| \| \| \|	As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential revision. llvm-svn: 314055
*	[CGP] eliminate a sub instruction in memcmp expansion	Sanjay Patel	2017-06-27	1	-18/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As noted in D34071, there are some IR optimization opportunities that could be handled by normal IR passes if this expansion wasn't happening so late in CGP. Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, so I'm proposing this change: %s = sub i32 %x, %y %r = icmp ne %s, 0 => %r = icmp ne %x, %y Changing the predicate to 'eq' mimics what InstCombine would do, so that's just an efficiency improvement if we decide this expansion should happen sooner. The fact that the PowerPC backend doesn't eliminate the 'subf.' might be something for PPC folks to investigate separately. Differential Revision: https://reviews.llvm.org/D34416 llvm-svn: 306471
*	[CGP, PowerPC] try to constant fold before creating loads for memcmp expansion	Sanjay Patel	2017-06-19	1	-25/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the last step needed to avoid regressions for x86 before we flip the switch to allow expansion of the smallest set of memcpy() via CGP. The DAG version checks for constant strings, so we need to do that here too. FWIW, the 2 constant test is not handled by LibCallSimplifier::optimizeMemCmp() because that code is limited to 8-bit constant arrays. LibCallSimplifier will also fail to optimize some 1 constant tests because its alignment requirements are too strict (shouldn't require alignment for a constant operand). Differential Revision: https://reviews.llvm.org/D34071 llvm-svn: 305734
*	[PowerPC] add memcmp test with one constant operand and equality cmp; NFC	Sanjay Patel	2017-06-09	1	-3/+29
\| \| \| \|	llvm-svn: 305131
*	[CGP] don't expand a memcmp with nobuiltin attribute	Sanjay Patel	2017-06-08	1	-6/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This matches the behavior used in the SDAG when expanding memcmp. For reference, we're intentionally treating the earlier fortified call transforms differently after: https://bugs.llvm.org/show_bug.cgi?id=23093 https://reviews.llvm.org/rL233776 One motivation for not transforming nobuiltin calls is that it can interfere with sanitizers: https://reviews.llvm.org/D19781 https://reviews.llvm.org/D19801 Differential Revision: https://reviews.llvm.org/D34043 llvm-svn: 305007
*	[PowerPC] add memcmp test with nobuiltin attr; NFC	Sanjay Patel	2017-06-08	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \|	In SDAG, we don't expand libcalls with a nobuiltin attribute. It's not clear if that's correct from the existing code comment: "Don't do the check if marked as nobuiltin for some reason." ...adding a test here either way to show that there is currently a different behavior implemented in the CGP-based expansion. llvm-svn: 304991
*	[CGP / PowerPC] avoid multi-block overhead for simple memcmp expansion	Sanjay Patel	2017-06-08	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The test diff for PowerPC shows we can better optimize if this case is one block. For x86, there's would be a substantial difference if CGP expansion was enabled because branches are assumed cheap and SDAG can't optimize across blocks. Instead of this: _cmp_eq8: movq (%rdi), %rax cmpq (%rsi), %rax je LBB23_1 ## BB#2: ## %res_block movl $1, %ecx jmp LBB23_3 LBB23_1: xorl %ecx, %ecx LBB23_3: ## %endblock xorl %eax, %eax testl %ecx, %ecx sete %al retq We get this: cmp_eq8: movq (%rdi), %rcx xorl %eax, %eax cmpq (%rsi), %rcx sete %al retq And that matches the optimal codegen that we get from the current expansion in SelectionDAGBuilder::visitMemCmpCall(). If this looks right, then I just need to confirm that vector-sized expansion will work from here, and we can enable CGP memcmp() expansion for x86. Ie, we'll bypass the power-of-2 special cases currently optimized in SDAG because we can lower the IR produced here optimally. Differential Revision: https://reviews.llvm.org/D34005 llvm-svn: 304987
*	[CGP] avoid zext/trunc of a memcmp expansion compare	Sanjay Patel	2017-06-07	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This could be viewed as another shortcoming of the DAGCombiner: when both operands of a compare are zexted from the same source type, we should be able to compare the original types. The effect on PowerPC perf is likely unnoticeable, but there's a visible regression for x86 if we feed the suboptimal IR for memcmp expansion to the DAG: _cmp_eq4_zexted_to_i64: movl (%rdi), %ecx movl (%rsi), %edx xorl %eax, %eax cmpq %rdx, %rcx sete %al _cmp_eq4_better: movl (%rdi), %ecx xorl %eax, %eax cmpl (%rsi), %ecx sete %al llvm-svn: 304923
*	[CGP / PowerPC] use direct compares if there's only one load per block in ↵	Sanjay Patel	2017-06-07	1	-20/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	memcmp() expansion I'd like to enable CGP memcmp expansion for x86, but the output from CGP would regress the special cases (memcmp(x,y,N) != 0 for N=1,2,4,8,16,32 bytes) that we already handle. I'm not sure if we'll actually be able to produce the optimal code given the block-at-a-time limitation in the DAG. We might have to just avoid those special-cases here in CGP. But regardless of that, I think this is a win for the more general cases. http://rise4fun.com/Alive/cbQ Differential Revision: https://reviews.llvm.org/D33963 llvm-svn: 304849
*	[PowerPC] auto-generate full checks and increase test coverage	Sanjay Patel	2017-06-06	1	-77/+160
\| \| \| \| \| \| \|	3 of the tests were testing exactly the same thing: memcmp(x, y, 16) != 0. I changed that to test 4, 7, and 16 bytes, so we can see how those differ. llvm-svn: 304838
*	[PPC] Inline expansion of memcmp	Zaara Syeda	2017-05-31	1	-0/+121
	This patch does an inline expansion of memcmp. It changes the memcmp library call into an inline expansion when the size is known at compile time and is under a target specified threshold. This expansion is implemented in CodeGenPrepare and expands into straight line code. The target specifies a maximum load size and the expansion works by using this size to load the two sources, compare, and exit early if a difference is found. It also has a special case when the memcmp result is used in a compare to zero equality. Differential Revision: https://reviews.llvm.org/D28637 llvm-svn: 304313