bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Added a test that demonstrates a ug in Scatter scheduling.	Elena Demikhovsky	2017-09-10	1	-0/+23
\| \| \| \| \| \| \|	The bug is going to be fixed in an upcomming patch. llvm-svn: 312883
*	[X86] Add v2i4 store test case (PR20012)	Simon Pilgrim	2017-09-09	1	-0/+17
\| \| \| \|	llvm-svn: 312874
*	[X86] Add v2i2 test case (PR20011)	Simon Pilgrim	2017-09-09	1	-0/+33
\| \| \| \|	llvm-svn: 312873
*	[X86][FMA] Regenerate FMA tests	Simon Pilgrim	2017-09-09	4	-715/+910
\| \| \| \|	llvm-svn: 312871
*	Merge isKnownNonNull into isKnownNonZero	Nuno Lopes	2017-09-09	9	-16/+27
\| \| \| \| \| \| \| \| \|	It now knows the tricks of both functions. Also, fix a bug that considered allocas of non-zero address space to be always non null Differential Revision: https://reviews.llvm.org/D37628 llvm-svn: 312869
*	[X86][SSE] i32 vector multiplications test cases from PR6399	Simon Pilgrim	2017-09-09	1	-0/+472
\| \| \| \|	llvm-svn: 312868
*	[X86][MOVBE] Fix typo in MOVBE scheduling test names	Simon Pilgrim	2017-09-09	1	-21/+21
\| \| \| \| \| \|	Copy+paste is not your friend llvm-svn: 312867
*	[X86] Don't disable slow INC/DEC if optimizing for size	Craig Topper	2017-09-09	1	-22/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size. This appears to match gcc behavior. Reviewers: chandlerc, zvi, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37177 llvm-svn: 312866
*	[DivRemPairs] split tests per target to account for bots that don't build ↵	Sanjay Patel	2017-09-09	5	-364/+606
\| \| \| \| \| \|	for all targets llvm-svn: 312863
*	[DivRempairs] add a pass to optimize div/rem pairs (PR31028)	Sanjay Patel	2017-09-09	3	-0/+366
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented as an independent pass, so there's no stretching of scope and feature creep for an existing pass. I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost this same functionality as an addition to CGP in the motivating example of PR31028: https://bugs.llvm.org/show_bug.cgi?id=31028 The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and undo the hoisting that is done here. Decomposing remainder may allow removing some code from the backend (PPC and possibly others). Differential Revision: https://reviews.llvm.org/D37121 llvm-svn: 312862
*	PPC: Don't select lxv/stxv for insufficiently aligned stack slots.	Kyle Butt	2017-09-09	1	-0/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The lxv/stxv instructions require an offset that is 0 % 16. Previously we were selecting lxv/stxv for loads and stores to the stack where the offset from the slot was a multiple of 16, but the stack slot was not 16 or more byte aligned. When the frame gets lowered these transform to r(1\|31) + slot + offset. If slot is not aligned, slot + offset may not be 0 % 16. Now we require 16 byte or more alignment for select lxv/stxv to stack slots. Includes a testcase that shows both sufficiently and insufficiently aligned stack slots. llvm-svn: 312843
*	bpf: fix test failures due to previous bpf change of assembly code syntax	Yonghong Song	2017-09-09	5	-6/+6
\| \| \| \| \|	Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 312840
*	[TargetTransformInfo] Add a new public interface getInstructionCost	Guozhi Wei	2017-09-08	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current TargetTransformInfo can support throughput cost model and code size model, but sometimes we also need instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different cost of an instruction. So I proposed following interface: enum TargetCostKind { TCK_RecipThroughput, ///< Reciprocal throughput. TCK_Latency, ///< The latency of instruction. TCK_CodeSize ///< Instruction code size. }; int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const; All clients should mainly use this function to query the cost of an instruction, parameter <kind> specifies the desired cost model. This patch also provides a simple default implementation of getInstructionLatency. The default getInstructionLatency provides latency numbers for only small number of instruction classes, those latency numbers are only reasonable for modern OOO processors. It can be extended in following ways: Add more detail into this function. Add getXXXLatency function and call it from here. Implement target specific getInstructionLatency function. Differential Revision: https://reviews.llvm.org/D37170 llvm-svn: 312832
*	Migrate llvm-symbolizer tests to not use %T	David Blaikie	2017-09-08	14	-28/+69
\| \| \| \| \| \|	(context around the %T removal here: https://reviews.llvm.org/D35396 ) llvm-svn: 312828
*	[llvm-cov] Use portable output redirection in a test	Vedant Kumar	2017-09-08	1	-1/+1
\| \| \| \| \| \|	A follow-up to a test fix (r312825). llvm-svn: 312826
*	[llvm-cov] Try to appease a Windows bot	Vedant Kumar	2017-09-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On a Windows bot, I see a FileCheck error where the source being matched over no longer exists, i.e it seems like it's FileCheck'ing some stale output: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/4747 You can see "// CHECK: [[@LINE]]\|{{ +}Marker at 19:3 = 1" in the FileCheck stderr, but that CHECK line doesn't exist. Remove the input file to FileCheck before running the test, to try and appease the bot. llvm-svn: 312825
*	[llvm-cov] Disable name-compression in a test binary	Vedant Kumar	2017-09-08	1	-0/+0
\| \| \| \| \| \| \| \| \| \|	This should fix the lld bot: The Buildbot has detected a new failure on builder llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast while building cfe. Full details are available at: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/16993 llvm-svn: 312821
*	AMDGPU: Recompute scc liveness	Matt Arsenault	2017-09-08	1	-0/+60
\| \| \| \| \| \| \| \|	The various scalar bit operations set SCC, so one is erased or moved it needs to be recomputed. Not sure why the existing tests don't fail on this. llvm-svn: 312819
*	[Coverage] Build sorted and unique segments	Vedant Kumar	2017-09-08	2	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A coverage segment contains a starting line and column, an execution count, and some other metadata. Clients of the coverage library use segments to prepare line-oriented reports. Users of the coverage library depend on segments being unique and sorted in source order. Currently this is not guaranteed (this is why the clang change which introduced deferred regions was reverted). This commit documents the "unique and sorted" condition and asserts that it holds. It also fixes the SegmentBuilder so that it produces correct output in some edge cases. Testing: I've added unit tests for some edge cases. I've also checked that the new SegmentBuilder implementation is fully covered. Apart from running check-profile and the llvm-cov tests, I've successfully used a stage1 llvm-cov to prepare a coverage report for an instrumented clang binary. Differential Revision: https://reviews.llvm.org/D36813 llvm-svn: 312817
*	[llvm-cov] Fix a lifetime issue	Vedant Kumar	2017-09-08	1	-0/+4
\| \| \| \| \| \| \|	This fixes an issue where a std::string was moved to a constructor which accepted a StringRef. llvm-svn: 312816
*	[Coverage] Report errors when reading malformed source regions	Vedant Kumar	2017-09-08	5	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	Each source region has a start and end location. Report an error when the end location does not precede the begin location. The old lineExecutionCounts.covmapping test actually had a buggy source region in it. This commit introduces a regenerated copy of the coverage and moves the old copy to malformedRegions.covmapping, for a test. Differential Revision: https://reviews.llvm.org/D37387 llvm-svn: 312814
*	[llvm-cov] Unify region marker placement between text/html modes	Vedant Kumar	2017-09-08	2	-7/+50
\| \| \| \| \| \| \| \| \| \| \| \|	Make sure that the text and html emitters always emit the same set of region markers, and avoid emitting redundant markers for line segments which don't end on the line they start on. This is related to D35925, and depends on D36014 Differential Revision: https://reviews.llvm.org/D36020 llvm-svn: 312813
*	[X86] Simplify the slow-incdec test and add test cases with optsize.	Craig Topper	2017-09-08	1	-73/+60
\| \| \| \| \| \|	I think we want to consider using inc/dec with optsize. llvm-svn: 312804
*	Fix a bug for rL312641.	Wei Mi	2017-09-08	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \|	rL312641 Allowed llvm.memcpy/memset/memmove to be tail calls when parent function return the intrinsics's first argument. However on arm-none-eabi platform, llvm.memcpy will be expanded to __aeabi_memcpy which doesn't have return value. The fix is to check the libcall name after expansion to match "memcpy/memset/memmove" before allowing those intrinsic to be tail calls. llvm-svn: 312799
*	Preserve existing regs when adding pristines to LivePhysRegs/LiveRegUnits	Krzysztof Parzyszek	2017-09-08	1	-0/+37
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D37600 llvm-svn: 312797
*	[SLP] Support for horizontal min/max reduction.	Alexey Bataev	2017-09-08	2	-1181/+390
\| \| \| \| \| \| \| \| \| \| \| \| \|	SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions. Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions. Patch fixes PR26956. Differential revision: https://reviews.llvm.org/D27846 llvm-svn: 312791
*	[X86] Added PR31045 test case	Simon Pilgrim	2017-09-08	1	-0/+89
\| \| \| \| \| \|	Reduced version of 'addr-calc-crash.ll' that was included in D27044, that had been fixed already by D31286/rL298633 llvm-svn: 312786
*	Re-enable "[IRCE] Identify loops with latch comparison against current IV value"	Max Kazantsev	2017-09-08	1	-0/+245
\| \| \| \| \| \| \| \|	Re-applying after the found bug was fixed. Differential Revision: https://reviews.llvm.org/D36215 llvm-svn: 312783
*	[X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum ↵	Jatin Bhateja	2017-09-08	1	-0/+40
\| \| \| \| \| \| \| \|	and maxnum' Differential Revision: https://reviews.llvm.org/D37614 llvm-svn: 312778
*	diff --git a/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp ↵	Max Kazantsev	2017-09-08	1	-182/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	b/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp index f72a808..9fa49fd 100644 --- a/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp +++ b/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp @@ -450,20 +450,10 @@ struct LoopStructure { // equivalent to: // // intN_ty inc = IndVarIncreasing ? 1 : -1; - // pred_ty predicate = IndVarIncreasing - // ? IsSignedPredicate ? ICMP_SLT : ICMP_ULT - // : IsSignedPredicate ? ICMP_SGT : ICMP_UGT; + // pred_ty predicate = IndVarIncreasing ? ICMP_SLT : ICMP_SGT; // - // - // for (intN_ty iv = IndVarStart; predicate(IndVarBase, LoopExitAt); - // iv = IndVarNext) + // for (intN_ty iv = IndVarStart; predicate(iv, LoopExitAt); iv = IndVarBase) // ... body ... - // - // Here IndVarBase is either current or next value of the induction variable. - // in the former case, IsIndVarNext = false and IndVarBase points to the - // Phi node of the induction variable. Otherwise, IsIndVarNext = true and - // IndVarBase points to IV increment instruction. - // Value IndVarBase; Value IndVarStart; @@ -471,13 +461,12 @@ struct LoopStructure { Value LoopExitAt; bool IndVarIncreasing; bool IsSignedPredicate; - bool IsIndVarNext; LoopStructure() : Tag(""), Header(nullptr), Latch(nullptr), LatchBr(nullptr), LatchExit(nullptr), LatchBrExitIdx(-1), IndVarBase(nullptr), IndVarStart(nullptr), IndVarStep(nullptr), LoopExitAt(nullptr), - IndVarIncreasing(false), IsSignedPredicate(true), IsIndVarNext(false) {} + IndVarIncreasing(false), IsSignedPredicate(true) {} template <typename M> LoopStructure map(M Map) const { LoopStructure Result; @@ -493,7 +482,6 @@ struct LoopStructure { Result.LoopExitAt = Map(LoopExitAt); Result.IndVarIncreasing = IndVarIncreasing; Result.IsSignedPredicate = IsSignedPredicate; - Result.IsIndVarNext = IsIndVarNext; return Result; } @@ -841,42 +829,21 @@ LoopStructure::parseLoopStructure(ScalarEvolution &SE, return false; }; - // `ICI` can either be a comparison against IV or a comparison of IV.next. - // Depending on the interpretation, we calculate the start value differently. + // `ICI` is interpreted as taking the backedge if the next* value of the + // induction variable satisfies some constraint. - // Pair {IndVarBase; IsIndVarNext} semantically designates whether the latch - // comparisons happens against the IV before or after its value is - // incremented. Two valid combinations for them are: - // - // 1) { phi [ iv.start, preheader ], [ iv.next, latch ]; false }, - // 2) { iv.next; true }. - // - // The latch comparison happens against IndVarBase which can be either current - // or next value of the induction variable. const SCEVAddRecExpr IndVarBase = cast<SCEVAddRecExpr>(LeftSCEV); bool IsIncreasing = false; bool IsSignedPredicate = true; - bool IsIndVarNext = false; ConstantInt StepCI; if (!IsInductionVar(IndVarBase, IsIncreasing, StepCI)) { FailureReason = "LHS in icmp not induction variable"; return None; } - const SCEV IndVarStart = nullptr; - // TODO: Currently we only handle comparison against IV, but we can extend - // this analysis to be able to deal with comparison against sext(iv) and such. - if (isa<PHINode>(LeftValue) && - cast<PHINode>(LeftValue)->getParent() == Header) - // The comparison is made against current IV value. - IndVarStart = IndVarBase->getStart(); - else { - // Assume that the comparison is made against next IV value. - const SCEV StartNext = IndVarBase->getStart(); - const SCEV Addend = SE.getNegativeSCEV(IndVarBase->getStepRecurrence(SE)); - IndVarStart = SE.getAddExpr(StartNext, Addend); - IsIndVarNext = true; - } + const SCEV StartNext = IndVarBase->getStart(); + const SCEV Addend = SE.getNegativeSCEV(IndVarBase->getStepRecurrence(SE)); + const SCEV IndVarStart = SE.getAddExpr(StartNext, Addend); const SCEV Step = SE.getSCEV(StepCI); ConstantInt One = ConstantInt::get(IndVarTy, 1); @@ -1060,7 +1027,6 @@ LoopStructure::parseLoopStructure(ScalarEvolution &SE, Result.IndVarIncreasing = IsIncreasing; Result.LoopExitAt = RightValue; Result.IsSignedPredicate = IsSignedPredicate; - Result.IsIndVarNext = IsIndVarNext; FailureReason = nullptr; @@ -1350,9 +1316,8 @@ LoopConstrainer::RewrittenRangeInfo LoopConstrainer::changeIterationSpaceEnd( BranchToContinuation); NewPHI->addIncoming(PN->getIncomingValueForBlock(Preheader), Preheader); - auto FixupValue = - LS.IsIndVarNext ? PN->getIncomingValueForBlock(LS.Latch) : PN; - NewPHI->addIncoming(FixupValue, RRI.ExitSelector); + NewPHI->addIncoming(PN->getIncomingValueForBlock(LS.Latch), + RRI.ExitSelector); RRI.PHIValuesAtPseudoExit.push_back(NewPHI); } @@ -1735,10 +1700,7 @@ bool InductiveRangeCheckElimination::runOnLoop(Loop L, LPPassManager &LPM) { } LoopStructure LS = MaybeLoopStructure.getValue(); const SCEVAddRecExpr IndVar = - cast<SCEVAddRecExpr>(SE.getSCEV(LS.IndVarBase)); - if (LS.IsIndVarNext) - IndVar = cast<SCEVAddRecExpr>(SE.getMinusSCEV(IndVar, - SE.getSCEV(LS.IndVarStep))); + cast<SCEVAddRecExpr>(SE.getMinusSCEV(SE.getSCEV(LS.IndVarBase), SE.getSCEV(LS.IndVarStep))); Optional<InductiveRangeCheck::Range> SafeIterRange; Instruction ExprInsertPt = Preheader->getTerminator(); diff --git a/test/Transforms/IRCE/latch-comparison-against-current-value.ll b/test/Transforms/IRCE/latch-comparison-against-current-value.ll deleted file mode 100644 index afea0e6..0000000 --- a/test/Transforms/IRCE/latch-comparison-against-current-value.ll +++ /dev/null @@ -1,182 +0,0 @@ -; RUN: opt -verify-loop-info -irce-print-changed-loops -irce -S < %s 2>&1 \| FileCheck %s - -; Check that IRCE is able to deal with loops where the latch comparison is -; done against current value of the IV, not the IV.next. - -; CHECK: irce: in function test_01: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting> -; CHECK: irce: in function test_02: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting> -; CHECK-NOT: irce: in function test_03: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting> -; CHECK-NOT: irce: in function test_04: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting> - -; SLT condition for increasing loop from 0 to 100. -define void @test_01(i32* %arr, i32* %a_len_ptr) #0 { - -; CHECK: test_01 -; CHECK: entry: -; CHECK-NEXT: %exit.mainloop.at = load i32, i32* %a_len_ptr, !range !0 -; CHECK-NEXT: [[COND2:%[^ ]+]] = icmp slt i32 0, %exit.mainloop.at -; CHECK-NEXT: br i1 [[COND2]], label %loop.preheader, label %main.pseudo.exit -; CHECK: loop: -; CHECK-NEXT: %idx = phi i32 [ %idx.next, %in.bounds ], [ 0, %loop.preheader ] -; CHECK-NEXT: %idx.next = add nuw nsw i32 %idx, 1 -; CHECK-NEXT: %abc = icmp slt i32 %idx, %exit.mainloop.at -; CHECK-NEXT: br i1 true, label %in.bounds, label %out.of.bounds.loopexit1 -; CHECK: in.bounds: -; CHECK-NEXT: %addr = getelementptr i32, i32* %arr, i32 %idx -; CHECK-NEXT: store i32 0, i32* %addr -; CHECK-NEXT: %next = icmp slt i32 %idx, 100 -; CHECK-NEXT: [[COND3:%[^ ]+]] = icmp slt i32 %idx, %exit.mainloop.at -; CHECK-NEXT: br i1 [[COND3]], label %loop, label %main.exit.selector -; CHECK: main.exit.selector: -; CHECK-NEXT: %idx.lcssa = phi i32 [ %idx, %in.bounds ] -; CHECK-NEXT: [[COND4:%[^ ]+]] = icmp slt i32 %idx.lcssa, 100 -; CHECK-NEXT: br i1 [[COND4]], label %main.pseudo.exit, label %exit -; CHECK-NOT: loop.preloop: -; CHECK: loop.postloop: -; CHECK-NEXT: %idx.postloop = phi i32 [ %idx.copy, %postloop ], [ %idx.next.postloop, %in.bounds.postloop ] -; CHECK-NEXT: %idx.next.postloop = add nuw nsw i32 %idx.postloop, 1 -; CHECK-NEXT: %abc.postloop = icmp slt i32 %idx.postloop, %exit.mainloop.at -; CHECK-NEXT: br i1 %abc.postloop, label %in.bounds.postloop, label %out.of.bounds.loopexit - -entry: - %len = load i32, i32* %a_len_ptr, !range !0 - br label %loop - -loop: - %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ] - %idx.next = add nsw nuw i32 %idx, 1 - %abc = icmp slt i32 %idx, %len - br i1 %abc, label %in.bounds, label %out.of.bounds - -in.bounds: - %addr = getelementptr i32, i32* %arr, i32 %idx - store i32 0, i32* %addr - %next = icmp slt i32 %idx, 100 - br i1 %next, label %loop, label %exit - -out.of.bounds: - ret void - -exit: - ret void -} - -; ULT condition for increasing loop from 0 to 100. -define void @test_02(i32* %arr, i32* %a_len_ptr) #0 { - -; CHECK: test_02 -; CHECK: entry: -; CHECK-NEXT: %exit.mainloop.at = load i32, i32* %a_len_ptr, !range !0 -; CHECK-NEXT: [[COND2:%[^ ]+]] = icmp ult i32 0, %exit.mainloop.at -; CHECK-NEXT: br i1 [[COND2]], label %loop.preheader, label %main.pseudo.exit -; CHECK: loop: -; CHECK-NEXT: %idx = phi i32 [ %idx.next, %in.bounds ], [ 0, %loop.preheader ] -; CHECK-NEXT: %idx.next = add nuw nsw i32 %idx, 1 -; CHECK-NEXT: %abc = icmp ult i32 %idx, %exit.mainloop.at -; CHECK-NEXT: br i1 true, label %in.bounds, label %out.of.bounds.loopexit1 -; CHECK: in.bounds: -; CHECK-NEXT: %addr = getelementptr i32, i32* %arr, i32 %idx -; CHECK-NEXT: store i32 0, i32* %addr -; CHECK-NEXT: %next = icmp ult i32 %idx, 100 -; CHECK-NEXT: [[COND3:%[^ ]+]] = icmp ult i32 %idx, %exit.mainloop.at -; CHECK-NEXT: br i1 [[COND3]], label %loop, label %main.exit.selector -; CHECK: main.exit.selector: -; CHECK-NEXT: %idx.lcssa = phi i32 [ %idx, %in.bounds ] -; CHECK-NEXT: [[COND4:%[^ ]+]] = icmp ult i32 %idx.lcssa, 100 -; CHECK-NEXT: br i1 [[COND4]], label %main.pseudo.exit, label %exit -; CHECK-NOT: loop.preloop: -; CHECK: loop.postloop: -; CHECK-NEXT: %idx.postloop = phi i32 [ %idx.copy, %postloop ], [ %idx.next.postloop, %in.bounds.postloop ] -; CHECK-NEXT: %idx.next.postloop = add nuw nsw i32 %idx.postloop, 1 -; CHECK-NEXT: %abc.postloop = icmp ult i32 %idx.postloop, %exit.mainloop.at -; CHECK-NEXT: br i1 %abc.postloop, label %in.bounds.postloop, label %out.of.bounds.loopexit - -entry: - %len = load i32, i32* %a_len_ptr, !range !0 - br label %loop - -loop: - %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ] - %idx.next = add nsw nuw i32 %idx, 1 - %abc = icmp ult i32 %idx, %len - br i1 %abc, label %in.bounds, label %out.of.bounds - -in.bounds: - %addr = getelementptr i32, i32* %arr, i32 %idx - store i32 0, i32* %addr - %next = icmp ult i32 %idx, 100 - br i1 %next, label %loop, label %exit - -out.of.bounds: - ret void - -exit: - ret void -} - -; Same as test_01, but comparison happens against IV extended to a wider type. -; This test ensures that IRCE rejects it and does not falsely assume that it was -; a comparison against iv.next. -; TODO: We can actually extend the recognition to cover this case. -define void @test_03(i32* %arr, i64* %a_len_ptr) #0 { - -; CHECK: test_03 - -entry: - %len = load i64, i64* %a_len_ptr, !range !1 - br label %loop - -loop: - %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ] - %idx.next = add nsw nuw i32 %idx, 1 - %idx.ext = sext i32 %idx to i64 - %abc = icmp slt i64 %idx.ext, %len - br i1 %abc, label %in.bounds, label %out.of.bounds - -in.bounds: - %addr = getelementptr i32, i32* %arr, i32 %idx - store i32 0, i32* %addr - %next = icmp slt i32 %idx, 100 - br i1 %next, label %loop, label %exit - -out.of.bounds: - ret void - -exit: - ret void -} - -; Same as test_02, but comparison happens against IV extended to a wider type. -; This test ensures that IRCE rejects it and does not falsely assume that it was -; a comparison against iv.next. -; TODO: We can actually extend the recognition to cover this case. -define void @test_04(i32* %arr, i64* %a_len_ptr) #0 { - -; CHECK: test_04 - -entry: - %len = load i64, i64* %a_len_ptr, !range !1 - br label %loop - -loop: - %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ] - %idx.next = add nsw nuw i32 %idx, 1 - %idx.ext = sext i32 %idx to i64 - %abc = icmp ult i64 %idx.ext, %len - br i1 %abc, label %in.bounds, label %out.of.bounds - -in.bounds: - %addr = getelementptr i32, i32* %arr, i32 %idx - store i32 0, i32* %addr - %next = icmp ult i32 %idx, 100 - br i1 %next, label %loop, label %exit - -out.of.bounds: - ret void - -exit: - ret void -} - -!0 = !{i32 0, i32 50} -!1 = !{i64 0, i64 50} llvm-svn: 312775
*	Fix a crash when emitting debug info for multi-reg function arguments	Adrian Prantl	2017-09-08	1	-0/+51
\| \| \| \| \| \| \| \| \|	by reusing more of the existing machinery This is a follow-up to r312169. Thanks to Björn Pettersson for the testcase! llvm-svn: 312773
*	[XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC	Dean Michael Berris	2017-09-08	3	-0/+114
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This fixes code-gen for XRay in PPC. The regression wasn't caught by codegen tests which we add in this change. What happened was the following: - For tail exits, we used to unconditionally prepend the returns/exits with a pseudo-instruction that gets lowered to the instrumentation sled (and leave the actual return/exit instruction as-is). - Changes to the XRay instrumentation pass caused the tail exits to suddenly also emit the tail exit pseudo-instruction, since the check for whether a return instruction was also a call instruction meant it was a tail exit instruction. - None of the tests caught the regression either due to non-existent tests, or the tests being disabled/removed for continuous breakage. This change re-introduces some of the basic tests and verifies that we're back to a state that allows the back-end to generate appropriate XRay instrumented binaries for PPC in the presence of tail exits. Reviewers: echristo, timshen Subscribers: nemanjai, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D37570 llvm-svn: 312772
*	[x86] Flesh out the custom ISel for RMW aritmetic ops with used flags to	Chandler Carruth	2017-09-08	1	-0/+1423
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cover the bitwise operators. Nothing really exciting here, this just stamps out the rest of the core operations that can RMW memory and set flags. Still not implemented here: ADC, SBB. Those will require more interesting logic to channel the flags in, and I'm not currently planning to try to tackle that. It might be interesting for someone who wants to improve our code generation for bignum implementations. Differential Revision: https://reviews.llvm.org/D37141 llvm-svn: 312768
*	WholeProgramDevirt: When promoting for single-impl devirt, also rename the ↵	Peter Collingbourne	2017-09-08	1	-6/+12
\| \| \| \| \| \| \| \| \| \| \|	comdat. This is required when targeting COFF, as the comdat name must match one of the names of the symbols in the comdat. Differential Revision: https://reviews.llvm.org/D37550 llvm-svn: 312767
*	[x86] Extend the manual ISel of `add` and `sub` with both RMW memory	Chandler Carruth	2017-09-07	4	-91/+679
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	operands and used flags to support matching immediate operands. This is a bit trickier than register operands, and we still want to fall back on a register operands even for things that appear to be "immediates" when they won't actually select into the operation's immediate operand. This also requires us to handle things like selecting `sub` vs. `add` to minimize the number of bits needed to represent the immediate, and picking the shortest immediate encoding. In order to that, we in turn need to scan to make sure that CF isn't used as it will get inverted. The end result seems very nice though, and we're now generating optimal instruction sequences for these patterns IMO. A follow-up patch will further expand this to other operations with RMW memory operands. But handing `add` and `sub` are useful starting points to flesh out the machinery and make sure interesting and complex cases can be handled. Thanks to Craig Topper who provided a few fixes and improvements to this patch in addition to the review! Differential Revision: https://reviews.llvm.org/D37139 llvm-svn: 312764
*	Don't call exit from cl::PrintHelpMessage.	Rafael Espindola	2017-09-07	1	-0/+1
\| \| \| \| \| \| \| \| \|	Most callers were not expecting the exit(0) and trying to exit with a different value. This also adds back the call to cl::PrintHelpMessage in llvm-ar. llvm-svn: 312761
*	Revert r312318, r312325, r312424, r312489	Richard Trieu	2017-09-07	1	-57/+0
\| \| \| \| \| \| \| \| \| \|	r312318 - Debug info for variables whose type is shrinked to bool r312325, r312424, r312489 - Test case for r312318 Revision 312318 introduced a null dereference bug. Details in https://bugs.llvm.org/show_bug.cgi?id=34490 llvm-svn: 312758
*	[llvm-objcopy] Add support for special section indexes in symbol table ↵	Petr Hosek	2017-09-07	3	-0/+135
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	greater than SHN_LORESERVE As is indexes above SHN_LORESERVE will not be handled correctly because they'll be treated as indexes of sections rather than special values that should just be copied. This change adds support to copy them though. Patch by Jake Ehrlich Differential Revision: https://reviews.llvm.org/D37393 llvm-svn: 312756
*	llvm-ar: exit with 1 if there is an error.	Rafael Espindola	2017-09-07	1	-0/+4
\| \| \| \| \| \|	This is pr34396. llvm-svn: 312752
*	[DWARF] Line 0 should not have a discriminator.	Paul Robinson	2017-09-07	1	-0/+39
\| \| \| \| \| \| \| \|	It's meaningless and takes up extra space in the line table. Differential Revision: https://reviews.llvm.org/D37364 llvm-svn: 312751
*	Fix llvm-xray tests to avoid subshells	Reid Kleckner	2017-09-07	2	-18/+5
\| \| \| \| \| \| \| \| \|	We already uses pipefail to detect failure of a redirected command, so the "\|\| echo failure" construct was unnecessary. These tests run and pass on Windows now. llvm-svn: 312747
*	[yaml2obj][ELF] Add support for symbol indexes greater than SHN_LORESERVE	Petr Hosek	2017-09-07	2	-0/+56
\| \| \| \| \| \| \| \| \| \| \| \| \|	Right now Symbols must be either undefined or defined in a specific section. Some symbols have section indexes like SHN_ABS however. This change adds support for outputting symbols that have such section indexes. Patch by Jake Ehrlich Differential Revision: https://reviews.llvm.org/D37391 llvm-svn: 312745
*	[XRay][tools] Disable windows for tests that use an unsupported shell redirect.	Keith Wyss	2017-09-07	2	-0/+10
\| \| \| \| \| \| \|	The tests are filechecking against stderr and use some magic to make stdout go away and pipe stderr to FileCheck. This broke bots on windows. llvm-svn: 312739
*	[CUDA] Added rudimentary support for CUDA-9 and sm_70.	Artem Belevich	2017-09-07	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	For now CUDA-9 is not included in the list of CUDA versions clang searches for, so the path to CUDA-9 must be explicitly passed via --cuda-path=. On LLVM side NVPTX added sm_70 GPU type which bumps required PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment. Differential Revision: https://reviews.llvm.org/D37576 llvm-svn: 312734
*	[XRay][tools] Function call stack based analysis tooling for XRay traces	Keith Wyss	2017-09-07	4	-0/+137
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Second try after fixing a code san problem with iterator reference types. This change introduces a subcommand to the llvm-xray tool called "stacks" which allows for analysing XRay traces provided as inputs and accounting time to stacks instead of just individual functions. This gives us a more precise view of where in a program the latency is actually attributed. The tool uses a trie data structure to keep track of the caller-callee relationships as we process the XRay traces. In particular, we keep track of the function call stack as we enter functions. While we're doing this we're adding nodes in a trie and indicating a "calls" relatinship between the caller (current top of the stack) and the callee (the new top of the stack). When we push function ids onto the stack, we keep track of the timestamp (TSC) for the enter event. When exiting functions, we are able to account the duration by getting the difference between the timestamp of the exit event and the corresponding entry event in the stack. This works even if we somehow miss the exit events for intermediary functions (i.e. if the exit event is not cleanly associated with the enter event at the top of the stack). The output of the tool currently provides just the top N leaf functions that contribute the most latency, and the top N stacks that have the most frequency. In the future we can provide more sophisticated query mechanisms and potentially an export to database feature to make offline analysis of the stack traces possible with existing tools. Differential revision: D34863 llvm-svn: 312733
*	AMDGPU: Start selecting v_mad_mix_f32	Matt Arsenault	2017-09-07	1	-0/+409
\| \| \| \|	llvm-svn: 312732
*	AMDGPU: Handle non-temporal loads and stores	Konstantin Zhuravlyov	2017-09-07	5	-9/+527
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D36862 llvm-svn: 312729
*	AMDGPU: Handle more than one memory operand in SIMemoryLegalizer	Konstantin Zhuravlyov	2017-09-07	1	-0/+163
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D37397 llvm-svn: 312725
*	[ARM] Remove redundant vcvt patterns.	Benjamin Kramer	2017-09-07	1	-0/+19
\| \| \| \| \| \| \| \|	These don't add any value as they're just compositions of existing patterns. However, they can confuse the cost logic in ISel, leading to duplicated vcvt instructions like in PR33199. llvm-svn: 312724
*	[X86][LLVM]Expanding Supports lowerInterleavedLoad() in X86InterleavedAccess ↵	Michael Zuckerman	2017-09-07	2	-243/+148
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(VF{8\|16\|32} stride 3). This patch expands the support of lowerInterleavedload to {8\|16\|32}x8i stride 3. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8\|16\|32}) and we plan to include the store (deinterleved side). The patch goal is to optimize the following sequence: a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5 a6 b6 c6 a7 b7 c7 into a0 a1 a2 a3 a4 a5 a6 a7 b0 b1 b2 b3 b4 b5 b6 b7 c0 c1 c2 c3 c4 c5 c6 c7 Reviewers 1. zvi 2. igor 3. guyblank 4. dorit 5. Ayal llvm-svn: 312722