bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AArch64] Improve load/store optimizer to handle LDUR + LDR.	Chad Rosier	2016-02-04	1	-22/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769, which was reverted in r246782 due to a test-suite failure. I'm unable to reproduce the issue at this time. llvm-svn: 259790
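Mixing scaled and unscaled accesses means comparing their offsets in a common unit first. The following is a minimal sketch of that idea, not the code from the patch: `MemAccess`, `byteOffset`, and `canFormPair` are hypothetical names, and the signed 7-bit scaled range used for the pair immediate is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical description of one load off a shared base register.
struct MemAccess {
  int Imm;        // immediate as written in the instruction
  bool IsScaled;  // true: Imm counts access-size units; false: Imm is in bytes
  int SizeBytes;  // access size in bytes
};

// Normalize both addressing styles to a byte offset.
int byteOffset(const MemAccess &M) {
  return M.IsScaled ? M.Imm * M.SizeBytes : M.Imm;
}

// Two accesses are pairing candidates when they have the same size, are
// adjacent in memory, and the lower byte offset is a multiple of the size
// whose scaled form fits the assumed signed 7-bit pair immediate.
bool canFormPair(const MemAccess &A, const MemAccess &B) {
  if (A.SizeBytes != B.SizeBytes)
    return false;
  int OffA = byteOffset(A);
  int OffB = byteOffset(B);
  if (std::abs(OffA - OffB) != A.SizeBytes)
    return false;
  int Low = OffA < OffB ? OffA : OffB;
  if (Low % A.SizeBytes != 0)
    return false;
  int ScaledLow = Low / A.SizeBytes;
  return ScaledLow >= -64 && ScaledLow <= 63;
}
```

For example, an unscaled 8-byte load at byte offset 8 and a scaled 8-byte load with immediate 2 (byte offset 16) are adjacent, so this sketch would accept them as a pair.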
*	[AVX512] add vfmadd132ss and vfmadd132sd Intrinsic	Michael Zuckerman	2016-02-04	3	-11/+42
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D16589 llvm-svn: 259789
*	[ScheduleDagInstrs] Improved comments	Jonas Paulsson	2016-02-04	1	-9/+9
\| \| \| \|	llvm-svn: 259783
*	[X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC.	Simon Pilgrim	2016-02-04	1	-60/+84
\| \| \| \|	llvm-svn: 259771
*	[X86] Use hash table in LEA optimization pass.	Andrey Turetskiy	2016-02-04	1	-150/+247
\| \| \| \| \| \| \| \|	Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time. Differential Revision: http://reviews.llvm.org/D16404 llvm-svn: 259770
*	[Support] Use range-based for loop. NFC	Craig Topper	2016-02-04	1	-3/+1
\| \| \| \|	llvm-svn: 259763
*	[Support] Use hexdigit instead of manually coding the same thing. NFC	Craig Topper	2016-02-04	1	-2/+2
\| \| \| \|	llvm-svn: 259762
*	[PGO] Profile interface cleanup	Xinliang David Li	2016-02-04	1	-4/+3
\| \| \| \| \| \| \|	- Remove unused valuemapper parameter - add totalcount optional parameter llvm-svn: 259756
*	[NVPTX] Disable performance optimizations when OptLevel==None	Jingyue Wu	2016-02-04	1	-21/+36
\| \| \| \| \| \| \| \| \| \|	Reviewers: jholewinski, tra, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16874 llvm-svn: 259749
*	[SCEV] Try to reuse existing value during SCEV expansion	Wei Mi	2016-02-04	3	-9/+129
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current SCEV expansion will expand SCEV as a sequence of operations and doesn't reuse values that already exist. This will introduce redundant computation which may not be cleaned up thoroughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. The original commit triggered regressions in Polly tests. The regressions exposed two problems which have been fixed in the current version. 1. Polly will generate a new function based on the old one. To generate an instruction for the new function, it builds SCEV for the old instruction, applies some transformation on the SCEV generated, then expands the transformed SCEV and inserts the expanded value into the new function. Because SCEV expansion may reuse a value cached in ExprValueMap, the value in the old function may be inserted into the new function, which is wrong. In SCEVExpander::expand, there is logic to check that the cached value to be used dominates the insertion point. However, for the above case, the check always passes. That is because the insertion point is in a new function, which is unreachable from the old function, and DominatorTreeBase::dominates treats an unreachable node as dominated by any other node. The fix is to simply add a check that the cached value to be used in expansion is in the same function as the insertion point instruction. 2. When the SCEV is of scConstant type, expanding it directly is cheaper than reusing a normal cached value. Although the cached value set in ExprValueMap contains a Constant type value, it is not easy to find -- the cached Value set is not sorted according to the potential cost. Existing reuse logic in SCEVExpander::expand simply chooses the first legal element from the cached value set. The fix is that when the SCEV is of scConstant type, don't try the reuse logic. Simply expand it. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259736
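The two fixes described in the commit above can be sketched roughly as follows. This is a toy model and not LLVM's SCEVExpander: `Value`, `Expr`, the `std::map`-based `ExprValueMap`, and `findReusable` are hypothetical stand-ins, and the dominance check is left out entirely.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical, not LLVM types).
struct Value {
  std::string Name;
  int FunctionId;  // which function the value lives in
};

struct Expr {
  std::string Key;  // canonical form of the expression
  bool IsConstant;  // models an scConstant expression
};

// Maps an expression to the set of values known to compute it.
using ExprValueMap = std::map<std::string, std::vector<Value>>;

// Returns a cached value that may be reused at an insertion point inside
// function InsertFn, or nullptr if the expression should be expanded afresh.
const Value *findReusable(const ExprValueMap &Map, const Expr &E,
                          int InsertFn) {
  // Fix 2: constants are cheaper to expand directly, so skip the reuse logic.
  if (E.IsConstant)
    return nullptr;
  auto It = Map.find(E.Key);
  if (It == Map.end())
    return nullptr;
  for (const Value &V : It->second) {
    // Fix 1: never reuse a value that lives in a different function.
    if (V.FunctionId == InsertFn)
      return &V;
  }
  return nullptr;
}
```

Like the reuse logic the commit describes, the sketch returns the first legal candidate rather than ranking candidates by cost.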
*	Fix undefined behavior when compiling in C++14 mode (with sized deletion	Richard Smith	2016-02-04	1	-0/+8
\| \| \| \| \| \| \|	enabled): ensure that we do not invoke the sized deallocator for MemoryBuffer subclasses that have tail-allocated data. llvm-svn: 259735
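To illustrate the problem, here is a minimal sketch of a class with tail-allocated data that opts out of sized deallocation by declaring a class-scope unsized `operator delete`. `TailBuffer` and its members are hypothetical; the actual change to the MemoryBuffer subclasses may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical buffer whose character data is stored right after the object.
class TailBuffer {
  std::size_t Size;
  explicit TailBuffer(std::size_t N) : Size(N) {}

public:
  // Allocate the object and N + 1 bytes of payload in a single block.
  static TailBuffer *create(const char *Data, std::size_t N) {
    void *Mem = ::operator new(sizeof(TailBuffer) + N + 1);
    TailBuffer *B = new (Mem) TailBuffer(N);
    std::memcpy(B->data(), Data, N);
    B->data()[N] = '\0';
    return B;
  }

  // Class-scope unsized deallocator: `delete p` then never passes
  // sizeof(TailBuffer), which would understate the real allocation size.
  void operator delete(void *P) { ::operator delete(P); }

  // The payload starts immediately after the object itself.
  char *data() { return reinterpret_cast<char *>(this + 1); }
  std::size_t size() const { return Size; }
};
```

Without the member `operator delete`, a compiler with sized deallocation enabled may call the global sized deallocator with `sizeof(TailBuffer)`, which does not match the size that was actually allocated.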
*	[codeview] Don't attempt a cross-section label diff	Reid Kleckner	2016-02-04	1	-5/+11
\| \| \| \| \| \| \| \|	This only comes up when we're trying to find the next .cv_loc label. Fixes PR26467 llvm-svn: 259733
*	[libFuzzer] hot fix a test	Kostya Serebryany	2016-02-04	1	-1/+1
\| \| \| \|	llvm-svn: 259732
*	[libFuzzer] don't write the test unit when a leak is detected (since we ↵	Kostya Serebryany	2016-02-04	4	-0/+16
\| \| \| \| \| \|	don't know which unit causes the leak) llvm-svn: 259731
*	[SimplifyCFG] Fix for "endless" loop after dead code removal (Alternative to	Gerolf Hoflehner	2016-02-03	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	D16251) Summary: This is a simpler fix to the problem than the dominator approach in http://reviews.llvm.org/D16251. It adds only values that have not been seen before into the gather() while loop. The actual endless loop is in the constant compare gather() routine in Utils/SimplifyCFG.cpp. The same value ret.0.off0.i is pushed back into the queue: %.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i Here is what happens at the IR level: for.cond.i: ; preds = %if.end6.i, %if.end.i54 %ix.0.i = phi i32 [ 0, %if.end.i54 ], [ %inc.i55, %if.end6.i ] %ret.0.off0.i = phi i1 [false, %if.end.i54], [%.ret.0.off0.i, %if.end6.i] <<< %cmp2.i = icmp ult i32 %ix.0.i, %11 br i1 %cmp2.i, label %for.body.i, label %LBJ_TmpSimpleNeedExt.exit if.end6.i: ; preds = %for.body.i %cmp10.i = icmp ugt i32 %conv.i, %add9.i %.ret.0.off0.i = or i1 %ret.0.off0.i, %cmp10.i <<< When if.end.i54 gets eliminated, the definition of ret.0.off0.i is removed. The result is the expression %.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i (Note the first ‘or’ operand is now %.ret.0.off0.i, and NOT %ret.0.off0.i). And now there is a use of .ret.0.off0.i before a definition, which triggers the “endless” loop in gather(): while(!DFT.empty()) { V = DFT.pop_back_val(); // V is .ret.0.off0.i if (Instruction *I = dyn_cast<Instruction>(V)) { // If it is a \|\| (or && depending on isEQ), process the operands. if (I->getOpcode() == (isEQ ? Instruction::Or : Instruction::And)) { DFT.push_back(I->getOperand(1)); // This is now .ret.0.off0.i also DFT.push_back(I->getOperand(0)); continue; // “endless loop” for .ret.0.off0.i } Reviewers: reames, ahatanak Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16839 llvm-svn: 259730
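The worklist loop quoted in the commit above terminates once each value is pushed at most once. A minimal sketch of that idea, assuming a visited set is how the check is done; `Node` and `gatherLeaves` are hypothetical stand-ins and not the SimplifyCFG code:

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical node: an "or" has two operands, a leaf has none.
struct Node {
  bool IsOr = false;
  Node *Op0 = nullptr;
  Node *Op1 = nullptr;
};

// Collect the leaves reachable through a tree of "or" nodes. The Visited set
// ensures each node is pushed at most once, so an "or" that names itself as
// an operand cannot keep the loop running forever.
std::vector<Node *> gatherLeaves(Node *Root) {
  std::vector<Node *> Leaves;
  std::vector<Node *> Worklist{Root};
  std::unordered_set<Node *> Visited{Root};
  while (!Worklist.empty()) {
    Node *V = Worklist.back();
    Worklist.pop_back();
    if (V->IsOr) {
      // Only push operands that have not been seen before.
      if (Visited.insert(V->Op1).second)
        Worklist.push_back(V->Op1);
      if (Visited.insert(V->Op0).second)
        Worklist.push_back(V->Op0);
      continue;
    }
    Leaves.push_back(V);
  }
  return Leaves;
}
```

With a self-referential node such as `%.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i`, the first operand is already in the visited set, so it is not pushed again and the loop drains.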
*	[InstrProfiling] Fix a comment (NFC)	Vedant Kumar	2016-02-03	1	-1/+1
\| \| \| \|	llvm-svn: 259727
*	Minor code cleanups. NFC.	Junmo Park	2016-02-03	1	-18/+18
\| \| \| \|	llvm-svn: 259725
*	rangify; NFCI	Sanjay Patel	2016-02-03	1	-159/+129
\| \| \| \|	llvm-svn: 259722
*	clean up; NFC	Sanjay Patel	2016-02-03	1	-15/+13
\| \| \| \|	llvm-svn: 259720
*	Fix pointers to go on the right hand side. NFC.	Ana Pazos	2016-02-03	1	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fixed pointers to go on the right hand side following coding guidelines. NFC. Patch by Mandeep Singh Grang. Reviewers: majnemer, arsenm, sanjoy Differential Revision: http://reviews.llvm.org/D16866 llvm-svn: 259703
*	[LoopStrengthReduce] Don't rewrite PHIs with incoming values from CatchSwitches	David Majnemer	2016-02-03	1	-0/+11
\| \| \| \| \| \| \| \| \| \|	Bail out if we have a PHI on an EHPad that gets a value from a CatchSwitchInst. Because the CatchSwitchInst cannot be split, there is no good place to stick any instructions. This fixes PR26373. llvm-svn: 259702
*	[ScalarEvolutionExpander] Simplify findInsertPointAfter	David Majnemer	2016-02-03	1	-8/+6
\| \| \| \| \| \| \|	No functional change is intended. The loop could only execute, at most, once. llvm-svn: 259701
*	[codeview] Remove EmitLabelDiff in favor of emitAbsoluteSymbolDiff	Reid Kleckner	2016-02-03	1	-18/+4
\| \| \| \|	llvm-svn: 259700
*	[codeview] Use the MCStreamer interface directly instead of AsmPrinter	Reid Kleckner	2016-02-03	2	-101/+100
\| \| \| \| \| \| \| \| \|	This is mostly about having shorter lines and standardizing on one interface, but it also avoids some needless indirection. No functional change. llvm-svn: 259697
*	[DWARFDebug] Fix another case of overlapping ranges	Keno Fischer	2016-02-03	1	-13/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In r257979, I added code to ensure that we wouldn't merge DebugLocEntries if the pieces they describe overlap. Unfortunately, I failed to cover the case where there may be multiple active Expressions in the entry, in which case we need to make sure that no two values overlap before we can perform the merge. This fixed PR26148. Reviewers: aprantl Differential Revision: http://reviews.llvm.org/D16742 llvm-svn: 259696
*	Address NDEBUG-related linkage issues for Value::assertModuleIsMaterialized()	Todd Fiala	2016-02-03	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The IR/Value class had a linkage issue present when LLVM was built as a library, and the LLVM library build time had different settings for NDEBUG than the client of the LLVM library. Clients could get into a state where the LLVM lib expected Value::assertModuleIsMaterialized() to be inline-defined in the header but clients expected that method to be defined in the LLVM library. See this llvm-commits thread for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160201/329667.html llvm-svn: 259695
*	[SelectionDAG] Fix CombineToPreIndexedLoadStore O(n^2) behavior	Tim Shen	2016-02-03	2	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch consists of two parts: a performance fix in DAGCombiner.cpp and a correctness fix in SelectionDAG.cpp. The test case tests the bug that's uncovered by the performance fix, and fixed by the correctness fix. The performance fix keeps the containers required by the hasPredecessorHelper (which is a lazy DFS) and reuses them. Since hasPredecessorHelper is called in a loop, the overall complexity is reduced from O(n^2) to O(n), where n is the number of SDNodes. The correctness fix keeps iterating the neighbor list even if it's time to return early. It will return after finishing adding all neighbors to Worklist, so that no neighbors are discarded due to the original early return. llvm-svn: 259691
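Both halves of the commit above can be seen in a small resumable DFS. This is a sketch under assumptions, not the SelectionDAG code: `SDNodeLike` and `hasPredecessor` are hypothetical names, and the caller is assumed to seed `Visited` and `Worklist` with the start node.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a DAG node with operand edges.
struct SDNodeLike {
  std::vector<const SDNodeLike *> Operands;
};

// Resumable DFS: Visited and Worklist are owned by the caller and survive
// across calls, so work done for one query is not repeated for the next.
bool hasPredecessor(const SDNodeLike *Target,
                    std::unordered_set<const SDNodeLike *> &Visited,
                    std::vector<const SDNodeLike *> &Worklist) {
  // An earlier query may already have reached Target.
  if (Visited.count(Target))
    return true;
  while (!Worklist.empty()) {
    const SDNodeLike *N = Worklist.back();
    Worklist.pop_back();
    bool Found = false;
    // Push every unvisited operand before returning. Returning in the middle
    // of this loop would drop the remaining operands, and later queries that
    // reuse the containers would never see them.
    for (const SDNodeLike *Op : N->Operands) {
      if (Visited.insert(Op).second)
        Worklist.push_back(Op);
      if (Op == Target)
        Found = true;
    }
    if (Found)
      return true;
  }
  return false;
}
```

Because each node enters the worklist at most once over the lifetime of the shared containers, a sequence of queries does linear total work instead of restarting the search each time.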
*	ARM: support TLS for WoA	Saleem Abdulrasool	2016-02-03	5	-0/+62
\| \| \| \| \| \| \| \| \| \| \|	Add support for TLS access for Windows on ARM. This generates a similar access to MSVC for ARM. The changes to the tablegen data are needed to support loading an external symbol global that is not for a call. The adjustments to the DAG to DAG transforms are needed to preserve the 32-bit move. llvm-svn: 259676
*	Revert r259662, which caused regressions on polly tests.	Wei Mi	2016-02-03	3	-127/+9
\| \| \| \|	llvm-svn: 259675
*	[InstCombine] Revert r238452: Fold IntToPtr and PtrToInt into preceding loads.	Quentin Colombet	2016-02-03	1	-10/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to git bisect, this is the root cause of a miscompile for Regex in libLLVMSupport. I am still working on reducing a test case. The actual bug may be elsewhere and this commit just exposed it. Anyway, at the moment, to reproduce, follow these steps: 1. Build clang and libLTO in release mode. 2. Create a new build directory <stage2> and cd into it. 3. Use clang and libLTO from #1 to build llvm-extract in Release mode + asserts using -O2 -flto 4. Run llvm-extract -ralias '.bar' -S test/Other/extract-alias.ll Result: program doesn't contain global named '.bar'! Expected result: @a0a0bar = alias void ()* @bar @a0bar = alias void ()* @bar declare void @bar() Note: In step #3, if you don't use lto or asserts, the miscompile disappears. llvm-svn: 259674
*	[ScheduleDAGInstrs::buildSchedGraph()] Handling of memory dependencies rewritten.	Jonas Paulsson	2016-02-03	1	-347/+362
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recommitted, after some fixing with test cases. Updated test cases: test/CodeGen/AArch64/arm64-misched-memdep-bug.ll test/CodeGen/AArch64/tailcall_misched_graph.ll Temporarily disabled test cases: test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated) test/CodeGen/PowerPC/vsx-fma-m.ll test/CodeGen/PowerPC/vsx-fma-sp.ll http://reviews.llvm.org/D8705 Reviewers: Hal Finkel, Andy Trick. llvm-svn: 259673
*	[SCEV] Try to reuse existing value during SCEV expansion	Wei Mi	2016-02-03	3	-9/+127
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current SCEV expansion will expand SCEV as a sequence of operations and doesn't reuse values that already exist. This will introduce redundant computation which may not be cleaned up thoroughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259662
*	[ARM] Move GNUEABI divmod to __aeabi_divmod*	Renato Golin	2016-02-03	1	-2/+4
\| \| \| \| \| \| \| \| \| \|	The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores which happens to be a lot faster than __divsi3/__modsi3 when the core has hardware divide instructions. Do the same here. Fixes PR26450. llvm-svn: 259657
*	[MachineCopyPropagation] Fix comment. NFC	Jun Bum Lim	2016-02-03	1	-2/+3
\| \| \| \| \| \| \| \| \| \|	Reviewers: MatzeB, qcolombet, jmolloy, mcrosier Subscribers: llvm-commits, mcrosier Differential Revision: http://reviews.llvm.org/D16806 llvm-svn: 259656
*	[mips] Remove redundant inclusions of MipsAnalyzeImmediate.h	Daniel Sanders	2016-02-03	9	-8/+1
\| \| \| \|	llvm-svn: 259655
*	[DemandedBits] Revert r249687 due to PR26071	James Molloy	2016-02-03	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This regresses a test in LoopVectorize, so I'll need to go away and think about how to solve this in a way that isn't broken. From the writeup in PR26071: What's happening is that ComputeKnownZeroes is telling us that all bits except the LSB are zero. We're then deciding that only the LSB needs to be demanded from the icmp's inputs. This is where we're wrong - we're assuming that after simplification the bits that were known zero will continue to be known zero. But they're not - during trivialization the upper bits get changed (because an XOR isn't shrunk), so the icmp fails. The fault is in demandedbits - its contract does clearly state that a non-demanded bit may either be zero or one. llvm-svn: 259649
*	Fix for PR 26381	Nemanja Ivanovic	2016-02-03	1	-1/+1
\| \| \| \| \| \|	Simple fix - Constant values were not being sign extended in FastIsel. llvm-svn: 259645
*	[mips] Add SHF_MIPS_GPREL flag to the MIPS .sbss and .sdata sections	Simon Atanasyan	2016-02-03	1	-2/+4
\| \| \| \| \| \| \| \| \| \|	MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL flag. See Figure 4–7 on page 69 in the following document: ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf. Differential Revision: http://reviews.llvm.org/D15740 llvm-svn: 259641
*	[X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to ↵	Simon Pilgrim	2016-02-03	4	-124/+121
\| \| \| \| \| \| \| \| \| \| \| \|	EltsFromConsecutiveLoads. Follow up to D16217 and D16729. This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed). I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits are all zeroed. This matches the description for X86ISD::VZEXT_LOAD. Differential Revision: http://reviews.llvm.org/D16768 llvm-svn: 259635
*	Fix a typo in comment	Xinliang David Li	2016-02-03	1	-1/+1
\| \| \| \|	llvm-svn: 259631
*	Fix uninitialized variable use problem	Xinliang David Li	2016-02-03	1	-1/+1
\| \| \| \|	llvm-svn: 259630
*	[PGO] Profile summary reader/writer support	Xinliang David Li	2016-02-03	3	-13/+104
\| \| \| \| \| \| \| \| \| \|	With this patch, the profile summary data will be available in the indexed profile data file so that the profile reader/compiler optimizer can start to make use of it. Differential Revision: http://reviews.llvm.org/D16258 llvm-svn: 259626
*	LowerBitSets: Don't bother to do any work if the llvm.bitset.test intrinsic ↵	Peter Collingbourne	2016-02-03	1	-1/+1
\| \| \| \| \| \|	is unused. llvm-svn: 259625
*	Add #include "llvm/Support/raw_ostream.h" to fix Windows build.	Peter Collingbourne	2016-02-03	1	-0/+1
\| \| \| \|	llvm-svn: 259623
*	Transforms: Move GlobalOpt's Evaluator to Utils where it can be reused.	Peter Collingbourne	2016-02-03	3	-655/+598
\| \| \| \|	llvm-svn: 259621
*	Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates.	Kyle Butt	2016-02-03	1	-19/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms on PPC. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7 ;v6 = v6 + v5 * v7 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96 ;v5 = v5 * v7 + v96 This was broken in the case where the target register was also used as a multiplicand. Fix this case by checking for it and replacing both uses with the copied register. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6 ;v6 = v6 + v5 * v6 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96 ;v5 = v5 * v96 + v96 llvm-svn: 259617
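The rewrite in the commit above rests on a simple identity between the two FMA forms. Here is a small sketch that checks it on plain doubles; `fmaAForm`, `fmaMForm`, and the `before`/`after` helpers are hypothetical names that model only the arithmetic from the commit's comments, not the machine instructions or register constraints.

```cpp
#include <cassert>

// A-form: the tied (overwritten) operand is the addend: Acc = Acc + A * B.
double fmaAForm(double Acc, double A, double B) { return Acc + A * B; }

// M-form: the tied operand is a multiplicand: Mul = Mul * B + Add.
double fmaMForm(double Mul, double B, double Add) { return Mul * B + Add; }

// Original sequence: v6 = COPY v96; v6 = v6 + v5 * v7.
double beforeMutate(double V96, double V5, double V7) {
  double V6 = V96;
  return fmaAForm(V6, V5, V7);
}

// Mutated sequence with the copy elided: v5 = v5 * v7 + v96.
double afterMutate(double V96, double V5, double V7) {
  return fmaMForm(V5, V7, V96);
}

// Duplicate case: the target register is also a multiplicand,
// v6 = COPY v96; v6 = v6 + v5 * v6.
double beforeMutateDup(double V96, double V5) {
  double V6 = V96;
  return fmaAForm(V6, V5, V6);
}

// Both uses of v6 are replaced by the copied register: v5 = v5 * v96 + v96.
double afterMutateDup(double V96, double V5) {
  return fmaMForm(V5, V96, V96);
}
```

In the duplicate case, replacing only the addend would leave the multiplicand referring to a register whose copy has been removed, which is why both uses have to be rewritten.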
*	Revert r259576: Disable the vzeroupper insertion pass on PS4.	Yunzhong Gao	2016-02-03	1	-3/+0
\| \| \| \| \| \|	Will re-implement based on review feedback. llvm-svn: 259615
*	RegCoalescer: Making sure re-materialization defines all subranges	Marcello Maggioni	2016-02-03	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \|	The register coalescer can rematerialize constants that define more of a register than the copy it is going to replace was going to do. This is valid in the case the register was undef before the copy happened. This patch makes sure that all the subranges defined by the new rematerialization instructions have at least a dead def. Review: http://reviews.llvm.org/D16693 llvm-svn: 259614
*	[LoopVersioning] Expose loop versioning as a pass too	Adam Nemet	2016-02-03	2	-0/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: LoopVersioning is a transform utility that transform passes can use to run-time disambiguate may-aliasing accesses. I'd like to also expose it as a pass to allow it to be unit-tested. I am planning to add support for non-aliasing annotation in LoopVersioning and I'd like to be able to write tests directly using this pass. (After that feature is done, the pass could also be used to look for optimization opportunities that are hidden behind incomplete alias information at compile time.) The pass drives LoopVersioning in its default way which is to fully disambiguate may-aliasing accesses no matter how many checks are required. Reviewers: hfinkel, ashutosh.nema, sbaranga Subscribers: zzheng, mssimpso, llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D16612 llvm-svn: 259610
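Run-time disambiguation of may-aliasing accesses amounts to guarding a no-alias copy of the loop with an overlap check. The following is a hand-written source-level sketch of that shape, not output of the pass; `mayOverlap` and `addOneVersioned` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Run-time check: do the N-element ranges starting at A and B overlap?
static bool mayOverlap(const int *A, const int *B, std::size_t N) {
  auto AL = reinterpret_cast<std::uintptr_t>(A);
  auto BL = reinterpret_cast<std::uintptr_t>(B);
  std::uintptr_t Bytes = N * sizeof(int);
  return AL < BL + Bytes && BL < AL + Bytes;
}

// Computes Dst[I] = Src[I] + 1 and returns true if the no-alias version ran.
bool addOneVersioned(int *Dst, const int *Src, std::size_t N) {
  if (!mayOverlap(Dst, Src, N)) {
    // Versioned loop: the accesses are known to be independent here, so an
    // optimizer would be free to reorder or vectorize them.
    for (std::size_t I = 0; I < N; ++I)
      Dst[I] = Src[I] + 1;
    return true;
  }
  // Original loop: kept for the case where the ranges do overlap, executed
  // strictly in order.
  for (std::size_t I = 0; I < N; ++I)
    Dst[I] = Src[I] + 1;
  return false;
}
```

Both loops have the same body; the value of the guard is that only the first one may be optimized under the no-alias assumption.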
*	Attempt #2 to unbreak r259595.	George Burgess IV	2016-02-02	1	-4/+4
\| \| \| \|	llvm-svn: 259602