bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[InstCombine] fold (X >>u C) << C --> X & (-1 << C)	Sanjay Patel	2017-01-26	1	-18/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already have this fold when the lshr has one use, but it doesn't need that restriction. We may be able to remove some code from foldShiftedShift(). Also, move the similar: (X << C) >>u C --> X & (-1 >>u C) ...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst(). That whole function seems questionable since it is called by commonShiftTransforms(), but there's really not much in common if we're checking the shift opcodes for every fold. llvm-svn: 293215
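The two folds named in this message can be checked by brute force on a narrow integer type. The following is only a sketch under stated assumptions: it uses 8-bit unsigned values rather than LLVM IR, and the helper names are hypothetical, not functions from InstCombine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: (X >>u C) << C compared with X & (-1 << C) on 8 bits.
bool shlOfLshrEqualsMask(uint8_t x, unsigned c) {
    uint8_t shifted = static_cast<uint8_t>((x >> c) << c);
    uint8_t masked = static_cast<uint8_t>(x & (0xFFu << c));
    return shifted == masked;
}

// Hypothetical helper: (X << C) >>u C compared with X & (-1 >>u C) on 8 bits.
bool lshrOfShlEqualsMask(uint8_t x, unsigned c) {
    // Truncate after the left shift so the high bits are really discarded.
    uint8_t shifted = static_cast<uint8_t>(static_cast<uint8_t>(x << c) >> c);
    uint8_t masked = static_cast<uint8_t>(x & (0xFFu >> c));
    return shifted == masked;
}

// Try every 8-bit value with every in-range shift amount.
bool foldsHoldExhaustively() {
    for (unsigned c = 0; c < 8; ++c) {
        for (unsigned x = 0; x < 256; ++x) {
            uint8_t v = static_cast<uint8_t>(x);
            if (!shlOfLshrEqualsMask(v, c) || !lshrOfShlEqualsMask(v, c))
                return false;
        }
    }
    return true;
}

// Small usage example.
void demoShiftFolds() {
    assert(foldsHoldExhaustively());
}
```

Neither identity depends on how many other users the inner shift has, which is consistent with the message dropping the one-use restriction; the one-use check only affects whether the fold reduces the instruction count.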
*	[Hexagon] Add Hexagon-specific loop idiom recognition pass	Krzysztof Parzyszek	2017-01-26	4	-5/+1637
\| \| \| \|	llvm-svn: 293213
*	NewGVN: Add algorithm overview	Daniel Berlin	2017-01-26	1	-0/+21
\| \| \| \|	llvm-svn: 293212
*	[InstCombine] use m_APInt to allow (X << C) >>u C --> X & (-1 >>u C) with splat vectors	Sanjay Patel	2017-01-26	1	-16/+24
\| \| \| \| \| \|	llvm-svn: 293208
*	[AArch64] Refine Kryo Machine Model	Balaram Makam	2017-01-26	1	-22/+40
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Refine floating point SQRT and DIV with accurate latency information. Reviewers: mcrosier Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D29191 llvm-svn: 293204
*	[IfConversion] Use reverse_iterator to simplify. NFC	Kyle Butt	2017-01-26	1	-70/+35
\| \| \| \| \| \|	This simplifies skipping debug instructions and shrinking ranges. llvm-svn: 293202
*	[PPC] cleanup of mayLoad/mayStore flags and memory operands.	Sean Fertile	2017-01-26	5	-26/+28
\| \| \| \| \| \| \| \| \| \| \| \|	1) Explicitly set the mayLoad/mayStore property in the tablegen files on load/store instructions. 2) Updated the flags on a number of intrinsics to indicate that they write memory. 3) Added SDNPMemOperand flags for some target-dependent SDNodes so that they propagate their memory operand. Review: https://reviews.llvm.org/D28818 llvm-svn: 293200
*	NewGVN: Make unreachable blocks be marked with unreachable	Daniel Berlin	2017-01-26	1	-18/+13
\| \| \| \|	llvm-svn: 293196
*	Replace addEarlyAsPossiblePasses callback with adjustPassManager	Stanislav Mekhanoshin	2017-01-26	4	-7/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change introduces adjustPassManager target callback giving a target an opportunity to tweak PassManagerBuilder before pass managers are populated. This generalizes and replaces addEarlyAsPossiblePasses target callback. In particular that can be used to add custom passes to extension points other than EP_EarlyAsPossible. Differential Revision: https://reviews.llvm.org/D28336 llvm-svn: 293189
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."	Nirav Dave	2017-01-26	5	-402/+381
\| \| \| \| \| \| \| \|	This reverts commit r293184, which is failing in LTO builds. llvm-svn: 293188
*	[XRay][Arm32] Reduce the portion of the stub and implement more staging for tail calls - in LLVM	Serge Rogatch	2017-01-26	2	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch provides more staging for tail calls in XRay Arm32. When the logging part of XRay is ready for tail calls, its support in the core part of XRay Arm32 may be as easy as changing the number passed to the handler from 1 to 2. Coupled patch: - https://reviews.llvm.org/D28674 Reviewers: dberris, rengolin Reviewed By: dberris Subscribers: llvm-commits, iid_iunknown, aemerson, rengolin, dberris Differential Revision: https://reviews.llvm.org/D28673 llvm-svn: 293185
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.	Nirav Dave	2017-01-26	5	-381/+402
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Simplify Consecutive Merge Store Candidate Search. Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation). Some minor peephole optimizations deal with the improved SubDAG shapes (listed below). Additional Minor Changes: 1. Finishes removing unused AliasLoad code. 2. Unifies the chain aggregation in the merged stores across code paths. 3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits. 4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Removes Chain dependencies of Memory operations on CopyFromReg nodes, as these are captured by data dependence. 6. Forwards load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values. 7. Adds a peephole to convert a buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll). 8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293184
*	Use shouldAssumeDSOLocal in classifyGlobalReference.	Rafael Espindola	2017-01-26	2	-21/+10
\| \| \| \| \| \| \| \| \| \|	And teach shouldAssumeDSOLocal that ppc has no copy relocations. The resulting code handles a few more cases than before. For example, it knows that a weak symbol can be resolved to another .o file, but it will still be in the main executable. llvm-svn: 293180
*	[X86][SSE] Add support for combining ANDNP byte masks with target shuffles	Simon Pilgrim	2017-01-26	1	-6/+26
\| \| \| \|	llvm-svn: 293178
*	[SCEV] Introduce add operation inlining limit	Daniil Fukalov	2017-01-26	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	Inlining in getAddExpr() can cause abnormal computational time in some cases. New parameter -scev-addops-inline-threshold is introduced with default value 500. Reviewers: sanjoy Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D28812 llvm-svn: 293176
*	[X86][SSE] Pull out target shuffle resolve code into helper. NFCI.	Simon Pilgrim	2017-01-26	1	-14/+21
\| \| \| \| \| \|	Pulled out code that removed unused inputs from a target shuffle mask into a helper function to allow it to be reused in a future commit. llvm-svn: 293175
*	[AMDGPU] Fix typo in GCNSchedStrategy	Valery Pykhtin	2017-01-26	1	-1/+1
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D28980 llvm-svn: 293171
*	Revert "[mips] N64 static relocation model support"	Simon Dardis	2017-01-26	12	-240/+115
\| \| \| \| \| \|	This reverts commit r293164. There are multiple tests failing. llvm-svn: 293170
*	[LV] Fix an issue where forming LCSSA in the place that we did would	Chandler Carruth	2017-01-26	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	change the set of uniform instructions in the loop causing an assert failure. The problem is that the legalization checking also builds data structures mapping various facts about the loop body. The immediate cause was the set of uniform instructions. If these then change when LCSSA is formed, the data structures would already have been built and become stale. The included test case triggered an assert in loop vectorize that was reduced out of the new PM's pipeline. The solution is to form LCSSA early enough that no information is cached across the changes made. The only really obvious position is outside of the main logic to vectorize the loop. This also has the advantage of removing one case where forming LCSSA could mutate the loop but we wouldn't track that as a "Changed" state. If it is significantly advantageous to do some legalization checking prior to this, we can do a more careful positioning but it seemed best to just back off to a safe position first. llvm-svn: 293168
*	[mips] N64 static relocation model support	Simon Dardis	2017-01-26	12	-115/+240
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch makes one change to GOT handling and two changes to N64's relocation model handling. Furthermore, the jumptable encodings have been corrected for static N64. Big GOT handling is now done via a new SDNode MipsGotHi - this node is unconditionally lowered to an lui instruction. The first change to N64's relocation handling is the lifting of the restriction that N64 always uses PIC. Now it is possible to target static environments. The second change adds support for 64 bit symbols and enables them by default. Previously N64 had patterns for sym32 mode only. In this mode all symbols are assumed to have 32 bit addresses. sym32 mode support is selectable with attribute 'sym32'. A follow on patch for clang will add the necessary frontend parameter. This partially resolves PR/23485. Thanks to Brooks Davis for reporting the issue! Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris Differential Revision: https://reviews.llvm.org/D23652 llvm-svn: 293164
*	[ARM] GlobalISel: Load i1, i8 and i16 args from stack	Diana Picus	2017-01-26	3	-14/+44
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for loading i1, i8 and i16 arguments from the stack, with or without the ABI extension flags. When the ABI extension flags are present, we load a 4-byte value, otherwise we preserve the size of the load and let the instruction selector replace it with a LDRB/LDRH. This generates the same thing as DAGISel. Differential Revision: https://reviews.llvm.org/D27803 llvm-svn: 293163
*	[PM] Use PoisoningVH correctly when merely deleting entries in a map	Chandler Carruth	2017-01-26	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	with it. This code was dereferencing the PoisoningVH which isn't allowed once it is poisoned. But the code itself really doesn't need to access the pointer, it is just doing the safe stuff of clearing out data structures keyed on the pointer value. Change the code to use iterators to erase directly from a DenseMap. This is also substantially more efficient as it avoids lots of hashing and lookups to do the erasure. DenseMap supports iterating behind the iteration which is fairly easy to implement. Sadly, I don't have a test case here. I'm not even close and I don't know that I ever will be. The issue is that several of the tricky aspects of fixing this only show up when you cause the stack's SmallVector to be in EXACTLY the right location. I only ever got a reproduction for those with Clang, and only with exactly the right command line flags. Any adjustment, even to seemingly unrelated flags, would make partial and half-way solutions magically start to "work". In good news, all of this was caught with the LLVM test suite. Also, there is no specific code here that is untested, just that the old pattern of code won't immediately fail on any test case I've managed to contrive. llvm-svn: 293160
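The erase-through-iterators pattern described in this message can be illustrated with a standard container. This is a hedged analogy only: `std::unordered_map` stands in for LLVM's DenseMap, plain pointers stand in for PoisoningVH keys, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Remove every entry whose mapped value equals `doomed` in a single walk.
// erase(iterator) hands back the next iterator, so no key is hashed or
// looked up again, and the key pointer itself is never dereferenced.
std::size_t eraseByValue(std::unordered_map<const void *, int> &map, int doomed) {
    std::size_t erased = 0;
    for (auto it = map.begin(); it != map.end();) {
        if (it->second == doomed) {
            it = map.erase(it);
            ++erased;
        } else {
            ++it;
        }
    }
    return erased;
}

// Small usage example with three stack objects as keys.
void demoEraseByValue() {
    int a = 0, b = 0, c = 0;
    std::unordered_map<const void *, int> map{{&a, 1}, {&b, 2}, {&c, 1}};
    assert(eraseByValue(map, 1) == 2);
    assert(map.size() == 1);
}
```

Because the loop only compares stored values and advances iterators, it would keep working even if the objects behind the keys had already been destroyed, which is the property the message relies on.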
*	[AVX-512] Move the combine that runs combineBitcastForMaskedOp to the last DAG combine phase where I had originally meant to put it.	Craig Topper	2017-01-26	1	-1/+1
\| \| \| \| \| \|	llvm-svn: 293157
*	[X86] When bitcasting INSERT_SUBVECTOR/EXTRACT_SUBVECTOR to match masked operations, use the correct type for the immediate operand.	Craig Topper	2017-01-26	1	-2/+2
\| \| \| \| \| \|	llvm-svn: 293156
*	[TargetTransformInfo] Refactor and improve getScalarizationOverhead()	Jonas Paulsson	2017-01-26	6	-57/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactoring to remove duplications of this method. The new method getOperandsScalarizationOverhead() looks at the unique operands present and adds extract costs for them. The old behaviour was to always add extract costs for one operand of the type, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start on improving this, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155
*	[DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting undef subvectors.	Craig Topper	2017-01-26	1	-0/+8
\| \| \| \| \| \|	llvm-svn: 293152
*	[X86] Add demanded elts support for the inputs to pclmul intrinsic	Craig Topper	2017-01-26	1	-0/+38
\| \| \| \| \| \| \| \|	This intrinsic uses bit 0 and bit 4 of an immediate argument to determine which bits of its inputs to read. This patch uses this information to simplify the demanded elements of the input vectors. Differential Revision: https://reviews.llvm.org/D28979 llvm-svn: 293151
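How the immediate narrows the demanded elements can be sketched as follows. This assumes each input is viewed as two 64-bit elements, with bit 0 of the immediate selecting the element of the first input and bit 4 selecting the element of the second, as the PCLMULQDQ instruction does. The struct and function names are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Which of the two 64-bit elements of each input is actually read.
struct DemandedElts {
    bool lhs[2];
    bool rhs[2];
};

// Bit 0 picks the element of the first input, bit 4 the element of the second.
// Every other bit of the immediate is ignored in this sketch.
DemandedElts pclmulDemandedElts(uint8_t imm) {
    DemandedElts demanded = {{false, false}, {false, false}};
    demanded.lhs[imm & 1u] = true;
    demanded.rhs[(imm >> 4) & 1u] = true;
    return demanded;
}

// Small usage example: 0x10 reads the low element of the first input
// and the high element of the second.
void demoPclmulDemandedElts() {
    DemandedElts d = pclmulDemandedElts(0x10);
    assert(d.lhs[0] && !d.lhs[1]);
    assert(!d.rhs[0] && d.rhs[1]);
}
```

Exactly one element per input is ever read, so the other element of each input vector is not demanded and can be simplified away.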
*	Revert test commit	Taewook Oh	2017-01-26	1	-1/+0
\| \| \| \|	llvm-svn: 293150
*	test commit	Taewook Oh	2017-01-26	1	-3/+4
\| \| \| \|	llvm-svn: 293148
*	[PM] Simplify the new PM interface to the loop unroller and expose two	Chandler Carruth	2017-01-26	2	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	factory functions for the two modes the loop unroller is actually used in in-tree: simplified full-unrolling and the entire thing including partial unrolling. I've also wired these up to nice names so you can express both of these being in a pipeline easily. This is a precursor to actually enabling these parts of the O2 pipeline. Differential Revision: https://reviews.llvm.org/D28897 llvm-svn: 293136
*	[libFuzzer] remove a bit of stale code	Kostya Serebryany	2017-01-26	2	-6/+0
\| \| \| \|	llvm-svn: 293129
*	[libFuzzer] further simplify __sanitizer_cov_trace_pc_guard	Kostya Serebryany	2017-01-26	2	-9/+7
\| \| \| \|	llvm-svn: 293128
*	AMDGPU: Fold fneg into round instructions	Matt Arsenault	2017-01-26	1	-1/+7
\| \| \| \|	llvm-svn: 293127
*	[libFuzzer] simplify the code for __sanitizer_cov_trace_pc_guard and make sure it is not asan/msan-instrumented	Kostya Serebryany	2017-01-26	5	-4/+31
\| \| \| \| \| \|	llvm-svn: 293125
*	[LoopUnroll] Properly update loopinfo for runtime unrolling by 2	Michael Kuperstein	2017-01-26	3	-10/+19
\| \| \| \| \| \| \| \| \| \| \|	Even when we don't create a remainder loop (that is, when we unroll by 2), we may duplicate nested loops into the remainder. This is complicated by the fact the remainder may itself be either inserted into an outer loop, or at the top level. In the latter case, we may need to create new top-level loops. Differential Revision: https://reviews.llvm.org/D29156 llvm-svn: 293124
*	[NewGVN] Skip uses in unreachable blocks.	Davide Italiano	2017-01-26	1	-0/+6
\| \| \| \| \| \| \| \|	Otherwise we ask for a domtree node that's not there, and we crash. Differential Revision: https://reviews.llvm.org/D29145 llvm-svn: 293122
*	[llc] Add -pass-remarks-output	Adam Nemet	2017-01-26	1	-5/+11
\| \| \| \| \| \| \|	This is the opt/llc counterpart of -fsave-optimization-record to output optimization remarks in a YAML file. llvm-svn: 293121
*	LowerTypeTests: Ignore external globals with type metadata.	Peter Collingbourne	2017-01-26	1	-3/+6
\| \| \| \| \| \|	Thanks to Davide Italiano for finding the problem and providing a test case. llvm-svn: 293119
*	[libFuzzer] don't call GetPreviousInstructionPc on the hot path -- only when dumping the PCs	Kostya Serebryany	2017-01-26	1	-18/+22
\| \| \| \| \| \|	llvm-svn: 293117
*	[APFloat] Fix comments. NFC.	Tim Shen	2017-01-26	1	-28/+30
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fix comments in response to jlebar's comments in D27872. Reviewers: jlebar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29109 llvm-svn: 293116
*	[ValueTracking] Implement SignBitMustBeZero correctly for sqrt.	Justin Lebar	2017-01-26	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Previously we assumed that the result of sqrt(x) always had 0 as its sign bit. But sqrt(-0) == -0. Reviewers: hfinkel, efriedma, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28928 llvm-svn: 293115
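The corner case in this message is easy to reproduce with the standard library. A minimal sketch, assuming IEEE 754 doubles and a build without fast-math style flags; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// True when the result of sqrt(x) has its sign bit set.
bool sqrtHasSignBit(double x) {
    return std::signbit(std::sqrt(x));
}

// Small usage example: only negative zero yields a result with the sign bit set.
void demoSqrtSignBit() {
    assert(sqrtHasSignBit(-0.0));
    assert(!sqrtHasSignBit(0.0));
    assert(!sqrtHasSignBit(4.0));
}
```

Since -0.0 compares equal to 0.0, an ordinary comparison cannot observe the difference; only an inspection of the sign bit can, which is why an analysis that reasons about the sign bit has to account for this input.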
*	[NewGVN] Simplify folding a lambda used only once. NFCI.	Davide Italiano	2017-01-25	1	-5/+3
\| \| \| \|	llvm-svn: 293112
*	New OptimizationRemarkEmitter pass for MIR	Adam Nemet	2017-01-25	6	-46/+235
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows MIR passes to emit optimization remarks with the same level of functionality that is available to IR passes. It also hooks up the greedy register allocator to report spills. This allows for interesting use cases like increasing interleaving on a loop until spilling of registers is observed. I still need to experiment whether reporting every spill scales but this demonstrates for now that the functionality works from llc using -pass-remarks*=<pass>. Differential Revision: https://reviews.llvm.org/D29004 llvm-svn: 293110
*	[OptDiag] Split code region out of DiagnosticInfoOptimizationBase	Adam Nemet	2017-01-25	2	-21/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Code region is the only part of this class that is IR-specific. Code region is moved down in the inheritance tree to a new derived class, called DiagnosticInfoIROptimization. All the existing remarks are derived from this new class now. This allows the new MIR pass-remark classes to be derived from DiagnosticInfoOptimizationBase. Also because we keep the name DiagnosticInfoOptimizationBase, the clang parts don't need any adjustment. Differential Revision: https://reviews.llvm.org/D29003 llvm-svn: 293109
*	NFC: Rename (PDB) RawSession to NativeSession	Adrian McCarthy	2017-01-25	26	-176/+188
\| \| \| \| \| \| \| \|	This eliminates one overload on the term Raw. Differential Revision: https://reviews.llvm.org/D29098 llvm-svn: 293104
*	Revert "[PPC] Give unaligned memory access lower cost on processor that supports it"	Daniel Jasper	2017-01-25	1	-4/+0
\| \| \| \| \| \| \| \| \| \|	This reverts commit r292680. It is causing significantly worse performance and test timeouts in our internal builds. I have already routed reproduction instructions your way. llvm-svn: 293092
*	[pdb] Correctly parse the hash adjusters table from TPI stream.	Zachary Turner	2017-01-25	1	-9/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is not a list of pairs, it is a hash table data structure. We now correctly parse this out and dump it from llvm-pdbdump. We still need to understand the conditions that lead to a type getting an entry in the hash adjuster table. That will be done in a followup investigation / patch. Differential Revision: https://reviews.llvm.org/D29090 llvm-svn: 293090
*	SDag: fix how initial loads are formed when splitting vector ops.	Tim Northover	2017-01-25	1	-1/+4
\| \| \| \| \| \| \| \|	Later code expects the vector loads produced to be directly concatenable, which means we shouldn't pad anything except the last load produced with UNDEF. llvm-svn: 293088
*	GlobalISel: rework getOrCreateVReg to avoid double lookup. NFC.	Tim Northover	2017-01-25	1	-20/+20
\| \| \| \| \| \|	Thanks to Quentin for suggesting the refactoring. llvm-svn: 293087
*	DebugInfo: remove unused parameter from function. NFC.	Tim Northover	2017-01-25	2	-4/+2
\| \| \| \| \| \| \|	I think it's a hold-over from some previous iteration, but it's never set to true in LLVM as it exists now. llvm-svn: 293086