bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Enable the shrink wrapping optimization for PPC64.	Kit Barton	2015-09-10	3	-77/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The changes in this patch are as follows: 1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function. 2. Fix a bug in the PPCEarlyReturn optimization caused by an empty entry block in the function. 3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run: shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on the command line. A new test case, ppc-shrink-wrapping.ll, was created based on the existing shrink wrapping tests for x86, arm, and arm64. Phabricator review: http://reviews.llvm.org/D11817 llvm-svn: 247237
*	[AArch64] Match FI+offset in STNP addressing mode.	Ahmed Bougacha	2015-09-10	2	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	First, we need to teach isFrameOffsetLegal about STNP. It already knew about the STP/LDP variants, but those were probably never exercised, because it's only the load/store optimizer that generates STP/LDP, and the only user of the method is frame lowering, which runs earlier. The STP/LDP cases were wrong: they didn't take into account the fact that they return two results, not one, so the immediate offset will be the 4th operand, not the 3rd. Follow-up to r247234. llvm-svn: 247236
*	[AArch64] Match base+offset in STNP addressing mode.	Ahmed Bougacha	2015-09-10	1	-0/+16
\| \| \| \| \| \|	Followup to r247231. llvm-svn: 247234
*	[AArch64] Support selecting STNP.	Ahmed Bougacha	2015-09-10	3	-0/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We could go through the load/store optimizer and match STNP where we would have matched a nontemporal-annotated STP, but that's not reliable enough, as an opportunistic optimization. Instead, we can guarantee emitting STNP, by matching them at ISel. Since there are no single-input nontemporal stores, we have to resort to some high-bits-extracting trickery to generate an STNP from a plain store. Also, we need to support another, LDP/STP-specific addressing mode, base + signed scaled 7-bit immediate offset. For now, only match the base. Let's make it smart separately. Part of PR24086. llvm-svn: 247231
*	AMDGPU/SI: Fix more cases of losing exec operands	Matt Arsenault	2015-09-10	3	-16/+12
\| \| \| \|	llvm-svn: 247230
*	AMDGPU/SI: Fix creating v_mov_b32s without exec uses	Matt Arsenault	2015-09-10	1	-2/+14
\| \| \| \| \| \| \|	This will be caught by existing tests with a verifier check to be added in a future commit. llvm-svn: 247229
*	Revert r247216: "Fix Clang-tidy misc-use-override warnings, other minor fixes"	Hans Wennborg	2015-09-10	4	-58/+58
\| \| \| \| \| \| \|	This caused build breakages, e.g. http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu-gdb-75/builds/24926 llvm-svn: 247226
*	[CodeGen] Make x86 nontemporal store patfrags generic. NFC.	Ahmed Bougacha	2015-09-10	1	-19/+0
\| \| \| \| \| \|	To be used by other targets. llvm-svn: 247225
*	[RewriteStatepointsForGC] Minor refactor to use shared implementation [NFC]	Philip Reames	2015-09-10	1	-8/+1
\| \| \| \|	llvm-svn: 247223
*	[RewriteStatepointsForGC] Strengthen a confusingly weak assertion [NFC]	Philip Reames	2015-09-10	1	-3/+3
\| \| \| \| \| \|	The assertion was weaker than it should be and gave the impression we're growing the number of base defining values being considered during the fixed point iteration. That's not true. The tighter form of the assert is useful documentation. llvm-svn: 247221
*	[RewriteStatepointsForGC] One last bit of naming [NFCI]	Philip Reames	2015-09-10	1	-7/+7
\| \| \| \|	llvm-svn: 247220
*	[WinEH] Add codegen support for cleanuppad and cleanupret	Reid Kleckner	2015-09-10	9	-62/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All of the complexity is in cleanupret, and it mostly follows the same codepaths as catchret, except it doesn't take a return value in RAX. This small example now compiles and executes successfully on win32: extern "C" int printf(const char *, ...) noexcept; struct Dtor { ~Dtor() { printf("~Dtor\n"); } }; void has_cleanup() { Dtor o; throw 42; } int main() { try { has_cleanup(); } catch (int) { printf("caught it\n"); } } Don't try to put the cleanup in the same function as the catch, or Bad Things will happen. llvm-svn: 247219
*	[RewriteStatepointsForGC] Further style/naming fixup [NFCI]	Philip Reames	2015-09-10	1	-26/+26
\| \| \| \|	llvm-svn: 247217
*	Fix Clang-tidy misc-use-override warnings, other minor fixes	Hans Wennborg	2015-09-10	4	-58/+58
\| \| \| \| \| \| \| \|	Patch by Eugene Zelenko! Differential Revision: http://reviews.llvm.org/D12740 llvm-svn: 247216
*	[RewriteStatepointsForGC] More naming cleanup [NFCI]	Philip Reames	2015-09-10	1	-6/+6
\| \| \| \|	llvm-svn: 247213
*	[RewriteStatepointsForGC] Code cleanup [NFC]	Philip Reames	2015-09-09	1	-25/+26
\| \| \| \| \| \|	Factor out common code related to naming values, fix a small style issue. More to follow in separate changes. llvm-svn: 247211
*	[RewriteStatepointsForGC] Extend base pointer inference to handle insertelement	Philip Reames	2015-09-09	1	-58/+61
\| \| \| \| \| \| \| \| \| \| \| \|	This change is simply enhancing the existing inference algorithm to handle insertelement instructions by conservatively inserting a new instruction to propagate the vector of associated base pointers. In the process, I'm ripping out the peephole optimizations which mostly helped cover the fact this hadn't been done. Note that most of the newly inserted nodes will be nearly immediately removed by the post insertion optimization pass introduced in 246718. Arguably, we should be trying harder to avoid the malloc traffic here, but I'd rather get the code correct, then worry about compile time. Unlike previous extensions of the algorithm to handle more cases, I discovered the existing code was causing miscompiles in some cases. In particular, we had an implicit assumption that the peephole covered all insert element instructions, so if we had a value directly based on an insert element the peephole didn't cover, we proceeded as if it were a base anyways. Not good. I believe we had the same issue with shufflevector which is why I adjusted the predicate for them as well. Differential Revision: http://reviews.llvm.org/D12583 llvm-svn: 247210
*	[RewriteStatepointsForGC] Make base pointer inference deterministic	Philip Reames	2015-09-09	1	-44/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the base pointer algorithm wasn't deterministic. The core fixed point was (of course), but we were inserting new nodes and optimizing them in an order which was unspecified and variable. We'd somewhat hacked around this for testing by sorting by value name, but that doesn't solve the general determinism problem. Instead, we can use the order of traversal over the def/use graph to give us a single consistent ordering. Today, this is a DFS order, but the exact order doesn't matter provided it's deterministic for a given input. (Q: It is safe to rely on a deterministic order of operands right?) Note that this only fixes the determinism within a single inference step. The inference step is currently invoked many times in a non-deterministic order. That's a future change in the sequence. :) Differential Revision: http://reviews.llvm.org/D12640 llvm-svn: 247208
*	LowerBitSets: Fix non-determinism bug.	Peter Collingbourne	2015-09-09	1	-4/+22
\| \| \| \| \| \| \| \|	Visit disjoint sets in a deterministic order based on the maximum BitSetNM index, otherwise the order in which we visit them will depend on pointer comparisons. This was being exposed by MSan. llvm-svn: 247201
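The ordering idea in this message can be illustrated with a minimal C++ sketch. It is not the LowerBitSets code: `DisjointSet`, `maxIndex`, and `sortDeterministically` are hypothetical names, and each set is reduced to a list of member indices. The point is only that sorting on a key derived from the input (the maximum member index) gives the same visiting order on every run, whereas an order derived from pointer values can change with allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one disjoint set: the indices of its members.
struct DisjointSet {
  std::vector<std::size_t> MemberIndices;
};

// Largest member index in a set; used as a pointer-independent sort key.
std::size_t maxIndex(const DisjointSet &S) {
  assert(!S.MemberIndices.empty() && "a set must have at least one member");
  return *std::max_element(S.MemberIndices.begin(), S.MemberIndices.end());
}

// Order the sets by their maximum member index so that the visiting order
// depends only on the input, not on where the objects were allocated.
// Because the sets are disjoint, no two sets share a maximum index, so the
// resulting order is unique.
void sortDeterministically(std::vector<DisjointSet> &Sets) {
  std::sort(Sets.begin(), Sets.end(),
            [](const DisjointSet &A, const DisjointSet &B) {
              return maxIndex(A) < maxIndex(B);
            });
}
```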
*	[SEH] Emit 32-bit SEH tables for the new EH IR	Reid Kleckner	2015-09-09	7	-98/+279
\| \| \| \| \| \| \| \| \| \| \|	The 32-bit tables don't actually contain PC range data, so emitting them is incredibly simple. The 64-bit tables, on the other hand, use the same table for state numbering as well as label ranges. This makes things more difficult, so it will be implemented later. llvm-svn: 247192
*	ScalarEvolution assume hanging bugfix	Piotr Padlewski	2015-09-09	1	-13/+13
\| \| \| \| \| \|	http://reviews.llvm.org/D12719 llvm-svn: 247184
*	Revert trunc(lshr (sext A), Cst) to ashr A, Cst	David Majnemer	2015-09-09	1	-20/+0
\| \| \| \| \| \|	This reverts commit r246997, it introduced a regression (PR24763). llvm-svn: 247180
*	Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4 ↵	Renato Golin	2015-09-09	1	-109/+52
\| \| \| \| \| \| \| \|	,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding." This reverts commit r247149, as it was breaking numerous buildbots of varied architectures. llvm-svn: 247177
*	Save LaneMask with livein registers	Matthias Braun	2015-09-09	22	-79/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With subregister liveness enabled we can detect the case where only parts of a register are live in; this is expressed as a 32-bit lanemask. The current code only keeps registers in the live-in list and therefore enumerated all subregisters affected by the lanemask. This turned out to be too conservative as the subregister may also cover additional parts of the lanemask which are not live. Expressing a given lanemask by enumerating a minimum set of subregisters is computationally expensive so the best solution is to simply change the live-in list to store the lanemasks as well. This will reduce memory usage for targets using subregister liveness and slightly increase it for other targets. Differential Revision: http://reviews.llvm.org/D12442 llvm-svn: 247171
*	VirtRegMap: Improve addMBBLiveIns() using SlotIndex::MBBIndexIterator; NFC	Matthias Braun	2015-09-09	1	-25/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have an explicit iterator over the idx2MBBMap in SlotIndexes, we can use the fact that the segments and the idx2MBBMap are both sorted by SlotIndex position, so we can advance both simultaneously instead of starting from the beginning for each segment. This complicates the code for the subregister case somewhat, but it should be more efficient and has the advantage that we get the final lanemask for each block immediately, which will be important for a subsequent change. Removes the now unused SlotIndexes::findMBBLiveIns function. Differential Revision: http://reviews.llvm.org/D12443 llvm-svn: 247170
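The simultaneous advance over two sorted sequences can be sketched as follows. This is a simplified illustration, not the VirtRegMap code: `Range` and `liveInBlocks` are hypothetical, positions are plain unsigned integers instead of SlotIndex values, and a block is treated as live-in when its start position falls inside a half-open segment [Start, End). Both inputs are assumed to be sorted, and the segments are assumed not to overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplification of a live segment: half-open range [Start, End).
struct Range {
  unsigned Start;
  unsigned End;
};

// Return the positions (in BlockStarts) of blocks whose start lies inside
// some segment. Both inputs must be sorted by position; the block cursor B
// only ever moves forward, so the total work is linear in the two sizes
// rather than restarting the block search for every segment.
std::vector<std::size_t> liveInBlocks(const std::vector<Range> &Segments,
                                      const std::vector<unsigned> &BlockStarts) {
  std::vector<std::size_t> Result;
  std::size_t B = 0;
  for (const Range &Seg : Segments) {
    assert(Seg.Start <= Seg.End && "malformed segment");
    // Blocks starting before this segment cannot be covered by it or by
    // any later segment, so skip them for good.
    while (B < BlockStarts.size() && BlockStarts[B] < Seg.Start)
      ++B;
    // Blocks starting inside this segment are live-in.
    while (B < BlockStarts.size() && BlockStarts[B] < Seg.End) {
      Result.push_back(B);
      ++B;
    }
  }
  return Result;
}
```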
*	[PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible	Chandler Carruth	2015-09-09	73	-1258/+1230
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	with the new pass manager, and no longer relying on analysis groups. This builds essentially a ground-up new AA infrastructure stack for LLVM. The core ideas are the same that are used throughout the new pass manager: type erased polymorphism and direct composition. The design is as follows: - FunctionAAResults is a type-erasing alias analysis results aggregation interface to walk a single query across a range of results from different alias analyses. Currently this is function-specific as we always assume that aliasing queries are within a function. - AAResultBase is a CRTP utility providing stub implementations of various parts of the alias analysis result concept, notably in several cases in terms of other more general parts of the interface. This can be used to implement only a narrow part of the interface rather than the entire interface. This isn't really ideal, this logic should be hoisted into FunctionAAResults as currently it will cause a significant amount of redundant work, but it faithfully models the behavior of the prior infrastructure. - All the alias analysis passes are ported to be wrapper passes for the legacy PM and new-style analysis passes for the new PM with a shared result object. In some cases (most notably CFL), this is an extremely naive approach that we should revisit when we can specialize for the new pass manager. - BasicAA has been restructured to reflect that it is much more fundamentally a function analysis because it uses dominator trees and loop info that need to be constructed for each function. All of the references to getting alias analysis results have been updated to use the new aggregation interface. 
All the preservation and other pass management code has been updated accordingly. The way the FunctionAAResultsWrapperPass works is to detect the available alias analyses when run, and add them to the results object. This means that we should be able to continue to respect when various passes are added to the pipeline, for example adding CFL or adding TBAA passes should just cause their results to be available and to get folded into this. The exception to this rule is BasicAA which really needs to be a function pass due to using dominator trees and loop info. As a consequence, the FunctionAAResultsWrapperPass directly depends on BasicAA and always includes it in the aggregation. This has significant implications for preserving analyses. Generally, most passes shouldn't bother preserving FunctionAAResultsWrapperPass because rebuilding the results just updates the set of known AA passes. The exception to this rule are LoopPass instances which need to preserve all the function analyses that the loop pass manager will end up needing. This means preserving both BasicAAWrapperPass and the aggregating FunctionAAResultsWrapperPass. Now, when preserving an alias analysis, you do so by directly preserving that analysis. This is only necessary for non-immutable-pass-provided alias analyses though, and there are only three of interest: BasicAA, GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is preserved when needed because it (like DominatorTree and LoopInfo) is marked as a CFG-only pass. I've expanded GlobalsAA into the preserved set everywhere we previously were preserving all of AliasAnalysis, and I've added SCEVAA in the intersection of that with where we preserve SCEV itself. 
One significant challenge to all of this is that the CGSCC passes were actually using the alias analysis implementations by taking advantage of a pretty amazing set of loopholes in the old pass manager's analysis management code which allowed analysis groups to slide through in many cases. Moving away from analysis groups makes this problem much more obvious. To fix it, I've leveraged the flexibility the design of the new PM components provides to just directly construct the relevant alias analyses for the relevant functions in the IPO passes that need them. This is a bit hacky, but should go away with the new pass manager, and is already in many ways cleaner than the prior state. Another significant challenge is that various facilities of the old alias analysis infrastructure just don't fit any more. The most significant of these is the alias analysis 'counter' pass. That pass relied on the ability to snoop on AA queries at different points in the analysis group chain. Instead, I'm planning to build printing functionality directly into the aggregation layer. I've not included that in this patch merely to keep it smaller. Note that all of this needs a nearly complete rewrite of the AA documentation. I'm planning to do that, but I'd like to make sure the new design settles, and to flesh out a bit more of what it looks like in the new pass manager first. Differential Revision: http://reviews.llvm.org/D12080 llvm-svn: 247167
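The two design ideas named in this message, a CRTP base that supplies stub implementations and a type-erasing aggregation that walks one query across several results, can be sketched in a few lines of C++. Everything here is a toy under stated assumptions: `ResultBase`, `DistinctIdAA`, `NullAA`, and `Aggregate` are hypothetical names, memory locations are plain integers, and none of the signatures are taken from LLVM.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

enum class AliasResult { NoAlias, MayAlias };

// CRTP base: supplies a conservative stub for alias() and implements the
// narrower isNoAlias() query in terms of the more general alias() query.
template <typename Derived> struct ResultBase {
  AliasResult alias(int, int) { return AliasResult::MayAlias; }
  bool isNoAlias(int A, int B) {
    return static_cast<Derived *>(this)->alias(A, B) == AliasResult::NoAlias;
  }
};

// Toy analysis that overrides only alias(): distinct ids never alias.
struct DistinctIdAA : ResultBase<DistinctIdAA> {
  AliasResult alias(int A, int B) {
    return A != B ? AliasResult::NoAlias : AliasResult::MayAlias;
  }
};

// Toy analysis that overrides nothing and inherits the stubs.
struct NullAA : ResultBase<NullAA> {};

// Type-erased aggregation: stores unrelated result types behind one
// internal interface and asks each in turn until one gives a precise answer.
class Aggregate {
  struct Concept {
    virtual ~Concept() = default;
    virtual AliasResult alias(int A, int B) = 0;
  };
  template <typename T> struct Model final : Concept {
    explicit Model(T R) : Result(std::move(R)) {}
    AliasResult alias(int A, int B) override { return Result.alias(A, B); }
    T Result;
  };
  std::vector<std::unique_ptr<Concept>> Results;

public:
  template <typename T> void add(T R) {
    Results.push_back(std::make_unique<Model<T>>(std::move(R)));
  }
  AliasResult alias(int A, int B) {
    // Assumption of this sketch: at least one analysis is always registered.
    assert(!Results.empty() && "expected at least one analysis");
    for (auto &R : Results)
      if (R->alias(A, B) == AliasResult::NoAlias)
        return AliasResult::NoAlias;
    return AliasResult::MayAlias;
  }
};
```

The analyses share no virtual base class; only the private `Concept` inside `Aggregate` is polymorphic, which is what lets result types be composed directly without a common hierarchy.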
*	MachineVerifier: Check that SlotIndex MBBIndexList is sorted.	Matthias Braun	2015-09-09	1	-0/+17
\| \| \| \| \| \| \|	This introduces a check that the MBBIndexList is sorted as proposed in http://reviews.llvm.org/D12443 but split up into a separate commit. llvm-svn: 247166
*	AMDGPU: Extract full 64-bit subregister and use subregs	Matt Arsenault	2015-09-09	1	-35/+29
\| \| \| \| \| \| \| \| \| \| \| \|	Instead of extracting both 32-bit components from the 128-bit register. This produces fewer copies and is easier for the copy peephole optimizer to understand and see the actual uses as extracts from a reg_sequence. This avoids needing to handle subregister composing in the PeepholeOptimizer's ValueTracker for this case. llvm-svn: 247162
*	AMDGPU: Remove unused multiclass argument	Matt Arsenault	2015-09-09	1	-5/+4
\| \| \| \|	llvm-svn: 247161
*	[WebAssembly] Implement calls with void return types.	Dan Gohman	2015-09-09	4	-8/+17
\| \| \| \|	llvm-svn: 247158
*	AMDGPU/SI: Fold operands through REG_SEQUENCE instructions	Tom Stellard	2015-09-09	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This helps mostly when we use add instructions for address calculations that contain immediates. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12256 llvm-svn: 247157
*	[CostModel][AArch64] Remove amortization factor for some of the vector ↵	Silviu Baranga	2015-09-09	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	select instructions Summary: We are not scalarizing the wide selects in codegen for i16 and i32 and therefore we can remove the amortization factor. We still have issues with i64 vectors in codegen though. Reviewers: mcrosier Subscribers: mcrosier, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12724 llvm-svn: 247156
*	don't repeat function names in comments; NFC	Sanjay Patel	2015-09-09	4	-33/+27
\| \| \| \|	llvm-svn: 247154
*	[WebAssembly] Tidy up some unneeded newline characters.	Dan Gohman	2015-09-09	1	-10/+9
\| \| \| \|	llvm-svn: 247152
*	function names start with a lower case letter; NFC	Sanjay Patel	2015-09-09	1	-54/+54
\| \| \| \|	llvm-svn: 247150
*	AVX512: Implemented encoding and intrinsics for	Igor Breger	2015-09-09	1	-52/+109
\| \| \| \| \| \| \| \| \|	vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247149
*	don't repeat function names in comments; NFC	Sanjay Patel	2015-09-09	1	-35/+32
\| \| \| \|	llvm-svn: 247148
*	[mips][microMIPS] Implement ADDU16, AND16, ANDI16, NOT16, OR16, SLL16 and ↵	Zoran Jovanovic	2015-09-09	5	-11/+120
\| \| \| \| \| \| \| \|	SRL16 instructions Differential Revision: http://reviews.llvm.org/D11178 llvm-svn: 247146
*	Fix PR 24633 - Handle undef values when parsing standalone constants.	Alex Lorenz	2015-09-09	1	-0/+1
\| \| \| \|	llvm-svn: 247145
*	Rename ExitCount to BackedgeTakenCount, because that's what it is.	James Molloy	2015-09-09	1	-8/+9
\| \| \| \| \| \|	We called a variable ExitCount, stored the backedge count in it, then redefined it to be the exit count again. llvm-svn: 247140
*	Delay predication of stores until near the end of vector code generation	James Molloy	2015-09-09	1	-56/+28
\| \| \| \| \| \| \| \|	Predicating stores requires creating extra blocks. It's much cleaner if we do this in one pass instead of mutating the CFG while writing vector instructions. Besides which we can make use of helper functions to update domtree for us, reducing the work we need to do. llvm-svn: 247139
*	Fix vector splitting for extract_vector_elt and vector elements of <8-bits.	Daniel Sanders	2015-09-09	2	-2/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: One of the vector splitting paths for extract_vector_elt tries to lower: define i1 @via_stack_bug(i8 signext %idx) { %1 = extractelement <2 x i1> <i1 false, i1 true>, i8 %idx ret i1 %1 } to: define i1 @via_stack_bug(i8 signext %idx) { %base = alloca <2 x i1> store <2 x i1> <i1 false, i1 true>, <2 x i1>* %base %2 = getelementptr <2 x i1>, <2 x i1>* %base, i32 %idx %3 = load i1, i1* %2 ret i1 %3 } However, the elements of <2 x i1> are not byte-addressable. The result of this is that the getelementptr expands to '%base + %idx * (1 / 8)' which simplifies to '%base + %idx * 0', and then simply '%base' causing all values of %idx to extract element zero. This commit fixes this by promoting the vector elements of <8-bits to i8 before splitting the vector. This fixes a number of test failures in pocl. Reviewers: pekka.jaaskelainen Subscribers: pekka.jaaskelainen, llvm-commits Differential Revision: http://reviews.llvm.org/D12591 llvm-svn: 247128
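The address arithmetic described in this message can be reproduced in isolation. The following C++ sketch is only an illustration of the integer division involved; `elementByteOffset` and `promotedElemBits` are hypothetical helpers, not functions from the SelectionDAG legalizer.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of element Idx when the element size is computed in whole
// bytes, mirroring the '%idx * (1 / 8)' expansion from the message.
std::uint64_t elementByteOffset(unsigned ElemBits, std::uint64_t Idx) {
  std::uint64_t ElemBytes = ElemBits / 8; // integer division: 1 / 8 == 0
  return Idx * ElemBytes;
}

// Hypothetical helper: element width after promoting sub-byte elements to
// 8 bits, which is the shape of the fix the message describes.
unsigned promotedElemBits(unsigned ElemBits) {
  assert(ElemBits > 0 && "element width must be non-zero");
  return ElemBits < 8 ? 8 : ElemBits;
}
```

With 1-bit elements, `elementByteOffset(1, Idx)` is 0 for every index, so every extract reads element zero. After promotion, `elementByteOffset(promotedElemBits(1), Idx)` equals `Idx`, so each element gets its own byte.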
*	Fix a typo I spotted when hacking on SROA. Somewhat alarming that	Chandler Carruth	2015-09-09	1	-1/+1
\| \| \| \| \| \|	nothing broke. llvm-svn: 247127
*	[mips][microMIPS] Implement CACHEE and PREFE instructions	Zoran Jovanovic	2015-09-09	4	-8/+53
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D11628 llvm-svn: 247125
*	AMDGPU: Fix not encoding src2 of VOP3b instructions	Matt Arsenault	2015-09-09	1	-4/+4
\| \| \| \| \| \| \|	Broken by r247074. Should include an assembler test, but the assembler is currently broken for VOP3b apparently. llvm-svn: 247123
*	[IRCE] Add INITIALIZE_PASS_DEPENDENCY invocations.	Sanjoy Das	2015-09-09	1	-2/+9
\| \| \| \| \| \|	IRCE was just using INITIALIZE_PASS(), which is incorrect. llvm-svn: 247122
*	[RuntimeDyld] Add support for MachO x86_64 SUBTRACTOR relocation.	Lang Hames	2015-09-09	1	-1/+50
\| \| \| \|	llvm-svn: 247119
*	[WebAssembly] Fix lowering of calls with more than one argument.	Dan Gohman	2015-09-09	1	-2/+5
\| \| \| \|	llvm-svn: 247118
*	SelectionDAG: Support Expand of f16 extloads	Matt Arsenault	2015-09-09	3	-30/+27
\| \| \| \| \| \| \| \| \| \|	Currently this hits an assert that extload should always be supported, which assumes integer extloads. This moves a hack out of SI's argument lowering and is covered by existing tests. llvm-svn: 247113
*	[WebAssembly] Implement WebAssemblyInstrInfo::copyPhysReg	Dan Gohman	2015-09-09	3	-22/+38
\| \| \| \|	llvm-svn: 247110