bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Do not verify dominator tree if it has no roots	Serge Pavlov	2017-01-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	If the dominator tree has no roots, the pass that calculates it is likely to have been skipped. This occurs, for instance, for entities with available_externally linkage. Do not run tree verification in such a case. Differential Revision: https://reviews.llvm.org/D28767 llvm-svn: 293033
*	Implemented color coding and Vertex labels in XRay Graph	Dean Michael Berris	2017-01-25	4	-2/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: A patch to enable the llvm-xray graph subcommand to color edges and vertices based on statistics and to annotate vertices with statistics. Depends on D27243 Reviewers: dblaikie, dberris Reviewed By: dberris Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D28225 llvm-svn: 293031
*	[X86]Enable the use of 'mov' with a 64bit GPR and a large immediate	Coby Tayree	2017-01-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	Enable the following form (Intel style): "mov <reg64>, <largeImm>", which should be available, where <largeImm> stands for immediates that exceed the range of a signed 32-bit integer. Differential Revision: https://reviews.llvm.org/D28988 llvm-svn: 293030
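The range test this message describes can be sketched as follows. This is only an illustration of "exceeds the range of a signed 32-bit integer"; the function names are hypothetical and are not taken from the X86 assembly parser, and the sketch assumes the immediate is first normalized to a signed 64-bit value.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def to_signed_64(imm: int) -> int:
    """Reinterpret a 64-bit immediate (signed or unsigned view) as signed."""
    if not -(1 << 63) <= imm < (1 << 64):
        raise ValueError("immediate does not fit in 64 bits")
    return imm - (1 << 64) if imm >= (1 << 63) else imm


def is_large_imm(imm: int) -> bool:
    """True when imm lies outside the signed 32-bit range (hypothetical helper)."""
    value = to_signed_64(imm)
    return not (INT32_MIN <= value <= INT32_MAX)
```

Under these assumptions, `is_large_imm(0x7FFFFFFF)` is false while `is_large_imm(0x80000000)` is true, so only the second value would need the large-immediate form.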
*	AMDGPU: Check nsz instead of unsafe math	Matt Arsenault	2017-01-25	2	-3/+3
\| \| \| \|	llvm-svn: 293028
*	[SimplifyCFG] Do not sink and merge inline-asm instructions.	Akira Hatanaka	2017-01-25	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conservatively disable sinking and merging inline-asm instructions as doing so can potentially create arguments that cannot satisfy the inline-asm constraints. For example, SimplifyCFG used to do the following transformation:
	(before)
	if.then:
	  %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8)
	  br label %if.end
	if.else:
	  %1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6)
	  br label %if.end
	(after)
	  %.sink = select i1 %tobool, i32 6, i32 8
	  %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink)
	This would result in a crash in the backend since only immediate integer operands are permitted for constraint "n". rdar://problem/30110806 Differential Revision: https://reviews.llvm.org/D29111 llvm-svn: 293025
*	DAG: Recognize no-signed-zeros-fp-math attribute	Matt Arsenault	2017-01-25	3	-1/+81
\| \| \| \| \| \| \| \|	clang already emits this with -cl-no-signed-zeros, but codegen doesn't do anything with it. Treat it like the other fast math attributes, and change one place to use it. llvm-svn: 293024
*	Ignore llvm/test/tools/llvm-symbolizer/coff-exports.test on mingw.	NAKAMURA Takumi	2017-01-25	1	-1/+3
\| \| \| \| \| \| \|	FIXME: The demangler could follow the target rather than the host. For example, assume host=mingw, target=msc. llvm-svn: 293021
*	DAGCombiner: Allow negating ConstantFP after legalize	Matt Arsenault	2017-01-25	1	-2/+1
\| \| \| \|	llvm-svn: 293019
*	[InstCombine] Added regression test to narrow-switch.ll	Gerolf Hoflehner	2017-01-25	1	-0/+42
\| \| \| \|	llvm-svn: 293018
*	AMDGPU: Implement early ifcvt target hooks.	Matt Arsenault	2017-01-25	4	-2/+567
\| \| \| \| \| \| \| \| \| \| \| \|	Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out of order CPUs which we do not want so it skips out on many cases we want. llvm-svn: 293016
*	gold-plugin: Add the file path to the file open error diagnostic.	Peter Collingbourne	2017-01-25	1	-0/+8
\| \| \| \|	llvm-svn: 293013
*	[PM] Teach LoopUnroll to update the LPM infrastructure as it unrolls	Chandler Carruth	2017-01-25	17	-1/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	loops. We do this by reconstructing the newly added loops after the unroll completes to avoid threading pass manager details through all the mess of the unrolling infrastructure. I've enabled some extra assertions in the LPM to try and catch issues here and enabled a bunch of unroller tests to try and make sure this is sane. Currently, I'm manually running loop-simplify when needed. That should go away once it is folded into the LPM infrastructure. Differential Revision: https://reviews.llvm.org/D28848 llvm-svn: 293011
*	[GlobalISel] Generate selector for more integer binop patterns.	Ahmed Bougacha	2017-01-25	1	-2/+2
\| \| \| \| \| \| \| \|	This surprisingly isn't NFC because there are patterns to select GPR sub to SUBSWrr (rather than SUBWrr/rs); SUBS is later optimized to SUB if NZCV is dead. From ISel's perspective, both are fine. llvm-svn: 293010
*	[coroutines] Spill the result of the invoke instruction correctly	Gor Nishanov	2017-01-25	1	-0/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When we decide that the result of the invoke instruction needs to be spilled, we need to insert the spill into a block that is on the normal edge coming out of the invoke instruction. (Prior to this change the code would insert the spill immediately after the invoke instruction, which breaks the IR, since invoke is a terminator instruction.) In the following example, we will split the edge going into %cont and insert the spill there.
	```
	  %r = invoke double @print(double 0.0)
	          to label %cont unwind label %pad
	cont:
	  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
	  switch i8 %0, label %suspend [i8 0, label %resume
	                                i8 1, label %cleanup]
	resume:
	  call double @print(double %r)
	```
	Reviewers: majnemer Reviewed By: majnemer Subscribers: mehdi_amini, llvm-commits, EricWF Differential Revision: https://reviews.llvm.org/D29102 llvm-svn: 293006
*	GlobalISel: Use the correct types when translating landingpad instructions	Justin Bogner	2017-01-25	2	-2/+46
\| \| \| \| \| \| \| \| \| \| \|	There was a bug here where we were using p0 instead of s32 for the selector type in the landingpad. Instead of hardcoding these types we should get the types from the landingpad instruction directly. Note that we replicate an assert from SDAG here to only support two-valued landingpads. llvm-svn: 292995
*	Fix llvm-objdump so it picks a good CPU based for Mach-O files	Kevin Enderby	2017-01-24	2	-0/+3
\| \| \| \| \| \| \| \| \| \| \|	for CPU_SUBTYPE_ARM_V7S and CPU_SUBTYPE_ARM_V7K. For these two cpusubtypes they should default to a cortex-a7 CPU to give proper disassembly without a -mcpu= flag. rdar://27431703 llvm-svn: 292993
*	AMDGPU: Remove spurious out branches after a kill	Matt Arsenault	2017-01-24	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A sequence like this:
	  v_cmpx_le_f32_e32 vcc, 0, v0
	  s_branch BB0_30
	  s_cbranch_execnz BB0_30
	; BB#29:
	  exp null off, off, off, off done vm
	  s_endpgm
	BB0_30: ; %endif110
	is likely wrong. The s_branch instruction will unconditionally jump to BB0_30, and the skip block (exp done + endpgm) inserted for performing the kill instruction will never be executed. This results in a GPU hang with Star Ruler 2. The s_branch instruction is added during the "Control Flow Optimizer" pass, which seems to re-organize the basic blocks, and we assume that SI_KILL_TERMINATOR is always the last instruction inside a basic block. Thus, after inserting a skip block we just go to the next BB without looking at the subsequent instructions after the kill, and the s_branch op is never removed. Instead, we should remove the unconditional out branches and let the s_cbranch_execnz skip the two instructions if the exec mask is non-zero. This patch fixes the GPU hang and doesn't introduce any regressions with "make check". Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019 Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 292985
*	Revert rL292621. Caused some internal build bot failures in apple.	Wei Mi	2017-01-24	3	-454/+0
\| \| \| \|	llvm-svn: 292984
*	Enable FeatureFlatForGlobal on Volcanic Islands	Matt Arsenault	2017-01-24	273	-377/+374
\| \| \| \| \| \| \| \| \| \| \|	This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982
*	Explicitly promote indirect calls before sample profile annotation.	Dehao Chen	2017-01-24	2	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation. Reviewers: xur, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29040 llvm-svn: 292979
*	Revert [AMDGPU][mc][tests][NFC] Add coverage/smoke tests for Gfx7 and Gfx8.	Ivan Krasin	2017-01-24	3	-241206/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Reason: broke ASAN bots with a global buffer overflow. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/2291 Each test contains 20-30K test cases but takes only several (from 4 to 10) seconds to complete on an average machine. The tests cover the majority of AMDGPU Gfx7/Gfx8 instructions, including many dark corners, and are intended to quickly find out if something is broken. llvm-svn: 292974
*	Remove the load hoisting code of MLSM, it is completely subsumed by GVNHoist	Daniel Berlin	2017-01-24	2	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: GVNHoist performs all the optimizations that MLSM does to loads, in a more general way, and in a faster time bound (MLSM is N^3 in most cases, N^4 in a few edge cases). This disables the load portion. Note that the way ld_hoist_st_sink.ll is written makes one think that the loads should be moved to the while.preheader block, but
	1. Neither MLSM nor GVNHoist does it (they both move them to identical places).
	2. MLSM couldn't possibly do it anyway, as the while.preheader block is not the head of the diamond, while.body is. (GVNHoist could do it if it was legal.)
	3. At a glance, it's not legal anyway because the in-loop loads conflict with the in-loop store, so the loads must stay in-loop.
	I am happy to update the test to use update_test_checks so that checking is tighter; I was just going to do it as a followup. Note that I can find no particular benefit to the store portion on any real testcase/benchmark I have (even size-wise). If we really still want it, I am happy to commit to writing a targeted store sinker, just taking the code from the MemorySSA port of MergedLoadStoreMotion (which is N^2 worst case, and N most of the time). We can do what it does in a much better time bound. We also should be both hoisting and sinking stores, not just sinking them, anyway, since whether we should hoist or sink to merge depends basically on luck of the draw of where the blockers are placed. Nonetheless, I have left it alone for now. Reviewers: chandlerc, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29079 llvm-svn: 292971
*	AMDGPU/SI: Give up in promote alloca when a pointer may be captured.	Changpeng Fang	2017-01-24	1	-0/+47
\| \| \| \| \| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D28970 Reviewer: Matt llvm-svn: 292966
*	[AMDGPU] Add VGPR copies post regalloc fix pass	Stanislav Mekhanoshin	2017-01-24	1	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Regalloc creates COPY instructions which do not formally use VALU. That results in v_mov instructions being displaced after an exec mask modification. One pass which does this is SIOptimizeExecMasking, but potentially it can be done by other passes too. This patch adds a pass immediately after regalloc to add an implicit exec use operand to all VGPR copy instructions. Differential Revision: https://reviews.llvm.org/D28874 llvm-svn: 292956
*	[AArch64] Rename 'no-quad-ldst-pairs' to 'slow-paired-128'	Evandro Menezes	2017-01-24	1	-1/+1
\| \| \| \| \| \| \|	In order to follow the pattern of the existing 'slow-misaligned-128store' option, rename the option 'no-quad-ldst-pairs' to 'slow-paired-128'. llvm-svn: 292954
*	[InstSimplify] try to eliminate icmp Pred (add nsw X, C1), C2	Sanjay Patel	2017-01-24	1	-21/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I was surprised to see that we're missing icmp folds based on 'add nsw' in InstCombine, but we should handle the InstSimplify cases first because that could make the InstCombine code simpler. Here are Alive-based proofs for the logic:
	Name: add_neg_constant
	Pre: C1 < 0 && (C2 > ((1<<(width(C1)-1)) + C1))
	%a = add nsw i7 %x, C1
	%b = icmp sgt %a, C2
	  =>
	%b = false
	Name: add_pos_constant
	Pre: C1 > 0 && (C2 < ((1<<(width(C1)-1)) + C1 - 1))
	%a = add nsw i6 %x, C1
	%b = icmp slt %a, C2
	  =>
	%b = false
	Name: nuw
	Pre: C1 u>= C2
	%a = add nuw i11 %x, C1
	%b = icmp ult %a, C2
	  =>
	%b = false
	Differential Revision: https://reviews.llvm.org/D29053 llvm-svn: 292952
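The three folds above can also be spot-checked by brute force at small bit widths. The sketch below is not Alive and is no substitute for its proofs; it assumes the preconditions are evaluated in wrapping fixed-width arithmetic with signed `<` and `>`, and it treats an overflowing `nsw`/`nuw` add as poison that is simply skipped. All helper names are hypothetical.

```python
def to_signed(value, width):
    """Wrap value to `width` bits and reinterpret it as a signed integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >= (1 << (width - 1)) else value


def signed_range(width):
    """All signed values representable in `width` bits."""
    return range(-(1 << (width - 1)), 1 << (width - 1))


def add_nsw(x, c, width):
    """Return x + c, or None when the signed add overflows (poison)."""
    s = x + c
    return s if s in signed_range(width) else None


def check_add_neg_constant(width):
    """C1 < 0 and C2 > SIGNBIT + C1  implies  (x +nsw C1) sgt C2 is false."""
    sign_bit = 1 << (width - 1)
    for c1 in signed_range(width):
        for c2 in signed_range(width):
            if not (c1 < 0 and c2 > to_signed(sign_bit + c1, width)):
                continue
            for x in signed_range(width):
                a = add_nsw(x, c1, width)
                if a is not None and a > c2:
                    return False
    return True


def check_add_pos_constant(width):
    """C1 > 0 and C2 < SIGNBIT + C1 - 1  implies  (x +nsw C1) slt C2 is false."""
    sign_bit = 1 << (width - 1)
    for c1 in signed_range(width):
        for c2 in signed_range(width):
            if not (c1 > 0 and c2 < to_signed(sign_bit + c1 - 1, width)):
                continue
            for x in signed_range(width):
                a = add_nsw(x, c1, width)
                if a is not None and a < c2:
                    return False
    return True


def check_nuw(width):
    """C1 u>= C2  implies  (x +nuw C1) ult C2 is false."""
    limit = 1 << width
    for c1 in range(limit):
        for c2 in range(c1 + 1):  # every C2 with C1 u>= C2
            for x in range(limit):
                a = x + c1
                if a < limit and a < c2:  # a >= limit would be nuw overflow
                    return False
    return True
```

Each check returns `True` only if no counterexample exists at the given width, so calling them for a handful of small widths gives a quick sanity check of the preconditions.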
*	[X86][AVX2] Regenerate test.	Simon Pilgrim	2017-01-24	1	-2/+1
\| \| \| \|	llvm-svn: 292950
*	[CodeView] Fix off-by-one error in def range gap emission	Reid Kleckner	2017-01-24	1	-0/+24
\| \| \| \| \| \| \| \| \|	Also fixes a much worse bug where we emitted the wrong gap size for the def range uncovered by the test for this issue. Fixes PR31726. llvm-svn: 292949
*	[X86][AVX2] Removed FIXME comment and regenerated test.	Simon Pilgrim	2017-01-24	1	-5/+1
\| \| \| \| \| \|	The comment talked about replacing vpmovzxwd+vpslld+vpsrad with vpmovsxwd - which isn't valid as we're sign extending a <8 x i1> bool vector not an all/nobits <8 x i16> llvm-svn: 292948
*	[X86][AVX2] Cleaned up test triple and regenerated tests.	Simon Pilgrim	2017-01-24	1	-14/+2
\| \| \| \|	llvm-svn: 292946
*	[SelectionDAG] Handle inverted conditions when splitting into multiple branches.	Geoff Berry	2017-01-24	3	-5/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When conditional branches with complex conditions are split into multiple branches in SelectionDAGBuilder::FindMergedConditions, also handle inverted conditions. These may sometimes appear without having been optimized by InstCombine when CodeGenPrepare decides to sink and duplicate cmp instructions, causing them to have only one use. This problem can be increased by e.g. GVNHoist hiding more cmps from InstCombine by combining equivalent cmps from different blocks. For example codegen X & !(Y | Z) as:
	    jmp_if_X TmpBB
	    jmp FBB
	  TmpBB:
	    jmp_if_notY Tmp2BB
	    jmp FBB
	  Tmp2BB:
	    jmp_if_notZ TBB
	    jmp FBB
	Reviewers: bogner, MatzeB, qcolombet Subscribers: llvm-commits, hiraditya, mcrosier, sebpop Differential Revision: https://reviews.llvm.org/D28380 llvm-svn: 292944
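The branch sequence in this message can be checked against the original condition with a small truth-table sketch. The function names are hypothetical; the code only models which block (TBB or FBB) is reached and says nothing about how FindMergedConditions is implemented.

```python
from itertools import product


def branch_sequence(x, y, z):
    """Follow the split branches and return the block that is reached."""
    # jmp_if_X TmpBB; jmp FBB
    if not x:
        return "FBB"
    # TmpBB: jmp_if_notY Tmp2BB; jmp FBB
    if y:
        return "FBB"
    # Tmp2BB: jmp_if_notZ TBB; jmp FBB
    if z:
        return "FBB"
    return "TBB"


def reference(x, y, z):
    """The unsplit condition X & !(Y | Z)."""
    return "TBB" if x and not (y or z) else "FBB"


def sequences_agree():
    """Compare both forms on all eight input combinations."""
    return all(
        branch_sequence(*bits) == reference(*bits)
        for bits in product([False, True], repeat=3)
    )
```

The inverted sub-condition !(Y | Z) becomes two chained "not" tests, which is why the true block is reached only when X holds and both Y and Z are false.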
*	[PH] Replace uses of AssertingVH from members of analysis results with	Chandler Carruth	2017-01-24	4	-10/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a lazy-asserting PoisoningVH. AssertingVH is fundamentally incompatible with cache-invalidation of analysis results. The invalidation happens after the AssertingVH has already fired. Instead, use a PoisoningVH that will assert if the dangling handle is ever used rather than merely be assigned or destroyed. This patch also removes all of the (numerous) doomed attempts to work around this fundamental incompatibility. It is a pretty significant simplification IMO. The most interesting change is in the Inliner where we still do some clearing because we don't want to rely on the coarse grained invalidation strategy of the containing pass manager. However, I prefer the approach that contains this logic to the cleanup phase of the Inliner, and I think we could enhance the CGSCC analysis management layer to make this even better in the future if desired. The rest is straight cleanup. I've also added a test for one of the harder cases to work around: when a module analysis contains many AssertingVHes pointing at functions. Differential Revision: https://reviews.llvm.org/D29006 llvm-svn: 292928
*	[AMDGPU][mc][tests][NFC] Add coverage/smoke tests for Gfx7 and Gfx8.	Artem Tamazov	2017-01-24	3	-0/+241206
\| \| \| \| \| \| \| \| \|	Each test contains 20-30K test cases but takes only several (from 4 to 10) seconds to complete on an average machine. The tests cover the majority of AMDGPU Gfx7/Gfx8 instructions, including many dark corners, and are intended to quickly find out if something is broken. llvm-svn: 292922
*	[X86][SSE] Add support for constant folding vector arithmetic shift by immediates	Simon Pilgrim	2017-01-24	3	-45/+38
\| \| \| \| \| \|	llvm-svn: 292919
*	[X86][SSE] Add support for constant folding vector logical shift by immediates	Simon Pilgrim	2017-01-24	12	-209/+166
\| \| \| \|	llvm-svn: 292915
*	[SLP] Additional test for checking that instruction with extra args is	Alexey Bataev	2017-01-24	1	-0/+57
\| \| \| \| \| \|	not reconstructed. llvm-svn: 292911
*	Update domtree incrementally in loop peeling.	Serge Pavlov	2017-01-24	1	-1/+1
\| \| \| \| \| \| \| \| \|	With this change dominator tree remains in sync after each step of loop peeling. Differential Revision: https://reviews.llvm.org/D29029 llvm-svn: 292895
*	AMDGPU : Add trap handler support.	Wei Ding	2017-01-24	1	-5/+4
\| \| \| \|	llvm-svn: 292893
*	[PM] Further fixes to the test case in r292863.	Chandler Carruth	2017-01-24	1	-1/+1
\| \| \| \| \| \|	This should hopefully fix the MSVC failures remaining. llvm-svn: 292887
*	[SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR	Craig Topper	2017-01-24	4	-301/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This teaches getNode to simplify extracting from Undef. This is similar to what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting from CONCAT_VECTOR when we can reuse one of the inputs to the concat. These seem like simple non-target specific optimizations. For X86 we currently handle undef in extractSubvector, but not all EXTRACT_SUBVECTOR creations go through there. Ultimately, my motivation here is to simplify extractSubvector and remove custom lowering for EXTRACT_SUBVECTOR since we don't do anything but handle undef and BUILD_VECTOR optimizations, but those should be DAG combines. Reviewers: RKSimon, delena Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29000 llvm-svn: 292876
*	[PM] Try to make all three compilers happy when it comes to pretty printing.	Davide Italiano	2017-01-24	1	-14/+11
\| \| \| \| \| \| \|	Modeled after a similar change from Michael Kuperstein. Let's hope this sticks together. llvm-svn: 292872
*	LiveIntervalAnalysis: Calculate liveness even if a superreg is reserved.	Matthias Braun	2017-01-24	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A register unit may be allocatable and non-reserved while some of the registers (tuples) built with it are reserved. We still need to calculate liveness in this case. Note to out-of-tree targets: if you start seeing machine verifier errors with this commit, it probably means that you do not properly mark super registers of reserved registers as reserved. See r292836 or r292870 for examples of how to fix that. rdar://29996737 Differential Revision: https://reviews.llvm.org/D28881 llvm-svn: 292871
*	[LTO] Add test to show that we don't support ThinLTO yet.	Davide Italiano	2017-01-24	1	-0/+13
\| \| \| \|	llvm-svn: 292865
*	[LTO] Teach lib/LTO about the new pass manager.	Davide Italiano	2017-01-24	1	-0/+4
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D28997 llvm-svn: 292864
*	[PM] Flesh out the new pass manager LTO pipeline.	Davide Italiano	2017-01-24	2	-7/+105
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D28996 llvm-svn: 292863
*	[sanitizer-coverage] emit __sanitizer_cov_trace_pc_guard w/o a preceding 'if' by default. Update the docs, also add deprecation notes around other parts of sanitizer coverage	Kostya Serebryany	2017-01-24	2	-3/+3
\| \| \| \| \| \|	llvm-svn: 292862
*	SimplifyLibCalls: Replace more unary libcalls with intrinsics	Matt Arsenault	2017-01-23	3	-123/+276
\| \| \| \|	llvm-svn: 292855
*	[LoopUnroll] First form LCSSA, then loop-simplify	Michael Kuperstein	2017-01-23	1	-0/+55
\| \| \| \| \| \| \| \| \| \| \| \| \|	Running non-LCSSA-preserving LoopSimplify followed by LCSSA on (roughly) the same loop is incorrect, since LoopSimplify may break LCSSA arbitrarily higher in the loop nest. Instead, run LCSSA first, and then run LCSSA-preserving LoopSimplify on the result. This fixes PR31718. Differential Revision: https://reviews.llvm.org/D29055 llvm-svn: 292854
*	AMDGPU: Custom lower more vector operations	Matt Arsenault	2017-01-23	4	-18/+513
\| \| \| \| \| \|	This avoids stack usage. llvm-svn: 292846
*	DAG: Don't fold vector extract into load if target doesn't want to	Matt Arsenault	2017-01-23	1	-0/+31
\| \| \| \| \| \| \|	Fixes turning a 32-bit scalar load into an extending vector load for AMDGPU when dynamically indexing a vector. llvm-svn: 292842