path: root/llvm/test/CodeGen
Commit message (Author, Age; Files changed, Lines -deleted/+added)
* Use shouldAssumeDSOLocal in classifyGlobalReference. (Rafael Espindola, 2017-01-26; 2 files, -4/+3)
  And teach shouldAssumeDSOLocal that ppc has no copy relocations. The resulting
  code handles a few more cases than before. For example, it knows that a weak
  symbol can be resolved to another .o file, but it will still be in the main
  executable.
  llvm-svn: 293180
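  A minimal IR sketch of the weak-symbol case described above (symbol and
  function names are illustrative, not from the patch):

      ; The definition may be overridden by another .o at link time, but in an
      ; executable the winning copy still lives in the main binary, so targets
      ; without copy relocations can still access it directly.
      @counter = weak global i32 0

      define i32 @read_counter() {
        %v = load i32, i32* @counter
        ret i32 %v
      }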
* [X86][SSE] Add support for combining ANDNP byte masks with target shuffles (Simon Pilgrim, 2017-01-26; 1 file, -33/+21)
  llvm-svn: 293178
* [AMDGPU] Fix typo in GCNSchedStrategy (Valery Pykhtin, 2017-01-26; 1 file, -8/+3)
  Differential revision: https://reviews.llvm.org/D28980
  llvm-svn: 293171
* Revert "[mips] N64 static relocation model support"Simon Dardis2017-01-2637-282/+212
| | | | | | This reverts commit r293164. There are multiple tests failing. llvm-svn: 293170
* [mips] N64 static relocation model support (Simon Dardis, 2017-01-26; 37 files, -212/+282)
  This patch makes one change to GOT handling and two changes to N64's
  relocation model handling. Furthermore, the jumptable encodings have been
  corrected for static N64.

  Big GOT handling is now done via a new SDNode MipsGotHi - this node is
  unconditionally lowered to an lui instruction.

  The first change to N64's relocation handling is the lifting of the
  restriction that N64 always uses PIC. Now it is possible to target static
  environments.

  The second change adds support for 64 bit symbols and enables them by
  default. Previously N64 had patterns for sym32 mode only. In this mode all
  symbols are assumed to have 32 bit addresses. sym32 mode support is
  selectable with attribute 'sym32'. A follow-on patch for clang will add the
  necessary frontend parameter.

  This partially resolves PR/23485. Thanks to Brooks Davis for reporting the issue!

  Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris
  Differential Revision: https://reviews.llvm.org/D23652
  llvm-svn: 293164
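  A hedged sketch of opting into sym32 mode at the IR level (the exact spelling
  as a subtarget feature is an assumption on our part; the commit only names
  the attribute 'sym32'):

      ; Built for mips64 with the static relocation model; with +sym32 all
      ; symbols are assumed to fit in 32 bits, otherwise 64-bit symbol
      ; patterns are used.
      define void @f() #0 {
        ret void
      }
      attributes #0 = { "target-features"="+sym32" }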
* [ARM] GlobalISel: Load i1, i8 and i16 args from stack (Diana Picus, 2017-01-26; 3 files, -8/+84)
  Add support for loading i1, i8 and i16 arguments from the stack, with or
  without the ABI extension flags. When the ABI extension flags are present,
  we load a 4-byte value, otherwise we preserve the size of the load and let
  the instruction selector replace it with a LDRB/LDRH. This generates the
  same thing as DAGISel.
  Differential Revision: https://reviews.llvm.org/D27803
  llvm-svn: 293163
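  A small illustrative case (hypothetical function; assuming AAPCS, where the
  fifth integer argument is passed on the stack): with zeroext the caller has
  already widened %e to 32 bits, so loading 4 bytes is fine; without the flag
  the load keeps its byte size and is selected to LDRB/LDRH.

      define i8 @fifth_arg(i32 %a, i32 %b, i32 %c, i32 %d, i8 zeroext %e) {
        ret i8 %e
      }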
* AMDGPU: Fold fneg into round instructions (Matt Arsenault, 2017-01-26; 3 files, -11/+99)
  llvm-svn: 293127
* [ImplicitNullChecks] Add a test demonstrating a case we don't get today (Sanjoy Das, 2017-01-26; 1 file, -0/+46)
  llvm-svn: 293126
* [llc] Add -pass-remarks-output (Adam Nemet, 2017-01-26; 1 file, -0/+42)
  This is the opt/llc counterpart of -fsave-optimization-record to output
  optimization remarks in a YAML file.
  llvm-svn: 293121
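  Hypothetical usage (file names are illustrative):

      llc -pass-remarks-output=remarks.yaml foo.ll -o foo.s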
* [llc] Add -pass-remarks-with-hotness (Adam Nemet, 2017-01-25; 1 file, -7/+17)
  Analogous to the code in opt, this enables hotness in opt-remarks.
  llvm-svn: 293113
* New OptimizationRemarkEmitter pass for MIR (Adam Nemet, 2017-01-25; 1 file, -0/+65)
  This allows MIR passes to emit optimization remarks with the same level of
  functionality that is available to IR passes.

  It also hooks up the greedy register allocator to report spills. This allows
  for interesting use cases like increasing interleaving on a loop until
  spilling of registers is observed.

  I still need to experiment with whether reporting every spill scales, but
  this demonstrates for now that the functionality works from llc using
  -pass-remarks*=<pass>.

  Differential Revision: https://reviews.llvm.org/D29004
  llvm-svn: 293110
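  A sketch of asking llc for the new MIR remarks (the greedy allocator's
  remark pass name, 'regalloc', is an assumption on our part):

      llc -O2 -pass-remarks-missed=regalloc -pass-remarks-output=remarks.yaml foo.ll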
* SDag: fix how initial loads are formed when splitting vector ops. (Tim Northover, 2017-01-25; 1 file, -0/+10)
  Later code expects the vector loads produced to be directly concatenable,
  which means we shouldn't pad anything except the last load produced with
  UNDEF.
  llvm-svn: 293088
* AMDGPU: Set call_convention bit in kernel_code_t (Matt Arsenault, 2017-01-25; 1 file, -0/+2)
  According to the documentation this is supposed to be -1 if indirect calls
  are not supported.
  llvm-svn: 293081
* [XRay][AArch64] More staging for tail call support in XRay on AArch64 - in LLVM (Serge Rogatch, 2017-01-25; 1 file, -0/+69)
  Summary:
  This patch prepares more for tail call support in XRay. Until the logging
  part supports tail calls, this is just staging, so it seems the LLVM part is
  mostly ready with this patch.

  Related: https://reviews.llvm.org/D28948 (compiler-rt)

  Reviewers: dberris, rengolin
  Reviewed By: dberris
  Subscribers: llvm-commits, iid_iunknown, aemerson
  Differential Revision: https://reviews.llvm.org/D28947
  llvm-svn: 293080
* Revert "Do not verify dominator tree if it has no roots"Chad Rosier2017-01-251-1/+1
| | | | | | | This reverts commit r293033, per Danny's comment. In short, we require domtrees to have roots at all times. llvm-svn: 293075
* [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2. (Artur Pilipenko, 2017-01-25; 5 files, -0/+1582)
  The previous patch (https://reviews.llvm.org/rL289538) got reverted because
  of a bug. Chandler also requested some changes to the algorithm.
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html

  This is an updated patch. The key difference is that collectBitProviders
  (renamed to calculateByteProvider) now collects the origin of one byte, not
  the whole value. It simplifies the implementation and allows stopping the
  traversal earlier if we know that the result won't be used.

  From the original commit:

  Match a pattern where a wide type scalar value is loaded by several narrow
  loads and combined by shifts and ors. Fold it into a single load or a load
  and a bswap if the target supports it.

  Assuming little endian target:
      i8 *a = ...
      i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
  =>
      i32 val = *((i32)a)

      i8 *a = ...
      i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
  =>
      i32 val = BSWAP(*((i32)a))

  This optimization was discussed on llvm-dev some time ago in the "Load
  combine pass" thread. We came to the conclusion that we want to do this
  transformation late in the pipeline because in presence of atomic loads load
  widening is an irreversible transformation and it might hinder other
  optimizations.

  Eventually we'd like to support folding patterns like this where the offset
  has a variable and a constant part:
      i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

  Matching the pattern above is easier at SelectionDAG level since address
  reassociation has already happened and the fact that the loads are adjacent
  is clear. Understanding that these loads are adjacent at IR level would have
  involved looking through geps/zexts/adds while looking at the addresses.

  The general scheme is to match OR expressions by recursively calculating the
  origin of individual bytes which constitute the resulting OR value. If all
  the OR bytes come from memory, verify that they are adjacent and match with
  little or big endian encoding of a wider value. If so, and if the load of
  the wider type (and bswap if needed) is allowed by the target, generate a
  load and a bswap if needed.

  Reviewed By: RKSimon, filcab, chandlerc
  Differential Revision: https://reviews.llvm.org/D27861
  llvm-svn: 293036
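  An IR-level sketch of the little-endian pattern (function and value names are
  illustrative; the combine itself fires on the SelectionDAG, so this is only
  what such input looks like before lowering):

      define i32 @load_le(i8* %a) {
        %p1 = getelementptr i8, i8* %a, i64 1
        %p2 = getelementptr i8, i8* %a, i64 2
        %p3 = getelementptr i8, i8* %a, i64 3
        %b0 = load i8, i8* %a
        %b1 = load i8, i8* %p1
        %b2 = load i8, i8* %p2
        %b3 = load i8, i8* %p3
        %z0 = zext i8 %b0 to i32
        %z1 = zext i8 %b1 to i32
        %z2 = zext i8 %b2 to i32
        %z3 = zext i8 %b3 to i32
        %s1 = shl i32 %z1, 8
        %s2 = shl i32 %z2, 16
        %s3 = shl i32 %z3, 24
        %o1 = or i32 %z0, %s1
        %o2 = or i32 %o1, %s2
        %val = or i32 %o2, %s3     ; folds to a single 'load i32' on LE targets
        ret i32 %val
      }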
* [ARM] GlobalISel: Support i1 add and ABI extensions (Diana Picus, 2017-01-25; 4 files, -0/+113)
  Add support for:
    * i1 add
    * i1 function arguments, if passed through registers
    * i1 returns, with ABI signext/zeroext
  Differential Revision: https://reviews.llvm.org/D27706
  llvm-svn: 293035
* [ARM] GlobalISel: Support i8/i16 ABI extensions (Diana Picus, 2017-01-25; 4 files, -0/+141)
  At the moment, this means supporting the signext/zeroext attribute on the
  return type of the function. For function arguments, signext/zeroext should
  be handled by the caller, so there's nothing for us to do until we start
  lowering calls. Note that this does not include support for other extensions
  (i8 to i16); those will be added later.
  Differential Revision: https://reviews.llvm.org/D27705
  llvm-svn: 293034
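  A minimal example of the return-side attribute (hypothetical function):

      ; With signext, the callee must sign-extend the i8 result to 32 bits
      ; in r0 before returning.
      define signext i8 @ret_minus_one() {
        ret i8 -1
      }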
* Do not verify dominator tree if it has no roots (Serge Pavlov, 2017-01-25; 1 file, -1/+1)
  If a dominator tree has no roots, the pass that calculates it is likely to
  have been skipped. It occurs, for instance, in the case of entities with
  linkage available_externally. Do not run tree verification in such cases.
  Differential Revision: https://reviews.llvm.org/D28767
  llvm-svn: 293033
* AMDGPU: Check nsz instead of unsafe math (Matt Arsenault, 2017-01-25; 2 files, -3/+3)
  llvm-svn: 293028
* DAG: Recognize no-signed-zeros-fp-math attribute (Matt Arsenault, 2017-01-25; 3 files, -1/+81)
  clang already emits this with -cl-no-signed-zeros, but codegen doesn't do
  anything with it. Treat it like the other fast math attributes, and change
  one place to use it.
  llvm-svn: 293024
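  A sketch of the attribute on a function, with one fold it unlocks (fadd x,
  +0.0 is not an identity for x = -0.0 unless signed zeros may be ignored):

      define float @fold_fadd_zero(float %x) #0 {
        %r = fadd float %x, 0.0      ; with nsz this may fold to just %x
        ret float %r
      }
      attributes #0 = { "no-signed-zeros-fp-math"="true" }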
* DAGCombiner: Allow negating ConstantFP after legalize (Matt Arsenault, 2017-01-25; 1 file, -2/+1)
  llvm-svn: 293019
* AMDGPU: Implement early ifcvt target hooks. (Matt Arsenault, 2017-01-25; 4 files, -2/+567)
  Leave early ifcvt disabled for now since there are some shader-db
  regressions.

  This causes some immediate improvements, but could be better. The cost
  checking that the pass does is based on critical path length for out-of-order
  CPUs, which is not what we want here, so it skips out on many cases we do
  want.
  llvm-svn: 293016
* [GlobalISel] Generate selector for more integer binop patterns. (Ahmed Bougacha, 2017-01-25; 1 file, -2/+2)
  This surprisingly isn't NFC because there are patterns to select GPR sub to
  SUBSWrr (rather than SUBWrr/rs); SUBS is later optimized to SUB if NZCV is
  dead. From ISel's perspective, both are fine.
  llvm-svn: 293010
* GlobalISel: Use the correct types when translating landingpad instructions (Justin Bogner, 2017-01-25; 2 files, -2/+46)
  There was a bug here where we were using p0 instead of s32 for the selector
  type in the landingpad. Instead of hardcoding these types we should get the
  types from the landingpad instruction directly.

  Note that we replicate an assert from SDAG here to only support two-valued
  landingpads.
  llvm-svn: 292995
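  For reference, a two-valued landingpad in IR (a fragment; the enclosing
  function would also need a personality): the first result is the exception
  pointer (p0 in GlobalISel terms), the second the 32-bit selector (s32),
  which is the value the fix corrects.

      lpad:
        %lp = landingpad { i8*, i32 }
                cleanup
        %exn = extractvalue { i8*, i32 } %lp, 0   ; pointer  -> p0
        %sel = extractvalue { i8*, i32 } %lp, 1   ; selector -> s32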
* AMDGPU: Remove spurious out branches after a kill (Matt Arsenault, 2017-01-24; 1 file, -0/+40)
  The sequence like this:

      v_cmpx_le_f32_e32 vcc, 0, v0
      s_branch BB0_30
      s_cbranch_execnz BB0_30
      ; BB#29:
      exp null off, off, off, off done vm
      s_endpgm
      BB0_30:                       ; %endif110

  is likely wrong. The s_branch instruction will unconditionally jump to
  BB0_30 and the skip block (exp done + endpgm) inserted for performing the
  kill instruction will never be executed. This results in a GPU hang with
  Star Ruler 2.

  The s_branch instruction is added during the "Control Flow Optimizer" pass
  which seems to re-organize the basic blocks, and we assume that
  SI_KILL_TERMINATOR is always the last instruction inside a basic block.
  Thus, after inserting a skip block we just go to the next BB without looking
  at the subsequent instructions after the kill, and the s_branch op is never
  removed.

  Instead, we should remove the unconditional out branches and skip the two
  instructions only if the exec mask is non-zero.

  This patch fixes the GPU hang and doesn't introduce any regressions with
  "make check".

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019
  Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
  llvm-svn: 292985
* Revert rL292621. Caused some internal build bot failures in apple. (Wei Mi, 2017-01-24; 3 files, -454/+0)
  llvm-svn: 292984
* Enable FeatureFlatForGlobal on Volcanic Islands (Matt Arsenault, 2017-01-24; 273 files, -377/+374)
  This switches to the workaround that HSA defaults to for the mesa path.

  This should be applied to the 4.0 branch.

  Patch by Vedran Miletić <vedran@miletic.net>
  llvm-svn: 292982
* AMDGPU/SI: Give up in promote alloca when a pointer may be captured. (Changpeng Fang, 2017-01-24; 1 file, -0/+47)
  Differential Revision: http://reviews.llvm.org/D28970
  Reviewer: Matt
  llvm-svn: 292966
* [AMDGPU] Add VGPR copies post regalloc fix pass (Stanislav Mekhanoshin, 2017-01-24; 1 file, -0/+44)
  Regalloc creates COPY instructions which do not formally use VALU. That
  results in v_mov instructions displaced after exec mask modification. One
  pass that does this is SIOptimizeExecMasking, but potentially it can be done
  by other passes too.

  This patch adds a pass immediately after regalloc to add an implicit exec
  use operand to all VGPR copy instructions.

  Differential Revision: https://reviews.llvm.org/D28874
  llvm-svn: 292956
* [AArch64] Rename 'no-quad-ldst-pairs' to 'slow-paired-128' (Evandro Menezes, 2017-01-24; 1 file, -1/+1)
  In order to follow the pattern of the existing 'slow-misaligned-128store'
  option, rename the option 'no-quad-ldst-pairs' to 'slow-paired-128'.
  llvm-svn: 292954
* [X86][AVX2] Regenerate test. (Simon Pilgrim, 2017-01-24; 1 file, -2/+1)
  llvm-svn: 292950
* [X86][AVX2] Removed FIXME comment and regenerated test. (Simon Pilgrim, 2017-01-24; 1 file, -5/+1)
  The comment talked about replacing vpmovzxwd+vpslld+vpsrad with vpmovsxwd,
  which isn't valid as we're sign extending a <8 x i1> bool vector, not an
  all/nobits <8 x i16>.
  llvm-svn: 292948
* [X86][AVX2] Cleaned up test triple and regenerated tests. (Simon Pilgrim, 2017-01-24; 1 file, -14/+2)
  llvm-svn: 292946
* [SelectionDAG] Handle inverted conditions when splitting into multiple branches. (Geoff Berry, 2017-01-24; 3 files, -5/+39)
  Summary:
  When conditional branches with complex conditions are split into multiple
  branches in SelectionDAGBuilder::FindMergedConditions, also handle inverted
  conditions. These may sometimes appear without having been optimized by
  InstCombine when CodeGenPrepare decides to sink and duplicate cmp
  instructions, causing them to have only one use. This problem can be
  increased by e.g. GVNHoist hiding more cmps from InstCombine by combining
  equivalent cmps from different blocks.

  For example codegen X & !(Y | Z) as:

      jmp_if_X TmpBB
      jmp FBB
      TmpBB:
      jmp_if_notY Tmp2BB
      jmp FBB
      Tmp2BB:
      jmp_if_notZ TBB
      jmp FBB

  Reviewers: bogner, MatzeB, qcolombet
  Subscribers: llvm-commits, hiraditya, mcrosier, sebpop
  Differential Revision: https://reviews.llvm.org/D28380
  llvm-svn: 292944
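  The IR shape that yields the X & !(Y | Z) case above might look like this
  (a hypothetical reduction; CodeGenPrepare/GVNHoist can leave the cmps in
  this single-use form):

      %or   = or i1 %y, %z
      %not  = xor i1 %or, true      ; the inverted condition
      %cond = and i1 %x, %not
      br i1 %cond, label %tbb, label %fbb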
* [X86][SSE] Add support for constant folding vector arithmetic shift by immediates (Simon Pilgrim, 2017-01-24; 3 files, -45/+38)
  llvm-svn: 292919
* [X86][SSE] Add support for constant folding vector logical shift by immediates (Simon Pilgrim, 2017-01-24; 12 files, -209/+166)
  llvm-svn: 292915
* AMDGPU: Add trap handler support. (Wei Ding, 2017-01-24; 1 file, -5/+4)
  llvm-svn: 292893
* [SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR (Craig Topper, 2017-01-24; 4 files, -301/+12)
  Summary:
  This teaches getNode to simplify extracting from Undef. This is similar to
  what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting
  from CONCAT_VECTOR when we can reuse one of the inputs to the concat. These
  seem like simple non-target specific optimizations.

  For X86 we currently handle undef in extractSubvector, but not all
  EXTRACT_SUBVECTOR creations go through there.

  Ultimately, my motivation here is to simplify extractSubvector and remove
  custom lowering for EXTRACT_SUBVECTOR since we don't do anything but handle
  undef and BUILD_VECTOR optimizations, but those should be DAG combines.

  Reviewers: RKSimon, delena
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D29000
  llvm-svn: 292876
* LiveIntervalAnalysis: Calculate liveness even if a superreg is reserved. (Matthias Braun, 2017-01-24; 1 file, -0/+22)
  A register unit may be allocatable and non-reserved but some of the
  register(tuples) built with it are reserved. We still need to calculate
  liveness in this case.

  Note to out-of-tree targets: If you start seeing machine verifier errors
  with this commit, it probably means that you do not properly mark super
  registers of reserved registers as reserved. See r292836 or r292870 for
  examples of how to fix that.

  rdar://29996737
  Differential Revision: https://reviews.llvm.org/D28881
  llvm-svn: 292871
* AMDGPU: Custom lower more vector operations (Matt Arsenault, 2017-01-23; 4 files, -18/+513)
  This avoids stack usage.
  llvm-svn: 292846
* DAG: Don't fold vector extract into load if target doesn't want to (Matt Arsenault, 2017-01-23; 1 file, -0/+31)
  Fixes turning a 32-bit scalar load into an extending vector load for AMDGPU
  when dynamically indexing a vector.
  llvm-svn: 292842
* [APFloat] Switch from (PPCDoubleDoubleImpl, IEEEdouble) layout to (IEEEdouble, IEEEdouble) (Tim Shen, 2017-01-23; 1 file, -1/+1)
  Summary:
  This patch changes the layout of DoubleAPFloat and adjusts all operations to
  do either:
  1) (IEEEdouble, IEEEdouble) -> (uint64_t, uint64_t) -> PPCDoubleDoubleImpl,
     then run the old algorithm.
  2) Do the right thing directly.

  1) includes multiply, divide, remainder, mod, fusedMultiplyAdd,
  roundToIntegral, convertFromString, next, convertToInteger,
  convertFromAPInt, convertFromSignExtendedInteger,
  convertFromZeroExtendedInteger, convertToHexString, toString,
  getExactInverse.

  2) includes makeZero, makeLargest, makeSmallest, makeSmallestNormalized,
  compare, bitwiseIsEqual, bitcastToAPInt, isDenormal, isSmallest, isLargest,
  isInteger, ilogb, scalbn, frexp, hash_value, Profile.

  I could split this into two patches, e.g. use 1) for all operations first,
  then incrementally change some of them to 2). I didn't do that, because 1)
  involves code that converts data between PPCDoubleDoubleImpl and
  (IEEEdouble, IEEEdouble) back and forth, and may pessimize the compiler.
  Instead, I find easy functions and use approach 2) for them directly.

  Next step is to move multiply and divide from 1) to 2). I don't have plans
  for other functions in 1).

  Differential Revision: https://reviews.llvm.org/D27872
  llvm-svn: 292839
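  A worked sketch of the new layout, as we read the summary (the invariant is
  our assumption about double-double arithmetic, not a quote from the patch):

      ; A ppc_fp128 value v is the exact sum of two IEEE doubles:
      ;   v = hi + lo, with hi = round(v), so |lo| <= ulp(hi)/2
      ; e.g. v = 1.0 + 2^-80  =>  (hi, lo) = (1.0, 2^-80)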
* AMDGPU: Combine fp16/fp64 subtarget features (Matt Arsenault, 2017-01-23; 10 files, -45/+90)
  The same control register controls both, and they are set to the same
  defaults. Keep the old names around as aliases.
  llvm-svn: 292837
* [AArch64][GlobalISel] Legalize narrow scalar fp->int conversions. (Ahmed Bougacha, 2017-01-23; 1 file, -6/+12)
  Since we're now avoiding operations using narrow scalar integer types, we
  have to legalize the integer side of the FP conversions. This requires
  teaching the legalizer how to do that.
  llvm-svn: 292828
* [AArch64][GlobalISel] Legalize narrow scalar ops again. (Ahmed Bougacha, 2017-01-23; 9 files, -415/+37)
  Since r279760, we've been marking as legal operations on narrow integer
  types that have wider legal equivalents (for instance, G_ADD s8). Compared
  to legalizing these operations, this reduced the amount of extends/truncates
  required, but was always a weird legalization decision made at selection
  time.

  So far, we haven't been able to formalize it in a way that permits the
  selector generated from SelectionDAG patterns to be sufficient.

  Using a wide instruction (say, s64), when a narrower instruction exists
  (s32) would introduce register class incompatibilities (when one narrow
  generic instruction is selected to the wider variant, but another is
  selected to the narrower variant). It's also impractical to limit which
  narrow operations are matched for which instruction, as restricting "narrow
  selection" to ranges of types clashes with potentially incompatible
  instruction predicates.

  Concerns were also raised regarding MIPS64's sign-extended register
  assumptions, as well as wrapping behavior. See discussions in
  https://reviews.llvm.org/D26878.

  Instead, legalize the operations.

  Should we ever revert to selecting these narrow operations, we should try to
  represent this more accurately: for instance, by separating a "concrete"
  type on operations, and an "underlying" type on vregs, we could move the
  "this narrow-looking op is really legal" decision to the legalizer, and let
  the selector use the "underlying" vreg type only, which would be guaranteed
  to map to a register class.

  In any case, we eventually should mitigate:
  - the performance impact by selecting no-op extract/truncates to COPYs
    (which we currently do), and the COPYs to register reuses (which we don't
    do yet),
  - the compile-time impact by optimizing away extract/truncate sequences in
    the legalizer.

  llvm-svn: 292827
* [ARM] Classification Improvements to ARM Sched-Models. NFCI. (Javed Absar, 2017-01-23; 1 file, -0/+69)
  This is a series of patches to make adding machine sched models for ARM
  processors easier and more compact. They define new sched-readwrites for
  groups of ARM instructions. This has been missing so far, and as a
  consequence, machine scheduler models for individual sub-targets have tended
  to be larger than they needed to be. The current patch focuses on
  floating-point instructions.

  Reviewers: Diana Picus (rovka), Renato Golin (rengolin)
  Differential Revision: https://reviews.llvm.org/D28194
  llvm-svn: 292825
* DAG: Allow legalization of fcanonicalize vector types (Matt Arsenault, 2017-01-23; 1 file, -0/+214)
  llvm-svn: 292814
* [X86][SSE] Add missing X86ISD::ANDNP combines. (Simon Pilgrim, 2017-01-22; 2 files, -41/+7)
  llvm-svn: 292767
* [X86][SSE] Improve shuffle combining with zero insertions (Simon Pilgrim, 2017-01-22; 2 files, -59/+30)
  Add support for handling shuffles with scalar_to_vector(0).
  llvm-svn: 292766