bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes	Dmitry Preobrazhensky	2019-03-29	1	-7/+7
\| \| \| \| \| \| \| \| \| \|	See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59878 llvm-svn: 357249
*	AMDGPU: Make sram-ecc off by default for Vega20	Konstantin Zhuravlyov	2019-03-29	1	-1/+0
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D59718 llvm-svn: 357247
*	[X86] Add X86TargetLowering::isCommutativeBinOp override.	Simon Pilgrim	2019-03-29	2	-0/+13
\| \| \| \| \| \|	We currently just have test coverage for PMULUDQ - will add more in the future. llvm-svn: 357244
*	[PowerPC] Add the support for __builtin_setrnd()	Kang Zhang	2019-03-29	2	-0/+140
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of the integer argument to set the floating point rounding mode. double __builtin_setrnd(int mode); The effective values for mode are: 0 - round to nearest 1 - round to zero 2 - round to +infinity 3 - round to -infinity Note that the mode argument is taken modulo 4, so if the int argument is greater than 3, only the least significant two bits of the mode are used. Namely, __builtin_setrnd(102) is equal to __builtin_setrnd(2). Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D59405 llvm-svn: 357241
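The modulo-4 behaviour described above can be sketched in plain C++. The helper names `effectiveRoundingMode` and `roundingModeName` are hypothetical and exist only for this illustration; they are not part of LLVM or of the builtin itself.

```cpp
#include <string_view>

// Hypothetical helper: keeps only the least significant two bits of the
// argument, which is how the commit message says the mode is interpreted.
constexpr int effectiveRoundingMode(int mode) { return mode & 3; }

// Maps the effective mode to the names listed in the commit message.
constexpr std::string_view roundingModeName(int mode) {
  switch (effectiveRoundingMode(mode)) {
  case 0:
    return "round to nearest";
  case 1:
    return "round to zero";
  case 2:
    return "round to +infinity";
  default:
    return "round to -infinity";
  }
}

// 102 and 2 share the same low two bits, so they select the same mode.
static_assert(effectiveRoundingMode(102) == effectiveRoundingMode(2));
```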
*	[ScheduleDAG] Move `Topo` and `addEdge` to base class.	Clement Courbet	2019-03-29	1	-3/+1
\| \| \| \| \| \| \| \| \|	Some DAG mutations can only be applied to `ScheduleDAGMI`, and have to internally cast a `ScheduleDAGInstrs` to `ScheduleDAGMI`. There is nothing actually specific to `ScheduleDAGMI` in `Topo`. llvm-svn: 357239
*	AMDGPU/GlobalISel: Insert waterfall loop for vector indexing	Matt Arsenault	2019-03-29	2	-0/+174
\| \| \| \| \| \| \| \|	The register index can only really be an SGPR. Lie that a VGPR index is legal, and then rewrite the instruction in a waterfall loop to handle the index. llvm-svn: 357235
*	[PowerPC] Strength reduction of multiply by a constant by shift and add/sub ↵	Zi Xuan Wu	2019-03-29	2	-0/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in place A shift and add/sub sequence combination is faster in place of a multiply by constant. Because the cycle or latency of multiply is not huge, we only consider the following worthwhile patterns. ``` (mul x, 2^N + 1) => (add (shl x, N), x) (mul x, -(2^N + 1)) => -(add (shl x, N), x) (mul x, 2^N - 1) => (sub (shl x, N), x) (mul x, -(2^N - 1)) => (sub x, (shl x, N)) ``` The cycles or latency are subtarget-dependent, so we need to consider the subtarget to decide whether to do such a transformation. The data type is also considered, since the cycles or latency of a multiply differ by type. Differential Revision: https://reviews.llvm.org/D58950 llvm-svn: 357233
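The four rewrite patterns above can be checked with a small C++ sketch. The function names are hypothetical and this is not the PowerPC backend code; it uses unsigned 64-bit arithmetic, which wraps the same way a two's-complement machine multiply does, and assumes the shift amount `n` is less than 64.

```cpp
#include <cstdint>

// (mul x, 2^N + 1)    => (add (shl x, N), x)
constexpr uint64_t mulByPow2Plus1(uint64_t x, unsigned n) {
  return (x << n) + x;
}

// (mul x, -(2^N + 1)) => -(add (shl x, N), x)
constexpr uint64_t mulByNegPow2Plus1(uint64_t x, unsigned n) {
  return uint64_t{0} - ((x << n) + x);
}

// (mul x, 2^N - 1)    => (sub (shl x, N), x)
constexpr uint64_t mulByPow2Minus1(uint64_t x, unsigned n) {
  return (x << n) - x;
}

// (mul x, -(2^N - 1)) => (sub x, (shl x, N))
constexpr uint64_t mulByNegPow2Minus1(uint64_t x, unsigned n) {
  return x - (x << n);
}
```

With N = 3, for example, the first helper multiplies by 9 and the third by 7.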
*	[WebAssembly] Merge used feature sets, update atomics linkage policy	Thomas Lively	2019-03-29	7	-62/+156
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It does not currently make sense to use WebAssembly features in some functions but not others, so this CL adds an IR pass that takes the union of all used feature sets and applies it to each function in the module. This allows us to prevent atomics from being lowered away if some function has opted in to using them. When atomics is not enabled anywhere, we detect whether there exist any atomic operations or thread local storage that would be stripped and disallow linking with objects that contain atomics if and only if atomics or tls are stripped. When atomics is enabled, mark it as used but do not require it of other objects in the link. These changes allow libraries that do not use atomics to be built once and linked into both single-threaded and multithreaded binaries. Reviewers: aheejin, sbc100, dschuff Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59625 llvm-svn: 357226
*	[BPF] add proper multi-dimensional array support	Yonghong Song	2019-03-28	2	-35/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For multi-dimensional array like below int a[2][3]; the previous implementation generates BTF_KIND_ARRAY type like below: . element_type: int . index_type: unsigned int . number of elements: 6 This is not the best way to represent arrays, esp., when converting BTF back to headers and users will see int a[6]; instead. This patch generates proper support for multi-dimensional arrays. For "int a[2][3]", the two BTF_KIND_ARRAY types will be generated: Type #n: . element_type: int . index_type: unsigned int . number of elements: 3 Type #(n+1): . element_type: #n . index_type: unsigned int . number of elements: 2 The linux kernel already supports such a multi-dimensional array representation properly. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D59943 llvm-svn: 357215
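The nested representation described above (innermost dimension first, each outer array type pointing at the one before it) can be sketched as follows. `ArrayTypeSketch` and `buildArrayChain` are hypothetical names for illustration only; this is neither the BTF debug-info emitter nor the kernel's BTF layout.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for one array type record.
struct ArrayTypeSketch {
  uint32_t Id;          // type id assigned to this array type
  uint32_t ElementType; // type id of the element
  uint32_t NumElements; // number of elements in this dimension
};

// Dims are given in declaration order, e.g. {2, 3} for "int a[2][3]".
// The innermost dimension is emitted first so that each outer array can
// refer to the id of the array type created just before it.
inline std::vector<ArrayTypeSketch>
buildArrayChain(uint32_t BaseTypeId, uint32_t FirstNewId,
                const std::vector<uint32_t> &Dims) {
  std::vector<ArrayTypeSketch> Chain;
  uint32_t Element = BaseTypeId;
  uint32_t NextId = FirstNewId;
  for (auto It = Dims.rbegin(); It != Dims.rend(); ++It) {
    Chain.push_back({NextId, Element, *It});
    Element = NextId++;
  }
  return Chain;
}
```

For `int a[2][3]` with `int` as type 1 and new ids starting at 10, this yields type 10 (element 1, 3 elements) followed by type 11 (element 10, 2 elements), which mirrors the `#n` / `#(n+1)` pair in the commit message.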
*	[X86] Teach the isel optimization for (x << C1) op C2 to (x op (C2>>C1)) << ↵	Craig Topper	2019-03-28	1	-23/+29
\| \| \| \| \| \| \| \| \| \| \| \|	C1 to consider cases where C2>>C1 can fit an unsigned 32-bit immediate For 64-bit operations we should consider if the immediate can be made to fit in an unsigned 32-bit immediate. For OR/XOR this allows us to load the immediate with MOV32ri instead of movabsq. For AND this allows us to fold the immediate. Differential Revision: https://reviews.llvm.org/D59867 llvm-svn: 357196
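A minimal C++ sketch of the underlying identity, using hypothetical helper names rather than the X86 isel code. For OR and XOR, the sketch assumes the low C1 bits of C2 are zero, since those bits would otherwise be lost by the shift; it also assumes C1 is less than 64.

```cpp
#include <cstdint>

// True when the low C1 bits of C2 are zero, so C2 >> C1 loses nothing.
constexpr bool lowBitsClear(uint64_t C2, unsigned C1) {
  return (C2 & ((uint64_t{1} << C1) - 1)) == 0;
}

// True when the value fits in an unsigned 32-bit immediate.
constexpr bool fitsUnsigned32(uint64_t V) { return V <= 0xFFFFFFFFull; }

// Original form: (x << C1) | C2
constexpr uint64_t orOriginal(uint64_t X, unsigned C1, uint64_t C2) {
  return (X << C1) | C2;
}

// Rewritten form: (x | (C2 >> C1)) << C1
constexpr uint64_t orRewritten(uint64_t X, unsigned C1, uint64_t C2) {
  return (X | (C2 >> C1)) << C1;
}
```

With C2 = 0xFFFFFFFF00 and C1 = 8, C2 itself needs more than 32 bits, while C2 >> C1 = 0xFFFFFFFF fits an unsigned 32-bit immediate, which is the case the commit targets.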
*	Delay initialization of three static global maps, NFC	Reid Kleckner	2019-03-28	1	-23/+24
\| \| \| \| \| \| \|	This avoids allocating a few KB of heap memory on startup, and instead allocates these maps lazily. I noticed this while profiling LLD. llvm-svn: 357192
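One common C++ way to defer construction of a map until first use is a function-local static, sketched below. This illustrates the general idea only and is not necessarily the mechanism the commit uses; `lazyTable` and `constructionCount` are hypothetical names.

```cpp
#include <map>
#include <string>

// Counts how many times the table has been built (illustration only).
inline int &constructionCount() {
  static int Count = 0;
  return Count;
}

// The map is constructed on the first call rather than at program startup,
// so no heap memory is allocated for it unless it is actually needed.
inline const std::map<std::string, int> &lazyTable() {
  static const std::map<std::string, int> Table = [] {
    ++constructionCount();
    return std::map<std::string, int>{{"alpha", 1}, {"beta", 2}};
  }();
  return Table;
}
```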
*	[MIPS GlobalISel] Select float constants	Petar Avramovic	2019-03-28	3	-4/+71
\| \| \| \| \| \| \| \|	Select 32 and 64 bit float constants for MIPS32. Differential Revision: https://reviews.llvm.org/D59933 llvm-svn: 357183
*	[x86] avoid cmov in movmsk reduction	Sanjay Patel	2019-03-28	1	-19/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is probably the least important of our movmsk problems, but I'm starting at the bottom to reduce distractions. We were creating a select_cc which bypasses the select and bitmask codegen optimizations that we have now. If we produce a compare+negate instead, we allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov. There's no partial register update danger in these sequences because we always produce the zero-register xor ahead of the 'set' if needed. There seems to be a missing fold for sext of a bool bit here: negl %ecx movslq %ecx, %rax ...but that's an independent transform. Differential Revision: https://reviews.llvm.org/D59818 llvm-svn: 357172
*	[X86MacroFusion] Handle branch fusion (AMD CPUs).	Clement Courbet	2019-03-28	5	-56/+112
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This adds a BranchFusion feature to replace the usage of the MacroFusion for AMD CPUs. See D59688 for context. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59872 llvm-svn: 357171
*	AMDGPU: Make exec mask optimizations more resistant to block splits	Matt Arsenault	2019-03-28	3	-22/+84
\| \| \| \| \| \| \|	Also improve the check for SALU instructions to also ignore implicit_def and other fake instructions. llvm-svn: 357170
*	[X86] AMD Piledriver (BdVer2): fine-tune some latencies	Roman Lebedev	2019-03-28	1	-28/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on llvm-exegesis measurements. Now that llvm-exegesis is ~2 orders of magnitude faster, and is a bit smarter, it is now possible to continue cleanup of the scheduler model. With this, there are no more latency inconsistencies for the opcodes that produce stable measurements, and only a few inconsistencies for unstable measurements (MMX_* opcodes, opcodes that llvm-exegesis measures by chaining - CMP, TEST, BT, SETcc, CVT, MOV, etc.) llvm-svn: 357169
*	[NFC] Format InlineFeatureIgnoreList.	Clement Courbet	2019-03-28	1	-51/+52
\| \| \| \| \| \|	To avoid more spurious clang-format changes when adding features (D59872). llvm-svn: 357168
*	[X86][AVX] Add missing vXi16 broadcast fold patterns	Simon Pilgrim	2019-03-28	2	-0/+24
\| \| \| \| \| \| \| \|	Now that D59484 has landed it's easier to add these. Added missing AVX512BW v32i16 equivalents while I was at it. llvm-svn: 357155
*	[ARM GlobalISel] Fix G_STORE with s1	Diana Picus	2019-03-28	1	-0/+18
\| \| \| \| \| \| \|	G_STORE for 1-bit values uses a STRBi12, which stores the whole byte. Zero out the undefined bits before writing. llvm-svn: 357154
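The masking step described above can be sketched in C++ as follows. `byteToStoreForS1` is a hypothetical name and this is not the ARM instruction selector; it only shows why the upper bits are cleared before a whole-byte store.

```cpp
#include <cstdint>

// A 1-bit value lives in a wider register whose upper bits are undefined.
// Because the store writes a whole byte, only bit 0 is kept so that the
// byte in memory is always 0 or 1.
constexpr uint8_t byteToStoreForS1(uint32_t RegisterValue) {
  return static_cast<uint8_t>(RegisterValue & 0x1u);
}
```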
*	[ARM GlobalISel] Fix selection of G_SELECT	Diana Picus	2019-03-28	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \|	G_SELECT uses a 1-bit scalar for the condition, and is currently implemented with a plain CMPri against 0. This means that values such as 0x1110 are interpreted as true, when instead the higher bits should be treated as undefined and therefore ignored. Replace the CMPri with a TSTri against 0x1, which performs an implicit AND, yielding the expected result. llvm-svn: 357153
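The difference between the two condition checks can be sketched in C++. The function names are hypothetical; they model a compare against 0 and a test of bit 0, not the ARM instructions themselves.

```cpp
#include <cstdint>

// Models the old selection: any nonzero register value counts as true.
constexpr bool conditionViaCompareWithZero(uint32_t RegisterValue) {
  return RegisterValue != 0;
}

// Models the new selection: only bit 0 is inspected, so undefined upper
// bits cannot affect the outcome.
constexpr bool conditionViaTestBit0(uint32_t RegisterValue) {
  return (RegisterValue & 0x1u) != 0;
}
```

For the value 0x1110 mentioned in the commit message, the first check reports true while the second reports false, because bit 0 is clear.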
*	[WebAssembly] Rename wasm fixup kinds	Sam Clegg	2019-03-28	4	-14/+12
\| \| \| \| \| \| \| \| \| \| \|	These fixup kinds are not explicitly related to the code section. They are there to signal how to apply the fixup. Also, a couple of other minor wasm cleanups. Differential Revision: https://reviews.llvm.org/D59908 llvm-svn: 357145
*	[ARM] Remove dead function ARMMCCodeEmitter::getSOImmOpValue	Sam Clegg	2019-03-27	1	-34/+0
\| \| \| \| \| \| \| \| \|	The last reference to this function was removed from the ARM td files in 2015 in rL225266. Differential Revision: https://reviews.llvm.org/D59868 llvm-svn: 357130
*	[x86] improve AVX lowering of vector zext	Sanjay Patel	2019-03-27	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we know the 2 halves of an oversized zext-in-reg are the same, don't create those halves independently. I tried several different approaches to fold this, but it's difficult to get right during legalization. In the default path, we are creating a generic shuffle that looks like an unpack high, but it can get transformed into a different mask (a blend), so it's not straightforward to match that. If we try to fold after it actually becomes an X86ISD::UNPCKH node, we can't be sure what the operand node is - it might be a generic shuffle, or it could be some x86-specific op. From the test output, we should be doing something like this for SSE4.1 as well, but I'd rather leave that as a follow-up since it involves changing lowering actions. Differential Revision: https://reviews.llvm.org/D59777 llvm-svn: 357129
*	[x86] look through bitcast operand of MOVMSK	Sanjay Patel	2019-03-27	1	-6/+5
\| \| \| \| \| \| \| \| \|	This is not exactly NFC because it should make further combines of MOVMSK easier to match, but there should be no outward differences because we have isel patterns in place specifically to allow this. See: // Also support integer VTs to avoid a int->fp bitcast in the DAG. llvm-svn: 357128
*	[X86ISelDAGToDAG] Move initialization of OptForSize and OptForMinSize from ↵	Craig Topper	2019-03-27	1	-5/+7
\| \| \| \| \| \| \| \|	PreprocessISelDAG to runOnMachineFunction. NFCI This makes more sense as a place to initialize these. I don't think runOnMachineFunction was overridden when these cached values were originally created. llvm-svn: 357123
*	[LegalizeVectorTypes] Allow single loads and stores for more short vectors	Justin Bogner	2019-03-27	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When lowering a load or store for TypeWidenVector, the type legalizer would use a single load or store if the associated integer type was legal or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable. (See https://reviews.llvm.org/rL236528 for reference.) This applies that behaviour to vector types. If the vector type is TypePromoteInteger, the element type is going to be TypePromoteInteger as well, which leads to a single promoting load rather than N individual promoting loads. For instance, if we have a v3i1, we would now have a load of v4i1 instead of 3 loads of i1. Patch by Guillaume Marques. Thanks! Differential Revision: https://reviews.llvm.org/D56201 llvm-svn: 357120
*	[WebAssembly] Add some whitespace to WebAssemblyFixIrreducibleControlFlow	Alon Zakai	2019-03-27	1	-0/+2
\| \| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D59855 modified: llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp llvm-svn: 357117
*	[ARM] Don't confuse the scheduler for very large VLDMDIA etc.	Eli Friedman	2019-03-27	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the memory operands of load and store-multiple operations. This doesn't really fix the problem properly, but it's enough to prevent crashing, at least. Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 . Differential Revision: https://reviews.llvm.org/D59834 llvm-svn: 357109
*	[AArch64][GlobalISel] Make G_PHI of v2s64, v4s32, v2s32 legal.	Amara Emerson	2019-03-27	1	-1/+1
\| \| \| \|	llvm-svn: 357108
*	Reapply "AMDGPU: Scavenge register instead of findUnusedReg"	Matt Arsenault	2019-03-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This reapplies r356149, using the correct overload of findUnusedReg which passes the current iterator. This worked most of the time, because the scavenger iterator was moved at the end of the frame index loop in PEI. This would fail if the spill was the first instruction. This was further hidden by the fact that the scavenger wasn't passed in for normal frame index elimination. llvm-svn: 357098
*	[X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD	Craig Topper	2019-03-27	2	-10/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cycle latency and 0.5 cycle reciprocal throughput for any SHLD. When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization. This patch adds a pseudo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work. Fixes PR41055 Differential Revision: https://reviews.llvm.org/D59391 llvm-svn: 357096
*	[AArch64][SVE] Asm: error on unexpected SVE vector register type suffix	Sander de Smalen	2019-03-27	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes an assembler bug that allowed SVE vector registers to contain a type suffix when not expected. The SVE unpredicated movprfx instruction is the only instruction affected. The following are examples of what was previously valid: movprfx z0.b, z0.b movprfx z0.b, z0.s movprfx z0, z0.s These instructions are now erroneous. Patch by Cullen Rhodes (c-rhodes) Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D59636 llvm-svn: 357094
*	AMDGPU: Enable the scavenger for large frames	Matt Arsenault	2019-03-27	1	-5/+14
\| \| \| \| \| \| \|	Another test is needed for the case where the scavenge fails, but there's another issue with that which needs an additional fix. llvm-svn: 357093
*	AMDGPU: Add additional MIR tests for exec mask optimizations	Matt Arsenault	2019-03-27	1	-3/+11
\| \| \| \| \| \| \| \| \| \|	Also includes one example of how this transform is unsound. This isn't verifying the copies are used in the control flow intrinsic patterns. Also add an option to disable the exec mask opt pass. Since this pass is unsound, it may be useful to turn it off until it is fixed. llvm-svn: 357091
*	AMDGPU: Skip debug_instr when collapsing end_cf	Matt Arsenault	2019-03-27	1	-3/+8
\| \| \| \| \| \| \|	Based on how these are inserted, I doubt this was causing a problem in practice. llvm-svn: 357090
*	AMDGPU: Fix missing scc implicit def on s_andn2_b64_term	Matt Arsenault	2019-03-27	1	-18/+13
\| \| \| \| \| \| \|	Introduce new helper class to copy properties directly from the base instruction. llvm-svn: 357089
*	AMDGPU: Don't hardcode num defs for MUBUF instructions	Matt Arsenault	2019-03-27	1	-2/+2
\| \| \| \| \| \| \|	This shouldn't change anything since the no-ret atomics are selected later. llvm-svn: 357084
*	AMDGPU: wave_barrier is not isBarrier	Matt Arsenault	2019-03-27	1	-1/+0
\| \| \| \| \| \| \|	This is not a control flow instruction, so should not be marked as isBarrier. This fixes a verifier error if followed by unreachable. llvm-svn: 357081
*	[BPF] use std::map to ensure consistent output	Yonghong Song	2019-03-27	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The .BTF.ext FuncInfoTable and LineInfoTable contain information organized per ELF section. Current definition of FuncInfoTable/LineInfoTable is: std::unordered_map<uint32_t, std::vector<BTFFuncInfo>> FuncInfoTable std::unordered_map<uint32_t, std::vector<BTFLineInfo>> LineInfoTable where the key is the section name offset in the string table. The unordered_map may cause the section output order to differ between platforms. The same applies to the unordered_map definition of std::unordered_map<std::string, std::unique_ptr<BTFKindDataSec>> DataSecEntries where BTF_KIND_DATASEC entries may have different ordering on different platforms. This patch fixes the issue by using std::map. Test static-var-derived-type.ll is modified to generate two DataSec's which will ensure the ordering is the same for all supported platforms. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 357077
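The ordering guarantee that motivates the switch can be sketched as follows. `emissionOrder` and the `int` payload are hypothetical stand-ins for illustration, not the BTF debug-info code.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// std::map iterates in ascending key order on every platform, whereas the
// iteration order of std::unordered_map depends on the hashing and bucket
// layout of the standard library in use.
inline std::vector<uint32_t>
emissionOrder(const std::map<uint32_t, std::vector<int>> &Table) {
  std::vector<uint32_t> Order;
  for (const auto &Entry : Table)
    Order.push_back(Entry.first); // key: section name offset
  return Order;
}
```

Inserting keys 30, 10, 20 in that order still yields the emission order 10, 20, 30.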
*	AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics	Matt Arsenault	2019-03-27	1	-4/+11
\| \| \| \| \| \|	The offset operand index is different for atomics. llvm-svn: 357073
*	Revert of 357063 [AMDGPU][MC] Corrected handling of tied src for atomic ↵	Dmitry Preobrazhensky	2019-03-27	1	-7/+7
\| \| \| \| \| \| \|	return MUBUF opcodes Reason: the change was mistakenly committed before review llvm-svn: 357066
*	[AArch64] NFC: Cleanup isAArch64FrameOffsetLegal	Sander de Smalen	2019-03-27	2	-202/+109
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cleanup isAArch64FrameOffsetLegal by: - Merging the large switch statement to reuse AArch64InstrInfo::getMemOpInfo(). - Using AArch64InstrInfo::getUnscaledLdSt() to determine whether an instruction has an unscaled variant. - Simplifying the logic that calculates the offset to fit the immediate. Reviewers: paquette, evandro, eli.friedman, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D59636 llvm-svn: 357064
*	[AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes	Dmitry Preobrazhensky	2019-03-27	1	-7/+7
\| \| \| \| \| \| \| \| \| \|	See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59305 llvm-svn: 357063
*	[AArch64] Adds cases for LDRSHWui and LDRSHXui to getMemOpInfo	Sander de Smalen	2019-03-27	1	-0/+6
\| \| \| \| \| \| \|	This patch also adds cases PRFUMi and PRFMui. This change was discussed in https://reviews.llvm.org/D59635. llvm-svn: 357059
*	Revert rL356864 : [X86][SSE41] Start shuffle combining from ↵	Simon Pilgrim	2019-03-27	1	-33/+28
\| \| \| \| \| \| \| \| \| \| \| \|	ZERO_EXTEND_VECTOR_INREG (PR40685) Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. ........ Causes PR41249 llvm-svn: 357057
*	[X86] When iselling (x << C1) and/or/xor C2 as (x and/or/xor (C2>>C1)) << ↵	Craig Topper	2019-03-27	1	-40/+9
\| \| \| \| \| \| \| \| \| \|	C1, go through the isel table instead of manually selecting. Previously we manually selected the AND/OR/XOR with immediate and the SHL(or ADD if the shift is 1). But this was missing out on the opportunity to use a 64 bit AND with a 32-bit immediate and possibly other isel tricks we have built into the tables. Instead, insert the new nodes into the DAG using insertDAGNode and allow them each to be selected through the normal table. llvm-svn: 357049
*	[NFC][PowerPC] Custom PowerPC specific machine-scheduler	QingShan Zhang	2019-03-27	7	-1/+128
\| \| \| \| \| \| \| \| \| \|	This patch lays the groundwork for extending the generic machine scheduler by providing a PPC-specific implementation. There are no functional changes as this is an incremental patch that simply provides the necessary overrides which just encapsulate the behavior of the generic scheduler. Subsequent patches will add specific behavior. Differential Revision: https://reviews.llvm.org/D59284 llvm-svn: 357047
*	[X86] Simplify some code in matchBitExtract by using ANY_EXTEND.	Craig Topper	2019-03-27	1	-10/+2
\| \| \| \| \| \| \|	We were manually outputting the code we would get from selecting ANY_EXTEND. We can save some code by just letting an ANY_EXTEND go through isel on its own. llvm-svn: 357045
*	[PPC] Refactor PPCBranchSelector.cpp	Guozhi Wei	2019-03-26	1	-136/+177
\| \| \| \| \| \| \| \| \| \|	This patch splits the huge function PPCBranchSelector.cpp:runOnMachineFunction into several smaller functions. No functional change. Differential Revision: https://reviews.llvm.org/D59623 llvm-svn: 357033
*	[PowerPC] Remove UseVSXReg	Stefan Pintilie	2019-03-26	4	-110/+86
\| \| \| \| \| \| \| \| \| \|	The UseVSXReg flag can be safely removed and the code cleaned up. Patch By: Yi-Hong Liu Differential Revision: https://reviews.llvm.org/D58685 llvm-svn: 357028