path: root/llvm/lib
Commit message    Author    Age    Files    Lines
* Don't build the Tpi Hash map by default.Zachary Turner2018-12-031-3/+0
| | | | | | | Building this map is very slow, so it should only be done in the specific cases where lookups will actually need to happen. llvm-svn: 348160
* [X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS.Craig Topper2018-12-031-42/+75
| | | | | | | | | | | | | | | | | | | Summary: We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly. After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations. This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55165 llvm-svn: 348159
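For illustration, a reduced IR input of the kind this lowering handles, a vXi8 multiply whose RHS is constant (a hypothetical example, not taken from the patch's tests):

    define <16 x i8> @mul_v16i8_const(<16 x i8> %x) {
      ; the constant RHS is now unpacked into separate low/high halves,
      ; yielding two constant-pool entries and no shuffle operations
      %r = mul <16 x i8> %x, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
      ret <16 x i8> %r
    }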
* [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb ↵Craig Topper2018-12-032-0/+40
| | | | | | | | | | | | | | | | | | | that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with smaller than 128 bit vector result types are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to the original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to a vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158
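A minimal IR sketch of the pattern described above (hypothetical, not from the patch):

    define <8 x i8> @fptosi_v8f32_v8i8(<8 x float> %x) {
      ; the <8 x i8> result is promoted to a 128-bit vector, asserted,
      ; then truncated back; under avx512 this can now be one truncate
      %r = fptosi <8 x float> %x to <8 x i8>
      ret <8 x i8> %r
    }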
* Fix non-modular build.Adrian Prantl2018-12-031-0/+4
| | | | llvm-svn: 348157
* Update Diagnostic handling for changes in CFE.Adrian Prantl2018-12-031-11/+27
| | | | | | | | | | | The clang frontend no longer emits the current working directory for DIFiles containing an absolute path in the filename: field, and will move the common prefix between the current working directory and the file path into the directory: component. https://reviews.llvm.org/D55085 llvm-svn: 348155
* [CmpInstAnalysis] fix formatting; NFCSanjay Patel2018-12-032-9/+9
| | | | | | | | There are potential improvements to the structure of this API raised by D54994, but remove some cosmetic blemishes before making any functional changes. llvm-svn: 348149
* Fixing -print-module-scope for legacy SCC passesFedor Sergeev2018-12-031-4/+21
| | | | | | | | | | | It appears that print-module-scope was not implemented for legacy SCC passes. Fixed it to print the whole module instead of just the current SCC. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D54793 llvm-svn: 348144
* [SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.Jonas Paulsson2018-12-031-0/+10
| | | | | | | | | | | A loaded value with multiple users that is compared with 0 will become a single load-and-test instruction. The load is not folded in this case (multiple users), but the compare instruction is eliminated. This patch returns 0 cost for the icmp in these cases. Review: Ulrich Weigand https://reviews.llvm.org/D55111 llvm-svn: 348141
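A hypothetical IR sketch of the case being costed, where the loaded value has a second user besides the compare:

    define i64 @select_of_loaded(i64* %ptr, i64 %a, i64 %b) {
      %val = load i64, i64* %ptr            ; %val has two users
      %cmp = icmp eq i64 %val, 0            ; folds into a load-and-test, so zero cost
      %sel = select i1 %cmp, i64 %a, i64 %b
      %res = add i64 %sel, %val             ; second user keeps the load separate
      ret i64 %res
    }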
* [AArch64] Add command-line option for SSBSPablo Barrio2018-12-033-3/+8
| | | | | | | | | | | | | | | | | | | | | Summary: SSBS (Speculative Store Bypass Safe) is only mandatory from Armv8.5-A onwards but is optional from Armv8.0-A. This patch adds a command-line option to enable SSBS, as it was previously only possible to enable by selecting -march=armv8.5-a. A similar patch is upstream in GNU binutils: https://sourceware.org/ml/binutils/2018-09/msg00274.html Reviewers: olista01, samparker, aemerson Reviewed By: samparker Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D54629 llvm-svn: 348137
* [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed PseudosRon Lieberman2018-12-032-2/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The introduction of S_{ADD|SUB}_U64_PSEUDO instructions, which are decomposed into VOP3 instruction pairs (for S_ADD_U64_PSEUDO: V_ADD_I32_e64 and V_ADDC_U32_e64; for S_SUB_U64_PSEUDO: V_SUB_I32_e64 and V_SUBB_U32_e64), precludes the use of SDWA to encode a constant. SDWA (sub-dword addressing) is supported on VOP1 and VOP2 instructions, but not on VOP3 instructions. We want to fold the bit-and operand into the instruction encoding for the V_ADD_I32 instruction. This requires that we transform the VOP3 into a VOP2 form of the instruction (_e32): %19:vgpr_32 = V_AND_B32_e32 255, killed %16:vgpr_32, implicit $exec %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64 %26.sub0:vreg_64, %19:vgpr_32, implicit $exec %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64 %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec which then allows the SDWA encoding and becomes %47:vgpr_32 = V_ADD_I32_sdwa 0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0, implicit-def $vcc, implicit $exec %48:vgpr_32 = V_ADDC_U32_e32 0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec Differential Revision: https://reviews.llvm.org/D54882 llvm-svn: 348132
* ARM: use target-specific SUBS node when combining cmp with cmov.Tim Northover2018-12-035-11/+42
| | | | | | | | | | | This has two positive effects. First, using a custom node prevents recombination leading to an infinite loop since the output DAG is notionally a little more complex than the input one. Using a flag-setting instruction also allows the subtraction to be folded with the related comparison more easily. https://reviews.llvm.org/D53190 llvm-svn: 348122
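A hypothetical IR pattern of the kind this combine targets, where the subtraction and the comparison share operands and can be folded into one flag-setting SUBS:

    define i32 @sub_or_zero(i32 %a, i32 %b) {
      %cmp = icmp sgt i32 %a, %b
      %sub = sub i32 %a, %b                 ; same operands as the compare
      %sel = select i1 %cmp, i32 %sub, i32 0
      ret i32 %sel
    }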
* [NFC][AArch64] Split out backend featuresDiogo N. Sampaio2018-12-036-72/+297
| | | | | | | | | | | | | | | | | | | This patch splits backend features currently hidden behind architecture versions. For example, currently the only way to activate the complex numbers extension is to target a v8.3 architecture, whereas after the patch this extension can be added separately. This refactoring is required by the new command-line proposal: http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html Reviewers: DavidSpickett, olista01, t.p.northover Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio Differential revision: https://reviews.llvm.org/D54633 llvm-svn: 348121
* [llvm-dwarfdump] - Stop printing the bogus empty section name on invalid dwarf.George Rimar2018-12-031-2/+2
| | | | | | | | | | | | | | | | | | | | When there is no .debug_addr section for some reason, llvm-dwarfdump would print a bogus empty section name when dumping ranges in .debug_info: DW_AT_ranges [DW_FORM_rnglistx] (indexed (0x0) rangelist = 0x00000004 [0x0000000000000000, 0x0000000000000001) "" [0x0000000000000000, 0x0000000000000002) "") That happens because the code used 0 (zero) as the default section index. The code should use -1ULL instead, because technically 0 is a valid section index in ELF, while -1ULL is a special constant that means "no section available". This is mostly a fix for the overall correctness/safety of the code, but a test case is provided too. Differential revision: https://reviews.llvm.org/D55113 llvm-svn: 348115
* [ARM][MC] Move information about variadic register defs into tablegenOliver Stannard2018-12-035-38/+11
| | | | | | | | | | | | | | | | | | | Currently, variadic operands on an MCInst are assumed to be uses, because they come after the defs. However, this is not always the case, for example the Arm/Thumb LDM instructions write to a variable number of registers. This adds a property of instruction definitions which can be used to mark variadic operands as defs. This only affects MCInst, because MachineInstruction already tracks use/def per operand in each instance of the instruction, so can already represent this. This property can then be checked in MCInstrDesc, allowing us to remove some special cases in ARMAsmParser::isITBlockTerminator. Differential revision: https://reviews.llvm.org/D54853 llvm-svn: 348114
* [ARM][Asm] Debug trace for the processInstruction loopOliver Stannard2018-12-032-4/+16
| | | | | | | | | | | | | | | | | | | In the Arm assembly parser, we first match an instruction, then call processInstruction to possibly change it to a different encoding, to match rules in the architecture manual which can't be expressed by the table-generated matcher. This adds debug printing so that this process is visible when using the -debug option. To support this, I've added a new overload of MCInst::dump_pretty which takes the opcode name as a StringRef, since we don't have an InstPrinter instance in the assembly parser. Instead, we can get the same information directly from the MCInstrInfo. Differential revision: https://reviews.llvm.org/D54852 llvm-svn: 348113
* [KMSAN] Enable -msan-handle-asm-conservative by defaultAlexander Potapenko2018-12-031-2/+5
| | | | | | | | | | This change enables conservative assembly instrumentation in KMSAN builds by default. It's still possible to disable it with -msan-handle-asm-conservative=0 if something breaks. It's now impossible to enable conservative instrumentation for userspace builds, but it's not used anyway. llvm-svn: 348112
* [ARM] FP16: support vld1.16 for vector loads with post-incrementSjoerd Meijer2018-12-031-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D55112 llvm-svn: 348110
* [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instructionKang Zhang2018-12-031-4/+12
| | | | | | | | | | | Summary: There are 4 instructions which have an inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm: LFS, LFD, STFS, and STFD. These four instructions should set ImmMustBeMultipleOf to 1 instead of 4. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54738 llvm-svn: 348109
* [NFC] [PowerPC] add a routine in PPCTargetLowering to determine if a global ↵QingShan Zhang2018-12-033-15/+36
| | | | | | | | is accessed as got-indirect or not. In theory, we should let the PPC target determine how to lower the TOC entry for globals. PPCTargetLowering requires this query to do some optimization for TOC_Entry. Differential Revision: https://reviews.llvm.org/D54925 llvm-svn: 348108
* [X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a ↵Craig Topper2018-12-021-0/+12
| | | | | | bitcast and a store of an iX scalar. llvm-svn: 348104
* [X86] Fix bad comment. NFCCraig Topper2018-12-021-1/+1
| | | | llvm-svn: 348103
* [ValueTracking] Support funnel shifts in computeKnownBits()Nikita Popov2018-12-021-0/+21
| | | | | | | | | | | | If the shift amount is known, we can determine the known bits of the output based on the known bits of two inputs. This is essentially the same functionality as implemented in D54869, but for ValueTracking rather than InstCombine SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D55140 llvm-svn: 348091
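A sketch of how a known shift amount lets known bits flow through a funnel shift (a hypothetical example, using fshl in its rotate form):

    declare i8 @llvm.fshl.i8(i8, i8, i8)

    define i1 @rotl_known_bit(i8 %x) {
      %set = or i8 %x, 1                                    ; bit 0 known to be 1
      %rot = call i8 @llvm.fshl.i8(i8 %set, i8 %set, i8 1)  ; rotate left by 1
      %bit = and i8 %rot, 2                                 ; bit 1 of the result is now known
      %cmp = icmp eq i8 %bit, 2
      ret i1 %cmp                                           ; computeKnownBits can fold this to true
    }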
* [SelectionDAG] fold constant with undef vector per elementSanjay Patel2018-12-021-10/+16
| | | | | | | | | | | | This makes the SDAG behavior consistent with the way we do this in IR. It's possible that we were getting the wrong answer before. For example, 'xor undef, undef --> 0' but 'xor undef, C' --> undef. But the most practical improvement is likely as shown in the tests here - for FP, we were overconstraining undef lanes to NaN, and that can prevent vector simplifications/narrowing (see D51553). llvm-svn: 348090
* [DAGCombiner] guard against an oversized shift crashSanjay Patel2018-12-021-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This change prevents the crash noted in the post-commit comments for rL347478 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision: https://reviews.llvm.org/D54954 llvm-svn: 348089
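A minimal sketch of the hazard (a hypothetical example): an oversized shift amount is not guaranteed to be folded away, so the combine must check for it before reasoning about the shifted bits.

    define i32 @oversized_shift(i32 %x) {
      %s = shl i32 %x, 33   ; shift amount >= bit width; may survive into the DAG
      ret i32 %s
    }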
* [ValueTracking] add helper function for testing implied condition; NFCISanjay Patel2018-12-023-60/+46
| | | | | | | | We were duplicating code around the existing isImpliedCondition() that checks for a predecessor block/dominating condition, so make that a wrapper call. llvm-svn: 348088
* [X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast.Craig Topper2018-12-021-23/+8
| | | | | | Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations. llvm-svn: 348087
* [X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to ↵Craig Topper2018-12-021-2/+8
| | | | | | | | avoid a store/load to/from the stack. Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register. llvm-svn: 348086
* [X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode ↵Craig Topper2018-12-021-3/+4
| | | | | | | | | | similar to what's done when the destination is f64. The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away. By custom legalizing it we can avoid this churn and maybe produce better code. llvm-svn: 348085
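A minimal IR sketch of the bitcast in question (hypothetical example):

    define i64 @cast_v2i32(<2 x i32> %x) {
      ; previously legalized through a stack spill with a truncating store;
      ; now custom legalized to avoid the store/load churn
      %r = bitcast <2 x i32> %x to i64
      ret i64 %r
    }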
* [MachineOutliner][AArch64] Improve checks for stack instructionsJessica Paquette2018-12-011-1/+23
| | | | | | | | | | | | | If we know that we'll definitely save LR to a register, there's no reason to pre-check whether or not a stack instruction is unsafe to fix up. This makes it so that we check for that condition before mapping instructions. This allows us to outline more, since we don't pessimise as many instructions. Also update some tests, since we outline more. llvm-svn: 348081
* [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1Craig Topper2018-12-011-8/+11
| | | | | | | | | | | | | | Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55138 llvm-svn: 348079
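As context, vXi8 MULHU mostly arises from division by a constant; a hypothetical IR input that exercises it:

    define <16 x i8> @udiv_by_7(<16 x i8> %x) {
      ; expanded via a multiply-high whose v16i8 operand must be extended
      ; to v8i16 halves (punpck + xor pre-sse4.1, 3 shuffles with sse4.1)
      %r = udiv <16 x i8> %x, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
      ret <16 x i8> %r
    }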
* [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XORGraham Sellers2018-12-012-7/+55
| | | | | | | | The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNORs, then lowered them. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit. Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adds a test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it. Differential: https://reviews.llvm.org/D55071 llvm-svn: 348075
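A minimal IR sketch of the 64-bit XNOR pattern this split applies to (hypothetical, not from xnor.ll):

    define i64 @xnor_i64(i64 %x, i64 %y) {
      %xor = xor i64 %x, %y
      %not = xor i64 %xor, -1   ; ~(x ^ y): the NOT can stay on the scalar unit
      ret i64 %not
    }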
* [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts ↵Simon Pilgrim2018-12-011-5/+3
| | | | | | | | | | | | simplification D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element. This patch relaxes this to demanding an element if we need any bit from it. Differential Revision: https://reviews.llvm.org/D54761 llvm-svn: 348073
* [InstCombine] Support ssub.sat canonicalization for non-splatsNikita Popov2018-12-012-16/+12
| | | | | | | | | | | | Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also support non-splat vector constants. This is done by generalizing the implementation of the isNotMinSignedValue() helper to return true for constants that are non-splat, but don't contain any signed min elements. Differential Revision: https://reviews.llvm.org/D55011 llvm-svn: 348072
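A hypothetical sketch of the non-splat case now handled; no element of the constant is the signed minimum, so negating it is safe:

    declare <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8>, <2 x i8>)

    define <2 x i8> @ssub_nonsplat(<2 x i8> %x) {
      ; canonicalizable to sadd.sat(%x, <i8 -1, i8 -2>)
      %r = call <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8> %x, <2 x i8> <i8 1, i8 2>)
      ret <2 x i8> %r
    }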
* [ThinLTO] Allow importing of functions with var argsTeresa Johnson2018-12-011-9/+2
| | | | | | | | | | | | | | | | | | Summary: Follow up to D54270, which allowed importing of var args functions unless they called va_start. As pointed out in the post-commit comments on that patch, the inliner can handle functions that call va_start in certain situations as well. Go ahead and enable importing of all var args functions. Measurements on a large binary show that this increases imports and binary size by an insignificant amount. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54607 llvm-svn: 348068
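A sketch of a var args function that calls va_start, the kind that can now be imported (a hypothetical example; the 24-byte va_list size assumes an x86-64 target):

    declare void @llvm.va_start(i8*)
    declare void @llvm.va_end(i8*)

    define i32 @first_vararg(i32 %count, ...) {
      %ap = alloca [24 x i8]
      %ap.ptr = bitcast [24 x i8]* %ap to i8*
      call void @llvm.va_start(i8* %ap.ptr)
      %v = va_arg i8* %ap.ptr, i32
      call void @llvm.va_end(i8* %ap.ptr)
      ret i32 %v
    }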
* [RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test casesAlex Bradbury2018-12-011-26/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>, the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions. SRAW assumed that (sext_inreg foo, i32) could only be produced when sign-extending an i32. However, it can be produced by input such as: define i64 @tricky_ashr(i64 %a, i64 %b) { %1 = shl i64 %a, 32 %2 = ashr i64 %1, 32 %3 = ashr i64 %2, %b ret i64 %3 } It's important not to select sraw in the above case, because sraw only uses the lower 5 bits of the shift amount, while a shift amount of 32-63 would be valid. Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be produced when zero-extending a value that was originally i32 in LLVM IR. This is obviously incorrect. This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and adds test cases that would demonstrate a miscompile if the incorrect patterns were re-added. llvm-svn: 348067
* Use RequireNullTerminator=false in identify_magic.Zachary Turner2018-12-011-1/+1
| | | | | | | | | | | identify_magic does not need the file to be null terminated. Passing true here causes the file reading code to decide not to use mmap in some rare cases (which happen to occur 100% of the time with PDB files), which can lead to very large files failing to load. Since it was probably just an accident that we were passing true here (it is the default function parameter), this should be strictly an improvement. llvm-svn: 348059
* [NVPTX] Add lowering of i128 numbers as struct fieldsArtem Belevich2018-12-011-0/+12
| | | | | | | | | | | Addition to D34555 - override VTs computation with ComputePTXValueVTs for struct fields. Author: Denys Zariaiev <denys.zariaiev@gmail.com> Differential Revision: https://reviews.llvm.org/D55144 llvm-svn: 348057
* LegacyDivergenceAnalysis: fix uninitialized valueNicolai Haehnle2018-11-301-1/+3
| | | | | Change-Id: I014502e431a68f7beddf169f6a3d19dac5dd2c26 llvm-svn: 348051
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsicsNicolai Haehnle2018-11-306-220/+90
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 348050
* AMDGPU: Fix various issues around the VirtReg2Value mappingNicolai Haehnle2018-11-302-31/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress: 1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were: - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers). - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway). 2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping. I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage. There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole. Reviewers: alex-t, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54340 llvm-svn: 348049
* [DA] GPUDivergenceAnalysis for unstructured GPU kernelsNicolai Haehnle2018-11-302-26/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is patch #3 of the new DivergenceAnalysis <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html> The GPUDivergenceAnalysis is intended to eventually supersede the existing LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces incorrect results on unstructured Control-Flow Graphs: <https://bugs.llvm.org/show_bug.cgi?id=37185> This patch adds the option -use-gpu-divergence-analysis to the LegacyDivergenceAnalysis to turn it into a transparent wrapper for the GPUDivergenceAnalysis. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D53493 llvm-svn: 348048
* [MachineOutliner] Outline both register save calls + no LR save calls togetherJessica Paquette2018-11-301-32/+26
| | | | | | | | | | | | | | | | | Instead of treating the outlined functions for these as distinct frames, they should be combined into one case. Neither allows for stack fixups, and both generate the same frame. Thus, they ought to be considered one case. This makes the code far easier to understand, for one thing. It also offers some small code size improvements. It's fairly rare to see a class of outlined functions that doesn't fall entirely into one variant (on CTMark anyway). It does happen from time to time though. This mostly offers some serious simplification. Also update the test to show the added functionality. llvm-svn: 348036
* AArch64: Don't emit CFI for SCS register in nounwind functions.Peter Collingbourne2018-11-301-14/+16
| | | | | | | | | | All that you can legitimately do with the CFI for a nounwind function is get a backtrace, and adjusting the SCS register is not (currently) required for this purpose. Differential Revision: https://reviews.llvm.org/D54988 llvm-svn: 348035
* [Mem2Reg] Fix nondeterministic corner caseJoseph Tremoulet2018-11-301-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | Summary: When mem2reg inserts phi nodes in blocks with unreachable predecessors, it adds undef operands for those incoming edges. When there are multiple such predecessors, the order is currently based on the address of the BasicBlocks. This change fixes that by using the BBNumbers in the sort/search predicates, as is done elsewhere in mem2reg to ensure determinism. Also adds a testcase with a bunch of unreachable preds, which (nondeterministically) fails without the fix. Reviewers: majnemer Reviewed By: majnemer Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D55077 llvm-svn: 348024
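A hypothetical reduction of the scenario: several unreachable predecessors feed the same block, so the inserted phi gets multiple undef incoming edges whose order must be deterministic.

    define i32 @undef_incoming() {
    entry:
      %p = alloca i32
      store i32 1, i32* %p
      br label %merge
    dead1:                      ; unreachable predecessor
      store i32 2, i32* %p
      br label %merge
    dead2:                      ; unreachable predecessor
      store i32 3, i32* %p
      br label %merge
    merge:
      %v = load i32, i32* %p   ; mem2reg inserts a phi with undef operands here
      ret i32 %v
    }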
* [DWARFv5] Verify all-or-nothing constraint on DIFile sourceScott Linder2018-11-301-0/+18
| | | | | | | | | Update IR verifier to check the constraint that DIFile source is present on all files or no files. Differential Revision: https://reviews.llvm.org/D54953 llvm-svn: 348022
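A sketch of the constraint in DIFile metadata (hypothetical snippets):

    ; valid: source is present on every file
    !0 = !DIFile(filename: "a.c", directory: "/tmp", source: "int a;\0A")
    !1 = !DIFile(filename: "b.c", directory: "/tmp", source: "int b;\0A")

    ; now rejected by the verifier: source present on only some files
    !2 = !DIFile(filename: "c.c", directory: "/tmp", source: "int c;\0A")
    !3 = !DIFile(filename: "d.c", directory: "/tmp")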
* [X86] Change vXi8 MULHU lowering to unpack high and low half of lanes ↵Craig Topper2018-11-301-54/+47
| | | | | instead of extracting and concatenating low and high half registers. This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish splitting and extending in one operation. llvm-svn: 348019
* [X86] Prefer lowerVectorShuffleAsBitMask over using a avx512 masked ↵Craig Topper2018-11-301-5/+5
| | | | | | | | operation when avx512bw/avx512vl is enabled. This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But its less instructions and more consistent with previous ISAs. It probably opens up more combine opportunities as one of the test cases demonstrates. llvm-svn: 348018
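A hypothetical shuffle that zeroes alternating lanes, which can be lowered as a bit mask (an AND with a constant) rather than a masked operation:

    define <8 x i16> @blend_with_zero(<8 x i16> %x) {
      ; lanes 1, 3, 5, 7 come from the zero vector, so this is %x AND a mask
      %r = shufflevector <8 x i16> %x, <8 x i16> zeroinitializer,
                         <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
      ret <8 x i16> %r
    }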
* [SelectionDAG] fold FP binops with 2 undef operands to undefSanjay Patel2018-11-301-2/+4
| | | | llvm-svn: 348016
* [AMDGPU] Disable SReg Global LD/ST, perf regressionRon Lieberman2018-11-301-0/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D55093 llvm-svn: 348014
* Revert "[BTF] Add BTF DebugInfo"Yonghong Song2018-11-308-1117/+2
| | | | | | This reverts commit 9c6b970db8bc63b28ce58a129bb1580a6a3c6caf. llvm-svn: 348004