bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[X86] Merge 2 consecutive HasInt256 branches. NFCI.	Simon Pilgrim	2019-09-03	1	-3/+2
	llvm-svn: 370761
*	[SystemZ] Recognize INLINEASM_BR in backend.	Jonas Paulsson	2019-09-03	1	-2/+2
	SystemZInstrInfo::analyzeBranch() needs to check for INLINEASM_BR instructions, or it will crash. Review: Ulrich Weigand llvm-svn: 370753
*	[ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands	David Green	2019-09-03	1	-2/+2
	The code here seems to date back to r134705, when tablegen lowering was first being added. I don't believe that we need to include CPSR implicit operands on the MCInst. This now works more like other backends (like AArch64), where all implicit registers are skipped. This allows the AliasInst for CSEL's to match correctly, as can be seen in the test changes. Differential revision: https://reviews.llvm.org/D66703 llvm-svn: 370745
*	[SystemZ] Add support for fentry.	Jonas Paulsson	2019-09-03	2	-0/+15
	SystemZAsmPrinter now properly emits function calls to __fentry__. Review: Ulrich Weigand llvm-svn: 370743
*	[ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise	David Green	2019-09-03	4	-30/+75
	This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can also be used in ISel Lowering, adding codesize values to the computed costs, to be able to compare either approximate instruction counts or codesize costs. It also adds a HasLowerConstantMaterializationCost, which compares the ConstantMaterializationCost of two values, returning true if the first is smaller either in instruction count/codesize, or falling back to the other in the case that they are equal. This is used in constant CSEL lowering to invert the predicate if the opposite is easier to materialise. Differential revision: https://reviews.llvm.org/D66701 llvm-svn: 370741
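The tie-breaking comparison described in this commit can be sketched roughly as follows. This is not the code from the patch: `MaterializationCost`, `hasLowerCost` and `shouldInvertPredicate` are hypothetical names, and the cost values are supplied by the caller rather than computed for a real ARM subtarget.

```cpp
#include <cassert>

// Hypothetical cost record; the real patch computes costs per subtarget.
struct MaterializationCost {
  unsigned Instructions; // approximate instruction count
  unsigned CodeSize;     // approximate size in bytes
};

// Returns true if A is cheaper to materialise than B. The primary metric is
// picked by ForCodeSize; when the primary metric ties, the other one decides.
bool hasLowerCost(MaterializationCost A, MaterializationCost B,
                  bool ForCodeSize) {
  unsigned PrimaryA = ForCodeSize ? A.CodeSize : A.Instructions;
  unsigned PrimaryB = ForCodeSize ? B.CodeSize : B.Instructions;
  if (PrimaryA != PrimaryB)
    return PrimaryA < PrimaryB;
  unsigned SecondaryA = ForCodeSize ? A.Instructions : A.CodeSize;
  unsigned SecondaryB = ForCodeSize ? B.Instructions : B.CodeSize;
  return SecondaryA < SecondaryB;
}

// Invert the select predicate only when the opposite constant is strictly
// cheaper than the current one (assumed policy, for illustration).
bool shouldInvertPredicate(MaterializationCost Current,
                           MaterializationCost Opposite, bool ForCodeSize) {
  return hasLowerCost(Opposite, Current, ForCodeSize);
}

// Small self-check with made-up cost values.
void checkCostExamples() {
  MaterializationCost OneInsnSmall{1, 2};
  MaterializationCost OneInsnLarge{1, 4};
  // Same instruction count, so the code size breaks the tie.
  assert(hasLowerCost(OneInsnSmall, OneInsnLarge, /*ForCodeSize=*/false));
  assert(!hasLowerCost(OneInsnLarge, OneInsnSmall, /*ForCodeSize=*/false));
}
```

Equal costs on both metrics return false in either direction, so under this sketch a predicate is never inverted without a strict improvement.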
*	[ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions.	David Green	2019-09-03	6	-1/+92
	Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value. This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used. Code by Ranjeet Singh and Simon Tatham, with some modifications from me. Differential revision: https://reviews.llvm.org/D66483 llvm-svn: 370739
*	[mips] Switch to the `.text` section after emitting asm file preamble	Simon Atanasyan	2019-09-03	1	-0/+4
	Now the last `.section` directive in the MIPS asm file preamble is the `.section .mdebug.abi`. If assembler code injected, for example, by the LLVM `module asm` or the C `__asm` directives does not explicitly switch to the `.text` section, it goes to the `.mdebug.abi` section. This might be unexpected to the user and in fact breaks building some existing code, such as FreeBSD libc [1]. The patch forces a switch to the `.text` section after emitting the MIPS assembler file preamble. [1] https://bugs.llvm.org/show_bug.cgi?id=43119 Fix PR43119. Differential Revision: https://reviews.llvm.org/D67014 llvm-svn: 370735
*	[ARM] Fix MVE ldst offset ranges	David Green	2019-09-03	1	-19/+18
	We were using isShiftedInt<7, Shift>(RHSC) to detect the ranges of offsets to fold into MVE loads/stores. The instructions actually take a 7 bit unsigned integer which is either added or subtracted. So something more like isShiftedUInt<7, Shift>(abs(RHSC)). Instead I've changed this to use the isScaledConstantInRange method, same as in SelectT2AddrModeImm7Offset used by pre/post inc, which seemed to already be getting this correct. Differential revision: https://reviews.llvm.org/D66997 llvm-svn: 370731
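To make the range difference concrete, here is a small sketch comparing the two checks named in the message. The helpers are local stand-ins modelled on the semantics of LLVM's `isShiftedInt` and `isShiftedUInt`, not the real headers. `oldRange` and `magnitudeRange` are hypothetical names. The patch itself uses isScaledConstantInRange, which is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in: X fits in a signed (N+S)-bit integer and its low S bits are zero.
template <unsigned N, unsigned S> bool isShiftedInt(int64_t X) {
  const int64_t Limit = int64_t(1) << (N + S - 1);
  return X >= -Limit && X < Limit && X % (int64_t(1) << S) == 0;
}

// Stand-in: X fits in an unsigned (N+S)-bit integer and its low S bits are zero.
template <unsigned N, unsigned S> bool isShiftedUInt(uint64_t X) {
  return X < (uint64_t(1) << (N + S)) && X % (uint64_t(1) << S) == 0;
}

// The check the commit message says was used before: a signed 7-bit range.
template <unsigned Shift> bool oldRange(int64_t Offset) {
  return isShiftedInt<7, Shift>(Offset);
}

// The range the message describes: a 7-bit unsigned magnitude that is either
// added or subtracted, i.e. isShiftedUInt<7, Shift>(abs(Offset)).
template <unsigned Shift> bool magnitudeRange(int64_t Offset) {
  uint64_t Magnitude =
      Offset < 0 ? uint64_t(0) - uint64_t(Offset) : uint64_t(Offset);
  return isShiftedUInt<7, Shift>(Magnitude);
}

// Offsets that the signed check rejects although the encoding allows them.
void checkOffsetExamples() {
  assert(!oldRange<0>(100) && magnitudeRange<0>(100));
  assert(!oldRange<2>(508) && magnitudeRange<2>(508));
}
```

With `Shift` equal to 0 the signed check accepts -64 to 63, while the magnitude check accepts -127 to 127. With `Shift` equal to 2 the ranges are multiples of 4 in -256 to 252 and in -508 to 508 respectively.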
*	[ARM][MVE] Decoding of VMSR doesn't diagnose some unpredictable encodings	Oliver Stannard	2019-09-03	1	-25/+29
	Decoding of VMSR doesn't diagnose some unpredictable encodings, as the unpredictable bits are not correctly set. Diff-reduce this instruction's internals WRT VMRS so I can see the differences better. Mostly this is s/src/Rt/g. Fill in the "should-be-(0)" bits. Designate the Unpredictable{} bits for both VMRS and VMSR. Patch by Mark Murray! Differential revision: https://reviews.llvm.org/D66938 llvm-svn: 370729
*	Bug fix on function epilog optimization (ARM backend)	Oliver Stannard	2019-09-03	1	-2/+3
	To save an 'add sp,#val' instruction by adding registers to the final pop instruction, the first register transferred by this pop instruction needs to be found. If the function to be optimized has a non-void return value, the operand list contains r0 (implicit), which prevents the optimization from taking place. Therefore implicit register references should be skipped in the search loop, because these registers are never popped from the stack. Patch by Rainer Herbertz (rOptimizer)! Differential revision: https://reviews.llvm.org/D66730 llvm-svn: 370728
*	[LV] Fix miscompiles by adding non-header PHI nodes to AllowedExit	Bjorn Pettersson	2019-09-03	1	-0/+1
	Summary: Fold-tail currently supports reduction last-vector-value live-outs, but has yet to support last-scalar-value live-outs, including non-header phis. As it relies on AllowedExit in order to detect them and bail out, we need to add the non-header PHI nodes to AllowedExit, otherwise we end up with miscompiles. Solves https://bugs.llvm.org/show_bug.cgi?id=43166 Reviewers: fhahn, Ayal Reviewed By: fhahn, Ayal Subscribers: anna, hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67074 llvm-svn: 370721
*	[LV] Tail-folding, runtime scev checks	Sjoerd Meijer	2019-09-03	1	-2/+2
	Now that we allow tail-folding, not only when we optimise for size, make sure we do not run into this assert. Differential revision: https://reviews.llvm.org/D66932 llvm-svn: 370711
*	[LV] Tail-folding with runtime memory checks	Sjoerd Meijer	2019-09-03	1	-1/+4
	The loop vectorizer was running into an assert when it tried to fold the tail and had to emit runtime memory disambiguation checks. Differential revision: https://reviews.llvm.org/D66803 llvm-svn: 370707
*	[MachinePipeliner] Add a way to unit-test the schedule emitter	James Molloy	2019-09-03	3	-0/+128
	Emitting a schedule is really hard. There are lots of corner cases to take care of; in fact, of the 60+ SWP-specific testcases in the Hexagon backend most of those are testing codegen rather than the schedule creation itself. One issue is that to test an emission corner case we must craft an input such that the generated schedule uses that corner case; sometimes this is very hard and convolutes testcases. Other times it is impossible but we want to test it anyway. This patch adds a simple test pass that will consume a module containing a loop and generate pipelined code from it. We use post-instr-symbols as a way to annotate instructions with the stage and cycle that we want to schedule them at. We also provide a flag that causes the MachinePipeliner to generate these annotations instead of actually emitting code; this allows us to generate an input testcase with: llc < %s -stop-after=pipeliner -pipeliner-annotate-for-testing -o test.mir And run the emission in isolation with: llc < test.mir -run-pass=modulo-schedule-test llvm-svn: 370705
*	[ARM] Select vmla	Sam Tebbs	2019-09-03	1	-0/+15
	This patch adds vmla selection. Differential revision: https://reviews.llvm.org/D66297 llvm-svn: 370704
*	[X86] Simplify the setOperationAction handling for fp_to_uint by improving the Custom handler a bit.	Craig Topper	2019-09-03	2	-19/+22
	This merges the 32-bit and 64-bit mode code to just use Custom for both i32 and i64. We already had most of the handling in the custom handling due to the AVX512 having legal fp_to_uint. Just needed to add the i32->i64 promotion handling. Refactor the fp_to_uint code in the custom handler to simplify the number of times we check things. Tweak cost model tables to match the default handling we were getting due to Expand before. llvm-svn: 370700
*	[X86] Don't use Expand for i32 fp_to_uint on SSE1/2 targets on 32-bit target.	Craig Topper	2019-09-03	1	-13/+7
	Use Custom lowering instead. Fall back to default expansion only when the scalar FP type belongs in an XMM register. This improves lowering for i32 to fp80, and also i32 to double on SSE1 only. llvm-svn: 370699
*	[LegalizeDAG] Pass DAG to two calls to SDNode::dump in debug prints so that they will print target specific nodes correctly.	Craig Topper	2019-09-03	1	-2/+2
	The dump methods can only print target node names correctly if they can get access to the TLI object. llvm-svn: 370694
*	[X86] Custom promote i32->f80 uint_to_fp on AVX512 64-bit targets.	Craig Topper	2019-09-03	1	-8/+7
	Reuse the same code to promote all i32 uint_to_fp on 64-bit targets to simplify the X86ISelLowering constructor. llvm-svn: 370693
*	[X86] Enable fp128 as a legal type with SSE1 rather than with MMX.	Craig Topper	2019-09-02	1	-2/+2
	FP128 values are passed in xmm registers so should be associated with an SSE feature rather than MMX, which uses a different set of registers. llc enables sse1 and sse2 by default with x86_64, but does not enable mmx. Clang enables all 3 features by default. I've tried to add command lines to test with -sse where possible, but any test that returns a value in an xmm register fails with a fatal error with -sse since we have no defined ABI for that scenario. llvm-svn: 370682
*	[ARM] Use MQPR not QPR for MVE registers	David Green	2019-09-02	3	-96/+98
	We should be using MQPR, and if we don't we can get COPYs and PHIs created for QPR. These get folded into instructions, failing verification checks. Differential revision: https://reviews.llvm.org/D66214 llvm-svn: 370676
*	[TargetLowering][PS4] Add sincos(f) lib functions when target is PS4	Robert Lougher	2019-09-02	1	-0/+5
	PS4 supports sincosf and sincos. Adding the library functions enables the sin(f)+cos(f) -> sincos(f) optimization. Differential Revision: https://reviews.llvm.org/D67009 llvm-svn: 370675
*	[SystemZ] Support constrained fpto[su]i intrinsics	Ulrich Weigand	2019-09-02	3	-16/+32
	Now that constrained fpto[su]i intrinsics are available, add codegen support to the SystemZ backend. In addition to pure back-end changes, I've also needed to add the strict_fp_to_[su]int and any_fp_to_[su]int pattern fragments in the obvious way. llvm-svn: 370674
*	[SVE][Inline-Asm] Support for SVE asm operands	Kerry McLaughlin	2019-09-02	4	-7/+89
	Summary: Adds the following inline asm constraints for SVE: - w: SVE vector register with full range, Z0 to Z31 - x: Restricted to registers Z0 to Z15 inclusive. - y: Restricted to registers Z0 to Z7 inclusive. This change also adds the "z" modifier to interpret a register as an SVE register. Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness. Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened Reviewed By: sdesmalen Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66302 llvm-svn: 370673
*	[X86] getPMOVMSKB - add MVT::v64i8 handling and remove from combineBitcastvxi1. NFCI.	Simon Pilgrim	2019-09-02	1	-11/+12
	llvm-svn: 370670
*	Recommit r370661 "[llvm-nm] - Add a test case for case when we dump a symbol that belongs to a section with a broken sh_name."	George Rimar	2019-09-02	1	-1/+3
	Fix: add a 'consumeError()' call to ObjectFile.cpp. This error was never checked. Original commit message: It adds a test case for a problem fixed by D66976 <https://reviews.llvm.org/D66976>. It was introduced by me in D66089 <https://reviews.llvm.org/D66089>. The error reported was never consumed because of a wrong variable name used, so it could fail when LLVM_ENABLE_ABI_BREAKING_CHECKS is used. Differential revision: https://reviews.llvm.org/D67002 llvm-svn: 370669
*	[DAGCombiner] try to form test+set out of shift+mask patterns	Sanjay Patel	2019-09-02	1	-0/+57
	The motivating bugs are: https://bugs.llvm.org/show_bug.cgi?id=41340 https://bugs.llvm.org/show_bug.cgi?id=42697 As discussed there, we could view this as a failure of IR canonicalization, but then we would need to implement a backend fixup with target overrides to get this right in all cases. Instead, we can just view this as a codegen opportunity. It's not even clear for x86 exactly when we should favor test+set; some CPUs have better theoretical throughput for the ALU ops than bt/test. This patch is made more complicated than I expected because there's an early DAGCombine for 'and' that can change types of the intermediate ops via trunc+anyext. Differential Revision: https://reviews.llvm.org/D66687 llvm-svn: 370668
*	Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"	Jay Foad	2019-09-02	2	-5/+2
	Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet. Reviewers: nhaehnle, arsenm, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65813 llvm-svn: 370667
*	[AMDGPU][MC][GFX10] Corrected constant bus checks to exclude null	Dmitry Preobrazhensky	2019-09-02	1	-3/+6
	See AMD SWDEV-157286 Reviewers: atamazov, arsenm Differential Revision: https://reviews.llvm.org/D65229 llvm-svn: 370665
*	[FileCheck] Forbid using var defined on same line	Thomas Preud'homme	2019-09-02	1	-36/+6
	Summary: Commit r366897 introduced the possibility to set a variable from an expression, such as [[#VAR2:VAR1+3]]. While introducing this feature, it introduced extra logic to allow using such a variable on the same line later on. Unfortunately that extra logic is flawed as it relies on a mapping from variable to expression defining it when the mapping is from variable definition to expression. This flaw causes among other issues PR42896. This commit avoids the problem by forbidding all use of a variable defined on the same line, and removes the now useless logic. Redesign will be done in a later commit because it will require some amount of refactoring first for the solution to be clean. One example is the need for some sort of transaction mechanism to set a variable temporarily and from an expression and rollback if the CHECK pattern does not match so that diagnostics show the right variable values. Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk Subscribers: JonChesterfield, rogfer01, hfinkel, kristina, rnk, tra, arichardson, grimar, dblaikie, probinson, llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D66141 llvm-svn: 370663
*	[AMDGPU][MC][GFX10] Enabled null with 64-bit operands	Dmitry Preobrazhensky	2019-09-02	1	-0/+2
	See Bug 42745: https://bugs.llvm.org/show_bug.cgi?id=42745 Reviewers: atamazov, arsenm https://reviews.llvm.org/D65231 llvm-svn: 370660
*	[InstCombine] recognize bswap disguised as shufflevector	Sanjay Patel	2019-09-02	1	-0/+16
	bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N*8} --> bswap (bitcast X to i{N*8}) In PR43146: https://bugs.llvm.org/show_bug.cgi?id=43146 ...we have a more complicated case where SLP is making a mess of bswap. This patch won't do anything for that currently, but we need to improve bswap recognition in instcombine, SLP, and/or a standalone pass to avoid that problem. This is limited using the data-layout so we don't try to do this transform with actual vector types. The backend does not appear to have folds to convert in either direction, so we don't want to mess up something that is actually better lowered as a shuffle. On x86, we're trading something like this: vmovd %edi, %xmm0 vpshufb LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u] vmovd %xmm0, %eax For: movl %edi, %eax bswapl %eax Differential Revision: https://reviews.llvm.org/D66965 llvm-svn: 370659
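The equivalence behind this fold can be checked with a short sketch for the 4-byte case. It models the byte vector as a `std::array` and the bitcast as a `memcpy`. All function names are hypothetical and nothing here is taken from the InstCombine implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Vec4 = std::array<unsigned char, 4>; // stands in for <4 x i8>

// Models a bitcast from <4 x i8> to i32: reinterpret the same four bytes.
uint32_t bitcastToI32(const Vec4 &V) {
  uint32_t R;
  std::memcpy(&R, V.data(), sizeof R);
  return R;
}

// Plain byte swap written with shifts and masks.
uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000FF00u) | ((X << 8) & 0x00FF0000u) |
         (X << 24);
}

// Left-hand side of the fold: reverse the lanes, then bitcast.
uint32_t shuffleThenBitcast(Vec4 X) {
  std::reverse(X.begin(), X.end()); // lane-reversing shuffle mask
  return bitcastToI32(X);
}

// Right-hand side of the fold: bitcast first, then byte-swap the scalar.
uint32_t bitcastThenBswap(const Vec4 &X) { return bswap32(bitcastToI32(X)); }

// Both sides agree regardless of host endianness.
void checkBswapExample() {
  Vec4 X = {0x11, 0x22, 0x33, 0x44};
  assert(shuffleThenBitcast(X) == bitcastThenBswap(X));
}
```

Reversing the bytes in memory and swapping the bytes of the scalar are the same permutation, which is why the result does not depend on whether the host is little- or big-endian.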
*	[llvm-dlltool] Handle external and internal names with differing decoration	Martin Storsjo	2019-09-02	1	-1/+12
	Also add a missed part of the test from SVN r369747. Differential Revision: https://reviews.llvm.org/D66996 llvm-svn: 370656
*	[llvm-dlltool] Remove support for implying output name	Martin Storsjo	2019-09-02	1	-10/+2
	I don't see GNU dlltool supporting doing this; with only a -d option and no -l option, GNU dlltool runs successfully but doesn't write any output file. Differential Revision: https://reviews.llvm.org/D65645 llvm-svn: 370655
*	[AMDGPU][MC][GFX10] Corrected constant bus limit for 64-bit shift instructions	Dmitry Preobrazhensky	2019-09-02	1	-4/+23
	See bug 42744: https://bugs.llvm.org/show_bug.cgi?id=42744 Reviewers: atamazov, arsenm Differential Revision: https://reviews.llvm.org/D65228 llvm-svn: 370652
*	[X86][BtVer2] Fix latency and throughput of conditional SIMD store instructions.	Andrea Di Biagio	2019-09-02	12	-29/+100
	On BtVer2 conditional SIMD stores are heavily microcoded. The latency is directly proportional to the number of packed elements extracted from the input vector. Also, according to micro-benchmarks, most of the computation seems to be done in the integer unit. Only a minority of the uOPs is executed by the FPU. The observed behaviour on the FPU looks similar to this: - The input MASK value is moved to the Integer Unit -- [ a VMOVMSK-like uOP-executed on JFPU0]. - In parallel, each element of the input XMM/YMM is extracted and then sent to the IntegerUnit through JFPU1. As expected, a (conditional) store is executed for every extracted element. Interestingly, a (speculative) load is executed for every extracted element too. It is as if a "LOAD - BIT_EXTRACT- CMOV" sequence of uOPs is repeated by the integer unit for every conditionally stored element. VMASKMOVDQU is a special case: the number of speculative loads is always 2 (presumably, one load per quadword). That means extra shifts and masking are performed on (one of) the loaded quadwords before each conditional store (that also explains the big number of non-FP uOPs retired). This patch replaces the existing writes for conditional SIMD stores (i.e. WriteFMaskedStore, and WriteFMaskedStoreY) with the following new writes: WriteFMaskedStore32 [ XMM Packed Single ] WriteFMaskedStore32Y [ YMM Packed Single ] WriteFMaskedStore64 [ XMM Packed Double ] WriteFMaskedStore64Y [ YMM Packed Double ] Added a wrapper class named X86SchedWriteMaskMove in X86Schedule.td to describe both RM and MR variants for conditional SIMD moves in a single tablegen definition. Instances of that class are then passed in input to multiclass avx_movmask_rm when constructing MASKMOVPS/PD definitions. Since this patch introduces new writes, I had to update all the X86 scheduling models.
Differential Revision: https://reviews.llvm.org/D66801 llvm-svn: 370649
*	[DebugInfo] LiveDebugValues: correctly discriminate kinds of variable locations	Jeremy Morse	2019-09-02	2	-4/+20
	The missing line added by this patch ensures that only spilt variable locations are candidates for being restored from the stack. Otherwise, register or constant-value information can be interpreted as a spill location, through a union. The added regression test replicates a scenario where this occurs: the stack load from [rsp] causes the register-location DBG_VALUE to be "restored" to rsi, when it should be left alone. See PR43058 for details. Un x-fail a test that was suffering from this from a previous patch. Differential Revision: https://reviews.llvm.org/D66895 llvm-svn: 370648
*	[X86] combineHorizontalPredicateResult - pull out repeated getTargetLoweringInfo() calls. NFCI.	Simon Pilgrim	2019-09-02	1	-2/+2
	llvm-svn: 370637
*	[yaml2obj] - Allow overriding sh_name fields of the sections.	George Rimar	2019-09-02	2	-4/+9
	This is in line with the previous changes which allowed overriding sh_offset/sh_size, and is useful for writing test cases. Differential revision: https://reviews.llvm.org/D66998 llvm-svn: 370633
*	[DWARFVerifier] Verify GNU extensions of call site DWARF symbols	Djordje Todorovic	2019-09-02	1	-2/+7
	Verify that the call site DWARF symbols (added during the implementation of the debug entry values feature) are generated properly. Differential Revision: https://reviews.llvm.org/D66865 llvm-svn: 370631
*	[AArch64][GlobalISel] Fix zext narrowScalar to use the right type when creating	Amara Emerson	2019-09-02	1	-3/+5
	the merges. Fixes PR43171. llvm-svn: 370627
*	[X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load	Craig Topper	2019-09-01	3	-10/+155
	MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 sometimes this load is a broadcast load which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point. Differential Revision: https://reviews.llvm.org/D67017 llvm-svn: 370620
*	[DAGCombiner] improve throughput of shift+logic+shift	Sanjay Patel	2019-09-01	1	-0/+74
	The motivating case for this is a long way from here: https://bugs.llvm.org/show_bug.cgi?id=43146 ...but I think this is where we have to start. We need to canonicalize/optimize sequences of shift and logic to ease pattern matching for things like bswap and improve perf in general. But without the artificial limit of '!LegalTypes' (early combining), there are a lot of test diffs, and not all are good. In the minimal tests added for this proposal, x86 should have better throughput in all cases. AArch64 is neutral for scalar tests because it can fold shifts into bitwise logic ops. There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns: https://rise4fun.com/Alive/VlI https://rise4fun.com/Alive/n1m https://rise4fun.com/Alive/1Vn Differential Revision: https://reviews.llvm.org/D67021 llvm-svn: 370617
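The message does not spell out the rewrite, so the sketch below assumes it has the form shift (logic (shift X, C0), Y), C1 --> logic (shift X, C0+C1), (shift Y, C1), with the same shift opcode used for both shifts. Under that assumption it checks the identity on 32-bit values for all 9 combinations of the 3 shift opcodes and 3 logic opcodes. The enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class Shift { Shl, LShr, AShr };
enum class Logic { And, Or, Xor };

// Applies a 32-bit shift; Amt must be less than 32.
uint32_t applyShift(Shift Op, uint32_t V, unsigned Amt) {
  switch (Op) {
  case Shift::Shl:
    return V << Amt;
  case Shift::LShr:
    return V >> Amt;
  case Shift::AShr: {
    // Portable arithmetic shift: fill the vacated high bits with the sign bit.
    uint32_t Fill = (V & 0x80000000u) ? ~(0xFFFFFFFFu >> Amt) : 0u;
    return (V >> Amt) | Fill;
  }
  }
  return V;
}

uint32_t applyLogic(Logic Op, uint32_t A, uint32_t B) {
  switch (Op) {
  case Logic::And:
    return A & B;
  case Logic::Or:
    return A | B;
  case Logic::Xor:
    return A ^ B;
  }
  return A;
}

// Checks the assumed fold for every shift/logic pair; needs C0 + C1 < 32.
bool foldHolds(uint32_t X, uint32_t Y, unsigned C0, unsigned C1) {
  const Shift Shifts[] = {Shift::Shl, Shift::LShr, Shift::AShr};
  const Logic Logics[] = {Logic::And, Logic::Or, Logic::Xor};
  for (Shift S : Shifts) {
    for (Logic L : Logics) {
      uint32_t Before = applyShift(S, applyLogic(L, applyShift(S, X, C0), Y), C1);
      uint32_t After = applyLogic(L, applyShift(S, X, C0 + C1), applyShift(S, Y, C1));
      if (Before != After)
        return false;
    }
  }
  return true;
}

void checkFoldExample() { assert(foldHolds(0xDEADBEEFu, 0x12345678u, 3, 5)); }
```

In the rewritten form the two shifts no longer depend on each other, which is a plausible reason for the throughput gain the message mentions.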
*	[X86][AVX] Rename + cleanup lowerShuffleAsLanePermuteAndBlend. NFCI.	Simon Pilgrim	2019-09-01	1	-28/+30
	Rename to lowerShuffleAsLanePermuteAndShuffle to make it clear that not just blends are performed. Cleanup the in-lane shuffle mask generation to make it more obvious what's going on. Some prep work noticed while investigating the poor shuffle code mentioned in D66004. llvm-svn: 370613
*	Fix shadow variable warning. NFCI.	Simon Pilgrim	2019-09-01	1	-3/+3
	llvm-svn: 370610
*	[ConstantFolding] Fix 'undef' folding for @llvm.[us]{add,sub}.with.overflow ops (PR43188)	Roman Lebedev	2019-09-01	1	-11/+18
	As we have already established/fixed in https://bugs.llvm.org/show_bug.cgi?id=42209 https://reviews.llvm.org/D63065 https://reviews.llvm.org/rL363522 the InstSimplify handling for @llvm.with.overflow ops with undefs is correct. Therefore if ConstantFolding produces different results, then it is wrong. This duplication of code hints at the need for some refactoring, but for now address the brokenness of ConstantFolding by copying the known-good handling from rL363522. Fixes https://bugs.llvm.org/show_bug.cgi?id=43188 llvm-svn: 370608
*	[ARM] Remove MVE masked loads/stores	David Green	2019-09-01	3	-127/+0
	These were never enabled correctly and are causing other problems. Taking them out for the moment, whilst we work on the issues. This reverts r370329. llvm-svn: 370607
*	[TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs	Shiva Chen	2019-09-01	2	-45/+88
	Summary: This fixes Bugzilla ID 43183, which was triggered by the following commit: [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall llvm-svn: 370604
*	AMDGPU: Remove unused custom node definition	Matt Arsenault	2019-09-01	3	-12/+0
	llvm-svn: 370603
*	[X86] Replace some COPY_TO_REGCLASS from GR32/GR64 to VR128 in isel patterns with VMOVDI2PDIrr/VMOV64toPQIrr.	Craig Topper	2019-08-31	1	-22/+18
	This is what the copies will eventually be turned into. We don't use COPY_TO_REGCLASS for scalar_to_vector patterns. So we should use the real instruction here too. llvm-svn: 370601