bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores"	Amara Emerson	2019-04-19	1	-5/+8
	This introduces some runtime failures which I'll need to investigate further. llvm-svn: 358771
*	[GlobalISel][AArch64] Legalize vector G_FPOW	Jessica Paquette	2019-04-19	1	-2/+2
	This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc. Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change. Differential Revision: https://reviews.llvm.org/D60218 llvm-svn: 358764
*	[CodeGen] Add "const" to MachineInstr::mayAlias	Bjorn Pettersson	2019-04-19	14	-54/+72
	Summary: The basic idea here is to make it possible to use MachineInstr::mayAlias also when the MachineInstr is const (or the "Other" MachineInstr is const). The addition of const in MachineInstr::mayAlias then rippled down to the need for adding const in several other places, such as TargetInstrInfo::getMemOperandWithOffset. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60856 llvm-svn: 358744
*	[AMDGPU] Ignore non-SUnits edges	Piotr Sobczak	2019-04-19	1	-0/+2
	Summary: Ignore edges to non-SUnits (e.g. ExitSU) when checking for low latency instructions. When calling the function isLowLatencyInstruction(), an ExitSU could be on the list of successors, not necessarily a regular SU. In other places in the code there is a check "Succ->NodeNum >= DAGSize" to prevent further processing of ExitSU, as "Succ->getInstr()" is NULL in such a case. Also, 8 out of 9 cases of "SUnit *Succ = SuccDep.getSUnit()" have the guard, so it is clearly an omission here. Change-Id: Ica86f0327c7b2e6bcb56958e804ea6c71084663b Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60864 llvm-svn: 358740
*	[X86] Turn (and (shl X, C1), C2) into (shl (and X, (C2 >> C1)), C1) if the AND could match a movzx.	Craig Topper	2019-04-19	1	-0/+3
	Could get further improvements by recognizing (i64 and (anyext (i32 shl))). llvm-svn: 358737
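The rewrite in this commit can be read as the identity (and (shl X, C1), C2) == (shl (and X, (C2 >> C1)), C1), with a logical right shift on the constant; that is the form for which the two sides agree. A minimal sketch, assuming 32-bit wraparound and helper names that are purely illustrative (they are not LLVM functions), checks the identity on a few values. With C1 = 2 and C2 = 0x3FC the inner mask becomes 0xFF, which is the shape a movzx can cover.

```python
MASK32 = 0xFFFFFFFF  # assumption: model i32 arithmetic


def and_of_shl(x, c1, c2):
    """(and (shl X, C1), C2) on 32-bit values."""
    return ((x << c1) & MASK32) & c2


def shl_of_and(x, c1, c2):
    """(shl (and X, (C2 >> C1)), C1) on 32-bit values."""
    return ((x & (c2 >> c1)) << c1) & MASK32


# Shifting the mask right by C1 turns 0x3FC into 0xFF,
# so the AND becomes a byte zero-extension followed by a shift.
for x in (0, 1, 0x12345678, 0xFFFFFFFF, 0x80000001):
    assert and_of_shl(x, 2, 0x3FC) == shl_of_and(x, 2, 0x3FC)
```

The sketch only demonstrates the arithmetic equivalence; whether the backend applies it depends on the conditions checked in the patch itself.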
*	[X86] Make sure we copy the HandleSDNode back to N before executing the default code after the switch in matchAddressRecursively	Craig Topper	2019-04-19	1	-7/+7
	Summary: There are two places where we create a HandleSDNode in address matching in order to handle the case where N is changed by CSE. But if we end up not matching, we fall back to code at the bottom of the switch that really would like N to point to something that wasn't CSEd away. So we should make sure we copy the handle back to N on any paths that can reach that code. This appears to be the true reason we needed to check DELETED_NODE in the negation matching. In pr32329.ll we had two subtracts back to back. We recursed through the first subtract, and onto the second subtract. The second subtract called matchAddressRecursively on its LHS which caused that subtract to CSE. We ultimately failed the match and ended up in the default code. But N was pointing at the old node that had been deleted, but the default code didn't know that and took it as the base register. Then we unwound back to the first subtract and tried to access this bogus base reg requiring the check for deleted node. With this patch we now use the CSE result as the base reg instead. matchAdd has been broken since sometime in 2015 when it was pulled out of the switch into a helper function. The assignment to N at the end was still there, but N was passed by value and not by reference so the update didn't go anywhere. Reviewers: niravd, spatel, RKSimon, bkramer Reviewed By: niravd Subscribers: llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D60843 llvm-svn: 358735
*	[GlobalISel][AArch64] Legalize/select G_(S/Z/ANY)_EXT for v8s8s	Jessica Paquette	2019-04-18	1	-1/+2
	This adds legalization for G_SEXT, G_ZEXT, and G_ANYEXT for v8s8s. We were falling back on G_ZEXT in arm64-vabs.ll before, preventing us from selecting the @llvm.aarch64.neon.sabd.v8i8 intrinsic. This adds legalizer support for those 3, which gives us selection via the importer. Update the relevant tests (legalize-ext.mir, select-int-ext.mir) and add a GISel line to arm64-vabs.ll. Differential Revision: https://reviews.llvm.org/D60881 llvm-svn: 358715
*	[GlobalISel][AArch64] Legalize v8s8 loads	Jessica Paquette	2019-04-18	1	-0/+1
	Add legalizer support for loads of v8s8 and update legalize-load-store.mir. Differential Revision: https://reviews.llvm.org/D60877 llvm-svn: 358714
*	[X86] combineVectorTruncationWithPACKUS - remove split/concatenation of mask	Simon Pilgrim	2019-04-18	1	-23/+6
	combineVectorTruncationWithPACKUS is currently splitting the upper bit masking into 128-bit subregs and then concatenating them back together. This was originally done to avoid regressions that caused existing subregs to be concatenated to the larger type just for the AND masking before being extracted again. This was fixed by @spatel (notably rL303997 and rL347356). This also lets SimplifyDemandedBits do some further improvements before it hits the recursive depth limit. My only annoyance with this is that we were broadcasting some xmm masks but we seem to have lost them by moving to ymm - but that's a known issue as the logic in lowerBuildVectorAsBroadcast isn't great. Differential Revision: https://reviews.llvm.org/D60375#inline-539623 llvm-svn: 358692
*	[X86][SSE] Lower ICMP EQ(AND(X,C),C) -> SRA(SHL(X,LOG2(C)),BW-1) iff C is power-of-2.	Simon Pilgrim	2019-04-18	1	-37/+21
	This replaces the MOVMSK combine introduced at D52121/rL342326 (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)) with the more general icmp lowering so it can pick up more cases through bitcasts - notably vXi8 cases which use vXi16 shifts+masks, this patch can remove the mask and use pcmpgtb(0,x) for the sra. Differential Revision: https://reviews.llvm.org/D60625 llvm-svn: 358651
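The lowering above can be illustrated on a single 8-bit lane. The sketch below is an assumption-laden reading, not the backend code: it takes the left shift to be the amount that moves the tested bit into the sign position (BW-1-log2(C)), then smears that sign bit across the lane with an arithmetic right shift by BW-1. The function names are hypothetical.

```python
def icmp_eq_and(x, c, bw=8):
    """Reference result: all-ones lane if (x & c) == c, otherwise zero."""
    return (1 << bw) - 1 if (x & c) == c else 0


def sra_of_shl(x, c, bw=8):
    """Move the tested bit to the sign position, then arithmetic-shift it across the lane."""
    mask = (1 << bw) - 1
    log2c = c.bit_length() - 1                 # valid because c is a power of two
    shifted = (x << (bw - 1 - log2c)) & mask   # tested bit is now the sign bit
    # Reinterpret as a signed lane so Python's >> behaves as an arithmetic shift.
    signed = shifted - (1 << bw) if shifted >> (bw - 1) else shifted
    return (signed >> (bw - 1)) & mask


# Exhaustive check over every 8-bit value and every single-bit constant.
for bit in range(8):
    for x in range(256):
        assert icmp_eq_and(x, 1 << bit) == sra_of_shl(x, 1 << bit)
```

The equivalence only holds because C has exactly one bit set; with a multi-bit C the comparison tests several bits at once and a single sign bit cannot represent the result.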
*	[PowerPC] Fix wrong ElemSize when calling isConsecutiveLS()	Kang Zhang	2019-04-18	1	-1/+1
	Summary: This issue is from Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177 When the two operands for BUILD_VECTOR are the same, we get an assertion failure. llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&): Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) && "The loads cannot be both consecutive and reverse consecutive."' failed. This error is caused by the wrong ElemSize when calling isConsecutiveLS(). We should use `getScalarType().getStoreSize();` to get the ElemSize instead of `getScalarSizeInBits() / 8`. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60811 llvm-svn: 358644
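The message does not say which element types triggered the assertion. One way the two expressions can disagree is for scalar widths that are not a whole number of bytes: integer division by 8 truncates, while a store size rounds up to whole bytes. The sketch below assumes that rounding rule and uses hypothetical helper names.

```python
def size_in_bits_div_8(bits):
    """Models `getScalarSizeInBits() / 8` with truncating integer division."""
    return bits // 8


def store_size(bits):
    """Assumed model of a store size: bits rounded up to whole bytes."""
    return (bits + 7) // 8


# For byte-multiple widths the two agree; for narrow widths they do not.
for bits in (1, 4, 8, 16, 32, 64):
    print(bits, size_in_bits_div_8(bits), store_size(bits))
```

An element size of zero would make every pair of loads look both consecutive and reverse consecutive, which is consistent with the assertion text quoted above, though the message does not confirm that this was the failing case.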
*	[AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0))	Tim Renouf	2019-04-18	1	-0/+10
	fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but creating the new fadd folds to just fneg(A). When A has multiple uses, this confuses it and you get an assert. Fixed. Differential Revision: https://reviews.llvm.org/D60633 Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c llvm-svn: 358640
*	[x86] try to widen 'shl' as part of LEA formation	Sanjay Patel	2019-04-17	1	-0/+36
	The test file has pairs of tests that are logically equivalent: https://rise4fun.com/Alive/2zQ
	  %t4 = and i8 %t1, 8
	  %t5 = zext i8 %t4 to i16
	  %sh = shl i16 %t5, 2
	  %t6 = add i16 %sh, %t0
	=>
	  %t4 = and i8 %t1, 8
	  %sh2 = shl i8 %t4, 2
	  %z5 = zext i8 %sh2 to i16
	  %t6 = add i16 %z5, %t0
	...so if we can fold the shift op into LEA in the 1st pattern, then we should be able to do the same in the 2nd pattern (unnecessary 'movzbl' is a separate bug I think). We don't want to do this any sooner though because that would conflict with generic transforms that try to narrow the width of the shift. Differential Revision: https://reviews.llvm.org/D60789 llvm-svn: 358622
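The equivalence of the two IR patterns in this message can be checked by brute force. The sketch below models i8 and i16 with explicit masks; the function names are illustrative only. The patterns agree because the masked value is at most 8, so shifting it left by 2 never drops a bit out of the 8-bit type.

```python
def pattern_wide_shift(t1, t0):
    """and i8, zext to i16, shl i16 by 2, add i16."""
    t4 = t1 & 8
    t5 = t4                      # zext i8 -> i16 leaves the value unchanged
    sh = (t5 << 2) & 0xFFFF
    return (sh + t0) & 0xFFFF


def pattern_narrow_shift(t1, t0):
    """and i8, shl i8 by 2, zext to i16, add i16."""
    t4 = t1 & 8
    sh2 = (t4 << 2) & 0xFF       # at most 32, so nothing is shifted out of i8
    z5 = sh2                     # zext i8 -> i16
    return (z5 + t0) & 0xFFFF


# Every i8 input, with a handful of i16 addends.
for t1 in range(256):
    for t0 in (0, 1, 0x7FFF, 0xFFE0, 0xFFFF):
        assert pattern_wide_shift(t1, t0) == pattern_narrow_shift(t1, t0)
```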
*	[AsmPrinter] hoist %a output template to base class for ARM+Aarch64	Nick Desaulniers	2019-04-17	2	-14/+0
	Summary: X86 is quite complicated; so I intend to leave it as is. ARM+Aarch64 do basically the same thing (Aarch64 did not correctly handle immediates, ARM has a test llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll that uses %a with an immediate) for a flag that should be target independent anyways. Reviewers: echristo, peter.smith Reviewed By: echristo Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60841 llvm-svn: 358618
*	[GlobalISel] Add legalization support for non-power-2 loads and stores	Amara Emerson	2019-04-17	1	-8/+5
	Legalize things like i24 load/store by splitting them into smaller power of 2 operations. This matches how SelectionDAG handles these operations. Differential Revision: https://reviews.llvm.org/D59971 llvm-svn: 358613
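As an illustration of "splitting into smaller power of 2 operations", the sketch below breaks a byte count into power-of-two pieces, largest first. This is an assumed splitting strategy for demonstration; the message does not describe the order or sizes the legalizer actually picks, and the function name is hypothetical.

```python
def split_into_pow2_pieces(size_bytes):
    """Return (offset, width) pairs in bytes, each width a power of two."""
    pieces = []
    offset = 0
    remaining = size_bytes
    while remaining:
        width = 1 << (remaining.bit_length() - 1)  # largest power of two <= remaining
        pieces.append((offset, width))
        offset += width
        remaining -= width
    return pieces


# An i24 access is 3 bytes: one 2-byte piece at offset 0, one 1-byte piece at offset 2.
print(split_into_pow2_pieces(3))  # [(0, 2), (2, 1)]
```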
*	[AsmPrinter] defer %c to base class for ARM, PPC, and Hexagon. NFC	Nick Desaulniers	2019-04-17	3	-12/+4
	Summary: None of these derived classes do anything that the base class cannot. If we remove these case statements, then the base class can handle them just fine. Reviewers: peter.smith, echristo Reviewed By: echristo Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60803 llvm-svn: 358603
*	[AMDGPU][MC] Corrected handling of "-" before expressions	Dmitry Preobrazhensky	2019-04-17	1	-38/+58
	See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60622 llvm-svn: 358596
*	AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions	Rhys Perry	2019-04-17	1	-0/+4
	Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches. Reviewers: arsen, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60824 llvm-svn: 358592
*	[AMDGPU][MC] Corrected parsing of registers	Dmitry Preobrazhensky	2019-04-17	1	-27/+126
	See bug 41280: https://bugs.llvm.org/show_bug.cgi?id=41280 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60621 llvm-svn: 358581
*	[AMDGPU] Flag new raw/struct atomic ops as source of divergence	Tim Renouf	2019-04-17	1	-0/+22
	Differential Revision: https://reviews.llvm.org/D60731 Change-Id: I821d93dec8b9cdd247b8172d92fb5e15340a9e7d llvm-svn: 358579
*	[CostModel][X86] Add bool anyof/allof reduction costs	Simon Pilgrim	2019-04-17	1	-0/+42
	On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized; annoyingly, AVX512 isn't and produces code that is almost as bad as the (unchanged) costs suggest. Differential Revision: https://reviews.llvm.org/D60403 llvm-svn: 358574
*	[X86] In CopyToFromAsymmetricReg, use VR128 instead of FR32 instructions for GR32<->XMM register copies.	Craig Topper	2019-04-17	1	-12/+12
	We have two versions of some instructions, VR128 versions and FR32 versions that are marked as CodeGenOnly. This change switches to using the VR128 versions for these copies. It's after register allocation so the class size no longer matters. This matches how GR64 works. llvm-svn: 358555
*	[NVPTXAsmPrinter] clean up dead code. NFC	Nick Desaulniers	2019-04-16	2	-45/+6
	Summary: The printOperand function takes a default parameter, for which there are zero call sites that explicitly pass such a parameter. As such, there is no case to support. This means that the method printVecModifiedImmediate is purely dead code, and can be removed. The eventual goal for some of these AsmPrinter refactorings is to have printOperand be a virtual method, making it easier to print operands from the base class for more generic Asm printing. It will help if all printOperand methods have the same function signature (i.e. no Modifier argument when not needed). Reviewers: echristo, tra Reviewed By: echristo Subscribers: jholewinski, hiraditya, llvm-commits, craig.topper, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60727 llvm-svn: 358527
*	[TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359)	Simon Pilgrim	2019-04-16	6	-11/+10
	As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious.
	  shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask
	  preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair
	llvm-svn: 358526
*	[X86][AVX] X86ISD::PERMV/PERMV3 node types can never fold index ops	Simon Pilgrim	2019-04-16	1	-5/+13
	Improves codegen demonstrated by D60512 - instructions represented by X86ISD::PERMV/PERMV3 can never memory fold the operand used for their index register. This patch updates the 'isUseOfShuffle' helper into the more capable 'isFoldableUseOfShuffle' that recognises that the op is used for a X86ISD::PERMV/PERMV3 index mask and can't be folded - allowing us to use broadcast/subvector-broadcast ops to reduce the size of the mask constant pool data. Differential Revision: https://reviews.llvm.org/D60562 llvm-svn: 358516
*	[Hexagon] Remove indeterministic traversal order	Krzysztof Parzyszek	2019-04-16	1	-9/+8
	Patch by Sergei Larin. llvm-svn: 358505
*	[RISCV] Custom lower SHL_PARTS, SRA_PARTS, SRL_PARTS	Luis Marques	2019-04-16	2	-3/+108
	When not optimizing for minimum size (-Oz) we custom lower wide shifts (SHL_PARTS, SRA_PARTS, SRL_PARTS) instead of expanding to a libcall. Differential Revision: https://reviews.llvm.org/D59477 llvm-svn: 358498
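For readers unfamiliar with the *_PARTS nodes, the sketch below shows what a left shift of a double-width value means when it is expressed on two register-sized halves. It models the semantics of SHL_PARTS only, assuming a 32-bit register width; it is not the instruction sequence the RISC-V backend emits, and the function name is hypothetical.

```python
XLEN = 32                    # assumption: RV32, so a wide value is 64 bits
MASK = (1 << XLEN) - 1


def shl_parts(lo, hi, shamt):
    """Shift the 64-bit value hi:lo left by shamt (0 <= shamt < 64); return (lo, hi)."""
    if shamt >= XLEN:
        # Everything in the low half moves into the high half.
        return 0, (lo << (shamt - XLEN)) & MASK
    if shamt == 0:
        return lo, hi
    new_lo = (lo << shamt) & MASK
    # Bits shifted out of the low half are carried into the high half.
    new_hi = ((hi << shamt) | (lo >> (XLEN - shamt))) & MASK
    return new_lo, new_hi


# Compare against a plain 64-bit shift for every shift amount.
value = 0x0123456789ABCDEF
for shamt in range(64):
    lo, hi = shl_parts(value & MASK, value >> XLEN, shamt)
    assert (hi << XLEN) | lo == (value << shamt) & ((1 << 64) - 1)
```

SRL_PARTS and SRA_PARTS follow the same shape in the other direction, with SRA_PARTS filling the high half with copies of the sign bit.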
*	[X86] Limit the 'x' inline assembly constraint to zmm0-15 when used for a 512 type.	Craig Topper	2019-04-15	3	-1/+8
	The 'v' constraint is used to select zmm0-31. This makes 512-bit consistent with 128/256-bit. llvm-svn: 358450
*	AMDGPU: Fix unreachable when counting register usage of SGPR96	Matt Arsenault	2019-04-15	1	-0/+3
	llvm-svn: 358447
*	AMDGPU: Fix printed format of SReg_96	Matt Arsenault	2019-04-15	1	-0/+3
	These are artificial, so I think this should only come up with inline asm comments. llvm-svn: 358446
*	[X86] Block i32/i64 for 'k' and 'Yk' in getRegForInlineAsmConstraint without avx512bw.	Craig Topper	2019-04-15	1	-24/+21
	32 and 64 bit k-registers require avx512bw. If we don't block this properly, it leads to a crash. llvm-svn: 358436
*	Add explicit dependency to MCDwarf.h in ARC backend.	Pete Couperus	2019-04-15	1	-0/+1
	llvm-svn: 358430
*	[X86] Restore the pavg intrinsics.	Craig Topper	2019-04-15	1	-0/+6
	The pattern we replaced these with may be too hard to match as demonstrated by PR41496 and PR41316. This patch restores the intrinsics and then we can start focusing on optimizing the intrinsics. I've mostly reverted the original patch that removed them, though I modified the avx512 intrinsics to not have masking built in. Differential Revision: https://reviews.llvm.org/D60674 llvm-svn: 358427
*	Add slbfee instruction.	Sean Fertile	2019-04-15	4	-0/+7
	llvm-svn: 358425
*	[AMDGPU] Fixed incorrect test in vcnd/vcmp optimization	Tim Renouf	2019-04-15	1	-1/+1
	This fixes a test I introduced in change D59191 (that added src0 and src1 modifiers to the v_cndmask instruction for disassembly purposes). Spotted by David Binderman in bug 41488. Differential Revision: https://reviews.llvm.org/D60652 Change-Id: I6ac95e66cd84e812ed3359ad57bcd0e13198ba0c llvm-svn: 358392
*	[Sparc] Fix typo. NFC.	Jim Lin	2019-04-15	1	-2/+2
	llvm-svn: 358370
*	[GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.	Amara Emerson	2019-04-15	2	-19/+14
	Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't be degraded. This change also improves the IRTranslator so that in most places, but not all, it creates constants using the MIRBuilder directly instead of first creating a new destination vreg and then creating a constant. By doing this, the buildConstant() method can just return the vreg of an existing G_CONSTANT instead of having to create a COPY from it. I measured a 0.2% improvement in compile time and a 0.9% improvement in code size at -O0 ARM64.
	Compile time:
	Program                                      base     cse      diff
	test-suite...ark/tramp3d-v4/tramp3d-v4.test  9.04     9.12     0.8%
	test-suite...Mark/mafft/pairlocalalign.test  2.68     2.66     -0.7%
	test-suite...-typeset/consumer-typeset.test  5.53     5.51     -0.4%
	test-suite :: CTMark/lencod/lencod.test      5.30     5.28     -0.3%
	test-suite :: CTMark/Bullet/bullet.test      25.82    25.76    -0.2%
	test-suite...:: CTMark/ClamAV/clamscan.test  6.92     6.90     -0.2%
	test-suite...TMark/7zip/7zip-benchmark.test  34.24    34.17    -0.2%
	test-suite :: CTMark/SPASS/SPASS.test        6.25     6.24     -0.1%
	test-suite...:: CTMark/sqlite3/sqlite3.test  1.66     1.66     -0.1%
	test-suite :: CTMark/kimwitu++/kc.test       13.61    13.60    -0.0%
	Geomean difference                                             -0.2%
	Code size:
	Program                                      base     cse      diff
	test-suite...-typeset/consumer-typeset.test  1315632  1266480  -3.7%
	test-suite...:: CTMark/ClamAV/clamscan.test  1313892  1297508  -1.2%
	test-suite :: CTMark/lencod/lencod.test      1439504  1423112  -1.1%
	test-suite...TMark/7zip/7zip-benchmark.test  2936980  2904172  -1.1%
	test-suite :: CTMark/Bullet/bullet.test      3478276  3445460  -0.9%
	test-suite...ark/tramp3d-v4/tramp3d-v4.test  8082868  8033492  -0.6%
	test-suite :: CTMark/kimwitu++/kc.test       3870380  3853972  -0.4%
	test-suite :: CTMark/SPASS/SPASS.test        1434904  1434896  -0.0%
	test-suite...Mark/mafft/pairlocalalign.test  764528   764528   0.0%
	test-suite...:: CTMark/sqlite3/sqlite3.test  782092   782092   0.0%
	Geomean difference                                             -0.9%
	Differential Revision: https://reviews.llvm.org/D60580 llvm-svn: 358369
*	[GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs.	Amara Emerson	2019-04-15	5	-0/+31
	Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE configs that can be passed between TargetPassConfig and the targets' custom pass configs. This CSEConfigBase allows targets to create custom CSE configs which are then used by the GISel passes for the CSEMIRBuilder. This support will be used in a follow up commit to allow constant-only CSE for -O0 compiles in D60580. llvm-svn: 358368
*	[X86] Redefine KUNPCK instructions to take a narrower source register class than destination register class. Remove copies from the isel output pattern.	Craig Topper	2019-04-14	1	-11/+9
	There's no reason for the inputs to be the destination register class. This just forces an unnecessary copy in the output patterns. llvm-svn: 358362
*	[X86] Put the locked mi8 instructions above the locked mi/mi32 so they will be preferred.	Craig Topper	2019-04-14	1	-24/+26
	We want 64mi8 to be preferred over 64mi32. The order for 16mi/32mi doesn't really matter. llvm-svn: 358361
*	[X86] Change IMUL with immediate instruction order so ri8 instructions come before ri/ri32 instructions.	Craig Topper	2019-04-14	1	-31/+34
	This will ensure IMUL64ri8 is tried before IMUL64ri32. For IMUL32 and IMUL16 the order doesn't really matter because only the ri8 versions use a predicate. That automatically gives them priority. llvm-svn: 358360
*	[X86] Move VPTESTM matching from the isel table to custom code in X86ISelDAGToDAG.	Craig Topper	2019-04-14	2	-251/+396
	We had many tablegen patterns for these instructions. And due to the commutability of the patterns, tablegen expands them to even more patterns. All together VPTESTMD patterns accounted for more than 50K of the 610K isel table. This had gotten bad when we stopped canonicalizing AND to vXi64. This required a pattern for every combination of bitcast input type. This change moves the matching to custom code where it is easier to look through the bitcasts without being concerned with the specific types. The test changes are because we are now stricter with one use checks as it's required to make load folding legal. We now require the AND and any BITCAST to only have a single use. This prevents forming VPTESTM and a VPAND with the same inputs. We now support broadcast loads for 128/256 patterns without VLX. We'll widen to 512-bit and still fold the broadcast since the amount of memory read doesn't change. There are a few tests that got slightly longer because we are now preferring load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were able to share the XOR with multiple VPTESTM instructions. llvm-svn: 358359
*	[X86] Don't form masked vpcmp/vcmp/vptestm operations if the setcc node has more than one use.	Craig Topper	2019-04-14	1	-148/+256
	We're better off emitting a single compare + kand rather than a compare for the other use and a masked compare. I'm looking into using custom instruction selection for VPTESTM to reduce the ridiculous number of permutations of patterns in the isel table. Putting a one use check on all masked compare folding makes load fold matching in the custom code easier. llvm-svn: 358358
*	[X86] Remove some unused tablegen multiclasses. NFC	Craig Topper	2019-04-14	1	-27/+0
	llvm-svn: 358345
*	[X86] Use PC-relative mode for the kernel code model	Bill Wendling	2019-04-13	1	-9/+14
	Summary: The Linux kernel uses PC-relative mode, so allow that when the code model is "kernel". Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits, kees, nickdesaulniers Tags: #llvm Differential Revision: https://reviews.llvm.org/D60643 llvm-svn: 358343
*	[X86] Use int64_t and isInt<N> instead of APInt operations in foldLoadStoreIntoMemOperand. NFC	Craig Topper	2019-04-13	1	-8/+6
	We know all our values are limited to 64 bits here so we don't need an APInt. This should save some generated code checking between large and small size. llvm-svn: 358338
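The isInt<N> check named in this commit asks whether a signed value fits in N bits. A minimal sketch of that predicate follows; the Python function is an illustrative stand-in, and the widths used in the example (8 and 32) are assumptions, since the message does not list which widths the code tests.

```python
def is_int(n, x):
    """True if x is representable as an n-bit two's-complement signed integer."""
    return -(1 << (n - 1)) <= x < (1 << (n - 1))


# An immediate that fits in 8 bits could use a short encoding;
# otherwise a wider immediate form would be needed.
for imm in (127, 128, -128, -129, 2**31 - 1, 2**31):
    print(imm, is_int(8, imm), is_int(32, imm))
```

Working on a plain 64-bit integer makes this a pair of comparisons, which matches the commit's point that an arbitrary-precision type is unnecessary when every value is known to fit in 64 bits.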
*	[WebAssembly] Use Function::hasOptSize() (NFC)	Heejin Ahn	2019-04-13	1	-2/+1
	Summary: Use member function. Reviewers: aheejin Subscribers: sunfish, hiraditya, sbc100, jgravelle-google, dschuff, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60651 Patch by Hideto Ueno (uenoku) llvm-svn: 358336
*	[AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and fix a crash.	Amara Emerson	2019-04-13	1	-0/+2
	This enables the simple copy combine that already exists in the CombinerHelper. However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a set of MIs to process, and so would end up causing a crash when deleted MIs were being added to the combiner worklist again. Differential Revision: https://reviews.llvm.org/D60579 llvm-svn: 358318
*	[AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an undef mask element.	Amara Emerson	2019-04-12	1	-7/+17
	If a shufflevector's mask vector has an element with "undef" then the generic instruction defining that element register is a G_IMPLICIT_DEF instead of G_CONSTANT. This fixes the selector to handle this case, and for now assumes that undef just means zero. In future we'll optimize this case properly. llvm-svn: 358312
*	[WebAssembly] Add mutable-globals to bleeding-edge CPU	Thomas Lively	2019-04-12	1	-1/+2
	Summary: This brings the backend in line with Clang. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60594 llvm-svn: 358310