bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[WebAssembly][NFC] Unify ARGUMENT classes	Thomas Lively	2018-10-13	5	-45/+36
\| \| \| \| \| \| \| \| \| \|	Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53172 llvm-svn: 344436
*	[RISCV] Eliminate unnecessary masking of promoted shift amounts	Alex Bradbury	2018-10-12	1	-3/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SelectionDAGBuilder::visitShift will always zero-extend a shift amount when it is promoted to the ShiftAmountTy. This results in zero-extension (masking) which is unnecessary for RISC-V as the shift operations only read the lower 5 or 6 bits (RV32 or RV64). I initially proposed adding a getExtendForShiftAmount hook so the shift amount can be any-extended (D52975). @efriedma explained this was unsafe, so I have instead eliminate the unnecessary and operations at instruction selection time in a manner similar to X86InstrCompiler.td. Differential Revision: https://reviews.llvm.org/D53224 llvm-svn: 344432
*	[X86] Improve type legalization of (v2i32/v4i16/v8i16 (bitcast (v2f32))) to ↵	Craig Topper	2018-10-12	1	-7/+13
\| \| \| \| \| \|	avoid a stack stack temporary. llvm-svn: 344425
*	[X86] Simplify the end of custom type legalization for (v2i32/v4i16/v8i8 ↵	Craig Topper	2018-10-12	1	-7/+3
\| \| \| \| \| \| \| \|	(bitcast (f64))) by just emitting an EXTRACT_SUBVECTOR instead of a BUILD_VECTOR. Generic legalization should be able to finish legalizing the EXTRACT_SUBVECTOR probably by turning it into a BUILD_VECTOR. But we should emit the simplest sequence. llvm-svn: 344424
*	[X86] Skip (v2i32/v4i16/v8i8 (bitcast (f64))) handling in ReplaceNodeResults ↵	Craig Topper	2018-10-12	1	-8/+2
\| \| \| \| \| \| \| \|	if the dest type can be widened by generic legalization. NFCI The algorithm we would do previously was identical to generic legalization. If we ever switch to legalizing integer vectors via widening we'll be able to kill off the code since it now only runs for promotion. llvm-svn: 344423
*	[x86] add and use fast horizontal vector math subtarget feature	Sanjay Patel	2018-10-12	3	-7/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the planned follow-up to D52997. Here we are reducing horizontal vector math codegen by default. AMD Jaguar (btver2) should have no difference with this patch because it has fast-hops. (If we want to set that bit for other CPUs, let me know.) The code changes are small, but there are many test diffs. For files that are specifically testing for hops, I added RUNs to distinguish fast/slow, so we can see the consequences side-by-side. For files that are primarily concerned with codegen other than hops, I just updated the CHECK lines to reflect the new default codegen. To recap the recent horizontal op story: 1. Before rL343727, we were producing hops for all subtargets for a variety of patterns. Hops were likely not optimal for all targets though. 2. The IR improvement in r343727 exposed a hole in the backend hop pattern matching, so we reduced hop codegen for all subtargets. That was bad for Jaguar (PR39195). 3. We restored the hop codegen for all targets with rL344141. Good for Jaguar, but probably bad for other CPUs. 4. This patch allows us to distinguish when we want to produce hops, so everyone can be happy. I'm not sure if we have the best predicate here, but the intent is to undo the extra hop-iness that was enabled by r344141. Differential Revision: https://reviews.llvm.org/D53095 llvm-svn: 344361
*	Fix unused variable warning after r344348	Eric Liu	2018-10-12	1	-0/+1
\| \| \| \|	llvm-svn: 344350
*	[X86][SSE] LowerVectorCTPOP - pull out repeated byte sum stage.	Simon Pilgrim	2018-10-12	1	-51/+30
\| \| \| \| \| \| \| \|	Pull out repeated byte sum stage for popcount of vector elements > 8bits. This allows us to simplify the LUT/BITMATH popcnt code to always assume vXi8 vectors, and also improves avx512bitalg codegen which only has access to vpopcntb/vpopcntw. llvm-svn: 344348
*	[PowerPC] avoid masking already-zero bits in BitPermutationSelector	Hiroshi Inoue	2018-10-12	1	-15/+104
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current BitPermutationSelector generates a code to build a value by tracking two types of bits: ConstZero and Variable. ConstZero means a bit we need to mask off and Variable is a bit we copy from an input value. This patch add third type of bits VariableKnownToBeZero caused by AssertZext node or zero-extending load node. VariableKnownToBeZero means a bit comes from an input value, but it is known to be already zero. So we do not need to mask them. VariableKnownToBeZero enhances flexibility to group bits, since we can avoid redundant masking for these bits. This patch also renames "HasZero" to "NeedMask" since now we may skip masking even when we have zeros (of type VariableKnownToBeZero). Differential Revision: https://reviews.llvm.org/D48025 llvm-svn: 344347
*	[X86][SSE] Add extract_subvector(PSHUFB) -> PSHUFB(extract_subvector()) combine	Simon Pilgrim	2018-10-12	1	-0/+12
\| \| \| \| \| \| \| \|	Fixes PR32160 by reducing the size of PSHUFB if we only use one of the lanes. This approach can probably be generalized to handle any target shuffle (and any subvector index) but we have no test coverage at the moment. llvm-svn: 344336
*	[tblgen][llvm-mca] Add the ability to describe move elimination candidates ↵	Andrea Di Biagio	2018-10-12	1	-2/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	via tablegen. This patch adds the ability to identify instructions that are "move elimination candidates". It also allows scheduling models to describe processor register files that allow move elimination. A move elimination candidate is an instruction that can be eliminated at register renaming stage. Each subtarget can specify which instructions are move elimination candidates with the help of tablegen class "IsOptimizableRegisterMove" (see llvm/Target/TargetInstrPredicate.td). For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated. The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this: ``` def : IsOptimizableRegisterMove<[ InstructionEquivalenceClass<[ // GPR variants. MOV32rr, MOV64rr, // MMX variants. MMX_MOVQ64rr, // SSE variants. MOVAPSrr, MOVUPSrr, MOVAPDrr, MOVUPDrr, MOVDQArr, MOVDQUrr, // AVX variants. VMOVAPSrr, VMOVUPSrr, VMOVAPDrr, VMOVUPDrr, VMOVDQArr, VMOVDQUrr ], CheckNot<CheckSameRegOperand<0, 1>> > ]>; ``` Definitions of IsOptimizableRegisterMove from processor models of a same Target are processed by the SubtargetEmitter to auto-generate a target-specific override for each of the following predicate methods: ``` bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI) const; bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned CPUID) const; ``` By default, those methods return false (i.e. conservatively assume that there are no move elimination candidates). Tablegen class RegisterFile has been extended with the following information: - The set of register classes that allow move elimination. - Maxium number of moves that can be eliminated every cycle. - Whether move elimination is restricted to moves from registers that are known to be zero. This patch is structured in three part: A first part (which is mostly boilerplate) adds the new 'isOptimizableRegisterMove' target hooks, and extends existing register file descriptors in MC by introducing new fields to describe properties related to move elimination. A second part, uses the new tablegen constructs to describe move elimination in the BtVer2 scheduling model. A third part, teaches llm-mca how to query the new 'isOptimizableRegisterMove' hook to mark instructions that are candidates for move elimination. It also teaches class RegisterFile how to describe constraints on move elimination at PRF granularity. llvm-mca tests for btver2 show differences before/after this patch. Differential Revision: https://reviews.llvm.org/D53134 llvm-svn: 344334
*	[X86] Ignore float/double non-temporal loads (PR39256)	Simon Pilgrim	2018-10-12	1	-0/+3
\| \| \| \| \| \| \| \|	Scalar non-temporal loads were asserting instead of just being ignored. Reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10895 llvm-svn: 344331
*	[mips] Mark fmaxl as a long double emulation routine	Stefan Maksimovic	2018-10-12	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Failure was discovered upon running projects/compiler-rt/test/builtins/Unit/divtc3_test.c in a stage2 compiler build. When compiling projects/compiler-rt/lib/builtins/divtc3.c, a call to fmaxl within the divtc3 implementation had its return values read from registers $2 and $3 instead of $f0 and $f2. Include fmaxl in the list of long double emulation routines to have its return value correctly interpreted as f128. Almost exact issue here: https://reviews.llvm.org/D17760 Differential Revision: https://reviews.llvm.org/D52649 llvm-svn: 344326
*	Revert "AMDGPU/GlobalISel: Implement select for G_INSERT"	Tom Stellard	2018-10-11	2	-31/+0
\| \| \| \| \| \| \| \|	This reverts commit r344310. The test case was failing on some bots. llvm-svn: 344317
*	X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free	Matthias Braun	2018-10-11	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DIV/REM by constants should always be expanded into mul/shift/etc. patterns. Unfortunately the ConstantHoisting pass runs too early at a point where the pattern isn't expanded yet. However after ConstantHoisting hoisted some immediate the result may not expand anymore. Also the hoisting typically doesn't make sense because it operates on immediates that will change completely during the expansion. Report DIV/REM as TCC_Free so ConstantHoisting will not touch them. Differential Revision: https://reviews.llvm.org/D53174 llvm-svn: 344315
*	AMDGPU/GlobalISel: Implement select for G_INSERT	Tom Stellard	2018-10-11	2	-0/+31
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53116 llvm-svn: 344310
*	[RISCV] Fix disassembling of fence instruction with invalid field	Ana Pazos	2018-10-11	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Instruction with 0 in fence field being disassembled as fence , iorw. Printing "unknown" to match GAS behavior. This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer for the RISC-V assembly language. Reviewers: asb Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, asb Differential Revision: https://reviews.llvm.org/D51828 llvm-svn: 344309
*	Inline variable into assert to avoid unused variable warning.	Richard Trieu	2018-10-11	1	-2/+1
\| \| \| \|	llvm-svn: 344308
*	[X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.	Craig Topper	2018-10-11	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \|	On 64-bit targets the generic legalize will use an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal and the generic legalizer will end up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together with pretty good success. This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency and to keep the load in the fp domain. There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen. And that the 64-bit mode code could be better. I think those issues should be looked at separately. Differential Revision: https://reviews.llvm.org/D52528 llvm-svn: 344291
*	[WebAssembly][NFC] Remove repetition of Defs = [ARGUMENTS] (fixed)	Thomas Lively	2018-10-11	11	-95/+5
\| \| \| \|	llvm-svn: 344287
*	[Hexagon] Restrict compound instructions with constant value.	Sumanth Gundapaneni	2018-10-11	1	-10/+27
\| \| \| \| \| \| \| \| \| \|	Having a constant value operand in the compound instruction is not always profitable. This patch improves coremark by ~4% on Hexagon. Differential Revision: https://reviews.llvm.org/D53152 llvm-svn: 344284
*	[WebAssembly] Revert rL344180, which was breaking expensive checks	Thomas Lively	2018-10-11	11	-3/+94
\| \| \| \|	llvm-svn: 344280
*	[Hexagon] Eliminate potential sources of non-determinism in HCE	Krzysztof Parzyszek	2018-10-11	1	-9/+33
\| \| \| \| \| \| \| \| \|	Also, avoid comparing GUIDs when ordering global addresses, because source file location can cause different GUID to be calculated. As a result, a pair of symbols can compare "less" in one directory, but "greater" in another. llvm-svn: 344271
*	[X86] Restore X86ISelDAGToDAG::matchBEXTRFromAnd. Teach address matching to ↵	Craig Topper	2018-10-11	3	-80/+152
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	create a BEXTR pattern from a (shl (and X, mask >> C1) if C1 can be folded into addressing mode. This is an alternative to D53080 since I think using a BEXTR for a shifted mask is definitely an improvement when the shl can be absorbed into addressing mode. The other cases I'm less sure about. We already have several tricks for handling an and of a shift in address matching. This adds a new case for BEXTR. I've moved the BEXTR matching code back to X86ISelDAGToDAG to allow it to match. I suppose alternatively we could directly emit a X86ISD::BEXTR node that isel could pattern match. But I'm trying to view BEXTR matching as an isel concern so DAG combine can see 'and' and 'shift' operations that are well understood. We did lose a couple cases from tbm_patterns.ll, but I think there are ways to recover that. I've also put back the manual load folding code in matchBEXTRFromAnd that I removed a few months ago in r324939. This gives us some more freedom to make decisions based on the ability to fold a load. I haven't done anything with that yet. Differential Revision: https://reviews.llvm.org/D53126 llvm-svn: 344270
*	[AARCH64][FIX] Emit data symbol for constant pool data	Diogo N. Sampaio	2018-10-11	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \|	The ARM64 elf emitter would omit printing data symbol for zero filled constant data. This patch overrides the emitFill method as to enforce that the symbol is correctly printed. Differential revision: https://reviews.llvm.org/D53132 llvm-svn: 344248
*	[X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern	Roman Lebedev	2018-10-11	1	-0/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As discussed in D48491, we can't really do this in the TableGen, since we need to produce two instructions. This only implements one single pattern. The other 3 patterns will be in follow-ups. I'm not sure yet if we want to also fuse shift into here (i.e `(x >> start) & ...`) Reviewers: RKSimon, craig.topper, spatel Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D52304 llvm-svn: 344224
*	[WebAssembly][NFC] Use intrinsic dag nodes directly	Thomas Lively	2018-10-11	3	-70/+14
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Instead of custom lowering to WebAssemblyISD nodes first. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53119 llvm-svn: 344211
*	[WebAssembly] Saturating float to int intrinsics	Thomas Lively	2018-10-11	2	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Although the saturating float to int instructions are already emitted from normal IR, the fpto{s,u}i instructions produce poison values if the argument cannot fit in the result type. These intrinsics are therefore necessary to get guaranteed defined saturating behavior. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53004 llvm-svn: 344204
*	[X86] Prevent non-temporal loads from folding into instructions by blocking ↵	Craig Topper	2018-10-10	2	-54/+31
\| \| \| \| \| \| \| \| \| \|	them in X86DAGToDAGISel::IsProfitableToFold rather than with a predicate. Remove tryFoldVecLoad since tryFoldLoad would call IsProfitableToFold and pick up the new check. This saves about 5K out of ~600K on the generated isel table. llvm-svn: 344189
*	Replace most users of UnknownSize with LocationSize::unknown(); NFC	George Burgess IV	2018-10-10	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Moving away from UnknownSize is part of the effort to migrate us to LocationSizes (e.g. the cleanup promised in D44748). This doesn't entirely remove all of the uses of UnknownSize; some uses require tweaks to assume that UnknownSize isn't just some kind of int. This patch is intended to just be a trivial replacement for all places where LocationSize::unknown() will Just Work. llvm-svn: 344186
*	[WebAssembly][NFC] Remove repetition of Defs = [ARGUMENTS]	Thomas Lively	2018-10-10	11	-90/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: By moving that line into the `I` multiclass. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53093 llvm-svn: 344180
*	[X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering	Roman Lebedev	2018-10-10	2	-66/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=38938 \| PR38938 ]], we fail to emit `BEXTR` if the mask is shifted. We can't deal with that in `X86DAGToDAGISel` `before the address mode for the inc is selected`, and we can't really do it in the normal DAGCombine, because we don't have generic `ISD::BitFieldExtract` node, and if we simply turn the shifted mask into a normal mask + shift-left, it will be folded back. So it would seem X86ISelLowering is the place to handle this. This patch only moves the matchBEXTRFromAnd() from X86DAGToDAGISel to X86ISelLowering. It does not add support for the 'shifted mask' pattern. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52426 llvm-svn: 344179
*	[WebAssembly][NFC] Use vnot patfrag to simplify v128.not	Thomas Lively	2018-10-10	1	-14/+7
\| \| \| \| \| \| \| \| \| \|	Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53097 llvm-svn: 344175
*	[x86] allow single source horizontal op matching (PR39195)	Sanjay Patel	2018-10-10	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is intended to restore horizontal codegen to what it looked like before IR demanded elements improved in: rL343727 As noted in PR39195: https://bugs.llvm.org/show_bug.cgi?id=39195 ...horizontal ops can be worse for performance than a shuffle+regular binop, so I've added a TODO. Ideally, we'd solve that in a machine instruction pass, but a quicker solution will be adding a 'HasFastHorizontalOp' feature bit to deal with it here in the DAG. Differential Revision: https://reviews.llvm.org/D52997 llvm-svn: 344141
*	[TargetLowering] Add root node back to work list after successful ↵	Simon Pilgrim	2018-10-10	1	-14/+2
\| \| \| \| \| \| \| \| \| \|	SimplifyDemandedBits/SimplifyDemandedVectorElts Similar to what already happens in the DAGCombiner wrappers, this patch adds the root nodes back onto the worklist if the DCI wrappers' SimplifyDemandedBits/SimplifyDemandedVectorElts were successful. Differential Revision: https://reviews.llvm.org/D53026 llvm-svn: 344132
*	[SystemZ] Temporarily disable high VFs with integer div/rem.	Jonas Paulsson	2018-10-10	1	-0/+7
\| \| \| \| \| \| \| \|	Until mischeduler is clever enough to avoid spilling in a vectorized loop with many (scalar) DLRs it is better to avoid high vectorization factors (8 and above). llvm-svn: 344129
*	[X86] Remove FeatureRTM from Skylake processor list	Craig Topper	2018-10-10	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: There are a LOT of Skylakes and later without TSX-NI. Examples: - SKL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3-20-GHz- - KBL: https://ark.intel.com/products/97540/Intel-Core-i7-7560U-Processor-4M-Cache-up-to-3-80-GHz- - KBL-R: https://ark.intel.com/products/149091/Intel-Core-i7-8565U-Processor-8M-Cache-up-to-4-60-GHz- - CNL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3_20-GHz This feature seems to be present only on high-end desktop and server chips (I can't find any SKX without). This commit leaves it disabled for all processors, but can be re-enabled for specific builds with -mrtm. Patch by Thiago Macieira Reviewers: erichkeane, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53041 llvm-svn: 344116
*	[SystemZ] Take better care when computing needed vector registers in TTI.	Jonas Paulsson	2018-10-10	1	-17/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A new function getNumVectorRegs() is better to use for the number of needed vector registers instead of getNumberOfParts(). This is to make sure that the number of vector registers (and typically operations) required for a vector type is accurate. getNumberOfParts() which was previously used works by splitting the vector type until it is legal gives incorrect results for types with a non power of two number of elements (rare). A new static function getScalarSizeInBits() that also checks for a pointer type and returns 64U for it since otherwise it gets a value of 0). Used in a few places where Ty may be pointer. Review: Ulrich Weigand llvm-svn: 344115
*	[PowerPC] Fix the assert of ISD::SIGN_EXTEND_INREG when type is v2i16 and v2i8	QingShan Zhang	2018-10-10	2	-44/+0
\| \| \| \| \| \| \| \| \| \|	For ISD::SIGN_EXTEND_INREG operation of v2i16 and v2i8 types will cause assert because they are registered as custom operation. So that the type legalization phase will enter the custom hook, which do not handle ISD::SIGN_EXTEND_INREG operation and fall throw into unreachable assert. Patch By: wuzish (Zixuan Wu) Differential Revision: https://reviews.llvm.org/D52449 llvm-svn: 344109
*	[WebAssembly] Fix fneg lowering	Thomas Lively	2018-10-10	1	-28/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Subtraction from zero and floating point negation do not have the same semantics, so fix lowering. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52948 llvm-svn: 344107
*	[WebAssembly] Improve comments for SIMD instruction definitions	Heejin Ahn	2018-10-10	1	-2/+2
\| \| \| \|	llvm-svn: 344106
*	[WebAssembly] Handle V128 register class in explicit locals pass	Thomas Lively	2018-10-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Also add tests to catch crashes in passes that are not normally run in tests. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52959 llvm-svn: 344094
*	[X86] Fix sanitizer bot failure from 344085	Rong Xu	2018-10-09	1	-2/+3
\| \| \| \| \| \|	Fix the memory issue exposed by sanitizer. llvm-svn: 344092
*	[WebAssembly] Improve readability of SIMD instructions (NFC)	Heejin Ahn	2018-10-09	3	-442/+586
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: - Categorize instructions into the categories as in the SIMD spec - Move SIMD-related definition to WebAssemblyInstrSIMD.td - Put definition and use of patterns together - Add newlines here and there Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53045 llvm-svn: 344086
*	Recommit r343993: [X86] condition branches folding for three-way conditional ↵	Rong Xu	2018-10-09	6	-0/+601
\| \| \| \| \| \| \| \|	codes Fix the memory issue exposed by sanitizer. llvm-svn: 344085
*	[PowerPC] Implement hasBitPreservingFPLogic for types that can be supported	Nemanja Ivanovic	2018-10-09	2	-0/+10
\| \| \| \| \| \| \| \| \| \|	This is the PPC-specific non-controversial part of https://reviews.llvm.org/D44548 that simply enables this combine for PPC since PPC has these instructions. This commit will allow the target-independent portion to be truly target independent. llvm-svn: 344077
*	[X86] When lowering unsigned v2i64 setcc without SSE42, flip the sign bits ↵	Craig Topper	2018-10-09	1	-10/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	in the v2i64 type then bitcast to v4i32. This may give slightly better opportunities for DAG combine to simplify with the operations before the setcc. It also matches the type the xors will eventually be promoted to anyway so it saves a legalization step. Almost all of the test changes are because our constant pool entry is now v2i64 instead of v4i32 on 64-bit targets. On 32-bit targets getConstant should be emitting a v4i32 build_vector and a v4i32->v2i64 bitcast. There are a couple test cases where it appears we now combine a bitwise not with one of these xors which caused a new constant vector to be generated. This prevented a constant pool entry from being shared. But if that's an issue we're concerned about, it seems we need to address it another way that just relying a bitcast to hide it. This came about from experiments I've been trying with pushing the promotion of and/or/xor to vXi64 later than LegalizeVectorOps where it is today. We run LegalizeVectorOps in a bottom up order. So the and/or/xor are promoted before their users are legalized. The bitcasts added for the promotion act as a barrier to computeKnownBits if we try to use it during vector legalization of a later operation. So by moving the promotion out we can hopefully get better results from computeKnownBits/computeNumSignBits like in LowerTruncate on AVX512. I've also looked at running LegalizeVectorOps in a top down order like LegalizeDAG, but thats showing some other issues. llvm-svn: 344071
*	[x86] use demanded bits to simplify masked store codegen	Sanjay Patel	2018-10-09	1	-17/+16
\| \| \| \| \| \| \| \| \| \| \| \|	As noted in D52747, if we prefer IR to use trunc for bool vectors rather than and+icmp, we can expose codegen shortcomings as seen here with masked store. Replace a hard-coded PCMPGT simplification with the more general demanded bits call to improve things. Differential Revision: https://reviews.llvm.org/D52964 llvm-svn: 344048
*	[mips] Set pointer size to 4 bytes for N32 ABI	Simon Atanasyan	2018-10-09	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	CodePointerSize and CalleeSaveStackSlotSize values are used in DWARF generation. In case of MIPS it's incorrect to check for Triple::isMIPS64() only this function returns true for N32 ABI too. Now we do not have a method to recognize N32 if it's specified by a command line option and is not a part of a target triple. So we check for Triple::GNUABIN32 only. It's better than nothing. Differential revision: https://reviews.llvm.org/D52874 llvm-svn: 344039
*	[PowerPC] Remove self-copies in pre-emit peephole	Nemanja Ivanovic	2018-10-09	2	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are occasionally instances where AADB rewrites registers in such a way that a reg-reg copy becomes a self-copy. Such an instruction is obviously redundant and can be removed. This patch does precisely that. Note that this will not remove various nop's that we insert (which are themselves just self-copies). The reason those are left alone is that all of them have their own opcodes (that just encode to a self-copy). What prompted this patch is the fact that these self-copies sometimes end up using registers that make the instruction a priority-setting nop, thereby having a significant effect on performance. Differential revision: https://reviews.llvm.org/D52432 llvm-svn: 344036