bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[ThinLTO] Fix printing of module paths for distributed backend indexes	Teresa Johnson	2018-07-02	1	-17/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In the individual index files emitted for distributed ThinLTO backends, the module path ids are not contiguous. Assign slots to module paths in order to handle this better and also to get contiguous numbering in the summary assembly. Reviewers: davidxl, dexonsmith Subscribers: mehdi_amini, inglorion, eraman, llvm-commits, steven_wu Differential Revision: https://reviews.llvm.org/D48698 llvm-svn: 336148
*	[WebAssembly] Support for atomic stores	Heejin Ahn	2018-07-02	3	-1/+153
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add support for atomic store instructions. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48839 llvm-svn: 336145
*	[ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.	Vadzim Dambrouski	2018-07-02	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: efriedma, rogfer01, javed.absar Reviewed By: efriedma, rogfer01 Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D48846 llvm-svn: 336144
*	[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428).	Tim Shen	2018-07-02	1	-7/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change SCEV is able to prove that the loop doesn't wrap-self (due to zext i16 to i64), disabling the entire loop versioning pass. Removed the zext and just use i64. Reviewers: sanjoy Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D48409 llvm-svn: 336140
*	[WebAssembly] Fix fast-isel optimization of branch conditions.	Dan Gohman	2018-07-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	LLVM doesn't guarantee anything about the high bits of a register holding an i1 value at the IR level, so don't translate LLVM IR i1 values directly into WebAssembly conditional branch operands. WebAssembly's conditional branches do demand all 32 bits be valid. Fixes PR38019. llvm-svn: 336138
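The hazard in the entry above can be sketched as follows, assuming a 32-bit register in which only bit 0 carries the i1 value. The function names are hypothetical, and masking with 1 is just one way to sanitize the value; the commit message does not say what the patch actually emits.

```cpp
#include <cstdint>

// WebAssembly-style conditional branch: taken if ANY of the 32 bits is set.
constexpr bool branchTakenRaw(std::uint32_t reg) { return reg != 0; }

// Safe variant: keep only bit 0, the one bit an i1 actually defines.
constexpr bool branchTakenMasked(std::uint32_t reg) { return (reg & 1u) != 0; }

// An i1 "false" whose upper bits hold garbage.
static_assert(branchTakenRaw(0xFFFFFFFEu), "raw test is fooled by high bits");
static_assert(!branchTakenMasked(0xFFFFFFFEu), "masked test sees false");
```

Using the raw register as the branch operand takes the branch even though the i1 value is false, which is the miscompile the entry describes.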
*	[X86] Add phony registers for high halves of regs with low halves	Krzysztof Parzyszek	2018-07-02	2	-37/+74
\| \| \| \| \| \| \| \| \| \|	Add registers still missing after r328016 (D43353): - for bits 15-8 of SI, DI, BP, SP (H), and R8-R15 (BH), - for bits 31-16 of R8-R15 (*WH). Thanks to Craig Topper for pointing it out. llvm-svn: 336134
*	Replace "Replacable" with "Replaceable". [NFC]	Alina Sbirlea	2018-07-02	1	-13/+13
\| \| \| \|	llvm-svn: 336133
*	[SLP] Recognize min/max pattern using instructions producing same values.	Farhana Aleen	2018-07-02	1	-0/+71
	Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.
	    %1 = extractelement <2 x i32> %a, i32 0
	    %2 = extractelement <2 x i32> %a, i32 1
	    %cond = icmp sgt i32 %1, %2
	    %3 = extractelement <2 x i32> %a, i32 0
	    %4 = extractelement <2 x i32> %a, i32 1
	    %select = select i1 %cond, i32 %3, i32 %4
	Author: FarhanaAleen Reviewed By: ABataev, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D47608 llvm-svn: 336130
*	[InstCombine] reverse canonicalization of add --> or to allow more shuffle folding	Sanjay Patel	2018-07-02	1	-12/+55
	This extends D48485 to allow another pair of binops (add/or) to be combined either with or without a leading shuffle:
	    or X, C --> add X, C (when X and C have no common bits set)
	Here, we need value tracking to determine that the 'or' can be reversed into an 'add', and we've added general infrastructure to allow extending to other opcodes or moving to where other passes could use that functionality. Differential Revision: https://reviews.llvm.org/D48662 llvm-svn: 336128
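The identity behind this entry (an 'or' equals an 'add' when the operands share no set bits, because no bit position can generate a carry) can be checked with a small sketch. The helper names are hypothetical, and the exhaustive loop stands in for the value-tracking proof the patch relies on.

```cpp
#include <cstdint>

// True when no bit is set in both operands, so an add produces no carries.
constexpr bool haveNoCommonBits(std::uint32_t x, std::uint32_t c) {
  return (x & c) == 0;
}

// Exhaustively checks `x | c == x + c` over all pairs of 8-bit values
// that share no set bits.
bool orEqualsAddFor8Bit() {
  for (std::uint32_t x = 0; x < 256; ++x)
    for (std::uint32_t c = 0; c < 256; ++c)
      if (haveNoCommonBits(x, c) && (x | c) != (x + c))
        return false;
  return true;
}
```

When the operands do overlap (for example 3 and 1), the two operations differ, which is why the transform needs the no-common-bits guarantee.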
*	[MC] Error on a .zerofill directive in a non-virtual section	Francis Visoiu Mistrih	2018-07-02	9	-16/+42
	On darwin, all virtual sections have zerofill type, and having a .zerofill directive in a non-virtual section is not allowed. Instead of asserting, show a nicer error. In order to use the equivalent of .zerofill in a non-virtual section, the usage of .zero or .space is required. This patch replaces the assert with an error. Differential Revision: https://reviews.llvm.org/D48517 llvm-svn: 336127
*	[X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned.	Craig Topper	2018-07-02	2	-5/+18
	Similarly, don't fold fp128 loads into SSE instructions if the load isn't aligned, unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions. Should fix PR38001 llvm-svn: 336121
*	[AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin.	Amara Emerson	2018-07-02	1	-0/+3
	We currently don't any-extend vararg parameters before storing them to the stack locations on Darwin. SelectionDAG, however, does this, and so user code is in the wild which inadvertently relies on this extension. This can manifest in cases where the value stored is (int)0, but the actual parameter is interpreted by va_arg as a pointer, and so not extending to 64 bits causes the callee to load additional undefined bits. llvm-svn: 336120
*	Revert "[Dominators] Add the DomTreeUpdater class"	Jakub Kuderski	2018-07-02	2	-512/+0
\| \| \| \| \| \| \| \|	Temporary revert because of a failing test on some buildbots. This reverts commit r336114. llvm-svn: 336117
*	[Dominators] Add the DomTreeUpdater class	Jakub Kuderski	2018-07-02	2	-0/+512
	Summary: This patch is the first in a series of patches related to the RFC "A new dominator tree updater for LLVM" (http://lists.llvm.org/pipermail/llvm-dev/2018-June/123883.html). This patch introduces the DomTreeUpdater class, which provides a cleaner API to perform updates on available dominator trees (none, only DomTree, only PostDomTree, both) using different update strategies (eagerly or lazily) to simplify the updating process.
	Prior to the patch:
	- Directly calling update functions of DominatorTree updates the data structure eagerly while DeferredDominance does updates lazily.
	- DeferredDominance class cannot be used when a PostDominatorTree also needs to be updated.
	- Functions receiving DT/DDT need to branch a lot which is currently necessary.
	- Functions using both DomTree and PostDomTree need to call the update function separately on both trees.
	- People need to construct an additional DeferredDominance class to use functions only receiving DDT.
	After the patch:
	Patch by Chijun Sima <simachijun@gmail.com>. Reviewers: kuhar, brzycki, dmgreen, grosser, davide Reviewed By: kuhar, brzycki Subscribers: vsk, mgorny, llvm-commits Author: NutshellySima Differential Revision: https://reviews.llvm.org/D48383 llvm-svn: 336114
*	[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values	Simon Pilgrim	2018-07-02	1	-47/+22
\| \| \| \| \| \|	We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case. llvm-svn: 336113
*	[ValueTracking] allow undef elements when matching vector abs	Sanjay Patel	2018-07-02	1	-32/+27
\| \| \| \|	llvm-svn: 336111
*	[CodeGen] Make block removal order deterministic in CodeGenPrepare	David Stenberg	2018-07-02	1	-3/+5
	Summary: Replace use of a SmallPtrSet with a SmallSetVector to make the worklist iteration order deterministic. This is done as the order the blocks are removed may affect whether or not PHI nodes in successor blocks are removed. For example, consider the following case where %bb1 and %bb2 are removed:
	    bb1:
	      br i1 undef, label %bb3, label %bb4
	    bb2:
	      br i1 undef, label %bb4, label %bb3
	    bb3:
	      pv1 = phi type [ undef, %bb1 ], [ undef, %bb2], [ v0, %other ]
	      br label %bb4
	    bb4:
	      pv2 = phi type [ undef, %bb1 ], [ undef, %bb2 ], [ pv1, %bb3 ], [ v0, %other ]
	If %bb2 is removed before %bb1, the incoming values from %bb1 and %bb2 to pv1 will be removed before %bb1 is removed as a predecessor to %bb4. The pv1 node will thus be optimized out (to v0) at the time %bb1 is removed as a predecessor to %bb4, leaving the blocks as following when the incoming value from %bb1 has been removed:
	    bb3:
	      ; pv1 optimized out, incoming value to pv2 is v0
	      br label %bb4
	    bb4:
	      pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
	The pv2 PHI node will be optimized away by removePredecessor() as all incoming values are identical. In case %bb2 is removed after %bb1, pv1 will not be optimized out at the time %bb2 is removed as a predecessor to %bb4, leaving the blocks as following when the incoming value from %bb2 to pv2 has been removed:
	    bb3:
	      pv1 = phi type [ undef, %bb2 ], [ v0, %other ]
	      br label %bb4
	    bb4:
	      pv2 = phi type [ pv1, %bb3 ], [ v0, %other ]
	The pv2 PHI node will thus not be removed in this case, ultimately leading to the following output:
	    bb3:
	      ; pv1 optimized out, incoming value to pv2 is v0
	      br label %bb4
	    bb4:
	      pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
	I have not looked into changing DeleteDeadBlock() so that the redundant PHI nodes are removed.
	I have not added a test case, as I was not able to create a particularly small and (not messy) reproducer. This is likely due to SmallPtrSet behaving deterministically when in small mode. Reviewers: void, dexonsmith, spatel, skatkov, fhahn, bkramer, nhaehnle Reviewed By: fhahn Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D48369 llvm-svn: 336109
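The container swap described in the entry above can be illustrated with a standard-library sketch. `InsertionOrderedSet` and `worklistOrder` are hypothetical names; the class only mimics the idea behind a set-plus-vector container (unique elements, iteration in insertion order) and is not LLVM's SmallSetVector implementation.

```cpp
#include <unordered_set>
#include <vector>

// A set that remembers insertion order: the hash set answers "seen before?",
// the vector fixes the iteration order independently of hashing or addresses.
template <typename T>
class InsertionOrderedSet {
public:
  // Returns true if the value was newly inserted.
  bool insert(const T &value) {
    if (!seen_.insert(value).second)
      return false;
    order_.push_back(value);
    return true;
  }
  const std::vector<T> &items() const { return order_; }

private:
  std::unordered_set<T> seen_;
  std::vector<T> order_;
};

// Pushes block ids onto a worklist and returns the order they would be visited.
std::vector<int> worklistOrder(const std::vector<int> &pushed) {
  InsertionOrderedSet<int> worklist;
  for (int id : pushed)
    worklist.insert(id);
  return worklist.items();
}
```

Iterating a pointer-keyed hash set directly can visit blocks in an order that depends on allocation addresses, whereas the visit order here depends only on the order of the insert calls.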
*	[X86] Use addAliasForDirective to support the .word directive (reland)	Alex Bradbury	2018-07-02	1	-25/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The X86 asm parser currently has custom parsing logic for .word. Rather than use this custom logic, we can just use addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue. See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon (rL332607) backends. Differential Revision: https://reviews.llvm.org/D47004 This is a fixed reland of rL336100. This should have been caught in pre-commit testing so apologies for the noise. llvm-svn: 336104
*	Revert r336100	Alex Bradbury	2018-07-02	1	-3/+25
\| \| \| \| \| \|	This was a bad change. .word == 2byte on x86. llvm-svn: 336103
*	[SLPVectorizer] Remove nullptr early-outs from Instruction::ShuffleVector getEntryCost	Simon Pilgrim	2018-07-02	1	-6/+0
	This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure. llvm-svn: 336102
*	[X86] Use addAliasForDirective to support the .word directive	Alex Bradbury	2018-07-02	1	-25/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	The X86 asm parser currently has custom parsing logic for .word. Rather than use this custom logic, we can just use addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue. See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon (rL332607) backends. Differential Revision: https://reviews.llvm.org/D47004 llvm-svn: 336100
*	Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters.	Florian Hahn	2018-07-02	1	-111/+81
	This version contains a fix to add values for which the state in ParamState changes to the worklist if the state in ValueState did not change. To avoid adding the same value multiple times, mergeInValue returns true if it added the value to the worklist. The value is added to the worklist depending on its state in ValueState. Original message: For comparisons with parameters, we can use the ParamState lattice elements which also provide constant range information. This improves the code for PR33253 further and gets us closer to using ValueLatticeElement for all values. Also, as we are using the range information in the solver directly, we do not need tryToReplaceWithConstantRange afterwards anymore. Reviewers: dberlin, mssimpso, davide, efriedma Reviewed By: mssimpso Differential Revision: https://reviews.llvm.org/D43762 llvm-svn: 336098
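How range information can decide a comparison is sketched below under simplifying assumptions: `Range` is a hypothetical inclusive interval of signed 64-bit values, far simpler than the lattice elements the commit refers to, and only a signed less-than against a constant is handled.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical inclusive range known for a function parameter.
struct Range {
  std::int64_t lo;
  std::int64_t hi;
};

// Tries to fold `param < c` using only the range of `param`.
// Returns std::nullopt when the range straddles the constant.
std::optional<bool> foldLessThan(const Range &r, std::int64_t c) {
  if (r.hi < c)
    return true;   // every possible value is below c
  if (r.lo >= c)
    return false;  // no possible value is below c
  return std::nullopt;
}
```

For a parameter known to lie in [0, 9], the comparison `param < 10` folds to true, while `param < 5` stays undecided.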
*	[SLPVectorizer] Fix alternate opcode + shuffle cost function to correctly handle SK_Select patterns.	Simon Pilgrim	2018-07-02	1	-4/+3
	We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case. This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now... llvm-svn: 336095
*	[SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for getEntryCost/vectorizeTree. NFCI.	Simon Pilgrim	2018-07-02	1	-1/+5
	Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators. llvm-svn: 336092
*	[AArch64][SVE] Asm: Support for (SQ)INCP/DECP (scalar, vector)	Sander de Smalen	2018-07-02	2	-0/+94
	Increments/decrements the result with the number of active bits from the predicate.
	The inc/dec variants added are:
	- incp x0, p0.h (scalar)
	- incp z0.h, p0 (vector)
	The unsigned saturating inc/dec variants added are:
	- uqincp x0, p0.h (scalar)
	- uqincp w0, p0.h (scalar, 32bit)
	- uqincp z0.h, p0 (vector)
	The signed saturating inc/dec variants added are:
	- sqincp x0, p0.h (scalar)
	- sqincp x0, p0.h, w0 (scalar, 32bit)
	- sqincp z0.h, p0 (vector)
	llvm-svn: 336091
*	[AArch64][SVE] Asm: Support for (saturating) vector INC/DEC instructions.	Sander de Smalen	2018-07-02	2	-0/+49
	Increment/decrement vector by multiple of predicate constraint element count. The variants added by this patch are:
	- INCH, INCW, INCD
	and (saturating):
	- SQINCH, SQINCW, SQINCD
	- UQINCH, UQINCW, UQINCD
	- SQDECH, SQDECW, SQDECD
	- UQDECH, UQDECW, UQDECD
	For example: incw z0.s, all, mul #4
	llvm-svn: 336090
*	[X86][BtVer2] Added Jaguar FPU Pipe0/1 uop counters to permit basic llvm-exegesis uop testing	Simon Pilgrim	2018-07-02	1	-0/+2
	We don't have PMCs to cover many of the Jaguar resources but we can at least monitor the FPU issue pipes which give an indication of the fpu uop count, just not the execution resources. llvm-svn: 336089
*	[Mips][FastISel] Do not duplicate condition while lowering branches	Petar Jovanovic	2018-07-02	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change fixes the issue that arises when we duplicate condition from the predecessor block. If the condition's arguments are not considered alive across the blocks, fast regalloc gets confused and starts generating reloads from the slots that have never been spilled to. This change also leads to smaller code given that, unlike on architectures with condition codes, on Mips we can branch directly on register value, thus we gain nothing by duplication. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D48642 llvm-svn: 336084
*	[AArch64][SVE] Asm: Support for vector element compares (immediate).	Sander de Smalen	2018-07-02	2	-0/+85
	Compare vector elements with a signed/unsigned immediate, e.g.
	    cmpgt p0.s, p0/z, z0.s, #-16
	    cmphi p0.s, p0/z, z0.s, #127
	llvm-svn: 336081
*	Reapply r334980 and r334983.	Sander de Smalen	2018-07-02	6	-18/+154
	These patches were previously reverted as they led to buildbot time-outs caused by a large switch statement in printAliasInstr when using UBSan and O3. The issue has been addressed with a workaround (r335525). llvm-svn: 336079
*	[X86] Put some cases in switch statements back on one line to be more compact and make it easier to see the similarities. NFC	Craig Topper	2018-07-02	1	-566/+186
	It looks like someone ran clang-format over this entire file which reformatted these switches into a multiline form. But I think the single line form is more useful here. llvm-svn: 336077
*	[X86] Remove FMA3Info DenseMap. Break into sorted tables that we can binary search.	Craig Topper	2018-07-02	4	-234/+149
	I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier. Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but it's probably not worth it right now. llvm-svn: 336075
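The general technique named in this entry, replacing a hash map with a sorted static table that is binary searched, can be sketched as follows. The table contents, `GroupEntry`, and `lookupGroup` are all hypothetical and do not reflect the real FMA3 tables.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical table entry mapping an opcode to a group id.
struct GroupEntry {
  std::uint16_t opcode;
  std::uint16_t group;
};

// Must stay sorted by opcode for the binary search to be valid.
static const GroupEntry GroupTable[] = {
    {10, 0}, {12, 0}, {17, 1}, {23, 2}, {42, 2}};

// Returns the matching entry, or nullptr if the opcode is not in the table.
const GroupEntry *lookupGroup(std::uint16_t opcode) {
  const GroupEntry *it = std::lower_bound(
      std::begin(GroupTable), std::end(GroupTable), opcode,
      [](const GroupEntry &e, std::uint16_t op) { return e.opcode < op; });
  if (it != std::end(GroupTable) && it->opcode == opcode)
    return it;
  return nullptr;
}
```

Compared with a map built at startup, the table is plain constant data with no construction cost, at the price of having to keep it sorted.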
*	[PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's multiple for i64 pre-inc load/store	QingShan Zhang	2018-07-02	1	-0/+13
	For the case below, pre-inc prep thinks the bucket is a good candidate for pre-inc, but the 64-bit integer load/store with update (pre-inc) instructions on Power require the displacement field to be DS-form (a multiple of 4). Since the constraint can't be satisfied, we have to do some fix-ups later. As shown below, the original loads/stores could be well-formed, so this makes things worse.
	    unsigned long long result = 0;
	    unsigned long long foo(char *p, unsigned long long n) {
	      for (unsigned long long i = 0; i < n; i++) {
	        unsigned long long x1 = *(unsigned long long *)(p - 50000 + i);
	        unsigned long long x2 = *(unsigned long long *)(p - 61024 + i);
	        unsigned long long x3 = *(unsigned long long *)(p - 62048 + i);
	        unsigned long long x4 = *(unsigned long long *)(p - 64096 + i);
	        result = x1 * x2 * x3 * x4;
	      }
	      return result;
	    }
	Patch by jedilyn (Kewen Lin). Differential Revision: https://reviews.llvm.org/D48813 llvm-svn: 336074
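The constraint mentioned in this entry can be expressed as a small predicate. This is a sketch, not the check the patch adds: the function name is hypothetical, and the signed 16-bit range is an assumption based on the DS-form encoding (a 14-bit field with two implied zero bits); the commit message itself only states the multiple-of-4 requirement.

```cpp
#include <cstdint>

// Hypothetical check for a DS-form displacement: it must be a multiple of 4
// and (assumed here) fit in a signed 16-bit immediate.
constexpr bool isValidDSFormDisplacement(std::int64_t disp) {
  return disp >= -32768 && disp <= 32767 && disp % 4 == 0;
}

static_assert(isValidDSFormDisplacement(8), "aligned displacement is fine");
static_assert(!isValidDSFormDisplacement(6), "not a multiple of 4");
```

A candidate whose displacement fails such a check would need extra fix-up instructions, which is the situation the patch avoids by not forming the pre-inc access in the first place.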
*	Implement strip.invariant.group	Piotr Padlewski	2018-07-02	8	-20/+34
	Summary: This patch introduces a new intrinsic, strip.invariant.group, that was described in the RFC: Devirtualization v2. Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47103 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336073
*	Add an entry for rodata constant merge sections to the default	Eric Christopher	2018-07-02	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	section flags in the ELF assembler. This matches the defaults given in the rest of MC. Fixes PR37997 where we couldn't assemble our own assembly output without warnings. llvm-svn: 336072
*	[X86] Remove the places that return nullptr from X86InstrInfo::commuteInstructionImpl.	Craig Topper	2018-07-01	1	-44/+10
	findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl. llvm-svn: 336070
*	[SLPVectorizer] Call InstructionsState.isOpcodeOrAlt with Instruction instead of an opcode. NFCI.	Simon Pilgrim	2018-07-01	1	-11/+9
	llvm-svn: 336069
*	[SLPVectorizer] Replace sameOpcodeOrAlt with InstructionsState.isOpcodeOrAlt helper. NFCI.	Simon Pilgrim	2018-07-01	1	-12/+10
	This is a basic step towards matching more general instruction types than just opcodes. llvm-svn: 336068
*	[X86][Disassembler] Remove TYPE_BNDR from translateImmediate.	Craig Topper	2018-07-01	1	-2/+0
	I've checked the disassembler tables and this shouldn't be reachable. Which is good since if it was reachable there should have been a 'return' after the addOperand line. llvm-svn: 336066
*	[SLPVectorizer] Use InstructionsState Op/Alt opcodes directly. NFCI.	Simon Pilgrim	2018-07-01	1	-4/+2
\| \| \| \|	llvm-svn: 336063
*	[UnrollAndJam] New Unroll and Jam pass	David Green	2018-07-01	11	-20/+1273
	This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form:
	    for i..
	      ForeBlocks(i)
	      for j..
	        SubLoopBlocks(i, j)
	      AftBlocks(i)
	Instead of doing normal inner or outer unrolling, we unroll as follows:
	    for i... i+=2
	      ForeBlocks(i)
	      ForeBlocks(i+1)
	      for j..
	        SubLoopBlocks(i, j)
	        SubLoopBlocks(i+1, j)
	      AftBlocks(i)
	      AftBlocks(i+1)
	    Remainder Loop
	So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 336062
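The loop shape described in this entry can be written out by hand for a concrete nest. This is only an illustration of the transformation's effect: the function names and the computation (a scaled sum) are made up, and the pass itself works on LLVM IR rather than source code.

```cpp
#include <cstddef>
#include <vector>

// Original nest: out[i] = sum over j of a[i] * b[j].
std::vector<long> scaledSums(const std::vector<long> &a,
                             const std::vector<long> &b) {
  std::vector<long> out(a.size(), 0);
  for (std::size_t i = 0; i < a.size(); ++i) {
    long acc = 0;                              // ForeBlocks(i)
    for (std::size_t j = 0; j < b.size(); ++j)
      acc += a[i] * b[j];                      // SubLoopBlocks(i, j)
    out[i] = acc;                              // AftBlocks(i)
  }
  return out;
}

// Same computation, outer loop unrolled by 2 and the inner loops jammed.
std::vector<long> scaledSumsUnrollAndJam(const std::vector<long> &a,
                                         const std::vector<long> &b) {
  std::vector<long> out(a.size(), 0);
  std::size_t i = 0;
  for (; i + 1 < a.size(); i += 2) {
    long acc0 = 0;                             // ForeBlocks(i)
    long acc1 = 0;                             // ForeBlocks(i+1)
    for (std::size_t j = 0; j < b.size(); ++j) {
      const long bj = b[j];                    // one load shared by both bodies
      acc0 += a[i] * bj;                       // SubLoopBlocks(i, j)
      acc1 += a[i + 1] * bj;                   // SubLoopBlocks(i+1, j)
    }
    out[i] = acc0;                             // AftBlocks(i)
    out[i + 1] = acc1;                         // AftBlocks(i+1)
  }
  for (; i < a.size(); ++i) {                  // remainder loop
    long acc = 0;
    for (std::size_t j = 0; j < b.size(); ++j)
      acc += a[i] * b[j];
    out[i] = acc;
  }
  return out;
}
```

The shared read of `b[j]` in the jammed inner loop is the kind of memory-access reuse the commit message mentions as the benefit.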
*	[Evaluator] Improve evaluation of call instruction	Eugene Leviant	2018-07-01	1	-7/+62
\| \| \| \| \| \|	Recommit of r335324 after buildbot failure fix llvm-svn: 336059
*	[X86] Remove unnecessary include. NFC	Craig Topper	2018-07-01	1	-1/+0
\| \| \| \| \| \|	Leftover from when the pass contained a DenseMap before it switched to binary search. llvm-svn: 336057
*	[X86] Move the memory unfolding table creation into its own class and make it a ManagedStatic.	Craig Topper	2018-07-01	5	-5499/+5554
	Also move the static folding tables, their search functions and the new class into new cpp/h files. The unfolding table is effectively static data. It's just a different ordering and a subset of the static folding tables. By putting it in a separate ManagedStatic we ensure we only have one copy instead of one per X86InstrInfo object. This way also makes it only get initialized when really needed. llvm-svn: 336056
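The "one copy, built on first use" idea in this entry has a close standard-C++ analogue in a function-local static. The sketch below uses that analogue instead of llvm::ManagedStatic; `FoldEntry`, `FoldTable`, and `unfoldTable` are hypothetical, and the table contents are invented.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical fold-table entry: register-form opcode and memory-form opcode.
struct FoldEntry {
  unsigned regOp;
  unsigned memOp;
};

// Static folding table, ordered by register-form opcode.
static const FoldEntry FoldTable[] = {{1, 101}, {2, 102}, {3, 100}};

// Derived table: same data re-sorted by memory-form opcode. It is built once,
// on the first call, and shared by every caller afterwards.
const std::vector<FoldEntry> &unfoldTable() {
  static const std::vector<FoldEntry> table = [] {
    std::vector<FoldEntry> t(std::begin(FoldTable), std::end(FoldTable));
    std::sort(t.begin(), t.end(), [](const FoldEntry &a, const FoldEntry &b) {
      return a.memOp < b.memOp;
    });
    return t;
  }();
  return table;
}
```

Every call returns a reference to the same object, so no per-instance copy of the derived table is made.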
*	[X86] Move the X86InstrFMA3Info class into the cpp file. Expose only a getFMA3Group free function. NFCI	Craig Topper	2018-06-30	3	-48/+36
	The class only exists to hold a DenseMap and is only created as a ManagedStatic. It used to expose a single static method that outside code was expected to use. This patch moves that static function out of the class and moves its implementation into the cpp file. It can now access the ManagedStatic directly by name without the need for the other static method that accessed the ManagedStatic. llvm-svn: 336055
*	[X86] Remove the AsmName from the HAX,HDX,HCX,HBX,HSI,HDI,HBP,HSP,HIP artificial registers so they can't be parsed by the assembly parser.	Craig Topper	2018-06-30	1	-9/+9
	There are no instructions that use them so they weren't causing any bad matches. But they weren't being diagnosed as "invalid register name" if they were used and would instead trigger some form of invalid operand. llvm-svn: 336054
*	[X86] Use MVT::i8 for scalar shift amounts since that is what they ultimately need to legalize to.	Craig Topper	2018-06-30	1	-5/+5
	I believe all of these are constants so legalizing them should be pretty trivial, but this saves a step. In one case it looks like we may have been creating a shift amount larger than the shift input itself. llvm-svn: 336052
*	[X86] When combining load to BZHI, make sure we create the shift instruction with an i8 type.	Craig Topper	2018-06-30	1	-4/+5
	This combine runs pretty late and causes us to introduce a shift after the op legalization phase has run. We need to be sure we create the shift with the proper type for the shift amount. If we don't do this, we will still re-legalize the operation properly, but we won't get a chance to fully optimize the truncate that gets inserted. So this patch adds the necessary truncate when the shift is created. I've also narrowed the subtract that gets created to always be an i32 type. The truncate would have triggered SimplifyDemandedBits to optimize it anyway. But using a more appropriate VT here is free and saves an optimization step. llvm-svn: 336051
*	[DAGCombiner] Handle correctly non-splat power of 2 -1 divisor (PR37119)	Simon Pilgrim	2018-06-30	1	-7/+9
	The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1; -1 is the only power-of-two value for which the sdiv expansion recipe breaks. Thanks to @zvi for the original patch. Differential Revision: https://reviews.llvm.org/D45806 llvm-svn: 336048
*	AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal.	Tom Stellard	2018-06-30	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We could split sizes that are not power of two into smaller sized G_IMPLICIT_DEF instructions, but this ends up generating G_MERGE_VALUES instructions which we then have to handle in the instruction selector. Since G_IMPLICIT_DEF is really a no-op it's easier just to keep everything that can fit into a register legal. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48777 llvm-svn: 336041