bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[MachineOutliner][NFC] Use flags set in all candidates to check for calls	Jessica Paquette	2018-11-13	1	-6/+11
\| \| \| \| \| \| \| \| \| \| \|	If we keep track of if the ContainsCalls bit is set in the MBB flags for each candidate, then we have a better chance of not checking the candidate for calls at all. This saves quite a few checks in some CTMark tests (~200 in Bullet, for example.) llvm-svn: 346816
*	[InstCombine] fold funnel shift amount based on demanded bits	Sanjay Patel	2018-11-13	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The shift amount of a funnel shift is modulo the scalar bitwidth: http://llvm.org/docs/LangRef.html#llvm-fshl-intrinsic ...so we can use demanded bits analysis on that operand to simplify it when we have a power-of-2 bitwidth. This is another step towards canonicalizing {shift/shift/or} to the intrinsics in IR. Differential Revision: https://reviews.llvm.org/D54478 llvm-svn: 346814
*	Preserve loop metadata when splitting exit blocks	Craig Topper	2018-11-13	1	-0/+32
\| \| \| \| \| \| \| \| \| \|	LoopUtils.cpp contains a utility that splits an loop exit block, so that the new block contains only edges coming from the loop. In the case of nested loops, the exit path for the inner loop might also be the back-edge of the outer loop. The new block which is inserted on this path, is now a latch for the outer loop, and it needs to hold the loop metadata for the outer loop. (The test case gives a more concrete view of the situation.) Patch by Chang Lin (clin1) Differential Revision: https://reviews.llvm.org/D53876 llvm-svn: 346810
*	[MachineOutliner][NFC] Use MBB flags to avoid call checks in getOutliningInfo	Jessica Paquette	2018-11-13	2	-23/+34
\| \| \| \| \| \| \| \| \| \| \| \| \|	We already determine a bunch of information about an MBB in getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating things that must be false/true. The first thing we can easily check is if an outlined sequence could ever contain calls. There's no reason to walk over the outlined range, checking for calls, if we already know that there are no calls in the block containing the sequence. llvm-svn: 346809
*	[InstCombine] canonicalize rotate patterns with cmp/select	Sanjay Patel	2018-11-13	1	-0/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cmp+branch variant of this pattern is shown in: https://bugs.llvm.org/show_bug.cgi?id=34924 ...and as discussed there, we probably can't transform that without a rotate intrinsic. We do have that now via funnel shift, but we're not quite ready to canonicalize IR to that form yet. The case with 'select' should already be transformed though, so that's this patch. The sequence with negation followed by masking is what we use in the backend and partly in clang (though that part should be updated). https://rise4fun.com/Alive/TplC %cmp = icmp eq i32 %shamt, 0 %sub = sub i32 32, %shamt %shr = lshr i32 %x, %shamt %shl = shl i32 %x, %sub %or = or i32 %shr, %shl %r = select i1 %cmp, i32 %x, i32 %or => %neg = sub i32 0, %shamt %masked = and i32 %shamt, 31 %maskedneg = and i32 %neg, 31 %shl2 = lshr i32 %x, %masked %shr2 = shl i32 %x, %maskedneg %r = or i32 %shl2, %shr2 llvm-svn: 346807
*	[MachineOutliner][NFC] Exit getOutliningType if there are < 2 candidates	Jessica Paquette	2018-11-13	2	-4/+5
\| \| \| \| \| \| \|	Since we never outline anything with fewer than 2 occurrences, there's no reason to compute cost model information if there's less than that. llvm-svn: 346803
*	[AMDGPU] combine extractelement into several selects	Stanislav Mekhanoshin	2018-11-13	1	-4/+26
\| \| \| \| \| \| \| \| \| \|	An extractelement with non-constant index will be lowered either to scratch or movrel loop in most cases. This patch converts such instruction into a set of selects if vector size is not too big. Differential Revision: https://reviews.llvm.org/D54351 llvm-svn: 346800
*	[MemorySSA] Create query after checking if instruction is a fence.	Alina Sbirlea	2018-11-13	1	-2/+3
\| \| \| \| \| \| \|	The alternative is checking if I is a fence in the Query constructor, so as to not attempt to get a non-existent MemoryLocation. llvm-svn: 346798
*	Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handling	Stanislav Mekhanoshin	2018-11-13	1	-0/+9
\| \| \| \| \| \| \| \| \|	Legalizer used to request an ext load from i8 to i1 when promoting vector element type to i8. Fixed. Differential Revision: https://reviews.llvm.org/D54440 llvm-svn: 346795
*	[MS Demangler] Print public:, protected:, private: if set in FunctionClass ↵	Nico Weber	2018-11-13	1	-4/+15
\| \| \| \| \| \| \| \| \| \| \| \|	or a variable's StorageClass. undname prints them, and the information is in the decorated name, so we probably shouldn't lose it when undecorating. I spot-checked a few of the funnier-looking outputs, and undname has the same output. Differential Revision: https://reviews.llvm.org/D54396 llvm-svn: 346791
*	[AsmPrinter] Rename a comment of .debug_gnu_pubnames entry	Fangrui Song	2018-11-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The comment refers to the field as "Kind:". However, in gdb, https://sourceware.org/gdb//onlinedocs/gdb/Index-Section-Format.html names it "attributes", gdb/dwarf2read.c:dw2_symtab_iter_next refers to the whole value as "cu_index_and_attrs" Change it to `Attributes:` for consistency. Reviewers: dblaikie Reviewed By: dblaikie Subscribers: aprantl, JDevlieghere, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54480 llvm-svn: 346790
*	DebugInfo: Add a CU metadata attribute for use of DWARF ranges base address ↵	David Blaikie	2018-11-13	7	-9/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	specifiers Summary: Ranges base address specifiers can save a lot of object size in relocation records especially in optimized builds. For an optimized self-host build of Clang with split DWARF and debug info compression in object files, but uncompressed debug info in the executable, this change produces about 18% smaller object files and 6% larger executable. While it would've been nice to turn this on by default, gold's 32 bit gdb-index support crashes on this input & I don't think there's any perfect heuristic to implement solely in LLVM that would suffice - so we'll need a flag one way or another (also possible people might want to aggressively optimized for executable size that contains debug info (even with compression this would still come at some cost to executable size)) - so let's plumb it through. Differential Revision: https://reviews.llvm.org/D54242 llvm-svn: 346788
*	[NativePDB] Improved support for nested type reconstruction.	Zachary Turner	2018-11-13	2	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a previous patch, we pre-processed the TPI stream in order to build the reverse mapping from nested type -> parent type so that we could accurately reconstruct a DeclContext hierarchy. However, there were some issues. An LF_NESTTYPE record is really just a typedef, so although it happens to be used to indicate the name of the nested type and referring to the global record which defines the type, it is also used for every other kind of nested typedef. When we rebuild the DeclContext hierarchy, we want it to be as accurate as possible, which means that if we have something like: struct A { struct B {}; using C = B; }; We don't want to create two CXXRecordDecls in the AST each with the exact same definition. We just want to create one for B and then define C as an alias to B. Previously, however, it would not be able to distinguish between the two cases and it would treat A::B and A::C as being two classes each with separate definitions. We address the first half of improving the pre-processing logic so that only actual definitions are treated this way. Later, in a followup patch, we can handle the case of nested typedefs since we're already going to be enumerating the field list anyway and this patch introduces the general framework for distinguishing between the two cases. Differential Revision: https://reviews.llvm.org/D54357 llvm-svn: 346786
*	[SelectionDAG][X86] Relax restriction on the width of an input to ↵	Craig Topper	2018-11-13	6	-213/+247
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	_EXTEND_VECTOR_INREG. Use them and regular _EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision: https://reviews.llvm.org/D54346 llvm-svn: 346784
*	[WebAssembly] Fix broken assumption that all bitcasts are to functions types	Sam Clegg	2018-11-13	1	-26/+43
\| \| \| \| \| \| \| \| \| \|	Specifically, we can bitcast to void. Fixes PR39591 Differential Revision: https://reviews.llvm.org/D54447 llvm-svn: 346778
*	[FileSystem] Add expand_tilde function	Jonas Devlieghere	2018-11-13	2	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \|	In D54435 there was some discussion about the expand_tilde flag for real_path that I wanted to expose through the VFS. The consensus is that these two things should be separate functions. Since we already have the code for this I went ahead and added a function expand_tilde that does just that. Differential revision: https://reviews.llvm.org/D54448 llvm-svn: 346776
*	[IR] Add a dedicated FNeg IR Instruction	Cameron McInally	2018-11-13	15	-3/+311
\| \| \| \| \| \| \| \| \| \| \|	The IEEE-754 Standard makes it clear that fneg(x) and fsub(-0.0, x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation. This patch creates a dedicated FNeg IR Instruction to model that behavior. Differential Revision: https://reviews.llvm.org/D53877 llvm-svn: 346774
*	[CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater.	Florian Hahn	2018-11-13	3	-39/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch updates DuplicateInstructionsInSplitBetween to update a DTU instead of applying updates to the DT directly. Given that there only are 2 users, also updated them in this patch to avoid churn. I slightly moved the code in CallSiteSplitting around to reduce the places where we have to pass in DTU. If necessary, I could split those changes in a separate patch. This fixes missing DT updates when dealing with musttail calls in CallSiteSplitting, by using DTU->deleteBB. Reviewers: junbuml, kuhar, NutshellySima, indutny, brzycki Reviewed By: NutshellySima llvm-svn: 346769
*	Revert "[ThinLTO] Internalize readonly globals"	Steven Wu	2018-11-13	10	-289/+48
\| \| \| \| \| \|	This reverts commit 10c84a8f35cae4a9fc421648d9608fccda3925f2. llvm-svn: 346768
*	Fix uninitialized variable.	Alexander Kornienko	2018-11-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Flags variable was not initialized and later used (both isMBBSafeToOutlineFrom implementations assume it's initialized), which breaks test/CodeGen/AArch64/machine-outliner.mir. under memory sanitizer: MemorySanitizer: use-of-uninitialized-value #0 in llvm::AArch64InstrInfo::getOutliningType(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&, unsigned int) const llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:5494:9 #1 in (anonymous namespace)::InstructionMapper::convertToUnsignedVec(llvm::MachineBasicBlock&, llvm::TargetInstrInfo const&) llvm/lib/CodeGen/MachineOutliner.cpp:772:19 #2 in (anonymous namespace)::MachineOutliner::populateMapper((anonymous namespace)::InstructionMapper&, llvm::Module&, llvm::MachineModuleInfo&) llvm/lib/CodeGen/MachineOutliner.cpp:1543:14 #3 in (anonymous namespace)::MachineOutliner::runOnModule(llvm::Module&) llvm/lib/CodeGen/MachineOutliner.cpp:1645:3 #4 in (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1744:27 #5 in llvm::legacy::PassManagerImpl::run(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1857:44 #6 in compileModule(char**, llvm::LLVMContext&) llvm/tools/llc/llc.cpp:597:8 llvm-svn: 346761
*	[CostModel][X86] Fix constant vector XOP rights shifts	Simon Pilgrim	2018-11-13	1	-2/+11
\| \| \| \| \| \| \| \|	We'll constant fold these cases so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs. llvm-svn: 346760
*	[VectorUtils] Use namespace for InterleaveGroup template specialization.	Florian Hahn	2018-11-13	1	-4/+6
\| \| \| \|	llvm-svn: 346759
*	[VPlan] VPlan version of InterleavedAccessInfo.	Florian Hahn	2018-11-13	5	-19/+126
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch turns InterleaveGroup into a template with the instruction type being a template parameter. It also adds a VPInterleavedAccessInfo class, which only contains a mapping from VPInstructions to their respective InterleaveGroup. As we do not have access to scalar evolution in VPlan, we can re-use convert InterleavedAccessInfo to VPInterleavedAccess info. Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D49489 llvm-svn: 346758
*	[TTI] Make TargetTransformInfo::getOperandInfo static. NFCI.	Simon Pilgrim	2018-11-13	1	-2/+1
\| \| \| \| \| \|	It has no member dependencies and this makes it easier to reuse in other cost analysis code. llvm-svn: 346755
*	Fix comment for XOP rotates. NFCI.	Simon Pilgrim	2018-11-13	1	-1/+1
\| \| \| \|	llvm-svn: 346753
*	Fix .cfi_restore with register numbers > 64	Alexander Richardson	2018-11-13	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: DW_CFA_restore can only encode register numbers up to 64 (6 bits unsigned int). For regsiter numbers > 64 we have to use DW_CFA_restore_extended instead which uses a ULEB128 value. I discovered this problem in the out-of-tree CHERI target since we use DWARF register number 89 for our return capability register. Reviewers: probinson, dblaikie, aprantl, espindola Reviewed By: dblaikie Subscribers: JohnReagan, emaste, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D54420 llvm-svn: 346751
*	Fix modules build of AVRAsmParser.cpp	Alexander Richardson	2018-11-13	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without this change I get the following error: lib/Target/AVR/AVRGenAsmMatcher.inc:1135:1: error: redundant #include of module 'LLVM_Utils.Support.Format' appears within namespace 'llvm' [-Wmodules-import-nested-redundant] Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53425 llvm-svn: 346750
*	[SystemZ] Increase the number of VLREPs	Jonas Paulsson	2018-11-13	2	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a loaded value is replicated it is best to combine these two operations into a VLREP (load and replicate), but isel will not produce this if the load has other users as well. This patch handles this by putting the other users of the load to use the REPLICATE 0-element instead of the load. This way the load has only the REPLICATE node as user, and we get a VLREP. Review: Ulrich Weigand https://reviews.llvm.org/D54264 llvm-svn: 346746
*	[DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector ops	Craig Topper	2018-11-13	1	-14/+7
\| \| \| \| \| \| \| \| \| \|	It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner. Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all. Differential Revision: https://reviews.llvm.org/D54285 llvm-svn: 346728
*	[BuildingAJIT] Update chapter 2 to use the ORCv2 APIs.	Lang Hames	2018-11-13	1	-1/+2
\| \| \| \|	llvm-svn: 346726
*	[FileCheck] fixing typo in assert	Fedor Sergeev	2018-11-13	1	-1/+1
\| \| \| \|	llvm-svn: 346723
*	[FileCheck] introduce CHECK-COUNT-<num> repetition directive	Fedor Sergeev	2018-11-13	1	-103/+146
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In some cases it is desirable to match the same pattern repeatedly many times. Currently the only way to do it is to copy the same check pattern as many times as needed. And that gets pretty unwieldy when its more than count is big. Introducing CHECK-COUNT-<num> directive which acts like a plain CHECK directive yet matches the same pattern exactly <num> times. Extended FileCheckType to a struct to add Count there. Changed some parsing routines to handle non-fixed length of directive (all currently existing directives were fixed-length). The code is generic enough to allow future support for COUNT in more than just PlainCheck directives. See motivating example for this feature in reviews.llvm.org/D54223. Reviewed By: chandlerc, dblaikie Differential Revision: https://reviews.llvm.org/D54336 llvm-svn: 346722
*	[MachineOutliner][NFC] Simplify isMBBSafeToOutlineFrom check in AArch64 outliner	Jessica Paquette	2018-11-13	1	-20/+19
\| \| \| \| \| \| \| \| \|	Turns out it's way simpler to do this check with one LRU. Instead of maintaining two, just keep one. Check if each of the registers is available, and then check if it's a live out from the block. If it's a live out, but available in the block, we know we're in an unsafe case. llvm-svn: 346721
*	Introduce DebugCounter into ConstProp pass	Zhizhou Yang	2018-11-13	1	-26/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch introduces DebugCounter into ConstProp pass at per-transformation level. It will provide an option to skip first n or stop after n transformations for the whole ConstProp pass. This will make debug easier for the pass, also providing chance to do transformation level bisecting. Reviewers: davide, fhahn Reviewed By: fhahn Subscribers: llozano, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D50094 llvm-svn: 346720
*	[MachineOutliner][NFC] Change getMachineOutlinerMBBFlags to ↵	Jessica Paquette	2018-11-12	3	-18/+39
\| \| \| \| \| \| \| \| \| \| \| \|	isMBBSafeToOutlineFrom Instead of returning Flags, return true if the MBB is safe to outline from. This lets us check for unsafe situations, like say, in AArch64, X17 is live across a MBB without being defined in that MBB. In that case, there's no point in performing an instruction mapping. llvm-svn: 346718
*	[InstCombine] narrow width of rotate patterns, part 3	Sanjay Patel	2018-11-12	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a longer variant for the pattern handled in rL346713 This one includes zexts. Eventually, we should canonicalize all rotate patterns to the funnel shift intrinsics, but we need a bit more infrastructure to make sure the vectorizers handle those intrinsics as well as the shift+logic ops. https://rise4fun.com/Alive/FMn Name: narrow rotateright %neg = sub i8 0, %shamt %rshamt = and i8 %shamt, 7 %rshamtconv = zext i8 %rshamt to i32 %lshamt = and i8 %neg, 7 %lshamtconv = zext i8 %lshamt to i32 %conv = zext i8 %x to i32 %shr = lshr i32 %conv, %rshamtconv %shl = shl i32 %conv, %lshamtconv %or = or i32 %shl, %shr %r = trunc i32 %or to i8 => %maskedShAmt2 = and i8 %shamt, 7 %negShAmt2 = sub i8 0, %shamt %maskedNegShAmt2 = and i8 %negShAmt2, 7 %shl2 = lshr i8 %x, %maskedShAmt2 %shr2 = shl i8 %x, %maskedNegShAmt2 %r = or i8 %shl2, %shr2 llvm-svn: 346716
*	[DWARF] Do not use PRIx32 for printing uint64_t values	Simon Atanasyan	2018-11-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The `DWARFDebugAddrTable::dump` routine prints 32/64-bits addresses. These values are stored in a vector of `uint64_t` independently of their original sizes. But `format` function gets format string with PRIx32 suffix in case of 32-bit address size. At least on MIPS 32-bit targets that leads to incorrect output. This patch changes formats strings and always use PRIx64 to print `uint64_t` values. Differential Revision: http://reviews.llvm.org/D54424 llvm-svn: 346715
*	[InstCombine] narrow width of rotate patterns, part 2 (PR39624)	Sanjay Patel	2018-11-12	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The sub-pattern for the shift amount in a rotate can take on several different forms, and there's apparently no way to canonicalize those without seeing the entire rotate sequence. This is the form noted in: https://bugs.llvm.org/show_bug.cgi?id=39624 https://rise4fun.com/Alive/qnT %zx = zext i8 %x to i32 %maskedShAmt = and i32 %shAmt, 7 %shl = shl i32 %zx, %maskedShAmt %negShAmt = sub i32 0, %shAmt %maskedNegShAmt = and i32 %negShAmt, 7 %shr = lshr i32 %zx, %maskedNegShAmt %rot = or i32 %shl, %shr %r = trunc i32 %rot to i8 => %truncShAmt = trunc i32 %shAmt to i8 %maskedShAmt2 = and i8 %truncShAmt, 7 %shl2 = shl i8 %x, %maskedShAmt2 %negShAmt2 = sub i8 0, %truncShAmt %maskedNegShAmt2 = and i8 %negShAmt2, 7 %shr2 = lshr i8 %x, %maskedNegShAmt2 %r = or i8 %shl2, %shr2 llvm-svn: 346713
*	[GC][NFC] Simplify code now that we only have one safepoint kind	Philip Reames	2018-11-12	3	-17/+7
\| \| \| \| \| \|	This is the NFC follow up to exploit the semantic simplification from r346701 llvm-svn: 346712
*	[InstCombine] refactor code for matching shift amount of a rotate; NFC	Sanjay Patel	2018-11-12	1	-12/+17
\| \| \| \| \| \| \| \|	As shown in existing test cases and with: https://bugs.llvm.org/show_bug.cgi?id=39624 ...we're missing at least 2 more patterns for rotate narrowing. llvm-svn: 346711
*	Use a data structure better suited for large sets in SimplificationTracker.	Ali Tamur	2018-11-12	1	-11/+156
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: D44571 changed SimplificationTracker to use SmallSetVector to keep phi nodes. As a result, when the number of phi nodes is large, the build time performance suffers badly. When building for power pc, we have a case where there are more than 600.000 nodes, and it takes too long to compile. In this change, I partially revert D44571 to use SmallPtrSet, which does an acceptable job with any number of elements. In the original patch, having a deterministic iteration order was mentioned as a motivation, however I think it only applies to the nodes already matched in MatchPhiSet method, which I did not touch. Reviewers: bjope, skatkov Reviewed By: bjope, skatkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54007 llvm-svn: 346710
*	[X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387)	Simon Pilgrim	2018-11-12	1	-8/+115
\| \| \| \| \| \| \| \|	This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location. Differential Revision: https://reviews.llvm.org/D54267 llvm-svn: 346706
*	AMDGPU: Adding more median3 patterns	Aakanksha Patil	2018-11-12	2	-9/+22
\| \| \| \| \| \| \| \|	min(max(a, b), max(min(a, b), c)) -> med3 a, b, c Differential Revision: https://reviews.llvm.org/D54331 llvm-svn: 346704
*	[GC] Remove so called PreCall safepoints	Philip Reames	2018-11-12	2	-9/+2
\| \| \| \| \| \| \|	Remove another bit of unused configuration potential from GCStrategy. It's not entirely clear what the intention here was, but from the docs, it sounds like this may have been subsumed by patchable call support. Note: This change is deliberately small to make it clear that while implemented, there's nothing using the option. A following NFC will do most of the simplifications. llvm-svn: 346701
*	[WebAssembly] Added WasmAsmParser.	Wouter van Oortmerssen	2018-11-12	4	-79/+180
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is to replace the ELFAsmParser that WebAssembly was using, which so far was a stub that didn't do anything, and couldn't work correctly with wasm. This new class is there to implement generic directives related to wasm as a binary format. Wasm target specific directives are still parsed in WebAssemblyAsmParser as before. The two classes now cooperate more correctly too. Also implemented .result which was missing. Any unknown directives will now result in errors. Reviewers: dschuff, sbc100 Subscribers: mgorny, jgravelle-google, eraman, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54360 llvm-svn: 346700
*	[GC][InstCombine] Fix a potential iteration issue	Philip Reames	2018-11-12	1	-1/+4
\| \| \| \| \| \|	Noticed via inspection. Appears to be largely innocious in practice, but slight code change could have resulted in either visit order dependent missed optimizations or infinite loops. May be a minor compile time problem today. llvm-svn: 346698
*	[X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of ↵	Craig Topper	2018-11-12	1	-13/+18
\| \| \| \| \| \| \| \| \| \|	directly emitting PACKUS. Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis. This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not. llvm-svn: 346697
*	NFC: DebugInfo: Reduce scope of DebugOffset to simplify code	David Blaikie	2018-11-12	1	-31/+33
\| \| \| \| \| \| \| \| \| \| \|	This was being used as a sort of indirect out parameter from shouldDump - seems simpler to use it as the actual result of the call. (this does mean using a pointer to an Optional & actually using all 3 states (null, None, and present) which is, admittedly, a tad subtle - but given the limited scope, seems OK to me - open to discussion though, if others feel strongly about it) llvm-svn: 346691
*	[AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z	Stanislav Mekhanoshin	2018-11-12	1	-0/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sometimes after basic block placement we end up with a code like: sreg = s_mov_b64 -1 vcc = s_and_b64 exec, sreg s_cbranch_vccz This happens as a join of a block assigning -1 to a saved mask and another block which consumes that saved mask with s_and_b64 and a branch. This is essentially a single s_cbranch_execz instruction when moved into a single new basic block. Differential Revision: https://reviews.llvm.org/D54164 llvm-svn: 346690
*	[CostModel][X86] Add funnel shift rotation special case costs	Simon Pilgrim	2018-11-12	1	-1/+82
\| \| \| \| \| \|	When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same. llvm-svn: 346688