bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[CostModel][X86] Strip unused 256-bit vector shift costs. NFCI.	Simon Pilgrim	2017-01-05	1	-8/+0
\| \| \| \| \| \|	Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead. llvm-svn: 291152
*	[CostModel][X86] Include the cost of 256-bit upper subvector ↵	Simon Pilgrim	2017-01-05	1	-2/+2
\| \| \| \| \| \| \| \|	extract/insertion in AVX1 v4i64 MUL Matches other MUL/ADD/SUB 256-bit case on AVX1 llvm-svn: 291149
*	Typo	Joerg Sonnenberger	2017-01-05	1	-1/+1
\| \| \| \|	llvm-svn: 291147
*	[CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common ↵	Simon Pilgrim	2017-01-05	1	-272/+227
\| \| \| \| \| \|	shuffle cost LUTs. NFCI. llvm-svn: 291146
*	Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")	Matt Arsenault	2017-01-05	1	-19/+33
\| \| \| \| \| \|	Arrays are supposed to be static const llvm-svn: 291144
*	Remove an unnecessary hasLoopInvariantOperands check in loop sink.	Xin Tong	2017-01-05	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: A preheader instruction's operands will always be invariant w.r.t. the loop it is the preheader for. Memory aliases are handled in canSinkOrHoistInst. Reviewers: danielcdh, davidxl Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D28270 llvm-svn: 291132
*	less braces; NFC	Sanjay Patel	2017-01-05	1	-2/+1
\| \| \| \|	llvm-svn: 291126
*	[CostModel][X86] Add support for broadcast shuffle costs	Simon Pilgrim	2017-01-05	1	-9/+48
\| \| \| \| \| \| \| \|	Currently only for broadcasts with input and output of the same width. Differential Revision: https://reviews.llvm.org/D27811 llvm-svn: 291122
*	[X86] Optimize vector shifts with variable but uniform shift amounts	Zvi Rackover	2017-01-05	1	-16/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register. The lower 64-bits of the register are evaluated to determine the shift amount. This patch improves the construction of the vector containing the shift amount. Reviewers: craig.topper, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28353 llvm-svn: 291120
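Why the construction of the shift-amount vector matters can be shown with a small model of PSLLD. This is a sketch of the instruction semantics only, not of the patch itself: registers are modelled as lists of four 32-bit lanes, and the helper names (`low64`, `splat_amount`, `zero_extended_amount`) are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def low64(xmm):
    """Combine the two lowest 32-bit lanes into the 64-bit shift count."""
    return (xmm[0] & MASK32) | ((xmm[1] & MASK32) << 32)

def pslld(values, count_xmm):
    """Model of PSLLD with the count in a register: one count for every lane."""
    count = low64(count_xmm)
    if count > 31:
        # Counts larger than the lane width clear every lane.
        return [0] * len(values)
    return [(v << count) & MASK32 for v in values]

def splat_amount(amount):
    """Naive count vector: the amount repeated in every 32-bit lane."""
    return [amount & MASK32] * 4

def zero_extended_amount(amount):
    """Count vector with the amount in lane 0 and zeros above it."""
    return [amount & MASK32, 0, 0, 0]

values = [1, 2, 3, 4]
good = pslld(values, zero_extended_amount(4))
bad = pslld(values, splat_amount(4))
```

With a splatted count, lane 1 leaks into the upper half of the 64-bit count, so the model returns all zeros; the zero-extended count produces the intended per-lane shift.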
*	[ThinLTO] Add parenthesis as per build warning	Teresa Johnson	2017-01-05	1	-3/+2
\| \| \| \| \| \|	Fixes a warning about "\|\|" and "&&" due to r291108. llvm-svn: 291119
*	[PowerPC] Implement missing ISA 2.06 instructions.	Tony Jiang	2017-01-05	4	-1/+18
\| \| \| \| \| \| \|	Instructions: fctidu[.], fctiwu[.], ftdiv, ftsqrt are not implemented. Implement them and add corresponding test cases in this patch. llvm-svn: 291116
*	[ThinLTO] Use DenseSet instead of SmallPtrSet for holding GUIDs	Teresa Johnson	2017-01-05	1	-4/+4
\| \| \| \| \| \| \| \| \|	Should fix some more bot failures from r291108. This should have been a DenseSet, since GUID is not a pointer type. It caused some bots to fail, but for some reason I wasn't getting a build failure. llvm-svn: 291115
*	[CostModel][X86] Pulled out common type legalization code	Simon Pilgrim	2017-01-05	1	-7/+4
\| \| \| \|	llvm-svn: 291109
*	[ThinLTO] Subsume all importing checks into a single flag	Teresa Johnson	2017-01-05	5	-117/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This adds a new summary flag NotEligibleToImport that subsumes several existing flags (NoRename, HasInlineAsmMaybeReferencingInternal and IsNotViableToInline). It also subsumes the checking of references on the summary that was being done during the thin link by eligibleForImport() for each candidate. It is much more efficient to do that checking once during the per-module summary build and record it in the summary. Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28169 llvm-svn: 291108
*	Currently isLikelyComplexAddressComputation tries to figure out if the given ↵	Mohammed Agabaria	2017-01-05	8	-55/+51
\| \| \| \| \| \| \| \| \| \| \| \| \|	stride seems to be 'complex' and needs some extra cost for address computation handling. This code seems to be target dependent and may not be the same for all targets. Pass the decision of whether the given stride is complex to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically, on X86 targets we don't see any significant address computation cost for strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106
*	[GlobalISel] Add support for address-taken basic blocks	Kristof Beyls	2017-01-05	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To make this work, pointers from the MachineBasicBlock to the LLVM-IR-level basic blocks need to be initialized, as the AsmPrinter uses this link to be able to print out labels for the basic blocks that are address-taken. Most of the changes in this commit are about adapting existing tests to include the basic block name that is now printed out in the MIR format, now that the name becomes available as the link to the LLVM-IR basic block is initialized. The relevant test changes for the functionality added in this patch are the added "(address-taken)" strings in test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll. Differential Revision: https://reviews.llvm.org/D28123 llvm-svn: 291105
*	[GlobalISel] Add support for switch statements	Kristof Beyls	2017-01-05	1	-0/+39
\| \| \| \| \| \| \| \| \| \|	This commit does this using a trivial chain of conditional branches. In the future, we probably want to reuse the optimized switch lowering used in SelectionDAG. Differential Revision: https://reviews.llvm.org/D28176 llvm-svn: 291099
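A "trivial chain of conditional branches" can be sketched as follows. This is an illustration of the lowering strategy only; the tuple encoding and the names `lower_switch` and `run_chain` are hypothetical and do not correspond to GlobalISel APIs.

```python
def lower_switch(cases, default):
    """Lower a switch into one equality test per case, then a default branch.

    cases is a list of (value, target_block) pairs.
    """
    chain = [("br_if_eq", value, target) for value, target in cases]
    chain.append(("br", None, default))
    return chain

def run_chain(chain, x):
    """Walk the chain in order and return the block that would be entered."""
    for op, value, target in chain:
        if op == "br" or x == value:
            return target

chain = lower_switch([(1, "bb.one"), (5, "bb.five")], "bb.default")
```

The cost is linear in the number of cases, which is the reason the commit message mentions reusing the optimized SelectionDAG switch lowering later.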
*	[GlobalISel] Fix AArch64 ICMP instruction selection	Kristof Beyls	2017-01-05	1	-3/+7
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D28175 llvm-svn: 291097
*	[Test Commit] fixing some format issue in X86TTI to match clang-format output.	Mohammed Agabaria	2017-01-05	1	-3/+6
\| \| \| \|	llvm-svn: 291095
*	AVX-512: Optimized pattern for truncate with unsigned saturation.	Elena Demikhovsky	2017-01-05	1	-0/+63
\| \| \| \| \| \| \|	DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512. Differential revision: https://reviews.llvm.org/D28216 llvm-svn: 291092
*	[X86] Add Intel Kaby Lake model numbers to getHostCPUName aliased to ↵	Craig Topper	2017-01-05	1	-2/+4
\| \| \| \| \| \| \| \|	"skylake" since there are no feature differences. Model numbers found here http://www.sandpile.org/x86/cpuid.htm llvm-svn: 291086
*	MC: support passing search paths to the IAS	Saleem Abdulrasool	2017-01-05	1	-0/+2
\| \| \| \| \| \| \|	This is needed to support inclusion in inline assembly via the `.include` directive. llvm-svn: 291085
*	[X86] Change getHostCPUName to report Intel model 0x4e as "skylake" instead ↵	Craig Topper	2017-01-05	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	of "skylake-avx512". Add the proper 0x55 model for "skylake-avx512". Summary: Intel's i5-6300U CPU is reporting to have a model id of 78 (4e). The Host detection assumes that to be Skylake Xeon (with AVX512 support), instead of a normal Skylake machine. Patch by: Valentin Churavy Reviewers: nalimilan, craig.topper Subscribers: hfinkel, tkelman, craig.topper, nalimilan, llvm-commits Differential Revision: https://reviews.llvm.org/D28221 llvm-svn: 291084
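The corrected mapping can be sketched as a lookup table. Only the two model numbers named in the commit message are included; the function name `host_cpu_name_for_model` and the `"generic"` fallback are assumptions for illustration, and the real getHostCPUName also inspects the CPU family and feature bits.

```python
# Model numbers taken from the commit message above.
INTEL_MODEL_TO_CPU = {
    0x4E: "skylake",         # e.g. i5-6300U, which reports model id 78
    0x55: "skylake-avx512",  # the model with AVX-512 support
}

def host_cpu_name_for_model(model, fallback="generic"):
    """Return the CPU name for a model number, or the fallback if unknown."""
    return INTEL_MODEL_TO_CPU.get(model, fallback)
```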
*	[libFuzzer] use /tmp (or $TMPDIR, if present) to store temp files during merge	Kostya Serebryany	2017-01-05	4	-2/+13
\| \| \| \|	llvm-svn: 291078
*	IR: Module summary representation for type identifiers; summary test ↵	Peter Collingbourne	2017-01-05	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	scaffolding for lowertypetests. Set up basic YAML I/O support for module summaries, plumb the summary into the pass and add a few command line flags to test YAML I/O support. Bitcode support to come separately, as will the code in LowerTypeTests that actually uses the summary. Also add a couple of tests that pass by virtue of the pass doing nothing with the summary (which happens to be the correct thing to do for those tests). Differential Revision: https://reviews.llvm.org/D28041 llvm-svn: 291069
*	Revert r291025 ("AMDGPU: Remove unneccessary intermediate vector")	Richard Smith	2017-01-05	1	-22/+18
\| \| \| \| \| \| \|	This caused buildbot failures due to returning ArrayRefs referencing local (temporary) objects. llvm-svn: 291067
*	[DWARF] Null out the debug locs of load instructions that have been moved by ↵	Wolfgang Pieb	2017-01-04	1	-2/+12
\| \| \| \| \| \| \| \| \| \| \|	GVN performing partial redundancy elimination (PRE). Not doing so can cause jumpy line tables and confusing (though correct) source attributions. Differential Revision: https://reviews.llvm.org/D27857 llvm-svn: 291037
*	Use lazy-loading of Metadata in MetadataLoader when importing is enabled (NFC)	Mehdi Amini	2017-01-04	2	-28/+377
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is a relatively simple scheme: we use the index emitted in the bitcode to avoid loading all the global metadata. Instead we load the index with their position in the bitcode so that we can load each of them individually. Materializing the global metadata block in this condition only triggers loading the named metadata, and the ones referenced from there (transitively). When materializing a function, metadata from the global block are loaded lazily as they are referenced. Two main current limitations are: 1) Global values other than functions are not materialized on demand, so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records (and their transitive dependencies). 2) When we load a single metadata, we don't recurse on the operands, instead we use a placeholder or a temporary metadata. Unfortunately temporary nodes are very expensive. This is why we don't have it always enabled and only for importing. These two limitations can be lifted in a subsequent improvement if needed. With this change, the total link time of opt with ThinLTO and Debug Info enabled is going down from 282s to 224s (~20%). Reviewers: pcc, tejohnson, dexonsmith Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28113 llvm-svn: 291027
*	Change BitstreamCursor::skipRecord to return the record code (NFC)	Mehdi Amini	2017-01-04	1	-4/+14
\| \| \| \|	llvm-svn: 291026
*	AMDGPU: Remove unneccessary intermediate vector	Matt Arsenault	2017-01-04	1	-18/+22
\| \| \| \|	llvm-svn: 291025
*	InstCombine: Fold cos(-x) -> cos(x)	Matt Arsenault	2017-01-04	1	-0/+14
\| \| \| \| \| \|	Also cos(fabs(x)) -> cos(x) llvm-svn: 291022
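Both folds rely on cosine being an even function. A minimal sketch on a toy expression tree (nested tuples; the encoding and the name `fold_cos` are hypothetical, not InstCombine code):

```python
import math

def fold_cos(expr):
    """Rewrite cos(neg(x)) and cos(fabs(x)) to cos(x).

    Expressions are tuples such as ("cos", ("neg", "x")); leaves are names.
    Stripping repeatedly is a choice made for this sketch; it is valid
    because cos(-x) == cos(|x|) == cos(x) for every x.
    """
    if isinstance(expr, tuple) and expr[0] == "cos":
        arg = expr[1]
        while isinstance(arg, tuple) and arg[0] in ("neg", "fabs"):
            arg = arg[1]
        return ("cos", arg)
    return expr

folded_neg = fold_cos(("cos", ("neg", "x")))
folded_abs = fold_cos(("cos", ("fabs", "x")))
```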
*	Reapply "Make BitCodeAbbrev ownership explicit using shared_ptr rather than ↵	David Blaikie	2017-01-04	2	-94/+94
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IntrusiveRefCntPtr"" If this is a problem for anyone (shared_ptr is two pointers in size, whereas IntrusiveRefCntPtr is 1 - and the ref count control block that make_shared adds is probably larger than the one int in RefCountedBase) I'd prefer to address this by adding a lower-overhead version of shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing) to avoid the intrusiveness - this allows memory ownership to remain orthogonal to types and at least to me, seems to make code easier to understand (since no implicit ownership acquisition can happen). This recommits 291006, reverted in r291007. llvm-svn: 291016
*	[Legalizer] Fix fp-to-uint to fp-to-sint promotion assertion.	Tim Shen	2017-01-04	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero extended. For example, given double 65534.0, without legalization: fp-to-uint16: 65534.0 -> 0xfffe With the legalization: fp-to-sint32: 65534.0 -> 0x0000fffe Without this patch, legalization wrongly emits a signed extend assertion, which is consumed by a later icmp instruction and causes a miscompile. Note that the floating point value must be in [0, 65535), otherwise the behavior is undefined. This patch reverts r279223 behavior and adds more tests and documentation. In PR29041's context, James Molloy mentioned that: We don't need to mask because conversion from float->uint8_t is undefined if the integer part of the float value is not representable in uint8_t. Therefore we can assume this doesn't happen! which is totally true and good, because fptoui is documented clearly to have undefined behavior when overflow/underflow happens. We should take advantage of this behavior so that we can save unnecessary mask instructions. Reviewers: jmolloy, nadav, echristo, kbarton Subscribers: mehdi_amini, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28284 llvm-svn: 291015
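The 65534.0 example can be checked with plain integer arithmetic. The helpers below are hypothetical models of the conversion and of the two possible extension assertions, not legalizer code.

```python
def fp_to_sint32(x):
    """Model of the promoted conversion: truncate toward zero, keep 32 bits."""
    return int(x) & 0xFFFFFFFF

def zext_from_16(v):
    """32-bit value implied by a zero-extension assertion on the low 16 bits."""
    return v & 0xFFFF

def sext_from_16(v):
    """32-bit value implied by a sign-extension assertion on the low 16 bits."""
    low = v & 0xFFFF
    if low & 0x8000:
        low -= 0x10000
    return low & 0xFFFFFFFF

result = fp_to_sint32(65534.0)
```

Here `result` is 0x0000fffe, which matches the zero-extended view. A sign-extension assertion would describe the register as 0xfffffffe, so a later comparison that trusts it would see the wrong value.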
*	The patch fixes (base, index, offset) match.	Evgeny Stupachenko	2017-01-04	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Instead of matching: (a + i) + 1 -> (a + i, undef, 1) Now it matches: (a + i) + 1 -> (a, i, 1) Reviewers: rengolin Differential Revision: http://reviews.llvm.org/D26367 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 291012
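The difference between the two matches can be sketched with a toy matcher over nested tuples. The encoding and the name `match_address` are hypothetical; the sketch reproduces only the new `(a, i, 1)` result described above, not the actual backend code.

```python
def match_address(expr):
    """Split an address expression into (base, index, offset).

    Expressions are ("add", lhs, rhs) tuples, variable names, or ints.
    """
    base, index, offset = expr, None, 0
    # Peel a constant offset off the outermost add.
    if isinstance(base, tuple) and base[0] == "add" and isinstance(base[2], int):
        offset = base[2]
        base = base[1]
    # Split the remaining add into base and index instead of keeping it whole.
    if isinstance(base, tuple) and base[0] == "add":
        base, index = base[1], base[2]
    return base, index, offset

matched = match_address(("add", ("add", "a", "i"), 1))
```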
*	[AArch64] Update the feature set for Qualcomm's Falkor CPU.	Chad Rosier	2017-01-04	1	-1/+5
\| \| \| \|	llvm-svn: 291010
*	[AArch64] Fix over-eager early-exit in load-store combiner	Nirav Dave	2017-01-04	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Fix early-exit analysis for memory operation pairing when operations are not emitted in ascending order. Reviewers: mcrosier, t.p.northover Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D28251 llvm-svn: 291008
*	Revert "Make BitCodeAbbrev ownership explicit using shared_ptr rather than ↵	David Blaikie	2017-01-04	2	-94/+94
\| \| \| \| \| \| \| \| \| \| \|	IntrusiveRefCntPtr" Breaks Clang's use of bitcode. Reverting until I have a fix to go with it there. This reverts commit r291006. llvm-svn: 291007
*	Make BitCodeAbbrev ownership explicit using shared_ptr rather than ↵	David Blaikie	2017-01-04	2	-94/+94
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IntrusiveRefCntPtr If this is a problem for anyone (shared_ptr is two pointers in size, whereas IntrusiveRefCntPtr is 1 - and the ref count control block that make_shared adds is probably larger than the one int in RefCountedBase) I'd prefer to address this by adding a lower-overhead version of shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing) to avoid the intrusiveness - this allows memory ownership to remain orthogonal to types and at least to me, seems to make code easier to understand (since no implicit ownership acquisition can happen). llvm-svn: 291006
*	[PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)	Hal Finkel	2017-01-04	1	-40/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change aims to unify and correct our logic for when we need to allow for the possibility of the linker adding a TOC restoration instruction after a call. This comes up in two contexts: 1. When determining tail-call eligibility. If we make a tail call (i.e. directly branch to a function) then there is no place for the linker to add a TOC restoration. 2. When determining when we need to add a nop instruction after a call. Likewise, if there is a possibility that the linker might need to add a TOC restoration after a call, then we need to put a nop after the call (the bl instruction). First problem: We were using similar, but different, logic to decide (1) and (2). This is just wrong. Both the resideInSameModule function (used when determining tail-call eligibility) and the isLocalCall function (used when deciding if the post-call nop is needed) were supposed to be determining the same underlying fact (i.e. might a TOC restoration be needed after the call). The same logic should be used in both places. Second problem: The logic in both places was wrong. We only know that two functions will share the same TOC when both functions come from the same section of the same object. Otherwise the linker might cause the functions to use different TOC base addresses (unless the multi-TOC linker option is disabled, in which case only shared-library boundaries are relevant). There are a number of factors that can cause functions to be placed in different sections or come from different objects (-ffunction-sections, explicitly-specified section names, COMDAT, weak linkage, etc.). All of these need to be checked. The existing logic only checked properties of the callee, but the properties of the caller must also be checked (for example, calling from a function in a COMDAT section means calling between sections). 
There was a conceptual error in the resideInSameModule function in that it allowed tail calls to functions with weak linkage and protected/hidden visibility. While protected/hidden visibility does prevent the function implementation from being replaced at runtime (via interposition), it does not prevent the linker from using an alternate implementation at link time (i.e. using some strong definition to replace the provided weak one during linking). If this happens, then we're still potentially looking at a required TOC restoration upon return. Otherwise, in general, the post-call nop is needed wherever ELF interposition needs to be supported. We don't currently support ELF interposition at the IR level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html for more information), and I don't think we should try to make it appear to work in the backend in spite of that fact. Unfortunately, because of the way that the ABI works, we need to generate code as if we supported interposition whenever the linker might insert stubs for the purpose of supporting it. Differential Revision: https://reviews.llvm.org/D27231 llvm-svn: 291003
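The point that both decisions should rest on one predicate can be sketched as follows. This is a simplified model under stated assumptions: the `Func` record and all function names are hypothetical, only object file, section, and weak linkage are modelled, and real tail-call eligibility depends on further conditions that are omitted here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Func:
    name: str
    obj: str            # object file the definition comes from
    section: str        # section the definition is placed in
    weak: bool = False  # weak definitions may be replaced at link time

def may_need_toc_restore(caller, callee):
    """True if the linker might have to restore the TOC after the call."""
    if callee.weak:
        # A strong definition elsewhere may replace this one during linking.
        return True
    # Properties of both caller and callee are checked, not only the callee.
    same_toc = caller.obj == callee.obj and caller.section == callee.section
    return not same_toc

def needs_post_call_nop(caller, callee):
    return may_need_toc_restore(caller, callee)

def toc_allows_tail_call(caller, callee):
    return not may_need_toc_restore(caller, callee)

f = Func("f", "a.o", ".text")
g = Func("g", "a.o", ".text")
h = Func("h", "a.o", ".text.h")          # e.g. built with -ffunction-sections
w = Func("w", "a.o", ".text", weak=True)
```

Because both helpers call the same predicate, they cannot disagree, which is the first problem the commit message describes.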
*	NewGVN: Track the maximum number of iterations GVN takes on any function, so ↵	Daniel Berlin	2017-01-04	1	-1/+4
\| \| \| \| \| \|	we can pinpoint performance issues. llvm-svn: 291002
*	[lib/LTO] Simplify logic removing set but unused variable. NFCI.	Davide Italiano	2017-01-04	1	-9/+3
\| \| \| \| \| \| \|	Reported by David Binderman and ack'ed by Teresa on IRC. PR: 31527 llvm-svn: 291000
*	YAML: Remove Input::MapHNode::isValidKey(), use llvm::is_contained() ↵	Peter Collingbourne	2017-01-04	1	-9/+1
\| \| \| \| \| \|	instead. NFC. llvm-svn: 290999
*	Remove dead and unused variable NumSentinelElements.	Eric Christopher	2017-01-04	1	-2/+2
\| \| \| \| \| \|	Fixes PR31529. llvm-svn: 290998
*	Remove dead variable Len.	Eric Christopher	2017-01-04	1	-4/+1
\| \| \| \| \| \|	Fixes PR31528 llvm-svn: 290995
*	AMDGPU/SI: Implement sendmsghalt intrinsic	Jan Vesely	2017-01-04	6	-4/+21
\| \| \| \| \| \| \| \|	v2: expose using amdgcn prefix Differential Revision: https://reviews.llvm.org/D23511 llvm-svn: 290977
*	Reapply "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of ↵	Robert Lougher	2017-01-04	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	common inst" This reapplies r289828 (reverted in r289833 as it broke the address sanitizer). The debugloc is now only set when the instruction is not a call, as this causes the verifier to assert (the inliner requires an inlinable callsite to have a debug loc if the caller and callee have debug info). Original commit message: Simplify CFG will try to sink the last instruction in a series of basic blocks, creating a "common" instruction in the successor block (sinkLastInstruction). When it does this, the debug location of the single instruction should be the merged debug locations of the commoned instructions. Original review: https://reviews.llvm.org/D27590 llvm-svn: 290973
*	[CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs	Simon Pilgrim	2017-01-04	1	-11/+9
\| \| \| \| \| \|	Actual codegen is much better than the extract+insert patterns that were assumed. llvm-svn: 290962
*	[PowerPC] Add identification for POWER8NVL	Nemanja Ivanovic	2017-01-04	1	-0/+1
\| \| \| \| \| \| \|	This CPU type was not previously recognized by LLVM which led to emitting poor (and sometimes incorrect) code in some JIT workloads on such a machine. llvm-svn: 290961
*	[X86] Merged Reverse/Alternate shuffle cost tables. NFCI.	Simon Pilgrim	2017-01-04	1	-141/+81
\| \| \| \| \| \|	As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode. llvm-svn: 290956
*	[framelowering] Skip dbg values when getting next/previous instruction.	Florian Hahn	2017-01-04	1	-8/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: aprantl, MatzeB, mkuper Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290955
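Skipping debug values when stepping to a neighbouring instruction can be sketched like this. Instructions are modelled as plain strings and the helper names are hypothetical; the only detail taken from LLVM is that debug values appear as `DBG_VALUE` instructions.

```python
def is_debug_value(instr):
    return instr.startswith("DBG_VALUE")

def prev_non_debug(instrs, i):
    """Index of the closest non-debug instruction before i, or None."""
    j = i - 1
    while j >= 0 and is_debug_value(instrs[j]):
        j -= 1
    return j if j >= 0 else None

def next_non_debug(instrs, i):
    """Index of the closest non-debug instruction after i, or None."""
    j = i + 1
    while j < len(instrs) and is_debug_value(instrs[j]):
        j += 1
    return j if j < len(instrs) else None

with_debug = ["SUB sp, 8", "DBG_VALUE x", "ADD sp, 8"]
without_debug = ["SUB sp, 8", "ADD sp, 8"]
```

Both lists yield the same neighbouring instruction for the final `ADD`, so the presence of debug data does not change which instructions a merge would consider.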