bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[PowerPC] Add a late MI-level pass for QPX load/splat simplification	Hal Finkel	2016-03-31	5	-4/+170
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Chapter 3 of the QPX manual states that, "Scalar floating-point load instructions, defined in the Power ISA, cause a replication of the source data across all elements of the target register." Thus, if we have a load followed by a QPX splat (from the first lane), the splat is redundant. This adds a late MI-level pass to remove the redundant splats in some of these cases (specifically when both occur in the same basic block). This optimization is scheduled just prior to post-RA scheduling. It can't happen before anything that might replace the load with some already-computed quantity (i.e. store-to-load forwarding). llvm-svn: 265047
*	Revert r265039 "[X86] Merge adjacent stack adjustments in ↵	Hans Wennborg	2016-03-31	1	-19/+12
\| \| \| \| \| \| \| \| \| \|	eliminateCallFramePseudoInstr (PR27140)" I think it might have caused these build breakages: http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio llvm-svn: 265046
*	Preserve extern_weak linkage in CloneModule.	Evgeniy Stepanov	2016-03-31	1	-10/+15
\| \| \| \| \| \| \|	Only force "extern" linkage if the function used to be a definition in the source module. Declarations keep their original linkage. llvm-svn: 265043
*	[ARM] Expand v1i64 and v2i64 ctpop.	Benjamin Kramer	2016-03-31	1	-0/+2
\| \| \| \| \| \| \|	The default is legal, which results in 'Cannot select' errors. This is triggered during selfhost due to a recent cost model change. llvm-svn: 265040
*	[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr ↵	Hans Wennborg	2016-03-31	1	-12/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(PR27140) For code such as: void f(int, int); void g() { f(1, 2); } compiled for 32-bit X86 Linux, Clang would previously generate: subl $12, %esp subl $8, %esp pushl $2 pushl $1 calll f addl $16, %esp addl $12, %esp retl This patch fixes that by merging adjacent stack adjustments in eliminateCallFramePseudoInstr(). Differential Revision: http://reviews.llvm.org/D18627 llvm-svn: 265039
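The merging idea can be sketched in isolation. This is a toy model, not the actual X86 frame-lowering code: the `Inst` struct and `mergeAdjacentStackAdjustments` are hypothetical names, and each stack adjustment is reduced to a signed delta applied to %esp.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy instruction: either an adjustment of the stack pointer
// (SpAdjust, with a signed byte delta) or any other instruction (Other).
struct Inst {
  enum Kind { SpAdjust, Other } kind;
  int32_t delta; // Only meaningful for SpAdjust; negative models "subl".
};

// Fold every run of adjacent SpAdjust instructions into a single one.
// A run whose deltas cancel out is dropped entirely.
std::vector<Inst> mergeAdjacentStackAdjustments(const std::vector<Inst> &in) {
  std::vector<Inst> out;
  for (const Inst &I : in) {
    if (I.kind == Inst::SpAdjust && !out.empty() &&
        out.back().kind == Inst::SpAdjust) {
      out.back().delta += I.delta; // Merge into the previous adjustment.
      if (out.back().delta == 0)
        out.pop_back();            // The run cancelled out.
      continue;
    }
    out.push_back(I);
  }
  return out;
}
```

Applied to the sequence from the commit message, the two `subl` instructions collapse into one adjustment of -20 and the two `addl` instructions into one of +28.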
*	Change eliminateCallFramePseudoInstr() to return an iterator	Hans Wennborg	2016-03-31	31	-88/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will become necessary in a subsequent change to make this method merge adjacent stack adjustments, i.e. it might erase the previous and/or next instruction. It also greatly simplifies the calls to this function from Prolog- EpilogInserter. Previously, that had a bunch of logic to resume iteration after the call; now it just continues with the returned iterator. Note that this changes the behaviour of PEI a little. Previously, it attempted to re-visit the new instruction created by eliminateCallFramePseudoInstr(). That code was added in r36625, but I can't see any reason for it: the new instructions will obviously not be pseudo instructions, they will not have FrameIndex operands, and we have already accounted for the stack adjustment. Differential Revision: http://reviews.llvm.org/D18627 llvm-svn: 265036
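The iterator-returning pattern described here is the same one `std::list::erase` uses. A minimal sketch with a plain list of integers standing in for instructions; `eliminatePseudo` and `eliminateAll` are hypothetical names, and the "negative value is a pseudo, zero is a mergeable neighbour" encoding is an assumption made only for illustration.

```cpp
#include <list>

// Hypothetical stand-in for eliminateCallFramePseudoInstr(): erase the
// element at `it` and, if the element that follows is a zero, erase that
// as well. Returns the position at which the caller should resume.
std::list<int>::iterator eliminatePseudo(std::list<int> &insts,
                                         std::list<int>::iterator it) {
  auto next = insts.erase(it);
  if (next != insts.end() && *next == 0)
    next = insts.erase(next);
  return next;
}

// Caller loop: no bookkeeping is needed to resume iteration, because the
// callee reports where to continue even though it may erase neighbours.
void eliminateAll(std::list<int> &insts) {
  for (auto it = insts.begin(); it != insts.end();) {
    if (*it < 0)
      it = eliminatePseudo(insts, it);
    else
      ++it;
  }
}
```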
*	[lanai] isBrImm should accept any non-constant immediate.	Jacques Pienaar	2016-03-31	1	-17/+6
\| \| \| \| \| \| \| \|	isBrImm should accept any non-constant immediate. Previously it was only accepting LanaiMCExpr ones which was wrong. Differential Revision: http://reviews.llvm.org/D18571 llvm-svn: 265032
*	[PPC] basic support for Power 9 direct move instructions	Ehsan Amiri	2016-03-31	1	-2/+17
\| \| \| \| \| \| \| \|	http://reviews.llvm.org/D18097 Initial support does not include any patterns to generate these instructions llvm-svn: 265031
*	[PGO] use emplace_back. NFC.	Rong Xu	2016-03-31	1	-1/+1
\| \| \| \| \| \|	Use emplace_back instead of push_back for simplicity. llvm-svn: 265030
*	[x86] use SSE/AVX ops for non-zero memsets (PR27100)	Sanjay Patel	2016-03-31	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping into a codegen sinkhole while trying to splat a byte into an XMM reg. Follow-on bugs exposed by the current codegen are: https://llvm.org/bugs/show_bug.cgi?id=27141 https://llvm.org/bugs/show_bug.cgi?id=27143 Differential Revision: http://reviews.llvm.org/D18566 llvm-svn: 265029
*	Minor code cleanup /NFC	Xinliang David Li	2016-03-31	1	-4/+6
\| \| \| \|	llvm-svn: 265025
*	Don't use potentially invalidated iterator	Stephan Bergmann	2016-03-31	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \|	If the lhs is evaluated before the rhs, FuncletI's operator-> can trigger the assert(isHandleInSync() && "invalid iterator access!"); at include/llvm/ADT/DenseMap.h:1061. (Happens e.g. when compiled with GCC 6.) Differential Revision: http://reviews.llvm.org/D18440 llvm-svn: 265024
*	[PowerPC] Correctly compute 64-bit offsets in fast isel	Ulrich Weigand	2016-03-31	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PPCSimplifyAddress contains this code: IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context) : Type::getInt64Ty(*Context)); to determine the type to be used for an index register, if one needs to be created. However, the "VT" here is the type of the data being loaded or stored, *not* the type of an address. This means that if a data element of type i32 is accessed using an index that does not fit into 32 bits, a wrong address is computed here. Note that PPCFastISel is only ever used on 64-bit currently, so the type of an address is actually always MVT::i64. Other parts of the code, even in this same PPCSimplifyAddress routine, already rely on that fact. Thus, this patch changes the code to simply unconditionally use Type::getInt64Ty(*Context) as OffsetTy. llvm-svn: 265023
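The effect of choosing the offset type from the data type can be illustrated with plain integer arithmetic. This is only a sketch: the function names are hypothetical, and modeling the wrong path as "truncate to 32 bits, then sign-extend" is an assumption about how such a bug would show up, not a trace of what PPCFastISel emitted.

```cpp
#include <cstdint>

// Address as computed when the offset is (wrongly) held in a 32-bit type
// because the accessed data happens to be i32: the upper bits are lost.
uint64_t addrWith32BitOffset(uint64_t base, int64_t offset) {
  uint32_t low = static_cast<uint32_t>(offset); // Keep the low 32 bits only.
  // Sign-extend the 32-bit value back to 64 bits.
  int64_t sext = (low & 0x80000000u)
                     ? static_cast<int64_t>(low) - (int64_t(1) << 32)
                     : static_cast<int64_t>(low);
  return base + static_cast<uint64_t>(sext);
}

// Address as computed when the offset always has the address width (i64).
uint64_t addrWith64BitOffset(uint64_t base, int64_t offset) {
  return base + static_cast<uint64_t>(offset);
}
```

Small offsets give the same result on both paths, which is why the problem only appears once an index no longer fits into 32 bits.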
*	[PowerPC] Basic support for P9 atomic loads and stores	Nemanja Ivanovic	2016-03-31	7	-0/+66
\| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: http://reviews.llvm.org/D18032 This patch provides asm implementation for the following instructions: lwat, ldat, stwat, stdat, ldmx, mcrxrx llvm-svn: 265022
*	[AArch64] Handle missing store pair opportunity	Jun Bum Lim	2016-03-31	1	-22/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change will handle missing store pair opportunity where the first store instruction stores zero followed by the non-zero store. For example, this change will convert : str wzr, [x8] str w1, [x8, #4] into: stp wzr, w1, [x8] Reviewers: jmolloy, t.p.northover, mcrosier Subscribers: flyingforyou, aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18570 llvm-svn: 265021
*	[PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel	Ulrich Weigand	2016-03-31	3	-20/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fast isel pass currently emits a COPY_TO_REGCLASS node to convert from a F4RC to a F8RC register class during conversion of a floating-point number to integer. There is actually no support in the common code instruction printers to emit COPY_TO_REGCLASS nodes, so the PowerPC back-end has special code there to simply ignore COPY_TO_REGCLASS. This is correct if and only if the source and destination registers of COPY_TO_REGCLASS are the same (except for the different register class). But nothing guarantees this to be the case, and if the register allocator does end up allocating source and destination to different registers after all, the back-end simply generates incorrect code. I've included a test case that shows such incorrect code generation. However, it seems that COPY_TO_REGCLASS is actually not intended to be used at the MI layer at all. It is used during SelectionDAG, but always lowered to a plain COPY before emitting MI. Other back-ends' fast isel passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong for the PowerPC back-end to emit it here. This patch changes the PowerPC back-end to directly emit COPY instead of COPY_TO_REGCLASS and removes the special handling in the instruction printers. Differential Revision: http://reviews.llvm.org/D18605 llvm-svn: 265020
*	[mips] Range check simm16	Daniel Sanders	2016-03-31	4	-35/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: There are too many instructions to exhaustively test so addiu and lwc2 are used as representative examples. It should be noted that many memory instructions that should have simm16 range checking do not because it is also necessary to support the macro of the same name which accepts simm32. The range checks for these occur in the macro expansion. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18437 llvm-svn: 265019
*	[mips] Range check simm11 and mem_simm11.	Daniel Sanders	2016-03-31	2	-6/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: ldc2/sdc2 now emit slightly worse diagnostics for MIPS-I. The problem is that they don't trigger the custom parser because all the candidates are disabled by feature bits. On all other subtargets, the diagnostics are accurate but are subject to the usual issues of needing to report multiple ways to correct the code (e.g. smaller offset, enable a CPU feature) but only being able to report one error. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18436 llvm-svn: 265018
*	[IFUNC] Introduce GlobalIndirectSymbol as a base class for alias and ifunc	Dmitry Polukhin	2016-03-31	1	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \|	This patch is a part of http://reviews.llvm.org/D15525 The GlobalIndirectSymbol class contains the common implementation for both aliases and ifuncs. This patch should be an NFC change that just prepares common code for ifunc support. Differential Revision: http://reviews.llvm.org/D18433 llvm-svn: 265016
*	[AMDGPU] Disassembler: support for DPP	Sam Kolton	2016-03-31	2	-7/+23
\| \| \| \| \|	Review: http://reviews.llvm.org/D18642 llvm-svn: 265015
*	[mips] Split mem_msa into range checked mem_simm10 and mem_simm10_lsl[123]	Daniel Sanders	2016-03-31	5	-65/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Also, made test_mi10.s formatting consistent with the majority of the MC tests. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18435 llvm-svn: 265014
*	Prevent X86ISelLowering from merging volatile loads	Nirav Dave	2016-03-31	3	-21/+16
\| \| \| \| \| \| \| \| \| \| \| \| \|	Change isConsecutiveLoads to check that loads are non-volatile as this is a requirement for any load merges. Propagate change to two callers. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18546 llvm-svn: 265013
*	[mips] Range check simm9 and fix a bug this revealed.	Daniel Sanders	2016-03-31	4	-12/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The bug was that microMIPS's [ls]w[lr]e instructions claimed to support a 12-bit offset when it is only 9-bit. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D18434 llvm-svn: 265010
*	[mips][microMIPS] Implement MFC, MFHC and DMFC* instructions	Zlatko Buljan	2016-03-31	7	-15/+101
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D17334 llvm-svn: 265002
*	Indentation fix in SystemZInstrInfo.cpp	Jonas Paulsson	2016-03-31	1	-2/+2
\| \| \| \|	llvm-svn: 265000
*	[InstCombine] Fix incorrect rule from rL236202	Sanjoy Das	2016-03-31	1	-1/+2
\| \| \| \| \| \| \|	The rule for SMIN introduced in rL236202 doesn't work as advertised: the check for Pred == ICmpInst::ICMP_SGT was missing. llvm-svn: 264996
*	Delete trailing whitespace	Sanjoy Das	2016-03-31	1	-1/+1
\| \| \| \|	llvm-svn: 264995
*	[SCEV] Track NoWrap properties using MatchBinaryOp, NFC	Sanjoy Das	2016-03-31	1	-7/+16
\| \| \| \| \| \| \| \| \|	This way once we teach MatchBinaryOp to map more things into arithmetic, the non-wrapping add recurrence construction would understand it too. Right now MatchBinaryOp still only understands arithmetic, so this is solely a code-reorganization change. llvm-svn: 264994
*	[SCEV] NFC code motion to simplify later change	Sanjoy Das	2016-03-31	1	-77/+77
\| \| \| \|	llvm-svn: 264993
*	[X86] Use MVT instead of EVT in code called after legalization.	Craig Topper	2016-03-31	1	-3/+3
\| \| \| \|	llvm-svn: 264992
*	[PowerPC] Load two floats directly instead of using one 64-bit integer load	Hal Finkel	2016-03-31	1	-0/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When dealing with complex<float>, and similar structures with two single-precision floating-point numbers, especially when such things are being passed around by value, we'll sometimes end up loading both float values by extracting them from one 64-bit integer load. It looks like this: t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64 t16: i64 = srl t13, Constant:i32<32> t17: i32 = truncate t16 t18: f32 = bitcast t17 t19: i32 = truncate t13 t20: f32 = bitcast t19 The problem, especially before the P8 where those bitcasts aren't legal (and get expanded via the stack), is that it would have been better to use two floating-point loads directly. Here we add a target-specific DAGCombine to do just that. In short, we turn: ld 3, 0(5) stw 3, -8(1) rldicl 3, 3, 32, 32 stw 3, -4(1) lfs 3, -4(1) lfs 0, -8(1) into: lfs 3, 4(5) lfs 0, 0(5) llvm-svn: 264988
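The equivalence behind this combine can be checked on the host. The sketch below assumes a 32-bit `float`; the function names are hypothetical, and the byte-order probe is there only so that the comparison holds on both little- and big-endian machines.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Reinterpret the bits of a 32-bit integer as a float (the "bitcast").
static float bitcastToFloat(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Slow path: one 64-bit integer load, then srl/truncate/bitcast.
// Returns {float at byte offset 0, float at byte offset 4}.
std::pair<float, float> loadViaInteger(const float *p) {
  uint64_t wide;
  std::memcpy(&wide, p, sizeof wide);
  float fromLow = bitcastToFloat(static_cast<uint32_t>(wide));
  float fromHigh = bitcastToFloat(static_cast<uint32_t>(wide >> 32));
  // Which half of the integer holds offset 0 depends on host byte order.
  const uint16_t probe = 1;
  unsigned char firstByte;
  std::memcpy(&firstByte, &probe, 1);
  bool littleEndian = (firstByte == 1);
  return littleEndian ? std::make_pair(fromLow, fromHigh)
                      : std::make_pair(fromHigh, fromLow);
}

// Fast path: two floating-point loads, as the combine produces.
std::pair<float, float> loadDirect(const float *p) {
  return std::make_pair(p[0], p[1]);
}
```

Both paths return bit-identical values, so replacing the first with the second is safe whenever the wide load has no other users.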
*	Introduce a @llvm.experimental.guard intrinsic	Sanjoy Das	2016-03-31	5	-5/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As discussed on llvm-dev[1]. This change adds the basic boilerplate code around having this intrinsic in LLVM: - Changes in Intrinsics.td, and the IR Verifier - A lowering pass to lower @llvm.experimental.guard to normal control flow - Inliner support [1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18527 llvm-svn: 264976
*	[X86] Enable call frame optimization ("mov to push") not only for optsize ↵	Hans Wennborg	2016-03-30	1	-4/+0
\| \| \| \| \| \| \| \| \| \|	(PR26325) The size savings are significant, and from what I can tell, both ICC and GCC do this. Differential Revision: http://reviews.llvm.org/D18573 llvm-svn: 264966
*	CodeGen: Factor out code for tail call result compatibility check; NFC	Matthias Braun	2016-03-30	4	-105/+63
\| \| \| \|	llvm-svn: 264959
*	AMDGPU: Add frexp_exp intrinsic	Matt Arsenault	2016-03-30	2	-7/+18
\| \| \| \|	llvm-svn: 264944
*	AMDGPU: Constant folding for frexp_mant	Matt Arsenault	2016-03-30	1	-0/+14
\| \| \| \|	llvm-svn: 264943
*	Use existing PrintEscapedString in AssemblyWriter	Teresa Johnson	2016-03-30	1	-19/+3
\| \| \| \| \| \| \|	r264884 introduced a helper to escape the backslashes in the source file path, but I since discovered an existing mechanism to escape strings. llvm-svn: 264936
*	Cloning: Reduce complexity of debug info cloning and fix correctness issue.	Peter Collingbourne	2016-03-30	2	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	Commit r260791 contained an error in that it would introduce a cross-module reference in the old module. It also introduced O(N^2) complexity in the module cloner by requiring the entire module to be visited for each function. Fix both of these problems by avoiding use of the CloneDebugInfoMetadata function (which is only designed to do intra-module cloning) and cloning function-attached metadata in the same way that we clone all other metadata. Differential Revision: http://reviews.llvm.org/D18583 llvm-svn: 264935
*	Silencing warnings from MSVC 2015 Update 2. All of these changes silence ↵	Aaron Ballman	2016-03-30	7	-14/+14
\| \| \| \| \| \|	"C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929
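The pattern behind this warning is a shift performed in 32 bits whose result is only widened afterwards. A minimal sketch with hypothetical function names; which exact expressions a given MSVC version flags is not verified here.

```cpp
#include <cassert>
#include <cstdint>

// The shift happens on a 32-bit operand and the result is converted to
// 64 bits afterwards. Only valid for n < 32; this is the shape of code
// that C4334 is meant to point out.
uint64_t bitNarrow(unsigned n) {
  assert(n < 32 && "32-bit shift cannot reach bit 32 or above");
  return 1u << n;
}

// The operand is widened first, so the shift itself is 64 bits wide and
// every bit position below 64 is reachable.
uint64_t bitWide(unsigned n) {
  assert(n < 64 && "shift amount must be below the operand width");
  return uint64_t(1) << n;
}
```

For shift amounts below 32 the two agree, so switching to the wide form is a no-functional-change way to silence the warning.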
*	LegalizeDAG: Don't replace vector store with integer if not legal	Matt Arsenault	2016-03-30	3	-41/+87
\| \| \| \| \| \| \| \| \| \| \|	For the same reason as the corresponding load change. Note that ExpandStore is completely broken for non-byte sized element vector stores, but preserve the current broken behavior which has tests for it. The behavior should be the same, but now introduces a new typed store that is incorrectly split later rather than doing it directly. llvm-svn: 264928
*	LegalizeDAG: Don't replace vector load with integer unless legal	Matt Arsenault	2016-03-30	3	-28/+71
\| \| \| \| \| \| \| \| \| \| \| \| \|	On AMDGPU we want to be able to promote i64/f64 loads to v2i32. If the access is unaligned, this would conclude that since i64 is legal, it would convert it back to i64 and there is an endless legalization loop. Extract the logic for scalarizing the load into a new TargetLowering function, where this can also replace the custom function AMDGPU has for this. llvm-svn: 264927
*	[IndVarSimplify] Don't insert after a catchswitch	David Majnemer	2016-03-30	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	Widening a PHI requires us to insert a trunc. The logical place for this trunc is in the same BB as the PHI. This is not possible if the BB is terminated by a catchswitch. This fixes PR27133. llvm-svn: 264926
*	[X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for ↵	Simon Pilgrim	2016-03-30	1	-1/+0
\| \| \| \| \| \| \| \|	consecutive loads/zeros Fix for issue introduced in D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible. llvm-svn: 264922
*	[NVPTX] Make NVVMReflect a function pass.	Justin Lebar	2016-03-30	2	-102/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently it's a module pass. Make it a function pass so that we can move it to PassManagerBuilder's EP_EarlyAsPossible extension point, which only accepts function passes. Reviewers: rnk Subscribers: tra, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D18615 llvm-svn: 264919
*	[PassManager] Make PassManagerBuilder::addExtension take an std::function, ↵	Justin Lebar	2016-03-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rather than a function pointer. Summary: This gives callers flexibility to pass lambdas with captures, which lets callers avoid the C-style void*-ptr closure style. (Currently, callers in clang store state in the PassManagerBuilderBase arg.) No functional change, and the new API is backwards-compatible. Reviewers: chandlerc Subscribers: joker.eph, cfe-commits Differential Revision: http://reviews.llvm.org/D18613 llvm-svn: 264918
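The flexibility gained by taking a `std::function` can be shown with a small stand-alone registry. This is a sketch only: `Registry`, its methods and `addDefaultPass` are hypothetical and do not mirror the real PassManagerBuilder interface.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical registry, loosely modeled on a list of extension callbacks.
class Registry {
public:
  using Callback = std::function<void(std::vector<int> &)>;

  // Accepts plain functions, capture-less lambdas and lambdas with captures.
  void addExtension(Callback cb) { callbacks.push_back(std::move(cb)); }

  // Invoke every registered callback in registration order.
  void run(std::vector<int> &passes) const {
    for (const Callback &cb : callbacks)
      cb(passes);
  }

private:
  std::vector<Callback> callbacks;
};

// A plain function still converts implicitly, so existing callers that
// pass a function pointer keep compiling unchanged.
void addDefaultPass(std::vector<int> &passes) { passes.push_back(0); }
```

A caller can now write `r.addExtension([level](std::vector<int> &p) { p.push_back(level); });` and carry `level` in the closure instead of smuggling it through a side channel.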
*	[LoopVectorize] Don't vectorize loops when everything will be scalarized	Hal Finkel	2016-03-30	1	-18/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example: LV: The Smallest and Widest types: 32 / 32 bits. LV: The Widest register is: 256 bits. LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Scalar loop costs: 3. 
LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 2 costs: 2. LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 4 costs: 1. ... LV: Selecting VF: 8. 
LV: The target has 32 registers LV(REG): Calculating max register usage: LV(REG): At #0 Interval # 0 LV(REG): At #1 Interval # 1 LV(REG): At #2 Interval # 2 LV(REG): At #4 Interval # 1 LV(REG): At #5 Interval # 1 LV(REG): VF = 8 The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their costs (aside from the uniform ones) are indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do. The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem with this is that our register-pressure heuristic does not account for scalarization. While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more-basic problem: The job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF. This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit. Differential Revision: http://reviews.llvm.org/D18537 llvm-svn: 264904
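The rule stated at the end of this message (skip a VF when every vector type would be split into VF or more parts) can be sketched independently of LLVM. Everything here is hypothetical: the `Candidate` struct, the function names, and the idea of passing in precomputed part counts rather than querying a target.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one vectorization factor: its cost per
// scalar iteration, and for each vector type it would create, the number
// of parts the target splits that type into.
struct Candidate {
  unsigned VF;
  double costPerLane;
  std::vector<unsigned> numParts;
};

// True if every type at this VF is split into VF or more parts, i.e. the
// "vector" code would be fully scalarized by the backend.
bool allTypesScalarized(const Candidate &c) {
  return !c.numParts.empty() &&
         std::all_of(c.numParts.begin(), c.numParts.end(),
                     [&c](unsigned parts) { return parts >= c.VF; });
}

// Pick the cheapest VF, starting from the scalar loop (VF = 1) and never
// considering a VF whose types would all be scalarized.
unsigned selectVF(double scalarCost, const std::vector<Candidate> &cands) {
  unsigned best = 1;
  double bestCost = scalarCost;
  for (const Candidate &c : cands) {
    if (allTypesScalarized(c))
      continue; // Not real vectorization; skip this VF.
    if (c.costPerLane < bestCost) {
      bestCost = c.costPerLane;
      best = c.VF;
    }
  }
  return best;
}
```

With the costs from the log above (3 for the scalar loop, 2 at VF 2, 1 at VF 4) and integer vectors that split into VF parts each, the selection stays at VF 1 even though the wider factors look cheaper.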
*	[PGO] PGOFuncName in LTO optimizations	Rong Xu	2016-03-30	2	-6/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PGOFuncNames are used as the key to retrieve the Function definition from the MD5 stored in the profile. For internal-linkage functions, we prefix the source file name to the PGOFuncNames. LTO's internalization privatizes many global linkage symbols. This happens after value profile annotation, but those internal linkage functions should not have a source prefix. To differentiate compiler-generated internal symbols from original ones, PGOFuncName metadata is created and attached to the original internal symbols in the value profile annotation step. If a symbol does not have the metadata, its original linkage must be non-internal. Also add a new map that maps PGOFuncName's MD5 value to the function definition. Differential Revision: http://reviews.llvm.org/D17895 llvm-svn: 264902
*	Restore "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly"	Teresa Johnson	2016-03-30	5	-0/+42
\| \| \| \| \| \| \|	This restores commit 264869, with a fix for windows bots to properly escape '\' in the path when serializing out. Added test. llvm-svn: 264884
*	[AArch64] Fix warnings pointed out by Hal.	Chad Rosier	2016-03-30	1	-1/+5
\| \| \| \|	llvm-svn: 264882
*	[PGO] Use ArrayRef in annotateValueSite()	Rong Xu	2016-03-30	1	-5/+6
\| \| \| \| \| \| \| \| \|	Use ArrayRef for annotateValueSite's parameter instead of an array and its size. Differential Revision: http://reviews.llvm.org/D18568 llvm-svn: 264879