bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[SCEV] Introduce a guarded backedge taken count and use it in LAA and LV	Silviu Baranga	2016-04-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When the backedge taken condition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use this new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265535
*	Add parentheses to silence warning.	Richard Trieu	2016-04-06	1	-1/+2
\| \| \| \|	llvm-svn: 265516
*	ValueMapper: Fix delayed blockaddress handling after r265273	Duncan P. N. Exon Smith	2016-04-06	1	-3/+3
\| \| \| \| \| \| \| \| \|	r265273 added Mapper::mapBlockAddress, which delays mapping a blockaddress value until the function has a body. The condition was backwards, and should be checking Function::empty instead of GlobalValue::isDeclaration. llvm-svn: 265508
*	[RS4GC] Add a comment	Sanjoy Das	2016-04-06	1	-0/+4
\| \| \| \|	llvm-svn: 265503
*	[RS4GC] NFC cleanup of the DeferredReplacement class	Sanjoy Das	2016-04-05	1	-5/+18
\| \| \| \| \| \|	Instead of constructors use clearly named factory methods. llvm-svn: 265486
*	[RS4GC] Better codegen for deoptimize calls	Sanjoy Das	2016-04-05	1	-16/+52
\| \| \| \| \| \| \| \| \|	Don't emit a gc.result for a statepoint lowered from @llvm.experimental.deoptimize since the call into __llvm_deoptimize is effectively noreturn. Instead follow the corresponding gc.statepoint with an "unreachable". llvm-svn: 265485
*	Try harder to appease MSVC after r265456	Duncan P. N. Exon Smith	2016-04-05	1	-3/+12
\| \| \| \| \| \|	r265465 wasn't good enough. I need to spell out all the moves. llvm-svn: 265470
*	IR: Introduce ConstantAggregate, NFC	Duncan P. N. Exon Smith	2016-04-05	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Add a common parent class for ConstantArray, ConstantVector, and ConstantStruct called ConstantAggregate. These are the aggregate subclasses of Constant that take operands. This is mainly a cleanup, adding common `isa` target and removing duplicated code. However, it also simplifies caching which constants point transitively at `GlobalValue` (a possible future direction). llvm-svn: 265466
*	Try to appease MSVC after r265456	Duncan P. N. Exon Smith	2016-04-05	1	-0/+4
\| \| \| \| \| \| \|	I can't remember if adding `= default` will make MSVC happy, or if I have to spell this out. Let's try the cleaner version first. llvm-svn: 265465
*	ValueMapper: Rewrite Mapper::mapMetadata without recursion	Duncan P. N. Exon Smith	2016-04-05	1	-108/+329
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit completely rewrites Mapper::mapMetadata (the implementation of llvm::MapMetadata) using an iterative algorithm. The guts of the new algorithm are in MDNodeMapper::map, the entry function in a new class. Previously, Mapper::mapMetadata performed a recursive exploration of the graph with eager "just in case there's a reason" malloc traffic. The new algorithm has these benefits: - New nodes and temporaries are not created eagerly. - Uniquing cycles are not duplicated (see new unit test). - No recursion. Given a node to map, it does this: 1. Use a worklist to perform a post-order traversal of the transitively referenced unmapped nodes. 2. Track which nodes will change operands, and which will have new addresses in the mapped scheme. Propagate the changes through the POT until fixed point, to pick up uniquing cycles that need to change. 3. Map all the distinct nodes without touching their operands. If RF_MoveDistinctMetadata, they get mapped to themselves; otherwise, they get mapped to clones. 4. Map the uniqued nodes (bottom-up), lazily creating temporaries for forward references as needed. 5. Remap the operands of the distinct nodes. Mehdi helped me out by profiling this with -flto=thin. On his workload (importing/etc. for opt.cpp), MapMetadata sped up by 15%, contributed about 50% less to persistent memory, and made about 100x fewer calls to malloc. The speedup is less than I'd hoped. The profile mainly blames DenseMap lookups; perhaps there's a way to reduce them (e.g., by disallowing remapping of MDString). It would be nice to break the strange remaining recursion on the Value side: MapValue => materializeInitFor => RemapInstruction => MapValue. I think we could do this by having materializeInitFor return a worklist of things to be remapped. llvm-svn: 265456
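Step 1 of the algorithm above (a worklist-driven post-order traversal with no recursion) can be sketched in isolation. This is only an illustration of the traversal idea, not the code in ValueMapper.cpp: `Node` and `postOrder` are hypothetical names, and the uniquing, temporaries, and fixed-point propagation of the real algorithm are left out.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a metadata node; not an LLVM type.
struct Node {
  int Id;
  std::vector<Node *> Operands;
};

// Returns every node reachable from Root in post-order (operands before
// their users), using an explicit worklist instead of recursion.
std::vector<Node *> postOrder(Node *Root) {
  std::vector<Node *> Order;
  std::unordered_set<Node *> Visited;
  // Each entry pairs a node with the index of its next unvisited operand.
  std::vector<std::pair<Node *, std::size_t>> Worklist;

  Visited.insert(Root);
  Worklist.emplace_back(Root, 0);
  while (!Worklist.empty()) {
    Node *N = Worklist.back().first;
    std::size_t NextOp = Worklist.back().second;
    if (NextOp < N->Operands.size()) {
      // Advance the cursor first, then descend into the operand if it is new.
      Worklist.back().second = NextOp + 1;
      Node *Op = N->Operands[NextOp];
      if (Visited.insert(Op).second)
        Worklist.emplace_back(Op, 0);
      continue;
    }
    // All operands are done, so the node itself can be emitted.
    Order.push_back(N);
    Worklist.pop_back();
  }
  return Order;
}
```

The `Visited` set is what keeps the walk finite on cyclic graphs: a node is pushed at most once, so a cycle is simply cut at the first back reference.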
*	Adds the ability to use an epilog remainder loop during loop unrolling and makes	David L Kreitzer	2016-04-05	2	-78/+336
\| \| \| \| \| \| \| \| \| \|	this the default behavior. Patch by Evgeny Stupachenko (evstupac@gmail.com). Differential Revision: http://reviews.llvm.org/D18158 llvm-svn: 265388
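For readers unfamiliar with the term, the shape of an epilog remainder loop can be shown by hand. The function below is a hypothetical source-level illustration (the name `sumUnrolledWithEpilog` and the unroll factor of 4 are assumptions), not the IR the unroller produces.

```cpp
#include <cstddef>

// Sums N values with the loop body unrolled by 4. The leftover N % 4
// iterations run in a remainder loop placed after the unrolled loop
// (an epilog) rather than before it (a prolog).
long sumUnrolledWithEpilog(const int *Data, std::size_t N) {
  long Sum = 0;
  std::size_t I = 0;

  // Unrolled main loop: handles four elements per iteration.
  for (; I + 4 <= N; I += 4) {
    Sum += Data[I];
    Sum += Data[I + 1];
    Sum += Data[I + 2];
    Sum += Data[I + 3];
  }

  // Epilog remainder loop: at most three iterations.
  for (; I < N; ++I)
    Sum += Data[I];

  return Sum;
}
```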
*	[IFUNC] Use GlobalIndirectSymbol when aliases and ifuncs have something similar	Dmitry Polukhin	2016-04-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	Second part extracted from http://reviews.llvm.org/D15525 Use GlobalIndirectSymbol in all cases when aliases and ifuncs have something in common. Differential Revision: http://reviews.llvm.org/D18754 llvm-svn: 265382
*	use range loop; NFCI	Sanjay Patel	2016-04-04	1	-3/+3
\| \| \| \|	llvm-svn: 265360
*	Enable unroll for constant bound loops when TripCount is not modulo of ↵	Zia Ansari	2016-04-04	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit. Commit for Evgeny Stupachenko (evstupac@gmail.com) Differential Revision: http://reviews.llvm.org/D18290 llvm-svn: 265337
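The "maximum power-of-2 that satisfies threshold limit" selection could look roughly like the sketch below. The helper name, its parameters, and the size model (unrolled size estimated as `LoopSize * Count`) are assumptions made for illustration; the commit message does not spell out how the unroller estimates size.

```cpp
#include <cstdint>

// Hypothetical helper: returns the largest power-of-two unroll count that is
// at most MaxCount and whose estimated unrolled size (LoopSize * Count, an
// assumed cost model) stays within Threshold. Returns 1 if nothing fits.
unsigned largestPow2UnrollCount(unsigned LoopSize, unsigned Threshold,
                                unsigned MaxCount) {
  std::uint64_t Count = 1;
  while (Count * 2 <= MaxCount &&
         static_cast<std::uint64_t>(LoopSize) * (Count * 2) <= Threshold)
    Count *= 2;
  return static_cast<unsigned>(Count);
}
```

With a loop body of size 10, a threshold of 150, and a cap of 64, this picks 8: a count of 8 gives an estimated size of 80, while 16 would give 160 and exceed the threshold.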
*	[PGO] Avoid instrumenting direct callee's at value sites.	Betul Buyukkurt	2016-04-04	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	Direct callees that are cast to other function prototypes show up in the Call/Invoke instructions as ConstantExprs. Currently llvm::CallSite's getCalledFunction() fails to return the callees in such expressions as direct calls. Value profiling should avoid instrumenting such cases. Mostly NFC. llvm-svn: 265330
*	[ThinLTO] Augment FunctionImport dump with value name to GUID map	Teresa Johnson	2016-04-04	1	-3/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: To aid in debugging, dump out the correlation between value names and GUID for each source module when it is materialized. This will make it easier to comprehend the earlier summary-based function importing debug trace which only has access to and prints the GUIDs. Reviewers: joker.eph Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18556 llvm-svn: 265326
*	ValueMapper: Remove old FIXMEs; almost NFC	Duncan P. N. Exon Smith	2016-04-04	1	-21/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove a few old FIXMEs from the original commit of the Metadata/Value split in r223802. These are commented out assertions to the effect that calls between mapValue and mapMetadata never return nullptr. (The only behaviour change is that Mapper::mapSimpleMetadata memoizes the nullptr return.) When I originally rewrote the mapping code, I thought we could be stricter in the new metadata hierarchy and never return nullptr when RF_NullMapMissingGlobalValues was off. It's still not entirely clear to me why these assertions failed (a few months ago, I had a theory that I forgot to write down, but that's helping no one). Understood or not, I no longer see how these commented-out assertions would be useful. I'm relegating them to the annals of source control before making significant changes to ValueMapper.cpp. llvm-svn: 265282
*	ValueMapper: Disallow metadata mapping recursion through mapValue	Duncan P. N. Exon Smith	2016-04-03	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds an assertion to maintain the property from r265273. When Mapper::mapSimpleMetadata calls Mapper::mapValue, it should not find its way back to mapMetadataImpl. This guarantees that mapSimpleMetadata is not involved in any recursion. Since Mapper::mapValue calls out to arbitrary materializers, we need to save a bit on the ValueMap to make this assertion effective. There should be no functionality change here. This co-recursion should already have been impossible. llvm-svn: 265276
*	Work around MSVC failure from r265273	Duncan P. N. Exon Smith	2016-04-03	1	-0/+10
\| \| \| \| \| \|	http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19726 llvm-svn: 265275
*	ValueMapper: Avoid recursion in mapSimplifiedMetadata, NFC	Duncan P. N. Exon Smith	2016-04-03	1	-9/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The main change is to delay materializing GlobalValue initializers from Mapper::mapValue until Mapper::~Mapper. This effectively removes all recursion from mapSimplifiedMetadata, as promised in r265270. mapSimplifiedMetadata calls mapValue for ConstantAsMetadata nodes to find the mapped constant, and now it shouldn't be possible for mapValue to indirectly re-invoke mapMetadata. I'll add an assertion to that effect in a follow-up (separated so that the assertion can easily be reverted independently, if it comes to that). This is a step toward a broader goal: converting Mapper::mapMetadataImpl from a recursive to an iterative algorithm. When a BlockAddress points at a BasicBlock inside an unmaterialized function body, we need to delay it until the function body is materialized in Mapper::~Mapper. This commit creates a temporary BasicBlock and returns a new BlockAddress, then RAUWs the BasicBlock once it is known. This situation should be extremely rare since a BlockAddress is usually used from within the function it's referencing (and BlockAddress itself is rare). There should be no observable functionality change. llvm-svn: 265273
*	ValueMapper: Split out mapSimpleMetadata, NFC	Duncan P. N. Exon Smith	2016-04-03	1	-4/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Split out a helper for mapping metadata without operands. This is any metadata that is not an MDNode, and any MDNode where the answer is known without looking at operands. Through some weird twists, this function is co-recursive: mapSimpleMetadata => MapValue => materializeInitFor => linkFunctionBody => RemapInstructions => MapMetadata => mapSimpleMetadata I plan to break the recursion in a follow-up. llvm-svn: 265270
*	ValueMapper: Introduce Mapper helper class, NFC	Duncan P. N. Exon Smith	2016-04-03	1	-85/+101
\| \| \| \| \| \| \|	Remove a bunch of boilerplate from ValueMapper.cpp by using a new file-local class called Mapper. llvm-svn: 265268
*	[SimplifyLibCalls] Garbage collect dead code.	Davide Italiano	2016-04-03	1	-28/+7
\| \| \| \| \| \| \| \| \| \|	We already skip optimizations if the return value of printf() is used, so CI->use_empty() is always true. Differential Revision: http://reviews.llvm.org/D18656 llvm-svn: 265253
*	Linker: Remove IRMover::isMetadataUnneeded indirection; almost NFC	Duncan P. N. Exon Smith	2016-04-02	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of checking live during MapMetadata whether a subprogram is needed, seed the ValueMap with `nullptr` up-front. There is a small hypothetical functionality change. Previously, calling MapMetadataOp on a node whose "scope:" chain led to an unneeded subprogram would return nullptr. However, if that were ever called, then the subprogram would be needed; a situation that the IRMover is supposed to avoid a priori! Besides cleaning up the code a little, this restores a nice property: MapMetadataOp returns the same as MapMetadata. llvm-svn: 265229
*	ValueMapper: Add support for seeding metadata with nullptr	Duncan P. N. Exon Smith	2016-04-02	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	Support seeding a ValueMap with nullptr for Metadata entries, a situation I didn't consider in the Metadata/Value split. I added a ValueMapper::getMappedMD accessor that returns an Optional<Metadata*> with the mapped (possibly null) metadata. IRMover needs to use this to avoid modifying the map when it's checking for unneeded subprograms. I updated a call from bugpoint since I find the new code clearer. llvm-svn: 265228
*	Fix "warning: variable 'XX' set but not used" in release build (variable ↵	Mehdi Amini	2016-04-02	1	-1/+1
\| \| \| \| \| \| \|	used in assertion, NFC) From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 265220
*	Create a typedef GlobalValue::GUID for uint64_t and RAUW (NFC)	Mehdi Amini	2016-04-02	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This should make the code more readable, especially all the map declarations. Reviewers: tejohnson Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18721 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 265215
*	[PGO] Use a helper function to find all indirect call-sites	Rong Xu	2016-04-01	2	-26/+46
\| \| \| \| \| \| \| \| \| \|	Use a helper function to find all the indirect call-sites in a function. Also split the code into a separate file as this will be used by the indirect-call-promotion transformation. Differential Revision: http://reviews.llvm.org/D18704 llvm-svn: 265199
*	LowerBitSets: Move declarations to separate namespace.	Peter Collingbourne	2016-04-01	1	-0/+1
\| \| \| \| \| \|	Should fix modules build. llvm-svn: 265176
*	[sancov] save entry block from pruning (it is always full dominator)	Mike Aizatsky	2016-04-01	1	-3/+3
\| \| \| \|	llvm-svn: 265168
*	[InstCombine] Don't sink an instr after a catchswitch	David Majnemer	2016-04-01	1	-1/+5
\| \| \| \| \| \|	A catchswitch is a terminator; instructions cannot be inserted after it. llvm-svn: 265158
*	[SLPVectorizer] Don't insert an extractelement before a catchswitch	David Majnemer	2016-04-01	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	A catchswitch cannot be preceded by another instruction in the same basic block (other than a PHI node). Instead, insert the extract element right after the materialization of the vectorized value. This isn't optimal but is a reasonable compromise given the constraints of WinEH. This fixes PR27163. llvm-svn: 265157
*	[PGO] Refactor PGOFuncName meta data code to be used in clang	Rong Xu	2016-04-01	1	-8/+2
\| \| \| \| \| \| \| \| \|	Refactor the code that gets and creates PGOFuncName meta data so that it can be used in clang's value profile annotation. Differential Revision: http://reviews.llvm.org/D18623 llvm-svn: 265149
*	Add a module Hash in the bitcode and the combined index, implementing a kind ↵	Mehdi Amini	2016-04-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	of "build-id" This is intended to be used for ThinLTO incremental build. Differential Revision: http://reviews.llvm.org/D18213 This is a recommit of r265095 after fixing the Windows issues. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 265111
*	Revert "Add support for computing SHA1 in LLVM"	Mehdi Amini	2016-04-01	1	-1/+1
\| \| \| \| \| \| \| \|	This reverts commit r265096, r265095, and r265094. Windows build is broken, and the validation does not pass. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 265102
*	Don't insert stackrestore on deoptimizing returns	Sanjoy Das	2016-04-01	1	-2/+4
\| \| \| \| \| \| \| \|	They're not necessary (since the stack pointer is trivially restored on return), and the way LLVM inserts the stackrestore calls breaks the IR (we get a stackrestore between the deoptimize call and the return). llvm-svn: 265101
*	Don't insert lifetime end markers on deoptimizing returns	Sanjoy Das	2016-04-01	1	-2/+5
\| \| \| \| \| \| \| \| \|	They're not necessary (since the lifetime of the alloca is trivially over due to the return), and the way LLVM inserts the lifetime.end markers breaks the IR (we get a lifetime end marker between the deoptimize call and the return). llvm-svn: 265100
*	Add a module Hash in the bitcode and the combined index, implementing a kind ↵	Mehdi Amini	2016-04-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	of "build-id" This is intended to be used for ThinLTO incremental build. Differential Revision: http://reviews.llvm.org/D18213 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 265095
*	Preserve blockaddress use edges in the module splitter.	Evgeniy Stepanov	2016-03-31	1	-45/+46
\| \| \| \| \| \| \| \|	"blockaddress" cannot apply to an external function. All blockaddress constant uses must belong to the same module as the definition of the target function. llvm-svn: 265061
*	[NVPTX] Infer __nvvm_reflect as nounwind, readnone	David Majnemer	2016-03-31	1	-0/+9
\| \| \| \| \| \| \| \| \| \|	This patch simply mirrors the attributes we give to @llvm.nvvm.reflect to the __nvvm_reflect libdevice call. This shaves away about 30% of the code in libdevice because of CSE opportunities. It also helps us figure out that libdevice implementations of transcendental functions don't have side-effects. llvm-svn: 265060
*	Preserve extern_weak linkage in CloneModule.	Evgeniy Stepanov	2016-03-31	1	-10/+15
\| \| \| \| \| \| \|	Only force "extern" linkage if the function used to be a definition in the source module. Declarations keep their original linkage. llvm-svn: 265043
*	Minor code cleanup /NFC	Xinliang David Li	2016-03-31	1	-4/+6
\| \| \| \|	llvm-svn: 265025
*	Introduce a @llvm.experimental.guard intrinsic	Sanjoy Das	2016-03-31	4	-5/+117
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As discussed on llvm-dev[1]. This change adds the basic boilerplate code around having this intrinsic in LLVM: - Changes in Intrinsics.td, and the IR Verifier - A lowering pass to lower @llvm.experimental.guard to normal control flow - Inliner support [1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18527 llvm-svn: 264976
*	AMDGPU: Add frexp_exp intrinsic	Matt Arsenault	2016-03-30	1	-5/+16
\| \| \| \|	llvm-svn: 264944
*	AMDGPU: Constant folding for frexp_mant	Matt Arsenault	2016-03-30	1	-0/+14
\| \| \| \|	llvm-svn: 264943
*	Cloning: Reduce complexity of debug info cloning and fix correctness issue.	Peter Collingbourne	2016-03-30	2	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	Commit r260791 contained an error in that it would introduce a cross-module reference in the old module. It also introduced O(N^2) complexity in the module cloner by requiring the entire module to be visited for each function. Fix both of these problems by avoiding use of the CloneDebugInfoMetadata function (which is only designed to do intra-module cloning) and cloning function-attached metadata in the same way that we clone all other metadata. Differential Revision: http://reviews.llvm.org/D18583 llvm-svn: 264935
*	Silencing warnings from MSVC 2015 Update 2. All of these changes silence ↵	Aaron Ballman	2016-03-30	2	-5/+5
\| \| \| \| \| \|	"C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929
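As background on C4334: the warning fires when a 32-bit value is shifted and the result is only afterwards widened to 64 bits. A minimal sketch of the usual fix, widening the left operand before the shift, is below; `bitMask` is a hypothetical helper and not one of the functions touched by this commit.

```cpp
#include <cstdint>

// Builds a 64-bit mask with only bit `Bit` set. Bit must be less than 64.
// Writing `1 << Bit` here would shift a 32-bit int and widen the result
// afterwards, which is the pattern C4334 warns about; casting the 1 first
// makes the shift itself 64 bits wide.
std::uint64_t bitMask(unsigned Bit) {
  return static_cast<std::uint64_t>(1) << Bit;
}
```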
*	[IndVarSimplify] Don't insert after a catchswitch	David Majnemer	2016-03-30	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	Widening a PHI requires us to insert a trunc. The logical place for this trunc is in the same BB as the PHI. This is not possible if the BB is terminated by a catchswitch. This fixes PR27133. llvm-svn: 264926
*	[PassManager] Make PassManagerBuilder::addExtension take an std::function, ↵	Justin Lebar	2016-03-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rather than a function pointer. Summary: This gives callers flexibility to pass lambdas with captures, which lets callers avoid the C-style void*-ptr closure style. (Currently, callers in clang store state in the PassManagerBuilderBase arg.) No functional change, and the new API is backwards-compatible. Reviewers: chandlerc Subscribers: joker.eph, cfe-commits Differential Revision: http://reviews.llvm.org/D18613 llvm-svn: 264918
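The benefit described above, accepting lambdas with captures while staying compatible with plain functions, can be illustrated with a toy registry. `ExtensionRegistry`, `ExtensionFn`, and `appendZero` are hypothetical names for this sketch and do not reflect the actual PassManagerBuilder interface.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy registry; not the PassManagerBuilder API.
class ExtensionRegistry {
public:
  using ExtensionFn = std::function<void(std::vector<int> &)>;

  // Taking std::function lets callers pass capturing lambdas as well as
  // plain functions, so no void* "user data" argument is needed.
  void addExtension(ExtensionFn Fn) { Extensions.push_back(std::move(Fn)); }

  // Runs every registered extension in registration order.
  void run(std::vector<int> &Out) const {
    for (const ExtensionFn &Fn : Extensions)
      Fn(Out);
  }

private:
  std::vector<ExtensionFn> Extensions;
};

// A plain function still converts to std::function, which is why existing
// function-pointer callers keep compiling.
void appendZero(std::vector<int> &Out) { Out.push_back(0); }
```

A caller could then register both `appendZero` and a lambda such as `[Captured](std::vector<int> &Out) { Out.push_back(Captured); }` through the same `addExtension` entry point.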
*	[LoopVectorize] Don't vectorize loops when everything will be scalarized	Hal Finkel	2016-03-30	1	-18/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example: LV: The Smallest and Widest types: 32 / 32 bits. LV: The Widest register is: 256 bits. LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Scalar loop costs: 3. 
LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 2 costs: 2. LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 4 costs: 1. ... LV: Selecting VF: 8. 
LV: The target has 32 registers LV(REG): Calculating max register usage: LV(REG): At #0 Interval # 0 LV(REG): At #1 Interval # 1 LV(REG): At #2 Interval # 2 LV(REG): At #4 Interval # 1 LV(REG): At #5 Interval # 1 LV(REG): VF = 8 The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their cost (aside from the uniform ones) is indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do. The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem with this is that our register-pressure heuristic does not account for scalarization. While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more-basic problem: The job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF. This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit. Differential Revision: http://reviews.llvm.org/D18537 llvm-svn: 264904
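The rule stated above (skip a VF when every type would be split into VF or more parts) can be sketched on a toy target. Everything here is an assumption made for illustration: `ElemType`, `numParts`, `vfIsWorthConsidering`, and the imagined target with 256-bit registers that hold only floating-point elements are hypothetical and are not the LoopVectorize or TargetTransformInfo interfaces.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one element type used in the loop.
struct ElemType {
  unsigned Bits; // scalar width in bits
  bool IsFloat;  // floating-point (true) or integer (false)
};

// Toy target model (an assumption): 256-bit vector registers that can hold
// only floating-point elements. Returns how many pieces a vector of VF
// elements of type T is split into on that target.
unsigned numParts(const ElemType &T, unsigned VF) {
  if (!T.IsFloat)
    return VF; // no integer vectors: one scalar per lane
  unsigned TotalBits = T.Bits * VF;
  return (TotalBits + 255) / 256; // number of 256-bit registers needed
}

// Returns false when every type would be split into VF or more parts, i.e.
// when "vectorizing" at this VF would only produce scalar code.
// Types is expected to be non-empty.
bool vfIsWorthConsidering(const std::vector<ElemType> &Types, unsigned VF) {
  if (VF == 1)
    return true; // the scalar loop is always a candidate
  bool AllScalarized =
      std::all_of(Types.begin(), Types.end(), [VF](const ElemType &T) {
        return numParts(T, VF) >= VF;
      });
  return !AllScalarized;
}
```

On this toy target a purely integer loop is rejected for every VF greater than 1, while a loop that also uses a floating-point type keeps its vector candidates.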