bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Speculatively revert r258620 as it is the likely culprit of PR26293.	Quentin Colombet	2016-01-25	1	-11/+78
\| \| \| \|	llvm-svn: 258703
*	[LIR] Add support for structs and hand unrolled loops	Haicheng Wu	2016-01-23	1	-78/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now LIR can turn the following code into memset:
	typedef struct foo {
	  int a;
	  int b;
	} foo_t;

	void bar(foo_t *f, unsigned n) {
	  for (unsigned i = 0; i < n; ++i) {
	    f[i].a = 0;
	    f[i].b = 0;
	  }
	}

	void test(unsigned *f, unsigned n) {
	  for (unsigned i = 0; i < n; i += 2) {
	    f[i] = 0;
	    f[i+1] = 0;
	  }
	}
	llvm-svn: 258620
*	Revert "[SLP] Truncate expressions to minimum required bit width"	Matthew Simpson	2016-01-21	1	-143/+11
\| \| \| \| \| \|	This reverts commit r258404. llvm-svn: 258408
*	[SLP] Truncate expressions to minimum required bit width	Matthew Simpson	2016-01-21	1	-11/+143
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change attempts to produce vectorized integer expressions in bit widths that are narrower than their scalar counterparts. The need for demotion arises especially on architectures in which the small integer types (e.g., i8 and i16) are not legal for scalar operations but can still be used in vectors. Like similar work done within the loop vectorizer, we rely on InstCombine to perform the actual type-shrinking. We use the DemandedBits analysis and ComputeNumSignBits from ValueTracking to determine the minimum required bit width of an expression. Differential revision: http://reviews.llvm.org/D15815 llvm-svn: 258404
*	Reapply r257800 with fix	Matthew Simpson	2016-01-15	1	-42/+228
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fix uniques the bundle of getelementptr indices we are about to vectorize since it's possible for the same index to be used by multiple instructions. The original commit message is below. [SLP] Vectorize the index computations of getelementptr instructions. This patch seeds the SLP vectorizer with getelementptr indices. The primary motivation in doing so is to vectorize gather-like idioms beginning with consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these cases could be vectorized with a top-down phase, seeding the existing bottom-up phase with the index computations avoids the complexity, compile-time, and phase ordering issues associated with a full top-down pass. Only bundles of single-index getelementptrs with non-constant differences are considered for vectorization. llvm-svn: 257918
*	Revert "[SLP] Vectorize the index computations of getelementptr instructions."	Matthew Simpson	2016-01-15	1	-217/+41
\| \| \| \| \| \|	This reverts commit r257800. llvm-svn: 257888
*	[SLP] Vectorize the index computations of getelementptr instructions.	Matthew Simpson	2016-01-14	1	-41/+217
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch seeds the SLP vectorizer with getelementptr indices. The primary motivation in doing so is to vectorize gather-like idioms beginning with consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these cases could be vectorized with a top-down phase, seeding the existing bottom-up phase with the index computations avoids the complexity, compile-time, and phase ordering issues associated with a full top-down pass. Only bundles of single-index getelementptrs with non-constant differences are considered for vectorization. Differential Revision: http://reviews.llvm.org/D14829 llvm-svn: 257800
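The gather-like idiom mentioned in this message can be written out as a small function. The sketch below is only an illustration; the name `gatherSum` and the fixed width of four are assumptions, not taken from the patch. It separates the index computations (consecutive loads from `a` and `b` followed by subtractions), which are what the patch seeds the SLP vectorizer with, from the final non-consecutive loads from `g`.

```cpp
// Hypothetical example of the idiom g[a[0] - b[0]] + g[a[1] - b[1]] + ...
int gatherSum(const int *g, const int *a, const int *b) {
  // Index computations: consecutive loads from a[] and b[] and four
  // independent subtractions. These form a bundle of single-index
  // address computations with non-constant differences.
  int i0 = a[0] - b[0];
  int i1 = a[1] - b[1];
  int i2 = a[2] - b[2];
  int i3 = a[3] - b[3];

  // The loads from g[] are not consecutive, so they stay scalar
  // (or become a gather on targets that support one).
  return g[i0] + g[i1] + g[i2] + g[i3];
}
```

Whether a compiler actually vectorizes the four subtractions depends on the target and its cost model; the function only shows the shape of code the message describes.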
*	Remove extra whitespace. NFC.	Junmo Park	2016-01-13	1	-10/+10
\| \| \| \|	llvm-svn: 257578
*	rangify; NFCI	Sanjay Patel	2016-01-12	1	-12/+10
\| \| \| \|	llvm-svn: 257500
*	function names start with a lower case letter ; NFC	Sanjay Patel	2016-01-12	1	-1/+1
\| \| \| \|	llvm-svn: 257496
*	[LV] Avoid creating empty reduction entries (NFC)	Matthew Simpson	2016-01-06	1	-6/+6
\| \| \| \| \| \| \| \| \| \|	This patch prevents us from unintentionally creating entries in the reductions map for PHIs that are not actually reductions. This is currently not an issue since we bail out if we encounter PHIs other than inductions or reductions. However the behavior could become problematic as we add support for additional recurrence types. llvm-svn: 256930
*	[SCEV] Add and use SCEVConstant::getAPInt; NFCI	Sanjoy Das	2015-12-17	1	-2/+2
\| \| \| \|	llvm-svn: 255921
*	[SLPVectorizer] Ensure dominated reduction values.	Charlie Turner	2015-12-16	1	-7/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When considering incoming values as part of a reduction phi, ensure the incoming value is dominated by said phi. Failing to ensure this property causes miscompiles. Fixes PR25787. Many thanks to Mattias Eriksson for reporting, reducing and analyzing the problem for me. Differential Revision: http://reviews.llvm.org/D15580 llvm-svn: 255792
*	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ↵	Cong Hou	2015-12-15	1	-31/+106
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ignoring specific instructions. (This is the third attempt to check in this patch; the first two were r255454 and r255460. The previously failing test file reg-usage.ll has been moved to the test/Transform/LoopVectorize/X86 directory, with target datalayout and target triple specified.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instructions are added to the ValuesToIgnore set so that their register usage is no longer considered. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255691
*	Revert r255460, which still causes test failures on some platforms.	Cong Hou	2015-12-13	1	-106/+31
\| \| \| \| \| \|	Further investigation on the failures is ongoing. llvm-svn: 255463
*	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ↵	Cong Hou	2015-12-13	1	-31/+106
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ignoring specific instructions. (This is the second attempt to check in this patch: REQUIRES: asserts is added to reg-usage.ll now.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instructions are added to the ValuesToIgnore set so that their register usage is no longer considered. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255460
*	Revert r255454 as it leads to several test failures on buildbots.	Cong Hou	2015-12-13	1	-106/+31
\| \| \| \|	llvm-svn: 255456
*	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ↵	Cong Hou	2015-12-13	1	-31/+106
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ignoring specific instructions. LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instructions are added to the ValuesToIgnore set so that their register usage is no longer considered. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255454
*	AlignmentFromAssumptions and SLPVectorizer preserves AA and GlobalsAA	Hal Finkel	2015-12-11	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	GlobalsAA's assumption that passes do not escape globals that were not previously escaped is not violated by AlignmentFromAssumptions and SLPVectorizer. Marking them as such allows GlobalsAA to be preserved until GVN in the LTO pipeline. http://lists.llvm.org/pipermail/llvm-dev/2015-December/092972.html Patch by Vaivaswatha Nagaraj! llvm-svn: 255348
*	Re-commit r255115, with the PredicatedScalarEvolution class moved to	Silviu Baranga	2015-12-09	1	-85/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ScalarEvolution.h, in order to avoid cyclic dependencies between the Transform and Analysis modules: [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions Summary: This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the usage of SCEV predicates. The SCEVPredicatedLayer takes the knowledge statically deduced by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is that both LAA and LV should use this interface everywhere. This also solves a problem involving the result of SCEV expression rewriting when the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates P1: {a,+,b} has nsw P2: b = 1. Applying P1 and then P2 gives us {a,+,1}, while applying P2 and then P1 gives us sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies). The SCEVPredicatedLayer maintains the order of transformations by feeding back the results of previous transformations into new transformations, thereby avoiding this issue. The SCEVPredicatedLayer maintains a cache to remember the results of previous SCEV rewrites. This also has the benefit of reducing the overall number of expression rewrites. Reviewers: mzolotukhin, anemet Subscribers: jmolloy, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D14296 llvm-svn: 255122
*	Revert r255115 until we figure out how to fix the bot failures.	Silviu Baranga	2015-12-09	1	-80/+85
\| \| \| \|	llvm-svn: 255117
*	[LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV ↵	Silviu Baranga	2015-12-09	1	-85/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	expressions Summary: This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the usage of SCEV predicates. The SCEVPredicatedLayer takes the knowledge statically deduced by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is that both LAA and LV should use this interface everywhere. This also solves a problem involving the result of SCEV expression rewriting when the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates P1: {a,+,b} has nsw P2: b = 1. Applying P1 and then P2 gives us {a,+,1}, while applying P2 and then P1 gives us sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies). The SCEVPredicatedLayer maintains the order of transformations by feeding back the results of previous transformations into new transformations, thereby avoiding this issue. The SCEVPredicatedLayer maintains a cache to remember the results of previous SCEV rewrites. This also has the benefit of reducing the overall number of expression rewrites. Reviewers: mzolotukhin, anemet Subscribers: jmolloy, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D14296 llvm-svn: 255115
*	Fix a typo in LoopVectorize.cpp. NFC.	Cong Hou	2015-12-05	1	-1/+1
\| \| \| \|	llvm-svn: 254813
*	Fix a typo in LoopVectorize.cpp. NFC.	Cong Hou	2015-12-02	1	-1/+1
\| \| \| \|	llvm-svn: 254549
*	[LoopVectorize] Use MapVector rather than DenseMap for MinBWs.	Charlie Turner	2015-11-26	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The order in which instructions are truncated in truncateToMinimalBitwidths affects code generation. Switch to a map with a deterministic order, since the iteration order over a DenseMap is not defined. This code is not hot, so the difference in container performance isn't interesting. Many thanks to David Blaikie for making me aware of MapVector! Fixes PR25490. Differential Revision: http://reviews.llvm.org/D14981 llvm-svn: 254179
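The idea behind an insertion-ordered map can be sketched with standard containers. The class below is a hypothetical stand-in written for illustration (the names `SimpleMapVector` and `keysInIterationOrder` are assumptions); it is not LLVM's MapVector, only a minimal model of the same technique: a hash map from key to position, plus a vector that fixes the iteration order.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical minimal insertion-ordered map: lookups go through a hash
// map, iteration walks a vector, so the order never depends on hashing.
template <typename K, typename V>
class SimpleMapVector {
  std::unordered_map<K, std::size_t> Index;  // key -> position in Entries
  std::vector<std::pair<K, V>> Entries;      // entries in insertion order

public:
  using const_iterator = typename std::vector<std::pair<K, V>>::const_iterator;

  // Returns the value for Key, default-constructing it on first use.
  V &operator[](const K &Key) {
    auto It = Index.find(Key);
    if (It == Index.end()) {
      It = Index.emplace(Key, Entries.size()).first;
      Entries.emplace_back(Key, V());
    }
    return Entries[It->second].second;
  }

  const_iterator begin() const { return Entries.begin(); }
  const_iterator end() const { return Entries.end(); }
  std::size_t size() const { return Entries.size(); }
};

// Collects the keys in the order a range-based for loop would visit them.
std::vector<int> keysInIterationOrder(const SimpleMapVector<int, unsigned> &M) {
  std::vector<int> Keys;
  for (const auto &KV : M)
    Keys.push_back(KV.first);
  return Keys;
}
```

Inserting keys 30, 10, 20 and then updating key 10 visits them as 30, 10, 20 on every run, which is the property a pass needs when the visiting order influences the generated code.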
*	[LV] Add a helper function, isReductionVariable. NFC.	Chad Rosier	2015-11-19	1	-5/+7
\| \| \| \|	llvm-svn: 253565
*	Fix several long lines (>80) in LoopVectorize.cpp. NFC.	Cong Hou	2015-11-19	1	-13/+19
\| \| \| \|	llvm-svn: 253527
*	Typo.	Chad Rosier	2015-11-17	1	-1/+1
\| \| \| \|	llvm-svn: 253336
*	[SLP] Enable -slp-vectorize-hor by default.	Charlie Turner	2015-11-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Measurements primarily on AArch64 have shown this feature does not significantly affect compile-time. There are no significant perf changes in LNT, but for AArch64 at least, there are wins in third party benchmarks. As discussed on llvm-dev, we're going to try turning this on by default and see how other targets react to the change. llvm-svn: 252733
*	[LoopVectorize] Address post-commit feedback on r250032	James Molloy	2015-11-09	1	-3/+4
\| \| \| \| \| \| \| \| \| \|	Implemented as many of Michael's suggestions as were possible: * clang-format the added code while it is still fresh. * tried to change Value* to Instruction* in many places in computeMinimumValueSizes - unfortunately there are several places where Constants need to be handled so this wasn't possible. * Reduce the pass list on loop-vectorization-factors.ll. * Fix a bug where we were querying MinBWs for I->getOperand(0) but using MinBWs[I]. llvm-svn: 252469
*	Fix SLPVectorizer commutativity reordering	Mehdi Amini	2015-11-06	1	-76/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SLPVectorizer had a very crude way of trying to benefit from associativity: it tried to optimize for splat/broadcast or in order to have the same operator on the same side. This is beneficial to the cost model and allows more vectorization to occur. This patch improves the logic and makes the detection optimal (locally, we don't look at the full tree but only at the immediate children). Should fix https://llvm.org/bugs/show_bug.cgi?id=25247 Reviewers: mzolotukhin Differential Revision: http://reviews.llvm.org/D13996 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 252337
*	LoopVectorizer - skip 'bitcast' between GEP and load.	Elena Demikhovsky	2015-11-03	1	-2/+28
\| \| \| \| \| \| \| \| \| \| \| \|	Skipping the 'bitcast' in this case allows the load to be vectorized:
	  %arrayidx = getelementptr inbounds double*, double** %in, i64 %indvars.iv
	  %tmp53 = bitcast double** %arrayidx to i64*
	  %tmp54 = load i64, i64* %tmp53, align 8
	Differential Revision: http://reviews.llvm.org/D14112 llvm-svn: 251907
*	Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using ↵	Cong Hou	2015-11-02	1	-28/+102
\| \| \| \| \| \| \| \| \| \| \| \|	larger vectorization factor. To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers. This is the second attempt to submit this patch. The first attempt got a test failure on ARM. This patch is updated to try to fix the failure (more specifically, by handling the case when VF=1). Differential revision: http://reviews.llvm.org/D8943 llvm-svn: 251850
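The selection rule described in this message can be modelled in a few lines. This is a rough sketch under stated assumptions, not the code in LoopVectorize.cpp: the function name `chooseVF`, its parameters, the halving search, and the caller-supplied register-usage estimate are all hypothetical.

```cpp
#include <functional>

// Hypothetical model of the flag's effect. Without the flag, the VF is
// derived from the widest type in the loop. With it, the search starts
// from the smallest type and backs off while the estimated register
// usage exceeds the number of available registers.
unsigned chooseVF(unsigned RegisterBits, unsigned SmallestTypeBits,
                  unsigned WidestTypeBits, bool MaximizeBandwidth,
                  unsigned NumRegisters,
                  const std::function<unsigned(unsigned)> &EstimateRegUsage) {
  // Default behaviour: as many lanes as fit when using the widest type.
  unsigned BaseVF = RegisterBits / WidestTypeBits;
  if (!MaximizeBandwidth)
    return BaseVF;

  // Larger candidate: as many lanes as fit when using the smallest type.
  unsigned VF = RegisterBits / SmallestTypeBits;

  // Halve the candidate until it fits, never going below the default.
  while (VF > BaseVF && EstimateRegUsage(VF) > NumRegisters)
    VF /= 2;
  return VF;
}
```

For a 128-bit register with i8 as the smallest type and i32 as the widest, the default VF is 4 and the largest candidate is 16; how far the search backs off depends entirely on the usage estimate passed in.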
*	[SCEV][LV] Add SCEV Predicates and use them to re-implement stride versioning	Silviu Baranga	2015-11-02	1	-97/+95
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: SCEV Predicates represent conditions that typically cannot be derived from static analysis, but can be used to reduce SCEV expressions to forms which are usable for different optimizers. ScalarEvolution now has the rewriteUsingPredicate method which can simplify a SCEV expression using a SCEVPredicateSet. The normal workflow of a pass using SCEVPredicates would be to hold a SCEVPredicateSet and every time assumptions need to be made a new SCEV Predicate would be created and added to the set. Each time after calling getSCEV, the user will call the rewriteUsingPredicate method. We add two types of predicates SCEVPredicateSet - implements a set of predicates SCEVEqualPredicate - tests for equality between two SCEV expressions We use the SCEVEqualPredicate to re-implement stride versioning. Every time we version a stride, we will add a SCEVEqualPredicate to the context. Instead of adding specific stride checks, LoopVectorize now adds a more generic SCEV check. We only need to add support for this in the LoopVectorizer since this is the only pass that will do stride versioning. Reviewers: mzolotukhin, anemet, hfinkel, sanjoy Subscribers: sanjoy, hfinkel, rengolin, jmolloy, llvm-commits Differential Revision: http://reviews.llvm.org/D13595 llvm-svn: 251800
*	Revert the revision 251592 as it fails a test on some platforms.	Cong Hou	2015-10-29	1	-93/+28
\| \| \| \|	llvm-svn: 251617
*	Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using ↵	Cong Hou	2015-10-29	1	-28/+93
\| \| \| \| \| \| \| \|	larger vectorization factor. To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers. llvm-svn: 251592
*	Whitespace.	NAKAMURA Takumi	2015-10-27	1	-1/+1
\| \| \| \|	llvm-svn: 251437
*	Revert r251291, "Loop Vectorizer - skipping "bitcast" before GEP"	NAKAMURA Takumi	2015-10-27	1	-16/+3
\| \| \| \| \| \| \|	It causes miscompilation of llvm/lib/ExecutionEngine/Interpreter/Execution.cpp. See also PR25324. llvm-svn: 251436
*	[SLP] Be more aggressive about reduction width selection.	Charlie Turner	2015-10-27	1	-12/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change could be way off-piste, I'm looking for any feedback on whether it's an acceptable approach. It never seems to be a problem to gobble up as many reduction values as can be found, and then to attempt to reduce the resulting tree. Some of the workloads I'm looking at have been aggressively unrolled by hand, and by selecting reduction widths that are not constrained by a vector register size, it becomes possible to profitably vectorize. My test case shows such an unrolling which SLP was not vectorizing (on either ARM or X86) before this patch, but with it does vectorize. I measure no significant compile time impact of this change when combined with D13949 and D14063. There are also no significant performance regressions on ARM/AArch64 in SPEC or LNT. The more principled approach I thought of was to generate several candidate trees and use the cost model to pick the cheapest one. That seemed like quite a big design change (the algorithms seem very much one-shot), and would likely be a costly thing for compile time. This seemed to do the job at very little cost, but I'm worried I've misunderstood something! Reviewers: nadav, jmolloy Subscribers: mssimpso, llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D14116 llvm-svn: 251428
*	[SLP] Try a bit harder to find reduction PHIs	Charlie Turner	2015-10-27	1	-5/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently, when the SLP vectorizer considers whether a phi is part of a reduction, it dismisses phis whose incoming blocks are not the same as the block containing the phi. For the patterns I'm looking at, extending this rule to allow phis whose incoming block is a containing loop latch allows me to vectorize certain workloads. There is no significant compile-time impact, and combined with D13949, no performance improvement measured in ARM/AArch64 in any of SPEC2000, SPEC2006 or LNT. Reviewers: jmolloy, mcrosier, nadav Subscribers: mssimpso, nadav, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D14063 llvm-svn: 251425
*	[SLP] Treat SelectInsts as reduction values.	Charlie Turner	2015-10-27	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Certain workloads, in particular sum-of-absdiff loops, can be vectorized using SLP if it can treat select instructions as reduction values. The test case is a bit awkward. The AArch64 cost model needs some tuning to not be so pessimistic about selects. I've had to tweak the SLP threshold here. Reviewers: jmolloy, mzolotukhin, spatel, nadav Subscribers: nadav, mssimpso, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D13949 llvm-svn: 251424
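A sum-of-absolute-differences computation of the kind this message refers to might look as follows. The function is a hypothetical, hand-unrolled illustration (the name `sumAbsDiff4` and the width of four are assumptions). Each conditional expression is the sort of code a compiler typically lowers to a compare plus a select, and those selected values are what feed the final sum.

```cpp
// Hypothetical hand-unrolled sum of absolute differences over four bytes.
unsigned sumAbsDiff4(const unsigned char *a, const unsigned char *b) {
  int d0 = int(a[0]) - int(b[0]);
  int d1 = int(a[1]) - int(b[1]);
  int d2 = int(a[2]) - int(b[2]);
  int d3 = int(a[3]) - int(b[3]);

  // Each absolute value is a conditional choice between d and -d.
  unsigned s0 = d0 < 0 ? unsigned(-d0) : unsigned(d0);
  unsigned s1 = d1 < 0 ? unsigned(-d1) : unsigned(d1);
  unsigned s2 = d2 < 0 ? unsigned(-d2) : unsigned(d2);
  unsigned s3 = d3 < 0 ? unsigned(-d3) : unsigned(d3);

  // The reduction: the four selected values are summed.
  return s0 + s1 + s2 + s3;
}
```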
*	Loop Vectorizer - skipping "bitcast" before GEP	Elena Demikhovsky	2015-10-26	1	-3/+16
\| \| \| \| \| \| \| \| \| \|	Vectorization of a memory instruction (Load/Store) is possible when the pointer comes from a GEP. The GEP analysis makes it possible to estimate the profit. In some cases we have a "bitcast" between the GEP and the memory instruction. I added code that skips the "bitcast". http://reviews.llvm.org/D13886 llvm-svn: 251291
*	Refactor: Simplify boolean conditional return statements in ↵	Michael Zolotukhin	2015-10-24	2	-11/+5
\| \| \| \| \| \| \| \| \| \| \| \|	lib/Transforms/Vectorize (NFC). Summary: Use clang-tidy to simplify boolean conditional return statements. Differential Revision: http://reviews.llvm.org/D10003 Patch by Richard <legalize@xmission.com> llvm-svn: 251206
*	SLPVectorizer: AllSameOpcode* starts "true" only for instructions	Mehdi Amini	2015-10-23	1	-3/+4
\| \| \| \| \| \| \|	r251085 wasn't as NFC as intended... From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 251087
*	SLPVectorizer: refactor reorderInputsAccordingToOpcode (NFC)	Mehdi Amini	2015-10-23	1	-52/+81
\| \| \| \| \| \| \|	This is intended to simplify the changes needed to solve PR25247. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 251085
*	Vectorize: Remove implicit ilist iterator conversions, NFC	Duncan P. N. Exon Smith	2015-10-19	3	-85/+91
\| \| \| \| \| \| \| \| \| \| \| \| \|	Besides the usual, I finally added an overload to `BasicBlock::splitBasicBlock()` that accepts an `Instruction*` instead of `BasicBlock::iterator`. Someone can go back and remove this overload later (after updating the callers I'm going to skip going forward), but the most common call seems to be `BB->splitBasicBlock(BB->getTerminator(), ...)` and I'm not sure it's better to add `->getIterator()` to every one than have the overload. It's pretty hard to get the usage wrong. llvm-svn: 250745
*	Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore().	Elena Demikhovsky	2015-10-19	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case. Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces. Differential Revision: http://reviews.llvm.org/D13850 llvm-svn: 250686
*	[LoopVectorize] Shrink integer operations into the smallest type possible	James Molloy	2015-10-12	1	-11/+180
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int type (e.g. i32) whenever arithmetic is performed on them. For targets with native i8 or i16 operations, usually InstCombine can shrink the arithmetic type down again. However InstCombine refuses to create illegal types, so for targets without i8 or i16 registers, the lengthening and shrinking remains. Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when their scalar equivalents do not, so during vectorization it is important to remove these lengthens and truncates when deciding the profitability of vectorization. The algorithm this uses starts at truncs and icmps, trawling their use-def chains until they terminate or instructions outside the loop are found (or unsafe instructions like inttoptr casts are found). If the use-def chains starting from different root instructions (truncs/icmps) meet, they are unioned. The demanded bits of each node in the graph are ORed together to form an overall mask of the demanded bits in the entire graph. The minimum bitwidth that graph can be truncated to is the bitwidth minus the number of leading zeroes in the overall mask. The intention is that this algorithm should "first do no harm", so it will never insert extra cast instructions. This is why the use-def graphs are unioned, so that subgraphs with different minimum bitwidths do not need casts inserted between them. This algorithm works hard to reduce compile time impact. DemandedBits are only queried if there are extends of illegal types and if a truncate to an illegal type is seen. In the general case, this results in a simple linear scan of the instructions in the loop. No non-noise compile time impact was seen on a clang bootstrap build. llvm-svn: 250032
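The bit-width rule stated in this message (OR the demanded bits of every node together, then subtract the number of leading zeroes from the bitwidth) can be worked through with plain integers. The sketch below assumes 32-bit values and represents each node's demanded bits as a `uint32_t` mask; the helper names are hypothetical, and it does not model how the pass collects the masks or what it does with the resulting width.

```cpp
#include <cstdint>
#include <vector>

// Counts leading zero bits of a 32-bit mask (returns 32 for a zero mask).
unsigned countLeadingZeros32(uint32_t Mask) {
  unsigned N = 0;
  for (uint32_t Bit = 0x80000000u; Bit != 0 && (Mask & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// Hypothetical helper: ORs the demanded-bit masks of all nodes in one
// use-def graph and returns bitwidth minus leading zeroes of the result.
unsigned minimumBitWidth(const std::vector<uint32_t> &DemandedMasks) {
  uint32_t Overall = 0;
  for (uint32_t M : DemandedMasks)
    Overall |= M;
  return 32 - countLeadingZeros32(Overall);
}
```

For example, a graph whose nodes demand the masks 0xFF and 0x0F has an overall mask of 0xFF with 24 leading zeroes, so it could be truncated to 8 bits; a single node demanding 0x1FF would need 9 bits.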
*	invariant.group handling in GVN	Piotr Padlewski	2015-10-02	1	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \|	The most important part required to make clang devirtualization work ( ͡°͜ʖ ͡°). The code is able to find non-local dependencies, but unfortunately, because the caller can only handle local dependencies, I had to add some restrictions to look for dependencies only in the same BB. http://reviews.llvm.org/D12992 llvm-svn: 249196
*	[SLP] Don't vectorize loads of non-packed types (like i1, i2).	Michael Zolotukhin	2015-09-30	1	-1/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Given an array of i2 elements, 4 consecutive scalar loads will be lowered to i8-sized loads and thus will access 4 consecutive bytes in memory. If we vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in memory. Hence, we should prohibit vectorization in such cases. PS: Initial patch was proposed by Arnold. Reviewers: aschwaighofer, nadav, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13277 llvm-svn: 248943
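The byte counts in this message follow from rounding each access up to whole bytes. The sketch below redoes that arithmetic; the helper names are hypothetical and the comparison is a simplified model of the reasoning, not the check the patch adds.

```cpp
// Bytes touched by one access of the given bit width, rounded up.
unsigned storeSizeBytes(unsigned Bits) { return (Bits + 7) / 8; }

// N scalar loads: each element is accessed at its own byte-rounded size.
unsigned scalarBytesAccessed(unsigned NumElts, unsigned EltBits) {
  return NumElts * storeSizeBytes(EltBits);
}

// One vector load of <N x iBits>: the elements are packed bit by bit.
unsigned vectorBytesAccessed(unsigned NumElts, unsigned EltBits) {
  return storeSizeBytes(NumElts * EltBits);
}

// Hypothetical predicate: the two forms read the same memory only when
// the element type needs no padding to reach a whole number of bytes.
bool sameMemoryFootprint(unsigned NumElts, unsigned EltBits) {
  return scalarBytesAccessed(NumElts, EltBits) ==
         vectorBytesAccessed(NumElts, EltBits);
}
```

With four i2 elements the scalar loads touch 4 bytes while the single <4 x i2> load touches 1 byte, matching the numbers in the message; with four i8 elements both forms touch 4 bytes.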