path: root/llvm/test/Transforms
Commit message (Author, Date; Files changed, Lines -/+)
* Allow vectorization of a few missed LLVM intrinsic calls in BBVectorize by handling them in the isVectorizableIntrinsic function. (Karthik Bhat, 2014-04-24; 1 file, -1/+218)
  llvm-svn: 207085
* [InstCombine][x86] Constant fold psll intrinsics. (Michael J. Spencer, 2014-04-24; 1 file, -0/+110)
  This excludes avx512 as I don't have hardware to verify. It excludes the
  _dq variants because they are represented in the IR as <{2,4} x i64> when
  the operation is actually a byte shift of the entire i{128,256}. This also
  excludes _dq_bs as they aren't supported by the backend at all. There are
  also no corresponding instructions in the ISA. I have no idea why they
  exist...
  llvm-svn: 207058
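  A minimal sketch of the kind of fold this enables (the constants are
  illustrative; pslli.d is one of the affected shift-by-immediate forms):

    %r = call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> <i32 1, i32 2, i32 3, i32 4>, i32 3)
    ; folds to the constant <i32 8, i32 16, i32 24, i32 32>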
* Optimize some special cases for SSE4a insertqi. (Filipe Cabecinhas, 2014-04-24; 1 file, -0/+97)
  Summary: Since the upper 64 bits of the destination register are undefined
  when performing this operation, we can substitute it and let the optimizer
  figure out that only a copy is needed. Also added range merging, if an
  instruction copies a range that can be merged with a previously copied
  range. Added test cases for both optimizations.
  Reviewers: grosbach, nadav
  CC: llvm-commits
  Differential Revision: http://reviews.llvm.org/D3357
  llvm-svn: 207055
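  A hypothetical instance of the first special case (operand order follows
  the in-tree intrinsic declaration; the fold shown is an assumption about
  how the simplification applies):

    ; inserting all 64 bits at bit offset 0 just copies %src's low element,
    ; and the upper 64 bits of the result are undefined anyway
    %r = call <2 x i64> @llvm.x86.sse4a.insertqi(<2 x i64> %dst, <2 x i64> %src, i8 64, i8 0)
    ; can be replaced by %src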
* Handle addrspacecast when looking at memcpys from globals. (Matt Arsenault, 2014-04-24; 1 file, -4/+63)
  llvm-svn: 207054
* Convert test to FileCheck. (Matt Arsenault, 2014-04-23; 1 file, -1/+5)
  llvm-svn: 207015
* [LV] Introduce statistics for LoopVectorize: the number of analyzed loops and the number of vectorized loops. (Alexander Musman, 2014-04-23; 1 file, -0/+66)
  Use -stats to see how many loops were analyzed for possible vectorization
  and how many of them were actually vectorized.
  Patch by Zinovy Nis
  Differential Revision: http://reviews.llvm.org/D3438
  llvm-svn: 206956
* [Constant Hoisting] Materialize the constant before the cloned cast instruction. (Juergen Ributzka, 2014-04-22; 1 file, -0/+29)
  In the case where the constant comes from a cloned cast instruction, the
  materialization code has to go before the cloned cast instruction. This
  commit fixes the method that finds the materialization insertion point by
  making it aware of this case.
  This fixes <rdar://problem/15532441>
  llvm-svn: 206913
* Simplify a vpermil* with constant mask. (Rafael Espindola, 2014-04-21; 1 file, -0/+30)
  With a constant mask a vpermil* is just a shufflevector. This patch
  implements that simplification. This allows us to produce denser code. It
  should also allow more folding down the line.
  llvm-svn: 206801
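  A sketch of the rewrite (assuming vpermilvar.ps semantics, where each
  mask element selects a lane of the single source vector):

    %r = call <4 x float> @llvm.x86.avx.vpermilvar.ps(<4 x float> %v, <4 x i32> <i32 3, i32 2, i32 1, i32 0>)
    ; with the constant mask, this is equivalent to
    %r = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>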
* Fix PR7272 in -tailcallelim instead of the inliner. (Reid Kleckner, 2014-04-21; 2 files, -4/+25)
  The -tailcallelim pass should be checking if byval or inalloca args can be
  captured before marking calls as tail calls. This was the real root cause
  of PR7272. With a better fix in place, revert the inliner change from
  r105255. The test case it introduced still passes and has been moved to
  test/Transforms/Inline/byval-tail-call.ll.
  Reviewers: chandlerc
  Differential Revision: http://reviews.llvm.org/D3403
  llvm-svn: 206789
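  An illustrative (hypothetical, not from the patch) shape of the hazard: a
  byval argument lives in the caller's frame, so once its address may have
  been captured, a call that reuses it must not be marked 'tail':

    define void @f(i32* byval %p) {
      call void @capture(i32* %p)   ; %p may now be observed elsewhere
      call void @f(i32* byval %p)   ; reusing the frame here would
      ret void                      ; invalidate %p, so no 'tail' marker
    }
    declare void @capture(i32*)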
* Add missing config file for newly added test case introduced by r206563. (Jiangning Liu, 2014-04-18; 1 file, -0/+6)
  llvm-svn: 206567
* Allow vectorized loops to be unrolled by a factor of 2 for AArch64. (Jiangning Liu, 2014-04-18; 2 files, -0/+84)
  A new test case is also added for ARM64.
  Patch by Z. Zheng
  llvm-svn: 206563
* Fix bug 19437 - Only add discriminators for DWARF 4 and above. (Diego Novillo, 2014-04-17; 1 file, -0/+71)
  Summary: This prevents the discriminator generation pass from triggering
  if the DWARF version being used in the module is prior to 4.
  Reviewers: echristo, dblaikie
  CC: llvm-commits
  Differential Revision: http://reviews.llvm.org/D3413
  llvm-svn: 206507
* Revert r206485. (Gerolf Hoflehner, 2014-04-17; 2 files, -27/+1)
  After some discussion, the preferred semantics of the always_inline
  attribute is to always inline when the compiler can determine that it is
  safe to do so.
  llvm-svn: 206487
* Atomics: promote ARM's IR-based atomics pass to CodeGen. (Tim Northover, 2014-04-17; 3 files, -0/+546)
  Still only 32-bit ARM using it at this stage, but the promotion allows
  direct testing via opt and is a reasonably self-contained patch on the way
  to switching ARM64. At this point, other targets should be able to make
  use of it without too much difficulty if they want. (See ARM64 commit
  coming soon for an example.)
  llvm-svn: 206485
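  Roughly, the pass rewrites IR-level atomic operations into
  load-linked/store-conditional loops. A minimal sketch of the shape, with
  the intrinsic signatures assumed from the ARM backend of that era:

    ; atomicrmw add i32* %p, i32 1 seq_cst expands to a loop like:
    loop:
      %old = call i32 @llvm.arm.ldrex.p0i32(i32* %p)
      %new = add i32 %old, 1
      %fail = call i32 @llvm.arm.strex.p0i32(i32 %new, i32* %p)
      %retry = icmp ne i32 %fail, 0
      br i1 %retry, label %loop, label %done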
* Inline a function when the always_inline attribute is set, even when it contains an indirect branch. (Gerolf Hoflehner, 2014-04-17; 2 files, -1/+27)
  The attribute overrules correctness concerns like the escape of a local
  block address.
  This is for rdar://16501761
  llvm-svn: 206429
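  An illustrative (hypothetical) callee of the kind now inlined: its local
  block address escapes through an indirect branch, which previously
  blocked inlining:

    define void @callee() alwaysinline {
    entry:
      indirectbr i8* blockaddress(@callee, %target), [label %target]
    target:
      ret void
    }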
* Add lifetime markers for allocas created to hold byval arguments, and make them appear in the InlineFunctionInfo. (Julien Lerouge, 2014-04-15; 2 files, -1/+28)
  llvm-svn: 206308
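  A sketch of the markers, assuming the non-overloaded 2014-era lifetime
  intrinsic signatures and an illustrative i32 byval copy:

    %buf = alloca i32
    %buf.i8 = bitcast i32* %buf to i8*
    call void @llvm.lifetime.start(i64 4, i8* %buf.i8)
    ; ... inlined body uses %buf as the byval copy ...
    call void @llvm.lifetime.end(i64 4, i8* %buf.i8)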
* vect.omp.persistence.ll REQUIRES asserts due to -debug-only. (NAKAMURA Takumi, 2014-04-15; 1 file, -0/+1)
  llvm-svn: 206271
* D3348 - [BUG] "Rotate Loop" pass kills "llvm.vectorizer.enable" metadata. (Alexey Bataev, 2014-04-15; 1 file, -0/+87)
  llvm-svn: 206266
* Revert "Revert r206045, "Fix shift by constants for vector.""Matt Arsenault2014-04-142-0/+145
| | | | | | Fix cases where the Value itself is used, and not the constant value. llvm-svn: 206214
* Whitespace. (NAKAMURA Takumi, 2014-04-14; 1 file, -8/+8)
  llvm-svn: 206154
* Revert r206045, "Fix shift by constants for vector." (NAKAMURA Takumi, 2014-04-14; 1 file, -72/+8)
  It broke some builders, at least i686.
  llvm-svn: 206153
* [PowerPC] [Constant Hoisting] Enable constant hoisting on PPC. (Hal Finkel, 2014-04-13; 3 files, -0/+93)
  Implements the various TTI functions to enable constant hoisting on PPC.
  The only significant test-suite change is this:
  MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup (which essentially
  reverses the slowdown from r206120).
  llvm-svn: 206141
* Recognize test for overflow in integer multiplication. (Serge Pavlov, 2014-04-13; 1 file, -0/+164)
  If multiplication involves zero-extended arguments and the result is
  compared as in the patterns:

    %mul32 = trunc i64 %mul64 to i32
    %zext = zext i32 %mul32 to i64
    %overflow = icmp ne i64 %mul64, %zext

  or

    %overflow = icmp ugt i64 %mul64, 0xffffffff

  then the multiplication may be replaced by a call to umul.with.overflow.
  This change fixes PR4917 and PR4918.
  Differential Revision: http://llvm-reviews.chandlerc.com/D2814
  llvm-svn: 206137
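  The replacement form, sketched (assuming the multiply was built from i32
  values %a and %b that were zero-extended to i64):

    %res = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
    %mul = extractvalue { i32, i1 } %res, 0
    %overflow = extractvalue { i32, i1 } %res, 1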
* [ARM64] Never hoist the shift value of a shift instruction. (Juergen Ributzka, 2014-04-12; 1 file, -0/+9)
  There is no need to check if we want to hoist the immediate value of a
  shift instruction. Simply return TCC_Free right away.
  llvm-svn: 206101
* [ARM64] Fix the cost model for cheap large constants. (Juergen Ributzka, 2014-04-12; 1 file, -0/+8)
  Originally the cost model would give up for large constants and just
  return the maximum cost. This is not what we want for constant hoisting,
  because some of these constants are large in bitwidth, but are still
  cheap to materialize. This commit fixes the cost model to either return
  TCC_Free if the cost cannot be determined, or accurately calculate the
  cost even for large constants (bitwidth > 128).
  This fixes <rdar://problem/16591573>.
  llvm-svn: 206100
* Add the ability to use GEPs for address sinking in CGP. (Hal Finkel, 2014-04-12; 2 files, -0/+3)
  The current memory-instruction optimization logic in CGP, which sinks
  parts of the address computation that can be absorbed by the addressing
  mode, does this by explicitly converting the relevant part of the address
  computation into IR-level integer operations (making use of ptrtoint and
  inttoptr). For most targets this is currently not a problem, but for
  targets wishing to make use of IR-level aliasing analysis during CodeGen,
  the use of ptrtoint/inttoptr is a problem for two reasons:
    1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr.
    2. In cases where type-punning was used, and BasicAA was used to
       override TBAA, BasicAA may no longer do so. (This had forced us to
       disable all use of TBAA in CodeGen; something which we can now
       enable again.)
  This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled
  by default (except for those targets that use AA during CodeGen), and so
  aside from some PowerPC subtargets and SystemZ, there should be no change
  in behavior. We may be able to switch completely away from the
  ptrtoint/inttoptr sinking on all targets, but further testing is
  required. I've doubled-up on a number of existing tests that are
  sensitive to the address-sinking behavior (including some store-merging
  tests that are sensitive to the order of the resulting ADD operations at
  the SDAG level).
  llvm-svn: 206092
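  A sketch of the two sinking styles (illustrative IR in the 2014-era GEP
  syntax):

    ; integer style: opaque to BasicAA
    %pi = ptrtoint i8* %base to i64
    %sum = add i64 %pi, %offset
    %addr = inttoptr i64 %sum to i8*
    ; GEP style: preserves pointer provenance for alias analysis
    %addr = getelementptr i8* %base, i64 %offset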
* Fix shift by constants for vector. (Matt Arsenault, 2014-04-11; 1 file, -8/+72)
  ashr <N x iM>, <N x iM> M -> undef
  llvm-svn: 206045
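  Concretely (a hypothetical instance of the pattern): shifting each i32
  lane right by 32, the full bit width, is undefined:

    %r = ashr <4 x i32> %x, <i32 32, i32 32, i32 32, i32 32>
    ; folds to undef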
* Reapply "SLPVectorizer: Ignore users that are insertelements we can ↵Arnold Schwaighofer2014-04-101-0/+24
| | | | | | | | | reschedule them" This commit reapplies 205018. After 205855 we should correctly vectorize intrinsics. llvm-svn: 205965
* [ARM64] Fix immediate cost calculation for types larger than i64. (Juergen Ributzka, 2014-04-10; 1 file, -0/+10)
  The immediate cost calculation code was hitting an assertion in the
  included test case, because the APInt was still internally 128 bits.
  Truncating it to 64 bits fixed the issue.
  Fixes <rdar://problem/16572521>.
  llvm-svn: 205947
* SLPVectorizer: Only vectorize intrinsics whose operands are widened equally. (Arnold Schwaighofer, 2014-04-09; 1 file, -0/+36)
  The vectorizer only knows how to vectorize intrinsics by widening all
  operands by the same factor.
  Patch by Tyler Nowicki!
  llvm-svn: 205855
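  For example (illustrative; not necessarily the motivating case),
  llvm.powi keeps a scalar i32 exponent operand:

    %p = call float @llvm.powi.f32(float %x, i32 %n)
    ; widening both operands by the vectorization factor would produce an
    ; invalid call, so intrinsics like this cannot be widened naively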
* [Constant Hoisting][ARM64] Enable constant hoisting for ARM64. (Juergen Ributzka, 2014-04-08; 2 files, -0/+26)
  This implements the target hooks for ARM64 to enable constant hoisting.
  This fixes <rdar://problem/14774662> and <rdar://problem/16381500>.
  llvm-svn: 205791
* Handle VLAs during inline cost computation if they'll be turned into a constant-size alloca by inlining. (Eric Christopher, 2014-04-07; 1 file, -0/+38)
  Ran a run over the test-suite, no results out of the noise; fixes the
  test case in the PR. PR19115.
  llvm-svn: 205710
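  A sketch of the situation with hypothetical IR: once a constant argument
  is inlined, the dynamic alloca becomes a static one:

    define void @callee(i32 %n) {
    entry:
      %vla = alloca i32, i32 %n   ; dynamic size before inlining
      ret void
    }
    ; after inlining 'call void @callee(i32 8)', the alloca becomes
    ;   %vla = alloca i32, i32 8  ; constant size, cheap to inline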
* Update the test to use FileCheck. (Juergen Ributzka, 2014-04-04; 1 file, -2/+8)
  llvm-svn: 205647
* ARM: yet another round of ARM test clean-ups. (Saleem Abdulrasool, 2014-04-03; 2 files, -2/+2)
  llvm-svn: 205586
* Fix PR19270 - type mismatch caused by invalid optimization. (Eli Bendersky, 2014-04-03; 1 file, -1/+16)
  Patch by Jingyue Wu.
  llvm-svn: 205547
* Add test case for [Constant Hoisting] Erase dead cast instructions (r204538). (Juergen Ributzka, 2014-04-02; 1 file, -0/+16)
  llvm-svn: 205484
* typo. (Adrian Prantl, 2014-04-02; 1 file, -1/+1)
  llvm-svn: 205473
* Add comments and test case for [X86TTI] Make constant base pointers for GetElementPtr opaque (r204739). (Juergen Ributzka, 2014-04-02; 1 file, -0/+24)
  llvm-svn: 205468
* Add test case for [Stackmaps][X86TTI] Fix think-o in getIntImmCost calculation (r204738). (Juergen Ributzka, 2014-04-02; 1 file, -0/+17)
  llvm-svn: 205464
* SLPVectorizer: compare entire intrinsic for SLP compatibility. (Tim Northover, 2014-04-02; 2 files, -0/+21)
  Some intrinsics are overloaded to the extent that return type equality
  (all that's been checked up to now) does not guarantee that the arguments
  are the same. In these cases the SLP vectorizer should not recurse into
  the operands, which can be achieved by comparing them as "Function *"
  rather than simply by intrinsic ID.
  llvm-svn: 205424
* [LoopVectorizer] Count dependencies of consecutive pointers as uniforms. (Hal Finkel, 2014-04-02; 2 files, -0/+55)
  For the purpose of calculating the cost of the loop at various
  vectorization factors, we need to count dependencies of consecutive
  pointers as uniforms (which means that the VF = 1 cost is used for all
  overall VF values). For example, the TSVC benchmark function s173 has:

    ...
    %3 = add nsw i64 %indvars.iv, 16000
    %arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3
    ...

  and we must realize that the add will be a scalar in order to correctly
  deduce it to be profitable to vectorize this on PowerPC with VSX enabled.
  In fact, all dependencies of a consecutive pointer must be scalars
  (uniforms), and so we simply need to add all consecutive pointers to the
  worklist that currently collects the uniforms.
  Fixes PR19296.
  llvm-svn: 205387
* Implement X86TTI::getUnrollingPreferences. (Hal Finkel, 2014-04-01; 3 files, -10/+94)
  This provides an initial implementation of getUnrollingPreferences for
  x86. getUnrollingPreferences is used by the generic (concatenation)
  unroller, which is distinct from the unrolling done by the loop
  vectorizer. Many modern x86 cores have some kind of uop cache and
  loop-stream detector (LSD) used to efficiently dispatch small loops, and
  taking full advantage of this requires unrolling small loops (small here
  means 10s of uops). These caches also have limits on the number of taken
  branches in the loop, and so we also cap the loop unrolling factor based
  on the maximum "depth" of the loop. This is currently calculated with a
  partial DFS traversal (partial because it will stop early if the path
  length grows too much). This is still an approximation, and one that is
  both conservative (because it does not account for branches eliminated
  via block placement) and optimistic (because it is only recording the
  maximum depth over minimum paths). Nevertheless, because the loops that
  fit in these uop caches are so small, it is not clear how much the
  details matter.
  The original set of patches posted for review produced the following
  test-suite performance results (from the TSVC benchmark) at that time:
    ControlLoops-dbl - 13% speedup
    ControlLoops-flt - 15% speedup
    Reductions-dbl   - 7.5% speedup
  llvm-svn: 205348
* Move partial/runtime unrolling late in the pipeline. (Hal Finkel, 2014-03-31; 1 file, -1/+1)
  The generic (concatenation) loop unroller is currently placed early in
  the standard optimization pipeline. This is a good place to perform full
  unrolling, but not the right place to perform partial/runtime unrolling.
  However, most targets don't enable partial/runtime unrolling, so this
  never mattered. However, even some x86 cores benefit from partial/runtime
  unrolling of very small loops, and follow-up commits will enable this.
  First, we need to move partial/runtime unrolling late in the optimization
  pipeline (importantly, this is after SLP and loop vectorization, as
  vectorization can drastically change the size of a loop), while keeping
  the full unrolling where it is now. This change does just that.
  llvm-svn: 205264
* Revert "SLPVectorizer: Ignore users that are insertelements we can ↵Arnold Schwaighofer2014-03-311-24/+0
| | | | | | | | | | | | | | reschedule them" This reverts commit r205018. Conflicts: lib/Transforms/Vectorize/SLPVectorizer.cpp test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll This is breaking libclc build. llvm-svn: 205260
* [X86] Adjust cost of FP_TO_UINT v4f64->v4i32 as well. (Adam Nemet, 2014-03-31; 1 file, -0/+40)
  Pretty obvious follow-on to r205159 to also handle conversion from double
  besides float.
  Fixes <rdar://problem/16373208>
  llvm-svn: 205253
* [X86] Adjust cost of FP_TO_UINT v8f32->v8i32. (Adam Nemet, 2014-03-30; 1 file, -0/+39)
  There is no direct AVX instruction to convert to unsigned. I have some
  ideas how we may be able to do this with three vector instructions, but
  the current backend just bails on this to get it scalarized. See the
  comment on why we need to adjust the cost returned by BasicTTI. The test
  is a bit roundabout (and checks assembly rather than bitcode) because I'd
  like it to work even if at some point we could vectorize this conversion.
  Fixes <rdar://problem/16371920>
  llvm-svn: 205159
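  The conversion being priced, sketched in IR:

    %r = fptoui <8 x float> %v to <8 x i32>
    ; no direct AVX lowering exists, so the cost should reflect
    ; scalarization rather than the default BasicTTI estimate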
* llvm/test/Transforms/LoopStrengthReduce/ARM64/lsr-*.ll: Add explicit triple arm64-unknown for targeting pecoff. (NAKAMURA Takumi, 2014-03-30; 2 files, -2/+2)
  llvm-svn: 205125
* ARM64: initial backend import. (Tim Northover, 2014-03-29; 11 files, -3/+478)
  This adds a second implementation of the AArch64 architecture to LLVM,
  accessible in parallel via the "arm64" triple. The plan over the coming
  weeks & months is to merge the two into a single backend, during which
  time thorough code review should naturally occur. Everything will be
  easier with the target in-tree though, hence this commit.
  llvm-svn: 205090
* SLPVectorizer: Take credit for free extractelement instructions. (Arnold Schwaighofer, 2014-03-28; 1 file, -0/+25)
  Extractelement instructions that will be removed when vectorizing lower
  the cost.
  Patch by Arch D. Robison!
  llvm-svn: 205020
* SLPVectorizer: Ignore users that are insertelements we can reschedule. (Arnold Schwaighofer, 2014-03-28; 1 file, -0/+24)
  Patch by Arch D. Robison!
  llvm-svn: 205018