path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit log, newest first. Each entry: title (author, commit date, files changed, lines -removed/+added)
...
* [DAGCombine] Don't assume integer-type legality in reduceBuildVecConvertToConvertBuildVec (Hal Finkel, 2015-02-22, 1 file, -0/+5)
  DAGCombine will rewrite a BUILD_VECTOR where all non-undef inputs come from [US]INT_TO_FP as a BUILD_VECTOR of integers with the conversion applied as a vector operation. We check operation legality of the conversion, but fail to check legality of the integer vector type itself. Because targets don't normally override operation-legality defaults for illegal types, we need to check this as well. This came up in the context of the QPX vector extensions for PowerPC (which can have legal floating-point vector types without corresponding legal integer vector types). No in-tree test case for this yet, but one can be added once the QPX support has been committed.
  llvm-svn: 230176
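  A minimal sketch of the added guard (illustrative names, not the committed code): build the integer vector type the combine would use, and bail out if it is not legal.
    // NVT: the integer vector type that would carry the [US]INT_TO_FP inputs.
    EVT NVT = EVT::getVectorVT(*DAG.getContext(), IntVT, NumElts);
    if (!TLI.isTypeLegal(NVT))
      return SDValue();   // don't build a vector of an illegal integer type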
* Add generic fmad DAG node. (Matt Arsenault, 2015-02-20, 1 file, -95/+159)
  This allows sharing of FMA-forming combines to work with instructions that have the same semantics as a separate multiply and add. This is Expand by default, and only formed post-legalization, so it shouldn't have much impact on targets that do not want it.
  llvm-svn: 230070
* [CodeGen] Use ArrayRef instead of std::vector&. NFC. (Ahmed Bougacha, 2015-02-19, 1 file, -1/+1)
  The former lets us use SmallVectors. Do so in ARM and AArch64.
  llvm-svn: 229925
* [x86,sdag] Two interrelated changes to the x86 and sdag code. (Chandler Carruth, 2015-02-19, 1 file, -2/+4)
  First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization.
  However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing.
  Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as *legal* so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, I know, this is confusing. But that's how the patterns are written).
  This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the *hilarious* deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win!
  There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own.
  Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise.
  llvm-svn: 229835
* Canonicalize splats as build_vectors (PR22283) (Sanjay Patel, 2015-02-17, 1 file, -13/+10)
  This is a follow-on patch to http://reviews.llvm.org/D7093. That patch canonicalized constant splats as build_vectors, and this patch removes the constant check so we can canonicalize all splats as build_vectors. This fixes the 2nd test case in PR22283: http://llvm.org/bugs/show_bug.cgi?id=22283. The unfortunate code duplication between SelectionDAG and DAGCombiner is discussed in the earlier patch review. At least this patch is just removing code... This improves an existing x86 AVX test and changes codegen in an ARM test.
  Differential Revision: http://reviews.llvm.org/D7389
  llvm-svn: 229511
* Prefer SmallVector::append/insert over push_back loops. (Benjamin Kramer, 2015-02-17, 1 file, -6/+2)
  Same functionality, but hoists the vector growth out of the loop.
  llvm-svn: 229500
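  The pattern change in miniature (my own sketch, not the patch's code): append() sizes the storage once up front, where a push_back loop may reallocate several times.
    #include "llvm/ADT/SmallVector.h"
    void copyAll(const llvm::SmallVectorImpl<int> &Src,
                 llvm::SmallVectorImpl<int> &Dst) {
      // Before: for (int V : Src) Dst.push_back(V);  // growth inside the loop
      Dst.append(Src.begin(), Src.end());             // one growth, then copy
    }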
* SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here too (Mehdi Amini, 2015-02-16, 1 file, -2/+46)
  Update SPARC tests to match.
  From: Fiona Glaser <fglaser@apple.com>
  llvm-svn: 229438
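  For intuition (my own illustration, not from the patch): the round trip is only an identity when the FP type can represent every value of the integer type exactly.
    #include <cassert>
    #include <cstdint>
    int main() {
      // i16 -> f32 -> i16 is lossless: f32 has 24 significand bits >= 16.
      int16_t a = 12345;
      assert((int16_t)(float)a == a);
      // i32 -> f32 -> i32 is not: 2^24 + 1 rounds to 2^24 in the f32 step.
      int32_t b = 16777217;
      assert((int32_t)(float)b == 16777216);
      return 0;
    }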
* [x86] Fix PR22377, a regression with the new vector shuffle legality test (Chandler Carruth, 2015-02-15, 1 file, -2/+3)
  This was just a matter of the DAG combine for vector shuffles being too aggressive. This is a bit of a grey area, but I think generally if we can re-use intermediate shuffles, we should. Certainly, given the test cases I have available, this seems like the right call.
  llvm-svn: 229285
* CodeGen: Canonicalize access to function attributes, NFC (Duncan P. N. Exon Smith, 2015-02-14, 1 file, -12/+7)
  Canonicalize access to function attributes to use the simpler API.
    getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind)
    getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind)
  Also, add `Function::getFnStackAlignment()`, and canonicalize:
    getAttributes().getStackAlignment(AttributeSet::FunctionIndex) => getFnStackAlignment()
  llvm-svn: 229208
* MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros (Benjamin Kramer, 2015-02-12, 1 file, -1/+1)
  Update all callers.
  llvm-svn: 228930
* [CodeGen] Don't blindly combine (fp_round (fp_round x)) to (fp_round x). (Ahmed Bougacha, 2015-02-12, 1 file, -5/+10)
  We used to do this DAG combine, but it's not always correct: if the first fp_round isn't a value-preserving truncation, it might introduce a tie in the second fp_round that wouldn't occur in the single-step fp_round we want to fold to. In other words, double rounding isn't the same as rounding.
  Differential Revision: http://reviews.llvm.org/D7571
  llvm-svn: 228911
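  A concrete instance of the tie described above (my own illustration, assuming a compiler with the _Float16 extension):
    #include <cstdio>
    int main() {
      // d sits just above the midpoint between the half values 1.0 and
      // 1 + 2^-10, so one rounding goes up. Rounding to float first lands
      // exactly on that midpoint, and the tie then breaks down to 1.0.
      double d = 1.0 + 0x1p-11 + 0x1p-30;
      _Float16 once  = (_Float16)d;          // 1 + 2^-10 (correct)
      _Float16 twice = (_Float16)(float)d;   // 1.0 (double rounding went wrong)
      std::printf("%a vs %a\n", (double)once, (double)twice);
      return 0;
    }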
* Fix SelectionDAG compile time issue with alias analysis. (Jonas Paulsson, 2015-02-11, 1 file, -2/+5)
  Add the new token factor node and its users to the worklist if alias analysis is turned on, in DAGCombiner::visitTokenFactor(). Alias analysis may cause a lot of new token factors to be inserted into the DAG, and they need to be optimized to avoid significant slow-downs.
  Reviewed by Hal Finkel.
  llvm-svn: 228841
* Two comment typo fixes in lib/CodeGen/SelectionDAG/DAGCombiner.cpp. (Jonas Paulsson, 2015-02-10, 1 file, -2/+2)
  llvm-svn: 228700
* [x86] Fix PR22524: the DAG combiner was incorrectly handling illegal nodes when folding bitcasts of constants. (Chandler Carruth, 2015-02-10, 1 file, -13/+9)
  We can't fold things and then check after-the-fact whether it was legal. Once we have formed the DAG node, arbitrary other nodes may have been collapsed to it. There is no easy way to go back. Instead, we need to test for the specific folding cases we're interested in and ensure those are legal first.
  This could in theory make this less powerful for bitcasting from an integer to some vector type, but AFAICT, that can't actually happen in the SDAG, so it's fine. Now, we *only* whitelist specific int->fp and fp->int bitcasts for post-legalization folding. I've added the test case from the PR.
  (Also as a note, this does not appear to be in 3.6; no backport needed.)
  llvm-svn: 228656
* [CodeGen] Add hook/combine to form vector extloads, enabled on X86. (Ahmed Bougacha, 2015-02-05, 1 file, -12/+121)
  The combine that forms extloads used to be disabled on vector types, because "None of the supported targets knows how to perform load and sign extend on vectors in one instruction." That's not entirely true, since at least SSE4.1 X86 knows how to do those sextloads/zextloads (with PMOVS/ZX). But there are several aspects to getting this right.
  First, vector extloads are controlled by a profitability callback. For instance, on ARM, several instructions have folded extload forms, so it's not always beneficial to create an extload node (and trying to match extloads is a whole 'nother can of worms).
  The interesting optimization enables folding of s/zextloads to illegal (splittable) vector types, expanding them into smaller legal extloads. It's not ideal (it introduces some legalization-like behavior in the combine) but it's better than the obvious alternative: form illegal extloads, and later try to split them up. If you do that, you might generate extloads that can't be split up, but have a valid ext+load expansion. At vector-op legalization time, it's too late to generate this kind of code, so you end up forced to scalarize. It's better to just avoid creating egregiously illegal nodes.
  This optimization is enabled unconditionally on X86. Note that the splitting combine is happy with "custom" extloads. As is, this bypasses the actual custom lowering, and just unrolls the extload. But from what I've seen, this is still much better than the current custom lowering, which does some kind of unrolling at the end anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added FIXME).
  Also note that the existing combine that forms extloads is now also enabled on legal vectors. This doesn't have a big effect on X86 (because sext+load is usually combined to sext_inreg+aextload). On ARM it fires on some rare occasions; that's for a separate commit.
  Differential Revision: http://reviews.llvm.org/D6904
  llvm-svn: 228325
* Revert r227242 - Merge vector stores into wider vector stores (PR21711). (Quentin Colombet, 2015-01-27, 1 file, -54/+30)
  This commit creates an infinite loop in DAG combine in the LLVM test-suite for aarch64 with mcpu=cyclone (just having NEON may be enough to expose this).
  llvm-svn: 227272
* Merge vector stores into wider vector stores (PR21711) (Sanjay Patel, 2015-01-27, 1 file, -30/+54)
  This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). The 'f3' test case in that report presents a situation where we have two 128-bit stores extracted from a 256-bit source vector. Instead of producing this:
    vmovaps %xmm0, (%rdi)
    vextractf128 $1, %ymm0, 16(%rdi)
  This patch merges the 128-bit stores into a single 256-bit store:
    vmovups %ymm0, (%rdi)
  Differential Revision: http://reviews.llvm.org/D7208
  llvm-svn: 227242
* merge consecutive stores of extracted vector elements (PR21711) (Sanjay Patel, 2015-01-22, 1 file, -92/+162)
  This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. That patch was checked in at r224611, but reverted at r225031 because it caused a failure outside of the regression tests. The cause of the crash was not recognizing consecutive stores that have mixed source values (loads and vector element extracts), so this patch adds a check to bail out if any store value is not coming from a vector element extract. This patch also refactors the shared logic of the constant-source and vector-extracted-elements-source cases into a helper function.
  Differential Revision: http://reviews.llvm.org/D6850
  llvm-svn: 226845
* [DAGCombine] Produce better code for constant splats (Michael Kuperstein, 2015-01-22, 1 file, -1/+19)
  This solves PR22276. Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
  Differential Revision: http://reviews.llvm.org/D7093
  Fixed recommit of r226811.
  llvm-svn: 226816
* Revert r226811, MSVC accepts code sane compilers don't. (Michael Kuperstein, 2015-01-22, 1 file, -19/+1)
  llvm-svn: 226814
* [DAGCombine] Produce better code for constant splats (Michael Kuperstein, 2015-01-22, 1 file, -1/+19)
  This solves PR22276. Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
  Differential Revision: http://reviews.llvm.org/D7093
  llvm-svn: 226811
* Fixed a bug in type legalizer for masked load/store intrinsics. (Elena Demikhovsky, 2015-01-22, 1 file, -6/+10)
  The problem occurs when, after vectorization, we have the type <2 x i32>. This type is promoted to <2 x i64> and then requires additional effort to expand loads and truncate stores. I added EXPAND / TRUNCATE attributes to the masked load/store SDNodes. The code now contains additional shuffles. I've prepared changes to the cost estimation for masked memory operations; they will be submitted separately.
  llvm-svn: 226808
* Fixed a comment (Elena Demikhovsky, 2015-01-22, 1 file, -1/+1)
  llvm-svn: 226806
* Fixed a bug in narrowing store operation. (Elena Demikhovsky, 2015-01-22, 1 file, -2/+5)
  The type MVT::i1 became legal in KNL, but a store operation can't be narrowed to this type, since the size of the VT (1 bit) is not equal to its actual store size (8 bits). Added a test provided by David (dag@cray.com).
  llvm-svn: 226805
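  A sketch of the implied constraint (illustrative names, not the committed code): narrowing must target a type whose bit width equals its in-memory store size.
    // MVT::i1 fails this check: 1 bit of value, but 8 bits in memory.
    if (VT.getSizeInBits() != VT.getStoreSizeInBits())
      return SDValue();   // can't narrow the store to this type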
* DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N)) (Tim Northover, 2015-01-21, 1 file, -0/+11)
  It can help with argument juggling on some targets, and is generally a good idea.
  llvm-svn: 226740
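  The fold is the plain distributive law for bitwise AND over OR; a quick self-contained check (my own, not from the patch):
    #include <cassert>
    #include <cstdint>
    int main() {
      uint32_t X = 0xDEADBEEF, M = 0xFF00FF00, N = 0x0F0F0F0F;
      // (X & M) | (X & N) == X & (M | N): one AND instead of two.
      assert(((X & M) | (X & N)) == (X & (M | N)));
      return 0;
    }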
* Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))"Tim Northover2015-01-211-11/+0
| | | | | | | | It hadn't gone through review yet, but was still on my local copy. This reverts commit r226663 llvm-svn: 226665
* DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N)) (Tim Northover, 2015-01-21, 1 file, -0/+11)
  llvm-svn: 226663
* Improve DAG combine pass on certain IR vector patterns (Mehdi Amini, 2015-01-17, 1 file, -1/+14)
  Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector produced suboptimal code in AVX2 mode with certain IR combinations. In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32 (undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical, but then mysteriously generated rather bad code; the movq/movhpd combination didn't match.
  The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs would get promoted to 4f32 by the type legalizer, eventually resulting in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing these were both half the output size, concatted them and then produced a shuffle. However, the resulting concat + shuffle was more complex than it should be; in the case where the upper half of the output is undef, we probably want to generate shuffle + concat instead.
  This enhancement causes the vector_shuffle combine step to recognize this suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR in case the same suboptimal pattern occurs for other reasons.
  This results in the optimizer correctly producing the optimal movq + movhpd sequence for all three variations on this IR, even with AVX2. I've included a test case.
  Radar link: rdar://problem/19287012
  Fix for PR 21943.
  From: Fiona Glaser <fglaser@apple.com>
  llvm-svn: 226360
* [cleanup] Re-sort all the #include lines in LLVM using utils/sort_includes.py. (Chandler Carruth, 2015-01-14, 1 file, -1/+1)
  I clearly haven't done this in a while, so more changed than usual. This even uncovered a missing include from the InstrProf library that I've added. No functionality changed here, just mechanical cleanup of the include order.
  llvm-svn: 225974
* DAG Combiner: Fold SelectCC When Cond is UNDEF (Mehdi Amini, 2015-01-14, 1 file, -4/+7)
  In case folding a node ends up with a NaN as an operand for the select, the folding of the condition of the selectcc node returns "UNDEF".
  Differential Revision: http://reviews.llvm.org/D6889
  llvm-svn: 225952
* DAGCombiner: simplify by using condition variables; NFC (Matthias Braun, 2015-01-13, 1 file, -16/+12)
  llvm-svn: 225836
* R600: Implement getRecipEstimate (Matt Arsenault, 2015-01-13, 1 file, -1/+2)
  This requires a new hook to prevent expanding sqrt in terms of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are all the same rate, so this expansion would just double the number of instructions and cycles.
  llvm-svn: 225828
* Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook. (Olivier Sallenave, 2015-01-13, 1 file, -63/+70)
  llvm-svn: 225795
* Combine fcmp + select to fminnum / fmaxnum if no nans and legal (Matt Arsenault, 2015-01-13, 1 file, -0/+59)
  Also require unsafe FP math for now, since there isn't a way to test for signed zeros.
  llvm-svn: 225744
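  In scalar terms, the pattern and its replacement (my own illustration of why the NaN and signed-zero caveats matter, not code from the patch):
    #include <cmath>
    // select(fcmp olt x, y), x, y as plain C++:
    double selectMin(double x, double y) { return x < y ? x : y; }
    // The combined form. The two agree whenever neither input is NaN and the
    // inputs aren't a mixed +0.0/-0.0 pair, hence the no-NaNs and
    // unsafe-FP-math requirements above.
    double combinedMin(double x, double y) { return std::fmin(x, y); }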
* [DAGCombine] Remainder of fix to r225380 (More FMA folding opportunities) (Hal Finkel, 2015-01-09, 1 file, -10/+24)
  As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA, we need to extend the incoming operands so that the resulting node will really be legal. This is currently enabled only for PowerPC, and it happens to work there regardless, but this should fix the functionality for everyone else should anyone else wish to use it.
  llvm-svn: 225492
* Partial fix to r225380 (More FMA folding opportunities) (Hal Finkel, 2015-01-09, 1 file, -96/+95)
  As pointed out by Aditya (and Owen), there are two things wrong with this code. First, it adds patterns which elide FP extends when forming FMAs, and that might not be profitable on all targets (it belongs behind the pre-existing aggressive-FMA-formation flag). This is fixed by this change. Second, the resulting nodes might have operands of different types (the extensions need to be re-added). That will be fixed in the follow-up commit.
  llvm-svn: 225485
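  Why eliding the extend changes results (my own illustration of the profitability caveat): the fused form keeps the exact product, while the unfused form rounds the multiply to float first.
    #include <cmath>
    #include <cstdio>
    int main() {
      // (1 + 2^-12)^2 = 1 + 2^-11 + 2^-24: the f32 multiply rounds the
      // 2^-24 away (tie to even); the fused double-precision form keeps it.
      float a = 1.0f + 0x1p-12f;
      double unfused = (double)(a * a) + 0.0;               // fpext(fmul f32)
      double fused   = std::fma((double)a, (double)a, 0.0); // fma, fpext'd operands
      std::printf("%a\n%a\n", unfused, fused);              // differ by 2^-24
      return 0;
    }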
* [SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). (Ahmed Bougacha, 2015-01-08, 1 file, -26/+29)
  The *LoadExt* legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal.
  However, this isn't always the case. For instance, on X86, with AVX, this is legal:
    v4i32 load, zext from v4i8
  but this isn't:
    v4i64 load, zext from v4i8
  whereas v4i64 is (arguably) legal, even without AVX2.
  Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go.
  Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.)
  No functional change intended.
  Differential Revision: http://reviews.llvm.org/D6532
  llvm-svn: 225421
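  A sketch of the mechanical wrapping described above (the MVT range helper names are assumptions based on that era's API and may differ by revision):
    // Before: keyed only on the memory type.
    //   setLoadExtAction(ISD::EXTLOAD, MVT::f16, Expand);
    // After: also keyed on the result type; loop to preserve old behavior.
    for (MVT VT : MVT::fp_valuetypes())
      setLoadExtAction(ISD::EXTLOAD, VT, MVT::f16, Expand);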
* More FMA folding opportunities. (Olivier Sallenave, 2015-01-07, 1 file, -1/+133)
  llvm-svn: 225380
* Test commit (Olivier Sallenave, 2015-01-07, 1 file, -0/+1)
  llvm-svn: 225368
* Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert. (Craig Topper, 2015-01-05, 1 file, -1/+1)
  llvm-svn: 225160
* Revert "merge consecutive stores of extracted vector elements"Alexey Samsonov2014-12-311-75/+4
| | | | | | | This reverts commit r224611. This change causes crashes in X86 DAG->DAG Instruction Selection. llvm-svn: 225031
* Always assert in DAGCombine and not only when -debug is enabled (Mehdi Amini, 2014-12-23, 1 file, -5/+6)
  Right now DAG Combine checks the validity of the returned type only when -debug is given on the command line. However, the test cases in the validation usually do not use -debug. An assert build should always check this.
  llvm-svn: 224779
* [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements (Michael Kuperstein, 2014-12-23, 1 file, -12/+22)
  This partially fixes PR21943.
  For AVX, we go from:
    vmovq (%rsi), %xmm0
    vmovq (%rdi), %xmm1
    vpermilps $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
    vinsertps $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
    vinsertps $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
    vpermilps $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
    vinsertps $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]
  To the expected:
    vmovq (%rdi), %xmm0
    vmovhpd (%rsi), %xmm0, %xmm0
    retq
  Fixing this for AVX2 is still open.
  Differential Revision: http://reviews.llvm.org/D6749
  llvm-svn: 224759
* merge consecutive stores of extracted vector elements (Sanjay Patel, 2014-12-19, 1 file, -4/+75)
  Add a path to DAGCombiner::MergeConsecutiveStores() to combine multiple scalar stores when the store operands are extracted vector elements. This is a partial fix for PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).
  For the new test case, codegen improves from:
    vmovss %xmm0, (%rdi)
    vextractps $1, %xmm0, 4(%rdi)
    vextractps $2, %xmm0, 8(%rdi)
    vextractps $3, %xmm0, 12(%rdi)
    vextractf128 $1, %ymm0, %xmm0
    vmovss %xmm0, 16(%rdi)
    vextractps $1, %xmm0, 20(%rdi)
    vextractps $2, %xmm0, 24(%rdi)
    vextractps $3, %xmm0, 28(%rdi)
    vzeroupper
    retq
  To:
    vmovups %ymm0, (%rdi)
    vzeroupper
    retq
  Patch reviewed by Nadav Rotem.
  Differential Revision: http://reviews.llvm.org/D6698
  llvm-svn: 224611
* [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle. (Michael Kuperstein, 2014-12-17, 1 file, -11/+22)
  This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors. This fixes PR15872 and provides a partial fix for PR21711.
  Differential Revision: http://reviews.llvm.org/D6678
  llvm-svn: 224429
* Add target hook for whether it is profitable to reduce load widths (Matt Arsenault, 2014-12-12, 1 file, -0/+3)
  Add an option to disable the optimization that shrinks truncated larger-type loads to smaller-type loads. On SI this prevents using scalar load instructions in some cases, since there are no scalar extloads.
  llvm-svn: 224084
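  The hook presumably looks something like this (a sketch from the description; the exact TargetLowering signature may differ by revision):
    // Default: shrinking the load is profitable; a target like SI can
    // override this to return false and keep the wide load.
    virtual bool shouldReduceLoadWidth(SDNode *Load, ISD::LoadExtType ExtTy,
                                       EVT NewVT) const {
      return true;
    }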
* Fix a few instances found in SelectionDAG where we were not handling F16 at parity with F32 and F64. (Owen Anderson, 2014-12-09, 1 file, -2/+0)
  llvm-svn: 223760
* [InstCombine] Minor optimization for bswap with binary ops (Simon Pilgrim, 2014-12-04, 1 file, -0/+2)
  Added instcombine optimizations for BSWAP with AND/OR/XOR ops:
    OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
    OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )
  Since it's just a one-liner, I've also added BSWAP to the DAGCombiner equivalent as well:
    fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))
  Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.
  Differential Revision: http://reviews.llvm.org/D6407
  llvm-svn: 223349
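  The folds rely on byte-swapping commuting with bitwise ops; a quick check (my own, using the GCC/Clang builtin):
    #include <cassert>
    #include <cstdint>
    int main() {
      uint32_t x = 0x12345678, y = 0xCAFEBABE;
      // bswap permutes bytes, and AND/OR/XOR act independently per byte,
      // so the swap can be hoisted past the op.
      assert((__builtin_bswap32(x) ^ __builtin_bswap32(y)) ==
             __builtin_bswap32(x ^ y));
      assert((__builtin_bswap32(x) & __builtin_bswap32(y)) ==
             __builtin_bswap32(x & y));
      return 0;
    }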
* Masked Load / Store Intrinsics - the CodeGen part. (Elena Demikhovsky, 2014-12-04, 1 file, -0/+160)
  I'm recommitting the codegen part of the patch. The vectorizer part will be sent for review again.
  Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for the X86 code generator.
  Examples:
    declare <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
    declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
  A scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
  http://reviews.llvm.org/D6191
  llvm-svn: 223348
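  Element-wise, the masked load behaves as below (an illustrative scalar model, not the lowering itself); the key property is that masked-off lanes never touch memory:
    // Scalar model of llvm.masked.load: each enabled lane reads memory,
    // each disabled lane takes the pass-through value instead.
    void maskedLoad(const int *Addr, const bool *Mask, const int *PassThru,
                    int *Result, int N) {
      for (int I = 0; I < N; ++I)
        Result[I] = Mask[I] ? Addr[I] : PassThru[I]; // Addr[I] untouched if !Mask[I]
    }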
* Revert "Masked Vector Load and Store Intrinsics."Duncan P. N. Exon Smith2014-11-281-161/+0
| | | | | | | | | | | This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936