bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[DAGCombine] Remainder of fix to r225380 (More FMA folding opportunities)	Hal Finkel	2015-01-09	1	-10/+24
\| \| \| \| \| \| \| \| \| \|	As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA, we need to extend the incoming operands so that the resulting node will really be legal. This is currently enabled only for PowerPC, and it happens to work there regardless, but this should fix the functionality for everyone else should anyone else wish to use it. llvm-svn: 225492
*	Partial fix to r225380 (More FMA folding opportunities)	Hal Finkel	2015-01-09	1	-96/+95
\| \| \| \| \| \| \| \| \| \| \| \|	As pointed out by Aditya (and Owen), there are two things wrong with this code. First, it adds patterns which elide FP extends when forming FMAs, and that might not be profitable on all targets (it belongs behind the pre-existing aggressive-FMA-formation flag). This is fixed by this change. Second, the resulting nodes might have operands of different types (the extensions need to be re-added). That will be fixed in the follow-up commit. llvm-svn: 225485
*	Masked Load/Store - fixed a bug in type legalization.	Elena Demikhovsky	2015-01-08	3	-3/+107
\| \| \| \|	llvm-svn: 225441
*	[SelectionDAG] Allow targets to specify legality of extloads' result	Ahmed Bougacha	2015-01-08	3	-31/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421
*	More FMA folding opportunities.	Olivier Sallenave	2015-01-07	1	-1/+133
\| \| \| \|	llvm-svn: 225380
*	Test commit	Olivier Sallenave	2015-01-07	1	-0/+1
\| \| \| \|	llvm-svn: 225368
*	Introduce an example statepoint GC strategy	Philip Reames	2015-01-07	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generate correct stackmaps that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.) Most of the validation logic added here as proof of concept will soon move in to the Verifier. That move is dependent on http://reviews.llvm.org/D6811 There was discussion in the review thread about addrspace(1) being reserved for something. I'm going to follow up on a seperate llvmdev thread. If needed, I'll update all the code at once. Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others. Differential Revision: http://reviews.llvm.org/D6808 llvm-svn: 225365
*	SelectionDAGBuilder: move constant initialization out of loop	Mehdi Amini	2015-01-06	1	-15/+19
\| \| \| \| \| \| \| \| \| \|	No semantic change intended. Reviewers: resistor Differential Revision: http://reviews.llvm.org/D6834 llvm-svn: 225278
*	Replace several 'assert(false' with 'llvm_unreachable' or fold a condition ↵	Craig Topper	2015-01-05	1	-1/+1
\| \| \| \| \| \|	into the assert. llvm-svn: 225160
*	Revert "merge consecutive stores of extracted vector elements"	Alexey Samsonov	2014-12-31	1	-75/+4
\| \| \| \| \| \| \|	This reverts commit r224611. This change causes crashes in X86 DAG->DAG Instruction Selection. llvm-svn: 225031
*	Masked Load/Store - Changed the order of parameters in intrinsics.	Elena Demikhovsky	2014-12-25	1	-5/+7
\| \| \| \| \| \| \|	No functional changes. The documentation is coming. llvm-svn: 224829
*	Always assert in DAGCombine and not only when -debug is enabled	Mehdi Amini	2014-12-23	1	-5/+6
\| \| \| \| \| \| \| \| \|	Right now in DAG Combine check the validity of the returned type only when -debug is given on the command line. However usually the test cases in the validation does not use -debug. An Assert build should always check this. llvm-svn: 224779
*	[DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of ↵	Michael Kuperstein	2014-12-23	1	-12/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	elements This partially fixes PR21943. For AVX, we go from: vmovq (%rsi), %xmm0 vmovq (%rdi), %xmm1 vpermilps $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3] vinsertps $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3] vinsertps $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3] vpermilps $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3] vinsertps $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0] To the expected: vmovq (%rdi), %xmm0 vmovhpd (%rsi), %xmm0, %xmm0 retq Fixing this for AVX2 is still open. Differential Revision: http://reviews.llvm.org/D6749 llvm-svn: 224759
*	Enable (sext x) == C --> x == (trunc C) combine	Matt Arsenault	2014-12-21	1	-9/+26
\| \| \| \| \| \| \| \| \|	Extend the existing code which handles this for zext. This makes this more useful for targets with ZeroOrNegativeOne BooleanContent and obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne) since the constant will now be shrunk to i1. llvm-svn: 224691
*	merge consecutive stores of extracted vector elements	Sanjay Patel	2014-12-19	1	-4/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a path to DAGCombiner::MergeConsecutiveStores() to combine multiple scalar stores when the store operands are extracted vector elements. This is a partial fix for PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). For the new test case, codegen improves from: vmovss %xmm0, (%rdi) vextractps $1, %xmm0, 4(%rdi) vextractps $2, %xmm0, 8(%rdi) vextractps $3, %xmm0, 12(%rdi) vextractf128 $1, %ymm0, %xmm0 vmovss %xmm0, 16(%rdi) vextractps $1, %xmm0, 20(%rdi) vextractps $2, %xmm0, 24(%rdi) vextractps $3, %xmm0, 28(%rdi) vzeroupper retq To: vmovups %ymm0, (%rdi) vzeroupper retq Patch reviewed by Nadav Rotem. Differential Revision: http://reviews.llvm.org/D6698 llvm-svn: 224611
*	[DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle.	Michael Kuperstein	2014-12-17	1	-11/+22
\| \| \| \| \| \| \| \| \| \|	This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors. This fixes PR15872 and provides a partial fix for PR21711. Differential Revision: http://reviews.llvm.org/D6678 llvm-svn: 224429
*	SelectionDAG switch lowering: use 'unsigned' to count destination popularity	Hans Wennborg	2014-12-16	1	-2/+2
\| \| \| \| \| \| \| \| \|	SwitchInst::getNumCases() returns unsinged, so using uint64_t to count cases seems unnecessary. Also fix a missing CHECK in the test case. llvm-svn: 224393
*	merge consecutive loads that are offset from a base address	Sanjay Patel	2014-12-16	1	-5/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads when the first load was offset from a base address. This patch recognizes that pattern and subtracts the offset before comparing the second load to see if it is consecutive. The codegen change in the new test case improves from: vmovsd 32(%rdi), %xmm0 vmovsd 48(%rdi), %xmm1 vmovhpd 56(%rdi), %xmm1, %xmm1 vmovhpd 40(%rdi), %xmm0, %xmm0 vinsertf128 $1, %xmm1, %ymm0, %ymm0 To: vmovups 32(%rdi), %ymm0 An existing test case is also improved from: vmovsd (%rdi), %xmm0 vmovsd 16(%rdi), %xmm1 vmovsd 24(%rdi), %xmm2 vunpcklpd %xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0] vmovhpd 8(%rdi), %xmm1, %xmm3 To: vmovsd (%rdi), %xmm0 vmovsd 16(%rdi), %xmm1 vmovhpd 24(%rdi), %xmm0, %xmm0 vmovhpd 8(%rdi), %xmm1, %xmm1 This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ). Differential Revision: http://reviews.llvm.org/D6642 llvm-svn: 224379
*	Silence more static analyzer warnings.	Michael Ilseman	2014-12-15	1	-1/+2
\| \| \| \| \| \| \| \|	Add in definedness checks for shift operators, null checks when pointers are assumed by the code to be non-null, and explicit unreachables. llvm-svn: 224255
*	Add target hook for whether it is profitable to reduce load widths	Matt Arsenault	2014-12-12	1	-0/+3
\| \| \| \| \| \| \| \|	Add an option to disable optimization to shrink truncated larger type loads to smaller type loads. On SI this prevents using scalar load instructions in some cases, since there are no scalar extloads. llvm-svn: 224084
*	IR: Split Metadata from Value	Duncan P. N. Exon Smith	2014-12-09	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata ` (instead of `Value `). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802
*	Fix a few instances found in SelectionDAG where we were not handling F16 at ↵	Owen Anderson	2014-12-09	2	-3/+5
\| \| \| \| \| \|	parity with F32 and F64. llvm-svn: 223760
*	InstrProf: An intrinsic and lowering for instrumentation based profiling	Justin Bogner	2014-12-08	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce the ``llvm.instrprof_increment`` intrinsic and the ``-instrprof`` pass. These provide the infrastructure for writing counters for profiling, as in clang's ``-fprofile-instr-generate``. The implementation of the instrprof pass is ported directly out of the CodeGenPGO classes in clang, and with the followup in clang that rips that code out to use these new intrinsics this ends up being NFC. Doing the instrumentation this way opens some doors in terms of improving the counter performance. For example, this will make it simple to experiment with alternate lowering strategies, and allows us to try handling profiling specially in some optimizations if we want to. Finally, this drastically simplifies the frontend and puts all of the lowering logic in one place. llvm-svn: 223672
*	SelectionDAG switch lowering: Replace unreachable default with most popular ↵	Hans Wennborg	2014-12-06	1	-17/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	case. This can significantly reduce the size of the switch, allowing for more efficient lowering. I also worked with the idea of exploiting unreachable defaults by omitting the range check for jump tables, but always ended up with a non-neglible binary size increase. It might be worth looking into some more. SimplifyCFG currently does this transformation, but I'm working towards changing that so we can optimize harder based on unreachable defaults. Differential Revision: http://reviews.llvm.org/D6510 llvm-svn: 223566
*	[InstCombine] Minor optimization for bswap with binary ops	Simon Pilgrim	2014-12-04	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added instcombine optimizations for BSWAP with AND/OR/XOR ops: OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) ) OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) ) Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well: fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y)) Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone. Differential Revision: http://reviews.llvm.org/D6407 llvm-svn: 223349
*	Masked Load / Store Intrinsics - the CodeGen part.	Elena Demikhovsky	2014-12-04	8	-0/+429
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm recommiting the codegen part of the patch. The vectorizer part will be send to review again. Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 223348
*	[PowerPC] Implement readcyclecounter for PPC32	Hal Finkel	2014-12-02	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've long supported readcyclecounter on PPC64, but it is easier there (the read of the 64-bit time-base register can be accomplished via a single instruction). This now provides an implementation for PPC32 as well. On PPC32, the time-base register is still 64 bits, but can only be read 32 bits at a time via two separate SPRs. The ISA manual explains how to do this properly (it involves re-reading the upper bits and looping if the counter has wrapped while being read). This requires PPC to implement a custom integer splitting legalization for the READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then gets turned into a pseudo-instruction, which is then expanded to the necessary sequence (which has three SPR reads, the comparison and the branch). Thanks to Paul Hargrove for pointing out to me that this was still unimplemented. llvm-svn: 223161
*	Restructure some assertion checking based on post commit feedback by Aaron ↵	Philip Reames	2014-12-02	1	-7/+7
\| \| \| \| \| \|	and Tom. llvm-svn: 223150
*	Appease a build bot complaining about an unused variable that's used in an ↵	Philip Reames	2014-12-02	1	-0/+1
\| \| \| \| \| \|	assertion. llvm-svn: 223142
*	[Statepoints 3/4] Statepoint infrastructure for garbage collection: ↵	Philip Reames	2014-12-02	6	-0/+814
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SelectionDAGBuilder This is the third patch in a small series. It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085). The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them. With this change, gc.statepoints should be functionally complete. The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now. I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated. The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it. During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics. Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints. Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack. The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases. In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator. In principal, we shouldn't need to eagerly spill at all. The register allocator should do any spilling required and the statepoint should simply record that fact. Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure. Reviewed by: atrick, ributzka llvm-svn: 223137
*	Revert r223049, r223050 and r223051 while investigating test failures.	Hans Wennborg	2014-12-01	1	-42/+17
\| \| \| \| \| \|	I didn't foresee affecting the Clang test suite :/ llvm-svn: 223054
*	SelectionDAG switch lowering: Replace unreachable default with most popular ↵	Hans Wennborg	2014-12-01	1	-17/+42
\| \| \| \| \| \| \| \| \| \| \| \| \|	case. This can significantly reduce the size of the switch, allowing for more efficient lowering. I also worked with the idea of exploiting unreachable defaults by omitting the range check for jump tables, but always ended up with a non-neglible binary size increase. It might be worth looking into some more. llvm-svn: 223049
*	[stack protector] Set edge weights for newly created basic blocks.	Akira Hatanaka	2014-12-01	2	-4/+7
\| \| \| \| \| \| \| \| \|	This commit fixes a bug in stack protector pass where edge weights were not set when new basic blocks were added to lists of successor basic blocks. Differential Revision: http://reviews.llvm.org/D5766 llvm-svn: 222987
*	Switch lowering: reformat some for loops etc. NFC	Hans Wennborg	2014-11-29	1	-7/+5
\| \| \| \|	llvm-svn: 222962
*	Switch lowering: Fix broken 'Figure out which block is next' code	Hans Wennborg	2014-11-29	1	-0/+3
\| \| \| \| \| \| \|	This doesn't seem to have worked in a long time, but other optimizations would clean it up. llvm-svn: 222961
*	Revert "Masked Vector Load and Store Intrinsics."	Duncan P. N. Exon Smith	2014-11-28	8	-430/+0
\| \| \| \| \| \| \| \| \| \| \|	This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936
*	Converted back to Unix format (after my last commit 222632)	Elena Demikhovsky	2014-11-23	1	-3241/+3241
\| \| \| \|	llvm-svn: 222636
*	Masked Vector Load and Store Intrinsics.	Elena Demikhovsky	2014-11-23	8	-3127/+3557
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632
*	Don't repeat class/function/variable names in comments. NFC.	Sanjay Patel	2014-11-21	1	-47/+35
\| \| \| \|	llvm-svn: 222555
*	Less space; NFC	Sanjay Patel	2014-11-21	1	-8/+4
\| \| \| \|	llvm-svn: 222546
*	[DAG] Teach how to turn a build_vector into a shuffle if some of the ↵	Andrea Di Biagio	2014-11-21	1	-11/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	operands are zero. Before this patch, the DAGCombiner only tried to convert build_vector dag nodes into shuffles if all operands were either extract_vector_elt or undef. This patch improves that logic and teaches the DAGCombiner how to deal with build_vector dag nodes where one or more operands are zero. A build_vector dag node with some zero operands is turned into a shuffle only if the resulting shuffle mask is legal for the target. llvm-svn: 222536
*	[DAG] Refactor the shuffle combining logic in DAGCombiner. NFC.	Andrea Di Biagio	2014-11-21	1	-153/+73
\| \| \| \| \| \| \| \|	This patch simplifies the logic that combines a pair of shuffle nodes into a single shuffle if there is a legal mask. Also added comments to better describe the algorithm. No functional change intended. llvm-svn: 222522
*	DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same ↵	Hao Liu	2014-11-21	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \|	divisor info FMULs by the reciprocal. E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip) A hook is added to allow the target to control whether it needs to do such combine. Reviewed in http://reviews.llvm.org/D6334 llvm-svn: 222510
*	[X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2	Simon Pilgrim	2014-11-19	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain. I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr. Differential Revision: http://reviews.llvm.org/D5699 llvm-svn: 222340
*	Update SetVector to rely on the underlying set's insert to return a ↵	David Blaikie	2014-11-19	10	-19/+22
\| \| \| \| \| \| \| \| \| \| \| \| \|	pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334
*	Fix an incorrect chain operand when expanding INSERT_VECTOR operations ↵	Owen Anderson	2014-11-18	1	-1/+1
\| \| \| \| \| \| \| \|	through the stack. Patch by Daniil Troshkov! llvm-svn: 222254
*	Fix optimisations of SELECT_CC which assumed result is boolean	Oliver Stannard	2014-11-17	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Some optimisations in DAGCombiner cause miscompilations for targets that use TargetLowering::UndefinedBooleanContent, because they assume that the results of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and XORed. These optimisations are only valid for targets that use ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent. This is a follow-up to D6210/r221693. llvm-svn: 222123
*	Move register class name strings to a single array in MCRegisterInfo to ↵	Craig Topper	2014-11-17	1	-2/+2
\| \| \| \| \| \| \| \|	reduce static table size and number of relocation entries. Indices into the table are stored in each MCRegisterClass instead of a pointer. A new method, getRegClassName, is added to MCRegisterInfo and TargetRegisterInfo to lookup the string in the table. llvm-svn: 222118
*	Replace a couple asserts with static_asserts.	Craig Topper	2014-11-17	1	-2/+2
\| \| \| \|	llvm-svn: 222114
*	Convert some EVTs to MVTs where only a SimpleValueType is needed.	Craig Topper	2014-11-16	3	-12/+12
\| \| \| \|	llvm-svn: 222109