bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[WebAssembly] Temporarily disable this test, as it depends on additional patches that aren't yet checked in.	Dan Gohman	2015-09-08	1	-0/+3
\| \| \| \| \| \|	llvm-svn: 247011
*	AVX512: kunpck encoding implementation	Igor Breger	2015-09-08	1	-1/+26
\| \| \| \| \| \| \| \|	Added tests for encoding. Differential Revision: http://reviews.llvm.org/D12061 llvm-svn: 247010
*	[WebAssembly] Enable SSA lowering and other pre-regalloc passes	Dan Gohman	2015-09-08	1	-0/+22
\| \| \| \|	llvm-svn: 247008
*	[mips] Reserve address spaces 1-255 for software use.	Daniel Sanders	2015-09-08	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: And define them to have noop casts with address spaces 0-255. Reviewers: pekka.jaaskelainen Subscribers: pekka.jaaskelainen, llvm-commits Differential Revision: http://reviews.llvm.org/D12678 llvm-svn: 246990
*	AVX-512: Lowering for 512-bit vector shuffles.	Elena Demikhovsky	2015-09-08	3	-506/+367
\| \| \| \| \| \| \| \|	Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer. Differential Revision: http://reviews.llvm.org/D10683 llvm-svn: 246981
*	[X86][MMX] Added missing stack folding tests for MMX/SSSE3 instructions	Simon Pilgrim	2015-09-06	1	-2/+146
\| \| \| \|	llvm-svn: 246949
*	[X86][AVX512] Added 512-bit vector shift tests.	Simon Pilgrim	2015-09-06	3	-0/+894
\| \| \| \| \| \|	Only works for avx512f (dq) targets so far - need to add avx512bw tests once char/short shifts are supported. llvm-svn: 246943
*	[SelectionDAG] Swap commutative binops before constant-based folding	Hal Finkel	2015-09-06	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In searching for a fix for the underlying code-quality bug highlighted by r246937 (that SDAG simplification can lead to us generating an ISD::OR node with a constant zero LHS), I ran across this: We generically canonicalize commutative binary-operation nodes in SDAG getNode so that, if only one operand is a constant, it will be on the RHS. However, we were doing this only after a bunch of constant-based simplification checks that all assume this canonical form (that any constant will be on the RHS). Moving the operand-swapping canonicalization prior to these checks seems like the right thing to do (and, as it turns out, causes SDAG to completely fold away the computation in test/CodeGen/ARM/2012-11-14-subs_carry.ll, just like InstCombine would do). llvm-svn: 246938
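The ordering issue described in this commit can be illustrated with a toy model. This is a minimal sketch, not LLVM's `getNode`: the `Const`, `Reg`, `BinOp` classes and the `get_node` function are hypothetical names invented for illustration, and only a few folds are modeled. The point is that the operand swap runs before the folds that assume a constant sits on the RHS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class BinOp:
    opcode: str
    lhs: object
    rhs: object


# Hypothetical set of commutative opcodes for this toy model.
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}


def get_node(opcode, lhs, rhs):
    # Step 1: canonicalize so that a lone constant ends up on the RHS.
    if opcode in COMMUTATIVE and isinstance(lhs, Const) and not isinstance(rhs, Const):
        lhs, rhs = rhs, lhs

    # Step 2: constant-based folds, all of which assume the constant is on the RHS.
    if isinstance(rhs, Const):
        if opcode in ("add", "or", "xor") and rhs.value == 0:
            return lhs  # x op 0 == x
        if opcode == "and" and rhs.value == 0:
            return Const(0)  # x & 0 == 0
        if opcode == "mul" and rhs.value == 1:
            return lhs  # x * 1 == x

    return BinOp(opcode, lhs, rhs)
```

With the swap placed first, `get_node("or", Const(0), Reg("x"))` folds to `Reg("x")` instead of producing an `or` node with a constant-zero LHS. Non-commutative opcodes such as `sub` are left alone.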
*	[PowerPC] Don't commute trivial rlwimi instructions	Hal Finkel	2015-09-06	1	-0/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To commute a trivial rlwimi instruction (meaning one with a full mask and zero shift), we'd need the ability to form an all-zero mask (instead of an all-one mask) using rlwimi. We can't represent this, however, and we'll miscompile code if we try. The code quality problem that this highlights (that SDAG simplification can lead to us generating an ISD::OR node with a constant zero LHS) will be fixed as a follow-up. Fixes PR24719. llvm-svn: 246937
*	[X86] Updated vector lzcnt tests. Added missing vec512 tests.	Simon Pilgrim	2015-09-05	3	-118/+1526
\| \| \| \|	llvm-svn: 246927
*	[X86] Updated vector tzcnt tests. Added vec512 tests.	Simon Pilgrim	2015-09-05	3	-12/+975
\| \| \| \|	llvm-svn: 246922
*	[X86] Updated vector popcnt tests. Added vec512 tests.	Simon Pilgrim	2015-09-05	3	-44/+200
\| \| \| \|	llvm-svn: 246921
*	[PowerPC] Fix and(or(x, c1), c2) -> rlwimi generation	Hal Finkel	2015-09-05	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from an input pattern that looks like this: and(or(x, c1), c2) but the associated logic does not work if there are bits that are 1 in c1 but 0 in c2 (these are normally canonicalized away, but that can't happen if the 'or' has other users). Make sure we abort the transformation if such bits are discovered. Fixes PR24704. llvm-svn: 246900
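The condition "bits that are 1 in c1 but 0 in c2" can be written as a single mask test. The sketch below assumes 32-bit values; the function names are hypothetical and this is not the PPCISelDAGToDAG code. It also shows one way to see why such bits matter: the rewrite `(x | c1) & c2 == (x & c2) | c1` only holds when every set bit of c1 is also set in c2.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit operands


def has_bits_outside_mask(c1, c2):
    """True if some bit is 1 in c1 but 0 in c2."""
    return (c1 & ~c2 & MASK32) != 0


def original_pattern(x, c1, c2):
    # and(or(x, c1), c2)
    return ((x | c1) & c2) & MASK32


def naive_rewrite(x, c1, c2):
    # Mask x first, then insert c1. Only valid when c1 is a subset of c2.
    return ((x & c2) | c1) & MASK32
```

For `c1 = 0x0F, c2 = 0xFF` the two forms agree for any `x`. For `c1 = 0x10F, c2 = 0xFF` the guard fires, and the forms disagree because bit 8 of c1 survives in the naive rewrite.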
*	[X86][AVX] Test tidyup + regeneration. NFCI.	Simon Pilgrim	2015-09-04	1	-8/+14
\| \| \| \|	llvm-svn: 246863
*	Fix the testcase in r246790	Steven Wu	2015-09-04	1	-1/+1
\| \| \| \| \| \|	Using generic neon syntax to avoid test failure on apple platforms. llvm-svn: 246833
*	[PowerPC] Try harder to find a base+offset when looking for consecutive accesses	Hal Finkel	2015-09-03	1	-0/+217
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When forming permutation-based unaligned vector loads, we need to know whether it is valid to read ahead of the requested address by a full vector length. Doing so is more efficient (and allows for more CSE with later loads), but could trigger a page fault if invalid. To determine validity, we look for other loads in the same block that access the relevant address range. The relevant point here is that we need to do this as part of the process of forming permutation-based vector loads, and this happens quite early in the SDAG pipeline - specifically before many of the address calculations are fully canonicalized. As a result, we need to try harder to recognize base+offset address computations, because they still might appear as chain of adds (base+offset+offset, for example). To account for this, we'll look through chains of adds, accumulating the constant offsets. llvm-svn: 246813
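Looking through a chain of adds while accumulating constant offsets can be sketched as follows. This is an illustrative model under stated assumptions, not the PowerPC backend code: the `Add` node type and `find_base_and_offset` are hypothetical, and only the "constant on the right" shape is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Add:
    lhs: object  # another Add, or a base name (str)
    rhs: object  # an int constant, or something non-constant


def find_base_and_offset(addr):
    """Peel constant addends off an address expression.

    Returns (base, total_offset). Stops at the first node that is not
    an Add with an integer constant on the right-hand side.
    """
    offset = 0
    while isinstance(addr, Add) and isinstance(addr.rhs, int):
        offset += addr.rhs
        addr = addr.lhs
    return addr, offset
```

For an address that still appears as `base+offset+offset`, for example `Add(Add("base", 16), 16)`, the helper reports base `"base"` with offset 32, so it can be compared against other accesses off the same base.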
*	[PowerPC] Compute the MMO offset for an unaligned load with signed arithmetic	Hal Finkel	2015-09-03	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \|	If you compute the MMO offset using unsigned arithmetic, you end up with a large positive offset instead of a small negative one. In theory, this could cause bad instruction-scheduling decisions later. I noticed this by inspection from the debug output, and using that for the regression test is the best I can do right now. llvm-svn: 246805
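The difference between the two interpretations is easy to reproduce. The sketch below assumes a 64-bit offset; the helper names are hypothetical. It shows how a small negative offset turns into a very large positive one when the same bit pattern is read as unsigned.

```python
BITS = 64  # assumption: 64-bit offsets
MASK = (1 << BITS) - 1


def as_unsigned(value):
    """Interpret the low 64 bits of value as an unsigned integer."""
    return value & MASK


def as_signed(value):
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= MASK
    return value - (1 << BITS) if value >= (1 << (BITS - 1)) else value


# An offset of -15 bytes, computed with unsigned wraparound.
unsigned_offset = as_unsigned(0 - 15)
signed_offset = as_signed(unsigned_offset)
```

Here `unsigned_offset` is `2**64 - 15`, while `signed_offset` recovers `-15`.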
*	[AArch64] Improve ISel using across lane addition reduction.	Chad Rosier	2015-09-03	1	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In vectorized add reduction code, the final "reduce" step is sub-optimal. This change will combine: ext v1.16b, v0.16b, v0.16b, #8; add v0.4s, v1.4s, v0.4s; dup v1.4s, v0.s[1]; add v0.4s, v1.4s, v0.4s into: addv s0, v0.4s. PR21371 http://reviews.llvm.org/D12325 Patch by Jun Bum Lim <junbuml@codeaurora.org>! llvm-svn: 246790
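To see why the four-instruction sequence and the single `addv` agree in lane 0, the vector can be modeled as a list of four 32-bit lanes. This is a sketch for illustration only: the function names are hypothetical, and it assumes `ext ... #8` with both sources equal to `v0` behaves as a rotation by two 32-bit lanes.

```python
MASK32 = 0xFFFFFFFF


def add_4s(a, b):
    """Lane-wise 32-bit add with wraparound."""
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def shuffle_reduce(v0):
    """Model of the ext/add/dup/add sequence; returns lane 0 of the result."""
    v1 = v0[2:] + v0[:2]  # ext v1.16b, v0.16b, v0.16b, #8
    v0 = add_4s(v1, v0)   # add v0.4s, v1.4s, v0.4s
    v1 = [v0[1]] * 4      # dup v1.4s, v0.s[1]
    v0 = add_4s(v1, v0)   # add v0.4s, v1.4s, v0.4s
    return v0[0]


def addv(v0):
    """Model of addv s0, v0.4s: sum of all four lanes, modulo 2**32."""
    return sum(v0) & MASK32
```

Both `shuffle_reduce([1, 2, 3, 4])` and `addv([1, 2, 3, 4])` evaluate to 10.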
*	[ARM] Add a test case for revision 243956.	Quentin Colombet	2015-09-03	1	-0/+35
\| \| \| \|	llvm-svn: 246785
*	Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."	Chad Rosier	2015-09-03	1	-105/+0
\| \| \| \| \| \| \| \|	This reverts commit r246769. This appears to have broken Multisource/Benchmarks/tramp3d-v4. llvm-svn: 246782
*	[x86] enable machine combiner reassociations for scalar 'xor' insts	Sanjay Patel	2015-09-03	1	-0/+47
\| \| \| \|	llvm-svn: 246781
*	check for fastness before merging in DAGCombiner::MergeConsecutiveStores()	Sanjay Patel	2015-09-03	1	-0/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time we have a merged access candidate. Without this patch, we were generating unaligned 16-byte (SSE) memops for x86 targets where those accesses are slow. This change was mentioned in: http://reviews.llvm.org/D10662 and http://reviews.llvm.org/D10905 and will help solve PR21711. Differential Revision: http://reviews.llvm.org/D12573 llvm-svn: 246771
*	[AArch64] Improve load/store optimizer to handle LDUR + LDR.	Chad Rosier	2015-09-03	1	-0/+105
\| \| \| \| \| \| \| \| \| \| \|	This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. llvm-svn: 246769
*	[WinEH] Add llvm.eh.exceptionpointer intrinsic	Joseph Tremoulet	2015-09-03	2	-0/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This intrinsic can be used to extract a pointer to the exception caught by a given catchpad. Its argument has token type and must be a `catchpad`. Also clarify ExtendingLLVM documentation regarding overloaded intrinsics. Reviewers: majnemer, andrew.w.kaylor, sanjoy, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12533 llvm-svn: 246752
*	AVX512: Implemented encoding and intrinsics for vplzcntq, vplzcntd, vpconflictq, vpconflictd	Igor Breger	2015-09-03	5	-34/+271
\| \| \| \| \| \| \| \| \| \|	Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11931 llvm-svn: 246750
*	[X86] Require 32-byte alignment for 32-byte VMOVNTs.	Ahmed Bougacha	2015-09-02	3	-7/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We used to accept (and even test, and generate) 16-byte alignment for 32-byte nontemporal stores, but they require 32-byte alignment, per SDM. Found by inspection. Instead of hardcoding 16 in the patfrag, check for natural alignment. Also fix the autoupgrade and the various tests. Also, use explicit -mattr instead of -mcpu: I stared at the output several minutes wondering why I get 2x movntps for the unaligned case (which is the ideal output, but needs some work: see FIXME), until I remembered corei7-avx implies +slow-unaligned-mem-32. llvm-svn: 246733
*	[X86] Cleanup nontemporal tests a little. NFC.	Ahmed Bougacha	2015-09-02	2	-11/+11
\| \| \| \| \|	Also: add a missing test for movntiq. llvm-svn: 246730
*	[PowerPC] Cleanup cost model for unaligned vector loads/stores	Hal Finkel	2015-09-02	1	-0/+580
\| \| \| \| \| \| \| \| \| \|	I'm adding a regression test to better cover code generation for unaligned vector loads and stores, but there's no functional change to the code generation here. There is an improvement to the cost model for unaligned vector loads and stores, mostly for QPX (for which we were not previously accounting for the permutation-based loads), and the cost model implementation is cleaner. llvm-svn: 246712
*	use "unpredictable" metadata in fast-isel when splitting compares	Sanjay Patel	2015-09-02	1	-0/+52
\| \| \| \| \| \| \| \|	This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch. Differential Revision: http://reviews.llvm.org/D12342 llvm-svn: 246692
*	use "unpredictable" metadata in SelectionDAG when splitting compares	Sanjay Patel	2015-09-02	1	-2/+28
\| \| \| \| \| \| \| \|	This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch. Differential Revision: http://reviews.llvm.org/D12343 llvm-svn: 246691
*	[PowerPC] Don't always consider P8Altivec-only masks in LowerVECTOR_SHUFFLE	Hal Finkel	2015-09-02	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \|	LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the TableGen-generated matching code, and it does this by testing the same predicates used by the TableGen files. Unfortunately, when we added new P8Altivec-only predicates, we started universally testing them in LowerVECTOR_SHUFFLE, and if they matched when targeting a system prior to a P8, we'd end up with a selection failure. llvm-svn: 246675
*	[x86] fix allowsMisalignedMemoryAccesses() for 8-byte and smaller accesses	Sanjay Patel	2015-09-02	2	-29/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a continuation of the fix from: http://reviews.llvm.org/D10662 and discussion in: http://reviews.llvm.org/D12154 Here, we distinguish slow unaligned SSE (128-bit) accesses from slow unaligned scalar (64-bit and under) accesses. Other lowering (eg, getOptimalMemOpType) assumes that unaligned scalar accesses are always ok, so this changes allowsMisalignedMemoryAccesses() to match that behavior. Differential Revision: http://reviews.llvm.org/D12543 llvm-svn: 246658
*	[X86][AVX512VLBW] add support in byte shift and SAD	Asaf Badouh	2015-09-02	3	-1/+45
\| \| \| \| \| \| \| \| \|	Add byte shift left/right. Add SAD (compute sum of absolute differences). Differential Revision: http://reviews.llvm.org/D12479 llvm-svn: 246654
*	AVX512: Implemented encoding and intrinsics for VGETMANTPD/S, VGETMANTSD/S instructions	Igor Breger	2015-09-02	2	-0/+156
\| \| \| \| \| \| \| \| \| \|	Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11593 llvm-svn: 246642
*	AVX512: Implemented encoding and intrinsics for vshufps/d.	Igor Breger	2015-09-02	3	-0/+123
\| \| \| \| \| \| \| \|	Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11709 llvm-svn: 246640
*	AVX-512: store <4 x i1> and <2 x i1> values in memory	Elena Demikhovsky	2015-09-02	1	-0/+14
\| \| \| \| \| \| \| \|	Enabled DAG pattern lowering for SKX with DQI predicate. Differential Revision: http://reviews.llvm.org/D12550 llvm-svn: 246625
*	Optimization for Gather/Scatter with uniform base	Elena Demikhovsky	2015-09-02	1	-0/+105
\| \| \| \| \| \| \| \| \|	A vector 'getelementptr' with a scalar base is an opportunity for the gather/scatter intrinsic to generate a better sequence. While looking for a uniform base, we want to use the scalar base pointer of the GEP, if it exists. Differential Revision: http://reviews.llvm.org/D11121 llvm-svn: 246622
*	[CodeGen] Fix FREM on 32-bit MSVC on x86	Vedant Kumar	2015-09-02	1	-0/+12
\| \| \| \| \| \| \| \|	Patch by Dylan McKay! Differential Revision: http://reviews.llvm.org/D12099 llvm-svn: 246615
*	[ARM] Don't abort on variable-idx extractelt in ReconstructShuffle.	Ahmed Bougacha	2015-09-01	1	-0/+16
\| \| \| \| \| \| \| \| \|	The code introduced in r244314 assumed that EXTRACT_VECTOR_ELT only takes constant indices, but it does accept variables. Bail out for those: we can't use them, as the shuffles we want to reconstruct do require constant masks. llvm-svn: 246594
*	[AArch64] Lower READCYCLECOUNTER using MRS PMCCNTR_EL0.	Ahmed Bougacha	2015-09-01	1	-0/+15
\| \| \| \| \| \| \| \| \| \|	This matches the ARM behavior. In both cases, the register is part of the optional Performance Monitors extension, so, add the feature, and enable it for the A-class processors we support. Differential Revision: http://reviews.llvm.org/D12425 llvm-svn: 246555
*	AVX512: Implemented intrinsics for valign.	Igor Breger	2015-09-01	1	-0/+73
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D12526 llvm-svn: 246551
*	use CHECK-LABEL for more precision	Sanjay Patel	2015-09-01	1	-7/+7
\| \| \| \|	llvm-svn: 246547
*	[ARM][AArch64] Turn on by default interleaved access lowering	Silviu Baranga	2015-09-01	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Interleaved access lowering removes a memory operation and a sequence of vector shuffles and replaces it with a series of memory operations. This should always be beneficial. This pass is only enabled on ARM/AArch64. Reviewers: rengolin Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12145 llvm-svn: 246540
*	Distribute the weight on the edge from switch to default statement to edges generated in lowering switch.	Cong Hou	2015-09-01	10	-30/+221
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, when edge weights are assigned to edges that are created when lowering switch statement, the weight on the edge to default statement (let's call it "default weight" here) is not considered. We need to distribute this weight properly. However, without value profiling, we have no idea how to distribute it. In this patch, I applied the heuristic that this weight is evenly distributed to successors. For example, given a switch statement with cases 1,2,3,5,10,11,20, and every edge from switch to each successor has weight 10. If there is a binary search tree built to test if n < 10, then its two out-edges will have weight 4x10+10/2 = 45 and 3x10 + 10/2 = 35 respectively (currently they are 40 and 30 without considering the default weight). Each distribution (which is 5 here) will be stored in each SwitchWorkListItem for further distribution. There are some exceptions: For a jump table header which doesn't have any edge to default statement, we don't distribute the default weight to it. For a bit test header which covers a contiguous range and hence has no edges to default statement, we don't distribute the default weight to it. When the branch checks a single value or a contiguous range with no edge to default statement, we don't distribute the default weight to it. In other cases, the default weight is evenly distributed to successors. Differential Revision: http://reviews.llvm.org/D12418 llvm-svn: 246522
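The arithmetic in the example above can be reproduced with a few lines. This is a sketch of the even-split heuristic only, under stated assumptions: `split_weights` is a hypothetical helper, the default weight is halved with integer division, and none of the listed exceptions (jump table headers, bit test headers, single-value or contiguous-range checks) are modeled.

```python
def split_weights(cases, case_weight, default_weight, pivot):
    """Weights of the two out-edges of a binary-search node testing n < pivot.

    Each side gets the weight of the cases it covers plus an even share
    of the default weight.
    """
    left = [c for c in cases if c < pivot]
    right = [c for c in cases if c >= pivot]
    share = default_weight // 2  # even split between the two successors
    left_weight = len(left) * case_weight + share
    right_weight = len(right) * case_weight + share
    return left_weight, right_weight


# Cases 1,2,3,5,10,11,20 with weight 10 per edge and default weight 10.
weights = split_weights([1, 2, 3, 5, 10, 11, 20], 10, 10, pivot=10)
```

With these inputs `weights` is `(45, 35)`. Passing a default weight of 0 gives `(40, 30)`, the values produced when the default weight is ignored.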
*	remove unnecessary/conflicting target info	Sanjay Patel	2015-09-01	1	-3/+0
\| \| \| \|	llvm-svn: 246514
*	fixed test to specify triple rather than arch and CPU	Sanjay Patel	2015-09-01	1	-2/+2
\| \| \| \|	llvm-svn: 246513
*	[DAGCombine] Fixup SETCC legality checking	Hal Finkel	2015-08-31	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SETCC is one of those special node types for which operation actions (legality, etc.) is keyed off of an operand type, not the node's value type. This makes sense because the value type of a legal SETCC node is determined by its operands' value type (via the TLI function getSetCCResultType). When the SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value type, or directly with the value type provided by TLI.getSetCCResultType. The first problem being fixed here is that DAGCombine had several places querying TLI.isOperationLegal on SETCC, but providing the return of getSetCCResultType, instead of the operand type directly. This does not mean what the author thought, and "luckily", most in-tree targets have SETCC with Custom lowering, instead of marking them Legal, so these checks return false anyway. The second problem being fixed here is that two of the DAGCombines could create SETCC nodes with arbitrary (integer) value types; specifically, those that would simplify: (setcc a, b, op1) and\|or (setcc a, b, op2) -> setcc a, b, op3 (which is possible for some combinations of (op1, op2)) If the operands of the and\|or node are actual setcc nodes, then this is not an issue (because the and\|or must share the same type), but, the relevant code in DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls DAGCombiner::isSetCCEquivalent on each operand, and that function will recognise setcc-like select_cc nodes with other return types. And, thus, when creating new SETCC nodes, we need to be careful to respect the value-type constraint. This is even true before type legalization, because it is quite possible for the SELECT_CC node to have a legal type that does not happen to match the corresponding TLI.getSetCCResultType type. 
To be explicit, there is nothing that later fixes the value types of SETCC nodes (if the type is legal, but does not happen to match TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to work only because, either MVT::i1 is not legal, or it is what TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change, however. For the time being, restrict the relevant transformations to produce only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1 prior to type legalization). Fixes PR24636. llvm-svn: 246507
*	WebAssembly: generate load/store	JF Bastien	2015-08-31	5	-0/+296
\| \| \| \| \| \| \| \| \|	Summary: This handles all load/store operations that WebAssembly defines, and handles those necessary for C++ such as i1. I left a FIXME for outstanding features which aren't required for now. Reviewers: sunfish Subscribers: jfb, llvm-commits, dschuff llvm-svn: 246500
*	Fix CHECK directives that weren't checking.	Hans Wennborg	2015-08-31	6	-19/+17
\| \| \| \|	llvm-svn: 246485
*	[x86] enable machine combiner reassociations for scalar 'or' insts	Sanjay Patel	2015-08-31	1	-0/+47
\| \| \| \|	llvm-svn: 246481