bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86][SSE] Pulled out repeated target shuffle decodes into helper functions. NFCI.	Simon Pilgrim	2016-02-07	1	-136/+89
	Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions. The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively. llvm-svn: 260032
*	AVX512: VPBROADCASTB/W/D/Q from GPR intrinsics implementation.	Igor Breger	2016-02-07	3	-70/+89
	Differential Revision: http://reviews.llvm.org/D16813 llvm-svn: 260024
*	[X86][AVX512] Added support for VPMOVZX shuffle decoding.	Simon Pilgrim	2016-02-06	1	-75/+35
	llvm-svn: 260007
*	[X86][SSE] Moved shuffle decode CASE macros earlier. NFC.	Simon Pilgrim	2016-02-06	1	-48/+48
	To allow the helper functions to make use of them. llvm-svn: 259997
*	[X86][SSE] Refactored PMOVZX shuffle decoding to use scalar input types	Simon Pilgrim	2016-02-06	3	-75/+47
	First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement. This should also make it easier to decode X86ISD::VZEXT target shuffles in the future. llvm-svn: 259995
*	[X86][SSE] Don't replace an existing 32-bit load with its duplicate	Simon Pilgrim	2016-02-06	1	-1/+2
	If we are already loading a single 32-bit float/integer then just reuse it. Fix for regression in D16729 llvm-svn: 259991
*	Comment fix	Simon Pilgrim	2016-02-06	1	-1/+1
	llvm-svn: 259990
*	[AArch64] Add the scheduling model for Exynos-M1	Evandro Menezes	2016-02-06	2	-2/+361
	Summary: Add the core scheduling model for the Samsung Exynos-M1 (ARMv8-A). Reviewers: jmolloy, rengolin, christof, MinSeongKIM, t.p.northover Subscribers: aemerson, rengolin, MatzeB Differential Revision: http://reviews.llvm.org/D16644 llvm-svn: 259958
*	[AArch64] Refactoring aarch64-ldst-opt. NCF.	Jun Bum Lim	2016-02-05	1	-25/+38
	Remove narrow load / store instructions from getMatchingPairOpcode(), and add getMatchingWideOpcode(). llvm-svn: 259914
*	AMDGPU: Account for LDS alignment	Matt Arsenault	2016-02-05	1	-4/+9
	The current situation isn't great, because the amount of padding required is determined by the inverse order of the first encountered use. We should eventually somehow sort these to minimize wasted space. Another problem is that the alignment of kernel arguments isn't respected. The group_segment_alignment is always emitted as the default 16, and typed arguments with higher alignments or an explicitly set alignment are also ignored. llvm-svn: 259912
*	AMDGPU: Preserve alignments on new created globals	Matt Arsenault	2016-02-05	1	-2/+10
	Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911
*	AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo	Tom Stellard	2016-02-05	5	-96/+28
	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16862 llvm-svn: 259900
*	AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructors	Tom Stellard	2016-02-05	2	-10/+10
	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16863 llvm-svn: 259897
*	AMDGPU/SI: Correctly initialize SIInsertWaits pass	Tom Stellard	2016-02-05	3	-7/+22
	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16724 llvm-svn: 259894
*	[WebAssembly] Update the select instructions' operand orders to match the spec.	Dan Gohman	2016-02-05	2	-16/+16
	llvm-svn: 259893
*	Fix for PR 26193	Nemanja Ivanovic	2016-02-05	1	-1/+1
	This is a simple fix for a PowerPC intrinsic that was incorrectly defined (the return type was incorrect). llvm-svn: 259886
*	Move classes defined in a cpp file into an anonymous namespace.	Benjamin Kramer	2016-02-05	1	-0/+2
	No functionality change intended. llvm-svn: 259883
*	Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)."	Renato Golin	2016-02-05	1	-76/+21
	This reverts commit r259812 as it broke AArch64 self-hosting. llvm-svn: 259881
*	Fix for PR 26356	Nemanja Ivanovic	2016-02-04	1	-5/+4
	Using the load immediate only when the immediate (whether signed or unsigned) can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and 0 to 65535 for unsigned. This patch also ensures that we sign-extend under the right conditions. llvm-svn: 259840
*	[AArch64] Bound the number of instructions we scan when searching for updates.	Chad Rosier	2016-02-04	1	-14/+26
	This only impacts the creation of pre-/post-index instructions. The bound was set high enough such that it did not change code generation for SPEC200X. llvm-svn: 259828
*	[X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads	Simon Pilgrim	2016-02-04	3	-33/+41
	Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type. This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain..... llvm-svn: 259816
*	[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).	Chad Rosier	2016-02-04	1	-21/+76
	This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769 and r259790. The tramp3d failure was caused by an incorrect refactoring in the patch. Specifically, we weren't always properly clearing the SExtIdx flag. llvm-svn: 259812
*	[AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'	Silviu Baranga	2016-02-04	1	-0/+40
	During instruction selection, the AArch64 backend can recognise the following pattern and generate an [U|S]MADDL instruction, i.e. a multiply of two 32-bit operands with a 64-bit result:
	  (mul (sext i32), (sext i32))
	However, when one of the operands is constant, the sign extension gets folded into the constant in SelectionDAG::getNode(). This means that the instruction selection sees this:
	  (mul (sext i32), i64)
	...which doesn't match the pattern. Sign-extension and 64-bit multiply instructions are generated, which are slower than one 32-bit multiply. Add a pattern to match this and generate the correct instruction, for both signed and unsigned multiplies.
	Patch by Chris Diamand!
	llvm-svn: 259800
*	[X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads	Simon Pilgrim	2016-02-04	2	-17/+44
	This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load. Differential Revision: http://reviews.llvm.org/D16729 llvm-svn: 259796
*	Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."	Chad Rosier	2016-02-04	1	-77/+22
	This reverts commit r259790. tramp3d-v4 is still having problems. llvm-svn: 259795
*	AVX-512: Fixed a bug in FMA instruction selection on KNL	Elena Demikhovsky	2016-02-04	1	-1/+1
	The FMA instruction was selected from AVX2 set instead of AVX-512 Differential Revision: http://reviews.llvm.org/D16884 llvm-svn: 259792
*	[AArch64] Improve load/store optimizer to handle LDUR + LDR.	Chad Rosier	2016-02-04	1	-22/+77
	This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769, which was reverted in r246782 due to a test-suite failure. I'm unable to reproduce the issue at this time. llvm-svn: 259790
*	[AVX512] add vfmadd132ss and vfmadd132sd Intrinsic	Michael Zuckerman	2016-02-04	3	-11/+42
	Differential Revision: http://reviews.llvm.org/D16589 llvm-svn: 259789
*	[X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC.	Simon Pilgrim	2016-02-04	1	-60/+84
	llvm-svn: 259771
*	[X86] Use hash table in LEA optimization pass.	Andrey Turetskiy	2016-02-04	1	-150/+247
	Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time. Differential Revision: http://reviews.llvm.org/D16404 llvm-svn: 259770
*	[NVPTX] Disable performance optimizations when OptLevel==None	Jingyue Wu	2016-02-04	1	-21/+36
	Reviewers: jholewinski, tra, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16874 llvm-svn: 259749
*	clean up; NFC	Sanjay Patel	2016-02-03	1	-15/+13
	llvm-svn: 259720
*	ARM: support TLS for WoA	Saleem Abdulrasool	2016-02-03	5	-0/+62
	Add support for TLS access for Windows on ARM. This generates a similar access to MSVC for ARM. The changes to the tablegen data are needed to support loading an external symbol global that is not for a call. The adjustments to the DAG to DAG transforms are needed to preserve the 32-bit move. llvm-svn: 259676
*	[ARM] Move GNUEABI divmod to __aeabi_divmod*	Renato Golin	2016-02-03	1	-2/+4
	The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores which happens to be a lot faster than __divsi3/__modsi3 when the core has hardware divide instructions. Do the same here. Fixes PR26450. llvm-svn: 259657
*	[mips] Remove redundant inclusions of MipsAnalyzeImmediate.h	Daniel Sanders	2016-02-03	9	-8/+1
	llvm-svn: 259655
*	Fix for PR 26381	Nemanja Ivanovic	2016-02-03	1	-1/+1
	Simple fix - Constant values were not being sign extended in FastIsel. llvm-svn: 259645
*	[mips] Add SHF_MIPS_GPREL flag to the MIPS .sbss and .sdata sections	Simon Atanasyan	2016-02-03	1	-2/+4
	MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL flag. See Figure 4–7 on page 69 in the following document: ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf. Differential Revision: http://reviews.llvm.org/D15740 llvm-svn: 259641
*	[X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads	Simon Pilgrim	2016-02-03	4	-124/+121
	Follow up to D16217 and D16729. This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD. Differential Revision: http://reviews.llvm.org/D16768 llvm-svn: 259635
*	Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates.	Kyle Butt	2016-02-03	1	-19/+32
	The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms on PPC.
	  %vreg6<def> = COPY %vreg96
	  %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7 ;v6 = v6 + v5 * v7
	is replaced by
	  %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96 ;v5 = v5 * v7 + v96
	This was broken in the case where the target register was also used as a multiplicand. Fix this case by checking for it and replacing both uses with the copied register.
	  %vreg6<def> = COPY %vreg96
	  %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6 ;v6 = v6 + v5 * v6
	is replaced by
	  %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96 ;v5 = v5 * v96 + v96
	llvm-svn: 259617
*	Revert r259576: Disable the vzeroupper insertion pass on PS4.	Yunzhong Gao	2016-02-03	1	-3/+0
	Will re-implement based on review feedback. llvm-svn: 259615
*	Disable the vzeroupper insertion pass on PS4.	Yunzhong Gao	2016-02-02	1	-0/+3
	See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation. Original patch by: Sean Silva llvm-svn: 259576
*	AMDGPU: Do not promote allocas with non-inbounds GEPs	Matt Arsenault	2016-02-02	1	-0/+7
	If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573
*	AMDGPU: Handle promoting memmove	Matt Arsenault	2016-02-02	1	-0/+24
	Also add missing tests for the others. llvm-svn: 259558
*	[X86] Fix the merging of SP updates in prologue/epilogue insertions.	Quentin Colombet	2016-02-02	1	-2/+7
	When the merging was involving LEAs, we were taking the wrong immediate from the list of operands. rdar://problem/24446069 llvm-svn: 259553
*	AMDGPU: Skip promote alloca with no optimizations	Matt Arsenault	2016-02-02	2	-2/+2
	llvm-svn: 259551
*	AMDGPU: Minor cleanups for AMDGPUPromoteAlloca	Matt Arsenault	2016-02-02	1	-27/+21
	Mostly convert to use range loops. llvm-svn: 259550
*	AMDGPU: Report AMDGPUPromoteAlloca changed the function	Matt Arsenault	2016-02-02	1	-22/+21
	llvm-svn: 259547
*	AMDGPU: Whitelist handled intrinsics	Matt Arsenault	2016-02-02	1	-8/+36
	We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop. llvm-svn: 259546
*	AMDGPU: Use inbounds when calculating workitem offset	Matt Arsenault	2016-02-02	1	-6/+7
	When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow. Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure. llvm-svn: 259545
*	Fix Clang-tidy readability-redundant-control-flow warnings; other minor fixes.	Eugene Zelenko	2016-02-02	2	-5/+1
	Differential revision: http://reviews.llvm.org/D16793 llvm-svn: 259539
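
The immediate ranges quoted in the "Fix for PR 26356" entry (r259840) can be illustrated with a small standalone sketch. This is not the code from that commit: `fitsLoadImmediate` and `signExtend16` are hypothetical helpers that only restate the ranges given in the message (-32768 to 32767 for signed, 0 to 65535 for unsigned) and show what sign-extending a 16-bit field means.

```cpp
#include <cstdint>

// Hypothetical helper (not from r259840): reports whether `value` lies in
// the range the commit message gives for using a load immediate.
bool fitsLoadImmediate(std::int64_t value, bool isSigned) {
  if (isSigned)
    return value >= -32768 && value <= 32767;  // signed 16-bit range
  return value >= 0 && value <= 65535;         // unsigned 16-bit range
}

// Hypothetical helper: interprets the low 16 bits of `bits` as a signed
// 16-bit field and widens the result to 64 bits.
std::int64_t signExtend16(std::uint64_t bits) {
  std::int64_t low = static_cast<std::int64_t>(bits & 0xFFFF);
  return low >= 0x8000 ? low - 0x10000 : low;
}
```

For example, `signExtend16(0xFFFF)` yields -1 while `signExtend16(0x7FFF)` yields 32767, which is why an unsigned value such as 65535 needs care when it is placed in a signed 16-bit field.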
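
The "[AArch64] Multiply extended 32-bit ints with [U|S]MADDL" entry (r259800) rests on one observation: a 64-bit constant that is itself the extension of a 32-bit value can still serve as the second operand of a 32x32 -> 64-bit multiply. The sketch below models that arithmetic in plain C++. The function names are assumptions made for illustration and are not LLVM APIs; the actual change is a selection pattern in the AArch64 backend.

```cpp
#include <cstdint>

// Models SMADDL: signed 32x32 -> 64-bit multiply plus a 64-bit addend.
std::int64_t smaddl(std::int32_t a, std::int32_t b, std::int64_t acc) {
  return static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b) + acc;
}

// Models UMADDL: unsigned 32x32 -> 64-bit multiply plus a 64-bit addend.
std::uint64_t umaddl(std::uint32_t a, std::uint32_t b, std::uint64_t acc) {
  return static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b) + acc;
}

// True when the 64-bit constant is the sign extension of a 32-bit value,
// i.e. when (mul (sext i32), c) could use the signed widening multiply.
bool isSExt32(std::int64_t c) {
  return c >= INT32_MIN && c <= INT32_MAX;
}

// True when the 64-bit constant is the zero extension of a 32-bit value.
bool isZExt32(std::uint64_t c) {
  return c <= UINT32_MAX;
}
```

When `isSExt32(c)` holds, `smaddl(x, static_cast<std::int32_t>(c), 0)` equals the full 64-bit product of the sign-extended `x` and `c`, so no separate extension and 64-bit multiply is needed.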
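
The rewrite described in the "Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates" entry (r259617) can be checked arithmetically. The sketch below models the two FMA forms exactly as the commit message annotates them (A-form: `dst = dst + a * b`; M-form: `dst = dst * a + b`). The function names are assumptions for illustration, and the operand order follows the message's comments rather than the ISA manual.

```cpp
#include <cmath>

// A-form, as annotated for XSMADDASP in the message: dst = dst + a * b.
float fmaAForm(float dst, float a, float b) {
  return std::fma(a, b, dst);
}

// M-form, as annotated for XSMADDMSP in the message: dst = dst * a + b.
float fmaMForm(float dst, float a, float b) {
  return std::fma(dst, a, b);
}
```

With `v96` copied into `v6`, `fmaAForm(v96, v5, v7)` and `fmaMForm(v5, v7, v96)` compute the same value, which is what lets the pass drop the copy. In the duplicate case, `fmaAForm(v96, v5, v96)` matches `fmaMForm(v5, v96, v96)` only if both uses of the copied register are replaced, which is the situation the fix addresses.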