bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Fix spelling mistakes in Hexagon target comments. NFC.	Simon Pilgrim	2016-11-17	9	-12/+12
\| \| \| \| \| \|	Identified by Pedro Giffuni in PR27636. llvm-svn: 287248
*	Fix spelling mistakes in X86 target comments. NFC.	Simon Pilgrim	2016-11-17	3	-5/+5
\| \| \| \| \| \|	Identified by Pedro Giffuni in PR27636. llvm-svn: 287247
*	Revert "AMDGPU: Enable ConstrainCopy DAG mutation"	Konstantin Zhuravlyov	2016-11-17	1	-3/+0
\| \| \| \| \| \| \| \|	This reverts commit r287146. This breaks a few conformance tests. llvm-svn: 287233
*	Wdocumentation fix	Simon Pilgrim	2016-11-17	1	-5/+5
\| \| \| \|	llvm-svn: 287224
*	[X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves	Simon Pilgrim	2016-11-17	1	-19/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves. If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer). This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases). Partial fix for PR30845 Differential Revision: https://reviews.llvm.org/D26590 llvm-svn: 287223
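As a scalar illustration of the decomposition described above (a sketch, not the X86 lowering code): a 64-bit product modulo 2^64 can be built from three 32x32->64 multiplies, which is the operation one lane of vpmuludq performs, and any term whose input half is zero can be dropped. The helper names `mulLo32` and `mul64ViaHalves` are hypothetical, and the runtime zero checks only stand in for what the compiler would establish at compile time through known-bits analysis.

```cpp
#include <cassert>
#include <cstdint>

// Models one 64-bit lane of vpmuludq: multiply the low 32 bits of each
// operand and keep the full 64-bit product.
static uint64_t mulLo32(uint64_t a, uint64_t b) {
  return (a & 0xFFFFFFFFull) * (b & 0xFFFFFFFFull);
}

// Hypothetical helper: 64-bit multiply (mod 2^64) built from 32-bit halves.
//   a * b = aLo*bLo + ((aHi*bLo + aLo*bHi) << 32)   (mod 2^64)
// The aHi*bHi term is shifted out entirely and never needs computing.
uint64_t mul64ViaHalves(uint64_t a, uint64_t b) {
  const uint64_t aHi = a >> 32;
  const uint64_t bHi = b >> 32;

  uint64_t result = mulLo32(a, b);      // aLo * bLo
  if (aHi != 0)                         // skipped when the upper half of a is zero
    result += mulLo32(aHi, b) << 32;    // aHi * bLo
  if (bHi != 0)                         // skipped when the upper half of b is zero
    result += mulLo32(a, bHi) << 32;    // aLo * bHi
  return result;
}
```

When both operands are zero-extended 32-bit values, only the first multiply remains, which is the saving the commit is after.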
*	[ARM] Relax restriction on variadic functions for tailcall optimization	Pablo Barrio	2016-11-17	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Variadic functions can be treated in the same way as normal functions with respect to the number and types of parameters. Reviewers: grosbach, olista01, t.p.northover, rengolin Subscribers: javed.absar, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D26748 llvm-svn: 287219
*	[X86] RegCall - Handling v64i1 in 32/64 bit target	Oren Ben Simhon	2016-11-17	5	-91/+357
\| \| \| \| \| \| \| \| \| \|	Register Calling Convention defines a new behavior for v64i1 types. This type should be saved in a GPR. However, on a 32-bit machine we need to split the value into 2 GPRs (because each is 32 bits wide). Differential Revision: https://reviews.llvm.org/D26181 llvm-svn: 287217
*	[X86] Fix formatting. NFC	Craig Topper	2016-11-17	1	-2/+2
\| \| \| \|	llvm-svn: 287211
*	[XRay] Support AArch64 in LLVM	Dean Michael Berris	2016-11-17	2	-1/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds XRay support in LLVM for AArch64 targets. This patch is one of a series: Clang: https://reviews.llvm.org/D26415 compiler-rt: https://reviews.llvm.org/D26413 Author: rSerge Reviewers: rengolin, dberris Subscribers: amehsan, aemerson, llvm-commits, iid_iunknown Differential Revision: https://reviews.llvm.org/D26412 llvm-svn: 287209
*	[CMake] NFC. Updating CMake dependency specifications	Chris Bieneman	2016-11-17	3	-6/+9
\| \| \| \| \| \|	This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system. llvm-svn: 287206
*	[AMDGPU] Custom lower f16 = fp_round f64	Konstantin Zhuravlyov	2016-11-17	2	-0/+23
\| \| \| \|	llvm-svn: 287203
*	[AMDGPU] Promote f16/i16 conversions to f32/i32	Konstantin Zhuravlyov	2016-11-17	2	-58/+8
\| \| \| \|	llvm-svn: 287201
*	[AMDGPU] Expand `br_cc` for f16	Konstantin Zhuravlyov	2016-11-17	1	-0/+1
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D26732 llvm-svn: 287199
*	[AVR] Wrap all methods in the pseudo expansion pass in an anon namespace	Dylan McKay	2016-11-16	1	-2/+2
\| \| \| \| \| \| \|	The '-fpermissive' compiler flag complains if the template specializations used in the class are used in a different namespace. llvm-svn: 287176
*	[AVR] Remove unused method from AVRTargetMachine	Dylan McKay	2016-11-16	1	-3/+0
\| \| \| \|	llvm-svn: 287173
*	[x86] allow FP-logic ops when one operand is FP and result is FP	Sanjay Patel	2016-11-16	1	-14/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We save an inter-register file move this way. If there's any CPU where the FP logic is slower, we could transform this back to int-logic in MachineCombiner. This helps, but doesn't solve, PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137 The 'andn' test shows that we're missing a pattern match to recognize the xor with -1 constant as a 'not' op. llvm-svn: 287171
*	[AVR] Add the pseudo instruction expansion pass	Dylan McKay	2016-11-16	3	-1/+1433
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: A lot of the pseudo instructions are required because LLVM assumes that all integers of the same size as the pointer size are legal. This means that it will not currently expand 16-bit instructions to their 8-bit variants because it thinks 16-bit types are legal for the operations. This also adds all of the CodeGen tests that required the pass to run. Reviewers: arsenm, kparzysz Subscribers: wdng, mgorny, modocache, llvm-commits Differential Revision: https://reviews.llvm.org/D26577 llvm-svn: 287162
*	X86: Simplify X86ISD::Wrapper operand checks. NFCI.	Peter Collingbourne	2016-11-16	2	-18/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	We only ever create TargetConstantPool, TargetJumpTable, TargetExternalSymbol, TargetGlobalAddress, TargetGlobalTLSAddress, MCSymbol and TargetBlockAddress nodes as operands of X86ISD::Wrapper nodes, so we can remove one check and invert the other. Also update the documentation comment for X86ISD::Wrapper. Differential Revision: https://reviews.llvm.org/D26731 llvm-svn: 287160
*	ARM: fix CodeGen for 64-bit shifts.	Tim Northover	2016-11-16	1	-17/+31
\| \| \| \| \| \| \| \| \|	One half of the shifts obviously needed conditional selection based on whether the shift amount is more than 32-bits, but leaving the other half as the natural shift isn't acceptable either: it's undefined behaviour to shift a 32-bit value by more than 31. llvm-svn: 287149
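The point about both halves needing care can be seen in a plain C++ model of a 64-bit left shift built from 32-bit halves. This is a sketch under stated assumptions, not the ARM lowering itself; `shl64FromHalves` is a hypothetical name. Every 32-bit shift below is kept in the range 0..31, because shifting a 32-bit value by 32 or more is undefined in C++ just as it is in the IR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes ((hi:lo) << amt) for amt in [0, 63] using
// only 32-bit shifts whose shift amount is always in [0, 31].
uint64_t shl64FromHalves(uint32_t lo, uint32_t hi, unsigned amt) {
  assert(amt < 64 && "shift amount must be less than the 64-bit width");

  uint32_t outLo;
  uint32_t outHi;
  if (amt == 0) {
    // Handled separately: the general case below would compute lo >> 32.
    outLo = lo;
    outHi = hi;
  } else if (amt < 32) {
    outLo = lo << amt;
    outHi = (hi << amt) | (lo >> (32 - amt));
  } else {
    // All of the low word moves into the high word; the low word is zero.
    outLo = 0;
    outHi = lo << (amt - 32);
  }
  return (static_cast<uint64_t>(outHi) << 32) | outLo;
}
```

Both output halves depend on which side of 32 the shift amount falls, so neither can be left as an unconditional "natural" shift.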
*	AMDGPU: Enable ConstrainCopy DAG mutation	Matt Arsenault	2016-11-16	1	-0/+3
\| \| \| \| \| \| \|	This fixes a probably unintended divergence from the default scheduler behavior. llvm-svn: 287146
*	[AArch64] Handle vector types in replaceZeroVectorStore.	Geoff Berry	2016-11-16	1	-20/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Extend replaceZeroVectorStore to handle more vector type stores, floating point zero vectors and set alignment more accurately on split stores. This is a follow-up change to r286875. This change fixes PR31038. Reviewers: MatzeB Subscribers: mcrosier, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D26682 llvm-svn: 287142
*	AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass	Tom Stellard	2016-11-16	4	-26/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies of registers with immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 llvm-svn: 287131
*	[x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes	Sanjay Patel	2016-11-16	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of compilers, but logically equivalent int, float, and double variants of bitwise-logic instructions are reality in x86, and the float variant may be a shorter instruction depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all the time. This is a preliminary step towards solving PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137 Differential Revision: https://reviews.llvm.org/D26712 llvm-svn: 287122
*	[X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR	Simon Pilgrim	2016-11-16	2	-18/+15
\| \| \| \| \| \| \| \| \| \| \| \|	Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen. LLVM counterpart to D26686 Differential Revision: https://reviews.llvm.org/D26736 llvm-svn: 287108
*	[mips] Fix unsigned/signed type error	Simon Dardis	2016-11-16	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MipsFastISel uses a class to represent addresses with a signed member to represent the offset. MipsFastISel::emitStore, emitLoad and computeAddress all treated the offset as being positive. In cases where the offset was actually negative and a frame pointer was used, this would cause the constant synthesis routine to crash as it would generate an unexpected instruction sequence when frame indexes are replaced. Reviewers: vkalintiris Differential Revision: https://reviews.llvm.org/D26192 llvm-svn: 287099
*	[mips] not instruction alias	Simon Dardis	2016-11-16	2	-0/+5
\| \| \| \| \| \| \| \| \| \| \|	This patch adds the single operand form of the not alias to microMIPS and MIPS along with additional tests. This partially resolves PR/30381. Thanks to Sean Bruno for reporting the issue! llvm-svn: 287097
*	[X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss\|sd} intrinsics.	Ayman Musa	2016-11-16	1	-4/+0
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D26128 llvm-svn: 287087
*	[X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul	Craig Topper	2016-11-16	1	-25/+31
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clang's intrinsic file. Reviewers: zvi, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26660 llvm-svn: 287083
*	[AMDGPU] Refactor v_mac_{f16, f32} patterns into a class NFC	Konstantin Zhuravlyov	2016-11-16	1	-23/+18
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D26711 llvm-svn: 287077
*	AArch64: Use DeadRegisterDefinitionsPass before regalloc.	Matthias Braun	2016-11-16	2	-33/+26
\| \| \| \| \| \| \| \| \|	Doing this before register allocation reduces register pressure as we do not even have to allocate a register for those dead definitions. Differential Revision: https://reviews.llvm.org/D26111 llvm-svn: 287076
*	[AMDGPU] Handle f16 select{_cc}	Konstantin Zhuravlyov	2016-11-16	3	-15/+13
\| \| \| \| \| \| \| \| \| \|	- Select `select` to `v_cndmask_b32` - Expand `select_cc` - Refactor patterns Differential Revision: https://reviews.llvm.org/D26714 llvm-svn: 287074
*	Always use relative jump table encodings on PowerPC64.	Joerg Sonnenberger	2016-11-16	2	-0/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For the default, small and medium code model, use the existing difference from the jump table towards the label. For all other code models, set up the picbase and use the difference between the picbase and the block address. Overall, this results in smaller data tables at the expense of one or two more arithmetic operations at the jump site. Given that we only create jump tables with a lot more than two entries, it is a net win in size. For larger code models the assumption remains that individual functions are no larger than 2GB. Differential Revision: https://reviews.llvm.org/D26336 llvm-svn: 287059
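The size trade-off behind relative jump tables can be modelled in a few lines of C++. This is only an illustration with made-up addresses, not the PowerPC64 code generation: `encodeRelativeTable` and `resolveEntry` are hypothetical names, and `base` stands in for whichever anchor is used (the table address or the picbase). Each entry shrinks from a 64-bit absolute address to a signed 32-bit distance, which is why targets must stay within 2GB of the base, and resolving an entry costs one extra addition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Encode each absolute target address as a signed 32-bit distance from `base`.
std::vector<int32_t> encodeRelativeTable(uint64_t base,
                                         const std::vector<uint64_t> &targets) {
  std::vector<int32_t> table;
  table.reserve(targets.size());
  for (uint64_t target : targets) {
    const int64_t delta = static_cast<int64_t>(target - base);
    // The encoding only works while every target is within +/-2GB of base.
    assert(delta >= std::numeric_limits<int32_t>::min() &&
           delta <= std::numeric_limits<int32_t>::max());
    table.push_back(static_cast<int32_t>(delta));
  }
  return table;
}

// Recover the absolute address: one sign-extension plus one addition at the
// jump site, in exchange for a table that is half the size.
uint64_t resolveEntry(uint64_t base, const std::vector<int32_t> &table,
                      std::size_t index) {
  return base + static_cast<uint64_t>(static_cast<int64_t>(table[index]));
}
```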
*	AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument	Jan Vesely	2016-11-15	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	wbinvl.* are vector instructions that do not use vector registers. v2: check only M?BUF instructions Differential Revision: https://reviews.llvm.org/D26633 llvm-svn: 287056
*	[AArch64] Add support for Qualcomm's Falkor CPU.	Chad Rosier	2016-11-15	3	-0/+12
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D26673 llvm-svn: 287036
*	AMDGPU/SI: Fix pattern for i16 = sign_extend i1	Tom Stellard	2016-11-15	1	-1/+5
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26670 llvm-svn: 287035
*	GlobalISel: remove unused variable to silence warning.	Tim Northover	2016-11-15	2	-2/+1
\| \| \| \|	llvm-svn: 287027
*	AMDGPU: Enable store clustering	Matt Arsenault	2016-11-15	3	-1/+13
\| \| \| \| \| \| \|	Also respect the TII hook for these like the generic code does in case we want a flag later to disable this. llvm-svn: 287021
*	[AArch64] Lower multiplication by a constant int to shl+add+shl	Haicheng Wu	2016-11-15	1	-9/+39
\| \| \| \| \| \| \| \| \| \| \|	Lower a = b * C where C = (2^n + 1) * 2^m to `add w0, w0, w0, lsl n` followed by `lsl w0, w0, m`. Differential Revision: https://reviews.llvm.org/D229245 llvm-svn: 287019
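The identity behind this lowering is b * (2^n + 1) * 2^m = (b + (b << n)) << m. A minimal C++ sketch of that arithmetic follows, using 32-bit unsigned values to mirror the w registers; `mulByShiftAdd` is a hypothetical name and this is not the AArch64 backend code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes b * ((2^n + 1) * 2^m) modulo 2^32 with one
// shifted add followed by one shift.
uint32_t mulByShiftAdd(uint32_t b, unsigned n, unsigned m) {
  assert(n >= 1 && n < 32 && m < 32 && "shift amounts must fit a 32-bit value");
  const uint32_t sum = b + (b << n);  // add w0, w0, w0, lsl n
  return sum << m;                    // lsl w0, w0, m
}
```

For example, C = 36 = (2^3 + 1) * 2^2 corresponds to n = 3 and m = 2.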
*	AMDGPU: Analyze mubuf with immediate soffset	Matt Arsenault	2016-11-15	1	-1/+6
\| \| \| \| \| \| \|	Fixes giving up on clustering common addr64 accesses with constant 0 soffset. llvm-svn: 287018
*	AMDGPU: Fix return after else	Matt Arsenault	2016-11-15	1	-8/+14
\| \| \| \|	llvm-svn: 287015
*	AMDGPU: Replace assert(false) with unreachable	Matt Arsenault	2016-11-15	3	-11/+17
\| \| \| \|	llvm-svn: 287013
*	[AMDGPU] Add wave barrier builtin	Stanislav Mekhanoshin	2016-11-15	3	-0/+20
\| \| \| \| \| \| \| \| \| \| \|	The wave barrier represents the discardable barrier. Its main purpose is to carry the convergent attribute, thus preventing illegal CFG optimizations. All lanes in a wave reach the convergence point simultaneously with SIMT, thus no special instruction is needed in the ISA. The barrier is discarded during code generation. Differential Revision: https://reviews.llvm.org/D26585 llvm-svn: 287007
*	vector load store with length (left justified) llvm portion	Zaara Syeda	2016-11-15	1	-4/+16
\| \| \| \|	llvm-svn: 286993
*	fix formatting; NFC	Sanjay Patel	2016-11-15	1	-1/+1
\| \| \| \|	llvm-svn: 286989
*	[ARM] GlobalISel: Remove unused members. NFCI	Diana Picus	2016-11-15	3	-8/+4
\| \| \| \| \| \|	This silences some warnings that I didn't see with my host compiler. llvm-svn: 286981
*	[X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)	Simon Pilgrim	2016-11-15	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	This patch helps avoid poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so helps improve the truncation logic. This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases. Fix for PR13248 Differential Revision: https://reviews.llvm.org/D26583 llvm-svn: 286979
*	[ARM] Make sure GlobalISel is only initialized once. NFCI	Diana Picus	2016-11-15	1	-12/+12
\| \| \| \| \| \| \| \| \|	Move some code inside the proper 'if' block to make sure it is only run once, when the subtarget is first created. Things can still break if we use different ARM target machines or if we have functions with different 'target-cpu' or 'target-features'; we should fix that too in the future. llvm-svn: 286974
*	[PowerPC] Implement BE VSX load/store builtins - llvm portion.	Tony Jiang	2016-11-15	2	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \|	This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE, they behave exactly the same as vec_xl and vec_xst, so they are simply implemented by defining a matching macro. On LE, they are implemented by defining new builtins and intrinsics. For int/float/long long/double, it is just a load (lxvw4x/lxvd2x) or store (stxvw4x/stxvd2x). For char/char/short, we also need some extra shuffling before or after calling the builtins to get the desired BE order. For int128, simply call vec_xl or vec_xst. llvm-svn: 286967
*	[X86][FastISel] Assert that we are dealing with arithmetic with overflow intrinsics. NFC	Zvi Rackover	2016-11-15	1	-0/+3
\| \| \| \| \| \|	llvm-svn: 286961
*	[AMDGPU] TableGen: change individual instruction flags to bit type from bits<1>	Sam Kolton	2016-11-15	3	-47/+47
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is needed to be able to use these flags in InstrMappings. Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26666 llvm-svn: 286960