bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AArch64] Add basic support for Qualcomm's Saphira CPU.	Chad Rosier	2017-09-25	4	-0/+4
	llvm-svn: 314105
*	[X86] Make IFMA instructions during isel so we can fold broadcast loads.	Craig Topper	2017-09-24	1	-6/+3
	This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us. llvm-svn: 314083
*	[X86] Add tests to show missed opportunities to fold broadcast loads into ↵	Craig Topper	2017-09-24	1	-0/+85
	IFMA instructions when the load is on operand1 of the intrinsic. We need to enable commuting during isel to catch this since the load folding tables can't handle broadcasts. llvm-svn: 314082
*	[X86] Add IFMA instructions to the load folding tables and make them ↵	Craig Topper	2017-09-24	1	-0/+70
	commutable for the multiply operands. llvm-svn: 314080
*	[X86][SSE] Add more tests for shuffle combining with extracted vector ↵	Simon Pilgrim	2017-09-24	1	-0/+56
	elements (PR22415) llvm-svn: 314077
*	[X86][SSE] Add support for extending bool vectors bitcasted from scalars	Simon Pilgrim	2017-09-24	3	-6582/+1099
	This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type. Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases. Differential Revision: https://reviews.llvm.org/D35320 llvm-svn: 314076
*	[PowerPC] Eliminate compares - add i64 sext/zext handling for SETLE/SETGE	Nemanja Ivanovic	2017-09-24	4	-0/+524
	As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential review. llvm-svn: 314073
*	[AVX-512] Add pattern for selecting masked version of v8i32/v8f32 compare ↵	Craig Topper	2017-09-24	2	-70/+37
	instructions when VLX isn't available. We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'. llvm-svn: 314072
*	[X86] Regenerate i64 to v2f32 bitcast test	Simon Pilgrim	2017-09-23	1	-3/+30
	llvm-svn: 314068
*	[x86] reduce 64-bit mask constant to 32-bits by right shifting	Sanjay Patel	2017-09-23	1	-3/+3
	This is a follow-up from D38181 (r314023). We have to put 64-bit constants into a register using a separate instruction, so we should try harder to avoid that. From what I see, we're not likely to encounter this pattern in the DAG because the upstream setcc combines from this don't (usually?) produce this pattern. If we fix that, then this will become more relevant. Since the cost of handling this case is just loosening the predicate of the existing fold, we might as well do it now. llvm-svn: 314064
*	[x86] add an add+shift test for follow-up suggestion from D38181; NFC	Sanjay Patel	2017-09-23	1	-0/+21
	llvm-svn: 314063
*	[PowerPC] Eliminate compares - add i32 sext/zext handling for SETULT/SETUGT	Nemanja Ivanovic	2017-09-23	14	-0/+1200
	As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential revision. llvm-svn: 314062
*	[PowerPC] Eliminate compares - add i32 sext/zext handling for SETULE/SETUGE	Nemanja Ivanovic	2017-09-23	17	-26/+1428
	As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential revision. llvm-svn: 314060
*	[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLT/SETGT	Nemanja Ivanovic	2017-09-23	8	-7/+610
	As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential revision. llvm-svn: 314055
*	[x86] remove over-specified platform from test config	Sanjay Patel	2017-09-22	1	-27/+92
	llvm-svn: 314027
*	[x86] swap order of srl (and X, C1), C2 when it saves size	Sanjay Patel	2017-09-22	8	-288/+293
	The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81. There are also better improvements based on known-bits that allow us to eliminate the mask entirely. As noted, this could be extended. There are potentially other wins from always shifting first, but doing that reveals a tangle of problems in other pattern matching. We do this transform generically in instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this in the backend. Differential Revision: https://reviews.llvm.org/D38181 llvm-svn: 314023
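	A minimal sketch of the idea behind the r314023 swap, not the LLVM implementation: for unsigned values, `(x & C1) >> C2` equals `(x >> C2) & (C1 >> C2)`, and the shifted mask may fit the sign-extended 8-bit immediate taken by the 0x83 'and' encoding where the original mask needed the 32-bit immediate of 0x81. The helper names, the 32-bit width, and the sample constants below are assumptions chosen for illustration.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit unsigned registers (assumption)


def and_then_srl(x, c1, c2):
    """Original order: mask first, then logical shift right."""
    return ((x & c1) & MASK32) >> c2


def srl_then_and(x, c1, c2):
    """Swapped order: shift first, then mask with the shifted constant."""
    return ((x & MASK32) >> c2) & (c1 >> c2)


def fits_simm8(imm, bits=32):
    """True if imm, read as a `bits`-wide value, is the sign extension of an 8-bit immediate."""
    imm &= (1 << bits) - 1
    signed = imm - (1 << bits) if imm >> (bits - 1) else imm
    return -128 <= signed <= 127


# Hypothetical constants: the original mask needs a wide immediate,
# while the shifted mask fits in a sign-extended byte.
C1, C2 = 0x3F80, 7
SAMPLES = [0, 1, 0x3F80, 0x12345678, 0xFFFFFFFF]

for x in SAMPLES:
    assert and_then_srl(x, C1, C2) == srl_then_and(x, C1, C2)

print(fits_simm8(C1), fits_simm8(C1 >> C2))
```

	With these constants the first check is false and the second is true, which is the case where shifting first allows the shorter encoding.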
*	[XRay] support conditional return on PPC.	Tim Shen	2017-09-22	2	-0/+111
	Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest.
	Reviewers: dberris, echristo
	Subscribers: sanjoy, nemanjai, hiraditya, kbarton, llvm-commits
	Differential Revision: https://reviews.llvm.org/D38102
	llvm-svn: 314005
*	Check vector elements for equivalence in the HexagonVectorLoopCarriedReuse pass	Pranav Bhandarkar	2017-09-22	1	-0/+86
	If the two instructions being compared for equivalence have corresponding operands that are integer constants, then check their values to determine equivalence. Patch by Suyog Sarda! llvm-svn: 313993
*	[x86] remove unnecessary OS specifier from test	Sanjay Patel	2017-09-22	1	-190/+178
	llvm-svn: 313986
*	[x86] auto-generate complete checks; NFC	Sanjay Patel	2017-09-22	1	-15/+59
	llvm-svn: 313985
*	[x86] update test to use FileCheck; NFC	Sanjay Patel	2017-09-22	1	-1/+17
	llvm-svn: 313984
*	[X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has ↵	Alexander Ivchenko	2017-09-22	3	-124/+130
	constant arguments
	Combine CMOV[i16]<-[SIGN,ZERO,ANY]_EXTEND to [i32,i64] into CMOV[i32,i64]. One example of where it is useful is:
	before (20 bytes)
	<foo>:
	  test $0x1,%dil
	  mov $0x307e,%ax
	  mov $0xffff,%cx
	  cmovne %ax,%cx
	  movzwl %cx,%eax
	  retq
	after (18 bytes)
	<foo>:
	  test $0x1,%dil
	  mov $0x307e,%ecx
	  mov $0xffff,%eax
	  cmovne %ecx,%eax
	  retq
	Reviewers: craig.topper, aaboud, spatel, RKSimon, zvi
	Reviewed By: spatel
	Subscribers: llvm-commits
	Differential Revision: https://reviews.llvm.org/D36711
	llvm-svn: 313982
*	Recommit r310809 with a fix for the spill problem	Nemanja Ivanovic	2017-09-22	25	-96/+971
	This patch re-commits the patch that was pulled out due to a problem it caused, but with a fix for the problem. The fix was reviewed separately by Eric Christopher and Hal Finkel. Differential Revision: https://reviews.llvm.org/D38054 llvm-svn: 313978
*	[ARM] Add missing selection patterns for vnmla	Simon Pilgrim	2017-09-22	1	-2/+67
	For the following function:
	double fn1(double d0, double d1, double d2) {
	  double a = -d0 - d1 * d2;
	  return a;
	}
	on ARM, LLVM generates code along the lines of
	  vneg.f64 d0, d0
	  vmls.f64 d0, d1, d2
	i.e., a negate and a multiply-subtract. The attached patch adds instruction selection patterns to allow it to generate the single instruction
	  vnmla.f64 d0, d1, d2
	(multiply-add with negation) instead, like GCC does.
	Committed on behalf of @gergo- (Gergö Barany)
	Differential Revision: https://reviews.llvm.org/D35911
	llvm-svn: 313972
*	[X86] Updating the test case for FMF propagation.	Jatin Bhateja	2017-09-22	1	-2/+14
	Differential Revision: https://reviews.llvm.org/D38163 llvm-svn: 313964
*	AArch64: support SwiftCC properly on AAPCS64	Saleem Abdulrasool	2017-09-22	1	-0/+18
	The previous SwiftCC support for AAPCS64 was only partially correct. It set up swiftself parameters in the proper register but failed to set up swifterror in the correct register. This would break compilation of Swift code for non-Darwin AAPCS64-conforming environments. llvm-svn: 313956
*	[Hexagon] - Fix testcase for the HexagonVectorLoopCarriedReuse pass.	Pranav Bhandarkar	2017-09-21	1	-0/+86
	llvm-svn: 313936
*	Revert "Add a testfile that I missed in a previous commit that added ↵	Rafael Espindola	2017-09-21	1	-86/+0
	HexagonVectorLoopCarriedReuse pass" This reverts commit r313926. It was failing in some bots. llvm-svn: 313931
*	Add a testfile that I missed in a previous commit that	Pranav Bhandarkar	2017-09-21	1	-0/+86
	added HexagonVectorLoopCarriedReuse pass llvm-svn: 313926
*	[AArch64] Fix bug in store of vector 0 DAGCombine.	Geoff Berry	2017-09-21	3	-6/+25
	Summary: Avoid using XZR/WZR directly as operands to split stores of zero vectors. Doing so can lead to the XZR/WZR being used by an instruction that doesn't allow it (e.g. add). Fixes bug 34674.
	Reviewers: t.p.northover, efriedma, MatzeB
	Subscribers: aemerson, rengolin, javed.absar, mcrosier, eraman, llvm-commits, kristof.beyls
	Differential Revision: https://reviews.llvm.org/D38146
	llvm-svn: 313916
*	[SelectionDAG] Pick correct frame index in LowerArguments	Bjorn Pettersson	2017-09-21	1	-0/+87
	Summary: SelectionDAGISel::LowerArguments is associating arguments with frame indices (FuncInfo->setArgumentFrameIndex). That information is later on used by EmitFuncArgumentDbgValue to create DBG_VALUE instructions that denote that a variable can be found on the stack.
	I discovered that for our (big endian) out-of-tree target the association created by SelectionDAGISel::LowerArguments sometimes is wrong. I've seen this happen when a 64-bit value is passed on the stack. The argument will occupy two stack slots (frame index X, and frame index X+1). The fault is that a call to setArgumentFrameIndex is associating the 64-bit argument with frame index X+1. The effect is that the debug information (DBG_VALUE) will point at the least significant part of the argument on the stack. When printing the argument in a debugger I will get the wrong value.
	I managed to create a test case for PowerPC that seems to show the same kind of problem.
	The bugfix will look at the datalayout, taking endianness into account when examining a BUILD_PAIR node, assuming that the least significant part is in the first operand of the BUILD_PAIR. For big endian targets we should use the frame index from the second operand, as the most significant part will be stored at the lower address (using the highest frame index).
	Reviewers: bogner, rnk, hfinkel, sdardis, aprantl
	Reviewed By: aprantl
	Subscribers: nemanjai, aprantl, llvm-commits, igorb
	Differential Revision: https://reviews.llvm.org/D37740
	llvm-svn: 313901
*	[NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} ↵	Artem Belevich	2017-09-21	2	-0/+97
	instructions/intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38148 llvm-svn: 313898
*	[x86] add more tests for node-level FMF; NFC	Sanjay Patel	2017-09-21	1	-0/+45
	llvm-svn: 313893
*	Fix buildbot failures, add mtriple to gpr-vsr-spill.ll	Zaara Syeda	2017-09-21	1	-1/+1
	llvm-svn: 313890
*	[Power9] Spill gprs to vector registers rather than stack	Zaara Syeda	2017-09-21	1	-0/+24
	This patch updates register allocation to enable spilling gprs to volatile vector registers rather than the stack. It can be enabled for Power9 with option -ppc-enable-gpr-to-vsr-spills. Differential Revision: https://reviews.llvm.org/D34815 llvm-svn: 313886
*	[X86][SSE] Add PSHUFLW/PSHUFHW tests inspired by PR34686	Simon Pilgrim	2017-09-21	2	-0/+96
	llvm-svn: 313883
*	[SystemZ] Improve optimizeCompareZero()	Jonas Paulsson	2017-09-21	1	-0/+44
	More conversions to load-and-test can be made with this patch by adding a forward search in optimizeCompareZero(). Review: Ulrich Weigand https://reviews.llvm.org/D38076 llvm-svn: 313877
*	[X86] Adding a testpoint for fast-math flags propagation.	Jatin Bhateja	2017-09-21	1	-0/+47
	Reviewers: jbhateja
	Reviewed By: jbhateja
	Subscribers: llvm-commits
	Differential Revision: https://reviews.llvm.org/D38127
	llvm-svn: 313869
*	AMDGPU: Add option to stress calls	Matt Arsenault	2017-09-21	1	-0/+36
	This inverts the behavior of the AlwaysInline pass to mark every function not already marked alwaysinline as noinline. llvm-svn: 313865
*	AMDGPU: Fix crash on immediate operand	Matt Arsenault	2017-09-21	1	-0/+58
	We can have a v_mac with an immediate src0. We can still fold if it's an inline immediate, otherwise it already uses the constant bus. llvm-svn: 313852
*	[NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.	Artem Belevich	2017-09-20	1	-0/+94
	Differential Revision: https://reviews.llvm.org/D38090 llvm-svn: 313820
*	AMDGPU: Start selecting v_mad_mixhi_f16	Matt Arsenault	2017-09-20	2	-41/+212
	llvm-svn: 313814
*	X86: treat SwiftCC as Win64_CC on Win64	Saleem Abdulrasool	2017-09-20	1	-0/+11
	The Swift CC is identical to Win64 CC with the exception of swift error being passed in r12 which is a CSR. However, since this calling convention is only used in swift -> swift code, it does not impact interoperability and can be treated entirely as Win64 CC. We would previously incorrectly lower the frame setup as we did not treat the frame as conforming to Win64 specifications. llvm-svn: 313813
*	AMDGPU: Start selecting v_mad_mixlo_f16	Matt Arsenault	2017-09-20	2	-0/+281
	Also add some tests that should be able to use v_mad_mixhi_f16, but do not yet. This is trickier because we don't really model the partial update of the register done by 16-bit instructions. llvm-svn: 313806
*	AMDGPU: Fix encoding of op_sel for mad_mix* opcodes	Matt Arsenault	2017-09-20	1	-24/+24
	llvm-svn: 313797
*	CodeGen: support SwiftError SwiftCC on Windows x64	Saleem Abdulrasool	2017-09-20	1	-0/+18
	Add support for passing SwiftError through a register on the Windows x64 calling convention. This allows the use of swifterror attributes on parameters which is used by the swift front end for the `Error` parameter. This partially enables building the swift standard library for Windows x86_64. llvm-svn: 313791
*	[X86][SSE] Add PR22415 test case	Simon Pilgrim	2017-09-20	1	-0/+22
	llvm-svn: 313755
*	Recommit [MachineCombiner] Update instruction depths incrementally for large ↵	Florian Hahn	2017-09-20	3	-1/+14
	BBs.
	This version of the patch fixes an off-by-one error causing PR34596. We do not need to use std::next(BlockIter) when calling updateDepths, as BlockIter already points to the next element.
	Original commit message:
	> For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and its relevant successors/predecessors.
	> In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance.
	>
	> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
	>
	> Reviewed By: fhahn
	>
	> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
	>
	> Differential Revision: https://reviews.llvm.org/D36619
	llvm-svn: 313751
*	[IfConversion] Add testcases [NFC]	Mikael Holmen	2017-09-20	6	-0/+211
	These tests should have been included in r310697 / D34099 but apparently I missed them. llvm-svn: 313737
*	AMDGPU: Match load d16 hi instructions	Matt Arsenault	2017-09-20	5	-16/+525
	Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves. llvm-svn: 313716