bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[globalisel][tablegen] Honour priority order within nested instructions.	Daniel Sanders	2018-01-17	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \|	It appears that we haven't been prioritizing rules that contain nested instructions properly. InstructionOperandMatcher didn't override isHigherPriorityThan so it never compared the instructions/operands/predicates inside nested instructions. Fixes PR35926. Thanks to Diana Picus for the bug report. llvm-svn: 322754
*	Revert [PowerPC] This reverts commit rL322721	Zaara Syeda	2018-01-17	2	-88/+0
\| \| \| \| \| \|	Failing build bots. Revert the commit now. llvm-svn: 322748
*	Use a GOT to access a hidden weak undefined on MachO.	Rafael Espindola	2018-01-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Trying to link __attribute__((weak, visibility("hidden"))) extern int foo; int *main(void) { return &foo; } on OS X fails with: ld: 32-bit RIP relative reference out of range (-4294971318 max is +/-2GB): from _main (0x100000FAB) to _foo@0x00001000 (0x00000000) in '_main' from test.o for architecture x86_64. The problem is that 0 cannot be computed as a fixed difference from %rip. Exactly the same issue exists on ELF and we can use the same solution. llvm-svn: 322739
*	[ARM] Optimize {s,u}mul.with.overflow.	Joel Galenson	2018-01-17	1	-0/+40
\| \| \| \| \| \| \| \|	This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG. Differential revision: https://reviews.llvm.org/D40922 llvm-svn: 322738
*	[ARM] Optimize {s,u}{add,sub}.with.overflow.	Joel Galenson	2018-01-17	2	-24/+54
\| \| \| \| \| \| \| \|	The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with an "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other). Differential revision: https://reviews.llvm.org/D38378 llvm-svn: 322737
*	[X86] Teach LowerBUILD_VECTOR to recognize pair-wise splats of 32-bit ↵	Craig Topper	2018-01-17	6	-93/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	elements and use a 64-bit broadcast If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done. We could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement. I've also restricted this to AVX2 only because we can only broadcast loads under AVX. Differential Revision: https://reviews.llvm.org/D42086 llvm-svn: 322730
*	[X86] When legalizing (v64i1 select i8, v64i1, v64i1) make sure not to ↵	Craig Topper	2018-01-17	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \|	introduce bitcasts to i64 in 32-bit mode We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32, which is fine. The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops, which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat, which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to clean up the bitcast. This fixes pr35972. llvm-svn: 322724
*	[X86][SSE] Add v4i16 PMULLD tests	Simon Pilgrim	2018-01-17	1	-0/+72
\| \| \| \|	llvm-svn: 322723
*	[PowerPC] Add handling for ColdCC calling convention and a pass to mark	Zaara Syeda	2018-01-17	2	-0/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	candidates with coldcc attribute. This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 322721
*	[SystemZ] Handle BRCTH branches correctly in SystemZLongBranch.cpp.	Jonas Paulsson	2018-01-17	1	-0/+11953
\| \| \| \| \| \| \| \|	BRCTH is capable of a long branch which needs to be recognized during branch relaxation. This is done by checking for ExtraRelaxSize == 0. Review: Ulrich Weigand llvm-svn: 322688
*	[ARM GlobalISel] Add instselect tests for G_FPEXT and G_FPTRUNC	Diana Picus	2018-01-17	1	-0/+55
\| \| \| \| \| \| \|	G_FPEXT and G_FPTRUNC are handled by TableGen'erated code, just add tests. llvm-svn: 322665
*	[AArch64] Fix incorrect LD1 of 16-bit FP vectors in big endian	Pablo Barrio	2018-01-17	1	-0/+207
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Loading a vector of 4 half-precision FP sometimes results in an LD1 of 2 single-precision FP + a reversal. This results in an incorrect byte swap due to the conversion from little endian to big endian. In order to generate the correct byte swap, it is easier to generate the correct LD1 of 4 half-precision FP, thus avoiding the subsequent reversal. Reviewers: craig.topper, jmolloy, olista01 Reviewed By: olista01 Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41863 llvm-svn: 322663
*	[ARM GlobalISel] Map G_FPEXT and G_FPTRUNC to FPR	Diana Picus	2018-01-17	1	-0/+45
\| \| \| \|	llvm-svn: 322657
*	[AMDGPU] add LDS f32 intrinsics	Daniil Fukalov	2018-01-17	1	-0/+69
\| \| \| \| \| \| \| \| \| \| \| \|	Added llvm.amdgcn.atomic.{add\|min\|max}.f32 intrinsics to allow generating the ds_{add\|min\|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 llvm-svn: 322656
*	[ARM GlobalISel] Legalize G_FPEXT and G_FPTRUNC	Diana Picus	2018-01-17	1	-0/+79
\| \| \| \| \| \| \| \| \| \| \|	Mark G_FPEXT and G_FPTRUNC as legal or libcall, depending on hardware support, but only for conversions between float and double. Also add the necessary boilerplate so that the LegalizerHelper can introduce the required libcalls. This also works only for float and double, but isn't too difficult to extend when the need arises. llvm-svn: 322651
*	[X86] Don't mutate shuffle arguments after early-out for AVX512	Benjamin Kramer	2018-01-17	1	-0/+40
\| \| \| \| \| \| \| \| \| \|	The match* functions have the annoying behavior of modifying their inputs. Save and restore the inputs, just in case the early out for AVX512 is hit. This is still not great and it's only a matter of time before this kind of bug happens again, but I couldn't come up with a better pattern without rewriting significant chunks of this code. Fixes PR35977. llvm-svn: 322644
*	[X86][AVX] Add extra 'interleaved+lanepermute' shuffle test	Simon Pilgrim	2018-01-17	1	-0/+51
\| \| \| \| \| \|	Possible missed opportunity to use 64-bit lane permute on AVX1 in lowerShuffleAsRepeatedMaskAndLanePermute llvm-svn: 322628
*	Allow usage of X86-prefixes as separate instrs.	Andrew V. Tischenko	2018-01-17	1	-1/+2
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D42102 llvm-svn: 322623
*	[MC] Fix -stack-size-section on ARM	Sean Eveson	2018-01-17	1	-0/+30
\| \| \| \| \| \| \| \|	Change symbol values in the stack_size section from being 8 bytes, to being a target dependent size. Differential Revision: https://reviews.llvm.org/D42108 llvm-svn: 322619
*	[X86][BTVER2] Fix scheduling of VCMPSD/VCMPSS instructions	Simon Pilgrim	2018-01-16	2	-4/+4
\| \| \| \| \| \|	For some reason they don't have a trailing i like the packed equivalents. llvm-svn: 322600
*	[GlobalISel][TableGen] Add support for SDNodeXForm	Volkan Keles	2018-01-16	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds CustomRenderer which renders the matched operands to the specified instruction. Targets can enable the matching of SDNodeXForm by adding a definition that inherits from GICustomOperandRenderer and GISDNodeXFormEquiv as follows. def gi_imm8 : GICustomOperandRenderer<"renderImm8">, GISDNodeXFormEquiv<imm8_xform>; Custom renderer functions should be of the form: void render(MachineInstrBuilder &MIB, const MachineInstr &I); Reviewers: dsanders, ab, rovka Reviewed By: dsanders Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet Differential Revision: https://reviews.llvm.org/D42012 llvm-svn: 322582
*	[X86][MMX] Accept UNDEF upper bits for MOVD GR32->MMX	Simon Pilgrim	2018-01-16	1	-78/+48
\| \| \| \|	llvm-svn: 322574
*	[X86][MMX] Improve MMX constant generation	Simon Pilgrim	2018-01-16	2	-23/+12
\| \| \| \| \| \|	Extend the MMX zero code to take any constant with zero'd upper 32-bits llvm-svn: 322553
*	[DebugInfo] Unify dumping of address ranges	Jonas Devlieghere	2018-01-16	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch unifies the printing of address ranges as [0x0, 0x1). rdar://34822059 Reviewers: aprantl, dblaikie Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D42056 llvm-svn: 322543
*	[BPF] Teach DAG2DAG AND elimination about load intrinsics	Yonghong Song	2018-01-16	1	-0/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As commented on the existing code: // The Reg operand should be a virtual register, which is defined // outside the current basic block. DAG combiner has done a pretty // good job in removing truncating inside a single basic block. However, when the Reg operand comes from bpf_load_[byte \| half \| word] intrinsics, the generic optimizer doesn't understand their results are zero extended, so these single basic block elimination opportunities were missed. Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 322534
*	[X86][MMX] Add support for MMX zero vector creation	Simon Pilgrim	2018-01-15	2	-34/+22
\| \| \| \| \| \| \| \| \| \|	As mentioned on PR35869 (and as came up recently on D41517), we don't create an MMX zero register via PXOR but instead perform a spill to stack from an XMM zero register. This patch adds support for direct MMX zero vector creation and should make it easier to add better constant vector creation in the future as well. Differential Revision: https://reviews.llvm.org/D41908 llvm-svn: 322525
*	[X86][SSE] Add custom execution domain fixing for ↵	Simon Pilgrim	2018-01-15	49	-1156/+821
\| \| \| \| \| \| \| \| \| \|	BLENDPD/BLENDPS/PBLENDD/PBLENDW (PR34873) Add support for custom execution domain fixing and implement support for BLENDPD/BLENDPS/PBLENDD/PBLENDW. Differential Revision: https://reviews.llvm.org/D42042 llvm-svn: 322524
*	[x86] add tests to show missed constant shrinking (PR35907); NFC	Sanjay Patel	2018-01-15	1	-4/+81
\| \| \| \|	llvm-svn: 322523
*	[x86] regenerate test checks; NFC	Sanjay Patel	2018-01-15	1	-7/+21
\| \| \| \|	llvm-svn: 322522
*	[x86] regenerate test checks; NFC	Sanjay Patel	2018-01-15	1	-127/+249
\| \| \| \|	llvm-svn: 322521
*	[x86] regenerate test checks; NFC	Sanjay Patel	2018-01-15	1	-4/+7
\| \| \| \|	llvm-svn: 322519
*	[AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32	Stanislav Mekhanoshin	2018-01-15	1	-3/+3
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D41617 llvm-svn: 322500
*	[Hexagon] Rewrite LowerVECTOR_SHUFFLE for 32-/64-bit vectors	Krzysztof Parzyszek	2018-01-15	2	-0/+152
\| \| \| \| \| \| \|	The old implementation was not always correct. The new one recognizes more shuffles that match specific instructions. llvm-svn: 322498
*	[SystemZ] Check for legality before doing LOAD AND TEST transformations.	Jonas Paulsson	2018-01-15	1	-0/+52
\| \| \| \| \| \| \| \| \| \|	Since a load and test instruction treats its operands as signed, it can only replace a logical compare for EQ/NE uses. Review: Ulrich Weigand https://bugs.llvm.org/show_bug.cgi?id=35662 llvm-svn: 322488
*	Update BTVER2 sched numbers for some AVX instructions (xmm version).	Andrew V. Tischenko	2018-01-15	3	-29/+29
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D40067 llvm-svn: 322485
*	Revert "[DAG] Elide overlapping stores"	Benjamin Kramer	2018-01-15	1	-2/+3
\| \| \| \| \| \| \|	This reverts commit r322085. Internal PPC testing is still showing the same symptoms as when this patch landed the last time. llvm-svn: 322474
*	[X86][SSE] Tag PR21137 test case	Simon Pilgrim	2018-01-14	1	-1/+2
\| \| \| \| \| \|	The test was added ages ago, but we didn't comment where it came from. llvm-svn: 322465
*	[X86] Add test cases for D41794.	Craig Topper	2018-01-14	2	-0/+408
\| \| \| \|	llvm-svn: 322464
*	[X86][SSE] Add PR22391 test case	Simon Pilgrim	2018-01-14	1	-0/+45
\| \| \| \|	llvm-svn: 322463
*	[X86] Autoupgrade kunpck intrinsics using vector operations instead of ↵	Craig Topper	2018-01-14	4	-72/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	scalar operations Summary: This patch changes the kunpck intrinsic autoupgrade to use vXi1 shufflevector operations to perform vector extracts and concats. This more closely matches the definition of the kunpck instructions. Currently we rely on a DAG combine to turn the scalar shift/and/or code into a concat vectors operation. By doing it in the IR we get this for free. Reviewers: spatel, RKSimon, zvi, jina.nahias Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42018 llvm-svn: 322462
*	[X86] Regenerate fp128 test	Simon Pilgrim	2018-01-14	1	-11/+18
\| \| \| \|	llvm-svn: 322460
*	[X86][SSE] Support combining MOVLHPS undef inputs	Simon Pilgrim	2018-01-14	1	-2/+1
\| \| \| \|	llvm-svn: 322459
*	[X86][SSE] Add v2f64 3u shuffle test	Simon Pilgrim	2018-01-14	1	-0/+14
\| \| \| \| \| \|	Shows a missed opportunity to remove an unnecessary move compared to the 31 shuffle mask. llvm-svn: 322458
*	[x86] auto-generate complete checks; NFC	Sanjay Patel	2018-01-14	2	-21/+73
\| \| \| \|	llvm-svn: 322457
*	[X86] Use ISD::TRUNCATE instead of X86ISD::VTRUNC when input and output ↵	Craig Topper	2018-01-14	8	-304/+15
\| \| \| \| \| \|	types have the same number of elements. llvm-svn: 322455
*	[X86] Add X86ISD::VTRUNC to computeKnownBitsForTargetNode.	Craig Topper	2018-01-14	1	-2/+2
\| \| \| \| \| \| \| \|	We have to take special care to avoid the cases where the result of the truncate would be padded with zero elements. Ideally we'd just use ISD::TRUNCATE for these cases instead. llvm-svn: 322454
*	[X86] Improve legalization of vXi16/vXi8 selects.	Craig Topper	2018-01-14	4	-68/+70
\| \| \| \| \| \| \| \|	Extend vXi1 conditions of vXi8/vXi16 selects even before type legalization gets a chance to split wide vectors. Previously we would only extend 128 and 256 bit vectors. But if we start with a 512 bit vector or wider that needs to be split we wouldn't extend until after the split had taken place. By extending early we improve the results of type legalization. Don't widen condition of 128/256 bit vXi16/vXi8 selects when we have BWI but not VLX. We can still use a mask register by widening the select to 512-bits instead. This is similar to what we do for compares already. llvm-svn: 322450
*	[X86] Add an avx512bw command line to the avx512-vec-cmp.ll test. Add some ↵	Craig Topper	2018-01-14	1	-143/+283
\| \| \| \| \| \| \| \|	additional test cases. Additional test cases cover selects with i16/i8 conditions that are only 128/256-bits wide, but the compares are 512-bits wide and can only produce k-registers. We should be able to artificially widen the selects to avoid moving the k-register to an xmm/ymm register. llvm-svn: 322449
*	X86: Add pattern matching for PMADDWD	Zvi Rackover	2018-01-13	1	-555/+127
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In addition to the existing match as part of a loop-reduction, add a straightforward pattern match for DAG-contained patterns. Reviewers: RKSimon, craig.topper Subscribers: llvm-commits Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D41811 llvm-svn: 322446
*	[X86] Regenerate double shift tests	Simon Pilgrim	2018-01-13	3	-47/+70
\| \| \| \|	llvm-svn: 322444