bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[X86][SSE]] Lower BUILD_VECTOR with repeated elts as BUILD_VECTOR + ↵	Simon Pilgrim	2017-04-03	11	-102/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VECTOR_SHUFFLE It can be costly to transfer from the gprs to the xmm registers and can prevent loads merging. This patch splits vXi16/vXi32/vXi64 BUILD_VECTORS that use the same operand in multiple elements into a BUILD_VECTOR with only a single insertion of each of those elements and then performs an unary shuffle to duplicate the values. There are a couple of minor regressions this patch unearths due to some missing MOVDDUP/BROADCAST folds that I will address in a future patch. Note: Now that vector shuffle lowering and combining is pretty good we should be reusing that instead of duplicating so much in LowerBUILD_VECTOR - this is the first of several patches to address this. Differential Revision: https://reviews.llvm.org/D31373 llvm-svn: 299387
*	x86 interrupt calling convention: re-align stack pointer on 64-bit if an ↵	Amjad Aboud	2017-04-03	2	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	error code was pushed The x86_64 ABI requires that the stack is 16 byte aligned on function calls. Thus, the 8-byte error code, which is pushed by the CPU for certain exceptions, leads to a misaligned stack. This results in bugs such as Bug 26413, where misaligned movaps instructions are generated. This commit fixes the misalignment by adjusting the stack pointer in these cases. The adjustment is done at the beginning of the prologue generation by subtracting another 8 bytes from the stack pointer. These additional bytes are popped again in the function epilogue. Fixes Bug 26413 Patch by Philipp Oppermann. Differential Revision: https://reviews.llvm.org/D30049 llvm-svn: 299383
*	Revert "[DAGCombine] A shuffle of a splat is always the splat itself"	Zvi Rackover	2017-04-03	1	-6/+10
\| \| \| \| \| \| \| \| \| \|	This reverts commit r299047 which is incorrect because the simplification may result in incorrect propogation of undefs to users of the folded shuffle. Thanks to Andrea Di Biagio for pointing this out. llvm-svn: 299368
*	[X86][MMX] Improve support for folding fptosi from XMM to MMX	Simon Pilgrim	2017-04-02	1	-7/+3
\| \| \| \|	llvm-svn: 299338
*	[X86][MMX] Simplify tablegen patterns by always combining MOVDQ2Q from v2i64	Simon Pilgrim	2017-04-02	1	-4/+2
\| \| \| \|	llvm-svn: 299336
*	[X86][MMX] Added support for subvector extraction to MMX register	Simon Pilgrim	2017-04-02	1	-5/+3
\| \| \| \|	llvm-svn: 299335
*	Regenerate test with codegen. NFCI.	Simon Pilgrim	2017-04-02	1	-4/+10
\| \| \| \|	llvm-svn: 299333
*	Regenerate test with codegen. NFCI.	Simon Pilgrim	2017-04-02	1	-4/+89
\| \| \| \|	llvm-svn: 299332
*	Regenerate test. NFCI.	Simon Pilgrim	2017-04-02	1	-56/+56
\| \| \| \|	llvm-svn: 299331
*	[X86][MMX] Add generic fptosi 4f32-4i32 test	Simon Pilgrim	2017-04-02	1	-0/+39
\| \| \| \|	llvm-svn: 299328
*	[DAGCombiner] enable vector transforms for any/all {sign} bits set/clear	Sanjay Patel	2017-04-01	1	-39/+27
\| \| \| \| \| \| \| \|	The code already allowed vector types in via "isInteger" (which might want a more specific name), so use splat-friendly constant predicates to match those types. llvm-svn: 299304
*	[PowerPC, x86] add vector tests for any/all {sign} bits set/clear; NFC	Sanjay Patel	2017-04-01	1	-0/+121
\| \| \| \|	llvm-svn: 299303
*	[DAGCombiner] Fix fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, ↵	Craig Topper	2017-04-01	1	-0/+32
\| \| \| \| \| \| \| \| \| \|	B, Mask) to explicitly ensure that only one of the inputs of each shuffle is a zero vector. This can only happen when we have a mix of zero and undef elements and the two vectors have a different arrangement of zeros/undefs. The shuffle should eventually be constant folded to all zeros. Fixes PR32484. llvm-svn: 299291
*	[DAGCombiner] add fold for 'All sign bits set?'	Sanjay Patel	2017-03-31	1	-5/+3
\| \| \| \| \| \| \| \| \| \|	(and (setlt X, 0), (setlt Y, 0)) --> (setlt (and X, Y), 0) We have 7 similar folds, but this one got away. The fact that the x86 test with a branch didn't change is probably a separate bug. We may also be missing this and the related folds in instcombine. llvm-svn: 299252
*	[x86] add/consolidate tests for setcc+setcc+and/or; NFC	Sanjay Patel	2017-03-31	3	-72/+347
\| \| \| \|	llvm-svn: 299238
*	[AVX-512] Update lowering for gather/scatter prefetch intrinsics to match ↵	Craig Topper	2017-03-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	the immediate encodings the frontend uses based on the _MM_HINT_T0/T1 constant values in clang's headers. Our _MM_HINT_T0/T1 constant values are 3/2 which matches gcc, but not icc or Intel documentation. Interestingly gcc had this same bug on their implementation of the gather/scatter builtins at one point too. Fixes PR32411. llvm-svn: 299234
*	[DAGCombiner] Add ComputeNumSignBits vector demanded elements support to ↵	Simon Pilgrim	2017-03-31	1	-17/+11
\| \| \| \| \| \| \| \|	ASHR and INSERT_VECTOR_ELT Followup to D31311 llvm-svn: 299221
*	[DAGCombiner] Add vector demanded elements support to ComputeNumSignBits	Simon Pilgrim	2017-03-31	1	-11/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course. Followup to D25691. Differential Revision: https://reviews.llvm.org/D31311 llvm-svn: 299219
*	[CodeGen] Pass SDAG an ORE, and replace FastISel stats with remarks.	Ahmed Bougacha	2017-03-30	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the long-term, we want to replace statistics with something finer-grained that lets us gather per-function data. Remarks are that replacement. Create an ORE instance in SelectionDAGISel, and pass it to SelectionDAG. SelectionDAG was used so that we can emit remarks from all SelectionDAG-related code, including TargetLowering and DAGCombiner. This isn't used in the current patch but Adam tells me he's interested for the fp-contract combines. Use the ORE instance to emit FastISel failures as remarks (instead of the mix of dbgs() dumps and statistics that we currently have). Eventually, we want to have an API that tells us whether remarks are enabled (http://llvm.org/PR32352) so that we don't emit expensive remarks (in this case, dumping IR) when it's not needed. For now, use 'isEnabled' as a crude replacement. This does mean that the replacement for '-fast-isel-verbose' is now '-pass-remarks-missed=isel'. Additionally, clang users also need to enable remark diagnostics, using '-Rpass-missed=isel'. This also removes '-fast-isel-verbose2': there are no static statistics that we want to only enable in asserts builds, so we can always use the remarks regardless of the build type. Differential Revision: https://reviews.llvm.org/D31405 llvm-svn: 299093
*	[DAGCombine] A shuffle of a splat is always the splat itself	Zvi Rackover	2017-03-30	1	-10/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add a simplification: shuffle (splat-shuffle), undef, M --> splat-shuffle Fixes pr32449 Patch by Sanjay Patel Reviewers: eli.friedman, RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31426 llvm-svn: 299047
*	[CodeGen] clean up and add tests for scalar and-of-setcc; NFC	Sanjay Patel	2017-03-29	2	-13/+33
\| \| \| \| \| \|	https://bugs.llvm.org/show_bug.cgi?id=32401 llvm-svn: 299034
*	[X86][AVX2] Prevent unary interleaving patterns from calling ↵	Simon Pilgrim	2017-03-29	1	-0/+15
\| \| \| \| \| \|	lowerVectorShuffleAsSplitOrBlend (PR32453) llvm-svn: 298993
*	[X86][MMX] Added generic sitofp test to compare against existing cvtdq2ps test.	Simon Pilgrim	2017-03-29	1	-0/+34
\| \| \| \|	llvm-svn: 298989
*	[AVX-512] Remove explicit KMOVWrk from isel patterns. COPY_TO_REGCLASS to ↵	Craig Topper	2017-03-29	1	-11/+11
\| \| \| \| \| \|	GR32 is enough. llvm-svn: 298985
*	[AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where ↵	Craig Topper	2017-03-29	11	-96/+110
\| \| \| \| \| \| \| \|	we can just use COPY_TO_REGCLASS instead. This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together. llvm-svn: 298984
*	[SDAG] Remove -enable-fmf-dag	Adam Nemet	2017-03-28	1	-1/+1
\| \| \| \| \| \| \|	This is no longer needed as spotted by Sanjay in https://reviews.llvm.org/D31165. llvm-svn: 298963
*	[AVX-512] Add test case that was supposed to go with r298957.	Craig Topper	2017-03-28	1	-0/+69
\| \| \| \|	llvm-svn: 298959
*	[X86][MMX] Match MMX fp_to_sint conversions from XMM registers	Simon Pilgrim	2017-03-28	1	-41/+16
\| \| \| \| \| \| \| \| \| \|	We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack). This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc. Differential Revision: https://reviews.llvm.org/D30868 llvm-svn: 298943
*	[x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality	Sanjay Patel	2017-03-28	1	-19/+39
\| \| \| \| \| \| \|	Follow-up to: https://reviews.llvm.org/rL298775 llvm-svn: 298933
*	[SDAG] Deal with deleted node in PromoteIntShiftOp	Nirav Dave	2017-03-28	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Deal with case that initial node is deleted during dag-combine leading to an assertional failure in promoteIntShiftOp. Fixes PR32420. Reviewers: spatel, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31403 llvm-svn: 298931
*	Add reproducer test for pr32449. NFC.	Zvi Rackover	2017-03-28	1	-0/+44
\| \| \| \|	llvm-svn: 298930
*	[X86][AVX2] Add support for combining v16i16 shuffles to VPBLENDW	Simon Pilgrim	2017-03-28	3	-12/+12
\| \| \| \|	llvm-svn: 298929
*	[AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers	Craig Topper	2017-03-28	54	-2321/+2768
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register. This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8. I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa. Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition. This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI. Differential Revision: https://reviews.llvm.org/D30968 llvm-svn: 298928
*	[x86] add separate check prefix for SSE; NFC	Sanjay Patel	2017-03-28	1	-1/+20
\| \| \| \| \| \| \|	We want to check each test on each target, so we need another prefix when SSE and AVX diverge (as they will if we handle 32-byte and higher). llvm-svn: 298926
*	[SDAG] Avoid deleted SDNodes PromoteIntBinOp	Nirav Dave	2017-03-28	2	-0/+246
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reorder work in PromoteIntBinOp to prevent stale (deleted) nodes from being used. Fixes PR32340 and PR32345. Reviewers: hfinkel, dbabokin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31148 llvm-svn: 298923
*	[x86] add AVX2 run to show 256-bit opportunity; NFC	Sanjay Patel	2017-03-28	1	-17/+17
\| \| \| \|	llvm-svn: 298918
*	[GlobalISel][X86] support G_FRAME_INDEX instruction selection.	Igor Breger	2017-03-28	3	-49/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: G_LOAD/G_STORE, add alternative RegisterBank mapping. For G_LOAD, Fast and Greedy mode choose the same RegisterBank mapping (GprRegBank ) for the G_GLOAD + G_FADD , can't get rid of cross register bank copy GprRegBank->VecRegBank. Reviewers: zvi, rovka, qcolombet, ab Reviewed By: zvi Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank Differential Revision: https://reviews.llvm.org/D30979 llvm-svn: 298907
*	[X86][AVX2] bugzilla bug 21281 Performance regression in vector interleave ↵	Gadi Haber	2017-03-27	2	-56/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in AVX2 This is a patch for an on-going bugzilla bug 21281 on the generated X86 code for a matrix transpose8x8 subroutine which requires vector interleaving. The generated code in AVX2 is currently non-optimal and requires 60 instructions as opposed to only 40 instructions generated for AVX1. The patch includes a fix for the AVX2 case where vector unpack instructions use less operations than the vector blend operations available in AVX2. In this case using vector unpack instructions is more efficient. Reviewers: zvi delena igorb craig.topper guyblank eladcohen m_zuckerman aymanmus RKSimon llvm-svn: 298840
*	[X86][SSE] Add computeKnownBitsForTargetNode support for (V)PSLL/(V)PSRL ↵	Simon Pilgrim	2017-03-26	1	-1/+0
\| \| \| \| \| \|	instructions llvm-svn: 298806
*	[X86][AVX512F] Fix reg class for VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk	Simon Pilgrim	2017-03-26	1	-6/+6
\| \| \| \| \| \| \| \| \| \|	Fixed -verify-machineinstrs errors in fast-isel-select-sse.ll (one of many in PR27481) The VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk instructions were assuming both source registers were V128X when the second is actually supposed to be FR32X/FR64X Differential Revision: https://reviews.llvm.org/D31200 llvm-svn: 298805
*	Regenerate test	Simon Pilgrim	2017-03-26	1	-1/+1
\| \| \| \|	llvm-svn: 298803
*	Regenerate test	Simon Pilgrim	2017-03-26	1	-7/+7
\| \| \| \| \| \|	The CHECK-DAG aren't necessary and get in the way of automated checks llvm-svn: 298802
*	Regenerate tests to remove duplicated checks	Simon Pilgrim	2017-03-26	1	-241/+118
\| \| \| \|	llvm-svn: 298801
*	[GlobalISel][X86] support G_FRAME_INDEX instruction selection.	Igor Breger	2017-03-26	2	-0/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Support G_FRAME_INDEX instruction selection. Reviewers: zvi, rovka, ab, qcolombet Reviewed By: ab Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank Differential Revision: https://reviews.llvm.org/D30980 llvm-svn: 298800
*	[X86][SSE] Combine (VSRLI (VSRAI X, Y), (NumSignBits-1)) -> (VSRLI X, ↵	Simon Pilgrim	2017-03-25	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	(NumSignBits-1)) Part 3 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298782
*	[X86][SSE] Added ComputeNumSignBitsForTargetNode support for (V)PSRAI	Simon Pilgrim	2017-03-25	1	-2/+2
\| \| \| \| \| \| \| \|	Part 2 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298780
*	[x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality	Sanjay Patel	2017-03-25	1	-11/+9
\| \| \| \| \| \| \| \| \|	This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, we can replace memcmp calls with inline code that is both smaller and faster. Differential Revision: https://reviews.llvm.org/D31290 llvm-svn: 298775
*	[X86][SSE] Add extra computeNumSignBits test case for D31311.	Simon Pilgrim	2017-03-25	1	-0/+47
\| \| \| \|	llvm-svn: 298774
*	[x86] add 32-bit RUN for better memcmp coverage; NFC	Sanjay Patel	2017-03-24	1	-102/+244
\| \| \| \|	llvm-svn: 298744
*	[X86][SSE] Add ashr + mask test cases.	Simon Pilgrim	2017-03-24	1	-0/+26
\| \| \| \| \| \|	Test cases showing cases where we're missing an opportunity to lshr a value with an extended sign to avoid loading a mask llvm-svn: 298716