path: root/llvm/test/CodeGen/X86/avx512-insert-extract.ll
Commit message | Author | Age | Files | Lines
* [X86] Optimization of inserting vXi1 sub vector into vXi1 vector | Wang, Pengfei | 2020-01-03 | 1 | -55/+36
  Summary: After the fix for the undef-value case here, we used more operations
  than necessary to implement inserting a vXi1 sub vector into a vXi1 vector;
  this patch implements it with fewer operations. The history is at
  https://reviews.llvm.org/D68311.
  Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon
  Reviewed By: craig.topper
  Subscribers: hiraditya, llvm-commits
  Patch by Xiang Zhang (xiangzhangllvm)
  Differential Revision: https://reviews.llvm.org/D71917
* [X86] Rewrite the vXi1 subvector insertion code to not rely on the value of bits that might be undef | Craig Topper | 2019-10-02 | 1 | -33/+57
  The previous code tried to do a trick where we would extract the subvector
  from the location we were inserting. Then xor that with the new value. Take
  the xored value and clear out the bits above the subvector size. Then shift
  that xored subvector to the insert location. And finally xor that with the
  original vector. Since the old subvector was used in both xors, this would
  leave just the new subvector at the inserted location. Since the surrounding
  bits had been zeroed, no other bits of the original vector would be modified.
  Unfortunately, if the old subvector came from undef we might aggressively
  propagate the undef. Then we end up with the xors not cancelling because they
  aren't using the same value for the two uses of the old subvector. @bkramer
  gave me a case that demonstrated this, but we haven't reduced it enough to
  make it easily readable to see what's happening.
  This patch uses a safer, but more costly, approach. It isolates the bits
  above the insertion and the bits below the insert point and ORs those
  together, leaving 0 at the insertion location. Then it widens the subvector
  with 0s in the upper bits and shifts it into position with 0s in the lower
  bits. Then we do another OR.
  Differential Revision: https://reviews.llvm.org/D68311
  llvm-svn: 373495
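  The bit manipulation is easier to follow on a scalar model of the mask
  register. A minimal C++ sketch of the two approaches, assuming pos + n < 64
  so every shift is well-defined (function names and the 64-bit width are
  illustrative, not the actual DAG lowering):

    #include <cstdint>

    // Old approach: the xor-cancellation trick. If the old subvector bits are
    // undef, the two uses of 'old' may disagree and the xors fail to cancel.
    uint64_t insertXor(uint64_t vec, uint64_t sub, unsigned pos, unsigned n) {
      uint64_t old = vec >> pos;                    // extract at insert location
      uint64_t x = (old ^ sub) & ((1ULL << n) - 1); // xor with new value, clear high bits
      return vec ^ (x << pos);                      // shift back; second xor cancels old
    }

    // New approach: no bit of the old subvector feeds the result, so an undef
    // old value cannot propagate.
    uint64_t insertOr(uint64_t vec, uint64_t sub, unsigned pos, unsigned n) {
      uint64_t below = vec & ((1ULL << pos) - 1);        // bits below the insert point
      uint64_t above = vec & ~((1ULL << (pos + n)) - 1); // bits above the subvector
      uint64_t mid = (sub & ((1ULL << n) - 1)) << pos;   // widen with 0s, shift into place
      return above | mid | below;
    }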
* [X86] Add VMOVSSZrrk/VMOVSDZrrk/VMOVSSZrrkz/VMOVSDZrrkz to getUndefRegClearance. | Craig Topper | 2019-09-26 | 1 | -2/+2
  We have isel patterns that can put an IMPLICIT_DEF on one of the sources for
  these instructions. So we should make sure we break any dependencies there.
  This should be done by just using one of the other sources.
  llvm-svn: 373025
* [X86] Pass v32i16/v64i8 in zmm registers on KNL target. | Craig Topper | 2019-08-30 | 1 | -14/+21
  gcc and icc pass these types in zmm registers. This patch implements a quick
  hack to override the register type before calling convention handling to one
  that is legal. Longer term we might want to do something similar to 256-bit
  integer registers on AVX1 where we just split all the operations.
  Fixes PR42957
  Differential Revision: https://reviews.llvm.org/D66708
  llvm-svn: 370495
* [X86] Remove patterns from MOVLPSmr and MOVHPSmr instructions. | Craig Topper | 2019-07-06 | 1 | -2/+2
  These patterns are the same as the MOVLPDmr and MOVHPDmr patterns, but with a
  bitcast at the end. We can just select the PD instruction and let execution
  domain fixing switch to PS.
  llvm-svn: 365267
* [X86][SSE] X86TargetLowering::isCommutativeBinOp - add PCMPEQ | Simon Pilgrim | 2019-06-26 | 1 | -8/+10
  Allows narrowInsertExtractVectorBinOp to reduce vector size.
  llvm-svn: 364432
* Teach the DAGCombine to fold this pattern (c1 and c2 are constants). | QingShan Zhang | 2019-06-26 | 1 | -20/+16
  // fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
  // fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2)
  // fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
  Sign extend the operands if it is any_extend, to keep the signedness of the
  operands, so that the other combine rules still apply. The any_extend is
  handled as zero extend for constants, i.e.
  t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0>
  t2: i64 = any_extend t1
  -->
  t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0>
  -->
  t4: i64 = sign_extend_inreg t3
  Differential Revision: https://reviews.llvm.org/D63318
  llvm-svn: 364382
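  A scalar C++ analogue of the fold may help; this is a hedged illustration of
  the constant-extension idea, not the SelectionDAG code itself:

    #include <cstdint>

    // Before: the i8 select survives under the extend, hiding the constants.
    int64_t before(bool cond) {
      int8_t t1 = cond ? int8_t(-1) : int8_t(0);  // select of i8 constants
      return static_cast<int64_t>(t1);            // extend applied afterwards
    }

    // After: the constants are extended first. Sign extension is used so that
    // an any_extend of -1 still produces all-ones.
    int64_t after(bool cond) {
      return cond ? INT64_C(-1) : INT64_C(0);     // select of i64 constants
    }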
* [TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND | Simon Pilgrim | 2019-06-25 | 1 | -22/+22
  As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to
  !TLO.LegalOperations || isOperationLegal cases. We'll improve X86 legality in
  future commits.
  llvm-svn: 364290
* [X86] Custom lower CONCAT_VECTORS of v2i1 | Benjamin Kramer | 2019-05-28 | 1 | -0/+104
  The generic legalizer cannot handle this. Add an assert instead of silently
  miscompiling vectors with elements smaller than 8 bits.
  llvm-svn: 361814
* [X86] Explicitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary. | Craig Topper | 2019-05-22 | 1 | -2/+2
  We effectively had a second set of isel patterns that tried to use a regular
  store instruction and an extract_subreg instruction. Or a masked move and an
  extract_subreg. These patterns were intended to override the matching of
  VEXTRACT instructions by taking advantage of the priority of the explicit
  immediate 0 for the index.
  This patch instead just disables immediate 0 matching in the VEXTRACT
  patterns. This way each of the component pieces of the larger patterns will
  match by themselves.
  This found a bug of sorts where we didn't use a 128-bit store for a 512->128
  extract on KNL. It's unclear what the right thing here should be. Using the
  vextract avoids constraining the register allocator to use xmm0-15. But it
  always results in a longer encoding if the register allocator ends up
  choosing xmm0-15 anyway.
  llvm-svn: 361431
* [SelectionDAG] Add icmp UNDEF handling to SelectionDAG::FoldSetCC | Simon Pilgrim | 2019-03-25 | 1 | -61/+60
  First half of PR40800. This patch adds DAG undef handling to icmp
  instructions to match the behaviour in llvm::ConstantFoldCompareInstruction
  and SimplifyICmpInst. This permits constant folding of vector comparisons
  where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts
  etc.). This involved a lot of tweaking to reduced tests as bugpoint loves to
  reduce icmp arguments to undef...
  Differential Revision: https://reviews.llvm.org/D59363
  llvm-svn: 356938
* [X86] Change vXi1 extract_vector_elt lowering to be legal if the index is 0. Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector. | Craig Topper | 2019-01-11 | 1 | -2/+2
  We were lowering the last step extract_vector_elt to a bitcast+truncate.
  Change it to use an extract_vector_elt of index 0 instead. Add isel patterns
  to do the equivalent of what the bitcast would have done. Plus an isel
  pattern for an any_extend+extract to prevent some regressions. Finally add a
  DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an
  extract_subvector.
  This fixes some of the regressions from D350800.
  llvm-svn: 350918
* [x86] allow vector load narrowing with multi-use values | Sanjay Patel | 2018-11-10 | 1 | -70/+50
  This is a long-awaited follow-up suggested in D33578. Since then, we've
  picked up even more opportunities for vector narrowing from changes like
  D53784, so there are a lot of test diffs. Apart from 2-3 strange cases,
  these are all wins.
  I've structured this to be no-functional-change-intended for any target
  except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would
  improve or not. All of those targets have existing regression tests (4, 4,
  10 files respectively) that would be affected. Also, Hexagon overrides the
  shouldReduceLoadWidth() hook, but doesn't show any regression test diffs.
  The trade-off is deciding if an extra vector load is better than a single
  wide load + extract_subvector. For x86, this is almost always better (on
  paper at least) because we often can fold loads into subsequent ops and not
  increase the official instruction count. There's also some unknown -- but
  potentially large -- benefit from using narrower vector ops if wide ops are
  implemented with multiple uops and/or frequency throttling is avoided.
  Differential Revision: https://reviews.llvm.org/D54073
  llvm-svn: 346595
* [X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded. | Craig Topper | 2018-11-09 | 1 | -22/+22
  This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and
  ISD::ZERO_EXTEND.
  I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes.
  Making the target specific nodes similar to the target independent nodes
  helps minimize test diffs in that patch.
  llvm-svn: 346539
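  The underlying observation, in hedged scalar form (an illustration, not the
  DAG code): when a consumer only demands the low source-width bits, sign- and
  zero-extension are interchangeable.

    #include <cstdint>

    // Both extensions agree on the low 8 bits; they differ only in bits
    // 8..31, which are not demanded by this consumer.
    uint8_t viaSext(int8_t v) {
      return static_cast<uint8_t>(static_cast<int32_t>(v));  // sign extend, keep low byte
    }
    uint8_t viaZext(int8_t v) {
      return static_cast<uint8_t>(
          static_cast<uint32_t>(static_cast<uint8_t>(v)));   // zero extend, keep low byte
    }
    // viaSext(v) == viaZext(v) for every v.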
* [DAGCombiner] narrow vector binops when extraction is cheap | Sanjay Patel | 2018-10-30 | 1 | -4/+4
  Narrowing vector binops came up in the demanded bits discussion in D52912.
  I don't think we're going to be able to do this transform in IR as a
  canonicalization because of the risk of creating unsupported widths for
  vector ops, but we already have a DAG TLI hook to allow what I was hoping
  for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and
  AArch64 (although only x86 has existing regression test diffs).
  This is artificially limited to not look through bitcasts because there are
  so many test diffs already, but that's marked with a TODO and is a small
  follow-up.
  Differential Revision: https://reviews.llvm.org/D53784
  llvm-svn: 345602
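  In DAG terms the rewrite is (extract_subvector (binop X, Y), idx) ->
  (binop (extract_subvector X, idx), (extract_subvector Y, idx)). A scalar
  C++ sketch of the equivalence, with assumed names and fixed 8/4-lane widths
  (not the DAGCombiner code):

    #include <array>

    using V8 = std::array<int, 8>;
    using V4 = std::array<int, 4>;

    static V4 extractLo(const V8 &v) { return {v[0], v[1], v[2], v[3]}; }

    // Wide binop, then extract: the upper four lanes of the add are wasted.
    V4 wideThenNarrow(const V8 &x, const V8 &y) {
      V8 sum{};
      for (int i = 0; i < 8; ++i) sum[i] = x[i] + y[i];
      return extractLo(sum);
    }

    // Extract first, then a narrow binop: only the demanded lanes are computed.
    V4 narrowThenOp(const V8 &x, const V8 &y) {
      V4 xl = extractLo(x), yl = extractLo(y);
      V4 sum{};
      for (int i = 0; i < 4; ++i) sum[i] = xl[i] + yl[i];
      return sum;
    }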
* [X86] Move promotion of vector and/or/xor from legalization to DAG combine | Craig Topper | 2018-10-15 | 1 | -3/+0
  Summary: I've noticed that the bitcasts we introduce for these make
  computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps.
  LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down.
  The bottom up strategy for LegalizeVectorOps means operands are legalized
  before their uses. So we promote and/or/xor before we legalize the operands
  that use them, making computeKnownBits/computeNumSignBits in places like
  LowerTruncate suboptimal.
  I looked at changing LegalizeVectorOps to be top down as well, but that was
  more disruptive and caused some regressions. I also looked at just moving
  promotion of binops to LegalizeDAG, but that had a few issues, one around
  matching AND,ANDN,OR into VSELECT: I had to create ANDN as vXi64 while the
  other nodes hadn't been legalized yet. I didn't look too hard at fixing that.
  This patch seems to produce better results overall than my other attempts.
  We now form broadcasts of constants better in some cases. For at least some
  of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and
  the BUILD_VECTOR was also legalized there. I think we got bad ordering of
  that. Now the promotion is out of the legalizer so we handle this better.
  In the longer term I think we really should evaluate whether we should be
  doing this promotion at all. It's really there to reduce isel pattern count,
  but I'm wondering if we'd be better served just eating the pattern cost or
  doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked
  and/or/xor will definitely be difficult in patterns if a bitcast gets
  between the vselect and the and/or/xor node. That becomes a lot of
  permutations to cover.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D53107
  llvm-svn: 344487
* [X86] Handle COPYs of physregs better (regalloc hints) | Simon Pilgrim | 2018-09-19 | 1 | -16/+16
  Enable enableMultipleCopyHints() on X86.
  Original patch by @jonpa:
  While enabling the mischeduler for SystemZ, it was discovered that for some
  reason a test needed one extra seemingly needless COPY
  (test/CodeGen/SystemZ/call-03.ll). The handling for that resulted in this
  patch, which improves the register coalescing by providing not just one copy
  hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 fewer
  register moves on SPEC, as well as marginally less spilling.
  Instead of improving just the SystemZ backend, the improvement has been
  implemented in common code (calculateSpillWeightAndHint()). This gives a lot
  of test failures, but since this should be a general improvement I hope that
  the involved targets will help and review the test updates.
  Differential Revision: https://reviews.llvm.org/D38128
  llvm-svn: 342578
* [X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size. | Craig Topper | 2018-07-14 | 1 | -1/+1
  AVX512 doesn't have an immediate controlled blend instruction. But blend
  throughput is still better than movss/sd on SKX.
  This commit changes AVX512 to use the AVX blend instructions instead of
  MOVSS/MOVSD. This constrains the register allocation since it won't be able
  to use XMM16-31, but hopefully the increased throughput and reduced port 5
  pressure makes up for that.
  llvm-svn: 337083
* [DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N) | Vedant Kumar | 2018-05-11 | 1 | -4/+4
  This teaches tryToFoldExtOfLoad to set the right location on a newly-created
  extload. With that in place, the logic for performing a certain
  ([s|z]ext (load ...)) combine becomes identical for sexts and zexts, and we
  can get rid of one copy of the logic.
  The test case churn is due to dependencies on IROrders inherited from the
  wrong SDLoc.
  Part of: llvm.org/PR37262
  Differential Revision: https://reviews.llvm.org/D46158
  llvm-svn: 332118
* [SelectionDAG] Support some SimplifySetCC cases for comparing against vector splats of constants. | Craig Topper | 2018-03-01 | 1 | -27/+13
  This supports things like (setcc ugt X, 0) -> (setcc ne X, 0).
  I've restricted this to only make changes to vectors before legalize ops
  because I doubt all targets have accurate condition code legality
  information for vectors given how little we did before.
  Differential Revision: https://reviews.llvm.org/D42948
  llvm-svn: 326495
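  The scalar identity behind the cited example, as a minimal sketch (function
  names are illustrative):

    #include <cstdint>

    // For unsigned values "greater than zero" and "not equal to zero"
    // coincide; the ne form is generally easier to legalize for vectors.
    bool asUgt(uint32_t x) { return x > 0u; }  // setcc ugt X, 0
    bool asNe(uint32_t x)  { return x != 0u; } // setcc ne X, 0 -- same result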
* [X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating. | Craig Topper | 2018-02-28 | 1 | -1/+1
  This is equivalent to what isel was doing anyway, but by canonicalizing
  earlier we can remove some patterns.
  llvm-svn: 326375
* [X86] Promote 16-bit cmovs to 32-bits | Craig Topper | 2018-02-20 | 1 | -2/+3
  This allows us to avoid an opsize prefix. And forcing some move immediates
  to i32 avoids a length changing prefix on those instructions.
  This mostly replaces the existing combine we had for zext/sext+cmov of
  constants. I left in a case for sign extending a 32 bit cmov of constants
  to 64 bits.
  Differential Revision: https://reviews.llvm.org/D43327
  llvm-svn: 325601
* [X86] Change some compare patterns to use loadi8/loadi16/loadi32/loadi64 helper fragments. | Craig Topper | 2018-02-12 | 1 | -2/+1
  This enables CMP8mi to fold zextloadi8i1, which in all tests allows us to
  avoid creating a TEST8rr that peephole can't fold.
  llvm-svn: 324863
* [X86] Use min/max for vector ult/ugt compares if it avoids a sign flip. | Craig Topper | 2018-02-11 | 1 | -40/+48
  Summary: Currently we only use min/max to help with ule/uge compares because
  it removes an invert of the result that would otherwise be needed. But we
  can also use it for ult/ugt compares if it will prevent the need for a sign
  bit flip needed to use pcmpgt, at the cost of requiring an invert after the
  compare.
  I also refactored the code so that the max/min code is self contained and
  does its own return instead of setting up a flag to manipulate the rest of
  the function's behavior.
  Most of the test cases look ok with this. I did notice that we added
  instructions when one of the operands being sign flipped is a constant
  vector that we were able to constant fold the flip into.
  I also noticed that sometimes the SSE min/max clobbers a register that is
  needed after the compare. This resulted in an extra move being inserted
  before the min/max to preserve the register. We could try to detect this
  and switch from min to max and change the compare operands to use the
  operand that gets reused in the compare.
  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42935
  llvm-svn: 324842
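  A hedged intrinsics sketch of the two lowerings for a v4i32 unsigned
  compare (SSE4.1 for pminud; function names are assumptions, not LLVM code):

    #include <immintrin.h>

    // x <=u y  iff  min(x, y) == x, so pminud + pcmpeqd needs no sign-bit flip.
    __m128i cmpULE(__m128i x, __m128i y) {
      return _mm_cmpeq_epi32(_mm_min_epu32(x, y), x);
    }

    // ugt is ule inverted: one extra pxor after the compare.
    __m128i cmpUGT(__m128i x, __m128i y) {
      return _mm_xor_si128(cmpULE(x, y), _mm_set1_epi32(-1));
    }

    // The alternative being avoided: flip both sign bits so that the signed
    // pcmpgtd gives the unsigned ordering.
    __m128i cmpUGTSignFlip(__m128i x, __m128i y) {
      const __m128i sign = _mm_set1_epi32(static_cast<int>(0x80000000u));
      return _mm_cmpgt_epi32(_mm_xor_si128(x, sign), _mm_xor_si128(y, sign));
    }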
* Followup on Proposal to move MIR physical register namespace to '$' sigil. | Puyan Lotfi | 2018-01-31 | 1 | -72/+72
  Discussed here:
  http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
  In preparation for adding support for named vregs we are changing the sigil
  for physical registers in MIR to '$' from '%'. This will prevent name
  clashes of named physical registers with named vregs.
  llvm-svn: 323922
* [X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and inserting into a vXi1 vector. | Craig Topper | 2018-01-23 | 1 | -24/+22
  The existing code was already doing something very similar to subvector
  insertion, so this allows us to remove the nearly duplicate code.
  This patch is a little larger than it should be due to differences between
  the DQI handling between the two today.
  llvm-svn: 323212
* [X86] Legalize v32i1 without BWI via splitting to v16i1 rather than the default of promoting to v32i8. | Craig Topper | 2018-01-23 | 1 | -137/+136
  Summary: For the most part it's better to keep v32i1 as a mask type of a
  narrower width than trying to promote it to a ymm register.
  I had to add some overrides to the methods that get the types for the
  calling convention so that we still use v32i8 for argument/return purposes.
  There are still some regressions in here. I definitely saw some around
  shuffles. I think we probably should move vXi1 shuffle from lowering to a
  DAG combine where I think the extend and truncate we have to emit would be
  better combined.
  I think we also need a DAG combine to remove trunc from
  (extract_vector_elt (trunc)).
  Overall this removes something like 13000 CHECK lines from lit tests.
  Reviewers: zvi, RKSimon, delena, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42031
  llvm-svn: 323201
* [X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads. | Craig Topper | 2018-01-18 | 1 | -5/+5
  Previously we used 64 for vXi64 stores and 32 for everything else. This
  change uses 64 for everything, just like we do for loads.
  llvm-svn: 322820
* [X86] Use ISD::TRUNCATE instead of X86ISD::VTRUNC when input and output types have the same number of elements. | Craig Topper | 2018-01-14 | 1 | -2/+1
  llvm-svn: 322455
* [X86] Make v2i1 and v4i1 legal types without VLX | Craig Topper | 2018-01-07 | 1 | -61/+38
  Summary: There are a few oddities that occur due to v1i1, v8i1, v16i1 being
  legal without v2i1 and v4i1 being legal when we don't have VLX, particularly
  during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store.
  We end up promoting the mask argument to these during type legalization and
  then have to widen the promoted type to v8iX/v16iX and truncate it to get
  the element size back down to v8i1/v16i1 to use a 512-bit operation. Since
  we need to fill the upper bits of the mask, we have to fill with 0s at the
  promoted type.
  It would be better if we could just have the v2i1/v4i1 types as legal so
  they don't undergo any promotion. Then we can just widen with 0s directly
  in a k register. There are no real v4i1/v2i1 instructions anyway; everything
  is done on a larger register anyway.
  This also fixes an issue that we couldn't implement a masked vextractf32x4
  from zmm to xmm properly.
  We now have to support widening more compares to 512-bit to get a mask
  result out, so new tablegen patterns got added.
  I had to hack the legalizer for widening the operand of a setcc a bit so it
  didn't try to create a setcc returning v4i32, extract from it, then try to
  promote it using a sign extend to v2i1. Now we create the setcc with v4i1
  if the original setcc's result type is v2i1. Then extract that and don't
  sign extend it at all.
  There's definitely room for improvement with some follow up patches.
  Reviewers: RKSimon, zvi, guyblank
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41560
  llvm-svn: 321967
* [X86] In LowerTruncateVecI1, don't add SHL if the input is known to be all sign bits. | Craig Topper | 2018-01-01 | 1 | -1/+0
  If the input is all sign bits then the LSB through MSB are all the same, so
  we don't need to move the LSB to the MSB.
  llvm-svn: 321617
* [SelectionDAG] Reverse the order of operands in the ISD::ADD created by TargetLowering::getVectorElementPointer so that the FrameIndex is on the left. | Craig Topper | 2017-12-22 | 1 | -36/+18
  This seems to improve X86's ability to match this into an address
  computation. Otherwise the other operand gets assigned to the base register
  and the stack pointer + frame index ends up in the index register. But index
  registers can't encode ESP/RSP, so we end up having to move it into another
  register to meet the constraint.
  I could try to improve the address matcher in X86, but swapping the producer
  seemed easier. Several other places already have the operands in this order,
  so this is at least consistent.
  llvm-svn: 321370
* [X86] When lowering insert_vector_elt/extract_vector_elt of vXi1 with a non-constant index, just use either a 128-bit type or the vXi8 type with the correct number of elements. | Craig Topper | 2017-12-22 | 1 | -62/+30
  Despite what the comment said, there isn't better codegen for 512-bit
  vectors. The 128/256/512 bit implementation just stores to memory and loads
  an element. There's no advantage to doing that with a larger size. In fact,
  in many cases it causes a stack realignment and generates worse code.
  llvm-svn: 321369
* [X86] Use SIGN_EXTEND to implement ANY_EXTEND from vXi1. | Craig Topper | 2017-12-22 | 1 | -15/+12
  llvm-svn: 321334
* [X86] Use SIGN_EXTEND rather than ZERO_EXTEND for lowering extract_vector_elt from vXi1 with a non-const index. | Craig Topper | 2017-12-21 | 1 | -4/+4
  We have a better range of instructions we can use if we can fill with the
  i1 value rather than zeroing.
  llvm-svn: 321315
* [SelectionDAG][X86] Fix insert_vector_elt lowering for v32i1/v64i1 with non-constant index | Craig Topper | 2017-12-15 | 1 | -0/+589
  Summary: Currently we don't handle v32i1/v64i1 insert_vector_elt correctly,
  as we fail to look at the number of elements closely and assume it can only
  be v16i1 or v8i1. We also can't type legalize v64i1 insert_vector_elt
  correctly on KNL because the type is not byte addressable, which the
  legalize-through-memory-accesses path requires.
  For the first issue, the patch now tries to pick a 512-bit register with the
  correct number of elements and promotes to that.
  For the second issue, we now extend the vector to a byte addressable type,
  do the stores to memory, load the two halves, and then truncate the halves
  back to the original type. Technically since we changed the type, we may not
  need two loads, but actually checking that is more work and for the v64i1
  case we do need them.
  Reviewers: RKSimon, delena, spatel, zvi
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D40942
  llvm-svn: 320849
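  A scalar C++ model of the second fix, assuming a 64-element mask held in a
  uint64_t (the function name and buffer layout are illustrative; the real
  code emits DAG nodes):

    #include <cstdint>

    // i1 lanes are not byte addressable, so widen each element to a byte, do
    // the variable-index store through memory, then truncate back to i1 lanes.
    uint64_t insertBitViaMemory(uint64_t mask, unsigned idx, bool bit) {
      uint8_t buf[64];
      for (unsigned i = 0; i < 64; ++i)
        buf[i] = (mask >> i) & 1;                       // extend: one byte per i1
      buf[idx & 63] = bit;                              // variable-index insert
      uint64_t out = 0;
      for (unsigned i = 0; i < 64; ++i)
        out |= static_cast<uint64_t>(buf[i] & 1) << i;  // truncate back
      return out;
    }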
* [X86] Don't zero the upper bits of the k-register before extracting a single bit from a vXi1. | Craig Topper | 2017-12-14 | 1 | -20/+6
  This doesn't match the semantics of the extract_vector_elt operation.
  Nothing downstream knows the bits were zeroed, so they still get masked or
  sign extended after the extract anyway.
  llvm-svn: 320723
* [X86] Make ANY_EXTEND from vXi1 Custom for more types. | Craig Topper | 2017-12-14 | 1 | -128/+7
  We should be able to support ANY_EXTEND for any types we support ZERO_EXTEND
  for.
  llvm-svn: 320675
* [X86] Handle all versions of vXi1 insert_vector_elt with a constant index without falling back to shuffles. | Craig Topper | 2017-12-08 | 1 | -82/+71
  We previously only supported inserting to the LSB or MSB, where it was easy
  to zero to perform an OR to insert.
  This change effectively extracts the old value and the new value, xors them
  together and then xors that single bit with the correct location in the
  original vector. This will cancel out the old value in the first xor,
  leaving the new value in the position.
  The way I've implemented this uses 3 shifts and two xors and uses an
  additional register. We can avoid the additional register at the cost of
  another shift.
  llvm-svn: 320120
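  A minimal scalar sketch of the xor trick for a constant index (names are
  illustrative; the DAG version spends an extra shift on masking):

    #include <cstdint>

    // Flip the target bit only when the old and new values differ: the first
    // xor computes the difference, the second applies it, cancelling the old
    // value and leaving the new one at idx.
    uint64_t insertBit(uint64_t vec, unsigned idx, uint64_t bit) {
      uint64_t diff = ((vec >> idx) ^ bit) & 1;  // old value xor new value
      return vec ^ (diff << idx);                // leaves the new value at idx
    }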
* [X86] Fix InsertBitToMaskVector to only issue KSHIFTs of native size so that upper bits are properly zeroed. | Craig Topper | 2017-12-07 | 1 | -7/+15
  There's no v2i1 or v4i1 kshift, and v8i1 is only supported with AVX512DQ.
  Isel has fake patterns to extend these types to native shifts, but makes no
  guarantees about the value of any bits shifted in when shifting right.
  This patch promotes the vector to a type that supports a native shift first
  and only allows inserting into the msb of a native sized shift.
  I've constructed this in a way that doesn't do the promotion if we're going
  to fall back to using a xmm/ymm/zmm shuffle. I think I have a plan to remove
  the shuffle fallback entirely, in which case this can be simplified, but I
  wanted to fix the correctness issue first.
  llvm-svn: 320081
* [CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. | Francis Visoiu Mistrih | 2017-12-07 | 1 | -52/+52
  Work towards the unification of MIR and debug output by refactoring the
  interfaces. For MachineOperand::print, keep a simple version that can be
  easily called from `dump()`, and a more complex one which will be called
  from both the MIRPrinter and MachineInstr::print.
  Add extra checks inside MachineOperand for detached operands (operands with
  getParent() == nullptr).
  https://reviews.llvm.org/D40836
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g'
  llvm-svn: 320022
* [CodeGen] Unify MBB reference format in both MIR and debug output | Francis Visoiu Mistrih | 2017-12-04 | 1 | -108/+108
  As part of the unification of the debug format and the MIR format, print MBB
  references as '%bb.5'. The MIR printer prints the IR name of a MBB only for
  block definitions.
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
  * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
  * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
  * grep -nr 'BB#' and fix
  Differential Revision: https://reviews.llvm.org/D40422
  llvm-svn: 319665
* [CodeGen] Print register names in lowercase in both MIR and debug output | Francis Visoiu Mistrih | 2017-11-28 | 1 | -52/+52
  As part of the unification of the debug format and the MIR format, always
  print registers as lowercase.
  * Only debug printing is affected. It now follows MIR.
  Differential Revision: https://reviews.llvm.org/D40417
  llvm-svn: 319187
* [X86] Add a pass to convert instruction chains between domains. | Guy Blank | 2017-10-22 | 1 | -10/+8
  The pass scans the function to find instruction chains that define registers
  in the same domain (closures). It then calculates the cost of converting the
  closure to another domain. If found profitable, the instructions are
  converted to instructions in the other domain and the register classes are
  changed accordingly.
  This commit adds the pass infrastructure and a simple conversion from the
  GPR domain to the Mask domain.
  Differential Revision: https://reviews.llvm.org/D37251
  Change-Id: Ic2cf1d76598110401168326d411128ae2580a604
  llvm-svn: 316288
* [X86][SSE] Add extractps/pextrd equivalence to domain tables | Simon Pilgrim | 2017-10-21 | 1 | -8/+8
  Differential Revision: https://reviews.llvm.org/D39135
  llvm-svn: 316274
* [X86][AVX512] Regenerate element insertion/extraction tests | Simon Pilgrim | 2017-10-10 | 1 | -352/+171
  llvm-svn: 315322
* [MC] Suppress .Lcfi labels when emitting textual assembly | Reid Kleckner | 2017-10-10 | 1 | -99/+0
  Summary: This suppresses the generation of .Lcfi labels in our textual
  assembler. It was annoying that this generated cascading .Lcfi labels:
    llc foo.ll -o - | llvm-mc | llvm-mc
  After three trips through MCAsmStreamer, we'd have three labels in the
  output when none are necessary. We should only bother creating the labels
  and frame data when making a real object file.
  This supersedes D38605, which moved the entire .seh_ implementation into
  MCObjectStreamer.
  This has the advantage that we do more checking when emitting textual
  assembly, at a minor efficiency cost. Outputting textual assembly is not
  performance critical, so this shouldn't matter.
  Reviewers: majnemer, MatzeB
  Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya,
  JDevlieghere, llvm-commits
  Differential Revision: https://reviews.llvm.org/D38638
  llvm-svn: 315259
* [X86] Use correct subvector index when combining two insert subvectors featuring zero vectors. | Craig Topper | 2017-09-28 | 1 | -0/+11
  Previously we were using one of the subvector indices twice. The included
  test case causes an assert without this change.
  Thanks to Simon Pilgrim for catching this.
  llvm-svn: 314429
* [X86][SKX][KNL] Updated regression tests to use -mattr instead of -mcpu flag. NFC. | Gadi Haber | 2017-09-27 | 1 | -277/+164
  NFC. Updated 8 regression tests to use -mattr instead of -mcpu flag as
  follows:
  -mcpu=knl --> -mattr=+avx512f
  -mcpu=skx --> -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq
  The updates are part of the preparation of a large commit to add all
  instruction scheduling for the SKX target.
  Reviewers: delena, zvi, RKSimon
  Differential Revision: https://reviews.llvm.org/D38222
  Change-Id: I2381c9b5bb75ecacfca017243c22d054f6eddd14
  llvm-svn: 314306
* [X86] Teach the execution domain fixing tables to use movlhps in place of unpcklpd for the packed single domain. | Craig Topper | 2017-09-18 | 1 | -1/+1
  MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With
  VEX and EVEX encodings it doesn't matter.
  llvm-svn: 313509