bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86][TableGen] Allow timm to appear in output patterns. Use it to remove ↵	Craig Topper	2019-09-22	1	-1/+1
	ConvertToTarget opcodes from the X86 isel table. We're now using a lot more TargetConstant nodes in SelectionDAG. But we were still telling isel to convert some of them to TargetConstants even though they already are. This is because isel emits a conversion anytime the output pattern has an 'imm'. I guess for patterns in instructions we take the 'timm' from the 'set' pattern, but for Pat patterns with explicit output we previously had to say 'imm' since 'timm' wasn't allowed in outputs. llvm-svn: 372525
*	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"	Matt Arsenault	2019-09-19	1	-8/+8
	This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297). This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338
*	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"	Hans Wennborg	2019-09-19	1	-8/+8
	This broke the Chromium build, causing it to fail with e.g.

	fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

	See llvm-commits thread of r372285 for details.

	This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit.

	> Encode them directly as an imm argument to G_INTRINSIC.
	>
	> Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU.
	>
	> This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time.
	>
	> SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them.
	>
	> Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory.
	>
	> The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT.
	>
	> This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source.

	llvm-svn: 372314
*	GlobalISel: Don't materialize immarg arguments to intrinsics	Matt Arsenault	2019-09-19	1	-8/+8
	Encode them directly as an imm argument to G_INTRINSIC.

	Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU.

	This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time.

	SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them.

	Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory.

	The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT.

	This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source.

	llvm-svn: 372285
*	[X86] Limit vpermil2pd/vpermil2ps immediates to 4 bits in the assembly parser.	Craig Topper	2019-08-07	1	-4/+4
	The upper 4 bits of the immediate byte are used to encode a register. We need to limit the explicit immediate to fit in the remaining 4 bits. Fixes PR42899. llvm-svn: 368123
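The 4-bit limit described above can be illustrated with a minimal sketch. The function name `encode_is4_byte` is hypothetical and this is not the assembly parser's code; it only assumes, as the message states, that a register index occupies the upper 4 bits of the byte and the explicit immediate must fit in the lower 4.

```python
def encode_is4_byte(reg_index: int, imm: int) -> int:
    """Pack a register index (upper 4 bits) and an immediate (lower 4 bits).

    Hypothetical helper for illustration only.
    """
    if not 0 <= reg_index <= 15:
        raise ValueError("register index must fit in 4 bits")
    if not 0 <= imm <= 15:
        # An immediate wider than 4 bits would overwrite the register field.
        raise ValueError("immediate must fit in 4 bits")
    return (reg_index << 4) | imm


# Register 3 with immediate 2 yields the byte 0x32.
print(hex(encode_is4_byte(3, 2)))
```

Rejecting the out-of-range immediate up front, rather than masking it, mirrors the idea of the fix: a value such as 16 would otherwise silently change which register is encoded.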
*	[X86] Remove the _alt forms of XOP VPCOM instructions. Use a combination of ↵	Craig Topper	2019-03-17	1	-20/+6
	custom printing and custom parsing to achieve the same result and more

	Previously we had a regular form of the instruction used when the immediate was 0-7, and an _alt form that allowed the full 8-bit immediate. Codegen would always use the 0-7 form since the immediate was always checked to be in range. Assembly parsing would use the 0-7 form when a mnemonic like vpcomtrueb was used. If the immediate was specified directly the _alt form was used. The disassembler would prefer to use the 0-7 form instruction when the immediate was in range and the _alt form otherwise. This way disassembly would print the most readable form when possible.

	The assembly parsing for things like vpcomtrueb relied on splitting the mnemonic into 3 pieces: a "vpcom" prefix, an immediate representing the "true", and a suffix of "b". The tablegenerated printing code would similarly print a "vpcom" prefix, decode the immediate into a string, and then print "b". The _alt form on the other hand parsed and printed like any other instruction with no specialness.

	With this patch we drop to one form and solve the disassembly printing issue by doing custom printing when the immediate is 0-7. The parsing code has been tweaked to turn "vpcomtrueb" into "vpcomb" and then the immediate for the "true" is inserted either before or after the other operands depending on at&t or intel syntax.

	I'd rather not do the custom printing, but I tried using an InstAlias for each possible mnemonic for all 8 immediates for all 16 combinations of element size, signedness, and memory/register. The code emitted into printAliasInstr ended up checking the number of operands, the register class of each operand, and the immediate for all 256 aliases. This was repeated for both the at&t and intel printer. Despite a lot of common checks between all of the aliases, when compiled with clang at least this commonality was not well optimized. Nor do all the checks seem necessary. Since I want to do a similar thing for vcmpps/pd/ss/sd, which have 32 immediate values and 3 encoding flavors, 3 register sizes, etc., this didn't seem to scale well for clang binary size. So custom printing seemed a better trade off.

	I also considered just using the InstAlias for the matching and not the printing. But that seemed like it would add a lot of extra rows to the matcher table. Especially given that the 32 immediates for vpcmpps have 46 strings associated with them.

	Differential Revision: https://reviews.llvm.org/D59398
	llvm-svn: 356343
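The mnemonic rewriting described above ("vpcomtrueb" becomes "vpcomb" plus an immediate) can be sketched roughly as follows. This is not the parser's actual code: the function name is hypothetical, and the condition-code table is an assumption based on the usual XOP VPCOM encoding (only the 0-7 range and the "true" spelling appear in the message).

```python
# Assumed XOP comparison condition codes (immediates 0-7).
CONDITION_CODES = {
    "lt": 0, "le": 1, "gt": 2, "ge": 3,
    "eq": 4, "neq": 5, "false": 6, "true": 7,
}

# Unsigned suffixes are listed first so "ub" is tried before "b".
SUFFIXES = ("ub", "uw", "ud", "uq", "b", "w", "d", "q")


def split_vpcom_mnemonic(mnemonic: str):
    """Return (base mnemonic, immediate) or None if no condition is embedded."""
    prefix = "vpcom"
    if not mnemonic.startswith(prefix):
        return None
    rest = mnemonic[len(prefix):]
    for suffix in SUFFIXES:
        if rest.endswith(suffix):
            condition = rest[: -len(suffix)]
            if condition in CONDITION_CODES:
                return prefix + suffix, CONDITION_CODES[condition]
    return None


print(split_vpcom_mnemonic("vpcomtrueb"))  # base mnemonic and its immediate
```

A plain "vpcomb" returns None here, meaning the immediate is expected as an explicit operand; where the extracted immediate is placed among the operands depends on the syntax flavor, as the message notes.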
*	Update the file headers across all of the LLVM projects in the monorepo	Chandler Carruth	2019-01-19	1	-4/+3
	to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
*	[X86] Stop promoting vector and/or/xor/andn to vXi64.	Craig Topper	2018-10-26	1	-0/+43
	These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation. This patch removes the promotion and adds isel patterns instead. The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward. Differential Revision: https://reviews.llvm.org/D53268 llvm-svn: 345408
*	Recommit r344877 "[X86] Stop promoting integer loads to vXi64"	Craig Topper	2018-10-22	1	-38/+37
	I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.

	Original commit message:

	Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

	I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

	I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

	I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

	Reviewers: RKSimon, spatel
	Reviewed By: RKSimon
	Subscribers: llvm-commits, RKSimon
	Differential Revision: https://reviews.llvm.org/D53306
	llvm-svn: 344965
*	Revert r344877 "[X86] Stop promoting integer loads to vXi64"	Craig Topper	2018-10-22	1	-37/+38
	Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure out. llvm-svn: 344921
*	[X86] Stop promoting integer loads to vXi64	Craig Topper	2018-10-21	1	-38/+37
	Summary:
	Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

	I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

	I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

	I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

	Reviewers: RKSimon, spatel
	Reviewed By: RKSimon
	Subscribers: llvm-commits, RKSimon
	Differential Revision: https://reviews.llvm.org/D53306
	llvm-svn: 344877
*	[X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957)	Simon Pilgrim	2018-10-05	1	-19/+19
	Currently we hardcode instructions with ReadAfterLd if the register operands don't need to be available until the folded load has completed. This doesn't take into account the different load latencies of different memory operands (PR36957). This patch adds a ReadAfterFold def into X86FoldableSchedWrite to replace ReadAfterLd, allowing us to specify the load latency at a scheduler class level. I've added ReadAfterVec*Ld classes that match the XMM/Scl, XMM and YMM/ZMM WriteVecLoad classes that we currently use; we can tweak these values in future patches once this infrastructure is in place. Differential Revision: https://reviews.llvm.org/D52886 llvm-svn: 343868
*	[X86] More additions to the load folding tables based on the autogenerated ↵	Craig Topper	2018-06-16	1	-2/+3
	tables. Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table. llvm-svn: 334898
*	[X86] Split ↵	Simon Pilgrim	2018-05-10	1	-2/+2
	WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes
	Split off XMM classes from the default (MMX) classes. llvm-svn: 331999
*	[X86] Add SchedWriteFRnd fp rounding scheduler classes	Simon Pilgrim	2018-05-04	1	-6/+6
	Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions. Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA. llvm-svn: 331515
*	[X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM ↵	Simon Pilgrim	2018-05-03	1	-4/+8
	scheduler classes
	This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness. llvm-svn: 331472
*	[X86] Convert most remaining XOP uses of X86SchedWritePair scheduler classes ↵	Simon Pilgrim	2018-05-02	1	-88/+102
	to X86SchedWriteWidths. llvm-svn: 331369
*	[X86] Cleanup WriteFAdd/WriteFCmp scheduler classes with more common default ↵	Simon Pilgrim	2018-05-02	1	-2/+2
	values
	Intel models were targeting x87 instead of packed sse. Also fixes XOP's VFRCZ to use WriteFAdd/WriteFAddY. llvm-svn: 331340
*	[XOP] v4i32 IFMA 'VPMACS' instructions should use the WritePMULLD schedule class	Simon Pilgrim	2018-04-24	1	-15/+28
	llvm-svn: 330751
*	[X86] Add variable shuffle schedule classes	Simon Pilgrim	2018-04-11	1	-8/+8
	Split variable index shuffles from immediate index shuffles:
	WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
	WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)
	WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
	WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)
	Differential Revision: https://reviews.llvm.org/D45404
	llvm-svn: 329806
*	[X86] Add ReadAfterLds to some 3 src instructions	Craig Topper	2018-03-29	1	-6/+20
	Sometimes the operand comes after the memory operand so we need 5 ReadDefaults first. I suspect we also need to do something for the mask operand for masked avx512 instructions? I'm not sure if the mask should be ReadAfterLd or not since it can mask faults. If it shouldn't be ReadAfterLd then we're probably wrong for zero masking instructions already. Differential Revision: https://reviews.llvm.org/D44726 llvm-svn: 328834
*	[X86] Fix the SchedRW for XOP vpcom register form instructions to not be ↵	Craig Topper	2018-03-21	1	-2/+2
	marked as loads. llvm-svn: 328071
*	[X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead ↵	Craig Topper	2018-03-08	1	-3/+3
	of vXi32. This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX. I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that. I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that. llvm-svn: 326991
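The two interpretations mentioned above can be checked against each other with a small sketch. It models the unsigned variant on a 128-bit vector, assumes little-endian lane layout (element 0 in the low bits), and uses hypothetical function names; it is not LLVM code.

```python
MASK32 = 0xFFFFFFFF


def pmuludq_from_v4i32(a, b):
    """vXi32 view: multiply the even 32-bit elements into 64-bit products."""
    return [a[i] * b[i] for i in (0, 2)]


def pmuludq_from_v2i64(a, b):
    """vXi64 view: multiply the lower 32 bits of each 64-bit element."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]


def v4i32_to_v2i64(v):
    """Reinterpret four 32-bit lanes as two 64-bit lanes (little-endian)."""
    return [v[0] | (v[1] << 32), v[2] | (v[3] << 32)]


a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
# Both views read the same bits, so the products agree.
print(pmuludq_from_v4i32(a, b))
print(pmuludq_from_v2i64(v4i32_to_v2i64(a), v4i32_to_v2i64(b)))
```

Because the odd 32-bit elements are ignored in both views, choosing the vXi64 type changes only how the operand is described, not which bits are multiplied.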
*	[X86] Make XOP VPCOM instructions commutable to fold loads during isel.	Craig Topper	2018-02-20	1	-40/+53
	llvm-svn: 325547
*	[X86] Rename 256-bit VFRCZ instructions to have the Y before the rr/rm to ↵	Craig Topper	2018-01-24	1	-2/+2
	match other instructions. NFC llvm-svn: 323304
*	[X86] Use Ld scheduler classes for instructions with folded loads.	Craig Topper	2017-12-12	1	-18/+18
	llvm-svn: 320459
*	[X86][XOP] Add missing scheduler classes to XOP instructions	Simon Pilgrim	2017-11-21	1	-28/+39
	All match equivalent basic classes (WritePHAdd, WriteFAdd etc.) according to both the AMD 15h SOG and Agner's tables. llvm-svn: 318758
*	[X86][XOP] Merge rotation opcodes with AVX512 equivalents. NFCI.	Simon Pilgrim	2017-09-26	1	-8/+8
	The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations. Differential Revision: https://reviews.llvm.org/D37949 llvm-svn: 314207
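The claim that a signed rotate amount can be handled as ROTL with an unsigned modulo can be verified with a short sketch. The helper names are hypothetical and the element width defaults to 32 bits purely for illustration.

```python
def rotl(value: int, amount: int, width: int = 32) -> int:
    """Rotate left, taking the amount modulo the element width."""
    mask = (1 << width) - 1
    amount %= width  # Python's % always yields a non-negative result
    value &= mask
    return ((value << amount) | (value >> (width - amount))) & mask


def rotr(value: int, amount: int, width: int = 32) -> int:
    """Rotate right, taking the amount modulo the element width."""
    mask = (1 << width) - 1
    amount %= width
    value &= mask
    return ((value >> amount) | (value << (width - amount))) & mask


def xop_rotate(value: int, signed_amount: int, width: int = 32) -> int:
    """Rotate left for positive amounts, right for negative ones."""
    if signed_amount >= 0:
        return rotl(value, signed_amount, width)
    return rotr(value, -signed_amount, width)


x = 0x80000001
# The signed form and the modulo form agree for every amount checked.
print(all(xop_rotate(x, s) == rotl(x, s % 32) for s in range(-32, 33)))
```

The equivalence holds because rotating right by n is the same as rotating left by width minus n, which is what reducing a negative amount modulo the width produces.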
*	[X86] Remove isel checks for immediate size on floating point compare and ↵	Craig Topper	2017-09-20	1	-2/+2
	xop compare instructions. NFCI
	If these checks fail we end up not selecting an instruction at all. So we are already relying on the immediate being checked upstream of isel. So doing the check in isel is just bloat to the isel table. Interestingly, we didn't check on the AVX512 version of the instructions anyway. llvm-svn: 313724
*	[X86] Adding FoldGenRegForm helper field (for memory folding tables tableGen ↵	Ayman Musa	2017-05-28	1	-4/+4
	backend) to X86Inst class and set its value for the relevant instructions.

	Some register-register instructions can be encoded in 2 different ways; this happens when 2 register operands can be folded (separately). For example if we look at the MOV8rr and MOV8rr_REV, both instructions perform exactly the same operation, but are encoded differently. Here is the relevant information about these instructions from Intel's 64-ia-32-architectures-software-developer-manual:

	Opcode   Instruction     Op/En  64-Bit Mode  Compat/Leg Mode  Description
	8A /r    MOV r8,r/m8     RM     Valid        Valid            Move r/m8 to r8.
	88 /r    MOV r/m8,r8     MR     Valid        Valid            Move r8 to r/m8.

	Here we can see that in order to enable the folding of the output and input registers, we had to define 2 "encodings", and as a result we got 2 move 8-bit register-register instructions. In the X86 backend, we define both of these instructions; usually one has a regular name (MOV8rr) while the other has a "_REV" suffix (MOV8rr_REV), must be marked with the isCodeGenOnly flag, and is not emitted from CodeGen.

	Automatically generating the memory folding tables relies on matching encodings of instructions, but in these cases where we want to map both memory forms of the mov 8-bit (MOV8rm & MOV8mr) to MOV8rr (not to MOV8rr_REV) we have to somehow point from the MOV8rr_REV to the "regular" appropriate instruction, which in this case is MOV8rr. This field enables this "pointing" mechanism, which is used in the TableGen backend for generating memory folding tables.

	Differential Revision: https://reviews.llvm.org/D32683
	llvm-svn: 304087
*	[X86][XOP] Reduce the size of a multiclass by moving more stuff to ↵	Craig Topper	2017-02-18	1	-62/+33
	parameters instead of doing 128-bit and 256-bit simultaneously. This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions. llvm-svn: 295579
*	Recommit "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR."	Craig Topper	2017-02-18	1	-61/+18
	Clang has now been fixed to not use these intrinsics. llvm-svn: 295571
*	Revert "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR."	Craig Topper	2017-02-18	1	-18/+61
	This reverts r295564. I missed that clang was still using the intrinsics despite our half-implemented autoupgrade support. llvm-svn: 295565
*	[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR.	Craig Topper	2017-02-18	1	-61/+18
	It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here I also simplified the tablegen multiclasses. llvm-svn: 295564
*	[X86][XOP] Added support for VPMADCSWD 'extend+hadd' IFMA patterns	Simon Pilgrim	2017-01-14	1	-0/+3
	VPMADCSWD acts as VPADDD( VPMADDWD( x, y ), z ) - multiply+extend+hadd and add to v4i32 accumulator llvm-svn: 292021
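A rough scalar model of the VPADDD( VPMADDWD( x, y ), z ) pattern above, for a 128-bit vector. The function names are hypothetical, lanes are passed as unsigned bit patterns, and the accumulation is assumed to wrap modulo 2^32 as a plain VPADDD would.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vpmadcswd(x, y, z):
    """x, y: eight 16-bit lanes; z: four 32-bit accumulator lanes."""
    result = []
    for i in range(4):
        # Multiply adjacent signed 16-bit pairs, extending to 32 bits.
        lo = to_signed(x[2 * i], 16) * to_signed(y[2 * i], 16)
        hi = to_signed(x[2 * i + 1], 16) * to_signed(y[2 * i + 1], 16)
        # Horizontal add of the pair, then add to the accumulator lane.
        result.append((lo + hi + z[i]) & 0xFFFFFFFF)
    return result


x = [1, 2, 3, 4, 5, 6, 7, 0xFFFF]  # the last lane is -1 as a signed i16
y = [1, 1, 1, 1, 1, 1, 1, 2]
print(vpmadcswd(x, y, [10, 20, 30, 40]))
```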
*	[X86][XOP] Added support for VPMACSDQH/VPMACSDQL 'extension' IFMA patterns	Simon Pilgrim	2017-01-14	1	-1/+8
	VPMACSDQH/VPMACSDQL act as VPADDQ( VPMULDQ( x, y ), z ) - multiply+extending either the odd/even 4i32 input elements and adding to v2i64 accumulator llvm-svn: 292020
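The VPADDQ( VPMULDQ( x, y ), z ) behavior can be modeled per lane as below. This is a sketch with hypothetical names; following the message, the H form is taken to use the odd 32-bit elements and the L form the even ones, with signed extension and a wrapping 64-bit add.

```python
def sign_extend_32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vpmacsdq(x, y, z, high: bool):
    """x, y: four 32-bit lanes; z: two 64-bit accumulator lanes."""
    start = 1 if high else 0  # odd elements for H, even elements for L
    mask64 = (1 << 64) - 1
    return [
        (sign_extend_32(x[i]) * sign_extend_32(y[i]) + z[i // 2]) & mask64
        for i in (start, start + 2)
    ]


x = [0xFFFFFFFF, 2, 3, 4]  # lane 0 is -1 as a signed i32
y = [5, 6, 7, 8]
print(vpmacsdq(x, y, [10, 20], high=False))
print(vpmacsdq(x, y, [10, 20], high=True))
```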
*	[X86][XOP] Added support for VPMACSWW/VPMACSDD 'lossy' IFMA patterns	Simon Pilgrim	2017-01-14	1	-0/+11
	VPMACSWW/VPMACSDD act as add( mul( x, y ), z ) - ignoring any upper bits from both the multiply and add stages llvm-svn: 292019
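The "lossy" add( mul( x, y ), z ) form keeps only the low bits of each lane, which a short sketch makes concrete. The function name is hypothetical; lanes are treated as unsigned bit patterns, which gives the same low bits as signed arithmetic.

```python
def vpmacs_lossy(x, y, z, bits: int):
    """Per-lane multiply-accumulate truncated to the element width."""
    mask = (1 << bits) - 1
    # Upper bits of both the product and the sum are discarded.
    return [((a * b) + c) & mask for a, b, c in zip(x, y, z)]


# 16-bit lanes (VPMACSWW-like): 0x1234 * 0x100 + 1 = 0x123401, truncated to 0x3401.
print([hex(v) for v in vpmacs_lossy([0x1234], [0x100], [1], bits=16)])
```

Since truncation commutes with addition and multiplication modulo 2^bits, the pattern matches an ordinary element-width multiply followed by an ordinary add.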
*	[X86][XOP] Add a reversed reg/reg form for VPROT instructions.	Craig Topper	2016-11-26	1	-0/+7
	The W bit distinguishes which operand is the memory operand. But if the mod bits are 3 then the memory operand is a register and there are two possible encodings. We already did this correctly for several other XOP instructions. llvm-svn: 287961
*	[X86] Create a new instruction format to handle 4VOp3 encoding. This saves ↵	Craig Topper	2016-08-22	1	-4/+4
	one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling. llvm-svn: 279424
*	[X86] Create a new instruction format to handle MemOp4 encoding. This saves ↵	Craig Topper	2016-08-22	1	-20/+20
	one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling. llvm-svn: 279423
*	[X86] Merge hasVEX_i8ImmReg into the ImmFormat type which had extra unused ↵	Craig Topper	2016-08-22	1	-28/+28
	encodings. This saves one bit in TSFlags. NFC llvm-svn: 279412
*	[X86][XOP] Support for VPERMIL2PD/VPERMIL2PS 2-input shuffle instructions	Simon Pilgrim	2016-06-03	1	-16/+26
	This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage. The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions); an auto-upgrade path is added to convert old calls. Mask decoding/target shuffle support will be added in future patches. Differential Revision: http://reviews.llvm.org/D20049 llvm-svn: 271633
*	[X86][XOP] Fixed instruction postfixes to more closely match operands	Simon Pilgrim	2016-03-24	1	-85/+85
	Suggested by Sanjay in D18189 as the multiple folding options in XOP instructions can be tricky llvm-svn: 264305
*	[X86][XOP] Merged 128/256 bit 4op instruction definitions. NFCI.	Simon Pilgrim	2016-03-24	1	-15/+14
	llvm-svn: 264294
*	[X86][XOP] Support for VPPERM byte shuffle instruction	Simon Pilgrim	2016-03-24	1	-3/+41
	This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode. Differential Revision: http://reviews.llvm.org/D18189 llvm-svn: 264260
*	[X86] Add some missing reversed forms of XOP instructions.	Craig Topper	2016-02-20	1	-0/+29
	llvm-svn: 261417
*	[X86][XOP] Add support for the matching of the VPCMOV bit select instruction	Simon Pilgrim	2015-11-03	1	-0/+10
	XOP has the VPCMOV instruction that performs the common vector bit select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) ). This patch adds tablegen pattern matching for this instruction. Differential Revision: http://reviews.llvm.org/D8841 llvm-svn: 251975
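The bit select formula above can be written out directly. This is a minimal sketch with a hypothetical function name, operating on a single integer of a chosen width rather than on a real vector register.

```python
def vpcmov(src1: int, src2: int, src3: int, width: int = 128) -> int:
    """Bit select: take src1 where src3 has a 1 bit, src2 where it has a 0 bit."""
    mask = (1 << width) - 1
    # OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) ), limited to `width` bits.
    return ((src1 & src3) | (src2 & ~src3)) & mask


# With a 16-bit width: selector 0xF0F0 picks alternating nibbles.
print(hex(vpcmov(0xFF00, 0x00FF, 0xF0F0, width=16)))
```

An all-ones selector returns src1 unchanged and an all-zeros selector returns src2, which is a quick sanity check on the operand order.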
*	[X86][XOP] Add VPROT instruction opcodes	Simon Pilgrim	2015-10-17	1	-33/+13
	Added X86ISD opcodes for VPROT vector rotate by variable and by immediate. llvm-svn: 250620
*	[X86] Change all the i8imm operands in XOP instructions to u8imm so the ↵	Craig Topper	2015-10-13	1	-10/+10
	parser will check the size. llvm-svn: 250147
*	[X86][XOP] Added support for the lowering of 128-bit vector integer ↵	Simon Pilgrim	2015-10-11	1	-12/+16
	comparisons to XOP PCOM/PCOMU instructions. The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646). llvm-svn: 249976