bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[X86] Add load folding isel patterns to scalar_math_patterns and ↵	Craig Topper	2019-06-11	3	-9/+44
	AVX512_scalar_math_fp_patterns. Also add a FIXME for the peephole pass not being able to handle this. llvm-svn: 363032
*	Revert CMake: Make most target symbols hidden by default	Tom Stellard	2019-06-11	6	-6/+6
	This reverts r362990 (git commit 374571301dc8e9bc9fdd1d70f86015de198673bd) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028
*	CMake: Make most target symbols hidden by default	Tom Stellard	2019-06-10	6	-6/+6
	Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff, require less memory when analyzing libLLVM.so. One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked.
	Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
	nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
	36221
	nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
	26278
	Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990
*	[X86] When promoting i16 compare with immediate to i32, try to use ↵	Craig Topper	2019-06-10	1	-19/+42
	sign_extend for eq/ne if the input is truncated from a type with enough sign bits. Summary: Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes. This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix a cmp di, 32767 would get, but it avoids a length changing prefix. Reviewers: RKSimon, spatel, xbolva00 Reviewed By: xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63032 llvm-svn: 362920
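The freedom to pick either extension for eq/ne can be checked with a small standalone sketch. This is not code from the patch: the helper names `zext16`, `sext16`, and `extensionsAgreeWithImmediate` are hypothetical, and the sketch assumes the immediate is widened with the same extension as the input.

```cpp
#include <cassert>
#include <cstdint>

// Widen a 16-bit value to 32 bits by zero extension (hypothetical helper).
constexpr uint32_t zext16(uint16_t v) { return v; }

// Widen a 16-bit value to 32 bits by sign extension (hypothetical helper).
constexpr uint32_t sext16(uint16_t v) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(v)));
}

// Exhaustively compare every 16-bit input against `imm` and confirm that
// the 32-bit equality gives the same answer under both extensions, provided
// the immediate is extended the same way as the input.
bool extensionsAgreeWithImmediate(uint16_t imm) {
  for (uint32_t i = 0; i <= 0xFFFF; ++i) {
    const uint16_t x = static_cast<uint16_t>(i);
    const bool reference = (x == imm);
    if ((zext16(x) == zext16(imm)) != reference) return false;
    if ((sext16(x) == sext16(imm)) != reference) return false;
  }
  return true;
}
```

Both widenings are injective on 16-bit values, which is why equality survives either one; ordered signed or unsigned comparisons do not share this freedom.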
*	[X86] Disable f32->f64 extload when sse2 is enabled	Craig Topper	2019-06-10	3	-26/+7
	Summary: We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also due to a late optimization in preprocessiseldag we had to handle (fpextend (loadf32)) under optsize. This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62710 llvm-svn: 362919
*	[X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled.	Craig Topper	2019-06-10	1	-1/+22
	llvm-svn: 362915
*	[X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and ↵	Craig Topper	2019-06-10	3	-138/+43
	scalar_to_vector/extract_vector_elts to reduce isel patterns. Previously we did the equivalent operation in isel patterns with COPY_TO_REGCLASS operations to transition. By inserting scalar_to_vectors and extract_vector_elts before isel we can allow each piece to be selected individually and accomplish the same final result. Ideally we'd use vector operations earlier in lowering/combine, but that looks to be more difficult. The scalar-fp-to-i64.ll changes are because we have a pattern for using movlpd for store+extract_vector_elt, while an f64 store uses movsd. The encoding sizes are the same. llvm-svn: 362914
*	[X86] NFCI : Comment updation for EVEX to VEX translation.	Jatin Bhateja	2019-06-09	1	-3/+3
	Reviewers: llvm-commits, jbhateja Reviewed By: jbhateja Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63055 llvm-svn: 362898
*	[X86] Remove (store (f32 (extractelt (v4f32))) isel patterns which is redundant.	Craig Topper	2019-06-09	2	-15/+0
	We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from selecting the store and extractelt independently. llvm-svn: 362895
*	[X86] Mutate scalar fceil/ffloor/ftrunc/fnearbyint/frint into ↵	Craig Topper	2019-06-08	4	-121/+23
	X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns. Similar was done for vectors in r362535. Removes about 1200 bytes from the isel table. llvm-svn: 362894
*	[SystemZ, RegAlloc] Favor 3-address instructions during instruction selection.	Jonas Paulsson	2019-06-08	2	-2/+4
	This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possible to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868
*	[X86] Remove unnecessary new line escape from the end of a macro. NFC	Craig Topper	2019-06-07	1	-1/+1
	llvm-svn: 362837
*	[x86] narrow extract subvector of vector select	Sanjay Patel	2019-06-07	1	-0/+52
	This is a potentially large perf win for AVX1 targets because of the way we auto-vectorize to 256-bit but then expect the backend to legalize/optimize for the half-implemented AVX1 ISA. On the motivating example from PR37428 (even though this patch doesn't solve the vector shift issue): https://bugs.llvm.org/show_bug.cgi?id=37428 ...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell) because we eliminate the remaining 256-bit vblendv ops. I added comments on a couple of tests that require further work. If we have 256-bit logic ops separating the vselect and extract, we should probably narrow everything to 128-bit, but that requires a larger pattern match. Differential Revision: https://reviews.llvm.org/D62969 llvm-svn: 362797
*	[X86] -march=cooperlake (llvm)	Pengfei Wang	2019-06-07	1	-0/+11
	Support intel -march=cooperlake in llvm Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62836 llvm-svn: 362776
*	[X86] Make a bunch of merge masked binops commutable for loading folding.	Craig Topper	2019-06-06	1	-8/+7
	This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg. We already commuted the unmasked and zero masked versions. I've added 512-bit stack folding tests for most of the instructions affected. I've tested needing commuting and not commuting across unmasked, merged masked, and zero masked. The 128/256 bit instructions should behave similarly. llvm-svn: 362746
*	[X86] Make masked floating point equality/ordered compares commutable for ↵	Craig Topper	2019-06-06	2	-7/+17
	load folding purposes. Same as what is supported for the unmasked form. llvm-svn: 362717
*	[X86] Don't turn avx masked.load with constant mask into masked.load+vselect ↵	Craig Topper	2019-06-06	1	-0/+3
	when passthru value is all zeroes. This is intended to enable the use of an immediate blend or more optimal instruction. But if the passthru is zero we don't need any additional instructions. llvm-svn: 362675
*	[X86] Fix mistake that marked ↵	Craig Topper	2019-06-05	1	-1/+1
	VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable. One of the sources controls the pass through value for the upper bits of the result so we can't really commute it. In practice this problem isn't a functional issue because we would only try to commute this instruction in order to fold a load. But we can't do embedded rounding and fold a load at the same time. So the load fold would never succeed, and I don't think we would ever commute, or at least keep the version after commuting. llvm-svn: 362647
*	[X86] Add the vector integer min/max instructions to ↵	Craig Topper	2019-06-05	1	-0/+84
	isAssociativeAndCommutative. As far as I know these should be freely reassociatable just like the floating point MAXC/MINC instructions. The reduce test changes are largely regressions and caused by the "generic" CPU we default to not having a scheduler model. The machine-combiner-int-vec.ll test shows the positive benefits of this change. Differential Revision: https://reviews.llvm.org/D62787 llvm-svn: 362629
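Why integer min/max can be reassociated freely is easy to see on scalars. The following is only an illustrative sketch with hypothetical function names, not the machine combiner's logic: it compares a serial chain of three dependent operations with a two-level tree over the same operands.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Serial chain: ((a min b) min c) min d, three dependent operations.
int32_t minSerial(int32_t a, int32_t b, int32_t c, int32_t d) {
  return std::min(std::min(std::min(a, b), c), d);
}

// Reassociated tree: (a min b) min (c min d), only two levels deep.
int32_t minTree(int32_t a, int32_t b, int32_t c, int32_t d) {
  return std::min(std::min(a, b), std::min(c, d));
}

// The same two shapes for max.
int32_t maxSerial(int32_t a, int32_t b, int32_t c, int32_t d) {
  return std::max(std::max(std::max(a, b), c), d);
}

int32_t maxTree(int32_t a, int32_t b, int32_t c, int32_t d) {
  return std::max(std::max(a, b), std::max(c, d));
}

// True when regrouping and swapping operands leaves the results unchanged.
bool reassociationPreservesResult(int32_t a, int32_t b, int32_t c, int32_t d) {
  return minSerial(a, b, c, d) == minTree(a, b, c, d) &&
         maxSerial(a, b, c, d) == maxTree(a, b, c, d) &&
         std::min(a, b) == std::min(b, a) &&
         std::max(a, b) == std::max(b, a);
}
```

The tree form shortens the dependency chain from three operations to two, which is the kind of benefit a reassociating pass can expose when the operation is associative and commutative.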
*	[x86] split more 256-bit stores of concatenated vectors	Sanjay Patel	2019-06-05	1	-3/+4
	As suggested in D62498 - collectConcatOps() matches both concat_vectors and insert_subvector patterns, and we see more test improvements by using the more general match. llvm-svn: 362620
*	[X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI.	Simon Pilgrim	2019-06-05	1	-12/+17
	Enables us to use this to split 512-bit vectors in future patches. llvm-svn: 362617
*	[X86][AVX] combineX86ShuffleChain - combine ↵	Simon Pilgrim	2019-06-05	1	-3/+10
	shuffle(extractsubvector(x),extractsubvector(y)) We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)), this relaxes the requirement to permit different sources as long as they have the same value type. This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size. llvm-svn: 362599
*	[X86] Cleanup convertIntLogicToFPLogic a little. NFCI	Craig Topper	2019-06-05	1	-23/+24
	- Use early returns to reduce indentation.
	- Replace multiple ifs with a switch.
	- Replace an assert with an llvm_unreachable default in the switch.
	- Check that the FP type we're going to use for the X86ISD::FAND/FOR/FXOR is legal rather than checking that the integer type matches the width of a legal scalar fp type.
	This all runs after legalization so it shouldn't really matter, but making sure we're using a valid type in the X86ISD node is really what's important. llvm-svn: 362565
*	[X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE ↵	Craig Topper	2019-06-04	3	-357/+82
	during PreProcessIselDAG to cut down on pattern permutations We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For each of these 6 patterns we have to support 3 vector widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table. This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K. There is one complication to this: the STRICT versions of these nodes are currently mutated to their non-strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future. Which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need. We can probably do something similar for scalar, but I haven't looked at it yet. Differential Revision: https://reviews.llvm.org/D62757 llvm-svn: 362535
*	[X86] Fold single-use variable into assert. NFC.	Benjamin Kramer	2019-06-04	1	-2/+2
	Avoids an unused variable warning in Release builds. llvm-svn: 362534
*	[x86] split 256-bit store of concatenated vectors	Sanjay Patel	2019-06-04	1	-0/+11
	This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is a reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 362524
*	[SelectionDAG][x86] limit post-legalization store merging by type	Sanjay Patel	2019-06-04	1	-1/+5
	The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507
*	[X86][SSE] Pulled out (sub (xor X, M), M) 'ConditionalNegate' out pattern ↵	Simon Pilgrim	2019-06-04	1	-49/+66
	match code. NFCI. As discussed on D62777 - we should be able to use this in more SSE41+ cases as well but that requires us to separate it from the OR(AND(),ANDN()) matcher. llvm-svn: 362504
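The (sub (xor X, M), M) pattern named in the subject is the usual branch-free conditional negate. A minimal scalar sketch follows, assuming each mask lane M is either all zeros or all ones; the function name `conditionalNegate` is hypothetical and this is not the matcher code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Conditional negate of one lane: (x ^ m) - m.
//   m == 0          -> (x ^ 0) - 0   == x
//   m == 0xFFFFFFFF -> (~x) - (-1)   == ~x + 1 == -x (two's complement)
// Unsigned arithmetic is used so that wraparound is well defined.
constexpr uint32_t conditionalNegate(uint32_t x, uint32_t m) {
  return (x ^ m) - m;
}

static_assert(conditionalNegate(5u, 0u) == 5u, "zero mask keeps the value");
static_assert(conditionalNegate(5u, 0xFFFFFFFFu) == 0xFFFFFFFBu,
              "all-ones mask negates the value");
```

A vector version applies the same two operations lane by lane, which is why the pattern can be matched as a single conditional-negate idiom.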
*	[X86] Fix the pattern for merge masked vcvtps2pd.	Craig Topper	2019-06-03	1	-4/+1
	r362199 fixed it for zero masking, but not merge masking. The load folding in the peephole pass hid the bug. This patch turns off the peephole pass on the relevant test to ensure coverage. llvm-svn: 362440
*	[CostModel][X86] Improve masked load/store AVX1/AVX2 costs	Simon Pilgrim	2019-06-02	1	-2/+2
	A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicate that it's closer to 2x. Masked store costs are a lot more diverse but 8x is roughly in the middle of the range.
	e.g. SandyBridge
	defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
	defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
	defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
	defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
	e.g. Btver2
	defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
	defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
	defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
	defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;
	Differential Revision: https://reviews.llvm.org/D61257 llvm-svn: 362338
*	[TTI][X86] Cleanup getMaskedMemoryOpCost. NFCI.	Simon Pilgrim	2019-06-02	1	-8/+11
	Prep work before resurrecting D61257. llvm-svn: 362335
*	[X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921)	Simon Pilgrim	2019-06-02	1	-5/+22
	Lets us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.) llvm-svn: 362327
*	[DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> ↵	Simon Pilgrim	2019-06-02	1	-36/+0
	bitcast(insert_subvector(x,y),c2) Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize. Differential Revision: https://reviews.llvm.org/D59188 llvm-svn: 362324
*	[X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative.	Craig Topper	2019-06-02	1	-0/+2
	llvm-svn: 362309
*	[X86] Add AVX512BF16 and AVX512VP2INTERSECT instructions to the loading ↵	Craig Topper	2019-06-01	1	-0/+33
	folding tables. llvm-svn: 362288
*	[X86] Make the X86FoldTablesEmitter functional again. Fix the spacing in the ↵	Craig Topper	2019-06-01	1	-4/+2
	output to make it easier to diff. Fix a few other formatting issues in the manual table. And remove some old FIXMEs. llvm-svn: 362287
*	[X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64.	Craig Topper	2019-05-31	2	-57/+20
	These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits. Similar to PR42079. Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions. This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes? llvm-svn: 362203
*	[X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp ↵	Craig Topper	2019-05-31	1	-11/+57
	extloads instead. DAG combine will usually fold fpextend+load to an fp extload anyway. So the 256 and 512 patterns were probably unnecessary. The 128 bit pattern was special in that it looked for a v4f32 load, but then used it in an instruction that only loads 64-bits. This is bad if the load happens to be volatile. We could probably make the patterns volatile aware, but that's more work for something that's probably rare. The peephole pass might kick in and save us anyway. We might also be able to fix this with some additional DAG combines. This also adds patterns for vselect+extload to enable masked vcvtps2pd to be used. Previously we looked for the unlikely vselect+fpextend+load. llvm-svn: 362199
*	[X86] Correct the ins operand order for MASKPAIR16STORE to match other store ↵	Craig Topper	2019-05-31	1	-4/+4
	instructions. This makes the 5 address operands come first. And the data operand comes last. This matches the operand order the instruction is created with. It's also the expected order in X86MCInstLower. So everything appeared to work, but the operands didn't match their declared type. Fixes a -verify-machineinstrs failure. Also remove the isel patterns from these instructions since they should only be used for stack spills and reloads. I'm not even sure what types the patterns were looking for to match. llvm-svn: 362193
*	[X86] Add VP2INTERSECT instructions	Pengfei Wang	2019-05-31	16	-0/+301
	Support Intel AVX512 VP2INTERSECT instructions in llvm Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62366 llvm-svn: 362188
*	[X86] Remove result type constraints from the ↵	Craig Topper	2019-05-30	1	-3/+3
	extloadv2f32/extloadv4f32/extloadv8f32 PatFrags. NFC The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags. The users of these either have their own type constraint or rely on the type constraint system to realize the only legal extend would be to f64. llvm-svn: 362175
*	[X86] Remove code that unnecessarily sets EXTLOAD with src type of ↵	Craig Topper	2019-05-30	1	-9/+0
	v2f32/v4f32/v8f32 as Legal for SSE2/AVX/AVX512 respectively. NFC The LoadExt table defaults to all combinations being Legal. For vector types, only src VTs with an i1 element type were ever changed. So we don't need to mark them legal manually. llvm-svn: 362170
*	[X86][SSE] Improve bool vector extload (PR26091)	Simon Pilgrim	2019-05-30	1	-0/+15
	We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into an iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it. Differential Revision: https://reviews.llvm.org/D62449 llvm-svn: 362081
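The idea of treating a vXi1 load as an iX integer load and then extending each bit to a full lane can be sketched on scalars. This is only an illustration of a sign-extending v8i1 to v8i32 case, not the code generated by the patch; the function name `signExtendMask8` is hypothetical, and the sketch assumes element 0 maps to the least significant bit.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Treat `bits` as the integer image of a v8i1 value and sign-extend every
// i1 element to a 32-bit lane: a set bit becomes -1 (all ones), a clear bit
// becomes 0. Element i is assumed to live in bit i of the integer.
std::array<int32_t, 8> signExtendMask8(uint8_t bits) {
  std::array<int32_t, 8> lanes{};
  for (int i = 0; i < 8; ++i) {
    lanes[i] = -static_cast<int32_t>((bits >> i) & 1u);
  }
  return lanes;
}
```

For a zero-extending variant each lane would hold 0 or 1 instead of 0 or -1; the single integer load in front of the expansion is the same in both cases.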
*	[X86] Add ENQCMD instructions	Pengfei Wang	2019-05-30	6	-0/+74
	For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Patch by Tianqing Wang (tianqing) Differential Revision: https://reviews.llvm.org/D62281 llvm-svn: 362053
*	Revert "[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to"	Pengfei Wang	2019-05-29	1	-3/+3
	This reverts commit c1b3716614bc0a107e6f41a7d3d503baefad8a5b. llvm-svn: 361918
*	[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to	Pengfei Wang	2019-05-29	1	-3/+3
	avoid static check fail RegClassOrBank is an object of RegClassOrRegBank, which is defined as using llvm::RegClassOrRegBank = typedef PointerUnion<const TargetRegisterClass *, const RegisterBank *> so control flow cannot get here. Use "llvm_unreachable" here to avoid "null pointer" confusion. Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62006 Signed-off-by: pengfei <pengfei.wang@intel.com> llvm-svn: 361912
*	[X86] Fix x86-64 call *foo@tlsdesc(%rax) and support R_386_TLSGOTDESC ↵	Fangrui Song	2019-05-29	2	-3/+21
	R_386_TLS_DESC_CALL D18885 emitted 5 bytes for call *foo@tlsdesc(%rax). It should use the 2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning of the call instruction. The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work:
	0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <.text+0x7>
	  3: R_X86_64_GOTPC32_TLSDESC a-0x4
	7: ff 10 callq *(%rax)
	  7: R_X86_64_TLSDESC_CALL a
	=>
	0: 48 c7 c0 fc ff ff ff mov $0xfffffffffffffffc,%rax
	7: 66 90 xchg %ax,%ax
	Also change the symbol type to STT_TLS when VK_TLSCALL or VK_TLSDESC is seen. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D62512 llvm-svn: 361910
*	[CodeGen] Add lrint/llrint builtins	Adhemerval Zanella	2019-05-28	1	-0/+2
	This patch adds ISD::LRINT and ISD::LLRINT along with new intrinsics. The changes are straightforward, as for other floating-point rounding functions, with just some adjustments required to handle the return value being an integer. The idea is to optimize lrint/llrint generation for AArch64 in a subsequent patch. The current semantics just route it to the libm symbol. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62017 llvm-svn: 361875
*	Revert "[x86] split 256-bit store of concatenated vectors"	Sanjay Patel	2019-05-28	1	-11/+0
	This reverts commit d5a8637072f4c556b88156bd2f6237a2ead47d31. Most likely suspect for this bot failure: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684 llvm-svn: 361850
*	[X86-64] Fix 256-bit SET0 lowering for non-VLX targets	David Greene	2019-05-28	1	-0/+6
	If we don't have VLX then 256-bit SET0 should be lowered to VPXOR with ZMM registers. This restores functionality accidentally removed by r309926. Differential Revision: https://reviews.llvm.org/D62415 llvm-svn: 361843