path: root/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
* [CostModel][X86] Add missing scalar i64->f32 uitofp costs (Simon Pilgrim, 2020-01-06, +4/-0)
* [X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without avx512vl. Similar for vXi64 on avx512dq without avx512vl. (Craig Topper, 2019-12-26, +1/-0)
  Previously we did this with isel patterns that used garbage in the widened
  part of the source. But that's not valid for strictfp, so now we custom
  widen and use zeroes for the widened elements for strictfp. This replaces
  D71864.
  Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3
  Reviewed By: pengfei
  Differential Revision: https://reviews.llvm.org/D71879
* [NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter] (Anna Welker, 2019-12-18, +7/-5)
  Add an extra parameter so alignment can be taken under consideration in
  gather/scatter legalization.
  Differential Revision: https://reviews.llvm.org/D71610
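  As an illustration of the idea (a minimal standalone sketch, not the
  committed hook; the name and the exact check are invented), legality can
  now depend on the access alignment as well as the type:

      // A target might reject gathers whose alignment is below the
      // element size, since those may have to be scalarized anyway.
      bool isLegalMaskedGatherSketch(bool HasHardwareGather,
                                     unsigned EltSizeInBytes,
                                     unsigned Alignment) {
        return HasHardwareGather && Alignment >= EltSizeInBytes;
      }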
* Rename TTI::getIntImmCost for instructions and intrinsics (Reid Kleckner, 2019-12-11, +3/-3)
  Soon Intrinsic::ID will be a plain integer, so this overload will not be
  possible. Rename both overloads to ensure that downstream targets observe
  this as a build failure instead of a runtime failure.
  Split off from D71320.
  Reviewers: efriedma
  Differential Revision: https://reviews.llvm.org/D71381
* [ARM] Teach the Arm cost model that a Shift can be folded into other instructions (David Green, 2019-12-09, +7/-6)
  This attempts to teach the cost model in Arm that code such as:
      %s = shl i32 %a, 3
      %r = and i32 %s, %b
  can under Arm or Thumb2 become:
      and r0, r1, r2, lsl #3
  so the cost of the shift can essentially be free. To do this without
  trying to artificially adjust the cost of the "and" instruction, it needs
  to get the users of the shl and check if they are a type of instruction
  that the shift can be folded into. And so it needs to have access to the
  actual instruction in getArithmeticInstrCost, which if available is added
  as an extra parameter, much like getCastInstrCost. We otherwise limit it
  to shifts with a single user, which should hopefully handle most of the
  cases. The list of instructions that the shift can be folded into includes
  ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This
  translates to Add, Sub, And, Or, Xor and ICmp.
  Differential Revision: https://reviews.llvm.org/D70966
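  A minimal sketch of the single-user check described above (the helper
  name and exact structure are assumptions, not the committed ARM code):

      #include "llvm/IR/Instructions.h"
      using namespace llvm;

      // A shift is a candidate for free folding only if its sole user is
      // one of the instruction kinds it can merge into.
      static bool shiftFoldsIntoUser(const Instruction *Shl) {
        if (!Shl->hasOneUse())
          return false;
        const auto *U = cast<Instruction>(*Shl->user_begin());
        switch (U->getOpcode()) {
        case Instruction::Add:
        case Instruction::Sub:
        case Instruction::And:
        case Instruction::Or:
        case Instruction::Xor:
        case Instruction::ICmp:
          return true;
        default:
          return false;
        }
      }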
* [x86] add cost model special-case for insert/extract from element 0 (Sanjay Patel, 2019-12-06, +9/-3)
  This is a follow-up to D70607 where we made any extract element on SLM
  more costly than default. But that is pessimistic for extract from
  element 0 because that corresponds to x86 movd/movq instructions. These
  generally have >1 cycle latency, but they are probably implemented as
  single uop instructions.
  Note that no vectorization tests are affected by this change. Also, no
  targets besides SLM are affected because those are falling through to the
  default cost of 1 anyway. But this will become visible/important if we
  add more specializations via cost tables.
  Differential Revision: https://reviews.llvm.org/D71023
* [X86] Remove ProcIntelGLM/ProcIntelGLP/ProcIntelTRM and replace them with a single feature flag that covers the two places they were used (Craig Topper, 2019-12-05, +2/-2)
  Differential Revision: https://reviews.llvm.org/D71048
* [x86] make SLM extract vector element more expensive than default (Sanjay Patel, 2019-11-27, +14/-0)
  I'm not sure what the effect of this change will be on all of the
  affected tests or a larger benchmark, but it fixes the horizontal add/sub
  problems noted here:
  https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc
  The costs are based on reciprocal throughput numbers in Agner's tables
  for PEXTR*; these appear to be very slow ops on Silvermont.
  This is a small step towards the larger motivation discussed in PR43605:
  https://bugs.llvm.org/show_bug.cgi?id=43605
  Also, it seems likely that insert/extract is the source of perf
  regressions on other CPUs (up to 30%) that were cited as part of the
  reason to revert D59710, so maybe we'll extend the table-based approach
  to other subtargets.
  Differential Revision: https://reviews.llvm.org/D70607
* [X86] Remove setOperationAction for FP_TO_SINT v8i16 (Craig Topper, 2019-11-12, +3/-0)
  This is no longer needed after widening legalization as we custom
  legalize v8i8 ourselves. Added entries to the cost model, but bumped the
  cost slightly to account for the truncate shuffle that wasn't costed
  before.
* [X86TargetTransformInfo] Fixed warning: Expression 'ISD == ISD::UREM' is always true. NFCI. (Dávid Bolvanský, 2019-11-06, +1/-1)
* [CostModel][X86] Improve add vXi64 + fadd vXf64 reduction tests for SLM (Simon Pilgrim, 2019-11-06, +26/-0)
  As noted on D59710, we weren't handling the high costs of these
  operations on SLM.
* [X86] Lower the cost of avx512 horizontal bool and/or reductions to 2*log2(bitwidth)+1 for legal types (Craig Topper, 2019-11-04, +21/-0)
  This better represents the kshift+binop we'd get for each stage before
  the final extract. It's likely we'll do even better by doing a kmov and a
  cmp with a GPR, but this is a good start.
  The default handling was costing a worst-case single source permute
  shuffle of the vector before the binop. This worst case assumes the
  shuffle might have to be emulated with extracts and inserts. But since we
  know we're doing a reduction, we can assume we'll get kshift lowering.
  There's still some room for improvement here, but this is much better
  than it was.
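  As a worked example of the formula (an illustrative helper, not the
  committed code): a v16i1 and/or reduction takes log2(16) = 4 halving
  stages, each costed as one kshift plus one binop, plus a final extract,
  so 2*4 + 1 = 9:

      #include <cmath>

      // 2*log2(NumElts) + 1: kshift+binop per stage, plus the extract.
      unsigned boolReductionCost(unsigned NumElts) {
        return 2 * static_cast<unsigned>(std::log2(NumElts)) + 1;
      }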
* [X86] Reland: Enable YMM memcmp with AVX1 (David Zarzycki, 2019-11-01, +2/-3)
  Update TargetTransformInfo to allow AVX1 to use YMM registers for memcmp.
  This is a follow-up to D68632, which enabled XOR compares and made this
  possible. This also updates the memcmp-optsize.ll test, unlike the first
  patch.
  https://reviews.llvm.org/D69658
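  To picture the expansion this enables (a hedged intrinsics sketch of the
  idea, not the actual SelectionDAG output): a 32-byte equality check can
  use one YMM bitwise XOR plus a VPTEST-style zero test, both available
  with AVX1:

      #include <immintrin.h>

      // Compare 32 bytes for equality using YMM registers.
      bool equal32(const void *A, const void *B) {
        __m256 VA = _mm256_loadu_ps(static_cast<const float *>(A));
        __m256 VB = _mm256_loadu_ps(static_cast<const float *>(B));
        __m256 X = _mm256_xor_ps(VA, VB); // bitwise: all-zero iff equal
        __m256i XI = _mm256_castps_si256(X);
        return _mm256_testz_si256(XI, XI); // VPTEST: ZF set if all zero
      }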
* Revert rG0e252ae19ff8d99a59d64442c38eeafa5825d441 : [X86] Enable YMM memcmp with AVX1 (Simon Pilgrim, 2019-10-31, +3/-2)
  Breaks build bots.
  Differential Revision: https://reviews.llvm.org/D69658
* [X86] Enable YMM memcmp with AVX1 (David Zarzycki, 2019-10-31, +2/-3)
  Update TargetTransformInfo to allow AVX1 to use YMM registers for memcmp.
  This is a follow-up to D68632, which enabled XOR compares and made this
  possible.
  https://reviews.llvm.org/D69658
* [X86] Only look up boolean reduction cost tables if the reduction is not pairwise (Craig Topper, 2019-10-26, +1/-1)
  We don't pattern match pairwise shuffles in SelectionDAG, so we should
  only return the optimized costs if it's not a pairwise shuffle. I think
  the SLP vectorizer gives priority to the non-pairwise shuffle if the cost
  is the same, and the lookup for reduction intrinsics passes false for the
  pairwise flag, so this probably has no real effect today.
  Reviewers: RKSimon
  Reviewed By: RKSimon
  Differential Revision: https://reviews.llvm.org/D69083
* [Alignment][NFC] getMemoryOpCost uses MaybeAlign (Guillaume Chatelet, 2019-10-25, +11/-10)
  This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790
  Reviewers: courbet
  Differential Revision: https://reviews.llvm.org/D69307
* [CostModel][X86] Add CTLZ scalar costs (Simon Pilgrim, 2019-10-14, +22/-1)
  Add specific scalar costs for CTLZ instructions; we can't discriminate
  between CTLZ and CTLZ_ZERO_UNDEF, so we have to assume the worst. Given
  how BSR is often a microcoded nightmare on some older targets, we might
  still be underestimating it.
  For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide
  overrides that assume 1cy costs.
  llvm-svn: 374786
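  The cost-table mechanism involved looks roughly like this (a sketch with
  illustrative entries and values, not the committed numbers):

      #include "llvm/CodeGen/CostTable.h"
      #include "llvm/CodeGen/ISDOpcodes.h"
      using namespace llvm;

      // With LZCNT, scalar ctlz is one cheap instruction; without it, BSR
      // needs extra fixup code, and since CTLZ_ZERO_UNDEF can't be told
      // apart here, the non-LZCNT table would assume the worst.
      static const CostTblEntry LZCNTCostTbl[] = {
          {ISD::CTLZ, MVT::i64, 1},
          {ISD::CTLZ, MVT::i32, 1},
      };

  A lookup helper such as CostTableLookup then matches the ISD opcode and
  type against tables like this one.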
* [CostModel][X86] Add CTPOP scalar costs (PR43656) (Simon Pilgrim, 2019-10-14, +23/-0)
  Add specific scalar costs for ctpop instructions; these are based on
  llvm-mca's SLM throughput numbers (the oldest model we have).
  For targets supporting POPCNT, we provide overrides that assume 1cy
  costs.
  llvm-svn: 374775
* [NFC][TTI] Add Alignment for isLegalMasked[Load/Store] (Sam Parker, 2019-10-14, +6/-5)
  Add an extra parameter so the backend can take the alignment into
  consideration.
  Differential Revision: https://reviews.llvm.org/D68400
  llvm-svn: 374763
* [CostModel][X86] Improve sum reduction costs (Simon Pilgrim, 2019-10-12, +23/-22)
  I can't see any notable differences in costs between SSE2 and SSE42
  arches for FADD/ADD reduction, so I've lowered the target to just SSE2.
  I've also added vXi8 sum reduction costs in line with the PSADBW codegen
  and discussions on PR42674.
  llvm-svn: 374655
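  The PSADBW idiom referred to above sums 16 bytes by taking absolute
  differences against zero, leaving one partial sum per 64-bit half to
  combine (a hedged intrinsics sketch of the pattern, not the cost code):

      #include <emmintrin.h>

      // Horizontal sum of 16 x u8 using a single PSADBW (SSE2).
      unsigned sumBytes(__m128i V) {
        __m128i Sad = _mm_sad_epu8(V, _mm_setzero_si128());
        return _mm_cvtsi128_si32(Sad) +
               _mm_cvtsi128_si32(_mm_srli_si128(Sad, 8));
      }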
* recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize (Zi Xuan Wu, 2019-10-12, +2/-1)
  In loop-vectorize, the interleave count and vectorization factor depend
  on the target register count. Currently, register pressure is not
  estimated separately for different register classes (in particular, for
  scalar types, float should not be treated the same as int), so the
  estimate is not accurate. Specifically, it causes too much
  interleaving/unrolling, resulting in too many register spills in the loop
  body and hurting performance.
  So we need to classify the register classes at the IR level. Importantly,
  these are abstract register classes, not the target register classes the
  backend provides in td files. They are used to establish the mapping
  between the types of IR values and the number of simultaneous live ranges
  we'd like to limit for some set of those types.
  For example, on the POWER target the register count is special when VSX
  is enabled: the number of int scalar registers is 32 (GPR) and float is
  64 (VSR), but int and float vector registers are both 64 (VSR). So there
  should be 2 kinds of register class when VSX is enabled, and 3 kinds when
  VSX is NOT enabled.
  On POWER this makes a big (+~30%) performance improvement in one specific
  benchmark (503.bwaves_r) of SPEC2017, with no other obvious regressions.
  Differential Revision: https://reviews.llvm.org/D67148
  llvm-svn: 374634
* Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure ↵Jinsong Ji2019-10-081-2/+1
| | | | | | | | | | | | | | separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a. This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091
* [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize (Zi Xuan Wu, 2019-10-08, +2/-1)
  In loop-vectorize, the interleave count and vectorization factor depend
  on the target register count. Currently, register pressure is not
  estimated separately for different register classes (in particular, for
  scalar types, float should not be treated the same as int), so the
  estimate is not accurate. Specifically, it causes too much
  interleaving/unrolling, resulting in too many register spills in the loop
  body and hurting performance.
  So we need to classify the register classes at the IR level. Importantly,
  these are abstract register classes, not the target register classes the
  backend provides in td files. They are used to establish the mapping
  between the types of IR values and the number of simultaneous live ranges
  we'd like to limit for some set of those types.
  For example, on the POWER target the register count is special when VSX
  is enabled: the number of int scalar registers is 32 (GPR) and float is
  64 (VSR), but int and float vector registers are both 64 (VSR). So there
  should be 2 kinds of register class when VSX is enabled, and 3 kinds when
  VSX is NOT enabled.
  On POWER this makes a big (+~30%) performance improvement in one specific
  benchmark (503.bwaves_r) of SPEC2017, with no other obvious regressions.
  Differential Revision: https://reviews.llvm.org/D67148
  llvm-svn: 374017
* [X86] Enable inline memcmp() to use AVX512 (David Zarzycki, 2019-10-04, +1/-2)
  llvm-svn: 373706
* [X86] Remove -x86-experimental-vector-widening-legalization command line flag (Craig Topper, 2019-09-29, +15/-71)
  This was added back to allow some performance regressions to be
  investigated. The main perf issue was fixed shortly after adding this
  back, and no other major issues have been reported. So I think it's safe
  to remove this again.
  llvm-svn: 373174
* [Alignment][NFC] Remove unneeded llvm:: scoping on Align types (Guillaume Chatelet, 2019-09-27, +2/-2)
  llvm-svn: 373081
* [CostModel][X86] Fix SLM <2 x i64> icmp costs (Simon Pilgrim, 2019-09-26, +9/-0)
  SLM is 2x slower for <2 x i64> comparison ops than for other vector
  types; we should account for this like we do for SLM <2 x i64>
  add/sub/mul costs.
  This should remove some of the SLM codegen diffs in D43582.
  llvm-svn: 372954
* [Cost][X86] Add more missing vector truncation costs (Simon Pilgrim, 2019-09-22, +6/-0)
  The AVX512 cases still need some work to correctly recognise the PMOV
  truncation cases.
  llvm-svn: 372514
* [Cost][X86] Add v2i64 truncation costs (Simon Pilgrim, 2019-09-22, +4/-0)
  We are missing costs for a lot of truncation cases; I'm hoping to address
  all the 'zero cost' cases in trunc.ll. I thought this was a vector
  widening side effect, but even before this we had some interesting LV
  decisions (notably over indvars) being made due to these zero costs.
  llvm-svn: 372498
* [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align (Guillaume Chatelet, 2019-09-05, +2/-2)
  This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790
  Reviewers: courbet
  Differential Revision: https://reviews.llvm.org/D67223
  llvm-svn: 371063
* [X86] Simplify the setOperationAction handling for fp_to_uint by improving the Custom handler a bit (Craig Topper, 2019-09-03, +4/-0)
  This merges the 32-bit and 64-bit mode code to just use Custom for both
  i32 and i64. We already had most of the handling in the custom handling
  due to AVX512 having legal fp_to_uint; we just needed to add the i32->i64
  promotion handling. Refactor the fp_to_uint code in the custom handler to
  reduce the number of times we check things.
  Tweak cost model tables to match the default handling we were getting due
  to Expand before.
  llvm-svn: 370700
* [X86] Lower the cost of v2i32->v2f64 sint_to_fp under vector widening legalization (Craig Topper, 2019-08-22, +18/-0)
  I don't really understand the costs we're using for fp_to_sint, but prior
  to widening legalization we used 20 as the cost for this via the
  v2i64->v2f64 entry. That number seems better than the 40 we got with
  widening legalization. So now we need either a v2i32->v2f64 entry or a
  v4i32->v2f64 entry, depending on whether AVX is enabled or not, since we
  skip the first SSE2 table lookup under AVX.
  llvm-svn: 369628
* [X86] Add back the -x86-experimental-vector-widening-legalization command line flag and all associated code, but leave it enabled by default (Craig Topper, 2019-08-20, +52/-13)
  Google is reporting performance issues with the new default behavior and
  has asked for a way to switch back to the old behavior while we
  investigate and make fixes.
  I've restored all of the code that had since been removed and added
  additional checks of the command flag onto code paths that are not
  otherwise guarded by a check of getTypeAction. I've also modified the
  cost model tables to hopefully get us back to the previous costs.
  Hopefully we won't need to support this for very long, since we have no
  test coverage of the old behavior and could very easily break it.
  llvm-svn: 369332
* [X86] Improve cost model for subvector extraction of less than 128-bit vectors (Craig Topper, 2019-08-15, +33/-0)
  Now that we're using widening legalization, we need to improve our
  extract_subvector cost model for these types. This patch begins by
  modeling these as a subvector extract followed by a permute. I've left
  FIXMEs in the code for future improvements.
  Differential Revision: https://reviews.llvm.org/D65892
  llvm-svn: 369022
* [X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than 128-bit inputs (Craig Topper, 2019-08-14, +12/-10)
  Now that we legalize by widening, the element types here won't change.
  Previously these were modeled as the elements being widened, and then the
  instruction might become an AND or SHL/ASHR pair. But now they'll become
  something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG.
  For AVX2, when the destination type is legal it's clear the cost should
  be 1, since we have extend instructions that can produce 256-bit vectors
  from less than 128-bit vectors. I'm a little less sure about AVX1 costs,
  but I think the ones I changed were definitely too high, though they
  might still be too high.
  Differential Revision: https://reviews.llvm.org/D66169
  llvm-svn: 368858
* Recommit r367901 "[X86] Enable ↵Craig Topper2019-08-071-9/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | -x86-experimental-vector-widening-legalization by default." The assert that caused this to be reverted should be fixed now. Original commit message: This patch changes our defualt legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the elements widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 368183
* Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."Mitch Phillips2019-08-061-45/+9
| | | | | | | | | This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810. This commit broke the MSan buildbots. See https://reviews.llvm.org/rL367901 for more information. llvm-svn: 368107
* [X86] Enable -x86-experimental-vector-widening-legalization by default (Craig Topper, 2019-08-05, +45/-9)
  This patch changes our default legalization behavior for 16-, 32-, and
  64-bit vectors with i8/i16/i32/i64 scalar types from promotion to
  widening. For example, v8i8 will now be widened to v16i8 instead of
  promoted to v8i16. This keeps the element widths the same and pads with
  undef elements. We believe this is a better legalization strategy, but it
  carries some issues due to the fragmented vector ISA. For example, i8
  shifts and multiplies get widened and then later have to be
  promoted/split into vXi16 vectors.
  This has the potential to cause regressions, so we wanted to get it in
  early in the 10.0 cycle so we have plenty of time to address them.
  Next steps will be to merge tests that explicitly test the command line
  option, and then we can remove the option and its associated code.
  llvm-svn: 367901
* [ExpandMemCmp] Honor prefer-vector-width. (Clement Courbet, 2019-06-26, +3/-2)
  Reviewers: gchatelet, echristo, spatel, atdt
  Differential Revision: https://reviews.llvm.org/D63769
  llvm-svn: 364384
* [ExpandMemCmp] Move all options to TargetTransformInfo (Clement Courbet, 2019-06-25, +16/-25)
  Split off from D60318.
  llvm-svn: 364281
* [LV] Suppress vectorization in some nontemporal cases (Warren Ristow, 2019-06-17, +35/-0)
  When considering a loop containing nontemporal stores or loads for
  vectorization, suppress the vectorization if the corresponding vectorized
  store or load with the alignment of the original scalar memory op is not
  supported with the nontemporal hint on the target.
  This adds two new functions:
      bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
      bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;
  to TTI, leaving the target-independent default implementation as
  returning true, but with overriding implementations for X86 that check
  the legality based on available Subtarget features.
  This fixes https://llvm.org/PR40759
  Differential Revision: https://reviews.llvm.org/D61764
  llvm-svn: 363581
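  A minimal standalone sketch of the X86-side idea (the feature tests and
  the natural-alignment rule here are assumptions: nontemporal vector
  stores date from SSE/SSE2, while the MOVNTDQA load needs SSE4.1):

      // Hypothetical simplified versions of the overrides described above.
      bool isLegalNTStoreSketch(unsigned VecBytes, unsigned Alignment,
                                bool HasSSE2) {
        return HasSSE2 && Alignment >= VecBytes;
      }
      bool isLegalNTLoadSketch(unsigned VecBytes, unsigned Alignment,
                               bool HasSSE41) {
        return HasSSE41 && Alignment >= VecBytes;
      }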
* [CostModel][X86] Improve masked load/store AVX1/AVX2 costs (Simon Pilgrim, 2019-06-02, +2/-2)
  A mixture of internal tests and review of the scheduler models indicates
  we're overestimating the cost of a masked load, which we're estimating at
  4x regular memory ops - more realistic values indicate that it's closer
  to 2x. Masked store costs are a lot more diverse, but 8x is roughly in
  the middle of the range.
  e.g. SandyBridge:
      defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
      defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
      defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
      defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
  e.g. Btver2:
      defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
      defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
      defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
      defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;
  Differential Revision: https://reviews.llvm.org/D61257
  llvm-svn: 362338
* [TTI][X86] Cleanup getMaskedMemoryOpCost. NFCI. (Simon Pilgrim, 2019-06-02, +11/-8)
  Prep work before resurrecting D61257.
  llvm-svn: 362335
* [CostModel][X86] Add min/max reduction costs for all SSE targets (Simon Pilgrim, 2019-05-11, +90/-6)
  The original costs stopped at SSE42; I've added conservative estimates
  for everything down to SSE1/SSE2 and moved some of the SSE42 costs to
  SSE41 (really only the addition of PCMPGT makes any difference).
  I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for
  scarily quick results) and 256-bit vector costs for AVX1.
  llvm-svn: 360528
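  The PHMINPOSUW trick mentioned above reduces 8 x u16 to its minimum in a
  single SSE4.1 instruction (a hedged intrinsics sketch; the umax and i8
  variants need extra complement/widen steps not shown here):

      #include <smmintrin.h>

      // Horizontal unsigned-min of 8 x u16; the result lands in lane 0.
      unsigned short minU16(__m128i V) {
        return static_cast<unsigned short>(
            _mm_cvtsi128_si32(_mm_minpos_epu16(V)));
      }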
* [TTI][X86] Make getAddressComputationCost cost value const. NFCI. (Simon Pilgrim, 2019-05-05, +1/-1)
  llvm-svn: 359999
* [CostModel][X86] Add bool anyof/allof reduction costs (Simon Pilgrim, 2019-04-17, +42/-0)
  On pre-AVX512 targets we can use MOVMSK to extract reduced boolean
  results. This is properly optimized; annoyingly, AVX512 isn't, and it
  produces code that is almost as bad as the (unchanged) costs suggest.
  Differential Revision: https://reviews.llvm.org/D60403
  llvm-svn: 358574
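  The MOVMSK pattern in question (a hedged intrinsics sketch, not the cost
  model code): after a vector compare leaves each lane all-ones or
  all-zero, one MOVMSK plus a scalar test yields the anyof/allof result:

      #include <emmintrin.h>

      bool anyOf(__m128i CmpResult) {  // is any lane all-ones?
        return _mm_movemask_epi8(CmpResult) != 0;
      }
      bool allOf(__m128i CmpResult) {  // is every lane all-ones?
        return _mm_movemask_epi8(CmpResult) == 0xFFFF;
      }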
* [CostModel][X86] Masked load legalization requires a binary shuffle, not a select (PR39812) (Simon Pilgrim, 2019-04-07, +2/-2)
  Expansion/truncation is better described by SK_PermuteTwoSrc than
  SK_Select.
  llvm-svn: 357864
* [X86MacroFusion] Handle branch fusion (AMD CPUs) (Clement Courbet, 2019-03-28, +1/-1)
  This adds a BranchFusion feature to replace the usage of MacroFusion for
  AMD CPUs. See D59688 for context.
  Reviewers: andreadb, lebedev.ri
  Differential Revision: https://reviews.llvm.org/D59872
  llvm-svn: 357171
* [ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics (Craig Topper, 2019-03-21, +28/-0)
  This adds support for scalarizing these intrinsics, as well as the
  X86TargetTransformInfo support to avoid scalarizing them in the cases X86
  can handle.
  I've omitted handling special cases for constant masks for this first
  pass, though CodeGenPrepare can constant fold the branch conditions and
  remove some of the control flow anyway.
  Fixes PR40994 and covers most of PR3666. Might want to implement constant
  masks to close that.
  Differential Revision: https://reviews.llvm.org/D59180
  llvm-svn: 356687
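  The semantics being scalarized can be pictured like this (a plain-C++
  model of expandload with invented names, not the pass's actual IR
  output): consecutive source values are consumed only for set mask bits.

      // Out[i] gets the next value from the compressed Src stream when
      // Mask[i] is set, otherwise the passthru value.
      void expandLoadModel(const int *Src, const bool *Mask,
                           const int *Passthru, int *Out, unsigned N) {
        unsigned P = 0; // advances only past loaded elements
        for (unsigned I = 0; I != N; ++I)
          Out[I] = Mask[I] ? Src[P++] : Passthru[I];
      }

  compressstore is the mirror image: it consumes every masked-in lane of
  the source vector and writes them contiguously to memory.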