bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[CostModel][X86] Fix SSE1 FADD/FSUB costs	Simon Pilgrim	2019-01-04	1	-2/+2
	Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4. Add the other costs so that we're not relying on the default "is legal/custom" cost logic. llvm-svn: 350403
*	[CostModel][X86] Add SSE1 fp cost tests	Simon Pilgrim	2019-01-04	1	-40/+184
	llvm-svn: 350401
*	[CostModel][X86] Add truncate cost tests to cover all legal destination types	Simon Pilgrim	2019-01-03	1	-9/+137
	We were only testing costs for legal source vector element counts. llvm-svn: 350323
*	[X86] Add ADD/SUB SSAT/USAT vector costs (PR40123)	Simon Pilgrim	2019-01-03	2	-144/+524
	Costs for real SSE2 instructions. llvm-svn: 350295
*	[X86] Add ADD/SUB SSAT/USAT cost tests (PR40123)	Simon Pilgrim	2019-01-03	2	-0/+510
	llvm-svn: 350293
*	[CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction	Craig Topper	2018-12-13	1	-51/+31
	This is split from D55452 with the correct patch this time. Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle. Differential Revision: https://reviews.llvm.org/D55615 llvm-svn: 349072
*	[CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max reduction.	Craig Topper	2018-12-10	9	-668/+668
	Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above? Aren't we just extracting the single element after taking the min/max in the vector register? Reviewers: RKSimon, spatel, ABataev Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D55480 llvm-svn: 348739
*	[CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost	Craig Topper	2018-12-07	19	-1021/+965
	We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width. For example, for 8i32 on an SSE target, we were calculating the cost of an 8i32 op, which is likely 2 for basic integer ops. Then after the loop we count 2 more v4i32 ops, for a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops. There are still more bugs in this code that I'm going to work on next. The non-pairwise code shouldn't count extract subvectors in the loop. There are no extracts; the types are split in registers. For pairwise we need to use 2 two-source permute shuffles. Differential Revision: https://reviews.llvm.org/D55397 llvm-svn: 348621
*	[X86] Remove -costmodel-reduxcost=true from the experimental vector reduction intrinsic tests as it appears to be unnecessary. NFC	Craig Topper	2018-12-05	18	-144/+144
	I think this has something to do with matching reductions from extractelement, binops, and shuffles. But we're not matching here. llvm-svn: 348340
*	[X86] Add more cost model tests for vector reductions with narrow vector types. NFC	Craig Topper	2018-12-05	18	-0/+531
	llvm-svn: 348339
*	[SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.	Jonas Paulsson	2018-12-03	1	-0/+25
	A loaded value with multiple users compared with 0 will become a load and test single instruction. The load is not folded in this case (multiple users), but the compare instruction is eliminated. This patch returns 0 cost for the icmp in these cases. Review: Ulrich Weigand https://reviews.llvm.org/D55111 llvm-svn: 348141
*	[TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)	Simon Pilgrim	2018-12-01	20	-2050/+2050
	We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731. Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997. Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 348076
*	[X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in interleave/strided load/store cost model tests.	Craig Topper	2018-12-01	6	-6/+6
	llvm-svn: 348056
*	[SystemZ::TTI] i8/i16 operands extension costs revisited	Jonas Paulsson	2018-11-30	4	-44/+89
	Three minor changes to these extra costs: (1) For ICmp instructions, instead of always adding 2 for extending each operand, this is only done if that operand is neither a load nor an immediate. (2) The operand extension costs for divides are removed, because we already use a high cost for the divide (20). (3) The extra costs for lshr/ashr are removed, as they did not seem useful. Review: Ulrich Weigand https://reviews.llvm.org/D55053 llvm-svn: 347961
*	[X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal.	Craig Topper	2018-11-28	1	-32/+15
	Unlike most cost model functions, this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal. This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization, which would mean 512-bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if preferred, I can switch to just using is512BitVector and the subtarget feature. Differential Revision: https://reviews.llvm.org/D54984 llvm-svn: 347786
*	[X86] Add some cost model entries for sext/zext for avx512bw	Craig Topper	2018-11-28	2	-22/+22
	This fixes some of the scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported, just the worst. I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types. Differential Revision: https://reviews.llvm.org/D54979 llvm-svn: 347785
*	[X86] Add a combine for back to back VSRAI instructions	Craig Topper	2018-11-28	1	-3/+3
	Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI. Differential Revision: https://reviews.llvm.org/D54959 llvm-svn: 347784
*	[SystemZ::TTI] Improve cost for compare of i64 with extended i32 load	Jonas Paulsson	2018-11-28	1	-1/+17
	CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value in memory. This patch makes such a load considered foldable and so gets a 0 cost. Review: Ulrich Weigand https://reviews.llvm.org/D54944 llvm-svn: 347735
*	[SystemZ::TTI] Improve costs for i16 add, sub and mul against memory.	Jonas Paulsson	2018-11-28	1	-0/+93
	AH, SH and MH costs are already covered in the cases where LHS is 32 bits and RHS is 16 bits of memory sign-extended to i32. As these instructions are also used when LHS is i16, this patch recognizes that the loads will get folded then as well. Review: Ulrich Weigand https://reviews.llvm.org/D54940 llvm-svn: 347734
*	[SystemZ::TTI] Improved cost values for comparison against memory.	Jonas Paulsson	2018-11-28	1	-0/+27
	Single instructions exist for i8 and i16 comparisons of memory against a small immediate. This patch makes sure that if the load in these cases has a single user (the ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1. Review: Ulrich Weigand https://reviews.llvm.org/D54897 llvm-svn: 347733
*	[SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap.	Jonas Paulsson	2018-11-28	1	-0/+67
	Since byte-swapping loads and stores are supported, a 'load -> bswap' or 'bswap -> store' sequence should have the cost of one. Review: Ulrich Weigand https://reviews.llvm.org/D54870 llvm-svn: 347732
*	[X86] Add test cases to show that we don't properly take -mprefer-vector-width=256 and -min-legal-vector-width=256 into account when costing sext/zext.	Craig Topper	2018-11-28	1	-0/+144
	The check lines marked AVX256 in the zext256/sext256 functions should be closer to the AVX values which would take into account a splitting cost. llvm-svn: 347722
*	[X86] Add exhaustive cost model testing for sext/zext for all vector types we reasonably support. Add cost model tests for truncating to vXi1.	Craig Topper	2018-11-27	2	-0/+958
	Our sext/zext cost modeling was somewhat incomplete and had no coverage for the fact that avx512bw v32i16/v64i8 types return a scalarization cost. Truncates are a whole different mess because isTruncateFree is returning true for vectors when it shouldn't, and that's the fallback for anything not in the tables. llvm-svn: 347719
*	[X86] Add cost model tests for experimental.vector.reduce.* with -x86-experimental-vector-widening-legalization	Craig Topper	2018-11-27	9	-0/+2852
	llvm-svn: 347697
*	[X86] Add cost model test for masked load and store with -x86-experimental-vector-widening-legalization	Craig Topper	2018-11-27	1	-0/+606
	llvm-svn: 347696
*	[X86] Add cost model tests for fp_to_int/int_to_fp with -x86-experimental-vector-widening-legalization	Craig Topper	2018-11-27	5	-0/+1765
	llvm-svn: 347695
*	[X86] Add cost model tests for shifts with -x86-experimental-vector-widening-legalization.	Craig Topper	2018-11-27	3	-0/+1589
	llvm-svn: 347694
*	Revert "[TTI] Reduction costs only need to include a single extract element cost"	Fedor Sergeev	2018-11-26	11	-1049/+1049
	This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body. llvm-svn: 347541
*	[SystemZTTIImpl] Give correct cost values for vector bswap intrinsics.	Jonas Paulsson	2018-11-22	1	-0/+57
	Implement getIntrinsicInstrCost() and return costs reflecting that bswap can be done with a vperm per vector register. Review: Ulrich Weigand https://reviews.llvm.org/D54789 llvm-svn: 347445
*	[TTI] Reduction costs only need to include a single extract element cost	Simon Pilgrim	2018-11-15	11	-1049/+1049
	We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731. Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 346970
*	[TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue	Simon Pilgrim	2018-11-14	2	-340/+268
	llvm-svn: 346868
*	[CostModel] Add generic expansion funnel shift cost support	Simon Pilgrim	2018-11-14	2	-1274/+2930
	Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs. llvm-svn: 346854
*	[CostModel][X86] Fix constant vector XOP right shifts	Simon Pilgrim	2018-11-13	2	-87/+71
	We'll constant fold these cases so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs. llvm-svn: 346760
*	[CostModel][X86] Add more cost tests for funnel shifts	Simon Pilgrim	2018-11-13	2	-8/+2853
	Added full uniform/constant coverage for funnel shifts + rotates. llvm-svn: 346754
*	[CostModel][X86] Add funnel shift rotation special case costs	Simon Pilgrim	2018-11-12	2	-84/+140
	When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same. llvm-svn: 346688
*	[CostModel][X86] Add SHLD/SHRD scalar funnel shift costs	Simon Pilgrim	2018-11-12	2	-320/+320
	The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level. llvm-svn: 346683
*	[CostModel][X86] Add some initial cost tests for funnel shifts	Simon Pilgrim	2018-11-12	2	-0/+780
	Still need to add full uniform/constant coverage but this is enough to check basic fshl/fshr cost handling. llvm-svn: 346670
*	[CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector	Simon Pilgrim	2018-11-12	1	-22/+22
	llvm-svn: 346664
*	[SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions	Jonas Paulsson	2018-11-12	1	-136/+136
	Improve getCastInstrCost() by respecting the different types of Src and Dst for vector integer <-> fp conversions. This means that extracting from integer becomes more expensive (by the extraction penalty), and the extraction from fp becomes cheaper (no longer has a false extraction penalty). Review: Ulrich Weigand https://reviews.llvm.org/D54423 llvm-svn: 346663
*	[CostModel] Add more realistic SK_InsertSubvector generic costs.	Simon Pilgrim	2018-11-12	1	-8/+8
	Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. llvm-svn: 346662
*	[CostModel] Add more realistic SK_ExtractSubvector generic costs.	Simon Pilgrim	2018-11-12	1	-28/+28
	Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type. llvm-svn: 346656
*	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector	Simon Pilgrim	2018-11-09	11	-515/+563
	llvm-svn: 346538
*	[CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368)	Simon Pilgrim	2018-11-09	1	-36/+36
	Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks. llvm-svn: 346510
*	[SystemZ::TTI] Improve cost handling of uint/sint to fp conversions.	Jonas Paulsson	2018-11-02	1	-0/+46
	Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load. Since the load already does the extension, there is no extra cost (previously returned 2). Review: Ulrich Weigand https://reviews.llvm.org/D54028 llvm-svn: 346009
*	[SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion	Jonas Paulsson	2018-11-01	1	-0/+23
	Scalar i1 to fp conversions are done with a branch sequence, so they should have a higher cost. Review: Ulrich Weigand https://reviews.llvm.org/D53924 llvm-svn: 345818
*	[SystemZ::TTI] Accurate costs for i1->double vector conversions	Jonas Paulsson	2018-11-01	1	-0/+43
	This factors out a new method getBoolVecToIntConversionCost() containing the code for vector sext/zext of i1, in order to reuse it for i1 to double vector conversions. Review: Ulrich Weigand https://reviews.llvm.org/D53923 llvm-svn: 345817
*	[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)	Simon Pilgrim	2018-10-30	11	-478/+785
	Correct costing of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the rest of the shuffle kinds, this means that the main Ty argument represents the source vector type, not the destination! I've done my best to fix a number of vectorizer uses: SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment. LV - I'm not clear on what the SK_ExtractSubvector should represent for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta. Differential Revision: https://reviews.llvm.org/D53573 llvm-svn: 345617
*	[SystemZ] Improve isFoldableLoad() for Sub, SDiv and UDiv.	Jonas Paulsson	2018-10-30	1	-36/+179
	Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a load. This patch adds a check for this. Review: Ulrich Weigand https://reviews.llvm.org/D53791 llvm-svn: 345596
*	[X86] Add -LABEL to some FileCheck checks. NFC	Craig Topper	2018-10-26	3	-240/+240
	llvm-svn: 345407
*	[SystemZ] Improve getMemoryOpCost() to find foldable loads that are converted.	Jonas Paulsson	2018-10-25	1	-1/+257
	The SystemZ backend can do arithmetic of memory by loading and then extending one of the operands. Similarly, a load + truncate can be folded into an operand. This patch improves the SystemZ TTI cost function to recognize this. Review: Ulrich Weigand https://reviews.llvm.org/D52692 llvm-svn: 345327
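
The overcounting described in the 2018-12-07 entry (llvm-svn: 348621) can be checked with a little arithmetic. The following is only a toy sketch in Python, not LLVM's code: `LEGAL_ELTS`, `op_cost` and `reduction_arith_cost` are hypothetical names, the sketch assumes a 128-bit SSE register holding four i32 elements, and it counts arithmetic ops only (no shuffles or extracts).

```python
import math

# Assumption: a 128-bit SSE register holds four i32 elements.
LEGAL_ELTS = 4


def op_cost(num_elts):
    """Toy cost of one arithmetic op: one unit per legal register touched."""
    return max(1, math.ceil(num_elts / LEGAL_ELTS))


def reduction_arith_cost(num_elts, split_first):
    """Count arithmetic ops for a reduction over a power-of-two vector.

    split_first=False models the overcounting: each illegal level is charged
    at the full vector width before halving.
    split_first=True models the fix: the vector is halved first, so the op
    is charged at half the width.
    """
    cost = 0
    n = num_elts
    while n > LEGAL_ELTS:
        if split_first:
            n //= 2
            cost += op_cost(n)
        else:
            cost += op_cost(n)
            n //= 2
    # Remaining levels run inside a single legal register: log2(n) ops.
    cost += (n.bit_length() - 1) * op_cost(n)
    return cost


print(reduction_arith_cost(8, split_first=False))  # 4, the overcounted total
print(reduction_arith_cost(8, split_first=True))   # 3, matching the assembly
```

For 8i32 the old ordering charges 2 for the 8i32 level plus 2 v4i32 ops, while halving first charges 1 plus 2, which is the 4 versus 3 quoted in the commit message.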