bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[TTI] Add generic UADDO/USUBO costs	Simon Pilgrim	2019-01-24	1	-36/+378
\| \| \| \| \| \| \| \|	Added x86 scalar uadd_with_overflow/usub_with_overflow costs. Differential Revision: https://reviews.llvm.org/D56907 llvm-svn: 352043
*	[IR] Match intrinsic parameter by scalar/vectorwidth	Simon Pilgrim	2019-01-23	1	-0/+414
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth. The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth. I've updated the _overflow intrinsics to demonstrate this - allowing them to return an i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type. The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1>, so there is no change in behaviour. Differential Revision: https://reviews.llvm.org/D57090 llvm-svn: 351957
*	[CostModel][X86] Add ICMP Predicate specific costs	Simon Pilgrim	2019-01-22	1	-1036/+1036
\| \| \| \| \| \| \| \|	First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly. Differential Revision: https://reviews.llvm.org/D57013 llvm-svn: 351810
*	[CostModel][X86] Add XOP icmp cost tests (PR40376)	Simon Pilgrim	2019-01-21	1	-0/+462
\| \| \| \|	llvm-svn: 351741
*	[CostModel][X86] Add explicit vector select costs	Simon Pilgrim	2019-01-20	11	-689/+866
\| \| \| \| \| \| \| \| \| \|	Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)\|(Y & ~C)) bit select. Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason). The increased pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests. llvm-svn: 351685
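The bit-select pattern named in this message can be sketched as follows. This is a minimal Python model, not LLVM code: each vector lane is treated as an unsigned integer, and the names `bit_select`, `vector_select` and the 32-bit lane width are assumptions made for the illustration.

```python
LANE_BITS = 32                      # assumed lane width for this sketch
LANE_MASK = (1 << LANE_BITS) - 1    # all-ones lane, i.e. a "true" mask lane

def bit_select(x, y, c):
    """Pick bits of x where c is set and bits of y elsewhere: (x & c) | (y & ~c)."""
    return ((x & c) | (y & ~c)) & LANE_MASK

def vector_select(xs, ys, cs):
    """Apply the bit select lane by lane; each mask lane is all-ones or all-zeros."""
    return [bit_select(x, y, c) for x, y, c in zip(xs, ys, cs)]
```

The pattern needs an AND, an AND-NOT and an OR per select, which is consistent with the message's point that selects are costlier before SSE41.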
*	[CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets	Simon Pilgrim	2019-01-20	1	-512/+512
\| \| \| \| \| \|	Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era llvm-svn: 351684
*	[CostModel][X86] Split icmp/fcmp costs tests and test all comparison codes	Simon Pilgrim	2019-01-20	3	-330/+4529
\| \| \| \|	llvm-svn: 351682
*	[CostModel][X86] Add masked load/store/gather/scatter tests for ↵	Simon Pilgrim	2019-01-20	2	-458/+660
\| \| \| \| \| \|	SSE2/SSE42/AVX1 targets llvm-svn: 351681
*	[CostModel][X86] Add non-constant vselect cost tests	Simon Pilgrim	2019-01-20	1	-1/+121
\| \| \| \| \| \|	Also add AVX512 costs at the same time llvm-svn: 351680
*	[AMDGPU] Add some missing always-uniform values.	Neil Henning	2019-01-18	1	-1/+19
\| \| \| \| \| \| \| \| \|	This commit adds some missing intrinsics into the isAlwaysUniform list for the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D56845 llvm-svn: 351562
*	Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"	Nikita Popov	2019-01-15	1	-51/+70
\| \| \| \| \| \| \| \| \| \| \| \| \|	Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Reapplying with updated SLPVectorizer tests. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351219
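The UMAX+SUB expansion rests on the identity usubsat(a, b) = umax(a, b) - b. A small Python sketch of that identity, with hypothetical function names and plain non-negative integers standing in for unsigned lanes:

```python
def usubsat_reference(a, b):
    """Unsigned saturating subtract: clamp at zero instead of wrapping."""
    return a - b if a >= b else 0

def usubsat_via_umax(a, b):
    """Same result via UMAX followed by SUB; max(a, b) >= b, so no underflow."""
    return max(a, b) - b
```

When a >= b the maximum is a and the result is a - b; otherwise the maximum is b and the result is 0.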
*	Remove irrelevant references to legacy git repositories from	James Y Knight	2019-01-15	1	-1/+1
\| \| \| \| \| \| \| \| \|	compiler identification lines in test-cases. (Doing so only because it's then easier to search for references which are actually important and need fixing.) llvm-svn: 351200
*	Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"	Nikita Popov	2019-01-14	1	-70/+51
\| \| \| \| \| \| \| \| \|	This reverts commit r351125. I missed test changes in an SLPVectorizer test, due to the cost model changes. Reverting for now. llvm-svn: 351129
*	[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors	Nikita Popov	2019-01-14	1	-51/+70
\| \| \| \| \| \| \| \| \| \| \|	Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351125
*	[ConstantFolding] Fold undef for integer intrinsics	Nikita Popov	2019-01-11	3	-104/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes https://bugs.llvm.org/show_bug.cgi?id=40110. This implements handling of undef operands for integer intrinsics in ConstantFolding, in particular for the bitcounting intrinsics (ctpop, cttz, ctlz), the with.overflow intrinsics, the saturating math intrinsics and the funnel shift intrinsics. The undef behavior follows what InstSimplify does for the general case of non-constant operands. For the bitcount intrinsics (where InstSimplify doesn't do undef handling -- there cannot be a combination of an undef + non-constant operand) I'm using a 0 result if the intrinsic is defined for zero and undef otherwise. Differential Revision: https://reviews.llvm.org/D55950 llvm-svn: 350971
*	[DA][NewPM] Add a printerpass and port the testsuite	Philip Pfaffe	2019-01-08	25	-0/+48
\| \| \| \| \| \| \| \| \|	The new-pm version of DA is untested. Testing requires a printer, so add that and use it in the existing DA tests. Differential Revision: https://reviews.llvm.org/D56386 llvm-svn: 350624
*	[ValueTracking] Adjust comment in test	Michael Ferguson	2019-01-07	1	-1/+2
\| \| \| \| \| \|	Adjusts a comment in this test to verify commit access. llvm-svn: 350569
*	[CostModel][X86] Fix SSE1 FADD/FSUB costs	Simon Pilgrim	2019-01-04	1	-2/+2
\| \| \| \| \| \| \| \|	Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4. Add the other costs so that we're not relying on the default "is legal/custom" cost logic. llvm-svn: 350403
*	Revert patches 348835 and 348571 because they're	Ranjeet Singh	2019-01-04	1	-27/+0
\| \| \| \| \| \|	causing code size performance regressions. llvm-svn: 350402
*	[CostModel][X86] Add SSE1 fp cost tests	Simon Pilgrim	2019-01-04	1	-40/+184
\| \| \| \|	llvm-svn: 350401
*	[ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset	Florian Hahn	2019-01-04	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GetPointerBaseWithConstantOffset includes this code, where ByteOffset and GEPOffset are both of type llvm::APInt : ByteOffset += GEPOffset.getSExtValue(); The problem with this line is that getSExtValue() returns an int64_t, but the += matches an overload for uint64_t. As a result, the resulting APInt is no longer considered to be signed. That in turn causes assertion failures later on if the relevant pointer type is > 64 bits in width and the GEPOffset was negative. Changing it to ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth()); resolves the issue and explicitly performs the sign-extending or truncation. Additionally, instead of asserting later if the result is > 64 bits, it breaks out of the loop in that case. See also https://reviews.llvm.org/D24729 https://reviews.llvm.org/D24772 This commit must be merged after D38662 in order for the test to pass. Patch by Michael Ferguson <mpfergu@gmail.com>. Reviewers: reames, sanjoy, hfinkel Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D38501 llvm-svn: 350395
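The effect described in this message can be modelled in a few lines of Python. This is a sketch under assumptions, not the LLVM source: APInt values are modelled as Python integers wrapped to a chosen bit width, and all function names are hypothetical. The buggy path is assumed to reinterpret the negative 64-bit offset as an unsigned 64-bit value before widening it.

```python
def to_unsigned(value, bits):
    """Two's-complement bit pattern of value in the given width."""
    return value & ((1 << bits) - 1)

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def add_via_uint64(byte_offset, gep_offset, bits):
    """Model of the old line: the offset is zero-extended from 64 bits."""
    return to_signed(byte_offset + to_unsigned(gep_offset, 64), bits)

def add_sign_extended(byte_offset, gep_offset, bits):
    """Model of the fix: the offset is sign-extended to the full width first."""
    return to_signed(byte_offset + to_unsigned(gep_offset, bits), bits)
```

With 64-bit pointers both models agree, which matches the message's remark that the failure only appears for pointer types wider than 64 bits with a negative offset.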
*	[CostModel][X86] Add truncate cost tests to cover all legal destination types	Simon Pilgrim	2019-01-03	1	-9/+137
\| \| \| \| \| \|	We were only testing costs for legal source vector element counts llvm-svn: 350323
*	[X86] Add ADD/SUB SSAT/USAT vector costs (PR40123)	Simon Pilgrim	2019-01-03	2	-144/+524
\| \| \| \| \| \|	Costs for real SSE2 instructions llvm-svn: 350295
*	[X86] Add ADD/SUB SSAT/USAT cost tests (PR40123)	Simon Pilgrim	2019-01-03	2	-0/+510
\| \| \| \|	llvm-svn: 350293
*	[BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)	Hal Finkel	2019-01-02	3	-0/+104
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Motivated by the discussion in D38499, this patch updates BasicAA to support arbitrary pointer sizes by switching most remaining non-APInt calculations to use APInt. The size of these APInts is set to the maximum pointer size (maximum over all address spaces described by the data layout string). Most of this translation is straightforward, but this patch contains a fix for a bug that revealed itself during this translation process. In order for test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit pointers, the intermediate calculations must be performed using 64-bit integers. This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1*V+C2, and we then multiply this by Scale, and distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even though C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value. Thus, when initially applying the APInt conversion, because the maximum pointer size in this test is 32 bits, it started failing. Suspicious, I created a 64-bit version of this test (included here), and that failed (miscompiled) on trunk for a similar reason (the multiplication can overflow). After fixing this overflow bug, the first test case (at least) in Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was relying on having 64-bit intermediate values to have BasicAA return an accurate result. In order to fix this problem, and because I believe that it is not uncommon to use i64 indexing expressions in 32-bit code (especially portable code using int64_t), it seems reasonable to always use at least 64-bit integers.
In this way, we won't regress our analysis capabilities (and there's a command-line option added, so experimenting with this should be easy). As pointed out by Eli during the review, there are other potential overflow conditions that this patch does not address. Fixing those is left to follow-up work. Patch by me with contributions from Michael Ferguson (mferguson@cray.com). Differential Revision: https://reviews.llvm.org/D38662 llvm-svn: 350220
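The overflow of the distributed constant term can be illustrated with a small Python sketch. The values and helper names below are made up for the example; only the shape of the problem (a constant times Scale wrapping at 32 bits but not at 64 bits) comes from the message above.

```python
def wrap_signed(value, bits):
    """Wrap value to a signed two's-complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def scaled_constant(c2, scale, bits):
    """The C2*Scale term after distribution, computed at a fixed bit width."""
    return wrap_signed(c2 * scale, bits)

# Hypothetical values: C2 fits comfortably in 32 bits, C2*Scale does not.
C2, SCALE = 1 << 30, 4
offset_32 = scaled_constant(C2, SCALE, 32)   # wraps around
offset_64 = scaled_constant(C2, SCALE, 64)   # exact
```

Here the 32-bit product wraps to zero while the 64-bit product keeps its true value, which is the kind of discrepancy that leads later logic to wrong conclusions about the base offset.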
*	[ConstantFolding] Consolidate and extend bitcount intrinsic tests; NFC	Nikita Popov	2018-12-20	1	-0/+187
\| \| \| \| \| \| \|	Move constant folding tests into ConstantFolding/bitcount.ll and drop various tests in other places. Add coverage for undefs. llvm-svn: 349806
*	[ConstantFolding] Add tests for funnel shifts with undef operands; NFC	Nikita Popov	2018-12-20	1	-0/+167
\| \| \| \|	llvm-svn: 349803
*	[ConstantFolding] Add tests for sat add/sub with undefs; NFC	Nikita Popov	2018-12-20	1	-0/+218
\| \| \| \|	llvm-svn: 349802
*	[ConstantFolding] Split up saturating add/sub tests; NFC	Nikita Popov	2018-12-20	1	-97/+158
\| \| \| \| \| \|	Split each test into a separate function. llvm-svn: 349801
*	Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.	Michael Kruse	2018-12-20	2	-0/+128
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not a loop identifier. It is neither unique (there's even a regression test assigning the same LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if it is just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass. This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instructions is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem of ever needing to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carried by this loop). This intentionally avoids any kind of "ID". Loops that are cloned/have their attributes modified retain the llvm.loop.parallel_accesses attribute. Access instructions that are cloned point to the same access group. It is not necessary for each access to have its own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together. The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated. Differential Revision: https://reviews.llvm.org/D52116 llvm-svn: 349725
*	[CostModel][X86] Don't count 2 shuffles on the last level of a pairwise ↵	Craig Topper	2018-12-13	1	-51/+31
\| \| \| \| \| \| \| \| \| \| \| \|	arithmetic or min/max reduction This is split from D55452 with the correct patch this time. Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle. Differential Revision: https://reviews.llvm.org/D55615 llvm-svn: 349072
*	Cleanup test case by removing unused attribute dso_local	Ranjeet Singh	2018-12-11	1	-4/+4
\| \| \| \| \| \| \| \| \|	Attribute 'dso_local' generated in bitcode from compiling original C file but isn't needed. Differential Revision: https://reviews.llvm.org/D55521 llvm-svn: 348835
*	[CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max ↵	Craig Topper	2018-12-10	9	-668/+668
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	reduction. Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above. Aren't we just extracting the single element after taking the min/max in the vector register? Reviewers: RKSimon, spatel, ABataev Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D55480 llvm-svn: 348739
*	[CostModel][X86] Fix overcounting arithmetic cost in illegal types in ↵	Craig Topper	2018-12-07	19	-1021/+965
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	getArithmeticReductionCost/getMinMaxReductionCost We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width. For example, for 8i32 on an SSE target, we were calculating the cost of an 8i32 op, which is likely 2 for basic integer ops. Then after the loop we count 2 more v4i32 ops, for a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops. There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two src permute shuffles. Differential Revision: https://reviews.llvm.org/D55397 llvm-svn: 348621
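The 8i32 arithmetic from this message can be reproduced with a toy cost model. This is only a sketch under stated assumptions, not the TTI implementation: a legal register is assumed to hold four i32 lanes, an op on a wider type is assumed to cost one per legal register, element counts are assumed to be powers of two, and the function names are hypothetical.

```python
import math

LEGAL_ELEMENTS = 4  # assumed: one SSE register holds four i32 lanes

def op_cost(num_elements):
    """Assumed cost of one arithmetic op: one per legal register touched."""
    return max(1, math.ceil(num_elements / LEGAL_ELEMENTS))

def reduction_arith_cost(num_elements, count_half_width):
    """Arithmetic ops for a reduction over a power-of-two element count."""
    cost = 0
    n = num_elements
    # Levels above the legal width: the vector is split in half, so the
    # corrected model charges for the half-width op, the old one for full width.
    while n > LEGAL_ELEMENTS:
        cost += op_cost(n // 2) if count_half_width else op_cost(n)
        n //= 2
    # Remaining levels operate within a single legal register.
    cost += int(math.log2(n)) * op_cost(n)
    return cost
```

For eight elements the full-width variant yields 4 and the half-width variant yields 3, matching the totals quoted in the message.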
*	Reapply "[DemandedBits][BDCE] Support vectors of integers"	Nikita Popov	2018-12-07	1	-0/+136
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. Unlike the previous iteration of this patch, getDemandedBits() can now again be called on arbitrary (sized) instructions, even if they don't have integer or vector of integer type. (For vector types the size of the returned mask will now be the scalar size in bits though.) The added LoopVectorize test case shows a case which triggered an assertion failure with the previous attempt, because getDemandedBits() was called on a pointer-typed instruction. Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348602
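The rule "a bit is demanded if it is demanded for any of the elements" amounts to OR-ing the per-element masks into one scalar-sized mask. A minimal Python sketch of that rule, with a hypothetical helper name and integers standing in for bit masks:

```python
from functools import reduce

def vector_demanded_bits(per_element_masks):
    """Combine per-element demanded-bit masks into one scalar-sized mask."""
    return reduce(lambda acc, mask: acc | mask, per_element_masks, 0)
```

The result is conservative: a bit can only be treated as dead if no element demands it.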
*	[IR] Don't assume all functions are 4 byte aligned	Ranjeet Singh	2018-12-07	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In some cases different alignments for function might be used to save space e.g. thumb mode with -Oz will try to use 2 byte function alignment. Similar patch that fixed this in other areas exists here https://reviews.llvm.org/D46110 This was approved previously https://reviews.llvm.org/D55115 (r348215) but when committed it caused failures on the sanitizer buildbots when building llvm with clang (containing this patch). This is now fixed because I've added a check to see if getting the parent module returns null; if it does, the alignment is set to 0. Differential Revision: https://reviews.llvm.org/D55115 llvm-svn: 348571
*	Revert "[DemandedBits][BDCE] Support vectors of integers"	Nikita Popov	2018-12-07	1	-136/+0
\| \| \| \| \| \| \|	This reverts commit r348549. Causing assertion failures during clang build. llvm-svn: 348558
*	[DemandedBits][BDCE] Support vectors of integers	Nikita Popov	2018-12-06	1	-0/+136
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. The getDemandedBits() method can now only be called on instructions that have integer or vector of integer type. Previously it could be called on any sized instruction (even if it was not particularly useful). The size of the return value is now always the scalar size in bits (while previously it was the type size in bits). Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348549
*	[X86] Remove -costmodel-reduxcost=true from the experimental vector ↵	Craig Topper	2018-12-05	18	-144/+144
\| \| \| \| \| \| \| \|	reduction intrinsic tests as it appears to be unnecessary. NFC I think this has something to do with matching reductions from extractelement, binops, and shuffles. But we're not matching here. llvm-svn: 348340
*	[X86] Add more cost model tests for vector reductions with narrow vector ↵	Craig Topper	2018-12-05	18	-0/+531
\| \| \| \| \| \|	types. NFC llvm-svn: 348339
*	Reverting r348215	Ranjeet Singh	2018-12-04	1	-27/+0
\| \| \| \| \| \|	Causing failures on ubsan buildbot boxes. llvm-svn: 348230
*	[IR] Don't assume all functions are 4 byte aligned	Ranjeet Singh	2018-12-04	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \|	In some cases different alignments for function might be used to save space e.g. thumb mode with -Oz will try to use 2 byte function alignment. Similar patch that fixed this in other areas exists here https://reviews.llvm.org/D46110 Differential Revision: https://reviews.llvm.org/D55115 llvm-svn: 348215
*	[SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.	Jonas Paulsson	2018-12-03	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \|	A loaded value with multiple users compared with 0 will become a load and test single instruction. The load is not folded in this case (multiple users), but the compare instruction is eliminated. This patch returns 0 cost for the icmp in these cases. Review: Ulrich Weigand https://reviews.llvm.org/D55111 llvm-svn: 348141
*	[test] Fix ScalarEvolution test to allow __func__ with prototype	Michal Gorny	2018-12-02	1	-123/+123
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix ScalarEvolution/solve-quadratic.ll test to account for __func__ output listing the complete function prototype rather than just its name, as it does on NetBSD. Example Linux output: GetQuadraticEquation: addrec coeff bw: 4 GetQuadraticEquation: equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2 Example NetBSD output: llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr): addrec coeff bw: 4 llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr): equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2 Differential Revision: https://reviews.llvm.org/D55162 llvm-svn: 348096
*	[TTI] Reduction costs only need to include a single extract element cost ↵	Simon Pilgrim	2018-12-01	20	-2050/+2050
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(REAPPLIED) We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 348076
*	[X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in ↵	Craig Topper	2018-12-01	6	-6/+6
\| \| \| \| \| \|	interleave/strided load/store cost model tests. llvm-svn: 348056
*	[DA] GPUDivergenceAnalysis for unstructured GPU kernels	Nicolai Haehnle	2018-11-30	19	-0/+1215
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is patch #3 of the new DivergenceAnalysis <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html> The GPUDivergenceAnalysis is intended to eventually supersede the existing LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces incorrect results on unstructured Control-Flow Graphs: <https://bugs.llvm.org/show_bug.cgi?id=37185> This patch adds the option -use-gpu-divergence-analysis to the LegacyDivergenceAnalysis to turn it into a transparent wrapper for the GPUDivergenceAnalysis. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D53493 llvm-svn: 348048
*	[SystemZ::TTI] i8/i16 operands extension costs revisited	Jonas Paulsson	2018-11-30	4	-44/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Three minor changes to these extra costs: * For ICmp instructions, instead of adding 2 all the time for extending each operand, this is only done if that operand is neither a load nor an immediate. * The operands extension costs for divides removed, because we now use a high cost already for the divide (20). * The costs for lshr/ashr extra costs removed as this did not seem useful. Review: Ulrich Weigand https://reviews.llvm.org/D55053 llvm-svn: 347961
*	[X86] Make X86TTIImpl::getCastInstrCost properly handle the case where ↵	Craig Topper	2018-11-28	1	-32/+15
\| \| \| \| \| \| \| \| \| \| \| \|	AVX512 is enabled, but 512-bit vectors aren't legal. Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal. This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization, which would mean that 512-bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if preferred, I can switch to just using is512BitVector and the subtarget feature. Differential Revision: https://reviews.llvm.org/D54984 llvm-svn: 347786
*	[X86] Add some cost model entries for sext/zext for avx512bw	Craig Topper	2018-11-28	2	-22/+22
\| \| \| \| \| \| \| \| \| \|	This fixes some of the scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported, just the worst. I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types. Differential Revision: https://reviews.llvm.org/D54979 llvm-svn: 347785