bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[DAGCombiner] don't try to extract a fraction of a vector binop and crash ↵	Sanjay Patel	2018-12-05	1	-10/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	(PR39893) Because we're potentially peeking through a bitcast in this transform, we need to use overall bitwidths rather than number of elements to determine when it's safe to proceed. Should fix: https://bugs.llvm.org/show_bug.cgi?id=39893 llvm-svn: 348383
*	[SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)	Simon Pilgrim	2018-12-05	1	-0/+36
\| \| \| \| \| \| \| \| \| \|	This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353
*	[TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes	Simon Pilgrim	2018-12-04	1	-0/+6
\| \| \| \| \| \| \| \|	Add support for ISD::_EXTEND and ISD::_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch. llvm-svn: 348246
*	[DAGCombiner] narrow truncated vector binops when legal	Sanjay Patel	2018-12-03	1	-7/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195
*	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb ↵	Craig Topper	2018-12-03	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158
*	[DAGCombiner] guard against an oversized shift crash	Sanjay Patel	2018-12-02	1	-9/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change prevents the crash noted in the post-commit comments for rL347478 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision: https://reviews.llvm.org/D54954 llvm-svn: 348089
*	[DAGCombiner] narrow truncated binops	Sanjay Patel	2018-11-29	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. Differential Revision: https://reviews.llvm.org/D54640 llvm-svn: 347917
*	[x86] limit transform for select-of-fp-constants	Sanjay Patel	2018-11-25	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526
*	[SelectionDAG] move constant or splat functions to common location	Sanjay Patel	2018-11-25	1	-37/+12
\| \| \| \| \| \| \| \|	rL347502 moved the null sibling, so we should group all of these together. I'm not sure why these aren't methods of the SDValue class itself, but that's another patch if that's possible. llvm-svn: 347523
*	[DAG] consolidate shift simplifications	Sanjay Patel	2018-11-23	1	-67/+24
\| \| \| \| \| \| \| \| \| \|	...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478. The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes. llvm-svn: 347502
*	[DAGCombiner] form 'not' ops ahead of shifts (PR39657)	Sanjay Patel	2018-11-22	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops. https://rise4fun.com/Alive/NC1 Name: shl Pre: (-1 << C2) == C1 %shl = shl i8 %x, C2 %r = xor i8 %shl, C1 => %not = xor i8 %x, -1 %r = shl i8 %not, C2 Name: shr Pre: (-1 u>> C2) == C1 %sh = lshr i8 %x, C2 %r = xor i8 %sh, C1 => %not = xor i8 %x, -1 %r = lshr i8 %not, C2 https://bugs.llvm.org/show_bug.cgi?id=39657 llvm-svn: 347478
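The two Alive rules quoted in the message above can be checked by brute force on 8-bit values. The following is a minimal standalone sketch, not code from DAGCombiner; every function name in it is hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling the i8 patterns from the Alive rules.

// Before: %shl = shl i8 %x, C2 ; %r = xor i8 %shl, C1, with C1 == (-1 << C2).
uint8_t shlThenXor(uint8_t x, unsigned c2) {
  const uint8_t c1 = static_cast<uint8_t>(0xFFu << c2);
  return static_cast<uint8_t>((x << c2) ^ c1);
}

// After: %not = xor i8 %x, -1 ; %r = shl i8 %not, C2.
uint8_t notThenShl(uint8_t x, unsigned c2) {
  const uint8_t notX = static_cast<uint8_t>(x ^ 0xFFu);
  return static_cast<uint8_t>(notX << c2);
}

// Before: %sh = lshr i8 %x, C2 ; %r = xor i8 %sh, C1, with C1 == (-1 u>> C2).
uint8_t lshrThenXor(uint8_t x, unsigned c2) {
  const uint8_t c1 = static_cast<uint8_t>(0xFFu >> c2);
  return static_cast<uint8_t>((x >> c2) ^ c1);
}

// After: %not = xor i8 %x, -1 ; %r = lshr i8 %not, C2.
uint8_t notThenLshr(uint8_t x, unsigned c2) {
  const uint8_t notX = static_cast<uint8_t>(x ^ 0xFFu);
  return static_cast<uint8_t>(notX >> c2);
}

// Exhaustively compare both forms for every i8 input and every in-range
// shift amount (0..7); oversized shifts are deliberately excluded.
bool allI8InputsAgree() {
  for (unsigned c2 = 0; c2 < 8; ++c2) {
    for (unsigned x = 0; x < 256; ++x) {
      const uint8_t v = static_cast<uint8_t>(x);
      if (shlThenXor(v, c2) != notThenShl(v, c2))
        return false;
      if (lshrThenXor(v, c2) != notThenLshr(v, c2))
        return false;
    }
  }
  return true;
}
```

The check only covers the 8-bit case with constant shift amounts below the bit width, which matches the preconditions stated in the rules.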
*	[DAGCombiner] refactor select-of-FP-constants transform	Sanjay Patel	2018-11-21	1	-53/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This transform needs to be limited. We are converting to a constant pool load very early, and we are turning loads that are independent of the select condition (and therefore speculatable) into a dependent non-speculatable load. We may also be transferring a condition code from an FP register to integer to create that dependent load. llvm-svn: 347424
*	[DAGCombiner] reduce code duplication; NFC	Sanjay Patel	2018-11-21	1	-33/+30
\| \| \| \|	llvm-svn: 347410
*	[DAGCombiner] look through bitcasts when trying to narrow vector binops	Sanjay Patel	2018-11-20	1	-13/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is another step in vector narrowing - a follow-up to D53784 (and hoping to eventually squash potential regressions seen in D51553). The x86 test diffs are wins, but the AArch64 diff is probably not. That problem already exists independent of this patch (see PR39722), but it went unnoticed in the previous patch because there were no regression tests that showed the possibility. The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling concerns with using wider vector ops, an extra extract to reduce vector width is the right trade-off at this level of codegen. Differential Revision: https://reviews.llvm.org/D54392 llvm-svn: 347356
*	[DAGCombine] Add calls to SimplifyDemandedVectorElts from ↵	Simon Pilgrim	2018-11-20	1	-0/+4
\| \| \| \| \| \| \| \|	visitINSERT_SUBVECTOR (PR37989) This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices. llvm-svn: 347313
*	[DAGCombiner] reduce code duplication in visitXOR; NFC	Sanjay Patel	2018-11-20	1	-32/+29
\| \| \| \|	llvm-svn: 347278
*	[DAGCombine] SimplifyNodeWithTwoResults - ensure same legalization for LO/HI ↵	Simon Pilgrim	2018-11-19	1	-8/+6
\| \| \| \| \| \| \| \| \| \|	operands (PR21207) Consistently use (!LegalOperations \|\| isOperationLegalOrCustom) for all node pairs. Differential Revision: https://reviews.llvm.org/D53478 llvm-svn: 347255
*	[SelectionDAG] simplify vector select with undef operand(s)	Sanjay Patel	2018-11-19	1	-3/+2
\| \| \| \|	llvm-svn: 347227
*	[SelectionDAG] add simplifySelect() to reduce code duplication; NFC	Sanjay Patel	2018-11-19	1	-18/+2
\| \| \| \| \| \|	This should be extended to handle FP and vectors in follow-up patches. llvm-svn: 347210
*	[DAG] add undef simplifications for select nodes	Sanjay Patel	2018-11-18	1	-12/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines. There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161 rL347164 rL347165 rL347166 rL347167 I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications. This will lead back to PR32023 someday... https://bugs.llvm.org/show_bug.cgi?id=32023 llvm-svn: 347170
*	DAG combiner: fold (select, C, X, undef) -> X	Stanislav Mekhanoshin	2018-11-16	1	-0/+6
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D54646 llvm-svn: 347110
*	[DAGCombine] Fix non-deterministic debug output	Sam Parker	2018-11-16	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \|	PR37970 reported non-deterministic debug output; this was caused by iterating through a set and not a vector. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37970 Differential Revision: https://reviews.llvm.org/D54570 llvm-svn: 347037
*	[DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector ops	Craig Topper	2018-11-13	1	-14/+7
\| \| \| \| \| \| \| \| \| \|	It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner. Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all. Differential Revision: https://reviews.llvm.org/D54285 llvm-svn: 346728
*	[DAGCombiner] Fix load-store forwarding of indexed loads.	Nirav Dave	2018-11-12	1	-3/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Handle extra output from indexed loads in cases where we wish to forward a load value directly from a preceding store. Fixes PR39571. Reviewers: peter.smith, rengolin Subscribers: javed.absar, hiraditya, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54265 llvm-svn: 346654
*	[DAGCombiner] Make tryToFoldExtendOfConstant return an SDValue instead of an ↵	Craig Topper	2018-11-10	1	-14/+14
\| \| \| \| \| \| \| \|	SDNode*. NFC Removes the need to call getNode internally and to recreate an SDValue after the call. llvm-svn: 346600
*	[x86] allow vector load narrowing with multi-use values	Sanjay Patel	2018-11-10	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595
*	[DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between ↵	Craig Topper	2018-11-09	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used. Differential Revision: https://reviews.llvm.org/D54283 llvm-svn: 346530
*	[SelectionDAG] swap select_cc operands to enable folding	Alexandros Lamprineas	2018-11-09	1	-32/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The DAGCombiner tries to SimplifySelectCC as follows: select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4) It can't cope with the situation of reordered operands: select_cc(x, y, 0, 16, cc) In that case we just need to swap the operands and invert the Condition Code: select_cc(x, y, 16, 0, ~cc) Differential Revision: https://reviews.llvm.org/D53236 llvm-svn: 346484
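The operand swap described above can be modelled on plain integers. This is a rough sketch under simplifying assumptions: the `Cond` enum, `evalCond`, `invert`, and the other names are hypothetical stand-ins, not the SelectionDAG API, and only four integer predicates are modelled.

```cpp
#include <cassert>

// Hypothetical condition codes; the real ISD::CondCode set is much larger.
enum class Cond { EQ, NE, LT, GE };

bool evalCond(int x, int y, Cond cc) {
  switch (cc) {
  case Cond::EQ: return x == y;
  case Cond::NE: return x != y;
  case Cond::LT: return x < y;
  case Cond::GE: return x >= y;
  }
  return false;
}

// Logical inverse of a condition code (the "~cc" in the message).
Cond invert(Cond cc) {
  switch (cc) {
  case Cond::EQ: return Cond::NE;
  case Cond::NE: return Cond::EQ;
  case Cond::LT: return Cond::GE;
  case Cond::GE: return Cond::LT;
  }
  return cc;
}

// Reference semantics: select_cc(x, y, t, f, cc) = cc(x, y) ? t : f.
unsigned selectCC(int x, int y, unsigned t, unsigned f, Cond cc) {
  return evalCond(x, y, cc) ? t : f;
}

// select_cc(x, y, 2^k, 0, cc) -> shl(zext(setcc(x, y, cc)), k).
unsigned foldPow2Zero(int x, int y, unsigned k, Cond cc) {
  return static_cast<unsigned>(evalCond(x, y, cc)) << k;
}

// select_cc(x, y, 0, 2^k, cc): swap the operands and invert the condition,
// then reuse the fold above.
unsigned foldZeroPow2(int x, int y, unsigned k, Cond cc) {
  return foldPow2Zero(x, y, k, invert(cc));
}
```

For example, `foldZeroPow2(3, 2, 4, Cond::LT)` evaluates `3 >= 2`, zero-extends the result to 1, and shifts it left by 4, which agrees with `selectCC(3, 2, 0, 16, Cond::LT)`.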
*	[DAGCombine] Improve alias analysis for chain of independent stores.	Nirav Dave	2018-11-08	1	-59/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	FindBetterNeighborChains simultaneously improves the chain dependencies of a chain of related stores avoiding the generation of extra token factors. For chains longer than the GatherAllAliasDepths, stores further down in the chain will necessarily fail, a potentially significant waste and preventing otherwise trivial parallelization. This patch directly parallelizes the chains of stores before improving each store. This generally improves DAG-level parallelism. Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53552 llvm-svn: 346432
*	[DAGCombiner] Use tryFoldToZero to simplify some code and make it work ↵	Craig Topper	2018-11-05	1	-8/+2
\| \| \| \| \| \| \| \|	correctly between LegalTypes and LegalOperations. The original code avoided creating a zero vector after type legalization, but if we're after type legalization the type we have is legal. The real hazard we need to avoid is creating a build vector after op legalization. tryFoldToZero takes care of checking for this. llvm-svn: 346119
*	[DAGCombiner] Remove an unused argument from tryFoldToZero. NFC	Craig Topper	2018-11-05	1	-4/+3
\| \| \| \|	llvm-svn: 346118
*	[DAGCombiner] Remove 'else' after return. NFC	Craig Topper	2018-11-04	1	-8/+7
\| \| \| \| \| \|	This makes this code consistent with the nearly identical code in visitZERO_EXTEND. llvm-svn: 346090
*	[SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG ↵	Craig Topper	2018-11-04	1	-2/+4
\| \| \| \| \| \| \| \| \| \|	nodes. Move asserts into getNode. These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers. The rest of the patch is just changing all callers to use getNode directly. llvm-svn: 346087
*	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output ↵	Craig Topper	2018-11-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So it's better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why it's included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043
*	[DAGCombiner] Remove reduceBuildVecConvertToConvertBuildVec and rely on the ↵	Simon Pilgrim	2018-11-02	1	-75/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	vectorizers instead (PR35732) reduceBuildVecConvertToConvertBuildVec vectorizes int2float in the DAGCombiner, which means that even if the LV/SLP has decided to keep scalar code using the cost models, this will override this. While there are cases where vectorization is necessary in the DAG (mainly due to legalization artefacts), I don't think this is the case here, we should assume that the vectorizers know what they are doing. Differential Revision: https://reviews.llvm.org/D53712 llvm-svn: 345964
*	[DAGCombiner] Make the isTruncateOf call from visitZERO_EXTEND work for ↵	Craig Topper	2018-11-01	1	-16/+13
\| \| \| \| \| \| \| \|	vectors. Remove FIXME. I'm having trouble creating a test case for the ISD::TRUNCATE part of this that shows any codegen differences. But I was able to test the setcc path which is what the test changes here cover. llvm-svn: 345908
*	[DAGCombiner] make sure we have a whole-number extract before trying to ↵	Sanjay Patel	2018-11-01	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \|	narrow a vector op (PR39511) The test causes a crash because we were trying to extract v4f32 to v3f32, and the narrowing factor was then 4/3 = 1 producing a bogus narrow type. This should fix: https://bugs.llvm.org/show_bug.cgi?id=39511 llvm-svn: 345842
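The arithmetic behind that crash is integer division: 4 / 3 truncates to 1, so the computed ratio no longer describes the real relationship between the wide and narrow types. A minimal sketch of the kind of divisibility guard the message describes follows; the function name and the "0 means bail out" convention are assumptions for illustration, not the code in the patch.

```cpp
#include <cassert>

// Hypothetical helper: returns how many narrow vectors fit in the wide one,
// or 0 when the extract is not a whole-number fraction of the wide vector
// (for example v4f32 -> v3f32, where 4 / 3 would silently truncate to 1).
unsigned wholeNarrowingRatio(unsigned wideNumElts, unsigned narrowNumElts) {
  if (narrowNumElts == 0 || wideNumElts % narrowNumElts != 0)
    return 0; // not safe to narrow
  return wideNumElts / narrowNumElts;
}
```

With this guard, `wholeNarrowingRatio(4, 3)` reports 0 instead of a misleading ratio of 1, while `wholeNarrowingRatio(8, 4)` yields 2.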
*	[DAGCombiner] Fold 0 div/rem X to 0	David Bolvansky	2018-10-31	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover Reviewed By: RKSimon Subscribers: craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D52504 llvm-svn: 345721
*	[DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad	Bjorn Pettersson	2018-10-30	1	-9/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Normalize the offset for endianness before checking if the store covers the load in ForwardStoreValueToDirectLoad. Without this we missed out on some optimizations for big endian targets. If for example having a 4 bytes store followed by a 1 byte load, loading the least significant byte from the store, the STCoversLD check would fail (see @test4 in test/CodeGen/AArch64/load-store-forwarding.ll). This patch also fixes a problem seen in an out-of-tree target. The target has i40 as a legal type, it is big endian, and the StoreSize for i40 is 48 bits. So when normalizing the offset for endianness we need to take the StoreSize into account (assuming that padding added when storing into a larger StoreSize always is added at the most significant end). Reviewers: niravd Reviewed By: niravd Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho Differential Revision: https://reviews.llvm.org/D53776 llvm-svn: 345636
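One way to picture the normalization described above is to convert the memory offset of the load into an offset counted from the least significant byte of the stored value. The sketch below works in whole bytes and uses hypothetical names; it assumes, as the message does, that any padding sits at the most significant end, and it is not the actual ForwardStoreValueToDirectLoad code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: offset of the loaded bytes measured from the least
// significant byte of the stored value. On little-endian targets this is the
// memory offset itself; on big-endian targets it is mirrored within the
// store size (which may be larger than the value, e.g. 6 bytes for i40).
int64_t normalizeOffset(int64_t memOffset, int64_t storeSizeBytes,
                        int64_t loadBytes, bool bigEndian) {
  return bigEndian ? storeSizeBytes - loadBytes - memOffset : memOffset;
}

// The store covers the load when the normalized range lies entirely inside
// the bytes that carry the value (padding bytes do not count).
bool storeCoversLoad(int64_t memOffset, int64_t valueBytes,
                     int64_t storeSizeBytes, int64_t loadBytes,
                     bool bigEndian) {
  const int64_t off =
      normalizeOffset(memOffset, storeSizeBytes, loadBytes, bigEndian);
  return off >= 0 && off + loadBytes <= valueBytes;
}
```

In the 4-byte store and 1-byte load example, the least significant byte lives at memory offset 3 on a big-endian target, and the normalized offset is 4 - 1 - 3 = 0. For the i40 case with a 6-byte store size, a load at memory offset 0 hits the padding byte and is rejected, while a load at memory offset 5 maps to the least significant byte.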
*	[DAGCombiner] narrow vector binops when extraction is cheap	Sanjay Patel	2018-10-30	1	-11/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Narrowing vector binops came up in the demanded bits discussion in D52912. I don't think we're going to be able to do this transform in IR as a canonicalization because of the risk of creating unsupported widths for vector ops, but we already have a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression test diffs). This is artificially limited to not look through bitcasts because there are so many test diffs already, but that's marked with a TODO and is a small follow-up. Differential Revision: https://reviews.llvm.org/D53784 llvm-svn: 345602
*	[DAGCombiner] Improve X div/rem Y fold if single bit element type	David Bolvansky	2018-10-30	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Tests by @spatel, thanks Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: sdardis, atanasyan, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D52668 llvm-svn: 345575
*	[DAGCombiner] Better constant vector support for FCOPYSIGN.	Craig Topper	2018-10-28	1	-4/+4
\| \| \| \| \| \| \| \|	Enable constant folding when both operands are vectors of constants. Turn into FNEG/FABS when the RHS is a splat constant vector. llvm-svn: 345469
*	[DAGCombiner] rearrange code in narrowExtractedVectorBinOp(); NFC	Sanjay Patel	2018-10-26	1	-22/+24
\| \| \| \| \| \| \|	We can extend this code to handle many more cases if an extract is cheap, so prepping for that change. llvm-svn: 345430
*	[NFC] Rename minnan and maxnan to minimum and maximum	Thomas Lively	2018-10-24	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Changes all uses of minnan/maxnan to minimum/maximum globally. These names emphasize that the semantic difference between these operations is more than just NaN-propagation. Reviewers: arsenm, aheejin, dschuff, javed.absar Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53112 llvm-svn: 345218
*	[SelectionDAG] DAG combiner for fminnan and fmaxnan	Thomas Lively	2018-10-24	1	-20/+20
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Depends on D52765. Reviewers: aheejin, dschuff Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52768 llvm-svn: 345210
*	[DAG] check more operands for cycles when merging stores.	Tim Northover	2018-10-24	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Until now, we've only checked whether merging stores would cause a cycle via the value argument, but the address and indexed offset arguments are also capable of creating cycles in some situations. The addresses are all base+offset with notionally the same base, but the base SDNode may still be different (e.g. via an indexed load in one case, and an ISD::ADD elsewhere). This allows cycles to creep in if one of these sources depends on another. The indexed offset is usually undef (representing a non-indexed store), but on some architectures (e.g. 32-bit ARM-mode ARM) it can be an arbitrary value, again allowing dependency cycles to creep in. llvm-svn: 345200
*	SelectionDAG: Reuse bigger sized constants in memset expansion.	Matthias Braun	2018-10-23	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When implementing memsets today we often see this pattern: $x0 = MOV 0xXYXYXYXYXYXYXYXY store $x0, ... $w1 = MOV 0xXYXYXYXY store $w1, ... We first create a 64bit constant in a 64bit register with all bytes the same and then create a 32bit constant with all bytes the same in a 32bit register. In many targets we could just access the lower byte of the 64bit register instead. - Ideally this would be handled by the ConstantHoist pass but it runs too early when memset isn't expanded yet. - The memset expansion code already had this optimization implemented, however SelectionDAG constantfolding would constantfold the "trunc(bigconstant)" pattern to "smallconstant". - This patch makes the memset expansion mark the constant as Opaque and stop DAGCombiner from constant folding in this situation. (Similar to how ConstantHoisting marks things as Opaque to avoid folding ADD/SUB/etc.) Differential Revision: https://reviews.llvm.org/D53181 llvm-svn: 345102
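The reuse works because truncating a byte-splat constant yields the byte-splat constant of the narrower width. A small standalone sketch of that property, with hypothetical helper names and no connection to the actual memset expansion code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: replicate one byte across a 64-bit or 32-bit value,
// as a memset expansion would for the fill byte XY.
uint64_t splat64(uint8_t b) { return 0x0101010101010101ULL * b; }
uint32_t splat32(uint8_t b) { return 0x01010101u * b; }

// trunc(bigconstant) equals smallconstant for every fill byte, so the low
// part of the wide register can stand in for the narrow constant.
bool truncatedSplatMatches(uint8_t b) {
  return static_cast<uint32_t>(splat64(b)) == splat32(b);
}
```

Because the equality holds for every byte value, a constant folder will happily rewrite the truncate into a fresh narrow constant, which is the folding the Opaque marking is meant to prevent.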
*	Recommit r344877 "[X86] Stop promoting integer loads to vXi64"	Craig Topper	2018-10-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile. Original commit message: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344965
*	DAG: Change behavior of fminnum/fmaxnum nodes	Matt Arsenault	2018-10-22	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \|	Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
*	[DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016)	Sanjay Patel	2018-10-21	1	-4/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a late backend subset of the IR transform added with: D52439 We can confirm that the conversion to a 'trunc' is correct by running: $ opt -instcombine -data-layout="e" (assuming the IR transforms are correct; change "e" to "E" for big-endian) As discussed in PR39016: https://bugs.llvm.org/show_bug.cgi?id=39016 ...the pattern may emerge during legalization, so that's why we are waiting for an insertelement to become a scalar_to_vector in the pattern matching here. The DAG allows for fun variations that are not possible in IR. Result types for extracts and scalar_to_vector don't necessarily match input types, so that means we have to be a bit more careful in the transform (see code comments). The tests show that we don't handle cases that require a shift (as we did in the IR version). I've left that as a potential follow-up because I'm not sure if that's a real concern at this late stage. Differential Revision: https://reviews.llvm.org/D53201 llvm-svn: 344872