summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert patch ofr218493David Xu2014-09-261-14/+0
| | | | llvm-svn: 218494
* Redundant store instructions should be removed as dead codeDavid Xu2014-09-261-0/+14
| | | | llvm-svn: 218493
* Use SDValue bool operator to reduce code. No functional change.Sanjay Patel2014-09-231-9/+6
| | | | llvm-svn: 218314
* Refactor reciprocal square root estimate into target-independent function; NFC.Sanjay Patel2014-09-211-17/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | This is purely a plumbing patch. No functional changes intended. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner. Next steps: The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added. Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner. X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added. Differential Revision: http://reviews.llvm.org/D5425 llvm-svn: 218219
* Optionally enable more-aggressive FMA formation in DAGCombineHal Finkel2014-09-191-5/+10
| | | | | | | | | | | | | | | | | The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one use, but this is overly-conservative on some systems. Specifically, if the FMA and the FADD have the same latency (and the FMA does not compete for resources with the FMUL any more than the FADD does), there is no need for the restriction, and furthermore, forming the FMA leaving the FMUL can still allow for higher overall throughput and decreased critical-path length. Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to elide the hasOneUse check. This is enabled for PowerPC by default, as most PowerPC systems will benefit. Patch by Olivier Sallenave, thanks! llvm-svn: 218120
* Replace dead links to "Hacker's Delight" with general references. NFC.Sanjay Patel2014-09-151-4/+4
| | | | llvm-svn: 217814
* Add DAG combine for shl + add of constants.Matt Arsenault2014-09-111-32/+12
| | | | | | | | | | | | | | Do (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) This is already done for multiplies, but since multiplies by powers of two are turned into shifts, we also need to handle it here. This might want checks for isLegalAddImmediate to avoid transforming an add of a legal immediate with one that isn't. llvm-svn: 217610
* Combine fmul vector FP constants when unsafe math is allowed.Sanjay Patel2014-09-111-6/+22
| | | | | | | | | | | | | | | | | | | This is an extension of the change made with r215820: http://llvm.org/viewvc/llvm-project?view=revision&revision=215820 That patch allowed combining of splatted vector FP constants that are multiplied. This patch allows combining non-uniform vector FP constants too by relaxing the check on the type of vector. Also, canonicalize a vector fmul in the same way that we already do for scalars - if only one operand of the fmul is a constant, make it operand 1. Otherwise, we miss potential folds. This fold is also done by -instcombine, but it's possible that extra fmuls may have been generated during lowering. Differential Revision: http://reviews.llvm.org/D5254 llvm-svn: 217599
* Build correct vector filled with undef nodesDavid Xu2014-09-111-4/+20
| | | | llvm-svn: 217570
* Group unsafe fmul math folds together for easier reading. No functional change.Sanjay Patel2014-09-081-6/+10
| | | | llvm-svn: 217399
* Fix the FIXME that was just added in r217390 - remove a bunch of redundant ↵Sanjay Patel2014-09-081-43/+2
| | | | | | | | fold permutations. The testcases for these folds already exist in test/CodeGen/X86/fp-fast.ll. llvm-svn: 217393
* group unsafe math folds together for easier readingSanjay Patel2014-09-081-150/+142
| | | | | | Also added a FIXME regarding redundant folds for non-canonicalized constants. llvm-svn: 217390
* Allow vector fsub ops with constants to get the same optimizations as scalars.Sanjay Patel2014-09-051-2/+2
| | | | | | | | This problem is bigger than just fsub, but this is the minimum fix to solve fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ), and we solve zero subtraction with the same change. llvm-svn: 217286
* clean up; NFCSanjay Patel2014-09-051-2/+2
| | | | llvm-svn: 217278
* Fix interference caused by fmul 2, x -> fadd x, xMatt Arsenault2014-09-021-8/+21
| | | | | | | | If an fmul was introduced by lowering, it wouldn't be folded into a multiply by a constant since the earlier combine would have replaced the fmul with the fadd. llvm-svn: 216932
* Fix comment and unnecessary check for FP build_vectors.Matt Arsenault2014-09-021-5/+1
| | | | | | | This was copy-paste from the integer version, but FP build_vectors don't truncate. llvm-svn: 216928
* Enable splitting indexing from loads with TargetConstantsHal Finkel2014-09-021-8/+21
| | | | | | | | | | | | When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant offsets, as there is no guarantee that a backend can handle them on generic ADDs (even if it generates them during address-mode matching) -- and, specifically, applying this transformation directly with TargetConstants caused a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less than ideal. Instead, for non-opaque constants, we can convert them into regular constants for use with the generated ADD (or SUB). llvm-svn: 216908
* Revert "Revert '[DAGCombiner] Split up an indexed load if only the base ↵Hal Finkel2014-09-021-4/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pointer value is live'" I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The underlying cause of the failure is that pre-inc loads with increments represented by ISD::TargetConstants were being transformed into ISD:::ADDs with ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they were selected as invalid r+r adds. This recommits r208640, rebased and with an exclusion for ISD::TargetConstant increments. This behavior seems correct, although in the future we might want to ask the target to split out the indexing that uses ISD::TargetConstants. Unfortunately, I don't yet have small test case where the relevant invalid 'add' instruction is not itself dead (and thus eliminated by DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things) Original commit message (by Adam Nemet): Right now the load may not get DCE'd because of the side-effect of updating the base pointer. This can happen if we lower a read-modify-write of an illegal larger type (e.g. i48) such that the modification only affects one of the subparts (the lower i32 part but not the higher i16 part). See the testcase. In order to spot the dead load we need to revisit it when SimplifyDemandedBits decided that the value of the load is masked off. This is the CommitTargetLoweringOpt piece. I checked compile time with ARM64 by sending SPEC bitcode files through llc. No measurable change. Fixes <rdar://problem/16031651> llvm-svn: 216898
* Move FNEG next to FABS and make them more similar, so it's easier that they ↵Sanjay Patel2014-08-281-43/+46
| | | | | | can be refactored. NFC. llvm-svn: 216688
* Do not introduce new shuffle patterns after operation legalization if ↵Owen Anderson2014-08-281-2/+1
| | | | | | | | | SHUFFLE_VECTOR was marked custom. The target independent DAG combine has no way to know if the shuffles it is introducing are ones that the target could support or not. llvm-svn: 216678
* Janitorial services: "Don’t duplicate function or class name at the ↵Sanjay Patel2014-08-281-134/+119
| | | | | | beginning of the comment." llvm-svn: 216674
* Remove local TLI vars that are just duplicates of the class var. No ↵Sanjay Patel2014-08-281-2/+0
| | | | | | functional change. llvm-svn: 216673
* Use local vars to improve readability. No functional change.Sanjay Patel2014-08-281-42/+37
| | | | | | | Completes what was started in r216611 and r216623. Used const refs instead of pointers; not sure if one is preferable to the other. llvm-svn: 216672
* Use local variable in visitFADD. No functional change.Sanjay Patel2014-08-271-13/+11
| | | | llvm-svn: 216623
* Group unsafe-math optimizations for fsub into one block. No functional change.Sanjay Patel2014-08-271-14/+17
| | | | llvm-svn: 216616
* Use local variable to improve readability. Sanjay Patel2014-08-271-15/+10
| | | | | | No functional change intended. llvm-svn: 216611
* Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or ↵Craig Topper2014-08-271-8/+4
| | | | | | just letting them be implicitly created. llvm-svn: 216525
* Use range based for loops to avoid needing to re-mention SmallPtrSet size.Craig Topper2014-08-241-4/+3
| | | | llvm-svn: 216351
* name change: isPow2DivCheap -> isPow2SDivCheapSanjay Patel2014-08-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | isPow2DivCheap That name doesn't specify signed or unsigned. Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong: srl/add/sra is not the general sequence for signed integer division by power-of-2. We need one more 'sra': sra/srl/add/sra That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case. This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods. No functional change intended. Differential Revision: http://reviews.llvm.org/D5010 llvm-svn: 216237
* DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors ↵Benjamin Kramer2014-08-211-1/+6
| | | | | | | | with many arguments. PR20677 llvm-svn: 216175
* Fix fmul combines with constant splat vectorsMatt Arsenault2014-08-161-7/+26
| | | | | | Fixes things like fmul x, 2 -> fadd x, x llvm-svn: 215820
* [DAGCombiner] Improve the folding of target independet shuffles to Undef.Andrea Di Biagio2014-08-161-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | When combining a pair of shuffle nodes, check if the combined shuffle mask is trivially Undef. In case, immediately fold that pair of shuffles to Undef. The lack of checks for undef masks was the root-cause of a poor-codegen bug in the dag combiner. Example: %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6> %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6> %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3> Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire sequence to Undef value and therefore we generated: shufps $-123, %xmm1, $xmm0 pshufd $-46, %xmm0, %xmm0 With this patch, the entire shuffle sequence is folded to Undef and no shuffles are generated in the output assembly. Added new test cases to test 'combine-vec-shuffle-5.ll'. llvm-svn: 215797
* optimize vector fneg of bitcasted integer valueSanjay Patel2014-08-141-9/+14
| | | | | | | | | | This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops. This patch is very similar to a fabs patch committed at r214892. Differential Revision: http://reviews.llvm.org/D4852 llvm-svn: 215646
* [SDAG] Fix a bug in the DAG combiner where we would fail to return theChandler Carruth2014-08-141-5/+1
| | | | | | | | | | | | | | | | | | | | | | | input node after manually adding it to the worklist and using CombineTo. Once we use CombineTo the input node may have been deleted. Despite this being *completely confusing* and somewhat broken, the only way to "correctly" return from a DAG combine after potentially deleting the input node is to return *that exact node*.... But really, this code should just never have used CombineTo. It won't do what it wants (returning the node as mentioned above just causes the combine to infloop). The correct way to combine away a casted load to a load of the correct type is to RAUW the chain directly and then return the loaded value to replace the actual value node. I managed to find this with the vector shuffle fuzzer even though it clearly has nothing at all to do with vector shuffles and rather those happen to trigger a load of a constant pool that hits this combine *just right*. I've included the test as it is small and a nice stress test that the infrastructure isn't asserting. llvm-svn: 215622
* [DAGCombiner] Improved target independent vector shuffle combine rule.Andrea Di Biagio2014-08-131-10/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves the existing algorithm in DAGCombiner that attempts to fold shuffles according to rule: shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3) Before this change, there were cases where the DAGCombiner conservatively avoided folding shuffles even if the resulting mask would have been legal. That is because the algorithm wrongly assumed that commuting an illegal shuffle mask would always produce an illegal mask. With this change, we now correctly compute the commuted shuffle mask before calling method 'isShuffleMaskLegal' on it. On X86, this improves for example the codegen for the following function: define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) { %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7> %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3> ret <4 x i32> %2 } Before this change the X86 backend (-mcpu=corei7) generated the following assembly code for function @test: shufps $-23, %xmm0, %xmm1 # xmm1 = xmm1[1,2],xmm0[2,3] movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1] movaps %xmm1, %xmm0 Now we produce: movhlps %xmm0, %xmm0 # xmm0 = xmm0[1,1] Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly fold according to the above-mentioned rule. llvm-svn: 215555
* [x86] Fold extract_vector_elt of a load into the Load's address computation.Michael J. Spencer2014-08-111-90/+125
| | | | llvm-svn: 215409
* Optimize vector fabs of bitcasted constant integer values.Sanjay Patel2014-08-051-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs. So for code like this: %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000 %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast) %ret = bitcast <2 x float> %fabs to i64 Instead of generating something like this: movabsq (constant pool loadi of mask for sign bits) vmovq (move from integer register to vector/fp register) vandps (mask off sign bits) vmovq (move vector/fp register back to integer return register) We should generate: mov (put constant value in return register) I have also removed a redundant clause in the first 'if' statement: N0.getOperand(0).getValueType().isInteger() is the same thing as: IntVT.isInteger() Testcases for x86 and ARM added to existing files that deal with vector fabs. One existing testcase for x86 removed because it is no longer ideal. For more background, please see: http://reviews.llvm.org/D4770 And: http://llvm.org/bugs/show_bug.cgi?id=20354 Differential Revision: http://reviews.llvm.org/D4785 llvm-svn: 214892
* [SDAG] Fix a really, really terrible bug in the DAG combiner.Chandler Carruth2014-08-041-2/+2
| | | | | | | | | | | | | | | This code is completely wrong. It is also dead, as if it were to *ever* run, it would crash. Fortunately, after my work to the combiner, it is at least *possible* to reach the code, and llvm-stress has found a test case. Thanks to Patrick for reporting. It would be really good if anyone who remembers how this code works and what it was intended to do could add some more obvious test coverage instead of my completely contrived and reduced test case. My test case was so brittle I left a bread crumb comment in it to help the next person to stumble on it and not know what it was actually testing for. llvm-svn: 214785
* Remove the TargetMachine forwards for TargetSubtargetInfo basedEric Christopher2014-08-041-3/+10
| | | | | | information and update all callers. No functional change. llvm-svn: 214781
* [x86] Don't add nodes to the combined set (and prune subsequentChandler Carruth2014-08-031-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | combines) until they are legal. Doing it the old way could, when the stars align *just* right, cause a node to get into the combine set prior to being legalized. Then, when the same node showed up as an operand to another node later on (but not so much later on that it had been deleted as dead) we would fail to add it back to the worklist thinking it had already been combined. This would in turn cause it to not be legalized. Fortunately, we can also walk the operands looking for uncombined (and thus potentially un-legalized) nodes late. It will still ensure that we walk all operands of all nodes and send all of them through both the legalizer without changes and the combiner at least once. (Which was the original goal of this). I have a test case for this bug, but it is terribly brittle. For example, it will stop finding the bug the moment I enable the new shuffle lowering. I don't yet have any test case that reliably exercises this bug, and it isn't clear that it will be possible to craft one. It is entirely possible that with the new shuffle lowering the two forms of doing this are precisely equivalent. That doesn't mean we shouldn't take the more conservative approach of insisting on things in the combined set having survived the legalizer. llvm-svn: 214673
* fix for PR20354 - Miscompile of fabs due to vectorizationSanjay Patel2014-08-031-1/+5
| | | | | | | | | | This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation. This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon. There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too. llvm-svn: 214670
* [AArch64] Teach DAGCombiner that converting two consecutive loads into a ↵James Molloy2014-08-021-0/+7
| | | | | | | | vector load is not a good transform when paired loads are available. The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers! llvm-svn: 214634
* [SDAG] Refactor the code which deletes nodes in the DAG combiner to doChandler Carruth2014-08-021-54/+36
| | | | | | | | | | | | | so using a single helper which adds operands back onto the worklist. Several places didn't rigorously do this but a couple already did. Factoring them together and doing it rigorously is important to delete things recursively early on in the combiner and get a chance to see accurate hasOneUse values. While no existing test cases change, an upcoming patch to add DAG combining logic for PSHUFB requires this to work correctly. llvm-svn: 214623
* Fix issues with ISD::FNEG and ISD::FMA SDNodes where they would not be ↵Owen Anderson2014-08-021-0/+12
| | | | | | | | | | | | constant-folded during DAGCombine in certain circumstances. Unfortunately, the circumstances required to trigger the issue seem to require a pretty specific interaction of DAGCombines, and I haven't been able to find a testcase that reproduces on X86, ARM, or AArch64. The functionality added here is replicated in essentially every other DAG combine, so it seems pretty obviously correct. llvm-svn: 214622
* White space fix.Louis Gerbarg2014-07-311-1/+1
| | | | llvm-svn: 214455
* Make sure no loads resulting from load->switch DAGCombine are marked invariantLouis Gerbarg2014-07-311-7/+8
| | | | | | | | | | | | | | Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reoarder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449
* Retain alignment requirements for load->selects modified by DAGCombineLouis Gerbarg2014-07-301-2/+6
| | | | | | | | | | | | | | | | | DAGCombine may choose to rewrite graphs where two loads feed a select into graphs where a select of two addresses feed a load. While it sanity checks the loads to make sure they are broadly equivalent it currently just uses the alignment restriction of the left node. In cases where the right node has stronger alignment requiresment this may lead to bad codegen, such as generating an aligned load where an unaligned load is required. This patch makes the combine generate a load with an alignment that is the same as whichever is more restrictive of the two alignments. Tests included. rdar://17762530 llvm-svn: 214322
* [SDAG] Add DEBUG logging to the legalizer, fixing a "bug" found byChandler Carruth2014-07-281-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | inspection in the proccess, and shuffle the logging in the DAG combiner around a bit. With this it is much easier to follow what the legalizer is doing. It should even accurately present most of the strange legalization operations where a single node is replaced by multiple nodes, etc. There is still some information lost (we log SDNodes not SDValues so we don't log which result is used for which thing), but I think this is much closer to a usable system. Notably, this will make it *much* more apparant when legalization is actually happening inside the combiner, or when there is a cycle caused by interactions of the legalizer and the combiner. The "bug" I fixed here I'm not sure is remotely possible to trigger. We were only adding one of the nodes in a replacement to the updated set rather than all of the nodes in the replacement. Realistically, the worst result of this are nodes not getting back onto the worklist in the DAG combiner. I doubt it is possible to trigger this today, and I certainly don't have any ideas about how, but this at least brings the code into alignment with the principled operation of the routine. llvm-svn: 214105
* [SDAG] When performing post-legalize DAG combining, run the legalizerChandler Carruth2014-07-261-4/+14
| | | | | | | | | | | | | | | | | | | | | | over each node in the worklist prior to combining. This allows the combiner to produce new nodes which need to go back through legalization. This is particularly useful when generating operands to target specific nodes in a post-legalize DAG combine where the operands are significantly easier to express as pre-legalized operations. My immediate use case will be PSHUFB formation where we need to build a constant shuffle mask with a build_vector node. This also refactors the relevant functionality in the legalizer to support this, and updates relevant tests. I've spoken to the R600 folks and these changes look like improvements to them. The avx512 change needs to be investigated, I suspect there is a disagreement between the legalizer and the DAG combiner there, but it seems a minor issue so leaving it to be re-evaluated after this patch. Differential Revision: http://reviews.llvm.org/D4564 llvm-svn: 214020
* Store nodes only have 1 result.Matt Arsenault2014-07-251-1/+1
| | | | llvm-svn: 213928
OpenPOWER on IntegriCloud