summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG
Commit message (Collapse)AuthorAgeFilesLines
...
* Refactor reciprocal square root estimate into target-independent function; NFC.Sanjay Patel2014-09-211-17/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | This is purely a plumbing patch. No functional changes intended. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner. Next steps: The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added. Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner. X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added. Differential Revision: http://reviews.llvm.org/D5425 llvm-svn: 218219
* Fix crash with an insertvalue that produces an empty object.Peter Collingbourne2014-09-201-0/+6
| | | | llvm-svn: 218171
* Optionally enable more-aggressive FMA formation in DAGCombineHal Finkel2014-09-191-5/+10
| | | | | | | | | | | | | | | | | The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one use, but this is overly-conservative on some systems. Specifically, if the FMA and the FADD have the same latency (and the FMA does not compete for resources with the FMUL any more than the FADD does), there is no need for the restriction, and furthermore, forming the FMA leaving the FMUL can still allow for higher overall throughput and decreased critical-path length. Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to elide the hasOneUse check. This is enabled for PowerPC by default, as most PowerPC systems will benefit. Patch by Olivier Sallenave, thanks! llvm-svn: 218120
* Optimize sext/zext insertion algorithm in back-end.Jiangning Liu2014-09-193-8/+61
| | | | | | | | With this optimization, we will not always insert zext for values crossing basic blocks, but insert sext if the users of a value crossing basic block has preference of sign predicate. llvm-svn: 218101
* Replace repeated null checks with an assert. NFC.Sanjay Patel2014-09-151-18/+14
| | | | | | | Without a vector to hold the created ops, these functions don't have any use. llvm-svn: 217831
* [FastISel] Move optimizeCmpPredicate to FastISel base class. NFC.Juergen Ributzka2014-09-151-0/+40
| | | | | | Make the optimizeCmpPredicate function available to all targets. llvm-svn: 217822
* Replace dead links to "Hacker's Delight" with general references. NFC.Sanjay Patel2014-09-153-10/+10
| | | | llvm-svn: 217814
* Allow targets to custom legalize vector insertion and extraction.Owen Anderson2014-09-121-0/+8
| | | | llvm-svn: 217711
* Legalizer: Use the scalar bit width when promoting bit counting instrs onBenjamin Kramer2014-09-121-5/+6
| | | | | | | | | vectors. e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fixup the result by 32 bits, not 64. PR20917. llvm-svn: 217671
* Add DAG combine for shl + add of constants.Matt Arsenault2014-09-111-32/+12
| | | | | | | | | | | | | | Do (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) This is already done for multiplies, but since multiplies by powers of two are turned into shifts, we also need to handle it here. This might want checks for isLegalAddImmediate to avoid transforming an add of a legal immediate with one that isn't. llvm-svn: 217610
* Combine fmul vector FP constants when unsafe math is allowed.Sanjay Patel2014-09-111-6/+22
| | | | | | | | | | | | | | | | | | | This is an extension of the change made with r215820: http://llvm.org/viewvc/llvm-project?view=revision&revision=215820 That patch allowed combining of splatted vector FP constants that are multiplied. This patch allows combining non-uniform vector FP constants too by relaxing the check on the type of vector. Also, canonicalize a vector fmul in the same way that we already do for scalars - if only one operand of the fmul is a constant, make it operand 1. Otherwise, we miss potential folds. This fold is also done by -instcombine, but it's possible that extra fmuls may have been generated during lowering. Differential Revision: http://reviews.llvm.org/D5254 llvm-svn: 217599
* Build correct vector filled with undef nodesDavid Xu2014-09-111-4/+20
| | | | llvm-svn: 217570
* Fast-ISel: Remove dead code after falling back from selecting call ↵Hans Wennborg2014-09-081-15/+10
| | | | | | | | | | | | | | | | | instructions (PR20863) Previously, fast-isel would not clean up after failing to select a call instruction, because it would have called flushLocalValueMap() which moves the insertion point, making SavedInsertPt in selectInstruction() invalid. Fixing this by making SavedInsertPt a member variable, and having flushLocalValueMap() update it. This removes some redundant code at -O0, and more importantly fixes PR20863. Differential Revision: http://reviews.llvm.org/D5249 llvm-svn: 217401
* Group unsafe fmul math folds together for easier reading. No functional change.Sanjay Patel2014-09-081-6/+10
| | | | llvm-svn: 217399
* Fix the FIXME that was just added in r217390 - remove a bunch of redundant ↵Sanjay Patel2014-09-081-43/+2
| | | | | | | | fold permutations. The testcases for these folds already exist in test/CodeGen/X86/fp-fast.ll. llvm-svn: 217393
* group unsafe math folds together for easier readingSanjay Patel2014-09-081-150/+142
| | | | | | Also added a FIXME regarding redundant folds for non-canonicalized constants. llvm-svn: 217390
* Allow vector fsub ops with constants to get the same optimizations as scalars.Sanjay Patel2014-09-051-2/+2
| | | | | | | | This problem is bigger than just fsub, but this is the minimum fix to solve fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ), and we solve zero subtraction with the same change. llvm-svn: 217286
* clean up; NFCSanjay Patel2014-09-051-2/+2
| | | | llvm-svn: 217278
* [FastISel][tblgen] Rename tblgen generated FastISel functions. NFC.Juergen Ributzka2014-09-031-50/+50
| | | | | | | | | | This is the final round of renaming. This changes tblgen to emit lower-case function names for FastEmitInst_* and FastEmit_*, and updates all its uses in the source code. Reviewed by Eric llvm-svn: 217075
* [FastISel] Rename public visible FastISel functions. NFC.Juergen Ributzka2014-09-032-46/+44
| | | | | | | | | | | | | | | | | | | | | This commit renames the following public FastISel functions: LowerArguments -> lowerArguments SelectInstruction -> selectInstruction TargetSelectInstruction -> fastSelectInstruction FastLowerArguments -> fastLowerArguments FastLowerCall -> fastLowerCall FastLowerIntrinsicCall -> fastLowerIntrinsicCall FastEmitZExtFromI1 -> fastEmitZExtFromI1 FastEmitBranch -> fastEmitBranch UpdateValueMap -> updateValueMap TargetMaterializeConstant -> fastMaterializeConstant TargetMaterializeAlloca -> fastMaterializeAlloca TargetMaterializeFloatZero -> fastMaterializeFloatZero LowerCallTo -> lowerCallTo Reviewed by Eric llvm-svn: 217074
* Remove resetSubtargetFeatures as it is unused.Eric Christopher2014-09-031-3/+0
| | | | llvm-svn: 217071
* [FastISel] Some long overdue spring cleaning of FastISel.Juergen Ributzka2014-09-031-378/+333
| | | | | | | | | | | | | Things got a little bit messy over the years and it is time for a little bit spring cleaning. This first commit is focused on the FastISel base class itself. It doxyfies all comments, C++11fies the code where it makes sense, renames internal methods to adhere to the coding standard, and clang-formats the files. Reviewed by Eric llvm-svn: 217060
* Reinstate "Nuke the old JIT."Eric Christopher2014-09-021-3/+2
| | | | | | | | Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 216982
* [FastISel] Provide the option to skip target-independent instruction ↵Juergen Ributzka2014-09-021-18/+24
| | | | | | | | | | | | | selection. NFC. This allows the target to disable target-independent instruction selection and jump directly into the target-dependent instruction selection code. This can be beneficial for targets, such as AArch64, which could emit much better code, but never got a chance to do so, because the target-independent instruction selector was able to find an instruction sequence. llvm-svn: 216947
* Fix interference caused by fmul 2, x -> fadd x, xMatt Arsenault2014-09-021-8/+21
| | | | | | | | If an fmul was introduced by lowering, it wouldn't be folded into a multiply by a constant since the earlier combine would have replaced the fmul with the fadd. llvm-svn: 216932
* CodeGen: Handle va_start in the entry blockReid Kleckner2014-09-021-24/+16
| | | | | | | | | Also fix a small copy-paste bug in X86ISelLowering where Chain should have been used in place of DAG.getEntryToken(). Fixes PR20828. llvm-svn: 216929
* Fix comment and unnecessary check for FP build_vectors.Matt Arsenault2014-09-021-5/+1
| | | | | | | This was copy-paste from the integer version, but FP build_vectors don't truncate. llvm-svn: 216928
* Change MCSchedModel to be a struct of statically initialized data.Pete Cooper2014-09-021-1/+1
| | | | | | | | This removes static initializers from the backends which generate this data, and also makes this struct match the other Tablegen generated structs in behaviour Reviewed by Andy Trick and Chandler C llvm-svn: 216919
* Enable splitting indexing from loads with TargetConstantsHal Finkel2014-09-021-8/+21
| | | | | | | | | | | | When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant offsets, as there is no guarantee that a backend can handle them on generic ADDs (even if it generates them during address-mode matching) -- and, specifically, applying this transformation directly with TargetConstants caused a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less than ideal. Instead, for non-opaque constants, we can convert them into regular constants for use with the generated ADD (or SUB). llvm-svn: 216908
* Revert "Revert '[DAGCombiner] Split up an indexed load if only the base ↵Hal Finkel2014-09-021-4/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pointer value is live'" I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The underlying cause of the failure is that pre-inc loads with increments represented by ISD::TargetConstants were being transformed into ISD:::ADDs with ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they were selected as invalid r+r adds. This recommits r208640, rebased and with an exclusion for ISD::TargetConstant increments. This behavior seems correct, although in the future we might want to ask the target to split out the indexing that uses ISD::TargetConstants. Unfortunately, I don't yet have small test case where the relevant invalid 'add' instruction is not itself dead (and thus eliminated by DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things) Original commit message (by Adam Nemet): Right now the load may not get DCE'd because of the side-effect of updating the base pointer. This can happen if we lower a read-modify-write of an illegal larger type (e.g. i48) such that the modification only affects one of the subparts (the lower i32 part but not the higher i16 part). See the testcase. In order to spot the dead load we need to revisit it when SimplifyDemandedBits decided that the value of the load is masked off. This is the CommitTargetLoweringOpt piece. I checked compile time with ARM64 by sending SPEC bitcode files through llc. No measurable change. Fixes <rdar://problem/16031651> llvm-svn: 216898
* Add a const and munge some commentsReid Kleckner2014-08-291-1/+1
| | | | llvm-svn: 216781
* musttail: Forward regparms of variadic functions on x86_64Reid Kleckner2014-08-291-0/+7
| | | | | | | | | | | | | | | | | | | | | | Summary: If a variadic function body contains a musttail call, then we copy all of the remaining register parameters into virtual registers in the function prologue. We track the virtual registers through the function body, and add them as additional registers to pass to the call. Because this is all done in virtual registers, the register allocator usually gives us good code. If the function does a call, however, it will have to spill and reload all argument registers (ew). Forwarding regparms on x86_32 is not implemented because most compilers don't support varargs in 32-bit with regparms. Reviewers: majnemer Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5060 llvm-svn: 216780
* Do not assume the value passed to memset is an i32.Job Noorman2014-08-291-8/+1
| | | | | | | | | | | | | The code in SelectionDAG::getMemset for some reason assumes the value passed to memset is an i32. This breaks the generated code for targets that only have registers smaller than 32 bits because the value might get split into multiple registers by the calling convention. See the test for the MSP430 target included in the patch for an example. This patch ensures that nothing is assumed about the type of the value. Instead, the type is taken from the selected overload of the llvm.memset intrinsic. llvm-svn: 216716
* Move FNEG next to FABS and make them more similar, so it's easier that they ↵Sanjay Patel2014-08-281-43/+46
| | | | | | can be refactored. NFC. llvm-svn: 216688
* Do not introduce new shuffle patterns after operation legalization if ↵Owen Anderson2014-08-281-2/+1
| | | | | | | | | SHUFFLE_VECTOR was marked custom. The target independent DAG combine has no way to know if the shuffles it is introducing are ones that the target could support or not. llvm-svn: 216678
* Janitorial services: "Don’t duplicate function or class name at the ↵Sanjay Patel2014-08-281-134/+119
| | | | | | beginning of the comment." llvm-svn: 216674
* Remove local TLI vars that are just duplicates of the class var. No ↵Sanjay Patel2014-08-281-2/+0
| | | | | | functional change. llvm-svn: 216673
* Use local vars to improve readability. No functional change.Sanjay Patel2014-08-281-42/+37
| | | | | | | Completes what was started in r216611 and r216623. Used const refs instead of pointers; not sure if one is preferable to the other. llvm-svn: 216672
* [FastISel] Undo phi node updates when falling-back to SelectionDAG.Juergen Ributzka2014-08-281-4/+7
| | | | | | | | | | | | | | | | | | | | The included test case would fail, because the MI PHI node would have two operands from the same predecessor. This problem occurs when a switch instruction couldn't be selected. This happens always, because there is no default switch support for FastISel to begin with. The problem was that FastISel would first add the operand to the PHI nodes and then fall-back to SelectionDAG, which would then in turn add the same operands to the PHI nodes again. This fix removes these duplicate PHI node operands by reseting the PHINodesToUpdate to its original state before FastISel tried to select the instruction. This fixes <rdar://problem/18155224>. llvm-svn: 216640
* [FastISel]Juergen Ributzka2014-08-281-1/+8
| | | | | | | | | | | | | | | | | | | | Currently instructions are folded very aggressively for AArch64 into the memory operation, which can lead to the use of killed operands: %vreg1<def> = ADDXri %vreg0<kill>, 2 %vreg2<def> = LDRBBui %vreg0, 2 ... = ... %vreg1 ... This usually happens when the result is also used by another non-memory instruction in the same basic block, or any instruction in another basic block. This fix teaches hasTrivialKill to not only check the LLVM IR that the value has a single use, but also to check if the register that represents that value has already been used. This can happen when the instruction with the use was folded into another instruction (in this particular case a load instruction). This fixes rdar://problem/18142857. llvm-svn: 216634
* Use local variable in visitFADD. No functional change.Sanjay Patel2014-08-271-13/+11
| | | | llvm-svn: 216623
* Group unsafe-math optimizations for fsub into one block. No functional change.Sanjay Patel2014-08-271-14/+17
| | | | llvm-svn: 216616
* [FastISel] Fix a potential bug in FastEmitInst_riJuergen Ributzka2014-08-271-2/+1
| | | | | | | | FastEmitInst_ri was constraining the first operand without checking if it is a virtual register. Use constrainOperandRegClass as all the other FastEmitInst_* functions. llvm-svn: 216613
* Use local variable to improve readability. Sanjay Patel2014-08-271-15/+10
| | | | | | No functional change intended. llvm-svn: 216611
* Teach the AArch64 backend about v4f16 and v8f16Oliver Stannard2014-08-271-6/+17
| | | | | | | | This teaches the AArch64 backend to deal with the operations required to deal with the operations on v4f16 and v8f16 which are exposed by NEON intrinsics, plus the add, sub, mul and div operations. llvm-svn: 216555
* [SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine.Chandler Carruth2014-08-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This combine is essentially combining target-specific nodes back into target independent nodes that it "knows" will be combined yet again by a target independent DAG combine into a different set of target-independent nodes that are legal (not custom though!) and thus "ok". This seems... deeply flawed. The crux of the problem is that we don't combine un-legalized shuffles that are introduced by legalizing other operations, and thus we don't see a very profitable combine opportunity. So the backend just forces the input to that combine to re-appear. However, for this to work, the conditions detected to re-form the unlegalized nodes must be *exactly* right. Previously, failing this would have caused poor code (if you're lucky) or a crasher when we failed to select instructions. After r215611 we would fall back into the legalizer. In some cases, this just "fixed" the crasher by produces bad code. But in the test case added it caused the legalizer and the dag combiner to iterate forever. The fix is to make the alignment checking in the x86 side of things match the alignment checking in the generic DAG combine exactly. This isn't really a satisfying or principled fix, but it at least make the code work as intended. It also highlights that it would be nice to detect the availability of under aligned loads for a given type rather than bailing on this optimization. I've left a FIXME to document this. Original commit message for r215611 which covers the rest of the chang: [SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it *stops* being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this. llvm-svn: 216537
* Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or ↵Craig Topper2014-08-272-9/+5
| | | | | | just letting them be implicitly created. llvm-svn: 216525
* Use range based for loops to avoid needing to re-mention SmallPtrSet size.Craig Topper2014-08-242-8/+6
| | | | llvm-svn: 216351
* Revert r215611 because it caused the infinite loop in bug 20736. There is a ↵Nick Lewycky2014-08-231-2/+0
| | | | | | reduced testcase in that bug. llvm-svn: 216307
* ARM / x86_64 varargs: Don't save regparms in prologue without va_startReid Kleckner2014-08-221-0/+8
| | | | | | | | | | | | There's no need to do this if the user doesn't call va_start. In the future, we're going to have thunks that forward these register parameters with musttail calls, and they won't need these spills for handling va_start. Most of the test suite changes are adding va_start calls to existing tests to keep things working. llvm-svn: 216294
OpenPOWER on IntegriCloud