path: root/llvm/lib/CodeGen/SelectionDAG
Commit message | Author | Age | Files | Lines
...
* Group unsafe-math optimizations for fsub into one block. No functional change. [Sanjay Patel, 2014-08-27, 1 file, -14/+17]
  llvm-svn: 216616
* [FastISel] Fix a potential bug in FastEmitInst_ri [Juergen Ributzka, 2014-08-27, 1 file, -2/+1]
  FastEmitInst_ri was constraining the first operand without checking if it is a virtual register. Use constrainOperandRegClass, as all the other FastEmitInst_* functions do.
  llvm-svn: 216613
* Use local variable to improve readability. [Sanjay Patel, 2014-08-27, 1 file, -15/+10]
  No functional change intended.
  llvm-svn: 216611
* Teach the AArch64 backend about v4f16 and v8f16 [Oliver Stannard, 2014-08-27, 1 file, -6/+17]
  This teaches the AArch64 backend to handle the operations on v4f16 and v8f16 which are exposed by NEON intrinsics, plus the add, sub, mul and div operations.
  llvm-svn: 216555
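  Illustration (a hedged sketch, not from the commit; the function name is invented): the kind of vector half-precision operation the backend can now select.
    define <4 x half> @add_v4f16(<4 x half> %a, <4 x half> %b) {
    entry:
      ; fadd on a NEON-sized half vector, one of the newly supported ops
      %r = fadd <4 x half> %a, %b
      ret <4 x half> %r
    }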
* [SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine. [Chandler Carruth, 2014-08-27, 1 file, -0/+2]
  This combine is essentially combining target-specific nodes back into target independent nodes that it "knows" will be combined yet again by a target independent DAG combine into a different set of target-independent nodes that are legal (not custom though!) and thus "ok". This seems... deeply flawed. The crux of the problem is that we don't combine un-legalized shuffles that are introduced by legalizing other operations, and thus we don't see a very profitable combine opportunity. So the backend just forces the input to that combine to re-appear.
  However, for this to work, the conditions detected to re-form the unlegalized nodes must be *exactly* right. Previously, failing this would have caused poor code (if you're lucky) or a crasher when we failed to select instructions. After r215611 we would fall back into the legalizer. In some cases, this just "fixed" the crasher by producing bad code. But in the test case added it caused the legalizer and the dag combiner to iterate forever.
  The fix is to make the alignment checking in the x86 side of things match the alignment checking in the generic DAG combine exactly. This isn't really a satisfying or principled fix, but it at least makes the code work as intended. It also highlights that it would be nice to detect the availability of under-aligned loads for a given type rather than bailing on this optimization. I've left a FIXME to document this.
  Original commit message for r215611, which covers the rest of the change:
  [SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it *stops* being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this.
  llvm-svn: 216537
* Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created. [Craig Topper, 2014-08-27, 2 files, -9/+5]
  llvm-svn: 216525
* Use range based for loops to avoid needing to re-mention SmallPtrSet size. [Craig Topper, 2014-08-24, 2 files, -8/+6]
  llvm-svn: 216351
* Revert r215611 because it caused the infinite loop in bug 20736. There is a reduced testcase in that bug. [Nick Lewycky, 2014-08-23, 1 file, -2/+0]
  llvm-svn: 216307
* ARM / x86_64 varargs: Don't save regparms in prologue without va_start [Reid Kleckner, 2014-08-22, 1 file, -0/+8]
  There's no need to do this if the user doesn't call va_start. In the future, we're going to have thunks that forward these register parameters with musttail calls, and they won't need these spills for handling va_start.
  Most of the test suite changes are adding va_start calls to existing tests to keep things working.
  llvm-svn: 216294
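  Illustration (a hedged sketch, not from the commit; names are invented): a variadic callee that never calls @llvm.va_start, so its incoming register parameters no longer need to be spilled to the register save area in the prologue.
    define i32 @no_va_start(i32 %a, ...) {
    entry:
      ; The variadic arguments are never inspected, so no va_start is
      ; emitted and the register save area can be skipped entirely.
      %r = add i32 %a, 1
      ret i32 %r
    }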
* name change: isPow2DivCheap -> isPow2SDivCheap [Sanjay Patel, 2014-08-21, 1 file, -1/+1]
  isPow2DivCheap: that name doesn't specify signed or unsigned. Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong: srl/add/sra is not the general sequence for signed integer division by power-of-2. We need one more 'sra': sra/srl/add/sra. That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.
  This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods. No functional change intended.
  Differential Revision: http://reviews.llvm.org/D5010
  llvm-svn: 216237
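  Illustration (a hedged sketch, not from the patch): signed division by 4 written out as the corrected sra/srl/add/sra sequence, in IR terms (ashr = sra, lshr = srl); DAGCombiner emits the node-level equivalent of this.
    define i32 @sdiv_by_4(i32 %x) {
    entry:
      %sign = ashr i32 %x, 31    ; sra: broadcast the sign bit
      %bias = lshr i32 %sign, 30 ; srl: 0 for x >= 0, 3 (2^2 - 1) for x < 0
      %tmp  = add i32 %x, %bias  ; add: round negative dividends toward zero
      %res  = ashr i32 %tmp, 2   ; sra: the divide itself
      ret i32 %res
    }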
* DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments. [Benjamin Kramer, 2014-08-21, 1 file, -1/+6]
  PR20677
  llvm-svn: 216175
* [ARM] Enable DP copy, load and store instructions for FPv4-SP [Oliver Stannard, 2014-08-21, 2 files, -17/+28]
  The FPv4-SP floating-point unit is generally referred to as single-precision only, but it does have double-precision registers and load, store and GPR<->DPR move instructions which operate on them. This patch enables the use of these registers, the main advantage of which is that we now comply with the AAPCS-VFP calling convention.
  This partially reverts r209650, which added some AAPCS-VFP support, but did not handle return values or alignment of double arguments in registers.
  This patch also adds tests for Thumb2 code generation for floating-point instructions and intrinsics, which previously only existed for ARM.
  llvm-svn: 216172
* Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. [Craig Topper, 2014-08-21, 2 files, -4/+4]
  llvm-svn: 216158
* Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type". [Jiangning Liu, 2014-08-21, 2 files, -48/+3]
  llvm-svn: 216147
* Fix null reference creation in SelectionDAG constructor. [Alexey Samsonov, 2014-08-20, 1 file, -10/+7]
  Store TargetSelectionDAGInfo as a pointer instead of a reference: getSelectionDAGInfo() may not be implemented for certain backends (e.g. it's not currently implemented for R600). This bug is reported by UBSan.
  llvm-svn: 216129
* Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type legalization stage. [Jiangning Liu, 2014-08-20, 2 files, -3/+48]
  With those two optimizations, fewer signed/zero extension instructions can be inserted, and then we can expose more opportunities to the Machine CSE pass in the back-end.
  llvm-svn: 216066
* Reapply [FastISel] Let the target decide first if it wants to materialize a constant (215588). [Juergen Ributzka, 2014-08-19, 1 file, -15/+21]
  Note: This was originally reverted to track down a buildbot error. This commit exposed a latent bug that was fixed in r215753. Therefore it is reapplied without any modifications.
  I ran it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new regressions.
  Original commit message:
  This changes the order in which FastISel tries to materialize a constant. Originally it would try to use a simple target-independent approach, which can lead to the generation of inefficient code.
  On X86 this would result in the use of movabsq to materialize any 64bit integer constant - even for simple and small values such as 0 and 1. Also some very funny floating-point materialization could be observed.
  On AArch64 it would materialize the constant 0 in a register even though the architecture has an actual "zero" register.
  On ARM it would generate unnecessary mov instructions or not use mvn.
  This change simply changes the order and always asks the target first if it wants to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations.
  Related to <rdar://problem/17420988>.
  llvm-svn: 216006
* Teach the AArch64 backend to handle f16 [Oliver Stannard, 2014-08-18, 1 file, -0/+3]
  This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv operations on f16 (half-precision) types by promoting to f32.
  llvm-svn: 215891
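  Illustration (a hedged sketch, not from the commit): what the f32 promotion amounts to at the IR level; the second function shows roughly how the first is legalized.
    define half @add_f16(half %a, half %b) {
    entry:
      %r = fadd half %a, %b
      ret half %r
    }
    ; ...is handled approximately as if it had been written as:
    define half @add_f16_promoted(half %a, half %b) {
    entry:
      %a32 = fpext half %a to float
      %b32 = fpext half %b to float
      %r32 = fadd float %a32, %b32
      %r   = fptrunc float %r32 to half
      ret half %r
    }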
* Revert "Repace SmallPtrSet with SmallPtrSetImpl in function arguments to ↵Craig Topper2014-08-182-4/+4
| | | | | | | | avoid needing to mention the size." Getting a weird buildbot failure that I need to investigate. llvm-svn: 215870
* Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. [Craig Topper, 2014-08-17, 2 files, -4/+4]
  llvm-svn: 215868
* Fix fmul combines with constant splat vectors [Matt Arsenault, 2014-08-16, 1 file, -7/+26]
  Fixes things like fmul x, 2 -> fadd x, x
  llvm-svn: 215820
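  Illustration (a hedged sketch, not from the commit's tests): the vector form of the fold. Multiplying by 2.0 is exact in binary floating point, so the fadd form is always equivalent.
    define <2 x float> @mul_by_2(<2 x float> %x) {
    entry:
      ; A multiply by a constant splat of 2.0...
      %y = fmul <2 x float> %x, <float 2.0, float 2.0>
      ret <2 x float> %y
    }
    ; ...can now be combined to: %y = fadd <2 x float> %x, %x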
* [DAGCombiner] Improve the folding of target independent shuffles to Undef. [Andrea Di Biagio, 2014-08-16, 1 file, -0/+16]
  When combining a pair of shuffle nodes, check if the combined shuffle mask is trivially Undef. In that case, immediately fold that pair of shuffles to Undef.
  The lack of checks for undef masks was the root cause of a poor-codegen bug in the dag combiner.
  Example:
    %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
    %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
    %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>
  Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire sequence to an Undef value and therefore we generated:
    shufps $-123, %xmm1, %xmm0
    pshufd $-46, %xmm0, %xmm0
  With this patch, the entire shuffle sequence is folded to Undef and no shuffles are generated in the output assembly.
  Added new test cases to test 'combine-vec-shuffle-5.ll'.
  llvm-svn: 215797
* [FastISel] Remove a performance-debugging assert. [Juergen Ributzka, 2014-08-15, 1 file, -1/+0]
  As Jim pointed out, this assert isn't really needed to test for correctness, because the code right afterwards does the same check and falls back to SelectionDAG - as intended.
  llvm-svn: 215735
* Revert several FastISel commits to track down a buildbot error. [Juergen Ributzka, 2014-08-14, 1 file, -21/+15]
  This reverts:
    r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
    r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
    r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
    r215591 "[FastISel][AArch64] Make use of the zero register when possible."
    r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
    r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
  llvm-svn: 215673
* Optimize vector fneg of bitcasted integer value [Sanjay Patel, 2014-08-14, 1 file, -9/+14]
  This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.
  This patch is very similar to a fabs patch committed at r214892.
  Differential Revision: http://reviews.llvm.org/D4852
  llvm-svn: 215646
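  Illustration (a hedged sketch modeled on the fabs example from r214892 below; the constant is illustrative): fneg written as an fsub from -0.0, applied to a bitcasted integer constant, can be precomputed by flipping the sign bit of each lane at compile time.
    define i64 @fneg_bitcast_const() {
    entry:
      %bc  = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
      %neg = fsub <2 x float> <float -0.0, float -0.0>, %bc
      %ret = bitcast <2 x float> %neg to i64
      ret i64 %ret
    }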
* [SDAG] Fix a bug in the DAG combiner where we would fail to return the input node after manually adding it to the worklist and using CombineTo. [Chandler Carruth, 2014-08-14, 1 file, -5/+1]
  Once we use CombineTo the input node may have been deleted. Despite this being *completely confusing* and somewhat broken, the only way to "correctly" return from a DAG combine after potentially deleting the input node is to return *that exact node*....
  But really, this code should just never have used CombineTo. It won't do what it wants (returning the node as mentioned above just causes the combine to infloop). The correct way to combine away a casted load to a load of the correct type is to RAUW the chain directly and then return the loaded value to replace the actual value node.
  I managed to find this with the vector shuffle fuzzer even though it clearly has nothing at all to do with vector shuffles and rather those happen to trigger a load of a constant pool that hits this combine *just right*. I've included the test as it is small and a nice stress test that the infrastructure isn't asserting.
  llvm-svn: 215622
* [SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. [Chandler Carruth, 2014-08-14, 1 file, -0/+2]
  In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it *stops* being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for.
  It took many million runs of the shuffle fuzz tester to find this.
  llvm-svn: 215611
* [FastISel] Let the target decide first if it wants to materialize a constant. [Juergen Ributzka, 2014-08-13, 1 file, -15/+21]
  This changes the order in which FastISel tries to materialize a constant. Originally it would try to use a simple target-independent approach, which can lead to the generation of inefficient code.
  On X86 this would result in the use of movabsq to materialize any 64bit integer constant - even for simple and small values such as 0 and 1. Also some very funny floating-point materialization could be observed.
  On AArch64 it would materialize the constant 0 in a register even though the architecture has an actual "zero" register.
  On ARM it would generate unnecessary mov instructions or not use mvn.
  This change simply changes the order and always asks the target first if it wants to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations.
  Related to <rdar://problem/17420988>.
  llvm-svn: 215588
* Canonicalize header guards into a common format. [Benjamin Kramer, 2014-08-13, 5 files, -10/+10]
  Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful).
  Changes made by clang-tidy with minor tweaks.
  llvm-svn: 215558
* [DAGCombiner] Improved target independent vector shuffle combine rule. [Andrea Di Biagio, 2014-08-13, 1 file, -10/+24]
  This patch improves the existing algorithm in DAGCombiner that attempts to fold shuffles according to the rule:
    shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)
  Before this change, there were cases where the DAGCombiner conservatively avoided folding shuffles even if the resulting mask would have been legal. That is because the algorithm wrongly assumed that commuting an illegal shuffle mask would always produce an illegal mask.
  With this change, we now correctly compute the commuted shuffle mask before calling method 'isShuffleMaskLegal' on it.
  On X86, this improves for example the codegen for the following function:
    define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
      %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
      %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
      ret <4 x i32> %2
    }
  Before this change the X86 backend (-mcpu=corei7) generated the following assembly code for function @test:
    shufps $-23, %xmm0, %xmm1  # xmm1 = xmm1[1,2],xmm0[2,3]
    movhlps %xmm1, %xmm1       # xmm1 = xmm1[1,1]
    movaps %xmm1, %xmm0
  Now we produce:
    movhlps %xmm0, %xmm0       # xmm0 = xmm0[1,1]
  Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly fold according to the above-mentioned rule.
  llvm-svn: 215555
* [PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic [Hal Finkel, 2014-08-13, 2 files, -3/+5]
  This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store intrinsics. As with the construction of the MachineMemOperands for the intrinsic calls used for unaligned load/store lowering, the only slight complication is that we need to represent a larger memory range than the loaded/stored value-type size (because the address is rounded down to an aligned address, and we need to conservatively represent the entire possible range of the actual access). This required adding an extra size field to TargetLowering::IntrinsicInfo, and this was done in a way that required no modifications to other targets (the size defaults to the store size of the provided memory data type).
  This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).
  llvm-svn: 215512
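  Illustration (a hedged sketch, assuming the standard @llvm.ppc.altivec.lvx signature; not from the commit): the kind of intrinsic call this covers.
    declare <4 x i32> @llvm.ppc.altivec.lvx(i8*)
    define <4 x i32> @altivec_load(i8* %p) {
    entry:
      ; lvx rounds %p down to a 16-byte boundary before loading, so the
      ; MachineMemOperand must conservatively cover a wider range than
      ; the 16-byte <4 x i32> result when the pointer's alignment is
      ; unknown; the new IntrinsicInfo size field makes that expressible.
      %v = call <4 x i32> @llvm.ppc.altivec.lvx(i8* %p)
      ret <4 x i32> %v
    }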
* [x86] Fold extract_vector_elt of a load into the load's address computation. [Michael J. Spencer, 2014-08-11, 1 file, -90/+125]
  llvm-svn: 215409
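  Illustration (a hedged sketch, not from the commit's tests): loading a whole vector only to extract one known lane can instead become a narrow load at a fixed offset.
    define i32 @extract_lane_2(<4 x i32>* %p) {
    entry:
      ; Before the fold: a 16-byte vector load feeding a single extract.
      %v = load <4 x i32>* %p
      %e = extractelement <4 x i32> %v, i32 2
      ret i32 %e
    }
    ; After the fold this is equivalent to a scalar i32 load from byte
    ; offset 8 of %p (lane 2 * 4 bytes), folded into the addressing mode.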
* Make this SmallVector size a power of two as suggested by Chandler [Hans Wennborg, 2014-08-11, 1 file, -1/+1]
  llvm-svn: 215355
* Increase the size of this SmallVector in CloneNodeWithValues. [Hans Wennborg, 2014-08-11, 1 file, -1/+1]
  In a Clang bootstrap, the size of this vector was always 6.
  llvm-svn: 215335
* Add support for scalarizing cttz_zero_undef [Petar Jovanovic, 2014-08-10, 1 file, -0/+1]
  Follow-up to r214266. Add missing case in ScalarizeVectorResult() for cttz_zero_undef.
  Differential Revision: http://reviews.llvm.org/D4813
  llvm-svn: 215330
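  Illustration (a hedged sketch, not from the patch): an input that exercises the new case; the i1 true operand marks the op as cttz_zero_undef (result undefined for a zero input).
    declare <1 x i32> @llvm.cttz.v1i32(<1 x i32>, i1)
    define <1 x i32> @cttz_zu_v1i32(<1 x i32> %x) {
    entry:
      ; ScalarizeVectorResult() now turns this single-element vector op
      ; into its scalar i32 counterpart.
      %r = call <1 x i32> @llvm.cttz.v1i32(<1 x i32> %x, i1 true)
      ret <1 x i32> %r
    }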
* Added a TLI hook to signal that the target does not have or does not care about floating point exceptions [Pedro Artigas, 2014-08-08, 1 file, -5/+10]
  Added use of the flag to fold potentially exception-raising floating point math in the selection DAG. No functionality change, as targets have to explicitly ask for this behavior and none does today.
  llvm-svn: 215222
* [pr19635] Revert most of r170537, and add new testcase. [Patrik Hagglund, 2014-08-08, 1 file, -1/+1]
  Patch provided by Andrey Kuharev. Sorry, r170537 was obviously wrong.
  llvm-svn: 215190
* [stack protector] Look through bitcasts to get global variable __stack_chk_guard. [Akira Hatanaka, 2014-08-07, 1 file, -9/+19]
  Handle the case where the pointer operand of the load instruction that loads the stack guard is not a global variable but instead a bitcast:
    %StackGuard = load i8** bitcast (i64** @__stack_chk_guard to i8**)
    call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)
  Original test case provided by Ana Pazos. This fixes PR20558.
  llvm-svn: 215167
* Temporarily Revert "Nuke the old JIT." as it's not quite ready to be deleted. [Eric Christopher, 2014-08-07, 1 file, -2/+3]
  This will be reapplied as soon as possible and before the 3.6 branch date at any rate.
  Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
  This reverts commits r215111, 215115, 215116, 215117, 215136.
  llvm-svn: 215154
* Nuke the old JIT. [Rafael Espindola, 2014-08-07, 1 file, -3/+2]
  I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start.
  Thanks to Lang Hames for making MCJIT a good replacement!
  llvm-svn: 215111
* Optimize vector fabs of bitcasted constant integer values. [Sanjay Patel, 2014-08-05, 1 file, -9/+15]
  Allow vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs.
  So for code like this:
    %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
    %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
    %ret = bitcast <2 x float> %fabs to i64
  Instead of generating something like this:
    movabsq (constant pool load of mask for sign bits)
    vmovq   (move from integer register to vector/fp register)
    vandps  (mask off sign bits)
    vmovq   (move vector/fp register back to integer return register)
  We should generate:
    mov     (put constant value in return register)
  I have also removed a redundant clause in the first 'if' statement:
    N0.getOperand(0).getValueType().isInteger()
  is the same thing as:
    IntVT.isInteger()
  Testcases for x86 and ARM added to existing files that deal with vector fabs. One existing testcase for x86 removed because it is no longer ideal.
  For more background, please see: http://reviews.llvm.org/D4770 and http://llvm.org/bugs/show_bug.cgi?id=20354
  Differential Revision: http://reviews.llvm.org/D4785
  llvm-svn: 214892
* Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier, and have the DAG use that to do the same lookup. [Eric Christopher, 2014-08-05, 5 files, -14/+8]
  This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily.
  Update the MIPS subtarget switching machinery to update this pointer at the same time it runs.
  llvm-svn: 214838
* [SDAG] Fix a really, really terrible bug in the DAG combiner. [Chandler Carruth, 2014-08-04, 1 file, -2/+2]
  This code is completely wrong. It is also dead, as if it were to *ever* run, it would crash. Fortunately, after my work to the combiner, it is at least *possible* to reach the code, and llvm-stress has found a test case. Thanks to Patrick for reporting.
  It would be really good if anyone who remembers how this code works and what it was intended to do could add some more obvious test coverage instead of my completely contrived and reduced test case. My test case was so brittle I left a bread crumb comment in it to help the next person to stumble on it and not know what it was actually testing for.
  llvm-svn: 214785
* Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. [Eric Christopher, 2014-08-04, 14 files, -164/+228]
  llvm-svn: 214781
* [x86] Don't add nodes to the combined set (and prune subsequent combines) until they are legal. [Chandler Carruth, 2014-08-03, 1 file, -8/+8]
  Doing it the old way could, when the stars align *just* right, cause a node to get into the combine set prior to being legalized. Then, when the same node showed up as an operand to another node later on (but not so much later on that it had been deleted as dead) we would fail to add it back to the worklist thinking it had already been combined. This would in turn cause it to not be legalized.
  Fortunately, we can also walk the operands looking for uncombined (and thus potentially un-legalized) nodes late. It will still ensure that we walk all operands of all nodes and send all of them through both the legalizer without changes and the combiner at least once. (Which was the original goal of this.)
  I have a test case for this bug, but it is terribly brittle. For example, it will stop finding the bug the moment I enable the new shuffle lowering. I don't yet have any test case that reliably exercises this bug, and it isn't clear that it will be possible to craft one. It is entirely possible that with the new shuffle lowering the two forms of doing this are precisely equivalent. That doesn't mean we shouldn't take the more conservative approach of insisting on things in the combined set having survived the legalizer.
  llvm-svn: 214673
* Fix for PR20354 - miscompile of fabs due to vectorization [Sanjay Patel, 2014-08-03, 1 file, -1/+5]
  This is intended to be the minimal change needed to fix PR20354 (http://llvm.org/bugs/show_bug.cgi?id=20354). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation.
  This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon.
  There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too.
  llvm-svn: 214670
* [AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available. [James Molloy, 2014-08-02, 1 file, -0/+7]
  The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers!
  llvm-svn: 214634
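  Illustration (a hedged sketch, not from the commit's tests): two adjacent i64 loads that the combiner would previously have merged into one <2 x i64> load, forcing the value into a Q register.
    define void @copy_pair(i64* %src, i64* %dst) {
    entry:
      ; Merged into a <2 x i64> load/store, this value lives in a Q
      ; register; if it is live across a call it must be spilled, since
      ; AArch64 has no callee-save Q registers. Paired LDP/STP of two X
      ; registers avoids that.
      %hi.addr = getelementptr i64* %src, i64 1
      %lo = load i64* %src
      %hi = load i64* %hi.addr
      %dst.hi = getelementptr i64* %dst, i64 1
      store i64 %lo, i64* %dst
      store i64 %hi, i64* %dst.hi
      ret void
    }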
* [SDAG] Refactor the code which deletes nodes in the DAG combiner to do so using a single helper which adds operands back onto the worklist. [Chandler Carruth, 2014-08-02, 1 file, -54/+36]
  Several places didn't rigorously do this but a couple already did. Factoring them together and doing it rigorously is important to delete things recursively early on in the combiner and get a chance to see accurate hasOneUse values. While no existing test cases change, an upcoming patch to add DAG combining logic for PSHUFB requires this to work correctly.
  llvm-svn: 214623
* Fix issues with ISD::FNEG and ISD::FMA SDNodes where they would not be constant-folded during DAGCombine in certain circumstances. [Owen Anderson, 2014-08-02, 1 file, -0/+12]
  Unfortunately, the circumstances required to trigger the issue seem to require a pretty specific interaction of DAGCombines, and I haven't been able to find a testcase that reproduces on X86, ARM, or AArch64. The functionality added here is replicated in essentially every other DAG combine, so it seems pretty obviously correct.
  llvm-svn: 214622
* CodeGen: Remove commented out code [Justin Bogner, 2014-08-02, 1 file, -2/+0]
  These two lines have been commented out for over 4 years. They aren't helping anyone.
  llvm-svn: 214615