path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
* fix a buggy check that accidentally disabled this xform | Chris Lattner | 2006-10-15 | 1 | -1/+1
    llvm-svn: 30967
* Replace custom dispatch code with two uses of InstVisitor. Improves | Nick Lewycky | 2006-10-12 | 1 | -93/+113
    compile-time performance.
    llvm-svn: 30896
* Implement SROA of unions with mixed pointers/integers in them. This implements | Chris Lattner | 2006-10-08 | 1 | -10/+16
    PR892 and Transforms/ScalarRepl/union-pointer.ll:test2
    llvm-svn: 30825
* Implement Transforms/ScalarRepl/union-pointer.ll:test | Chris Lattner | 2006-10-08 | 1 | -9/+13
    llvm-svn: 30823
* add a new SimplifyDemandedVectorElts method, which works similarly to | Chris Lattner | 2006-10-05 | 1 | -8/+254
    SimplifyDemandedBits. The idea is that some operations can be simplified if
    not all of the computed elements are needed. Some targets (like x86) have a
    large number of intrinsics that operate on a single element, but pass other
    elts through unmodified. If those other elements are not needed, the
    intrinsics can be simplified to scalar operations, and insertelement ops can
    be removed.

    This turns (f.e.):

    ushort %Convert_sse(float %f) {
            %tmp = insertelement <4 x float> undef, float %f, uint 0 ; <<4 x float>> [#uses=1]
            %tmp10 = insertelement <4 x float> %tmp, float 0.000000e+00, uint 1 ; <<4 x float>> [#uses=1]
            %tmp11 = insertelement <4 x float> %tmp10, float 0.000000e+00, uint 2 ; <<4 x float>> [#uses=1]
            %tmp12 = insertelement <4 x float> %tmp11, float 0.000000e+00, uint 3 ; <<4 x float>> [#uses=1]
            %tmp28 = tail call <4 x float> %llvm.x86.sse.sub.ss( <4 x float> %tmp12, <4 x float> < float 1.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
            %tmp37 = tail call <4 x float> %llvm.x86.sse.mul.ss( <4 x float> %tmp28, <4 x float> < float 5.000000e-01, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
            %tmp48 = tail call <4 x float> %llvm.x86.sse.min.ss( <4 x float> %tmp37, <4 x float> < float 6.553500e+04, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00 > ) ; <<4 x float>> [#uses=1]
            %tmp59 = tail call <4 x float> %llvm.x86.sse.max.ss( <4 x float> %tmp48, <4 x float> zeroinitializer ) ; <<4 x float>> [#uses=1]
            %tmp = tail call int %llvm.x86.sse.cvttss2si( <4 x float> %tmp59 ) ; <int> [#uses=1]
            %tmp69 = cast int %tmp to ushort ; <ushort> [#uses=1]
            ret ushort %tmp69
    }

    into:

    ushort %Convert_sse(float %f) {
    entry:
            %tmp28 = sub float %f, 1.000000e+00 ; <float> [#uses=1]
            %tmp37 = mul float %tmp28, 5.000000e-01 ; <float> [#uses=1]
            %tmp375 = insertelement <4 x float> undef, float %tmp37, uint 0 ; <<4 x float>> [#uses=1]
            %tmp48 = tail call <4 x float> %llvm.x86.sse.min.ss( <4 x float> %tmp375, <4 x float> < float 6.553500e+04, float undef, float undef, float undef > ) ; <<4 x float>> [#uses=1]
            %tmp59 = tail call <4 x float> %llvm.x86.sse.max.ss( <4 x float> %tmp48, <4 x float> < float 0.000000e+00, float undef, float undef, float undef > ) ; <<4 x float>> [#uses=1]
            %tmp = tail call int %llvm.x86.sse.cvttss2si( <4 x float> %tmp59 ) ; <int> [#uses=1]
            %tmp69 = cast int %tmp to ushort ; <ushort> [#uses=1]
            ret ushort %tmp69
    }

    which improves codegen from:

    _Convert_sse:
            movss LCPI1_0, %xmm0
            movss 4(%esp), %xmm1
            subss %xmm0, %xmm1
            movss LCPI1_1, %xmm0
            mulss %xmm0, %xmm1
            movss LCPI1_2, %xmm0
            minss %xmm0, %xmm1
            xorps %xmm0, %xmm0
            maxss %xmm0, %xmm1
            cvttss2si %xmm1, %eax
            andl $65535, %eax
            ret

    to:

    _Convert_sse:
            movss 4(%esp), %xmm0
            subss LCPI1_0, %xmm0
            mulss LCPI1_1, %xmm0
            movss LCPI1_2, %xmm1
            minss %xmm1, %xmm0
            xorps %xmm1, %xmm1
            maxss %xmm1, %xmm0
            cvttss2si %xmm0, %eax
            andl $65535, %eax
            ret

    This is just a first step, it can be extended in many ways. Testcase here:
    Transforms/InstCombine/vec_demanded_elts.ll

    llvm-svn: 30752
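The demanded-elements idea in the commit above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from the commit: `InsertOp`, `demandedFromInput`, `insertIsRemovable`, and `removableInserts` are hypothetical names, and a vector is modelled as nothing more than a bitmask of demanded lanes (bit i set means lane i is needed by some user).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model (not LLVM's API): one insertelement in a chain,
// described only by the lane index it writes.
struct InsertOp {
  unsigned lane;
};

// Lanes demanded from the *input* vector of an insertelement, given the lanes
// demanded from its result. The written lane is overwritten by the insert, so
// it is never demanded from the input.
uint32_t demandedFromInput(uint32_t demanded, unsigned lane) {
  return demanded & ~(1u << lane);
}

// An insertelement whose written lane is not demanded has no observable
// effect and can be dropped.
bool insertIsRemovable(uint32_t demanded, unsigned lane) {
  return (demanded & (1u << lane)) == 0;
}

// Walk a chain of inserts from the last one back to the first, propagating
// the demanded mask, and report which inserts are removable.
std::vector<bool> removableInserts(const std::vector<InsertOp> &chain,
                                   uint32_t demanded) {
  std::vector<bool> removable(chain.size());
  for (std::size_t i = chain.size(); i-- > 0;) {
    removable[i] = insertIsRemovable(demanded, chain[i].lane);
    demanded = demandedFromInput(demanded, chain[i].lane);
  }
  return removable;
}
```

For the `%Convert_sse` example, the scalar intrinsic only needs lane 0 of its first operand, so calling `removableInserts` on the chain of inserts into lanes 0, 1, 2, 3 with a demanded mask of `0x1` marks the inserts into lanes 1 to 3 as removable and keeps the insert into lane 0. That mirrors the disappearance of `%tmp10`, `%tmp11`, and `%tmp12` in the transformed IR.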
* This case isn't implemented yet. It seems unlikely to be needed, but if it | Chris Lattner | 2006-10-04 | 1 | -4/+2
    ever is, we want to get an assert instead of silent bad codegen.
    llvm-svn: 30716
* Simplify logic further. | Nick Lewycky | 2006-10-03 | 1 | -17/+8
    Ensure that we copy KnownProperties before calling visitBasicBlock, else we
    may leak properties into blocks where they don't belong.
    llvm-svn: 30705
* Simplify, now that predsimplify depends on break-crit-edges. | Nick Lewycky | 2006-10-03 | 1 | -26/+8
    Fix SwitchInst where dest-block is the same as one of the cases.
    llvm-svn: 30700
* Move break-crit-edges before the predicate simplifier. Allows us to | Nick Lewycky | 2006-10-03 | 1 | -7/+3
    optimize in more cases.
    llvm-svn: 30699
* Revert previous patch. Still breaking things. | Evan Cheng | 2006-10-03 | 1 | -49/+1
    llvm-svn: 30698
* Fix PR932 and Analysis/Dominators/2006-10-02-BreakCritEdges.ll: | Chris Lattner | 2006-10-03 | 1 | -15/+112
    The critical edge block dominates the dest block if the destblock dominates
    all edges other than the one incoming from the critical edge.
    llvm-svn: 30696
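The dominance rule stated in the PR932 fix can be written down as a short check. The following is a minimal sketch, not the code from the commit: `Block` and `newBlockDominatesDest` are hypothetical names, the pre-split dominance relation is supplied by the caller, and the sketch assumes there is at most one edge from the split predecessor to the destination.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Block = int;  // hypothetical block id used only in this sketch

// After the critical edge (splitPred -> dest) is split by a new block, decide
// whether that new block dominates dest. 'preds' lists dest's predecessors
// before the split and 'dominates(a, b)' is the pre-split dominance relation.
// Assumption: splitPred has at most one edge to dest.
bool newBlockDominatesDest(
    Block dest, Block splitPred, const std::vector<Block> &preds,
    const std::function<bool(Block, Block)> &dominates) {
  for (Block p : preds) {
    if (p == splitPred)
      continue;  // this is the edge being split
    if (!dominates(dest, p))
      return false;  // dest is reachable via p without passing the new block
  }
  return true;  // every other incoming edge is already dominated by dest
}
```

As a usage illustration, take a CFG with edges 0→1, 0→3, 1→2, 1→3 and 2→1. Splitting 0→1 yields a block that dominates block 1, because the only other predecessor (the latch, block 2) is dominated by block 1. Splitting 0→3 does not, because block 3 does not dominate its other predecessor, block 1.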
* Fix a bug from r1.391 of this file, where we checked the size instead of | Chris Lattner | 2006-10-01 | 1 | -2/+2
    the alignment when promoting allocations. This implements
    InstCombine/cast.ll:test32
    llvm-svn: 30682
* Fix debug output | Chris Lattner | 2006-09-30 | 1 | -2/+1
    llvm-svn: 30680
* Implement SRA of heap allocations. | Chris Lattner | 2006-09-30 | 1 | -10/+266
    llvm-svn: 30679
* Add some ifdef'd out debug info | Chris Lattner | 2006-09-30 | 1 | -3/+30
    llvm-svn: 30676
* Eliminate ConstantBool::True and ConstantBool::False. Instead, provide | Chris Lattner | 2006-09-28 | 6 | -119/+113
    ConstantBool::getTrue() and ConstantBool::getFalse().
    llvm-svn: 30665
* Another attempt at making ArgPromotion smarter. This patch no longer breaks | Owen Anderson | 2006-09-28 | 1 | -1/+49
    Burg.
    llvm-svn: 30657
* simplify code | Chris Lattner | 2006-09-28 | 1 | -1/+1
    llvm-svn: 30656
* set DEBUG_TYPE right | Chris Lattner | 2006-09-27 | 1 | -0/+1
    llvm-svn: 30623
* Style changes only. Remove dead code, fix a comment. | Nick Lewycky | 2006-09-23 | 1 | -11/+4
    llvm-svn: 30588
* Be far more careful when splitting a loop header, either to form a preheader | Chris Lattner | 2006-09-23 | 1 | -1/+50
    or when splitting loops with a common header into multiple loops. In
    particular the old code would always insert the preheader before the old
    loop header. This is disastrous in cases where the loop hasn't been rotated.
    For example, it can produce code like:

    .. outside the loop...
            jmp LBB1_2 #bb13.outer
    LBB1_1: #bb1
            movsd 8(%esp,%esi,8), %xmm1
            mulsd (%edi), %xmm1
            addsd %xmm0, %xmm1
            addl $24, %edi
            incl %esi
            jmp LBB1_3 #bb13
    LBB1_2: #bb13.outer
            leal (%edx,%eax,8), %edi
            pxor %xmm1, %xmm1
            xorl %esi, %esi
    LBB1_3: #bb13
            movapd %xmm1, %xmm0
            cmpl $4, %esi
            jl LBB1_1 #bb1

    Note that the loop body is actually LBB1_1 + LBB1_3, which means that the
    loop now contains an uncond branch WITHIN it to jump around the inserted
    loop header (LBB1_2). Doh.

    This patch changes the preheader insertion code to insert it in the right
    spot, producing this code:

    ... outside the loop, fall into the header ...
    LBB1_1: #bb13.outer
            leal (%edx,%eax,8), %esi
            pxor %xmm0, %xmm0
            xorl %edi, %edi
            jmp LBB1_3 #bb13
    LBB1_2: #bb1
            movsd 8(%esp,%edi,8), %xmm0
            mulsd (%esi), %xmm0
            addsd %xmm1, %xmm0
            addl $24, %esi
            incl %edi
    LBB1_3: #bb13
            movapd %xmm0, %xmm1
            cmpl $4, %edi
            jl LBB1_2 #bb1

    Totally crazy, no branch in the loop! :)

    llvm-svn: 30587
* Teach UpdateDomInfoForRevectoredPreds to handle revectored preds that are not | Chris Lattner | 2006-09-23 | 1 | -91/+49
    reachable, making it general purpose enough for use by
    InsertPreheaderForLoop. Eliminate custom dominfo updating code in
    InsertPreheaderForLoop, using UpdateDomInfoForRevectoredPreds instead.
    llvm-svn: 30586
* Fix Transforms/IndVarsSimplify/2006-09-20-LFTR-Crash.ll | Chris Lattner | 2006-09-21 | 1 | -15/+22
    llvm-svn: 30555
* Don't rewrite ConstantExpr::get. | Nick Lewycky | 2006-09-21 | 1 | -44/+20
    llvm-svn: 30552
* Once we're down to "setcc type constant1, constant2", at least come up | Nick Lewycky | 2006-09-20 | 1 | -18/+14
    with the right answer.
    llvm-svn: 30550
* Use a total ordering to compare instructions. | Nick Lewycky | 2006-09-20 | 1 | -87/+101
    Fixes infinite loop in resolve().
    llvm-svn: 30540
* simplify | Andrew Lenharth | 2006-09-20 | 1 | -12/+8
    llvm-svn: 30535
* We went through all that trouble to compute whether it was safe to transform | Chris Lattner | 2006-09-20 | 1 | -6/+46
    this comparison, but never checked it. Whoops, no wonder we miscompiled
    177.mesa!
    llvm-svn: 30511
* Back out Chris' last set of changes. This breaks 177.mesa and povray somehow. | Evan Cheng | 2006-09-20 | 1 | -43/+6
    llvm-svn: 30505
* 80 col. | Evan Cheng | 2006-09-20 | 1 | -1/+2
    llvm-svn: 30504
* If we have an add, do it in the pointer realm, not the int realm. This is | Andrew Lenharth | 2006-09-19 | 1 | -0/+22
    critical in the linux kernel for pointer analysis correctness
    llvm-svn: 30496
* implement select.ll:test19-22 | Chris Lattner | 2006-09-19 | 1 | -6/+43
    llvm-svn: 30482
* Walk down the dominator tree instead of the control flow graph. That means | Nick Lewycky | 2006-09-18 | 1 | -150/+90
    that we can't modify the CFG any more, at least not until it's possible to
    update the dominator tree (PR217).
    llvm-svn: 30469
* Fix an infinite loop building the CFE | Chris Lattner | 2006-09-18 | 1 | -1/+2
    llvm-svn: 30465
* Implement a trivial optzn: if vastart is never called in a function that takes | Chris Lattner | 2006-09-18 | 1 | -2/+113
    ... args, remove the '...'.
    This is Transforms/DeadArgElim/dead_vaargs.ll
    llvm-svn: 30459
* Implement InstCombine/cast.ll:test31. This speeds up 462.libquantum by 26%. | Chris Lattner | 2006-09-18 | 1 | -4/+39
    llvm-svn: 30456
* Implement Transforms/InstCombine/shift-sra.ll:test0 | Chris Lattner | 2006-09-18 | 1 | -0/+20
    llvm-svn: 30450
* Rewrite shift/and/compare sequences to promote better licm of the RHS. | Chris Lattner | 2006-09-18 | 1 | -28/+48
    Use isLogicalShift/isArithmeticShift to simplify code.
    llvm-svn: 30448
* Fix Transforms/InstCombine/2006-09-15-CastToBool.ll and PR913 | Chris Lattner | 2006-09-16 | 1 | -0/+5
    llvm-svn: 30405
* revert previous two patches. They cause miscompilation of | Chris Lattner | 2006-09-15 | 1 | -35/+1
    MultiSource/Applications/Burg
    llvm-svn: 30397
* Revert my previous work on ArgumentPromotion. Further investigation has | Owen Anderson | 2006-09-15 | 1 | -34/+46
    revealed these changes to be incorrect. They just weren't showing up in any
    of our current testcases.
    llvm-svn: 30385
* Adding dllimport, dllexport and external weak linkage types. | Anton Korobeynikov | 2006-09-14 | 2 | -4/+6
    DLL* linkages got full (I hope) code generation support in C & both x86
    assembler backends. External weak linkage added for future use; we don't
    provide any code generation, etc. support for it.
    llvm-svn: 30374
* Second half of the fix for Transforms/Inline/inline_cleanup.ll | Chris Lattner | 2006-09-13 | 1 | -2/+28
    This folds unconditional branches that are often produced by code
    specialization.
    llvm-svn: 30307
* Add some more consistency checks. | Nick Lewycky | 2006-09-13 | 1 | -1/+20
    llvm-svn: 30305
* Fix unionSets so that it can merge correctly. | Nick Lewycky | 2006-09-13 | 1 | -22/+34
    llvm-svn: 30304
* Implement the first half of Transforms/Inline/inline_cleanup.ll | Chris Lattner | 2006-09-13 | 1 | -1/+9
    llvm-svn: 30303
* Erase dead instructions. | Nick Lewycky | 2006-09-13 | 1 | -2/+3
    llvm-svn: 30298
* Initialize DontInternalize. | Devang Patel | 2006-09-13 | 1 | -1/+2
    llvm-svn: 30281
* A sinkable instruction may exist with uses, if those uses are in dead blocks. | Chris Lattner | 2006-09-12 | 1 | -0/+4
    Handle this. This fixes PR908 and
    Transforms/LICM/2006-09-12-DeadUserOfSunkInstr.ll
    llvm-svn: 30275
* Fix PR905 and InstCombine/2006-09-11-EmptyStructCrash.ll | Chris Lattner | 2006-09-11 | 1 | -1/+2
    llvm-svn: 30266