summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/PeepholeOptimizer.cpp
Commit message (Collapse)AuthorAgeFilesLines
* PeepholeOpt cleanup/refactor; NFCMatthias Braun2018-01-111-440/+370
| | | | | | | | | | | | | | | | | | - Less unnecessary use of `auto` - Add early `using RegSubRegPair(AndIdx) =` to avoid countless `TargetInstrInfo::` qualifications. - Use references instead of pointers where possible. - Remove unused parameters. - Rewrite the CopyRewriter class hierarchy: - Pull out uncoalescable copy rewriting functionality into PeepholeOptimizer class. - Use an abstract base class to make it clear that rewriters are independent. - Remove unnecessary \brief in doxygen comments. - Remove unused constructor and method from ValueTracker. - Replace UseAdvancedTracking of ValueTracker with DisableAdvCopyOpt use. llvm-svn: 322325
* PeepholeOptimizer: Fix for vregs without defsMatthias Braun2018-01-111-3/+16
| | | | | | | | | | The PeepholeOptimizer would fail for vregs without a definition. If this was caused by an undef operand abort to keep the code simple (so we don't need to add logic everywhere to replicate the undef flag). Differential Revision: https://reviews.llvm.org/D40763 llvm-svn: 322319
* PeepholeOptimizer: Do not form PHI with subreg argumentsMatthias Braun2018-01-111-22/+19
| | | | | | | | | | | | | | | | | | | | | When replacing a PHI the PeepholeOptimizer currently takes the register class of the register at the first operand. This however is not correct if this argument has a subregister index. As there is currently no API to query the register class resulting from applying a subregister index to all registers in a class, we can only abort in these cases and not perform the transformation. This changes findNextSource() to require the end of all copy chains to not use a subregister if there is any PHI in the chain. I had to rewrite the overly complicated inner loop there to have a good place to insert the new check. This fixes https://llvm.org/PR33071 (aka rdar://32262041) Differential Revision: https://reviews.llvm.org/D40758 llvm-svn: 322313
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-151-1/+1
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* [CodeGen] Print "%vreg0" as "%0" in both MIR and debug outputFrancis Visoiu Mistrih2017-11-301-10/+10
| | | | | | | | | | | | | | | | | As part of the unification of the debug format and the MIR format, avoid printing "vreg" for virtual registers (which is one of the current MIR possibilities). Basically: * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g" * grep -nr '%vreg' . and fix if needed * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g" * grep -nr 'vreg[0-9]\+' . and fix if needed Differential Revision: https://reviews.llvm.org/D40420 llvm-svn: 319427
* [CodeGen] Print register names in lowercase in both MIR and debug outputFrancis Visoiu Mistrih2017-11-281-4/+4
| | | | | | | | | | | As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187
* Fix a bunch more layering of CodeGen headers that are in TargetDavid Blaikie2017-11-171-3/+3
| | | | | | | | All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490
* Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layeringDavid Blaikie2017-11-081-1/+1
| | | | | | | | This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation. llvm-svn: 317647
* [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-09-111-20/+39
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 312971
* Remove redundant copy in recurrencesTaewook Oh2017-06-291-3/+167
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: If there is a chain of instructions formulating a recurrence, commuting operands can help removing a redundant copy. In the following example code, ``` BB#1: ; Loop Header %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: ; Loop Latch %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0 %vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> ``` Existing two-address generation pass generates following code: ``` BB#1: %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: Predecessors according to CFG: BB#5 BB#4 %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1 %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0 %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10 %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> JMP_1 <BB#7> ``` This is suboptimal because the assembly code generated has a redundant copy at the end of #BB6 to feed %vreg13 to BB#1: ``` .LBB0_6: addl %esi, %edi addl %ebx, %edi cmpl $10, %edi movl %edi, %esi jl .LBB0_1 ``` This redundant copy can be elimiated by making instructions in the recurrence chain to compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 to compute %vreg10. With that change, code after two-address generation becomes ``` BB#1: %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: derived from LLVM BB %bb7 Predecessors according to CFG: BB#5 BB#4 %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0 %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1 %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10 %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> JMP_1 <BB#7> ``` and the final assembly does not have redundant copy: ``` .LBB0_6: addl %edi, %eax addl %ebx, %eax cmpl $10, %eax jl .LBB0_1 ``` Reviewers: qcolombet, MatzeB, wmi Reviewed By: wmi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31821 llvm-svn: 306758
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* PeepholeOptimizer: Do not replace SubregToReg(bitcast like)Matthias Braun2017-01-091-1/+10
| | | | | | | | | | | While we can usually replace bitcast like instructions (MachineInstr::isBitcast()) with a COPY this is not legal if any of the users uses SUBREG_TO_REG to assert the upper bits of the result are zero. Differential Revision: https://reviews.llvm.org/D28474 llvm-svn: 291483
* Extract LaneBitmask into a separate typeKrzysztof Parzyszek2016-12-151-2/+2
| | | | | | | | | | | | Specifically avoid implicit conversions from/to integral types to avoid potential errors when changing the underlying type. For example, a typical initialization of a "full" mask was "LaneMask = ~0u", which would result in a value of 0x00000000FFFFFFFF if the type was extended to uint64_t. Differential Revision: https://reviews.llvm.org/D27454 llvm-svn: 289820
* [peephole] Enhance folding logic to work for STATEPOINTsPhilip Reames2016-12-131-9/+19
| | | | | | | | | | | | | | The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are: Support for folding multiple operands at once which reference the same load Support for folding multiple loads into a single instruction Walk all the operands of the instruction for varidic instructions (this is a bug fix!) Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here. Differential Revision: https://reviews.llvm.org/D24103 llvm-svn: 289510
* Fix some Clang-tidy modernize-use-using and Include What You Use warnings; ↵Eugene Zelenko2016-08-251-4/+21
| | | | | | | | other minor fixes. Differential revision: https://reviews.llvm.org/D23861 llvm-svn: 279695
* PeepholeOptimizer: Make pass name match DEBUG_TYPEMatt Arsenault2016-07-081-2/+2
| | | | llvm-svn: 274874
* Fixed warning caused by r274402.Eric Liu2016-07-041-5/+5
| | | | llvm-svn: 274497
* PeepholeOptimizer: Relax assertMatt Arsenault2016-07-011-2/+4
| | | | | | Allow implicit defs llvm-svn: 274402
* CodeGen: Use MachineInstr& in TargetInstrInfo, NFCDuncan P. N. Exon Smith2016-06-301-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189
* Re-commit optimization bisect support (r267022) without new pass manager ↵Andrew Kaylor2016-04-221-1/+1
| | | | | | | | | | support. The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling). Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267231
* Revert "Initial implementation of optimization bisect support."Vedant Kumar2016-04-221-1/+1
| | | | | | | | This reverts commit r267022, due to an ASan failure: http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549 llvm-svn: 267115
* Initial implementation of optimization bisect support.Andrew Kaylor2016-04-211-1/+1
| | | | | | | | | | | | This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations. The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used. The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way. Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267022
* fix formatting; NFCSanjay Patel2015-12-291-11/+10
| | | | llvm-svn: 256574
* use range-based for-loop; NFCISanjay Patel2015-12-291-8/+6
| | | | llvm-svn: 256572
* don't repeat function names in comments; NFCSanjay Patel2015-12-291-14/+13
| | | | llvm-svn: 256569
* PeepholeOptimizer: Ignore dead implicit defsDan Gohman2015-12-101-0/+6
| | | | | | | | Target-specific instructions may have uninteresting physreg clobbers, for target-specific reasons. The peephole pass doesn't need to concern itself with such defs, as long as they're implicit and marked as dead. llvm-svn: 255182
* CodeGen peephole: fold redundant phys reg copiesJF Bastien2015-12-031-12/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code generation often exposes redundant physical register copies through virtual registers such as: %vreg = COPY %PHYSREG ... %PHYSREG = COPY %vreg There are cases where no intervening clobber of %PHYSREG occurs, and the later copy could therefore be removed. In some cases this further allows us to remove the initial copy. This patch contains a motivating example which comes from the x86 build of Chrome, specifically cc::ResourceProvider::UnlockForRead uses libstdc++'s implementation of hash_map. That example has two tests live at the same time, and after machine sinking LLVM has confused itself enough and things spilling EFLAGS is a great idea even though it's never restored and the comparison results are both live. Before this patch we have: DEC32m %RIP, 1, %noreg, <ga:@L>, %noreg, %EFLAGS<imp-def> %vreg1<def> = COPY %EFLAGS; GR64:%vreg1 %EFLAGS<def> = COPY %vreg1; GR64:%vreg1 JNE_1 <BB#1>, %EFLAGS<imp-use> Both copies are useless. This patch tries to eliminate the later copy in a generic manner. dec is especially confusing to LLVM when compared with sub. I wrote this patch to treat all physical registers generically, but only remove redundant copies of non-allocatable physical registers because the allocatable ones caused issues (e.g. when calling conventions weren't properly modeled) and should be handled later by the register allocator anyways. The following tests used to failed when the patch also replaced allocatable registers: CodeGen/X86/StackColoring.ll CodeGen/X86/avx512-calling-conv.ll CodeGen/X86/copy-propagation.ll CodeGen/X86/inline-asm-fpstack.ll CodeGen/X86/musttail-varargs.ll CodeGen/X86/pop-stack-cleanup.ll CodeGen/X86/preserve_mostcc64.ll CodeGen/X86/tailcallstack64.ll CodeGen/X86/this-return-64.ll This happens because COPY has other special meaning for e.g. dependency breakage and x87 FP stack. Note that all other backends' tests pass. Reviewers: qcolombet Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15157 llvm-svn: 254665
* Refactor: Simplify boolean conditional return statements in lib/CodeGen.Rafael Espindola2015-10-241-4/+1
| | | | | | Patch by Richard. llvm-svn: 251213
* PeepholeOptimizer: Remove redundant copiesMatt Arsenault2015-09-251-0/+79
| | | | | | | | | | | | If a virtual register is copied and another copy was already seen, replace with the previous copy. This only handles the simplest cases for now. This pattern shows up from various operand restrictions AMDGPU has which require inserting copies depending on the register class of the operands. llvm-svn: 248611
* Introduce target hook for optimizing register copiesMatt Arsenault2015-09-241-34/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisified with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478
* Remove dead declarationMatt Arsenault2015-09-241-1/+0
| | | | llvm-svn: 248471
* Fix typos / grammarMatt Arsenault2015-09-091-26/+26
| | | | llvm-svn: 247109
* Make helper functions static. NFC.Benjamin Kramer2015-08-201-1/+1
| | | | llvm-svn: 245549
* [PeepholeOptimizer] Look through PHIs to find additional register sourcesBruno Cardoso Lopes2015-08-191-85/+285
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reintroduce r245442. Remove an overly conservative assertion introduced in r245442. We could replace the assertion to use `shareSameRegisterFile` instead, but in that point in `insertPHI` we already lost the original Def subreg to check against. So drop the assertion completely. Original commit message: - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returnted by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coaslescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 245479
* Revert "[PeepholeOptimizer] Look through PHIs to find additional register ↵Bruno Cardoso Lopes2015-08-191-288/+85
| | | | | | | | | sources" Revert r245442 while investigating a fix. An assertion hit in http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/11380 llvm-svn: 245446
* [PeepholeOptimizer] Look through PHIs to find additional register sourcesBruno Cardoso Lopes2015-08-191-85/+288
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reapply r243486. - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returnted by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coaslescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 245442
* [X86] Allow x86 call frame optimization to fold more loads into pushesMichael Kuperstein2015-08-121-3/+3
| | | | | | | | | | This abstracts away the test for "when can we fold across a MachineInstruction" into the the MI interface, and changes call-frame optimization use the same test the peephole optimizer users. Differential Revision: http://reviews.llvm.org/D11945 llvm-svn: 244729
* Allow PeepholeOptimizer to fold a few more casesMichael Kuperstein2015-08-111-5/+4
| | | | | | | | | | The condition for clearing the folding candidate list was clamped together with the "uninteresting instruction" condition. This is too conservative, e.g. we don't need to clear the list when encountering an IMPLICIT_DEF. Differential Revision: http://reviews.llvm.org/D11591 llvm-svn: 244577
* Fix some comment typos.Benjamin Kramer2015-08-081-2/+2
| | | | llvm-svn: 244402
* Revert "[PeepholeOptimizer] Look through PHIs to find additional register ↵Bruno Cardoso Lopes2015-07-291-285/+82
| | | | | | | | | | sources" Reported to Broke some internal tests: PR24303 This reverts commit r243486. llvm-svn: 243540
* [PeepholeOptimizer] Look through PHIs to find additional register sourcesBruno Cardoso Lopes2015-07-281-82/+285
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reapply 243271 with more fixes; although we are not handling multiple sources with coalescable copies, we were not properly skipping this case. - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returnted by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coaslescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 243486
* Revert "[PeepholeOptimizer] Look through PHIs to find additional register ↵Bruno Cardoso Lopes2015-07-271-275/+82
| | | | | | | | sources" Still breaks some ARM buildbots. This reverts r243271. llvm-svn: 243318
* [PeepholeOptimizer] Look through PHIs to find additional register sourcesBruno Cardoso Lopes2015-07-271-82/+275
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reapply r242295 with fixes in the implementation. - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returnted by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coaslescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 243271
* [PeepholeOptimizer] Refactor optimizeUncoalescable logicBruno Cardoso Lopes2015-07-221-127/+246
| | | | | | | | | | | | | | | | | | | Reapply r242294. - Create a new CopyRewriter for Uncoalescable copy-like instructions - Change the ValueTracker to return a ValueTrackerResult This makes optimizeUncoalescable looks more like optimizeCoalescable and use the CopyRewritter infrastructure. This is also the preparation for looking up into PHI nodes in the ValueTracker. rdar://problem/20404526 Differential Revision: http://reviews.llvm.org/D11195 llvm-svn: 242940
* Revert "Refactor optimizeUncoalescable logic"Bruno Cardoso Lopes2015-07-151-246/+127
| | | | | | | | | | Likely broke compilation on ARM: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/13054 This reverts commit 0b7824464fbe3d3f386e2d4aef6a431422709e53. llvm-svn: 242311
* Revert "Look through PHIs to find additional register sources"Bruno Cardoso Lopes2015-07-151-265/+82
| | | | | | | | | | Likely broke compilation on ARM: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/13054 This reverts commit 131ce4a838c081516cbfed039fc986b33e3979d6. llvm-svn: 242310
* Look through PHIs to find additional register sourcesBruno Cardoso Lopes2015-07-151-82/+265
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returnted by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coaslescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 242295
* Refactor optimizeUncoalescable logicBruno Cardoso Lopes2015-07-151-127/+246
| | | | | | | | | | | | | | | - Create a new CopyRewriter for Uncoalescable copy-like instructions - Change the ValueTracker to return a ValueTrackerResult This makes optimizeUncoalescable looks more like optimizeCoalescable and use the CopyRewritter infrastructure. This is also the preparation for looking up into PHI nodes in the ValueTracker. Differential Revision: http://reviews.llvm.org/D11195 llvm-svn: 242294
* Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)Alexander Kornienko2015-06-231-1/+1
| | | | | | Apparently, the style needs to be agreed upon first. llvm-svn: 240390
* Fixed/added namespace ending comments using clang-tidy. NFCAlexander Kornienko2015-06-191-1/+1
| | | | | | | | | | | | | The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137
OpenPOWER on IntegriCloud