path: root/llvm/lib/CodeGen/PeepholeOptimizer.cpp
Commit history for this file, newest first. Each entry lists the commit message, followed by the author, date, files changed, and lines removed/added.
...
* [X86] Allow x86 call frame optimization to fold more loads into pushes (Michael Kuperstein, 2015-08-12, 1 file, -3/+3)
  This abstracts the test for "when can we fold across a MachineInstruction" into the MI interface, and changes the call-frame optimization to use the same test the peephole optimizer uses. Differential Revision: http://reviews.llvm.org/D11945 llvm-svn: 244729
* Allow PeepholeOptimizer to fold a few more cases (Michael Kuperstein, 2015-08-11, 1 file, -5/+4)
  The condition for clearing the folding candidate list was lumped together with the "uninteresting instruction" condition. That is too conservative; for example, we do not need to clear the list when encountering an IMPLICIT_DEF. Differential Revision: http://reviews.llvm.org/D11591 llvm-svn: 244577
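  The distinction above can be sketched generically: an instruction that is merely "uninteresting" (such as IMPLICIT_DEF) should just be skipped, while only instructions that could actually invalidate a pending load fold should clear the candidate list. A minimal, hypothetical illustration in plain C++ (invented names, not the PeepholeOptimizer code itself):

    #include <vector>

    struct Instr { bool IsImplicitDef; bool MayStore; bool HasSideEffects; };

    // "Uninteresting" only means no peephole applies to this instruction.
    bool isUninteresting(const Instr &I) { return I.IsImplicitDef; }

    // Only instructions that might clobber memory or have unmodeled side
    // effects force the pending fold candidates to be dropped.
    bool invalidatesFolds(const Instr &I) {
      return I.MayStore || I.HasSideEffects;
    }

    void scanBlock(const std::vector<Instr> &Block) {
      std::vector<const Instr *> FoldCandidates;
      for (const Instr &I : Block) {
        if (invalidatesFolds(I))
          FoldCandidates.clear();   // decoupled from "uninteresting"
        if (isUninteresting(I))
          continue;                 // skip, but keep the candidates
        // ... run the usual peepholes, possibly recording a new candidate ...
      }
    }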
* Fix some comment typos. (Benjamin Kramer, 2015-08-08, 1 file, -2/+2)
  llvm-svn: 244402
* Revert "[PeepholeOptimizer] Look through PHIs to find additional register ↵Bruno Cardoso Lopes2015-07-291-285/+82
| | | | | | | | | | sources" Reported to Broke some internal tests: PR24303 This reverts commit r243486. llvm-svn: 243540
* [PeepholeOptimizer] Look through PHIs to find additional register sources (Bruno Cardoso Lopes, 2015-07-28, 1 file, -82/+285)
  Reapply r243271 with more fixes; although we do not handle multiple sources with coalescable copies, we were not properly skipping that case.
  - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions.
  - Adds a findNextSourceAndRewritePHI method to look up the multiple sources returned by the ValueTracker and rewrite PHIs with new sources.
  With these changes we can find more register sources and rewrite more copies, allowing coalescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies on x86; this could be extended to other architectures by marking "isBitcast" on target-specific instructions. The x86 example follows:
    A: psllq %mm1, %mm0
       movd %mm0, %r9
       jmp C
    B: por %mm1, %mm0
       movd %mm0, %r9
       jmp C
    C: movd %r9, %mm0
       pshufw $238, %mm0, %mm0
  Becomes:
    A: psllq %mm1, %mm0
       jmp C
    B: por %mm1, %mm0
       jmp C
    C: pshufw $238, %mm0, %mm0
  Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 243486
* Revert "[PeepholeOptimizer] Look through PHIs to find additional register ↵Bruno Cardoso Lopes2015-07-271-275/+82
| | | | | | | | sources" Still breaks some ARM buildbots. This reverts r243271. llvm-svn: 243318
* [PeepholeOptimizer] Look through PHIs to find additional register sources (Bruno Cardoso Lopes, 2015-07-27, 1 file, -82/+275)
  Reapply r242295 with fixes in the implementation.
  - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions.
  - Adds a findNextSourceAndRewritePHI method to look up the multiple sources returned by the ValueTracker and rewrite PHIs with new sources.
  With these changes we can find more register sources and rewrite more copies, allowing coalescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies on x86; this could be extended to other architectures by marking "isBitcast" on target-specific instructions. The x86 example follows:
    A: psllq %mm1, %mm0
       movd %mm0, %r9
       jmp C
    B: por %mm1, %mm0
       movd %mm0, %r9
       jmp C
    C: movd %r9, %mm0
       pshufw $238, %mm0, %mm0
  Becomes:
    A: psllq %mm1, %mm0
       jmp C
    B: por %mm1, %mm0
       jmp C
    C: pshufw $238, %mm0, %mm0
  Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 243271
* [PeepholeOptimizer] Refactor optimizeUncoalescable logic (Bruno Cardoso Lopes, 2015-07-22, 1 file, -127/+246)
  Reapply r242294.
  - Create a new CopyRewriter for uncoalescable copy-like instructions.
  - Change the ValueTracker to return a ValueTrackerResult.
  This makes optimizeUncoalescable look more like optimizeCoalescable and use the CopyRewriter infrastructure. It is also preparation for looking through PHI nodes in the ValueTracker. rdar://problem/20404526 Differential Revision: http://reviews.llvm.org/D11195 llvm-svn: 242940
* Revert "Refactor optimizeUncoalescable logic"Bruno Cardoso Lopes2015-07-151-246/+127
| | | | | | | | | | Likely broke compilation on ARM: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/13054 This reverts commit 0b7824464fbe3d3f386e2d4aef6a431422709e53. llvm-svn: 242311
* Revert "Look through PHIs to find additional register sources"Bruno Cardoso Lopes2015-07-151-265/+82
| | | | | | | | | | Likely broke compilation on ARM: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/13054 This reverts commit 131ce4a838c081516cbfed039fc986b33e3979d6. llvm-svn: 242310
* Look through PHIs to find additional register sources (Bruno Cardoso Lopes, 2015-07-15, 1 file, -82/+265)
  - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions.
  - Adds a findNextSourceAndRewritePHI method to look up the multiple sources returned by the ValueTracker and rewrite PHIs with new sources.
  With these changes we can find more register sources and rewrite more copies, allowing coalescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies on x86; this could be extended to other architectures by marking "isBitcast" on target-specific instructions. The x86 example follows:
    A: psllq %mm1, %mm0
       movd %mm0, %r9
       jmp C
    B: por %mm1, %mm0
       movd %mm0, %r9
       jmp C
    C: movd %r9, %mm0
       pshufw $238, %mm0, %mm0
  Becomes:
    A: psllq %mm1, %mm0
       jmp C
    B: por %mm1, %mm0
       jmp C
    C: pshufw $238, %mm0, %mm0
  Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 242295
* Refactor optimizeUncoalescable logic (Bruno Cardoso Lopes, 2015-07-15, 1 file, -127/+246)
  - Create a new CopyRewriter for uncoalescable copy-like instructions.
  - Change the ValueTracker to return a ValueTrackerResult.
  This makes optimizeUncoalescable look more like optimizeCoalescable and use the CopyRewriter infrastructure. It is also preparation for looking through PHI nodes in the ValueTracker. Differential Revision: http://reviews.llvm.org/D11195 llvm-svn: 242294
* Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) (Alexander Kornienko, 2015-06-23, 1 file, -1/+1)
  Apparently, the style needs to be agreed upon first. llvm-svn: 240390
* Fixed/added namespace ending comments using clang-tidy. NFC (Alexander Kornienko, 2015-06-19, 1 file, -1/+1)
  The patch is generated using this command:
    tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
        -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
        llvm/lib/
  Thanks to Eugene Kosov for the original patch! llvm-svn: 240137
* Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used. (Benjamin Kramer, 2015-03-23, 1 file, -0/+1)
  llvm-svn: 232998
* Simplify expressions involving boolean constants with clang-tidy (David Blaikie, 2015-03-09, 1 file, -1/+1)
  Patch by Richard (legalize at xmission dot com). Differential Revision: http://reviews.llvm.org/D8154 llvm-svn: 231617
* Replace std::copy with a back inserter with vector append where feasible (Benjamin Kramer, 2015-02-28, 1 file, -2/+1)
  All of the cases were just appending from random-access iterators to a vector. Using insert/append can grow the vector to the perfect size directly and moves the growing out of the loop. No intended functionality change. llvm-svn: 230845
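  The difference is easy to see in isolation. Below is a minimal, generic C++ illustration (not the code touched by this commit): a range insert computes the required size once, while std::copy with a back inserter appends one element at a time and may reallocate repeatedly.

    #include <algorithm>
    #include <iterator>
    #include <vector>

    void appendWithBackInserter(std::vector<int> &Dst, const std::vector<int> &Src) {
      // Grows Dst element by element; capacity may be adjusted several times.
      std::copy(Src.begin(), Src.end(), std::back_inserter(Dst));
    }

    void appendWithInsert(std::vector<int> &Dst, const std::vector<int> &Src) {
      // Random-access iterators let insert() size Dst exactly once up front.
      Dst.insert(Dst.end(), Src.begin(), Src.end());
    }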
* Peephole opt needs optimizeSelect() to keep track of newly created MIs (Mehdi Amini, 2015-01-13, 1 file, -4/+13)
  The peephole optimizer scans a basic block forward. At some point it needs to answer the question "given a pointer to an MI in the current BB, is it located before or after the current instruction?". To do this, it keeps a set of the MIs already seen during the scan; if an MI is not in the set, it is assumed to come after. This means that newly created MIs have to be inserted into the set as well. This commit passes the set as an argument to the target-dependent optimizeSelect() so that it can properly update the set with any newly created MIs. llvm-svn: 225772
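  The "already seen" trick can be sketched without any LLVM types. The following hypothetical C++ fragment shows the invariant the commit message describes: everything visited so far goes into a set, membership answers the before/after question, and any instruction created mid-scan must be added to the set or later queries will misclassify it.

    #include <list>
    #include <unordered_set>

    struct Instr { int Id; };

    // True if Candidate is at or before the current scan position.
    bool isBeforeCurrent(const std::unordered_set<const Instr *> &Seen,
                         const Instr *Candidate) {
      return Seen.count(Candidate) != 0;
    }

    void scanBlock(std::list<Instr> &Block) {
      std::unordered_set<const Instr *> Seen;
      for (Instr &I : Block) {
        Seen.insert(&I);
        // A hook that materializes new instructions here (as the
        // target-dependent optimizeSelect() may) must also insert them into
        // Seen, otherwise isBeforeCurrent() would wrongly report "after".
      }
    }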
* Avoid caching the MachineFunction, we don't use it outside of runOnMachineFunction (Eric Christopher, 2014-10-15, 1 file, -9/+7)
  llvm-svn: 219847
* [AArch64] Optimize CSINC-branch sequence (Gerolf Hoflehner, 2014-10-14, 1 file, -0/+12)
  Peephole optimization that generates a single conditional branch for csinc-branch sequences like the examples below. This is possible when the csinc sets or clears a register based on a condition code and the branch checks that register. The condition code must not be modified between the csinc and the original branch. Examples:
    1. Convert csinc w9, wzr, wzr, <CC>; tbnz w9, #0, 0x44 to b.<invCC>
    2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44 to b.<CC>
  rdar://problem/18506500 llvm-svn: 219742
* Instead of the TargetMachine, cache the MachineFunction and TargetRegisterInfo in the peephole optimizer (Eric Christopher, 2014-10-14, 1 file, -14/+13)
  This makes it easier to grab subtarget-dependent variables off of the MachineFunction rather than the TargetMachine. llvm-svn: 219669
* [PeepholeOptimizer] Enable the advanced copy optimization by default. (Quentin Colombet, 2014-08-21, 1 file, -1/+1)
  The advanced copy optimization does not yield any difference on the whole LLVM test-suite + SPECs, either in compile time or runtime (binaries are identical), but it has big potential when data moves back and forth between register files, as demonstrated by test/CodeGen/ARM/adv-copy-opt.ll. Note: this was measured for both Os and O3 on armv7s, arm64, and x86_64. <rdar://problem/12702965> llvm-svn: 216236
* [PeepholeOptimizer] Update the kill flags when extending the live-range of the source of a copy (Quentin Colombet, 2014-08-21, 1 file, -1/+5)
  <rdar://problem/12702965> llvm-svn: 216229
* [PeepholeOptimizer] Take advantage of the isInsertSubreg property in the advanced copy optimization (Quentin Colombet, 2014-08-21, 1 file, -32/+15)
  This is the final step toward transforming:
    udiv r0, r0, r2
    udiv r1, r1, r3
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
  into:
    udiv r0, r0, r2
    udiv r1, r1, r3
    bx lr
  Indeed, thanks to this patch, this optimization is able to look through
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
  and is able to rewrite the following sequence:
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
  into simple generic GPR copies that the coalescer managed to remove. <rdar://problem/12702965> llvm-svn: 216144
* [PeepholeOptimizer] Take advantage of the isExtractSubreg property in the advanced copy optimization (Quentin Colombet, 2014-08-20, 1 file, -24/+12)
  This patch is a step toward transforming:
    udiv r0, r0, r2
    udiv r1, r1, r3
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
  into:
    udiv r0, r0, r2
    udiv r1, r1, r3
    bx lr
  Indeed, thanks to this patch, this optimization is able to look through
    vmov r0, r1, d16
  but it does not yet understand
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
  Coming patches will fix that and update the related test case. <rdar://problem/12702965> llvm-svn: 216136
* [PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of the isRegSequence property (Quentin Colombet, 2014-08-20, 1 file, -169/+607)
  This is a follow-up to r215394 and r215404, which respectively introduce the isRegSequence property and use it for ARM. Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence:
    vmov d0, r2, r3
    vmov d1, r0, r1
    vmov r0, s0
    vmov r1, s2
    udiv r0, r1, r0
    vmov r1, s1
    vmov r2, s3
    udiv r1, r2, r1
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
  into:
    udiv r0, r0, r2
    udiv r1, r1, r3
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
  This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast by a generic, more suitable (in terms of register file) copy. With this patch, the peephole optimizer features two copy-related optimizations:
  1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy.
  2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy.
  The goals of these two optimizations are slightly different: one rewrites the operands of the instruction (#1), the other kills off the non-generic instruction and replaces it with a (sequence of) generic instruction(s). Both optimizations rely on the ValueTracker introduced in r212100. The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instructions. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This change provides better consistency with register-related APIs and eases the use of the TargetInstrInfo. Moreover, this patch introduces a new helper class, CopyRewriter, used to ease the rewriting of generic copies (i.e., #1). Finally, this patch adds a dead-code-elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting. This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 llvm-svn: 216088
* PeepholeOptimizer: make parameter ref to SmallPtrSetImpl (Hans Wennborg, 2014-08-11, 1 file, -2/+2)
  This makes the function type independent of the in-line size of LocalMIs. llvm-svn: 215356
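  The reason this works is that SmallPtrSet<T, N> derives from SmallPtrSetImpl<T>, which does not encode the in-line element count N in its type. A small, hypothetical example of the pattern (not the actual PeepholeOptimizer function):

    #include "llvm/ADT/SmallPtrSet.h"

    // Taking SmallPtrSetImpl<int *> & accepts a SmallPtrSet of any in-line size.
    static bool isTracked(const llvm::SmallPtrSetImpl<int *> &Set, int *P) {
      return Set.count(P) != 0;
    }

    int demo() {
      int X = 0;
      llvm::SmallPtrSet<int *, 8> SmallSet;
      llvm::SmallPtrSet<int *, 16> BigSet; // different N, same base type
      BigSet.insert(&X);
      return isTracked(BigSet, &X) && !isTracked(SmallSet, &X);
    }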
* Re-commit "Increase the size of this SmallVector in PeepholeOptimizer." ↵Hans Wennborg2014-08-111-3/+3
| | | | | | | | | (r215340) This time, also update the function that receives a reference to the SmallPtrSet as a parameter. llvm-svn: 215342
* Revert "Increase the size of this SmallVector in PeepholeOptimizer." (r215340)Hans Wennborg2014-08-111-1/+1
| | | | | | | | | | | | | That broke the build: /data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:729:46: error: non-const lvalue reference to type 'SmallPtrSet<[...], 8>' cannot bind to a value of unrelated type 'SmallPtrSet<[...], 16>' Changed |= optimizeExtInstr(MI, MBB, LocalMIs); ^~~~~~~~ /data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:265:49: note: passing argument to parameter 'LocalMIs' here SmallPtrSet<MachineInstr*, 8> &LocalMIs) { ^ llvm-svn: 215341
* Increase the size of this SmallVector in PeepholeOptimizer. (Hans Wennborg, 2014-08-11, 1 file, -1/+1)
  During a Clang build, the median size of this was 9. llvm-svn: 215340
* Remove the TargetMachine forwards for TargetSubtargetInfo-based information and update all callers (Eric Christopher, 2014-08-04, 1 file, -5/+8)
  No functional change. llvm-svn: 214781
* [PeepholeOptimizer] Fix a typo in a comment. (Quentin Colombet, 2014-07-01, 1 file, -1/+1)
  Spotted by Amara Emerson. llvm-svn: 212106
* [PeepholeOptimizer] Advanced rewriting of copies to avoid cross-register-bank copies (Quentin Colombet, 2014-07-01, 1 file, -13/+368)
  This patch extends the peephole optimization introduced in r190713 to produce register-coalescer-friendly copies when possible. This extension taught the existing cross-bank copy optimization how to deal with the instructions that generate cross-bank copies, i.e., insert_subreg, extract_subreg, reg_sequence, and subreg_to_reg. E.g.:
    b = insert_subreg e, A, sub0 <-- cross-bank copy
    ...
    C = copy b.sub0              <-- cross-bank copy
  would produce the following code:
    b = insert_subreg e, A, sub0 <-- cross-bank copy
    ...
    C = copy A                   <-- same-bank copy
  This patch also introduces a new helper class for that: ValueTracker. This class implements the logic to look through the copy-related instructions and get their source. For now, the advanced rewriting is disabled by default as we lack the semantics on target-specific instructions to catch the motivating examples. Related to <rdar://problem/12702965>. llvm-svn: 212100
* [Modules] Remove potential ODR violations by sinking the DEBUG_TYPE define below all header includes in the lib/CodeGen/... tree (Chandler Carruth, 2014-04-22, 1 file, -1/+2)
  While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers. Other sub-trees will follow. llvm-svn: 206837
* [C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr. (Craig Topper, 2014-04-14, 1 file, -6/+6)
  llvm-svn: 206142
* [CodeGen] Fix peephole optimizer bug introduced in r205481. Fixes PR19318. (Lang Hames, 2014-04-03, 1 file, -9/+11)
  I should have read that comment a little more carefully. ;) Regression test in the works; committing in the meantime to un-break people. llvm-svn: 205511
* [CodeGen] Teach the peephole optimizer to remember (and exploit) all folding opportunities in the current basic block, rather than just the last one seen (Lang Hames, 2014-04-02, 1 file, -35/+44)
  <rdar://problem/16478629> llvm-svn: 205481
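  Conceptually this replaces a single "last candidate" slot with a per-block map from a virtual register to the load that defines it, so a later user of any recorded register can still attempt the fold. The sketch below is a generic, hypothetical rendering of that bookkeeping in plain C++ (invented types and names, not the actual pass):

    #include <unordered_map>
    #include <vector>

    struct Instr { int DefReg; bool IsFoldableLoad; std::vector<int> UseRegs; };

    void scanBlock(std::vector<Instr> &Block) {
      // Every candidate load seen so far, keyed by the vreg it defines.
      std::unordered_map<int, Instr *> Candidates;
      for (Instr &I : Block) {
        for (int Reg : I.UseRegs) {
          auto It = Candidates.find(Reg);
          if (It == Candidates.end())
            continue;
          // The real pass would try to fold the recorded load into I here
          // and erase the load on success.
          Candidates.erase(It);
        }
        if (I.IsFoldableLoad)
          Candidates[I.DefReg] = &I; // remember it for any later user
      }
    }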
* Disable each MachineFunctionPass for 'optnone' functions, unless that pass normally runs at optimization level None, or is part of the register allocation pipeline (Paul Robinson, 2014-03-31, 1 file, -0/+3)
  llvm-svn: 205228
* Switch a number of loops in lib/CodeGen over to range-based for-loops, now that the MachineRegisterInfo iterators are compatible with it (Owen Anderson, 2014-03-17, 1 file, -13/+6)
  llvm-svn: 204075
* Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changing operator* on the by-operand iterators to return a MachineOperand& rather than a MachineInstr& (Owen Anderson, 2014-03-13, 1 file, -7/+7)
  At this point they almost behave like normal iterators! Again, this requires making some existing loops more verbose, but should pave the way for the big range-based for-loop cleanups in the future. llvm-svn: 203865
* Fix for http://llvm.org/bugs/show_bug.cgi?id=18590 (Ekaterina Romanova, 2014-03-13, 1 file, -3/+11)
  This patch fixes a bug in the peephole optimization that folds a load which defines one vreg into the one and only use of that vreg. With debug info, a DBG_VALUE that referenced the vreg was considered to be a use, preventing the optimization. The fix is to ignore DBG_VALUEs during the optimization, and to mark as undef any DBG_VALUE that references a vreg that gets removed. Patch by Trevor Smigiel! llvm-svn: 203829
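  A hedged sketch of the two halves of that fix, using MachineRegisterInfo queries that exist in LLVM but with hypothetical surrounding helpers (an illustration of the approach, not the patch itself):

    #include "llvm/ADT/SmallVector.h"
    #include "llvm/CodeGen/MachineInstr.h"
    #include "llvm/CodeGen/MachineRegisterInfo.h"

    using namespace llvm;

    // 1) Debug instructions must not count as uses when deciding foldability.
    static bool hasSingleRealUse(const MachineRegisterInfo &MRI, unsigned Reg) {
      return MRI.hasOneNonDBGUse(Reg);
    }

    // 2) If the defining load is folded away, any DBG_VALUE still naming the
    //    vreg is pointed at "no register" so the value reads as unavailable
    //    instead of referencing a deleted definition.
    static void undefDebugUses(MachineRegisterInfo &MRI, unsigned Reg) {
      SmallVector<MachineInstr *, 4> DbgUsers;
      for (MachineInstr &UseMI : MRI.use_instructions(Reg))
        if (UseMI.isDebugValue())
          DbgUsers.push_back(&UseMI);
      for (MachineInstr *DbgMI : DbgUsers)
        DbgMI->getOperand(0).setReg(0);
    }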
* [C++11] Add 'override' keyword to virtual methods that override their base class. (Craig Topper, 2014-03-07, 1 file, -2/+2)
  llvm-svn: 203220
* Replace PROLOG_LABEL with a new CFI_INSTRUCTION. (Rafael Espindola, 2014-03-07, 1 file, -1/+1)
  The old system was fairly convoluted:
  * A temporary label was created.
  * A single PROLOG_LABEL was created with it.
  * A few MCCFIInstructions were created with the same label.
  The semantics were that the CFI instructions were mapped to the PROLOG_LABEL via the temporary label. The output position was that of the PROLOG_LABEL. The temporary label itself was used only for doing the mapping. The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to one by holding an index into the CFI instructions of this function. I did consider removing MMI.getFrameInstructions completely and having CFI_INSTRUCTION own an MCCFIInstruction, but MCCFIInstructions have non-trivial constructors and destructors and are somewhat big, so this setup is probably better. The net result is that we don't create temporary labels that are never used. llvm-svn: 203204
* [Peephole] Rewrite copies to avoid cross-register-bank copies. (Quentin Colombet, 2013-09-13, 1 file, -84/+166)
  By definition, copies across register banks are not coalescable. Still, it may be possible to get rid of such a copy when the value is available in another register of the same register file. Consider the following example, where capital and lower-case letters denote different register files:
    b = copy A <-- cross-bank copy
    ...
    C = copy b <-- cross-bank copy
  This could have been optimized this way:
    b = copy A <-- cross-bank copy
    ...
    C = copy A <-- same-bank copy
  Note: b and C's definitions may be in different basic blocks. This patch adds a peephole optimization that looks through a chain of copies leading to a cross-bank copy and reuses a source that is on the same register file, if available. This solution could also be used to get rid of some copies (e.g., A could have been used instead of C). However, we do not do so because:
  - It may over-constrain the coloring of the source register for coalescing.
  - The register allocator may not be able to find a nice split point for the longer live range, leading to more spills.
  <rdar://problem/14742333> llvm-svn: 190713
* Add debug prints for when optimizeLoadInstr folds a load. (Craig Topper, 2012-12-17, 1 file, -0/+6)
  llvm-svn: 170298
* Add comment for load folding (Joel Jones, 2012-12-11, 1 file, -0/+5)
  llvm-svn: 169880
* Use the new script to sort the includes of every file under lib. (Chandler Carruth, 2012-12-03, 1 file, -5/+5)
  Sooooo many of these had incorrect or strange main-module includes. I have manually inspected all of these, and fixed the main-module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to header files to allow them to parse cleanly when included first. The main-module rule does in fact have its merits. =] llvm-svn: 169131
* Make sure we iterate over newly created instructions. Fixes PR13625. Testcase to follow in one sec. (Rafael Espindola, 2012-10-15, 1 file, -0/+5)
  llvm-svn: 165951
* Use standard pattern for iterate+erase. (Jakob Stoklund Olesen, 2012-08-17, 1 file, -9/+2)
  Increment the MBB iterator at the top of the loop to properly handle the current (and previous) instructions getting erased. This fixes PR13625. llvm-svn: 162099
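  The idiom referred to here advances the iterator before the current element is examined, so erasing that element (or anything earlier) cannot invalidate the loop's position. A self-contained, generic C++ sketch (std::list standing in for a machine basic block):

    #include <list>

    // Erase every element matching ShouldErase while walking forward.
    template <typename Pred>
    void eraseMatching(std::list<int> &L, Pred ShouldErase) {
      for (auto MII = L.begin(), E = L.end(); MII != E;) {
        auto I = MII++;        // advance first, then work on *I
        if (ShouldErase(*I))
          L.erase(I);          // safe: MII already points past I
      }
    }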
* Add an MCID::Select flag and TII hooks for optimizing selects. (Jakob Stoklund Olesen, 2012-08-16, 1 file, -16/+27)
  Select instructions pick one of two virtual registers based on a condition, like x86 cmov. On targets like ARM that support predication, selects can sometimes be eliminated by predicating the instruction defining one of the operands. Teach PeepholeOptimizer to recognize select instructions, and ask the target to optimize them. llvm-svn: 162059