path: root/llvm/lib/Target/X86/X86CmovConversion.cpp
Commit message | Author | Age | Files | Lines
* [X86CmovConversion] Make heuristic for optimized cmov depth more conservative (PR44539)
  Nikita Popov | 2020-02-19 | 1 | -6/+7

  Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539. As discussed
  there, this pass makes some overly optimistic assumptions, as it does not
  have access to actual branch weights.

  This patch makes the computation of the depth of the optimized cmov more
  conservative, by assuming a distribution of 75/25 rather than 50/50 and
  placing the weights to get the more conservative result (larger depth).
  The fully conservative choice would be std::max(TrueOpDepth, FalseOpDepth),
  but that would break at least one existing test (which may or may not be
  an issue in practice).

  Differential Revision: https://reviews.llvm.org/D74155

  (cherry picked from commit 5eb19bf4a2b0c29a8d4d48dfb0276f096eff9bec)
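A minimal sketch of the weighting described above (self-contained C++ with illustrative names; close to, but not guaranteed to match, the committed code):

```cpp
#include <algorithm>

// Ceiling division; LLVM provides llvm::divideCeil in Support/MathExtras.h.
static unsigned divideCeil(unsigned Num, unsigned Den) {
  return (Num + Den - 1) / Den;
}

// Estimated depth of the value produced by the optimized cmov. The old code
// effectively averaged the two operand depths (a 50/50 assumption); the
// conservative version weights 75/25 and takes whichever assignment of the
// weights yields the larger (more pessimistic) depth.
static unsigned optimizedCmovDepth(unsigned TrueOpDepth,
                                   unsigned FalseOpDepth) {
  return std::max(divideCeil(3 * TrueOpDepth + FalseOpDepth, 4),
                  divideCeil(3 * FalseOpDepth + TrueOpDepth, 4));
}
```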
* Sink all InitializePasses.h includes
  Reid Kleckner | 2019-11-13 | 1 | -0/+1

  This file lists every pass in LLVM, and is included by Pass.h, which is
  very popular. Every time we add, remove, or rename a pass in LLVM, it
  caused lots of recompilation.

  I found this fact by looking at this table, which is sorted by the number
  of times a file was changed over the last 100,000 git commits multiplied
  by the number of object files that depend on it in the current checkout:

    recompiles  touches  affected_files  header
    342380      95       3604            llvm/include/llvm/ADT/STLExtras.h
    314730      234      1345            llvm/include/llvm/InitializePasses.h
    307036      118      2602            llvm/include/llvm/ADT/APInt.h
    213049      59       3611            llvm/include/llvm/Support/MathExtras.h
    170422      47       3626            llvm/include/llvm/Support/Compiler.h
    162225      45       3605            llvm/include/llvm/ADT/Optional.h
    158319      63       2513            llvm/include/llvm/ADT/Triple.h
    140322      39       3598            llvm/include/llvm/ADT/StringRef.h
    137647      59       2333            llvm/include/llvm/Support/Error.h
    131619      73       1803            llvm/include/llvm/Support/FileSystem.h

  Before this change, touching InitializePasses.h would cause 1345 files to
  recompile. After this change, touching it only causes 550 compiles in an
  incremental rebuild.

  Reviewers: bkramer, asbirlea, bollu, jdoerfert

  Differential Revision: https://reviews.llvm.org/D70211
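The shape of the change for a file like this one, sketched (this file's one-line diff is the added direct include):

```cpp
// Before: InitializePasses.h arrived transitively through Pass.h (via
// MachineFunctionPass.h), so editing the pass list rebuilt every Pass.h user.
#include "llvm/CodeGen/MachineFunctionPass.h"

// After: the few .cpp files that actually call an initialize*Pass()
// function include the header directly.
#include "llvm/InitializePasses.h"
```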
* [X86] Fix uninitialized variable warnings. NFCI.
  Simon Pilgrim | 2019-11-06 | 1 | -3/+3
* Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
  Daniel Sanders | 2019-08-15 | 1 | -7/+7

  Summary:
  This clang-tidy check looks for unsigned integer variables whose
  initializer starts with an implicit cast from llvm::Register and changes
  the type of the variable to llvm::Register (dropping the llvm:: where
  possible).

  Partial reverts in:
  X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
  X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
  HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
  MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
  PPCFastISel.cpp - No Register::operator-=()
  PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
  MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

  Manual fixups in:
  ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
  HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
  HexagonConstExtenders.cpp - Has a local class named Register; used llvm::Register instead of Register
  PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

  Depends on D65919

  Reviewers: arsenm, bogner, craig.topper, RKSimon

  Reviewed By: arsenm

  Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski,
  MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai,
  jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls,
  hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar,
  johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng,
  edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr,
  PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton,
  llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D65962

  llvm-svn: 369041
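The mechanical rewrite the check performs, sketched on a typical machine-pass line (illustrative context):

```cpp
// Before: the initializer is an llvm::Register, implicitly narrowed to a
// plain unsigned.
//   unsigned DefReg = MI.getOperand(0).getReg();

// After llvm-prefer-register-over-unsigned: keep the stronger type.
Register DefReg = MI.getOperand(0).getReg();
```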
* Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
  Daniel Sanders | 2019-08-01 | 1 | -2/+2

  llvm-svn: 367633
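Before/after for one of the moved predicates, sketched:

```cpp
// Before: a static helper on TargetRegisterInfo.
//   if (TargetRegisterInfo::isVirtualRegister(Reg)) ...

// After r367614/r367633: the same predicate lives on llvm::Register.
if (Register::isVirtualRegister(Reg)) {
  // ... handle virtual registers ...
}
```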
* X86: Clean up pass initialization
  Tom Stellard | 2019-06-13 | 1 | -3/+1

  Summary:
  - Remove redundant initializations from pass constructors that were
    already being initialized by LLVMInitializeX86Target().
  - Add initialization function for the FPS pass.

  Reviewers: craig.topper

  Reviewed By: craig.topper

  Subscribers: hiraditya, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D63218

  llvm-svn: 363221
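A sketch of the redundancy that was removed (a hedged reconstruction; registration already happens once when the target library is initialized):

```cpp
// X86TargetMachine.cpp already registers every X86 pass once at load time:
extern "C" void LLVMInitializeX86Target() {
  PassRegistry &PR = *PassRegistry::getPassRegistry();
  initializeX86CmovConverterPassPass(PR);
  // ... other X86 passes ...
}

// ...so a second call in the pass constructor was redundant and is dropped:
//   X86CmovConverterPass() : MachineFunctionPass(ID) {
//     initializeX86CmovConverterPassPass(*PassRegistry::getPassRegistry());
//   }
```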
* [X86] X86CmovConverterPass::collectCmovCandidates - fix uninitialized variable warnings. NFCI.
  Simon Pilgrim | 2019-05-28 | 1 | -1/+2

  llvm-svn: 361804
* [X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand.
  Craig Topper | 2019-04-05 | 1 | -1/+1

  Summary:
  This avoids needing an isel pattern for each condition code. And it
  removes translation switches for converting between Jcc instructions and
  condition codes.

  Now the printer, encoder and disassembler take care of converting the
  immediate. We use InstAliases to handle the assembly matching. But we
  print using the asm string in the instruction definition. The instruction
  itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

  Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon

  Reviewed By: RKSimon

  Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D60228

  llvm-svn: 357802
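What a call site looks like once the per-condition opcodes (JE_1, JNE_1, ...) collapse into one, sketched:

```cpp
// Before: one branch opcode per condition code.
//   BuildMI(&MBB, DL, TII->get(X86::JE_1)).addMBB(TargetMBB);

// After: a single JCC_1 opcode carrying the condition as an immediate.
BuildMI(&MBB, DL, TII->get(X86::JCC_1))
    .addMBB(TargetMBB)
    .addImm(X86::COND_E);
```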
* [X86] Merge the different CMOV instructions for each condition code into single instructions that store the condition code as an immediate.
  Craig Topper | 2019-04-05 | 1 | -10/+9

  Summary:
  Reorder the condition code enum to match their encodings. Move it to the
  MC layer so it can be used by the scheduler models.

  This avoids needing an isel pattern for each condition code. And it
  removes translation switches for converting between CMOV instructions and
  condition codes.

  Now the printer, encoder and disassembler take care of converting the
  immediate. We use InstAliases to handle the assembly matching. But we
  print using the asm string in the instruction definition. The instruction
  itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

  This does complicate the scheduler models a little since we can't assign
  the A and BE instructions to a separate class now.

  I plan to make similar changes for SETcc and Jcc.

  Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet

  Reviewed By: RKSimon

  Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert,
  llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D60041

  llvm-svn: 357800
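How consumers recover the condition after the merge, sketched (the helper name matches the post-merge X86 API, but treat the exact signature as an assumption):

```cpp
// Before: a translation switch mapping CMOVA32rr, CMOVBE32rr, ... to a
// condition code. After: read it from the instruction's immediate operand.
X86::CondCode CC = X86::getCondFromCMov(MI); // e.g. X86::COND_BE
if (CC != X86::COND_INVALID) {
  // ... the pass reasons about one CMOV opcode family plus CC ...
}
```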
* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
  Chandler Carruth | 2019-01-19 | 1 | -4/+3

  We understand that people may be surprised that we're moving the header
  entirely to discuss the new license. We checked this carefully with the
  Foundation's lawyer and we believe this is the correct approach.

  Essentially, all code in the project is now made available by the LLVM
  project under our new license, so you will see that the license headers
  include that license only. Some of our contributors have contributed code
  under our old license, and accordingly, we have retained a copy of our old
  license notice in the top-level files in each project and repository.

  llvm-svn: 351636
* X86: Consistently declare pass initializers in X86.h; NFC
  Matthias Braun | 2018-11-01 | 1 | -6/+0

  This avoids declaring them twice: in X86TargetMachine.cpp and in the file
  implementing the pass.

  llvm-svn: 345801
* Remove trailing space
  Fangrui Song | 2018-07-30 | 1 | -1/+1

  sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

  llvm-svn: 338293
* Rename DEBUG macro to LLVM_DEBUG.
  Nicola Zaghen | 2018-05-14 | 1 | -6/+6

  The DEBUG() macro is very generic so it might clash with other projects.
  The renaming was done as follows:
  - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
  - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
  - Manual change to APInt
  - Manually change DOCS as the regex doesn't match it.

  In the transition period the DEBUG() macro is still present and aliased to
  the LLVM_DEBUG() one.

  Differential Revision: https://reviews.llvm.org/D43624

  llvm-svn: 332240
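Post-rename usage, sketched as it appears in a typical pass (DEBUG_TYPE and dbgs() are the standard LLVM debug-output machinery; the string is illustrative):

```cpp
#define DEBUG_TYPE "x86-cmov-conversion"

// Compiled in only for asserts builds and printed only under
// -debug / -debug-only=x86-cmov-conversion. The old spelling DEBUG(...)
// remained temporarily as an alias during the transition.
LLVM_DEBUG(dbgs() << "Considering CMOV group for conversion\n");
```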
* [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
  Shiva Chen | 2018-05-09 | 1 | -3/+3

  Because we create a new kind of debug instruction, DBG_LABEL, we need to
  check all passes which use isDebugValue() to decide whether a MachineInstr
  is a debug instruction or not. When expelling debug instructions, we
  should expel both DBG_VALUE and DBG_LABEL. So, I create a new function,
  isDebugInstr(), in MachineInstr to check whether the MachineInstr is a
  debug instruction or not.

  This patch has no new test case. I have run the regression tests and there
  is no difference in the results.

  Differential Revision: https://reviews.llvm.org/D45342

  Patch by Hsiangkai Wang.

  llvm-svn: 331844
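The replacement pattern, sketched:

```cpp
for (MachineInstr &MI : MBB) {
  // isDebugValue() matches only DBG_VALUE; isDebugInstr() also matches the
  // new DBG_LABEL, so codegen decisions ignore both kinds of debug info.
  if (MI.isDebugInstr())
    continue;
  // ... process real instructions ...
}
```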
* [TargetSchedule] shrink interface for init(); NFCI
  Sanjay Patel | 2018-04-08 | 1 | -1/+1

  The TargetSchedModel is always initialized using the TargetSubtargetInfo's
  MCSchedModel and TargetInstrInfo, so we don't need to extract those and
  pass 3 parameters to init().

  Differential Revision: https://reviews.llvm.org/D44789

  llvm-svn: 329540
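A call-site sketch of the shrunken interface (assumes STI and TII pointers are already in scope):

```cpp
TargetSchedModel SchedModel;

// Before: callers extracted pieces the subtarget already owns.
//   SchedModel.init(STI->getSchedModel(), STI, TII);

// After D44789: the subtarget alone is enough.
SchedModel.init(STI);
```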
* MachineFunction: Return reference from getFunction(); NFC
  Matthias Braun | 2017-12-15 | 1 | -1/+1

  The Function can never be nullptr so we can return a reference.

  llvm-svn: 320884
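Call-site shape, sketched:

```cpp
// Before: a pointer that could never actually be null.
//   const Function *F = MF.getFunction();

// After: the API encodes the invariant.
const Function &F = MF.getFunction();
StringRef Name = F.getName(); // illustrative use of the reference
```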
* Fix a bunch more layering of CodeGen headers that are in Target
  David Blaikie | 2017-11-17 | 1 | -2/+2

  All these headers already depend on CodeGen headers so moving them into
  CodeGen fixes the layering (since CodeGen depends on Target, not the other
  way around).

  llvm-svn: 318490
* Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
  David Blaikie | 2017-11-08 | 1 | -1/+1

  This header includes CodeGen headers, and is not, itself, included by any
  Target headers, so move it into CodeGen to match the layering of its
  implementation.

  llvm-svn: 317647
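The mechanical effect on including files, sketched:

```cpp
// Before: the header lived under Target/ even though its implementation
// depends on CodeGen.
//   #include "llvm/Target/TargetInstrInfo.h"

// After r317647: the include path matches the layering.
#include "llvm/CodeGen/TargetInstrInfo.h"
```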
* [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565
  Amjad Aboud | 2017-10-15 | 1 | -0/+31

  Differential Revision: https://reviews.llvm.org/D38359

  llvm-svn: 315851
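A hedged sketch of the fix's shape (not the exact hunk): debug values must not influence codegen decisions, so the candidate scan skips them.

```cpp
// While collecting CMOV candidates, skip debug instructions so the presence
// of -g metadata cannot change which groups get converted (PR34565).
for (MachineInstr &MI : MBB) {
  if (MI.isDebugValue())
    continue;
  // ... consider MI for a cmov group ...
}
```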
* [X86] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
  Eugene Zelenko | 2017-10-05 | 1 | -16/+38

  llvm-svn: 314953
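The typical modernize-use-using rewrite, sketched with aliases of the kind this file defines (names assumed from the pass's structure):

```cpp
// Before: typedef llvm::SmallVector<llvm::MachineInstr *, 2> CmovGroup;

// After: alias declarations.
using CmovGroup = llvm::SmallVector<llvm::MachineInstr *, 2>;
using CmovGroups = llvm::SmallVector<CmovGroup, 2>;
```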
* [X86][NFC] Add X86CmovConverterPass to the pass registry.
  Amjad Aboud | 2017-10-02 | 1 | -5/+18

  Differential Revision: https://reviews.llvm.org/D38355

  llvm-svn: 314726
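Machine-pass registration follows standard boilerplate; a sketch of what this change adds (the pass-name strings and the dependency list are assumptions):

```cpp
INITIALIZE_PASS_BEGIN(X86CmovConverterPass, "x86-cmov-conversion",
                      "X86 cmov Conversion", false, false)
INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
INITIALIZE_PASS_END(X86CmovConverterPass, "x86-cmov-conversion",
                    "X86 cmov Conversion", false, false)
```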
* [x86] Fix PR34377 by disabling cmov conversion when we relied on it performing a zext of a register.
  Chandler Carruth | 2017-09-06 | 1 | -0/+10

  (On x86-64, a write to a 32-bit register implicitly zero-extends into the
  full 64-bit register; surrounding code can rely on that implicit zext.)

  On the PR there is discussion of how to more effectively handle this, but
  this patch prevents us from miscompiling code.

  Differential Revision: https://reviews.llvm.org/D37504

  llvm-svn: 312620
* fix more typos; NFC
  Sanjay Patel | 2017-08-30 | 1 | -2/+2

  llvm-svn: 312120
* fix typos; NFC
  Sanjay Patel | 2017-08-30 | 1 | -15/+15

  llvm-svn: 312119
* [x86] Fix an even stranger corner case where we have multiple levels of cmov self-referencing.
  Chandler Carruth | 2017-08-19 | 1 | -1/+9

  Pointed out by Amjad Aboud in code review; test case slightly simplified
  from the one he posted.

  llvm-svn: 311267
* [x86] Teach the cmov converter to aggressively convert cmovs with memory operands into control flow.
  Chandler Carruth | 2017-08-19 | 1 | -6/+157

  We have seen periodic performance problems with cmov where one operand
  comes from memory. On modern x86 processors with strong branch predictors
  and speculative execution, this tends to be much better done with a branch
  than a cmov. We routinely see the cmov stalling while the load completes
  rather than continuing, and if there are subsequent branches, they cannot
  be speculated in turn.

  Also, in many (even simple) cases, macro fusion causes the control flow
  version to be fewer uops.

  Consider the IACA output for the initial sequence of code in a very hot
  function in one of our internal benchmarks that motivates this, and notice
  the micro-op reduction provided.

  Before, SNB:
  ```
  Throughput Analysis Report
  --------------------------
  Block Throughput: 2.20 Cycles       Throughput Bottleneck: Port1

  | Num Of |          Ports pressure in cycles           |    |
  |  Uops  | 0 - DV |  1  |  2 - D  |  3 - D  |  4  |  5  |    |
  ---------------------------------------------------------------
  |   1    |        | 1.0 |         |         |     |     | CP | mov rcx, rdi
  |   0*   |        |     |         |         |     |     |    | xor edi, edi
  |   2^   |  0.1   | 0.6 | 0.5 0.5 | 0.5 0.5 |     | 0.4 | CP | cmp byte ptr [rsi+0xf], 0xf
  |   1    |        |     | 0.5 0.5 | 0.5 0.5 |     |     |    | mov rax, qword ptr [rsi]
  |   3    |  1.8   | 0.6 |         |         |     | 0.6 | CP | cmovbe rax, rdi
  |   2^   |        |     | 0.5 0.5 | 0.5 0.5 |     | 1.0 |    | cmp byte ptr [rcx+0xf], 0x10
  |   0F   |        |     |         |         |     |     |    | jb 0xf

  Total Num Of Uops: 9
  ```

  After, SNB:
  ```
  Throughput Analysis Report
  --------------------------
  Block Throughput: 2.00 Cycles       Throughput Bottleneck: Port5

  | Num Of |          Ports pressure in cycles           |    |
  |  Uops  | 0 - DV |  1  |  2 - D  |  3 - D  |  4  |  5  |    |
  ---------------------------------------------------------------
  |   1    |  0.5   | 0.5 |         |         |     |     |    | mov rax, rdi
  |   0*   |        |     |         |         |     |     |    | xor edi, edi
  |   2^   |  0.5   | 0.5 | 1.0 1.0 |         |     |     |    | cmp byte ptr [rsi+0xf], 0xf
  |   1    |  0.5   | 0.5 |         |         |     |     |    | mov ecx, 0x0
  |   1    |        |     |         |         |     | 1.0 | CP | jnbe 0x39
  |   2^   |        |     |         | 1.0 1.0 |     | 1.0 | CP | cmp byte ptr [rax+0xf], 0x10
  |   0F   |        |     |         |         |     |     |    | jnb 0x3c

  Total Num Of Uops: 7
  ```

  The difference even manifests in a throughput cycle rate difference on
  Haswell.

  Before, HSW:
  ```
  Throughput Analysis Report
  --------------------------
  Block Throughput: 2.00 Cycles       Throughput Bottleneck: FrontEnd

  | Num Of |                Ports pressure in cycles                 |    |
  |  Uops  | 0 - DV |  1  |  2 - D  |  3 - D  |  4  |  5  |  6  |  7  |    |
  ---------------------------------------------------------------------------
  |   0*   |        |     |         |         |     |     |     |     |    | mov rcx, rdi
  |   0*   |        |     |         |         |     |     |     |     |    | xor edi, edi
  |   2^   |        |     | 0.5 0.5 | 0.5 0.5 |     | 1.0 |     |     |    | cmp byte ptr [rsi+0xf], 0xf
  |   1    |        |     | 0.5 0.5 | 0.5 0.5 |     |     |     |     |    | mov rax, qword ptr [rsi]
  |   3    |  1.0   | 1.0 |         |         |     |     | 1.0 |     |    | cmovbe rax, rdi
  |   2^   |  0.5   |     | 0.5 0.5 | 0.5 0.5 |     |     | 0.5 |     |    | cmp byte ptr [rcx+0xf], 0x10
  |   0F   |        |     |         |         |     |     |     |     |    | jb 0xf

  Total Num Of Uops: 8
  ```

  After, HSW:
  ```
  Throughput Analysis Report
  --------------------------
  Block Throughput: 1.50 Cycles       Throughput Bottleneck: FrontEnd

  | Num Of |                Ports pressure in cycles                 |    |
  |  Uops  | 0 - DV |  1  |  2 - D  |  3 - D  |  4  |  5  |  6  |  7  |    |
  ---------------------------------------------------------------------------
  |   0*   |        |     |         |         |     |     |     |     |    | mov rax, rdi
  |   0*   |        |     |         |         |     |     |     |     |    | xor edi, edi
  |   2^   |        |     | 1.0 1.0 |         |     | 1.0 |     |     |    | cmp byte ptr [rsi+0xf], 0xf
  |   1    |        | 1.0 |         |         |     |     |     |     |    | mov ecx, 0x0
  |   1    |        |     |         |         |     |     | 1.0 |     |    | jnbe 0x39
  |   2^   |  1.0   |     |         | 1.0 1.0 |     |     |     |     |    | cmp byte ptr [rax+0xf], 0x10
  |   0F   |        |     |         |         |     |     |     |     |    | jnb 0x3c

  Total Num Of Uops: 6
  ```

  Note that this cannot be usefully restricted to inner loops. Much of the
  hot code we see hitting this is not in an inner loop or not in a loop at
  all. The optimization still remains effective, and indeed critical, for
  some of our code.

  I have run a suite of internal benchmarks with this change. I saw a few
  very significant improvements and a very few minor regressions, but
  overall this change rarely has a significant effect. However, the
  improvements were very significant, and in quite important routines
  responsible for a great deal of our C++ CPU cycles. The gains pretty
  clearly outweigh the regressions for us.

  I also ran the test-suite and SPEC2006. Only 11 binaries changed at all
  and none of them showed any regressions.

  Amjad Aboud at Intel also ran this over their benchmarks and saw no
  regressions.

  Differential Revision: https://reviews.llvm.org/D36858

  llvm-svn: 311226
* [x86] Refactor the CMOV conversion pass to be more flexible.
  Chandler Carruth | 2017-08-19 | 1 | -15/+23

  The primary thing that this accomplishes is to allow future re-use of
  these routines in more contexts, and to clarify the behavior w.r.t. loops.
  For example, if handling outer loops is desirable, doing so in an
  inside-out order becomes straightforward because the pass walks the loop
  nest itself (rather than walking the function's basic blocks) and
  de-couples the CMOV rewriting from the loop structure, as there isn't
  actually anything loop-specific about this transformation.

  This patch should be essentially a no-op. It potentially changes the order
  in which we visit the inner loops, but otherwise should merely set the
  stage for subsequent changes.

  Differential Revision: https://reviews.llvm.org/D36783

  llvm-svn: 311225
* [X86] Improved X86::CMOV to Branch heuristic.
  Amjad Aboud | 2017-08-08 | 1 | -4/+18

  Resolved PR33954.

  This patch adds two more constraints that aim to reduce the noise cases
  where we convert a CMOV into a branch for a small gain and end up spending
  more cycles due to overhead.

  Differential Revision: https://reviews.llvm.org/D36081

  llvm-svn: 310352
* [X86] X86::CMOV to Branch heuristic based optimization.
  Amjad Aboud | 2017-07-16 | 1 | -0/+611

  The LLVM compiler recognizes opportunities to transform a branch into IR
  select instruction(s), which are later lowered into an X86::CMOV
  instruction, assuming no other optimization eliminated the SelectInst.
  However, it is not always profitable to emit X86::CMOV. For example, a
  branch is preferable to an X86::CMOV instruction when:
  1. The branch is well predicted.
  2. The condition operand is expensive, compared to the True-value and
     False-value operands.

  The CodeGenPrepare pass contains a shallow optimization that tries to
  convert a SelectInst back into a branch, but it is not enough. This commit
  implements a machine optimization pass that converts X86::CMOV
  instruction(s) into branches, based on a conservative heuristic.

  Differential Revision: https://reviews.llvm.org/D34769

  llvm-svn: 308142
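A minimal sketch of the kind of profitability test such a heuristic performs (a simplified model with illustrative names, not the pass's actual code): keep the CMOV unless the critical-path depth it saves outweighs the amortized misprediction cost of a branch.

```cpp
#include <algorithm>

// Simplified model: a cmov's result waits on the condition and on BOTH value
// operands; a branch waits, on average, only on the arm actually taken, but
// pays a misprediction penalty amortized over how often it mispredicts.
static bool worthConvertingCmovToBranch(unsigned CondDepth,
                                        unsigned TrueOpDepth,
                                        unsigned FalseOpDepth,
                                        unsigned MispredictPenalty) {
  unsigned CmovDepth = std::max({CondDepth, TrueOpDepth, FalseOpDepth});
  // Assume a 50/50 outcome distribution (the PR44539 change above later
  // moved this to a more conservative 75/25 weighting).
  unsigned BranchDepth = (TrueOpDepth + FalseOpDepth + 1) / 2;
  // Convert only if the depth saved beats the expected mispredict cost,
  // here modeled as penalty/4 (roughly a 25% mispredict rate).
  return CmovDepth > BranchDepth + MispredictPenalty / 4;
}
```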