path: root/llvm/lib/Target/X86/X86InstrInfo.cpp
Commit message | Author | Age | Files | Lines
* [x86] enable machine combiner reassociations for 128-bit vector single/double adds | Sanjay Patel | 2015-08-08 | 1 | -0/+4
  llvm-svn: 244403
* MIR Serialization: Initial serialization of the machine operand target flags. | Alex Lorenz | 2015-08-06 | 1 | -0/+35
  This commit implements the initial serialization of the machine operand target flags. It extends the 'TargetInstrInfo' class to add two new methods that help to provide text based serialization for the target flags.

  This commit can serialize only the X86 target flags, and the target flags for the other targets will be serialized in the follow-up commits.

  Reviewers: Duncan P. N. Exon Smith
  llvm-svn: 244185
* Revert "Fix MO's analyzePhysReg, it was confusing sub- and super-registers. Problem pointed out by Michael Hordijk." | JF Bastien | 2015-08-05 | 1 | -48/+26
  I mistakenly committed the patch for D6629, and was trying to commit another. Reverting until it gets proper signoff.
  llvm-svn: 244121
* Fix MO's analyzePhysReg, it was confusing sub- and super-registers. Problem pointed out by Michael Hordijk. | JF Bastien | 2015-08-05 | 1 | -26/+48
  llvm-svn: 244120
* wrap OptSize and MinSize attributes for easier and consistent access (NFCI) | Sanjay Patel | 2015-08-04 | 1 | -2/+3
  Create wrapper methods in the Function class for the OptimizeForSize and MinSize attributes. We want to hide the logic of "or'ing" them together when optimizing just for size (-Os).

  Currently, we are not consistent about this and rely on a front-end to always set OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here that should be added as follow-on patches with regression tests.

  This patch is NFC-intended: it just replaces existing direct accesses of the attributes by the equivalent wrapper call.

  Differential Revision: http://reviews.llvm.org/D11734
  llvm-svn: 243994
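The "or'ing" logic described in the commit message above can be illustrated with a toy model. This is only a sketch: `ToyFunction` and its two boolean fields are hypothetical stand-ins for a function's attribute set, and the method names are chosen for illustration rather than taken from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for a function carrying size-related attributes.
struct ToyFunction {
  bool hasOptimizeForSizeAttr = false; // models -Os
  bool hasMinSizeAttr = false;         // models -Oz

  // True only when the stricter "minimize size" attribute is present.
  bool optForMinSize() const { return hasMinSizeAttr; }

  // True when optimizing for size at all: either attribute qualifies, so
  // callers no longer depend on a front-end setting both for -Oz.
  bool optForSize() const { return hasOptimizeForSizeAttr || optForMinSize(); }
};
```

With a wrapper of this shape, a pass that asks `optForSize()` gets the same answer whether or not the front-end also set the -Os attribute on a -Oz function.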
* [x86] machine combiner reassociation: mark EFLAGS operand as 'dead' | Sanjay Patel | 2015-08-04 | 1 | -4/+43
  In the commentary for D11660, I wasn't sure if it was alright to create new integer machine instructions without also creating the implicit EFLAGS operand. From what I can see, the implicit operand is always created by the MachineInstrBuilder based on the instruction type, so we don't have to do that explicitly. However, in reviewing the debug output, I noticed that the operand was not marked as 'dead'. The machine combiner should do that to preserve future optimization opportunities that may be checking for that dead EFLAGS operand themselves.

  Differential Revision: http://reviews.llvm.org/D11696
  llvm-svn: 243990
* [x86] reassociate integer multiplies using machine combiner pass | Sanjay Patel | 2015-07-31 | 1 | -10/+30
  Add i16, i32, i64 imul machine instructions to the list of reassociation candidates.

  A new bit of logic is needed to handle integer instructions: they have an implicit EFLAGS operand, so we have to make sure it's dead in order to do any reassociation with integer ops.

  Differential Revision: http://reviews.llvm.org/D11660
  llvm-svn: 243756
* push fast-math check for machine-combiner reassociations into instruction-type check; NFC | Sanjay Patel | 2015-07-30 | 1 | -7/+4
  This makes it simpler to add instruction types that don't depend on fast-math.
  llvm-svn: 243596
* fix invalid load folding with SSE/AVX FP logical instructions (PR22371) | Sanjay Patel | 2015-07-28 | 1 | -12/+3
  This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ). I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy when targeting 32-bit x86 and causes a miscompile/crash in Wine:
    https://bugs.winehq.org/show_bug.cgi?id=38826
    https://llvm.org/bugs/show_bug.cgi?id=22371#c25

  The quick fix is to simply remove the scalar FP logical instructions from the load folding table in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs, fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP logical instructions (because that's all x86 gives us anyway). That lets us do the load folding legally.

  Differential Revision: http://reviews.llvm.org/D11477
  llvm-svn: 243361
* [X86] Allow load folding into PUSH instructions | Michael Kuperstein | 2015-07-23 | 1 | -4/+11
  Adds pushes to the folding tables. This also required a fix to the TD definition, since the memory forms of the push instructions did not have the right mayLoad/mayStore flags.

  Differential Revision: http://reviews.llvm.org/D11340
  llvm-svn: 243010
* Remove TargetInstrInfo::canFoldMemoryOperand | Simon Pilgrim | 2015-07-19 | 1 | -56/+0
  canFoldMemoryOperand is not actually used anywhere in the codebase - all existing users instead call foldMemoryOperand directly when they wish to fold and can correctly deduce what they need from the return value.

  This patch removes the canFoldMemoryOperand base function and the target implementations; only x86 had a real (bit-rotted) implementation, although AMDGPU had a preparatory stub that had never needed to be completed.

  Differential Revision: http://reviews.llvm.org/D11331
  llvm-svn: 242638
* [MMX] Use the appropriate instructions for GR64 <-> VR64 copies. | Bruno Cardoso Lopes | 2015-07-14 | 1 | -2/+2
  MOVSDto64rr and MOV64toSDrr are defined to convert between FR64 (%xmm) <-> GR64 registers, not VR64 (%mm) <-> GR64. This is wrong.

  I found this by inspection and could not find a suitable testcase for it since (1) we don't handle MMX bitcasts in Peephole optimizer as to generate COPYs that (2) could be expanded back to the appropriate x86 instruction in ExpandPostRA.

  Switch to use the appropriate instructions: MMX_MOVD64from64rr and MMX_MOVD64to64rr here.
  llvm-svn: 242191
* [x86] enable machine combiner reassociations for scalar double-precision multiplies | Sanjay Patel | 2015-07-09 | 1 | -1/+3
  llvm-svn: 241873
* [x86] enable machine combiner reassociations for scalar double-precision adds | Sanjay Patel | 2015-07-09 | 1 | -0/+2
  llvm-svn: 241871
* [x86] enable machine combiner reassociations for scalar single-precision multiplies | Sanjay Patel | 2015-07-08 | 1 | -2/+4
  llvm-svn: 241752
* [X86][SSE] Added (V)ROUNDSD + (V)ROUNDSS stack folding support | Simon Pilgrim | 2015-07-08 | 1 | -4/+8
  llvm-svn: 241671
* use range-based for loops; NFCI | Sanjay Patel | 2015-07-07 | 1 | -35/+17
  llvm-svn: 241592
* [x86] extend machine combiner reassociation optimization to SSE scalar adds | Sanjay Patel | 2015-07-06 | 1 | -13/+19
  Extend the reassociation optimization of http://reviews.llvm.org/rL240361 (D10460) to SSE scalar FP SP adds in addition to AVX scalar FP SP adds.

  With the 'switch' in place, we can trivially add other opcodes and test cases in future patches.

  Differential Revision: http://reviews.llvm.org/D10975
  llvm-svn: 241515
* Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) | Alexander Kornienko | 2015-06-23 | 1 | -2/+2
  Apparently, the style needs to be agreed upon first.
  llvm-svn: 240390
* [x86] generalize reassociation optimization in machine combiner to 2 instructions | Sanjay Patel | 2015-06-23 | 1 | -77/+87
  Currently ( D10321, http://reviews.llvm.org/rL239486 ), we can use the machine combiner pass to reassociate the following sequence to reduce the critical path:

    A = ? op ?
    B = A op X
    C = B op Y
    -->
    A = ? op ?
    B = X op Y
    C = A op B

  'op' is currently limited to x86 AVX scalar FP adds (with fast-math on), but in theory, it could be any associative math/logic op (see TODO in code comment).

  This patch generalizes the pattern match to ignore the instruction that defines 'A'. So instead of a sequence of 3 adds, we now only need to find 2 dependent adds and decide if it's worth reassociating them.

  This generalization has a compile-time cost because we can now match more instruction sequences and we rely more heavily on the machine combiner to discard sequences where reassociation doesn't improve the critical path.

  For example, in the new test case:

    A = M div N
    B = A add X
    C = B add Y

  We'll match 2 reassociation patterns, but this transform doesn't reduce the critical path:

    A = M div N
    B = A add Y
    C = B add X

  We need the combiner to reject that pattern but select this:

    A = M div N
    B = X add Y
    C = B add A

  Differential Revision: http://reviews.llvm.org/D10460
  llvm-svn: 240361
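The critical-path argument in the r240361 message can be checked with a small timing model. This is a sketch under stated assumptions, not the machine combiner's actual cost logic: the `Operands` struct, the single fixed `opLatency`, and all function names are hypothetical, and real scheduling models are considerably richer.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model: the cycle at which each input value becomes available.
struct Operands {
  int readyA; // e.g. the result of a long-latency divide
  int readyX;
  int readyY;
};

// Original chain: B = A op X; C = B op Y.
int chainedDepth(const Operands &o, int opLatency) {
  int readyB = std::max(o.readyA, o.readyX) + opLatency;
  return std::max(readyB, o.readyY) + opLatency;
}

// Pattern the combiner should reject: B = A op Y; C = B op X.
// A is still consumed first, so the chain is just as long.
int swappedDepth(const Operands &o, int opLatency) {
  int readyB = std::max(o.readyA, o.readyY) + opLatency;
  return std::max(readyB, o.readyX) + opLatency;
}

// Pattern the combiner should select: B = X op Y; C = B op A.
// The X/Y op can run while A is still being computed.
int reassociatedDepth(const Operands &o, int opLatency) {
  int readyB = std::max(o.readyX, o.readyY) + opLatency;
  return std::max(readyB, o.readyA) + opLatency;
}

// Reassociate only when it strictly shortens the path to C.
bool shouldReassociate(const Operands &o, int opLatency) {
  assert(opLatency > 0 && "model assumes a positive op latency");
  return reassociatedDepth(o, opLatency) < chainedDepth(o, opLatency);
}
```

In this model, when A arrives late (say at cycle 20, with X and Y ready at cycle 0 and a 3-cycle op), the chained and swapped forms both finish at cycle 26 while the reassociated form finishes at cycle 23. When all inputs are ready at once, the depths are equal and there is nothing to gain.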
* [X86][FMA4] FMA4 ops can perform unaligned folded loads. | Simon Pilgrim | 2015-06-22 | 1 | -64/+64
  llvm-svn: 240342
* [X86] Teach load folding to accept scalar _Int users of MOVSS/MOVSD. | Ahmed Bougacha | 2015-06-22 | 1 | -10/+46
  The _Int instructions are special, in that they operate on the full VR128 instead of FR32. The load folding then looks at MOVSS, at the user, and bails out when it sees a size mismatch.

  What we really know is that the rm_Int instructions don't load the higher lanes, so folding is fine.

  This happens for the straightforward intrinsic code, e.g.:

    _mm_add_ss(a, _mm_load_ss(p));

  Fixes PR23349.

  Differential Revision: http://reviews.llvm.org/D10554
  llvm-svn: 240326
* name change: hasPattern() -> getMachineCombinerPatterns() ; NFC | Sanjay Patel | 2015-06-19 | 1 | -3/+3
  This was suggested as part of D10460, but it's independent of any functional change.
  llvm-svn: 240192
* Fixed/added namespace ending comments using clang-tidy. NFC | Alexander Kornienko | 2015-06-19 | 1 | -2/+2
  The patch is generated using this command:

    tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
      -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
      llvm/lib/

  Thanks to Eugene Kosov for the original patch!
  llvm-svn: 240137
* [TargetInstrInfo] Add new hook: AnalyzeBranchPredicate. | Sanjoy Das | 2015-06-15 | 1 | -5/+85
  Summary:
  NFC: no one uses AnalyzeBranchPredicate yet.

  Add TargetInstrInfo::AnalyzeBranchPredicate and implement for x86. A later change adding support for page-fault based implicit null checks depends on this.

  Reviewers: reames, ab, atrick
  Reviewed By: atrick
  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D10200
  llvm-svn: 239742
* [TargetInstrInfo] Rename getLdStBaseRegImmOfs and implement for x86. | Sanjoy Das | 2015-06-15 | 1 | -0/+30
  Summary:
  Rename TargetInstrInfo::getLdStBaseRegImmOfs to TargetInstrInfo::getMemOpBaseRegImmOfs and implement for x86. The implementation only handles a few easy cases now and will be made more sophisticated in the future.

  This is NFCI: the only user of `getLdStBaseRegImmOfs` (now `getMemOpBaseRegImmOfs`) is `LoadClusterMotion` and `LoadClusterMotion` is disabled for x86.

  Reviewers: reames, ab, MatzeB, atrick
  Reviewed By: MatzeB, atrick
  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D10199
  llvm-svn: 239741
* MachineLICM: Use TargetSchedModel instead of just itineraries | Matthias Braun | 2015-06-13 | 1 | -1/+1
  This will use Itineraries if available, but will also work if just an MCSchedModel is available.

  Differential Revision: http://reviews.llvm.org/D10428
  llvm-svn: 239658
* [CodeGen] ArrayRef'ize cond/pred in various TII APIs. NFC. | Ahmed Bougacha | 2015-06-11 | 1 | -5/+3
  llvm-svn: 239553
* change assert that will never fire to llvm_unreachable | Sanjay Patel | 2015-06-10 | 1 | -1/+1
  llvm-svn: 239497
* [x86] Add a reassociation optimization to increase ILP via the MachineCombiner pass | Sanjay Patel | 2015-06-10 | 1 | -0/+204
  This is a reimplementation of D9780 at the machine instruction level rather than the DAG.

  Use the MachineCombiner pass to reassociate scalar single-precision AVX additions (just a starting point; see the TODO comments) to increase ILP when it's safe to do so.

  The code is closely based on the existing MachineCombiner optimization that is implemented for AArch64.

  This patch should not cause the kind of spilling tragedy that led to the reversion of r236031.

  Differential Revision: http://reviews.llvm.org/D10321
  llvm-svn: 239486
* [InstrInfo] Refactor foldOperandImpl to thread through InsertPt. NFC | Keno Fischer | 2015-06-08 | 1 | -41/+44
  Summary:
  This was a longstanding FIXME and is a necessary precursor to cases where foldOperandImpl may have to create more than one instruction (e.g. to constrain a register class). This is the split out NFC changes from D6262.

  Reviewers: pete, ributzka, uweigand, mcrosier
  Reviewed By: mcrosier
  Subscribers: mcrosier, ted, llvm-commits

  Differential Revision: http://reviews.llvm.org/D10174
  llvm-svn: 239336
* AVX-512: Implemented 256/128bit VALIGND/Q instructions for SKX and KNL | Igor Breger | 2015-06-08 | 1 | -2/+2
  Implemented DAG lowering for all these forms. Added tests for DAG lowering and encoding.

  Differential Revision: http://reviews.llvm.org/D10310
  llvm-svn: 239300
* [X86] Added BitScanForward/BitScanReverse memory folding + tests | Simon Pilgrim | 2015-06-07 | 1 | -0/+6
  llvm-svn: 239257
* MachineInstr: Remove unused parameter. | Matthias Braun | 2015-05-19 | 1 | -1/+1
  llvm-svn: 237726
* MC: Modernize MCOperand API naming. NFC. | Jim Grosbach | 2015-05-13 | 1 | -1/+1
  MCOperand::Create*() methods renamed to MCOperand::create*().
  llvm-svn: 237275
* [x86] eliminate unnecessary shuffling/moves with unary scalar math ops (PR21507) | Sanjay Patel | 2015-05-07 | 1 | -0/+6
  Finish the job that was abandoned in D6958 following the refactoring in http://reviews.llvm.org/rL230221:

  1. Uncomment the intrinsic def for the AVX r_Int instruction.
  2. Add missing r_Int entries to the load folding tables; there are already tests that check these in "test/CodeGen/X86/fold-load-unops.ll", so I haven't added any more in this patch.
  3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ).

  So instead of this:

    movaps %xmm0, %xmm1
    rcpss %xmm1, %xmm1
    movss %xmm1, %xmm0

  We should now get:

    rcpss %xmm0, %xmm0

  And instead of this:

    vsqrtss %xmm0, %xmm0, %xmm1
    vblendps $1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3]

  We should now get:

    vsqrtss %xmm0, %xmm0, %xmm0

  Differential Revision: http://reviews.llvm.org/D9504
  llvm-svn: 236740
* [x86] remove RCPPS and RSQRTPS intrinsic instruction definitions | Sanjay Patel | 2015-04-28 | 1 | -6/+0
  We don't need codegen-only intrinsic instructions for the vector forms of these instructions.

  This makes the reciprocal estimate instruction lowering identical to how we handle normal square roots: (V)SQRTPS / (V)SQRTPD.

  No existing regression tests fail with this patch.

  Differential Revision: http://reviews.llvm.org/D9301
  llvm-svn: 236013
* [X86, AVX] add an exedepfix entry for vmovq == vmovlps == vmovlpd | Sanjay Patel | 2015-04-17 | 1 | -1/+1
  This is the AVX extension of r235014: http://llvm.org/viewvc/llvm-project?view=revision&revision=235014

  Review: http://reviews.llvm.org/D8691
  llvm-svn: 235210
* [X86] add an exedepfix entry for movq == movlps == movlpd | Sanjay Patel | 2015-04-15 | 1 | -0/+2
  This is a 1-line patch (with a TODO for AVX because that will affect even more regression tests) that lets us substitute the appropriate 64-bit store for the float/double/int domains.

  It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and 0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice.

  Differential Revision: http://reviews.llvm.org/D8691
  llvm-svn: 235014
* [X86] Added SSE4.2 CRC32 memory folding patterns + tests | Simon Pilgrim | 2015-04-03 | 1 | -0/+2
  llvm-svn: 234013
* [X86][3DNow] Added 3DNow! memory folding patterns + tests | Simon Pilgrim | 2015-04-03 | 1 | -0/+28
  llvm-svn: 234008
* Remove the need to cache the subtarget in the X86 TargetRegisterInfo | Eric Christopher | 2015-03-12 | 1 | -1/+1
  classes. Use a Triple instead and simplify a lot of the querying logic to use lookups on the Triple.
  llvm-svn: 232071
* Convert push_back loops into append calls. | Benjamin Kramer | 2015-02-28 | 1 | -2/+2
  No functionality change intended.
  llvm-svn: 230849
* ArrayRefize memory operand folding. NFC. | Benjamin Kramer | 2015-02-28 | 1 | -26/+21
  llvm-svn: 230846
* Replace std::copy with a back inserter with vector append where feasible | Benjamin Kramer | 2015-02-28 | 1 | -1/+1
  All of the cases were just appending from random access iterators to a vector. Using insert/append can grow the vector to the perfect size directly and moves the growing out of the loop. No intended functionality change.
  llvm-svn: 230845
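The two appending styles contrasted in the r230845 message can be sketched with the standard library. This is an illustration only: it uses `std::vector::insert` as the range-append, and the helper names `appendWithCopy` and `appendWithInsert` are hypothetical, not code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Element-at-a-time append: each push_back through the back inserter may
// trigger a reallocation, so the vector grows inside the copy loop.
std::vector<int> appendWithCopy(std::vector<int> dst,
                                const std::vector<int> &src) {
  std::copy(src.begin(), src.end(), std::back_inserter(dst));
  return dst;
}

// Range append: with random access iterators the vector knows the distance
// up front and can grow to the final size once, before copying.
std::vector<int> appendWithInsert(std::vector<int> dst,
                                  const std::vector<int> &src) {
  dst.insert(dst.end(), src.begin(), src.end());
  return dst;
}
```

Both helpers produce the same contents; the difference is only in how often the destination has to grow, which is why the change is described as having no intended functional effect.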
* [X86][MMX] Reapply: Add MMX instructions to foldable tables | Bruno Cardoso Lopes | 2015-02-25 | 1 | -0/+84
  Reapply r230248.

  Teach the peephole optimizer to work with MMX instructions by adding entries into the foldable tables. This covers folding opportunities not handled during isel.
  llvm-svn: 230499
* Revert "[X86][MMX] Add MMX instructions to foldable tables" | Bruno Cardoso Lopes | 2015-02-23 | 1 | -82/+0
  This reverts commit r230226 since it breaks win buildbots.
  llvm-svn: 230248
* [X86][MMX] Add MMX instructions to foldable tables | Bruno Cardoso Lopes | 2015-02-23 | 1 | -0/+82
  Teach the peephole optimizer to work with MMX instructions by adding entries into the foldable tables. This covers folding opportunities not handled during isel.
  llvm-svn: 230226
* rename variables again because these tables also deal with stores; NFC | Sanjay Patel | 2015-02-17 | 1 | -31/+31
  Suggestion by Simon Pilgrim
  llvm-svn: 229574
* Add comment to explain a non-obvious setting; NFC. | Sanjay Patel | 2015-02-17 | 1 | -0/+6
  This is paraphrased from Simon Pilgrim's comment in: http://reviews.llvm.org/D7492
  llvm-svn: 229566