summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86InstrCompiler.td
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of ↵Craig Topper2017-09-181-44/+19
| | | | | | | | | | | | sub_8bit_hi I'm pretty sure that InstrEmitter::EmitSubregNode will take care of this itself by calling ConstrainForSubReg which in turn calls TRI->getSubClassWithSubReg. I think Jakob Stoklund Olesen alluded to this in his commit message for r141207 which added the code to EmitSubregNode. Differential Revision: https://reviews.llvm.org/D37843 llvm-svn: 313557
* [X86] Don't disable slow INC/DEC if optimizing for sizeCraig Topper2017-09-091-6/+8
| | | | | | | | | | | | | | | | | Summary: Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size. This appears to match gcc behavior. Reviewers: chandlerc, zvi, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37177 llvm-svn: 312866
* [x86] fold the mask op on 8- and 16-bit rotatesSanjay Patel2017-08-141-3/+39
| | | | | | | | | | | | | | | | | | | | | | | Ref the post-commit thread for r310770: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170807/478507.html The motivating cases as 'C' source examples can look like this: unsigned char rotate_right_8(unsigned char v, int shift) { // shift &= 7; v = ( v >> shift ) | ( v << ( 8 - shift ) ); return v; } https://godbolt.org/g/K6rc1A Notice that the source doesn't contain UB-safe masked shift amounts, but instcombine created those in order to produce narrow rotate patterns. This should be the last step needed to resolve PR34046: https://bugs.llvm.org/show_bug.cgi?id=34046 Differential Revision: https://reviews.llvm.org/D36644 llvm-svn: 310849
* [X86] Add patterns for memory forms of SARX/SHLX/SHRX with careful ↵Craig Topper2017-07-231-0/+29
| | | | | | | | complexity adjustment to keep shift by immediate using the legacy instructions. These patterns were only missing to favor using the legacy instructions when the shift was a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns. llvm-svn: 308834
* [X86] Allow masks with more than 6 bits set on the x << (y & mask) ↵Craig Topper2017-07-201-1/+1
| | | | | | optimization for the 64-bit memory shifts. llvm-svn: 308657
* [X86] Use SARX/SHLX/SHLX instructions for (shift x (and y, (BitWidth-1)))Craig Topper2017-07-201-0/+31
| | | | | | Fixes PR33841. llvm-svn: 308591
* Add extra operand to CALLSEQ_START to keep frame part set up previouslySerge Pavlov2017-05-091-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527
* [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and ↵Craig Topper2017-04-281-5/+5
| | | | | | | | | | | | simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620
* Spelling mistakes in comments. NFCI.Simon Pilgrim2017-03-301-2/+2
| | | | | | Based on corrections mentioned in patch for clang for PR27635 llvm-svn: 299072
* [X86] Cleanup the AddedComplexity values on move immediate instructions. NFCCraig Topper2017-03-171-5/+5
| | | | | | This makes the values a little more consistent between similar instruction and reduces the values some. This results in better grouping in the isel table saving a few bytes. llvm-svn: 298043
* X86: Introduce relocImm-based patterns for cmp.Peter Collingbourne2017-02-091-0/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D28690 llvm-svn: 294636
* [x86] Fixing PR28755 by precomputing the address used in CMPXCHG8BNikolai Bozhenov2016-11-241-1/+1
| | | | | | | | | | | | | | | | | | The bug arises during register allocation on i686 for CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address, plus ESI is reserved as the base pointer. With such constraints the only way register allocator would do its job successfully is when the addressing mode of the instruction requires only one register. If that is not the case - we are emitting additional LEA instruction to compute the address. It fixes PR28755. Patch by Alexander Ivchenko <alexander.ivchenko@intel.com> Differential Revision: https://reviews.llvm.org/D25088 llvm-svn: 287875
* Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which ↵Peter Collingbourne2016-11-091-47/+0
| | | | | | | | | represents a relocatable immediate.", with a fix for 32-bit x86. Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions that take a global address operand. llvm-svn: 286420
* Revert r286384, "X86: Introduce the "relocImm" ComplexPattern, which ↵Peter Collingbourne2016-11-091-0/+47
| | | | | | | | | represents a relocatable immediate." Suspected to be the cause of a sanitizer-windows bot failure: Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420 llvm-svn: 286385
* X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable ↵Peter Collingbourne2016-11-091-47/+0
| | | | | | | | | | | | | | | immediate. A relocatable immediate is either an immediate operand or an operand that can be relocated by the linker to an immediate, such as a regular symbol in non-PIC code. Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands of type "imm32_su". Remove a number of now-redundant patterns. Differential Revision: https://reviews.llvm.org/D25812 llvm-svn: 286384
* [X86] Use implicit masking of SHLD/SHRD shift double instructionsSimon Pilgrim2016-08-011-3/+16
| | | | | | Similar to the regular shift instructions, SHLD/SHRD only use the bottom bits of the shift value llvm-svn: 277341
* Avoid unnecessary 32-bit to 64-bit zero extensions followingDavid L Kreitzer2016-07-291-4/+2
| | | | | | | | | 32-bit CMOV instructions on x86_64. The 32-bit CMOV implicitly zero extends. Differential Revision: https://reviews.llvm.org/D22941 llvm-svn: 277148
* [X86] Fix stupid typo in isel lowering.Eli Friedman2016-07-141-1/+1
| | | | | | | Apparently someone miscounted the number of zeros in the immediate. Fixes https://llvm.org/bugs/show_bug.cgi?id=28544 . llvm-svn: 275376
* Delete the IsStatic predicate.Rafael Espindola2016-06-271-6/+6
| | | | | | In all its uses it was equivalent to IsNotPIC. llvm-svn: 273943
* [X86]: Add a pattern that uses GR16_ABCD rather than GR32_ABCD to avoid ↵Kevin B. Smith2016-05-311-0/+4
| | | | | | | | falsely marking whole 32 bit register as live. Differential Revision: http://reviews.llvm.org/D20649 llvm-svn: 271341
* Re-commit r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA ↵Hans Wennborg2016-05-181-12/+21
| | | | | | | | | | | | instructions" with an additional fix to make RegAllocFast ignore undef physreg uses. It would previously get confused about the "push %eax" instruction's use of eax. That method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate as well, but since that runs after register-allocation, we didn't run into the RegAllocFast issue before. llvm-svn: 269949
* Revert r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions"Hans Wennborg2016-05-171-21/+12
| | | | | | Seems to have broken the Windows ASan bot. Reverting while investigating. llvm-svn: 269833
* X86: Avoid using _chkstk when lowering WIN_ALLOCA instructionsHans Wennborg2016-05-171-12/+21
| | | | | | | | | | | | | | | This patch moves the expansion of WIN_ALLOCA pseudo-instructions into a separate pass that walks the CFG and lowers the instructions based on a conservative estimate of the offset between the stack pointer and the lowest accessed stack address. The goal is to reduce binary size and run-time costs by removing calls to _chkstk. While it doesn't fix all the code quality problems with inalloca calls, it's an incremental improvement for PR27076. Differential Revision: http://reviews.llvm.org/D20263 llvm-svn: 269828
* [X86] Fix a bug in LOCK arithmetic operation pattern matching where the ↵Craig Topper2016-05-021-1/+1
| | | | | | | | wrong immediate predicate check was being used for 64-bit instructions with 8-bit immediates. This didn't cause a bug because the order of the patterns ensured that the 64-bit instructions with 32-bit immediates were selected first. llvm-svn: 268212
* [X86] Fix the lowering of TLS calls.Quentin Colombet2016-04-271-3/+6
| | | | | | | | | | | The callseq_end node must be glued with the TLS calls, otherwise, the generic code will miss the uses of the returned value and will mark it dead. Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI, the pseudo uses the symbol address at this point not RDI and the lowering will do the right thing. llvm-svn: 267797
* [X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsizeHans Wennborg2016-03-251-0/+12
| | | | | | | | | | | | 64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes, respectively, whereas and/or with 8-bit immediate is only three bytes. Since these instructions imply an additional memory read (which the CPU could elide, but we don't think it does), restrict these patterns to minsize functions. Differential Revision: http://reviews.llvm.org/D18374 llvm-svn: 264440
* X86: Use push-pop for materializing 8-bit immediates for minsize (take 2)Hans Wennborg2016-03-251-2/+13
| | | | | | | | | This is the same as r255936, with added logic for avoiding clobbering of the red zone (PR26023). Differential Revision: http://reviews.llvm.org/D18246 llvm-svn: 264375
* [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer.Quentin Colombet2016-03-121-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cmpxchg[8|16]b uses RBX as one of its argument. In other words, using this instruction clobbers RBX as it is defined to hold one the input. When the backend uses dynamically allocated stack, RBX is used as a reserved register for the base pointer. Reserved registers have special semantic that only the target understands and enforces, because of that, the register allocator don’t use them, but also, don’t try to make sure they are used properly (remember it does not know how they are supposed to be used). Therefore, when RBX is used as a reserved register but defined by something that is not compatible with that use, the register allocator will not fix the surrounding code to make sure it gets saved and restored properly around the broken code. This is the responsibility of the target to do the right thing with its reserved register. To fix that, when the base pointer needs to be preserved, we use a different pseudo instruction for cmpxchg that save rbx. That pseudo takes two more arguments than the regular instruction: - One is the value to be copied into RBX to set the proper value for the comparison. - The other is the virtual register holding the save of the value of RBX as the base pointer. This saving is done as part of isel (i.e., we emit a copy from rbx). cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp This gets expanded into: rbx = copy input_for_rbx_reg cmpxchg <regular cmpxchg args> rbx = save_of_rbx_as_bp Note: The actual modeling of the pseudo is a bit more complicated to make sure the interferes that appears after the pseudo gets expanded are properly modeled before that expansion. This fixes PR26883. llvm-svn: 263325
* [X86] Move the ATOMIC_LOAD_OP ISel from DAGToDAG to ISelLowering. NFCI.Ahmed Bougacha2016-02-291-27/+45
| | | | | | | | | | | | | | This is long-standing dirtiness, as acknowledged by r77582: The current trick is to select it into a merge_values with the first definition being an implicit_def. The proper solution is to add new ISD opcodes for the no-output variant. Doing this before selection will let us combine away some constructs. Differential Revision: http://reviews.llvm.org/D17659 llvm-svn: 262244
* Optimized loading (zextload) of i1 value from memory.Elena Demikhovsky2016-02-251-5/+6
| | | | | | | | | | | This patch is a partial revert of https://llvm.org/svn/llvm-project/llvm/trunk@237793. Extra "and" causes performance degradation. We assume that i1 is stored in zero-extended form. And store operation is responsible for zeroing upper bits. Differential Revision: http://reviews.llvm.org/D17541 llvm-svn: 261828
* [X86ISelLowering] Fix TLSADDR lowering when shrink-wrapping is enabled.Davide Italiano2016-02-201-2/+2
| | | | | | | | | | TLSADDR nodes are lowered into actuall calls inside MC. In order to prevent shrink-wrapping from pushing prologue/epilogue past them (which result in TLS variables being accessed before the stack frame is set up), we put markers, so that the stack gets adjusted properly. Thanks to Quentin Colombet for guidance/help on how to fix this problem! llvm-svn: 261387
* [X86] Make MOV32ri64 a post-RA pseudo instead of a CodeGenOnly instruction. ↵Craig Topper2016-01-051-4/+3
| | | | | | It was only needed for rematerialization. llvm-svn: 256818
* [X86] Add OpSize32 to OR32mrLocked instruction to match the normal OR32mr ↵Craig Topper2016-01-051-2/+2
| | | | | | instruction. llvm-svn: 256817
* Revert "[X86] Use push-pop for materializing small constants under 'minsize'"David Majnemer2016-01-051-13/+2
| | | | | | | | | | | | The red zone consists of 128 bytes beyond the stack pointer so that the allocation of objects in leaf functions doesn't require decrementing rsp. In r255656, we introduced an optimization that would cheaply materialize certain constants via push/pop. Push decrements the stack pointer and stores it's result at what is now the top of the stack. However, this means that using push/pop would encroach on the red zone. PR26023 gives an example where this corrupts an object in the red zone. llvm-svn: 256808
* [X86] Use push-pop for materializing small constants under 'minsize'Hans Wennborg2015-12-171-2/+13
| | | | | | | | | | | | | Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 llvm-svn: 255936
* [X86] Smaller code for materializing 32-bit 1 and -1 constantsHans Wennborg2015-12-151-0/+16
| | | | | | | | | "movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes. This commit makes LLVM use the latter when optimizing for size. Differential Revision: http://reviews.llvm.org/D14971 llvm-svn: 255656
* [X86] Part 2 to fix x86-64 fp128 calling convention.Chih-Hung Hsieh2015-12-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | Part 1 was submitted in http://reviews.llvm.org/D15134. Changes in this part: * X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class. * X86CallingConv.td: Pass f128 values in XMM registers or on stack. * X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td: Add instruction selection patterns for f128. * X86ISelLowering.cpp: When target has MMX registers, configure MVT::f128 in FR128RegClass, with TypeSoftenFloat action, and custom actions for some opcodes. Add missed cases of MVT::f128 in places that handle f32, f64, or vector types. Add TODO comment to support f128 type in inline assembly code. * SelectionDAGBuilder.cpp: Fix infinite loop when f128 type can have VT == TLI.getTypeToTransformTo(Ctx, VT). * Add unit tests for x86-64 fp128 type. Differential Revision: http://reviews.llvm.org/D11438 llvm-svn: 255558
* [WinEH] Remove isBarrier from instructions that do not returnReid Kleckner2015-11-091-2/+2
| | | | | | Fixes machine verification failures with David's latest EH change. llvm-svn: 252541
* [WinEH] Don't emit CATCHRET from visitCatchPadDavid Majnemer2015-11-091-1/+5
| | | | | | | Instead, emit a CATCHPAD node which will get selected to a target specific sequence. llvm-svn: 252528
* [WinEH] Split EH_RESTORE out of CATCHRET for 32-bit EHReid Kleckner2015-11-061-5/+17
| | | | | | | | | | | | | | | | | | | | | | | This adds the EH_RESTORE x86 pseudo instr, which is responsible for restoring the stack pointers: EBP and ESP, and ESI if stack realignment is involved. We only need this on 32-bit x86, because on x64 the runtime restores CSRs for us. Previously we had to keep the CATCHRET instruction around during SEH so that we could convince X86FrameLowering to restore our frame pointers. Now we can split these instructions earlier. This was confusing, because we had a return instruction which wasn't really a return and was ultimately going to be removed by X86FrameLowering. This change also simplifies X86FrameLowering, which really shouldn't be building new MBBs. No observable functional change currently, but with the new register mask stuff in D14407, CATCHRET will become a register allocator barrier, and our existing tests rely on us having reasonable register allocation around SEH. llvm-svn: 252266
* x86: preserve flags when folding atomic operationsJF Bastien2015-10-151-15/+20
| | | | | | | | | | D4796 taught LLVM to fold some atomic integer operations into a single instruction. The pattern was unaware that the instructions clobbered flags. I fixed some of this issue in D13680 but had missed INC/DEC. This patch adds the missing EFLAGS definition. llvm-svn: 250438
* function names should start with a lower case letter; NFCSanjay Patel2015-10-131-2/+2
| | | | llvm-svn: 250174
* x86: preserve flags when folding atomic operationsJF Bastien2015-10-131-6/+8
| | | | | | | | | | | | | | | | | | | | | Summary: D4796 taught LLVM to fold some atomic integer operations into a single instruction. The pattern was unaware that the instructions clobbered flags. This patch adds the missing EFLAGS definition. Floating point operations don't set flags, the subsequent fadd optimization is therefore correct. The same applies for surrounding load/store optimizations. Reviewers: rsmith, rtrieu Subscribers: llvm-commits, reames, morisset Differential Revision: http://reviews.llvm.org/D13680 llvm-svn: 250135
* [X86] Remove unnecessary AddComplexity directive. The instruction is already ↵Craig Topper2015-10-061-1/+0
| | | | | | wrapped in the equivalent earlier. NFC llvm-svn: 249369
* [WinEH] Make FuncletLayout more robust against catchretDavid Majnemer2015-10-011-4/+4
| | | | | | | | | Catchret transfers control from a catch funclet to an earlier funclet. However, it is not completely clear which funclet the catchret target is part of. Make this clear by stapling the catchret target's funclet membership onto the CATCHRET SDAG node. llvm-svn: 249052
* [WinEH] Make funclet return instrs pseudo instrsReid Kleckner2015-09-171-14/+4
| | | | | | | | | This makes catchret look more like a branch, and less like a weird use of BlockAddress. It also lets us get away from llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label arithmetic. llvm-svn: 247936
* [WinEH] Add codegen support for cleanuppad and cleanupretReid Kleckner2015-09-101-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | All of the complexity is in cleanupret, and it mostly follows the same codepaths as catchret, except it doesn't take a return value in RAX. This small example now compiles and executes successfully on win32: extern "C" int printf(const char *, ...) noexcept; struct Dtor { ~Dtor() { printf("~Dtor\n"); } }; void has_cleanup() { Dtor o; throw 42; } int main() { try { has_cleanup(); } catch (int) { printf("caught it\n"); } } Don't try to put the cleanup in the same function as the catch, or Bad Things will happen. llvm-svn: 247219
* [WinEH] Emit prologues and epilogues for funcletsReid Kleckner2015-09-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Summary: 32-bit funclets have short prologues that allocate enough stack for the largest call in the whole function. The runtime saves CSRs for the funclet. It doesn't restore CSRs after we finally transfer control back to the parent funciton via a CATCHRET, but that's a separate issue. 32-bit funclets also have to adjust the incoming EBP value, which is what llvm.x86.seh.recoverframe does in the old model. 64-bit funclets need to spill CSRs as normal. For simplicity, this just spills the same set of CSRs as the parent function, rather than trying to compute different CSR sets for the parent function and each funclet. 64-bit funclets also allocate enough stack space for the largest outgoing call frame, like 32-bit. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12546 llvm-svn: 247092
* [WinEH] Add some support for code generating catchpadReid Kleckner2015-08-271-0/+9
| | | | | | | We can now run 32-bit programs with empty catch bodies. The next step is to change PEI so that we get funclet prologues and epilogues. llvm-svn: 246235
* [X86] Remove references to _ftol2Michael Kuperstein2015-08-251-20/+0
| | | | | | | As of r245924, _ftol2 is no longer used for fptoui on MS platforms. Remove the dead code associated with it. llvm-svn: 245925
OpenPOWER on IntegriCloud