path: root/llvm/lib/Target/X86/X86InstrInfo.cpp
Commit message (Author, Date, Files, Lines changed)
...
* [X86] Bring back the MOV64r0 pseudo instruction (Craig Topper, 2018-10-24, 1 file, -4/+19)

  This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register, similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.

  My main motivation is to enable the spill optimization in foldMemoryOperandImpl. We were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers, with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code we were looking at.

  There are definitely some other machine CSE (and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.

  Differential Revision: https://reviews.llvm.org/D52757

  llvm-svn: 345165
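  As a rough sketch of the spill pattern described above (hypothetical stack offsets, not taken from the commit), folding the zero into the store turns

      xorl %eax, %eax
      movl %eax, 16(%rsp)      # spill a zero
      xorl %eax, %eax
      movl %eax, 24(%rsp)      # spill another zero

  into

      movl $0, 16(%rsp)
      movl $0, 24(%rsp)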
* X86: Do not optimize branches with undef eflags inputs (Matthias Braun, 2018-10-22, 1 file, -0/+5)

  analyzeBranch()/insertBranch() etc. do not properly deal with an undef flag on the eflags input and used to produce invalid MIR. I don't see this ever affecting real world inputs (I don't think it is possible to produce undef flags with llvm IR), so I simply changed the code to bail out in this case.

  rdar://42122367

  llvm-svn: 344970
* X86, AArch64, ARM: Do not attach debug location to spill/reload instructions (Matthias Braun, 2018-10-05, 1 file, -4/+2)

  This rebases and recommits r343520. hwasan should be fixed now and this shouldn't break the tests anymore.

  Spill/reload instructions are artificially generated by the compiler and have no relation to the original source code. So the best thing to do is not attach any debug location to them (instead of just taking the next debug location we find on following instructions).

  Differential Revision: https://reviews.llvm.org/D52125

  llvm-svn: 343895
* Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload ↵Matt Morehouse2018-10-021-2/+4
| | | | | | | | instructions" This reverts r343520 due to breakage of HWASan tests on Android. llvm-svn: 343616
* X86, AArch64, ARM: Do not attach debug location to spill/reload instructions (Matthias Braun, 2018-10-01, 1 file, -4/+2)

  Spill/reload instructions are artificially generated by the compiler and have no relation to the original source code. So the best thing to do is not attach any debug location to them (instead of just taking the next debug location we find on following instructions).

  Differential Revision: https://reviews.llvm.org/D52125

  llvm-svn: 343520
* [X86] Change an llvm_unreachable to a report_fatal_error so the optimizer will stop making us reach the other report_fatal_error in this function (Craig Topper, 2018-09-30, 1 file, -1/+1)

  There's a conditional report_fatal_error just above this llvm_unreachable. The optimizer, when it sees the unreachable, removes the conditional and just makes any other error trigger the existing report_fatal_error.

  llvm-svn: 343428
* Remove FrameAccess struct from hasLoadFromStackSlot (Sander de Smalen, 2018-09-05, 1 file, -4/+8)

  This removes the FrameAccess struct that was added to the interface in D51537, since the PseudoValue from the MachineMemoryOperand can be safely casted to a FixedStackPseudoSourceValue.

  Reviewers: MatzeB, thegameg, javed.absar

  Reviewed By: thegameg

  Differential Revision: https://reviews.llvm.org/D51617

  llvm-svn: 341454
* Extend hasStoreToStackSlot with list of FI accesses (Sander de Smalen, 2018-09-03, 1 file, -4/+10)

  For instructions that spill/fill to and from multiple frame-indices in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot should return an array of accesses, rather than just the first encounter of such an access.

  This better describes FI accesses for AArch64 (paired) LDP/STP instructions.

  Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB

  Reviewed By: MatzeB

  Differential Revision: https://reviews.llvm.org/D51537

  llvm-svn: 341301
* Make TargetInstrInfo::isCopyInstr return true for regular COPY-instructions (Alexander Ivchenko, 2018-08-30, 1 file, -3/+3)

  Move all target-dependent checks into the new isCopyInstrImpl method. This change allows us to treat MoveReg-type instructions and the generic COPY instruction in the same way.

  Differential Revision: https://reviews.llvm.org/D49913

  llvm-svn: 341072
* [MinGW] [X86] Add stubs for references to data variables that might end up imported from a dll (Martin Storsjo, 2018-08-29, 1 file, -1/+2)

  Variables declared with the dllimport attribute are accessed via a stub variable named __imp_<var>. In MinGW configurations, variables that aren't declared with a dllimport attribute might still end up imported from another DLL with runtime pseudo relocs.

  For x86_64, this avoids the risk that the target is out of range for a 32 bit PC relative reference, in case the target DLL is loaded further than 4 GB from the reference. It also avoids having to make the text section writable at runtime when doing the runtime fixups, which makes it worthwhile to do for i386 as well.

  Add stub variables for all dso local data references where a definition of the variable isn't visible within the module, since the DLL data autoimporting might make them imported even though they are marked as dso local within LLVM.

  Don't do this for variables that actually are defined within the same module, since we then know for sure that it actually is dso local.

  Don't do this for references to functions, since there's no need for runtime pseudo relocations for autoimporting them; if a function from a different DLL is called without the appropriate dllimport attribute, the call just gets routed via a thunk instead.

  GCC does something similar since 4.9 (when compiling with -mcmodel=medium or large; from that version, medium is the default code model for x86_64 mingw), but only for x86_64.

  Differential Revision: https://reviews.llvm.org/D51288

  llvm-svn: 340942
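  As a rough sketch of the indirection these stubs provide (some_gv is a hypothetical variable name; the __imp_ spelling is the dllimport-style stub mentioned above, and the stubs added for potential autoimports work the same way):

      # direct reference: needs a runtime pseudo reloc if some_gv ends up imported
      movl some_gv(%rip), %eax

      # via a stub pointer: the loader only has to patch the pointer held by the stub
      movq __imp_some_gv(%rip), %rax
      movl (%rax), %eax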
* [MI] Change the array of `MachineMemOperand` pointers to be a generically extensible collection of extra info attached to a `MachineInstr` (Chandler Carruth, 2018-08-16, 1 file, -25/+18)

  The primary change here is cleaning up the APIs used for setting and manipulating the `MachineMemOperand` pointer arrays so that we can change how they are allocated.

  Then we introduce an extra info object that uses the trailing object pattern to attach some number of MMOs but also other extra info. The design of this is specifically so that this extra info has a fixed necessary cost (the header tracking what extra info is included) and everything else can be tail allocated. This pattern works especially well with a `BumpPtrAllocator` which we use here.

  I've also added the basic scaffolding for putting interesting pointers into this, namely pre- and post-instruction symbols. These aren't used anywhere yet; they're just there to ensure I've actually gotten the data structure types correct. I'll flesh out support for these in a subsequent patch (MIR dumping, parsing, the works).

  Finally, I've included an optimization where we store any single pointer inline in the `MachineInstr` to avoid the allocation overhead. This is expected to be the overwhelmingly most common case and so should avoid any memory usage growth due to slightly less clever / dense allocation when dealing with >1 MMO. This did require several ergonomic improvements to the `PointerSumType` to reasonably support the various usage models.

  This also has a side effect of freeing up 8 bits within the `MachineInstr` which could be repurposed for something else.

  The suggested direction here came largely from Hal Finkel. I hope it was worth it. ;] It does hopefully clear a path for subsequent extensions w/o nearly as much leg work. Lots of thanks to Reid and Justin for careful reviews and ideas about how to do all of this.

  Differential Revision: https://reviews.llvm.org/D50701

  llvm-svn: 339940
* [SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array (Chandler Carruth, 2018-08-14, 1 file, -18/+56)

  Prior to this change, the memory management was deeply confusing here -- the way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`.

  This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well.

  This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer.

  As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different, which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be *surprised* at where that occurs. =]

  Differential Revision: https://reviews.llvm.org/D50680

  llvm-svn: 339740
* [X86] Change the MOV32ri64 pseudo instruction to def a GR64 directly instead of wrapping it in a SUBREG_TO_REG (Craig Topper, 2018-08-11, 1 file, -1/+6)

  Now we switch to the subregister in expandPostRAPseudos, where we already switched the opcode.

  This simplifies a few isel patterns that used the pseudo directly. It also magically seems to have improved our ability to CSE it in the undef-label.ll test.

  llvm-svn: 339496
* [X86] Allow fake unary unpckhpd and movhlps to be commuted for execution domain fixing purposes (Craig Topper, 2018-08-02, 1 file, -0/+27)

  These instructions perform the same operation, but the semantic of which operand is destroyed is reversed. If the same register is used as both operands we can change the execution domain without worrying about this difference.

  Unfortunately, this really only works in cases where the input register is killed by the instruction. If it's not killed, the two-address instruction pass inserts a copy that will become a move instruction. This makes the instruction use different physical registers that contain the same data at the time the unpck/movhlps executes.

  I've considered using a unary pseudo instruction with a tied operand to trick the two-address instruction pass. We could then expand the pseudo post regalloc to get the same physical register on both inputs.

  Differential Revision: https://reviews.llvm.org/D50157

  llvm-svn: 338735
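  As a small illustration (not from the commit) of why the same-register case is safe: with xmm0 as both operands, each of these leaves the original high double of xmm0 in both 64-bit lanes, so only the execution domain differs:

      unpckhpd %xmm0, %xmm0    # double-precision domain
      movhlps  %xmm0, %xmm0    # single-precision domain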
* [MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction (Francis Visoiu Mistrih, 2018-07-30, 1 file, -1/+1)

  The machine verifier asserts with:

      Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB,
      file ../include/llvm/CodeGen/MachineOperand.h, line 542.

  It calls analyzeBranch which tries to call getMBB if the opcode is JMP_1, but in this case we do:

      JMP_1 @OUTLINED_FUNCTION

  I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used with brtarget8.

  Differential Revision: https://reviews.llvm.org/D49299

  llvm-svn: 338237
* [MachineOutliner][NFC] Move target frame info into OutlinedFunction (Jessica Paquette, 2018-07-24, 1 file, -7/+8)

  Just some gardening here.

  Similar to how we moved call information into Candidates, this moves outlined frame information into OutlinedFunction. This allows us to remove TargetCostInfo entirely.

  Anywhere where we returned a TargetCostInfo struct, we now return an OutlinedFunction. This establishes OutlinedFunctions as more of a general repeated sequence, and Candidates as occurrences of those repeated sequences.

  llvm-svn: 337848
* [x86] Teach the x86 backend that it can fold between TCRETURNm* and TCRETURNr* and fix latent bugs with register class updates (Chandler Carruth, 2018-07-24, 1 file, -0/+31)

  Summary:
  Enabling this fully exposes a latent bug in the instruction folding: we never update the register constraints for the register operands when fusing a load into another operation. The fused form could, in theory, have different register constraints on its operands. And in fact, TCRETURNm* needs its memory operands to use tailcall compatible registers.

  I've updated the folding code to re-constrain all the registers after they are mapped onto their new instruction.

  However, we still can't enable folding in the general case from TCRETURNr* to TCRETURNm* because doing so may require more registers to be available during the tail call. If the call itself uses all but one register, and the folded load would require both a base and index register, there will not be enough registers to allocate the tail call.

  It would be better, IMO, to teach the register allocator to *unfold* TCRETURNm* when it runs out of registers (or specifically check the number of registers available during the TCRETURNr*) but I'm not going to try and solve that for now. Instead, I've just blocked the forward folding from r -> m, leaving LLVM free to unfold from m -> r as that doesn't introduce new register pressure constraints.

  The down side is that I don't have anything that will directly exercise this. Instead, I will be immediately using this in my SLH patch. =/ Still worse, without allowing the TCRETURNr* -> TCRETURNm* fold, I don't have any tests that demonstrate the failure to update the memory operand register constraints. This patch still seems correct, but I'm nervous about the degree of testing due to this. Suggestions?

  Reviewers: craig.topper

  Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits

  Differential Revision: https://reviews.llvm.org/D49717

  llvm-svn: 337845
* [MachineOutliner][NFC] Make Candidates own their call information (Jessica Paquette, 2018-07-24, 1 file, -23/+28)

  Before this, TCI contained all the call information for each Candidate.

  This moves that information onto the Candidates. As a result, each Candidate can now supply how it ought to be called. Thus, Candidates will be able to, say, call the same function in cheaper ways when possible. This also removes that information from TCI, since it's no longer used there.

  A follow-up patch for the AArch64 outliner will demonstrate this.

  llvm-svn: 337840
* Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" (Reid Kleckner, 2018-07-23, 1 file, -16/+44)

  Don't try to generate large PIC code for non-ELF targets. Neither COFF nor MachO have relocations for large position independent code, and users have been using "large PIC" code models to JIT 64-bit code for a while now. With this change, if they are generating ELF code, their JITed code will truly be PIC, but if they target MachO or COFF, it will contain 64-bit immediates that directly reference external symbols. For a JIT, that's perfectly fine.

  llvm-svn: 337740
* [X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address (Craig Topper, 2018-07-18, 1 file, -4/+16)

  This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable.

  llvm-svn: 337357
* [X86] Add custom execution domain fixing for 128/256-bit integer logic operations with AVX512F, but not AVX512DQ (Craig Topper, 2018-07-15, 1 file, -0/+85)

  AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions.

  Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences.

  This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used.

  llvm-svn: 337137
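  As a rough sketch (not taken from the patch) of the kind of rewrite this enables when only xmm0-xmm15 are involved:

      vpandq %xmm1, %xmm0, %xmm0    # EVEX-encoded, integer domain (AVX512F)
      vandps %xmm1, %xmm0, %xmm0    # VEX-encoded, FP domain, same bitwise result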
* [X86] Fix a subtle bug in the custom execution domain fixing for blends (Craig Topper, 2018-07-14, 1 file, -2/+2)

  The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted.

  Instead use getNumOperands() from the instruction description, which will only count the operands that are defined in the td file.

  llvm-svn: 337088
* [X86] The TEST instruction is eliminated when BSF/TZCNT is used (Craig Topper, 2018-07-11, 1 file, -0/+7)

  Summary:
  These changes cover PR31399. Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0 and it corresponds to the following llvm code:

      %cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
      %tobool = icmp eq i32 %v, 0
      %.op = add nuw nsw i32 %cnt, 1
      %add = select i1 %tobool, i32 0, i32 %.op

  and x86 asm code:

      bsfl     %edi, %ecx
      addl     $1, %ecx
      testl    %edi, %edi
      movl     $0, %eax
      cmovnel  %ecx, %eax

  In this case the 'test' instruction can't be eliminated because the 'add' instruction modifies the EFLAGS, namely, the ZF flag that is set by the 'bsf' instruction when 'x' is zero.

  We now produce the following code:

      bsfl     %edi, %ecx
      movl     $-1, %eax
      cmovnel  %ecx, %eax
      addl     $1, %eax

  Patch by Ivan Kulagin

  Reviewers: davide, craig.topper, spatel, RKSimon

  Reviewed By: craig.topper

  Subscribers: llvm-commits

  Differential Revision: https://reviews.llvm.org/D48765

  llvm-svn: 336768
* [X86] Teach X86InstrInfo::commuteInstructionImpl to use MOVSD/MOVSS for BLEND under optsize when the immediate allows it (Craig Topper, 2018-07-10, 1 file, -1/+21)

  Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend.

  Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend.

  llvm-svn: 336731
* [MachineOutliner] Fix typo in getOutliningCandidateInfo function name (Yvan Roux, 2018-07-04, 1 file, -1/+1)

  getOutlininingCandidateInfo -> getOutliningCandidateInfo

  Differential Revision: https://reviews.llvm.org/D48867

  llvm-svn: 336285
* [X86] Remove FMA3Info DenseMap. Break into sorted tables that we can binary search (Craig Topper, 2018-07-02, 1 file, -2/+4)

  I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier.

  Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but it's probably not worth it right now.

  llvm-svn: 336075
* [X86] Remove the places that return nullptr from X86InstrInfo::commuteInstructionImpl (Craig Topper, 2018-07-01, 1 file, -44/+10)

  findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl.

  llvm-svn: 336070
* [X86] Move the memory unfolding table creation into its own class and make it a ManagedStatic (Craig Topper, 2018-07-01, 1 file, -5473/+58)

  Also move the static folding tables, their search functions and the new class into new cpp/h files.

  The unfolding table is effectively static data. It's just a different ordering and a subset of the static folding tables. By putting it in a separate ManagedStatic we ensure we only have one copy instead of one per X86InstrInfo object. This way also makes it only get initialized when really needed.

  llvm-svn: 336056
* [X86] Move the X86InstrFMA3Info class into the cpp file. Expose only a getFMA3Group free function. NFCI (Craig Topper, 2018-06-30, 1 file, -4/+2)

  The class only exists to hold a DenseMap and is only created as a ManagedStatic. It used to expose a single static method that outside code was expected to use.

  This patch moves that static function out of the class and moves its implementation into the cpp file. It can now access the ManagedStatic directly by name without the need for the other static method that accessed the ManagedStatic.

  llvm-svn: 336055
* [X86] Use a std::vector for the memory unfolding table (Craig Topper, 2018-06-29, 1 file, -23/+36)

  Previously we used a DenseMap which is costly to set up due to multiple full table rehashes as the size increases and causes the table to be reallocated.

  This patch changes the table to a vector of structs. We now walk the reg->mem tables and push new entries in the mem->reg table for each row not marked TB_NO_REVERSE. Once all the table entries have been created, we sort the vector. Then we can use a binary search for lookups.

  Differential Revision: https://reviews.llvm.org/D48585

  llvm-svn: 335994
* Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC ↵Jonas Devlieghere2018-06-281-44/+16
| | | | | | | | | | | | | code models"" Reverting because this is causing failures in the LLDB test suite on GreenDragon. LLVM ERROR: unsupported relocation with subtraction expression, symbol '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction expression llvm-svn: 335894
* Unify sorted asserts to use the existing atomic pattern (Benjamin Kramer, 2018-06-28, 1 file, -3/+4)

  These are all benign races and only visible in !NDEBUG. tsan complains about it, but a simple atomic bool is sufficient to make it happy.

  llvm-svn: 335823
* [X86] Make folding table checking threadsafe (Benjamin Kramer, 2018-06-27, 1 file, -4/+3)

  This is a benign race, but tsan likes to complain about it. Just make it happy.

  llvm-svn: 335788
* [X86] Don't store register and memory FMA3 opcodes in the same X86InstrFMA3Group (Craig Topper, 2018-06-27, 1 file, -9/+3)

  Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group.

  The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed.

  llvm-svn: 335694
* [X86] Add comment about the sorting of the memory folding tables added in r335501 (Craig Topper, 2018-06-25, 1 file, -0/+15)

  llvm-svn: 335517
* Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" (Reid Kleckner, 2018-06-25, 1 file, -16/+44)

  The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it:

      .LtmpN:
        leaq .LtmpN(%rip), %rbx
        movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax  # Scratch
        addq %rax, %rbx                              # GOT base reg

  From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this:

      movq extern_gv@GOT(%rbx), %rax
      movq local_gv@GOTOFF(%rbx), %rax

  All calls end up being indirect:

      movabsq $local_fn@GOTOFF, %rax
      addq %rbx, %rax
      callq *%rax

  The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy:

      leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx  # GOT base reg

  DSO local data accesses will use it:

      movq local_gv@GOTOFF(%rbx), %rax

  Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base:

      movq extern_gv@GOTPCREL(%rip), %rax

  Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions.

  This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented.

  I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC code model is not implemented for MachO yet.

  Differential Revision: https://reviews.llvm.org/D47211

  llvm-svn: 335508
* [X86] Sort the static memory folding tables by reg opcode. Remove the reg->mem DenseMaps in favor of binary search (Craig Topper, 2018-06-25, 1 file, -5405/+5327)

  With the static tables sorted we can binary search them directly for reg->mem lookups. This removes 6 DenseMaps that had to be created when X86InstrInfo is constructed.

  We still have one Mem->Reg DenseMap for the reverse direction. This is created just as before by walking the reg->mem arrays to populate it.

  Differential Revision: https://reviews.llvm.org/D48527

  llvm-svn: 335501
* [X86] Block commuting operand 1 of FMA*_Int instructions in findThreeSrcCommutedOpIndices. Remove uncommutable returns from getThreeSrcCommuteCase/getFMA3OpcodeToCommuteOperands (Craig Topper, 2018-06-25, 1 file, -59/+43)

  We should be blocking the operand while we are in the routine that tries to find commutable operand indices. Doing it later means we might have missed out on another valid set of operands we could have commuted.

  The intrinsic case was the only case that could really prevent commuting in getFMA3OpcodeToCommuteOperands. All the other cases in getThreeSrcCommuteCase were not reachable conditions as they were protected by findThreeSrcCommutedOpIndices.

  With that abort case pushed earlier, we can remove all the abort checks and replace them with asserts.

  llvm-svn: 335446
* [X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions (Craig Topper, 2018-06-24, 1 file, -4/+4)

  llvm-svn: 335428
* Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models" (Reid Kleckner, 2018-06-21, 1 file, -44/+16)

  MCJIT can't handle R_X86_64_GOT64 yet.

  llvm-svn: 335300
* [X86] Commit some comments that weren't in the medium code model patch (Reid Kleckner, 2018-06-21, 1 file, -4/+4)

  llvm-svn: 335298
* [X86] Implement more of x86-64 large and medium PIC code models (Reid Kleckner, 2018-06-21, 1 file, -14/+42)

  Summary:
  The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it:

      .LtmpN:
        leaq .LtmpN(%rip), %rbx
        movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax  # Scratch
        addq %rax, %rbx                              # GOT base reg

  From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this:

      movq extern_gv@GOT(%rbx), %rax
      movq local_gv@GOTOFF(%rbx), %rax

  All calls end up being indirect:

      movabsq $local_fn@GOTOFF, %rax
      addq %rbx, %rax
      callq *%rax

  The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy:

      leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx  # GOT base reg

  DSO local data accesses will use it:

      movq local_gv@GOTOFF(%rbx), %rax

  Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base:

      movq extern_gv@GOTPCREL(%rip), %rax

  Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions.

  This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented.

  Reviewers: chandlerc, echristo

  Subscribers: hiraditya, llvm-commits

  Differential Revision: https://reviews.llvm.org/D47211

  llvm-svn: 335297
* [X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel (Craig Topper, 2018-06-20, 1 file, -0/+17)

  I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior, just using a different encoding for the condition code.

  I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.

  Differential Revision: https://reviews.llvm.org/D43608

  llvm-svn: 335173
* [MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue (Jessica Paquette, 2018-06-19, 1 file, -6/+1)

  insertOutlinerPrologue was not used by any target, and prologue-esque code was beginning to appear in insertOutlinerEpilogue. Refactor that into one function, buildOutlinedFrame.

  This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.

  llvm-svn: 335076
* [X86] Add all the FMA instructions directly to the load folding table instead of proxying through X86InstrFMA3Info (Craig Topper, 2018-06-17, 1 file, -30/+544)

  This increases the size of the static tables, but is closer to what we would get if we used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.

  llvm-svn: 334915
* [X86] More additions to the load folding tables based on the autogenerated tables (Craig Topper, 2018-06-16, 1 file, -4/+777)

  Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.

  llvm-svn: 334898
* [X86] Add more instructions to the hasUndefRegUpdate list (Craig Topper, 2018-06-15, 1 file, -0/+31)

  Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we won't be surprised in the future.

  llvm-svn: 334867
* [X86] Prevent folding stack reloads into instructions in hasUndefRegUpdate (Craig Topper, 2018-06-15, 1 file, -4/+10)

  An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes an Undef flag on the input register, so we need to check for that case too.

  llvm-svn: 334848
* Revert r334802 "[X86] Prevent folding stack reloads with instructions that have an undefined register update." (Craig Topper, 2018-06-15, 1 file, -7/+4)

  There's a typo causing the build to fail.

  llvm-svn: 334803
* [X86] Prevent folding stack reloads with instructions that have an undefined register update (Craig Topper, 2018-06-15, 1 file, -4/+7)

  We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.

  llvm-svn: 334802
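  As a rough sketch (not from the commit, hypothetical stack slot) of the false dependency being avoided: an instruction like vcvtss2sd passes the upper bits of its first source through to the destination, so keeping the reload as a separate load lets both sources be the same freshly written register:

      vmovss    16(%rsp), %xmm0
      vcvtss2sd %xmm0, %xmm0, %xmm0     # no dependence on xmm0's stale contents

  whereas the folded form keeps a pass-through register operand and with it a false dependency:

      vcvtss2sd 16(%rsp), %xmm0, %xmm0  # still depends on the old value of xmm0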