path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* AMDGPU: Select branch on undef to uniform scc branch | Matt Arsenault | 2016-12-15 | 3 | -0/+21
| | | | llvm-svn: 289877
* AMDGPU: Fix asserting on returned tail calls | Matt Arsenault | 2016-12-15 | 1 | -2/+4
| | | | llvm-svn: 289868
* AMDGPU: Assembler support for vintrp instructions | Matt Arsenault | 2016-12-15 | 3 | -6/+108
| | | | llvm-svn: 289866
* [GlobalISel] Drop workaround for Legalizer member/class sharing a name. NFC. | Ahmed Bougacha | 2016-12-15 | 3 | -3/+3
| | | | | | | | MachineLegalizer used to be the name of both the class and the member, causing GCC errors. r276522 fixed that by renaming the member to just 'Legalizer'. The 'class' workaround isn't necessary anymore; drop it. llvm-svn: 289848
* [x86] use a single shufps for 256-bit vectors when it can save instructions | Sanjay Patel | 2016-12-15 | 1 | -1/+13
| | | | | | | | | | | This is the 256-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289846
* [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature | Matthew Simpson | 2016-12-15 | 1 | -1/+2
| | | | | | | | | This patch checks that the SlowMisaligned128Store subtarget feature is set when penalizing such stores in getMemoryOpCost. Differential Revision: https://reviews.llvm.org/D27677 llvm-svn: 289845
* [AArch64][GlobalISel] Remove redundant RBI comments. NFC. | Ahmed Bougacha | 2016-12-15 | 1 | -20/+1
| | | | | | | It's brittle, and Doxygen already picks the overridden method's comment anyway. llvm-svn: 289844
* [x86] use a single shufps when it can save instructions | Sanjay Patel | 2016-12-15 | 1 | -14/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 My motivating case looks like this: - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2] - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3] - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7] + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win. So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining. Differential Revision: https://reviews.llvm.org/D27692 llvm-svn: 289837
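The motivating case above can be checked with a small lane-level model. This is only a sketch: the helper names (`pshufd`, `pblendw`, `shufps`) are illustrative Python functions written for this note, not LLVM or intrinsic APIs, and each vector is modelled as four 32-bit lanes.

```python
def to_words(dwords):
    """Split four 32-bit lanes into eight 16-bit words (low word first)."""
    words = []
    for d in dwords:
        words += [d & 0xFFFF, (d >> 16) & 0xFFFF]
    return words


def to_dwords(words):
    """Rebuild four 32-bit lanes from eight 16-bit words."""
    return [words[i] | (words[i + 1] << 16) for i in range(0, 8, 2)]


def pshufd(src, idx):
    """Model of a 32-bit lane shuffle: result[i] = src[idx[i]]."""
    return [src[i] for i in idx]


def pblendw(a, b, words_from_b):
    """Model of a 16-bit word blend: take the listed word positions from b."""
    wa, wb = to_words(a), to_words(b)
    return to_dwords([wb[i] if i in words_from_b else wa[i] for i in range(8)])


def shufps(a, b, idx_a, idx_b):
    """Model of shufps: two lanes picked from a, then two lanes picked from b."""
    return [a[idx_a[0]], a[idx_a[1]], b[idx_b[0]], b[idx_b[1]]]


def three_instruction_sequence(xmm0, xmm1):
    """The old codegen: two vpshufd followed by one vpblendw."""
    xmm1 = pshufd(xmm1, [0, 1, 0, 2])
    xmm0 = pshufd(xmm0, [0, 2, 2, 3])
    return pblendw(xmm0, xmm1, {4, 5, 6, 7})


def single_shufps(xmm0, xmm1):
    """The new codegen: xmm0 = xmm0[0,2],xmm1[0,2]."""
    return shufps(xmm0, xmm1, (0, 2), (0, 2))
```

Both functions produce lanes `[a0, a2, b0, b2]`, which is why the three-instruction sequence can be replaced by one `vshufps`.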
* [X86][SSE] Fix domains for scalar store instructions | Simon Pilgrim | 2016-12-15 | 1 | -0/+4
| | | | | | As discussed on D27692 llvm-svn: 289834
* [lanai] Simplify small section check in LowerGlobalAddress and treat ldata sections specially. | Jacques Pienaar | 2016-12-15 | 2 | -3/+14
| | | | | | | | Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation). llvm-svn: 289832
* [X86][AVX512] Moved instruction domain lookups to the right table. NFCI. | Simon Pilgrim | 2016-12-15 | 1 | -4/+4
| | | | | | Avoid duplicating instructions in the int32/int64 domains. llvm-svn: 289830
* [X86][SSE] Fix domains for VZEXT_LOAD type instructions | Simon Pilgrim | 2016-12-15 | 1 | -0/+6
| | | | | | | | Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825
* Fix for regression after Global Load Scalarization patch | Alexander Timofeev | 2016-12-15 | 1 | -1/+2
| | | | llvm-svn: 289822
* Extract LaneBitmask into a separate type | Krzysztof Parzyszek | 2016-12-15 | 6 | -25/+26
| | | | | | | | | | | | Specifically avoid implicit conversions from/to integral types to avoid potential errors when changing the underlying type. For example, a typical initialization of a "full" mask was "LaneMask = ~0u", which would result in a value of 0x00000000FFFFFFFF if the type was extended to uint64_t. Differential Revision: https://reviews.llvm.org/D27454 llvm-svn: 289820
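The widening hazard described above can be illustrated outside C++. The following is a rough sketch in Python: `widen_u32_to_u64` mimics the zero-extension an implicit integral conversion would perform, and `LaneMask` is a hypothetical wrapper invented for this note (it is not the LLVM `LaneBitmask` class) that refuses to compare equal to plain integers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def widen_u32_to_u64(value32):
    """Zero-extend a 32-bit unsigned value into a 64-bit one."""
    return value32 & MASK32


class LaneMask:
    """Hypothetical mask type with no implicit conversion from integers."""

    BITS = 64  # assumed underlying width after the type is extended

    def __init__(self, bits):
        self._bits = bits & ((1 << self.BITS) - 1)

    @classmethod
    def get_all(cls):
        """Build the full mask from the type's own width, not from ~0u."""
        return cls((1 << cls.BITS) - 1)

    def as_integer(self):
        """Explicit, visible conversion back to an integer."""
        return self._bits

    def __eq__(self, other):
        if not isinstance(other, LaneMask):
            return NotImplemented  # never silently compare with a raw int
        return self._bits == other._bits

    def __hash__(self):
        return hash(self._bits)


# "LaneMask = ~0u" with a 64-bit underlying type only sets the low half.
old_style_full = widen_u32_to_u64(~0 & MASK32)
new_style_full = LaneMask.get_all().as_integer()
```

Here `old_style_full` is `0x00000000FFFFFFFF` while `new_style_full` covers all 64 bits, which is the mismatch the commit message warns about.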
* [CostModel][X86] Updated reverse shuffle costs | Simon Pilgrim | 2016-12-15 | 1 | -5/+95
| | | | llvm-svn: 289819
* [Power9] Allow AnyExt immediates for XXSPLTIB | Nemanja Ivanovic | 2016-12-15 | 2 | -7/+7
| | | | | | | | | | In some situations, the BUILD_VECTOR node that builds a v16i8 vector by a splat of an i8 constant will end up with signed 8-bit values, and in other situations it will end up with unsigned ones. Handle both situations. Fixes PR31340. llvm-svn: 289804
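A minimal sketch of what "handle both situations" can mean for a byte splat, assuming the immediate may arrive either sign-extended or zero-extended. The function name `splat_byte_pattern` is hypothetical and is not taken from the PowerPC backend.

```python
def splat_byte_pattern(imm):
    """Return the 8-bit pattern for a splat immediate, or None if it cannot
    be represented in one byte under either signed or unsigned reading."""
    if -128 <= imm <= 255:
        return imm & 0xFF
    return None
```

Under this reading, `-1` and `255` describe the same byte, so either form of the constant selects the same splat.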
* [AVR] Support floats in the instrumentation pass | Dylan McKay | 2016-12-15 | 1 | -14/+14
| | | | | | This also refactors some common code into the 'GetTypeName' method. llvm-svn: 289803
* [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently | Sjoerd Meijer | 2016-12-15 | 2 | -4/+141
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is essentially a recommit of r285893, but with a correctness fix. The problem of the original commit was that this: bic r5, r7, #31 cbz r5, .LBB2_10 got rewritten into: lsrs r5, r7, #5 beq .LBB2_10 The result in destination register r5 is not the same and this is incorrect when r5 is not dead. So this fix includes checking the uses of the AND destination register. And also, compared to the original commit, some regression tests didn't need changing anymore because of this extra check. For completeness, this was the original commit message: For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. Differential Revision: https://reviews.llvm.org/D27761 llvm-svn: 289794
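The four cases in the original commit message can be modelled on 32-bit integers. This is a sketch under stated assumptions: the function names are made up for this note, `x` is assumed to be an unsigned 32-bit value, and the model only checks the zero/non-zero (or sign) result of the shifts, not real Thumb flag semantics or the register-liveness check added by the fix.

```python
M32 = 0xFFFFFFFF


def clz32(x):
    """Count leading zeros of a non-zero 32-bit value."""
    return 32 - x.bit_length()


def ctz32(x):
    """Count trailing zeros of a non-zero 32-bit value."""
    return (x & -x).bit_length() - 1


def popcount32(x):
    return bin(x).count("1")


def is_contiguous(bm):
    """One consecutive run of set bits: 32 - clz(bm) - ctz(bm) == popcount(bm)."""
    return bm != 0 and 32 - clz32(bm) - ctz32(bm) == popcount32(bm)


def classify(bm):
    """Return the case number (1-4) from the commit message, or None."""
    if not is_contiguous(bm):
        return None
    if bm & 1:
        return 1  # mask touches the LSB: one LSLS
    if bm & 0x80000000:
        return 2  # mask touches the MSB: one LSRS
    if popcount32(bm) == 1:
        return 3  # single bit: LSLS into the sign bit, test MI/PL
    return 4      # otherwise: LSLS followed by LSRS


def and_is_zero(x, bm):
    """Compute (x & bm) == 0 using only the shifts each case would emit."""
    case = classify(bm)
    if case is None:
        raise ValueError("mask is not one contiguous run of set bits")
    lead, trail = clz32(bm), ctz32(bm)
    if case == 1:
        return ((x << lead) & M32) == 0
    if case == 2:
        return (x >> trail) == 0
    if case == 3:
        return (((x << lead) & M32) >> 31) == 0  # sign bit clear (PL)
    return (((x << lead) & M32) >> (lead + trail)) == 0
```

In every case the shifted result is zero (or has a clear sign bit) exactly when `x & bm` is zero, so the AND and the compare against zero are not needed to set the condition.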
* [AVR] Add argument indices to the instrumentation hook functions | Dylan McKay | 2016-12-15 | 1 | -2/+4
| | | | | | | This allows the instrumentation hook functions to do better pretty-printing. llvm-svn: 289793
* Fix for build warning in execute-only support | Prakhar Bahuguna | 2016-12-15 | 1 | -2/+2
| | | | llvm-svn: 289788
* [ARM] Implement execute-only support in CodeGen | Prakhar Bahuguna | 2016-12-15 | 13 | -17/+113
| | | | | | | | | | | | | | | | | | | | This implements execute-only support for ARM code generation, which prevents the compiler from generating data accesses to code sections. The following changes are involved: * Add the CodeGen option "-arm-execute-only" to the ARM code generator. * Add the clang flag "-mexecute-only" as well as the GCC-compatible alias "-mpure-code" to enable this option. * When enabled, literal pools are replaced with MOVW/MOVT instructions, with VMOV used in addition for floating-point literals. As the MOVT instruction is required, execute-only support is only available in Thumb mode for targets supporting ARMv8-M baseline or Thumb2. * Jump tables are placed in data sections when in execute-only mode. * The execute-only text section is assigned section ID 0, and is marked as unreadable with the SHF_ARM_PURECODE flag with symbol 'y'. This also overrides selection of ELF sections for globals. llvm-svn: 289784
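Replacing a literal-pool load with MOVW/MOVT amounts to building a 32-bit constant from two 16-bit immediates, so no data has to be read from the code section. The sketch below only shows that arithmetic; the function names are hypothetical and nothing here reflects how the ARM backend actually emits the instructions.

```python
def movw_movt_halves(value):
    """Split a 32-bit constant into (movw_imm16, movt_imm16)."""
    value &= 0xFFFFFFFF
    return value & 0xFFFF, value >> 16


def materialize(movw_imm16, movt_imm16):
    """Model the register after MOVW (writes the low half, clears the high
    half) followed by MOVT (writes the high half, keeps the low half)."""
    reg = movw_imm16 & 0xFFFF            # MOVW
    reg = (movt_imm16 << 16) | reg       # MOVT
    return reg & 0xFFFFFFFF
```

For example, `0x12345678` splits into a MOVW immediate of `0x5678` and a MOVT immediate of `0x1234`.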
* [NVPTX] Remove dead #defines from NVPTXUtilities.h. | Justin Lebar | 2016-12-15 | 1 | -3/+0
| | | | llvm-svn: 289747
* Use PIC relocation model as default for PowerPC64 ELF. | Joerg Sonnenberger | 2016-12-15 | 1 | -0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the PowerPC64 code generation for the ELF ABI is already PIC. There are four main exceptions: (1) Constant pointer arrays etc. should be in writeable sections. (2) The TOC restoration NOP after a call is needed for all global symbols. While GNU ld has a workaround for questionable GCC self-calls, we trigger the checks for calls from COMDAT sections as they cross input sections and are therefore not considered self-calls. The current decision is questionable and suboptimal, but outside the scope of the change. (3) TLS access can not use the initial-exec model. (4) Jump tables should use relative addresses. Note that the current encoding doesn't work for the large code model, but it is more compact than the default for any non-trivial jump table. Improving this is again beyond the scope of this change. At least (1) and (3) are assumptions made in target-independent code and introducing additional hooks is a bit messy. Testing with clang shows that a -fPIC binary is 600KB smaller than the corresponding -fno-pic build. Separate testing from improved jump table encodings would explain only about 100KB or so. The rest is expected to be a result of more aggressive immediate forming for -fno-pic, where the -fPIC binary just uses TOC entries. This change brings the LLVM output in line with the GCC output; other PPC64 compilers like XLC on AIX are known to produce PIC by default as well. The relocation model can still be provided explicitly, e.g. when using MCJIT. One test case for case (1) is included; other test cases with relocation-model-sensitive behavior are wired to static for now. They will be reviewed and adjusted separately. Differential Revision: https://reviews.llvm.org/D26566 llvm-svn: 289743
* [NVPTX] Remove dead code. | Justin Lebar | 2016-12-14 | 5 | -130/+0
| | | | | | | | | | | I've chosen to remove NVPTXInstrInfo::CanTailMerge but not NVPTXInstrInfo::isLoadInstr and isStoreInstr (which are also dead) because while the latter two are reasonably useful utilities, the former cannot be used safely: It relies on successful address space inference to identify writes to shared memory, but addrspace inference is a best-effort thing. llvm-svn: 289740
* [Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2016-12-14 | 5 | -284/+249
| | | | | | llvm-svn: 289736
* [NVPTX] Support .maxnreg annotation. | Justin Lebar | 2016-12-14 | 3 | -0/+9
| | | | | | | | | | Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D27638 llvm-svn: 289729
* [NVPTX] Remove string constants from NVPTXBaseInfo.h. | Justin Lebar | 2016-12-14 | 3 | -165/+88
| | | | | | | | | | | | | | | | | | Summary: Previously they were defined as a 2D char array in a header file. This is kind of overkill -- we can let the linker lay out these strings however it pleases. While we're at it, we might as well just inline these constants where they're used, as each of them is used only once. Also move NVPTXUtilities.{h,cpp} into namespace llvm. Reviewers: tra Subscribers: jholewinski, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D27636 llvm-svn: 289728
* [ARM] Split 128-bit vectors in BUILD_VECTOR lowering | Eli Friedman | 2016-12-14 | 1 | -0/+21
| | | | | | | | | | | | | Given that INSERT_VECTOR_ELT operates on D registers anyway, combining 64-bit vectors into a 128-bit vector is basically free. Therefore, try to split BUILD_VECTOR nodes before giving up and lowering them to a series of INSERT_VECTOR_ELT instructions. Sometimes this allows dramatically better lowerings; see testcases for examples. Inspired by similar code in the x86 backend for AVX. Differential Revision: https://reviews.llvm.org/D27624 llvm-svn: 289706
* fix gcc warning about a superfluous ; | Nico Weber | 2016-12-14 | 1 | -1/+1
| | | | llvm-svn: 289705
* [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently. | Eli Friedman | 2016-12-14 | 3 | -19/+92
| | | | | | | | | | | | | | | | Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up. Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable). Differential Revision: https://reviews.llvm.org/D27694 llvm-svn: 289703
* Fix build failure due to r289674 on certain systems | Yaxun Liu | 2016-12-14 | 1 | -1/+0
| | | | | | Removed a useless include which caused conflict. llvm-svn: 289700
* AMDGPU: Emit runtime metadata version 2 as YAML | Yaxun Liu | 2016-12-14 | 7 | -403/+550
| | | | | | Differential Revision: https://reviews.llvm.org/D25046 llvm-svn: 289674
* AMDGPU: Make AllocationPriority of SGPRs higher than VGPRs | Matt Arsenault | 2016-12-14 | 1 | -11/+13
| | | | | | | | Since SGPRs should spill to VGPRs, they should be allocated first. I don't think this is sufficient for SGPRs to always spill to VGPRs though. llvm-svn: 289671
* Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." | Nirav Dave | 2016-12-14 | 1 | -0/+10
| | | | | | | | Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667
* AMDGPU: Change vintrp printing | Matt Arsenault | 2016-12-14 | 4 | -6/+37
| | | | llvm-svn: 289664
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. | Nirav Dave | 2016-12-14 | 1 | -10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Retrying after removing load-store factoring through token factors in favor of improved token factor operand pruning. Simplify Consecutive Merge Store Candidate Search. Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.ll - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores. CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls. CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores. CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289659
* Fix bug 30945- [AVX512] Failure to flip vector comparison to remove not mask instruction | Michael Zuckerman | 2016-12-14 | 1 | -3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | Adds a new optimization opportunity by adding a new X86ISelLowering pattern. The test case was shown in https://llvm.org/bugs/show_bug.cgi?id=30945. Test explanation: Select gets three arguments: mask, op and op2. In this case, the mask is the result of an ICMP. That ICMP compares (for equality) the zero-initializer vector and the result of the first ICMP. In general, the result of "cmp eq, op1, zero initializer" is "not(op1)", where op1 is a mask. By swapping the two arguments inside the Select instruction, we can get the same result without the need for the middle phase ("cmp eq, op1, zero initializer"). Missed optimization opportunity: vpcmpled %zmm0, %zmm1, %k0 knotw %k0, %k1 can be combined into vpcmpgtd %zmm0, %zmm2, %k1. Reviewers: 1. delena 2. igorb. Committed after check-all. Differential Revision: https://reviews.llvm.org/D27160 llvm-svn: 289653
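The two identities this commit relies on can be checked element-wise. The following is a sketch with made-up helper names operating on Python lists; it models the semantics of select and of mask comparisons only, not the X86ISelLowering pattern itself.

```python
def select(mask, if_true, if_false):
    """Per-element select: pick if_true where the mask bit is set."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def cmp_eq_zero(mask):
    """Comparing a mask for equality with zero inverts it: not(mask)."""
    return [not m for m in mask]


def cmp_le(a, b):
    return [x <= y for x, y in zip(a, b)]


def cmp_gt(a, b):
    return [x > y for x, y in zip(a, b)]


def select_via_not(mask, a, b):
    """Original form: invert the mask first, then select."""
    return select(cmp_eq_zero(mask), a, b)


def select_swapped(mask, a, b):
    """Rewritten form: keep the mask and swap the select operands."""
    return select(mask, b, a)
```

`select_via_not` and `select_swapped` always agree, and inverting a less-or-equal mask gives the greater-than mask, which is why the separate mask-inversion step can be dropped.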
* Replace APFloatBase static fltSemantics data members with getter functions | Stephan Bergmann | 2016-12-14 | 9 | -34/+34
| | | | | | | | | | | | | At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647
* [AVR] Add a function instrumentation pass | Dylan McKay | 2016-12-14 | 4 | -0/+224
| | | | | | This will be used for an on-chip test suite. llvm-svn: 289641
* [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility) | Hal Finkel | 2016-12-14 | 1 | -40/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change aims to unify and correct our logic for when we need to allow for the possibility of the linker adding a TOC restoration instruction after a call. This comes up in two contexts: 1. When determining tail-call eligibility. If we make a tail call (i.e. directly branch to a function) then there is no place for the linker to add a TOC restoration. 2. When determining when we need to add a nop instruction after a call. Likewise, if there is a possibility that the linker might need to add a TOC restoration after a call, then we need to put a nop after the call (the bl instruction). First problem: We were using similar, but different, logic to decide (1) and (2). This is just wrong. Both the resideInSameModule function (used when determining tail-call eligibility) and the isLocalCall function (used when deciding if the post-call nop is needed) were supposed to be determining the same underlying fact (i.e. might a TOC restoration be needed after the call). The same logic should be used in both places. Second problem: The logic in both places was wrong. We only know that two functions will share the same TOC when both functions come from the same section of the same object. Otherwise the linker might cause the functions to use different TOC base addresses (unless the multi-TOC linker option is disabled, in which case only shared-library boundaries are relevant). There are a number of factors that can cause functions to be placed in different sections or come from different objects (-ffunction-sections, explicitly-specified section names, COMDAT, weak linkage, etc.). All of these need to be checked. The existing logic only checked properties of the callee, but the properties of the caller must also be checked (for example, calling from a function in a COMDAT section means calling between sections). 
There was a conceptual error in the resideInSameModule function in that it allowed tail calls to functions with weak linkage and protected/hidden visibility. While protected/hidden visibility does prevent the function implementation from being replaced at runtime (via interposition), it does not prevent the linker from using an alternate implementation at link time (i.e. using some strong definition to replace the provided weak one during linking). If this happens, then we're still potentially looking at a required TOC restoration upon return. Otherwise, in general, the post-call nop is needed wherever ELF interposition needs to be supported. We don't currently support ELF interposition at the IR level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html for more information), and I don't think we should try to make it appear to work in the backend in spite of that fact. This will yield subtle bugs if interposition is attempted. As a result, regardless of whether we're in PIC mode, we don't assume that we need to add the nop to support the possibility of ELF interposition. However, the necessary check is in place (i.e. calling GV->isInterposable and TM.shouldAssumeDSOLocal) so when we have functions for which interposition is allowed at the IR level, we'll add the nop as necessary. In the mean time, we'll generate more tail calls and fewer nops when compiling position-independent code. Differential Revision: https://reviews.llvm.org/D27231 llvm-svn: 289638
* Add support for Samsung Exynos M3 (NFC) | Evandro Menezes | 2016-12-13 | 2 | -1/+8
| | | | llvm-svn: 289613
* [Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2016-12-13 | 7 | -402/+359
| | | | | | llvm-svn: 289604
* Generalize strided store pattern in interleave access pass | Alina Sbirlea | 2016-12-13 | 2 | -13/+73
| | | | | | | | | | | | | | | | | Summary: This patch aims to generalize matching of the strided store accesses to more general masks. The more general rule is to have consecutive accesses based on the stride: [x, y, ... z, x+1, y+1, ...z+1, x+2, y+2, ...z+2, ...] All elements in the masks need not form a contiguous space, there may be gaps. As before, undefs are allowed and filled in with adjacent element loads. Reviewers: HaoLiu, mssimpso Subscribers: mkuper, delena, llvm-commits Differential Revision: https://reviews.llvm.org/D23646 llvm-svn: 289573
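One way to read the generalized rule is as a check on a shuffle mask: position `i * factor + j` must hold `start_j + i`, where each of the `factor` streams has its own start and the starts need not be adjacent. The sketch below encodes that reading; `is_strided_store_mask` and the `UNDEF = -1` convention are assumptions made for illustration and may differ from the checks in the actual pass.

```python
UNDEF = -1  # assumed marker for an undef mask element


def is_strided_store_mask(mask, factor):
    """Check that mask looks like [x, y, ..z, x+1, y+1, ..z+1, x+2, ...].

    Each of the `factor` interleaved streams must advance by one element
    per step; streams may start anywhere (gaps allowed), and undef
    elements are accepted in any position.
    """
    if factor < 2 or len(mask) == 0 or len(mask) % factor != 0:
        return False
    lanes = len(mask) // factor
    for j in range(factor):
        start = None
        for i in range(lanes):
            elem = mask[i * factor + j]
            if elem == UNDEF:
                continue
            if start is None:
                start = elem - i      # infer the stream start from the first defined element
            elif elem - i != start:
                return False          # stream does not advance by exactly one
        if start is not None and start < 0:
            return False              # an earlier undef slot would need a negative index
    return True
```

With this reading, `[0, 8, 1, 9, 2, 10, 3, 11]` is accepted for factor 2 even though elements 4 to 7 are skipped, while `[0, 4, 1, 6]` is rejected.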
* Revert "AArch64CollectLOH: Rewrite as block-local analysis." | Matthias Braun | 2016-12-13 | 1 | -279/+841
| | | | | | | | | | | | | This is not always behaving as expected as it turns out block live-in lists are only correct most of the time. Still waiting for reviews on https://reviews.llvm.org/D27559 to have them correct all of the time. See also http://llvm.org/PR31361, rdar://25117107 This reverts commit r288567. This reverts commit r288561. llvm-svn: 289570
* GlobalISel: fix GOT accesses on AArch64. | Tim Northover | 2016-12-13 | 1 | -2/+3
| | | | | | | | We were using the correct pseudo-instruction, but because the operand's flags weren't set correctly we still ended up emitting incorrect relocations during MC lowering. llvm-svn: 289566
* [mips] Fix comment to respect 80 chars per line; NFC | Simon Dardis | 2016-12-13 | 1 | -6/+6
| | | | llvm-svn: 289530
* [mips] Fix compact branch hazard detection | Simon Dardis | 2016-12-13 | 1 | -22/+42
| | | | | | | | | | | | | In certain cases it is possible for transient instructions such as %reg = IMPLICIT_DEF, as the single instruction in a basic block, to reach the MipsHazardSchedule pass. This patch teaches MipsHazardSchedule to properly look through such cases. Reviewers: vkalintiris, zoran.jovanovic Differential Revision: https://reviews.llvm.org/D27209 llvm-svn: 289529
* [GlobalISel] Move extendRegister where it belongs. NFCI | Diana Picus | 2016-12-13 | 1 | -28/+0
| | | | | | Apparently I missed this one when I moved ValueHandler back in r288658. Sorry! llvm-svn: 289528
* [AVR] Add a 'relax memory operation' pass | Dylan McKay | 2016-12-13 | 5 | -2/+157
| | | | | | | | | | | | | | | | | | | | | | Summary: This pass will be used to relax instructions which use out of bounds memory accesses to equivalent operations that can work with the addresses. The pass currently implements relaxation for the STDWPtrQRr instruction. Without this pass, an assertion error would be hit in the pseudo expansion pass. In the future, we will need to add more instructions to this pass. We can do that on a case-by-case basis. Reviewers: arsenm, kparzysz Subscribers: wdng, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D27650 llvm-svn: 289517
* [peephole] Enhance folding logic to work for STATEPOINTs | Philip Reames | 2016-12-13 | 1 | -18/+7
| | | | | | | | | | | | | | The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are: (1) support for folding multiple operands at once which reference the same load; (2) support for folding multiple loads into a single instruction; (3) walk all the operands of the instruction for variadic instructions (this is a bug fix!). Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here. Differential Revision: https://reviews.llvm.org/D24103 llvm-svn: 289510