path: root/llvm/lib
Commit message (Author, Age, Files, Lines)
...
* [FastISel] Copy the inline assembly dialect to the INLINEASM instruction. (Craig Topper, 2019-10-05, 1 file, -0/+1)
| Fixes PR43575.
| llvm-svn: 373836
* [X86][AVX] Push sign extensions of comparison bool results through bitops (PR42025) (Simon Pilgrim, 2019-10-05, 1 file, -6/+26)
| As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop.
| This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so it's just the original SETCC ops that get extended.
| Differential Revision: https://reviews.llvm.org/D68226
| llvm-svn: 373834
* [SLP] avoid reduction transform on patterns that the backend can load-combine (Sanjay Patel, 2019-10-05, 2 files, -3/+65)
| I don't see an ideal solution to these 2 related, potentially large, perf regressions:
|   https://bugs.llvm.org/show_bug.cgi?id=42708
|   https://bugs.llvm.org/show_bug.cgi?id=43146
| We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load combine opportunities requires that it recognizes patterns that could be combined later, but not do the optimization itself (it's not a vector combine anyway, so it's probably out-of-scope for SLP).
| Here, we add a scalar cost model adjustment with a conservative pattern match and cost summation for a multi-instruction sequence that can probably be reduced later. This should prevent SLP from creating a vector reduction unless that sequence is extremely cheap.
| In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like:
|   movbe rax, qword ptr [rdi]
| or:
|   mov rax, qword ptr [rdi]
| Not some (half) vector monstrosity as we currently do using SLP:
|   vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
|   vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
|   movzx eax, byte ptr [rdi]
|   movzx ecx, byte ptr [rdi + 5]
|   shl rcx, 40
|   movzx edx, byte ptr [rdi + 6]
|   shl rdx, 48
|   or rdx, rcx
|   movzx ecx, byte ptr [rdi + 7]
|   shl rcx, 56
|   or rcx, rdx
|   or rcx, rax
|   vextracti128 xmm1, ymm0, 1
|   vpor xmm0, xmm0, xmm1
|   vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
|   vpor xmm0, xmm0, xmm1
|   vmovq rax, xmm0
|   or rax, rcx
|   vzeroupper
|   ret
| Differential Revision: https://reviews.llvm.org/D67841
| llvm-svn: 373833
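The scalar pattern that the cost model above tries to leave alone can be illustrated outside LLVM. The following is a minimal Python sketch, not LLVM code: it ORs together eight zero-extended, shifted bytes and checks that the result equals a single 64-bit load, little-endian (as `mov` would produce) or byte-swapped (as `movbe` would produce). The function names are made up for the illustration.

```python
def combine_bytes_le(buf: bytes) -> int:
    """OR together 8 zero-extended bytes, byte i shifted left by 8*i."""
    assert len(buf) == 8
    value = 0
    for i, b in enumerate(buf):
        value |= b << (8 * i)
    return value


def combine_bytes_be(buf: bytes) -> int:
    """Same pattern with the shift amounts reversed (byte-swapped load)."""
    assert len(buf) == 8
    value = 0
    for i, b in enumerate(buf):
        value |= b << (8 * (7 - i))
    return value


data = bytes([0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF])
# The shift/or chain is equivalent to one wide load in either byte order.
le_matches = combine_bytes_le(data) == int.from_bytes(data, "little")
be_matches = combine_bytes_be(data) == int.from_bytes(data, "big")
```

Because the whole chain collapses to one load, vectorizing part of it tends to make the final code worse, which is the regression the commit describes.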
* [X86] lowerShuffleAsLanePermuteAndRepeatedMask - variable renames. NFCI. (Simon Pilgrim, 2019-10-05, 1 file, -27/+27)
| Rename some variables to match lowerShuffleAsRepeatedMaskAndLanePermute - prep work toward adding some equivalent sublane functionality.
| llvm-svn: 373832
* BranchFolding - IsBetterFallthrough - assert non-null pointers. NFCI. (Simon Pilgrim, 2019-10-05, 1 file, -0/+2)
| Silences static analyzer null dereference warnings.
| llvm-svn: 373823
* Expose ProvidePositionalOption as a public API (Mehdi Amini, 2019-10-05, 1 file, -1/+1)
| The motivation is to reuse the key value parsing logic here to parse instance specific pass options within the context of MLIR. The primary functionality exposed is the "," splitting for arrays and the logic for properly handling duplicate definitions of a single flag.
| Patch by: Parker Schuh <parkers@google.com>
| Differential Revision: https://reviews.llvm.org/D68294
| llvm-svn: 373815
* Fix a *nasty* miscompile in experimental unordered atomic lowering (Philip Reames, 2019-10-05, 1 file, -3/+4)
| This is an omission in rL371441. Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't being ordered w/respect to side effects which followed before the end of the block.
| Included test case is how I spotted this. We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with. I'm sure it showed up a bunch of other ways as well.
| Spotted via manual inspection of assembly differences in a corpus w/and w/o the new experimental mode. Finding this with testing would have been "unpleasant".
| llvm-svn: 373814
* [RISCV] Added missing ImmLeaf predicates (Ana Pazos, 2019-10-04, 1 file, -2/+4)
| simm9_lsb0 and simm12_lsb0 operand types were missing predicates.
| llvm-svn: 373812
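As a rough illustration of what such a predicate checks, the sketch below reads the operand names literally: a signed immediate of 9 (or 12) bits whose least significant bit must be zero. That reading of the names is an assumption, and the helper `is_shifted_int` is a hypothetical Python stand-in written for this example, not the TableGen predicate from the patch.

```python
def is_shifted_int(n: int, s: int, x: int) -> bool:
    """True if x is a signed (n + s)-bit value whose low s bits are zero."""
    total = n + s
    in_range = -(1 << (total - 1)) <= x < (1 << (total - 1))
    return in_range and x % (1 << s) == 0


def is_simm9_lsb0(x: int) -> bool:
    # Assumed meaning: 9-bit signed immediate, bit 0 clear.
    return is_shifted_int(8, 1, x)


def is_simm12_lsb0(x: int) -> bool:
    # Assumed meaning: 12-bit signed immediate, bit 0 clear.
    return is_shifted_int(11, 1, x)
```

Without a predicate of this kind, an instruction selection pattern could match an immediate that the encoding cannot represent.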
* Invalidate assumption cache before outlining. (Aditya Kumar, 2019-10-04, 2 files, -12/+21)
| Subscribers: llvm-commits
| Tags: #llvm
| Reviewers: compnerd, vsk, sebpop, fhahn, tejohnson
| Reviewed by: vsk
| Differential Revision: https://reviews.llvm.org/D68478
| llvm-svn: 373807
* Revert [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks (Reid Kleckner, 2019-10-04, 1 file, -46/+0)
| This reverts r371177 (git commit f879c6875563c0a8cd838f1e13b14dd33558f1f8)
| It caused PR43566 by removing empty, address-taken MachineBasicBlocks. Such blocks may have references from blockaddress or other operands, and need more consideration to be removed.
| See the PR for a test case to use when relanding.
| llvm-svn: 373805
* [InstCombine] Fold 'icmp eq/ne (?trunc (lshr/ashr %x, bitwidth(x)-1)), 0' -> 'icmp sge/slt %x, 0' (Roman Lebedev, 2019-10-04, 1 file, -0/+28)
| We do indeed already get it right in some cases, but only transitively, with one-use restrictions. Since we only need to produce a single comparison, it makes sense to match the pattern directly:
|   https://rise4fun.com/Alive/kPg
| llvm-svn: 373802
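The equivalence behind this fold can be checked exhaustively for a small bit width. The sketch below is a plain Python model written for this note, not LLVM code; it emulates `lshr`, `ashr` and `trunc` on 8-bit values and confirms that comparing the shifted (and optionally truncated) value against zero is the same as a signed comparison of the original value against zero. The helper names and the chosen widths are assumptions of the example.

```python
def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def lshr(x: int, n: int, bits: int) -> int:
    return (x & ((1 << bits) - 1)) >> n


def ashr(x: int, n: int, bits: int) -> int:
    return (to_signed(x, bits) >> n) & ((1 << bits) - 1)


def trunc(x: int, bits: int) -> int:
    return x & ((1 << bits) - 1)


def fold_holds(bits: int = 8, narrow: int = 4) -> bool:
    """Check 'icmp eq (?trunc (shift x, bits-1)), 0' == 'icmp sge x, 0'."""
    for x in range(1 << bits):
        for shift in (lshr, ashr):
            shifted = shift(x, bits - 1, bits)
            for value in (shifted, trunc(shifted, narrow)):
                if (value == 0) != (to_signed(x, bits) >= 0):
                    return False
    return True
```

The `ne` form follows by negating both sides, which gives the `slt` comparison.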
* [InstCombine] Right-shift shift amount reassociation with truncation (PR43564, PR42391) (Roman Lebedev, 2019-10-04, 1 file, -15/+19)
| Initially (D65380) I believed that if we have rightshift-trunc-rightshift, we can't do any folding. But as it usually happens, I was wrong.
|   https://rise4fun.com/Alive/GEw
|   https://rise4fun.com/Alive/gN2O
| In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have this very sequence, of two right shifts separated by trunc. And "just" so that happens, we apparently can fold the pattern if the total shift amount is either 0, or it's equal to the bitwidth of the innermost widest shift - i.e. if we are left with only the original sign bit. Which is exactly what is wanted there.
| llvm-svn: 373801
* [MachineOutliner] Disable outlining from noreturn functions (Jessica Paquette, 2019-10-04, 1 file, -0/+6)
| Outlining from noreturn functions doesn't do the correct thing right now. The outliner should respect that the caller is marked noreturn.
| In the event that we have a noreturn function, and the outlined code is in tail position, the outliner will not see that the outlined function should be tail called. As a result, you end up with a regular call containing a return.
| Fixing this requires that we check that all candidates live inside noreturn functions. So, for the sake of correctness, don't outline from noreturn functions right now.
| Add machine-outliner-noreturn.mir to test this.
| llvm-svn: 373791
* [NFC] Add { } to silence compiler warning [-Wmissing-braces]. (Huihui Zhang, 2019-10-04, 1 file, -1/+1)
| ../llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:48: warning: suggest braces around initialization of subobject [-Wmissing-braces]
|   return addMappingFromTable<1>(MI, MRI, { 0 }, Table);
|   ^
|   {}
| llvm-svn: 373784
* [ScheduleDAG] When a node is cloned, add an edge between the nodes. (Eli Friedman, 2019-10-04, 1 file, -0/+4)
| InstrEmitter's virtual register handling assumes that clones are emitted after the cloned node. Make sure this assumption actually holds.
| Fixes a "Node emitted out of order - early" assertion on the testcase.
| This is probably a very rare case to actually hit in practice; even without the explicit edge, the scheduler will usually end up scheduling the nodes in the expected order due to other constraints.
| Differential Revision: https://reviews.llvm.org/D68068
| llvm-svn: 373782
* [JITLink] Silence GCC warnings. NFC. (Martin Storsjo, 2019-10-04, 1 file, -1/+1)
| Use parentheses in an expression with mixed && and ||.
| Differential Revision: https://reviews.llvm.org/D68447
| llvm-svn: 373779
* [X86] Remove isel patterns for mask vpcmpgt/vpcmpeq. Switch vpcmp to these based on the immediate in MCInstLower (Craig Topper, 2019-10-04, 2 files, -146/+207)
| The immediate form of VPCMP can represent these completely. The vpcmpgt/eq are just shorter encodings.
| This patch removes the isel patterns and just swaps the opcodes and removes the immediate in MCInstLower. This matches where we do some other encoding tricks.
| Removes over 10K bytes from the isel table.
| Differential Revision: https://reviews.llvm.org/D68446
| llvm-svn: 373766
* [X86] Add DAG combine to form saturating VTRUNCUS/VTRUNCS from VTRUNC (Craig Topper, 2019-10-04, 1 file, -0/+14)
| We already do this for ISD::TRUNCATE, but we can do the same for X86ISD::VTRUNC
| Differential Revision: https://reviews.llvm.org/D68432
| llvm-svn: 373765
* [ModuloSchedule] Do not remap terminators (James Molloy, 2019-10-04, 1 file, -1/+1)
| This is a trivial point fix. Terminator instructions aren't scheduled, so we shouldn't expect to be able to remap them.
| This doesn't affect Hexagon and PPC because their terminators are always hardware loop backbranches that have no register operands.
| llvm-svn: 373762
* [AMDGPU][MC][GFX10][WS32] Corrected decoding of dst operand for v_cmp_*_sdwa opcodes (Dmitry Preobrazhensky, 2019-10-04, 1 file, -1/+2)
| See bug 43484: https://bugs.llvm.org/show_bug.cgi?id=43484
| Reviewers: arsenm, rampitec
| Differential Revision: https://reviews.llvm.org/D68349
| llvm-svn: 373745
* Fix MSVC "not all control paths return a value" warning. NFCI. (Simon Pilgrim, 2019-10-04, 1 file, -0/+1)
| llvm-svn: 373741
* [AMDGPU][MC][GFX10] Enabled decoding of 'null' operand (Dmitry Preobrazhensky, 2019-10-04, 1 file, -0/+1)
| See bug 43485: https://bugs.llvm.org/show_bug.cgi?id=43485
| Reviewers: arsenm, rampitec
| Differential Revision: https://reviews.llvm.org/D68348
| llvm-svn: 373740
* ARM-Darwin: keep the frame register reserved even if not updated. (Tim Northover, 2019-10-04, 1 file, -1/+1)
| Darwin platforms need the frame register to always point at a valid record even if it's not updated in a leaf function. Backtraces are more important than one extra GPR.
| llvm-svn: 373738
* [AMDGPU][MC][GFX10] Corrected definition of FLAT GLOBAL/SCRATCH instructions (Dmitry Preobrazhensky, 2019-10-04, 1 file, -1/+1)
| See bug 43483: https://bugs.llvm.org/show_bug.cgi?id=43483
| Reviewers: arsenm, rampitec
| Differential Revision: https://reviews.llvm.org/D68347
| llvm-svn: 373736
* Fix MSVC "not all control paths return a value" warning. NFCI. (Simon Pilgrim, 2019-10-04, 1 file, -0/+2)
| llvm-svn: 373730
* Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. (Simon Pilgrim, 2019-10-04, 1 file, -1/+1)
| llvm-svn: 373729
* [DebugInfo] LiveDebugValues: move DBG_VALUE creation into VarLoc class (Jeremy Morse, 2019-10-04, 1 file, -107/+137)
| Rather than having a mixture of location-state shared between DBG_VALUEs and VarLoc objects in LiveDebugValues, this patch makes VarLoc the master record of variable locations. The refactoring means that the transfer of locations from one place to another is always performed by an operation on an existing VarLoc, that produces another transferred VarLoc. DBG_VALUEs are only created at the end of LiveDebugValues, once all locations are known. As a plus, there is now only one method where DBG_VALUEs can be created.
| The test case added covers a circumstance that is now impossible to express in LiveDebugValues: if an already-indirect DBG_VALUE is spilt, previously it would have been restored-from-spill as a direct DBG_VALUE. We now don't lose this information along the way, as VarLocs always refer back to the "original" non-transfer DBG_VALUE, and we can always work out whether a location was "originally" indirect.
| Differential Revision: https://reviews.llvm.org/D67398
| llvm-svn: 373727
* [DebugInfo] LiveDebugValues: defer DBG_VALUE creation during analysis (Jeremy Morse, 2019-10-04, 1 file, -8/+7)
| When transferring variable locations from one place to another, LiveDebugValues immediately creates a DBG_VALUE representing that transfer. This causes trouble if the variable location should subsequently be invalidated by a loop back-edge, such as in the added test case: the transfer DBG_VALUE from a now-invalid location is used as proof that the variable location is correct. This is effectively a self-fulfilling prophecy.
| To avoid this, defer the insertion of transfer DBG_VALUEs until after analysis has completed. Some of those transfers are still sketchy, but we don't propagate them into other blocks now.
| Differential Revision: https://reviews.llvm.org/D67393
| llvm-svn: 373720
* AMDGPU/GlobalISel: Fix using wrong addrspace for aperture (Matt Arsenault, 2019-10-04, 1 file, -1/+3)
| This was always passing the destination flat address space, when it should be picking between the two valid source options.
| llvm-svn: 373716
* AMDGPU/GlobalISel: Select G_PTRTOINT (Matt Arsenault, 2019-10-04, 1 file, -0/+1)
| llvm-svn: 373715
* AMDGPU/GlobalISel: Support wave32 waterfall loops (Matt Arsenault, 2019-10-04, 1 file, -22/+30)
| llvm-svn: 373714
* [X86] Enable inline memcmp() to use AVX512 (David Zarzycki, 2019-10-04, 1 file, -2/+1)
| llvm-svn: 373706
* Revert "[Symbolize] Use the local MSVC C++ demangler instead of relying on dbghelp. NFC." (Martin Storsjo, 2019-10-04, 1 file, -4/+37)
| This reverts SVN r373698, as it broke sanitizer tests, e.g. in http://lab.llvm.org:8011/builders/sanitizer-windows/builds/52441.
| llvm-svn: 373701
* [AMDGPU][SILoadStoreOptimizer] NFC: Refactor code (Piotr Sobczak, 2019-10-04, 1 file, -120/+80)
| Summary: This patch fixes a potential aliasing problem in InstClassEnum, where local values were mixed with machine opcodes. Introducing InstSubclass will keep them separate and help extending InstClassEnum with other instruction types (e.g. MIMG) in the future.
| This patch also makes getSubRegIdxs() more concise.
| Reviewers: nhaehnle, arsenm, tstellar
| Reviewed By: arsenm
| Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68384
| llvm-svn: 373699
* [Symbolize] Use the local MSVC C++ demangler instead of relying on dbghelp. NFC. (Martin Storsjo, 2019-10-04, 1 file, -37/+4)
| This allows making a couple llvm-symbolizer tests run in all environments.
| Differential Revision: https://reviews.llvm.org/D68133
| llvm-svn: 373698
* [JITLink] Explicitly destroy bumpptr-allocated blocks to avoid a memory leak. (Lang Hames, 2019-10-04, 1 file, -0/+6)
| llvm-svn: 373693
* [JITLink] Fix an unused variable warning. (Lang Hames, 2019-10-04, 1 file, -3/+2)
| llvm-svn: 373692
* [JITLink] Switch from an atom-based model to a "blocks and symbols" model. (Lang Hames, 2019-10-04, 15 files, -1328/+1488)
| In the Atom model the symbols, content and relocations of a relocatable object file are represented as a graph of atoms, where each Atom represents a contiguous block of content with a single name (or no name at all if the content is anonymous), and where edges between Atoms represent relocations. If more than one symbol is associated with a contiguous block of content then the content is broken into multiple atoms and layout constraints (represented by edges) are introduced to ensure that the content remains effectively contiguous. These layout constraints must be kept in mind when examining the content associated with a symbol (it may be spread over multiple atoms) or when applying certain relocation types (e.g. MachO subtractors).
| This patch replaces the Atom model in JITLink with a blocks-and-symbols model. The blocks-and-symbols model represents relocatable object files as bipartite graphs, with one set of nodes representing contiguous content (Blocks) and another representing named or anonymous locations (Symbols) within a Block. Relocations are represented as edges from Blocks to Symbols. This scheme removes layout constraints (simplifying handling of MachO alt-entry symbols, and hopefully ELF sections at some point in the future) and simplifies some relocation logic.
| llvm-svn: 373689
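The shape of the blocks-and-symbols graph described above can be sketched with a few plain data classes. This is a simplified Python model for illustration only: the class and field names (`Block`, `Symbol`, `Edge`, `kind`, `addend`) are chosen for the example and are not claimed to match JITLink's C++ interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Block:
    """A contiguous run of content; relocations hang off the block."""
    content: bytes
    edges: List["Edge"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Symbol:
    """A named (or anonymous, name=None) location inside a block."""
    name: Optional[str]
    block: Block
    offset: int


@dataclass(eq=False)
class Edge:
    """A relocation: a fixup at `offset` in a block, pointing at a symbol."""
    offset: int
    target: Symbol
    kind: str
    addend: int = 0


# Two symbols share one block, so no layout constraint is needed to keep
# their content contiguous (the alt-entry situation mentioned above).
code = Block(content=bytes(16))
entry = Symbol("func", code, offset=0)
alt_entry = Symbol("func_alt", code, offset=8)

data = Block(content=bytes(8))
data.edges.append(Edge(offset=0, target=alt_entry, kind="pointer64"))
```

In the atom model the same input would have been split into two atoms tied together by a layout edge; here both symbols simply carry different offsets into one block.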
* [RISCV] Split SP adjustment to reduce the offset of callee saved register spill and restore (Shiva Chen, 2019-10-04, 2 files, -1/+90)
| We would like to split the SP adjustment to reduce the instructions in prologue and epilogue, as in the following case. In this way, the offset of the callee saved register could fit in a single store.
|   add sp,sp,-2032
|   sw ra,2028(sp)
|   sw s0,2024(sp)
|   sw s1,2020(sp)
|   sw s3,2012(sp)
|   sw s4,2008(sp)
|   add sp,sp,-64
| Differential Revision: https://reviews.llvm.org/D68011
| llvm-svn: 373688
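The arithmetic of the split can be sketched as follows. This is an illustrative Python model, not the patch's implementation: it assumes 12-bit signed immediates (range -2048..2047), a 16-byte stack alignment, and a first adjustment of 2048 minus the alignment, which matches the 2032 in the example above. The function name `split_sp_adjust` is hypothetical.

```python
SIMM12_MAX = 2047   # largest 12-bit signed immediate
STACK_ALIGN = 16    # assumed stack alignment in bytes


def split_sp_adjust(stack_size: int) -> list:
    """Return the SP decrements to apply, in order.

    Small frames need one adjustment. Larger frames get a first adjustment
    that keeps callee-saved spill offsets within the immediate range,
    followed by the remainder.
    """
    if stack_size <= SIMM12_MAX:
        return [stack_size]
    first = 2048 - STACK_ALIGN
    return [first, stack_size - first]


# The frame from the commit message: 2032 + 64 bytes in total.
parts = split_sp_adjust(2096)
# After the first adjustment, the highest spill slot sits just below 2032,
# so its offset fits in the immediate of a single store.
highest_spill_offset = parts[0] - 4
```

In this sketch the remainder is emitted as one adjustment; a very large frame would need the remainder materialized in a register first, which the sketch does not model.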
* LowerTypeTests: Rename local functions to avoid collisions with identically named functions in ThinLTO modules. (Peter Collingbourne, 2019-10-03, 1 file, -0/+11)
| Without this we can encounter link errors or incorrect behaviour at runtime as a result of the wrong function being referenced.
| Differential Revision: https://reviews.llvm.org/D67945
| llvm-svn: 373678
* [MemorySSA] Don't hoist stores if interfering uses (as calls) exist. (Alina Sbirlea, 2019-10-03, 1 file, -1/+11)
| llvm-svn: 373674
* [DAGCombiner] add operation legality checks before creating shift ops (PR43542) (Sanjay Patel, 2019-10-03, 1 file, -1/+6)
| As discussed on llvm-dev and: https://bugs.llvm.org/show_bug.cgi?id=43542
| ...we have transforms that assume shift operations are legal and transforms to use them are profitable, but that may not hold for simple targets.
| In this case, the MSP430 target custom lowers shifts by repeating (many) simpler/fixed ops. That can be avoided by keeping this code as setcc/select.
| Differential Revision: https://reviews.llvm.org/D68397
| llvm-svn: 373666
* Reland r349624: Let TableGen write output only if it changed, instead of doing so in cmake (Nico Weber, 2019-10-03, 1 file, -8/+29)
| Move the write-if-changed logic behind a flag and don't pass it with the MSVC generator. msbuild doesn't have a restat optimization, so not doing write-if-changed there doesn't have a cost, and it should fix whatever causes PR43385.
| llvm-svn: 373664
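The general write-if-changed idea is small enough to sketch. The following Python illustration is not TableGen's code: it compares the new contents against the existing file and only rewrites the file when they differ, so the modification time stays put and a build system that re-checks timestamps can skip dependent steps. The function and file names are made up for the example.

```python
import os
import tempfile


def write_if_changed(path: str, new_contents: str) -> bool:
    """Write new_contents to path unless the file already holds them.

    Returns True if the file was written, False if it was left untouched.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == new_contents:
                return False
    except FileNotFoundError:
        pass  # No previous output, so fall through and write it.
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_contents)
    return True


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "Gen.inc")
    first = write_if_changed(out, "// generated\n")
    mtime = os.stat(out).st_mtime_ns
    second = write_if_changed(out, "// generated\n")
    unchanged = os.stat(out).st_mtime_ns == mtime
```

The second call leaves the file alone, which is the behaviour a restat-style optimization relies on.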
* DebugInfo: Generalize rnglist emission as a precursor to reusing it for loclist emission (David Blaikie, 2019-10-03, 1 file, -15/+25)
| llvm-svn: 373663
* [AArch64InstPrinter] prefer bfi to bfc for < armv8.2-a (Nick Desaulniers, 2019-10-03, 1 file, -1/+2)
| Summary: Fixes pr/42576.
| Link: https://github.com/ClangBuiltLinux/linux/issues/697
| Reviewers: t.p.northover
| Reviewed By: t.p.northover
| Subscribers: kristof.beyls, hiraditya, llvm-commits, srhines
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68356
| llvm-svn: 373655
* [PowerPC] Adjust the naming and operand order of fnmsub patterns (Jinsong Ji, 2019-10-03, 1 file, -18/+18)
| Summary: This is a follow-up patch to https://reviews.llvm.org/D67595. Adjust naming and the Commutable operands for additional patterns to make it easier to read.
| The testcase update also shows that we can save some unnecessary fmr as well.
| Reviewers: #powerpc, steven.zhang, hfinkel, nemanjai
| Reviewed By: #powerpc, nemanjai
| Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D68112
| llvm-svn: 373652
* [NFC] Fix unused variable in release builds (Jordan Rupprecht, 2019-10-03, 1 file, -1/+2)
| llvm-svn: 373646
* [X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes. (Craig Topper, 2019-10-03, 1 file, -0/+44)
| This patch recognizes the shuffle pattern we get from a v8i64->v8i8 truncate when v8i64 isn't a legal type. With VLX we can use two VTRUNCs, unpckldq, and an insert_subvector.
| Differential Revision: https://reviews.llvm.org/D68374
| llvm-svn: 373645
* [X86] matchShuffleWithSHUFPD - use Zeroable element mask directly. NFCI. (Simon Pilgrim, 2019-10-03, 1 file, -7/+7)
| We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.
| This only leaves one user of createTargetShuffleMask which we can hopefully get rid of in a similar manner.
| This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.
| llvm-svn: 373641
* AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT (Matt Arsenault, 2019-10-03, 1 file, -5/+77)
| llvm-svn: 373639