summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [InstCombine] Eliminate stores to constant memoryPhilip Reames2019-04-222-0/+24
| | | | | | | | | | | | If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path, or a noop. In either case, it is valid to simply remove the store. The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead. Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store *is undefined*, and prune the path. Consensus in the review was that the more aggressive form might be a good follow on change at a later date. Differential Revision: https://reviews.llvm.org/D60659 llvm-svn: 358919
* [InstSimplify] Move masked.gather w/no active lanes handling to InstSimplify ↵Philip Reames2019-04-222-6/+2
| | | | | | | | from InstCombine In the process, use the existing masked.load combine which is slightly stronger, and handles a mix of zero and undef elements in the mask. llvm-svn: 358913
* Use const DebugLoc&Matt Arsenault2019-04-221-2/+2
| | | | llvm-svn: 358910
* AMDGPU: Skip debug instructions in assertMatt Arsenault2019-04-221-2/+7
| | | | | | | | | | These are inserted after branch relaxation, and for some reason it's decided to put them in the long branch expansion block. It's probably not great to rely on the source block address, so this should probably be switched to being PC relative instead of relying on the block address llvm-svn: 358909
* [IPSCCP] Add missing `AssumptionCacheTracker` dependencyJustin Bogner2019-04-221-0/+1
| | | | | | | | | Back in August, r340525 introduced a dependency on the assumption cache tracker in the ipsccp pass, but that commit missed a call to INITIALIZE_PASS_DEPENDENCY, which leaves the assumption cache improperly registered if SCCP is the only thing that pulls it in. llvm-svn: 358903
* [LPM/BPI] Preserve BPI through trivial loop pass pipeline (e.g. LCSSA, ↵Philip Reames2019-04-222-0/+13
| | | | | | | | | | | | | | LoopSimplify) Currently, we do not expose BPI to loop passes at all. In the old pass manager, we appear to have been ignoring the fact that LCSSA and/or LoopSimplify didn't preserve BPI, and making it available to the following loop passes anyways. In the new one, it's invalidated before running any loop pass if either LCSSA or LoopSimplify actually make changes. If they don't make changes, then BPI is valid and available. So, we go ahead and teach LCSSA and LoopSimplify how to preserve BPI for consistency between old and new pass managers. This patch avoids an invalidation between the two requires in the following trivial pass pipeline: opt -passes="requires<branch-prob>,loop(no-op-loop),requires<branch-prob>" (when the input file is one which requires either LCSSA or LoopSimplify to canonicalize the loops) Differential Revision: https://reviews.llvm.org/D60790 llvm-svn: 358901
* [PGO/SamplePGO][NFC] Move the function updateProfWeight from InstructionWei Mi2019-04-222-43/+44
| | | | | | | | | | | | | | | | | | | | | | to CallInst. The issue was raised here: https://reviews.llvm.org/D60903#1472783 The function Instruction::updateProfWeight is only used for CallInst in profile update. From the current interface, it is very easy to think that the function can also be used for branch instruction. However, Branch instruction does't need the scaling the function provides for branch_weights and VP (value profile), in addition, scaling may introduce inaccuracy for branch probablity. The patch moves the function updateProfWeight from Instruction class to CallInst to remove the confusion. The patch also changes the scaling of branch_weights from a loop to a block because we know that ProfileData for branch_weights of CallInst will only have two operands at most. Differential Revision: https://reviews.llvm.org/D60911 llvm-svn: 358900
* AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sourcesMatt Arsenault2019-04-221-1/+3
| | | | llvm-svn: 358894
* GlobalISel: Legalize scalar G_EXTRACT sourcesMatt Arsenault2019-04-221-0/+7
| | | | llvm-svn: 358892
* llvm-undname: Fix an assert-on-invalid, found by oss-fuzzNico Weber2019-04-221-1/+1
| | | | llvm-svn: 358891
* AMDGPU: Fix not checking for copy when looking at copy srcMatt Arsenault2019-04-221-1/+6
| | | | | | | Effectively reverts r356956. The check for isFullCopy was excessive, but there still needs to be a check that this is a copy. llvm-svn: 358890
* [AMDGPU][MC] Corrected parsing of SP3 'neg' modifierDmitry Preobrazhensky2019-04-221-24/+58
| | | | | | | | | | See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60624 llvm-svn: 358888
* [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handlingSimon Pilgrim2019-04-222-12/+50
| | | | | | | | | | | | This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. Differential Revision: https://reviews.llvm.org/D60462 llvm-svn: 358887
* [DAGCombiner] make variable name less ambiguous; NFCSanjay Patel2019-04-221-4/+4
| | | | llvm-svn: 358886
* [DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFCSanjay Patel2019-04-221-11/+16
| | | | llvm-svn: 358884
* [LLVM-C] Add accessors to the default floating-point metadata nodeRobert Widmann2019-04-221-0/+12
| | | | | | | | | | | | | | | | Summary: Add a getter and setter pair for floating-point accuracy metadata. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60527 llvm-svn: 358883
* [NewPM] Add Option handling for SimpleLoopUnswitchSerguei Katkov2019-04-222-1/+39
| | | | | | | | | | | This patch enables passing options to SimpleLoopUnswitch via the passes pipeline. Reviewers: chandlerc, fedor.sergeev, leonardchan, philip.pfaffe Reviewed By: fedor.sergeev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D60676 llvm-svn: 358880
* Revert "[ConstantRange] Rename make{Guaranteed -> Exact}NoWrapRegion() NFC"Nikita Popov2019-04-224-15/+16
| | | | | | | | | | This reverts commit 7bf4d7c07f2fac862ef34c82ad0fef6513452445. After thinking about this more, this isn't right, the range is not exact in the same sense as makeExactICmpRegion(). This needs a separate function. llvm-svn: 358876
* [ConstantRange] Rename make{Guaranteed -> Exact}NoWrapRegion() NFCNikita Popov2019-04-224-16/+15
| | | | | | | | Following D60632 makeGuaranteedNoWrapRegion() always returns an exact nowrap region. Rename the function accordingly. This is in line with the naming of makeExactICmpRegion(). llvm-svn: 358875
* [X86] Reject 512-bit types in getRegForInlineAsmConstraint when AVX512 is ↵Craig Topper2019-04-221-2/+5
| | | | | | not enabled. Same for 256 bit and AVX. llvm-svn: 358872
* [JITLink] Remove a lot of reduntant 'JITLink_' prefixes. NFC.Lang Hames2019-04-228-18/+22
| | | | llvm-svn: 358869
* [JITLink] Fix section start address calculation in eh-frame recorder.Lang Hames2019-04-221-0/+3
| | | | | | | | | | Section atoms are not sorted, so we need to scan the whole section to find the start address. No test case: Found by inspection, and any reproduction would depend on pointer ordering. llvm-svn: 358865
* llvm-undname: Fix hex escapes in wchar_t, char16_t, char32_t stringsNico Weber2019-04-211-3/+3
| | | | | | | | | | | | | | | | | llvm-undname used to put '\x' in front of every pair of nibbles, but u"\xD7\xFF" produces a string with 6 bytes: \xD7 \0 \xFF \0 (and \0\0). Correct for a single character (plus terminating \0) is u\xD7FF instead. Now, wchar_t, char16_t, and char32_t strings roundtrip from source to clang-cl (and cl.exe) and then llvm-undname. (...at least as long as it's not a string like L"\xD7FF" L"foo" which gets demangled as L"\xD7FFfoo", where the compiler then considers the "f" as part of the hex escape. That seems ok.) Also add a comment saying that the "almost-valid" char32_t string I added in my last commit is actually produced by compilers. llvm-svn: 358857
* llvm-undname: Fix stack overflow on almost-validNico Weber2019-04-211-3/+3
| | | | | | | | | | | | | | | | | If a unsigned with all 4 bytes non-0 was passed to outputHex(), there were two off-by-ones in it: - Both MaxPos and Pos left space for the final \0, which left the buffer one byte to small. Set MaxPos to 16 instead of 15 to fix. - The `assert(Pos >= 0);` was after a `Pos--`, move it up one line. Since valid Unicode codepoints are <= 0x10ffff, this could never really happen in practice. Found by oss-fuzz. llvm-svn: 358856
* [ConstantRange] Add saturating add/sub methodsNikita Popov2019-04-211-0/+36
| | | | | | | | | | | | | | | | Add support for uadd_sat and friends to ConstantRange, so we can handle uadd.sat and friends in LVI. The implementation is forwarding to the corresponding APInt methods with appropriate bounds. One thing worth pointing out here is that the handling of wrapping ranges is not maximally accurate. A simple example is that adding 0 to a wrapped range will return a full range, rather than the original wrapped range. The tests also only check that the non-wrapping envelope is correct and minimal. Differential Revision: https://reviews.llvm.org/D60946 llvm-svn: 358855
* [ConstantRange] Add getNonEmpty() constructorNikita Popov2019-04-213-64/+19
| | | | | | | | | | | | | | ConstantRanges have an annoying special case: If upper and lower are the same, it can be either an empty or a full set. When constructing constant ranges nearly always a full set is intended, but this still requires an explicit check in many places. This revision adds a getNonEmpty() constructor that disambiguates this case: If upper and lower are the same, a full set is created. Differential Revision: https://reviews.llvm.org/D60947 llvm-svn: 358854
* llvm-undname: Fix stack overflow on invalid found by oss-fuzzNico Weber2019-04-211-1/+1
| | | | llvm-svn: 358852
* [ARM] Rewrite isLegalT2AddressImmediateDavid Green2019-04-211-29/+24
| | | | | | | | | | | | | | | | | | | | | This does two main things, firstly adding some at least basic addressing modes for i64 types, and secondly treats floats and doubles sensibly when there is no fpu. The floating point change can help codesize in some cases, especially with D60294. Most backends seems to not consider the exact VT in isLegalAddressingMode, instead switching on type size. That is now what this does when the target does not have an fpu (as the float data will be loaded using LDR's). i64's currently use the address range of an LDRD (even though they may be legalised and loaded with an LDR). This is at least better than marking them all as illegal addressing modes. I have not attempted to do much with vectors yet. That will need changing once MVE is added. Differential Revision: https://reviews.llvm.org/D60677 llvm-svn: 358845
* [X86] Add the rounding control operand to the printing for some scalar FMA ↵Craig Topper2019-04-211-1/+1
| | | | | | instructions. llvm-svn: 358844
* [CachePruning] Simplify comparatorFangrui Song2019-04-211-9/+2
| | | | llvm-svn: 358843
* [X86] Don't form masked vfpclass instruction from and+vfpclass unless the ↵Craig Topper2019-04-211-28/+36
| | | | | | fpclass only has a single use. llvm-svn: 358841
* [JITLink] Remove an overly strict error check in JITLink's eh-frame parser.Lang Hames2019-04-212-13/+4
| | | | | | | | The error check required FDEs to refer to the most recent CIE, but the eh-frame spec allows them to refer to any previously seen CIE. This patch removes the offending check. llvm-svn: 358840
* [JITLink] Factor basic common GOT and stub creation code into its own class.Lang Hames2019-04-212-72/+130
| | | | llvm-svn: 358838
* llvm-undname: Improve string literal demangling with embedded \0 charsNico Weber2019-04-201-2/+5
| | | | | | | | | - Don't assert when a string looks like a u32 string to the heuristic but doesn't have a length that's 0 mod 4. Instead, classify those as u16 with embedded \0 chars. Found by oss-fuzz. - Print embedded nul bytes as \0 instead of \x00. llvm-svn: 358835
* ftime-trace: Trace the name of the currently active pass as well.Nico Weber2019-04-201-6/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D60782 llvm-svn: 358834
* [JITLink] Add yet more detail to MachO/x86-64 unsupported relocation errors.Lang Hames2019-04-201-1/+4
| | | | | | | | Knowing the address/symbolnum field values makes it easier to identify the unsupported relocation, and provides enough information for the full bit pattern of the relocation to be reconstructed. llvm-svn: 358833
* [JITLink][ORC] Add JITLink to the list of dependencies for ORC.Lang Hames2019-04-201-1/+2
| | | | | | | | | The new ObjectLinkingLayer in ORC depends on JITLink. This should fix the build error at http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/9621 llvm-svn: 358832
* [JITLink] Fix a bad formatv format string.Lang Hames2019-04-201-1/+1
| | | | llvm-svn: 358831
* Revert r358800. Breaks Obsequi from the test suite.Amara Emerson2019-04-202-100/+12
| | | | | | | The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with a different issue. llvm-svn: 358829
* [JITLink] Add BinaryFormat to JITLink's dependencies.Lang Hames2019-04-201-1/+1
| | | | | | | | | Hopefully this will fix the missing dependence on llvm::identify_magic that is showing up on some PPC bots. E.g. http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/9617 llvm-svn: 358827
* [JITLink] Add more detail to MachO/x86-64 "unsupported relocation" errors.Lang Hames2019-04-201-1/+5
| | | | | | | The extra information here will be helpful in diagnosing errors, like the ones currently occuring on the PPC big-endian bots. :) llvm-svn: 358826
* [JITLink] Silence some MSVC implicit cast warnings.Lang Hames2019-04-201-2/+3
| | | | llvm-svn: 358824
* [JITLink] Use memset instead of bzero.Lang Hames2019-04-201-2/+2
| | | | llvm-svn: 358822
* [JITLink] Fix a missing header and bad prototype.Lang Hames2019-04-201-1/+2
| | | | llvm-svn: 358819
* Initial implementation of JITLink - A replacement for RuntimeDyld.Lang Hames2019-04-2019-65/+3169
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: JITLink is a jit-linker that performs the same high-level task as RuntimeDyld: it parses relocatable object files and makes their contents runnable in a target process. JITLink aims to improve on RuntimeDyld in several ways: (1) A clear design intended to maximize code-sharing while minimizing coupling. RuntimeDyld has been developed in an ad-hoc fashion for a number of years and this had led to intermingling of code for multiple architectures (e.g. in RuntimeDyldELF::processRelocationRef) in a way that makes the code more difficult to read, reason about, extend. JITLink is designed to isolate format and architecture specific code, while still sharing generic code. (2) Support for native code models. RuntimeDyld required the use of large code models (where calls to external functions are made indirectly via registers) for many of platforms due to its restrictive model for stub generation (one "stub" per symbol). JITLink allows arbitrary mutation of the atom graph, allowing both GOT and PLT atoms to be added naturally. (3) Native support for asynchronous linking. JITLink uses asynchronous calls for symbol resolution and finalization: these callbacks are passed a continuation function that they must call to complete the linker's work. This allows for cleaner interoperation with the new concurrent ORC JIT APIs, while still being easily implementable in synchronous style if asynchrony is not needed. To maximise sharing, the design has a hierarchy of common code: (1) Generic atom-graph data structure and algorithms (e.g. dead stripping and | memory allocation) that are intended to be shared by all architectures. | + -- (2) Shared per-format code that utilizes (1), e.g. Generic MachO to | atom-graph parsing. | + -- (3) Architecture specific code that uses (1) and (2). E.g. JITLinkerMachO_x86_64, which adds x86-64 specific relocation support to (2) to build and patch up the atom graph. To support asynchronous symbol resolution and finalization, the callbacks for these operations take continuations as arguments: using JITLinkAsyncLookupContinuation = std::function<void(Expected<AsyncLookupResult> LR)>; using JITLinkAsyncLookupFunction = std::function<void(const DenseSet<StringRef> &Symbols, JITLinkAsyncLookupContinuation LookupContinuation)>; using FinalizeContinuation = std::function<void(Error)>; virtual void finalizeAsync(FinalizeContinuation OnFinalize); In addition to its headline features, JITLink also makes other improvements: - Dead stripping support: symbols that are not used (e.g. redundant ODR definitions) are discarded, and take up no memory in the target process (In contrast, RuntimeDyld supported pointer equality for weak definitions, but the redundant definitions stayed resident in memory). - Improved exception handling support. JITLink provides a much more extensive eh-frame parser than RuntimeDyld, and is able to correctly fix up many eh-frame sections that RuntimeDyld currently (silently) fails on. - More extensive validation and error handling throughout. This initial patch supports linking MachO/x86-64 only. Work on support for other architectures and formats will happen in-tree. Differential Revision: https://reviews.llvm.org/D58704 llvm-svn: 358818
* [X86] Disable argument copy elision for arguments passed via pointersCraig Topper2019-04-201-1/+5
| | | | | | | | | | | | | | | | | | | | | Summary: If you pass two 1024 bit vectors in IR with AVX2 on Windows 64. Both vectors will be split in four 256 bit pieces. The four pieces of the first argument will be passed indirectly using 4 gprs. The second argument will get passed via pointers in memory. The PartOffsets stored for the second argument are all in terms of its original 1024 bit size. So the PartOffsets for each piece are 32 bytes apart. So if we consider it for copy elision we'll only load an 8 byte pointer, but we'll move the address 32 bytes. The stack object size we create for the first part is probably wrong too. This issue was encountered by ISPC. I'm working on getting a reduce test case, but wanted to go ahead and get feedback on the fix. Reviewers: rnk Reviewed By: rnk Subscribers: dbabokin, llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D60801 llvm-svn: 358817
* [CorrelatedValuePropagation] Mark subs that we know not to wrap with nuw/nsw.Luqman Aden2019-04-201-25/+26
| | | | | | | | | | | | | | | | | | | Summary: Teach CorrelatedValuePropagation to also handle sub instructions in addition to add. Relatively simple since makeGuaranteedNoWrapRegion already understood sub instructions. Only subtle change is which range is passed as "Other" to that function, since sub isn't commutative. Note that CorrelatedValuePropagation::processAddSub is still hidden behind a default-off flag as IndVarSimplify hasn't yet been fixed to strip the added nsw/nuw flags and causes a miscompile. (PR31181) Reviewers: sanjoy, apilipenko, nikic Reviewed By: nikic Subscribers: hiraditya, jfb, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60036 llvm-svn: 358816
* [ExecutionDomainFix] Optimize a binary search insertionFangrui Song2019-04-201-3/+3
| | | | llvm-svn: 358815
* [llvm-symbolizer] Fix section index at the end of a sectionFangrui Song2019-04-201-2/+1
| | | | | | | | This is very minor issue. The returned section index is only used by DWARFDebugLine as an llvm::upper_bound input and the use case shouldn't cause any behavioral change. llvm-svn: 358814
* [X86] Fix stack probing on x32 (PR41477)Nikita Popov2019-04-202-8/+15
| | | | | | | | | | Fix for https://bugs.llvm.org/show_bug.cgi?id=41477. On the x32 ABI with stack probing a dynamic alloca will result in a WIN_ALLOCA_32 with a 32-bit size. The current implementation tries to copy it into RAX, resulting in a physreg copy error. Fix this by copying to EAX instead. Also fix incorrect opcodes or registers used in subs. llvm-svn: 358807
OpenPOWER on IntegriCloud