path: root/llvm/test/Transforms/MemCpyOpt
* Revert "Revert "As a follow-up to my initial mail to llvm-dev here's a first ↵Eric Christopher2019-11-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | pass at the O1 described there."" This reapplies: 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4 Original commit message: As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410 This reverts commit c9ddb02659e3ece7a0d9d6b4dac7ceea4ae46e6d.
* Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at ↵Muhammad Omair Javaid2019-11-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | the O1 described there." This reverts commit 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4. This commit introduced 9 new failures on lldb buildbot host at http://lab.llvm.org:8014/builders/lldb-aarch64-ubuntu Following tests were failing: lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq1/TestAmbiguousTailCallSeq1.py lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py lldb-api :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py lldb-api :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py lldb-api :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py lldb-api :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py lldb-api :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py lldb-api :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py lldb-api :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py lldb-api :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410
* As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.  (Eric Christopher, 2019-11-25; 1 file changed, -1/+1)

  This change doesn't include any change to move from selection dag to fast isel; that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet; this is just the initial start, done so that people could see it and we could start tweaking afterwards.

  Test updates: outside of the newpm tests, most of the updates come from optimization passes that are no longer run (and without a compelling argument for them at the moment) and that were largely used for canonicalization in clang.

  Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html

  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65410
* [MemCpyOpt] Fixing Incorrect Code Motion while Handling Aggregate Type Values  (Sam Elliott, 2019-10-21; 1 file changed, -0/+30)

  Summary: When MemCpyOpt is handling aggregate type values, if an instruction (call it P) between the targeted load (L) and store (S) clobbers the source pointer of L, it will try to hoist S above P. This process also hoists S's data-dependency instructions. However, the current implementation has a bug: if one of S's dependency instructions is *also* a user of P, MemCpyOpt does not prevent it from being hoisted above P, causing a use-before-def error.

  For example, in the newly added test file (`aggregate-type-crash.ll`), it tries to hoist both `store %my_struct %1, %my_struct* %3` and its dependency, `%3 = bitcast i8* %2 to %my_struct*`, above `%2 = call i8* @my_malloc(%my_struct* %0)`, creating the following BB:

  ```
  entry:
    %1 = bitcast i8* %4 to %my_struct*
    %2 = bitcast %my_struct* %1 to i8*
    %3 = bitcast %my_struct* %0 to i8*
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %2, i8* align 4 %3, i64 8, i1 false)
    %4 = call i8* @my_malloc(%my_struct* %0)
    ret void
  ```

  where there is a use-before-def error between `%1` and `%4`.

  Update: the compiler for the Pony programming language also encounters the same bug: https://github.com/ponylang/ponyc/issues/3140

  Patch by Min-Yih Hsu (myhsu)

  Reviewers: eugenis, pcc, dblaikie, dneilson, t.p.northover, lattner
  Reviewed By: eugenis
  Subscribers: lenary, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66060
  llvm-svn: 375403
* [ValueTracking] Improve pointer offset computation for cases of same base  (Rong Xu, 2019-10-10; 1 file changed, -0/+77)

  This patch improves the handling of pointer offsets in GEP expressions where one argument is the base pointer. isPointerOffset() is used by memcpyopt, where the current code turns 32 bytes of consecutive stores into one store and two memset intrinsic calls. With this patch, we convert the stores to a single memset intrinsic.

  Differential Revision: https://reviews.llvm.org/D67989
  llvm-svn: 374454
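  A minimal sketch of the affected pattern; the identifiers and sizes below are illustrative, not taken from the patch's tests:

  ```
  ; Byte stores at consecutive offsets from one base pointer; under the
  ; assumptions above, memcpyopt can fold them into a single memset.
  define void @consecutive_stores(i8* %p) {
    store i8 0, i8* %p
    %p1 = getelementptr i8, i8* %p, i64 1
    store i8 0, i8* %p1
    %p2 = getelementptr i8, i8* %p, i64 2
    store i8 0, i8* %p2
    %p3 = getelementptr i8, i8* %p, i64 3
    store i8 0, i8* %p3
    ; expected result:
    ;   call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 4, i1 false)
    ret void
  }
  ```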
* [BuildLibCalls] Noalias annotation  (David Bolvansky, 2019-08-14; 1 file changed, -1/+1)

  Summary: I think this is a better solution than annotating call sites in IC/SLC.

  Reviewers: jdoerfert
  Reviewed By: jdoerfert
  Subscribers: MaskRay, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66217
  llvm-svn: 368875
* [SimplifyLibCalls] Add noalias from known callsites  (David Bolvansky, 2019-08-13; 1 file changed, -1/+1)

  Summary: Should be fine for memcpy, strcpy, strncpy.

  Reviewers: jdoerfert, efriedma
  Reviewed By: jdoerfert
  Subscribers: uenoku, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66135
  llvm-svn: 368724
* [SimplifyLibCalls] Add dereferenceable bytes from known callsites  (David Bolvansky, 2019-08-13; 1 file changed, -16/+21)

  Summary:

  ```
  int mm(char *a, char *b) {
    return memcmp(a, b, 16);
  }
  ```

  Currently:

  ```
  define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 {
  entry:
    %call = tail call i32 @memcmp(i8* %a, i8* %b, i64 16)
    ret i32 %call
  }
  ```

  After patch:

  ```
  define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 {
  entry:
    %call = tail call i32 @memcmp(i8* dereferenceable(16) %a, i8* dereferenceable(16) %b, i64 16)
    ret i32 %call
  }
  ```

  Reviewers: jdoerfert, efriedma
  Reviewed By: jdoerfert
  Subscribers: javed.absar, spatel, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66079
  llvm-svn: 368657
* [FunctionAttrs] Annotate "willreturn" for intrinsics  (Hideto Ueno, 2019-07-28; 1 file changed, -1/+1)

  Summary: In D62801, the new function attribute `willreturn` was introduced. In short, a function with `willreturn` is guaranteed to come back to the call site (the precise definition is in LangRef). In this patch, willreturn is annotated for LLVM intrinsics.

  Reviewers: jdoerfert
  Reviewed By: jdoerfert
  Subscribers: jvesely, nhaehnle, sstefan1, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64904
  llvm-svn: 367184
* Revert "Temporarily Revert "Add basic loop fusion pass.""Eric Christopher2019-04-1731-0/+2171
| | | | | | | | The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass."  (Eric Christopher, 2019-04-17; 31 files changed, -2171/+0)

  As it's causing some bot failures (and per request from kbarton).

  This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

  llvm-svn: 358546
* [ValueTracking] Look through casts when determining non-nullness  (Johannes Doerfert, 2019-01-26; 1 file changed, -5/+5)

  Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the value of their operand, and can therefore be looked through when we determine non-nullness.

  Differential Revision: https://reviews.llvm.org/D54956
  llvm-svn: 352293
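  A small sketch of the idea (example ours, assuming a `nonnull` argument):

  ```
  ; %p carries nonnull; the bitcast does not change the pointer value, so the
  ; comparison below is known to be false once casts are looked through.
  define i1 @cast_nonnull(i32* nonnull %p) {
    %b = bitcast i32* %p to i8*
    %c = icmp eq i8* %b, null
    ret i1 %c
  }
  ```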
* [BasicAA] Fix AA bug on dynamic allocas and stackrestore  (Reid Kleckner, 2018-12-21; 1 file changed, -0/+74)

  Summary: BasicAA has special logic for unescaped allocas, which normally applies equally well to dynamic and static allocas. However, llvm.stackrestore has the power to end the lifetime of dynamic allocas without referring to them directly. stackrestore is already marked with the most conservative memory-modification attributes, but because the alloca is not escaped, the normal logic produces incorrect results. I think BasicAA needs a special case here to teach it about the relationship between dynamic allocas and stackrestore.

  Fixes PR40118

  Reviewers: gbiv, efriedma, george.burgess.iv
  Subscribers: hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55969
  llvm-svn: 349945
* Reapply "[MemCpyOpt] memset->memcpy forwarding with undef tail"Nikita Popov2018-12-131-4/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. The previous version of this patch did not perform dependency checking properly: While the dependency is checked at the position of the memset, the used size must be that of the memcpy. Previously the size of the memset was used, which missed modification in the region MemSetSize..CopySize, resulting in miscompiles. The added tests cover variations of this issue. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 349078
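  A minimal sketch of the M > N case, assuming a fresh alloca supplies the undef tail (names and sizes are illustrative, not from the added tests):

  ```
  ; Only a[0..8) is initialized, but a[8..16) is undef because %a was just
  ; allocated, so the 16-byte memcpy may become a 16-byte memset of %b.
  define void @undef_tail(i8* %b) {
    %a = alloca [16 x i8]
    %p = getelementptr [16 x i8], [16 x i8]* %a, i64 0, i64 0
    call void @llvm.memset.p0i8.i64(i8* %p, i8 7, i64 8, i1 false)
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %b, i8* %p, i64 16, i1 false)
    ; => call void @llvm.memset.p0i8.i64(i8* %b, i8 7, i64 16, i1 false)
    ret void
  }
  declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i1)
  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)
  ```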
* Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail"  (David L. Jones, 2018-12-13; 1 file changed, -4/+4)

  This revision caused truncated memsets for structs with padding. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html

  llvm-svn: 349002
* [MemCpyOpt] memset->memcpy forwarding with undef tailNikita Popov2018-12-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. For the implementation, I'm reusing a bit of code for a similar existing optimization (direct memcpy of undef). I've also added memset support to MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be used, but it seems to make more sense to add this to GetLocation and thus make the computation cachable. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 348645
* [MemCpyOpt] Add tests for memset->memcpy forwarding with undef tail; NFC  (Nikita Popov, 2018-12-07; 1 file changed, -0/+152)

  These are baseline tests for D55120.

  llvm-svn: 348644
* Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue  (JF Bastien, 2018-09-21; 2 files changed, -21/+110)

  Summary: This code was in CGDecl.cpp and really belongs in LLVM's isBytewiseValue. Teach isBytewiseValue the tricks clang's isRepeatedBytePattern had, including merging undef properly and recursing on more types.

  clang part of this patch: D51752

  Subscribers: dexonsmith, llvm-commits
  Differential Revision: https://reviews.llvm.org/D51751
  llvm-svn: 342709
* [MemCpyOpt] Skip optimizing basic blocks not reachable from entry  (Bjorn Pettersson, 2018-04-23; 1 file changed, -0/+39)

  Summary: Skip basic blocks not reachable from the entry node in MemCpyOptPass::iterateOnFunction. Unreachable code may have properties that reachable code does not (an instruction in a basic block can, for example, be dominated by a later instruction in the same basic block when there is a single-block loop).

  MemCpyOptPass::processStore is only safe to use for reachable basic blocks, since it may iterate past the basic block beginning when used on unreachable blocks. By simply skipping unreachable basic blocks we avoid asserts such as "Assertion `!NodePtr->isKnownSentinel()' failed." in MemCpyOptPass::processStore.

  The problem was detected by fuzz tests.

  Reviewers: eli.friedman, dneilson, efriedma
  Reviewed By: efriedma
  Subscribers: efriedma, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45889
  llvm-svn: 330635
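  A small illustrative example (assumed, not taken from the fuzz tests) of the kind of unreachable code in question:

  ```
  ; %dead is unreachable from entry and forms a single-block loop, so its
  ; instructions are not ordered by dominance in the usual way; the pass now
  ; simply skips such blocks instead of iterating over them.
  define void @unreachable_loop(i8* %p) {
  entry:
    ret void

  dead:
    store i8 0, i8* %p
    br label %dead
  }
  ```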
* [MemCpyOpt] Update to new API for memory intrinsic alignment  (Daniel Neilson, 2018-03-21; 2 files changed, -0/+42)

  Summary: This change is part of step five in the series of changes to remove the alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the MemCpyOpt pass to cease using:

  1) The old getAlignment() API of MemoryIntrinsic, in favour of getting source- and dest-specific alignments through the new API.
  2) The old IRBuilder CreateMemCpy/CreateMemMove single-alignment APIs, in favour of the new API that allows setting source and destination alignments independently.

  We also add a few tests to fill gaps in the testing of this pass.

  Steps:
  Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. (rL322965, rC322964, rL322963)
  Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. (rL323597)
  Step 3) Update Clang to use the new IRBuilder API. (rC323617)
  Step 4) Update Polly to use the new IRBuilder API. (rL323618)
  Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment() and [get|set]SourceAlignment() instead. (rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781, rL324784, rL324955, rL324960, rL325816, rL327398, rL327421)
  Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get|set]Alignment() methods.

  Reference:
  http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

  llvm-svn: 328097
* Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)  (Daniel Neilson, 2018-01-19; 22 files changed, -215/+210)

  Summary: This is a resurrection of work first proposed and discussed in August 2015 (http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html) and initially landed (but then backed out) in November 2015 (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html).

  The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments.

  In this change we:
  1) Remove the alignment argument.
  2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal.

  For example, code which used to read:

  ```
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
  ```

  will now read:

  ```
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
  ```

  Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns, so some manual checking and updating will be required:

  ```
  s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
  s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g
  ```

  The remaining changes in the series will:
  Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments.
  Step 3) Update Clang to use the new IRBuilder API.
  Step 4) Update Polly to use the new IRBuilder API.
  Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use MemIntrinsicInst::[get|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead.
  Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get|set]Alignment() methods.

  Reviewers: pete, hfinkel, lhames, reames, bollu
  Reviewed By: reames
  Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41675
  llvm-svn: 322965
* Revert "[memcpyopt] Teach memcpyopt to optimize across basic blocks"Reid Kleckner2017-12-284-243/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts r321138. It seems there are still underlying issues with memdep. PR35519 seems to still be present if debug info is enabled. We end up losing a memcpy. Somehow during store to memset merging, we insert the memset after the memcpy or fail to update the memdep analysis to account for the newly inserted memset of a pair. Reduced test case: #include <assert.h> #include <stdio.h> #include <string> #include <utility> #include <vector> void do_push_back( std::vector<std::pair<std::string, std::vector<std::string>>>* crls) { crls->push_back(std::make_pair(std::string(), std::vector<std::string>())); } int __attribute__((optnone)) main() { // Put some data in the vector and then remove it so we take the push_back // fast path. std::vector<std::pair<std::string, std::vector<std::string>>> crl_set; crl_set.push_back({"asdf", {}}); crl_set.pop_back(); printf("first word in vector storage: %p\n", *(void**)crl_set.data()); // Do the push_back which may fail to initialize the data. do_push_back(&crl_set); auto* first = &crl_set.back().first; printf("first word in vector storage (should be zero): %p\n", *(void**)crl_set.data()); assert(first->empty()); puts("ok"); } Compile with libc++, enable optimizations, and enable debug info: $ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib This program will assert with this change. llvm-svn: 321510
* [memcpyopt] Teach memcpyopt to optimize across basic blocks  (Dan Gohman, 2017-12-20; 4 files changed, -0/+243)

  This teaches memcpyopt to make a non-local memdep query when a local query indicates that the dependency is non-local. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.

  This is r319482 and r319483, along with fixes for PR35519: fix the optimization that merges stores into memsets to preserve cached memdep info, and fix memdep's non-local caching strategy to not assume that larger queries are always more conservative than smaller ones.

  Fixes PR28958 and PR35519.

  Differential Revision: https://reviews.llvm.org/D40802
  llvm-svn: 321138
* Revert r319482 and r319483 "[memcpyopt] Teach memcpyopt to optimize across basic blocks"  (Hans Wennborg, 2017-12-06; 2 files changed, -162/+0)

  This caused PR35519.

  > [memcpyopt] Teach memcpyopt to optimize across basic blocks
  >
  > This teaches memcpyopt to make a non-local memdep query when a local query
  > indicates that the dependency is non-local. This notably allows it to
  > eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
  >
  > Fixes PR28958.
  >
  > Differential Revision: https://reviews.llvm.org/D38374
  >
  > [memcpyopt] Commit file missed in r319482.
  >
  > This change was meant to be included with r319482 but was accidentally
  > omitted.

  llvm-svn: 319873
* [memcpyopt] Teach memcpyopt to optimize across basic blocks  (Dan Gohman, 2017-11-30; 2 files changed, -0/+162)

  This teaches memcpyopt to make a non-local memdep query when a local query indicates that the dependency is non-local. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.

  Fixes PR28958.

  Differential Revision: https://reviews.llvm.org/D38374
  llvm-svn: 319482
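  A hypothetical sketch of the cross-block pattern (identifiers assumed, not from the tests): the dependency of the second memcpy lives in a different basic block, so only a non-local memdep query can see that %t was filled from %src.

  ```
  define void @cross_block(i8* noalias %dst, i8* noalias %src, i1 %c) {
  entry:
    %tmp = alloca [32 x i8]
    %t = getelementptr [32 x i8], [32 x i8]* %tmp, i64 0, i64 0
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %t, i8* %src, i64 32, i1 false)
    br i1 %c, label %then, label %exit

  then:
    ; with a non-local query this can forward to: memcpy(%dst, %src, 32)
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %t, i64 32, i1 false)
    br label %exit

  exit:
    ret void
  }
  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)
  ```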
* Fix some misc. -enable-var-scope violations  (Matt Arsenault, 2017-11-13; 1 file changed, -6/+6)

  llvm-svn: 318006
* Add address space mangling to lifetime intrinsics  (Matt Arsenault, 2017-04-10; 4 files changed, -21/+21)

  In preparation for allowing allocas to have non-0 addrspace.

  llvm-svn: 299876
* [MemCpyOpt] Only replace memcpy with bitcast if address spaces match  (Matt Arsenault, 2017-04-10; 1 file changed, -0/+13)

  Patch by James Price

  llvm-svn: 299866
* MemCpyOptimizer: don't create new addrspace casts  (Fiona Glaser, 2017-03-14; 1 file changed, -0/+15)

  This isn't safe on all targets, and since we don't have a way to know it's safe, avoid doing it for now.

  llvm-svn: 297788
* [MemCpyOpt] Don't sink LoadInst below possible clobber.  (Bryant Wong, 2016-12-27; 1 file changed, -0/+36)

  Differential Revision: https://reviews.llvm.org/D26811
  llvm-svn: 290611
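  An assumed illustration of the hazard being avoided (types and names ours):

  ```
  %pair = type { i32, i32 }

  ; @clobber may modify *%src, so emitting a memcpy at the store position
  ; would sink the read of %src below the call and copy the wrong bytes;
  ; the pass must leave this pair alone (or perform the copy at the load).
  define void @no_sink(%pair* %src, %pair* %dst) {
    %v = load %pair, %pair* %src
    call void @clobber(%pair* %src)
    store %pair %v, %pair* %dst
    ret void
  }
  declare void @clobber(%pair*)
  ```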
* [MemCpyOpt] Don't emit IR in an unspecified order  (Benjamin Kramer, 2016-11-07; 1 file changed, -28/+28)

  Argument evaluation order is one of the edge cases where Clang differs from GCC, yielding different IR depending on which compiler LLVM was built with. Make the order deterministic and tune the test to actually verify the order instead of trying to hide it.

  llvm-svn: 286126
* [MemCpy] Add comments for r279769  (Tim Shen, 2016-08-25; 1 file changed, -0/+1)

  Differential Revision: https://reviews.llvm.org/D23846
  llvm-svn: 279778
* [MemCpy] Check for alias in performMemCpyToMemSetOptzn, instead of the identity of two operands  (Tim Shen, 2016-08-25; 1 file changed, -0/+38)

  Summary: This fixes PR29105. The reason is that lifetime markers create new pointers that alias the original ones, but before this patch aliasing was not checked in performMemCpyToMemSetOptzn.

  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D23846
  llvm-svn: 279769
* [AliasAnalysis] Treat invariant.start as read-memory  (Anna Thomas, 2016-08-09; 1 file changed, -0/+49)

  Summary: We teach alias analysis that invariant.start is readonly. This helps with GVN and memcpy optimizations that currently treat invariant.start as a clobber. We need to treat it as readonly so that DSE does not incorrectly remove stores prior to the invariant.start.

  Reviewers: sanjoy, reames, majnemer, dberlin
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D23214
  llvm-svn: 278138
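  A sketch under the patch's premise (example ours; it uses the later address-space-mangled intrinsic name, which this tree only gained in 2017):

  ```
  ; Treating invariant.start as read-only lets GVN forward the stored 1 to %v
  ; and lets DSE reason about the store; treating it as a clobber blocked both.
  define i8 @inv(i8* %p) {
    store i8 1, i8* %p
    %i = call {}* @llvm.invariant.start.p0i8(i64 1, i8* %p)
    %v = load i8, i8* %p
    ret i8 %v
  }
  declare {}* @llvm.invariant.start.p0i8(i64, i8*)
  ```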
* [PM] Port MemCpyOpt to the new PM.  (Sean Silva, 2016-06-14; 1 file changed, -0/+1)

  The need for all these Lookup* functions is just because of calls to getAnalysis inside methods (i.e. not at the top level) of the runOnFunction method. They should be straightforward to clean up when the old PM is gone.

  llvm-svn: 272615
* [MemCpyOpt] Do not exchange llvm.lifetime.start and llvm.memcpy  (Tim Shen, 2016-06-08; 1 file changed, -0/+25)

  Reviewers: iteratee
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D21087
  llvm-svn: 272192
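  A minimal assumed example of one way the reordering goes wrong (names ours; the intrinsic names use the later address-space-mangled form):

  ```
  ; Moving the memcpy above the lifetime.start that begins %t's lifetime would
  ; write to memory that is not yet alive, so the two calls must keep their
  ; order.
  define void @keep_order(i8* %src) {
    %buf = alloca [8 x i8]
    %t = getelementptr [8 x i8], [8 x i8]* %buf, i64 0, i64 0
    call void @llvm.lifetime.start.p0i8(i64 8, i8* %t)
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %t, i8* %src, i64 8, i1 false)
    call void @llvm.lifetime.end.p0i8(i64 8, i8* %t)
    ret void
  }
  declare void @llvm.lifetime.start.p0i8(i64, i8*)
  declare void @llvm.lifetime.end.p0i8(i64, i8*)
  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)
  ```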
* [MemCpyOpt] Don't perform callslot optimization across may-throw calls  (David Majnemer, 2016-05-26; 2 files changed, -1/+35)

  An exception could prevent a store from occurring, but MemCpyOpt's callslot optimization would fire anyway, causing the store to occur.

  This fixes PR27849.

  llvm-svn: 270892
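  An assumed sketch of the shape in question (example ours, not the PR27849 reproducer): call-slot optimization would make the store into %tmp write %dst directly, so if @may_throw unwinds, %dst is visibly modified even though the original program never reached the memcpy.

  ```
  %t = type { i32 }

  define void @callslot(%t* noalias %dst) {
    %tmp = alloca %t
    %f = getelementptr %t, %t* %tmp, i32 0, i32 0
    store i32 42, i32* %f
    call void @may_throw()   ; may unwind before the copy below executes
    %s = bitcast %t* %tmp to i8*
    %d = bitcast %t* %dst to i8*
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 4, i1 false)
    ret void
  }
  declare void @may_throw()
  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)
  ```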
* [MemCpyOpt] Use MaxIntSize in bytes instead of bits  (Jun Bum Lim, 2016-05-13; 1 file changed, -0/+20)

  Summary: This change fixes the bug in isProfitableToUseMemset() where MaxIntSize should be in bytes, not bits.

  Reviewers: arsenm, joker.eph, mcrosier
  Subscribers: mcrosier, llvm-commits
  Differential Revision: http://reviews.llvm.org/D20176
  llvm-svn: 269433
* Revert "MemCpyOpt: combine local load/store sequences into memcpy."Tim Northover2016-05-101-353/+0
| | | | | | | This reverts commit r269125. It was in my tree when I ran "git svn dcommit". It's really still under review. llvm-svn: 269127
* MemCpyOpt: combine local load/store sequences into memcpy.  (Tim Northover, 2016-05-10; 1 file changed, -0/+353)

  Sort of the BB-local equivalent to the idiom recognizer: if we have a basic block that really implements a memcpy operation, backends can benefit from seeing this.

  llvm-svn: 269125
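  A rough sketch (assumed, not from the tests) of the basic-block-local idiom: matching loads and stores at corresponding offsets amount to a memcpy.

  ```
  define void @implements_memcpy(i64* noalias %dst, i64* noalias %src) {
    %v0 = load i64, i64* %src
    store i64 %v0, i64* %dst
    %s1 = getelementptr i64, i64* %src, i64 1
    %v1 = load i64, i64* %s1
    %d1 = getelementptr i64, i64* %dst, i64 1
    store i64 %v1, i64* %d1
    ; combinable into a single 16-byte memcpy from %src to %dst
    ret void
  }
  ```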
* Improve load to store => memcpy  (Amaury Sechet, 2016-03-14; 1 file changed, -7/+29)

  Summary: This now tries to reorder instructions in order to help create the optimizable pattern.

  Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer
  Differential Revision: http://reviews.llvm.org/D16523
  llvm-svn: 263503
* Fix PR26051: Memcpy optimization should introduce a call to memcpy before the store destination position  (Mehdi Amini, 2016-01-06; 1 file changed, -0/+14)

  This is a conservative fix; I expect Amaury to relax this.

  Follow-up for r256923.

  llvm-svn: 256999
* Promote aggregate store to memset when possible  (Amaury Sechet, 2016-01-06; 1 file changed, -4/+6)

  Summary: As per the title. This will allow the optimizer to pick up on it.

  Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D15923
  llvm-svn: 256969
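  A minimal assumed example (ours): a zeroinitializer aggregate store is byte-wise uniform, so it can be rewritten as a memset.

  ```
  %s = type { i32, i32, i32, i32 }

  define void @zero(%s* %p) {
    store %s zeroinitializer, %s* %p
    ; => a memset of 16 zero bytes at %p (subject to padding rules)
    ret void
  }
  ```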
* Improve load/store to memcpy for aggregate  (Amaury Sechet, 2016-01-06; 1 file changed, -0/+25)

  Summary: It turns out that if we don't try to do it at the store location, we can do it before any operation that aliases the load, as long as no operation aliases the store.

  Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D15903
  llvm-svn: 256923
* Implement load to store => memcpy in MemCpyOpt for aggregates  (Amaury Sechet, 2016-01-05; 1 file changed, -0/+47)

  Summary: Most of the toolchain is able to optimize scalar and memcpy-like operations efficiently, but it isn't as good with aggregates. In order to improve support for aggregates, we try to change aggregate manipulation into either scalar or memcpy-like operations whenever possible without losing information. This is one such opportunity.

  Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D15894
  llvm-svn: 256868
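  A sketch of the pattern (example ours, not from the tests): an aggregate load feeding a single store is a copy of bytes, expressible as a memcpy.

  ```
  %s = type { i64, i64 }

  define void @copy(%s* noalias %a, %s* noalias %b) {
    %v = load %s, %s* %a
    store %s %v, %s* %b
    ; => call void @llvm.memcpy.p0i8.p0i8.i64(<%b as i8*>, <%a as i8*>, i64 16, i1 false)
    ret void
  }
  ```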
* Revert "Change memcpy/memset/memmove to have dest and source alignments."Pete Cooper2015-11-1917-182/+182
| | | | | | | | | | This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543
* Change memcpy/memset/memmove to have dest and source alignments.  (Pete Cooper, 2015-11-18; 17 files changed, -182/+182)

  Note: this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

  These intrinsics currently have an explicit alignment argument, which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments; the alignment argument itself is removed.

  There are a few places in the code which need to be checked by an expert as to whether using only src/dest alignment is safe. For those places, the code currently takes the minimum of the src/dest alignments, which matches the previous behaviour.

  For example, code which used to read:

  ```
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
  ```

  will now read:

  ```
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
  ```

  For out-of-tree owners, I was able to strip alignment from calls using sed by replacing:

  ```
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
  ```

  with:

  ```
  $1i1 false)
  ```

  and similarly for memmove and memcpy. I then added alignment back in to test cases which needed it.

  A similar commit will be made to clang, which actually has many differences in alignment, as IRBuilder can now generate different source/dest alignments on calls.

  In IRBuilder itself, a new argument was added. Instead of calling:

  ```
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
  ```

  you now call:

  ```
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
  ```

  There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile from being passed as the source alignment via its default parameter.

  Note: changes can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences.

  Reviewed by Hal Finkel.

  llvm-svn: 253511
* Sort the enums in Attributes.h in case insensitive alphabetical order.  (Akira Hatanaka, 2015-11-11; 1 file changed, -1/+1)

  Sort the enums in preparation for moving the attributes to a table-gen file.

  rdar://problem/19836465

  llvm-svn: 252692
* [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.  (Andrea Di Biagio, 2015-10-09; 1 file changed, -0/+49)

  Pass MemCpyOpt doesn't check whether a store instruction is nontemporal. As a consequence, adjacent nontemporal stores are always merged into a memset call.

  Example:

  ```
  define void @foo(<4 x float>* nocapture %p) {
  entry:
    store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
    %p1 = getelementptr inbounds <4 x float>, <4 x float>* %p, i64 1
    store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
    ret void
  }

  !0 = !{i32 1}
  ```

  In this example, the two nontemporal stores are combined into a memset of zero, which does not preserve the nontemporal hint. Later on, the backend (tested on an x86-64 Core i7) expands that memset call into a sequence of two normal 16-byte-aligned vector stores:

  ```
  opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -
  ```

  Before:

  ```
  xorps  %xmm0, %xmm0
  movaps %xmm0, 16(%rdi)
  movaps %xmm0, (%rdi)
  ```

  With this patch, we no longer merge nontemporal stores into calls to memset. In this example, llc correctly expands the two stores into two movntps:

  ```
  xorps   %xmm0, %xmm0
  movntps %xmm0, 16(%rdi)
  movntps %xmm0, (%rdi)
  ```

  In theory, we could extend the usage of !nontemporal metadata to memcpy/memset calls. However, a change like that would only have the effect of forcing the backend to expand !nontemporal memsets back to sequences of store instructions. A memset library call would not have exactly the same semantics as a builtin !nontemporal memset call, so SelectionDAG would have to conservatively expand it back to a sequence of !nontemporal stores (effectively undoing the merging).

  Differential Revision: http://reviews.llvm.org/D13519
  llvm-svn: 249820
* Emit argmemonly attribute for intrinsics.  (Igor Laevsky, 2015-08-13; 1 file changed, -2/+3)

  Differential Revision: http://reviews.llvm.org/D11352
  llvm-svn: 244920