summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "[DebugInfo] Recover debug intrinsics when killing duplicated/empty ↵Tozer2019-12-042-64/+22
| | | | | | | | basic blocks" This reverts commit 72ce759928e6dfee6a9efa310b966c19722352ba. Reverted due to build failure.
* Revert "[Coverage] Revise format to reduce binary size"Vedant Kumar2019-12-041-1/+5
| | | | | | | | | | This reverts commit e18531595bba495946aa52c0a16b9f9238cff8bc. On Windows, there is an error: http://lab.llvm.org:8011/builders/sanitizer-windows/builds/54963/steps/stage%201%20check/logs/stdio error: C:\b\slave\sanitizer-windows\build\stage1\projects\compiler-rt\test\profile\Profile-x86_64\Output\instrprof-merging.cpp.tmp.v1.o: Failed to load coverage: Malformed coverage data
* [Coverage] Revise format to reduce binary sizeVedant Kumar2019-12-041-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revise the coverage mapping format to reduce binary size by: 1. Naming function records and marking them `linkonce_odr`, and 2. Compressing filenames. This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB) and speeds up end-to-end single-threaded report generation by 10%. For reference the compressed name data in llc is 81MB (__llvm_prf_names). Rationale for changes to the format: - With the current format, most coverage function records are discarded. E.g., more than 97% of the records in llc are *duplicate* placeholders for functions visible-but-not-used in TUs. Placeholders *are* used to show under-covered functions, but duplicate placeholders waste space. - We reached general consensus about giving (1) a try at the 2017 code coverage BoF [1]. The thinking was that using `linkonce_odr` to merge duplicates is simpler than alternatives like teaching build systems about a coverage-aware database/module/etc on the side. - Revising the format is expensive due to the backwards compatibility requirement, so we might as well compress filenames while we're at it. This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB). See CoverageMappingFormat.rst for the details on what exactly has changed. Fixes PR34533 [2], hopefully. [1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html [2] https://bugs.llvm.org/show_bug.cgi?id=34533 Differential Revision: https://reviews.llvm.org/D69471
* [LoopInterchange] Improve inner exit loop safety checks.Florian Hahn2019-12-041-33/+36
| | | | | | | | | | | | | | | | | | | | The PHI node checks for inner loop exits are too permissive currently. As indicated by an existing comment, we should only allow LCSSA PHI nodes that are part of reductions or are only used outside of the loop nest. We ensure this by checking the users of the LCSSA PHIs. Specifically, it is not safe to use an exiting value from the inner loop in the latch of the outer loop. It also moves the inner loop exit check before the outer loop exit check. Fixes PR43473. Reviewers: efriedma, mcrosier Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D68144
* [llvm][Transform] Remove unused variable. [NFCI]Francesco Petrogalli2019-12-041-1/+1
| | | | The variable prevents compiling when using -Werror=unused-variable.
* [PGO][PGSO] Distinguish queries from unit tests and explicitly enable for ↵Hiroshi Yamauchi2019-12-041-0/+5
| | | | | | | | | | | | | | | | the existing IR passes only. NFC. Summary: This is one more prep step necessary before the code gen pass instrumentation code could go in. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70988
* [DebugInfo] Recover debug intrinsics when killing duplicated/empty basic blocksstozer2019-12-042-22/+64
| | | | | | | | | | When basic blocks are killed, either due to being empty or to being an if.then or if.else block whose complement contains identical instructions, some of the debug intrinsics in that block are lost. This patch sinks those intrinsics into the single successor block, setting them Undef if necessary to prevent debug info from falling out-of-date. Differential Revision: https://reviews.llvm.org/D70318
* [SimpleLoopUnswitch] Invalidate the topmost loop with ExitBB as exiting.Florian Hahn2019-12-041-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | SCEV caches the exiting blocks when computing exit counts. In SimpleLoopUnswitch, we split the exit block of the loop to unswitch. Currently we only invalidate the loop containing that exit block, but if that block is the exiting block for a parent loop, we have stale cache entries. We have to invalidate the top-most loop that contains the exit block as exiting block. We might also be able to skip invalidating the loop containing the exit block, if the exit block is not an exiting block of that loop. There are also 2 more places in SimpleLoopUnswitch, that use a similar problematic approach to get the loop to invalidate. If the patch makes sense, I will also update those places to a similar approach (they deal with multiple exit blocks, so we cannot directly re-use getTopMostExitingLoop). Fixes PR43972. Reviewers: skatkov, reames, asbirlea, chandlerc Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D70786
* [APFloat] Prevent construction of APFloat with Semantics and FP valueEhud Katz2019-12-042-2/+2
| | | | | | | | | | | | | | | | | Constructor invocations such as `APFloat(APFloat::IEEEdouble(), 0.0)` may seem like they accept a FP (floating point) value, but the overload they reach is actually the `integerPart` one, not a `float` or `double` overload (which only exists when `fltSemantics` isn't passed). This may lead to possible loss of data, by the conversion from `float` or `double` to `integerPart`. To prevent future mistakes, a new constructor overload, which accepts any FP value and marked with `delete`, to prevent its usage. Fixes PR34095. Differential Revision: https://reviews.llvm.org/D70425
* [InstCombine] Revert aafde063aaf09285c701c80cd4b543c2beb523e8 and ↵Craig Topper2019-12-032-10/+2
| | | | | | | | | | | | | | | | 6749dc3446671df05235d0a218c426a314ac33cd related to bitcast handling of x86_mmx This reverts these two commits [InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double. [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement We're seeing at least one internal test failure related to a bitcast that was previously before an inline assembly block containing emms being placed after it. This leads to the mmx state ending up not empty after the emms. IR has no way to make any specific guarantees about this. Reverting these patches to get back to previous behavior which at least worked for this test.
* [LV] Scalar with predication must not be uniformAyal Zaks2019-12-031-17/+22
| | | | | | | | | | | | | | Fix PR40816: avoid considering scalar-with-predication instructions as also uniform-after-vectorization. Instructions identified as "scalar with predication" will be "vectorized" using a replicating region. If such instructions are also optimized as "uniform after vectorization", namely when only the first of VF lanes is used, such a replicating region becomes erroneous - only the first instance of the region can and should be formed. Fix such cases by not considering such instructions as "uniform after vectorization". Differential Revision: https://reviews.llvm.org/D70298
* [SLP] Enhance SLPVectorizer to vectorize different combinations of aggregatesAnton Afanasyev2019-12-031-62/+56
| | | | | | | | | | | | | | | | | | | | Summary: Make SLPVectorize to recognize homogeneous aggregates like `{<2 x float>, <2 x float>}`, `{{float, float}, {float, float}}`, `[2 x {float, float}]` and so on. It's a follow-up of https://reviews.llvm.org/D70068. Merged `findBuildVector()` and `findBuildAggregate()` to one `findBuildAggregate()` function making it recursive to recognize multidimensional aggregates. Aggregates required to be homogeneous. Reviewers: RKSimon, ABataev, dtemirbulatov, spatel, vporpo Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70587
* [VPlan] Add dump function to VPlan class.Florian Hahn2019-12-032-8/+14
| | | | | | | | | | | | | | This adds a dump() function to VPlan, which uses the existing operator<<. This method provides a convenient way to dump a VPlan while debugging, e.g. from lldb. Reviewers: hsaito, Ayal, gilr, rengolin Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D70920
* [asan] Remove debug locations from alloca prologue instrumentationJohannes Altmanninger2019-12-031-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This fixes https://llvm.org/PR26673 "Wrong debugging information with -fsanitize=address" where asan instrumentation causes the prologue end to be computed incorrectly: findPrologueEndLoc, looks for the first instruction with a debug location to determine the prologue end. Since the asan instrumentation instructions had debug locations, that prologue end was at some instruction, where the stack frame is still being set up. There seems to be no good reason for extra debug locations for the asan instrumentations that set up the frame; they don't have a natural source location. In the debugger they are simply located at the start of the function. For certain other instrumentations like -fsanitize-coverage=trace-pc-guard the same problem persists - that might be more work to fix, since it looks like they rely on locations of the tracee functions. This partly reverts aaf4bb239487e0a3b20a8eaf94fc641235ba2c29 "[asan] Set debug location in ASan function prologue" whose motivation was to give debug location info to the coverage callback. Its test only ensures that the call to @__sanitizer_cov_trace_pc_guard is given the correct source location; as the debug location is still set in ModuleSanitizerCoverage::InjectCoverageAtBlock, the test does not break. So -fsanitize-coverage is hopefully unaffected - I don't think it should rely on the debug locations of asan-generated allocas. Related revision: 3c6c14d14b40adfb581940859ede1ac7d8ceae7a "ASAN: Provide reliable debug info for local variables at -O0." Below is how the X86 assembly version of the added test case changes. We get rid of some .loc lines and put prologue_end where the user code starts. ```diff --- 2.master.s 2019-12-02 12:32:38.982959053 +0100 +++ 2.patch.s 2019-12-02 12:32:41.106246674 +0100 @@ -45,8 +45,6 @@ .cfi_offset %rbx, -24 xorl %eax, %eax movl %eax, %ecx - .Ltmp2: - .loc 1 3 0 prologue_end # 2.c:3:0 cmpl $0, __asan_option_detect_stack_use_after_return movl %edi, 92(%rbx) # 4-byte Spill movq %rsi, 80(%rbx) # 8-byte Spill @@ -57,9 +55,7 @@ callq __asan_stack_malloc_0 movq %rax, 72(%rbx) # 8-byte Spill .LBB1_2: - .loc 1 0 0 is_stmt 0 # 2.c:0:0 movq 72(%rbx), %rax # 8-byte Reload - .loc 1 3 0 # 2.c:3:0 cmpq $0, %rax movq %rax, %rcx movq %rax, 64(%rbx) # 8-byte Spill @@ -72,9 +68,7 @@ movq %rax, %rsp movq %rax, 56(%rbx) # 8-byte Spill .LBB1_4: - .loc 1 0 0 # 2.c:0:0 movq 56(%rbx), %rax # 8-byte Reload - .loc 1 3 0 # 2.c:3:0 movq %rax, 120(%rbx) movq %rax, %rcx addq $32, %rcx @@ -99,7 +93,6 @@ movb %r8b, 31(%rbx) # 1-byte Spill je .LBB1_7 # %bb.5: - .loc 1 0 0 # 2.c:0:0 movq 40(%rbx), %rax # 8-byte Reload andq $7, %rax addq $3, %rax @@ -118,7 +111,8 @@ movl %ecx, (%rax) movq 80(%rbx), %rdx # 8-byte Reload movq %rdx, 128(%rbx) - .loc 1 4 3 is_stmt 1 # 2.c:4:3 +.Ltmp2: + .loc 1 4 3 prologue_end # 2.c:4:3 movq %rax, %rdi callq f movq 48(%rbx), %rax # 8-byte Reload ``` Reviewers: eugenis, aprantl Reviewed By: eugenis Subscribers: ormris, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70894
* Place the "cold" code piece into the same section as the original functionBill Wendling2019-12-021-0/+3
| | | | | | | | | | | | | | | | Summary: This cropped up in the Linux kernel where cold code was placed in an incompatible section. Reviewers: compnerd, vsk, tejohnson Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70925
* [PGO][PGSO] Add an optional query type parameter to shouldOptimizeForSize.Hiroshi Yamauchi2019-12-026-9/+18
| | | | | | | | | | | | | | Summary: In case of a need to distinguish different query sites for gradual commit or debugging of PGSO. NFC. Reviewers: davidxl Subscribers: hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70510
* [VPlan] Move graph traits (NFC).Florian Hahn2019-12-021-121/+122
| | | | | | | | | | | By defining the graph traits right after the VPBlockBase definitions, we can make use of them earlier in the file. Reviewers: hsaito, Ayal, gilr Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D70733
* [InstCombine] fix undef propagation for vector urem transform (PR44186)Sanjay Patel2019-12-021-2/+4
| | | | | | | | | As described here: https://bugs.llvm.org/show_bug.cgi?id=44186 The match() code safely allows undef values, but we can't safely propagate a vector constant that contains an undef to the new compare instruction.
* [ARM,MVE] Add an InstCombine rule permitting VPNOT.Simon Tatham2019-12-021-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | Summary: If a user writing C code using the ACLE MVE intrinsics generates a predicate and then complements it, then the resulting IR will use the `pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit integer; complement that integer; and convert back. This will generate machine code that moves the predicate out of the `P0` register, complements it in an integer GPR, and moves it back in again. This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement of the original predicate vector, which we can already instruction- select as the VPNOT instruction which complements P0 in place. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70484
* [InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() ↵Roman Lebedev2019-12-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (PR44100) rL341831 moved one-use check higher up, restricting a few folds that produced a single instruction from two instructions to the case where the inner instruction would go away. Original commit message: > InstCombine: move hasOneUse check to the top of foldICmpAddConstant > > There were two combines not covered by the check before now, > neither of which actually differed from normal in the benefit analysis. > > The most recent seems to be because it was just added at the top of the > function (naturally). The older is from way back in 2008 (r46687) > when we just didn't put those checks in so routinely, and has been > diligently maintained since. From the commit message alone, there doesn't seem to be a deeper motivation, deeper problem that was trying to solve, other than 'fixing the wrong one-use check'. As i have briefly discusses in IRC with Tim, the original motivation can no longer be recovered, too much time has passed. However i believe that the original fold was doing the right thing, we should be performing such a transformation even if the inner `add` will not go away - that will still unchain the comparison from `add`, it will no longer need to wait for `add` to compute. Doing so doesn't seem to break any particular idioms, as least as far as i can see. References https://bugs.llvm.org/show_bug.cgi?id=44100
* [InstCombine] fold copysign with constant sign argument to (fneg+)fabsSanjay Patel2019-12-021-0/+15
| | | | | | | | | | | | | | | If the sign of the sign argument is known (this could be extended to use ValueTracking), then we can use fneg+fabs to clear/set the sign bit of the magnitude argument. http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic This transform is already done in DAGCombiner, but we can do it sooner in IR as suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153 We have effectively no analysis for copysign in IR, so we are taking the unusual step of increasing the number of IR instructions for the negative constant case. Differential Revision: https://reviews.llvm.org/D70792
* [InstCombine] Fix big-endian miscompile of (bitcast (zext/trunc (bitcast)))Bjorn Pettersson2019-12-021-22/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: optimizeVectorResize is rewriting patterns like: %1 = bitcast vector %src to integer %2 = trunc/zext %1 %dst = bitcast %2 to vector Since bitcasting between integer an vector types gives different integer values depending on endianness, we need to take endianness into account. As it happens the old implementation only produced the correct result for little endian targets. Fixes: https://bugs.llvm.org/show_bug.cgi?id=44178 Reviewers: spatel, lattner, lebedev.ri Reviewed By: spatel, lebedev.ri Subscribers: lebedev.ri, hiraditya, uabelho, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70844
* [InstCombine] Expand usub_sat patterns to handle constantsDavid Green2019-11-301-3/+11
| | | | | | | | The constants come through as add %x, -C, not a sub as would be expected. They need some extra matchers to canonicalise them towards usub_sat. Differential Revision: https://reviews.llvm.org/D69514
* [InstCombine] Adjust usub_sat fold one use checksDavid Green2019-11-301-3/+3
| | | | | | This adjusts the one use checks in the the usub_sat fold code to not increase instruction count, but otherwise do the fold. Reviewed as a part of D69514.
* [Attributor] Deduce dereferenceable based on accessed bytes mapHideto Ueno2019-11-291-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch introduces the deduction based on load/store instructions whose pointer operand is a non-inbounds GEP instruction. For example if we have, ``` void f(int *u){ u[0] = 0; u[1] = 1; u[2] = 2; } ``` then u must be dereferenceable(12). This patch is inspired by D64258 Reviewers: jdoerfert, spatel, hfinkel, RKSimon, sstefan1, xbolva00, dtemirbulatov Reviewed By: jdoerfert Subscribers: jfb, lebedev.ri, xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70714
* [Attributor] Remove dereferenceable_or_null when nonull is presentHideto Ueno2019-11-291-0/+10
| | | | | | | | | | | | | | Summary: This patch prevents the simultaneous presence of `dereferenceable` and `dereferenceable_or_null` attribute Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70789
* Revert "[Attributor] Move pass after InstCombine to futher eliminate null ↵Dávid Bolvanský2019-11-271-2/+3
| | | | | | pointer checks" This reverts commit 7ca7d62c6ea1680ec0a1861083669596547fdd6f. Commited accidentally.
* [Attributor] Move pass after InstCombine to futher eliminate null pointer checksDávid Bolvanský2019-11-271-3/+2
| | | | | | | | | | | | Summary: PR44149 Reviewers: jdoerfert Subscribers: mehdi_amini, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70737
* [Attributor] Handle special case when offset equals zero in nonnull deductionHideto Ueno2019-11-271-6/+18
|
* Revert "Revert "As a follow-up to my initial mail to llvm-dev here's a first ↵Eric Christopher2019-11-261-16/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | pass at the O1 described there."" This reapplies: 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4 Original commit message: As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410 This reverts commit c9ddb02659e3ece7a0d9d6b4dac7ceea4ae46e6d.
* [InstCombine] Fixed std::min on some bots. NFCIDávid Bolvanský2019-11-261-1/+1
|
* [InstCombine] Optimize some memccpy calls to memcpy/nullDávid Bolvanský2019-11-261-0/+41
| | | | | | | | | | | | | | | | | Summary: return memccpy(d, "helloworld", 'r', 20) => return memcpy(d, "helloworld", 8 /* pos of 'r' in string */), d + 8 Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68089
* [Attributor] Track a GEP Instruction in align deductionHideto Ueno2019-11-261-12/+37
| | | | | | | | | | | | | | | | | | | | Summary: This patch enables us to track GEP instruction in align deduction. If a pointer `B` is defined as `A+Offset` and known to have alignment `C`, there exists some integer Q such that ``` A + Offset = C * Q = B ``` So we can say that the maximum power of two which is a divisor of gcd(Offset, C) is an alignment. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70392
* Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at ↵Muhammad Omair Javaid2019-11-261-30/+16
| | | | | | | | | | | | | | | | | | | | | | | the O1 described there." This reverts commit 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4. This commit introduced 9 new failures on lldb buildbot host at http://lab.llvm.org:8014/builders/lldb-aarch64-ubuntu Following tests were failing: lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq1/TestAmbiguousTailCallSeq1.py lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py lldb-api :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py lldb-api :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py lldb-api :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py lldb-api :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py lldb-api :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py lldb-api :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py lldb-api :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py lldb-api :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410
* As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 ↵Eric Christopher2019-11-251-16/+30
| | | | | | | | | | | | | | | | | | | | | described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410
* [InstCombine] remove shuffle mask canonicalization that creates undef elementsSanjay Patel2019-11-251-10/+10
| | | | | | This is NFC-intended because SimplifyDemandedVectorElts() does the same transform later. As discussed in D70641, we may want to change that behavior, so we need to isolate where it happens.
* [NFC][LoopFusion] Use isControlFlowEquivalent() from CodeMoverUtils.Whitney Tsang2019-11-252-14/+14
| | | | | | | | Reviewer: kbarton, jdoerfert, Meinersbur, bmahjour, etiotto Reviewed By: Meinersbur Subscribers: hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D70619
* [InstCombine] prevent infinite loop from conflicting shuffle mask transformsSanjay Patel2019-11-251-2/+4
| | | | | | | | | The pattern in question is currently not possible because we aggressively (wrongly) transform mask elements to undef values if they choose from an undef operand. That, however, would change if we tighten our semantics for shuffles as discussed in D70641. Adding this check gives us the flexibility to make that change with minimal overhead for current definitions.
* [InstCombine] simplify code for shuffle mask canonicalization; NFCSanjay Patel2019-11-251-6/+4
| | | | We never use the local 'Mask' before returning, so that was dead code.
* [InstCombine] remove dead code from shuffle mask canonicalization; NFCSanjay Patel2019-11-251-2/+2
|
* [InstCombine] simplify loop for shuffle mask canonicalization; NFCSanjay Patel2019-11-251-4/+4
|
* [DebugInfo@O2][Utils] Undef instead of delete dbg.values in helper funcOCHyams2019-11-251-15/+7
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Related bug: https://bugs.llvm.org/show_bug.cgi?id=40648 Static helper function rewriteDebugUsers in Local.cpp deletes dbg.value intrinsics when it cannot move or rewrite them, or salvage the deleted instruction's value. It should instead undef them in this case. This patch fixes that and I've added a test which covers the failing test case in bz40648. I've updated the unit test Local.ReplaceAllDbgUsesWith to check for this behaviour (and fixed a typo in the test which would cause the old test to always pass). Reviewers: aprantl, vsk, djtodoro, probinson Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D70604
* Recommit f0c2a5a "[LV] Generalize conditions for sinking instrs for first ↵Florian Hahn2019-11-241-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | order recurrences." This version contains 2 fixes for reported issues: 1. Make sure we do not try to sink terminator instructions. 2. Make sure we bail out, if we try to sink an instruction that needs to stay in place for another recurrence. Original message: If the recurrence PHI node has a single user, we can sink any instruction without side effects, given that all users are dominated by the instruction computing the incoming value of the next iteration ('Previous'). We can sink instructions that may cause traps, because that only causes the trap to occur later, but not on any new paths. With the relaxed check, we also have to make sure that we do not have a direct cycle (meaning PHI user == 'Previous), which indicates a reduction relation, which potentially gets missed by ReductionDescriptor. As follow-ups, we can also sink stores, iff they do not alias with other instructions we move them across and we could also support sinking chains of instructions and multiple users of the PHI. Fixes PR43398. Reviewers: hsaito, dcaballe, Ayal, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D69228
* [LoopInterchange] Adjust assertions when updating successors.Florian Hahn2019-11-241-19/+35
| | | | | | | | | | | Currently the assertion in updateSuccessor is overly strict in some cases and overly relaxed in other cases. For branches to the inner and outer loop preheader it is too strict, because they can either be unconditional branches or conditional branches with duplicate targets. Both cases are fine and we can allow updating multiple successors. On the other hand, we have to at least update one successor. This patch adds such an assertion.
* [InstCombine] remove identity shuffle simplification for mask with undefsSanjay Patel2019-11-242-24/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And simultaneously enhance SimplifyDemandedVectorElts() to rcognize that pattern. That preserves some of the old optimizations in IR. Given a shuffle that includes undef elements in an otherwise identity mask like: define <4 x float> @shuffle(<4 x float> %arg) { %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3> ret <4 x float> %shuf } We were simplifying that to the input operand. But as discussed in PR43958: https://bugs.llvm.org/show_bug.cgi?id=43958 ...that means that per-vector-element poison that would be stopped by the shuffle can now leak to the result. Also note that we still have (and there are tests for) the same transform with no undef elements in the mask (a fully-defined identity mask). I don't think there's any controversy about that case - it's a valid transform under any interpretation of shufflevector/undef/poison. Looking at a few of the diffs into codegen, I don't see any difference in final asm. So depending on your perspective, that's good (no real loss of optimization power) or bad (poison exists in the DAG, so we only partially fixed the bug). Differential Revision: https://reviews.llvm.org/D70246
* [InstCombine] Fix call guard difference with dbgDavide Italiano2019-11-221-4/+4
| | | | | | Patch by Chris Ye! Differential Revision: https://reviews.llvm.org/D68004
* [CodeMoverUtils] Added an API to check if an instruction can be safelyTsang Whitney W.H2019-11-222-0/+169
| | | | | | | | | | | | | | | | | | | moved before another instruction. Summary:Added an API to check if an instruction can be safely moved before another instruction. In future PRs, we will like to add support of moving instructions between blocks that are not control flow equivalent, and add other APIs to enhance usability, e.g. moving basic blocks, moving list of instructions... Loop Fusion will be its first user. When there is intervening code in between two loops, fusion is currently unable to fuse them. Loop Fusion can use this utility to check if the intervening code can be safely moved before or after the two loops, and move them, then it can successfully fuse them. Reviewer:kbarton,jdoerfert,Meinersbur,bmahjour,etiotto Reviewed By:bmahjour Subscribers:mgorny,hiraditya,llvm-commits Tag:LLVM Differential Revision:https://reviews.llvm.org/D70049
* [SLP] Enhance SLPVectorizer to vectorize vector aggregateAnton Afanasyev2019-11-221-6/+27
| | | | | | | | | | | | | | | | | Summary: Vector aggregate is homogeneous aggregate of vectors like `{ <2 x float>, <2 x float> }`. This patch allows `findBuildAggregate()` to consider vector aggregates as well as scalar ones. For instance, `{ <2 x float>, <2 x float> }` maps to `<4 x float>`. Fixes vector part of llvm.org/PR42022 Reviewers: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70068
* [JumpThreading] NFC: Don't cache F.hasProfileData()Kazu Hirata2019-11-221-3/+2
| | | | | | | | | | | | | | | | | | | | | | Summary: With this patch, we no longer cache F.hasProfileData(). We simply call the function again. I'm doing this because: - JumpThreadingPass also has a member variable named HasProfileData, which is very confusing, - the function is very lightweight, and - this patch makes JumpThreading::runOnFunction more consistent with JumpThreadingPass::run. Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70602
* [JumpThreading] Use profile data even with the new pass managerKazu Hirata2019-11-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Without this patch, the jump threading pass ignores profiling data whenever we invoke the pass with the new pass manager. Specifically, JumpThreadingPass::run calls runImpl with class variable HasProfileData always set to false. In turn, runImpl sets HasProfileData to false again: HasProfileData = HasProfileData_; In the end, we don't use profiling data at all with the new pass manager. This patch fixes the problem by passing F.hasProfileData() to runImpl. The bug appears to have been introduced at: https://reviews.llvm.org/D41461 which removed local variable HasProfileData in JumpThreadingPass::run even though there was one more use left in the same function. As a result, the remaining use ended referring to the class variable instead. Note that F.hasProfileData is an extremely lightweight function, so I don't see the need to cache its result. Once this patch is approved, I'm planning to stop caching the result of F.hasProfileData in runOnFunction. Reviewers: wmi, eli.friedman Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70509
OpenPOWER on IntegriCloud