| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
basic blocks"
This reverts commit 72ce759928e6dfee6a9efa310b966c19722352ba.
Reverted due to build failure.
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit e18531595bba495946aa52c0a16b9f9238cff8bc.
On Windows, there is an error:
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/54963/steps/stage%201%20check/logs/stdio
error: C:\b\slave\sanitizer-windows\build\stage1\projects\compiler-rt\test\profile\Profile-x86_64\Output\instrprof-merging.cpp.tmp.v1.o: Failed to load coverage: Malformed coverage data
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The PHI node checks for inner loop exits are too permissive currently.
As indicated by an existing comment, we should only allow LCSSA PHI
nodes that are part of reductions or are only used outside of the loop
nest. We ensure this by checking the users of the LCSSA PHIs.
Specifically, it is not safe to use an exiting value from the inner loop in the latch of the outer
loop.
It also moves the inner loop exit check before the outer loop exit
check.
Fixes PR43473.
Reviewers: efriedma, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D68144
|
|
|
|
| |
The variable prevents compiling when using -Werror=unused-variable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the existing IR passes only. NFC.
Summary:
This is one more prep step necessary before the code gen pass instrumentation
code could go in.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70988
|
|
|
|
|
|
|
|
|
|
| |
When basic blocks are killed, either due to being empty or to being an if.then
or if.else block whose complement contains identical instructions, some of the
debug intrinsics in that block are lost. This patch sinks those intrinsics
into the single successor block, setting them Undef if necessary to
prevent debug info from falling out-of-date.
Differential Revision: https://reviews.llvm.org/D70318
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SCEV caches the exiting blocks when computing exit counts. In
SimpleLoopUnswitch, we split the exit block of the loop to unswitch.
Currently we only invalidate the loop containing that exit block, but if
that block is the exiting block for a parent loop, we have stale cache
entries. We have to invalidate the top-most loop that contains the exit
block as exiting block. We might also be able to skip invalidating the
loop containing the exit block, if the exit block is not an exiting
block of that loop.
There are also 2 more places in SimpleLoopUnswitch, that use a similar
problematic approach to get the loop to invalidate. If the patch makes
sense, I will also update those places to a similar approach (they deal
with multiple exit blocks, so we cannot directly re-use
getTopMostExitingLoop).
Fixes PR43972.
Reviewers: skatkov, reames, asbirlea, chandlerc
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D70786
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Constructor invocations such as `APFloat(APFloat::IEEEdouble(), 0.0)`
may seem like they accept a FP (floating point) value, but the overload
they reach is actually the `integerPart` one, not a `float` or `double`
overload (which only exists when `fltSemantics` isn't passed).
This may lead to possible loss of data, by the conversion from `float`
or `double` to `integerPart`.
To prevent future mistakes, a new constructor overload, which accepts
any FP value and marked with `delete`, to prevent its usage.
Fixes PR34095.
Differential Revision: https://reviews.llvm.org/D70425
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
6749dc3446671df05235d0a218c426a314ac33cd related to bitcast handling of x86_mmx
This reverts these two commits
[InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double.
[InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement
We're seeing at least one internal test failure related to a
bitcast that was previously before an inline assembly block
containing emms being placed after it. This leads to the mmx
state ending up not empty after the emms. IR has no way to
make any specific guarantees about this. Reverting these patches
to get back to previous behavior which at least worked for this
test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix PR40816: avoid considering scalar-with-predication instructions as also
uniform-after-vectorization.
Instructions identified as "scalar with predication" will be "vectorized" using
a replicating region. If such instructions are also optimized as "uniform after
vectorization", namely when only the first of VF lanes is used, such a
replicating region becomes erroneous - only the first instance of the region can
and should be formed. Fix such cases by not considering such instructions as
"uniform after vectorization".
Differential Revision: https://reviews.llvm.org/D70298
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Make SLPVectorize to recognize homogeneous aggregates like
`{<2 x float>, <2 x float>}`, `{{float, float}, {float, float}}`,
`[2 x {float, float}]` and so on.
It's a follow-up of https://reviews.llvm.org/D70068.
Merged `findBuildVector()` and `findBuildAggregate()` to
one `findBuildAggregate()` function making it recursive
to recognize multidimensional aggregates. Aggregates required
to be homogeneous.
Reviewers: RKSimon, ABataev, dtemirbulatov, spatel, vporpo
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70587
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a dump() function to VPlan, which uses the existing
operator<<.
This method provides a convenient way to dump a VPlan while debugging,
e.g. from lldb.
Reviewers: hsaito, Ayal, gilr, rengolin
Reviewed By: hsaito
Differential Revision: https://reviews.llvm.org/D70920
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes https://llvm.org/PR26673
"Wrong debugging information with -fsanitize=address"
where asan instrumentation causes the prologue end to be computed
incorrectly: findPrologueEndLoc, looks for the first instruction
with a debug location to determine the prologue end. Since the asan
instrumentation instructions had debug locations, that prologue end was
at some instruction, where the stack frame is still being set up.
There seems to be no good reason for extra debug locations for the
asan instrumentations that set up the frame; they don't have a natural
source location. In the debugger they are simply located at the start
of the function.
For certain other instrumentations like -fsanitize-coverage=trace-pc-guard
the same problem persists - that might be more work to fix, since it
looks like they rely on locations of the tracee functions.
This partly reverts aaf4bb239487e0a3b20a8eaf94fc641235ba2c29
"[asan] Set debug location in ASan function prologue"
whose motivation was to give debug location info to the coverage callback.
Its test only ensures that the call to @__sanitizer_cov_trace_pc_guard is
given the correct source location; as the debug location is still set in
ModuleSanitizerCoverage::InjectCoverageAtBlock, the test does not break.
So -fsanitize-coverage is hopefully unaffected - I don't think it should
rely on the debug locations of asan-generated allocas.
Related revision: 3c6c14d14b40adfb581940859ede1ac7d8ceae7a
"ASAN: Provide reliable debug info for local variables at -O0."
Below is how the X86 assembly version of the added test case changes.
We get rid of some .loc lines and put prologue_end where the user code starts.
```diff
--- 2.master.s 2019-12-02 12:32:38.982959053 +0100
+++ 2.patch.s 2019-12-02 12:32:41.106246674 +0100
@@ -45,8 +45,6 @@
.cfi_offset %rbx, -24
xorl %eax, %eax
movl %eax, %ecx
- .Ltmp2:
- .loc 1 3 0 prologue_end # 2.c:3:0
cmpl $0, __asan_option_detect_stack_use_after_return
movl %edi, 92(%rbx) # 4-byte Spill
movq %rsi, 80(%rbx) # 8-byte Spill
@@ -57,9 +55,7 @@
callq __asan_stack_malloc_0
movq %rax, 72(%rbx) # 8-byte Spill
.LBB1_2:
- .loc 1 0 0 is_stmt 0 # 2.c:0:0
movq 72(%rbx), %rax # 8-byte Reload
- .loc 1 3 0 # 2.c:3:0
cmpq $0, %rax
movq %rax, %rcx
movq %rax, 64(%rbx) # 8-byte Spill
@@ -72,9 +68,7 @@
movq %rax, %rsp
movq %rax, 56(%rbx) # 8-byte Spill
.LBB1_4:
- .loc 1 0 0 # 2.c:0:0
movq 56(%rbx), %rax # 8-byte Reload
- .loc 1 3 0 # 2.c:3:0
movq %rax, 120(%rbx)
movq %rax, %rcx
addq $32, %rcx
@@ -99,7 +93,6 @@
movb %r8b, 31(%rbx) # 1-byte Spill
je .LBB1_7
# %bb.5:
- .loc 1 0 0 # 2.c:0:0
movq 40(%rbx), %rax # 8-byte Reload
andq $7, %rax
addq $3, %rax
@@ -118,7 +111,8 @@
movl %ecx, (%rax)
movq 80(%rbx), %rdx # 8-byte Reload
movq %rdx, 128(%rbx)
- .loc 1 4 3 is_stmt 1 # 2.c:4:3
+.Ltmp2:
+ .loc 1 4 3 prologue_end # 2.c:4:3
movq %rax, %rdi
callq f
movq 48(%rbx), %rax # 8-byte Reload
```
Reviewers: eugenis, aprantl
Reviewed By: eugenis
Subscribers: ormris, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70894
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This cropped up in the Linux kernel where cold code was placed in an
incompatible section.
Reviewers: compnerd, vsk, tejohnson
Reviewed By: vsk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70925
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In case of a need to distinguish different query sites for gradual commit or
debugging of PGSO. NFC.
Reviewers: davidxl
Subscribers: hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70510
|
|
|
|
|
|
|
|
|
|
|
| |
By defining the graph traits right after the VPBlockBase definitions, we
can make use of them earlier in the file.
Reviewers: hsaito, Ayal, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D70733
|
|
|
|
|
|
|
|
|
| |
As described here:
https://bugs.llvm.org/show_bug.cgi?id=44186
The match() code safely allows undef values, but we can't safely
propagate a vector constant that contains an undef to the new
compare instruction.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If a user writing C code using the ACLE MVE intrinsics generates a
predicate and then complements it, then the resulting IR will use the
`pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit
integer; complement that integer; and convert back. This will generate
machine code that moves the predicate out of the `P0` register,
complements it in an integer GPR, and moves it back in again.
This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement
of the original predicate vector, which we can already instruction-
select as the VPNOT instruction which complements P0 in place.
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70484
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR44100)
rL341831 moved one-use check higher up, restricting a few folds
that produced a single instruction from two instructions to the case
where the inner instruction would go away.
Original commit message:
> InstCombine: move hasOneUse check to the top of foldICmpAddConstant
>
> There were two combines not covered by the check before now,
> neither of which actually differed from normal in the benefit analysis.
>
> The most recent seems to be because it was just added at the top of the
> function (naturally). The older is from way back in 2008 (r46687)
> when we just didn't put those checks in so routinely, and has been
> diligently maintained since.
From the commit message alone, there doesn't seem to be a
deeper motivation, deeper problem that was trying to solve,
other than 'fixing the wrong one-use check'.
As i have briefly discusses in IRC with Tim, the original motivation
can no longer be recovered, too much time has passed.
However i believe that the original fold was doing the right thing,
we should be performing such a transformation even if the inner `add`
will not go away - that will still unchain the comparison from `add`,
it will no longer need to wait for `add` to compute.
Doing so doesn't seem to break any particular idioms,
as least as far as i can see.
References https://bugs.llvm.org/show_bug.cgi?id=44100
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the sign of the sign argument is known (this could be extended to use ValueTracking),
then we can use fneg+fabs to clear/set the sign bit of the magnitude argument.
http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic
This transform is already done in DAGCombiner, but we can do it sooner in IR as
suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
We have effectively no analysis for copysign in IR, so we are taking the unusual step
of increasing the number of IR instructions for the negative constant case.
Differential Revision: https://reviews.llvm.org/D70792
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
optimizeVectorResize is rewriting patterns like:
%1 = bitcast vector %src to integer
%2 = trunc/zext %1
%dst = bitcast %2 to vector
Since bitcasting between integer an vector types gives
different integer values depending on endianness, we need
to take endianness into account. As it happens the old
implementation only produced the correct result for little
endian targets.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=44178
Reviewers: spatel, lattner, lebedev.ri
Reviewed By: spatel, lebedev.ri
Subscribers: lebedev.ri, hiraditya, uabelho, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70844
|
|
|
|
|
|
|
|
| |
The constants come through as add %x, -C, not a sub as would be
expected. They need some extra matchers to canonicalise them towards
usub_sat.
Differential Revision: https://reviews.llvm.org/D69514
|
|
|
|
|
|
| |
This adjusts the one use checks in the the usub_sat fold code to not
increase instruction count, but otherwise do the fold. Reviewed as a
part of D69514.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch introduces the deduction based on load/store instructions whose pointer operand is a non-inbounds GEP instruction.
For example if we have,
```
void f(int *u){
u[0] = 0;
u[1] = 1;
u[2] = 2;
}
```
then u must be dereferenceable(12).
This patch is inspired by D64258
Reviewers: jdoerfert, spatel, hfinkel, RKSimon, sstefan1, xbolva00, dtemirbulatov
Reviewed By: jdoerfert
Subscribers: jfb, lebedev.ri, xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70714
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch prevents the simultaneous presence of `dereferenceable` and `dereferenceable_or_null` attribute
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70789
|
|
|
|
|
|
| |
pointer checks"
This reverts commit 7ca7d62c6ea1680ec0a1861083669596547fdd6f. Commited accidentally.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: PR44149
Reviewers: jdoerfert
Subscribers: mehdi_amini, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70737
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pass at the O1 described there.""
This reapplies: 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4
Original commit message:
As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.
This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.
Test updates: Outside of the newpm tests most of the updates are coming from either
optimization passes not run anymore (and without a compelling argument at the moment)
that were largely used for canonicalization in clang.
Original post:
http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
This reverts commit c9ddb02659e3ece7a0d9d6b4dac7ceea4ae46e6d.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
return memccpy(d, "helloworld", 'r', 20)
=>
return memcpy(d, "helloworld", 8 /* pos of 'r' in string */), d + 8
Reviewers: efriedma, jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68089
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch enables us to track GEP instruction in align deduction.
If a pointer `B` is defined as `A+Offset` and known to have alignment `C`, there exists some integer Q such that
```
A + Offset = C * Q = B
```
So we can say that the maximum power of two which is a divisor of gcd(Offset, C) is an alignment.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70392
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the O1 described there."
This reverts commit 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4.
This commit introduced 9 new failures on lldb buildbot host at http://lab.llvm.org:8014/builders/lldb-aarch64-ubuntu
Following tests were failing:
lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq1/TestAmbiguousTailCallSeq1.py
lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py
lldb-api :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py
lldb-api :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py
lldb-api :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py
lldb-api :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py
lldb-api :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py
lldb-api :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py
lldb-api :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py
lldb-api :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
described there.
This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.
Test updates: Outside of the newpm tests most of the updates are coming from either
optimization passes not run anymore (and without a compelling argument at the moment)
that were largely used for canonicalization in clang.
Original post:
http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
|
|
|
|
|
|
| |
This is NFC-intended because SimplifyDemandedVectorElts() does the same
transform later. As discussed in D70641, we may want to change that
behavior, so we need to isolate where it happens.
|
|
|
|
|
|
|
|
| |
Reviewer: kbarton, jdoerfert, Meinersbur, bmahjour, etiotto
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D70619
|
|
|
|
|
|
|
|
|
| |
The pattern in question is currently not possible because we
aggressively (wrongly) transform mask elements to undef values
if they choose from an undef operand. That, however, would
change if we tighten our semantics for shuffles as discussed
in D70641. Adding this check gives us the flexibility to make
that change with minimal overhead for current definitions.
|
|
|
|
| |
We never use the local 'Mask' before returning, so that was dead code.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Related bug: https://bugs.llvm.org/show_bug.cgi?id=40648
Static helper function rewriteDebugUsers in Local.cpp deletes dbg.value
intrinsics when it cannot move or rewrite them, or salvage the deleted
instruction's value. It should instead undef them in this case.
This patch fixes that and I've added a test which covers the failing test
case in bz40648. I've updated the unit test Local.ReplaceAllDbgUsesWith
to check for this behaviour (and fixed a typo in the test which would
cause the old test to always pass).
Reviewers: aprantl, vsk, djtodoro, probinson
Reviewed By: vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D70604
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
order recurrences."
This version contains 2 fixes for reported issues:
1. Make sure we do not try to sink terminator instructions.
2. Make sure we bail out, if we try to sink an instruction that needs to
stay in place for another recurrence.
Original message:
If the recurrence PHI node has a single user, we can sink any
instruction without side effects, given that all users are dominated by
the instruction computing the incoming value of the next iteration
('Previous'). We can sink instructions that may cause traps, because
that only causes the trap to occur later, but not on any new paths.
With the relaxed check, we also have to make sure that we do not have a
direct cycle (meaning PHI user == 'Previous), which indicates a
reduction relation, which potentially gets missed by
ReductionDescriptor.
As follow-ups, we can also sink stores, iff they do not alias with
other instructions we move them across and we could also support sinking
chains of instructions and multiple users of the PHI.
Fixes PR43398.
Reviewers: hsaito, dcaballe, Ayal, rengolin
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D69228
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the assertion in updateSuccessor is overly strict in some
cases and overly relaxed in other cases. For branches to the inner and
outer loop preheader it is too strict, because they can either be
unconditional branches or conditional branches with duplicate targets.
Both cases are fine and we can allow updating multiple successors.
On the other hand, we have to at least update one successor. This patch
adds such an assertion.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And simultaneously enhance SimplifyDemandedVectorElts() to rcognize that
pattern. That preserves some of the old optimizations in IR.
Given a shuffle that includes undef elements in an otherwise identity mask like:
define <4 x float> @shuffle(<4 x float> %arg) {
%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>
ret <4 x float> %shuf
}
We were simplifying that to the input operand.
But as discussed in PR43958:
https://bugs.llvm.org/show_bug.cgi?id=43958
...that means that per-vector-element poison that would be stopped by the shuffle can now
leak to the result.
Also note that we still have (and there are tests for) the same transform with no undef
elements in the mask (a fully-defined identity mask). I don't think there's any
controversy about that case - it's a valid transform under any interpretation of
shufflevector/undef/poison.
Looking at a few of the diffs into codegen, I don't see any difference in final asm. So
depending on your perspective, that's good (no real loss of optimization power) or bad
(poison exists in the DAG, so we only partially fixed the bug).
Differential Revision: https://reviews.llvm.org/D70246
|
|
|
|
|
|
| |
Patch by Chris Ye!
Differential Revision: https://reviews.llvm.org/D68004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
moved before another instruction.
Summary:Added an API to check if an instruction can be safely moved
before another instruction. In future PRs, we will like to add support
of moving instructions between blocks that are not control flow
equivalent, and add other APIs to enhance usability, e.g. moving basic
blocks, moving list of instructions...
Loop Fusion will be its first user. When there is intervening code in
between two loops, fusion is currently unable to fuse them. Loop Fusion
can use this utility to check if the intervening code can be safely
moved before or after the two loops, and move them, then it can
successfully fuse them.
Reviewer:kbarton,jdoerfert,Meinersbur,bmahjour,etiotto
Reviewed By:bmahjour
Subscribers:mgorny,hiraditya,llvm-commits
Tag:LLVM
Differential Revision:https://reviews.llvm.org/D70049
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Vector aggregate is homogeneous aggregate of vectors like `{ <2 x float>, <2 x float> }`.
This patch allows `findBuildAggregate()` to consider vector aggregates as
well as scalar ones. For instance, `{ <2 x float>, <2 x float> }` maps to `<4 x float>`.
Fixes vector part of llvm.org/PR42022
Reviewers: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70068
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
With this patch, we no longer cache F.hasProfileData(). We simply
call the function again.
I'm doing this because:
- JumpThreadingPass also has a member variable named HasProfileData,
which is very confusing,
- the function is very lightweight, and
- this patch makes JumpThreading::runOnFunction more consistent with
JumpThreadingPass::run.
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70602
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Without this patch, the jump threading pass ignores profiling data
whenever we invoke the pass with the new pass manager.
Specifically, JumpThreadingPass::run calls runImpl with class variable
HasProfileData always set to false. In turn, runImpl sets
HasProfileData to false again:
HasProfileData = HasProfileData_;
In the end, we don't use profiling data at all with the new pass
manager.
This patch fixes the problem by passing F.hasProfileData() to runImpl.
The bug appears to have been introduced at:
https://reviews.llvm.org/D41461
which removed local variable HasProfileData in JumpThreadingPass::run
even though there was one more use left in the same function. As a
result, the remaining use ended referring to the class variable
instead.
Note that F.hasProfileData is an extremely lightweight function, so I
don't see the need to cache its result. Once this patch is approved,
I'm planning to stop caching the result of F.hasProfileData in
runOnFunction.
Reviewers: wmi, eli.friedman
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70509
|