| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types.
Differential Revision: https://reviews.llvm.org/D42526
llvm-svn: 323797
|
|
|
|
| |
llvm-svn: 323328
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
AVX512BW adds support for variable shift amount for 16-bit element
vectors.
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: RKSimon
Subscribers: rengolin, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D42437
llvm-svn: 323292
|
|
|
|
| |
llvm-svn: 323268
|
|
|
|
| |
llvm-svn: 323265
|
|
|
|
| |
llvm-svn: 323264
|
|
|
|
|
|
|
| |
Case points out that we don't consider shifts supported by AVX512BW
in isVectorShiftByScalarCheap()
llvm-svn: 323242
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If in complex addressing mode the difference is in GV then
base reg should not be installed because we plan to use
base reg as a merge point of different GVs.
This is a fix for PR35980.
Reviewers: reames, john.brawn, santosh
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42230
llvm-svn: 323192
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
attributes (Step 1)
Summary:
This is a resurrection of work first proposed and discussed in Aug 2015:
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.
This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.
In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
require that the alignments for source & dest be equal.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.
s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g
The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use
getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reviewers: pete, hfinkel, lhames, reames, bollu
Reviewed By: reames
Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits
Differential Revision: https://reviews.llvm.org/D41675
llvm-svn: 322965
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the offset is differ in two addressing mode we can continue only if
ScaleReg is not set due to we will use it as merge of different offsets.
It should fix PR35799 and PR35805.
Reviewers: john.brawn, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41227
llvm-svn: 322056
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we put the value in select placeholder we must pass
the value through simplification tracker due to the value might
be already simplified and erased.
This is a fix for PR35658.
Reviewers: john.brawn, uabelho
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41251
llvm-svn: 320956
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If common type is different we should bail out due to we will not be
able to create a select or Phi of these values.
Basically it is done in ExtAddrMode::compare however it does not work
if we handle the null first and then two values of different types.
so add a check in initializeMap as well. The check in ExtAddrMode::compare
is used as earlier bail out.
Reviewers: reames, john.brawn
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40479
llvm-svn: 319292
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We must collect all AddModes even if they are the same.
This is due to Original value is different but we need all original
values collected as they are used as anchors in common phi finding.
Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40166
llvm-svn: 318638
|
|
|
|
|
|
|
| |
The extra comma meant it wasn't correctly checking that we weren't getting an
extra getelementptr.
llvm-svn: 318406
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch disables the handling of selects in optimization
extensing scope of optimizeMemoryInst.
The optimization itself is disable by default.
The idea here is just to switch optimiztion level step by step.
Specifically, first optimization will be enabled only for Phi nodes,
then select instructions will be added.
In case someone will complain about perfromance it will be easier to
detect what part of optimizations is responsible for that.
Differential Revision: https://reviews.llvm.org/D36073
llvm-svn: 317555
|
|
|
|
|
|
|
|
|
|
|
| |
Commit tests for previous commit.
Reviewers: efriedma, dberlin, mkazantsev, reames, john.brawn
Reviewed By: john.brawn
Subscribers: javed.absar, john.brawn, dneilson, llvm-commits
Differential Revision: https://reviews.llvm.org/D36073
llvm-svn: 317430
|
|
|
|
|
|
| |
Fix undefined references: ExpandMemCmp belongs to CodeGen/, not Scalar/.
llvm-svn: 317318
|
|
|
|
|
|
|
|
|
| |
undefined reference to `llvm::TargetPassConfig::ID' on
clang-ppc64le-linux-multistage
This reverts commit eea333c33fa73ad225ef28607795984829f65688.
llvm-svn: 317213
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is mostly a noop (most of the test diffs are renamed blocks).
There are a few temporary register renames (eax<->ecx) and a few blocks are
shuffled around.
See the discussion in PR33325 for more details.
Reviewers: spatel
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D39456
llvm-svn: 317211
|
|
|
|
|
|
|
|
| |
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725
If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.
llvm-svn: 316967
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Targets that want to support memcmp expansions now return the list of
supported load sizes.
- Expansion codegen does not assume that all power-of-two load sizes
smaller than the max load size are valid. For examples, this is not the
case for x86(32bit)+sse2.
Fixes PR34887.
llvm-svn: 316905
|
|
|
|
|
|
|
|
|
|
| |
This lets us optimize away selects that perform the same address computation in
two different ways and is also the first step towards being able to handle
selects between two different, but compatible, address computations.
Differential Revision: https://reviews.llvm.org/D38242
llvm-svn: 314794
|
|
|
|
|
|
|
|
| |
This only affects very small memcmp on x86 for now, but it
will become more important if we allow vector-sized load and
compares.
llvm-svn: 309711
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in the code comment, transforming this in the other direction might require
a separate transform here in CGP given the block-at-a-time DAG constraint.
Besides that theoretical motivation, there are 2 practical motivations for the
subtract-of-cmps form:
1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still).
There is discussion about canonicalizing IR to the select form
( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ),
so we probably need to add DAG transforms for those patterns anyway, but this improves the
memcmp output without waiting for that step.
2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert
that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided
if we choose to enable that.
Differential Revision: https://reviews.llvm.org/D34904
llvm-svn: 309597
|
|
|
|
|
|
|
|
|
|
|
|
| |
D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914).
Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).
This patch should be considered for the 5.0.0 release branch as well
Differential Revision: https://reviews.llvm.org/D35830
llvm-svn: 308986
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allowing cycles in Phi traversal increases the scope of optimize memory instruction
in case we are in loop.
The added test shows an example of enabling optimization inside a loop.
Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35294
llvm-svn: 308419
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.
Notes:
Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.
There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.
We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:
https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion.
Committed on behalf of Sanjay Patel
Differential Revision: https://reviews.llvm.org/D35067
llvm-svn: 308322
|
|
|
|
|
|
|
|
|
|
| |
When we fail to sink an instruction, we must make sure not to modify
the function; otherwise, we end up in an infinite loop because
CodeGenPrepare iterates until it doesn't make any changes.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33608 .
llvm-svn: 307866
|
|
|
|
|
|
|
|
| |
This was auto-generated using an older version of the script,
and that version does not work with phis, so if we enable
expansion it will go bad.
llvm-svn: 307267
|
|
|
|
| |
llvm-svn: 306485
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in D34071, there are some IR optimization opportunities that could be
handled by normal IR passes if this expansion wasn't happening so late in CGP.
Regardless of that, it seems wasteful to knowingly produce suboptimal IR here,
so I'm proposing this change:
%s = sub i32 %x, %y
%r = icmp ne %s, 0
=>
%r = icmp ne %x, %y
Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.
The fact that the PowerPC backend doesn't eliminate the 'subf.' might be
something for PPC folks to investigate separately.
Differential Revision: https://reviews.llvm.org/D34416
llvm-svn: 306471
|
|
|
|
|
|
|
| |
I don't think there's any visible difference from having the wrong layout
for the 32-bit case at this point, but that could change in the future.
llvm-svn: 305931
|
|
|
|
|
|
|
|
|
| |
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)
llvm-svn: 305802
|
|
|
|
|
|
|
|
|
|
|
|
| |
No IR tests were added with rL304313 ( https://reviews.llvm.org/D28637 ),
so I want these for extra coverage if we enable memcmp expansion for x86.
As shown, nothing is expanded for x86 in CGP yet.
Also fundamentally, we're doing an IR transform, so we should have IR tests
for just that part. If something goes wrong, we need to know if the bug is
in CGP or later lowering.
llvm-svn: 305011
|
|
|
|
|
|
|
|
|
| |
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.
Differential Revision: https://reviews.llvm.org/D30596
llvm-svn: 299723
|
|
|
|
|
|
| |
Hopefully, this will make sense with a forthcoming patch. If not, we can move these back.
llvm-svn: 297660
|
|
|
|
| |
llvm-svn: 297629
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
llvm-svn: 296416
|
|
|
|
|
|
|
| |
This reverts commit r296149 as it leads to crashes when compiling for
PPC.
llvm-svn: 296295
|
|
|
|
|
|
|
|
|
|
| |
When we construct addressing modes, we use isNoopAddrSpaceCast to ignore
addrspacecast instructions. Make sure we insert the correct addrspacecast
when we reconstruct the addressing mode.
Differential Revision: https://reviews.llvm.org/D30114
llvm-svn: 296167
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.
Differential Revision: https://reviews.llvm.org/D29916
llvm-svn: 296149
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly :
Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
llvm-svn: 289988
|
|
|
|
|
|
| |
This reverts commit r289951.
llvm-svn: 289960
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block:
Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
llvm-svn: 289951
|
|
|
|
|
|
|
| |
It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line
670 ("Expected irreducible CFG").
llvm-svn: 288052
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Merging an empty case block into the header block of switch could cause
ISel to add COPY instructions in the header of switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase dynamic instructions, especially when the switch is in a
loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl
Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
llvm-svn: 287553
|
|
|
|
|
|
|
|
|
|
| |
The sink cast machinery is supposed to sink casts as close to their user
as possible. However, an EH pad is the first instruction in it's basic
block. Don't sink if the user is an EH pad.
This fixes PR27536.
llvm-svn: 267767
|
|
|
|
|
|
|
| |
Suggested in the review of D19488:
http://reviews.llvm.org/D19488
llvm-svn: 267504
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.
Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.
Motivation
----------
Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.
We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.
Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.
http://reviews.llvm.org/D19034
<rdar://problem/25256815>
llvm-svn: 266446
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sinking comparisons in CGP can undo the job of hoisting them done
earlier by LICM, and soft-FP makes this an expensive mistake.
A common pattern that produces floating point comparisons uniform
over a loop is an explicit check for division by zero. If the divisor
is hoisted out of the loop, the comparison can also be, but hoisting
the function that unwinds is never legal, since it may cause side
effects in the loop body prior to the unwinding to not be executed.
Differential Revision: http://reviews.llvm.org/D18744
llvm-svn: 265264
|