| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
r369697 changed the behavior of stripPointerCasts to no longer include
aliases. However, the code in CGDeclCXX.cpp's createAtExitStub counted
on the looking through aliases to properly set the calling convention of
a call.
The result of the change was that the calling convention mismatch of the
call would be replaced with a llvm.trap, causing a runtime crash.
Differential Revision: https://reviews.llvm.org/D68584
llvm-svn: 373929
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After changing the remark serialization, we now pass StringRefs to the
serializer. We should use StringRef for StringBlockVal, to avoid
creating temporary objects, which then cause StringBlockVal.Value to
point to invalid memory.
Reviewers: thegameg, anemet
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D68571
llvm-svn: 373923
|
| |
|
|
| |
llvm-svn: 373919
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Minor format fix for output of "llvm-profdata -show"
Reviewers: wmi
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68440
llvm-svn: 373917
|
| |
|
|
|
|
|
|
| |
NFCI.
Stop all the callers from having to check the value type before calling getTargetShuffleInputs.
llvm-svn: 373915
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Previously ExtBinary profile format only supports compression using zlib for
profile symbol list. In this patch, we extend the compression support to any
section. User can select some or all of the sections to compress. In an
experiment, for a 45M profile in ExtBinary format, compressing name table
reduced its size to 24M, and compressing all the sections reduced its size
to 11M.
Differential Revision: https://reviews.llvm.org/D68253
llvm-svn: 373914
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This ensures that frame-based unwinding will continue to work when
calling a noreturn function; there is not much use having the caller's
frame pointer saved if you don't also have the caller's program counter.
Patch by James Clarke.
Differential Revision: https://reviews.llvm.org/D68542
llvm-svn: 373907
|
| |
|
|
|
|
|
|
|
|
|
|
| |
J/JAL/JALX/JALS are absolute branches, but stay within the current
256 MB-aligned region, so we must include the high bits of the
instruction address when calculating the branch target.
Patch by James Clarke.
Differential Revision: https://reviews.llvm.org/D68548
llvm-svn: 373906
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: The C API doesn't have the bindings to create macro debug information.
Reviewers: whitequark, CodaFi, deadalnix
Reviewed By: whitequark
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58334
llvm-svn: 373903
|
| |
|
|
|
|
| |
Fix comment.
llvm-svn: 373901
|
| |
|
|
|
|
|
|
|
|
|
| |
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.
Reviewed by: andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by: craig.topper
Differential Revision: https://reviews.llvm.org/D64746
llvm-svn: 373900
|
| |
|
|
|
|
|
|
|
| |
It broke MC/AsmParser/directive_ascii.s on all bots:
Assertion failed: (Index < Length && "Invalid index!"), function operator[],
file ../../llvm/include/llvm/ADT/StringRef.h, line 243.
llvm-svn: 373898
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Implement support for hexadecimal escape sequences to match how GNU 'as'
handles them. I.e., read all hexadecimal characters and truncate to the
lower 16 bits.
Reviewers: nickdesaulniers
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68483
llvm-svn: 373888
|
| |
|
|
|
|
|
|
|
|
| |
load-combine"
This reverts SVN r373833, as it caused a failed assert "Non-zero loop
cost expected" on building numerous projects, see PR43582 for details
and reproduction samples.
llvm-svn: 373882
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
possible.
Move the erasing and iterator updating inside to match the
other slow LEA function.
I've adapted code from optTwoAddrLEA and basically rebuilt the
implementation here. We do lose the kill flags now just like
optTwoAddrLEA. This runs late enough in the pipeline that
shouldn't really be a problem.
llvm-svn: 373877
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
load (PR43217)
If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.
This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.
Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.
Fixes PR43217
Differential Revision: https://reviews.llvm.org/D68544
llvm-svn: 373871
|
| |
|
|
| |
llvm-svn: 373870
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This is patch aims to group together the `CRNotPat` multi class instantiations
within the `PPCInstrInfo.td` file.
Integer instantiations of the multi class are grouped together into a section,
and the floating point patterns are separated into its own section.
Differential Revision: https://reviews.llvm.org/D67975
llvm-svn: 373869
|
| |
|
|
|
|
|
|
| |
directly.
Move the resolveTargetShuffleInputsAndMask call to after the shuffle mask combine before the undef/zero constant fold instead.
llvm-svn: 373868
|
| |
|
|
|
|
|
|
| |
Replaces setTargetShuffleZeroElements with getTargetShuffleAndZeroables which reports the Zeroable elements but doesn't merge them into the decoded target shuffle mask (the merging has been moved up into getTargetShuffleInputs until we can get rid of it entirely).
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs but instead we cache them in a parallel Zeroable mask.
llvm-svn: 373867
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v8i64->v8i8 truncate when v8i64 isn't legal
Summary:
The default legalization for v16i64->v16i8 tries to create a multiple stage truncate concatenating after each stage and truncating again. But avx512 implements truncates with multiple uops. So it should be better to truncate all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2 uop truncates. The unpcks are all single uop instructions.
I tried to handle this by just custom splitting the v16i64->v16i8 shuffle. And hoped that the DAG combiner would leave the two halves in the state needed to make D68374 do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8 when v8i64 needs to be split to produce the VTRUNCs directly.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68428
llvm-svn: 373864
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
split a setcc condition if the setcc input is legal and vXi1 conditions are supported
Summary: The VSELECT splitting code tries to split a setcc input as well. But on avx512 where mask registers are well supported it should be better to just split the mask and use a single compare.
Reviewers: RKSimon, spatel, efriedma
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68359
llvm-svn: 373863
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: The assertion in getLoopGuardBranch can be a 'return nullptr'
under if condition.
Authored By: DTharun
Reviewer: Whitney, fhahn
Reviewed By: Whitney, fhahn
Subscribers: fhahn, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D66084
llvm-svn: 373857
|
| |
|
|
|
|
| |
of using setTargetShuffleZeroElements directly. NFCI.
llvm-svn: 373855
|
| |
|
|
|
|
|
|
| |
This reverts r373850 (git commit 25ba49824d2d4f2347b4a7cb1623600a76ce9433)
This patch appears to cause multiple codegen regression test failures - http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10680
llvm-svn: 373853
|
| |
|
|
|
|
|
|
| |
Summary: Replace 'isDarwin' with 'IsDarwin' based on LLVM naming convention.
Differential Revision: https://reviews.llvm.org/D68336
llvm-svn: 373852
|
| |
|
|
|
|
|
| |
Extends rL373230 and solves the motivating bug (although in a narrow way):
https://bugs.llvm.org/show_bug.cgi?id=43497
llvm-svn: 373851
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68250
llvm-svn: 373850
|
| |
|
|
| |
llvm-svn: 373849
|
| |
|
|
|
|
|
|
|
|
|
|
| |
(PR43501)
https://bugs.llvm.org/show_bug.cgi?id=43501
We can't declare a GEP 'inbounds' in general. But we may salvage that information if
we have known dereferenceable bytes on the source pointer.
Differential Revision: https://reviews.llvm.org/D68244
llvm-svn: 373847
|
| |
|
|
|
|
|
|
|
|
| |
We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.
This allows us to remove createTargetShuffleMask.
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.
llvm-svn: 373846
|
| |
|
|
| |
llvm-svn: 373845
|
| |
|
|
| |
llvm-svn: 373842
|
| |
|
|
| |
llvm-svn: 373841
|
| |
|
|
| |
llvm-svn: 373840
|
| |
|
|
| |
llvm-svn: 373839
|
| |
|
|
|
|
| |
Turn into shift and truncate. Doesn't yet handle pointers.
llvm-svn: 373838
|
| |
|
|
|
|
| |
This wasn't updated for the immarg handling change.
llvm-svn: 373837
|
| |
|
|
|
|
| |
Fixes PR43575.
llvm-svn: 373836
|
| |
|
|
|
|
|
|
|
|
|
|
| |
(PR42025)
As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop.
This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so its just the original SETCC ops that gets extended.
Differential Revision: https://reviews.llvm.org/D68226
llvm-svn: 373834
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146
We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).
Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.
In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:
movbe rax, qword ptr [rdi]
or:
mov rax, qword ptr [rdi]
Not some (half) vector monstrosity as we currently do using SLP:
vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
movzx eax, byte ptr [rdi]
movzx ecx, byte ptr [rdi + 5]
shl rcx, 40
movzx edx, byte ptr [rdi + 6]
shl rdx, 48
or rdx, rcx
movzx ecx, byte ptr [rdi + 7]
shl rcx, 56
or rcx, rdx
or rcx, rax
vextracti128 xmm1, ymm0, 1
vpor xmm0, xmm0, xmm1
vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
vpor xmm0, xmm0, xmm1
vmovq rax, xmm0
or rax, rcx
vzeroupper
ret
Differential Revision: https://reviews.llvm.org/D67841
llvm-svn: 373833
|
| |
|
|
|
|
| |
Rename some variables to match lowerShuffleAsRepeatedMaskAndLanePermute - prep work toward adding some equivalent sublane functionality.
llvm-svn: 373832
|
| |
|
|
|
|
| |
Silences static analyzer null dereference warnings.
llvm-svn: 373823
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The motivation is to reuse the key value parsing logic here to
parse instance specific pass options within the context of MLIR.
The primary functionality exposed is the "," splitting for
arrays and the logic for properly handling duplicate definitions
of a single flag.
Patch by: Parker Schuh <parkers@google.com>
Differential Revision: https://reviews.llvm.org/D68294
llvm-svn: 373815
|
| |
|
|
|
|
|
|
|
|
| |
This is an omission in rL371441. Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't be ordered w/respect to side effects which followed before the end of the block.
Included test case is how I spotted this. We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with. I'm sure it showed up a bunch of other ways as well.
Spotted via manual inspecting of assembly differences in a corpus w/and w/o the new experimental mode. Finding this with testing would have been "unpleasant".
llvm-svn: 373814
|
| |
|
|
|
|
| |
simm9_lsb0 and simm12_lsb0 operand types were missing predicates.
llvm-svn: 373812
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Subscribers: llvm-commits
Tags: #llvm
Reviewers: compnerd, vsk, sebpop, fhahn, tejohnson
Reviewed by: vsk
Differential Revision: https://reviews.llvm.org/D68478
llvm-svn: 373807
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
optimize the blocks
This reverts r371177 (git commit f879c6875563c0a8cd838f1e13b14dd33558f1f8)
It caused PR43566 by removing empty, address-taken MachineBasicBlocks.
Such blocks may have references from blockaddress or other operands, and
need more consideration to be removed.
See the PR for a test case to use when relanding.
llvm-svn: 373805
|
| |
|
|
|
|
|
|
|
|
|
| |
'icmp sge/slt %x, 0'
We do indeed already get it right in some cases, but only transitively,
with one-use restrictions. Since we only need to produce a single
comparison, it makes sense to match the pattern directly:
https://rise4fun.com/Alive/kPg
llvm-svn: 373802
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR43564, PR42391)
Initially (D65380) i believed that if we have rightshift-trunc-rightshift,
we can't do any folding. But as it usually happens, i was wrong.
https://rise4fun.com/Alive/GEw
https://rise4fun.com/Alive/gN2O
In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have
this very sequence, of two right shifts separated by trunc.
And "just" so that happens, we apparently can fold the pattern
if the total shift amount is either 0, or it's equal to the bitwidth
of the innermost widest shift - i.e. if we are left with only the
original sign bit. Which is exactly what is wanted there.
llvm-svn: 373801
|