| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MachineVerifier::visitMachineFunctionAfter() is extended to check the
live-through case for live-in lists. This is only done for registers without
aliases and that are neither allocatable or reserved, such as the SystemZ::CC
register.
The MachineVerifier earlier only catched the case of a live-in use without an
entry in the live-in list (as "using an undefined physical register").
A comment in LivePhysRegs.h has been added stating a guarantee that
addLiveOuts() can be trusted for a full register both before and after
register allocation.
Review: Quentin Colombet
Differential Revision: https://reviews.llvm.org/D68267
|
|
|
|
|
|
|
|
|
|
|
| |
When EFLAGS is no longer live into a basic block, remove it from the live-in
list.
Fixes https://bugs.llvm.org/show_bug.cgi?id=44462.
Review: Craig Topper
Differential Revision: https://reviews.llvm.org/D71375
|
|
|
|
|
|
|
|
|
|
|
|
| |
*** Bad machine code: Tied use must be a register ***
- function: stg_alloca17
- basic block: %bb.0 entry (0x20076710580)
- instruction: early-clobber %0:gpr64common, early-clobber %1:gpr64sp = STGloop 272, %stack.0.a :: (store 272 into %ir.a, align 16)
- operand 3: %stack.0.a
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/21481/steps/test-check-all/logs/stdio
This reverts commit b675a7628ce6a21b1e4a71c079a67badfb8b073d.
|
|
|
|
|
|
|
|
| |
It looks like my patch breaks the sanitizer-windows build:
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/56324
This reverts commit ead815924e6ebeaf02c31c37ebf7a560b5fdf67b.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop
This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.
This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and base address as a FI operand when
possible. This is needed to simplify recognizing an STGloop instruction
as operating on a stack slot post-regalloc.
This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70286
|
| |
|
|
|
|
|
|
|
|
|
|
| |
64-bit mode
For v4i64->v4f32 uint_to_fp on pre-avx targets where v4i64 isn't legal we create to v2i64->v2f32 uint_to_fp that need to be shuffled together. Our codegen for v2i64->v2f32 involves detecting if the number is larger than (2^31 - 1), if so we do a special divison by 2 so we can do a signed conversion which we need to scalarize, then do a multiply by 2 at the end if we divided earlier.
When v4i64 isn't legal we need to split the checking for a larger number and dividing by 2 into two v2i64 vectors. The scalar part can extract the 4 i64 values from those 4 splits. But we can reassemble the 4 scalar f32 results directly into a single v432 vector. Then we just need to combine the fixup indications from the 2 halves and we can do the final multiply by 2 fixup on all 4 values if needed at once using a single v4f32 blend and v4f32 fadd.
Differential Revision: https://reviews.llvm.org/D72368
|
|
|
|
|
|
| |
We have to do an intermediate jump to a GPR to make the cast.
Fixes PR43750.
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed heavily in the original review (D70157), there's a need for the compiler to be able to selective suppress padding (either nop or prefix) to respect assumptions about the meaning of labels and instructions in generated code.
Rather than wait for syntax to be finalized - which appears to be a very slow process - this patch focuses on the compiler use case and *only* worries about the integrated assembler. To my knowledge, this covers all cases mentioned to date for clang/JIT support.
For testing purposes, I wired it up so that if the integrated assembler was using autopadding for branch alignment (e.g. enabled at command line) then the textual assembly output would contain a comment for each location where padding was enabled or disabled. This seemed like the least painful choice overall.
Note that the result of this patch effective disables the jcc errata mitigation for many constructs (statepoints, implicit null checks, xray, etc...) which is non ideal. It is at least *correct* and should allow us to enable the mitigation for the compiler. Once that's done, and a few other items are worked through, we probably want to come back to this an explore a bundling based approach instead so that we can pad instructions while keeping labels in the right place.
Differential Revision: https://reviews.llvm.org/D72303
|
|
|
|
| |
Silence (clang/MSVC) static analyzer warnings that the fragment data may either write out of bounds of the local array or reference uninitialized data.
|
|
|
|
| |
Use cast<> instead of dyn_cast<> since we know that the pointer should be valid (and is dereferenced immediately below in the getSignature call).
|
| |
|
|
|
|
| |
Use llvm::Optional<APInt> instead of std::pair<APInt, bool> with the bool second being used to report success/failure of fold.
|
|
|
|
|
|
|
| |
This hopes to improve readability and adds an assert.
The functional change noted by the TODO comment is
proposed in:
D72361
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:
bb3:
%var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb4:
%cmp = icmp eq i32* %var, null
br i1 %cmp, label bb5, label bb6
by duplicating basic blocks like bb3 above. Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:
bb3:
%var = phi i32* [ @a, %bb2 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb3.dup:
%var = phi i32* [ null, %bb1 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb4:
%cmp = icmp eq i32* %var, null
br i1 %cmp, label bb5, label bb6
Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70247
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This batch of intrinsics fills in all the shift instructions that take
a variable shift distance in a register, instead of an immediate. Some
of these instructions take a single shift distance in a scalar
register and apply it to all lanes; others take a vector of per-lane
distances.
These instructions are all basically one family, varying in whether
they saturate out-of-range values, and whether they round when bits
are shifted off the bottom. I've implemented them at the IR level by a
much smaller family of IR intrinsics, which take flag parameters to
indicate saturating and/or rounding (along with the usual one to
specify signed/unsigned integers).
An oddity is that all of them are //left// shift instructions – but if
you pass a negative shift count, they'll shift right. So the vector
shift distances are always vectors of //signed// integers, regardless
of whether you're considering the other input vector to be of signed
or unsigned. Also, even the simplest `vshlq` instruction in this
family (neither saturating nor rounding) has to be implemented as an
IR intrinsic, because the ordinary LLVM IR `shl` operation would
consider an out-of-range shift count to be undefined behavior.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72329
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This batch of intrinsics covers two sets of immediate shift
instructions, which have in common that they only overwrite part of
their output register and so they need an extra input giving its
previous value.
The VSLI and VSRI instructions shift each lane of the input vector
left or right just as if they were normal immediate VSHL/VSHR, but
then they only overwrite the output bits that correspond to actual
shifted bits of the input. So VSLI will leave the low n bits of each
output lane unchanged, and VSRI the same with the top n bits.
The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input
vector of 2n-bit integers, shift each lane right by a constant, and
then narrowing the shifted result to only n bits. So they only
overwrite half of the n-bit lanes in the output register, and the B/T
suffix indicates whether it's the bottom or top half of each 2n-bit
lane.
I've implemented the whole of the latter family using a single IR
intrinsic `vshrn`, which takes a lot of i32 parameters indicating
which instruction it expands to (by specifying signedness of the input
and output types, whether it saturates and/or rounds, etc).
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72328
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:
llvm.sdiv.fix.*
llvm.udiv.fix.*
These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.
Patch by: ebevhan
Reviewers: bjope, leonardchan, efriedma, craig.topper
Reviewed By: craig.topper
Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70007
|
|
|
|
|
|
|
| |
This patch moves `InPQueue` into function arguments instead of template
arguments of `releaseNode`, which is a cleaner approach.
Differential Revision: https://reviews.llvm.org/D72125
|
|
|
|
|
|
|
|
| |
Adds a pass to the ARM backend that takes a v4i32
gather and transforms it into a call to MVE's
masked gather intrinsics.
Differential Revision: https://reviews.llvm.org/D71743
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
optimizing part. #2.
Summary:
This patch relands D71271. The problem with D71271 is that it has cyclic dependency:
CodeGen->AsmPrinter->DebugInfoDWARF->CodeGen. To avoid cyclic dependency this patch
puts implementation for DWARFOptimizer into separate library: lib/DWARFLinker.
Thus the difference between this patch and D71271 is in that DWARFOptimizer renamed into
DWARFLinker and it`s files are put into lib/DWARFLinker.
Reviewers: JDevlieghere, friss, dblaikie, aprantl
Reviewed By: JDevlieghere
Subscribers: thegameg, merge_guards_bot, probinson, mgorny, hiraditya, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D71839
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit a041c4ec6f7aa659b235cb67e9231a05e0a33b7d.
This looks like a non-trivial change and there has been no code
reviews (at least there were no phabricator revisions attached to the
commit description). It is also causing a regression in one of our
downstream integration tests, we haven't been able to come up with a
minimal reproducer yet.
|
|
|
|
|
|
|
|
| |
Apple's CPUs are called A7-A13 in official communication, occasionally with
weird suffixes which we probably don't need to care about. This adds each one
and describes its features. It also switches the default CPU to the canonical
name for Cyclone, but leaves legacy support in so that existing bitcode still
compiles.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Adding fp128 support for strict fcmp
Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand
Subscribers: hiraditya, llvm-commits, LuoYuanke
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71897
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is analogous to D58943, which correctly finds the corresponding
fixup. However, when linker relaxations are disabled and we evaluate the
fixup, we need to also ensure we use an offset of 0 rather than the size
of the previous fragment.
Reviewers: asb, efriedma, lenary
Reviewed By: efriedma
Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71978
|
|
|
|
|
| |
This partially fixes GlobalISel import of the patterns, but removes a
lot of entriess from the end of the skipped pattern log.
|
|
|
|
|
|
| |
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D64869
|
|
|
|
|
|
| |
Factor out common logic into some reasonable commented helper functions. In the process, ensure that the in-block vs cross-block cases are handled the same. They previously weren't.
Differential Revision: https://reviews.llvm.org/D67126
|
|
|
|
|
|
|
|
|
|
| |
E.g.
%addr1 = G_PTR_ADD %base, G_CONSTANT 20
%addr2 = G_PTR_ADD %addr1, G_CONSTANT 8
-->
%addr2 = G_PTR_ADD %base, G_CONSTANT 28
Differential Revision: https://reviews.llvm.org/D72351
|
|
|
|
|
|
| |
This reverts commit 52366088a8e42c2f1e96e8430b84b8b65ec3f7bc.
I accidentally pushed this before supporting changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Remove the restrictions that preventing "asm goto" from returning non-void
values. The values returned by "asm goto" are only valid on the "fallthrough"
path.
Reviewers: jyknight, nickdesaulniers, hfinkel
Reviewed By: jyknight, nickdesaulniers
Subscribers: rsmith, hiraditya, llvm-commits, cfe-commits, craig.topper, rnk
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69876
|
|
|
|
|
| |
4e85ca9562a588eba491e44bcbf73cb2f419780f missed updating the legal
condition type set for pointers with any unrecognized address space.
|
|
|
|
|
| |
This was only applying the deeper nested zext pattern, and missing the
special case code size fold.
|
|
|
|
|
|
| |
Now that we generate decent code for (v2i64 (setlt zero, X)) on pre-sse4.2 targets I think we can use this now.
Differential Revision: https://reviews.llvm.org/D72354
|
|
|
|
|
| |
Now that overridable default operands work, there's no reason to use
complex patterns to just produce 0s.
|
| |
|
| |
|
|
|
|
|
| |
We weren't treating i16->f16 casts as legal on targets with these
instructions, and always using a pair of casts through i32.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Every powerpc64le platform uses elfv2.
For powerpc64, the environments "elfv1" and "elfv2" were added for
FreeBSD ELFv1->ELFv2 migration in D61950. FreeBSD developers have
decided to use OS versions to select ABI, and no one is relying on the
environments.
Also use elfv2 on powerpc64-linux-musl.
Users can always use -mabi=elfv1 and -mabi=elfv2 to override the default
ABI.
Reviewed By: adalava
Differential Revision: https://reviews.llvm.org/D72352
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Conservatively always save + restore LR in noreturn functions.
These functions do not end in a RET, and so they aren't guaranteed to have an
instruction which uses LR in any way. So, as a result, you can end up in
unfortunate situations where you can't backtrace out of these functions in a
debugger.
Remove the old noreturn test, and add a new one which is more descriptive.
Remove the restriction that we can't outline from noreturn functions as well
since we now do the right thing.
|
|
|
|
|
|
|
|
| |
v2i64 in foldVectorXorShiftIntoCmp.
Similar to D72302 but for the canonical form for the opposite case. I've changed foldVectorXorShiftIntoCmp to form a target independent setcc node instead of PCMPGT now and enabled its for v2i64 on pre-SSE4.2 targets. The setcc should eventually get lowered to PCMPGT or the new v2i64 sequence.
Differential Revision: https://reviews.llvm.org/D72318
|
|
|
|
|
|
| |
Without sse4.2 a v2i64 setlt needs to expand into a pcmpgtd, pcmpeqd, 3 shuffles, and 2 logic ops. But if we're only interested in the sign bit of the i64 elements, we can just use one pcmpgtd and shuffle the odd elements to the even elements.
Differential Revision: https://reviews.llvm.org/D72302
|
|
|
|
|
|
| |
(trunk). NFC.
Fixes warning: loop variable 'E' of type 'const llvm::StringRef' creates a copy from type 'const llvm::StringRef' [-Wrange-loop-analysis]
|
| |
|
|
|
|
|
|
|
|
|
|
| |
SUMMARY:
In this patch, we map mergeable const objects to the read-only section in the same manner as const objects that are not mergeable.
Reviewers: hubert.reinterpretcast,jasonliu
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D71551
|
|
|
|
|
| |
The imm folding optimization pattern failed to import. The instruction
pattern was already working, but failing to fail on SGPR inputs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
not (select ?, (cmp TPred, ?, ?), (cmp FPred, ?, ?) -->
select ?, (cmp TPred', ?, ?), (cmp FPred', ?, ?)
If both sides of the select are cmps, we can remove an instruction.
The case where only side is a cmp is deferred to a possible
follow-on patch.
We have a more general 'isFreeToInvert' analysis, but I'm not seeing
a way to use that more widely without inducing infinite looping
(opposing transforms).
Here, we flip the compare predicates directly, so we should not have
any danger by creating extra intermediate 'not' ops.
Alive proofs:
https://rise4fun.com/Alive/jKa
Name: both select values are compares - invert predicates
%tcmp = icmp sle i32 %x, %y
%fcmp = icmp ugt i32 %z, %w
%sel = select i1 %cond, i1 %tcmp, i1 %fcmp
%not = xor i1 %sel, true
=>
%tcmp_not = icmp sgt i32 %x, %y
%fcmp_not = icmp ule i32 %z, %w
%not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not
Name: false val is compare - invert/not
%fcmp = icmp ugt i32 %z, %w
%sel = select i1 %cond, i1 %tcmp, i1 %fcmp
%not = xor i1 %sel, true
=>
%tcmp_not = xor i1 %tcmp, -1
%fcmp_not = icmp ule i32 %z, %w
%not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not
Differential Revision: https://reviews.llvm.org/D72007
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Attribute::getAsString doesn't have enough information to print anonymous
Module-level types correctly, so they come back as "%type 0xabcd". This results
in broken IR when printing as text.
Instead, print type-attributes (currently just byval) using the TypePrinting
infrastructure available in AsmWriter. This only applies to function argument
attributes.
|