| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
| |
The code was failing to actually check for the presence of the call to widenable_condition. The whole point of specifying the widenable_condition intrinsic was allowing widening transforms. A normal branch is not widenable. A normal branch leading to a deopt is not widenable (in general).
I added a test case via LoopPredication, but GuardWidening has an analogous bug. Those are the only two passes actually using this utility just yet. Noticed while working on LoopPredication for non-widenable branches; POC in D60111.
llvm-svn: 357493
|
| |
|
|
| |
llvm-svn: 357490
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This makes libLLVMBPFCodeGen.so 1128 bytes smaller for my build.
Reviewers: yonghong-song
Reviewed By: yonghong-song
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60117
llvm-svn: 357489
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.
Reviewers: fedor.sergeev, majnemer
Reviewed By: majnemer
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60080
llvm-svn: 357485
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
actually hot.
Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets.
Reviewers: danielcdh, wmi
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59940
llvm-svn: 357484
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.
Reviewers: fedor.sergeev, majnemer
Reviewed By: majnemer
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60079
llvm-svn: 357483
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For shift and rotate instructions that only use the last 6 bits of the shift
amount, a shift amount of (x*64-s) can be substituted with (-s). This saves
one instruction and a register:
lhi %r1, 64
sr %r1, %r3
sllg %r2, %r2, 0(%r1)
=>
lcr %r1, %r3
sllg %r2, %r2, 0(%r1)
Review: Ulrich Weigand
llvm-svn: 357481
|
| |
|
|
|
|
|
|
|
| |
`StoreInst::getValueOperand` is identical to `getOperand(0)`, so the call to
`getOperand(0)` can be replaced. Further, `SI->getValueOperand` is redundantly
called just a few lines down, despite its return value being stored in variable
`DV`. No functional change.
llvm-svn: 357479
|
| |
|
|
|
|
| |
All issues found by machine verifier in MIPS target have been fixed.
llvm-svn: 357473
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To disable using of odd floating-point registers (O32 ABI and
-mno-odd-spreg command line option) such registers and their
super-registers added to the set of reserved registers. In general, it
works. But there is at least one problem - in case of enabled machine
verifier pass some floating-point tests failed because live ranges of
register units that are reserved is not empty and verification pass
failed with "Live segment doesn't end at a valid instruction" error
message.
There is D35985 patch which tries to solve the problem by explicit
removing of register units. This solution did not get approval.
I would like to use another approach for prevent using odd floating
point registers - define `AltOrders` and `AltOrderSelect` for MIPS
floating point register classes. Such `AltOrders` contains reduced set
of registers. At first glance, such solution does not break any test
cases and allows enabling machine instruction verification for all MIPS
test cases.
Differential Revision: http://reviews.llvm.org/D59799
llvm-svn: 357472
|
| |
|
|
|
|
|
| |
- ObjectYAML depends on Object as minidump support adds additional
dependency.
llvm-svn: 357471
|
| |
|
|
|
|
|
|
|
|
| |
This patch allows symbols appended with @plt to parse and assemble with the
R_RISCV_CALL_PLT relocation.
Differential Revision: https://reviews.llvm.org/D55335
Patch by Lewis Revill.
llvm-svn: 357470
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds the code needed to parse a minidump file into the
MinidumpYAML model, and the necessary glue code so that obj2yaml can
recognise the minidump files and process them.
Reviewers: jhenderson, zturner, clayborg
Subscribers: mgorny, lldb-commits, amccarth, markmentovai, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59634
llvm-svn: 357469
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are various places in LLVM where the definition of StackID is not
properly honoured, for example in PEI where objects with a StackID > 0 are
allocated on the default stack (StackID0). This patch enforces that PEI
only considers allocating objects to StackID 0.
Reviewers: arsenm, thegameg, MatzeB
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60062
llvm-svn: 357460
|
| |
|
|
|
|
| |
This makes it faster and saves 104 bytes for my build.
llvm-svn: 357458
|
| |
|
|
|
|
|
|
| |
-internalize-public-api-file
This makes my libLLVMipo.so.9svn smaller by 360 bytes.
llvm-svn: 357457
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
used results (PR41259)
The code was previously checking that candidates for sinking had exactly
one use or were a store instruction (which can't have uses). This meant
we could sink call instructions only if they had a use.
That limitation seemed a bit arbitrary, so this patch changes it to
"instruction has zero or one use" which seems more natural and removes
the need to special-case stores.
Differential revision: https://reviews.llvm.org/D59936
llvm-svn: 357452
|
| |
|
|
|
|
| |
The code doesn't actually need any of the information about the widenable condition at this level. The only thing we need is to ensure the WC call is the last thing anded in, and even that is a quirk we should really look to remove.
llvm-svn: 357448
|
| |
|
|
|
|
|
|
|
|
| |
isPotentiallyReachable.
The leads to some ambiguous overloads, so update three callers.
Differential Revision: https://reviews.llvm.org/D60085
llvm-svn: 357447
|
| |
|
|
|
|
| |
All of the interfaces related to opcode in MachineInstr and MCInstrInfo refer to opcodes as unsigned.
llvm-svn: 357444
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
There's an existing optimization for x != C, but somehow it was missing
a special case for 0.
While I'm here, also cleaned up the code/comments a bit: the second
value produced by the MERGE_VALUES was actually dead, since a CMOV only
produces one result.
Differential Revision: https://reviews.llvm.org/D59616
llvm-svn: 357437
|
| |
|
|
|
|
|
|
|
|
|
|
| |
It's a little tricky to make this issue show up because
prologue/epilogue emission normally likes to push at least two
registers... but it doesn't when lr is force-spilled due to function
length. Not sure if that really makes sense, but I decided not to touch
it for now.
Differential Revision: https://reviews.llvm.org/D59385
llvm-svn: 357436
|
| |
|
|
| |
llvm-svn: 357433
|
| |
|
|
|
|
|
|
|
| |
This improves selection for vector stores into v2s64s. Before we just
scalarized them, but we can just use a STRQui instead.
Differential Revision: https://reviews.llvm.org/D60083
llvm-svn: 357432
|
| |
|
|
|
|
|
|
| |
whitespace.
Differential Revision: https://reviews.llvm.org/D60084
llvm-svn: 357427
|
| |
|
|
|
|
| |
Fixes a bug in isPotentiallyReachable, noticed by inspection.
llvm-svn: 357425
|
| |
|
|
|
|
|
|
|
|
| |
X86::OPERAND_ROUNDING_CONTROL instead of MCOI::OPERAND_IMMEDIATE. Add an assert on legal values of rounding control in the encoder and remove an explicit mask.
This should allow llvm-exegesis to intelligently constrain the rounding mode.
The mask in the encoder shouldn't be necessary any more. We used to allow codegen to use 8-11 for rounding mode and the assembler would use 0-3 to mean the same thing so we masked here and in the printer. Codegen now matches the assembler and the printer was updated, but I forgot to update the encoder.
llvm-svn: 357419
|
| |
|
|
|
|
| |
methods. NFCI.
llvm-svn: 357416
|
| |
|
|
| |
llvm-svn: 357414
|
| |
|
|
|
|
| |
We'd been optimizing the case where the predicate was obviously true, do the same for the false case. Mostly just for completeness sake, but also may improve compile time in loops which will exit through the guard. Such loops are presumed rare in fastpath code, but may be present down untaken paths, so optimizing for them is still useful.
llvm-svn: 357408
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously, we translate llvm.round to PTX cvt.rni, which rounds to the
even interger when the source is equidistant between two integers. This
is not correct as llvm.round should round away from zero. This change
replaces llvm.round with a round away from zero implementation through
target specific custom lowering.
Modify a few affected tests to not check for cvt.rni. Instead, we check
for the use of a few constants used in implementing round. We are also
adding CUDA runnable tests to check for the values produced by
llvm.round to test-suites/External/CUDA.
Reviewers: tra
Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59947
llvm-svn: 357407
|
| |
|
|
|
|
| |
LoopPredication was replacing the original condition, but leaving the instructions to compute the old conditions around. This would get cleaned up by other passes of course, but we might as well do it eagerly. That also makes the test output less confusing.
llvm-svn: 357406
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change incorporates an effort by Connor Abbot to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.
Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would cause sometimes severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).
This fix instead runs through the WWM sections and pre allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).
Differential Revision: https://reviews.llvm.org/D59295
llvm-svn: 357400
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This instruction writes a block of allocation tags
and stores zero to the associated data locations.
It differs from STGM by 1 bit and has the same
arguments.
The specification can be found here:
https://developer.arm.com/docs/ddi0596/c
Differential Revision: https://reviews.llvm.org/D60065
llvm-svn: 357397
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces the addition of VK_RISCV_CALL in RISCVMCCodeEmitter by
creating the RISCVMCExpr when tail/call are parsed, or in the codegen case
when the callee symbols are created.
This required adding a new CallSymbol operand to allow only adding
VK_RISCV_CALL to tail/call instructions.
This patch will allow further expansion of parsing and codegen to easily
include PLT symbols which must generate the R_RISCV_CALL_PLT relocation.
Differential Revision: https://reviews.llvm.org/D55560
Patch by Lewis Revill.
llvm-svn: 357396
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The STGV/LDGV instructions were replaced with
STGM/LDGM. The encodings remain the same but there
is no longer writeback so there are no unpredictable
encodings to check for.
The specfication can be found here:
https://developer.arm.com/docs/ddi0596/c
Differential Revision: https://reviews.llvm.org/D60064
llvm-svn: 357395
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds an implementation of a PC-relative addressing sequence to be
used when -mcmodel=medium is specified. With absolute addressing, a 'medium'
codemodel may cause addresses to be out of range. This is because while
'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as
opposed to 'small', which implies the first 2 GiB only.
Note that LLVM/Clang currently specifies code models differently to GCC, where
small and medium imply the same functionality as GCC's medlow and medany
respectively.
Differential Revision: https://reviews.llvm.org/D54143
Patch by Lewis Revill.
llvm-svn: 357393
|
| |
|
|
|
|
|
|
|
|
|
| |
The latest version of the MTE spec added a system
register 'GMID_EL1'. It contains the block size used
by the LDGM and STGM instructions and is read only.
The specification can be found here:
https://developer.arm.com/docs/ddi0596/c
llvm-svn: 357392
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
evaluateInDifferentElementOrder
Summary:
This fixes PR41270.
The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.
Reviewers: reames, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60058
llvm-svn: 357389
|
| |
|
|
| |
llvm-svn: 357388
|
| |
|
|
|
|
|
|
|
|
|
| |
evaluateInDifferentElementOrder"
This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5.
I'll recommit with a better commit message with reference to the
phabricator review.
llvm-svn: 357387
|
| |
|
|
|
|
|
|
|
|
|
|
| |
evaluateInDifferentElementOrder
This fixes PR41270.
The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.
llvm-svn: 357385
|
| |
|
|
|
|
|
|
|
|
| |
Subscribers: MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59688
llvm-svn: 357384
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have a commutable vector binop with inverted select-shuffles,
we don't care about the order of the operands in each vector lane:
LHS = shuffle V1, V2, <0, 5, 6, 3>
RHS = shuffle V2, V1, <0, 5, 6, 3>
LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2
PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...is currently titled as an SLP enhancement, but at least for the
given example, we can reduce that in instcombine because we are just
eliminating shuffles.
As noted in the TODO, this could be generalized, but I haven't thought
through those patterns completely, so this is limited to what appears
to be always safe.
Differential Revision: https://reviews.llvm.org/D60048
llvm-svn: 357382
|
| |
|
|
|
|
|
|
|
| |
Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and
`fcmp ord` would be inefficient due to an unoptimized double negation.
Differential Revision: https://reviews.llvm.org/D59699
llvm-svn: 357378
|
| |
|
|
|
|
|
|
|
| |
and scatter intrinsics.
This is the appropriate opcode for only having a chain output. Though I'm not
sure it matters much.
llvm-svn: 357375
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
linker relaxation)
A pcrel_lo will point to the associated pcrel_hi fixup which in turn points to
the real target. RISCVMCExpr::evaluatePCRelLo will work around this
indirection in order to allow the fixup to be evaluate properly. However, if
relocations are forced (e.g. due to linker relaxation is enabled) then its
evaluation is undesired and will result in a relocation with the wrong target.
This patch modifies evaluatePCRelLo so it will not try to evaluate if the
fixup will be forced as a relocation. A new helper method is added to
RISCVAsmBackend to query this.
Differential Revision: https://reviews.llvm.org/D59686
llvm-svn: 357374
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One motivation for making this change is that the lack of using movmsk is likely
a main source of perf difference between clang and gcc on the C-Ray benchmark as
shown here:
https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5
...but this change alone isn't enough to solve that problem.
The 'all-of' examples show what is likely the worst case trade-off: we end up with
an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of'
examples look clearly better using movmsk because we've traded 2 vector instructions
for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'.
If we examine the llvm-mca output for these cases, it appears that even though the
'all-of' movmsk variant looks worse on paper, it would perform better on both
Haswell and Jaguar.
$ llvm-mca -mcpu=haswell no_movmsk.s -timeline
Iterations: 100
Instructions: 400
Total Cycles: 504
Total uOps: 400
Dispatch Width: 4
uOps Per Cycle: 0.79
IPC: 0.79
Block RThroughput: 1.0
$ llvm-mca -mcpu=haswell movmsk.s -timeline
Iterations: 100
Instructions: 600
Total Cycles: 358
Total uOps: 600
Dispatch Width: 4
uOps Per Cycle: 1.68
IPC: 1.68
Block RThroughput: 1.5
$ llvm-mca -mcpu=btver2 no_movmsk.s -timeline
Iterations: 100
Instructions: 400
Total Cycles: 407
Total uOps: 400
Dispatch Width: 2
uOps Per Cycle: 0.98
IPC: 0.98
Block RThroughput: 2.0
$ llvm-mca -mcpu=btver2 movmsk.s -timeline
Iterations: 100
Instructions: 600
Total Cycles: 311
Total uOps: 600
Dispatch Width: 2
uOps Per Cycle: 1.93
IPC: 1.93
Block RThroughput: 3.0
Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if
that's true, then we're also almost certainly making the wrong transform already for
reductions with >2 elements, so that should be fixed independently.
Differential Revision: https://reviews.llvm.org/D59997
llvm-svn: 357367
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...we have a case where we want to fold a binop of select-shuffle (blended) values.
Rather than try to match commuted variants of the pattern, we can canonicalize the
shuffles and check for mask equality with commuted operands.
We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a
special case that the backend is required to handle because we already canonicalize
vector select to this shuffle form.
So there should be no codegen difference from this change. It's possible that this
improves CSE in IR though.
Differential Revision: https://reviews.llvm.org/D60016
llvm-svn: 357366
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: llvm.org, Jim
Reviewed By: Jim
Subscribers: arsenm, jvesely, nhaehnle, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59983
llvm-svn: 357365
|