| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
registers. Use it to add AssertZExt/AssertSExt in the live in basic blocks
Summary:
We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see the upper bits are zero to use less multiply instructions to emulate a 64 bit multiply.
This should help with this ispc issue that a coworker pointed me to https://github.com/ispc/ispc/issues/1362
Reviewers: spatel, efriedma, RKSimon, arsenm
Reviewed By: spatel
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D54725
llvm-svn: 347287
|
| |
|
|
|
|
|
|
|
|
| |
If args is empty then accesing element 0 is illegal.
https://reviews.llvm.org/D53556
Patch by Eugene Sharygin. Thanks Eugene!
llvm-svn: 347281
|
| |
|
|
| |
llvm-svn: 347278
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: sbc100
Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54734
llvm-svn: 347277
|
| |
|
|
|
|
|
|
| |
PointerAttributes is a bitwise-or of several other fields, each of
which is already printed on its own line with a better explanation.
So this doesn't really help much.
llvm-svn: 347275
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D54728
llvm-svn: 347274
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Put 'static' on three functions in an anonymous namespace as per our
coding style.
Remove the 'namespace llvm {}' around the .cpp file and explicitly
declare the free function 'llvm::optimizeGlobalCtorsList' in 'llvm::'.
I prefer this style for free functions because the compiler will error
out if the .h and .cpp files don't agree on the function name or
prototype.
llvm-svn: 347269
|
| |
|
|
|
|
| |
Now that we no longer have target specific vector extend nodes let's make the function name match the nodes we do use.
llvm-svn: 347268
|
| |
|
|
|
|
|
|
| |
GFX9 should select opsel version.
Differential Revision: https://reviews.llvm.org/D54545
llvm-svn: 347265
|
| |
|
|
|
|
|
|
| |
and switches"
This reverts commits r347183 & r347184. Crashes while building libxml.
llvm-svn: 347260
|
| |
|
|
|
|
|
|
|
| |
This works if DAG combiner is enabled, but without combining
we cannot select scalar_to_vector of <2 x half> and <2 x i16>.
Differential Revision: https://reviews.llvm.org/D54718
llvm-svn: 347259
|
| |
|
|
|
|
|
|
|
| |
Assigning a merged debug location to the `mergeStoreIntoSuccessor` phi
improves backtrace quality.
Fixes llvm.org/PR38083.
llvm-svn: 347257
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.
This can be more efficient than using pred_size(), which is a linear
time operation.
We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).
With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.
See llvm.org/PR39702 for a harder but more general way of achieving
similar results.
Differential Revision: https://reviews.llvm.org/D54686
llvm-svn: 347256
|
| |
|
|
|
|
|
|
|
|
| |
operands (PR21207)
Consistently use (!LegalOperations || isOperationLegalOrCustom) for all node pairs.
Differential Revision: https://reviews.llvm.org/D53478
llvm-svn: 347255
|
| |
|
|
| |
llvm-svn: 347253
|
| |
|
|
| |
llvm-svn: 347252
|
| |
|
|
|
|
|
|
|
|
| |
As discussed on D53794, for float types with ranges smaller than the destination integer type, then we should be able to just use a regular FP_TO_SINT opcode.
I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary.
Differential Revision: https://reviews.llvm.org/D54703
llvm-svn: 347251
|
| |
|
|
| |
llvm-svn: 347249
|
| |
|
|
|
|
|
|
|
|
|
|
| |
one we care about
We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but X86 specific tables don't care about. Our base class implementation takes care of them. We'd just like X86 backend to ignore them.
This patch makes sure the switch returned something X86 cares about and skips the table lookups and type legalization call if not. Probably more efficient too since we don't go scanning the tables for every intrinsic we could possibly see.
Differential Revision: https://reviews.llvm.org/D54711
llvm-svn: 347248
|
| |
|
|
|
|
|
|
| |
SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction.
Differential Revision: https://reviews.llvm.org/D54707
llvm-svn: 347242
|
| |
|
|
|
|
|
| |
* remove unused function
* fix compare
llvm-svn: 347241
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
non-avx2 algorithm to each 128-bit lane.
Previously we split the vectors in half to allow the two halves to be any extended then concatenated the results back together.
This patch instead instead extends the v16i8 sse algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw. Multiplies all the low half lanes and high half lanes together in separate operations. Then merges the half lane results back together using packuswb.
Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split.
Differential Revision: https://reviews.llvm.org/D54668
llvm-svn: 347240
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This will hold flags specific to subprograms. In the future
we could potentially free up scarce bits in DIFlags by moving
subprogram-specific flags from there to the new flags word.
This patch does not change IR/bitcode formats, that will be
done in a follow-up.
Differential Revision: https://reviews.llvm.org/D54597
llvm-svn: 347239
|
| |
|
|
| |
llvm-svn: 347234
|
| |
|
|
|
|
|
|
|
| |
This allows to avoid scratch use or indirect VGPR addressing for
small vectors.
Differential Revision: https://reviews.llvm.org/D54606
llvm-svn: 347231
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D52653
llvm-svn: 347229
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This makes it easier/cleaner to generate a single signature from
this directive. Also:
- Adds the symbol name, such that we don't depend on the location
of this directive anymore.
- Actually constructs the signature in the assembler, and make the
assembler own it.
- Refactor the use of MVT vs ValType in the streamer and assembler
to require less conversions overall.
- Changed 700 or so tests to use it.
Reviewers: sbc100, dschuff
Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D54652
llvm-svn: 347228
|
| |
|
|
| |
llvm-svn: 347227
|
| |
|
|
| |
llvm-svn: 347226
|
| |
|
|
|
|
| |
This reverts commit r347190.
llvm-svn: 347225
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the
TM. However, this means that overridden target features are not detected and can
result in incorrect behaviour.
Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere
in the same function).
Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54301
llvm-svn: 347221
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.
Fixes PR39653.
Reviewers: Ayal, efriedma
Subscribers: rkruppe, llvm-commits
Differential Revision: https://reviews.llvm.org/D54538
llvm-svn: 347220
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
passes
Legacy loop pass manager is issuing "Made Modification" message after each Loop Pass
run, however condition for issuing it is accumulated among all the runs.
That leads to confusing 'modification' messages as soon as the first modification is done.
Changing condition to be "current pass made modifications", similar to how
it is being done in all other pass managers.
llvm-svn: 347215
|
| |
|
|
| |
llvm-svn: 347212
|
| |
|
|
|
|
| |
This should be extended to handle FP and vectors in follow-up patches.
llvm-svn: 347210
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.
The pass extends LLVMs capabilities to use target specific instruction
selection of interleaved load patterns (e.g.: ld4 on Aarch64
architectures).
Differential Revision: https://reviews.llvm.org/D52653
llvm-svn: 347208
|
| |
|
|
| |
llvm-svn: 347207
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54225
llvm-svn: 347192
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Truncs are treated as sources if their produce a value of the same
type as the one we currently trying to promote. Truncs used to be
considered as a sink if their operand was the same value type.
We now allow smaller types in the search, so we should search through
truncs that produce a smaller value. These truncs can then be
converted to an AND mask.
This leaves sinks as being:
- points where the value in the register is being observed, such as
an icmp, switch or store.
- points where value types have to match, such as calls and returns.
- zext are included to ease the transformation and are generally
removed later on.
During this change, it also became apart from truncating sinks was
broken: if a sink used a source, its type information had already
been lost by the time the truncation happens. So I've changed the
method of caching the type information.
Differential Revision: https://reviews.llvm.org/D54515
llvm-svn: 347191
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The general approach taken is to make note of loop invariant branches, then when
we see something conditional on that branch, such as a phi, we create a copy of
the branch and (empty versions of) its successors and hoist using that.
This has no impact by itself that I've been able to see, as LICM typically
doesn't see such phis as they will have been converted into selects by the time
LICM is run, but once we start doing phi-to-select conversion later it will be
important.
Differential Revision: https://reviews.llvm.org/D52827
llvm-svn: 347190
|
| |
|
|
|
|
|
|
|
|
|
| |
There is no variable-length shifts on MSP430. Therefore
"eat" 8 bits of shift via bswap & ext.
Path by Kristina Bessonova!
Differential Revision: https://reviews.llvm.org/D54623
llvm-svn: 347187
|
| |
|
|
| |
llvm-svn: 347186
|
| |
|
|
|
|
|
|
| |
sign bit in v4i32 MULH lowering.
The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate.
llvm-svn: 347185
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces infrastructure and the simplest case for constant-folding
of branch and switch instructions within loop into unconditional branches.
It is useful as a cleanup for such passes as loop unswitching that sometimes
produce such branches.
Only the simplest case supported in this patch: after the folding, no block
should become dead or stop being part of the loop. Support for more
sophisticated cases will go separately in follow-up patches.
Differential Revision: https://reviews.llvm.org/D54021
Reviewed By: anna
llvm-svn: 347183
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every Analysis pass has a get method that returns a reference of the Result of
the Analysis, for example, BlockFrequencyInfo
&BlockFrequencyInfoWrapperPass::getBFI(). I believe that
ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning
a pointer.
Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock,
respectively. Most methods use BB as the argument of variable names while
methods usually refer to Basic Blocks as Blocks, instead of BB. For example,
Function::getEntryBlock, Loop:getExitBlock, etc.
I also fixed one of the comments.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D54669
llvm-svn: 347182
|
| |
|
|
|
|
|
|
| |
extending to v2i64 pre-sse4.1
Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.
llvm-svn: 347181
|
| |
|
|
|
|
|
|
| |
-x86-experimental-vector-widening-legalization.
Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.
llvm-svn: 347180
|
| |
|
|
|
|
| |
conversions.
llvm-svn: 347177
|
| |
|
|
|
|
|
|
| |
Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing to two sign extends but only using half of the v4i32 intermediate result.
When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using an punpackldq and punpackhdq.
llvm-svn: 347176
|
| |
|
|
|
|
| |
SSE vector shifts only use the bottom 64-bits of the shift amount vector.
llvm-svn: 347173
|