visitINSERT_SUBVECTOR (PR37989)
This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling: its bounds check was bailing out on indices that are actually safe.
llvm-svn: 347313
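A minimal sketch of the kind of off-by-one described above, using a hypothetical helper (this is not the actual SimplifyDemandedVectorElts code): writing the bail-out with ">=" also rejects the perfectly safe case where the subvector ends exactly at the last element.

    // C++ sketch, hypothetical helper name.
    // Valid insertions satisfy BaseIdx + NumSubElts <= NumElts.
    bool insertSubvectorInBounds(unsigned NumElts, unsigned NumSubElts,
                                 unsigned BaseIdx) {
      // Off-by-one variant of the bail-out:
      //   if (BaseIdx + NumSubElts >= NumElts) return false; // rejects the last safe index
      return BaseIdx + NumSubElts <= NumElts;                 // correct bound
    }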
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to the machine scheduler, those missing itineraries might not have an impact on the actual scheduling,
because we can still get the same latency from the default values.
With the machine scheduler, however, itineraries will have an impact on scheduling.
For example, NumMicroOps defaults to 0 if there is no itinerary for a specific instruction class,
while most instruction classes that do have itineraries default NumMicroOps to 1.
This affects the count of RetiredMOps and the Pending/Available queues,
and can lead to different, and potentially suboptimal, scheduling.
This patch handles STWU/STWUX (IIC_LdStStoreUpd) for P8.
Since there are already multiple itinerary classes for store-with-update, this patch also merges
IIC_LdStSTDU/IIC_LdStStoreUpd into IIC_LdStSTU and
IIC_LdStSTDUX into IIC_LdStSTUX,
and a new test case is added in https://reviews.llvm.org/D54699 to show the difference.
Differential Revision: https://reviews.llvm.org/D54700
llvm-svn: 347311
llvm-svn: 347308
instructions.
Pull out the getPackDemandedElts demanded-elts remapping helper from computeKnownBitsForTargetNode and use it in computeKnownBits/ComputeNumSignBits.
llvm-svn: 347303
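For reference, a hedged sketch of what such a remapping does; the names and signature are illustrative rather than the actual LLVM helper. PACKSS/PACKUS narrows two inputs per 128-bit lane, so each demanded element of the result maps to exactly one element of one input.

    #include <llvm/ADT/APInt.h>

    // Map demanded elements of a PACK result back onto its two (wider) inputs.
    // The low half of each output lane comes from LHS, the high half from RHS.
    static void getPackDemandedEltsSketch(unsigned NumLanes, unsigned OutEltsPerLane,
                                          const llvm::APInt &DemandedElts,
                                          llvm::APInt &DemandedLHS,
                                          llvm::APInt &DemandedRHS) {
      unsigned InEltsPerLane = OutEltsPerLane / 2; // inputs have half as many elts per lane
      unsigned NumInElts = NumLanes * InEltsPerLane;
      DemandedLHS = llvm::APInt(NumInElts, 0);
      DemandedRHS = llvm::APInt(NumInElts, 0);
      for (unsigned Lane = 0; Lane != NumLanes; ++Lane)
        for (unsigned I = 0; I != OutEltsPerLane; ++I) {
          if (!DemandedElts[Lane * OutEltsPerLane + I])
            continue;
          llvm::APInt &Src = (I < InEltsPerLane) ? DemandedLHS : DemandedRHS;
          Src.setBit(Lane * InEltsPerLane + (I % InEltsPerLane));
        }
    }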
SM_SentinelZero
Noticed while working on improving demanded-elts target shuffle combining.
llvm-svn: 347302
For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by expanding the demanded-elements mask into a demanded-bits mask.
I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem.
Differential Revision: https://reviews.llvm.org/D54679
llvm-svn: 347301
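A small illustration of the mask conversion, assuming a simple layout where element i covers bits [i*EltSizeInBits, (i+1)*EltSizeInBits); this is a sketch, not the actual DAGCombiner code.

    #include <llvm/ADT/APInt.h>

    // Expand a demanded-elements mask into a demanded-bits mask so that
    // SimplifyDemandedBits can be reused for the bitcast source.
    static llvm::APInt demandedEltsToDemandedBits(const llvm::APInt &DemandedElts,
                                                  unsigned EltSizeInBits) {
      unsigned NumElts = DemandedElts.getBitWidth();
      llvm::APInt DemandedBits(NumElts * EltSizeInBits, 0);
      for (unsigned I = 0; I != NumElts; ++I)
        if (DemandedElts[I])
          DemandedBits.setBits(I * EltSizeInBits, (I + 1) * EltSizeInBits);
      return DemandedBits;
    }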
As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier.
llvm-svn: 347300
instructions.
As discussed on rL347240.
llvm-svn: 347299
where all the even or odd elements are undef.
Previously, if V2 was unused, we ended up using V1 for both inputs in the code path that follows the newly added code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output.
As near as I can tell this makes v16i8 behavior consistent with every other VT now.
This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types.
llvm-svn: 347296
This helps with a future patch and makes us less reliant on DAG combine merging shuffles.
llvm-svn: 347295
getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it.
The test changes are because MULH lowering happens later than it should and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast.
llvm-svn: 347290
and switches"
The initial version of the patch lacked PHI node updates in the destinations of removed
edges. This version contains this update along with tests for this situation.
Differential Revision: https://reviews.llvm.org/D54021
llvm-svn: 347289
Turns out that there was no check for a store that truncates down
to a single byte when combining a (store (bswap...)) into a byte-swapping
store. This patch just adds that check.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.
llvm-svn: 347288
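A hedged sketch of the added guard (simplified; the real combine and its surrounding pattern match are more involved): a store that truncates the bswapped value down to a single byte has nothing left to byte-swap, so the fold must be skipped.

    #include <llvm/CodeGen/SelectionDAGNodes.h>

    // Only fold (store (bswap x)) into a byte-swapping store when the store
    // writes the full value.
    static bool canUseByteSwappingStore(const llvm::StoreSDNode *ST) {
      return !ST->isTruncatingStore(); // e.g. an i32 bswap stored as i8 must not be folded
    }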
registers. Use it to add AssertZExt/AssertSExt in the live in basic blocks
Summary:
We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see that the upper bits are zero, so we can use fewer multiply instructions to emulate a 64-bit multiply.
This should help with an ispc issue that a coworker pointed me to: https://github.com/ispc/ispc/issues/1362
Reviewers: spatel, efriedma, RKSimon, arsenm
Reviewed By: spatel
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D54725
llvm-svn: 347287
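An illustration, in plain C++ rather than LLVM code, of why the zero-extension information matters: emulating a 64-bit multiply from 32-bit halves normally needs three multiplies, but once known-bits analysis (here fed by the AssertZExt on the live-in) proves the upper halves are zero, only the low product survives.

    #include <cstdint>

    uint64_t mul64(uint32_t ALo, uint32_t AHi, uint32_t BLo, uint32_t BHi) {
      uint64_t Lo = (uint64_t)ALo * BLo;
      uint64_t Cross = (uint64_t)ALo * BHi + (uint64_t)AHi * BLo; // folds away when AHi == BHi == 0
      return Lo + (Cross << 32); // AHi * BHi overflows past 64 bits and is always dropped
    }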
If args is empty then accessing element 0 is illegal.
https://reviews.llvm.org/D53556
Patch by Eugene Sharygin. Thanks Eugene!
llvm-svn: 347281
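A generic illustration of the bug class being fixed (hypothetical code, not the code touched by the patch): only form a pointer to the first element when the container is non-empty.

    #include <vector>

    const int *firstArgOrNull(const std::vector<int> &Args) {
      // Undefined behaviour when Args is empty:
      //   return &Args[0];
      return Args.empty() ? nullptr : Args.data();
    }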
llvm-svn: 347278
Reviewers: sbc100
Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54734
llvm-svn: 347277
PointerAttributes is a bitwise-or of several other fields, each of
which is already printed on its own line with a better explanation.
So this doesn't really help much.
llvm-svn: 347275
Differential Revision: https://reviews.llvm.org/D54728
llvm-svn: 347274
Put 'static' on three functions in an anonymous namespace as per our
coding style.
Remove the 'namespace llvm {}' around the .cpp file and explicitly
declare the free function 'llvm::optimizeGlobalCtorsList' in 'llvm::'.
I prefer this style for free functions because the compiler will error
out if the .h and .cpp files don't agree on the function name or
prototype.
llvm-svn: 347269
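An illustration of the convention described above, with hypothetical names: file-local helpers are static inside an anonymous namespace, while the public entry point is declared in the header's llvm namespace and defined with an explicit llvm:: qualifier, so any mismatch with the header becomes a compile error instead of a silently unused new function.

    #include <llvm/IR/Function.h>
    #include <llvm/IR/Module.h>

    // What the header would declare (hypothetical name):
    namespace llvm {
    bool optimizeGlobalCtorsListExample(Module &M);
    }

    // In the .cpp file, helpers stay internal:
    namespace {
    static bool hasNoBody(const llvm::Function &F) { return F.empty(); }
    } // end anonymous namespace

    // The free function is defined with an explicit llvm:: qualifier:
    bool llvm::optimizeGlobalCtorsListExample(Module &M) {
      bool Changed = false;
      for (llvm::Function &F : M)
        Changed |= hasNoBody(F); // placeholder work
      return Changed;
    }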
Now that we no longer have target specific vector extend nodes let's make the function name match the nodes we do use.
llvm-svn: 347268
GFX9 should select the opsel version.
Differential Revision: https://reviews.llvm.org/D54545
llvm-svn: 347265
and switches"
This reverts commits r347183 & r347184. Crashes while building libxml.
llvm-svn: 347260
This works if the DAG combiner is enabled, but without combining
we cannot select scalar_to_vector of <2 x half> and <2 x i16>.
Differential Revision: https://reviews.llvm.org/D54718
llvm-svn: 347259
Assigning a merged debug location to the `mergeStoreIntoSuccessor` phi
improves backtrace quality.
Fixes llvm.org/PR38083.
llvm-svn: 347257
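A hedged sketch of the idea using the existing DILocation API (illustrative, not the patch itself): when a new PHI merges stored values coming from two instructions, give it their merged debug location rather than no location at all.

    #include <llvm/IR/DebugInfoMetadata.h>
    #include <llvm/IR/Instructions.h>

    static void setMergedDebugLoc(llvm::PHINode &Phi, const llvm::Instruction &A,
                                  const llvm::Instruction &B) {
      const llvm::DILocation *LocA = A.getDebugLoc();
      const llvm::DILocation *LocB = B.getDebugLoc();
      if (LocA && LocB)
        Phi.setDebugLoc(llvm::DILocation::getMergedLocation(LocA, LocB));
    }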
Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.
This can be more efficient than using pred_size(), which is a linear
time operation.
We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).
With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.
See llvm.org/PR39702 for a harder but more general way of achieving
similar results.
Differential Revision: https://reviews.llvm.org/D54686
llvm-svn: 347256
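A hedged sketch of the spirit of these helpers (the real ones are members of BasicBlock): walk at most N+1 predecessors instead of counting all of them the way pred_size() does, so checks such as "exactly two predecessors" stay cheap on blocks with huge predecessor lists.

    #include <llvm/IR/BasicBlock.h>
    #include <llvm/IR/CFG.h>

    static bool hasExactlyNPredecessorsSketch(const llvm::BasicBlock *BB, unsigned N) {
      unsigned Seen = 0;
      for (const llvm::BasicBlock *Pred : llvm::predecessors(BB)) {
        (void)Pred;
        if (++Seen > N)
          return false; // more than N; stop walking the list
      }
      return Seen == N;
    }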
operands (PR21207)
Consistently use (!LegalOperations || isOperationLegalOrCustom) for all node pairs.
Differential Revision: https://reviews.llvm.org/D53478
llvm-svn: 347255
llvm-svn: 347253
llvm-svn: 347252
As discussed on D53794, for float types whose range is smaller than the destination integer type we should be able to just use a regular FP_TO_SINT opcode.
I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary.
Differential Revision: https://reviews.llvm.org/D54703
llvm-svn: 347251
llvm-svn: 347249
one we care about
We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but that the X86-specific tables don't care about. Our base class implementation takes care of them; we'd just like the X86 backend to ignore them.
This patch makes sure the switch returned something X86 cares about, and skips the table lookups and the type legalization call if not. It is probably more efficient too, since we no longer scan the tables for every intrinsic we could possibly see.
Differential Revision: https://reviews.llvm.org/D54711
llvm-svn: 347248
SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking each byte down to its lower nibble, but that value is only actually used in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction.
Differential Revision: https://reviews.llvm.org/D54707
llvm-svn: 347242
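A scalar illustration of the nibble-level ctlz (the actual lowering does 16 bytes at a time with a PSHUFB lookup table): the low-nibble lookup is only consumed when the high nibble is zero, i.e. when the byte is already less than 16, which is why the explicit mask can be dropped in the vector code.

    #include <cstdint>

    // ctlz of a 4-bit value (0 -> 4 leading zeros within the nibble).
    static const uint8_t CtlzNibble[16] = {4, 3, 2, 2, 1, 1, 1, 1,
                                           0, 0, 0, 0, 0, 0, 0, 0};

    uint8_t ctlz8(uint8_t B) {
      uint8_t Hi = B >> 4;
      uint8_t Lo = B & 0x0F; // redundant in the only case it is used: Hi == 0 implies B < 16
      return Hi ? CtlzNibble[Hi] : uint8_t(4 + CtlzNibble[Lo]);
    }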
* remove unused function
* fix compare
llvm-svn: 347241
non-avx2 algorithm to each 128-bit lane.
Previously we split the vectors in half to allow the two halves to be any_extended, then concatenated the results back together.
This patch instead extends the v16i8 SSE algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw, multiplies all the low-half lanes and all the high-half lanes together in separate operations, then merges the half-lane results back together using packuswb.
Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split.
Differential Revision: https://reviews.llvm.org/D54668
llvm-svn: 347240
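A hedged illustration of the per-lane scheme with SSE2 intrinsics (the patch itself operates on SelectionDAG nodes; on AVX2 the same steps run in each 128-bit lane): unpack each half of the lane to 16 bits, multiply, mask to the low byte and pack back together.

    #include <emmintrin.h>

    __m128i mullo_epi8(__m128i A, __m128i B) {
      __m128i Zero = _mm_setzero_si128();
      __m128i ALo = _mm_unpacklo_epi8(A, Zero);  // punpcklbw
      __m128i BLo = _mm_unpacklo_epi8(B, Zero);
      __m128i AHi = _mm_unpackhi_epi8(A, Zero);  // punpckhbw
      __m128i BHi = _mm_unpackhi_epi8(B, Zero);
      __m128i Lo = _mm_mullo_epi16(ALo, BLo);    // pmullw
      __m128i Hi = _mm_mullo_epi16(AHi, BHi);
      __m128i Mask = _mm_set1_epi16(0x00FF);     // keep only the low byte of each product
      return _mm_packus_epi16(_mm_and_si128(Lo, Mask), _mm_and_si128(Hi, Mask)); // packuswb
    }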
This will hold flags specific to subprograms. In the future
we could potentially free up scarce bits in DIFlags by moving
subprogram-specific flags from there to the new flags word.
This patch does not change the IR/bitcode formats; that will be
done in a follow-up.
Differential Revision: https://reviews.llvm.org/D54597
llvm-svn: 347239
llvm-svn: 347234
This allows us to avoid scratch use or indirect VGPR addressing for
small vectors.
Differential Revision: https://reviews.llvm.org/D54606
llvm-svn: 347231
Differential Revision: https://reviews.llvm.org/D52653
llvm-svn: 347229
Summary:
This makes it easier/cleaner to generate a single signature from
this directive. Also:
- Adds the symbol name, such that we don't depend on the location
of this directive anymore.
- Actually constructs the signature in the assembler, and makes the
assembler own it.
- Refactors the use of MVT vs ValType in the streamer and assembler
to require fewer conversions overall.
- Changed 700 or so tests to use it.
Reviewers: sbc100, dschuff
Subscribers: jgravelle-google, eraman, aheejin, sunfish, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D54652
llvm-svn: 347228
llvm-svn: 347227
llvm-svn: 347226
This reverts commit r347190.
llvm-svn: 347225
Summary:
AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the
TM. However, this means that overridden target features are not detected and can
result in incorrect behaviour.
Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere
in the same function).
Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54301
llvm-svn: 347221
Summary:
Currently, when vectorizing stores to uniform addresses, the only
case in which we prevent vectorization is when there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.
Fixes PR39653.
Reviewers: Ayal, efriedma
Subscribers: rkruppe, llvm-commits
Differential Revision: https://reviews.llvm.org/D54538
llvm-svn: 347220
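A C++ illustration of the kind of loop now rejected (hypothetical code, not one of the patch's tests): q is a uniform, loop-invariant address, and each iteration loads the value the previous iteration stored, so vectorizing would break the cross-iteration load/store dependency.

    void accumulate(int *q, const int *a, int n) {
      for (int i = 0; i < n; ++i)
        *q = *q + a[i]; // the load of *q depends on the store from iteration i-1
    }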
passes
The legacy loop pass manager issues a "Made Modification" message after each loop pass
run; however, the condition for issuing it is accumulated across all of the runs.
That leads to confusing 'modification' messages as soon as the first modification is done.
Change the condition to be "the current pass made modifications", similar to how
it is done in all other pass managers.
llvm-svn: 347215
llvm-svn: 347212
This should be extended to handle FP and vectors in follow-up patches.
llvm-svn: 347210
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.
The pass extends LLVM's capabilities to use target-specific instruction
selection of interleaved load patterns (e.g. ld4 on AArch64
architectures).
Differential Revision: https://reviews.llvm.org/D52653
llvm-svn: 347208
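A small example of the access pattern the pass targets (hypothetical code): four interleaved channels read per iteration. After vectorization this appears as wide loads feeding de-interleaving shufflevector instructions, which the InterleavedAccessPass can then select as a single ld4 on AArch64.

    void splitRGBA(const unsigned char *Px, unsigned char *R, unsigned char *G,
                   unsigned char *B, unsigned char *A, int N) {
      for (int I = 0; I < N; ++I) {
        R[I] = Px[4 * I + 0];
        G[I] = Px[4 * I + 1];
        B[I] = Px[4 * I + 2];
        A[I] = Px[4 * I + 3];
      }
    }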
llvm-svn: 347207