| Commit message | Author | Age | Files | Lines |
llvm-svn: 373503
|
of bits that might be undef
The previous code tried to do a trick where we would extract the subvector from the location we were inserting. Then xor that with the new value. Take the xored value and clear out the bits above the subvector size. Then shift that xored subvector to the insert location. And finally xor that with the original vector. Since the old subvector was used in both xors, this would leave just the new subvector at the inserted location. Since the surrounding bits had been zeroed no other bits of the original vector would be modified.
Unfortunately, if the old subvector came from undef we might aggressively propagate the undef. Then we end up with the XORs not cancelling because they aren't using the same value for the two uses of the old subvector. @bkramer gave me a case that demonstrated this, but we haven't reduced it enough to make it easily readable to see what's happening.
This patch uses a safer, but more costly approach. It isolates the bits above the insertion point and the bits below it, and ORs those together, leaving 0 for the insertion location. It then widens the subvector with 0s in the upper bits and shifts it into position with 0s in the lower bits. Then we do another OR.
Differential Revision: https://reviews.llvm.org/D68311
llvm-svn: 373495
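A minimal sketch of the two strategies described above, modelling mask vectors as plain Python integers. The function names and the integer model are assumptions made for illustration and are not taken from the patch. `xor_insert_mismatched` models what can happen when the two reads of an undef old subvector disagree.

```python
def insert_bits_xor(vec, sub, idx, sub_width, vec_width):
    """XOR trick: correct only if both uses of the old subvector agree."""
    sub_mask = (1 << sub_width) - 1
    old = (vec >> idx) & sub_mask          # extract the old subvector
    delta = (old ^ sub) & sub_mask         # xor with the new value, clear upper bits
    return (vec ^ (delta << idx)) & ((1 << vec_width) - 1)


def insert_bits_or(vec, sub, idx, sub_width, vec_width):
    """OR approach: zero the insertion location, then OR in the new bits."""
    vec_mask = (1 << vec_width) - 1
    sub_mask = (1 << sub_width) - 1
    high = (vec >> (idx + sub_width)) << (idx + sub_width)  # bits above the insertion
    low = vec & ((1 << idx) - 1)                            # bits below the insertion
    hole = (high | low) & vec_mask                          # 0s at the insertion location
    return hole | ((sub & sub_mask) << idx)


def xor_insert_mismatched(sub, first_read, second_read, sub_width):
    """Bits left at the insertion location when the two reads of the old
    subvector (an undef value in the failing case) see different values."""
    return (first_read ^ sub ^ second_read) & ((1 << sub_width) - 1)
```

With consistent reads both functions agree. With mismatched reads the XORs no longer cancel, which is the failure the message describes.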
|
Summary:
64-bit WebAssembly (wasm64) is not specified and not supported in the
WebAssembly backend. We do have support for it in clang, however, and
we would like to keep that support because we expect wasm64 to be
specified and supported in the future. For now add an error when
trying to use wasm64 from the backend to minimize user confusion from
unexplained crashes.
Reviewers: aheejin, dschuff, sunfish
Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68254
llvm-svn: 373493
|
Summary:
Extend the cachepolicy operand in the new VMEM buffer intrinsics
to supply information on whether the buffer data is swizzled.
Also, propagate this information to MIR.
Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store
Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.
The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.
There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.
Reviewers: arsenm, nhaehnle, tpr
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68200
llvm-svn: 373491
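A rough sketch of the merge rule described above. The bit position used for the "swizzled" flag (bit 3 of the cachepolicy immediate) and the helper names are assumptions of this sketch; the other conditions the Load/Store optimizer checks are ignored here.

```python
# Assumption for illustration: the "swizzled" flag occupies bit 3 of the
# cachepolicy immediate. The commit message does not state the position.
SWZ_BIT = 1 << 3


def is_swizzled(cachepolicy):
    """True when the (assumed) swizzled bit is set in the cachepolicy operand."""
    return bool(cachepolicy & SWZ_BIT)


def can_merge(cachepolicy_a, cachepolicy_b):
    """Two buffer accesses are merge candidates only if neither is swizzled."""
    return not (is_swizzled(cachepolicy_a) or is_swizzled(cachepolicy_b))
```

With the default value of 0 for the bit, `can_merge` returns True, matching the statement that generated code does not change.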
|
Summary: This patch includes tests for the VecOfBitcastsToInt type added by D68021
Reviewers: c-rhodes, sdesmalen, rovka
Reviewed By: c-rhodes
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, cfe-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68023
llvm-svn: 373468
|
Identity shuffles, of the form (0, 1, 2, 3, ...) are perfectly OK under MVE
(they essentially just become bitcasts). We were not catching that in the
existing set of what we considered legal though. On NEON, they would be covered
by vext's, but that is not generally available in MVE.
This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here
but does what we want and prevents us from just rewriting what is the same
function.
Differential Revision: https://reviews.llvm.org/D68241
llvm-svn: 373446
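For readers unfamiliar with the term, a simplified model of an identity-mask check is sketched below. It is not the LLVM implementation: it only considers indices into the first operand and treats -1 as an undefined lane, both of which are assumptions of the sketch.

```python
UNDEF = -1  # assumed encoding of an undefined lane in this sketch


def is_identity_mask(mask):
    """True if every defined lane i selects element i, i.e. (0, 1, 2, 3, ...)."""
    return all(m == UNDEF or m == i for i, m in enumerate(mask))
```

A mask such as `[0, 1, 2, 3]` passes, while `[1, 0, 2, 3]` does not, since lanes 0 and 1 are swapped.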
|
Summary:
Printf lowering unconditionally visited every instruction in the module.
To make it faster in the common case where there are no printfs, look up
the printf function (if any) and iterate over its users instead.
Reviewers: rampitec, kzhuravl, alex-t, arsenm
Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68145
llvm-svn: 373433
|
These patterns use zmm registers for 128/256-bit compares when
the VLX instructions aren't available. Previously we only
supported registers, but as PR36191 notes, we can fold broadcast
loads, though not regular loads.
llvm-svn: 373423
|
llvm-svn: 373417
|
In principle this should behave as any other constant. However
eliminateFrameIndex currently assumes a VALU use and uses a vector
shift. Work around this by selecting to VGPR for now until
eliminateFrameIndex is fixed.
llvm-svn: 373415
|
llvm-svn: 373414
|
This will be needed to support AGPR operations.
llvm-svn: 373413
|
llvm-svn: 373412
|
Account and report agprs separately on gfx908. Other targets
do not change the reporting.
Differential Revision: https://reviews.llvm.org/D68307
llvm-svn: 373411
|
constant with sufficient sign bits to fit in vXi32
The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, a v16i64 index can force a v16i32 gather to be split in two since the index needs 2 registers. If we can shrink the index to i32 we can avoid the split. It should always be safe to shrink the index regardless of the number of elements. We have gather/scatter instructions that can use a v2i32 index stored in a v4i32 register with v2i64 data size.
I've limited this to before legalize types to avoid creating a v2i32 after type legalization. We could check for it, but we'd also need testing. I'm also only handling build_vectors with no bitcasts to be sure the truncate will constant fold.
Differential Revision: https://reviews.llvm.org/D68247
llvm-svn: 373408
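The condition "sufficient sign bits to fit in vXi32" can be illustrated with a small model in which a constant index vector is a list of signed Python integers. The helper names are hypothetical. The sketch only shows the arithmetic check: truncating to 32 bits is safe when sign extending back reproduces the original value.

```python
def trunc_to_i32(value):
    """Keep the low 32 bits, as a truncate from i64 to i32 would."""
    return value & 0xFFFFFFFF


def sext_i32_to_i64(bits):
    """Interpret a 32-bit pattern as a signed value (implicit sign extension)."""
    return bits - (1 << 32) if bits & (1 << 31) else bits


def shrink_constant_index(indices):
    """Return the 32-bit patterns if every element survives a
    truncate/sign-extend round trip, otherwise None (keep the i64 index)."""
    narrowed = [trunc_to_i32(v) for v in indices]
    if all(sext_i32_to_i64(n) == v for n, v in zip(narrowed, indices)):
        return narrowed
    return None
```

An index such as `1 << 31` does not fit: it truncates to a pattern that sign extends to a negative value, so the index is left alone.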
|
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D67574
llvm-svn: 373404
|
ops."
This seems to be causing some performance regressions that I'm
trying to investigate.
One thing that stands out is that this transform can increase
the live range of the operands of the earlier logic op. This
can be bad for register allocation. If there are two logic
op inputs we should really combine the one that is closest, but
SelectionDAG doesn't have a good way to do that. Maybe we need
to do this as a basic block transform in Machine IR.
llvm-svn: 373401
|
an immediate before calling getImm().
It might be a symbol instead. We can't fold those since we can't
negate them.
Similar for other SUB with immediates.
Fixes PR43529.
llvm-svn: 373397
|
Summary:
This is a refactoring that will make future improvements to this pass easier.
This change should not change the behavior of the pass.
Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin
Reviewed By: nhaehnle, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65496
llvm-svn: 373366
|
There are 1024 bit register classes defined for AGPRs. Additionally
OpenCL defines vectors up to 16 x i64, and this helps those tests
legalize.
llvm-svn: 373350
|
broadcasted to a vector.
Summary:
This adds the ISD opcode and a DAG combine to create it. There are
probably some places where we can directly create it, but I'll
leave that for future work.
This updates all of the isel patterns to look for this new node.
I had to add a few additional isel patterns for aligned extloads
which we should probably fix with a DAG combine or something. This
does mean that the broadcast load folding for avx512 can no
longer match a broadcasted aligned extload.
There's still some work to do here for combining a broadcast of
a broadcast_load. We also need to improve extractelement or
demanded vector elements of a broadcast_load. I'll try to get
those done before I submit this patch.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68198
llvm-svn: 373349
|
Summary:
This is cut and pasted from the corresponding GenericScheduler
functions.
Reviewers: arsenm, atrick, tstellar, vpykhtin
Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68264
llvm-svn: 373346
|
the non-commutable VEX instruction. Use EVEX2VEX override to fix the scalar instructions.
Previously the match was ambiguous and VMAXPS/PD and VMAXCPS/PD
were mapped to the same VEX instruction. But we should keep
the commutableness when changing the opcode.
llvm-svn: 373303
|
Summary:
In CFGSort, we try to make EH pads have higher priorities as soon as
they are ready to be sorted, to prevent creation of unwind destination
mismatches in CFGStackify. We did that by making priority queues'
comparison function prefer EH pads, but it was possible for an EH pad
to be popped from `Preferred` queue and then not sorted immediately and
enter `Ready` queue instead in a certain condition. This patch makes
sure that special condition does not consider EH pads as its candidates.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68229
llvm-svn: 373302
|
Summary:
Fixing unwind mismatches for exception handling can result in splicing
existing BBs and moving some of instructions to new BBs. In this case
some of stackified def registers in the original BB can be used in the
split BB. For example, we have this BB and suppose %r0 is a stackified
register.
```
bb.1:
%r0 = call @foo
... use %r0 ...
```
After fixing unwind mismatches in CFGStackify, `bb.1` can be split and
some instructions can be moved to a newly created BB:
```
bb.1:
%r0 = call @foo
bb.split (new):
... use %r0 ...
```
In this case we should make %r0 un-stackified, because its use is now in
another BB.
When splitting a BB, this CL unstackifies all def registers that have
uses in the new split BB.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68218
llvm-svn: 373301
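The rule in the last paragraph of the summary can be sketched with a toy representation in which each instruction is a dict with an optional `def` register and a list of `uses`. This representation and the function name are assumptions made for illustration; they do not mirror the MachineInstr API.

```python
def regs_to_unstackify(original_bb, split_bb, stackified):
    """Stackified registers defined in the original BB that are now
    used in the newly split BB and therefore must be unstackified."""
    defs = {inst["def"] for inst in original_bb if inst.get("def")}
    uses = {reg for inst in split_bb for reg in inst.get("uses", ())}
    return sorted(defs & uses & set(stackified))


# The example from the summary: %r0 is defined in bb.1 and used in bb.split.
bb1 = [{"def": "%r0", "uses": []}]
bb_split = [{"def": None, "uses": ["%r0"]}]
```

Calling `regs_to_unstackify(bb1, bb_split, {"%r0", "%r1"})` selects only `%r0`, since `%r1` has no use in the split block.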
|
llvm-svn: 373298
|
The existing wave32 behavior seems broken and incomplete, but this
reproduces it.
llvm-svn: 373296
|
llvm-svn: 373295
|
This is sort of papering over the fact that we don't run a combiner
anywhere, but avoiding creating 2 instructions in the first place is
easy.
llvm-svn: 373293
|
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.
The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.
llvm-svn: 373292
|
llvm-svn: 373288
|
Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU.
llvm-svn: 373287
|
Handle other cases besides LDS. Mostly a straight port of the existing
handling, without the intermediate custom nodes.
llvm-svn: 373286
|
Summary:
The BasicBlockManager is potentially broken and should not be used.
Replace all uses of the BasicBlockPass with a FunctionBlockPass+loop on
blocks.
Reviewers: chandlerc
Subscribers: jholewinski, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68234
llvm-svn: 373254
|
forming a SELECT.
The i1 scalar would have been type legalized to i8, but that
doesn't guarantee anything about the upper bits. If we're going
to use it as a condition we need to make sure the upper bits are 0.
I've special cased ISD::SETCC conditions since that should
guarantee zero upper bits. We could go further and use
computeKnownBits, but we have no tests that would need that.
Fixes PR43507.
llvm-svn: 373246
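A small model of the problem, assuming the select condition is consumed by a "nonzero" test on the full 8-bit value. That consumption model and the function names are assumptions of this sketch, not statements about the X86 lowering itself.

```python
def select_unmasked(cond_i8, true_val, false_val):
    """Treats any nonzero i8 as true, so garbage upper bits leak in."""
    return true_val if cond_i8 != 0 else false_val


def select_masked(cond_i8, true_val, false_val):
    """Clears the upper bits first, so only the original i1 bit decides."""
    cond = cond_i8 & 1
    return true_val if cond != 0 else false_val
```

For a condition whose i1 value is 0 but whose upper bits are set, for example `0b10`, the unmasked form picks the true value while the masked form picks the false value.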
|
See https://reviews.llvm.org/D68167
llvm-svn: 373245
|
Existing clients are converted to use MachineModuleInfoWrapperPass. The
new interface is for defining a new pass manager API in CodeGen.
Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm
Reviewed By: arsenm, fedor.sergeev
Differential Revision: https://reviews.llvm.org/D64183
llvm-svn: 373240
|
default handling.
ANY_EXTEND of v8i8 is marked Custom on AVX512 for handling extends
from v8i8. But the type legalization infrastructure will call
ReplaceNodeResults for v8i8 results. We should just defer it to the
default handling instead of asserting in the default of the switch.
Fixes PR43509.
llvm-svn: 373234
|
Summary:
Adds the following two intrinsics:
- int_aarch64_sve_punpkhi
- int_aarch64_sve_punpklo
This patch also contains a fix which allows LLVMHalfElementsVectorType
to forward reference overloadable arguments.
Reviewers: sdesmalen, rovka, rengolin
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, greened, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67830
llvm-svn: 373232
|
This adds support for lowering variadic musttail calls. To do this, we have
to...
- Detect a musttail call in a variadic function before attempting to lower the
call's formal arguments. This is done in the IRTranslator.
- Compute forwarded registers in `lowerFormalArguments`, and add copies for
those registers.
- Restore the forwarded registers in `lowerTailCall`.
Because there doesn't seem to be any nice way to wrap these up into the outgoing
argument handler, the restore code in `lowerTailCall` is done separately.
Also, irritatingly, you have to make sure that the registers don't overlap with
any passed parameters. Otherwise, the scheduler doesn't know what to do with the
extra copies and asserts.
Add call-translator-variadic-musttail.ll to test this. This is pretty much the
same as the X86 musttail-varargs.ll test. We didn't have as nice of a test to
base this off of, but the idea is the same.
Differential Revision: https://reviews.llvm.org/D68043
llvm-svn: 373226
|
llvm-svn: 373225
|
Reviewers: rampitec
Differential Revision: https://reviews.llvm.org/D67662
llvm-svn: 373221
|
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, jvesely, nhaehnle, eraman, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D68141
llvm-svn: 373207
|
The VCTP instruction will calculate the predicate mask based upon
the number of elements that need to be processed. I had inserted the
sub before the vctp intrinsic and supplied it as the operand, but
this is incorrect as the phi should directly feed the vctp. The sub
is calculating the value for the next iteration.
Differential Revision: https://reviews.llvm.org/D67921
llvm-svn: 373188
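The difference between feeding the phi and feeding the sub into the vctp can be seen in a simple simulation. The lane model and function names below are assumptions made for illustration; `remaining` plays the role of the phi, and the subtraction computes the value for the next iteration.

```python
def vctp(remaining, lanes):
    """Predicate mask: the first min(remaining, lanes) lanes are active."""
    active = max(0, min(remaining, lanes))
    return [i < active for i in range(lanes)]


def predicated_loop(total, lanes, feed_sub_result=False):
    """Return the predicate mask used in each iteration of the loop."""
    masks = []
    remaining = total                     # models the phi
    while remaining > 0:
        next_remaining = remaining - lanes  # the sub: value for the next iteration
        operand = next_remaining if feed_sub_result else remaining
        masks.append(vctp(operand, lanes))
        remaining = next_remaining
    return masks
```

For 10 elements and 4 lanes, feeding the phi enables 4, 4 and 2 lanes, covering all 10 elements. Feeding the sub result enables 4, 2 and 0 lanes, so only 6 elements are processed.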
|
As we perform a zext on any arguments used in the promoted tree, it
doesn't matter if they're marked as signext. The only permitted
user(s) in the tree which would interpret the sign bits are signed
icmps. For these instructions, their promoted operands are truncated
before the icmp uses them.
Differential Revision: https://reviews.llvm.org/D68019
llvm-svn: 373186
|
SystemZPostRewrite may emit COPYs, so it needs to run before the Post-RA
pseudo pass, also at -O0. It should therefore be added in addPostRegAlloc().
Review: Ulrich Weigand
llvm-svn: 373182
|
These are all also implemented in avx512_logical_lowering_types
with support for masking.
llvm-svn: 373181
|
llvm-svn: 373180
|
enable the use of vpshufb on the 256-bit halves.
llvm-svn: 373177
|
llvm-svn: 373175
|