| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
| |
It isn't generally safe to fold the frame index
directly into the operand since it will possibly
not be an inline immediate after it is expanded.
This surprisingly seems to produce better code, since
the FI doesn't prevent folding other immediate operands.
llvm-svn: 288185
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Change the logic for when to fold immediates to
consider the destination operand rather than the
source of the materializing mov instruction.
No change yet, but this will allow for correctly handling
i16/f16 operands. Since 32-bit moves are used to materialize
constants for these, the same bitvalue will not be in the
register.
llvm-svn: 288184
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In AArch64InstrInfo::foldMemoryOperandImpl, catch more cases where the
COPY being spilled is copying from WZR/XZR, but the source register is
not in the COPY destination register's regclass.
For example, when spilling:
%vreg0 = COPY %XZR ; %vreg0:GPR64common
without this change, the code in TargetInstrInfo::foldMemoryOperand()
and canFoldCopy() that normally handles cases like this would fail to
optimize since %XZR is not in GPR64common. So the spill code generated
would be:
%vreg0 = COPY %XZR
STR %vreg
instead of the new code generated:
STR %XZR
Reviewers: qcolombet, MatzeB
Subscribers: mcrosier, aemerson, t.p.northover, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D26976
llvm-svn: 288176
|
| |
|
|
| |
llvm-svn: 288170
|
| |
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
https://reviews.llvm.org/D25912
This is the first patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC.
llvm-svn: 288152
|
| |
|
|
|
|
| |
understandable. NFCI.
llvm-svn: 288147
|
| |
|
|
|
|
| |
We can only handle 128-bit vectors until we support target shuffle inputs of different size to the output.
llvm-svn: 288140
|
| |
|
|
| |
llvm-svn: 288138
|
| |
|
|
|
|
|
|
|
|
|
|
| |
branches
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23417
llvm-svn: 288095
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes the createGenericSchedLive() function that constructs the
default scheduler available for the public API. This should help when
you want to get a scheduler and the default list of DAG mutations.
This also shrinks the list of default DAG mutations:
{Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
added by default. Targets can easily add them if they need them. It also
makes it easier for targets to add alternative/custom macrofusion or
clustering mutations while staying with the default
createGenericSchedLive(). It also saves the callback back and forth in
TargetInstrInfo::enableClusterLoads()/enableClusterStores().
Differential Revision: https://reviews.llvm.org/D26986
llvm-svn: 288057
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
copies
Codegen prepare sinks comparisons close to a user is we have only one register
for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.
With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue a propagation of source SGPR pair in place of v_cmp
is implemented. Additional side effect of this is that we may consume less VGPRs
at a cost of more SGPRs in case if holding of multiple conditions is needed, and
that is a clear win in most cases.
Differential Revision: https://reviews.llvm.org/D26114
llvm-svn: 288053
|
| |
|
|
| |
llvm-svn: 288049
|
| |
|
|
| |
llvm-svn: 288045
|
| |
|
|
|
|
|
|
| |
Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining.
Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations.
llvm-svn: 288040
|
| |
|
|
| |
llvm-svn: 288036
|
| |
|
|
|
|
|
| |
Remove unused variable that came in due to a copy-and-paste bug
and caused build bot failures.
llvm-svn: 288033
|
| |
|
|
|
|
|
|
|
| |
This adds assembler support for the instructions provided by the
execution-hint facility (NIAI and BP(R)P). This required adding
support for the new relocation types for 12-bit and 24-bit PC-
relative offsets used by the BP(R)P instructions.
llvm-svn: 288031
|
| |
|
|
|
|
|
| |
This adds support for the instructions provided with the
load-and-trap facility.
llvm-svn: 288030
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds assembler support for the remaining branch instructions:
the non-relative branch on count variants, and all variants of branch
on index.
The only one of those that can be readily exploited for code generation
is BRCTH (branch on count using a high 32-bit register as count). Do
use it, however, it is necessary to also introduce a hew CHIMux pseudo
to allow comparisons of a 32-bit value agains a short immediate to go
into a high register as well (implemented via CHI/CIH).
This causes a bit of codegen changes overall, but those have proven to
be neutral (or even beneficial) in performance measurements.
llvm-svn: 288029
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch moves formation of LOC-type instructions from (late)
IfConversion to the early if-conversion pass, and in some cases
additionally creates them directly from select instructions
during DAG instruction selection.
To make early if-conversion work, the patch implements the
canInsertSelect / insertSelect callbacks. It also implements
the commuteInstructionImpl and FoldImmediate callbacks to
enable generation of the full range of LOC instructions.
Finally, the patch adds support for all instructions of the
load-store-on-condition-2 facility, which allows using LOC
instructions also for high registers.
Due to the use of the GRX32 register class to enable high registers,
we now also have to handle the cases where there are still no single
hardware instructions (conditional move from a low register to a high
register or vice versa). These are converted back to a branch sequence
after register allocation. Since the expandRAPseudos callback is not
allowed to create new basic blocks, this requires a simple new pass,
modelled after the ARM/AArch64 ExpandPseudos pass.
Overall, this patch causes significantly more LOC-type instructions
to be used, and results in a measurable performance improvement.
llvm-svn: 288028
|
| |
|
|
|
|
| |
commutable as operand 0 should pass its upper bits through to the output.
llvm-svn: 288011
|
| |
|
|
|
|
| |
patterns.
llvm-svn: 288010
|
| |
|
|
| |
llvm-svn: 288009
|
| |
|
|
|
|
| |
I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and just represent the relationships.
llvm-svn: 288007
|
| |
|
|
|
|
| |
PSLL/PSRL bit shifts
llvm-svn: 288006
|
| |
|
|
| |
llvm-svn: 288004
|
| |
|
|
|
|
| |
Moved most of matching code into matchVectorShuffleAsShift to share with target shuffle combines (in a future commit).
llvm-svn: 288003
|
| |
|
|
|
|
|
|
|
|
| |
instruction's load size is smaller than the register size.
If we were to unfold these, the load size would be increased to the register size. This is not safe to do since the enlarged load can do things like cross a page boundary into a page that doesn't exist.
I probably missed some instructions, but this should be a large portion of them.
llvm-svn: 288001
|
| |
|
|
| |
llvm-svn: 287995
|
| |
|
|
|
|
|
|
| |
instructions that don't have a restriction.
Most of these are the SSE4.1 PMOVZX/PMOVSX instructions which all read less than 128-bits. The only other was PMOVUPD which by definition is an unaligned load.
llvm-svn: 287991
|
| |
|
|
|
|
| |
IsProfitableToFold.
llvm-svn: 287987
|
| |
|
|
|
|
|
|
| |
X86DAGToDAGISel::selectScalarSSELoad to pass the load node to IsProfitableToFold and IsLegalToFold.
Previously we were passing the SCALAR_TO_VECTOR node.
llvm-svn: 287986
|
| |
|
|
| |
llvm-svn: 287985
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
from being folded multiple times.
Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied.
Reviewers: RKSimon, zvi, delena, spatel
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D26790
llvm-svn: 287983
|
| |
|
|
| |
llvm-svn: 287975
|
| |
|
|
|
|
| |
folding tables.
llvm-svn: 287974
|
| |
|
|
|
|
| |
tables.
llvm-svn: 287972
|
| |
|
|
|
|
| |
available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change.
llvm-svn: 287971
|
| |
|
|
| |
llvm-svn: 287970
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D26724
llvm-svn: 287962
|
| |
|
|
|
|
| |
The W bit distinquishes which operand is the memory operand. But if the mod bits are 3 then the memory operand is a register and there are two possible encodings. We already did this correctly for several other XOP instructions.
llvm-svn: 287961
|
| |
|
|
|
|
|
|
| |
tables for consistency.
Not sure this is truly needed but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents. So this just makes it complete.
llvm-svn: 287960
|
| |
|
|
|
|
| |
alphabetical order. This is consistent with the older sections of the table. NFC
llvm-svn: 287956
|
| |
|
|
|
|
| |
suggested as a better solution by Matt
llvm-svn: 287942
|
| |
|
|
| |
llvm-svn: 287940
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
a vselect with 32-bit element size.
Summary:
Shuffle lowering may have widened the element size of a i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations.
This patch detects this and changes the element size to match the vselect.
I don't handle changing integer to floating point or vice versa as its not clear if its better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now.
Reviewers: delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27087
llvm-svn: 287939
|
| |
|
|
| |
llvm-svn: 287937
|
| |
|
|
|
|
| |
This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e.
llvm-svn: 287936
|
| |
|
|
|
|
| |
This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1.
llvm-svn: 287935
|
| |
|
|
|
|
| |
This reverts commit e834ce5976567575621901fb967b8018b9916d71.
llvm-svn: 287934
|