llvm-svn: 294857

llvm-svn: 294847

llvm-svn: 294830

llvm-svn: 294827

... is available.
The execution dependency pass tends to pick FP instructions when a vextractf128 instruction produced the register, even when most of the consuming code is integer. Without AVX2 we don't have the corresponding integer instruction available.
This patch forces the domain of these instructions to GenericDomain when AVX2 is not supported, so that domain fixing ignores them. If AVX2 is supported, we'll report the correct domain and allow them to switch between integer and FP.
Overall I think this produces better results in the modified test cases.
llvm-svn: 294824
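
A minimal sketch of the idea, assuming the hook involved is X86InstrInfo::getExecutionDomain; the VEXTRACTF128 opcode pair and the lookupReplacementTable helper are illustrative stand-ins, not the exact code from the patch:

    // Returning domain 0 (GenericDomain) makes the execution dependency
    // pass skip the instruction entirely.
    std::pair<uint16_t, uint16_t>
    X86InstrInfo::getExecutionDomain(const MachineInstr &MI) const {
      uint16_t Domain = (MI.getDesc().TSFlags >> X86II::SSEDomainShift) & 3;
      unsigned Opc = MI.getOpcode();
      if ((Opc == X86::VEXTRACTF128rr || Opc == X86::VEXTRACTF128mr) &&
          !Subtarget.hasAVX2())
        return std::make_pair(0, 0); // no integer twin without AVX2
      // Otherwise report the encoded domain and its legal alternatives.
      uint16_t Validity = Domain ? lookupReplacementTable(Opc) : 0; // assumed helper
      return std::make_pair(Domain, Validity);
    }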

This requires that we communicate to X86InstrInfo::optimizeCompareInstr
that the second operand is neither a register nor an immediate. The way we
do that is by setting CmpMask to zero.
Note that there were already instructions where the second operand was
neither a register nor an immediate, namely X86::SUB*rm, so also set
CmpMask to zero for those instructions. This seems like a latent bug, but
I was unable to trigger it.
Differential Revision: https://reviews.llvm.org/D28621
llvm-svn: 294634
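
A hedged fragment showing the convention; the shape follows TargetInstrInfo::analyzeCompare's out-parameters, and the opcode list here is abbreviated:

    // Sketch: CmpMask == 0 tells optimizeCompareInstr that operand 2 is
    // neither a register nor an immediate (e.g. a memory operand).
    bool X86InstrInfo::analyzeCompare(const MachineInstr &MI, unsigned &SrcReg,
                                      unsigned &SrcReg2, int &CmpMask,
                                      int &CmpValue) const {
      switch (MI.getOpcode()) {
      case X86::SUB32rm:
      case X86::SUB64rm:
        SrcReg = MI.getOperand(1).getReg();
        SrcReg2 = 0;
        CmpMask = 0;  // no usable second operand
        CmpValue = 0;
        return true;
      default:
        return false; // the real function handles many more opcodes
      }
    }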

... They are currently modelled incorrectly (as calls, which clobber
registers, confusing e.g. Machine Copy Propagation).
Reverting until we figure out the proper solution.
llvm-svn: 294348

... folding tables.
llvm-svn: 294287

This adds the masked versions of everything but the shift-by-immediate instructions.
llvm-svn: 294286

This includes unmasked forms of variable shift and shifting by the lower element of a register.
Still need to do shift by immediate, which was not foldable prior to AVX-512, and all the masked forms.
llvm-svn: 294285

llvm-svn: 294170

llvm-svn: 294169

llvm-svn: 294168
llvm-svn: 294164

... load folding tables.
llvm-svn: 294163

... folding tables.
llvm-svn: 294153

... load folding tables.
llvm-svn: 294152

... isNonFoldablePartialRegisterLoad to improve load folding of scalar loads.
llvm-svn: 294151

This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.
In AArch64, it also expands the fusion to all instruction pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.
Differential revision: https://reviews.llvm.org/D28489
llvm-svn: 293737
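
For illustration, here is roughly how a target wires the relocated fusion logic into its scheduler after this change; the createX86MacroFusionDAGMutation name follows the per-target pattern the patch introduces, but treat the exact signatures as assumptions:

    // In the target's pass config: build the generic scheduler, then add
    // the target's own macro-fusion DAG mutation.
    ScheduleDAGInstrs *
    X86PassConfig::createMachineScheduler(MachineSchedContext *C) const {
      ScheduleDAGMILive *DAG = createGenericSchedLive(C);
      DAG->addMutation(createX86MacroFusionDAGMutation());
      return DAG;
    }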

... tables if AVX512DQ isn't supported, since we can't do any conversion anyway.
llvm-svn: 293608

... AVX-512 already did this for the EVEX version.
llvm-svn: 293607

... we can convert 256-bit movnt instructions.
llvm-svn: 293606

... broadcast. We can use COPY_TO_REGCLASS like AVX does.
This sometimes causes stack spill slots to be oversized, but the same should already be happening with AVX.
llvm-svn: 293464

... KSET0W/KSET1W for v8i1.
llvm-svn: 293458

... instructions with blendm instructions when it's beneficial.
Isel now selects masked move instructions for vselect instead of blendm, but sometimes it is beneficial for register allocation to remove the tied-register constraint by using blendm instructions.
This also picks up cases where the masked move was created due to a masked load intrinsic.
Differential Revision: https://reviews.llvm.org/D28454
llvm-svn: 292005

... AVX512_128_SET0 expansion to make this possible.
We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX is available. If it's not, but register allocation has selected a non-extended register, we will use VEX VXORPS. And if it's an extended register without VLX, we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD.
This makes it possible for the register allocator to have all 32 registers available to work with.
llvm-svn: 292004
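
A minimal sketch of the expansion logic, assuming it lives in X86InstrInfo::expandPostRAPseudo and reuses that file's Expand2AddrUndef helper; the register-class test and the no-VLX fallback are simplified:

    case X86::AVX512_128_SET0: {
      unsigned DstReg = MIB->getOperand(0).getReg();
      if (Subtarget.hasVLX())                   // EVEX xor works on all 32 regs
        return Expand2AddrUndef(MIB, get(X86::VPXORDZ128rr));
      if (X86::VR128RegClass.contains(DstReg))  // xmm0-xmm15: VEX encodable
        return Expand2AddrUndef(MIB, get(X86::VXORPSrr));
      // Extended register without VLX: zero the containing 512-bit register
      // instead (a 512-bit xor of the corresponding zmm register).
    }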

Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just one operand.
See https://reviews.llvm.org/D28057 for the whole discussion.
Differential Revision: https://reviews.llvm.org/D28556
llvm-svn: 291891
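
An illustrative before/after of the renamed API (the surrounding variables are made up):

    MachineInstrBuilder MIB =
        BuildMI(MBB, InsertPt, DL, TII->get(X86::ADD32ri), DstReg)
            .add(MI.getOperand(1))  // previously .addOperand(MI.getOperand(1))
            .addImm(1);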

... table. The unmasked versions read memory from operand 2, but were in the operand 3 table.
These aren't the most interesting set of blendm instructions, as the unmasked version isn't useful. We were also missing the B and W forms. I'll add the masked versions of all sizes in a future patch.
llvm-svn: 291885

... grouping. NFC
llvm-svn: 291884

... vselects of all ones and all zeros.
Previously we emitted a VPTERNLOG and a separate masked move.
llvm-svn: 291415

... size is smaller than the register size, so unfolding would increase the load size.
llvm-svn: 291338

... folding tables.
llvm-svn: 290591

Summary:
The expression for computing the return value of getMemOpBaseRegImmOfs has only
one possible value. The other value would result in a return earlier in the
function. This patch replaces the expression with its only possible value.
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27437
llvm-svn: 290133

... DQI and VLX instructions are available.
This can give the register allocator more registers to use.
llvm-svn: 290057

As discussed on D27692
llvm-svn: 289834

Avoid duplicating instructions in the int32/int64 domains.
llvm-svn: 289830

Add the missing domain equivalences for movss, movsd, movd and movq zero-extending load instructions.
Differential Revision: https://reviews.llvm.org/D27684
llvm-svn: 289825

The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are:
- Support for folding multiple operands at once which reference the same load
- Support for folding multiple loads into a single instruction
- Walk all the operands of the instruction for variadic instructions (this is a bug fix!)
Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here.
Differential Revision: https://reviews.llvm.org/D24103
llvm-svn: 289510
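
A sketch of what folding multiple operands at once looks like from the caller's side, with assumed variable names; the foldMemoryOperand overload taking a list of operand indices plus a load instruction is the existing TII entry point:

    // Collect every operand of the (variadic) STATEPOINT that uses the
    // register defined by the load, then fold them all in one call.
    SmallVector<unsigned, 4> Ops;
    for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
      const MachineOperand &MO = MI.getOperand(i);
      if (MO.isReg() && MO.getReg() == VReg)
        Ops.push_back(i);
    }
    MachineInstr *FoldedMI = TII->foldMemoryOperand(MI, Ops, *LoadMI, LIS);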

Summary:
These intrinsic instructions are all selected from intrinsics that have well-defined behavior for where the upper bits come from. It's not the same place as the lower bits.
As you can see, we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register, so we should just go ahead and allow the load to be folded.
Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads; it only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed, so I would like to fix sse_load_f32/f64 to allow that. And if I do that, it changes some of these same test cases to fold the load too.
Reviewers: spatel, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27611
llvm-svn: 289419

llvm-svn: 289186

llvm-svn: 289173

The second operand of an "ri" instruction may be an immediate, but it may
also be a global variable, so we shouldn't make any assumptions.
This fixes PR31271.
Differential Revision: https://reviews.llvm.org/D27481
llvm-svn: 288964

... COPY_FROM/TO_REGCLASS and the normal packed instructions instead.
Summary:
This patch removes the scalar logical operation alias instructions. We can just use reg class copies and the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables, as was done recently.
I removed the loadf64_128 and loadf32_128 patterns, as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel.
I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.
Reviewers: spatel, delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27401
llvm-svn: 288771

This changes the scalar non-intrinsic non-AVX roundss/sd instruction
definitions not to read their destination register, allowing partial
dependency breaking.
This fixes PR31143.
Differential Revision: https://reviews.llvm.org/D27323
llvm-svn: 288703

... VPERMPDZri to the correct table.
llvm-svn: 288591

llvm-svn: 288587

... tables.
llvm-svn: 288484

llvm-svn: 288482

llvm-svn: 288481

This makes the createGenericSchedLive() function that constructs the
default scheduler available in the public API. This should help when
you want to get a scheduler and the default list of DAG mutations.
This also shrinks the list of default DAG mutations:
{Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
added by default. Targets can easily add them if they need them. It also
makes it easier for targets to add alternative/custom macro-fusion or
clustering mutations while staying with the default
createGenericSchedLive(). It also avoids the callback back and forth in
TargetInstrInfo::enableClusterLoads()/enableClusterStores().
Differential Revision: https://reviews.llvm.org/D26986
llvm-svn: 288057
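
A short sketch of the new opt-in pattern, using a hypothetical target pass config; createGenericSchedLive and the create{Load,Store}ClusterDAGMutation factories are the pieces this change exposes:

    ScheduleDAGInstrs *
    MyTargetPassConfig::createMachineScheduler(MachineSchedContext *C) const {
      ScheduleDAGMILive *DAG = createGenericSchedLive(C);
      // Re-add the mutations that are no longer installed by default.
      DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));
      DAG->addMutation(createStoreClusterDAGMutation(DAG->TII, DAG->TRI));
      return DAG;
    }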