| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
If the index is v2i64 we can use the scatter instruction that has a v4i32/v4f32 data register, a v2i64 index, and a v2i1 mask. The same was already done for gather.
Implement custom widening for v2i32 data to remove the code that reverses type legalization during lowering.
llvm-svn: 322254
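As a plain C++ sketch of the semantics involved (not LLVM code; the function name and layout are illustrative only), a 2-element scatter stores each active lane's 32-bit value through its own 64-bit index:

  // Two 32-bit data elements, two 64-bit indices, one mask bit per lane.
  #include <cstdint>
  #include <cstdio>

  void scatter_v2i32(int32_t *base, const int64_t idx[2],
                     const int32_t data[2], const bool mask[2]) {
    for (int i = 0; i < 2; ++i)
      if (mask[i])
        base[idx[i]] = data[i]; // each active lane stores independently
  }

  int main() {
    int32_t mem[8] = {};
    int64_t idx[2] = {1, 6};
    int32_t data[2] = {10, 20};
    bool mask[2] = {true, false};
    scatter_v2i32(mem, idx, data, mask);
    std::printf("%d %d\n", mem[1], mem[6]); // prints "10 0"
  }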
|
| |
|
|
|
|
|
|
|
|
| |
the MCAsmBackend constructor
After D41349, we can now get an MCSubtargetInfo into the MCAsmBackend constructor. This allows us to get NOPL from a subtarget feature rather than from a CPU name blacklist.
Differential Revision: https://reviews.llvm.org/D41721
llvm-svn: 322227
|
| |
|
|
|
|
|
|
|
|
| |
Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is, we assume the scale is 1; otherwise we take it from the element size of the pass-thru or stored value. This seems a little weird, and I think it makes more sense to make the scale explicit in the DAG rather than doing tricky things in the backend.
Most of this patch is just making sure we copy the scale around everywhere.
Differential Revision: https://reviews.llvm.org/D40055
llvm-svn: 322210
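For reference, the addressing math each gather/scatter lane performs is simply base + index * scale; a minimal C++ sketch (not LLVM code, names illustrative) of why carrying the scale explicitly is enough to rebuild the address:

  #include <cstdint>
  #include <cstdio>

  // Scale is the element size in bytes; with it carried on the node, the
  // backend no longer needs to guess it from the base or the value type.
  uintptr_t laneAddress(uintptr_t base, int64_t index, unsigned scale) {
    return base + static_cast<uintptr_t>(index) * scale;
  }

  int main() {
    std::printf("%llu\n",
                (unsigned long long)laneAddress(0x1000, 3, 4)); // 0x1000 + 3*4 = 4108
  }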
|
| |
|
|
| |
llvm-svn: 322195
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If the vector type is transformed to a non-vector single type, the compiler
may crash trying to get vector information about the non-vector type.
Reviewers: RKSimon, spatel, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41862
llvm-svn: 322106
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Normally the target-independent DAG combine would do this combine based on getSetCCResultType, but with VLX getSetCCResultType returns a vXi1 type, preventing the DAG combine from kicking in.
But doing this combine can allow us to remove the explicit sign extend that would otherwise be emitted.
This patch adds a target specific DAG combine to combine the sext+setcc when the result type is the same size as the input to the setcc. I've restricted this to FP compares and things that can be represented with PCMPEQ and PCMPGT since we don't have full integer compare support on the older ISAs.
Differential Revision: https://reviews.llvm.org/D41850
llvm-svn: 322101
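A hedged illustration with SSE intrinsics (not the combine itself): an integer vector compare already yields all-ones or all-zeros per lane, i.e. the sign-extended boolean, so a separate sign extend after the compare adds nothing:

  #include <immintrin.h>
  #include <cstdio>

  int main() {
    __m128i a = _mm_setr_epi32(5, -3, 7, 0);
    __m128i b = _mm_setr_epi32(1, 1, 7, 2);
    __m128i gt = _mm_cmpgt_epi32(a, b); // each lane is 0xFFFFFFFF or 0
    alignas(16) int out[4];
    _mm_store_si128(reinterpret_cast<__m128i *>(out), gt);
    std::printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]); // -1 0 0 0
  }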
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
CET (Control-Flow Enforcement Technology) introduces a new mechanism called IBT (Indirect Branch Tracking).
Under IBT, each indirect branch should land on a dedicated ENDBR (End Branch) instruction.
The new pass adds ENDBR instructions for every indirect jmp/call (including jumps using jump tables / switches).
For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
Differential Revision: https://reviews.llvm.org/D40482
Change-Id: Icb754489faf483a95248f96982a4e8b1009eb709
llvm-svn: 322062
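As a small usage-level example (assuming a compiler with CET support and the -fcf-protection=branch option), any indirect-call target such as the function below must start with an ENDBR instruction, which is what the pass inserts:

  #include <cstdio>

  void handler() { std::puts("reached through an indirect branch"); }

  int main() {
    void (*fp)() = handler; // indirect call site
    fp();                   // under IBT, handler must begin with ENDBR64
  }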
|
| |
|
|
|
|
| |
The code that checks the immediate wasn't masking to the lower 3 bits the way the code in X86InstrInfo.cpp used by the peephole pass does.
llvm-svn: 322060
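A minimal sketch of the check being tightened (the helper name is illustrative, not the actual LLVM code): only the low 3 bits of the immediate are significant, so mask before inspecting it:

  #include <cassert>

  // Mirror of the X86InstrInfo.cpp-style check: only imm & 0x7 matters.
  inline unsigned low3Bits(unsigned imm) { return imm & 0x7; }

  int main() {
    assert(low3Bits(0x0B) == low3Bits(0x03)); // 0b1011 and 0b0011 agree in the low 3 bits
  }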
|
| |
|
|
|
|
|
|
| |
slt X, 0)
I had to drop fast-isel-abort from a test because we can't fast-isel some of the mask operations. When we used intrinsics we implicitly fell back to SelectionDAG for the intrinsic call without triggering the abort error. But with native IR that doesn't happen the same way.
llvm-svn: 322050
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
patterns.
The pattern was this
def : Pat<(i32 (zext (i8 (bitconvert (v8i1 VK8:$src))))),
(MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit))>, Requires<[NoDQI]>;
but if you just let (i32 (zext X)) match by itself you'll get MOVZX32rr8. And if you let (i8 (bitconvert (v8i1 VK8:$src))) match by itself you'll get (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit).
So we can just let isel do the two patterns naturally.
llvm-svn: 322049
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit does two things. First, it adds a collection of flags which can
be passed along to the target to encode, for the outliner, information about
the MBB that an instruction lives in.
Second, it adds some of those flags to the AArch64 outliner in order to add
more stack instructions to the list of legal instructions that are handled
by the outliner. The two flags added check whether
- There are calls in the MachineBasicBlock containing the instruction
- The link register is available in the entire block
If the link register is available and there are no calls, then a stack
instruction can always be outlined without fixups, regardless of what it is,
since in this case, the outliner will never modify the stack to create a
call or outlined frame.
The motivation for doing this was checking which instructions are most often
missed by the outliner. Instructions like, say
%sp<def> = ADDXri %sp, 32, 0; flags: FrameDestroy
are very common, but cannot be outlined in the case that the outliner might
modify the stack. This commit allows us to outline instructions like this.
llvm-svn: 322048
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
(Target)FrameLowering::determineCalleeSaves can be called multiple
times. I don't think it should have side effects such as creating stack
objects and setting global MachineFunctionInfo state, as it is doing
today (in other back-ends as well).
This moves the creation of stack objects from determineCalleeSaves to
assignCalleeSavedSpillSlots.
Differential Revision: https://reviews.llvm.org/D41703
llvm-svn: 321987
|
| |
|
|
|
|
| |
CVT2MASK just checks the sign bit, which can be represented with a comparison against zero.
llvm-svn: 321985
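The equivalence relied on here is that testing the sign bit of a two's-complement value gives the same answer as comparing it against zero; a tiny C++ check (illustrative only):

  #include <cassert>
  #include <cstdint>
  #include <initializer_list>

  bool signBitSet(int32_t x) { return (static_cast<uint32_t>(x) >> 31) != 0; }

  int main() {
    for (int32_t v : {-5, -1, 0, 1, 42})
      assert(signBitSet(v) == (v < 0)); // sign-bit test == "less than zero"
  }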
|
| |
|
|
|
|
| |
128/256-bit compares when VLX is not available.
llvm-svn: 321984
|
| |
|
|
|
|
| |
ability to constant fold vector SIGN_EXTEND.
llvm-svn: 321979
|
| |
|
|
| |
llvm-svn: 321978
|
| |
|
|
|
|
| |
I had removed the qualifiers around the autogenerated folding table so I could compare with the manual table, but didn't intend to commit the change.
llvm-svn: 321971
|
| |
|
|
|
|
|
|
| |
SIGN_EXTEND_INREG nodes created during legalization of v2i1/v4i1 masks on KNL.
v2i1/v4i1 are now legal on KNL so no sign_extend_inreg is generated.
llvm-svn: 321968
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There are a few oddities that occur because v1i1, v8i1, and v16i1 are legal while v2i1 and v4i1 are not when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill with 0s at the promoted type.
It would be better if we could just make the v2i1/v4i1 types legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway; everything is done on a larger register.
This also fixes an issue where we couldn't implement a masked vextractf32x4 from zmm to xmm properly.
We now have to support widening more compares to 512 bits to get a mask result out, so new tablegen patterns were added.
I had to hack the legalizer a bit when widening the operand of a setcc so it doesn't try to create a setcc returning v4i32, extract from it, and then try to promote it to v2i1 using a sign extend. Now we create the setcc with v4i1 if the original setcc's result type is v2i1, then extract that and don't sign extend it at all.
There's definitely room for improvement with some follow up patches.
Reviewers: RKSimon, zvi, guyblank
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41560
llvm-svn: 321967
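A rough C++ analogy (not LLVM code) for widening a v2i1 mask directly in a k register: keep the two live bits in the low positions and leave the upper lanes zero, instead of promoting each element to a wide integer and truncating back:

  #include <cstdint>
  #include <cstdio>

  uint16_t widenMask2To16(uint8_t mask2) {
    return static_cast<uint16_t>(mask2 & 0x3); // upper 14 lanes stay zero
  }

  int main() {
    std::printf("0x%04x\n", widenMask2To16(0xFF)); // 0x0003
  }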
|
| |
|
|
| |
llvm-svn: 321958
|
| |
|
|
|
|
| |
The instructions that load 64 bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128 bits need to ensure 128-bit alignment.
llvm-svn: 321956
|
| |
|
|
|
|
| |
folding table.
llvm-svn: 321955
|
| |
|
|
|
|
|
|
| |
folding tables.
EVEX_B means different things for memory and register forms. The instructions should not be considered equivalent.
llvm-svn: 321954
|
| |
|
|
| |
llvm-svn: 321953
|
| |
|
|
| |
llvm-svn: 321952
|
| |
|
|
| |
llvm-svn: 321951
|
| |
|
|
|
|
|
|
| |
versions of cvtps2ph to the store folding tables.
The memory form of the xmm->xmm version only writes 64 bits. If we use it in the folding tables and it gets used for a stack spill, only half the slot will be written. Then a reload may read all 128 bits, which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros.
llvm-svn: 321950
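A hedged C++ illustration of the spill hazard described above: a 64-bit store into a 16-byte spill slot followed by a full 128-bit reload picks up whatever happened to be in the upper 8 bytes of the slot, whereas the unfolded register would have had zeros there.

  #include <cstdint>
  #include <cstring>
  #include <cstdio>

  int main() {
    alignas(16) uint8_t slot[16];
    std::memset(slot, 0xAA, sizeof(slot));   // stale stack contents ("garbage")

    uint64_t lowHalf = 0x1122334455667788ULL;
    std::memcpy(slot, &lowHalf, 8);          // the 64-bit store: only half the slot

    uint64_t reloadedHi;
    std::memcpy(&reloadedHi, slot + 8, 8);   // a 128-bit reload also reads this half
    std::printf("upper half after reload: 0x%016llx\n",
                (unsigned long long)reloadedHi); // 0xaaaaaaaaaaaaaaaa
  }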
|
| |
|
|
| |
llvm-svn: 321949
|
| |
|
|
|
|
|
|
| |
We don't do fine-grained feature control like this on features prior to AVX512.
We do still have checks in place in the assembly parser itself that prevent %zmm references or %xmm16-31 from being parsed without at least -mattr=avx512f. The same goes for rounding control and mask operands. That will prevent the table matcher from matching any instructions that need those features, and that's probably good enough.
llvm-svn: 321947
|
| |
|
|
|
|
|
|
| |
matcher table.
This is also needed to fix PR35837.
llvm-svn: 321946
|
| |
|
|
| |
llvm-svn: 321945
|
| |
|
|
|
|
|
|
| |
isel pattern that only existed for the assembler. Use VCVTTSD2SIrrb_Int instead.
For consistency use the _Int version of VCVTTSD2SIrr_Int and VCVTTSD2SIrm_Int for the assembler as well.
llvm-svn: 321944
|
| |
|
|
|
|
|
|
|
|
| |
assembler matcher table
We should always prefer the VEX encoded version of these instructions. There is no advantage to the EVEX version.
Fixes PR35837.
llvm-svn: 321939
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325
We're trading branches and compares for loads and logic ops.
This makes the code smaller and hopefully faster in most cases.
The 24-byte test shows an interesting construct: we load the trailing scalar
elements into vector registers and generate the same pcmpeq+movmsk code that
we expected for a pair of full vector elements (see the 32- and 64-byte tests).
Differential Revision: https://reviews.llvm.org/D41714
llvm-svn: 321934
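Here is a hedged C++ sketch, using SSE intrinsics, of the kind of pcmpeq+movmsk sequence described above for a single 16-byte block (the actual expansion handles larger sizes with multiple loads per block; names are illustrative):

  #include <immintrin.h>
  #include <cstdio>

  bool equal16(const void *a, const void *b) {
    __m128i va = _mm_loadu_si128(static_cast<const __m128i *>(a));
    __m128i vb = _mm_loadu_si128(static_cast<const __m128i *>(b));
    __m128i eq = _mm_cmpeq_epi8(va, vb);     // 0xFF in every matching byte
    return _mm_movemask_epi8(eq) == 0xFFFF;  // all 16 bytes matched
  }

  int main() {
    char x[16] = "fifteen chars!!";
    char y[16] = "fifteen chars!!";
    std::printf("%d\n", equal16(x, y)); // 1
  }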
|
| |
|
|
|
|
| |
This makes the names consistent with the mnemonics like every other instruction.
llvm-svn: 321931
|
| |
|
|
|
|
| |
we don't crash when trying to print an error message using it.
llvm-svn: 321930
|
| |
|
|
|
|
| |
lowerV4I64VectorShuffle.
llvm-svn: 321929
|
| |
|
|
|
|
|
|
| |
instructions.
This matches their VEX equivalents.
llvm-svn: 321912
|
| |
|
|
|
|
|
|
|
|
| |
instructions as well.
Without this we allow "vmovd %rax, %xmm0", but not "vmovd %rax, %xmm16"
This exists to remain compatible with a silly bug where really old versions of the GNU assembler required movd instead of movq on these instructions. This compatibility hack then crept forward to the AVX version too, but we didn't propagate it to AVX512.
llvm-svn: 321903
|
| |
|
|
|
|
|
|
| |
'movq' instead.
This behavior existed to work with an old version of the GNU assembler on MacOS that only accepted this form. Newer versions of the GNU assembler and the current LLVM-derived version of the assembler on MacOS support movq as well.
llvm-svn: 321898
|
| |
|
|
|
|
| |
of integer.
llvm-svn: 321821
|
| |
|
|
| |
llvm-svn: 321755
|
| |
|
|
|
|
|
|
|
|
| |
This custom inserter was added in r124272, at which time it added a bunch of Defs for Win64. In r150708, those defs were removed, leaving only the "return BB". So I think this means the custom inserter is a NOP these days.
This patch removes the remaining code and stops tagging the instructions for custom insertion.
Differential Revision: https://reviews.llvm.org/D41671
llvm-svn: 321747
|
| |
|
|
|
|
|
|
| |
Currently we use SIGN_EXTEND in lowerMasksToReg as part of calling convention setup, but we don't require a specific value for the upper bits.
This patch changes it to ANY_EXTEND which will be lowered as SIGN_EXTEND if it ends up sticking around.
llvm-svn: 321746
|
| |
|
|
|
|
|
| |
Besides the unsightly print-out, it was causing some buildbots to fail,
e.g. http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/9311
llvm-svn: 321711
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend.
D20830 threaded an MCSubtargetInfo reference through
MCAsmBackend::relaxInstruction, but this isn't the only function that would
benefit from access. This patch removes the Triple and CPUString arguments
from createMCAsmBackend and replaces them with MCSubtargetInfo.
This patch just changes the interface without making any intentional
functional changes. Once in, several cleanups are possible:
* Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend
* Support 16-bit instructions when valid in MipsAsmBackend::writeNopData
* Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl
* Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221)
This change initially exposed PR35686, which has since been resolved in r321026.
Differential Revision: https://reviews.llvm.org/D41349
llvm-svn: 321692
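A hedged, self-contained sketch of the shape of the change (mock types and simplified parameter lists, not the real LLVM declarations): the triple and CPU string are dropped in favor of an MCSubtargetInfo, which already carries both along with the feature bits.

  #include <cstdio>
  #include <string>

  // Mock stand-ins for the real LLVM types.
  struct Triple {};
  struct MCSubtargetInfo { Triple TT; std::string CPU; /* + feature bits */ };
  struct MCAsmBackend {};

  // Before (simplified): the factory took the triple and CPU string.
  MCAsmBackend *createMCAsmBackendOld(const Triple &, const std::string &) {
    return new MCAsmBackend();
  }

  // After (simplified): it takes MCSubtargetInfo, so backends can query
  // subtarget features (e.g. whether NOPL is available) directly.
  MCAsmBackend *createMCAsmBackendNew(const MCSubtargetInfo &) {
    return new MCAsmBackend();
  }

  int main() {
    MCSubtargetInfo STI;
    delete createMCAsmBackendNew(STI);
    std::puts("ok");
  }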
|
| |
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D40524
Change-Id: Ie3a405b28503ceae999f5f3ba07a68fa733a2400
llvm-svn: 321674
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR33325)
This is an extension of D31156 with the goal that we'll allow memcmp() == 0 expansion
for x86 to use 2 pairs of loads per block.
The memcmp expansion pass (formerly part of CGP) will generate this kind of pattern
with oversized integer compares, so we want to transform these into x86-specific vector
nodes before legalization splits things into scalar chunks.
See PR33325 for more details:
https://bugs.llvm.org/show_bug.cgi?id=33325
Differential Revision: https://reviews.llvm.org/D41618
llvm-svn: 321656
|
| |
|
|
| |
llvm-svn: 321644
|
| |
|
|
| |
llvm-svn: 321632
|