| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
I am 99% sure that this breaks the PPC ASAN build bot:
http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio
If it doesn't go back to green, we can recommit (and fix the original
commit message at the same time :) ).
llvm-svn: 306676
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CFI instructions that set appropriate cfa offset and cfa register are now
inserted in emitEpilogue() in X86FrameLowering.
Majority of the changes in this patch:
1. Ensure that CFI instructions do not affect code generation.
2. Enable maintaining correct information about cfa offset and cfa register
in a function when basic blocks are reordered, merged, split, duplicated.
These changes are target independent and described below.
Changed CFI instructions so that they:
1. are duplicable
2. are not counted as instructions when tail duplicating or tail merging
3. can be compared as equal
Add information to each MachineBasicBlock about cfa offset and cfa register
that are valid at its entry and exit (incoming and outgoing CFI info). Add
support for updating this information when basic blocks are merged, split,
duplicated, created. Add a verification pass (CFIInfoVerifier) that checks
that outgoing cfa offset and register of predecessor blocks match incoming
values of their successors.
Incoming and outgoing CFI information is used by a late pass
(CFIInstrInserter) that corrects CFA calculation rule for a basic block if
needed. That means that additional CFI instructions get inserted at basic
block beginning to correct the rule for calculating CFA. Having CFI
instructions in function epilogue can cause incorrect CFA calculation rule
for some basic blocks. This can happen if, due to basic block reordering,
or the existence of multiple epilogue blocks, some of the blocks have wrong
cfa offset and register values set by the epilogue block above them.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D18046
llvm-svn: 306529
|
| |
|
|
|
|
| |
Use triple and attribute only for consistency
llvm-svn: 305908
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.
This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.
I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.
Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.
This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.
Differential Revision: https://reviews.llvm.org/D30968
llvm-svn: 298928
|
| |
|
|
|
|
|
|
| |
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.
Differential Revision: https://reviews.llvm.org/D29874
llvm-svn: 296859
|
| |
|
|
|
|
|
|
| |
vselects of all ones and all zeros.
Previously we emitted a VPTERNLOG and a separate masked move.
llvm-svn: 291415
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Choosing a "cfi" name makes the intend a bit clearer in an assembly dump
and more importantly the assembly dumps are slightly more stable as the
numbers don't move around anymore when unrelated code calls
createTempSymbol() more or less often.
As they are temp labels the name doesn't influence the generated object
code.
Differential Revision: https://reviews.llvm.org/D27244
llvm-svn: 288290
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VPERMI2(B/W/D/Q/PS/PD).
Summary:
The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities.
We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too.
Reviewers: igorb, delena, Ayal, Farhana, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25652
llvm-svn: 287621
|
| |
|
|
|
|
|
|
|
| |
1. Use shuffle to insert element i1 into vector. The previous implementation was incorrect ( dest_bit OR src_bit , it doesn't clear the bit if src_bit=0 )
2. Improve shuffle i1 vector, use CVT2MASK if supported instead TRUNCATE.
Differential Revision: http://reviews.llvm.org/D23347
llvm-svn: 278623
|
| |
|
|
|
|
|
|
|
|
|
| |
integer values
Optimized lowering of BITCAST node. The BITCAST node can be replaced with COPY_TO_REG instead of KMOV.
It allows to suppress two opposite BITCAST operations and avoid redundant "movs".
Differential Revision: https://reviews.llvm.org/D23247
llvm-svn: 277958
|
| |
|
|
|
|
| |
vectors.
llvm-svn: 275045
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
An identity COPY like this:
%AL = COPY %AL, %EAX<imp-def>
has no semantic effect, but encodes liveness information: Further users
of %EAX only depend on this instruction even though it does not define
the full register.
Replace the COPY with a KILL instruction in those cases to maintain this
liveness information. (This reverts a small part of r238588 but this
time adds a comment explaining why a KILL instruction is useful).
llvm-svn: 274952
|
| |
|
|
|
|
| |
instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions.
llvm-svn: 272249
|
| |
|
|
|
|
|
|
|
|
| |
Implemented load+{sign|zero}_extend for i1 vectors
Fixed failures in i1 vector load.
Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.
Differential Revision: http://reviews.llvm.org/D18737
llvm-svn: 265259
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
splat vectors
The code change is simple enough: instead of attaching an anonymous SDLoc to splatted
vector constants, use the scalar constant's existing SDLoc since that is what is passed
into getConstant() as a param. But this changes instruction scheduling, so I'll explain
why that happens.
The motivation for this patch starts near:
http://reviews.llvm.org/rL258833
...x86's getZeroVector() could be similarly cleaned up and I thought it would be 'NFC'.
But when I made that change locally, several x86 codegen tests wiggled.
It turns out that the lack of SDLoc consistency in getConstant() changes the way
ScheduleDAGRRList behaves. This is because the SDLoc contains 'IROrder' and some DAG
scheduler algorithms use IROrder for tie-breaking.
Differential Revision: http://reviews.llvm.org/D16972
llvm-svn: 260582
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16813
llvm-svn: 260024
|
| |
|
|
|
|
|
|
| |
Enable truncate 128/256bit packed byte/word with AVX512BW but without AVX512VL, use 512bit instructions.
Differential Revision: http://reviews.llvm.org/D16531
llvm-svn: 259044
|
| |
|
|
|
|
|
|
| |
Implemented intrinsic for the follow instructions (reg move) : VMOVDQU8/16, VMOVDQA32/64, VMOVAPS/PD.
Differential Revision: http://reviews.llvm.org/D16316
llvm-svn: 258398
|
| |
|
|
|
|
|
|
| |
AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector its advantageous to extract the subvector, broadcast from that and avoid the loading of shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception being when the source type is 4f64 or 4i64 which can directly use the immediate shuffle VPERMPD/VPERMQ directly.
Differential Revision: http://reviews.llvm.org/D16050
llvm-svn: 258081
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
instruction.
code example , previous implementation.
movzbl %dil, %eax
kmovw %eax, %k0
new code
kmovw %edi, %k0
Differential Revision: http://reviews.llvm.org/D16287
llvm-svn: 258045
|
| |
|
|
| |
llvm-svn: 258013
|
| |
|
|
|
|
|
|
|
| |
Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB.
Implement VPMOVB/W/D/Q2M intrinsic.
Differential Revision: http://reviews.llvm.org/D15675
llvm-svn: 256470
|
| |
|
|
|
|
|
|
| |
shuffle packed values at 128-bit granularity )
Differential Revision: http://reviews.llvm.org/D13648
llvm-svn: 250400
|
|
|
AVX-512 does not provide an instruction that shuffles mask register. So I do the following way:
mask-2-simd , shuffle simd , simd-2-mask
Differential Revision: http://reviews.llvm.org/D12727
llvm-svn: 247876
|