| Commit message (Collapse) | Author | Age | Files | Lines | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
PR25157 identifies a bug where a load plus a vector shuffle is
incorrectly converted into an LXVDSX instruction.  That optimization
is only valid if the load is of a doubleword, and in the noted case,
it was not.  This corrects that problem.
Joint patch with Eric Schweitz, who provided the bugpoint-reduced test
case.
llvm-svn: 250324
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch teaches x86 fast-isel how to select nontemporal stores.
On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.
Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.
With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.
Differential Revision: http://reviews.llvm.org/D13698
llvm-svn: 250285
 | 
| | 
| 
| 
|  | 
llvm-svn: 250268
 | 
| | 
| 
| 
|  | 
llvm-svn: 250233
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new
one.  Previously, bundle iterators and single-instruction iterators
could be compared to each other (comparing on underlying pointers).
I changed a comparison from using `MBB->end()` to using
`MBB->instr_end()`, since both end iterators should point at the some
place anyway.
I don't think the implicit conversion between the two iterator types is
a good idea since it's fairly easy to accidentally compare to the wrong
thing (they aren't always end iterators).  Otherwise I would have just
added the conversion.
Even with that, no there should be functionality change here.
llvm-svn: 250218
 | 
| | 
| 
| 
|  | 
llvm-svn: 250216
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This fixes an assert in AArch64AsmParser::MatchAndEmitInstruction.
rdar://problem/23081753
llvm-svn: 250207
 | 
| | 
| 
| 
|  | 
llvm-svn: 250174
 | 
| | 
| 
| 
|  | 
llvm-svn: 250162
 | 
| | 
| 
| 
|  | 
llvm-svn: 250154
 | 
| | 
| 
| 
|  | 
llvm-svn: 250151
 | 
| | 
| 
| 
|  | 
llvm-svn: 250148
 | 
| | 
| 
| 
| 
| 
|  | 
parser will check the size.
llvm-svn: 250147
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.
This patch adds the missing EFLAGS definition.
Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.
Reviewers: rsmith, rtrieu
Subscribers: llvm-commits, reames, morisset
Differential Revision: http://reviews.llvm.org/D13680
llvm-svn: 250135
 | 
| | 
| 
| 
| 
| 
| 
|  | 
It should now correctly handle physical registers and make
it easier to identify the other direction.
llvm-svn: 250132
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.
InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.
llvm-svn: 250129
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.
llvm-svn: 250088
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.
This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.
In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.
Example (-mattr=+avx):
  %0 = trunc <8 x i32> %a to <8 x i23>
  %1 = icmp eq <8 x i23> %0, zeroinitializer
The initial selection dag for the code above is:
v8i1 = setcc t5, t7, seteq:ch
  t5: v8i23 = truncate t2
    t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
    t7: v8i32 = build_vector of all zeroes.
The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
  v8i16 = setcc t32, t25, seteq:ch
  where t32 and t25 are now values of type v8i32.
Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
  t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25
However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
  %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3
Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.
This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.
This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.
Differential Revision: http://reviews.llvm.org/D13660
llvm-svn: 250085
 | 
| | 
| 
| 
|  | 
llvm-svn: 250075
 | 
| | 
| 
| 
|  | 
llvm-svn: 250071
 | 
| | 
| 
| 
|  | 
llvm-svn: 250059
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
addu.qb implementation
Differential Revision: http://reviews.llvm.org/D12798
llvm-svn: 250058
 | 
| | 
| 
| 
|  | 
llvm-svn: 250053
 | 
| | 
| 
| 
|  | 
llvm-svn: 250049
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
register belongs.
Summary: Fixes PR24915.
Reviewers: vkalintiris
Subscribers: emaste, seanbruno, llvm-commits
Differential Revision: http://reviews.llvm.org/D13533
llvm-svn: 250042
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Reviewers: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13591
llvm-svn: 250040
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
This removes unnecessary instructions when extracting from an undefined register
and also fixes a crash for O32 when passing undef to a double argument in
held in integer registers.
Reviewers: vkalintiris
Subscribers: llvm-commits, zoran.jovanovic, petarj
Differential Revision: http://reviews.llvm.org/D13467
llvm-svn: 250039
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The Swift Machine Scheduler Model is incomplete. There are instructions
missing which can trigger the "incomplete machine model" abort. This was
observed when a downstream SchedMachineModel was added to the ARM
target.
Patch by Christof Douma!
llvm-svn: 250033
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Add intrinsics for the
  XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64)
  XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64)
  XSAVEC instructions (XSAVEC/XSAVEC64)
  XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64)
Differential Revision: http://reviews.llvm.org/D13012
llvm-svn: 250029
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
indices have the most significant bit set.
This patch fixes a problem in function 'combineX86ShuffleChain' that causes a
chain of shuffles to be wrongly folded away when the combined shuffle mask has
only one element.
We may end up with a combined shuffle mask of one element as a result of
multiple calls to function 'canWidenShuffleElements()'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening
the size of the elements being shuffled.
For every pair of shuffle indices, function canWidenShuffleElements checks if
indices refer to adjacent elements. If all pairs refer to "adjacent" elements
then the shuffle mask is safely widened. As a consequence of widening, we end up
with a new shuffle mask which is half the size of the original shuffle mask.
The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero
indices. Function canWidenShuffleElements would combine each pair of
SM_SentinelZero indices into a single SM_SentinelZero index. So, in a
logarithmic number of steps (4 in this case), the pshufb mask is simplified to
a mask with only one index which is equal to SM_SentinelZero.
Before this patch, function combineX86ShuffleChain wrongly assumed that a mask
of size one is always equivalent to an identity mask. So, the entire shuffle
chain was just folded away as the combined shuffle mask was treated as a no-op
mask.
With this patch we know check if the only element of a combined shuffle mask is
SM_SentinelZero. In case, we propagate a zero vector.
Differential Revision: http://reviews.llvm.org/D13364
llvm-svn: 250027
 | 
| | 
| 
| 
|  | 
llvm-svn: 250026
 | 
| | 
| 
| 
| 
| 
|  | 
instructions. This way the assembler will perform range checking. Believe this matches gas behavior.
llvm-svn: 250016
 | 
| | 
| 
| 
| 
| 
|  | 
%xmmX, %xmmX encoding if it would be a shorter VEX encoding.
llvm-svn: 250014
 | 
| | 
| 
| 
|  | 
llvm-svn: 250013
 | 
| | 
| 
| 
| 
| 
|  | 
parser will check the size.
llvm-svn: 250012
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
arithmetic instructions with 8-bit immediates over the forms that implicitly use the ax/eax/rax.
This allows us to remove the explicit code for working around the existing priority
llvm-svn: 250011
 | 
| | 
| 
| 
| 
| 
|  | 
al/ax/eax/rax as a def. Probably doesn't have a functional affect since these aren't used in isel.
llvm-svn: 249994
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
Instead mark its operand type as u8imm which will cause it to fail to match. This is more consistent with other instruction behavior.
This also fixes a bug where negative immediates below -128 were not being reported as errors.
llvm-svn: 249989
 | 
| | 
| 
| 
|  | 
llvm-svn: 249979
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
comparisons to XOP PCOM/PCOMU instructions.
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).
llvm-svn: 249976
 | 
| | 
| 
| 
|  | 
llvm-svn: 249952
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
expandPostRAPseudo():
STX -> 2 * STD: The first STD should not have the kill flag set for the address.
SystemZElimCompare:
BRC -> BRCT conversion: Don't forget to remove the CC<use,kill> operand.
Needed to make SystemZ/asm-17.ll pass with -verify-machineinstrs, which
now runs with this flag.
Reviewed by Ulrich Weigand.
llvm-svn: 249945
 | 
| | 
| 
| 
|  | 
llvm-svn: 249941
 | 
| | 
| 
| 
|  | 
llvm-svn: 249940
 | 
| | 
| 
| 
| 
| 
|  | 
wineh-parent is dead, so is ValueOrMBB.
llvm-svn: 249920
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The new implementation works at least as well as the old implementation
did.
Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.
llvm-svn: 249918
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
x64 catchpads use rax to inform the unwinder where control should go
next.  However, we must initialize rax before the epilogue sequence so
as to not perturb the unwinder.
llvm-svn: 249910
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This occurred due to introducing the invalid i64 type after type
legalization had already finished, in an attempt to workaround bitcast
f64 -> v2i32 not doing constant folding.
The *right* thing is to actually fix bitcast, but that has other
complications. So, for now, just get rid of the broken workaround, and
check in a test-case showing that it doesn't crash, with TODOs for
emitting proper code.
llvm-svn: 249908
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When running combine on an extract_vector_elt, it wants to look through
a bitcast to check if the argument to the bitcast was itself an
extract_vector_elt with particular operands.
However, it called getOperand() on the argument to the bitcast *before*
checking that the opcode was EXTRACT_VECTOR_ELT, assert-failing if there
were zero operands for the actual opcode.
Fix, and add trivial test.
llvm-svn: 249891
 | 
| | 
| 
| 
|  | 
llvm-svn: 249859
 |