| Commit message | Author | Age | Files | Lines | 
| ... |  | 
| | 
| 
| 
|  | 
llvm-svn: 295065
 | 
| | 
| 
| 
| 
| 
|  | 
Add support for specifying an UNPCK input as UNDEF
llvm-svn: 295061
 | 
| | 
| 
| 
| 
| 
|  | 
This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907.
llvm-svn: 295054
 | 
| | 
| 
| 
|  | 
llvm-svn: 295053
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
Don't bother setting the V1/V2 operands again for unary shuffles.
Don't bother legalizing the value type unless the match succeeds.
llvm-svn: 295051
 | 
| | 
| 
| 
|  | 
llvm-svn: 295035
 | 
| | 
| 
| 
|  | 
llvm-svn: 295028
 | 
| | 
| 
| 
|  | 
llvm-svn: 295027
 | 
| | 
| 
| 
| 
| 
| 
|  | 
Also, for better uniformity use TargetRegistry::RegisterMCAsmInfo rather than 
RegisterMCAsmInfoFn. Again, no functional change.
llvm-svn: 295026
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
minor fixes (NFC).
Same changes in files affected by reduced MC headers dependencies.
llvm-svn: 295009
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This adds MXCSR to the set of recognized registers for X86 targets and updates the instructions that read or write it. I do not intend for all of the various floating point instructions that implicitly use the control bits or update the status bits of this register to ever have that usage modeled by default. However, when constrained floating point modes (such as strict FP exception status modeling or dynamic rounding modes) are enabled, implicit use/def information for MXCSR will be added to those instructions.
Until those additional updates are made this should cause (almost?) no functional changes. Theoretically, this will prevent instructions like LDMXCSR and STMXCSR from being moved past one another, but that should be prevented anyway and I haven't found a case where it is happening now.
Differential Revision: https://reviews.llvm.org/D29903
llvm-svn: 295004
 | 
| | 
| 
| 
| 
| 
| 
|  | 
Also make sure the AArch64 backend doesn't try to convert them into normal
loads and stores.
llvm-svn: 294993
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
I'd missed a creator of FCMP nodes - duplicateCmp().
Kindly and promptly reported by Gabor Ballabas, due to his CSiBE test suite.
llvm-svn: 294968
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Clean up the implementation of divide macro expansion by getting rid of a
FIXME regarding magic numbers and branch instructions. Match GAS' behaviour
for expansion of ddiv / div in the two and three operand cases. Add the two
operand alias for MIPSR6. Finally, optimize macro expansion cases where the
divisor is the $zero register.
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D29887
llvm-svn: 294960
 | 
| | 
| 
| 
|  | 
llvm-svn: 294959
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
The attached test case fails with "fatal error: error in backend:
misaligned pc-relative fixup value" as the jump table is misaligned.
The EmitAlignment existed already for ARM and Thumb-1 code, but was
missing for Thumb-2.
The test checks that the fatal error disappears when generating an obj
file, as well as checking the align directive is there when producing an
asm file.
Reviewers: rengolin, grosbach, t.p.northover, jmolloy, SjoerdMeijer, samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D29650
llvm-svn: 294950
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
We match a sequence of 3-4 instructions into a tTBB pseudo. One of our checks is that
a particular register in that sequence is killed (so it can be clobbered by the pseudo).
We weren't noticing if an errant MOV or other instruction had infiltrated the
sequence we were walking. If it had, and it defined the register we've already
identified as killed, it makes it live across the tBR_JT and thus unclobberable.
Notice this case and bail out.
llvm-svn: 294949
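The check described above can be sketched in a simplified form. This is only an illustration under stated assumptions: `Inst` and `canClobber` are hypothetical names and do not correspond to LLVM's MachineInstr API or to the code in this commit. The idea is that any instruction inside the matched range that defines the already-killed register makes it live again, so the match has to be abandoned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified instruction record (not LLVM's MachineInstr).
struct Inst {
  int DefReg; // register written by this instruction, or -1 if it defines none
};

// Returns true if no instruction in Seq[Begin, End) redefines KilledReg,
// i.e. the register is still dead at the end of the walked sequence and
// could be clobbered by a pseudo that replaces the sequence.
bool canClobber(const std::vector<Inst> &Seq, std::size_t Begin,
                std::size_t End, int KilledReg) {
  for (std::size_t I = Begin; I < End && I < Seq.size(); ++I)
    if (Seq[I].DefReg == KilledReg)
      return false; // an intervening def makes the register live again: bail out
  return true;
}
```

With a sequence whose middle instruction writes register 3, `canClobber(Seq, 0, Seq.size(), 3)` returns false, while the same query for a register that nothing in the range defines returns true.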
 | 
| | 
| 
| 
| 
| 
| 
|  | 
This allows us to use -stop-before/-stop-after/-run-pass - we can now write
.mir tests.
llvm-svn: 294948
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When generating a floating point comparison we currently unconditionally
generate VCMPE. This has the side effect of setting the cumulative Invalid
bit in FPSCR if any of the operands are QNaN.
It is expected that use of a relational predicate on a QNaN value should
raise Invalid. Quoting from the C standard:
  The relational and equality operators support the usual mathematical
  relationships between numeric values. For any ordered pair of numeric
  values exactly one of the relationships - less, greater, and equal - is true.
  Relational operators may raise the "invalid" floating-point exception when argument
  values are NaNs.
The standard doesn't explicitly state the expectation for equality operators,
but the implication and obvious expectation is that equality operators
should not raise Invalid on a QNaN input, as those predicates are wholly
defined on unordered inputs (to return not equal).
Therefore, add a new operand to ARMISD::FPCMP and FPCMPZ indicating if
QNaN should raise Invalid, and pipe that through to TableGen.
llvm-svn: 294945
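As a rough illustration of the distinction drawn above, the sketch below classifies comparison predicates by whether a quiet NaN operand is expected to raise Invalid. The `Pred` enum and both helpers are hypothetical and are not the ARMISD::FPCMP operand or any LLVM API; they only model the rule that relational predicates signal while equality predicates stay quiet and simply report "not equal".

```cpp
#include <cassert>
#include <limits>

// Hypothetical predicate set, used only for this illustration.
enum class Pred { EQ, NE, LT, LE, GT, GE };

// True if a comparison with this predicate is expected to raise Invalid on a
// quiet NaN operand. Relational predicates do; equality predicates do not.
constexpr bool signalsOnQNaN(Pred P) {
  return P != Pred::EQ && P != Pred::NE;
}

// Evaluates the predicate with the usual IEEE 754 results: every comparison
// involving a NaN is false except "not equal".
bool evaluate(Pred P, double A, double B) {
  switch (P) {
  case Pred::EQ: return A == B;
  case Pred::NE: return A != B;
  case Pred::LT: return A < B;
  case Pred::LE: return A <= B;
  case Pred::GT: return A > B;
  case Pred::GE: return A >= B;
  }
  return false;
}
```

In this model a backend would pick the signaling compare (VCMPE) only when `signalsOnQNaN` is true and the quiet form (VCMP) otherwise; the result of `evaluate` is the same either way, only the exception flag differs.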
 | 
| | 
| 
| 
| 
| 
|  | 
Currently only used by target shuffle combining - will use it for lowering as well in a future patch.
llvm-svn: 294943
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
consistency between VEX/EVEX versions of the same instruction.
Differential Revision: https://reviews.llvm.org/D29873
llvm-svn: 294937
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
to support 512-bit vectors with 128-bit or 256-bit subvectors.
We now detect that both the extract and insert indices are non-zero and convert to a shuffle. This will be lowered as a blend for 256-bit vectors or as a vshuf operation for 512-bit vectors.
llvm-svn: 294931
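To make the conversion concrete, here is a minimal sketch of how an insert of an extracted subvector can be written as a two-input shuffle mask. It assumes the common mask convention in which indices 0..N-1 select from the first input (the destination vector) and N..2N-1 from the second (the source vector); the function name and parameters are hypothetical and this is not the code from the patch.

```cpp
#include <cassert>
#include <vector>

// Builds a shuffle mask equivalent to
//   insert_subvector(Dst, extract_subvector(Src, ExtIdx), InsIdx)
// for vectors of NumElts elements and subvectors of SubElts elements.
// Mask entries below NumElts pick from Dst, the rest pick from Src.
std::vector<int> insertExtractAsShuffleMask(int NumElts, int SubElts,
                                            int ExtIdx, int InsIdx) {
  std::vector<int> Mask(NumElts);
  for (int I = 0; I < NumElts; ++I) {
    bool InInsertedRange = I >= InsIdx && I < InsIdx + SubElts;
    // Inside the inserted range take the matching Src element, else keep Dst.
    Mask[I] = InInsertedRange ? NumElts + ExtIdx + (I - InsIdx) : I;
  }
  return Mask;
}
```

For an 8-element vector with 4-element subvectors and both indices equal to 4, every lane takes element I from one input or the other, which is why a blend suffices. With 16 elements, extracting at 4 and inserting at 8 moves elements across lanes, so a more general shuffle is needed.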
 | 
| | 
| 
| 
| 
| 
|  | 
This results in the simplifications inside of getNode running while we're legalizing nodes popped off the worklist during the final DAG combine. This basically makes a DAG combine like operation occur during this legalize step, but we don't handle something quite the same way. I think we don't recursively add the removed nodes to the DAG combiner worklist.
llvm-svn: 294929
 | 
| | 
| 
| 
| 
| 
|  | 
convertBitVectorToUnsiged -> convertBitVectorToUnsigned
llvm-svn: 294914
 | 
| | 
| 
| 
| 
| 
|  | 
the VEX equivalents as a guide.
llvm-svn: 294908
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
instruction from AVX/SSE.
I can't prove that we can select this instruction or the AVX/SSE version, but I'm adding it for consistency for now so I can continue matching the load folding tables.
llvm-svn: 294907
 | 
| | 
| 
| 
| 
| 
|  | 
they are stores. AVX-512 version was already named with 'mr'.
llvm-svn: 294906
 | 
| | 
| 
| 
|  | 
llvm-svn: 294905
 | 
| | 
| 
| 
| 
| 
|  | 
The target shuffle match function arguments were using the term 'Ops' but the function names referred to them as 'Inputs' - use 'Inputs' consistently.
llvm-svn: 294900
 | 
| | 
| 
| 
| 
| 
|  | 
Initial 256-bit vector support - 512-bit support requires extra checks for AVX512BW support (PMOVZXBW) that will be handled in a future patch.
llvm-svn: 294896
 | 
| | 
| 
| 
| 
| 
|  | 
function returned true or undef.
llvm-svn: 294895
 | 
| | 
| 
| 
| 
| 
| 
|  | 
The reference is here: 
https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf
llvm-svn: 294890
 | 
| | 
| 
| 
| 
| 
|  | 
pattern. This gives the DAG combiner more opportunity to optimize without needing to dig through the blend.
llvm-svn: 294876
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG
Preparatory step for PR31712
llvm-svn: 294874
 | 
| | 
| 
| 
| 
| 
|  | 
Generalize VSEXT/VZEXT constant folding to work with any target constant bits source, not just BUILD_VECTOR.
llvm-svn: 294873
 | 
| | 
| 
| 
|  | 
llvm-svn: 294864
 | 
| | 
| 
| 
|  | 
llvm-svn: 294859
 | 
| | 
| 
| 
|  | 
llvm-svn: 294858
 | 
| | 
| 
| 
|  | 
llvm-svn: 294857
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
getTargetConstantBitsFromNode.
Removes duplicate constant extraction code in getTargetShuffleMaskIndices.
getTargetConstantBitsFromNode - adds support for VZEXT_MOVL(SCALAR_TO_VECTOR) and fails if the caller doesn't support undef bits.
llvm-svn: 294856
 | 
| | 
| 
| 
|  | 
llvm-svn: 294852
 | 
| | 
| 
| 
|  | 
llvm-svn: 294847
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
All commutations confirmed to give identical results - note PFMAX/PFMIN do not.
PFSUB<->PFSUBR should be commutable as well.
llvm-svn: 294846
 | 
| | 
| 
| 
|  | 
llvm-svn: 294843
 | 
| | 
| 
| 
|  | 
llvm-svn: 294837
 | 
| | 
| 
| 
|  | 
llvm-svn: 294830
 | 
| | 
| 
| 
|  | 
llvm-svn: 294829
 | 
| | 
| 
| 
|  | 
llvm-svn: 294827
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
is available.
Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available.
This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp.
Overall I think this produces better results in the modified test cases.
llvm-svn: 294824
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.
Teach the cost model to consider f16 interleaved operations as
expensive.  Otherwise, we are all but guaranteed to end up with
a large block of scalarized vector code.
llvm-svn: 294819
 |