llvm-svn: 299278
llvm-svn: 299277
Found by PVS-Studio. Fixes llvm.org/PR31676.
llvm-svn: 299262
Found by PVS-Studio. Fixes llvm.org/PR32480.
llvm-svn: 299258
Differential Revision: https://reviews.llvm.org/D31547
llvm-svn: 299250
Add a new node to act as a fancy bitcast from f16 operations to i32 that
implicitly zeroes the high 16 bits of the result.
Alternatively, we could try making v2f16 legal and canonicalizing on
build_vectors.
llvm-svn: 299246
R600 uses a higher AS number to access kernel parameters.
Fixes: r298846
Differential Revision: https://reviews.llvm.org/D31520
llvm-svn: 299245
Summary:
This feature enables the folding of logical shift operations of up to 3 places into the addressing mode on Kryo and Falkor, which have a fast-path LSL.
Reviewers: mcrosier, rengolin, t.p.northover
Subscribers: junbuml, gberry, llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D31113
llvm-svn: 299240
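
For illustration (not from the patch): a load whose index is scaled by the element size compiles to an LSL that this change can fold into the load's addressing mode.
```
#include <cstdint>

// Illustration only: the index is scaled by 8 (an LSL of 3 places), which
// on Kryo/Falkor can now fold into the load's addressing mode, e.g.
//   ldr x0, [x1, x2, lsl #3]
// instead of a separate shift followed by a plain load.
uint64_t load_scaled(const uint64_t *base, uint64_t i) {
  return base[i]; // address = base + (i << 3)
}
```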
the immediate encodings the frontend uses based on the _MM_HINT_T0/T1 constant values in clang's headers.
Our _MM_HINT_T0/T1 constant values are 3/2, which matches gcc but not icc or the Intel documentation. Interestingly, gcc at one point had this same bug in its implementation of the gather/scatter builtins.
Fixes PR32411.
llvm-svn: 299234
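
For reference, a small hypothetical use of the affected intrinsic (not from the commit); the point of the fix is mapping these header constants to the right prefetch instruction immediates.
```
#include <xmmintrin.h>

// Illustration only: with clang's (and gcc's) headers, _MM_HINT_T0 == 3 and
// _MM_HINT_T1 == 2, so the backend must translate these values to the
// prefetcht0/prefetcht1 encodings rather than assume Intel's numbering.
void warm(const float *p) {
  _mm_prefetch(reinterpret_cast<const char *>(p), _MM_HINT_T0);
  _mm_prefetch(reinterpret_cast<const char *>(p) + 64, _MM_HINT_T1);
}
```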
Implementation of TargetInstrInfo::findCommutedOpIndices for the MIPS target,
restricting commutativity to the second and third operands only for the
dpadd_[su].df instructions therein.
Prior to this change, there were cases where the vector that is to be added
to the dot product of the other two could take a position other than the
first one in the instruction, producing incorrect output in the destination
vector.
Such behavior has been noticed in the two functions generating v2i64 output
values so far. Others may exhibit it as well, just not for the vector
operands present in the test at the moment.
Tests are altered so that the function's first operand is a constant splat,
so that it can be loaded with an ldi instruction, since that is the case in
which the erroneous instruction operand placement has occurred. We check
that the register present in the ldi instruction is placed as the first
operand of the corresponding dpadd instruction.
Patch by Stefan Maksimovic.
Differential Revision: https://reviews.llvm.org/D30827
llvm-svn: 299223
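
A hedged sketch of the shape of such an override (opcode names and operand indices here are assumptions for illustration, not the committed code):
```
// Sketch only. Operand 0 is the def, operand 1 the accumulated vector,
// operands 2 and 3 the multiplicands: only the multiplicands may swap.
bool MipsSEInstrInfo::findCommutedOpIndices(MachineInstr &MI,
                                            unsigned &SrcOpIdx1,
                                            unsigned &SrcOpIdx2) const {
  switch (MI.getOpcode()) {
  case Mips::DPADD_S_D: // hypothetical dpadd_[su].df opcode list
  case Mips::DPADD_U_D:
    // Never let the accumulator (operand 1) be commuted away from the
    // first source position.
    return fixCommutedOpIndices(SrcOpIdx1, SrcOpIdx2, 2, 3);
  default:
    return TargetInstrInfo::findCommutedOpIndices(MI, SrcOpIdx1, SrcOpIdx2);
  }
}
```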
Since LOCR only accepts GR32 virtual registers, its operands must be copied
into this regclass in insertSelect() when an LOCR is built. Otherwise the
case where the source operand was GRX32 will produce invalid IR.
Review: Ulrich Weigand
llvm-svn: 299220
Currently ComputeNumSignBits returns the minimum number of sign bits across all elements of vector data, when we may only be interested in one or some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits overload calls through with a DemandedElts mask demanding all elements, to match the current behaviour. Scalar types set this to 1.
I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far; all other nodes default to demanding all elements but can be updated in due course.
A follow-up to D25691.
Differential Revision: https://reviews.llvm.org/D31311
llvm-svn: 299219
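
A brief sketch of a caller using the new argument (the helper function is hypothetical; the overload is the one described above):
```
#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Hypothetical helper: query the sign-bit count of only lane 0 of a
// 4-element vector instead of the minimum across all lanes.
unsigned signBitsOfLane0(SelectionDAG &DAG, SDValue Vec) {
  APInt DemandedElts = APInt::getOneBitSet(4, /*BitNo=*/0);
  return DAG.ComputeNumSignBits(Vec, DemandedElts);
}
```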
Even on older subtargets that lack vector support, there may be vector values
with just one element in the input program. These are converted during DAG
legalization to scalar values.
In this circumstance the pre-legalization SystemZ DAGCombiner methods should
not touch these nodes. This patch adds a check for this in
SystemZTargetLowering::combineEXTRACT_VECTOR_ELT().
Review: Ulrich Weigand
llvm-svn: 299213
Previously the compiler often extracted common immediates into a specific register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this, the SDWA peephole failed to find SDWA-convertible patterns. E.g. the previous example could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change, the peephole checks whether an operand is either an immediate or a register that is a copy of an immediate.
llvm-svn: 299202
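
A hedged sketch of that check (helper name and structure are assumptions, not the committed code):
```
// Sketch only; assumes in-tree AMDGPU headers for the opcode enum.
static bool isImmOrCopyOfImm(const MachineRegisterInfo &MRI,
                             const MachineOperand &Op, int64_t &Imm) {
  if (Op.isImm()) {
    Imm = Op.getImm();
    return true;
  }
  if (!Op.isReg() || !TargetRegisterInfo::isVirtualRegister(Op.getReg()))
    return false;
  // E.g. %vreg0 = S_MOV_B32 0xff from the example above.
  const MachineInstr *Def = MRI.getUniqueVRegDef(Op.getReg());
  if (Def && Def->getOpcode() == AMDGPU::S_MOV_B32 &&
      Def->getOperand(1).isImm()) {
    Imm = Def->getOperand(1).getImm();
    return true;
  }
  return false;
}
```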
computeKnownBitsForTargetNode
A follow-up to D25691, this sets up the plumbing necessary to support vector demanded elements in known-bits calculations in target nodes.
Differential Revision: https://reviews.llvm.org/D31249
llvm-svn: 299201
the target is ppc64" as it's causing test failures; I've given Carrot a testcase offline.
This reverts commit r298955.
llvm-svn: 299153
Add support for the new relocations and the linking metadata section described in
https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md. In
particular, this allows LLVM to indicate which variable is the stack pointer,
so that it can be linked with other objects.
This also adds support for emitting type relocations for call_indirect
instructions.
Right now, this is mainly tested by using wabt and hexdump to examine the
output on selected testcases. We'll add more tests as the design stabilizes
and more of the pieces are in place.
llvm-svn: 299141
What we really want to do is distinguish functions that may
be called by other functions, and graphics shaders are not
called kernels.
llvm-svn: 299140
Add scope, order, isVolatile
llvm-svn: 299122
llvm-svn: 299114
This time after calls returning i1.
Fixes PR32472.
llvm-svn: 299112
If set to false, the pass does not remove global aliases, and it should then
be safe to run it before linking.
Differential Revision: https://reviews.llvm.org/D31489
llvm-svn: 299108
llvm-svn: 299103
llvm-svn: 299102
Based on corrections mentioned in the clang patch for PR27635.
llvm-svn: 299072
llvm-svn: 299069
Unbreaks the build with GCC -Werror.
llvm-svn: 299030
NFCI.
llvm-svn: 299004
llvm-svn: 299000
lowerVectorShuffleAsSplitOrBlend (PR32453)
llvm-svn: 298993
computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI.
Based on comment in D31249.
llvm-svn: 298991
No longer makes sense as the previous opcode mnemonic it was referring to is long gone.
llvm-svn: 298988
r298956.
llvm-svn: 298986
GR32 is enough.
llvm-svn: 298985
we can just use COPY_TO_REGCLASS instead.
This will result in a KMOVW or KMOVD being emitted during register allocation. In at least some cases this might allow the register coalescer to remove the copy altogether.
llvm-svn: 298984
We should be masking the value and emitting a register copy like we do in non-fast isel. Instead we were just updating the value map and emitting nothing.
After r298928 we started seeing cases where we would create a copy from GR8 to GR32 because the source register in a VK1 to GR32 copy was replaced by the GR8 going into a truncate.
This fixes PR32451.
llvm-svn: 298957
In PPCBoolRetToInt, bool values are changed to the i32 type. On ppc64 this may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64.
This patch fixes PR32442.
Differential Revision: https://reviews.llvm.org/D31407
llvm-svn: 298955
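
For illustration (not from the patch), the kind of function affected:
```
// Illustration only: PPCBoolRetToInt widens this bool return; promoting to
// i64 rather than i32 matches the 64-bit return register on ppc64, so no
// zero extension of the result is needed.
bool isEven(long x) { return (x & 1) == 0; }
```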
This is less important than increasing the threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows ds
operations to be combined when an offset becomes static, saves the registers
used for offsets in the case of static offsets, and allows better LDS
latency hiding.
Differential Revision: https://reviews.llvm.org/D31412
llvm-svn: 298948
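
A small illustration of the static-offset point (not from the patch):
```
// Illustration only: once fully unrolled, the four reads use the static
// offsets 0, 4, 8 and 12, so the ds operations can be combined and no
// registers are needed to materialize the offsets.
float sumFour(const float *lds) { // imagine 'lds' points into local memory
  float s = 0.0f;
  for (int i = 0; i < 4; ++i)
    s += lds[i];
  return s;
}
```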
It is incorrect to record region boundaries before scheduling, as they may
change during scheduling. As a result the second pass may see fewer
instructions to schedule than it should.
Differential Revision: https://reviews.llvm.org/D31434
llvm-svn: 298945
We currently perform the various fp_to_sint XMM conversions and then transfer to the MMX register (on 32-bit via the stack).
This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc.
Differential Revision: https://reviews.llvm.org/D30868
llvm-svn: 298943
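
An illustrative source-level pattern that should exercise the new direct-conversion path (an assumption for illustration, not a test from the patch):
```
#include <emmintrin.h>

// Illustration only: cvttps2dq in XMM followed by a movdq2q-style transfer;
// with the new patterns this pair can be selected as a single direct
// XMM->MMX fp_to_sint conversion (cvttps2pi).
__m64 trunc_two(__m128 v) {
  __m128i i = _mm_cvttps_epi32(v); // fp_to_sint in an XMM register
  return _mm_movepi64_pi64(i);     // transfer the low 64 bits to MMX
}
```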
Previously this was covered by internalization. It turns out we cannot run
the internalizer in the FE; it breaks separate compilation tests. Thus the
early inliner gets its own option.
Differential Revision: https://reviews.llvm.org/D31429
llvm-svn: 298935
Follow-up to:
https://reviews.llvm.org/rL298775
llvm-svn: 298933
It breaks some lld tests.
This reverts commit 3a50eea6d9732ab40e9a7aebe6be777b53a8b35c.
llvm-svn: 298932
llvm-svn: 298929
We've had several bugs (PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.
This ultimately occurs because we create a COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super-register for the GR8 to emit the KMOV with. But as these tests demonstrate, it's possible for the GR8 register to be a high register, and we end up doing an accidental extract or insert from bits 15:8.
I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64, and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/GR8 or vice versa.
Unfortunately, this complicates fastisel a bit more now that it has to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repetition.
This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.
Differential Revision: https://reviews.llvm.org/D30968
llvm-svn: 298928
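
A hedged sketch of the fastisel side (the helper name and details are assumptions, not the committed code):
```
// Sketch only: copy VK1 -> GR32 first, then extract sub_8bit, so register
// allocation can never pick a high-byte register (AH/BH/CH/DH).
unsigned X86FastISel::maskToGR8(unsigned MaskReg) {
  unsigned CopyReg = createResultReg(&X86::GR32RegClass);
  BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
          TII.get(TargetOpcode::COPY), CopyReg)
      .addReg(MaskReg);
  return fastEmitInst_extractsubreg(MVT::i8, CopyReg, /*Op0IsKill=*/true,
                                    X86::sub_8bit);
}
```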
easier. NFCI.
Call the matchVectorShuffleAsBlend test as early as possible.
llvm-svn: 298925
llvm-svn: 298917
Split off matchVectorShuffleAsBlend from lowerVectorShuffleAsBlend for reuse in combining.
llvm-svn: 298914
llvm-svn: 298911
shuffle combines.
The copy isn't necessary after the matchVectorShuffleWithUNPCK refactor, and the undef value will make some future undef/zero handling easier.
llvm-svn: 298910