llvm-svn: 289476
You Use warnings; other minor fixes (NFC).
llvm-svn: 289475
Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so moving 1 or 2 bytes to a VSR through MTVSRWZ is much faster than storing the extended value to the stack and loading it with LXSIWZX.
This patch fixes PR31144.
Differential Revision: https://reviews.llvm.org/D27287
llvm-svn: 289473
An unconditional branch uses relative addressing, which is the right choice
for position-independent code.
This is a fix for the bug:
https://dmz-portal.mips.com/bugz/show_bug.cgi?id=2445
Differential revision: https://reviews.llvm.org/D27483
llvm-svn: 289448
Summary:
While the result is constant across a single primitive, each pixel
shader wave can have pixels from multiple primitives.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D27572
llvm-svn: 289447
combineX86ShufflesRecursively 'HasPSHUFB' flag has been the more generic 'HasVariableMask' flag for some time.
llvm-svn: 289430
Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift.
llvm-svn: 289429
PMULDQ returns the 64-bit result of the signed multiplication of the lower 32 bits of vXi64 vector inputs, so we can lower to it if the sign bits stretch that far.
Differential Revision: https://reviews.llvm.org/D27657
llvm-svn: 289426
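
The PMULDQ condition described in r289426 can be illustrated one lane at a time. This is a minimal sketch in plain C++, not the actual X86 lowering code; `pmuldqLane`, `fitsInSigned32` and `checkLaneExamples` are hypothetical names introduced here for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Models one 64-bit lane of PMULDQ: only the low 32 bits of each input are
// read, sign-extended, and multiplied into a 64-bit result.
// (The narrowing cast keeps the low 32 bits on mainstream compilers and is
// guaranteed to do so from C++20.)
constexpr int64_t pmuldqLane(int64_t A, int64_t B) {
  return static_cast<int64_t>(static_cast<int32_t>(A)) *
         static_cast<int64_t>(static_cast<int32_t>(B));
}

// Hypothetical stand-in for "the sign bits stretch that far": V is unchanged
// by truncating to 32 bits and sign-extending back.
constexpr bool fitsInSigned32(int64_t V) {
  return V == static_cast<int64_t>(static_cast<int32_t>(V));
}

inline void checkLaneExamples() {
  // Both inputs pass the check, so the lane result equals the full multiply.
  assert(fitsInSigned32(-7) && fitsInSigned32(3));
  assert(pmuldqLane(-7, 3) == int64_t{-7} * 3);

  // An input that needs more than 32 bits fails the check, and the lane
  // result no longer matches the full 64-bit product.
  const int64_t Wide = (int64_t{1} << 32) + 5;
  assert(!fitsInSigned32(Wide));
  assert(pmuldqLane(Wide, 2) != Wide * 2);
}
```

The sketch only shows why the sign-bit condition matters; how the real patch establishes that condition on DAG nodes is not covered here.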
X86ISD::VZEXT_LOAD opcode.
Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics.
llvm-svn: 289424
its memory pattern instead of full vector load.
These intrinsics only load a single element. We should use sse_load_f32/f64 to give more options for which loads they can match.
Currently these instructions often only get their load folded thanks to the load folding in the peephole pass. I plan to add more types of loads to sse_load_f32/f64 so we can match without the peephole.
llvm-svn: 289423
Summary:
These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits.
As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded.
Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too.
Reviewers: spatel, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27611
llvm-svn: 289419
llvm-svn: 289407
has multiple uses (for v4i64 and v4f64).
When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded.
A fallback pattern is added to catch these cases and provide another solution.
Differential Revision: https://reviews.llvm.org/D27661
llvm-svn: 289404
The Regcall calling convention passes mask-type arguments in x86 GPRs.
The review includes the changes required to support v32i1, v16i1 and v8i1.
Differential Revision: https://reviews.llvm.org/D27148
llvm-svn: 289383
llvm-svn: 289352
being able to constant fold them in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289350
llvm-svn: 289349
This should've been removed in r289323.
llvm-svn: 289346
able to constant fold it in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289344
LowerHorizontalByteSum
llvm-svn: 289341
select around the unmasked avx1 intrinsics.
llvm-svn: 289340
This was failing when trying to fold immediates into operand 1 of a
phi, which only has one statically known operand.
llvm-svn: 289337
actually used.
Also fix the ZeroVector's type - I've no idea how this hasn't caused problems.
llvm-svn: 289336
vcvttps2uqq when AVX512DQ and AVX512VL are available.
llvm-svn: 289335
llvm-svn: 289334
single boolean flag passed to a helper function. Just check the opcode and create the flag.
llvm-svn: 289333
llvm-svn: 289331
llvm-svn: 289326
There was a bug where we would hit an assertion if 'Q' was used as a
constraint.
I also removed hardcoded register names to prefer regexes so the tests
don't break when the register allocator changes.
llvm-svn: 289325
It looks like some time in the past, constraint codes were changed from
chars being passed around to enums.
llvm-svn: 289323
Summary: This gets rid of the hardcoded 'r0' that was used previously.
Reviewers: asl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27567
llvm-svn: 289322
This would previously trigger an assertion error in AVRISelDAGToDAG.
llvm-svn: 289321
cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements.
Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements.
Similar things have already been done for other convert intrinsics.
llvm-svn: 289316
These should've been checking whether the immediate is a 6-bit unsigned
integer.
If the immediate was '63', this would cause an assertion error which
shouldn't have occurred.
llvm-svn: 289315
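
The range check described in r289315 amounts to accepting every value from 0 through 63. This is a minimal sketch of such a predicate, assuming nothing about the actual code in the patch; `fitsUnsigned` and `checkImmediateExamples` are hypothetical names (LLVM's own `MathExtras.h` provides a comparable `isUInt<N>` helper).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate: true if Imm is representable in N unsigned bits.
template <unsigned N> constexpr bool fitsUnsigned(uint64_t Imm) {
  static_assert(N > 0 && N < 64, "width must be between 1 and 63");
  return Imm < (uint64_t{1} << N);
}

inline void checkImmediateExamples() {
  assert(fitsUnsigned<6>(0));
  assert(fitsUnsigned<6>(63));  // largest 6-bit value, must be accepted
  assert(!fitsUnsigned<6>(64)); // needs a seventh bit
}
```

Writing the bound as `1 << N` rather than a hand-typed constant makes the boundary case (63 for six bits) hard to get wrong.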
The users of the addrspacecast were having their types incorrectly
changed, producing invalid bitcasts between address spaces.
llvm-svn: 289307
Since 32-bit instructions with 32-bit input immediate behavior
are used to materialize 16-bit constants in 32-bit registers
for 16-bit instructions, determining the legality based
on the size is incorrect. Change operands to have the size
specified in the type.
Also adds a workaround for a disassembler bug that
produces an immediate MCOperand for an operand that
is supposed to be OPERAND_REGISTER.
The assembler appears to accept out of bounds immediates and
truncates them, but this seems to be an issue for 32-bit
already.
llvm-svn: 289306
llvm-svn: 289292
Some of the immediates need to be printed differently
eventually.
llvm-svn: 289291
You Use warnings; other minor fixes (NFC).
llvm-svn: 289282
Summary: CI doesn't have XNACK.
Reviewers: tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27175
llvm-svn: 289263
Summary:
This frees 2 additional scalar registers.
These are results from all of my 3 patches combined:
Polaris:
Spilled SGPRs: 2231 -> 1517 (-32.00 %)
Tonga:
Spilled SGPRs: 3829 -> 2608 (-31.89 %)
Spilled VGPRs: 100 -> 84 (-16.00 %)
Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader
limited to 64 VGPRs.
Reviewers: tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27151
llvm-svn: 289262
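
The percentages quoted in r289262 follow from the before/after spill counts. This is a small sketch that recomputes them; `percentChange` and `checkSpillFigures` are hypothetical names, and the counts are copied from the commit message above.

```cpp
#include <cassert>
#include <cmath>

// Relative change from Before to After, in percent.
constexpr double percentChange(double Before, double After) {
  return (After - Before) / Before * 100.0;
}

inline void checkSpillFigures() {
  // Each quoted figure matches to within rounding at two decimal places.
  assert(std::fabs(percentChange(2231, 1517) + 32.00) < 0.005); // Polaris SGPRs
  assert(std::fabs(percentChange(3829, 2608) + 31.89) < 0.005); // Tonga SGPRs
  assert(std::fabs(percentChange(100, 84) + 16.00) < 0.005);    // Tonga VGPRs
}
```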
Summary: This frees 2 scalar registers.
Reviewers: tstellarAMD
Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27150
llvm-svn: 289261
Summary:
There is no point in setting SGPRS=104, because VI allocates SGPRs
in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs
for general purposes.
Reviewers: tstellarAMD
Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27149
llvm-svn: 289260
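
The "104 -> 112" arithmetic in r289260 is a round-up to the allocation granule. This is a minimal sketch, assuming only the granule of 16 stated in the commit message; `roundUpToMultiple` and `checkSgprAllocation` are hypothetical names and not part of the AMDGPU backend.

```cpp
#include <cassert>

// Rounds Value up to the next multiple of Granule (Granule must be non-zero).
constexpr unsigned roundUpToMultiple(unsigned Value, unsigned Granule) {
  return (Value + Granule - 1) / Granule * Granule;
}

inline void checkSgprAllocation() {
  assert(roundUpToMultiple(104, 16) == 112); // the case quoted in the message
  assert(roundUpToMultiple(102, 16) == 112); // lands in the same block
  assert(roundUpToMultiple(96, 16) == 96);   // exact multiples are unchanged
}
```

Under this assumption, requesting 102 or 104 registers reserves the same 112-register block.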
This should do nothing for targets without i16.
llvm-svn: 289235
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.
llvm-svn: 289232
llvm-svn: 289231
Fixes assembler regressions.
llvm-svn: 289230
Sort the instruction bits by type and make sure there is one
for each format.
Also clean up namespaces.
llvm-svn: 289229
Differential Revision: https://reviews.llvm.org/D26547
llvm-svn: 289227
UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion
llvm-svn: 289226