| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D48934
llvm-svn: 336484
|
| |
|
|
| |
llvm-svn: 336453
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
D48278
Allow to reduce redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB
can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.
llvm-svn: 336426
|
| |
|
|
|
|
|
|
| |
The intrinsics can be implemented with a f32/f64 llvm.fma intrinsic and an insert into a zero vector.
There are a couple regressions here due to SelectionDAG not being able to pull an fneg through an extract_vector_elt. I'm not super worried about this though as InstCombine should be able to do it before we get to SelectionDAG.
llvm-svn: 336416
|
| |
|
|
|
|
|
|
|
|
| |
unmasked 512-bit intrinsics with rounding mode.
This upgrades all of the intrinsics to use fneg instructions to convert fma into fmsub/fnmsub/fnmadd/fmsubadd. And uses a select instruction for masking.
This matches how clang uses the intrinsics these days.
llvm-svn: 336409
|
| |
|
|
|
|
|
|
|
|
| |
and autoupgrading.
-Split cases that call 2 intrinsics in the same case.
-Remove testing mask3 and maskz intrinsics with an all ones mask. These won't be interesting after the upgrade.
-Restore test cases for some intrinsics that are marked for deletion, but haven't been deleted yet.
llvm-svn: 336408
|
| |
|
|
| |
llvm-svn: 336404
|
| |
|
|
|
|
|
|
| |
'llvm.fma'. Add upgrade tests for all.
Still need to remove the AVX512 masked versions.
llvm-svn: 336383
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D48954
llvm-svn: 336376
|
| |
|
|
|
|
| |
Now that D45806 has landed we can start trying to avoid scalarizing srem by constant - these tests demonstrate some example cases.
llvm-svn: 336360
|
| |
|
|
| |
llvm-svn: 336348
|
| |
|
|
|
|
| |
We want to compare shifts with repeated vs non-repeated v8i16 shuffle masks (for PBLENDW ymm)
llvm-svn: 336333
|
| |
|
|
|
|
| |
independent FMA and extractelement/insertelement.
llvm-svn: 336315
|
| |
|
|
|
|
|
|
|
|
| |
input.
Previously we could only negate the FMADD opcodes. This used to be mostly ok when we lowered FMA intrinsics during lowering. But with the move to llvm.fma from target specific intrinsics, we can combine (fneg (fma)) to (fmsub) earlier. So if we start with (fneg (fma (fneg))) we would get stuck at (fmsub (fneg)).
This patch fixes that so we can also combine things like (fmsub (fneg)).
llvm-svn: 336304
|
| |
|
|
|
|
|
|
|
|
| |
in clang.
There's a regression in here due to inability to combine fneg inputs of X86ISD::FMSUB/FNMSUB/FNMADD nodes.
More removals to come, but I wanted to stop and fix the regression that showed up in this first.
llvm-svn: 336303
|
| |
|
|
| |
llvm-svn: 336277
|
| |
|
|
|
|
|
|
| |
Show the difference in behaviour cf SSE41 (no PMULLD, PBLENDW etc.)
Raised by D48936
llvm-svn: 336271
|
| |
|
|
|
|
|
|
| |
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.
Reapplied with a fixed (extra null tests) version of rL336113 after reversion in rL336189 - extra test case added at rL336247.
llvm-svn: 336250
|
| |
|
|
|
|
|
|
| |
v8i16/v4i32 shift with 2 shift unique values
The patch was reverted at r336189 due to crashes
llvm-svn: 336247
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following code pattern:
mov %rax, %rcx
test %rax, %rax
%rax = ....
je throw_npe
mov(%rcx), %r9
mov(%rax), %r10
gets transformed into the following incorrect code after implicit null check pass:
mov %rax, %rcx
%rax = ....
faulting_load_op("movl (%rax), %r10", throw_npe)
mov(%rcx), %r9
For implicit null check pass, if the register that is checked for null value (ie, the register used in the 'test' instruction) is written into before the condition jump, we should avoid doing the optimization.
Patch by Surya Kumari Jangala!
Differential Revision: https://reviews.llvm.org/D48627
Reviewed By: skatkov
llvm-svn: 336241
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
D48768 may turn some of these into shifts.
Reviewers: spatel
Reviewed By: spatel
Subscribers: spatel, RKSimon, llvm-commits, craig.topper
Differential Revision: https://reviews.llvm.org/D48767
llvm-svn: 336224
|
| |
|
|
|
|
|
|
| |
This might make the error message added in r335668 unneeded, but I'm not sure yet.
The check for RIP is technically unnecessary since RIP is in GR64, but that fact is kind of surprising so be explicit.
llvm-svn: 336217
|
| |
|
|
|
|
| |
Now that D45806 has landed, we can re-enable support for MIN_SIGNED_VALUE in the sdiv by pow2-constant code
llvm-svn: 336198
|
| |
|
|
|
|
| |
This reverts commit r336113. It causes crashes.
llvm-svn: 336189
|
| |
|
|
| |
llvm-svn: 336169
|
| |
|
|
|
|
|
|
|
|
| |
Add registers still missing after r328016 (D43353):
- for bits 15-8 of SI, DI, BP, SP (*H), and R8-R15 (*BH),
- for bits 31-16 of R8-R15 (*WH).
Thanks to Craig Topper for pointing it out.
llvm-svn: 336134
|
| |
|
|
|
|
|
|
|
|
| |
isn't aligned.
Similarily, don't fold fp128 loads into SSE instructions if the load isn't aligned. Unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.
Should fix PR38001
llvm-svn: 336121
|
| |
|
|
|
|
| |
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.
llvm-svn: 336113
|
| |
|
|
|
|
|
|
| |
blend
We have special case support for 2 shift values for basic blends, but irregular shift patterns end up using the generic lowering, despite shuffle lowering being good enough to handle more complex blends.
llvm-svn: 336112
|
| |
|
|
|
|
|
|
| |
clang intrinsic names.
I thought I fixed these yesterday, but I guess I missed a few.
llvm-svn: 336071
|
| |
|
|
|
|
|
|
|
|
| |
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1, -1 is the only power of two value for which the sdiv expansion recipe breaks.
Thanks to @zvi for the original patch.
Differential Revision: https://reviews.llvm.org/D45806
llvm-svn: 336048
|
| |
|
|
|
|
|
|
|
|
| |
Especially of note was the test_mm_mask_set1_epi64 and other set1 tests that were truncating the element to be broadcasted to i8 and broadcasting that instead of a whole 64 bit value.
Some of the others were just correcting mask sizes on parameters due to bugs in the clang test case they were generated from that have now been fixed.
Some were converting i8 to <4 x i1>/<2 x i1> by truncating to i4/i2 and then bitcasting. But the clang codegen is bitcast to <8 x i1>, then extract to <4 x i1>/<2 x i1>. This is likely to incur less trouble from the integer type legalizer in the backend.
llvm-svn: 336045
|
| |
|
|
|
|
|
|
| |
command line.
I think this test changed and these test cases were created around the same time and missed the change.
llvm-svn: 336044
|
| |
|
|
|
|
| |
that don't really exist in clang.
llvm-svn: 336043
|
| |
|
|
| |
llvm-svn: 336035
|
| |
|
|
|
|
|
|
|
|
| |
The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And its just better to limit the number of places we emit target specific nodes.
The changed test cases still aren't optimal.
Differential Revision: https://reviews.llvm.org/D48619
llvm-svn: 335998
|
| |
|
|
|
|
|
|
|
|
| |
This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op.
Unfortunately I haven't found a decent way to share much of this code with the shift equivalent.
Differential Revision: https://reviews.llvm.org/D48655
llvm-svn: 335957
|
| |
|
|
|
|
|
|
| |
IR instead.
While there improve the coverage of the intrinsic testing and add fast-isel tests.
llvm-svn: 335944
|
| |
|
|
|
|
|
|
|
|
| |
This fixes a regression since SVN r334523, where the object files
built targeting MinGW were rejected by GNU binutils tools. Prior to
that commit, we only put constants in comdat for MSVC configurations.
Differential Revision: https://reviews.llvm.org/D48567
llvm-svn: 335918
|
| |
|
|
|
|
|
|
|
|
| |
btr/bts/btc.
This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy.
Differential Revision: https://reviews.llvm.org/D48706
llvm-svn: 335895
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
code models""
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.
LLVM ERROR: unsupported relocation with subtraction expression, symbol
'__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
expression
llvm-svn: 335894
|
| |
|
|
|
|
|
|
|
|
| |
We could get away with it for constant folded cases, but not for rL335719.
Thanks to Krzysztof Parzyszek for noticing.
Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.
llvm-svn: 335886
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.
Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.
rdar://41530228
DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
|
| |
|
|
|
|
|
|
| |
This reverts commit r335821.
This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce.
llvm-svn: 335871
|
| |
|
|
|
|
|
|
| |
We could get away with it for constant folded cases, but not for rL335719.
Thanks to Krzysztof Parzyszek for noticing.
llvm-svn: 335821
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc
can produce -0.0 where the original code does not:
#include <stdio.h>
int main(int argc) {
float x;
x = -0.8 * argc;
printf("%f\n", (float)((int)x));
return 0;
}
$ clang -O0 -mavx fp.c ; ./a.out
0.000000
$ clang -O1 -mavx fp.c ; ./a.out
-0.000000
Ideally, we'd use IR/node flags to predicate the transform, but the IR parser
doesn't currently allow fast-math-flags on the cast instructions. So for now,
just use the function attribute that corresponds to clang's "-fno-signed-zeros"
option.
Differential Revision: https://reviews.llvm.org/D48085
llvm-svn: 335761
|
| |
|
|
|
|
|
|
|
|
| |
position
If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index.
Fixes PR37938.
llvm-svn: 335754
|
| |
|
|
| |
llvm-svn: 335753
|
| |
|
|
|
|
| |
Increase coverage to make sure we're not doing anything stupid without AVX512BW
llvm-svn: 335746
|
| |
|
|
|
|
|
|
| |
that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".
llvm-svn: 335744
|