| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
shift-rotate patterns.
This improves splat rotations (rotation by an uniform value), to avoid having to use the generic non-uniform shift code (extension to PR37426).
llvm-svn: 333641
|
| |
|
|
|
|
| |
_mm_sqrt_ss to match clang codegen after r333572.
llvm-svn: 333573
|
| |
|
|
|
|
| |
Prefix was for AVX512F instead of AVX512BW
llvm-svn: 333560
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support for Clang lowering of fused intrinsics. This patch:
1. Removes bindings to clang fma intrinsics.
2. Introduces new LLVM unmasked intrinsics with rounding mode:
int_x86_avx512_vfmadd_pd_512
int_x86_avx512_vfmadd_ps_512
int_x86_avx512_vfmaddsub_pd_512
int_x86_avx512_vfmaddsub_ps_512
supported with a new intrinsic type (INTR_TYPE_3OP_RM).
3. Introduces new x86 fmaddsub/fmsubadd folding.
4. Introduces new tests for code emitted by sequentions introduced in Clang part.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper, RKSimon
Differential Revision: https://reviews.llvm.org/D47443
llvm-svn: 333554
|
| |
|
|
|
|
|
|
| |
It was noticed on D47377 that these tests were being unnecessarily affected by scheduler changes.
This adds vzeroupper at the end of some tests as we lose the 'FeatureFastPartialYMMorZMMWrite' feature from KNL, since Skylake+ don't support this its probably better.
llvm-svn: 333549
|
| |
|
|
|
|
| |
It was noticed on D47377 that these tests (for PR37246) were being unnecessarily affected by scheduler changes.
llvm-svn: 333546
|
| |
|
|
|
|
| |
It was noticed on D47377 that these tests were being unnecessarily affected by scheduler changes.
llvm-svn: 333545
|
| |
|
|
|
|
|
|
| |
that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction.
The order should be controlled in the input pattern.
llvm-svn: 333463
|
| |
|
|
|
|
| |
The code could issue a truncate from a small type to larger type. We need to extend in that case instead.
llvm-svn: 333460
|
| |
|
|
| |
llvm-svn: 333431
|
| |
|
|
| |
llvm-svn: 333430
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Introduction of mask scalar TableGen patterns.
2. Introduction of new scalar move TableGen patterns
and refactoring of existing ones.
3. Folding of pattern created by introducing scalar
masking in Clang header files.
Patch by tkrupa
Differential Revision: https://reviews.llvm.org/D47012
llvm-svn: 333419
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Avoid assert/crash during liveness calculation in situations
where the incoming machine function has statically unreachable BBs.
Second attempt at submitting; this version of the change includes
a revised testcase.
Fixes PR37130.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47372
llvm-svn: 333416
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
consistently used for i64->float/double conversions.
Summary: We already get this right if the i64 didn't come from a load.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47439
llvm-svn: 333393
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
After SNB, Intel CPUs can rename CF independently of other EFLAGS,
so the renamer can zero it for free. Note that STC still consumes resources.
To reproduce: `$ llvm-exegesis -mode=uops -opcode-name=CLC`
On SNB:
```
---
key:
opcode_name: CLC
mode: uops
config: ''
cpu_name: sandybridge
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: '3', value: 0.0014, debug_string: SBPort0 }
- { key: '4', value: 0.0013, debug_string: SBPort1 }
- { key: '5', value: 0.0003, debug_string: SBPort4 }
- { key: '6', value: 0.0029, debug_string: SBPort5 }
- { key: '10', value: 0.0003, debug_string: SBPort23 }
error: ''
info: 'instruction is serial, repeating a random one.
Snippet:
CLC
'
...
```
On HSW:
```
---
key:
opcode_name: CLC
mode: uops
config: ''
cpu_name: haswell
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: '3', value: 0.001, debug_string: HWPort0 }
- { key: '4', value: 0.0009, debug_string: HWPort1 }
- { key: '5', value: 0.0004, debug_string: HWPort2 }
- { key: '6', value: 0.0006, debug_string: HWPort3 }
- { key: '7', value: 0.0002, debug_string: HWPort4 }
- { key: '8', value: 0.0012, debug_string: HWPort5 }
- { key: '9', value: 0.0022, debug_string: HWPort6 }
- { key: '10', value: 0.0001, debug_string: HWPort7 }
error: ''
info: 'instruction is serial, repeating a random one.
Snippet:
CLC
'
...
```
Reviewers: craig.topper, RKSimon
Subscribers: gchatelet, llvm-commits
Differential Revision: https://reviews.llvm.org/D47362
llvm-svn: 333392
|
| |
|
|
|
|
| |
We have unmasked intrinsics now and wrap them with a select. This is a net reduction of 36 intrinsics from before the unmasked intrinsics were added.
llvm-svn: 333388
|
| |
|
|
|
|
|
|
| |
instructions for masking in clang.
This will allow us to remove the 3 different flavors of masked intrinsics. I'm leaving the actual intrinsic removal for another patch.
llvm-svn: 333386
|
| |
|
|
|
|
|
|
|
|
| |
These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node.
This is a step towards reducing the number of intrinsics needed.
A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial.
llvm-svn: 333383
|
| |
|
|
|
|
| |
Also fixes BEXTRI instruction to use WritBEXTR class, which was missed when the class was added.
llvm-svn: 333360
|
| |
|
|
|
|
| |
This allows us to avoid having mask and maskz variant. Reducing from 12 intrinsics to 6.
llvm-svn: 333346
|
| |
|
|
| |
llvm-svn: 333341
|
| |
|
|
|
|
| |
The patch r331783 caused regression in one of our internal application. So revert it now, will investigate it further.
llvm-svn: 333305
|
| |
|
|
|
|
|
|
| |
As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves
Another chapter in the never ending crusade to remove useless InstRW overrides from the x86 scheduler models......
llvm-svn: 333271
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-add the feature flag for invpcid, which was removed in r294561.
Add an intrinsic, which always uses a 32 bit integer as first argument,
while the instruction actually uses a 64 bit register in 64 bit mode
for the INVPCID_TYPE argument.
Reviewers: craig.topper
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47141
llvm-svn: 333255
|
| |
|
|
| |
llvm-svn: 333185
|
| |
|
|
|
|
|
|
| |
have a testcase.
Differential Revision: https://reviews.llvm.org/D47129
llvm-svn: 333167
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In DWARF v5, the DWO ID is in the (split/skeleton) CU header, not an
attribute on the CU DIE.
This changes the size of those headers, so use the parsed size whenever
we have one, for simplicitly.
Differential Revision: https://reviews.llvm.org/D47158
llvm-svn: 333004
|
| |
|
|
|
|
| |
These are related to: https://reviews.llvm.org/D46957
llvm-svn: 332962
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the FP sibling of D43141 with the corresponding IR change in rL327212.
We can't propagate undef here because if a variable operand is a NaN, these
binops must propagate NaN. Neither global nor node-level fast-math makes a
difference. If we have 'nnan', I think later folds can turn the NaN into undef.
The tests in X86/fp-undef.ll are meant to be the definitive verification for
these folds - everything reduces identically now.
The other test changes are collateral damage. They may need to be altered to
preserve their intent.
Differential Revision: https://reviews.llvm.org/D47026
llvm-svn: 332920
|
| |
|
|
|
|
| |
These can all be implemented with sitofp/uitofp instructions.
llvm-svn: 332916
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As pointed out in D46528, we errneously transform cases like `xor X, -1`,
even though we use said function.
It's because the `-1` is actually a bitcast there.
So i think we can just look through it in the function.
Differential Revision: https://reviews.llvm.org/D47156
llvm-svn: 332905
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].
[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated (see `@in`), and we need to make sure that they are generated.
Differential Revision: https://reviews.llvm.org/D46528
llvm-svn: 332904
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].
[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated (see `@in`).
Differential Revision: https://reviews.llvm.org/D46008
llvm-svn: 332903
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
their amount masking modified by simplifyDemandedBits
SimplifyDemandedBits can remove bits from the masks for the shift amounts we need to see to detect rotates.
This patch uses zeroes from computeKnownBits to fill in some of these mask bits to make the match work.
As currently written this calls computeKnownBits even when the mask hasn't been simplified because it made the code simpler. If we're worried about compile time performance we can improve this.
I know we're talking about making a rotate intrinsic, but hopefully we can go ahead and do this change and just make sure the rotate intrinsic also handles it.
Differential Revision: https://reviews.llvm.org/D47116
llvm-svn: 332895
|
| |
|
|
|
|
| |
vector tests from some scalar test cases.
llvm-svn: 332892
|
| |
|
|
|
|
|
|
| |
This removes 6 intrinsics since we no longer need separate mask and maskz intrinsics.
Differential Revision: https://reviews.llvm.org/D47124
llvm-svn: 332890
|
| |
|
|
|
|
|
|
|
|
| |
up to dwo output.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47089
llvm-svn: 332881
|
| |
|
|
|
|
| |
Patch by Thomasz Krupa.
llvm-svn: 332872
|
| |
|
|
|
|
|
|
| |
SimplifyDemandedBits interfering with the AND masks
As requested in D47116
llvm-svn: 332869
|
| |
|
|
|
|
|
| |
copies
Change-Id: I169ab6fe7e187727c0298c2a1e2868a683f3e688
llvm-svn: 332849
|
| |
|
|
|
|
|
|
|
|
| |
As suggested by Fabian on PR37426, we can use PMULUDQ to perform v4i32 vector rotations as the upper 32bits of the multiply will contain the 'wrapped' bits of the rotation.
v8i16/v16i8 rotations would be straightforward to add to lowerRotate in the future - ideally we'd mostly share code with the vector shifts lowering.
Differential Revision: https://reviews.llvm.org/D46954
llvm-svn: 332832
|
| |
|
|
|
|
|
|
| |
in IR instead.
Someday maybe we'll use selects for all intrinsics.
llvm-svn: 332824
|
| |
|
|
|
|
| |
SimplifyDemandedBits.
llvm-svn: 332815
|
| |
|
|
| |
llvm-svn: 332780
|
| |
|
|
|
|
| |
Fixes bug 37521.
llvm-svn: 332774
|
| |
|
|
| |
llvm-svn: 332756
|
| |
|
|
|
|
|
|
|
|
| |
BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737)
The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)
llvm-svn: 332745
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The intrinsic legalization for masked truncate uses ISD::TRUNCATE which can be constant folded by getNode. This prevents getVectorMaskingNode from seeing the ISD::TRUNCATE special case where it should emit X86ISD::SELECT instead of ISD::VSELECT. This causes a vselect with a v16i1 or v8i1 condition to be emitted during vector legalization. but vector legalization doesn't revisit nodes it creates. DAG combine will then promote this condition to match the result type. Then op legalization will try to legalize it, but the custom lowering hook returned SDValue(). But op legalization doesn't have an Expand for VSELECT because it expects vector legalization to have taken care of it. So the operation sticks around and fails in isel.
This patch adds a custom legalization hook to morph it to a vXi8 vselect instead.
This also simplifies the normal vXi16 vselect handling because vector legalization was normally expanding to AND/ANDN/OR and DAG combine was turning that into VBLENDVB. So we can skip a step by doing it directly.
Fixes PR37499
Differential Revision: https://reviews.llvm.org/D47025
llvm-svn: 332743
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This is a revert of the changes from https://reviews.llvm.org/D46265;
the new test introduced (test/CodeGen/X86/PR37310.mir) causes buildbot
failures.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47061
llvm-svn: 332742
|
| |
|
|
|
|
| |
clang r332738.
llvm-svn: 332740
|