shuffles before simplifying inputs
By removing demanded target shuffles that simplify to zero/undef/identity before simplifying their inputs, we improve the chances of further simplification, as only the immediate parent user of the combined node is added back to the worklist. This still doesn't help if the result is passed through other ops, though (bitcasts, etc.).
llvm-svn: 343390
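Below is a minimal C++ sketch of the classification step this commit moves earlier. It assumes LLVM's X86 shuffle-decode sentinels: -1 for an undef lane and -2 for a zero lane. `classifyDemandedShuffle` and `ShuffleKind` are hypothetical names, not the code from this commit. Lanes that are not demanded are treated as undef. That is what lets a partially used shuffle collapse to zero, undef or identity before its inputs are visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed sentinels, following LLVM's X86 shuffle-decode convention.
constexpr int SentinelUndef = -1; // lane contents are undefined
constexpr int SentinelZero = -2;  // lane is known to be zero

enum class ShuffleKind { Undef, Zero, Identity, Other };

// Hypothetical helper: classify a single-input shuffle mask, treating lanes
// that are not demanded as undef so they never block a simplification.
ShuffleKind classifyDemandedShuffle(const std::vector<int>& mask,
                                    const std::vector<bool>& demanded) {
  assert(mask.size() == demanded.size());
  bool allUndef = true, allZeroOrUndef = true, isIdentity = true;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    int m = demanded[i] ? mask[i] : SentinelUndef;
    if (m == SentinelUndef)
      continue;                      // undef matches anything
    allUndef = false;
    if (m != SentinelZero)
      allZeroOrUndef = false;        // some lane reads real data
    if (m != static_cast<int>(i))
      isIdentity = false;            // lane is moved or zeroed
  }
  if (allUndef)
    return ShuffleKind::Undef;
  if (allZeroOrUndef)
    return ShuffleKind::Zero;
  if (isIdentity)
    return ShuffleKind::Identity;
  return ShuffleKind::Other;
}
```

In this sketch, an `Identity` result means the shuffle can be replaced by its input, and `Zero`/`Undef` by a constant. The combiner then never spends work simplifying inputs that are about to be discarded.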
handling.
This is all handled generally by getTargetConstantBitsFromNode now.
llvm-svn: 343387
llvm-svn: 343385
bits via shuffles
This exposed an issue: recursive calls to getTargetConstantBitsFromNode don't handle changes to EltSizeInBits yet.
llvm-svn: 343384
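When a constant is reached through a bitcast, the same raw bits have to be re-split at the new element width. The sketch below makes several assumptions: little-endian lane order (as on x86), and element widths that are multiples of 8 bits and at most 64. `resplitConstantBits` is a hypothetical helper. It leaves out the undef-element tracking that getTargetConstantBitsFromNode also carries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: re-split a constant vector's raw bits from
// srcBits-wide elements into dstBits-wide elements. Assumes little-endian
// lane order (as on x86) and widths that are multiples of 8, at most 64.
// Undef tracking is deliberately left out.
std::vector<uint64_t> resplitConstantBits(const std::vector<uint64_t>& elts,
                                          unsigned srcBits, unsigned dstBits) {
  assert(srcBits % 8 == 0 && srcBits >= 8 && srcBits <= 64);
  assert(dstBits % 8 == 0 && dstBits >= 8 && dstBits <= 64);
  // Flatten the elements into a little-endian byte stream.
  std::vector<uint8_t> bytes;
  for (uint64_t e : elts)
    for (unsigned b = 0; b < srcBits / 8; ++b)
      bytes.push_back(static_cast<uint8_t>(e >> (8 * b)));
  // The total width must divide evenly into the new element size.
  assert(bytes.size() % (dstBits / 8) == 0);
  std::vector<uint64_t> out;
  for (std::size_t i = 0; i < bytes.size(); i += dstBits / 8) {
    uint64_t v = 0;
    for (unsigned b = 0; b < dstBits / 8; ++b)
      v |= static_cast<uint64_t>(bytes[i + b]) << (8 * b);
    out.push_back(v);
  }
  return out;
}
```

For example, the single 64-bit constant 0x0000000200000001 re-splits into the 32-bit elements {1, 2}.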
get immediate data
Don't just attempt to find a splat build vector.
First step towards getting rid of all the 32-bit special case code.
llvm-svn: 343383
builds due to rL343375
llvm-svn: 343377
ISD::EXTRACT_SUBVECTOR
llvm-svn: 343375
The shift amount might have peeked through an extract_subvector, altering the number of vector elements in the 'Amt' variable, so we were incorrectly calculating the ratio when peeking through bitcasts, which resulted in incorrectly detected splats.
llvm-svn: 343373
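One way to keep a splat check like this independent of how many lanes 'Amt' has is to derive the ratio from element widths, which a bitcast relates directly, rather than from element counts. The hypothetical sketch below is not the code from this commit. It takes the shift amounts as narrow lanes and asks whether they form a splat at a wider lane width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check: do the narrow shift-amount lanes in 'amt' (each
// narrowBits wide) form a splat when reinterpreted as wideBits-wide lanes?
// The ratio comes from the element widths, which a bitcast relates directly;
// the lane count of 'amt' is only used for iteration, so it is fine if an
// extract_subvector has already shrunk it.
bool isSplatAtWideWidth(const std::vector<uint64_t>& amt,
                        unsigned narrowBits, unsigned wideBits) {
  assert(narrowBits != 0 && wideBits % narrowBits == 0);
  const std::size_t ratio = wideBits / narrowBits;
  assert(!amt.empty() && amt.size() % ratio == 0);
  // Every group of 'ratio' narrow lanes must repeat the first group.
  for (std::size_t i = ratio; i < amt.size(); ++i)
    if (amt[i] != amt[i % ratio])
      return false;
  return true;
}
```

With 32-bit lanes {5, 0, 5, 0}, the 64-bit view is a splat (of 5, on a little-endian target), while {5, 0, 5, 1} is not.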
Noticed during llvm-exegesis tests: the PSUBS/PSUBUS instructions have the same zero-idiom behaviour as PSUB.
llvm-svn: 343321
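This is why the saturating forms are zero idioms too. The subtraction might wrap (PSUB), saturate as signed (PSUBS) or saturate at zero (PSUBUS), but subtracting a register from itself always yields zero. The result therefore never depends on the input. The scalar C++17 model of one 8-bit lane of each below checks this exhaustively. The function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar models of one 8-bit lane of PSUBB, PSUBSB and PSUBUSB (C++17).
uint8_t psubb(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(a - b);                   // wraps modulo 256
}
int8_t psubsb(int8_t a, int8_t b) {
  int r = int(a) - int(b);                               // exact difference
  return static_cast<int8_t>(std::clamp(r, -128, 127));  // signed saturation
}
uint8_t psubusb(uint8_t a, uint8_t b) {
  return a > b ? static_cast<uint8_t>(a - b) : 0;        // saturates at zero
}

// Exhaustively check that x - x is zero for every 8-bit input and all
// three forms, i.e. the result never depends on the source register.
void checkZeroIdioms() {
  for (int v = 0; v < 256; ++v) {
    assert(psubb(uint8_t(v), uint8_t(v)) == 0);
    assert(psubusb(uint8_t(v), uint8_t(v)) == 0);
    int8_t s = static_cast<int8_t>(v - 128);             // covers -128..127
    assert(psubsb(s, s) == 0);
  }
}
```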
We issue JFPU1->JSTC, then JFPU0->JFPA, then JALU0 (integer pipe).
Match AMD Fam16h SOG + llvm-exegesis tests
llvm-svn: 343314
Double throughput to account for 2 pipes + fix BSF's latency/uop counts
Match AMD Fam16h SOG + llvm-exegesis tests
llvm-svn: 343311
PHMINPOS can run on either JFPU pipe
llvm-svn: 343299
llvm-svn: 343241
llvm-svn: 343238
llvm-svn: 343234
llvm-svn: 343233
llvm-svn: 343227
llvm-svn: 343200
llvm-svn: 343194
Summary: The convenience wrapper in STLExtras has been available since rL342102.
Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb
Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D52573
llvm-svn: 343163
Don't reinvent the wheel for BUILD_VECTOR/ZERO_EXTEND; only the ANY_EXTEND special case needs handling.
llvm-svn: 343096
Summary:
Example output for vzeroall:
---
mode: uops
key:
instructions:
- 'VZEROALL'
config: ''
register_initial_values:
cpu_name: haswell
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { debug_string: HWPort0, value: 0.0006, per_snippet_value: 0.0006,
key: '3' }
- { debug_string: HWPort1, value: 0.0011, per_snippet_value: 0.0011,
key: '4' }
- { debug_string: HWPort2, value: 0.0004, per_snippet_value: 0.0004,
key: '5' }
- { debug_string: HWPort3, value: 0.0018, per_snippet_value: 0.0018,
key: '6' }
- { debug_string: HWPort4, value: 0.0002, per_snippet_value: 0.0002,
key: '7' }
- { debug_string: HWPort5, value: 1.0019, per_snippet_value: 1.0019,
key: '8' }
- { debug_string: HWPort6, value: 1.0033, per_snippet_value: 1.0033,
key: '9' }
- { debug_string: HWPort7, value: 0.0001, per_snippet_value: 0.0001,
key: '10' }
- { debug_string: NumMicroOps, value: 20.0069, per_snippet_value: 20.0069,
key: NumMicroOps }
error: ''
info: ''
assembled_snippet: C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C3
...
Reviewers: gchatelet
Subscribers: tschuett, RKSimon, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D52539
llvm-svn: 343094
Similar to the existing ISD::SRL constant vector shifts from D49562, this patch adds ISD::SRA support with ISD::MULHS.
As we're dealing with signed values, we have to handle the shift-by-zero and shift-by-one special cases, so XOP+AVX2/AVX512 splitting/extension is still a better solution. Really, we should still use ISD::MULHS if one of the special cases is used, but for now I've just left a TODO and filtered by isKnownNeverZero.
Differential Revision: https://reviews.llvm.org/D52171
llvm-svn: 343093
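The identity behind the MULHS lowering is MULHS(x, 2^(16-c)) = (x * 2^(16-c)) >> 16 = x >> c (arithmetic). Below it is modelled for one i16 lane as a scalar sketch, not the DAG code; the helper names are hypothetical. The model also shows why shift by zero and shift by one need special handling. 2^16 does not fit in 16 bits at all. 2^15 wraps to -32768 as a signed i16, which would flip the sign of the result.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one i16 lane of ISD::MULHS: the high half of the full
// 32-bit signed product.
int16_t mulhs16(int16_t a, int16_t b) {
  int32_t product = int32_t(a) * int32_t(b);
  // Arithmetic right shift: guaranteed from C++20, and what mainstream
  // compilers already do for signed values before that.
  return static_cast<int16_t>(product >> 16);
}

// Arithmetic shift right by a constant c, rewritten as MULHS by 2^(16-c).
// Only valid for 2 <= c <= 15: c == 0 would need 2^16 (does not fit in
// 16 bits) and c == 1 would need 2^15, which wraps to -32768 as an i16.
int16_t sraViaMulhs(int16_t x, unsigned c) {
  assert(c >= 2 && c <= 15);
  const int16_t multiplier = static_cast<int16_t>(1 << (16 - c));
  return mulhs16(x, multiplier);
}
```

For example, `sraViaMulhs(-7, 2)` multiplies by 16384 and keeps the high half, giving -2, the same as `-7 >> 2`.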
input types.
This removes an int->fp bitcast between the surrounding code and the movmsk. I had already added a hack to combineMOVMSK to try to look through this bitcast to improve the SimplifyDemandedBits there.
But I found an additional issue where the bitcast was preventing combineMOVMSK from being called again after earlier nodes in the DAG were optimized. The bitcast gets revisited, but not the user of the bitcast. By using integer types throughout, the bitcast doesn't get in the way.
llvm-svn: 343046
llvm-svn: 343026
This is the final (I hope!) problem pattern mentioned in PR37749:
https://bugs.llvm.org/show_bug.cgi?id=37749
We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops.
We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like
extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches
that can over-reach. I.e., splitting to 128-bit does not look like a win in most cases with >1 256-bit op.
The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test,
we have this vector-type-legalized sequence:
t29: v8i32 = concat_vectors t27, t28
t30: v4i64 = bitcast t29
t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ...
t31: v4i64 = bitcast t18
t32: v4i64 = xor t30, t31
t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ...
t34: v4i64 = bitcast t9
t35: v4i64 = and t32, t34
t36: v8i32 = bitcast t35
t37: v4i32 = extract_subvector t36, Constant:i64<0>
t38: v4i32 = extract_subvector t36, Constant:i64<4>
Differential Revision: https://reviews.llvm.org/D52318
llvm-svn: 343008
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52424
llvm-svn: 342989
As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead.
The uops are slightly different to the register variant, so it requires a +1uop tweak.
llvm-svn: 342969
The included test case previously asserted because the type legalizer tried to soften the FILD ISD node.
Fixes PR38819.
llvm-svn: 342934
llvm-svn: 342933
llvm-svn: 342932
The uops are slightly different to the register variant, so it requires a +1uop tweak.
llvm-svn: 342916
We're missing quite a bit of data for these instructions; removing the overrides makes this obvious. Inconsistent reg/mem variants are a concern as well.
Also, we have Divider resources (HWDivider etc.) but they aren't actually used consistently.
llvm-svn: 342904
Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases.
This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions.
llvm-svn: 342892
Variable shifts/rotates using the CL register behave differently from the immediate instructions, so split them accordingly to help remove yet more repeated overrides from the schedule models.
llvm-svn: 342852
SNB was the last override for ROT(L|R)r(1|i) - they now all use WriteRotate correctly.
llvm-svn: 342848
llvm-svn: 342847
Confirmed with Craig Topper: fix a typo that left out a Port4 uop for ROR*mCL instructions on some Intel models.
Yet another step on the scheduler model cleanup marathon...
llvm-svn: 342846
negation
This is an alternative to https://reviews.llvm.org/D37896. We can't decompose
multiplies generically without a target hook to tell us when it's profitable.
ARM and AArch64 may be able to remove some existing code that overlaps with
this transform.
This extends D52195 and may resolve PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474
(still an open question about transforming legal vector multiplies, but we
could open another bug report for those)
llvm-svn: 342844
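The following sketches the algebra behind decomposing a multiply by a constant near a power of two, including negated constants. It uses unsigned 32-bit arithmetic so that wraparound matches a two's-complement i32 multiply. These helpers are illustrative only. They do not model the target hook that decides when the decomposition is profitable.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers (not the DAGCombiner code): multiply by a constant
// of the form +/-(2^n +/- 1) using one shift plus add/sub. uint32_t keeps
// wraparound identical to a two's-complement i32 multiply without UB.
uint32_t mulPow2Plus1(uint32_t x, unsigned n) {     // x * (2^n + 1)
  assert(n < 32);
  return (x << n) + x;
}
uint32_t mulPow2Minus1(uint32_t x, unsigned n) {    // x * (2^n - 1)
  assert(n < 32);
  return (x << n) - x;
}
uint32_t mulNegPow2Minus1(uint32_t x, unsigned n) { // x * -(2^n - 1): swap the sub operands
  assert(n < 32);
  return x - (x << n);
}
uint32_t mulNegPow2Plus1(uint32_t x, unsigned n) {  // x * -(2^n + 1): negate at the end
  assert(n < 32);
  return 0u - ((x << n) + x);
}
```

For instance, `mulNegPow2Plus1(x, 5)` computes x * -33 with one shift, one add and one subtraction from zero. Whether that beats a single multiply instruction is exactly the per-target question the hook answers.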
The SandyBridge model was missing schedule values for the RCL/RCR instructions, instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults.
I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well.
This is necessary to allow WriteRotate to be updated to remove other rotate overrides.
It'd probably be a good idea to investigate a WriteRotateCarry class at some point, but it's not high priority given how unusual these instructions are.
llvm-svn: 342842
llvm-svn: 342841
Despite being rotates, these more modern instructions avoid many of the quirks of the regular x86 rotate instructions and consistently have a schedule closer to shifts.
llvm-svn: 342839
NFCI for now, but it should make it easier to remove a lot of unnecessary overrides in a future commit.
Now that funnel shift intrinsics are coming online we need to get this cleaned up to make vectorization costs from scalar rotate patterns more straightforward.
llvm-svn: 342837
Our lowering that tries to avoid this sign extend can be defeated by the DAG combine folding it with a truncate.
The pattern needs to extend to a v8i32 and then truncate back down to v8i16.
llvm-svn: 342830
llvm-svn: 342829
Summary: Similar to D51893 which was for memcpy
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52063
llvm-svn: 342796
vXi8 vectors.
We don't have a vXi8 shift left, so we need to bitcast to a vXi16 vector to perform the shift. If we let lowering legalize the vXi8 shift, we get an extra AND that we don't need and fail to remove.
llvm-svn: 342795
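Below is a scalar model of the vXi16 approach on a 16-byte vector, with hypothetical function and type names. Shifting each 16-bit lane moves the top bits of its low byte into the low bits of its high byte. A single AND with 0xFF << amt then clears exactly those bits. In this model, that one mask is all the correction the wider shift needs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using V16i8 = std::array<uint8_t, 16>;

// Hypothetical model: shift every byte left by 'amt' using only 16-bit lane
// shifts, for a target with no byte-granularity vector shift.
V16i8 shlBytesVia16(const V16i8& v, unsigned amt) {
  assert(amt < 8);
  uint16_t lanes[8];
  std::memcpy(lanes, v.data(), sizeof(lanes));   // "bitcast" v16i8 -> v8i16
  for (uint16_t& lane : lanes)
    lane = static_cast<uint16_t>(lane << amt);    // v8i16 shift left
  V16i8 out;
  std::memcpy(out.data(), lanes, sizeof(lanes));  // "bitcast" back to v16i8
  // Clear the low 'amt' bits of every byte: in the high byte of each lane
  // they hold bits shifted in from the low byte; elsewhere they are already 0.
  const uint8_t mask = static_cast<uint8_t>(0xFF << amt);
  for (uint8_t& b : out)
    b = static_cast<uint8_t>(b & mask);
  return out;
}
```

Every byte is masked the same way, so the result does not depend on which byte of a 16-bit lane is the low one.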
into a 64-bit register.
Previously we used SUBREG_TO_REG+MOV32ri. But regular isel was changed recently to use the MOV32ri64 pseudo. Fast isel now does the same.
llvm-svn: 342788
Summary:
On SNB, renamer-based zeroing does not work for:
- 16 and 8-bit GPRs[1].
- MMX [2].
- ANDN variants [3]
[1] echo 'sub %ax, %ax' | /tmp/llvm-exegesis -mode=uops -snippets-file=-
[2] echo 'pxor %mm0, %mm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=-
[3] echo 'andnps %xmm0, %xmm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=-
Reviewers: RKSimon, andreadb
Subscribers: gbedwell, craig.topper, llvm-commits
Differential Revision: https://reviews.llvm.org/D52358
llvm-svn: 342736
This patch introduces a SchedWriteVariant to describe zero-idiom VXORP(S|D)Yrr
and VANDNP(S|D)Yrr.
This is a follow-up of r342555.
On Jaguar, a VXORPSYrr is 2 macro opcodes. Only one opcode is eliminated at
register-renaming stage. The other opcode has to be executed to set the upper
half of the destination YMM.
Same for VANDNP(S|D)Yrr.
Differential Revision: https://reviews.llvm.org/D52347
llvm-svn: 342728