| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
BITCAST(SHUFFLE(EXTRACT_SUBVECTOR(SRC1)))
Enable peeking through one use bitcasts to the subvector shuffle.
This still depends on the subvector being the same scalar-size but D57514 has already helped with the more tricky patterns
llvm-svn: 352879
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch changes isFPImmLegal to return if the value can be enconded
as the immediate operand of a logical instruction besides checking if
for immediate field for fmov.
This optimizes some floating point materization, inclusive values
used on isinf lowering.
Reviewed By: rengolin, efriedma, evandro
Differential Revision: https://reviews.llvm.org/D57044
llvm-svn: 352866
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I'm unable to find this number in the "AMD SOG for family 15h".
llvm-exegesis measures the latencies of these instructions as `2`,
which matches the latencies specified in "AMD SOG for family 15h".
However if we look at Agner, Microarchitecture, "AMD Bulldozer, Piledriver,
Steamroller and Excavator pipeline", "Data delay between different execution
domains", the int->ivec transfer is listed as `8`..`10`cy of additional latency.
Also, Agner's "Instruction tables", for Piledriver, lists their latencies as `12`,
which is consistent with `2cy` from exegesis / AMD SOG + `10cy` transfer delay.
Additional data point comes from the fact that Agner's "Instruction tables",
for Jaguar, lists their latencies as `8`; and "AMD SOG for family 16h" does
state the `+6cy` int->ivec delay, which is consistent with instr latency of `1` or `2`.
Reviewers: andreadb, RKSimon, craig.topper
Reviewed By: andreadb
Subscribers: gbedwell, courbet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57300
llvm-svn: 352861
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, LiveRegUnits was assuming that if a block has no successors
and does not return, then no registers are live at the end of it
(because the end of the block is unreachable). This was causing the
register scavenger to use callee-saved registers to materialise stack
frame addresses without saving them in the prologue. This would normally
be fine, because the end of the block is unreachable, but this is not
legal if the block ends by throwing a C++ exception. If this happens,
the scratch register will be modified, but its previous value won't be
preserved, so it doesn't get restored by the exception unwinder.
Differential revision: https://reviews.llvm.org/D57381
llvm-svn: 352844
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch:
* Adds necessary RV64D codegen patterns
* Modifies CC_RISCV so it will properly handle f64 types (with soft float ABI)
Note that in general there is no reason to try to select fcvt.w[u].d rather than fcvt.l[u].d for i32 conversions because fptosi/fptoui produce poison if the input won't fit into the target type.
Differential Revision: https://reviews.llvm.org/D53237
llvm-svn: 352833
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
For targets where i32 is not a legal type (e.g. 64-bit RISC-V),
LegalizeIntegerTypes must promote the integer operand of ISD::FPOWI. As this
is a signed value, this should be sign-extended.
This patch enables all tests in test/CodeGen/RISCVfloat-intrinsics.ll for
RV64, as prior to this patch that file couldn't be compiled for RV64 due to an
assertion when performing codegen for fpowi.
Differential Revision: https://reviews.llvm.org/D54574
llvm-svn: 352832
|
| |
|
|
|
|
|
|
| |
If we're optimizing for size, that overrides the subtarget
feature, so we would always produce 'inc' if we matched
this pattern.
llvm-svn: 352821
|
| |
|
|
|
|
| |
Another pattern exposed in D57516.
llvm-svn: 352820
|
| |
|
|
|
|
|
| |
It should probably just be mandatory for getTgtMemIntrinsic to return
the alignment.
llvm-svn: 352817
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The custom lowering introduced in rL352592 creates build_vector nodes
with negative i32 operands, but these operands did not meet the value
range constraints necessary to match build_vector nodes. This CL fixes
the issue by removing the unnecessary constraints.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish
Differential Revision: https://reviews.llvm.org/D57481
llvm-svn: 352813
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This requires a little extra work due tothe fact i32 is not a legal type. When
call lowering happens post-legalisation (e.g. when an intrinsic was inserted
during legalisation). A bitcast from f32 to i32 can't be introduced. This is
similar to the challenges with RV32D. To handle this, we introduce
target-specific DAG nodes that perform bitcast+anyext for f32->i64 and
trunc+bitcast for i64->f32.
Differential Revision: https://reviews.llvm.org/D53235
llvm-svn: 352807
|
| |
|
|
| |
llvm-svn: 352805
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Also clean up some preexisting target feature code.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, jfb
Differential Revision: https://reviews.llvm.org/D57495
llvm-svn: 352793
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes pr39098.
For the attached test case, CombineZExtLogicopShiftLoad can optimize it to
t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
t58: i64 = srl t57, Constant:i8<1>
t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64
But later visitANDLike transforms it to
t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
t61: i32 = truncate t57
t63: i32 = srl t61, Constant:i8<1>
t64: i32 = and t63, Constant:i32<524287>
t65: i64 = zero_extend t64
t58: i64 = srl t57, Constant:i8<1>
t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64
And it triggers CombineZExtLogicopShiftLoad again, causes a dead loop.
Both forms should generate same instructions, CombineZExtLogicopShiftLoad generated IR looks cleaner. But it looks more difficult to prevent visitANDLike to do the transform, so I prevent CombineZExtLogicopShiftLoad to do the transform if the ZExt is free.
Differential Revision: https://reviews.llvm.org/D57491
llvm-svn: 352792
|
| |
|
|
|
|
|
|
|
|
|
| |
r zero scale SMULFIX, expand into MUL which produces better code for X86.
For vector arguments, expand into MUL if SMULFIX is provided with a zero scale.
Otherwise, expand into MULH[US] or [US]MUL_LOHI.
Differential Revision: https://reviews.llvm.org/D56987
llvm-svn: 352783
|
| |
|
|
|
|
| |
This is causing a failure in chromium
llvm-svn: 352782
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D57514
llvm-svn: 352774
|
| |
|
|
|
|
|
|
|
|
| |
bitcast(insert_subvector(v,s,c2))
Similar to what we already do in DAGCombiner, but this version also handles bitcasts from types with different scalar sizes, which x86 is better at handling.
Differential Revision: https://reviews.llvm.org/D57514
llvm-svn: 352773
|
| |
|
|
|
|
|
|
| |
increment-by-one
Missed some regression test updates when testing this.
llvm-svn: 352769
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the most important uaddo problem mentioned in PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754
We were failing to match the canonicalized pattern when it's an 'add 1' operation.
Pattern matching, however, shouldn't assume that we have canonicalized IR, so we
match 4 commuted variants of uaddo.
There's also a test with a crazy type to show that the existing CGP transform
based on this matcher is not limited by target legality checks, but that's a
different problem.
Differential Revision: https://reviews.llvm.org/D57516
llvm-svn: 352766
|
| |
|
|
| |
llvm-svn: 352751
|
| |
|
|
|
|
| |
Tidyup check-prefixes at the same time
llvm-svn: 352749
|
| |
|
|
|
|
| |
Fixes regression introduced by rL352743
llvm-svn: 352745
|
| |
|
|
|
|
|
|
| |
Enables 32/64-bit scalar load broadcasts on AVX1 targets
The extractelement-load.ll regression will be fixed shortly in a followup commit.
llvm-svn: 352743
|
| |
|
|
|
|
|
|
|
|
| |
broadcast(x)
If we're not inserting the broadcast into the lowest subvector then we can avoid the insertion by just performing a larger broadcast.
Avoids a regression when we enable AVX1 broadcasts in shuffle combining
llvm-svn: 352742
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Constants can also be materialised using the negated value and a MVN, and this
case seem to have been missed for Thumb2. To check the constant materialisation
costs, we now call getT2SOImmVal twice, once for the original constant and then
also for its negated value, and this function checks if the constant can both
be splatted or rotated.
This was revealed by a test that optimises for minsize: instead of a LDR
literal pool load and having a literal pool entry, just a MVN with an immediate
is smaller (and also faster).
Differential Revision: https://reviews.llvm.org/D57327
llvm-svn: 352737
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And instead just generate a libcall. My motivating example on ARM was a simple:
shl i64 %A, %B
for which the code bloat is quite significant. For other targets that also
accept __int128/i128 such as AArch64 and X86, it is also beneficial for these
cases to generate a libcall when optimising for minsize. On these 64-bit targets,
the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS
lowering operation action is not set to custom/expand.
Differential Revision: https://reviews.llvm.org/D57386
llvm-svn: 352736
|
| |
|
|
| |
llvm-svn: 352720
|
| |
|
|
| |
llvm-svn: 352719
|
| |
|
|
|
|
| |
For AMDGPU the result is always 32-bit for 64-bit inputs.
llvm-svn: 352717
|
| |
|
|
| |
llvm-svn: 352712
|
| |
|
|
|
|
|
|
| |
mode only intrinsics to avx512-intrinsics-x86_64.ll.
Most of the other intrinsic tests have a 32-bit command lines.
llvm-svn: 352708
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fixes PR40267, in which the removed assertion was triggering on
perfectly valid IR. As far as I can tell, constant out of bounds
indices should be allowed when splitting extract_vector_elt, since
they will simply be propagated as out of bounds indices in the
resulting split vector and handled appropriately elsewhere.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya
Differential Revision: https://reviews.llvm.org/D57471
llvm-svn: 352702
|
| |
|
|
| |
llvm-svn: 352697
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This teaches the legalizer to handle G_FEXP in AArch64. As a result, it also
allows us to select G_FEXP.
It...
- Updates the legalizer-info tests
- Adds a test for legalizing exp
- Updates the existing fp tests to show that we can now select G_FEXP
https://reviews.llvm.org/D57483
llvm-svn: 352692
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D57439
llvm-svn: 352689
|
| |
|
|
| |
llvm-svn: 352686
|
| |
|
|
|
|
|
|
|
| |
This adds instruction selection support for G_FABS in AArch64. It also updates
the existing basic FP tests, adds a selection test for G_FABS.
https://reviews.llvm.org/D57418
llvm-svn: 352684
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
After the staack is unwound due to a thrown exxception,
`__stack_pointer` global can point to an invalid address. So
a `global.set` to restore `__stack_pointer` should be inserted right
after `catch` instruction.
But after r352598 the `global.set` instruction is inserted not right
after `catch` but after `block` - `br-on-exn` - `end_block` -
`extract_exception` sequence. This CL fixes it.
While doing that, we can actually move ReplacePhysRegs pass after
LateEHPrepare and merge EHRestoreStackPointer pass into LateEHPrepare,
and now placing `global.set` to `__stack_pointer` right after `catch` is
much easier. Otherwise it is hard to guarantee that `global.set` is
still right after `catch` and not touched with other transformations, in
which case we have to do something to hoist it.
Reviewers: dschuff
Subscribers: mgorny, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D57421
llvm-svn: 352681
|
| |
|
|
|
|
|
|
|
|
| |
This extends the existing transform for:
add X, 0/1 --> sub X, 0/-1
...to allow the sibling subtraction fold.
This pattern could regress with the proposed change in D57401.
llvm-svn: 352680
|
| |
|
|
|
|
|
| |
As discussed/shown in D57401, we are missing a fold for
subtract of 0/1 --> add 0/-1.
llvm-svn: 352678
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This teaches GlobalISel to emit a RTLib call for @llvm.log2 when it encounters
it.
It updates the existing floating point tests to show that we don't fall back on
the intrinsic, and select the correct instructions. It also adds a legalizer
test for G_FLOG2.
https://reviews.llvm.org/D57357
llvm-svn: 352673
|
| |
|
|
|
|
|
|
|
|
| |
This teaches the legalizer about G_FSQRT in AArch64. Also adds a legalizer
test for G_FSQRT, a selection test for it, and updates existing floating point
tests.
https://reviews.llvm.org/D57361
llvm-svn: 352671
|
| |
|
|
|
|
|
|
|
|
|
| |
Follow-up commit to https://reviews.llvm.org/D57359. (r352668)
This adds IRTranslator support for recognising a @llvm.sqrt intrinsic and
translating it into a G_FSQRT.
https://reviews.llvm.org/D57360
llvm-svn: 352670
|
| |
|
|
|
|
|
|
|
| |
This introduces a generic instruction for computing the floating point
square root of a value.
Right now, we can't select @llvm.sqrt, so this is working towards fixing that.
llvm-svn: 352668
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.
rdar://32212419
Differential revision: https://reviews.llvm.org/D56761
llvm-svn: 352664
|
| |
|
|
|
|
|
|
|
|
| |
This fixes the test case in PR35982 by preventing MMX instructions that read MM0-7 from being moved below EMMS/FEMMS by the post RA scheduler.
Though as discussed in bugzilla, this is not a complete fix. There is still the possibility of reordering in IR or by the pre-RA scheduler.
Differential Revision: https://reviews.llvm.org/D57298
llvm-svn: 352660
|
| |
|
|
|
|
| |
This is the first step towards improving broadcast support on AVX1 targets.
llvm-svn: 352634
|
| |
|
|
| |
llvm-svn: 352601
|
| |
|
|
| |
llvm-svn: 352599
|