| Commit message (Collapse) | Author | Age | Files | Lines |
| |
inserts should be turned into fmsubadd.
This is a follow-up to the fmaddsub support added in r320950. Hopefully, in the future, we can fix lowering to handle fmsubadd too.
llvm-svn: 320951
| |
inserts functional and add tests.
Summary:
We had no tests for this and we couldn't do the optimization because of a bad use count check. We need to know how many non-undef pieces of the build vector were filled in and ensure our use count is equal to that. But on the shuffle combine version we need the use count to be 2.
The missing coverage was noticed during the review of D40335.
Reviewers: RKSimon, zvi, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41133
llvm-svn: 320950
| |
llvm-svn: 320949
| |
This is a preparatory change for function gc, which also
requires relocations to be copied in ranges like this.
Differential Revision: https://reviews.llvm.org/D41313
llvm-svn: 320948
| |
llvm-svn: 320947
| |
Summary:
These fields are useful for lld's gc-sections support.
Also remove an unused field.
Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish
Differential Revision: https://reviews.llvm.org/D41320
llvm-svn: 320946
| |
getPointerDereferenceableBytes()"
This reverts commit 217067d5179882de9deb60d2e866befea4c126e7.
Fails on llvm-clang-x86_64-expensive-checks-win
llvm-svn: 320945
| |
getPointerDereferenceableBytes()"
This reverts commit 8b7a7660a3904b2088bc594311bcea2c651def08.
I didn't mean to commit this.
llvm-svn: 320944
| |
getPointerDereferenceableBytes()
llvm-svn: 320943
| |
CXXDependentScopeMemberExpr
* Also introduces ImportTemplateArgumentListInfo facility (A. Sidorin)
Patch by Peter Szecsi!
Differential Revision: https://reviews.llvm.org/D38692
llvm-svn: 320942
| |
llvm-svn: 320941
| |
using a SmallVector that really only ever has one element as a set.
llvm-svn: 320940
| |
Summary:
For byval arguments, the number of dereferenceable bytes is equal to
the size of the pointee, not the pointer.
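To make the pointee-versus-pointer distinction concrete, here is a minimal C sketch. The struct `Big` and both helper functions are hypothetical names chosen for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical aggregate passed by value; the name and size are illustrative. */
struct Big { char data[64]; };

static_assert(sizeof(struct Big) == 64, "pointee size is the size of the aggregate");

/* Bytes that may be dereferenced through a byval argument: the pointee size. */
size_t byval_dereferenceable_bytes(void) { return sizeof(struct Big); }

/* Size of the pointer itself, which is the value that must not be reported. */
size_t pointer_size_bytes(void) { return sizeof(struct Big *); }
```

On common targets the second function returns 4 or 8, far less than the 64 bytes that are actually readable through the argument.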
Reviewers: hfinkel, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41305
llvm-svn: 320939
| |
getPointerDereferenceableBytes()
Reviewers: hfinkel, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41288
llvm-svn: 320938
| |
extractions.
llvm-svn: 320937
| |
This allows us to remove some isel patterns that allowed MVT::i8 result type.
llvm-svn: 320936
| |
and allow that to be legalized to VEXTRACT.
I think we can remove the VEXTRACT node completely and use a canonicalized EXTRACT_VECTOR_ELT instead. This is a first step.
llvm-svn: 320935
| |
llvm-svn: 320934
| |
Assuming we can safely adjust the broadcast index for the new type to keep it suitably aligned, then peek through BITCASTs when looking for the broadcast source.
Fixes PR32007
llvm-svn: 320933
| |
llvm-svn: 320932
| |
Summary:
https://reviews.llvm.org/D41121 broke the FreeBSD build due to that type not
being defined on FreeBSD. As far as I can tell, it is an int, but I do not have
a way to test the change.
Reviewers: alekseyshl, kparzysz
Reviewed By: kparzysz
Subscribers: kparzysz, emaste, kubamracek, krytarowski, #sanitizers, llvm-commits
Differential Revision: https://reviews.llvm.org/D41325
llvm-svn: 320931
| |
Strip excess BITCASTs from EXTRACT_SUBVECTOR input
llvm-svn: 320930
| |
If the loop operand type is int8 then there will be no residual loop for the
unknown size expansion. Don't create the residual-size and bytes-copied values
when they are not needed.
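As a rough illustration of why a byte-sized loop operand leaves no residual work, the following C sketch models a copy of unknown size as a main loop over `op_size`-byte chunks followed by a byte-wise residual loop. The function name `chunked_copy` and its parameters are assumptions made for illustration; this is not the lowering code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes from src to dst: a main loop moves op_size-byte chunks,
 * then a residual loop moves the remaining n % op_size bytes one at a time.
 * Returns the number of bytes handled by the residual loop. */
size_t chunked_copy(unsigned char *dst, const unsigned char *src,
                    size_t n, size_t op_size) {
    assert(op_size > 0);
    size_t main_bytes = n - (n % op_size);
    for (size_t i = 0; i < main_bytes; i += op_size)
        memcpy(dst + i, src + i, op_size);
    size_t residual = n - main_bytes;
    for (size_t i = 0; i < residual; ++i)
        dst[main_bytes + i] = src[main_bytes + i];
    return residual;
}
```

With `op_size == 1`, `n % op_size` is always zero, so the residual loop never runs and its bookkeeping values are unnecessary.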
llvm-svn: 320929
| |
getVectorMaskingNode/getScalarMaskingNode when it's going to emit an ISD::OR/ISD::AND. NFCI
In those cases, the pass thru operand of the methods isn't used. The calls to the scalar version were passing a MVT::i1 zero, which is an illegal type at the stage this code runs.
llvm-svn: 320928
| |
instead of creating a select with one input being 0.
llvm-svn: 320927
| |
Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half.
llvm-svn: 320926
| |
It turns out that this is the only change required in libcxx
for it to compile with the new `wasm32-unknown-unknown-wasm`
target recently added to Clang.
Patch by Nicholas Wilson!
Differential Revision: https://reviews.llvm.org/D41073
llvm-svn: 320925
| |
We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions.
llvm-svn: 320924
| |
scheduler models. Combine into single InstrRW entries.
This reduces the number of scheduler groups in subtarget info.
llvm-svn: 320923
| |
llvm-svn: 320922
| |
We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.
More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs
into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs
into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node
when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a
variety of ways.
a. For #2, the vector path is missing the case for setlt with a '1' constant.
b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel
produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the
shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift
produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific
decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.
define i32 @abs_shifty(i32 %x) {
  %signbit = ashr i32 %x, 31
  %add = add i32 %signbit, %x
  %abs = xor i32 %signbit, %add
  ret i32 %abs
}
define i32 @abs_cmpsubsel(i32 %x) {
  %cmp = icmp slt i32 %x, zeroinitializer
  %sub = sub i32 zeroinitializer, %x
  %abs = select i1 %cmp, i32 %sub, i32 %x
  ret i32 %abs
}
define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
  %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
  %add = add <4 x i32> %signbit, %x
  %abs = xor <4 x i32> %signbit, %add
  ret <4 x i32> %abs
}
define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
  %cmp = icmp slt <4 x i32> %x, zeroinitializer
  %sub = sub <4 x i32> zeroinitializer, %x
  %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
  ret <4 x i32> %abs
}
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx
> abs_shifty:
> movl %edi, %eax
> negl %eax
> cmovll %edi, %eax
> retq
>
> abs_cmpsubsel:
> movl %edi, %eax
> negl %eax
> cmovll %edi, %eax
> retq
>
> abs_shifty_vec:
> vpabsd %xmm0, %xmm0
> retq
>
> abs_cmpsubsel_vec:
> vpabsd %xmm0, %xmm0
> retq
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> cmp w0, #0 // =0
> cneg w0, w0, mi
> ret
>
> abs_cmpsubsel:
> cmp w0, #0 // =0
> cneg w0, w0, mi
> ret
>
> abs_shifty_vec:
> abs v0.4s, v0.4s
> ret
>
> abs_cmpsubsel_vec:
> abs v0.4s, v0.4s
> ret
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le
> abs_shifty:
> srawi 4, 3, 31
> add 3, 3, 4
> xor 3, 3, 4
> blr
>
> abs_cmpsubsel:
> srawi 4, 3, 31
> add 3, 3, 4
> xor 3, 3, 4
> blr
>
> abs_shifty_vec:
> vspltisw 3, -16
> vspltisw 4, 15
> vsubuwm 3, 4, 3
> vsraw 3, 2, 3
> vadduwm 2, 2, 3
> xxlxor 34, 34, 35
> blr
>
> abs_cmpsubsel_vec:
> vspltisw 3, -16
> vspltisw 4, 15
> vsubuwm 3, 4, 3
> vsraw 3, 2, 3
> vadduwm 2, 2, 3
> xxlxor 34, 34, 35
> blr
>
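For readers who prefer C to IR, the two scalar forms of abs discussed above can be sketched as follows. The function names mirror the IR examples; the use of unsigned arithmetic (so that the INT32_MIN case wraps instead of overflowing) and the helper `check_abs_variants_agree` are assumptions made for this illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Shift variant: the mask is all ones when x is negative, else zero.
 * Unsigned arithmetic is used so INT32_MIN wraps instead of overflowing. */
uint32_t abs_shifty(int32_t x) {
    uint32_t ux = (uint32_t)x;
    uint32_t signbit = (uint32_t)0 - (ux >> 31); /* same bits as ashr x, 31 */
    return (ux + signbit) ^ signbit;
}

/* Compare + subtract + select variant. */
uint32_t abs_cmpsubsel(int32_t x) {
    uint32_t ux = (uint32_t)x;
    return x < 0 ? (uint32_t)0 - ux : ux;
}

/* Assert that both variants produce the same bits for one input. */
void check_abs_variants_agree(int32_t x) {
    assert(abs_shifty(x) == abs_cmpsubsel(x));
}
```

Both functions return the same bit pattern for every input, including INT32_MIN, which is why either form can be treated as the canonical one.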
Differential Revision: https://reviews.llvm.org/D40984
llvm-svn: 320921
| |
There are 2 parts to getting the -fassociative-math command-line flag translated to LLVM FMF:
1. In the driver/frontend, we accept the flag and its 'no' inverse and deal with the
interactions with other flags like -ffast-math -fno-signed-zeros -fno-trapping-math.
This was mostly already done - we just need to translate the flag as a codegen option.
The test file is complicated because there are many potential combinations of flags here.
Note that we are matching gcc's behavior that requires 'nsz' and no-trapping-math.
2. In codegen, we map the codegen option to FMF in the IR builder. This is simple code and
corresponding test.
For the motivating example from PR27372:
float foo(float a, float x) { return ((a + x) - x); }
$ ./clang -O2 27372.c -S -o - -ffast-math -fno-associative-math -emit-llvm | egrep 'fadd|fsub'
%add = fadd nnan ninf nsz arcp contract float %0, %1
%sub = fsub nnan ninf nsz arcp contract float %add, %2
So 'reassoc' is off as expected (and so is the new 'afn' but that's a different patch).
This case now works as expected end-to-end although the underlying logic is still wrong:
$ ./clang -O2 27372.c -S -o - -ffast-math -fno-associative-math | grep xmm
addss %xmm1, %xmm0
subss %xmm1, %xmm0
We're not done because the case where 'reassoc' is set is ignored by optimizer passes. Example:
$ ./clang -O2 27372.c -S -o - -fassociative-math -fno-signed-zeros -fno-trapping-math -emit-llvm | grep fadd
%add = fadd reassoc float %0, %1
$ ./clang -O2 27372.c -S -o - -fassociative-math -fno-signed-zeros -fno-trapping-math | grep xmm
addss %xmm1, %xmm0
subss %xmm1, %xmm0
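To see why reassociation must be opt-in, the sketch below contrasts the expression as written with the form a reassociating optimizer would be allowed to produce. The function names are hypothetical, and the asserted values assume the code is compiled without -ffast-math or a similar option.

```c
#include <assert.h>

/* Evaluated strictly as written: (a + x) - x. */
float as_written(float a, float x) { return (a + x) - x; }

/* What reassociation would permit: a + (x - x), which is just a for finite x. */
float reassociated(float a, float x) { return a + (x - x); }

/* With a huge x, a is absorbed by the first addition in the strict form. */
void demo_reassociation_changes_result(void) {
    assert(as_written(1.0f, 1e30f) == 0.0f);
    assert(reassociated(1.0f, 1e30f) == 1.0f);
}
```

The two results differ, so folding `(a + x) - x` to `a` is only valid when the 'reassoc' flag (with the accompanying flags described above) is present.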
Differential Revision: https://reviews.llvm.org/D39812
llvm-svn: 320920
| |
llvm-svn: 320919
| |
clang can implement with native IR.
llvm-svn: 320918
| |
llvm-svn: 320917
| |
llvm-svn: 320916
| |
llvm-svn: 320915
| |
subdirectory
This test depends on X86's TTI; move into the X86 subdirectory.
llvm-svn: 320914
| |
Changes to the original scalar loop during LV code gen cause the return value
of Legal->isConsecutivePtr() to be inconsistent with the return value during
legal/cost phases (further analysis and information of the bug is in D39346).
This patch is an alternative fix to PR34965 following the CM_Widen approach
proposed by Ayal and Gil in D39346. It extends InstWidening enum with
CM_Widen_Reverse to properly record the widening decision for consecutive
reverse memory accesses and, consequently, get rid of the
Legal->isConsecutivePtr() call in LV code gen. I think this is a simpler/cleaner
solution to PR34965 than the one in D39346.
Fixes PR34965.
Patch by Diego Caballero, thanks!
Differential Revision: https://reviews.llvm.org/D40742
llvm-svn: 320913
| |
[-Werror=strict-prototypes]'
llvm-svn: 320912
| |
r307148 added assembly mnemonic spelling correction support and enabled it
on ARM. This enables that support on PowerPC as well.
Patch by Dmitry Venikov, thanks!
Differential Revision: https://reviews.llvm.org/D40552
llvm-svn: 320911
| |
classes LZCNT/POPCNT.
I think when this instruction was first published it was only for a Knights CPU, and thus the VLX version was missing.
llvm-svn: 320910
| |
llvm-svn: 320909
| |
This mimics FileCheck's --check-prefixes option.
The default prefix is "expected". That is, "-verify" is equivalent to
"-verify=expected".
The goal is to permit exercising a single test suite source file with different
compiler options producing different sets of diagnostics. While cpp can be
combined with the existing -verify to accomplish the same goal, source is often
easier to maintain when it's not cluttered with preprocessor directives or
duplicate passages of code. For example, this patch also rewrites some existing
clang tests to demonstrate the benefit of this feature.
Patch by Joel E. Denny, thanks!
Differential Revision: https://reviews.llvm.org/D39694
llvm-svn: 320908
| |
llvm-svn: 320907
| |
From working on lld I've learned this is generally the
preferred way for several reasons (e.g. more concise, improves
encapsulation).
Differential Revision: https://reviews.llvm.org/D41265
llvm-svn: 320906
| |
Summary:
1. Use stream 0 only for the combined module. Previously, if the combined module was not
processed, ThinLTO used the stream for its own output. However, small changes in input
could trigger the combined module and shuffle outputs, making life harder for users of llvm::LTO.
2. Always process the combined module and write its output to stream 0. Processing an empty
combined module is cheap and allows llvm::LTO users to avoid implementing processing
that is already done in llvm::LTO.
Subscribers: mehdi_amini, inglorion, eraman, hiraditya
Differential Revision: https://reviews.llvm.org/D41267
llvm-svn: 320905
| |
The frontend currently groups diagnostics from the command line according to
diagnostic level, but that places all notes last. Fix that by emitting such
diagnostics in the order they were generated.
Patch by Joel E. Denny, thanks!
Differential Revision: https://reviews.llvm.org/D40995
llvm-svn: 320904
| |
r320895 modified a test so that it needs -enable-import-metadata, which
is false by default for NDEBUG. This adds the flag in another place that
needs it.
llvm-svn: 320903
| |
This patch introduces a specialized way to lower overflow-checked
multiplications with mixed-sign operands. This fixes link failures and
ICEs on code like this:
  void mul(int64_t a, uint64_t b) {
    int64_t res;
    __builtin_mul_overflow(a, b, &res);
  }
The generic checked-binop irgen would use a 65-bit multiplication
intrinsic here, which requires runtime support for _muloti4 (128-bit
multiplication), and therefore fails to link on i386. To get an ICE
on x86_64, change the example to use __int128_t / __uint128_t.
Adding runtime and backend support for 65-bit or 129-bit checked
multiplication on all of our supported targets is infeasible.
This patch solves the problem by using simpler, specialized irgen for
the mixed-sign case.
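One way such a specialized lowering can work, sketched here in plain C, is to take the magnitude of the signed operand, do an ordinary unsigned 64-bit multiply with an overflow check, and then reapply the sign. The function name `mul_mixed_sign` and its exact structure are assumptions made for illustration; the IR that clang actually emits may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Multiply signed a by unsigned b into a signed result.
 * Returns true on overflow; *res then holds the wrapped low 64 bits. */
bool mul_mixed_sign(int64_t a, uint64_t b, int64_t *res) {
    assert(res != NULL);
    bool negative = a < 0;
    /* Magnitude of a, computed in unsigned arithmetic so INT64_MIN is safe. */
    uint64_t abs_a = negative ? (uint64_t)0 - (uint64_t)a : (uint64_t)a;
    /* Unsigned 64-bit multiply with an explicit overflow check. */
    bool overflow = abs_a != 0 && b > UINT64_MAX / abs_a;
    uint64_t product = abs_a * b;
    /* The magnitude may reach 2^63 for a negative result, 2^63 - 1 otherwise. */
    uint64_t limit = (uint64_t)INT64_MAX + (negative ? 1u : 0u);
    if (product > limit)
        overflow = true;
    /* Reapply the sign and copy the bit pattern into the signed result. */
    uint64_t bits = negative ? (uint64_t)0 - product : product;
    memcpy(res, &bits, sizeof bits);
    return overflow;
}
```

Nothing here needs arithmetic wider than 64 bits, so no 128-bit runtime helper is required.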
llvm.org/PR34920, rdar://34963321
Testing: Apart from check-clang, I compared the output from this fairly
comprehensive test driver using unpatched & patched clangs:
https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081
Differential Revision: https://reviews.llvm.org/D41149
llvm-svn: 320902
|