| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
| |
Add support for this builtin:
double builtin_truncf128_round_to_odd(float128)
Differential Revision: https://reviews.llvm.org/D48482
llvm-svn: 336596
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
width. Add a min_vector_width function attribute and tag all x86 instrinsics with it
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.
There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
|
|
|
|
|
|
|
|
|
| |
Add a number of builtins for __float128 Round To Odd.
This is the Clang portion of the builtins work.
Differential Revision: https://reviews.llvm.org/D47548
llvm-svn: 336579
|
|
|
|
|
|
| |
This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma
llvm-svn: 336507
|
|
|
|
|
|
|
|
|
|
| |
and hardcoded _MM_FROUND_CUR_DIRECTION internally.
I believe these have been broken since their introduction into clang.
I've enhanced the tests for these intrinsics to using a real rounding mode and checking all the intrinsic arguments instead of just the name.
llvm-svn: 336498
|
|
|
|
|
|
|
|
| |
that cause extra bitcasts to be emitted in the IR.
Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.
llvm-svn: 336487
|
|
|
|
|
|
|
|
|
|
| |
sure we optimize the all ones mask case.
This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases.
Factor the code into a helper function to make this easy.
llvm-svn: 336472
|
|
|
|
|
|
|
|
| |
We had the mask versions of the rounding intrinsics, but not one without masking.
Also change the rounding tests to not use the CUR_DIRECTION rounding mode.
llvm-svn: 336470
|
|
|
|
|
|
|
|
| |
native IR using llvm.fma intrinsic.
This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns.
llvm-svn: 336417
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
smaller than the literal
A Chromium developer reported a bug which turned out to be a mangling
collision between these two literals:
char s[] = "foo";
char t[32] = "foo";
They may look the same, but for the initialization of t we will (under
some circumstances) use a literal that's extended with zeros, and
both the length and those zeros should be accounted for by the mangling.
This actually makes the mangling code simpler: where it previously had
special logic for null terminators, which are not part of the
StringLiteral, that is now covered by the general algorithm.
(The problem was reported at https://crbug.com/857442)
Differential Revision: https://reviews.llvm.org/D48928
llvm-svn: 336415
|
|
|
|
|
|
|
|
|
|
| |
fmaddsub/fmsubadd IR emission.
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.
While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
llvm-svn: 336388
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes on optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D48715
llvm-svn: 336355
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add test cases with each predicate using the following
intrinsics:
_mm_cmp_pd
_mm_cmp_ps
_mm256_cmp_pd
_mm256_cmp_ps
_mm_cmp_pd_mask
_mm_cmp_ps_mask
_mm256_cmp_pd_mask
_mm256_cmp_ps_mask
_mm512_cmp_pd_mask
_mm512_cmp_ps_mask
_mm_mask_cmp_pd_mask
_mm_mask_cmp_ps_mask
_mm256_mask_cmp_pd_mask
_mm256_mask_cmp_ps_mask
_mm512_mask_cmp_pd_mask
_mm512_mask_cmp_ps_mask
Some of these are marked with FIXME, as there is bug in lowering
e.g. _mm512_mask_cmp_ps_mask.
llvm-svn: 336346
|
|
|
|
|
|
|
|
|
| |
Update clang to treat fp128 as a valid base type for homogeneous aggregate
passing and returning.
Differential Revision: https://reviews.llvm.org/D48044
llvm-svn: 336308
|
|
|
|
| |
llvm-svn: 336264
|
|
|
|
| |
llvm-svn: 336243
|
|
|
|
|
|
| |
is already there
llvm-svn: 336047
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All of these found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.
Some of these were real bugs where we lost bits from the user input:
_mm512_mask_broadcast_f32x8
_mm512_maskz_broadcast_f32x8
_mm512_mask_broadcast_i32x8
_mm512_maskz_broadcast_i32x8
_mm256_mask_cvtusepi16_storeu_epi8
llvm-svn: 336042
|
|
|
|
|
|
| |
instead.
llvm-svn: 336036
|
|
|
|
|
|
| |
builtins instead.
llvm-svn: 335945
|
|
|
|
|
|
|
|
|
|
|
| |
Fails on Green Dragon:
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/50174/consoleFull
UNRESOLVED: Clang :: CodeGen/vld_dup.c (5546 of 38947)
******************** TEST 'Clang :: CodeGen/vld_dup.c' FAILED ********************
Test has no run line!
llvm-svn: 335750
|
|
|
|
|
|
| |
name to match llvm.
llvm-svn: 335745
|
|
|
|
|
|
|
|
|
| |
This patch reworks the support for dup NEON intrinsics as
described in D48439.
Differential Revision: https://reviews.llvm.org/D48440
llvm-svn: 335734
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Changes to some clang side tests to go with the summary parsing patch.
Depends on D47905.
Reviewers: pcc, dexonsmith, mehdi_amini
Subscribers: inglorion, eraman, cfe-commits, steven_wu
Differential Revision: https://reviews.llvm.org/D47906
llvm-svn: 335618
|
|
|
|
|
|
|
|
|
|
| |
The WebAssembly backend in particular benefits from being
able to distinguish between varargs functions (...) and prototype-less
C functions.
Differential Revision: https://reviews.llvm.org/D48443
llvm-svn: 335510
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With MSVC, PCH files are created along with an object file that needs to
be linked into the final library or executable. That object file
contains the code generated when building the headers. In particular, it
will include definitions of inline dllexport functions, and because they
are emitted in this object file, other files using the PCH do not need
to emit them. See the bug for an example.
This patch makes clang-cl match MSVC's behaviour in this regard, causing
significant compile-time savings when building dlls using precompiled
headers.
For example, in a 64-bit optimized shared library build of Chromium with
PCH, it reduces the binary size and compile time of
stroke_opacity_custom.obj from 9315564 bytes to 3659629 bytes and 14.6
to 6.63 s. The wall-clock time of building blink_core.dll goes from
38m41s to 22m33s. ("user" time goes from 1979m to 1142m).
Differential Revision: https://reviews.llvm.org/D48426
llvm-svn: 335466
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).
See https://reviews.llvm.org/D34156 for details on the original change.
This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.
llvm-svn: 335385
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Lowering some vector comparision builtins to fcmp IR instructions.
This ignores the signaling behaviour specified in the predicate
argument of said builtins.
Affected AVX512 builtins:
__builtin_ia32_cmpps128_mask
__builtin_ia32_cmpps256_mask
__builtin_ia32_cmpps512_mask
__builtin_ia32_cmppd128_mask
__builtin_ia32_cmppd256_mask
__builtin_ia32_cmppd512_mask
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: craig.topper, spatel, efriedma
Differential Revision: https://reviews.llvm.org/D45616
llvm-svn: 335339
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dead code.
This is important for C++ templates that essentially compute the valid
input in a way that is constant and will cause all the invalid cases to
be dead code that is deleted. Code in the wild actually does this and
GCC also accepts these kinds of patterns so it is important to support
it.
To make this work, we provide a non-error path to diagnose these issues,
and use a default-error warning instead. This keeps the relatively
strict handling but prevents nastiness like SFINAE on these errors. It
also allows us to safely use the system to diagnose this only when it
occurs at runtime (in emitted code).
Entertainingly, this required fixing the syntax in various other ways
for the x86 test because we never bothered to diagnose that the returns
were invalid.
Since debugging these compile failures was super confusing, I've also
improved the diagnostic to actually say what the value was. Most of the
checks I've made ignore this to simplify maintenance, but I've checked
it in a few places to make sure the diagnsotic is working.
Depends on D48462. Without that, we might actually crash some part of
the compiler after bypassing the error here.
Thanks to Richard, Ben Kramer, and especially Craig Topper for all the
help here.
Differential Revision: https://reviews.llvm.org/D48464
llvm-svn: 335309
|
|
|
|
|
|
|
|
| |
D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled.
This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.
llvm-svn: 335308
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Fixes PR37898.
Reviewers: pcc, vlad.tsyrklevich
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D48454
llvm-svn: 335305
|
|
|
|
|
|
|
|
|
|
|
| |
This is breaking a couple of buildbots. We need to run the
NameAnonGlobal pass for regular LTO now as well (since we're producing a
summary). I'll post a separate patch for review to make this happen and
then re-commit.
This reverts commit c0759b7b1f4a81ff9021b952aa38a222d5fa4dfd.
llvm-svn: 335291
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
With D33921, we gained the ability to have module summaries in regular
LTO modules without triggering ThinLTO compilation. Module summaries in
regular LTO allow garbage collection (dead stripping) before LTO
compilation and thus open up additional optimization opportunities.
This patch enables summary emission in regular LTO for all targets
except ld64-based ones (which use the legacy LTO API).
Reviewers: pcc, tejohnson, mehdi_amini
Subscribers: inglorion, eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D34156
llvm-svn: 335284
|
|
|
|
|
|
|
|
|
|
|
|
| |
__stosw/d/q to mark registers/memory as modified
The inline assembly for these didn't mark that edi, esi, ecx are modified by movs/stos instruction. It also didn't mark that memory is modified.
This issue was reported to llvm-dev last year http://lists.llvm.org/pipermail/cfe-dev/2017-November/055863.html but no bug was ever filed.
Differential Revision: https://reviews.llvm.org/D48448
llvm-svn: 335270
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This test is a strip down version of a function inside the
amalgamated sqlite source. When converted to IR clang produces
a phi instruction without debug location.
This patch fixes the above issue.
Differential Revision: https://reviews.llvm.org/D47720
llvm-svn: 335255
|
|
|
|
|
|
|
|
|
|
|
|
| |
other intrinsics and remove undef shuffle indices.
Similar to what was done to max/min recently.
These already reduced the vector width to 256 and 128 bit as we go unlike the original max/min code.
Differential Revision: https://reviews.llvm.org/D48346
llvm-svn: 335253
|
|
|
|
|
|
| |
select in IR instead.
llvm-svn: 335200
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SPIR target currently allows for half precision floating point types to be
emitted using the LLVM intrinsic functions which convert half types to floats
and doubles. However, this is illegal in SPIR as the only intrinsic allowed by
SPIR is memcpy, as per section 3 of the SPIR specification. Currently this is
leading to an assert being hit in the Clang CodeGen when attempting to emit a
constant or literal _Float16 type in a comparison operation on a SPIR or SPIR64
target. This assert stems from the CodeGen attempting to emit a constant half
value as an integer because the backend has specified that it is using these
half conversion intrinsics (which represents half as i16). This patch prevents
SPIR targets from using these intrinsics by overloading the responsible target
info method, marks SPIR targets as having a legal half type and provides
additional regression testing for the _Float16 type on SPIR targets.
Patch by: Stephen McGroarty
Differential Revision: https://reviews.llvm.org/D48188
llvm-svn: 335111
|
|
|
|
|
|
|
|
| |
better use of other functions and to reduce width to 256 and 128 bits were possible.""
Test has been updated to reflect the IRGen.
llvm-svn: 335075
|
|
|
|
|
|
|
|
| |
better use of other functions and to reduce width to 256 and 128 bits were possible."
The test changes are failing the buildbot and its going to take me some time to fix it.
llvm-svn: 335072
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
other functions and to reduce width to 256 and 128 bits were possible.
We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX.
For v16i32 and floating point we have legacy 128/256 bit instructions we can use.
I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.
Differential Revision: https://reviews.llvm.org/D47401
llvm-svn: 335070
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k
Reviewed By: craig.topper
Subscribers: tkrupa, cfe-commits
Differential Revision: https://reviews.llvm.org/D41168
llvm-svn: 334850
|
|
|
|
| |
llvm-svn: 334820
|
|
|
|
|
|
|
|
|
|
| |
_InterlockedExchange_HLEAcquire/Release and _InterlockedCompareExchange_HLEAcquire/Release for MSVC compatibility.
Clang/LLVM doesn't have a way to pass an HLE hint through to the X86 backend to emit HLE prefixed instructions. So this is a good short term fix.
Differential Revision: https://reviews.llvm.org/D47672
llvm-svn: 334751
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls
to native IR.
Reviewers: craig.topper, RKSimon, spatel, sroland
Reviewed By: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47979
llvm-svn: 334741
|
|
|
|
| |
llvm-svn: 334696
|
|
|
|
| |
llvm-svn: 334693
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.
Reviewers: mstorsjo, compnerd, javed.absar
Reviewed By: mstorsjo
Subscribers: kristof.beyls, chrib, cfe-commits
Differential Revision: https://reviews.llvm.org/D48132
llvm-svn: 334639
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://bugs.llvm.org/show_bug.cgi?id=37778
...shows a miscompile resulting from marking nan builtins as 'const'.
The nan libcalls/builtins take a pointer argument:
http://www.cplusplus.com/reference/cmath/nan-function/
...and the chars dereferenced by that arg are used to fill in the NaN constant payload bits.
"const" means that the pointer argument isn't dereferenced. That's translated to "readnone" in LLVM.
"pure" means that the pointer argument may be dereferenced. That's translated to "readonly" in LLVM.
This change prevents the IR optimizer from killing the lead-up to the nan call here:
double a() {
char buf[4];
buf[0] = buf[1] = buf[2] = '9';
buf[3] = '\0';
return __builtin_nan(buf);
}
...the optimizer isn't currently able to simplify this to a constant as we might hope,
but this patch should solve the miscompile.
Differential Revision: https://reviews.llvm.org/D48134
llvm-svn: 334628
|
|
|
|
|
|
| |
builtins. Use select builtins instead.
llvm-svn: 334577
|