| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r345826 | erichkeane | 2018-11-01 05:50:37 -0700 (Thu, 01 Nov 2018) | 15 lines
CPU-Dispatch-- Fix conflict between 'generic' and 'pentium'
When a dispatch function was being emitted that had both a generic and a
pentium configuration listed, we would assert. This is because neither
configuration has any 'features' associated with it so they were both
considered the 'default' version. 'pentium' lacks any features because
we implement it in terms of __builtin_cpu_supports (instead of Intel
proprietary checks), which is unable to decern between the two.
The fix for this is to omit the 'generic' version from the dispatcher if
both are present. This permits existing code to compile, and still will
choose the 'best' version available (since 'pentium' is technically
better than 'generic').
Change-Id: I4b69f3e0344e74cbdbb04497845d5895dd05fda0
------------------------------------------------------------------------
llvm-svn: 348682
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r347556 | nemanjai | 2018-11-26 06:35:38 -0800 (Mon, 26 Nov 2018) | 11 lines
[PowerPC] Vector load/store builtins overstate alignment of pointers
A number of builtins in altivec.h load/store vectors from pointers to scalar
types. Currently they just cast the pointer to a vector pointer, but expressions
like that have the alignment of the target type. Of course, the input pointer
did not have that alignment so this triggers UBSan (and rightly so).
This resolves https://bugs.llvm.org/show_bug.cgi?id=39704
Differential revision: https://reviews.llvm.org/D54787
------------------------------------------------------------------------
llvm-svn: 347935
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r344824 | ctopper | 2018-10-19 18:30:00 -0700 (Fri, 19 Oct 2018) | 14 lines
[X86] When checking the bits in cpu_features for function multiversioning dispatcher in the resolver, make sure all the required bits are set. Not just one of them
Summary:
The multiversioning code repurposed the code from __builtin_cpu_supports for checking if a single feature is enabled. That code essentially performed (_cpu_features & (1 << C)) != 0. But with the multiversioning path, the mask is no longer guaranteed to be a power of 2. So we return true anytime any one of the bits in the mask is set not just all of the bits.
The correct check is (_cpu_features & mask) == mask
Reviewers: erichkeane, echristo
Reviewed By: echristo
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D53460
------------------------------------------------------------------------
llvm-svn: 344923
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r340048 | nico | 2018-08-17 19:19:06 +0200 (Fri, 17 Aug 2018) | 10 lines
Make __shiftleft128 / __shiftright128 real compiler built-ins.
r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h.
Microsoft's STL plans on using these functions, and they're using intrin0.h
which just has declarations of built-ins to not pull in the huge intrin.h
header in the standard library headers. That requires that these functions are
real built-ins.
https://reviews.llvm.org/D50907
------------------------------------------------------------------------
llvm-svn: 340289
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r339317 | theraven | 2018-08-09 10:02:42 +0200 (Thu, 09 Aug 2018) | 15 lines
Correctly initialise global blocks on Windows.
Summary:
Windows does not allow globals to be initialised to point to globals in
another DLL. Exported globals may be referenced only from code. Work
around this by creating an initialiser that runs in early library
initialisation and sets the isa pointer.
Reviewers: rjmccall
Reviewed By: rjmccall
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D50436
------------------------------------------------------------------------
llvm-svn: 339339
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r338602 | hans | 2018-08-01 19:51:23 +0200 (Wed, 01 Aug 2018) | 11 lines
Revert r338455 "[constexpr] Support for constant evaluation of __builtin_memcpy and __builtin_memmove (in non-type-punning cases)."
It caused asserts during Chromium builds, see reply on the cfe-commits thread.
> This is intended to permit libc++ to make std::copy etc constexpr
> without sacrificing the optimization that uses memcpy on
> trivially-copyable types.
>
> __builtin_strcpy and __builtin_wcscpy are not handled by this change.
> They'd be straightforward to add, but we haven't encountered a need for
> them just yet.
------------------------------------------------------------------------
llvm-svn: 338674
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
__builtin_memmove (in non-type-punning cases).
This is intended to permit libc++ to make std::copy etc constexpr
without sacrificing the optimization that uses memcpy on
trivially-copyable types.
__builtin_strcpy and __builtin_wcscpy are not handled by this change.
They'd be straightforward to add, but we haven't encountered a need for
them just yet.
llvm-svn: 338455
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: rnk, mstorsjo, ssijaric, haripul, TomTan
Reviewed By: rnk
Subscribers: kristof.beyls, chrib, cfe-commits
Differential Revision: https://reviews.llvm.org/D50029
llvm-svn: 338405
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
C and C++ are interesting languages. They are statically typed, but weakly.
The implicit conversions are allowed. This is nice, allows to write code
while balancing between getting drowned in everything being convertible,
and nothing being convertible. As usual, this comes with a price:
```
unsigned char store = 0;
bool consume(unsigned int val);
void test(unsigned long val) {
if (consume(val)) {
// the 'val' is `unsigned long`, but `consume()` takes `unsigned int`.
// If their bit widths are different on this platform, the implicit
// truncation happens. And if that `unsigned long` had a value bigger
// than UINT_MAX, then you may or may not have a bug.
// Similarly, integer addition happens on `int`s, so `store` will
// be promoted to an `int`, the sum calculated (0+768=768),
// and the result demoted to `unsigned char`, and stored to `store`.
// In this case, the `store` will still be 0. Again, not always intended.
store = store + 768; // before addition, 'store' was promoted to int.
}
// But yes, sometimes this is intentional.
// You can either make the conversion explicit
(void)consume((unsigned int)val);
// or mask the value so no bits will be *implicitly* lost.
(void)consume((~((unsigned int)0)) & val);
}
```
Yes, there is a `-Wconversion`` diagnostic group, but first, it is kinda
noisy, since it warns on everything (unlike sanitizers, warning on an
actual issues), and second, there are cases where it does **not** warn.
So a Sanitizer is needed. I don't have any motivational numbers, but i know
i had this kind of problem 10-20 times, and it was never easy to track down.
The logic to detect whether an truncation has happened is pretty simple
if you think about it - https://godbolt.org/g/NEzXbb - basically, just
extend (using the new, not original!, signedness) the 'truncated' value
back to it's original width, and equality-compare it with the original value.
The most non-trivial thing here is the logic to detect whether this
`ImplicitCastExpr` AST node is **actually** an implicit conversion, //or//
part of an explicit cast. Because the explicit casts are modeled as an outer
`ExplicitCastExpr` with some `ImplicitCastExpr`'s as **direct** children.
https://godbolt.org/g/eE1GkJ
Nowadays, we can just use the new `part_of_explicit_cast` flag, which is set
on all the implicitly-added `ImplicitCastExpr`'s of an `ExplicitCastExpr`.
So if that flag is **not** set, then it is an actual implicit conversion.
As you may have noted, this isn't just named `-fsanitize=implicit-integer-truncation`.
There are potentially some more implicit conversions to be warned about.
Namely, implicit conversions that result in sign change; implicit conversion
between different floating point types, or between fp and an integer,
when again, that conversion is lossy.
One thing i know isn't handled is bitfields.
This is a clang part.
The compiler-rt part is D48959.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=21530 | PR21530 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=37552 | PR37552 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=35409 | PR35409 ]].
Partially fixes [[ https://bugs.llvm.org/show_bug.cgi?id=9821 | PR9821 ]].
Fixes https://github.com/google/sanitizers/issues/940. (other than sign-changing implicit conversions)
Reviewers: rjmccall, rsmith, samsonov, pcc, vsk, eugenis, efriedma, kcc, erichkeane
Reviewed By: rsmith, vsk, erichkeane
Subscribers: erichkeane, klimek, #sanitizers, aaron.ballman, RKSimon, dtzWill, filcab, danielaustin, ygribov, dvyukov, milianw, mclow.lists, cfe-commits, regehr
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D48958
llvm-svn: 338288
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "Procedure Call Procedure Call Standard for the ARM® Architecture"
(https://static.docs.arm.com/ihi0042/f/IHI0042F_aapcs.pdf), specifies that
composite types are passed according to their "natural alignment", i.e. the
alignment before alignment adjustment on the entire composite is applied.
The same applies for AArch64 ABI.
Clang, however, used the adjusted alignment.
GCC already implements the ABI correctly. With this patch Clang becomes
compatible with GCC and passes such arguments in accordance with AAPCS.
Differential Revision: https://reviews.llvm.org/D46013
llvm-svn: 338279
|
|
|
|
| |
llvm-svn: 338246
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Refer the MS ARM64 ABI Convention for the behavior for struct returns:
https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions#return-values
Reviewers: mstorsjo, compnerd, rnk, javed.absar, yinma, efriedma
Reviewed By: rnk, efriedma
Subscribers: haripul, TomTan, yinma, efriedma, kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D49464
llvm-svn: 338050
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Right now automatic variables are either initialized with bzero followed by a few stores, or memcpy'd from a synthesized global. We end up encountering a fair amount of code where memcpy of non-zero byte patterns would be better than memcpy from a global because it touches less memory and generates a smaller binary. The optimizer could reason about this, but it's not really worth it when clang already knows.
This code could definitely be more clever but I'm not sure it's worth it. In particular we could track a histogram of bytes seen and figure out (as we do with bzero) if a memset could be followed by a handful of stores. Similarly, we could tune the heuristics for GlobalSize, but using the same as for bzero seems conservatively OK for now.
<rdar://problem/42563091>
Reviewers: dexonsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D49771
llvm-svn: 337887
|
|
|
|
|
|
| |
This reverts commit 4288dd3bf082482e02c8a044c611c18168cb0180.
llvm-svn: 337803
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
Differential Revision: https://reviews.llvm.org/D45045
Patch by Hsiangkai Wang.
llvm-svn: 337800
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for vrndi_f32() and vrndiq_f32()
intrinsics in AArch32 mode and for vrndns_f32() intrinsic in
AArch64 mode.
Differential Revision: https://reviews.llvm.org/D48829
llvm-svn: 337690
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.
This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.
This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementions
each in their own translation units.
The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.
Differential Revision: https://reviews.llvm.org/D47474
llvm-svn: 337552
|
|
|
|
| |
llvm-svn: 337508
|
|
|
|
|
|
|
|
|
|
|
| |
constant, don't convert the rest into a packed struct.
If an array constant has a large non-zero portion and a large zero
portion, we want to emit the first part as an array and the rest as a
zeroinitializer if possible. This fixes a memory usage regression from
r333141 when compiling PHP.
llvm-svn: 337498
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The codegen for this builtin was initially implemented to match GCC.
However, due to interest from users GCC changed behaviour to account for the
big endian bias of the instruction and correct it. This patch brings the
handling inline with GCC.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38192
Differential Revision: https://reviews.llvm.org/D49424
llvm-svn: 337449
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in as the function attribute
"null-pointer-is-valid"="true".
This CL only adds the attribute on the function.
It also strips "nonnull" attributes from function arguments but
keeps the related warnings unchanged.
Corresponding LLVM change rL336613 already updated the
optimizations to not treat null pointer dereferencing
as undefined if the attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: jyknight
Subscribers: drinkcat, xbolva00, cfe-commits
Differential Revision: https://reviews.llvm.org/D47894
llvm-svn: 337433
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Using _Atomic to do implicit load / store is just a seq_cst atomic_load / atomic_store. Stores currently assert in Sema::ImpCastExprToType with 'can't implicitly cast lvalue to rvalue with this cast kind', but that's erroneous. The codegen is fine as the test shows.
While investigating I found that Richard had found the problem here: https://reviews.llvm.org/D46112#1113557
<rdar://problem/40347123>
Reviewers: dexonsmith
Subscribers: cfe-commits, efriedma, rsmith, aaron.ballman
Differential Revision: https://reviews.llvm.org/D49458
llvm-svn: 337410
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
which was reverted in r337336.
The problem that required a revert was fixed in r337338.
Also added a missing "REQUIRES: x86-registered-target" to one of
the tests.
Original commit message:
> Teach Clang to emit address-significance tables.
>
> By default, we emit an address-significance table on all ELF
> targets when the integrated assembler is enabled. The emission of an
> address-significance table can be controlled with the -faddrsig and
> -fno-addrsig flags.
>
> Differential Revision: https://reviews.llvm.org/D48155
llvm-svn: 337339
|
|
|
|
|
|
|
|
|
| |
Causing multiple failures on sanitizer bots due to TLS symbol errors,
e.g.
/usr/bin/ld: __msan_origin_tls: TLS definition in /home/buildbots/ppc64be-clang-test/clang-ppc64be/stage1/lib/clang/7.0.0/lib/linux/libclang_rt.msan-powerpc64.a(msan.cc.o) section .tbss.__msan_origin_tls mismatches non-TLS reference in /tmp/lit_tmp_0a71tA/mallinfo-3ca75e.o
llvm-svn: 337336
|
|
|
|
|
|
|
|
|
|
|
| |
By default, we emit an address-significance table on all ELF
targets when the integrated assembler is enabled. The emission of an
address-significance table can be controlled with the -faddrsig and
-fno-addrsig flags.
Differential Revision: https://reviews.llvm.org/D48155
llvm-svn: 337333
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Added the following intrinsics:
_BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64
_InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64,
_InterlockedExchangeAdd64, _InterlockedExchangeSub64,
_InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64.
Reviewers: compnerd, mstorsjo, rnk, javed.absar
Reviewed By: mstorsjo
Subscribers: kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D49445
llvm-svn: 337327
|
|
|
|
| |
llvm-svn: 337277
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If the build path is short, `Line` field can end up fitting on the same line as `File`,
but the `{{.*}}` would consume it. Keeping in mind rL293149, i think we can fix it,
while keeping it working when there are and there are not any quotations.
At least this fixes this test for me.
Reviewers: anemet, aaron.ballman, hfinkel
Reviewed By: anemet
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D49348
llvm-svn: 337249
|
|
|
|
| |
llvm-svn: 337049
|
|
|
|
|
|
|
|
|
|
| |
Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds.
Subscribers: dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D49209
llvm-svn: 337041
|
|
|
|
|
|
| |
Using CUR_DIRECTION is not a realistic scenario. That is equivalent to the intrinsic without rounding.
llvm-svn: 337040
|
|
|
|
| |
llvm-svn: 336936
|
|
|
|
|
|
|
|
|
|
|
| |
string, choose the strictest one instead of the last.
Also fix an undefined behavior. Move the pointer update to a later point to
avoid adding StringRef::npos to the pointer.
rdar://problem/40706280
llvm-svn: 336863
|
|
|
|
|
|
|
|
|
|
| |
See https://reviews.llvm.org/D47106 for details.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D47172
llvm-svn: 336844
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Code in `CodeGenModule::SetFunctionAttributes()` could set an empty
attribute `implicit-section-name` on a function that is affected by
`#pragma clang text="section"`. This is incorrect because the attribute
should contain a valid section name. If the function additionally also
used `__attribute__((section("section")))` then this could result in
emitting the function in a section with an empty name.
The patch fixes the issue by removing the problematic code that sets
empty `implicit-section-name` from
`CodeGenModule::SetFunctionAttributes()` because it is sufficient to set
this attribute only from a similar code in `setNonAliasAttributes()`
when the function is emitted.
Differential Revision: https://reviews.llvm.org/D48916
llvm-svn: 336842
|
|
|
|
|
|
|
|
| |
of a copy of the intrinsic implementation.
This had the same issue I just fixed in r336739. Apparently I copy pasted _mm512_mullo_epi64 when I added _mm512_mullox_epi64.
llvm-svn: 336740
|
|
|
|
|
|
| |
a copy of the intrinsic implementation.
llvm-svn: 336739
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Make sure that loop metadata only is put on the backedge
when expanding a do-while loop.
Previously we added the loop metadata also on the branch
in the pre-header. That could confuse optimization passes
and result in the loop metadata being associated with the
wrong loop.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38011
Committing on behalf of deepak2427 (Deepak Panickal)
Reviewers: #clang, ABataev, hfinkel, aaron.ballman, bjope
Reviewed By: bjope
Subscribers: bjope, rsmith, shenhan, zzheng, xbolva00, lebedev.ri, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D48721
llvm-svn: 336717
|
|
|
|
| |
llvm-svn: 336681
|
|
|
|
| |
llvm-svn: 336676
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to
native IR in cases where the result's length is less than 128 bits.
The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for
128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps)
instructions are generated instead
Differential Revision: https://reviews.llvm.org/D48712
llvm-svn: 336643
|
|
|
|
|
|
|
|
|
|
| |
rounding version of the fma intrinsics.
The rounding mode is checked in CGBuiltin.cpp to generate the correct intrinsic call.
Making this switch switchs the masking to use the i8 bitcast to <8 x i1> and extract i1 version of the IR for the mask. Previously we ended up with a scalar 'and' plus an icmp.
llvm-svn: 336637
|
|
|
|
|
|
|
|
|
|
| |
Privacy annotations shouldn't have to appear in the first
comma-delimited string in order to be recognized. Also, they should be
ignored if they are preceded or followed by non-whitespace characters.
rdar://problem/40706280
llvm-svn: 336629
|
|
|
|
|
|
|
|
|
|
|
|
| |
is suitable for use in scalar mask intrinsics.
This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.
This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.
llvm-svn: 336622
|
|
|
|
|
|
|
|
|
| |
Add support for this builtin:
double builtin_truncf128_round_to_odd(float128)
Differential Revision: https://reviews.llvm.org/D48482
llvm-svn: 336596
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
width. Add a min_vector_width function attribute and tag all x86 instrinsics with it
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.
There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
|
|
|
|
|
|
|
|
|
| |
Add a number of builtins for __float128 Round To Odd.
This is the Clang portion of the builtins work.
Differential Revision: https://reviews.llvm.org/D47548
llvm-svn: 336579
|
|
|
|
|
|
| |
This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma
llvm-svn: 336507
|
|
|
|
|
|
|
|
|
|
| |
and hardcoded _MM_FROUND_CUR_DIRECTION internally.
I believe these have been broken since their introduction into clang.
I've enhanced the tests for these intrinsics to using a real rounding mode and checking all the intrinsic arguments instead of just the name.
llvm-svn: 336498
|
|
|
|
|
|
|
|
| |
that cause extra bitcasts to be emitted in the IR.
Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.
llvm-svn: 336487
|