| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
The optimization looks for opportunities to emit bzero, not memset. Rename the functions accordingly (and clang-format the diff) because I want to add a fallback optimization which actually tries to generate memset. bzero is still better and it would confuse the code to merge both.
llvm-svn: 337636
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HIP generates one fat binary for all devices after linking. However, for each compilation
unit a ctor function is emitted which register the same fat binary. Measures need to be
taken to make sure the fat binary is only registered once.
Currently each ctor function calls __hipRegisterFatBinary and stores the returned value
to __hip_gpubin_handle. This patch changes the linkage of __hip_gpubin_handle to be linkonce
so that they are shared between LLVM modules. Then this patch adds check of value of
__hip_gpubin_handle to make sure __hipRegisterFatBinary is only called once. The code
is equivalent to
void *_gpubin_handle;
void ctor() {
if (__hip_gpubin_handle == 0) {
__hip_gpubin_handle = __hipRegisterFatBinary(...);
}
// register kernels and variables.
}
The patch also does similar change to dtors so that __hipUnregisterFatBinary
is called once.
Differential Revision: https://reviews.llvm.org/D49083
llvm-svn: 337631
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MSVC doesn't, so neither should we.
Fixes PR38004, which is a crash that happens when we try to emit debug
info for a still-dependent partial variable template specialization.
As a follow-up, we should review what we're doing for function and class
member templates. It looks like we don't filter those out, but I can't
seem to get clang to emit any.
llvm-svn: 337616
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
no-ops.
A non-escaping block on the stack will never be called after its
lifetime ends, so it doesn't have to be copied to the heap. To prevent
a non-escaping block from being copied to the heap, this patch sets
field 'isa' of the block object to NSConcreteGlobalBlock and sets the
BLOCK_IS_GLOBAL bit of field 'flags', which causes the runtime to treat
the block as if it were a global block (calling _Block_copy on the block
just returns the original block and calling _Block_release is a no-op).
Also, a new flag bit 'BLOCK_IS_NOESCAPE' is added, which allows the
runtime or tools to distinguish between true global blocks and
non-escaping blocks.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D49303
llvm-svn: 337580
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.
This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.
This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementions
each in their own translation units.
The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.
Differential Revision: https://reviews.llvm.org/D47474
llvm-svn: 337552
|
|
|
|
| |
llvm-svn: 337530
|
|
|
|
| |
llvm-svn: 337508
|
|
|
|
|
|
|
|
|
|
|
| |
constant, don't convert the rest into a packed struct.
If an array constant has a large non-zero portion and a large zero
portion, we want to emit the first part as an array and the rest as a
zeroinitializer if possible. This fixes a memory usage regression from
r333141 when compiling PHP.
llvm-svn: 337498
|
|
|
|
| |
llvm-svn: 337480
|
|
|
|
| |
llvm-svn: 337473
|
|
|
|
|
|
|
|
|
|
|
| |
libomptarget. The changes in the interface are the following:
device IDs are now 64-bit integers (as opposed to 32-bit)
map flags are 64-bit long (used to be 32-bit)
mappings for partially mapped structs are now calculated at compile time and members of partially mapped structs are flagged using the MEMBER_OF field
Support for is_device_ptr on struct members was dropped - this functionality is not supported by the OpenMP standard and its implementation is technically infeasible (however, use_device_ptr on struct members works as a non-standard extension of the compiler)
llvm-svn: 337468
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous version of this patch (r332839) was reverted because it was
causing "definition with same mangled name as another definition" errors
in some module builds. This was caused by an unrelated bug in module
importing which it exposed. The importing problem was fixed in r336240,
so this recommits the original patch (r332839).
Differential Revision: https://reviews.llvm.org/D46685
llvm-svn: 337456
|
|
|
|
|
|
|
|
| |
The commit for
https://reviews.llvm.org/D49424
missed the comment about the extraneous semicolons. Remove them.
llvm-svn: 337451
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The codegen for this builtin was initially implemented to match GCC.
However, due to interest from users GCC changed behaviour to account for the
big endian bias of the instruction and correct it. This patch brings the
handling inline with GCC.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38192
Differential Revision: https://reviews.llvm.org/D49424
llvm-svn: 337449
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in as the function attribute
"null-pointer-is-valid"="true".
This CL only adds the attribute on the function.
It also strips "nonnull" attributes from function arguments but
keeps the related warnings unchanged.
Corresponding LLVM change rL336613 already updated the
optimizations to not treat null pointer dereferencing
as undefined if the attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: jyknight
Subscribers: drinkcat, xbolva00, cfe-commits
Differential Revision: https://reviews.llvm.org/D47894
llvm-svn: 337433
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch uses CodeSegAttr to represent __declspec(code_seg) rather than
building on the existing support for #pragma code_seg.
The code_seg declspec is applied on functions and classes. This attribute
enables the placement of code into separate named segments, including compiler-
generated codes and template instantiations.
For more information, please see the following:
https://msdn.microsoft.com/en-us/library/dn636922.aspx
This patch fixes the regression for the support for attribute ((section).
https://github.com/llvm-mirror/clang/commit/746b78de7812bc785fbb5207b788348040b23fa7
Patch by Soumi Manna (Manna)
Differential Revision: https://reviews.llvm.org/D48841
llvm-svn: 337420
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
which was reverted in r337336.
The problem that required a revert was fixed in r337338.
Also added a missing "REQUIRES: x86-registered-target" to one of
the tests.
Original commit message:
> Teach Clang to emit address-significance tables.
>
> By default, we emit an address-significance table on all ELF
> targets when the integrated assembler is enabled. The emission of an
> address-significance table can be controlled with the -faddrsig and
> -fno-addrsig flags.
>
> Differential Revision: https://reviews.llvm.org/D48155
llvm-svn: 337339
|
|
|
|
|
|
|
|
|
| |
Causing multiple failures on sanitizer bots due to TLS symbol errors,
e.g.
/usr/bin/ld: __msan_origin_tls: TLS definition in /home/buildbots/ppc64be-clang-test/clang-ppc64be/stage1/lib/clang/7.0.0/lib/linux/libclang_rt.msan-powerpc64.a(msan.cc.o) section .tbss.__msan_origin_tls mismatches non-TLS reference in /tmp/lit_tmp_0a71tA/mallinfo-3ca75e.o
llvm-svn: 337336
|
|
|
|
|
|
|
|
|
|
|
| |
By default, we emit an address-significance table on all ELF
targets when the integrated assembler is enabled. The emission of an
address-significance table can be controlled with the -faddrsig and
-fno-addrsig flags.
Differential Revision: https://reviews.llvm.org/D48155
llvm-svn: 337333
|
|
|
|
|
|
|
| |
Various places in Clang and LLVM are already using alignas; it seems
our minimum host configuration now requires it.
llvm-svn: 337330
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Added the following intrinsics:
_BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64
_InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64,
_InterlockedExchangeAdd64, _InterlockedExchangeSub64,
_InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64.
Reviewers: compnerd, mstorsjo, rnk, javed.absar
Reviewed By: mstorsjo
Subscribers: kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D49445
llvm-svn: 337327
|
|
|
|
|
|
|
|
| |
If the declare target link entries are created but not used, the
compiler will produce an error message. Patch improves handling of such
situations + improves checks for possibly lost declare target variables.
llvm-svn: 337207
|
|
|
|
|
|
| |
Fixed spelling of the offloading error messages.
llvm-svn: 337196
|
|
|
|
|
|
|
| |
Sometimes we can try to globalize non-variable declarations, which may
lead to compiler crash.
llvm-svn: 337191
|
|
|
|
|
|
|
| |
This reverts commit r337082, restoring r337051, since the LLVM side
patch has been restored.
llvm-svn: 337185
|
|
|
|
|
|
| |
This reverts commit r337051.
llvm-svn: 337082
|
|
|
|
|
|
|
| |
Clang change to reflect the FunctionsToImportTy type change
in the llvm changes for D48670.
llvm-svn: 337051
|
|
|
|
|
|
|
|
|
|
| |
Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds.
Subscribers: dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D49209
llvm-svn: 337041
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: In the SPMD case, we need to initialize the data sharing and globalization infrastructure. This covers the case when an SPMD region calls a function in a different compilation unit.
Reviewers: ABataev, carlo.bertolli, caomhin
Reviewed By: ABataev
Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D49188
llvm-svn: 337015
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Code in `CodeGenModule::SetFunctionAttributes()` could set an empty
attribute `implicit-section-name` on a function that is affected by
`#pragma clang text="section"`. This is incorrect because the attribute
should contain a valid section name. If the function additionally also
used `__attribute__((section("section")))` then this could result in
emitting the function in a section with an empty name.
The patch fixes the issue by removing the problematic code that sets
empty `implicit-section-name` from
`CodeGenModule::SetFunctionAttributes()` because it is sufficient to set
this attribute only from a similar code in `setNonAliasAttributes()`
when the function is emitted.
Differential Revision: https://reviews.llvm.org/D48916
llvm-svn: 336842
|
|
|
|
| |
llvm-svn: 336840
|
|
|
|
|
|
|
|
|
| |
The member init list for the sole constructor for CodeGenFunction
has gotten out of hand, so this patch moves the non-parameter-dependent
initializations into the member value inits.
Note: This is what was intended to be committed in r336726
llvm-svn: 336729
|
|
|
|
| |
llvm-svn: 336727
|
|
|
|
|
|
|
|
| |
The member init list for the sole constructor for CodeGenFunction
has gotten out of hand, so this patch moves the non-parameter-dependent
initializations into the member value inits.
llvm-svn: 336726
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Make sure that loop metadata only is put on the backedge
when expanding a do-while loop.
Previously we added the loop metadata also on the branch
in the pre-header. That could confuse optimization passes
and result in the loop metadata being associated with the
wrong loop.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38011
Committing on behalf of deepak2427 (Deepak Panickal)
Reviewers: #clang, ABataev, hfinkel, aaron.ballman, bjope
Reviewed By: bjope
Subscribers: bjope, rsmith, shenhan, zzheng, xbolva00, lebedev.ri, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D48721
llvm-svn: 336717
|
|
|
|
|
|
| |
__builtin_ia32_divsd_round_mask.
llvm-svn: 336628
|
|
|
|
|
|
|
|
|
|
|
|
| |
is suitable for use in scalar mask intrinsics.
This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.
This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.
llvm-svn: 336622
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
width. Add a min_vector_width function attribute and tag all x86 instrinsics with it
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.
There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
|
|
|
|
|
|
|
|
|
|
| |
In generic data-sharing mode we are allowed to not globalize local
variables that escape their declaration context iff they are declared
inside of the parallel region. We can do this because L2 parallel
regions are executed sequentially and, thus, we do not need to put
shared local variables in the global memory.
llvm-svn: 336567
|
|
|
|
|
|
| |
This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma
llvm-svn: 336507
|
|
|
|
|
|
|
|
|
|
| |
sure we optimize the all ones mask case.
This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases.
Factor the code into a helper function to make this easy.
llvm-svn: 336472
|
|
|
|
|
|
|
|
| |
native IR using llvm.fma intrinsic.
This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns.
llvm-svn: 336417
|
|
|
|
|
|
|
|
|
|
| |
fmaddsub/fmsubadd IR emission.
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.
While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
llvm-svn: 336388
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes on optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D48715
llvm-svn: 336355
|
|
|
|
|
|
|
|
|
| |
Update clang to treat fp128 as a valid base type for homogeneous aggregate
passing and returning.
Differential Revision: https://reviews.llvm.org/D48044
llvm-svn: 336308
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Emmiting new intrinsic that strips invariant.groups to make
devirtulization sound, as described in RFC: Devirtualization v2.
Reviewers: rjmccall, rsmith, amharc, kuhar
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D47299
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336137
|
|
|
|
|
|
| |
builtins instead.
llvm-svn: 335945
|
|
|
|
|
|
|
|
| |
That's where CUDA binaries appear to put them.
Differential Revision: https://reviews.llvm.org/D48615
llvm-svn: 335880
|
|
|
|
|
|
| |
Follow-up commit for r335757 to address some inconsistencies.
llvm-svn: 335834
|
|
|
|
|
|
|
|
|
|
| |
This matches the way NVCC does it. Doing module cleanup at global
destructor phase used to work, but is, apparently, too late for
the CUDA runtime in CUDA-9.2, which ends up crashing with double-free.
Differential Revision: https://reviews.llvm.org/D48613
llvm-svn: 335763
|