summaryrefslogtreecommitdiffstats
path: root/clang/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* Merging r347262:Tom Stellard2018-11-281-11/+42
| | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r347262 | vedantk | 2018-11-19 12:10:22 -0800 (Mon, 19 Nov 2018) | 8 lines [Coverage] Fix PR39258: support coverage regions that start deeper than they end popRegions used to assume that the start location of a region can't be nested deeper than the end location, which is not always true. Patch by Orivej Desh! Differential Revision: https://reviews.llvm.org/D53244 ------------------------------------------------------------------------ llvm-svn: 347798
* Merging r344824:Tom Stellard2018-10-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r344824 | ctopper | 2018-10-19 18:30:00 -0700 (Fri, 19 Oct 2018) | 14 lines [X86] When checking the bits in cpu_features for function multiversioning dispatcher in the resolver, make sure all the required bits are set. Not just one of them Summary: The multiversioning code repurposed the code from __builtin_cpu_supports for checking if a single feature is enabled. That code essentially performed (_cpu_features & (1 << C)) != 0. But with the multiversioning path, the mask is no longer guaranteed to be a power of 2. So we return true anytime any one of the bits in the mask is set not just all of the bits. The correct check is (_cpu_features & mask) == mask Reviewers: erichkeane, echristo Reviewed By: echristo Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D53460 ------------------------------------------------------------------------ llvm-svn: 344923
* Merging r340181:Hans Wennborg2018-08-211-1/+2
| | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r340181 | abataev | 2018-08-20 18:00:22 +0200 (Mon, 20 Aug 2018) | 7 lines [OPENMP][BLOCKS]Fix PR38923: reference to a global variable is captured by a block. Added checks for capturing of the variable in the block when trying to emit correct address for the variable with the reference type. This extra check allows correctly identify the variables that are not captured in the block context. ------------------------------------------------------------------------ llvm-svn: 340352
* Merging r340191:Hans Wennborg2018-08-211-9/+11
| | | | | | | | | | | | | | | ------------------------------------------------------------------------ r340191 | abataev | 2018-08-20 20:03:40 +0200 (Mon, 20 Aug 2018) | 6 lines [OPENMP] Fix crash on the emission of the weak function declaration. If the function is actually a weak reference, it should not be marked as deferred definition as this is only a declaration. Patch adds checks for the definitions if they must be emitted. Otherwise, only declaration is emitted. ------------------------------------------------------------------------ llvm-svn: 340351
* Merging r339372, r339373, r339374, and r339379Hans Wennborg2018-08-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r339372 | steveire | 2018-08-09 22:05:03 +0200 (Thu, 09 Aug 2018) | 5 lines Add getBeginLoc API to replace getLocStart Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D50346 ------------------------------------------------------------------------ ------------------------------------------------------------------------ r339373 | steveire | 2018-08-09 22:05:18 +0200 (Thu, 09 Aug 2018) | 7 lines Add getBeginLoc API to replace getStartLoc Reviewers: teemperor! Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D50347 ------------------------------------------------------------------------ ------------------------------------------------------------------------ r339374 | steveire | 2018-08-09 22:05:47 +0200 (Thu, 09 Aug 2018) | 5 lines Add getEndLoc API to replace getLocEnd Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D50348 ------------------------------------------------------------------------ ------------------------------------------------------------------------ r339379 | steveire | 2018-08-09 22:21:09 +0200 (Thu, 09 Aug 2018) | 1 line Fix build ------------------------------------------------------------------------ llvm-svn: 340332
* Merging r340048:Hans Wennborg2018-08-211-0/+21
| | | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r340048 | nico | 2018-08-17 19:19:06 +0200 (Fri, 17 Aug 2018) | 10 lines Make __shiftleft128 / __shiftright128 real compiler built-ins. r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h. Microsoft's STL plans on using these functions, and they're using intrin0.h which just has declarations of built-ins to not pull in the huge intrin.h header in the standard library headers. That requires that these functions are real built-ins. https://reviews.llvm.org/D50907 ------------------------------------------------------------------------ llvm-svn: 340289
* Backport r339704 to 7.0 for PR38598Hans Wennborg2018-08-172-39/+11
| | | | | | | | | | | | | | | > Author: abataev > Date: Tue Aug 14 11:31:20 2018 > New Revision: 339704 > > URL: http://llvm.org/viewvc/llvm-project?rev=339704&view=rev > Log: > [OPENMP] Fix processing of declare target construct. > > The attribute marked as inheritable since OpenMP 5.0 supports it + > additional fixes to support new functionality. llvm-svn: 339998
* Merging r339603:Hans Wennborg2018-08-163-37/+96
| | | | | | | | | | | | | ------------------------------------------------------------------------ r339603 | abataev | 2018-08-13 21:04:24 +0200 (Mon, 13 Aug 2018) | 4 lines [OPENMP] Fix emission of the loop doacross constructs. The number of loops associated with the OpenMP loop constructs should not be considered as the number loops to collapse. ------------------------------------------------------------------------ llvm-svn: 339851
* Merging r339264:Hans Wennborg2018-08-131-8/+8
| | | | | | | | | | | | | | ------------------------------------------------------------------------ r339264 | rksimon | 2018-08-08 17:53:14 +0200 (Wed, 08 Aug 2018) | 5 lines [CGObjCGNU] Rename GetSelector helper method to fix -Woverloaded-virtual warning (PR38210) As suggested by @theraven on PR38210, this patch fixes the gcc -Woverloaded-virtual warnings by renaming the extra CGObjCGNU::GetSelector method to CGObjCGNU::GetTypedSelector Differential Revision: https://reviews.llvm.org/D50448 ------------------------------------------------------------------------ llvm-svn: 339555
* Merging r339428 and r339494:Hans Wennborg2018-08-135-85/+232
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r339428 | theraven | 2018-08-10 14:53:13 +0200 (Fri, 10 Aug 2018) | 18 lines Add Windows support for the GNUstep Objective-C ABI V2. Summary: Introduces funclet-based unwinding for Objective-C and fixes an issue where global blocks can't have their isa pointers initialised on Windows. After discussion with Dustin, this changes the name mangling of Objective-C types to prevent a C++ catch statement of type struct X* from catching an Objective-C object of type X*. Reviewers: rjmccall, DHowett-MSFT Reviewed By: rjmccall, DHowett-MSFT Subscribers: mgrang, mstorsjo, smeenai, cfe-commits Differential Revision: https://reviews.llvm.org/D50144 ------------------------------------------------------------------------ ------------------------------------------------------------------------ r339494 | dyung | 2018-08-11 04:46:47 +0200 (Sat, 11 Aug 2018) | 2 lines Make the section boundary checks on Windows not depend on the order as they are emitted in reverse when the compiler is built by Visual C++. ------------------------------------------------------------------------ llvm-svn: 339538
* Merging r339281:Hans Wennborg2018-08-091-0/+1
| | | | | | | | | | | | | | ------------------------------------------------------------------------ r339281 | ctopper | 2018-08-08 21:14:23 +0200 (Wed, 08 Aug 2018) | 5 lines [CodeGen][Timers] Enable llvm::TimePassesIsEnabled when -ftime-report is specified r330571 added a new FrontendTimesIsEnabled variable and replaced many usages of llvm::TimePassesIsEnabled. Including the place that set llvm::TimePassesIsEnabled for -ftime-report. The effect of this is that -ftime-report now only contains the timers specifically referenced in CodeGenAction.cpp and none of the timers in the backend. This commit adds back the assignment, but otherwise leaves everything else unchanged. ------------------------------------------------------------------------ llvm-svn: 339341
* Merging r339317:Hans Wennborg2018-08-091-2/+26
| | | | | | | | | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r339317 | theraven | 2018-08-09 10:02:42 +0200 (Thu, 09 Aug 2018) | 15 lines Correctly initialise global blocks on Windows. Summary: Windows does not allow globals to be initialised to point to globals in another DLL. Exported globals may be referenced only from code. Work around this by creating an initialiser that runs in early library initialisation and sets the isa pointer. Reviewers: rjmccall Reviewed By: rjmccall Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D50436 ------------------------------------------------------------------------ llvm-svn: 339339
* Merging r339128:Hans Wennborg2018-08-081-34/+4
| | | | | | | | | | | | | | | | | ------------------------------------------------------------------------ r339128 | theraven | 2018-08-07 14:02:46 +0200 (Tue, 07 Aug 2018) | 8 lines [objc-gnustep] Don't emit .guess ivar offset vars. These were intended to allow non-fragile and fragile ABI code to be mixed, as long as the fragile classes were higher up the hierarchy than the non-fragile ones. Unfortunately: - No one actually wants to do this. - Recent versions of Linux's run-time linker break it. ------------------------------------------------------------------------ llvm-svn: 339233
* [OPENMP] Change linkage of offloading symbols to support droppingAlexey Bataev2018-07-311-2/+4
| | | | | | | | offload targets. Changed the linkage of omp_offloading.img_start.<triple> and omp_offloading.img_end.<triple> symbols from external to external weak to allow dropping of some targets during linking. llvm-svn: 338413
* [OPENMP] Prevent problems with linking of the static variables.Alexey Bataev2018-07-311-0/+13
| | | | | | No need to change the linkage, we can avoid the problem using special variable. That points to the original variable and, thus, prevent some of the optimizations that might break the compilation. llvm-svn: 338399
* Revert "Add a definition for FieldSize that seems to make sense here."Eric Christopher2018-07-301-1/+0
| | | | | | This reverts commit r338327, the problem was previously fixed in r338321. llvm-svn: 338328
* Add a definition for FieldSize that seems to make sense here.Eric Christopher2018-07-301-0/+1
| | | | | | This could be sunk out of the if statements, but fix the warning for now. llvm-svn: 338327
* Fix use of uninitialized variable in r338299Scott Linder2018-07-301-1/+1
| | | | llvm-svn: 338321
* [DebugInfo][OpenCL] Generate correct block literal debug info for OpenCLScott Linder2018-07-301-34/+48
| | | | | | | | | OpenCL block literal structs have different fields which are now correctly identified in the debug info. Differential Revision: https://reviews.llvm.org/D49930 llvm-svn: 338299
* Remove trailing spaceFangrui Song2018-07-3038-486/+486
| | | | | | sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338291
* [clang][ubsan] Implicit Conversion Sanitizer - integer truncation - clang partRoman Lebedev2018-07-302-16/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: C and C++ are interesting languages. They are statically typed, but weakly. The implicit conversions are allowed. This is nice, allows to write code while balancing between getting drowned in everything being convertible, and nothing being convertible. As usual, this comes with a price: ``` unsigned char store = 0; bool consume(unsigned int val); void test(unsigned long val) { if (consume(val)) { // the 'val' is `unsigned long`, but `consume()` takes `unsigned int`. // If their bit widths are different on this platform, the implicit // truncation happens. And if that `unsigned long` had a value bigger // than UINT_MAX, then you may or may not have a bug. // Similarly, integer addition happens on `int`s, so `store` will // be promoted to an `int`, the sum calculated (0+768=768), // and the result demoted to `unsigned char`, and stored to `store`. // In this case, the `store` will still be 0. Again, not always intended. store = store + 768; // before addition, 'store' was promoted to int. } // But yes, sometimes this is intentional. // You can either make the conversion explicit (void)consume((unsigned int)val); // or mask the value so no bits will be *implicitly* lost. (void)consume((~((unsigned int)0)) & val); } ``` Yes, there is a `-Wconversion`` diagnostic group, but first, it is kinda noisy, since it warns on everything (unlike sanitizers, warning on an actual issues), and second, there are cases where it does **not** warn. So a Sanitizer is needed. I don't have any motivational numbers, but i know i had this kind of problem 10-20 times, and it was never easy to track down. The logic to detect whether an truncation has happened is pretty simple if you think about it - https://godbolt.org/g/NEzXbb - basically, just extend (using the new, not original!, signedness) the 'truncated' value back to it's original width, and equality-compare it with the original value. The most non-trivial thing here is the logic to detect whether this `ImplicitCastExpr` AST node is **actually** an implicit conversion, //or// part of an explicit cast. Because the explicit casts are modeled as an outer `ExplicitCastExpr` with some `ImplicitCastExpr`'s as **direct** children. https://godbolt.org/g/eE1GkJ Nowadays, we can just use the new `part_of_explicit_cast` flag, which is set on all the implicitly-added `ImplicitCastExpr`'s of an `ExplicitCastExpr`. So if that flag is **not** set, then it is an actual implicit conversion. As you may have noted, this isn't just named `-fsanitize=implicit-integer-truncation`. There are potentially some more implicit conversions to be warned about. Namely, implicit conversions that result in sign change; implicit conversion between different floating point types, or between fp and an integer, when again, that conversion is lossy. One thing i know isn't handled is bitfields. This is a clang part. The compiler-rt part is D48959. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=21530 | PR21530 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=37552 | PR37552 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=35409 | PR35409 ]]. Partially fixes [[ https://bugs.llvm.org/show_bug.cgi?id=9821 | PR9821 ]]. Fixes https://github.com/google/sanitizers/issues/940. (other than sign-changing implicit conversions) Reviewers: rjmccall, rsmith, samsonov, pcc, vsk, eugenis, efriedma, kcc, erichkeane Reviewed By: rsmith, vsk, erichkeane Subscribers: erichkeane, klimek, #sanitizers, aaron.ballman, RKSimon, dtzWill, filcab, danielaustin, ygribov, dvyukov, milianw, mclow.lists, cfe-commits, regehr Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D48958 llvm-svn: 338288
* [ARM, AArch64]: Use unadjusted alignment when passing composites as argumentsMomchil Velikov2018-07-301-5/+14
| | | | | | | | | | | | | | | | | | The "Procedure Call Procedure Call Standard for the ARM® Architecture" (https://static.docs.arm.com/ihi0042/f/IHI0042F_aapcs.pdf), specifies that composite types are passed according to their "natural alignment", i.e. the alignment before alignment adjustment on the entire composite is applied. The same applies for AArch64 ABI. Clang, however, used the adjusted alignment. GCC already implements the ABI correctly. With this patch Clang becomes compatible with GCC and passes such arguments in accordance with AAPCS. Differential Revision: https://reviews.llvm.org/D46013 llvm-svn: 338279
* [mips64][clang] Provide the signext attribute for i32 return valuesStefan Maksimovic2018-07-301-2/+8
| | | | | | | | Additional info: see r338019. Differential Revision: https://reviews.llvm.org/D49289 llvm-svn: 338239
* Revert r337456: [CodeGen] Disable aggressive structor optimizations at -O0, ↵Chandler Carruth2018-07-291-14/+4
| | | | | | | | | | | | | | | | | | | | | | | | take 3 This commit increases the number of sections and overall output size of .o files by 10% and sometimes a bit more. This alone is challenging for some users, but it also appears to trigger an as-yet unexplained behavior in the Gold linker where the memory usage increases considerably more than 10% (we think). The increase is also frustrating because in many (if not all) cases we end up with almost all of the growth coming from the ELF overhead of -ffunction-sections and such, not from actual extra code being emitted. Richard Smith and Eric Christopher are both going to investigate this and try to get to the bottom of what is triggering this and whether the kinds of increases here are sustainable or what options we might have to minimize the impact they have. However, this is currently breaking a pretty large number of our users' builds so reverting it while we sort out how to make progress here. I've seen a longer and more detailed update to the commit thread. llvm-svn: 338209
* [UBSan] Strengthen pointer checks in 'new' expressionsSerge Pavlov2018-07-284-26/+69
| | | | | | | | | | | | | | | | | | | | With this change compiler generates alignment checks for wider range of types. Previously such checks were generated only for the record types with non-trivial default constructor. So the types like: struct alignas(32) S2 { int x; }; typedef __attribute__((ext_vector_type(2), aligned(32))) float float32x2_t; did not get checks when allocated by 'new' expression. This change also optimizes the checks generated for the arrays created in 'new' expressions. Previously the check was generated for each invocation of type constructor. Now the check is generated only once for entire array. Differential Revision: https://reviews.llvm.org/D49589 llvm-svn: 338199
* [CUDA][HIP] Allow function-scope static const variableYaxun Liu2018-07-281-0/+4
| | | | | | | | | | | | | | | | | | | | | | CUDA 8.0 E.3.9.4 says: Within the body of a __device__ or __global__ function, only __shared__ variables or variables without any device memory qualifiers may be declared with static storage class. It is unclear how a function-scope non-const static variable without device memory qualifier is implemented, therefore only static const variable without device memory qualifier is allowed, which can be emitted as a global variable in constant address space. Currently clang only allows function-scope static variable with __shared__ qualifier. This patch also allows function-scope static const variable without device memory qualifier and emits it as a global variable in constant address space. Differential Revision: https://reviews.llvm.org/D49931 llvm-svn: 338188
* [AST] Add a convenient getter from QualType to RecordDeclGeorge Karpenkov2018-07-282-5/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D49951 llvm-svn: 338187
* [ARM64] [Windows] Follow MS X86_64 C++ ABI when passing structsSanjin Sijaric2018-07-261-0/+1
| | | | | | | | | | | | | | | | | Summary: Microsoft's C++ object model for ARM64 is the same as that for X86_64. For example, small structs with non-trivial copy constructors or virtual function tables are passed indirectly. Currently, they are passed in registers when compiled with clang. Reviewers: rnk, mstorsjo, TomTan, haripul, javed.absar Reviewed By: rnk, mstorsjo Subscribers: kristof.beyls, chrib, llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D49770 llvm-svn: 338076
* [COFF, ARM64] Decide when to mark struct returns as SRetMandeep Singh Grang2018-07-262-1/+14
| | | | | | | | | | | | | | | | Summary: Refer the MS ARM64 ABI Convention for the behavior for struct returns: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions#return-values Reviewers: mstorsjo, compnerd, rnk, javed.absar, yinma, efriedma Reviewed By: rnk, efriedma Subscribers: haripul, TomTan, yinma, efriedma, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49464 llvm-svn: 338050
* [RISCV] Add support for interrupt attributeAna Pazos2018-07-261-0/+21
| | | | | | | | | | | | | | | | | | | Summary: Clang supports the GNU style ``__attribute__((interrupt))`` attribute on RISCV targets. Permissible values for this parameter are user, supervisor, and machine. If there is no parameter, then it defaults to machine. Reference: https://gcc.gnu.org/onlinedocs/gcc/RISC-V-Function-Attributes.html Based on initial patch by Zhaoshi Zheng. Reviewers: asb, aaron.ballman Reviewed By: asb, aaron.ballman Subscribers: rkruppe, the_o, aaron.ballman, MartinMosbeck, brucehoult, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, cfe-commits Differential Revision: https://reviews.llvm.org/D48412 llvm-svn: 338045
* [CodeGen][ObjC] Make block copy/dispose helper functions exception-safe.Akira Hatanaka2018-07-263-61/+103
| | | | | | | | | | | | | When an exception is thrown in a block copy helper function, captured objects that have previously been copied should be destructed or released. Similarly, captured objects that are yet to be released should be released when an exception is thrown in a dispose helper function. rdar://problem/42410255 Differential Revision: https://reviews.llvm.org/D49718 llvm-svn: 338041
* [OPENMP] ThreadId in serialized parallel regions is 0.Alexey Bataev2018-07-252-9/+16
| | | | | | | | The first argument for the parallel outlined functions, called as serialized parallel regions, should be a pointer to the global thread id that always is 0. llvm-svn: 337957
* CodeGen: use non-zero memset when possible for automatic variablesJF Bastien2018-07-251-26/+141
| | | | | | | | | | | | | | | | | Summary: Right now automatic variables are either initialized with bzero followed by a few stores, or memcpy'd from a synthesized global. We end up encountering a fair amount of code where memcpy of non-zero byte patterns would be better than memcpy from a global because it touches less memory and generates a smaller binary. The optimizer could reason about this, but it's not really worth it when clang already knows. This code could definitely be more clever but I'm not sure it's worth it. In particular we could track a histogram of bytes seen and figure out (as we do with bzero) if a memset could be followed by a handful of stores. Similarly, we could tune the heuristics for GlobalSize, but using the same as for bzero seems conservatively OK for now. <rdar://problem/42563091> Reviewers: dexonsmith Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D49771 llvm-svn: 337887
* Revert "[DebugInfo] Generate debug information for labels. (Fix PR37395)"Shiva Chen2018-07-243-39/+0
| | | | | | This reverts commit 4288dd3bf082482e02c8a044c611c18168cb0180. llvm-svn: 337803
* [DebugInfo] Generate debug information for labels. (Fix PR37395)Shiva Chen2018-07-243-0/+39
| | | | | | | | | | | | | Generate DILabel metadata and call llvm.dbg.label after label statement to associate the metadata with the label. After fixing PR37395. Differential Revision: https://reviews.llvm.org/D45045 Patch by Hsiangkai Wang. llvm-svn: 337800
* Borrow visibility from __fundamental_type_info for generated fundamental ↵Thomas Anderson2018-07-241-60/+67
| | | | | | | | | | | | type infos This is necessary so the clang gives hidden visibility to fundamental types when -fvisibility=hidden is passed. Fixes https://bugs.llvm.org/show_bug.cgi?id=35066 Differential Revision: https://reviews.llvm.org/D49109 llvm-svn: 337788
* Support lifetime-extension of conditional temporaries.Richard Smith2018-07-233-109/+148
| | | | llvm-svn: 337767
* [CodeGen] Record if a C++ record is a trivial typeAaron Smith2018-07-231-0/+4
| | | | | | | | | | | | | | Summary: This has a dependence on D45122 Reviewers: rnk, zturner, llvm-commits, aleksandr.urakov Reviewed By: rnk Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D45124 llvm-svn: 337736
* [NEON] Fix support for vrndi_f32(), vrndiq_f32() and vrndns_f32() intrinsicsIvan A. Kosarev2018-07-231-6/+13
| | | | | | | | | | This patch adds support for vrndi_f32() and vrndiq_f32() intrinsics in AArch32 mode and for vrndns_f32() intrinsic in AArch64 mode. Differential Revision: https://reviews.llvm.org/D48829 llvm-svn: 337690
* [HIP] Support -fcuda-flush-denormals-to-zero for amdgcnYaxun Liu2018-07-212-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D48287 llvm-svn: 337639
* [NFC] CodeGen: rename memset to bzeroJF Bastien2018-07-201-30/+27
| | | | | | The optimization looks for opportunities to emit bzero, not memset. Rename the functions accordingly (and clang-format the diff) because I want to add a fallback optimization which actually tries to generate memset. bzero is still better and it would confuse the code to merge both. llvm-svn: 337636
* [HIP] Register/unregister device fat binary only onceYaxun Liu2018-07-201-17/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | HIP generates one fat binary for all devices after linking. However, for each compilation unit a ctor function is emitted which register the same fat binary. Measures need to be taken to make sure the fat binary is only registered once. Currently each ctor function calls __hipRegisterFatBinary and stores the returned value to __hip_gpubin_handle. This patch changes the linkage of __hip_gpubin_handle to be linkonce so that they are shared between LLVM modules. Then this patch adds check of value of __hip_gpubin_handle to make sure __hipRegisterFatBinary is only called once. The code is equivalent to void *_gpubin_handle; void ctor() { if (__hip_gpubin_handle == 0) { __hip_gpubin_handle = __hipRegisterFatBinary(...); } // register kernels and variables. } The patch also does similar change to dtors so that __hipUnregisterFatBinary is called once. Differential Revision: https://reviews.llvm.org/D49083 llvm-svn: 337631
* [codeview] Don't emit variable templates as class membersReid Kleckner2018-07-201-5/+10
| | | | | | | | | | | | | MSVC doesn't, so neither should we. Fixes PR38004, which is a crash that happens when we try to emit debug info for a still-dependent partial variable template specialization. As a follow-up, we should review what we're doing for function and class member templates. It looks like we don't filter those out, but I can't seem to get clang to emit any. llvm-svn: 337616
* [CodeGen][ObjC] Make copying and disposing of a non-escaping blockAkira Hatanaka2018-07-202-5/+20
| | | | | | | | | | | | | | | | | | | | | | no-ops. A non-escaping block on the stack will never be called after its lifetime ends, so it doesn't have to be copied to the heap. To prevent a non-escaping block from being copied to the heap, this patch sets field 'isa' of the block object to NSConcreteGlobalBlock and sets the BLOCK_IS_GLOBAL bit of field 'flags', which causes the runtime to treat the block as if it were a global block (calling _Block_copy on the block just returns the original block and calling _Block_release is a no-op). Also, a new flag bit 'BLOCK_IS_NOESCAPE' is added, which allows the runtime or tools to distinguish between true global blocks and non-escaping blocks. rdar://problem/39352313 Differential Revision: https://reviews.llvm.org/D49303 llvm-svn: 337580
* Implement cpu_dispatch/cpu_specific MultiversioningErich Keane2018-07-205-46/+232
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
* Change \t to spacesFangrui Song2018-07-202-5/+5
| | | | llvm-svn: 337530
* Fix typo causing assert in self-host.Richard Smith2018-07-191-1/+1
| | | | llvm-svn: 337508
* When we choose to use zeroinitializer for a trailing portion of an arrayRichard Smith2018-07-191-1/+13
| | | | | | | | | | | constant, don't convert the rest into a packed struct. If an array constant has a large non-zero portion and a large zero portion, we want to emit the first part as an array and the rest as a zeroinitializer if possible. This fixes a memory usage regression from r333141 when compiling PHP. llvm-svn: 337498
* fix typo in commentNico Weber2018-07-191-1/+1
| | | | llvm-svn: 337480
* Fix unused variable warning.Erich Keane2018-07-191-1/+1
| | | | llvm-svn: 337473
OpenPOWER on IntegriCloud