path: root/clang/test/CodeGen
Commit message | Author | Date | Files | Lines
...
* [Power9] [CLANG] Add __float128 support for trunc to double round to odd | Stefan Pintilie | 2018-07-09 | 1 | -0/+5
    Add support for this builtin:
        double __builtin_truncf128_round_to_odd(__float128)
    Differential Revision: https://reviews.llvm.org/D48482
    llvm-svn: 336596
* [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 intrinsics with it | Craig Topper | 2018-07-09 | 2 | -0/+39
    This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs.

    We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.

    Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.

    To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.

    To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins (can be target specific or target independent) or vector extension code. Or as a macro wrapper around a target specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.

    To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.

    To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.

    There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.

    Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.

    Differential Revision: https://reviews.llvm.org/D48617
    llvm-svn: 336583
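    A minimal sketch of the always_inline wrapper case, assuming AVX-512 is available; the type and function names below are illustrative, not taken from immintrin.h:

        typedef double __v8df_alt __attribute__((__vector_size__(64)));

        /* always_inline wrapper built from vector-extension code; the
           min_vector_width(512) tag records that this function genuinely needs
           512-bit vectors to stay legal, so the backend will not split them
           even when targeting a CPU where 512-bit ops are otherwise avoided. */
        static __inline__ __v8df_alt
            __attribute__((__always_inline__, __target__("avx512f"),
                           __min_vector_width__(512)))
        add_pd_512(__v8df_alt __a, __v8df_alt __b) {
          return __a + __b;
        }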
* [Power9] Add __float128 builtins for Round To Odd | Stefan Pintilie | 2018-07-09 | 1 | -0/+45
    Add a number of builtins for __float128 Round To Odd. This is the Clang portion of the builtins work.
    Differential Revision: https://reviews.llvm.org/D47548
    llvm-svn: 336579
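    A short usage sketch, assuming a Power9 target with -mfloat128; the builtin name is assumed from this patch series and the wrapper name is illustrative:

        /* Round-to-odd keeps the sticky bit in the low-order result bit, which
           avoids double rounding when the __float128 result is later rounded
           again to a narrower type such as double. */
        __float128 fma_round_to_odd(__float128 a, __float128 b, __float128 c) {
          return __builtin_fmaf128_round_to_odd(a, b, c);
        }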
* [X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types. | Craig Topper | 2018-07-08 | 1 | -32/+268
    This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma.
    llvm-svn: 336507
* [X86] Fix a few intrinsics that were ignoring their rounding mode argument and hardcoded _MM_FROUND_CUR_DIRECTION internally. | Craig Topper | 2018-07-07 | 1 | -44/+44
    I believe these have been broken since their introduction into clang. I've enhanced the tests for these intrinsics to use a real rounding mode and to check all the intrinsic arguments instead of just the name.
    llvm-svn: 336498
* [X86] Fix various type mismatches in intrinsic headers and intrinsic tests that cause extra bitcasts to be emitted in the IR. | Craig Topper | 2018-07-07 | 4 | -32/+32
    Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.
    llvm-svn: 336487
* [X86] When creating a select for scalar masked sqrt and div builtins make sure we optimize the all ones mask case. | Craig Topper | 2018-07-06 | 1 | -94/+90
    This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases. Factor the code into a helper function to make this easy.
    llvm-svn: 336472
* [X86] Add missing scalar fma intrinsics with rounding, but no mask. | Craig Topper | 2018-07-06 | 1 | -24/+72
    We had the mask versions of the rounding intrinsics, but not one without masking. Also change the rounding tests to not use the CUR_DIRECTION rounding mode.
    llvm-svn: 336470
* [X86] Implement __builtin_ia32_vfmaddss and __builtin_ia32_vfmaddsd with native IR using llvm.fma intrinsic. | Craig Topper | 2018-07-06 | 1 | -8/+40
    This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns.
    llvm-svn: 336417
* [ms] Fix mangling of string literals used to initialize arrays larger or smaller than the literal | Hans Wennborg | 2018-07-06 | 1 | -0/+10
    A Chromium developer reported a bug which turned out to be a mangling collision between these two literals:

        char s[] = "foo";
        char t[32] = "foo";

    They may look the same, but for the initialization of t we will (under some circumstances) use a literal that's extended with zeros, and both the length and those zeros should be accounted for by the mangling.

    This actually makes the mangling code simpler: where it previously had special logic for null terminators, which are not part of the StringLiteral, that is now covered by the general algorithm.

    (The problem was reported at https://crbug.com/857442)
    Differential Revision: https://reviews.llvm.org/D48928
    llvm-svn: 336415
* [X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission. | Craig Topper | 2018-07-05 | 4 | -208/+208
    Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles. While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
    llvm-svn: 336388
* [X86] Fix some vector cmp builtins - TRUE/FALSE predicates | Gabor Buella | 2018-07-05 | 3 | -116/+88
    This patch removes an optimization used with the TRUE/FALSE predicates, as was suggested in https://reviews.llvm.org/D45616 for r335339. The optimization was buggy, since r335339 used it also for *_mask builtins, without actually applying the mask -- the mask argument was just ignored.

    Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
    Reviewed By: spatel
    Differential Revision: https://reviews.llvm.org/D48715
    llvm-svn: 336355
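    A sketch of the case the fix covers, assuming -mavx512f -mavx512vl; the wrapper name is illustrative:

        #include <immintrin.h>

        /* With the TRUE predicate every lane compares true, but the result must
           still be ANDed with the incoming mask; the removed optimization simply
           returned all ones and dropped __k. */
        __mmask8 cmp_true_masked(__mmask8 __k, __m128 __a, __m128 __b) {
          return _mm_mask_cmp_ps_mask(__k, __a, __b, _CMP_TRUE_UQ);
          /* expected result: the low 4 bits of __k, not 0xF */
        }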
* [X86] NFC - add more test cases for vector cmp intrinsics | Gabor Buella | 2018-07-05 | 3 | -222/+3152
    Add test cases with each predicate using the following intrinsics:
        _mm_cmp_pd
        _mm_cmp_ps
        _mm256_cmp_pd
        _mm256_cmp_ps
        _mm_cmp_pd_mask
        _mm_cmp_ps_mask
        _mm256_cmp_pd_mask
        _mm256_cmp_ps_mask
        _mm512_cmp_pd_mask
        _mm512_cmp_ps_mask
        _mm_mask_cmp_pd_mask
        _mm_mask_cmp_ps_mask
        _mm256_mask_cmp_pd_mask
        _mm256_mask_cmp_ps_mask
        _mm512_mask_cmp_pd_mask
        _mm512_mask_cmp_ps_mask
    Some of these are marked with FIXME, as there is a bug in lowering e.g. _mm512_mask_cmp_ps_mask.
    llvm-svn: 336346
* [Power9] Update fp128 as a valid homogenous aggregate base type | Lei Huang | 2018-07-05 | 1 | -0/+124
    Update clang to treat fp128 as a valid base type for homogeneous aggregate passing and returning.
    Differential Revision: https://reviews.llvm.org/D48044
    llvm-svn: 336308
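    A sketch of the kind of aggregate this affects, assuming a PPC64 ELFv2 target with -mfloat128; the struct and function names are illustrative:

        /* All members share the fp128 base type, so the struct is a homogeneous
           aggregate and can be passed and returned directly in vector registers
           instead of indirectly through memory. */
        struct Quad2 {
          __float128 re;
          __float128 im;
        };

        struct Quad2 conj_q(struct Quad2 z) {
          z.im = -z.im;
          return z;
        }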
* NFC - Fix type in builtins-ppc-p9vector.c test | Gabor Buella | 2018-07-04 | 1 | -1/+1
    llvm-svn: 336264
* NFC - typo fix in test/CodeGen/avx512f-builtins.c | Gabor Buella | 2018-07-04 | 1 | -2/+2
    llvm-svn: 336243
* Add expected fail triple x86_64-pc-windows-gnu to test as x86_64-w64-mingw32 is already there | Yaron Keren | 2018-06-30 | 1 | -1/+1
    llvm-svn: 336047
* [X86] Correct the width of mask arguments in intrinsic headers and tests. | Craig Topper | 2018-06-30 | 3 | -91/+89
    All of these were found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.

    Some of these were real bugs where we lost bits from the user input:
        _mm512_mask_broadcast_f32x8
        _mm512_maskz_broadcast_f32x8
        _mm512_mask_broadcast_i32x8
        _mm512_maskz_broadcast_i32x8
        _mm256_mask_cvtusepi16_storeu_epi8
    llvm-svn: 336042
* [X86] Remove masking from the avx512 rotate builtins. Use a select builtin instead. | Craig Topper | 2018-06-30 | 2 | -72/+120
    llvm-svn: 336036
* [X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead. | Craig Topper | 2018-06-29 | 1 | -12/+12
    llvm-svn: 335945
* [NEON] Remove empty test file from r335734 | Francis Visoiu Mistrih | 2018-06-27 | 1 | -0/+0
    Fails on Green Dragon:
    http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/50174/consoleFull

        UNRESOLVED: Clang :: CodeGen/vld_dup.c (5546 of 38947)
        ******************** TEST 'Clang :: CodeGen/vld_dup.c' FAILED ********************
        Test has no run line!

    llvm-svn: 335750
* [X86] Rename llvm.x86.avx512.mask.fpclass.p* to exclude 'mask.' from the name to match llvm. | Craig Topper | 2018-06-27 | 2 | -12/+12
    llvm-svn: 335745
* [NEON] Support vldNq intrinsics in AArch32 (Clang part) | Ivan A. Kosarev | 2018-06-27 | 4 | -1539/+1087
    This patch reworks the support for dup NEON intrinsics as described in D48439.
    Differential Revision: https://reviews.llvm.org/D48440
    llvm-svn: 335734
* [ThinLTO] Add testing of summary index parsing to a couple CFI tests | Teresa Johnson | 2018-06-26 | 2 | -0/+4
    Summary: Changes to some clang side tests to go with the summary parsing patch.
    Depends on D47905.

    Reviewers: pcc, dexonsmith, mehdi_amini
    Subscribers: inglorion, eraman, cfe-commits, steven_wu
    Differential Revision: https://reviews.llvm.org/D47906
    llvm-svn: 335618
* [WebAssembly] Add no-prototype attribute to prototype-less C functions | Sam Clegg | 2018-06-25 | 1 | -0/+20
    The WebAssembly backend in particular benefits from being able to distinguish between varargs functions (...) and prototype-less C functions.
    Differential Revision: https://reviews.llvm.org/D48443
    llvm-svn: 335510
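    A sketch of the distinction being recorded, in plain C; the function names are illustrative:

        /* K&R-style declaration: no prototype, so the callee's exact signature
           is unknown at the call site. This is the kind of function that gets
           the "no-prototype" attribute. */
        int legacy_sum();

        /* A genuine varargs function: the ellipsis is part of the prototype. */
        int sum_n(int n, ...);

        int caller(void) {
          return legacy_sum(1, 2) + sum_n(2, 3, 4);
        }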
* [clang-cl] Don't emit dllexport inline functions etc. from pch files (PR37801) | Hans Wennborg | 2018-06-25 | 1 | -0/+84
    With MSVC, PCH files are created along with an object file that needs to be linked into the final library or executable. That object file contains the code generated when building the headers. In particular, it will include definitions of inline dllexport functions, and because they are emitted in this object file, other files using the PCH do not need to emit them. See the bug for an example.

    This patch makes clang-cl match MSVC's behaviour in this regard, causing significant compile-time savings when building dlls using precompiled headers.

    For example, in a 64-bit optimized shared library build of Chromium with PCH, it reduces the binary size and compile time of stroke_opacity_custom.obj from 9315564 bytes to 3659629 bytes and 14.6 to 6.63 s. The wall-clock time of building blink_core.dll goes from 38m41s to 22m33s. ("user" time goes from 1979m to 1142m).

    Differential Revision: https://reviews.llvm.org/D48426
    llvm-svn: 335466
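    A sketch of the kind of function involved, assuming a header built into a PCH with clang-cl; the class name is illustrative:

        // Inline dllexport method: it must be emitted somewhere even if never
        // called. With this change it is emitted only into the object file
        // produced alongside the PCH, not by every user of the PCH.
        struct __declspec(dllexport) Exported {
          int value() { return 42; }
        };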
* Re-land "[LTO] Enable module summary emission by default for regular LTO"Tobias Edler von Koch2018-06-222-0/+18
| | | | | | | | | | | | Since we are now producing a summary also for regular LTO builds, we need to run the NameAnonGlobals pass in those cases as well (the summary cannot handle anonymous globals). See https://reviews.llvm.org/D34156 for details on the original change. This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3. llvm-svn: 335385
* [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR | Gabor Buella | 2018-06-22 | 5 | -112/+321
    Summary: Lowering some vector comparison builtins to fcmp IR instructions. This ignores the signaling behaviour specified in the predicate argument of said builtins.

    Affected AVX512 builtins:
        __builtin_ia32_cmpps128_mask
        __builtin_ia32_cmpps256_mask
        __builtin_ia32_cmpps512_mask
        __builtin_ia32_cmppd128_mask
        __builtin_ia32_cmppd256_mask
        __builtin_ia32_cmppd512_mask

    Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
    Reviewed By: craig.topper, spatel, efriedma
    Differential Revision: https://reviews.llvm.org/D45616
    llvm-svn: 335339
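    A sketch of the effect on generated IR, assuming -mavx512f; the wrapper name is illustrative:

        #include <immintrin.h>

        /* After this change the ordered less-than compare below is emitted as a
           native fcmp olt on <16 x float> plus a bitcast to i16, rather than a
           call to a target-specific mask.cmp intrinsic; the signaling
           distinction encoded in the predicate is ignored. */
        __mmask16 lt_mask(__m512 a, __m512 b) {
          return _mm512_cmp_ps_mask(a, b, _CMP_LT_OS);
        }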
* [x86] Teach the builtin argument range check to allow invalid ranges in dead code. | Chandler Carruth | 2018-06-21 | 6 | -112/+112
    This is important for C++ templates that essentially compute the valid input in a way that is constant and will cause all the invalid cases to be dead code that is deleted. Code in the wild actually does this and GCC also accepts these kinds of patterns so it is important to support it.

    To make this work, we provide a non-error path to diagnose these issues, and use a default-error warning instead. This keeps the relatively strict handling but prevents nastiness like SFINAE on these errors. It also allows us to safely use the system to diagnose this only when it occurs at runtime (in emitted code).

    Entertainingly, this required fixing the syntax in various other ways for the x86 test because we never bothered to diagnose that the returns were invalid.

    Since debugging these compile failures was super confusing, I've also improved the diagnostic to actually say what the value was. Most of the checks I've made ignore this to simplify maintenance, but I've checked it in a few places to make sure the diagnostic is working.

    Depends on D48462. Without that, we might actually crash some part of the compiler after bypassing the error here.

    Thanks to Richard, Ben Kramer, and especially Craig Topper for all the help here.

    Differential Revision: https://reviews.llvm.org/D48464
    llvm-svn: 335309
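    A sketch of the shape of code being accommodated, assuming SSE4.1/AVX2; the commit's motivation is C++ templates, but the same shape arises in C whenever the lane selector is a constant, and the names here are illustrative:

        #include <immintrin.h>

        #define LANE 5   /* any value in [0, 7] */

        /* For LANE 5 the first branch is dead but still contains an immediate (5)
           that is out of range for _mm_extract_epi32; such dead uses no longer
           trigger an unconditional hard error. */
        int extract_lane(__m256i v) {
          if (LANE < 4)
            return _mm_extract_epi32(_mm256_castsi256_si128(v), LANE);
          return _mm_extract_epi32(_mm256_extracti128_si256(v, 1), LANE - 4);
        }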
* [X86] Update handling in CGBuiltin to be tolerant of out of range immediates. | Craig Topper | 2018-06-21 | 4 | -19/+19
    D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled.

    This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.
    llvm-svn: 335308
* Ignore blacklist when generating __cfi_check_fail. | Evgeniy Stepanov | 2018-06-21 | 1 | -0/+6
    Summary: Fixes PR37898.

    Reviewers: pcc, vlad.tsyrklevich
    Subscribers: cfe-commits
    Differential Revision: https://reviews.llvm.org/D48454
    llvm-svn: 335305
* Revert "[LTO] Enable module summary emission by default for regular LTO"Tobias Edler von Koch2018-06-211-17/+0
| | | | | | | | | | | This is breaking a couple of buildbots. We need to run the NameAnonGlobal pass for regular LTO now as well (since we're producing a summary). I'll post a separate patch for review to make this happen and then re-commit. This reverts commit c0759b7b1f4a81ff9021b952aa38a222d5fa4dfd. llvm-svn: 335291
* [LTO] Enable module summary emission by default for regular LTO | Tobias Edler von Koch | 2018-06-21 | 1 | -0/+17
    Summary: With D33921, we gained the ability to have module summaries in regular LTO modules without triggering ThinLTO compilation. Module summaries in regular LTO allow garbage collection (dead stripping) before LTO compilation and thus open up additional optimization opportunities.

    This patch enables summary emission in regular LTO for all targets except ld64-based ones (which use the legacy LTO API).

    Reviewers: pcc, tejohnson, mehdi_amini
    Subscribers: inglorion, eraman, cfe-commits
    Differential Revision: https://reviews.llvm.org/D34156
    llvm-svn: 335284
* [X86] Correct the inline assembly implementations of __movsb/w/d/q and __stosw/d/q to mark registers/memory as modified | Craig Topper | 2018-06-21 | 1 | -0/+83
    The inline assembly for these didn't mark that edi, esi, and ecx are modified by the movs/stos instructions. It also didn't mark that memory is modified.

    This issue was reported to llvm-dev last year (http://lists.llvm.org/pipermail/cfe-dev/2017-November/055863.html) but no bug was ever filed.

    Differential Revision: https://reviews.llvm.org/D48448
    llvm-svn: 335270
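    A short usage sketch, assuming MSVC-compatible mode (clang-cl) on x86; the function name is illustrative:

        #include <stddef.h>
        #include <intrin.h>

        /* __movsb expands to inline assembly for "rep movsb", so the compiler
           must know that rdi/rsi/rcx and the destination memory are clobbered;
           before this fix those side effects were not declared. */
        void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n) {
          __movsb(dst, src, n);
        }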
* [DebugInfo] Inline for without DebugLocation | Anastasis Grammenos | 2018-06-21 | 1 | -0/+13
    Summary: This test is a stripped-down version of a function inside the amalgamated sqlite source. When converted to IR clang produces a phi instruction without debug location.

    This patch fixes the above issue.

    Differential Revision: https://reviews.llvm.org/D47720
    llvm-svn: 335255
* [X86] Rewrite the add/mul/or/and reduction intrinsics to make better use of other intrinsics and remove undef shuffle indices. | Craig Topper | 2018-06-21 | 1 | -312/+306
    Similar to what was done to max/min recently. These already reduced the vector width to 256 and 128 bits as we go, unlike the original max/min code.

    Differential Revision: https://reviews.llvm.org/D48346
    llvm-svn: 335253
* [X86] Remove masking from the 512-bit floating point max/min builtins. Use select in IR instead. | Craig Topper | 2018-06-21 | 1 | -20/+35
    llvm-svn: 335200
* [SPIR] Prevent SPIR targets from using half conversion intrinsics | Sjoerd Meijer | 2018-06-20 | 1 | -0/+146
    The SPIR target currently allows for half precision floating point types to be emitted using the LLVM intrinsic functions which convert half types to floats and doubles. However, this is illegal in SPIR as the only intrinsic allowed by SPIR is memcpy, as per section 3 of the SPIR specification.

    Currently this is leading to an assert being hit in the Clang CodeGen when attempting to emit a constant or literal _Float16 type in a comparison operation on a SPIR or SPIR64 target. This assert stems from the CodeGen attempting to emit a constant half value as an integer because the backend has specified that it is using these half conversion intrinsics (which represent half as i16).

    This patch prevents SPIR targets from using these intrinsics by overloading the responsible target info method, marks SPIR targets as having a legal half type and provides additional regression testing for the _Float16 type on SPIR targets.

    Patch by: Stephen McGroarty

    Differential Revision: https://reviews.llvm.org/D48188
    llvm-svn: 335111
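    A sketch of the kind of code that previously hit the assert, assuming a SPIR target where _Float16 is available; the function name is illustrative:

        /* The _Float16 constant in the comparison used to be emitted as an i16
           via the half-conversion intrinsics, which SPIR does not permit. */
        int is_negative(_Float16 x) {
          return x < (_Float16)0.0f;
        }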
* Recommit r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible." | Craig Topper | 2018-06-19 | 1 | -2287/+2407
    Test has been updated to reflect the IRGen.
    llvm-svn: 335075
* Revert r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible." | Craig Topper | 2018-06-19 | 1 | -2598/+2285
    The test changes are failing the buildbot and it's going to take me some time to fix it.
    llvm-svn: 335072
* [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible. | Craig Topper | 2018-06-19 | 1 | -2285/+2598
    We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX.

    For v16i32 and floating point we have legacy 128/256 bit instructions we can use.

    I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.

    Differential Revision: https://reviews.llvm.org/D47401
    llvm-svn: 335070
* [X86] Lowering sqrt intrinsics to native IR | Tomasz Krupa | 2018-06-15 | 5 | -40/+120
    Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k
    Reviewed By: craig.topper
    Subscribers: tkrupa, cfe-commits
    Differential Revision: https://reviews.llvm.org/D41168
    llvm-svn: 334850
* [AArch64] Reverted rC334696 with Clang VCVTA test fix | Luke Geeson | 2018-06-15 | 2 | -0/+14
    llvm-svn: 334820
* [X86] Add inline assembly versions of _InterlockedExchange_HLEAcquire/Release and _InterlockedCompareExchange_HLEAcquire/Release for MSVC compatibility. | Craig Topper | 2018-06-14 | 2 | -1/+50
    Clang/LLVM doesn't have a way to pass an HLE hint through to the X86 backend to emit HLE prefixed instructions. So this is a good short term fix.

    Differential Revision: https://reviews.llvm.org/D47672
    llvm-svn: 334751
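    A sketch of a spin lock using the HLE-hinted exchanges, assuming MSVC-compatible mode; on hardware without HLE the hints are simply ignored:

        #include <intrin.h>

        void lock_acquire(volatile long *lock) {
          /* xchg with an XACQUIRE prefix: starts hardware lock elision where available */
          while (_InterlockedExchange_HLEAcquire(lock, 1) != 0) {
            /* spin */
          }
        }

        void lock_release(volatile long *lock) {
          /* xchg with an XRELEASE prefix: ends the elided region */
          _InterlockedExchange_HLERelease(lock, 0);
        }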
* [X86] Lowering Mask Scalar intrinsics to native IR (Clang part) | Tomasz Krupa | 2018-06-14 | 1 | -16/+148
    Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls to native IR.

    Reviewers: craig.topper, RKSimon, spatel, sroland
    Reviewed By: craig.topper
    Subscribers: cfe-commits
    Differential Revision: https://reviews.llvm.org/D47979
    llvm-svn: 334741
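    A sketch of one of the affected intrinsics, assuming -mavx512f; the wrapper name is illustrative:

        #include <immintrin.h>

        /* After this change the masked scalar add is emitted as a plain fadd on
           element 0 plus a select on mask bit 0, instead of a call to a
           target-specific builtin. */
        __m128 masked_add_low(__m128 src, __mmask8 k, __m128 a, __m128 b) {
          return _mm_mask_add_ss(src, k, a, b);
        }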
* [AArch64] reverting rC334693 due to build failures | Luke Geeson | 2018-06-14 | 2 | -14/+0
    llvm-svn: 334696
* [AArch64] Added support for the vcvta_u16_f16 intrinsic for FP16 Armv8.2-A | Luke Geeson | 2018-06-14 | 2 | -0/+14
    llvm-svn: 334693
* [COFF] Add ARM64 intrinsics: __yield, __wfe, __wfi, __sev, __sevl | Mandeep Singh Grang | 2018-06-13 | 1 | -0/+35
    Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.

    Reviewers: mstorsjo, compnerd, javed.absar
    Reviewed By: mstorsjo
    Subscribers: kristof.beyls, chrib, cfe-commits
    Differential Revision: https://reviews.llvm.org/D48132
    llvm-svn: 334639
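    A short usage sketch, assuming MSVC-compatible mode on ARM64; the function name is illustrative:

        #include <intrin.h>

        /* __yield emits the "yield" hint so the core can favour other work
           while this thread polls the flag. */
        void wait_for_flag(volatile int *flag) {
          while (*flag == 0)
            __yield();
        }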
* [CodeGen] make nan builtins pure rather than const (PR37778) | Sanjay Patel | 2018-06-13 | 1 | -17/+19
    https://bugs.llvm.org/show_bug.cgi?id=37778 shows a miscompile resulting from marking nan builtins as 'const'.

    The nan libcalls/builtins take a pointer argument (http://www.cplusplus.com/reference/cmath/nan-function/), and the chars dereferenced by that arg are used to fill in the NaN constant payload bits.

    "const" means that the pointer argument isn't dereferenced. That's translated to "readnone" in LLVM.
    "pure" means that the pointer argument may be dereferenced. That's translated to "readonly" in LLVM.

    This change prevents the IR optimizer from killing the lead-up to the nan call here:

        double a() {
          char buf[4];
          buf[0] = buf[1] = buf[2] = '9';
          buf[3] = '\0';
          return __builtin_nan(buf);
        }

    The optimizer isn't currently able to simplify this to a constant as we might hope, but this patch should solve the miscompile.

    Differential Revision: https://reviews.llvm.org/D48134
    llvm-svn: 334628
* [X86] Remove masking from avx512vbmi2 concat and shift by immediate builtins. Use select builtins instead. | Craig Topper | 2018-06-13 | 2 | -54/+90
    llvm-svn: 334577