summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* [AArch64][SVE] Asm: Support for unpredicated FP operations.Sander de Smalen2018-07-1814-3/+243
| | | | | | | | | | | | | | | | | | This patch adds support for the following unpredicated floating-point instructions: FADD Floating point add FSUB Floating point subtract FMUL Floating point multiplication FTSMUL Floating point trigonometric starting value FRECPS Floating point reciprocal step FRSQRTS Floating point reciprocal square root step The instructions have the following assembly format: fadd z0.h, z1.h, z2.h and have variants for 16, 32 and 64-bit FP elements. llvm-svn: 337383
* [ELF] - Stop silently producing a broken .eh_frame_hdr.George Rimar2018-07-182-2/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, getFdePC() returns uint64_t. Its because the following encodings might use 8 bytes: DW_EH_PE_absptr and DW_EH_PE_udata8. But caller assigns returned value to uint32_t field: https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L508 Value is used for building .eh_frame_hdr section. We use DW_EH_PE_sdata4 encoding for building it at this moment: https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L2545 And that means that an overflow issue might happen if DW_EH_PE_absptr/DW_EH_PE_udata8 address encodings are present in .eh_frame. In that case, before this patch, we silently would truncate the address and produced broken .eh_frame_hdr section. It would be not hard to support real 64-bit values for DW_EH_PE_absptr/DW_EH_PE_udata8 encodings, but it is unclear if it is usefull and if we should do it. Since nobody faced/reported it, int this patch I only implement a check to stop producing broken output silently for now. llvm-svn: 337382
* Mention clang-cl improvements from r335466 and r336379 in ReleaseNotes.rstNico Weber2018-07-181-1/+13
| | | | llvm-svn: 337381
* [TargetInstPredicate] Add definition of CheckInvalidRegisterOperand.Andrea Di Biagio2018-07-181-0/+3
| | | | | | | This should have been part of r337378. I forgot to svn add it before committing the change. llvm-svn: 337380
* [NFC] Make a test more neatMax Kazantsev2018-07-181-2/+5
| | | | llvm-svn: 337379
* [Tablegen][PredicateExpander] Add the ability to define checks for invalid ↵Andrea Di Biagio2018-07-182-0/+10
| | | | | | | | registers. This was discussed in review D49436. llvm-svn: 337378
* [ELF] - Add a test case to check DW_EH_PE_absptr address encoding.George Rimar2018-07-181-0/+74
| | | | | | | This covers the following line of the code: https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L525 llvm-svn: 337377
* [InstCombine] Re-commit: Fold 'check for [no] signed truncation' patternRoman Lebedev2018-07-183-28/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. The DAGCombine will reverse this transform, see https://reviews.llvm.org/D49266 This transform is surprisingly frustrating. This does not deal with non-splat shift amounts, or with undef shift amounts. I've outlined what i think the solution should be: ``` // Potential handling of non-splats: for each element: // * if both are undef, replace with constant 0. // Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0. // * if both are not undef, and are different, bailout. // * else, only one is undef, then pick the non-undef one. ``` This is a re-commit, as the original patch, committed in rL337190 was reverted in rL337344 as it broke chromium build: https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832 Proofs that the fixed folds are ok: https://rise4fun.com/Alive/VYM Differential Revision: https://reviews.llvm.org/D49320 llvm-svn: 337376
* [X86][SSE] Add extra scalar fop + blend tests for commuted inputsSimon Pilgrim2018-07-181-24/+208
| | | | | | While working on PR38197, I noticed that we don't make use of FADD/FMUL being able to commute the inputs to support the addps+movss -> addss style combine llvm-svn: 337375
* [ELF] - Improve eh-frame-value-format7.s test case.George Rimar2018-07-181-2/+23
| | | | | | | This adds .eh_frame_hdr content checking to test that DW_EH_PE_udata2 address was decoded correctly. llvm-svn: 337374
* Revert "[Sparc] Use the IntPair reg class for r constraints with value type f64"Daniel Cederman2018-07-182-10/+1
| | | | | | | | | This reverts commit 55222c9183c6e07f53a54c4061677734f54feac1. I missed that this patch has a dependency on https://reviews.llvm.org/D49219 that has not been approved yet. llvm-svn: 337373
* [AArch64][SVE] Asm: Support for UDOT/SDOT instructions.Sander de Smalen2018-07-186-0/+252
| | | | | | | | | | | | | | | | | | | | | | | | | The signed/unsigned DOT instructions perform a dot-product on quadtuplets from two source vectors and accumulate the result in the destination register. The instructions come in two forms: Vector form, e.g. sdot z0.s, z1.b, z2.b - signed dot product on four 8-bit quad-tuplets, accumulating results in 32-bit elements. udot z0.d, z1.h, z2.h - unsigned dot product on four 16-bit quad-tuplets, accumulating results in 64-bit elements. Indexed form, e.g. sdot z0.s, z1.b, z2.b[3] - signed dot product on four 8-bit quad-tuplets with specified quadtuplet from second source vector, accumulating results in 32-bit elements. udot z0.d, z1.h, z2.h[1] - dot product on four 16-bit quad-tuplets with specified quadtuplet from second source vector, accumulating results in 64-bit elements. llvm-svn: 337372
* [llvm-objdump] - An attempt to fix BB after r337361.George Rimar2018-07-181-2/+2
| | | | | | | | | | | Seems r337361 is the reason of the following ARM BB failures: http://lab.llvm.org:8011/builders/clang-cmake-armv8-quick http://lab.llvm.org:8011/builders/clang-cmake-armv8-full/builds/4633 Reason is unclear to me, other bots are OK. If this will not help, I'll revert r337361. llvm-svn: 337371
* [Sparc] Use the IntPair reg class for r constraints with value type f64Daniel Cederman2018-07-182-1/+10
| | | | | | | | | | | | | | | Summary: This is how it appears to be handled in GCC and it prevents a "Unknown mismatch" error in the SelectionDAGBuilder. Reviewers: venkatra, jyknight, jrtc27 Reviewed By: jyknight, jrtc27 Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D49218 llvm-svn: 337370
* [AArch64][SVE] Asm: Integer divide instructions.Sander de Smalen2018-07-1810-0/+223
| | | | | | | | | | | | | | | | | | This patch adds the following predicated instructions: UDIV Unsigned divide active elements UDIVR Unsigned divide active elements, reverse form. SDIV Signed divide active elements SDIVR Signed divide active elements, reverse form. e.g. udiv z0.s, p0/m, z0.s, z1.s (unsigned divide active elements in z0 by z1, store result in z0) sdivr z0.s, p0/m, z0.s, z1.s (signed divide active elements in z1 by z0, store result in z0) llvm-svn: 337369
* Fix -Wdocumentation warning. NFCI.Simon Pilgrim2018-07-181-1/+1
| | | | llvm-svn: 337368
* Fix -Wdocumentation warning. NFCI.Simon Pilgrim2018-07-181-1/+1
| | | | llvm-svn: 337367
* [CMake] Export the LLVM_LINK_LLVM_DYLIB settingPhilip Pfaffe2018-07-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When building out-of-tree tools, there are several macros available to automate linking against llvm. An examples is `add_llvm_executable`, or the clang variant of this. These macros use the LLVM_LINK_LLVM_DYLIB option to decide whether to link against libraries defined by setting LLVM_LINK_COMPONENTS or to link against libLLVM instead. Currently this is problematic in out-of-tree targets, because they cannot identify whether this option is required or even available. If the option was enabled in LLVM's own build, the clang libraries are built against libLLVM, so a client linking against those must link against it too. On the other hand the client can't just always link against it, because it might not be available. This is related to D44391, but that change assumed the client knew whether they wanted the dylib or not. Reviewers: mgorny, beanz, labath Reviewed By: mgorny Subscribers: bollu, llvm-commits Differential Revision: https://reviews.llvm.org/D49193 llvm-svn: 337366
* [ELF] - Improve relocatable-many-sections.s test case. NFC.George Rimar2018-07-181-0/+7
| | | | | | This adds a check for .shstrtab section index. llvm-svn: 337365
* [NFC][InstCombine] i65 tests for 'check for [no] signed truncation' patternRoman Lebedev2018-07-182-0/+28
| | | | | | | | Those initially broke chromium build: https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832 llvm-svn: 337364
* [ELF] - Do not produce broken output when amount of sections is > ~65kGeorge Rimar2018-07-182-3/+108
| | | | | | | | | | | | | | | | | | | | | | | | This is a part of ttps://bugs.llvm.org//show_bug.cgi?id=38119 We produce broken ELF header now when the number of output sections is >= SHN_LORESERVE (0xff00). ELF spec says (http://www.sco.com/developers/gabi/2003-12-17/ch4.eheader.html): e_shnum: If the number of sections is greater than or equal to SHN_LORESERVE (0xff00), this member has the value zero and the actual number of section header table entries is contained in the sh_size field of the section header at index 0. (Otherwise, the sh_size member of the initial entry contains 0.) e_shstrndx If the section name string table section index is greater than or equal to SHN_LORESERVE (0xff00), this member has the value SHN_XINDEX (0xffff) and the actual index of the section name string table section is contained in the sh_link field of the section header at index 0. (Otherwise, the sh_link member of the initial entry contains 0.) We did not set these fields correctly earlier. The patch fixes the issue. Differential revision: https://reviews.llvm.org/D49371 llvm-svn: 337363
* [ELF] — Add a test case for DW_EH_PE_udata2 encoding.George Rimar2018-07-181-0/+54
| | | | | | | | This adds a test to check LLD can handle such address format correctly. Test case covers the following line: https://github.com/llvm-mirror/lld/blob/master/ELF/SyntheticSections.cpp#L519 llvm-svn: 337362
* [llvm-objdump] - Stop reporting bogus section IDs.George Rimar2018-07-182-4/+28
| | | | | | | | | | | | | | | | Imagine we have a file with few sections, and one of them is .foo with index N != 0. Problem is that when llvm-objdump is given a -section=.foo parameter it lists .foo as a section at index 0. That makes impossible to write test cases which needs to find the index of the particular section, while ignoring dumping of others. The patch fixes that. Differential revision: https://reviews.llvm.org/D49372 llvm-svn: 337361
* [llvm-readobj] - Teach tool to dump objects with >= SHN_LORESERVE of sections.George Rimar2018-07-184-4/+64
| | | | | | | | | | | | | | | | | | http://www.sco.com/developers/gabi/2003-12-17/ch4.eheader.html says that e_shnum and/or e_shstrndx may have special values if "the number of sections is greater than or equal to SHN_LORESERVE" or "the section name string table section index is greater than or equal to SHN_LORESERVE (0xff00)" Previously llvm-readobj was unable to dump such files, patch changes that. I had to add a precompiled test case because it does not seem possible to prepare a test using yaml2obj or llvm-mc (not clear how to make .shstrtab to have index >= SHN_LORESERVE). Differential revision: https://reviews.llvm.org/D49369 llvm-svn: 337360
* Revert test changes part of "Revert "[InstCombine] Fold 'check for [no] ↵Roman Lebedev2018-07-182-108/+108
| | | | | | | | | | | signed truncation' pattern"" We want the test to remain good anyway. I think the fix is incoming. This reverts part of commit rL337344. llvm-svn: 337359
* [AArch64][SVE] Asm: Support for integer MUL instructions.Sander de Smalen2018-07-188-8/+246
| | | | | | | | | | | | | | | | This patch adds the following instructions: MUL - multiply vectors, e.g. mul z0.h, p0/m, z0.h, z1.h - multiply with immediate, e.g. mul z0.h, z0.h, #127 SMULH - signed multiply returning high half, e.g. smulh z0.h, p0/m, z0.h, z1.h UMULH - unsigned multiply returning high half, e.g. umulh z0.h, p0/m, z0.h, z1.h llvm-svn: 337358
* [X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by ↵Craig Topper2018-07-185-30/+51
| | | | | | | | using VMOVLPS with a modified address. This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable. llvm-svn: 337357
* [X86] Add test case for missed opportunity to commute vunpckhpd to enable ↵Craig Topper2018-07-181-0/+16
| | | | | | | | use of vmovlps to fold a load. We do this transform for SSE, but not AVX or AVX512VL. llvm-svn: 337356
* [libomptarget] Also support several images for elfJoachim Protze2018-07-181-4/+5
| | | | | | | | | | | | | | | | | | | | In revision r336569 (D49036) libomptarget support for multiple nvidia images has been fixed in case a target region resides inside one or multiple libraries and in the compiled application. But the issues is still present for elf images. This fix will also support multiple images for elf. Patch by Jannis Klinkenberg Reviewers: protze.joachim, ABataev, grokos Reviewed By: protze.joachim, ABataev, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49418 llvm-svn: 337355
* [X86] Regenerate fma.ll checks using current version of the script which ↵Craig Topper2018-07-181-348/+348
| | | | | | produces different regular expressions on spills and reloads. NFC llvm-svn: 337354
* [modules] Print input files when -module-file-info file switch is passed.Vassil Vassilev2018-07-182-0/+49
| | | | | | | | | This patch improves traceability of duplicated header files which end up in multiple pcms. Differential Revision: https://reviews.llvm.org/D47118 llvm-svn: 337353
* [AArch64] Define TARGET_HEADER_BUILTINMartin Storsjo2018-07-181-0/+2
| | | | | | | Without it, the new intrinsics became available for all language variants. This was missed in SVN r337327. llvm-svn: 337352
* [NFC] fix trivial typos in commentsHiroshi Inoue2018-07-182-4/+4
| | | | llvm-svn: 337351
* Fix build failures from r337347, found by clangJustin Hibbits2018-07-183-15/+6
| | | | | | | | | | | * Delete a no-longer-used override, and mark the other getRegisterTypeForCallingConv() as override. * SPE only supports i32, not i64, as the internal type, so simply remove the type check, so that DestReg and Opc are provably always set. GCC 6.4 did not warn about either of the above. llvm-svn: 337350
* [X86] Remove patterns that mix X86ISD::MOVLHPS/MOVHLPS with v2i64/v2f64 types.Craig Topper2018-07-182-33/+0
| | | | | | The X86ISD::MOVLHPS/MOVHLPS should now only be emitted in SSE1 only. This means that the v2i64/v2f64 types would be illegal thus we don't need these patterns. llvm-svn: 337349
* [X86] Generate v2f64 X86ISD::UNPCKL/UNPCKH instead of ↵Craig Topper2018-07-183-5/+18
| | | | | | | | | | | | X86ISD::MOVLHPS/MOVHLPS for unary v2f64 {0,0} and {1,1} shuffles with SSE2. I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts. I've put in an odd isel hack to still select MOVHLPS instruction from the unpckh node to avoid changing tests and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code. We already support domain switching for UNPCKLPD and MOVLHPS. llvm-svn: 337348
* Introduce codegen for the Signal Processing EngineJustin Hibbits2018-07-1823-624/+1922
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1, e500v2, and several e200 cores. This adds support targeting the e500v2, as this is more common than the e500v1, and is in SoCs still on the market. This patch is very intrusive because the SPE is binary incompatible with the traditional FPU. After discussing with others, the cleanest solution was to make both SPE and FPU features on top of a base PowerPC subset, so all FPU instructions are now wrapped with HasFPU predicates. Supported by this are: * Code generation following the SPE ABI at the LLVM IR level (calling conventions) * Single- and Double-precision math at the level supported by the APU. Still to do: * Vector operations * SPE intrinsics As this changes the Callee-saved register list order, one test, which tests the precise generated code, was updated to account for the new register order. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D44830 llvm-svn: 337347
* Complete the SPE instruction set patternsJustin Hibbits2018-07-188-230/+1232
| | | | | | | | | This is the lead-up to having SPE codegen. Add the rest of the instructions, along with MC tests. Differential Revision: https://reviews.llvm.org/D44829 llvm-svn: 337346
* Add PowerPC e500(v2) core scheduler and directives.Justin Hibbits2018-07-187-220/+497
| | | | | | Differential Revision: https://reviews.llvm.org/D44828 llvm-svn: 337345
* Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern"Bob Haarman2018-07-183-179/+116
| | | | | | | | | This reverts r337190 (and a few follow-up commits), which caused the Chromium build to fail. See https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832 llvm-svn: 337344
* [XRay][compiler-rt] Segmented Array: Simplify and OptimiseDean Michael Berris2018-07-185-232/+215
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This is a follow-on to D49217 which simplifies and optimises the implementation of the segmented array. In this patch we co-locate the book-keeping for segments in the `__xray::Array<T>` with the data it's managing. We take the chance in this patch to actually rename `Chunk` to `Segment` to better align with the high-level description of the segmented array. With measurements using benchmarks landed in D48879, we've identified that calls to `pthread_getspecific` started dominating the cycles, which led us to revert the change made in D49217 to use C++ thread_local initialisation instead (it reduces the cost by a huge margin, since we save one PLT-based call to pthread functions in the hot path). In particular, this is in `__xray::getThreadLocalData()`. We also took the opportunity to remove the least-common-multiple based calculation and instead pack as much data into segments of the array. This greatly simplifies the API of the container which hides as much of the implementation details as possible. For instance, we calculate the number of elements we need for the each segment internally in the Array instead of making it part of the type. With the changes here, we're able to get a measurable improvement on the performance of profiling mode on top of what D48879 already provides. Depends on D48879. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49363 llvm-svn: 337343
* [XRay][compiler-rt] Simplify Allocator ImplementationDean Michael Berris2018-07-1810-216/+245
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This change simplifies the XRay Allocator implementation to self-manage an mmap'ed memory segment instead of using the internal allocator implementation in sanitizer_common. We've found through benchmarks and profiling these benchmarks in D48879 that using the internal allocator in sanitizer_common introduces a bottleneck on allocating memory through a central spinlock. This change allows thread-local allocators to eliminate contention on the centralized allocator. To get the most benefit from this approach, we also use a managed allocator for the chunk elements used by the segmented array implementation. This gives us the chance to amortize the cost of allocating memory when creating these internal segmented array data structures. We also took the opportunity to remove the preallocation argument from the allocator API, simplifying the usage of the allocator throughout the profiling implementation. In this change we also tweak some of the flag values to reduce the amount of maximum memory we use/need for each thread, when requesting memory through mmap. Depends on D48956. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49217 llvm-svn: 337342
* [XRay][compiler-rt] FDR Mode: Allow multiple runsDean Michael Berris2018-07-182-33/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Fix a bug in FDR mode which didn't allow for re-initialising the logging in the same process. This change ensures that: - When we flush the FDR mode logging, that the state of the logging implementation is `XRAY_LOG_UNINITIALIZED`. - Fix up the thread-local initialisation to use aligned storage and `pthread_getspecific` as well as `pthread_setspecific` for the thread-specific data. - Actually use the pointer provided to the thread-exit cleanup handling, instead of assuming that the thread has thread-local data associated with it, and reaching at thread-exit time. In this change we also have an explicit test for two consecutive sessions for FDR mode tracing, and ensuring both sessions succeed. Reviewers: kpw, eizan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49359 llvm-svn: 337341
* Workaround warning bug in old versions of gcc.Sterling Augustine2018-07-181-0/+6
| | | | | | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56480 llvm-svn: 337340
* Re-land r337333, "Teach Clang to emit address-significance tables.",Peter Collingbourne2018-07-189-0/+60
| | | | | | | | | | | | | | | | | | | | | which was reverted in r337336. The problem that required a revert was fixed in r337338. Also added a missing "REQUIRES: x86-registered-target" to one of the tests. Original commit message: > Teach Clang to emit address-significance tables. > > By default, we emit an address-significance table on all ELF > targets when the integrated assembler is enabled. The emission of an > address-significance table can be controlled with the -faddrsig and > -fno-addrsig flags. > > Differential Revision: https://reviews.llvm.org/D48155 llvm-svn: 337339
* CodeGen: Don't create address significance table entries for thread-local ↵Peter Collingbourne2018-07-182-1/+5
| | | | | | | | | | variables. The presence of these symbols in the symbol table can cause symbol type mismatch errors (or undefined symbol errors on emulated TLS targets) and they can't be ICF'd anyway. llvm-svn: 337338
* [NFC][llvm-objcopy] Cleanup namespace usage in llvm-objcopy.Puyan Lotfi2018-07-184-29/+39
| | | | | | | | | | Nest any classes not used outside of a file into anon. Nest any classes used across files in llvm-objcopy into namespace llvm::objcopy. Differential Revision: https://reviews.llvm.org/D49449 llvm-svn: 337337
* Revert r337333, "Teach Clang to emit address-significance tables."Peter Collingbourne2018-07-179-58/+0
| | | | | | | | | Causing multiple failures on sanitizer bots due to TLS symbol errors, e.g. /usr/bin/ld: __msan_origin_tls: TLS definition in /home/buildbots/ppc64be-clang-test/clang-ppc64be/stage1/lib/clang/7.0.0/lib/linux/libclang_rt.msan-powerpc64.a(msan.cc.o) section .tbss.__msan_origin_tls mismatches non-TLS reference in /tmp/lit_tmp_0a71tA/mallinfo-3ca75e.o llvm-svn: 337336
* Link the lldb driver ("lldb") against the llvm staticJason Molenda2018-07-171-3/+13
| | | | | | libraries because of the new prettystackprinter dependency. llvm-svn: 337335
* [X86] Remove the vector alignment requirement from the patterns added in ↵Craig Topper2018-07-172-4/+6
| | | | | | | | r337320. The resulting instruction will only load 64 bits so alignment isn't required. llvm-svn: 337334
OpenPOWER on IntegriCloud