summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* Share more output section creation code.Rafael Espindola2017-02-165-60/+38
| | | | | | | We can do this now that the linker script and the writer agree on which sections should be combined. llvm-svn: 295341
* Added an option to bind initial thread at the start of applicationAndrey Churbanov2017-02-161-3/+8
| | | | | | | | via setting envirable KMP_INITIAL_THREAD_BIND=1. Differential Revision: https://reviews.llvm.org/D29665 llvm-svn: 295339
* [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combineArtur Pilipenko2017-02-169-166/+57
| | | | | | | | | | | | Resubmit -r295314 with PowerPC and AMDGPU tests updated. Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295336
* [OpenMP] Teams reduction on the NVPTX device.Arpith Chacko Jacob2017-02-163-7/+1590
| | | | | | | | | | | | | | | | | | | | This patch implements codegen for the reduction clause on any teams construct for elementary data types. It builds on parallel reductions on the GPU. Subsequently, the team master writes to a unique location in a global memory scratchpad. The last team to do so loads and reduces this array to calculate the final result. This patch emits two helper functions that are used by the OpenMP runtime on the GPU to perform reductions across teams. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29879 llvm-svn: 295335
* [OpenMP] Parallel reduction on the NVPTX device.Arpith Chacko Jacob2017-02-167-35/+1940
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295333
* [ELF] - Do not crash when discarding sections that are referenced by others.George Rimar2017-02-162-0/+37
| | | | | | | | | | | | | SHF_LINK_ORDER sections adds special ordering requirements. Such sections references other sections. Previously we would crash if section that other were referenced to was discarded by script. Patch fixes that by discarding all dependent sections in that case. It supports chained dependencies, testcase is provided. Differential revision: https://reviews.llvm.org/D30033 llvm-svn: 295332
* [AArch64] AArch64AsmParser clean up of isImmediate functions. NFCSjoerd Meijer2017-02-163-151/+18
| | | | | | | | | | | Regression test neon-diagnostics.s needed changing because it now produces a more specific diagnostic about the immediate ranges. One change in the expected error message is not obvious, but there multiple candidate and it happens to pick the immediate diagnostic. Differential Revision: https://reviews.llvm.org/D29939 llvm-svn: 295331
* math: correct the MSVCRT conditionSaleem Abdulrasool2017-02-162-4/+4
| | | | | | Fixes a number of tests in the testsuite on Windows. llvm-svn: 295330
* threading_support: make __thread_sleep_for be alertableSaleem Abdulrasool2017-02-161-2/+5
| | | | | | | | | | On Windows, we were using `Sleep` which is not alertable. This means that if the thread was used for a user APC or WinProc handling and thread::sleep was used, we could potentially dead lock. Use `SleepEx` with an alertable sleep, resuming until the time has expired if we are awoken early. llvm-svn: 295329
* Fix build due to clang r295311Pavel Labath2017-02-161-1/+0
| | | | | | BuiltinType::Kind::OCLNDRange was removed. llvm-svn: 295328
* [WebAssembly] Add a cast to void to fix an unused private member warning, ↵Dan Gohman2017-02-161-1/+3
| | | | | | for now. llvm-svn: 295327
* [X86] Remove local areOnlyUsersOf helper and use SDNode::areOnlyUsersOf instead.Simon Pilgrim2017-02-161-9/+1
| | | | llvm-svn: 295326
* Remove uses of deprecated std::random_shuffle in the LLVM code base. ↵Marshall Clow2017-02-164-14/+16
| | | | | | Reviewed as https://reviews.llvm.org/D29780. llvm-svn: 295325
* Ignore relocation sections in linker scripts.Rafael Espindola2017-02-162-0/+22
| | | | | | | | | | | | | | Unfortunately, the common way of writing linker scripts seems to be to get the output of ld.bfd --verbose and edit it a bit. Also unfortunately, the bfd default script contains things like .rela.dyn : { *(... .rela.data ...) } but bfd actually ignores that for -emit-relocs, so we have to do the same. llvm-svn: 295324
* Revert r295319 while investigating buildbot failure.Arpith Chacko Jacob2017-02-167-1939/+35
| | | | llvm-svn: 295323
* Fix crash with -emit-relocs -shared.Rafael Espindola2017-02-162-1/+21
| | | | | | | The code to handle the input SHT_REL/SHT_RELA sections was getting confused with the linker generated relocation sections. llvm-svn: 295322
* [ARM] GlobalISel: Select floating point loadsDiana Picus2017-02-162-10/+87
| | | | llvm-svn: 295321
* Silence sign compare warning. NFC.Benjamin Kramer2017-02-161-6/+6
| | | | | | | | | ExprConstant.cpp:6344:20: warning: comparison of integers of different signs: 'const size_t' (aka 'const unsigned long') and 'typename iterator_traits<Expr *const *>::difference_type' (aka 'long') [-Wsign-compare] llvm-svn: 295320
* [OpenMP] Parallel reduction on the NVPTX device.Arpith Chacko Jacob2017-02-167-35/+1939
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295319
* [tsan] Provide external tags (object types) via debugging APIKuba Mracek2017-02-163-0/+75
| | | | | | | | In D28836, we added a way to tag heap objects and thus provide object types into report. This patch exposes this information into the debugging API. Differential Revision: https://reviews.llvm.org/D30023 llvm-svn: 295318
* Fix clang-move test after clang-format update r295312Krasimir Georgiev2017-02-161-3/+3
| | | | llvm-svn: 295317
* Rever -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in ↵Artur Pilipenko2017-02-167-40/+147
| | | | | | | | load combine" This change causes some of AMDGPU and PowerPC tests to fail. llvm-svn: 295316
* [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combineArtur Pilipenko2017-02-167-147/+40
| | | | | | | | | | Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295314
* [OpenCL][Doc] Added OpenCL vendor extension description to user manual docAnastasia Stulova2017-02-161-1/+38
| | | | | | | | | Added description of a new feature that allows to specify vendor extension in flexible way using compiler pragma instead of modifying source code directly (committed in clang@r289979). Review: D29829 llvm-svn: 295313
* [clang-format] Align block comment decorationsKrasimir Georgiev2017-02-164-10/+224
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch implements block comment decoration alignment. source: ``` /* line 1 * line 2 */ ``` result before: ``` /* line 1 * line 2 */ ``` result after: ``` /* line 1 * line 2 */ ``` Reviewers: djasper, bkramer, klimek Reviewed By: klimek Subscribers: mprobst, cfe-commits, klimek Differential Revision: https://reviews.llvm.org/D29943 llvm-svn: 295312
* [OpenCL] Correct ndrange_t implementationAnastasia Stulova2017-02-1627-92/+48
| | | | | | | | | | | | | | | Removed ndrange_t as Clang builtin type and added as a struct type in the OpenCL header. Use type name to do the Sema checking in enqueue_kernel and modify IR generation accordingly. Review: D28058 Patch by Dmitry Borisenkov! llvm-svn: 295311
* [ARM] GlobalISel: Select G_SEQUENCE and G_EXTRACTDiana Picus2017-02-162-0/+123
| | | | | | | | Since they're only used for passing around double precision floating point values into the general purpose registers, we'll lower them to VMOVDRR and VMOVRRD. llvm-svn: 295310
* [ARM] GlobalISel: Select double G_FADD and copiesDiana Picus2017-02-162-6/+63
| | | | | | Just use VADDD if available, bail out if not. llvm-svn: 295309
* [ARM] GlobalISel: Assert that we don't use the FPR bank if we don't have VFPDiana Picus2017-02-162-4/+18
| | | | llvm-svn: 295308
* [OpenCL] Disallow blocks capture other blocks (v2.0, s6.12.5)Anastasia Stulova2017-02-163-0/+24
| | | | llvm-svn: 295307
* [ARM] GlobalISel: Add reg bank mappings for G_SEQUENCE and G_EXTRACTDiana Picus2017-02-162-0/+58
| | | | | | | Support G_SEQUENCE and G_EXTRACT as needed for passing double precision floating point values in the soft-fp float mode. llvm-svn: 295306
* [clangd] Fix Output.log errorKrasimir Georgiev2017-02-161-1/+1
| | | | llvm-svn: 295305
* [clangd] Implement format on typeKrasimir Georgiev2017-02-166-4/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch adds onTypeFormatting to clangd. The trigger character is '}' and it works by scanning for the matching '{' and formatting the range in-between. There are problems with ';' as a trigger character, the cursor position is before the `|`: ``` int main() { int i;| } ``` becomes: ``` int main() { int i;| } ``` which is not likely what the user intended. Also formatting at semicolon in a non-properly closed scope puts the following tokens in the same unwrapped line, which doesn't reformat nicely. Reviewers: bkramer Reviewed By: bkramer Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D29990 llvm-svn: 295304
* [clang-tidy] Ignore spaces between globs in the Checks option.Alexander Kornienko2017-02-162-1/+2
| | | | llvm-svn: 295303
* [ARM] GlobalISel: Make the FPR bank 64-bit wideDiana Picus2017-02-163-7/+56
| | | | | | | Also add mappings for single and double precision FP, and use them for G_FADD and G_LOAD. llvm-svn: 295302
* Cache FileID when translating diagnostics in PCH filesErik Verbruggen2017-02-161-1/+6
| | | | | | | | | | | | | | | | | | | | Modules/preambles/PCH files can contain diagnostics, which, when used, are added to the current ASTUnit. For that to work, they are translated to use the current FileManager's FileIDs. When the entry is not the main file, all local source locations will be checked by a linear search. Now this is a problem, when there are lots of diagnostics (say, 25000) and lots of local source locations (say, 440000), and end up taking seconds when using such a preamble. The fix is to cache the last FileID, because many subsequent diagnostics refer to the same file. This reduces the time spent in ASTUnit::TranslateStoredDiagnostics from seconds to a few milliseconds for files with many slocs/diagnostics. This fixes PR31353. Differential Revision: https://reviews.llvm.org/D29755 llvm-svn: 295301
* [ARM] GlobalISel: Legalize 64-bit G_FADD and G_LOADDiana Picus2017-02-162-0/+36
| | | | | | | | For now we just mark them as legal all the time and let the other passes bail out if they can't handle it. In the future, we'll want to move more of the brains into the legalizer. llvm-svn: 295300
* [sanitizers] Fix formatting of the shell script.Vitaly Buka2017-02-161-10/+10
| | | | llvm-svn: 295299
* [ELF] - Allow section to have multiple dependent sections.George Rimar2017-02-164-5/+24
| | | | | | | | | | | That fixes a case when section has more than one metadata section. Previously GC would collect one of such sections because we had implementation that stored only last one as dependent. Differential revision: https://reviews.llvm.org/D29981 llvm-svn: 295298
* RWMutex.h: Use llvm-config.h instead of config.h in installed headers.NAKAMURA Takumi2017-02-161-1/+1
| | | | llvm-svn: 295297
* [sanitizers] Redirect pthread calls to interceptors.Vitaly Buka2017-02-161-0/+24
| | | | | | It's needed if libcxx is build without disabling threads. llvm-svn: 295296
* [ARM] GlobalISel: Lower double precision FP argsDiana Picus2017-02-164-9/+212
| | | | | | | | | | | | | | For the hard float calling convention, we just use the D registers. For the soft-fp calling convention, we use the R registers and move values to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we make sure to honor the endianness of the target, since the CCAssignFn doesn't do that for us. For pure soft float targets, we still bail out because we don't support the libcalls yet. llvm-svn: 295295
* [AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus ↵Craig Topper2017-02-163-8/+132
| | | | | | intrinsics like it does 128/256-bit. llvm-svn: 295294
* Revert r295284: Add better ODR checking for modules.Richard Trieu2017-02-1614-3051/+97
| | | | | | Fix modules build bot. llvm-svn: 295293
* [FIX] Fix the typo in ScheduleOptimizer.cpp.Roman Gareev2017-02-161-1/+2
| | | | llvm-svn: 295292
* [AVX-512] Replace 512-bit masked packss/packus builtins and replace with new ↵Craig Topper2017-02-163-80/+64
| | | | | | | | unmasked builtins. These new unmasked builtins will enable us to easily support optimizing these builtins in InstCombine in the backend. llvm-svn: 295291
* [AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked ↵Craig Topper2017-02-167-264/+1589
| | | | | | | | intrinsics with select instructions. For 512-bit add new unmasked intrinsics. The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO. llvm-svn: 295290
* Use isRelExprOneOf.Rui Ueyama2017-02-161-6/+6
| | | | llvm-svn: 295289
* Removes a trivial accessor.Rui Ueyama2017-02-164-17/+6
| | | | llvm-svn: 295288
* Do not overload a one-bit variable, NeedsCopyOrPltAddr.Rui Ueyama2017-02-164-20/+23
| | | | | | | | | This patch removes NeedsCopyOrPltAddr and instead add two variables, NeedsCopy and NeedsPltAddr. This uses one more bit in Symbol class, but the actual size doesn't increase because we had unused bits. This should improve code readability. llvm-svn: 295287
OpenPOWER on IntegriCloud