Commit message | Author | Age | Files | Lines
...
* Revert r324835 "[X86] Reduce Store Forward Block issues in HW" | Hans Wennborg | 2018-02-12 | 5 | -1936/+0
  It asserts building Chromium; see PR36346. (This also reverts the follow-up r324836.)
  > If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
  > A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of a store forward block on the Intel Core microarchitecture is that a small store cannot be forwarded to a large load.
  > The estimated penalty for a store forward block is ~13 cycles.
  >
  > This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store.
  >
  > The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies.
  > Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
  llvm-svn: 324887
* [clang-move] Fix the incorrect expansion end location. | Haojian Wu | 2018-02-12 | 2 | -4/+8
  Summary:
  Before the fix, if clang-move decides to move the following macro statement, it only moves the first line `DEFINE(A,`.
  ```
  DEFINE(A,
         B);
  ```
  Reviewers: ioeric
  Reviewed By: ioeric
  Subscribers: klimek, cfe-commits
  Differential Revision: https://reviews.llvm.org/D43174
  llvm-svn: 324886
* [mips] Fix 'l' constraint handling for types smaller than 32 bits | Simon Atanasyan | 2018-02-12 | 3 | -1/+24
  When the 'l' constraint is used correctly, LLVM now generates valid code; otherwise it shows an error message. Previously these cases triggered an assertion.
  This commit is the same as r324869, with the test's file name fixed.
  llvm-svn: 324885
* ASan+operator new[]: Add an option for more thorough operator new[] cookie poisoning | Filipe Cabecinhas | 2018-02-12 | 5 | -1/+29
  Summary:
  Right now clang is skipping array cookie poisoning for any operator new[] which is not part of the set of replaceable global allocation functions.
  This commit adds a flag to tell clang to poison all operator new[] cookies.
  A previous review was poisoning all array cookies unconditionally, but there is an edge case which would stop working under ASan (a custom operator new[] saves whatever pointer it returned, and then accesses it).
  This newer revision adds a command line argument to toggle this feature.
  Original revision: https://reviews.llvm.org/D41301
  Compiler-rt test revision with an explanation of the edge case: https://reviews.llvm.org/D41664
  Reviewers: rjmccall, kcc, rsmith
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D43013
  llvm-svn: 324884
* [clangd] Remove codeComplete that returns std::future<> | Ilya Biryukov | 2018-02-12 | 7 | -49/+115
  Summary:
  It was deprecated, and the callback version is used everywhere.
  Only changes to the testing code were needed.
  Reviewers: hokein, ioeric, sammccall
  Reviewed By: sammccall
  Subscribers: mgorny, klimek, jkorous-apple, cfe-commits
  Differential Revision: https://reviews.llvm.org/D43068
  llvm-svn: 324883
* [mips] Revert rL324869 | Simon Atanasyan | 2018-02-12 | 3 | -24/+1
  That commit adds inlineasm-cnstrnt-bad-l.ll, which clashes with inlineasm-cnstrnt-bad-L.ll on case-insensitive file systems.
  llvm-svn: 324882
* [LoopInterchange] Simplify splitInnerLoopHeader logic (NFC). | Florian Hahn | 2018-02-12 | 1 | -11/+4
  We can use SplitBlock for both cases, which makes the code slightly simpler and updates both LoopInfo and the dominator tree.
  llvm-svn: 324881
* [CodeGen] Add a -trap-unreachable option for debugging | David Green | 2018-02-12 | 4 | -7/+16
  Add a common -trap-unreachable option, similar to the target-specific Hexagon equivalent, which has been replaced. This turns unreachable instructions into traps, which is useful for debugging.
  Differential Revision: https://reviews.llvm.org/D42965
  llvm-svn: 324880
* [libomptarget] Fix detection of CUDA stubs library | Jonas Hahnfeld | 2018-02-12 | 1 | -1/+10
  CUDA_LIBRARIES contains additional linker arguments since CMake 3.3, which breaks the current way of finding the stubs library.
  llvm-svn: 324879
* [CUDA] Add option to generate relocatable device code | Jonas Hahnfeld | 2018-02-12 | 6 | -11/+59
  As a first step, pass '-c/--compile-only' to ptxas so that it doesn't complain about references to external functions. This will successfully generate object files, but they won't work at runtime because the registration routines need to be adapted.
  Differential Revision: https://reviews.llvm.org/D42921
  llvm-svn: 324878
* [CUDA] Fix test cuda-external-tools.cu | Jonas Hahnfeld | 2018-02-12 | 1 | -51/+54
  This didn't verify the CHECK prefix before!
  Differential Revision: https://reviews.llvm.org/D42920
  llvm-svn: 324877
* [gtest] Support raw_ostream printing functions more comprehensively. | Sam McCall | 2018-02-12 | 3 | -41/+79
  Summary:
  These are functions like operator<<(raw_ostream&, Foo).
  Previously these were only supported for messages. In the assertion
    EXPECT_EQ(A, B) << C;
  the local modifications would explicitly try to use raw_ostream printing for C.
  However A and B would look for a std::ostream printing function, and often fall back to gtest's default "168 byte object <00 01 FE 42 ...>".
  This patch pulls out the raw_ostream support into a new header under `custom/`.
  I changed the mechanism: instead of a convertible stream, we wrap the printed value in a proxy object to allow it to be sent to a std::ostream.
  I think the new way is clearer.
  I also changed the policy: we prefer raw_ostream printers over std::ostream ones. This is because the fallback printers are defined using std::ostream, while all the raw_ostream printers should be "good".
  Reviewers: ilya-biryukov, chandlerc
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D43091
  llvm-svn: 324876
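The proxy mechanism described in the gtest commit can be sketched in isolation. This is only an illustration under stated assumptions: `MiniRawStream` is a hypothetical stand-in for `llvm::raw_ostream`, and `RawPrintProxy`, `Point`, and `printViaProxy` are invented names, not the ones used in the patch.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for llvm::raw_ostream: appends text to a string.
class MiniRawStream {
  std::string &Buf;

public:
  explicit MiniRawStream(std::string &B) : Buf(B) {}
  MiniRawStream &operator<<(const std::string &S) { Buf += S; return *this; }
  MiniRawStream &operator<<(int V) { Buf += std::to_string(V); return *this; }
};

// A type that only knows how to print itself to the raw-style stream.
struct Point { int X, Y; };
MiniRawStream &operator<<(MiniRawStream &OS, const Point &P) {
  return OS << "(" << P.X << ", " << P.Y << ")";
}

// The proxy wraps a value so that it can be sent to a std::ostream.
template <typename T> struct RawPrintProxy { const T &Value; };

template <typename T>
std::ostream &operator<<(std::ostream &OS, const RawPrintProxy<T> &P) {
  std::string Buf;
  MiniRawStream RS(Buf);
  RS << P.Value;   // use the raw-style printer
  return OS << Buf; // forward the rendered text to the std::ostream
}

// Render a Point through a std::ostream by way of the proxy.
std::string printViaProxy(const Point &P) {
  std::ostringstream OS;
  OS << RawPrintProxy<Point>{P};
  return OS.str();
}
```

The std::ostream side never needs its own `operator<<` for `Point`; it only sees the proxy, which is the property the commit message relies on.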
* Skip TestTargetXMLArch on non-darwin OSs | Pavel Labath | 2018-02-12 | 1 | -0/+1
  This test uses XML packets, but libxml is an optional dependency of lldb, and this test fails if it is not present.
  I'm leaving this enabled on mac, as that's the only platform that's likely to have libxml always available, but ideally we should have a way to skip this based on build configuration. I'll see if I can whip something like that up soon, but for the time being, this unblocks the buildbots.
  llvm-svn: 324870
* [mips] Fix 'l' constraint handling for types smaller than 32 bits | Simon Atanasyan | 2018-02-12 | 3 | -1/+24
  When the 'l' constraint is used correctly, LLVM now generates valid code; otherwise it shows an error message. Previously these cases triggered an assertion.
  llvm-svn: 324869
* [MC] Issue error message when data region is not terminated | Gerolf Hoflehner | 2018-02-12 | 2 | -1/+15
  llvm-svn: 324868
* [NFC] Fix typos | Max Kazantsev | 2018-02-12 | 1 | -14/+14
  llvm-svn: 324867
* [SCEV] Make getPostIncExpr guaranteed to return AddRec | Max Kazantsev | 2018-02-12 | 3 | -3/+70
  The current implementation of `getPostIncExpr` invokes `getAddExpr` for two recurrences and expects that it always returns a recurrence. But this is not guaranteed to happen if we have reached max recursion depth or refused to make SCEV simplification for other reasons.
  This patch changes its implementation so that now it always returns SCEVAddRec without relying on `getAddExpr`.
  Differential Revision: https://reviews.llvm.org/D42953
  llvm-svn: 324866
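As a rough illustration of why building the result directly is safe, consider only the affine case: a recurrence {Start,+,Step} takes the value Start + N * Step on iteration N, and its post-increment form is {Start+Step,+,Step}. The sketch below is not LLVM code; `AffineAddRec` and its members are hypothetical names, and it does not cover higher-order recurrences.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an affine add recurrence {Start,+,Step}:
// its value on loop iteration N is Start + N * Step.
struct AffineAddRec {
  std::int64_t Start;
  std::int64_t Step;

  std::int64_t valueAt(std::int64_t N) const { return Start + N * Step; }

  // Build the post-increment recurrence directly as {Start+Step,+,Step}.
  // The result is an AffineAddRec by construction, so it does not depend on
  // a general-purpose simplifier choosing to return a recurrence.
  AffineAddRec postInc() const { return AffineAddRec{Start + Step, Step}; }
};
```

For any iteration N, `postInc().valueAt(N)` equals `valueAt(N + 1)`, which is the defining property of the post-increment expression.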
* [X86] Don't look for TEST instruction shrinking opportunities when the root node is a X86ISD::SUB. | Craig Topper | 2018-02-12 | 1 | -10/+3
  I don't believe we ever create an X86ISD::SUB with a 0 constant which is what the TEST handling needs. The ternary operator at the end of this code shows up as only going one way in the llvm-cov report from the bots.
  llvm-svn: 324865
* [X86] Remove check for X86ISD::AND with no flag users from the TEST instruction immediate shrinking code. | Craig Topper | 2018-02-12 | 1 | -2/+1
  We turn X86ISD::AND with no flag users back to ISD::AND in PreprocessISelDAG.
  llvm-svn: 324864
* [X86] Change some compare patterns to use loadi8/loadi16/loadi32/loadi64 helper fragments. | Craig Topper | 2018-02-12 | 4 | -14/+14
  This enables CMP8mi to fold zextloadi8i1 which in all tests allows us to avoid creating a TEST8rr that peephole can't fold.
  llvm-svn: 324863
* [X86] Autogenerate complete checks. NFC | Craig Topper | 2018-02-12 | 1 | -124/+363
  llvm-svn: 324862
* [X86] Add KADD X86ISD opcode instead of reusing ISD::ADD. | Craig Topper | 2018-02-12 | 4 | -1/+7
  ISD::ADD implies individual vector element addition with no carries between elements. But for a vXi1 type that would be the same as XOR. And we already turn ISD::ADD into ISD::XOR for all vXi1 types during lowering. So the ISD::ADD pattern would never be able to match anyway.
  KADD is different: it adds the elements but also propagates a carry between them. This is just a way of doing an add in a k-register without bitcasting to the scalar domain. There's still no way to match the pattern, but at least it's not obviously wrong.
  llvm-svn: 324861
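The difference between the two kinds of addition can be checked on a plain 8-bit mask. The following is a scalar model only, assuming a v8i1 vector is packed into one byte; `lanewiseAddV8i1` and `kaddLike` are hypothetical helper names, not LLVM or intrinsic APIs.

```cpp
#include <cassert>
#include <cstdint>

// Element-wise add of two v8i1 vectors packed into a byte: each 1-bit lane
// is added modulo 2 and no carry crosses into the neighbouring lane.
std::uint8_t lanewiseAddV8i1(std::uint8_t A, std::uint8_t B) {
  std::uint8_t R = 0;
  for (int I = 0; I < 8; ++I) {
    unsigned LaneA = (A >> I) & 1u;
    unsigned LaneB = (B >> I) & 1u;
    R |= static_cast<std::uint8_t>(((LaneA + LaneB) & 1u) << I);
  }
  return R;
}

// Model of a carry-propagating add on the whole mask: an ordinary 8-bit
// integer addition, wrapping modulo 256.
std::uint8_t kaddLike(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>(A + B);
}
```

With both masks equal to 1, the lane-wise add gives 0 (the same as XOR), while the carry-propagating add gives 2, so the two operations cannot share one opcode.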
* [X86] Allow zextload/extload i1->i8 to be folded into instructions during isel | Craig Topper | 2018-02-12 | 2 | -9/+19
  Previously we just emitted this as a MOV8rm which would likely get folded during the peephole pass anyway. This just makes it explicit earlier.
  The gpr-to-mask.ll test changed because the kaddb instruction has no memory form.
  llvm-svn: 324860
* Follow on to rL324854 (Added tests) | Charles Saternos | 2018-02-12 | 1 | -0/+58
  llvm-svn: 324859
* [X86] Remove MASK_BINOP intrinsic type. NFC | Craig Topper | 2018-02-11 | 2 | -10/+1
  llvm-svn: 324858
* [X86] Remove dead code from getMaskNode that looked for a i64 mask with a maskVT that wasn't v64i1. NFC | Craig Topper | 2018-02-11 | 1 | -19/+11
  llvm-svn: 324857
* [X86] Remove LowerBoolVSETCC_AVX512, we get this with a target independent DAG combine now. NFC | Craig Topper | 2018-02-11 | 1 | -49/+1
  llvm-svn: 324856
* Add default C++ ABI libname and include paths for FreeBSD | Dimitry Andric | 2018-02-11 | 1 | -0/+3
  Summary:
  As noted in a discussion about testing the LLVM 6.0.0 release candidates (with libc++) for FreeBSD, many tests turned out to fail with "exception_ptr not yet implemented". This was because libc++ did not choose the correct C++ ABI library, and therefore it fell back to the `exception_fallback.ipp` header.
  Since FreeBSD 10.x, we have been using libcxxrt as our C++ ABI library, and its headers have always been installed in /usr/include/c++/v1, together with the (system) libc++ headers. (Older versions of FreeBSD used GNU libsupc++ by default, but these are now unsupported.)
  Therefore, if we are building libc++ for FreeBSD, set:
  * `LIBCXX_CXX_ABI_LIBNAME` to "libcxxrt"
  * `LIBCXX_CXX_ABI_INCLUDE_PATHS` to "/usr/include/c++/v1"
  by default.
  Reviewers: emaste, EricWF, mclow.lists
  Reviewed By: EricWF
  Subscribers: mgorny, cfe-commits, krytarowski
  Differential Revision: https://reviews.llvm.org/D43166
  llvm-svn: 324855
* [ThinLTO] Add GraphTraits for FunctionSummaries | Charles Saternos | 2018-02-11 | 4 | -2/+181
  Add GraphTraits definitions to the FunctionSummary and ModuleSummaryIndex classes. These GraphTraits will be used to find SCCs in ThinLTO analysis passes.
  llvm-svn: 324854
* Fix libcxx MSVC C++17 redefinition of 'align_val_t' | Eric Fiselier | 2018-02-11 | 1 | -0/+2
  Patch from charlieio@outlook.com
  Reviewed as https://reviews.llvm.org/D42354
  When the following command is used:
  > clang-cl -std:c++17 -Iinclude\c++\v1 hello.cc c++.lib
  An error occurred:
  In file included from hello.cc:1:
  In file included from include\c++\v1\iostream:38:
  In file included from include\c++\v1\ios:216:
  In file included from include\c++\v1\__locale:15:
  In file included from include\c++\v1\string:477:
  In file included from include\c++\v1\string_view:176:
  In file included from include\c++\v1\__string:56:
  In file included from include\c++\v1\algorithm:643:
  In file included from include\c++\v1\memory:656:
  include\c++\v1\new(165,29): error: redefinition of 'align_val_t'
  enum class _LIBCPP_ENUM_VIS align_val_t : size_t { };
                              ^
  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\vcruntime_new.h(43,16): note: previous definition is here
  enum class align_val_t : size_t {};
             ^
  1 error generated.
  vcruntime_new.h already defines align_val_t, so libcxx needs to hide its own definition. This patch fixes that error.
  llvm-svn: 324853
* Mark two issues as complete | Eric Fiselier | 2018-02-11 | 1 | -4/+4
  llvm-svn: 324852
* Fix a typo in the synopsis comment. NFC. Thanks to K-ballo for the catch | Marshall Clow | 2018-02-11 | 1 | -1/+1
  llvm-svn: 324851
* [CodeView] Allow variable names to be as long as the codeview format supports | Brock Wyma | 2018-02-11 | 2 | -4/+20
  Instead of reserving 0xF00 bytes for the fixed length portion of the CodeView symbol name, calculate the actual length of the fixed length portion.
  Differential Revision: https://reviews.llvm.org/D42125
  llvm-svn: 324850
* Revert r324847, there's bot failures. | Kuba Mracek | 2018-02-11 | 6 | -82/+93
  llvm-svn: 324849
* [sanitizer] Implement NanoTime() on Darwin | Kuba Mracek | 2018-02-11 | 1 | -2/+12
  Currently NanoTime() on Darwin is unimplemented and always returns 0. It looks like quite a few things are broken because of that (TSan periodic memory flush, ASan allocator releasing pages back to the OS). Let's fix that.
  Differential Revision: https://reviews.llvm.org/D40665
  llvm-svn: 324847
* [compiler-rt] Replace forkpty with posix_spawn | Kuba Mracek | 2018-02-11 | 6 | -93/+82
  On Darwin, we currently use forkpty to communicate with the "atos" symbolizer. There are several problems with fork and forkpty, e.g. that after fork, interceptors are still active and this sometimes causes crashes or hangs. This is especially problematic for TSan, which uses interceptors for OS-provided locks and mutexes, and even Libc functions use those.
  This patch replaces forkpty with posix_spawn. Since posix_spawn doesn't fork (at least on Darwin), the interceptors are not a problem. Additionally, this also fixes a latent threading problem with ptsname (it's unsafe to use this function in multithreaded programs). Yet another benefit is that we'll handle post-fork failures (e.g. sandbox disallows "exec") gracefully now.
  Differential Revision: https://reviews.llvm.org/D40032
  llvm-svn: 324846
* [X86] Update some required-vector-width.ll test cases to not pass 512-bit vectors in arguments or return. | Craig Topper | 2018-02-11 | 1 | -116/+143
  ABI for these would require 512 bits support so we don't want to test that.
  llvm-svn: 324845
* [X86][SSE] Use SplitBinaryOpsAndApply to recognise PSUBUS patterns before they're split on AVX1 | Simon Pilgrim | 2018-02-11 | 2 | -62/+34
  This needs to be generalised further to support AVX512BW cases but I want to add non-uniform constants first.
  llvm-svn: 324844
* [InstCombine] X / (X * Y) -> 1 / Y if the multiplication does not overflow | Sanjay Patel | 2018-02-11 | 2 | -12/+25
  The related cases for (X * Y) / X were handled in rL124487.
  https://rise4fun.com/Alive/6k9
  The division in these tests is subsequently eliminated by existing instcombines for 1/X.
  llvm-svn: 324843
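The fold and its no-overflow precondition can be checked exhaustively for 8-bit unsigned values. This is a brute-force illustration, not the InstCombine implementation; the function names are hypothetical, and zero operands are skipped because they would divide by zero on one side or the other.

```cpp
#include <cassert>
#include <cstdint>

// Check X / (X * Y) == 1 / Y for every nonzero 8-bit X and Y whose product
// still fits in 8 bits (the "does not overflow" case).
bool foldHoldsWithoutOverflowU8() {
  for (unsigned X = 1; X <= 255; ++X) {
    for (unsigned Y = 1; Y <= 255; ++Y) {
      unsigned Prod = X * Y;
      if (Prod > 255)
        continue; // the 8-bit multiply would overflow; the fold does not apply
      if (X / Prod != 1 / Y)
        return false;
    }
  }
  return true;
}

// Compute X / (X * Y) with an 8-bit multiply that wraps modulo 256.
// The caller must make sure the wrapped product is not zero.
unsigned divByWrappedProductU8(std::uint8_t X, std::uint8_t Y) {
  std::uint8_t Prod = static_cast<std::uint8_t>(X * Y);
  return X / Prod;
}
```

The wrapped case shows why the precondition matters: with X = 32 and Y = 9 the 8-bit product wraps to 32, so the original expression evaluates to 1 while 1 / Y is 0.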
* [X86] Use min/max for vector ult/ugt compares if it avoids a sign flip. | Craig Topper | 2018-02-11 | 7 | -566/+825
  Summary:
  Currently we only use min/max to help with ule/uge compares because it removes an invert of the result that would otherwise be needed. But we can also use it for ult/ugt compares if it will prevent the need for a sign bit flip needed to use pcmpgt, at the cost of requiring an invert after the compare.
  I also refactored the code so that the max/min code is self-contained and does its own return instead of setting up a flag to manipulate the rest of the function's behavior.
  Most of the test cases look ok with this. I did notice that we added instructions when one of the operands being sign flipped is a constant vector that we were able to constant fold the flip into.
  I also noticed that sometimes the SSE min/max clobbers a register that is needed after the compare. This resulted in an extra move being inserted before the min/max to preserve the register. We could try to detect this and switch from min to max and change the compare operands to use the operand that gets reused in the compare.
  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42935
  llvm-svn: 324842
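The identities behind this trade-off can be verified on scalars. The sketch below models a single 8-bit lane rather than the SSE instructions themselves, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// a <=u b  <=>  umin(a, b) == a   (no invert of the result needed)
bool uleViaMin(std::uint8_t A, std::uint8_t B) { return std::min(A, B) == A; }

// a <u b  <=>  !(umax(a, b) == a)   (one invert after the equality compare)
bool ultViaMax(std::uint8_t A, std::uint8_t B) { return !(std::max(A, B) == A); }

// Reinterpret a value in [0, 255] as an 8-bit two's-complement number.
int asSigned8(unsigned U) {
  return U >= 128 ? static_cast<int>(U) - 256 : static_cast<int>(U);
}

// a <u b  <=>  (b ^ 0x80) >s (a ^ 0x80)   (sign flip on both operands, then
// a signed greater-than, which is what a pcmpgt-style compare provides)
bool ultViaSignFlip(std::uint8_t A, std::uint8_t B) {
  return asSigned8(B ^ 0x80u) > asSigned8(A ^ 0x80u);
}

// Exhaustively confirm that all three forms match the plain comparisons.
bool allFormsAgree() {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      std::uint8_t X = static_cast<std::uint8_t>(A);
      std::uint8_t Y = static_cast<std::uint8_t>(B);
      if (uleViaMin(X, Y) != (A <= B)) return false;
      if (ultViaMax(X, Y) != (A < B)) return false;
      if (ultViaSignFlip(X, Y) != (A < B)) return false;
    }
  }
  return true;
}
```

The max-based form needs one min/max, one equality compare, and an invert; the sign-flip form needs two XORs before the signed compare unless a flip can be folded into a constant operand.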
* [X86][SSE] Moved SplitBinaryOpsAndApply earlier so more methods can use it. NFCI. | Simon Pilgrim | 2018-02-11 | 1 | -47/+47
  llvm-svn: 324841
* [InstCombine] add tests for div-mul folds; NFC | Sanjay Patel | 2018-02-11 | 1 | -0/+53
  The related cases for (X * Y) / X were handled in rL124487.
  llvm-svn: 324840
* [TargetLowering] try to create -1 constant operand for math ops via demanded bits | Sanjay Patel | 2018-02-11 | 4 | -7/+25
  This reverses instcombine's demanded bits' transform which always tries to clear bits in constants.
  As noted in PR35792 and shown in the test diffs:
  https://bugs.llvm.org/show_bug.cgi?id=35792
  ...we can do better in codegen by trying to form -1. The x86 sub test shows a missed opportunity.
  I did investigate changing instcombine's behavior, but it would be more work to change canonicalization in IR. Clearing bits / shrinking constants can allow killing instructions, so we'd have to figure out how to not regress those cases.
  Differential Revision: https://reviews.llvm.org/D42986
  llvm-svn: 324839
* [X86] Add PR33747 test case | Simon Pilgrim | 2018-02-11 | 1 | -0/+29
  llvm-svn: 324838
* [X86][SSE] Enable SMIN/SMAX/UMIN/UMAX custom lowering for all legal types | Simon Pilgrim | 2018-02-11 | 18 | -12642/+10760
  This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT.
  Differential Revision: https://reviews.llvm.org/D43014
  llvm-svn: 324837
* fix test/CodeGen/X86/fixup-sfb.ll test failure after commit https://reviews.llvm.org/rL324835 | Lama Saba | 2018-02-11 | 1 | -8/+8
  Change-Id: I2526c2f342654e85ce054237de03ae9db9ab4994
  llvm-svn: 324836
* [X86] Reduce Store Forward Block issues in HW | Lama Saba | 2018-02-11 | 5 | -0/+1936
  If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
  A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of a store forward block on the Intel Core microarchitecture is that a small store cannot be forwarded to a large load.
  The estimated penalty for a store forward block is ~13 cycles.
  This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store.
  The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies.
  Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
  Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748
  llvm-svn: 324835
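A source-level pattern that could lead to the situation described above might look like the following. This is a hedged sketch: `Block16` and `updateAndCopy` are hypothetical names, and whether a given compiler actually lowers the memcpy to a single 16-byte XMM load and store depends on the target and optimization settings.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Block16 {
  std::uint32_t Word[4]; // 16 bytes, the width of one XMM register
};

// A 4-byte store immediately followed by a 16-byte copy of the same memory.
// If the memcpy is lowered to one 16-byte load plus one 16-byte store, that
// load reads bytes the narrow store has just written: the small-store then
// large-load case that the commit message calls a store forward block.
void updateAndCopy(Block16 *Dst, Block16 *Src, std::uint32_t V) {
  Src->Word[1] = V;
  std::memcpy(Dst, Src, sizeof(Block16));
}
```

Splitting the copy so that the bytes written by the narrow store are reloaded by a load of matching size would leave the result unchanged, which is why breaking the memcpy into smaller copies is a legal transformation here.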
* [X86] Don't make 512-bit vectors legal when preferred vector width is 256 bits and 512 bits aren't required | Craig Topper | 2018-02-11 | 5 | -25/+705
  This patch adds a new function attribute "required-vector-width" that can be set by the frontend to indicate the maximum vector width present in the original source code. The idea is that this would be set based on ABI requirements, intrinsics or explicit vector types being used, maybe simd pragmas, etc. The backend will then use this information to determine if it's safe to make 512-bit vectors illegal when the preference is for 256-bit vectors.
  For code that has no vectors in it originally and only gets vectors through the loop and slp vectorizers, this allows us to generate code largely similar to our AVX2-only output while still enabling AVX512 features like mask registers and gather/scatter. The loop vectorizer doesn't always obey TTI and will create oversized vectors with the expectation that the backend will legalize them. In order to avoid changing the vectorizer and potentially harming our AVX2 codegen, this patch tries to make the legalizer behavior similar.
  This is restricted to CPUs that support AVX512F and AVX512VL so that we have good fallback options to use 128 and 256-bit vectors and still get masking.
  I've qualified every place I could find in X86ISelLowering.cpp and added test cases for many of them with 2 different values for the attribute to see the codegen differences.
  We still need to do frontend work for the attribute and teach the inliner how to merge it, etc. But this gets the codegen layer ready for it.
  Differential Revision: https://reviews.llvm.org/D42724
  llvm-svn: 324834
* [X86] Remove setOperationAction lines for promoting vXi1 SINT_TO_FP/UINT_TO_FP. | Craig Topper | 2018-02-11 | 1 | -26/+0
  We promote these via a DAG combine now before lowering gets the chance.
  Also remove the v2i1 custom handling since it will no longer be triggered.
  llvm-svn: 324833
* [SelectionDAG] Remove TargetLowering::getConstTrueVal. Use SelectionDAG::getBoolConstant in the one place it was used. | Craig Topper | 2018-02-11 | 3 | -16/+3
  SelectionDAG::getBoolConstant was recently introduced. At the time I didn't know getConstTrueVal existed, but I think getBoolConstant is better as it will use the source VT to make sure it can properly detect floating point if it is configured differently.
  llvm-svn: 324832