Commit message    Author    Age    Files    Lines
...
* Fix test added in r321992 failing on some buildbots (again), test requires x86.Sean Eveson2018-01-081-0/+2
| | | | llvm-svn: 322000
* [CodeGen] Fix TBAA info for accesses to members of base classesIvan A. Kosarev2018-01-082-2/+59
| | | | | | | | | | | Resolves: Bug 35724 - regression (r315984): fatal error: error in backend: Broken function found (Did not see access type in access path!) https://bugs.llvm.org/show_bug.cgi?id=35724 Differential Revision: https://reviews.llvm.org/D41547 llvm-svn: 321999
* [InstCombine] fold min/max tree with common operand (PR35717)Sanjay Patel2018-01-083-28/+74
| There is precedent for factorization transforms in instcombine for FP ops with fast-math. We also have similar logic in foldSPFofSPF().
|
| It would take more work to add this to reassociate because that's specialized for binops, and min/max are not binops (or even single instructions). Also, I don't have evidence that larger min/max trees than this exist in real code, but if we find that's true, we might want to reorganize where/how we do this optimization.
|
| In the motivating example from https://bugs.llvm.org/show_bug.cgi?id=35717 , we have:
|
| int test(int xc, int xm, int xy) {
|   int xk;
|   if (xc < xm)
|     xk = xc < xy ? xc : xy;
|   else
|     xk = xm < xy ? xm : xy;
|   return xk;
| }
|
| This patch solves that problem because we recognize more min/max patterns after rL321672
|
| https://rise4fun.com/Alive/Qjne
| https://rise4fun.com/Alive/3yg
|
| Differential Revision: https://reviews.llvm.org/D41603
|
| llvm-svn: 321998
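The identity behind the fold can be checked at the source level. The following is a minimal sketch, not the InstCombine code from the patch: the names `min3_branchy`, `min3_factored`, and `shapes_agree` are made up for illustration, and the check only covers a small integer range.

```cpp
#include <algorithm>

// Shape from the motivating example: one compare selects between two
// two-operand minimums that share the operand xy.
int min3_branchy(int xc, int xm, int xy) {
    int xk;
    if (xc < xm)
        xk = xc < xy ? xc : xy;
    else
        xk = xm < xy ? xm : xy;
    return xk;
}

// Factored shape: the common operand xy is pulled out of the tree.
int min3_factored(int xc, int xm, int xy) {
    return std::min(std::min(xc, xm), xy);
}

// Exhaustively compare both shapes for all triples in [lo, hi].
bool shapes_agree(int lo, int hi) {
    for (int a = lo; a <= hi; ++a)
        for (int b = lo; b <= hi; ++b)
            for (int c = lo; c <= hi; ++c)
                if (min3_branchy(a, b, c) != min3_factored(a, b, c))
                    return false;
    return true;
}
```

Calling `shapes_agree(-4, 4)` walks every triple in that range; it is a sanity check on the algebra, not a proof over all of `int`.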
* Avoid assumption that lit tests are writable. NFCSam McCall2018-01-086-6/+6
| | | | llvm-svn: 321997
* [ARM] Fix PR35379 - incorrect unwind information when compiling with -OzMomchil Velikov2018-01-083-4/+72
| | | | | | | | | | The patch makes the unwind information not mention registers, which were pushed solely for the purpose of saving stack adjustment instructions. Differential revision: https://reviews.llvm.org/D41300 Fixes https://bugs.llvm.org/show_bug.cgi?id=35379 llvm-svn: 321996
* Fix test added in r321992 failing on some buildbots.Sean Eveson2018-01-081-2/+2
| | | | llvm-svn: 321995
* [SLP] Fix PR35777: Incorrect handling of aggregate values.Alexey Bataev2018-01-086-94/+95
| Summary:
| Fixes the bug with incorrect handling of InsertValue|InsertElement instructions in SLP vectorizer. Currently, we may use incorrect ExtractElement instructions as the operands of the original InsertValue|InsertElement instructions.
|
| Reviewers: mkuper, hfinkel, RKSimon, spatel
|
| Subscribers: llvm-commits
|
| Differential Revision: https://reviews.llvm.org/D41767
|
| llvm-svn: 321994
* [SLP] Fix PR35628: Count external uses on extra reduction arguments.Alexey Bataev2018-01-083-1/+138
| | | | | | | | | | | | | | Summary: If the vectorized value is marked as extra reduction argument, its users are not considered as external users. Patch fixes this. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41786 llvm-svn: 321993
* [Driver] Add flag enabling the function stack size section that was added in ↵Sean Eveson2018-01-087-0/+28
| | | | | | | | | | r319430 Adds the -fstack-size-section flag to enable the .stack_sizes section. The flag defaults to on for the PS4 triple. Differential Revision: https://reviews.llvm.org/D40712 llvm-svn: 321992
* [DAGCombine] Fix for PR35761Sam Parker2018-01-082-4/+46
| | | | | | | | | | | I had falsely assumed that constant operands would be operand(1) of the bin ops that may need their constant operand to be masked. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=35761 Differential Revision: https://reviews.llvm.org/D41667 llvm-svn: 321991
* [SystemZ] Comment fix in SystemZElimCompare.cppJonas Paulsson2018-01-081-5/+2
| | | | | | | NFC Review: Ulrich Weigand llvm-svn: 321990
* [ARM] Fix PR35481Momchil Velikov2018-01-082-5/+38
| | | | | | | | | | | | This patch allows `r7` to be used, regardless of its use as a frame pointer, as a temporary register when popping `lr`, and also falls back to using a high temporary register if, for some reason, we weren't able to find a suitable low one. Differential revision: https://reviews.llvm.org/D40961 Fixes https://bugs.llvm.org/show_bug.cgi?id=35481 llvm-svn: 321989
* [X86] Renamed CodeGen testSam Parker2018-01-081-0/+0
| | | | llvm-svn: 321988
* [X86] Remove side-effects from determineCalleeSavesFrancis Visoiu Mistrih2018-01-081-28/+27
| (Target)FrameLowering::determineCalleeSaves can be called multiple times. I don't think it should have side-effects such as creating stack objects and setting global MachineFunctionInfo state as it is doing today (in other back-ends as well).
|
| This moves the creation of stack objects from determineCalleeSaves to assignCalleeSavedSpillSlots.
|
| Differential Revision: https://reviews.llvm.org/D41703
|
| llvm-svn: 321987
* [ELF] Compress debug sections after assignAddresses and support custom layoutJames Henderson2018-01-086-15/+68
| Previously, in r320472, I moved the calculation of section offsets and sizes for compressed debug sections into maybeCompress, which happens before assignAddresses, so that the compression had the required information. However, I failed to take account of relocations that patch such sections. This had two effects:
|
| 1. A race condition existed when a debug section referred to a different debug section (see PR35788).
| 2. References to symbols in non-debug sections would be patched incorrectly. This is because the addresses of such symbols are not calculated until after assignAddresses (this was a partial regression caused by r320472, but they could still have been broken before, in the event that a custom layout was used in a linker script).
|
| assignAddresses does not need to know about the output section size of non-allocatable sections, because they do not affect the value of Dot. This means that there is no longer a reason not to support custom layout of compressed debug sections, as far as I'm aware. These two points allow for delaying when maybeCompress can be called, removing the need for the loop I previously added to calculate the section size, and therefore the race condition. Furthermore, by delaying, we fix the issues of relocations getting incorrect symbol values, because they have now all been finalized.
|
| llvm-svn: 321986
* [X86] Replace CVT2MASK ISD opcode with PCMPGTM compared to zero.Craig Topper2018-01-088-146/+218
| | | | | | CVT2MASK is just checking the sign bit which can be represented with a comparison with zero. llvm-svn: 321985
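The equivalence the message relies on can be illustrated in scalar code: for a signed lane, the sign bit is set exactly when `0 > x`. This is only a sketch of that observation; the helper names and the 32-bit lane width are assumptions chosen for illustration, not the X86 lowering itself.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build a lane mask by copying each element's sign bit (what CVT2MASK is
// described as doing).
template <std::size_t N>
std::uint32_t mask_from_sign_bits(const std::array<std::int32_t, N>& v) {
    static_assert(N <= 32, "mask holds at most 32 lanes");
    std::uint32_t m = 0;
    for (std::size_t i = 0; i < N; ++i)
        m |= (static_cast<std::uint32_t>(v[i]) >> 31) << i;
    return m;
}

// Build the same mask with a signed "zero greater than element" compare.
template <std::size_t N>
std::uint32_t mask_from_gt_zero(const std::array<std::int32_t, N>& v) {
    static_assert(N <= 32, "mask holds at most 32 lanes");
    std::uint32_t m = 0;
    for (std::size_t i = 0; i < N; ++i)
        if (0 > v[i])
            m |= std::uint32_t{1} << i;
    return m;
}
```

Both helpers return the same mask for any input, which is why a separate opcode for the sign-bit extraction is not needed.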
* [X86] Add patterns to allow 512-bit BWI compare instructions to be used for ↵Craig Topper2018-01-083-56/+73
| | | | | | 128/256-bit compares when VLX is not available. llvm-svn: 321984
* [COFF] Initialize ErrorHandler with CanExitEarly valueShoaib Meenai2018-01-081-0/+1
| | | | | | | | | | | | | Previously, the COFF driver would call exit(1) from the ErrorHandler in the case of a link error, even if CanExitEarly=false was specified. Now it initializes the ErrorHandler in the same way that the ELF driver does. Patch by Andrew Kelley. Differential Revision: https://reviews.llvm.org/D41803 llvm-svn: 321983
* [ELF] Drop unnecessary VersionId setting in scanShlibUndefinedShoaib Meenai2018-01-082-6/+19
| | | | | | | | | | | | | | | | | | | | | | | LLD previously used to handle dynamic lists and version scripts in the exact same way, even though they have very different semantics for shared libraries and subtly different semantics for executables. r315114 untangled their semantics for executables (building on previous work to correct their semantics for shared libraries). With that change, dynamic lists won't set the default version to VER_NDX_LOCAL, and so resetting the version to VER_NDX_GLOBAL in scanShlibUndefined is unnecessary. This was causing an issue because version scripts containing `local: *` work by setting the default version to VER_NDX_LOCAL, but scanShlibUndefined would override this default, and therefore symbols which should have been local would end up in the dynamic symbol table, which differs from both bfd and gold's behavior. gold silently keeps the symbol hidden in such a scenario, whereas bfd issues an error. I prefer bfd's behavior and plan to implement that in LLD in a follow-up (and the test case added here will be updated accordingly). Differential Revision: https://reviews.llvm.org/D41639 llvm-svn: 321982
* Don't try to run MCJIT/OrcJIT EH tests when C++ library is statically linkedPetr Hosek2018-01-089-0/+38
| These tests assume availability of external symbols provided by the C++ library, but those won't be available when the C++ library is statically linked because lli itself doesn't need these.
|
| This uses llvm-readobj -needed-libs to check if C++ library is linked as shared library and exposes that information as a feature to lit.
|
| Differential Revision: https://reviews.llvm.org/D41272
|
| llvm-svn: 321981
* [llvm-readobj] Support -needed-libs option for Mach-O filesPetr Hosek2018-01-082-0/+56
| | | | | | | | This implements the -needed-libs option in Mach-O dumper. Differential Revision: https://reviews.llvm.org/D41527 llvm-svn: 321980
* [X86] Simplify some code in lower1BitVectorShuffle by relying on getNode's ↵Craig Topper2018-01-071-15/+2
| | | | | | ability to constant fold vector SIGN_EXTEND. llvm-svn: 321979
* [X86] Add VSHUFF32X4 and similar instructions to load folding tables.Craig Topper2018-01-075-0/+170
| | | | llvm-svn: 321978
* Remove bogus check for template specialization from self-comparison warning.Richard Smith2018-01-072-14/+18
| | | | | | | The important check is that we're not within a template *instantiation*, which we check separately. llvm-svn: 321977
* Fix a couple of wrong self-comparison diagnostics.Richard Smith2018-01-073-4/+13
| | | | | | | | Check whether we are comparing the same entity, not merely the same declaration, and don't assume that weak declarations resolve to distinct entities. llvm-svn: 321976
* Revert "[SCCP] Manually fold branches on undef."Davide Italiano2018-01-071-26/+3
| | | | | | | I thought this was responsible for PR35723, but I was wrong, the issue lies elsewhere. Revert while I debug. llvm-svn: 321975
* [SLPVectorizer] Reintroduce std::stable_sort(properlyDominates()).Davide Italiano2018-01-072-162/+23
| | | | | | | | The approach was never discussed, I wasn't able to reproduce this non-determinism, and the original author went AWOL. After a discussion on the ML, Philip suggested to revert this. llvm-svn: 321974
* Add tests for three-way self- and array comparison.Richard Smith2018-01-071-0/+8
| | | | llvm-svn: 321973
* Factor out common tautological comparison code from scalar and vector ↵Richard Smith2018-01-074-105/+119
| | | | | | | | compare checking. In passing, improve vector compare diagnostic to match scalar compare diagnostic. llvm-svn: 321972
* [X86] Revert accidental change to CMakeLists.txt in r321952Craig Topper2018-01-071-1/+3
| | | | | | I had removed the qualifiers around the autogenerated folding table so I could compare with the manual table, but didn't intend to commit the change. llvm-svn: 321971
* X86 Tests: Add Tests for PMADDWD selection. NFC.Zvi Rackover2018-01-071-99/+373
| | | | | | Support for ISel to be added. llvm-svn: 321970
* [DAG] Fix for Bug PR34620 - Allow SimplifyDemandedBits to look through bitcastsSimon Pilgrim2018-01-075-134/+92
| | | | | | | | | | Allow SimplifyDemandedBits to use TargetLoweringOpt::computeKnownBits to look through bitcasts. This can help simplifying in some cases where bitcasts of constants generated during or after legalization can't be folded away, and thus didn't get picked up by SimplifyDemandedBits. This fixes PR34620, where a redundant pand created during legalization from lowering and lshr <16xi8> wasn't being simplified due to the presence of a bitcasted build_vector as an operand. Committed on the behalf of @sameconrad (Sam Conrad) Differential Revision: https://reviews.llvm.org/D41643 llvm-svn: 321969
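Why such a `pand` is redundant can be shown with known bits on a single 8-bit lane. The sketch below is an assumption-laden illustration: the shift amount of 4 and the mask `0x0F` are picked for the example and are not taken from PR34620, and the function names are hypothetical.

```cpp
#include <cstdint>

// Logical shift right by 4 on an 8-bit lane: the top four bits of the
// result are known to be zero.
std::uint8_t lshr4(std::uint8_t x) {
    return static_cast<std::uint8_t>(x >> 4);
}

// The same shift followed by an AND that only clears bits already known
// to be zero.
std::uint8_t lshr4_masked(std::uint8_t x) {
    return static_cast<std::uint8_t>((x >> 4) & 0x0F);
}

// Check all 256 inputs: the AND never changes the result.
bool mask_is_redundant() {
    for (int x = 0; x <= 0xFF; ++x) {
        const std::uint8_t b = static_cast<std::uint8_t>(x);
        if (lshr4(b) != lshr4_masked(b))
            return false;
    }
    return true;
}
```

A demanded-bits analysis can only drop the AND if it can see the mask constant, which is the part that a bitcast in front of the constant would hide.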
* [X86] Remove unneeded code from combineGatherScatter that used to delete ↵Craig Topper2018-01-071-11/+1
| | | | | | | | SIGN_EXTEND_INREG nodes created during legalization of v2i1/v4i1 masks on KNL. v2i1/v4i1 are now legal on KNL so no sign_extend_inreg is generated. llvm-svn: 321968
* [X86] Make v2i1 and v4i1 legal types without VLXCraig Topper2018-01-0723-8229/+4687
| Summary:
| There are a few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask we have to fill with 0s at the promoted type.
|
| It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway.
|
| This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.
|
| We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.
|
| I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.
|
| There's definitely room for improvement with some follow up patches.
|
| Reviewers: RKSimon, zvi, guyblank
|
| Reviewed By: RKSimon
|
| Subscribers: llvm-commits
|
| Differential Revision: https://reviews.llvm.org/D41560
|
| llvm-svn: 321967
* Mark the transparent version set::count() as const. Thanks to Ivan Matek for ↵Marshall Clow2018-01-071-1/+1
| | | | | | the bug report. llvm-svn: 321966
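A small example of the kind of call that needs the transparent `count()` overload to be `const`: a heterogeneous lookup on a `const` set. This is a sketch assuming C++14 and a `std::less<>` comparator; the function name `count_in_const_set` is hypothetical, and the original bug report is not reproduced here.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>

// The set is accessed through a const reference and the key is a
// const char*, so overload resolution picks the templated (transparent)
// count(). That overload has to be const-qualified for this to compile.
std::size_t count_in_const_set(const std::set<std::string, std::less<>>& s,
                               const char* key) {
    return s.count(key);
}
```

With a non-transparent comparator the same call would still compile, because the key would be converted to `std::string` and the ordinary `count(const key_type&) const` would be used instead.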
* Correct types of pointers to doacross_num_doneJonas Hahnfeld2018-01-071-3/+3
| | | | | | | | | | This field is defined as kmp_int32, so we should use neither pointers to kmp_int64 nor 64 bit atomic instructions. (Found while testing on a Raspberry Pi, 32 bit ARM) Differential Revision: https://reviews.llvm.org/D41656 llvm-svn: 321964
* Add pre-C++11 is_constructible wrappers for 3 argumentsDimitry Andric2018-01-072-4/+70
| Summary:
| After rL319736 for D28253 (which fixes PR28929), gcc cannot compile `<memory>` anymore in pre-C++11 modes, complaining:
|
| ```
| In file included from /usr/include/c++/v1/memory:648:0,
|                  from test.cpp:1:
| /usr/include/c++/v1/memory: In static member function 'static std::__1::shared_ptr<_Tp> std::__1::shared_ptr<_Tp>::make_shared(_A0&, _A1&, _A2&)':
| /usr/include/c++/v1/memory:4365:5: error: wrong number of template arguments (4, should be at least 1)
|      static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in make_shared" );
|      ^
| In file included from /usr/include/c++/v1/memory:649:0,
|                  from test.cpp:1:
| /usr/include/c++/v1/type_traits:3198:29: note: provided for 'template<class _Tp, class _A0, class _A1> struct std::__1::is_constructible'
|  struct _LIBCPP_TEMPLATE_VIS is_constructible
|                              ^~~~~~~~~~~~~~~~
| In file included from /usr/include/c++/v1/memory:648:0,
|                  from test.cpp:1:
| /usr/include/c++/v1/memory:4365:5: error: template argument 1 is invalid
|      static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in make_shared" );
|      ^
| /usr/include/c++/v1/memory: In static member function 'static std::__1::shared_ptr<_Tp> std::__1::shared_ptr<_Tp>::allocate_shared(const _Alloc&, _A0&, _A1&, _A2&)':
| /usr/include/c++/v1/memory:4444:5: error: wrong number of template arguments (4, should be at least 1)
|      static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in allocate_shared" );
|      ^
| In file included from /usr/include/c++/v1/memory:649:0,
|                  from test.cpp:1:
| /usr/include/c++/v1/type_traits:3198:29: note: provided for 'template<class _Tp, class _A0, class _A1> struct std::__1::is_constructible'
|  struct _LIBCPP_TEMPLATE_VIS is_constructible
|                              ^~~~~~~~~~~~~~~~
| In file included from /usr/include/c++/v1/memory:648:0,
|                  from test.cpp:1:
| /usr/include/c++/v1/memory:4444:5: error: template argument 1 is invalid
|      static_assert((is_constructible<_Tp, _A0, _A1, _A2>::value), "Can't construct object in allocate_shared" );
|      ^
| ```
|
| This is also reported in https://bugs.freebsd.org/224946 (FreeBSD is apparently one of the very few projects that regularly builds programs against libc++ with gcc).
|
| The reason is that the static assertions are invoking `is_constructible` with three arguments, while gcc does not have the built-in `is_constructible` feature, and the pre-C++11 `is_constructible` wrappers in `<type_traits>` only provide up to two arguments.
|
| I have added additional wrappers for three arguments, modified the `is_constructible` entry point to take three arguments instead, and added a simple test to is_constructible.pass.cpp.
|
| Reviewers: EricWF, mclow.lists
|
| Reviewed By: EricWF
|
| Subscribers: krytarowski, cfe-commits, emaste
|
| Differential Revision: https://reviews.llvm.org/D41805
|
| llvm-svn: 321963
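For reference, this is the shape of the three-argument query that the static assertions make. The sketch uses the standard variadic `std::is_constructible` from C++11, not the pre-C++11 libc++ wrappers the patch extends, and `Widget` is a hypothetical type introduced only for the example.

```cpp
#include <type_traits>

// Hypothetical type with a three-argument constructor.
struct Widget {
    Widget(int, double, const char*) {}
};

// Target type plus three argument types: four template arguments in total,
// which is the count gcc rejects in the errors quoted above.
static_assert(std::is_constructible<Widget, int, double, const char*>::value,
              "Widget should be constructible from three arguments");

// Dropping one argument no longer matches any constructor.
static_assert(!std::is_constructible<Widget, int, double>::value,
              "Widget should not be constructible from two arguments");

// The make_shared overloads take their arguments by lvalue reference.
constexpr bool three_arg_ok =
    std::is_constructible<Widget, int&, double&, const char*&>::value;
```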
* [LV][VPlan] NFC patch to move LoopVectorizationPlanner class out of ↵Hal Finkel2018-01-074-271/+269
| LoopVectorize.cpp
|
| Another small step forward to move VPlan stuff outside of LoopVectorize.cpp.
|
| VPlanBuilder.h is renamed to LoopVectorizationPlanner.h
|
| LoopVectorizationPlanner class is moved from LoopVectorize.cpp to LoopVectorizationPlanner.h
|
| LoopVectorizationCostModel::VectorizationFactor class is moved to LoopVectorizationPlanner.h (used by the planner class) --- this needs further streamlining work in later patches and thus all I did was take it out of the CostModel class and move it to the header file.
|
| The callback function had to stay inside LoopVectorize.cpp since it calls an InnerLoopVectorizer member function declared in it.
|
| Next Steps: Make InnerLoopVectorizer, LoopVectorizationCostModel, and other classes more modular and more aligned with VPlan direction, in small increments.
|
| Previous step was: r320900 (https://reviews.llvm.org/D41045)
|
| Patch by Hideki Saito, thanks!
|
| Differential Revision: https://reviews.llvm.org/D41420
|
| llvm-svn: 321962
* [CodeExtractor] Use subset of function attributes for extracted function.Florian Hahn2018-01-072-4/+159
| In addition to target-dependent attributes, we can also preserve a white-listed subset of target independent function attributes. The white-list excludes problematic attributes, most prominently:
|
| * attributes related to memory accesses, as alloca instructions could be moved in/out of the extracted block
| * control-flow dependent attributes, like no_return or thunk, as the relevant instructions might or might not get extracted.
|
| Thanks @efriedma and @aemerson for providing a set of attributes that cannot be propagated.
|
| Reviewers: efriedma, davidxl, davide, silvas
|
| Reviewed By: efriedma
|
| Differential Revision: https://reviews.llvm.org/D41334
|
| llvm-svn: 321961
* Remove outdated doxygen comment [-Wdocumentation]Benjamin Kramer2018-01-071-4/+0
| | | | | | No functionality change. llvm-svn: 321960
* [PowerPC] Add an ISD::TRUNCATE to the legalization for ↵Craig Topper2018-01-072-14/+3
| | | | | | | | | | | | | | | | | | | | | ppc_is_decremented_ctr_nonzero Summary: I believe legalization is really expecting that ReplaceNodeResults will return something with the same type as the thing that's being legalized. Ultimately, it uses the output to replace the uses in the DAG so the type should match to make that work. There are two relevant cases here. When crbits are enabled, then i1 is a legal type and getSetCCResultType should return i1. In this case, the truncate will be between i1 and i1 and should be removed (SelectionDAG::getNode does this). Otherwise, getSetCCResultType will be i32 and the legalizer will promote the truncate to be i32 -> i32 which will be similarly removed. With this fixed we can remove some code from PromoteIntRes_SETCC that seemed to only exist to deal with the intrinsic being replaced with a larger type without changing the other operand. With the truncate being used for connectivity this doesn't happen anymore. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: nemanjai, llvm-commits, kbarton Differential Revision: https://reviews.llvm.org/D41654 llvm-svn: 321959
* [X86] Add the 16 and 8-bit CRC32 instructions to the load folding tables.Craig Topper2018-01-073-9/+19
| | | | llvm-svn: 321958
* Simplify the internal API for checking whether swiftcall passes a type ↵John McCall2018-01-074-19/+22
| | | | | | indirectly and expose that API externally. llvm-svn: 321957
* [X86] Correct the load folding flags for xmm fp->mmx conversion instructions.Craig Topper2018-01-071-4/+4
| | | | | | The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment. llvm-svn: 321956
* [X86] Add TB_NO_REVERSE to some scalar intrinsic instructions in the load ↵Craig Topper2018-01-071-4/+4
| | | | | | folding table. llvm-svn: 321955
* [X86] Don't put any EVEX_B instructions in the tablegen generated load ↵Craig Topper2018-01-072-2/+4
| | | | | | | | folding tables. EVEX_B means different things for memory and register forms. The instructions should not be considered equivalent. llvm-svn: 321954
* [X86] Add 128 and 256-bit VPOPCNTD/Q instructions to load folding tables.Craig Topper2018-01-073-2/+86
| | | | llvm-svn: 321953
* [X86] Add some 8 and 16-bit instructions to the load folding tables.Craig Topper2018-01-072-3/+7
| | | | llvm-svn: 321952
* [X86] Add EVEX vcvtph2ps to the load folding tables.Craig Topper2018-01-072-1/+13
| | | | llvm-svn: 321951
* [X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex ↵Craig Topper2018-01-073-11/+12
| versions of cvtps2ph to the store folding tables.
|
| The memory form of the xmm->xmm version only writes 64-bits. If we use it in the folding tables and it gets used for a stack spill, only half the slot will be written. Then a reload may read all 128-bits which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros.
|
| llvm-svn: 321950
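The hazard described above can be modelled with plain byte buffers. This is a rough simulation under stated assumptions, not the X86 backend: `Reg128`, `store_low64`, `reload128`, and the `0xAA` fill standing in for stale stack contents are all made up for illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Reg128 = std::array<std::uint8_t, 16>;

// Model of a store that writes only the low 64 bits of a 128-bit register.
void store_low64(const Reg128& reg, std::uint8_t* slot) {
    std::memcpy(slot, reg.data(), 8);
}

// Model of a reload that reads the full 128-bit spill slot.
Reg128 reload128(const std::uint8_t* slot) {
    Reg128 r;
    std::memcpy(r.data(), slot, r.size());
    return r;
}

// Spill through a slot that still holds stale data, then reload.
// Returns true only if the register value survives the round trip.
bool spill_roundtrip_preserves(const Reg128& reg) {
    std::uint8_t slot[16];
    std::memset(slot, 0xAA, sizeof slot);  // stale stack contents
    store_low64(reg, slot);
    return reload128(slot) == reg;
}
```

For a register whose upper 64 bits are zero, the round trip returns `false`: the reload picks up the stale upper half of the slot instead of the zeros.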