summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* Fix debug build breakJF Bastien2018-11-261-2/+3
| | | | | | Comment out an assertion from D54543 which failed with error: no member named 'Range' in '(anonymous namespace)::PassAsArgInfo'. llvm-svn: 347616
* Notify the linker when a TU compiled with split-stack has a function without ↵Sterling Augustine2018-11-262-4/+16
| | | | | | | | a prologue. More context here: https://go-review.googlesource.com/c/go/+/148819/ llvm-svn: 347614
* [stack-safety] Inter-Procedural Analysis implementationVitaly Buka2018-11-261-4/+203
| | | | | | | | | | | | | | | | Summary: IPA is implemented as module pass which produce map from Function or Alias to StackSafetyInfo for a single function. From prototype by Evgenii Stepanov and Vlad Tsyrklevich. Reviewers: eugenis, vlad.tsyrklevich, pcc, glider Subscribers: hiraditya, mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D54543 llvm-svn: 347611
* [stack-safety] Empty local passes for Stack Safety Global AnalysisVitaly Buka2018-11-263-0/+48
| | | | | | | | | | Reviewers: eugenis, vlad.tsyrklevich Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D54541 llvm-svn: 347610
* AArch64ISelLowering: Remove a return-of-assignment to allow NRVODavid Blaikie2018-11-261-1/+1
| | | | | | Patch by Arthur O'Dwyer! llvm-svn: 347609
* Add new passes to X86 pipeline testsMircea Trofin2018-11-262-0/+6
| | | | | | | | | | | | | | Summary: Fixes test failures introduced by rL347596. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54916 llvm-svn: 347607
* [X86] Add dependency from X86 to ProfileData after rL347596Fangrui Song2018-11-261-1/+1
| | | | llvm-svn: 347606
* [ICP] Remove incompatible attributes at indirect-call promoted callsites.Xin Tong2018-11-261-2/+27
| | | | | | | | | | | | | | Summary: Removing ncompatible attributes at indirect-call promoted callsites, not removing it results in at least a IR verification error. Reviewers: davidxl, xur, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54913 llvm-svn: 347605
* [InstCombine] add helper function to reduce code duplication; NFCSanjay Patel2018-11-261-24/+19
| | | | llvm-svn: 347604
* [stack-safety] Local analysis implementationVitaly Buka2018-11-261-5/+377
| | | | | | | | | | | | | | | | Summary: Analysis produces StackSafetyInfo which contains information with how allocas and parameters were used in functions. From prototype by Evgenii Stepanov and Vlad Tsyrklevich. Reviewers: eugenis, vlad.tsyrklevich, pcc, glider Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D54504 llvm-svn: 347603
* [stack-safety] Empty local passes for Stack Safety Local AnalysisVitaly Buka2018-11-265-1/+66
| | | | | | | | | | Reviewers: eugenis, vlad.tsyrklevich Subscribers: mgorny, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D54502 llvm-svn: 347602
* [AArch64] Refactor the scheduling predicates (3/3) (NFC)Evandro Menezes2018-11-264-52/+31
| | | | | | | | | Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::hasExtendedReg()`. Differential revision: https://reviews.llvm.org/D54822 llvm-svn: 347599
* [AArch64] Refactor the scheduling predicates (2/3) (NFC)Evandro Menezes2018-11-264-75/+55
| | | | | | | | | Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::hasShiftedReg()`. Differential revision: https://reviews.llvm.org/D54820 llvm-svn: 347598
* [AArch64] Refactor the scheduling predicates (1/3) (NFC)Evandro Menezes2018-11-268-75/+84
| | | | | | | | | Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::isScaledAddr()` Differential revision: https://reviews.llvm.org/D54777 llvm-svn: 347597
* Support for inserting profile-directed cache prefetchesMircea Trofin2018-11-265-0/+372
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Support for profile-driven cache prefetching (X86) This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM. A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the amount of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp. The recommender makes prefetch recommendations in terms of: * the binary offset of an instruction with a memory operand; * a delta; * and a type (nta, t0, t1, t2) meaning: a prefetch of that type should be inserted right before the instrution at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access. For example: 0x400ab2,64,nta and assuming the instruction at 0x400ab2 is: movzbl (%rbx,%rdx,1),%edx means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such: prefetchnta 0x40(%rbx,%rdx,1) movzbl (%rbx, %rdx, 1), %edx The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well): 1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information). 2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept). 3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations. 4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3. Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4. The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism. Reviewers: davidxl, wmi, craig.topper Reviewed By: davidxl, wmi, craig.topper Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D54052 llvm-svn: 347596
* AMDGPU: Record SGPR spills when restoring tooMatt Arsenault2018-11-261-1/+3
| | | | | | | | | | | | It's possible in some cases to have a restore present without a corresponding spill. Due to an apparent bug in D54366 <https://reviews.llvm.org/D54366>, only the restore for a register was emitted. It's probably always a bug for this to happen, but due to how SGPR spilling is implemented, this makes the issues appear worse than it is. llvm-svn: 347595
* [LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use ↵Craig Topper2018-11-262-12/+11
| | | | | | | | | | | | SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512. This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral. Differential Revision: https://reviews.llvm.org/D54906 llvm-svn: 347593
* [ThinLTO] Consolidate cache key computation between new/old LTO APIsTeresa Johnson2018-11-262-92/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The old legacy LTO API had a separate cache key computation, which was a subset of the cache key computation in the new LTO API (from what I can tell this is largely just because certain features such as CFI, dsoLocal, etc are only utilized via the new LTO API). However, having separate computations is unnecessary (much of the code is duplicated), and can lead to bugs when adding new optimizations if both cache computation algorithms aren't updated properly - it's much easier to maintain if we have a single facility. This patch refactors the old LTO API code to use the cache key computation from the new LTO API. To do this, we set up an lto::Config object and fill in the fields that the old LTO was hashing (the others will just use the defaults). There are two notable changes: - I added a Freestanding flag to the LTO Config. Currently this is only used by the legacy LTO API. In the patch that added it (D30791) I had asked about adding it to the new LTO API, but it looks like that was not addressed. This should probably be discussed as a follow up to this change, as it is orthogonal. - The legacy LTO API had some code that was hashing the GUID of all preserved symbols defined in the module. I looked back at the history of this (which was added with the original hashing in the legacy LTO API in D18494), and there is a comment in the review thread that it was added in preparation for future internalization. We now do the internalization of course, and that is handled in the new LTO API cache key computation by hashing the recorded linkage type of all defined globals. Therefore I didn't try to move over and keep the preserved symbols handling. Reviewers: steven_wu, pcc Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, dang, llvm-commits Differential Revision: https://reviews.llvm.org/D54635 llvm-svn: 347592
* [SelectionDAG] Teach BaseIndexOffset::match to unwrap the base after looking ↵Craig Topper2018-11-261-3/+3
| | | | | | | | | | through an add/or We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C) Differential Revision: https://reviews.llvm.org/D54818 llvm-svn: 347591
* [CodeGen] Support custom format of stack mapsThan McIntosh2018-11-265-11/+28
| | | | | | | | | | | | | | | | | | | | | | | Summary: Add a hook to the GCMetadataPrinter for emitting stack maps in custom format. The hook will be called at stack map generation time. The default stack map format is used if there is no hook. For this to be useful a few data structures and accessors are exposed from the StackMaps class, so the custom printer can access the stack map data. This patch authored by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, apilipenko, reames Reviewed By: reames Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D53892 llvm-svn: 347584
* Delete dead code introduced in r347354.Erich Keane2018-11-261-4/+0
| | | | | | | | | ParentTy is never used other than an assignment, and since it is a pointer, there is no side effect. Some versions of GCC notice and warn on this. Change-Id: I37dc1a18c7b58040419afb803621de13d8904a8f llvm-svn: 347581
* AMDGPU: Don't optimize exec masks at -O0Matt Arsenault2018-11-261-1/+2
| | | | llvm-svn: 347573
* AMDGPU: Only add implicit super-reg def for first subregMatt Arsenault2018-11-261-2/+2
| | | | llvm-svn: 347572
* [CodeGen] Take SPAdj into account for STATEPOINT liveness argsThan McIntosh2018-11-261-1/+1
| | | | | | | | | | | | | | | | | | Summary: STATEPOINT records its args' locations on stack relative to SP. If the SP is changed, take that into account. This patch authored by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D53603 llvm-svn: 347569
* Remove an unnecessary file; NFC.Aaron Ballman2018-11-262-18/+0
| | | | | | This source file has not been needed since r346522 and was triggering diagnostics in MSVC about an object file which exports no public symbols (LNK4221). llvm-svn: 347565
* [DemandedBits] Add support for funnel shiftsNikita Popov2018-11-261-0/+21
| | | | | | | | | | | | | | Add support for funnel shifts to the DemandedBits analysis. The demanded bits of the first two operands can be determined if the shift amount is constant. The demanded bits of the third operand (shift amount) can be determined if the bitwidth is a power of two. This is basically the same functionality as implemented in D54869 and D54478, but for DemandedBits rather than InstCombine. Differential Revision: https://reviews.llvm.org/D54876 llvm-svn: 347561
* [x86] promote all multiply i8 by constant to i32Sanjay Patel2018-11-261-26/+37
| | | | | | | | | | | | | | | | | | | | We have these 2 "isDesirable" promotion hooks (I'm not sure why we need both of them, but that's independent of this patch), and we can adjust them to promote "mul i8 X, C" to i32. Then, all of our existing LEA and other multiply expansion magic happens as it would for i32 ops. Some of the test diffs show that we could end up with an actual 32-bit mul instruction here because we choose not to expand to simpler ops. That instruction could be slower depending on the subtarget. On the plus side, this means we don't need a separate instruction to load the constant operand and possibly an extra instruction to move the result. If we need to tune mul i32 further, we could add a later transform that tries to shrink it back to i8 based on subtarget timing. I did not bother to duplicate all of the 32-bit test file RUNs and target settings that exist to test whether LEA expansion is cheap or not. The diffs here assume a default target, so that means LEA is generally cheap. Differential Revision: https://reviews.llvm.org/D54803 llvm-svn: 347557
* [ARM GlobalISel] Support G_CTLZ and G_CTLZ_ZERO_UNDEFDiana Picus2018-11-263-9/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can now select CLZ via the TableGen'erated code, so support G_CTLZ and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32. Legalizer: If the CLZ instruction is available, use it for both G_CTLZ and G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and lower G_CTLZ in terms of it. In order to achieve this we need to add support to the LegalizerHelper for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2). We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF if that is supported as a libcall, as opposed to just if it is Legal or Custom. Due to a minor refactoring of the helper function in charge of this, we will also allow the same behaviour for G_CTTZ and G_CTPOP. This is not going to be a problem in practice since we don't yet have support for treating G_CTTZ and G_CTPOP as libcalls (not even in DAGISel). Reg bank select: Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point. Instruction select: Nothing to do. llvm-svn: 347545
* Fix typo in comment. NFCDiana Picus2018-11-261-1/+1
| | | | llvm-svn: 347544
* [ARM] Prevent parallel macs for unsigned valuesSam Parker2018-11-261-13/+7
| | | | | | | | | | | | Both zext and sext are currently allowed during the search for narrow sequences and sexts operands are later added to the mac candidates. But operands of muls are also added, without checking whether they're sext or zext, which means we can generate a signed smlad when we shouldn't. Differential Revision: https://reviews.llvm.org/D54790 llvm-svn: 347542
* Revert "[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction"Kang Zhang2018-11-261-12/+4
| | | | | | | | This reverts commits r347532. Forget add the option -mtriple powerpc64-unknown-linux-gnu. So other platform is error except for PowerPC. llvm-svn: 347534
* [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instructionKang Zhang2018-11-261-4/+12
| | | | | | | | | | | | | Summary: There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD. These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D54738 llvm-svn: 347532
* [Support/FileSystem] Add sub-second precision for atime/mtime of ↵Argyrios Kyrtzidis2018-11-261-5/+16
| | | | | | | | | | | | | | | | | | | | sys::fs::file_status on unix platforms Summary: getLastAccessedTime() and getLastModificationTime() provided times in nanoseconds but with only 1 second resolution, even when the underlying file system could provide more precise times than that. These changes add sub-second precision for unix platforms that support improved precision. Also add some comments to make sure people are aware that the resolution of times can vary across different file systems. Reviewers: labath, zturner, aaron.ballman, kristina Reviewed By: aaron.ballman, kristina Subscribers: lebedev.ri, mgorny, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D54826 llvm-svn: 347530
* [x86] limit transform for select-of-fp-constantsSanjay Patel2018-11-253-0/+13
| | | | | | | | | | | | This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526
* [IPSCCP] Use input operand instead of OriginalOp for ssa_copy.Florian Hahn2018-11-251-5/+5
| | | | | | | | | | | OriginalOp of a Predicate refers to the original IR value, before renaming. While solving in IPSCCP, we have to use the operand of the ssa_copy instead, to avoid missing updates for nested conditions on the same IR value. Fixes PR39772. llvm-svn: 347524
* [SelectionDAG] move constant or splat functions to common locationSanjay Patel2018-11-252-39/+28
| | | | | | | | rL347502 moved the null sibling, so we should group all of these together. I'm not sure why these aren't methods of the SDValue class itself, but that's another patch if that's possible. llvm-svn: 347523
* [X86] Synchronize a macro in getAvailableFeatures in Host.cpp with the same ↵Craig Topper2018-11-241-3/+3
| | | | | | macro in compiler-rt to fix a negative shift amount warning. llvm-svn: 347518
* [InstCombine] Determine demanded and known bits for funnel shiftsNikita Popov2018-11-241-0/+24
| | | | | | | | | | | | | | Support funnel shifts in InstCombine demanded bits simplification. If the shift amount is constant, we can determine both the demanded bits of the operands, as well as the known bits of the result. If one of the operands has no demanded bits, it will be replaced by undef and the funnel shift will be simplified into a simple shift due to the simplifications added in D54778. Differential Revision: https://reviews.llvm.org/D54869 llvm-svn: 347515
* Revert unapproved commitJoel Jones2018-11-241-154/+4
| | | | llvm-svn: 347511
* [AArch64] Enable libm vectorized functions via SLEEFJoel Jones2018-11-241-4/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changeset is modeled after Intel's submission for SVML. It enables trigonometry functions vectorization via SLEEF: http://sleef.org/. * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF. * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF. * A comprehensive test case is included in this changeset. * In a separate changeset (for clang), a new vectorization library argument is added to -fveclib: -fveclib=SLEEF. Trigonometry functions that are vectorized by sleef: acos asin atan atanh cos cosh exp exp2 exp10 lgamma log10 log2 log sin sinh sqrt tan tanh tgamma Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D53927 llvm-svn: 347510
* [ARM] Add dependency from ARMAsmParser to ARMAsmPrinter after r347494Fangrui Song2018-11-231-1/+1
| | | | | | This fixes -DBUILD_SHARED_LIBS=on llvm-svn: 347506
* [InstCombine] Simplify funnel shift with zero/undef operand to shiftNikita Popov2018-11-231-0/+23
| | | | | | | | | | | | | | | | | | | | The following simplifications are implemented: * `fshl(X, 0, C) -> shl X, C%BW` * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0) * `fshl(0, X, C) -> lshr X, BW-C%BW` * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0) * `fshr(X, 0, C) -> shl X, (BW-C%BW)` * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0) * `fshr(0, X, C) -> lshr X, C%BW` * `fshr(undef, X, C) -> lshr, X, C%BW` (assuming undef = 0) The simplification is only performed if the shift amount C is constant, because we can explicitly compute C%BW and BW-C%BW in this case. Differential Revision: https://reviews.llvm.org/D54778 llvm-svn: 347505
* [DAG] consolidate shift simplificationsSanjay Patel2018-11-232-74/+58
| | | | | | | | | | ...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478. The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes. llvm-svn: 347502
* Revert r347490 as it breaks address sanitizer buildsLuke Cheeseman2018-11-2314-88/+11
| | | | llvm-svn: 347499
* [ARM][AsmParser] Improve debug printing of parsed asm operandsOliver Stannard2018-11-231-19/+39
| | | | | | | | | | | | | | In ARMOperand::print: - Print human-readable register names, instead of numbers. - Print the correct names for IT condition masks (these were in the wrong order before). - Print all parts of memory operands, not just the base register. This makes the output of llvm-mc -show-inst-operands more readable. Differential revision: https://reviews.llvm.org/D54850 llvm-svn: 347494
* Attempt to fix buildbot after r347489Eugene Leviant2018-11-231-1/+1
| | | | llvm-svn: 347492
* Revert r343341Luke Cheeseman2018-11-2314-11/+88
| | | | | | | - Cannot reproduce the build failure locally and the build logs have been deleted. llvm-svn: 347490
* [ThinLTO] Assembly representation of ReadOnly attributeEugene Leviant2018-11-235-24/+75
| | | | | | Differential revision: https://reviews.llvm.org/D54754 llvm-svn: 347489
* Disable LoopSimplifyCFG terminator folding by defaultMax Kazantsev2018-11-231-0/+6
| | | | llvm-svn: 347486
* [LoopSimplifyCFG] Don't delete LCSSA PhisMax Kazantsev2018-11-231-1/+4
| | | | | | | | | | | | When removing edges, we also update Phi inputs and may end up removing a Phi if it has only one input. We should not do it for edges that leave the current loop because these Phis are LCSSA Phis and need to be preserved. Thanks @dmgreen for finding this! Differential Revision: https://reviews.llvm.org/D54841 llvm-svn: 347484
OpenPOWER on IntegriCloud