Commit log for llvm/lib
* [AArch64] Refactor the scheduling predicates (3/3) (NFC) (Evandro Menezes, 2018-11-26; 4 files, -52/+31)
  Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::hasExtendedReg()`.
  Differential revision: https://reviews.llvm.org/D54822
  llvm-svn: 347599
* [AArch64] Refactor the scheduling predicates (2/3) (NFC) (Evandro Menezes, 2018-11-26; 4 files, -75/+55)
  Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::hasShiftedReg()`.
  Differential revision: https://reviews.llvm.org/D54820
  llvm-svn: 347598
* [AArch64] Refactor the scheduling predicates (1/3) (NFC) (Evandro Menezes, 2018-11-26; 8 files, -75/+84)
  Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::isScaledAddr()`.
  Differential revision: https://reviews.llvm.org/D54777
  llvm-svn: 347597
* Support for inserting profile-directed cache prefetches (Mircea Trofin, 2018-11-26; 5 files, -0/+372)
  Summary: Support for profile-driven cache prefetching (X86).
  This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM. A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the number of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp.
  The recommender makes prefetch recommendations in terms of:
  * the binary offset of an instruction with a memory operand;
  * a delta;
  * and a type (nta, t0, t1, t2);
  meaning: a prefetch of that type should be inserted right before the instruction at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access. For example, given the recommendation
    0x400ab2,64,nta
  and assuming the instruction at 0x400ab2 is
    movzbl (%rbx,%rdx,1),%edx
  the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such:
    prefetchnta 0x40(%rbx,%rdx,1)
    movzbl (%rbx,%rdx,1),%edx
  The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well):
  1. Build the binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option enables the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes a ~2% increase in total binary size due to the additional debug information).
  2. Collect memory traces and run the analysis to obtain recommendations (see the above-referenced DynamoRIO demo as a proof of concept).
  3. Use create_llvm_prof to convert the recommendations to reference insertion locations in terms of debug info locations.
  4. Rebuild the binary, using the exact same set of arguments used initially, plus -mllvm -prefetch-hints-file=<file>, pointing at the afdo file obtained at step 3.
  Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4.
  The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism.
  Reviewers: davidxl, wmi, craig.topper
  Reviewed By: davidxl, wmi, craig.topper
  Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54052
  llvm-svn: 347596
* AMDGPU: Record SGPR spills when restoring too (Matt Arsenault, 2018-11-26; 1 file, -1/+3)
  It's possible in some cases to have a restore present without a corresponding spill. Due to an apparent bug in D54366 <https://reviews.llvm.org/D54366>, only the restore for a register was emitted. It's probably always a bug for this to happen, but due to how SGPR spilling is implemented, this makes the issue appear worse than it is.
  llvm-svn: 347595
* [LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT (Craig Topper, 2018-11-26; 2 files, -12/+11)
  SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86, which doesn't support FP_TO_UINT until AVX512.
  This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.
  Differential Revision: https://reviews.llvm.org/D54906
  llvm-svn: 347593
* [ThinLTO] Consolidate cache key computation between new/old LTO APIs (Teresa Johnson, 2018-11-26; 2 files, -92/+23)
  Summary: The old legacy LTO API had a separate cache key computation, which was a subset of the cache key computation in the new LTO API (from what I can tell this is largely just because certain features such as CFI, dsoLocal, etc. are only utilized via the new LTO API). However, having separate computations is unnecessary (much of the code is duplicated), and can lead to bugs when adding new optimizations if both cache computation algorithms aren't updated properly - it's much easier to maintain if we have a single facility. This patch refactors the old LTO API code to use the cache key computation from the new LTO API. To do this, we set up an lto::Config object and fill in the fields that the old LTO was hashing (the others will just use the defaults).
  There are two notable changes:
  - I added a Freestanding flag to the LTO Config. Currently this is only used by the legacy LTO API. In the patch that added it (D30791) I had asked about adding it to the new LTO API, but it looks like that was not addressed. This should probably be discussed as a follow up to this change, as it is orthogonal.
  - The legacy LTO API had some code that was hashing the GUID of all preserved symbols defined in the module. I looked back at the history of this (which was added with the original hashing in the legacy LTO API in D18494), and there is a comment in the review thread that it was added in preparation for future internalization. We now do the internalization of course, and that is handled in the new LTO API cache key computation by hashing the recorded linkage type of all defined globals. Therefore I didn't try to move over and keep the preserved symbols handling.
  Reviewers: steven_wu, pcc
  Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, dang, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54635
  llvm-svn: 347592
* [SelectionDAG] Teach BaseIndexOffset::match to unwrap the base after looking through an add/or (Craig Topper, 2018-11-26; 1 file, -3/+3)
  We might find a target-specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C).
  Differential Revision: https://reviews.llvm.org/D54818
  llvm-svn: 347591
* [CodeGen] Support custom format of stack maps (Than McIntosh, 2018-11-26; 5 files, -11/+28)
  Summary: Add a hook to the GCMetadataPrinter for emitting stack maps in custom format. The hook will be called at stack map generation time. The default stack map format is used if there is no hook. For this to be useful, a few data structures and accessors are exposed from the StackMaps class, so the custom printer can access the stack map data.
  This patch authored by Cherry Zhang <cherryyz@google.com>.
  Reviewers: thanm, apilipenko, reames
  Reviewed By: reames
  Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53892
  llvm-svn: 347584
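  A minimal sketch of what a custom printer could look like (the hook name emitStackMaps and the getCSInfos() accessor are assumptions based on the description above, not verbatim from the patch; registration via GCMetadataPrinterRegistry is omitted):
    #include "llvm/CodeGen/AsmPrinter.h"
    #include "llvm/CodeGen/GCMetadataPrinter.h"
    #include "llvm/CodeGen/StackMaps.h"
    using namespace llvm;

    namespace {
    class CustomStackMapPrinter : public GCMetadataPrinter {
    public:
      // Returning true signals that the custom format was emitted and
      // the default stack map emission should be skipped.
      bool emitStackMaps(StackMaps &SM, AsmPrinter &AP) override {
        for (const auto &CS : SM.getCSInfos()) {
          (void)CS; // emit each call-site record in the custom format
        }
        return true;
      }
    };
    } // end anonymous namespace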
* Delete dead code introduced in r347354 (Erich Keane, 2018-11-26; 1 file, -4/+0)
  ParentTy is never used other than an assignment, and since it is a pointer, there is no side effect. Some versions of GCC notice and warn on this.
  Change-Id: I37dc1a18c7b58040419afb803621de13d8904a8f
  llvm-svn: 347581
* AMDGPU: Don't optimize exec masks at -O0 (Matt Arsenault, 2018-11-26; 1 file, -1/+2)
  llvm-svn: 347573
* AMDGPU: Only add implicit super-reg def for first subreg (Matt Arsenault, 2018-11-26; 1 file, -2/+2)
  llvm-svn: 347572
* [CodeGen] Take SPAdj into account for STATEPOINT liveness args (Than McIntosh, 2018-11-26; 1 file, -1/+1)
  Summary: STATEPOINT records its args' locations on the stack relative to SP. If the SP is changed, take that into account.
  This patch authored by Cherry Zhang <cherryyz@google.com>.
  Reviewers: thanm, reames
  Reviewed By: reames
  Subscribers: reames, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53603
  llvm-svn: 347569
* Remove an unnecessary file; NFC (Aaron Ballman, 2018-11-26; 2 files, -18/+0)
  This source file has not been needed since r346522 and was triggering diagnostics in MSVC about an object file which exports no public symbols (LNK4221).
  llvm-svn: 347565
* [DemandedBits] Add support for funnel shifts (Nikita Popov, 2018-11-26; 1 file, -0/+21)
  Add support for funnel shifts to the DemandedBits analysis. The demanded bits of the first two operands can be determined if the shift amount is constant. The demanded bits of the third operand (the shift amount) can be determined if the bitwidth is a power of two. This is basically the same functionality as implemented in D54869 and D54478, but for DemandedBits rather than InstCombine.
  Differential Revision: https://reviews.llvm.org/D54876
  llvm-svn: 347561
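  A sketch of the computation this enables for a constant shift amount (the helper name is illustrative, not the patch's actual code):
    #include "llvm/ADT/APInt.h"
    using llvm::APInt;

    // For fshl(a, b, c) with constant c: the result is
    // (a << s) | (b >> (BW - s)) where s = c % BW, so 'a' feeds the
    // high BW-s result bits and 'b' feeds the low s result bits.
    void fshlDemandedBits(const APInt &DemandedResult, unsigned ShAmt,
                          APInt &DemandedA, APInt &DemandedB) {
      unsigned BW = DemandedResult.getBitWidth();
      unsigned S = ShAmt % BW;
      DemandedA = DemandedResult.lshr(S);
      DemandedB = DemandedResult.shl(BW - S); // zero when S == 0
    }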
* [x86] promote all multiply i8 by constant to i32 (Sanjay Patel, 2018-11-26; 1 file, -26/+37)
  We have these 2 "isDesirable" promotion hooks (I'm not sure why we need both of them, but that's independent of this patch), and we can adjust them to promote "mul i8 X, C" to i32. Then, all of our existing LEA and other multiply expansion magic happens as it would for i32 ops.
  Some of the test diffs show that we could end up with an actual 32-bit mul instruction here because we choose not to expand to simpler ops. That instruction could be slower depending on the subtarget. On the plus side, this means we don't need a separate instruction to load the constant operand and possibly an extra instruction to move the result. If we need to tune mul i32 further, we could add a later transform that tries to shrink it back to i8 based on subtarget timing.
  I did not bother to duplicate all of the 32-bit test file RUNs and target settings that exist to test whether LEA expansion is cheap or not. The diffs here assume a default target, so that means LEA is generally cheap.
  Differential Revision: https://reviews.llvm.org/D54803
  llvm-svn: 347557
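  The shape of the transform, written as C++ for illustration (the actual change operates on SelectionDAG nodes, not source code):
    #include <cstdint>

    // Instead of multiplying in 8 bits, multiply in 32 bits and
    // truncate; the 32-bit constant multiply can then use the existing
    // LEA expansion, e.g. x * 9 can lower to lea (reg + reg*8).
    uint8_t mulBy9(uint8_t x) {
      uint32_t wide = static_cast<uint32_t>(x) * 9;
      return static_cast<uint8_t>(wide);
    }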
* [ARM GlobalISel] Support G_CTLZ and G_CTLZ_ZERO_UNDEF (Diana Picus, 2018-11-26; 3 files, -9/+29)
  We can now select CLZ via the TableGen'erated code, so support G_CTLZ and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32.
  Legalizer: If the CLZ instruction is available, use it for both G_CTLZ and G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and lower G_CTLZ in terms of it. In order to achieve this we need to add support to the LegalizerHelper for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2). We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF if that is supported as a libcall, as opposed to just if it is Legal or Custom. Due to a minor refactoring of the helper function in charge of this, we will also allow the same behaviour for G_CTTZ and G_CTPOP. This is not going to be a problem in practice since we don't yet have support for treating G_CTTZ and G_CTPOP as libcalls (not even in DAGISel).
  Reg bank select: Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point.
  Instruction select: Nothing to do.
  llvm-svn: 347545
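  The lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF follows the usual guard-the-zero-input pattern; a C++ model, using the compiler builtin as a stand-in for the zero-undef operation / __clzsi2:
    // ctlz with defined behavior at zero, built from the zero-undef
    // variant: select the bit width when the input is zero.
    unsigned ctlz32(unsigned x) {
      // __builtin_clz(0) is undefined, like G_CTLZ_ZERO_UNDEF.
      return x == 0 ? 32u : static_cast<unsigned>(__builtin_clz(x));
    }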
* Fix typo in comment. NFC (Diana Picus, 2018-11-26; 1 file, -1/+1)
  llvm-svn: 347544
* [ARM] Prevent parallel macs for unsigned values (Sam Parker, 2018-11-26; 1 file, -13/+7)
  Both zext and sext are currently allowed during the search for narrow sequences, and sext operands are later added to the mac candidates. But operands of muls are also added, without checking whether they're sext or zext, which means we can generate a signed smlad when we shouldn't.
  Differential Revision: https://reviews.llvm.org/D54790
  llvm-svn: 347542
* Revert "[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction"Kang Zhang2018-11-261-12/+4
| | | | | | | | This reverts commits r347532. Forget add the option -mtriple powerpc64-unknown-linux-gnu. So other platform is error except for PowerPC. llvm-svn: 347534
* [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction (Kang Zhang, 2018-11-26; 1 file, -4/+12)
  Summary: Four instructions have an inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm: LFS, LFD, STFS, and STFD. These four instructions should set ImmMustBeMultipleOf to 1 instead of 4.
  Reviewed By: nemanjai
  Differential Revision: https://reviews.llvm.org/D54738
  llvm-svn: 347532
* [Support/FileSystem] Add sub-second precision for atime/mtime of sys::fs::file_status on unix platforms (Argyrios Kyrtzidis, 2018-11-26; 1 file, -5/+16)
  Summary: getLastAccessedTime() and getLastModificationTime() provided times in nanoseconds but with only 1-second resolution, even when the underlying file system could provide more precise times than that. These changes add sub-second precision for unix platforms that support improved precision. Also add some comments to make sure people are aware that the resolution of times can vary across different file systems.
  Reviewers: labath, zturner, aaron.ballman, kristina
  Reviewed By: aaron.ballman, kristina
  Subscribers: lebedev.ri, mgorny, kristina, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54826
  llvm-svn: 347530
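  A usage sketch of the affected API (a standalone snippet, assuming it links against LLVMSupport):
    #include "llvm/Support/Chrono.h"
    #include "llvm/Support/FileSystem.h"
    #include "llvm/Support/raw_ostream.h"
    using namespace llvm;

    // Query a file's mtime; on unix file systems that track sub-second
    // timestamps, the TimePoint now carries that extra precision.
    bool printMTime(const char *Path) {
      sys::fs::file_status Status;
      if (std::error_code EC = sys::fs::status(Path, Status))
        return false; // stat failed
      sys::TimePoint<> MTime = Status.getLastModificationTime();
      errs() << MTime << "\n"; // nanosecond-granularity TimePoint
      return true;
    }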
* [x86] limit transform for select-of-fp-constants (Sanjay Patel, 2018-11-25; 3 files, -0/+13)
  This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov.
  llvm-svn: 347526
* [IPSCCP] Use input operand instead of OriginalOp for ssa_copy (Florian Hahn, 2018-11-25; 1 file, -5/+5)
  OriginalOp of a Predicate refers to the original IR value, before renaming. While solving in IPSCCP, we have to use the operand of the ssa_copy instead, to avoid missing updates for nested conditions on the same IR value.
  Fixes PR39772.
  llvm-svn: 347524
* [SelectionDAG] move constant or splat functions to common location (Sanjay Patel, 2018-11-25; 2 files, -39/+28)
  rL347502 moved the null sibling, so we should group all of these together. I'm not sure why these aren't methods of the SDValue class itself, but that's another patch if that's possible.
  llvm-svn: 347523
* [X86] Synchronize a macro in getAvailableFeatures in Host.cpp with the same macro in compiler-rt to fix a negative shift amount warning (Craig Topper, 2018-11-24; 1 file, -3/+3)
  llvm-svn: 347518
* [InstCombine] Determine demanded and known bits for funnel shifts (Nikita Popov, 2018-11-24; 1 file, -0/+24)
  Support funnel shifts in InstCombine demanded bits simplification. If the shift amount is constant, we can determine both the demanded bits of the operands, as well as the known bits of the result.
  If one of the operands has no demanded bits, it will be replaced by undef and the funnel shift will be simplified into a simple shift due to the simplifications added in D54778.
  Differential Revision: https://reviews.llvm.org/D54869
  llvm-svn: 347515
* Revert unapproved commit (Joel Jones, 2018-11-24; 1 file, -154/+4)
  llvm-svn: 347511
* [AArch64] Enable libm vectorized functions via SLEEF (Joel Jones, 2018-11-24; 1 file, -4/+154)
  This changeset is modeled after Intel's submission for SVML. It enables trigonometry functions vectorization via SLEEF: http://sleef.org/.
  * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
  * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
  * A comprehensive test case is included in this changeset.
  * In a separate changeset (for clang), a new vectorization library argument is added to -fveclib: -fveclib=SLEEF.
  Trigonometry functions that are vectorized by SLEEF: acos, asin, atan, atanh, cos, cosh, exp, exp2, exp10, lgamma, log10, log2, log, sin, sinh, sqrt, tan, tanh, tgamma.
  Patch by Stefan Teleman
  Differential Revision: https://reviews.llvm.org/D53927
  llvm-svn: 347510
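  A sketch of the kind of loop this targets (the -fveclib=SLEEF flag comes from the companion clang change; the exact invocation, e.g. clang -O2 -fveclib=SLEEF for an AArch64 target, is illustrative):
    #include <cmath>

    // With the SLEEF vector library enabled, the loop vectorizer can
    // replace the scalar sin calls with calls into SLEEF's vector sin.
    void vectorSin(float *out, const float *in, int n) {
      for (int i = 0; i < n; ++i)
        out[i] = std::sin(in[i]);
    }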
* [ARM] Add dependency from ARMAsmParser to ARMAsmPrinter after r347494 (Fangrui Song, 2018-11-23; 1 file, -1/+1)
  This fixes -DBUILD_SHARED_LIBS=on.
  llvm-svn: 347506
* [InstCombine] Simplify funnel shift with zero/undef operand to shift (Nikita Popov, 2018-11-23; 1 file, -0/+23)
  The following simplifications are implemented:
  * `fshl(X, 0, C) -> shl X, C%BW`
  * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0)
  * `fshl(0, X, C) -> lshr X, BW-C%BW`
  * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0)
  * `fshr(X, 0, C) -> shl X, (BW-C%BW)`
  * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0)
  * `fshr(0, X, C) -> lshr X, C%BW`
  * `fshr(undef, X, C) -> lshr X, C%BW` (assuming undef = 0)
  The simplification is only performed if the shift amount C is constant, because we can explicitly compute C%BW and BW-C%BW in this case.
  Differential Revision: https://reviews.llvm.org/D54778
  llvm-svn: 347505
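  To see why the first rule holds, here is a C++ model of an 8-bit fshl (names illustrative):
    #include <cstdint>

    // fshl on i8: conceptually concatenate hi:lo into 16 bits, shift
    // left by amt % 8, and keep the high byte.
    uint8_t fshl8(uint8_t hi, uint8_t lo, unsigned amt) {
      unsigned s = amt % 8;
      if (s == 0)
        return hi;
      return static_cast<uint8_t>((hi << s) | (lo >> (8 - s)));
    }
    // With lo == 0 the (lo >> (8 - s)) term vanishes, so
    // fshl8(x, 0, c) == x << (c % 8): exactly fshl(X, 0, C) -> shl X, C%BW.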
* [DAG] consolidate shift simplifications (Sanjay Patel, 2018-11-23; 2 files, -74/+58)
  ...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478.
  The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes.
  llvm-svn: 347502
* Revert r347490 as it breaks address sanitizer builds (Luke Cheeseman, 2018-11-23; 14 files, -88/+11)
  llvm-svn: 347499
* [ARM][AsmParser] Improve debug printing of parsed asm operands (Oliver Stannard, 2018-11-23; 1 file, -19/+39)
  In ARMOperand::print:
  - Print human-readable register names, instead of numbers.
  - Print the correct names for IT condition masks (these were in the wrong order before).
  - Print all parts of memory operands, not just the base register.
  This makes the output of llvm-mc -show-inst-operands more readable.
  Differential revision: https://reviews.llvm.org/D54850
  llvm-svn: 347494
* Attempt to fix buildbot after r347489 (Eugene Leviant, 2018-11-23; 1 file, -1/+1)
  llvm-svn: 347492
* Revert r343341 (Luke Cheeseman, 2018-11-23; 14 files, -11/+88)
  Cannot reproduce the build failure locally, and the build logs have been deleted.
  llvm-svn: 347490
* [ThinLTO] Assembly representation of ReadOnly attribute (Eugene Leviant, 2018-11-23; 5 files, -24/+75)
  Differential revision: https://reviews.llvm.org/D54754
  llvm-svn: 347489
* Disable LoopSimplifyCFG terminator folding by default (Max Kazantsev, 2018-11-23; 1 file, -0/+6)
  llvm-svn: 347486
* [LoopSimplifyCFG] Don't delete LCSSA Phis (Max Kazantsev, 2018-11-23; 1 file, -1/+4)
  When removing edges, we also update Phi inputs and may end up removing a Phi if it has only one input. We should not do it for edges that leave the current loop because these Phis are LCSSA Phis and need to be preserved.
  Thanks @dmgreen for finding this!
  Differential Revision: https://reviews.llvm.org/D54841
  llvm-svn: 347484
* [LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading towards scalarizing the type (Craig Topper, 2018-11-23; 1 file, -0/+9)
  This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split, and enlarges the elements in the result type before doing the split. Then it inserts a follow-up truncate or fp_round after concatenating the two halves back together.
  But if the input type of the original op is being split on its way to ultimately being scalarized, we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. It seems kind of silly to enlarge the result element type of the operation only to end up with scalar code, and then build a vector with large elements only to make the elements smaller again in the vector register. It seems better to just try to get away with producing smaller result types in the scalarized code.
  The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but it's not realistic anyway.
  llvm-svn: 347482
* [LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to SplitVecOp_UnaryOp if splitting the output type would be a legal type (Craig Topper, 2018-11-22; 1 file, -1/+7)
  SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still be a legal type, we don't need to do that.
  The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked.
  llvm-svn: 347479
* [DAGCombiner] form 'not' ops ahead of shifts (PR39657) (Sanjay Patel, 2018-11-22; 1 file, -0/+21)
  We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops.
  https://rise4fun.com/Alive/NC1
    Name: shl
    Pre: (-1 << C2) == C1
    %shl = shl i8 %x, C2
    %r = xor i8 %shl, C1
    =>
    %not = xor i8 %x, -1
    %r = shl i8 %not, C2

    Name: shr
    Pre: (-1 u>> C2) == C1
    %sh = lshr i8 %x, C2
    %r = xor i8 %sh, C1
    =>
    %not = xor i8 %x, -1
    %r = lshr i8 %not, C2
  https://bugs.llvm.org/show_bug.cgi?id=39657
  llvm-svn: 347478
* [NFC] Assert that all blocks staying in loop are live (Max Kazantsev, 2018-11-22; 1 file, -0/+2)
  llvm-svn: 347458
* [NFC] Ensure deterministic order of dead exit blocks (Max Kazantsev, 2018-11-22; 1 file, -6/+11)
  llvm-svn: 347457
* [AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR (John Brawn, 2018-11-22; 2 files, -1/+10)
  A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn BUILD_VECTOR into SCALAR_TO_VECTOR, so we get an infinite loop. Fix this by making LowerBUILD_VECTOR not do this transformation for those vectors that would get transformed back, i.e. a BUILD_VECTOR of a single-element constant vector. Doing that means we get a DUP, which we then need to recognise in ISel as a copy.
  llvm-svn: 347456
* [NFC] Simplify code by using standard exit blocks collection (Max Kazantsev, 2018-11-22; 1 file, -10/+8)
  llvm-svn: 347454
* [TI removal] Leverage the fact that TerminatorInst is gone to create a normal base class that provides all common "call" functionality (Chandler Carruth, 2018-11-22; 2 files, -37/+108)
  This merges two complex CRTP mixins for the common "call" logic and common operand bundle logic into a single, normal base class of `CallInst` and `InvokeInst`. Going forward, users can typically `dyn_cast<CallBase>` and use the resulting API. No more need for the `CallSite` wrapper. I'm planning to migrate current usage of the wrapper to directly use the base class and then it can be removed, but those are simpler and much more incremental steps.
  The big change is to introduce this abstraction into the type system. I've tried to do some basic simplifications of the APIs that I couldn't really help but touch as part of this:
  - I've tried to organize the attribute API and bundle API into groups to make understanding the API of `CallBase` easier. Without this, I wasn't able to navigate the API sanely for all of the ways I needed to modify it.
  - I've added what seem like more clear and consistent APIs for getting at the called operand. These ended up being especially useful to consolidate the *numerous* duplicated code paths trying to do this.
  - I've largely reworked the organization and implementation of the APIs for computing the argument operands, as they needed to change to work with the new subclass approach.
  To minimize any cost associated with this abstraction, I've moved the operand layout in memory to store the called operand last. This makes its position relative to the end of the operand array the same, regardless of the subclass. It should make it much cheaper to reference from the `CallBase` abstraction, and this is likely one of the most frequent things to query.
  We do still pay one abstraction penalty here: we have to branch to determine whether there are 0 or 2 extra operands when computing the end of the argument operand sequence. However, that case seems rare, and it should optimize well. I've implemented this in a way specifically designed to allow it to optimize fairly well. If this shows up in profiles, we can add overrides of the relevant methods to the subclasses that bypass this penalty. It seems very unlikely that this will be an issue, as the code was *already* dealing with an ever-present abstraction of whether or not there are operand bundles, so this isn't the first branch to go into the computation.
  I've tried to remove as much of the obvious vestigial API surface of the old CRTP implementation as I could, but I suspect there is further cleanup that should now be possible, especially around the operand bundle APIs. I'm leaving all of that for future work in this patch, as enough things are changing here as-is.
  One thing that made this harder for me to reason about and debug was the pervasive use of unsigned values in subtraction and other arithmetic computations. I had to debug more than one unintentional wrap. I've switched a few of these to use `int`, which seems substantially simpler, but I've held back from doing this more broadly to avoid creating confusing divergence within a single class's API.
  I also worked to remove all of the magic numbers used to index into operands, putting them behind named constants or putting them into a single method with a comment and strictly using the method elsewhere. This was necessary to be able to re-layout the operands as discussed above.
  Thanks to Ben for reviewing this (somewhat large and awkward) patch!
  Differential Revision: https://reviews.llvm.org/D54788
  llvm-svn: 347452
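  A short sketch of the new usage pattern described above (a minimal illustration; the accessor names follow the CallBase API as described and are assumptions against this exact revision):
    #include "llvm/IR/InstrTypes.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // Handle calls and invokes uniformly through the CallBase base
    // class instead of the CallSite wrapper.
    void visitCall(Instruction &I) {
      if (auto *CB = dyn_cast<CallBase>(&I)) {
        if (Function *Callee = CB->getCalledFunction()) {
          (void)Callee; // direct call
          for (Value *Arg : CB->args()) {
            (void)Arg; // process each argument operand
          }
        }
      }
    }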
* [SystemZTTIImpl] Give correct cost values for vector bswap intrinsics (Jonas Paulsson, 2018-11-22; 2 files, -0/+33)
  Implement getIntrinsicInstrCost() and return costs reflecting that bswap can be done with a vperm per vector register.
  Review: Ulrich Weigand
  https://reviews.llvm.org/D54789
  llvm-svn: 347445
* [PM] correcting return value for new-pass-manager version of Scalarizer (Fedor Sergeev, 2018-11-21; 1 file, -2/+2)
  Obvious mistake missed during D54695 review.
  llvm-svn: 347432
* [mingw] Use unmangled name after the $ in the section name (Reid Kleckner, 2018-11-21; 1 file, -2/+3)
  GCC does it this way, and we have to be consistent. This includes stdcall and fastcall functions with suffixes. I confirmed that a fastcall function named "foo" ends up in ".text$foo", not ".text$@foo@8".
  Based on a patch by Andrew Yohn! Fixes PR39218.
  Differential Revision: https://reviews.llvm.org/D54762
  llvm-svn: 347431
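  For illustration, a fastcall function of the kind mentioned above (the attribute spelling is standard GCC/Clang syntax; that the $-suffixed section applies when the function gets its own .text$ section, e.g. for comdat functions or under -ffunction-sections, is an assumption here):
    // Mangled as "@foo@8" on i686-w64-mingw32, but now placed in
    // ".text$foo" (matching GCC) rather than ".text$@foo@8".
    int __attribute__((fastcall)) foo(int a, int b) {
      return a + b;
    }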