path: root/llvm/test/Transforms/Inline
Commit message | Author | Date | Files | Lines
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 | Fangrui Song | 2019-12-24 | 6 | -12/+12
* [InlineCost] Fix infinite loop in indirect call evaluation | Ehud Katz | 2019-11-28 | 1 | -0/+55

  Currently, every time we encounter an indirect call of a known function,
  we try to evaluate the inline cost of that function. In the case of
  recursion, that evaluation never stops.

  The solution I propose is to evaluate only the indirect call itself,
  while any further indirect calls (of a known function) are treated just
  like direct function calls, which never trigger such an evaluation.

  Fixes PR35469.

  Differential Revision: https://reviews.llvm.org/D69349
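  For illustration (editor's sketch, not from the commit; the function name
  is invented), this is the shape that previously sent cost evaluation into
  unbounded recursion — an indirect call whose known target contains the
  same kind of call:

```
define i32 @callee(i32 %x) {
entry:
  ; The target is loaded from a slot that provably holds @callee, so the
  ; cost analyzer can see through the indirection to a known function.
  %slot = alloca i32 (i32)*
  store i32 (i32)* @callee, i32 (i32)** %slot
  %fp = load i32 (i32)*, i32 (i32)** %slot
  ; Evaluating the inline cost of this call used to re-enter the
  ; evaluation of @callee, which never terminated.
  %r = call i32 %fp(i32 %x)
  ret i32 %r
}
```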
* Revert "[InlineCost] Fix infinite loop in indirect call evaluation"Ehud Katz2019-11-231-55/+0
| | | | | | | This reverts commit 854e956219e78cb8d7ef3b021d7be6b5d6b6af04. It broke tests: Transforms/Inline/redundant-loads.ll Transforms/SampleProfile/inline-callee-update.ll
* [InlineCost] Fix infinite loop in indirect call evaluation | Ehud Katz | 2019-11-23 | 1 | -0/+55

  Currently, every time we encounter an indirect call of a known function,
  we try to evaluate the inline cost of that function. In the case of
  recursion, that evaluation never stops.

  The solution presented is to evaluate only the indirect call itself,
  while any further indirect calls (of a known function) are treated just
  like direct function calls, which never trigger such an evaluation.

  Fixes PR35469.

  Differential Revision: https://reviews.llvm.org/D69349
* Recommit "[CodeView] Add option to disable inline line tables." | Amy Huang | 2019-11-04 | 1 | -0/+99

  This reverts commit 004ed2b0d1b86d424643ffc88fce20ad8bab6804.
  Original commit hash: 6d03890384517919a3ba7fe4c35535425f278f89

  Summary:
  This adds a clang option to disable inline line tables. When it is used,
  the inliner uses the call site as the location of the inlined function
  instead of marking it as an inline location with the function location.

  https://reviews.llvm.org/D67723
* Revert "[CodeView] Add option to disable inline line tables."Amy Huang2019-10-301-99/+0
| | | | | | because it breaks compiler-rt tests. This reverts commit 6d03890384517919a3ba7fe4c35535425f278f89.
* [CodeView] Add option to disable inline line tables. | Amy Huang | 2019-10-30 | 1 | -0/+99

  Summary:
  This adds a clang option to disable inline line tables. When it is used,
  the inliner uses the call site as the location of the inlined function
  instead of marking it as an inline location with the function location.
  See https://bugs.llvm.org/show_bug.cgi?id=42344

  Reviewers: rnk

  Subscribers: hiraditya, cfe-commits, llvm-commits

  Tags: #clang, #llvm

  Differential Revision: https://reviews.llvm.org/D67723
* [utils] InlineFunction: fix for debug info affecting optimizations | Bjorn Pettersson | 2019-10-28 | 1 | -0/+71

  Summary:
  Debug info affects the output from "opt -inline": InlineFunction could
  not handle llvm.dbg.value intrinsics appearing between alloca
  instructions.

  The problem was that the first alloca in a sequence of allocas was
  handled differently from the subsequent alloca instructions. Now all
  static alloca instructions are treated the same (being removed if they
  have no uses), so it does not matter if there are dbg instructions (or
  any other instructions) in between.

  Fixes the issue: https://bugs.llvm.org/show_bug.cgi?id=43291

  Patch by: yechunliang (Chris Ye)

  Reviewers: bjope, jmorse, vsk, probinson, jdoerfert, mtrofin, aprantl, fhahn

  Reviewed By: bjope

  Subscribers: uabelho, ormris, aprantl, hiraditya, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68633
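  A schematic of the shape that used to trip InlineFunction (editor's
  sketch; the module-level debug metadata nodes !10 and !11 are elided for
  brevity):

```
define void @callee() {
entry:
  %a = alloca i32, align 4
  ; A dbg.value between the allocas must not cause %b to be treated
  ; differently from %a when the function is inlined.
  call void @llvm.dbg.value(metadata i32 0, metadata !10, metadata !DIExpression()), !dbg !11
  %b = alloca i32, align 4
  ret void
}

declare void @llvm.dbg.value(metadata, metadata, metadata)
; !10 and !11 stand for the usual DILocalVariable/DILocation nodes.
```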
* [Inliner] Remove incorrect early exit during switch cost computation | Teresa Johnson | 2019-09-20 | 1 | -0/+160

  Summary:
  CallAnalyzer::visitSwitchInst has an early exit for when the estimated
  lower bound of the switch cost will put the overall cost of the inline
  above the threshold. However, this code was not correctly estimating the
  lower bound for switches that can be transformed into bit tests, leading
  to unnecessary lost inlines, and also differing behavior with
  optimization remarks enabled.

  First, the early exit is controlled by whether ComputeFullInlineCost is
  enabled or not, and that in turn is disabled by default but enabled when
  enabling -pass-remarks=missed. This by itself wouldn't lead to a
  problem, except that, as described below, the computed lower bound can
  be above the real lower bound, so we can sometimes get different inline
  decisions with inline remarks enabled, which is problematic.

  The early exit was added along with a new switch cost model in D31085.
  The reason why it was added is a concern one reviewer raised about
  compile time for large switches:
  https://reviews.llvm.org/D31085?id=94559#inline-276200

  However, the code just below there calls
  getEstimatedNumberOfCaseClusters, which in turn immediately calls
  BasicTTIImpl's getEstimatedNumberOfCaseClusters, which in the worst case
  does a linear scan of the cases to get the high and low values. The bit
  test handling in particular is guarded by whether the number of cases
  fits into the max bit width. There is no suggestion that anyone measured
  a compile time issue; it appears to be theoretical.

  The problem is that the reviewer's comment about the lower bound
  calculation is incorrect, specifically in the case of a switch that can
  be lowered to a bit test. This wasn't followed up on in the comment
  thread, but the author did add a FIXME to that effect above the early
  exit when they subsequently revised the patch.

  As a result, we were incorrectly early exiting and not inlining
  functions with switch statements that would be lowered to bit tests in
  cases where we were nearing the threshold. Combined with the fact that
  this early exit was skipped with opt remarks enabled, this caused
  different inlining decisions to be made when -pass-remarks=missed is
  enabled to debug the missing inline.

  Remove the early exit for the above reasons. I also copied over an
  existing AArch64 inlining test to X86, and adjusted the threshold so
  that the bit test inline only occurs with the fix in this patch.

  Reviewers: davidxl

  Subscribers: eraman, kristof.beyls, haicheng, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D67716

  llvm-svn: 372440
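  As a concrete illustration (editor's sketch, not taken from the commit),
  a dense switch of this shape is a bit-test candidate, so a per-case
  lower bound grossly overestimates its real cost:

```
define i1 @is_vowel(i8 %c) {
entry:
  ; Five cases in a 21-value range: lowerable to a single bit test,
  ; far cheaper than five compare-and-branch clusters.
  switch i8 %c, label %no [
    i8 97,  label %yes   ; 'a'
    i8 101, label %yes   ; 'e'
    i8 105, label %yes   ; 'i'
    i8 111, label %yes   ; 'o'
    i8 117, label %yes   ; 'u'
  ]
yes:
  ret i1 true
no:
  ret i1 false
}
```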
* [Inliner][NFC] Make test less brittle. | Clement Courbet | 2019-09-09 | 1 | -1/+1

  Summary:
  This tests inlining size thresholds, but relies on the output of running
  the full O2 pipeline, making it brittle against changes in unrelated
  passes. Only run the inlining pass and set thresholds on the test RUN
  line instead.

  Found while investigating D60318.

  Reviewers: RKSimon, qcolombet

  Subscribers: llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D67349

  llvm-svn: 371397
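  A sketch of the pattern being described (illustrative only; the
  threshold value here is made up, not the one from the commit):

```
; Brittle: the check depends on every pass in the O2 pipeline.
; RUN: opt < %s -O2 -S | FileCheck %s

; Focused: exercise only the inliner, with an explicit threshold.
; RUN: opt < %s -inline -inline-threshold=50 -S | FileCheck %s
```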
* [FunctionAttrs] Annotate "willreturn" for intrinsics | Hideto Ueno | 2019-07-28 | 2 | -2/+2

  Summary:
  In D62801, the new function attribute `willreturn` was introduced. In
  short, a function with `willreturn` is guaranteed to come back to the
  call site (the precise definition is in the LangRef). In this patch,
  willreturn is annotated for LLVM intrinsics.

  Reviewers: jdoerfert

  Reviewed By: jdoerfert

  Subscribers: jvesely, nhaehnle, sstefan1, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D64904

  llvm-svn: 367184
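  The effect on a declaration looks like this (editor's sketch; the
  particular intrinsic and attribute-group contents are illustrative):

```
; An intrinsic that cannot unwind, diverge, or otherwise fail to return
; control to its call site now carries willreturn.
declare float @llvm.sqrt.f32(float) #0

attributes #0 = { nounwind readnone speculatable willreturn }
```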
* [AMDGPU] Tune inlining parameters for AMDGPU target | Daniil Fukalov | 2019-07-17 | 1 | -0/+31

  Summary:
  Since the target gains no significant advantage from vectorization, the
  vector instruction threshold bonus should be optional. The
  amdgpu-inline-arg-alloca-cost parameter's default value and the target's
  InliningThresholdMultiplier value were then tuned accordingly.

  Reviewers: arsenm, rampitec

  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D64642

  llvm-svn: 366348
* ARM MTE stack sanitizer. | Evgeniy Stepanov | 2019-07-15 | 1 | -0/+30

  Add a "memtag" sanitizer that detects and mitigates stack memory issues
  using the ARMv8.5 Memory Tagging Extension.

  It is similar in principle to HWASan, which is a software implementation
  of the same idea, but there are enough differences to warrant a new
  sanitizer type IMHO. It is also expected to have very different
  performance properties.

  The new sanitizer does not have a runtime library (it may grow one
  later, along with a "debugging" mode). Similar to SafeStack and
  StackProtector, the instrumentation pass (in a follow-up change) will be
  inserted in all cases, but will only affect functions marked with the
  new sanitize_memtag attribute.

  Reviewers: pcc, hctim, vitalybuka, ostannard

  Subscribers: srhines, mehdi_amini, javed.absar, kristof.beyls, hiraditya, cryptoad, steven_wu, dexonsmith, cfe-commits, llvm-commits

  Tags: #clang, #llvm

  Differential Revision: https://reviews.llvm.org/D64169

  llvm-svn: 366123
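  In IR terms (editor's sketch), instrumentation is opt-in per function
  via the new attribute:

```
; Only functions carrying sanitize_memtag will be instrumented by the
; follow-up MTE stack-tagging pass; others are left untouched.
define void @f() sanitize_memtag {
entry:
  %buf = alloca [16 x i8], align 16
  ret void
}
```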
* Revert [InlineCost] cleanup calculations of Cost and Threshold | Jordan Rupprecht | 2019-07-03 | 1 | -6/+6

  This reverts r364422 (git commit 1a3dc761860d620ac8ed7e32a4285952142f780b).

  The inlining cost calculation is incorrect, leading to stack overflow
  due to large stack frames from heavy inlining.

  llvm-svn: 365000
* [InlineCost] cleanup calculations of Cost and Threshold | Fedor Sergeev | 2019-06-26 | 1 | -6/+6

  Summary:
  Doing better separation of Cost and Threshold. Cost counts the abstract
  complexity of live instructions, while Threshold is an upper bound of
  complexity that inlining is comfortable to pay.

  There are two parts:

  - The huge 15K last-call-to-static bonus is no longer subtracted from
    Cost but rather added to Threshold. That makes much more sense, as the
    cost of inlining (Cost) is not changed by the fact that an internal
    function is called once. It only changes the likelihood of this
    inlining being profitable (Threshold).

  - The bonus for calls proved inlinable into the callee is likewise no
    longer subtracted from Cost but added to Threshold instead.

  While the calculations are somewhat different, the overall InlineResult
  should stay the same, since Cost >= Threshold compares the same.

  Reviewers: eraman, greened, chandlerc, yrouban, apilipenko

  Reviewed By: apilipenko

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D60740

  llvm-svn: 364422
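  The last point is just algebra (editor's note paraphrasing the patch's
  argument): moving a bonus from the cost side to the threshold side
  cannot change the comparison's outcome:

```
before:  (RawCost - Bonus) >= Threshold           (reject inlining)
after:    RawCost          >= (Threshold + Bonus)
```

  Adding Bonus to both sides shows the two tests accept and reject exactly
  the same candidates.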
* [lit] Delete empty lines at the end of lit.local.cfg NFC | Fangrui Song | 2019-06-17 | 2 | -2/+0

  llvm-svn: 363538
* [NFC] Added test from PR42084 for D63058 | David Bolvansky | 2019-06-09 | 1 | -0/+66

  llvm-svn: 362906
* [InlineCost] Add support for unary fneg. | Craig Topper | 2019-06-06 | 1 | -0/+31

  This adds support for unary fneg based on the implementation of
  BinaryOperator, without the soft-float FP cost. Previously we would just
  delegate to visitUnaryInstruction. I think the only real change is that
  we will now pass the FastMath flags to SimplifyFNeg.

  Differential Revision: https://reviews.llvm.org/D62699

  llvm-svn: 362732
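  The instruction in question (editor's sketch):

```
define float @neg(float %x) {
  ; Unary fneg, now costed like other unary/binary operators; its
  ; FastMath flags are passed along to simplification.
  %r = fneg fast float %x
  ret float %r
}
```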
* [InlineCost] Don't add the soft float function call cost for the fneg idiom, fsub -0.0, %x | Craig Topper | 2019-06-01 | 1 | -0/+25

  Summary:
  Fneg can be implemented with an xor rather than a function call, so we
  don't need to add the function call overhead. This was pointed out in
  D62699.

  Reviewers: efriedma, cameron.mcinally

  Reviewed By: efriedma

  Subscribers: javed.absar, eraman, hiraditya, haicheng, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D62747

  llvm-svn: 362304
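  The idiom being special-cased (editor's sketch):

```
define float @neg(float %x) {
  ; fsub from -0.0 is the canonical fneg idiom; it lowers to a sign-bit
  ; xor, not a libcall, so no soft-float call cost should be charged.
  %r = fsub float -0.000000e+00, %x
  ret float %r
}
```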
* [AMDGPU] Use InliningThresholdMultiplier for inline hint | Stanislav Mekhanoshin | 2019-05-31 | 1 | -0/+77

  AMDGPU uses multiplier 9 for the inline cost. It is taken into account
  everywhere except for the inline hint threshold. As a result, we were
  penalizing functions with the inline hint, making them less likely to be
  inlined than those without the hint. Defaults are 225 for a normal
  function and 325 for a function with an inline hint. Currently we have
  an effective threshold of 225 * 9 = 2025 for normal functions and just
  325 for those with the hint. That is fixed by this patch.

  Differential Revision: https://reviews.llvm.org/D62707

  llvm-svn: 362239
* Reapply: IR: add optional type to 'byval' function parameters | Tim Northover | 2019-05-30 | 1 | -2/+2

  When we switch to opaque pointer types we will need some way to describe
  how many bytes a 'byval' parameter should occupy on the stack. This adds
  a (for now) optional extra type parameter. If present, the type must
  match the pointee type of the argument.

  The original commit did not remap byval types when linking modules,
  which broke LTO. This version fixes that.

  Note to front-end maintainers: if this causes test failures, it's
  probably because the "byval" attribute is printed after attributes
  without any parameter after this change.

  llvm-svn: 362128
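  With the new syntax (editor's sketch; the struct is invented), a byval
  argument can spell out its pointee type explicitly:

```
%struct.s = type { i32, i64 }

; The type in byval(...) must match the argument's pointee type.
declare void @callee(%struct.s* byval(%struct.s))

define void @caller(%struct.s* %p) {
  call void @callee(%struct.s* byval(%struct.s) %p)
  ret void
}
```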
* Revert "IR: add optional type to 'byval' function parameters"Tim Northover2019-05-291-2/+2
| | | | | | | The IRLinker doesn't delve into the new byval attribute when mapping types, and this breaks LTO. llvm-svn: 362029
* IR: add optional type to 'byval' function parameters | Tim Northover | 2019-05-29 | 1 | -2/+2

  When we switch to opaque pointer types we will need some way to describe
  how many bytes a 'byval' parameter should occupy on the stack. This adds
  a (for now) optional extra type parameter. If present, the type must
  match the pointee type of the argument.

  Note to front-end maintainers: if this causes test failures, it's
  probably because the "byval" attribute is printed after attributes
  without any parameter after this change.

  llvm-svn: 362012
* [ARM] Replace fp-only-sp and d16 with fp64 and d32. | Simon Tatham | 2019-05-28 | 1 | -1/+1

  Those two subtarget features were awkward because their semantics are
  reversed: each one indicates the _lack_ of support for something in the
  architecture, rather than the presence. As a consequence, you don't get
  the behavior you want if you combine two sets of feature bits.

  Each SubtargetFeature for an FP architecture version now comes in four
  versions, one for each combination of those options. So you can still
  say (for example) '+vfp2' in a feature string and it will mean what it's
  always meant, but there's a new string '+vfp2d16sp' meaning the version
  without those extra options.

  A lot of this change is just mechanically replacing positive checks for
  the old features with negative checks for the new ones. But one more
  interesting change is that I've rearranged getFPUFeatures() so that the
  main FPU feature is appended to the output list *before* rather than
  after the features derived from the Restriction field, so that -fp64 and
  -d32 can override defaults added by the main feature.

  Reviewers: dmgreen, samparker, SjoerdMeijer

  Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits

  Tags: #clang, #llvm

  Differential Revision: https://reviews.llvm.org/D60691

  llvm-svn: 361845
* AMDGPU: Boost inline threshold with addrspacecasted alloca arguments | Matt Arsenault | 2019-05-24 | 1 | -0/+70

  This was skipping GetUnderlyingObject for nonprivate addresses, but an
  alloca could also be found through an addrspacecast if it's flat.

  llvm-svn: 361649
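  The newly recognized pattern looks roughly like this (editor's sketch;
  on AMDGPU the private address space is 5 and flat is 0):

```
define void @caller() {
  ; A private alloca that reaches the call only through an addrspacecast
  ; to the flat address space should still earn the threshold boost.
  %a = alloca i32, align 4, addrspace(5)
  %flat = addrspacecast i32 addrspace(5)* %a to i32*
  call void @use(i32* %flat)
  ret void
}

declare void @use(i32*)
```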
* [INLINER] allow inlining of blockaddresses if sole uses are callbrs | Nick Desaulniers | 2019-05-20 | 2 | -0/+133

  Summary:
  It was supposed that Ref LazyCallGraph::Edge's were being inserted by
  inlining, but that doesn't seem to be the case. Instead, it seems that
  there was no test for a blockaddress Constant in an instruction that
  referenced the function that contained the instruction. Ex:

```
define void @f() {
  %1 = alloca i8*, align 8

2:
  store i8* blockaddress(@f, %2), i8** %1, align 8
  ret void
}
```

  When iterating blockaddresses, do not add the function they refer to
  back to the worklist if the blockaddress is referring to the contained
  function (as opposed to an external function).

  Because blockaddress has slightly different semantics than GNU C's
  address of labels, there are 3 cases that can occur with blockaddress,
  where only 1 can happen in GNU C due to C's scoping rules:

  * blockaddress is within the function it refers to (possible in GNU C).
  * blockaddress is within a different function than the one it refers to
    (not possible in GNU C).
  * blockaddress is used to declare a global (not possible in GNU C).

  The second case is tested in:

```
$ ./llvm/build/unittests/Analysis/AnalysisTests \
    --gtest_filter=LazyCallGraphTest.HandleBlockAddress
```

  This patch adjusts the iteration of blockaddresses in
  LazyCallGraph::visitReferences to not revisit the blockaddress's
  function in the first case.

  The Linux kernel contains code that's not semantically valid at -O0;
  specifically, code passed to asm goto. It requires that asm goto be
  inline-able. This patch conservatively does not attempt to handle the
  more general case of inlining blockaddresses that have non-callbr users
  (pr/39560).

  https://bugs.llvm.org/show_bug.cgi?id=39560
  https://bugs.llvm.org/show_bug.cgi?id=40722
  https://github.com/ClangBuiltLinux/linux/issues/6
  https://reviews.llvm.org/rL212077

  Reviewers: jyknight, eli.friedman, chandlerc

  Reviewed By: chandlerc

  Subscribers: george.burgess.iv, nathanchance, mgorny, craig.topper, mengxu.gatech, void, mehdi_amini, E5ten, chandlerc, efriedma, eraman, hiraditya, haicheng, pirama, llvm-commits, srhines

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D58260

  llvm-svn: 361173
* Resubmit "[DebugInfo] Update loop metadata for inlined loops" | Orlando Cazalet-Hyams | 2019-05-20 | 1 | -0/+159

  This reverts commit 95805bc425b264805a472232a75ed2ffe58aceda.
  I've squashed the test fix into this commit.

  [DebugInfo] Update loop metadata for inlined loops

  Currently, when a loop is cloned while inlining function (A) into
  function (B), the loop metadata is copied and then not modified at all.
  The loop metadata can encode the loop's start and end DILocations.
  Therefore, the new inlined loop in function (B) may have loop metadata
  which shows start and end locations residing in function (A).

  This patch ensures loop metadata is updated while inlining so that the
  start and end DILocations are given the "inlinedAt" operand. I've also
  added a regression test for this.

  This fix is required for D60831 because that patch uses loop metadata to
  determine the DILocation for the branches of new loop preheaders.

  Reviewers: aprantl, dblaikie, anemet

  Reviewed By: aprantl

  Subscribers: eraman, hiraditya, llvm-commits

  Tags: #debug-info, #llvm

  Differential Revision: https://reviews.llvm.org/D61933

  llvm-svn: 361149
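  Schematically (editor's sketch; the surrounding function and most debug
  metadata are elided), the metadata in question hangs off the loop's
  branch:

```
  br i1 %cond, label %loop.body, label %exit, !llvm.loop !1

; The LoopID's optional location operands; after inlining, these
; DILocations should gain an inlinedAt operand naming the call site.
!1 = distinct !{!1, !2, !3}
!2 = !DILocation(line: 4, column: 3, scope: !4)  ; loop start, in the callee
!3 = !DILocation(line: 8, column: 3, scope: !4)  ; loop end, in the callee
; !4 stands for the callee's DISubprogram, omitted here.
```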
* Revert "[DebugInfo] Update loop metadata for inlined loops"Orlando Cazalet-Hyams2019-05-201-108/+0
| | | | | | | This reverts commit 6e8f1a80cd988db8870aff9c3bc2ca7a20e04104. Reverting patch while investigating build bot failure. llvm-svn: 361143
* [DebugInfo] Update loop metadata for inlined loops | Orlando Cazalet-Hyams | 2019-05-20 | 1 | -0/+108

  Summary:
  Currently, when a loop is cloned while inlining function (A) into
  function (B), the loop metadata is copied and then not modified at all.
  The loop metadata can encode the loop's start and end DILocations.
  Therefore, the new inlined loop in function (B) may have loop metadata
  which shows start and end locations residing in function (A).

  This patch ensures loop metadata is updated while inlining so that the
  start and end DILocations are given the "inlinedAt" operand. I've also
  added a regression test for this.

  This fix is required for D60831 because that patch uses loop metadata to
  determine the DILocation for the branches of new loop preheaders.

  Reviewers: aprantl, dblaikie, anemet

  Reviewed By: aprantl

  Subscribers: eraman, hiraditya, llvm-commits

  Tags: #debug-info, #llvm

  Differential Revision: https://reviews.llvm.org/D61933

  llvm-svn: 361132
* AMDGPU: Assume xnack is enabled by default | Matt Arsenault | 2019-05-16 | 1 | -0/+67

  This is the conservatively correct default. It is always safe to assume
  xnack is enabled, but not the converse.

  Introduce a feature to blacklist targets where xnack can never be
  meaningfully enabled. I'm not sure the set of targets this is applied to
  is 100% correct.

  llvm-svn: 360903
* Revert "Temporarily Revert "Add basic loop fusion pass.""Eric Christopher2019-04-17188-0/+17222
| | | | | | | | The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
* Temporarily Revert "Add basic loop fusion pass." | Eric Christopher | 2019-04-17 | 188 | -17222/+0

  As it's causing some bot failures (and per request from kbarton).

  This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

  llvm-svn: 358546
* AMDGPU: Assume ECC is enabled by default if supported | Matt Arsenault | 2019-04-03 | 1 | -0/+70

  The test should really be checking for the property directly in the code
  object headers, but there are problems with this. I don't see this
  directly represented in the text form, and for the binary emission this
  is depending on a function-level subtarget feature to emit a global
  flag.

  llvm-svn: 357558
* AMDGPU: Fix test filename | Matt Arsenault | 2019-04-02 | 1 | -0/+0

  llvm-svn: 357441
* AMDGPU: Remove dx10-clamp from subtarget features | Matt Arsenault | 2019-03-29 | 2 | -0/+197

  Since this can be set with s_setreg*, it should not be a subtarget
  property. Set a default based on the calling convention, and introduce a
  new amdgpu-dx10-clamp attribute to override this if desired. Also
  introduce a new amdgpu-ieee attribute to match.

  The values need to match to allow inlining. I think it is OK for the
  caller's dx10-clamp attribute to override the callee's, but there
  doesn't appear to be the infrastructure to do this currently without
  defining the attribute in the generic Attributes.td.

  Eventually the calling convention lowering will need to insert a mode
  switch somewhere for these.

  llvm-svn: 357302
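  In attribute form (editor's sketch), inlining is permitted when the
  callee's mode attributes match the caller's:

```
define float @callee(float %x) #0 {
  %r = fadd float %x, 1.0
  ret float %r
}

define float @caller(float %x) #0 {
  ; Same "amdgpu-dx10-clamp" / "amdgpu-ieee" values, so inlining is legal.
  %r = call float @callee(float %x)
  ret float %r
}

attributes #0 = { "amdgpu-dx10-clamp"="true" "amdgpu-ieee"="true" }
```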
* [NewPM] Fix a nasty bug with analysis invalidation in the new PM. | Chandler Carruth | 2019-03-28 | 1 | -1/+0

  The issue here is that we actually allow CGSCC passes to mutate IR (and
  therefore invalidate analyses) outside of the current SCC. At a minimum,
  we need to support mutating parent and ancestor SCCs to support the
  ArgumentPromotion pass, which rewrites all calls to a function.

  However, the analysis invalidation infrastructure is heavily based
  around not needing to invalidate the same IR-unit at multiple levels.
  With Loop passes, for example, they don't invalidate other Loops. So we
  need to customize how we handle CGSCC invalidation. Doing this without
  gratuitously re-running analyses is even harder. I've avoided most of
  these by using an out-of-band preserved set to accumulate the cross-SCC
  invalidation, but it still isn't perfect in the case of re-visiting the
  same SCC repeatedly when it comes back off the worklist. It's unclear
  how important this use case really is, but I wanted to call it out.

  Another wrinkle is that in order for this to successfully propagate to
  function analyses, we have to make sure we have a proxy from the SCC to
  the Function level. That requires pre-creating the necessary proxy.

  The motivating test case now works cleanly and is added for
  ArgumentPromotion.

  Thanks for the review from Philip and Wei!

  Differential Revision: https://reviews.llvm.org/D59869

  llvm-svn: 357137
* [X86] Filter out tuning feature flags and a few ISA feature flags when checking for function inline compatibility. | Craig Topper | 2019-02-19 | 2 | -0/+58

  Tuning flags don't have any effect on the available instructions, so
  they aren't a good reason to prevent inlining.

  There are also some ISA flags that don't have any intrinsics or ABI
  requirements that we can exclude. I've put in only the most basic ones,
  like cmpxchg16b and lahfsahf. These are interesting because they aren't
  present in all 64-bit CPUs, but we have codegen workarounds for when
  they aren't present.

  Loosening these checks can help with scenarios where a caller has a more
  specific CPU than a callee. The default tuning flags on our generic
  'x86-64' CPU can currently make it inline-incompatible with other CPUs.
  I've also added an example test for 'nocona' and 'prescott', where
  'nocona' is just a 64-bit-capable version of 'prescott', so in 32-bit
  mode they should be completely compatible.

  I've based the implementation here on the similar code in AMDGPU.

  Differential Revision: https://reviews.llvm.org/D58371

  llvm-svn: 354355
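  The nocona/prescott example from the message, as IR (editor's sketch; in
  32-bit mode the two CPUs differ only in 64-bit capability and tuning):

```
define void @callee() #0 {
  ret void
}

define void @caller() #1 {
  ; With tuning-only differences filtered out, @callee can be inlined here.
  call void @callee()
  ret void
}

attributes #0 = { "target-cpu"="prescott" }
attributes #1 = { "target-cpu"="nocona" }
```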
* AMDGPU: Ignore CodeObjectV3 when inlining | Matt Arsenault | 2019-02-12 | 1 | -0/+13

  This was inhibiting inlining of library functions when clang was
  invoking the inliner directly. This is covering a bit of a mess with
  subtarget feature handling, and this shouldn't be a subtarget feature.
  The behavior is different depending on whether you are using a -mattr
  flag in clang, or llc/opt.

  llvm-svn: 353899
* Provide reason messages for unviable inlining | Yevgeny Rouban | 2019-02-01 | 1 | -0/+10

  InlineCost's isInlineViable() is changed to return InlineResult instead
  of bool. This provides messages for failure reasons and allows getting
  more specific messages for cases where call sites are not viable for
  inlining.

  Reviewed By: xbolva00, anemet

  Differential Revision: https://reviews.llvm.org/D57089

  llvm-svn: 352849
* [Inliner] Assert that the computed inline threshold is non-negative. | Easwaran Raman | 2019-01-09 | 1 | -2/+5

  Reviewers: chandlerc

  Subscribers: haicheng, llvm-commits

  Differential Revision: https://reviews.llvm.org/D56409

  llvm-svn: 350751
* Introduce llvm.loop.parallel_accesses and llvm.access.group metadata. | Michael Kruse | 2018-12-20 | 3 | -9/+143

  The current llvm.mem.parallel_loop_access metadata has a problem in that
  it uses LoopIDs. LoopID unfortunately is not a loop identifier. It is
  neither unique (there's even a regression test assigning the same LoopID
  to multiple loops; this can otherwise happen if passes such as
  LoopVersioning make copies of entire loops) nor persistent (every time a
  property is removed from or added to a LoopID's MDNode, it will also
  receive a new LoopID; this happens e.g. when calling
  Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes
  change the loop attributes (even if just to mark that a loop should not
  be processed again, as llvm.loop.isvectorized does for the versioned and
  unversioned loop), the parallel access information is lost for any
  subsequent pass.

  This patch unlinks LoopIDs and parallel accesses.
  llvm.mem.parallel_loop_access metadata on an instruction is replaced by
  llvm.access.group metadata. llvm.access.group points to a distinct
  MDNode with no operands (avoiding the problem of ever needing to
  add/remove operands), called an "access group". Alternatively, it can
  point to a list of access groups. The LoopID then has an attribute
  llvm.loop.parallel_accesses with all the access groups that are parallel
  (no dependencies carried by this loop).

  This intentionally avoids any kind of "ID". Loops that are cloned or
  have their attributes modified retain the llvm.loop.parallel_accesses
  attribute. Access instructions that are cloned point to the same access
  group. It is not necessary for each access to have its own "ID" MDNode;
  memory access instructions with the same behavior can be grouped
  together.

  The behavior of llvm.mem.parallel_loop_access is not changed by this
  patch, but should be considered deprecated.

  Differential Revision: https://reviews.llvm.org/D52116

  llvm-svn: 349725
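  Putting the pieces together (editor's sketch of the new encoding): the
  access points at an access group, and the LoopID lists that group as
  parallel:

```
define void @zero(i32* %a, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %p = getelementptr inbounds i32, i32* %a, i64 %i
  ; The access belongs to access group !2 (a distinct, operand-less node)...
  store i32 0, i32* %p, !llvm.access.group !2
  %i.next = add nuw i64 %i, 1
  %c = icmp ult i64 %i.next, %n
  ; ...and the loop declares !2 parallel via llvm.loop.parallel_accesses.
  br i1 %c, label %loop, label %exit, !llvm.loop !0

exit:
  ret void
}

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.parallel_accesses", !2}
!2 = distinct !{}
```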
* [SampleFDO] handle ProfileSampleAccurate when initializing function entry count | Wei Mi | 2018-12-13 | 1 | -47/+0

  ProfileSampleAccurate is used to indicate that the profile exactly
  matches the code to be optimized. Previously, ProfileSampleAccurate was
  handled in ProfileSummaryInfo::isColdCallSite and
  ProfileSummaryInfo::isColdBlock. A better solution is to initialize the
  function entry count to 0 when ProfileSampleAccurate is true, so we
  don't have to handle ProfileSampleAccurate in multiple places.

  Differential Revision: https://reviews.llvm.org/D55660

  llvm-svn: 349088
* [Inliner] Modify the merging of min-legal-vector-width attribute to better handle when the caller or callee don't have the attribute | Craig Topper | 2018-11-29 | 1 | -1/+16

  Lack of an attribute means that the function hasn't been checked for
  what vector width it requires. So if the caller or the callee doesn't
  have the attribute, we should make sure the combined function after
  inlining does not have the attribute.

  If the caller already doesn't have the attribute, we can just avoid
  adding it. Otherwise, if the callee doesn't have the attribute, just
  remove the caller's attribute.

  llvm-svn: 347841
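  The drop-the-attribute case, concretely (editor's sketch):

```
; @callee has no min-legal-vector-width: it was never analyzed for the
; vector width it requires.
define void @callee() {
  ret void
}

define void @caller() #0 {
  call void @callee()
  ret void
}

; After inlining @callee into @caller, the merged function must not keep
; this attribute, since the combined body is no longer known to be checked.
attributes #0 = { "min-legal-vector-width"="512" }
```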
* [Inliner] Add test for merging of min-legal-vector-width function attribute. | Craig Topper | 2018-11-29 | 1 | -0/+29

  This should have been added in r337844, but apparently I failed to
  'git add' the file.

  llvm-svn: 347840
* [Inliner] Penalise inlining of calls with loops at Oz | David Green | 2018-11-05 | 3 | -0/+231

  We currently seem to underestimate the size of functions with loops in
  them, both in terms of absolute code size and in the difficulties of
  dealing with such code. (Calls, for example, can be tail merged to
  further reduce codesize.) At -Oz, we can then increase code size by
  inlining small loops multiple times.

  This attempts to penalise functions with loops at -Oz by adding a
  CallPenalty for each top-level loop in the function. It uses LI (and
  hence DT) to calculate the number of loops. As we are dealing with
  minsize, the inline threshold is small and functions at this point
  should be relatively small, making the construction of these cheap.

  Differential Revision: https://reviews.llvm.org/D52716

  llvm-svn: 346134
* [PM] keeping history when original SCC split and then merge into itself in the same round of SCC update | Wei Mi | 2018-10-23 | 2 | -38/+117

  In https://reviews.llvm.org/rL309784, inline history was added to
  prevent infinite inlining across multiple runs of the inliner and SCC
  update, but the history is only kept when a new SCC is actually
  generated during the SCC update.

  We found a case where an SCC can be split and then merged back into
  itself in the same round of SCC update, so the same SCC is popped out of
  UR.CWorklist and then added back immediately, without any new SCC being
  generated, which is why the existing patch cannot catch the infinite
  inline case.

  What the patch does: even if no new SCC is generated, if the current SCC
  merely appears in UR.CWorklist again, keep the inline history.

  Differential Revision: https://reviews.llvm.org/D52915

  llvm-svn: 345103
* Make YAML quote forward slashes. | Zachary Turner | 2018-10-12 | 2 | -7/+7

  If you have the string /usr/bin, prior to this patch it would not be
  quoted by our YAML serializer. But a string like C:\src would be, due to
  the presence of a backslash. This makes the quoting rules of basically
  every single file path different depending on the path syntax (POSIX
  vs. Windows).

  While the YAML specification does not technically require forward
  slashes to be quoted, when the behavior of paths is inconsistent it
  makes it difficult to portably write FileCheck lines that will work with
  either kind of path.

  Differential Revision: https://reviews.llvm.org/D53169

  llvm-svn: 344359
* Revert "Make YAML quote forward slashes."Zachary Turner2018-10-122-7/+7
| | | | | | | | | | This reverts commit b86c16ad8c97dadc1f529da72a5bb74e9eaed344. This is being reverted because I forgot to write a useful commit message, so I'm going to resubmit it with an actual commit message. llvm-svn: 344358
* Make YAML quote forward slashes. | Zachary Turner | 2018-10-12 | 2 | -7/+7

  llvm-svn: 344357
* [TailCallElim] Enable marking of calls with byval as tails | Robert Lougher | 2018-10-08 | 1 | -0/+33

  In r339636 the alias analysis rules were changed with regard to tail
  calls and byval arguments. Previously, tail calls were assumed not to
  alias allocas from the current frame. This has been updated to not
  assume this for arguments with the byval attribute.

  This patch aligns TailCallElim with the new rule. Tail marking can now
  be more aggressive and mark more calls as tails, e.g.:

```
define void @test() {
  %f = alloca %struct.foo
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test2(%struct.foo* byval %f) {
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test3(%struct.foo* byval %f) {
  %agg.tmp = alloca %struct.foo
  %0 = bitcast %struct.foo* %agg.tmp to i8*
  %1 = bitcast %struct.foo* %f to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false)
  call void @bar(%struct.foo* byval %agg.tmp)
  ret void
}
```

  The problematic case where a byval parameter is captured by a call is
  still handled correctly, and will not be marked as a tail (see PR7272).

  llvm-svn: 343986