summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* [SROA] Propagate !range metadata when moving loads.Davide Italiano2017-11-271-0/+20
| | | | | | | | | | | | | This tries to propagate !range metadata to a pre-existing load when a load is optimized out. This is done instead of adding an assume because converting loads to and from assumes creates a lot of IR. Patch by Ariel Ben-Yehuda. Differential Revision: https://reviews.llvm.org/D37216 llvm-svn: 319096
* [PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend ↵Sanjay Patel2017-11-271-11/+13
| | | | | | | | | | | on arg rather than result This should fix PR31455: https://bugs.llvm.org/show_bug.cgi?id=31455 Differential Revision: https://reviews.llvm.org/D28314 llvm-svn: 319094
* Inliner: Don't mark notail calls with the 'tail' attributeArnold Schwaighofer2017-11-271-0/+15
| | | | | | | | | | | | enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2, TCK_NoTail = 3 }; TCK_NoTail is greater than TCK_Tail so taking the min does not do the correct thing. rdar://35639547 llvm-svn: 319075
* [InstSimplify] add fcmp with negative constant tests; NFCSanjay Patel2017-11-271-0/+46
| | | | | | This is a superset of the tests proposed with D40012 to show another potential improvement. llvm-svn: 319041
* [CGP] Fix handling of null pointer values in optimizeMemoryInstJohn Brawn2017-11-271-0/+52
| | | | | | | | | | | The current way that trivial addressing modes are detected incorrectly thinks that null pointers are non-trivial, leading to an infinite loop where we keep duplicating the same select. Fix this by aware of null when deciding if an addressing mode is trivial. Differential Revision: https://reviews.llvm.org/D40447 llvm-svn: 319019
* [CodeGenPrepare] Check that erased sunken address are not reusedSimon Dardis2017-11-242-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | CodeGenPrepare sinks address computations from one basic block to another and attempts to reuse address computations that have already been sunk. If the same address computation appears twice with the first instance as an operand of a load whose result is an operand to a simplifable select, CodeGenPrepare simplifies the select and recursively erases the now dead instructions. CodeGenPrepare then attempts to use the erased address computation for the second load. Fix this by erasing the cached address value if it has zero uses before looking for the address value in the sunken address map. This partially resolves PR35209. Thanks to Alexander Richardson for reporting the issue! This fixed version relands r318032 which was reverted in r318049 due to sanitizer buildbot failures. Reviewers: john.brawn Differential Revision: https://reviews.llvm.org/D39841 llvm-svn: 318956
* [CGP] Make optimizeMemoryInst able to combine more kinds of ExtAddrMode fieldsJohn Brawn2017-11-241-1/+351
| | | | | | | | | | | | | This patch extends the recent work in optimizeMemoryInst to make it able to combine more ExtAddrMode fields than just the BaseReg. This fixes some benchmark regressions introduced by r309397, where GVN PRE is hoisting a getelementptr such that it can no longer be combined into the addressing mode of the load or store that uses it. Differential Revision: https://reviews.llvm.org/D38133 llvm-svn: 318949
* [Hexagon] Remove trailing spaces, NFCKrzysztof Parzyszek2017-11-221-1/+1
| | | | llvm-svn: 318875
* [SCCP] Pick the right lattice value for constants.Davide Italiano2017-11-221-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | After the dataflow algorithm proves that an argument is constant, it replaces it value with the integer constant and drops the lattice value associated to the DEF. e.g. in the example we have @f() that's called twice: call @f(undef, ...) call @f(2, ...) `undef` MEET 2 = 2 so we replace the argument and all its uses with the constant 2. Shortly after, tryToReplaceWithConstantRange() tries to get the lattice value for the argument we just replaced, causing an assertion. This function is a little peculiar as it runs when we're doing replacement and not as part of the solver but still queries the solver. The fix is that of checking whether we replaced the value already and get a temporary lattice value for the constant. Thanks to Zhendong Su for the report! Fixes PR35357. llvm-svn: 318817
* Rename test/Transforms/CountingFunctionInserter -> EntryExitInstrumenterHans Wennborg2017-11-211-0/+0
| | | | | | The pass was renamed in r318195. llvm-svn: 318784
* EntryExitInstrumenter: support __cyg_profile_func_enter_bareHans Wennborg2017-11-211-0/+5
| | | | | | It works just like __cyg_profile_func_enter but takes no arguments. llvm-svn: 318783
* [InstCombine] Test for PR35354: unable to vectorize loop with std::maxAlexey Bataev2017-11-211-0/+57
| | | | | | on floats, NFC. llvm-svn: 318764
* SLPVectorizer.cpp: Avoid std::stable_sort(properlyDominates()).NAKAMURA Takumi2017-11-211-0/+153
| | | | | | | properlyDominates() shouldn't be used as sort key. It causes different output between stdlibc++ and libc++. Instead, I introduced RPOT. In most cases, it works for CSE. llvm-svn: 318743
* Add heuristics for irreducible loop metadata under PGOHiroshi Yamauchi2017-11-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Add the following heuristics for irreducible loop metadata: - When an irreducible loop header is missing the loop header weight metadata, give it the minimum weight seen among other headers. - Annotate indirectbr targets with the loop header weight metadata (as they are likely to become irreducible loop headers after indirectbr tail duplication.) These greatly improve the accuracy of the block frequency info of the Python interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to better register allocation under PGO. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39980 llvm-svn: 318693
* [SROA] Correctly invalidate analyses when dead instructions deletedTeresa Johnson2017-11-201-0/+97
| | | | | | | | | | | | | | | Summary: SROA can fail in rewriting alloca but still rewrite a phi resulting in dead instruction elimination. The Changed flag was not being set correctly, resulting in downstream passes using stale analyses. The included test case will assert during the second BDCE pass as a result. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39921 llvm-svn: 318677
* [LV] Model masking in VPlan, introducing VPInstructionsGil Rapaport2017-11-201-1/+1
| | | | | | | | | | | | | | | This patch adds a new abstraction layer to VPlan and leverages it to model the planned instructions that manipulate masks (AND, OR, NOT), introduced during predication. The new VPValue and VPUser classes model how data flows into, through and out of a VPlan, forming the vertices of a planned Def-Use graph. The new VPInstruction class is a generic single-instruction Recipe that models a planned instruction along with its opcode, operands and users. See VectorizationPlan.rst for more details. Differential Revision: https://reviews.llvm.org/D38676 llvm-svn: 318645
* [IRCE] Smart range intersectionMax Kazantsev2017-11-204-44/+462
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In rL316552, we ban intersection of unsigned latch range with signed range check and vice versa, unless the entire range check iteration space is known positive. It was a correct functional fix that saved us from dealing with ambiguous values, but it also appeared to be a very restrictive limitation. In particular, in the following case: loop: %iv = phi i32 [ 0, %preheader ], [ %iv.next, %latch] %iv.offset = add i32 %iv, 10 %rc = icmp slt i32 %iv.offset, %len br i1 %rc, label %latch, label %deopt latch: %iv.next = add i32 %iv, 11 %cond = icmp i32 ult %iv.next, 100 br it %cond, label %loop, label %exit Here, the unsigned iteration range is `[0, 100)`, and the safe range for range check is `[-10, %len - 10)`. For unsigned iteration spaces, we use unsigned min/max functions for range intersection. Given this, we wanted to avoid dealing with `-10` because it is interpreted as a very big unsigned value. Semantically, range check's safe range goes through unsigned border, so in fact it is two disjoint ranges in IV's iteration space. Intersection of such ranges is not trivial, so we prohibited this case saying that we are not allowed to intersect such ranges. What semantics of this safe range actually means is that we can start from `-10` and go up increasing the `%iv` by one until we reach `%len - 10` (for simplicity let's assume that `%len - 10` is a reasonably big positive value). In particular, this safe iteration space includes `0, 1, 2, ..., %len - 11`. So if we were able to return safe iteration space `[0, %len - 10)`, we could safely intersect it with IV's iteration space. All values in this range are non-negative, so using signed/unsigned min/max for them is unambiguous. In this patch, we alter the algorithm of safe range calculation so that it returnes a subset of the original safe space which is represented by one continuous range that does not go through wrap. In order to reach this, we use modified SCEV substraction function. It can be imagined as a function that substracts by `1` (or `-1`) as long as the further substraction does not cause a wrap in IV iteration space. This allows us to perform IRCE in many situations when we deal with IV space and range check of different types (in terms of signed/unsigned). We apply this approach for both matching and not matching types of IV iteration space and the range check. One implication of this is that now IRCE became smarter in detection of empty safe ranges. For example, in this case: loop: %iv = phi i32 [ %begin, %preheader ], [ %iv.next, %latch] %iv.offset = sub i32 %iv, 10 %rc = icmp ult i32 %iv.offset, %len br i1 %rc, label %latch, label %deopt latch: %iv.next = add i32 %iv, 11 %cond = icmp i32 ult %iv.next, 100 br it %cond, label %loop, label %exit If `%len` was less than 10 but SCEV failed to trivially prove that `%begin - 10 >u %len- 10`, we could end up executing entire loop in safe preloop while the main loop was still generated, but never executed. Now, cutting the ranges so that if both `begin - 10` and `%len - 10` overflow, we have a trivially empty range of `[0, 0)`. This in some cases prevents us from meaningless optimization. Differential Revision: https://reviews.llvm.org/D39954 llvm-svn: 318639
* [CGP] Fix the crash caused by enable of complex addr modeSerguei Katkov2017-11-201-0/+35
| | | | | | | | | | | | | We must collect all AddModes even if they are the same. This is due to Original value is different but we need all original values collected as they are used as anchors in common phi finding. Reviewers: john.brawn, reames Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40166 llvm-svn: 318638
* [LibCallSimplifier] allow splat vectors for pow(x, 0.5) -> sqrt() transformsSanjay Patel2017-11-191-2/+3
| | | | llvm-svn: 318629
* [LibCallSimplifier] partly fix pow(x, 0.5) -> sqrt() transformsSanjay Patel2017-11-191-4/+4
| | | | | | | | | | | | | | As the first test shows, we could transform an llvm intrinsic which never sets errno into a libcall which could set errno (even though it's marked readnone?), so that's not ideal. It's possible that we can also transform a libcall which could set errno to an intrinsic given the fast-math-flags constraint, but that's deferred to determine exactly which set of FMF are needed. Differential Revision: https://reviews.llvm.org/D40150 llvm-svn: 318628
* [InstSimplify] fold and/or of fcmp ord/uno when operand is known nnanSanjay Patel2017-11-191-50/+16
| | | | | | | | | | | | | | | | | | | The 'ord' and 'uno' predicates have a logic operation for NAN built into their definitions: FCMP_ORD = 7, ///< 0 1 1 1 True if ordered (no nans) FCMP_UNO = 8, ///< 1 0 0 0 True if unordered: isnan(X) | isnan(Y) So we can simplify patterns like this: (fcmp ord (known NNAN), X) && (fcmp ord X, Y) --> fcmp ord X, Y (fcmp uno (known NNAN), X) || (fcmp uno X, Y) --> fcmp uno X, Y It might be better to split this into (X uno 0) | (Y uno 0) as a canonicalization, but that would be another patch. Differential Revision: https://reviews.llvm.org/D40130 llvm-svn: 318627
* [CallSiteSplitting] Remove some indirection (NFC).Florian Hahn2017-11-182-6/+30
| | | | | | | | | | | | | | | | Summary: With this patch I tried to reduce the complexity of the code sightly, by removing some indirection. Please let me know what you think. Reviewers: junbuml, mcrosier, davidxl Reviewed By: junbuml Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40037 llvm-svn: 318593
* [LICM] Fix PR35342Jun Bum Lim2017-11-171-0/+26
| | | | | | | | | | | | | | Summary: This change fix PR35342 by replacing only the current use with undef in unreachable blocks. Reviewers: efriedma, mcrosier, igor-laevsky Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40184 llvm-svn: 318551
* [PM/Unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching,Chandler Carruth2017-11-173-19/+2878
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | making it no longer even remotely simple. The pass will now be more of a "full loop unswitching" pass rather than anything substantively simpler than any other approach. I plan to rename it accordingly once the dust settles. The key ideas of the new loop unswitcher are carried over for non-trivial unswitching: 1) Fully unswitch a branch or switch instruction from inside of a loop to outside of it. 2) Update the CFG and IR. This avoids needing to "remember" the unswitched branches as well as avoiding excessively cloning and reliance on complex parts of simplify-cfg to cleanup the cfg. 3) Update the analyses (where we can) rather than just blowing them away or relying on something else updating them. Sadly, #3 is somewhat compromised here as the dominator tree updates were too complex for me to want to reason about. I will need to make another attempt to do this now that we have a nice dynamic update API for dominators. However, we do adhere to #3 w.r.t. LoopInfo. This approach also adds an important principls specific to non-trivial unswitching: not *all* of the loop will be duplicated when unswitching. This fact allows us to compute the cost in terms of how much *duplicate* code is inserted rather than just on raw size. Unswitching conditions which essentialy partition loops will work regardless of the total loop size. Some remaining issues that I will be addressing in subsequent commits: - Handling unstructured control flow. - Unswitching 'switch' cases instead of just branches. - Moving to the dynamic update API for dominators. Some high-level, interesting limitationsV that folks might want to push on as follow-ups but that I don't have any immediate plans around: - We could be much more clever about not cloning things that will be deleted. In fact, we should be able to delete *nothing* and do a minimal number of clones. - There are many more interesting selection criteria for which branch to unswitch that we might want to look at. One that I'm interested in particularly are a set of conditions which all exit the loop and which can be merged into a single unswitched test of them. Differential revision: https://reviews.llvm.org/D34200 llvm-svn: 318549
* [IRCE] Remove folding of two range checks into RANGE_CHECK_BOTHMax Kazantsev2017-11-171-6/+17
| | | | | | | | | | | | | The logic of replacing of a couple `RANGE_CHECK_LOWER + RANGE_CHECK_UPPER` into `RANGE_CHECK_BOTH` in fact duplicates the logic of range intersection which happens when we calculate safe iteration space. Effectively, the result of intersection of these ranges doesn't differ from the range of merged range check. We chose to remove duplicating logic in favor of code simplicity. Differential Revision: https://reviews.llvm.org/D39589 llvm-svn: 318508
* Current implementation of Value::replaceUsesExceptBlockAddr() uses UseListDmitry Mikulin2017-11-171-0/+26
| | | | | | | | | | | | | iterator to walk the list which keeps changing inside the loop. When the UseList contains several uses with the same user, we end processing the same user more than once, which leads to an assert. With this fix, unique users are saved and processed later to avoid processing duplicates. Differential Revision: https://reviews.llvm.org/D39864 llvm-svn: 318477
* [InstCombine] add tests for pow(); NFCSanjay Patel2017-11-161-16/+38
| | | | | | Also, increase test diversity (and show another bug) by varying the types. llvm-svn: 318430
* [InstCombine] add tests for 'afn' FMF; NFCSanjay Patel2017-11-161-0/+21
| | | | llvm-svn: 318423
* [InstCombine] regenerate test checks; NFCSanjay Patel2017-11-161-9/+9
| | | | llvm-svn: 318420
* [InstCombine] regenerate test checks; NFCSanjay Patel2017-11-161-205/+242
| | | | | | Also, remove some unnecessary bits. I don't think we need fcmp in any test here either? llvm-svn: 318418
* [InstCombine] regenerate test checks; NFCSanjay Patel2017-11-161-7/+8
| | | | llvm-svn: 318417
* [InstCombine] regenerate test checks; NFCSanjay Patel2017-11-161-82/+97
| | | | llvm-svn: 318416
* Let llvm.invariant.group.barrier accepts pointer to any address spaceYaxun Liu2017-11-162-7/+7
| | | | | | | | | | | llvm.invariant.group.barrier may accept pointers to arbitrary address space. This patch let it accept pointers to i8 in any address space and returns pointer to i8 in the same address space. Differential Revision: https://reviews.llvm.org/D39973 llvm-svn: 318413
* [InstSimplify] add tests for fcmp ord/uno; NFCSanjay Patel2017-11-161-0/+217
| | | | llvm-svn: 318408
* Remove stray comma in sink-addrmode testJohn Brawn2017-11-161-1/+1
| | | | | | | The extra comma meant it wasn't correctly checking that we weren't getting an extra getelementptr. llvm-svn: 318406
* [InstCombine] include 'sub' in the list of narrow-able binopsSanjay Patel2017-11-161-12/+8
| | | | | | | | | | | | | | | | | | | | // trunc (binop X, C) --> binop (trunc X, C') // trunc (binop (ext X), Y) --> binop X, (trunc Y) I'm grouping sub with the other binops because that makes the code simpler and the transforms are valid: https://rise4fun.com/Alive/UeF ...so even though we don't expect a sub with constant Op1 or any of the other opcodes with constant Op0 due to canonicalization rules, we might as well handle those situations if non-canonical code somehow reaches this point (it should just make instcombine more efficient in reaching its end goal). This should solve the problem that later manifests in the vectorizers in PR35295: https://bugs.llvm.org/show_bug.cgi?id=35295 llvm-svn: 318404
* [IRCE] Fix SCEVExpander's usage in IRCEMax Kazantsev2017-11-161-0/+135
| | | | | | | | | | | | When expanding exit conditions for pre- and postloops, we may end up expanding a recurrency from the loop to in its loop's preheader. This produces incorrect IR. This patch ensures that IRCE uses SCEVExpander correctly and only expands code which is safe to expand in this particular location. Differentian Revision: https://reviews.llvm.org/D39234 llvm-svn: 318381
* [InstCombine] add sub narrowing tests; NFCSanjay Patel2017-11-151-0/+56
| | | | | | | This might be the root cause of PR35295: https://bugs.llvm.org/show_bug.cgi?id=35295 llvm-svn: 318342
* [InstCombine] trunc (binop X, C) --> binop (trunc X, C')Sanjay Patel2017-11-152-64/+43
| | | | | | | | | Note that one-use and shouldChangeType() are checked ahead of the switch. Without the narrowing folds, we can produce inferior vector code as shown in PR35299: https://bugs.llvm.org/show_bug.cgi?id=35299 llvm-svn: 318323
* [InstCombine] Salvage debug info during initial DCEReid Kleckner2017-11-151-0/+70
| | | | | | | | | | | InstCombine salvages debug info for every instruction it erases from its worklist, but it wasn't doing it during its initial DCE when populating its worklist. This fixes that. This should help improve availability of 'this' in optimized debug info when casts are necessary. llvm-svn: 318320
* [InstCombine] add tests for missing trunc folds; NFCSanjay Patel2017-11-151-0/+285
| | | | | | | | As noted in PR35299: https://bugs.llvm.org/show_bug.cgi?id=35299 ...this is likely the root cause for a mis-vectorization transform. llvm-svn: 318319
* [SLP] Added more missed optimization remarksAdam Nemet2017-11-156-7/+229
| | | | | | | | | | | | | | | | | | | | Summary: Added more remarks to SLP pass, in particular "missed" optimization remarks. Also proposed several tests for new functionality. Patch by Vladimir Miloserdov! For reference you may look at: https://reviews.llvm.org/rL302811 Reviewers: anemet, fhahn Reviewed By: anemet Subscribers: javed.absar, lattner, petecoup, yakush, llvm-commits Differential Revision: https://reviews.llvm.org/D38367 llvm-svn: 318307
* [PassManager, SimplifyCFG] add test for PR34603 / D38566; NFCSanjay Patel2017-11-151-0/+40
| | | | | | This is a recommit of r316908 which was reverted by r317444. llvm-svn: 318300
* [(new) Pass Manager] instantiate SimplifyCFG with the same options as the old PMSanjay Patel2017-11-151-54/+25
| | | | | | | | | | | | | | | | | | This is a recommit of r316869 which was speculatively reverted with r317444 and subsequently shown to not be the cause of PR35210. That crash should be fixed after r318237. Original commit message: The old PM sets the options of what used to be known as "latesimplifycfg" on the instantiation after the vectorizers have run, so that's what we'redoing here. FWIW, there's a later SimplifyCFGPass instantiation in both PMs where we do not set the "late" options. I'm not sure if that's intentional or not. Differential Revision: https://reviews.llvm.org/D39407 llvm-svn: 318299
* Fix llvm/test/Transforms/LoopRotate/pr35210.ll in rL318237, it uses debug ↵NAKAMURA Takumi2017-11-151-0/+1
| | | | | | options. llvm-svn: 318273
* [InstCombine] Simplify binops that are only used by a select and are fed by ↵Craig Topper2017-11-151-0/+52
| | | | | | | | | | | | | | | | | | | a select with the same condition. Summary: This patch optimizes a binop sandwiched between 2 selects with the same condition. Since we know its only used by the select we can propagate the appropriate input value from the earlier select. As I'm writing this I realize I may need to avoid doing this for division in case the select was protecting a divide by zero? Reviewers: spatel, majnemer Reviewed By: majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39999 llvm-svn: 318267
* Revert r318193 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' ↵Hans Wennborg2017-11-152-120/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | elements in integer binary ops." It crashes building sqlite; see reply on the llvm-commits thread. > [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. > > Patch tries to improve vectorization of the following code: > > void add1(int * __restrict dst, const int * __restrict src) { > *dst++ = *src++; > *dst++ = *src++ + 1; > *dst++ = *src++ + 2; > *dst++ = *src++ + 3; > } > Allows to vectorize even if the very first operation is not a binary add, but just a load. > > Fixed issues related to previous commit. > > Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev > > Reviewed By: ABataev, RKSimon > > Subscribers: llvm-commits, RKSimon > > Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 318239
* [LoopRotate] processLoop should return true even if it just simplified the ↵Craig Topper2017-11-151-0/+70
| | | | | | | | | | | | | | loop latch without making any other changes Simplifying a loop latch changes the IR and we need to make sure the pass manager knows to invalidate analysis passes if that happened. PR35210 discovered a case where we failed to invalidate the post dominator tree after this simplification because we no changes other than simplifying the loop latch. Fixes PR35210. Differential Revision: https://reviews.llvm.org/D40035 llvm-svn: 318237
* Make salvageDebugInfo of casts work for dbg.declare and dbg.addrReid Kleckner2017-11-141-0/+87
| | | | | | | | | | | | | | | | | | | | | Summary: Instcombine (and probably other passes) sometimes want to change the type of an alloca. To do this, they generally create a new alloca with the desired type, create a bitcast to make the new pointer type match the old pointer type, replace all uses with the cast, and then simplify the casts. We already knew how to salvage dbg.value instructions when removing casts, but we can extend it to cover dbg.addr and dbg.declare. Fixes a debug info quality issue uncovered in Chromium in http://crbug.com/784609 Reviewers: aprantl, vsk Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40042 llvm-svn: 318203
* Rename CountingFunctionInserter and use for both mcount and cygprofile ↵Hans Wennborg2017-11-141-10/+69
| | | | | | | | | | | | | | | | | | | | | | calls, before and after inlining Clang implements the -finstrument-functions flag inherited from GCC, which inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit. This is useful for getting a trace of how the functions in a program are executed. Normally, the calls remain even if a function is inlined into another function, but it is useful to be able to turn this off for users who are interested in a lower-level trace, i.e. one that reflects what functions are called post-inlining. (We use this to generate link order files for Chromium.) LLVM already has a pass for inserting similar instrumentation calls to mcount(), which it does after inlining. This patch renames and extends that pass to handle calls both to mcount and the cygprofile functions, before and/or after inlining as controlled by function attributes. Differential Revision: https://reviews.llvm.org/D39287 llvm-svn: 318195
OpenPOWER on IntegriCloud