summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Analysis
Commit message (Collapse)AuthorAgeFilesLines
* [ValueTracking] Add a basic version of isKnownNonInfinity and use it to ↵Benjamin Kramer2019-11-191-4/+69
| | | | detect more NoNaNs
* [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" ↵Evgeniy Brevnov2019-11-191-6/+19
| | | | | | | | | | | | | | | | | | | should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151 Summary: Dependence anlysis has a mechanism to cache results. Thus for particular memory access the cache keep track of side effects in basic blocks. The problem is that for invariant loads dependepce analysis legally ignores many dependencies due to a special semantic rules for such loads. But later results calculated for invariant load retrived from the cache for general case acceses. As a result we have wrong dependence information causing GVN to do illegal transformation. Fixes, T42151. Proposed solution is to disable caching of invariant loads. I think such loads a pretty rare and it doesn't make sense to extend caching mechanism for them. Reviewers: reames, chandlerc, skatkov, morisset, jdoerfert Reviewed By: reames Subscribers: hiraditya, test, jdoerfert, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64405
* [LoopCacheAnalysis]: Fix assertion failure during cost computationRachel Craik2019-11-151-0/+3
| | | | | | | | Ensure the stride and trip count have the same type before multiplying them during reference cost calculation Reviewed By: jdoefert Differential Revision: https://reviews.llvm.org/D70192
* [SVFS] Inject TLI Mappings in VFABI attribute.Francesco Petrogalli2019-11-152-0/+19
| | | | | | | | | | | | | | This patch introduces a function pass to inject the scalar-to-vector mappings stored in the TargetLIbraryInfo (TLI) into the Vector Function ABI (VFABI) variants attribute. The test is testing the injection for three vector libraries supported by the TLI (Accelerate, SVML, MASSV). The pass does not change any of the analysis associated to the function. Differential Revision: https://reviews.llvm.org/D70107
* Add missing includes needed to prune LLVMContext.h include, NFCReid Kleckner2019-11-1411-1/+12
| | | | | These are a pre-requisite to removing #include "llvm/Support/Options.h" from LLVMContext.h: https://reviews.llvm.org/D70280
* Sink all InitializePasses.h includesReid Kleckner2019-11-1350-10/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211
* Revert 57dd4b0 "[ValueTracking] Allow context-sensitive nullness check for ↵Hans Wennborg2019-11-132-12/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | non-pointers" This caused miscompiles of Chromium (https://crbug.com/1023818). The reduced repro is small enough to fit here: $ cat /tmp/a.c unsigned char f(unsigned char *p) { unsigned char result = 0; for (int shift = 0; shift < 1; ++shift) result |= p[0] << (shift * 8); return result; } $ bin/clang -O2 -S -o - /tmp/a.c | grep -A4 f: f: # @f .cfi_startproc # %bb.0: # %entry xorl %eax, %eax retq That's nicely optimized, but I don't think it's the right result :-) > Same as D60846 but with a fix for the problem encountered there which > was a missing context adjustment in the handling of PHI nodes. > > The test that caused D60846 to be reverted was added in e15ab8f277c7. > > Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam > > Subscribers: hiraditya, bollu, llvm-commits > > Tags: #llvm > > Differential Revision: https://reviews.llvm.org/D69571 This reverts commit 57dd4b03e4806bbb4760ab6150940150d884df20.
* [VFABI] Add LLVM internal mangling for vector functions.Francesco Petrogalli2019-11-131-9/+20
| | | | | | | | | | | | | | | | | | | | | | Summary: This patch adds a custom ISA for vector functions for internal use in LLVM. The <isa> token is set to "_LLVM_", and it is not attached to any specific instruction Vector ISA, or Vector Function ABI. The ISA is used as a token for handling Vector Function ABI-style vectorization for those vector functions that are not directly associated to any existing Vector Function ABI (for example, some of the vector functions exposed by TargetLibraryInfo). The demangling function for this ISA in a Vector Function ABI context is set to be the same as the common one shared between X86 and AArch64. Reviewers: jdoerfert, sdesmalen, simoll Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70089
* Temporarily Revert "Reapply [LVI] Normalize pointer behavior" as it's broken ↵Eric Christopher2019-11-121-97/+89
| | | | | | | | python 3.6. Reverting to figure out if it's a problem in python or the compiler for now. This reverts commit 885a05f48a5d320946c89590b73a764e5884fe4f.
* [GlobalsAA] Restrict ModRef result if any internal method has its address taken.Alina Sbirlea2019-11-121-6/+13
| | | | | | | | | | | | | Summary: If there are any internal methods whose address was taken, conclude there is nothing known in relation of any other internal method and a global. Reviewers: nlopes, sanjoy.google Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69690
* [InstCombine] Skip scalable vectors in combineLoadToOperationTypeDiana Picus2019-11-121-4/+3
| | | | | | | | | | | | | | | | Don't try to canonicalize loads to scalable vector types to loads of integers. This removes one assertion when trying to use a TypeSize as a parameter to DataLayout::isLegalInteger. It does not handle the second part of the function (which looks at bitcasts). This patch also contains a NFC fix for Load Analysis, where a variable initialization that would cause the same assertion is moved closer to its use. This allows us to run the new test for InstCombine without having to teach LocationSize to play nicely with scalable vectors. Differential Revision: https://reviews.llvm.org/D70075
* [VFABI] Read/Write functions for the VFABI attribute.Francesco Petrogalli2019-11-122-2/+22
| | | | | | | | | | | | | | The attribute is stored at the `FunctionIndex` attribute set, with the name "vector-function-abi-variant". The get/set methods of the attribute have assertion to verify that: 1. Each name in the attribute is a valid VFABI mangled name. 2. Each name in the attribute correspond to a function declared in the module. Differential Revision: https://reviews.llvm.org/D69976
* Add InstCombine/InstructionSimplify support for Freeze Instructionaqjune2019-11-122-0/+30
| | | | | | | | | | | | | | | Summary: - Add llvm::SimplifyFreezeInst - Add InstCombiner::visitFreeze - Add llvm tests Reviewers: majnemer, sanjoy, reames, lebedev.ri, spatel Reviewed By: reames, lebedev.ri Subscribers: reames, lebedev.ri, filcab, regehr, trentxintong, llvm-commits Differential Revision: https://reviews.llvm.org/D29013
* Fix -Wparentheses warning. NFCI.Simon Pilgrim2019-11-111-1/+2
|
* [NFC]: Fix PVS Studio warning in LoopNestAnalysisTsang Whitney W.H2019-11-101-3/+3
| | | | | | | | | | | | | | | | | | Summary:This patch fixes the following warnings uncovered by PVS Studio: /home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp 353 warn V612 An unconditional 'return' within a loop. /home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp 456 err V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator. Authored By:etiotto Reviewer:Meinersbur, kbarton, bmahjour, Whitney, xbolva00 Reviewed By:xbolva00 Subscribers:hiraditya, llvm-commits Tag:LLVM Differential Revision:https://reviews.llvm.org/D69821
* ThinLTO : Import always_inline functions irrespective of the thresholdTeresa Johnson2019-11-081-2/+4
| | | | | | | | | | | | | | | | Summary: A user can force a function to be inlined by specifying the always_inline attribute. Currently, thinlto implementation is not aware of always_inline functions and does not guarantee import of such functions, which in turn can prevent inlining of such functions. Patch by Bharathi Seshadri <bseshadr@cisco.com> Reviewers: tejohnson Reviewed By: tejohnson Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70014
* [DDG] Data Dependence Graph - Pi Blockbmahjour2019-11-082-11/+207
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch adds Pi Blocks to the DDG. A pi-block represents a group of DDG nodes that are part of a strongly-connected component of the graph. Replacing all the SCCs with pi-blocks results in an acyclic representation of the DDG. For example if we have: {a -> b}, {b -> c, d}, {c -> a} the cycle a -> b -> c -> a is abstracted into a pi-block "p" as follows: {p -> d} with "p" containing: {a -> b}, {b -> c}, {c -> a} In this implementation the edges between nodes that are part of the pi-block are preserved. The crossing edges (edges where one end of the edge is in the set of nodes belonging to an SCC and the other end is outside that set) are replaced with corresponding edges to/from the pi-block node instead. Authored By: bmahjour Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert Reviewed By: Meinersbur Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack Tag: #llvm Differential Revision: https://reviews.llvm.org/D68827
* Reapply [LVI] Normalize pointer behaviorNikita Popov2019-11-081-89/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix cache invalidation by not guarding the dereferenced pointer cache erasure by SeenBlocks. SeenBlocks is only populated when actually caching a value in the block, which doesn't necessarily have to happen just because dereferenced pointers were calculated. ----- Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: For integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses. The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction, or in conservative approximation, at the end of the block. This patch changes the treatment of dereferencability: Instead of including it inside the block value, we instead treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge-value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this change not fully NFC, because we can now fold terminator icmps based on the dereferencability information in the same block. This is the reason why I changed one JumpThreading test (it would optimize the condition away without the change). Of course, we do not want to recompute dereferencability on each intersectAssume call, so we need a new cache for this. The dereferencability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead only walk the full BB once and cache all the dereferenced pointers. So the traversal is now performed only once per BB, instead of once per queried pointer value. I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior. Differential Revision: https://reviews.llvm.org/D69914
* Revert "[LVI] Normalize pointer behavior"Nikita Popov2019-11-081-94/+89
| | | | | | | This reverts commit 15bc4dc9a8949f9cffd46ec647baf0818d28fb28. clang-cmake-x86_64-sde-avx512-linux buildbot reported quite a few compile-time regressions in test-suite, will investigate.
* [LVI] Normalize pointer behaviorNikita Popov2019-11-081-89/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: For integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses. The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction, or in conservative approximation, at the end of the block. This patch changes the treatment of dereferencability: Instead of including it inside the block value, we instead treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge-value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this change not fully NFC, because we can now fold terminator icmps based on the dereferencability information in the same block. This is the reason why I changed one JumpThreading test (it would optimize the condition away without the change). Of course, we do not want to recompute dereferencability on each intersectAssume call, so we need a new cache for this. The dereferencability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead only walk the full BB once and cache all the dereferenced pointers. So the traversal is now performed only once per BB, instead of once per queried pointer value. I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior. Differential Revision: https://reviews.llvm.org/D69914
* Revert f0c2a5a "[LV] Generalize conditions for sinking instrs for first ↵Hans Wennborg2019-11-071-26/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | order recurrences." It broke Chromium, causing "Instruction does not dominate all uses!" errors. See https://bugs.chromium.org/p/chromium/issues/detail?id=1022297#c1 for a reproducer. > If the recurrence PHI node has a single user, we can sink any > instruction without side effects, given that all users are dominated by > the instruction computing the incoming value of the next iteration > ('Previous'). We can sink instructions that may cause traps, because > that only causes the trap to occur later, but not on any new paths. > > With the relaxed check, we also have to make sure that we do not have a > direct cycle (meaning PHI user == 'Previous), which indicates a > reduction relation, which potentially gets missed by > ReductionDescriptor. > > As follow-ups, we can also sink stores, iff they do not alias with > other instructions we move them across and we could also support sinking > chains of instructions and multiple users of the PHI. > > Fixes PR43398. > > Reviewers: hsaito, dcaballe, Ayal, rengolin > > Reviewed By: Ayal > > Differential Revision: https://reviews.llvm.org/D69228
* [WC] Fix a subtle bug in our definition of widenable branchPhilip Reames2019-11-061-0/+5
| | | | | | | | | | | | We had a subtle, but nasty bug in our definition of a widenable branch, and thus in the transforms which used that utility. Specifically, we returned true for any branch which included a widenable condition within it's condition, regardless of whether that widenable condition also had other uses. The problem is that the result of the WC() call is defined to be one particular value. As such, all users must agree as to what that value is. If we widen a branch without also updating *all other users* of the WC in the same way, we have broken the required semantics. Most of the textual diff is updating existing transforms not to leave dead uses hanging around. They're largely NFC as the dead instructions would be immediately deleted by other passes. The reason to make these changes is so that the transforms preserve the widenable branch form. In practice, we don't get bitten by this only because it isn't profitable to CSE WC() calls and the lowering pass from guards uses distinct WC calls per branch. Differential Revision: https://reviews.llvm.org/D69916
* [LoopPred] Fix two subtle issues found by inspectionPhilip Reames2019-11-061-0/+11
| | | | | | | | | | | | This patch fixes two issues noticed by inspection when going to enable the loop predication code in IndVarSimplify. Issue 1 - Both the LoopPredication transform, and the already on by default optimizeLoopExits transform, modify the exit count of the exits they modify. (either to 0 or Infinity) Looking at the code more closely, this was not reflected into SCEV and we were instead running later transforms with incorrect SCEVs. Fixing this requires forgetting the loop, weakening a too strong assert, and updating SCEV to not pessimize results when a loop is provable untaken. I haven't been able to find a test case to demonstrate the miscompile. Issue 2 - For modules without a data layout, we can end up with unsized pointer typed exit counts. Just bail out of this case. I think these are the last two issues which need addressed before we enable this by default. The code has already survived a decent amount of fuzzing without revealing either of the above. Differential Revision: https://reviews.llvm.org/D69695
* [TTI][LV] preferPredicateOverEpilogueSjoerd Meijer2019-11-061-0/+6
| | | | | | | | | | | | | | | | | We have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third: a target hook to TargetTransformInfo that can be queried whether predication is preferred or not, which allows the vectoriser to make the decision without forcing it. While this change behaves as a non-functional change for now, it shows the required TTI plumbing, usage of this new hook in the vectoriser, and the beginning of an ARM MVE implementation. I will follow up on this with: - a complete MVE implementation, see D69845. - a patch to disable this, i.e. we should respect "vector_predicate(disable)" and its corresponding loophint. Differential Revision: https://reviews.llvm.org/D69040
* [InstSimplify] use FMF to improve fcmp+select foldSanjay Patel2019-11-041-7/+10
| | | | | This is part of a series of patches needed to solve PR39535: https://bugs.llvm.org/show_bug.cgi?id=39535
* [SCEV] Fixed 'Uninitialized variable 'ContainsAddRec' used.' warning. NFCI.Dávid Bolvanský2019-11-031-1/+1
|
* [MemorySSA] Fixed null check after dereferencing warning. NFCI.Dávid Bolvanský2019-11-031-0/+1
|
* [LV] Generalize conditions for sinking instrs for first order recurrences.Florian Hahn2019-11-021-14/+26
| | | | | | | | | | | | | | | | | | | | | | | | | If the recurrence PHI node has a single user, we can sink any instruction without side effects, given that all users are dominated by the instruction computing the incoming value of the next iteration ('Previous'). We can sink instructions that may cause traps, because that only causes the trap to occur later, but not on any new paths. With the relaxed check, we also have to make sure that we do not have a direct cycle (meaning PHI user == 'Previous), which indicates a reduction relation, which potentially gets missed by ReductionDescriptor. As follow-ups, we can also sink stores, iff they do not alias with other instructions we move them across and we could also support sinking chains of instructions and multiple users of the PHI. Fixes PR43398. Reviewers: hsaito, dcaballe, Ayal, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D69228
* [PGO][PGSO] TargetLowering/TargetTransformationInfo/SwitchLoweringUtils part.Hiroshi Yamauchi2019-10-312-4/+6
| | | | | | | | | | | | | | | | Summary: (Split of off D67120) TargetLowering/TargetTransformationInfo/SwitchLoweringUtils changes for profile guided size optimization. Reviewers: davidxl Subscribers: eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69580
* [ValueTracking] Allow context-sensitive nullness check for non-pointersJohannes Doerfert2019-10-312-7/+12
| | | | | | | | | | | | | | | Same as D60846 but with a fix for the problem encountered there which was a missing context adjustment in the handling of PHI nodes. The test that caused D60846 to be reverted was added in e15ab8f277c7. Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69571
* [FIX] Make LSan happy by *not* leaking memoryJohannes Doerfert2019-10-311-4/+13
| | | | | | | I left a memory leak in a printer pass which made LSan sad so I remove the memory leak now to make LSan happy. Reported and tested by vlad.tsyrklevich.
* [MustExecute] Silence clang warning about unused captured 'this'Mikael Holmen2019-10-311-2/+2
| | | | | | | | | | | | New code introduced in fe799c97fa caused clang to complain with ../lib/Analysis/MustExecute.cpp:360:34: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] GetterTy<LoopInfo> LIGetter = [this](const Function &F) { ^~~~ ../lib/Analysis/MustExecute.cpp:365:44: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] GetterTy<PostDominatorTree> PDTGetter = [this](const Function &F) { ^~~~ 2 errors generated.
* [MustExecute] Forward iterate over conditional branchesJohannes Doerfert2019-10-311-1/+187
| | | | | | | | | | | | | | | | | | | Summary: If a conditional branch is encountered we can try to find a join block where the execution is known to continue. This means finding a suitable block, e.g., the immediate post dominator of the conditional branch, and proofing control will always reach that block. This patch implements different techniques that work with and without provided analysis. Reviewers: uenoku, sstefan1, hfinkel Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68933
* [Alignment][NFC] getMemoryOpCost uses MaybeAlignGuillaume Chatelet2019-10-251-5/+5
| | | | | | | | | | | | | | | Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69307
* [SCEV] Expose and use maximum constant exit counts for individual loop exitsPhilip Reames2019-10-241-3/+15
| | | | | | | | We were already going to all of the trouble of computing maximum constant exit counts for each loop exit, we might as well expose them through the API. The change in IndVars is mostly to demonstrate that the wired up code works, but it als very slightly strengthens the transform. The strengthened case is rather narrow though: it requires one exactly analyzeable exit, one imprecisely analyzeable exit (with the upper bound less than the precise one), and one unanalyzeable exit. I coudn't construct a reasonably stable test case. This does increase the memory usage of the BackedgeTakenCount by a factor of 2 in the worst case. I also noticed the loop in IndVars is O(#Exits ^ 2). This doesn't change with this patch. A future patch will cache this result inside of SCEV to avoid requering.
* Fix Clang -Wcovered-switch-default warning by moving llvm_unreachable ↵David Blaikie2019-10-241-5/+2
| | | | default to after the switch
* [SCEV] Start reworking backedge taken count APIs to unify max handling [NFC]Philip Reames2019-10-241-13/+21
| | | | This is a first step in figuring out a proper API for maximum (non constant) exit counts. This may evolve a bit as we get experience with the API needs; suggestions very welcome. This patch just tried to provide a framework that we can later add maximum too in a clean and obvious way.
* Fix MSVC "switch statement contains 'default' but no 'case' labels" warning. ↵Simon Pilgrim2019-10-241-7/+4
| | | | NFCI.
* [LVI][NFC] Factor solveBlockValueSaturatingIntrinsic() out of ↵Roman Lebedev2019-10-231-11/+25
| | | | | | solveBlockValueIntrinsic() Now that there's SaturatingInst class, this is cleaner.
* [LVI][CVP] LazyValueInfoImpl::solveBlockValueBinaryOp(): use no-wrap flags ↵Roman Lebedev2019-10-231-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from `add` op Summary: This was suggested in https://reviews.llvm.org/D69277#1717210 In this form (this is what was suggested, right?), the results aren't staggering (especially since given LVI cross-block focus) this does catch some things (as per test-suite), but not too much: | statistic | old | new | delta | % change | | correlated-value-propagation.NumAddNSW | 4981 | 4982 | 1 | 0.0201% | | correlated-value-propagation.NumAddNW | 12125 | 12126 | 1 | 0.0082% | | correlated-value-propagation.NumCmps | 1199 | 1202 | 3 | 0.2502% | | correlated-value-propagation.NumDeadCases | 112 | 111 | -1 | -0.8929% | | correlated-value-propagation.NumMulNSW | 275 | 278 | 3 | 1.0909% | | correlated-value-propagation.NumMulNUW | 1323 | 1326 | 3 | 0.2268% | | correlated-value-propagation.NumMulNW | 1598 | 1604 | 6 | 0.3755% | | correlated-value-propagation.NumNSW | 7158 | 7167 | 9 | 0.1257% | | correlated-value-propagation.NumNUW | 13304 | 13310 | 6 | 0.0451% | | correlated-value-propagation.NumNW | 20462 | 20477 | 15 | 0.0733% | | correlated-value-propagation.NumOverflows | 4 | 7 | 3 | 75.0000% | | correlated-value-propagation.NumPhis | 15366 | 15381 | 15 | 0.0976% | | correlated-value-propagation.NumSExt | 6273 | 6277 | 4 | 0.0638% | | correlated-value-propagation.NumShlNSW | 1172 | 1171 | -1 | -0.0853% | | correlated-value-propagation.NumShlNUW | 2793 | 2794 | 1 | 0.0358% | | correlated-value-propagation.NumSubNSW | 730 | 736 | 6 | 0.8219% | | correlated-value-propagation.NumSubNUW | 2044 | 2046 | 2 | 0.0978% | | correlated-value-propagation.NumSubNW | 2774 | 2782 | 8 | 0.2884% | | instcount.NumAddInst | 277586 | 277569 | -17 | -0.0061% | | instcount.NumAndInst | 66056 | 66054 | -2 | -0.0030% | | instcount.NumBrInst | 709147 | 709146 | -1 | -0.0001% | | instcount.NumCallInst | 528579 | 528576 | -3 | -0.0006% | | instcount.NumExtractValueInst | 18307 | 18301 | -6 | -0.0328% | | instcount.NumOrInst | 102660 | 102665 | 5 | 0.0049% | | instcount.NumPHIInst | 318008 | 318007 | -1 | -0.0003% | | instcount.NumSelectInst | 46373 | 46370 | -3 | -0.0065% | | instcount.NumSExtInst | 79496 | 79488 | -8 | -0.0101% | | instcount.NumShlInst | 40654 | 40657 | 3 | 0.0074% | | instcount.NumTruncInst | 62251 | 62249 | -2 | -0.0032% | | instcount.NumZExtInst | 68211 | 68221 | 10 | 0.0147% | | instcount.TotalBlocks | 843910 | 843909 | -1 | -0.0001% | | instcount.TotalInsts | 7387448 | 7387423 | -25 | -0.0003% | Reviewers: nikic, reames Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69321
* [Alignment][NFC] Finish transition for `Loads`Guillaume Chatelet2019-10-213-51/+48
| | | | | | | | | | | | | | | | | Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69253 llvm-svn: 375419
* Use Align for TFL::TransientStackAlignmentGuillaume Chatelet2019-10-211-1/+1
| | | | | | | | | | | | | | | | | Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69216 llvm-svn: 375398
* [SCEV] Simplify umin/max of zext and sext of the same valuePhilip Reames2019-10-191-1/+34
| | | | | | | | This is a common idiom which arises after induction variables are widened, and we have two or more exit conditions. Interestingly, we don't have instcombine or instsimplify support for this either. Differential Revision: https://reviews.llvm.org/D69006 llvm-svn: 375349
* [SCEV] Removing deprecated comment in ScalarEvolutionExpanderVictor Campos2019-10-181-3/+0
| | | | | | | Removing a comment in the ScalarEvolutionExpander.cpp file that was about the class SCEVSDivExpr, which has been long gone from LLVM. llvm-svn: 375232
* Reland: Dead Virtual Function EliminationOliver Stannard2019-10-171-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove dead virtual functions from vtables with replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up correctly. Original commit message: Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 375094
* [Alignment][NFC] Value::getPointerAlignment returns MaybeAlignGuillaume Chatelet2019-10-152-16/+19
| | | | | | | | | | | | | | | | | Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68398 llvm-svn: 374889
* Revert "Dead Virtual Function Elimination"Jorge Gorbe Moya2019-10-141-32/+0
| | | | | | This reverts commit 9f6a873268e1ad9855873d9d8007086c0d01cf4f. llvm-svn: 374844
* [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]Sam Parker2019-10-141-4/+6
| | | | | | | | | Add an extra parameter so the backend can take the alignment into consideration. Differential Revision: https://reviews.llvm.org/D68400 llvm-svn: 374763
* recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure ↵Zi Xuan Wu2019-10-121-2/+10
| | | | | | | | | | | | | | | | | | | | | | | separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634
* Dead Virtual Function EliminationOliver Stannard2019-10-111-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 374539
OpenPOWER on IntegriCloud