summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/CodeGenPrepare.cpp
Commit message (Collapse)AuthorAgeFilesLines
* TTI: Improve default costs for addrspacecastMatt Arsenault2019-06-031-2/+2
| | | | | | | | | | For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436
* Use the DataLayout::typeSizeEqualsStoreSize helper. NFCBjorn Pettersson2019-05-241-3/+2
| | | | | | | | | Just a minor refactoring to use the new helper method DataLayout::typeSizeEqualsStoreSize(). This is done when checking if getTypeSizeInBits is equal/non-equal to getTypeStoreSizeInBits. llvm-svn: 361613
* [CodeGenPrepare] Ensure we get a non-null result from getTrueOrFalseValue. NFCI.Simon Pilgrim2019-05-091-1/+3
| | | | llvm-svn: 360328
* [CodeGenPrepare] Don't split the store if it is volatileQingShan Zhang2019-05-081-0/+4
| | | | | | | | We shouldn't split the store when it is volatile. Differential Revision: https://reviews.llvm.org/D61169 llvm-svn: 360228
* [NFC] BasicBlock: refactor changePhiUses() out of replacePhiUsesWith(), use itRoman Lebedev2019-05-051-2/+1
| | | | | | | | | | | | | | | | | | | | | Summary: It is a common thing to loop over every `PHINode` in some `BasicBlock` and change old `BasicBlock` incoming block to a new `BasicBlock` incoming block. `replaceSuccessorsPhiUsesWith()` already had code to do that, it just wasn't a function. So outline it into a new function, and use it. Reviewers: chandlerc, craig.topper, spatel, danielcdh Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61013 llvm-svn: 359996
* [NFC] PHINode: introduce replaceIncomingBlockWith() function, use itRoman Lebedev2019-05-051-5/+2
| | | | | | | | | | | | | | | | | | | | Summary: There is `PHINode::getBasicBlockIndex()`, `PHINode::setIncomingBlock()` and `PHINode::getNumOperands()`, but no function to replace every specified `BasicBlock*` predecessor with some other specified `BasicBlock*`. Clearly, there are a lot of places that could use that functionality. Reviewers: chandlerc, craig.topper, spatel, danielcdh Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61011 llvm-svn: 359995
* [CodeGenPrepare] limit overflow intrinsic matching to a single basic block ↵Sanjay Patel2019-05-041-28/+21
| | | | | | | | | | | | | | | | | | | | | (2nd try) This is a subset of the original commit from rL359879 which was reverted because it could crash when using the 'RemovedInstructions' structure that enables delayed deletion of dead instructions. The motivating compile-time win does not require that change though. We should get most of that win from this change alone. Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359969
* Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic ↵Evgeniy Stepanov2019-05-031-42/+47
| | | | | | | | block" This reverts commit r359879, which introduced a compiler crash. llvm-svn: 359908
* [CodeGenPrepare] limit overflow intrinsic matching to a single basic blockSanjay Patel2019-05-031-47/+42
| | | | | | | | | | | | | | | | | | | Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. Also, we were restarting the iterator loops when doing the overflow intrinsic transforms by marking the dominator tree for update. That was done to prevent iterating over a removed instruction. But we can postpone the deletion using the existing "RemovedInsts" structure, and that means we don't need to update the DT. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359879
* [CGP] Look through bitcasts when duplicating returns for tail callsFrancis Visoiu Mistrih2019-04-231-1/+3
| | | | | | | | | | | | | | | | | | | | | The simple case of: ``` int *callee(); void *caller(void *a) { if (a == NULL) return callee(); return a; } ``` would generate a regular call instead of a tail call because we don't look through the bitcast of the call to `callee` when duplicating the return blocks. Differential Revision: https://reviews.llvm.org/D60837 llvm-svn: 359041
* Include what's used in a few cpp files - these were getting transitiveEric Christopher2019-04-121-0/+1
| | | | | | includes from MCDwarf.h. llvm-svn: 358254
* [IR] Refactor attribute methods in Function class (NFC)Evandro Menezes2019-04-041-2/+2
| | | | | | | | Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731
* [CGP] Reset DT when optimizing select instructionsTeresa Johnson2019-03-271-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: A recent fix (r355751) caused a compile time regression because setting the ModifiedDT flag in optimizeSelectInst means that each time a select instruction is optimized the function walk in runOnFunction stops and restarts again (which was needed to build a new DT before we started building it lazily in r356937). Now that the DT is built lazily, a simple fix is to just reset the DT at this point, rather than restarting the whole function walk. In the future other places that set ModifiedDT may want to switch to just resetting the DT directly. But that will require an evaluation to ensure that they don't otherwise need to restart the function walk. Reviewers: spatel Subscribers: jdoerfert, llvm-commits, xur Tags: #llvm Differential Revision: https://reviews.llvm.org/D59889 llvm-svn: 357111
* [CGP] Build the DominatorTree lazilyTeresa Johnson2019-03-251-34/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In r355512 CGP was changed to build the DominatorTree only once per function traversal, to avoid repeatedly building it each time it was accessed. This solved one compile time issue but introduced another. In the second case, we now were building the DT unnecessarily many times when we performed many function traversals (i.e. more than once per function when running CGP because of changes made each time). Change to saving the DT in the CodeGenPrepare object, and building it lazily when needed. It is reset whenever we need to rebuild it. The case that exposed the issue there are 617 functions, and we walk them (i.e. execute the "while (MadeChange)" loop in runOnFunction) a total of 12083 times (so previously we were building the DT 12083 times). With this patch we only build the DT 844 times (average of 1.37 times per function). We dropped the total time to compile this file from 538.11s without this patch to 339.63s with it. There is still an issue as CGP is taking much longer than all other passes even with this patch, and before a recent compiler release cut at r355392 the total time to this compile was only 97 sec with a huge reduction in CGP time. I suspect that one of the other recent changes to CGP led to iterating each function many more times on average, but I need to do some more investigation. Reviewers: spatel Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59696 llvm-svn: 356937
* [CGP] Make several static functions member functions (NFC)Teresa Johnson2019-03-241-19/+25
| | | | | | | This is extracted from D59696 as suggested in the review. It is preparation for making the DominatorTree a member variable. llvm-svn: 356857
* [CodeGenPrepare] limit formation of overflow intrinsics (PR41129)Sanjay Patel2019-03-211-2/+6
| | | | | | | | | | | | | | | | | | This is probably a bigger limitation than necessary, but since we don't have any evidence yet that this transform led to real-world perf improvements rather than regressions, I'm making a quick, blunt fix. In the motivating x86 example from: https://bugs.llvm.org/show_bug.cgi?id=41129 ...and shown in the regression test, we want to avoid an extra instruction in the dominating block because that could be costly. The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version is any better than the other yet. Differential Revision: https://reviews.llvm.org/D59602 llvm-svn: 356665
* [CGP] fix formatting; NFCSanjay Patel2019-03-201-3/+4
| | | | llvm-svn: 356572
* [CGP] convert chain of 'if' to 'switch'; NFCSanjay Patel2019-03-201-14/+13
| | | | | | | | This should be extended, but CGP does some strange things, so I'm intentionally not changing the potential order of any transforms yet. llvm-svn: 356566
* [CodeGenPrepare] avoid crashing from replacing a phi twiceMikael Holmen2019-03-151-2/+6
| | | | | | | | | | | | | | | | | | | | | | Summary: This is a fix to bug 41052: https://bugs.llvm.org/show_bug.cgi?id=41052 While trying to optimize a memory instruction in a dead basic block, we end up registering the same phi for replacement twice. This patch avoids registering more than the first replacement candidate for a phi. Patch by: JesperAntonsson Reviewers: skatkov, aprantl Reviewed By: aprantl Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59358 llvm-svn: 356260
* [CGP] add another bailout for degenerate code (PR41064)Sanjay Patel2019-03-141-1/+5
| | | | | | | | | | | | This is almost the same as: rL355345 ...and should prevent any potential crashing from examples like: https://bugs.llvm.org/show_bug.cgi?id=41064 ...although the bug was masked by: rL355823 ...and I'm not sure how to repro the problem after that change. llvm-svn: 356218
* CodeGenPrep: preserve inbounds attribute when sinking GEPs.Tim Northover2019-03-121-3/+27
| | | | | | | | | Targets can potentially emit more efficient code if they know address computations never overflow. For example ILP32 code on AArch64 (which only has 64-bit address computation) can ignore the possibility of overflow with this extra information. llvm-svn: 355926
* [CGP] Fix UB when GEP is bound to trivial PHINodeEugene Leviant2019-03-121-0/+1
| | | | | | Differential revision: https://reviews.llvm.org/D59140 llvm-svn: 355904
* [CGP] Limit distance between overflow math and cmpSam Parker2019-03-111-0/+11
| | | | | | | | | | | | | | | Inserting an overflowing arithmetic intrinsic can increase register pressure by producing two values at a point where only one is needed, while the second use maybe several blocks away. This increase in pressure is likely to be more detrimental on performance than rematerialising one of the original instructions. So, check that the arithmetic and compare instructions are no further apart than their immediate successor/predecessor. Differential Revision: https://reviews.llvm.org/D59024 llvm-svn: 355823
* [CGP] fix comments; NFCSanjay Patel2019-03-101-2/+2
| | | | llvm-svn: 355791
* [CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInstRong Xu2019-03-081-16/+10
| | | | | | | | | | | | | | | r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be maintained correctly in optimizations in optimizeBlock() and optimizeInst(). Function optimizeSelectInst() does not update the flag. This patch propagates the flag in optimizeSelectInst() back to optimizeBlock(). This patch also removes ModifiedDT in CodeGenPrepare class (which is not used). The property of ModifiedDT is now recorded in a ref parameter. Differential Revision: https://reviews.llvm.org/D59139 llvm-svn: 355751
* [CGP] Avoid repeatedly building DominatorTree causing long compile-time (NFC)Teresa Johnson2019-03-061-21/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In r354298 a DominatorTree construction was added via new function combineToUSubWithOverflow, which was subsequently restructured into replaceMathCmpWithIntrinsic in r354689. We are hitting a very long compile time due to this repeated construction, once per math cmp in the function. We shouldn't need to build the DominatorTree more than once per function, except when a transformation invalidates it. There is already a boolean flag that is returned from these methods indicating whether the DT has been modified. We can simply build the DT once per Function walk in CodeGenPrepare::runOnFunction, since any time a change is made we break out of the Function walk and restart it. I modified the code so that both replaceMathCmpWithIntrinsic as well as mergeSExts (which was also building a DT) use the DT constructed by the run method. From -mllvm -time-passes: Before this patch: CodeGen Prepare user time is 328s With this patch: CodeGen Prepare user time is 21s Reviewers: spatel Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58995 llvm-svn: 355512
* [CodeGenPrepare] avoid crashing on non-canonical/degenerate codeSanjay Patel2019-03-041-0/+5
| | | | | | | | | | | | The test is reduced from an example in the post-commit thread for: rL354746 http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html While we must avoid dying here, the real question should be: Why is non-canonical and/or degenerate code making it to CGP when using the new pass manager? llvm-svn: 355345
* [CGP] add special-cases to form unsigned add with overflow (PR40486)Sanjay Patel2019-02-241-8/+27
| | | | | | | | | | | | | | | | | | | | There's likely a missed IR canonicalization for at least 1 of these patterns. Otherwise, we wouldn't have needed the pattern-matching enhancement in D57516. Note that -- unlike usubo added with D57789 -- the TLI hook for this transform defaults to 'on'. So if there's any perf fallout from this, targets should look at how they're lowering the uaddo node in SDAG and/or override that hook. The x86 diffs suggest that there's some missing pattern-matching for forming inc/dec. This should fix the remaining known problems in: https://bugs.llvm.org/show_bug.cgi?id=40486 https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354746
* [CGP] move overflow intrinsic insertion to common location; NFCISanjay Patel2019-02-221-17/+28
| | | | | | | | | | | We need to enhance the uaddo matching to handle special-cases as seen in PR40486 and PR31754. That means we won't necessarily have a def-use pattern, so we'll need to check dominance to determine where to place the intrinsic (as we already do for usubo). This preliminary patch is just rearranging the code, so the planned follow-up to improve uaddo will be more clear. llvm-svn: 354689
* [CGP] match a special-case of unsigned subtract overflowSanjay Patel2019-02-201-0/+5
| | | | | | | This is the 'sub0' (negate) pattern from PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354519
* [CGP] form usub with overflow from sub+icmpSanjay Patel2019-02-181-13/+78
| | | | | | | | | | | | | | | | | The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487: https://bugs.llvm.org/show_bug.cgi?id=31754 https://bugs.llvm.org/show_bug.cgi?id=40487 ..and those are shown in the IR test file and x86 codegen file. Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use. This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo formation by default. Only x86 overrides that setting for now although other targets will likely benefit by forming usbuo too. Differential Revision: https://reviews.llvm.org/D57789 llvm-svn: 354298
* Implementation of asm-goto support in LLVMCraig Topper2019-02-081-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 llvm-svn: 353563
* [CGP] Add support for sinking operands to their users, if they are free.Florian Hahn2019-02-051-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves code generation for some AArch64 ACLE intrinsics. It adds support to CGP to duplicate and sink operands to their user, if they can be folded into a target instruction, like zexts and sub into usubl. It adds a TargetLowering hook shouldSinkOperands, which looks at the operands of instructions to see if sinking is profitable. I decided to add a new target hook, as for the sinking to be profitable, at least on AArch64, we have to look at multiple operands of an instruction, instead of looking at the users of a zext for example. The sinking is done in CGP, because it works around an instruction selection limitation. If instruction selection is not limited to a single basic block, this patch should not be needed any longer. Alternatively this could be done in the LoopSink pass, which tries to undo LICM for instructions in blocks that are not executed frequently. Note that we do not force the operands to sink to have a single user, because we duplicate them before sinking. Therefore this is only desirable if they really can be done for free. Additionally we could consider the impact on live ranges later on. This should fix https://bugs.llvm.org/show_bug.cgi?id=40025. As for performance, we have internal code that uses intrinsics and can be speed up by 10% by this change. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D57377 llvm-svn: 353152
* [CGP] use IRBuilder to simplify codeSanjay Patel2019-02-041-26/+25
| | | | | | | | | | | | | | | | This is no-functional-change-intended although there could be intermediate variations caused by a difference in the debug info produced by setting that from the builder's insertion point. I'm updating the IR test file associated with this code just to show that the naming differences from using the builder are visible. The motivation for adding a helper function is that we are likely to extend this code to deal with other overflow ops. llvm-svn: 353056
* [CGP] adjust target constraints for forming uaddoSanjay Patel2019-02-031-8/+11
| | | | | | | | | | | | | | | | | | | There are 2 changes visible here: 1. There's no reason to limit this transform based on number of condition registers. That diff allows PPC to produce slightly better (dot-instructions should be generally good) code. Note: someone that cares about PPC codegen might want to look closer at that output because it seems like we could still improve this. 2. We (probably?) should not bother trying to form uaddo (or other overflow ops) when there's no target support for such an op. This goes beyond checking whether the op is expanded because both PPC and AArch64 show better codegen for standard types regardless of whether the op is legal/custom. llvm-svn: 353001
* [CGP] refactor optimizeCmpExpression (NFCI)Sanjay Patel2019-02-031-36/+38
| | | | | | | | | | | | | | | | This is not truly NFC because we are bailing out without a TLI now. That should not be a real concern though because there should be a TLI in any real-world scenario. That seems better than passing around a pointer and then checking it for null-ness all over the place. The motivation is to fix what appears to be an unintended restriction on the uaddo transform - hasMultipleConditionRegisters() shouldn't be reason to limit the transform. llvm-svn: 352988
* [opaque pointer types] Pass function types to CallInst creation.James Y Knight2019-02-011-1/+1
| | | | | | | | | This cleans up all CallInst creation in LLVM to explicitly pass a function type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57170 llvm-svn: 352909
* Lower widenable_conditions in CGPPhilip Reames2019-01-311-0/+14
| | | | | | | | This ensures that if we make it to the backend w/o lowering widenable_conditions first, that we generate correct code. Doing it in CGP - instead of isel - let's us fold control flow before hitting block local instruction selection. Differential Revision: https://reviews.llvm.org/D57473 llvm-svn: 352779
* Revert "Reapply "[CGP] Check for existing inttotpr before creating new one""David L. Jones2019-01-311-18/+4
| | | | | | | | This change reverts r351626. The changes in r351626 cause quadratic work in several cases. (See r351626 thread on llvm-commits for details.) llvm-svn: 352722
* Add a 'dynamic' parameter to the objectsize intrinsicErik Pilkington2019-01-301-1/+1
| | | | | | | | | | | | | | This is meant to be used with clang's __builtin_dynamic_object_size. When 'true' is passed to this parameter, the intrinsic has the potential to be folded into instructions that will be evaluated at run time. When 'false', the objectsize intrinsic behaviour is unchanged. rdar://32212419 Differential revision: https://reviews.llvm.org/D56761 llvm-svn: 352664
* [CodeGenPrepare] Handle all debug calls in dupRetToEnableTailCallOpts()Jonas Paulsson2019-01-291-4/+2
| | | | | | | | | | | | This patch makes sure that a debug value that is after the bitcast in dupRetToEnableTailCallOpts() is also skipped. The reduced test case is from SPEC-2006 on SystemZ. Review: Vedant Kumar, Wolfgang Pieb https://reviews.llvm.org/D57050 llvm-svn: 352462
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* Reapply "[CGP] Check for existing inttotpr before creating new one"Roman Tereshin2019-01-191-4/+18
| | | | | | Original commit: r351582 llvm-svn: 351626
* Revert "Reapply "[CGP] Check for existing inttotpr before creating new one""Roman Tereshin2019-01-191-17/+4
| | | | | | | | | | This reverts commit r351618. Compiler RT + ASAN tests are failing for PowerPC. Not sure how would I reproduce these on macOS, so reverting (again) until I do. llvm-svn: 351619
* Reapply "[CGP] Check for existing inttotpr before creating new one"Roman Tereshin2019-01-191-4/+17
| | | | | | Original commit: r351582 llvm-svn: 351618
* Revert "[CGP] Check for existing inttotpr before creating new one"Roman Tereshin2019-01-181-13/+4
| | | | | | | | This reverts commit r351582. Bots are failing. Reverting this to fix and re-commit later. llvm-svn: 351598
* [CGP] Check for existing inttotpr before creating new oneRoman Tereshin2019-01-181-4/+13
| | | | | | | | | | | | | | | | Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of the same integer value while sinking address computations, but rather CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking that related pointers have nothing to do with each other. This problem blocks LoadStoreVectorizer from vectorizing some of the loads / stores in a downstream target. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D56838 llvm-svn: 351582
* [CodeGenPrepare] Fix bad IR created by large offset GEP splitting.Eli Friedman2018-12-191-3/+4
| | | | | | | | | | | | | | Creating the IR builder, then modifying the CFG, leads to an IRBuilder where the BB and insertion point are inconsistent, so new instructions have the wrong parent. Modified an existing test because the test wasn't covering anything useful (the "invoke" was not actually an invoke by the time we hit the code in question). Differential Revision: https://reviews.llvm.org/D55729 llvm-svn: 349693
* [Debuginfo] Prevent CodeGenPrepare from dropping debuginfo references.Wolfgang Pieb2018-12-111-0/+15
| | | | | | | | | | | | | | This fixes PR39845. CodeGenPrepare employs a transactional model when performing optimizations, i.e. it changes the IR to attempt an optimization and rolls back the change when it finds the change inadequate. It is during the rollback that references to locals were dropped from debug value intrinsics. This patch reinstates debuginfo references during rollbacks. Reviewers: aprantl, vsk Differential Revision: https://reviews.llvm.org/D55396 llvm-svn: 348896
* [CGP] Improve compile time for complex addressing modeSerguei Katkov2018-11-291-106/+58
| | | | | | | | | | | | This is a fix for PR39625 with improvement the compile time by reducing the number of intermediate Phi nodes created. Reviewers: john.brawn, reames Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54932 llvm-svn: 347839
OpenPOWER on IntegriCloud