summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/CodeGenPrepare.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [CGP] Limit distance between overflow math and cmpSam Parker2019-03-111-0/+11
| | | | | | | | | | | | | | | Inserting an overflowing arithmetic intrinsic can increase register pressure by producing two values at a point where only one is needed, while the second use maybe several blocks away. This increase in pressure is likely to be more detrimental on performance than rematerialising one of the original instructions. So, check that the arithmetic and compare instructions are no further apart than their immediate successor/predecessor. Differential Revision: https://reviews.llvm.org/D59024 llvm-svn: 355823
* [CGP] fix comments; NFCSanjay Patel2019-03-101-2/+2
| | | | llvm-svn: 355791
* [CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInstRong Xu2019-03-081-16/+10
| | | | | | | | | | | | | | | r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be maintained correctly in optimizations in optimizeBlock() and optimizeInst(). Function optimizeSelectInst() does not update the flag. This patch propagates the flag in optimizeSelectInst() back to optimizeBlock(). This patch also removes ModifiedDT in CodeGenPrepare class (which is not used). The property of ModifiedDT is now recorded in a ref parameter. Differential Revision: https://reviews.llvm.org/D59139 llvm-svn: 355751
* [CGP] Avoid repeatedly building DominatorTree causing long compile-time (NFC)Teresa Johnson2019-03-061-21/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In r354298 a DominatorTree construction was added via new function combineToUSubWithOverflow, which was subsequently restructured into replaceMathCmpWithIntrinsic in r354689. We are hitting a very long compile time due to this repeated construction, once per math cmp in the function. We shouldn't need to build the DominatorTree more than once per function, except when a transformation invalidates it. There is already a boolean flag that is returned from these methods indicating whether the DT has been modified. We can simply build the DT once per Function walk in CodeGenPrepare::runOnFunction, since any time a change is made we break out of the Function walk and restart it. I modified the code so that both replaceMathCmpWithIntrinsic as well as mergeSExts (which was also building a DT) use the DT constructed by the run method. From -mllvm -time-passes: Before this patch: CodeGen Prepare user time is 328s With this patch: CodeGen Prepare user time is 21s Reviewers: spatel Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58995 llvm-svn: 355512
* [CodeGenPrepare] avoid crashing on non-canonical/degenerate codeSanjay Patel2019-03-041-0/+5
| | | | | | | | | | | | The test is reduced from an example in the post-commit thread for: rL354746 http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html While we must avoid dying here, the real question should be: Why is non-canonical and/or degenerate code making it to CGP when using the new pass manager? llvm-svn: 355345
* [CGP] add special-cases to form unsigned add with overflow (PR40486)Sanjay Patel2019-02-241-8/+27
| | | | | | | | | | | | | | | | | | | | There's likely a missed IR canonicalization for at least 1 of these patterns. Otherwise, we wouldn't have needed the pattern-matching enhancement in D57516. Note that -- unlike usubo added with D57789 -- the TLI hook for this transform defaults to 'on'. So if there's any perf fallout from this, targets should look at how they're lowering the uaddo node in SDAG and/or override that hook. The x86 diffs suggest that there's some missing pattern-matching for forming inc/dec. This should fix the remaining known problems in: https://bugs.llvm.org/show_bug.cgi?id=40486 https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354746
* [CGP] move overflow intrinsic insertion to common location; NFCISanjay Patel2019-02-221-17/+28
| | | | | | | | | | | We need to enhance the uaddo matching to handle special-cases as seen in PR40486 and PR31754. That means we won't necessarily have a def-use pattern, so we'll need to check dominance to determine where to place the intrinsic (as we already do for usubo). This preliminary patch is just rearranging the code, so the planned follow-up to improve uaddo will be more clear. llvm-svn: 354689
* [CGP] match a special-case of unsigned subtract overflowSanjay Patel2019-02-201-0/+5
| | | | | | | This is the 'sub0' (negate) pattern from PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354519
* [CGP] form usub with overflow from sub+icmpSanjay Patel2019-02-181-13/+78
| | | | | | | | | | | | | | | | | The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487: https://bugs.llvm.org/show_bug.cgi?id=31754 https://bugs.llvm.org/show_bug.cgi?id=40487 ..and those are shown in the IR test file and x86 codegen file. Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use. This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo formation by default. Only x86 overrides that setting for now although other targets will likely benefit by forming usbuo too. Differential Revision: https://reviews.llvm.org/D57789 llvm-svn: 354298
* Implementation of asm-goto support in LLVMCraig Topper2019-02-081-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 llvm-svn: 353563
* [CGP] Add support for sinking operands to their users, if they are free.Florian Hahn2019-02-051-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves code generation for some AArch64 ACLE intrinsics. It adds support to CGP to duplicate and sink operands to their user, if they can be folded into a target instruction, like zexts and sub into usubl. It adds a TargetLowering hook shouldSinkOperands, which looks at the operands of instructions to see if sinking is profitable. I decided to add a new target hook, as for the sinking to be profitable, at least on AArch64, we have to look at multiple operands of an instruction, instead of looking at the users of a zext for example. The sinking is done in CGP, because it works around an instruction selection limitation. If instruction selection is not limited to a single basic block, this patch should not be needed any longer. Alternatively this could be done in the LoopSink pass, which tries to undo LICM for instructions in blocks that are not executed frequently. Note that we do not force the operands to sink to have a single user, because we duplicate them before sinking. Therefore this is only desirable if they really can be done for free. Additionally we could consider the impact on live ranges later on. This should fix https://bugs.llvm.org/show_bug.cgi?id=40025. As for performance, we have internal code that uses intrinsics and can be speed up by 10% by this change. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D57377 llvm-svn: 353152
* [CGP] use IRBuilder to simplify codeSanjay Patel2019-02-041-26/+25
| | | | | | | | | | | | | | | | This is no-functional-change-intended although there could be intermediate variations caused by a difference in the debug info produced by setting that from the builder's insertion point. I'm updating the IR test file associated with this code just to show that the naming differences from using the builder are visible. The motivation for adding a helper function is that we are likely to extend this code to deal with other overflow ops. llvm-svn: 353056
* [CGP] adjust target constraints for forming uaddoSanjay Patel2019-02-031-8/+11
| | | | | | | | | | | | | | | | | | | There are 2 changes visible here: 1. There's no reason to limit this transform based on number of condition registers. That diff allows PPC to produce slightly better (dot-instructions should be generally good) code. Note: someone that cares about PPC codegen might want to look closer at that output because it seems like we could still improve this. 2. We (probably?) should not bother trying to form uaddo (or other overflow ops) when there's no target support for such an op. This goes beyond checking whether the op is expanded because both PPC and AArch64 show better codegen for standard types regardless of whether the op is legal/custom. llvm-svn: 353001
* [CGP] refactor optimizeCmpExpression (NFCI)Sanjay Patel2019-02-031-36/+38
| | | | | | | | | | | | | | | | This is not truly NFC because we are bailing out without a TLI now. That should not be a real concern though because there should be a TLI in any real-world scenario. That seems better than passing around a pointer and then checking it for null-ness all over the place. The motivation is to fix what appears to be an unintended restriction on the uaddo transform - hasMultipleConditionRegisters() shouldn't be reason to limit the transform. llvm-svn: 352988
* [opaque pointer types] Pass function types to CallInst creation.James Y Knight2019-02-011-1/+1
| | | | | | | | | This cleans up all CallInst creation in LLVM to explicitly pass a function type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57170 llvm-svn: 352909
* Lower widenable_conditions in CGPPhilip Reames2019-01-311-0/+14
| | | | | | | | This ensures that if we make it to the backend w/o lowering widenable_conditions first, that we generate correct code. Doing it in CGP - instead of isel - let's us fold control flow before hitting block local instruction selection. Differential Revision: https://reviews.llvm.org/D57473 llvm-svn: 352779
* Revert "Reapply "[CGP] Check for existing inttotpr before creating new one""David L. Jones2019-01-311-18/+4
| | | | | | | | This change reverts r351626. The changes in r351626 cause quadratic work in several cases. (See r351626 thread on llvm-commits for details.) llvm-svn: 352722
* Add a 'dynamic' parameter to the objectsize intrinsicErik Pilkington2019-01-301-1/+1
| | | | | | | | | | | | | | This is meant to be used with clang's __builtin_dynamic_object_size. When 'true' is passed to this parameter, the intrinsic has the potential to be folded into instructions that will be evaluated at run time. When 'false', the objectsize intrinsic behaviour is unchanged. rdar://32212419 Differential revision: https://reviews.llvm.org/D56761 llvm-svn: 352664
* [CodeGenPrepare] Handle all debug calls in dupRetToEnableTailCallOpts()Jonas Paulsson2019-01-291-4/+2
| | | | | | | | | | | | This patch makes sure that a debug value that is after the bitcast in dupRetToEnableTailCallOpts() is also skipped. The reduced test case is from SPEC-2006 on SystemZ. Review: Vedant Kumar, Wolfgang Pieb https://reviews.llvm.org/D57050 llvm-svn: 352462
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* Reapply "[CGP] Check for existing inttotpr before creating new one"Roman Tereshin2019-01-191-4/+18
| | | | | | Original commit: r351582 llvm-svn: 351626
* Revert "Reapply "[CGP] Check for existing inttotpr before creating new one""Roman Tereshin2019-01-191-17/+4
| | | | | | | | | | This reverts commit r351618. Compiler RT + ASAN tests are failing for PowerPC. Not sure how would I reproduce these on macOS, so reverting (again) until I do. llvm-svn: 351619
* Reapply "[CGP] Check for existing inttotpr before creating new one"Roman Tereshin2019-01-191-4/+17
| | | | | | Original commit: r351582 llvm-svn: 351618
* Revert "[CGP] Check for existing inttotpr before creating new one"Roman Tereshin2019-01-181-13/+4
| | | | | | | | This reverts commit r351582. Bots are failing. Reverting this to fix and re-commit later. llvm-svn: 351598
* [CGP] Check for existing inttotpr before creating new oneRoman Tereshin2019-01-181-4/+13
| | | | | | | | | | | | | | | | Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of the same integer value while sinking address computations, but rather CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking that related pointers have nothing to do with each other. This problem blocks LoadStoreVectorizer from vectorizing some of the loads / stores in a downstream target. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D56838 llvm-svn: 351582
* [CodeGenPrepare] Fix bad IR created by large offset GEP splitting.Eli Friedman2018-12-191-3/+4
| | | | | | | | | | | | | | Creating the IR builder, then modifying the CFG, leads to an IRBuilder where the BB and insertion point are inconsistent, so new instructions have the wrong parent. Modified an existing test because the test wasn't covering anything useful (the "invoke" was not actually an invoke by the time we hit the code in question). Differential Revision: https://reviews.llvm.org/D55729 llvm-svn: 349693
* [Debuginfo] Prevent CodeGenPrepare from dropping debuginfo references.Wolfgang Pieb2018-12-111-0/+15
| | | | | | | | | | | | | | This fixes PR39845. CodeGenPrepare employs a transactional model when performing optimizations, i.e. it changes the IR to attempt an optimization and rolls back the change when it finds the change inadequate. It is during the rollback that references to locals were dropped from debug value intrinsics. This patch reinstates debuginfo references during rollbacks. Reviewers: aprantl, vsk Differential Revision: https://reviews.llvm.org/D55396 llvm-svn: 348896
* [CGP] Improve compile time for complex addressing modeSerguei Katkov2018-11-291-106/+58
| | | | | | | | | | | | This is a fix for PR39625 with improvement the compile time by reducing the number of intermediate Phi nodes created. Reviewers: john.brawn, reames Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54932 llvm-svn: 347839
* Fix disturbing warning - NFCISerge Guelton2018-11-191-1/+1
| | | | llvm-svn: 347186
* [ProfileSummary] Standardize methods and fix commentVedant Kumar2018-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | Every Analysis pass has a get method that returns a reference of the Result of the Analysis, for example, BlockFrequencyInfo &BlockFrequencyInfoWrapperPass::getBFI(). I believe that ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning a pointer. Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock, respectively. Most methods use BB as the argument of variable names while methods usually refer to Basic Blocks as Blocks, instead of BB. For example, Function::getEntryBlock, Loop:getExitBlock, etc. I also fixed one of the comments. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D54669 llvm-svn: 347182
* Use a data structure better suited for large sets in SimplificationTracker.Ali Tamur2018-11-121-11/+156
| | | | | | | | | | | | | | | | | Summary: D44571 changed SimplificationTracker to use SmallSetVector to keep phi nodes. As a result, when the number of phi nodes is large, the build time performance suffers badly. When building for power pc, we have a case where there are more than 600.000 nodes, and it takes too long to compile. In this change, I partially revert D44571 to use SmallPtrSet, which does an acceptable job with any number of elements. In the original patch, having a deterministic iteration order was mentioned as a motivation, however I think it only applies to the nodes already matched in MatchPhiSet method, which I did not touch. Reviewers: bjope, skatkov Reviewed By: bjope, skatkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54007 llvm-svn: 346710
* Add support for llvm.is.constant intrinsic (PR4898)James Y Knight2018-11-071-14/+29
| | | | | | | | | | | | | | | This adds the llvm-side support for post-inlining evaluation of the __builtin_constant_p GCC intrinsic. Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call to a function where canConstantFoldTo returns true, and one of the arguments is a struct. Updated from patch initially by Janusz Sobczak. Differential Revision: https://reviews.llvm.org/D4276 llvm-svn: 346322
* CGP: Clear data structures at the end of a loop iteration instead of the ↵Peter Collingbourne2018-10-231-5/+5
| | | | | | | | | | | | beginning. Clearing LargeOffsetGEPMap at the end fixes a bug where if a large offset GEP is in a dead basic block, we fail an assertion when trying to delete the block due to the asserting VH in LargeOffsetGEPMap. Differential Revision: https://reviews.llvm.org/D53464 llvm-svn: 345082
* Fix a use-after-RAUW bug in large GEP splittingKrzysztof Pszeniczny2018-10-191-3/+14
| | | | | | | | | | | | | | | | | | | Summary: Large GEP splitting, introduced in rL332015, uses a `DenseMap<AssertingVH<Value>, ...>`. This causes an assertion to fail (in debug builds) or undefined behaviour to occur (in release builds) when a value is RAUWed. This manifested itself in the 7zip benchmark from the llvm test suite built on ARM with `-fstrict-vtable-pointers` enabled while RAUWing invariant group launders and splits in CodeGenPrepare. This patch merges the large offsets of the argument and the result of an invariant.group strip/launder intrinsic before RAUWing. Reviewers: Prazek, javed.absar, haicheng, efriedma Reviewed By: Prazek, efriedma Subscribers: kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D51936 llvm-svn: 344802
* llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)Fangrui Song2018-09-271-2/+1
| | | | | | | | | | | | Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163
* [CodeGen] Enable tail calls for functions with NonNull attributes.David Green2018-09-261-9/+0
| | | | | | | | | | | Adding NonNull as attributes to returned pointers has the unfortunate side effect of disabling tail calls. This patch ignores the NonNull attribute when we decide whether to tail merge, in the same way that we ignore the NoAlias attribute, as it has no affect on the call sequence. Differential Revision: https://reviews.llvm.org/D52238 llvm-svn: 343091
* [CodeGenPrepare] Preserve debug locs in OptimizeExtractBitsVedant Kumar2018-09-151-1/+6
| | | | | | | | | CodeGenPrepare has a transform that sinks {lshr, trunc} pairs to make it easier for the backend to emit fancy extract-bits instructions (e.g UBFX). Teach it to preserve debug locations and salvage debug values. llvm-svn: 342319
* [CGP] Ensure splitgep gives deterministic outputDavid Green2018-09-121-1/+1
| | | | | | | | | | | | | | The output of splitLargeGEPOffsets does not appear to be deterministic because of the way that we iterate over a DenseMap. I've changed it to a MapVector for consistent output. The test here isn't particularly great, only showing a consmetic difference in output. The original reproducer is much larger but show a diffierence in instruction ordering, leading to different codegen. Differential Revision: https://reviews.llvm.org/D51851 llvm-svn: 342043
* Revert "[CodeGenPrepare] Scan past debug intrinsics to find select ↵David Blaikie2018-08-281-4/+3
| | | | | | | | | | | | | | | candidates (NFC)" This causes crashes due to the interleaved dbg.value intrinsics being left at the end of basic blocks, causing the actual terminators (br, etc) to be not where they should be (not at the end of the block), leading to later crashes. Further discussion on the original commit thread. This reverts commit r340368. llvm-svn: 340794
* [CodeGenPrepare] Set debug locs when folding a comparison into a ↵Vedant Kumar2018-08-221-0/+4
| | | | | | | | | uadd.with.overflow CGP can replace a branch + select with a uadd.with.overflow. Teach it to set debug locations as it does this. llvm-svn: 340432
* [CodeGenPrepare] Set debug loc when widening a switch conditionVedant Kumar2018-08-221-0/+1
| | | | | | | Set a debug location on the cast instruction used to widen a switch condition. llvm-svn: 340379
* [CodeGenPrepare] Set debug locations when splitting selectsVedant Kumar2018-08-221-1/+5
| | | | | | | When splitting a select into a diamond, set debug locations on newly-created branch instructions and phi nodes. llvm-svn: 340371
* [CodeGenPrepare] Clean up dbg.value use-before-def as late as possibleVedant Kumar2018-08-211-5/+4
| | | | | | | | | | | CodeGenPrepare has a strategy for moving dbg.values so that a value's definition always dominates its debug users. This cleanup was happening too early (before certain CGP transforms were run), resulting in some dbg.value use-before-def errors. Perform this cleanup as late as possible to avoid use-before-def. llvm-svn: 340370
* [CodeGenPrepare] Scan past debug intrinsics to find select candidates (NFC)Vedant Kumar2018-08-211-3/+4
| | | | | | | | | | In optimizeSelectInst, when scanning for candidate selects to rewrite into branches, scan past debug intrinsics. This makes the debug-enabled and non-debug paths through optimizeSelectInst more congruent. NFC because every select is eventually visited either way. llvm-svn: 340368
* [CodeGenPrepare] Exit earlier when optimizing selects (NFC)Vedant Kumar2018-08-211-2/+5
| | | | | | | When optimizing for size, this allows optimizeSelectInst to skip a linear scan and exit early. llvm-svn: 340367
* [CodeGenPrepare] Add BothExtension type to PromotedInstsGuozhi Wei2018-08-151-7/+49
| | | | | | | | | | | | This patch fixes PR38125. Instruction extension types are recorded in PromotedInsts, it can be used later in function canGetThrough. If an instruction has two users with different extension types, it will be inserted into PromotedInsts two times in function promoteOperandForOther. The second one overwrites the first one, and the final extension type is wrong, later causes problem in canGetThrough. This patch changes the simple bool extension type to 2-bit enum type, add a BothExtension type in addition to zero/sign extension. When an user sees BothExtension for an instruction, it actually knows nothing about how that instruction is extended. Differential Revision: https://reviews.llvm.org/D49512 llvm-svn: 339822
* [CGP] Fix GEP issue with out of range APInt constant values not fitting in ↵Simon Pilgrim2018-08-131-2/+7
| | | | | | | | int64_t Test case reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7173 llvm-svn: 339556
* Fix inconsistency with/without debug information (-g)Jonas Devlieghere2018-08-071-1/+1
| | | | | | | | | | | | | This fixes an inconsistency in code generation when compiling with or without debug information (-g). When debug information is available in an empty block, the original test would fail, resulting in possibly different code. Patch by: Jeroen Dobbelaere Differential revision: https://reviews.llvm.org/D49467 llvm-svn: 339129
* [CodeGen] Fix inconsistent declaration parameter nameFangrui Song2018-07-161-7/+7
| | | | llvm-svn: 337200
* Use Type::isIntOrPtrTy where possible, NFCVedant Kumar2018-07-061-5/+3
| | | | | | | | | | | It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() || T.isPointerTy()`. I used Python's re.sub with this regex to update users: r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)' llvm-svn: 336462
OpenPOWER on IntegriCloud