path: root/llvm/lib/CodeGen/CodeGenPrepare.cpp
Commit message | Author | Age | Files | Lines
...
* Redo store splitting in CodeGenPrepare. | Wei Mi | 2016-12-22 | 1 | -0/+117
  This is a follow-up patch to https://reviews.llvm.org/D22840 that addresses the case where a value to be merged into an int64 pair is in a different BB. Redo the store splitting in CodeGenPrepare so that we can match the pattern across multiple BBs and move some instructions into the same BB. We still keep the code in DAG combine so that we can catch cases that show up after DAG combining runs.
  Differential Revision: https://reviews.llvm.org/D25914
  llvm-svn: 290365
* [Analysis] Centralize objectsize lowering logic. | George Burgess IV | 2016-12-20 | 1 | -12/+2
  We're currently doing nearly the same thing for @llvm.objectsize in three different places: two of them are missing checks for overflow, and one of them could subtly break if InstCombine gets much smarter about removing alloc sites. Seems like a good idea to not do that.
  llvm-svn: 290214
* [CodeGenPrep] Skip merging empty case blocks | Jun Bum Lim | 2016-12-16 | 1 | -31/+137
  This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly:
  Summary: Merging an empty case block into the header block of a switch could cause ISel to add COPY instructions in the header of the switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting.
  Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
  Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D22696
  llvm-svn: 289988
* Inline stripInvariantGroupMetadata out of existence | Sanjoy Das | 2016-12-16 | 1 | -7/+2
  As a one-liner function, I don't think it is pulling its weight in terms of helping readability.
  llvm-svn: 289987
* Fix CodeGenPrepare::stripInvariantGroupMetadata | Sanjoy Das | 2016-12-16 | 1 | -2/+1
  `dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs. The only reason it worked at all is that `getMetadataID` returns something unrelated -- it returns the subclass ID of the receiver (which is used in `dyn_cast` etc.). That does not numerically match `LLVMContext::MD_invariant_group` and ends up dropping `invariant_group` along with every other metadata that does not numerically match `LLVMContext::MD_invariant_group`.
  llvm-svn: 289973
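The failure mode described above can be illustrated with a small standalone model. This is only a sketch, not LLVM code: `ToyInst`, `ToyKind`, `dropUnknownNonDebug`, and `makeToyInst` are hypothetical names, and the numeric kind IDs are made up. It shows how a "keep only the known IDs" call, when handed an unrelated number, removes every non-debug kind rather than just the intended one.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical metadata kind IDs; the numeric values are made up for this sketch.
enum ToyKind : unsigned { TK_dbg = 0, TK_prof = 2, TK_invariant_group = 16 };

// Toy instruction: maps a metadata kind ID to an opaque payload.
struct ToyInst {
  std::map<unsigned, std::string> Metadata;

  // Models "drop unknown non-debug metadata": every kind that is neither
  // debug info nor listed in KnownIDs is removed.
  void dropUnknownNonDebug(const std::vector<unsigned> &KnownIDs) {
    for (auto It = Metadata.begin(); It != Metadata.end();) {
      bool Known = std::find(KnownIDs.begin(), KnownIDs.end(), It->first) !=
                   KnownIDs.end();
      if (It->first == TK_dbg || Known)
        ++It;
      else
        It = Metadata.erase(It);
    }
  }
};

// Builds an instruction carrying all three toy metadata kinds.
ToyInst makeToyInst() {
  ToyInst I;
  I.Metadata[TK_dbg] = "loc";
  I.Metadata[TK_prof] = "weights";
  I.Metadata[TK_invariant_group] = "group";
  assert(I.Metadata.size() == 3);
  return I;
}
```

Calling `dropUnknownNonDebug({7u})` on the result of `makeToyInst()` leaves only the debug entry, because 7 matches no attached kind. Erasing the single key `TK_invariant_group` from the map is one way to remove just that kind while keeping the profile data; whether the actual patch took that route is not stated in the commit message.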
* Revert "[CodeGenPrep] Skip merging empty case blocks" | Jun Bum Lim | 2016-12-16 | 1 | -137/+31
  This reverts commit r289951.
  llvm-svn: 289960
* [CodeGenPrep] Skip merging empty case blocks | Jun Bum Lim | 2016-12-16 | 1 | -31/+137
  This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty block:
  Summary: Merging an empty case block into the header block of a switch could cause ISel to add COPY instructions in the header of the switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting.
  Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
  Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D22696
  llvm-svn: 289951
* IR: Change the gep_type_iterator API to avoid always exposing the "current" type. | Peter Collingbourne | 2016-12-02 | 1 | -1/+1
  Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType.
  Differential Revision: https://reviews.llvm.org/D26594
  llvm-svn: 288458
* Revert r287553: [CodeGenPrep] Skip merging empty case blocks | Joerg Sonnenberger | 2016-11-28 | 1 | -135/+32
  It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line 670 ("Expected irreducible CFG").
  llvm-svn: 288052
* [CodeGenPrepare] Don't sink non-cheap addrspacecasts. | Justin Lebar | 2016-11-21 | 1 | -0/+8
  Summary: Previously, CGP would unconditionally sink addrspacecast instructions, even going so far as to sink them into a loop. Now we check that the cast is "cheap", as defined by TLI. We introduce a new "is-cheap" function to TLI rather than using isNopAddrSpaceCast because some GPU platforms want the ability to ask for non-nop casts to be sunk.
  Reviewers: arsenm, tra
  Subscribers: jholewinski, wdng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26923
  llvm-svn: 287591
* [CodeGenPrepare] Rewrite a loop in terms of llvm::none_of. NFC. | Justin Lebar | 2016-11-21 | 1 | -11/+3
  Reviewers: arsenm
  Subscribers: wdng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26924
  llvm-svn: 287590
* [CodeGenPrep] Skip merging empty case blocks | Jun Bum Lim | 2016-11-21 | 1 | -32/+135
  Summary: Merging an empty case block into the header block of a switch could cause ISel to add COPY instructions in the header of the switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting.
  Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl
  Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D22696
  llvm-svn: 287553
* Fix comment typos. NFC. | Simon Pilgrim | 2016-11-20 | 1 | -1/+1
  Identified by Pedro Giffuni in PR27636.
  llvm-svn: 287490
* Hoist check for TLI above all of the attempts to use it (including one | Chandler Carruth | 2016-11-04 | 1 | -2/+6
  that is hidden inside a separate function call) and helpfully before building expensive transaction infrastructure. This will avoid crashing when running CGP in a generic mode if we ever managed to hit this case.
  Note that I spent some time looking at alternatives. CGP is actually used without a TM or TLI in order to do some target-independent testing. Further, all of the neighboring optimization techniques actually have some paths that are effective even in the absence of TLI so this seemed the correct scope at which to check and bypass logic.
  It still isn't clear that long-term support for missing TM/TLI is the right cost/benefit tradeoff for CGP -- we seem to get relatively little for it and the code is just littered with checks (and assumptions which I suspect are still missing some checks).
  This at least fixes the potential bug in this code spotted by PVS-Studio, so we've got that going for us. ;]
  llvm-svn: 285987
* Use profile info to set function section prefix to group hot/cold functions. | Dehao Chen | 2016-10-18 | 1 | -2/+20
  Summary: The original implementation is in r261607, which was reverted in r269726 to accommodate the ProfileSummaryInfo analysis pass. The new implementation:
  1. add a new metadata for function section prefix
  2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function
  3. output the section prefix set by CGP
  Reviewers: davidxl, eraman
  Subscribers: vsk, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24989
  llvm-svn: 284533
* [CodeGenPrepare] When moving a zext near to its associated load, do not retain the original debug location. | Andrea Di Biagio | 2016-10-17 | 1 | -0/+8
  CodeGenPrepare knows how to move a zext of a load into the same basic block where the load lives. The goal is to help ISel match a zero-extending load instead of two separate instructions.
  CGP attempts to move a zext computation even if it lives in a basic block that does not post-dominate the load's basic block. That means the hoisted zext may be speculated. Preserving the zext location would hurt the debugging experience and the quality of sample PGO.
  With this patch, when moving a zext near to its associated load, CGP no longer propagates the zext's debug location. Instead, CGP conservatively reuses the same debug location for the load and the zext.
  An alternative approach would be to assign an artificial line-0 location to the zext. However we don't want to over-use the 'line-0' for this particular case because it would have a size cost in the line-table section for no additional benefit.
  Differential Revision: https://reviews.llvm.org/D25611
  llvm-svn: 284377
* Preserve the debug location when CodeGenPrepare sinks a compare instruction into the basic block of a user. | Wolfgang Pieb | 2016-10-06 | 1 | -0/+2
  Patch by Andrea DiBiagio.
  Differential Revision: https://reviews.llvm.org/D24632
  llvm-svn: 283500
* Use StringRef in Pass/PassManager APIs (NFC) | Mehdi Amini | 2016-10-01 | 1 | -1/+1
  llvm-svn: 283004
* Fix the bug introduced in r281252. | Dehao Chen | 2016-09-12 | 1 | -1/+1
  llvm-svn: 281253
* Lower consecutive select instructions correctly. | Dehao Chen | 2016-09-12 | 1 | -23/+75
  Summary: If consecutive select instructions are lowered separately in CGP, it will introduce redundant condition checks and branches that cannot be removed by later optimization phases. This patch lowers all consecutive select instructions at the same time to avoid inefficient code as demonstrated in https://llvm.org/bugs/show_bug.cgi?id=29095
  Reviewers: davidxl
  Subscribers: vsk, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24147
  llvm-svn: 281252
* [CGP] Be less conservative about tail-duplicating a ret to allow tail calls | Michael Kuperstein | 2016-09-08 | 1 | -10/+6
  CGP tail-duplicates rets into blocks that end with a call that feeds the ret. This puts the call in tail position, potentially allowing the DAG builder to lower it as a tail call. To avoid tail duplication in cases where we won't form the tail call, CGP tried to predict whether this is going to be possible, and avoids doing it when lowering as a tail call will definitely fail. However, it was being too conservative by always throwing away calls to functions with a signext/zeroext attribute on the return type.
  Instead, we can use the same logic the builder uses to determine whether the attributes work out.
  Differential Revision: https://reviews.llvm.org/D24315
  llvm-svn: 280894
* Don't reuse a variable name in a nested scope. NFC. | Michael Kuperstein | 2016-09-07 | 1 | -6/+6
  llvm-svn: 280853
* [Profile] preserve branch metadata lowering select in CGP | Xinliang David Li | 2016-09-03 | 1 | -3/+8
  CGP currently drops select's MD_prof profile data when generating a conditional branch, which can lead to bad code layout. The patch fixes the issue.
  Differential Revision: http://reviews.llvm.org/D24169
  llvm-svn: 280600
* Use the range variant of find instead of unpacking begin/end | David Majnemer | 2016-08-11 | 1 | -2/+1
  If the result of the find is only used to compare against end(), just use is_contained instead.
  No functionality change is intended.
  llvm-svn: 278433
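The refactoring described above can be sketched in standalone C++. This is an illustration under assumptions, not the patch itself: `isContainedSketch` is a hypothetical helper written here in the spirit of `is_contained`, and the two `hasBlock*` functions are made-up call sites showing the before and after shapes.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical range helper: true if Value occurs anywhere in R.
template <typename Range, typename T>
bool isContainedSketch(const Range &R, const T &Value) {
  return std::find(std::begin(R), std::end(R), Value) != std::end(R);
}

// Before: the iterator from find is only ever compared against end().
bool hasBlockBefore(const std::vector<int> &Visited, int Id) {
  return std::find(Visited.begin(), Visited.end(), Id) != Visited.end();
}

// After: the membership test states the intent directly.
bool hasBlockAfter(const std::vector<int> &Visited, int Id) {
  return isContainedSketch(Visited, Id);
}

// Both forms must agree for any input, which is what "no functionality
// change" means for this kind of rewrite.
void checkSameAnswer(const std::vector<int> &Visited, int Id) {
  assert(hasBlockBefore(Visited, Id) == hasBlockAfter(Visited, Id));
}
```

The helper only pays off when the iterator itself is not needed afterwards; a call site that goes on to dereference or erase the found element still needs the explicit find.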
* CodeGenPrep: use correct function to determine Global's alignment. | Tim Northover | 2016-07-18 | 1 | -1/+1
  Elsewhere (particularly computeKnownBits) we assume that a global will be aligned to the value returned by Value::getPointerAlignment. This is used to boost the alignment on memcpy/memset, so any target-specific request can only increase that value.
  llvm-svn: 275866
* Clarify that we match BSwap in InstCombine and BitReverse in CGP. NFC. | Chad Rosier | 2016-05-25 | 1 | -1/+1
  Also, rename recognizeBitReverseOrBSwapIdiom to recognizeBSwapOrBitReverseIdiom, so the ordering of the MatchBSwaps and MatchBitReversals arguments is consistent with the function name.
  llvm-svn: 270715
* Rename getLargestLegalIntTypeSize to getLargestLegalIntTypeSizeInBits(). NFC. | Jun Bum Lim | 2016-05-13 | 1 | -1/+1
  Summary: Rename DataLayout::getLargestLegalIntTypeSize to DataLayout::getLargestLegalIntTypeSizeInBits() to prevent mistakes like the one fixed in r269433.
  Reviewers: joker.eph, mcrosier
  Subscribers: mcrosier, llvm-commits
  Differential Revision: http://reviews.llvm.org/D20248
  llvm-svn: 269456
* [CGP] avoid crashing from weightlessness | Sanjay Patel | 2016-05-09 | 1 | -3/+5
  It's possible that we have branch weights with 0 values. In that case, don't try to create an impossible BranchProbability.
  llvm-svn: 268935
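The guard described above can be sketched as follows. This is a standalone model rather than the LLVM code: `ToyProbability` and `computeTakenProbability` are hypothetical names, and the numerator/denominator representation is an assumption made for the example. The point is only that a probability with a zero denominator is never constructed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a probability expressed as a fraction.
struct ToyProbability {
  uint64_t Numerator;
  uint64_t Denominator;
};

// Returns false, leaving Out untouched, when both weights are zero;
// otherwise fills Out with TrueWeight / (TrueWeight + FalseWeight).
bool computeTakenProbability(uint32_t TrueWeight, uint32_t FalseWeight,
                             ToyProbability &Out) {
  // Widen before adding so the sum cannot wrap around.
  uint64_t Sum = uint64_t(TrueWeight) + FalseWeight;
  if (Sum == 0)
    return false; // No usable profile data: skip the transform.
  Out = ToyProbability{TrueWeight, Sum};
  assert(Out.Numerator <= Out.Denominator);
  return true;
}
```

A caller would treat a `false` result the same way as missing profile metadata and fall back to whatever it does without weights.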
* [CodeGenPrepare] Don't sink a cast past its user | David Majnemer | 2016-04-27 | 1 | -0/+5
  The sink cast machinery is supposed to sink casts as close to their user as possible. However, an EH pad is the first instruction in its basic block. Don't sink if the user is an EH pad.
  This fixes PR27536.
  llvm-svn: 267767
* [CodeGenPrepare] use branch weight metadata to decide if a select should be turned into a branch | Sanjay Patel | 2016-04-26 | 1 | -11/+21
  This is part of solving PR27344: https://llvm.org/bugs/show_bug.cgi?id=27344
  CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the IR further with a select in place.
  For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly predictable branch. Even the most limited HW branch predictors will be correct on this branch almost all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all the times the branch was predicted correctly.
  As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's MispredictPenalty. Or we could just let targets override the default by implementing the hook with that and other target-specific options.
  Note that trying to statically determine mispredict rates for close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. I.e., 50/50 taken/not-taken might still be 100% predictable.
  Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable() branch weight default values are 4 and 64. A proposal to change that is in D19435.
  Differential Revision: http://reviews.llvm.org/D19488
  llvm-svn: 267572
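The threshold test described above can be sketched with plain integer arithmetic. This is a hedged model, not the TLI hook: `isHighlyPredictableSketch` and its `ThresholdPercent` parameter are hypothetical, and the strict "greater than 99%" comparison is read off the ">99%" wording in the message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Returns true when one side of the branch is taken in strictly more than
// ThresholdPercent of the profiled executions.
bool isHighlyPredictableSketch(uint32_t TrueWeight, uint32_t FalseWeight,
                               unsigned ThresholdPercent = 99) {
  assert(ThresholdPercent <= 100);
  // Widen before adding or multiplying so nothing wraps around.
  uint64_t Sum = uint64_t(TrueWeight) + FalseWeight;
  if (Sum == 0)
    return false; // No profile information to reason about.
  uint64_t Max = std::max<uint64_t>(TrueWeight, FalseWeight);
  // Compare Max / Sum against ThresholdPercent / 100 without dividing.
  return Max * 100 > Sum * ThresholdPercent;
}
```

With weights of 64 and 4, the dominant side is 64/68, roughly 94%, which stays below the threshold. That matches the closing note in the message that those default weights are not enough for the patch to take effect on its own.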
* [CodeGenPrepare] don't convert an unpredictable select into control flow | Sanjay Patel | 2016-04-26 | 1 | -1/+2
  Suggested in the review of D19488: http://reviews.llvm.org/D19488
  llvm-svn: 267504
* replace duplicated static functions for profile metadata access with BranchInst member function; NFCI | Sanjay Patel | 2016-04-23 | 1 | -25/+2
  llvm-svn: 267295
* Re-commit optimization bisect support (r267022) without new pass manager support. | Andrew Kaylor | 2016-04-22 | 1 | -1/+1
  The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
  Differential Revision: http://reviews.llvm.org/D19172
  llvm-svn: 267231
* Revert "Initial implementation of optimization bisect support." | Vedant Kumar | 2016-04-22 | 1 | -1/+1
  This reverts commit r267022, due to an ASan failure: http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549
  llvm-svn: 267115
* Initial implementation of optimization bisect support. | Andrew Kaylor | 2016-04-21 | 1 | -1/+1
  This patch implements an optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
  The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an additional OptBisect method, but this is not yet used.
  The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
  Differential Revision: http://reviews.llvm.org/D19172
  llvm-svn: 267022
* Calculate __builtin_object_size when pointer depends on a condition | Petar Jovanovic | 2016-04-13 | 1 | -3/+12
  This patch fixes the calculation of __builtin_object_size when it depends on a condition. Before this patch the compiler did not know how to calculate the object size when it found a condition that cannot be eliminated. This patch enables calculating __builtin_object_size even when the condition cannot be eliminated, by choosing the minimum or maximum value as the result from the condition. Whether the minimum or maximum is chosen is based on the second argument of the __builtin_object_size function.
  Patch by Strahinja Petrovic.
  Differential Revision: http://reviews.llvm.org/D18438
  llvm-svn: 266193
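The min/max choice described above can be modelled in a few lines. This is a sketch under assumptions, not the patch: `objectSizeThroughCondition` is a hypothetical function, and the mapping of the second argument (types 0 and 1 ask for a maximum, types 2 and 3 for a minimum) is taken from the documented semantics of the builtin rather than from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Models the size reported for a pointer that is either an object of
// SizeIfTrue bytes or one of SizeIfFalse bytes, depending on a condition
// the compiler cannot resolve. Type mirrors the second argument of
// __builtin_object_size (0..3).
uint64_t objectSizeThroughCondition(uint64_t SizeIfTrue, uint64_t SizeIfFalse,
                                    int Type) {
  assert(Type >= 0 && Type <= 3);
  // Assumption: bit 1 of Type selects a minimum estimate, otherwise maximum.
  bool WantMinimum = (Type & 2) != 0;
  return WantMinimum ? std::min(SizeIfTrue, SizeIfFalse)
                     : std::max(SizeIfTrue, SizeIfFalse);
}
```

For a pointer that may refer to a 16-byte or a 32-byte buffer, the model reports 32 for types 0 and 1 and 16 for types 2 and 3, so either answer stays a valid bound on both paths.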
* use range-loops; NFCI | Sanjay Patel | 2016-04-11 | 1 | -13/+8
  llvm-svn: 265985
* Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge | Chuang-Yu Cheng | 2016-04-05 | 1 | -0/+25
  Presently, CodeGenPrepare deletes all nearly empty (only phi and branch) basic blocks. This pass can delete loop preheaders which frequently creates critical edges. A preheader can be a convenient place to spill registers to the stack. If the entrance to a loop body is a critical edge, then spills may occur in the loop body rather than immediately before it. This patch protects loop preheaders from deletion in CodeGenPrepare even if they are nearly empty.
  Since the patch alters the CFG, it affects a large number of test cases. In most cases, the changes are merely cosmetic (basic blocks have different names or instruction orders change slightly). I am somewhat concerned about the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not deleted, then the MIPS backend does not take advantage of a branch delay slot. Consequently, I would like some close review by a MIPS expert.
  The patch also partially subsumes D16893 from George Burgess IV. George correctly notes that CodeGenPrepare does not actually preserve the dominator tree. I think the dominator tree was usually not valid when CodeGenPrepare ran, but I am using LoopInfo to mark preheaders, so the dominator tree is now always valid before CodeGenPrepare.
  Author: Tom Jablin (tjablin)
  Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng
  http://reviews.llvm.org/D16984
  llvm-svn: 265397
* [CodeGenPrepare] Fix r265264 (again). | Peter Zotov | 2016-04-03 | 1 | -3/+3
  Don't require TLI for SinkCmpExpression, like it wasn't before r265264.
  llvm-svn: 265271
* [CodeGenPrepare] Fix r265264. | Peter Zotov | 2016-04-03 | 1 | -3/+3
  The case where there was no TargetLowering was not handled, leading to null pointer dereferences.
  llvm-svn: 265265
* [CodeGenPrepare] Avoid sinking soft-FP comparisons | Peter Zotov | 2016-04-03 | 1 | -5/+9
  Sinking comparisons in CGP can undo the job of hoisting them done earlier by LICM, and soft-FP makes this an expensive mistake.
  A common pattern that produces floating point comparisons uniform over a loop is an explicit check for division by zero. If the divisor is hoisted out of the loop, the comparison can also be, but hoisting the function that unwinds is never legal, since it may cause side effects in the loop body prior to the unwinding to not be executed.
  Differential Revision: http://reviews.llvm.org/D18744
  llvm-svn: 265264
* Keep CodeGenPrepare from preserving the domtree. | George Burgess IV | 2016-03-22 | 1 | -1/+2
  CGP modifies the domtree in some cases, so saying that it preserves the domtree is a lie. We'll be able to selectively preserve it with the new pass manager.
  Differential Revision: http://reviews.llvm.org/D16893
  llvm-svn: 264099
* Minor code cleanups. NFC. | Junmo Park | 2016-03-11 | 1 | -3/+3
  llvm-svn: 263200
* [CGP] Duplicate addressing computation in cold paths if required to sink addressing mode | Philip Reames | 2016-03-09 | 1 | -8/+45
  This patch teaches CGP to duplicate addressing mode computations into cold paths (detected via explicit cold attribute on calls) if required to let addressing mode be safely sunk into the basic block containing each load and store.
  In general, duplicating code into cold blocks may result in code growth, but should not affect performance. In this case, it's better to duplicate some code than to put extra pressure on the register allocator by making it keep the address through the entirety of the fast path.
  This patch only handles addressing computations, but in principle, we could implement a more general cold scheduling heuristic which tries to reduce register pressure in the fast path by duplicating code into the cold path. Getting the profitability of the general case right seemed likely to be challenging, so I stuck to the existing case (addressing computation) we already had.
  Differential Revision: http://reviews.llvm.org/D17652
  llvm-svn: 263074
* [CodeGenPrepare] Remove load-based heuristic | Junmo Park | 2016-02-25 | 1 | -11/+0
  Summary: Both the hardware and LLVM have changed since 2012. Now, load-based heuristics don't show big differences any more on OoO cores. There are no notable regressions or improvements on spec2000/2006 (Cortex-A57, Core i5).
  Reviewers: spatel, zansari
  Differential Revision: http://reviews.llvm.org/D16836
  llvm-svn: 261809
* ADT: Stop using getNodePtrUnchecked on end() iterators | Duncan P. N. Exon Smith | 2016-02-21 | 1 | -5/+3
  Stop using `getNodePtrUnchecked()` when building IR. Eventually a dereference will be required to get at the downcast node, since the iterator will only store an `ilist_node_base` of some sort.
  This should have no functionality change for now, but is a path towards removing some more UB from ilist.
  llvm-svn: 261495
* CodeGen: Avoid getNodePtrUnchecked() where we need a Value, NFC | Duncan P. N. Exon Smith | 2016-02-21 | 1 | -4/+6
  `ilist_iterator<NodeTy>::getNodePtrUnchecked()` is documented as being for internal use only, but CodeGenPrepare was using it anyway.
  This code relies on pulling out the `Value*` pointer even after the lifetime of the iterator is over. But having this pointer available in ilist_iterator depends on UB in the first place.
  Instead, safely pull out the `Value*` when the iterator is alive and stop using the internal-only API.
  There should be no functionality change here.
  llvm-svn: 261493
* Annotate dump() methods with LLVM_DUMP_METHOD, addressing Richard Smith r259192 post commit comment. | Yaron Keren | 2016-01-29 | 1 | -1/+1
  clang part in r259232, this is the LLVM part of the patch.
  llvm-svn: 259240
* Minor code cleanups. NFC. | Junmo Park | 2016-01-28 | 1 | -1/+1
  llvm-svn: 259033
* function names start with a lowercase letter; NFC | Sanjay Patel | 2016-01-22 | 1 | -8/+8
  llvm-svn: 258552