summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
...
* [SimplifyCFG] Use pointer identity to simplify predicate.Benjamin Kramer2016-02-201-4/+2
| | | | | | No functional change intended. llvm-svn: 261427
* [SimplifyCFG] Merge together cleanuppadsDavid Majnemer2016-02-201-2/+45
| | | | | | | | | | Cleanuppads may be merged together if one is the only predecessor of the other in which case a simple transform can be performed: replace the a cleanupret with a branch and remove an unnecessary cleanuppad. Differential Revision: http://reviews.llvm.org/D17459 llvm-svn: 261390
* Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage ↵Hans Wennborg2016-02-191-106/+31
| | | | | | | | calculator by ignoring specific instructions." It caused PR26509. llvm-svn: 261368
* [LV] Vectorize first-order recurrencesMatthew Simpson2016-02-192-6/+234
| | | | | | | | | | | | | | | | | | This patch enables the vectorization of first-order recurrences. A first-order recurrence is a non-reduction recurrence relation in which the value of the recurrence in the current loop iteration equals a value defined in the previous iteration. The load PRE of the GVN pass often creates these recurrences by hoisting loads from within loops. In this patch, we add a new recurrence kind for first-order phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction. Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org> Differential Revision: http://reviews.llvm.org/D16197 llvm-svn: 261346
* [LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorizationSilviu Baranga2016-02-191-0/+10
| | | | | | | | | | | | | | | | | | | | Summary: If we don't have the first and last access of an interleaved load group, the first and last wide load in the loop can do an out of bounds access. Even though we discard results from speculative loads, this can cause problems, since it can technically generate page faults (or worse). We now discard interleaved load groups that don't have the first and load in the group. Reviewers: hfinkel, rengolin Subscribers: rengolin, llvm-commits, mzolotukhin, anemet Differential Revision: http://reviews.llvm.org/D17332 llvm-svn: 261331
* [LPM] Factor all of the loop analysis usage updates into a common helperChandler Carruth2016-02-1912-171/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | routine. We were getting this wrong in small ways and generally being very inconsistent about it across loop passes. Instead, let's have a common place where we do this. One minor downside is that this will require some analyses like SCEV in more places than they are strictly needed. However, this seems benign as these analyses are complete no-ops, and without this consistency we can in many cases end up with the legacy pass manager scheduling deciding to split up a loop pass pipeline in order to run the function analysis half-way through. It is very, very annoying to fix these without just being very pedantic across the board. The only loop passes I've not updated here are ones that use AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer. They seemed less relevant. With this patch, almost all of the problems in PR24804 around loop pass pipelines are fixed. The one remaining issue is that we run simplify-cfg and instcombine in the middle of the loop pass pipeline. We've recently added some loop variants of these passes that would seem substantially cleaner to use, but this at least gets us much closer to the previous state. Notably, the seven loop pass managers is down to three. I've not updated the loop passes using LoopAccessAnalysis because that analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't clear that those transforms want to support those forms anyways. They all run late anyways, so this is harmless. Similarly, LSR is left alone because it already carefully manages its forms and doesn't need to get fused into a single loop pass manager with a bunch of other loop passes. LoopReroll didn't use loop simplified form previously, and I've updated the test case to match the trivially different output. Finally, I've also factored all the pass initialization for the passes that use this technique as well, so that should be done regularly and reliably. Thanks to James for the help reviewing and thinking about this stuff, and Ben for help thinking about it as well! Differential Revision: http://reviews.llvm.org/D17435 llvm-svn: 261316
* [AA] Preserve the AA results wrapper pass as well as BasicAA in a fewChandler Carruth2016-02-192-0/+6
| | | | | | | | | | | | | more places to prevent gratuitous re-"runs" of these passes. The passes themselves don't do any work when run, but we keep spending time scheduling and running these needlessly when we really don't need to do so. This is the first patch towards fixing the really horrible loop pass pipeline fragmentation pointed out by Sanjoy in PR24804. llvm-svn: 261302
* Bug fix: use dyn_cast_or_null instead of dyn_castLawrence Hu2016-02-191-2/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D17154 llvm-svn: 261299
* Remove uses of builtin comma operator.Richard Trieu2016-02-189-41/+77
| | | | | | Cleanup for upcoming Clang warning -Wcomma. No functionality change intended. llvm-svn: 261270
* [PPCLoopDataPrefetch] Move pass to Transforms/Scalar/LoopDataPrefetch. NFCAdam Nemet2016-02-182-0/+227
| | | | | | | | | | | | | This patch is part of the work to make PPCLoopDataPrefetch target-independent (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758). Obviously the pass still only used from PPC at this point. Subsequent patches will start driving this from ARM64 as well. Due to the previous patch most lines should show up as moved lines. llvm-svn: 261265
* Reapply commit r259357 with a fix for PR26629Matthew Simpson2016-02-181-12/+237
| | | | | | | | | | Commit r259357 was reverted because it caused PR26629. We were assuming all roots of a vectorizable tree could be truncated to the same width, which is not the case in general. This commit reapplies the patch along with a fix and a new test case to ensure we don't regress because of this issue again. This should fix PR26629. llvm-svn: 261212
* [PM] Port the PostOrderFunctionAttrs pass to the new pass manager andChandler Carruth2016-02-183-39/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | convert one test to use this. This is a particularly significant milestone because it required a working per-function AA framework which can be queried over each function from within a CGSCC transform pass (and additionally a module analysis to be accessible). This is essentially *the* point of the entire pass manager rewrite. A CGSCC transform is able to query for multiple different function's analysis results. It works. The whole thing appears to actually work and accomplish the original goal. While we were able to hack function attrs and basic-aa to "work" in the old pass manager, this port doesn't use any of that, it directly leverages the new fundamental functionality. For this to work, the CGSCC framework also has to support SCC-based behavior analysis, etc. The only part of the CGSCC pass infrastructure not sorted out at this point are the updates in the face of inlining and running function passes that mutate the call graph. The changes are pretty boring and boiler-plate. Most of the work was factored into more focused preperatory patches. But this is what wires it all together. llvm-svn: 261203
* Minor code cleanup. NFC.Junmo Park2016-02-181-1/+1
| | | | llvm-svn: 261200
* [sanitizer-coverage] implement -fsanitize-coverage=trace-pc. This is similar ↵Kostya Serebryany2016-02-171-6/+24
| | | | | | to trace-bb, but has a different API. We already use the equivalent flag in GCC for Linux kernel fuzzing. We may be able to use this flag with AFL too llvm-svn: 261159
* NFC: Fix formatingAmaury Sechet2016-02-171-4/+4
| | | | llvm-svn: 261156
* [LIR] Avoid turning non-temporal stores into memsetHaicheng Wu2016-02-171-0/+4
| | | | | | This is to fix PR26645. llvm-svn: 261149
* Debug Info: Teach LdStHasDebugValue() (Local.cpp) about DIExpressions.Adrian Prantl2016-02-171-17/+16
| | | | | | | | | | | This function is used to check whether a dbg.value intrinsic has already been inserted, but without comparing the DIExpression, it would erroneously fire on split aggregates and only the first scalar would survive. Found via http://reviews.llvm.org/D16867. <rdar://problem/24456528> llvm-svn: 261145
* Create masked gather and scatter intrinsics in Loop Vectorizer.Elena Demikhovsky2016-02-171-96/+184
| | | | | | | | | Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access. The feature is enabled on AVX-512 target. Differential Revision: http://reviews.llvm.org/D15690 llvm-svn: 261140
* Fix load alignement when unpacking aggregates structsAmaury Sechet2016-02-171-12/+26
| | | | | | | | | | | | Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it. Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17326 llvm-svn: 261139
* Revert "Reapply commit r258404 with fix."David Majnemer2016-02-171-231/+11
| | | | | | This reverts commit r259357, it caused PR26629. llvm-svn: 261137
* [ObjCARC] Handle ARCInstKind::ClaimRV in OptimizeIndividualCalls.Frederic Riss2016-02-171-0/+1
| | | | | | | | | When support for objc_unsafeClaimAutoreleasedReturnValue has been added to the ARC optimizer in r258970, one case was missed which would lead the optimizer to execute an llvm_unreachable. In this case, just handle ClaimRV in the same way we handle RetainRV. llvm-svn: 261134
* Define the ThinLTO Pipeline (experimental)Mehdi Amini2016-02-161-2/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: On the contrary to Full LTO, ThinLTO can afford to shift compile time from the frontend to the linker: both phases are parallel (even if it is not totally "free": projects like clang are reusing product from the "compile phase" for multiple link, think about libLLVMSupport reused for opt, llc, etc.). This pipeline is based on the proposal in D13443 for full LTO. We didn't move forward on this proposal because the LTO link was far too long after that. We believe that we can afford it with ThinLTO. The ThinLTO pipeline integrates in the regular O2/O3 flow: - The compile phase perform the inliner with a somehow lighter function simplification. (TODO: tune the inliner thresholds here) This is intendend to simplify the IR and get rid of obvious things like linkonce_odr that will be inlined. - The link phase will run the pipeline from the start, extended with some specific passes that leverage the augmented knowledge we have during LTO. Especially after the inliner is done, a sequence of globalDCE/globalOpt is performed, followed by another run of the "function simplification" passes. It is not clear if this part of the pipeline will stay as is, as the split model of ThinLTO does not allow the same benefit as FullLTO without added tricks. The measurements on the public test suite as well as on our internal suite show an overall net improvement. The binary size for the clang executable is reduced by 5%. We're still tuning it with the bringup of ThinLTO and it will evolve, but this should provide a good starting point. Reviewers: tejohnson Differential Revision: http://reviews.llvm.org/D17115 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 261029
* Refactor the PassManagerBuilder: extract a ↵Mehdi Amini2016-02-161-72/+76
| | | | | | | | | | "addFunctionSimplificationPasses()" (NFC) It is intended to contains the passes run over a function after the inliner is done with a function and before it moves to its callers. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 261028
* [SCEVExpander] Make findExistingExpansion smarterJunmo Park2016-02-161-3/+5
| | | | | | | | | | | | | Summary: Extending findExistingExpansion can use existing value in ExprValueMap. This patch gives 0.3~0.5% performance improvements on benchmarks(test-suite, spec2000, spec2006, commercial benchmark) Reviewers: mzolotukhin, sanjoy, zzheng Differential Revision: http://reviews.llvm.org/D15559 llvm-svn: 260938
* [LV] Add support for insertelt/extractelt processing during type truncationSilviu Baranga2016-02-151-0/+14
| | | | | | | | | | | | | | | | | | Summary: While shrinking types according to the required bits, we can encounter insert/extract element instructions. This will cause us to reach an llvm_unreachable statement. This change adds support for truncating insert/extract element operations, and adds a regression test. Reviewers: jmolloy Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17078 llvm-svn: 260893
* Tweak the LICM code to reuse the first sub-loop instead of creating a new oneRoman Gareev2016-02-151-14/+32
| | | | | | | | | | | | | LICM starts with an *empty* AST, and then merges in each sub-loop. While the add code is appropriate for sub-loop 2 and up, it's utterly unnecessary for sub-loop 1. If the AST starts off empty, we can just clone/move the contents of the subloop into the containing AST. Reviewed-by: Philip Reames <listmail@philipreames.com> Differential Revision: http://reviews.llvm.org/D16753 llvm-svn: 260892
* Use ArrayRef to hide SmallVector details, kill a useless vector copy along ↵Benjamin Kramer2016-02-131-3/+2
| | | | | | the way. llvm-svn: 260824
* [attrs] Move the norecurse deduction to operate on the node set ratherChandler Carruth2016-02-131-15/+16
| | | | | | | | | | | | | | | | | | than the SCC object, and have it scan the instruction stream directly rather than relying on call records. This makes the behavior of this routine consistent between libc routines and LLVM intrinsics for libc routines. We can go and start teaching it about those being norecurse, but we should behave the same for the intrinsic and the libc routine rather than differently. I chatted with James Molloy and the inconsistency doesn't seem intentional and likely is due to intrinsic calls not being modelled in the call graph analyses. This also fixes a bug where we would deduce norecurse on optnone functions, when generally we try to handle optnone functions as-if they were replaceable and thus unanalyzable. llvm-svn: 260813
* [Cloning] Clone every Function's Debug InfoKeno Fischer2016-02-132-2/+3
| | | | | | | | | | | | | | | | | | | | | | | Summary: Export the CloneDebugInfoMetadata utility, which clones all debug info associated with a function into the first module. Also use this function in CloneModule on each function we clone (the CloneFunction entrypoint already does this). Without this, cloning a module will lead to DI quality regressions, especially since r252219 reversed the Function <-> DISubprogram edge (before we could get lucky and have this edge preserved if the DISubprogram itself was, e.g. due to location metadata). This was verified to fix missing debug information in julia and a unittest to verify the new behavior is included. Patch by Yichao Yu! Thanks! Reviewers: loladiro, pcc Differential Revision: http://reviews.llvm.org/D17165 llvm-svn: 260791
* [LIR] Allow merging of memsets in negatively strided loops.Chad Rosier2016-02-121-5/+7
| | | | | | Last part of PR25166. llvm-svn: 260732
* Fix typo in comment.Justin Lebar2016-02-121-1/+1
| | | | llvm-svn: 260731
* [SimplifyCFG] Don't fold conditional branches that contain calls to ↵Justin Lebar2016-02-121-14/+6
| | | | | | | | | | | | | | | | convergent functions. Summary: Performing this optimization duplicates the call to the convergent function and adds new control-flow dependencies, which is a no-no. Reviewers: jingyue Subscribers: broune, hfinkel, tra, resistor, joker.eph, arsenm, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17128 llvm-svn: 260730
* [LoopRotate] Don't perform loop rotation if the loop header calls a ↵Justin Lebar2016-02-121-0/+5
| | | | | | | | | | | | | | | | | convergent function. Summary: Calls to convergent functions can be duplicated, but only if the duplicates are not control-flow dependent on any additional values. Loop rotation doesn't meet the bar. Reviewers: jingyue Subscribers: mzolotukhin, llvm-commits, arsenm, joker.eph, resistor, tra, hfinkel, broune Differential Revision: http://reviews.llvm.org/D17127 llvm-svn: 260729
* Remove unused variableDavid Majnemer2016-02-121-1/+0
| | | | llvm-svn: 260722
* [GVN] Common code for local and non-local load availability [NFCI]Philip Reames2016-02-121-248/+148
| | | | | | | | | | | | The attached patch removes all of the block local code for performing X-load forwarding by reusing the code used in the non-local case. The motivation here is to remove duplication and in the process increase our test coverage of some fairly tricky code. I have some upcoming changes I'll be proposing in this area and wanted to have the code cleaned up a bit first. Note: The review for this mostly happened in email which didn't make it to phabricator on the 258882 commit thread. Differential Revision: http://reviews.llvm.org/D16608 llvm-svn: 260711
* [LIR] Partially revert r252926(NFC), which introduced a very subtle change.Chad Rosier2016-02-121-8/+8
| | | | | | | | | | | | | In short, before r252926 we were comparing an unsigned (StoreSize) against an a APInt (Stride), which is fine and well. After we were zero extending the Stride and then converting to an unsigned, which is not the same thing. Obviously, Stides can also be negative. This commit just restores the original behavior. AFAICT, it's not possible to write a test case to expose the issue because the code already has checks to make sure the StoreSize can't overflow an unsigned (which prevents the Stride from overflowing an unsigned as well). llvm-svn: 260706
* [InstCombine] Don't aggressively replace xor with icmpDavid Majnemer2016-02-121-17/+20
| | | | | | | | | | | | | | | | | | For some cases, InstCombine replaces the sequence of xor/sub instruction followed by cmp instruction into a single cmp instruction. However, this replacement may result suboptimal result especially when the xor/sub has more than one use, as discussed in bug 26465 (https://llvm.org/bugs/show_bug.cgi?id=26465). This patch make the replacement happen only when xor/sub has only one use. Differential Revision: http://reviews.llvm.org/D16915 Patch by Taewook Oh! llvm-svn: 260695
* [attrs] Simplify the convergent removal to directly use the pre-builtChandler Carruth2016-02-121-22/+10
| | | | | | | | | | | | | | node set rather than walking the SCC directly. This directly exposes the functions and has already had null entries filtered out. We also don't need need to handle optnone as it has already been handled in the caller -- we never try to remove convergent when there are optnone functions in the SCC. With this change, the code for removing convergent should work with the new pass manager and a different SCC analysis. llvm-svn: 260668
* [attrs] Consolidate the test for a non-SCC, non-convergent function callChandler Carruth2016-02-121-20/+14
| | | | | | | | | | | | | | | | with the test for a non-convergent intrinsic call. While it is possible to use the call records to search for function calls, we're going to do an instruction scan anyways to find the intrinsics, we can handle both cases while scanning instructions. This will also make the logic more amenable to the new pass manager which doesn't use the same call graph structure. My next patch will remove use of CallGraphNode entirely and allow this code to work with both the old and new pass manager. Fortunately, it should also get strictly simpler without changing functionality. llvm-svn: 260666
* [attrs] Run clang-format over a newly added routine in function-attrsChandler Carruth2016-02-121-12/+16
| | | | | | before I update it to be friendly with the new pass manager. llvm-svn: 260653
* [msan] Put msan constructor in a comdat.Evgeniy Stepanov2016-02-122-14/+27
| | | | | | | | | | | | | | MSan adds a constructor to each translation unit that calls __msan_init, and does nothing else. The idea is to run __msan_init before any instrumented code. This results in multiple constructors and multiple .init_array entries in the final binary, one per translation unit. This is absolutely unnecessary; one would be enough. This change moves the constructors to a comdat group in order to drop the extra ones. llvm-svn: 260632
* [SLP] Add debug output for extract cost (NFC)Matthew Simpson2016-02-111-4/+6
| | | | llvm-svn: 260614
* Re-apply r238452, the bug was in clang and was fixed in r260567.Quentin Colombet2016-02-111-5/+10
| | | | | | | | | | | | | | | | Original commit message: [InstCombine] Fold IntToPtr and PtrToInt into preceding loads. Currently we only fold a BitCast into a Load when the BitCast is its only user. Do the same for any no-op cast. Patch by Philip Pfaffe! Differential Revision: http://reviews.llvm.org/D9152 llvm-svn: 260612
* Revert "Refactor the PassManagerBuilder: extract a ↵Mehdi Amini2016-02-111-77/+73
| | | | | | | | | | "addFunctionSimplificationPasses()"" This reverts commit r260603. I didn't intend to push it :( From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260607
* Revert "Define the ThinLTO Pipeline"Mehdi Amini2016-02-111-43/+1
| | | | | | | | This reverts commit r260604. I didn't intend to push this now. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260606
* Define the ThinLTO PipelineMehdi Amini2016-02-111-1/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: On the contrary to Full LTO, ThinLTO can afford to shift compile time from the frontend to the linker: both phases are parallel. This pipeline is based on the proposal in D13443 for full LTO. We ] didn't move forward on this proposal because the link was far too long after that. This patch refactor the "function simplification" passes that are part of the inliner loop in a helper function (this part is NFC and can be commited separately to simplify the diff). The ThinLTO pipeline integrates in the regular O2/O3 flow: - The compile phase perform the inliner with a somehow lighter function simplification. (TODO: tune the inliner thresholds here) This is intendend to simplify the IR and get rid of obvious things like linkonce_odr that will be inlined. - The link phase will run the pipeline from the start, extended with some specific passes that leverage the augmented knowledge we have during LTO. Especially after the inliner is done, a sequence of globalDCE/globalOpt is performed, followed by another run of the "function simplification" passes. The measurements on the public test suite as well as on our internal suite show an overall net improvement. The binary size for the clang executable is reduced by 5%. We're still tuning it with the bringup of ThinLTO but this should provide a good starting point. Reviewers: tejohnson Subscribers: joker.eph, llvm-commits, dexonsmith Differential Revision: http://reviews.llvm.org/D17115 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260604
* Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()"Mehdi Amini2016-02-111-73/+77
| | | | | | | | It is intended to contains the passes run over a function after the inliner is done with a function and before it moves to its callers. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260603
* Set load alignment on aggregate loads.Pete Cooper2016-02-111-1/+2
| | | | | | | | | | | | | | | | When optimizing a extractvalue(load), we generate a load from the aggregate type. This load didn't have alignment set and so would get the alignment of the type. This breaks when the type is packed and so the alignment should be lower. For example, loading { int, int } would give us alignment of 4, but the original load from this type may have an alignment of 1 if packed. Reviewed by David Majnemer Differential revision: http://reviews.llvm.org/D17158 llvm-svn: 260587
* Fixed typo in r260530Jun Bum Lim2016-02-111-5/+5
| | | | llvm-svn: 260541
* [InstCombine] Simplify a known nonzero incoming value of PHIJun Bum Lim2016-02-111-0/+36
| | | | | | | | | | | | | | | | | | | | Summary: When a PHI is used only to be compared with zero, it is possible to replace an incoming value with any non-zero constant if the incoming value can be proved as a known nonzero value. For example, in below code, we can replace the incoming value %v with any non-zero constant based on the fact that the PHI is only used to be compared with zero and %v is a known non-zero value: %v = select %cond, 1, 2 %p = phi [%v, BB] ... %c = icmp eq, %p, 0 Reviewers: mcrosier, jmolloy, sanjoy Subscribers: hfinkel, mcrosier, majnemer, llvm-commits, haicheng, bmakam, mssimpso, gberry Differential Revision: http://reviews.llvm.org/D16240 llvm-svn: 260530
OpenPOWER on IntegriCloud