summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* Revert [SimplifyCFG] Relax restriction for folding unconditional branchesSerguei Katkov2018-02-051-4/+1
| | | | | | | | | | | The patch causes the failure of the test compiler-rt/test/profile/Linux/counter_promo_nest.c To unblock buildbot, revert the patch while investigation is in progress. Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324214
* [SimplifyCFG] Relax restriction for folding unconditional branchesSerguei Katkov2018-02-051-1/+4
| | | | | | | | | | | | | | | | | | The commit rL308422 introduces a restriction for folding unconditional branches. Specifically if empty block with unconditional branch leads to header of the loop then elimination of this basic block is prohibited. However it seems this condition is redundantly strict. If elimination of this basic block does not introduce more back edges then we can eliminate this block. The patch implements this relax of restriction. Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: pacxx Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324208
* Re-apply [SCEV] Fix isLoopEntryGuardedByCond usageSerguei Katkov2018-02-051-8/+11
| | | | | | | | | | | | | | | | | ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check that SCEV is available at entry point of the loop. It is incorrect and fixed by patch. To bugs additionally fixed: assert is moved after the check whether loop is not a nullptr. Usage of isLoopEntryGuardedByCond in ScalarEvolution::isImpliedCondOperandsViaNoOverflow is guarded by isAvailableAtLoopEntry. Reviewers: sanjoy, mkazantsev, anna, dorit, reames Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42417 llvm-svn: 324204
* [InlineFunction] Set arg attrs even if there only are VarArg attrs.Florian Hahn2018-02-041-1/+1
| | | | | | | | | | | | When using the partial inliner, we might have attributes for forwarded varargs, but the CodeExtractor does not create an empty argument attribute set for regular arguments in that case, because it does not know of the additional arguments. So in case we have attributes for VarArgs, we also have to make sure we create (empty) attributes for all regular arguments. This fixes PR36210. llvm-svn: 324197
* [LV] Use Demanded Bits and ValueTracking for reduction type-shrinkingChad Rosier2018-02-042-76/+158
| | | | | | | | | | | | | | | The type-shrinking logic in reduction detection, although narrow in scope, is also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies the approach to rely on the demanded bits and value tracking analyses, if available. We currently perform type-shrinking separately for reductions and other instructions in the loop. Long-term, we should probably think about computing minimal bit widths in a more complete way for the loops we want to vectorize. PR35734 Differential Revision: https://reviews.llvm.org/D42309 llvm-svn: 324195
* [InstCombine] Allow common type conversions to i8/i16/i32David Green2018-02-031-1/+9
| | | | | | | | | | | This, in instcombine, allows conversions to i8/i16/i32 (very common cases) even if the resulting type is not legal according to the data layout. This can often open up extra combine opportunities. Differential Revision: https://reviews.llvm.org/D42424 llvm-svn: 324174
* [InstCombine] Use getDestAlignment in SimplifyMemSet (NFC)Daniel Neilson2018-02-021-2/+2
| | | | | | | | Summary: Small NFC change to change the name of the function used getting and setting the alignment of a memset. llvm-svn: 324148
* [InstCombine] simplify logic for swapMayExposeCSEOpportunities; NFCISanjay Patel2018-02-021-23/+9
| | | | llvm-svn: 324122
* [InstCombine] fix typos, formatting; NFCSanjay Patel2018-02-021-7/+6
| | | | llvm-svn: 324118
* [ThinLTO] - Add comment. NFC.George Rimar2018-02-021-0/+2
| | | | | | Was requested during review of D42798. llvm-svn: 324095
* [GlobalOpt] Include padding in debug fragmentsMikael Holmen2018-02-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When creating the debug fragments for a SRA'd variable, use the types' allocation sizes. This fixes issues where the pass would emit too small fragments, placed at the wrong offset, for padded types. An example of this is long double on x86. The type is represented using x86_fp80, which is 10 bytes, but the value is aligned to 12/16 bytes. The padding is included in the type's DW_AT_byte_size attribute; therefore, the fragments should also include that. Newer GCC releases (I tested 7.2.0) emit 12/16-byte pieces for long double. Earlier releases, e.g. GCC 5.5.0, behaved as LLVM did, i.e. by emitting a 10-byte piece, followed by an empty 2/6-byte piece for the padding. Failing to cover all `DW_AT_byte_size' bytes of a value with non-empty pieces results in the value being printed as <optimized out> by GDB. Patch by: David Stenberg Reviewers: aprantl, JDevlieghere Reviewed By: aprantl, JDevlieghere Subscribers: llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D42807 llvm-svn: 324066
* Add missing includesDavid Blaikie2018-02-021-0/+3
| | | | llvm-svn: 324040
* [InstCombine] allow multi-use values in canEvaluate* if all uses are in 1 instSanjay Patel2018-02-011-5/+13
| | | | | | | | | | | | | | | | This is the enhancement suggested in D42536 to fix a shortcoming in regular InstCombine's canEvaluate* functionality. When we have multiple uses of a value, but they're all in one instruction, we can allow that expression to be narrowed or widened for the same cost as a single-use value. AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified away if the operands are the same value; add becomes shl; shifts with a variable shift amount aren't handled. Differential Revision: https://reviews.llvm.org/D42739 llvm-svn: 324014
* Revert commit rL323951David Green2018-02-011-6/+2
| | | | | | | Looks like it's causing timeouts out on at least ppc64le buildbots. llvm-svn: 323959
* [InstCombine] Allow common type conversions to i8/i16/i32David Green2018-02-011-2/+6
| | | | | | | | | | | This, in instcombine, allows conversions to i8/i16/i32 (very common cases) even if the resulting type is not legal according to the data layout. This can often open up extra combine opportunities. Differential Revision: https://reviews.llvm.org/D42424 llvm-svn: 323951
* [LSR] Don't force bases of foldable formulae to the final type.Mikael Holmen2018-02-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Before emitting code for scaled registers, we prevent SCEVExpander from hoisting any scaled addressing mode by emitting all the bases first. However, these bases are being forced to the final type, resulting in some odd code. For example, if the type of the base is an integer and the final type is a pointer, we will emit an inttoptr for the base, a ptrtoint for the scale, and then a 'reverse' GEP where the GEP pointer is actually the base integer and the index is the pointer. It's more intuitive to use the pointer as a pointer and the integer as index. Patch by: Bevin Hansson Reviewers: atrick, qcolombet, sanjoy Reviewed By: qcolombet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42103 llvm-svn: 323946
* [GlobalOpt] Improve common case efficiency of static global initializer ↵Amara Emerson2018-01-311-2/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | evaluation For very, very large global initializers which can be statically evaluated, the code would create vectors of temporary Constants, modifying them in place, before committing the resulting Constant aggregate to the global's initializer value. This had effectively O(n^2) complexity in the size of the global initializer and would cause memory and non-termination issues compiling some workloads. This change performs the static initializer evaluation and creation in batches, once for each global in the evaluated IR memory. The existing code is maintained as a last resort when the initializers are more complex than simple values in a large aggregate. This should theoretically by NFC, no test as the example case is massive. The existing test cases pass with this, as well as the llvm test suite. To give an example, consider the following C++ code adapted from the clang regression tests: struct S { int n = 10; int m = 2 * n; S(int a) : n(a) {} }; template<typename T> struct U { T *r = &q; T q = 42; U *p = this; }; U<S> e; The global static constructor for 'e' will need to initialize 'r' and 'p' of the outer struct, while also initializing the inner 'q' structs 'n' and 'm' members. This batch algorithm will simply use general CommitValueTo() method to handle the complex nested S struct initialization of 'q', before processing the outermost members in a single batch. Using CommitValueTo() to handle member in the outer struct is inefficient when the struct/array is very large as we end up creating and destroy constant arrays for each initialization. For the above case, we expect the following IR to be generated: %struct.U = type { %struct.S*, %struct.S, %struct.U* } %struct.S = type { i32, i32 } @e = global %struct.U { %struct.S* gep inbounds (%struct.U, %struct.U* @e, i64 0, i32 1), %struct.S { i32 42, i32 84 }, %struct.U* @e } The %struct.S { i32 42, i32 84 } inner initializer is treated as a complex constant expression, while the other two elements of @e are "simple". Differential Revision: https://reviews.llvm.org/D42612 llvm-svn: 323933
* Utils: Fix DomTree update for entry blockMatt Arsenault2018-01-311-5/+14
| | | | | | | If SplitBlockPredecessors was used on a function entry block, it wouldn't update the dominator tree. llvm-svn: 323928
* [AggressiveInstCombine] Fixed TruncCombine class to handle TruncInst leaf ↵Amjad Aboud2018-01-311-4/+12
| | | | | | | | | | | node correctly. This covers the case where TruncInst leaf node is a constant expression. See PR36121 for more details. Differential Revision: https://reviews.llvm.org/D42622 llvm-svn: 323926
* [GlobalOpt] Fix exponential compile-time with selects.Eli Friedman2018-01-311-17/+16
| | | | | | | | | | | | | | | If you have a long chain of select instructions created from something like `int* p = &g; if (foo()) p += 4; if (foo2()) p += 4;` etc., a naive recursive visitor will recursively visit each select twice, which is O(2^N) in the number of select instructions. Use the visited set to cut off recursion in this case. (No testcase because this doesn't actually change the behavior, just the time.) Differential Revision: https://reviews.llvm.org/D42451 llvm-svn: 323910
* AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}Marek Olsak2018-01-311-0/+12
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908
* [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPsMarek Olsak2018-01-311-0/+2
| | | | | | | | | | | | | | Summary: !amdgpu.uniform needs to be preserved for AMDGPU, otherwise bad things happen. Reviewers: arsenm, nhaehnle, jingyue, broune, majnemer, bjarke.roune, dblaikie Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D42744 llvm-svn: 323907
* [InstCombine] reduce code duplication for canEvaluate* functions; NFCISanjay Patel2018-01-311-47/+43
| | | | | | We'd have to make the change suggested in D42536 3x otherwise. llvm-svn: 323877
* [AggressiveInstCombine] Make TruncCombine class ignore unreachable basic blocks.Amjad Aboud2018-01-313-5/+19
| | | | | | | | | Because dead code may contain non-standard IR that causes infinite looping or crashes in underlying analysis. See PR36134 for more details. Differential Revision: https://reviews.llvm.org/D42683 llvm-svn: 323862
* LTO: Drop comdats when converting definitions to declarations.Peter Collingbourne2018-01-311-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D42715 llvm-svn: 323844
* Teach ValueMapper to use ODR uniqued types when availableTeresa Johnson2018-01-301-4/+15
| | | | | | | | | | | | | | | | | Summary: This is exposed during ThinLTO compilation, when we import an alias by creating a clone of the aliasee. Without this fix the debug type is unnecessarily cloned and we get a duplicate, undoing the uniquing. Fixes PR36089. Reviewers: mehdi_amini, pcc Subscribers: eraman, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D41669 llvm-svn: 323813
* Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass ↵Zaara Syeda2018-01-301-6/+158
| | | | | | | | | | | | | | | | | | | | | to mark candidates with coldcc attribute. This recommits r322721 reverted due to sanitizer memory leak build bot failures. Original commit message: This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 323778
* [RS4GC] Handle call/invoke instructions as base defining values of vectorsDaniel Neilson2018-01-301-0/+6
| | | | | | | | | | | | | | | | | | Summary: There's an asymmetry in the definitions of findBaseDefiningValueOfVector() and findBaseDefiningValue() of RS4GC. The later handles call and invoke instructions, and the former does not. This appears to be simple oversight. This patch remedies the oversight by adding the call and invoke cases to findBaseDefiningValueOfVector(). Reviewers: DaniilSuchkov, anna Reviewed By: anna Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42653 llvm-svn: 323764
* [DSE] make sure memory is not modified before partial store merging (PR36129)Sanjay Patel2018-01-301-1/+2
| | | | | | | | | | | | We missed a critical check in D30703. We must make sure that no intermediate store is sitting between the stores that we want to merge. This should fix: https://bugs.llvm.org/show_bug.cgi?id=36129 Differential Revision: https://reviews.llvm.org/D42663 llvm-svn: 323759
* [JumpThreading][NFC] Rename LoadInst variablesBrian M. Rzycki2018-01-291-43/+46
| | | | | | | | | | | | | | | | | | Summary: The JumpThreading pass has several locations where to the variable name LI refers to a LoadInst type. This is confusing and inhibits the ability to use LI for LoopInfo as a member of the JumpThreading class. Minor formatting and comments were also altered to reflect this change. Reviewers: dberlin, kuba, spop, sebpop Reviewed by: sebpop Subscribers: sebpop, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D42601 llvm-svn: 323695
* [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.Alexey Bataev2018-01-291-133/+382
| | | | | | | | | | | | | | | | | Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323662
* [ThinLTO] - Stop internalizing and drop non-prevailing symbols.George Rimar2018-01-291-20/+30
| | | | | | | | | | | Implementation marks non-prevailing symbols as not live in the summary. Then them are dropped in backends. Fixes https://bugs.llvm.org/show_bug.cgi?id=35938 Differential revision: https://reviews.llvm.org/D42107 llvm-svn: 323633
* [CVP] Don't Replace incoming values from unreachable blocks with undef.Davide Italiano2018-01-291-24/+4
| | | | | | | | | | This pretty much reverts r322006, except that we keep the test, because we work around the issue exposed in a different way (a recursion limit in value tracking). There's still probably some sequence that exposes this problem, and the proper way to fix that for somebody who has time is outlined in the code review. llvm-svn: 323630
* [NFC] fix trivial typos in comments and documentsHiroshi Inoue2018-01-292-2/+2
| | | | | | "to to" -> "to" llvm-svn: 323628
* Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements ↵Alexey Bataev2018-01-271-369/+131
| | | | | | | | as shuffle." This reverts commit r323530 to fix possible problems in users code. llvm-svn: 323581
* Revert "[SLP] Removed the warning about unused variable, NFC."Alexey Bataev2018-01-271-1/+1
| | | | | | This reverts commit r323533 to fix possible problems in users code. llvm-svn: 323580
* [InstrProfiling] Don't exit early when an unused intrinsic is foundVedant Kumar2018-01-271-3/+6
| | | | | | This fixes a think-o in r323574. llvm-svn: 323576
* [InstrProfiling] Improve compile time when there is no workVedant Kumar2018-01-261-2/+21
| | | | | | | When there are no uses of profiling intrinsics in a module, and there's no coverage data to lower, InstrProfiling has no work to do. llvm-svn: 323574
* [InstCombine] Preserve debug values for eliminable castsVedant Kumar2018-01-261-1/+15
| | | | | | | | | | | | | | | | | A cast from A to B is eliminable if its result is casted to C, and if the pair of casts could just be expressed as a single cast. E.g here, %c1 is eliminable: %c1 = zext i16 %A to i32 %c2 = sext i32 %c1 to i64 InstCombine optimizes away eliminable casts. This patch teaches it to insert a dbg.value intrinsic pointing to the final result, so that local variables pointing to the eliminable result are preserved. Differential Revision: https://reviews.llvm.org/D42566 llvm-svn: 323570
* [SLP] Removed the warning about unused variable, NFC.Alexey Bataev2018-01-261-1/+1
| | | | llvm-svn: 323533
* [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.Alexey Bataev2018-01-261-131/+369
| | | | | | | | | | | | | | | | | Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323530
* [AMDGPU] fix LDS f32 intrinsicsDaniil Fukalov2018-01-261-6/+6
| | | | | | | | | | | | - using qualified pointer addrspace in intrinsics class to avoid .f32 mangling - changed too common atomic mangling to ds - added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic Reviewed by: b-sumner Differential Revision: https://reviews.llvm.org/D42383 llvm-svn: 323516
* [CallSiteSplitting] Fix infinite loop when recording conditions.Florian Hahn2018-01-261-1/+2
| | | | | | | | | Fix infinite loop when recording conditions by correctly marking basic blocks as visited. Fixes https://bugs.llvm.org/show_bug.cgi?id=36105 llvm-svn: 323515
* [NFC] fix trivial typos in comments and documentsHiroshi Inoue2018-01-263-3/+3
| | | | | | "in in" -> "in", "on on" -> "on" etc. llvm-svn: 323508
* [Debug] LCSSA: Insert dbg.value at the first available insertion pointVedant Kumar2018-01-251-1/+3
| | | | | | | | | | | Inserting a dbg.value instruction at the start of a basic block with a landingpad instruction triggers a verifier failure. We should be OK if we insert the instruction a bit later. Speculative fix for the bot failure described here: https://reviews.llvm.org/D42551 llvm-svn: 323482
* [SyntheticCounts] Rewrite the code using only graph traits.Easwaran Raman2018-01-251-6/+17
| | | | | | | | | | | | | | | | | | | Summary: The intent of this is to allow the code to be used with ThinLTO. In Thinlink phase, a traditional Callgraph can not be computed even though all the necessary information (nodes and edges of a call graph) is available. This is due to the fact that CallGraph class is closely tied to the IR. This patch first extends GraphTraits to add a CallGraphTraits graph. This is then used to implement a version of counts propagation on a generic callgraph. Reviewers: davidxl Subscribers: mehdi_amini, tejohnson, llvm-commits Differential Revision: https://reviews.llvm.org/D42311 llvm-svn: 323475
* [Debug] Add dbg.value intrinsics for PHIs created during LCSSA.Vedant Kumar2018-01-251-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA. I noticed a case where a value carried across a loop was reported as <optimized out>. Specifically this case: int bar(int x, int y) { return x + y; } int foo(int size) { int val = 0; for (int i = 0; i < size; ++i) { val = bar(val, i); // Both val and i are correct } return val; // <optimized out> } In the above case, after all of the interesting computation completes our value is reported as "optimized out." This change will add a dbg.value to correct this. This patch also moves the dbg.value insertion routine from LoopRotation.cpp into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA). Patch by Matt Davis! Differential Revision: https://reviews.llvm.org/D42551 llvm-svn: 323472
* [Debug] Add a utility to propagate dbg.value to new PHIs, NFCVedant Kumar2018-01-252-33/+39
| | | | | | | | | | This simply moves an existing utility to Utils for reuse. Split out of: https://reviews.llvm.org/D42551 Patch by Matt Davis! llvm-svn: 323471
* [asan] Fix kernel callback naming in instrumentation module.Evgeniy Stepanov2018-01-251-3/+1
| | | | | | | | | | Right now clang uses "_n" suffix for some user space callbacks and "N" for the matching kernel ones. There's no need for this and it actually breaks kernel build with inline instrumentation. Use the same callback names for user space and the kernel (and also make them consistent with the names GCC uses). Patch by Andrey Konovalov. Differential Revision: https://reviews.llvm.org/D42423 llvm-svn: 323470
* Re-land "[ThinLTO] Add call edges' relative block frequency to per-module ↵Easwaran Raman2018-01-251-2/+3
| | | | | | | | | | | | | | summary." It was reverted after buildbot regressions. Original commit message: This allows relative block frequency of call edges to be passed to the thinlink stage where it will be used to compute synthetic entry counts of functions. llvm-svn: 323460
OpenPOWER on IntegriCloud