summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
...
* Increases full-unroll threshold.Dehao Chen2017-02-182-29/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge): Performance: 403 0.11% 433 0.51% 445 0.48% 447 3.50% 453 1.49% 464 0.75% Code size: 403 0.56% 433 0.96% 445 2.16% 447 2.96% 453 0.94% 464 8.02% The compiler time overhead is similar with code size. Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc Reviewed By: hfinkel, chandlerc Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D28368 llvm-svn: 295538
* OptDiag: Allow constructing DiagnosticLocation from DISubprogramsJustin Bogner2017-02-181-2/+1
| | | | | | | | This avoids creating a DILocation just to represent a line number, since creating Metadata is expensive. Creating a DiagnosticLocation directly is much cheaper. llvm-svn: 295531
* [NewGVN] isOnlyReachableViaThisEdge() is dead now. NFCI.Davide Italiano2017-02-171-18/+0
| | | | llvm-svn: 295503
* [NewGVN] createVariableOrConstant is not required anymore. NFCI.Davide Italiano2017-02-171-8/+0
| | | | llvm-svn: 295500
* WholeProgramDevirt: For VCP use a 32-bit ConstantInt for the byte offset.Peter Collingbourne2017-02-171-1/+1
| | | | | | | | | | | | | | | A future change will cause this byte offset to be inttoptr'd and then exported via an absolute symbol. On the importing end we will expect the symbol to be in range [0,2^32) so that it will fit into a 32-bit relocation. The problem is that on 64-bit architectures if the offset is negative it will not be in the correct range once we inttoptr it. This change causes us to use a 32-bit integer so that it can be inttoptr'd (which zero extends) into the correct range. Differential Revision: https://reviews.llvm.org/D30016 llvm-svn: 295487
* WholeProgramDevirt: Examine the function body when deciding whether ↵Peter Collingbourne2017-02-171-12/+41
| | | | | | | | | | functions are readnone. The goal is to get an analysis result even for de-refineable functions. Differential Revision: https://reviews.llvm.org/D29803 llvm-svn: 295472
* [LV] Remove constant restriction for vector phi creationMatthew Simpson2017-02-171-44/+37
| | | | | | | | | | | | We previously only created a vector phi node for an induction variable if its step had a constant integer type. However, the step actually only needs to be loop-invariant. We only handle inductions having loop-invariant steps, so this patch should enable vector phi node creation for all integer induction variables that will be vectorized. Differential Revision: https://reviews.llvm.org/D29956 llvm-svn: 295456
* InstCombine: fix extraction when performing vector/array punningEugene Leviant2017-02-171-1/+1
| | | | | | Differential revision: https://reviews.llvm.org/D29491 llvm-svn: 295429
* [JumpThreading] Re-enable JumpThreading for guardsSanjoy Das2017-02-172-15/+195
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: JumpThreading for guards feature has been reverted at https://reviews.llvm.org/rL295200 due to the following problem: the feature used the following algorithm for detection of diamond patters: 1. Find a block with 2 predecessors; 2. Check that these blocks have a common single parent; 3. Check that the parent's terminator is a branch instruction. The problem is that these checks are insufficient. They may pass for a non-diamond construction in case if those two predecessors are actually the same block. This may happen if parent's terminator is a br (either conditional or unconditional) to a block that ends with "switch" instruction with exactly two branches going to one block. This patch re-enables the JumpThreading for guards and fixes this issue by adding the check that those found predecessors are actually different blocks. This guarantees that parent's terminator is a conditional branch with exactly 2 different successors, which is now ensured by assertions. It also adds two more tests for this situation (with parent's terminator being a conditional and an unconditional branch). Patch by Max Kazantsev! Reviewers: anna, sanjoy, reames Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30036 llvm-svn: 295410
* Bug 31948: Fix assertion when bitcasting constantexpr pointersMatt Arsenault2017-02-171-0/+6
| | | | llvm-svn: 295387
* [LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loopsWei Mi2017-02-161-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops other than current loop. This is good for the case when induction variable of outerloop being used in expr in innerloop. But it is very bad to allow such Reg from sibling loop because we may need to add lsr.iv in other sibling loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase below, one loop can be inserted with a bunch of lsr.iv because of LSR for other loops. // The induction variable j from a loop in the middle will have initial // value generated from previous sibling loop and exit value used by its // next sibling loop. void goo(long i, long j); long cond; void foo(long N) { long i = 0; long j = 0; i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); } The fix is to only allow formula with SCEVAddRecExpr type of Reg from current loop or its parents. Differential Revision: https://reviews.llvm.org/D30021 llvm-svn: 295378
* InstCombine: Canonicalize fast fmuladd to fmul + faddMatt Arsenault2017-02-161-1/+14
| | | | llvm-svn: 295353
* [AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus ↵Craig Topper2017-02-162-4/+9
| | | | | | intrinsics like it does 128/256-bit. llvm-svn: 295294
* PMB: Add an importing WPD pass to the start of the ThinLTO backend pipeline.Peter Collingbourne2017-02-151-1/+15
| | | | | | Differential Revision: https://reviews.llvm.org/D30008 llvm-svn: 295260
* Re-apply r295110 and r295144 with a fix for the ASan issue.Peter Collingbourne2017-02-151-98/+156
| | | | llvm-svn: 295241
* [InstCombine] improve formatting; NFCSanjay Patel2017-02-151-6/+3
| | | | llvm-svn: 295237
* AddressSanitizer: don't track swifterror memory addressesArnold Schwaighofer2017-02-151-3/+12
| | | | | | | | | | | | | | They are register promoted by ISel and so it makes no sense to treat them as memory. Inserting calls to the thread sanitizer would also generate invalid IR. You would hit: "swifterror value can only be loaded and stored from, or as a swifterror argument!" llvm-svn: 295230
* ThreadSanitizer: don't track swifterror memory addressesArnold Schwaighofer2017-02-151-0/+7
| | | | | | | | | | | | | | They are register promoted by ISel and so it makes no sense to treat them as memory. Inserting calls to the thread sanitizer would also generate invalid IR. You would hit: "swifterror value can only be loaded and stored from, or as a swifterror argument!" llvm-svn: 295215
* Revert "[JumpThreading] Thread through guards"Anna Thomas2017-02-152-189/+15
| | | | | | | | | This reverts commit r294617. We fail on an assert while trying to get a condition from an unconditional branch. llvm-svn: 295200
* [InlineFunction] use getFunction(); NFCSanjay Patel2017-02-151-3/+3
| | | | llvm-svn: 295185
* [InlineFunction] use getCaller(); NFCISanjay Patel2017-02-151-3/+2
| | | | llvm-svn: 295181
* [InlineFunction] use range-for loop; NFCISanjay Patel2017-02-151-10/+8
| | | | llvm-svn: 295179
* Revert r295110 and r295144.Daniel Jasper2017-02-151-156/+98
| | | | | | | This fails under ASAN: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio llvm-svn: 295162
* SimplifyCFG: Register cloned assume intrinsics with assumption cache when ↵Peter Collingbourne2017-02-151-3/+10
| | | | | | | | creating critical edge. Differential Revision: https://reviews.llvm.org/D29976 llvm-svn: 295145
* WholeProgramDevirt: Separate the code that applies optzns from the code that ↵Peter Collingbourne2017-02-151-48/+86
| | | | | | | | | | | decides whether to apply them. NFCI. The idea is that the apply* functions will also be called when importing devirt optimizations. Differential Revision: https://reviews.llvm.org/D29745 llvm-svn: 295144
* Fix a bug in caller's BFI update code after inlining.Easwaran Raman2017-02-141-3/+10
| | | | | | | | | | | | | | | | Multiple blocks in the callee can be mapped to a single cloned block since we prune the callee as we clone it. The existing code iterates over the value map and clones the block frequency (and eventually scales the frequencies of the cloned blocks). Value map's iteration is not deterministic and so the cloned block might get the frequency of any of the original blocks. The fix is to set the max of the original frequencies to the cloned block. The first block in the sequence must have this max frequency and, in the call context, subsequent blocks must have its frequency. Differential Revision: https://reviews.llvm.org/D29696 llvm-svn: 295115
* [LV] Rename Induction to PrimaryInduction. NFC.Michael Kuperstein2017-02-141-12/+12
| | | | llvm-svn: 295111
* WholeProgramDevirt: Change internal vcall data structures to match summary.Peter Collingbourne2017-02-141-74/+94
| | | | | | | | | | | | | | | | | | Group calls into constant and non-constant arguments up front, and use uint64_t instead of ConstantInt to represent constant arguments. The goal is to allow the information from the summary to fit naturally into this data structure in a future change (specifically, it will be added to CallSiteInfo). This has two side effects: - We disallow VCP for constant integer arguments of width >64 bits. - We remove the restriction that the bitwidth of a vcall's argument and return types must match those of the vfunc definitions. I don't expect either of these to matter in practice. The first case is uncommon, and the second one will lead to UB (so we can do anything we like). Differential Revision: https://reviews.llvm.org/D29744 llvm-svn: 295110
* [BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions ↵Taewook Oh2017-02-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | created in SplitBlockPredecessors Summary: When setting debugloc for instructions created in SplitBlockPredecessors, current implementation copies debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call for @llvm.dbg.value, the debugloc of the instruction may point the location outside of the block itself. For the example code of ``` 1 typedef struct _node_t { 2 struct _node_t *next; 3 } node_t; 4 5 extern node_t *root; 6 7 int foo() { 8 node_t *node, *tmp; 9 int ret = 0; 10 11 node = tmp = root->next; 12 while (node != root) { 13 while (node) { 14 tmp = node; 15 node = node->next; 16 ret++; 17 } 18 } 19 20 return ret; 21 } ``` , below is the basicblock corresponding to line 12 after Reassociate expressions pass: ``` while.cond: ; preds = %while.cond2, %entry %node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ] %ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ] tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21 tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31 %cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33 br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35 ``` As you can see, the first-non-phi instruction is a call for @llvm.dbg.value, and the debugloc is ``` !21 = !DILocation(line: 9, column: 7, scope: !6) ``` , which is a definition of 'ret' variable and outside of the scope of the basicblock itself. However, current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses this problem by picking up debugloc from the first-non-phi-non-dbg instruction. Reviewers: dblaikie, samsonov, eugenis Reviewed By: eugenis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29867 llvm-svn: 295106
* Re-apply "[profiling] Remove dead profile name vars after emitting name data"Vedant Kumar2017-02-141-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | This reverts 295092 (re-applies 295084), with a fix for dangling references from the array of coverage names passed down from frontends. I missed this in my initial testing because I only checked test/Profile, and not test/CoverageMapping as well. Original commit message: The profile name variables passed to counter increment intrinsics are dead after we emit the finalized name data in __llvm_prf_nm. However, we neglect to erase these name variables. This causes huge size increases in the __TEXT,__const section as well as slowdowns when linker dead stripping is disabled. Some affected projects are so massive that they fail to link on Darwin, because only the small code model is supported. Fix the issue by throwing away the name constants as soon as we're done with them. Differential Revision: https://reviews.llvm.org/D29921 llvm-svn: 295099
* Revert "[profiling] Remove dead profile name vars after emitting name data"Vedant Kumar2017-02-141-3/+0
| | | | | | | | This reverts commit r295084. There is a test failure on: http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/2620/ llvm-svn: 295092
* [profiling] Remove dead profile name vars after emitting name dataVedant Kumar2017-02-141-0/+3
| | | | | | | | | | | | | | | | The profile name variables passed to counter increment intrinsics are dead after we emit the finalized name data in __llvm_prf_nm. However, we neglect to erase these name variables. This causes huge size increases in the __TEXT,__const section as well as slowdowns when linker dead stripping is disabled. Some affected projects are so massive that they fail to link on Darwin, because only the small code model is supported. Fix the issue by throwing away the name constants as soon as we're done with them. Differential Revision: https://reviews.llvm.org/D29921 llvm-svn: 295084
* Do not apply redundant LastCallToStaticBonusTaewook Oh2017-02-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | Summary: As written in the comments above, LastCallToStaticBonus is already applied to the cost if Caller has only one user, so it is redundant to reapply the bonus here. If the only user is not a caller, TotalSecondaryCost will not be adjusted anyway because callerWillBeRemoved is false. If there's no caller at all, we don't need to care about TotalSecondaryCost because inliningPreventsSomeOuterInline is false. Reviewers: chandlerc, eraman Reviewed By: eraman Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D29169 llvm-svn: 295075
* Correct a typo, s/hosting/hoisting/Brian Cain2017-02-141-1/+1
| | | | llvm-svn: 295066
* Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps"Matthew Simpson2017-02-141-10/+47
| | | | | | | | | | | This reapplies commit r294967 with a fix for the execution time regressions caught by the clang-cmake-aarch64-quick bot. We now extend the truncate optimization to non-primary induction variables only if the truncate isn't already free. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 295063
* [SLP] Fix for PR31879: vectorize repeated scalar ops that don't get putAlexey Bataev2017-02-141-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | back into a vector Previously the cost of the existing ExtractElement/ExtractValue instructions was considered as a dead cost only if it was detected that they have only one use. But these instructions may be considered dead also if users of the instructions are also going to be vectorized, like: ``` %x0 = extractelement <2 x float> %x, i32 0 %x1 = extractelement <2 x float> %x, i32 1 %x0x0 = fmul float %x0, %x0 %x1x1 = fmul float %x1, %x1 %add = fadd float %x0x0, %x1x1 ``` This can be transformed to ``` %1 = fmul <2 x float> %x, %x %2 = extractelement <2 x float> %1, i32 0 %3 = extractelement <2 x float> %1, i32 1 %add = fadd float %2, %3 ``` because though `%x0` and `%x1` have 2 users each other, these users are part of the vectorized tree and we can consider these `extractelement` instructions as dead. Differential Revision: https://reviews.llvm.org/D29900 llvm-svn: 295056
* Revert "[LoopVectorize] Added address space check when analysing interleaved ↵Karl-Johan Karlsson2017-02-141-20/+14
| | | | | | | | | accesses" This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed. I'm reverting to investigate. llvm-svn: 295042
* [LoopVectorize] Added address space check when analysing interleaved accessesKarl-Johan Karlsson2017-02-141-14/+20
| | | | | | | | | | | | | | | | | Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 llvm-svn: 295038
* Test commit permissionKarl-Johan Karlsson2017-02-141-1/+1
| | | | | | Removing whitespace. llvm-svn: 295037
* [LSR] Pointers with different address spaces are considered incompatible.Mikael Holmen2017-02-141-1/+6
| | | | | | | | | | | | | | | | | | | | | | Summary: Function isCompatibleIVType is already used as a guard before the call to SE.getMinusSCEV(OperExpr, PrevExpr); in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions to be of the same type, so we now consider two pointers with different address spaces to be incompatible, since it is possible that the pointers in fact have different sizes. Reviewers: qcolombet, eli.friedman Reviewed By: qcolombet Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D29885 llvm-svn: 295033
* ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible ↵Peter Collingbourne2017-02-141-13/+79
| | | | | | | | functions to merged module. Differential Revision: https://reviews.llvm.org/D29701 llvm-svn: 295021
* [LICM] Make store promotion work in the face of unordered atomicsPhilip Reames2017-02-141-5/+27
| | | | | | | | | | | | | | | | | | | | | | | Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled. Most of the change is straight-forward, the only complicated bit is in the reasoning around mixing of atomic and non-atomic memory access. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct. It seems really tempting to just promote all access to atomics, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all access to atomic. Consider a loop like the following: while(b) { load i128 ... if (can lower i128 atomic) store atomic i128 ... else store i128 } It could be there's no race on the location and thus the code is perfectly well defined even if we can't lower a i128 atomically. It's not clear we need to be this conservative - arguably the program above is brocken since it can't be lowered unless the branch is folded - but I didn't want to have to fix any fallout which might result. Differential Revision: https://reviews.llvm.org/D15592 llvm-svn: 295015
* FunctionAttrs: Factor out a function for querying memory access of a ↵Peter Collingbourne2017-02-141-16/+21
| | | | | | | | | | | specific copy of a function. NFC. This will later be used by ThinLTOBitcodeWriter to add copies of readnone functions to the regular LTO module. Differential Revision: https://reviews.llvm.org/D29695 llvm-svn: 295008
* [FunctionAttrs] try to extend nonnull-ness of arguments from a callsite back ↵Sanjay Patel2017-02-131-0/+53
| | | | | | | | | | | | | | | | | | | to its parent function As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2016-December/108182.html ...we should be able to propagate 'nonnull' info from a callsite back to its parent. The original motivation for this patch is our botched optimization of "dyn_cast" (PR28430), but this won't solve that problem. The transform is currently disabled by default while we wait for clang to work-around potential security problems: http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html Differential Revision: https://reviews.llvm.org/D27855 llvm-svn: 294998
* IR: Type ID summary extensions for WPD; thread summary into WPD pass.Peter Collingbourne2017-02-132-9/+86
| | | | | | | | | | Make the whole thing testable by adding YAML I/O support for the WPD summary information and adding some negative tests that exercise the YAML support. Differential Revision: https://reviews.llvm.org/D29782 llvm-svn: 294981
* Revert "[LV] Extend trunc optimization to all IVs with constant integer steps"Matthew Simpson2017-02-131-20/+12
| | | | | | | | This reverts commit r294967. This patch caused execution time slowdowns in a few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot. I'm reverting to investigate. llvm-svn: 294973
* [LV] Extend trunc optimization to all IVs with constant integer stepsMatthew Simpson2017-02-131-12/+20
| | | | | | | | | | | | | | This patch extends the optimization of truncations whose operand is an induction variable with a constant integer step. Previously we were only applying this optimization to the primary induction variable. However, the cost model assumes the optimization is applied to the truncation of all integer induction variables (even regardless of step type). The transformation is now applied to the other induction variables, and I've updated the cost model to ensure it is better in sync with the transformation we actually perform. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 294967
* [SLP] Fix for PR31690: Allow using of extra values in horizontalAlexey Bataev2017-02-131-18/+130
| | | | | | | | | | | | | | | | | | | | | | | reductions. Currently, LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also, it supports a vectorization of instructions with some extra arguments, like: ``` float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } ``` Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D29727 llvm-svn: 294934
* NewGVN: Reverse order of congruence class elimination to maximize trivial ↵Daniel Berlin2017-02-121-2/+2
| | | | | | deadness llvm-svn: 294926
* NewGVN: Use shouldSwapOperands in one more placeDaniel Berlin2017-02-121-1/+1
| | | | llvm-svn: 294925
OpenPOWER on IntegriCloud