summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
...
* [InstCombine][SSE4A] Remove broken INSERTQI range combining optimizationSimon Pilgrim2015-10-131-45/+4
| | | | | | | | As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index. The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization. llvm-svn: 250160
* [GlobalsAA] Turn GlobalsAA on again by defaultJames Molloy2015-10-131-1/+1
| | | | | | | | Now that all the known faults with GlobalsAA have been fixed, flip the big switch on -enable-non-lto-gmr again. Feel free to pester me with any more bugs found, and don't hesitate to flip the switch back off. llvm-svn: 250157
* [IndVars] NFC Cleanup.Sanjoy Das2015-10-131-66/+62
| | | | | | | | - Rename methods according to the LLVM Coding Style - Merge adjacent anonymous namespace block - Use `auto` in two places llvm-svn: 250152
* Revert 250089 due to bot failure. It failed when building clang itself with PGO.Manman Ren2015-10-131-118/+5
| | | | llvm-svn: 250145
* TransformUtils: Remove implicit ilist iterator conversions, NFCDuncan P. N. Exon Smith2015-10-1325-254/+253
| | | | | | | | | | | Continuing the work from last week to remove implicit ilist iterator conversions. First related commit was probably r249767, with some more motivation in r249925. This edition gets LLVMTransformUtils compiling without the implicit conversions. No functional change intended. llvm-svn: 250142
* Update the branch weight metadata in JumpThreading pass.Cong Hou2015-10-121-5/+118
| | | | | | | | In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). Differential revision: http://reviews.llvm.org/D10979 llvm-svn: 250089
* GlobalOpt does not treat externally_initialized globals correctlyOliver Stannard2015-10-122-1/+5
| | | | | | | | GlobalOpt currently merges stores into the initialisers of internal, externally_initialized globals, but should not do so as the value of the global may change between the initialiser and any code in the module being run. llvm-svn: 250035
* [LoopVectorize] Shrink integer operations into the smallest type possibleJames Molloy2015-10-121-11/+180
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int type (e.g. i32) whenever arithmetic is performed on them. For targets with native i8 or i16 operations, usually InstCombine can shrink the arithmetic type down again. However InstCombine refuses to create illegal types, so for targets without i8 or i16 registers, the lengthening and shrinking remains. Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when their scalar equivalents do not, so during vectorization it is important to remove these lengthens and truncates when deciding the profitability of vectorization. The algorithm this uses starts at truncs and icmps, trawling their use-def chains until they terminate or instructions outside the loop are found (or unsafe instructions like inttoptr casts are found). If the use-def chains starting from different root instructions (truncs/icmps) meet, they are unioned. The demanded bits of each node in the graph are ORed together to form an overall mask of the demanded bits in the entire graph. The minimum bitwidth that graph can be truncated to is the bitwidth minus the number of leading zeroes in the overall mask. The intention is that this algorithm should "first do no harm", so it will never insert extra cast instructions. This is why the use-def graphs are unioned, so that subgraphs with different minimum bitwidths do not need casts inserted between them. This algorithm works hard to reduce compile time impact. DemandedBits are only queried if there are extends of illegal types and if a truncate to an illegal type is seen. In the general case, this results in a simple linear scan of the instructions in the loop. No non-noise compile time impact was seen on a clang bootstrap build. llvm-svn: 250032
* [InstCombine][X86][XOP] Combine XOP integer vector comparisons to native IRSimon Pilgrim2015-10-111-0/+53
| | | | | | We now have lowering support for XOP PCOM/PCOMU instructions. llvm-svn: 249977
* [IndVars] Use `auto`; NFCSanjoy Das2015-10-101-6/+4
| | | | llvm-svn: 249944
* Generalize convergent check to handle invokes as well as calls.Owen Anderson2015-10-091-4/+4
| | | | llvm-svn: 249892
* Teach LoopUnswitch not to perform non-trivial unswitching on loops ↵Owen Anderson2015-10-091-0/+14
| | | | | | | | | | | containing convergent operations. Doing so could cause the post-unswitching convergent ops to be control-dependent on the unswitch condition where they were not before. This check could be refined to allow unswitching where the convergent operation was already control-dependent on the unswitch condition. llvm-svn: 249874
* Refine the definition of convergent to only disallow the addition of new ↵Owen Anderson2015-10-091-1/+2
| | | | | | | | | | control dependencies. This covers the common case of operations that cannot be sunk. Operations that cannot be hoisted should already be handled properly via the safe-to-speculate rules and mechanisms. llvm-svn: 249865
* Make HeaderLineno a local variable.Dehao Chen2015-10-091-12/+8
| | | | | | | | http://reviews.llvm.org/D13576 As we are using hierarchical profile, there is no need to keep HeaderLineno a member variable. This is because each level of the inline stack will have its own header lineno. One should use the head lineno of its own inline stack level instead of the actual symbol. llvm-svn: 249848
* [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.Andrea Di Biagio2015-10-091-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass MemCpyOpt doesn't check if a store instruction is nontemporal. As a consequence, adjacent nontemporal stores are always merged into a memset call. Example: ;;; define void @foo(<4 x float>* nocapture %p) { entry: store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0 %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1 store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0 ret void } !0 = !{i32 1} ;;; In this example, the two nontemporal stores are combined to a memset of zero which does not preserve the nontemporal hint. Later on the backend (tested on a x86-64 corei7) expands that memset call into a sequence of two normal 16-byte aligned vector stores. opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o - Before: xorps %xmm0, %xmm0 movaps %xmm0, 16(%rdi) movaps %xmm0, (%rdi) With this patch, we no longer merge nontemporal stores into calls to memset. In this example, llc correctly expands the two stores into two movntps: xorps %xmm0, %xmm0 movntps %xmm0, 16(%rdi) movntps %xmm0, (%rdi) In theory, we could extend the usage of !nontemporal metadata to memcpy/memset calls. However a change like that would only have the effect of forcing the backend to expand !nontemporal memsets back to sequences of store instructions. A memset library call would not have exactly the same semantic of a builtin !nontemporal memset call. So, SelectionDAG will have to conservatively expand it back to a sequence of !nontemporal stores (effectively undoing the merging). Differential Revision: http://reviews.llvm.org/D13519 llvm-svn: 249820
* [EarlyCSE] Address post commit review for r249523.Arnaud A. de Grandmaison2015-10-091-10/+10
| | | | llvm-svn: 249814
* [RS4GC] Refactoring to make a later change easier, NFCISanjoy Das2015-10-081-19/+22
| | | | | | | | | | | | | | Summary: These non-semantic changes will help make a later change adding support for deopt operand bundles more streamlined. Reviewers: reames, swaroop.sridhar Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D13491 llvm-svn: 249779
* [PlaceSafeopints] Extract out `callsGCLeafFunction`, NFCSanjoy Das2015-10-082-28/+18
| | | | | | | | | | | | | Summary: This will be used in a later change to RewriteStatepointsForGC. Reviewers: reames, swaroop.sridhar Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13490 llvm-svn: 249777
* [RS4GC] Don't copy ADT's unneccessarily, NFCISanjoy Das2015-10-081-3/+3
| | | | | | | | | | | | Summary: Use `const auto &` instead of `auto` in `makeStatepointExplicit`. Reviewers: reames, swaroop.sridhar Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13454 llvm-svn: 249776
* New MSan mapping layout (llvm part).Evgeniy Stepanov2015-10-081-7/+15
| | | | | | | | | | | | | | | | | | This is an implementation of https://github.com/google/sanitizers/issues/579 It has a number of advantages over the current mapping: * Works for non-PIE executables. * Does not require ASLR; as a consequence, debugging MSan programs in gdb no longer requires "set disable-randomization off". * Supports linux kernels >=4.1.2. * The code is marginally faster and smaller. This is an ABI break. We never really promised ABI stability, but this patch includes a courtesy escape hatch: a compile-time macro that reverts back to the old mapping layout. llvm-svn: 249753
* Add Triple::isAndroid().Evgeniy Stepanov2015-10-082-2/+2
| | | | | | | This is a simple refactoring that replaces Triple.getEnvironment() checks for Android with Triple.isAndroid(). llvm-svn: 249750
* [InstCombine] transform masking off of an FP sign bit into a fabs() ↵Sanjay Patel2015-10-081-4/+19
| | | | | | | | | | | | | | | | | | intrinsic call (PR24886) This is a partial fix for PR24886: https://llvm.org/bugs/show_bug.cgi?id=24886 Without this IR transform, the backend (x86 at least) was producing inefficient code. This patch is making 2 assumptions: 1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic. 2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef. Differential Revision: http://reviews.llvm.org/D13076 llvm-svn: 249702
* [RS4GC] Use AssertingVH for RematerializedValueMapTy, NFCISanjoy Das2015-10-071-1/+2
| | | | | | | | | | Reviewers: reames, swaroop.sridhar Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13489 llvm-svn: 249620
* [IndVars] Preserve LCSSA in `eliminateIdentitySCEV`Sanjoy Das2015-10-071-0/+3
| | | | | | | | | | | | | | | | | | | | Summary: After r249211, SCEV can see through some LCSSA phis. Add a `replacementPreservesLCSSAForm` check before replacing uses of these phi nodes with a simplified use of the induction variable to avoid breaking LCSSA. Fixes 25047. Depends on D13460. Reviewers: atrick, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13461 llvm-svn: 249575
* [EarlyCSE] Fix handling of target memory intrinsics for CSE'ing loads.Arnaud A. de Grandmaison2015-10-071-14/+23
| | | | | | | | | | | | | | | | | Summary: Some target intrinsics can access multiple elements, using the pointer as a base address (e.g. AArch64 ld4). When trying to CSE such instructions, it must be checked the available value comes from a compatible instruction because the pointer is not enough to discriminate whether the value is correct. Reviewers: ssijaric Subscribers: mcrosier, llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D13475 llvm-svn: 249523
* [RS4GC] Remove an unnecessary assert & related variablesSanjoy Das2015-10-071-5/+0
| | | | | | | I don't think this assert adds much value, and removing it and related variables avoids an "unused variable" warning in release builds. llvm-svn: 249511
* [RS4GC] Cosmetic cleanup, NFCSanjoy Das2015-10-071-245/+211
| | | | | | | | | | | | | | | | | | | | Summary: A series of cosmetic cleanup changes to RewriteStatepointsForGC: - Rename variables to LLVM style - Remove some redundant asserts - Remove an unsued `Pass *` parameter - Remove unnecessary variables - Use C++11 idioms where applicable - Pass CallSite by value, not reference Reviewers: reames, swaroop.sridhar Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D13370 llvm-svn: 249508
* InstCombine: Fold comparisons between unguessable allocas and other pointersHans Wennborg2015-10-072-0/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will allow us to optimize code such as: int f(int *p) { int x; return p == &x; } as well as: int *allocate(void); int f() { int x; int *p = allocate(); return p == &x; } The folding can only be done under certain circumstances. Even though p and &x cannot alias, the comparison must still return true if the pointer representations are equal. If a user successfully generates a p that's a correct guess for &x, comparison should return true even though p is an invalid pointer. This patch argues that if the address of the alloca isn't observable outside the function, the function can act as-if the address is impossible to guess from the outside. The tricky part is keeping the act consistent: if we fold p == &x to false in one place, we must make sure to fold any other comparisons based on those pointers similarly. To ensure that, we only fold when &x is involved exactly once in comparison instructions. Differential Revision: http://reviews.llvm.org/D13358 llvm-svn: 249490
* Fix Clang-tidy modernize-use-nullptr warnings in source directories and ↵Hans Wennborg2015-10-069-39/+25
| | | | | | | | | | generated files; other minor cleanups. Patch by Eugene Zelenko! Differential Revision: http://reviews.llvm.org/D13321 llvm-svn: 249482
* [IndVars] Don't break dominance in `eliminateIdentitySCEV`Sanjoy Das2015-10-063-11/+35
| | | | | | | | | | | | | | | | | | | | | | | Summary: After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and Y are related in the dominator tree, even if X is an operand to Y (I've included a toy example in comments, and a real example as a test case). This commit changes `SimplifyIndVar` to require a `DominatorTree`. I don't think this is a problem because `ScalarEvolution` requires it anyway. Fixes PR25051. Depends on D13459. Reviewers: atrick, hfinkel Subscribers: joker.eph, llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D13460 llvm-svn: 249471
* [IndVars] Extract out eliminateIdentitySCEV, NFCSanjoy Das2015-10-061-4/+14
| | | | | | | | | | | Summary: Reflow a comment while at it. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13459 llvm-svn: 249470
* [WinEH] Recognize CoreCLR personality functionJoseph Tremoulet2015-10-061-0/+1
| | | | | | | | | | | | | | | Summary: - Add CoreCLR to if/else ladders and switches as appropriate. - Rename isMSVCEHPersonality to isFuncletEHPersonality to better reflect what it captures. Reviewers: majnemer, andrew.w.kaylor, rnk Subscribers: pgavlin, AndyAyers, llvm-commits Differential Revision: http://reviews.llvm.org/D13449 llvm-svn: 249455
* [EarlyCSE] Constify ParseMemoryInst methods (NFC).Arnaud A. de Grandmaison2015-10-061-9/+9
| | | | llvm-svn: 249400
* [InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector ↵Andrea Di Biagio2015-10-061-1/+7
| | | | | | | | | | | | | | | | | | | | | | select masks with ConstantExpr elements (PR24922) If the mask of a select instruction is a ConstantVector, method SimplifyDemandedVectorElts iterates over the mask elements to identify which values are selected from the select inputs. Before this patch, method SimplifyDemandedVectorElts always used method Constant::isNullValue() to check if a value in the mask was zero. Unfortunately that method always returns false when called on a ConstantExpr. This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we avoid calling isNullValue() on it. Fixes PR24922. Differential Revision: http://reviews.llvm.org/D13219 llvm-svn: 249390
* [msan] Correct a typo in poison stack pattern command line description.Evgeniy Stepanov2015-10-051-1/+1
| | | | | | Patch by Jon Eyolfson. llvm-svn: 249331
* MergeFunctions: Clear GlobalNumbers ValueMapArnold Schwaighofer2015-10-051-0/+4
| | | | | | | | | | | Otherwise, the map will observe changes as long as MergeFunctions is alive. This is bad because follow-up passes could replace-all-uses-with on the key of an entry in the map. The value handle callback of ValueMap however asserts that the key type matches. rdar://22971893 llvm-svn: 249327
* inariant.group handling in GVNPiotr Padlewski2015-10-026-33/+36
| | | | | | | | | | | | The most important part required to make clang devirtualization works ( ͡°͜ʖ ͡°). The code is able to find non local dependencies, but unfortunatelly because the caller can only handle local dependencies, I had to add some restrictions to look for dependencies only in the same BB. http://reviews.llvm.org/D12992 llvm-svn: 249196
* [SimplifyLibCalls] Fix instruction misplacement in string/memory libcall ↵Bruno Cardoso Lopes2015-10-011-2/+6
| | | | | | | | | | | | | | | | | | optimization When trying to optimize fortified library functions use the right location to insert new instructions in order to preserve correct def-use order. This fixes an issue where a misplaced instruction definition would happen to be *after* one of its use after a RAUW, forming invalid IR. This behavior was introduced by r227250. Differential Revision: http://reviews.llvm.org/D13301 rdar://problem/22802369 llvm-svn: 249092
* [InstCombine] Remove trivially empty lifetime start/end ranges.Arnaud A. de Grandmaison2015-10-011-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Some passes may open up opportunities for optimizations, leaving empty lifetime start/end ranges. For example, with the following code: void foo(char *, char *); void bar(int Size, bool flag) { for (int i = 0; i < Size; ++i) { char text[1]; char buff[1]; if (flag) foo(text, buff); // BBFoo } } the loop unswitch pass will create 2 versions of the loop, one with flag==true, and the other one with flag==false, but always leaving the BBFoo basic block, with lifetime ranges covering the scope of the for loop. Simplify CFG will then remove BBFoo in the case where flag==false, but will leave the lifetime markers. This patch teaches InstCombine to remove trivially empty lifetime marker ranges, that is ranges ending right after they were started (ignoring debug info or other lifetime markers in the range). This fixes PR24598: excessive compile time after r234581. Reviewers: reames, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13305 llvm-svn: 249018
* [NaryReassociate] SeenExprs records WeakVHJingyue Wu2015-10-011-6/+12
| | | | | | | | | | | | | | | | Summary: The instructions SeenExprs records may be deleted during rewriting. FindClosestMatchingDominator should ignore these deleted instructions. Fixes PR24301. Reviewers: grosser Subscribers: grosser, llvm-commits Differential Revision: http://reviews.llvm.org/D13315 llvm-svn: 248983
* Update sample profile propagation algorithm.Dehao Chen2015-10-011-24/+14
| | | | | | http://reviews.llvm.org/D13218 llvm-svn: 248968
* [SLP] Don't vectorize loads of non-packed types (like i1, i2).Michael Zolotukhin2015-09-301-1/+18
| | | | | | | | | | | | | | | | | | Summary: Given an array of i2 elements, 4 consecutive scalar loads will be lowered to i8-sized loads and thus will access 4 consecutive bytes in memory. If we vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in memory. Hence, we should prohibit vectorization in such cases. PS: Initial patch was proposed by Arnold. Reviewers: aschwaighofer, nadav, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13277 llvm-svn: 248943
* Fix debug info with SafeStack.Evgeniy Stepanov2015-09-302-4/+12
| | | | llvm-svn: 248933
* DeadCodeElimination: rewrite to be fasterFiona Glaser2015-09-301-31/+45
| | | | | | | | | | Same strategy as simplifyInstructionsInBlock. ~1/3 less time on my test suite. This pass doesn't have many in-tree users, but getting rid of an O(N^2) worst case and making it cleaner should at least make it a viable alternative to ADCE, since it's now consistently somewhat faster. llvm-svn: 248927
* SLPVectorizer: limit the scheduling region size per basic block.Erik Eckstein2015-09-301-9/+56
| | | | | | | | | | | Usually large blocks are not a problem. But if a large block (> 10k instructions) contains many (potential) chains of vector instructions, and those chains are spread over a wide range of instructions, then scheduling becomes a compile time problem. This change introduces a limit for the accumulate scheduling region size of a block. For real-world functions this limit will never be exceeded (it's about 10x larger than the maximum value seen in the test-suite and external test suite). llvm-svn: 248917
* [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin ↵Andrea Di Biagio2015-09-301-0/+41
| | | | | | | | | | | | | | | | shuffles if the shuffle mask is constant. This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a builtin shuffle if the mask is constant. Converting byte shuffle intrinsic calls to builtin shuffles can help finding more opportunities for combining shuffles later on in selection dag. We may end up with byte shuffles with constant masks as the result of inlining. Differential Revision: http://reviews.llvm.org/D13252 llvm-svn: 248913
* http://reviews.llvm.org/D13145Dehao Chen2015-09-301-1/+125
| | | | | | Support hierarachical sample profile format. llvm-svn: 248865
* [safestack] Fix a stupid mix-up in the direct-tls code path.Evgeniy Stepanov2015-09-301-1/+1
| | | | llvm-svn: 248863
* http://reviews.llvm.org/D13231Dehao Chen2015-09-291-46/+54
| | | | | | Change lookup functions to const functions. llvm-svn: 248818
* Revert r248810 which breaks tests.Dehao Chen2015-09-291-3/+2
| | | | llvm-svn: 248814
OpenPOWER on IntegriCloud