summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
...
* Decouple dllexport/dllimport from linkageNico Rieck2014-01-141-1/+1
| | | | | | | | | | | | | | | | | | | Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199204
* Revert r199191, "LTO: add API to set strategy for -internalize"NAKAMURA Takumi2014-01-141-21/+16
| | | | | | Please update also Other/link-opts.ll, in next time. llvm-svn: 199197
* LTO: add API to set strategy for -internalizeDuncan P. N. Exon Smith2014-01-141-16/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add API to LTOCodeGenerator to specify a strategy for the -internalize pass. This is a new attempt at Bill's change in r185882, which he reverted in r188029 due to problems with the gold linker. This puts the onus on the linker to decide whether (and what) to internalize. In particular, running internalize before outputting an object file may change a 'weak' symbol into an internal one, even though that symbol could be needed by an external object file --- e.g., with arclite. This patch enables three strategies: - LTO_INTERNALIZE_FULL: the default (and the old behaviour). - LTO_INTERNALIZE_NONE: skip -internalize. - LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden visibility. LTO_INTERNALIZE_FULL should be used when linking an executable. Outputting an object file (e.g., via ld -r) is more complicated, and depends on whether hidden symbols should be internalized. E.g., for ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and LTO_INTERNALIZE_HIDDEN can be used otherwise. However, LTO_INTERNALIZE_FULL is inappropriate, since the output object file will eventually need to link with others. lto_codegen_set_internalize_strategy() sets the strategy for subsequent calls to lto_codegen_write_merged_modules() and lto_codegen_compile*(). <rdar://problem/14334895> llvm-svn: 199191
* [PM] Split DominatorTree into a concrete analysis result object whichChandler Carruth2014-01-1334-137/+161
| | | | | | | | | | | | | | | | | | | | | | | can be used by both the new pass manager and the old. This removes it from any of the virtual mess of the pass interfaces and lets it derive cleanly from the DominatorTreeBase<> template. In turn, tons of boilerplate interface can be nuked and it turns into a very straightforward extension of the base DominatorTree interface. The old analysis pass is now a simple wrapper. The names and style of this split should match the split between CallGraph and CallGraphWrapperPass. All of the users of DominatorTree have been updated to match using many of the same tricks as with CallGraph. The goal is that the common type remains the resulting DominatorTree rather than the pass. This will make subsequent work toward the new pass manager significantly easier. Also in numerous places things became cleaner because I switched from re-running the pass (!!! mid way through some other passes run!!!) to directly recomputing the domtree. llvm-svn: 199104
* [PM] Pull the generic graph algorithms and data structures for dominatorChandler Carruth2014-01-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | trees into the Support library. These are all expressed in terms of the generic GraphTraits and CFG, with no reliance on any concrete IR types. Putting them in support clarifies that and makes the fact that the static analyzer in Clang uses them much more sane. When moving the Dominators.h file into the IR library I claimed that this was the right home for it but not something I planned to work on. Oops. So why am I doing this? It happens to be one step toward breaking the requirement that IR verification can only be performed from inside of a pass context, which completely blocks the implementation of verification for the new pass manager infrastructure. Fixing it will also allow removing the concept of the "preverify" step (WTF???) and allow the verifier to cleanly flag functions which fail verification in a way that precludes even computing dominance information. Currently, that results in a fatal error even when you ask the verifier to not fatally error. It's awesome like that. The yak shaving will continue... llvm-svn: 199095
* [cleanup] Move the Dominators.h and Verifier.h headers into the IRChandler Carruth2014-01-1339-43/+43
| | | | | | | | | | | | | | | | | | directory. These passes are already defined in the IR library, and it doesn't make any sense to have the headers in Analysis. Long term, I think there is going to be a much better way to divide these matters. The dominators code should be fully separated into the abstract graph algorithm and have that put in Support where it becomes obvious that evn Clang's CFGBlock's can use it. Then the verifier can manually construct dominance information from the Support-driven interface while the Analysis library can provide a pass which both caches, reconstructs, and supports a nice update API. But those are very long term, and so I don't want to leave the really confusing structure until that day arrives. llvm-svn: 199082
* Re-sort #include lines again, prior to moving headers around.Chandler Carruth2014-01-131-3/+3
| | | | llvm-svn: 199080
* Switch-to-lookup tables: Don't require a result for the defaultHans Wennborg2014-01-121-12/+25
| | | | | | | | | | | | | | | | | | | | | | | case when the lookup table doesn't have any holes. This means we can build a lookup table for switches like this: switch (x) { case 0: return 1; case 1: return 2; case 2: return 3; case 3: return 4; default: exit(1); } The default case doesn't yield a constant result here, but that doesn't matter, since a default result is only necessary for filling holes in the lookup table, and this table doesn't have any holes. This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB off the resulting clang binary. llvm-svn: 199025
* LoopVectorizer: Enable strided memory accesses versioning per defaultArnold Schwaighofer2014-01-111-1/+1
| | | | | | | | I saw no compile or execution time regressions on x86_64 -mavx -O3. radar://13075509 llvm-svn: 199015
* LoopVectorize.cpp: Appease MSC16.NAKAMURA Takumi2014-01-111-2/+4
| | | | | | | Excuse me, I hope msc16 builders would be fine till its end day. Introduce nullptr then. ;) llvm-svn: 199001
* Extend and simplify the sample profile input file.Diego Novillo2014-01-101-106/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1- Use the line_iterator class to read profile files. 2- Allow comments in profile file. Lines starting with '#' are completely ignored while reading the profile. 3- Add parsing support for discriminators and indirect call samples. Our external profiler can emit more profile information that we are currently not handling. This patch does not add new functionality to support this information, but it allows profile files to provide it. I will add actual support later on (for at least one of these features, I need support for DWARF discriminators in Clang). A sample line may contain the following additional information: Discriminator. This is used if the sampled program was compiled with DWARF discriminator support (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This is currently only emitted by GCC and we just ignore it. Potential call targets and samples. If present, this line contains a call instruction. This models both direct and indirect calls. Each called target is listed together with the number of samples. For example, 130: 7 foo:3 bar:2 baz:7 The above means that at relative line offset 130 there is a call instruction that calls one of foo(), bar() and baz(). With baz() being the relatively more frequent call target. Differential Revision: http://llvm-reviews.chandlerc.com/D2355 4- Simplify format of profile input file. This implements earlier suggestions to simplify the format of the sample profile file. The symbol table is not necessary and function profiles do not need to know the number of samples in advance. Differential Revision: http://llvm-reviews.chandlerc.com/D2419 llvm-svn: 198973
* Propagation of profile samples through the CFG.Diego Novillo2014-01-101-67/+605
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a propagation heuristic to convert instruction samples into branch weights. It implements a similar heuristic to the one implemented by Dehao Chen on GCC. The propagation proceeds in 3 phases: 1- Assignment of block weights. All the basic blocks in the function are initial assigned the same weight as their most frequently executed instruction. 2- Creation of equivalence classes. Since samples may be missing from blocks, we can fill in the gaps by setting the weights of all the blocks in the same equivalence class to the same weight. To compute the concept of equivalence, we use dominance and loop information. Two blocks B1 and B2 are in the same equivalence class if B1 dominates B2, B2 post-dominates B1 and both are in the same loop. 3- Propagation of block weights into edges. This uses a simple propagation heuristic. The following rules are applied to every block B in the CFG: - If B has a single predecessor/successor, then the weight of that edge is the weight of the block. - If all the edges are known except one, and the weight of the block is already known, the weight of the unknown edge will be the weight of the block minus the sum of all the known edges. If the sum of all the known edges is larger than B's weight, we set the unknown edge weight to zero. - If there is a self-referential edge, and the weight of the block is known, the weight for that edge is set to the weight of the block minus the weight of the other incoming edges to that block (if known). Since this propagation is not guaranteed to finalize for every CFG, we only allow it to proceed for a limited number of iterations (controlled by -sample-profile-max-propagate-iterations). It currently uses the same GCC default of 100. Before propagation starts, the pass builds (for each block) a list of unique predecessors and successors. This is necessary to handle identical edges in multiway branches. Since we visit all blocks and all edges of the CFG, it is cleaner to build these lists once at the start of the pass. Finally, the patch fixes the computation of relative line locations. The profiler emits lines relative to the function header. To discover it, we traverse the compilation unit looking for the subprogram corresponding to the function. The line number of that subprogram is the line where the function begins. That becomes line zero for all the relative locations. llvm-svn: 198972
* LoopVectorizer: Handle strided memory accesses by versioningArnold Schwaighofer2014-01-101-83/+408
| | | | | | | | | | | | | | | for (i = 0; i < N; ++i) A[i * Stride1] += B[i * Stride2]; We take loops like this and check that the symbolic strides 'Strided1/2' are one and drop to the scalar loop if they are not. This is currently disabled by default and hidden behind the flag 'enable-mem-access-versioning'. radar://13075509 llvm-svn: 198950
* Put the functionality for printing a value to a raw_ostream as anChandler Carruth2014-01-095-20/+15
| | | | | | | | | | | | operand into the Value interface just like the core print method is. That gives a more conistent organization to the IR printing interfaces -- they are all attached to the IR objects themselves. Also, update all the users. This removes the 'Writer.h' header which contained only a single function declaration. llvm-svn: 198836
* Fix a bug about generating undef operand when optimising shuffle vector and ↵Hao Liu2014-01-081-2/+3
| | | | | | insert element in instruction combine. llvm-svn: 198730
* Move the LLVM IR asm writer header files into the IR directory, as theyChandler Carruth2014-01-076-6/+6
| | | | | | | | | | | | | | | | | are part of the core IR library in order to support dumping and other basic functionality. Rename the 'Assembly' include directory to 'AsmParser' to match the library name and the only functionality left their -- printing has been in the core IR library for quite some time. Update all of the #includes to match. All of this started because I wanted to have the layering in good shape before I started adding support for printing LLVM IR using the new pass infrastructure, and commandline support for the new pass infrastructure. llvm-svn: 198688
* Re-sort all of the includes with ./utils/sort_includes.py so thatChandler Carruth2014-01-0716-26/+24
| | | | | | | | | | subsequent changes are easier to review. About to fix some layering issues, and wanted to separate out the necessary churn. Also comment and sink the include of "Windows.h" in three .inc files to match the usage in Memory.inc. llvm-svn: 198685
* Reapply r198654 "indvars: sink truncates outside the loop."Andrew Trick2014-01-071-4/+23
| | | | | | | | | | | This doesn't seem to have actually broken anything. It was paranoia on my part. Trying again now that bots are more stable. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. llvm-svn: 198678
* Revert "indvars: sink truncates outside the loop."Andrew Trick2014-01-071-23/+4
| | | | | | | | This reverts commit r198654. One of the bots reported a SciMark failure. llvm-svn: 198659
* indvars: sink truncates outside the loop.Andrew Trick2014-01-071-4/+23
| | | | | | | | This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. llvm-svn: 198654
* 80 col. comment.Andrew Trick2014-01-071-2/+2
| | | | llvm-svn: 198653
* Reapply r198478 "Fix PR18361: Invalidate LoopDispositions after LoopSimplify ↵Andrew Trick2014-01-061-7/+14
| | | | | | | | | | hoists things." Now with a fix for PR18384: ValueHandleBase::ValueIsDeleted. We need to invalidate SCEV's loop info when we delete a block, even if no values are hoisted. llvm-svn: 198631
* Add missed cleanup from r198456Alp Toker2014-01-041-4/+6
| | | | | | | All other uses of this macro in LLVM/clang have been moved to the function definition so follow suite (and the usage advice) here too for consistency. llvm-svn: 198516
* Revert "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists ↵Alp Toker2014-01-041-17/+10
| | | | | | | | | | | | | | | | | things." This commit was the source of crasher PR18384: While deleting: label %for.cond127 An asserting value handle still pointed to this value! UNREACHABLE executed at llvm/lib/IR/Value.cpp:671! Reverting to get the builders green, feel free to re-land after fixing up. (Renato has a handy isolated repro if you need it.) This reverts commit r198478. llvm-svn: 198503
* Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things.Andrew Trick2014-01-041-10/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getSCEV for an ashr instruction creates an intermediate zext expression when it truncates its operand. The operand is initially inside the loop, so the narrow zext expression has a non-loop-invariant loop disposition. LoopSimplify then runs on an outer loop, hoists the ashr operand, and properly invalidate the SCEVs that are mapped to value. The SCEV expression for the ashr is now an AddRec with the hoisted value as the now loop-invariant start value. The LoopDisposition of this wide value was properly invalidated during LoopSimplify. However, if we later get the ashr SCEV again, we again try to create the intermediate zext expression. We get the same SCEV that we did earlier, and it is still cached because it was never mapped to a Value. When we try to create a new AddRec we abort because we're using the old non-loop-invariant LoopDisposition. I don't have a solution for this other than to clear LoopDisposition when LoopSimplify hoists things. I think the long-term strategy should be to perform LoopSimplify on all loops before computing SCEV and before running any loop opts on individual loops. It's possible we may want to rerun LoopSimplify on individual loops, but it should rarely do anything, so rarely require invalidating SCEV. llvm-svn: 198478
* Add a LLVM_DUMP_METHOD macro.Nico Weber2014-01-031-2/+2
| | | | | | | | | | | | | | The motivation is to mark dump methods as used in debug builds so that they can be called from lldb, but to not do so in release builds so that they can be dead-stripped. There's lots of potential follow-up work suggested in the thread "Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev, but everyone seems to agreen on this subset. Macro name chosen by fair coin toss. llvm-svn: 198456
* Fix loop rerolling pass failure with non-consant loop lower boundDavid Peixotto2014-01-031-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The loop rerolling pass was failing with an assertion failure from a failed cast on loops like this: void foo(int *A, int *B, int m, int n) { for (int i = m; i < n; i+=4) { A[i+0] = B[i+0] * 4; A[i+1] = B[i+1] * 4; A[i+2] = B[i+2] * 4; A[i+3] = B[i+3] * 4; } } The code was casting the SCEV-expanded code for the new induction variable to a phi-node. When the loop had a non-constant lower bound, the SCEV expander would end the code expansion with an add insted of a phi node and the cast would fail. It looks like the cast to a phi node was only needed to get the induction variable value coming from the backedge to compute the end of loop condition. This patch changes the loop reroller to compare the induction variable to the number of times the backedge is taken instead of the iteration count of the loop. In other words, we stop the loop when the current value of the induction variable == IterationCount-1. Previously, the comparison was comparing the induction variable value from the next iteration == IterationCount. This problem only seems to occur on 32-bit targets. For some reason, the loop is not rerolled on 64-bit targets. PR18290 llvm-svn: 198425
* Disable compare sinking in CodeGenPrepare when multiple condition registers ↵Hal Finkel2014-01-021-1/+2
| | | | | | | | | | | | | | | | | | | are available As noted in the comment above CodeGenPrepare::OptimizeInst, which aggressively sinks compares to reduce pressure on the condition register(s), for targets such as PowerPC with multiple condition registers, this may not be the right thing to do. This adds an HasMultipleConditionRegisters boolean to TLI, and CodeGenPrepare::OptimizeInst is skipped when HasMultipleConditionRegisters is true. This functionality will be used by the PowerPC backend in an upcoming commit. Especially when the PowerPC backend starts tracking individual condition register bits as separate allocatable entities (which will happen in this upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is significantly suboptimial. llvm-svn: 198354
* indvars: cleanup the IV visitor. It does more than gather sext/zext info.Andrew Trick2014-01-021-25/+33
| | | | llvm-svn: 198353
* Delete unread globals through addrspacecastMatt Arsenault2014-01-021-2/+3
| | | | llvm-svn: 198346
* Fix addrspacecast with metadata globalsMatt Arsenault2014-01-021-3/+5
| | | | llvm-svn: 198345
* indvars: insert truncate at loop boundary to avoid redundant IVs.Andrew Trick2014-01-021-5/+12
| | | | | | | | | When widening an IV to remove s/zext, we generally try to eliminate the original narrow IV. However, LCSSA phi nodes outside the loop were still using the original IV. Clean this up more aggressively to avoid redundancy in generated code. llvm-svn: 198338
* Set LLVM_EXPORTED_SYMBOL_FILE in CMakeLists whose corresponding Makefiles do so.Nico Weber2013-12-291-0/+8
| | | | | | | | (unittests/ExecutionEngine/JIT/CMakeLists.txt is still missing for now, since it handles export files in a strange way: It generates a .exports file from a .def file instead of the other way round.) llvm-svn: 198183
* [ASan] Fix the test for __asan_gen_ globals and actually fix ↵Alexander Potapenko2013-12-251-2/+2
| | | | | | | | http://llvm.org/bugs/show_bug.cgi?id=17976 by setting the correct linkage (as stated in the bug). llvm-svn: 198018
* [ASan] Make sure none of the __asan_gen_ global strings end up in the symbol ↵Alexander Potapenko2013-12-251-10/+21
| | | | | | | | | table, add a test. This should fix http://llvm.org/bugs/show_bug.cgi?id=17976 Another test checking for the global variables' locations and prefixes on Darwin will be committed separately. llvm-svn: 198017
* Add support to indvars for optimizing sadd.with.overflow.Andrew Trick2013-12-232-4/+92
| | | | | | | | | | | | | | | | Split sadd.with.overflow into add + sadd.with.overflow to allow analysis and optimization. This should ideally be done after InstCombine, which can perform code motion (eventually indvars should run after all canonical instcombines). We want ISEL to recombine the add and the check, at least on x86. This is currently under an option for reducing live induction variables: -liv-reduce. The next step is reducing liveness of IVs that are live out of the overflow check paths. Once the related optimizations are fully developed, reviewed and tested, I do expect this to become default. llvm-svn: 197926
* Fix Scalarizer insertion point when replacing PHIs with insertelementsRichard Sandiford2013-12-231-1/+4
| | | | | | | | | | If the Scalarizer scalarized a vector PHI but could not scalarize all uses of it, it would insert a series of insertelements to reconstruct the vector PHI value from the scalar ones. The problem was that it would emit these insertelements immediately after the PHI, even if there were other PHIs after it. llvm-svn: 197909
* Fix Scalarizer handling of vector GEPs with multiple index operandsRichard Sandiford2013-12-231-11/+32
| | | | | | The old code only worked for one index operand. Also handle "inbounds". llvm-svn: 197908
* [asan] don't unpoison redzones on function exit in use-after-return mode.Kostya Serebryany2013-12-232-15/+57
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Before this change the instrumented code before Ret instructions looked like: <Unpoison Frame Redzones> if (Frame != OriginalFrame) // I.e. Frame is fake <Poison Complete Frame> Now the instrumented code looks like: if (Frame != OriginalFrame) // I.e. Frame is fake <Poison Complete Frame> else <Unpoison Frame Redzones> Reviewers: eugenis Reviewed By: eugenis CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2458 llvm-svn: 197907
* [asan] produce fewer stores when poisoning stack shadowKostya Serebryany2013-12-231-20/+19
| | | | llvm-svn: 197904
* Transforms: Don't create bad weights when eliminating dead casesJustin Bogner2013-12-201-1/+1
| | | | | | | | | If we happen to eliminate every case in a switch that has branch weights, we currently try to create metadata for the one remaining branch, triggering an assert. Instead, we need to check that the metadata we're trying to create is sensible. llvm-svn: 197791
* Stay classy (and legal) LLVM. Remove links to 3rd party SMT solver whose ↵Kay Tiong Khoo2013-12-191-4/+2
| | | | | | links may not be permanent. llvm-svn: 197713
* Improved fix for PR17827 (instcombine of shift/and/compare).Kay Tiong Khoo2013-12-191-22/+32
| | | | | | | | | This change fixes the case of arithmetic shift right - do not attempt to fold that case. This change also relaxes the conditions when attempting to fold the logical shift right and shift left cases. No additional IR-level test cases included at this time. See http://llvm.org/bugs/show_bug.cgi?id=17827 for proofs that these are correct transformations. llvm-svn: 197705
* [dfsan] Simplify code after r197677.Evgeniy Stepanov2013-12-191-19/+14
| | | | llvm-svn: 197679
* Add an explicit insert point argument to SplitBlockAndInsertIfThen.Evgeniy Stepanov2013-12-194-30/+25
| | | | | | | | Currently SplitBlockAndInsertIfThen requires that branch condition is an Instruction itself, which is very inconvenient, because it is sometimes an Operator, or even a Constant. llvm-svn: 197677
* LoopVectorizer: Don't if-convert constant expressions that can trapArnold Schwaighofer2013-12-171-1/+31
| | | | | | | | | | A phi node operand or an instruction operand could be a constant expression that can trap (division). Check that we don't vectorize such cases. PR16729 radar://15653590 llvm-svn: 197449
* Enable double to float shrinking optimizations for binary functions like ↵Yi Jiang2013-12-162-9/+86
| | | | | | 'fmin/fmax'. Fix radar:15283121 llvm-svn: 197434
* Fix a use-after-free error in GlobalOpt CleanupConstantGlobalUsersHal Finkel2013-12-121-2/+11
| | | | | | | | | | | | GlobalOpt's CleanupConstantGlobalUsers function uses a worklist array to manage constant users to be visited. The pointers in this array need to be weak handles because when we delete a constant array, we may also be holding a pointer to one of its elements (or an element of one of its elements if we're dealing with an array of arrays) in the worklist. Fixes PR17347. llvm-svn: 197178
* Initialize the barrier pass llvm::initializeIPOHal Finkel2013-12-121-0/+1
| | | | | | | | | The barrier pass is a temporary hack, and should go away soon. Nevertheless, if we don't initialize it, then opt will not understand -barrier, and this will break bugpoint (because when it dumps the passes from the default pass manager -barrier will be there). llvm-svn: 197177
* Resubmit r196544: Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, ↵Yi Jiang2013-12-121-0/+6
| | | | | | x) ―> __exp10(x) llvm-svn: 197109
OpenPOWER on IntegriCloud