path: root/llvm/lib/Transforms/IPO
Commit message | Author | Age | Files | Lines
...
* Add some missed formattingDavid Blaikie2015-03-141-4/+7
| | | | llvm-svn: 232281
* [opaque pointer type] gep API migration, ArgPromoDavid Blaikie2015-03-141-16/+19
| | | | | | | | | This involved threading the type-to-gep through a data structure, since the code was relying on the pointer type to carry this information. I imagine there will be a lot of this work across the project... slow work chasing each use case, but the assertions will help keep me honest. llvm-svn: 232277
* [opaque pointer type] more gep API migrationDavid Blaikie2015-03-141-4/+3
| | | | llvm-svn: 232274
* [opaque pointer type] more gep API migrationsDavid Blaikie2015-03-141-1/+1
| | | | | | | | | | Adding nullptr to all the IRBuilder stuff because it's the first thing that fails to build when testing without the back-compat functions, so I'll keep having to re-add these locally for each chunk of migration I do. Might as well check them in to save me the churn. Eventually I'll have to migrate these too, but I'm going breadth-first. llvm-svn: 232270
* [opaque pointer type] Start migrating GEP creation to explicitly specify the ↵David Blaikie2015-03-141-5/+5
| | | | | | | | | | | | | | | pointee type I'm just going to migrate these in a pretty ad-hoc & incremental way - providing the backwards-compatible API for now, then locally removing it, fixing a few callers, adding it back in and committing those callers. Rinse, repeat. The assertions should ensure that if I get this wrong we'll find out about it and not just have one giant patch to revert, recommit, revert, recommit, etc. llvm-svn: 232240
* LowerBitSets: Do not export symbols for bit set referenced globals on Darwin.Peter Collingbourne2015-03-141-1/+9
| | | | | | | The linker on that platform may re-order symbols or strip dead symbols, which will break bit set checks. Avoid this by hiding the symbols from the linker. llvm-svn: 232235
* Reapply 'Run LICM pass after loop unrolling pass.'Kevin Qin2015-03-121-1/+12
| | | | | | | It was first committed at r231630 and reverted at r231635. The function pass InstructionSimplifier is inserted as a barrier to make sure the loop unroll pass won't affect the LICM pass. llvm-svn: 232011
* Enable loop-rotate before loop-vectorize by defaultMichael Zolotukhin2015-03-101-2/+1
| | | | llvm-svn: 231820
* remove function names from comments; NFCSanjay Patel2015-03-101-11/+9
| | | | llvm-svn: 231801
* DataLayout is mandatory, update the API to reflect it with references.Mehdi Amini2015-03-105-85/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first attempt at doing that. This patch is not exactly NFC as for instance some places were passing a nullptr instead of the DataLayout, possibly just because there was a default value on the DataLayout argument to many functions in the API. Even though it is not purely NFC, there is no change in the validation. I turned as many pointers to DataLayout as possible into references; this helped figure out all the places where a nullptr could come up. I initially had a local version of this patch broken into over 30 independent commits, but some later commits were cleaning the API and touching parts of the code modified in the previous commits, so it seemed cleaner without the intermediate state. Test Plan: Reviewers: echristo Subscribers: llvm-commits From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231740
* Revert r231630 - Run LICM pass after loop unrolling pass.Kevin Qin2015-03-091-7/+1
| | | | | | As it broke llvm bootstrap. llvm-svn: 231635
* Run LICM pass after loop unrolling pass.Kevin Qin2015-03-091-1/+7
| | | | | | | Runtime unrolling will introduce a runtime check in the loop prologue. If the unrolled loop is an inner loop, then the prologue will be inside the outer loop. The LICM pass can help to promote the runtime check out if the checked value is loop invariant. llvm-svn: 231630
* Add a new pass "Loop Interchange"Karthik Bhat2015-03-061-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pass interchanges loops to provide a more cache-friendly memory access. E.g., given a loop like - for(int i=0;i<N;i++) for(int j=0;j<N;j++) A[j][i] = A[j][i]+B[j][i]; is interchanged to - for(int j=0;j<N;j++) for(int i=0;i<N;i++) A[j][i] = A[j][i]+B[j][i]; This pass is currently disabled by default. To give a brief introduction, it consists of 3 stages - LoopInterchangeLegality : Checks the legality of loop interchange based on the dependency matrix. LoopInterchangeProfitability: A very basic heuristic has been added to check for profitability. This will evolve over time. LoopInterchangeTransform : Which does the actual transform. LNT performance tests show improvement in the Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver benchmarks. TODO: 1) Add support for reductions and lcssa phi. 2) Improve profitability model. 3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange. 4) Improve compile time regression found in llvm lnt due to this pass. 5) Fix issues in Dependency Analysis module. A special thanks to Hal for reviewing this code. Review: http://reviews.llvm.org/D7499 llvm-svn: 231458
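The interchange described above can be sketched outside of LLVM as two plain C++ loop nests. This is only an illustration of the transformation's effect, not the pass itself; the names `Matrix`, `addColumnOrder`, and `addRowOrder` are made up for this sketch, and square matrices are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type for this sketch: a square matrix stored row by row.
using Matrix = std::vector<std::vector<int>>;

// Loop nest as written in the commit message: i is the outer loop and j the
// inner loop, so the inner loop walks down column i and every access touches
// a different row.
void addColumnOrder(Matrix &A, const Matrix &B) {
  assert(A.size() == B.size());
  const std::size_t N = A.size();
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      A[j][i] = A[j][i] + B[j][i];
}

// Interchanged nest: j is now the outer loop, so the inner loop walks along
// row j and touches consecutive elements.
void addRowOrder(Matrix &A, const Matrix &B) {
  assert(A.size() == B.size());
  const std::size_t N = A.size();
  for (std::size_t j = 0; j < N; ++j)
    for (std::size_t i = 0; i < N; ++i)
      A[j][i] = A[j][i] + B[j][i];
}
```

Both functions compute the same result because each element update is independent of the others; that independence is the kind of property the legality stage has to establish before interchanging.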
* Make DataLayout Non-Optional in the ModuleMehdi Amini2015-03-046-82/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: DataLayout keeps the string used for its creation. As a side effect it is no longer needed in the Module. This is "almost" NFC, the string is no longer canonicalized, you can't rely on two "equals" DataLayout having the same string returned by getStringRepresentation(). Get rid of DataLayoutPass: the DataLayout is in the Module The DataLayout is "per-module", let's enforce this by not duplicating it more than necessary. One more step toward non-optionality of the DataLayout in the module. Make DataLayout Non-Optional in the Module Module->getDataLayout() will never return nullptr anymore. Reviewers: echristo Subscribers: resistor, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D7992 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231270
* LowerBitSets: Use byte arrays instead of bit sets to represent in-memory bit ↵Peter Collingbourne2015-03-031-64/+156
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sets. By loading from indexed offsets into a byte array and applying a mask, a program can test bits from the bit set with a relatively short instruction sequence. For example, suppose we have 15 bit sets to lay out: A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits), F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits), L (4 bits), M (3 bits), N (2 bits), O (1 bit) These bits can be laid out in a 16-byte array like this: Byte Offset 0123456789ABCDEF Bit 7 HHHHHHHHHIIIIIII 6 GGGGGGGGGGJJJJJJ 5 FFFFFFFFFFFKKKKK 4 EEEEEEEEEEEELLLL 3 DDDDDDDDDDDDDMMM 2 CCCCCCCCCCCCCCNN 1 BBBBBBBBBBBBBBBO 0 AAAAAAAAAAAAAAAA For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done in 1-2 machine instructions on x86, or 4-6 instructions on ARM. This uses the LPT multiprocessor scheduling algorithm to lay out the bits efficiently. Saves ~450KB of instructions in a recent build of Chromium. Differential Revision: http://reviews.llvm.org/D7954 llvm-svn: 231043
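The byte-array test from the example above can be sketched in standalone C++. Only bit sets A and I from the commit's layout are modelled; the `BitSetSlice` struct and the `setBit`/`testBit` helpers are hypothetical names for this sketch and are not part of LowerBitSets.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of where one logical bit set lives inside the
// shared byte array: bit X of the set is stored at byte ByteOffset + X, in
// the bit position selected by Mask.
struct BitSetSlice {
  std::size_t ByteOffset;
  std::uint8_t Mask;
  std::size_t Size; // number of bits in the logical bit set
};

// Two rows of the layout in the commit message: A occupies bit 0 of bytes
// 0..15, and I occupies bit 7 of bytes 9..15.
constexpr BitSetSlice SliceA{0, 0x01, 16};
constexpr BitSetSlice SliceI{9, 0x80, 7};

using ByteArray = std::array<std::uint8_t, 16>;

void setBit(ByteArray &Bits, const BitSetSlice &S, std::size_t X) {
  assert(X < S.Size);
  Bits[S.ByteOffset + X] |= S.Mask;
}

// Mirrors ((bits[X] & 1) != 0) for A and ((bits[9 + X] & 0x80) != 0) for I.
bool testBit(const ByteArray &Bits, const BitSetSlice &S, std::size_t X) {
  if (X >= S.Size)
    return false;
  return (Bits[S.ByteOffset + X] & S.Mask) != 0;
}
```

Because every slice uses its own mask, setting bit 3 of I (byte 12, mask 0x80) leaves bit 12 of A (byte 12, mask 0x01) untouched even though both live in the same byte.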
* Convert push_back loops into append calls.Benjamin Kramer2015-02-281-2/+1
| | | | | | No functionality change intended. llvm-svn: 230849
* LowerBitSets: Align referenced globals.Peter Collingbourne2015-02-251-22/+40
| | | | | | | | | | | | | | | | This change aligns globals to the next highest power of 2 bytes, up to a maximum of 128. This makes it more likely that we will be able to compress bit sets with a greater alignment. In many more cases, we can now take advantage of a new optimization also introduced in this patch that removes bit set checks if the bit set is all ones. The 128 byte maximum was found to provide the best tradeoff between instruction overhead and data overhead in a recent build of Chromium. It allows us to remove ~2.4MB of instructions at the cost of ~250KB of data. Differential Revision: http://reviews.llvm.org/D7873 llvm-svn: 230540
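A minimal sketch of the rounding rule described above, assuming that "next highest power of 2" means rounding a global's size up to a power of two (a size that is already a power of two is kept) and capping the result at 128 bytes. The function name `roundedAlignment` and the treatment of exact powers of two are assumptions of this sketch, not taken from the patch.

```cpp
#include <cstdint>

// Cap taken from the commit message; everything else here is illustrative.
constexpr std::uint64_t MaxAlign = 128;

// Returns the smallest power of two that is >= Size, but never more than
// MaxAlign. A size of 0 yields an alignment of 1.
std::uint64_t roundedAlignment(std::uint64_t Size) {
  std::uint64_t Align = 1;
  while (Align < Size && Align < MaxAlign)
    Align <<= 1;
  return Align;
}
```

Under these assumptions a 100-byte global would get 128-byte alignment, and anything larger than 128 bytes would also be capped at 128.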
* LowerBitSets: Introduce global layout builder.Peter Collingbourne2015-02-241-10/+78
| | | | | | | | | | The builder is based on a layout algorithm that tries to keep members of small bit sets together. The new layout compresses Chromium's bit sets to around 15% of their original size. Differential Revision: http://reviews.llvm.org/D7796 llvm-svn: 230394
* LowerBitSets.cpp: Prune incorrect \param(s). [-Wdocumentation]NAKAMURA Takumi2015-02-221-6/+6
| | | | | | \param should be used as itemized. llvm-svn: 230167
* Introduce bitset metadata format and bitset lowering pass.Peter Collingbourne2015-02-204-0/+531
| | | | | | | | | | | | | | | | | | | | This patch introduces a new mechanism that allows IR modules to co-operatively build pointer sets corresponding to addresses within a given set of globals. One particular use case for this is to allow a C++ program to efficiently verify (at each call site) that a vtable pointer is in the set of valid vtable pointers for the class or its derived classes. One way of doing this is for a toolchain component to build, for each class, a bit set that maps to the memory region allocated for the vtables, such that each 1 bit in the bit set maps to a valid vtable for that class, and lay out the vtables next to each other, to minimize the total size of the bit sets. The patch introduces a metadata format for representing pointer sets, an '@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals and builds the bitsets, and documents the new feature. Differential Revision: http://reviews.llvm.org/D7288 llvm-svn: 230054
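The membership check described above (is this address one of the valid entries inside a contiguous region?) can be sketched roughly as follows. This is a hypothetical standalone model, not the code generated for '@llvm.bitset.test'; the `PointerSet` struct, its fields, and `mayBeMember` are invented for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of one pointer set: a region starting at Base whose
// candidate addresses are spaced (1 << AlignLog2) bytes apart. Bits[i] is
// true when Base + i * (1 << AlignLog2) is a valid member.
struct PointerSet {
  std::uintptr_t Base;
  unsigned AlignLog2;
  std::vector<bool> Bits;
};

bool mayBeMember(const PointerSet &S, std::uintptr_t Addr) {
  if (Addr < S.Base)
    return false; // below the region
  const std::uintptr_t Offset = Addr - S.Base;
  const std::uintptr_t AlignMask = (std::uintptr_t(1) << S.AlignLog2) - 1;
  if ((Offset & AlignMask) != 0)
    return false; // not on a candidate boundary
  const std::uintptr_t Index = Offset >> S.AlignLog2;
  return Index < S.Bits.size() && S.Bits[Index];
}
```

Laying the globals out next to each other keeps `Bits` short, which is the size concern the commit message raises.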
* [BDCE] Add a bit-tracking DCE passHal Finkel2015-02-171-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the "aggressive DCE" pass), with the added capability to track dead bits of integer-valued instructions and remove those instructions when all of the bits are dead. Currently, it does not actually do this all-bits-dead removal, but rather replaces the instruction's uses with a constant zero, and lets instcombine (and the later run of ADCE) do the rest. Because we essentially get a run of ADCE "for free" while tracking the dead bits, we also do what ADCE does and remove actually-dead instructions as well (this includes instructions newly trivially dead because all bits were dead, but not all such instructions can be removed). The motivation for this is a case like: int __attribute__((const)) foo(int i); int bar(int x) { x |= (4 & foo(5)); x |= (8 & foo(3)); x |= (16 & foo(2)); x |= (32 & foo(1)); x |= (64 & foo(0)); x |= (128 & foo(4)); return x >> 4; } As it turns out, if you order the bit-field insertions so that all of the dead ones come last, then instcombine will remove them. However, if you pick some other order (such as the one above), the fact that some of the calls to foo() are useless is not locally obvious, and we don't remove them (without this pass). I did a quick compile-time overhead check using sqlite from the test suite (Release+Asserts). BDCE took ~0.4% of the compilation time (making it about twice as expensive as ADCE). I've not looked at why yet, but we eliminate instructions due to having all-dead bits in: External/SPEC/CFP2006/447.dealII/447.dealII External/SPEC/CINT2006/400.perlbench/400.perlbench External/SPEC/CINT2006/403.gcc/403.gcc MultiSource/Applications/ClamAV/clamscan MultiSource/Benchmarks/7zip/7zip-benchmark llvm-svn: 229462
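The motivating case above can be checked by hand in standalone C++: the insertions masked with 4 and 8 only touch bits 2 and 3, and `x >> 4` discards bits 0 through 3, so those two calls cannot affect the result. In the sketch below, the body of `foo` is an arbitrary stand-in (the commit only declares it), the non-portable `__attribute__((const))` is omitted, and `barWithoutDeadInsertions` is a hypothetical name for the result a bit-tracking DCE is aiming at.

```cpp
// Arbitrary side-effect-free stand-in for the commit's foo(); any pure
// function of i would do.
int foo(int i) { return 0xFF ^ (i * 37); }

// The function from the commit message.
int bar(int x) {
  x |= (4 & foo(5));   // only bit 2: shifted out below
  x |= (8 & foo(3));   // only bit 3: shifted out below
  x |= (16 & foo(2));
  x |= (32 & foo(1));
  x |= (64 & foo(0));
  x |= (128 & foo(4));
  return x >> 4;
}

// The same computation with the two insertions whose bits are all dead
// removed by hand.
int barWithoutDeadInsertions(int x) {
  x |= (16 & foo(2));
  x |= (32 & foo(1));
  x |= (64 & foo(0));
  x |= (128 & foo(4));
  return x >> 4;
}
```

The two functions agree for every input, which is why the calls to foo(5) and foo(3) are removable once their results are known to contribute only dead bits.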
* Run LICM as part of the cleanup phase from the scalar optimizer.James Molloy2015-02-161-0/+1
| | | | | | | Things like LoopUnrolling can produce loop invariant values - make sure we pick them up. llvm-svn: 229419
* Transforms: Canonicalize access to function attributes, NFCDuncan P. N. Exon Smith2015-02-141-26/+12
| | | | | | | | | | | | Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229202
* [PM] Remove the old 'PassManager.h' header file at the top level ofChandler Carruth2015-02-132-12/+14
| | | | | | | | | | | | | | | | | | | | LLVM's include tree and the use of using declarations to hide the 'legacy' namespace for the old pass manager. This undoes the primary modules-hostile change I made to keep out-of-tree targets building. I sent an email inquiring about whether this would be reasonable to do at this phase and people seemed fine with it, so making it a reality. This should allow us to start bootstrapping with modules to a certain extent along with making it easier to mix and match headers in general. The updates to any code for users of LLVM are very mechanical. Switch from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h". Qualify the types which now produce compile errors with "legacy::". The most common ones are "PassManager", "PassManagerBase", and "FunctionPassManager". llvm-svn: 229094
* DeadArgElim: aggregate Return assessment properly.Tim Northover2015-02-111-4/+7
| | | | | | | | | I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It actually depends on the index too, which means we need to be careful about how the results are combined before return. In particular if a single Use returns Live, that counts for the entire object, at the granularity we're considering. llvm-svn: 228885
* Use ADDITIONAL_HEADER_DIRS in all LLVM CMake projects.Zachary Turner2015-02-111-0/+4
| | | | | | | | | | This allows IDEs to recognize the entire set of header files for each of the core LLVM projects. Differential Revision: http://reviews.llvm.org/D7526 Reviewed By: Chris Bieneman llvm-svn: 228798
* Don't promote asynch EH invokes of nounwind functions to callsReid Kleckner2015-02-111-1/+3
| | | | | | | | | | | If the landingpad of the invoke is using a personality function that catches asynch exceptions, then it can catch a trap. Also add some landingpads to invalid LLVM IR test cases that lack them. Over-the-shoulder reviewed by David Majnemer. llvm-svn: 228782
* DeadArgElim: arguments affect all returned sub-values by default.Tim Northover2015-02-101-4/+16
| | | | | | | | | | Unless we meet an insertvalue on a path from some value to a return, that value will be live if *any* of the return's components are live, so all of those components must be added to the MaybeLiveUses. Previously we were deleting arguments if sub-value 0 turned out to be dead. llvm-svn: 228731
* DeadArgElim: fix mismatch in accounting of array return types.Tim Northover2015-02-091-39/+47
| | | | | | | | | | | | Some parts of DeadArgElim were only considering the individual fields of StructTypes separately, but others (where insertvalue & extractvalue instructions occur) also looked into ArrayTypes. This one is an actual bug; the mismatch can lead to an argument being considered used by a return sub-value that isn't being tracked (and hence is dead by default). It then gets incorrectly eliminated. llvm-svn: 228559
* DeadArgElim: assess uses of entire return value aggregate.Tim Northover2015-02-091-26/+26
| | | | | | | | | | | | | | | | | | | | | Previously, a non-extractvalue use of an aggregate return value meant the entire return was considered live (the algorithm gave up entirely). This was correct, but conservative. It's better to actually look at that Use, making the analysis results apply to all sub-values under consideration. E.g. %val = call { i32, i32 } @whatever() [...] ret { i32, i32 } %val The return is using the entire aggregate (sub-values 0 and 1). We can still simplify @whatever if we can prove that this return is itself unused. Also unifies the logic slightly between aggregate and non-aggregate cases. llvm-svn: 228558
* Add range adapters predecessors() and successors() for BBsReid Kleckner2015-02-042-7/+6
| | | | | | | Use them in two isolated transforms so we know they work and aren't dead code. llvm-svn: 228173
* [PM] Sink the population of the pass manager with target-specificChandler Carruth2015-01-301-7/+1
| | | | | | | | | | | | analyses back into the LTO code generator. The pass manager builder (and the transforms library in general) shouldn't be referencing the target machine at all. This makes the LTO population work like the others -- the data layout and target transform info need to be pre-populated. llvm-svn: 227576
* Remove unused include.Eric Christopher2015-01-271-1/+0
| | | | llvm-svn: 227170
* [PM] Lift the analyses into the interface forChandler Carruth2015-01-191-1/+1
| | | | | | | | | | SplitLandingPadPredecessors and remove the Pass argument from its interface. Another step to the utilities being usable with both old and new pass managers. llvm-svn: 226426
* [PM] Separate the TargetLibraryInfo object from the immutable pass.Chandler Carruth2015-01-154-10/+13
| | | | | | | | | | | | | | The pass is really just a means of accessing a cached instance of the TargetLibraryInfo object, and this way we can re-use that object for the new pass manager as its result. Lots of delta, but nothing interesting happening here. This is the common pattern that is developing to allow analyses to live in both the old and new pass manager -- a wrapper pass in the old pass manager emulates the separation intrinsic to the new pass manager between the result and pass for analyses. llvm-svn: 226157
* Update libdeps since TLI was moved from Target to Analysis in r226078.NAKAMURA Takumi2015-01-151-1/+1
| | | | llvm-svn: 226126
* [PM] Move TargetLibraryInfo into the Analysis library.Chandler Carruth2015-01-154-4/+4
| | | | | | | | | | | | | | | | While the term "Target" is in the name, it doesn't really have to do with the LLVM Target library -- this isn't an abstraction which LLVM targets generally need to implement or extend. It has much more to do with modeling the various runtime libraries on different OSes and with different runtime environments. The "target" in this sense is the more general sense of a target of cross compilation. This is in preparation for porting this analysis to the new pass manager. No functionality changed, and updates inbound for Clang and Polly. llvm-svn: 226078
* Standardize {pred,succ,use,user}_empty()Ramkumar Ramachandra2015-01-131-2/+2
| | | | | | | | | The functions {pred,succ,use,user}_{begin,end} exist, but many users have to check *_begin() with *_end() by hand to determine if the BasicBlock or User is empty. Fix this with a standard *_empty(), demonstrating a few usecases. llvm-svn: 225760
* [PM] Split the AssumptionTracker immutable pass into two separate APIs:Chandler Carruth2015-01-043-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | a cache of assumptions for a single function, and an immutable pass that manages those caches. The motivation for this change is twofold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the *many* utility functions that deal in the assumptions to live in both pass manager worlds by creating a separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics. Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator. For now, this adds boilerplate to the various passes, but this kind of boilerplate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably. llvm-svn: 225131
* Sink store based on alias analysisElena Demikhovsky2014-12-151-1/+2
| | | | | | | | | | - by Ella Bolshinsky Alias analysis is used to determine whether a given instruction is a barrier for store sinking. For 2 identical stores, the following instructions are checked in both basic blocks to determine whether they are sinking barriers. http://reviews.llvm.org/D6420 llvm-svn: 224247
* IR: Split Metadata from ValueDuncan P. N. Exon Smith2014-12-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do *not* have a `Type`. - `MDNode`'s operands are all `Metadata *` (instead of `Value *`). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode*`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) 
If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. 
If your old code was: MDNode *N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode *N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode *N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the *only* class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802
* Prologue supportPeter Collingbourne2014-12-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch by Ben Gamari! This redefines the `prefix` attribute introduced previously and introduces a `prologue` attribute. There are three primary usecases that these attributes aim to serve, 1. Function prologue sigils 2. Function hot-patching: Enable the user to insert `nop` operations at the beginning of the function which can later be safely replaced with a call to some instrumentation facility 3. Runtime metadata: Allow a compiler to insert data for use by the runtime during execution. GHC is one example of a compiler that needs this functionality for its tables-next-to-code functionality. Previously `prefix` served cases (1) and (2) quite well by allowing the user to introduce arbitrary data at the entrypoint but before the function body. Case (3), however, was poorly handled by this approach as it required that prefix data was valid executable code. Here we redefine the notion of prefix data to instead be data which occurs immediately before the function entrypoint (i.e. the symbol address). Since prefix data now occurs before the function entrypoint, there is no need for the data to be valid code. The previous notion of prefix data now goes under the name "prologue data" to emphasize its duality with the function epilogue. The intention here is to handle cases (1) and (2) with prologue data and case (3) with prefix data. References ---------- This idea arose out of discussions[1] with Reid Kleckner in response to a proposal to introduce the notion of symbol offsets to enable handling of case (3). [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html Test Plan: testsuite Differential Revision: http://reviews.llvm.org/D6454 llvm-svn: 223189
* Disable header duplication at -Oz in loop-rotate pass.Roman Divacky2014-11-211-1/+2
| | | | llvm-svn: 222562
* Update SetVector to rely on the underlying set's insert to return a ↵David Blaikie2014-11-196-16/+19
| | | | | | | | | | | pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This led to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334
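The convention this commit standardizes on is the one used by the standard library's associative containers, where `insert` returns a `pair<iterator, bool>` whose `second` member says whether the element was newly added. A minimal sketch of an insertion-ordered set built on that convention is shown below; `SimpleSetVector` is a made-up class for illustration and is not LLVM's `SetVector`.

```cpp
#include <set>
#include <vector>

// Hypothetical, much-simplified analogue of a set-plus-vector container:
// the set answers "have we seen this?" and the vector remembers insertion
// order.
template <typename T> class SimpleSetVector {
  std::set<T> Set;
  std::vector<T> Vec;

public:
  // Relies on std::set::insert returning pair<iterator, bool>; the bool is
  // true only when X was not already present.
  bool insert(const T &X) {
    const bool Inserted = Set.insert(X).second;
    if (Inserted)
      Vec.push_back(X);
    return Inserted;
  }

  const std::vector<T> &items() const { return Vec; }
};
```

Having every set type return the same pair shape means a wrapper like this can be written once against any underlying set without special-casing a `bool`-returning `insert`.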
* Reformat partially, where I touched for whitespace changes.NAKAMURA Takumi2014-10-281-1/+1
| | | | llvm-svn: 220773
* Untabify and whitespace cleanups.NAKAMURA Takumi2014-10-281-5/+5
| | | | llvm-svn: 220771
* Add an option to the LTO code generator to disable vectorization during LTOArnold Schwaighofer2014-10-261-1/+1
| | | | | | | | | | | | | | | | | | We used to always vectorize (slp and loop vectorize) in the LTO pass pipeline. r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and 'SLPVectorize' out of the desire to be able to disable vectorization using the cl::opt flags 'vectorize-loops'/'slp-vectorize' which the aforementioned fields default to. Unfortunately, this turns off vectorization because those fields default to false. This commit adds flags to the LTO library to disable lto vectorization which reconciles the desire to optionally disable vectorization during LTO and the desired behavior of defaulting to enabled vectorization. We really want tools to set PassManager flags directly to enable/disable vectorization and not go the route via cl::opt flags *in* PassManagerBuilder.cpp. llvm-svn: 220652
* If requested, apply function merging at -O0 too. It's useful there to reduce ↵Nick Lewycky2014-10-231-6/+10
| | | | | | the time to compile. llvm-svn: 220537
* LTO: respect command-line options that disable vectorization.JF Bastien2014-10-211-2/+4
| | | | | | | | | | | | Summary: Patches 202051 and 208013 added calls to LTO's PassManager which unconditionally add LoopVectorizePass and SLPVectorizerPass instead of following the logic in PassManagerBuilder::populateModulePassManager and honoring the -vectorize-loops -run-slp-after-loop-vectorization flags. Reviewers: nadav, aschwaighofer, yijiang Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5884 llvm-svn: 220345
* Add some optional passes around the vectorizer to both better prepareChandler Carruth2014-10-141-1/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the IR going into it and to clean up the IR produced by the vectorizers. Note that these are *off by default* right now while folks collect data on whether the performance tradeoff is reasonable. In a build of the 'opt' binary, I see about 2% compile time regression due to this change on average. This is in my mind essentially the worst expected case: very little of the opt binary is going to *benefit* from these extra passes. I've seen several benchmarks improve in performance by small amounts due to running these passes, and there are certain (rare) cases where these passes make a huge difference by either enabling the vectorizer at all or by hoisting runtime checks out of the outer loop. My primary motivation is to prevent people from seeing runtime check overhead in benchmarks where the existing passes and optimizers would be able to eliminate that. I've chosen the sequence of passes based on the kinds of things that seem likely to be relevant for the code at each stage: rotating loops for the vectorizer, finding correlated values, loop invariants, and unswitching opportunities from any runtime checks, and cleaning up commonalities exposed by the SLP vectorizer. I'll be pinging existing threads where some of these issues have come up and will start new threads to get folks to benchmark and collect data on whether this is the right tradeoff or we should do something else. llvm-svn: 219644