summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* DwarfDebug: Minor refactoring around type unit constructionDavid Blaikie2014-04-262-16/+21
| | | | | | | | | | | | | Sinking addition of the declaration attribute down to where the signature is added. So that if the signature is not added neither is the declaration attribute (this will come in handy when aborting type unit construction to instead emit the type into the CU directly in some cases) Pull out type unit identifier hashing just to simplify the function a little, it'll be getting longer. llvm-svn: 207321
* X86TTI: i16/i32 vector div with a constant (splat) divisor are reasonably ↵Benjamin Kramer2014-04-261-0/+19
| | | | | | | | cheap now. Turn vectorization back on. llvm-svn: 207320
* X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available.Benjamin Kramer2014-04-264-14/+56
| | | | llvm-svn: 207318
* X86: Add patterns for MULHU/MULHS of v8i16 and v16i16.Benjamin Kramer2014-04-262-4/+18
| | | | | | | This gets us pretty code for divs of i16 vectors. Turn the existing intrinsics into the corresponding nodes. llvm-svn: 207317
* Rip out X86-specific vector SDIV lowering, make the corresponding ↵Benjamin Kramer2014-04-262-77/+24
| | | | | | DAGCombiner transform work on vectors. llvm-svn: 207316
* DAGCombiner: Turn divs of vector splats into vectorized multiplications.Benjamin Kramer2014-04-264-24/+64
| | | | | | | | | | | | Otherwise the legalizer would just scalarize everything. Support for mulhi in the targets isn't that great yet so on most targets we get exactly the same scalarized output. Add a test for x86 vector udiv. I had to disable the mulhi nodes on ARM because there aren't any patterns for it. As far as I know ARM has instructions for getting the high part of a multiply so this should be fixed. llvm-svn: 207315
* X86: Custom lower v4i32 UMUL_LOHI into 2 pmuludqs.Benjamin Kramer2014-04-261-0/+37
| | | | | | Test will follow soon. llvm-svn: 207314
* Revert r206749 till a final decision about the intrinsics is made.Michael Zolotukhin2014-04-263-239/+0
| | | | llvm-svn: 207313
* [LCG] Rather than removing nodes from the SCC entry set when we processChandler Carruth2014-04-261-6/+7
| | | | | | | | | | them, just skip over any DFS-numbered nodes when finding the next root of a DFS. This allows the entry set to just be a vector as we populate it from a uniqued source. It also removes the possibility for a linear scan of the entry set to actually do the removal which can make things go quadratic if we get unlucky. llvm-svn: 207312
* [LCG] Rotate the full SCC finding algorithm to avoid round-trips throughChandler Carruth2014-04-261-21/+23
| | | | | | | | | | | | | | | the DFS stack for leaves in the call graph. As mentioned in my previous commit, this is particularly interesting for graphs which have high fan out but low connectivity resulting in many leaves. For such graphs, this can remove a large % of the DFS stack traffic even though it doesn't make the stack much smaller. It's a bit easier to formulate this for the full algorithm because that one stops completely for each SCC. For example, I was able to directly eliminate the "Recurse" boolean used to continue an outer loop from the inner loop. llvm-svn: 207311
* [LCG] Hoist the main DFS loop out of the edge removal function. ThisChandler Carruth2014-04-261-74/+70
| | | | | | | | | makes working through the worklist much cleaner, and makes it possible to avoid the 'bool-to-continue-the-outer-loop' hack. Not a huge difference, but I think this is approaching as polished as I can make it. llvm-svn: 207310
* RecursivelyDeleteTriviallyDeadInstructions() could removeGerolf Hoflehner2014-04-262-2/+18
| | | | | | | | | | | more than 1 instruction. The caller need to be aware of this and adjust instruction iterators accordingly. rdar://16679376 Repaired r207302. llvm-svn: 207309
* Restore CloneFunction.cpp which got accidentlyGerolf Hoflehner2014-04-261-92/+33
| | | | | | overwritten by previous backout of r207303 llvm-svn: 207308
* [LCG] In the incremental SCC re-formation, lift the node currently beingChandler Carruth2014-04-261-30/+38
| | | | | | | | | | | | | | | | | | | processed in the DFS out of the stack completely. Keep it exclusively in a variable. Re-shuffle some code structure to make this easier. This can have a very dramatic effect in some cases because call graphs tend to look like a high fan-out spanning tree. As a consequence, there are a large number of leaf nodes in the graph, and this technique causes leaf nodes to never even go into the stack. While this only reduces the max depth by 1, it may cause the total number of round trips through the stack to drop by a lot. Now, most of this isn't really relevant for the incremental version. =] But I wanted to prototype it first here as this variant is in ways more complex. As long as I can get the code factored well here, I'll next make the primary walk look the same. There are several refactorings this exposes I think. llvm-svn: 207306
* [LCG] Special case the removal of self edges. These don't impact the SCCChandler Carruth2014-04-261-0/+6
| | | | | | | | graph in any way because we don't track edges in the SCC graph, just nodes. This also lets us add a nice assert about the invariant that we're working on at least a certain number of nodes within the SCC. llvm-svn: 207305
* [DAG] During DAG legalization keep opaque constants even after expanding.Juergen Ributzka2014-04-261-3/+8
| | | | | | | | | | | | | | | | | | | | | The included test case would return the incorrect results, because the expansion of an shift with a constant shift amount of 0 would generate undefined behavior. This is because ExpandShiftByConstant assumes that all shifts by constants with a value of 0 have already been optimized away. This doesn't happen for opaque constants and usually this isn't a problem, because opaque constants won't take this code path - they are not supposed to. In the case that the opaque constant has to be expanded by the legalizer, the legalizer would drop the opaque flag. In this case we hit the limitations of ExpandShiftByConstant and create incorrect code. This commit fixes the legalizer by not dropping the opaque flag when expanding opaque constants and adding an assertion to ExpandShiftByConstant to catch this not supported case in the future. This fixes <rdar://problem/16718472> llvm-svn: 207304
* Revert commit r207302 since build failuresGerolf Hoflehner2014-04-263-51/+94
| | | | | | have been reported. llvm-svn: 207303
* RecursivelyDeleteTriviallyDeadInstructions() could removeGerolf Hoflehner2014-04-262-2/+18
| | | | | | | | | more than 1 instruction. The caller need to be aware of this and adjust instruction iterators accordingly. rdar://16679376 llvm-svn: 207302
* [X86] Implement TargetLowering::getScalingFactorCost hook.Quentin Colombet2014-04-262-0/+19
| | | | | | | | | Scaling factors are not free on X86 because every "complex" addressing mode breaks the related instruction into 2 allocations instead of 1. <rdar://problem/16730541> llvm-svn: 207301
* [LCG] Refactor the duplicated code I added in my last commit here intoChandler Carruth2014-04-261-23/+14
| | | | | | | a helper function. Also factor the other two places where we did the same thing into the helper function. =] Much cleaner this way. NFC. llvm-svn: 207300
* [InstCombine][X86] Teach how to fold calls to SSE2/AVX2 packed logical shiftAndrea Di Biagio2014-04-261-9/+41
| | | | | | | | | | right intrinsics. A packed logical shift right with a shift count bigger than or equal to the element size always produces a zero vector. In all other cases, it can be safely replaced by a 'lshr' instruction. llvm-svn: 207299
* Add missing include guards and missing #include, found by modules build.Richard Smith2014-04-261-0/+6
| | | | llvm-svn: 207298
* Optimization for certain shufflevector by using insertps.Filipe Cabecinhas2014-04-251-0/+104
| | | | | | | | | | | | | | | | | | | | Summary: If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower certain shufflevectors to an insertps instruction: When most of the shufflevector result's elements come from one vector (and keep their index), and one element comes from another vector or a memory operand. Added tests for insertps optimizations on shufflevector. Added support and tests for v4i32 vector optimization. Reviewers: nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3475 llvm-svn: 207291
* Revert "blockfreq: Approximate irreducible control flow"Duncan P. N. Exon Smith2014-04-251-210/+20
| | | | | | | | | | | | This reverts commit r207286. It causes an ICE on the cmake-llvm-x86_64-linux buildbot [1]: llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function: llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035 [1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio llvm-svn: 207287
* blockfreq: Approximate irreducible control flowDuncan P. N. Exon Smith2014-04-251-20/+210
| | | | | | | | | | | | | | | | | | | | | | Previously, irreducible backedges were ignored. With this commit, irreducible SCCs are discovered on the fly, and modelled as loops with multiple headers. This approximation specifies the headers of irreducible sub-SCCs as its entry blocks and all nodes that are targets of a backedge within it (excluding backedges within true sub-loops). Block frequency calculations act as if we insert a new block that intercepts all the edges to the headers. All backedges and entries to the irreducible SCC point to this imaginary block. This imaginary block has an edge (with even probability) to each header block. The result is now reasonable enough that I've added a number of testcases for irreducible control flow. I've outlined in `BlockFrequencyInfoImpl.h` ways to improve the approximation. <rdar://problem/14292693> llvm-svn: 207286
* Unbreak the gdb buildbot by not lowering dbg.declare intrinsics for arrays.Adrian Prantl2014-04-251-1/+7
| | | | llvm-svn: 207284
* Make sure that rangelists are also relative to the compile unitEric Christopher2014-04-251-2/+9
| | | | | | | | low_pc similar to location lists. Fixes PR19563 llvm-svn: 207283
* R600: Fix function name printing in LowerCallMatt Arsenault2014-04-251-1/+3
| | | | | | | | v2: Check both ExternalSymbol and GlobalAddress Patch by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 207282
* DwarfAccelTable: Store the string symbol in the accelerator table to avoid ↵David Blaikie2014-04-253-35/+39
| | | | | | | | | | duplicate lookup. This also avoids the need for subtly side-effecting calls to manifest strings in the string table at the point where items are added to the accelerator tables. llvm-svn: 207281
* Add an -mattr option to the gold plugin to support subtarget features in LTOTom Roeder2014-04-251-2/+3
| | | | | | | | | | This adds support for an -mattr option to the gold plugin and to llvm-lto. This allows the caller to specify details of the subtarget architecture, like +aes, or +ssse3 on x86. Note that this requires a change to the include/llvm-c/lto.h interface: it adds a function lto_codegen_set_attr and it increments the version of the interface. llvm-svn: 207279
* Fix missing includeAlexey Samsonov2014-04-251-0/+1
| | | | llvm-svn: 207278
* Encapsulate the DWARF string pool in a separate type.David Blaikie2014-04-258-92/+146
| | | | | | | | | Pulls out some more code from some of the rather monolithic DWARF classes. Unlike the address table, the string table won't move up into DwarfDebug - each DWARF file has its own string table (but there can be only one address table). llvm-svn: 207277
* [DWARF parser] Cleanup code in DWARFDebugAranges.Alexey Samsonov2014-04-253-11/+13
| | | | | | No functionality change. llvm-svn: 207276
* [DWARF parser] Cleanup code in DWARFDebugAbbrev.Alexey Samsonov2014-04-253-73/+73
| | | | | | No functionality change. llvm-svn: 207274
* [LoopStrengthReduce] Don't trim formula that uses a subset of required registersAdam Nemet2014-04-251-9/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider this use from the new testcase: LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32 reg({1000,+,-1}<nw><%for.body>) -3003 + reg({3,+,3}<nw><%for.body>) -1001 + reg({1,+,1}<nuw><nsw><%for.body>) -1000 + reg({0,+,1}<nw><%for.body>) -3000 + reg({0,+,3}<nuw><%for.body>) reg({-1000,+,1}<nw><%for.body>) reg({-3000,+,3}<nsw><%for.body>) This is the last use we consider for a solution in SolveRecurse, so CurRegs is a large set. (CurRegs is the set of registers that are needed by the previously visited uses in the in-progress solution.) ReqRegs is { {3,+,3}<nw><%for.body>, {1,+,1}<nuw><nsw><%for.body> } This is the intersection of the regs used by any of the formulas for the current use and CurRegs. Now, the code requires a formula to contain *all* these regs (the comment is simply wrong), otherwise the formula is immediately disqualified. Obviously, no formula for this use contains two regs so they will all get disqualified. The fix modifies the check to allow the formula in this case. The idea is that neither of these formulae is introducing any new registers which is the point of this early pruning as far as I understand. In terms of set arithmetic, we now allow formulas whose used regs are a subset of the required regs not just the other way around. There are few more loops in the test-suite that are now successfully LSRed. I have benchmarked those and found very minimal change. Fixes <rdar://problem/13965777> llvm-svn: 207271
* This reapplies r207235 with an additional bugfixes caught by the msanAdrian Prantl2014-04-258-45/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | buildbot - do not insert debug intrinsics before phi nodes. Debug info for optimized code: Support variables that are on the stack and described by DBG_VALUEs during their lifetime. Previously, when a variable was at a FrameIndex for any part of its lifetime, this would shadow all other DBG_VALUEs and only a single fbreg location would be emitted, which in fact is only valid for a small range and not the entire lexical scope of the variable. The included dbg-value-const-byref testcase demonstrates this. This patch fixes this by Local - emitting dbg.value intrinsics for allocas that are passed by reference - dropping all dbg.declares (they are now fully lowered to dbg.values) SelectionDAG - renamed constructors for SDDbgValue for better readability. - fix UserValue::match() to handle indirect values correctly - not inserting an MMI table entries for dbg.values that describe allocas. - lowering dbg.values that describe allocas into *indirect* DBG_VALUEs. CodeGenPrepare - leaving dbg.values for an alloca were they are (see comment) Other - regenerated/updated instcombine.ll testcase and included source rdar://problem/16679879 http://reviews.llvm.org/D3374 llvm-svn: 207269
* DwarfUnit: Remove unused functionDavid Blaikie2014-04-251-4/+0
| | | | llvm-svn: 207264
* DIE: Pass ownership of children via std::unique_ptr rather than raw pointer.David Blaikie2014-04-255-36/+40
| | | | | | | | | | | This should reduce the chance of memory leaks like those fixed in r207240. There's still some unclear ownership of DIEs happening in DwarfDebug. Pushing unique_ptr and references through more APIs should help expose the cases where ownership is a bit fuzzy. llvm-svn: 207263
* DIEEntry: Refer to the specified DIE via reference rather than pointer.David Blaikie2014-04-256-30/+29
| | | | | | | Makes some more cases (the unit tests, specifically), lexically compatible with a change to unique_ptr. llvm-svn: 207261
* DwarfUnit: return by reference from createAndAddDIEDavid Blaikie2014-04-254-53/+53
| | | | | | | Since this doesn't return ownership (the DIE has been added to the specified parent already) nor return null, just return by reference. llvm-svn: 207259
* blockfreq: Further shift logic to LoopDataDuncan P. N. Exon Smith2014-04-251-27/+12
| | | | | | | | | Move a lot of the loop-related logic that was sprinkled around the code into `LoopData`. <rdar://problem/14292693> llvm-svn: 207258
* enable fast isel tablegen files for MipsReed Kotler2014-04-252-1/+2
| | | | | | | | | | Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3498 llvm-svn: 207256
* Return DIE by reference instead of pointer from DwarfUnit::getUnitDieDavid Blaikie2014-04-254-20/+20
| | | | llvm-svn: 207255
* DwarfUnit: Suddently, DIE references, everywhere.David Blaikie2014-04-254-309/+300
| | | | | | | | This'll make changing to unique_ptr ownership of DIEs easier since the usages will now have '*' on them making them textually compatible between unique_ptr and raw pointer. llvm-svn: 207253
* SCC: Change clients to use const, NFCDuncan P. N. Exon Smith2014-04-255-12/+11
| | | | | | | | | | It's fishy to be changing the `std::vector<>` owned by the iterator, and no one actual does it, so I'm going to remove the ability in a subsequent commit. First, update the users. <rdar://problem/14292693> llvm-svn: 207252
* Revert "This reapplies r207130 with an additional testcase+and a missing ↵Adrian Prantl2014-04-258-73/+45
| | | | | | | | check for" This reverts commit 207235 to investigate msan buildbot breakage. llvm-svn: 207250
* Make sure that DSUB does not duplicate the pattern of DSUBUReed Kotler2014-04-251-1/+1
| | | | | | | | | | | | | | Test Plan: Run test suite to make sure there is no regression. https://dmz-portal.mips.com/bb/builders/LLVM%20with%2064bit%20and%20delay%20slot%20optimizer%20and%20direct%20object%20emitter/builds/626 Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3497 llvm-svn: 207247
* ARM: remove @llvm.arm.sevlSaleem Abdulrasool2014-04-252-3/+0
| | | | | | | | This intrinsic is no longer needed with the new @llvm.arm.hint(i32) intrinsic which provides a generic, extensible manner for adding hint instructions. This functionality can now be represented as @llvm.arm.hint(i32 5). llvm-svn: 207246
* [inline cold threshold] Command line argument for inline threshold willManman Ren2014-04-251-1/+6
| | | | | | | | | | | override the default cold threshold. When we use command line argument to set the inline threshold, the default cold threshold will not be used. This is in line with how we use OptSizeThreshold. When we want a higher threshold for all functions, we do not have to set both inline threshold and cold threshold. llvm-svn: 207245
* Refactor some common logic in DwarfUnit::constructVariableDIE and pass ↵David Blaikie2014-04-253-17/+23
| | | | | | non-null DIE by reference to DbgVariable::setDIE llvm-svn: 207244
OpenPOWER on IntegriCloud