path: root/llvm/lib/CodeGen
Commit log (newest first); each entry shows the commit message, author, date, files changed, and line delta.
* WinCOFF: Emit common symbols as specified in the COFF spec (David Majnemer, 2014-04-08, 1 file, -3/+6)
  Summary: Local common symbols were properly inserted into the .bss section.
  However, putting external common symbols in the .bss section would give them
  a strong definition. Instead, encode them as undefined, external symbols
  whose symbol value is equivalent to their size.
  Reviewers: Bigcheese, rafael, rnk
  CC: llvm-commits
  Differential Revision: http://reviews.llvm.org/D3324
  llvm-svn: 205811
* Bug 19348: Check for legal ExtLoad operation before folding (Matt Arsenault, 2014-04-08, 1 file, -9/+12)
  (aext (zextload x)) -> (aext (truncate (*extload x)))
  Patch by Stanislav Mekhanoshin!
  llvm-svn: 205805
* RegAlloc: Account for a variable entry block frequency (Duncan P. N. Exon Smith, 2014-04-08, 2 files, -13/+58)
  Until r197284, the entry frequency was constant -- i.e., set to 2^14.
  Although current ToT still has a constant entry frequency, since r197284
  that has been an implementation detail (which is soon going to change).
  - r204690 made the wrong assumption for the CSRCost metric. Adjust
    callee-saved register cost based on entry frequency.
  - r185393 made the wrong assumption (although it was valid at the time).
    Update SpillPlacement.cpp::Threshold to be relative to the entry frequency.
  Since ToT still has 2^14 entry frequency, this should have no observable
  functionality change.
  <rdar://problem/14292693>
  llvm-svn: 205789
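  A minimal standalone sketch of the scaling idea only (the names and the tuned
  value are assumptions, not the SpillPlacement code): a threshold tuned
  against the old fixed entry frequency can be re-expressed relative to
  whatever the current entry frequency happens to be.

    #include <cstdint>

    // Illustrative only: rescale a threshold that was tuned when the entry
    // frequency was the fixed constant 2^14.
    uint64_t scaledThreshold(uint64_t TunedThreshold, uint64_t EntryFreq) {
      const uint64_t OldFixedEntryFreq = UINT64_C(1) << 14;
      return TunedThreshold * EntryFreq / OldFixedEntryFreq;
    }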
* Put a limit on ScheduleDAGSDNodes::ClusterNeighboringLoads to avoid blowing up compile time (Andrew Trick, 2014-04-07, 1 file, -1/+6)
  Fixes PR16365 - Extremely slow compilation in -O1 and -O2.
  The SD scheduler has a quadratic implementation of load clustering which
  absolutely blows up compile time for large blocks with constant pool loads.
  The MI scheduler has a better implementation of load clustering. However, we
  have not done the work yet to completely eliminate the SD scheduler. Some
  benchmarks still seem to benefit from early load clustering, although maybe
  by chance.
  As an intermediate term fix, I just put a nice limit on the number of DAG
  users to search before finding a match. With this limit there are no binary
  differences in the LLVM test suite, and the PR16365 test case does not
  suffer any compile time impact from this routine.
  llvm-svn: 205738
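  A standalone sketch of the shape of such a cap (illustrative only; the limit
  value and names are made up and this is not the ScheduleDAGSDNodes code):
  bounding how many candidates are examined turns a quadratic worst case into
  a linear one, at the cost of missing some clustering opportunities.

    #include <cstddef>
    #include <vector>

    const std::size_t MaxCandidates = 16; // assumed cap, not LLVM's value

    // Scan at most MaxCandidates users for a match, then give up early.
    template <typename T, typename Pred>
    const T *findNearbyMatch(const std::vector<T> &Users, Pred Matches) {
      std::size_t Examined = 0;
      for (const T &U : Users) {
        if (++Examined > MaxCandidates)
          return nullptr; // correctness is unaffected; we only lose a
                          // potential clustering opportunity
        if (Matches(U))
          return &U;
      }
      return nullptr;
    }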
* Minor change to StackMapLiveness DEBUG output. (Andrew Trick, 2014-04-04, 1 file, -1/+1)
  llvm-svn: 205656
* Add DAG parameter to ComputeNumSignBitsForTargetNode (Matt Arsenault, 2014-04-04, 2 files, -1/+2)
  This way, you can check the number of sign bits in the operands. The depth
  parameter it already has is pretty useless without this.
  llvm-svn: 205649
* DAGLegalize: add last-ditch type-legalization for VSELECT. (Tim Northover, 2014-04-04, 2 files, -0/+16)
  When LLVM sees something like (v1iN (vselect v1i1, v1iN, v1iN)) it can decide
  that the result is OK (v1i64 is legal on AArch64, for example) but it still
  needs scalarising because of that v1i1. There was no code to do this though.
  AArch64 and ARM64 have DAG combines to produce efficient code and prevent
  that occurring in *most* such situations, but there are edge cases that they
  miss. This adds a legalization to cope with that.
  llvm-svn: 205626
* ARM64: handle v1i1 types arising from setcc properly. (Tim Northover, 2014-04-04, 1 file, -3/+15)
  There were several overlapping problems here, and this solution is closely
  inspired by the one adopted in AArch64 in r201381.
  Firstly, scalarisation of v1i1 setcc operations simply fails if the input
  types are legal. This is fixed in LegalizeVectorTypes.cpp this time, and
  allows AArch64 code to be simplified slightly.
  Second, vselect with such a setcc feeding into it ends up in
  ScalarizeVectorOperand, where it's not handled. I experimented with an
  implementation, but found that whatever DAG came out was rather horrific. I
  think Hao's DAG combine approach is a good one for quality, though there are
  edge cases it won't catch (to be fixed separately).
  Should fix PR19335.
  llvm-svn: 205625
* Make consistent use of MCPhysReg instead of uint16_t throughout the tree. (Craig Topper, 2014-04-04, 8 files, -8/+8)
  llvm-svn: 205610
* [RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance recoloring cut-offs are encountered and register allocation failed (Quentin Colombet, 2014-04-04, 1 file, -1/+35)
  This is related to PR18747.
  Patch by MAYUR PANDEY <mayur.p@samsung.com>.
  llvm-svn: 205601
* Revert r205599, the commit was not intended to have so many changes (Quentin Colombet, 2014-04-04, 1 file, -35/+1)
  llvm-svn: 205600
* [RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance recoloring cut-offs are hit (Quentin Colombet, 2014-04-04, 1 file, -1/+35)
  This is related to PR18747.
  Patch by MAYUR PANDEY <mayur.p@samsung.com>
  llvm-svn: 205599
* Fix for PR 19261 (Eric Christopher, 2014-04-03, 1 file, -2/+3)
  llc doesn't generate nodes for unconditional fall-through branches for
  targets without a FastISel implementation (X86 has one, but it can be
  disabled by "-fast-isel=false") in SelectionDAGBuilder::visitBr().
  So for line 4 in the following testcase
    1: void foo(int i){
    2:   switch(i){
    3:   default:
    4:     break;
    5:   }
    6:   return;
    7: }
  there is no corresponding line in the .debug_line section, and a debugger
  cannot set a breakpoint at line 4. Fix this by always emitting a branch when
  we're not optimizing, and add a testcase to ensure that there's code on
  every line we'd want to break.
  Patch by Daniil Fukalov.
  llvm-svn: 205529
* DebugInfo: Use a 64 bit type for the subrange (David Blaikie, 2014-04-03, 1 file, -4/+4)
  While we were encoding 64 bit values (data8) in the subrange itself, using a
  32 bit type for the subrange was still confusing gdb. Oh, and make it
  unsigned too.
  As the comment points out, this could be pushed into the frontend so that it
  would be 32 or 64 bit as appropriate, etc.
  llvm-svn: 205512
* [CodeGen] Fix peephole optimizer bug introduced in r205481. Fixes PR19318. (Lang Hames, 2014-04-03, 1 file, -9/+11)
  I should have read that comment a little more carefully. ;)
  Regression test in the works; committing in the meantime to un-break people.
  llvm-svn: 205511
* Account for scalarization costs in BasicTTI::getMemoryOpCost for extending vector loads (Hal Finkel, 2014-04-03, 1 file, -2/+24)
  When a vector type legalizes to a larger vector type, and the target does
  not support the associated extending load (or truncating store), then
  legalization will scalarize the load (or store), resulting in an associated
  scalarization cost. BasicTTI::getMemoryOpCost needs to account for this.
  Between this and r205487, PowerPC on the P7 with VSX enabled shows:
    MultiSource/Benchmarks/PAQ8p/paq8p:           43% speedup
    SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup
    SingleSource/UnitTests/Vectorizer/gcc-loops:  28% speedup
  (some of these are new; some of these, such as PAQ8p, just reverse
  regressions that VSX support would trigger)
  llvm-svn: 205495
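  A rough sketch of the accounting this implies (assumed names and unit costs,
  not the BasicTTI implementation): when the extending load is illegal, the
  model should charge one scalar load plus one vector insert per element
  instead of a single wide load.

    // Illustrative only; names and unit costs are assumptions.
    unsigned extendingVectorLoadCost(bool ExtLoadLegal, unsigned NumElts,
                                     unsigned ScalarLoadCost,
                                     unsigned InsertEltCost) {
      if (ExtLoadLegal)
        return 1; // one wide load covers the whole vector
      // Legalization scalarizes the load: one narrow load plus one vector
      // insert per element.
      return NumElts * (ScalarLoadCost + InsertEltCost);
    }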
* Fix multi-register costs in BasicTTI::getCastInstrCost (Hal Finkel, 2014-04-02, 1 file, -1/+2)
  For a cast (extension, etc.), the current logic predicts a low cost if the
  associated operation (keyed on the destination type) is legal (or promoted).
  This is not true when the number of values required to legalize the type is
  changing. For example, <8 x i16> being sign extended to <8 x i32> is not
  generically cheap on PPC with VSX, even though sign extension to v4i32 is
  legal, because two output v4i32 values are required compared to the single
  v8i16 input value, and without custom logic in the target, this conversion
  will scalarize.
  llvm-svn: 205487
* [CodeGen] Teach the peephole optimizer to remember (and exploit) all folding opportunities in the current basic block, rather than just the last one seen (Lang Hames, 2014-04-02, 1 file, -35/+44)
  <rdar://problem/16478629>
  llvm-svn: 205481
* Add comments and test case for [DAG] Keep the opaque constant flag when performing unary constant folding operations (r204737). (Juergen Ributzka, 2014-04-02, 1 file, -1/+5)
  llvm-svn: 205474
* Simplify resolveFrameIndex() signature. (Jim Grosbach, 2014-04-02, 1 file, -1/+1)
  Just pass a MachineInstr reference rather than an MBB iterator. Creating a
  MachineInstr& is the first thing every implementation did anyway.
  llvm-svn: 205453
* ARM: Add support for segmented stacks (Oliver Stannard, 2014-04-02, 1 file, -0/+3)
  Patch by Alex Crichton, ILyoan, Luqman Aden and Svetoslav.
  llvm-svn: 205430
* clarify comment (Adrian Prantl, 2014-04-02, 1 file, -1/+2)
  llvm-svn: 205429
* Adjust comments regarding non-relocated abbrev offset in debug_info.dwo (David Blaikie, 2014-04-02, 2 files, -2/+4)
  I'm not sure the comment in the implementation really adds a lot of value
  (it's clear that we emit zero when no symbol is provided, but it doesn't
  explain why we would do that). Happy to iterate.
  llvm-svn: 205386
* Split debug_loc and debug_loc.dwo emission into two separate functions (David Blaikie, 2014-04-02, 2 files, -21/+32)
  Based on code review feedback from Eric Christopher on r204697.
  llvm-svn: 205385
* DebugInfo: Introduce DebugLocList to encapsulate a list of DebugLocEntries and an MC Label to refer to them (David Blaikie, 2014-04-02, 5 files, -12/+39)
  This removes the magic-number-esque code creating/retrieving the same label
  for a debug_loc entry from two places, and removes the last small piece of
  reusable logic from emitDebugLoc so that there will be less duplication when
  refactoring it into two functions (one for debug_loc, the other for
  debug_loc.dwo).
  llvm-svn: 205382
* Add a doxygen comment to DebugLocEntry::Merge. (Adrian Prantl, 2014-04-01, 1 file, -0/+3)
  llvm-svn: 205374
* DebugLocEntry: Actually merge the loc entry when returning true. (David Blaikie, 2014-04-01, 1 file, -1/+5)
  Seems we didn't have any test coverage for merging... awesome. So I added
  some - but hit an llvm-objdump bug while I was there. I'm choosing not to
  shave that yak right now.
  Code review feedback/bug catch by Adrian Prantl in r205360.
  llvm-svn: 205373
* Fix accidental fallthrough in DebugLocEntry::hasSameValueOrLocation (David Blaikie, 2014-04-01, 1 file, -5/+10)
  No test case (this would invoke UB by examining uninitialized members, etc,
  at best - and this code is apparently untested anyway - I'm about to fix
  that).
  Code review feedback from Adrian Prantl on r205360.
  llvm-svn: 205367
* Remove unused function DebugLocEntry::isEmpty (David Blaikie, 2014-04-01, 1 file, -3/+0)
  llvm-svn: 205365
* Refactor out the comparison of the location/value in a DebugLocEntry (David Blaikie, 2014-04-01, 1 file, -18/+19)
  llvm-svn: 205364
* DebugInfo: Split DebugLocEntry into its own file. (David Blaikie, 2014-04-01, 2 files, -85/+113)
  It seems big enough that it deserves its own file - but it is header only,
  so there's no need for another cpp file, etc.
  llvm-svn: 205360
* DwarfDebug: Prevent DebugLocEntry merging from coalescing two different constants into only the first one (Adrian Prantl, 2014-04-01, 1 file, -2/+9)
  rdar://14874886
  llvm-svn: 205357
* Make isSetCCEquivalent respect the TargetBooleanContents (Matt Arsenault, 2014-04-01, 1 file, -19/+22)
  llvm-svn: 205336
* Add helpers for checking if a value is a target boolean constant. (Matt Arsenault, 2014-04-01, 1 file, -0/+48)
  llvm-svn: 205335
* DebugInfo: Factor out common functionality for rendering debug_loc and debug_loc.dwo location list entries (David Blaikie, 2014-04-01, 2 files, -10/+17)
  In preparation for refactoring this function into two, one for debug_loc,
  one for debug_loc.dwo.
  llvm-svn: 205324
* Cleanup remaining use of removed variable to fix the build (David Blaikie, 2014-04-01, 1 file, -1/+1)
  llvm-svn: 205323
* Simplify debug_loc.dwo handling slightly. (David Blaikie, 2014-04-01, 3 files, -8/+3)
  llvm-svn: 205322
* DebugInfo: Avoid creating unnecessary/empty line tables and remove the special case of '0' in DwarfCompileUnit::initStmtList by just always using a label difference (David Blaikie, 2014-04-01, 1 file, -11/+2)
  This moves one case of raw text checking down into the MCStreamer interfaces
  in the form of a virtual function. Even if we ultimately end up consolidating
  on the one-or-many line tables issue one day, this is nicer in the interim.
  This just generally streamlines a bunch of use cases into a common code path.
  llvm-svn: 205287
* LTO type uniquing: store the Decl field of a DIImportedEntity as a DIRef. (Adrian Prantl, 2014-04-01, 1 file, -1/+1)
  No other functionality changes; a DIBuilder testcase is included in a paired
  CFE commit. This relaxes the assertion in isScopeRef to also accept
  subclasses of DIScope.
  llvm-svn: 205279
* [Stackmaps] Update the stackmap format to use 64-bit relocations for the function address and properly align all entries. (Juergen Ributzka, 2014-03-31, 1 file, -20/+36)
  This commit updates the stackmap format to version 1 to indicate the
  reorganization of several fields. This was done in order to align stackmap
  entries to their natural alignment and to minimize padding.
  Fixes <rdar://problem/16005902>
  llvm-svn: 205254
* Change shouldSplitVectorElementType to better match the description. (Matt Arsenault, 2014-03-31, 1 file, -1/+1)
  Pass the entire vector type, and not just the element.
  llvm-svn: 205247
* Add an optional ability to expand larger BUILD_VECTORs with shuffles (Hal Finkel, 2014-03-31, 1 file, -20/+117)
  This adds the ability to expand large (meaning with more than two unique
  defined values) BUILD_VECTOR nodes in terms of SCALAR_TO_VECTOR and (legal)
  vector shuffles. There is now no limit on the size we are capable of
  expanding this way, although we don't currently do this for vectors with
  many unique values because of the default implementation of TLI's
  shouldExpandBuildVectorWithShuffles function.
  There is currently no functional change to any existing targets because the
  new capabilities are not used unless some target overrides the TLI
  shouldExpandBuildVectorWithShuffles function. As a result, I've not included
  a test case for the new functionality in this commit, but regression tests
  will (at least) be added soon when I commit support for the PPC QPX vector
  instruction set.
  The benefit of committing this now is that it makes the
  shouldExpandBuildVectorWithShuffles callback, which had to be added for
  other reasons regardless, fully functional. I suspect that other targets
  will also benefit from tuning the heuristic.
  llvm-svn: 205243
* Add a TLI hook to control when BUILD_VECTOR might be expanded using shuffles (Hal Finkel, 2014-03-31, 1 file, -1/+10)
  There are two general methods for expanding a BUILD_VECTOR node:
    1. Use SCALAR_TO_VECTOR on the defined scalar values and then shuffle
       them together.
    2. Build the vector on the stack and then load it.
  Currently, we use a fixed heuristic: if there are only one or two unique
  defined values, then we attempt an expansion in terms of SCALAR_TO_VECTOR
  and vector shuffles (provided that the required shuffle mask is legal).
  Otherwise, always expand via the stack. Even when SCALAR_TO_VECTOR is not
  legal, this can still be a good idea depending on what tricks the target can
  play when lowering the resulting shuffle. If the target can't do anything
  special, however, and if SCALAR_TO_VECTOR is expanded via the stack, this
  heuristic leads to sub-optimal code (two stack loads instead of one).
  Because only the target knows whether the SCALAR_TO_VECTORs and shuffles for
  a build vector of a particular type are likely to be optimal, this adds a
  new TLI function, shouldExpandBuildVectorWithShuffles, which takes the
  vector type and the count of unique defined values. If this function returns
  true, then method (1) will be used, subject to the constraint that all of
  the necessary shuffles are legal (as determined by isShuffleMaskLegal). If
  this function returns false, then method (2) is always used.
  This commit does not enhance the current code to support expanding a
  build_vector with more than two unique values using shuffles, but I'll
  commit an implementation of the more-general case shortly.
  llvm-svn: 205230
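  A sketch of how a target might opt in (hypothetical class name and
  threshold; the signature is only assumed from the description above, so
  check the actual TargetLowering declaration before copying):

    // Hypothetical backend override, illustrative only.
    bool MyTargetLowering::shouldExpandBuildVectorWithShuffles(
        EVT VT, unsigned DefinedValues) const {
      // Prefer method (1) -- SCALAR_TO_VECTOR plus shuffles -- when there are
      // only a few unique defined values; otherwise fall back to the stack.
      return DefinedValues <= 4;
    }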
* Disable each MachineFunctionPass for 'optnone' functions, unless that pass normally runs at optimization level None, or is part of the register allocation pipeline (Paul Robinson, 2014-03-31, 14 files, -0/+42)
  llvm-svn: 205228
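  A sketch of the kind of early exit this implies in a pass (assumed shape;
  the helper actually added in r205228 may be structured differently):

    // Illustrative only: skip the pass body for 'optnone' functions.
    bool SomeMachinePass::runOnMachineFunction(MachineFunction &MF) {
      const Function *F = MF.getFunction();
      if (F && F->hasFnAttribute(Attribute::OptimizeNone))
        return false; // leave the function untouched
      // ... normal optimization work would happen here ...
      return true;
    }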
* Look at shuffles of build_vectors in DAGCombiner::visitEXTRACT_VECTOR_ELT (Hal Finkel, 2014-03-31, 1 file, -7/+24)
  When the loop vectorizer vectorizes code that uses the loop induction
  variable, we often end up with IR like this:
    %b1 = insertelement <2 x i32> undef, i32 %v, i32 0
    %b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
    %i = add <2 x i32> %b2, <i32 2, i32 3>
  If the add in this example is not legal (as is the case on PPC with VSX), it
  will be scalarized, and we'll end up with a number of extract_vector_elt
  nodes with the vector shuffle as the input operand, and that vector shuffle
  is fed by one or more build_vector nodes. By the time that vector operations
  are expanded, visitEXTRACT_VECTOR_ELT will not create new extract_vector_elt
  by looking through the vector shuffle (to make sure that no illegal
  operations are created), and so the extract_vector_elt -> vector shuffle ->
  build_vector is never simplified to an operand of the build vector.
  By looking at build_vectors through a shuffle we fix this particular
  situation, preventing a vector from being built, only to be deconstructed
  again (for the scalarized add) -- an expensive proposition when this all
  needs to be done via the stack. We probably want a more comprehensive fix
  here where we look back recursively through any shuffles to any
  build_vectors or scalar_to_vectors, etc., but that can come later.
  llvm-svn: 205179
* Make use of previously generated stores in SelectionDAGLegalize::ExpandExtractFromVectorThroughStack (Hal Finkel, 2014-03-30, 1 file, -4/+33)
  When expanding EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR using
  SelectionDAGLegalize::ExpandExtractFromVectorThroughStack, we store the
  entire vector and then load the piece we want. This is fine in isolation,
  but generating a new store (and corresponding stack slot) for each
  extraction ends up producing code of poor quality. When we scalarize a
  vector operation (using SelectionDAG::UnrollVectorOp for example) we
  generate one EXTRACT_VECTOR_ELT for each element in the vector. This used to
  generate one stored copy of the vector for each element in the vector. Now
  we search the uses of the vector for a suitable store before generating a
  new one, which results in much more efficient scalarization code.
  llvm-svn: 205153
* Avoid storing Twines. (Benjamin Kramer, 2014-03-29, 1 file, -22/+19)
  While there, moved the nested ifs into a helper function. No functionality
  change.
  llvm-svn: 205108
* CodeGen: add sensible defaults for the ISD::FROUND operation (Tim Northover, 2014-03-29, 1 file, -0/+9)
  Some exotic types didn't know how to handle FROUND, which ARM64 uses.
  llvm-svn: 205088
* CodeGenPrep: wrangle IR to exploit AArch64 tbz/tbnz inst. (Tim Northover, 2014-03-29, 2 files, -0/+81)
  Given IR like:
    %bit = and %val, #imm-with-1-bit-set
    %tst = icmp %bit, 0
    br i1 %tst, label %true, label %false
  some targets can emit just a single instruction (tbz/tbnz in the AArch64
  case). However, with ISel acting at the basic-block level, all three
  instructions need to be together for this to be possible. This adds another
  transformation to CodeGenPrep to expose these opportunities, if targets opt
  in via the hook.
  llvm-svn: 205086
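  For reference, source along these lines produces that IR shape (hypothetical
  example; the mask and function names are arbitrary):

    // A single-bit test feeding a conditional branch; with the and/icmp/br
    // kept in one block, AArch64 can select a single tbz/tbnz.
    extern void onBitSet();
    extern void onBitClear();

    void branchOnBit(unsigned val) {
      if (val & 0x10) // mask with exactly one bit set
        onBitSet();
      else
        onBitClear();
    }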
* Provide a target override for the cost of using a callee-saved register for the first time. (Manman Ren, 2014-03-27, 1 file, -2/+6)
  Thanks Andy for the discussion.
  rdar://16162005
  llvm-svn: 204979