summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"Duncan P. N. Exon Smith2014-04-181-9/+3
| | | | | | | | | | | This reverts commits r206548, r206549 and r206549. There are some unit tests failing that aren't failing locally [1], so reverting until I have time to investigate. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206556
* blockfreq: Rewrite BlockFrequencyInfoImplDuncan P. N. Exon Smith2014-04-181-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite the shared implementation of BlockFrequencyInfo and MachineBlockFrequencyInfo entirely. The old implementation had a fundamental flaw: precision losses from nested loops (or very wide branches) compounded past loop exits (and convergence points). The @nested_loops testcase at the end of test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This function has three nested loops, with branch weights in the loop headers of 1:4000 (exit:continue). The old analysis gives non-sensical results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': ---- Block Freqs ---- entry = 1.0 for.cond1.preheader = 1.00103 for.cond4.preheader = 5.5222 for.body6 = 18095.19995 for.inc8 = 4.52264 for.inc11 = 0.00109 for.end13 = 0.0 The new analysis gives correct results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': block-frequency-info: nested_loops - entry: float = 1.0, int = 8 - for.cond1.preheader: float = 4001.0, int = 32007 - for.cond4.preheader: float = 16008001.0, int = 128064007 - for.body6: float = 64048012001.0, int = 512384096007 - for.inc8: float = 16008001.0, int = 128064007 - for.inc11: float = 4001.0, int = 32007 - for.end13: float = 1.0, int = 8 Most importantly, the frequency leaving each loop matches the frequency entering it. The new algorithm leverages BlockMass and PositiveFloat to maintain precision, separates "probability mass distribution" from "loop scaling", and uses dithering to eliminate probability mass loss. I have unit tests for these types out of tree, but it was decided in the review to make the classes private to BlockFrequencyInfoImpl, and try to shrink them (or remove them entirely) in follow-up commits. The new algorithm should generally have a complexity advantage over the old. The previous algorithm was quadratic in the worst case. The new algorithm is still worst-case quadratic in the presence of irreducible control flow, but it's linear without it. The key difference between the old algorithm and the new is that control flow within a loop is evaluated separately from control flow outside, limiting propagation of precision problems and allowing loop scale to be calculated independently of mass distribution. Loops are visited bottom-up, their loop scales are calculated, and they are replaced by pseudo-nodes. Mass is then distributed through the function, which is now a DAG. Finally, loops are revisited top-down to multiply through the loop scales and the masses distributed to pseudo nodes. There are some remaining flaws. - Irreducible control flow isn't modelled correctly. LoopInfo and MachineLoopInfo ignore irreducible edges, so this algorithm will fail to scale accordingly. There's a note in the class documentation about how to get closer. See also the comments in test/Analysis/BlockFrequencyInfo/irreducible.ll. - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting the 64-bit integer precision used downstream. - The "bias" calculation proposed on llvmdev is *not* incorporated here. This will be added in a follow-up commit, once comments from this review have been handled. llvm-svn: 206548
* Fix bug 19437 - Only add discriminators for DWARF 4 and above.Diego Novillo2014-04-171-9/+1
| | | | | | | | | | | | | | Summary: This prevents the discriminator generation pass from triggering if the DWARF version being used in the module is prior to 4. Reviewers: echristo, dblaikie CC: llvm-commits Differential Revision: http://reviews.llvm.org/D3413 llvm-svn: 206507
* [stack protector] Make the StackProtector pass respect ssp-buffer-size.Josh Magee2014-04-171-3/+3
| | | | | | | | | | | Previously, SSPBufferSize was assigned the value of the "stack-protector-buffer-size" attribute after all uses of SSPBufferSize. The effect was that the default SSPBufferSize was always used during analysis. I moved the check for the attribute before the analysis; now --param ssp-buffer-size= works correctly again. Differential Revision: http://reviews.llvm.org/D3349 llvm-svn: 206486
* Atomics: promote ARM's IR-based atomics pass to CodeGen.Tim Northover2014-04-173-0/+338
| | | | | | | | | | | | Still only 32-bit ARM using it at this stage, but the promotion allows direct testing via opt and is a reasonably self-contained patch on the way to switching ARM64. At this point, other targets should be able to make use of it without too much difficulty if they want. (See ARM64 commit coming soon for an example). llvm-svn: 206485
* [c++11] Tidy up AsmPrinter.cpp.Jim Grosbach2014-04-161-95/+77
| | | | | | | Range'ify loops and tidy up some by-reference handling. No functional change. llvm-svn: 206422
* DAGCombiner: don't optimise non-existant litpool loadTim Northover2014-04-161-1/+3
| | | | | | | | | | | This particular DAG combine is designed to kick in when both ConstantFPs will end up being loaded via a litpool, however those nodes have a semi-legal status, dictated by isFPImmLegal so in some cases there wouldn't have been a litpool in the first place. Don't try to be clever in those circumstances. Picked up while merging some AArch64 tests. llvm-svn: 206365
* Convert SelectionDAG::getVTList to use ArrayRefCraig Topper2014-04-165-18/+19
| | | | llvm-svn: 206357
* [C++11] More 'nullptr' conversion. In some cases just using a boolean check ↵Craig Topper2014-04-169-32/+37
| | | | | | instead of comparing to nullptr. llvm-svn: 206356
* Make FastISel::SelectInstruction return before target specific fast-isel codeAkira Hatanaka2014-04-151-2/+8
| | | | | | | | | | | handles Intrinsic::trap if TargetOptions::TrapFuncName is set. This fixes a bug in which the trap function was not taken into consideration when a program was compiled without optimization (at -O0). <rdar://problem/16291933> llvm-svn: 206323
* Revert r191049/r191059 as it can produce wrong code (see PR17975).Robert Lougher2014-04-151-21/+0
| | | | | | It has already been reverted on the 3.4 branch in r196521. llvm-svn: 206311
* verify-di: Implement DebugInfoVerifierDuncan P. N. Exon Smith2014-04-151-1/+9
| | | | | | | | | | | | | | | | | | | | | Implement DebugInfoVerifier, which steals verification relying on DebugInfoFinder from Verifier. - Adds LegacyDebugInfoVerifierPassPass, a ModulePass which wraps DebugInfoVerifier. Uses -verify-di command-line flag. - Change verifyModule() to invoke DebugInfoVerifier as well as Verifier. - Add a call to createDebugInfoVerifierPass() wherever there was a call to createVerifierPass(). This implementation as a module pass should sidestep efficiency issues, allowing us to turn debug info verification back on. <rdar://problem/15500563> llvm-svn: 206300
* FastISel: constrain the RegClass of operands when emitting instructions.Tim Northover2014-04-151-8/+47
| | | | | | | | | | | ARM64 suffered multiple -verify-machineinstr failures (principally over the xsp/xzr issue) because FastISel was completely ignoring which subset of the general-purpose registers each instruction required. More fixes are coming in ARM64 specific FastISel, but this should cover the generic problems. llvm-svn: 206283
* Break PseudoSourceValue out of the Value hierarchy. It is now the root of ↵Nick Lewycky2014-04-1511-231/+143
| | | | | | its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead. llvm-svn: 206255
* Use unique_ptr to manage TypePromotionActions owned by TypePromotionTransaction.David Blaikie2014-04-151-28/+19
| | | | llvm-svn: 206250
* Use unique_ptr to manage ownership of GCFunctionInfos in GCStrategyDavid Blaikie2014-04-151-10/+2
| | | | llvm-svn: 206249
* Use unique_ptr for the result of Registry entries.David Blaikie2014-04-152-8/+6
| | | | llvm-svn: 206248
* Use unique_ptr to manage ownership of GCStrategy objects in GCMetadataDavid Blaikie2014-04-152-19/+12
| | | | llvm-svn: 206246
* Use std::unique_ptr for DIE childrenDavid Blaikie2014-04-145-43/+31
| | | | | | | | | Got bored, removed some manual memory management. Pushed references (rather than pointers) through a few APIs rather than replacing *x with x.get(). llvm-svn: 206222
* Re-apply r206096 after investigating the gdb buildbot failure.Adrian Prantl2014-04-141-9/+14
| | | | | | | | | | | | | | | Thanks to dblaikie for updating the testcase! Debug info: (bugfix) C++ C/Dtors can be compiled to multiple functions, therefore, their declaration cannot have one DW_AT_linkage_name. The specific instances however can and should have that attribute. This patch reorders the code in DwarfUnit::getOrCreateSubprogramDIE() to emit linkage names for C/Dtors. rdar://problem/16362674. llvm-svn: 206210
* Don't assert in BasicTTI::getMemoryOpCost for non-simple typesHal Finkel2014-04-141-6/+8
| | | | | | | | BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for example, non-power-of-two vector types return non-simple EVTs (not MVT::Other). llvm-svn: 206150
* [C++11] More 'nullptr' conversion. In some cases just using a boolean check ↵Craig Topper2014-04-1493-1017/+1025
| | | | | | instead of comparing to nullptr. llvm-svn: 206142
* Retire llvm::array_endof in favor of non-member std::end.Benjamin Kramer2014-04-121-1/+1
| | | | | | While there make array_lengthof constexpr if we have support for it. llvm-svn: 206112
* PR13337: Omit DW_TAG_restrict_type when compiling for DWARF2David Blaikie2014-04-121-0/+4
| | | | | | | DWARF3 introduced DW_TAG_restrict_type, so avoid using it in prior versions. llvm-svn: 206105
* Revert "Debug info: (bugfix) C++ C/Dtors can be compiled to multiple functions,"Adrian Prantl2014-04-121-14/+9
| | | | | | | This reverts commit 206096 while I investigate why this broke the gdb buildbot. llvm-svn: 206103
* Use dwarf::Tag rather than unsigned for DIE::Tag to make debugging easier.David Blaikie2014-04-124-8/+13
| | | | | | | | | | | Nice to be able to just print out the Tag and have the debugger print dwarf::DW_TAG_subprogram or whatever, rather than an int. It's a bit finicky (for example DIDescriptor::getTag still returns unsigned) because some places still handle real dwarf tags + our fake tags (one day we'll remove the fake tags, hopefully). llvm-svn: 206098
* Debug info: (bugfix) C++ C/Dtors can be compiled to multiple functions,Adrian Prantl2014-04-121-9/+14
| | | | | | | | | | | | therefore, their declaration cannot have one DW_AT_linkage_name. The specific instances however can and should have that attribute. This patch reorders the code in DwarfUnit::getOrCreateSubprogramDIE() to emit linkage names for C/Dtors. rdar://problem/16362674. llvm-svn: 206096
* Reenable use of TBAA during CodeGenHal Finkel2014-04-122-14/+2
| | | | | | | | | | | | | | | | | | | | We had disabled use of TBAA during CodeGen (even when otherwise using AA) because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to miss basic type punning that it should catch (and, thus, we'd fail to override TBAA when we should). However, when AA is in use during CodeGen, CGP now uses normal GEPs and bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a result, BasicAA should be able to make us do the right thing in the face of type-punning, and it seems safe to enable use of TBAA again. self-hosting seems fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle. Note: We still don't update TBAA when merging stack slots, although because BasicAA should now catch all such cases, this is no longer a blocking issue. Nevertheless, I plan to commit code to deal with this properly in the near future. llvm-svn: 206093
* Add the ability to use GEPs for address sinking in CGPHal Finkel2014-04-121-0/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | The current memory-instruction optimization logic in CGP, which sinks parts of the address computation that can be adsorbed by the addressing mode, does this by explicitly converting the relevant part of the address computation into IR-level integer operations (making use of ptrtoint and inttoptr). For most targets this is currently not a problem, but for targets wishing to make use of IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a problem for two reasons: 1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr 2. In cases where type-punning was used, and BasicAA was used to override TBAA, BasicAA may no longer do so. (this had forced us to disable all use of TBAA in CodeGen; something which we can now enable again) This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by default (except for those targets that use AA during CodeGen), and so aside from some PowerPC subtargets and SystemZ, there should be no change in behavior. We may be able to switch completely away from the ptrtoint/inttoptr sinking on all targets, but further testing is required. I've doubled-up on a number of existing tests that are sensitive to the address sinking behavior (including some store-merging tests that are sensitive to the order of the resulting ADD operations at the SDAG level). llvm-svn: 206092
* blockfreq: Rename BlockFrequencyImpl to BlockFrequencyInfoImplDuncan P. N. Exon Smith2014-04-111-2/+2
| | | | | | | | | | | | This is a shared implementation class for BlockFrequencyInfo and MachineBlockFrequencyInfo, not for BlockFrequency, a related (but distinct) class. No functionality change. <rdar://problem/14292693> llvm-svn: 206083
* [RegAllocGreedy][Last Chance Recoloring] Change the name of the exhaustive ↵Quentin Colombet2014-04-111-1/+1
| | | | | | | | | | | search option. fexhaustive-register-search => exhaustive-register-search 'f' is a Clang thing! This is related to PR18747. llvm-svn: 206075
* [RegAllocGreedy][Last Chance Recoloring] Addition ofQuentin Colombet2014-04-111-6/+14
| | | | | | | | | | | -fexhaustive-register-search option to allow an exhaustive search during last chance recoloring. This is related to PR18747 Patch by MAYUR PANDEY <mayur.p@samsung.com>. llvm-svn: 206072
* [Register Coalescer] Fix wrong live-range information with rematerialization.Quentin Colombet2014-04-111-0/+21
| | | | | | | | | | | | | | | When rematerializing an instruction that defines a super register that would be used by a physical subregisters we use the related physical super register for the definition. To keep the live-range information accurate, all the defined subregisters must be marked as dead def, otherwise the register allocation may miss some interferences. Working on a reduced test-case! <rdar://problem/16582185> llvm-svn: 206060
* Debug info: Store the DIVariable in DebugLocEntry also for constants,Adrian Prantl2014-04-112-9/+11
| | | | | | | | so DwarfDebug::emitDebugLocEntry can emit them with the correct signedness. rdar://problem/15928306 llvm-svn: 206042
* Move ExtractVectorElements to SelectionDAG.Matt Arsenault2014-04-111-0/+16
| | | | | | | This seems generally useful, and makes sense to go along with SplitVector. llvm-svn: 206041
* SelectionDAG: Use helper function to improve legalization of ISD::MULTom Stellard2014-04-111-0/+17
| | | | | | | | The TargetLowering::expandMUL() helper contains lowering code extracted from the DAGTypeLegalizer and allows the SelectionDAGLegalizer to expand more ISD::MUL patterns without having to use a library call. llvm-svn: 206037
* SelectionDAG: Factor ISD::MUL lowering code out of DAGTypeLegalizerTom Stellard2014-04-112-67/+113
| | | | | | | | | | | This code has been moved to a new function in the TargetLowering class called expandMUL(). The purpose of this is to be able to share lowering code between the SelectionDAGLegalize and DAGTypeLegalizer classes. No functionality changed intended. llvm-svn: 206036
* Implement depth_first and inverse_depth_first range factory functions.David Blaikie2014-04-111-11/+9
| | | | | | | | | | | | | | Also updated as many loops as I could find using df_begin/idf_begin - strangely I found no uses of idf_begin. Is that just used out of tree? Also a few places couldn't use df_begin because either they used the member functions of the depth first iterators or had specific ordering constraints (I added a comment in the latter case). Based on a patch by Jim Grosbach. (Jim - you just had iterator_range<T> where you needed iterator_range<idf_iterator<T>>) llvm-svn: 206016
* [c++11] Range'ify use list loops in InstrEmitter.Jim Grosbach2014-04-111-9/+3
| | | | llvm-svn: 206015
* [c++11] Range'ify use list loops in DAGCombiner.Jim Grosbach2014-04-111-18/+7
| | | | llvm-svn: 206014
* Move the segmented stack switch to a function attributeReid Kleckner2014-04-102-1/+6
| | | | | | | | | This removes the -segmented-stacks command line flag in favor of a per-function "split-stack" attribute. Patch by Luqman Aden and Alex Crichton! llvm-svn: 205997
* Debug info: Factor the retrieving of the DIVariable from a MachineInstrAdrian Prantl2014-04-101-3/+2
| | | | | | into a function. llvm-svn: 205973
* Fix to support properly cleaning up failed address sinking against constantsJim Grosbach2014-04-101-2/+3
| | | | | | | | | | | As it turns out the source of the sunkaddr can be a constant, in which case there is not an instruction to delete, causing the cleanup code introduced in r204833 to crash. This patch adds a dynamic check to ensure the deleted value is in fact an instruction and not a constant. Patch by Louis Gerbarg <lgg@apple.com> llvm-svn: 205941
* SelectionDAG: Don't constant fold target-specific nodes.Jim Grosbach2014-04-091-0/+6
| | | | | | | | | | | | | | FoldConstantArithmetic() only knows how to deal with a few target independent ISD opcodes. Bail early if it sees a target-specific ISD node. These node do funny things with operand types which may break the assumptions of the code that follows, and there's no actual folding that can be done anyway. For example, non-constant 256 bit vector shifts on X86 have a shift-amount operand that's a 128-bit v4i32 vector regardless of what the first operand type is and that breaks the assumption that the operand types must match. rdar://16530923 llvm-svn: 205937
* [DAGCombiner] DAG combine does not know how to combine indexed loads withQuentin Colombet2014-04-091-2/+5
| | | | | | | | | | | sign/zero/any extensions. However a few places were not checking properly the property of the load and were turning an indexed load into a regular extended load. Therefore the indexed value was lost during the process and this was triggering an assertion. <rdar://problem/16389332> llvm-svn: 205923
* WinCOFF: Emit common symbols as specified in the COFF specDavid Majnemer2014-04-081-3/+6
| | | | | | | | | | | | | | | | | | Summary: Local common symbols were properly inserted into the .bss section. However, putting external common symbols in the .bss section would give them a strong definition. Instead, encode them as undefined, external symbols who's symbol value is equivalent to their size. Reviewers: Bigcheese, rafael, rnk CC: llvm-commits Differential Revision: http://reviews.llvm.org/D3324 llvm-svn: 205811
* Bug 19348: Check for legal ExtLoad operation before foldingMatt Arsenault2014-04-081-9/+12
| | | | | | | | (aext (zextload x)) -> (aext (truncate (*extload x))) Patch by Stanislav Mekhanoshin! llvm-svn: 205805
* RegAlloc: Account for a variable entry block frequencyDuncan P. N. Exon Smith2014-04-082-13/+58
| | | | | | | | | | | | | | | | | | | | Until r197284, the entry frequency was constant -- i.e., set to 2^14. Although current ToT still has a constant entry frequency, since r197284 that has been an implementation detail (which is soon going to change). - r204690 made the wrong assumption for the CSRCost metric. Adjust callee-saved register cost based on entry frequency. - r185393 made the wrong assumption (although it was valid at the time). Update SpillPlacement.cpp::Threshold to be relative to the entry frequency. Since ToT still has 2^14 entry frequency, this should have no observable functionality change. <rdar://problem/14292693> llvm-svn: 205789
* Put a limit on ScheduleDAGSDNodes::ClusterNeighboringLoads to avoid blowing ↵Andrew Trick2014-04-071-1/+6
| | | | | | | | | | | | | | | | | | | | up compile time. Fixes PR16365 - Extremely slow compilation in -O1 and -O2. The SD scheduler has a quadratic implementation of load clustering which absolutely blows up compile time for large blocks with constant pool loads. The MI scheduler has a better implementation of load clustering. However, we have not done the work yet to completely eliminate the SD scheduler. Some benchmarks still seem to benefit from early load clustering, although maybe by chance. As an intermediate term fix, I just put a nice limit on the number of DAG users to search before finding a match. With this limit there are no binary differences in the LLVM test suite, and the PR16365 test case does not suffer any compile time impact from this routine. llvm-svn: 205738
* Minor change to StackMapLiveness DEBUG output.Andrew Trick2014-04-041-1/+1
| | | | llvm-svn: 205656
OpenPOWER on IntegriCloud