summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
...
* And... fix the build some more.David Blaikie2014-11-011-1/+1
| | | | llvm-svn: 221036
* Just iterate the DwarfCompileUnits rather than trying to filter them out of ↵David Blaikie2014-11-011-49/+46
| | | | | | the list of all units. llvm-svn: 221034
* Add '*' to auto variable that is a pointer, as per the coding conventions.David Blaikie2014-11-011-1/+1
| | | | llvm-svn: 221033
* Add DwarfCompileUnit::getSkeleton that returns DwarfCompileUnit* to avoid ↵David Blaikie2014-11-012-3/+6
| | | | | | having to cast from DwarfUnit* on every call. llvm-svn: 221031
* IR: MDNode => Value: Instruction::getMetadata()Duncan P. N. Exon Smith2014-11-012-8/+8
| | | | | | | | | | Change `Instruction::getMetadata()` to return `Value` as part of PR21433. Update most callers to use `Instruction::getMDNode()`, which wraps the result in a `cast_or_null<MDNode>`. llvm-svn: 221024
* Sink some of DwarfDebug::collectDeadVariables down into DwarfCompileUnit.David Blaikie2014-10-314-20/+25
| | | | llvm-svn: 221010
* Sink most of DwarfDebug::constructAbstractSubprogramScopeDIE into ↵David Blaikie2014-10-313-14/+13
| | | | | | DwarfCompileUnit llvm-svn: 221005
* [CodeGenPrepare] Move extractelement close to store if they can be combined.Quentin Colombet2014-10-311-1/+379
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds an optimization in CodeGenPrepare to move an extractelement right before a store when the target can combine them. The optimization may promote any scalar operations to vector operations in the way to make that possible. ** Context ** Some targets use different register files for both vector and scalar operations. This means that transitioning from one domain to another may incur copy from one register file to another. These copies are not coalescable and may be expensive. For example, according to the scheduling model, on cortex-A8 a vector to GPR move is 20 cycles. ** Motivating Example ** Let us consider an example: define void @foo(<2 x i32>* %addr1, i32* %dest) { %in1 = load <2 x i32>* %addr1, align 8 %extract = extractelement <2 x i32> %in1, i32 1 %out = or i32 %extract, 1 store i32 %out, i32* %dest, align 4 ret void } As it is, this IR generates the following assembly on armv7: vldr d16, [r0] @vector load vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles orr r0, r0, #1 @ scalar bitwise or str r0, [r1] @ scalar store bx lr Whereas we could generate much faster code: vldr d16, [r0] @ vector load vorr.i32 d16, #0x1 @ vector bitwise or vst1.32 {d16[1]}, [r1:32] @ vector extract + store bx lr Half of the computation made in the vector is useless, but this allows to get rid of the expensive cross-register-file copy. ** Proposed Solution ** To avoid this cross-register-copy penalty, we promote the scalar operations to vector operations. The penalty will be removed if we manage to promote the whole chain of computation in the vector domain. Currently, we do that only when the chain of computation ends by a store and the target is able to combine an extract with a store. Stores are the most likely candidates, because other instructions produce values that would need to be promoted and so, extracted as some point[1]. Moreover, this is customary that targets feature stores that perform a vector extract (see AArch64 and X86 for instance). The proposed implementation relies on the TargetTransformInfo to decide whether or not it is beneficial to promote a chain of computation in the vector domain. Unfortunately, this interface is rather inaccurate for this level of details and although this optimization may be beneficial for X86 and AArch64, the inaccuracy will lead to the optimization being too aggressive. Basically in TargetTransformInfo, everything that is legal has a cost of 1, whereas, even if a vector type is legal, usually a vector operation is slightly more expensive than its scalar counterpart. That will lead to too many promotions that may not be counter balanced by the saving of the cross-register-file copy. For instance, on AArch64 this penalty is just 4 cycles. For now, the optimization is just enabled for ARM prior than v8, since those processors have a larger penalty on cross-register-file copies, and the scope is limited to basic blocks. Because of these two factors, we limit the effects of the inaccuracy. Indeed, I did not want to build up a fancy cost model with block frequency and everything on top of that. [1] We can imagine targets that can combine an extractelement with other instructions than just stores. If we want to go into that direction, the current interfaces must be augmented and, moreover, I think this becomes a global isel problem. Differential Revision: http://reviews.llvm.org/D5921 <rdar://problem/14170854> llvm-svn: 220978
* Correct assert text from r220923David Blaikie2014-10-311-1/+1
| | | | | | Noticed in post-commit review by Adrian Prantl. llvm-svn: 220967
* PR20557: Fix the bug that bogus cpu parameter crashes llc on AArch64 backend.Hao Liu2014-10-311-1/+5
| | | | | | Initial patch by Oleg Ranevskyy. llvm-svn: 220945
* [SelectionDAG] When scalarizing trunc, don't assert for legal operands.Ahmed Bougacha2014-10-301-1/+17
| | | | | | | | | | | | | | | | | | | | | r212242 introduced a legalizer hook, originally to let AArch64 widen v1i{32,16,8} rather than scalarize, because the legalizer expected, when scalarizing the result of a conversion operation, to already have scalarized the operands. On AArch64, v1i64 is legal, so that commit ensured operations such as v1i32 = trunc v1i64 wouldn't assert. It did that by choosing to widen v1 types whenever possible. However, v1i1 types, for which there's no legal widened type, would still trigger the assert. This commit fixes that, by only scalarizing a trunc's result when the operand has already been scalarized, and introducing an extract_elt otherwise. This is similar to r205625. Fixes PR20777. llvm-svn: 220937
* Fix incorrect invariant check in DAG CombineLouis Gerbarg2014-10-301-1/+1
| | | | | | | | | | | | | Earlier this summer I fixed an issue where we were incorrectly combining multiple loads that had different constraints such alignment, invariance, temporality, etc. Apparently in one case I made copt paste error and swapped alignment and invariance. Tests included. rdar://18816719 llvm-svn: 220933
* PR21408: Workaround the appearance of duplicate variables due to problems ↵David Blaikie2014-10-301-1/+6
| | | | | | when inlining two calls to the same function from the same call site. llvm-svn: 220923
* Whitespace.NAKAMURA Takumi2014-10-295-40/+40
| | | | llvm-svn: 220857
* Minimize the scope of some variables, NFC.David Blaikie2014-10-281-2/+2
| | | | llvm-svn: 220759
* [PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of theseLang Hames2014-10-271-29/+50
| | | | | | | | | | | sets as keys into a cache of interference matrice values in the Interference constraint adder. Creating interference matrices was one of the large remaining time-sinks in PBQP. Caching them reduces the total compile time (when using PBQP) on the nightly test suite by ~10%. llvm-svn: 220688
* Remove some unnecessary casts.David Blaikie2014-10-261-2/+2
| | | | llvm-svn: 220658
* Sink DwarfUnit::constructImportedEntityDIE into DwarfCompileUnit.Frederic Riss2014-10-244-32/+32
| | | | | | | | | | So that it has access to getOrCreateGlobalVariableDIE. If we ever support decsribing using directive in C++ classes (thus requiring support in type units), it will certainly use another mechanism anyway. Differential Revision: http://reviews.llvm.org/D5975 llvm-svn: 220594
* Fix copy paste commentMatt Arsenault2014-10-241-2/+2
| | | | llvm-svn: 220581
* DebugInfo: Sink DwarfDebug::ScopeVariables down into DwarfFileDavid Blaikie2014-10-245-11/+11
| | | | | | | | (part of refactoring to allow subprogram emission in both the skeleton and main units to enable -gmlt-like data to be included in the skeleton for live inlined backtracing purposes) llvm-svn: 220578
* Remove DwarfDebug::FirstCU as it has no useDavid Blaikie2014-10-242-17/+5
| | | | | | | It was only being used as a flag to identify the lack of debug info from within endModule - use the section labels for that instead. llvm-svn: 220575
* Use rsqrt (X86) to speed up reciprocal square root calcsSanjay Patel2014-10-241-40/+77
| | | | | | | | | | | | | | | | | | | | | This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().. See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570
* Added reset of LexicalScope in LiveDebugVariables reset function.Marcello Maggioni2014-10-241-0/+1
| | | | llvm-svn: 220545
* Fix PR21189 -- Emit symbol subsection required to debug LLVM-built binaries ↵Timur Iskhodzhanov2014-10-241-9/+47
| | | | | | | | with VS2012+ Reviewed at http://reviews.llvm.org/D5772 llvm-svn: 220544
* DebugInfo: Remove DwarfDebug::addScopeVariable now that it's just a trivial ↵David Blaikie2014-10-244-13/+6
| | | | | | wrapper llvm-svn: 220542
* [SelectionDAG] Teach the vector scalarizer about FP conversions.Ahmed Bougacha2014-10-231-0/+4
| | | | | | | | | | | | | | | | | This adds support for legalization of instructions of the form: [fp_conv] <1 x i1> %op to <1 x double> where fp_conv is one of fpto[us]i, [us]itofp. This used to assert because they were simply missing from the vector operand scalarizer. A similar problem arose in r190830, with trunc instead. Fixes PR20778. Differential Revision: http://reviews.llvm.org/D5810 llvm-svn: 220533
* Update comment and fix typos in assert message. (NFC)Ahmed Bougacha2014-10-231-3/+3
| | | | llvm-svn: 220531
* ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodesTim Northover2014-10-233-19/+36
| | | | | | | | | | | | | | | x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS dependency because it was represented by a pair of CopyFromReg(EFLAGS) -> CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an implicit-def on the instruction, where the result numbers in the DAG and the Uses list in TableGen matched up precisely. The Copy notation seems much more robust, so this patch extends ScheduleDAG rather than refactoring x86. Should fix PR20376. llvm-svn: 220529
* DebugInfo: Remove DwarfDebug::CurrentFnArguments since we have to handle ↵David Blaikie2014-10-235-51/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument ordering of other arguments (abstract arguments) in the same way and already have code for that too. While refactoring this code I was confused by both the name I had introduced (addNonArgumentVariable... but it has all this logic to handle argument numbering and keep things in order?) and by the redundancy. Seems when I fixed the misordered inlined argument handling, I didn't realize it was mostly redundant with the argument ordering code (which I may've also written, I'm not sure). So let's just rely on the more general case. The only oddity in output this produces is that it means when we emit all the variables for the current function, we don't track when we've finished the argument variables and are about to start the local variables and insert DW_AT_unspecified_parameters (for varargs functions) there. Instead it ends up after the local variables, scopes, etc. But this isn't invalid and doesn't cause DWARF consumers problems that I know of... so we'll just go with that because it makes the code nice & simple. (though, let's see what the buildbots have to say about this - *crosses fingers*) There will be some cleanup commits to follow to remove the now trivial wrappers, etc. llvm-svn: 220527
* DebugInfo: Sink DwarfDebug::addNonArgumentScopeVariable into DwarfFile.David Blaikie2014-10-234-35/+35
| | | | llvm-svn: 220520
* DebugInfo: Remove DwarfDebug::addCurrentFnArgument declaration now that it's ↵David Blaikie2014-10-231-4/+0
| | | | | | moved to DwarfFile. llvm-svn: 220515
* DebugInfo: Simplify/tidy/correct global variable decl/def emission handling.David Blaikie2014-10-231-51/+26
| | | | | | | | | | | | | | | | | | | | | | | This fixes a bug (introduced by fixing the IR emitted from Clang where the definition of a static member would be scoped within the class, rather than within its lexical decl context) where the definition of a static variable would be placed inside a class. It also improves source fidelity by scoping static class member definitions inside the lexical decl context in which tehy are written (eg: namespace n { class foo { static int i; } int foo::i; } - the definition of 'i' will be within the namespace 'n' in the DWARF output now). Lastly, and the original goal, this reduces debug info size slightly (and makes debug info easier to read, etc) by placing the definitions of non-member global variables within their namespace, rather than using a separate namespace-scoped declaration along with a definition at global scope. Based on patches and discussion with Frédéric. llvm-svn: 220497
* Remove explicit (void) use of DwarfFile::DD that was accidentally left in ↵David Blaikie2014-10-231-3/+1
| | | | | | | | r220452. Caught in post-commit review by Frédéric. llvm-svn: 220487
* [DebugInfo] Sink DwarfDebug::addCurrentFnArgument down into DwarfFile.David Blaikie2014-10-233-24/+28
| | | | | | | | Variable handling will be sunk into DwarfFile so that abstract variables and the like can be shared across multiple CUs (to handle cross-CU inlining, for example). llvm-svn: 220453
* [DebugInfo] Add DwarfDebug& to DwarfFile.David Blaikie2014-10-233-10/+17
| | | | | | | | Use the DwarfDebug in one function that previously took it as a parameter, and lay the foundation for use this for other operations coming soon. llvm-svn: 220452
* [DebugInfo] Remove LexicalScopes::isCurrentFunctionScope and CSE a use of ↵David Blaikie2014-10-232-13/+19
| | | | | | | | | | LexicalScopes::getCurrentFunctionScope Now that we're sure the only root (non-abstract) scope is the current function scope, there's no need for isCurrentFunctionScope, the property can be tested directly instead. llvm-svn: 220451
* Strength reduce constant-sized vectors into arrays. No functionality change.Benjamin Kramer2014-10-221-2/+2
| | | | llvm-svn: 220412
* Fix typoMatt Arsenault2014-10-221-1/+1
| | | | llvm-svn: 220353
* Add minnum / maxnum codegenMatt Arsenault2014-10-2111-0/+188
| | | | llvm-svn: 220342
* Pacify bots and simplify r220321Arnaud A. de Grandmaison2014-10-211-1/+1
| | | | llvm-svn: 220335
* [PBQP] Teach PassConfig to tell if the default register allocator is used.Arnaud A. de Grandmaison2014-10-211-0/+6
| | | | | | | | This enables targets to adapt their pass pipeline to the register allocator in use. For example, with the AArch64 backend, using PBQP with the cortex-a57, the FPLoadBalancing pass is no longer necessary. llvm-svn: 220321
* [PBQP] Fix coalescing benefitsArnaud A. de Grandmaison2014-10-211-2/+2
| | | | | | As coalescing registers is a benefit, the cost should be improved (i.e. made smaller) when coalescing is possible. llvm-svn: 220302
* Fix a bit of confusion about .set and produce more readable assembly.Rafael Espindola2014-10-211-34/+24
| | | | | | | | | | | | | | | Every target we support has support for assembly that looks like a = b - c .long a What is special about MachO is that the above combination suppresses the production of a relocation. With this change we avoid producing the intermediary labels when they don't add any value. llvm-svn: 220256
* Make AsmPrinter::EmitLabelOffsetDifference a static helper and simplify.Rafael Espindola2014-10-212-33/+27
| | | | | | It had exactly one caller in a position where we know hasSetDirective is true. llvm-svn: 220250
* Introduce enum values for previously defined metadata types. (NFC)Philip Reames2014-10-212-5/+5
| | | | | | | | | | | Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum value for three existing metadata types: + MD_nontemporal = 9, // "nontemporal" + MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access" + MD_nonnull = 11 // "nonnull" I went through an updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easily to translate. For example, there were several items in LoopInfo.cpp I chose not to update. llvm-svn: 220248
* [PBQP] Replace the interference-constraints algorithm with a faster versionLang Hames2014-10-181-16/+115
| | | | | | | | loosely based on linear scan. On x86-64 this is good for a ~2% drop in compile time on the nightly test suite. llvm-svn: 220143
* Check for dynamic alloca's when selecting lifetime intrinsics.Pete Cooper2014-10-171-1/+7
| | | | | | | | | | | | | | | | | | TL;DR: Indexing maps with [] creates missing entries. The long version: When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime. Trouble is, we don't first check to see if this is a dynamic alloca. On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime. 0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots. PEI would later trigger an assert because it expects the stack protector to not be dead. This fix ensures that we only get frame indices for static allocas, ie, those in the map. Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken. rdar://problem/18672951 llvm-svn: 220099
* [Stackmaps] Enable invoking the patchpoint intrinsic.Juergen Ributzka2014-10-172-51/+64
| | | | | | | | | | | Patch by Kevin Modzelewski Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits, reames Differential Revision: http://reviews.llvm.org/D5634 llvm-svn: 220055
* SelectionDAG: Add sext_inreg optimizationsJan Vesely2014-10-171-0/+22
| | | | | | | | | | v2: use dyn_cast fixup comments v3: use cast Reviewed-by: Matt Arsenault <arsenm2@gmail.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220044
* Reduce code duplication between patchpoint and non-patchpoint lowering. NFC.Juergen Ributzka2014-10-162-44/+58
| | | | | | | | | | | | This is in preparation for another patch that makes patchpoints invokable. Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5657 llvm-svn: 219967
OpenPOWER on IntegriCloud