summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [TwoAddressInstruction] Fix typo in comment. NFCCraig Topper2017-02-041-1/+1
| | | | llvm-svn: 294083
* [RegisterCoalescer] Do not call getInstructionIndex with DBG_VALUEBrendon Cahoon2017-02-041-1/+1
| | | | | | | | | | | An assert occurs when calling SlotIndexes::getInstructionIndex with a DBG_VALUE instruction because the function expects an instruction with a slot index. However, there is no slot index for a DBG_VALUE instruction. Differential Revision: https://reviews.llvm.org/D29048 llvm-svn: 294070
* [TLI] Robustize SDAG LibFunc proto checking by merging it into TLI.Ahmed Bougacha2017-02-031-97/+53
| | | | | | | | | | | | | | | | | | | | | | | This re-applies commit r292189, reverted in r292191. SelectionDAGBuilder recognizes libfuncs using some homegrown parameter type-checking. Use TLI instead, removing another heap of redundant code. This isn't strictly NFC, as the SDAG code was too lax. Concretely, this means changes are required to a few tests: - calling a non-variadic function via a variadic prototype isn't OK; it just happens to work on x86_64 (but not on, e.g., aarch64). - mempcpy has a size_t parameter; the SDAG code accepts any integer type, which meant using i32 on x86_64 worked. - a handful of SystemZ tests check the SDAG support for lax prototype checking: Ulrich agrees on removing them. I don't think it's worth supporting any of these (IMO) invalid testcases. Instead, fix them to be more meaningful. llvm-svn: 294028
* GlobalISel: translate dynamic alloca instructions.Tim Northover2017-02-032-8/+99
| | | | llvm-svn: 294022
* [SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() !=Alexey Bataev2017-02-031-8/+12
| | | | | | | | | | | | ISD::DELETED_NODE && "NodeToMatch was removed partway through selection"' failed. NodeToMatch can be modified during matching, but code does not handle this situation. Differential Revision: https://reviews.llvm.org/D29292 llvm-svn: 294003
* DebugInfo: ensure type and namespace names are included in pubnames/pubtypes ↵David Blaikie2017-02-035-14/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | even when they are only present in type units While looking to add support for placing singular types (types that will only be emitted in one place (such as attached to a strong vtable or explicit template instantiation definition)) not in type units (since type units have overhead) I stumbled across that change causing an increase in pubtypes. Turns out we were missing some types from type units if they were only referenced from other type units and not from the debug_info section. This fixes that, following GCC's line of describing the offset of such entities as the CU die (since there's no compile unit-relative offset that would describe such an entity - they aren't in the CU). Also like GCC, this change prefers to describe the type stub within the CU rather than the "just use the CU offset" fallback where possible. This may give the DWARF consumer some opportunity to find the extra info in the type stub - though I'm not sure GDB does anything with this currently. The size of the pubnames/pubtypes sections now match exactly with or without type units enabled. This nearly triples (+189%) the pubtypes section for a clang self-host and grows pubnames by 0.07% (without compression). For a total of 8% increase in debug info sections of the objects of a Split DWARF build when using type units. llvm-svn: 293971
* [lto] add getLinkerOpts()Bob Haarman2017-02-021-29/+32
| | | | | | | | | | | | Summary: Some compilers, including MSVC and Clang, allow linker options to be specified in source files. In the legacy LTO API, there is a getLinkerOpts() method that returns linker options for the bitcode module being processed. This change adds that method to the new API, so that the COFF linker can get the right linker options when using the new LTO API. Reviewers: pcc, ruiu, mehdi_amini, tejohnson Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D29207 llvm-svn: 293950
* [CodeGen] Remove dead call-or-prologue enum from CCStateReid Kleckner2017-02-021-2/+1
| | | | | | | This enum has been dead since Olivier Stannard re-implemented ARM byval handling in r202985 (2014). llvm-svn: 293943
* [PGO] internal option cleanupsXinliang David Li2017-02-022-0/+10
| | | | | | | | | | 1. Added comments for options 2. Added missing option cl::desc field 3. Uniified function filter option for graph viewing. Now PGO count/raw-counts share the same filter option: -view-bfi-func-name=. llvm-svn: 293938
* [LiveRangeEdit] Don't mess up with LiveInterval when a new vreg is created.Quentin Colombet2017-02-021-3/+10
| | | | | | | | | | | | | | | | | | In r283838, we added the capability of splitting unspillable register. When doing so we had to make sure the split live-ranges were also unspillable and we did that by marking the related live-ranges in the delegate method that is called when a new vreg is created. However, by accessing the live-range there, we also triggered their lazy computation (LiveIntervalAnalysis::getInterval) which is not what we want in general. Indeed, later code in LiveRangeEdit is going to build the live-ranges this lazy computation may mess up that computation resulting in assertion failures. Namely, the createEmptyIntervalFrom method expect that the live-range is going to be empty, not computed. Thanks to Mikael Holmén <mikael.holmen@ericsson.com> for noticing and reporting the problem. llvm-svn: 293934
* [PGO] make graph view internal options available for all buildsXinliang David Li2017-02-022-13/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D29259 llvm-svn: 293921
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2017-02-022-373/+369
| | | | | | | | | UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915
* Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFCAmaury Sechet2017-02-021-1/+1
| | | | llvm-svn: 293903
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2017-02-022-369/+373
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Recommiting after fixing X86 inc/dec chain bug. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293893
* RegisterCoalescer: Cleanup joinReservedPhysReg(); NFCMatthias Braun2017-02-021-9/+24
| | | | | | | | - Factor out a common subexpression - Add some helpful comments - Fix printing of a register in a debug message llvm-svn: 293856
* Remove an assertion that doesn't hold when mixing -g and -gmlt throughPaul Robinson2017-02-011-3/+1
| | | | | | | | | | LTO. Replace it with a related assertion, ensuring that abstract variables appear only in abstract scopes. Part of PR31437. Differential Revision: http://reviews.llvm.org/D29430 llvm-svn: 293841
* Change debug-info-for-profiling from a TargetOption to a function attribute.Dehao Chen2017-02-012-2/+2
| | | | | | | | | | | | | | Summary: LTO requires the debug-info-for-profiling to be a function attribute. Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl Reviewed By: mehdi_amini, dblaikie, aprantl Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D29203 llvm-svn: 293833
* Remove an assertion that doesn't hold when mixing -g and -gmlt throughPaul Robinson2017-02-011-1/+0
| | | | | | | | LTO. Part of PR31437. Differential Revision: http://reviews.llvm.org/D29310 llvm-svn: 293818
* [ImplicitNullCheck] Extend canReorder scopeSanjoy Das2017-02-011-3/+4
| | | | | | | | | | | | | | | | | | Summary: This change allows a re-order of two intructions if their uses are overlapped. Patch by Serguei Katkov! Reviewers: reames, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29120 llvm-svn: 293775
* [legalizetypes] Push fp16 -> fp32 extension node to worklist. Florian Hahn2017-02-013-5/+7
| | | | | | | | | | | | | | | | | | | | | Summary: This way, the type legalization machinery will take care of registering the result of this node properly. This patches fixes all failing fp16 test cases with expensive checks. (CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll, CodeGen/X86/cvt16.ll CodeGen/X86/soft-fp.ll) Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel Reviewed By: hfinkel Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D28195 llvm-svn: 293765
* [CodeGen] Move MacroFusion to the targetEvandro Menezes2017-02-011-74/+0
| | | | | | | | | | | | | This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target. In AArch64, it also expands the fusion to all instructions pairs in a scheduling block, beyond just among the predecessors of the branch at the end. Differential revision: https://reviews.llvm.org/D28489 llvm-svn: 293737
* [ImplicitNullCheck] NFC isSuitableMemoryOp cleanupSanjoy Das2017-02-011-13/+23
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: isSuitableMemoryOp method is repsonsible for verification that instruction is a candidate to use in implicit null check. Additionally it checks that base register is not re-defined before. In case base has been re-defined it just returns false and lookup is continued while any suitable instruction will not succeed this check as well. This results in redundant further operations. So when we found that base register has been re-defined we just stop. Patch by Serguei Katkov! Reviewers: reames, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29119 llvm-svn: 293736
* Fix regalloc assignment of overlapping registersStanislav Mekhanoshin2017-02-011-0/+21
| | | | | | | | | | | | | | | | SplitEditor::defFromParent() can create a register copy. If register is a tuple of other registers and not all lanes are used a copy will be done on a full tuple regardless. Later register unit for an unused lane will be considered free and another overlapping register tuple can be assigned to a different value even though first register is live at that point. That is because interference only look at liveness info, while full register copy clobbers all lanes, even unused. This patch fixes copy to only cover used lanes. Differential Revision: https://reviews.llvm.org/D29105 llvm-svn: 293728
* CodeGen: Allow small copyable blocks to "break" the CFG.Kyle Butt2017-01-313-35/+333
| | | | | | | | | | | When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability the other direction. For small blocks that can be duplicated we now skip that requirement as well, subject to some simple frequency calculations. Differential Revision: https://reviews.llvm.org/D28583 llvm-svn: 293716
* GlobalISel: the translation of an invoke must branch to the good block.Tim Northover2017-01-311-0/+1
| | | | | | | Otherwise bad things happen if the basic block order isn't trivial after an invoke. llvm-svn: 293679
* InterleaveAccessPass: Avoid constructing invalid shuffle masksMatthias Braun2017-01-311-2/+6
| | | | | | | | | Fix a bug where we would construct shufflevector instructions addressing invalid elements. Differential Revision: https://reviews.llvm.org/D29313 llvm-svn: 293673
* GlobalISel: merge invoke and call translation paths.Tim Northover2017-01-312-10/+16
| | | | | | | | Well, sort of. But the lower-level code that invoke used to be using completely botched the handling of varargs functions, which hopefully won't be possible if they're using the same code. llvm-svn: 293670
* [X86] Implement -mfentryNirav Dave2017-01-314-0/+60
| | | | | | | | | | | | Summary: Insert calls to __fentry__ at function entry. Reviewers: hfinkel, craig.topper Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D28000 llvm-svn: 293648
* [DAGCombine] require UnsafeFPMath for re-association of additionNicolai Haehnle2017-01-311-6/+18
| | | | | | | | | | | | | | | | | | | Summary: The affected transforms all implicitly use associativity of addition, for which we usually require unsafe math to be enabled. The "Aggressive" flag is only meant to convey information about the performance of the fused ops relative to a fmul+fadd sequence. Fixes Bug 31626. Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD Subscribers: jholewinski, nemanjai, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D28675 llvm-svn: 293635
* [ExecutionDepsFix] Improve clearance calculation for loopsKeno Fischer2017-01-301-85/+181
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In revision rL278321, ExecutionDepsFix learned how to pick a better register for undef register reads, e.g. for instructions such as `vcvtsi2sdq`. While this revision improved performance on a good number of our benchmarks, it unfortunately also caused significant regressions (up to 3x) on others. This regression turned out to be caused by loops such as: PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT ^ | +----------------------------------+ In the previous version of the clearance calculation, we would visit the blocks in order, remembering for each whether there were any incoming backedges from blocks that we hadn't processed yet and if so queuing up the block to be re-processed. However, for loop structures such as the above, this is clearly insufficient, since the block B does not have any unknown backedges, so we do not see the false dependency from the previous interation's Def of xmm registers in B. To fix this, we need to consider all blocks that are part of the loop and reprocess them one the correct clearance values are known. As an optimization, we also want to avoid reprocessing any later blocks that are not part of the loop. In summary, the iteration order is as follows: Before: PH A B C D A' Corrected (Naive): PH A B C D A' B' C' D' Corrected (w/ optimization): PH A B C A' B' C' D To facilitate this optimization we introduce two new counters for each basic block. The first counts how many of it's predecssors have completed primary processing. The second counts how many of its predecessors have completed all processing (we will call such a block *done*. Now, the criteria to reprocess a block is as follows: - All Predecessors have completed primary processing - For x the number of predecessors that have completed primary processing *at the time of primary processing of this block*, the number of predecessors that are done has reached x. The intuition behind this criterion is as follows: We need to perform primary processing on all predecessors in order to find out any direct defs in those predecessors. When predecessors are done, we also know that we have information about indirect defs (e.g. in block B though that were inherited through B->C->A->B). However, we can't wait for all predecessors to be done, since that would cause cyclic dependencies. However, it is guaranteed that all those predecessors that are prior to us in reverse postorder will be done before us. Since we iterate of the basic blocks in reverse postorder, the number x above, is precisely the count of the number of predecessors prior to us in reverse postorder. Reviewers: myatsina Differential Revision: https://reviews.llvm.org/D28759 llvm-svn: 293571
* GlobalISel: correctly translate invoke when callee is a register.Tim Northover2017-01-301-1/+5
| | | | | | This should fix the GlobalISel verifier. llvm-svn: 293550
* GlobalISel: account for differing exception selector sizes.Tim Northover2017-01-301-1/+10
| | | | | | | | | For some reason the exception selector register must be a pointer (that's assumed by SDag); on the other hand, it gets moved into an IR-level type which might be entirely different (i32 on AArch64). IRTranslator needs to be aware of this. llvm-svn: 293546
* GlobalISel: tidy up def/use test. NFC.Tim Northover2017-01-301-2/+2
| | | | llvm-svn: 293545
* GlobalISel: translate memset & memmove.Tim Northover2017-01-301-9/+25
| | | | llvm-svn: 293541
* GlobalISel: permit unused vregs without a register-class after ISel.Tim Northover2017-01-301-5/+9
| | | | | | | This can happen if earlier combining has removed all uses of some VReg, which is fine and shouldn't flag an error. llvm-svn: 293537
* Use SelectionDAG::getBuildVector helper function where possible. NFCI.Simon Pilgrim2017-01-302-21/+19
| | | | llvm-svn: 293532
* SDAG: Update ChainNodesMatched during UpdateChains if a node is replacedJustin Bogner2017-01-301-1/+11
| | | | | | | | | | | Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we happened to replace a node during UpdateChains, because it would be left in the list we were iterating over. This nulls out the pointer when that happens so that we can avoid the issue. Fixes llvm.org/PR31710 llvm-svn: 293522
* Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where ↵Simon Pilgrim2017-01-301-7/+3
| | | | | | possible. NFCI. llvm-svn: 293520
* DAG: Fold fneg into compare with constant into the constantMatt Arsenault2017-01-301-0/+10
| | | | | | | | fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred) InstCombine already does this. llvm-svn: 293512
* unique_ptrify some containers in GlobalISel::RegisterBankInfoDavid Blaikie2017-01-301-19/+9
| | | | | | | | | To simplify/clarify memory ownership, make leaks (as one was found/fixed recently) harder to write, etc. (also, while I was there - removed a duplicate lookup in a container) llvm-svn: 293506
* DAG: Constant fold fp16_to_fp/fp16_to_fpMatt Arsenault2017-01-301-0/+19
| | | | | | | This fixes emitting conversions of constants on targets without legal f16 that need to use these for legalization. llvm-svn: 293499
* [GlobalISel] Add support for indirectbrKristof Beyls2017-01-302-0/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D28079 llvm-svn: 293470
* MachineInstr: Remove parameter from dump()Matthias Braun2017-01-292-3/+5
| | | | | | | | | | | | | The primary use of the dump() functions in LLVM is for use in a debugger. Unfortunately lldb does not seem to handle default arguments so using `p SomeMI.dump()` fails and you have to type the longer `p SomeMI.dump(nullptr)`. Remove the paramter to make the most common use easy. (You can always construct something like `p SomeMI.print(dbgs(),MyTII)` if you need more features). Differential Revision: https://reviews.llvm.org/D29241 llvm-svn: 293440
* [SelectionDAG] Make SDNode::getConstantOperandVal an inline method.Craig Topper2017-01-291-5/+0
| | | | | | It's operation already exists manually in many places without using the method. llvm-svn: 293421
* [DAGCombiner] Use unsigned for a constant vector index instead of APInt.Craig Topper2017-01-291-2/+2
| | | | | | The type system requires that the number of vector elements should fit in 32-bits so this should be safe. llvm-svn: 293414
* [DAGCombiner] Remove unnecessary check on the size of the type of the index ↵Craig Topper2017-01-291-3/+1
| | | | | | | | of EXTRACT_SUBVECTOR. The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue. llvm-svn: 293413
* [DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before ↵Craig Topper2017-01-291-9/+9
| | | | | | trying to use getConstantOperandVal. llvm-svn: 293412
* Add support to dump dot graph block layout after MBPXinliang David Li2017-01-294-5/+65
| | | | | | Differential Revision: https://reviews.llvm.org/D29141 llvm-svn: 293408
* Use print() instead of dump() in codeMatthias Braun2017-01-282-2/+8
| | | | llvm-svn: 293371
* [RegisterBankInfo] Emit proper type for remapped registers.Quentin Colombet2017-01-281-3/+25
| | | | | | | | | | | | | | | | When the OperandsMapper creates virtual registers, it used to just create plain scalar register with the right size. This may confuse the instruction selector because we lose the information of the instruction using those registers what supposed to do. The MachineVerifier complains about that already. With this patch, the OperandsMapper still creates plain scalar register, but the expectation is for the mapping function to remap the type properly. The default mapping function has been updated to do that. rdar://problem/30231850 llvm-svn: 293362
OpenPOWER on IntegriCloud