path: root/llvm/lib/CodeGen
Commit message | Author | Age | Files | Lines
* [DAGCombine] require UnsafeFPMath for re-association of addition | Nicolai Haehnle | 2017-01-31 | 1 | -6/+18
  Summary: The affected transforms all implicitly use associativity of addition, for which we usually require unsafe math to be enabled. The "Aggressive" flag is only meant to convey information about the performance of the fused ops relative to a fmul+fadd sequence.
  Fixes Bug 31626.
  Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD
  Subscribers: jholewinski, nemanjai, wdng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28675
  llvm-svn: 293635
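Illustration (not part of the commit): a minimal sketch of why re-associating floating-point addition is treated as an unsafe-math transform. The function names `sumLeft` and `sumRight` are made up for this example; the only claim is that the two groupings can round differently.

```cpp
#include <cassert>

// (a + b) + c: the grouping the source program asked for.
inline double sumLeft(double a, double b, double c) { return (a + b) + c; }

// a + (b + c): the grouping a re-associating transform would produce.
inline double sumRight(double a, double b, double c) { return a + (b + c); }
```

With a = 1e30, b = -1e30 and c = 1.0, the first grouping cancels the large terms before adding 1.0, while in the second the 1.0 is absorbed by rounding when it is added to -1e30, so the two results differ.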
* [ExecutionDepsFix] Improve clearance calculation for loops | Keno Fischer | 2017-01-30 | 1 | -85/+181
  Summary: In revision rL278321, ExecutionDepsFix learned how to pick a better register for undef register reads, e.g. for instructions such as `vcvtsi2sdq`. While this revision improved performance on a good number of our benchmarks, it unfortunately also caused significant regressions (up to 3x) on others. This regression turned out to be caused by loops such as:
    PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT
          ^                                  |
          +----------------------------------+
  In the previous version of the clearance calculation, we would visit the blocks in order, remembering for each whether there were any incoming backedges from blocks that we hadn't processed yet and if so queuing up the block to be re-processed. However, for loop structures such as the above, this is clearly insufficient, since the block B does not have any unknown backedges, so we do not see the false dependency from the previous iteration's Def of xmm registers in B.
  To fix this, we need to consider all blocks that are part of the loop and reprocess them once the correct clearance values are known. As an optimization, we also want to avoid reprocessing any later blocks that are not part of the loop.
  In summary, the iteration order is as follows:
    Before:                      PH A B C D A'
    Corrected (Naive):           PH A B C D A' B' C' D'
    Corrected (w/ optimization): PH A B C A' B' C' D
  To facilitate this optimization we introduce two new counters for each basic block. The first counts how many of its predecessors have completed primary processing. The second counts how many of its predecessors have completed all processing (we will call such a block *done*).
  Now, the criterion to reprocess a block is as follows:
  - All predecessors have completed primary processing
  - For x the number of predecessors that have completed primary processing *at the time of primary processing of this block*, the number of predecessors that are done has reached x.
  The intuition behind this criterion is as follows: We need to perform primary processing on all predecessors in order to find out any direct defs in those predecessors. When predecessors are done, we also know that we have information about indirect defs (e.g. in block B though that were inherited through B->C->A->B). However, we can't wait for all predecessors to be done, since that would cause cyclic dependencies. It is guaranteed, though, that all those predecessors that are prior to us in reverse postorder will be done before us. Since we iterate over the basic blocks in reverse postorder, the number x above is precisely the count of the number of predecessors prior to us in reverse postorder.
  Reviewers: myatsina
  Differential Revision: https://reviews.llvm.org/D28759
  llvm-svn: 293571
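Illustration (not part of the commit): the two-counter criterion described above can be sketched as follows. This is a simplified model under stated assumptions, not the pass's actual code; the struct `BlockInfo`, its field names, and the helpers `finishPrimary` and `readyForFinalPass` are all hypothetical.

```cpp
#include <cassert>

// Hypothetical per-block bookkeeping for the criterion described above.
struct BlockInfo {
  unsigned NumPreds = 0;          // total number of predecessors
  bool PrimaryCompleted = false;  // this block finished its primary pass
  unsigned IncomingProcessed = 0; // predecessors that finished primary processing
  unsigned PrimaryIncoming = 0;   // the "x" snapshot taken at this block's primary pass
  unsigned IncomingCompleted = 0; // predecessors that are done
};

// Called when the block's own primary pass finishes: remember how many
// predecessors had already completed primary processing at that moment.
inline void finishPrimary(BlockInfo &B) {
  B.PrimaryCompleted = true;
  B.PrimaryIncoming = B.IncomingProcessed;
}

// Both criteria from the text: every predecessor has had primary processing,
// and the number of done predecessors has reached the snapshot x.
inline bool readyForFinalPass(const BlockInfo &B) {
  return B.PrimaryCompleted && B.IncomingProcessed == B.NumPreds &&
         B.IncomingCompleted >= B.PrimaryIncoming;
}
```

For the loop header A in the diagram (predecessors PH and C), x is 1 at A's primary pass because only PH precedes A in reverse postorder. A becomes eligible once PH is done and C has had its primary pass, without waiting for C itself to be done.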
* GlobalISel: correctly translate invoke when callee is a register. | Tim Northover | 2017-01-30 | 1 | -1/+5
  This should fix the GlobalISel verifier.
  llvm-svn: 293550
* GlobalISel: account for differing exception selector sizes. | Tim Northover | 2017-01-30 | 1 | -1/+10
  For some reason the exception selector register must be a pointer (that's assumed by SDag); on the other hand, it gets moved into an IR-level type which might be entirely different (i32 on AArch64). IRTranslator needs to be aware of this.
  llvm-svn: 293546
* GlobalISel: tidy up def/use test. NFC. | Tim Northover | 2017-01-30 | 1 | -2/+2
  llvm-svn: 293545
* GlobalISel: translate memset & memmove. | Tim Northover | 2017-01-30 | 1 | -9/+25
  llvm-svn: 293541
* GlobalISel: permit unused vregs without a register-class after ISel. | Tim Northover | 2017-01-30 | 1 | -5/+9
  This can happen if earlier combining has removed all uses of some VReg, which is fine and shouldn't flag an error.
  llvm-svn: 293537
* Use SelectionDAG::getBuildVector helper function where possible. NFCI. | Simon Pilgrim | 2017-01-30 | 2 | -21/+19
  llvm-svn: 293532
* SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced | Justin Bogner | 2017-01-30 | 1 | -1/+11
  Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we happened to replace a node during UpdateChains, because it would be left in the list we were iterating over. This nulls out the pointer when that happens so that we can avoid the issue.
  Fixes llvm.org/PR31710
  llvm-svn: 293522
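Illustration (not part of the commit): a generic sketch of the "null out the stale pointer" technique the message describes, using a plain `std::vector`. The `Node` type and the helpers `dropReplaced` and `countLive` are hypothetical and do not mirror the SelectionDAG data structures.

```cpp
#include <cassert>
#include <vector>

struct Node { int Id; };

// Clear every slot that still refers to the replaced node, so code that is
// iterating over the list never dereferences a stale pointer.
inline void dropReplaced(std::vector<Node *> &Chain, const Node *Replaced) {
  for (Node *&Slot : Chain)
    if (Slot == Replaced)
      Slot = nullptr;
}

// Consumers of the list skip the nulled-out slots.
inline unsigned countLive(const std::vector<Node *> &Chain) {
  unsigned Live = 0;
  for (const Node *N : Chain)
    if (N)
      ++Live;
  return Live;
}
```

Nulling the slot rather than erasing it keeps indices and iterators into the list valid while the traversal is still in progress.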
* Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where possible. NFCI. | Simon Pilgrim | 2017-01-30 | 1 | -7/+3
  llvm-svn: 293520
* DAG: Fold fneg into compare with constant into the constant | Matt Arsenault | 2017-01-30 | 1 | -0/+10
    fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred)
  InstCombine already does this.
  llvm-svn: 293512
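Illustration (not part of the commit): the fold above rests on the identity that negating both sides of an ordered comparison swaps its direction. A small sketch for the less-than case, with hypothetical helper names:

```cpp
#include <cassert>

// Before the fold: compare the negated value against the constant.
inline bool cmpBefore(double X, double C) { return -X < C; }

// After the fold: compare the original value against the negated constant,
// with the predicate swapped from "less than" to "greater than".
inline bool cmpAfter(double X, double C) { return X > -C; }
```

Negation is exact in IEEE arithmetic, so the two forms agree for every ordered pair; for NaN inputs both ordered comparisons are false.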
* unique_ptrify some containers in GlobalISel::RegisterBankInfo | David Blaikie | 2017-01-30 | 1 | -19/+9
  To simplify/clarify memory ownership, make leaks (as one was found/fixed recently) harder to write, etc. (also, while I was there - removed a duplicate lookup in a container)
  llvm-svn: 293506
* DAG: Constant fold fp16_to_fp/fp16_to_fp | Matt Arsenault | 2017-01-30 | 1 | -0/+19
  This fixes emitting conversions of constants on targets without legal f16 that need to use these for legalization.
  llvm-svn: 293499
* [GlobalISel] Add support for indirectbr | Kristof Beyls | 2017-01-30 | 2 | -0/+19
  Differential Revision: https://reviews.llvm.org/D28079
  llvm-svn: 293470
* MachineInstr: Remove parameter from dump() | Matthias Braun | 2017-01-29 | 2 | -3/+5
  The primary use of the dump() functions in LLVM is for use in a debugger. Unfortunately lldb does not seem to handle default arguments so using `p SomeMI.dump()` fails and you have to type the longer `p SomeMI.dump(nullptr)`. Remove the parameter to make the most common use easy. (You can always construct something like `p SomeMI.print(dbgs(),MyTII)` if you need more features).
  Differential Revision: https://reviews.llvm.org/D29241
  llvm-svn: 293440
* [SelectionDAG] Make SDNode::getConstantOperandVal an inline method. | Craig Topper | 2017-01-29 | 1 | -5/+0
  Its operation already exists manually in many places without using the method.
  llvm-svn: 293421
* [DAGCombiner] Use unsigned for a constant vector index instead of APInt. | Craig Topper | 2017-01-29 | 1 | -2/+2
  The type system requires that the number of vector elements should fit in 32-bits so this should be safe.
  llvm-svn: 293414
* [DAGCombiner] Remove unnecessary check on the size of the type of the index of EXTRACT_SUBVECTOR. | Craig Topper | 2017-01-29 | 1 | -3/+1
  The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue.
  llvm-svn: 293413
* [DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying to use getConstantOperandVal. | Craig Topper | 2017-01-29 | 1 | -9/+9
  llvm-svn: 293412
* Add support to dump dot graph block layout after MBP | Xinliang David Li | 2017-01-29 | 4 | -5/+65
  Differential Revision: https://reviews.llvm.org/D29141
  llvm-svn: 293408
* Use print() instead of dump() in code | Matthias Braun | 2017-01-28 | 2 | -2/+8
  llvm-svn: 293371
* [RegisterBankInfo] Emit proper type for remapped registers. | Quentin Colombet | 2017-01-28 | 1 | -3/+25
  When the OperandsMapper creates virtual registers, it used to just create plain scalar registers with the right size. This may confuse the instruction selector because we lose the information about what the instruction using those registers is supposed to do. The MachineVerifier complains about that already.
  With this patch, the OperandsMapper still creates plain scalar registers, but the expectation is for the mapping function to remap the type properly. The default mapping function has been updated to do that.
  rdar://problem/30231850
  llvm-svn: 293362
* Cleanup dump() functions. | Matthias Braun | 2017-01-28 | 28 | -67/+113
  We had various variants of defining dump() functions in LLVM. Normalize them (this should just consistently implement the things discussed in http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html).
  For reference:
  - Public headers should just declare the dump() method but not use LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  - The definition of a dump method should look like this:
      #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
      LLVM_DUMP_METHOD void MyClass::dump() {
        // print stuff to dbgs()...
      }
      #endif
  llvm-svn: 293359
* [RegisterCoalescing] Recommit the patch "Remove partial redundent copy". | Quentin Colombet | 2017-01-28 | 1 | -0/+188
  In r292621, the recommit fixes a bug related to the live interval update after the partially redundant copy is moved.
  This recommit solves an additional bug related to the lack of update of subranges.
  The original patch is to solve the performance problem described in PR27827. Register coalescing sometimes cannot remove a copy because of interference. But if we can find a reverse copy in one of the predecessor blocks of the copy, the copy is partially redundant and we may remove the copy partially by moving it to the predecessor block without the reverse copy.
  Differential Revision: https://reviews.llvm.org/D28585
  Re-apply r292621
  Revert "Revert rL292621. Caused some internal build bot failures in apple."
  This reverts commit r292984.
  Original patch: Wei Mi <wmi@google.com>
  Subrange fix: Mostly Matthias Braun <matze@braunis.de>
  llvm-svn: 293353
* Fix memory leak in globalisel. | Evgeniy Stepanov | 2017-01-28 | 1 | -0/+2
    #0 0x89cdeb in operator new[](unsigned long) /code/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:84:37
    #1 0x4ec87c4 in llvm::RegisterBankInfo::ValueMapping const* llvm::RegisterBankInfo::getOperandsMapping<llvm::RegisterBankInfo::ValueMapping const* const*>(llvm::RegisterBankInfo::ValueMapping const* const*, llvm::RegisterBankInfo::ValueMapping const* const*) const /code/llvm/lib/CodeGen/GlobalISel/RegisterBankInfo.cpp:297:9
    #2 0x9327ee in llvm::AArch64RegisterBankInfo::getInstrMapping(llvm::MachineInstr const&) const /code/llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp:540:30
    #3 0x4eb8d07 in llvm::RegBankSelect::assignInstr(llvm::MachineInstr&) /code/llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp:546:24
    #4 0x4eb9dd2 in llvm::RegBankSelect::runOnMachineFunction(llvm::MachineFunction&) /code/llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp:624:12
    #5 0x3141875 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /code/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:13
    #6 0x396128d in llvm::FPPassManager::runOnFunction(llvm::Function&) /code/llvm/lib/IR/LegacyPassManager.cpp:1513:27
    #7 0x3961832 in llvm::FPPassManager::runOnModule(llvm::Module&) /code/llvm/lib/IR/LegacyPassManager.cpp:1534:16
    #8 0x3962540 in runOnModule /code/llvm/lib/IR/LegacyPassManager.cpp:1590:27
    #9 0x3962540 in llvm::legacy::PassManagerImpl::run(llvm::Module&) /code/llvm/lib/IR/LegacyPassManager.cpp:1693
    #10 0x8ae368 in compileModule(char**, llvm::LLVMContext&) /code/llvm/tools/llc/llc.cpp:562:8
    #11 0x8a7a1b in main /code/llvm/tools/llc/llc.cpp:316:22
  llvm-svn: 293351
* GlobalISel: don't leak super-entry BB when merging with IR-level one. | Tim Northover | 2017-01-27 | 1 | -0/+1
  We have to delete the block manually or it leaks. That triggers failures in -fsanitize=leak bots (unsurprisingly), which should be fixed by this patch.
  llvm-svn: 293347
* GlobalISel: set correct regclass for LOAD_STACK_GUARD. | Tim Northover | 2017-01-27 | 1 | -0/+2
  Since it's not actually a generic MI, its register operands need a RegClass, which is conveniently the target's pointer RegClass.
  llvm-svn: 293335
* GlobalISel: mark incoming landing-pad registers as live. | Tim Northover | 2017-01-27 | 1 | -0/+2
  Should fix machine verifier failures.
  llvm-svn: 293334
* ScheduleDAGInstrs: Do not try to toggle kill flags on debug uses | Matthias Braun | 2017-01-27 | 1 | -0/+3
  Preparation for upcoming changes. No testcase as none of the public targets bundles early enough and has a post machine scheduler enabled at the same time. The error is also easily caught by asserts.
  llvm-svn: 293324
* ScheduleDAGInstrs: Cleanup toggleKillFlag(); NFC | Matthias Braun | 2017-01-27 | 1 | -11/+10
  llvm-svn: 293323
* ScheduleDAGInstrs: Cleanup; NFC | Matthias Braun | 2017-01-27 | 1 | -69/+45
  Comment, doxygen and a bit of whitespace cleanup.
  llvm-svn: 293322
* [CodeGenPrep]No negative cost in the ExtLd promotion | Jun Bum Lim | 2017-01-27 | 1 | -1/+4
  Summary: This change prevents the signed value of cost from being negative as the value is passed as an unsigned argument.
  Reviewers: mcrosier, jmolloy, qcolombet, javed.absar
  Reviewed By: mcrosier, qcolombet
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D28871
  llvm-svn: 293307
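Illustration (not part of the commit): a sketch of the hazard the message describes, where a negative signed cost silently becomes a very large value once it is passed as an unsigned argument. The helpers `consumeCost` and `clampedCost` are hypothetical, and clamping at zero is only one plausible way to avoid the problem; the actual patch may do it differently.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical consumer that, like the callee in the message, takes the
// cost as an unsigned value.
inline unsigned consumeCost(unsigned Cost) { return Cost; }

// Clamp the signed cost at zero before it crosses into unsigned territory.
inline unsigned clampedCost(int SignedCost) {
  return consumeCost(static_cast<unsigned>(std::max(0, SignedCost)));
}
```

Without the clamp, a cost of -3 would convert to a value close to the maximum of `unsigned`, which a cost comparison would read as "extremely expensive" instead of "free".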
* [DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert(). | Jonas Paulsson | 2017-01-27 | 1 | -0/+9
  In case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only a partial number of available vector elements), WidenVecRes_Convert() used to resort to scalarization.
  This patch adds a handling of the (common) case where an input vector can be found of same width as the widened result vector, by converting the node to SIGN/ZERO_EXTEND_VECTOR_INREG.
  Review: Eli Friedman
  llvm-svn: 293268
* GlobalISel: support debug intrinsics. | Tim Northover | 2017-01-26 | 2 | -5/+115
  The translation scheme is mostly cribbed from FastISel, and it's not entirely convincing semantically. But it does seem to work in the common cases and allow variables to be printed so it can't be all wrong.
  llvm-svn: 293228
* Add intrinsics for constrained floating point operations | Andrew Kaylor | 2017-01-26 | 3 | -0/+108
  This commit introduces a set of experimental intrinsics intended to prevent optimizations that make assumptions about the rounding mode and floating point exception behavior. These intrinsics will later be extended to specify flush-to-zero behavior. More work is also required to model instruction dependencies in machine code and to generate these instructions from clang (when required by pragmas and/or command line options that are not currently supported).
  Differential Revision: https://reviews.llvm.org/D27028
  llvm-svn: 293226
* [IfConversion] Use reverse_iterator to simplify. NFC | Kyle Butt | 2017-01-26 | 1 | -70/+35
  This simplifies skipping debug instructions and shrinking ranges.
  llvm-svn: 293202
* Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." | Nirav Dave | 2017-01-26 | 2 | -396/+370
  This reverts commit r293184 which is failing in LTO builds
  llvm-svn: 293188
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. | Nirav Dave | 2017-01-26 | 2 | -370/+396
  * Simplify Consecutive Merge Store Candidate Search
  Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic.
  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited.
  This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation).
  Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
  Additional Minor Changes:
  1. Finishes removing unused AliasLoad code
  2. Unifies the chain aggregation in the merged stores across code paths
  3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
  6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
  7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
  8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
  Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
  CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  llvm-svn: 293184
* [DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting undef subvectors. | Craig Topper | 2017-01-26 | 1 | -0/+8
  llvm-svn: 293152
* New OptimizationRemarkEmitter pass for MIR | Adam Nemet | 2017-01-25 | 4 | -0/+191
  This allows MIR passes to emit optimization remarks with the same level of functionality that is available to IR passes.
  It also hooks up the greedy register allocator to report spills. This allows for interesting use cases like increasing interleaving on a loop until spilling of registers is observed.
  I still need to experiment whether reporting every spill scales but this demonstrates for now that the functionality works from llc using -pass-remarks*=<pass>.
  Differential Revision: https://reviews.llvm.org/D29004
  llvm-svn: 293110
* SDag: fix how initial loads are formed when splitting vector ops. | Tim Northover | 2017-01-25 | 1 | -1/+4
  Later code expects the vector loads produced to be directly concatenable, which means we shouldn't pad anything except the last load produced with UNDEF.
  llvm-svn: 293088
* GlobalISel: rework getOrCreateVReg to avoid double lookup. NFC. | Tim Northover | 2017-01-25 | 1 | -20/+20
  Thanks to Quentin for suggesting the refactoring.
  llvm-svn: 293087
* DebugInfo: remove unused parameter from function. NFC. | Tim Northover | 2017-01-25 | 2 | -4/+2
  I think it's a hold-over from some previous iteration, but it's never set to true in LLVM as it exists now.
  llvm-svn: 293086
* Add iterator_range<regclass_iterator> to {Target,MC}RegisterInfo, NFC | Krzysztof Parzyszek | 2017-01-25 | 5 | -32/+16
  llvm-svn: 293077
* Revert "Do not verify dominator tree if it has no roots" | Chad Rosier | 2017-01-25 | 1 | -4/+0
  This reverts commit r293033, per Danny's comment. In short, we require domtrees to have roots at all times.
  llvm-svn: 293075
* Fix buildbot failures introduced by 293036 | Artur Pilipenko | 2017-01-25 | 1 | -2/+5
  Fix unused variable, specify types explicitly to make VC compiler happy.
  llvm-svn: 293039
* [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2. | Artur Pilipenko | 2017-01-25 | 1 | -0/+268
  The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html
  This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows stopping the traversal earlier if we know that the result won't be used.
  From the original commit:
  Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it.
  Assuming little endian target:
    i8 *a = ...
    i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
    =>
    i32 val = *((i32)a)

    i8 *a = ...
    i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
    =>
    i32 val = BSWAP(*((i32)a))
  This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.
  Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
    i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)
  Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.
  The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory, verify that they are adjacent and match with little or big endian encoding of a wider value. If so, and the load of the wider type (and bswap if needed) is allowed by the target, generate a load and a bswap if needed.
  Reviewed By: RKSimon, filcab, chandlerc
  Differential Revision: https://reviews.llvm.org/D27861
  llvm-svn: 293036
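Illustration (not part of the commit): the source-level idiom that the combine recognizes, written as portable C++. The function names `loadLE32`, `loadBE32` and `byteSwap32` are made up for this sketch; it shows only the byte-by-byte pattern and its relation to a byte swap, not the DAG matching itself.

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 32-bit value from four bytes, least significant byte first.
// On a little-endian target this equals a single 32-bit load.
inline uint32_t loadLE32(const uint8_t *A) {
  return uint32_t(A[0]) | (uint32_t(A[1]) << 8) | (uint32_t(A[2]) << 16) |
         (uint32_t(A[3]) << 24);
}

// Assemble the same bytes most significant byte first. On a little-endian
// target this equals a 32-bit load followed by a byte swap.
inline uint32_t loadBE32(const uint8_t *A) {
  return (uint32_t(A[0]) << 24) | (uint32_t(A[1]) << 16) |
         (uint32_t(A[2]) << 8) | uint32_t(A[3]);
}

// Reverse the order of the four bytes in a 32-bit value.
inline uint32_t byteSwap32(uint32_t V) {
  return (V >> 24) | ((V >> 8) & 0x0000FF00u) | ((V << 8) & 0x00FF0000u) |
         (V << 24);
}
```

The two assembly orders differ exactly by a byte swap, which is why the big-endian pattern can be folded to a wide load plus BSWAP on a little-endian target.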
* Do not verify dominator tree if it has no roots | Serge Pavlov | 2017-01-25 | 1 | -0/+4
  If dominator tree has no roots, the pass that calculates it is likely to be skipped. It occurs, for instance, in the case of entities with linkage available_externally. Do not run tree verification in such case.
  Differential Revision: https://reviews.llvm.org/D28767
  llvm-svn: 293033
* DAG: Recognize no-signed-zeros-fp-math attribute | Matt Arsenault | 2017-01-25 | 1 | -1/+2
  clang already emits this with -cl-no-signed-zeros, but codegen doesn't do anything with it. Treat it like the other fast math attributes, and change one place to use it.
  llvm-svn: 293024
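Illustration (not part of the commit): a sketch of why folds that ignore the sign of zero need a "no signed zeros" guarantee. The example uses the classic `x + 0.0 -> x` rewrite, which is an assumption chosen for illustration and not necessarily the place this commit changed; `addZero` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// The expression a compiler would like to simplify to just X.
inline double addZero(double X) { return X + 0.0; }
```

Under the default rounding mode, `-0.0 + 0.0` evaluates to `+0.0`, so replacing `x + 0.0` with `x` changes the sign bit when `x` is `-0.0`. The rewrite is therefore only valid when the program has promised that it does not care about the sign of zero.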
* GlobalISel: Fix typo in error message | Justin Bogner | 2017-01-25 | 1 | -1/+1
  llvm-svn: 293023