path: root/llvm/test/CodeGen/Mips
Commit message | Author | Date | Files | Lines (-/+)
...
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2017-02-2612-219/+220
| | | | | | | | UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2017-02-25, 12 files, -220/+219)

  Recommitting after fixup of a 32-bit aliasing sign-offset bug in DAGCombiner.

  * Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads but then promote them to float operations, which requires more expensive constant generation).

  Some minor peephole optimizations were added to deal with the improved SubDAG shapes (listed below).

  Additional minor changes:

  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Removes Chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence.
  6. Forwards load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values.
  7. Adds a peephole to convert a buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll).
  8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding.

  One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll. Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

  llvm-svn: 296252
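  A minimal LLVM IR sketch (not part of the commit; names are illustrative) of the kind of pattern the relaxed chain analysis lets DAGCombiner merge: two adjacent 32-bit stores that become candidates for a single wider store once they are proven not to interfere with other memory operations.

      define void @store_pair(i32* %p) {
      entry:
        ; Adjacent stores to %p and %p+1; with parallel-store chain analysis
        ; these are candidates for merging into one wider store.
        %p1 = getelementptr inbounds i32, i32* %p, i32 1
        store i32 0, i32* %p, align 8
        store i32 0, i32* %p1, align 4
        ret void
      }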
* Recommit "[mips] Fix atomic compare and swap at O0."Simon Dardis2017-02-242-3/+167
| | | | | | | | | | | | | | | | | | | | | | This time with the missing files. Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs in between an ll and sc, the store can cause the atomic sequence to fail. This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos. This resolves PR/32020. Thanks to James Cowgill for reporting the issue! Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30257 llvm-svn: 296134
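  A minimal IR sketch (assumed shape, not the committed test) of the compare-and-swap affected at -O0: the ll/sc loop emitted for the cmpxchg must not have a register spill inserted between the ll and the sc.

      define i32 @cas(i32* %ptr, i32 %old, i32 %new) {
      entry:
        ; Lowered on MIPS to an ll/sc retry loop; a spill store placed inside
        ; the loop would make the sc fail on every iteration.
        %pair = cmpxchg i32* %ptr, i32 %old, i32 %new seq_cst seq_cst
        %val = extractvalue { i32, i1 } %pair, 0
        ret i32 %val
      }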
* Revert "[mips] Fix atomic compare and swap at O0."Simon Dardis2017-02-241-11/+3
| | | | | | This reverts r296132. I forgot to include the tests. llvm-svn: 296133
* [mips] Fix atomic compare and swap at O0. (Simon Dardis, 2017-02-24, 1 file, -3/+11)

  Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs between an ll and an sc, the store can cause the atomic sequence to fail. This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos.

  This resolves PR/32020. Thanks to James Cowgill for reporting the issue!

  Reviewers: slthakur

  Differential Revision: https://reviews.llvm.org/D30257

  llvm-svn: 296132
* [LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el (Sagar Thakur, 2017-02-15, 2 files, -0/+178)

  Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit.

  Reviewed by sdardis, dberris

  Differential: D27697

  llvm-svn: 295164
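  A minimal IR sketch of how a function opts in to XRay sleds; the attribute shown is the generic XRay mechanism, not something introduced by this commit, and the function name is illustrative.

      define i32 @always_patched(i32 %x) "function-instrument"="xray-always" {
      entry:
        ; The backend emits patchable entry/exit sleds for this function.
        ret i32 %x
      }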
* [LLC] Add an inline assembly diagnostics handler. (Sanne Wouda, 2017-02-03, 1 file, -1/+1)

  Summary: llc would hit a fatal error for errors in inline assembly. The diagnostics message is now printed.

  Reviewers: rengolin, MatzeB, javed.absar, anemet

  Reviewed By: anemet

  Subscribers: jyknight, nemanjai, llvm-commits

  Differential Revision: https://reviews.llvm.org/D29408

  llvm-svn: 293999
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2017-02-0212-219/+220
| | | | | | | | | UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2017-02-02, 12 files, -220/+219)

  Recommitting after fixing an X86 inc/dec chain bug.

  * Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads but then promote them to float operations, which requires more expensive constant generation).

  Some minor peephole optimizations were added to deal with the improved SubDAG shapes (listed below).

  Additional minor changes:

  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Removes Chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence.
  6. Forwards load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values.
  7. Adds a peephole to convert a buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll).
  8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding.

  One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll. Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

  llvm-svn: 293893
* CodeGen: Allow small copyable blocks to "break" the CFG. (Kyle Butt, 2017-01-31, 2 files, -3/+4)

  When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability favoring the other direction. For small blocks that can be duplicated we now skip that requirement as well, subject to some simple frequency calculations.

  Differential Revision: https://reviews.llvm.org/D28583

  llvm-svn: 293716
* [mips] Recommit: "N64 static relocation model support" (Simon Dardis, 2017-01-27, 37 files, -212/+282)

  This patch makes one change to GOT handling and two changes to N64's relocation model handling. Furthermore, the jumptable encodings have been corrected for static N64.

  Big GOT handling is now done via a new SDNode, MipsGotHi; this node is unconditionally lowered to an lui instruction.

  The first change to N64's relocation handling is the lifting of the restriction that N64 always uses PIC. Now it is possible to target static environments.

  The second change adds support for 64-bit symbols and enables them by default. Previously N64 had patterns for sym32 mode only. In this mode all symbols are assumed to have 32-bit addresses. sym32 mode support is selectable with attribute 'sym32'. A follow-on patch for clang will add the necessary frontend parameter.

  This partially resolves PR/23485. Thanks to Brooks Davis for reporting the issue!

  This version corrects a "Conditional jump or move depends on uninitialised value(s)" error detected by valgrind that was present in the original commit.

  Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

  Differential Revision: https://reviews.llvm.org/D23652

  llvm-svn: 293279
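  A small IR sketch (illustrative, not from the commit) of code that can now be compiled for static N64, e.g. with llc -mtriple=mips64-unknown-linux-gnu -relocation-model=static; the global name is made up.

      @counter = global i32 0

      define i32* @addr_of_counter() {
      entry:
        ; Under the static N64 model this address can be materialized directly
        ; (as a 64-bit symbol by default, or 32-bit in sym32 mode) instead of
        ; being loaded from the GOT.
        ret i32* @counter
      }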
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2017-01-2611-212/+212
| | | | | | | | UseAA is enabled." This reverts commit r293184 which is failing in LTO builds llvm-svn: 293188
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2017-01-26, 11 files, -212/+212)

  * Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads but then promote them to float operations, which requires more expensive constant generation).

  Some minor peephole optimizations were added to deal with the improved SubDAG shapes (listed below).

  Additional minor changes:

  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Removes Chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence.
  6. Forwards load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values.
  7. Adds a peephole to convert a buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll).
  8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding.

  One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll. Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

  llvm-svn: 293184
* Revert "[mips] N64 static relocation model support"Simon Dardis2017-01-2637-282/+212
| | | | | | This reverts commit r293164. There are multiple tests failing. llvm-svn: 293170
* [mips] N64 static relocation model support (Simon Dardis, 2017-01-26, 37 files, -212/+282)

  This patch makes one change to GOT handling and two changes to N64's relocation model handling. Furthermore, the jumptable encodings have been corrected for static N64.

  Big GOT handling is now done via a new SDNode, MipsGotHi; this node is unconditionally lowered to an lui instruction.

  The first change to N64's relocation handling is the lifting of the restriction that N64 always uses PIC. Now it is possible to target static environments.

  The second change adds support for 64-bit symbols and enables them by default. Previously N64 had patterns for sym32 mode only. In this mode all symbols are assumed to have 32-bit addresses. sym32 mode support is selectable with attribute 'sym32'. A follow-on patch for clang will add the necessary frontend parameter.

  This partially resolves PR/23485. Thanks to Brooks Davis for reporting the issue!

  Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

  Differential Revision: https://reviews.llvm.org/D23652

  llvm-svn: 293164
* Fix some broken CHECK lines. (Benjamin Kramer, 2017-01-22, 1 file, -2/+2)

  The colon is important.

  llvm-svn: 292761
* Reverted: Track validity of pass results (Serge Pavlov, 2017-01-15, 1 file, -1/+1)

  Commits r291882 and related r291887.

  llvm-svn: 292062
* Track validity of pass results (Serge Pavlov, 2017-01-13, 1 file, -1/+1)

  Running tests with expensive checks enabled exhibits some problems with verification of pass results.

  First, the pass verification may require results of analyses that are not available. For instance, verification of loop info requires results of dominator tree analysis. A pass may be marked as preserving loop info but need not be dependent on DominatorTreePass. When a pass manager tries to verify that loop info is valid, it needs the dominator tree, but the corresponding analysis may already be destroyed because no user of it remained.

  Another case is a pass that is skipped. For instance, entities with linkage available_externally do not need code generation and such passes are skipped for them. In this case result verification must also be skipped.

  To solve these problems this change introduces a special flag to the Pass structure to mark passes that have valid results. If this flag is reset, verifications dependent on the pass result are skipped.

  Differential Revision: https://reviews.llvm.org/D27190

  llvm-svn: 291882
* Revert "CodeGen: Allow small copyable blocks to "break" the CFG."Kyle Butt2017-01-117-20/+17
| | | | | | | | | This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded. This needs a simple probability check because there are some cases where it is not profitable. llvm-svn: 291695
* CodeGen: Allow small copyable blocks to "break" the CFG. (Kyle Butt, 2017-01-10, 7 files, -17/+20)

  When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability favoring the other direction. For small blocks that can be duplicated we now skip that requirement as well.

  Differential revision: https://reviews.llvm.org/D27742

  llvm-svn: 291609
* DAG: Avoid OOB when legalizing vector indexing (Matt Arsenault, 2017-01-10, 1 file, -1/+2)

  If a vector index is out of bounds, the result is supposed to be undefined but is not undefined behavior. Change the legalization for indexing the vector on the stack so that an out-of-bounds index does not create an out-of-bounds memory access.

  llvm-svn: 291604
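  A minimal IR sketch of the case being legalized: a variable index that may be out of bounds yields an undefined value, but the stack-slot expansion must not perform an out-of-bounds memory access.

      define i32 @extract_var(<4 x i32> %v, i32 %idx) {
      entry:
        ; If %idx >= 4 the result is undefined, yet the legalized code must
        ; only touch memory inside the vector's stack slot.
        %e = extractelement <4 x i32> %v, i32 %idx
        ret i32 %e
      }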
* [mips] Fix Mips MSA intrinsics (Simon Dardis, 2017-01-10, 2 files, -0/+2957)

  The usage of some MIPS MSA intrinsics that took immediates could crash LLVM during lowering. This patch addresses that behaviour.

  Crucially, this patch also makes the use of intrinsics with out-of-range immediates produce an internal error. The ld/st intrinsics would trigger an assertion failure for MIPS64 as their lowering would attempt to add an i32 offset to an i64 pointer.

  Reviewers: vkalintiris, slthakur

  Differential Revision: https://reviews.llvm.org/D25438

  llvm-svn: 291571
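  A minimal IR sketch of an MSA intrinsic taking an immediate, the kind of input this patch validates; the exact intrinsic signature shown is an assumption and the function name is illustrative.

      declare <16 x i8> @llvm.mips.ld.b(i8*, i32)

      define <16 x i8> @msa_load(i8* %p) {
      entry:
        ; The i32 operand is an immediate offset; out-of-range values now
        ; produce a diagnostic instead of crashing during lowering.
        %v = call <16 x i8> @llvm.mips.ld.b(i8* %p, i32 16)
        ret <16 x i8> %v
      }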
* [mips] Honour -mno-odd-spreg for vector splat (again) (Simon Dardis, 2017-01-10, 1 file, -0/+55)

  Previously the lowering of FILL_FW would use the MSA128W register class when performing a vector splat. Instead it should honour -mno-odd-spreg and only use the even registers when performing a splat from word to vector register.

  Logical follow-on from r230235.

  This fixes PR/31369. A previous commit was missing the test case and had another differential in it.

  Reviewers: slthakur

  Differential Revision: https://reviews.llvm.org/D28373

  llvm-svn: 291566
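  A minimal IR sketch (not the committed test) of the word-to-vector splat that FILL_FW implements; with -mno-odd-spreg the single-precision source must be kept in an even register.

      define <4 x float> @splat_f32(float %x) {
      entry:
        %ins = insertelement <4 x float> undef, float %x, i32 0
        ; Splat lane 0 into all four lanes.
        %spl = shufflevector <4 x float> %ins, <4 x float> undef, <4 x i32> zeroinitializer
        ret <4 x float> %spl
      }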
* [mips] Fix compact branch hazard detection, part 2 (Petar Jovanovic, 2016-12-22, 1 file, -0/+34)

  Follow-up to the D27209 fix; this patch now properly handles a single transient instruction in a basic block.

  Patch by Aleksandar Beserminji.

  Differential Revision: https://reviews.llvm.org/D27856

  llvm-svn: 290361
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-12-1411-212/+212
| | | | | | | | | | UseAA is enabled." Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2016-12-14, 11 files, -212/+212)

  Retrying after removing load-store factoring through token factors in favor of improved token factor operand pruning.

  Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a simplified store-merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).

  Additional minor changes:

  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations.

  Noteworthy tests:

  CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.

  CodeGen/AArch64/arm64-memset-inline.ll, CodeGen/AArch64/arm64-stur.ll, CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.

  CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?

  CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.

  CodeGen/X86/dag-merge-fast-accesses.ll, CodeGen/X86/MergeConsecutiveStores.ll, CodeGen/X86/stores-merging.ll, CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores.

  CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls.

  CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores.

  CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.

  CodeGen/X86/vector-idiv.ll, CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing, but it looks like the code got better due to the memory operations being recognized as non-aliasing.

  CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

  Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

  Differential Revision: https://reviews.llvm.org/D14834

  llvm-svn: 289659
* [mips] Fix compact branch hazard detection (Simon Dardis, 2016-12-13, 1 file, -0/+158)

  In certain cases it is possible for transient instructions, such as %reg = IMPLICIT_DEF as the single instruction in a basic block, to reach the MipsHazardSchedule pass. This patch teaches MipsHazardSchedule to properly look through such cases.

  Reviewers: vkalintiris, zoran.jovanovic

  Differential Revision: https://reviews.llvm.org/D27209

  llvm-svn: 289529
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-12-0911-212/+212
| | | | | | | | UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2016-12-09, 11 files, -212/+212)

  Retrying after fixing an overly aggressive load-store forwarding optimization.

  Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a simplified store-merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).

  Additional minor changes:

  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations.

  Noteworthy tests:

  CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.

  CodeGen/AArch64/arm64-memset-inline.ll, CodeGen/AArch64/arm64-stur.ll, CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.

  CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?

  CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.

  CodeGen/X86/dag-merge-fast-accesses.ll, CodeGen/X86/MergeConsecutiveStores.ll, CodeGen/X86/stores-merging.ll, CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores.

  CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls.

  CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores.

  CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.

  CodeGen/X86/vector-idiv.ll, CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing, but it looks like the code got better due to the memory operations being recognized as non-aliasing.

  CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

  Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

  Differential Revision: https://reviews.llvm.org/D14834

  llvm-svn: 289221
* [mips] Change gnueabi to gnu in the triple because EABI has been removed recently. NFC (Simon Atanasyan, 2016-12-08, 1 file, -1/+1)

  llvm-svn: 289114
* [mips] Remove N32 Android test because Android does not support N32 ABI. NFC (Simon Atanasyan, 2016-12-08, 1 file, -2/+0)

  llvm-svn: 289113
* Revert "[DAG] Improve loads-from-store forwarding to handle TokenFactor"Nirav Dave2016-11-281-13/+15
| | | | | | This reverts commit r287773 which caused issues with ppc64le builds. llvm-svn: 288035
* [DAG] Improve loads-from-store forwarding to handle TokenFactor (Nirav Dave, 2016-11-23, 1 file, -15/+13)

  Forward store values to matching loads down through token factors. Factored from D14834.

  Reviewers: jyknight, hfinkel

  Subscribers: hfinkel, nemanjai, llvm-commits

  Differential Revision: https://reviews.llvm.org/D26080

  llvm-svn: 287773
* [mips] Add tests for half precision floating point support. (Simon Dardis, 2016-11-21, 2 files, -0/+1175)

  These should have been part of r287349.

  llvm-svn: 287574
* [mips] Restrict tail call optimization (Simon Dardis, 2016-11-20, 7 files, -162/+138)

  The tail call optimization was being used without proper consideration of ABI requirements for saving and restoring the GP. This patch restricts tail call optimization to functions within the same translation unit.

  Reviewers: vkalintiris

  Differential Revision: https://reviews.llvm.org/D24763

  llvm-svn: 287505
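  A small IR sketch (illustrative names, not the committed tests) of the distinction: a tail call to a function defined in the same translation unit remains eligible, while a call that may leave the module is lowered as a normal call so the caller's $gp can be restored afterwards.

      define internal i32 @local_callee(i32 %x) {
        ret i32 %x
      }

      define i32 @caller(i32 %x) {
      entry:
        ; Same-module callee: still a candidate for tail-call optimization.
        %r = tail call i32 @local_callee(i32 %x)
        ret i32 %r
      }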
* [mips] Fix unsigned/signed type error (Simon Dardis, 2016-11-16, 1 file, -0/+18)

  MipsFastISel uses a class to represent addresses, with a signed member for the offset. MipsFastISel::emitStore, emitLoad and computeAddress all treated the offset as being positive. In cases where the offset was actually negative and a frame pointer was used, this would cause the constant synthesis routine to crash, as it would generate an unexpected instruction sequence when frame indexes are replaced.

  Reviewers: vkalintiris

  Differential Revision: https://reviews.llvm.org/D26192

  llvm-svn: 287099
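  A hypothetical reproducer shape (not the committed test): a memory access whose negative offset FastISel folds into its address representation; treating that offset as unsigned produced a bogus constant-synthesis sequence once frame indexes were replaced.

      define void @store_neg_offset(i8* %base) {
      entry:
        ; The -8 offset must stay signed when folded into the address.
        %p = getelementptr inbounds i8, i8* %base, i32 -8
        store i8 0, i8* %p, align 1
        ret void
      }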
* [mips] Re-enable small data section test. (Simon Dardis, 2016-11-08, 1 file, -16/+8)

  llvm-svn: 286230
* [mips] Do not allow -opt-bisect-limit to skip the PIC call optimization pass. (Vasileios Kalintiris, 2016-10-27, 1 file, -0/+14)

  r282428 added the MipsOptimizePICCall as an opt-in pass that can be skipped when using the -opt-bisect-limit option. However, this pass is needed because it generates code that conforms to the o32 ABI specification by using the $t9 register for PIC calls with JALR instructions.

  This bug was exposed by the fact that skipFunction() also checks for the "optnone" attribute. This caused functions with that attribute to break the requirements of the o32 ABI.

  llvm-svn: 285305
* [DAG] optimize negation of bool (Sanjay Patel, 2016-10-19, 7 files, -66/+99)

  Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG. With the mask, this should be no worse than 2 shifts. The mask can be eliminated in some cases, so that should be better than 2 shifts.

  This change exposed some missing folds related to negation:
  https://reviews.llvm.org/rL284239
  https://reviews.llvm.org/rL284395

  There may be others, so please let me know if you see any regressions.

  Differential Revision: https://reviews.llvm.org/D25485

  llvm-svn: 284611
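  A minimal IR sketch of the pattern involved: sign-extending an i1 held in a wider register (SIGN_EXTEND_INREG with an i1 source) can be lowered roughly as (sub 0, (and x, 1)) instead of a shift-left/arithmetic-shift-right pair.

      define i32 @sext_low_bit(i32 %x) {
      entry:
        ; Only the low bit of %x matters; the sext broadcasts it to all bits.
        %b = trunc i32 %x to i1
        %s = sext i1 %b to i32
        ret i32 %s
      }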
* [mips][FastISel] Instantiate the MipsFastISel class only for targets that support FastISel. (Vasileios Kalintiris, 2016-10-18, 3 files, -5/+33)

  Summary: Instead of instantiating the MipsFastISel class and checking if the target is supported in the overridden methods, we should perform that check before creating the class. This allows us to enable FastISel *only* for targets that truly support it, i.e. MIPS32 to MIPS32R5.

  Reviewers: sdardis

  Subscribers: ehostunreach, llvm-commits

  Differential Revision: https://reviews.llvm.org/D24824

  llvm-svn: 284475
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-10-1310-201/+202
| | | | | | | | | UseAA is enabled." This reverts commit r284151 which appears to be triggering a LTO failures on Hexagon llvm-svn: 284157
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2016-10-13, 10 files, -202/+201)

  Retrying after upstream changes.

  Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a simplified store-merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).

  Additional minor changes:

  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations.

  Noteworthy tests:

  CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.

  CodeGen/AArch64/arm64-memset-inline.ll, CodeGen/AArch64/arm64-stur.ll, CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.

  CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?

  CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.

  CodeGen/X86/dag-merge-fast-accesses.ll, CodeGen/X86/MergeConsecutiveStores.ll, CodeGen/X86/stores-merging.ll, CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores.

  CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls.

  CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores.

  CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.

  CodeGen/X86/vector-idiv.ll, CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing, but it looks like the code got better due to the memory operations being recognized as non-aliasing.

  CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.

  CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

  Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

  Differential Revision: https://reviews.llvm.org/D14834

  llvm-svn: 284151
* [mips][fastisel] Consider soft-float an unsupported floating point mode (Simon Dardis, 2016-10-04, 1 file, -0/+11)

  Treat soft-float as unsupported for fast-isel. Additionally, ensure we check that lowering f32 arguments also considers the case of soft-float mode.

  Reviewers: ehostunreach, vkalintiris, zoran.jovanovic

  Differential Review: https://reviews.llvm.org/D24505

  llvm-svn: 283209
* Set some tests to an unknown vendor and OS (Matthias Braun, 2016-10-03, 4 files, -7/+7)

  This avoids llc using the host's OS/vendor as defaults and triggering unwanted behaviour in the tests. This should deal with the buildbot breakages on windows after r283140.

  llvm-svn: 283149
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-09-2810-201/+202
| | | | | | | | UseAA is enabled." This reverts commit r282600 due to test failues with MCJIT llvm-svn: 282604
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2016-09-28, 10 files, -202/+201)

  Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a simplified store-merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).

  Additional minor changes:

  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations.

  Noteworthy tests:

  CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.

  CodeGen/AArch64/arm64-memset-inline.ll, CodeGen/AArch64/arm64-stur.ll, CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.

  CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?

  CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.

  CodeGen/X86/dag-merge-fast-accesses.ll, CodeGen/X86/MergeConsecutiveStores.ll, CodeGen/X86/stores-merging.ll, CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores.

  CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls.

  CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores.

  CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.

  CodeGen/X86/vector-idiv.ll, CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing, but it looks like the code got better due to the memory operations being recognized as non-aliasing.

  CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.

  CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior.

  Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

  Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

  Differential Revision: https://reviews.llvm.org/D14834

  llvm-svn: 282600
* [mips] Disable tail calls temporarily (Simon Dardis, 2016-09-27, 9 files, -54/+54)

  Disable tail calls while the remaining bugs are fixed. Enable only for tests.

  Reviewers: vkalintiris

  Differential Review: https://reviews.llvm.org/D24912

  llvm-svn: 282487
* [mips] LLVM PR/30197 - Tail call incorrectly clobbers arguments for mips (Simon Dardis, 2016-09-21, 1 file, -0/+53)

  The post-RA scheduler performs alias analysis to determine if stores and loads can be moved past each other. When a function has more arguments than argument registers for the calling convention used, the excess arguments are spilled onto the stack. LLVM by default assumes that argument slots are immutable, unless the function contains a tail call. Without the knowledge that a function contains a tail call site, stores and loads to fixed stack slots may be reordered, causing the outgoing arguments to clobber the incoming arguments before the incoming arguments are dead.

  Reviewers: vkalintiris

  Differential Review: https://reviews.llvm.org/D24077

  llvm-svn: 282063
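  A small IR sketch (assumed shape, illustrative names) of the hazard: under o32 the fifth and sixth integer arguments arrive in fixed stack slots, so the tail call's outgoing argument stores must not be scheduled above the loads of the incoming values in those same slots.

      declare i32 @callee(i32, i32, i32, i32, i32, i32)

      define i32 @caller(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f) {
      entry:
        ; %e and %f live in incoming stack slots; passing them back in swapped
        ; positions writes the slots that must first be read.
        %r = tail call i32 @callee(i32 %f, i32 %e, i32 %d, i32 %c, i32 %b, i32 %a)
        ret i32 %r
      }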
* Revert "[mips] Fix c.<cc>.<fmt> instruction definition."Simon Dardis2016-09-092-57/+85
| | | | | | | This reverts commit r281022. Mips buildbot broke, due to unhandled register class FCC. llvm-svn: 281033
* [mips] Fix c.<cc>.<fmt> instruction definition. (Simon Dardis, 2016-09-09, 2 files, -85/+57)

  As part of this effort, remove MipsFCmp nodes and use tablegen patterns rather than custom lowering through C++.

  Unexpectedly, this improves codesize for microMIPS as previous floating point setcc expansions would materialize 0 and 1 into GPRs before using the relevant mov[tf].[sd] instruction. Now $zero is used directly.

  Reviewers: dsanders, vkalintiris, zoran.jovanovic

  Differential Review: https://reviews.llvm.org/D23118

  llvm-svn: 281022
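  A minimal IR sketch of a floating-point compare-and-select of the kind now matched directly by the c.<cc>.<fmt> plus mov[tf].[sd] patterns, rather than first materializing 0 and 1 in general-purpose registers; the function name is illustrative.

      define float @select_olt(float %a, float %b, float %x, float %y) {
      entry:
        %cmp = fcmp olt float %a, %b
        ; Selected as a compare setting an FCC bit followed by a conditional
        ; floating-point move keyed on that bit.
        %r = select i1 %cmp, float %x, float %y
        ret float %r
      }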