summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/PowerPC
Commit message (Collapse)AuthorAgeFilesLines
...
* [PowerPC] Make sure that we remove dead PHI nodes after the PPCCTRLoops pass.Sean Fertile2017-07-051-0/+38
| | | | | | | Commiting on behalf of Stefan Pintilie. Differential Revision: https://reviews.llvm.org/D34829 llvm-svn: 307180
* [Power9] Exploit vector extract with variable index.Tony Jiang2017-07-051-0/+167
| | | | | | | | | | | | | | | | This patch adds the exploitation for new power 9 instructions which extract variable elements from vectors: VEXTUBLX VEXTUBRX VEXTUHLX VEXTUHRX VEXTUWLX VEXTUWRX Differential Revision: https://reviews.llvm.org/D34032 Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com) llvm-svn: 307174
* [Power9] Exploit vector integer extend instructions when indices aren't correct.Tony Jiang2017-07-051-28/+225
| | | | | | | | | | | | | | | This patch adds on to the exploitation added by https://reviews.llvm.org/D33510. This now catches build vector nodes where the inputs are coming from sign extended vector extract elements where the indices used by the vector extract are not correct. We can still use the new hardware instructions by adding a shuffle to move the elements to the correct indices. I introduced a new PPCISD node here because adding a vector_shuffle and changing the elements of the vector_extracts was getting undone by another DAG combine. Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com) Differential Revision: https://reviews.llvm.org/D34009 llvm-svn: 307169
* Add the missing triple to the test case added as part of r307120.Nemanja Ivanovic2017-07-051-1/+1
| | | | llvm-svn: 307122
* [PowerPC] Fix for PR33636Nemanja Ivanovic2017-07-051-0/+702
| | | | | | | | Remove casts to a constant when a node can be an undef. Differential Revision: https://reviews.llvm.org/D34808 llvm-svn: 307120
* [PowerPC] auto-generate check lines; NFCSanjay Patel2017-06-301-70/+58
| | | | | | | | | | | The existing check lines were more flexible, but these are small enough tests that there shouldn't be much question about register allocation. I've been hand-modifying this file as I change the CGP memcmp expansion, but that's more error-prone and time-consuming than just running the update script. llvm-svn: 306861
* [PowerPC] fix potential verification error on __tls_get_addrHiroshi Inoue2017-06-292-0/+131
| | | | | | | | | | This patch fixes a verification error with -verify-machineinstrs while expanding __tls_get_addr by not creating ADJCALLSTACKUP and ADJCALLSTACKDOWN if there is another ADJCALLSTACKUP in this basic block since nesting ADJCALLSTACKUP/ADJCALLSTACKDOWN is not allowed. Here, ADJCALLSTACKUP and ADJCALLSTACKDOWN are created as a fence for instruction scheduling to avoid _tls_get_addr is scheduled before mflr in the prologue (https://bugs.llvm.org//show_bug.cgi?id=25839). So if another ADJCALLSTACKUP exists before _tls_get_addr, we do not need to create a new ADJCALLSTACKUP. Differential Revision: https://reviews.llvm.org/D34347 llvm-svn: 306678
* [CGP] add specialization for memcmp expansion with only one basic blockSanjay Patel2017-06-272-26/+18
| | | | llvm-svn: 306485
* [CGP] eliminate a sub instruction in memcmp expansionSanjay Patel2017-06-273-65/+54
| | | | | | | | | | | | | | | | | | | | | | As noted in D34071, there are some IR optimization opportunities that could be handled by normal IR passes if this expansion wasn't happening so late in CGP. Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, so I'm proposing this change: %s = sub i32 %x, %y %r = icmp ne %s, 0 => %r = icmp ne %x, %y Changing the predicate to 'eq' mimics what InstCombine would do, so that's just an efficiency improvement if we decide this expansion should happen sooner. The fact that the PowerPC backend doesn't eliminate the 'subf.' might be something for PPC folks to investigate separately. Differential Revision: https://reviews.llvm.org/D34416 llvm-svn: 306471
* [SelectionDAG] set dereferenceable flag in MergeConsecutiveStores to fix ↵Hiroshi Inoue2017-06-271-0/+24
| | | | | | | | | | | | assetion failure When SelectionDAG merges consecutive stores and loads in MergeConsecutiveStores, it does not set dereferenceable flag for a created load instruction. This results in an assertion failure if SelectionDAG commonizes this load instruction with other load instructions, as well as it may miss optimization opportunities. This patch sat dereferenceable flag for the newly created load instruction if all the load instructions to be merged are dereferenceable. Differential Revision: https://reviews.llvm.org/D34679 llvm-svn: 306404
* [PowerPC] fix incorrect processor name for -mcpu in a test caseHiroshi Inoue2017-06-271-1/+1
| | | | | | to surpress warnings. ppc970 should be 970 (or g5) llvm-svn: 306380
* [PowerPC] set optimization level in SelectionDAGISelHiroshi Inoue2017-06-274-45/+45
| | | | | | | | | PowerPC backend does not pass the current optimization level to SelectionDAGISel and so SelectionDAGISel works with the default optimization level regardless of the current optimization level. This patch makes the PowerPC backend set the optimization level correctly. Differential Revision: https://reviews.llvm.org/D34615 llvm-svn: 306367
* [SelectionDAG] set dereferenceable flag when expanding memcpy/memmoveHiroshi Inoue2017-06-241-0/+74
| | | | | | | | | | When SelectionDAG expands memcpy (or memmove) call into a sequence of load and store instructions, it disregards dereferenceable flag even the source pointer is known to be dereferenceable. This results in an assertion failure if SelectionDAG commonizes a load instruction generated for memcpy with another load instruction for the source pointer. This patch makes SelectionDAG to set the dereferenceable flag for the load instructions properly to avoid the assertion failure. Differential Revision: https://reviews.llvm.org/D34467 llvm-svn: 306209
* Update constants in complex-return test to prevent reduction to smaller ↵Nirav Dave2017-06-241-1/+1
| | | | | | constants llvm-svn: 306192
* [PowerPC] define target hook isReallyTriviallyReMaterializable()Lei Huang2017-06-211-0/+179
| | | | | | | | | | | Define target hook isReallyTriviallyReMaterializable() to explicitly specify PowerPC instructions that are trivially rematerializable. This will allow the MachineLICM pass to accurately identify PPC instructions that should always be hoisted. Differential Revision: https://reviews.llvm.org/D34255 llvm-svn: 305932
* RegisterScavenging: Followup to r305625Matthias Braun2017-06-203-23/+23
| | | | | | | | | | | | | This does some improvements/cleanup to the recently introduced scavengeRegisterBackwards() functionality: - Rewrite findSurvivorBackwards algorithm to use the existing LiveRegUnit::accumulateBackward() code. This also avoids the Available and Candidates bitset and just need 1 LiveRegUnit instance (= 1 bitset). - Pick registers in allocation order instead of register number order. llvm-svn: 305817
* [CGP, PowerPC] try to constant fold before creating loads for memcmp expansionSanjay Patel2017-06-191-25/+8
| | | | | | | | | | | | | | | This is the last step needed to avoid regressions for x86 before we flip the switch to allow expansion of the smallest set of memcpy() via CGP. The DAG version checks for constant strings, so we need to do that here too. FWIW, the 2 constant test is not handled by LibCallSimplifier::optimizeMemCmp() because that code is limited to 8-bit constant arrays. LibCallSimplifier will also fail to optimize some 1 constant tests because its alignment requirements are too strict (shouldn't require alignment for a constant operand). Differential Revision: https://reviews.llvm.org/D34071 llvm-svn: 305734
* RegScavenging: Add scavengeRegisterBackwards()Matthias Braun2017-06-172-7/+64
| | | | | | | | | | | | | | | | | | | Re-apply r276044/r279124/r305516. Fixed a problem where we would refuse to place spills as the very first instruciton of a basic block and thus artifically increase pressure (test in test/CodeGen/PowerPC/scavenging.mir:spill_at_begin) This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags. This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode. Differential Revision: http://reviews.llvm.org/D21885 llvm-svn: 305625
* Revert "RegScavenging: Add scavengeRegisterBackwards()"Matthias Braun2017-06-162-7/+7
| | | | | | | | | Revert because of reports of some PPC input starting to spill when it was predicted that it wouldn't and no spillslot was reserved. This reverts commit r305516. llvm-svn: 305566
* RegScavenging: Add scavengeRegisterBackwards()Matthias Braun2017-06-152-7/+7
| | | | | | | | | | | | | | | | | | Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64 problems reported in the stage2 build last time, which I cannot reproduce right now. This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags. This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode. Differential Revision: http://reviews.llvm.org/D21885 llvm-svn: 305516
* [MachineLICM] Hoist TOC-based address instructionsLei Huang2017-06-151-0/+110
| | | | | | | | | | | | | | | | | | Add condition for MachineLICM to safely hoist instructions that utilize non constant registers that are reserved. On PPC, global variable access is done through the table of contents (TOC) which is always in register X2. The ABI reserves this register in any functions that have calls or access global variables. A call through a function pointer involves saving, changing and restoring this register around the call and thus MachineLICM does not consider it to be invariant. We can however guarantee the register is preserved across the call and thus is invariant. Differential Revision: https://reviews.llvm.org/D33562 llvm-svn: 305490
* [PowerPC] fix potential verification errors on CFENCE8Hiroshi Inoue2017-06-153-12/+12
| | | | | | | | This patch fixes a potential verification error (64-bit register operands for cmpw) with -verify-machineinstrs. Differential Revision: https://reviews.llvm.org/D34208 llvm-svn: 305479
* Revert r304907 as it is causing some failures that I cannot reproduce.Nemanja Ivanovic2017-06-145-567/+6
| | | | | | Reverting this until a test case can be provided to aid the investigation. llvm-svn: 305372
* [PowerPC] Match vec_revb builtins to P9 instructions.Tony Jiang2017-06-121-0/+54
| | | | | | | | | | | | Power9 has instructions that will reverse the bytes within an element for all sizes (half-word, word, double-word and quad-word). These can be used for the vec_revb builtins in altivec.h. However, we implement these to match vector shuffle nodes as that will cover both the builtins and vector shuffles that occur in the SDAG through other means. Differential Revision: https://reviews.llvm.org/D33690 llvm-svn: 305214
* [Power9] Added support for the modsw, moduw, modsd, modud hardware instructions.Tony Jiang2017-06-121-0/+263
| | | | | | | | | | | Note that if we need the result of both the divide and the modulo then we compute the modulo based on the result of the divide and not using the new hardware instruction. Commit on behalf of STEFAN PINTILIE. Differential Revision: https://reviews.llvm.org/D33940 llvm-svn: 305210
* [PowerPC] add memcmp test with one constant operand and equality cmp; NFCSanjay Patel2017-06-091-3/+29
| | | | llvm-svn: 305131
* [CGP] don't expand a memcmp with nobuiltin attributeSanjay Patel2017-06-081-6/+15
| | | | | | | | | | | | | | | | This matches the behavior used in the SDAG when expanding memcmp. For reference, we're intentionally treating the earlier fortified call transforms differently after: https://bugs.llvm.org/show_bug.cgi?id=23093 https://reviews.llvm.org/rL233776 One motivation for not transforming nobuiltin calls is that it can interfere with sanitizers: https://reviews.llvm.org/D19781 https://reviews.llvm.org/D19801 Differential Revision: https://reviews.llvm.org/D34043 llvm-svn: 305007
* [PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64Guozhi Wei2017-06-082-14/+33
| | | | | | | | | | In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64. This patch fixed PR32442. Differential Revision: https://reviews.llvm.org/D31407 llvm-svn: 305001
* [Power9] Exploit vector integer extend instructionsZaara Syeda2017-06-081-0/+90
| | | | | | | | | | | | | | This patch adds build vector patterns to exploit the vector integer extend instructions: vextsb2w - Vector Extend Sign Byte To Word vextsb2d - Vector Extend Sign Byte To Doubleword vextsh2w - Vector Extend Sign Halfword To Word vextsh2d - Vector Extend Sign Halfword To Doubleword vextsw2d - Vector Extend Sign Word To Doubleword Differential Revision: https://reviews.llvm.org/D33510 llvm-svn: 304992
* [PowerPC] add memcmp test with nobuiltin attr; NFCSanjay Patel2017-06-081-0/+15
| | | | | | | | | | | In SDAG, we don't expand libcalls with a nobuiltin attribute. It's not clear if that's correct from the existing code comment: "Don't do the check if marked as nobuiltin for some reason." ...adding a test here either way to show that there is currently a different behavior implemented in the CGP-based expansion. llvm-svn: 304991
* [CGP / PowerPC] avoid multi-block overhead for simple memcmp expansionSanjay Patel2017-06-081-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test diff for PowerPC shows we can better optimize if this case is one block. For x86, there's would be a substantial difference if CGP expansion was enabled because branches are assumed cheap and SDAG can't optimize across blocks. Instead of this: _cmp_eq8: movq (%rdi), %rax cmpq (%rsi), %rax je LBB23_1 ## BB#2: ## %res_block movl $1, %ecx jmp LBB23_3 LBB23_1: xorl %ecx, %ecx LBB23_3: ## %endblock xorl %eax, %eax testl %ecx, %ecx sete %al retq We get this: cmp_eq8: movq (%rdi), %rcx xorl %eax, %eax cmpq (%rsi), %rcx sete %al retq And that matches the optimal codegen that we get from the current expansion in SelectionDAGBuilder::visitMemCmpCall(). If this looks right, then I just need to confirm that vector-sized expansion will work from here, and we can enable CGP memcmp() expansion for x86. Ie, we'll bypass the power-of-2 special cases currently optimized in SDAG because we can lower the IR produced here optimally. Differential Revision: https://reviews.llvm.org/D34005 llvm-svn: 304987
* [CGP] avoid zext/trunc of a memcmp expansion compareSanjay Patel2017-06-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | This could be viewed as another shortcoming of the DAGCombiner: when both operands of a compare are zexted from the same source type, we should be able to compare the original types. The effect on PowerPC perf is likely unnoticeable, but there's a visible regression for x86 if we feed the suboptimal IR for memcmp expansion to the DAG: _cmp_eq4_zexted_to_i64: movl (%rdi), %ecx movl (%rsi), %edx xorl %eax, %eax cmpq %rdx, %rcx sete %al _cmp_eq4_better: movl (%rdi), %ecx xorl %eax, %eax cmpl (%rsi), %ecx sete %al llvm-svn: 304923
* [PowerPC] Eliminate integer compare instructions - vol. 5Nemanja Ivanovic2017-06-075-6/+567
| | | | | | | | Adds handling for i64 SETNE comparison (both sign and zero extended). Differential Revision: https://reviews.llvm.org/D33720 llvm-svn: 304907
* [PowerPC] Eliminate integer compare instructions - vol. 3Nemanja Ivanovic2017-06-079-39/+798
| | | | | | | | Adds handling for i32 SETNE comparison (both sign and zero extended). Differential Revision: https://reviews.llvm.org/D33718 llvm-svn: 304901
* [CGP / PowerPC] use direct compares if there's only one load per block in ↵Sanjay Patel2017-06-071-20/+14
| | | | | | | | | | | | | | | | | memcmp() expansion I'd like to enable CGP memcmp expansion for x86, but the output from CGP would regress the special cases (memcmp(x,y,N) != 0 for N=1,2,4,8,16,32 bytes) that we already handle. I'm not sure if we'll actually be able to produce the optimal code given the block-at-a-time limitation in the DAG. We might have to just avoid those special-cases here in CGP. But regardless of that, I think this is a win for the more general cases. http://rise4fun.com/Alive/cbQ Differential Revision: https://reviews.llvm.org/D33963 llvm-svn: 304849
* [PowerPC] auto-generate full checks and increase test coverageSanjay Patel2017-06-061-77/+160
| | | | | | | 3 of the tests were testing exactly the same thing: memcmp(x, y, 16) != 0. I changed that to test 4, 7, and 16 bytes, so we can see how those differ. llvm-svn: 304838
* RegisterScavenging: Add ScavengerTest passMatthias Braun2017-06-021-0/+149
| | | | | | | | | This pass allows to run the register scavenging independently of PrologEpilogInserter to allow targeted testing. Also adds some basic register scavenging tests. llvm-svn: 304606
* [PowerPC] Correctly specify the cache line size for Power 7, 8 and 9.Sean Fertile2017-05-311-0/+49
| | | | | | | | | | Fixes PPCTTIImpl::getCacheLineSize() returning the wrong cache line size for newer ppc processors. Commiting on behalf of Stefan Pintilie. Differential Revision: https://reviews.llvm.org/D33656 llvm-svn: 304317
* [PPC] Inline expansion of memcmpZaara Syeda2017-05-313-0/+402
| | | | | | | | | | | | | | | This patch does an inline expansion of memcmp. It changes the memcmp library call into an inline expansion when the size is known at compile time and is under a target specified threshold. This expansion is implemented in CodeGenPrepare and expands into straight line code. The target specifies a maximum load size and the expansion works by using this size to load the two sources, compare, and exit early if a difference is found. It also has a special case when the memcmp result is used in a compare to zero equality. Differential Revision: https://reviews.llvm.org/D28637 llvm-svn: 304313
* [PowerPC] Fix a performance bug for PPC::XXPERMDI.Tony Jiang2017-05-311-0/+307
| | | | | | | | | | There are some VectorShuffle Nodes in SDAG which can be selected to XXPERMDI Instruction, this patch recognizes them and does the selection to improve the PPC performance. Differential Revision: https://reviews.llvm.org/D33404 llvm-svn: 304298
* [PowerPC] Eliminate integer compare instructions - vol. 3Nemanja Ivanovic2017-05-316-9/+605
| | | | | | | | | This patch builds upon https://reviews.llvm.org/rL302810 to add handling for the 64-bit SETEQ patterns. Differential Revision: https://reviews.llvm.org/D33369 llvm-svn: 304286
* [PowerPC] Eliminate integer compare instructions - vol. 2Nemanja Ivanovic2017-05-311-0/+69
| | | | | | | | | | | | This patch builds upon https://reviews.llvm.org/rL302810 to add handling for bitwise logical operations in general purpose registers. The idea is to keep the values in GPRs as long as possible - only extracting them to a condition register bit when no further operations are to be done. Differential Revision: https://reviews.llvm.org/D31851 llvm-svn: 304282
* [AntiDepBreaker] Revert r299124 and add a test.Tim Shen2017-05-301-330/+0
| | | | | | | | | | | | | | | | | Summary: AntiDepBreaker intends to add all live-outs, including the implicit CSRs, in StartBlock. r299124 was done without understanding that intention. Now with the live-ins propagated correctly (D32464), we can revert this change. Reviewers: MatzeB, qcolombet Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D33697 llvm-svn: 304251
* LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEIMatthias Braun2017-05-261-0/+52
| | | | | | | | | | | | | | | | | Re-commit r303938 and r303954 with a fix for addLiveIns(): the internal addPristines() function must be called on an empty set or it may accidentally reset saved registers. - addLiveOutsNoPristines() needs to add callee saved registers that are actually saved and restored somewhere to the set (they are not pristine). - Cleanup/rewrite the code for addLiveOuts()/addLiveOutsNoPristines(). This fixes the problem from D32156. Differential Revision: https://reviews.llvm.org/D32464 llvm-svn: 304001
* Revert "LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEI"Matthias Braun2017-05-261-52/+0
| | | | | | | | | | Tentatively revert this to see if it fixes the buildbot stage2 breakages. This reverts commit r303938. This reverts commit r303954. llvm-svn: 303960
* Test for r303938Matthias Braun2017-05-261-0/+52
| | | | llvm-svn: 303954
* [PPC] Fix atomics lowering in DAG lowering.Tim Shen2017-05-251-0/+23
| | | | | | | | | | | I forgot to forward the chain, causing some missing instruction dependencies. The test crashes the compiler without this patch. Inspired by the test case, D33519 also tries to remove the extra sync. Differential Revision: https://reviews.llvm.org/D33573 llvm-svn: 303931
* [PowerPC] Fix a performance bug for PPC::XXSLDWI.Tony Jiang2017-05-244-37/+344
| | | | | | | | There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI instruction, this patch recognizes them and does the selection to improve the PPC performance. llvm-svn: 303822
* P9: D-form vector load/store. Differential Revision: ↵Zaara Syeda2017-05-2410-209/+209
| | | | | | https://reviews.llvm.org/D33248 llvm-svn: 303780
* SummaryHiroshi Inoue2017-05-211-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PPC backend eliminates compare instructions by using record-form instructions in PPCInstrInfo::optimizeCompareInstr, which is called from peephole optimization pass. This patch improves this optimization to eliminate more compare instructions in two types of common case. - comparison against a constant 1 or -1 The record-form instructions set CR bit based on signed comparison against 0. So, the current implementation does not exploit the record-form instruction for comparison against a non-zero constant. This patch enables record-form optimization for constant of 1 or -1 if possible; it changes the condition "greater than -1" into "greater than or equal to 0" and "less than 1" into "less than or equal to 0". With this patch, compare can be eliminated in the following code sequence, as an example. uint64_t a, b; if ((a | b) & 0x8000000000000000ull) { ... } else { ... } - andi for 32-bit comparison on PPC64 Since record-form instructions execute 64-bit signed comparison and so we have limitation in eliminating 32-bit comparison, i.e. with cmplwi, using the record-form. The original implementation already has such checks but andi. is not recognized as an instruction which executes implicit zero extension and hence safe to convert into record-form if used for equality check. %1 = and i32 %a, 10 %2 = icmp ne i32 %1, 0 br i1 %2, label %foo, label %bar In this simple example, LLVM generates andi. + cmplwi + beq on PPC64. This patch make it possible to eliminate the cmplwi for this case. I added andi. for optimization targets if it is safe to do so. Differential Revision: https://reviews.llvm.org/D30081 llvm-svn: 303500
OpenPOWER on IntegriCloud