path: root/llvm/test/CodeGen/SystemZ
Commit message | Author | Age | Files | Lines
* [SystemZ] Extend memcmp support to all constant lengths | Richard Sandiford | 2013-08-28 | 2 | -4/+96
| | | | | | This uses the infrastructure added for memcpy and memmove in r189331. llvm-svn: 189458
* [SystemZ] Extend memcpy and memset support to all constant lengths | Richard Sandiford | 2013-08-27 | 5 | -29/+224
| | | | | | | | | | | | | | Lengths up to a certain threshold (currently 6 * 256) use a series of MVCs. Lengths above that threshold use a loop to handle X*256 bytes followed by a single MVC to handle the excess (if any). This loop will also be needed in future when support for variable lengths is added. Because the same tablegen classes are used to define MVC and CLC, the patch also has the side-effect of defining a pseudo loop instruction for CLC. That instruction isn't used yet (and wouldn't be handled correctly if it were). I'm planning to use it soon though. llvm-svn: 189331
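The length split described in r189331 can be sketched as follows. This is a minimal illustration, not the code from the patch: the function name `plan_copy` and the step tuples are hypothetical, and the sketch assumes that "up to the threshold" is inclusive and that one MVC handles at most 256 bytes.

```python
MVC_MAX = 256                    # assumed maximum bytes handled by one MVC
INLINE_THRESHOLD = 6 * MVC_MAX   # "currently 6 * 256" in the commit message


def plan_copy(length):
    """Return a hypothetical list of steps for copying `length` bytes.

    Each step is ("mvc", offset, size) for a single MVC, or
    ("loop", iterations) for a loop that handles iterations * 256 bytes.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    steps = []
    if length <= INLINE_THRESHOLD:
        # Short copies: a straight-line series of MVCs.
        offset = 0
        while offset < length:
            size = min(MVC_MAX, length - offset)
            steps.append(("mvc", offset, size))
            offset += size
    else:
        # Long copies: a loop over whole 256-byte blocks, then one MVC
        # for the excess (if any).
        iterations, excess = divmod(length, MVC_MAX)
        steps.append(("loop", iterations))
        if excess:
            steps.append(("mvc", iterations * MVC_MAX, excess))
    return steps
```

For example, `plan_copy(300)` yields two MVC steps, while `plan_copy(1537)` yields a six-iteration loop followed by a one-byte MVC.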
* [SystemZ] Add basic prefetch support | Richard Sandiford | 2013-08-23 | 1 | -0/+87
| | | | | | Just the instructions and intrinsics for now. llvm-svn: 189100
* [SystemZ] Try reversing comparisons whose first operand is in memory | Richard Sandiford | 2013-08-23 | 18 | -2/+430
| | | | | | This allows us to make more use of the many compare reg,mem instructions. llvm-svn: 189099
* [SystemZ] Prefer LHI;ST... over LAY;MV... | Richard Sandiford | 2013-08-23 | 6 | -48/+49
| | | | | | | | | | | | | | | | | | | If we had a store of an integer to memory, and the integer and store size were suitable for a form of MV..., we used MV... no matter what. We could then have sequences like:

    lay %r2, 0(%r3,%r4)
    mvi 0(%r2), 4

In these cases it seems better to force the constant into a register and use a normal store:

    lhi %r2, 4
    stc %r2, 0(%r3, %r4)

since %r2 is more likely to be hoisted and is easier to rematerialize. llvm-svn: 189098
* Turn MipsOptimizeMathLibCalls into a target-independent scalar transform | Richard Sandiford | 2013-08-23 | 2 | -1/+31
| | | | | | | | | | ...so that it can be used for z too. Most of the code is the same. The only real change is to use TargetTransformInfo to test when a sqrt instruction is available. The pass is opt-in because at the moment it only handles sqrt. llvm-svn: 189097
* [SystemZ] Define remaining *MUL_LOHI patterns | Richard Sandiford | 2013-08-21 | 2 | -3/+63
| | | | | | | | | | | | | | | | | The initial port used MLG(R) for i64 UMUL_LOHI but left the other three combinations as not-legal-or-custom. Although 32x32->{32,32} multiplications exist, they're not as quick as doing a normal 64-bit multiplication, so it didn't seem like i32 SMUL_LOHI and UMUL_LOHI would be useful. There's also no direct instruction for i64 SMUL_LOHI, so it needs to be implemented in terms of UMUL_LOHI. However, not defining these patterns means that we don't convert division by a constant into multiplication, so this patch fills in the other cases. The new i64 SMUL_LOHI sequence is simpler than the one that we used previously for 64x64->128 multiplication, so int-mul-08.ll now tests the full sequence. llvm-svn: 188898
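The statement that i64 SMUL_LOHI has to be built on UMUL_LOHI can be illustrated with the standard correction identity for the high half. The sketch below is an assumption about the general technique, not a transcription of the sequence the patch emits; the helper names are hypothetical, and operands are modelled as 64-bit bit patterns.

```python
MASK64 = (1 << 64) - 1


def umul_lohi(a, b):
    """Unsigned 64x64->128 multiply on bit patterns; returns (lo, hi)."""
    product = (a & MASK64) * (b & MASK64)
    return product & MASK64, product >> 64


def smul_lohi(a, b):
    """Signed 64x64->128 multiply built on the unsigned one.

    If an operand is negative (top bit set), its unsigned value is
    2**64 too large, so the other operand is subtracted from the high
    half. The low half is the same for signed and unsigned products.
    """
    lo, hi = umul_lohi(a, b)
    if a >> 63:
        hi -= b
    if b >> 63:
        hi -= a
    return lo, hi & MASK64


def to_bits(x):
    """Encode a Python int as a 64-bit two's-complement bit pattern."""
    return x & MASK64
```

As a usage example, `smul_lohi(to_bits(-3), to_bits(5))` returns the low and high halves of the 128-bit two's-complement encoding of -15.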
* [SystemZ] Use FI[EDX]BRA for codegen | Richard Sandiford | 2013-08-21 | 2 | -6/+315
| | | | llvm-svn: 188895
* [SystemZ] Use SRST to optimize memchr | Richard Sandiford | 2013-08-20 | 2 | -0/+78
| | | | | | | | | | | | | | | | | | | SystemZTargetLowering::emitStringWrapper() previously loaded the character into R0 before the loop and made R0 live on entry. I'd forgotten that allocatable registers weren't allowed to be live across blocks at this stage, and it confused LiveVariables enough to cause a miscompilation of f3 in memchr-02.ll. This patch instead loads R0 in the loop and leaves LICM to hoist it after RA. This is actually what I'd tried originally, but I went for the manual optimisation after noticing that R0 often wasn't being hoisted. This bug forced me to go back and look at why, now fixed as r188774. We should also try to optimize null checks so that they test the CC result of the SRST directly. The select between null and the SRST GPR result could then usually be deleted as dead. llvm-svn: 188779
* Fix test typo and add usual "br %r14" test | Richard Sandiford | 2013-08-20 | 1 | -1/+2
| | | | llvm-svn: 188775
* Fix overly pessimistic shortcut in post-RA MachineLICM | Richard Sandiford | 2013-08-20 | 1 | -0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers and TermRegs. When it sees a definition of R it adds all aliases of R to the corresponding set, so that when it needs to test for membership it only needs to test a single register, rather than worrying about aliases there too. E.g. the final candidate loop just has:

    unsigned Def = Candidates[i].Def;
    if (!PhysRegClobbers.test(Def) && ...) {

to test whether register Def is multiply defined.

However, there was also a shortcut in ProcessMI to make sure we didn't add candidates if we already knew that they would fail the final test. This shortcut was more pessimistic than the final one because it checked whether _any alias_ of the defined register was multiply defined. This is too conservative for targets that define register pairs. E.g. on z, R0 and R1 are sometimes used as a pair, so there is a 128-bit register that aliases both R0 and R1. If a loop used R0 and R1 independently, and the definition of R0 came first, we would be able to hoist the R0 assignment (because that used the final test quoted above) but not the R1 assignment (because that meant we had two definitions of the paired R0/R1 register and would fail the shortcut in ProcessMI).

This patch just uses the same check for the ProcessMI shortcut as we use in the final candidate loop. llvm-svn: 188774
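The difference between the two shortcuts can be reproduced with a toy model. This is a simplified sketch, not the MachineLICM code: it tracks only definitions, ignores TermRegs and every other hoisting condition, and the register name `R0R1` for the 128-bit pair is a hypothetical label.

```python
# Hypothetical alias table: each register aliases itself and any
# register that overlaps it.
ALIASES = {
    "R0": {"R0", "R0R1"},
    "R1": {"R1", "R0R1"},
    "R0R1": {"R0R1", "R0", "R1"},
}


def hoistable_defs(defs, shortcut):
    """Return the defined registers that survive as LICM candidates.

    `defs` lists the registers defined in the loop, in order.
    `shortcut` is "any_alias" (the old, pessimistic check) or
    "same_reg" (the check that matches the final candidate loop).
    """
    phys_defs, phys_clobbers = set(), set()
    candidates = []
    for reg in defs:
        ruled_out = False
        for alias in ALIASES[reg]:
            if alias in phys_defs:
                phys_clobbers.add(alias)   # second definition seen
            if shortcut == "any_alias" and alias in phys_clobbers:
                ruled_out = True
            phys_defs.add(alias)
        if shortcut == "same_reg" and reg in phys_clobbers:
            ruled_out = True
        if not ruled_out:
            candidates.append(reg)
    # Final candidate loop: test only the defined register itself.
    return [reg for reg in candidates if reg not in phys_clobbers]
```

With independent definitions of R0 and R1, the old shortcut keeps only R0, while the new one keeps both; a register that really is defined twice is rejected either way.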
* [SystemZ] Add negative integer absolute (load negative) | Richard Sandiford | 2013-08-19 | 1 | -0/+91
| | | | | | | | For now this matches the equivalent of (neg (abs ...)), which did hit a few times in projects/test-suite. We should probably also match cases where absolute-like selects are used with reversed arguments. llvm-svn: 188671
* [SystemZ] Add integer absolute (load positive) | Richard Sandiford | 2013-08-19 | 1 | -0/+83
| | | | llvm-svn: 188670
* [SystemZ] Add support for sibling calls | Richard Sandiford | 2013-08-19 | 3 | -154/+125
| | | | | | | | | | | | | | | | This first cut is pretty conservative. The final argument register (R6) is call-saved, so we would need to make sure that the R6 argument to a sibling call is the same as the R6 argument to the calling function, which seems worth keeping as a separate patch. Saying that integer truncations are free means that we no longer use the extending instructions LGF and LLGF for spills in int-conv-09.ll and int-conv-10.ll. Instead we treat the registers as 64 bits wide and truncate them to 32 bits where necessary. I think it's unlikely we'd use LGF and LLGF for spills in other situations for the same reason, so I'm removing the tests rather than replacing them. The associated code is generic and applies to many more instructions than just LGF and LLGF, so there is no corresponding code removal. llvm-svn: 188669
* [SystemZ] Use SRST to implement strlen and strnlen | Richard Sandiford | 2013-08-16 | 2 | -0/+78
| | | | | | It would also make sense to use it for memchr; I'm working on that now. llvm-svn: 188547
* [SystemZ] Use MVST to implement strcpy and stpcpy | Richard Sandiford | 2013-08-16 | 1 | -0/+50
| | | | llvm-svn: 188546
* [SystemZ] Use CLST to implement strcmp | Richard Sandiford | 2013-08-16 | 2 | -0/+142
| | | | llvm-svn: 188544
* [SystemZ] Fix handling of 64-bit memcmp results | Richard Sandiford | 2013-08-16 | 2 | -1/+136
| | | | | | | | | | | | | Generalize r188163 to cope with return types other than MVT::i32, just as the existing visitMemCmpCall code did. I've split this out into a subroutine so that it can be used for other upcoming patches. I also noticed that I'd used the wrong API to record the out chain. It's a load that uses DAG.getRoot() rather than getRoot(), so the out chain should go on PendingLoads. I don't have a testcase for that because we don't do any interesting scheduling on z yet. llvm-svn: 188540
* [SystemZ] Fix sign of integer memcmp result | Richard Sandiford | 2013-08-16 | 1 | -8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | r188163 used CLC to implement memcmp. Code that compares the result directly against zero can test the CC value produced by CLC, but code that needs an integer result must use IPM. The sequence I'd used was:

    ipm <reg>
    sll <reg>, 2
    sra <reg>, 30

but I'd forgotten that this inverts the order, so that CC==1 ("less") becomes an integer greater than zero, and CC==2 ("greater") becomes an integer less than zero. This sequence should only be used if the CLC arguments are reversed to compensate. The problem then is that the branch condition must also be reversed when testing the CLC result directly.

Rather than do that, I went for a different sequence that works with the natural CLC order:

    ipm <reg>
    srl <reg>, 28
    rll <reg>, <reg>, 31

One advantage of this is that it doesn't clobber CC. A disadvantage is that any sign extension to 64 bits must be done separately, rather than being folded into the shifts. llvm-svn: 188538
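The sign behaviour of the two sequences can be checked with a small bit-level model. This is a sketch under stated assumptions: IPM is modelled as placing CC in bits 29:28 and the program mask in bits 27:24 of a 32-bit value (with the low 24 bits taken as zero), and the Python helper names are hypothetical stand-ins for the instructions.

```python
MASK32 = 0xFFFFFFFF


def ipm(cc, program_mask=0):
    """Model IPM on the low 32 bits: CC in bits 29:28, mask in 27:24."""
    return ((cc & 3) << 28) | ((program_mask & 0xF) << 24)


def to_signed32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sll(x, n):
    return (x << n) & MASK32


def srl(x, n):
    return (x & MASK32) >> n


def sra(x, n):
    return (to_signed32(x) >> n) & MASK32


def rll(x, n):
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def old_sequence(cc):
    """ipm; sll 2; sra 30"""
    return to_signed32(sra(sll(ipm(cc), 2), 30))


def new_sequence(cc):
    """ipm; srl 28; rll 31"""
    return to_signed32(rll(srl(ipm(cc), 28), 31))
```

In this model the old sequence maps CC 1 ("less") to a positive value and CC 2 ("greater") to a negative one, while the new sequence gives the signs that memcmp callers expect.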
* [tests] Cleanup initialization of test suffixes. | Daniel Dunbar | 2013-08-16 | 1 | -2/+0
| | | | | | | | | | | | | | | | | - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py). - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED). - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code. llvm-svn: 188513
* [SystemZ] Use CLC and IPM to implement memcmp | Richard Sandiford | 2013-08-12 | 1 | -0/+134
| | | | | | | For now this is restricted to fixed-length comparisons with a length in the range [1, 256], as for memcpy() and MVC. llvm-svn: 188163
* [SystemZ] Optimize floating-point comparisons with zero | Richard Sandiford | 2013-08-07 | 1 | -0/+348
| | | | | | | | | This follows the same lines as the integer code. In the end it seemed easier to have a second 4-bit mask in TSFlags to specify the compare-like CC values. That eats one more TSFlags bit than adding a CCHasUnordered would have done, but it feels more concise. llvm-svn: 187883
* [SystemZ] Add floating-point load-and-test instructions | Richard Sandiford | 2013-08-07 | 3 | -0/+39
| | | | | | These instructions can also be used as comparisons with zero. llvm-svn: 187882
* [SystemZ] Use BRCT and BRCTG to eliminate add-&-compare sequences | Richard Sandiford | 2013-08-05 | 3 | -1/+237
| | | | | | | | | | | | | | | | This patch just uses a peephole test for "add; compare; branch" sequences within a single block. The IR optimizers already convert loops to decrement-and-branch-on-nonzero form in some cases, so even this simplistic test triggers many times during a clang bootstrap and projects/test-suite run. It looks like there are still cases where we need to more strongly prefer branches on nonzero though. E.g. I saw a case where a loop that started out with a check for 0 ended up with a check for -1. I'll try to look at that sometime. I ended up adding the Reference class because MachineInstr::readsRegister() doesn't check for subregisters (by design, as far as I could tell). llvm-svn: 187723
* [SystemZ] Use LOAD AND TEST to eliminate comparisons against zero | Richard Sandiford | 2013-08-05 | 1 | -0/+223
| | | | llvm-svn: 187720
* [SystemZ] Reuse CC results for integer comparisons with zero | Richard Sandiford | 2013-08-01 | 2 | -0/+691
| | | | | | | | | | This also fixes a bug in the predication of LR to LOCR: I'd forgotten that with these in-place instruction builds, the implicit operands need to be added manually. I think this was latent until now, but is tested by int-cmp-45.c. It also adds a CC valid mask to STOC, again tested by int-cmp-45.c. llvm-svn: 187573
* [SystemZ] Prefer comparisons with zero | Richard Sandiford | 2013-08-01 | 5 | -10/+54
| | | | | | | Convert >= 1 to > 0, etc. Using comparison with zero isn't a win on its own, but it exposes more opportunities for CC reuse (the next patch). llvm-svn: 187571
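The ">= 1 to > 0, etc." rewrite can be sketched as a lookup on (predicate, constant) pairs for signed integers. Only the first entry is taken from the commit message; the other three entries are an assumption about what "etc." covers, and the function name is hypothetical.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Signed-integer comparisons against 1 or -1 and their zero-based
# equivalents. Only (">=", 1) is named in the commit message.
REWRITES = {
    (">=", 1): (">", 0),
    ("<", 1): ("<=", 0),
    (">", -1): (">=", 0),
    ("<=", -1): ("<", 0),
}


def prefer_zero(pred, const):
    """Return an equivalent comparison against zero where one exists."""
    return REWRITES.get((pred, const), (pred, const))
```

For instance, `prefer_zero(">=", 1)` returns `(">", 0)`, and comparisons that have no zero-based equivalent in the table are returned unchanged.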
* [SystemZ] Implement isLegalAddressingMode() | Richard Sandiford | 2013-07-31 | 1 | -0/+25
| | | | | | | | | | The loop optimizers were assuming that scales > 1 were OK. I think this is actually a bug in TargetLoweringBase::isLegalAddressingMode(), since it seems to be trying to reject anything that isn't r+i or r+r, but it has no default case for scales other than 0, 1 or 2. Implementing the hook for z means that z can no longer test any change there though. llvm-svn: 187497
* [SystemZ] Be more careful about inverting CC masks (conditional loads) | Richard Sandiford | 2013-07-31 | 2 | -14/+14
| | | | | | | | Extend r187495 to conditional loads. I split this out because the easiest way seemed to be to force a particular operand order in SystemZISelDAGToDAG.cpp. llvm-svn: 187496
* [SystemZ] Be more careful about inverting CC masks | Richard Sandiford | 2013-07-31 | 47 | -124/+149
| | | | | | | | | | | | | | | | | | | | | | System z branches have a mask to select which of the 4 CC values should cause the branch to be taken. We can invert a branch by inverting the mask. However, not all instructions can produce all 4 CC values, so inverting the branch like this can lead to some oddities. For example, integer comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater). If an integer EQ is reversed to NE before instruction selection, the branch will test for 1 or 2. If instead the branch is reversed after instruction selection (by inverting the mask), it will test for 1, 2 or 3. Both are correct, but the second isn't really canonical. This patch therefore keeps track of which CC values are possible and uses this when inverting a mask. Although this is mostly cosmetic, it fixes undefined behavior for the CIJNLH in branch-08.ll. Another fix would have been to mask out bit 0 when generating the fused compare and branch, but the point of this patch is that we shouldn't need to do that in the first place. The patch also makes it easier to reuse CC results from other instructions. llvm-svn: 187495
* [SystemZ] Move compare-and-branch generation even later | Richard Sandiford | 2013-07-31 | 1 | -0/+45
| | | | | | | | | | | | | | | | | | | | | | | r187116 moved compare-and-branch generation from the instruction-selection pass to the peephole optimizer (via optimizeCompare). It turns out that even this is a bit too early. Fused compare-and-branch instructions don't interact well with predication, where a CC result is needed. They also make it harder to reuse the CC side-effects of earlier instructions (not yet implemented, but the subject of a later patch). Another problem was that the AnalyzeBranch family of routines weren't handling compares and branches, so we weren't able to reverse the fused form in cases where we would reverse a separate branch. This could have been fixed by extending AnalyzeBranch, but given the other problems, I've instead moved the fusing to the long-branch pass, which is also responsible for the opposite transformation: splitting out-of-range compares and branches into separate compares and long branches. I've added a test for the AnalyzeBranch problem. A test for the predication problem is included in the next patch, which fixes a bug in the choice of CC mask. llvm-svn: 187494
* [SystemZ] Postpone NI->RISBG conversion to convertToThreeAddress() | Richard Sandiford | 2013-07-31 | 29 | -431/+446
| | | | | | | | | | | | | | | | | | | | | | r186399 aggressively used the RISBG instruction for immediate ANDs, both because it can handle some values that AND IMMEDIATE can't, and because it allows the destination register to be different from the source. I realized later while implementing the distinct-ops support that it would be better to leave the choice up to convertToThreeAddress() instead. The AND IMMEDIATE form is shorter and is less likely to be cracked. This is a problem for 32-bit ANDs because we assume that all 32-bit operations will leave the high word untouched, whereas RISBG used in this way will either clear the high word or copy it from the source register. The patch uses the z196 instruction RISBLG for this instead. This means that z10 will be restricted to NILL, NILH and NILF for 32-bit ANDs, but I think that should be OK for now. Although we're using z10 as the base architecture, the optimization work is going to be focused more on z196 and zEC12. llvm-svn: 187492
* [SystemZ] Rework compare and branch support | Richard Sandiford | 2013-07-25 | 1 | -0/+22
| | | | | | | | | | | | | | Before the patch we took advantage of the fact that the compare and branch are glued together in the selection DAG and fused them together (where possible) while emitting them. This seemed to work well in practice. However, fusing the compare so early makes it harder to remove redundant compares in cases where CC already has a suitable value. This patch therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of functions instead. No behavioral change intended, but it paves the way for a later patch. llvm-svn: 187116
* [SystemZ] Add LOCR and LOCGR | Richard Sandiford | 2013-07-25 | 1 | -0/+25
| | | | llvm-svn: 187113
* [SystemZ] Add LOC and LOCG | Richard Sandiford | 2013-07-25 | 2 | -0/+260
| | | | | | | As with the stores, these instructions can trap when the condition is false, so they are only used for things like (cond ? x : *ptr). llvm-svn: 187112
* [SystemZ] Add STOC and STOCG | Richard Sandiford | 2013-07-25 | 4 | -2/+312
| | | | | | | | These instructions are allowed to trap even if the condition is false, so for now they are only used for "*ptr = (cond ? x : *ptr)"-style constructs. llvm-svn: 187111
* [SystemZ] Add tests for ALHSIK and ALGHSIK | Richard Sandiford | 2013-07-19 | 1 | -0/+71
| | | | | | | The insn definitions themselves crept into r186689, sorry. This should be the last of the distinct-ops instructions. llvm-svn: 186690
* [SystemZ] Add ALRK, AGLRK, SLRK and SGLRK | Richard Sandiford | 2013-07-19 | 5 | -4/+50
| | | | | | | Follows the same lines as r186686, but much more limited, since we only use ADD LOGICAL for multi-i64 additions. llvm-svn: 186689
* [SystemZ] Add AHIK and AGHIK | Richard Sandiford | 2013-07-19 | 2 | -0/+134
| | | | | | | I did these as a separate patch because it uses a slightly different form of RIE layout. llvm-svn: 186687
* [SystemZ] Add ARK, AGRK, SRK and SGRK | Richard Sandiford | 2013-07-19 | 10 | -8/+90
| | | | | | The testsuite changes follow the same lines as for r186683. llvm-svn: 186686
* [SystemZ] Add NGRK, OGRK and XGRK | Richard Sandiford | 2013-07-19 | 10 | -7/+64
| | | | | | Like r186683, but for 64 bits. llvm-svn: 186685
* [SystemZ] Add NRK, ORK and XRK | Richard Sandiford | 2013-07-19 | 10 | -7/+73
| | | | | | | | | | | | The atomic tests assume the two-operand forms, so I've restricted them to z10. Running and-01.ll, or-01.ll and xor-01.ll for z196 as well as z10 shows why using convertToThreeAddress() is better than exposing the three-operand forms first and then converting back to two operands where possible (which is what I'd originally tried). Using the three-operand form first stops us from taking advantage of NG, OG and XG for spills. llvm-svn: 186683
* [SystemZ] Use SLLK, SRLK and SRAK for codegen | Richard Sandiford | 2013-07-19 | 1 | -0/+63
| | | | | | This patch uses the instructions added in r186680 for codegen. llvm-svn: 186681
* [SystemZ] Use RNSBG | Richard Sandiford | 2013-07-18 | 1 | -0/+257
| | | | | | This should be the last of the R.SBG patches for now. llvm-svn: 186573
* [SystemZ] Generalize RxSBG SRA case | Richard Sandiford | 2013-07-18 | 1 | -0/+38
| | | | | | | | | The original code only folded SRA into ROTATE ... SELECTED BITS if there was no outer shift. This patch splits out that check and generalises it slightly. The extra cases aren't really that interesting, but this is paving the way for RNSBG support. llvm-svn: 186571
* [SystemZ] Use RXSBG | Richard Sandiford | 2013-07-18 | 1 | -0/+112
| | | | | | Extend the previous R.SBG patches to handle XORs. llvm-svn: 186570
* [SystemZ] Use ROSBG and non-zero form of RISBG for OR nodes | Richard Sandiford | 2013-07-16 | 2 | -0/+203
| | | | llvm-svn: 186405
* [SystemZ] Use RISBG for (shift (and ...)) | Richard Sandiford | 2013-07-16 | 1 | -1/+153
| | | | | | | Another patch in the series to make more use of R.SBG. This one extends r186072 and r186073 to handle cases where the AND is inside the shift. llvm-svn: 186399
* Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to ↵ | Stephen Lin | 2013-07-14 | 314 | -2901/+2901
| | | | | | | | | | | | | | | | | | | | | | | | | | function definitions for more informative error messages. No functionality change and all updated tests passed locally. This update was done with the following bash script:

    find test/CodeGen -name "*.ll" | \
    while read NAME; do
      echo "$NAME"
      if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
        TEMP=`mktemp -t temp`
        cp $NAME $TEMP
        sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
        while read FUNC; do
          sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
        done
        sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
        sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
        sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
        sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
        mv $TEMP $NAME
      fi
    done

llvm-svn: 186280
* [SystemZ] Add test missing from r186148 | Richard Sandiford | 2013-07-12 | 1 | -0/+82
| | | | | | Sigh, twice in two days sorry. One day I'll remember... llvm-svn: 186150