summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [SystemZ] Extend the use of C(L)GFRRichard Sandiford2013-12-061-2/+19
| | | | | | | instcombine prefers to put extended operands first, so this patch handles that case for C(L)GFR. llvm-svn: 196579
* [SystemZ] Optimize selects between 0 and -1Richard Sandiford2013-12-061-14/+44
| | | | | | | | | | | | | Since z has no setcc instruction as such, the choice of setBooleanContents is a bit arbitrary. Currently it's set to ZeroOrOneBooleanContent, so we produced a branch-free form when selecting between 0 and 1, but not when selecting between 0 and -1. This patch handles the latter case too. At some point I'd like to measure whether it's better to use conditional moves for constant selects on z196, but that's future work. llvm-svn: 196578
* [SystemZ] Fix choice of known-zero mask in insertion optimizationRichard Sandiford2013-12-031-4/+4
| | | | | | | | | | | | | | The backend converts 64-bit ORs into subreg moves if the upper 32 bits of one operand and the low 32 bits of the other are known to be zero. It then tries to peel away redundant ANDs from the upper 32 bits. Since AND masks are canonicalized to exclude known-zero bits, the test ORs the mask and the known-zero bits together before checking for redundancy. The problem was that it was using the wrong node when checking for known-zero bits, so could drop ANDs that were still needed. llvm-svn: 196267
* [SystemZ] Handle vectors in getSetCCResultTypeRichard Sandiford2013-11-061-2/+7
| | | | | | | I don't have a standalone testcase for this, but it should allow r193676 to be reapplied. llvm-svn: 194148
* [SystemZ] Improve handling of SETCCRichard Sandiford2013-10-161-2/+114
| | | | | | | | We previously used the default expansion to SELECT_CC, which in turn would expand to "LHI; BRC; LHI". In most cases it's better to use an IPM-based sequence instead. llvm-svn: 192784
* Add missing #include's to cctype when using isdigit/alpha/etc.Will Dietz2013-10-121-0/+2
| | | | llvm-svn: 192519
* [SystemZ] Extend pseudo conditional 8- and 16-bit stores to high wordsRichard Sandiford2013-10-011-0/+8
| | | | | | As the comment says, we always want to use STOC for 32-bit stores. llvm-svn: 191767
* [SystemZ] Optimize 32-bit FPR<->GPR moves for z196 and aboveRichard Sandiford2013-10-011-7/+18
| | | | | | | | Floats are stored in the high 32 bits of an FPR, and the only GPR<->FPR transfers are full-register transfers. This patch optimizes GPR<->FPR float transfers when the high word of a GPR is directly accessible. llvm-svn: 191764
* [SystemZ] Allow integer AND involving high wordsRichard Sandiford2013-10-011-12/+12
| | | | llvm-svn: 191762
* [SystemZ] Allow integer XOR involving high wordsRichard Sandiford2013-10-011-2/+2
| | | | llvm-svn: 191759
* [SystemZ] Allow integer OR involving high wordsRichard Sandiford2013-10-011-6/+6
| | | | llvm-svn: 191755
* [SystemZ] Allow selects with a high-word destinationRichard Sandiford2013-10-011-0/+1
| | | | llvm-svn: 191751
* [SystemZ] Use upper words of GR64s for codegenRichard Sandiford2013-10-011-2/+10
| | | | | | | | | | | | | | This just adds the basics necessary for allocating the upper words to virtual registers (move, load and store). The move support is parameterised in a way that makes it easy to handle zero extensions, but the associated zero-extend patterns are added by a later patch. The easiest way of testing this seemed to be add a new "h" register constraint for high words. I don't expect the constraint to be useful in real inline asms, but it should work, so I didn't try to hide it behind an option. llvm-svn: 191739
* [SystemZ] Rename subregs and add subreg_h32Richard Sandiford2013-09-301-9/+9
| | | | | | | | | | | | | Use subreg_hNN and subreg_lNN for the high and low NN bits of a register. List the low registers first, so that subreg_l32 also means the low 32 bits of a 128-bit register. Floats are stored in the upper 32 bits of a 64-bit register, so they should use subreg_h32 rather than subreg_l32. No behavioral change intended. llvm-svn: 191659
* [SystemZ] Rename 32-bit GPR registersRichard Sandiford2013-09-301-5/+5
| | | | | | | | I'm about to add support for high-word operations, so it seemed better for the low-word registers to have names like R0L rather than R0W. No behavioral change intended. llvm-svn: 191655
* [SystemZ] Improve handling of PC-relative addressesRichard Sandiford2013-09-271-10/+11
| | | | | | | | | | | | | | The backend previously folded offsets into PC-relative addresses whereever possible. That's the right thing to do when the address can be used directly in a PC-relative memory reference (using things like LRL). But if we have a register-based memory reference and need to load the PC-relative address separately, it's better to use an anchor point that could be shared with other accesses to the same area of the variable. Fixes a FIXME. llvm-svn: 191524
* [SystemZ] Define the GR64 low-word logic instructions as pseudo aliases.Richard Sandiford2013-09-251-46/+46
| | | | | | | | Another patch to avoid duplication of encoding information. Things like NILF, NILL and NILH are used as both 32-bit and 64-bit instructions. Here the 64-bit versions are defined as aliases of the 32-bit ones. llvm-svn: 191369
* [SystemZ] Use subregs for 64-bit truncating storesRichard Sandiford2013-09-251-12/+0
| | | | | | | | Another patch to reduce the duplication of encoding information. Rather than define separate patterns for truncating 64-bit stores, use the 32-bit stores with a subreg. No behavioral changed intended. llvm-svn: 191365
* [SystemZ] Use getTarget{Insert,Extract}Subreg rather than getMachineNodeRichard Sandiford2013-09-131-19/+9
| | | | | | Just a clean-up, no behavioral change intended. llvm-svn: 190673
* [SystemZ] Try to fold shifts into TMxxRichard Sandiford2013-09-131-20/+57
| | | | | | E.g. "SRL %r2, 2; TMLL %r2, 1" => "TMLL %r2, 4". llvm-svn: 190672
* [SystemZ] Add TM and TMYRichard Sandiford2013-09-101-6/+8
| | | | | | | | | | | | | | | | | | | | | | | The main complication here is that TM and TMY (the memory forms) set CC differently from the register forms. When the tested bits contain some 0s and some 1s, the register forms set CC to 1 or 2 based on the value the uppermost bit. The memory forms instead set CC to 1 regardless of the uppermost bit. Until now, I've tried to make it so that a branch never tests for an impossible CC value. E.g. NR only sets CC to 0 or 1, so branches on the result will only test for 0 or 1. Originally I'd tried to do the same thing for TM and TMY by using custom matching code in ISelDAGToDAG. That ended up being very ugly though, and would have meant duplicating some of the chain checks that the common isel code does. I've therefore gone for the simpler alternative of adding an extra operand to the TM DAG opcode to say whether a memory form would be OK. This means that the inverse of a "TM;JE" is "TM;JNE" rather than the more precise "TM;JNLE", just like the inverse of "TMLL;JE" is "TMLL;JNE". I suppose that's arguably less confusing though... llvm-svn: 190400
* [SystemZ] Tweak integer comparison codeRichard Sandiford2013-09-061-87/+63
| | | | | | | | | | | | | | | | | | | The architecture has many comparison instructions, including some that extend one of the operands. The signed comparison instructions use sign extensions and the unsigned comparison instructions use zero extensions. In cases where we had a free choice between signed or unsigned comparisons, we were trying to decide at lowering time which would best fit the available instructions, taking things like extension type into account. The code to do that was getting increasingly hairy and was also making some bad decisions. E.g. when comparing the result of two LLCs, it is better to use CR rather than CLR, since CR can be fused with a branch while CLR can't. This patch removes the lowering code and instead adds an operand to integer comparisons to say whether signed comparison is required, whether unsigned comparison is required, or whether either is OK. We can then leave the choice of instruction up to the normal isel code. llvm-svn: 190138
* [SystemZ] Add NC, OC and XCRichard Sandiford2013-09-051-0/+15
| | | | | | | For now these are just used to handle scalar ANDs, ORs and XORs in which all operands are memory. llvm-svn: 190041
* [SystemZ] Add support for TMHH, TMHL, TMLH and TMLLRichard Sandiford2013-09-031-8/+103
| | | | | | | | | For now this just handles simple comparisons of an ANDed value with zero. The CC value provides enough information to do any comparison for a 2-bit mask, and some nonzero comparisons with more populated masks, but that's all future work. llvm-svn: 189819
* [SystemZ] Add support for TMHH, TMHL, TMLH and TMLLRichard Sandiford2013-08-281-10/+51
| | | | | | | | | For now just handles simple comparisons of an ANDed value with zero. The CC value provides enough information to do any comparison for a 2-bit mask, and some nonzero comparisons with more populated masks, but that's all future work. llvm-svn: 189469
* [SystemZ] Extend memcmp support to all constant lengthsRichard Sandiford2013-08-281-17/+66
| | | | | | This uses the infrastructure added for memcpy and memmove in r189331. llvm-svn: 189458
* [SystemZ] Extend memcpy and memset support to all constant lengthsRichard Sandiford2013-08-271-21/+147
| | | | | | | | | | | | | | Lengths up to a certain threshold (currently 6 * 256) use a series of MVCs. Lengths above that threshold use a loop to handle X*256 bytes followed by a single MVC to handle the excess (if any). This loop will also be needed in future when support for variable lengths is added. Because the same tablegen classes are used to define MVC and CLC, the patch also has the side-effect of defining a pseudo loop instruction for CLC. That instruction isn't used yet (and wouldn't be handled correctly if it were). I'm planning to use it soon though. llvm-svn: 189331
* [SystemZ] Add basic prefetch supportRichard Sandiford2013-08-231-0/+26
| | | | | | Just the instructions and intrinsics for now. llvm-svn: 189100
* [SystemZ] Try reversing comparisons whose first operand is in memoryRichard Sandiford2013-08-231-0/+71
| | | | | | This allows us to make more use of the many compare reg,mem instructions. llvm-svn: 189099
* [SystemZ] Define remainig *MUL_LOHI patternsRichard Sandiford2013-08-211-16/+72
| | | | | | | | | | | | | | | | | The initial port used MLG(R) for i64 UMUL_LOHI but left the other three combinations as not-legal-or-custom. Although 32x32->{32,32} multiplications exist, they're not as quick as doing a normal 64-bit multiplication, so it didn't seem like i32 SMUL_LOHI and UMUL_LOHI would be useful. There's also no direct instruction for i64 SMUL_LOHI, so it needs to be implemented in terms of UMUL_LOHI. However, not defining these patterns means that we don't convert division by a constant into multiplication, so this patch fills in the other cases. The new i64 SMUL_LOHI sequence is simpler than the one that we used previously for 64x64->128 multiplication, so int-mul-08.ll now tests the full sequence. llvm-svn: 188898
* [SystemZ] Use FI[EDX]BRA for codegenRichard Sandiford2013-08-211-0/+9
| | | | llvm-svn: 188895
* [SystemZ] Use SRST to optimize memchrRichard Sandiford2013-08-201-3/+4
| | | | | | | | | | | | | | | | | | | SystemZTargetLowering::emitStringWrapper() previously loaded the character into R0 before the loop and made R0 live on entry. I'd forgotten that allocatable registers weren't allowed to be live across blocks at this stage, and it confused LiveVariables enough to cause a miscompilation of f3 in memchr-02.ll. This patch instead loads R0 in the loop and leaves LICM to hoist it after RA. This is actually what I'd tried originally, but I went for the manual optimisation after noticing that R0 often wasn't being hoisted. This bug forced me to go back and look at why, now fixed as r188774. We should also try to optimize null checks so that they test the CC result of the SRST directly. The select between null and the SRST GPR result could then usually be deleted as dead. llvm-svn: 188779
* [SystemZ] Add support for sibling callsRichard Sandiford2013-08-191-15/+70
| | | | | | | | | | | | | | | | | | This first cut is pretty conservative. The final argument register (R6) is call-saved, so we would need to make sure that the R6 argument to a sibling call is the same as the R6 argument to the calling function, which seems worth keeping as a separate patch. Saying that integer truncations are free means that we no longer use the extending instructions LGF and LLGF for spills in int-conv-09.ll and int-conv-10.ll. Instead we treat the registers as 64 bits wide and truncate them to 32-bits where necessary. I think it's unlikely we'd use LGF and LLGF for spills in other situations for the same reason, so I'm removing the tests rather than replacing them. The associated code is generic and applies to many more instructions than just LGF and LLGF, so there is no corresponding code removal. llvm-svn: 188669
* [SystemZ] Use SRST to implement strlen and strnlenRichard Sandiford2013-08-161-0/+3
| | | | | | It would also make sense to use it for memchr; I'm working on that now. llvm-svn: 188547
* [SystemZ] Use MVST to implement strcpy and stpcpyRichard Sandiford2013-08-161-0/+3
| | | | llvm-svn: 188546
* [SystemZ] Use CLST to implement strcmpRichard Sandiford2013-08-161-0/+63
| | | | llvm-svn: 188544
* [SystemZ] Use CLC and IPM to implement memcmpRichard Sandiford2013-08-121-4/+8
| | | | | | | For now this is restricted to fixed-length comparisons with a length in the range [1, 256], as for memcpy() and MVC. llvm-svn: 188163
* [SystemZ] Add a definition of the CLC instructionRichard Sandiford2013-08-121-0/+1
| | | | llvm-svn: 188162
* [SystemZ] Reuse CC results for integer comparisons with zeroRichard Sandiford2013-08-011-1/+2
| | | | | | | | | | This also fixes a bug in the predication of LR to LOCR: I'd forgotten that with these in-place instruction builds, the implicit operands need to be added manually. I think this was latent until now, but is tested by int-cmp-45.c. It also adds a CC valid mask to STOC, again tested by int-cmp-45.c. llvm-svn: 187573
* [SystemZ] Prefer comparisons with zeroRichard Sandiford2013-08-011-1/+25
| | | | | | | Convert >= 1 to > 0, etc. Using comparison with zero isn't a win on its own, but it exposes more opportunities for CC reuse (the next patch). llvm-svn: 187571
* [SystemZ] Implement isLegalAddressingMode()Richard Sandiford2013-07-311-0/+15
| | | | | | | | | | The loop optimizers were assuming that scales > 1 were OK. I think this is actually a bug in TargetLoweringBase::isLegalAddressingMode(), since it seems to be trying to reject anything that isn't r+i or r+r, but it has no default case for scales other than 0, 1 or 2. Implementing the hook for z means that z can no longer test any change there though. llvm-svn: 187497
* [SystemZ] Be more careful about inverting CC masksRichard Sandiford2013-07-311-26/+39
| | | | | | | | | | | | | | | | | | | | | | | | System z branches have a mask to select which of the 4 CC values should cause the branch to be taken. We can invert a branch by inverting the mask. However, not all instructions can produce all 4 CC values, so inverting the branch like this can lead to some oddities. For example, integer comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater). If an integer EQ is reversed to NE before instruction selection, the branch will test for 1 or 2. If instead the branch is reversed after instruction selection (by inverting the mask), it will test for 1, 2 or 3. Both are correct, but the second isn't really canonical. This patch therefore keeps track of which CC values are possible and uses this when inverting a mask. Although this is mostly cosmestic, it fixes undefined behavior for the CIJNLH in branch-08.ll. Another fix would have been to mask out bit 0 when generating the fused compare and branch, but the point of this patch is that we shouldn't need to do that in the first place. The patch also makes it easier to reuse CC results from other instructions. llvm-svn: 187495
* [SystemZ] Move compare-and-branch generation even laterRichard Sandiford2013-07-311-14/+9
| | | | | | | | | | | | | | | | | | | | | | | r187116 moved compare-and-branch generation from the instruction-selection pass to the peephole optimizer (via optimizeCompare). It turns out that even this is a bit too early. Fused compare-and-branch instructions don't interact well with predication, where a CC result is needed. They also make it harder to reuse the CC side-effects of earlier instructions (not yet implemented, but the subject of a later patch). Another problem was that the AnalyzeBranch family of routines weren't handling compares and branches, so we weren't able to reverse the fused form in cases where we would reverse a separate branch. This could have been fixed by extending AnalyzeBranch, but given the other problems, I've instead moved the fusing to the long-branch pass, which is also responsible for the opposite transformation: splitting out-of-range compares and branches into separate compares and long branches. I've added a test for the AnalyzeBranch problem. A test for the predication problem is included in the next patch, which fixes a bug in the choice of CC mask. llvm-svn: 187494
* [SystemZ] Rework compare and branch supportRichard Sandiford2013-07-251-54/+2
| | | | | | | | | | | | | | Before the patch we took advantage of the fact that the compare and branch are glued together in the selection DAG and fused them together (where possible) while emitting them. This seemed to work well in practice. However, fusing the compare so early makes it harder to remove redundant compares in cases where CC already has a suitable value. This patch therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of functions instead. No behavioral change intended, but it paves the way for a later patch. llvm-svn: 187116
* [SystemZ] Add STOC and STOCGRichard Sandiford2013-07-251-24/+38
| | | | | | | | These instructions are allowed to trap even if the condition is false, so for now they are only used for "*ptr = (cond ? x : *ptr)"-style constructs. llvm-svn: 187111
* Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector ↵Craig Topper2013-07-141-3/+3
| | | | | | size. llvm-svn: 186274
* [SystemZ] Fix parsing of inline asm registersRichard Sandiford2013-07-121-0/+42
| | | | | | | | | | | | GPR and FPR constraints like "{r2}" and "{f2}" weren't handled correctly because the name-to-regno mapping depends on the value type and (because of that) the internal names in RegStrings are not the same as the AsmName. CC constraints like "{cc}" didn't work either because there was no associated register class. llvm-svn: 186148
* AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and allStephen Lin2013-07-091-0/+20
| | | | | | | | | | | | | | | | | | | | | | | in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in order to resolve the following issues with fmuladd (i.e. optional FMA) intrinsics: 1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations. 2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation. 3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs. The function has also been slightly renamed for consistency and to force a merge/build conflict for any out-of-tree target implementing it. To resolve, see comments and fixed in-tree examples. llvm-svn: 185956
* [SystemZ] Use "STC;MVC" for memsetRichard Sandiford2013-07-091-0/+8
| | | | | | | | | | | | Use "STC;MVC" for memsets that are too big for two STCs or MV...Is yet small enough for a single MVC. As with memcpy, I'm leaving longer cases till later. The number of tests might seem excessive, but f33 & f34 from memset-04.ll failed the first cut because I'd not added the "?:" on the calculation of Size1. llvm-svn: 185918
* [SystemZ] Remove unwanted part from last commitRichard Sandiford2013-07-081-2/+0
| | | | | | | I was originally going to use MVC for memmove too, but that's less of a clear win. Remove some accidental left-overs in the previous commit. llvm-svn: 185804
OpenPOWER on IntegriCloud