path: root/llvm/test/CodeGen/SystemZ
Commit message (Author, Age; Files, Lines)
* [SystemZ] Bugfix in emitSelect() (Jonas Paulsson, 2020-02-12; 1 file, -0/+43)
| | | | | | | | | | | | | | When more than one SelectPseudo instruction is handled, a new MBB is returned. This must not be done if that would result in leaving an unhandled isel pseudo behind in the original MBB. Fixes https://bugs.llvm.org/show_bug.cgi?id=44849. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D74352 (cherry picked from commit 0311e28e9cc01a244faa774b8cab337b45404fa9)
* [FPEnv] Fix chain handling regression after 04a8696 (Ulrich Weigand, 2020-01-14; 1 file, -3/+61)
| | | | | | | | | | | | | | | | | | | Code in getRoot made the assumption that every node in PendingLoads must always itself have a dependency on the current DAG root node. After the changes in 04a8696, it turns out that this assumption no longer holds true, causing wrong codegen in some cases (e.g. stores after constrained FP intrinsics might get deleted). To fix this, we now need to make sure that the TokenFactor created by getRoot always includes the previous root, if there is no implicit dependency already present. The original getControlRoot code already has exactly this check, so this patch simply reuses that code now for getRoot as well. This fixes the regression. NFC if no constrained FP intrinsic is present.
* [FPEnv] Fix chain handling for fpexcept.strict nodes (Ulrich Weigand, 2020-01-13; 2 files, -274/+278)
| | | | | | | | | | | | | | | | | We need to ensure that fpexcept.strict nodes are not optimized away even if the result is unused. To do that, we need to chain them into the block's terminator nodes, like already done for PendingExcepts. This patch adds two new lists of pending chains, PendingConstrainedFP and PendingConstrainedFPStrict to hold constrained FP intrinsic nodes without and with fpexcept.strict markers. This allows not only to solve the above problem, but also to relax chains a bit further by no longer flushing all FP nodes before a store or other memory access. (They are still flushed before nodes with other side effects.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D72341
* [SystemZ] Fix matching another pattern for nxgrk (PR44496) (Ulrich Weigand, 2020-01-09; 1 file, -0/+26)
| | | | | | | | SystemZDAGToDAGISel::Select will attempt to split logical instruction with a large immediate constant. This must not happen if the result matches one of the z15 combined operations, so the code checks for those. However, one of them was missed, causing invalid code to be generated in the test case for PR44496.
* [SystemZ] Extend fp-strict-alias test case (Ulrich Weigand, 2020-01-07; 1 file, -14/+85)
| | | | Explicitly add test for fpexcept.maytrap intrinsics.
* [SystemZ] Fix python failure in test case (Ulrich Weigand, 2020-01-07; 1 file, -1/+1)
| | | | | With recent Python the Large/spill-02.py test failed with an error: TypeError: can't multiply sequence by non-int of type 'float'
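The error quoted above is what Python 3 reports when a sequence is multiplied by a float. A common way to hit it is a script written for Python 2 that uses `/` to compute a repeat count. The following is a minimal sketch of that failure mode and the usual fix; it assumes true division was the cause, and the variable names are hypothetical rather than taken from spill-02.py.

```python
# Hypothetical repeat count; in Python 3, "/" always yields a float.
count = 8

try:
    padding = ["nop"] * (count / 2)   # list * float is rejected
except TypeError as err:
    message = str(err)

# Floor division keeps the operand an int, so the repetition works.
padding = ["nop"] * (count // 2)

print(message)
print(len(padding))
```

Floor division gives the same result as Python 2's `/` on non-negative integers, which is why it is the usual drop-in replacement.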
* [SystemZ] Create brcl 0,0 instead of brcl 0,3 in EmitNop for 6 bytes. (Jonas Paulsson, 2020-01-02; 2 files, -71/+84)
| | | | | | | For consistency with GCC, the target label is moved to the brcl itself instead of the next instruction. Review: Ulrich Weigand
* Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 (Fangrui Song, 2019-12-24; 9 files, -11/+11)
* Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 (Fangrui Song, 2019-12-24; 1 file, -1/+1)
* [FPEnv][X86] More strict int <-> FP conversion fixes (Ulrich Weigand, 2019-12-23; 2 files, -6/+6)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix several additional problems with the int <-> FP conversion logic both in common code and in the X86 target. In particular: - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target. - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP that ends up not chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead. - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, including [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT. - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D71840
* [SystemZ] Add a mapping from "select register" to "load on condition" (2-addr). (Jonas Paulsson, 2019-12-20; 2 files, -2/+56)
| | | | | | | | | | | | | | | | | | The SELR(Mux) instructions can be converted to two-address form as LOCR(Mux) instructions whenever one of the sources is the same register as the destination. By adding this mapping in getTwoOperandOpcode(), we get: - Two-address hints in getRegAllocationHints() for select register instructions. - No need anymore for special handling in SystemZShortenInst.cpp - shortenSelect() removed. The two-address hints are now added before the GRX32 hints, which should be preferred. Review: Ulrich Weigand https://reviews.llvm.org/D68870
* [SystemZ] Bugfix and improve the handling of CC values. (Jonas Paulsson, 2019-12-20; 5 files, -44/+379)
| | | | | | | | | | | | | | | | | It was recently discovered that the handling of CC values was actually broken since overflow was not properly handled ('nsw' flag not checked for). Add and sub instructions now have a new target specific instruction flag named SystemZII::CCIfNoSignedWrap. It means that the CC result can be used instead of a compare with 0, but only if the instruction has the 'nsw' flag set. This patch also adds the improvements of conversion to logical instructions and the analyzing of add with immediates, to be able to eliminate more compares. Review: Ulrich Weigand https://reviews.llvm.org/D66868
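To illustrate why the 'nsw' flag matters here, the sketch below models a 32-bit signed add and compares the condition code it would set with the one a separate compare against 0 would give. The CC encoding used (0 = zero, 1 = negative, 2 = positive, 3 = overflow) is an assumption about the add instruction, and all function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrap32(value):
    """Reduce an integer to the two's-complement 32-bit range."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN

def add32_with_cc(a, b):
    """Return (wrapped result, condition code) of a 32-bit signed add."""
    exact = a + b
    result = wrap32(exact)
    if exact != result:
        cc = 3              # overflow: CC no longer describes the result's sign
    elif result == 0:
        cc = 0
    elif result < 0:
        cc = 1
    else:
        cc = 2
    return result, cc

def compare_with_zero_cc(value):
    """Condition code of an explicit signed compare of value against 0."""
    if value == 0:
        return 0
    return 1 if value < 0 else 2

# Without overflow, both condition codes agree.
result, cc = add32_with_cc(5, -7)
print(cc == compare_with_zero_cc(result))

# With signed wrap they differ, so the compare may only be dropped for 'nsw' adds.
result, cc = add32_with_cc(INT32_MAX, 1)
print(cc, compare_with_zero_cc(result))
```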
* [SystemZ][FPEnv] Enable strict vector FP extends/truncations (Ulrich Weigand, 2019-12-20; 2 files, -12/+70)
| | | | | | | | | | | | | | The back-end currently has special DAGCombine code to detect cases where two floating-point extend or truncate operations can be combined into a single vector operation. This patch extends that support to also handle strict FP operations. Note that currently only the case where both operations have the same input chain is supported. This already suffices to cover the common case where the operations result from scalarizing a non-legal vector type. More general cases can be supported in the future.
* [SystemZ] Recognize mrecord-mcount in backend (Jonas Paulsson, 2019-12-19; 2 files, -0/+42)
| | | | | | | Emit the __mcount_loc section for all fentry calls. Review: Ulrich Weigand https://reviews.llvm.org/D71629
* [FPEnv] Strict versions of llvm.minimum/llvm.maximum (Ulrich Weigand, 2019-12-18; 2 files, -0/+136)
| | | | | | | | | | | | | Add new intrinsics llvm.experimental.constrained.minimum llvm.experimental.constrained.maximum as strict versions of llvm.minimum and llvm.maximum. Includes SystemZ back-end support. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D71624
* [Clang FE, SystemZ] Don't add "true" value for the "mnop-mcount" attribute. (Jonas Paulsson, 2019-12-18; 2 files, -3/+2)
| | | | | | | | Let the "mnop-mcount" function attribute simply be present or non-present. Update SystemZ backend as well to use hasFnAttribute() instead. Review: Ulrich Weigand https://reviews.llvm.org/D71669
* [FPEnv] Remove unnecessary rounding mode argument for constrained intrinsics (Ulrich Weigand, 2019-12-17; 8 files, -180/+92)
| | | | | | | | | | | | | | | | | | | The following intrinsics currently carry a rounding mode metadata argument: llvm.experimental.constrained.minnum llvm.experimental.constrained.maxnum llvm.experimental.constrained.ceil llvm.experimental.constrained.floor llvm.experimental.constrained.round llvm.experimental.constrained.trunc This is not useful since the semantics of those intrinsics do not in any way depend on the rounding mode. In similar cases, other constrained intrinsics do not have the rounding mode argument. Remove it here as well. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D71218
* [SystemZ][FPEnv] Back-end support for STRICT_[SU]INT_TO_FP (Ulrich Weigand, 2019-12-17; 8 files, -5/+421)
| | | | | | | | | As of b1d8576 there is middle-end support for STRICT_[SU]INT_TO_FP, so this patch adds SystemZ back-end support as well. The patch is SystemZ target specific except for adding SD patterns strict_[su]int_to_fp and any_[su]int_to_fp to TargetSelectionDAG.td as usual.
* [SystemZ] Improve verification of MachineOperands. (Jonas Paulsson, 2019-12-16; 1 file, -0/+72)
| | | | | | | | | | | Now that the machine verifier will check for cases of register/immediate MachineOperands and their correspondence to the MC instruction descriptor, this patch adds the operand types to the descriptors where they were previously missing. All MCOI::OPERAND_UNKNOWN operand types have been handled to get a known type, except for G_... (global isel) instructions. Review: Ulrich Weigand https://reviews.llvm.org/D71494
* [SystemZ] Implement the packed stack layout (Jonas Paulsson, 2019-12-12; 7 files, -186/+274)
| | | | | | | | | Any llvm function with the "packed-stack" attribute will be compiled to use the packed stack layout which reuses unused parts of the incoming register save area. This is needed for building the Linux kernel. Review: Ulrich Weigand https://reviews.llvm.org/D70821
* [SystemZ] Add llvm.minimum / llvm.maximum tests (Ulrich Weigand, 2019-12-11; 2 files, -32/+148)
| | | | | The backend already supports the @llvm.minimum and @llvm.maximum intrinsics, but we had no test cases for those. Add tests.
* [SystemZ] Fix 128-bit strict FMA expansion pre-z14 (Ulrich Weigand, 2019-12-11; 3 files, -0/+123)
| | | | | | | | | | | | | Before z14, we did not have any FMA instruction for 128-bit floating-point, so the @llvm.fma.f128 intrinsic needs to be expanded to a libcall on those platforms. This worked correctly for regular FMA, but was implemented incorrectly for the strict version. This was not noticed because we did not have test coverage for this case. This patch fixes that incorrect expansion and adds the missing test cases.
* [FPEnv] Constrained FCmp intrinsics (Ulrich Weigand, 2019-12-07; 18 files, -0/+4814)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for constrained floating-point comparison intrinsics. Specifically, we add: declare <ty2> @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>) declare <ty2> @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>) The first variant implements an IEEE "quiet" comparison (i.e. we only get an invalid FP exception if either argument is a SNaN), while the second variant implements an IEEE "signaling" comparison (i.e. we get an invalid FP exception if either argument is any NaN). The condition code is implemented as a metadata string. The same set of predicates as for the fcmp instruction is supported (except for the "true" and "false" predicates). These new intrinsics are mapped by SelectionDAG codegen onto two new ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again representing quiet vs. signaling comparison operations. Otherwise those nodes look like SETCC nodes, with an additional chain argument and result as usual for strict FP nodes. The patch includes support for the common legalization operations for those nodes. The patch also includes full SystemZ back-end support for the new ISD nodes, mapping them to all available SystemZ instructions to fully implement strict semantics (scalar and vector). Differential Revision: https://reviews.llvm.org/D69281
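The difference between the quiet and signaling variants described above comes down to which NaN operands raise the invalid exception. The sketch below models only that rule on raw binary64 bit patterns. It assumes the usual encoding in which the top fraction bit marks a quiet NaN, and the helper names are hypothetical; it is not how the intrinsics are implemented.

```python
EXP_MASK = 0x7FF0000000000000    # all exponent bits of a binary64 value
FRAC_MASK = 0x000FFFFFFFFFFFFF   # all fraction bits
QUIET_BIT = 0x0008000000000000   # top fraction bit (assumed quiet-NaN marker)

def is_nan(bits):
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0

def is_snan(bits):
    return is_nan(bits) and (bits & QUIET_BIT) == 0

def raises_invalid(a_bits, b_bits, signaling):
    """Model of the exception rule: fcmps traps on any NaN, fcmp only on SNaN."""
    if signaling:
        return is_nan(a_bits) or is_nan(b_bits)
    return is_snan(a_bits) or is_snan(b_bits)

ONE = 0x3FF0000000000000
QNAN = 0x7FF8000000000000
SNAN = 0x7FF0000000000001

print(raises_invalid(QNAN, ONE, signaling=False))  # quiet compare, quiet NaN
print(raises_invalid(QNAN, ONE, signaling=True))   # signaling compare, quiet NaN
print(raises_invalid(SNAN, ONE, signaling=False))  # quiet compare, signaling NaN
```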
* [TargetLowering] Fix another potential FPE in expandFP_TO_UINT (Craig Topper, 2019-12-06; 2 files, -24/+42)
| D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:
|
|   // Sel = Src < 0x8000000000000000
|   // Val = select Sel, Src, Src - 0x8000000000000000
|   // Ofs = select Sel, 0, 0x8000000000000000
|   // Result = fp_to_sint(Val) ^ Ofs
|
| The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)
|
| Instead, I'd suggest to use the following sequence:
|
|   // Sel = Src < 0x8000000000000000
|   // FltOfs = select Sel, 0, 0x8000000000000000
|   // IntOfs = select Sel, 0, 0x8000000000000000
|   // Result = fp_to_sint(Val - FltOfs) ^ IntOfs
|
| In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).
|
| In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.
|
| There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)
|
| Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper
|
| Differential Revision: https://reviews.llvm.org/D67105
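The second sequence in the message above can be modeled numerically. The sketch below does so for the 64-bit case, using Python floats as binary64 values. It only mirrors the arithmetic of the four pseudo-code lines and does not model exception flags; the function names are hypothetical.

```python
SIGN_BIT = 1 << 63          # 0x8000000000000000

def fp_to_sint64(value):
    """Model of a signed conversion: truncate toward zero, range [-2^63, 2^63)."""
    result = int(value)
    if not -SIGN_BIT <= result < SIGN_BIT:
        raise OverflowError("value out of range for a signed 64-bit conversion")
    return result

def fp_to_uint64(src):
    """FP_TO_UINT expressed through FP_TO_SINT, following the new sequence."""
    sel = src < float(SIGN_BIT)               # Sel    = Src < 0x8000000000000000
    flt_ofs = 0.0 if sel else float(SIGN_BIT) # FltOfs = select Sel, 0, 0x8000...
    int_ofs = 0 if sel else SIGN_BIT          # IntOfs = select Sel, 0, 0x8000...
    # The subtraction is exact in both cases: it subtracts either 0 or 2^63
    # from a value that already has the 2^63 bit set.
    return fp_to_sint64(src - flt_ofs) ^ int_ofs

print(fp_to_uint64(3.75))
print(fp_to_uint64(2.0**63 + 2048.0) == 2**63 + 2048)
print(fp_to_uint64(1e-300))
```

The last call is the case the message singles out: a very small positive input now only goes through `src - 0.0`, so no inexact subtraction is performed.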
* [FPEnv][SelectionDAG] Relax chain requirements (Ulrich Weigand, 2019-12-06; 3 files, -451/+617)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the following changes: 1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats each constrained intrinsic like a global barrier (e.g. a function call) and fully serializes all pending chains. This is actually not required; it is allowed for constrained intrinsics to be reordered w.r.t. one another or (nonvolatile) memory accesses. The MI-level scheduler already allows for that flexibility, so it makes sense to allow it at the DAG level as well. This patch therefore changes the way chains for constrained intrinsics are created, and handles them basically like load operations are handled. This has the effect that constrained intrinsics are no longer serialized against one another or (nonvolatile) loads. They are still serialized against stores, but that seems hard to change with the current DAG chain setup, and it also doesn't seem to be a big problem preventing DAG 2) The OPC_CheckFoldableChainNode check requires that each of the intermediate nodes in a multi-node pattern match only has a single use. This check tends to fail if those intermediate nodes are strict operations as those have a chain output that typically indeed has another use. However, we don't really need to consider chains here at all, since they will all be rewritten anyway by UpdateChains later. Other parts of the matcher therefore already ignore chains, but this hasOneUse check doesn't. This patch replaces hasOneUse by a custom test that verifies there is no more than one use of any non-chain output value. In theory, this change could affect code unrelated to strict FP nodes, but at least on SystemZ I could not find any single instance of that happening 3) The SystemZ back-end currently does not allow matching multiply-and- extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for strict FP operations. 
This was not possible in the past due to the problems described under 1) and 2) above. With those issues fixed, it is now possible to fully support those instructions in strict mode as well, and this patch does so. Differential Revision: https://reviews.llvm.org/D70913
* [RegisterCoalescer] Fix the creation of subranges when rematerialization is used (Quentin Colombet, 2019-12-05; 1 file, -0/+46)
| * Context *
|
| During register coalescing, we use rematerialization when coalescing is not possible. That means we may rematerialize a super register when only a smaller register is actually used.
|
| E.g.,
|   0B v1 = ldimm 0xFF
|   1B v2 = COPY v1.low8bits
|   2B = v2
| =>
|   0B v1 = ldimm 0xFF
|   1B v2 = ldimm 0xFF
|   2B = v2.low8bits
|
| Where xB are the slot indexes. Here v2 grew from an 8-bit register to a 16-bit register.
|
| When that happens and subregister liveness is enabled, we create subranges for the newly created value.
|
| E.g., before remat, the live range of v2 looked like:
|   main range: [1r, 2r)
| (Reads: v2 is defined at index 1 slot register and used before the slot register of index 2)
|
| After remat, it should look like:
|   main range: [1r, 2r)
|   low 8 bits: [1r, 2r)
|   high 8 bits: [1r, 1d) <-- dead def
|
| I.e., the unused lanes of v2 should be marked as dead definition.
|
| * The Problem *
|
| Prior to this patch, the live ranges from the previous example would have the full live range for all subranges:
|   main range: [1r, 2r)
|   low 8 bits: [1r, 2r)
|   high 8 bits: [1r, 2r) <-- too long
|
| * The Fix *
|
| Technically, the code that this patch changes is not wrong: when we create the subranges for the newly rematerialized value, we create only one subrange for the whole bit mask. In other words, at this point the v2 live range looks like this:
|   main range: [1r, 2r)
|   low & high: [1r, 2r)
|
| Then, it gets wrong when we call LiveInterval::refineSubRanges on low 8 bits:
|   main range: [1r, 2r)
|   low 8 bits: [1r, 2r)
|   high 8 bits: [1r, 2r) <-- too long
|
| Ideally, we would like LiveInterval::refineSubRanges to be able to do the right thing and mark the dead lanes as such. However, this is not possible, because by the time we update / refine the live ranges, the IR hasn't been updated yet, therefore we actually don't have enough information to do the right thing.
|
| Another option to fix the problem would have been to call LiveIntervals::shrinkToUses after the IR is updated. This is not desirable as this may have a noticeable impact on compile time.
|
| Instead, what this patch does is when we create the subranges for the rematerialized value, we explicitly create one subrange for the lanes that were used before rematerialization and one for the lanes that were not used. The used one inherits the live range of the main range and the unused one is just created empty. The existing rematerialization code then detects that the unused ones are not live and it correctly sets dead def intervals for them.
|
| https://llvm.org/PR41372
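The fix described above can be pictured with a toy model: lane masks are small integers, live ranges are lists of (start, end) slot labels, and the unused lanes start out with an empty subrange that later receives a dead def. This is only a sketch of the idea under those assumptions; the function names and data layout are hypothetical and bear no relation to the LiveInterval API.

```python
LOW_8_BITS = 0b01    # hypothetical lane mask for the low half
HIGH_8_BITS = 0b10   # hypothetical lane mask for the high half

def create_remat_subranges(all_lanes, used_lanes, main_range):
    """One subrange for the previously used lanes, one (empty) for the rest."""
    unused_lanes = all_lanes & ~used_lanes
    subranges = {used_lanes: list(main_range)}   # inherits the main range
    if unused_lanes:
        subranges[unused_lanes] = []             # created empty
    return subranges

def add_dead_defs(subranges, def_slot):
    """Give every empty subrange a dead definition at the defining slot."""
    for segments in subranges.values():
        if not segments:
            segments.append((f"{def_slot}r", f"{def_slot}d"))
    return subranges

main_range = [("1r", "2r")]
subranges = create_remat_subranges(LOW_8_BITS | HIGH_8_BITS, LOW_8_BITS, main_range)
subranges = add_dead_defs(subranges, def_slot=1)
for mask, segments in subranges.items():
    print(bin(mask), segments)
```

With the values from the example in the message, the low lanes keep [1r, 2r) and the high lanes end up with the dead def [1r, 1d).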
* [SelectionDAG] Expand nnan FMINNUM/FMAXNUM to select sequence (Ulrich Weigand, 2019-12-04; 1 file, -0/+62)
| | | | | | | | | | | | | | | | | InstCombine may synthesize FMINNUM/FMAXNUM nodes from fcmp+select sequences (where the fcmp is marked nnan). Currently, if the target does not otherwise handle these nodes, they'll get expanded to libcalls to fmin/fmax. However, these functions may reside in libm, which may introduce a library dependency that was not originally present in the source code, potentially resulting in link failures. To fix this problem, add code to TargetLowering::expandFMINNUM_FMAXNUM to expand FMINNUM/FMAXNUM to a compare+select sequence instead of the libcall. This is done only if the node is marked as "nnan"; in this case, the expansion to compare+select is always correct. This also suffices to catch all cases where FMINNUM/FMAXNUM was synthesized as above. Differential Revision: https://reviews.llvm.org/D70965
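The compare+select expansion mentioned above can be sketched with plain floats. The example below shows that it matches min/max when no operand is a NaN, and why it would not be a valid replacement otherwise. The exact predicates LLVM uses are an assumption here, and the function names are hypothetical.

```python
import math

def expand_fminnum_nnan(a, b):
    """compare+select form of a minimum; only valid when neither input is a NaN."""
    return a if a < b else b

def expand_fmaxnum_nnan(a, b):
    """compare+select form of a maximum; only valid when neither input is a NaN."""
    return a if a > b else b

# Without NaNs the expansion agrees with the library function it replaces.
print(expand_fminnum_nnan(1.0, 2.0) == math.fsum([min(1.0, 2.0)]))

# With a NaN operand the select simply picks the second operand, whereas
# fmin would return the non-NaN value. Hence the restriction to "nnan" nodes.
print(expand_fminnum_nnan(1.0, float("nan")))
```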
* [SystemZ] Return the right offsets from getCalleeSavedSpillSlots(). (Jonas Paulsson, 2019-11-25; 1 file, -4/+13)
| | | | | | | | | | | | | | // Due to the SystemZ ABI, the DWARF CFA (Canonical Frame Address) is not // equal to the incoming stack pointer, but to incoming stack pointer plus // 160. The getOffsetOfLocalArea() returned value is interpreted as "the // offset of the local area from the CFA". The immediate offsets into the Register save area returned by getCalleeSavedSpillSlots() should take this offset into account, which this patch makes sure of. Patch and review by Ulrich Weigand. https://reviews.llvm.org/D70427
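Since the CFA is defined above as the incoming stack pointer plus 160, an address expressed relative to the incoming stack pointer can be rebased onto the CFA by subtracting 160. The sketch below shows only that arithmetic; the slot offset used in the example is hypothetical, and this is not a description of what getCalleeSavedSpillSlots() returns.

```python
CFA_OFFSET_FROM_INCOMING_SP = 160   # CFA = incoming SP + 160, per the message

def sp_to_cfa_relative(offset_from_incoming_sp):
    """Rebase an offset from the incoming stack pointer onto the CFA."""
    return offset_from_incoming_sp - CFA_OFFSET_FROM_INCOMING_SP

def cfa_to_sp_relative(offset_from_cfa):
    """Inverse conversion, back to an offset from the incoming stack pointer."""
    return offset_from_cfa + CFA_OFFSET_FROM_INCOMING_SP

# A hypothetical save slot 48 bytes above the incoming stack pointer.
print(sp_to_cfa_relative(48))
```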
* [SystemZ] Avoid mixing strict and non-strict FP operations in tests (Ulrich Weigand, 2019-11-20; 7 files, -74/+215)
| | | | | | This is to prepare for having the IR verifier reject mixed functions. Note that fp-strict-mul-02.ll and fp-strict-mul-04.ll still remain to be fixed.
* [SystemZ] Use fneg in test cases (Ulrich Weigand, 2019-11-20; 22 files, -98/+88)
| | | | | | Now that we have fneg, prefer using it over "fsub -0.0, ...". This helps in particular with strict FP tests, as fneg does not raise any exceptions.
* [LegalizeDAG] Convert strict fp nodes to libcalls without losing the chain. (Craig Topper, 2019-11-18; 1 file, -165/+116)
| | | | | | | | | | Previously we mutated the node and then converted it to a libcall. But this loses the chain information. This patch keeps the chain, but unfortunately breaks tail call optimization as the functions involved in deciding if a node is in tail call position can't handle the chain. But getting the ordering right seems more important. Somehow the SystemZ tests improved. I looked at one of them and it seemed that we're handling the split vector elements in a different order and that made the copies work better. Differential Revision: https://reviews.llvm.org/D70334
* [X86] Add more add/sub carry tests (David Zarzycki, 2019-11-12; 2 files, -2/+2)
| | | | | | Preparation for: https://reviews.llvm.org/D70079 https://reviews.llvm.org/D70077
* [SystemZ] Add GHC calling convention (Ulrich Weigand, 2019-11-04; 7 files, -0/+184)
| | | | | | | This is a special calling convention to be used by the GHC compiler. Author: Stefan Schulze Frielinghaus Differential Revision: https://reviews.llvm.org/D69024
* [SystemZ] Improve handling of huge PC relative immediate offsets. (Jonas Paulsson, 2019-11-04; 1 file, -0/+31)
| | | | | | | | | | | | | | | Demand that an immediate offset to a PC relative address fits in 32 bits, or else load it into a register and perform a separate add. Verify in the assembler that such immediate offsets fit the bitwidth. Even though the final address of a Load Address Relative Long may fit in 32 bits even with a >32 bit offset (depending on where the symbol lives relative to PC), the GNU toolchain demands the offset by itself to be in range. This patch adopts the same behavior for llvm. Review: Ulrich Weigand https://reviews.llvm.org/D69749
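The range check described above can be sketched as follows. The example assumes a signed 32-bit range for the offset and uses abstract step names instead of real instructions; both the names and the returned tuples are hypothetical and only illustrate the two lowering choices.

```python
def fits_signed(value, bits=32):
    """True if value is representable as a signed integer of the given width."""
    limit = 1 << (bits - 1)
    return -limit <= value < limit

def lower_pcrel_address(symbol, offset):
    """Keep small offsets folded into the address, split large ones off."""
    if fits_signed(offset):
        return [("pcrel_address", symbol, offset)]
    # Offset too large: form the plain symbol address, materialize the
    # offset in a register, and add the two separately.
    return [
        ("pcrel_address", symbol, 0),
        ("materialize_offset", offset),
        ("add",),
    ]

print(lower_pcrel_address("sym", 8))
print(lower_pcrel_address("sym", 1 << 32))
```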
* [FPEnv] Strict FP tests should use the requisite function attributes. (Kevin P. Neal, 2019-10-04; 62 files, -992/+1077)
| | | | | | | | | | | | | | | A set of function attributes is required in any function that uses constrained floating point intrinsics. None of our tests use these attributes. This patch fixes this. These tests have been tested against the IR verifier changes in D68233. Reviewed by: andrew.w.kaylor, cameron.mcinally, uweigand Approved by: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D67925 llvm-svn: 373761
* [SystemZ] Add SystemZPostRewrite in addPostRegAlloc() instead at -O0. (Jonas Paulsson, 2019-09-30; 1 file, -0/+29)
| | | | | | SystemZPostRewrite may emit COPYs, so it needs to run before the Post-RA pseudo pass, also at -O0. It should therefore be added in addPostRegAlloc(). Review: Ulrich Weigand llvm-svn: 373182
* Change -march=systemz to triple and fix test (Kai Nacke, 2019-09-27; 2 files, -6/+4)
| | | | | | | | | | | | | These two test cases use -march=systemz instead of a triple. In particular, the file format used is then based on the default host triple. This leads to different behaviour on different platforms. The SystemZ implementation has used the integrated assembler for a long time now. The mature-mc-support test can be fully enabled. Differential Revision: https://reviews.llvm.org/D68129 llvm-svn: 373098
* [SystemZ] Recognize mnop-mcount in backend (Jonas Paulsson, 2019-09-26; 2 files, -0/+37)
| | | | | | | | | | With -pg -mfentry -mnop-mcount, a nop is emitted instead of the call to fentry. Review: Ulrich Weigand https://reviews.llvm.org/D67765 llvm-svn: 372950
* [SystemZ] Improve emitSelect() (Jonas Paulsson, 2019-09-25; 4 files, -70/+92)
| | | | | | | | | | | | | Merge more Select pseudo instructions in emitSelect() by allowing other instructions between them as long as they do not clobber CC. Debug value instructions are now moved down to below the new PHIs instead of erasing them. Review: Ulrich Weigand https://reviews.llvm.org/D67619 llvm-svn: 372873
* [SystemZ] Support z15 processor name (Ulrich Weigand, 2019-09-20; 20 files, -26/+26)
| | | | | | | | | | | The recently announced IBM z15 processor implements the architecture already supported as "arch13" in LLVM. This patch adds support for "z15" as an alternate architecture name for arch13. The patch also uses z15 in a number of places where we used arch13 as long as the official name was not yet announced. llvm-svn: 372435
* [Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing (Guillaume Chatelet, 2019-09-11; 19 files, -19/+19)
| | | | | | | | | | | | | | | | | | | | | | Summary: This catches malformed mir files which specify alignment as log2 instead of pow2. See https://reviews.llvm.org/D65945 for reference. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67433 llvm-svn: 371608
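The log2 versus pow2 distinction above can be illustrated with a small validity check: an alignment given in bytes must be a power of two, so a file that stores the log2 value instead is caught whenever that value is not itself a power of two. This is a sketch of the idea only; the function names are hypothetical and not part of the MIR parser.

```python
def is_power_of_two(value):
    """True for 1, 2, 4, 8, ...; false for 0, negatives and everything else."""
    return value > 0 and (value & (value - 1)) == 0

def parse_alignment(value):
    """Accept an alignment in bytes, rejecting values that cannot be one."""
    if not is_power_of_two(value):
        raise ValueError(f"alignment {value} is not a power of two")
    return value

def log2_to_alignment(log2_value):
    """Convert a log2-encoded alignment to bytes."""
    return 1 << log2_value

print(parse_alignment(log2_to_alignment(3)))   # 8-byte alignment, accepted
try:
    parse_alignment(3)                          # log2 value used by mistake
except ValueError as err:
    print(err)
```

Note that a check of this kind cannot detect every mistake: log2 values such as 1, 2 or 4 are themselves powers of two and would pass.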
* [BPI] Adjust the probability for floating point unordered comparison (Guozhi Wei, 2019-09-10; 1 file, -1/+2)
| | | | | | Since NaN is very rare in normal programs, the probability of a floating point unordered comparison should be extremely small. The current probability of 3/8 is too large; this patch changes it to a tiny number. Differential Revision: https://reviews.llvm.org/D65303 llvm-svn: 371541
* [SystemZ] Recognize INLINEASM_BR in backend (Jonas Paulsson, 2019-09-05; 1 file, -2/+2)
| | | | | | | | | | Handle the remaining cases also by handling asm goto in SystemZInstrInfo::getBranchInfo(). Review: Ulrich Weigand https://reviews.llvm.org/D67151 llvm-svn: 371048
* [SystemZ] Recognize INLINEASM_BR in backend. (Jonas Paulsson, 2019-09-03; 1 file, -0/+15)
| | | | | | | | SystemZInstrInfo::analyzeBranch() needs to check for INLINEASM_BR instructions, or it will crash. Review: Ulrich Weigand llvm-svn: 370753
* [SystemZ] Add support for fentry. (Jonas Paulsson, 2019-09-03; 1 file, -0/+29)
| | | | | | | SystemZAsmPrinter now properly emits function calls to __fentry__. Review: Ulrich Weigand llvm-svn: 370743
* [SystemZ] Support constrained fpto[su]i intrinsics (Ulrich Weigand, 2019-09-02; 8 files, -0/+505)
| | | | | | | | | | | Now that constrained fpto[su]i intrinsics are available, add codegen support to the SystemZ backend. In addition to pure back-end changes, I've also needed to add the strict_fp_to_[su]int and any_fp_to_[su]int pattern fragments in the obvious way. llvm-svn: 370674
* [SystemZ] Regenerate <8 x i31> store test (Simon Pilgrim, 2019-07-29; 1 file, -32/+33)
| | | | | | To help show the diffs from an upcoming SimplifyDemandedBits patch. llvm-svn: 367216
* [TargetLowering] Add SimplifyMultipleUseDemandedBits (Simon Pilgrim, 2019-07-23; 1 file, -6/+6)
| | | | | | | | | | | | | | | | | | This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demand all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured. The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use. We do see a couple of regressions that need to be addressed: AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better). X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG. The code owners have confirmed it's OK for these cases to be fixed up in future patches. Differential Revision: https://reviews.llvm.org/D63281 llvm-svn: 366799
* [Strict FP] Allow more relaxed scheduling (Ulrich Weigand, 2019-07-16; 2 files, -87/+165)
| | | | | | | | | | | | | | Reimplement scheduling constraints for strict FP instructions in ScheduleDAGInstrs::buildSchedGraph to allow for more relaxed scheduling. Specifically, allow one strict FP instruction to be scheduled across another, as long as it is not moved across any global barrier. Differential Revision: https://reviews.llvm.org/D64412 Reviewed By: cameron.mcinally llvm-svn: 366222
* [SystemZ] Fix addcarry of addcarry of const carry (PR42606) (Nikita Popov, 2019-07-12; 1 file, -0/+35)
| | | | | | | | | | | | This fixes https://bugs.llvm.org/show_bug.cgi?id=42606 by extending D64213. Instead of only checking if the carry comes from a matching operation, we now check the full chain of carries. Otherwise we might custom lower the outermost addcarry, but then generically legalize an inner addcarry. Differential Revision: https://reviews.llvm.org/D64658 llvm-svn: 365949