summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Add custom lowering for v2i64->v2f32 ↵Craig Topper2019-12-261-8/+32
| | | | | | | strict_sint_to_fp/strict_uint_to_fp for avx512dq+avx512vl targets. With avx512dq+avx512vl we have instruction that implements this and places zeroes in the upper 64-bits of the destination xmm register.
* [PowerPC] stop folding if result rlwinm mask is wrap while original rlwinm ↵czhengsz2019-12-251-2/+6
| | | | | | | | | | | | | | | | | is not. %1:g8rc = RLWINM8 %0:g8rc, 0, 16, 9 %2:g8rc = RLWINM8 killed %1:g8rc, 0, 0, 31 -> %2:g8rc = RLWINM8 %0:g8rc, 0, 16, 9 The above folding is wrong. Before transformation, %2:g8rc is 32 bit value. After transformation, %2:g8rc becomes a 64 bit value. This patch fixes above issue. Reviewed by: steven.zhang Differential Revision: https://reviews.llvm.org/D71833
* [NFC][PowerPC] Add a function tryAndWithMask to handle all the casesQingShan Zhang2019-12-261-111/+120
| | | | | | | | | that 'and' with constant More patches will be committed later to exploit more about 'and' with constant. Differential Revision: https://reviews.llvm.org/D71693
* [PowerPC] Modify the hasSideEffects of MTLR and MFLR from 1 to 0Kang Zhang2019-12-262-0/+4
| | | | | | | | | | | | | | | | | Summary: If we didn't set the value for hasSideEffects bit in our td file, `llvm-tblgen` will set it as true for those instructions which has no match pattern. The instructions `MTLR` and `MFLR` don't set the hasSideEffects flag and don't have match pattern, so their hasSideEffects flag will be set true by `llvm-tblgen`. But in fact, we can use `[LR]` to model the two instructions, so they should not have SideEffects. This patch is to modify the hasSideEffects of MTLR and MFLR from 1 to 0. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D71390
* [X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backendWang, Pengfei2019-12-265-46/+141
| | | | | | | | | | | | Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71871
* [X86] Use zero vector to extend to 512-bits for strict_fp_to_uint ↵Craig Topper2019-12-251-3/+7
| | | | | | | | v2i1->v2f64 on targets with AVX512F, but not AVX512VL. In the worst case, this requires a 128-bit move instruction to implicitly zero the upper bits. In the common case, we should recognize the producing instruction already zeroed the upper bits.
* [X86FixupSetCC] Remember the preceding eflags defining instruction while ↵Craig Topper2019-12-251-27/+5
| | | | | | | | | | | | | | | | | | | | | we're scanning the basic block instead of looking back for it. Summary: We're already scanning forward through the basic block. Might as well just remember eflags defs instead of doing a bounded search backwards later. Based on a comment in D71841. Reviewers: RKSimon, spatel, uweigand Reviewed By: uweigand Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71865
* [X86] Merge together some common code in LowerFP_TO_INT now that we have ↵Craig Topper2019-12-251-17/+11
| | | | STRICT_CVTTP2SI/STRICT_CVTTP2UI nodes. NFC
* Add missing strict_fp_to_intLiu, Chen32019-12-251-0/+3
| | | | Differential Revision: https://reviews.llvm.org/D71867
* [X86FixupSetCC] Use MachineInstr::readRegister/definesRegister to check for ↵Craig Topper2019-12-241-15/+3
| | | | EFLAGS use/def instead of our own custom operand scan. NFCI
* [WinEH] Delete addFnAttr("no-frame-pointer-elim") which seems no longer neededFangrui Song2019-12-241-5/+0
| | | | | | | | It was added in rL238619. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D71862
* AMDGPU/GlobalISel: Fix mapping and selection of llvm.amdgcn.div.fixupMatt Arsenault2019-12-242-1/+6
|
* [X86] Use 128-bit vector instructions for f32/f64->i64 conversions on 32-bit ↵Craig Topper2019-12-241-7/+14
| | | | | | | | | | | | | | targets with avx512dq and avx512vl instructions. On 32-bit targets we can't use the scalar instruction so we insert the scalar into a vector and use packed conversions. Previously we used either v4f32->v4i64 or v4f64->v4i64 to avoid some complexity creating target specific ISD opcodes for v4f32->v2i64. But this causes extra vzeroupper instructions and possibly frequency throttling on Intel CPUs. This patch changes this to create a 128-bit vector and uses a target specific ISD opcode if needed.
* [X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP.Craig Topper2019-12-246-165/+182
| | | | Differential Revision: https://reviews.llvm.org/D71850
* AMDGPU/GlobalISel: Legalize some 16-bit round instructionsMatt Arsenault2019-12-241-1/+6
|
* AMDGPU/GlobalISel: Lower llvm.amdgcn.elseMatt Arsenault2019-12-241-6/+17
|
* [SelectionDAG] Change SelectionDAGISel::{funcInfo,SDB} to use unique_ptrFangrui Song2019-12-231-8/+9
| | | | | CurDAG is referenced more than 2000 times and used in many gerated .cpp files. Don't touch it for now.
* [FPEnv][X86] More strict int <-> FP conversion fixesUlrich Weigand2019-12-234-92/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix several several additional problems with the int <-> FP conversion logic both in common code and in the X86 target. In particular: - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target. - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP that ends up not chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead. - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT. - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D71840
* [AMDGPU] Don't create MachinePointerInfos with an UndefValue pointerJay Foad2019-12-235-34/+11
| | | | | | | | | | | | | | | Summary: The only useful information the UndefValue conveys is the address space, which MachinePointerInfo can represent directly without referring to an IR value. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71838
* [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' ↵Sanjay Patel2019-12-232-97/+0
| | | | | | | | | | | | | | | | | | | extract_subvector(bitcast()) support This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine. Earlier patches that dealt with regressions initially exposed by this patch: rG5e5e99c041e4 rG0b38af89e2c0 Patch by: @RKSimon (Simon Pilgrim) Differential Revision: https://reviews.llvm.org/D63815
* [AArch64] [Windows] Use COFF stubs for calls to extern_weak functionsMartin Storsjö2019-12-233-7/+15
| | | | | | | | | | | | | | | | | | | As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range). Improve the classifyGlobalFunctionReference method to set MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in AArch64TargetLowering::LowerCall to use the return value from classifyGlobalFunctionReference for these cases. Add code in both AArch64FastISel and GlobalISel/IRTranslator to bail out for function calls to extern weak functions on windows, to let SelectionDAG handle them. This matches what was done for X86 in 6bf108d77a3c. Differential Revision: https://reviews.llvm.org/D71721
* [ARM] [Windows] Use COFF stubs for calls to extern_weak functionsMartin Storsjö2019-12-231-4/+6
| | | | | | | | | | | | | As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range). Instead check the shouldAssumeDSOLocal method and load the address from a COFF stub. This matches what was done for X86 in 6bf108d77a3c. Differential Revision: https://reviews.llvm.org/D71720
* [NFC] Style cleanupsShengchen Kan2019-12-231-22/+23
| | | | | | 1. Remove duplicate function for class name at the beginning of the comment. 2. Use auto where the type is already obvious from the context.
* [Power9] Remove the PPCISD::XXREVERSE as it has completely the same ↵QingShan Zhang2019-12-234-23/+5
| | | | | | | | | semantics of ISD::BSWAP The custom node PPCISD::XXREVERSE has completely the same semantics of generic node ISD::BSWAP. We need to clean up it as we have the combine rules for bswap in the base class, while nothing for xxreverse. Differential Revision: https://reviews.llvm.org/D70657
* [AVR] Fix codegen for rotate instructionsJim Lin2019-12-233-4/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch introduces the ROLBRd and RORBRd pseudo-instructions, which implemenent the "traditional" rotate operations; instead of the AVR rotate instructions that use the carry bit. The code is not optimized at all. Especially when dealing with loops of rotate instructions, this codegen should be improved some day. Related bug: 41358 <https://bugs.llvm.org/show_bug.cgi?id=41358> //Note//: This is my first submitted patch. Reviewers: dylanmckay, Jim Reviewed By: dylanmckay Subscribers: hiraditya, llvm-commits, dylanmckay, dsprenkels Tags: #llvm Patched by dsprenkels (Daan Sprenkels) Differential Revision: https://reviews.llvm.org/D60365
* [PowerPC] Exploit `vrl(b|h|w|d)` to perform vector rotationKai Luo2019-12-232-1/+21
| | | | | | | | | Summary: Currently, we set legalization action of `ISD::ROTL` vectors as `Expand` in `PPCISelLowering`. However, we can exploit `vrl(b|h|w|d)` to lower `ISD::ROTL` directly. Differential Revision: https://reviews.llvm.org/D71324
* [AMDGPU] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-222-4/+4
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71815
* [Hexagon] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-225-10/+10
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71814
* [NVPTX] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-221-1/+1
| | | | | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Also removed the top-level const as requested by Aaron Ballman in similar patches. Differential Revision: https://reviews.llvm.org/D71812
* [PowerPC] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-221-3/+3
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71811
* [ms] [X86] Use "P" modifier on operands to call instructions in inline X86 ↵Eric Astor2019-12-224-13/+41
| | | | | | | | | | | | | | | | | | | | assembly. Summary: This is documented as the appropriate template modifier for call operands. Fixes PR44272, and adds a regression test. Also adds support for operand modifiers in Intel-style inline assembly. Reviewers: rnk Reviewed By: rnk Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71677
* [AArch64] match splat of bitcasted extract subvector to DUPLANESanjay Patel2019-12-221-7/+43
| | | | | | | | | | This is another potential regression exposed by D63815. Here we peek through a bitcast to find an extract subvector and scale the splat offset based on that: splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC' Differential Revision: https://reviews.llvm.org/D71672
* Fix "result of 32-bit shift implicitly converted to 64 bits" warning. NFC.Simon Pilgrim2019-12-211-1/+1
|
* [AArch64] Respect reserved registers while renaming in LdSt opt.Florian Hahn2019-12-211-1/+4
| | | | | | We cannot pick reserved registers as rename registers. Fixes https://bugs.llvm.org/show_bug.cgi?id=44358
* AMDGPU/GlobalISel: Fix misuse of div_scale intrinsicsMatt Arsenault2019-12-211-5/+5
| | | | | | | | | | | Confusingly, the intrinsic operands do not match the instruction/custom node. The order is shuffled, and the 3rd operand is an immediate to select operands. I'm not 100% sure I did this right, but fdiv still doesn't select end to end and it will be easier to tell when it does. This at least avoids an assertion in RegBankSelect and allows hitting the fallback on selection.
* AMDGPU/GlobalISel: Fix missing scc imp-def on scalar and/or/xorMatt Arsenault2019-12-211-0/+5
|
* AMDGPU/GlobalISel: Simplify codeMatt Arsenault2019-12-211-5/+5
| | | | | This can directly access the register bank, and doesn't need to get it through the ID.
* [WebAssembly] Use TargetIndex operands in DbgValue to track WebAssembly ↵Yury Delendik2019-12-206-0/+33
| | | | | | | | | | | | | | | | | | operands locations Extends DWARF expression language to express locals/globals locations. (via target-index operands atm) (possible variants are: non-virtual registers or address spaces) The WebAssemblyExplicitLocals can replace virtual registers to targertindex operand type at the time when WebAssembly backend introduces {get,set,tee}_local instead of corresponding virtual registers. Reviewed By: aprantl, dschuff Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D52634
* Add parentheses to silence warningBill Wendling2019-12-201-2/+2
|
* More style cleanups following rG14fc20ca6282 [NFC]Philip Reames2019-12-201-34/+28
| | | | | | | Demote member functions to static functions where possible Use early continue/early return to reduce nesting Clarify comments slightly. Reuse previously define expression in one case.
* Fix a memory leak introduced w/the instruction padding support in rG14fc20ca6282Philip Reames2019-12-201-6/+6
| | | | Should have caught this in review, but only noticed when addressing post commit style items. We were creating a new instance of the X86MCInstrInfo class, and then never reclaiming the memory. This wasn't even conditional on the new off by default flags, so it was an unconditional leak.
* Align branches within 32-Byte boundary (NOP padding)Philip Reames2019-12-201-1/+286
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WARNING: If you're looking at this patch because you're looking for a full performace mitigation of the Intel JCC Erratum, this is not it! This is a preliminary patch on the patch towards mitigating the performance regressions caused by Intel's microcode update for Jump Conditional Code Erratum. For context, see: https://www.intel.com/content/www/us/en/support/articles/000055650.html The patch adds the required assembler infrastructure and command line options needed to exercise the logic for INTERNAL TESTING. These are NOT public flags, and should not be used for anything other than LLVM's own testing/debugging purposes. They are likely to change both in spelling and meaning. WARNING: This patch is knowingly incorrect in some cornercases. We need, and do not yet provide, a mechanism to selective enable/disable the padding. Conversation on this will continue in parellel with work on extending this infrastructure to support prefix padding. The goal here is to have the assembler align specific instructions such that they neither cross or end at a 32 byte boundary. The impacted instructions are: a. Conditional jump. b. Fused conditional jump. c. Unconditional jump. d. Indirect jump. e. Ret. f. Call. The new options for llvm-mc are: -x86-align-branch-boundary=NUM aligns branches within NUM byte boundary. -x86-align-branch=TYPE[+TYPE...] specifies types of branches to align. A new MCFragment type, MCBoundaryAlignFragment, is added, which may emit NOP to align the fused/unfused branch. alignBranchesBegin inserts MCBoundaryAlignFragment before instructions, alignBranchesEnd marks the end of the branch to be aligned, relaxBoundaryAlign grows or shrinks sizes of NOP to align the target branch. Nop padding is disabled when the instruction may be rewritten by the linker, such as TLS Call. Process Note: I am landing a patch by skan as it has been LGTMed, and continuing to iterate on the review is simply slowing us down at this point. We can and will continue to iterate in tree. Patch By: skan Differential Revision: https://reviews.llvm.org/D70157
* [PPC32] Emit R_PPC_PLTREL24 for calls to dso_local ifuncFangrui Song2019-12-201-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | static void *ifunc(void) __attribute__((ifunc("resolver"))); void foo() { ifunc(); } The relocation produced by the ifunc() call: 1. gcc -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000 2. gcc -msecure-plt -PIE => R_PPC_PLTREL24 r_addend=0x8000 3. clang -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000 4. clang -msecure-plt -fPIE => R_PPC_REL24 4 is incorrect. The R_PPC_REL24 needs a call stub due to ifunc. If this relocation is mixed with other R_PPC_PLTREL24(r_addend=0x8000) in a function, both GNU ld and lld (after D71621 fix) may produce a wrong result. This patch fixes 4 to use R_PPC_PLTREL24, which matches GCC. Both GNU ld and lld (after D71621) will be happy. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D71649
* [X86] Fix a KNL miscompile caused by combineSetCC swapping LHS/RHS variables ↵Craig Topper2019-12-201-19/+23
| | | | | | | | | | | | | | before a later use. The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code. A later piece of code swaps the operands and changing the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands. To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes are enough to fix the issue, but doing both to make this code very safe. I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that. Differential Revision: https://reviews.llvm.org/D71736
* [AArch64][SVE] Replace integer immediate intrinsics with splat vector variantDanilo Carvalho Grael2019-12-202-22/+39
| | | | | | | | | | | | Summary: Replace the integer immediate intrisics with splat vector variants so they can be applied as optimizations for the C/C++ intrinsics. Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D71614
* [SystemZ] Add a mapping from "select register" to "load on condition" (2-addr).Jonas Paulsson2019-12-204-81/+60
| | | | | | | | | | | | | | | | | | The SELR(Mux) instructions can be converted to two-address form as LOCR(Mux) instructions whenever one of the sources are the same reg as dest. By adding this mapping in getTwoOperandOpcode(), we get: - Two-address hints in getRegAllocationHints() for select register instructions. - No need anymore for special handling in SystemZShortenInst.cpp - shortenSelect() removed. The two-address hints are now added before the GRX32 hints, which should be preferred. Review: Ulrich Weigand https://reviews.llvm.org/D68870
* [SystemZ] Bugfix and improve the handling of CC values.Jonas Paulsson2019-12-206-33/+141
| | | | | | | | | | | | | | | | | It was recently discovered that the handling of CC values was actually broken since overflow was not properly handled ('nsw' flag not checked for). Add and sub instructions now have a new target specific instruction flag named SystemZII::CCIfNoSignedWrap. It means that the CC result can be used instead of a compare with 0, but only if the instruction has the 'nsw' flag set. This patch also adds the improvements of conversion to logical instructions and the analyzing of add with immediates, to be able to eliminate more compares. Review: Ulrich Weigand https://reviews.llvm.org/D66868
* Revert "[ARM] Improve codegen of volatile load/store of i64"Victor Campos2019-12-206-158/+6
| | | | This reverts commit bbcf1c3496ce2bd1ed87e8fb15ad896e279633ce.
* [SystemZ][FPEnv] Enable strict vector FP extends/truncationsUlrich Weigand2019-12-204-13/+66
| | | | | | | | | | | | | | The back-end currently has special DAGCombine code to detect cases where two floating-point extend or truncate operations can be combined into a single vector operation. This patch extends that support to also handle strict FP operations. Note that currently only the case where both operations have the same input chain are supported. This already suffices to cover the common case where the operations result from scalarizing a non-legal vector type. More general cases can be supported in the future.
* [AArch64][SVE] Correct intrinsics and patterns for logical predicate ↵Paul Walker2019-12-201-17/+17
| | | | | | | | | | | | | | | | | | | | | | | instructions In general SVE intrinsics are considered predicated and merging with everything else having suitable decoration. For predicated zeroing operations (like the predicate logical instructions) we use the "_z" suffix. After this change all intrinsics use their expected names (i.e. orr instead of or and eor instead of xor). I've removed intrinsics and patterns for condition code setting instructions as that data is not returned as part of the intrinsic. The expectation is to ask for a cc flag explicitly. For example: a = and_z(pg, p1, p2) cc = ptest_<flag>(pg, a) With the code generator expected to use "s" variants of instructions when available. Differential Revision: https://reviews.llvm.org/D71715
OpenPOWER on IntegriCloud