path: root/llvm/lib/Target
Commit message (Author, Date, Files changed, Lines removed/added)
...
* [X86][FPEnv] Promote some float strictfp operations to double on i686-pc-windows-msvc to match what we do for non-strict. (Craig Topper, 2019-12-26, 1 file, -2/+9)
    The float libcalls are inlined in MSVC's math header where they just cast to double and use the double libcall. Do the same when we emit libcalls.
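    A minimal C sketch of the pattern being matched, assuming a generic float libcall (the helper name is hypothetical; MSVC's actual header uses its own inline definitions):

        #include <math.h>

        /* Roughly what an inlined float libcall boils down to: cast to double,
           call the double libcall, truncate the result back to float. */
        static inline float sqrtf_via_double(float x) {
            return (float)sqrt((double)x);
        }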
* [X86] Add custom legalization for strict_uint_to_fp v2i32->v2f32. (Craig Topper, 2019-12-26, 1 file, -7/+16)
    I believe the algorithm we use for non-strict is exception safe for strict. The fsub won't generate any exceptions. After it we will have an exact version of the i32 integer in a double. Then we just round it to f32. That rounding will generate a precision exception if it can't be represented exactly.
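    A scalar C sketch of the algorithm described above, under the assumption that it mirrors the vector lowering (the real code operates on v2i32/v2f32 nodes in the SelectionDAG; the function name is illustrative):

        #include <stdint.h>
        #include <string.h>

        float u32_to_f32_via_f64(uint32_t v) {
            /* OR the 32-bit integer into the mantissa of 2^52: the result is
               the double 2^52 + v, represented exactly. */
            uint64_t bits = 0x4330000000000000ULL | (uint64_t)v;
            double d;
            memcpy(&d, &bits, sizeof d);
            /* The subtraction is exact, so it raises no FP exceptions. */
            d -= 0x1p52;
            /* Only this rounding step can raise a precision (inexact) exception. */
            return (float)d;
        }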
* Add custom operation for strict fpextend/fpround. (Liu, Chen3, 2019-12-27, 5 files, -20/+57)
    Differential Revision: https://reviews.llvm.org/D71892
* Remove SrcVT only used in an assert and propagate query. (Eric Christopher, 2019-12-26, 1 file, -2/+2)
* [X86] Custom widen 128/256-bit vXi32 uint_to_fp on avx512f targets without avx512vl. (Craig Topper, 2019-12-26, 2 files, -62/+106)
    Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl. Previously we widened these through isel patterns, but that didn't work for STRICT_ nodes. Those need to be padded with zeroes in the upper bits, which is harder to do in isel patterns.
* [X86] Add custom widening for v2i32->v2f64 strict_uint_to_fp with AVX512F, but not AVX512VL. (Craig Topper, 2019-12-26, 2 files, -10/+23)
    Previously we were widening with isel patterns, but that wasn't exception safe for strict FP. So now we widen to v4i32->v4f64 during type legalization, and then let op legalization further widen to v8i32->v8f64.
    The vec_int_to_fp.ll changes are caused by us no longer narrowing extracts of strict_uint_to_fp to the v4i32->v2f64 instruction without AVX512VL only to have isel rewiden it. Now we just keep it wide throughout, so we don't have an opportunity to narrow the load.
* [X86] Add custom widening for v2f64->v2i32 strict_fp_to_uint with avx512f, but not avx512vl. (Craig Topper, 2019-12-26, 1 file, -6/+15)
    AVX512F added instructions for vector fp_to_uint conversions. With AVX512VL we can use a specific instruction that does v2f64->v4i32 with zeroes in the 2 extra elements. For non-strict nodes without AVX512VL we relied on type legalization to turn it to v4f64->v4i32, which would later be widened by op legalization to v8f64->v8i32. But type legalization doesn't currently widen strict nodes, since it doesn't know how to safely and efficiently pad the extra elements. For X86, though, we know padding with zeroes is safe and efficient, so do that ourselves.
* [BPF] Enable relocation location for load/store/shifts. (Yonghong Song, 2019-12-26, 4 files, -50/+223)
    Previously, a btf field relocation was always at an assignment like
        r1 = 4
    which is converted from an ld_imm64 instruction. This patch adds an optimization such that the relocation instruction may also be a load/store/shift. Specifically, the following insns may also carry a relocation, in addition to BPF_MOV:
        LDB, LDH, LDW, LDD, STB, STH, STW, STD,
        LDB32, LDH32, LDW32, STB32, STH32, STW32,
        SLL, SRL, SRA
    To accomplish this, a few BPF target-specific codegen-only instructions are invented. They are generated in the backend BPF SimplifyPatchable phase, which runs at an early llc stage when SSA form is available. The new codegen-only instructions are converted to real instructions at the codegen and BTF emission stage.
    Note that, as revealed by a few tests, this optimization might actually generate more relocations:
    Scenario 1:
        if (...) {
          ... __builtin_preserve_field_info(arg->b2, 0) ...
        } else {
          ... __builtin_preserve_field_info(arg->b2, 0) ...
        }
    The compiler could do CSE to keep only one relocation. But if both of the above are translated into codegen internal instructions, the compiler will not be able to do that.
    Scenario 2:
        offset = ... __builtin_preserve_field_info(arg->b2, 0) ...
        ...
        ... offset ...
        ... offset ...
        ... offset ...
    For whatever reason, the compiler might temporarily copy-propagate the right-hand side of the "offset" assignment like
        ... __builtin_preserve_field_info(arg->b2, 0) ...
        ... __builtin_preserve_field_info(arg->b2, 0) ...
    and CSE will be able to deduplicate later. But if these intrinsics are converted to BPF pseudo instructions, they will not get deduplicated.
    I do not expect a big instruction count difference. It may actually reduce instruction count, since the relocation now sits deeper in the insn dependency chain. For example, for test offset-reloc-fieldinfo-2.ll, this patch generates 7 instead of 6 relocations for non-alu32 mode, but it actually reduced the instruction count from 29 to 26.
    Differential Revision: https://reviews.llvm.org/D71790
* [X86] Merge the SINT_TO_FP/UINT_TO_FP handlers in ReplaceNodeResults since the AVX512DQ+AVX512VL code is very similar in both. NFC (Craig Topper, 2019-12-26, 1 file, -23/+11)
* [X86] Add custom lowering for v2i64->v2f32 strict_sint_to_fp/strict_uint_to_fp for avx512dq+avx512vl targets. (Craig Topper, 2019-12-26, 1 file, -8/+32)
    With avx512dq+avx512vl we have an instruction that implements this and places zeroes in the upper 64 bits of the destination xmm register.
* [PowerPC] Stop folding if the result rlwinm mask is a wrapping mask while the original rlwinm mask is not. (czhengsz, 2019-12-25, 1 file, -2/+6)
        %1:g8rc = RLWINM8 %0:g8rc, 0, 16, 9
        %2:g8rc = RLWINM8 killed %1:g8rc, 0, 0, 31
    ->
        %2:g8rc = RLWINM8 %0:g8rc, 0, 16, 9
    The above folding is wrong. Before the transformation, %2:g8rc is a 32-bit value; after the transformation, %2:g8rc becomes a 64-bit value, because the wrapping 16..9 mask can set bits in the upper half of the 64-bit register that the dropped 0..31 mask used to clear. This patch fixes the issue.
    Reviewed by: steven.zhang
    Differential Revision: https://reviews.llvm.org/D71833
* [NFC][PowerPC] Add a function tryAndWithMask to handle all the cases of 'and' with a constant. (QingShan Zhang, 2019-12-26, 1 file, -111/+120)
    More patches will be committed later to exploit more cases of 'and' with a constant.
    Differential Revision: https://reviews.llvm.org/D71693
* [PowerPC] Modify the hasSideEffects of MTLR and MFLR from 1 to 0. (Kang Zhang, 2019-12-26, 2 files, -0/+4)
    Summary: If we don't set the hasSideEffects bit in our td file, `llvm-tblgen` will set it to true for instructions that have no match pattern. The instructions `MTLR` and `MFLR` don't set the hasSideEffects flag and don't have a match pattern, so their hasSideEffects flag will be set true by `llvm-tblgen`. But in fact we can use `[LR]` to model the two instructions, so they should not have side effects. This patch modifies the hasSideEffects of MTLR and MFLR from 1 to 0.
    Reviewed By: jsji
    Differential Revision: https://reviews.llvm.org/D71390
* [X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend. (Wang, Pengfei, 2019-12-26, 5 files, -46/+141)
    Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor
    Subscribers: hiraditya, llvm-commits, LuoYuanke
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D71871
* [X86] Use zero vector to extend to 512-bits for strict_fp_to_uint v2i1->v2f64 on targets with AVX512F, but not AVX512VL. (Craig Topper, 2019-12-25, 1 file, -3/+7)
    In the worst case, this requires a 128-bit move instruction to implicitly zero the upper bits. In the common case, we should recognize that the producing instruction already zeroed the upper bits.
* [X86FixupSetCC] Remember the preceding eflags-defining instruction while we're scanning the basic block instead of looking back for it. (Craig Topper, 2019-12-25, 1 file, -27/+5)
    Summary: We're already scanning forward through the basic block. Might as well just remember eflags defs instead of doing a bounded search backwards later. Based on a comment in D71841.
    Reviewers: RKSimon, spatel, uweigand
    Reviewed By: uweigand
    Subscribers: hiraditya, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D71865
* [X86] Merge together some common code in LowerFP_TO_INT now that we have STRICT_CVTTP2SI/STRICT_CVTTP2UI nodes. NFC (Craig Topper, 2019-12-25, 1 file, -17/+11)
* Add missing strict_fp_to_int. (Liu, Chen3, 2019-12-25, 1 file, -0/+3)
    Differential Revision: https://reviews.llvm.org/D71867
* [X86FixupSetCC] Use MachineInstr::readsRegister/definesRegister to check for EFLAGS use/def instead of our own custom operand scan. NFCI (Craig Topper, 2019-12-24, 1 file, -15/+3)
* [WinEH] Delete addFnAttr("no-frame-pointer-elim") which seems no longer needed. (Fangrui Song, 2019-12-24, 1 file, -5/+0)
    It was added in rL238619.
    Reviewed By: rnk
    Differential Revision: https://reviews.llvm.org/D71862
* AMDGPU/GlobalISel: Fix mapping and selection of llvm.amdgcn.div.fixup. (Matt Arsenault, 2019-12-24, 2 files, -1/+6)
* [X86] Use 128-bit vector instructions for f32/f64->i64 conversions on 32-bit targets with avx512dq and avx512vl instructions. (Craig Topper, 2019-12-24, 1 file, -7/+14)
    On 32-bit targets we can't use the scalar instruction, so we insert the scalar into a vector and use packed conversions. Previously we used either v4f32->v4i64 or v4f64->v4i64 to avoid the complexity of creating target-specific ISD opcodes for v4f32->v2i64. But this causes extra vzeroupper instructions and possibly frequency throttling on Intel CPUs. This patch changes this to create a 128-bit vector and use a target-specific ISD opcode if needed.
* [X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP. (Craig Topper, 2019-12-24, 6 files, -165/+182)
    Differential Revision: https://reviews.llvm.org/D71850
* AMDGPU/GlobalISel: Legalize some 16-bit round instructions. (Matt Arsenault, 2019-12-24, 1 file, -1/+6)
* AMDGPU/GlobalISel: Lower llvm.amdgcn.else. (Matt Arsenault, 2019-12-24, 1 file, -6/+17)
* [SelectionDAG] Change SelectionDAGISel::{funcInfo,SDB} to use unique_ptr. (Fangrui Song, 2019-12-23, 1 file, -8/+9)
    CurDAG is referenced more than 2000 times and used in many generated .cpp files. Don't touch it for now.
* [FPEnv][X86] More strict int <-> FP conversion fixes. (Ulrich Weigand, 2019-12-23, 4 files, -92/+90)
    Fix several additional problems with the int <-> FP conversion logic, both in common code and in the X86 target. In particular:
    - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target.
    - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions from the STRICT_SINT_TO_FP that ends up not chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead (a scalar sketch of the idea follows this entry).
    - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, including [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT.
    - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.
    Reviewed by: craig.topper
    Differential Revision: https://reviews.llvm.org/D71840
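    A scalar C sketch of the single-conversion idea from the second bullet, assuming the usual halve-and-double expansion trick (illustrative only, not the literal DAG code): the integer input is selected/adjusted first, so only one signed conversion is emitted and no unused conversion can raise a spurious exception.

        #include <stdint.h>

        double u64_to_f64_one_conversion(uint64_t x) {
            int too_big = (x >> 63) != 0;          /* outside signed i64 range? */
            /* Halve large inputs, keeping the discarded bit as a sticky bit so
               the final rounding still comes out correct. */
            uint64_t in = too_big ? ((x >> 1) | (x & 1)) : x;
            double d = (double)(int64_t)in;        /* the single sint_to_fp */
            return too_big ? d * 2.0 : d;          /* undo the halving; exact */
        }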
* [AMDGPU] Don't create MachinePointerInfos with an UndefValue pointer. (Jay Foad, 2019-12-23, 5 files, -34/+11)
    Summary: The only useful information the UndefValue conveys is the address space, which MachinePointerInfo can represent directly without referring to an IR value.
    Reviewers: arsenm, rampitec
    Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D71838
* [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support. (Sanjay Patel, 2019-12-23, 2 files, -97/+0)
    This moves the X86-specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine.
    Earlier patches that dealt with regressions initially exposed by this patch: rG5e5e99c041e4, rG0b38af89e2c0
    Patch by: @RKSimon (Simon Pilgrim)
    Differential Revision: https://reviews.llvm.org/D63815
* [AArch64] [Windows] Use COFF stubs for calls to extern_weak functions. (Martin Storsjö, 2019-12-23, 3 files, -7/+15)
    As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range).
    Improve the classifyGlobalFunctionReference method to set MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in AArch64TargetLowering::LowerCall to use the return value from classifyGlobalFunctionReference for these cases.
    Add code in both AArch64FastISel and GlobalISel/IRTranslator to bail out for function calls to extern weak functions on windows, to let SelectionDAG handle them.
    This matches what was done for X86 in 6bf108d77a3c.
    Differential Revision: https://reviews.llvm.org/D71721
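    A small C illustration of the source pattern involved, assuming a typical guarded use of a weak external function (identifiers are hypothetical):

        /* The symbol may be absent at link time, in which case its address is
           null, so the backend cannot assume a direct PC-relative branch. */
        __attribute__((weak)) extern void maybe_present(void);

        void call_if_present(void) {
            if (&maybe_present)   /* null when the weak symbol is unresolved */
                maybe_present();
        }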
* [ARM] [Windows] Use COFF stubs for calls to extern_weak functions. (Martin Storsjö, 2019-12-23, 1 file, -4/+6)
    As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range). Instead check the shouldAssumeDSOLocal method and load the address from a COFF stub.
    This matches what was done for X86 in 6bf108d77a3c.
    Differential Revision: https://reviews.llvm.org/D71720
* [NFC] Style cleanups. (Shengchen Kan, 2019-12-23, 1 file, -22/+23)
    1. Remove the duplicated function/class name at the beginning of the comment.
    2. Use auto where the type is already obvious from the context.
* [Power9] Remove PPCISD::XXREVERSE as it has exactly the same semantics as ISD::BSWAP. (QingShan Zhang, 2019-12-23, 4 files, -23/+5)
    The custom node PPCISD::XXREVERSE has exactly the same semantics as the generic node ISD::BSWAP. We need to clean it up because we have combine rules for bswap in the base class, while there are none for xxreverse.
    Differential Revision: https://reviews.llvm.org/D70657
* [AVR] Fix codegen for rotate instructions. (Jim Lin, 2019-12-23, 3 files, -4/+104)
    Summary: This patch introduces the ROLBRd and RORBRd pseudo-instructions, which implement the "traditional" rotate operations, instead of the AVR rotate instructions that use the carry bit. The code is not optimized at all. Especially when dealing with loops of rotate instructions, this codegen should be improved some day.
    Related bug: 41358 <https://bugs.llvm.org/show_bug.cgi?id=41358>
    //Note//: This is my first submitted patch.
    Reviewers: dylanmckay, Jim
    Reviewed By: dylanmckay
    Subscribers: hiraditya, llvm-commits, dylanmckay, dsprenkels
    Tags: #llvm
    Patched by dsprenkels (Daan Sprenkels)
    Differential Revision: https://reviews.llvm.org/D60365
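    A C sketch of the "traditional" 8-bit rotate the new pseudo-instructions implement, i.e. a plain rotate across all 8 bits rather than AVR's native rotate-through-carry (the helper name is illustrative):

        #include <stdint.h>

        uint8_t rotl8(uint8_t x, unsigned n) {
            n &= 7;                                    /* rotate amount modulo 8 */
            return (uint8_t)((x << n) | (x >> ((8 - n) & 7)));
        }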
* [PowerPC] Exploit `vrl(b|h|w|d)` to perform vector rotation. (Kai Luo, 2019-12-23, 2 files, -1/+21)
    Summary: Currently, we set the legalization action of `ISD::ROTL` on vectors to `Expand` in `PPCISelLowering`. However, we can exploit `vrl(b|h|w|d)` to lower `ISD::ROTL` directly.
    Differential Revision: https://reviews.llvm.org/D71324
* [AMDGPU] Fixes -Wrange-loop-analysis warnings. (Mark de Wever, 2019-12-22, 2 files, -4/+4)
    This avoids new warnings since D68912 adds -Wrange-loop-analysis to -Wall.
    Differential Revision: https://reviews.llvm.org/D71815
* [Hexagon] Fixes -Wrange-loop-analysis warnings. (Mark de Wever, 2019-12-22, 5 files, -10/+10)
    This avoids new warnings since D68912 adds -Wrange-loop-analysis to -Wall.
    Differential Revision: https://reviews.llvm.org/D71814
* [NVPTX] Fixes -Wrange-loop-analysis warnings. (Mark de Wever, 2019-12-22, 1 file, -1/+1)
    This avoids new warnings since D68912 adds -Wrange-loop-analysis to -Wall. Also removed the top-level const as requested by Aaron Ballman in similar patches.
    Differential Revision: https://reviews.llvm.org/D71812
* [PowerPC] Fixes -Wrange-loop-analysis warnings. (Mark de Wever, 2019-12-22, 1 file, -3/+3)
    This avoids new warnings since D68912 adds -Wrange-loop-analysis to -Wall.
    Differential Revision: https://reviews.llvm.org/D71811
* [ms] [X86] Use "P" modifier on operands to call instructions in inline X86 assembly. (Eric Astor, 2019-12-22, 4 files, -13/+41)
    Summary: This is documented as the appropriate template modifier for call operands. Fixes PR44272, and adds a regression test.
    Also adds support for operand modifiers in Intel-style inline assembly.
    Reviewers: rnk
    Reviewed By: rnk
    Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits
    Tags: #clang, #llvm
    Differential Revision: https://reviews.llvm.org/D71677
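    A GCC-style C analogue of the modifier in question (the patch itself concerns the operands clang generates internally for MS-style inline assembly; this user-level snippet only illustrates what the "P" modifier does for an x86 call operand):

        extern void callee(void);

        void caller(void) {
            /* Without the P modifier an "i" operand would be printed as an
               immediate ($callee in AT&T syntax); %P0 prints the bare symbol,
               which is what a call instruction needs. */
            __asm__ __volatile__("call %P0" : : "i"(callee));
        }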
* [AArch64] Match splat of bitcasted extract subvector to DUPLANE. (Sanjay Patel, 2019-12-22, 1 file, -7/+43)
    This is another potential regression exposed by D63815. Here we peek through a bitcast to find an extract subvector and scale the splat offset based on that:
        splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'
    Differential Revision: https://reviews.llvm.org/D71672
* Fix "result of 32-bit shift implicitly converted to 64 bits" warning. NFC.Simon Pilgrim2019-12-211-1/+1
|
* [AArch64] Respect reserved registers while renaming in LdSt opt. (Florian Hahn, 2019-12-21, 1 file, -1/+4)
    We cannot pick reserved registers as rename registers.
    Fixes https://bugs.llvm.org/show_bug.cgi?id=44358
* AMDGPU/GlobalISel: Fix misuse of div_scale intrinsics. (Matt Arsenault, 2019-12-21, 1 file, -5/+5)
    Confusingly, the intrinsic operands do not match the instruction/custom node. The order is shuffled, and the 3rd operand is an immediate to select operands.
    I'm not 100% sure I did this right, but fdiv still doesn't select end to end and it will be easier to tell when it does. This at least avoids an assertion in RegBankSelect and allows hitting the fallback on selection.
* AMDGPU/GlobalISel: Fix missing scc imp-def on scalar and/or/xor. (Matt Arsenault, 2019-12-21, 1 file, -0/+5)
* AMDGPU/GlobalISel: Simplify code. (Matt Arsenault, 2019-12-21, 1 file, -5/+5)
    This can directly access the register bank, and doesn't need to get it through the ID.
* [WebAssembly] Use TargetIndex operands in DbgValue to track WebAssembly operand locations. (Yury Delendik, 2019-12-20, 6 files, -0/+33)
    Extends the DWARF expression language to express locals/globals locations (via target-index operands at the moment; possible variants are non-virtual registers or address spaces). WebAssemblyExplicitLocals can replace virtual registers with target-index operands at the point where the WebAssembly backend introduces {get,set,tee}_local instead of the corresponding virtual registers.
    Reviewed By: aprantl, dschuff
    Tags: #debug-info, #llvm
    Differential Revision: https://reviews.llvm.org/D52634
* Add parentheses to silence warning. (Bill Wendling, 2019-12-20, 1 file, -2/+2)
* More style cleanups following rG14fc20ca6282 [NFC]. (Philip Reames, 2019-12-20, 1 file, -34/+28)
    Demote member functions to static functions where possible.
    Use early continue/early return to reduce nesting.
    Clarify comments slightly.
    Reuse a previously defined expression in one case.
* Fix a memory leak introduced w/ the instruction padding support in rG14fc20ca6282. (Philip Reames, 2019-12-20, 1 file, -6/+6)
    Should have caught this in review, but only noticed when addressing post-commit style items. We were creating a new instance of the X86MCInstrInfo class, and then never reclaiming the memory. This wasn't even conditional on the new off-by-default flags, so it was an unconditional leak.