path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* Remove a redundant `default:` on an exhaustive switch(enum). (Eric Astor, 2019-12-30, 1 file, -2/+0)
* [X86][AsmParser] re-introduce 'offset' operator (Eric Astor, 2019-12-30, 3 files, -88/+158)
  Summary: Amend the MS 'offset' operator implementation to fit more closely with its MS counterpart:
  1. InlineAsm: evaluate non-local source entities to their (address) location.
  2. Provide a means by which one may acquire the address of an assembly label via MS syntax,
     rather than yielding a memory reference (i.e. "offset asm_label" and "$asm_label" should be synonymous).
  3. Address PR32530.
  Based on http://llvm.org/D37461.
  Also fix a broken test where the breakage appears unrelated.
  - Set up appropriate memory-input rewrites for variable references.
  - Intel-dialect assembly printing now correctly handles addresses by adding "offset".
  - Pass offsets as immediate operands (using the "r" constraint for offsets of locals).
  Reviewed By: rnk
  Differential Revision: https://reviews.llvm.org/D71436
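A rough illustration (not from the commit) of the MS-syntax behavior described above; the global `table` and the function around it are hypothetical:

```cpp
// 32-bit MS inline assembly (MSVC / clang-cl syntax, hypothetical example).
int table[8];

int *address_of_table() {
  int *p;
  __asm { mov eax, offset table }  // eax = &table: an address, not a load
  __asm { mov p, eax }             // store the address into p
  return p;
}
```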
* AMDGPU/GlobalISel: Select mul24 intrinsics (Matt Arsenault, 2019-12-30, 2 files, -4/+12)
* [X86] Add X86ISD::PCMPGT to SimplifyMultipleUseDemandedBits. (Craig Topper, 2019-12-30, 1 file, -0/+7)
  If only the sign bit is demanded and the LHS is all zeroes, then we can bypass the PCMPGT:
  (pcmpgt 0, x) computes 0 > x, which is true exactly when x is negative, so the sign bit of
  the result equals the sign bit of x and x can be used directly.
* AMDGPU/GlobalISel: Re-use MRI available in selector (Matt Arsenault, 2019-12-30, 1 file, -9/+7)
* [MIPS GlobalISel] Select bitreverse. Recommit (Petar Avramovic, 2019-12-30, 1 file, -0/+4)
  G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics; clang generates these
  intrinsics from __builtin_bitreverse32 and __builtin_bitreverse64.
  Add lower and narrowScalar for G_BITREVERSE. Lower G_BITREVERSE on MIPS32.
  Recommit notes: introduce temporary variables in order to make sure instructions get
  inserted into the MachineFunction in the same order regardless of the compiler used to build llvm.
  Differential Revision: https://reviews.llvm.org/D71363
* AMDGPU/GlobalISel: Select llvm.amdgcn.fmad.ftz (Matt Arsenault, 2019-12-30, 2 files, -4/+9)
* [ARM][Thumb][FIX] Add unwinding information to t4 (Diogo Sampaio, 2019-12-30, 1 file, -0/+2)
  Summary: Add the missing part of patch D71361. Now that the stack frame can be manipulated
  with addw/subw instructions, those instructions should appear in the unwinding list.
  Reviewers: dmgreen, efriedma
  Reviewed By: dmgreen
  Subscribers: kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D72000
* GlobalISel: moreElementsVector for FP min/max (Matt Arsenault, 2019-12-30, 1 file, -0/+1)
* AMDGPU: Improve llvm.round.f64 lowering for CI+ (Matt Arsenault, 2019-12-30, 2 files, -4/+5)
  The path already used for f16/f32 works a lot better when v_trunc_f64 is available.
* AMDGPU/GlobalISel: Account for G_PHI result bank (Matt Arsenault, 2019-12-30, 1 file, -13/+23)
  Sometimes the result bank of the phi is already assigned to something and should not be
  ignored. This is in preparation for additional boolean phi handling changes.
  Also refine the logic to fix some cases that were incorrectly deciding to use SGPRs.
* [PowerPC] Legalize rounding nodes (Nemanja Ivanovic, 2019-12-30, 2 files, -0/+52)
  VSX provides a full complement of rounding instructions, yet we somehow ended up with some
  of them legal and others not. This just legalizes all of the FP rounding nodes, plus the
  FP -> int rounding nodes under unsafe math.
  Differential revision: https://reviews.llvm.org/D69949
* Revert "[MIPS GlobalISel] Select bitreverse"Dmitri Gribenko2019-12-301-4/+0
| | | | | | This reverts commit dbc136e0fe7e14c64dcb78e72321bb41af60afa4. It broke buildbots: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/21066
* [ARM] Sink splat to ICmp (David Green, 2019-12-30, 2 files, -2/+3)
  This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the
  register forms of instructions to be selected more often. It does not add FCmp yet, as the
  results look a little odd: the compiler tries to keep the value in a float register and
  then has to move it back to a GPR.
  Differential Revision: https://reviews.llvm.org/D70997
* [ARM][THUMB2] Allow emitting T3 types of add and sub (Diogo Sampaio, 2019-12-30, 1 file, -42/+33)
  Summary: This patch allows emitting thumb2 add and sub instructions with 12-bit immediates
  in the emitT2RegPlusImmediate function.
  - Split out from D70680.
  Reviewers: eli.friedman, olista01, efriedma
  Reviewed By: efriedma
  Subscribers: efriedma, kristof.beyls, hiraditya, dmgreen, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71361
* [MIPS GlobalISel] Select bitreverse (Petar Avramovic, 2019-12-30, 1 file, -0/+4)
  G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics; clang generates these
  intrinsics from __builtin_bitreverse32 and __builtin_bitreverse64.
  Add lower and narrowScalar for G_BITREVERSE. Lower G_BITREVERSE on MIPS32.
  Differential Revision: https://reviews.llvm.org/D71363
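For context, a minimal sketch (function names made up) of the source-level builtins mentioned above:

```cpp
#include <cstdint>

// Clang lowers these builtins to llvm.bitreverse.i32/i64 intrinsics,
// which GlobalISel imports as G_BITREVERSE and, with this patch,
// lowers on MIPS32.
uint32_t reverse_bits32(uint32_t x) { return __builtin_bitreverse32(x); }
uint64_t reverse_bits64(uint64_t x) { return __builtin_bitreverse64(x); }
```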
* [MIPS GlobalISel] Select bswap (Petar Avramovic, 2019-12-30, 2 files, -0/+14)
  G_BSWAP is generated from llvm.bswap.<type> intrinsics; clang generates these intrinsics
  from __builtin_bswap32 and __builtin_bswap64.
  Add lower and narrowScalar for G_BSWAP. Lower G_BSWAP on MIPS32; select G_BSWAP on MIPS32
  revision 2 and later.
  Differential Revision: https://reviews.llvm.org/D71362
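Likewise for bswap, a minimal sketch (function names made up):

```cpp
#include <cstdint>

// Clang lowers these builtins to llvm.bswap.i32/i64 intrinsics, which
// GlobalISel imports as G_BSWAP; MIPS32 revision 2 and later select it
// directly, while earlier MIPS32 lowers it.
uint32_t byte_swap32(uint32_t x) { return __builtin_bswap32(x); }
uint64_t byte_swap64(uint64_t x) { return __builtin_bswap64(x); }
```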
* [PowerPC] Exploit the rlwinm instructions for "and" with constant (QingShan Zhang, 2019-12-30, 2 files, -0/+44)
  Currently, PowerPC uses several instructions to materialize the constant and "and" it in
  cases like the following:
  ```
  define i32 @test1(i32 %a) {
    %and = and i32 %a, -2
    ret i32 %and
  }
  ```
  However, we can exploit the rotate-and-mask instructions instead:
  ```
                 MB   ME
   +----------------------+
   |xxxxxxxxxxx00011111000|
   +----------------------+
   0          32          64
  ```
  Note that we can only do this if MB is larger than 32 and MB <= ME, since RLWINM replaces
  the contents of [0, 32) with [32, 64) even when no rotation is performed.
  Differential Revision: https://reviews.llvm.org/D71829
* [X86] Use APInt::isOneValue and ConstantSDNode::isOne. NFC (Craig Topper, 2019-12-29, 1 file, -4/+4)
  These are implemented slightly more efficiently than comparing to 1 in the case that the
  value is more than 64 bits.
* [X86] Use isOneConstant to simplify some code. NFC (Craig Topper, 2019-12-29, 1 file, -2/+1)
* [X86] Remove dyn_casts to ConstantSDNode for operand 1 of X86ISD::VSHLI/VSRAI/VSRLI. Use getConstantOperandVal and APInt operations. (Craig Topper, 2019-12-29, 1 file, -108/+99)
  These nodes should only ever be formed with an i8 TargetConstant, so we don't need to check
  that it is a constant. It's also always 8 bits, so we don't need to use APInt compare functions.
* [SelectionDAG] Disallow indirect "i" constraint (Fangrui Song, 2019-12-29, 11 files, -22/+1)
  This allows us to delete the InlineAsm::Constraint_i workarounds in
  SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and
  TargetLowering::getInlineAsmMemConstraint overrides.
  They were introduced to X86 in r237517 to prevent crashes for constraints like "=*imr",
  and were later copied to other targets.
* [X86] Stop accidentally custom type legalizing v4i32->v4f32 on SSE1-only targets. (Craig Topper, 2019-12-28, 1 file, -2/+3)
  We had a Custom operation action for v4i32 on SSE1, but since v4i32 isn't legal until SSE2,
  this was not what was intended. The code that got executed was intended for op legalization
  and creates a bunch of v4i32 nodes that all end up scalarized.
* [X86] Remove a redundant (scalar_to_vector (extract_vector_elt X)) in LowerUINT_TO_FP_i32. NFCI (Craig Topper, 2019-12-28, 1 file, -6/+1)
* [X86] Fix -enable-machine-outliner for x86-32 after D48683 (Fangrui Song, 2019-12-28, 1 file, -3/+1)
  D48683 accidentally disabled -enable-machine-outliner for x86-32.
* Fix bots after a9ad65a2b34f (Nemanja Ivanovic, 2019-12-28, 1 file, -0/+1)
  In the last commit, I neglected to initialize the new subtarget feature I added, which
  caused failures on a few bots. This should fix that.
* [PowerPC] Change default for unaligned FP access for older subtargets (Nemanja Ivanovic, 2019-12-28, 3 files, -1/+10)
  This is a fix for https://bugs.llvm.org/show_bug.cgi?id=40554.
  Some CPUs trap to the kernel on unaligned floating point access, and there are kernels that
  do not handle the interrupt; the program then fails with a SIGBUS according to the PR.
  This just switches the default for unaligned access to only allow it on recent server CPUs
  that are known to allow it.
  Differential revision: https://reviews.llvm.org/D71954
* [PowerPC] Modify the hasSideEffects of some VSX instructions from 1 to 0 (Kang Zhang, 2019-12-28, 1 file, -1/+3)
  Summary: If we don't set the hasSideEffects bit in our .td file, llvm-tblgen will set it to
  true for any instruction that has no match pattern. The 6 instructions below don't set the
  flag and don't have a match pattern, so llvm-tblgen marks them as having side effects. In
  fact they don't modify any special register and have no other side effects, so they
  shouldn't be marked that way. This patch changes their hasSideEffects from 1 to 0.
  ```
  VEXTUHLX
  VEXTUHRX
  VEXTUWLX
  VEXTUWRX
  VSPLTBs
  VSPLTHs
  ```
  Reviewed By: jsji
  Differential Revision: https://reviews.llvm.org/D71391
* AMDGPU/GlobalISel: Use SReg_32 for readfirstlane constraining (Matt Arsenault, 2019-12-27, 1 file, -1/+1)
  This matches the DAG behavior, where we don't use SReg_32_XM0 everywhere anymore, and fixes
  not coalescing the copies into m0.
* Hexagon: Fix missing tablegen mode comment (Matt Arsenault, 2019-12-27, 1 file, -1/+1)
* TII: Fix using Register for a subregister index argument (Matt Arsenault, 2019-12-27, 2 files, -2/+2)
* AMDGPU: Use Register (Matt Arsenault, 2019-12-27, 1 file, -9/+9)
* AMDGPU/GlobalISel: Fix extra result register in fdiv64 lowering (Matt Arsenault, 2019-12-27, 1 file, -2/+1)
  There ended up being two result registers, which would fail on select. The code was really
  defining a new temp register in the correct def position, instead of using the correct
  result register.
* AMDGPU/GlobalISel: Select some 128-bit load/stores (Matt Arsenault, 2019-12-27, 1 file, -4/+10)
* AMDGPU: Use correct DebugLoc (Matt Arsenault, 2019-12-27, 1 file, -1/+1)
* [X86] Allow v2i32->v2f32 strict and non-strict uint_to_fp to be widened to v4i32->v4f32 under avx512. (Craig Topper, 2019-12-27, 1 file, -1/+1)
  With avx512vl we get v4i32->v4f32 uint_to_fp instructions. With avx512f we get
  v16i32->v16f32 instructions, which we can use to emulate v4i32->v4f32.
* [X86] Custom widen v2i32->v2f32 strict_sint_to_fp to avoid scalarization. (Craig Topper, 2019-12-27, 1 file, -3/+19)
* Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821 (Fangrui Song, 2019-12-27, 3 files, -35/+0)
  Intrinsic has incorrect argument type!
  i32 (i32*)* @llvm.setjmp
  *wipes tear*
* [X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without avx512vl. Similar for vXi64 on avx512dq without avx512vl. (Craig Topper, 2019-12-26, 3 files, -64/+92)
  Summary: Previously we did this with isel patterns that used garbage in the widened part of
  the source. But that's not valid for strictfp, so now we custom widen and use zeroes for
  the widened elements for strictfp.
  This replaces D71864.
  Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3
  Reviewed By: pengfei
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71879
* [X86] Custom widen strict v2f32->v2i32 by padding with zeroes. (Craig Topper, 2019-12-26, 1 file, -0/+12)
  For non-strict nodes, generic type legalization will take care of this, but that doesn't
  currently happen for strict nodes.
* [X86] Fix -Wmisleading-indentation after D71892 (Fangrui Song, 2019-12-26, 1 file, -0/+1)
* [X86][FPEnv] Promote some float strictfp operations to double on i686-pc-windows-msvc to match what we do for non-strict. (Craig Topper, 2019-12-26, 1 file, -2/+9)
  The float libcalls are inlined in MSVC's math header, where they just cast to double and
  use the double libcall. Do the same when we emit libcalls.
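A minimal sketch of the header pattern being matched; this assumes, rather than quotes, what MSVC's math header does:

```cpp
#include <cmath>

// The float variant just casts through double and uses the double
// libcall, so emitting the double libcall directly is equivalent.
inline float sinf_like(float x) {
  return static_cast<float>(std::sin(static_cast<double>(x)));
}
```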
* [X86] Add custom legalization for strict_uint_to_fp v2i32->v2f32. (Craig Topper, 2019-12-26, 1 file, -7/+16)
  I believe the algorithm we use for non-strict is exception safe for strict. The fsub won't
  generate any exceptions. After it we will have an exact version of the i32 integer in a
  double. Then we just round it to f32. That rounding will generate a precision exception if
  it can't be represented exactly.
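A hedged C++ sketch of the scalar idea behind that algorithm (the constant below is the bit pattern of 2^52; the function name is made up):

```cpp
#include <cstdint>
#include <cstring>

// Build the double 2^52 + x by bit manipulation, subtract 2^52 (this
// fsub is exact, so it raises no FP exception), leaving x represented
// exactly in a double; the final double->float conversion performs the
// single rounding step, raising inexact only when x doesn't fit in f32.
float u32_to_f32_via_f64(uint32_t x) {
  uint64_t bits = 0x4330000000000000ULL | x;  // exponent of 2^52, mantissa = x
  double d;
  std::memcpy(&d, &bits, sizeof d);           // bit-cast to double
  d -= 0x1.0p52;                              // exact: d == (double)x
  return static_cast<float>(d);               // one rounding to f32
}
```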
* add custom operation for strict fpextend/fpround (Liu, Chen3, 2019-12-27, 5 files, -20/+57)
  Differential Revision: https://reviews.llvm.org/D71892
* Remove SrcVT only used in an assert and propagate query. (Eric Christopher, 2019-12-26, 1 file, -2/+2)
* [X86] Custom widen 128/256-bit vXi32 uint_to_fp on avx512f targets without avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl. (Craig Topper, 2019-12-26, 2 files, -62/+106)
  Previously we widened these through isel patterns, but that didn't work for STRICT_ nodes.
  Those need to be padded with zeroes in the upper bits, which is harder to do in isel patterns.
* [X86] Add custom widening for v2i32->v2f64 strict_uint_to_fp with AVX512F, but not AVX512VL. (Craig Topper, 2019-12-26, 2 files, -10/+23)
  Previously we were widening with isel patterns, but that wasn't exception safe for strict
  FP. So now we widen to v4i32->v4f64 during type legalization, and then let op legalization
  further widen to v8i32->v8f64.
  The vec_int_to_fp.ll changes are caused by us no longer narrowing extracts of
  strict_uint_to_fp to the v4i32->v2f64 instruction without AVX512VL, only to have isel
  rewiden it. Now we just keep it wide throughout, so we don't have an opportunity to narrow
  the load.
* [X86] Add custom widening for v2f64->v2i32 strict_fp_to_uint with avx512f, but not avx512vl. (Craig Topper, 2019-12-26, 1 file, -6/+15)
  AVX512F added instructions for vector fp_to_uint conversions. With AVX512VL we can use a
  specific instruction that does v2f64->v4i32 with zeroes in the 2 extra elements. For
  non-strict nodes without AVX512VL we relied on type legalization to turn it into
  v4f64->v4i32, which would later be widened by op legalization to v8f64->v8i32. But type
  legalization doesn't currently widen strict nodes, since it doesn't know how to safely and
  efficiently pad the extra elements. For X86, however, we know padding with zeroes is safe
  and efficient, so do that ourselves.
* [BPF] Enable relocation location for load/store/shifts (Yonghong Song, 2019-12-26, 4 files, -50/+223)
  Previously, a BTF field relocation was always at an assignment like
    r1 = 4
  which is converted from an ld_imm64 instruction. This patch adds an optimization such that
  the relocation instruction may also be a load/store/shift. Specifically, besides BPF_MOV,
  the following insns may now carry a relocation:
    LDB, LDH, LDW, LDD, STB, STH, STW, STD,
    LDB32, LDH32, LDW32, STB32, STH32, STW32,
    SLL, SRL, SRA
  To accomplish this, a few BPF target-specific, codegen-only instructions are invented. They
  are generated at the backend BPF SimplifyPatchable phase, an early llc phase where SSA form
  is available. The new codegen-only instructions are converted to real, proper instructions
  at the codegen and BTF emission stage.
  Note that, as revealed by a few tests, this optimization might actually generate more
  relocations.
  Scenario 1:
    if (...) {
      ... __builtin_preserve_field_info(arg->b2, 0) ...
    } else {
      ... __builtin_preserve_field_info(arg->b2, 0) ...
    }
  The compiler could do CSE to keep only one relocation, but if both of the above are
  translated into codegen internal instructions, the compiler will not be able to do that.
  Scenario 2:
    offset = ... __builtin_preserve_field_info(arg->b2, 0) ...
    ...
    ... offset ...
    ... offset ...
    ... offset ...
  For whatever reason, the compiler might temporarily do copy propagation of the right-hand
  side of the "offset" assignment, like
    ... __builtin_preserve_field_info(arg->b2, 0) ...
    ... __builtin_preserve_field_info(arg->b2, 0) ...
  and CSE would be able to deduplicate them later. But if these intrinsics are converted to
  BPF pseudo instructions, they will not get deduplicated.
  I do not expect a big instruction count difference. It may actually reduce the instruction
  count, since relocations are now deeper in the instruction dependency chain. For example,
  for the test offset-reloc-fieldinfo-2.ll, this patch generates 7 instead of 6 relocations
  for non-alu32 mode, but it actually reduces the instruction count from 29 to 26.
  Differential Revision: https://reviews.llvm.org/D71790
* [X86] Merge the SINT_TO_FP/UINT_TO_FP handlers in ReplaceNodeResults since the AVX512DQ+AVX512VL code is very similar in both. NFC (Craig Topper, 2019-12-26, 1 file, -23/+11)