path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [X86] Use carry flag from add for (seteq (add X, -1), -1). | Craig Topper | 2019-12-31 | 1 | -10/+31
  If we just subtracted 1 and are checking whether the result is -1, we can use the carry flag from the ADD instead of an explicit CMP. I'm using the same checks for the add users as EmitTest.
  Fixes one case from PR44412.
  Differential Revision: https://reviews.llvm.org/D72019
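The equivalence behind this change can be checked in isolation. The following is a minimal sketch, not code from the patch: the helper names are hypothetical, and the 32-bit carry-out is modelled with a widened 64-bit addition. It illustrates that (X + -1) == -1 holds exactly when the ADD produces no carry, which happens only for X == 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: carry-out of the 32-bit addition X + 0xFFFFFFFF (X + -1),
// modelled by doing the addition in 64 bits and inspecting bit 32.
constexpr bool carryOutOfAddMinusOne(uint32_t X) {
  return ((static_cast<uint64_t>(X) + 0xFFFFFFFFull) >> 32) != 0;
}

// The comparison named in the commit subject: (seteq (add X, -1), -1).
constexpr bool addMinusOneIsMinusOne(uint32_t X) {
  return static_cast<uint32_t>(X + 0xFFFFFFFFu) == 0xFFFFFFFFu;
}

// The comparison is true exactly when the carry flag is clear.
static_assert(addMinusOneIsMinusOne(0u) == !carryOutOfAddMinusOne(0u), "X == 0");
static_assert(addMinusOneIsMinusOne(1u) == !carryOutOfAddMinusOne(1u), "X == 1");
static_assert(addMinusOneIsMinusOne(0xFFFFFFFFu) == !carryOutOfAddMinusOne(0xFFFFFFFFu),
              "X == UINT32_MAX");
```

Only X == 0 leaves the carry clear, so a branch or setcc on the carry flag of the existing ADD can stand in for the separate CMP.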
* [LegalizeVectorOps][AArch64] Stop asking for v4f16 fp_round and fp_extend to be promoted. | Craig Topper | 2019-12-31 | 1 | -4/+0
  These operations are needed as building blocks for promoting, so they can't be promoted themselves.
  This appeared to work because the fp_extend query type for operation actions is the result type, not the input type, so it never triggered in the legalizer. For fp_round, the vector op legalizer just ended up creating a nop fp_extend that was elided by getNode, followed by a nop fp_round that was also elided by getNode. This was followed by a final fp_round from v4f32 back to v4f16, which was CSEd to the original node. Then legalize vector ops just believed that node legalized to itself. LegalizeDAG took another crack at promoting it, but didn't have a handler, so it just skipped it with a debug message saying it wasn't promoted.
  This patch just removes the operation actions to avoid this nonsense.
  Found while trying to refactor LegalizeVectorOps to handle multiple result nodes better.
* [amdgpu] Fix scoreboard updating on `s_waitcnt_vscnt`. | Michael Liao | 2019-12-31 | 1 | -1/+1
  Summary:
  - Other counters are accidentally cleared.
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71866
* [PowerPC][NFC] Fix clang-tidy warning | Jinsong Ji | 2019-12-31 | 1 | -5/+5
  Reported by https://results.llvm-merge-guard.org/amd64_debian_testing_clang8-726/clang-tidy.txt

  /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:11672:10: warning: invalid case style for variable 'isEQ' [readability-identifier-naming]
      bool isEQ = (MI.getOpcode() == PPC::ANDI_rec_1_EQ_BIT ||
           ^~~~
           IsEq
  /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:11679:14: warning: invalid case style for variable 'dl' [readability-identifier-naming]
      DebugLoc dl = MI.getDebugLoc();
               ^~
               Dl
* [X86] Slightly improve our attempted error recovery for 64-bit -mno-sse2 in LowerCallResult to use FP1 if there are two return values. | Craig Topper | 2019-12-31 | 1 | -2/+8
  If the return value is a struct of 2 doubles we need two return registers. If SSE2 is disabled we can't return in XMM registers like the ABI says. After logging an error we attempt to recover by using FP0 instead of an XMM register. But if the return needs two registers, we may have already used FP0. So if the register we were supposed to copy to is XMM1, copy to FP1 in the recovery instead.
  This seems to fix the assertion/crash in PR44413.
* [NFC] Style cleanup | Shengchen Kan | 2019-12-31 | 1 | -28/+29
  1. Make function Is16BitMemOperand static
  2. Use Doxygen features in comment
  3. Rename functions to make them start with a lower case letter
* [NFC] Make X86MCCodeEmitter::isPCRel32Branch static | Shengchen Kan | 2019-12-31 | 1 | -4/+2
* [NFC] Style cleanup | Shengchen Kan | 2019-12-31 | 1 | -389/+479
  1. Remove function is64BitMode() and use STI.hasFeature(X86::Mode16Bit) directly
  2. Use Doxygen features in comment
  3. Rename functions to make them start with a lower case letter
  4. Format the code with clang-format
* [TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues instead of creating a MERGE_VALUES node. NFCI | Craig Topper | 2019-12-30 | 3 | -9/+19
  This allows us to clean up some places that were peeking through the MERGE_VALUES node after the call. By returning the SDValues directly, we can clean that up.
  Unfortunately, there are several call sites in AMDGPU that wanted the MERGE_VALUES and now need to create their own.
* Remove a redundant `default:` on an exhaustive switch(enum). | Eric Astor | 2019-12-30 | 1 | -2/+0
* [X86][AsmParser] re-introduce 'offset' operator | Eric Astor | 2019-12-30 | 3 | -88/+158
  Summary:
  Amend MS offset operator implementation, to more closely fit with its MS counterpart:
  1. InlineAsm: evaluate non-local source entities to their (address) location
  2. Provide a means with which one may acquire the address of an assembly label via MS syntax, rather than yielding a memory reference (i.e. "offset asm_label" and "$asm_label" should be synonymous)
  3. Address PR32530
  Based on http://llvm.org/D37461
  Fix broken test where the break appears unrelated.
  - Set up appropriate memory-input rewrites for variable references.
  - Intel-dialect assembly printing now correctly handles addresses by adding "offset".
  - Pass offsets as immediate operands (using "r" constraint for offsets of locals).
  Reviewed By: rnk
  Differential Revision: https://reviews.llvm.org/D71436
* AMDGPU/GlobalISel: Select mul24 intrinsics | Matt Arsenault | 2019-12-30 | 2 | -4/+12
* [X86] Add X86ISD::PCMPGT to SimplifyMultipleUseDemandedBitsForTargetNode. | Craig Topper | 2019-12-30 | 1 | -0/+7
  If only the sign bit is demanded, and the LHS is all zeroes, then we can bypass the PCMPGT.
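Why the bypass is sound can be seen on a single lane. This is a minimal scalar sketch under stated assumptions, not the SelectionDAG code: `pcmpgtLane` is a hypothetical model of one 32-bit lane of a signed greater-than compare that yields all ones or all zeroes. With a zero LHS the result is all ones exactly when the RHS is negative, so the sign bit of the result equals the sign bit of the RHS.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 32-bit PCMPGT lane:
// all ones if LHS > RHS (signed), otherwise all zeroes.
constexpr int32_t pcmpgtLane(int32_t LHS, int32_t RHS) {
  return LHS > RHS ? -1 : 0;
}

constexpr bool signBit(int32_t V) { return V < 0; }

// With LHS == 0, the sign bit of the compare result matches the sign bit of X,
// so a user that demands only the sign bit can read X directly.
constexpr bool bypassIsSafe(int32_t X) {
  return signBit(pcmpgtLane(0, X)) == signBit(X);
}

static_assert(bypassIsSafe(-5), "negative input");
static_assert(bypassIsSafe(0), "zero input");
static_assert(bypassIsSafe(7), "positive input");
static_assert(bypassIsSafe(INT32_MIN), "most negative input");
```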
* AMDGPU/GlobalISel: Re-use MRI available in selector | Matt Arsenault | 2019-12-30 | 1 | -9/+7
* [MIPS GlobalISel] Select bitreverse. Recommit | Petar Avramovic | 2019-12-30 | 1 | -0/+4
  G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics; clang generates these intrinsics from __builtin_bitreverse32 and __builtin_bitreverse64.
  Add lower and narrowscalar for G_BITREVERSE.
  Lower G_BITREVERSE on MIPS32.
  Recommit notes:
  Introduce temporary variables in order to make sure instructions get inserted into MachineFunction in the same order regardless of the compiler used to build llvm.
  Differential Revision: https://reviews.llvm.org/D71363
* AMDGPU/GlobalISel: Select llvm.amdgcn.fmad.ftz | Matt Arsenault | 2019-12-30 | 2 | -4/+9
* [ARM][Thumb][FIX] Add unwinding information to t4 | Diogo Sampaio | 2019-12-30 | 1 | -0/+2
  Summary:
  Add missing part of patch D71361. Now that the stack frame can be operated on using an addw/subw instruction, these instructions should appear in the unwinding list.
  Reviewers: dmgreen, efriedma
  Reviewed By: dmgreen
  Subscribers: kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D72000
* GlobalISel: moreElementsVector for FP min/max | Matt Arsenault | 2019-12-30 | 1 | -0/+1
* AMDGPU: Improve llvm.round.f64 lowering for CI+ | Matt Arsenault | 2019-12-30 | 2 | -4/+5
  The path already used for f16/f32 works a lot better when v_trunc_f64 is available.
* AMDGPU/GlobalISel: Account for G_PHI result bank | Matt Arsenault | 2019-12-30 | 1 | -13/+23
  Sometimes the result bank of the phi is already assigned to something, and should not be ignored. This is in preparation for additional boolean phi handling changes.
  Also refine the logic to fix some cases that were incorrectly deciding to use SGPRs.
* [PowerPC] Legalize rounding nodes | Nemanja Ivanovic | 2019-12-30 | 2 | -0/+52
  VSX provides a full complement of rounding instructions yet we somehow ended up with some of them legal and others not. This just legalizes all of the FP rounding nodes and the FP -> int rounding nodes with unsafe math.
  Differential revision: https://reviews.llvm.org/D69949
* Revert "[MIPS GlobalISel] Select bitreverse" | Dmitri Gribenko | 2019-12-30 | 1 | -4/+0
  This reverts commit dbc136e0fe7e14c64dcb78e72321bb41af60afa4.
  It broke buildbots: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/21066
* [ARM] Sink splat to ICmp | David Green | 2019-12-30 | 2 | -2/+3
  This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the register forms of instructions to be selected more often. It does not add FCmp yet, as the results look a little odd: the backend tries to keep the register in a float reg and has to move it back to a GPR.
  Differential Revision: https://reviews.llvm.org/D70997
* [ARM][THUMB2] Allow emitting T3 types of add and sub | Diogo Sampaio | 2019-12-30 | 1 | -42/+33
  Summary:
  This patch allows emitting thumb2 add and sub instructions with 12-bit immediates in the emitT2RegPlusImmediate function.
  - Splitting parts of the D70680
  Reviewers: eli.friedman, olista01, efriedma
  Reviewed By: efriedma
  Subscribers: efriedma, kristof.beyls, hiraditya, dmgreen, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71361
* [MIPS GlobalISel] Select bitreverse | Petar Avramovic | 2019-12-30 | 1 | -0/+4
  G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics; clang generates these intrinsics from __builtin_bitreverse32 and __builtin_bitreverse64.
  Add lower and narrowscalar for G_BITREVERSE.
  Lower G_BITREVERSE on MIPS32.
  Differential Revision: https://reviews.llvm.org/D71363
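To make "lower" and "narrowscalar" concrete, here is a minimal sketch of one common way to expand a bit reversal when no native instruction exists. It is an illustration only: the function names are hypothetical, and whether the legalizer emits exactly this shift-and-mask sequence is an assumption. The 32-bit version swaps progressively larger groups of bits; the 64-bit version is narrowed to two 32-bit reversals whose halves trade places.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "lower": reverse 32 bits by swapping adjacent bits, then pairs,
// nibbles, bytes, and finally the two 16-bit halves.
uint32_t bitreverse32(uint32_t V) {
  V = ((V >> 1) & 0x55555555u) | ((V & 0x55555555u) << 1);
  V = ((V >> 2) & 0x33333333u) | ((V & 0x33333333u) << 2);
  V = ((V >> 4) & 0x0F0F0F0Fu) | ((V & 0x0F0F0F0Fu) << 4);
  V = ((V >> 8) & 0x00FF00FFu) | ((V & 0x00FF00FFu) << 8);
  return (V >> 16) | (V << 16);
}

// Hypothetical "narrow scalar": a 64-bit reversal built from two 32-bit
// reversals; the reversed low half becomes the new high half and vice versa.
uint64_t bitreverse64(uint64_t V) {
  uint32_t Lo = static_cast<uint32_t>(V);
  uint32_t Hi = static_cast<uint32_t>(V >> 32);
  return (static_cast<uint64_t>(bitreverse32(Lo)) << 32) | bitreverse32(Hi);
}
```

For example, `bitreverse32(0x12345678u)` yields `0x1E6A2C48u`.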
* [MIPS GlobalISel] Select bswap | Petar Avramovic | 2019-12-30 | 2 | -0/+14
  G_BSWAP is generated from llvm.bswap.<type> intrinsics; clang generates these intrinsics from __builtin_bswap32 and __builtin_bswap64.
  Add lower and narrowscalar for G_BSWAP.
  Lower G_BSWAP on MIPS32, select G_BSWAP on MIPS32 revision 2 and later.
  Differential Revision: https://reviews.llvm.org/D71362
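As an illustration of what lowering and narrowing a byte swap amount to, the sketch below expands a 32-bit swap into shifts, masks, and ORs, and builds a 64-bit swap from two 32-bit swaps. The function names are hypothetical and the exact sequence the legalizer produces is an assumption; only the arithmetic identity is claimed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "lower": move each of the four bytes to its mirrored position.
constexpr uint32_t bswap32(uint32_t V) {
  return (V << 24) | ((V & 0x0000FF00u) << 8) |
         ((V >> 8) & 0x0000FF00u) | (V >> 24);
}

// Hypothetical "narrow scalar": swap each 32-bit half, then exchange the halves.
constexpr uint64_t bswap64(uint64_t V) {
  return (static_cast<uint64_t>(bswap32(static_cast<uint32_t>(V))) << 32) |
         bswap32(static_cast<uint32_t>(V >> 32));
}

static_assert(bswap32(0x11223344u) == 0x44332211u, "32-bit swap");
static_assert(bswap64(0x0102030405060708ull) == 0x0807060504030201ull, "64-bit swap");
```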
* [PowerPC] Exploit the rlwinm instructions for "and" with constant | QingShan Zhang | 2019-12-30 | 2 | -0/+44
  For now, PowerPC uses several instructions to materialize the constant and "and" it in the following case:

    define i32 @test1(i32 %a) {
      %and = and i32 %a, -2
      ret i32 %and
    }

  However, we could exploit it with the rotate mask instructions.

                 MB  ME
  +----------------------+
  |xxxxxxxxxxx00011111000|
  +----------------------+
   0         32         64

  Notice that we can only do it if the MB is larger than 32 and MB <= ME, as RLWINM will replace the content of [0 - 32) with [32 - 64) even if we didn't rotate it.
  Differential Revision: https://reviews.llvm.org/D71829
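The mask test at the heart of this idea can be sketched on its own. The code below is a simplified illustration, not the patch: the helper names are hypothetical, it uses 32-bit word numbering (bit 0 is the most significant bit, so MB and ME range over 0..31), and it ignores both wrap-around masks and the 64-bit register constraint described above. It checks whether a constant is a single contiguous run of ones and, if so, recovers the MB/ME pair a rotate-by-zero mask instruction would need.

```cpp
#include <cassert>
#include <cstdint>

// Index of the first set bit counting from the most significant bit.
// Precondition: V != 0.
unsigned maskBegin(uint32_t V) {
  unsigned MB = 0;
  while (!(V & (0x80000000u >> MB)))
    ++MB;
  return MB;
}

// Index of the last set bit in the same MSB-first numbering.
// Precondition: V != 0.
unsigned maskEnd(uint32_t V) {
  unsigned ME = 31;
  while (!(V & (1u << (31 - ME))))
    --ME;
  return ME;
}

// Mask with bits MB..ME (inclusive, MSB-first numbering) set; requires MB <= ME <= 31.
uint32_t runMask(unsigned MB, unsigned ME) {
  return (0xFFFFFFFFu >> MB) & (0xFFFFFFFFu << (31 - ME));
}

// True if V is one contiguous, non-wrapping run of ones.
bool isContiguousMask(uint32_t V) {
  return V != 0 && runMask(maskBegin(V), maskEnd(V)) == V;
}
```

For the constant in the example, -2 (0xFFFFFFFE), this gives MB = 0 and ME = 30, while a constant such as 0x0F0F is rejected because its ones are not contiguous.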
* [X86] Use APInt::isOneValue and ConstantSDNode::isOne. NFC | Craig Topper | 2019-12-29 | 1 | -4/+4
  These are implemented slightly more efficiently than comparing to 1 in the case that the value is more than 64 bits.
* [X86] Use isOneConstant to simplify some code. NFC | Craig Topper | 2019-12-29 | 1 | -2/+1
* [X86] Remove dyn_casts to ConstantSDNode for operand 1 of X86ISD::VSRLI/VSRAI/VSRLI. Use getConstantOperandVal and APInt operations. | Craig Topper | 2019-12-29 | 1 | -108/+99
  These nodes should only ever be formed with an i8 TargetConstant so we don't need to check for it to be a constant. It's also always 8-bits so we don't need to use APInt compare functions.
* [SelectionDAG] Disallow indirect "i" constraint | Fangrui Song | 2019-12-29 | 11 | -22/+1
  This allows us to delete InlineAsm::Constraint_i workarounds in SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and TargetLowering::getInlineAsmMemConstraint overrides.
  They were introduced to X86 in r237517 to prevent crashes for constraints like "=*imr". They were later copied to other targets.
* [X86] Stop accidentally custom type legalizing v4i32->v4f32 on SSE1 only targets. | Craig Topper | 2019-12-28 | 1 | -2/+3
  We had a Custom operation action for v4i32 on SSE1. But since v4i32 isn't legal until SSE2, this was not what was intended.
  The code that gets executed was intended for op legalization and creates a bunch of v4i32 nodes that all end up scalarized.
* [X86] Remove a redundant (scalar_to_vector (extract_vector_elt X))) in LowerUINT_TO_FP_i32. NFCI | Craig Topper | 2019-12-28 | 1 | -6/+1
* [X86] Fix -enable-machine-outliner for x86-32 after D48683 | Fangrui Song | 2019-12-28 | 1 | -3/+1
  D48683 accidentally disabled -enable-machine-outliner for x86-32.
* Fix bots after a9ad65a2b34f | Nemanja Ivanovic | 2019-12-28 | 1 | -0/+1
  In the last commit, I neglected to initialize the new subtarget feature I added, which caused failures on a few bots. This should fix that.
* [PowerPC] Change default for unaligned FP access for older subtargets | Nemanja Ivanovic | 2019-12-28 | 3 | -1/+10
  This is a fix for https://bugs.llvm.org/show_bug.cgi?id=40554
  Some CPUs trap to the kernel on unaligned floating point access and there are kernels that do not handle the interrupt. The program then fails with a SIGBUS according to the PR. This just switches the default for unaligned access to only allow it on recent server CPUs that are known to allow this.
  Differential revision: https://reviews.llvm.org/D71954
* [PowerPC] Modify the hasSideEffects of some VSX instructions from 1 to 0 | Kang Zhang | 2019-12-28 | 1 | -1/+3
  Summary:
  If we don't set a value for the hasSideEffects bit in our td file, `llvm-tblgen` sets it to true for instructions that have no match pattern. The 6 instructions below don't set the hasSideEffects flag and don't have a match pattern, so their hasSideEffects flag is set to true by llvm-tblgen.
  But in fact these instructions don't modify any special register and don't have other side effects, so they shouldn't have hasSideEffects set.
  This patch changes the hasSideEffects of the instructions below from 1 to 0.
  ```
  VEXTUHLX
  VEXTUHRX
  VEXTUWLX
  VEXTUWRX
  VSPLTBs
  VSPLTHs
  ```
  Reviewed By: jsji
  Differential Revision: https://reviews.llvm.org/D71391
* AMDGPU/GlobalISel: Use SReg_32 for readfirstlane constraining | Matt Arsenault | 2019-12-27 | 1 | -1/+1
  This matches the DAG behavior where we don't use SReg_32_XM0 everywhere anymore, and fixes not coalescing the copies into m0.
* Hexagon: Fix missing tablegen mode comment | Matt Arsenault | 2019-12-27 | 1 | -1/+1
* TII: Fix using Register for a subregister index argument | Matt Arsenault | 2019-12-27 | 2 | -2/+2
* AMDGPU: Use Register | Matt Arsenault | 2019-12-27 | 1 | -9/+9
* AMDGPU/GlobalISel: Fix extra result register in fdiv64 lowering | Matt Arsenault | 2019-12-27 | 1 | -2/+1
  There ended up being two result registers, which would fail on select. It was really defining a new temp register in the correct def position, instead of the correct result register.
* AMDGPU/GlobalISel: Select some 128-bit load/stores | Matt Arsenault | 2019-12-27 | 1 | -4/+10
* AMDGPU: Use correct DebugLoc | Matt Arsenault | 2019-12-27 | 1 | -1/+1
* [X86] Allow v2i32->v2f32 strict and non-strict uint_to_fp to be widened to v4i32->v4f32 under avx512. | Craig Topper | 2019-12-27 | 1 | -1/+1
  With avx512vl we get v4i32->v4f32 uint_to_fp instructions. With avx512f we get v16i32->v16f32 instructions which we can use to emulate v4i32->v4f32.
* [X86] Custom widen v2i32->v2f32 strict_sint_to_fp to avoid scalarization. | Craig Topper | 2019-12-27 | 1 | -3/+19
* Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821 | Fangrui Song | 2019-12-27 | 3 | -35/+0
  Intrinsic has incorrect argument type!
  i32 (i32*)* @llvm.setjmp
  *wipes tear*
* [X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without avx512vl. Similar for vXi64 on avx512dq without avx512vl. | Craig Topper | 2019-12-26 | 3 | -64/+92
  Summary:
  Previously we did this with isel patterns that used garbage in the widened part of the source. But that's not valid for strictfp. So now we custom widen and use zeroes for the widened elements for strictfp.
  This replaces D71864.
  Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3
  Reviewed By: pengfei
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71879
* [X86] Custom widen strict v2f32->v2i32 by padding with zeroes. | Craig Topper | 2019-12-26 | 1 | -0/+12
  For non-strict, generic type legalization will take care of this, but that doesn't happen currently for strict nodes.
* [X86] Fix -Wmisleading-indentation after D71892 | Fangrui Song | 2019-12-26 | 1 | -0/+1