path: root/llvm/lib/Target
Commit message (Author, Age, Files, Lines)
...
* [AArch64][SVE] Implement several floating-point arithmetic intrinsics (Kerry McLaughlin, 2019-11-01, 2 files, -27/+56)
| Summary:
| Adds intrinsics for the following:
| - fabd, fadd, fsub & fsubr
| - fmul, fmulx, fdiv & fdivr
| - fmax, fmaxnm, fmin & fminnm
| - fscale & ftsmul
|
| Reviewers: huntergr, sdesmalen, dancgr
| Reviewed By: sdesmalen
| Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69657
* AMDGPU: Add default denormal mode to MachineFunctionInfo (Matt Arsenault, 2019-11-01, 3 files, -6/+33)
| The default FP mode should really be a property of a specific function, and not a subtarget. Introduce the necessary fields to the SIMachineFunctionInfo to help move towards this goal.
* [X86] Reland: Enable YMM memcmp with AVX1 (David Zarzycki, 2019-11-01, 1 file, -3/+2)
| Update TargetTransformInfo to allow AVX1 to use YMM registers for memcmp. This is a follow-up to D68632, which enabled XOR compares and so made this possible. Unlike the first patch, this also updates the memcmp-optsize.ll test.
|
| https://reviews.llvm.org/D69658
* DAG: Add DAG argument to isFPExtFoldable (Matt Arsenault, 2019-10-31, 2 files, -3/+4)
| For AMDGPU this is dependent on the FP mode, which should eventually not be a property of the subtarget.
* [WebAssembly] SIMD integer min and max instructions (Thomas Lively, 2019-10-31, 1 file, -0/+9)
| Summary:
| Introduces clang builtins and LLVM intrinsics representing integer min/max instructions. These instructions have not been merged to the SIMD spec proposal yet, so they are currently opt-in only via builtins and not produced by general pattern matching. If these instructions are accepted into the spec proposal, the builtins and intrinsics will be replaced with normal pattern matching.
|
| Defined in https://github.com/WebAssembly/simd/pull/27.
|
| Reviewers: aheejin
| Reviewed By: aheejin
| Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
| Tags: #clang, #llvm
| Differential Revision: https://reviews.llvm.org/D69696
* Reland "[WebAssembly] Handle multiple loads of splatted loads" (Thomas Lively, 2019-10-31, 4 files, -17/+27)
| This reverts commit 92a25fbf11da51c8e3573b81a877d3b226990c07 and fixes the ambiguous method call that was causing build failures.
* Revert "[WebAssembly] Handle multiple loads of splatted loads" (Vlad Tsyrklevich, 2019-10-31, 4 files, -27/+17)
| This reverts commit 2ab1b8c1ec452fb743f6cc5051e75a01039cabfe because it is causing build failures on numerous bots, including sanitizer-x86_64-linux-bootstrap-ubsan. My previous revert was for the wrong commit.
* Revert "[WebAssembly] Expand setcc of v2i64" (Vlad Tsyrklevich, 2019-10-31, 2 files, -31/+0)
| This reverts commit 11850a6305c5778b180243eb06aefe86762dd4ce because it was causing build failures on numerous bots, including sanitizer-x86_64-linux-bootstrap-ubsan.
* Revert "[X86] add mayRaiseFPException flag and FPCW registers for X87 instructions" (Nico Weber, 2019-10-31, 2 files, -46/+25)
| This reverts commit a678677da498a45f59c16ee74fea438e34a801ce. It broke CodeGen/ms-inline-asm.c on most bots.
* [X86] add mayRaiseFPException flag and FPCW registers for X87 instructions (Craig Topper, 2019-10-31, 2 files, -25/+46)
| This patch adds the "mayRaiseFPException" flag, FPCW and FPSW to the X87 instructions that could raise a floating-point exception.
|
| Patch by LiuChen, with a couple of small fixes from me.
|
| Differential Revision: https://reviews.llvm.org/D68854
* [WebAssembly] Handle multiple loads of splatted loads (Thomas Lively, 2019-10-31, 4 files, -17/+27)
| Summary:
| Fixes an ISel failure when a splatted load is used more than once. The failure was due to the hacks we were doing in ISel lowering to preserve the original load as the operand of a LOAD_SPLAT node. The fix is to properly lower the splatted use of the load to a separate LOAD_SPLAT node.
|
| Reviewers: aheejin
| Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69640
* [WebAssembly] Expand setcc of v2i64 (Thomas Lively, 2019-10-31, 2 files, -0/+31)
| Summary:
| The SIMD spec does not include i64x2 comparisons, so they need to be expanded. Using setOperationAction to expand them also causes f64x2 comparisons to be expanded, so setCondCodeAction needs to be used instead. But since there are no legal condition codes, the legalizer does not know how to expand the comparisons. We therefore manually unroll the operation, taking care to fill each lane with -1 or 0 rather than 1 or 0 for consistency with the other vector comparisons.
|
| Reviewers: aheejin
| Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69604
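The lane-filling convention in the commit message above (each result lane is -1 or 0, not 1 or 0) can be illustrated with a small scalar model. This is only a sketch under stated assumptions, not the WebAssembly backend code: `V2I64` and `unrolledSetCC` are hypothetical names, and the model works on plain integers rather than SelectionDAG nodes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a two-lane i64 vector.
using V2I64 = std::array<int64_t, 2>;

// Compare lane by lane. Each result lane is all ones (-1) when the predicate
// holds and 0 otherwise, mirroring the convention the commit message mentions.
template <typename Pred>
V2I64 unrolledSetCC(const V2I64 &LHS, const V2I64 &RHS, Pred Cmp) {
  V2I64 Result{};
  for (std::size_t Lane = 0; Lane < LHS.size(); ++Lane)
    Result[Lane] = Cmp(LHS[Lane], RHS[Lane]) ? int64_t{-1} : int64_t{0};
  return Result;
}
```

An all-ones lane can be used directly as a bit mask for a later select, which is one reason vector comparisons commonly use -1 rather than 1 for "true".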
* [X86] Remove FSIN/FCOS isel patterns and the pseudo instructions that they selected for the FP stackifier. (Craig Topper, 2019-10-31, 3 files, -13/+3)
| We always expand these to libcalls so get rid of the last vestiges of using the instructions.
* [AArch64] Update for Exynos (Evandro Menezes, 2019-10-31, 3 files, -2/+15)
| Fix the costs of `add` and `orr` with an immediate operand.
* Revert rG0e252ae19ff8d99a59d64442c38eeafa5825d441 : [X86] Enable YMM memcmp with AVX1 (Simon Pilgrim, 2019-10-31, 1 file, -2/+3)
| Breaks build bots
|
| Differential Revision: https://reviews.llvm.org/D69658
* [X86] Enable YMM memcmp with AVX1 (David Zarzycki, 2019-10-31, 1 file, -3/+2)
| Update TargetTransformInfo to allow AVX1 to use YMM registers for memcmp. This is a follow-up to D68632, which enabled XOR compares and so made this possible.
|
| https://reviews.llvm.org/D69658
* Revert rG57ee0435bd47f23f3939f402914c231b4f65ca5e - [TII] Use optional destination and source pair as a return value; NFC (Simon Pilgrim, 2019-10-31, 10 files, -65/+98)
| This is breaking MSVC builds: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20375
* [AArch64] Select saturating Neon instructions (David Green, 2019-10-31, 3 files, -1/+31)
| This adds some extra patterns to select AArch64 Neon SQADD, UQADD, SQSUB and UQSUB from the existing target independent sadd_sat, uadd_sat, ssub_sat and usub_sat nodes.
|
| It does not attempt to replace the existing int_aarch64_neon_uqadd intrinsic nodes as they are apparently used for both scalar and vector, and need to be legal on scalar types for some of the patterns to work. The int_aarch64_neon_uqadd on scalar would move the two integers into floating point registers, perform a Neon uqadd and move the value back. I don't believe it is a good idea for uadd_sat to do the same, as the scalar alternative is simpler (an adds with a csinv). For signed it may be smaller, but I'm not sure about it being better.
|
| So this just adds some extra patterns for the existing vector instructions, matching on the _sat nodes.
|
| Differential Revision: https://reviews.llvm.org/D69374
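For readers unfamiliar with the `_sat` nodes named above, the following sketch models their per-lane arithmetic on 32-bit scalars: the result clamps to the type's limits instead of wrapping. The function names `uaddSat` and `saddSat` are hypothetical and the code is a plain C++ model, not the AArch64 selection patterns.

```cpp
#include <cstdint>
#include <limits>

// Unsigned saturating add: if the 32-bit sum wraps around, clamp to all ones.
constexpr uint32_t uaddSat(uint32_t A, uint32_t B) {
  uint32_t Sum = A + B; // wraps modulo 2^32 on overflow
  return Sum < A ? std::numeric_limits<uint32_t>::max() : Sum;
}

// Signed saturating add: compute in a wider type, then clamp to the i32 range.
constexpr int32_t saddSat(int32_t A, int32_t B) {
  int64_t Wide = int64_t{A} + int64_t{B};
  if (Wide > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  if (Wide < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(Wide);
}

static_assert(uaddSat(0xFFFFFFF0u, 0x20u) == 0xFFFFFFFFu, "clamps on overflow");
static_assert(saddSat(-5, 3) == -2, "no clamping when the sum is in range");
```

The unsigned case shows why a scalar lowering can be short: an add that records the carry, followed by a conditional select of all ones, is enough.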
* DAG: Add new control for ISD::FMAD formation (Matt Arsenault, 2019-10-31, 2 files, -0/+16)
| For AMDGPU this depends on whether denormals are enabled in the default FP mode for the function. Currently this is treated as a subtarget feature, so FMAD is selectively legal based on that. I want to move this out of the subtarget features so this can be controlled with a denormal mode attribute. Additionally, this will allow folding based on a future ftz fast math flag.
* AMDGPU: Simplify getAddressSpace calls (Matt Arsenault, 2019-10-31, 4 files, -11/+12)
| These can be directly taken from the GlobalValue instead of going through the type.
* [TII] Use optional destination and source pair as a return value; NFC (Djordje Todorovic, 2019-10-31, 10 files, -98/+65)
| Refactor usage of the isCopyInstrImpl, isCopyInstr and isAddImmediate methods to return an optional machine operand pair of destination and source registers.
|
| Patch by Nikola Prica
|
| Differential Revision: https://reviews.llvm.org/D69622
* [X86][SSE] Convert computeZeroableShuffleElements to emit KnownUndef and KnownZero (Simon Pilgrim, 2019-10-31, 1 file, -23/+35)
* [cfi] Add flag to always generate .debug_frame (David Candler, 2019-10-31, 7 files, -20/+12)
| This adds a flag to LLVM and clang to always generate a .debug_frame section, even if other debug information is not being generated. In situations where .eh_frame would normally be emitted, both .debug_frame and .eh_frame will be used.
|
| Differential Revision: https://reviews.llvm.org/D67216
* [AArch64][SVE] Add patterns for some integer vector instructions (Ehsan Amiri, 2019-10-30, 3 files, -40/+85)
| Add pattern matching for SVE vector instructions:
| -- add, sub, and, or, xor instructions
| -- sqadd, uqadd, sqsub, uqsub target-independent intrinsics
| -- bic intrinsics
| -- predicated add, sub, subr intrinsics
|
| Patch Review: https://reviews.llvm.org/D69128
|
| Patch authored by: dancgr (Danilo Carvalho Grael)
* [X86] Model MXCSR for all SSE instructions (Craig Topper, 2019-10-30, 4 files, -51/+97)
| This patch adds MXCSR as a reserved physical register and models its use by X86 SSE instructions. It also adds the "mayRaiseFPException" flag to the instructions that can raise an FP exception according to the architecture definition.
|
| Following what SystemZ and other targets do, only the current rounding modes and the IEEE exception masks are modeled. *Changes* of the MXCSR due to exceptions are not modeled.
|
| Patch by Pengfei Wang
|
| Differential Revision: https://reviews.llvm.org/D68121
* AMDGPU: Disallow spill folding with m0 copies (Matt Arsenault, 2019-10-30, 2 files, -0/+42)
| readlane and writelane instructions are not allowed to use m0 as the data operand, so spilling them is tricky and would require an intermediate SGPR to spill it. Constrain the virtual register class in this case to disallow the inline spiller from folding the m0 operand directly into the spill instruction.
|
| I copied this hack from AArch64 which has the same problem for $sp.
* AMDGPU: Don't fold S_NOPs with implicit operands (Matt Arsenault, 2019-10-30, 1 file, -1/+3)
|
* [X86] Rewrite hasReassociableOperands and setSpecialOperandAttr to not hardcode number of operands or position of the EFLAGS operand. (Craig Topper, 2019-10-30, 1 file, -28/+19)
| This makes the code immune to the MXCSR addition in D68121.
* [clang][llvm] Obsolete Exynos M1 and M2 (Evandro Menezes, 2019-10-30, 5 files, -889/+0)
|
* [AArch64] Remove overlapping scheduling definitions (NFC) (Evandro Menezes, 2019-10-30, 1 file, -19/+0)
| The scheduling definitions for ASIMD transpose and zipping overlapped with others a few lines below. Somehow, they didn't raise errors before.
|
| There seem to be other overlapping definitions. Somehow, they still don't raise errors.
|
| Differential revision: https://reviews.llvm.org/D68353
* [X86] Add FIXME comment to merge more of computeZeroableShuffleElements and getTargetShuffleAndZeroables (Simon Pilgrim, 2019-10-30, 1 file, -0/+1)
* [X86][SSE] combineX86ShuffleChain - use resolveZeroablesFromTargetShuffle helper. NFCI. (Simon Pilgrim, 2019-10-30, 1 file, -4/+3)
* [AMDGPU] Simplify VCCZ bug handling (Jay Foad, 2019-10-30, 1 file, -5/+1)
| Summary:
| VCCZBugHandledSet was used to make sure we don't apply the same workaround more than once to a single cbranch instruction, but it's not necessary because the workaround involves inserting an s_waitcnt instruction, which is enough for subsequent iterations to detect that no further workaround is necessary.
|
| Also beef up the test case to check that the workaround was only applied once. I have also manually verified that the test still passes even if I hack the big do-while loop in runOnMachineFunction to run a minimum of five iterations.
|
| Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69621
* Fix pattern error for S2_tstbit_i instruction (Ikhlas Ajbar, 2019-10-30, 1 file, -2/+2)
| It used to generate S2_tstbit_i with the constant -33, which resulted in an assert. The reason is that log2_32 was called with the 64-bit value 0.
* [AIX] Lowering CPI/JTI/BA to MIR (Xiangling Liao, 2019-10-30, 1 file, -6/+6)
| Enable lowering of constant pool index, jump table index, and block address to MIR on AIX.
|
| Differential Revision: https://reviews.llvm.org/D69264
* [AArch64][MachineOutliner] Return address signing for outlined functions (David Tellenbach, 2019-10-30, 1 file, -7/+241)
| Summary:
| During AArch64 frame lowering, instructions to enable return address signing are inserted into functions if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions.
|
| This patch introduces the following changes:
|
| 1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions.
| 2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions.
| 3. If all candidate functions agree on the signing scope, signing key and their support for v8.3A features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions.
|
| Reviewers: olista01, paquette, t.p.northover, ostannard
| Reviewed By: ostannard
| Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69097
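The agreement rule in the commit message above can be sketched as a simple predicate over the outlining candidates. This is a hypothetical model, not the MachineOutliner code: `CandidateInfo`, its fields, and `candidatesAgree` are invented for illustration, and the string values used for scope and key are assumed examples.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical summary of the properties the commit message lists for a
// function that takes part in outlining.
struct CandidateInfo {
  std::string SignScope; // assumed example values: "none", "non-leaf", "all"
  std::string SignKey;   // assumed example values: "a_key", "b_key"
  bool HasV8_3a;         // whether v8.3A features are available
};

// Outlining is only allowed when every candidate agrees on all three
// properties; an empty candidate list trivially agrees.
bool candidatesAgree(const std::vector<CandidateInfo> &Candidates) {
  if (Candidates.empty())
    return true;
  const CandidateInfo &First = Candidates.front();
  return std::all_of(Candidates.begin(), Candidates.end(),
                     [&](const CandidateInfo &C) {
                       return C.SignScope == First.SignScope &&
                              C.SignKey == First.SignKey &&
                              C.HasV8_3a == First.HasV8_3a;
                     });
}
```

When the predicate holds, the outlined function can simply inherit the shared scope, key and feature support, which is the third case in the list.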
* [ARM][AArch64][DebugInfo] Improve call site instruction interpretation (Djordje Todorovic, 2019-10-30, 4 files, -0/+87)
| Extend describeLoadedValue() with support for interpreting target-specific ARM and AArch64 instructions. The patch provides a specialization for ADD and SUB operations that include a register and an immediate/offset operand. Some of the instructions can operate with global string addresses or constant pool indexes, but such cases are omitted since we currently lack flexible support for processing such operands at the DWARF production stage.
|
| Patch by Nikola Prica
|
| Differential Revision: https://reviews.llvm.org/D67556
* [AArch64][SVE] Implement masked store intrinsics (Kerry McLaughlin, 2019-10-30, 3 files, -1/+67)
| Summary:
| Adds support for codegen of masked stores, with non-truncating and truncating variants.
|
| Reviewers: huntergr, greened, dmgreen, rovka, sdesmalen
| Reviewed By: dmgreen, sdesmalen
| Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69378
* [X86] combineOrShiftToFunnelShift - use isOperationLegalOrCustom to check FSHL/FSHR support (Simon Pilgrim, 2019-10-30, 1 file, -1/+2)
| Remove hard wired legality check.
* [X86] combineOrShiftToFunnelShift - use getShiftAmountTy instead of hardwiring to MVT::i8 (Simon Pilgrim, 2019-10-30, 1 file, -5/+8)
* [AArch64][SVE] Implement additional integer arithmetic intrinsics (Kerry McLaughlin, 2019-10-30, 2 files, -16/+28)
| Summary:
| Add intrinsics for the following:
| - sxt[b|h|w] & uxt[b|h|w]
| - cls & clz
| - not & cnot
|
| Reviewers: huntergr, sdesmalen, dancgr
| Reviewed By: sdesmalen
| Subscribers: cameron.mcinally, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69567
* [AMDGPU] Consolidate one more getGeneration check (Jay Foad, 2019-10-30, 1 file, -1/+1)
| This one should have been done in r363902 when hasReadVCCZBug was introduced.
* [Alignment] Use Align for TFI.getStackAlignment() in X86ISelLowering (Guillaume Chatelet, 2019-10-30, 1 file, -26/+18)
| Summary:
| This patch is part of a series to introduce an Alignment type.
| See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
| See this patch for the introduction of the type: https://reviews.llvm.org/D64790
|
| Reviewers: courbet, craig.topper, rnk
| Reviewed By: rnk
| Subscribers: rnk, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69034
* [PowerPC] Clear the sideeffect bit for those instructions that didn't have the match pattern (QingShan Zhang, 2019-10-30, 3 files, -6/+8)
| If an instruction has a match pattern, llvm-tblgen infers the sideeffect bit from the pattern, and that works well. If not, tblgen sets the bit to true, which hurts scheduling.
|
| PowerPC has some instructions that do not specify a match pattern (e.g. LXSD), which are manually selected post-RA according to the register pressure. We need to clear the sideeffect flag for these instructions.
|
| Differential Revision: https://reviews.llvm.org/D69232
* [X86] Make memcmp vector lowering handle arbitrary expansions (David Zarzycki, 2019-10-30, 1 file, -23/+43)
| Teach combineVectorSizedSetCCEquality() to handle arbitrary memcmp expansions but do not change any default policy for now. This also fixes a bug in the memcmp expansion itself when large displacements are needed.
|
| https://reviews.llvm.org/D69507
* AMDGPU/GlobalISel: Legalize FDIV32 (Austin Kerbow, 2019-10-29, 2 files, -0/+101)
| Reviewers: arsenm
| Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69581
* [SelectionDAG] Enable lowering unordered atomics loads w/LoadSDNode (and stores w/StoreSDNode) by default (Philip Reames, 2019-10-29, 1 file, -1/+1)
| Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default.
|
| As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/an atomic marker on the MMO. The latter parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other.
|
| This has been fairly heavily fuzzed, and I examined diffs in a reasonably large corpus of assembly by hand, so I'm reasonably sure this is correct for the common case. Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations. This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics. If there are problems with this patch, the most likely area will be legalization.
|
| Differential Revision: https://reviews.llvm.org/D69219
* [X86] Narrow i64 compares with constant to i32 when the upper 32-bits are known zero. (Craig Topper, 2019-10-29, 1 file, -5/+17)
| This catches some cases. There are probably ways to improve this. I tried doing it as a combine on the setcc, but that broke some cases involving flag reuse in place of test.
|
| I renamed the isX86CCUnsigned to isX86CCSigned and flipped its polarity to make it consistent with the similar functions for ISD::SETCC. This avoids calling EQ/NE as being signed or unsigned.
|
| Fixes PR43823.
|
| Differential Revision: https://reviews.llvm.org/D69499
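The commit message above mentions both narrowing and the signedness of condition codes. The sketch below illustrates, on plain integers, why the two interact: when the upper 32 bits of both operands are zero, a signed 64-bit "less than" matches an unsigned 32-bit compare of the low halves, but not necessarily a signed one. How the actual patch treats signed predicates is not stated in the message, so this is an assumption-laden illustration only, and all function names are hypothetical.

```cpp
#include <cstdint>

// Precondition assumed throughout: the upper 32 bits of both values are zero.
constexpr bool upperBitsZero(uint64_t V) { return (V >> 32) == 0; }

// The original wide comparison: signed 64-bit "less than".
constexpr bool wideSignedLess(uint64_t X, uint64_t C) {
  return static_cast<int64_t>(X) < static_cast<int64_t>(C);
}

// Narrowed comparison on the low 32 bits, treated as unsigned.
constexpr bool narrowUnsignedLess(uint64_t X, uint64_t C) {
  return static_cast<uint32_t>(X) < static_cast<uint32_t>(C);
}

// Narrowed comparison on the low 32 bits, treated as signed. Bit 31 now acts
// as a sign bit, so this can disagree with the wide comparison.
constexpr bool narrowSignedLess(uint64_t X, uint64_t C) {
  return static_cast<int32_t>(static_cast<uint32_t>(X)) <
         static_cast<int32_t>(static_cast<uint32_t>(C));
}

static_assert(upperBitsZero(0xFFFFFFFFull), "fits in the low 32 bits");
static_assert(!wideSignedLess(0xFFFFFFFFull, 1), "4294967295 < 1 is false");
static_assert(!narrowUnsignedLess(0xFFFFFFFFull, 1), "unsigned narrow agrees");
static_assert(narrowSignedLess(0xFFFFFFFFull, 1), "signed narrow sees -1 < 1");
```

Equality and inequality have no such hazard, since they give the same answer under either interpretation of the low 32 bits.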
* [SVE][AArch64] Adding pattern matching for some SVE instructions. (Ehsan Amiri, 2019-10-29, 1 file, -4/+4)
| Adding pattern matching for two SVE intrinsics: frecps and frsqrts. Also added patterns for fsub and fmul - these SDNodes directly correspond to machine instructions.
|
| Review: https://reviews.llvm.org/D68476
|
| Patch authored by mgudim (Mikhail Gudim).
* Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2) (Sander de Smalen, 2019-10-29, 4 files, -4/+65)
| llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir failed with EXPENSIVE_CHECKS enabled, causing the patch to be reverted in rG2c496bb5309c972d59b11f05aee4782ddc087e71.
|
| This patch relands the patch with a proper fix to the live-debug-values-reg-copy.mir tests, by ensuring the MIR encodes the callee-saves correctly so that the CalleeSaved info is taken from MIR directly, rather than letting it be recalculated by the PEI pass. I've done this by running `llc -stop-before=prologepilog` on the LLVM IR as captured in the test files, adding the extra MOV instructions that were manually added in the original test file, then running `llc -run-pass=prologepilog` and finally re-adding the comments for the MOV instructions.