summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [SLP] Replace NeedToGather variable with enum.Dinar Temirbulatov2019-12-231-22/+34
|
* [AVR] Fix codegen for rotate instructionsJim Lin2019-12-233-4/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch introduces the ROLBRd and RORBRd pseudo-instructions, which implemenent the "traditional" rotate operations; instead of the AVR rotate instructions that use the carry bit. The code is not optimized at all. Especially when dealing with loops of rotate instructions, this codegen should be improved some day. Related bug: 41358 <https://bugs.llvm.org/show_bug.cgi?id=41358> //Note//: This is my first submitted patch. Reviewers: dylanmckay, Jim Reviewed By: dylanmckay Subscribers: hiraditya, llvm-commits, dylanmckay, dsprenkels Tags: #llvm Patched by dsprenkels (Daan Sprenkels) Differential Revision: https://reviews.llvm.org/D60365
* [PowerPC] Exploit `vrl(b|h|w|d)` to perform vector rotationKai Luo2019-12-232-1/+21
| | | | | | | | | Summary: Currently, we set legalization action of `ISD::ROTL` vectors as `Expand` in `PPCISelLowering`. However, we can exploit `vrl(b|h|w|d)` to lower `ISD::ROTL` directly. Differential Revision: https://reviews.llvm.org/D71324
* reland "[DebugInfo] Support to emit debugInfo for extern variables"Yonghong Song2019-12-222-3/+4
| | | | | | | | | | | | | Commit d77ae1552fc21a9f3877f3ed7e13d631f517c825 ("[DebugInfo] Support to emit debugInfo for extern variables") added deebugInfo for extern variables for BPF target. The commit is reverted by 891e25b02d760d0de18c7d46947913b3166047e7 as the committed tests using %clang instead of %clang_cc1 causing test failed in certain scenarios as reported by Reid Kleckner. This patch fixed the tests by using %clang_cc1. Differential Revision: https://reviews.llvm.org/D71818
* [DAGCombiner] Check term use before applying aggressive FSUB optimisationsCarl Ritson2019-12-231-3/+6
| | | | | | | | | | | | | | | | Summary: Without this check unnecessary FMA instructions are generated when the FSUB terms are reused. This also has the side-effect that the same value is computed to different levels of precision, which can create undesirable effects if the results are used together in subsequent computation. Reviewers: arsenm, nhaehnle, foad, tpr, dstuttard, spatel Reviewed By: arsenm Subscribers: jvesely, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71656
* Revert "[DebugInfo] Support to emit debugInfo for extern variables"Reid Kleckner2019-12-222-4/+3
| | | | | | | This reverts commit d77ae1552fc21a9f3877f3ed7e13d631f517c825. The tests committed along with this change do not pass, and should be changed to use %clang_cc1.
* [SelectionDAG] Copy FP flags when visiting a binary instruction.Valentin Churavy2019-12-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: We noticed in Julia that the sequence below no longer turned into a sequence of FMA instructions in LLVM 7+, but it did in LLVM 6. ``` %29 = fmul contract <4 x double> %wide.load, %wide.load16 %30 = fmul contract <4 x double> %wide.load13, %wide.load17 %31 = fmul contract <4 x double> %wide.load14, %wide.load18 %32 = fmul contract <4 x double> %wide.load15, %wide.load19 %33 = fadd fast <4 x double> %vec.phi, %29 %34 = fadd fast <4 x double> %vec.phi10, %30 %35 = fadd fast <4 x double> %vec.phi11, %31 %36 = fadd fast <4 x double> %vec.phi12, %32 ``` Unlike Clang, Julia doesn't set the `unsafe-fp-math=true` function attribute, but rather emits more local instruction flags. This partially undoes https://reviews.llvm.org/D46854 and if required I can try to minimize the test further. Reviewers: spatel, mcberg2017 Reviewed By: spatel Subscribers: chriselrod, merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71495
* Revert "[ARM][TypePromotion] Enable by default"Reid Kleckner2019-12-221-21/+3
| | | | | | | | This reverts commit ee7579409b7d940c4e1314d126e900db30c4edff. It causes crashes during ThinLTO. I suspect the issue is related to races on the global TypeSize variable, which is 80 at the time of the crash.
* [AMDGPU] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-222-4/+4
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71815
* [Hexagon] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-225-10/+10
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71814
* [NVPTX] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-221-1/+1
| | | | | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Also removed the top-level const as requested by Aaron Ballman in similar patches. Differential Revision: https://reviews.llvm.org/D71812
* [PowerPC] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-221-3/+3
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71811
* [Transforms] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-225-6/+6
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71810
* [InstCombine] enhance fold for copysign with known sign argSanjay Patel2019-12-221-8/+12
| | | | | This is another optimization suggested in PRPR44153: https://bugs.llvm.org/show_bug.cgi?id=44153
* [ms] [X86] Use "P" modifier on operands to call instructions in inline X86 ↵Eric Astor2019-12-226-18/+81
| | | | | | | | | | | | | | | | | | | | assembly. Summary: This is documented as the appropriate template modifier for call operands. Fixes PR44272, and adds a regression test. Also adds support for operand modifiers in Intel-style inline assembly. Reviewers: rnk Reviewed By: rnk Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71677
* [AArch64] match splat of bitcasted extract subvector to DUPLANESanjay Patel2019-12-221-7/+43
| | | | | | | | | | This is another potential regression exposed by D63815. Here we peek through a bitcast to find an extract subvector and scale the splat offset based on that: splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC' Differential Revision: https://reviews.llvm.org/D71672
* DebugInfo: Remove out of date commentDavid Blaikie2019-12-211-4/+0
|
* Fix "result of 32-bit shift implicitly converted to 64 bits" warning. NFC.Simon Pilgrim2019-12-211-1/+1
|
* [InstCombine] check alloc size in bitcast of geps fold (PR44321)Sanjay Patel2019-12-211-4/+6
| | | | | | | | | | We missed a constraint in D44833 when folding a bitcast into a GEP with vector/array types. If the alloc sizes specified by the datalayout don't match, this could miscompile as shown in: https://bugs.llvm.org/show_bug.cgi?id=44321 Differential Revision: https://reviews.llvm.org/D71771
* [SimplifyLibCalls] require fast-math-flags for pow(X, -0.5) transformsSanjay Patel2019-12-211-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | As discussed in PR44330: https://bugs.llvm.org/show_bug.cgi?id=44330 ...the transform from pow(X, -0.5) libcall/intrinsic to reciprocal square root can result in small deviations from the expected result due to differences in the pow() implementation and/or the extra rounding step from the division. This patch proposes to allow that difference with either the 'approximate functions' or 'reassociate' FMF: http://llvm.org/docs/LangRef.html#fast-math-flags In practice, this likely means that the code is compiled with all of 'fast' (-ffast-math), but I have preserved the existing specializations for -0.0/-INF that enable generating safe code if those special values are allowed simultaneously with allowing approximation/reassociation. The question about whether a similar restriction is needed for the non-reciprocal case -- pow(X, 0.5) -- is deferred. That transform is allowed without FMF currently, and this patch does not change that behavior. Differential Revision: https://reviews.llvm.org/D71706
* [AArch64] Respect reserved registers while renaming in LdSt opt.Florian Hahn2019-12-211-1/+4
| | | | | | We cannot pick reserved registers as rename registers. Fixes https://bugs.llvm.org/show_bug.cgi?id=44358
* AMDGPU/GlobalISel: Fix misuse of div_scale intrinsicsMatt Arsenault2019-12-211-5/+5
| | | | | | | | | | | Confusingly, the intrinsic operands do not match the instruction/custom node. The order is shuffled, and the 3rd operand is an immediate to select operands. I'm not 100% sure I did this right, but fdiv still doesn't select end to end and it will be easier to tell when it does. This at least avoids an assertion in RegBankSelect and allows hitting the fallback on selection.
* AMDGPU/GlobalISel: Fix missing scc imp-def on scalar and/or/xorMatt Arsenault2019-12-211-0/+5
|
* AMDGPU/GlobalISel: Simplify codeMatt Arsenault2019-12-211-5/+5
| | | | | This can directly access the register bank, and doesn't need to get it through the ID.
* [ORC] De-register eh-frames in the RTDyldObjectLinkingLayer destructor.Lang Hames2019-12-201-0/+6
| | | | | This matches the behavior of the legacy layer, which automatically deregistered frames.
* [NFC][MachineOutliner] Rewrite setSuffixIndices to be iterativeJessica Paquette2019-12-201-18/+25
| | | | | | | | Having this function be recursive could use up way too much stack space. Rewrite it as an iterative traversal in the tree instead to prevent this. Fixes PR44344.
* [DWARF] Defer creating declaration DIEs until we prepare call site infoVedant Kumar2019-12-201-7/+0
| | | | | | | | It isn't necessary to create DIEs for all of the declaration subprograms in a CU's retainedTypes list. We can defer creating these subprograms until we need to prepare a call site tag that refers to one. This cleanup was mentioned in passing in D70350.
* Reland: [DWARF] Allow cross-CU references of subprogram definitionsVedant Kumar2019-12-204-7/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows a call site tag in CU A to reference a callee DIE in CU B without resorting to creating an incomplete duplicate DIE for the callee inside of CU A. We already allow cross-CU references of subprogram declarations, so it doesn't seem like definitions ought to be special. This improves entry value evaluation and tail call frame synthesis in the LTO setting. During LTO, it's common for cross-module inlining to produce a call in some CU A where the callee resides in a different CU, and there is no declaration subprogram for the callee anywhere. In this case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram in order to fill in the call site tag. That empty 'definition' defeats entry value evaluation etc., because the debugger can't figure out what it means. As a follow-up, maybe we could add a DWARF verifier check that a DW_TAG_subprogram at least has a DW_AT_name attribute. Update: Reland with a fix to create a declaration DIE when the declaration is missing from the CU's retainedTypes list. The declaration is left out of the retainedTypes list in two cases: 1) Re-compiling pre-r266445 bitcode (in which declarations weren't added to the retainedTypes list), and 2) Doing LTO function importing (which doesn't update the retainedTypes list). It's possible to handle (1) and (2) by modifying the retainedTypes list (in AutoUpgrade, or in the LTO importing logic resp.), but I don't see an advantage to doing it this way, as it would cause more DWARF to be emitted compared to creating the declaration DIEs lazily. Tested with a stage2 ThinLTO+RelWithDebInfo build of clang, and with a ReleaseLTO-g build of the test suite. rdar://46577651, rdar://57855316, rdar://57840415 Differential Revision: https://reviews.llvm.org/D70350
* [WebAssembly] Use TargetIndex operands in DbgValue to track WebAssembly ↵Yury Delendik2019-12-2012-3/+96
| | | | | | | | | | | | | | | | | | operands locations Extends DWARF expression language to express locals/globals locations. (via target-index operands atm) (possible variants are: non-virtual registers or address spaces) The WebAssemblyExplicitLocals can replace virtual registers to targertindex operand type at the time when WebAssembly backend introduces {get,set,tee}_local instead of corresponding virtual registers. Reviewed By: aprantl, dschuff Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D52634
* [InstCombine] Improve infinite loop detectionJakub Kuderski2019-12-201-4/+15
| | | | | | | | | | | | | | | | | | | Summary: This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows to specify how many iterations is considered getting stuck in an infinite loop. Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details. The two limits can be specified via command line options. Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71673
* Rename DW_AT_LLVM_isysroot to DW_AT_LLVM_sysrootAdrian Prantl2019-12-207-19/+19
| | | | | | | | | | | | This is a purely cosmetic change that is NFC in terms of the binary output. I bugs me that I called the attribute DW_AT_LLVM_isysroot since the "i" is an artifact of GCC command line option syntax (-isysroot is in the category of -i options) and doesn't carry any useful information otherwise. This attribute only appears in Clang module debug info. Differential Revision: https://reviews.llvm.org/D71722
* Add parentheses to silence warningBill Wendling2019-12-201-2/+2
|
* More style cleanups following rG14fc20ca6282 [NFC]Philip Reames2019-12-201-34/+28
| | | | | | | Demote member functions to static functions where possible Use early continue/early return to reduce nesting Clarify comments slightly. Reuse previously define expression in one case.
* Fix a memory leak introduced w/the instruction padding support in rG14fc20ca6282Philip Reames2019-12-201-6/+6
| | | | Should have caught this in review, but only noticed when addressing post commit style items. We were creating a new instance of the X86MCInstrInfo class, and then never reclaiming the memory. This wasn't even conditional on the new off by default flags, so it was an unconditional leak.
* Align branches within 32-Byte boundary (NOP padding)Philip Reames2019-12-204-1/+390
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WARNING: If you're looking at this patch because you're looking for a full performace mitigation of the Intel JCC Erratum, this is not it! This is a preliminary patch on the patch towards mitigating the performance regressions caused by Intel's microcode update for Jump Conditional Code Erratum. For context, see: https://www.intel.com/content/www/us/en/support/articles/000055650.html The patch adds the required assembler infrastructure and command line options needed to exercise the logic for INTERNAL TESTING. These are NOT public flags, and should not be used for anything other than LLVM's own testing/debugging purposes. They are likely to change both in spelling and meaning. WARNING: This patch is knowingly incorrect in some cornercases. We need, and do not yet provide, a mechanism to selective enable/disable the padding. Conversation on this will continue in parellel with work on extending this infrastructure to support prefix padding. The goal here is to have the assembler align specific instructions such that they neither cross or end at a 32 byte boundary. The impacted instructions are: a. Conditional jump. b. Fused conditional jump. c. Unconditional jump. d. Indirect jump. e. Ret. f. Call. The new options for llvm-mc are: -x86-align-branch-boundary=NUM aligns branches within NUM byte boundary. -x86-align-branch=TYPE[+TYPE...] specifies types of branches to align. A new MCFragment type, MCBoundaryAlignFragment, is added, which may emit NOP to align the fused/unfused branch. alignBranchesBegin inserts MCBoundaryAlignFragment before instructions, alignBranchesEnd marks the end of the branch to be aligned, relaxBoundaryAlign grows or shrinks sizes of NOP to align the target branch. Nop padding is disabled when the instruction may be rewritten by the linker, such as TLS Call. Process Note: I am landing a patch by skan as it has been LGTMed, and continuing to iterate on the review is simply slowing us down at this point. We can and will continue to iterate in tree. Patch By: skan Differential Revision: https://reviews.llvm.org/D70157
* [PPC32] Emit R_PPC_PLTREL24 for calls to dso_local ifuncFangrui Song2019-12-201-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | static void *ifunc(void) __attribute__((ifunc("resolver"))); void foo() { ifunc(); } The relocation produced by the ifunc() call: 1. gcc -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000 2. gcc -msecure-plt -PIE => R_PPC_PLTREL24 r_addend=0x8000 3. clang -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000 4. clang -msecure-plt -fPIE => R_PPC_REL24 4 is incorrect. The R_PPC_REL24 needs a call stub due to ifunc. If this relocation is mixed with other R_PPC_PLTREL24(r_addend=0x8000) in a function, both GNU ld and lld (after D71621 fix) may produce a wrong result. This patch fixes 4 to use R_PPC_PLTREL24, which matches GCC. Both GNU ld and lld (after D71621) will be happy. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D71649
* [X86] Fix a KNL miscompile caused by combineSetCC swapping LHS/RHS variables ↵Craig Topper2019-12-201-19/+23
| | | | | | | | | | | | | | before a later use. The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code. A later piece of code swaps the operands and changing the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands. To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes are enough to fix the issue, but doing both to make this code very safe. I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that. Differential Revision: https://reviews.llvm.org/D71736
* [AArch64][SVE] Replace integer immediate intrinsics with splat vector variantDanilo Carvalho Grael2019-12-202-22/+39
| | | | | | | | | | | | Summary: Replace the integer immediate intrisics with splat vector variants so they can be applied as optimizations for the C/C++ intrinsics. Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D71614
* [SystemZ] Add a mapping from "select register" to "load on condition" (2-addr).Jonas Paulsson2019-12-204-81/+60
| | | | | | | | | | | | | | | | | | The SELR(Mux) instructions can be converted to two-address form as LOCR(Mux) instructions whenever one of the sources are the same reg as dest. By adding this mapping in getTwoOperandOpcode(), we get: - Two-address hints in getRegAllocationHints() for select register instructions. - No need anymore for special handling in SystemZShortenInst.cpp - shortenSelect() removed. The two-address hints are now added before the GRX32 hints, which should be preferred. Review: Ulrich Weigand https://reviews.llvm.org/D68870
* llvm-symbolizer: support DW_FORM_loclistx locations.Evgenii Stepanov2019-12-202-31/+27
| | | | | | | | | | | | | | | | Summary: With -gdwarf-5 local variable locations are emitted as DW_FORM_loclistx form instead of the regular DW_FORM_sec_offset. Teach DWARFDie::getLocations to understand the new format and use it in llvm-symbolizer "FRAME" command. Reviewers: pcc, jdoerfert Subscribers: srhines, aprantl, hiraditya, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70756
* Temporarily revert "Reapply [LVI] Normalize pointer behavior" and "[LVI] ↵Jordan Rupprecht2019-12-201-134/+174
| | | | | | Restructure caching" This reverts commits 7e18aeba5062cd4324a9efb7bc25c9dbc4a34c2c (D70376) 21fbd5587cdfa11dabb3aeb0ead2d3d5fd0b490d (D69914) due to increased memory usage.
* [SystemZ] Bugfix and improve the handling of CC values.Jonas Paulsson2019-12-206-33/+141
| | | | | | | | | | | | | | | | | It was recently discovered that the handling of CC values was actually broken since overflow was not properly handled ('nsw' flag not checked for). Add and sub instructions now have a new target specific instruction flag named SystemZII::CCIfNoSignedWrap. It means that the CC result can be used instead of a compare with 0, but only if the instruction has the 'nsw' flag set. This patch also adds the improvements of conversion to logical instructions and the analyzing of add with immediates, to be able to eliminate more compares. Review: Ulrich Weigand https://reviews.llvm.org/D66868
* Revert "[ARM] Improve codegen of volatile load/store of i64"Victor Campos2019-12-206-158/+6
| | | | This reverts commit bbcf1c3496ce2bd1ed87e8fb15ad896e279633ce.
* [SystemZ][FPEnv] Enable strict vector FP extends/truncationsUlrich Weigand2019-12-204-13/+66
| | | | | | | | | | | | | | The back-end currently has special DAGCombine code to detect cases where two floating-point extend or truncate operations can be combined into a single vector operation. This patch extends that support to also handle strict FP operations. Note that currently only the case where both operations have the same input chain are supported. This already suffices to cover the common case where the operations result from scalarizing a non-legal vector type. More general cases can be supported in the future.
* [AArch64][SVE] Correct intrinsics and patterns for logical predicate ↵Paul Walker2019-12-201-17/+17
| | | | | | | | | | | | | | | | | | | | | | | instructions In general SVE intrinsics are considered predicated and merging with everything else having suitable decoration. For predicated zeroing operations (like the predicate logical instructions) we use the "_z" suffix. After this change all intrinsics use their expected names (i.e. orr instead of or and eor instead of xor). I've removed intrinsics and patterns for condition code setting instructions as that data is not returned as part of the intrinsic. The expectation is to ask for a cc flag explicitly. For example: a = and_z(pg, p1, p2) cc = ptest_<flag>(pg, a) With the code generator expected to use "s" variants of instructions when available. Differential Revision: https://reviews.llvm.org/D71715
* [OPT-DBG] Teach DbgEntityHistoryCalculator about meta-instructions.Tom Weaver2019-12-201-1/+3
| | | | | | | | | | | | | | The calculator was considering instructions such as KILLs as clobbers of a physical address. This is wrong as meta instructions such as KILLs produce no output in the final program and thus don't clobber or change any physical location's value. As a result they're safe to ignore whilst calculating location list ranges. reviewers: aprantl, vsk diff revision: https://reviews.llvm.org/D70497 fixes: https://bugs.llvm.org/show_bug.cgi?id=38753
* [LV] Strip wrap flags from vectorized reductionsAyal Zaks2019-12-201-1/+39
| | | | | | | | | | | | A sequence of additions or multiplications that is known not to wrap, may wrap if it's order is changed (i.e., reassociated). Therefore when vectorizing integer sum or product reductions, their no-wrap flags need to be removed. Fixes PR43828 Patch by Denis Antrushin Differential Revision: https://reviews.llvm.org/D69563
* [AArch64][SVE] Fold constant multiply of element countCullen Rhodes2019-12-203-1/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: E.g. %0 = tail call i64 @llvm.aarch64.sve.cntw(i32 31) %mul = mul i64 %0, <const> Should emit: cntw x0, all, mul #<const> For <const> in the range 1-16. Patch by Kerry McLaughlin Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71014
* [AArch64][SVE] Add intrnisics for saturating scalar arithmeticAndrzej Warzynski2019-12-202-69/+137
| | | | | | | | | | | | | | | | | | | | | | Summary: The following intrnisics are added: * @llvm.aarch64.sve.sqdec{b|h|w|d|p} * @llvm.aarch64.sve.sqinc{b|h|w|d|p} * @llvm.aarch64.sve.uqdec{b|h|w|d|p} * @llvm.aarch64.sve.uqinc{b|h|w|d|p} For every intrnisic there a scalar variants (with n32 or n64 suffix) and vector variants (no suffix). Reviewers: sdesmalen, rengolin, efriedma Reviewed By: sdesmalen, efriedma Subscribers: eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71252
* Recommit "[AArch64][SVE] Add permutation and selection intrinsics"Cullen Rhodes2019-12-205-42/+264
| | | | | | | | Recommit 23c28c40436143006be740533375c036d11c92cd (reverted in dcb48f50bdfa0fa47b62d089b6ed999d857fc9f8) with a fix for an assert "Request for a fixed size on a scalable object" being triggered in `LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the TypeSize object.
OpenPOWER on IntegriCloud