summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AArch64
Commit message (Collapse)AuthorAgeFilesLines
...
* [AArch64] Select saturating Neon instructionsDavid Green2019-10-316-978/+274
| | | | | | | | | | | | | | | | | | | | This adds some extra patterns to select AArch64 Neon SQADD, UQADD, SQSUB and UQSUB from the existing target independent sadd_sat, uadd_sat, ssub_sat and usub_sat nodes. It does not attempt to replace the existing int_aarch64_neon_uqadd intrinsic nodes as they are apparently used for both scalar and vector, and need to be legal on scalar types for some of the patterns to work. The int_aarch64_neon_uqadd on scalar would move the two integers into floating point registers, perform a Neon uqadd and move the value back. I don't believe this is good idea for uadd_sat to do the same as the scalar alternative is simpler (an adds with a csinv). For signed it may be smaller, but I'm not sure about it being better. So this just adds some extra patterns for the existing vector instructions, matching on the _sat nodes. Differential Revision: https://reviews.llvm.org/D69374
* Fix missing memcpy, memmove and memset tail callsSanne Wouda2019-10-311-0/+18
| | | | | | | | | | | | | | | | | | | Summary: If a wrapper around one of the mem* stdlib functions bitcasts the returned pointer value before returning it (e.g. to a wchar_t*), LLVM does not emit a tail call. Add a check for this scenario so that we emit a tail call. Reviewers: wmi, mkuper, ramred01, dmgreen Reviewed By: wmi, dmgreen Subscribers: hiraditya, sanwou01, javed.absar, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59078
* [MachineOutliner][NFC] Fix FileCheck regex in two of test-casesDavid Tellenbach2019-10-312-7/+7
|
* [AArch64][SVE] Add patterns for some integer vector instructionsEhsan Amiri2019-10-303-0/+497
| | | | | | | | | | | | Add pattern matching for SVE vector instructions: -- add, sub, and, or, xor instructions -- sqadd, uqadd, sqsub, uqsub target-independent intrinsics -- bic intrinsics -- predicated add, sub, subr intrinsics Patch Review: https://reviews.llvm.org/D69128 Patch authored by: dancgr (Danilo Carvalho Grael)
* [GISel][CombinerHelper] Combine shuffle_vector scalar to build_vectorQuentin Colombet2019-10-301-0/+63
| | | | | | | | | | Teach the combiner helper how to replace shuffle_vector of scalars into build_vector. I am not particularly happy about having to add this combine, but we currently get those from <1 x iN> from the IR. Bonus: This fixes an assert in the shuffle_vector combines since before this patch, we were expecting vector types.
* [clang][llvm] Obsolete Exynos M1 and M2Evandro Menezes2019-10-3014-413/+5
|
* [AArch64][MachineOutliner] Return address signing for outlined functionsDavid Tellenbach2019-10-309-0/+725
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: During AArch64 frame lowering instructions to enable return address signing are inserted into function if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions. This patch introduces the following changes: 1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions. 2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions. 2. If all candidate functions agree on the signing scope, signing key and and their support for v8.3 features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions. Reviewers: olista01, paquette, t.p.northover, ostannard Reviewed By: ostannard Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69097
* [AArch64][SVE] Implement masked store intrinsicsKerry McLaughlin2019-10-302-21/+193
| | | | | | | | | | | | | | | | Summary: Adds support for codegen of masked stores, with non-truncating and truncating variants. Reviewers: huntergr, greened, dmgreen, rovka, sdesmalen Reviewed By: dmgreen, sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69378
* [AArch64][SVE] Implement additional integer arithmetic intrinsicsKerry McLaughlin2019-10-303-0/+356
| | | | | | | | | | | | | | | | | | Summary: Add intrinsics for the following: - sxt[b|h|w] & uxt[b|h|w] - cls & clz - not & cnot Reviewers: huntergr, sdesmalen, dancgr Reviewed By: sdesmalen Subscribers: cameron.mcinally, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69567
* [SVE][AArch64] Adding pattern matching for some SVE instructions.Ehsan Amiri2019-10-291-0/+104
| | | | | | | | | Adding patten matching for two SVE intrinsics: frecps and frsqrts. Also added patterns for fsub and fmul - these SDNodes directly correspond to machine instructions. Review: https://reviews.llvm.org/D68476 Patch authored by mgudim (Mikhail Gudim).
* Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)Sander de Smalen2019-10-291-0/+92
| | | | | | | | | | | | | | | | llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir failed with EXPENSIVE_CHECKS enabled, causing the patch to be reverted in rG2c496bb5309c972d59b11f05aee4782ddc087e71. This patch relands the patch with a proper fix to the live-debug-values-reg-copy.mir tests, by ensuring the MIR encodes the callee-saves correctly so that the CalleeSaved info is taken from MIR directly, rather than letting it be recalculated by the PEI pass. I've done this by running `llc -stop-before=prologepilog` on the LLVM IR as captured in the test files, adding the extra MOV instructions that were manually added in the original test file, then running `llc -run-pass=prologepilog` and finally re-added the comments for the MOV instructions.
* Revert rG70f5aecedef9a6e347e425eb5b843bf797b95319 - "Reland ↵Simon Pilgrim2019-10-291-92/+0
| | | | | | [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)" This fails on EXPENSIVE_CHECKS builds
* [AArch64][GlobalISel] Fix assertion fail in C++ selection for vector zext of ↵Amara Emerson2019-10-281-0/+9
| | | | | | | | <4 x s8> We bailed out of dealing with vectors only after the assertion, move it before. Fixes PR43794
* Convert files added in d157a9bc8ba1 to unix line endings.Nico Weber2019-10-282-172/+172
| | | | | | Ran: git show --diff-filter=A --stat d157a9bc8ba1 | grep '|' | \ awk '{ print $1 }' | xargs dos2unix
* Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)Sander de Smalen2019-10-281-0/+92
| | | | | Fixed up test/DebugInfo/MIR/Mips/live-debug-values-reg-copy.mir that broke r375425.
* Add Windows Control Flow Guard checks (/guard:cf).Andrew Paverd2019-10-282-0/+172
| | | | | | | | | | | | | | | | | | | Summary: A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on indirect function calls, using either the check mechanism (X86, ARM, AArch64) or or the dispatch mechanism (X86-64). The check mechanism requires a new calling convention for the supported targets. The dispatch mechanism adds the target as an operand bundle, which is processed by SelectionDAG. Another pass (CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as required by /guard:cf. This feature is enabled using the `cfguard` CC1 option. Reviewers: thakis, rnk, theraven, pcc Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65761
* Precommit AArch64 test for -consider-local-interval-costSanne Wouda2019-10-281-0/+393
| | | | | | | | | | | | | | Summary: Precommitting this test makes it more obvious what the delta is of enabling -consider-local-interval-cost in D69437. Reviewers: dmgreen Subscribers: kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69512
* [ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLEvhscampos2019-10-281-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Writing support for three ACLE functions: unsigned int __cls(uint32_t x) unsigned int __clsl(unsigned long x) unsigned int __clsll(uint64_t x) CLS stands for "Count number of leading sign bits". In AArch64, these two intrinsics can be translated into the 'cls' instruction directly. In AArch32, on the other hand, this functionality is achieved by implementing it in terms of clz (count number of leading zeros). Reviewers: compnerd Reviewed By: compnerd Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69250
* [AArch64][SVE] Implement masked load intrinsicsKerry McLaughlin2019-10-283-0/+225
| | | | | | | | | | | | | | | | Summary: Adds support for codegen of masked loads, with non-extending, zero-extending and sign-extending variants. Reviewers: huntergr, rovka, greened, dmgreen Reviewed By: dmgreen Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68877
* [SDAG] fold insert_vector_elt with undef indexSanjay Patel2019-10-271-13/+11
| | | | | | | | | | | | Similar to: rG4c47617627fb This makes the DAG behavior consistent with IR's insertelement. https://bugs.llvm.org/show_bug.cgi?id=42689 I've tried to maintain test intent for AArch64 and WebAssembly by replacing undef index operands with something else.
* [GlobalISel][AArch64][AMDGPU][X86] Teach LegalizationArtifactCombiner to ↵Craig Topper2019-10-241-13/+12
| | | | | | | | combine trunc(g_constant). This allows X86 to properly form shift by immediate instructions since we require an 8-bit constant to match the imported SelectionDAG patterns.
* [GISel][CombinerHelper] Add a combine turning shuffle_vector into concat_vectorsQuentin Colombet2019-10-211-0/+353
| | | | | | | | | | Teach the CombinerHelper how to turn shuffle_vectors, that concatenate vectors, into concat_vectors and add this combine to the AArch64 pre-legalizer combiner. Differential Revision: https://reviews.llvm.org/D69149 llvm-svn: 375452
* Reverted r375425 as it broke some buildbots.Sander de Smalen2019-10-211-92/+0
| | | | llvm-svn: 375444
* [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)Sander de Smalen2019-10-211-0/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit message from D66935: This patch fixes a bug exposed by D65653 where a subsequent invocation of `determineCalleeSaves` ends up with a different size for the callee save area, leading to different frame-offsets in debug information. In the invocation by PEI, `determineCalleeSaves` tries to determine whether it needs to spill an extra callee-saved register to get an emergency spill slot. To do this, it calls 'estimateStackSize' and manually adds the size of the callee-saves to this. PEI then allocates the spill objects for the callee saves and the remaining frame layout is calculated accordingly. A second invocation in LiveDebugValues causes estimateStackSize to return the size of the stack frame including the callee-saves. Given that the size of the callee-saves is added to this, these callee-saves are counted twice, which leads `determineCalleeSaves` to believe the stack has become big enough to require spilling an extra callee-save as emergency spillslot. It then updates CalleeSavedStackSize with a larger value. Since CalleeSavedStackSize is used in the calculation of the frame offset in getFrameIndexReference, this leads to incorrect offsets for variables/locals when this information is recalculated after PEI. This patch fixes the lldb unit tests in `functionalities/thread/concurrent_events/*` Changes after D66935: Ensures AArch64FunctionInfo::getCalleeSavedStackSize does not return the uninitialized CalleeSavedStackSize when running `llc` on a specific pass where the MIR code has already been expected to have gone through PEI. Instead, getCalleeSavedStackSize (when passed the MachineFrameInfo) will try to recalculate the CalleeSavedStackSize from the CalleeSavedInfo. In debug mode, the compiler will assert the recalculated size equals the cached size as calculated through a call to determineCalleeSaves. This fixes two tests: test/DebugInfo/AArch64/asan-stack-vars.mir test/DebugInfo/AArch64/compiler-gen-bbs-livedebugvalues.mir that otherwise fail when compiled using msan. Reviewed By: omjavaid, efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D68783 llvm-svn: 375425
* [AArch64][SVE] Add SPLAT_VECTOR ISD NodeGraham Hunter2019-10-181-0/+95
| | | | | | | | | | | | | | | | | | | | | | | | | Adds a new ISD node to replicate a scalar value across all elements of a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot be used. Fixes up default type legalization for scalable vectors after the new MVT type ranges were introduced. At present I only use this node for scalable vectors. A DAGCombine has been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all elements are the same, but only if the default operation action of Expand has been overridden by the target. I've only added result promotion legalization for scalable vector i8/i16/i32/i64 types in AArch64 for now. Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy Reviewed By: jmolloy Differential Revision: https://reviews.llvm.org/D47775 llvm-svn: 375222
* [AArch64] Don't combine callee-save and local stack adjustment when ↵David Green2019-10-181-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | optimizing for size For arm64, D18619 introduced the ability to combine bumping the stack pointer upfront in case it needs to be bumped for both the callee-save area as well as the local stack area. That diff already remarks that "This change can cause an increase in instructions", but argues that even when that happens, it should be still be a performance benefit because the number of micro-ops is reduced. We have observed that this code-size increase can be significant in practice. This diff disables combining stack bumping for methods that are marked as optimize-for-size. Example of a prologue with the behavior before this diff (combining stack bumping when possible): sub sp, sp, #0x40 stp d9, d8, [sp, #0x10] stp x20, x19, [sp, #0x20] stp x29, x30, [sp, #0x30] add x29, sp, #0x30 [... compute x8 somehow ...] stp x0, x8, [sp] And after this diff, if the method is marked as optimize-for-size: stp d9, d8, [sp, #-0x30]! stp x20, x19, [sp, #0x10] stp x29, x30, [sp, #0x20] add x29, sp, #0x20 [... compute x8 somehow ...] stp x0, x8, [sp, #-0x10]! Note that without combining the stack bump there are two auto-decrements, nicely folded into the stp instructions, whereas otherwise there is a single sub sp, ... instruction, but not folded. Patch by Nikolai Tillmann! Differential Revision: https://reviews.llvm.org/D68530 llvm-svn: 375217
* [Codegen] Alter the default promotion for saturating adds and subsDavid Green2019-10-1812-376/+318
| | | | | | | | | | | | | | | | | | The default promotion for the add_sat/sub_sat nodes currently does: ANY_EXTEND iN to iM SHL by M-N [US][ADD|SUB]SAT L/ASHR by M-N If the promoted add_sat or sub_sat node is not legal, this can produce code that effectively does a lot of shifting (and requiring large constants to be materialised) just to use the overflow flag. It is simpler to just do the saturation manually, using the higher bitwidth addition and a min/max against the saturating bounds. That is what this patch attempts to do. Differential Revision: https://reviews.llvm.org/D68926 llvm-svn: 375211
* [AArch64][SVE] Implement unpack intrinsicsKerry McLaughlin2019-10-181-0/+129
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Implements the following intrinsics: - int_aarch64_sve_sunpkhi - int_aarch64_sve_sunpklo - int_aarch64_sve_uunpkhi - int_aarch64_sve_uunpklo This patch also adds AArch64ISD nodes for UNPK instead of implementing the intrinsics directly, as they are required for a future patch which implements the sign/zero extension of legal vectors. This patch includes tests for the Subdivide2Argument type added by D67549 Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka Reviewed By: greened Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D67550 llvm-svn: 375210
* [gicombiner] Add the run-time rule disable optionDaniel Sanders2019-10-171-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Each generated helper can be configured to generate an option that disables rules in that helper. This can be used to bisect rulesets. The disable bits are stored in a SparseVector as this is very cheap for the common case where nothing is disabled. It gets more expensive the more rules are disabled but you're generally doing that for debug purposes where performance is less of a concern. Depends on D68426 Reviewers: volkan, bogner Reviewed By: volkan Subscribers: hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68438 llvm-svn: 375067
* [GISel][CombinerHelper] Add concat_vectors(build_vector, build_vector) => ↵Quentin Colombet2019-10-171-0/+141
| | | | | | | | | | | | | build_vector Teach the combiner helper how to flatten concat_vectors of build_vectors into a build_vector. Add this combine as part of AArch64 pre-legalizer combiner. Differential Revision: https://reviews.llvm.org/D69071 llvm-svn: 375066
* [AArch64] Fix offset calculationShoaib Meenai2019-10-161-0/+17
| | | | | | | | | | | | | | | | | r374772 changed Offset to be an int64_t but left NewOffset as an int. Scale is unsigned, so in the calculation `Offset - NewOffset * Scale`, `NewOffset * Scale` was promoted to unsigned and was then zero-extended to 64 bits, leading to an incorrect computation which manifested as an out-of-memory when building the Swift standard library for Android aarch64. Promote NewOffset to int64_t to fix this, and promote EmittableOffset as well, since its one user passes it to a function which takes an int64_t anyway. Test case based on a suggestion by Sander de Smalen! Differential Revision: https://reviews.llvm.org/D69018 llvm-svn: 375043
* [Codegen] Adjust saturation test. NFC.David Green2019-10-164-0/+332
| | | | | | Add some extra sat tests and adjust some of the existing tests to use signext where it would naturally be. llvm-svn: 375009
* [DebugInfo] Remove some users of DBG_VALUEs IsIndirect fieldJeremy Morse2019-10-152-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch kills off a significant user of the "IsIndirect" field of DBG_VALUE machine insts. Brought up in in PR41675, IsIndirect is techncally redundant as it can be expressed by the DIExpression of a DBG_VALUE inst, and it isn't helpful to have two ways of expressing things. Rather than setting IsIndirect, have DBG_VALUE creators add an extra deref to the insts DIExpression. There should now be no appearences of IsIndirect=True from isel down to LiveDebugVariables / VirtRegRewriter, which is ensured by an assertion in LDVImpl::handleDebugValue. This means we also get to delete the IsIndirect handling in LiveDebugVariables. Tests can be upgraded by for example swapping the following IsIndirect=True DBG_VALUE: DBG_VALUE $somereg, 0, !123, !DIExpression(DW_OP_foo) With one where the indirection is in the DIExpression, by _appending_ a deref: DBG_VALUE $somereg, $noreg, !123, !DIExpression(DW_OP_foo, DW_OP_deref) Which both mean the same thing. Most of the test changes in this patch are updates of that form; also some changes in how the textual assembly printer handles these insts. Differential Revision: https://reviews.llvm.org/D68945 llvm-svn: 374877
* [update_mir_test_checks] Handle MI flags properlyRoman Tereshin2019-10-144-20/+20
| | | | | | | | | | | | | | previously we would generate literal check lines w/ no reg-exps for vregs as MI flags (nsw, ninf, etc.) won't be recognized as a part of MI. Fixing that. Includes updating the MIR tests that suffered from the problem. Reviewed By: bogner Differential Revision: https://reviews.llvm.org/D68905 llvm-svn: 374829
* Reapply r374743 with a fix for the ocaml bindingJoerg Sonnenberger2019-10-143-17/+2
| | | | | | | | | | | | | | | | | | | Add a pass to lower is.constant and objectsize intrinsics This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 llvm-svn: 374784
* [AArch64] Stackframe accesses to SVE objects.Sander de Smalen2019-10-141-4/+213
| | | | | | | | | | | | | | | | | Materialize accesses to SVE frame objects from SP or FP, whichever is available and beneficial. This patch still assumes the objects are pre-allocated. The automatic layout of SVE objects within the stackframe will be added in a separate patch. Reviewers: greened, cameron.mcinally, efriedma, rengolin, thegameg, rovka Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D67749 llvm-svn: 374772
* Revert "Add a pass to lower is.constant and objectsize intrinsics"Dmitri Gribenko2019-10-143-2/+17
| | | | | | | This reverts commit r374743. It broke the build with Ocaml enabled: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218 llvm-svn: 374768
* Add a pass to lower is.constant and objectsize intrinsicsJoerg Sonnenberger2019-10-133-17/+2
| | | | | | | | | | | | | | | | | This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 llvm-svn: 374743
* Revert 374373: [Codegen] Alter the default promotion for saturating adds and ↵David Green2019-10-118-163/+272
| | | | | | | | | subs This commit is not extending the promoted integers as it should. Reverting whilst I look into the details. llvm-svn: 374592
* [GISel][CallLowering] Enable vector support in argument loweringQuentin Colombet2019-10-111-0/+22
| | | | | | | | | | The exciting code is actually already enough to handle the splitting of vector arguments but we were lacking a test case. This commit adds a test case for vector argument lowering involving splitting and enable the related support in call lowering. llvm-svn: 374589
* [AArch64] add tests for (v)select-of-constants; NFCSanjay Patel2019-10-112-0/+820
| | | | | | These are copied from existing test files in x86/PPC. llvm-svn: 374568
* [AArch64][SVE] Implement sdot and udot (lane) intrinsicsKerry McLaughlin2019-10-111-0/+93
| | | | | | | | | | | | | | | | | | | | | Summary: Implements the following arithmetic intrinsics: - int_aarch64_sve_sdot - int_aarch64_sve_sdot_lane - int_aarch64_sve_udot - int_aarch64_sve_udot_lane This patch includes tests for the Subdivide4Argument type added by D67549 Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D67551 llvm-svn: 374566
* [Codegen] Alter the default promotion for saturating adds and subsDavid Green2019-10-108-272/+163
| | | | | | | | | | | | | | | | | The default promotion for the add_sat/sub_sat nodes currently does: 1. ANY_EXTEND iN to iM 2. SHL by M-N 3. [US][ADD|SUB]SAT 4. L/ASHR by M-N If the promoted add_sat or sub_sat node is not legal, this can produce code that effectively does a lot of shifting (and requiring large constants to be materialised) just to use the overflow flag. It is simpler to just do the saturation manually, using the higher bitwidth addition and a min/max against the saturating bounds. That is what this patch attempts to do. Differential Revision: https://reviews.llvm.org/D68643 llvm-svn: 374373
* [AArch64][x86] add tests for (v)select bit magic; NFCSanjay Patel2019-10-101-0/+95
| | | | llvm-svn: 374334
* [AArch64] Ensure no tagged memory is left in the unallocated portion of theMomchil Velikov2019-10-093-0/+334
| | | | | | | | | | | | | | | | | stack This patch makes sure that if we tag some memory, we untag that memory before the function returns/throws via any exit, reachable from the tag operation. For that we place the untag operation either at: a) the lifetime end call for the alloca, if that call post-dominates the lifetime start call (where the tag operation is placed), or it (the lifetime end call) dominates all reachable exits, otherwise b) at the reachable exits Differential Revision: https://reviews.llvm.org/D68469 llvm-svn: 374182
* Add and adjust saturating tests. NFCDavid Green2019-10-094-14/+134
| | | | | | | This adds some extra testing to the existing [su][add/sub]_sat X86 and AArch64 tests and adds equivalent tests for ARM. llvm-svn: 374169
* (Re)generate various tests. NFCAmaury Sechet2019-10-081-12/+218
| | | | llvm-svn: 374074
* fix fmls fp16Sebastian Pop2019-10-081-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tim Northover remarked that the added patterns for fmls fp16 produce wrong code in case the fsub instruction has a multiplication as its first operand, i.e., all the patterns FMLSv*_OP1: > define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) { > ; CHECK-LABEL: test_FMLSv8f16_OP1: > ; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h > entry: > > %mul = fmul fast <8 x half> %c, %b > %sub = fsub fast <8 x half> %mul, %a > ret <8 x half> %sub > } > > This doesn't look right to me. The exact instruction produced is "fmls > v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the > IR is calculating "v2*v1-v0". The equivalent <4 x float> code also > doesn't emit an fmls. This patch generates an fmla and negates the value of the operand2 of the fsub. Inspecting the pattern match, I found that there was another mistake in the opcode to be selected: matching FMULv4*16 should generate FMLSv4*16 and not FMLSv2*32. Tested on aarch64-linux with make check-all. Differential Revision: https://reviews.llvm.org/D67990 llvm-svn: 374044
* [NFC][CGP] Tests for making ICMP_EQ use CR result of ICMP_S(L|G)T dominatorsYi-Hong Lyu2019-10-071-0/+547
| | | | llvm-svn: 373876
* [SelectionDAG] Add tests for LKK algorithmDavid Bolvansky2019-10-054-0/+843
| | | | | | | | | | Added some tests testing urem and srem operations with a constant divisor. Patch by TG908 (Tim Gymnich) Differential Revision: https://reviews.llvm.org/D68421 llvm-svn: 373830
OpenPOWER on IntegriCloud