summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [PowerPC] Remove CRBits Copy Of Unset/set CBitStefan Pintilie2019-05-242-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | For the situation, where we generate the following code: crxor 8, 8, 8 < Some instructions> .LBB0_1: < Some instructions> cror 1, 8, 8 cror (COPY of CRbit) depends on the result of the crxor instruction. CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on any previous instruction. This patch will optimize it to: < Some instructions> .LBB0_1: < Some instructions> cror 1, 1, 1 Patch By: Victor Huang (NeHuang) Differential Revision: https://reviews.llvm.org/D62044 llvm-svn: 361632
* [AArch64][SVE2] Asm: support SVE2 String Processing GroupCullen Rhodes2019-05-242-0/+35
| | | | | | | | | | | | | | Summary: Patch adds support for the SVE2 character match instructions MATCH and NMATCH. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62206 llvm-svn: 361627
* [AArch64][SVE2] Asm: support SVE2 Narrowing GroupCullen Rhodes2019-05-242-0/+118
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 bitwise shift right narrow: * SQSHRUNB, SQSHRUNT, SQRSHRUNB, SQRSHRUNT, SHRNB, SHRNT, RSHRNB, RSHRNT, SQSHRNB, SQSHRNT, SQRSHRNB, SQRSHRNT, UQSHRNB, UQSHRNT, UQRSHRNB, UQRSHRNT SVE2 integer add/subtract narrow high part: * ADDHNB, ADDHNT, RADDHNB, RADDHNT, SUBHNB, SUBHNT, RSUBHNB, RSUBHNT SVE2 saturating extract narrow: * SQXTNB, SQXTNT, UQXTNB, UQXTNT, SQXTUNB, SQXTUNT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62205 llvm-svn: 361624
* [AArch64][SVE2] Asm: support SVE2 Accumulate GroupCullen Rhodes2019-05-242-0/+186
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 bitwise shift and insert: * SRI, SLI SVE2 bitwise shift right and accumulate: * SSRA, USRA, SRSRA, URSRA SVE2 complex integer add: * CADD, SQCADD SVE2 integer absolute difference and accumulate: * SABA, UABA SVE2 integer absolute difference and accumulate long: * SABALB, SABALT, UABALB, UABALT SVE2 integer add/subtract long with carry: * ADCLB, ADCLT, SBCLB, SBCLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62204 llvm-svn: 361622
* [SelectionDAG] computeKnownBits - support constant pool values from targetSimon Pilgrim2019-05-242-4/+14
| | | | | | | | | | | | | | | | This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620
* [AArch64][SVE2] Asm: add PMULLB/PMULLT instructionsCullen Rhodes2019-05-242-0/+17
| | | | | | | | | | | | | | | | | Summary: This patch adds support for the polynomial multiplication instructions PMULLB/PMULLT. The 64-bit source and 128-bit destination element variants are enabled with crypto extensions (+sve2-aes), similar to the NEON PMULL2 instruction. All other variants are enabled with +sve2. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62145 llvm-svn: 361619
* [AArch64][SVE2] Asm: add integer add/sub long/wide instructionsCullen Rhodes2019-05-242-0/+30
| | | | | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 integer add/subtract long: * SADDLB, SADDLT, UADDLB, UADDLT, SSUBLB, SSUBLT, USUBLB, USUBLT, SABDLB, SABDLT, UABDLB, UABDLT SVE2 integer add/subtract wide: * SADDWB, SADDWT, UADDWB, UADDWT, SSUBWB, SSUBWT, USUBWB, USUBWT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62142 llvm-svn: 361615
* [AArch64][SVE2] Asm: add various bitwise shift instructionsCullen Rhodes2019-05-242-11/+32
| | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch adds support for the SVE2 saturating/rounding bitwise shift left (predicated) group of instructions: * SRSHL, URSHL, SRSHLR, URSHLR, SQSHL, UQSHL, SQRSHL, UQRSHL, SQSHLR, UQSHLR, SQRSHLR, UQRSHLR Immediate forms of the SQSHL and UQSHL instructions are also added to the existing SVE bitwise shift by immediate (predicated) group, as well as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in this group are encoded similarly and are implemented using the same TableGen class with a minimal change (1 bit in encoding). The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62140 llvm-svn: 361612
* [AArch64][SVE2] Asm: add saturating add/sub instructionsCullen Rhodes2019-05-241-0/+10
| | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: * SQADD, UQADD, SUQADD, USQADD * SQSUB, UQSUB, SQSUBR, UQSUBR The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62130 llvm-svn: 361611
* [AArch64][SVE2] Asm: fix overlapping bitCullen Rhodes2019-05-241-1/+1
| | | | | | | | | | | | | | | Summary: Bit 20 in sve2_int_arith_pred TableGen class was overlapping. The encodings are not affected as bit 20 is defined by the opc bits and this was overwriting the earlier error of setting bit 20 to 0. Raised by Momchil: https://reviews.llvm.org/D62130 Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62292 llvm-svn: 361609
* GlobalISel: support swifterror attribute on AArch64.Tim Northover2019-05-242-4/+26
| | | | | | | | swifterror marks an argument as a register pretending to be a pointer, so we need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the infrastructure can be reused from the DAG world. llvm-svn: 361608
* [mips] Always check that `shift and add` optimization is efficient.Simon Atanasyan2019-05-241-26/+31
| | | | | | | | | | | | | | The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function to check that breaking down constant multiplications into a series of shifts, adds, and subs is efficient. Unfortunately, this function does not check maximum number of steps on all paths of the algorithm. This patch fixes this bug. Fix for PR41929. Differential Revision: https://reviews.llvm.org/D62166 llvm-svn: 361606
* [ARM] ARMExpandPseudoInsts: add debug messagesSjoerd Meijer2019-05-241-2/+16
| | | | | | | | | | | This pass wasn't printing any messages at all, which I find really inconvenient while debugging/tracing things. It now dumps the before and after of expanded instructions. It doesn't do this yet for all instructions, but this is a good start I guess. Differential Revision: https://reviews.llvm.org/D62297 llvm-svn: 361604
* [Power9] Add a specific heuristic to schedule the addi before the loadQingShan Zhang2019-05-242-0/+58
| | | | | | | | | | When we are scheduling the load and addi, if all other heuristic didn't take effect, we will try to schedule the addi before the load, to hide the latency, and avoid the true dependency added by RA. And this only take effects for Power9. Differential Revision: https://reviews.llvm.org/D61930 llvm-svn: 361600
* [AArch64] Preserve X8 for thunks ending in variadic musttail callsReid Kleckner2019-05-241-0/+6
| | | | | | | | | | | | | | | | | | | Summary: On Windows, X8 may be used to pass in the address of an aggregate that is returned indirectly. Therefore, it should be forwarded to variadic musttail calls and preserved in thunks. Fixes PR41997 Reviewers: mgrang, efriedma Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62344 llvm-svn: 361585
* [AArch64] Add nvcast patterns for v2f32 -> v1f64Serge Pavlov2019-05-241-0/+1
| | | | | | | | | | | | | | Summary: Constant stores of f32 values can create such NvCast nodes. Reviewers: t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62285 llvm-svn: 361584
* [WebAssembly] Expand more SIMD float opsThomas Lively2019-05-241-1/+2
| | | | | | | | | | | | | | Summary: These were previously causing ISel failures. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62354 llvm-svn: 361577
* AMDGPU: Correct maximum possible private allocation sizeMatt Arsenault2019-05-234-28/+14
| | | | | | | | | | | | | | | | We were assuming a much larger possible per-wave visible stack allocation than is possible: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/faa3ae51388517353afcdaf9c16621f879ef0a59/src/core/runtime/amd_gpu_agent.cpp#L70 Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize. Remove the corresponding subtarget feature and option that made this configurable. llvm-svn: 361541
* Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without ↵Robert Lougher2019-05-231-4/+10
| | | | | | | | | | | | | | | | | | | | debug info" Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being trigerred, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 361527
* [WebAssembly] Implement ReplaceNodeResults to fix a SIMD crashThomas Lively2019-05-232-0/+18
| | | | | | | | | | | | Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61037 llvm-svn: 361526
* AMDGPU/GlobalISel: Legality for integer min/maxMatt Arsenault2019-05-232-0/+30
| | | | llvm-svn: 361519
* [WebAssembly] Add multivalue and tail-call target featuresThomas Lively2019-05-233-7/+22
| | | | | | | | | | | | | | | | Summary: These features will both be implemented soon, so I thought I would save time by adding the boilerplate for both of them at the same time. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62047 llvm-svn: 361516
* [RISCV] Support assembling TLS LA pseudo instructionsLewis Revill2019-05-232-0/+53
| | | | | | | | | This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in the initial-exec and global-dynamic TLS models respectively when addressing a global. The pseudo instructions are expanded in the assembly parser. llvm-svn: 361499
* [ARM][CGP] Clear SafeWrap before each searchSam Parker2019-05-231-0/+1
| | | | | | | | | | | The previous patch added a member set to store instructions that we could allow to wrap. But this wasn't cleared between searches meaning that they could get promoted, incorrectly, during the promotion of a separate valid chain. Differential Revision: https://reviews.llvm.org/D62254 llvm-svn: 361462
* [WebAssembly] Implement __builtin_return_address for emscriptenThomas Lively2019-05-233-3/+33
| | | | | | | | | | | | | | | | | | | | | Summary: In this patch, `ISD::RETURNADDR` is lowered on the emscripten target to the new Emscripten runtime function `emscripten_return_address`, which implements the functionality. Patch by Guanzhong Chen Reviewers: tlively, aheejin Reviewed By: tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62210 llvm-svn: 361454
* [X86] Support -fno-plt __tls_get_addr callsFangrui Song2019-05-231-51/+72
| | | | | | | | | | | | | | | | | In general dynamic/local dynamic TLS models, with -fno-plt, * x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT` Note, on x86, if we can get rid of %ebx as the PIC register, it may be better to use a register not preserved across function calls. * x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT` Reorganize the code by separating 32-bit and 64-bit. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62106 llvm-svn: 361453
* [X86] Explcitly disable VEXTRACT instruction matching for an immediate of 0. ↵Craig Topper2019-05-223-143/+8
| | | | | | | | | | | | | | | | | | | | | | Remove a bunch of isel patterns that become unnecessary. We effectively had a second set of isel patterns that tried to use a regular store instruction and an extract_subreg instruction. Or a masked move and an extract_subreg. These patterns were intended to override the matching of VEXTRACT instructions by taking advantage of the priority of the explicit immediate 0 for the index. This patch instaed just disables the immediate 0 matchin the VEXTRACT patterns. This each of the component pieces of the larger patterns will match by themselves. This found a bug of sorts were we didn't use 128-bit store for 512->128 extract on KNL. Its unclear what the right thing here should be. Using the vextract avoids constraining the register allocator to use xmm0-15. But it always results in a longer encoding if the register allocator ends up choosing xmm0-15 anyway. llvm-svn: 361431
* Reverted r361134 because of a failing test left unattended for a long time.Galina Kistanova2019-05-222-1/+2
| | | | | | | | http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio Failing Tests (1): LLVM :: CodeGen/AMDGPU/regbank-reassign.mir llvm-svn: 361430
* [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics ↵Craig Topper2019-05-222-72/+0
| | | | | | | | | | | | | | | | | | | | | | | into llvm.ceil/floor. Remove some isel patterns that existed because that was happening. We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond to the libm functions. For the libm functions we need to disable the precision exception so the llvm.floor/ceil functions should always map to encodings 0x9 and 0xA. We had a mix of isel patterns where some used 0x9 and 0xA and others used 0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA. Since we have no way in isel of knowing where the llvm.ceil/floor came from, we can't map X86 specific intrinsics with encodings 1 or 2 to it. We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like to see a use case and optimization advantage first. I've left the backend test cases to show the blend we now emit without the extra isel patterns. But I've removed the InstCombine tests completely. llvm-svn: 361425
* [DebugInfo][AArch64] Recognise target specific instruction as mov instrAlexey Lapshin2019-05-222-0/+32
| | | | | | | | | | | | | | This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714. Specifically, Simple Register Coalescing creates following conversion : undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24; It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis is not able to create debug location record for that instruction. So the problem is in that debug info for argc variable is incorrect. The fix is to write custom isCopyInstrImpl() which would recognize the ORRWrs instr. llvm-svn: 361417
* AMDGPU: Move disassembler support check to constructorMatt Arsenault2019-05-221-5/+6
| | | | | | Don't check for unsupported targets for every instruction. llvm-svn: 361406
* MC: Allow getMaxInstLength to depend on the subtargetMatt Arsenault2019-05-227-13/+42
| | | | | | | | | | | | Keep it optional in cases this is ever needed in some global context. Currently it's only used for getting an upper bound inline asm code size. For AMDGPU, gfx10 increases the maximum instruction size to 20-bytes. This avoids penalizing older subtargets when estimating code size, and making some annoying branch relaxation test adjustments. llvm-svn: 361405
* [AMDGPU][MC] Corrected parsing of op_sel* and neg_* modifiersDmitry Preobrazhensky2019-05-221-34/+32
| | | | | | | | | | See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61012 llvm-svn: 361386
* [Hexagon] assert getRegisterBitWidth returns non-zero value. NFCI.Simon Pilgrim2019-05-221-2/+3
| | | | | | Fixes scan-build warning. llvm-svn: 361375
* [TargetMachine] error message unsupported code modelSjoerd Meijer2019-05-224-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the tiny code model is requested for a target machine that does not support this, we get an error message (which is nice) but also this diagnostic and request to submit a bug report: fatal error: error in backend: Target does not support the tiny CodeModel [Inferior 2 (process 31509) exited with code 0106] clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation) (gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74) Target: arm-arm-none-eabi Thread model: posix clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. clang-9: note: diagnostic msg: ******************** PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT: Preprocessed source(s) and associated run script(s) are located at: clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh clang-9: note: diagnostic msg: But this is not a bug, this is a feature. :-) Not only is this not a bug, this is also pretty confusing. This patch causes just to print the fatal error and not the diagnostic: fatal error: error in backend: Target does not support the tiny CodeModel Differential Revision: https://reviews.llvm.org/D62236 llvm-svn: 361370
* [PPC64] Parse -elfv1 -elfv2 when specified on target tripleFangrui Song2019-05-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2]. The following results are expected: ELFv1 when using: -target powerpc64-unknown-freebsd12.0 -target powerpc64-unknown-freebsd12.0 -mabi=elfv1 -target powerpc64-unknown-freebsd12.0-elfv1 ELFv2 when using: -target powerpc64-unknown-freebsd12.0 -mabi=elfv2 -target powerpc64-unknown-freebsd12.0-elfv2 [1] https://wiki.freebsd.org/powerpc/llvm-elfv2 [2] https://clang.llvm.org/docs/CrossCompilation.html Patch by Alfredo Dal'Ava Júnior! Differential Revision: https://reviews.llvm.org/D61950 llvm-svn: 361355
* [AArch64] Subtarget crypto extension defaultsSjoerd Meijer2019-05-221-6/+6
| | | | | | | | | The Armv8.2-A crypto extensions all defaulted to true, but should default to false, like all the other extensions. Differential Revision: https://reviews.llvm.org/D62180 llvm-svn: 361354
* [X86] Don't compare i128 through vector if construction not cheap (PR41971)Nikita Popov2019-05-221-3/+8
| | | | | | | | | | | | | | Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the combineVectorSizedSetCCEquality() transform more conservative by checking that the bitcast to the vector type will be cheap/free for both operands. I'm considering it cheap if it's a constant, a load or already a vector. I've dropped the explicit check for f128 because it should fall out naturally (in the cases where it'd be detrimental). Differential Revision: https://reviews.llvm.org/D62220 llvm-svn: 361352
* [PowerPC] use meaningful name for displacement form aligned with x-form - NFCChen Zheng2019-05-224-81/+81
| | | | llvm-svn: 361347
* [PowerPC] [ISEL] select x-form instruction for unaligned offsetChen Zheng2019-05-226-75/+125
| | | | | | Differential Revision: https://reviews.llvm.org/D62173 llvm-svn: 361346
* [X86] [CET] Deal with return-twice function such as vfork, setjmp whenPengfei Wang2019-05-221-12/+30
| | | | | | | | | | | | | | CET-IBT enabled Return-twice functions will indirectly jump after the caller's position. So when CET-IBT is enable, we should make sure these is endbr* instructions follow these Return-twice function caller. Like GCC does. Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D61881 llvm-svn: 361342
* AMDGPU: Assume calls read execMatt Arsenault2019-05-211-0/+4
| | | | llvm-svn: 361333
* AMDGPU: Assume call pseudos are convergentMatt Arsenault2019-05-211-0/+6
| | | | | | | There should probably be nonconvergent versions, but my guess is it doesn't matter in practice. llvm-svn: 361331
* AMDGPU: Fix not marking new gfx10 SGPRs as CSRsMatt Arsenault2019-05-211-3/+3
| | | | llvm-svn: 361330
* [WebAssembly] Add the signature for the new llround builtin functionDan Gohman2019-05-211-0/+22
| | | | | | | | | | | | | r360889 added new llround builtin functions. This patch adds their signatures for the WebAssembly backend. It also adds wasm32 support to utils/update_llc_test_checks.py, since that's the script other targets are using for their testcases for this feature. Differential Revision: https://reviews.llvm.org/D62207 llvm-svn: 361327
* [X86] Remove an unneeded ZERO_EXTEND creation from LowerINTRINSIC_W_CHAIN. NFCCraig Topper2019-05-211-2/+1
| | | | | | We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again. llvm-svn: 361288
* [X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP supportSimon Pilgrim2019-05-211-0/+9
| | | | | | Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692 llvm-svn: 361270
* [PPC64] Update LocalEntry from assigned symbolsFangrui Song2019-05-211-6/+24
| | | | | | | | | | | | | | | | | On PowerPC64 ELFv2 ABI, functions may have 2 entry points: global and local. The local entry point location of a function is stored in the st_other field of the symbol, as an offset relative to the global entry point. In order to make symbol assignments (e.g. .equ/.set) work properly with this, PPCTargetELFStreamer already copies the local entry bits from the source symbol to the destination one, on emitAssignment(). The problem is that this copy is performed only at the assignment location, where the source symbol may not yet have processed the .localentry directive, that sets the local entry. This may cause the destination symbol to end up with wrong local entry information. Other symbol info is not affected by this because, in this case, the destination symbol value is actually a symbol reference. This change keeps track of these assignments, and update all needed st_other fields when finish() is called. Patch by Leandro Lupori! Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D56586 llvm-svn: 361237
* [AArch64] Skip mask checks for masks with an odd number of elements.Florian Hahn2019-05-211-0/+6
| | | | | | | | | | | | | | | | | Some checks in isShuffleMaskLegal expect an even number of elements, e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access invalid elements and crash. This patch adds checks to the impacted functions. Fixes PR41951 Reviewers: t.p.northover, dmgreen, samparker Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D60690 llvm-svn: 361235
* [AArch64][SVE2] Asm: add integer unary instructions (predicated)Cullen Rhodes2019-05-212-0/+42
| | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: * URECPE, URSQRTE, SQABS, SQNEG The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62129 llvm-svn: 361230
OpenPOWER on IntegriCloud