path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [AArch64] Preserve X8 for thunks ending in variadic musttail calls | Reid Kleckner | 2019-05-24 | 1 | -0/+6
  Summary: On Windows, X8 may be used to pass in the address of an aggregate that is returned indirectly. Therefore, it should be forwarded to variadic musttail calls and preserved in thunks.
  Fixes PR41997
  Reviewers: mgrang, efriedma
  Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62344
  llvm-svn: 361585
* [AArch64] Add nvcast patterns for v2f32 -> v1f64 | Serge Pavlov | 2019-05-24 | 1 | -0/+1
  Summary: Constant stores of f32 values can create such NvCast nodes.
  Reviewers: t.p.northover
  Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62285
  llvm-svn: 361584
* [WebAssembly] Expand more SIMD float ops | Thomas Lively | 2019-05-24 | 1 | -1/+2
  Summary: These were previously causing ISel failures.
  Reviewers: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62354
  llvm-svn: 361577
* AMDGPU: Correct maximum possible private allocation size | Matt Arsenault | 2019-05-23 | 4 | -28/+14
  We were assuming a much larger possible per-wave visible stack allocation than is possible:
  https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/faa3ae51388517353afcdaf9c16621f879ef0a59/src/core/runtime/amd_gpu_agent.cpp#L70
  Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize.
  Remove the corresponding subtarget feature and option that made this configurable.
  llvm-svn: 361541
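The bound stated in this message can be checked with a little arithmetic. The following is only a sketch: the value used for `MAX_WAVE_SCRATCH` (8387584 bytes) and the wave size of 64 are assumptions recalled from the linked ROCR-Runtime source rather than stated in the commit message, and the helper names are hypothetical.

```python
# Assumed values; confirm against the linked amd_gpu_agent.cpp before relying on them.
MAX_WAVE_SCRATCH = 8387584  # assumed per-wave scratch limit, in bytes
WAVE_SIZE = 64              # assumed number of lanes per wave


def max_frame_index(max_wave_scratch: int, wave_size: int) -> int:
    """Largest per-lane offset implied by the per-wave scratch limit."""
    return max_wave_scratch // wave_size


def known_zero_high_bits(max_value: int, width: int = 32) -> int:
    """Number of high bits that are zero for every value in [0, max_value]."""
    return width - max_value.bit_length()
```

Under those assumed values the maximum frame index needs 17 bits, which leaves the high 15 bits of a 32-bit value known to be zero, consistent with the message.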
* Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without debug info" | Robert Lougher | 2019-05-23 | 1 | -4/+10
  Fixes https://bugs.llvm.org/show_bug.cgi?id=40969
  The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being triggered, and in the latter results in inconsistent codegen.
  This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds.
  Patch by Chris Dawson.
  Differential Revision: https://reviews.llvm.org/D61680
  llvm-svn: 361527
* [WebAssembly] Implement ReplaceNodeResults to fix a SIMD crash | Thomas Lively | 2019-05-23 | 2 | -0/+18
  Reviewers: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61037
  llvm-svn: 361526
* AMDGPU/GlobalISel: Legality for integer min/max | Matt Arsenault | 2019-05-23 | 2 | -0/+30
  llvm-svn: 361519
* [WebAssembly] Add multivalue and tail-call target features | Thomas Lively | 2019-05-23 | 3 | -7/+22
  Summary: These features will both be implemented soon, so I thought I would save time by adding the boilerplate for both of them at the same time.
  Reviewers: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D62047
  llvm-svn: 361516
* [RISCV] Support assembling TLS LA pseudo instructions | Lewis Revill | 2019-05-23 | 2 | -0/+53
  This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in the initial-exec and global-dynamic TLS models respectively when addressing a global. The pseudo instructions are expanded in the assembly parser.
  llvm-svn: 361499
* [ARM][CGP] Clear SafeWrap before each search | Sam Parker | 2019-05-23 | 1 | -0/+1
  The previous patch added a member set to store instructions that we could allow to wrap. But this wasn't cleared between searches, meaning that they could get promoted, incorrectly, during the promotion of a separate valid chain.
  Differential Revision: https://reviews.llvm.org/D62254
  llvm-svn: 361462
* [WebAssembly] Implement __builtin_return_address for emscripten | Thomas Lively | 2019-05-23 | 3 | -3/+33
  Summary: In this patch, `ISD::RETURNADDR` is lowered on the emscripten target to the new Emscripten runtime function `emscripten_return_address`, which implements the functionality.
  Patch by Guanzhong Chen
  Reviewers: tlively, aheejin
  Reviewed By: tlively
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62210
  llvm-svn: 361454
* [X86] Support -fno-plt __tls_get_addr calls | Fangrui Song | 2019-05-23 | 1 | -51/+72
  In general dynamic/local dynamic TLS models, with -fno-plt,
  * x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT`
    Note, on x86, if we can get rid of %ebx as the PIC register, it may be better to use a register not preserved across function calls.
  * x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT`
  Reorganize the code by separating 32-bit and 64-bit.
  Reviewed By: rnk
  Differential Revision: https://reviews.llvm.org/D62106
  llvm-svn: 361453
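The choice between the two call forms can be summarized as a small lookup. This is an illustrative sketch, not the backend code: the function name and its boolean parameters are hypothetical, and only the four instruction strings come from the commit message.

```python
def tls_get_addr_call(is_64bit: bool, use_plt: bool) -> str:
    """Return the call text quoted in the commit message for each case."""
    if is_64bit:
        # x86_64: RIP-relative GOT load when the PLT is disabled.
        if use_plt:
            return "callq __tls_get_addr@PLT"
        return "callq *__tls_get_addr@GOTPCREL(%rip)"
    # 32-bit x86: the GOT is addressed through the PIC register %ebx.
    if use_plt:
        return "calll ___tls_get_addr@PLT"
    return "calll *___tls_get_addr@GOT(%ebx)"
```

Keeping the 32-bit and 64-bit branches separate mirrors the reorganization the message describes.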
* [X86] Explicitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary. | Craig Topper | 2019-05-22 | 3 | -143/+8
  We effectively had a second set of isel patterns that tried to use a regular store instruction and an extract_subreg instruction. Or a masked move and an extract_subreg. These patterns were intended to override the matching of VEXTRACT instructions by taking advantage of the priority of the explicit immediate 0 for the index.
  This patch instead just disables the immediate 0 matching in the VEXTRACT patterns. This way each of the component pieces of the larger patterns will match by themselves.
  This found a bug of sorts where we didn't use a 128-bit store for 512->128 extract on KNL.
  It's unclear what the right thing here should be. Using the vextract avoids constraining the register allocator to use xmm0-15. But it always results in a longer encoding if the register allocator ends up choosing xmm0-15 anyway.
  llvm-svn: 361431
* Reverted r361134 because of a failing test left unattended for a long time. | Galina Kistanova | 2019-05-22 | 2 | -1/+2
  http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio
  Failing Tests (1):
    LLVM :: CodeGen/AMDGPU/regbank-reassign.mir
  llvm-svn: 361430
* [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening. | Craig Topper | 2019-05-22 | 2 | -72/+0
  We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond to the libm functions. For the libm functions we need to disable the precision exception, so the llvm.floor/ceil functions should always map to encodings 0x9 and 0xA.
  We had a mix of isel patterns where some used 0x9 and 0xA and others used 0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
  Since we have no way in isel of knowing where the llvm.ceil/floor came from, we can't map X86 specific intrinsics with encodings 1 or 2 to it. We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like to see a use case and optimization advantage first.
  I've left the backend test cases to show the blend we now emit without the extra isel patterns. But I've removed the InstCombine tests completely.
  llvm-svn: 361425
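The difference between encodings 0x1/0x2 and 0x9/0xA comes down to one bit of the rounding immediate. The sketch below decodes the immediate under the usual SSE4.1 layout (bits 1:0 select the rounding mode, bit 2 defers to MXCSR, bit 3 suppresses the precision exception); the function names are hypothetical and the layout is stated from general knowledge of the instruction, not from the commit message.

```python
ROUNDING_MODES = {0: "nearest", 1: "floor", 2: "ceil", 3: "trunc"}


def decode_round_imm(imm: int) -> dict:
    """Decode the low four bits of a roundss/sd/ps/pd immediate."""
    return {
        # Bit 2 set means "use the rounding mode currently in MXCSR".
        "mode": "mxcsr" if imm & 0x4 else ROUNDING_MODES[imm & 0x3],
        # Bit 3 set means the precision (inexact) exception is suppressed.
        "suppress_precision_exception": bool(imm & 0x8),
    }


def matches_libm(imm: int, name: str) -> bool:
    """True if the immediate rounds like libm `name` and raises no inexact."""
    decoded = decode_round_imm(imm)
    return decoded["mode"] == name and decoded["suppress_precision_exception"]
```

With this decoding, 0x1 and 0x9 both round toward negative infinity, but only 0x9 suppresses the precision exception, which is why only 0x9 and 0xA are treated as matching floor and ceil.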
* [DebugInfo][AArch64] Recognise target specific instruction as mov instr | Alexey Lapshin | 2019-05-22 | 2 | -0/+32
  This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714.
  Specifically, Simple Register Coalescing creates the following conversion:
    undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24;
  It copies a 32-bit value from gpr32 into gpr64, but Live DEBUG_VALUE analysis is not able to create a debug location record for that instruction. As a result, the debug info for the argc variable is incorrect. The fix is to write a custom isCopyInstrImpl() which recognizes the ORRWrs instruction.
  llvm-svn: 361417
* AMDGPU: Move disassembler support check to constructor | Matt Arsenault | 2019-05-22 | 1 | -5/+6
  Don't check for unsupported targets for every instruction.
  llvm-svn: 361406
* MC: Allow getMaxInstLength to depend on the subtarget | Matt Arsenault | 2019-05-22 | 7 | -13/+42
  Keep it optional in case this is ever needed in some global context. Currently it's only used for getting an upper bound on inline asm code size.
  For AMDGPU, gfx10 increases the maximum instruction size to 20 bytes. This avoids penalizing older subtargets when estimating code size, and making some annoying branch relaxation test adjustments.
  llvm-svn: 361405
* [AMDGPU][MC] Corrected parsing of op_sel* and neg_* modifiers | Dmitry Preobrazhensky | 2019-05-22 | 1 | -34/+32
  See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D61012
  llvm-svn: 361386
* [Hexagon] assert getRegisterBitWidth returns non-zero value. NFCI. | Simon Pilgrim | 2019-05-22 | 1 | -2/+3
  Fixes scan-build warning.
  llvm-svn: 361375
* [TargetMachine] error message unsupported code model | Sjoerd Meijer | 2019-05-22 | 4 | -7/+7
  When the tiny code model is requested for a target machine that does not support this, we get an error message (which is nice) but also this diagnostic and request to submit a bug report:
    fatal error: error in backend: Target does not support the tiny CodeModel
    [Inferior 2 (process 31509) exited with code 0106]
    clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
    (gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74)
    Target: arm-arm-none-eabi
    Thread model: posix
    clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
    clang-9: note: diagnostic msg:
    ********************
    PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
    Preprocessed source(s) and associated run script(s) are located at:
    clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c
    clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh
    clang-9: note: diagnostic msg:
  But this is not a bug, this is a feature. :-) Not only is this not a bug, this is also pretty confusing. With this patch, only the fatal error is printed, not the diagnostic:
    fatal error: error in backend: Target does not support the tiny CodeModel
  Differential Revision: https://reviews.llvm.org/D62236
  llvm-svn: 361370
* [PPC64] Parse -elfv1 -elfv2 when specified on target triple | Fangrui Song | 2019-05-22 | 1 | -0/+2
  Summary: For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1].
  This patch adds an alternative way to specify ELFv2 ABI on target triple [2].
  The following results are expected:
  ELFv1 when using:
    -target powerpc64-unknown-freebsd12.0
    -target powerpc64-unknown-freebsd12.0 -mabi=elfv1
    -target powerpc64-unknown-freebsd12.0-elfv1
  ELFv2 when using:
    -target powerpc64-unknown-freebsd12.0 -mabi=elfv2
    -target powerpc64-unknown-freebsd12.0-elfv2
  [1] https://wiki.freebsd.org/powerpc/llvm-elfv2
  [2] https://clang.llvm.org/docs/CrossCompilation.html
  Patch by Alfredo Dal'Ava Júnior!
  Differential Revision: https://reviews.llvm.org/D61950
  llvm-svn: 361355
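The expected results listed in this message can be expressed as a small selection function. This is a sketch of the listed cases only, not the actual triple parsing: the function name is hypothetical, and letting an explicit `-mabi` value take precedence over the triple suffix is an assumption, since the message does not cover conflicting inputs.

```python
from typing import Optional


def ppc64_abi(triple: str, mabi: Optional[str] = None) -> str:
    """Pick the ABI for a big-endian powerpc64 target."""
    if mabi is not None:
        # Assumption: an explicit -mabi= value wins over the triple suffix.
        return mabi
    for abi in ("elfv1", "elfv2"):
        # The ABI may be given as the last component of the triple.
        if triple.endswith("-" + abi):
            return abi
    # Default for big-endian powerpc64, per the commit message.
    return "elfv1"
```

For example, `ppc64_abi("powerpc64-unknown-freebsd12.0-elfv2")` selects ELFv2 without any `-mabi` option.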
* [AArch64] Subtarget crypto extension defaults | Sjoerd Meijer | 2019-05-22 | 1 | -6/+6
  The Armv8.2-A crypto extensions all defaulted to true, but should default to false, like all the other extensions.
  Differential Revision: https://reviews.llvm.org/D62180
  llvm-svn: 361354
* [X86] Don't compare i128 through vector if construction not cheap (PR41971) | Nikita Popov | 2019-05-22 | 1 | -3/+8
  Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the combineVectorSizedSetCCEquality() transform more conservative by checking that the bitcast to the vector type will be cheap/free for both operands. I'm considering it cheap if it's a constant, a load or already a vector. I've dropped the explicit check for f128 because it should fall out naturally (in the cases where it'd be detrimental).
  Differential Revision: https://reviews.llvm.org/D62220
  llvm-svn: 361352
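The cheapness rule in this message is a simple predicate applied to both operands. The sketch below models it with a toy operand type; the `Operand` class and both function names are hypothetical stand-ins for the SelectionDAG nodes the real transform inspects.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    kind: str                # e.g. "constant", "load", "add"
    is_vector: bool = False  # True if the value already has a vector type


def is_cheap_to_bitcast(op: Operand) -> bool:
    """Cheap if it is a constant, a load, or already a vector."""
    return op.kind in ("constant", "load") or op.is_vector


def should_compare_as_vector(lhs: Operand, rhs: Operand) -> bool:
    """Only use the vector compare when both bitcasts are cheap."""
    return is_cheap_to_bitcast(lhs) and is_cheap_to_bitcast(rhs)
```

Requiring both operands to pass is what makes the transform more conservative: a single operand computed in scalar registers is enough to keep the comparison scalar.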
* [PowerPC] use meaningful name for displacement form aligned with x-form - NFC | Chen Zheng | 2019-05-22 | 4 | -81/+81
  llvm-svn: 361347
* [PowerPC] [ISEL] select x-form instruction for unaligned offset | Chen Zheng | 2019-05-22 | 6 | -75/+125
  Differential Revision: https://reviews.llvm.org/D62173
  llvm-svn: 361346
* [X86] [CET] Deal with return-twice function such as vfork, setjmp when CET-IBT enabled | Pengfei Wang | 2019-05-22 | 1 | -12/+30
  Return-twice functions will indirectly jump to just after the caller's position. So when CET-IBT is enabled, we should make sure there are endbr* instructions following the callers of these return-twice functions, as GCC does.
  Patch by Xiang Zhang (xiangzhangllvm)
  Differential Revision: https://reviews.llvm.org/D61881
  llvm-svn: 361342
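The placement rule can be illustrated on a toy instruction list. This is a sketch under simplifying assumptions: instructions are plain strings, the set of return-twice functions contains only the two named in the subject line, and the function name and `endbr64` default are illustrative rather than taken from the patch.

```python
RETURNS_TWICE = {"setjmp", "vfork"}  # the examples named in the subject line


def insert_endbr_after_returns_twice(instrs, ibt_enabled, endbr="endbr64"):
    """Return a copy of `instrs` with an endbr after each return-twice call."""
    if not ibt_enabled:
        return list(instrs)
    out = []
    for ins in instrs:
        out.append(ins)
        opcode, _, target = ins.partition(" ")
        # The second return lands right after the call via an indirect jump,
        # so that location has to be a valid indirect-branch target.
        if opcode == "call" and target in RETURNS_TWICE:
            out.append(endbr)
    return out
```

Calls to ordinary functions are left alone, since their return does not go through an indirect jump.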
* AMDGPU: Assume calls read exec | Matt Arsenault | 2019-05-21 | 1 | -0/+4
  llvm-svn: 361333
* AMDGPU: Assume call pseudos are convergent | Matt Arsenault | 2019-05-21 | 1 | -0/+6
  There should probably be nonconvergent versions, but my guess is it doesn't matter in practice.
  llvm-svn: 361331
* AMDGPU: Fix not marking new gfx10 SGPRs as CSRs | Matt Arsenault | 2019-05-21 | 1 | -3/+3
  llvm-svn: 361330
* [WebAssembly] Add the signature for the new llround builtin function | Dan Gohman | 2019-05-21 | 1 | -0/+22
  r360889 added new llround builtin functions. This patch adds their signatures for the WebAssembly backend.
  It also adds wasm32 support to utils/update_llc_test_checks.py, since that's the script other targets are using for their testcases for this feature.
  Differential Revision: https://reviews.llvm.org/D62207
  llvm-svn: 361327
* [X86] Remove an unneeded ZERO_EXTEND creation from LowerINTRINSIC_W_CHAIN. NFC | Craig Topper | 2019-05-21 | 1 | -2/+1
  We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again.
  llvm-svn: 361288
* [X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support | Simon Pilgrim | 2019-05-21 | 1 | -0/+9
  Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692
  llvm-svn: 361270
* [PPC64] Update LocalEntry from assigned symbols | Fangrui Song | 2019-05-21 | 1 | -6/+24
  On PowerPC64 ELFv2 ABI, functions may have 2 entry points: global and local. The local entry point location of a function is stored in the st_other field of the symbol, as an offset relative to the global entry point.
  In order to make symbol assignments (e.g. .equ/.set) work properly with this, PPCTargetELFStreamer already copies the local entry bits from the source symbol to the destination one, on emitAssignment().
  The problem is that this copy is performed only at the assignment location, where the source symbol may not yet have processed the .localentry directive that sets the local entry. This may cause the destination symbol to end up with wrong local entry information. Other symbol info is not affected by this because, in this case, the destination symbol value is actually a symbol reference.
  This change keeps track of these assignments, and updates all needed st_other fields when finish() is called.
  Patch by Leandro Lupori!
  Reviewed By: MaskRay
  Differential Revision: https://reviews.llvm.org/D56586
  llvm-svn: 361237
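The ordering problem and the deferred fix can be modeled in a few lines. This is a toy sketch, not the PPCTargetELFStreamer code: the class, its method names, and the integer used for the local entry bits are hypothetical, and chained assignments (an alias of an alias) are not handled.

```python
class ToyStreamer:
    """Toy model of copying local entry bits across symbol assignments."""

    def __init__(self):
        self.local_entry = {}  # symbol name -> local entry bits
        self.assignments = []  # recorded (destination, source) pairs

    def emit_local_entry(self, symbol, bits):
        # Models a .localentry directive for `symbol`.
        self.local_entry[symbol] = bits

    def emit_assignment(self, dest, src):
        # Copy at the assignment point, as before, and remember the pair
        # in case the source's .localentry has not been seen yet.
        self.local_entry[dest] = self.local_entry.get(src, 0)
        self.assignments.append((dest, src))

    def finish(self):
        # Re-copy once every .localentry directive has been processed.
        for dest, src in self.assignments:
            self.local_entry[dest] = self.local_entry.get(src, 0)
```

If the assignment is emitted before the source symbol's `.localentry`, the destination holds stale bits until `finish()` runs, which is the situation the message describes.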
* [AArch64] Skip mask checks for masks with an odd number of elements. | Florian Hahn | 2019-05-21 | 1 | -0/+6
  Some checks in isShuffleMaskLegal expect an even number of elements, e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access invalid elements and crash. This patch adds checks to the impacted functions.
  Fixes PR41951
  Reviewers: t.p.northover, dmgreen, samparker
  Reviewed By: dmgreen
  Differential Revision: https://reviews.llvm.org/D60690
  llvm-svn: 361235
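A simplified model shows why an odd element count is a problem for a pairwise mask check. The function below is a hypothetical Python rendering of a TRN-style check with one undefined input, written from general knowledge of how such checks walk the mask in steps of two; it is not the code from the patch.

```python
from typing import List, Optional


def is_trn_v_undef_mask(mask: List[int]) -> Optional[int]:
    """Return which TRN result (0 or 1) the mask matches, or None."""
    n = len(mask)
    # The loop below reads mask[i + 1], so an odd length would index past
    # the end; reject such masks up front (and empty ones, for this model).
    if n == 0 or n % 2 != 0:
        return None
    which = 0 if mask[0] == 0 else 1
    for i in range(0, n, 2):
        for m in (mask[i], mask[i + 1]):
            # Negative entries are undefined lanes and match anything.
            if m >= 0 and m != i + which:
                return None
    return which
```

Without the early length check, a three-element mask such as `[0, 0, 2]` would read a fourth element that does not exist.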
* [AArch64][SVE2] Asm: add integer unary instructions (predicated) | Cullen Rhodes | 2019-05-21 | 2 | -0/+42
  Summary: Patch adds support for the following instructions:
  * URECPE, URSQRTE, SQABS, SQNEG
  The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D62129
  llvm-svn: 361230
* [AArch64][SVE2] Asm: add integer pairwise arithmetic instructions | Cullen Rhodes | 2019-05-21 | 1 | -0/+7
  Summary: Patch adds support for the following instructions:
  ADDP, SMAXP, UMAXP, SMINP, UMINP
  The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D62128
  llvm-svn: 361229
* [ARM][CGP] Skip nuw in PrepareConstants | Sam Parker | 2019-05-21 | 1 | -72/+52
  The PrepareConstants step converts add/sub with 'negative' immediates to sub/add with a 'positive' imm to make promotion simpler. nuw already states that the add shouldn't cause an unsigned wrap, so it shouldn't need any tweaking. Plus, we also don't allow a sub with a 'negative' immediate to be safe wrap, so this functionality has been removed. The PrepareConstants step now just handles the add instructions that we've determined would be safe if they wrap around zero.
  Differential Revision: https://reviews.llvm.org/D62057
  llvm-svn: 361227
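The rewrite that remains after this patch can be sketched with 8-bit modular arithmetic. The functions and the `nuw` flag below are a hypothetical model; the check that an add is safe to wrap, which the real pass performs before rewriting, is left out.

```python
def prepare_constant(opcode, imm, nuw=False, bits=8):
    """Rewrite ('add', 'negative' imm) as ('sub', 'positive' imm)."""
    mask = (1 << bits) - 1
    imm &= mask
    if nuw:
        # nuw already promises no unsigned wrap, so leave it untouched.
        return opcode, imm
    is_negative = (imm >> (bits - 1)) == 1  # sign bit set when read as signed
    if opcode == "add" and is_negative:
        return "sub", (-imm) & mask
    return opcode, imm


def evaluate(opcode, x, imm, bits=8):
    """Evaluate the add or sub in `bits`-wide wrapping arithmetic."""
    mask = (1 << bits) - 1
    return (x + imm) & mask if opcode == "add" else (x - imm) & mask
```

In 8 bits, `add x, 0xFF` becomes `sub x, 1`; both produce the same narrow result for every `x`, but the `sub` form keeps a small positive immediate once the operation is widened.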
* Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianness | Dylan McKay | 2019-05-21 | 1 | -0/+5
  Summary: The endianness used in the calling convention does not always match the endianness of the target on all architectures, namely AVR.
  When an argument is too large to be legalised by the architecture and is split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian is queried to find the endianness that function arguments must be laid out in.
  This approach was recommended by Eli Friedman.
  Originally reported in https://github.com/avr-rust/rust/issues/129.
  Patch by Carl Peto.
  Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma
  Reviewed By: efriedma
  Subscribers: JDevlieghere, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62003
  llvm-svn: 361222
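What the hook decides is the order of the pieces when a wide argument is split. The sketch below shows both orders for an integer argument; the function name and parameters are hypothetical, and the real lowering works on legalised value types rather than Python integers.

```python
def split_argument(value, total_bits, part_bits, little_endian):
    """Split `value` into `part_bits`-wide pieces in the requested order."""
    mask = (1 << part_bits) - 1
    # Collect pieces starting from the least significant one.
    parts = [(value >> shift) & mask for shift in range(0, total_bits, part_bits)]
    # Little-endian layout passes the low piece first; big-endian reverses it.
    return parts if little_endian else parts[::-1]
```

A target whose calling convention differs from its memory endianness would answer the hook independently of the data layout, which is the case the message describes for AVR.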
* [PowerPC] use more meaningful name - NFC | Chen Zheng | 2019-05-21 | 1 | -6/+7
  llvm-svn: 361218
* AMDGPU: Force skip branches over calls | Matt Arsenault | 2019-05-20 | 1 | -1/+1
  Unfortunately the way SIInsertSkips works is backwards, and is required for correctness. r338235 added handling of some special cases where skipping is mandatory to avoid side effects if no lanes are active. It conservatively handled asm correctly, but the same logic needs to apply to calls.
  Usually the call sequence code is larger than the skip threshold, although the way the count is computed is really broken, so I'm not sure if anything was likely to really hit this.
  llvm-svn: 361202
* [AArch64] Handle lowering lround on windows, where long is 32 bit | Martin Storsjo | 2019-05-20 | 1 | -0/+4
  Differential Revision: https://reviews.llvm.org/D62108
  llvm-svn: 361192
* [AMDGPU] Fix std::array initializers to avoid warnings with older tool chains. NFC | Bjorn Pettersson | 2019-05-20 | 1 | -2/+2
  A std::array is implemented as a template with an array inside a struct. Older versions of clang, like 3.6, require an extra set of curly braces around std::array initializations to avoid warnings.
  The C++ language was changed regarding this by CWG 1270, so more modern tool chains do not complain even when one level of braces is left out.
  llvm-svn: 361171
* R600: Fix unconditional return in loop | Matt Arsenault | 2019-05-20 | 1 | -10/+5
  llvm-svn: 361167
* [AArch64][SVE2] Asm: add SADALP and UADALP instructions | Cullen Rhodes | 2019-05-20 | 2 | -0/+31
  Summary: This patch adds support for the integer pairwise add and accumulate long instructions SADALP/UADALP. These instructions are predicated.
  The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D62001
  llvm-svn: 361154
* [DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC) | Petar Jovanovic | 2019-05-20 | 2 | -6/+3
  Refactor DIExpression::With* into a flag enum in order to be less error-prone to use (as discussed on D60866).
  Patch by Djordje Todorovic.
  Differential Revision: https://reviews.llvm.org/D61943
  llvm-svn: 361137
* [AArch64][SVE2] Asm: add int halving add/sub (predicated) instructions | Cullen Rhodes | 2019-05-20 | 2 | -0/+43
  Summary: This patch adds support for the predicated integer halving add/sub instructions:
  * SHADD, UHADD, SRHADD, URHADD
  * SHSUB, UHSUB, SHSUBR, UHSUBR
  The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
  Reviewed By: rovka
  Differential Revision: https://reviews.llvm.org/D62000
  llvm-svn: 361136
* [AArch64][SVE2] Asm: add saturating multiply-add interleaved long instructions | Cullen Rhodes | 2019-05-20 | 1 | -0/+4
  Summary: Patch adds support for SQDMLALBT and SQDMLSLBT instructions.
  The specification can be found here: https://developer.arm.com/docs/ddi0602/latest
  Reviewed By: rovka
  Differential Revision: https://reviews.llvm.org/D61998
  llvm-svn: 361135
* Use llvm::sort. NFC | Fangrui Song | 2019-05-20 | 2 | -2/+1
  llvm-svn: 361134
* [AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values | Carl Ritson | 2019-05-20 | 1 | -6/+22
  Summary: Avoid introducing hazard mitigation when lgkmcnt is reduced to 0. Clarify code comments to explain assumptions made for this hazard mitigation. Expand and correct test cases to cover variants of s_waitcnt.
  Reviewers: nhaehnle, rampitec
  Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62058
  llvm-svn: 361124