summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86][AVX] Enable ISD::SRL -> ISD::MULHU for v16i16Simon Pilgrim2018-09-161-3/+1
| | | | | | Now that rL340913 has landed with improved v16i16 selects as shuffles. llvm-svn: 342349
* [x86] fix uses check in broadcast transform (PR38949)Sanjay Patel2018-09-161-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://bugs.llvm.org/show_bug.cgi?id=38949 It's not clear to me that we even need a one-use check in this fold. Ie, 2 independent loads might be better than a load+dependent shuffle. Note that the existing re-use tests are not affected. We actually do form a broadcast node in those tests now because there's no extra use of the insert_subvector node in those cases. But something later in isel pattern matching decides that it is not worth using a broadcast for the full load in those tests: Legalized selection DAG: %bb.0 'test_broadcast_2f64_4f64_reuse:' t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64 t4: i64,ch = CopyFromReg t0, Register:i64 %1 t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64 t18: v4f64 = insert_subvector undef:v4f64, t7, Constant:i64<0> t20: v4f64 = insert_subvector t18, t7, Constant:i64<2> Becomes: t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64 t4: i64,ch = CopyFromReg t0, Register:i64 %1 t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64 t21: v4f64 = X86ISD::SUBV_BROADCAST t7 ISEL: Starting selection on root node: t21: v4f64 = X86ISD::SUBV_BROADCAST t7 ... Created node: t27: v4f64 = INSERT_SUBREG IMPLICIT_DEF:v4f64, t7, TargetConstant:i32<7> Morphed node: t21: v4f64 = VINSERTF128rr t27, t7, TargetConstant:i8<1> llvm-svn: 342347
* [X86] Remove an fp->int->fp domain crossing in LowerUINT_TO_FP_i64.Craig Topper2018-09-151-5/+3
| | | | | | | | | | | | | | Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back? Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52134 llvm-svn: 342327
* [X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))Craig Topper2018-09-151-0/+26
| | | | | | | | | | | | | | | | | Summary: MOVMSK only care about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load. Inspired by PR38840. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D52121 llvm-svn: 342326
* [WebAssembly] SIMD shiftsThomas Lively2018-09-151-0/+26
| | | | | | | | | | | | | | | | | Summary: Implement shifts of vectors by i32. Since LLVM defines shifts as binary operations between two vectors, this involves pattern matching on splatted shift operands. For v2i64 shifts any i32 shift operands have to be zero extended in the input and any i64 shift operands have to be wrapped in the output. Depends on D52007. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51906 llvm-svn: 342302
* [WebAssembly] SIMD negThomas Lively2018-09-141-0/+30
| | | | | | | | | | | | Summary: Depends on D52007. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52009 llvm-svn: 342296
* [PowerPC] Fix the calling convention for i1 arguments on PPC32Lion Yang2018-09-141-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Integer types smaller than i32 must be extended to i32 by default. The feature "crbits" introduced at r202451 handles i1 as a special case, but it did not extend properly. The caller was, therefore, passing i1 stack arguments by writing 0/1 to the first byte of the 4-byte stack object and callee was reading the first byte for the value. "crbits" is enabled if the optimization level is greater than 1, which is very common in "release builds". Such discrepancies with ABI specification also introduces potential incompatibility with programs or libraries built with other compilers e.g. GCC. Fixes PR38661 Reviewers: hfinkel, cuviper Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D51108 llvm-svn: 342288
* AMDGPU: Clear the bits before they are being set in program resource registersKonstantin Zhuravlyov2018-09-141-0/+1
| | | | | | Change by Tony Tye llvm-svn: 342270
* Revert r342210 "[ARM] bottom-top mul support in ARMParallelDSP"Reid Kleckner2018-09-141-152/+27
| | | | | | | | | | It causes assertion failures while building Skia for Android in Chromium: https://ci.chromium.org/buildbot/chromium.clang/ToTAndroid/4550 Reduction forthcoming. llvm-svn: 342260
* [X86][SSE] Lower shuffles to permute(unpack(x,y)) (PR31151)Simon Pilgrim2018-09-141-5/+75
| | | | | | | | | | Attempt to lower a shuffle as an unpack of elements from two inputs followed by a single-input (wider) permutation. As long as the permutation is wider this is a win - there may be some circumstances where same size permutations would also be useful but I've left that for future work. Differential Revision: https://reviews.llvm.org/D52043 llvm-svn: 342257
* [X86][BMI1] Fix BLSI/BLSMSK/BLSR BMI1 scheduling on btver2Simon Pilgrim2018-09-141-1/+1
| | | | | | These have the same behaviour as tzcnt on btver2 - confirmed with AMD 16h SOG, Agner and instlatx64. llvm-svn: 342235
* [X86][BMI1] Add scheduler class for BLSI/BLSMSK/BLSR BMI1 instructionsSimon Pilgrim2018-09-1411-48/+35
| | | | llvm-svn: 342234
* [AMDGPU] Ensure trig range reduction only used for subtargets that require itDavid Stuttard2018-09-144-9/+28
| | | | | | | | | | | | | | | | | | Summary: GFX9 and above support sin/cos instructions with a greater range and thus don't require a fract instruction prior to invocation. Added a subtarget feature to reflect this and added code to take advantage of expanded range on GFX9+ Also updated the tests to check correct behaviour Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51933 Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0 llvm-svn: 342222
* [ARM] bottom-top mul support in ARMParallelDSPSam Parker2018-09-141-27/+152
| | | | | | | | | | On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 342210
* [SystemZ] Adjust cost functions for subtargets that use LI + LOC instead of IPMJonas Paulsson2018-09-141-4/+8
| | | | | | | | | | | | | | | After recent improvements which makes better use of LOC instead of IPM, the TTI cost functions also needs to be updated to reflect this. This involves sext, zext and xor of i1. The tests were updated so that for z13 the new costs are expected, while the old costs are still checked for on zEC12. Review: Ulrich Weigand https://reviews.llvm.org/D51339 llvm-svn: 342207
* [AMDGPU] Removed unused methodTim Renouf2018-09-131-22/+0
| | | | | | | | | | | | | Summary: I accidentally left this behind in D50306, and it causes a build warning when I build with gcc7. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52022 Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c llvm-svn: 342189
* [X86] Fix register resizings for inline assembly register operands.Nirav Dave2018-09-132-7/+39
| | | | | | | | | | | | | | When replacing a named register input to the appropriately sized sub/super-register. In the case of a 64-bit value being assigned to a register in 32-bit mode, match GCC's assignment. Reviewers: eli.friedman, craig.topper Subscribers: nickdesaulniers, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D51502 llvm-svn: 342175
* [X86] Cleanup pair returns. NFCI.Nirav Dave2018-09-131-32/+14
| | | | llvm-svn: 342174
* [RISCV][MC] Reject bare symbols for the simm6 and simm6nonzero operand typesAna Pazos2018-09-131-14/+4
| | | | | | | | | | | | | | | | | | | | Summary: Fixed assertions due to invalid fixup when encoding compressed instructions (c.addi, c.addiw, c.li, c.andi) with bare symbols with/without modifiers. This matches GAS behavior as well. This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer for the RISC-V assembly language. Reviewers: asb Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb Differential Revision: https://reviews.llvm.org/D52005 llvm-svn: 342160
* [RISCV] Fix decoding of invalid instruction with C extension enabled.Ana Pazos2018-09-132-2/+20
| | | | | | | | | | | | | | | | | | | | | | Summary: The illegal instruction 0x00 0x00 is being wrongly decoded as c.addi4spn with 0 immediate. The invalid instruction 0x01 0x61 is being wrongly decoded as c.addi16sp with 0 immediate. This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer for the RISC-V assembly language. Reviewers: asb Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb Differential Revision: https://reviews.llvm.org/D51815 llvm-svn: 342159
* [WebAssembly] Fix signature of `main` in FixFunctionBitcastsSam Clegg2018-09-131-2/+4
| | | | | | | | | Also, add a check to ensure that when main has the expected signature we do not create a wrapper. Differential Revision: https://reviews.llvm.org/D51562 llvm-svn: 342157
* [ARM] Allow truncs as sources in ARM CGPSam Parker2018-09-131-19/+23
| | | | | | | | | | We previously only allowed truncs as sinks, but now allow them as sources too. We do this by checking that the result type is the narrow type that we're trying to optimise for. Differential Revision: https://reviews.llvm.org/D51978 llvm-svn: 342141
* [ARM] Fix FixConst for ARMCodeGenPrepareSam Parker2018-09-131-20/+3
| | | | | | | | | | Part of FixConsts wrongly assumes either a 8- or 16-bit constant which can result in the wrong constants being generated during promotion. Differential Revision: https://reviews.llvm.org/D52032 llvm-svn: 342140
* AMDGPU: Fix not preserving alignent in call setupsMatt Arsenault2018-09-131-1/+7
| | | | | | | | | | | | If an argument was passed on the stack, this was using the default alignment. I'm not sure there's an observable change from this. This was observable due to bugs in expansion of unaligned loads and stores, but since that is fixed I don't think this matters much. llvm-svn: 342133
* ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.Tim Northover2018-09-134-0/+22
| | | | | | | | | | | | The Technical Reference Manuals for these two CPUs state that branching to an unaligned 32-bit instruction incurs an extra pipeline reload penalty. That's bad. This also enables the optimization at -Os since it costs on average one byte per loop in return for 1 cycle per iteration, which is pretty good going. llvm-svn: 342127
* [AMDGPU] Load divergence predicate refactoringAlexander Timofeev2018-09-132-8/+26
| | | | | | | | Differential revision: https://reviews.llvm.org/D51931 Reviewers: rampitec llvm-svn: 342120
* [mips] Enable the mnemonic spell correctorSimon Atanasyan2018-09-131-1/+7
| | | | | | | | | | | | This implements suggesting alternative mnemonics when an invalid one is specified. For example `addru $9, $6, 17767` leads to the following error message: error: unknown instruction, did you mean: add, addiu, addu, maddu? Differential revision: https://reviews.llvm.org/D40646 llvm-svn: 342119
* [AMDGPU] Preliminary patch for divergence driven instruction selection. ↵Alexander Timofeev2018-09-131-0/+1
| | | | | | | | | | Load offset inlining pattern changed. Differential revision: https://reviews.llvm.org/D51975 Reviewers: rampitec llvm-svn: 342115
* [X86] Type legalize v2i32 div/rem by scalarizing rather than promotingCraig Topper2018-09-131-0/+17
| | | | | | | | | | | | | | | | | | | | | Summary: Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall. This patch switches type legalization to do the scalarizing itself using i32. It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support. Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalaring to 4 scalar divides. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51325 llvm-svn: 342114
* ARM: correct the relocation type for `bl` on WoASaleem Abdulrasool2018-09-131-1/+1
| | | | | | | | | | The `IMAGE_REL_ARM_BRANCH20T` applies only to a `b.w` instruction. A thumb-2 `bl` should be relocated using a `IMAGE_REL_ARM_BRANCH24T`. Correct the relocation that we emit in such a case. Resolves PR38620! Based on the patch by Jordan Rhee! llvm-svn: 342109
* Remove isAsCheapAsAMove from v128.constThomas Lively2018-09-131-1/+1
| | | | llvm-svn: 342106
* Remove isAsCheapAsAMove from mem opsThomas Lively2018-09-131-2/+2
| | | | llvm-svn: 342105
* [WebAssembly] Add missing SIMD instruction attributesThomas Lively2018-09-131-2/+3
| | | | | | | | | | | | | | Summary: These attributes are copied from equivalent instructions in WebAssemblyInstrInfo.td. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51518 llvm-svn: 342104
* [Hexagon] Use shuffles when lowering "gather" shufflevectorsKrzysztof Parzyszek2018-09-121-0/+70
| | | | | | | | Shufflevector instructions in LLVM IR that extract a subset of elements of a longer input into a shorter vector can be done using VECTOR_SHUFFLEs. This will avoid expanding them into constly extracts and inserts. llvm-svn: 342091
* [Hexagon] Improve the selection algorithm in scalarizeShuffleKrzysztof Parzyszek2018-09-121-22/+89
| | | | | | Use topological ordering for newly generated nodes. llvm-svn: 342090
* [WebAssembly] Make tied inline asm operands work againHeejin Ahn2018-09-121-0/+5
| | | | | | | | | | | | | | | | | Summary: rL341389 broke code with tied register operands in inline assembly. For example, `asm("" : "=r"(var) : "0"(var));` The code above specifies the input operand to be in the same register with the output operand, tying the two register. This patch makes this kind of code work again. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, eraman, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51991 llvm-svn: 342084
* [Hexagon] Use legalized type for extracted elements in scalarizeShuffleKrzysztof Parzyszek2018-09-121-2/+4
| | | | | | | | | Scalarization of a shuffle will break up the source vectors into individual elements, and use them to assemble the resulting vector. An element type of a legal vector type may not necessarily be a legal scalar type, so make sure that the extracted values are extended to a legal scalar type. llvm-svn: 342079
* AMDGPU: Print all kernel descriptor directives (including the ones with ↵Konstantin Zhuravlyov2018-09-121-101/+88
| | | | | | | | | | default values) Change by Tony Tye Differential Revision: https://reviews.llvm.org/D51954 llvm-svn: 342077
* AMDGPU: Re-apply r341982 after fixing the layering issueKonstantin Zhuravlyov2018-09-1211-391/+364
| | | | | | | | | | | | Move isa version determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). llvm-svn: 342069
* [WebAssembly] SIMD comparisonsThomas Lively2018-09-122-1/+55
| | | | | | | | | | | | | | | | Summary: Match the ordering semantics of non-vector comparisons. For floating point comparisons that do not correspond to instructions, the tests check that some vector comparison instruction was emitted but do not care about the full implementation. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51765 llvm-svn: 342064
* [ARM] Tighten f64<->f16 conversion requirementsDiogo N. Sampaio2018-09-121-4/+8
| | | | | | | | | | | | | | Fix missing Requires fields. Patch by Bernard Ogden (bogden) Reviewers: SjoerdMeijer, javed.absar, t.p.northover Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D51631 llvm-svn: 342061
* [X86] Remove isel patterns for ADCX instructionCraig Topper2018-09-121-30/+7
| | | | | | | | | | | | There's no advantage to this instruction unless you need to avoid touching other flag bits. It's encoding is longer, it can't fold an immediate, it doesn't write all the flags. I don't think gcc will generate this instruction either. Fixes PR38852. Differential Revision: https://reviews.llvm.org/D51754 llvm-svn: 342059
* [AArch64] Implement aarch64_vector_pcs codegen support.Sander de Smalen2018-09-123-41/+92
| | | | | | | | | | | | | | This patch adds codegen support for the saving/restoring V8-V23 for functions specified with the aarch64_vector_pcs calling convention attribute, as added in patch D51477. Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D51479 llvm-svn: 342049
* [ARM] Follow-up to rL342033Sam Parker2018-09-121-1/+1
| | | | | | Fixed typo which can cause segfault. llvm-svn: 342040
* [AArch64] NFC: Refactoring to prepare for vector PCS.Sander de Smalen2018-09-121-39/+74
| | | | | | | | | | | | | | This patch refactors several parts of AArch64FrameLowering so that it can be easily extended to support saving/restoring of FPR128 (Q) registers. Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D51478 llvm-svn: 342038
* [ARM] Exchange MAC operands in ARMParallelDSPSam Parker2018-09-121-115/+154
| | | | | | | | | | | | | | | | SMLAD and SMLALD instructions also come in the form of SMLADX and SMLALDX which perform an exchange on their second operand. To support this, more of the loads in the MAC candidates are compared for sequential access and a boolean value has been added to BinOpChain. AddMACCandiate has been refactored into a small pattern matching state machine to reduce the amount of duplicated code, but also to enable the matching to be more flexible. CreateParallelMACPairs now iterates through all the candidates to find parallel ones. Differential Revision: https://reviews.llvm.org/D51424 llvm-svn: 342033
* [ARM] Allow bitcasts in ARMCodeGenPrepareSam Parker2018-09-121-5/+4
| | | | | | | | Allow bitcasts in the use-def chains, treating them as sources. Differential Revision: https://reviews.llvm.org/D50758 llvm-svn: 342032
* [AArch64] Add parsing of aarch64_vector_pcs attribute.Sander de Smalen2018-09-122-0/+8
| | | | | | | | | | | | | | | | | | | This patch adds parsing support for the 'aarch64_vector_pcs' calling convention attribute to calls and function declarations. More information describing the vector ABI and procedure call standard can be found here: https://developer.arm.com/products/software-development-tools/\ hpc/arm-compiler-for-hpc/vector-function-abi Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D51477 llvm-svn: 342030
* Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into ↵Ilya Biryukov2018-09-1211-274/+391
| | | | | | | | | | | TargetParser." This reverts commit r341982. The change introduced a layering violation. Reverting to unbreak our integrate. llvm-svn: 342023
* [X86] Teach X86SelectionDAGInfo::EmitTargetCodeForMemcpy about GNUX32Craig Topper2018-09-122-24/+36
| | | | | | | | | | | | | | | | | Summary: In GNUX23, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match. Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger. Reviewers: rnk, efriedma, MatzeB, echristo Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51893 llvm-svn: 342015
OpenPOWER on IntegriCloud