path: root/llvm/lib/Target
* [GlobalISel][AArch64] Add support for @llvm.ceil (Jessica Paquette, 2018-12-19; 1 file, -0/+4)
  This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
  selection for floating point ceil where it has a supported, dedicated
  instruction. Other cases aren't handled here.

  It updates the relevant gisel tests and adds a select-ceil test. It also
  adds a check to arm64-vcvt.ll which ensures that we don't fall back when we
  run into one of the relevant cases.

  llvm-svn: 349664
* [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing (Craig Topper, 2018-12-19; 2 files, -6/+30)
  The (cmp (and X, Y), 0) pattern is greedy and ends up forming a TESTrr and
  consuming the and when it might be better to use one of the BMI/TBM
  instructions like BLSR or BLSI. This patch removes the pattern from isel and
  adds a post processing check to combine TESTrr+ANDrr into just a TESTrr.

  With this patch we are able to select the BMI/TBM instructions, but we'll
  also emit a TESTrr when the result is compared to 0. In many cases the
  peephole pass will be able to use optimizeCompareInstr to remove the TEST,
  but it's probably not perfect.

  Differential Revision: https://reviews.llvm.org/D55870

  llvm-svn: 349661
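As an illustrative sketch (not code from the patch; the function names are invented), the classic BMI candidate blocked by the greedy TESTrr match is the clear-lowest-set-bit idiom:

```cpp
#include <cstdint>

// x & (x - 1) clears the lowest set bit; with -mbmi this is a BLSR candidate.
uint32_t clearLowestSetBit(uint32_t x) {
  return x & (x - 1);
}

bool hasAtMostOneSetBit(uint32_t x) {
  // Comparing the AND result against 0 is the (cmp (and X, Y), 0) shape:
  // matching TESTrr here eagerly would consume the AND and block BLSR.
  return clearLowestSetBit(x) == 0;
}
```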
* [X86] Fix assert fails in pass X86AvoidSFBPass (Craig Topper, 2018-12-19; 1 file, -13/+14)
  Fixes https://bugs.llvm.org/show_bug.cgi?id=38743

  The function removeRedundantBlockingStores is supposed to remove any
  blocking stores contained in each other in BlockingStoresDispSizeMap. But it
  currently looks only at the previous one, which misses some cases and
  results in an assert. This patch refines the function to check all previous
  layouts until it finds an uncontained one, so all redundant stores are
  removed.

  Patch by Pengfei Wang

  Differential Revision: https://reviews.llvm.org/D55642

  llvm-svn: 349660
* [AArch64] Improve the Exynos M3 pipeline model (Evandro Menezes, 2018-12-19; 1 file, -4/+4)
  llvm-svn: 349652
* [BPF] Generate BTF DebugInfo under BPF target (Yonghong Song, 2018-12-19; 6 files, -0/+1301)
  This patch implements BTF (BPF Type Format). BTF is the debug info format
  for BPF, introduced in the linux patch
  https://github.com/torvalds/linux/commit/69b693f0aefa0ed521e8bd02260523b5ae446ad7#diff-06fb1c8825f653d7e539058b72c83332
  and further extended several times, e.g.:
    https://www.spinics.net/lists/netdev/msg534640.html
    https://www.spinics.net/lists/netdev/msg538464.html
    https://www.spinics.net/lists/netdev/msg540246.html

  The main advantages of implementing it in LLVM are:
  - better integration/deployment as no extra tools are needed.
  - bpf JIT based compilation (like bcc, bpftrace, etc.) can get BTF without
    much extra effort.
  - BTF line_info needs selective source codes, which can be easily retrieved
    when inside the compiler.

  This patch implemented BTF generation by registering a BPF specific
  DebugHandler in BPFAsmPrinter.

  Signed-off-by: Yonghong Song <yhs@fb.com>

  Differential Revision: https://reviews.llvm.org/D55752

  llvm-svn: 349640
* AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1 (Nicolai Haehnle, 2018-12-19; 1 file, -4/+2)
  Summary: Using HI here makes no logical sense, since the dword is only
  32 bits to begin with.

  Current Mesa master does not look at the relocation type at all, so this
  change is fine. Future Mesa will rely on this, however.

  Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1

  Reviewers: arsenm, rampitec, mareko

  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

  Differential Revision: https://reviews.llvm.org/D55369

  llvm-svn: 349620
* AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged (Carl Ritson, 2018-12-19; 1 file, -0/+3)
  Summary: Fix an issue where VGPR/SGPR bounds are not properly extended when
  brackets are merged. This manifests as missing waitcnt insertions when
  multiple brackets are forwarded to a successor block and the first forward
  has lower VGPR/SGPR bounds.

  Irreducible loop test has been extended based on a CTS failure detected for
  GFX9.

  Reviewers: nhaehnle

  Reviewed By: nhaehnle

  Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

  Differential Revision: https://reviews.llvm.org/D55602

  llvm-svn: 349611
* [ARM GlobalISel] Support G_CONSTANT for Thumb2 (Diana Picus, 2018-12-19; 1 file, -4/+4)
  All we have to do is mark it as legal. This allows us to select a lot of new
  patterns handled by TableGen. This patch adds tests for them and splits up
  the existing test file for binary operators into 2 files, one for arithmetic
  ops and one for logical ones.

  llvm-svn: 349610
* AMDGPU/GlobalISel: Regbankselect for fsub (Matt Arsenault, 2018-12-19; 1 file, -0/+1)
  llvm-svn: 349608
* [PowerPC] Exploit P9 vabsdu for unsigned vselect patterns (Kewen Lin, 2018-12-19; 2 files, -0/+66)
  For types v4i32/v8i16/v16i8, do the following transforms:
    (vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
    (vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
    (vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
    (vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)

  Differential Revision: https://reviews.llvm.org/D55812

  llvm-svn: 349599
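A scalar sketch of the source idiom these patterns vectorize (illustration only; the function name is invented):

```cpp
#include <cstddef>
#include <cstdint>

// Element-wise unsigned absolute difference: per lane, the larger value minus
// the smaller one. This is the select-of-subs shape the transforms above fold
// into a single vabsd.
void absoluteDifferenceU8(const uint8_t *a, const uint8_t *b, uint8_t *out,
                          size_t n) {
  for (size_t i = 0; i < n; ++i)
    out[i] = a[i] > b[i] ? uint8_t(a[i] - b[i]) : uint8_t(b[i] - a[i]);
}
```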
* [AArch64] Simplify the Exynos M3 pipeline model (Evandro Menezes, 2018-12-18; 1 file, -12/+9)
  llvm-svn: 349569
* [AArch64] Fix instructions order (NFC) (Evandro Menezes, 2018-12-18; 1 file, -4/+4)
  llvm-svn: 349568
* [X86] Add BSR to isUseDefConvertible. (Craig Topper, 2018-12-18; 1 file, -6/+6)
  We already had BSF here as part of __builtin_ffs improvements and I was just
  wondering yesterday whether we should have BSR there.

  This addresses one issue from PR40090.

  llvm-svn: 349531
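For context, a hedged sketch of why BSR qualifies: BSR sets ZF according to whether its source operand is zero, so an explicit compare of that same source against zero becomes redundant. The example below is illustrative, not from the patch:

```cpp
#include <cstdint>

// Returns the index of the highest set bit, or -1 for a zero input.
// 31 - clz(x) is typically selected as BSR on x86; because BSR itself sets
// ZF when its source is zero, optimizeCompareInstr can drop the separate
// TEST produced for the x == 0 check.
int highestSetBit(uint32_t x) {
  if (x == 0)
    return -1;
  return 31 - __builtin_clz(x);
}
```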
* [AMDGPU] Removed the unnecessary operand size-check-assert from processBaseWithConstOffset() (Farhana Aleen, 2018-12-18; 1 file, -2/+0)
  Summary: 32-bit operand sizes are guaranteed by the opcode checks for
  AMDGPU::V_ADD_I32_e64 and AMDGPU::V_ADDC_U32_e64. Therefore, we don't need
  any additional operand size-check assert.

  Author: FarhanaAleen

  llvm-svn: 349529
* [X86] Don't use SplitOpsAndApply to create ISD::UADDSAT/ISD::USUBSAT nodes. Let type legalization and op legalization deal with it (Craig Topper, 2018-12-18; 1 file, -30/+8)
  Now that we've switched to target independent nodes we can rely on generic
  infrastructure to do the legalization for us.

  llvm-svn: 349526
* [X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS (Nikita Popov, 2018-12-18; 6 files, -34/+48)
  Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD
  opcodes SADDSAT and SSUBSAT. This also improves codegen for the
  @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.

  This is a followup to D55787 and part of PR40056.

  Differential Revision: https://reviews.llvm.org/D55833

  llvm-svn: 349520
* [X86] Create PSUBUS from (add (umax X, C), -C) (Craig Topper, 2018-12-18; 1 file, -0/+44)
  InstCombine seems to canonicalize our PSUBUS pattern into a max with the
  constant and an add with the inverse of the constant. This patch recognizes
  this pattern and turns it into PSUBUS. Future work could improve undef
  element handling.

  Fixes some of PR40053

  Differential Revision: https://reviews.llvm.org/D55780

  llvm-svn: 349519
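A scalar sketch of the idiom (names invented): subtracting a constant while clamping at zero is what InstCombine rewrites into the umax + add form this patch matches:

```cpp
#include <cstddef>
#include <cstdint>

// max(x, 42) - 42 equals (x > 42 ? x - 42 : 0), i.e. an unsigned saturating
// subtract of 42. InstCombine canonicalizes the select into
// (add (umax X, 42), -42), which is now matched back into PSUBUS
// (PSUBUSW for these u16 lanes).
void subtractSaturating42(const uint16_t *in, uint16_t *out, size_t n) {
  for (size_t i = 0; i < n; ++i)
    out[i] = in[i] > 42 ? uint16_t(in[i] - 42) : uint16_t(0);
}
```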
* [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts (Simon Pilgrim, 2018-12-18; 1 file, -0/+3)
  Noticed by @spatel on D55747 - we get much better codegen if we use the
  regular shift expansion.

  llvm-svn: 349510
* [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts (Simon Pilgrim, 2018-12-18; 1 file, -3/+4)
  Noticed by @spatel on D55747 - we get much better codegen if we use the
  regular shift expansion.

  llvm-svn: 349500
* [MIPS GlobalISel] Select G_SDIV, G_UDIV, G_SREM and G_UREM (Petar Avramovic, 2018-12-18; 3 files, -0/+37)
  Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM, and use
  an integer type of the correct size when creating arguments for
  CLI.lowerCall. Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16,
  s32 and s64 on MIPS32.

  Differential Revision: https://reviews.llvm.org/D55651

  llvm-svn: 349499
* [X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS (Nikita Popov, 2018-12-18; 5 files, -38/+75)
  Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes UADDSAT
  and USUBSAT. As a side-effect, this also makes codegen for the @llvm.uadd.sat
  and @llvm.usub.sat intrinsics reasonable.

  This only replaces use in the X86 backend, and does not move any of the
  ADDUS/SUBUS X86 specific combines into generic codegen.

  Differential Revision: https://reviews.llvm.org/D55787

  llvm-svn: 349481
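For reference, a scalar sketch (not from the patch) of the lane-wise semantics the generic nodes and intrinsics provide:

```cpp
#include <cstdint>

// uadd.sat: the sum clamps at the unsigned maximum instead of wrapping.
uint8_t uaddSat(uint8_t a, uint8_t b) {
  unsigned sum = unsigned(a) + unsigned(b);
  return sum > 255 ? uint8_t(255) : uint8_t(sum);
}

// usub.sat: the difference clamps at zero instead of wrapping.
uint8_t usubSat(uint8_t a, uint8_t b) {
  return a > b ? uint8_t(a - b) : uint8_t(0);
}
```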
* [MIPS GlobalISel] ClampScalar G_AND G_OR and G_XOR (Petar Avramovic, 2018-12-18; 1 file, -1/+5)
  Add narrowScalar for G_AND and G_XOR. Legalize G_AND, G_OR and G_XOR for
  types other than s32 with clampScalar on MIPS32.

  Differential Revision: https://reviews.llvm.org/D55362

  llvm-svn: 349475
* [AArch64] - Return address signing dwarf support (Luke Cheeseman, 2018-12-18; 2 files, -0/+16)
  - Reapply changes initially introduced in r343089
  - The architecture info is no longer loaded whenever a DWARFContext is
    created
  - The runtime libraries (sanitizers) make use of the dwarf context classes
    but do not initialise the target info
  - The architecture of the object can be obtained without loading the target
    info
  - Adding a method to the dwarf context to get this information and multiplex
    the string printing later on

  Differential Revision: https://reviews.llvm.org/D55774

  llvm-svn: 349472
* AMDGPU: Legalize/regbankselect frame_index (Matt Arsenault, 2018-12-18; 2 files, -0/+3)
  llvm-svn: 349468
* AMDGPU: Legalize/regbankselect fma (Matt Arsenault, 2018-12-18; 2 files, -1/+2)
  llvm-svn: 349467
* AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub (Matt Arsenault, 2018-12-18; 2 files, -2/+10)
  llvm-svn: 349463
* [X86][SSE] Move VSRAI sign extend in reg fold into SimplifyDemandedBits (Simon Pilgrim, 2018-12-18; 1 file, -11/+11)
    (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1

  This works better as part of SimplifyDemandedBits than part of the general
  combine.

  llvm-svn: 349462
* [X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold (Simon Pilgrim, 2018-12-18; 1 file, -8/+5)
  This fold was incredibly specific - replace it with a SimplifyDemandedBits
  fold that removes a VSRAI if only the original sign bit is demanded (it's
  guaranteed to stay the same).

  Test change is merely a rescheduling.

  llvm-svn: 349459
* Introduce control flow speculation tracking pass for AArch64 (Kristof Beyls, 2018-12-18; 9 files, -9/+443)
  The pass implements tracking of control flow miss-speculation into a "taint"
  register. That taint register can then be used to mask off registers with
  sensitive data when executing under miss-speculation, a.k.a. "transient
  execution". This pass is aimed at mitigating against SpectreV1-style
  vulnerabilities.

  At the moment, it implements the tracking of miss-speculation of control
  flow into a taint register, but doesn't implement a mechanism yet to then
  use that taint register to mask off vulnerable data in registers (something
  for a follow-on improvement). Possible strategies to mask out vulnerable
  data that can be implemented on top of this are:
  - speculative load hardening to automatically mask off data loaded in
    registers.
  - using intrinsics to mask off data in registers as indicated by the
    programmer (see https://lwn.net/Articles/759423/).

  For AArch64, the following implementation choices are made. Some of these
  are different than the implementation choices made in the similar pass
  implemented in X86SpeculativeLoadHardening.cpp, as the instruction set
  characteristics result in different trade-offs.
  - The speculation hardening is done after register allocation. With a
    relative abundance of registers, one register is reserved (X16) to be the
    taint register. X16 is expected to not clash with other register
    reservation mechanisms with very high probability because:
    . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
    . The only way to request X16 to be used as a programmer is through inline
      assembly. In the rare case a function explicitly demands to use X16/W16,
      this pass falls back to hardening against speculation by inserting a
      DSB SYS/ISB barrier pair which will prevent control flow speculation.
  - It is easy to insert mask operations at this late stage as we have mask
    operations available that don't set flags.
  - The taint variable contains all-ones when no miss-speculation is detected,
    and contains all-zeros when miss-speculation is detected. Therefore, when
    masking, an AND instruction (which only changes the register to be masked,
    no other side effects) can easily be inserted anywhere that's needed.
  - The tracking of miss-speculation is done by using a data-flow conditional
    select instruction (CSEL) to evaluate the flags that were also used to
    make conditional branch direction decisions. Speculation of the CSEL
    instruction can be limited with a CSDB instruction - so the combination of
    CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
    aren't speculated. When conditional branch direction gets miss-speculated,
    the semantics of the inserted CSEL instruction is such that the taint
    register will contain all zero bits. One key requirement for this to work
    is that the conditional branch is followed by an execution of the CSEL
    instruction, where the CSEL instruction needs to use the same flags status
    as the conditional branch. This means that the conditional branches must
    not be implemented as one of the AArch64 conditional branches that do not
    use the flags as input (CB(N)Z and TB(N)Z). This is implemented by
    ensuring in the instruction selectors to not produce these instructions
    when speculation hardening is enabled. This pass will assert if it does
    encounter such an instruction.
  - On function call boundaries, the miss-speculation state is transferred
    from the taint register X16 to be encoded in the SP register as value 0.

  Future extensions/improvements could be:
  - Implement this functionality using full speculation barriers, akin to the
    x86-slh-lfence option. This may be more useful for the intrinsics-based
    approach than for the SLH approach to masking. Note that this pass already
    inserts the full speculation barriers if the function for some niche
    reason makes use of X16/W16.
  - No indirect branch misprediction gets protected/instrumented; but this
    could be done for some indirect branches, such as switch jump tables.

  Differential Revision: https://reviews.llvm.org/D54896

  llvm-svn: 349456
* [AArch64] [MinGW] Allow enabling SEH exceptions (Martin Storsjo, 2018-12-18; 1 file, -0/+3)
  The default is still DWARF, but SEH exceptions can now be enabled optionally
  for the MinGW target.

  Differential Revision: https://reviews.llvm.org/D55748

  llvm-svn: 349451
* [PowerPC] Exploit power9 new instruction setb (Kewen Lin, 2018-12-18; 4 files, -3/+175)
  Check for the expected patterns feeding SELECT_CC, like:
    (select_cc lhs, rhs,  1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1)
    (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1)
    (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs,  1, -1, cc2), seteq)
    (select_cc lhs, rhs,  0, (select_cc [lr]hs, [lr]hs, -1,  1, cc2), seteq)
  and further transform the sequence into comparison + setb on a match.

  Differential Revision: https://reviews.llvm.org/D53275

  llvm-svn: 349445
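One plausible source pattern that produces these SELECT_CC shapes (an illustration, not taken from the patch) is the three-way compare:

```cpp
#include <cstdint>

// Three-way compare returning -1, 0 or 1. The front end lowers this through
// nested selects of the kind listed above, which the new combine can collapse
// into a single compare followed by setb on Power9.
int compare3Way(int64_t a, int64_t b) {
  if (a < b)
    return -1;
  if (a > b)
    return 1;
  return 0;
}
```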
* [X86] Const correct some helper functions in X86InstrInfo.cpp. NFC (Craig Topper, 2018-12-18; 1 file, -6/+7)
  llvm-svn: 349440
* [PowerPC] Improve vec_abs on P9 (Kewen Lin, 2018-12-18; 4 files, -130/+169)
  Improve the current vec_abs support on P9: generate an ISD::ABS node for
  vector types, combine the ABS node into a VABSD node in some special cases
  to make use of the P9 VABSD* instructions, and do custom lowering to
  vsub (vneg later) + vmax when there is no combining opportunity.

  Differential Revision: https://reviews.llvm.org/D54783

  llvm-svn: 349437
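A scalar sketch of the fallback lowering (illustrative only): absolute value as a subtract-from-zero plus a max, i.e. the vsub (vneg later) + vmax sequence mentioned above:

```cpp
#include <cstddef>
#include <cstdint>

// |x| = max(x, 0 - x) per lane; the 0 - x becomes a vector negate and the
// select becomes vmax. (INT32_MIN stays INT32_MIN under wraparound
// arithmetic, matching the usual vector abs semantics.)
void vectorAbs(const int32_t *in, int32_t *out, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    int32_t neg = int32_t(0U - uint32_t(in[i])); // avoid signed-overflow UB
    out[i] = in[i] > neg ? in[i] : neg;
  }
}
```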
* [X86][SSE] Improve immediate vector shift known bits handling (Simon Pilgrim, 2018-12-17; 1 file, -9/+39)
  Convert VSRAI to VSRLI if the sign bit is known zero, and improve the
  KnownBits output for all shift instructions.

  Fixes the poor codegen comments in D55768.

  llvm-svn: 349407
* [WebAssembly] Fix assembler parsing of br_table. (Wouter van Oortmerssen, 2018-12-17; 6 files, -60/+75)
  Summary: We use `variable_ops` in the tablegen defs to denote the list of
  branch targets in `br_table`, but unlike other uses of `variable_ops`
  (e.g. call), these branch targets need to actually be encoded in the
  instruction. The existing tables for `variable_ops` cause no operands to be
  accepted by the assembly matcher.

  Following the example of ARM:
  https://github.com/llvm-mirror/llvm/blob/2cc0a7da876c1d8c32775b0119e1e15aaa759b9e/lib/Target/ARM/ARMInstrInfo.td#L550-L555
  we introduce a new operand type to capture this list, and we use the same
  {} syntax as ARM as well to differentiate them from regular integer
  operands.

  Also removed the definition and use of TSFlags in the tablegen defs, since
  `br_table` now has a non-variable_ops immediate operand, so the previous
  logic of only the variable_ops arguments being labels didn't make sense
  anymore.

  Reviewers: dschuff, aheejin, sunfish

  Subscribers: javed.absar, sbc100, jgravelle-google, kristof.beyls, llvm-commits

  Differential Revision: https://reviews.llvm.org/D55401

  llvm-svn: 349405
* [X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr. (Craig Topper, 2018-12-17; 1 file, -0/+4)
  These seem to have been missed when the other TBM instructions were added.

  llvm-svn: 349404
* [X86][SSE] Split SimplifyDemandedBitsForTargetNode X86ISD::VSRLI/VSRAI handling. (Simon Pilgrim, 2018-12-17; 1 file, -3/+16)
  First step towards adding more capable combines to fix comments in D55768.

  llvm-svn: 349400
* Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used (Craig Topper, 2018-12-17; 1 file, -22/+50)
  This allows a TEST to be used, and it can be combined with any AND that may
  already exist as an input to the shift.

  This was already done in EmitTest, but was easily tricked by multiple uses
  because the setcc might be used by multiple instructions. Once the SETCC and
  users are legalized then we can look for the shift to be used by a single
  CMP, but the CMP itself can have multiple users.

  This appears to fix the case in PR39968.

  llvm-svn: 349385
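A worked example of the equivalence (a sketch; the mask constant is just the bits the shift would discard):

```cpp
#include <cstdint>

// (x >> 12) == 0 holds exactly when none of bits 12..31 are set, i.e.
// (x & 0xFFFFF000) == 0. The masked form selects a single TEST with an
// immediate, and the AND can merge with one already feeding the shift.
bool fitsInLow12Bits(uint32_t x) {
  return (x >> 12) == 0; // equivalent to (x & 0xFFFFF000u) == 0
}
```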
* [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000) (Simon Pilgrim, 2018-12-17; 2 files, -9/+11)
  This is an initial patch to add the necessary support for a DemandedElts
  argument to SimplifyDemandedBits, more closely matching computeKnownBits and
  helping to improve vector codegen.

  I've added only a small amount of the changes necessary to get at least one
  test to update - a lot more can be done, but I'd like to add these
  methodically with proper test coverage; at the same time the hope is to
  slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits
  as well.

  Differential Revision: https://reviews.llvm.org/D55768

  llvm-svn: 349374
* FastIsel: take care to update iterators when removing instructions. (Tim Northover, 2018-12-17; 4 files, -7/+13)
  We keep a few iterators into the basic block we're selecting while
  performing FastISel. Usually this is fine, but occasionally code wants to
  remove already-emitted instructions. When this happens we have to be careful
  to update those iterators so they're not pointing at dangling memory.

  llvm-svn: 349365
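A generic illustration of the hazard (ordinary C++, not LLVM code): erasing the element a saved iterator refers to invalidates it, so the iterator must be refreshed as part of the erase:

```cpp
#include <list>

// erase() invalidates iterators to the erased element; capturing its return
// value keeps the saved position valid - the same kind of bookkeeping the
// FastISel fix performs on its basic-block iterators.
void removeCurrent(std::list<int> &insts, std::list<int>::iterator &saved) {
  if (saved != insts.end())
    saved = insts.erase(saved); // returns the next valid iterator
}
```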
* [MIPS GlobalISel] Remove switch statement (fix r349346 for MSVC) (Petar Avramovic, 2018-12-17; 1 file, -6/+1)
  Temporarily remove a switch statement without any case labels in function
  legalizeCustom in order to fix r349346 for MSVC.

  llvm-svn: 349356
* ARM: use acquire/release instruction variants when available. (Tim Northover, 2018-12-17; 2 files, -8/+9)
  These features (fairly) recently got split out into their own feature, so we
  should make CodeGen use them when available. The main change here is that
  the check used to be based on the triple, but now it's based on CPU
  features.

  llvm-svn: 349355
* [MIPS GlobalISel] Lower G_UADDE and narrowScalar G_ADD (Petar Avramovic, 2018-12-17; 1 file, -30/+5)
  Lower G_UADDE and legalize G_ADD using narrowScalar on MIPS32.

  Differential Revision: https://reviews.llvm.org/D54580

  llvm-svn: 349346
* [AArch64] Re-run load/store optimizer after aggressive tail duplication (Alexandros Lamprineas, 2018-12-17; 1 file, -0/+6)
  The Load/Store Optimizer runs before Machine Block Placement. At O3 the Tail
  Duplication Threshold is set to 4 instructions and this can create new
  opportunities for the Load/Store Optimizer. It seems worthwhile to run it
  once again.

  llvm-svn: 349338
* [X86] Fix bad operand lookup for cmov introduced in r349315 (Craig Topper, 2018-12-17; 1 file, -1/+1)
  The CC is operand 2, not operand 3.

  llvm-svn: 349330
* [X86] Pull out constant splat rotation detection. (Simon Pilgrim, 2018-12-16; 1 file, -21/+28)
  We had 3 different approaches - now we consistently use
  getTargetConstantBitsFromNode and allow undef elts.

  llvm-svn: 349319
* [X86] Remove truncation handling from EmitTest. Replace it with a DAG combine. (Craig Topper, 2018-12-16; 1 file, -50/+105)
  I'd like to try to move a lot of the flag matching out of EmitTest and push
  it to isel or isel preprocessing. This is a step towards that.

  The test-shrink-bug.ll change is an improvement because we are no longer
  interfering with test shrink handling in isel.

  The pr34137.ll change is a regression, but the IR came from -O0 and was not
  reduced by InstCombine, so it contains a lot of redundancies like duplicate
  loads that made it combine poorly.

  llvm-svn: 349315
* [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859) (Sanjay Patel, 2018-12-16; 1 file, -3/+16)
  This is part of fixing PR39859:
  https://bugs.llvm.org/show_bug.cgi?id=39859

  We have a crippled vector ISA, so we have to invert a typical fold and
  create min/max here. As discussed in the bug report, we can probably do
  better by using saturating subtract when it's available, but we should have
  this improvement for the min/max patterns regardless.

  Alive proofs:
  https://rise4fun.com/Alive/zsf
  https://rise4fun.com/Alive/Qrl

  Differential Revision: https://reviews.llvm.org/D55515

  llvm-svn: 349304
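A scalar model of the trick (a sketch, not from the patch): SSE lacks unsigned vector compares, but adjusting the constant by one turns a strict compare into one answerable with unsigned min/max plus an equality check:

```cpp
#include <cstdint>

// x >u c  <=>  x >=u c+1  <=>  umax(x, c+1) == x. Incrementing the constant
// (valid while c < 255, so c+1 doesn't wrap) lets PMAXUB + PCMPEQB stand in
// for the missing unsigned greater-than; the decrement/umin case is the
// mirror image for less-than.
bool unsignedGreaterThan(uint8_t x, uint8_t c) {
  uint8_t cPlus1 = uint8_t(c + 1);
  uint8_t mx = x > cPlus1 ? x : cPlus1; // umax(x, c+1)
  return mx == x;
}
```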
* [X86] Begin cleaning up combineOr -> SHLD/SHRD. NFCI. (Simon Pilgrim, 2018-12-15; 1 file, -5/+5)
  In preparation for converting to funnel shifts.

  llvm-svn: 349286
* [X86] Lower to SHLD/SHRD on slow machines for optsize (Simon Pilgrim, 2018-12-15; 1 file, -3/+3)
  Use consistent rules for when to lower to SHLD/SHRD on slow machines - this
  fixes a weird issue where the funnel shift gets expanded, but then
  X86ISelLowering's combineOr sees the optsize and combines it to SHLD/SHRD
  anyway, now with the modulo amount guard.

  llvm-svn: 349285
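For reference, the funnel-shift semantics these SHLD/SHRD lowerings implement, as a scalar sketch (including the modulo amount guard mentioned above):

```cpp
#include <cstdint>

// fshl(hi, lo, amt): shift the 64-bit concatenation hi:lo left by amt and
// keep the high 32 bits. The amount is reduced modulo the bit width, and the
// amt == 0 case is split out so lo >> 32 (undefined in C++) never executes.
uint32_t funnelShiftLeft32(uint32_t hi, uint32_t lo, uint32_t amt) {
  amt &= 31;
  if (amt == 0)
    return hi;
  return (hi << amt) | (lo >> (32 - amt));
}
```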