path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* Hexagon: Rename Register class | Matt Arsenault | 2019-06-24 | 1 | -32/+33
| | | | | | This avoids a naming conflict in a future patch. llvm-svn: 364188
* AMDGPU/GlobalISel: Fix RegBankSelect for s1 sext/zext/anyext | Matt Arsenault | 2019-06-24 | 1 | -10/+76
| | | | | | | | This needs different handling if the source is known to be a valid condition or not. Handle turning it into shifts or a select during regbankselect. llvm-svn: 364186
* AMDGPU: Fold frame index into MUBUF | Matt Arsenault | 2019-06-24 | 2 | -10/+49
| | | | | | | | | | | | | | | | This matters for byval uses outside of the entry block, which appear as copies. Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset. This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpful for extreme frames. llvm-svn: 364185
* AMDGPU: Cleanup checking when spills need emergency slots | Matt Arsenault | 2019-06-24 | 1 | -7/+6
| | | | | | Address fixme, which should no longer be a problem since r363757. llvm-svn: 364182
* [ARM] Add MVE interleaving load/store family. | Simon Tatham | 2019-06-24 | 7 | -33/+272
| | | | | | | | | | | | | | | | | | This adds the family of loads and stores with names like VLD20.8 and VST42.32, which load and store parts of multiple q-registers in such a way that executing both VLD20 and VLD21, or all four of VLD40..VLD43, will distribute 2 or 4 vectors' worth of memory data across the lanes of the same number of registers but in a transposed order. In addition to the Tablegen descriptions of the instructions themselves, this patch also adds encode and decode support for the QQPR and QQQQPR register classes (representing the range of loaded or stored vector registers), and tweaks to the parsing system for lists of vector registers to make it return the right format in this case (since, unlike NEON, MVE regards q-registers as primitive, and not just an alias for two d-registers). llvm-svn: 364172
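The "transposed order" described above can be pictured with a small model. The sketch below is only an illustration of the combined effect of a 2-way de-interleaving load (such as executing both VLD20 and VLD21), not the per-instruction behaviour and not code from the patch; the helper name `deinterleave2` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical model: M consecutive memory elements are split across two
// registers of M/2 lanes each. Even-indexed elements land in the first
// register, odd-indexed elements in the second.
template <typename T, std::size_t M>
std::pair<std::array<T, M / 2>, std::array<T, M / 2>>
deinterleave2(const std::array<T, M> &mem) {
  static_assert(M % 2 == 0, "need an even number of elements");
  std::array<T, M / 2> q0{};
  std::array<T, M / 2> q1{};
  for (std::size_t i = 0; i < M / 2; ++i) {
    q0[i] = mem[2 * i];     // elements 0, 2, 4, ...
    q1[i] = mem[2 * i + 1]; // elements 1, 3, 5, ...
  }
  return {q0, q1};
}
```

Under this model, eight consecutive values 0..7 end up as {0, 2, 4, 6} in one register and {1, 3, 5, 7} in the other.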
* [X86] Turn v16i16->v16i8 truncate+store into an any_extend+truncstore if we have avx512f, but not avx512bw. | Craig Topper | 2019-06-23 | 2 | -3/+14
| | | | | | | | | | | | | | | | Ideally we'd be able to represent this truncate as an any_extend to v16i32 and a truncate, but SelectionDAG doesn't know how to not fold those together. We have isel patterns to use a vpmovzxwd+vpmovdb for the truncate, but we aren't able to simultaneously fold the load and the store from the isel pattern. By pulling the truncate into the store we can successfully hide it from the DAG combiner. Then we can isel pattern match the truncstore and load+any_extend separately. llvm-svn: 364163
* [X86] Fix isel pattern that was looking for a bitcasted load. Remove what appears to be a copy/paste mistake. | Craig Topper | 2019-06-23 | 1 | -13/+1
| | | | | | | | | | DAG combine should ensure bitcasts of loads don't exist. Also remove 3 patterns that are identical to the block above them. llvm-svn: 364158
* [X86][SelectionDAG] Cleanup and simplify masked_load/masked_store in tablegen. Use more precise PatFrags for scalar masked load/store. | Craig Topper | 2019-06-23 | 3 | -59/+35
| | | | | | | | | | | | | | | | | | | | Rename masked_load/masked_store to masked_ld/masked_st to discourage their direct use. We need to check truncating/extending and compressing/expanding before using them. This revealed that our scalar masked load/store patterns were misusing these. With those out of the way, renamed masked_load_unaligned and masked_store_unaligned to remove the "_unaligned". We didn't check the alignment anyway so the name was somewhat misleading. Make the aligned versions inherit from masked_load/store instead of from a separate identical version. Merge the 3 different alignments PatFrags into a single version that uses the VT from the SDNode to determine the size that the alignment needs to match. llvm-svn: 364150
* [X86][SSE] Fold extract_subvector(vselect(x,y,z),0) -> vselect(extract_subvector(x,0),extract_subvector(y,0),extract_subvector(z,0)) | Simon Pilgrim | 2019-06-22 | 1 | -0/+10
| | | | | | llvm-svn: 364136
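The fold in this subject line relies on a lane-wise select commuting with taking the low subvector. A minimal sketch of that equivalence on plain arrays follows; it is an illustration under simplifying assumptions (boolean condition lanes, integer data lanes), and the names `vselect`, `extractLow` and `foldHolds` are hypothetical helpers, not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Lane-wise select: takes y[i] where cond[i] is set, otherwise z[i].
template <std::size_t N>
std::array<int, N> vselect(const std::array<bool, N> &cond,
                           const std::array<int, N> &y,
                           const std::array<int, N> &z) {
  std::array<int, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = cond[i] ? y[i] : z[i];
  return out;
}

// Models extract_subvector(v, 0) for a half-width result: the low N/2 lanes.
template <typename T, std::size_t N>
std::array<T, N / 2> extractLow(const std::array<T, N> &v) {
  static_assert(N % 2 == 0, "need an even lane count");
  std::array<T, N / 2> out{};
  for (std::size_t i = 0; i < N / 2; ++i)
    out[i] = v[i];
  return out;
}

// Compares "select wide, then extract" with "extract each operand, then
// select narrow" for one set of inputs.
bool foldHolds(const std::array<bool, 8> &cond, const std::array<int, 8> &y,
               const std::array<int, 8> &z) {
  auto wideFirst = extractLow(vselect(cond, y, z));
  auto narrowFirst = vselect(extractLow(cond), extractLow(y), extractLow(z));
  return wideFirst == narrowFirst;
}
```

Because each output lane depends only on the same lane of the three inputs, the narrow form computes the same low half while touching half as many lanes.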
* [NFC] Fix indentation in PPCAsmPrinter.cpp | Hubert Tong | 2019-06-22 | 1 | -54/+54
| | | | | | | | After r248261, the indentation switches, inside a namespace definition, between indenting and not indenting one level in for that namespace; the abomination occurs in the middle of a class definition. Fix that. llvm-svn: 364133
* [PowerPC][NFC] Move comment to the relevant function | Hubert Tong | 2019-06-22 | 1 | -1/+1
| | | | | | | A comment that applies to a virtual destructor was placed on a class constructor. Move the comment to where it belongs. llvm-svn: 364132
* AArch64: Add support for reading pc using llvm.read_register. | Peter Collingbourne | 2019-06-22 | 1 | -0/+8
| | | | | | | | | | | | | | | This is useful for allowing code to efficiently take an address that can be later mapped onto debug info. Currently the hwasan pass achieves this by taking the address of the current function: http://llvm-cs.pcc.me.uk/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#921 but this costs two instructions (plus a GOT entry in PIC code) per function with stack variables. This will allow the cost to be reduced to a single instruction. Differential Revision: https://reviews.llvm.org/D63471 llvm-svn: 364126
* AArch64: Prefer FP-relative debug locations in HWASANified functions. | Peter Collingbourne | 2019-06-22 | 3 | -11/+18
| | | | | | | | | | | | | | | | | | | | To help produce better diagnostics for stack use-after-return, we'd like to be able to determine the addresses of each HWASANified function's local variables given a small amount of information recorded on entry to the function. Currently we require all HWASANified functions to use frame pointers and record (PC, FP) on function entry. This works better than recording SP because FP cannot change during the function, unlike SP which can change e.g. due to dynamic alloca. However, most variables currently end up using SP-relative locations in their debug info. This prevents us from recomputing the address of most variables because the distance between SP and FP isn't recorded in the debug info. To address this, make the AArch64 backend prefer FP-relative debug locations when producing debug info for HWASANified functions. Differential Revision: https://reviews.llvm.org/D63300 llvm-svn: 364117
* [COFF, ARM64] Fix encoding of debugtrap for Windows | Tom Tan | 2019-06-21 | 4 | -0/+17
| | | | | | | | | | | | On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is mapped to llvm.debugtrap in Clang. Instruction brk #0xF000 is the defined break point instruction on ARM64 which is recognized by Windows debugger and exception handling code, so llvm.debugtrap should map to it instead of redirecting to llvm.trap (brk #1) as the default implementation. Differential Revision: https://reviews.llvm.org/D63635 llvm-svn: 364115
* AMDGPU: Fix not using s33 for scratch wave offset in kernels | Matt Arsenault | 2019-06-21 | 1 | -7/+11
| | | | | | Fixes missing piece from r363990. llvm-svn: 364099
* [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0) | Craig Topper | 2019-06-21 | 3 | -57/+16
| | | | | | | | | | | | | | | | 128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg. This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeros. We can then match the insert_subvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns. Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining. I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload. Differential Revision: https://reviews.llvm.org/D63512 llvm-svn: 364095
* [X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types. | Craig Topper | 2019-06-21 | 1 | -7/+4
| | | | | | | | | We don't have any Custom handling during type legalization, only during operation legalization. Fixes PR42355 llvm-svn: 364093
* [X86] Add a debug print of the node in the default case for unhandled opcodes in ReplaceNodeResults. | Craig Topper | 2019-06-21 | 1 | -0/+4
| | | | | | | | | | This should be unreachable, but bugs can make it reachable. This adds a debug print so we can see the bad node in the output when the llvm_unreachable triggers. llvm-svn: 364091
* [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle | Simon Pilgrim | 2019-06-21 | 1 | -2/+15
| | | | | | Subvector shuffling often ends up as insert/extract subvector. llvm-svn: 364090
* [AArch64][GlobalISel] Implement selection support for the new G_JUMP_TABLE and G_BRJT ops. | Amara Emerson | 2019-06-21 | 2 | -0/+52
| | | | | | | | | | With this we can now fully code generate jump tables, which is important for code size. Differential Revision: https://reviews.llvm.org/D63223 llvm-svn: 364086
* [X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl. | Craig Topper | 2019-06-21 | 2 | -53/+35
| | | | | | | | | | | | | | We already use vmovq for v2i64/v2f64 vzmovl. But we were using a blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or movsd+xorpd under optsize. I think the blend with 0 or movss/d is only needed for vXi32 where we don't have an instruction that can move 32 bits from one xmm to another while zeroing upper bits. movq is no worse than blendpd on any known CPUs. llvm-svn: 364079
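For readers unfamiliar with the node, the message treats vzmovl as "move the lowest element and zero the upper bits". The sketch below models that reading on a plain array; it is an assumption-laden illustration (the helper name `vzmovl` is reused here only as a label), not the X86 backend's definition.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of vzmovl: keep lane 0 of the input and zero every
// other lane, regardless of how wide the vector is.
template <typename T, std::size_t N>
std::array<T, N> vzmovl(const std::array<T, N> &v) {
  static_assert(N > 0, "need at least one lane");
  std::array<T, N> out{}; // value-initialised, so all lanes start at zero
  out[0] = v[0];
  return out;
}
```

With 64-bit lanes this is the shape a single move of the low 64 bits with zeroed upper bits can produce, which is why no separate blend with a zero vector is needed in that case.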
* [AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal. | Amara Emerson | 2019-06-21 | 2 | -4/+7
| | | | | | | | | | | | | | | | | | | | | We sometimes get poor code size because constants of types < 32b are legalized as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the localizer can no longer sink them (although it's possible to extend it to do so). On AArch64 however s8 and s16 constants can be selected in the same way as s32 constants, with a mov pseudo into a W register. If we make s8 and s16 constants legal then we can avoid unnecessary truncates, they can be CSE'd, and the localizer can sink them as normal. There is a caveat: if the user of a smaller constant has to widen the sources, we end up with an anyext of the smaller typed G_CONSTANT. This can cause regressions because of the additional extend and missed pattern matching. To remedy this, there's a new artifact combiner to generate the wider G_CONSTANT if it's legal for the target. Differential Revision: https://reviews.llvm.org/D63587 llvm-svn: 364075
* [AMDGPU] hazard recognizer for fp atomic to s_denorm_mode | Stanislav Mekhanoshin | 2019-06-21 | 9 | -28/+112
| | | | | | | | | This requires 3 wait states unless there is a wait or VALU in between. Differential Revision: https://reviews.llvm.org/D63619 llvm-svn: 364074
* [X86] isBinOp - move commutative ops to isCommutativeBinOp. NFCI. | Simon Pilgrim | 2019-06-21 | 1 | -6/+6
| | | | | | TargetLoweringBase::isBinOp checks isCommutativeBinOp as a fallback, so don't duplicate. llvm-svn: 364072
* Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. | Simon Pilgrim | 2019-06-21 | 1 | -1/+1
| | | | llvm-svn: 364068
* [RISCV] Add RISCV-specific TargetTransformInfo | Sam Elliott | 2019-06-21 | 6 | -2/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: LLVM allows targets to provide information that guides optimisations made to LLVM IR. This is done with callbacks on a TargetTransformInfo object. This patch adds a TargetTransformInfo class for RISC-V. This will allow us to implement RISC-V specific callbacks as they become necessary. This commit also adds the getIntImmCost callbacks, and tests them with a simple constant hoisting test. Our immediate costs are on the conservative side, for the moment, but we prevent hoisting in most circumstances anyway. Previous review was on D63007 Reviewers: asb, luismarques Reviewed By: asb Subscribers: ributzka, MaskRay, llvm-commits, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, apazos, simoncook, johnrusso, rbar, hiraditya, mgorny Tags: #llvm Differential Revision: https://reviews.llvm.org/D63433 llvm-svn: 364046
* [ARM] Add MVE 64-bit GPR <-> vector move instructions. | Simon Tatham | 2019-06-21 | 5 | -0/+216
| | | | | | | | | | | | | | | | | | | | | | | | | | | | These instructions let you load half a vector register at once from two general-purpose registers, or vice versa. The assembly syntax for these instructions mentions the vector register name twice. For the move _into_ a vector register, the MC operand list also has to mention the register name twice (once as the output, and once as an input to represent where the unchanged half of the output register comes from). So we can conveniently assign one of the two asm operands to be the output $Qd, and the other $QdSrc, which avoids confusing the auto-generated AsmMatcher too much. For the move _from_ a vector register, there's no way to get round the fact that both instances of that register name have to be inputs, so we need a custom AsmMatchConverter to avoid generating two separate output MC operands. (And even that wouldn't have worked if it hadn't been for D60695.) Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62679 llvm-svn: 364041
* [ARM] Add MVE vector instructions that take a scalar input. | Simon Tatham | 2019-06-21 | 6 | -2/+440
| | | | | | | | | | | | | | | | | | | This adds the `MVE_qDest_rSrc` superclass and all its instances, plus a few other instructions that also take a scalar input register or two. I've also belatedly added custom diagnostic messages to the operand classes for odd- and even-numbered GPRs, which required matching changes in two of the existing MVE assembly test files. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62678 llvm-svn: 364040
* [X86] X86ISD::ANDNP is a (non-commutative) binop | Simon Pilgrim | 2019-06-21 | 1 | -0/+2
| | | | | | The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but it's getting better. llvm-svn: 364038
* [ARM] Add a batch of similarly encoded MVE instructions. | Simon Tatham | 2019-06-21 | 3 | -1/+345
| | | | | | | | | | | | | | | | | | | | | | | Summary: This adds the `MVE_qDest_qSrc` superclass and all instructions that inherit from it. It's not the complete class of _everything_ with a q-register as both destination and source; it's a subset of them that all have similar encodings (but it would have been hopelessly unwieldy to call it anything like MVE_111x11100). This category includes add/sub with carry; long multiplies; halving multiplies; multiply and accumulate, and some more complex instructions. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62677 llvm-svn: 364037
* [X86] createMMXBuildVector - call with BuildVectorSDNode directly. NFCI. | Simon Pilgrim | 2019-06-21 | 1 | -7/+5
| | | | llvm-svn: 364030
* [ARM] Fix -Wimplicit-fallthrough after D62675 | Fangrui Song | 2019-06-21 | 1 | -0/+2
| | | | llvm-svn: 364028
* [ARM] Add MVE vector compare instructions. | Simon Tatham | 2019-06-21 | 3 | -6/+201
| | | | | | | | | | | | | | | | | | Summary: These take a pair of vector registers to compare, and a comparison type (written in the form of an Arm condition suffix); they output a vector of booleans in the VPR register, where predication can conveniently use them. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62676 llvm-svn: 364027
* [X86] combineAndnp - use isNOT instead of manually checking for (XOR x, -1) | Simon Pilgrim | 2019-06-21 | 1 | -5/+3
| | | | llvm-svn: 364026
* [X86] foldVectorXorShiftIntoCmp - use isConstOrConstSplat. NFCI. | Simon Pilgrim | 2019-06-21 | 1 | -7/+4
| | | | | | Use the isConstOrConstSplat helper instead of inspecting the build vector manually. llvm-svn: 364024
* [X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern | Simon Pilgrim | 2019-06-21 | 1 | -0/+10
| | | | llvm-svn: 364022
* [ARM] Add a batch of MVE floating-point instructions. | Simon Tatham | 2019-06-21 | 3 | -4/+456
| | | | | | | | | | | | | | | | | Summary: This includes floating-point basic arithmetic (add/sub/multiply), complex add/multiply, unary negation and absolute value, rounding to integer value, and conversion to/from integer formats. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62675 llvm-svn: 364013
* Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC | Fangrui Song | 2019-06-21 | 10 | -33/+17
| | | | llvm-svn: 364006
* [MIPS GlobalISel] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63541 | Fangrui Song | 2019-06-21 | 2 | -2/+4
| | | | | | llvm-svn: 364003
* AMDGPU: Always use s33 for global scratch wave offset | Matt Arsenault | 2019-06-20 | 2 | -9/+1
| | | | | | | | | Every called function could possibly need this to calculate the absolute address of stack objects, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR. llvm-svn: 363990
* [ARM GlobalISel] Add support for s64 G_ADD and G_SUB. | Eli Friedman | 2019-06-20 | 2 | -2/+19
| | | | | | | | | | | | | Teach RegisterBankInfo to use the correct register class, and tell the legalizer it's legal. Everything else just works. The one thing that's slightly weird about this compared to SelectionDAG isel is that legalization can't distinguish between i64 and <1 x i64>, so we might end up with more NEON instructions than the user expects. Differential Revision: https://reviews.llvm.org/D63585 llvm-svn: 363989
* [PowerPC][NFC] Fix comments for AltVSXFMARel mapping. | Jinsong Ji | 2019-06-20 | 1 | -3/+2
| | | | llvm-svn: 363987
* AMDGPU: Add intrinsics for DS GWS semaphore instructions | Matt Arsenault | 2019-06-20 | 5 | -25/+72
| | | | llvm-svn: 363983
* AMDGPU: Insert mem_viol check loop around GWS pre-GFX9 | Matt Arsenault | 2019-06-20 | 5 | -19/+129
| | | | | | | It is necessary to emit this loop around GWS operations in case the wave is preempted pre-GFX9. llvm-svn: 363979
* [X86] Add BLSI to isUseDefConvertible. | Craig Topper | 2019-06-20 | 1 | -0/+4
| | | | | | | | | | | | | | | | | | | | | Summary: BLSI sets the C flag if the input is not zero. So if it's followed by a TEST of the input where only the Z flag is consumed, we can replace it with the opposite check of the C flag. We should be able to do the same for BLSMSK and BLSR, but the naive test case for those is being optimized to a subo by CodeGenPrepare. Reviewers: spatel, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63589 llvm-svn: 363957
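The flag relationship this summary relies on can be checked with a small software model. The sketch below assumes the documented BLSI behaviour (result is `src & -src`, CF set when the source is non-zero, ZF set when the result is zero) and models `TEST src, src` as clearing CF and setting ZF when the source is zero; the struct and function names are hypothetical and this is not the code added to isUseDefConvertible.

```cpp
#include <cassert>
#include <cstdint>

struct Flags {
  bool cf; // carry flag
  bool zf; // zero flag
};

// Models BLSI: isolate the lowest set bit of src and report CF/ZF.
Flags blsi(std::uint32_t src, std::uint32_t &dst) {
  dst = src & (0u - src);           // lowest set bit, or 0 if src is 0
  return Flags{src != 0, dst == 0}; // CF = input non-zero, ZF = result zero
}

// Models TEST src, src: CF is cleared, ZF reflects whether src is zero.
Flags testSelf(std::uint32_t src) { return Flags{false, src == 0}; }

// The substitution described above: a consumer of ZF from the TEST can
// instead read the inverse of CF produced by the preceding BLSI.
bool zfFromBlsiCarry(std::uint32_t src) {
  std::uint32_t dst = 0;
  return !blsi(src, dst).cf;
}
```

In this model `testSelf(x).zf` and `zfFromBlsiCarry(x)` agree for every input, which is the property that makes the TEST redundant when only the Z flag is consumed.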
* AMDGPU: Fix ignoring DisableFramePointerElim in leaf functions | Matt Arsenault | 2019-06-20 | 1 | -11/+7
| | | | | | | | The attribute can specify elimination for leaf or non-leaf, so it should always be considered. I copied this bug from AArch64, which probably should also be fixed. llvm-svn: 363949
* AMDGPU: Treat undef as an inline immediate | Matt Arsenault | 2019-06-20 | 2 | -5/+19
| | | | | | | This should only matter in vectors with an undef component, since a full undef vector would have been folded out. llvm-svn: 363941
* [ARM] Add a batch of MVE integer instructions. | Simon Tatham | 2019-06-20 | 3 | -1/+406
| | | | | | | | | | | | | | | | This includes integer arithmetic of various kinds (add/sub/multiply, saturating and not), and the immediate forms of VMOV and VMVN that load an immediate into all lanes of a vector. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62674 llvm-svn: 363936
* [AMDGPU] gfx1010 core wave32 changes | Stanislav Mekhanoshin | 2019-06-20 | 10 | -40/+56
| | | | | | Differential Revision: https://reviews.llvm.org/D63204 llvm-svn: 363934
* [X86] LowerAVXExtend - handle ANY_EXTEND_VECTOR_INREG lowering as well. | Simon Pilgrim | 2019-06-20 | 1 | -6/+10
| | | | llvm-svn: 363922