path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* test commit (add blank line) NFC | Roland Froese | 2019-02-01 | 1 | -0/+1
  llvm-svn: 352897
* [AMDGPU] Fix for vector element insertion | Tim Corringham | 2019-02-01 | 1 | -5/+5
  Summary: Incorrect code was generated when lowering insertelement operations
  for vectors with 8 or 16 bit elements. The value being inserted was not
  adjusted for the position of the element within the 32 bit word, and so only
  the low element within each 32 bit word could receive the intended value.

  Fixed by simply replicating the value to each element of a congruent vector
  before the mask-and-or operation used to update the intended element. A
  number of affected LIT tests have been updated appropriately.

  Reviewers: arsenm, nhaehnle

  Reviewed By: arsenm

  Subscribers: llvm-commits, arsenm, kzhuravl, jvesely, wdng, nhaehnle,
  yaxunl, dstuttard, tpr, t-tye

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D57588

  llvm-svn: 352885
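  A scalar model of the fixed lowering, for illustration only (the names and
  the two-lane word layout are assumptions, not the committed AMDGPU code):

  ```cpp
  #include <cstdint>

  // One 32-bit word of a <N x i16> vector: insert 'elt' at lane 'pos' (0 or 1).
  uint32_t insertIntoWord(uint32_t word, uint16_t elt, unsigned pos) {
    // The fix: replicate the value into every 16-bit lane first, so the mask
    // selects a correctly positioned copy no matter which lane is targeted.
    uint32_t splat = uint32_t(elt) * 0x00010001u;
    uint32_t mask = 0xFFFFu << (16 * pos);
    return (word & ~mask) | (splat & mask);
  }
  ```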
* [X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle | Simon Pilgrim | 2019-02-01 | 1 | -0/+73
  As suggested on PR40318, this patch uses PSLLDQ/PSRLDQ to lower shuffles to
  zero out the ends of a vector, leaving a sequential inner section.

  For pre-SSSE3 we do this for shuffles with zeros at either end (requiring up
  to 3 shifts), but once PSHUFB is available I've limited this to shuffles
  with a single zeroable end (2 shifts).

  Differential Revision: https://reviews.llvm.org/D56784

  llvm-svn: 352883
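  As a sketch of the idea, a hand-written intrinsics analogue (illustrative,
  not the committed lowering): zeroing the high half of a vector with two byte
  shifts.

  ```cpp
  #include <emmintrin.h>

  // Keep the low 8 bytes of v in place and zero the high 8 bytes, using only
  // byte shifts. PSLLDQ shifts zeros in at the low end, PSRLDQ at the high end.
  __m128i zeroHighHalf(__m128i v) {
    v = _mm_slli_si128(v, 8);    // low 8 bytes move to the high half
    return _mm_srli_si128(v, 8); // ...and back, leaving the high half zeroed
  }
  ```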
* [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, BITCAST(SHUFFLE(EXTRACT_SUBVECTOR(SRC1))) | Simon Pilgrim | 2019-02-01 | 1 | -3/+4
  Enable peeking through one-use bitcasts to the subvector shuffle. This still
  depends on the subvector being the same scalar-size, but D57514 has already
  helped with the more tricky patterns.

  llvm-svn: 352879
* [AArch64] Optimize floating point materialization | Adhemerval Zanella | 2019-02-01 | 2 | -28/+23
  This patch changes isFPImmLegal to also return true if the value can be
  encoded as the immediate operand of a logical instruction, besides checking
  the immediate field for fmov. This optimizes some floating point
  materialization, including values used in isinf lowering.

  Reviewed By: rengolin, efriedma, evandro

  Differential Revision: https://reviews.llvm.org/D57044

  llvm-svn: 352866
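  A hedged sketch of the extended check using LLVM's AArch64 addressing-mode
  helpers; this mirrors the description above, not the exact committed code:

  ```cpp
  #include "MCTargetDesc/AArch64AddressingModes.h"
  #include "llvm/ADT/APFloat.h"

  bool isFPImmLegalSketch(const llvm::APFloat &FPImm) {
    // Directly encodable as an FMOV immediate?
    if (llvm::AArch64_AM::getFP64Imm(FPImm) != -1)
      return true;
    // Otherwise: can the raw bit pattern be produced by a logical instruction
    // (e.g. ORR with an immediate) in a GPR and then moved to the FP register?
    uint64_t Bits = FPImm.bitcastToAPInt().getZExtValue();
    return llvm::AArch64_AM::isLogicalImmediate(Bits, /*regSize=*/64);
  }
  ```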
* [X86][BdVer2] Transfer delays from the integer to the floating point unit. | Roman Lebedev | 2019-02-01 | 1 | -1/+4
  Summary: I'm unable to find this number in the "AMD SOG for family 15h".
  llvm-exegesis measures the latencies of these instructions as `2`, which
  matches the latencies specified in "AMD SOG for family 15h".

  However if we look at Agner, Microarchitecture, "AMD Bulldozer, Piledriver,
  Steamroller and Excavator pipeline", "Data delay between different execution
  domains", the int->ivec transfer is listed as `8`..`10`cy of additional
  latency.

  Also, Agner's "Instruction tables", for Piledriver, lists their latencies as
  `12`, which is consistent with `2cy` from exegesis / AMD SOG + `10cy`
  transfer delay.

  An additional data point comes from the fact that Agner's "Instruction
  tables", for Jaguar, lists their latencies as `8`; and "AMD SOG for family
  16h" does state the `+6cy` int->ivec delay, which is consistent with an
  instruction latency of `1` or `2`.

  Reviewers: andreadb, RKSimon, craig.topper

  Reviewed By: andreadb

  Subscribers: gbedwell, courbet, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D57300

  llvm-svn: 352861
* Provide reason messages for unviable inlining | Yevgeny Rouban | 2019-02-01 | 1 | -2/+3
  InlineCost's isInlineViable() is changed to return InlineResult instead of
  bool. This provides messages for failure reasons and makes it possible to
  get more specific messages for cases where callsites are not viable for
  inlining.

  Reviewed By: xbolva00, anemet

  Differential Revision: https://reviews.llvm.org/D57089

  llvm-svn: 352849
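  A minimal sketch of how the richer return type might be consumed (the
  `message` field name follows the InlineResult of this era; treat the exact
  shape as an assumption):

  ```cpp
  #include "llvm/Analysis/InlineCost.h"
  #include "llvm/IR/Function.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  // 'Callee' is an assumed inlining candidate.
  void reportViability(Function &Callee) {
    InlineResult IR = isInlineViable(Callee);
    if (!IR) // InlineResult converts to bool; false means "not viable"
      errs() << "inlining not viable: " << IR.message << "\n";
  }
  ```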
* [RISCV] Implement RV64D codegen | Alex Bradbury | 2019-02-01 | 2 | -4/+30
  This patch:
  * Adds necessary RV64D codegen patterns
  * Modifies CC_RISCV so it will properly handle f64 types (with soft float
    ABI)

  Note that in general there is no reason to try to select fcvt.w[u].d rather
  than fcvt.l[u].d for i32 conversions because fptosi/fptoui produce poison if
  the input won't fit into the target type.

  Differential Revision: https://reviews.llvm.org/D53237

  llvm-svn: 352833
* [opaque pointer types] Add a FunctionCallee wrapper type, and use it. | James Y Knight | 2019-02-01 | 9 | -68/+71
  Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc doesn't
  choke on it, hopefully.

  Original Message: The FunctionCallee type is effectively a
  {FunctionType*,Value*} pair, and is a useful convenience to enable code to
  continue passing the result of getOrInsertFunction() through to EmitCall,
  even once pointer types lose their pointee-type.

  Then:
  - update the CallInst/InvokeInst instruction creation functions to take a
    Callee,
  - modify getOrInsertFunction to return FunctionCallee, and
  - update all callers appropriately.

  One area of particular note is the change to the sanitizer code. Previously,
  they had been casting the result of `getOrInsertFunction` to a `Function*`
  via `checkSanitizerInterfaceFunction`, and storing that. That would report
  an error if someone had already inserted a function declaration with a
  mismatching signature. However, in general, LLVM allows for such mismatches,
  as `getOrInsertFunction` will automatically insert a bitcast if needed. As
  part of this cleanup, cause the sanitizer code to do the same. (It will call
  its functions using the expected signature, however they may have been
  declared.)

  Finally, in a small number of locations, callers of `getOrInsertFunction`
  actually were expecting/requiring that a brand new function was being
  created. In such cases, I've switched them to Function::Create instead.

  Differential Revision: https://reviews.llvm.org/D57315

  llvm-svn: 352827
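  A minimal sketch of the resulting idiom (the function name "foo" and its
  type are illustrative):

  ```cpp
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Module.h"
  using namespace llvm;

  // getOrInsertFunction() now returns a FunctionCallee ({FunctionType*,
  // Value*} pair), which CreateCall accepts directly, so no cast to
  // Function* is needed.
  Value *emitCallToFoo(Module &M, IRBuilder<> &B) {
    FunctionType *FT =
        FunctionType::get(B.getInt32Ty(), {B.getInt32Ty()}, /*isVarArg=*/false);
    FunctionCallee Fn = M.getOrInsertFunction("foo", FT);
    return B.CreateCall(Fn, {B.getInt32(42)});
  }
  ```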
* [WebAssembly] Fix a regression selecting negative build_vector lanes | Thomas Lively | 2019-01-31 | 1 | -1/+1
  Summary: The custom lowering introduced in rL352592 creates build_vector
  nodes with negative i32 operands, but these operands did not meet the value
  range constraints necessary to match build_vector nodes. This CL fixes the
  issue by removing the unnecessary constraints.

  Reviewers: aheejin

  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish

  Differential Revision: https://reviews.llvm.org/D57481

  llvm-svn: 352813
* [RISCV] Add RV64F codegen support | Alex Bradbury | 2019-01-31 | 3 | -2/+130
  This requires a little extra work due to the fact that i32 is not a legal
  type. When call lowering happens post-legalisation (e.g. when an intrinsic
  was inserted during legalisation), a bitcast from f32 to i32 can't be
  introduced. This is similar to the challenges with RV32D. To handle this, we
  introduce target-specific DAG nodes that perform bitcast+anyext for f32->i64
  and trunc+bitcast for i64->f32.

  Differential Revision: https://reviews.llvm.org/D53235

  llvm-svn: 352807
* [Hexagon] Rename textually included file from .h to .inc | Richard Trieu | 2019-01-31 | 2 | -1/+1
  llvm-svn: 352802
* Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it."James Y Knight2019-01-319-71/+68
| | | | | | | | | This reverts commit f47d6b38c7a61d50db4566b02719de05492dcef1 (r352791). Seems to run into compilation failures with GCC (but not clang, where I tested it). Reverting while I investigate. llvm-svn: 352800
* [WebAssembly] Add bulk memory target feature | Thomas Lively | 2019-01-31 | 3 | -16/+30
  Summary: Also clean up some preexisting target feature code.

  Reviewers: aheejin

  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, jfb

  Differential Revision: https://reviews.llvm.org/D57495

  llvm-svn: 352793
* [opaque pointer types] Add a FunctionCallee wrapper type, and use it. | James Y Knight | 2019-01-31 | 9 | -68/+71
  The FunctionCallee type is effectively a {FunctionType*,Value*} pair, and is
  a useful convenience to enable code to continue passing the result of
  getOrInsertFunction() through to EmitCall, even once pointer types lose
  their pointee-type.

  Then:
  - update the CallInst/InvokeInst instruction creation functions to take a
    Callee,
  - modify getOrInsertFunction to return FunctionCallee, and
  - update all callers appropriately.

  One area of particular note is the change to the sanitizer code. Previously,
  they had been casting the result of `getOrInsertFunction` to a `Function*`
  via `checkSanitizerInterfaceFunction`, and storing that. That would report
  an error if someone had already inserted a function declaration with a
  mismatching signature. However, in general, LLVM allows for such mismatches,
  as `getOrInsertFunction` will automatically insert a bitcast if needed. As
  part of this cleanup, cause the sanitizer code to do the same. (It will call
  its functions using the expected signature, however they may have been
  declared.)

  Finally, in a small number of locations, callers of `getOrInsertFunction`
  actually were expecting/requiring that a brand new function was being
  created. In such cases, I've switched them to Function::Create instead.

  Differential Revision: https://reviews.llvm.org/D57315

  llvm-svn: 352791
* [DAG][SystemZ] Define unwrapAddress for PCREL_WRAPPER. | Nirav Dave | 2019-01-31 | 2 | -0/+8
  Summary: Like with X86, this allows better DAG-level alias analysis and
  alignment inference for wrapped addresses.

  Reviewers: jonpa, uweigand

  Reviewed By: uweigand

  Subscribers: hiraditya, llvm-commits

  Differential Revision: https://reviews.llvm.org/D57407

  llvm-svn: 352786
* Revert "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."Craig Topper2019-01-312-6/+2
| | | | | | This is causing a failure in chromium llvm-svn: 352782
* Trim trailing whitespace. NFCI. | Simon Pilgrim | 2019-01-31 | 1 | -1/+1
  llvm-svn: 352775
* [X86][AVX] Fold concat(broadcast(x),broadcast(x)) -> broadcast(x) | Simon Pilgrim | 2019-01-31 | 1 | -6/+5
  Differential Revision: https://reviews.llvm.org/D57514

  llvm-svn: 352774
* [X86][AVX] insert_subvector(bitcast(v), bitcast(s), c1) -> bitcast(insert_subvector(v,s,c2)) | Simon Pilgrim | 2019-01-31 | 1 | -0/+36
  Similar to what we already do in DAGCombiner, but this version also handles
  bitcasts from types with different scalar sizes, which x86 is better at
  handling.

  Differential Revision: https://reviews.llvm.org/D57514

  llvm-svn: 352773
* [X86][AVX] Fold broadcast(bitcast(src)) -> bitcast(broadcast(src)) | Simon Pilgrim | 2019-01-31 | 1 | -0/+8
  llvm-svn: 352751
* [X86] combineExtractWithShuffle - more aggressively peek through bitcasts | Simon Pilgrim | 2019-01-31 | 1 | -4/+8
  Fixes regression introduced by rL352743.

  llvm-svn: 352745
* [X86][AVX] Enable AVX1 broadcasts in shuffle combining | Simon Pilgrim | 2019-01-31 | 1 | -7/+19
  Enables 32/64-bit scalar load broadcasts on AVX1 targets.

  The extractelement-load.ll regression will be fixed shortly in a followup
  commit.

  llvm-svn: 352743
* [X86][AVX] Fold vt1 concat_vectors(vt2 undef, vt2 broadcast(x)) --> vt1 broadcast(x) | Simon Pilgrim | 2019-01-31 | 1 | -1/+5
  If we're not inserting the broadcast into the lowest subvector then we can
  avoid the insertion by just performing a larger broadcast.

  Avoids a regression when we enable AVX1 broadcasts in shuffle combining.

  llvm-svn: 352742
* [ARM] Thumb2: ConstantMaterializationCost | Sjoerd Meijer | 2019-01-31 | 1 | -2/+4
  Constants can also be materialised using the negated value and an MVN, and
  this case seems to have been missed for Thumb2. To check the constant
  materialisation costs, we now call getT2SOImmVal twice, once for the
  original constant and then also for its negated value; this function checks
  whether the constant can be either splatted or rotated.

  This was revealed by a test that optimises for minsize: instead of a LDR
  literal pool load and having a literal pool entry, just an MVN with an
  immediate is smaller (and also faster).

  Differential Revision: https://reviews.llvm.org/D57327

  llvm-svn: 352737
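  As a sketch, the two-sided test could look like this (illustrative, not the
  committed code; MVN materialises the bitwise complement, so the second probe
  uses ~Imm):

  ```cpp
  #include "MCTargetDesc/ARMAddressingModes.h"

  // A constant is cheap if either it or its complement has a valid Thumb2
  // modified-immediate encoding; getT2SOImmVal returns -1 when there is none.
  static bool hasCheapT2Encoding(uint32_t Imm) {
    return llvm::ARM_AM::getT2SOImmVal(Imm) != -1 ||  // MOV with encoded imm
           llvm::ARM_AM::getT2SOImmVal(~Imm) != -1;   // MVN with encoded ~imm
  }
  ```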
* [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS | Sjoerd Meijer | 2019-01-31 | 3 | -0/+18
  And instead just generate a libcall. My motivating example on ARM was a
  simple:

    shl i64 %A, %B

  for which the code bloat is quite significant. For other targets that also
  accept __int128/i128, such as AArch64 and X86, it is also beneficial for
  these cases to generate a libcall when optimising for minsize. On these
  64-bit targets, the 64-bit shifts are of course unaffected because the
  SHIFT/SHIFT_PARTS lowering operation action is not set to custom/expand.

  Differential Revision: https://reviews.llvm.org/D57386

  llvm-svn: 352736
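  For illustration, the C++ analogue of the motivating case, assuming a
  32-bit ARM target built at minsize (the runtime routine name follows the
  ARM EABI):

  ```cpp
  // At -Oz on 32-bit ARM this variable i64 shift is now emitted as a call to
  // the EABI helper __aeabi_llsl instead of an inlined SHIFT_PARTS sequence.
  long long shiftLeft(long long a, int b) { return a << b; }
  ```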
* GlobalISel: Handle odd splits in fewerElementsVector for load/store | Matt Arsenault | 2019-01-31 | 1 | -1/+2
  llvm-svn: 352720
* GlobalISel: Implement narrowScalar for bswap | Matt Arsenault | 2019-01-31 | 1 | -1/+5
  llvm-svn: 352719
* GlobalISel: Allow bitcount ops to have different result type | Matt Arsenault | 2019-01-31 | 2 | -9/+18
  For AMDGPU the result is always 32-bit for 64-bit inputs.

  llvm-svn: 352717
* GlobalISel: Fix creating MMOs with align 0 | Matt Arsenault | 2019-01-31 | 5 | -14/+21
  llvm-svn: 352712
* [X86] Remove handling of ISD::INTRINSIC_WO_CHAIN in ReplaceNodeResults. | Craig Topper | 2019-01-31 | 1 | -6/+0
  I believe this was there to handle avx512bw intrinsics that returned i64
  type in 32-bit mode. But all those intrinsics have since been changed to
  v64i1 results or replaced with generic IR.

  llvm-svn: 352698
* [GlobalISel][AArch64] Select G_FEXP | Jessica Paquette | 2019-01-30 | 2 | -1/+3
  This teaches the legalizer to handle G_FEXP in AArch64. As a result, it also
  allows us to select G_FEXP.

  It...
  - Updates the legalizer-info tests
  - Adds a test for legalizing exp
  - Updates the existing fp tests to show that we can now select G_FEXP

  https://reviews.llvm.org/D57483

  llvm-svn: 352692
* [PowerPC] delete no-longer-needed workaround for readsRegister() in PowerPC | Chen Zheng | 2019-01-30 | 1 | -14/+0
  Differential Revision: https://reviews.llvm.org/D57439

  llvm-svn: 352689
* [GlobalISel][AArch64] Select G_FABS | Jessica Paquette | 2019-01-30 | 2 | -1/+2
  This adds instruction selection support for G_FABS in AArch64. It also
  updates the existing basic FP tests, and adds a selection test for G_FABS.

  https://reviews.llvm.org/D57418

  llvm-svn: 352684
* [WebAssembly] Restore stack pointer right after catch instruction | Heejin Ahn | 2019-01-30 | 5 | -98/+37
  Summary: After the stack is unwound due to a thrown exception, the
  `__stack_pointer` global can point to an invalid address. So a `global.set`
  to restore `__stack_pointer` should be inserted right after the `catch`
  instruction.

  But after r352598 the `global.set` instruction is inserted not right after
  `catch` but after the `block` - `br_on_exn` - `end_block` -
  `extract_exception` sequence. This CL fixes it.

  While doing that, we can actually move the ReplacePhysRegs pass after
  LateEHPrepare and merge the EHRestoreStackPointer pass into LateEHPrepare,
  and now placing the `global.set` to `__stack_pointer` right after `catch`
  is much easier. Otherwise it is hard to guarantee that `global.set` is
  still right after `catch` and not touched by other transformations, in
  which case we would have to do something to hoist it.

  Reviewers: dschuff

  Subscribers: mgorny, sbc100, jgravelle-google, sunfish, llvm-commits

  Differential Revision: https://reviews.llvm.org/D57421

  llvm-svn: 352681
* [GlobalISel][AArch64] Add instruction selection support for @llvm.log2 | Jessica Paquette | 2019-01-30 | 2 | -1/+2
  This teaches GlobalISel to emit a RTLib call for @llvm.log2 when it
  encounters it.

  It updates the existing floating point tests to show that we don't fall
  back on the intrinsic, and select the correct instructions. It also adds a
  legalizer test for G_FLOG2.

  https://reviews.llvm.org/D57357

  llvm-svn: 352673
* [GlobalISel][AArch64] Add instruction selection support for @llvm.sqrt | Jessica Paquette | 2019-01-30 | 2 | -1/+2
  This teaches the legalizer about G_FSQRT in AArch64. Also adds a legalizer
  test for G_FSQRT, a selection test for it, and updates existing floating
  point tests.

  https://reviews.llvm.org/D57361

  llvm-svn: 352671
* Add a 'dynamic' parameter to the objectsize intrinsic | Erik Pilkington | 2019-01-30 | 1 | -1/+2
  This is meant to be used with clang's __builtin_dynamic_object_size. When
  'true' is passed to this parameter, the intrinsic has the potential to be
  folded into instructions that will be evaluated at run time. When 'false',
  the objectsize intrinsic behaviour is unchanged.

  rdar://32212419

  Differential revision: https://reviews.llvm.org/D56761

  llvm-svn: 352664
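  A small usage sketch from the source-language side (the overflow check is a
  hypothetical use, not part of this change):

  ```cpp
  #include <cstdlib>
  #include <cstring>

  // clang lowers __builtin_dynamic_object_size to llvm.objectsize with the
  // new 'dynamic' flag set, so the size may be computed at run time instead
  // of folding to a constant; (size_t)-1 means "unknown".
  void checkedCopy(char *dst, const char *src, size_t n) {
    size_t avail = __builtin_dynamic_object_size(dst, 0);
    if (avail != (size_t)-1 && n > avail)
      abort(); // hypothetical out-of-bounds handler
    memcpy(dst, src, n);
  }
  ```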
* [X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7. | Craig Topper | 2019-01-30 | 2 | -2/+6
  This fixes the test case in PR35982 by preventing MMX instructions that
  read MM0-7 from being moved below EMMS/FEMMS by the post RA scheduler.

  Though as discussed in bugzilla, this is not a complete fix. There is still
  the possibility of reordering in IR or by the pre-RA scheduler.

  Differential Revision: https://reviews.llvm.org/D57298

  llvm-svn: 352660
* AMDGPU: Stop generating unused intrinsic .inc files | Matt Arsenault | 2019-01-30 | 1 | -2/+0
  llvm-svn: 352635
* [X86][AVX] Prefer to combine shuffle to broadcasts whenever possible | Simon Pilgrim | 2019-01-30 | 1 | -11/+14
  This is the first step towards improving broadcast support on AVX1 targets.

  llvm-svn: 352634
* [RISCV] Insert R_RISCV_ALIGN relocation type and Nops for code alignment when linker relaxation enabled | Shiva Chen | 2019-01-30 | 4 | -1/+70
  Linker relaxation may change code size. We need to fix up the alignment of
  each alignment directive in the text section by inserting Nops and an
  R_RISCV_ALIGN relocation type, so that the linker can satisfy the alignment
  by removing Nops.

  To do this:

  1. Add the shouldInsertExtraNopBytesForCodeAlign target hook to calculate
     the Nops we need to insert (see the sketch after this entry).
  2. Add the shouldInsertFixupForCodeAlign target hook to insert the
     R_RISCV_ALIGN fixup type.

  Differential Revision: https://reviews.llvm.org/D47755

  llvm-svn: 352616
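  A hedged sketch of the worst-case padding computation behind the first hook
  (illustrative; it assumes the minimum instruction length is 2 bytes with
  the C extension and 4 without):

  ```cpp
  // Returns true and sets Size when an align-to-'Alignment' directive needs
  // relaxable padding: enough nops that the linker, guided by R_RISCV_ALIGN,
  // can delete some of them to satisfy the alignment after relaxation.
  static bool extraNopBytesForCodeAlign(unsigned Alignment, bool HasRVC,
                                        unsigned &Size) {
    unsigned MinNopLen = HasRVC ? 2 : 4;
    if (Alignment <= MinNopLen)
      return false; // no padding needed beyond a single minimal nop
    Size = Alignment - MinNopLen; // worst-case bytes of nops to reserve
    return true;
  }
  ```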
* [X86] Remove unnecessary code from the top of handleCompareFP in X86FloatingPoint.cpp. | Craig Topper | 2019-01-30 | 1 | -2/+0
  There were checks to ensure some tables were sorted, but those tables
  aren't used by this function. The same tables are checked in the function
  that does use them. Maybe this was copy/pasted?

  llvm-svn: 352609
* [X86] Remove a couple places where we unnecessarily pass 0 to the EmitPriority of some FP instruction aliases. NFC | Craig Topper | 2019-01-30 | 1 | -4/+4
  As far as I can tell we already won't emit these aliases due to an operand
  count check in the tablegen code. Removing these because I couldn't make
  sense of the inconsistency between fadd and fmul from reading the code.

  I checked the AsmMatcher and AsmWriter files before and after this change
  and there were no differences.

  llvm-svn: 352608
* [X86] Add FPSW as a Def on some FP instructions that were missing it. | Craig Topper | 2019-01-30 | 1 | -5/+5
  llvm-svn: 352607
* GlobalISel: Implement fewerElementsVector for select | Matt Arsenault | 2019-01-30 | 1 | -1/+20
  llvm-svn: 352601
* AMDGPU/GlobalISel: Fix clamping shifts with 16-bit insts | Matt Arsenault | 2019-01-30 | 1 | -2/+3
  llvm-svn: 352599
* [WebAssembly] Exception handling: Switch to the new proposal | Heejin Ahn | 2019-01-30 | 16 | -513/+351
  Summary: This switches the EH implementation to the new proposal:
  https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md
  (The previous proposal was
  https://github.com/WebAssembly/exception-handling/blob/master/proposals/old/Exceptions.md)

  - Instruction changes
    - Now we have one single `catch` instruction that returns an except_ref
      value
    - `throw` now can take a variable number of operands
    - `rethrow` does not have a 'depth' argument anymore
    - `br_on_exn` queries an except_ref to see if it matches the tag and
      branches to the given label if true.
    - `extract_exception` is a pseudo instruction that simulates popping
      values from the wasm stack. This is to make `br_on_exn`, a very special
      instruction, work: `br_on_exn` puts values onto the stack only if it is
      taken, and the # of values can vary depending on the tag.
  - Now that there's only one `catch` per `try`, this patch removes all
    special handling for the terminate pad with a call to
    `__clang_call_terminate`. Before, it was the only case with two catch
    clauses (a normal `catch` and a `catch_all` per `try`).
  - Make `rethrow` act as a terminator like `throw`. This splits the BB after
    `rethrow` in WasmEHPrepare, and deletes an unnecessary `unreachable`
    after `rethrow` in LateEHPrepare.
  - Now we stop at all catchpads (because we add the wasm `catch` instruction
    that catches all exceptions); this creates a new
    `findWasmUnwindDestinations` function in SelectionDAGBuilder.
  - Now we use the `br_on_exn` instruction to figure out if an except_ref
    matches the current tag or not; LateEHPrepare generates this sequence for
    catch pads:
    ```
    catch
    block i32
    br_on_exn $__cpp_exception
    end_block
    extract_exception
    ```
  - Branch analysis for `br_on_exn` in WebAssemblyInstrInfo
  - Other various misc. changes to switch to the new proposal.

  Reviewers: dschuff

  Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

  Differential Revision: https://reviews.llvm.org/D57134

  llvm-svn: 352598
* [PowerPC] [NFC] Create a helper function to copy register to particular register class at PPCFastISel | Zi Xuan Wu | 2019-01-30 | 1 | -35/+18
  Factor the register copy code into a common helper function with the
  following signature:

    unsigned copyRegToRegClass(const TargetRegisterClass *ToRC,
                               unsigned SrcReg, unsigned Flag = 0,
                               unsigned SubReg = 0);

  Differential Revision: https://reviews.llvm.org/D57368

  llvm-svn: 352596
* GlobalISel: Support narrowScalar for uneven loads | Matt Arsenault | 2019-01-30 | 1 | -0/+8
  llvm-svn: 352594