summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [RISCV] Add mnemonic alias: move, sbreak and scall.Alex Bradbury2018-08-081-0/+8
| | | | | | | | | Further improve compatibility with the GNU assembler. Differential Revision: https://reviews.llvm.org/D50217 Patch by Kito Cheng. llvm-svn: 339255
* [RISCV] Add InstAlias definitions for add[w], and, xor, or, sll[w], srl[w], ↵Alex Bradbury2018-08-081-0/+31
| | | | | | | | | | | | sra[w], slt and sltu with immediate Match the GNU assembler in supporting immediate operands for these instructions even when the reg-reg mnemonic is used. Differential Revision: https://reviews.llvm.org/D50046 Patch by Kito Cheng. llvm-svn: 339252
* [ARM] FP16: codegen support for VEXTSjoerd Meijer2018-08-081-6/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D50427 llvm-svn: 339241
* [ARM] FP16: vector vmov and vdup supportSjoerd Meijer2018-08-081-0/+13
| | | | | | | | This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50329 llvm-svn: 339238
* [ARM] FP16: vector VMUL variantsSjoerd Meijer2018-08-081-2/+14
| | | | | | | | This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50326 llvm-svn: 339232
* [Wasm] Don't iterate over MachineBasicBlock::successors while erasing from itBenjamin Kramer2018-08-081-8/+10
| | | | | | This will read out of bounds. Found by asan. llvm-svn: 339230
* [ARM] FP16: support vector INT_TO_FP and FP_TO_INTSjoerd Meijer2018-08-081-7/+35
| | | | | | | | This adds codegen support for the different vcvt_f16 variants. Differential Revision: https://reviews.llvm.org/D50393 llvm-svn: 339227
* [ARM] FP16: support the vector vmin and vmax variantsSjoerd Meijer2018-08-081-0/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D50238 llvm-svn: 339221
* AMDGPU: Remove broken i16 ternary patternsJan Vesely2018-08-071-11/+0
| | | | | | | | | | | | | | | Fixup test to check for GCN prefix These patterns always zero extend the result even though it might need sign extension. This has been broken since the addition of i16 support. It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence: v_mad_i16 v2, v10, v9, v11 v_med3_i32 v2, v2, v8, v7 Fixes mad_sat(char) piglit on VI. Differential Revision: https://reviews.llvm.org/D49836 llvm-svn: 339190
* [WebAssembly] Update SIMD binary arithmeticDerek Schuff2018-08-0714-18/+110
| | | | | | | | | | | | Add missing SIMD types (v2f64) and binary ops. Also adds tablegen support for automatically prepending prefix byte to SIMD opcodes. Differential Revision: https://reviews.llvm.org/D50292 Patch by Thomas Lively llvm-svn: 339186
* [Hexagon] Allow use of gather intrinsics even with no-packetsKrzysztof Parzyszek2018-08-072-9/+0
| | | | | | | | | Vgather requires must be in a packet with a store, which contradicts the no-packets feature. As a consequence, gather/scatter could not be used with no-packets. Relax this, and allow gather packets as exceptions to the no-packets requirements. llvm-svn: 339177
* [WebAssembly] CFG sort support for exception handlingHeejin Ahn2018-08-071-54/+184
| | | | | | | | | | | | | | | | | Summary: This patch extends CFGSort pass to support exception handling. Once it places a loop header, it does not place blocks that are not dominated by the loop header until all the loop blocks are sorted. This patch extends the same algorithm to exception 'catch' part, using the information calculated by WebAssemblyExceptionInfo class. Reviewers: dschuff, sunfish Subscribers: sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D46500 llvm-svn: 339172
* [SelectionDAG][X86][SystemZ] Add a generic ↵Craig Topper2018-08-072-8/+0
| | | | | | | | nonvolatile_store/nonvolatile_load pattern fragment in TargetSelectionDAG.td Differential Revision: https://reviews.llvm.org/D50358 llvm-svn: 339156
* [ARM] FP16: codegen support for VACGTSjoerd Meijer2018-08-071-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D50236 llvm-svn: 339148
* [SystemZ] Comment update.Jonas Paulsson2018-08-071-1/+1
| | | | | | | Update the comment in nextGroup since the ProcResourceCounters are not anymore always decremented with '1'. llvm-svn: 339140
* [SystemZ] NFC: Remove redundant check in SystemZHazardRecognizer.Jonas Paulsson2018-08-071-4/+3
| | | | | | | | Remove the redundant check against zero when updating ProcResourceCounters in nextGroup(), as pointed out in https://reviews.llvm.org/D50187. Review: Ulrich Weigand. llvm-svn: 339139
* [mips] Handle branch expansion corner casesAleksandar Beserminji2018-08-072-93/+160
| | | | | | | | | | | | When potential jump instruction and target are in the same segment, use jump instruction with immediate field. In cases where offset does not fit immediate value of a bc/j instructions, offset is stored into register, and then jump register instruction is used. Differential Revision: https://reviews.llvm.org/D48019 llvm-svn: 339126
* AMDGPU: Add feature vi-instsMatt Arsenault2018-08-073-2/+10
| | | | | | | | | | | | | This is necessary to add a VI specific builtin, __builtin_amdgcn_s_dcache_wb. We already have an overly specific feature for one of these builtins, for s_memrealtime. I'm not sure whether it's better to add more of those, or to get rid of that and merge it with vi-insts. Alternatively, maybe this logically goes with scalar-stores? llvm-svn: 339104
* [SelectionDAG][X86] Rename MaskedLoadSDNode::getSrc0 to getPassThru.Craig Topper2018-08-071-13/+14
| | | | | | Src0 doesn't really convey any meaning to what the operand is. Passthru matches what's used in the documentation for the intrinsic this comes from. llvm-svn: 339101
* [SelectionDAG][X86] Rename getValue to getPassThru for gather SDNodes.Craig Topper2018-08-072-16/+19
| | | | | | getValue is more meaningful name for scatter than it is for gather. Split them and use getPassThru for gather. llvm-svn: 339096
* [WebAssembly] Enable atomic expansion for unsupported atomicrmwsHeejin Ahn2018-08-073-4/+23
| | | | | | | | | | | | | | | | Summary: Wasm does not have direct counterparts to some of LLVM IR's atomicrmw instructions (min, max, umin, umax, and nand). This enables atomic expansion using cmpxchg instruction within a loop for those atomicrmw instructions. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49440 llvm-svn: 339084
* [WebAssembly] Replace SIMD expression types with V128Derek Schuff2018-08-063-23/+13
| | | | | | | | | | | | Summary: The spec only defines a SIMD expression type of V128 and leaves interpretation of different vector types to the instructions. Differential Revision: https://reviews.llvm.org/D50367 Patch by Thomas Lively llvm-svn: 339082
* AMDGPU: cvt_pk_rtz_f16 canonicalizesMatt Arsenault2018-08-062-1/+15
| | | | llvm-svn: 339078
* AMDGPU: Handle some vector operations in isCanonicalizedMatt Arsenault2018-08-061-0/+20
| | | | llvm-svn: 339077
* AMDGPU: Push fcanonicalize through partially constant build_vectorMatt Arsenault2018-08-061-1/+37
| | | | | | | This usually avoids some re-packing code, and may help find canonical sources. llvm-svn: 339072
* AMDGPU: Refactor fcanonicalize combineMatt Arsenault2018-08-062-36/+32
| | | | | | This will make more complex combines easier. llvm-svn: 339070
* AMDGPU: Treat more custom operations as canonicalizingMatt Arsenault2018-08-062-2/+21
| | | | | | | | | | | | | | Everything should quiet, and I think everything should flush. I assume the min3/med3/max3 follow the same rules as regular min/max for flushing, which should at least be conservatively correct. There are still more operations that need to be handled. llvm-svn: 339065
* AMDGPU: Conversions always produce canonical resultsMatt Arsenault2018-08-061-7/+2
| | | | | | | | | Not sure why this was checking for denormals for f16. My interpretation of the IEEE standard is conversions should produce a canonical result, and the ISA manual says denormals are created when appropriate. llvm-svn: 339064
* AMDGPU: Fix implementation of isCanonicalizedMatt Arsenault2018-08-062-46/+77
| | | | | | | | | | | | | | | If denormals are enabled, denormals are canonical. Also fix a few other issues. minnum/maxnum are supposed to canonicalize. Temporarily improve workaround for the instruction behavior change in gfx9. Handle selects and fcopysign. The tests were also largely broken, since they were checking for a flush used on some targets after the store of the result. llvm-svn: 339061
* [X86] Recognize a splat of negate in isFNEGEaswaran Raman2018-08-061-18/+77
| | | | | | | | | | | | | | | | | | | Summary: Expand isFNEG so that we generate the appropriate F(N)M(ADD|SUB) instructions in more cases. For example, the following sequence a = _mm256_broadcast_ss(f) d = _mm256_fnmadd_ps(a, b, c) generates an fsub and fma without this patch and an fnma with this change. Reviewers: craig.topper Subscribers: llvm-commits, davidxl, wmi Differential Revision: https://reviews.llvm.org/D48467 llvm-svn: 339043
* [X86] When using "and $0" and "orl $-1" to store 0 and -1 for minsize, make ↵Craig Topper2018-08-061-6/+12
| | | | | | | | | | sure the store isn't volatile If the store is volatile this might be a memory mapped IO access. In that case we shouldn't generate a load that didn't exist in the source Differential Revision: https://reviews.llvm.org/D50270 llvm-svn: 339041
* AMDGPU: Fold v_lshl_or_b32 with 0 src0Matt Arsenault2018-08-061-0/+13
| | | | | | Appears from expansion of some packed cases. llvm-svn: 339025
* [AArch64] Fix assertion failure on widened f16 BUILD_VECTORBryan Chan2018-08-061-0/+9
| | | | | | | | | | | | | | | | Summary: Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the same type. This fixes an assertion failure in VerifySDNode. Reviewers: SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50202 llvm-svn: 339013
* ARM-MachO: don't add Thumb bit for addend to non-external relocation.Tim Northover2018-08-061-0/+1
| | | | | | | | | ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes out that part of any addend in an object file. But it only does that for symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting that extra bit in other cases. llvm-svn: 339007
* Enrich inline messagesDavid Bolvansky2018-08-051-6/+11
| | | | | | | | | | | | | | | | | | | | | | Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions. 1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message. 2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision. 3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost. 4. Adjusted tests for changed printing. Patch by: yrouban (Yevgeny Rouban) Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00 Reviewed By: tejohnson, xbolva00 Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith Differential Revision: https://reviews.llvm.org/D49412 llvm-svn: 338969
* [X86] Add isel patterns for atomic_load+sub+atomic_sub.Craig Topper2018-08-031-2/+1
| | | | | | Despite the comment removed in this patch, this is beneficial when the RHS of the sub is a register. llvm-svn: 338930
* [X86] Remove RELEASE_ and ACQUIRE_ pseudo instructions. Use isel patterns ↵Craig Topper2018-08-032-170/+73
| | | | | | | | | | | | and the normal instructions instead At one point in time acquire implied mayLoad and mayStore as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile which preserves the ordering. So from what I can tell we shouldn't need additional pseudos since they aren't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions. Differential Revision: https://reviews.llvm.org/D50212 llvm-svn: 338925
* DAG: Enhance isKnownNeverNaNMatt Arsenault2018-08-033-9/+90
| | | | | | | | | | | | Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits. llvm-svn: 338910
* [NVPTX] Handle __nvvm_reflect("__CUDA_ARCH").Artem Belevich2018-08-033-5/+12
| | | | | | | | | | | | | | Summary: libdevice in recent CUDA versions relies on __nvvm_reflect() to select GPU-specific bitcode. This patch addresses the requirement. Reviewers: jlebar Subscribers: jholewinski, sanjoy, hiraditya, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D50207 llvm-svn: 338908
* [X86] Add a DAG combine for the __builtin_parity idiom used by clang to ↵Craig Topper2018-08-031-0/+71
| | | | | | | | | | | | | | enable better codegen Clang uses "ctpop & 1" to implement __builtin_parity. If the popcnt instruction isn't supported this generates a large amount of code to calculate the population count. Instead we can bisect the data down to a single byte using xor and then check the parity flag. Even when popcnt is supported, its still a good idea to split 64-bit data on 32-bit targets using an xor in front of a single popcnt. Otherwise we get two popcnts and an add before the and. I've specifically targeted this at the sizes supported by clang builtins, but we could generalize this if we think that's useful. Differential Revision: https://reviews.llvm.org/D50165 llvm-svn: 338907
* [WebAssembly] Cleanup of the way globals and global flags are handledNicholas Wilson2018-08-038-16/+35
| | | | | | Differential Revision: https://reviews.llvm.org/D44030 llvm-svn: 338894
* [SystemZ] Improve handling of instructions which expand to several groupsJonas Paulsson2018-08-036-129/+170
| | | | | | | | | | | Some instructions expand to more than one decoder group. This has been hitherto ignored, but is handled with this patch. Review: Ulrich Weigand https://reviews.llvm.org/D50187 llvm-svn: 338849
* [ARM] FP16: support vector zip and unzipSjoerd Meijer2018-08-031-0/+4
| | | | | | | | This is addressing PR38404. Differential Revision: https://reviews.llvm.org/D50186 llvm-svn: 338835
* [ARM] FP16: support VFMASjoerd Meijer2018-08-031-0/+6
| | | | | | This is addressing PR38404. llvm-svn: 338830
* [X86] Remove all the vector NOP bitcast patterns. Use a few lines of code in ↵Craig Topper2018-08-032-117/+10
| | | | | | | | | | the Select method in X86ISelDAGToDAG.cpp instead. There are a lot of permutations of types here generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function. The test changes are because we have a some shuffle patterns that have a bitcast as their root node. But the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change. llvm-svn: 338824
* [X86] Support fp128 and/or/xor/load/store with VEX and EVEX encoded ↵Craig Topper2018-08-033-47/+81
| | | | | | | | | | | | instructions. Move all the patterns to X86InstrVecCompiler.td so we can keep SSE/AVX/AVX512 all in one place. To save some patterns we'll use an existing DAG combine to convert f128 fand/for/fxor to integer when sse2 is enabled. This allows use to reuse all the existing patterns for v2i64. I believe this now makes SHA instructions the only case where VEX/EVEX and legacy encoded instructions could be generated simultaneously. llvm-svn: 338821
* [X86] When post-processing the DAG to remove zero extending moves for ↵Craig Topper2018-08-031-0/+8
| | | | | | | | YMM/ZMM, make sure the producing instruction is VEX/XOP/EVEX encoded. If the producing instruction is legacy encoded it doesn't implicitly zero the upper bits. This is important for the SHA instructions which don't have a VEX encoded version. We might also be able to hit this with the incomplete f128 support that hasn't been ported to VEX. llvm-svn: 338812
* [X86] Add R13D to the isInefficientLEAReg in FixupLEAs.Craig Topper2018-08-031-1/+2
| | | | | | I'm assuming the R13 restriction extends to R13D. Guessing this restriction is related to the funny encoding of this register as base always requiring a displacement to be encoded. llvm-svn: 338806
* [X86] Prevent promotion of i16 add/sub/and/or/xor to i32 if we can fold an ↵Craig Topper2018-08-033-2/+32
| | | | | | | | atomic load and atomic store. This makes them consistent with i8/i32/i64. Which still seems to be more aggressive on folding than icc, gcc, or MSVC. llvm-svn: 338795
* [AMDGPU] Minor change to d16 buffer load implementationTim Renouf2018-08-022-18/+8
| | | | | | | | | | | | | Summary: By not reconstructing the operand list of the SDNode, this change makes it easier to add the forthcoming new tbuffer and buffer intrinsics. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49995 Change-Id: I0cb79ef0801532645d7dd954a6d7355139db7b38 llvm-svn: 338784
OpenPOWER on IntegriCloud