summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Reduce the number of times we query the subtarget for the same information.Eric Christopher2017-03-311-5/+4
| | | | llvm-svn: 299278
* Small cleanup to remove extraneous cast.Eric Christopher2017-03-311-2/+1
| | | | llvm-svn: 299277
* [Hexagon] Remove unused variablesKrzysztof Parzyszek2017-03-313-18/+4
| | | | | | Found by PVS-Studio. Fixes llvm.org/PR31676. llvm-svn: 299262
* [Hexagon] Fix typo in HexagonEarlyIfCConv.cppKrzysztof Parzyszek2017-03-311-1/+1
| | | | | | Found by PVS-Studio. Fixes llvm.org/PR32480. llvm-svn: 299258
* [AMDGPU] Remove assumption that vector and scalar types do not aliasStanislav Mekhanoshin2017-03-311-8/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D31547 llvm-svn: 299250
* AMDGPU: Remove unnecessary ands when f16 is legalMatt Arsenault2017-03-316-2/+57
| | | | | | | | | | Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result. Alternatively could try making v2f16 legal and canonicalizing on build_vectors. llvm-svn: 299246
* AMDGPU/R600: Fix amdgpu alias analysis pass.Jan Vesely2017-03-312-5/+11
| | | | | | | | | R600 uses higher AS number to access kernel parameters Fixes: r298846 Differential Revision: https://reviews.llvm.org/D31520 llvm-svn: 299245
* [AArch64] Add new subtarget feature to fold LSL into address mode.Balaram Makam2017-03-313-5/+53
| | | | | | | | | | | | | Summary: This feature enables folding of logical shift operations of up to 3 places into addressing mode on Kryo and Falkor that have a fastpath LSL. Reviewers: mcrosier, rengolin, t.p.northover Subscribers: junbuml, gberry, llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D31113 llvm-svn: 299240
* [AVX-512] Update lowering for gather/scatter prefetch intrinsics to match ↵Craig Topper2017-03-311-3/+3
| | | | | | | | | | the immediate encodings the frontend uses based on the _MM_HINT_T0/T1 constant values in clang's headers. Our _MM_HINT_T0/T1 constant values are 3/2 which matches gcc, but not icc or Intel documentation. Interestingly gcc had this same bug on their implementation of the gather/scatter builtins at one point too. Fixes PR32411. llvm-svn: 299234
* [mips][msa] Prevent output operand from commuting for dpadd_[su].df insPetar Jovanovic2017-03-312-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | Implementation of TargetInstrInfo::findCommutedOpIndices for MIPS target, restricting commutativity to second and third operand only for dpaadd_[su].df instructions therein. Prior to this change, there were cases where the vector that is to be added to the dot product of the other two could take a position other than the first one in the instruction, generating false output in the destination vector. Such behavior has been noticed in the two functions generating v2i64 output values so far. Other ones may exhibit such behavior as well, just not for the vector operands which are present in the test at the moment. Tests altered so that the function's first operand is a constant splat so that it can be loaded with a ldi instruction, since that is the case in which the erroneous instruction operand placement has occurred. We check that the register which is present in the ldi instruction is placed as the first operand in the corresponding dpadd instruction. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D30827 llvm-svn: 299223
* [SystemZ] Make sure of correct regclasses in insertSelect()Jonas Paulsson2017-03-311-0/+6
| | | | | | | | | Since LOCR only accepts GR32 virtual registers, its operands must be copied into this regclass in insertSelect(), when an LOCR is built. Otherwise, the case where the source operand was GRX32 will produce invalid IR. Review: Ulrich Weigand llvm-svn: 299220
* [DAGCombiner] Add vector demanded elements support to ComputeNumSignBitsSimon Pilgrim2017-03-314-3/+7
| | | | | | | | | | | | | | Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course. Followup to D25691. Differential Revision: https://reviews.llvm.org/D31311 llvm-svn: 299219
* [SystemZ] Skip DAGCombining of vector node for older subtargets.Jonas Paulsson2017-03-311-0/+6
| | | | | | | | | | | | | Even on older subtargets that lack vector support, there may be vector values with just one element in the input program. These are converted during DAG legalization to scalar values. The pre-legalize SystemZ DAGCombiner methods should in this circumstance not touch these nodes. This patch adds a check for this in SystemZTargetLowering::combineEXTRACT_VECTOR_ELT(). Review: Ulrich Weigand llvm-svn: 299213
* [AMDGPU] SDWA Peephole: improve search for immediates in SDWA patternsSam Kolton2017-03-314-43/+75
| | | | | | | | | | | | | | | | | Previously compiler often extracted common immediates into specific register, e.g.: ``` %vreg0 = S_MOV_B32 0xff; %vreg2 = V_AND_B32_e32 %vreg0, %vreg1 %vreg4 = V_AND_B32_e32 %vreg0, %vreg3 ``` Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands: ``` SDWA src: %vreg2 src_sel:BYTE_0 SDWA src: %vreg4 src_sel:BYTE_0 ``` With this change peephole check if operand is either immediate or register that is copy of immediate. llvm-svn: 299202
* [DAGCombiner] Add vector demanded elements support to ↵Simon Pilgrim2017-03-3114-3/+15
| | | | | | | | | | computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201
* Temporarily revert "[PPC] In PPCBoolRetToInt change the bool value to i64 if ↵Eric Christopher2017-03-313-37/+19
| | | | | | | | the target is ppc64" as it's causing test failures, I've given Carrot a testcase offline. This reverts commit r298955. llvm-svn: 299153
* [WebAssembly] Initial linking metadata supportDan Gohman2017-03-306-13/+90
| | | | | | | | | | | | | | | | Add support for the new relocations and linking metadata section support in https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md. In particular, this allows LLVM to indicate which variable is the stack pointer, so that it can be linked with other objects. This also adds support for emitting type relocations for call_indirect instructions. Right now, this is mainly tested by using wabt and hexdump to examine the output on selected testcases. We'll add more tests as the design stablizes and more of the pieces are in place. llvm-svn: 299141
* AMDGPU: Rename isKernelMatt Arsenault2017-03-303-6/+22
| | | | | | | | What we really want to do is distinguish functions that may be called by other functions, and graphics shaders are not called kernels. llvm-svn: 299140
* AMDGPU: Add all atomicrmw fields to atomic.inc/decMatt Arsenault2017-03-301-2/+5
| | | | | | Add scope, order, isVolatile llvm-svn: 299122
* [AVX-512] Fix bad comment from r299112. NFCCraig Topper2017-03-301-1/+2
| | | | llvm-svn: 299114
* [AVX-512] Fix another case where fastisel was generating a GR8 to VK1 copy. ↵Craig Topper2017-03-301-2/+12
| | | | | | | | This time after calls returning i1. Fixes PR32472. llvm-svn: 299112
* [AMDGPU] Add GlobalOpt parameter to Always Inliner passStanislav Mekhanoshin2017-03-303-7/+11
| | | | | | | | | If set to false it does not remove global aliases. With this parameter set to false it should be safe to run the pass before link. Differential Revision: https://reviews.llvm.org/D31489 llvm-svn: 299108
* [AArch64ISelLowering] Remove `else` after `return` in LowerGlobalTLSAddress.Davide Italiano2017-03-301-1/+1
| | | | llvm-svn: 299103
* [AArch64] Simplify isSingExtended()/isZeroExtended(). NFCI.Davide Italiano2017-03-301-10/+4
| | | | llvm-svn: 299102
* Spelling mistakes in comments. NFCI.Simon Pilgrim2017-03-304-5/+5
| | | | | | Based on corrections mentioned in patch for clang for PR27635 llvm-svn: 299072
* Spelling mistakes in comments. NFCI.Simon Pilgrim2017-03-301-15/+15
| | | | llvm-svn: 299069
* [X86IselLowering] Remove extraneous semicolon. NFCI.Davide Italiano2017-03-291-1/+1
| | | | | | Unbreaks the build with GCC -Werror. llvm-svn: 299030
* [X86] Tidied up comment - we don't custom lower add/sub i64 on i686 anymore. ↵Simon Pilgrim2017-03-291-1/+2
| | | | | | NFCI. llvm-svn: 299004
* Spelling mistakes in comments. NFCI.Simon Pilgrim2017-03-291-5/+5
| | | | llvm-svn: 299000
* [X86][AVX2] Prevent unary interleaving patterns from calling ↵Simon Pilgrim2017-03-291-3/+4
| | | | | | lowerVectorShuffleAsSplitOrBlend (PR32453) llvm-svn: 298993
* [AMDGPU] Tidy up ↵Simon Pilgrim2017-03-291-13/+6
| | | | | | | | computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI. Based on comment in D31249. llvm-svn: 298991
* [X86] Removed old comment. NFCI.Simon Pilgrim2017-03-291-2/+1
| | | | | | No longer makes sense as the previous opcode mnemonic it was referring to is long gone. llvm-svn: 298988
* Move the x86 cpu feature rtm from Haswell to Skylake matching clang commit ↵Eric Christopher2017-03-291-1/+1
| | | | | | r298956. llvm-svn: 298986
* [AVX-512] Remove explicit KMOVWrk from isel patterns. COPY_TO_REGCLASS to ↵Craig Topper2017-03-291-8/+8
| | | | | | GR32 is enough. llvm-svn: 298985
* [AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where ↵Craig Topper2017-03-291-16/+12
| | | | | | | | we can just use COPY_TO_REGCLASS instead. This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together. llvm-svn: 298984
* [AVX-512] Punt on fast-isel of truncates to i1 when AVX512 is enabled.Craig Topper2017-03-281-1/+2
| | | | | | | | | | We should be masking the value and emitting a register copy like we do in non-fast isel. Instead we were just updating the value map and emitting nothing. After r298928 we started seeing cases where we would create a copy from GR8 to GR32 because the source register in a VK1 to GR32 copy was replaced by the GR8 going into a truncate. This fixes PR32451. llvm-svn: 298957
* [PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64Guozhi Wei2017-03-283-19/+37
| | | | | | | | | | In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64. This patch fixed PR32442. Differential Revision: https://reviews.llvm.org/D31407 llvm-svn: 298955
* [AMDGPU] Boost unroll threshold for loops reading local memoryStanislav Mekhanoshin2017-03-281-30/+72
| | | | | | | | | | | | | This is less important than increase threshold for private memory, but still brings performance improvements in a wide range of tests. Unrolling more for local memory serves three purposes: it allows to combine ds operations if offset becomes static, saves registers used for offsets in case of static offsets, and allows better lds latency hiding. Differential Revision: https://reviews.llvm.org/D31412 llvm-svn: 298948
* [AMDGPU] Fix recorded region boundaries in max-occupancy schedulerStanislav Mekhanoshin2017-03-282-17/+7
| | | | | | | | | | This is incorrect to record region boundaries before scheduling, it may change after scheduling. As a result second pass may see less instructions to schedule than it should. Differential Revision: https://reviews.llvm.org/D31434 llvm-svn: 298945
* [X86][MMX] Match MMX fp_to_sint conversions from XMM registersSimon Pilgrim2017-03-282-4/+25
| | | | | | | | | | We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack). This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc. Differential Revision: https://reviews.llvm.org/D30868 llvm-svn: 298943
* [AMDGPU] Split -amdgpu-early-inline-all optionStanislav Mekhanoshin2017-03-281-3/+13
| | | | | | | | | | Previously it was covered by the internalization. It turns out we cannot run internalizer in FE, it break separate compilation tests. Thus early inliner gets its own option. Differential Revision: https://reviews.llvm.org/D31429 llvm-svn: 298935
* [x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equalitySanjay Patel2017-03-281-1/+5
| | | | | | | Follow-up to: https://reviews.llvm.org/rL298775 llvm-svn: 298933
* Revert "Dont emit Mapping symbols for sections that contain only data."Weiming Zhao2017-03-281-68/+14
| | | | | | | | It breaks some lld tests. This reverts commit 3a50eea6d9732ab40e9a7aebe6be777b53a8b35c. llvm-svn: 298932
* [X86][AVX2] Add support for combining v16i16 shuffles to VPBLENDWSimon Pilgrim2017-03-281-28/+47
| | | | llvm-svn: 298929
* [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registersCraig Topper2017-03-283-53/+106
| | | | | | | | | | | | | | | | We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register. This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8. I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa. Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition. This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI. Differential Revision: https://reviews.llvm.org/D30968 llvm-svn: 298928
* [X86][SSE] Refactored shuffle BLEND combining to make future 16i16 support ↵Simon Pilgrim2017-03-281-34/+33
| | | | | | | | easier. NFCI. Call the matchVectorShuffleAsBlend test as early as possible. llvm-svn: 298925
* Fix signed/unsigned comparison warningSimon Pilgrim2017-03-281-2/+2
| | | | llvm-svn: 298917
* [X86][SSE] Begin merging vector shuffle to BLEND for lowering and combining.Simon Pilgrim2017-03-281-70/+82
| | | | | | Split off matchVectorShuffleAsBlend from lowerVectorShuffleAsBlend for reuse in combining. llvm-svn: 298914
* Wdocumentation fixSimon Pilgrim2017-03-281-1/+0
| | | | llvm-svn: 298911
* [X86][SSE] Set second operand to undef instead of first operand in unary ↵Simon Pilgrim2017-03-281-1/+2
| | | | | | | | shuffle combines. Copy isn't necessary after the matchVectorShuffleWithUNPCK refactor and undef value will make some future undef/zero handling easier. llvm-svn: 298910
OpenPOWER on IntegriCloud