path: root/llvm/lib/Target
Commit message  Author  Age  Files  Lines
...
* [X86] Don't set exception mask bits when modifying FPCW to change rounding ↵Craig Topper2019-02-151-14/+24
| | | | | | | | | | | | mode for fp->int conversion When we need to do an fp->int conversion using x87 instructions, we need to temporarily change the rounding mode to 0b11 and perform a store. To do this we save the old value of the fpcw to the stack, then set the fpcw to 0xc7f, do the store, then restore fpcw. But the 0xc7f value forces the exception mask bits to 1. While this is what they would be in the default FP environment, as we move to support changing the FP environments, we shouldn't make this assumption. This patch changes the code to explicitly OR 0xc00 with the old value so that only the rounding mode is changed. Unfortunately, this requires two stack temporaries instead of one: one to hold the old value and one to hold the new value. Without two stack temporaries we would need an additional GPR. We already need one to do the OR operation in. This is similar to what gcc and icc do for this operation, though they are both better at reusing the stack temporaries when there are multiple truncates in a function (or at least in a basic block). Differential Revision: https://reviews.llvm.org/D57788 llvm-svn: 354178
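The difference between the two control-word computations in the message above can be sketched in plain C++. This is only an illustration of the bit arithmetic, not the X86 backend code: the function names are hypothetical, and the field layout in the comments (exception masks in bits 0-5, rounding control in bits 10-11) is taken from the x87 control word definition rather than from the commit.

```cpp
#include <cassert>
#include <cstdint>

// x87 control word fields used here (assumed from the architecture manual):
// bits 0-5 are the exception mask bits, bits 10-11 are rounding control.
constexpr std::uint16_t kRoundTowardZero = 0x0C00; // rounding control = 0b11

// Old behaviour described in the message: load the fixed constant 0xc7f,
// which ignores the saved control word and forces all mask bits to 1.
constexpr std::uint16_t truncatingControlWordOld(std::uint16_t /*savedCW*/) {
  return 0x0C7F;
}

// New behaviour described in the message: OR 0xc00 into the saved value so
// that only the rounding-control field changes.
constexpr std::uint16_t truncatingControlWordNew(std::uint16_t savedCW) {
  return static_cast<std::uint16_t>(savedCW | kRoundTowardZero);
}
```

With a saved control word of 0x037E (one exception unmasked), the new computation yields 0x0F7E and keeps the unmasked bit, while the fixed constant would silently mask it again.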
* [X86] Fix LowerAsmOutputForConstraint.Nirav Dave2019-02-152-8/+7
| | | | | | | | | | | | | | | | | Summary: Update Flag when generating cc output. Fixes PR40737. Reviewers: rnk, nickdesaulniers, craig.topper, spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58283 llvm-svn: 354163
* [X86] Move all the SSE legality checks out of FP_TO_INTHelper and up to ↵Craig Topper2019-02-151-22/+14
| | | | | | | | | | LowerFP_TO_INT. NFCI These checks aren't needed on the call to FP_TO_INTHelper from the type legalizer for splitting i64. We always want to use X87 FIST/FISTT to memory there. Moving up the SSE checks will allow this routine to focus on what it cares about and makes its return semantics cleaner. llvm-svn: 354161
* Recommit "[SystemZ] Do not emit VEXTEND or VROUND nodes without vector support."Jonas Paulsson2019-02-151-0/+8
| | | | | | | | | It seems there was some problem with using a .mir test. For some reason doing '-stop-before=codegenprepare' and then '-start-before=codegenprepare' on the output .mir file results in the NoVRegs Property after instruction selection. Recommitting the same test as an .ll file instead. llvm-svn: 354160
* [X86][AVX] lowerShuffleAsLanePermuteAndPermute - fully populate the lane ↵Simon Pilgrim2019-02-151-2/+11
| | | | | | | | | | | | | | shuffle mask (PR40730) As detailed on PR40730, we are not correctly filling in the lane shuffle mask (D53148/rL344446) - we fill in for the correct src lane but don't add it to the correct mask element, so any reference to the correct element is likely to see an UNDEF mask index. This allows constant folding to propagate UNDEFs prior to the lane mask being (correctly) lowered to vperm2f128. This patch fixes the issue by fully populating the lane shuffle mask - this is more than is necessary (if we only filled in the required mask elements we might be able to match other shuffle instructions - broadcasts etc.), but it's the most cautious approach as this needs to be cherry-picked into the 8.0.0 release branch. Differential Revision: https://reviews.llvm.org/D58237 llvm-svn: 354117
* [ARM GlobalISel] Style fix. NFCIDiana Picus2019-02-151-1/+5
| | | | | | | | Add the opcode for ADDrr / t2ADDrr to the Opcode cache, as we did for all other opcodes where the handling is otherwise the same between arm mode and thumb2. llvm-svn: 354115
* [ARM GlobalISel] Support branches for Thumb2Diana Picus2019-02-152-9/+17
| | | | | | Just like arm mode, but with different opcodes. llvm-svn: 354113
* [RISCV] Add assembler support for LA pseudo-instructionAlex Bradbury2019-02-152-18/+76
| | | | | | | | | | This patch also introduces the emitAuipcInstPair helper, which is then used for both emitLoadAddress and emitLoadLocalAddress. Differential Revision: https://reviews.llvm.org/D55325 Patch by James Clarke. llvm-svn: 354111
* [RISCV] Support assembling %got_pcrel_hi operatorAlex Bradbury2019-02-158-8/+32
| | | | | | | Differential Revision: https://reviews.llvm.org/D55279 Patch by James Clarke. llvm-svn: 354110
* [ARM CGP] Fix ConvertTruncsSam Parker2019-02-151-8/+17
| | | | | | | | | | | | | ConvertTruncs is used to replace a trunc for an AND mask, however this function wasn't working as expected. By performing the change later, we can create a wide type integer mask instead of a narrow -1 value, which could then be simply removed (incorrectly). Because we now perform this action later, it's necessary to cache the trunc type before we perform the promotion. Differential Revision: https://reviews.llvm.org/D57686 llvm-svn: 354108
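A rough reading of the mask problem described above, written as standalone C++ rather than LLVM IR manipulation. The helper names are hypothetical, and the "narrow -1" case is modelled here as an all-ones value widened to 32 bits, which is an assumption about what the message means.

```cpp
#include <cassert>
#include <cstdint>

// Emulate `trunc i32 -> iN` while staying in the promoted 32-bit type:
// keep the low narrowBits bits and clear everything above them.
constexpr std::uint32_t truncAsWideMask(std::uint32_t value, unsigned narrowBits) {
  return narrowBits >= 32 ? value
                          : value & ((std::uint32_t{1} << narrowBits) - 1u);
}

// The pitfall: an all-ones mask (-1) makes the AND a no-op, so a later
// simplification can drop it and the truncation is lost.
constexpr std::uint32_t truncAsMinusOneMask(std::uint32_t value) {
  return value & static_cast<std::uint32_t>(std::int32_t{-1});
}
```

For example, masking 0x12345678 down to 8 bits gives 0x78 with the wide mask, whereas the all-ones mask leaves the value unchanged.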
* X86: Replace isSafeToClobberEFLAGS implementationMatt Arsenault2019-02-152-86/+5
| | | | | | Also use modifiesRegister instead of looping over operands. llvm-svn: 354098
* Revert "[SystemZ] Do not emit VEXTEND or VROUND nodes without vector support."Francis Visoiu Mistrih2019-02-151-8/+0
| | | | | | | | | This reverts commit aa0b77d3395dc6ab91647138139c1a15a3aa088d. This fails to pass the machine verifier: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/13579/ llvm-svn: 354096
* AMDGPU: Set ABI version to 1 for code object v3Konstantin Zhuravlyov2019-02-143-10/+20
| | | | | | Differential Revision: https://reviews.llvm.org/D57811 llvm-svn: 354085
* GlobalISel: Add alignment to LegalityQuery MMOsMatt Arsenault2019-02-144-41/+42
| | | | | | | This allows targets to specify the minimum alignment required for the load/store. llvm-svn: 354071
* AMDGPU/GlobalISel: Fix RegBankSelect for GEP.Matt Arsenault2019-02-142-32/+15
| | | | | | | | | | This is basically a pointer typed add, so shouldn't be any different. This was assuming everything was an SGPR, which is not true. Also clean up legality for GEP. I don't seem to be seeing the problem the hack marking s64 as a legal pointer type the comment mentions. llvm-svn: 354067
* [AMDGPU] Reassociate 'add (add x, y), z' to use SALUStanislav Mekhanoshin2019-02-142-0/+44
| | | | | | | | | Reassociate adds to collect scalar operands in a single instruction when possible. That will result in a scalar add followed by a vector add instead of two vector adds, thus better utilizing SALU. Differential Revision: https://reviews.llvm.org/D58220 llvm-svn: 354066
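The effect of the reassociation can be illustrated with a toy cost model. Everything below is hypothetical (the `Bank` enum, the counting helpers, and the rule that an add is scalar only when both inputs are scalar); it is a sketch of the idea, not the AMDGPU DAG combine.

```cpp
#include <cassert>

enum class Bank { Scalar, Vector };

struct AddCount {
  int scalar = 0;
  int vector = 0;
};

// Toy rule: an add can run on the scalar unit only if both inputs are scalar.
inline Bank resultBank(Bank a, Bank b) {
  return (a == Bank::Scalar && b == Bank::Scalar) ? Bank::Scalar : Bank::Vector;
}

inline void countAdd(Bank a, Bank b, AddCount &count) {
  if (resultBank(a, b) == Bank::Scalar)
    ++count.scalar;
  else
    ++count.vector;
}

// Cost of the original shape: (x + y) + z
inline AddCount costNested(Bank x, Bank y, Bank z) {
  AddCount count;
  countAdd(x, y, count);
  countAdd(resultBank(x, y), z, count);
  return count;
}

// Cost after regrouping: x + (y + z)
inline AddCount costReassociated(Bank x, Bank y, Bank z) {
  AddCount count;
  countAdd(y, z, count);
  countAdd(x, resultBank(y, z), count);
  return count;
}
```

With x in a vector register and y, z scalar, the nested form needs two vector adds, while the regrouped form needs one scalar add and one vector add. Integer addition is associative, so the regrouping does not change the result.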
* AMDGPU/GlobalISel: Handle split for 64-bit VALU selectMatt Arsenault2019-02-142-12/+55
| | | | llvm-svn: 354065
* [X86] cleanup inline asm register generation. NFCI.Nirav Dave2019-02-141-8/+8
| | | | llvm-svn: 354042
* [SystemZ] Do not emit VEXTEND or VROUND nodes without vector support.Jonas Paulsson2019-02-141-0/+8
| | | | | | | Review: Ulrich Weigand https://reviews.llvm.org/D58240 llvm-svn: 354039
* [MIPS GlobalISel] Select phi instruction for integers Petar Avramovic2019-02-142-0/+16
| | | | | | | | Select G_PHI for integers for MIPS32. Differential Revision: https://reviews.llvm.org/D58183 llvm-svn: 354025
* [MIPS GlobalISel] Select branch instructionsPetar Avramovic2019-02-143-0/+12
| | | | | | | | | | | Select G_BR and G_BRCOND for MIPS32. Unconditional branch G_BR does not have a register operand; for that reason we only add tests. Since conditional branch G_BRCOND compares a register to zero on MIPS32, explicit extension must be performed on the i1 condition in order to set the high bits to the appropriate value. Differential Revision: https://reviews.llvm.org/D58182 llvm-svn: 354022
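Why the i1 condition needs an explicit extension can be shown with a small C++ model of a 32-bit register whose upper bits are unspecified. The function names are hypothetical, and modelling the extension as a zero-extension (AND with 1) is an assumption; the message only says the high bits must be set to an appropriate value.

```cpp
#include <cassert>
#include <cstdint>

// Only bit 0 of the register carries the i1 condition; the upper 31 bits
// may hold leftover data.

// Comparing the whole register against zero can take the wrong branch.
constexpr bool branchTakenWithoutExtension(std::uint32_t reg) {
  return reg != 0;
}

// Clearing the high bits first makes the compare-with-zero reflect bit 0.
constexpr bool branchTakenWithExtension(std::uint32_t reg) {
  return (reg & 1u) != 0;
}
```

For a register holding 0xFFFFFFFE the condition bit is 0, yet the unextended compare would treat the branch as taken.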
* [ARM] Ensure we update the correct flags in the peephole optimiserDavid Green2019-02-141-2/+5
| | | | | | | | | | | | | | | | The Arm peephole optimiser code keeps track of both an MI and a SubAdd that can be used to optimise away a CMP. In the rare case that both are found and not ruled-out as valid, we could end up setting the flags on the wrong one. Instead make sure we are using SubAdd if it exists, as it will be closer to the CMP. The testcase here is a little theoretical, with a dead def of cpsr. It should hopefully show the point. Differential Revision: https://reviews.llvm.org/D58176 llvm-svn: 354018
* [X86] Make (f80 (sint_to_fp (i16))) use fistps/fisttps instead of ↵Craig Topper2019-02-141-15/+15
| | | | | | | | | | fistpl/fisttpl when SSE is enabled. When SSE is enabled sint_to_fp with i16 is blindly promoted to i32, but that changes the behavior of f80 conversion. Move the promotion to i16 to LowerFP_TO_INT so we can limit it based on the floating point type. llvm-svn: 354003
* [AVR] Fix a typo - 's/analisys/analysis'Dylan McKay2019-02-131-1/+1
| | | | llvm-svn: 353987
* [WebAssembly] memory.fillThomas Lively2019-02-135-13/+39
| | | | | | | | | | | | | | | | Summary: memset lowering, fix argument types in memcpy lowering, and test encodings. Depends on D57736. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57791 llvm-svn: 353986
* [WebAssembly] Bulk memory intrinsics and builtinsThomas Lively2019-02-132-11/+42
| | | | | | | | | | | | | | | | Summary: implements llvm intrinsics and clang intrinsics for memory.init and data.drop. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D57736 llvm-svn: 353983
* [AArch64] Support reserving arbitrary general purpose registersPetr Hosek2019-02-135-46/+19
| | | | | | | | | | | | This is a follow up to D48580 and D48581 which allows reserving arbitrary general purpose registers with the exception of registers with special purpose (X8, X16-X18, X29, X30) and registers used by LLVM (X0, X19). This change also generalizes some of the existing logic to rely entirely on values generated from tablegen. Differential Revision: https://reviews.llvm.org/D56305 llvm-svn: 353957
* [ARM GlobalISel] Support G_SELECT for Thumb2Diana Picus2019-02-132-5/+13
| | | | | | Same as arm mode, but slightly different opcodes. llvm-svn: 353938
* [X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions ↵Anton Afanasyev2019-02-132-0/+8
| | | | | | | | | | | (add, sub) Try to use 64-bit SLP vectorization. In addition to horizontal instrs this change triggers optimizations for partial vector operations (for instance, using low halves of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by <2 x float>). Fixes llvm.org/PR32433 llvm-svn: 353923
* [X86] Use default expansion for (i64 fp_to_uint f80) when avx512 is enabled ↵Craig Topper2019-02-131-1/+14
| | | | | | | | on 64-bit targets to match what happens without avx512. In 64-bit mode prior to avx512 we use Expand, but with avx512 we need to make f32/f64 conversions Legal so we use Custom and then do our own expansion for f80. But this seems to produce codegen differences relative to avx2. This patch corrects this. llvm-svn: 353921
* [X86] Refactor the FP_TO_INTHelper interface. NFCICraig Topper2019-02-132-69/+38
| | | | | | | | -Pull the final stack load creation from the two callers into the helper. -Return a single SDValue instead of a std::pair. -Remove the Replace flag which isn't really needed. llvm-svn: 353920
* AMDGPU: Try to use function specific STMatt Arsenault2019-02-122-21/+22
| | | | | | | | Subtargets are a function level property, so ideally we would eliminate everywhere that needs to check the global one. Rename the function to try avoiding confusion. llvm-svn: 353900
* AMDGPU: Ignore CodeObjectV3 when inliningMatt Arsenault2019-02-121-0/+1
| | | | | | | | | | This was inhibiting inlining of library functions when clang was invoking the inliner directly. This is covering a bit of a mess with subtarget feature handling, and this shouldn't be a subtarget feature. The behavior is different depending on whether you are using a -mattr flag in clang, or llc, opt. llvm-svn: 353899
* [SystemZ] Remember to cast value to void to disable warning.Jonas Paulsson2019-02-121-2/+2
| | | | | | Hopefully fixes buildbot problems. llvm-svn: 353898
* AMDGPU/NFC: Remove SubtargetFeatureISAVersion since it is not used anywhereKonstantin Zhuravlyov2019-02-121-10/+0
| | | | llvm-svn: 353892
* AMDGPU: Remove duplicate processor (gfx900)Konstantin Zhuravlyov2019-02-121-8/+0
| | | | llvm-svn: 353889
* Fix undefined behaviour in PPCInstPrinter::printBranchOperand.Sean Fertile2019-02-121-1/+1
| | | | | | | | Fix the undefined behaviour introduced by my previous patch r353865 (left shifting a potentially negative value), which was caught by the bots that run UBSan. llvm-svn: 353874
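One common way to avoid this class of undefined behaviour is to scale by multiplication instead of shifting. The sketch below uses a hypothetical helper name and is not claimed to be the exact change made in PPCInstPrinter.

```cpp
#include <cassert>
#include <cstdint>

// Scale a signed branch displacement by 4. Writing `imm << 2` is undefined
// behaviour for negative imm before C++20; widening and multiplying is
// well defined for every 32-bit input.
constexpr std::int64_t scaleBranchOffset(std::int32_t imm) {
  return static_cast<std::int64_t>(imm) * 4;
}
```

Widening to 64 bits before multiplying also rules out signed overflow for large displacements.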
* [AArch64] Expand v8i8 cttz (PR39729)Nikita Popov2019-02-121-9/+1
| | | | | | | | | | | Fix for https://bugs.llvm.org/show_bug.cgi?id=39729. Rather than adding just a case for v8i8 I'm setting cttz to expand for all vector types. Differential Revision: https://reviews.llvm.org/D58008 llvm-svn: 353872
* [SystemZ] Use VGM whenever possible to load FP immediates.Jonas Paulsson2019-02-123-2/+58
| | | | | | | | | | | | | isFPImmLegal() has been extended to recognize certain FP immediates that can be built with VGM (Vector Generate Mask). These scalar FP immediates (that were previously loaded from the constant pool) are now selected as VGMF/VGMG in Select(). Review: Ulrich Weigand https://reviews.llvm.org/D58003 llvm-svn: 353867
* [PowerPC] Fix printing of negative offsets in call instruction disassembly.Sean Fertile2019-02-123-2/+15
| | | | llvm-svn: 353865
* [GlobalISel][AArch64] Select llvm.bswap* for non-vector typesJessica Paquette2019-02-121-0/+38
| | | | | | | | | | | | | | | | This teaches the IRTranslator to emit G_BSWAP when it runs into Intrinsic::bswap. This allows us to select G_BSWAP for non-vector types in AArch64. Add a select-bswap.mir test, and add global isel checks to a couple existing tests in test/CodeGen/AArch64. This doesn't handle every bswap case, since some of these rely on known bits stuff. This just lets us handle the naive case. Differential Revision: https://reviews.llvm.org/D58081 llvm-svn: 353861
* [X86][AVX] Enable shuffle combining support for zero_extend Simon Pilgrim2019-02-121-3/+9
| | | | | | A more limited version of rL352997 that had to be disabled in rL353198 - allow extension of any 128/256/512 bit vector that at least uses byte sized scalars. llvm-svn: 353860
* AMDGPU/GlobalISel: Only make f16 constants legal on f16 targetsMatt Arsenault2019-02-121-2/+9
| | | | | | We could deal with it, but there's no real point. llvm-svn: 353845
* [X86] Collapse FP_TO_INT16_IN_MEM/FP_TO_INT32_IN_MEM/FP_TO_INT64_IN_MEM into ↵Craig Topper2019-02-123-21/+19
| | | | a single opcode using memory VT to distinguish. NFC llvm-svn: 353798
* [X86] Remove the value type operand from the floating point load/store ↵Craig Topper2019-02-123-58/+84
| | | | | | | | MemIntrinsicSDNodes. Use the MemoryVT instead. NFCI We already have the memory VT, we can just match from that during isel. llvm-svn: 353797
* GlobalISel: Implement moreElementsVector for implicit_defMatt Arsenault2019-02-111-1/+19
| | | | llvm-svn: 353754
* [X86] Correct the memory operand for the FLD emitted in FP_TO_INTHelper for ↵Craig Topper2019-02-111-8/+7
| | | | | | | | | | | 32-bit SSE targets. We were using DstTy, but that represents the integer type we are converting to which is i64 in this case. The FLD is part of an intermediate step to get from the SSE registers to the x87 registers. If the floating point type is f32, the memory operand should reflect a 4 byte access not an 8 byte access. The store we used to get from SSE to the stack is using the correct size. While there, consistently use TheVT in place of Op.getOperand(0).getValueType() throughout the function. llvm-svn: 353745
* [AArch64][GlobalISel] Add isel support for a couple vector exts/truncsJessica Paquette2019-02-111-2/+2
| | | | | | | | | | | | | | Add support for - v4s16 <-> v4s32 - v2s64 <-> v2s32 And update tests that use them to show that we generate the correct instructions. Differential Revision: https://reviews.llvm.org/D57832 llvm-svn: 353732
* [PowerPC] Avoid scalarization of vector truncateRoland Froese2019-02-112-0/+75
| | | | | | | | The PowerPC code generator currently scalarizes vector truncates that would fit in a vector register, resulting in vector extracts, scalar operations, and vector merges. This patch custom lowers a vector truncate that would fit in a register to a vector shuffle instead. Differential Revision: https://reviews.llvm.org/D56507 llvm-svn: 353724
* [GlobalISel][AArch64] Select G_FFLOORJessica Paquette2019-02-112-1/+2
| | | | | | | | | | | This teaches the legalizer about G_FFLOOR, and lets us select G_FFLOOR in AArch64. It updates the existing floating point tests, and adds a select-floor.mir test. Differential Revision: https://reviews.llvm.org/D57486 llvm-svn: 353722