path: root/llvm/lib/Target
Commit message (author, date, files changed, lines -removed/+added)
* [AMDGPU] Partial revert of ba447bae7448435c9986eece0811da1423972fdd (Alexander Timofeev, 2019-06-06, 4 files, -163/+107)
  Partially reverts "Divergence driven ISel. Assign register class for cross block values according to the divergence.", which exposed a design flaw leading to several issues that need to be solved first. This change reverts the AMDGPU-specific changes and keeps the common part unaffected. llvm-svn: 362749
* [X86] Make a bunch of merge masked binops commutable for load folding. (Craig Topper, 2019-06-06, 1 file, -8/+7)
  This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg. We already commuted the unmasked and zero masked versions. I've added 512-bit stack folding tests for most of the instructions affected, covering cases that need commuting and cases that don't, across the unmasked, merge masked, and zero masked forms. The 128/256 bit instructions should behave similarly. llvm-svn: 362746
* [AIX] Implement function descriptor on SDAG (Jason Liu, 2019-06-06, 4 files, -18/+64)
  Summary:
  (1) Function descriptor on AIX
  On AIX, a called routine may have 2 distinct symbols associated with it:
    * A function descriptor (Name)
    * A function entry point (.Name)
  The descriptor structure on AIX is the same as in the ELF V1 ABI:
    * The address of the entry point of the function.
    * The TOC base address for the function.
    * The environment pointer.
  The descriptor symbol uses the same name as the source-level function in C. The function entry point is analogous to the symbol we would generate for a function in a non-descriptor-based ABI, except that it is renamed by prepending a ".".
  Which symbol gets referenced depends on the context:
    * Taking the address of the function references the descriptor symbol.
    * Calling the function references the entry point symbol.
  (2) For a direct function call target, we create a proper MCSymbol SDNode (e.g. ".foo") while constructing the SDAG, replacing the original TargetGlobalAddress SDNode. Further down the path, we can then take advantage of this MCSymbol. A sketch of the descriptor layout follows this entry.
  Patch by: Xiangling_L
  Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara
  Differential Revision: https://reviews.llvm.org/D62532
  llvm-svn: 362735
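  The three-word descriptor record described above can be pictured at the C++ level as follows; this is an illustration of the layout only, not code from the patch, and the type and field names are invented:

      // Conceptual layout of an AIX (ELF V1 style) function descriptor.
      // The descriptor symbol ("foo") labels this record; the entry point
      // symbol (".foo") labels the first instruction of the function body.
      struct FunctionDescriptor {
        void *EntryPoint; // address of the function entry point (.foo)
        void *TOCBase;    // TOC base address to establish for the callee
        void *EnvPtr;     // environment pointer
      };
      // &foo yields the descriptor; a call through foo branches to
      // EntryPoint after loading the TOC register from TOCBase.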
* Remove unused PPC.h includes under llvm/lib/Target/PowerPC. (Dmitri Gribenko, 2019-06-06, 3 files, -4/+1)
  llvm-svn: 362718
* [X86] Make masked floating point equality/ordered compares commutable for load folding purposes. (Craig Topper, 2019-06-06, 2 files, -7/+17)
  Same as what is supported for the unmasked form. llvm-svn: 362717
* [AIX] Implement call lowering for parameters that fit in GPRs (Jason Liu, 2019-06-06, 2 files, -15/+83)
  Summary: This patch implements SDAG call lowering on AIX for functions whose parameters can all fit into GPRs.
  Reviewers: hubert.reinterpretcast, syzaara
  Differential Revision: https://reviews.llvm.org/D62823
  llvm-svn: 362708
* [AArch64] Handle ISD::LRINT and ISD::LLRINT for float16 (Adhemerval Zanella, 2019-06-06, 1 file, -0/+8)
  This patch is a follow-up to D62018, adding lrint/llrint support for float16.
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D62863
  llvm-svn: 362700
* [AArch64] Handle ISD::LROUND and ISD::LLROUND for float16 (Adhemerval Zanella, 2019-06-06, 1 file, -0/+8)
  This patch is a follow-up to D61391, adding lround/llround support for float16.
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D62861
  llvm-svn: 362698
* Include what you use in LanaiAsmParser.cpp (Dmitri Gribenko, 2019-06-06, 1 file, -1/+0)
  llvm-svn: 362696
* [MIPS GlobalISel] Select sqrt (Petar Avramovic, 2019-06-06, 2 files, -2/+3)
  Select G_FSQRT for MIPS32. Differential Revision: https://reviews.llvm.org/D62905 llvm-svn: 362692
* [MIPS GlobalISel] Select fabs (Petar Avramovic, 2019-06-06, 3 files, -2/+13)
  Select G_FABS for MIPS32. Differential Revision: https://reviews.llvm.org/D62903 llvm-svn: 362690
* [MIPS GlobalISel] Select fpext and fptrunc (Petar Avramovic, 2019-06-06, 2 files, -0/+14)
  Select G_FPEXT and G_FPTRUNC for MIPS32. Differential Revision: https://reviews.llvm.org/D62902 llvm-svn: 362689
* [MIPS GlobalISel] Select floor and ceil (Petar Avramovic, 2019-06-06, 1 file, -0/+3)
  Select G_FFLOOR and G_FCEIL for MIPS32. Differential Revision: https://reviews.llvm.org/D62901 llvm-svn: 362688
* [AArch64][GlobalISel] Add manual selection support for G_ZEXTLOADs to s64. (Amara Emerson, 2019-06-06, 1 file, -0/+23)
  We already get support for G_ZEXTLOAD to s32 from the importer, but it can't deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing with G_ZEXTLOAD isn't much work. Also add tests to check that the imported pattern selections to s32 work. llvm-svn: 362681
* [AArch64][GlobalISel] Add the new changes to fix PR42129 that were supposed to go into r362666. (Amara Emerson, 2019-06-06, 1 file, -0/+5)
  The changes weren't staged, so that commit ended up just re-committing the unmodified reverted change. llvm-svn: 362677
* [X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes. (Craig Topper, 2019-06-06, 1 file, -0/+3)
  This is intended to enable the use of an immediate blend or more optimal instruction. But if the passthru is zero we don't need any additional instructions. llvm-svn: 362675
* Revert "Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when ↵Amara Emerson2019-06-051-8/+96
| | | | | | | | | | | | G_SELECT is fp"" When looking through copies, make sure to not try to find the vreg def of a physreg. Normally getVRegDef will return nullptr in this case, but if there happens to be multiple defs then it will assert. This fixes PR42129. llvm-svn: 362666
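  A hedged illustration of the guard this describes, written against the MachineRegisterInfo API; the helper name and exact shape are invented for the example, not taken from the patch:

      // Follow COPY sources back to a defining instruction, but stop as
      // soon as a physical register is reached: MRI.getVRegDef() is only
      // meaningful for virtual registers, and a physreg may have many defs.
      static MachineInstr *findVRegDefThroughCopies(unsigned Reg,
                                                    const MachineRegisterInfo &MRI) {
        while (TargetRegisterInfo::isVirtualRegister(Reg)) {
          MachineInstr *Def = MRI.getVRegDef(Reg);
          if (!Def || Def->getOpcode() != TargetOpcode::COPY)
            return Def;
          Reg = Def->getOperand(1).getReg(); // COPY source operand
        }
        return nullptr; // physical register: no single vreg def exists
      }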
* AMDGPU: Don't fix emergency stack slot at offset 0 (Matt Arsenault, 2019-06-05, 2 files, -26/+11)
  This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877. The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels.
  Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset.
  Restrict to callable functions for now to split this into 2 steps, to limit the number of test updates and in case anything breaks. llvm-svn: 362665
* Allow target to handle STRICT floating-point nodes (Ulrich Weigand, 2019-06-05, 4 files, -188/+251)
  The ISD::STRICT_ nodes used to implement the constrained floating-point intrinsics are currently never passed to the target back-end, which makes it impossible to handle them correctly (e.g. mark instructions as depending on a floating-point status and control register, or mark instructions as possibly trapping).
  This patch allows the target to use setOperationAction to switch the action on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code will stop converting the STRICT nodes to regular floating-point nodes, but instead pass the STRICT nodes to the target using normal SelectionDAG matching rules.
  To avoid having the back-end duplicate all the floating-point instruction patterns to handle both strict and non-strict variants, we make the MI codegen explicitly aware of the floating-point exceptions by introducing two new concepts:
  - A new MCID flag "mayRaiseFPException" that the target should set on any instruction that can possibly raise an FP exception according to the architecture definition.
  - A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI instruction resulting from expansion of any constrained FP intrinsic.
  Any MI instruction that is *both* marked as mayRaiseFPException *and* FPExcept then needs to be considered as raising exceptions by MI-level codegen (e.g. scheduling).
  Setting those two new flags is straightforward. The mayRaiseFPException flag is simply set via TableGen by marking all relevant instruction patterns in the .td files. The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes in the SelectionDAG, and gets inherited in the MachineSDNode nodes created from it during instruction selection. The flag is then transferred to an MIFlag when creating the MI from the MachineSDNode. This is handled just like fast-math flags such as no-nans are handled today. A sketch of the target-side opt-in appears below.
  This patch includes both the common code changes required to implement the new features and the SystemZ implementation.
  Reviewed By: andrew.w.kaylor
  Differential Revision: https://reviews.llvm.org/D55506
  llvm-svn: 362663
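  For a target taking advantage of this, the opt-in is an ordinary legalization hook call in its TargetLowering constructor. A minimal sketch, assuming the target has native instructions for these operations (SystemZ is the in-tree example; the exact set of nodes and types here is illustrative):

      // Declare selected STRICT_ nodes Legal so SelectionDAG common code
      // stops mutating them into their non-strict equivalents and instead
      // hands them to the target's normal matching rules.
      setOperationAction(ISD::STRICT_FADD,  MVT::f32, Legal);
      setOperationAction(ISD::STRICT_FADD,  MVT::f64, Legal);
      setOperationAction(ISD::STRICT_FMUL,  MVT::f64, Legal);
      setOperationAction(ISD::STRICT_FSQRT, MVT::f64, Legal);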
* Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT ↵Petr Hosek2019-06-051-96/+8
| | | | | | | | is fp" This reverts commit r362435 as this triggers ICE, see PR42129 for details. llvm-svn: 362662
* AMDGPU: Invert frame index offset interpretation (Matt Arsenault, 2019-06-05, 9 files, -209/+218)
  Since the beginning, the offset of a frame index has been consistently interpreted backwards: it was treated as an offset from the scratch wave offset register, as if that were the frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination should then select either SP or another register as an FP.
  Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions.
  The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination.
  Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661
* [X86] Fix mistake that marked VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable. (Craig Topper, 2019-06-05, 1 file, -1/+1)
  One of the sources controls the pass-through value for the upper bits of the result, so we can't really commute it. In practice this isn't a functional issue, because we would only try to commute this instruction in order to fold a load, but we can't do embedded rounding and fold a load at the same time. So the load fold would never succeed, and I don't think we would ever commute, or at least never keep the commuted version. llvm-svn: 362647
* AMDGPU: Remove amdgpu-max-work-group-size attribute (Matt Arsenault, 2019-06-05, 1 file, -10/+1)
  This has been deprecated for a long time, and mesa recently switched to amdgpu-flat-work-group-size. llvm-svn: 362641
* AMDGPU: Fix using 2 different enums for same operand flags (Matt Arsenault, 2019-06-05, 3 files, -11/+8)
  These enums are really for the same namespace of flags set on arbitrary MachineOperands, so merge them to avoid value collisions. llvm-svn: 362640
* [WebAssembly] Limit PIC support to the Emscripten target (Dan Gohman, 2019-06-05, 1 file, -2/+11)
  PIC support currently only works with Emscripten, so disable it for other targets. This is the PIC portion of https://reviews.llvm.org/D62542.
  Reviewed By: dschuff, sbc100
  llvm-svn: 362638
* [X86] Add the vector integer min/max instructions to isAssociativeAndCommutative. (Craig Topper, 2019-06-05, 1 file, -0/+84)
  As far as I know these should be freely reassociatable, just like the floating point MAXC/MINC instructions. The *reduce* test changes are largely regressions, caused by the "generic" CPU we default to not having a scheduler model. The machine-combiner-int-vec.ll test shows the positive benefits of this change; see the illustration after this entry.
  Differential Revision: https://reviews.llvm.org/D62787
  llvm-svn: 362629
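  The property being relied on is plain algebra: integer min/max are associative and commutative, so the machine combiner may rebalance a serial chain into a tree to shorten the dependence chain. A scalar illustration (not the backend change itself):

      #include <algorithm>
      #include <cassert>

      int main() {
        int a = 7, b = -3, c = 42, d = 0;
        // The serial chain has depth 3; the rebalanced tree has depth 2.
        // That latency reduction is what reassociation buys on wide cores.
        int serial = std::min(std::min(std::min(a, b), c), d);
        int tree   = std::min(std::min(a, b), std::min(c, d));
        assert(serial == tree); // min is associative and commutative
        return 0;
      }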
* [x86] split more 256-bit stores of concatenated vectors (Sanjay Patel, 2019-06-05, 1 file, -3/+4)
  As suggested in D62498 - collectConcatOps() matches both concat_vectors and insert_subvector patterns, and we see more test improvements by using the more general match. llvm-svn: 362620
* [X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI. (Simon Pilgrim, 2019-06-05, 1 file, -12/+17)
  Enables us to use this to split 512-bit vectors in future patches. llvm-svn: 362617
* [MIPS GlobalISel] Select fcmp (Petar Avramovic, 2019-06-05, 3 files, -0/+94)
  Select floating point compare for MIPS32. Differential Revision: https://reviews.llvm.org/D62721 llvm-svn: 362603
* [X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y)) (Simon Pilgrim, 2019-06-05, 1 file, -3/+10)
  We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)); this relaxes the requirement to permit different sources as long as they have the same value type. This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast, so we don't see any increase in constant size. llvm-svn: 362599
* Include what you use in PPCFrameLowering.h (Dmitri Gribenko, 2019-06-05, 1 file, -1/+0)
  llvm-svn: 362590
* [PowerPC] Collapse RLDICL/RLDICR into RLDIC when possible (Nemanja Ivanovic, 2019-06-05, 1 file, -0/+52)
  Generally speaking, we lower to an optimal rotate sequence for nodes visible in the SDAG. However, there are instances where the two rotates are not visible at ISEL time - most notably those in a very common sequence when lowering switch statements to jump tables.
  A common situation is a switch on a 32-bit integer. This value has to have the upper 32 bits cleared, and because jump table offsets are word offsets, the value needs to be shifted left by 2 bits. We currently emit the clear and the left shift as two separate instructions, but this is not needed as we can lower it to a single RLDIC. This patch just cleans that up; a worked example follows this entry.
  Differential revision: https://reviews.llvm.org/D60402
  llvm-svn: 362576
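  The jump-table case reduces to one concrete computation: zero-extend a 32-bit index and shift it left by 2 to form a byte offset. A C++ sketch of the semantics (illustrative; the mask and rotate amounts below are derived from the description, not copied from the patch):

      #include <cstdint>

      // Offset computation when lowering a switch over a 32-bit value:
      // clear the upper 32 bits, then scale by the 4-byte entry size.
      uint64_t jumpTableByteOffset(uint64_t Index) {
        uint64_t Cleared = Index & 0xFFFFFFFFu; // RLDICL: clear upper 32 bits
        return Cleared << 2;                    // RLDICR-style left shift
        // A single RLDIC (rotate left by 2, then clear the high 30 bits)
        // computes the same value in one instruction:
        //   ((Index << 2) | (Index >> 62)) & 0x3FFFFFFFCull
      }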
* [X86] Cleanup convertIntLogicToFPLogic a little. NFCI (Craig Topper, 2019-06-05, 1 file, -23/+24)
  - Use early returns to reduce indentation.
  - Replace multiple ifs with a switch.
  - Replace an assert with an llvm_unreachable default in the switch.
  - Check that the FP type we're going to use for the X86ISD::FAND/FOR/FXOR is legal, rather than checking that the integer type matches the width of a legal scalar fp type. This all runs after legalization, so it shouldn't really matter, but making sure we're using a valid type in the X86ISD node is really what's important.
  llvm-svn: 362565
* [AArch64][GlobalISel] Make extloads to i64 legal. (Amara Emerson, 2019-06-04, 1 file, -0/+3)
  Although we had the support in the prelegalizer combiner to generate the G_SEXTLOAD or G_ZEXTLOAD ops, the legalizer definitions for arm64 had them as lowering back to separate ops. llvm-svn: 362553
* [WebAssembly] Fix ISel crash on sext_inreg/extract type mismatch (Thomas Lively, 2019-06-04, 1 file, -2/+26)
  Summary: Adjusts the index and adds a bitcast around the vector operand of EXTRACT_VECTOR_ELT so that its lane type matches the source type of its parent sext_inreg. Without this bitcast the ISel patterns do not match and ISel fails.
  Reviewers: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62646
  llvm-svn: 362547
* [X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on pattern permutations (Craig Topper, 2019-06-04, 3 files, -357/+82)
  We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics, but we currently have 5 sets of patterns for the 5 rounding operations. For each of these patterns we have to support 3 vector widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table.
  This patch adds code to PreProcessIselDAG to morph fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE. This way we can remove everything but the intrinsic pattern, while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel table by somewhere between 9K and 10K.
  There is one complication to this: the STRICT versions of these nodes are currently mutated to their non-strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future, which will take us back to needing 2 sets of patterns for strict and non-strict - but that's still better than the 11 or 12 sets of patterns we'd need otherwise.
  We can probably do something similar for scalar, but I haven't looked at it yet.
  Differential Revision: https://reviews.llvm.org/D62757
  llvm-svn: 362535
* [X86] Fold single-use variable into assert. NFC. (Benjamin Kramer, 2019-06-04, 1 file, -2/+2)
  Avoids an unused variable warning in Release builds. llvm-svn: 362534
* [x86] split 256-bit store of concatenated vectors (Sanjay Patel, 2019-06-04, 1 file, -0/+11)
  This shows up as a side issue to the main problem for the AVX target example from PR37428 (https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3), but as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target.
  Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway.
  We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is a reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. A sketch of the splitting shape follows this entry.
  Differential Revision: https://reviews.llvm.org/D62498
  llvm-svn: 362524
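  A hedged sketch of what splitting a wide store looks like in SelectionDAG terms; this illustrates the lowering shape and is not the in-tree helper, and it assumes an even element count:

      // Split one wide vector store into two half-width stores.
      static SDValue splitStoreInHalf(StoreSDNode *St, SelectionDAG &DAG) {
        SDLoc DL(St);
        SDValue Val = St->getValue();
        EVT VT = Val.getValueType();
        unsigned HalfElts = VT.getVectorNumElements() / 2;
        EVT HalfVT = EVT::getVectorVT(*DAG.getContext(),
                                      VT.getVectorElementType(), HalfElts);
        unsigned HalfBytes = HalfVT.getStoreSize();

        SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, HalfVT, Val,
                                 DAG.getIntPtrConstant(0, DL));
        SDValue Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, HalfVT, Val,
                                 DAG.getIntPtrConstant(HalfElts, DL));

        SDValue Ptr = St->getBasePtr();
        SDValue Ch0 = DAG.getStore(St->getChain(), DL, Lo, Ptr,
                                   St->getPointerInfo());
        SDValue Ch1 = DAG.getStore(St->getChain(), DL, Hi,
                                   DAG.getMemBasePlusOffset(Ptr, HalfBytes, DL),
                                   St->getPointerInfo().getWithOffset(HalfBytes));
        // Keep both half-stores ordered on the chain.
        return DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Ch0, Ch1);
      }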
* [AArch64][ELF] Add support for PLT decoding with BTI instructions present (Peter Smith, 2019-06-04, 1 file, -1/+9)
  Arm Architecture v8.5a introduces Branch Target Identification (BTI). When enabled, all indirect branches must target a bti instruction of the appropriate form. As PLT sequences may sometimes be the target of an indirect branch and PLT[0] always is, a static linker may need to generate PLT sequences that contain "bti c" as the first instruction. In effect:
      bti  c
      adrp x16, page offset to .got.plt
      ...
  Instead of:
      adrp x16, page offset to .got.plt
      ...
  At present the PLT decoding assumes the adrp will always be the first instruction. This patch adds support for a single "bti c" to prefix it. A test binary has been uploaded with such a PLT sequence. A forthcoming LLD patch will make heavy use of the PLT decoding code.
  Differential Revision: https://reviews.llvm.org/D62598
  llvm-svn: 362523
* [PowerPC] P9 Scheduling Model: dispatching rule fixes (Jinsong Ji, 2019-06-04, 2 files, -126/+162)
  This addresses some of the problems in the existing P9 resource modeling, especially the dispatching rules. Instead of using a hypothetical DISPATCHER, we try to use the number of actual dispatch slots and define SchedWriteRes to model dispatch rules, then update instruction classes according to those rules. All the dispatch rules and instruction class updates are made according to the POWER9 User Manual.
  Differential Revision: https://reviews.llvm.org/D61873
  llvm-svn: 362509
* [SelectionDAG][x86] limit post-legalization store merging by type (Sanjay Patel, 2019-06-04, 2 files, -2/+6)
  The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add the memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86; a sketch of the hook appears below. llvm-svn: 362507
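  The hook shape this implies, sketched from the description rather than quoted from the patch:

      // TargetLowering: the hook now receives the memory type, so a target
      // can accept post-legalization store merging selectively.
      virtual bool mergeStoresAfterLegalization(EVT MemVT) const {
        return true; // default: merge everything, as before
      }

      // Plausible x86 override per the description: keep merging scalar
      // stores, but leave vector stores alone so they can still be split.
      bool X86TargetLowering::mergeStoresAfterLegalization(EVT MemVT) const {
        return !MemVT.isVector();
      }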
* [X86][SSE] Pulled out (sub (xor X, M), M) 'ConditionalNegate' pattern match code. NFCI. (Simon Pilgrim, 2019-06-04, 1 file, -49/+66)
  As discussed on D62777 - we should be able to use this in more SSE41+ cases as well, but that requires us to separate it from the OR(AND(),ANDN()) matcher. A scalar illustration of the pattern follows this entry. llvm-svn: 362504
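  The matched pattern is the classic branch-free conditional negate: with M either all-zeros or all-ones, (X ^ M) - M yields X or -X respectively. A scalar demonstration (the vector form works lane-wise the same way):

      #include <cassert>
      #include <cstdint>

      // Branch-free conditional negate: Mask is 0 (keep X) or -1 (negate X).
      int32_t conditionalNegate(int32_t X, int32_t Mask) {
        return (X ^ Mask) - Mask; // X^0 - 0 == X;  X^-1 - (-1) == ~X + 1 == -X
      }

      int main() {
        assert(conditionalNegate(5, 0) == 5);
        assert(conditionalNegate(5, -1) == -5);
        // abs(x) is the special case Mask = X >> 31 (arithmetic shift).
        int32_t X = -7;
        assert(conditionalNegate(X, X >> 31) == 7);
        return 0;
      }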
* [Support] make countLeadingZeros(), countTrailingZeros(), countLeadingOnes() and countTrailingOnes() return unsigned (Shawn Landden, 2019-06-04, 1 file, -1/+1)
  This matches APInt's versions of these functions (as well as __builtin_clzll()), and there is no need for these to be size_t.
  Differential Revision: https://reviews.llvm.org/D60823
  llvm-svn: 362503
* Include what you use in PPCRegisterInfo.cpp (Dmitri Gribenko, 2019-06-04, 1 file, -1/+0)
  llvm-svn: 362495
* [ARM] Add FP16 vector insert/extract patterns (Mikhail Maltsev, 2019-06-04, 1 file, -0/+53)
  This change adds two FP16 extraction and two insertion patterns (one per possible vector length).
  Extractions are handled by copying a Q/D register into one of the VFP2-class registers, where single FP32 sub-registers can be accessed. The extraction of even lanes is then a simple sub-register extraction (because we don't care about the top parts of registers for FP16 operations). Odd lanes need an additional VMOVX instruction.
  Unfortunately, insertions cannot be handled in the same way, because:
  * There is no instruction to insert FP16 into an even lane (VINS only works with odd lanes).
  * The patterns for odd lanes will have the form of a DAG (not a tree), and will not be implementable in pure tablegen.
  Because of this, insertions are handled in the same way as 16-bit integer insertions (with conversions between FP registers and GPRs using VMOVHR instructions). Without these patterns the ARM backend would sometimes fail during instruction selection.
  This patch also adds patterns which combine:
  * an FP16 element extraction and a store into a single VST1 instruction
  * an FP16 load and insertion into a single VLD1 instruction
  Differential Revision: https://reviews.llvm.org/D62651
  llvm-svn: 362482
* Include what you use in PPC.h (Dmitri Gribenko, 2019-06-04, 1 file, -1/+0)
  llvm-svn: 362477
* Include what you use in PPCMachineScheduler.cpp (Dmitri Gribenko, 2019-06-04, 1 file, -1/+3)
  llvm-svn: 362476
* Include what you use in PPCRegisterInfo.h (Dmitri Gribenko, 2019-06-04, 1 file, -1/+2)
  llvm-svn: 362475
* [ARM] Turn some undefined encoding bits into 0s. (Simon Tatham, 2019-06-04, 1 file, -0/+17)
  The family of 32-bit Thumb instruction encodings that includes t2ORR, t2AND and t2EOR is listed in the ArmARM as having (0) in bit 15. The Tablegen descriptions of those instructions listed them as ?. This change tightens that up by making them into 0 + Unpredictable.
  In the specific case of t2ORR, we tighten it up still further by making the zero bit mandatory. This change comes from Arm v8.1-M, in which encodings with that bit equal to 1 will now be used for different instructions.
  Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma
  Reviewed By: dmgreen, efriedma
  Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60705
  llvm-svn: 362470
* AMDGPU: Disable stack realignment for kernels (Matt Arsenault, 2019-06-03, 2 files, -0/+14)
  This is something of a workaround, and the state of stack realignment controls is kind of a mess. Ideally, we would be able to specify that the stack is infinitely aligned on entry to a kernel.
  TargetFrameLowering provides multiple controls which apply at different points. The StackRealignable field is used during SelectionDAG, and for some reason is distinct from this hook. StackAlignment is a single field not dependent on the function. It would probably be better to make that dependent on the calling convention, and the maximum value for kernels.
  Currently this doesn't really change anything, since the frame lowering mostly does its own thing. This helps avoid regressions in a future change which will rely more heavily on hasFP. llvm-svn: 362447