summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* X86AsmParser: Do not process a non-existent tokenCraig Topper2019-03-261-0/+5
| | | | | | | | | | This error can only happen if an unfinished operation is at Eof. Patch by Brandon Jones Differential Revision: https://reviews.llvm.org/D57379 llvm-svn: 356972
* [ARM] Add missing memory operands to a bunch of instructions.Eli Friedman2019-03-253-25/+59
| | | | | | | | | | | | | | This should hopefully lead to minor improvements in code generation, and more accurate spill/reload comments in assembly. Also fix isLoadFromStackSlotPostFE/isStoreToStackSlotPostFE so they don't lead to misleading assembly comments for merged memory operands; this is technically orthogonal, but in practice the relevant memory operand lists don't show up without this change. Differential Revision: https://reviews.llvm.org/D59713 llvm-svn: 356963
* Revert "AMDGPU: Scavenge register instead of findUnusedReg"Matt Arsenault2019-03-251-1/+1
| | | | | | | | This reverts r356149. This is crashing on rocBLAS. llvm-svn: 356958
* AMDGPU: Remove unnecessary check for isFullCopyMatt Arsenault2019-03-251-1/+1
| | | | | | | Subregister indexes are not used for physical register operands, so isFullCopy is implied by the physical register check. llvm-svn: 356956
* [AArch64] Prefer "mov" over "orr" to materialize constants.Eli Friedman2019-03-251-2/+5
| | | | | | | | | | | | | This is generally more readable due to the way the assembler aliases work. (This causes a lot of test changes, but it's not really as scary as it looks at first glance; it's just mechanically changing a bunch of checks for orr to check for mov instead.) Differential Revision: https://reviews.llvm.org/D59720 llvm-svn: 356954
* AMDGPU: Set hasSideEffects 0 on _term instructionsMatt Arsenault2019-03-251-0/+3
| | | | | | | | These were defaulting to true, but they are just wrappers around bit operations. This avoids regressions in the exec mask optimization passes in a future commit. llvm-svn: 356952
* AMDGPU: Add support for cross address space synchronization scopesKonstantin Zhuravlyov2019-03-253-32/+101
| | | | | | Differential Revision: https://reviews.llvm.org/D59517 llvm-svn: 356946
* AMDGPU: Preserve LiveIntervals in WQMMatt Arsenault2019-03-251-0/+2
| | | | | | This seems to already be done, but wasn't marked. llvm-svn: 356922
* [MIPS GlobalISel] Select copy for arguments from FPRBRegBankPetar Avramovic2019-03-251-5/+15
| | | | | | | | | Move selectCopy into MipsInstructionSelector class. Select copy for arguments from FPRBRegBank for MIPS32. Differential Revision: https://reviews.llvm.org/D59644 llvm-svn: 356886
* [MIPS GlobalISel] Add floating point register bankPetar Avramovic2019-03-252-0/+7
| | | | | | | | | Add floating point register bank for MIPS32. Implement getRegBankFromRegClass for float register classes. Differential Revision: https://reviews.llvm.org/D59643 llvm-svn: 356883
* [MIPS GlobalISel] Lower float and double arguments in registersPetar Avramovic2019-03-252-36/+98
| | | | | | | | | | Lower float and double arguments in registers for MIPS32. When float/double argument is passed through gpr registers select appropriate move instruction. Differential Revision: https://reviews.llvm.org/D59642 llvm-svn: 356882
* [ARM GlobalISel] 64-bit memops should be alignedDiana Picus2019-03-251-9/+10
| | | | | | | | | | We currently use only VLDR/VSTR for all 64-bit loads/stores, so the memory operands must be word-aligned. Mark aligned operations as legal and narrow non-aligned ones to 32 bits. While we're here, also mark non-power-of-2 loads/stores as unsupported. llvm-svn: 356872
* [X86] Update some of the getMachineNode calls from X86ISelDAGToDAG to also ↵Craig Topper2019-03-251-8/+9
| | | | | | | | | include a VT for a EFLAGS result. This makes the nodes consistent with how they would be emitted from the isel table. llvm-svn: 356870
* [X86] When selecting (x << C1) op C2 as (x op (C2>>C1)) << C1, use the ↵Craig Topper2019-03-251-1/+2
| | | | | | | | | | | | | operation VT for the target constant. Normally when the nodes we use here(AND32ri8 for example) are selected their immediates are just converted from ConstantSDNode to TargetConstantSDNode without changing VT from the original operation VT. So we should still be emitting them with the operation VT. Theoretically this could expose more accurate opportunities for CSE. llvm-svn: 356869
* [X86] Remove GetLo8XForm and use GetLo32XForm instead. NFCICraig Topper2019-03-251-6/+1
| | | | | | | | We were using this to create an AND32ri8 node from a 64-bit and, but that node normally still uses a 32-bit immediate. So we should just truncate the existing immediate to i32. We already verified it has the same value in bits 31:7. llvm-svn: 356868
* [X86] Remove a couple unused SDNodeXForms. NFCCraig Topper2019-03-251-11/+0
| | | | llvm-svn: 356867
* Revert r356688 "[X86] Don't avoid folding multiple use sign extended 8-bit ↵Craig Topper2019-03-253-5/+18
| | | | | | | | immediate into instructions under optsize." Looking back over how the one use optimization works, I don't think this is the right way to fix this. llvm-svn: 356866
* [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685)Simon Pilgrim2019-03-241-28/+33
| | | | | | | | Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. llvm-svn: 356864
* [WebAssembly] Rename a variable in CFGSort (NFC)Heejin Ahn2019-03-241-4/+4
| | | | | | | | Class `RegionInfo` was `SortUnitInfo` before, so the variables were named `SUI`. Now the class name is `RegionInfo`, so this renames `SUI` to `RI`, matching the class name. llvm-svn: 356861
* [X86][AVX] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685)Simon Pilgrim2019-03-241-2/+14
| | | | | | | | Just enable this for AVX for now as SSE41 introduces extra register moves for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern (but otherwise helps reduce port5 usage on Intel targets). Only AVX support is required for PR40685 as the issue is due to 8i8->8i32 zext shuffle leftovers. llvm-svn: 356858
* [x86] improve the default expansion of uaddsat/usubsatSanjay Patel2019-03-241-3/+31
| | | | | | | | | | | | | | | This is yet another step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 uaddsat X, Y --> (X >u (X + Y)) ? -1 : X + Y usubsat X, Y --> (X >u Y) ? X - Y : 0 We can't count on a sane vector ISA, so override the default (umin/umax) expansion of unsigned add/sub saturate in cases where we do not have umin/umax. Differential Revision: https://reviews.llvm.org/D59006 llvm-svn: 356855
* [x86] reduce code duplication; NFCSanjay Patel2019-03-231-3/+5
| | | | llvm-svn: 356836
* [ARM] Don't form "ands" when it isn't scheduled correctly.Eli Friedman2019-03-221-1/+9
| | | | | | | | | | | | In r322972/r323136, the iteration here was changed to catch cases at the beginning of a basic block... but we accidentally deleted an important safety check. Restore that check to the way it was. Fixes https://bugs.llvm.org/show_bug.cgi?id=41116 Differential Revision: https://reviews.llvm.org/D59680 llvm-svn: 356809
* [X86] Use xmm registers to implement 64-bit popcnt on 32-bit targets if ↵Craig Topper2019-03-221-0/+22
| | | | | | | | | | | | possible if popcnt instruction is not available On 32-bit targets without popcnt, we currently expand 64-bit popcnt to sequences of arithmetic and logic ops for each 32-bit half and then add the 32 bit halves together. If we have xmm registers we can use use those to implement the operation instead. This results in less instructions then doing two separate 32-bit popcnt sequences. This mitigates some of PR41151 for the i64 on i686 case when we have SSE2. Differential Revision: https://reviews.llvm.org/D59662 llvm-svn: 356808
* [X86] Use movq for i64 atomic load on 32-bit targets when sse2 is enableCraig Topper2019-03-221-5/+44
| | | | | | | | | | | | | | We used a lock cmpxchg8b to do i64 atomic loads. But if we have SSE2 we can do better and use a plain movq to do the load instead. I tried to just use an f64 atomic load and add isel patterns to MOVSD(which the domain fixing pass can turn to MOVQ), but the atomic_load SDNode in TargetSelectionDAG.td requires the type to be integer. So I've emitted VZEXT_LOAD instead which should be selected by isel to a MOVQ. Hopefully we don't need a specific atomic flavor of this. I kept the memory operand from the original AtomicSDNode. I wasn't sure if I might need to set the MOVolatile flag? I've left some FIXMEs for improvements we can do without SSE2. Differential Revision: https://reviews.llvm.org/D59679 llvm-svn: 356807
* [AArch64, ARM] Add support for Exynos M5Evandro Menezes2019-03-222-0/+4
| | | | | | Add Exynos M5 support and test cases. llvm-svn: 356793
* [ARM] [NFC] Use tGPR in patterns where appropriate.Eli Friedman2019-03-221-11/+12
| | | | | | | | | | This doesn't have any practical effect at the moment, as far as I know, because high registers aren't allocatable in Thumb1 mode. But it might matter in the future. Differential Revision: https://reviews.llvm.org/D59675 llvm-svn: 356791
* [X86] lowerShuffleAsBitMask - ensure float bit masks are the correct width ↵Simon Pilgrim2019-03-221-5/+5
| | | | | | (PR41203) llvm-svn: 356784
* [AliasAnalysis] Second prototype to cache BasicAA / anyAA state.Alina Sbirlea2019-03-222-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Adding contained caching to AliasAnalysis. BasicAA is currently the only one using it. AA changes: - This patch is pulling the caches from BasicAAResults to AAResults, meaning the getModRefInfo call benefits from the IsCapturedCache as well when in "batch mode". - All AAResultBase implementations add the QueryInfo member to all APIs. AAResults APIs maintain wrapper APIs such that all alias()/getModRefInfo call sites are unchanged. - AA now provides a BatchAAResults type as a wrapper to AAResults. It keeps the AAResults instance and a QueryInfo instantiated to batch mode. It delegates all work to the AAResults instance with the batched QueryInfo. More API wrappers may be needed in BatchAAResults; only the minimum needed is currently added. MemorySSA changes: - All walkers are now templated on the AA used (AliasAnalysis=AAResults or BatchAAResults). - At build time, we optimize uses; now we create a local walker (lives only as long as OptimizeUses does) using BatchAAResults. - All Walkers have an internal AA and only use that now, never the AA in MemorySSA. The Walkers receive the AA they will use when built. - The walker we use for queries after the build is instantiated on AliasAnalysis and is built after building MemorySSA and setting AA. - All static methods doing walking are now templated on AliasAnalysisType if they are used both during build and after. If used only during build, the method now only takes a BatchAAResults. If used only after build, the method now takes an AliasAnalysis. Subscribers: sanjoy, arsenm, jvesely, nhaehnle, jlebar, george.burgess.iv, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59315 llvm-svn: 356783
* [AMDGPU] Use three- and five-dword result type in image opsTim Renouf2019-03-222-15/+12
| | | | | | | | | | | | | | | | Some image ops return three or five dwords. Previously, we modeled that with a 4 or 8 dword register class. The register allocator could cleverly spot that some subregs were dead and allocate something else there, but that caused the de-optimization that waitcnt insertion would think that the result was used immediately. This commit allows such an image op to have a result with a three or five dword result, avoiding the above de-optimization. Differential Revision: https://reviews.llvm.org/D58905 Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b llvm-svn: 356757
* [AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsicsTim Renouf2019-03-226-24/+68
| | | | | | | | | | | | | | | Now we have vec3 MVTs, this commit implements dwordx3 variants of the buffer intrinsics. On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4 instruction, and a dwordx3 buffer store intrinsic is not supported. We need to support the dwordx3 load intrinsic because it is generated by subtarget-unaware code in InstCombine. Differential Revision: https://reviews.llvm.org/D58904 Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e llvm-svn: 356755
* [RISCV] Add basic RV32E definitions and MC layer supportAlex Bradbury2019-03-229-15/+67
| | | | | | | | | | | | | | | | | | | | | The RISC-V ISA defines RV32E as an alternative "base" instruction set encoding, that differs from RV32I by having only 16 rather than 32 registers. This patch adds basic definitions for RV32E as well as MC layer support (assembling, disassembling) and tests. The only supported ABI on RV32E is ILP32E. Add a new RISCVFeatures::validate() helper to RISCVUtils which can be called from codegen or MC layer libraries to validate the combination of TargetTriple and FeatureBitSet. Other targets have similar checks (e.g. erroring if SPE is enabled on PPC64 or oddspreg + o32 ABI on Mips), but they either duplicate the checks (Mips), or fail to check for both codegen and MC codepaths (PPC). Codegen for the ILP32E ABI support and RV32E codegen are left for a future patch/patches. Differential Revision: https://reviews.llvm.org/D59470 llvm-svn: 356744
* [RISCV] Optimize emission of SELECT sequencesAlex Bradbury2019-03-221-17/+90
| | | | | | | | | | | | | | | | | | | This patch optimizes the emission of a sequence of SELECTs with the same condition, avoiding the insertion of unnecessary control flow. Such a sequence often occurs when a SELECT of values wider than XLEN is legalized into two SELECTs with legal types. We have identified several use cases where the SELECTs could be interleaved with other instructions. Therefore, we extend the sequence to include non-SELECT instructions if we are able to detect that the non-SELECT instructions do not impact the optimization. This patch supersedes https://reviews.llvm.org/D59096, which attempted to address this issue by introducing a new SelectionDAG node. Hat tip to Eli Friedman for his feedback on how to best handle this issue. Differential Revision: https://reviews.llvm.org/D59355 Patch by Luís Marques. llvm-svn: 356741
* [RISCV] Allow conversion of CC logic to bitwise logicAlex Bradbury2019-03-221-0/+4
| | | | | | | | | | | | | Indicates in the TargetLowering interface that conversions from CC logic to bitwise logic are allowed. Adds tests that show the benefit when optimization opportunities are detected. Also adds tests that show that when the optimization is not applied correct code is generated (but opportunities for other optimizations remain). Differential Revision: https://reviews.llvm.org/D59596 Patch by Luís Marques. llvm-svn: 356740
* [AMDGPU] Added v5i32 and v5f32 register classesTim Renouf2019-03-2210-4/+144
| | | | | | | | | | They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords. Differential Revision: https://reviews.llvm.org/D58903 Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735
* [BPF] handle derived type properly for computing type idYonghong Song2019-03-221-1/+2
| | | | | | | | | | | | | | | | | | Currently, the type id for a derived type is computed incorrectly. For example, type #1: int type #2: ptr to #1 For a global variable "int *a", type #1 will be attributed to variable "a". This is due to a bug which assigns the type id of the basetype of that derived type as the derived type's type id. This happens to "const", "volatile", "restrict", "typedef" and "pointer" types. This patch fixed this bug, fixed existing test cases and added a new one focusing on pointers plus other derived types. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 356727
* [AArch64] Split the neon.addp intrinsic into integer and fp variants.Amara Emerson2019-03-213-28/+1
| | | | | | | | | | | | | | | | | | | This is the result of discussions on the list about how to deal with intrinsics which require codegen to disambiguate them via only the integer/fp overloads. It causes problems for GlobalISel as some of that information is lost during translation, while with other operations like IR instructions the information is encoded into the instruction opcode. This patch changes clang to emit the new faddp intrinsic if the vector operands to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to upgrade existing calls to aarch64.neon.addp with fp vector arguments, and we remove the workarounds introduced for GlobalISel in r355865. This is a more permanent solution to PR40968. Differential Revision: https://reviews.llvm.org/D59655 llvm-svn: 356722
* [X86] Use LoadInst->getType() instead of ↵Craig Topper2019-03-211-3/+2
| | | | | | | | LoadInst->getPointerOperandType()->getElementType(). NFCI For the future day when the pointer's don't have element types, we shoudl just use the type of the load result instead. llvm-svn: 356721
* Mips: Fix typo in assert messageMatt Arsenault2019-03-211-1/+1
| | | | llvm-svn: 356717
* Mips: Don't create copy of nothingMatt Arsenault2019-03-211-2/+0
| | | | | | | | | This was creating a copy of the register the pseudo itself was def'ing, leaving a copy of an undefined register. I'm not sure how the verifier is not catching this, but this avoids asserting in a future change to RegAllocFast llvm-svn: 356716
* [AArch64] Update for ExynosEvandro Menezes2019-03-211-1/+1
| | | | | | Fix the feature set for Exynos M4 by removing support for `+fp16fml` and fix test case. llvm-svn: 356698
* [X86] canonicalizeBitSelect - don't attempt to canonicalize mask registersSimon Pilgrim2019-03-211-1/+1
| | | | | | | | We don't use X86ISD::ANDNP for mask registers. Test case from @craig.topper (Craig Topper) llvm-svn: 356696
* [X86] Don't avoid folding multiple use sign extended 8-bit immediate into ↵Craig Topper2019-03-213-18/+5
| | | | | | | | | | | | instructions under optsize. Under optsize we try to avoid folding immediates into instructions under optsize. But if the immediate is 16-bits or 32 bits, but can be encoded as an 8-bit immediate we don't save enough from disabling the folding unless the immediate has enough uses to make up for the size of the move which is either 3 bytes or 5 bytes since there are no sign extended 8-bit moves. We would also save something if the immediate was a live out of the basic block and thus a move was unavoidable, but that would require a more advanced heuristic than just counting uses. Note we only avoid folding multiple use immediates into the patterns that use X86ISD::ADD/SUB/XOR/OR/AND/CMP/ADC/SBB nodes and not the more common ISD::ADD/SUB/XOR/OR/AND nodes. Differential Revision: https://reviews.llvm.org/D59522 llvm-svn: 356688
* [ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and ↵Craig Topper2019-03-212-0/+30
| | | | | | | | | | | | | | compressstore intrinsics. This adds support for scalarizing these intrinsics as well the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle. I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway. Fixes PR40994 and is covers most of PR3666. Might want to implement constant masks to close that. Differential Revision: https://reviews.llvm.org/D59180 llvm-svn: 356687
* [Thumb] Fix infinite loop in ABS expansion (PR41160)Simon Pilgrim2019-03-211-1/+4
| | | | | | Don't expand ISD::ABS node if its legal. llvm-svn: 356661
* [AMDGPU] Support for v3i32/v3f32Tim Renouf2019-03-2112-52/+279
| | | | | | | | | | | | | | | Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659
* [AArch64] Allow -mattr=tpidr-el[1|2|3]Oliver Stannard2019-03-213-0/+21
| | | | | | | | | | | Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base register, rather than the default TPIDR_EL0. Patch by Philip Derrin! Differential revision: https://reviews.llvm.org/D54685 llvm-svn: 356657
* [X86] Add CMPXCHG8B feature flag. Set it for all CPUs except i386/i486 ↵Craig Topper2019-03-205-48/+92
| | | | | | | | | | | | including 'generic'. Disable use of CMPXCHG8B when this flag isn't set. CMPXCHG8B was introduced on i586/pentium generation. If its not enabled, limit the atomic width to 32 bits so the AtomicExpandPass will expand to lib calls. Unclear if we should be using a different limit for other configs. The default is 1024 and experimentation shows that using an i256 atomic will cause a crash in SelectionDAG. Differential Revision: https://reviews.llvm.org/D59576 llvm-svn: 356631
* [WebAssembly][NFC] Fix formatting error from rL356610Thomas Lively2019-03-201-2/+3
| | | | llvm-svn: 356622
* [AMDGPU] Do not generate spurious PAL metadataTim Renouf2019-03-202-6/+10
| | | | | | | | | | | | My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata" accidentally caused a spurious PAL metadata .note record to be emitted for any AMDGPU output. That caused failures in the lld test amdgpu-relocs.s. Fixed. Differential Revision: https://reviews.llvm.org/D59613 Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e llvm-svn: 356621
OpenPOWER on IntegriCloud