summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86][AVX] Add lowerShuffleAsLanePermuteAndSHUFP loweringSimon Pilgrim2020-01-111-0/+40
| | | | | | | | Add initial support for lowering v4f64 shuffles to SHUFPD(VPERM2F128(V1, V2), VPERM2F128(V1, V2)), eventually this could be used for v8f32 (and maybe v8f64/v16f32) but I'm being conservative for the initial implementation as only v4f64 can always succeed. This currently is only called from lowerShuffleAsLanePermuteAndShuffle so only gets used for unary shuffles, and we limit this to cases where we use upper elements as otherwise concating 2 xmm shuffles is probably the better case. Helps with poor shuffles mentioned in D66004.
* [X86] Remove dead code from X86DAGToDAGISel::Select that is no longer needed ↵Craig Topper2020-01-111-28/+0
| | | | now that we don't mutate strict fp nodes. NFC
* [X86] Simplify code by removing an unreachable condition. NFCICraig Topper2020-01-101-12/+2
| | | | | | For X87<->SSE conversions, the SSE type is always smaller than the X87 type. So we can always use the smallest type for the memory type.
* [X86] Preserve fpexcept property when turning strict_fp_extend and ↵Craig Topper2020-01-102-4/+37
| | | | | | | | | | | strict_fp_round into stack operations. We use the stack for X87 fp_round and for moving from SSE f32/f64 to X87 f64/f80. Or from X87 f64/f80 to SSE f32/f64. Note for the SSE<->X87 conversions the conversion always happens in the X87 domain. The load/store ops in the X87 instructions are able to signal exceptions.
* [X86][Disassembler] Simplify readPrefixesFangrui Song2020-01-101-43/+25
|
* [X86] Use ReplaceAllUsesWith instead of ReplaceAllUsesOfValueWith to ↵Craig Topper2020-01-101-12/+2
| | | | simplify some code. NFCI
* [AMDGPU] Remove unnecessary v_mov from a register to itself in WQM lowering.Michael Bedy2020-01-101-5/+22
| | | | | | | | | | | | | | | | | | | Summary: - SI Whole Quad Mode phase is replacing WQM pseudo instructions with v_mov instructions. While this is necessary for the special handling of moving results out of WWM live ranges, it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every WQM operation. This change uses a COPY psuedo in these cases, which allows the register allocator to coalesce the moves away. Reviewers: tpr, dstuttard, foad, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71386
* [TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from ↵Craig Topper2020-01-104-17/+4
| | | | | | | | | | | | | | | | | | | RunttimeLibcalls.def and all associated usages Summary: This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with a ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare. This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code. Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn Reviewed By: efriedma Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72536
* [AArch64] Don't generate libcalls for wide shifts on DarwinJessica Paquette2020-01-101-1/+1
| | | | | | | Similar to cff90f07cb5cc3. Darwin doesn't always use compiler-rt, and so we can't assume that these functions are available (at least on arm64).
* Let targets adjust operand latency of bundlesStanislav Mekhanoshin2020-01-103-41/+28
| | | | | | | | | | | This reverts the AMDGPU DAG mutation implemented in D72487 and gives a more general way of adjusting BUNDLE operand latency. It also replaces FixBundleLatencyMutation with adjustSchedDependency callback in the AMDGPU, fixing not only successor latencies but predecessors' as well. Differential Revision: https://reviews.llvm.org/D72535
* [AArch64] Add isAuthenticated predicate to MCInstDescVedant Kumar2020-01-102-6/+14
| | | | | | | | | | Add a predicate to MCInstDesc that allows tools to determine whether an instruction authenticates a pointer. This can be used by diagnostic tools to hint at pointer authentication failures. Differential Revision: https://reviews.llvm.org/D70329 rdar://55089604
* [X86] Support function attribute "patchable-function-entry"Fangrui Song2020-01-101-3/+15
| | | | | | | For x86-64, we diverge from GCC -fpatchable-function-entry in that we emit multi-byte NOPs. Differential Revision: https://reviews.llvm.org/D72220
* [AArch64] Add function attribute "patchable-function-entry" to add NOPs at ↵Fangrui Song2020-01-101-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | function entry The Linux kernel uses -fpatchable-function-entry to implement DYNAMIC_FTRACE_WITH_REGS for arm64 and parisc. GCC 8 implemented -fpatchable-function-entry, which can be seen as a generalized form of -mnop-mcount. The N,M form (function entry points before the Mth NOP) is currently only used by parisc. This patch adds N,0 support to AArch64 codegen. N is represented as the function attribute "patchable-function-entry". We will use a different function attribute for M, if we decide to implement it. The patch reuses the existing patchable-function pass, and TargetOpcode::PATCHABLE_FUNCTION_ENTER which is currently used by XRay. When the integrated assembler is used, __patchable_function_entries will be created for each text section with the SHF_LINK_ORDER flag to prevent --gc-sections (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93197) and COMDAT (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195) issues. Retrospectively, __patchable_function_entries should use a PC-relative relocation type to avoid the SHF_WRITE flag and dynamic relocations. "patchable-function-entry"'s interaction with Branch Target Identification is still unclear (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 for GCC discussions). Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D72215
* [AIX] Allow vararg calls when all arguments reside in registersjasonliu2020-01-101-22/+85
| | | | | | | | | | | | | | | | | | | | | | | Summary: This patch pushes the AIX vararg unimplemented error diagnostic later and allows vararg calls so long as all the arguments can be passed in register. This patch extends the AIX calling convention implementation to initialize GPR(s) for vararg float arguments. On AIX, both GPR(s) and FPR are allocated for floating point arguments. The GPR(s) are only initialized for vararg calls, otherwise the callee is expected to retrieve the float argument in the FPR. f64 in AIX PPC32 requires special handling in order to allocated and initialize 2 GPRs. This is performed with bitcast, SRL, truncation to initialize one GPR for the MSW and bitcast, truncations to initialize the other GPR for the LSW. A future patch will follow to add support for arguments passed on the stack. Patch provided by: cebowleratibm Reviewers: sfertile, ZarkoCA, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D71013
* [X86][AVX] lowerShuffleAsLanePermuteAndShuffle - consistently normalize ↵Simon Pilgrim2020-01-101-2/+2
| | | | | | | | multi-input shuffle elements We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the moment, but we should consistently handle lane index calculations for multiple inputs in both the AVX1 and AVX2 paths. Minor (almost NFC) tidyup as I'm hoping to use lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.
* [BPF] extend BTF_KIND_FUNC to cover global, static and extern funcsYonghong Song2020-01-103-29/+39
| | | | | | | | | | | | | | Previously extern function is added as BTF_KIND_VAR. This does not work well with existing BTF infrastructure as function expected to use BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO. This patch added extern function to BTF_KIND_FUNC. The two bits 0:1 of btf_type.info are used to indicate what kind of function it is: 0: static 1: global 2: extern Differential Revision: https://reviews.llvm.org/D71638
* [PowerPC] Handle constant zero bits in BitPermutationSelectorNemanja Ivanovic2020-01-101-4/+7
| | | | | | | | | | We currently crash when analyzing an AssertZExt node that has some bits that are constant zeros (i.e. as a result of an and with a constant). This issue was reported in https://bugs.llvm.org/show_bug.cgi?id=41088 and this patch fixes that. Differential revision: https://reviews.llvm.org/D72038
* AMDGPU/GlobalISel: Clamp G_ZEXT source sizesMatt Arsenault2020-01-101-2/+3
| | | | | Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so fewer cases actually work.
* [FPEnv] Invert sense of MIFlag::FPExcept flagUlrich Weigand2020-01-101-3/+3
| | | | | | | | | | | | | | | | | | In D71841 we inverted the sense of the SDNode-level flag to ensure all nodes default to potentially raising FP exceptions unless otherwise specified -- i.e. if we forget to propagate the flag somewhere, the effect is now only lost performance, not incorrect code. However, the related flag at the MI level still defaults to nodes not raising FP exceptions unless otherwise specified. To be fully on the (conservatively) safe side, we should invert that flag as well. This patch does so by replacing MIFlag::FPExcept with MIFlag::NoFPExcept. (Note that this does also introduce an incompatible change in the MIR format.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D72466
* [ARM][MVE] Tail predicate VMAX,VMAXA,VMIN,VMINASam Parker2020-01-101-0/+2
| | | | | | | Add the MVE min and max instructions to our tail predication whitelist. Differential Revision: https://reviews.llvm.org/D72502
* ARMLowOverheadLoops: a few more dbg msgs to better trace rejected TP loops. NFC.Sjoerd Meijer2020-01-101-7/+16
|
* Reverting, broke some bots. Need further investigation.Diogo Sampaio2020-01-107-336/+85
| | | | | | | | Summary: This reverts commit 8c12769f3046029e2a9b4e48e1645b1a77d28650. Reviewers: Subscribers:
* [ARM][Thumb2] Fix ADD/SUB invalid writes to SPDiogo Sampaio2020-01-107-85/+336
| | | | | | | | | | | | | | | | | | | | Summary: This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80". The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable. To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP). Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations. When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant ) It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt). Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma Reviewed By: efriedma Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70680
* Fix "pointer is null" static analyzer warnings. NFCI.Simon Pilgrim2020-01-101-0/+1
| | | | Assert that the pointers are non-null before dereferencing them.
* [MIR] Fix cyclic dependency of MIR formatterPeng Guo2020-01-101-4/+1
| | | | | | | | | | | | | | Summary: Move MIR formatter pointer from TargetMachine to TargetInstrInfo to avoid cyclic dependency between target & codegen. Reviewers: dsanders, bkramer, arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72485
* [NFC] [PowerPC] Add isPredicable for basic instrsQiu Chaofan2020-01-104-34/+21
| | | | | | | | | PowerPC uses a dedicated method to check if the machine instr is predicable by opcode. However, there's a bit `isPredicable` in instr definition. This patch removes the method and set the bit only to opcodes referenced in it. Differential Revision: https://reviews.llvm.org/D71921
* [NFC] Style cleanupShengchen Kan2020-01-101-9/+10
|
* AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELTMatt Arsenault2020-01-095-10/+89
| | | | | Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.
* AMDGPU/GlobalISel: Fix G_EXTRACT_VECTOR_ELT mapping for s-v caseMatt Arsenault2020-01-092-16/+87
| | | | | | If an SGPR vector is indexed with a VGPR, the actual indexing will be done on the SGPR and produce an SGPR. A copy needs to be inserted inside the waterwall loop to the VGPR result.
* [AMDGPU] Fix bundle schedulingStanislav Mekhanoshin2020-01-093-0/+61
| | | | | | | Bundles coming to scheduler considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487
* AVR: Update for getRegisterByName changeMatt Arsenault2020-01-092-3/+3
|
* TableGen/GlobalISel: Add way for SDNodeXForm to work on timmMatt Arsenault2020-01-097-31/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.
* GlobalISel: Handle llvm.read_registerMatt Arsenault2020-01-091-0/+2
| | | | | Compared to the attempt in bdcc6d3d2638b3a2c99ab3b9bfaa9c02e584993a, this uses intermediate generic instructions.
* CodeGen: Use LLT instead of EVT in getRegisterByNameMatt Arsenault2020-01-0920-23/+22
| | | | | | Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.
* [AArch64][GlobalISel] Implement selection of <2 x float> vector splat.Amara Emerson2020-01-092-7/+36
| | | | | | Also requires making G_IMPLICIT_DEF of v2s32 legal. Differential Revision: https://reviews.llvm.org/D72422
* AMDGPU/GlobalISel: Fix argument lowering for vectors of pointersMatt Arsenault2020-01-091-2/+18
| | | | | | | When these arguments are broken down by the EVT based callbacks, the pointer information is lost. Hack around this by coercing the register types to be the expected pointer element type when building the remerge operations.
* AMDGPU/GlobalISel: Widen 16-bit shift amount sourcesMatt Arsenault2020-01-091-1/+2
| | | | | | This should be legal, but will require future selection work. 16-bit shift amounts were already removed from being legal, but this didn't adjust the transformation rules.
* MipsDelaySlotFiller: Update registers def-uses for BUNDLE instructionsAlex Richardson2020-01-091-2/+10
| | | | | | | | | | | | | | | Summary: In commit b91f239485fb7bb8d29be3e0b60660a2de7570a9 I updated the MipsDelaySlotFiller to skip BUNDLE instructions. However, in addition to not considering BUNDLE instructions for the delay slot, we also need to ensure that the register def-use information is updated. Not updating this information caused run-time crashes (when using the out-of-tree CHERI backend) since later definitions could be overwritten with earlier register values. Reviewers: atanasyan Reviewed By: atanasyan Differential Revision: https://reviews.llvm.org/D72254
* [GlobalISel][AArch64] Import + select LDR*roW and STR*roW patternsJessica Paquette2020-01-092-46/+182
| | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for selecting a large chunk of the load/store *roW patterns. This is pretty much a straight port of AArch64DAGToDAGISel::SelectAddrModeWRO into GISel. The code is very similar to the XRO code. The main difference is that in the *roW patterns, we want to try and fold in an extend, and *possibly* a shift along with it. A good portion of this patch is refactoring the existing XRO code. - Add selectAddrModeWRO - Factor out the code from selectAddrModeShiftedExtendXReg which is used by both selectAddrModeXRO and selectAddrModeWRO into selectExtendedSHL. This is similar to the function of the same name in AArch64DAGToDAGISel. - Add support for extends to the factored out code in selectExtendedSHL. - Teach getExtendTypeForInst how to handle AND masks that are intended to be used in loads/stores (necessary for this addressing mode.) - Make getExtendTypeForInst not static because moving it made an annoying diff and I wanted to have the WRO/XRO functions close to each other while I was writing the code. Differential Revision: https://reviews.llvm.org/D72426
* [ms] [X86] Use "P" modifier on all branch-target operands in inline X86 ↵Eric Astor2020-01-094-50/+31
| | | | | | | | | | | | | | | | | | | assembly. Summary: Extend D71677 to apply to all branch-target operands, rather than special-casing call instructions. Also add a regression test for llvm.org/PR44272, since this finishes fixing it. Reviewers: thakis, rnk Reviewed By: thakis Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72417
* [X86] AMD Znver2 (Rome) Scheduler enablementGanesh Gopalasubramanian2020-01-103-2/+1551
| | | | | | | | | | | | The patch gives out the details of the znver2 scheduler model. There are few improvements with respect to execution units, latencies and throughput when compared with znver1. The tests that were present for znver1 for llvm-mca tool were replicated. The latencies, execution units, timeline and throughput information are updated for znver2. Reviewers: craig.topper, Simon Pilgrim Differential Revision: https://reviews.llvm.org/D66088
* [PowerPC] The VK_PLT symbolref modifier is only used on 32-bit ELF. [NFC]Sean Fertile2020-01-091-2/+2
| | | | | | | | | Fix a conditional that guarded code for execution only on 32-bit ELF by checking that the Subtarget was not 64-bit and not-Darwin. By adding a new target ABI (AIX), the condition is no longer correct. This code is dead for AIX, due to a 'report_fatal_error' for thread local storage usage earlier in the pipeline, but needs to be modifed as part of Darwins removal from the PowerPC backend.
* [SystemZ] Fix matching another pattern for nxgrk (PR44496)Ulrich Weigand2020-01-091-2/+3
| | | | | | | | SystemZDAGToDAGISel::Select will attempt to split logical instruction with a large immediate constant. This must not happen if the result matches one of the z15 combined operations, so the code checks for those. However, one of them was missed, causing invalid code to be generated in the test case for PR44496.
* AMDGPU/GlobalISel: Fix import of integer med3Matt Arsenault2020-01-092-32/+30
| | | | | This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.
* AMDGPU: Eliminate more legacy codepred address space PatFragsMatt Arsenault2020-01-094-93/+24
| | | | These should now be limited to R600 code.
* AMDGPU: Use new PatFrag system for d16 storesMatt Arsenault2020-01-092-15/+9
|
* AMDGPU: Use new PatFrag system for d16 load nodesMatt Arsenault2020-01-091-32/+23
|
* AMDGPU/GlobalISel: Fix import of zext of s16 op patternsMatt Arsenault2020-01-092-3/+5
|
* AMDGPU/GlobalISel: Add IMMPopCount xformMatt Arsenault2020-01-093-0/+12
| | | | Partially fixes BFE pattern import.
* AMDGPU/GlobalISel: Add selectVOP3Mods_nnanMatt Arsenault2020-01-093-0/+20
| | | | | | This doesn't enable any new imports yet, but moves the fmed patterns from failing on this to hitting the "complex suboperand referenced more than once" limitation in tablegen.
OpenPOWER on IntegriCloud