path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* PPCReduceCRLogicals - fix static analyzer warnings. NFC | Simon Pilgrim | 2019-11-13 | 1 | -6/+6
| - Fix uninitialized variable warnings.
| - Fix null dereference warnings.
* Revert "[RISCV] Fix wrong CFI directives" | Luís Marques | 2019-11-13 | 1 | -0/+55
| test/DebugInfo/RISCV/relax-debug-frame.ll wasn't properly updated.
* [ARM][MVE] canTailPredicateLoop | Sjoerd Meijer | 2019-11-13 | 2 | -9/+100
| This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a first version and it operates on single block loops only. With this change, the vectoriser will now determine if tail-folding scalar remainder loops is possible/desired, which is the first step to generate MVE tail-predicated vector loops.
|
| This is disabled by default for now. I.e., this depends on option -disable-mve-tail-predication, which is off by default.
|
| I will follow up on this soon with a patch for the vectoriser to respect loop hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled, we don't want to tail-fold and we shouldn't query this TTI hook, which is done in D70125.
|
| Differential Revision: https://reviews.llvm.org/D69845
* [RISCV] Fix wrong CFI directives | Luís Marques | 2019-11-13 | 1 | -55/+0
| Summary: Removes CFI CFA directives that could incorrectly propagate beyond the basic block they were intended for. Specifically it removes the epilogue CFI directives. See the branch_and_tail_call test for an example of the issue. Should fix the stack unwinding issues caused by the incorrect directives.
|
| Reviewers: asb, lenary, shiva0217
| Reviewed By: lenary
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69723
* [X86][AVX] Add plausible schedule classes to MASKPAIR/VP2INTERSECT/VDPBF16PS instructions | Simon Pilgrim | 2019-11-13 | 1 | -20/+24
| These are really just placeholders that use approximately the right resources - once we have CPU scheduler models that support these instructions they will need revisiting.
|
| In the meantime this means that all instructions have a class of some kind, meaning models can be more easily flagged as complete.
* [Mips] Add rematerialization support for ldi.fmt | Mirko Brkusanin | 2019-11-13 | 1 | -0/+1
| Instruction ldi.fmt can be considered cheap enough to avoid spilling and restoring the value that it produces, since it's loaded from an immediate.
|
| Differential Revision: https://reviews.llvm.org/D69898
* [mips] Show an error if 64-bit target triple provided with 32-bit CPU | Simon Atanasyan | 2019-11-13 | 1 | -0/+4
| When a 64-bit triple is used, emit an error if the CPU only supports 32-bit code.
|
| Patch by Miloš Stojanović.
|
| Differential Revision: https://reviews.llvm.org/D70018
* [AArch64] Extend storeRegToStackSlot to spill SVE registers. | Sander de Smalen | 2019-11-13 | 2 | -0/+26
| This patch allows the register allocator to spill SVE registers to the stack.
|
| Reviewers: ostannard, efriedma, rengolin, cameron.mcinally
| Reviewed By: efriedma
| Differential Revision: https://reviews.llvm.org/D70082
* [AArch64][SVE] Allocate locals that are scalable vectors. | Sander de Smalen | 2019-11-13 | 2 | -10/+45
| This patch adds a target interface to set the StackID for a given type, which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a 'sve-vec' StackID, so it is allocated in the SVE area of the stack frame.
|
| Reviewers: ostannard, efriedma, rengolin, cameron.mcinally
| Reviewed By: efriedma
| Differential Revision: https://reviews.llvm.org/D70080
* [ARM,MVE] Use VMOV.{S8,S16} for sign-extended extractelement. | Simon Tatham | 2019-11-13 | 1 | -3/+4
| MVE includes instructions that extract an 8- or 16-bit lane from a vector and sign-extend it into the output 32-bit GPR. `ARMInstrMVE.td` already included isel patterns to select those instructions in response to the `ARMISD::VGETLANEs` selection-DAG node type. But `ARMISD::VGETLANEs` was never actually generated, because the code that creates it was conditioned on NEON only.
|
| It's an easy fix to enable the same code for integer MVE, and now IR that sign-extends the result of an extractelement (whether explicitly or as part of the function call ABI) will use `vmov.s8` instead of `vmov.u8` followed by `sxtb`.
|
| Reviewers: SjoerdMeijer, dmgreen, ostannard
| Subscribers: kristof.beyls, hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D70132
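The equivalence that the commit above relies on can be checked with a scalar model. This is only an illustrative sketch: the type `Vec16xI8` and both helper functions are hypothetical stand-ins and are not part of the ARM backend; they model a 16-lane byte vector and the two instruction sequences named in the message.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit vector holding sixteen 8-bit lanes.
using Vec16xI8 = std::array<uint8_t, 16>;

// Models `vmov.u8` followed by `sxtb`: the lane is first zero-extended
// into a 32-bit value, then the low byte is sign-extended.
int32_t extractThenSignExtend(const Vec16xI8 &V, unsigned Lane) {
  uint32_t ZeroExtended = V[Lane];                 // vmov.u8
  return static_cast<int8_t>(ZeroExtended & 0xFF); // sxtb
}

// Models `vmov.s8`: the lane is sign-extended in a single step.
int32_t extractSigned(const Vec16xI8 &V, unsigned Lane) {
  return static_cast<int8_t>(V[Lane]);
}
```

Both helpers return the same value for every lane content, which is why the two-instruction sequence can be replaced by the single sign-extending move.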
* [TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (4) | joanlluch | 2019-11-13 | 2 | -4/+5
| Summary:
| Replaces
| ```
| unsigned getShiftAmountThreshold(EVT VT)
| ```
| by
| ```
| bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
| ```
| thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not.
|
| Updates the MSP430 target with a custom implementation.
| This continues D69116, D69120, D69326 and updates them, so all of them must be committed before this.
|
| Existing tests apply, a few more have been added.
|
| Reviewers: asl, spatel
| Reviewed By: spatel
| Subscribers: hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D70042
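To illustrate why a boolean predicate gives targets more flexibility than a single threshold, here is a minimal standalone sketch. It does not use the LLVM headers: `SimpleVT` is a hypothetical stand-in for `EVT`, both target structs are invented for illustration, the direction of the threshold comparison is an assumption, and the cost rule in `PredicateTarget` is made up and is not the MSP430 implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::EVT; only the bit width is modelled.
struct SimpleVT {
  unsigned Bits;
};

// Threshold-style hook: one cut-off per type. Assumption: amounts above
// the threshold are treated as expensive.
struct ThresholdTarget {
  unsigned getShiftAmountThreshold(SimpleVT) const { return 1; }
  bool isExpensiveShift(SimpleVT VT, unsigned Amount) const {
    return Amount > getShiftAmountThreshold(VT);
  }
};

// Predicate-style hook: the target sees the exact amount and can carve
// out isolated cheap cases. The rule below is invented: on this imaginary
// 16-bit target, shifts by 1 and by 8 are cheap, everything else is not.
struct PredicateTarget {
  bool shouldAvoidTransformToShift(SimpleVT VT, unsigned Amount) const {
    if (VT.Bits == 16 && (Amount == 1 || Amount == 8))
      return false;
    return Amount > 1;
  }
};
```

With the threshold form, a shift by 8 is rejected together with every other amount above the cut-off; the predicate form can accept it while still rejecting, say, a shift by 5.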
* [X86] Remove setOperationAction for FP_TO_SINT v8i16. | Craig Topper | 2019-11-12 | 2 | -8/+3
| This is no longer needed after widening legalization as we custom legalize v8i8 ourselves.
|
| Added entries to the cost model, but bumped the cost slightly to account for the truncate shuffle that wasn't costed before.
* AMDGPU: Extend add x, (ext setcc) combine to sub | Matt Arsenault | 2019-11-13 | 1 | -0/+22
| This is the same as the add case, but inverts the operation type.
|
| This avoids regressions in a future patch.
* AMDGPU: Switch backend default max workgroup size to 1024 | Matt Arsenault | 2019-11-13 | 1 | -7/+1
| Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum.
|
| I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.
* AMDGPU Reduce reported maximum group size to 1024 | Matt Arsenault | 2019-11-13 | 1 | -1/+2
| While some targets allow encoding 2048, this was never tested or supported.
* [X86] Don't consider v64i1 as a legal type unless v64i8 is also a legal type. | Craig Topper | 2019-11-12 | 1 | -25/+47
| This avoids some nasty issues with argument passing and lowering of arbitrary v64i8 shuffles.
* [X86] Only pass v64i8/v32i16 as v16i32 on non-avx512bw targets if the v16i32 type won't be split by prefer-vector-width=256 | Craig Topper | 2019-11-12 | 1 | -4/+4
| Otherwise just let the v64i8/v32i16 types be split to v32i8/v16i16. In reality this shouldn't happen because it means we have a 512-bit vector argument, but min-legal-vector-width says a value less than 512. But a 512-bit argument should have been factored into the preferred vector width.
* [BPF] generate BTF_KIND_VARs for all non-static globals | Yonghong Song | 2019-11-12 | 2 | -4/+7
| Enable generation of BTF_KIND_VARs for non-static default-section globals, which was not allowed previously. Modified the existing test case to accommodate the new change.
|
| Also removed unused linkage enum members VAR_GLOBAL_TENTATIVE and VAR_GLOBAL_EXTERNAL.
|
| Differential Revision: https://reviews.llvm.org/D70145
* [AArch64] Update for Exynos | Evandro Menezes | 2019-11-12 | 2 | -10/+26
| Fix the modeling for loads and stores using the register offset addressing mode.
* [AArch64] Fix addressing mode predicates | Evandro Menezes | 2019-11-12 | 1 | -3/+5
| Fix predicates related to the register offset addressing mode.
* ARM: Don't emit R_ARM_NONE relocations to compact unwinding decoders in .ARM.exidx on Android. | Peter Collingbourne | 2019-11-12 | 2 | -9/+20
| These relocations are specified by the ARM EHABI (section 6.3). As I understand it, their purpose is to accommodate unwinder implementations that wish to reduce code size by placing the implementations of the compact unwinding decoders in a separate translation unit, and using extern weak symbols to refer to them from the main unwinder implementation, so that they are only linked when something in the binary needs them in order to unwind.
|
| However, neither of the unwinders used on Android (libgcc, LLVM libunwind) uses this technique, and in fact emitting these relocations ends up being counterproductive to code size because they cause a copy of the unwinder to be statically linked into most binaries, regardless of whether it is actually needed. Furthermore, these relocations create circular dependencies (between libc and the unwinder) in cases where the unwinder is dynamically linked and libc contains compact unwind info.
|
| Therefore, deviate from the EHABI here and stop emitting these relocations on Android.
|
| Differential Revision: https://reviews.llvm.org/D70027
* [Hexagon] Update PS_aligna with max stack alignment once isel completes | Krzysztof Parzyszek | 2019-11-12 | 2 | -3/+18
|
* [Hexagon] Fix vector spill expansion to use proper alignment | Krzysztof Parzyszek | 2019-11-12 | 4 | -105/+173
| 1. Add pseudos PS_vloadrv_ai and PS_vstorerv_ai: those are now used for single vector registers in loadRegFromStackSlot (and store...).
| 2. Remove pseudos PS_vloadrwu_ai and PS_vstorerwu_ai. The alignment is now checked when expanding spill pseudos (both in frame lowering and in expand-post-ra-pseudos), and a proper instruction is generated.
| 3. Update MachineMemOperands when dealigning vector spill slots.
| 4. Return vector predicate registers in getCallerSavedRegs.
* [Hexagon] Convert stack object offsets to int64, NFC | Krzysztof Parzyszek | 2019-11-12 | 1 | -1/+1
| This will print [SP-56] instead of [SP+4294967240].
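The two strings quoted above differ only in how the same bit pattern is interpreted: -56 viewed as an unsigned 32-bit value is 2^32 - 56 = 4294967240. A minimal sketch of that effect follows, assuming the offset was previously held in a 32-bit unsigned value; both helper names are hypothetical and this is not the Hexagon printing code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: formats a stack offset that is carried as an
// unsigned 32-bit value, so a negative offset wraps around.
std::string formatOffsetUnsigned(int32_t Offset) {
  uint32_t Raw = static_cast<uint32_t>(Offset);
  return "[SP+" + std::to_string(Raw) + "]";
}

// Hypothetical helper: formats the same offset after widening it to a
// signed 64-bit value, which preserves the sign.
std::string formatOffsetSigned(int32_t Offset) {
  int64_t Value = Offset;
  if (Value < 0)
    return "[SP" + std::to_string(Value) + "]";
  return "[SP+" + std::to_string(Value) + "]";
}
```

Calling `formatOffsetUnsigned(-56)` yields `[SP+4294967240]`, while `formatOffsetSigned(-56)` yields `[SP-56]`.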
* [Hexagon] Handle stack realignment in hexagon-vextract | Krzysztof Parzyszek | 2019-11-12 | 1 | -7/+37
|
* [Hexagon] Require PS_aligna whenever variable-sized objects are present | Krzysztof Parzyszek | 2019-11-12 | 1 | -3/+3
|
* [PowerPC][NFC] Fix typo in desc for enable-ppc-prefetching | Jinsong Ji | 2019-11-12 | 1 | -1/+1
|
* [AArch64ExpandPseudos] Preserve renamable state when expanding MOVi64 & co. | Florian Hahn | 2019-11-12 | 1 | -2/+6
| If the MOVi operand was renamable, the operands of the expanded instructions are also renamable.
|
| Reviewers: thegameg, samparker, zatrazz
| Reviewed By: thegameg
| Differential Revision: https://reviews.llvm.org/D70061
* [X86] Update stale comment. NFC | Craig Topper | 2019-11-11 | 1 | -2/+2
|
* AMDGPU/SI: make ~SIScheduleBlockCreator trivial | Fangrui Song | 2019-11-11 | 2 | -6/+2
|
* [X86] Remove setOperationAction lines that say to promote MVT::i1 | Craig Topper | 2019-11-11 | 1 | -6/+0
| MVT::i1 should be removed by type legalization before we reach any code that would act on the promote action.
|
| Mainly to avoid replicating this for strict FP versions of these operations.
* [X86] Remove some else branches after checking for !useSoftFloat() that set operations to Expand. | Craig Topper | 2019-11-11 | 1 | -9/+0
| If we're using soft floats, then these operations should be softened during type legalization. They'll never get to LegalizeVectorOps or LegalizeDAG so they don't need to be Expanded there.
* [PowerPC][XCOFF] Add support for zero initialized global values. | Sean Fertile | 2019-11-11 | 1 | -1/+1
| For XCOFF, globals mapped into the .bss section are linked as COMMON definitions. This behaviour is incorrect for zero initialized data, so emit those to the .data section instead.
|
| Differential Revision: https://reviews.llvm.org/D69528
* [AArch64] Update for Exynos | Evandro Menezes | 2019-11-11 | 1 | -1/+1
| Fix the costs of FP register moves.
* [AArch64] Add new scheduling predicates | Evandro Menezes | 2019-11-11 | 2 | -1/+74
| Add new scheduling predicates to identify more ASIMD forms.
* [CGP] Make ICMP_EQ use CR result of ICMP_S(L|G)T dominators | Yi-Hong Lyu | 2019-11-11 | 1 | -0/+4
| For example:
|
|   long long test(long long a, long long b) {
|     if (a << b > 0)
|       return b;
|     if (a << b < 0)
|       return a;
|     return a*b;
|   }
|
| Produces:
|
|           sld. 5, 3, 4
|           ble 0, .LBB0_2
|           mr 3, 4
|           blr
|   .LBB0_2: # %if.end
|           cmpldi 5, 0
|           li 5, 1
|           isel 4, 4, 5, 2
|           mulld 3, 4, 3
|           blr
|
| But the compare (cmpldi 5, 0) is redundant and can be removed (CR0 already contains the result of that comparison).
|
| The root cause of this is that LLVM converts signed comparisons into equality comparison based on dominance. Equality comparisons are unsigned by default, so we get either a record-form or cmp (without the l for logical) feeding a cmpl. That is the situation we want to avoid here.
|
| Differential Revision: https://reviews.llvm.org/D60506
* [PowerPC] Implementing overflow version for XO-Form instructions | Stefan Pintile | 2019-11-11 | 3 | -68/+128
| The Overflow version of XO-Form instruction uses the SO, OV and OV32 special registers. This change modifies existing multiclasses and instruction definitions to allow for the use of the XER register to record the various types of overflow from possible add, subtract and multiply instructions. It then modifies the existing instructions to use these multiclasses as needed.
|
| Patch By: Kamau Bridgeman
|
| Differential Revision: https://reviews.llvm.org/D66902
* AArch64FunctionInfo - fix uninitialized variable warnings. NFCI. | Simon Pilgrim | 2019-11-11 | 1 | -6/+6
|
* Fix -Wcovered-switch-default warning. NFCI. | Simon Pilgrim | 2019-11-11 | 1 | -2/+1
|
* Use MCRegister in copyPhysReg | Matt Arsenault | 2019-11-11 | 44 | -98/+98
|
* [AArch64][SVE] Spilling/filling of SVE callee-saves. | Sander de Smalen | 2019-11-11 | 6 | -39/+296
| Implement the spills/fills of callee-saved SVE registers using STR and LDR instructions.
|
| Also adds the `aarch64_sve_vector_pcs` attribute to specify the callee-saved registers to be used for functions that return SVE vectors or take SVE vectors as arguments. The callee-saved registers are vector registers z8-z23 and predicate registers p4-p15.
|
| The overall frame-layout with SVE will be as follows:
|
|    +-------------+
|    | stack args  |
|    +-------------+
|    | Callee Saves|
|    |   X29, X30  |
|    |-------------| <- FP
|    | SVE Callee  | < //////////////
|    | saved regs  | < //////////////
|    |    z23      | < //////////////
|    |     :       | < // SCALABLE //
|    |    z8       | < //////////////
|    |    p15      | < /// STACK ////
|    |     :       | < //////////////
|    |    p4       | < //// AREA ////
|    +-------------+ < //////////////
|    |     :       | < //////////////
|    |  SVE locals | < //////////////
|    |     :       | < //////////////
|    +-------------+
|    |/////////////| alignment gap.
|    |     :       |
|    | Stack objs  |
|    |     :       |
|    +-------------+ <- SP after call and frame-setup
|
| Reviewers: cameron.mcinally, efriedma, greened, thegameg, ostannard, rengolin
| Reviewed By: ostannard
| Differential Revision: https://reviews.llvm.org/D68996
* [RISCV] Fix CFA when doing split sp adjustment with fp | Luís Marques | 2019-11-10 | 1 | -15/+25
| Summary: When using the split sp adjustment and using the frame-pointer we were still emitting CFI CFA directives based on the sp value. The final sp-based offset also didn't reflect the two-stage sp adjust. There remain CFI issues that aren't related to the split sp adjustment, and thus will be addressed in a separate patch.
|
| Reviewers: asb, lenary, shiva0217
| Reviewed By: lenary, shiva0217
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69385
* [RISCV][NFC] Add CFI-related tests | Luís Marques | 2019-11-10 | 1 | -0/+2
| Summary: Adds tests necessary to properly show the impact of other patches that affect the emission of CFI directives.
|
| Reviewers: asb, lenary
| Reviewed By: lenary
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69721
* [X86] Handle MO_ConstantPoolIndex in X86AsmPrinter::PrintOperand | Craig Topper | 2019-11-09 | 1 | -0/+1
| Fixes PR43952
* Remove duplicate MemVT to fix shadow variable warning. NFCI. | Simon Pilgrim | 2019-11-09 | 1 | -1/+0
|
* Remove superfluous break after return. NFC. | Simon Pilgrim | 2019-11-09 | 1 | -2/+0
|
* Fix shadow variable warning by reducing scope of CC/InverseCC CondCodes. NFCI. | Simon Pilgrim | 2019-11-09 | 1 | -3/+3
|
* [TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (3) (baseline tests) | joanlluch | 2019-11-08 | 2 | -0/+15
| Summary:
| These are baseline tests for D69326.
|
| Incorporates a command line flag for the MSP430 and adds test cases to help show the effects of applying D69326.
|
| More details and motivation for this patch are in D69326.
|
| Reviewers: spatel, asl, lebedev.ri
| Reviewed By: spatel, asl
| Subscribers: hiraditya, llvm-commits
| Tags: #llvm
| Differential Revision: https://reviews.llvm.org/D69975
* [AArch64][X86] Don't assume __powidf2 is available on Windows. | Eli Friedman | 2019-11-08 | 3 | -56/+17
| We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it.
|
| (I think this started showing up recently because we added an optimization that converts pow to powi.)
|
| Differential Revision: https://reviews.llvm.org/D69013
* [PowerPC] Remove redundant CRSET/CRUNSET in custom lowering of known CR bit spills | Yi-Hong Lyu | 2019-11-08 | 3 | -3/+37
| We lower known CR bit spills (CRSET/CRUNSET) to load and spill the known value but forgot to remove the redundant spills.
|
| e.g., This sequence was used to spill a CRUNSET:
|
|   crclr   4*cr5+lt
|   mfocrf  r3,4
|   rlwinm  r3,r3,20,0,0
|   stw     r3,132(r1)
|
| Custom lowering of known CR bit spills lowers it to:
|
|   crxor   4*cr5+lt, 4*cr5+lt, 4*cr5+lt
|   li      r3,0
|   stw     r3,132(r1)
|
| crxor is redundant if there is no use of 4*cr5+lt, so we should remove it.
|
| Differential revision: https://reviews.llvm.org/D67722