bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[LiveInterval] Allow updating subranges with slightly out-dated IR	Quentin Colombet	2019-11-13	2	-0/+8
\|	During register coalescing, we update the live-intervals on-the-fly. To do that we are in this strange mode where the live-intervals can be slightly out-of-sync (more precisely they are forward looking) compared to what the IR actually represents. This happens because the register coalescer only updates the IR when it is done with updating the live-intervals and it has to do it this way because updating the IR on-the-fly would actually clobber some information on what the live-ranges that are being updated look like.
\|	This is problematic for updates that rely on the IR to accurately represent the state of the live-ranges. Right now, we have only one of those: stripValuesNotDefiningMask.
\|	To reconcile this need of out-of-sync IR, this patch introduces a new argument to LiveInterval::refineSubRanges that allows the code doing the live range updates to reason about what the code should look like after the coalescer has rewritten the registers. Essentially this captures how a subregister index will be offset to match its position in a new register class.
\|	E.g., let's say we want to merge:
\|	    V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
\|	We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32> overlap, i.e., by choosing a class where we can find "offset + 1 == 3". Put differently we align V2's sub3 with V1's sub1:
\|	    V2: sub0     sub1 sub2 sub3
\|	    V1: <offset>      sub0 sub1
\|	This offset will look like a composed subregidx in the class:
\|	    V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
\|	    => V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
\|	Now if we didn't rewrite the uses and def of V1, all the checks for V1 need to account for this offset to match what the live intervals intend to capture.
\|	Prior to this patch, we would fail to recognize the uses and def of V1 and would end up with machine verifier errors: No live segment at def. This could lead to miscompiles as we would drop some live-ranges and thus miss some interferences.
\|	For this problem to trigger, we need to reach stripValuesNotDefiningMask while having a mismatch between the IR and the live-ranges (i.e., we have to apply a subreg offset to the IR). This requires the following three conditions:
\|	1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
\|	2. An update with Tuple registers with a possibility to coalesce the subreg index: e.g., v1.dsub_1 == v2.dsub_3
\|	3. Subreg liveness enabled.
\|	looking at the IR to decide what is alive and what is not, i.e., calling stripValuesNotDefiningMask.
\|	coalescer maintains for the live-ranges information.
\|	None of the targets that currently use subreg liveness (i.e., the targets that fulfill #3: Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1 and #2, so this patch also artificially enables subreg liveness for ARM, so that a nice test case can be attached.
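The "offset + 1 == 3" alignment from the entry above can be illustrated with a small sketch. Both helper names are hypothetical and are not part of the LLVM API; lanes are modelled as plain integers rather than subregister indices, so this only shows the arithmetic, not how LiveInterval::refineSubRanges is implemented.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): how many lanes the narrow register
// must be shifted by so that its lane `narrowLane` lines up with lane
// `wideLane` of the wide register, i.e. offset + narrowLane == wideLane.
unsigned laneOffset(unsigned wideLane, unsigned narrowLane) {
  assert(wideLane >= narrowLane && "narrow lane cannot sit above the wide lane");
  return wideLane - narrowLane;
}

// Hypothetical helper: the lane a use or def of the narrow register refers to
// once the coalescer has rewritten it into the merged register class.
unsigned rewrittenLane(unsigned narrowLane, unsigned offset) {
  return narrowLane + offset;
}
```

For the example in the message, laneOffset(3, 1) gives 2, so V1's sub0 and sub1 are checked as lanes 2 and 3 of the merged register.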
*	[AArch64][v8.3a] Add missing imp-defs on RETA*.	Ahmed Bougacha	2019-11-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	RETA always implicitly uses LR, unlike RET which merely has an alias that defaults it to LR. Additionally, RETA implicitly uses SP as well, which it uses as a discriminator to authenticate LR. This isn't usually noticeable, because RET_ReallyLR is used in most of the backend. However, the post-RA scheduler, if enabled, will cause miscompiles if the imp-uses are missing. While there, fix a typo in the lone affected testcase.
*	[AArch64][v8.3a] Add LDRA '[xN]!' alias.	Ahmed Bougacha	2019-11-13	1	-0/+3
\| \| \| \| \| \|	The instruction definition has been retroactively expanded to allow for an alias for '[xN, 0]!' as '[xN]!'. That wouldn't make sense on LDR, but does for LDRA.
*	Fix comment spelling {addresing -> addressing} (NFC)	Matthew Malcomson	2019-11-13	1	-1/+1
\|
*	PowerPC - fix uninitialized variable warnings. NFCI.	Simon Pilgrim	2019-11-13	3	-7/+7
\|
*	Fix uninitialized variable warning. NFCI.	Simon Pilgrim	2019-11-13	1	-1/+1
\|
*	Fix uninitialized variable warning. NFCI.	Simon Pilgrim	2019-11-13	1	-1/+1
\|
*	Fix uninitialized variable warning. NFCI.	Simon Pilgrim	2019-11-13	1	-1/+1
\|
*	Sparc - fix uninitialized variable warnings. NFCI.	Simon Pilgrim	2019-11-13	3	-3/+3
\|
*	PPCReduceCRLogicals - fix static analyzer warnings. NFC	Simon Pilgrim	2019-11-13	1	-6/+6
\|	- Fix uninitialized variable warnings.
\|	- Fix null dereference warnings.
*	Revert "[RISCV] Fix wrong CFI directives"	Luís Marques	2019-11-13	1	-0/+55
\| \| \| \|	test/DebugInfo/RISCV/relax-debug-frame.ll wasn't properly updated.
*	[ARM][MVE] canTailPredicateLoop	Sjoerd Meijer	2019-11-13	2	-9/+100
\|	This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a first version and it operates on single block loops only. With this change, the vectoriser will now determine if tail-folding scalar remainder loops is possible/desired, which is the first step to generate MVE tail-predicated vector loops. This is disabled by default for now. I.e., this depends on option -disable-mve-tail-predication, which is off by default. I will follow up on this soon with a patch for the vectoriser to respect loop hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled, we don't want to tail-fold and we shouldn't query this TTI hook, which is done in D70125. Differential Revision: https://reviews.llvm.org/D69845
*	[RISCV] Fix wrong CFI directives	Luís Marques	2019-11-13	1	-55/+0
\|	Summary: Removes CFI CFA directives that could incorrectly propagate beyond the basic block they were intended for. Specifically it removes the epilogue CFI directives. See the branch_and_tail_call test for an example of the issue. Should fix the stack unwinding issues caused by the incorrect directives. Reviewers: asb, lenary, shiva0217 Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D69723
*	[X86][AVX] Add plausible schedule classes to MASKPAIR/VP2INTERSECT/VDPBF16PS instructions	Simon Pilgrim	2019-11-13	1	-20/+24
\|	These are really just placeholders that use approximately the right resources - once we have CPU scheduler models that support these instructions they will need revisiting. In the meantime this means that all instructions have a class of some kind, meaning models can be more easily flagged as complete.
*	[Mips] Add rematerialization support for ldi.fmt	Mirko Brkusanin	2019-11-13	1	-0/+1
\| \| \| \| \| \| \|	Instruction ldi.fmt can be considered cheap enough to avoid spill and restore of value that it produces since it's loaded from immediate. Differential Revision: https://reviews.llvm.org/D69898
*	[mips] Show an error if 64-bit target triple provided with 32-bit CPU	Simon Atanasyan	2019-11-13	1	-0/+4
\| \| \| \| \| \| \| \| \|	When a 64-bit triple is used emit an error if the CPU only supports 32-bit code. Patch by Miloš Stojanović. Differential Revision: https://reviews.llvm.org/D70018
*	[AArch64] Extend storeRegToStackSlot to spill SVE registers.	Sander de Smalen	2019-11-13	2	-0/+26
\| \| \| \| \| \| \| \| \| \|	This patch allows the register allocator to spill SVE registers to the stack. Reviewers: ostannard, efriedma, rengolin, cameron.mcinally Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D70082
*	[AArch64][SVE] Allocate locals that are scalable vectors.	Sander de Smalen	2019-11-13	2	-10/+45
\| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a target interface to set the StackID for a given type, which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a 'sve-vec' StackID, so it is allocated in the SVE area of the stack frame. Reviewers: ostannard, efriedma, rengolin, cameron.mcinally Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D70080
*	[ARM,MVE] Use VMOV.{S8,S16} for sign-extended extractelement.	Simon Tatham	2019-11-13	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MVE includes instructions that extract an 8- or 16-bit lane from a vector and sign-extend it into the output 32-bit GPR. `ARMInstrMVE.td` already included isel patterns to select those instructions in response to the `ARMISD::VGETLANEs` selection-DAG node type. But `ARMISD::VGETLANEs` was never actually generated, because the code that creates it was conditioned on NEON only. It's an easy fix to enable the same code for integer MVE, and now IR that sign-extends the result of an extractelement (whether explicitly or as part of the function call ABI) will use `vmov.s8` instead of `vmov.u8` followed by `sxtb`. Reviewers: SjoerdMeijer, dmgreen, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70132
*	[TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (4)	joanlluch	2019-11-13	2	-4/+5
\|	Summary: Replaces `unsigned getShiftAmountThreshold(EVT VT)` by `bool shouldAvoidTransformToShift(EVT VT, unsigned amount)`, thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not. Updates the MSP430 target with a custom implementation. This continues D69116, D69120, D69326 and updates them, so all of them must be committed before this. Existing tests apply, a few more have been added. Reviewers: asl, spatel Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70042
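A rough sketch of why a per-amount predicate is more flexible than a single threshold follows. Everything here is a stand-in: `ValueType` replaces llvm::EVT, `TargetHooks` replaces the real TargetLowering class, and the policy in `SmallCoreHooks` is invented for illustration. It is not the MSP430 implementation from the patch.

```cpp
#include <cassert>

// Stand-in for llvm::EVT; only the bit width is modelled in this sketch.
struct ValueType {
  unsigned bits;
};

// Stand-in for the generic lowering interface: by default no shift amount
// is considered expensive, so the combiner is free to form shifts.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  virtual bool shouldAvoidTransformToShift(ValueType /*vt*/,
                                           unsigned /*amount*/) const {
    return false;
  }
};

// Hypothetical target without a barrel shifter. The policy is assumed for
// illustration: shifts by 1 or 2 and by a whole byte are cheap, others are not.
struct SmallCoreHooks : TargetHooks {
  bool shouldAvoidTransformToShift(ValueType /*vt*/,
                                   unsigned amount) const override {
    bool cheap = amount <= 2 || amount == 8;
    return !cheap;
  }
};

// Small usage check of both policies.
void checkHooks() {
  TargetHooks generic;
  SmallCoreHooks small;
  assert(!generic.shouldAvoidTransformToShift(ValueType{16}, 15));
  assert(small.shouldAvoidTransformToShift(ValueType{16}, 15));
  assert(!small.shouldAvoidTransformToShift(ValueType{16}, 8));
}
```

A single threshold could not express "8 is cheap but 5 is not"; a predicate that sees the amount can.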
*	[X86] Remove setOperationAction for FP_TO_SINT v8i16.	Craig Topper	2019-11-12	2	-8/+3
\| \| \| \| \| \| \| \|	This is no longer needed after widening legalization as we custom legalize v8i8 ourselves. Added entries to the cost model, but bumped the cost slightly to account for the truncate shuffle that wasn't costed before.
*	AMDGPU: Extend add x, (ext setcc) combine to sub	Matt Arsenault	2019-11-13	1	-0/+22
\| \| \| \| \| \|	This is the same as the add case, but inverts the operation type. This avoids regressions in a future patch.
*	AMDGPU: Switch backend default max workgroup size to 1024	Matt Arsenault	2019-11-13	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum. I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.
*	AMDGPU Reduce reported maximum group size to 1024	Matt Arsenault	2019-11-13	1	-1/+2
\| \| \| \| \|	While some targets allow encoding 2048, this was never tested or supported.
*	[X86] Don't consider v64i1 as a legal type unless v64i8 is also a legal type.	Craig Topper	2019-11-12	1	-25/+47
\| \| \| \| \|	This avoids some nasty issues with argument passing and lowering of arbitrary v64i8 shuffles.
*	[X86] Only pass v64i8/v32i16 as v16i32 on non-avx512bw targets if the v16i32 type won't be split by prefer-vector-width=256	Craig Topper	2019-11-12	1	-4/+4
\|	Otherwise just let the v64i8/v32i16 types be split to v32i8/v16i16. In reality this shouldn't happen because it means we have a 512-bit vector argument, but min-legal-vector-width says a value less than 512. But a 512-bit argument should have been factored into the preferred vector width.
*	[BPF] generate BTF_KIND_VARs for all non-static globals	Yonghong Song	2019-11-12	2	-4/+7
\|	Enable generation of BTF_KIND_VARs for non-static default-section globals, which was not allowed previously. Modified the existing test case to accommodate the new change. Also removed unused linkage enum members VAR_GLOBAL_TENTATIVE and VAR_GLOBAL_EXTERNAL. Differential Revision: https://reviews.llvm.org/D70145
*	[AArch64] Update for Exynos	Evandro Menezes	2019-11-12	2	-10/+26
\|	Fix the modeling for loads and stores using the register offset addressing mode.
*	[AArch64] Fix addressing mode predicates	Evandro Menezes	2019-11-12	1	-3/+5
\| \| \| \|	Fix predicates related to the register offset addressing mode.
*	ARM: Don't emit R_ARM_NONE relocations to compact unwinding decoders in .ARM.exidx on Android.	Peter Collingbourne	2019-11-12	2	-9/+20
\|	These relocations are specified by the ARM EHABI (section 6.3). As I understand it, their purpose is to accommodate unwinder implementations that wish to reduce code size by placing the implementations of the compact unwinding decoders in a separate translation unit, and using extern weak symbols to refer to them from the main unwinder implementation, so that they are only linked when something in the binary needs them in order to unwind. However, neither of the unwinders used on Android (libgcc, LLVM libunwind) uses this technique, and in fact emitting these relocations ends up being counterproductive to code size because they cause a copy of the unwinder to be statically linked into most binaries, regardless of whether it is actually needed. Furthermore, these relocations create circular dependencies (between libc and the unwinder) in cases where the unwinder is dynamically linked and libc contains compact unwind info. Therefore, deviate from the EHABI here and stop emitting these relocations on Android. Differential Revision: https://reviews.llvm.org/D70027
*	[Hexagon] Update PS_aligna with max stack alignment once isel completes	Krzysztof Parzyszek	2019-11-12	2	-3/+18
\|
*	[Hexagon] Fix vector spill expansion to use proper alignment	Krzysztof Parzyszek	2019-11-12	4	-105/+173
\|	1. Add pseudos PS_vloadrv_ai and PS_vstorerv_ai: those are now used for single vector registers in loadRegFromStackSlot (and store...).
\|	2. Remove pseudos PS_vloadrwu_ai and PS_vstorerwu_ai. The alignment is now checked when expanding spill pseudos (both in frame lowering and in expand-post-ra-pseudos), and a proper instruction is generated.
\|	3. Update MachineMemOperands when dealigning vector spill slots.
\|	4. Return vector predicate registers in getCallerSavedRegs.
*	[Hexagon] Convert stack object offsets to int64, NFC	Krzysztof Parzyszek	2019-11-12	1	-1/+1
\| \| \| \|	This will print [SP-56] instead of [SP+4294967240].
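The two printed forms in the entry above differ only in the integer type that holds the offset: -56 stored in an unsigned 32-bit value reads back as 2^32 - 56 = 4294967240. A minimal sketch, assuming a made-up formatting helper `formatSPOffset` (not the Hexagon backend's printing code):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical formatter: prints a stack-pointer-relative offset with an
// explicit sign, taking the offset as a signed 64-bit value.
std::string formatSPOffset(int64_t offset) {
  if (offset < 0)
    return "[SP" + std::to_string(offset) + "]";
  return "[SP+" + std::to_string(offset) + "]";
}

// Small usage check covering both the signed and the wrapped value.
void checkFormat() {
  // Kept signed, the offset prints as intended.
  assert(formatSPOffset(-56) == "[SP-56]");
  // Forced through an unsigned 32-bit type first, the same offset wraps.
  uint32_t wrapped = static_cast<uint32_t>(-56);
  assert(formatSPOffset(wrapped) == "[SP+4294967240]");
}
```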
*	[Hexagon] Handle stack realignment in hexagon-vextract	Krzysztof Parzyszek	2019-11-12	1	-7/+37
\|
*	[Hexagon] Require PS_aligna whenever variable-sized objects are present	Krzysztof Parzyszek	2019-11-12	1	-3/+3
\|
*	[PowerPC][NFC]Fix typo in desc for enable-ppc-prefetching	Jinsong Ji	2019-11-12	1	-1/+1
\|
*	[AArch64ExpandPseudos] Preserve renamable state when expanding MOVi64 & co.	Florian Hahn	2019-11-12	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \|	If the MOVi operand was renamable, the operands of the expanded instructions are also renamable. Reviewers: thegameg, samparker, zatrazz Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D70061
*	[X86] Update stale comment. NFC	Craig Topper	2019-11-11	1	-2/+2
\|
*	AMDGPU/SI: make ~SIScheduleBlockCreator trivial	Fangrui Song	2019-11-11	2	-6/+2
\|
*	[X86] Remove setOperationAction lines that say to promote MVT::i1	Craig Topper	2019-11-11	1	-6/+0
\| \| \| \| \| \| \| \|	MVT::i1 should be removed by type legalization before we reach any code that would act on the promote action. Mainly to avoid replicating this for strict FP versions of these operations.
*	[X86] Remove some else branches after checking for !useSoftFloat() that set operations to Expand.	Craig Topper	2019-11-11	1	-9/+0
\|	If we're using soft floats, then these operations should be softened during type legalization. They'll never get to LegalizeVectorOps or LegalizeDAG so they don't need to be Expanded there.
*	[PowerPC][XCOFF] Add support for zero initialized global values.	Sean Fertile	2019-11-11	1	-1/+1
\| \| \| \| \| \| \| \|	For XCOFF, globals mapped into the .bss section are linked as COMMON definitions. This behaviour is incorrect for zero initialized data, so emit those to the .data section instead. Differential Revision: https://reviews.llvm.org/D69528
*	[AArch64] Update for Exynos	Evandro Menezes	2019-11-11	1	-1/+1
\| \| \| \|	Fix the costs of FP register moves.
*	[AArch64] Add new scheduling predicates	Evandro Menezes	2019-11-11	2	-1/+74
\| \| \| \|	Add new scheduling predicates to identify more ASIMD forms.
*	[CGP] Make ICMP_EQ use CR result of ICMP_S(L\|G)T dominators	Yi-Hong Lyu	2019-11-11	1	-0/+4
\|	For example:
\|	    long long test(long long a, long long b) {
\|	      if (a << b > 0)
\|	        return b;
\|	      if (a << b < 0)
\|	        return a;
\|	      return a*b;
\|	    }
\|	Produces:
\|	    sld. 5, 3, 4
\|	    ble 0, .LBB0_2
\|	    mr 3, 4
\|	    blr
\|	    .LBB0_2: # %if.end
\|	    cmpldi 5, 0
\|	    li 5, 1
\|	    isel 4, 4, 5, 2
\|	    mulld 3, 4, 3
\|	    blr
\|	But the compare (cmpldi 5, 0) is redundant and can be removed (CR0 already contains the result of that comparison).
\|	The root cause of this is that LLVM converts signed comparisons into equality comparison based on dominance. Equality comparisons are unsigned by default, so we get either a record-form or cmp (without the l for logical) feeding a cmpl. That is the situation we want to avoid here.
\|	Differential Revision: https://reviews.llvm.org/D60506
*	[PowerPC] Implementing overflow version for XO-Form instructions	Stefan Pintile	2019-11-11	3	-68/+128
\|	The Overflow version of XO-Form instruction uses the SO, OV and OV32 special registers. This change modifies existing multiclasses and instruction definitions to allow for the use of the XER register to record the various types of overflow from possible add, subtract and multiply instructions. It then modifies the existing instructions to use these multiclasses as needed. Patch By: Kamau Bridgeman Differential Revision: https://reviews.llvm.org/D66902
*	AArch64FunctionInfo - fix uninitialized variable warnings. NFCI.	Simon Pilgrim	2019-11-11	1	-6/+6
\|
*	Fix -Wcovered-switch-default warning. NFCI.	Simon Pilgrim	2019-11-11	1	-2/+1
\|
*	Use MCRegister in copyPhysReg	Matt Arsenault	2019-11-11	44	-98/+98
\|
*	[AArch64][SVE] Spilling/filling of SVE callee-saves.	Sander de Smalen	2019-11-11	6	-39/+296
\|	Implement the spills/fills of callee-saved SVE registers using STR and LDR instructions. Also adds the `aarch64_sve_vector_pcs` attribute to specify the callee-saved registers to be used for functions that return SVE vectors or take SVE vectors as arguments. The callee-saved registers are vector registers z8-z23 and predicate registers p4-p15.
\|	The overall frame-layout with SVE will be as follows:
\|	    +-------------+
\|	    \| stack args  \|
\|	    +-------------+
\|	    \| Callee Saves\|
\|	    \|   X29, X30  \|
\|	    \|-------------\| <- FP
\|	    \| SVE Callee  \| < //////////////
\|	    \| saved regs  \| < //////////////
\|	    \|    z23      \| < //////////////
\|	    \|     :       \| < // SCALABLE //
\|	    \|    z8       \| < //////////////
\|	    \|    p15      \| < /// STACK ////
\|	    \|     :       \| < //////////////
\|	    \|    p4       \| < //// AREA ////
\|	    +-------------+ < //////////////
\|	    \|     :       \| < //////////////
\|	    \|  SVE locals \| < //////////////
\|	    \|     :       \| < //////////////
\|	    +-------------+
\|	    \|/////////////\| alignment gap.
\|	    \|     :       \|
\|	    \| Stack objs  \|
\|	    \|     :       \|
\|	    +-------------+ <- SP after call and frame-setup
\|	Reviewers: cameron.mcinally, efriedma, greened, thegameg, ostannard, rengolin Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D68996