bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[MC] Delete MCFragment::isDummy. NFC	Fangrui Song	2020-01-05	1	-1/+1
\| \| \| \| \|	isa<...>, dyn_cast<...> and cast<...> are used by other fragments. Don't make MCDummyFragment special.
*	[X86] Improve v2i64->v2f32 and v4i64->v4f32 uint_to_fp on avx and avx2 targets.	Craig Topper	2020-01-05	1	-24/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Based on Simon's D52965, but improved to handle strict fp and improve some of the shuffling. Rather than use v2i1/v4i1 and let type legalization continue, just generate all the code with legal types and use an explicit shuffle. I also added an explicit setcc to the v4i64 code to match the semantics of vselect which doesn't just use the sign bit. I'm also using a v4i64->v4i32 truncate instead of the shuffle in Simon's original code. With the setcc this will become a pack. Future work can look into using X86ISD::BLENDV and a different shuffle that only moves the sign bit. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71956
*	[NFC] Modify the format:	Liu, Chen3	2020-01-06	1	-2/+1
\| \| \| \|	Drop the else since we already returned in the if.
*	[Coroutines] Remove corresponding phi values when apply ↵	Brian Gesiak	2020-01-05	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	simplifyTerminatorLeadingToRet Summary: In addMustTailToCoroResumes, we set musttail on those resume instructions that are followed by a ret instruction. This is done by simplifyTerminatorLeadingToRet, which replaces a sequence of branches leading to a ret with a clone of the ret. However, it forgets to remove the corresponding PHI values that come from the basic block of the replaced branch, which may cause the jumpthreading pass to hang (https://bugs.llvm.org/show_bug.cgi?id=43720). This patch fixes the issue. Test Plan: cppcoro library with O3+flto check-llvm Reviewers: modocache, GorNishanov, lewissbaker Reviewed By: modocache Subscribers: mehdi_amini, EricWF, hiraditya, dexonsmith, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71826 Patch by junparser (JunMa)!
*	[MC][ARM] Delete MCSection::HasData and move SHF_ARM_PURECODE logic to ↵	Fangrui Song	2020-01-05	3	-10/+6
\| \| \| \| \| \| \| \|	ARMELFObjectWriter::addTargetSectionFlags This simplifies the generic interface and also makes SHF_ARM_PURECODE more robust (fixes a TODO). Inspecting MCDataFragment contents covers more cases than MCObjectStreamer::EmitBytes.
*	[MC] Delete MCSection::{rbegin,rend}	Fangrui Song	2020-01-05	1	-2/+2
\|
*	[MC] Drop an unused rule about absolute temporary symbols	Fangrui Song	2020-01-05	1	-4/+0
\|
*	[X86][SSE] Combine combineLogicBlendIntoConditionalNegate for VSELECT nodes ↵	Simon Pilgrim	2020-01-05	1	-2/+13
\| \| \| \| \| \| \| \|	(PR43660) Attempt to use combineLogicBlendIntoConditionalNegate for (select M, (sub 0, X), X) -> (sub (xor X, M), M) We limit this to cases that can't easily replace the VSELECT with a shuffle (non-constant masks) or where a BLENDV is likely to occur (which tends to result in slower codegen).
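The rewrite in the commit above can be checked by brute force. The following is a minimal sketch, not LLVM code: it models a single 8-bit lane in Python and assumes the mask `M` is either all-ones or all-zeros in that lane. The helper names (`select_form`, `xor_sub_form`, `folds_agree`) are made up for illustration.

```python
MASK8 = 0xFF  # model one 8-bit vector lane


def select_form(m, x):
    """(select M, (sub 0, X), X), assuming M is all-ones or all-zeros."""
    return (0 - x) & MASK8 if m == MASK8 else x


def xor_sub_form(m, x):
    """(sub (xor X, M), M) computed with 8-bit wraparound."""
    return ((x ^ m) - m) & MASK8


def folds_agree():
    """Compare both forms for every lane value and both mask states."""
    return all(
        select_form(m, x) == xor_sub_form(m, x)
        for m in (0x00, MASK8)
        for x in range(256)
    )
```

With an all-ones mask, `x ^ m` is the bitwise complement of `x`, and subtracting `m` (that is, -1) adds one, which yields the two's-complement negation. With a zero mask both operations leave `x` unchanged.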
*	[X86] Move combineLogicBlendIntoConditionalNegate before combineSelect. NFCI.	Simon Pilgrim	2020-01-05	1	-62/+62
\| \| \| \|	Updates function order in preparation of future fix for PR43660
*	[X86] Merge (identical) LowerGC_TRANSITION_START and LowerGC_TRANSITION_END ↵	Simon Pilgrim	2020-01-05	2	-27/+4
\| \| \| \| \| \|	(NFC) Silences a copy+paste analyzer warning - all they do is insert NOOPs in exactly the same way.
*	[ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors	David Green	2020-01-05	5	-18/+45
\| \| \| \| \| \| \| \| \| \| \|	This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing the target independent code to handle more folds in more situations (for example if the fast math flags are present, but the global AllowFPOpFusion option isn't). It also splits apart the HasSlowFPVMLx into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately if needed. Differential Revision: https://reviews.llvm.org/D72139
*	[ARM] Fill in FP16 FMA patterns	David Green	2020-01-05	1	-0/+21
\| \| \| \| \| \|	This adds fp16 variants of all the fma patterns in the ARM backend. Differential Revision: https://reviews.llvm.org/D72138
*	[LegalizeVectorOps][X86] Enable expansion of vector fp_to_uint in ↵	Craig Topper	2020-01-04	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	LegalizeVectorOps to avoid scalarization. The code here isn't great in all cases, particularly v4f64->v4i32 on 64-bit AVX targets. But there is some improvement in some configurations. There are definitely some issues with computeNumSignBits with X86ISD::STRICT_FCMP, as well as not being able to propagate sign bits through merge_values nodes that get created during custom legalization.
*	[TargetLowering] In expandFP_TO_UINT, add proper extend or truncate for the ↵	Craig Topper	2020-01-04	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	condition to feed the DstVT select. Previously, for vectors we created a vselect with a condition that didn't match what the target wanted according to getSetCCResultType. To make up for this, X86 had a special DAG combine to detect if the condition was all sign bits and then insert its own truncate or extend. By adding the extend/truncate here explicitly we can avoid that.
*	[LegalizeVectorOps] Split most of ExpandStrictFPOp into a separate ↵	Craig Topper	2020-01-04	1	-6/+13
\| \| \| \| \| \| \| \| \| \| \|	UnrollStrictFPOp method. Call that method from ExpandUINT_TO_FLOAT. ExpandStrictFPOp calls ExpandUINT_TO_FLOAT. Previously, ExpandUINT_TO_FLOAT returned SDValue() if it wasn't able to handle the node and needed to unroll. Then ExpandStrictFPOp would detect this SDValue() and do the unroll. After this change, ExpandUINT_TO_FLOAT will directly call UnrollStrictFPOp and return the unrolled result.
*	GlobalISel: Scalarize all division operations	Matt Arsenault	2020-01-04	2	-0/+10
\| \| \| \| \| \|	This only handled G_SDIV, but they all are trivially scalarizable. Also define placeholder AMDGPU division legalizer rules.
*	Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)."	Florian Hahn	2020-01-04	20	-19/+19
\| \| \| \| \|	This reverts commit 51ef53f3bd23559203fe9af82ff2facbfedc1db3, as it breaks some bots.
*	[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).	Florian Hahn	2020-01-04	20	-19/+19
\| \| \| \| \| \| \| \| \| \| \| \|	SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537
*	[SCEV] Remove unused ScalarEvolutionExpander.h includes (NFC).	Florian Hahn	2020-01-04	4	-4/+0
\|
*	GlobalISel: Define G_READCYCLECOUNTER	Matt Arsenault	2020-01-04	1	-0/+2
\|
*	AMDGPU/GlobalISel: Refine SMRD selection rules	Matt Arsenault	2020-01-04	1	-4/+22
\| \| \| \| \|	Fix selecting these for volatile global loads, and ensure the loads are constant enough.
*	AMDGPU/GlobalISel: Legalize more odd sized loads	Matt Arsenault	2020-01-04	1	-5/+9
\| \| \| \| \|	The attempts to widen sufficiently aligned, odd sized loads weren't consistently applied.
*	AMDGPU/GlobalISel: Assume vcc phis for any vcc input	Matt Arsenault	2020-01-04	1	-2/+3
\| \| \| \| \| \| \|	This produces more intelligible looking results, more comparable to the DAG output in the simplest cases. This is probably wrong in complex control flow, but RegBankSelect doesn't attempt analyzing if this is on a masked path for selecting the bank yet.
*	AMDGPU/GlobalISel: Implement applyMappingImpl less incorrectly	Matt Arsenault	2020-01-04	1	-13/+23
\| \| \| \| \| \| \| \| \| \| \|	We're checking the current register bank of the registers in the instruction, but the mapping may have inserted cross bank copies and is expecting to replace the registers. We mostly get away with this currently, because VGPR->SGPR copies are illegal, and we assume this won't happen. In a future change, we'll start relying on more cross register bank copies being inserted, and this starts to break down.
*	[AMDGPU] need to insert wait between the scalar load and vector store to the ↵	alex-t	2020-01-04	1	-0/+21
\| \| \| \| \| \| \| \| \| \|	same address to avoid WAR conflict. Reviewers: rampitec, vpykhtin, nhaehnle Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D71934
*	[NFCI][InstCombine] Refactor 'sink negation into select if that folds one ↵	Roman Lebedev	2020-01-04	1	-40/+35
\| \| \| \| \| \| \| \| \| \|	hand of select to 0' fold I would think it's better than having two practically identical folds next to each other, but then generalization isn't all that pretty due to the fact that we need to produce a different `sub` each time. This change is a no-functional-changes-intended refactoring.
*	[InstCombine] Sink sub into hands of select if one hand becomes zero. Part 2 ↵	Roman Lebedev	2020-01-04	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(PR44426) This decreases use count of %Op0, makes one hand of select to be 0, and possibly exposes further folding potential. Name: sub %Op0, (select %Cond, %Op0, %FalseVal) -> select %Cond, 0, (sub %Op0, %FalseVal) %Op0 = %TrueVal %o = select i1 %Cond, i8 %Op0, i8 %FalseVal %r = sub i8 %Op0, %o => %n = sub i8 %Op0, %FalseVal %r = select i1 %Cond, i8 0, i8 %n Name: sub %Op0, (select %Cond, %TrueVal, %Op0) -> select %Cond, (sub %Op0, %TrueVal), 0 %Op0 = %FalseVal %o = select i1 %Cond, i8 %TrueVal, i8 %Op0 %r = sub i8 %Op0, %o => %n = sub i8 %Op0, %TrueVal %r = select i1 %Cond, i8 %n, i8 0 https://rise4fun.com/Alive/aHRt https://bugs.llvm.org/show_bug.cgi?id=44426
*	[InstCombine] Sink sub into hands of select if one hand becomes zero (PR44426)	Roman Lebedev	2020-01-04	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This decreases use count of %Op1, makes one hand of select to be 0, and possibly exposes further folding potential. Name: sub (select %Cond, %Op1, %FalseVal), %Op1 -> select %Cond, 0, (sub %FalseVal, %Op1) %Op1 = %TrueVal %o = select i1 %Cond, i8 %Op1, i8 %FalseVal %r = sub i8 %o, %Op1 => %n = sub i8 %FalseVal, %Op1 %r = select i1 %Cond, i8 0, i8 %n Name: sub (select %Cond, %TrueVal, %Op1), %Op1 -> select %Cond, (sub %TrueVal, %Op1), 0 %Op1 = %FalseVal %o = select i1 %Cond, i8 %TrueVal, i8 %Op1 %r = sub i8 %o, %Op1 => %n = sub i8 %TrueVal, %Op1 %r = select i1 %Cond, i8 %n, i8 0 https://rise4fun.com/Alive/avL https://bugs.llvm.org/show_bug.cgi?id=44426
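The four folds listed in this commit and its Part 2 follow-up can be checked exhaustively for `i8`. This is a minimal sketch in Python rather than an Alive proof; `sub8`, `select`, and `check_all` are hypothetical helper names, and 8-bit wraparound is modelled with a mask.

```python
from itertools import product

MASK8 = 0xFF  # model i8 arithmetic


def sub8(a, b):
    """i8 subtraction with two's-complement wraparound."""
    return (a - b) & MASK8


def select(cond, true_val, false_val):
    """Model of the IR select instruction."""
    return true_val if cond else false_val


def check_all():
    """Return True if all four sub/select folds hold for every i8 input."""
    for cond, op, other in product((False, True), range(256), range(256)):
        # sub (select Cond, Op1, FalseVal), Op1 -> select Cond, 0, (sub FalseVal, Op1)
        if sub8(select(cond, op, other), op) != select(cond, 0, sub8(other, op)):
            return False
        # sub (select Cond, TrueVal, Op1), Op1 -> select Cond, (sub TrueVal, Op1), 0
        if sub8(select(cond, other, op), op) != select(cond, sub8(other, op), 0):
            return False
        # sub Op0, (select Cond, Op0, FalseVal) -> select Cond, 0, (sub Op0, FalseVal)
        if sub8(op, select(cond, op, other)) != select(cond, 0, sub8(op, other)):
            return False
        # sub Op0, (select Cond, TrueVal, Op0) -> select Cond, (sub Op0, TrueVal), 0
        if sub8(op, select(cond, other, op)) != select(cond, sub8(op, other), 0):
            return False
    return True
```

In each case the hand of the select that equals the other `sub` operand cancels to zero, which is why the shared operand loses one use.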
*	[Transforms][GlobalSRA] huge array causes long compilation time and huge ↵	Alexey Lapshin	2020-01-04	1	-65/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	memory usage. Summary: For artificial cases (huge array, few usages), Global SRA optimization creates a lot of redundant data. It creates an instance of GlobalVariable for each array element. For a huge array, that means huge compilation time and huge memory usage. The following example compiles for 10 minutes and requires 40GB of memory. namespace { char LargeBuffer[64 * 1024 * 1024]; } int main ( void ) { LargeBuffer[0] = 0; printf("\n "); return LargeBuffer[0] == 0; } The fix is to avoid Global SRA for large arrays. Reviewers: craig.topper, rnk, efriedma, fhahn Reviewed By: rnk Subscribers: xbolva00, lebedev.ri, lkail, merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71993
*	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits ↵	Simon Pilgrim	2020-01-04	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	for ISD::EXTRACT_VECTOR_ELT (REAPPLIED) This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. In particular this helps remove some unnecessary scalar->vector->scalar patterns. The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen is to be refactored in the near term, so this isn't considered a major issue. Reapplied after reversion at rL368660 due to PR42982 which was fixed at rGca7fdd41bda0. Differential Revision: https://reviews.llvm.org/D65887
*	[AMDGPU] Revert scheduling to reduce spilling	Stanislav Mekhanoshin	2020-01-03	1	-2/+11
\| \| \| \| \| \| \| \| \| \|	We can revert a region schedule if the new schedule decreases occupancy. However, if we already have only one wave we would accept any new schedule even if it blows up register pressure. Such a schedule may result in quite heavy spilling which can be avoided if we reject this new schedule. Differential Revision: https://reviews.llvm.org/D72181
*	GlobalISel: Add type argument to getRegBankFromRegClass	Matt Arsenault	2020-01-03	11	-27/+38
\| \| \| \| \| \|	AMDGPU can't unambiguously go back from the selected instruction register class to the register bank without knowing if this was used in a boolean context.
*	[amdgpu] Skip non-instruction values in CF user tracing.	Michael Liao	2020-01-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: - CF users won't be non-instruction values. Skip them to save compilation time. It's especially true when there are multiple functions in that module, where, say, a constant may be used in most functions. The current CF user tracing adds significant overhead. Reviewers: alex-t, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72174
*	[NFC][ORC] Fix typos and whitespaces in comments	Stefan Gränitz	2020-01-03	1	-1/+1
\|
*	[SystemZ] Don't allow CL option -mpacked-stack with -mbackchain.	Jonas Paulsson	2020-01-03	1	-0/+2
\| \| \| \| \| \| \|	-mpacked-stack is currently not supported with -mbackchain, so this should result in a compilation error message instead of being silently ignored. Review: Ulrich Weigand
*	AMDGPU/GlobalISel: Add new utils file	Matt Arsenault	2020-01-03	4	-33/+77
\| \| \| \| \| \|	There are some things that are shareable between the legalizer, regbankselect, and the selector that don't have an obvious place to go.
*	AMDGPU: Only allow regs for s_movrel_{b32\|b64}	Matt Arsenault	2020-01-03	1	-2/+13
\| \| \| \| \|	This would incorrectly allow folding immediates. These currently aren't selectable, but will be from GlobalISel soon.
*	[DAGCombiner] fix miscompile in translating (X & undef) to shuffle	Sanjay Patel	2020-01-03	1	-1/+3
\| \| \| \| \|	See PR42982 for more context: https://bugs.llvm.org/show_bug.cgi?id=42982
*	[LegalizeVectorOps] Pass the post-UpdateNodeOperands version of Op to ↵	Craig Topper	2020-01-03	1	-11/+14
\| \| \| \| \| \| \| \| \| \|	ExpandLoad/ExpandStore UpdateNodeOperands might CSE to another existing node. So we should make sure we're legalizing that node otherwise we might fail to hook up the operands properly. I've moved the result registration up to the caller to avoid having to pass both Result and Op into the functions where it might be confusing which is which. This addresses 2 other issues pointed out in D71861. Differential Revision: https://reviews.llvm.org/D72021
*	[X86] Improve for v2i32->v2f64 uint_to_fp	Craig Topper	2020-01-03	1	-36/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This uses an alternative implementation of this conversion derived from our v2i32->v2f32 handling. We can zero extend the v2i32 to v2i64, OR it with the bit representation of 2.0^52 which will give us 2.0^52 plus the 32-bit integer since double's mantissa is 52 bits. Then we just need to subtract 2.0^52 as a double and let the floating point unit normalize the remaining bits into a valid double. This is fewer instructions than our previous code, but does require a port 5 shuffle for the zero extend or unpack. Differential Revision: https://reviews.llvm.org/D71945
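The bit trick described in this commit message can be illustrated on a single scalar element. The sketch below is an illustration in Python, not the X86 lowering itself; `u32_to_f64` and `TWO_POW_52_BITS` are names invented for this example, and `struct` stands in for the bitcast between integer and double.

```python
import struct

# IEEE-754 binary64 bit pattern of 2.0**52: biased exponent 0x433, zero mantissa.
TWO_POW_52_BITS = 0x4330000000000000


def u32_to_f64(value):
    """Convert an unsigned 32-bit integer to a double via the 2**52 trick."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 unsigned bits")
    # The zero-extended integer occupies the low mantissa bits, so the
    # OR produces the double 2**52 + value exactly.
    bits = TWO_POW_52_BITS | value
    biased = struct.unpack("<d", struct.pack("<Q", bits))[0]
    # Subtracting 2**52 leaves the integer value, normalised by the FPU.
    return biased - 2.0 ** 52
```

The result is exact because at exponent 52 the unit in the last place of a double is 1, so every 32-bit integer fits in the mantissa without rounding.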
*	Move tail call disabling code to target independent code	Reid Kleckner	2020-01-03	10	-43/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118
*	[NFC][InstCombine] '(Op1 & C) - Op1' -> '-(Op1 & ~C)' fold (PR44427)	Roman Lebedev	2020-01-03	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This decreases use count of Op1, potentially allows us to further hoist said 'neg' later on, and results in marginally better X86 codegen. Name: (Op1 & C) - Op1 -> -(Op1 & ~C) %o = and i64 %Op1, C1 %r = sub i64 %o, %Op1 => %n = and i64 %Op1, ~C1 %r = sub i64 0, %n https://rise4fun.com/Alive/rwgA https://godbolt.org/z/R_RMfM https://bugs.llvm.org/show_bug.cgi?id=44427
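The fold holds because `Op1` splits into the bits kept by the mask plus the bits it clears, so subtracting `Op1` from the kept bits leaves the negated cleared bits. A minimal sketch that checks this exhaustively, assuming an 8-bit width instead of the `i64` used in the commit; the function names are hypothetical.

```python
MASK8 = 0xFF  # model the fold at 8 bits; the commit itself uses i64


def and_sub_form(op1, c):
    """(Op1 & C) - Op1 with 8-bit wraparound."""
    return ((op1 & c) - op1) & MASK8


def neg_and_form(op1, c):
    """-(Op1 & ~C) with 8-bit wraparound."""
    return (-(op1 & (~c & MASK8))) & MASK8


def fold_holds():
    """Compare both forms for every 8-bit Op1 and every 8-bit constant C."""
    return all(
        and_sub_form(op1, c) == neg_and_form(op1, c)
        for op1 in range(256)
        for c in range(256)
    )
```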
*	[DWARF] Don't assume optional always has a value.	Jonas Devlieghere	2020-01-03	1	-1/+4
\| \| \| \| \| \| \| \|	When getting the file name from the line table prologue we assume that a valid string form value can always be extracted as a string. If you look at the implementation of DWARFFormValue this is not necessarily true. I hit this assertion from LLDB when I created a "dummy" DWARFContext that was missing the string section.
*	[NFC][InstCombine] '(X & (- Y)) - X' -> '- (X & (Y - 1))' fold (PR44448)	Roman Lebedev	2020-01-03	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Name: (X & (- Y)) - X -> - (X & (Y - 1)) (PR44448) %negy = sub i8 0, %y %unbiasedx = and i8 %negy, %x %r = sub i8 %unbiasedx, %x => %ymask = add i8 %y, -1 %xmasked = and i8 %ymask, %x %r = sub i8 0, %xmasked https://rise4fun.com/Alive/OIpla This decreases use count of %x, may allow us to later hoist said negation even further, and results in marginally nicer X86 codegen. See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499
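This fold relies on the two's-complement fact that `-Y` equals `~(Y - 1)`, so `X & -Y` keeps exactly the bits that `X & (Y - 1)` drops. The sketch below checks both the fact and the fold exhaustively for `i8`. It is an illustration in Python under that width assumption, with made-up function names.

```python
MASK8 = 0xFF  # model i8 arithmetic


def neg8(v):
    """Two's-complement negation at 8 bits."""
    return (-v) & MASK8


def original_form(x, y):
    """(X & (-Y)) - X."""
    return ((x & neg8(y)) - x) & MASK8


def folded_form(x, y):
    """-(X & (Y - 1))."""
    return neg8(x & ((y - 1) & MASK8))


def negation_is_complement_of_decrement():
    """-Y == ~(Y - 1) for every 8-bit Y."""
    return all(neg8(y) == (~(y - 1)) & MASK8 for y in range(256))


def fold_holds():
    """Compare both forms for every pair of 8-bit inputs."""
    return all(
        original_form(x, y) == folded_form(x, y)
        for x in range(256)
        for y in range(256)
    )
```

Note that the check does not require `Y` to be a power of two; the identity holds for every value of `Y`.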
*	[Attributor][FIX] Allow dead users of rewritten function	Johannes Doerfert	2020-01-03	1	-13/+15
\| \| \| \| \| \| \|	If we replace a function with a new one because we rewrite the signature, dead users may still refer to the old version. With this patch we reuse the code that deals with dead functions, which the old versions are, to avoid problems.
*	[Attributor][NFC] Unify the way we delete dead functions	Johannes Doerfert	2020-01-03	1	-13/+14
\|
*	[Attributor][FIX] Don't crash on ptr2int/int2ptr instructions	Johannes Doerfert	2020-01-03	1	-1/+2
\| \| \| \| \|	An integer isn't allowed in getAlignmentForValue so we need to stop at a ptr2int instruction during exploration.
*	[Attributor][FIX] Do not derive nonnull and dereferenceable w/o access	Johannes Doerfert	2020-01-03	1	-14/+0
\| \| \| \| \| \| \|	An inbounds GEP results in poison if the value is not "inbounds", not in UB. We accidentally derived nonnull and dereferenceable from these inbounds GEPs even in the absence of accesses that would turn the poison into UB.
*	[Attributor][FIX] Return CHANGED once a pessimistic fixpoint is reached.	Johannes Doerfert	2020-01-03	1	-1/+2
\|
*	AMDGPU/GlobalISel: Fix off by one in operand index	Matt Arsenault	2020-01-03	1	-4/+4
\| \| \| \|	This should be looking at the RHS of the add for a constant.