path: root/llvm/lib
Commit message (Author, Age; Files, Lines changed)
...
* AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane (Matt Arsenault, 2019-07-01; 2 files, -0/+82)
  llvm-svn: 364801
* AMDGPU/GlobalISel: Implement select for 32-bit G_ADD (Tom Stellard, 2019-07-01; 2 files, -2/+7)
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58804
  llvm-svn: 364797
* [ARM] Fix MVE_VQxDMLxDH instruction class (Mikhail Maltsev, 2019-07-01; 1 file, -6/+9)
  Summary:
  According to the ARMARM, the VQDMLADH, VQRDMLADH, VQDMLSDH and VQRDMLSDH
  instructions handle their results as follows: "The base variant writes the
  results into the lower element of each pair of elements in the destination
  register, whereas the exchange variant writes to the upper element in each
  pair". I.e., the initial content of the output register affects the result;
  as usual, we model this with an additional input.

  Also, for 32-bit variants Qd is not allowed to be the same register as Qm
  and Qn; we use @earlyclobber to indicate this.

  This patch also changes vpred_r to vpred_n because the instructions don't
  have an explicit 'inactive' operand.

  Reviewers: dmgreen, ostannard, simon_tatham
  Reviewed By: simon_tatham
  Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64007
  llvm-svn: 364796
* AMDGPU/GlobalISel: Select G_BRCOND for vcc (Matt Arsenault, 2019-07-01; 2 files, -25/+44)
  llvm-svn: 364795
* [ARM] MVE: support QQPRRegClass and QQQQPRRegClass (Mikhail Maltsev, 2019-07-01; 1 file, -2/+3)
  Summary:
  QQPRRegClass and QQQQPRRegClass are used by the interleaving/deinterleaving
  loads/stores to represent sequences of consecutive SIMD registers.

  Reviewers: ostannard, simon_tatham, dmgreen
  Reviewed By: simon_tatham
  Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64009
  llvm-svn: 364794
* [InstCombine] (Y + ~X) + 1 --> Y - X fold (PR42459) (Roman Lebedev, 2019-07-01; 1 file, -1/+4)
  Summary:
  To be noted, this pattern is not unhandled by instcombine per se: it does
  end up being folded when one runs opt -O3, but not with just -instcombine.
  Regardless, that fold is indirect, depends on some other folds, and is thus
  blind when there are extra uses.

  This does address the regression being exposed in D63992.

  https://godbolt.org/z/7DGltU
  https://rise4fun.com/Alive/EPO0

  Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42459 | PR42459 ]]

  Reviewers: spatel, nikic, huihuiz
  Reviewed By: spatel
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63993
  llvm-svn: 364792
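  A minimal IR sketch of the fold (function names are hypothetical; the
  before/after shapes follow the pattern in the title):

      define i32 @src(i32 %x, i32 %y) {
        %not = xor i32 %x, -1     ; ~X
        %t = add i32 %y, %not     ; Y + ~X
        %r = add i32 %t, 1        ; (Y + ~X) + 1
        ret i32 %r
      }

      ; After the fold, since ~X == -X - 1:
      define i32 @tgt(i32 %x, i32 %y) {
        %r = sub i32 %y, %x       ; Y - X
        ret i32 %r
      }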
* [InstCombine] Shift amount reassociation in bittest (PR42399) (Roman Lebedev, 2019-07-01; 1 file, -0/+60)
  Summary:
  Given the pattern
    icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0
  we should move the shifts to the same hand of 'and', i.e. rewrite it as
    icmp eq/ne (and (x shift (Q+K)), y), 0
  iff (Q+K) u< bitwidth(x).

  It might be tempting to not restrict this to situations where we know we'd
  fold the two shifts together, but I'm not sure what the rules should be to
  avoid endless combine loops. We pick the same shift that was originally
  used to shift the variable we picked to shift:
  https://rise4fun.com/Alive/6x1v

  Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42399 | PR42399 ]].

  Reviewers: spatel, nikic, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63829
  llvm-svn: 364791
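  A concrete IR sketch of the rewrite (hypothetical function names; here
  Q=1, K=2, and 1 + 2 = 3 u< 32, so the fold applies):

      ; Before: icmp eq (and (shl %x, 1), (lshr %y, 2)), 0
      define i1 @src(i32 %x, i32 %y) {
        %t0 = shl i32 %x, 1
        %t1 = lshr i32 %y, 2
        %t2 = and i32 %t0, %t1
        %r = icmp eq i32 %t2, 0
        ret i1 %r
      }

      ; After: both shift amounts move onto %x.
      define i1 @tgt(i32 %x, i32 %y) {
        %t0 = shl i32 %x, 3
        %t1 = and i32 %t0, %y
        %r = icmp eq i32 %t1, 0
        ret i1 %r
      }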
* [Hexagon] Custom-lower UADDO(x, 1) and USUBO(x, 1) (Krzysztof Parzyszek, 2019-07-01; 2 files, -2/+42)
  llvm-svn: 364790
* AMDGPU/GlobalISel: Select G_FRAME_INDEX (Matt Arsenault, 2019-07-01; 2 files, -0/+19)
  llvm-svn: 364789
* AMDGPU/GFX10: fix scratch resource descriptor (Nicolai Haehnle, 2019-07-01; 1 file, -2/+2)
  Summary:
  The stride should depend on the wave size, not the hardware generation.
  Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be
  relevant.

  Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6
  Reviewers: arsenm, rampitec, mareko
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63808
  llvm-svn: 364788
* AMDGPU/GlobalISel: Make s16 select legal (Matt Arsenault, 2019-07-01; 2 files, -7/+9)
  This is easy to handle and avoids legalization artifacts which are likely
  to obscure combines.

  llvm-svn: 364787
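  For reference, a sketch of the IR shape that now maps to a legal s16
  G_SELECT (hypothetical function):

      define i16 @sel16(i1 %c, i16 %a, i16 %b) {
        %r = select i1 %c, i16 %a, i16 %b
        ret i16 %r
      }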
* AMDGPU/GlobalISel: Select G_BRCOND for scc conditions (Matt Arsenault, 2019-07-01; 2 files, -0/+34)
  llvm-svn: 364786
* AMDGPU/GlobalISel: Tolerate copies with no type set (Matt Arsenault, 2019-07-01; 1 file, -3/+6)
  isVCC has the same bug, but isn't used in a context where it can cause a
  problem.

  llvm-svn: 364784
* AMDGPU/GlobalISel: Select src modifiers (Matt Arsenault, 2019-07-01; 2 files, -6/+43)
  llvm-svn: 364782
* Fixup r364512 (Diana Picus, 2019-07-01; 1 file, -10/+12)
  Fix stack-use-after-scope errors from r364512. One instance was already
  fixed in r364611 - this patch simplifies that fix and addresses one more
  instance of similar code.

  Discussed in: https://reviews.llvm.org/D63905

  llvm-svn: 364778
* [Hexagon] Rework VLCR algorithm (Krzysztof Parzyszek, 2019-07-01; 1 file, -59/+161)
  Add code to catch the pattern for commutative instructions in VLCR.

  Patch by Suyog Sarda.

  llvm-svn: 364770
* AMDGPU: Convert some places to Register (Matt Arsenault, 2019-07-01; 2 files, -9/+10)
  llvm-svn: 364769
* AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE (Matt Arsenault, 2019-07-01; 1 file, -0/+1)
  llvm-svn: 364768
* AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR (Matt Arsenault, 2019-07-01; 1 file, -1/+2)
  llvm-svn: 364767
* AMDGPU/GlobalISel: Fail on store to 32-bit address space (Matt Arsenault, 2019-07-01; 1 file, -0/+6)
  llvm-svn: 364766
* AMDGPU/GlobalISel: Improve icmp selection coverage (Matt Arsenault, 2019-07-01; 2 files, -13/+38)
  Select s64 eq/ne scalar icmp.

  llvm-svn: 364765
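  The newly covered case, as an IR sketch (hypothetical function; an eq/ne
  compare of 64-bit scalars):

      define i1 @cmp64(i64 %a, i64 %b) {
        %c = icmp eq i64 %a, %b
        ret i1 %c
      }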
* AMDGPU/GlobalISel: RegBankSelect for WWM/WQM (Matt Arsenault, 2019-07-01; 1 file, -0/+2)
  llvm-svn: 364763
* AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote (Matt Arsenault, 2019-07-01; 1 file, -1/+1)
  llvm-svn: 364762
* AMDGPU/GlobalISel: Fix scc->vcc copy handling (Matt Arsenault, 2019-07-01; 2 files, -13/+23)
  This was checking the size of the register with the value of the size,
  which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32.

  Also remove some untested handling for physical registers, which is
  skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy
  source. I'm not sure if this should be trying to handle this special case
  instead of dealing with this in copyPhysReg.

  llvm-svn: 364761
* AMDGPU/GlobalISel: Use and instead of BFE with inline immediate (Matt Arsenault, 2019-07-01; 1 file, -6/+29)
  Zext from s1 is the only case where this should do anything with the
  current legal extensions.

  llvm-svn: 364760
* [mips] Add missing schedinfo for MSA and ASE instructions (Simon Atanasyan, 2019-07-01; 3 files, -2/+12)
  llvm-svn: 364757
* [mips] Add missing schedinfo for atomic instructions (Simon Atanasyan, 2019-07-01; 2 files, -3/+22)
  llvm-svn: 364756
* [mips] Add missing schedinfo for ADJCALLSTACKDOWN, ADJCALLSTACKUP (Simon Atanasyan, 2019-07-01; 1 file, -1/+1)
  llvm-svn: 364755
* [AMDGPU] Call isLoopExiting for blocks in the loop (Florian Hahn, 2019-07-01; 1 file, -2/+4)
  isLoopExiting should only be called for blocks in the loop. A follow-up
  patch makes this requirement an assertion.

  I've updated the usage here to only match for actual exit blocks.
  Previously, it would also match blocks not in the loop.

  Reviewers: arsenm, nhaehnle
  Reviewed By: nhaehnle
  Differential Revision: https://reviews.llvm.org/D63980

  llvm-svn: 364750
* [RISCV] Add break; to the last switch case (Fangrui Song, 2019-07-01; 1 file, -0/+1)
  As suggested by jrtc27 in the post-commit review of D60528.

  llvm-svn: 364746
* [X86] CombineShuffleWithExtract - updated description comments. NFCI. (Simon Pilgrim, 2019-07-01; 1 file, -4/+4)
  CombineShuffleWithExtract no longer requires that both shuffle ops are
  extract_subvectors of the same type or the same size.

  llvm-svn: 364745
* [SelectionDAG] Do minnum->minimum at legalization time instead of building time (Benjamin Kramer, 2019-07-01; 2 files, -16/+17)
  The SDAGBuilder behavior stems from the days when we didn't have fast math
  flags available in SDAG. We do now, and doing the transformation in the
  legalizer has the advantage that it also works for vector types.

  llvm-svn: 364743
* [InstCombine] Omit 'urem' where possible (Roman Lebedev, 2019-07-01; 1 file, -4/+20)
  This was added to the backend in D63390 / rL364286, but it makes sense to
  also handle it in the middle-end.
  https://rise4fun.com/Alive/Zsln

  llvm-svn: 364738
* [DebugInfo] Avoid adding too much indirection to pointer-valued variables (Jeremy Morse, 2019-07-01; 2 files, -2/+32)
  This patch addresses PR41675, where a stack-pointer variable is
  dereferenced too many times by its location expression, presenting a value
  on the stack as the pointer to the stack.

  The difference between a stack *pointer* DBG_VALUE and one that refers to
  a value on the stack is currently the indirect flag. However, the DWARF
  backend will also try to guess whether something is a memory location or
  not, based on whether there is any computation in the location expression.
  By simply prepending the stack offset to existing expressions, we can
  accidentally convert a register location into a memory location, which
  introduces a surprise (and unintended) dereference.

  The solution is to add DW_OP_stack_value whenever we add a DIExpression
  computation to a stack *pointer*. It's an implicit location computed on
  the expression stack, thus it needs to be flagged as a stack_value.

  For the edge case where the offset is zero and the location could be a
  register location, DIExpression::prepend will still generate opcodes, and
  thus DW_OP_stack_value must still be added.

  Differential Revision: https://reviews.llvm.org/D63429

  llvm-svn: 364736
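  As an illustrative sketch (DIExpression fragments only, not a complete
  debug-info module; the offset 16 is made up):

      ; Prepending an offset to a register-located pointer. Read as a memory
      ; location, this dereferences the computed address once too often:
      ;   !DIExpression(DW_OP_plus_uconst, 16)
      ; Flagging the computed pointer as an implicit value avoids that:
      ;   !DIExpression(DW_OP_plus_uconst, 16, DW_OP_stack_value)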
* [SimpleLoopUnswitch] Implement handling of prof branch_weights metadata for SwitchInst (Yevgeny Rouban, 2019-07-01; 1 file, -17/+39)
  Differential Revision: https://reviews.llvm.org/D60606

  llvm-svn: 364734
* [ARM] WLS/LE Code Generation (Sam Parker, 2019-07-01; 8 files, -28/+163)
  Backend changes to enable WLS/LE low-overhead loops for armv8.1-m:
  1) Use TTI to communicate to the HardwareLoop pass that we should try to
     generate intrinsics that guard the loop entry, as well as setting the
     loop trip count.
  2) Lower the BRCOND that uses said intrinsic to an Arm specific node:
     ARMWLS.
  3) ISelDAGToDAG the node to a new pseudo instruction: t2WhileLoopStart.
  4) Add support in ArmLowOverheadLoops to handle the new pseudo instruction.

  Differential Revision: https://reviews.llvm.org/D63816

  llvm-svn: 364733
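  A hedged IR sketch of the guarded loop shape the HardwareLoop pass aims
  for. The intrinsic names below are the hardware-loop intrinsics as I
  recall them; the exact mangling suffixes and operands are assumptions and
  may differ by LLVM version:

      define void @count(i32 %n) {
      entry:
        ; Guards the loop entry and sets the trip count in one step; the
        ; branch on %guard is what later becomes t2WhileLoopStart.
        %guard = call i1 @llvm.test.set.loop.iterations.i32(i32 %n)
        br i1 %guard, label %loop, label %exit

      loop:
        ; ... loop body ...
        %cont = call i1 @llvm.loop.decrement.i32(i32 1)
        br i1 %cont, label %loop, label %exit

      exit:
        ret void
      }

      declare i1 @llvm.test.set.loop.iterations.i32(i32)
      declare i1 @llvm.loop.decrement.i32(i32)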
* [X86] Improve the type checking fast-isel handling of vector bitcasts (Craig Topper, 2019-07-01; 1 file, -13/+8)
  We had a bunch of vector size legality checks for the source type based on
  feature flags, but we didn't check the destination type at all beyond
  ensuring that it was a "simple" type. But this allowed the destination to
  be i128, which isn't legal.

  This commit changes the code to use TLI's isTypeLegal logic in place of
  all the subtarget checks. Then it additionally checks that the source and
  dest are vectors.

  Fixes PR42452.

  llvm-svn: 364729
* [X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload (Craig Topper, 2019-07-01; 2 files, -0/+44)
  But only when the load isn't volatile.

  This improves load folding during isel, where we only have vzload and
  scalar_to_vector+load patterns. We can't have full vector load isel
  patterns for the same volatile load issue.

  Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns.

  llvm-svn: 364728
* [X86] Add MOVHPDrm/MOVLPDrm patterns that use VZEXT_LOAD (Craig Topper, 2019-07-01; 2 files, -0/+18)
  We already had patterns that used scalar_to_vector+load. But we can also
  have a vzload.

  Found while investigating combining scalar_to_vector+load to vzload.

  llvm-svn: 364726
* [InstCombine] canonicalize fcmp+select to minnum/maxnum intrinsics (Sanjay Patel, 2019-06-30; 1 file, -0/+13)
  This is the opposite direction of D62158 (we have to choose one form or
  the other). Now that we have FMF on the select, this becomes more
  palatable. And the benefits of having a single IR instruction for this
  operation (fewer chances of missing folds based on extra uses, etc.)
  overcome my previous comments about the potential advantage of larger
  pattern matching/analysis.

  Differential Revision: https://reviews.llvm.org/D62414

  llvm-svn: 364721
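  A minimal sketch of the canonicalization, assuming it fires when the
  required fast-math flags (here nnan nsz) are present on the select:

      define double @fmin_like(double %a, double %b) {
        %cmp = fcmp nnan nsz olt double %a, %b
        %r = select nnan nsz i1 %cmp, double %a, double %b
        ret double %r
      }

      ; Canonical form after this change (sketch):
      ;   %r = call nnan nsz double @llvm.minnum.f64(double %a, double %b)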
* Cleanup: llvm::bsearch -> llvm::partition_point after r364719 (Fangrui Song, 2019-06-30; 13 files, -42/+37)
  llvm-svn: 364720
* [X86] Custom lower AVX masked loads to masked load and vselect instead of selecting a maskmov+vblend during isel (Craig Topper, 2019-06-30; 2 files, -16/+29)
  AVX masked loads only support 0 as the value for masked off elements. So
  we need an extra blend to support other values. Previously we expanded the
  masked load to two instructions with isel patterns. With this patch we now
  insert the vselect during lowering and it will be separately selected as a
  blend.

  llvm-svn: 364718
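  Conceptually, an IR-level sketch of the split (the real change operates on
  SelectionDAG nodes during lowering; types and alignment here are made up):

      declare <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>*, i32, <8 x i1>, <8 x float>)

      ; A masked load with a non-zero passthru becomes a zero-passthru
      ; masked load plus a separate blend.
      define <8 x float> @split(<8 x float>* %p, <8 x i1> %m, <8 x float> %passthru) {
        %zeroed = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(
                    <8 x float>* %p, i32 32, <8 x i1> %m, <8 x float> zeroinitializer)
        %blend = select <8 x i1> %m, <8 x float> %zeroed, <8 x float> %passthru
        ret <8 x float> %blend
      }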
* [SelectionDAG] Use the memory VT instead of result VT for FoldingSet profiling in getMaskedLoad/getMaskedStore (Craig Topper, 2019-06-30; 1 file, -3/+2)
  This matches what is done by the Profile function. Otherwise CSE won't
  work properly.

  llvm-svn: 364717
* [LFTR] Rephrase getLoopTest into "based-on" check; NFCI (Nikita Popov, 2019-06-29; 1 file, -23/+23)
  What we want to know here is whether we're already using this value for
  the loop condition, so make the query about that. We can extend this to a
  more general "based-on" relationship, rather than a direct icmp use, later.

  llvm-svn: 364715
* [InstCombine] canonicalize fmin/fmax to LLVM intrinsics minnum/maxnum (Sanjay Patel, 2019-06-29; 1 file, -24/+14)
  This transform came up in D62414, but we should deal with it first.

  We have LLVM intrinsics that correspond exactly to libm calls (unlike most
  libm calls, these libm calls never set errno). This holds without any
  fast-math-flags, so we should always canonicalize to those intrinsics
  directly for better optimization.

  Currently, we convert to fcmp+select only when we have FMF (nnan) because
  fcmp+select does not preserve the semantics of the call in the general
  case.

  Differential Revision: https://reviews.llvm.org/D63214

  llvm-svn: 364714
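  A minimal sketch of the direction chosen here (no FMF required):

      declare double @fmin(double, double)

      define double @use_libm(double %a, double %b) {
        %r = call double @fmin(double %a, double %b)
        ret double %r
      }

      ; Canonical form after this change (sketch):
      ;   %r = call double @llvm.minnum.f64(double %a, double %b)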
* [LFTR] Remove unnecessary latch check; NFCI (Nikita Popov, 2019-06-29; 1 file, -14/+9)
  The whole indvars pass works on loops in simplified form, so there is
  always a unique latch. Convert the condition into an assertion in
  needsLFTR (though we also assert this in later LFTR functions).

  Additionally update the comment on getLoopTest() now that we are dealing
  with multiple exits.

  llvm-svn: 364713
* [InstCombine] Shift amount reassociation (PR42391) (Roman Lebedev, 2019-06-29; 1 file, -0/+48)
  Summary:
  Given the pattern
    (x shiftopcode Q) shiftopcode K
  we should rewrite it as
    x shiftopcode (Q+K)
  iff (Q+K) u< bitwidth(x).

  This is valid for any shift, but both shifts must be identical.
  * https://rise4fun.com/Alive/9E2
  * exact on both lshr => exact: https://rise4fun.com/Alive/plHk
  * exact on both ashr => exact: https://rise4fun.com/Alive/QDAA
  * nuw on both shl => nuw: https://rise4fun.com/Alive/5Uk
  * nsw on both shl => nsw: https://rise4fun.com/Alive/0plg

  Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42391 | PR42391 ]].

  Reviewers: spatel, nikic, RKSimon
  Reviewed By: nikic
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63812
  llvm-svn: 364712
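  A minimal IR illustration (hypothetical function names; both shifts are
  lshr, and 5 + 7 = 12 u< 32):

      define i32 @src(i32 %x) {
        %t = lshr i32 %x, 5
        %r = lshr i32 %t, 7
        ret i32 %r
      }

      ; After the fold:
      define i32 @tgt(i32 %x) {
        %r = lshr i32 %x, 12
        ret i32 %r
      }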
* [APInt] Fix getBitsNeeded for INT_MIN values (Dmitry Venikov, 2019-06-29; 1 file, -1/+4)
  Summary:
  This patch fixes the behaviour of APInt::getBitsNeeded for 10-bit INT_MIN
  values (e.g. the string "-512", which needs exactly 10 bits).

  Reviewers: regehr, RKSimon
  Reviewed By: RKSimon
  Subscribers: grandinj, dexonsmith, kristina, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63691
  llvm-svn: 364710
* [LFTR] Fix post-inc pointer IV with truncated exit count (PR41998) (Nikita Popov, 2019-06-29; 1 file, -40/+37)
  Fixes https://bugs.llvm.org/show_bug.cgi?id=41998. Usually when we have a
  truncated exit count we'll truncate the IV when comparing against the
  limit, in which case exit count overflow in post-inc form doesn't matter.
  However, for pointer IVs we don't do that, so we have to be careful about
  incrementing the IV in the wide type.

  I'm fixing this by removing the IVCount variable (which was ExitCount or
  ExitCount+1) and replacing it with a UsePostInc flag, and then moving the
  actual limit adjustment to the individual cases (which are: pointer IV
  where we add to the wide type, integer IV where we add to the narrow type,
  and constant integer IV where we add to the wide type).

  Differential Revision: https://reviews.llvm.org/D63686

  llvm-svn: 364709
* AMDGPU/GlobalISel: RegBankSelect for update.dpp (Matt Arsenault, 2019-06-29; 1 file, -0/+1)
  llvm-svn: 364701