bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[AMDGPU] Make printf lowering faster when there are no printfs	Jay Foad	2019-10-02	1	-16/+14
Summary: Printf lowering unconditionally visited every instruction in the module. To make it faster in the common case where there are no printfs, look up the printf function (if any) and iterate over its users instead. Reviewers: rampitec, kzhuravl, alex-t, arsenm Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68145 llvm-svn: 373433
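A minimal sketch of why this is cheaper, using a hypothetical toy module rather than LLVM's IR classes (`ToyModule`, `CallSite` and both counting functions are illustrative names, not LLVM API). In LLVM itself the lookup corresponds to `Module::getFunction("printf")` followed by a walk over `Value::users()`; the sketch only shows that the no-printf case collapses to a single lookup instead of a full scan.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR: a call site only records the name of its callee.
struct CallSite {
  std::string Callee;
};

// Hypothetical toy module: the flat list of all calls plus a per-callee
// user list, standing in for LLVM's use-lists.
struct ToyModule {
  std::vector<CallSite> Calls;
  std::map<std::string, std::vector<std::size_t>> Users; // callee -> call indices
};

// Old strategy: visit every call in the module, even when there is no printf.
int countPrintfCallsByScan(const ToyModule &M) {
  int N = 0;
  for (const CallSite &C : M.Calls)
    if (C.Callee == "printf")
      ++N;
  return N;
}

// New strategy: look printf up once and walk only its users.
int countPrintfCallsByUsers(const ToyModule &M) {
  auto It = M.Users.find("printf");
  if (It == M.Users.end())
    return 0; // common case: no printf in the module, nothing else to visit
  int N = 0;
  for (std::size_t Idx : It->second) {
    assert(M.Calls[Idx].Callee == "printf" && "user list out of sync");
    ++N;
  }
  return N;
}
```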
*	[X86] Add broadcast load folding patterns to the NoVLX compare patterns.	Craig Topper	2019-10-02	1	-16/+138
These patterns use zmm registers for 128/256-bit compares when the VLX instructions aren't available. Previously we only supported registers, but as PR36191 notes, we can fold broadcast loads here, though not regular loads. llvm-svn: 373423
*	AMDGPU/GlobalISel: Use getIntrinsicID helper	Matt Arsenault	2019-10-02	3	-7/+7
llvm-svn: 373417
*	AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX	Matt Arsenault	2019-10-02	1	-1/+7
In principle this should behave like any other constant. However, eliminateFrameIndex currently assumes a VALU use and uses a vector shift. Work around this by selecting to VGPR for now, until eliminateFrameIndex is fixed. llvm-svn: 373415
*	AMDGPU/GlobalISel: Private loads always use VGPRs	Matt Arsenault	2019-10-02	1	-4/+6
llvm-svn: 373414
*	AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR	Matt Arsenault	2019-10-02	1	-4/+6
This will be needed to support AGPR operations. llvm-svn: 373413
*	AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values	Matt Arsenault	2019-10-02	1	-29/+35
llvm-svn: 373412
*	[AMDGPU] separate accounting for agprs	Stanislav Mekhanoshin	2019-10-02	3	-7/+52
Account for and report AGPRs separately on gfx908. Reporting for other targets is unchanged. Differential Revision: https://reviews.llvm.org/D68307 llvm-svn: 373411
*	[X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32	Craig Topper	2019-10-01	1	-0/+34
The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, a v16i64 index can force a v16i32 gather to be split in two since the index needs 2 registers. If we can shrink the index to i32 we can avoid the split. It should always be safe to shrink the index regardless of the number of elements. We have gather/scatter instructions that can use a v2i32 index stored in a v4i32 register with v2i64 data size. I've limited this to before type legalization to avoid creating a v2i32 after type legalization. We could check for it, but we'd also need testing. I'm also only handling build_vectors with no bitcasts to be sure the truncate will constant fold. Differential Revision: https://reviews.llvm.org/D68247 llvm-svn: 373408
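A sketch of the legality check, assuming the constant indices are available as plain `int64_t` values (the helper names are hypothetical, not the DAG combine's code). Because the hardware sign-extends 32-bit indices, truncating a constant index is lossless exactly when the value lies in the signed 32-bit range, i.e. when it has at least 33 sign bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// True if V survives truncation to i32 followed by sign extension back to i64,
// i.e. V has at least 33 sign bits.
bool fitsInSignedI32(int64_t V) {
  return V >= std::numeric_limits<int32_t>::min() &&
         V <= std::numeric_limits<int32_t>::max();
}

// Hypothetical helper: shrink a constant vXi64 index vector to vXi32 if every
// element fits; otherwise leave it alone (return std::nullopt).
std::optional<std::vector<int32_t>>
shrinkGatherIndices(const std::vector<int64_t> &Indices) {
  std::vector<int32_t> Narrow;
  Narrow.reserve(Indices.size());
  for (int64_t V : Indices) {
    if (!fitsInSignedI32(V))
      return std::nullopt;
    Narrow.push_back(static_cast<int32_t>(V));
  }
  assert(Narrow.size() == Indices.size());
  return Narrow;
}
```

A single out-of-range element blocks the whole shrink, mirroring the "sufficient sign bits" condition in the commit subject.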
*	AMDGPU: Fix an out of date assert in addressing FrameIndex	Changpeng Fang	2019-10-01	1	-3/+2
Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D67574 llvm-svn: 373404
*	Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops."	Craig Topper	2019-10-01	1	-79/+1
This seems to be causing some performance regressions that I'm trying to investigate. One thing that stands out is that this transform can increase the live range of the operands of the earlier logic op, which can be bad for register allocation. If there are two logic op inputs we should really combine the one that is closest, but SelectionDAG doesn't have a good way to do that. Maybe we need to do this as a basic block transform in Machine IR. llvm-svn: 373401
*	[X86] convertToThreeAddress, make sure second operand of SUB32ri is really an immediate before calling getImm().	Craig Topper	2019-10-01	1	-0/+4
It might be a symbol instead. We can't fold those since we can't negate them. The same applies to the other SUB instructions with immediates. Fixes PR43529. llvm-svn: 373397
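Why a symbol blocks the transform, sketched with a hypothetical operand model (`Operand` and `leaDispForSub32` are illustrative names, not X86InstrInfo code): rewriting `sub r, imm` as an LEA needs the displacement `-imm`, which can be computed for a numeric immediate but not for a symbol reference whose value is only resolved later.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Hypothetical operand model: a SUB32ri source is either a numeric immediate
// or a symbol reference.
using Operand = std::variant<int32_t, std::string>;

// Displacement for rewriting "sub r, Src2" as "lea r', [r + disp]".
// Returns std::nullopt for symbols: there is no way to negate them here.
// (A real encoder must also check the result fits the displacement field.)
std::optional<int64_t> leaDispForSub32(const Operand &Src2) {
  if (const int32_t *Imm = std::get_if<int32_t>(&Src2))
    return -static_cast<int64_t>(*Imm); // widen first so INT32_MIN negates safely
  assert(std::holds_alternative<std::string>(Src2));
  return std::nullopt; // calling getImm() on this operand would be the bug
}
```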
*	AMDGPU/SILoadStoreOptimizer: Add helper functions for working with CombineInfo	Tom Stellard	2019-10-01	1	-205/+244
Summary: This is a refactoring that will make future improvements to this pass easier. This change should not change the behavior of the pass. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Reviewed By: nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65496 llvm-svn: 373366
*	AMDGPU/GlobalISel: Increase max legal size to 1024	Matt Arsenault	2019-10-01	3	-10/+13
There are 1024-bit register classes defined for AGPRs. Additionally, OpenCL defines vectors up to 16 x i64, and this helps those tests legalize. llvm-svn: 373350
*	[X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector.	Craig Topper	2019-10-01	6	-182/+335
Summary: This adds the ISD opcode and a DAG combine to create it. There are probably some places where we can directly create it, but I'll leave that for future work. This updates all of the isel patterns to look for this new node. I had to add a few additional isel patterns for aligned extloads which we should probably fix with a DAG combine or something. This does mean that the broadcast load folding for avx512 can no longer match a broadcasted aligned extload. There's still some work to do here for combining a broadcast of a broadcast_load. We also need to improve extractelement or demanded vector elements of a broadcast_load. I'll try to get those done before I submit this patch. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68198 llvm-svn: 373349
*	[AMDGPU] Add VerifyScheduling support.	Jay Foad	2019-10-01	1	-0/+18
Summary: This is cut and pasted from the corresponding GenericScheduler functions. Reviewers: arsenm, atrick, tstellar, vpykhtin Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68264 llvm-svn: 373346
*	[X86] Consider isCodeGenOnly in the EVEX2VEX pass to make VMAXPD/PS map to the non-commutable VEX instruction.	Craig Topper	2019-10-01	1	-16/+25
Use an EVEX2VEX override to fix the scalar instructions. Previously the match was ambiguous, and VMAXPS/PD and VMAXCPS/PD were mapped to the same VEX instruction, but we should preserve commutability when changing the opcode. llvm-svn: 373303
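Why commutability matters here: on x86, MAXPS/MAXPD (and their VEX/EVEX forms) compute each lane as `a > b ? a : b`, so when the comparison is false (a NaN input, or +0.0 against -0.0) the second operand is returned. A scalar C++ model of one lane, written for illustration only (`x86MaxLane` is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>

// One lane of x86 MAXPS/MAXSS: the first operand is returned only when it
// compares strictly greater; otherwise (NaN input, or +0.0 vs -0.0) the
// second operand is returned.
float x86MaxLane(float A, float B) { return A > B ? A : B; }
```

Swapping the operands changes the result in exactly those cases, which is why the non-commutable and commutable forms must not be collapsed onto one VEX instruction.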
*	[WebAssembly] Make sure EH pads are preferred in sorting	Heejin Ahn	2019-10-01	1	-0/+1
Summary: In CFGSort, we try to make EH pads have higher priorities as soon as they are ready to be sorted, to prevent creation of unwind destination mismatches in CFGStackify. We did that by making priority queues' comparison function prefer EH pads, but under a certain condition an EH pad could be popped from the `Preferred` queue and, instead of being sorted immediately, enter the `Ready` queue. This patch makes sure that this special condition does not consider EH pads as candidates. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68229 llvm-svn: 373302
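The EH-pad preference described above can be sketched as a `std::priority_queue` comparator. `Block`, `PreferEHPads` and `ReadyQueue` are hypothetical stand-ins, not the WebAssembly backend's types, and the sketch shows only the ordering, not the special condition this patch fixes.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for a basic block awaiting sorting.
struct Block {
  int Number;
  bool IsEHPad;
};

// std::priority_queue comparator: returns true if A has LOWER priority than B.
// EH pads always win; otherwise lower-numbered blocks come first.
struct PreferEHPads {
  bool operator()(const Block &A, const Block &B) const {
    if (A.IsEHPad != B.IsEHPad)
      return B.IsEHPad; // A loses exactly when B is the EH pad
    return A.Number > B.Number;
  }
};

using ReadyQueue = std::priority_queue<Block, std::vector<Block>, PreferEHPads>;
```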
*	[WebAssembly] Unstackify regs after fixing unwinding mismatches	Heejin Ahn	2019-10-01	2	-0/+25
Summary: Fixing unwind mismatches for exception handling can result in splicing existing BBs and moving some of the instructions to new BBs. In this case some of the stackified def registers in the original BB can be used in the split BB. For example, we have this BB and suppose %r0 is a stackified register.
```
bb.1:
  %r0 = call @foo
  ... use %r0 ...
```
After fixing unwind mismatches in CFGStackify, `bb.1` can be split and some instructions can be moved to a newly created BB:
```
bb.1:
  %r0 = call @foo

bb.split (new):
  ... use %r0 ...
```
In this case we should make %r0 un-stackified, because its use is now in another BB. When splitting a BB, this CL unstackifies all def registers that have uses in the new split BB. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68218 llvm-svn: 373301
*	AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP	Matt Arsenault	2019-10-01	4	-8/+55
llvm-svn: 373298
*	AMDGPU/GlobalISel: Add support for init.exec intrinsics	Matt Arsenault	2019-10-01	6	-20/+42
The existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296
*	AMDGPU/GlobalISel: Allow scc/vcc alternative mappings for s1 constants	Matt Arsenault	2019-10-01	1	-1/+15
llvm-svn: 373295
*	AMDGPU/GlobalISel: Avoid creating shift of 0 in arg lowering	Matt Arsenault	2019-10-01	1	-3/+8
This is sort of papering over the fact that we don't run a combiner anywhere, but avoiding creating 2 instructions in the first place is easy. llvm-svn: 373293
*	TLI: Remove DAG argument from getRegisterByName	Matt Arsenault	2019-10-01	20	-62/+60
Replace it with the MachineFunction. X86 is the only user, and it only uses the DAG to get the function. This removes one obstacle to using this in GlobalISel; the other is the more tolerable EVT argument. The X86 use of the function seems questionable to me: it checks hasFP before frame lowering. llvm-svn: 373292
*	AMDGPU/GlobalISel: Select G_UADDO/G_USUBO	Matt Arsenault	2019-10-01	3	-1/+47
llvm-svn: 373288
*	GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources	Matt Arsenault	2019-10-01	1	-3/+9
Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU. llvm-svn: 373287
*	AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE	Matt Arsenault	2019-10-01	3	-8/+105
Handle other cases besides LDS. Mostly a straight port of the existing handling, without the intermediate custom nodes. llvm-svn: 373286
*	[LegacyPassManager] Deprecate the BasicBlockPass/Manager.	Alina Sbirlea	2019-09-30	2	-48/+51
Summary: The BasicBlockManager is potentially broken and should not be used. Replace all uses of the BasicBlockPass with a FunctionBlockPass+loop on blocks. Reviewers: chandlerc Subscribers: jholewinski, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68234 llvm-svn: 373254
*	[X86] Mask off upper bits of splat element in LowerBUILD_VECTORvXi1 when forming a SELECT.	Craig Topper	2019-09-30	1	-2/+12
The i1 scalar would have been type legalized to i8, but that doesn't guarantee anything about the upper bits. If we're going to use it as a condition we need to make sure the upper bits are 0. I've special cased ISD::SETCC conditions since that should guarantee zero upper bits. We could go further and use computeKnownBits, but we have no tests that would need that. Fixes PR43507. llvm-svn: 373246
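A scalar model of the bug, with hypothetical function names: after type legalization only bit 0 of the promoted i8 carries the i1 value, so testing the whole byte can treat a false condition with garbage upper bits as true. Masking with 1 first, which this fix does for non-SETCC conditions, restores the intended meaning.

```cpp
#include <cassert>
#include <cstdint>

// After type legalization an i1 lives in an i8; only bit 0 is meaningful and
// bits 1..7 may hold arbitrary values.
int selectUnmasked(uint8_t Cond, int T, int F) {
  return Cond != 0 ? T : F;        // wrong: garbage upper bits read as "true"
}

int selectMasked(uint8_t Cond, int T, int F) {
  return (Cond & 1u) != 0 ? T : F; // right: clear the upper bits first
}
```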
*	[X86] Address post-commit review of code I accidentally committed in r373136.	Craig Topper	2019-09-30	1	-3/+6
See https://reviews.llvm.org/D68167 llvm-svn: 373245
*	[NewPM] Port MachineModuleInfo to the new pass manager.	Yuanfang Chen	2019-09-30	2	-4/+4
Existing clients are converted to use MachineModuleInfoWrapperPass. The new interface is for defining a new pass manager API in CodeGen. Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm Reviewed By: arsenm, fedor.sergeev Differential Revision: https://reviews.llvm.org/D64183 llvm-svn: 373240
*	[X86] Add ANY_EXTEND to switch in ReplaceNodeResults, but just fall back to default handling.	Craig Topper	2019-09-30	1	-0/+6
ANY_EXTEND of v8i8 is marked Custom on AVX512 for handling extends from v8i8. But the type legalization infrastructure will call ReplaceNodeResults for v8i8 results. We should just defer to the default handling instead of asserting in the default case of the switch. Fixes PR43509. llvm-svn: 373234
*	[AArch64][SVE] Implement punpk[hi|lo] intrinsics	Kerry McLaughlin	2019-09-30	2	-2/+15
Summary: Adds the following two intrinsics:
- int_aarch64_sve_punpkhi
- int_aarch64_sve_punpklo
This patch also contains a fix which allows LLVMHalfElementsVectorType to forward reference overloadable arguments. Reviewers: sdesmalen, rovka, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, greened, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67830 llvm-svn: 373232
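A lane-level model of what the two intrinsics compute, for illustration only: it ignores SVE's one-bit-per-byte predicate register layout and uses `std::vector<bool>` for a predicate (the function names are hypothetical). PUNPKLO/PUNPKHI take the low or high half of the source predicate's lanes and widen each element to twice its size, which is why the result type has half as many elements, the relationship `LLVMHalfElementsVectorType` expresses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lane-level model: a predicate is one bool per element. The result has half
// as many lanes, each standing for an element twice as wide.
std::vector<bool> punpklo(const std::vector<bool> &P) {
  assert(P.size() % 2 == 0);
  auto Half = static_cast<std::ptrdiff_t>(P.size() / 2);
  return std::vector<bool>(P.begin(), P.begin() + Half);
}

std::vector<bool> punpkhi(const std::vector<bool> &P) {
  assert(P.size() % 2 == 0);
  auto Half = static_cast<std::ptrdiff_t>(P.size() / 2);
  return std::vector<bool>(P.begin() + Half, P.end());
}
```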
*	[AArch64][GlobalISel] Support lowering variadic musttail calls	Jessica Paquette	2019-09-30	1	-11/+69
This adds support for lowering variadic musttail calls. To do this, we have to...
- Detect a musttail call in a variadic function before attempting to lower the call's formal arguments. This is done in the IRTranslator.
- Compute forwarded registers in `lowerFormalArguments`, and add copies for those registers.
- Restore the forwarded registers in `lowerTailCall`.
Because there doesn't seem to be any nice way to wrap these up into the outgoing argument handler, the restore code in `lowerTailCall` is done separately. Also, irritatingly, you have to make sure that the registers don't overlap with any passed parameters. Otherwise, the scheduler doesn't know what to do with the extra copies and asserts. Add call-translator-variadic-musttail.ll to test this. This is pretty much the same as the X86 musttail-varargs.ll test. We didn't have as nice of a test to base this off of, but the idea is the same. Differential Revision: https://reviews.llvm.org/D68043 llvm-svn: 373226
*	[mips] Fix code indentation. NFC	Simon Atanasyan	2019-09-30	1	-3/+3
llvm-svn: 373225
*	[AMDGPU] SIFoldOperands should not fold register across the EXEC definition	Alexander Timofeev	2019-09-30	1	-0/+7
Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D67662 llvm-svn: 373221
*	[Alignment][NFC] Remove AllocaInst::setAlignment(unsigned)	Guillaume Chatelet	2019-09-30	3	-6/+7
Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, jvesely, nhaehnle, eraman, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68141 llvm-svn: 373207
*	[ARM][MVE] Change VCTP operand	Sam Parker	2019-09-30	1	-3/+3
The VCTP instruction calculates the predicate mask based on the number of elements that need to be processed. I had inserted the sub before the vctp intrinsic and supplied it as the operand, but this is incorrect because the phi should feed the vctp directly; the sub calculates the value for the next iteration. Differential Revision: https://reviews.llvm.org/D67921 llvm-svn: 373188
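A scalar model of the loop shape this fix restores (function names are hypothetical; real MVE code would use the VCTP instruction and predicated vector operations). VCTP.32 enables lane `i` when `i` is less than the element count it is given, so it must receive the phi value, the count remaining at the start of the iteration, while the subtraction only produces the next iteration's count.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of VCTP.32 on 4 x 32-bit lanes: lane I is active iff I < Remaining.
std::array<bool, 4> vctp32(uint32_t Remaining) {
  std::array<bool, 4> Mask{};
  for (uint32_t I = 0; I < 4; ++I)
    Mask[I] = I < Remaining;
  return Mask;
}

// Tail-predicated loop: the VCTP consumes the phi (elements left at the START
// of this iteration); the subtraction only feeds the next iteration.
int64_t predicatedSum(const std::vector<int32_t> &Data) {
  int64_t Sum = 0;
  uint32_t Remaining = static_cast<uint32_t>(Data.size()); // the phi
  for (std::size_t Base = 0; Remaining > 0; Base += 4) {
    std::array<bool, 4> Mask = vctp32(Remaining); // correct: uses the phi
    for (std::size_t L = 0; L < 4; ++L)
      if (Mask[L])
        Sum += Data[Base + L];
    Remaining = Remaining > 4 ? Remaining - 4 : 0; // next-iteration value
  }
  return Sum;
}
```

Feeding `Remaining - 4` to the VCTP instead would disable lanes one iteration too early and drop the final elements.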
*	[ARM][CGP] Allow signext arguments	Sam Parker	2019-09-30	1	-5/+2
As we perform a zext on any arguments used in the promoted tree, it doesn't matter if they're marked as signext. The only permitted users in the tree that would interpret the sign bits are signed icmps, and their promoted operands are truncated before the icmp uses them. Differential Revision: https://reviews.llvm.org/D68019 llvm-svn: 373186
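The argument can be checked with a small model over `int8_t` (helper names are hypothetical): zero-extending a negative i8 changes its 32-bit value, but a signed compare that first truncates back to i8 recovers the original sign, so the `signext` attribute does not affect the result.

```cpp
#include <cassert>
#include <cstdint>

// The promoted tree zero-extends every i8 argument, ignoring signext.
uint32_t promoteZext(int8_t V) { return static_cast<uint8_t>(V); }

// A signed icmp in the tree truncates its promoted operands back to i8 first.
// (uint32_t -> int8_t is implementation-defined before C++20 and modular since
// C++20; mainstream compilers have always made it modular.)
bool promotedSignedLess(uint32_t A, uint32_t B) {
  return static_cast<int8_t>(A) < static_cast<int8_t>(B);
}
```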
*	[SystemZ] Add SystemZPostRewrite in addPostRegAlloc() instead at -O0.	Jonas Paulsson	2019-09-30	1	-1/+4
SystemZPostRewrite may emit COPYs, so it needs to run before the Post-RA pseudo pass, also at -O0. It should therefore be added in addPostRegAlloc(). Review: Ulrich Weigand llvm-svn: 373182
*	[X86] Remove some redundant isel patterns. NFCI	Craig Topper	2019-09-30	1	-78/+0
These are all also implemented in avx512_logical_lowering_types with support for masking. llvm-svn: 373181
*	AMDGPU/GlobalISel: Fix select for v2s16 and/or/xor	Matt Arsenault	2019-09-30	1	-15/+17
llvm-svn: 373180
*	[X86] Split v16i32/v8i64 bitreverse on avx512f targets without avx512bw to enable the use of vpshufb on the 256-bit halves.	Craig Topper	2019-09-30	1	-1/+12
llvm-svn: 373177
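The vpshufb-based lowering reverses bits with 4-bit table lookups. A scalar C++ model of that technique, for illustration (the table and function names are hypothetical): reverse the bits of each byte with two nibble lookups, then reverse the byte order. VPSHUFB performs 16 such byte lookups per 128-bit lane, and its 512-bit form requires AVX512BW, hence the split into 256-bit halves on AVX512F-only targets.

```cpp
#include <cassert>
#include <cstdint>

// Reversed bits of each 4-bit value: the table a VPSHUFB lookup would hold.
constexpr uint8_t kRevNibble[16] = {0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
                                    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF};

// Reverse the bits of one byte with two nibble lookups.
uint8_t reverseByte(uint8_t B) {
  return static_cast<uint8_t>((kRevNibble[B & 0xF] << 4) | kRevNibble[B >> 4]);
}

// 32-bit bitreverse: reverse bits within each byte, then reverse byte order.
uint32_t bitreverse32(uint32_t X) {
  uint32_t R = 0;
  for (int I = 0; I < 4; ++I) {
    uint8_t Byte = static_cast<uint8_t>(X >> (8 * I));
    R |= static_cast<uint32_t>(reverseByte(Byte)) << (8 * (3 - I));
  }
  return R;
}
```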
*	[X86] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r373174	Fangrui Song	2019-09-30	1	-0/+1
llvm-svn: 373175
*	[X86] Remove -x86-experimental-vector-widening-legalization command line flag	Craig Topper	2019-09-29	2	-1257/+145
This was added back to allow some performance regressions to be investigated. The main perf issue was fixed shortly after adding this back and no other major issues have been reported, so I think it's safe to remove this again. llvm-svn: 373174
*	[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops.	Craig Topper	2019-09-29	1	-1/+79
There's room for improvement here, but this is a decent starting point. There are a few minor regressions in the vector-rotate tests, where we are now forming a vpternlog from an AND before we get a chance to form it for a bitselect that we were matching previously. This results in an AND and an ANDN feeding the vpternlog where previously we just had an AND after the vpternlog. I think we can probably DAG combine the AND with the bitselect to get back to similar codegen. llvm-svn: 373172
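How two logic ops fold into one VPTERNLOG: the instruction's 8-bit immediate is a truth table over its three inputs, and it can be computed by evaluating the expression on the patterns 0xF0, 0xCC and 0xAA for the first, second and third operand. The sketch below (helper names hypothetical) assumes that usual operand-to-pattern assignment.

```cpp
#include <cassert>
#include <cstdint>

// Truth-table patterns for VPTERNLOG's three operands (first, second, third).
constexpr unsigned kA = 0xF0, kB = 0xCC, kC = 0xAA;

// Immediate for "Outer(Inner(A, B), C)", i.e. two logic ops fused into one.
template <typename InnerOp, typename OuterOp>
uint8_t ternlogImm(InnerOp Inner, OuterOp Outer) {
  return static_cast<uint8_t>(Outer(Inner(kA, kB), kC) & 0xFFu);
}

// Immediate for a bitselect "A ? B : C", computed bitwise as (A & B) | (~A & C).
uint8_t bitselectImm() {
  return static_cast<uint8_t>(((kA & kB) | (~kA & kC)) & 0xFFu);
}
```

For example, `(A & B) | C` gives 0xEA and the bitselect gives 0xCA.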
*	[PowerPC] Fix conditions of assert in PPCAsmPrinter	Jinsong Ji	2019-09-29	1	-1/+1
Summary: The g++ build emits a warning:
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp:667:77: error: suggest parentheses around '&&' within '||' [-Werror=parentheses]
assert(MO.isGlobal() || MO.isCPI() || MO.isJTI() || MO.isBlockAddress() &&
       "Unexpected operand type for LWZtoc pseudo.");
I believe the intention is to assert all the different types, so we should add parentheses around all the '||' operands. Reviewers: #powerpc, sfertile, hubert.reinterpretcast, Xiangling_L Reviewed By: Xiangling_L Subscribers: wuzish, nemanjai, hiraditya, kbarton, MaskRay, shchenz, steven.zhang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68180 llvm-svn: 373164
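A small check of what the warning is about: `&&` binds more tightly than `||`, so the message string attaches only to the last operand. Because a string literal always converts to `true`, both forms accept exactly the same inputs; the added parentheses silence `-Wparentheses` and make the intent explicit rather than change behavior. The two functions are hypothetical stand-ins for the assert condition.

```cpp
#include <cassert>

// Parsed as A || B || C || (D && "msg"): the message binds only to D.
bool unparenthesized(bool A, bool B, bool C, bool D) {
  return A || B || C || (D && "Unexpected operand type");
}

// Intended form: (A || B || C || D) && "msg".
bool parenthesized(bool A, bool B, bool C, bool D) {
  return (A || B || C || D) && "Unexpected operand type";
}
```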
*	[ARM] Cortex-M4 schedule additions	David Green	2019-09-29	5	-17/+40
This is an attempt to fill in some of the missing instructions from the Cortex-M4 schedule, and make it easier to do the same for other ARM CPUs.
- Some instructions are marked as hasNoSchedulingInfo as they are pseudos or otherwise do not require scheduling info.
- A lot of features have been marked as not supported.
- Some WriteRes's have been added for cvt instructions.
- Some extra instruction latencies have been added, notably by relaxing the regex for DSP instructions to catch more cases, and for some FP instructions.
This goes a long way toward getting the CompleteModel working for this CPU, but it does not go far enough to get all scheduling info for all output operands correct. Differential Revision: https://reviews.llvm.org/D67957 llvm-svn: 373163
*	[X86] Enable isel to fold broadcast loads that have been bitcasted from FP into a vpternlog.	Craig Topper	2019-09-29	1	-0/+96
llvm-svn: 373157
*	[X86] Move bitselect matching to vpternlog into X86ISelDAGToDAG.cpp	Craig Topper	2019-09-29	2	-43/+160
This allows us to reduce the use count on the condition node before the match. This enables load folding for that operand without relying on the peephole pass. This will be improved on for broadcast load folding in a subsequent commit. This still requires a bunch of isel patterns for vXi16/vXi8 types though. llvm-svn: 373156