bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[X86][AVX] Combine INSERT_SUBVECTOR(SRC0, ↵	Simon Pilgrim	2019-02-01	1	-9/+16
\| \| \| \| \| \| \| \| \| \|	BITCAST(SHUFFLE(EXTRACT_SUBVECTOR(SRC1))) Enable peeking through one use bitcasts to the subvector shuffle. This still depends on the subvector being the same scalar-size but D57514 has already helped with the more tricky patterns llvm-svn: 352879
*	[AArch64] Optimize floating point materialization	Adhemerval Zanella	2019-02-01	7	-46/+110
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch changes isFPImmLegal to also return true if the value can be encoded as the immediate operand of a logical instruction, besides checking the immediate field for fmov. This optimizes some floating point materialization, including values used in isinf lowering. Reviewed By: rengolin, efriedma, evandro Differential Revision: https://reviews.llvm.org/D57044 llvm-svn: 352866
*	[X86][BdVer2] Transfer delays from the integer to the floating point unit.	Roman Lebedev	2019-02-01	4	-16/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: I'm unable to find this number in the "AMD SOG for family 15h". llvm-exegesis measures the latencies of these instructions as `2`, which matches the latencies specified in "AMD SOG for family 15h". However if we look at Agner, Microarchitecture, "AMD Bulldozer, Piledriver, Steamroller and Excavator pipeline", "Data delay between different execution domains", the int->ivec transfer is listed as `8`..`10`cy of additional latency. Also, Agner's "Instruction tables", for Piledriver, lists their latencies as `12`, which is consistent with `2cy` from exegesis / AMD SOG + `10cy` transfer delay. Additional data point comes from the fact that Agner's "Instruction tables", for Jaguar, lists their latencies as `8`; and "AMD SOG for family 16h" does state the `+6cy` int->ivec delay, which is consistent with instr latency of `1` or `2`. Reviewers: andreadb, RKSimon, craig.topper Reviewed By: andreadb Subscribers: gbedwell, courbet, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57300 llvm-svn: 352861
*	[CodeGen] Don't scavenge non-saved regs in exception throwing functions	Oliver Stannard	2019-02-01	1	-0/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, LiveRegUnits was assuming that if a block has no successors and does not return, then no registers are live at the end of it (because the end of the block is unreachable). This was causing the register scavenger to use callee-saved registers to materialise stack frame addresses without saving them in the prologue. This would normally be fine, because the end of the block is unreachable, but this is not legal if the block ends by throwing a C++ exception. If this happens, the scratch register will be modified, but its previous value won't be preserved, so it doesn't get restored by the exception unwinder. Differential revision: https://reviews.llvm.org/D57381 llvm-svn: 352844
*	[RISCV] Implement RV64D codegen	Alex Bradbury	2019-02-01	11	-0/+1483
\| \| \| \| \| \| \| \| \| \| \| \|	This patch: * Adds necessary RV64D codegen patterns * Modifies CC_RISCV so it will properly handle f64 types (with soft float ABI) Note that in general there is no reason to try to select fcvt.w[u].d rather than fcvt.l[u].d for i32 conversions because fptosi/fptoui produce poison if the input won't fit into the target type. Differential Revision: https://reviews.llvm.org/D53237 llvm-svn: 352833
*	[SelectionDAG] Support promotion of the FPOWI integer operand	Alex Bradbury	2019-02-01	1	-0/+218
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the integer operand of ISD::FPOWI. As this is a signed value, it should be sign-extended. This patch enables all tests in test/CodeGen/RISCV/float-intrinsics.ll for RV64, as prior to this patch that file couldn't be compiled for RV64 due to an assertion when performing codegen for fpowi. Differential Revision: https://reviews.llvm.org/D54574 llvm-svn: 352832
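A minimal sketch of why a promoted signed exponent needs sign extension rather than zero extension. This is plain Python, not LLVM code; the helper names `to_signed`, `sext32_to_64` and `zext32_to_64` are hypothetical and only model widening an i32 bit pattern to i64.

```python
def to_signed(bits, width):
    """Interpret a `width`-bit pattern as a two's-complement signed integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits

def sext32_to_64(bits32):
    """Widen a 32-bit pattern to 64 bits, replicating the sign bit."""
    return to_signed(bits32, 32) & 0xFFFFFFFFFFFFFFFF

def zext32_to_64(bits32):
    """Widen a 32-bit pattern to 64 bits, filling the upper half with zeros."""
    return bits32 & 0xFFFFFFFF

exponent_bits = (-3) & 0xFFFFFFFF                   # i32 bit pattern of -3
print(to_signed(sext32_to_64(exponent_bits), 64))   # -3
print(to_signed(zext32_to_64(exponent_bits), 64))   # 4294967293
```

With zero extension, a negative exponent would be read as a large positive one after promotion.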
*	[x86] adjust test to show both add/inc options; NFC	Sanjay Patel	2019-02-01	1	-2/+4
\| \| \| \| \| \| \| \|	If we're optimizing for size, that overrides the subtarget feature, so we would always produce 'inc' if we matched this pattern. llvm-svn: 352821
*	[x86] add test for missed opportunity to use 'inc'; NFC	Sanjay Patel	2019-01-31	1	-0/+43
\| \| \| \| \| \|	Another pattern exposed in D57516. llvm-svn: 352820
*	GlobalISel: Fix MMO creation with non-power-of-2 mem size	Matt Arsenault	2019-01-31	1	-0/+9
\| \| \| \| \| \| \|	It should probably just be mandatory for getTgtMemIntrinsic to return the alignment. llvm-svn: 352817
*	[WebAssembly] Fix a regression selecting negative build_vector lanes	Thomas Lively	2019-01-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The custom lowering introduced in rL352592 creates build_vector nodes with negative i32 operands, but these operands did not meet the value range constraints necessary to match build_vector nodes. This CL fixes the issue by removing the unnecessary constraints. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish Differential Revision: https://reviews.llvm.org/D57481 llvm-svn: 352813
*	[RISCV] Add RV64F codegen support	Alex Bradbury	2019-01-31	10	-0/+1336
\| \| \| \| \| \| \| \| \| \| \| \| \|	This requires a little extra work due to the fact i32 is not a legal type. When call lowering happens post-legalisation (e.g. when an intrinsic was inserted during legalisation), a bitcast from f32 to i32 can't be introduced. This is similar to the challenges with RV32D. To handle this, we introduce target-specific DAG nodes that perform bitcast+anyext for f32->i64 and trunc+bitcast for i64->f32. Differential Revision: https://reviews.llvm.org/D53235 llvm-svn: 352807
*	[x86] add test for missed opportunity to use 'inc'; NFC	Sanjay Patel	2019-01-31	1	-0/+30
\| \| \| \|	llvm-svn: 352805
*	[WebAssembly] Add bulk memory target feature	Thomas Lively	2019-01-31	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Also clean up some preexisting target feature code. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, jfb Differential Revision: https://reviews.llvm.org/D57495 llvm-svn: 352793
*	[DAGCombine] Avoid CombineZExtLogicopShiftLoad if there is free ZEXT	Guozhi Wei	2019-01-31	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes pr39098. For the attached test case, CombineZExtLogicopShiftLoad can optimize it to: t25: i64 = Constant<1099511627775> t35: i64 = Constant<0> t0: ch = EntryToken t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64 t58: i64 = srl t57, Constant:i8<1> t60: i64 = and t58, Constant:i64<524287> t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64 But later visitANDLike transforms it to: t25: i64 = Constant<1099511627775> t35: i64 = Constant<0> t0: ch = EntryToken t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64 t61: i32 = truncate t57 t63: i32 = srl t61, Constant:i8<1> t64: i32 = and t63, Constant:i32<524287> t65: i64 = zero_extend t64 t58: i64 = srl t57, Constant:i8<1> t60: i64 = and t58, Constant:i64<524287> t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64 This triggers CombineZExtLogicopShiftLoad again, causing an infinite loop. Both forms should generate the same instructions, and the IR generated by CombineZExtLogicopShiftLoad looks cleaner. But it looks more difficult to prevent visitANDLike from doing the transform, so I prevent CombineZExtLogicopShiftLoad from doing the transform if the ZExt is free. Differential Revision: https://reviews.llvm.org/D57491 llvm-svn: 352792
*	[Intrinsic] Expand SMULFIX to MUL, MULH[US], or [US]MUL_LOHI on vector arguments	Leonard Chan	2019-01-31	1	-72/+28
\| \| \| \| \| \| \| \| \| \| \|	For zero scale SMULFIX, expand into MUL which produces better code for X86. For vector arguments, expand into MUL if SMULFIX is provided with a zero scale. Otherwise, expand into MULH[US] or [US]MUL_LOHI. Differential Revision: https://reviews.llvm.org/D56987 llvm-svn: 352783
*	Revert "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."	Craig Topper	2019-01-31	1	-42/+86
\| \| \| \| \| \|	This is causing a failure in chromium llvm-svn: 352782
*	[X86][AVX] Fold concat(broadcast(x),broadcast(x)) -> broadcast(x)	Simon Pilgrim	2019-01-31	2	-76/+25
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D57514 llvm-svn: 352774
*	[X86][AVX] insert_subvector(bitcast(v), bitcast(s), c1) -> ↵	Simon Pilgrim	2019-01-31	4	-98/+67
\| \| \| \| \| \| \| \| \| \|	bitcast(insert_subvector(v,s,c2)) Similar to what we already do in DAGCombiner, but this version also handles bitcasts from types with different scalar sizes, which x86 is better at handling. Differential Revision: https://reviews.llvm.org/D57514 llvm-svn: 352773
*	revert r352766: [PatternMatch] add special-case uaddo matching for ↵	Sanjay Patel	2019-01-31	1	-1/+3
\| \| \| \| \| \| \| \|	increment-by-one Missed some regression test updates when testing this. llvm-svn: 352769
*	[PatternMatch] add special-case uaddo matching for increment-by-one	Sanjay Patel	2019-01-31	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the most important uaddo problem mentioned in PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 We were failing to match the canonicalized pattern when it's an 'add 1' operation. Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 commuted variants of uaddo. There's also a test with a crazy type to show that the existing CGP transform based on this matcher is not limited by target legality checks, but that's a different problem. Differential Revision: https://reviews.llvm.org/D57516 llvm-svn: 352766
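A small illustration of why increment-by-one is a special case of unsigned add-with-overflow. It assumes the increment form tests the wrapped sum against zero, which the commit message does not spell out; the function names and the 8-bit width are hypothetical choices for an exhaustive check.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def uaddo_general(a, b):
    """Unsigned add with overflow: overflow iff the wrapped sum is below an operand."""
    total = (a + b) & MASK
    return total, total < a

def uaddo_increment(a):
    """Increment-by-one form: overflow iff the wrapped sum is zero."""
    total = (a + 1) & MASK
    return total, total == 0

# Compare both forms for every 8-bit input.
mismatches = [a for a in range(MASK + 1)
              if uaddo_general(a, 1) != uaddo_increment(a)]
print(mismatches)  # []
```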
*	[X86][AVX] Fold broadcast(bitcast(src)) -> bitcast(broadcast(src))	Simon Pilgrim	2019-01-31	3	-12/+8
\| \| \| \|	llvm-svn: 352751
*	[X86][AVX] Add PR34394 subvector broadcast test cases	Simon Pilgrim	2019-01-31	1	-10/+131
\| \| \| \| \| \|	Tidy up check-prefixes at the same time. llvm-svn: 352749
*	[X86] combineExtractWithShuffle - more aggressively peek through bitcasts	Simon Pilgrim	2019-01-31	1	-29/+14
\| \| \| \| \| \|	Fixes regression introduced by rL352743 llvm-svn: 352745
*	[X86][AVX] Enable AVX1 broadcasts in shuffle combining	Simon Pilgrim	2019-01-31	7	-17/+10
\| \| \| \| \| \| \| \|	Enables 32/64-bit scalar load broadcasts on AVX1 targets. The extractelement-load.ll regression will be fixed shortly in a followup commit. llvm-svn: 352743
*	[X86][AVX] Fold vt1 concat_vectors(vt2 undef, vt2 broadcast(x)) --> vt1 ↵	Simon Pilgrim	2019-01-31	1	-8/+4
\| \| \| \| \| \| \| \| \| \|	broadcast(x) If we're not inserting the broadcast into the lowest subvector then we can avoid the insertion by just performing a larger broadcast. Avoids a regression when we enable AVX1 broadcasts in shuffle combining llvm-svn: 352742
*	[ARM] Thumb2: ConstantMaterializationCost	Sjoerd Meijer	2019-01-31	1	-67/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Constants can also be materialised using the negated value and a MVN, and this case seems to have been missed for Thumb2. To check the constant materialisation costs, we now call getT2SOImmVal twice, once for the original constant and then also for its negated value, and this function checks if the constant can be either splatted or rotated. This was revealed by a test that optimises for minsize: instead of a LDR literal pool load and having a literal pool entry, just a MVN with an immediate is smaller (and also faster). Differential Revision: https://reviews.llvm.org/D57327 llvm-svn: 352737
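A rough sketch of the "check the constant and its complement" idea. It is a simplified Python model of the Thumb2 modified-immediate encoding (a splatted byte, or an 8-bit value with its top bit set rotated right by 8..31), not the LLVM implementation. The names `is_t2_modified_imm` and `single_insn_materializable` are hypothetical, and "negated" is read here as the bitwise complement, which is what MVN computes.

```python
MASK32 = 0xFFFFFFFF

def is_t2_modified_imm(value):
    """Simplified model: a splatted byte or a rotated 8-bit value."""
    v = value & MASK32
    low = v & 0xFF
    if v in (low, low | (low << 16), low * 0x01010101):
        return True  # 000000XY, 00XY00XY, XYXYXYXY
    high = (v >> 8) & 0xFF
    if v == ((high << 8) | (high << 24)):
        return True  # XY00XY00
    for rot in range(8, 32):
        # Rotate left to undo a rotate right by `rot`.
        undone = ((v << rot) | (v >> (32 - rot))) & MASK32
        if undone <= 0xFF and undone & 0x80:
            return True
    return False

def single_insn_materializable(value):
    """True if MOV of the constant or MVN of its complement is encodable."""
    return is_t2_modified_imm(value) or is_t2_modified_imm(~value)

print(is_t2_modified_imm(0xFFFFFF00))          # False
print(single_insn_materializable(0xFFFFFF00))  # True
```

In this model `0xFFFFFF00` is not directly encodable, but its complement `0xFF` is, so checking only the original constant would miss the single-instruction case.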
*	[SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS	Sjoerd Meijer	2019-01-31	3	-0/+288
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	And instead just generate a libcall. My motivating example on ARM was a simple: shl i64 %A, %B for which the code bloat is quite significant. For other targets that also accept __int128/i128 such as AArch64 and X86, it is also beneficial for these cases to generate a libcall when optimising for minsize. On these 64-bit targets, the 64-bit shifts are of course unaffected because the SHIFT/SHIFT_PARTS lowering operation action is not set to custom/expand. Differential Revision: https://reviews.llvm.org/D57386 llvm-svn: 352736
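To give a feel for the expansion being avoided, here is a hedged sketch of a 64-bit left shift built only from 32-bit pieces. It is a Python model, not the actual SHIFT_PARTS lowering; `shl_parts` and `shl64` are hypothetical names.

```python
M32 = 0xFFFFFFFF

def shl_parts(lo, hi, amt):
    """Shift the 64-bit value (hi:lo) left by amt (0..63) using 32-bit pieces."""
    if amt == 0:
        return lo, hi
    if amt < 32:
        new_hi = ((hi << amt) | (lo >> (32 - amt))) & M32
        new_lo = (lo << amt) & M32
    else:
        new_hi = (lo << (amt - 32)) & M32
        new_lo = 0
    return new_lo, new_hi

def shl64(value, amt):
    """Reference 64-bit shift for comparison."""
    return (value << amt) & 0xFFFFFFFFFFFFFFFF

value = 0x0123456789ABCDEF
lo, hi = value & M32, value >> 32
ok = all(
    shl_parts(lo, hi, amt) == (shl64(value, amt) & M32, shl64(value, amt) >> 32)
    for amt in range(64)
)
print(ok)  # True
```

Even in this form a variable shift amount needs several shifts, an OR and a range test, which suggests why a single libcall can be smaller when optimising for size.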
*	GlobalISel: Handle odd splits in fewerElementsVector for load/store	Matt Arsenault	2019-01-31	2	-123/+248
\| \| \| \|	llvm-svn: 352720
*	GlobalISel: Implement narrowScalar for bswap	Matt Arsenault	2019-01-31	1	-0/+125
\| \| \| \|	llvm-svn: 352719
*	GlobalISel: Allow bitcount ops to have different result type	Matt Arsenault	2019-01-31	6	-5/+445
\| \| \| \| \| \|	For AMDGPU the result is always 32-bit for 64-bit inputs. llvm-svn: 352717
*	GlobalISel: Fix creating MMOs with align 0	Matt Arsenault	2019-01-31	12	-127/+127
\| \| \| \|	llvm-svn: 352712
*	[X86] Add a 32-bit command line to avx512-intrinsics.ll. Move all 64-bit ↵	Craig Topper	2019-01-31	2	-2265/+4569
\| \| \| \| \| \| \| \|	mode only intrinsics to avx512-intrinsics-x86_64.ll. Most of the other intrinsic tests have 32-bit command lines. llvm-svn: 352708
*	[LegalizeVectorTypes] Allow illegal indices when splitting extract_vector_elt	Thomas Lively	2019-01-31	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fixes PR40267, in which the removed assertion was triggering on perfectly valid IR. As far as I can tell, constant out of bounds indices should be allowed when splitting extract_vector_elt, since they will simply be propagated as out of bounds indices in the resulting split vector and handled appropriately elsewhere. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya Differential Revision: https://reviews.llvm.org/D57471 llvm-svn: 352702
*	[X86] Add test case for pr40539. NFC	Craig Topper	2019-01-31	1	-0/+36
\| \| \| \|	llvm-svn: 352697
*	[GlobalISel][AArch64] Select G_FEXP	Jessica Paquette	2019-01-30	4	-1/+260
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This teaches the legalizer to handle G_FEXP in AArch64. As a result, it also allows us to select G_FEXP. It... - Updates the legalizer-info tests - Adds a test for legalizing exp - Updates the existing fp tests to show that we can now select G_FEXP https://reviews.llvm.org/D57483 llvm-svn: 352692
*	[PowerPC] delete no longer needed workaround for readsRegister() in PowerPC	Chen Zheng	2019-01-30	1	-0/+18
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D57439 llvm-svn: 352689
*	MIR: Reject non-power-of-2 alignments in MMO parsing	Matt Arsenault	2019-01-30	21	-151/+163
\| \| \| \|	llvm-svn: 352686
*	[GlobalISel][AArch64] Select G_FABS	Jessica Paquette	2019-01-30	4	-1/+172
\| \| \| \| \| \| \| \| \|	This adds instruction selection support for G_FABS in AArch64. It also updates the existing basic FP tests and adds a selection test for G_FABS. https://reviews.llvm.org/D57418 llvm-svn: 352684
*	[WebAssembly] Restore stack pointer right after catch instruction	Heejin Ahn	2019-01-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: After the stack is unwound due to a thrown exception, the `__stack_pointer` global can point to an invalid address. So a `global.set` to restore `__stack_pointer` should be inserted right after the `catch` instruction. But after r352598 the `global.set` instruction is inserted not right after `catch` but after the `block` - `br-on-exn` - `end_block` - `extract_exception` sequence. This CL fixes it. While doing that, we can actually move ReplacePhysRegs pass after LateEHPrepare and merge EHRestoreStackPointer pass into LateEHPrepare, and now placing `global.set` to `__stack_pointer` right after `catch` is much easier. Otherwise it is hard to guarantee that `global.set` is still right after `catch` and not touched by other transformations, in which case we have to do something to hoist it. Reviewers: dschuff Subscribers: mgorny, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D57421 llvm-svn: 352681
*	[DAGCombiner] sub X, 0/1 --> add X, 0/-1	Sanjay Patel	2019-01-30	3	-13/+8
\| \| \| \| \| \| \| \| \| \|	This extends the existing transform for: add X, 0/1 --> sub X, 0/-1 ...to allow the sibling subtraction fold. This pattern could regress with the proposed change in D57401. llvm-svn: 352680
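The arithmetic behind the fold can be checked exhaustively at a small bit width. This is a Python sketch under the assumption that "0/1" is a zero-extended boolean and "0/-1" its sign-extended (all-ones) counterpart; the function names and the 8-bit width are hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def sub_bool(x, b):
    """sub X, 0/1: subtract a zero-extended boolean."""
    return (x - b) & MASK

def add_negated_bool(x, b):
    """add X, 0/-1: add the sign-extended (all-ones) form of the same boolean."""
    all_ones_or_zero = MASK if b else 0
    return (x + all_ones_or_zero) & MASK

all_equal = all(
    sub_bool(x, b) == add_negated_bool(x, b)
    for x in range(MASK + 1)
    for b in (0, 1)
)
print(all_equal)  # True
```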
*	[AArch64][x86] add tests for add/sub signbits fold; NFC	Sanjay Patel	2019-01-30	2	-0/+65
\| \| \| \| \| \| \|	As discussed/shown in D57401, we are missing a fold for subtract of 0/1 --> add 0/-1. llvm-svn: 352678
*	[GlobalISel][AArch64] Add instruction selection support for @llvm.log2	Jessica Paquette	2019-01-30	4	-1/+261
\| \| \| \| \| \| \| \| \| \| \| \| \|	This teaches GlobalISel to emit a RTLib call for @llvm.log2 when it encounters it. It updates the existing floating point tests to show that we don't fall back on the intrinsic, and select the correct instructions. It also adds a legalizer test for G_FLOG2. https://reviews.llvm.org/D57357 llvm-svn: 352673
*	[GlobalISel][AArch64] Add instruction selection support for @llvm.sqrt	Jessica Paquette	2019-01-30	4	-0/+250
\| \| \| \| \| \| \| \| \| \|	This teaches the legalizer about G_FSQRT in AArch64. Also adds a legalizer test for G_FSQRT, a selection test for it, and updates existing floating point tests. https://reviews.llvm.org/D57361 llvm-svn: 352671
*	[GlobalISel] Add IRTranslator support for @llvm.sqrt -> G_FSQRT	Jessica Paquette	2019-01-30	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \|	Follow-up commit to https://reviews.llvm.org/D57359. (r352668) This adds IRTranslator support for recognising a @llvm.sqrt intrinsic and translating it into a G_FSQRT. https://reviews.llvm.org/D57360 llvm-svn: 352670
*	[GlobalISel] Introduce a G_FSQRT generic instruction	Jessica Paquette	2019-01-30	1	-0/+3
\| \| \| \| \| \| \| \| \|	This introduces a generic instruction for computing the floating point square root of a value. Right now, we can't select @llvm.sqrt, so this is working towards fixing that. llvm-svn: 352668
*	Add a 'dynamic' parameter to the objectsize intrinsic	Erik Pilkington	2019-01-30	2	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is meant to be used with clang's __builtin_dynamic_object_size. When 'true' is passed to this parameter, the intrinsic has the potential to be folded into instructions that will be evaluated at run time. When 'false', the objectsize intrinsic behaviour is unchanged. rdar://32212419 Differential revision: https://reviews.llvm.org/D56761 llvm-svn: 352664
*	[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7.	Craig Topper	2019-01-30	1	-86/+42
\| \| \| \| \| \| \| \| \| \|	This fixes the test case in PR35982 by preventing MMX instructions that read MM0-7 from being moved below EMMS/FEMMS by the post RA scheduler. Though as discussed in bugzilla, this is not a complete fix. There is still the possibility of reordering in IR or by the pre-RA scheduler. Differential Revision: https://reviews.llvm.org/D57298 llvm-svn: 352660
*	[X86][AVX] Prefer to combine shuffle to broadcasts whenever possible	Simon Pilgrim	2019-01-30	1	-11/+23
\| \| \| \| \| \|	This is the first step towards improving broadcast support on AVX1 targets. llvm-svn: 352634
*	GlobalISel: Implement fewerElementsVector for select	Matt Arsenault	2019-01-30	1	-0/+209
\| \| \| \|	llvm-svn: 352601
*	AMDGPU/GlobalISel: Fix clamping shifts with 16-bit insts	Matt Arsenault	2019-01-30	3	-0/+126
\| \| \| \|	llvm-svn: 352599