bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[AMDGPU][MC] Corrected parsing of optional operands	Dmitry Preobrazhensky	2019-10-11	1	-12/+6
\| \| \| \| \| \| \| \| \| \|	See https://bugs.llvm.org/show_bug.cgi?id=43486 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D68350 llvm-svn: 374553
*	[mips] Fix loading "double" immediate into a GPR and FPR	Simon Atanasyan	2019-10-11	1	-6/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a "double" (64-bit) value has zero low 32-bits, it's possible to load such value into a GP/FP registers as an instruction immediate. But now assembler loads only high 32-bits of the value. For example, if a target register is GPR the `li.d $4, 1.0` instruction converts into the `lui $4, 16368` one. As a result, we get `0x3FF00000` in the register. While a correct representation of the `1.0` value is `0x3FF0000000000000`. The patch fixes that. Differential Revision: https://reviews.llvm.org/D68776 llvm-svn: 374544
*	[X86] isFNEG - add recursion depth limit	Simon Pilgrim	2019-10-11	1	-5/+9
\| \| \| \| \| \|	Now that its used by isNegatibleForFree we should try to avoid costly deep recursion llvm-svn: 374534
*	[PowerPC] Remove assertion "Shouldn't overwrite a register before it is killed"	Yi-Hong Lyu	2019-10-11	1	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The assertion is everzealous and fail tests like: renamable $x3 = LI8 0 STD renamable $x3, 16, $x1 renamable $x3 = LI8 0 Remove the assertion since killed flag of $x3 is not mandentory. Differential Revision: https://reviews.llvm.org/D68344 llvm-svn: 374515
*	[X86] Add a DAG combine to turn v16i16->v16i8 VTRUNCUS+store into a ↵	Craig Topper	2019-10-11	1	-0/+13
\| \| \| \| \| \|	saturating truncating store. llvm-svn: 374509
*	AMDGPU: Move SelectFlatOffset back into AMDGPUISelDAGToDAG	Matt Arsenault	2019-10-11	3	-62/+43
\| \| \| \|	llvm-svn: 374495
*	[X86] Improve the AVX512 bailout in combineTruncateWithSat to allow pack ↵	Craig Topper	2019-10-11	1	-2/+9
\| \| \| \| \| \| \| \| \| \|	instructions in more situations. If we don't have VLX we won't end up selecting a saturating truncate for 256-bit or smaller vectors so we should just use the pack lowering. llvm-svn: 374487
*	[X86] Guard against leaving a dangling node in combineTruncateWithSat.	Craig Topper	2019-10-10	1	-4/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When handling the packus pattern for i32->i8 we do a two step process using a packss to i16 followed by a packus to i8. If the final i8 step is a type with less than 64-bits the packus step will return SDValue(), but the i32->i16 step might have succeeded. This leaves the nodes from the middle step dangling. Guard against this by pre-checking that the number of elements is at least 8 before doing the middle step. With that check in place this should mean the only other case the middle step itself can fail is when SSE2 is disabled. So add an early SSE2 check then just assert that neither the middle or final step ever fail. llvm-svn: 374460
*	[AMDGPU] Handle undef old operand in DPP combine	Stanislav Mekhanoshin	2019-10-10	1	-1/+3
\| \| \| \| \| \| \| \|	It was missing an undef flag. Differential Revision: https://reviews.llvm.org/D68813 llvm-svn: 374455
*	[X86] Use packusdw+vpmovuswb to implement v16i32->V16i8 that clamps signed ↵	Craig Topper	2019-10-10	1	-0/+15
\| \| \| \| \| \| \| \| \| \|	inputs to be between 0 and 255 when zmm registers are disabled on SKX. If we've disable zmm registers, the v16i32 will need to be split. This split will propagate through min/max the truncate. This creates two sequences that need to be concatenated back to v16i8. We can instead use packusdw to do part of the clamping, truncating, and concatenating all at once. Then we can use a vpmovuswb to finish off the clamp. Differential Revision: https://reviews.llvm.org/D68763 llvm-svn: 374431
*	[NFC][PowerPC]Clean up PPCAsmPrinter for TOC related pseudo opcode	Xiangling Liao	2019-10-10	1	-93/+70
\| \| \| \| \| \| \| \| \|	Add a helper function getMCSymbolForTOCPseudoMO to clean up PPCAsmPrinter a little bit. Differential Revision: https://reviews.llvm.org/D68721 llvm-svn: 374420
*	[ARM] VQSUB instruction	David Green	2019-10-10	2	-0/+10
\| \| \| \| \| \| \| \|	Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68567 llvm-svn: 374377
*	Fix assertions disabled builds after rL374367	Kadir Cetinkaya	2019-10-10	1	-2/+1
\| \| \| \|	llvm-svn: 374372
*	[BPF] Remove relocation for patchable externs	Yonghong Song	2019-10-10	6	-103/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, patchable extern relocations are introduced to patch external variables used for multi versioning in compile once, run everywhere use case. The load instruction will be converted into a move with an patchable immediate which can be changed by bpf loader on the host. The kernel verifier has evolved and is able to load and propagate constant values, so compiler relocation becomes unnecessary. This patch removed codes related to this. Differential Revision: https://reviews.llvm.org/D68760 llvm-svn: 374367
*	[X86] combineFMA - Convert to use isNegatibleForFree/GetNegatedExpression.	Simon Pilgrim	2019-10-10	1	-7/+84
\| \| \| \| \| \|	Split off from D67557. llvm-svn: 374356
*	[X86] combineFMADDSUB - Convert to use isNegatibleForFree/GetNegatedExpression.	Simon Pilgrim	2019-10-10	1	-10/+18
\| \| \| \| \| \|	Split off from D67557, fixes the compile time regression mentioned in rL372756 llvm-svn: 374351
*	Revert "[AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs."	Jay Foad	2019-10-10	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This has been superseded by "[AMDGPU]: PHI Elimination hooks added for custom COPY insertion." This reverts the code changes from commit 53f967f2bdb6aa7b08596880c3689d1ecad6f0ff but keeps the test case. Reviewers: hliao, arsenm, tpr, dstuttard Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68769 llvm-svn: 374347
*	[DAG][X86] Add isNegatibleForFree/GetNegatedExpression override ↵	Simon Pilgrim	2019-10-10	2	-0/+27
\| \| \| \| \| \| \| \| \| \|	placeholders. NFCI. Continuing to undo the rL372756 reversion. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 374345
*	[ARM] VQADD instructions	David Green	2019-10-10	2	-20/+36
\| \| \| \| \| \| \| \| \|	This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68566 llvm-svn: 374336
*	Fix -Wparentheses warning. NFCI.	Simon Pilgrim	2019-10-10	1	-2/+2
\| \| \| \|	llvm-svn: 374326
*	[Mips] Fix 374055	Mirko Brkusanin	2019-10-10	1	-3/+3
\| \| \| \| \| \| \| \|	EXPENSIVE_CHECKS build was failing on new test. This is fixed by marking $ra register as undef. Test now has -verify-machineinstrs to check for operand flags. llvm-svn: 374320
*	[IfCvt][ARM] Optimise diamond if-conversion for code size	Oliver Stannard	2019-10-10	2	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the heuristics the if-conversion pass uses for diamond if-conversion are based on execution time, with no consideration for code size. This adds a new set of heuristics to be used when optimising for code size. This is mostly target-independent, because the if-conversion pass can see the code size of the instructions which it is removing. For thumb, there are a few passes (insertion of IT instructions, selection of narrow branches, and selection of CBZ instructions) which are run after if conversion and affect these heuristics, so I've added target hooks to better predict the code-size effect of a proposed if-conversion. Differential revision: https://reviews.llvm.org/D67350 llvm-svn: 374301
*	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs	Matt Arsenault	2019-10-10	8	-21/+25
\| \| \| \| \| \| \| \| \|	SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284
*	Conservatively add volatility and atomic checks in a few places	Philip Reames	2019-10-09	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \|	As background, starting in D66309, I'm working on support unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86. As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces in code loop suspicious though I can't figure out how to exercise them. This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions. Differential Revision: https://reviews.llvm.org/D68419 llvm-svn: 374261
*	AMDGPU: Don't fold copies to physregs	Matt Arsenault	2019-10-09	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \|	In a future patch, this will help cleanup m0 handling. The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there end up being no real copy. llvm-svn: 374257
*	AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointer	Matt Arsenault	2019-10-09	1	-4/+14
\| \| \| \| \| \| \| \| \| \|	This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255
*	AMDGPU: Relax register classes used	Matt Arsenault	2019-10-09	1	-2/+2
\| \| \| \|	llvm-svn: 374254
*	AMDGPU: Fix typos	Matt Arsenault	2019-10-09	1	-2/+2
\| \| \| \|	llvm-svn: 374253
*	GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR	Matt Arsenault	2019-10-09	1	-1/+10
\| \| \| \| \| \|	Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR. llvm-svn: 374252
*	[AMDGPU] Fixed dpp combine of VOP1	Stanislav Mekhanoshin	2019-10-09	1	-0/+8
\| \| \| \| \| \| \| \| \|	If original instruction did not have source modifiers they were not added to the new DPP instruction as well, even if needed. Differential Revision: https://reviews.llvm.org/D68729 llvm-svn: 374241
*	[WebAssembly] Make returns variadic	Thomas Lively	2019-10-09	9	-168/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is necessary and sufficient to get simple cases of multiple return working with multivalue enabled. More complex cases will require block and loop signatures to be generalized to potentially be type indices as well. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68684 llvm-svn: 374235
*	[System Model] [TTI] Define AMDGPUTTIImpl::getST and AMDGPUTTIImpl::getTLI	Vitaly Buka	2019-10-09	1	-2/+10
\| \| \| \| \| \|	To fix "infinite recursion" warning. llvm-svn: 374222
*	[AMDGPU] Use math constants defined in MathExtras (NFC)	Evandro Menezes	2019-10-09	3	-45/+19
\| \| \| \| \| \| \| \|	Use the the new math constants in `MathExtras.h`. Differential revision: https://reviews.llvm.org/D68285 llvm-svn: 374208
*	[System Model] [TTI] Update cache and prefetch TTI interfaces	David Greene	2019-10-09	7	-37/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This involved moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo. Rework the TTI cache and software prefetching APIs to prepare for the introduction of a general system model. Changes include: - Marking existing interfaces const and/or override as appropriate - Adding comments - Adding BasicTTIImpl interfaces that delegate to a subtarget implementation - Moving the default TargetTransformInfoImplBase implementation to a default MCSubtarget implementation Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget implementation, so its custom TTI implementation is migrated to use the new facilities in BasicTTIImpl to invoke its custom subtarget implementation. The custom TTI implementations continue to exist for the other targets with this change. They are not moved over to subtarget-based implementations. The end goal is to have the default subtarget implementation defer to the system model defined by the target. With this change, the default MCSubtargetInfo implementation essentially returns the defaults TargetTransformInfoImplBase used to return. Existing users of TTI defaults will hit the defaults now in MCSubtargetInfo. Targets that define their own custom TTI implementations won't use the BasicTTIImpl implementations that route to the subtarget. Once system models are in place for the targets that use these interfaces, their custom TTI implementations can be removed. Differential Revision: https://reviews.llvm.org/D63614 llvm-svn: 374205
*	[WebAssembly] Add builtin and intrinsic for v8x16.swizzle	Thomas Lively	2019-10-09	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This clang builtin and corresponding LLVM intrinsic are necessary to expose the exact semantics of the underlying WebAssembly instruction to users. LLVM produces a poison value if the dynamic swizzle indices are greater than the vector size, but the WebAssembly instruction sets the corresponding output lane to zero. Users who depend on this behavior can safely use this builtin. Depends on D68527. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68531 llvm-svn: 374189
*	[WebAssembly] v8x16.swizzle and rewrite BUILD_VECTOR lowering	Thomas Lively	2019-10-09	3	-74/+132
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Adds the new v8x16.swizzle SIMD instruction as specified at https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#swizzling-using-variable-indices. In addition to adding swizzles as a candidate lowering in LowerBUILD_VECTOR, also rewrites and simplifies the lowering to minimize the number of replace_lanes necessary rather than trying to minimize code size. This leads to more uses of v128.const instead of splats, which is expected to increase performance. The new code will be easier to tune once V8 implements all the vector construction operations, and it will also be easier to add new candidate instructions in the future if necessary. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68527 llvm-svn: 374188
*	[AArch64] Ensure no tagged memory is left in the unallocated portion of the	Momchil Velikov	2019-10-09	1	-15/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	stack This patch makes sure that if we tag some memory, we untag that memory before the function returns/throws via any exit, reachable from the tag operation. For that we place the untag operation either at: a) the lifetime end call for the alloca, if that call post-dominates the lifetime start call (where the tag operation is placed), or it (the lifetime end call) dominates all reachable exits, otherwise b) at the reachable exits Differential Revision: https://reviews.llvm.org/D68469 llvm-svn: 374182
*	[mips] Rename local variable. NFC	Simon Atanasyan	2019-10-09	1	-19/+19
\| \| \| \|	llvm-svn: 374165
*	[mips] Split expandLoadImmReal into multiple methods. NFC	Simon Atanasyan	2019-10-09	1	-154/+205
\| \| \| \| \| \| \| \| \| \| \|	The `expandLoadImmReal` handles four different and almost non-overlapping cases: loading a "single" float immediate into a GPR, loading a "single" float immediate into a FPR, and the same couple for a "double" float immediate. It's better to move each `else if` branch into separate methods. llvm-svn: 374164
*	[BPF] do compile-once run-everywhere relocation for bitfields	Yonghong Song	2019-10-08	7	-96/+371
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A bpf specific clang intrinsic is introduced: u32 __builtin_preserve_field_info(member_access, info_kind) Depending on info_kind, different information will be returned to the program. A relocation is also recorded for this builtin so that bpf loader can patch the instruction on the target host. This clang intrinsic is used to get certain information to facilitate struct/union member relocations. The offset relocation is extended by 4 bytes to include relocation kind. Currently supported relocation kinds are enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; for __builtin_preserve_field_info. The old access offset relocation is covered by FIELD_BYTE_OFFSET = 0. An example: struct s { int a; int b1:9; int b2:4; }; enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; void bpf_probe_read(void , unsigned, const void ); int field_read(struct s arg) { unsigned long long ull = 0; unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET); unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE); #ifdef USE_PROBE_READ bpf_probe_read(&ull, size, (const void )arg + offset); unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ lshift = lshift + (size << 3) - 64; #endif #else switch(size) { case 1: ull = (unsigned char )((void )arg + offset); break; case 2: ull = (unsigned short )((void )arg + offset); break; case 4: ull = (unsigned int )((void )arg + offset); break; case 8: ull = (unsigned long long )((void )arg + offset); break; } unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #endif ull <<= lshift; if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS)) return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); } There is a minor overhead for bpf_probe_read() on big endian. The code and relocation generated for field_read where bpf_probe_read() is used to access argument data on little endian mode: r3 = r1 r1 = 0 r1 = 4 <=== relocation (FIELD_BYTE_OFFSET) r3 += r1 r1 = r10 r1 += -8 r2 = 4 <=== relocation (FIELD_BYTE_SIZE) call bpf_probe_read r2 = 51 <=== relocation (FIELD_LSHIFT_U64) r1 = (u64 )(r10 - 8) r1 <<= r2 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) r0 = r1 r0 >>= r2 r3 = 1 <=== relocation (FIELD_SIGNEDNESS) if r3 == 0 goto LBB0_2 r1 s>>= r2 r0 = r1 LBB0_2: exit Compare to the above code between relocations FIELD_LSHIFT_U64 and FIELD_LSHIFT_U64, the code with big endian mode has four more instructions. r1 = 41 <=== relocation (FIELD_LSHIFT_U64) r6 += r1 r6 += -64 r6 <<= 32 r6 >>= 32 r1 = (u64 )(r10 - 8) r1 <<= r6 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) The code and relocation generated when using direct load. r2 = 0 r3 = 4 r4 = 4 if r4 s> 3 goto LBB0_3 if r4 == 1 goto LBB0_5 if r4 == 2 goto LBB0_6 goto LBB0_9 LBB0_6: # %sw.bb1 r1 += r3 r2 = (u16 )(r1 + 0) goto LBB0_9 LBB0_3: # %entry if r4 == 4 goto LBB0_7 if r4 == 8 goto LBB0_8 goto LBB0_9 LBB0_8: # %sw.bb9 r1 += r3 r2 = (u64 )(r1 + 0) goto LBB0_9 LBB0_5: # %sw.bb r1 += r3 r2 = (u8 )(r1 + 0) goto LBB0_9 LBB0_7: # %sw.bb5 r1 += r3 r2 = (u32 )(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 r2 <<= r1 r1 = 60 r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit Considering verifier is able to do limited constant propogation following branches. The following is the code actually traversed. r2 = 0 r3 = 4 <=== relocation r4 = 4 <=== relocation if r4 s> 3 goto LBB0_3 LBB0_3: # %entry if r4 == 4 goto LBB0_7 LBB0_7: # %sw.bb5 r1 += r3 r2 = (u32 )(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 <=== relocation r2 <<= r1 r1 = 60 <=== relocation r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit For native load case, the load size is calculated to be the same as the size of load width LLVM otherwise used to load the value which is then used to extract the bitfield value. Differential Revision: https://reviews.llvm.org/D67980 llvm-svn: 374099
*	AMDGPU: Fix i16 arithmetic pattern redundancy	Matt Arsenault	2019-10-08	1	-78/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There were 2 problems here. First, these patterns were duplicated to handle the inverted shift operands instead of using the commuted PatFrags. Second, the point of the zext folding patterns don't apply to the non-0ing high subtargets. They should be skipped instead of inserting the extension. The zeroing high code would be emitted when necessary anyway. This was also emitting unnecessary zexts in cases where the high bits were undefined. llvm-svn: 374092
*	Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure ↵	Jinsong Ji	2019-10-08	11	-54/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a. This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091
*	AMDGPU: Add offsets to MMO when lowering buffer intrinsics	Tom Stellard	2019-10-08	2	-11/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without offsets on the MachineMemOperands (MMOs), MachineInstr::mayAlias() will return true for all reads and writes to the same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler when analyzing dependencies of buffer loads and stores. It also limits the SILoadStoreOptimizer from merging more instructions. This patch reduces the compile time of one pathological compute shader from 12 seconds to 1 second. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65097 llvm-svn: 374087
*	[AMDGPU] Disable unused gfx10 dpp instructions	Stanislav Mekhanoshin	2019-10-08	2	-0/+8
\| \| \| \| \| \| \| \| \| \| \|	Inhibit generation of unused real dpp instructions on gfx10 just like it is done on other subtargets. This does not change anything because these are illegal anyway and not accepted, but it does reduce the number of instruction definitions generated. Differential Revision: https://reviews.llvm.org/D68607 llvm-svn: 374083
*	[WebAssembly] Fix a bug in 'try' placement	Heejin Ahn	2019-10-08	1	-13/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When searching for local expression tree created by stackified registers, for 'block' placement, we start the search from the previous instruction of a BB's terminator. But in 'try''s case, we should start from the previous instruction of a call that can throw, or a EH_LABEL that precedes the call, because the return values of the call's previous instructions can be stackified and consumed by the throwing call. For example, ``` i32.call @foo call @bar ; may throw br $label0 ``` In this case, if we start the search from the previous instruction of the terminator (`br` here), we end up stopping at `call @bar` and place a 'try' between `i32.call @foo` and `call @bar`, because `call @bar` does not have a return value so it is not a local expression tree of `br`. But in this case, unlike when placing 'block's, we should start the search from `call @bar`, because the return value of `i32.call @foo` is stackified and used by `call @bar`. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68619 llvm-svn: 374073
*	[DebugInfo][If-Converter] Update call site info during the optimization	Nikola Prica	2019-10-08	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	During the If-Converter optimization pay attention when copying or deleting call instructions in order to keep call site information in valid state. Reviewers: aprantl, vsk, efriedma Reviewed By: vsk, efriedma Differential Revision: https://reviews.llvm.org/D66955 llvm-svn: 374068
*	[Mips] Emit proper ABI for _mcount calls	Mirko Brkusanin	2019-10-08	2	-0/+49
\| \| \| \| \| \| \| \| \| \| \|	When -pg option is present than a call to _mcount is inserted into every function. However since the proper ABI was not followed then the generated gmon.out did not give proper results. By inserting needed instructions before every _mcount we can fix this. Differential Revision: https://reviews.llvm.org/D68390 llvm-svn: 374055
*	fix fmls fp16	Sebastian Pop	2019-10-08	1	-12/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tim Northover remarked that the added patterns for fmls fp16 produce wrong code in case the fsub instruction has a multiplication as its first operand, i.e., all the patterns FMLSv_OP1: > define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) { > ; CHECK-LABEL: test_FMLSv8f16_OP1: > ; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h > entry: > > %mul = fmul fast <8 x half> %c, %b > %sub = fsub fast <8 x half> %mul, %a > ret <8 x half> %sub > } > > This doesn't look right to me. The exact instruction produced is "fmls > v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2v1", but the > IR is calculating "v2v1-v0". The equivalent <4 x float> code also > doesn't emit an fmls. This patch generates an fmla and negates the value of the operand2 of the fsub. Inspecting the pattern match, I found that there was another mistake in the opcode to be selected: matching FMULv416 should generate FMLSv416 and not FMLSv232. Tested on aarch64-linux with make check-all. Differential Revision: https://reviews.llvm.org/D67990 llvm-svn: 374044
*	[SVE][IR] Scalable Vector size queries and IR instruction support	Graham Hunter	2019-10-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Adds a TypeSize struct to represent the known minimum size of a type along with a flag to indicate that the runtime size is a integer multiple of that size * Converts existing size query functions from Type.h and DataLayout.h to return a TypeSize result * Adds convenience methods (including a transparent conversion operator to uint64_t) so that most existing code 'just works' as if the return values were still scalars. * Uses the new size queries along with ElementCount to ensure that all supported instructions used with scalable vectors can be constructed in IR. Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen Reviewed By: rovka, sdesmalen Differential Revision: https://reviews.llvm.org/D53137 llvm-svn: 374042
*	AMDGPU: Propagate undef flag during pre-RA exec mask optimizations	Nicolai Haehnle	2019-10-08	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68184 llvm-svn: 374041