bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[mips] Spectre variant two mitigation for MIPSR2	Simon Dardis	2018-02-21	14	-38/+206
	This patch provides mitigation for CVE-2017-5715, Spectre variant two, which affects the P5600 and P6600. It implements the LLVM part of -mindirect-jump=hazard. It is _not_ enabled by default for the P5600. The mitigation strategy suggested by MIPS for these processors is to use hazard barrier instructions. 'jalr.hb' and 'jr.hb' are hazard barrier variants of the 'jalr' and 'jr' instructions respectively. These instructions impede the execution of the instruction stream until architecturally defined hazards (changes to the instruction stream, privileged registers which may affect execution) are cleared. These instructions in MIPS' designs are not speculated past. These instructions are used with the attribute +use-indirect-jump-hazard when branching indirectly and for indirect function calls. These instructions are defined by the MIPS32R2 ISA, so this mitigation method is not compatible with processors which implement an earlier revision of the MIPS ISA. Performance benchmarking of this option with -fpic and lld using -z hazardplt shows an overall time increase of roughly 10% for the LLVM testsuite. Certain benchmarks such as methcall show a substantially larger increase in time due to their nature. Reviewers: atanasyan, zoran.jovanovic Differential Revision: https://reviews.llvm.org/D43486 llvm-svn: 325653
*	[InstCombine] C / -X --> -C / X	Sanjay Patel	2018-02-21	1	-8/+17
	We already do this in DAGCombiner, but it should also be good to eliminate the fsub use in IR. This is similar to rL325648. llvm-svn: 325649
*	[InstCombine] -X / C --> X / -C for FP	Sanjay Patel	2018-02-20	1	-5/+12
	We already do this in DAGCombiner, but it should also be good to eliminate the fsub use in IR. llvm-svn: 325648
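A quick numeric check of the two folds above (`C / -X --> -C / X` and `-X / C --> X / -C`), written as a Python sketch rather than anything taken from InstCombine itself. It relies on IEEE-754 division being sign-symmetric, so moving the negation from the variable onto the constant should not change the result. The helper names and sample values are made up for illustration.

```python
import itertools

def fold_const_dividend(c, x):
    """C / -X rewritten as -C / X (negation moved onto the constant)."""
    return (-c) / x

def fold_const_divisor(x, c):
    """-X / C rewritten as X / -C (negation moved onto the constant)."""
    return x / (-c)

# Hypothetical sample values; zero is left out because Python raises on
# float division by zero instead of returning an infinity.
SAMPLES = [1.0, -2.5, 3.0, 0.1, 7.0, 1e10]

def folds_agree():
    """True when both rewrites compare equal to the original expressions."""
    for c, x in itertools.product(SAMPLES, repeat=2):
        if c / (-x) != fold_const_dividend(c, x):
            return False
        if (-x) / c != fold_const_divisor(x, c):
            return False
    return True
```

In both rewrites the explicit negation of the variable operand disappears, which is the "eliminate the fsub use" effect the messages describe.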
*	Revert "[AMDGPU] Increased vector length for global/constant loads."	Konstantin Zhuravlyov	2018-02-20	2	-34/+2
	https://reviews.llvm.org/rL325518 breaks the following OpenCL conformance tests: Basic - parameter_types, Basic - vload_private. llvm-svn: 325643
*	[DSE] Don't DSE stores that subsequent memmove calls read from	Sanjoy Das	2018-02-20	1	-16/+27
	Summary: We used to remove the first memmove in cases like this: memmove(p, p+2, 8); memmove(p, p+2, 8); which is incorrect. Fix this by changing isPossibleSelfRead to what was most likely the intended behavior. Historical note: the buggy code was added in https://reviews.llvm.org/rL120974 to address PR8728. Reviewers: rsmith Subscribers: mcrosier, llvm-commits, jlebar Differential Revision: https://reviews.llvm.org/D43425 llvm-svn: 325641
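The memmove pair above can be replayed on a small buffer to see why dropping the first call is wrong. This is a Python model of the two calls, not the DSE code: the `memmove` helper, the 12-byte buffer, and `p = 0` are assumptions chosen for illustration.

```python
def memmove(buf, dst, src, n):
    """Overlap-safe copy: snapshot the source bytes, then write them."""
    buf[dst:dst + n] = bytes(buf[src:src + n])

def run(calls):
    """Apply memmove(p, p+2, 8) the given number of times with p = 0."""
    buf = bytearray(range(12))
    p = 0
    for _ in range(calls):
        memmove(buf, p, p + 2, 8)
    return bytes(buf)

both_calls = run(2)    # original program
second_only = run(1)   # program after the first memmove is (wrongly) removed
```

The second call reads bytes `p+2 .. p+9`, part of which the first call has just written, so `both_calls` and `second_only` differ.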
*	[PBQP] Fix PR33038 by pruning empty intervals in initializeGraph.	Lang Hames	2018-02-20	1	-11/+27
	Spilling may cause previously non-empty intervals (both for the spilled vreg and others) to become empty. Moving the pruning into initializeGraph catches these cases and fixes PR33038. llvm-svn: 325632
*	[MemoryBuiltins] Check nobuiltin status when identifying calls to free.	Benjamin Kramer	2018-02-20	1	-10/+8
	This is usually not a problem because this code's main purpose is eliminating unused new/delete pairs. We got deletes of nullptr or nobuiltin deletes of builtin new wrong though. llvm-svn: 325630
*	[InstCombine] remove unneeded operand swap: NFCI	Sanjay Patel	2018-02-20	1	-3/+0
	FMul is commutative, so complexity-based canonicalization should always take care of the swap via SimplifyAssociativeOrCommutative(). llvm-svn: 325628
*	[SelectionDAG] Support known true/false SimplifySetCC cases for comparing against vector splats of constants.	Craig Topper	2018-02-20	1	-58/+87
	This is split off from D42948 and includes just the cases that constant fold to true or false. It also includes some refactoring to keep predicate checks together. This supports things like (setcc uge X, 0) -> true Differential Revision: https://reviews.llvm.org/D43489 llvm-svn: 325627
*	[AArch64] Refactor instructions using SIMD immediates	Evandro Menezes	2018-02-20	1	-368/+281
	Get rid of icky goto loops and make the code easier to maintain. Otherwise, NFC. Restore r324903 and fix PR36369. Differential revision: https://reviews.llvm.org/D43364 llvm-svn: 325621
*	[LTO] Remove unused Path parameter to AddBufferFn	Teresa Johnson	2018-02-20	1	-2/+2
	Summary: With D43396, no clients use the Path parameter anymore. Depends on D43396. Reviewers: pcc Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D43400 llvm-svn: 325619
*	[ARM] Lower BR_CC for f16	Sjoerd Meijer	2018-02-20	1	-2/+1
	This case wasn't handled yet. Differential Revision: https://reviews.llvm.org/D43508 llvm-svn: 325616
*	[Hexagon] Handle *Low8 register classes in early if-conversion	Krzysztof Parzyszek	2018-02-20	1	-0/+2
	llvm-svn: 325606
*	[X86] Correct SHRUNKBLEND creation to work correctly when there are multiple uses of the condition.	Craig Topper	2018-02-20	1	-31/+23
	SimplifyDemandedBits forces the demanded mask to all 1s if the node has multiple uses, unless the AssumeSingleUse flag is set. So previously we were only really likely to simplify something if the condition had a single use. And on the off chance we did simplify with multiple uses, the demanded mask being used was all ones, so there was no reason to create a shrunkblend. This patch now checks that the condition is only used by selects first, and then sets the AssumeSingleUse flag for the simplification. Then we convert the selects to shrunkblend, and finally replace the condition. Differential Revision: https://reviews.llvm.org/D43446 llvm-svn: 325604
*	[SelectionDAG] Add LegalTypes flag to getShiftAmountTy. Use it to unify and simplify DAGCombiner and simplifySetCC code and fix a bug.	Craig Topper	2018-02-20	3	-19/+15
	DAGCombiner and SimplifySetCC both use getPointerTy for shift amounts pre-legalization. DAGCombiner uses a single helper function to hide this. SimplifySetCC does it in multiple places. This patch adds a defaulted parameter to getShiftAmountTy that can make it return getPointerTy for scalar types. Use this parameter to simplify the SimplifySetCC and DAGCombiner code. Additionally, there were two places in SimplifySetCC that were creating shifts using the target's preferred shift amount pre-legalization. If the target uses a narrow type and the type is illegal, this can cause SimplifySetCC to create a shift with an amount that can't represent all possible shift values for the type. To fix this we should use pointer type there too. Alternatively we could make getScalarShiftAmountTy for each target return a safe value for large types as proposed in D43445. And maybe we should still do that, but fixing the SimplifySetCC code keeps other targets from tripping over this in the future. Fixes PR36250. Differential Revision: https://reviews.llvm.org/D43449 llvm-svn: 325602
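The shift-amount problem described above comes down to a range check: a shift-amount type of N bits can only encode amounts below 2^N, while a value of W bits can be shifted by up to W - 1. A small Python sketch of that check follows; the function names and the specific widths are hypothetical and are not taken from PR36250 or from the SelectionDAG code.

```python
def max_shift_amount(value_bits):
    """Largest meaningful shift amount for a value that is value_bits wide."""
    return value_bits - 1

def amount_type_can_hold(value_bits, amount_bits):
    """True if an unsigned amount_bits-wide integer can encode every
    valid shift amount for a value_bits-wide value."""
    return max_shift_amount(value_bits) < (1 << amount_bits)

# Hypothetical widths: an 8-bit preferred shift-amount type is fine for a
# legal 32-bit value but too narrow for an illegal 512-bit value, whereas
# a 64-bit (pointer-sized) amount type covers both.
narrow_ok_for_i32 = amount_type_can_hold(32, 8)
narrow_ok_for_i512 = amount_type_can_hold(512, 8)
pointer_ok_for_i512 = amount_type_can_hold(512, 64)
```

Under these assumed widths only the narrow type with the wide value fails the check, which matches the situation the message attributes to shifts created before type legalization.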
*	[X86] Promote 16-bit cmovs to 32-bits	Craig Topper	2018-02-20	1	-3/+54
	This allows us to avoid an opsize prefix. And forcing some move immediates to i32 avoids a length changing prefix on those instructions. This mostly replaces the existing combine we had for zext/sext+cmov of constants. I left in a case for sign extending a 32 bit cmov of constants to 64 bits. Differential Revision: https://reviews.llvm.org/D43327 llvm-svn: 325601
*	[InstCombine] remove unneeded dyn_cast to prevent unused variable warning	Sanjay Patel	2018-02-20	1	-2/+1
	llvm-svn: 325597
*	[InstCombine] remove compound fdiv pattern folds	Sanjay Patel	2018-02-20	1	-27/+1
	These are fdiv-with-constant-divisor, so they already become reciprocal multiplies. The last gap for vector ops should be closed with rL325590. It's possible that we're missing folds for some edge cases with denormal intermediate constants after deleting these, but there are no tests for those patterns, and it would be better to handle denormals more consistently (and less conservatively) as noted in TODO comments. llvm-svn: 325595
*	[InstCombine] fold fdiv with non-splat divisor to fmul: X/C --> X * (1/C)	Sanjay Patel	2018-02-20	2	-21/+29
	llvm-svn: 325590
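As a rough illustration of the `X/C --> X * (1/C)` rewrite named in the entry above, the Python sketch below compares the two forms on IEEE-754 doubles. It shows that the rewrite is not value-preserving for an arbitrary constant, but is when `1/C` is exactly representable (for example, a power of two). How InstCombine decides when the fold is allowed is not stated in the entry and is not modelled here; the function names and sample values are illustrative only.

```python
def div_by_const(x, c):
    """Original form: X / C."""
    return x / c

def mul_by_reciprocal(x, c):
    """Rewritten form: X * (1 / C), with the reciprocal rounded first."""
    return x * (1.0 / c)

# 1/3 is not exactly representable, so the extra rounding step can show up.
inexact_case_differs = div_by_const(5.0, 3.0) != mul_by_reciprocal(5.0, 3.0)

# 1/4 is exact, so both forms agree for every sample.
exact_case_agrees = all(
    div_by_const(x, 4.0) == mul_by_reciprocal(x, 4.0)
    for x in [5.0, 0.1, -7.25, 1e10, 3.0]
)
```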
*	[mips] Correct the definition of cvt.d.w	Simon Dardis	2018-02-20	1	-3/+2
	An upcoming patch, D41434, changes the ordering of the matcher table for assembly. This patch corrects the definition of the normal MIPS cvt.d.w not to be available in microMIPS. llvm-svn: 325589
*	[DEBUGINFO] Add support for emission of the inlined strings.	Alexey Bataev	2018-02-20	3	-0/+21
	Summary: Patch adds an option for emission of inlined strings rather than .debug_str section. Reviewers: echristo, jlebar Subscribers: eraman, llvm-commits, JDevlieghere Differential Revision: https://reviews.llvm.org/D43390 llvm-svn: 325583
*	[PowerPC] Reduce stack frame for fastcc functions by only allocating parameter save area when needed	Lei Huang	2018-02-20	1	-2/+11
	Current implementation always allocates the parameter save area conservatively for fastcc functions. There is no reason to allocate the parameter save area if all the parameters can be passed via registers. Differential Revision: https://reviews.llvm.org/D42602 llvm-svn: 325581
*	[Hexagon] Fix alignment calculation of stack objects in Hexagon bit tracker	Krzysztof Parzyszek	2018-02-20	3	-6/+6
	llvm-svn: 325580
*	[VectorLegalizer] Fix uint64_t typo in ExpandUINT_TO_FLOAT (PR36391)	Simon Pilgrim	2018-02-20	1	-1/+1
	ExpandUINT_TO_FLOAT can accept vXi32 or vXi64 inputs, so we need to use a uint64_t shift to generate the 2^(BW/2) constant. No test case unfortunately as no upstream target uses this, but it's affecting a downstream target. llvm-svn: 325578
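The width issue behind the `2^(BW/2)` constant can be seen with a little arithmetic: for BW = 64 the constant is 2^32, which does not fit in a 32-bit integer. The Python sketch below models fixed-width unsigned arithmetic with an explicit mask. This is a simplification (in C++ an over-wide shift is undefined rather than guaranteed to wrap), and the helper name is hypothetical.

```python
def half_width_power(bw, shift_type_bits):
    """Compute 2**(bw // 2) as if stored in an unsigned integer that is
    shift_type_bits wide, discarding any bits that do not fit."""
    mask = (1 << shift_type_bits) - 1
    return (1 << (bw // 2)) & mask

i32_input_in_32_bits = half_width_power(32, 32)   # 2**16 fits
i64_input_in_32_bits = half_width_power(64, 32)   # 2**32 is lost
i64_input_in_64_bits = half_width_power(64, 64)   # 2**32 fits
```

With the narrower type the constant for 64-bit elements collapses to zero in this model, which is why a 64-bit shift is needed once vXi64 inputs are accepted.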
*	[ARM] Mark -1 as cheap in xor's for thumb1	David Green	2018-02-20	1	-0/+7
	We can always convert xor %a, -1 into MVN, even in thumb 1 where the -1 would not otherwise be considered a cheap constant. This prevents the -1's from being pulled out into constants and potentially hoisted. Differential Revision: https://reviews.llvm.org/D43451 llvm-svn: 325573
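The equivalence the entry above relies on is that xor with -1 is a bitwise NOT, which is what MVN computes. A minimal Python sketch on 32-bit values follows; the helper names and samples are illustrative and nothing here models ARM instruction selection.

```python
MASK32 = 0xFFFFFFFF  # -1 viewed as a 32-bit all-ones pattern

def xor_minus_one(a):
    """xor %a, -1 on a 32-bit register value."""
    return (a ^ MASK32) & MASK32

def mvn(a):
    """Bitwise NOT of a 32-bit register value, as MVN would produce."""
    return ~a & MASK32

SAMPLES = [0, 1, 0x80000000, 0xDEADBEEF, MASK32]
xor_matches_mvn = all(xor_minus_one(a) == mvn(a) for a in SAMPLES)
```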
*	[llvm-mc] - Produce R_X86_64_PLT32 for "call/jmp foo".	George Rimar	2018-02-20	6	-2/+39
	For instructions like call foo and jmp foo, the patch changes the relocation produced from R_X86_64_PC32 to R_X86_64_PLT32. The relocation can be used as a marker for 32-bit PC-relative branches. The linker will reduce a PLT32 relocation to PC32 if the function is defined locally. Differential revision: https://reviews.llvm.org/D43383 llvm-svn: 325569
*	[AMDGPU] stop buffer_store being moved illegally	Tim Renouf	2018-02-20	1	-6/+2
	Summary: The machine instruction scheduler was illegally moving a buffer store past a buffer load with the same descriptor and offset. Fixed by marking buffer ops as mayAlias and isAliased. This may be overly conservative, and we may need to revisit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D43332 Change-Id: Iff3173d9e0653e830474546276ab9d30318b8ef7 llvm-svn: 325567
*	[MC] - Don't crash on unclosed frame.	George Rimar	2018-02-20	1	-3/+4
	llvm-mc can crash when there is .cfi_startproc without .cfi_endproc: .text .globl foo foo: .cfi_startproc Testcase shows the issue, patch fixes it. Differential revision: https://reviews.llvm.org/D43456 llvm-svn: 325564
*	[X86] Add 512-bit unmasked pmulhrsw/pmulhw/pmulhuw intrinsics. Remove and auto upgrade 128/256/512 bit masked pmulhrsw/pmulhw/pmulhuw intrinsics.	Craig Topper	2018-02-20	2	-9/+46
	The 128 and 256 bit versions were already not used by clang. This adds an equivalent unmasked 512 bit version. Then autoupgrades all sizes to use unmasked intrinsics plus select. llvm-svn: 325559
*	Report fatal error in the case of out of memory	Serge Pavlov	2018-02-20	9	-18/+16
	This is the second part of the recommit of r325224. The previous part was committed in r325426, which deals with C++ memory allocation. The solution for C memory allocation involved functions `llvm::malloc` and similar. This was a fragile solution because it caused ambiguity errors in some cases. In this commit the new functions have names like `llvm::safe_malloc`. The relevant part of the original comment is below, updated for the new function names. Analysis of failures in the case of out of memory errors can be tricky on Windows. Such an error emerges at the point where the memory allocation function fails, but manifests itself when the null pointer is used. These two points may be distant from each other. Besides, subsequent runs may not exhibit the allocation error. In some cases memory is allocated by a call to one of the C allocation functions: malloc, calloc and realloc. They are used for interoperability with C code, when the allocated object has variable size, and when it is necessary to avoid calling constructors. In many calls the result is not checked for a null pointer. To simplify checks, new functions are defined in the namespace 'llvm': `safe_malloc`, `safe_calloc` and `safe_realloc`. They behave like the corresponding standard functions but produce a fatal error if allocation fails. This change replaces the standard functions like 'malloc' in the cases where the result of the allocation function is not checked for a null pointer. Finally, there is plain C code that uses malloc and similar functions. If the result is not checked, an assert statement is added. Differential Revision: https://reviews.llvm.org/D43010 llvm-svn: 325551
*	[AArch64][GlobalISel] When copying from a gpr32 to an fpr16 reg, convert to fpr32 first.	Amara Emerson	2018-02-20	1	-4/+31
	This is a follow on commit to r[x] where we fix the other direction of copy. For this case, after converting the source from gpr32 -> fpr32, we use a subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land. https://reviews.llvm.org/D43444 llvm-svn: 325550
*	[X86] Make XOP VPCOM instructions commutable to fold loads during isel.	Craig Topper	2018-02-20	3	-52/+75
	llvm-svn: 325547
*	[X86] Make a helper function for commuting AVX512 VPCMP immediates since we do it in two places.	Craig Topper	2018-02-20	3	-24/+24
	llvm-svn: 325546
*	[InstCombine] use CreateWithCopiedFlags to reduce code; NFCI	Sanjay Patel	2018-02-19	1	-7/+6
	Also, move the folds with constants closer to make it easier to follow. llvm-svn: 325541
*	Revert "[mem2reg] Use range loops (NFCI)"	Brian Gesiak	2018-02-19	1	-8/+9
	This reverts commit r325532. llvm-svn: 325539
*	[X86] Use vpmovq2m/vpmovd2m for truncate to vXi1 when possible.	Craig Topper	2018-02-19	1	-0/+4
	Previously we used vptestmd, but the scheduling data for SKX says vpmovq2m/vpmovd2m is lower latency. We already used vpmovb2m/vpmovw2m for byte/word truncates. So this is more consistent anyway. llvm-svn: 325534
*	[InstCombine] allow fdiv with constant dividend folds with less than full -ffast-math	Sanjay Patel	2018-02-19	1	-2/+3
	It's possible that we could allow this with either 'arcp' or 'reassoc' alone, but this should be conservatively better than what we have right now. GCC allows this with only -freciprocal-math. The last test is changed to show a case that is expected to fold, but we need D43398. llvm-svn: 325533
*	[mem2reg] Use range loops (NFCI)	Brian Gesiak	2018-02-19	1	-9/+8
	Summary: Several for loops in PromoteMemoryToRegister.cpp leave their increment expression empty, instead incrementing the iterator within the for loop body. I believe this is because these loops were previously implemented as while loops; see https://reviews.llvm.org/rL188327. Incrementing the iterator within the body of the for loop instead of in its increment expression makes it seem like the iterator will be modified or conditionally incremented within the loop, but that is not the case in these loops. Instead, use range loops. Test Plan: `check-llvm` Reviewers: davide, bkramer Reviewed By: davide, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43473 llvm-svn: 325532
*	[InstCombine] refactor fdiv with constant dividend folds; NFC	Sanjay Patel	2018-02-19	1	-26/+27
	The last fold that used to be here was not necessary. That's a combination of 2 folds (and there's a regression test to show that). The transforms are guarded by isFast(), but that should be loosened. llvm-svn: 325531
*	[Coroutines] Move debug statement before assert	Brian Gesiak	2018-02-19	1	-1/+2
	Summary: Move a debug statement to above where an assertion is hit, so that the debug statement can be inspected before a stack trace. Test Plan: `check-llvm` llvm-svn: 325529
*	[X86] Stop swapping the operands of AVX512 setge.	Craig Topper	2018-02-19	1	-2/+2
	We swapped the operands and used setle, but I don't see any reason to do that. I think this is a holdover from SSE where we swap and then invert to use pcmpgt. But with AVX512 we don't want an invert so we won't use pcmpgt. So there's no need to swap. llvm-svn: 325527
*	[X86] Reduce the number of isel pattern variations needed for VPTESTM/VPTESTNM matching.	Craig Topper	2018-02-19	2	-16/+32
	Canonicalize EQ/NE PCMPM to have build vector all zeros on the RHS so we don't have to pattern match it in both locations. This significantly reduces the number of isel patterns needed since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node. This removes over 24000 bytes from the isel table. llvm-svn: 325526
*	bitcode support change for fast flags compatibility	Steven Wu	2018-02-19	2	-15/+17
	Summary: Per the discussion, each vendor needs a way to keep the old fast flags and the new fast flags in the auto upgrade path of the IR upgrader. This revision addresses that issue. Patched by Michael Berg Reviewers: qcolombet, hans, steven_wu Reviewed By: qcolombet, steven_wu Subscribers: dexonsmith, vsk, mehdi_amini, andrewrk, MatzeB, wristow, spatel Differential Revision: https://reviews.llvm.org/D43253 llvm-svn: 325525
*	[AMDGPU] Make note of existing waitcnt instrs; this is add-on work related to suppression of redundant waitcnt instrs.	Mark Searles	2018-02-19	1	-18/+16
	It is necessary to make note of these existing waitcnt instrs so that we do not fall into an infinite loop when handling loops. Also, [NFC] some minor code clean-up. llvm-svn: 325524
*	[SelectionDAG] ComputeKnownBits - add support for SMIN+SMAX clamp patterns	Simon Pilgrim	2018-02-19	1	-5/+32
	If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO), then we can deduce that the number of signbits (zeros/ones) will be at least the minimum of the signbit counts of the LO and HI constants. ComputeKnownBits equivalent of D43338. Differential Revision: https://reviews.llvm.org/D43463 llvm-svn: 325521
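The sign-bit claim for clamp patterns can be checked exhaustively on a small bit width. The Python sketch below is a model of the arithmetic fact only, not of ComputeKnownBits; it assumes LO <= HI, and the helper names and the 5-bit width used for the exhaustive loop are arbitrary choices.

```python
def num_sign_bits(v, bits):
    """Number of leading bits equal to the sign bit (the sign bit itself
    included) in the two's-complement encoding of v that is bits wide."""
    u = v & ((1 << bits) - 1)
    sign = (u >> (bits - 1)) & 1
    count = 0
    for i in range(bits - 1, -1, -1):
        if (u >> i) & 1 != sign:
            break
        count += 1
    return count

def clamp(x, lo, hi):
    """SMIN(SMAX(X, LO), HI) on signed integers."""
    return min(max(x, lo), hi)

def bound_holds(bits):
    """Check every (lo, hi, x) with lo <= hi at the given width."""
    lo_limit, hi_limit = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for lo in range(lo_limit, hi_limit + 1):
        for hi in range(lo, hi_limit + 1):
            floor = min(num_sign_bits(lo, bits), num_sign_bits(hi, bits))
            for x in range(lo_limit, hi_limit + 1):
                if num_sign_bits(clamp(x, lo, hi), bits) < floor:
                    return False
    return True
```

For example, with 8-bit values clamped to [-4, 3], both bounds have six sign bits, so every clamped result has at least six as well.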
*	[AMDGPU] Increased vector length for global/constant loads.	Mark Searles	2018-02-19	2	-2/+34
	Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D43275 llvm-svn: 325518
*	[CodeGen] Refactor AppleAccelTable	Pavel Labath	2018-02-19	3	-155/+242
	Summary: This commit separates the abstract accelerator table data structure from the code for writing out an on-disk representation of a specific accelerator table format. The idea is that the former (now called AccelTable<T>) can be reused for the DWARF v5 accelerator tables as-is, without any further customizations. Some bits of the emission code (now living in the EmissionContext class) can be reused for DWARF v5 as well, but the subtle differences in the layout of various subtables mean the sharing is not always possible. (Also, the individual emit*** functions are fairly simple so there's a tradeoff between making a bigger general-purpose function, and two smaller targeted functions.) Another advantage of this setup is that more of the serialization logic can be hidden in the .cpp file -- I have moved declarations of the header and all the emission functions there. Reviewers: JDevlieghere, aprantl, probinson, dblaikie Subscribers: echristo, clayborg, vleschuk, llvm-commits Differential Revision: https://reviews.llvm.org/D43285 llvm-svn: 325516
*	Bring back r323297.	Rafael Espindola	2018-02-19	1	-7/+14
	It was reverted because it broke the grub build. The reason the grub build broke is because grub does its own relocation processing and was not handling R_386_PLT32. Since grub has no dynamic linker, the fix is trivial: handle R_386_PLT32 exactly like R_386_PC32. On the report it was noted that they are using -fno-integrated-assembler. The upstream GAS (starting with 451875b4f976a527395e9303224c7881b65e12ed) will already be producing a R_386_PLT32 anyway, so they have to update their code one way or the other. Original message: Don't assume a null GV is local for ELF and MachO. This is already a simplification, and should help with avoiding a plt reference when calling an intrinsic with -fno-plt. With this change we return false for null GVs, so the caller only needs to check the new metadata to decide if it should use foo@plt or *foo@got. llvm-svn: 325514
*	[CodeGen] Fix tests breaking after r325505	Francis Visoiu Mistrih	2018-02-19	1	-2/+0
	llvm-svn: 325512
*	[ThinLTO] Add GraphTraits for FunctionSummaries	Charles Saternos	2018-02-19	3	-1/+32
	Add GraphTraits definitions to the FunctionSummary and ModuleSummaryIndex classes. These GraphTraits will be used to find SCCs in ThinLTO analysis passes. Third attempt - moved function from lambda to static function due to build failures. llvm-svn: 325506