bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[opt] Change the parameter of OptTable::PrintHelp from Name to Usage and ↵	Fangrui Song	2018-10-10	3	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	don't append "[options] <inputs>" Summary: Before, "[options] <inputs>" is unconditionally appended to the `Name` parameter. It is more flexible to change its semantic to `Usage` and let user customize the usage line. % llvm-objcopy ... USAGE: llvm-objcopy <input> [ <output> ] [options] <inputs> With this patch: % llvm-objcopy ... USAGE: llvm-objcopy input [output] Reviewers: rupprecht, alexshap, jhenderson Reviewed By: rupprecht Subscribers: jakehehrlich, mehdi_amini, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51009 llvm-svn: 344097
*	[WebAssembly] Handle V128 register class in explicit locals pass	Thomas Lively	2018-10-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Also add tests to catch crashes in passes that are not normally run in tests. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52959 llvm-svn: 344094
*	[DAGCombiner] Expand combining of FP logical ops to sign-setting FP ops	Nemanja Ivanovic	2018-10-09	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already do the following combines: (bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X (bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X When the target has "bit preserving fp logic". This patch just extends it to also combine: (bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X) As some targets have fnabs and even those that don't can efficiently lower both the fabs and the fneg. Differential revision: https://reviews.llvm.org/D44548 llvm-svn: 344093
*	[X86] Fix sanitizer bot failure from 344085	Rong Xu	2018-10-09	1	-2/+3
\| \| \| \| \| \|	Fix the memory issue exposed by sanitizer. llvm-svn: 344092
*	[WebAssembly] Improve readability of SIMD instructions (NFC)	Heejin Ahn	2018-10-09	3	-442/+586
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: - Categorize instructions into the categories as in the SIMD spec - Move SIMD-related definition to WebAssemblyInstrSIMD.td - Put definition and use of patterns together - Add newlines here and there Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53045 llvm-svn: 344086
*	Recommit r343993: [X86] condition branches folding for three-way conditional ↵	Rong Xu	2018-10-09	6	-0/+601
\| \| \| \| \| \| \| \|	codes Fix the memory issue exposed by sanitizer. llvm-svn: 344085
*	[FPEnv] PatternMatcher support for checking FNEG ignoring signed zeros	Cameron McInally	2018-10-09	1	-4/+2
\| \| \| \| \| \|	https://reviews.llvm.org/D52934 llvm-svn: 344084
*	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization	Sanjay Patel	2018-10-09	3	-4/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 llvm-svn: 344082
*	[PDB] Fix another bug in globals stream name lookup.	Zachary Turner	2018-10-09	1	-9/+15
\| \| \| \| \| \| \| \|	When we're on the last bucket the computation is tricky. We were failing when the last bucket contained multiple matches. Added a new test for this. llvm-svn: 344081
*	[ORC] Promote and rename private symbols inside the CompileOnDemand layer,	Lang Hames	2018-10-09	4	-8/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rather than require them to have been promoted before being passed in. Dropping this precondition is better for layer composition (CompileOnDemandLayer was the only one that placed pre-conditions on the modules that could be added). It also means that the promoted private symbols do not show up in the target JITDylib's symbol table. Instead, they are confined to the hidden implementation dylib that contains the actual definitions. For the 403.gcc testcase this cut down the public symbol table size from ~15,000 symbols to ~4000, substantially reducing symbol dependence tracking costs. llvm-svn: 344078
*	[PowerPC] Implement hasBitPreservingFPLogic for types that can be supported	Nemanja Ivanovic	2018-10-09	2	-0/+10
\| \| \| \| \| \| \| \| \| \|	This is the PPC-specific non-controversial part of https://reviews.llvm.org/D44548 that simply enables this combine for PPC since PPC has these instructions. This commit will allow the target-independent portion to be truly target independent. llvm-svn: 344077
*	[X86] When lowering unsigned v2i64 setcc without SSE42, flip the sign bits ↵	Craig Topper	2018-10-09	1	-10/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	in the v2i64 type then bitcast to v4i32. This may give slightly better opportunities for DAG combine to simplify with the operations before the setcc. It also matches the type the xors will eventually be promoted to anyway so it saves a legalization step. Almost all of the test changes are because our constant pool entry is now v2i64 instead of v4i32 on 64-bit targets. On 32-bit targets getConstant should be emitting a v4i32 build_vector and a v4i32->v2i64 bitcast. There are a couple test cases where it appears we now combine a bitwise not with one of these xors which caused a new constant vector to be generated. This prevented a constant pool entry from being shared. But if that's an issue we're concerned about, it seems we need to address it another way that just relying a bitcast to hide it. This came about from experiments I've been trying with pushing the promotion of and/or/xor to vXi64 later than LegalizeVectorOps where it is today. We run LegalizeVectorOps in a bottom up order. So the and/or/xor are promoted before their users are legalized. The bitcasts added for the promotion act as a barrier to computeKnownBits if we try to use it during vector legalization of a later operation. So by moving the promotion out we can hopefully get better results from computeKnownBits/computeNumSignBits like in LowerTruncate on AVX512. I've also looked at running LegalizeVectorOps in a top down order like LegalizeDAG, but thats showing some other issues. llvm-svn: 344071
*	[DWARF] Make llvm-dwarfdump display the .debug_loc.dwo section. Fixes PR38991.	Wolfgang Pieb	2018-10-09	1	-5/+4
\| \| \| \| \| \| \| \|	Reviewer: dblaikie Differential Revision: https://reviews.llvm.org/D52444 llvm-svn: 344068
*	[PDB] Fix failure on big endian machines.	Zachary Turner	2018-10-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We changed an ArrayRef<uint8_t> to an ArrayRef<uint32_t>, but it needs to be an ArrayRef<support::ulittle32_t>. We also change ArrayRef<> to FixedStreamArray<>. Technically an ArrayRef<> will work, but it can cause a copy in the underlying implementation if the memory is not contiguous, and there's no reason not to use a FixedStreamArray<>. Thanks to nemanjai@ and thakis@ for helping me track this down and confirm the fix. llvm-svn: 344063
*	[InstCombine] make helper function 'static'; NFC	Sanjay Patel	2018-10-09	1	-2/+2
\| \| \| \|	llvm-svn: 344056
*	[x86] use demanded bits to simplify masked store codegen	Sanjay Patel	2018-10-09	1	-17/+16
\| \| \| \| \| \| \| \| \| \| \| \|	As noted in D52747, if we prefer IR to use trunc for bool vectors rather than and+icmp, we can expose codegen shortcomings as seen here with masked store. Replace a hard-coded PCMPGT simplification with the more general demanded bits call to improve things. Differential Revision: https://reviews.llvm.org/D52964 llvm-svn: 344048
*	[SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG and CONCAT_VECTORS support to ↵	Simon Pilgrim	2018-10-09	1	-0/+30
\| \| \| \| \| \| \| \|	SimplifyDemandedBits Fix for AVX1 masked load/store regression on D52964 llvm-svn: 344043
*	[mips] Fix FDE/CFI encoding in case of N32 ABI	Simon Atanasyan	2018-10-09	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	For O32 and N32 ABI FDE/CFI encoding should be `DW_EH_PE_sdata4` and only N64 ABI uses `DW_EH_PE_sdata8`. To cover all cases this patch check code pointer size and setup a correct FDE/CFI encoding type. Differential revision: https://reviews.llvm.org/D52876 llvm-svn: 344040
*	[mips] Set pointer size to 4 bytes for N32 ABI	Simon Atanasyan	2018-10-09	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	CodePointerSize and CalleeSaveStackSlotSize values are used in DWARF generation. In case of MIPS it's incorrect to check for Triple::isMIPS64() only this function returns true for N32 ABI too. Now we do not have a method to recognize N32 if it's specified by a command line option and is not a part of a target triple. So we check for Triple::GNUABIN32 only. It's better than nothing. Differential revision: https://reviews.llvm.org/D52874 llvm-svn: 344039
*	[PowerPC] Remove self-copies in pre-emit peephole	Nemanja Ivanovic	2018-10-09	2	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are occasionally instances where AADB rewrites registers in such a way that a reg-reg copy becomes a self-copy. Such an instruction is obviously redundant and can be removed. This patch does precisely that. Note that this will not remove various nop's that we insert (which are themselves just self-copies). The reason those are left alone is that all of them have their own opcodes (that just encode to a self-copy). What prompted this patch is the fact that these self-copies sometimes end up using registers that make the instruction a priority-setting nop, thereby having a significant effect on performance. Differential revision: https://reviews.llvm.org/D52432 llvm-svn: 344036
*	[X86][AVX1] Enable *_EXTEND_VECTOR_INREG lowering of 256-bit vectors	Simon Pilgrim	2018-10-09	1	-15/+34
\| \| \| \| \| \| \| \|	As discussed on D52964, this adds 256-bit *_EXTEND_VECTOR_INREG lowering support for AVX1 targets to help improve SimplifyDemandedBits handling. Differential Revision: https://reviews.llvm.org/D52980 llvm-svn: 344019
*	[CFG Printer] Add support for writing the dot files with a custom	Chandler Carruth	2018-10-09	1	-4/+10
\| \| \| \| \| \| \| \| \| \|	prefix. Use this to direct these files to a specific location in the test suite so that we don't write files out to random directories (or fail if the working directory isn't writable). llvm-svn: 344014
*	Make LocationSize a proper Optional type; NFC	George Burgess IV	2018-10-09	4	-24/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the second in a series of changes intended to make https://reviews.llvm.org/D44748 more easily reviewable. Please see that patch for more context. The first change being r344012. Since I was requested to do all of this with post-commit review, this is about as small as I can make this patch. This patch makes LocationSize into an actual type that wraps a uint64_t; users are required to call getValue() in order to get the size now. If the LocationSize has an Unknown size (e.g. if LocSize == MemoryLocation::UnknownSize), getValue() will assert. This also adds DenseMap specializations for LocationInfo, which required taking two more values from the set of values LocationInfo can represent. Hence, heavy users of multi-exabyte arrays or structs may observe slightly lower-quality code as a result of this change. The intent is for getValue()s to be very close to a corresponding hasValue() (which is often spelled `!= MemoryLocation::UnknownSize`). Sadly, small diff context appears to crop that out sometimes, and the last change in DSE does require a bit of nonlocal reasoning about control-flow. :/ This also removes an assert, since it's now redundant with the assert in getValue(). llvm-svn: 344013
*	Use locals instead of struct fields; NFC	George Burgess IV	2018-10-09	3	-38/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is one of a series of changes intended to make https://reviews.llvm.org/D44748 more easily reviewable. Please see that patch for more context. Since I was requested to do all of this with post-commit review, this is about as small as I can make it (beyond committing changes to these few files separately, but they're incredibly similar in spirit, so...) On its own, this change doesn't make a great deal of sense. I plan on having a follow-up Real Soon Now(TM) to make the bits here make more sense. :) In particular, the next change in this series is meant to make LocationSize an actual type, which you have to call .getValue() on in order to get at the uint64_t inside. Hence, this change refactors code so that: - we only need to call the soon-to-come getValue() once in most cases, and - said call to getValue() happens very closely to a piece of code that checks if the LocationSize has a value (e.g. if it's != UnknownSize). llvm-svn: 344012
*	llvm-link: Improve diagnostic for module-level metadata mismatch	David Blaikie	2018-10-09	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \|	This might produce hard to read/illegible diagnostics for especially weird/non-trivial module metadata but integers are about all we are using these days, so seems more useful than not. Patch based on work by Kristina Brooks - thanks! Differential Revision: https://reviews.llvm.org/D52952 llvm-svn: 344011
*	ExpandPostRAPseudos: Fix alldefsAreDead() not removing operands	Matthias Braun	2018-10-09	1	-0/+2
\| \| \| \| \| \| \| \|	One case left around nonsensical operands for the KILL instruction which the machine verifier checks for nowadays. While this should not hurt in release builds we should fix the machine verifier errors anyway. llvm-svn: 344008
*	[MIPS GlobalISel] Legalize i64 add	Petar Jovanovic	2018-10-08	2	-1/+50
\| \| \| \| \| \| \| \| \| \|	Custom legalize s64 G_ADD for MIPS32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D52652 llvm-svn: 344007
*	TwoAddressInstructionPass: Modernize/fix some comments; NFC	Matthias Braun	2018-10-08	1	-5/+5
\| \| \| \|	llvm-svn: 344006
*	PHIElimination: Remove wrong comment; NFC	Matthias Braun	2018-10-08	1	-2/+1
\| \| \| \| \| \| \| \|	The comment was contradicting the code. Looking at history the feature was implemented a day after the comment was written without dropping the comment. llvm-svn: 344005
*	MachineFunctionPrinterPass: Declare SlotIndexes as used if available; NFC	Matthias Braun	2018-10-08	1	-0/+1
\| \| \| \| \| \| \|	This makes print-machineinstrs print the slot indexes in more situations. NFC for normal compilation. llvm-svn: 344004
*	Remove unused variable.	Zachary Turner	2018-10-08	1	-1/+0
\| \| \| \|	llvm-svn: 344002
*	[PDB] fix a bug in global stream name lookup.	Zachary Turner	2018-10-08	1	-5/+8
\| \| \| \| \| \| \|	When we're looking up a record in the last hash bucket chain, we need to be careful with the end-offset calculation. llvm-svn: 344001
*	[X86] Revert r343993 condition branches folding for three-way conditional codes	Rong Xu	2018-10-08	6	-601/+0
\| \| \| \| \| \|	Some buildbots failed. llvm-svn: 343998
*	[DAGCombiner] simplify code for fmul with constant fold; NFCI	Sanjay Patel	2018-10-08	1	-18/+8
\| \| \| \|	llvm-svn: 343997
*	[X86] Prefer isTypeLegal over checking isSimple in a DAG combine.	Craig Topper	2018-10-08	1	-1/+3
\| \| \| \| \| \|	Simple types are a superset of what all in tree targets in LLVM could possibly have a legal type. This means the behavior of using isSimple to check for a supported type for X86 could change over time. For example, this could would change if a v256i1 type was added to MVT in the future. llvm-svn: 343995
*	[X86] condition branches folding for three-way conditional codes	Rong Xu	2018-10-08	6	-0/+601
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements a pass that optimizes condition branches on x86 by taking advantage of the three-way conditional code generated by compare instructions. Currently, it tries to hoisting EQ and NE conditional branch to a dominant conditional branch condition where the same EQ/NE conditional code is computed. An example: bb_0: cmp %0, 19 jg bb_1 jmp bb_2 bb_1: cmp %0, 40 jg bb_3 jmp bb_4 bb_4: cmp %0, 20 je bb_5 jmp bb_6 Here we could combine the two compares in bb_0 and bb_4 and have the following code: bb_0: cmp %0, 20 jg bb_1 jl bb_2 jmp bb_5 bb_1: cmp %0, 40 jg bb_3 jmp bb_6 For the case of %0 == 20 (bb_5), we eliminate two jumps, and the control height for bb_6 is also reduced. bb_4 is gone after the optimization. This optimization is motivated by the branch pattern generated by the switch lowering: we always have pivot-1 compare for the inner nodes and we do a pivot compare again the leaf (like above pattern). This pass currently is enabled on Intel's Sandybridge and later arches. Some reviewers pointed out that on some arches (like AMD Jaguar), this pass may increase branch density to the point where it hurts the performance of the branch predictor. Differential Revision: https://reviews.llvm.org/D46662 llvm-svn: 343993
*	[AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions	Scott Linder	2018-10-08	4	-108/+293
\| \| \| \| \| \| \| \| \| \| \|	Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions. Recommits r341413 with changes to update the MachineDominatorTree when present. Differential Revision: https://reviews.llvm.org/D51742 llvm-svn: 343992
*	[X86][AVX2] Enable ZERO_EXTEND_VECTOR_INREG lowering of 256-bit vectors	Simon Pilgrim	2018-10-08	1	-7/+5
\| \| \| \| \| \| \| \|	Some necessary yak shaving before lowering *_EXTEND_VECTOR_INREG 256-bit vectors on AVX1 targets as suggested by D52964. Differential Revision: https://reviews.llvm.org/D52970 llvm-svn: 343991
*	[x86] make horizontal binop matching clearer; NFCI	Sanjay Patel	2018-10-08	1	-41/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The instructions are complicated, so this code will probably never be very obvious, but hopefully this makes it better. As shown in PR39195: https://bugs.llvm.org/show_bug.cgi?id=39195 ...we need to improve the matching to not miss cases where we're h-opping on 1 source vector, and that should be a small patch after this rearranging. llvm-svn: 343989
*	[TailCallElim] Enable marking of calls with byval as tails	Robert Lougher	2018-10-08	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In r339636 the alias analysis rules were changed with regards to tail calls and byval arguments. Previously, tail calls were assumed not to alias allocas from the current frame. This has been updated, to not assume this for arguments with the byval attribute. This patch aligns TailCallElim with the new rule. Tail marking can now be more aggressive and mark more calls as tails, e.g.: define void @test() { %f = alloca %struct.foo call void @bar(%struct.foo* byval %f) ret void } define void @test2(%struct.foo* byval %f) { call void @bar(%struct.foo* byval %f) ret void } define void @test3(%struct.foo* byval %f) { %agg.tmp = alloca %struct.foo %0 = bitcast %struct.foo* %agg.tmp to i8* %1 = bitcast %struct.foo* %f to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false) call void @bar(%struct.foo* byval %agg.tmp) ret void } The problematic case where a byval parameter is captured by a call is still handled correctly, and will not be marked as a tail (see PR7272). llvm-svn: 343986
*	AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions	Tom Stellard	2018-10-08	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The 32-bit variants do not exist on VI+. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52958 llvm-svn: 343985
*	[AMDGPU] Add an AMDGPU specific atomic optimizer.	Neil Henning	2018-10-08	4	-0/+438
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by: - Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, record the value to be optimized. - Run through the atomic operations we can optimize and, depending on whether the value is uniform/divergent use wavefront wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly we recombine the result from the single lane to each lane of the wavefront, and calculate our individual lanes offset into the final result. Differential Revision: https://reviews.llvm.org/D51969 llvm-svn: 343973
*	[ThinLTO] Keep non-prevailing (linkonce\|weak)_odr symbols live	Xin Tong	2018-10-08	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If we have a symbol with (linkonce\|weak)_odr linkage, we do not want to dead strip it even it is not prevailing. IR level (linkonce\|weak)_odr symbol can become non-prevailing when we mix ELF objects and IR objects where the (linkonce\|weak)_odr symbol in the ELF object is prevailing and the ones in the IR objects are not. Stripping them will prevent us from doing optimizations with them. By not dead stripping them, We will convert these symbols to available_externally linkage as a result of non-prevailing and eventually dropping them after inlining. I modified cache-prevailing.ll to use linkonce linkage as it is testing whether cache prevailing bit is effective or not, not we should treat linkonce_odr alive or not Reviewers: tejohnson, pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D52893 llvm-svn: 343970
*	[AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled	Oliver Stannard	2018-10-08	1	-1/+9
\| \| \| \| \| \| \| \| \| \|	When branch target identification is enabled, we can only do indirect tail-calls through x16 or x17. This means that the outliner can't transform a BLR instruction at the end of an outlined region into a BR. Differential revision: https://reviews.llvm.org/D52869 llvm-svn: 343969
*	[AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI	Oliver Stannard	2018-10-08	4	-2/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When branch target identification is enabled, all indirectly-callable functions start with a BTI C instruction. this instruction can only be the target of certain indirect branches (direct branches and fall-through are not affected): - A BLR instruction, in either a protected or unprotected page. - A BR instruction in a protected page, using x16 or x17. - A BR instruction in an unprotected page, using any register. Without BTI, we can use any non call-preserved register to hold the address for an indirect tail call. However, when BTI is enabled, then the code being compiled might be loaded into a BTI-protected page, where only x16 and x17 can be used for indirect tail calls. Legacy code withiout this restriction can still indirectly tail-call BTI-protected functions, because they will be loaded into an unprotected page, so any register is allowed. Differential revision: https://reviews.llvm.org/D52868 llvm-svn: 343968
*	[AArch64][v8.5A] Branch Target Identification code-generation pass	Oliver Stannard	2018-10-08	4	-0/+142
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Branch Target Identification extension, introduced to AArch64 in Armv8.5-A, adds the BTI instruction, which is used to mark valid targets of indirect branches. When enabled, the processor will trap if an instruction in a protected page tries to perform an indirect branch to any instruction other than a BTI. The BTI instruction uses encodings which were NOPs in earlier versions of the architecture, so BTI-enabled code will still run on earlier hardware, just without the extra protection. There are 3 variants of the BTI instruction, which are valid targets for different kinds or branches: - BTI C can be targeted by call instructions, and is inteneded to be used at function entry points. These are the BLR instruction, as well as BR with x16 or x17. These BR instructions are allowed for use in PLT entries, and we can also use them to allow indirect tail-calls. - BTI J can be targeted by BR only, and is intended to be used by jump tables. - BTI JC acts ab both a BTI C and a BTI J instruction, and can be targeted by any BLR or BR instruction. Note that RET instructions are not restricted by branch target identification, the reason for this is that return addresses can be protected more effectively using return address signing. Direct branches and calls are also unaffected, as it is assumed that an attacker cannot modify executable pages (if they could, they wouldn't need to do a ROP/JOP attack). This patch adds a MachineFunctionPass which: - Adds a BTI C at the start of every function which could be indirectly called (either because it is address-taken, or externally visible so could be address-taken in another translation unit). - Adds a BTI J at the start of every basic block which could be indirectly branched to. This could be either done by a jump table, or by taking the address of the block (e.g. the using GCC label values extension). We only need to use BTI JC when a function is indirectly-callable, and takes the address of the entry block. I've not been able to trigger this from C or IR, but I've included a MIR test just in case. Using BTI C at function entries relies on the fact that no other code in BTI-protected pages uses indirect tail-calls, unless they use x16 or x17 to hold the address. I'll add that code-generation restriction as a separate patch. Differential revision: https://reviews.llvm.org/D52867 llvm-svn: 343967
*	[GlobalIsel][X86] Support G_UDIV/G_UREM/G_SREM	Alexander Ivchenko	2018-10-08	2	-52/+178
\| \| \| \| \| \| \| \| \| \|	Support G_UDIV/G_UREM/G_SREM. The instruction selection code is taken from FastISel with only minor tweaks to adapt for GlobalISel. Differential Revision: https://reviews.llvm.org/D49781 llvm-svn: 343966
*	[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.	Neil Henning	2018-10-08	6	-26/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The IRBuilder CreateIntrinsic method wouldn't allow you to specify the types that you wanted the intrinsic to be mangled with. To fix this I've: - Added an ArrayRef<Type > member to both CreateIntrinsic overloads. - Used that array to pass into the Intrinsic::getDeclaration call. - Added a CreateUnaryIntrinsic to replace the most common use of CreateIntrinsic where the type was auto-deduced from operand 0. - Added a bunch more unit tests to test CreateIntrinsic calls that weren't being tested (including the FMF flag that wasn't checked). This was suggested as part of the AMDGPU specific atomic optimizer review (https://reviews.llvm.org/D51969). Differential Revision: https://reviews.llvm.org/D52087 llvm-svn: 343962
*	[AsmParser] Return an error in the case of empty symbol ref in an expression	Francis Visoiu Mistrih	2018-10-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following instruction: > str q28, [x0, #164*@] contains a @ which is parsed as an empty symbol. The parser returns true but has no error, so the assembler continues by ignoring the instruction. Differential Revision: https://reviews.llvm.org/D52645 llvm-svn: 343961
*	[ARM] Account for implicit IT when calculating inline asm size	Peter Smith	2018-10-08	2	-3/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When deciding if it is safe to optimize a conditional branch to a CBZ or CBNZ the offsets of the BasicBlocks from the start of the function are estimated. For inline assembly the generic getInlineAsmLength() function is used to get a worst case estimate of the inline assembly by multiplying the number of instructions by the max instruction size of 4 bytes. This unfortunately doesn't take into account the generation of Thumb implicit IT instructions. In edge cases such as when all the instructions in the block are 4-bytes in size and there is an implicit IT then the size is underestimated. This can cause an out of range CBZ or CBNZ to be generated. The patch takes a conservative approach and assumes that every instruction in the inline assembly block may have an implicit IT. Fixes pr31805 Differential Revision: https://reviews.llvm.org/D52834 llvm-svn: 343960