bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[mips] Sign extend i32 return values on MIPS64	Stefan Maksimovic	2018-07-26	4	-0/+64
\| \| \| \| \| \| \| \| \| \| \| \| \|	Override getTypeForExtReturn so that functions returning an i32 typed value have it sign extended on MIPS64. Also provide patterns to get rid of unneeded sign extensions for arithmetic instructions which implicitly sign extend their results. Differential Revision: https://reviews.llvm.org/D48374 llvm-svn: 338019
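As a rough illustration of what "sign extended on MIPS64" means for an i32 return value, the sketch below models the 64-bit register image at the bit level. It is not the LLVM implementation; the function name is hypothetical.

```python
def sign_extend_i32_to_i64(value: int) -> int:
    """Return the 64-bit register image of a 32-bit value, sign extended."""
    low = value & 0xFFFF_FFFF              # keep only the 32-bit payload
    if low & 0x8000_0000:                  # bit 31 set: the i32 is negative
        low |= 0xFFFF_FFFF_0000_0000       # replicate the sign into bits 63..32
    return low
```

A non-negative i32 is left unchanged, while a negative one gets all upper 32 bits set.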
*	[x86/SLH] Extract the logic to trace predicate state through calls to	Chandler Carruth	2018-07-26	1	-19/+39
\| \| \| \| \| \| \| \| \| \|	a helper function with a nice overview comment. NFC. This is a preparatory refactoring for implementing another component of the mitigation that was described in the design document but hadn't been implemented yet. llvm-svn: 338016
*	[AArch64] Armv8.2-A: add the crypto extensions	Sjoerd Meijer	2018-07-26	3	-5/+192
\| \| \| \| \| \| \| \| \|	This adds MC support for the crypto instructions that were made optional extensions in Armv8.2-A (AArch64 only). Differential Revision: https://reviews.llvm.org/D49370 llvm-svn: 338010
*	[X86] Don't use CombineTo to skip adding new nodes to the DAGCombiner ↵	Craig Topper	2018-07-26	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \|	worklist in combineMul. I'm not sure if this was trying to avoid optimizing the new nodes further or what. Or maybe to prevent a cycle if something tried to reform the multiply? But I don't think it's a reliable way to do that. If the user of the expanded multiply is visited by the DAGCombiner after this conversion happens, the DAGCombiner will check its operands, see that they haven't been visited by the DAGCombiner before and it will then add the first node to the worklist. This process will repeat until all the new nodes are visited. So this seems like an unreliable prevention at best. So this patch just returns the new nodes like any other combine. If this starts causing problems we can try to add target specific nodes or something to more directly prevent optimizations. Now that we handle the combine normally, we can combine any negates the mul expansion creates into their users since those will be visited now. llvm-svn: 338007
*	[X86] Remove some unnecessary explicit calls to DCI.AddToWorkList.	Craig Topper	2018-07-26	1	-10/+0
\| \| \| \| \| \|	These calls were making sure some newly created nodes were added to worklist, but the DAGCombiner has internal support for ensuring it has visited all nodes. Any time it visits a node it ensures the operands have been queued to be visited as well. This means we only need to return the last new node. The DAGCombiner will take care of adding its inputs thus walking backwards through all the new nodes. llvm-svn: 337996
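The worklist behaviour described in this message can be pictured with a toy model. This is only a sketch of the idea (visiting a node queues its unvisited operands), not the DAGCombiner's actual code; the function name, node names, and the dictionary representation are all hypothetical.

```python
def nodes_reached_from(last_node, operands, already_visited):
    """Walk backwards from the returned node, queueing unvisited operands."""
    visited = set(already_visited)
    worklist = [last_node]
    order = []
    while worklist:
        node = worklist.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Visiting a node ensures its operands are queued as well.
        for operand in operands.get(node, ()):
            if operand not in visited:
                worklist.append(operand)
    return order
```

With `operands = {"add": ["shl", "x"], "shl": ["x"]}` and `x` already visited, returning only `add` is enough for `shl` to be reached too.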
*	CodeGen: Cleanup regmask construction; NFC	Matthias Braun	2018-07-26	1	-3/+3
\| \| \| \| \| \| \| \| \|	- Avoid duplication of regmask size calculation. - Simplify allocateRegisterMask() call. - Rename allocateRegisterMask() to allocateRegMask() to be consistent with naming in MachineOperand. llvm-svn: 337986
*	bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in order	Yonghong Song	2018-07-25	9	-8/+255
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some BPF JIT backends want to optimize memcpy in their own architecture-specific way. However, at the moment, there is no way for JIT backends to see memcpy semantics reliably. This is because the LLVM BPF backend expands memcpy into load/store sequences and may then schedule them apart from each other. So BPF JIT backends inside the kernel can't reliably recognize memcpy semantics by peephole matching on the BPF sequence. This patch introduces new intrinsic expansion infrastructure for memcpy. To get a stable in-order load/store sequence from memcpy, we first lower memcpy into a BPF::MEMCPY node, which is then expanded into in-order load/store sequences in the expandPostRAPseudo pass, which happens after instruction scheduling. This way, kernel JIT backends can reliably recognize memcpy by scanning the BPF sequence. This new memcpy expansion infrastructure is gated by a new option: -bpf-expand-memcpy-in-order Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 337977
*	Add missing 'override', fixing compilation with some compilers since SVN r337950	Martin Storsjo	2018-07-25	1	-1/+1
\| \| \| \|	llvm-svn: 337952
*	[COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter	Martin Storsjo	2018-07-25	5	-30/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In SVN r334523, the first half of comdat constant pool handling was hoisted from X86WindowsTargetObjectFile (which despite the name only was used for msvc targets) into the arch independent TargetLoweringObjectFileCOFF, but the other half of the handling was left behind in X86AsmPrinter::GetCPISymbol. With only half of the handling in place, inconsistent comdat sections/symbols are created, causing issues with both GNU binutils (avoided for X86 in SVN r335918) and with the MS linker, which would complain like this: fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4 Differential Revision: https://reviews.llvm.org/D49644 llvm-svn: 337950
*	[ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.	Eli Friedman	2018-07-25	1	-0/+81
\| \| \| \| \| \| \| \| \| \| \| \| \|	Saves materializing the immediate for the "ands". Corresponding patterns exist for lsrs+lsls, but that seems less common in practice. Now implemented as a DAGCombine. Differential Revision: https://reviews.llvm.org/D49585 llvm-svn: 337945
*	[AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion	Stanislav Mekhanoshin	2018-07-25	1	-13/+21
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D49761 llvm-svn: 337938
*	[Hexagon] Properly scale bit index when extracting elements from vNi1	Krzysztof Parzyszek	2018-07-25	1	-1/+3
\| \| \| \| \| \| \| \|	For example v = <2 x i1> is represented as bbbbaaaa in a predicate register, where b = v[1], a = v[0]. Extracting v[1] is equivalent to extracting bit 4 from the predicate register. llvm-svn: 337934
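The index scaling in the example above can be sketched as follows. The 8-bit register width is taken from the `bbbbaaaa` picture in the message; the function names are hypothetical and this is not the Hexagon backend's code.

```python
PREDICATE_BITS = 8  # width of the predicate register in the bbbbaaaa example

def predicate_bit_index(element_index: int, num_elements: int) -> int:
    """Bit position in the predicate register that holds element v[i]."""
    bits_per_element = PREDICATE_BITS // num_elements
    return element_index * bits_per_element

def extract_element(pred: int, element_index: int, num_elements: int) -> int:
    """Read v[i] of a vNi1 value stored in a predicate register."""
    return (pred >> predicate_bit_index(element_index, num_elements)) & 1
```

For `<2 x i1>`, each element occupies four bits, so `v[1]` is read from bit 4 rather than bit 1.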
*	[MIPS GlobalISel] Lower pointer arguments	Petar Jovanovic	2018-07-25	2	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Add support for lowering pointer arguments. Changing type from pointer to integer is already done in MipsTargetLowering::getRegisterTypeForCallingConv. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D49419 llvm-svn: 337912
*	[SystemZ] Use tablegen loops in SchedModels	Jonas Paulsson	2018-07-25	5	-229/+98
\| \| \| \| \| \| \| \| \| \|	NFC changes to make scheduler TableGen files more readable, by using loops instead of a lot of similar defs with just e.g. a latency value that changes. https://reviews.llvm.org/D49598 Review: Ulrich Weigand, Javed Abshar llvm-svn: 337909
*	[x86/SLH] Sink the return hardening into the main block-walk + hardening	Chandler Carruth	2018-07-25	1	-26/+17
\| \| \| \| \| \| \| \| \| \| \|	code. This consolidates all our hardening calls, and simplifies the code a bit. It seems much more clear to handle all of these together. No functionality changed here. llvm-svn: 337895
*	[x86/SLH] Improve name and comments for the main hardening function.	Chandler Carruth	2018-07-25	1	-174/+190
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This function actually does two things: it traces the predicate state through each of the basic blocks in the function (as that isn't directly handled by the SSA updater) and it hardens everything necessary in the block as it goes. These need to be done together so that we have the currently active predicate state to use at each point of the hardening. However, this also made obvious that the flag to disable actual hardening of loads was flawed -- it also disabled tracing the predicate state across function calls within the body of each block. So this patch sinks this debugging flag test to correctly guard just the hardening of loads. Unless load hardening was disabled, no functionality should change with this patch. llvm-svn: 337894
*	[mips] Replace custom parsing logic for data directives by the ↵	Simon Atanasyan	2018-07-25	3	-42/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`addAliasForDirective` The target independent AsmParser doesn't recognise .hword, .word, .dword which are required for Mips. Currently MipsAsmParser recognises these through dispatch to MipsAsmParser::parseDataDirective. This contains equivalent logic to AsmParser::parseDirectiveValue. This patch allows reuse of AsmParser::parseDirectiveValue by making use of addAliasForDirective to support .hword, .word and .dword. Original patch provided by Alex Bradbury at D47001 was modified to fix handling of microMIPS symbols. The `AsmParser::parseDirectiveValue` calls either `EmitIntValue` or `EmitValue`. In this patch we override `EmitIntValue` in the `MipsELFStreamer` to clear a pending set of microMIPS symbols. Differential revision: https://reviews.llvm.org/D49539 llvm-svn: 337893
*	[X86] Use X86ISD::MUL_IMM instead of ISD::MUL for multiply we intend to be ↵	Craig Topper	2018-07-25	1	-1/+2
\| \| \| \| \| \| \| \|	selected to LEA. This prevents other combines from possibly disturbing it. llvm-svn: 337890
*	[x86/SLH] Teach the x86 speculative load hardening pass to harden	Chandler Carruth	2018-07-25	1	-0/+200
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	against v1.2 BCBS attacks directly. Attacks using spectre v1.2 (a subset of BCBS) are described in the paper here: https://people.csail.mit.edu/vlk/spectre11.pdf The core idea is to speculatively store over the address in a vtable, jumptable, or other target of indirect control flow that will be subsequently loaded. Speculative execution after such a store can forward the stored value to subsequent loads, and if called or jumped to, the speculative execution will be steered to this potentially attacker controlled address. Up until now, this could be mitigated by enabling retpolines. However, that is a relatively expensive technique to mitigate this particular flavor. Especially because in most cases SLH will have already mitigated this. To fully mitigate this with SLH, we need to do two core things: 1) Unfold loads from calls and jumps, allowing the loads to be post-load hardened. 2) Force hardening of incoming registers even if we didn't end up needing to harden the load itself. The reason we need to do these two things is because hardening calls and jumps from this particular variant is importantly different from hardening against leak of secret data. Because the "bad" data here isn't a secret, but in fact speculatively stored by the attacker, it may be loaded from any address, regardless of whether it is read-only memory, mapped memory, or a "hardened" address. The only 100% effective way to harden these instructions is to harden their operand itself. But to the extent possible, we'd like to take advantage of all the other hardening going on, we just need a fallback in case none of that happened to cover the particular input to the control transfer instruction. For users of SLH, currently they are paying 2% to 6% performance overhead for retpolines, but this mechanism is expected to be substantially cheaper. However, it is worth reminding folks that this does not mitigate all of the things retpolines do -- most notably, variant #2 is not in any way mitigated by this technique. So users of SLH may still want to enable retpolines, and the implementation is carefully designed to gracefully leverage retpolines to avoid the need for further hardening here when they are enabled. Differential Revision: https://reviews.llvm.org/D49663 llvm-svn: 337878
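To make "harden their operand" concrete, here is a minimal bit-level sketch. It assumes the predicate state is all-zeros on a correctly predicted path and all-ones under misspeculation, and that hardening is a bitwise OR of that state into the register holding the call or jump target; both are assumptions for illustration, and the names are hypothetical rather than taken from the pass.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register

def predicate_state(misspeculating: bool) -> int:
    """Assumed encoding: all-ones when misspeculating, zero otherwise."""
    return MASK64 if misspeculating else 0

def harden_target(target: int, state: int) -> int:
    """OR the predicate state into an indirect call/jump target."""
    return (target | state) & MASK64
```

On the correct path the target is unchanged; under misspeculation it collapses to an all-ones value regardless of what the attacker speculatively stored.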
*	[X86] Use a shift plus an lea for multiplying by a constant that is a power ↵	Craig Topper	2018-07-25	1	-0/+18
\| \| \| \| \| \| \| \|	of 2 plus 2/4/8. The LEA allows us to combine an add and the multiply by 2/4/8 together so we just need a shift for the larger power of 2. llvm-svn: 337875
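The arithmetic behind this lowering can be checked with a small model. `lea` below is a hypothetical stand-in for the x86 address computation `base + index * scale`, and the function names are illustrative only.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register

def lea(base: int, index: int, scale: int) -> int:
    """Model of an LEA address computation: base + index * scale."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("LEA scale must be 1, 2, 4, or 8")
    return (base + index * scale) & MASK64

def mul_pow2_plus_small(x: int, shift: int, small: int) -> int:
    """Compute x * (2**shift + small) with one shift and one LEA."""
    shifted = (x << shift) & MASK64   # the larger power of 2
    return lea(shifted, x, small)     # add x * 2/4/8 in the same LEA
```

For example, `mul_pow2_plus_small(x, 5, 2)` models a multiply by 34.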
*	[X86] Expand mul by pow2 + 2 using a shift and two adds similar to what we ↵	Craig Topper	2018-07-25	1	-11/+15
\| \| \| \| \| \|	do for pow2 - 2. llvm-svn: 337874
*	[X86] Use a two lea sequence for multiply by 37, 41, and 73.	Craig Topper	2018-07-24	1	-0/+9
\| \| \| \| \| \|	These fit a pattern used by 11, 21, and 19. llvm-svn: 337871
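One way to read the shared pattern is `C = (1 + a) * b + 1` with `a` in {4, 8} and `b` in {2, 4, 8}, so that both steps fit an LEA. The table below is my own decomposition under that reading, not copied from the patch, and `lea` is a hypothetical model of `base + index * scale`.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register

def lea(base: int, index: int, scale: int) -> int:
    """Model of an LEA address computation: base + index * scale."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("LEA scale must be 1, 2, 4, or 8")
    return (base + index * scale) & MASK64

# constant -> (scale of first LEA, scale of second LEA); assumed decomposition
TWO_LEA_SCALES = {
    11: (4, 2), 21: (4, 4), 19: (8, 2),
    37: (8, 4), 41: (4, 8), 73: (8, 8),
}

def mul_two_lea(x: int, constant: int) -> int:
    """Multiply x by one of the constants above using two LEAs."""
    first, second = TWO_LEA_SCALES[constant]
    t = lea(x, x, first)       # x * 5 or x * 9
    return lea(x, t, second)   # t * 2/4/8 + x
```

For 37 this gives `(9 * x) * 4 + x`, and for 73 it gives `(9 * x) * 8 + x`.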
*	[X86] Change multiply by 26 to use two multiplies by 5 and an add instead of ↵	Craig Topper	2018-07-24	1	-7/+7
\| \| \| \| \| \| \| \|	multiply by 3 and 9 and a subtract. Same number of operations, but ending in an add is friendlier due to it being commutable. llvm-svn: 337869
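Both decompositions of 26 can be checked side by side. The sketch below models 64-bit wraparound with a mask; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register

def mul26_new(x: int) -> int:
    """Two multiplies by 5 and an add: (5 * 5) * x + x."""
    t = (x * 5) & MASK64
    t = (t * 5) & MASK64
    return (t + x) & MASK64

def mul26_old(x: int) -> int:
    """Multiply by 3 and 9 and a subtract: (3 * 9) * x - x."""
    t = (x * 3) & MASK64
    t = (t * 9) & MASK64
    return (t - x) & MASK64
```

Both use three operations, but the final add in `mul26_new` accepts its operands in either order, whereas the subtract in `mul26_old` does not.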
*	[X86] When expanding a multiply by a negative of one less than a power of 2, ↵	Craig Topper	2018-07-24	1	-10/+12
\| \| \| \| \| \| \| \| \| \|	like 31, don't generate a negate of a subtract that we'll never optimize. We generated a subtract for the power of 2 minus one then negated the result. The negate can be optimized away by swapping the subtract operands, but DAG combine doesn't know how to do that and we don't add any of the new nodes to the worklist anyway. This patch makes us explicitly emit the swapped subtract. llvm-svn: 337858
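The equivalence between the negated subtract and the swapped subtract can be sketched as follows, using a multiply by -31 as the example. Function names are hypothetical and 64-bit wraparound is modeled with a mask.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register

def mul_neg_pow2_minus_one_old(x: int, shift: int) -> int:
    """Subtract, then negate: -((x << shift) - x)."""
    sub = ((x << shift) - x) & MASK64
    return (-sub) & MASK64

def mul_neg_pow2_minus_one_new(x: int, shift: int) -> int:
    """Swapped subtract, no negate: x - (x << shift)."""
    return (x - (x << shift)) & MASK64
```

With `shift = 5`, both compute `x * -31` modulo 2^64, but the second form needs one fewer operation.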
*	[X86] Generalize the multiply by 30 lowering to generic multiply by power 2 ↵	Craig Topper	2018-07-24	1	-15/+10
\| \| \| \| \| \| \| \| \| \|	minus 2. Use a left shift and 2 subtracts like we do for 30. Move this out from behind the slow lea check since it doesn't even use an LEA. Use this for multiply by 14 as well. llvm-svn: 337856
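A minimal sketch of the generalized lowering, with 30 and 14 as the two cases named in the message. The function name is hypothetical.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register

def mul_pow2_minus_two(x: int, shift: int) -> int:
    """Compute x * (2**shift - 2) with a left shift and two subtracts."""
    t = (x << shift) & MASK64
    t = (t - x) & MASK64
    return (t - x) & MASK64
```

`shift = 5` gives the multiply by 30 and `shift = 4` gives the multiply by 14; no LEA is involved in either.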
*	[X86] Change multiply by 19 to use (9 * X) * 2 + X instead of (5 * X) * 4 - 1.	Craig Topper	2018-07-24	1	-2/+2
\| \| \| \| \| \|	The new lowering can be done in 2 LEAs. The old code took 1 LEA, 1 shift, and 1 sub. llvm-svn: 337851
*	[MachineOutliner][NFC] Move target frame info into OutlinedFunction	Jessica Paquette	2018-07-24	4	-21/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Just some gardening here. Similar to how we moved call information into Candidates, this moves outlined frame information into OutlinedFunction. This allows us to remove TargetCostInfo entirely. Anywhere where we returned a TargetCostInfo struct, we now return an OutlinedFunction. This establishes OutlinedFunctions as more of a general repeated sequence, and Candidates as occurrences of those repeated sequences. llvm-svn: 337848
*	Put "built-in" function definitions in global Used list, for LTO. (fix bug ↵	Peter Collingbourne	2018-07-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	34169) When building with LTO, builtin functions that are defined but whose calls have not been inserted yet, get internalized. The Global Dead Code Elimination phase in the new LTO implementation then removes these function definitions. Later optimizations add calls to those functions, and the linker then dies complaining that there are no definitions. This CL fixes the new LTO implementation to check if a function is builtin, and if so, to not internalize (and later DCE) the function. As part of this fix I needed to move the RuntimeLibcalls.{def,h} files from the CodeGen subdirectory to the IR subdirectory. I have updated all the files that accessed those two files to access their new location. Fixes PR34169 Patch by Caroline Tice! Differential Revision: https://reviews.llvm.org/D49434 llvm-svn: 337847
*	[x86] Teach the x86 backend that it can fold between TCRETURNm* and ↵	Chandler Carruth	2018-07-24	2	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TCRETURNr* and fix latent bugs with register class updates. Summary: Enabling this fully exposes a latent bug in the instruction folding: we never update the register constraints for the register operands when fusing a load into another operation. The fused form could, in theory, have different register constraints on its operands. And in fact, TCRETURNm* needs its memory operands to use tailcall compatible registers. I've updated the folding code to re-constrain all the registers after they are mapped onto their new instruction. However, we still can't enable folding in the general case from TCRETURNr* to TCRETURNm* because doing so may require more registers to be available during the tail call. If the call itself uses all but one register, and the folded load would require both a base and index register, there will not be enough registers to allocate the tail call. It would be better, IMO, to teach the register allocator to unfold TCRETURNm* when it runs out of registers (or specifically check the number of registers available during the TCRETURNr) but I'm not going to try and solve that for now. Instead, I've just blocked the forward folding from r -> m, leaving LLVM free to unfold from m -> r as that doesn't introduce new register pressure constraints. The down side is that I don't have anything that will directly exercise this. Instead, I will be immediately using this in my SLH patch. =/ Still worse, without allowing the TCRETURNr -> TCRETURNm* fold, I don't have any tests that demonstrate the failure to update the memory operand register constraints. This patch still seems correct, but I'm nervous about the degree of testing due to this. Suggestions? Reviewers: craig.topper Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D49717 llvm-svn: 337845
*	[MachineOutliner][NFC] Make Candidates own their call information	Jessica Paquette	2018-07-24	4	-41/+55
\| \| \| \| \| \| \| \| \| \| \| \| \|	Before this, TCI contained all the call information for each Candidate. This moves that information onto the Candidates. As a result, each Candidate can now supply how it ought to be called. Thus, Candidates will be able to, say, call the same function in cheaper ways when possible. This also removes that information from TCI, since it's no longer used there. A follow-up patch for the AArch64 outliner will demonstrate this. llvm-svn: 337840
*	[mips] Fix local dynamic TLS with Sym64	Simon Atanasyan	2018-07-24	6	-22/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For the final DTPREL addition, rather than a lui/daddiu/daddu triple, LLVM was erroneously emitting a daddiu/daddiu pair, treating the %dtprel_hi as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64. Instead, use a new TlsHi node and, although unnecessary due to the exact structure of the nodes emitted, use TlsHi for local exec too to prevent future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes, and TprelHi since its functionality is provided by the new common TlsHi node. Patch by James Clarke. Differential revision: https://reviews.llvm.org/D49259 llvm-svn: 337827
*	[x86/SLH] Extract the core register hardening logic to a low-level	Chandler Carruth	2018-07-24	1	-36/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	helper and restructure the post-load hardening to use this. This isn't as trivial as I would have liked because the post-load hardening used a trick that only works for it where it swapped in a temporary register to the load rather than replacing anything. However, there is a simple way to do this without that trick that allows this to easily reuse a friendly API for hardening a value in a register. That API will in turn be usable in subsequent patches. This also technically changes the position at which we insert the subreg extraction for the predicate state, but that never resulted in an actual instruction and so tests don't change at all. llvm-svn: 337825
*	[x86/SLH] Tidy up a comment, using doxygen structure and wording it to	Chandler Carruth	2018-07-24	1	-5/+7
\| \| \| \| \| \|	be more accurate and understandable. llvm-svn: 337822
*	[ARM] Disable ARMCodeGenPrepare by default	Sam Parker	2018-07-24	1	-1/+1
\| \| \| \| \| \| \| \|	ARM Stage 2 builders have been suspiciously broken since the pass was committed. Disabling to hopefully fix the bots and give me time to debug. llvm-svn: 337821
*	AMDGPU/GlobalISel: Legalize G_INSERT	Tom Stellard	2018-07-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49601 llvm-svn: 337798
*	AMDGPU/GlobalISel: Remove unnecessary legality constraint for G_EXTRACT	Tom Stellard	2018-07-24	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We were marking G_EXTRACT operations unsupported if the output type was larger than the input type. I don't see how this could ever actually happen, so I dropped the constraint. Doing this makes it possible to reuse the same legality code for G_INSERT. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49600 llvm-svn: 337794
*	[x86/SLH] Simplify the code for hardening a loaded value. NFC.	Chandler Carruth	2018-07-24	1	-20/+15
\| \| \| \| \| \| \|	This is in preparation for extracting this into a re-usable utility in this code. llvm-svn: 337785
*	[x86/SLH] Remove complex SHRX-based post-load hardening.	Chandler Carruth	2018-07-24	1	-73/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This code was really nasty, had several bugs in it originally, and wasn't carrying its weight. While on Zen we have all 4 ports available for SHRX, on all of the Intel parts with Agner's tables, SHRX can only execute on 2 ports, giving it 1/2 the throughput of OR. Worse, all too often this pattern required two SHRX instructions in a chain, hurting the critical path by a lot. Even if we end up needing to save/restore EFLAGS, that is no longer so bad. We pay for a uop to save the flag, but we very likely get fusion when it is used by forming a test/jCC pair or something similar. In practice, I don't expect the SHRX to be a significant savings here, so I'd like to avoid the complex code required. We can always resurrect this if/when someone has a specific performance issue addressed by it. llvm-svn: 337781
*	[AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes	Martin Storsjo	2018-07-23	2	-9/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This matches the structure used on X86 and ARM. This requires a little bit of duplication of the parts that are equal in both AArch64 COFF variants though. Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF didn't, but now they do. This makes AArch64 match X86 in how comdat is used for float constants for MinGW. Differential Revision: https://reviews.llvm.org/D49637 llvm-svn: 337755
*	Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code ↵	Reid Kleckner	2018-07-23	6	-29/+132
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	models" Don't try to generate large PIC code for non-ELF targets. Neither COFF nor MachO have relocations for large position independent code, and users have been using "large PIC" code models to JIT 64-bit code for a while now. With this change, if they are generating ELF code, their JITed code will truly be PIC, but if they target MachO or COFF, it will contain 64-bit immediates that directly reference external symbols. For a JIT, that's perfectly fine. llvm-svn: 337740
*	[Hexagon] Handle unnamed globals in HexagonConstExpr	Krzysztof Parzyszek	2018-07-23	1	-3/+15
\| \| \| \| \| \|	Instead of comparing names, compare positions in the parent module. llvm-svn: 337723
*	[ARM] Use unique_ptr to fix memory leak introduced in r337701	Fangrui Song	2018-07-23	1	-11/+9
\| \| \| \|	llvm-svn: 337714
*	OpChain has subclasses, so add a virtual destructor.	Jordan Rupprecht	2018-07-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: OpChain has subclasses, so add a virtual destructor. This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701. Reviewers: javed.absar Subscribers: llvm-commits, SjoerdMeijer, samparker Differential Revision: https://reviews.llvm.org/D49681 llvm-svn: 337713
*	[ARM] Follow-up to r337709.	Matt Morehouse	2018-07-23	1	-2/+0
\| \| \| \| \| \|	Fix double-free. llvm-svn: 337711
*	[ARM] Add doFinalization() to ARMCodeGenPrepare pass.	Matt Morehouse	2018-07-23	1	-0/+6
\| \| \| \| \| \| \|	Attempt to fix the leak introduced in r337687 and make sanitizer buildbots green again. llvm-svn: 337709
*	[ARM][NFC] ParallelDSP reorganisation	Sam Parker	2018-07-23	1	-88/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In preparing to allow ARMParallelDSP pass to parallelise more than smlads, I've restructured some elements: - The ParallelMAC struct has been renamed to BinOpChain. - The BinOpChain struct holds two value lists: LHS and RHS, as well as inheriting from the OpChain base class. - The OpChain struct holds all the values of the represented chain and has had the memory locations functionality inserted into it. - ParallelMACList becomes OpChainList and it now holds pointers instead of objects. Differential Revision: https://reviews.llvm.org/D49020 llvm-svn: 337701
*	[SystemZ] Fix dumpSU() method in SystemZHazardRecognizer.	Jonas Paulsson	2018-07-23	1	-1/+5
\| \| \| \| \| \| \| \|	Two minor issues: The new MCD SchedWrite name does not contain "Unit" like all the others, so a check is needed. Also, print "LSU" instead of "LS". Review: Ulrich Weigand llvm-svn: 337700
*	[ARM] ARMCodeGenPrepare backend pass	Sam Parker	2018-07-23	4	-0/+757
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Arm specific codegen prepare is implemented to perform type promotion on icmp operands, which can enable the removal of uxtb and uxth (unsigned extend) instructions. This is possible because performing type promotion before ISel alleviates this duty from the DAG builder which has to perform legalisation, but has a limited view on data ranges. The pass visits any instruction operand of an icmp and creates a worklist to traverse the use-def tree to determine whether the values can simply be promoted. Our concern is values in the registers overflowing the narrow (i8, i16) data range, so instructions marked with nuw can be promoted easily. For add and sub instructions, we are able to use the parallel dsp instructions to operate on scalar data types and avoid overflowing bits. Underflowing adds and subs are also permitted when the result is only used by an unsigned icmp. Differential Revision: https://reviews.llvm.org/D48832 llvm-svn: 337687
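The overflow concern described in this message can be illustrated with a small model of an i8 add versus the same add done on zero-extended 32-bit values. This is only a sketch of why `nuw` makes promotion easy, not the pass's logic; the function names are hypothetical.

```python
def add_i8(a: int, b: int) -> int:
    """Unsigned 8-bit add, wrapping at 256 like the narrow operation."""
    return (a + b) & 0xFF

def add_promoted(a: int, b: int) -> int:
    """The same add after zero-extending both operands to 32 bits."""
    return ((a & 0xFF) + (b & 0xFF)) & 0xFFFF_FFFF

def promotion_preserves_result(a: int, b: int) -> bool:
    """True when the narrow add does not wrap, i.e. the nuw case."""
    return add_i8(a, b) == add_promoted(a, b)
```

`100 + 27` gives the same value either way, so a following unsigned compare sees identical operands; `200 + 100` wraps to 44 in i8 but yields 300 once promoted, which is the case the pass has to handle separately.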
*	[NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on ↵	Roman Lebedev	2018-07-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	partially written registers. Summary: Pretty mechanical follow-up for D49196. As microarchitecture.pdf notes, "20 AMD Ryzen pipeline", "20.8 Register renaming and out-of-order schedulers": The integer register file has 168 physical registers of 64 bits each. The floating point register file has 160 registers of 128 bits each. "20.14 Partial register access": The processor always keeps the different parts of an integer register together. ... An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it. Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh Reviewed By: GGanesh Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D49393 llvm-svn: 337676
*	[x86/SLH] Fix a bug where we would harden tail calls twice -- once as	Chandler Carruth	2018-07-23	1	-1/+5
\| \| \| \| \| \| \| \| \|	a call, and then again as a return. Also added a comment to try and explain better why we would be doing what we're doing when hardening the (non-call) returns. llvm-svn: 337673