bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[DAGCombine] Enable more pre-indexed stores	Sam Parker	2019-01-23	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \|	The current check in CombineToPreIndexedLoadStore is too conversative, preventing a pre-indexed store when the base pointer is a predecessor of the value being stored. Instead, we should check the pointer operand of the store. Differential Revision: https://reviews.llvm.org/D56719 llvm-svn: 351933
*	[SLH] AArch64: correctly pick temporary register to mask SP	Kristof Beyls	2019-01-23	1	-57/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of speculation hardening, the stack pointer gets masked with the taint register (X16) before a function call or before a function return. Since there are no instructions that can directly mask writing to the stack pointer, the stack pointer must first be transferred to another register, where it can be masked, before that value is transferred back to the stack pointer. Before, that temporary register was always picked to be x17, since the ABI allows clobbering x17 on any function call, resulting in the following instruction pattern being inserted before function calls and returns/tail calls: mov x17, sp and x17, x17, x16 mov sp, x17 However, x17 can be live in those locations, for example when the call is an indirect call, using x17 as the target address (blr x17). To fix this, this patch looks for an available register just before the call or terminator instruction and uses that. In the rare case when no register turns out to be available (this situation is only encountered twice across the whole test-suite), just insert a full speculation barrier at the start of the basic block where this occurs. Differential Revision: https://reviews.llvm.org/D56717 llvm-svn: 351930
*	[SystemZ] Handle DBG_VALUE instructions in two places in backend.	Jonas Paulsson	2019-01-23	2	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two backend optimizations failed to handle cases when compiled with -g, due to failing to consider DBG_VALUE instructions. This was in SystemZTargetLowering::emitSelect() and SystemZElimCompare::getRegReferences(). This patch makes sure that DBG_VALUEs are recognized so that they do not affect these optimizations. Tests for branch-on-count, load-and-trap and consecutive selects. Review: Ulrich Weigand https://reviews.llvm.org/D57048 llvm-svn: 351928
*	[IRCE] Support narrow latch condition for wide range checks	Max Kazantsev	2019-01-23	1	-11/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch relaxes restrictions on types of latch condition and range check. In current implementation, they should match. This patch allows to handle wide range checks against narrow condition. The motivating example is the following: int N = ... for (long i = 0; (int) i < N; i++) { if (i >= length) deopt; } In this patch, the option that enables this support is turned off by default. We'll wait until it is switched to true. Differential Revision: https://reviews.llvm.org/D56837 Reviewed By: reames llvm-svn: 351926
*	[Pipeliner] Add two pragmas to control software pipelining optimization	Brendon Cahoon	2019-01-23	1	-7/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	#pragma clang loop pipeline(disable) Disable SWP optimization for the next loop. “disable” is the only possible value. #pragma clang loop pipeline_initiation_interval(number) Set value of initiation interval for SWP optimization to specified number value for the next loop. Number is the positive value greater than 0. These pragmas could be used for debugging or reducing compile time purposes. It is possible to disable SWP for concrete loops to save compilation time or to find bugs by not doing SWP to certain loops. It is possible to set value of initiation interval to concrete number to save compilation time by not doing extra pipeliner passes or to check created schedule for specific initiation interval. That is llvm part of the fix Clang part of fix: https://reviews.llvm.org/D55710 Patch by Alexey Lapshin! Differential Revision: https://reviews.llvm.org/D56403 llvm-svn: 351923
*	hwasan: Move memory access checks into small outlined functions on aarch64.	Peter Collingbourne	2019-01-23	5	-24/+172
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Each hwasan check requires emitting a small piece of code like this: https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses The problem with this is that these code blocks typically bloat code size significantly. An obvious solution is to outline these blocks of code. In fact, this has already been implemented under the -hwasan-instrument-with-calls flag. However, as currently implemented this has a number of problems: - The functions use the same calling convention as regular C functions. This means that the backend must spill all temporary registers as required by the platform's C calling convention, even though the check only needs two registers on the hot path. - The functions take the address to be checked in a fixed register, which increases register pressure. Both of these factors can diminish the code size effect and increase the performance hit of -hwasan-instrument-with-calls. The solution that this patch implements is to involve the aarch64 backend in outlining the checks. An intrinsic and pseudo-instruction are created to represent a hwasan check. The pseudo-instruction is register allocated like any other instruction, and we allow the register allocator to select almost any register for the address to check. A particular combination of (register selection, type of check) triggers the creation in the backend of a function to handle the check for specifically that pair. The resulting functions are deduplicated by the linker. The pseudo-instruction (really the function) is specified to preserve all registers except for the registers that the AAPCS specifies may be clobbered by a call. To measure the code size and performance effect of this change, I took a number of measurements using Chromium for Android on aarch64, comparing a browser with inlined checks (the baseline) against a browser with outlined checks. Code size: Size of .text decreases from 243897420 to 171619972 bytes, or a 30% decrease. Performance: Using Chromium's blink_perf.layout microbenchmarks I measured a median performance regression of 6.24%. The fact that a perf/size tradeoff is evident here suggests that we might want to make the new behaviour conditional on -Os/-Oz. But for now I've enabled it unconditionally, my reasoning being that hwasan users typically expect a relatively large perf hit, and ~6% isn't really adding much. We may want to revisit this decision in the future, though. I also tried experimenting with varying the number of registers selectable by the hwasan check pseudo-instruction (which would result in fewer variants being created), on the hypothesis that creating fewer variants of the function would expose another perf/size tradeoff by reducing icache pressure from the check functions at the cost of register pressure. Although I did observe a code size increase with fewer registers, I did not observe a strong correlation between the number of registers and the performance of the resulting browser on the microbenchmarks, so I conclude that we might as well use ~all registers to get the maximum code size improvement. My results are below: Regs \| .text size \| Perf hit -----+------------+--------- ~all \| 171619972 \| 6.24% 16 \| 171765192 \| 7.03% 8 \| 172917788 \| 5.82% 4 \| 177054016 \| 6.89% Differential Revision: https://reviews.llvm.org/D56954 llvm-svn: 351920
*	MemoryBlock: Do not automatically extend a given size to a multiple of page ↵	Rui Ueyama	2019-01-23	2	-7/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	size. Previously, MemoryBlock automatically extends a requested buffer size to a multiple of page size because (I believe) doing it was thought to be harmless and with that you could get more memory (on average 2KiB on 4KiB-page systems) "for free". That programming interface turned out to be error-prone. If you request N bytes, you usually expect that a resulting object returns N for `size()`. That's not the case for MemoryBlock. Looks like there is only one place where we take the advantage of allocating more memory than the requested size. So, with this patch, I simply removed the automatic size expansion feature from MemoryBlock and do it on the caller side when needed. MemoryBlock now always returns a buffer whose size is equal to the requested size. Differential Revision: https://reviews.llvm.org/D56941 llvm-svn: 351916
*	[CodeView] Allow empty types in member functions	Josh Stone	2019-01-23	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: `CodeViewDebug::lowerTypeMemberFunction` used to default to a `Void` return type if the function's type array was empty. After D54667, it started blindly indexing the 0th item for the return type, which fails in `getOperand` for empty arrays if assertions are enabled. This patch restores the `Void` return type for empty type arrays, and adds a test generated by Rust in line-only debuginfo mode. Reviewers: zturner, rnk Reviewed By: rnk Subscribers: hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D57070 llvm-svn: 351910
*	Fixed isReMaterializable setting for LUI instruction.	Ana Pazos	2019-01-22	1	-1/+2
\| \| \| \|	llvm-svn: 351895
*	[HotColdSplit] Calculate BFI lazily to reduce compile-time, NFC	Vedant Kumar	2019-01-22	1	-5/+12
\| \| \| \| \| \| \| \| \| \|	The splitting pass does not need BFI unless the Module actually has a profile summary. Do not calcualte BFI unless the summary is present. For the sqlite3 amalgamation, this reduces time spent in the splitting pass from 0.4% of the total to under 0.1%. llvm-svn: 351894
*	[HotColdSplit] Calculate domtrees lazily to reduce compile-time, NFC	Vedant Kumar	2019-01-22	1	-25/+21
\| \| \| \| \| \| \| \| \| \|	The splitting pass does not need (post)domtrees until after it's found a cold block. Defer domtree calculation until a cold block is found. For the sqlite3 amalgamation, this reduces time spent in the splitting pass from 0.8% of the total to 0.4%. llvm-svn: 351892
*	[LegalizeTypes] Add debug prints to the top of PromoteFloatOperand and ↵	Craig Topper	2019-01-22	1	-0/+12
\| \| \| \| \| \| \| \| \| \|	PromoteFloatResult. Also add debug prints in the default case of the switches in these routines. Most if not all of the type legalization handlers already do this so this makes promoting floats consistent llvm-svn: 351890
*	AMDGPU/GlobalISel: Start selectively legalizing 16-bit operations	Matt Arsenault	2019-01-22	1	-4/+9
\| \| \| \| \| \| \| \|	It might be a bit nicer to use the fancy .legalIf and co. predicates, but this was requiring more boilerplate and disables the coverage assertions. llvm-svn: 351886
*	AMDGPU/GlobalISel: Handle legality/regbanks for 32/64-bit shifts	Matt Arsenault	2019-01-22	2	-2/+5
\| \| \| \|	llvm-svn: 351884
*	FileOutputBuffer: handle mmap(2) failure	Rui Ueyama	2019-01-22	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the underlying filesystem does not support mmap system call, FileOutputBuffer may fail when it attempts to mmap an output temporary file. This patch handles such situation. Unfortunately, it looks like it is very hard to test this functionality without a filesystem that doesn't support mmap using llvm-lit. I tested this locally by passing an invalid parameter to mmap so that it fails and falls back to the in-memory buffer. Maybe that's all what we can do. I believe it is reasonable to submit this without a test. Differential Revision: https://reviews.llvm.org/D56949 llvm-svn: 351883
*	GlobalISel: Allow shift amount to be a different type	Matt Arsenault	2019-01-22	8	-48/+115
\| \| \| \| \| \| \| \| \|	For AMDGPU the shift amount is never 64-bit, and this needs to use a 32-bit shift. X86 uses i8, but seemed to be hacking around this before. llvm-svn: 351882
*	[FileCheck] Suppress old -v/-vv diags if dumping input	Joel E. Denny	2019-01-22	1	-17/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old diagnostic form of the trace produced by -v and -vv looks like: ``` check1:1:8: remark: CHECK: expected string found in input CHECK: abc ^ <stdin>:1:3: note: found here ; abc def ^~~ ``` When dumping annotated input is requested (via -dump-input), I find that this old trace is not useful and is sometimes harmful: 1. The old trace is mostly redundant because the same basic information also appears in the input dump's annotations. 2. The old trace buries any error diagnostic between it and the input dump, but I find it useful to see any error diagnostic up front. 3. FILECHECK_OPTS=-dump-input=fail requests annotated input dumps only for failed FileCheck calls. However, I have to also add -v or -vv to get a full set of annotations, and that can produce massive output from all FileCheck calls in all tests. That's a real problem when I run this in the IDE I use, which grinds to a halt as it tries to capture all that output. When -dump-input=fail\|always, this patch suppresses the old trace from -v or -vv. Error diagnostics still print as usual. If you want the old trace, perhaps to see variable expansions, you can set -dump-input=none (the default). Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D55825 llvm-svn: 351881
*	GlobalISel: Make buildConstant handle vectors	Matt Arsenault	2019-01-22	1	-4/+38
\| \| \| \| \| \| \|	Produce a splat build_vector similar to how SelectionDAG::getConstant does. llvm-svn: 351880
*	GlobalISel: Implement widen for extract_vector_elt elt type	Matt Arsenault	2019-01-22	2	-4/+32
\| \| \| \|	llvm-svn: 351871
*	GlobalISel: Implement fewerElementsVector for basic FP ops	Matt Arsenault	2019-01-22	2	-27/+65
\| \| \| \|	llvm-svn: 351866
*	Add missing include (cstdlib) to Demangle.h	Konstantin Zhuravlyov	2019-01-22	1	-0/+1
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D57035 llvm-svn: 351861
*	AMDGPU/GlobalISel: Remove vectors from legal constant types	Matt Arsenault	2019-01-22	1	-1/+1
\| \| \| \|	llvm-svn: 351859
*	GlobalISel: Support narrowing zextload/sextload	Matt Arsenault	2019-01-22	2	-0/+45
\| \| \| \|	llvm-svn: 351856
*	[SelectionDAGBuilder] Defer C_Register Assignments to be in line with	Nirav Dave	2019-01-22	1	-13/+3
\| \| \| \| \| \|	those of C_RegisterClass. NFCI. llvm-svn: 351854
*	GlobalISel: Disallow vectors for G_CONSTANT/G_FCONSTANT	Matt Arsenault	2019-01-22	2	-4/+12
\| \| \| \|	llvm-svn: 351853
*	FileOutputBuffer: Handle "-" as stdout.	Rui Ueyama	2019-01-22	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I was honestly a bit surprised that we didn't do this before. This patch is to handle "-" as the stdout so that if you pass `-o -` to lld, for example, it writes an output to stdout instead of file `-`. I thought that we might want to handle this at a higher level than FileOutputBuffer, because if we land this patch, we can no longer create a file whose name is `-` (there's a workaround though; you can pass `./-` instead of `-`). However, because raw_fd_ostream already handles `-` as a special file name, I think it's okay and actually consistent to handle `-` as a special name in FileOutputBuffer. Differential Revision: https://reviews.llvm.org/D56940 llvm-svn: 351852
*	Codegen support for atomicrmw fadd/fsub	Matt Arsenault	2019-01-22	11	-17/+63
\| \| \| \|	llvm-svn: 351851
*	Reapply "IR: Add fp operations to atomicrmw"	Matt Arsenault	2019-01-22	10	-14/+76
\| \| \| \| \| \| \|	This reapplies commits r351778 and r351782 with RISCV test fixes. llvm-svn: 351850
*	[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target.	Alexey Bataev	2019-01-22	3	-12/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Enable full support for the debug info. Reviewers: echristo Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D46189 llvm-svn: 351846
*	Revert r351520, "Re-enable terminator folding in LoopSimplifyCFG"	Jordan Rupprecht	2019-01-22	1	-1/+1
\| \| \| \| \| \|	This is still causing compilation crashes in some targets. Will follow up shortly with a repro. llvm-svn: 351845
*	[DEBUG_INFO, NVPTX] Fix relocation info.	Alexey Bataev	2019-01-22	4	-22/+57
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Initial function labels must follow the debug location for the correct relocation info generation. Reviewers: tra, jlebar, echristo Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D45784 llvm-svn: 351843
*	[DAGCombiner] narrow vector binop with 2 insert subvector operands	Sanjay Patel	2019-01-22	1	-1/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z This is another step in generic vector narrowing. It's also a step towards more horizontal op formation specifically for x86 (although we still failed to match those in the affected tests). The scalarization cases are also not optimal (we should be scalarizing those), but it's still an improvement to use a narrower vector op when we know part of the result must be constant because both inputs are undef in some vector lanes. I think a similar match but checking for a constant operand might help some of the cases in D51553. Differential Revision: https://reviews.llvm.org/D56875 llvm-svn: 351825
*	[RISCV][NFC] Change naming scheme for RISC-V specific DAG nodes	Alex Bradbury	2019-01-22	1	-43/+50
\| \| \| \| \| \| \| \| \| \|	Previously we had names like 'Call' or 'Tail'. This potentially clashes with the naming scheme used elsewhere in RISCVInstrInfo.td. Many other backends would use names like AArch64call or PPCtail. I prefer the SystemZ approach, which uses prefixed all-lowercase names. This matches the naming scheme used for target-independent SelectionDAG nodes. llvm-svn: 351823
*	[X86][SSE] Canonicalize OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y))	Simon Pilgrim	2019-01-22	1	-0/+70
\| \| \| \| \| \| \| \| \| \|	For constant bit select patterns, replace one AND with a ANDNP, allowing us to reuse the constant mask. Only do this if the mask has multiple uses (to avoid losing load folding) or if we have XOP as its VPCMOV can handle most folding commutations. This also requires computeKnownBitsForTargetNode support for X86ISD::ANDNP and X86ISD::FOR to prevent regressions in fabs/fcopysign patterns. Differential Revision: https://reviews.llvm.org/D55935 llvm-svn: 351819
*	[X86][BtVer2] SSE2 vector shifts has local forwarding disabled	Simon Pilgrim	2019-01-22	1	-2/+2
\| \| \| \| \| \| \| \|	Similar to horizontal ops on D56777, the sse2 (but not mmx) bit shift ops has local forwarding disabled, adding +1cy to the use latency for the result. Differential Revision: https://reviews.llvm.org/D57026 llvm-svn: 351817
*	Fix "comparison of unsigned expression >= 0 is always true" warning. NFCI.	Simon Pilgrim	2019-01-22	1	-1/+1
\| \| \| \|	llvm-svn: 351816
*	[X86][BtVer2] X86ISD::VPERMILPV has local forwarding disabled	Simon Pilgrim	2019-01-22	1	-2/+2
\| \| \| \| \| \| \| \|	Similar to horizontal ops on D56777, the vpermilpd/vpermilps variable mask ops has local forwarding disabled, adding +1cy to the use latency for the result. Differential Revision: https://reviews.llvm.org/D57022 llvm-svn: 351815
*	[CostModel][X86] Add ICMP Predicate specific costs	Simon Pilgrim	2019-01-22	1	-8/+49
\| \| \| \| \| \| \| \|	First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly. Differential Revision: https://reviews.llvm.org/D57013 llvm-svn: 351810
*	[X86][SSE] Add selective commutation support for insertps (PR40340)	Simon Pilgrim	2019-01-22	3	-0/+24
\| \| \| \| \| \| \| \|	When we are inserting 1 "inline" element, and zeroing 2 of the other elements then we can safely commute the insertps source inputs to improve memory folding. Differential Revision: https://reviews.llvm.org/D56843 llvm-svn: 351807
*	[RISCV] Quick fix for PR40333	Alex Bradbury	2019-01-22	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid the infinite loop caused by the target DAG combine converting ANYEXT to SIGNEXT and the target-independent DAG combine logic converting back to ANYEXT. Do this by not adding the new node to the worklist. Committing directly as this definitely doesn't make the problem any worse, and I intend to follow-up with a patch that avoids this custom combiner logic altogether and just lowers the i32 operations to a target-specific SelectionDAG node. This should be easier to reason about and improve codegen quality in some cases (though may miss out on some later DAG combines). llvm-svn: 351806
*	[LoopPredication] Support guards expressed as branches by widenable condition	Max Kazantsev	2019-01-22	1	-4/+60
\| \| \| \| \| \| \| \| \| \|	This patch adds support of guards expressed as branches by widenable conditions in Loop Predication. Differential Revision: https://reviews.llvm.org/D56081 Reviewed By: reames llvm-svn: 351805
*	[NFC] Add function to parse widenable conditional branches	Max Kazantsev	2019-01-22	1	-17/+14
\| \| \| \|	llvm-svn: 351803
*	[X86] HADDPS/HADDPD scalar lowering was added at rL350421	Simon Pilgrim	2019-01-22	1	-12/+0
\| \| \| \|	llvm-svn: 351797
*	Revert r351778: IR: Add fp operations to atomicrmw	Chandler Carruth	2019-01-22	10	-76/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	This broke the RISCV build, and even with that fixed, one of the RISCV tests behaves surprisingly differently with asserts than without, leaving there no clear test pattern to use. Generally it seems bad for hte IR to differ substantially due to asserts (as in, an alloca is used with asserts that isn't needed without!) and nothing I did simply would fix it so I'm reverting back to green. This also required reverting the RISCV build fix in r351782. llvm-svn: 351796
*	[llvm-symbolizer] Add support for --basenames/-s	James Henderson	2019-01-22	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes https://bugs.llvm.org/show_bug.cgi?id=40068. --basenames is a GNU addr2line switch which strips the directory names from the file path in the output. Reviewed by: ruiu Differential Revision: https://reviews.llvm.org/D56919 llvm-svn: 351795
*	[NFC] Factor out some reusable logic	Max Kazantsev	2019-01-22	1	-15/+21
\| \| \| \|	llvm-svn: 351794
*	[NFC] Add detector for guards expressed as branch by widenable conditions	Max Kazantsev	2019-01-22	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a function to detect guards expressed in explicit control flow form as branch by `and` with widenable condition intrinsic call: %wc = call i1 @llvm.experimental.widenable.condition() %guard_cond = and i1, %some_cond, %wc br i1 %guard_cond, label %guarded, label %deopt deopt: <maybe some non-side-effecting instructions> deoptimize() This form can be used as alternative to implicit control flow guard representation expressed by `experimental_guard` intrinsic. Differential Revision: https://reviews.llvm.org/D56074 Reviewed By: reames llvm-svn: 351791
*	[RISCV][NFC] Add break to case statement in RISCVDAGToDAGISel::Select	Alex Bradbury	2019-01-22	1	-0/+1
\| \| \| \| \| \| \|	The break isn't strictly needed yet as there is no subsequent entry in the case. But adding to prevent mistakes further down the road. llvm-svn: 351785
*	[RISCV] Fix build after r351778	Alex Bradbury	2019-01-22	1	-3/+6
\| \| \| \| \| \| \|	Also add a comment to explain the expansion strategy for atomicrmw {fadd,fsub}. llvm-svn: 351782
*	IR: Add fp operations to atomicrmw	Matt Arsenault	2019-01-22	10	-14/+73
\| \| \| \| \| \|	Add just fadd/fsub for now. llvm-svn: 351778