bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target.	Alexey Bataev	2019-01-23	4	-12/+18
	Enable full support for the debug info. Recommit to fix the emission of an unneeded closing brace. Differential revision: https://reviews.llvm.org/D46189 llvm-svn: 351972
*	Revert "[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target."	Haojian Wu	2019-01-23	3	-8/+12
	This reverts commit r351846. This patch may generate illegal assembly code, see
	```
	$ ./bin/clang -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-grtev4-linux-gnu -S -disable-free -disable-llvm-verifier -discard-value-names -main-file-name new.cc -mrelocation-model pic -pic-level 2 -mthread-model posix -fmerge-all-constants -mdisable-fp-elim -relaxed-aliasing -no-integrated-as -mpie-copy-relocations -munwind-tables -fcuda-is-device -target-feature +ptx60 -target-cpu sm_35 -dwarf-column-info -debug-info-kind=line-directives-only -dwarf-version=2 -debugger-tuning=gdb -o empty.s -x cuda empty.cc
	$ cat empty.s
	//
	// Generated by LLVM NVPTX Back-End
	//
	.version 6.0
	.target sm_35
	.address_size 64
	}
	```
	llvm-svn: 351966
*	[MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit.	Andrea Di Biagio	2019-01-23	16	-8/+68
	This patch adds a new ReadAdvance definition named ReadInt2Fpu. ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by data transfers from the integer unit to the floating point unit. ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all x86 models excluding BtVer2. That means this patch is a functional change for the Jaguar CPU model only. Tablegen definitions for instructions (V)PINSR* have been updated to account for the new ReadInt2Fpu. That read is mapped to the GPR input operand. On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch, that extra delay was added to the opcode latency. In practice, the insert opcode only executes for 1cy. Most of the latency is contributed by the so-called operand latency. According to the AMD SOG for family 16h, (V)PINSR* latency is defined by the expression f+1, where f is defined as a forwarding delay from the integer unit to the fpu. When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC (only when flag -print-schedule is specified), we now need to account for any extra forwarding delays. We do this by checking if scheduling classes declare any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td: "A negative advance effectively increases latency, which may be used for cross-domain stalls". When computing the instruction latency for the purpose of our scheduling tests, we now add any extra delay to the formula. This avoids regressing existing codegen and mca schedule tests. It comes at the cost of an extra (but very simple) hook in MCSchedModel. Differential Revision: https://reviews.llvm.org/D57056 llvm-svn: 351965
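The latency accounting described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not LLVM's actual MCSchedModel hook: `ReadAdvanceEntry`, `extraForwardingDelay` and `reportedLatency` are hypothetical names, and taking the largest negative advance as the extra delay is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a ReadAdvance entry (not an LLVM type).
struct ReadAdvanceEntry {
  int Cycles; // negative values model an extra forwarding delay
};

// Extra delay implied by negative ReadAdvance entries (0 if there are none).
// Assumption: the largest delay among the entries is the one that counts.
inline int extraForwardingDelay(const std::vector<ReadAdvanceEntry> &Entries) {
  int Delay = 0;
  for (const ReadAdvanceEntry &E : Entries)
    Delay = std::max(Delay, -E.Cycles);
  return Delay;
}

// Latency printed for scheduling tests: opcode latency plus forwarding delay.
inline int reportedLatency(int OpcodeLatency,
                           const std::vector<ReadAdvanceEntry> &Entries) {
  assert(OpcodeLatency >= 0 && "latency must be non-negative");
  return OpcodeLatency + extraForwardingDelay(Entries);
}
```

With the Jaguar numbers quoted in the message (1cy opcode latency, 6cy int-to-fpu transfer), `reportedLatency(1, {ReadAdvanceEntry{-6}})` yields 7, matching the f+1 expression with f = 6.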
*	Fix indentation. NFCI.	Simon Pilgrim	2019-01-23	1	-13/+13
	llvm-svn: 351958
*	[IR] Match intrinsic parameter by scalar/vectorwidth	Simon Pilgrim	2019-01-23	1	-11/+14
	This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth. The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth. I've updated the _overflow intrinsics to demonstrate this, allowing them to return an i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type. The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1>, so there is no change in behaviour. Differential Revision: https://reviews.llvm.org/D57090 llvm-svn: 351957
*	[Hexagon] Remove incorrect bit negation	Krzysztof Parzyszek	2019-01-23	1	-1/+1
	llvm-svn: 351956
*	[AArch64] Fix out of bounds strlen	Benjamin Kramer	2019-01-23	1	-2/+2
	CFIInst is not zero-terminated. This is one of the more annoying functional differences between StringRef and ArrayRef. Found by asan. llvm-svn: 351955
*	Move saturated arithmetic intrinsics to other integer intrinsics. NFCI.	Simon Pilgrim	2019-01-23	1	-4/+4
	They were in the floating point group. llvm-svn: 351953
*	[AMDGPU] With XNACK, cannot clause a load with result coalesced with operand	Tim Renouf	2019-01-23	1	-0/+11
	Summary: With XNACK, an smem load whose result is coalesced with an operand (thus it overwrites its own operand) cannot appear in a clause, because some other instruction might XNACK and restart the whole clause. The clause breaker already realized that an smem that overwrites an operand cannot appear in a clause, and broke the clause. The problem that this commit fixes is that the SIFormMemoryClauses optimization formed a bundle with early clobber, which caused the earlier code that set up the coalesced operand to be removed as dead. Differential Revision: https://reviews.llvm.org/D57008 Change-Id: I703c4d5b0bf7d6060222bec491f45c18bb3c0016 llvm-svn: 351950
*	[HotColdSplitting] Remove unused SSAUpdater.h include (NFC).	Florian Hahn	2019-01-23	1	-1/+0
	llvm-svn: 351945
*	[ARM] Alter the register allocation order for minsize on Thumb2	David Green	2019-01-23	1	-4/+27
	Currently in Arm code, we allocate LR first, under the assumption that it needs to be saved anyway. Unfortunately this has the disadvantage that it will require any instructions using it to be the longer thumb2 instructions, not the shorter thumb1 ones. This switches the order when we are optimising for minsize, returning to the default order so that more lower registers can be used. It can end up requiring more pushed registers, but on average produces smaller code. Differential Revision: https://reviews.llvm.org/D56008 llvm-svn: 351938
*	[ARM][CGP] Check trunc type before replacing	Sam Parker	2019-01-23	1	-7/+13
	In the last stage of type promotion, we replace any zext that uses a new trunc with the operand of the trunc. This was okay when we only allowed one type to be optimised, but now the trunc may be needed to produce a narrower type than the one we were optimising for. So we need to check this before doing the replacement. Differential Revision: https://reviews.llvm.org/D57041 llvm-svn: 351935
*	[DAGCombine] Enable more pre-indexed stores	Sam Parker	2019-01-23	1	-1/+7
	The current check in CombineToPreIndexedLoadStore is too conservative, preventing a pre-indexed store when the base pointer is a predecessor of the value being stored. Instead, we should check the pointer operand of the store. Differential Revision: https://reviews.llvm.org/D56719 llvm-svn: 351933
*	[SLH] AArch64: correctly pick temporary register to mask SP	Kristof Beyls	2019-01-23	1	-57/+118
	As part of speculation hardening, the stack pointer gets masked with the taint register (X16) before a function call or before a function return. Since there are no instructions that can directly mask writing to the stack pointer, the stack pointer must first be transferred to another register, where it can be masked, before that value is transferred back to the stack pointer. Before, that temporary register was always picked to be x17, since the ABI allows clobbering x17 on any function call, resulting in the following instruction pattern being inserted before function calls and returns/tail calls:
	    mov x17, sp
	    and x17, x17, x16
	    mov sp, x17
	However, x17 can be live in those locations, for example when the call is an indirect call, using x17 as the target address (blr x17). To fix this, this patch looks for an available register just before the call or terminator instruction and uses that. In the rare case when no register turns out to be available (this situation is only encountered twice across the whole test-suite), just insert a full speculation barrier at the start of the basic block where this occurs.
	Differential Revision: https://reviews.llvm.org/D56717
	llvm-svn: 351930
*	[SystemZ] Handle DBG_VALUE instructions in two places in backend.	Jonas Paulsson	2019-01-23	2	-6/+10
	Two backend optimizations failed to handle cases when compiled with -g, due to failing to consider DBG_VALUE instructions. This was in SystemZTargetLowering::emitSelect() and SystemZElimCompare::getRegReferences(). This patch makes sure that DBG_VALUEs are recognized so that they do not affect these optimizations. Tests for branch-on-count, load-and-trap and consecutive selects. Review: Ulrich Weigand https://reviews.llvm.org/D57048 llvm-svn: 351928
*	[IRCE] Support narrow latch condition for wide range checks	Max Kazantsev	2019-01-23	1	-11/+44
	This patch relaxes restrictions on the types of the latch condition and the range check. In the current implementation, they must match. This patch allows handling wide range checks against a narrow condition. The motivating example is the following:
	    int N = ...
	    for (long i = 0; (int) i < N; i++) {
	      if (i >= length) deopt;
	    }
	In this patch, the option that enables this support is turned off by default. We'll wait until it is switched to true.
	Differential Revision: https://reviews.llvm.org/D56837
	Reviewed By: reames
	llvm-svn: 351926
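A compilable rendering of the motivating loop may make the two widths easier to see. This is only a sketch: `runLoop` is a hypothetical name, and an early return stands in for the `deopt` of the commit message.

```cpp
#include <cassert>

// Returns the number of iterations completed before the wide range check
// fails, or the full trip count if it never fails.
// Hypothetical helper for illustration; an early return models "deopt".
inline long runLoop(int N, long Length) {
  assert(Length >= 0 && "length is assumed to be non-negative");
  long Done = 0;
  // Latch condition is evaluated in the narrow type (int) ...
  for (long I = 0; static_cast<int>(I) < N; ++I) {
    // ... while the range check is evaluated in the wide type (long).
    if (I >= Length)
      return Done; // deopt
    ++Done;
  }
  return Done;
}
```

For example, `runLoop(10, 4)` stops at the range check after 4 iterations, while `runLoop(3, 100)` runs all 3 iterations.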
*	[Pipeliner] Add two pragmas to control software pipelining optimization	Brendon Cahoon	2019-01-23	1	-7/+76
	#pragma clang loop pipeline(disable)
	    Disable SWP optimization for the next loop. “disable” is the only possible value.
	#pragma clang loop pipeline_initiation_interval(number)
	    Set the initiation interval for SWP optimization to the specified number for the next loop. The number must be a positive value greater than 0.
	These pragmas can be used for debugging or for reducing compile time. SWP can be disabled for specific loops to save compilation time or to find bugs by not applying SWP to certain loops. The initiation interval can be set to a specific number to save compilation time by avoiding extra pipeliner passes, or to check the schedule created for a specific initiation interval.
	This is the LLVM part of the fix. Clang part of the fix: https://reviews.llvm.org/D55710
	Patch by Alexey Lapshin!
	Differential Revision: https://reviews.llvm.org/D56403
	llvm-svn: 351923
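As a usage sketch, the two pragmas would be placed directly before the loop they apply to. The function names and the interval value 10 below are arbitrary choices for illustration; compilers that do not know these pragmas will typically ignore them (possibly with a warning), and whether software pipelining runs at all depends on the target.

```cpp
#include <cassert>
#include <cstddef>

// Sums N elements; asks Clang not to software-pipeline this loop.
inline long sumNoPipelining(const int *Data, std::size_t N) {
  assert((Data != nullptr || N == 0) && "null data requires N == 0");
  long Sum = 0;
#pragma clang loop pipeline(disable)
  for (std::size_t I = 0; I < N; ++I)
    Sum += Data[I];
  return Sum;
}

// Sums N elements; requests an initiation interval of 10 for this loop.
// The value 10 is an arbitrary example, not a recommendation.
inline long sumWithInterval(const int *Data, std::size_t N) {
  assert((Data != nullptr || N == 0) && "null data requires N == 0");
  long Sum = 0;
#pragma clang loop pipeline_initiation_interval(10)
  for (std::size_t I = 0; I < N; ++I)
    Sum += Data[I];
  return Sum;
}
```

Both functions compute the same result; the pragmas only influence how the loop is scheduled.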
*	hwasan: Move memory access checks into small outlined functions on aarch64.	Peter Collingbourne	2019-01-23	5	-24/+172
	Each hwasan check requires emitting a small piece of code like this: https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses
	The problem with this is that these code blocks typically bloat code size significantly. An obvious solution is to outline these blocks of code. In fact, this has already been implemented under the -hwasan-instrument-with-calls flag. However, as currently implemented this has a number of problems:
	- The functions use the same calling convention as regular C functions. This means that the backend must spill all temporary registers as required by the platform's C calling convention, even though the check only needs two registers on the hot path.
	- The functions take the address to be checked in a fixed register, which increases register pressure.
	Both of these factors can diminish the code size effect and increase the performance hit of -hwasan-instrument-with-calls.
	The solution that this patch implements is to involve the aarch64 backend in outlining the checks. An intrinsic and pseudo-instruction are created to represent a hwasan check. The pseudo-instruction is register allocated like any other instruction, and we allow the register allocator to select almost any register for the address to check. A particular combination of (register selection, type of check) triggers the creation in the backend of a function to handle the check for specifically that pair. The resulting functions are deduplicated by the linker. The pseudo-instruction (really the function) is specified to preserve all registers except for the registers that the AAPCS specifies may be clobbered by a call.
	To measure the code size and performance effect of this change, I took a number of measurements using Chromium for Android on aarch64, comparing a browser with inlined checks (the baseline) against a browser with outlined checks.
	Code size: Size of .text decreases from 243897420 to 171619972 bytes, or a 30% decrease.
	Performance: Using Chromium's blink_perf.layout microbenchmarks I measured a median performance regression of 6.24%.
	The fact that a perf/size tradeoff is evident here suggests that we might want to make the new behaviour conditional on -Os/-Oz. But for now I've enabled it unconditionally, my reasoning being that hwasan users typically expect a relatively large perf hit, and ~6% isn't really adding much. We may want to revisit this decision in the future, though.
	I also tried experimenting with varying the number of registers selectable by the hwasan check pseudo-instruction (which would result in fewer variants being created), on the hypothesis that creating fewer variants of the function would expose another perf/size tradeoff by reducing icache pressure from the check functions at the cost of register pressure. Although I did observe a code size increase with fewer registers, I did not observe a strong correlation between the number of registers and the performance of the resulting browser on the microbenchmarks, so I conclude that we might as well use ~all registers to get the maximum code size improvement. My results are below:
	    Regs | .text size | Perf hit
	    -----+------------+---------
	    ~all | 171619972  | 6.24%
	      16 | 171765192  | 7.03%
	       8 | 172917788  | 5.82%
	       4 | 177054016  | 6.89%
	Differential Revision: https://reviews.llvm.org/D56954
	llvm-svn: 351920
*	MemoryBlock: Do not automatically extend a given size to a multiple of page size.	Rui Ueyama	2019-01-23	2	-7/+9
	Previously, MemoryBlock automatically extended a requested buffer size to a multiple of page size because (I believe) doing it was thought to be harmless and with that you could get more memory (on average 2KiB on 4KiB-page systems) "for free". That programming interface turned out to be error-prone. If you request N bytes, you usually expect that a resulting object returns N for `size()`. That's not the case for MemoryBlock. It looks like there is only one place where we take advantage of allocating more memory than the requested size. So, with this patch, I simply removed the automatic size expansion feature from MemoryBlock and do it on the caller side when needed. MemoryBlock now always returns a buffer whose size is equal to the requested size. Differential Revision: https://reviews.llvm.org/D56941 llvm-svn: 351916
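A caller that still wants page-granular sizes can round up explicitly before requesting memory. The helper below is a minimal sketch under the assumption that the page size is a power of two; `roundUpToPageSize` is a hypothetical name and not part of the MemoryBlock interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical caller-side helper: round Size up to a multiple of PageSize.
// Assumes PageSize is a non-zero power of two.
inline std::size_t roundUpToPageSize(std::size_t Size, std::size_t PageSize) {
  assert(PageSize != 0 && (PageSize & (PageSize - 1)) == 0 &&
         "page size must be a power of two");
  return (Size + PageSize - 1) & ~(PageSize - 1);
}
```

With 4KiB pages, a request for 1 byte rounds up to 4096 and a request for 4097 bytes rounds up to 8192, while an already aligned request is left unchanged.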
*	[CodeView] Allow empty types in member functions	Josh Stone	2019-01-23	1	-1/+4
	Summary: `CodeViewDebug::lowerTypeMemberFunction` used to default to a `Void` return type if the function's type array was empty. After D54667, it started blindly indexing the 0th item for the return type, which fails in `getOperand` for empty arrays if assertions are enabled. This patch restores the `Void` return type for empty type arrays, and adds a test generated by Rust in line-only debuginfo mode. Reviewers: zturner, rnk Reviewed By: rnk Subscribers: hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D57070 llvm-svn: 351910
*	Fixed isReMaterializable setting for LUI instruction.	Ana Pazos	2019-01-22	1	-1/+2
	llvm-svn: 351895
*	[HotColdSplit] Calculate BFI lazily to reduce compile-time, NFC	Vedant Kumar	2019-01-22	1	-5/+12
	The splitting pass does not need BFI unless the Module actually has a profile summary. Do not calculate BFI unless the summary is present. For the sqlite3 amalgamation, this reduces time spent in the splitting pass from 0.4% of the total to under 0.1%. llvm-svn: 351894
*	[HotColdSplit] Calculate domtrees lazily to reduce compile-time, NFC	Vedant Kumar	2019-01-22	1	-25/+21
	The splitting pass does not need (post)domtrees until after it's found a cold block. Defer domtree calculation until a cold block is found. For the sqlite3 amalgamation, this reduces time spent in the splitting pass from 0.8% of the total to 0.4%. llvm-svn: 351892
*	[LegalizeTypes] Add debug prints to the top of PromoteFloatOperand and PromoteFloatResult.	Craig Topper	2019-01-22	1	-0/+12
	Also add debug prints in the default case of the switches in these routines. Most if not all of the type legalization handlers already do this, so this makes promoting floats consistent. llvm-svn: 351890
*	AMDGPU/GlobalISel: Start selectively legalizing 16-bit operations	Matt Arsenault	2019-01-22	1	-4/+9
	It might be a bit nicer to use the fancy .legalIf and co. predicates, but this was requiring more boilerplate and disables the coverage assertions. llvm-svn: 351886
*	AMDGPU/GlobalISel: Handle legality/regbanks for 32/64-bit shifts	Matt Arsenault	2019-01-22	2	-2/+5
	llvm-svn: 351884
*	FileOutputBuffer: handle mmap(2) failure	Rui Ueyama	2019-01-22	1	-2/+6
	If the underlying filesystem does not support the mmap system call, FileOutputBuffer may fail when it attempts to mmap an output temporary file. This patch handles that situation. Unfortunately, it looks like it is very hard to test this functionality with llvm-lit without a filesystem that doesn't support mmap. I tested this locally by passing an invalid parameter to mmap so that it fails and falls back to the in-memory buffer. Maybe that's all we can do. I believe it is reasonable to submit this without a test. Differential Revision: https://reviews.llvm.org/D56949 llvm-svn: 351883
*	GlobalISel: Allow shift amount to be a different type	Matt Arsenault	2019-01-22	8	-48/+115
	For AMDGPU the shift amount is never 64-bit, and this needs to use a 32-bit shift. X86 uses i8, but seemed to be hacking around this before. llvm-svn: 351882
*	[FileCheck] Suppress old -v/-vv diags if dumping input	Joel E. Denny	2019-01-22	1	-17/+39
	The old diagnostic form of the trace produced by -v and -vv looks like:
	```
	check1:1:8: remark: CHECK: expected string found in input
	CHECK: abc
	       ^
	<stdin>:1:3: note: found here
	; abc def
	  ^~~
	```
	When dumping annotated input is requested (via -dump-input), I find that this old trace is not useful and is sometimes harmful:
	1. The old trace is mostly redundant because the same basic information also appears in the input dump's annotations.
	2. The old trace buries any error diagnostic between it and the input dump, but I find it useful to see any error diagnostic up front.
	3. FILECHECK_OPTS=-dump-input=fail requests annotated input dumps only for failed FileCheck calls. However, I have to also add -v or -vv to get a full set of annotations, and that can produce massive output from all FileCheck calls in all tests. That's a real problem when I run this in the IDE I use, which grinds to a halt as it tries to capture all that output.
	When -dump-input=fail|always, this patch suppresses the old trace from -v or -vv. Error diagnostics still print as usual. If you want the old trace, perhaps to see variable expansions, you can set -dump-input=none (the default).
	Reviewed By: probinson
	Differential Revision: https://reviews.llvm.org/D55825
	llvm-svn: 351881
*	GlobalISel: Make buildConstant handle vectors	Matt Arsenault	2019-01-22	1	-4/+38
	Produce a splat build_vector similar to how SelectionDAG::getConstant does. llvm-svn: 351880
*	GlobalISel: Implement widen for extract_vector_elt elt type	Matt Arsenault	2019-01-22	2	-4/+32
	llvm-svn: 351871
*	GlobalISel: Implement fewerElementsVector for basic FP ops	Matt Arsenault	2019-01-22	2	-27/+65
	llvm-svn: 351866
*	Add missing include (cstdlib) to Demangle.h	Konstantin Zhuravlyov	2019-01-22	1	-0/+1
	Differential Revision: https://reviews.llvm.org/D57035 llvm-svn: 351861
*	AMDGPU/GlobalISel: Remove vectors from legal constant types	Matt Arsenault	2019-01-22	1	-1/+1
	llvm-svn: 351859
*	GlobalISel: Support narrowing zextload/sextload	Matt Arsenault	2019-01-22	2	-0/+45
	llvm-svn: 351856
*	[SelectionDAGBuilder] Defer C_Register Assignments to be in line with	Nirav Dave	2019-01-22	1	-13/+3
	those of C_RegisterClass. NFCI. llvm-svn: 351854
*	GlobalISel: Disallow vectors for G_CONSTANT/G_FCONSTANT	Matt Arsenault	2019-01-22	2	-4/+12
	llvm-svn: 351853
*	FileOutputBuffer: Handle "-" as stdout.	Rui Ueyama	2019-01-22	1	-0/+10
	I was honestly a bit surprised that we didn't do this before. This patch handles "-" as stdout so that if you pass `-o -` to lld, for example, it writes its output to stdout instead of to a file named `-`. I thought that we might want to handle this at a higher level than FileOutputBuffer, because if we land this patch, we can no longer create a file whose name is `-` (there's a workaround though; you can pass `./-` instead of `-`). However, because raw_fd_ostream already handles `-` as a special file name, I think it's okay and actually consistent to handle `-` as a special name in FileOutputBuffer. Differential Revision: https://reviews.llvm.org/D56940 llvm-svn: 351852
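The naming rule described above amounts to a single string comparison. The sketch below is illustrative only: `OutputTarget` and `classifyOutputPath` are hypothetical names, not the FileOutputBuffer API.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of an output path (not LLVM's API).
enum class OutputTarget { Stdout, File };

// "-" alone means stdout; anything else, including "./-", names a file.
inline OutputTarget classifyOutputPath(const std::string &Path) {
  assert(!Path.empty() && "output path must not be empty");
  return Path == "-" ? OutputTarget::Stdout : OutputTarget::File;
}
```

Because only the exact string `-` is special, the `./-` workaround mentioned in the message still resolves to a regular file.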
*	Codegen support for atomicrmw fadd/fsub	Matt Arsenault	2019-01-22	11	-17/+63
	llvm-svn: 351851
*	Reapply "IR: Add fp operations to atomicrmw"	Matt Arsenault	2019-01-22	10	-14/+76
	This reapplies commits r351778 and r351782 with RISCV test fixes. llvm-svn: 351850
*	[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target.	Alexey Bataev	2019-01-22	3	-12/+8
	Summary: Enable full support for the debug info. Reviewers: echristo Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D46189 llvm-svn: 351846
*	Revert r351520, "Re-enable terminator folding in LoopSimplifyCFG"	Jordan Rupprecht	2019-01-22	1	-1/+1
	This is still causing compilation crashes in some targets. Will follow up shortly with a repro. llvm-svn: 351845
*	[DEBUG_INFO, NVPTX] Fix relocation info.	Alexey Bataev	2019-01-22	4	-22/+57
	Summary: Initial function labels must follow the debug location for the correct relocation info generation. Reviewers: tra, jlebar, echristo Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D45784 llvm-svn: 351843
*	[DAGCombiner] narrow vector binop with 2 insert subvector operands	Sanjay Patel	2019-01-22	1	-1/+24
	    vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z
	This is another step in generic vector narrowing. It's also a step towards more horizontal op formation specifically for x86 (although we still failed to match those in the affected tests). The scalarization cases are also not optimal (we should be scalarizing those), but it's still an improvement to use a narrower vector op when we know part of the result must be constant because both inputs are undef in some vector lanes. I think a similar match but checking for a constant operand might help some of the cases in D51553.
	Differential Revision: https://reviews.llvm.org/D56875
	llvm-svn: 351825
*	[RISCV][NFC] Change naming scheme for RISC-V specific DAG nodes	Alex Bradbury	2019-01-22	1	-43/+50
	Previously we had names like 'Call' or 'Tail'. This potentially clashes with the naming scheme used elsewhere in RISCVInstrInfo.td. Many other backends would use names like AArch64call or PPCtail. I prefer the SystemZ approach, which uses prefixed all-lowercase names. This matches the naming scheme used for target-independent SelectionDAG nodes. llvm-svn: 351823
*	[X86][SSE] Canonicalize OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y))	Simon Pilgrim	2019-01-22	1	-0/+70
	For constant bit select patterns, replace one AND with an ANDNP, allowing us to reuse the constant mask. Only do this if the mask has multiple uses (to avoid losing load folding) or if we have XOP, as its VPCMOV can handle most folding commutations. This also requires computeKnownBitsForTargetNode support for X86ISD::ANDNP and X86ISD::FOR to prevent regressions in fabs/fcopysign patterns. Differential Revision: https://reviews.llvm.org/D55935 llvm-svn: 351819
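The equivalence behind this canonicalization can be checked on plain integers. The sketch below models the bitwise operations on 32-bit scalars rather than SSE vectors; the function names are hypothetical, and `andnp` follows the x86 convention that the first operand is the one inverted.

```cpp
#include <cassert>
#include <cstdint>

// x86 ANDNP(A, B) computes (~A) & B: the first operand is inverted.
inline std::uint32_t andnp(std::uint32_t A, std::uint32_t B) { return ~A & B; }

// Original form: needs both C and ~C as constants.
inline std::uint32_t bitSelectOriginal(std::uint32_t X, std::uint32_t Y,
                                       std::uint32_t C) {
  return (X & C) | (Y & ~C);
}

// Canonical form: the same mask C feeds both the AND and the ANDNP.
inline std::uint32_t bitSelectCanonical(std::uint32_t X, std::uint32_t Y,
                                        std::uint32_t C) {
  return (X & C) | andnp(C, Y);
}

// Convenience check that both forms agree for one input triple.
inline void checkEquivalent(std::uint32_t X, std::uint32_t Y, std::uint32_t C) {
  assert(bitSelectOriginal(X, Y, C) == bitSelectCanonical(X, Y, C));
}
```

Both forms select bits of X where the mask is set and bits of Y elsewhere; only the canonical form gets by with a single copy of the constant.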
*	[X86][BtVer2] SSE2 vector shifts has local forwarding disabled	Simon Pilgrim	2019-01-22	1	-2/+2
	Similar to horizontal ops on D56777, the sse2 (but not mmx) bit shift ops have local forwarding disabled, adding +1cy to the use latency for the result. Differential Revision: https://reviews.llvm.org/D57026 llvm-svn: 351817
*	Fix "comparison of unsigned expression >= 0 is always true" warning. NFCI.	Simon Pilgrim	2019-01-22	1	-1/+1
	llvm-svn: 351816
*	[X86][BtVer2] X86ISD::VPERMILPV has local forwarding disabled	Simon Pilgrim	2019-01-22	1	-2/+2
	Similar to horizontal ops on D56777, the vpermilpd/vpermilps variable mask ops have local forwarding disabled, adding +1cy to the use latency for the result. Differential Revision: https://reviews.llvm.org/D57022 llvm-svn: 351815
*	[CostModel][X86] Add ICMP Predicate specific costs	Simon Pilgrim	2019-01-22	1	-8/+49
	First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly. Differential Revision: https://reviews.llvm.org/D57013 llvm-svn: 351810