bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[SelectionDAG] Update check in createOperands to reflect max() is a valid value.	Florian Hahn	2019-01-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The value returned by max() is the last valid value, adjust the comparison accordingly. The code added in D55073 creates TokenFactors with max() operands. Reviewers: aemerson, efriedma, RKSimon, craig.topper Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D56738 llvm-svn: 351318
*	[Support] Remove error return value from one overload of fs::make_absolute	Pavel Labath	2019-01-16	2	-18/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The version of make_absolute which accepted a specific directory to use as the "base" for the computation could never fail, even though it returned a std::error_code. The reason for that seems to be historical -- the CWD flavour (which can fail due to failure to retrieve CWD) was there first, and the new version was implemented by extending that. This removes the error return value from the non-CWD overload and reimplements the CWD version on top of that. This enables us to remove some dead code where people were pessimistically trying to handle the errors returned from this function. Reviewers: zturner, sammccall Subscribers: hiraditya, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D56599 llvm-svn: 351317
*	[NewPM][TSan] Reiterate the TSan port	Philip Pfaffe	2019-01-16	5	-35/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Second iteration of D56433 which got reverted in rL350719. The problem in the previous version was that we dropped the thunk calling the tsan init function. The new version keeps the thunk which should appease dyld, but is not actually OK wrt. the current semantics of function passes. Hence, add a helper to insert the functions only on the first time. The helper allows hooking into the insertion to be able to append them to the global ctors list. Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan Subscribers: hiraditya, bollu, llvm-commits Differential Revision: https://reviews.llvm.org/D56538 llvm-svn: 351314
*	[DAGCombine] Fix ReduceLoadWidth for shifted offsets	Sam Parker	2019-01-16	1	-12/+8
\| \| \| \| \| \| \| \| \| \| \| \|	ReduceLoadWidth can trigger using a shifted mask is used and this requires that the function return a shl node to correct for the offset. However, the way that this was implemented meant that the returned result could be an existing node, which would be incorrect. This fixes the method of inserting the new node and replacing uses. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 351310
*	[WebAssembly] COWS has been renamed to WASI.	Dan Gohman	2019-01-16	1	-2/+2
\| \| \| \|	llvm-svn: 351297
*	Only promote args when function attributes are compatible	Tom Stellard	2019-01-16	2	-4/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Check to make sure that the caller and the callee have compatible function arguments before promoting arguments. This uses the same TargetTransformInfo queries that are used to determine if attributes are compatible for inlining. The goal here is to avoid breaking ABI when a called function's ABI depends on a target feature that is not enabled in the caller. This is a very conservative fix for PR37358. Ideally we would have a more sophisticated check for ABI compatiblity rather than checking if the attributes are compatible for inlining. Reviewers: echristo, chandlerc, eli.friedman, craig.topper Reviewed By: echristo, chandlerc Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits Differential Revision: https://reviews.llvm.org/D53554 llvm-svn: 351296
*	[InstCombine]Avoid introduction of unaligned mem access	Serguei Katkov	2019-01-16	1	-3/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	InstCombine is able to transform mem transfer instrinsic to alone store or store/load pair. It might result in generation of unaligned atomic load/store which later in backend will be transformed to libcall. It is not an evident gain and it is better to keep intrinsic as is and handle it at backend. Reviewers: reames, anna, apilipenko, mkazantsev Reviewed By: reames Subscribers: t.p.northover, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56582 llvm-svn: 351295
*	[WebAssembly] Store section alignment as a power of 2	Sam Clegg	2019-01-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	This change bumps for version number of the wasm object file metadata. See https://github.com/WebAssembly/tool-conventions/pull/92 Differential Revision: https://reviews.llvm.org/D56758 llvm-svn: 351285
*	[GISel]: Add support for CSEing continuously during GISel passes.	Aditya Nandakumar	2019-01-16	13	-76/+816
\| \| \| \| \| \| \| \| \| \|	https://reviews.llvm.org/D52803 This patch adds support to continuously CSE instructions during each of the GISel passes. It consists of a GISelCSEInfo analysis pass that can be used by the CSEMIRBuilder. llvm-svn: 351283
*	[EH] Rename llvm.x86.seh.recoverfp intrinsic to llvm.eh.recoverfp	Mandeep Singh Grang	2019-01-16	4	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Make recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc. Refer D53541 for the context. Clang counterpart D56748. Reviewers: rnk, efriedma Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56747 llvm-svn: 351281
*	[X86] Rename SHRUNKBLEND ISD node to BLENDV.	Craig Topper	2019-01-16	3	-15/+16
\| \| \| \| \| \|	That's really what it is. If we didn't use intrinsics for BLENDVPS/BLENDVPD/PBLENDVB all the way to isel, this is the node we would use. llvm-svn: 351278
*	[X86] Add avx512 scatter intrinsics that use a vXi1 mask instead of a scalar ↵	Craig Topper	2019-01-15	2	-2/+31
\| \| \| \| \| \| \| \|	integer. We're trying to have the vXi1 types in IR as much as possible. This prevents the need for bitcasts when the producer of the mask was already a vXi1 value like an icmp. The bitcasts can be subject to code motion and interfere with basic block at a time isel in bad ways. llvm-svn: 351275
*	AMDGPU: Raise the priority of MAD24 in instruction selection.	Changpeng Fang	2019-01-15	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We have seen performance regression when v_add3 is generated. The major reason is that the v_mad pattern is broken when v_add3 is generated. We also see the register pressure increased. While we could not properly estimate register pressure during instruction selection, we can give mad a higher priority. In this work, we raise the priority for mad24 in selection and resolve the performance regression. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D56745 llvm-svn: 351273
*	[VFS] Move RedirectingFileSystem interface into header (NFC)	Jonas Devlieghere	2019-01-15	1	-339/+166
\| \| \| \| \| \| \| \| \| \| \| \| \|	This moves the RedirectingFileSystem into the header so it can be extended. This is needed in LLDB we need a way to obtain the external path to deal with FILE* and file descriptor APIs. Discussion on the mailing list: http://lists.llvm.org/pipermail/llvm-dev/2018-November/127755.html Differential revision: https://reviews.llvm.org/D54277 llvm-svn: 351265
*	X86DAGToDAGISel::matchBitExtract() with truncation (PR36419)	Roman Lebedev	2019-01-15	1	-9/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Previously in D54095 i have added support for extraction of `lshr` from `X` if we are to produce `BEXTR`. That was good, but the fix was partial, there was still [[ https://bugs.llvm.org/show_bug.cgi?id=36419 \| PR36419 ]]. That pattern can also appear, roughly, when you have a large (64-bit) storage, and the consume bits from it. It will not be unexpected if you will be doing further computations in 32-bit width. And then the current code breaks, as the tests show. The basic idea/pattern here is following: 1. We have `i64` input 2. We perform `i64` right-shift on it. 3. We `trunc`ate that shifted value 4. We do all further work (masking) in `i32` Since we see `trunc`ation and not `lshr`, we give up, and stop trying to extract that right-shift. BUT. The mask is `i32`, therefore we can extend both of the operands of the masking (`and`) to `i64` and truncate the result after masking: https://rise4fun.com/Alive/K4B ``` Name: @bextr64_32_b1 -> @bextr64_32_b0 %shiftedval = lshr i64 %val, %numskipbits %truncshiftedval = trunc i64 %shiftedval to i32 %widenumlowbits1 = zext i8 %numlowbits to i32 %notmask1 = shl nsw i32 -1, %widenumlowbits1 %mask1 = xor i32 %notmask1, -1 %res = and i32 %truncshiftedval, %mask1 => %shiftedval = lshr i64 %val, %numskipbits %widenumlowbits = zext i8 %numlowbits to i64 %notmask = shl nsw i64 -1, %widenumlowbits %mask = xor i64 %notmask, -1 %wideres = and i64 %shiftedval, %mask %res = trunc i64 %wideres to i32 ``` Thus, we are again able to extract that `lshr` into `BEXTR`'s control. Now, the perf (via `llvm-exegesis`) of the snippet suggests that it is not a good idea: ``` $ cat /tmp/old.s # bextr64_32_b1 # LLVM-EXEGESIS-LIVEIN RSI # LLVM-EXEGESIS-LIVEIN EDX # LLVM-EXEGESIS-LIVEIN RDI movq %rsi, %rcx shrq %cl, %rdi shll $8, %edx bextrl %edx, %edi, %eax $ cat /tmp/old.s \| ./bin/llvm-exegesis -mode=latency -snippets-file=- Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-1e0082.o --- mode: latency key: instructions: - 'MOV64rr RCX RSI' - 'SHR64rCL RDI RDI' - 'SHL32ri EDX EDX i_0x8' - 'BEXTR32rr EAX EDI EDX' config: '' register_initial_values: [] cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: latency, value: 0.6638, per_snippet_value: 2.6552 } error: '' info: '' assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3 ... $ cat /tmp/old.s \| ./bin/llvm-exegesis -mode=uops -snippets-file=- Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-43e346.o --- mode: uops key: instructions: - 'MOV64rr RCX RSI' - 'SHR64rCL RDI RDI' - 'SHL32ri EDX EDX i_0x8' - 'BEXTR32rr EAX EDI EDX' config: '' register_initial_values: [] cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: PdFPU0, value: 0, per_snippet_value: 0 } - { key: PdFPU1, value: 0, per_snippet_value: 0 } - { key: PdFPU2, value: 0, per_snippet_value: 0 } - { key: PdFPU3, value: 0, per_snippet_value: 0 } - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 } error: '' info: '' assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3 ... ``` vs ``` $ cat /tmp/new.s # bextr64_32_b1 # LLVM-EXEGESIS-LIVEIN RDX # LLVM-EXEGESIS-LIVEIN SIL # LLVM-EXEGESIS-LIVEIN RDI shlq $8, %rdx movzbl %sil, %eax orq %rdx, %rax bextrq %rax, %rdi, %rax $ cat /tmp/new.s \| ./bin/llvm-exegesis -mode=latency -snippets-file=- Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8944f1.o --- mode: latency key: instructions: - 'SHL64ri RDX RDX i_0x8' - 'MOVZX32rr8 EAX SIL' - 'OR64rr RAX RAX RDX' - 'BEXTR64rr RAX RDI RAX' config: '' register_initial_values: [] cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: latency, value: 0.7454, per_snippet_value: 2.9816 } error: '' info: '' assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3 ... $ cat /tmp/new.s \| ./bin/llvm-exegesis -mode=uops -snippets-file=- Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-da403c.o --- mode: uops key: instructions: - 'SHL64ri RDX RDX i_0x8' - 'MOVZX32rr8 EAX SIL' - 'OR64rr RAX RAX RDX' - 'BEXTR64rr RAX RDI RAX' config: '' register_initial_values: [] cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: PdFPU0, value: 0, per_snippet_value: 0 } - { key: PdFPU1, value: 0, per_snippet_value: 0 } - { key: PdFPU2, value: 0, per_snippet_value: 0 } - { key: PdFPU3, value: 0, per_snippet_value: 0 } - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 } error: '' info: '' assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3 ... ``` ^ latency increased (worse). Except //maybe// not really. Like with all synthetic benchmarks, they //may// be misleading. Let's take a look on some actual real-world hotpath. In this case it's 'my' [[ https://github.com/darktable-org/rawspeed \| RawSpeed ]]'s `BitStream<>::peekBitsNoFill()`, in [[ https://github.com/darktable-org/rawspeed/blob/e3316dc85127c2c29baa40f998f198a7b278bf36/src/librawspeed/decompressors/VC5Decompressor.cpp#L814 \| GoPro VC5 decompressor ]]: ``` raw.pixls.us-unique/GoPro/HERO6 Black$ /usr/src/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-clangs1-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR RUNNING: /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmplwbKEM 2018-12-22 21:23:03 Running /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench Run on (8 X 4012.81 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 3.41, 2.41, 2.03 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- GOPR9172.GPR/threads:8/real_time_mean 40 ms 40 ms 128 0.322244 7.96974 12M 37.4457M 298.534M 3.12047 24.8778 0.040465 GOPR9172.GPR/threads:8/real_time_median 39 ms 39 ms 128 0.312606 7.99155 12M 38.387M 306.788M 3.19891 25.5656 0.039115 GOPR9172.GPR/threads:8/real_time_stddev 4 ms 3 ms 128 0.0271557 0.130575 0 2.4941M 21.3909M 0.207842 1.78257 3.81081m RUNNING: /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWAkan9 2018-12-22 21:23:08 Running /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench Run on (8 X 4013.1 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 3.78, 2.50, 2.06 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- GOPR9172.GPR/threads:8/real_time_mean 39 ms 39 ms 128 0.311533 7.97323 12M 38.6828M 308.471M 3.22356 25.706 0.0390928 GOPR9172.GPR/threads:8/real_time_median 38 ms 38 ms 128 0.304231 7.99005 12M 39.4437M 315.527M 3.28698 26.294 0.0380316 GOPR9172.GPR/threads:8/real_time_stddev 3 ms 3 ms 128 0.0229149 0.133814 0 2.26225M 19.1421M 0.188521 1.59517 3.13671m Comparing /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------------------------- GOPR9172.GPR/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128 GOPR9172.GPR/threads:8/real_time_mean -0.0339 -0.0316 40 39 40 39 GOPR9172.GPR/threads:8/real_time_median -0.0277 -0.0274 39 38 39 38 GOPR9172.GPR/threads:8/real_time_stddev -0.1769 -0.1267 4 3 3 3 ``` I.e. this results in //roughly// -3% improvements in perf. While this will help [[ https://bugs.llvm.org/show_bug.cgi?id=36419 \| PR36419 ]], it won't address it fully. Reviewers: RKSimon, craig.topper, andreadb, spatel Reviewed By: craig.topper Subscribers: courbet, llvm-commits Differential Revision: https://reviews.llvm.org/D56052 llvm-svn: 351253
*	treat invoke like call	David Callahan	2019-01-15	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: InvokeInst should be treated like CallInst and assigned a separate discriminator. This is particularly import when an Invoke is converted to a Call during compilation and so can invalidate sample profile data collected wtih different link time optimizations Reviewers: twoh, Kader, danielcdh, wmi Reviewed By: wmi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56491 llvm-svn: 351251
*	[SanitizerCoverage] Don't create comdat for interposable functions.	Matt Morehouse	2019-01-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Comdat groups override weak symbol behavior, allowing the linker to keep the comdats for weak symbols in favor of comdats for strong symbols. Fixes the issue described in: https://bugs.chromium.org/p/chromium/issues/detail?id=918662 Reviewers: eugenis, pcc, rnk Reviewed By: pcc, rnk Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56516 llvm-svn: 351247
*	[X86] Add versions of the avx512 gather intrinsics that take the mask as a ↵	Craig Topper	2019-01-15	2	-3/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	vXi1 vector instead of a scalar In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen. I left the existing intrinsics behind because they have many out of tree users such as ISPC. They generate their own code and don't go through the autoupgrade path which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics. I'll work on scatter and gatherpf/scatterpf next. Differential Revision: https://reviews.llvm.org/D56527 llvm-svn: 351234
*	[MSP430] Recognize '{' as a line separator	Anton Korobeynikov	2019-01-15	1	-0/+1
\| \| \| \| \| \| \|	msp430-as supports multiple assembly statements on the same line separated by a '{' character. llvm-svn: 351233
*	[Nios2] Remove Nios2 backend	Craig Topper	2019-01-15	52	-2953/+0
\| \| \| \| \| \| \| \|	As mentioned here http://lists.llvm.org/pipermail/llvm-dev/2019-January/129121.html This backend is incomplete and has not been maintained in several months. Differential Revision: https://reviews.llvm.org/D56691 llvm-svn: 351231
*	Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"	Nikita Popov	2019-01-15	3	-4/+35
\| \| \| \| \| \| \| \| \| \| \| \| \|	Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Reapplying with updated SLPVectorizer tests. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351219
*	[WebAssembly] Fix updating/moving DBG_VALUEs in RegStackify	Yury Delendik	2019-01-15	4	-52/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As described in PR40209, there can be issues in DBG_VALUEs handling when multiple defs present in a BB. This patch adds logic for detection of related to def DBG_VALUEs and localizes register update and movement to found DBG_VALUEs. Reviewers: aheejin Subscribers: mgorny, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D56401 llvm-svn: 351216
*	We can improve the performance (generally) by memo-izing the action to map a ↵	David Callahan	2019-01-15	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	debug location to its function summary. Summary: Here are timings (as reported by "opt -time-passes") for sample-profile pass for some files holding hot functions from a major service©r. Average 17% reduction. Delta column is 100*(old-new)/old. ``` Old New Delta 0.0537 0.0538 -0.2% 0.8155 0.6522 20.0% 0.0779 0.0751 3.6% 0.0727 0.0913 -25.6% 0.1622 0.1302 19.7% 0.0627 0.0594 5.3% 0.0766 0.0744 2.9% 0.6426 0.4387 31.7% 0.3521 0.2776 21.2% 0.3549 0.2721 23.3% 0.0912 0.0904 0.9% 0.1236 0.1059 14.3% 0.0854 0.0866 -1.4% 0.0757 0.0722 4.6% 0.1293 0.1147 11.3% 0.1354 0.1122 17.1% 0.0767 0.0770 -0.4% 0.1135 0.0968 14.7% 0.0524 0.0608 -16.0% 0.1279 0.1106 13.5% ========== 3.6820 3.0520 17.1% Total ``` Reviewers: twoh, Kader, danielcdh, wmi Reviewed By: wmi Subscribers: dblaikie, llvm-commits Differential Revision: https://reviews.llvm.org/D56435 llvm-svn: 351211
*	[SelectionDAG] Check membership of register in class for single	Nirav Dave	2019-01-15	1	-6/+1
\| \| \| \| \| \| \| \| \|	register constraints. NFCI. Now that X86's ST(7) constraints are fixed this check can be reinstated. llvm-svn: 351207
*	[X86] Fix register class for assembly constraints to ST(7). NFCI.	Nirav Dave	2019-01-15	2	-4/+13
\| \| \| \| \| \| \| \|	Modify getRegForInlineAsmConstraint to return special singleton register class when a constraint references ST(7) not RFP80 for which ST(7) is not a member. llvm-svn: 351206
*	[X86] Bailout of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero ↵	Simon Pilgrim	2019-01-15	1	-5/+4
\| \| \| \| \| \| \| \| \| \|	(PR40306) If we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements. We were already doing this for SSSE3 targets as we have PSHUFB, but its worth doing for all targets. llvm-svn: 351203
*	[DAGCombiner] reduce buildvec of zexted extracted element to shuffle	Sanjay Patel	2019-01-15	1	-0/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The motivating case for this is shown in the first regression test. We are transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'. That's a special-case for a more general pattern as shown here. In all tests, we're avoiding the vector-scalar-vector moves in favor of vector ops. We aren't producing optimal shuffle code in some cases though, so the patch is limited to reduce regressions. Differential Revision: https://reviews.llvm.org/D56281 llvm-svn: 351198
*	Revert r351138 "[ORC] Move ORC Core symbol map and set types into their own	Lang Hames	2019-01-15	3	-256/+233
\| \| \| \| \| \| \| \|	header: CoreTypes.h." This commit broke some bots. Reverting while I investigate. llvm-svn: 351195
*	[SimpleLoopUnswitch] Increment stats counter for unswitching switch instruction	Zaara Syeda	2019-01-15	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \|	Increment statistics counter NumSwitches at unswitchNontrivialInvariants() for unswitching a non-trivial switch instruction. This is to fix a bug that it increments NumBranches even for the case of switch instruction. There is no functional change in this patch. Differential Revision: https://reviews.llvm.org/D56408 llvm-svn: 351193
*	[InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs.	Florian Hahn	2019-01-15	1	-4/+3
\| \| \| \| \| \| \| \| \|	Otherwise instcombine gets stuck in a cycle. The canonicalization was added in D55961. This patch fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12400 llvm-svn: 351187
*	[NFC] Remove some code duplication	Max Kazantsev	2019-01-15	1	-26/+9
\| \| \| \|	llvm-svn: 351185
*	[NFC] Remove obsolete enum RangeCheckKind	Max Kazantsev	2019-01-15	1	-59/+16
\| \| \| \|	llvm-svn: 351183
*	[NFC] Decrease if nest	Max Kazantsev	2019-01-15	1	-18/+14
\| \| \| \|	llvm-svn: 351180
*	[NFC] Move some functions to LoopUtils	Max Kazantsev	2019-01-15	2	-42/+42
\| \| \| \|	llvm-svn: 351179
*	[WebAssembly] Support multilibs for wasm32 and add a wasm OS that uses it	Dan Gohman	2019-01-15	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	This adds support for multilib paths for wasm32 targets, following [Debian's Multiarch conventions], and also adds an experimental OS name in order to test it. [Debian's Multiarch conventions]: https://wiki.debian.org/Multiarch/ Differential Revision: https://reviews.llvm.org/D56553 llvm-svn: 351163
*	[WebAssembly] Expand SIMD shifts while V8's implementation disagrees	Thomas Lively	2019-01-15	1	-1/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: V8 currently implements SIMD shifts as taking an immediate operation, which disagrees with the spec proposal and the toolchain implementation. As a stopgap measure to get things working, unroll all vector shifts. Since this is a temporary measure, there are no tests. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D56520 llvm-svn: 351151
*	AMDGPU: Add a fast path for icmp.i1(src, false, NE)	Marek Olsak	2019-01-15	3	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This allows moving the condition from the intrinsic to the standard ICmp opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic is an identity for retrieving the SGPR mask. And we can also get the mask from and i1, or i1, xor i1. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52060 llvm-svn: 351150
*	[AArch64] Adjust the feature set for Exynos	Evandro Menezes	2019-01-15	1	-0/+1
\| \| \| \| \| \|	Enable the fusion of arithmetic and logic instructions for Exynos M4. llvm-svn: 351149
*	[X86] Avoid clobbering ESP/RSP in the epilogue.	Reid Kleckner	2019-01-15	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In r345197 ESP and RSP were added to GR32_TC/GR64_TC, allowing them to be used for tail calls, but this also caused `findDeadCallerSavedReg` to think they were acceptable targets for clobbering. Filter them out. Fixes PR40289. Patch by Geoffry Song! Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D56617 llvm-svn: 351146
*	[EarlyIfConversion] Don't if-convert unconditional branches.	Eli Friedman	2019-01-15	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	A block ending in an unconditional branch can have two successors if one is a landing pad. In practice, I think this only has an effect on Windows because landing pads are never empty for Itanium unwinding. (Alternatively, I could add a check to AArch64InstrInfo::canInsertSelect, but this seems more obvious.) Differential Revision: https://reviews.llvm.org/D56468 llvm-svn: 351142
*	[AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 .	Eli Friedman	2019-01-15	1	-3/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, with D56544, the intrinsic will be expanded to an integer csel, which is probably not what the user expected. This matches the general convention of using "v1" types to represent scalar integer operations in vector registers. While I'm here, also add some error checking so we don't generate illegal ABS nodes. Differential Revision: https://reviews.llvm.org/D56616 llvm-svn: 351141
*	[AArch64] Add new target feature to fuse arithmetic and logic operations	Evandro Menezes	2019-01-14	3	-7/+116
\| \| \| \| \| \| \| \| \|	This feature enables the fusion of some arithmetic and logic instructions together. Differential revision: https://reviews.llvm.org/D56572 llvm-svn: 351139
*	[ORC] Move ORC Core symbol map and set types into their own header: CoreTypes.h.	Lang Hames	2019-01-14	3	-233/+256
\| \| \| \| \| \| \|	This will allow other utilities (including a future RuntimeDyld replacement) to use these types without pulling in the major Core types (JITDylib, etc.). llvm-svn: 351138
*	[X86] Fix unused variable warning in Release builds. NFC.	Benjamin Kramer	2019-01-14	1	-2/+2
\| \| \| \|	llvm-svn: 351136
*	Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"	Nikita Popov	2019-01-14	3	-35/+4
\| \| \| \| \| \| \| \| \|	This reverts commit r351125. I missed test changes in an SLPVectorizer test, due to the cost model changes. Reverting for now. llvm-svn: 351129
*	[Object] Return a symbol_iterator, rather than a basic_symbol_iterator, from	Lang Hames	2019-01-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	MachOObjectFile::getSymbolByIndex. ObjectFile derivatives should prefer symbol_iterator/SymbolRef over basic_symbol_iterator/BasicSymbolRef where possible, as the former retain their link to the ObjectFile (rather than a SymbolicFile) and provide more functionality. No test for this: Existing code is working, and we don't have (m)any libObject unit tests. I'll think about how we can test more systematically going forward. llvm-svn: 351128
*	[WebAssembly][FastISel] Do not assume naive CmpInst lowering	Thomas Lively	2019-01-14	1	-7/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fixes https://bugs.llvm.org/show_bug.cgi?id=40172. See test/CodeGen/WebAssembly/PR40172.ll for an explanation. Reviewers: dschuff, aheejin Subscribers: nikic, llvm-commits, sunfish, jgravelle-google, sbc100 Differential Revision: https://reviews.llvm.org/D56457 llvm-svn: 351127
*	[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors	Nikita Popov	2019-01-14	3	-4/+35
\| \| \| \| \| \| \| \| \| \| \|	Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351125
*	[opaque pointer types] Update GetElementPtr creation APIs to	James Y Knight	2019-01-14	1	-7/+41
\| \| \| \| \| \| \| \| \| \|	consistently accept a pointee-type argument. Note: this also adds a new C API and soft-deprecates the old C API. Differential Revision: https://reviews.llvm.org/D56559 llvm-svn: 351124
*	[opaque pointer types] Update LoadInst creation APIs to consistently	James Y Knight	2019-01-14	2	-57/+24
\| \| \| \| \| \| \| \| \| \|	accept a return-type argument. Note: this also adds a new C API and soft-deprecates the old C API. Differential Revision: https://reviews.llvm.org/D56558 llvm-svn: 351123