combineVSelectWithAllOnesOrZeros
If we're going to generate a new inverted setcc, we should make sure we will be able to remove the old setcc.
Differential Revision: https://reviews.llvm.org/D56765
llvm-svn: 351378
llvm-svn: 351377
Additional tests for vNi32 and vNi64. I added these for
usub.sat before; this covers uadd.sat, ssub.sat and sadd.sat.
llvm-svn: 351375
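For reference, these map to the generic saturating intrinsics; a minimal sketch of the kind of IR such tests exercise (function names here are illustrative):
```
declare <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32>, <4 x i32>)
declare <2 x i64> @llvm.ssub.sat.v2i64(<2 x i64>, <2 x i64>)

define <4 x i32> @uadd_sat_v4i32(<4 x i32> %x, <4 x i32> %y) {
  ; unsigned saturating add: clamps to all-ones instead of wrapping
  %r = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> %x, <4 x i32> %y)
  ret <4 x i32> %r
}

define <2 x i64> @ssub_sat_v2i64(<2 x i64> %x, <2 x i64> %y) {
  ; signed saturating sub: clamps to INT64_MIN/INT64_MAX instead of wrapping
  %r = call <2 x i64> @llvm.ssub.sat.v2i64(<2 x i64> %x, <2 x i64> %y)
  ret <2 x i64> %r
}
```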
On Jaguar, horizontal adds/subs have local forwarding disabled.
That means we pay a compulsory extra cycle in the write-back stage, and the value
is not available until the end of that stage.
This patch changes the latency of horizontal operations by adding an extra
cycle. With this patch, latency numbers now match what is reported by perf.
I plan to send another patch to also 'fix' the latency of shuffle operations (on
Jaguar, local forwarding is disabled for vector shuffles too).
Differential Revision: https://reviews.llvm.org/D56777
llvm-svn: 351366
Split check-prefixes to support a future commit
llvm-svn: 351362
https://bugs.llvm.org/show_bug.cgi?id=39974
llvm-svn: 351354
I was trying to prevent shuffle regressions while matching more horizontal ops
and ended up here:
shuf (extract X, 0), (extract X, 4), Mask --> extract (shuf X, undef, Mask'), 0
The affected tests were added for:
https://bugs.llvm.org/show_bug.cgi?id=34380
This patch won't change the examples in the bug report itself, but we should be
able to extend this to catch more types.
Differential Revision: https://reviews.llvm.org/D56756
llvm-svn: 351346
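An IR-level sketch of the transform (the combine itself operates on DAG nodes; the shuffles and types here are illustrative):
```
define <4 x float> @before(<8 x float> %x) {
  ; shuf (extract X, 0), (extract X, 4), Mask
  %lo = shufflevector <8 x float> %x, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %hi = shufflevector <8 x float> %x, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %r = shufflevector <4 x float> %lo, <4 x float> %hi, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
  ret <4 x float> %r
}

define <4 x float> @after(<8 x float> %x) {
  ; extract (shuf X, undef, Mask'), 0 -- Mask' indexes into X directly
  %s = shufflevector <8 x float> %x, <8 x float> undef, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 undef, i32 undef, i32 undef, i32 undef>
  %r = shufflevector <8 x float> %s, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x float> %r
}
```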
llvm-svn: 351333
Summary:
Make the recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc.
Refer to D53541 for the context. The Clang counterpart is D56748.
Reviewers: rnk, efriedma
Reviewed By: rnk, efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D56747
llvm-svn: 351281
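As a sketch, the renamed intrinsic looks like this in IR (usage modeled on the SEH filter case; the surrounding outlined-function boilerplate is omitted and @parent is a placeholder):
```
declare i8* @llvm.eh.recoverfp(i8*, i8*)

; inside an outlined exception filter: recover the parent function's
; frame pointer from the live frame-pointer value %fp
%parent.fp = call i8* @llvm.eh.recoverfp(i8* bitcast (void ()* @parent to i8*), i8* %fp)
```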
integer.
We're trying to have the vXi1 types in IR as much as possible. This prevents the need for bitcasts when the producer of the mask was already a vXi1 value like an icmp. The bitcasts can be subject to code motion and interfere with basic-block-at-a-time isel in bad ways.
llvm-svn: 351275
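A small illustration of the direction (hypothetical IR, not from the patch): the icmp already produces a vXi1, so no round-trip through a scalar integer is needed:
```
define <8 x i32> @select_by_mask(<8 x i32> %a, <8 x i32> %x, <8 x i32> %y) {
  ; %mask is already <8 x i1>; no bitcast to i8 and back that could
  ; be moved around and break block-local isel
  %mask = icmp sgt <8 x i32> %a, zeroinitializer
  %r = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> %y
  ret <8 x i32> %r
}
```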
Summary:
Previously, in D54095, I added support for extraction of `lshr` from `X` if we are to produce `BEXTR`.
That was good, but the fix was partial: there was still [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]].
That pattern can also appear, roughly, when you have a large (64-bit) storage and then consume bits from it.
It would not be unexpected to then do further computations in 32-bit width,
and there the current code breaks, as the tests show.
The basic idea/pattern here is the following:
1. We have `i64` input
2. We perform `i64` right-shift on it.
3. We `trunc`ate that shifted value
4. We do all further work (masking) in `i32`
Since we see a `trunc`ation and not the `lshr`, we give up and stop trying to extract that right-shift.
BUT: the mask is `i32`, therefore we can extend both operands of the masking `and` to `i64`
and truncate the result after masking: https://rise4fun.com/Alive/K4B
```
Name: @bextr64_32_b1 -> @bextr64_32_b0
%shiftedval = lshr i64 %val, %numskipbits
%truncshiftedval = trunc i64 %shiftedval to i32
%widenumlowbits1 = zext i8 %numlowbits to i32
%notmask1 = shl nsw i32 -1, %widenumlowbits1
%mask1 = xor i32 %notmask1, -1
%res = and i32 %truncshiftedval, %mask1
=>
%shiftedval = lshr i64 %val, %numskipbits
%widenumlowbits = zext i8 %numlowbits to i64
%notmask = shl nsw i64 -1, %widenumlowbits
%mask = xor i64 %notmask, -1
%wideres = and i64 %shiftedval, %mask
%res = trunc i64 %wideres to i32
```
Thus, we are again able to extract that `lshr` into `BEXTR`'s control.
Now, the perf (via `llvm-exegesis`) of the snippet suggests that this is not a good idea:
```
$ cat /tmp/old.s
# bextr64_32_b1
# LLVM-EXEGESIS-LIVEIN RSI
# LLVM-EXEGESIS-LIVEIN EDX
# LLVM-EXEGESIS-LIVEIN RDI
movq %rsi, %rcx
shrq %cl, %rdi
shll $8, %edx
bextrl %edx, %edi, %eax
$ cat /tmp/old.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-1e0082.o
---
mode: latency
key:
instructions:
- 'MOV64rr RCX RSI'
- 'SHR64rCL RDI RDI'
- 'SHL32ri EDX EDX i_0x8'
- 'BEXTR32rr EAX EDI EDX'
config: ''
register_initial_values: []
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: latency, value: 0.6638, per_snippet_value: 2.6552 }
error: ''
info: ''
assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
...
$ cat /tmp/old.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-43e346.o
---
mode: uops
key:
instructions:
- 'MOV64rr RCX RSI'
- 'SHR64rCL RDI RDI'
- 'SHL32ri EDX EDX i_0x8'
- 'BEXTR32rr EAX EDI EDX'
config: ''
register_initial_values: []
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: PdFPU0, value: 0, per_snippet_value: 0 }
- { key: PdFPU1, value: 0, per_snippet_value: 0 }
- { key: PdFPU2, value: 0, per_snippet_value: 0 }
- { key: PdFPU3, value: 0, per_snippet_value: 0 }
- { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
error: ''
info: ''
assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
...
```
vs
```
$ cat /tmp/new.s
# bextr64_32_b1
# LLVM-EXEGESIS-LIVEIN RDX
# LLVM-EXEGESIS-LIVEIN SIL
# LLVM-EXEGESIS-LIVEIN RDI
shlq $8, %rdx
movzbl %sil, %eax
orq %rdx, %rax
bextrq %rax, %rdi, %rax
$ cat /tmp/new.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8944f1.o
---
mode: latency
key:
instructions:
- 'SHL64ri RDX RDX i_0x8'
- 'MOVZX32rr8 EAX SIL'
- 'OR64rr RAX RAX RDX'
- 'BEXTR64rr RAX RDI RAX'
config: ''
register_initial_values: []
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: latency, value: 0.7454, per_snippet_value: 2.9816 }
error: ''
info: ''
assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
...
$ cat /tmp/new.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-da403c.o
---
mode: uops
key:
instructions:
- 'SHL64ri RDX RDX i_0x8'
- 'MOVZX32rr8 EAX SIL'
- 'OR64rr RAX RAX RDX'
- 'BEXTR64rr RAX RDI RAX'
config: ''
register_initial_values: []
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: PdFPU0, value: 0, per_snippet_value: 0 }
- { key: PdFPU1, value: 0, per_snippet_value: 0 }
- { key: PdFPU2, value: 0, per_snippet_value: 0 }
- { key: PdFPU3, value: 0, per_snippet_value: 0 }
- { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
error: ''
info: ''
assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
...
```
^ latency increased (worse).
Except //maybe// not really.
Like all synthetic benchmarks, this //may// be misleading.
Let's take a look at an actual real-world hot path.
In this case it's 'my' [[ https://github.com/darktable-org/rawspeed | RawSpeed ]]'s `BitStream<>::peekBitsNoFill()`, in [[ https://github.com/darktable-org/rawspeed/blob/e3316dc85127c2c29baa40f998f198a7b278bf36/src/librawspeed/decompressors/VC5Decompressor.cpp#L814 | GoPro VC5 decompressor ]]:
```
raw.pixls.us-unique/GoPro/HERO6 Black$ /usr/src/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-clangs1-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR
RUNNING: /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmplwbKEM
2018-12-22 21:23:03
Running /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench
Run on (8 X 4012.81 MHz CPU s)
CPU Caches:
L1 Data 16K (x8)
L1 Instruction 64K (x4)
L2 Unified 2048K (x4)
L3 Unified 8192K (x1)
Load Average: 3.41, 2.41, 2.03
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_mean 40 ms 40 ms 128 0.322244 7.96974 12M 37.4457M 298.534M 3.12047 24.8778 0.040465
GOPR9172.GPR/threads:8/real_time_median 39 ms 39 ms 128 0.312606 7.99155 12M 38.387M 306.788M 3.19891 25.5656 0.039115
GOPR9172.GPR/threads:8/real_time_stddev 4 ms 3 ms 128 0.0271557 0.130575 0 2.4941M 21.3909M 0.207842 1.78257 3.81081m
RUNNING: /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWAkan9
2018-12-22 21:23:08
Running /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
Run on (8 X 4013.1 MHz CPU s)
CPU Caches:
L1 Data 16K (x8)
L1 Instruction 64K (x4)
L2 Unified 2048K (x4)
L3 Unified 8192K (x1)
Load Average: 3.78, 2.50, 2.06
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_mean 39 ms 39 ms 128 0.311533 7.97323 12M 38.6828M 308.471M 3.22356 25.706 0.0390928
GOPR9172.GPR/threads:8/real_time_median 38 ms 38 ms 128 0.304231 7.99005 12M 39.4437M 315.527M 3.28698 26.294 0.0380316
GOPR9172.GPR/threads:8/real_time_stddev 3 ms 3 ms 128 0.0229149 0.133814 0 2.26225M 19.1421M 0.188521 1.59517 3.13671m
Comparing /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
Benchmark Time CPU Time Old Time New CPU Old CPU New
--------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128
GOPR9172.GPR/threads:8/real_time_mean -0.0339 -0.0316 40 39 40 39
GOPR9172.GPR/threads:8/real_time_median -0.0277 -0.0274 39 38 39 38
GOPR9172.GPR/threads:8/real_time_stddev -0.1769 -0.1267 4 3 3 3
```
I.e. this results in a //roughly// 3% improvement in perf.
While this will help [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]], it won't address it fully.
Reviewers: RKSimon, craig.topper, andreadb, spatel
Reviewed By: craig.topper
Subscribers: courbet, llvm-commits
Differential Revision: https://reviews.llvm.org/D56052
llvm-svn: 351253
vXi1 vector instead of a scalar
In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen.
I left the existing intrinsics behind because they have many out-of-tree users such as ISPC. They generate their own code and don't go through the autoupgrade path, which only works for bitcode and .ll parsing. Ideally we will get them to migrate to target-independent intrinsics, but it might be easier for them to migrate to these new intrinsics.
I'll work on scatter and gatherpf/scatterpf next.
Differential Revision: https://reviews.llvm.org/D56527
llvm-svn: 351234
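The new intrinsics follow the same shape as the target-independent masked gather, which already takes a vXi1 mask; a sketch using the generic intrinsic (the exact new x86 intrinsic names aren't quoted here):
```
declare <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double*>, i32, <8 x i1>, <8 x double>)

define <8 x double> @gather(<8 x double*> %ptrs, <8 x i1> %mask, <8 x double> %passthru) {
  ; the mask is a vXi1 value, e.g. produced directly by an icmp
  %r = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double*> %ptrs, i32 8, <8 x i1> %mask, <8 x double> %passthru)
  ret <8 x double> %r
}
```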
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.
Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.
Reapplying with updated SLPVectorizer tests.
Differential Revision: https://reviews.llvm.org/D56636
llvm-svn: 351219
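The expansion relies on `usub.sat(x, y) == umax(x, y) - y`; a sketch in IR (using icmp+select to spell the umax):
```
define <8 x i16> @usubsat_expanded(<8 x i16> %x, <8 x i16> %y) {
  ; umax(x, y)
  %c = icmp ugt <8 x i16> %x, %y
  %m = select <8 x i1> %c, <8 x i16> %x, <8 x i16> %y
  ; umax(x, y) - y: x - y if x > y, otherwise y - y == 0
  %r = sub <8 x i16> %m, %y
  ret <8 x i16> %r
}
```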
(PR40306)
If we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.
We were already doing this for SSSE3 targets, as we have PSHUFB, but it's worth doing for all targets.
llvm-svn: 351203
The other test case is already covered by the PR40306 test case, which was mainly concerned with SSSE3 codegen.
llvm-svn: 351201
compiler identification lines in test-cases.
(Doing so only because it's then easier to search for references which
are actually important and need fixing.)
llvm-svn: 351200
The motivating case for this is shown in the first regression test. We are
transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.
That's a special-case for a more general pattern as shown here. In all tests,
we're avoiding the vector-scalar-vector moves in favor of vector ops.
We aren't producing optimal shuffle code in some cases though, so the patch is
limited to reduce regressions.
Differential Revision: https://reviews.llvm.org/D56281
llvm-svn: 351198
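The motivating case boils down to a pattern like this (illustrative IR): a widening move that should lower to a single vector zero-extend such as 'vpmovzxdq' rather than bouncing through scalar registers:
```
define <2 x i64> @zext_low_half(<4 x i32> %v) {
  %lo = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
  ; ideally a single vpmovzxdq, not extract-to-GPR and re-insert
  %z = zext <2 x i32> %lo to <2 x i64>
  ret <2 x i64> %z
}
```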
That extra-use *should* prevent D56052 from looking past the trunc.
llvm-svn: 351182
Masking was removed from these intrinsics and I guess we didn't update the tests then.
llvm-svn: 351165
llvm-svn: 351162
instead of darwin so the constant pool entries will be filtered better by the script.
Darwin uses LCPI instead of .LCPI so the filter doesn't work.
This is silly, but it will help reduce some future test diffs.
llvm-svn: 351161
Summary:
In r345197 ESP and RSP were added to GR32_TC/GR64_TC, allowing them to
be used for tail calls, but this also caused `findDeadCallerSavedReg` to
think they were acceptable targets for clobbering. Filter them out.
Fixes PR40289.
Patch by Geoffry Song!
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D56617
llvm-svn: 351146
This reverts commit r351125.
I missed test changes in an SLPVectorizer test due to the cost model
changes. Reverting for now.
llvm-svn: 351129
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.
Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.
Differential Revision: https://reviews.llvm.org/D56636
llvm-svn: 351125
shuffle-with-zero (PR40306)
If we have PSHUFB and we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.
llvm-svn: 351103
add (extractelt (X, 0), extractelt (X, 1)) --> extractelt (hadd X, X), 0
This is the integer sibling to D56011.
There's an additional restriction to only do this transform in the
case where we don't have extra extracts from the source vector. Without
that, we can fail to match larger horizontal patterns that are more
beneficial than this minimal case. An improvement to the more general
h-op lowering may allow us to remove the restriction here in a follow-up.
llvm-svn: 351093
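In IR, the minimal case looks like this (a sketch; the combine itself runs on the DAG):
```
define i32 @hadd_first_pair(<4 x i32> %x) {
  %e0 = extractelement <4 x i32> %x, i32 0
  %e1 = extractelement <4 x i32> %x, i32 1
  ; --> extractelt (hadd %x, %x), 0, provided %x has no other extracts
  %sum = add i32 %e0, %e1
  ret i32 %sum
}
```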
llvm-svn: 351078
llvm-svn: 351073
llvm-svn: 351072
llvm-svn: 351071
Match ConstantFolding.cpp:
(add_sat x, undef) -> -1
(sub_sat x, undef) -> 0
llvm-svn: 351070
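Scalar IR equivalents of the two folds (a sketch using the unsigned variants):
```
declare i32 @llvm.uadd.sat.i32(i32, i32)
declare i32 @llvm.usub.sat.i32(i32, i32)

define i32 @fold_add_sat(i32 %x) {
  ; (add_sat x, undef) -> -1 (all ones)
  %r = call i32 @llvm.uadd.sat.i32(i32 %x, i32 undef)
  ret i32 %r
}

define i32 @fold_sub_sat(i32 %x) {
  ; (sub_sat x, undef) -> 0
  %r = call i32 @llvm.usub.sat.i32(i32 %x, i32 undef)
  ret i32 %r
}
```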
llvm-svn: 351066
llvm-svn: 351060
Following PR39807, the way in which SimplifyCFG hoists common code on
branch paths was fixed in r347782. However, this left extra code hanging
around HoistThenElseCodeToIf that wasn't necessary and needlessly
complicated matters -- we no longer need to look up through the 'if'
basic block to find a location for hoisted 'select' insts, we can instead
use the location chosen by applyMergedLocation.
This patch deletes that extra logic, and updates a regression test to
reflect the new logic (selects get the merged location, not a previous
inst's location).
Differential Revision: https://reviews.llvm.org/D55272
llvm-svn: 351058
Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now.
llvm-svn: 351057
Part of the effort to refactor frame pointer code generation. We used
to use two function attributes, "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf", to represent three kinds of frame
pointer usage: (all) all frames use a frame pointer, (non-leaf) non-leaf
frames use a frame pointer, (none) no frames use a frame pointer. This CL
makes the idea explicit by using a single enum function attribute, "frame-pointer".
Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.
"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".
Tests are mostly updated with:
# replace command line args '-disable-fp-elim=false' with '-frame-pointer=none'
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
# replace command line args '-disable-fp-elim' with '-frame-pointer=all'
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
Patch by Yuanfang Chen (tabloid.adroit)!
Differential Revision: https://reviews.llvm.org/D56351
llvm-svn: 351049
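In IR, the attribute takes one of the three values named above; a minimal sketch:
```
define void @f() #0 {
  ret void
}

; one of: "all" (every frame), "non-leaf" (frames of non-leaf functions), "none"
attributes #0 = { "frame-pointer"="non-leaf" }
```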
in IR instead.
Fixes PR40259
llvm-svn: 351035
llvm-svn: 351034
vXi1 vector.
The input mask can be represented with an AND in IR.
Fixes PR40258
llvm-svn: 351028
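I.e. the separate mask operand becomes an ordinary AND of vXi1 values (illustrative IR, not from the patch):
```
define <8 x i1> @combined_mask(<8 x i32> %a, <8 x i32> %b, <8 x i1> %m) {
  %cmp = icmp eq <8 x i32> %a, %b
  ; the intrinsic's input mask is folded in as a plain vector and
  %and = and <8 x i1> %cmp, %m
  ret <8 x i1> %and
}
```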
NOTE: We need more powerful signed overflow detection in computeOverflowKind
llvm-svn: 351026
Handle combines with zero and constant canonicalization for adds.
llvm-svn: 351024
The actual combines will be added in a future commit.
llvm-svn: 351023
Use demanded extract index to set most of the shuffle mask to undef, making it easier to widen and peek through.
llvm-svn: 351013
This pattern:
t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>
...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.
In the affected tests here, it looks like it just makes RA wiggle. But we might
as well squash this to prevent it interfering with other pattern-matching.
Differential Revision: https://reviews.llvm.org/D56604
llvm-svn: 351008
Add additional vXi32 and vXi64 tests.
llvm-svn: 351003
Make use of vblendvpd to select on the signbit
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350999
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
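One common IR idiom that selects to ISD::ABS (a sketch; the usual negate-compare-select pattern):
```
define <4 x i32> @abs_v4i32(<4 x i32> %x) {
  %neg = sub <4 x i32> zeroinitializer, %x
  %nonneg = icmp sgt <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
  ; |x| = x when x >= 0, otherwise 0 - x
  %abs = select <4 x i1> %nonneg, <4 x i32> %x, <4 x i32> %neg
  ret <4 x i32> %abs
}
```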
The 128-bit input produces 64 bits of output and fills the upper 64 bits with 0. The mask only applies to the lower elements, but we can't represent this with a vselect like we normally do.
This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.
Fixes another instruction for PR34877.
llvm-svn: 350994
avx512dq, use v16i1 as the intermediate mask type instead of v8i1.
We still use i8 for the load/store type, so we need to convert to/from i16 around the mask type.
By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.
llvm-svn: 350989
the output has more elements than the input due to needing to be 128 bits.
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.
This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b, since vselect w/b using a k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.
This fixes some of PR34877.
llvm-svn: 350985