bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86][SSE] Add support for detecting SUB(SPLAT_BV, SPLAT) cases for ↵	Simon Pilgrim	2018-05-31	3	-214/+57
	shift-rotate patterns. This improves splat rotations (rotation by a uniform value), to avoid having to use the generic non-uniform shift code (extension to PR37426). llvm-svn: 333641
*	[X86] Update the fast-isel tests for _mm_rcp_ss, _mm_rsqrt_ss, and ↵	Craig Topper	2018-05-30	1	-27/+3
	_mm_sqrt_ss to match clang codegen after r333572. llvm-svn: 333573
*	[X86][AVX512BW] Fixed check prefix copy+paste typo in avx512bw-intrinsics.ll	Simon Pilgrim	2018-05-30	1	-570/+570
	Prefix was for AVX512F instead of AVX512BW llvm-svn: 333560
*	[X86] Lowering FMA intrinsics to native IR (LLVM part)	Gabor Buella	2018-05-30	8	-70/+9739
	Support for Clang lowering of fused intrinsics. This patch: 1. Removes bindings to clang fma intrinsics. 2. Introduces new LLVM unmasked intrinsics with rounding mode: int_x86_avx512_vfmadd_pd_512 int_x86_avx512_vfmadd_ps_512 int_x86_avx512_vfmaddsub_pd_512 int_x86_avx512_vfmaddsub_ps_512 supported with a new intrinsic type (INTR_TYPE_3OP_RM). 3. Introduces new x86 fmaddsub/fmsubadd folding. 4. Introduces new tests for code emitted by sequences introduced in Clang part. Patch by tkrupa Reviewers: craig.topper, sroland, spatel, RKSimon Reviewed By: craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D47443 llvm-svn: 333554
*	[X86][AVX512] Replace -cpu=knl with -mattr=+avx512f for avx512-intrinsics tests	Simon Pilgrim	2018-05-30	2	-64/+127
	It was noticed on D47377 that these tests were being unnecessarily affected by scheduler changes. This adds vzeroupper at the end of some tests as we lose the 'FeatureFastPartialYMMorZMMWrite' feature from KNL; since Skylake+ don't support this, it's probably better. llvm-svn: 333549
*	[X86][SSE] Remove unnecessary -cpu from sttni tests	Simon Pilgrim	2018-05-30	1	-16/+16
	It was noticed on D47377 that these tests (for PR37246) were being unnecessarily affected by scheduler changes. llvm-svn: 333546
*	[X86][SSE] Replace -cpu with equivalent -mattr for vec_cast tests	Simon Pilgrim	2018-05-30	3	-6/+6
	It was noticed on D47377 that these tests were being unnecessarily affected by scheduler changes. llvm-svn: 333545
*	[X86] Rename the operands in the recently introduced MOVSS+FMA patterns so ↵	Craig Topper	2018-05-29	1	-16/+16
	that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction. The order should be controlled in the input pattern. llvm-svn: 333463
*	[X86] Fix a potential crash that occurs after r333419.	Craig Topper	2018-05-29	1	-0/+20
	The code could issue a truncate from a small type to a larger type. We need to extend in that case instead. llvm-svn: 333460
*	[X86][SSE] Regenerate sdiv combine tests	Simon Pilgrim	2018-05-29	1	-65/+65
	llvm-svn: 333431
*	[X86][AVX] Regenerate vzeroall/vzeroupper cleanup tests	Simon Pilgrim	2018-05-29	1	-8/+8
	llvm-svn: 333430
*	[X86] Scalar mask and scalar move optimizations	Alexander Ivchenko	2018-05-29	2	-119/+68
	1. Introduction of mask scalar TableGen patterns. 2. Introduction of new scalar move TableGen patterns and refactoring of existing ones. 3. Folding of pattern created by introducing scalar masking in Clang header files. Patch by tkrupa Differential Revision: https://reviews.llvm.org/D47012 llvm-svn: 333419
*	StackColoring: better handling of statically unreachable code	Than McIntosh	2018-05-29	1	-0/+153
	Summary: Avoid assert/crash during liveness calculation in situations where the incoming machine function has statically unreachable BBs. Second attempt at submitting; this version of the change includes a revised testcase. Fixes PR37130. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47372 llvm-svn: 333416
*	[X86] Disable a DAG combine to allow packed AVX512DQ instructions to be ↵	Craig Topper	2018-05-29	1	-20/+72
	consistently used for i64->float/double conversions. Summary: We already get this right if the i64 didn't come from a load. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47439 llvm-svn: 333393
*	[X86][Sched] Add InstRW for CLC on Intel after SNB.	Clement Courbet	2018-05-29	1	-6/+6
	Summary: After SNB, Intel CPUs can rename CF independently of other EFLAGS, so the renamer can zero it for free. Note that STC still consumes resources.
	To reproduce: `$ llvm-exegesis -mode=uops -opcode-name=CLC`
	On SNB:
	```
	---
	key:
	  opcode_name: CLC
	  mode: uops
	  config: ''
	cpu_name: sandybridge
	llvm_triple: x86_64-unknown-linux-gnu
	num_repetitions: 10000
	measurements:
	  - { key: '3', value: 0.0014, debug_string: SBPort0 }
	  - { key: '4', value: 0.0013, debug_string: SBPort1 }
	  - { key: '5', value: 0.0003, debug_string: SBPort4 }
	  - { key: '6', value: 0.0029, debug_string: SBPort5 }
	  - { key: '10', value: 0.0003, debug_string: SBPort23 }
	error: ''
	info: 'instruction is serial, repeating a random one. Snippet: CLC '
	...
	```
	On HSW:
	```
	---
	key:
	  opcode_name: CLC
	  mode: uops
	  config: ''
	cpu_name: haswell
	llvm_triple: x86_64-unknown-linux-gnu
	num_repetitions: 10000
	measurements:
	  - { key: '3', value: 0.001, debug_string: HWPort0 }
	  - { key: '4', value: 0.0009, debug_string: HWPort1 }
	  - { key: '5', value: 0.0004, debug_string: HWPort2 }
	  - { key: '6', value: 0.0006, debug_string: HWPort3 }
	  - { key: '7', value: 0.0002, debug_string: HWPort4 }
	  - { key: '8', value: 0.0012, debug_string: HWPort5 }
	  - { key: '9', value: 0.0022, debug_string: HWPort6 }
	  - { key: '10', value: 0.0001, debug_string: HWPort7 }
	error: ''
	info: 'instruction is serial, repeating a random one. Snippet: CLC '
	...
	```
	Reviewers: craig.topper, RKSimon Subscribers: gchatelet, llvm-commits Differential Revision: https://reviews.llvm.org/D47362 llvm-svn: 333392
*	[X86] Remove masked vpermi2var/vpermt2var intrinsics and autoupgrade.	Craig Topper	2018-05-29	15	-248/+1250
	We have unmasked intrinsics now and wrap them with a select. This is a net reduction of 36 intrinsics from before the unmasked intrinsics were added. llvm-svn: 333388
*	[X86] Add unmasked vpermi2var intrinsics so we can use explicit select ↵	Craig Topper	2018-05-29	6	-0/+1589
	instructions for masking in clang. This will allow us to remove the 3 different flavors of masked intrinsics. I'm leaving the actual intrinsic removal for another patch. llvm-svn: 333386
*	[X86] Converge X86ISD::VPERMV3 and X86ISD::VPERMIV3 to a single opcode.	Craig Topper	2018-05-28	6	-36/+36
	These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node. This is a step towards reducing the number of intrinsics needed. A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial. llvm-svn: 333383
*	[X86] Don't hardcode scheduler class	Simon Pilgrim	2018-05-27	1	-4/+4
	Also fixes BEXTRI instruction to use WriteBEXTR class, which was missed when the class was added. llvm-svn: 333360
*	[X86] Remove masking from avx512ifma intrinsics. Use a select instead.	Craig Topper	2018-05-26	6	-126/+1074
	This allows us to avoid having mask and maskz variants, reducing from 12 intrinsics to 6. llvm-svn: 333346
*	Add test case for D46505. NFC	Amaury Sechet	2018-05-26	2	-0/+77
	llvm-svn: 333341
*	[CodeGenPrepare] Revert r331783	Guozhi Wei	2018-05-25	3	-3/+20
	The patch r331783 caused a regression in one of our internal applications, so revert it now; we will investigate it further. llvm-svn: 333305
*	[X86][SNB] Fix differences between vex/non-vex XMM vector moves (PR37286)	Simon Pilgrim	2018-05-25	2	-18/+18
	As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves. Another chapter in the never-ending crusade to remove useless InstRW overrides from the x86 scheduler models... llvm-svn: 333271
*	[x86] invpcid LLVM intrinsic	Gabor Buella	2018-05-25	1	-0/+43
	Re-add the feature flag for invpcid, which was removed in r294561. Add an intrinsic, which always uses a 32 bit integer as first argument, while the instruction actually uses a 64 bit register in 64 bit mode for the INVPCID_TYPE argument. Reviewers: craig.topper Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D47141 llvm-svn: 333255
*	[x86] add vector load-cmp-select tests; NFC	Sanjay Patel	2018-05-24	1	-0/+289
	llvm-svn: 333185
*	Added a testcase for PR31593. A patch (r291535) that fixed this bug didn't ↵	Ekaterina Romanova	2018-05-24	1	-0/+33
	have a testcase. Differential Revision: https://reviews.llvm.org/D47129 llvm-svn: 333167
*	[DWARFv5] Put the DWO ID in its place.	Paul Robinson	2018-05-22	1	-5/+8
	In DWARF v5, the DWO ID is in the (split/skeleton) CU header, not an attribute on the CU DIE. This changes the size of those headers, so use the parsed size whenever we have one, for simplicity. Differential Revision: https://reviews.llvm.org/D47158 llvm-svn: 333004
*	[x86] NFC Add some more shuffle-vs-trunc tests	Gabor Buella	2018-05-22	2	-0/+599
	These are related to: https://reviews.llvm.org/D46957 llvm-svn: 332962
*	[DAG] fold FP binops with undef operands to NaN	Sanjay Patel	2018-05-21	4	-488/+281
	This is the FP sibling of D43141 with the corresponding IR change in rL327212. We can't propagate undef here because if a variable operand is a NaN, these binops must propagate NaN. Neither global nor node-level fast-math makes a difference. If we have 'nnan', I think later folds can turn the NaN into undef. The tests in X86/fp-undef.ll are meant to be the definitive verification for these folds - everything reduces identically now. The other test changes are collateral damage. They may need to be altered to preserve their intent. Differential Revision: https://reviews.llvm.org/D47026 llvm-svn: 332920
*	[X86] Remove 128/256-bit cvtdq2ps, cvtudq2ps, cvtqq2pd, cvtuqq2pd intrinsics.	Craig Topper	2018-05-21	12	-183/+398
	These can all be implemented with sitofp/uitofp instructions. llvm-svn: 332916
*	[DAGCombiner] isAllOnesConstantOrAllOnesSplatConstant(): look through bitcasts	Roman Lebedev	2018-05-21	1	-17/+14
	Summary: As pointed out in D46528, we erroneously transform cases like `xor X, -1`, even though we use said function. It's because the `-1` is actually a bitcast there. So I think we can just look through it in the function. Differential Revision: https://reviews.llvm.org/D47156 llvm-svn: 332905
*	[DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.	Roman Lebedev	2018-05-21	3	-263/+241
	Summary: This appears to be the last missing piece for the masked merge pattern handling in the backend. This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 \| PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`), and we need to make sure that they are generated. Differential Revision: https://reviews.llvm.org/D46528 llvm-svn: 332904
*	[X86][AArch64][NFC] Add tests for vector masked merge unfolding	Roman Lebedev	2018-05-21	3	-1/+5312
	Summary: This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 \| PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`). Differential Revision: https://reviews.llvm.org/D46008 llvm-svn: 332903
*	[DAGCombiner] Use computeKnownBits to match rotate patterns that have had ↵	Craig Topper	2018-05-21	2	-50/+12
	their amount masking modified by simplifyDemandedBits. SimplifyDemandedBits can remove bits from the masks for the shift amounts we need to see to detect rotates. This patch uses zeroes from computeKnownBits to fill in some of these mask bits to make the match work. As currently written this calls computeKnownBits even when the mask hasn't been simplified because it made the code simpler. If we're worried about compile time performance we can improve this. I know we're talking about making a rotate intrinsic, but hopefully we can go ahead and do this change and just make sure the rotate intrinsic also handles it. Differential Revision: https://reviews.llvm.org/D47116 llvm-svn: 332895
*	[X86] Remove some unneeded check lines that I copy and pasted when I made ↵	Craig Topper	2018-05-21	1	-22/+0
	vector tests from some scalar test cases. llvm-svn: 332892
*	[X86] Remove masking from vpternlog intrinsics. Use a select in IR instead.	Craig Topper	2018-05-21	8	-254/+969
	This removes 6 intrinsics since we no longer need separate mask and maskz intrinsics. Differential Revision: https://reviews.llvm.org/D47124 llvm-svn: 332890
*	CodeGen: Add a dwo output file argument to addPassesToEmitFile and hook it ↵	Peter Collingbourne	2018-05-21	3	-28/+34
	up to dwo output. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47089 llvm-svn: 332881
*	[X86] Add test cases for D47012.	Craig Topper	2018-05-21	2	-0/+702
	Patch by Thomasz Krupa. llvm-svn: 332872
*	[X86] Add test cases for missed vector rotate matching due to ↵	Craig Topper	2018-05-21	1	-0/+114
	SimplifyDemandedBits interfering with the AND masks. As requested in D47116. llvm-svn: 332869
*	[X86] - Avoid SFB pass - fix bug in updating the offsets for newly created ↵	Lama Saba	2018-05-21	1	-0/+107
	copies. Change-Id: I169ab6fe7e187727c0298c2a1e2868a683f3e688 llvm-svn: 332849
*	[X86][SSE] Support v4i32 rotations (PR37426)	Simon Pilgrim	2018-05-21	4	-389/+270
	As suggested by Fabian on PR37426, we can use PMULUDQ to perform v4i32 vector rotations as the upper 32bits of the multiply will contain the 'wrapped' bits of the rotation. v8i16/v16i8 rotations would be straightforward to add to lowerRotate in the future - ideally we'd mostly share code with the vector shifts lowering. Differential Revision: https://reviews.llvm.org/D46954 llvm-svn: 332832
*	[X86] Remove mask arguments from permvar builtins/intrinsics. Use a select ↵	Craig Topper	2018-05-20	19	-164/+478
	in IR instead. Someday maybe we'll use selects for all intrinsics. llvm-svn: 332824
*	[X86] Add test cases to show missed rotate opportunities due to ↵	Craig Topper	2018-05-20	1	-0/+64
	SimplifyDemandedBits. llvm-svn: 332815
*	[x86] add more FP with FMF simplification tests; NFC	Sanjay Patel	2018-05-18	1	-13/+64
	llvm-svn: 332780
*	DAG: Fix crash on shift with large shift amounts	Matt Arsenault	2018-05-18	1	-0/+29
	Fixes bug 37521. llvm-svn: 332774
*	adding baseline fp fold tests for unsafe on and off	Michael Berg	2018-05-18	1	-0/+62
	llvm-svn: 332756
*	[X86] Add GPR<->XMM Schedule Tags	Simon Pilgrim	2018-05-18	4	-291/+291
	BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMM->GPR delays (see rL332737). The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1: SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM); Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner). llvm-svn: 332745
*	[X86] Directly legalize v16i16/v8i16 vselect to vXi8 vselect to use VPBLENDVB	Craig Topper	2018-05-18	1	-0/+37
	The intrinsic legalization for masked truncate uses ISD::TRUNCATE which can be constant folded by getNode. This prevents getVectorMaskingNode from seeing the ISD::TRUNCATE special case where it should emit X86ISD::SELECT instead of ISD::VSELECT. This causes a vselect with a v16i1 or v8i1 condition to be emitted during vector legalization, but vector legalization doesn't revisit nodes it creates. DAG combine will then promote this condition to match the result type. Then op legalization will try to legalize it, but the custom lowering hook returned SDValue(). But op legalization doesn't have an Expand for VSELECT because it expects vector legalization to have taken care of it. So the operation sticks around and fails in isel. This patch adds a custom legalization hook to morph it to a vXi8 vselect instead. This also simplifies the normal vXi16 vselect handling because vector legalization was normally expanding to AND/ANDN/OR and DAG combine was turning that into VBLENDVB. So we can skip a step by doing it directly. Fixes PR37499. Differential Revision: https://reviews.llvm.org/D47025 llvm-svn: 332743
*	Revert changes from D46265.	Than McIntosh	2018-05-18	1	-159/+0
	This is a revert of the changes from https://reviews.llvm.org/D46265; the new test introduced (test/CodeGen/X86/PR37310.mir) causes buildbot failures. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47061 llvm-svn: 332742
*	[X86] Update fast-isel test cases for _mm256_mask_cvtepi16_epi8 to match ↵	Craig Topper	2018-05-18	1	-14/+6
	clang r332738. llvm-svn: 332740