bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] SimplifyDemandedVectorEltsForTargetNode - remove identity target ↵	Simon Pilgrim	2018-09-29	1	-18/+18
\| \| \| \| \| \| \| \|	shuffles before simplifying inputs By removing demanded target shuffles that simplify to zero/undef/identity before simplifying their inputs, we improve the chances of further simplification, as only the immediate parent user of the combined node is added back to the work list. This still doesn't help if it's passed through other ops (bitcasts, etc.), though. llvm-svn: 343390
*	[X86][SSE] LowerScalarImmediateShift - remove 32-bit vXi64 special case ↵	Simon Pilgrim	2018-09-29	1	-129/+65
\| \| \| \| \| \| \| \|	handling. This is all handled generally by getTargetConstantBitsFromNode now llvm-svn: 343387
*	Fix signed/unsigned mismatch warning. NFCI.	Simon Pilgrim	2018-09-29	1	-1/+1
\| \| \| \|	llvm-svn: 343385
*	[X86] getTargetConstantBitsFromNode - add support for rearranging constant ↵	Simon Pilgrim	2018-09-29	1	-0/+47
\| \| \| \| \| \| \| \|	bits via shuffles Exposed an issue that recursive calls to getTargetConstantBitsFromNode don't handle changes to EltSizeInBits yet. llvm-svn: 343384
*	[X86][SSE] LowerScalarImmediateShift - use getTargetConstantBitsFromNode to ↵	Simon Pilgrim	2018-09-29	1	-64/+74
\| \| \| \| \| \| \| \| \| \|	get immediate data Don't just attempt to find a splat build vector. First step towards getting rid of all the 32-bit special case code. llvm-svn: 343383
*	[X86] getTargetConstantBitsFromNode - fix self-move assertions from gcc ↵	Simon Pilgrim	2018-09-29	1	-2/+6
\| \| \| \| \| \|	builds due to rL343375 llvm-svn: 343377
*	[X86] getTargetConstantBitsFromNode - add support for peeking through ↵	Simon Pilgrim	2018-09-29	1	-0/+15
\| \| \| \| \| \|	ISD::EXTRACT_SUBVECTOR llvm-svn: 343375
*	[X86][SSE] Fixed issue with v2i64 variable shifts on 32-bit targets	Simon Pilgrim	2018-09-29	1	-4/+3
\| \| \| \| \| \|	The shift amount might have peeked through an extract_subvector, altering the number of vector elements in the 'Amt' variable, so we were incorrectly calculating the ratio when peeking through bitcasts, resulting in incorrectly detecting splats. llvm-svn: 343373
*	[X86][Btver2] PSUBS/PSUBUS instructions are zero-idioms	Simon Pilgrim	2018-09-28	1	-0/+9
\| \| \| \| \| \|	Noticed during llvm-exegesis tests: the PSUBS/PSUBUS instructions have the same zero-idiom behaviour as PSUB. llvm-svn: 343321
*	[X86][Btver2] CVTSS2I/CVTSD2I - add missing JFPU0 pipe	Simon Pilgrim	2018-09-28	1	-2/+2
\| \| \| \| \| \| \| \|	We issue JFPU1->JSTC then JFPU0->JFPA then -> JALU0 (integer pipe) Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343314
*	[X86][Btver2] Fix BSF/BSR schedule	Simon Pilgrim	2018-09-28	1	-2/+2
\| \| \| \| \| \| \| \|	Double throughput to account for 2 pipes + fix BSF's latency/uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343311
*	[X86][BtVer2] Fix PHMINPOS schedule resources typo	Simon Pilgrim	2018-09-28	1	-1/+1
\| \| \| \| \| \|	PHMINPOS can run on either JFPU pipe llvm-svn: 343299
*	[X86] Remove BT/BTC/BTR/BTS rr/ri overrides	Simon Pilgrim	2018-09-27	1	-4/+3
\| \| \| \|	llvm-svn: 343241
*	[X86][Btver2] (V)MPSADBW instructions take 3uops not 1	Simon Pilgrim	2018-09-27	1	-1/+1
\| \| \| \|	llvm-svn: 343238
*	[X86][Btver2] BTC/BTR/BTS instructions take 2uops not 1	Simon Pilgrim	2018-09-27	1	-1/+1
\| \| \| \|	llvm-svn: 343234
*	[X86] Split BT and BTC/BTR/BTS scheduler classes	Simon Pilgrim	2018-09-27	11	-28/+33
\| \| \| \|	llvm-svn: 343233
*	[X86][Btver2] BLSI/BLSMSK/BLSR instructions take 2uops not 1 (same as TZCNT)	Simon Pilgrim	2018-09-27	1	-1/+1
\| \| \| \|	llvm-svn: 343227
*	[X86][Btver2] TZCNT instructions take 2uops not 1	Simon Pilgrim	2018-09-27	1	-1/+1
\| \| \| \|	llvm-svn: 343200
*	[X86][Btver2] Add uops counter for exegesis reports	Simon Pilgrim	2018-09-27	1	-0/+1
\| \| \| \|	llvm-svn: 343194
*	llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)	Fangrui Song	2018-09-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163
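The call-site change in this commit can be illustrated with a minimal sketch of a range-based sort wrapper. The `sketch::sort` function and the `sortedDescending` helper below are hypothetical stand-ins written for this illustration; the real `llvm::sort` lives in STLExtras and its implementation is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

namespace sketch {
// Hypothetical range-based wrapper: forwards the whole container to
// std::sort so callers no longer spell out begin()/end().
template <typename Container, typename Compare>
void sort(Container &&C, Compare Comp) {
  std::sort(std::begin(C), std::end(C), Comp);
}
} // namespace sketch

// Example call site: sort(C, ...) instead of sort(C.begin(), C.end(), ...).
inline std::vector<int> sortedDescending(std::vector<int> V) {
  sketch::sort(V, std::greater<int>());
  assert(std::is_sorted(V.begin(), V.end(), std::greater<int>()));
  return V;
}
```

Passing the container itself removes the chance of pairing iterators from two different containers, which is the usual argument for this kind of wrapper.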
*	[X86][SSE] canReduceVMulWidth - use ComputeNumSignBits/SignBitIsZero directly	Simon Pilgrim	2018-09-26	1	-17/+1
\| \| \| \| \| \|	Don't reinvent the wheel for BUILD_VECTOR/ZERO_EXTEND - it's only the ANY_EXTEND special case that needs handling. llvm-svn: 343096
*	[llvm-exegesis] Add support for measuring NumMicroOps.	Clement Courbet	2018-09-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Example output for vzeroall:
	---
	mode: uops
	key:
	  instructions:
	    - 'VZEROALL'
	  config: ''
	  register_initial_values:
	cpu_name: haswell
	llvm_triple: x86_64-unknown-linux-gnu
	num_repetitions: 10000
	measurements:
	  - { debug_string: HWPort0, value: 0.0006, per_snippet_value: 0.0006, key: '3' }
	  - { debug_string: HWPort1, value: 0.0011, per_snippet_value: 0.0011, key: '4' }
	  - { debug_string: HWPort2, value: 0.0004, per_snippet_value: 0.0004, key: '5' }
	  - { debug_string: HWPort3, value: 0.0018, per_snippet_value: 0.0018, key: '6' }
	  - { debug_string: HWPort4, value: 0.0002, per_snippet_value: 0.0002, key: '7' }
	  - { debug_string: HWPort5, value: 1.0019, per_snippet_value: 1.0019, key: '8' }
	  - { debug_string: HWPort6, value: 1.0033, per_snippet_value: 1.0033, key: '9' }
	  - { debug_string: HWPort7, value: 0.0001, per_snippet_value: 0.0001, key: '10' }
	  - { debug_string: NumMicroOps, value: 20.0069, per_snippet_value: 20.0069, key: NumMicroOps }
	error: ''
	info: ''
	assembled_snippet: C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C3
	...
	Reviewers: gchatelet
	Subscribers: tschuett, RKSimon, andreadb, llvm-commits
	Differential Revision: https://reviews.llvm.org/D52539
	llvm-svn: 343094
*	[X86][SSE] Use ISD::MULHS for constant vXi16 ISD::SRA lowering (PR38151)	Simon Pilgrim	2018-09-26	1	-0/+24
\| \| \| \| \| \| \| \| \| \|	Similar to the existing ISD::SRL constant vector shifts from D49562, this patch adds ISD::SRA support with ISD::MULHS. As we're dealing with signed values, we have to handle shift by zero and shift by one special cases, so XOP+AVX2/AVX512 splitting/extension is still a better solution - really we should still use ISD::MULHS if one of the special cases is used but for now I've just left a TODO and filtered by isKnownNeverZero. Differential Revision: https://reviews.llvm.org/D52171 llvm-svn: 343093
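The arithmetic behind this lowering can be checked with a small scalar model. This is only a sketch of the identity for a single 16-bit lane, not the code from the patch; the names `floorShiftRight`, `mulhs16`, `sraViaMulhs`, and `sraMatchesMulhsForAllInputs` are made up for this illustration. It assumes the multiplier used for a shift amount `Amt` is `2^(16 - Amt)`, which only fits in a signed 16-bit value when `Amt` is at least 2. That is one way to see why shifts by zero and one need separate handling.

```cpp
#include <cassert>
#include <cstdint>

// Portable floor(V / 2^N); avoids relying on how >> treats negative values.
inline int32_t floorShiftRight(int32_t V, unsigned N) {
  return V >= 0 ? (V >> N) : ~((~V) >> N);
}

// Scalar model of a signed multiply-high: the upper 16 bits of the
// 16x16 -> 32-bit signed product.
inline int16_t mulhs16(int16_t A, int16_t B) {
  return static_cast<int16_t>(floorShiftRight(int32_t(A) * int32_t(B), 16));
}

// Arithmetic shift right by a constant, expressed as a multiply-high.
inline int16_t sraViaMulhs(int16_t X, unsigned Amt) {
  assert(Amt >= 2 && Amt <= 15 && "2^(16 - Amt) must fit in a signed i16");
  return mulhs16(X, static_cast<int16_t>(1 << (16 - Amt)));
}

// Exhaustively compares the multiply-high form against a plain
// arithmetic shift for every 16-bit input.
inline bool sraMatchesMulhsForAllInputs(unsigned Amt) {
  for (int32_t X = INT16_MIN; X <= INT16_MAX; ++X) {
    int16_t Expected = static_cast<int16_t>(floorShiftRight(X, Amt));
    if (sraViaMulhs(static_cast<int16_t>(X), Amt) != Expected)
      return false;
  }
  return true;
}
```

For example, `sraViaMulhs(-7, 2)` multiplies by 16384, giving -114688, whose upper half is -2, the same as the arithmetic shift `-7 >> 2`.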
*	[X86] Allow movmskpd/ps ISD nodes to be created and selected with integer ↵	Craig Topper	2018-09-25	2	-18/+24
\| \| \| \| \| \| \| \| \| \|	input types. This removes an int->fp bitcast between the surrounding code and the movmsk. I had already added a hack to combineMOVMSK to try to look through this bitcast to improve the SimplifyDemandedBits there. But I found an additional issue where the bitcast was preventing combineMOVMSK from being called again after earlier nodes in the DAG are optimized. The bitcast gets revisited, but not the user of the bitcast. By using integer types throughout, the bitcast doesn't get in the way. llvm-svn: 343046
*	[X86] combineUIntToFP - Fix UINT_TO_FP(vXi1) comment (PR39078)	Simon Pilgrim	2018-09-25	1	-1/+1
\| \| \| \|	llvm-svn: 343026
*	[x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37749)	Sanjay Patel	2018-09-25	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the final (I hope!) problem pattern mentioned in PR37749: https://bugs.llvm.org/show_bug.cgi?id=37749
	We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops. We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches that can over-reach. I.e., splitting to 128-bit does not look like a win in most cases with >1 256-bit op.
	The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, we have this vector-type-legalized sequence:
	  t29: v8i32 = concat_vectors t27, t28
	  t30: v4i64 = bitcast t29
	  t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ...
	  t31: v4i64 = bitcast t18
	  t32: v4i64 = xor t30, t31
	  t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ...
	  t34: v4i64 = bitcast t9
	  t35: v4i64 = and t32, t34
	  t36: v8i32 = bitcast t35
	  t37: v4i32 = extract_subvector t36, Constant:i64<0>
	  t38: v4i32 = extract_subvector t36, Constant:i64<4>
	Differential Revision: https://reviews.llvm.org/D52318
	llvm-svn: 343008
*	[X86] Add AVX512 support to combineVectorSizedSetCCEquality.	Craig Topper	2018-09-25	1	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52424 llvm-svn: 342989
*	Revert rL342916: [X86] Remove shift/rotate by CL memory (RMW) overrides	Simon Pilgrim	2018-09-25	5	-27/+81
\| \| \| \| \| \| \| \|	As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead. The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342969
*	[X86] Don't create FILD ISD nodes when X87 is disabled.	Craig Topper	2018-09-25	1	-1/+2
\| \| \| \| \| \| \| \|	The included test case previously asserted because the type legalizer tried to soften the FILD ISD node. Fixes PR38819. llvm-svn: 342934
*	[X86] Remove superfluous curly braces. NFC	Craig Topper	2018-09-25	1	-2/+1
\| \| \| \|	llvm-svn: 342933
*	[X86] Update comment. Use 'glued' instead of 'flagged' NFC	Craig Topper	2018-09-25	1	-1/+1
\| \| \| \|	llvm-svn: 342932
*	[X86] Remove shift/rotate by CL memory (RMW) overrides	Simon Pilgrim	2018-09-24	5	-81/+27
\| \| \| \| \| \|	The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342916
*	[X86] Remove WriteDiv/WriteIDiv schedule overrides - use classes directly. NFCI.	Simon Pilgrim	2018-09-24	4	-125/+70
\| \| \| \| \| \| \| \|	We're missing quite a bit of data for these instructions, removing the overrides makes this obvious - inconsistent reg/mem variants are a concern as well. Also, we have Divider resources (HWDivider etc.) but they aren't actually used consistently. llvm-svn: 342904
*	[X86] Split WriteIMul into 8/16/32/64 implementations (PR36931)	Simon Pilgrim	2018-09-24	11	-373/+183
\| \| \| \| \| \| \| \|	Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases. This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions. llvm-svn: 342892
*	[X86] Split WriteShift/WriteRotate schedule classes by CL usage.	Simon Pilgrim	2018-09-23	11	-95/+64
\| \| \| \| \| \|	Variable Shifts/Rotates using the CL register have different behaviours to the immediate instructions - split accordingly to help remove yet more repeated overrides from the schedule models. llvm-svn: 342852
*	[X86] Remove unnecessary WriteRotate override. NFCI.	Simon Pilgrim	2018-09-23	1	-4/+2
\| \| \| \| \| \|	SNB was the last override for ROT(L\|R)r(1\|i) - they now all use WriteRotate correctly. llvm-svn: 342848
*	Fix line ending mismatches. NFCI.	Simon Pilgrim	2018-09-23	1	-6/+6
\| \| \| \|	llvm-svn: 342847
*	[X86] RORmCL instruction models should match ROLmCL etc.	Simon Pilgrim	2018-09-23	4	-28/+4
\| \| \| \| \| \| \| \|	Confirmed with Craig Topper - fix a typo that was missing a Port4 uop for ROR*mCL instructions on some Intel models. Yet another step on the scheduler model cleanup marathon...... llvm-svn: 342846
*	[DAGCombiner][x86] extend decompose of integer multiply into shift/add with ↵	Sanjay Patel	2018-09-23	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	negation This is an alternative to https://reviews.llvm.org/D37896. We can't decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some existing code that overlaps with this transform. This extends D52195 and may resolve PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 (still an open question about transforming legal vector multiplies, but we could open another bug report for those) llvm-svn: 342844
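As a rough illustration of the shift/add decomposition this commit message describes, the sketch below models four multiplier shapes in wrapping 32-bit arithmetic: `2^N + 1`, `2^N - 1`, and their negations. The exact set of patterns and the function names (`mulPow2Plus1`, `mulPow2Minus1`, `mulNegPow2Minus1`, `mulNegPow2Plus1`, `checkDecomposition`) are assumptions made for this example and are not taken from the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// x * (2^N + 1)  ==  (x << N) + x
inline uint32_t mulPow2Plus1(uint32_t X, unsigned N) { return (X << N) + X; }

// x * (2^N - 1)  ==  (x << N) - x
inline uint32_t mulPow2Minus1(uint32_t X, unsigned N) { return (X << N) - X; }

// x * -(2^N - 1) ==  x - (x << N)
inline uint32_t mulNegPow2Minus1(uint32_t X, unsigned N) {
  return X - (X << N);
}

// x * -(2^N + 1) ==  0 - ((x << N) + x)
inline uint32_t mulNegPow2Plus1(uint32_t X, unsigned N) {
  return 0u - ((X << N) + X);
}

// Compares each shift/add form against a real multiply, using unsigned
// wraparound so the check is well defined for every input.
inline bool checkDecomposition(uint32_t X, unsigned N) {
  assert(N >= 1 && N <= 31 && "shift amount must be in range");
  const uint32_t Pow2 = uint32_t(1) << N;
  return mulPow2Plus1(X, N) == X * (Pow2 + 1u) &&
         mulPow2Minus1(X, N) == X * (Pow2 - 1u) &&
         mulNegPow2Minus1(X, N) == X * (0u - (Pow2 - 1u)) &&
         mulNegPow2Plus1(X, N) == X * (0u - (Pow2 + 1u));
}
```

For instance, `mulNegPow2Minus1(5, 3)` computes `5 - 40`, which matches `5 * -7` under 32-bit wraparound. Whether such a rewrite is actually cheaper than the multiply depends on the target, which is why the commit message ties the transform to a profitability hook.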
*	[X86] Added missing RCL/RCR schedule overrides to the generic SNB model	Simon Pilgrim	2018-09-23	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \|	The SandyBridge model was missing schedule values for the RCL/RCR instructions - instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults. I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well. This is necessary to allow WriteRotate to be updated to remove other rotate overrides. It'd probably be a good idea to investigate a WriteRotateCarry class at some point but it's not high priority given the unusualness of these instructions. llvm-svn: 342842
*	[X86] Remove unnecessary WriteRotate overrides. NFCI.	Simon Pilgrim	2018-09-23	4	-26/+6
\| \| \| \|	llvm-svn: 342841
*	[X86] Move RORX instructions back to WriteShift schedule class	Simon Pilgrim	2018-09-23	1	-2/+4
\| \| \| \| \| \|	Despite being rotates, these more modern instructions avoid many of the quirks of the regular x86 rotate instructions and consistently have a schedule closer to shifts. llvm-svn: 342839
*	[X86] Add WriteRotate schedule class, splitting off from WriteShift.	Simon Pilgrim	2018-09-23	11	-16/+27
\| \| \| \| \| \| \| \|	NFCI for now, but it should make it easier to remove a lot of unnecessary overrides in a future commit. Now that funnel shift intrinsics are coming online we need to get this cleaned up to make vectorization costs from scalar rotate patterns more straightforward. llvm-svn: 342837
*	[X86] Add isel pattern for (v8i16 (sext (v8i1))) with DQI and no BWI.	Craig Topper	2018-09-23	1	-0/+5
\| \| \| \| \| \| \| \|	Our lowering that tries to avoid this sign extend can be defeated by the DAG combine folding it with a truncate. The pattern needs to extend to a v8i32 then truncate back down to v8i16. llvm-svn: 342830
*	[X86] Fix a few typos in comments.	Craig Topper	2018-09-23	1	-2/+2
\| \| \| \|	llvm-svn: 342829
*	[X86] Fix inline expansion for memset in x32	Craig Topper	2018-09-22	2	-23/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Similar to D51893 which was for memcpy Reviewers: efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52063 llvm-svn: 342796
*	[X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)) for ↵	Craig Topper	2018-09-22	1	-8/+15
\| \| \| \| \| \| \| \|	vXi8 vectors. We don't have a vXi8 shift left so we need to bitcast to a vXi16 vector to perform the shift. If we let lowering legalize the vXi8 shift we get an extra AND that we don't need and fail to remove. llvm-svn: 342795
*	[X86] Teach fast isel to use MOV32ri64 for loading an unsigned 32 immediate ↵	Craig Topper	2018-09-21	1	-9/+1
\| \| \| \| \| \| \| \|	into a 64-bit register. Previously we used SUBREG_TO_REG+MOV32ri. But regular isel was changed recently to use the MOV32ri64 pseudo. Fast isel now does the same. llvm-svn: 342788
*	[X86][Sched] Add zero idiom sched data to the SNB model.	Clement Courbet	2018-09-21	1	-1/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: On SNB, renamer-based zeroing does not work for:
	- 16 and 8-bit GPRs [1].
	- MMX [2].
	- ANDN variants [3]
	[1] echo 'sub %ax, %ax' \| /tmp/llvm-exegesis -mode=uops -snippets-file=-
	[2] echo 'pxor %mm0, %mm0' \| /tmp/llvm-exegesis -mode=uops -snippets-file=-
	[3] echo 'andnps %xmm0, %xmm0' \| /tmp/llvm-exegesis -mode=uops -snippets-file=-
	Reviewers: RKSimon, andreadb
	Subscribers: gbedwell, craig.topper, llvm-commits
	Differential Revision: https://reviews.llvm.org/D52358
	llvm-svn: 342736
*	[X86][BtVer2] Fix latency and resource cycles of AVX 256-bit zero-idioms.	Andrea Di Biagio	2018-09-21	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces a SchedWriteVariant to describe zero-idiom VXORP(S\|D)Yrr and VANDNP(S\|D)Yrr. This is a follow-up of r342555. On Jaguar, a VXORPSYrr is 2 macro opcodes. Only one opcode is eliminated at register-renaming stage. The other opcode has to be executed to set the upper half of the destination YMM. Same for VANDNP(S\|D)Yrr. Differential Revision: https://reviews.llvm.org/D52347 llvm-svn: 342728