path: root/llvm/lib/Target/X86
Commit message (Author, Date, Files changed, Lines -/+)
* [X86] SET0 to use XMM registers where possible PR26018 PR32862 (Dinar Temirbulatov, 2017-08-03, 1 file, -8/+13)
  Differential Revision: https://reviews.llvm.org/D35965
  llvm-svn: 309926
* Delete Default and JITDefault code models (Rafael Espindola, 2017-08-03, 3 files, -20/+17)
  IMHO it is an antipattern to have an enum value that is Default. At any given piece of code it is not clear if we have to handle Default or if it has already been mapped to a concrete value. In this case in particular, only the target can do the mapping and it is nice to make sure it is always done.

  This deletes the two default enum values of CodeModel and uses an explicit Optional<CodeModel> when it is possible that it is unspecified.

  llvm-svn: 309911
* [X86][SSE] Added missing vector logic intrinsic schedules (Simon Pilgrim, 2017-08-01, 1 file, -10/+6)
  Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

  Merged SSE_VEC_BIT_ITINS_P + SSE_BIT_ITINS_P as we were interchanging between them.

  llvm-svn: 309715
* [X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large mask (Craig Topper, 2017-08-01, 1 file, -5/+36)
  Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D36129
  llvm-svn: 309706
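  Illustrative sketch (not part of the commit) of the kind of source-level masking this targets. The function name and mask width are made up, and whether BEXTR/BEXTRI or BZHI is actually selected depends on the subtarget and the matched pattern, so treat the codegen claim as an assumption.

      #include <stdint.h>

      // A mask of 40 consecutive low bits does not fit in a sign-extended
      // 32-bit immediate, so without BMI the constant has to be materialized
      // in a register before the 'and'. A bit-extract of bits [0, 40) can
      // express the same operation in a single instruction.
      uint64_t mask_low40(uint64_t x) {
        return x & 0x000000FFFFFFFFFFULL; // (1ULL << 40) - 1
      }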
* [X86][SSE] Added missing PACKSS/PACKUS intrinsic schedules (Simon Pilgrim, 2017-08-01, 3 files, -8/+10)
  Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

  Checked on Agner that these actually match the UNPACK schedules, but better to include a separate class.

  llvm-svn: 309701
* [X86][SSSE3] Added missing PHADDS/PHSUBS/PSIGN intrinsic schedules (Simon Pilgrim, 2017-08-01, 1 file, -2/+2)
  llvm-svn: 309699
* [AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32. (Craig Topper, 2017-08-01, 1 file, -13/+29)
  We were already using the 32 bit element opcode if BWI isn't enabled, but there's no reason to change opcode if we have BWI. We will still use the 8/16 opcodes for masked stores though.

  This allows us to use the aligned opcode when we can which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need.

  This is a slight inconsistency with loads which default to 64 bit element opcodes. I'll probably rectify that in a future patch.

  Differential Revision: https://reviews.llvm.org/D35978
  llvm-svn: 309693
* [AVX-512] Add unmasked subvector inserts and extract to the execution domain tables. (Craig Topper, 2017-07-31, 1 file, -0/+24)
  llvm-svn: 309632
* [X86][MMX] Added custom lowering action for MMX SELECT (PR30418) (Konstantin Belochapka, 2017-07-31, 1 file, -0/+13)
  Fix for pr30418 - error in backend: Cannot select: t17: x86mmx = select_cc t2, Constant:i64<0>, t7, t8, seteq:ch
  Differential Revision: https://reviews.llvm.org/D34661
  llvm-svn: 309614
* [AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer vmovdqa64/vmovdqu64 instead. (Craig Topper, 2017-07-31, 1 file, -11/+18)
  These were taking priority over the aligned load instructions, since there is no vmovdqa8/16. I don't think there is really a difference between aligned and unaligned on newer CPUs, so I don't think it matters which instructions we use. But with this change we reduce the size of the isel table a little, and we allow the aligned information to pass through to the evex->vex pass and produce the same output as avx/avx2 in some cases. I also generally dislike patterns rooted in a bitcast, which these were.
  Differential Revision: https://reviews.llvm.org/D35977
  llvm-svn: 309589
* Strip trailing whitespace. NFCI. (Simon Pilgrim, 2017-07-31, 1 file, -7/+7)
  llvm-svn: 309584
* Fix typo in comment. (Simon Pilgrim, 2017-07-31, 1 file, -1/+1)
  llvm-svn: 309583
* Do not recombine FMA when that is not needed. (Amaury Sechet, 2017-07-31, 1 file, -4/+16)
  Summary: As per title. This creates useless recombines.
  Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D33848
  llvm-svn: 309578
* [Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC. (Alexey Bataev, 2017-07-31, 2 files, -4/+5)
  llvm-svn: 309563
* [X86][AVX512] Add masked MOVS[S|D] patterns (Guy Blank, 2017-07-31, 1 file, -0/+16)
  Added patterns to recognize that an AND with 1 on the mask of a scalar masked move is not needed, since only the lower bit is relevant for the instruction.
  Differential Revision: https://reviews.llvm.org/D35897
  llvm-svn: 309546
* [X86] Add pattern to use bzhi for 64-bit 'and' with a mask when there is a load involved. (Craig Topper, 2017-07-31, 1 file, -0/+4)
  We already had a pattern without load, but with a load we were falling back to a regular 'and' due to pattern complexity priority.
  llvm-svn: 309535
* [x86][inline-asm][ms-compat] legalize the use of "jc/jz short <op>" (Coby Tayree, 2017-07-30, 1 file, -1/+2)
  MS ignores the keyword "short" when used after a jc/jz instruction; LLVM ought to do the same.
  Test: D35893
  Differential Revision: https://reviews.llvm.org/D35892
  llvm-svn: 309509
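  Illustrative sketch (not part of the commit) of the accepted syntax, assuming an MS-compatible inline-asm mode (e.g. 32-bit x86 with -fasm-blocks); the label and instruction choices are made up:

      // "short" after the conditional jump is parsed and ignored, matching MSVC.
      void f(void) {
        __asm {
          xor eax, eax
          jz  short done    // behaves the same as "jz done"
          inc eax
        done:
        }
      }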
* [X86] Add addsub intrinsics to the intrinsic lowering table so we have a single set of isel patterns. (Craig Topper, 2017-07-30, 2 files, -48/+24)
  llvm-svn: 309502
* [SelectionDAG][X86] CombineBT - more aggressively determine demanded bits (Simon Pilgrim, 2017-07-29, 1 file, -12/+8)
  This patch is in 2 parts:
  1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT.
  2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match.
  Differential Revision: https://reviews.llvm.org/D35896
  llvm-svn: 309486
* [MachineOutliner] NFC: Change IsTailCall to a call class + frame class (Jessica Paquette, 2017-07-29, 2 files, -88/+91)
  This commit
  - Removes IsTailCall and replaces it with a target-defined unsigned
  - Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
  - Adds a call class + frame class classification to OutlinedFunction and Candidate respectively

  This accomplishes a couple of things. Firstly, we don't need the notion of *tail call* in the general outlining algorithm. Secondly, we can now have different "outlining classes" for each candidate within a set of candidates. This will make it easy to add new ways to outline sequences for certain targets and dynamically choose an appropriate cost model for a sequence depending on the context that that sequence lives in.

  Ultimately, this should get us closer to being able to do something like, say, avoiding saving the link register when outlining AArch64 instructions.

  llvm-svn: 309475
* Remove the unused offset from DBG_VALUE (NFC) (Adrian Prantl, 2017-07-28, 1 file, -2/+3)
  Followup to r309426.
  rdar://problem/33580047
  llvm-svn: 309450
* [MachineOutliner] NFC: Split up getOutliningBenefit (Jessica Paquette, 2017-07-28, 2 files, -20/+19)
  This is some more cleanup in preparation for some actual functional changes. This splits getOutliningBenefit into two cost functions: getOutliningCallOverhead and getOutliningFrameOverhead. These functions return the number of instructions that would be required to call a specific function and the number of instructions that would be required to construct a frame for a specific function. The actual outlining benefit logic is moved into the outliner, which calls these functions.

  The goal of refactoring getOutliningBenefit is to:
  - Get us closer to getting rid of the IsTailCall flag
  - Further split up "target-specific" things and "general algorithm" things

  llvm-svn: 309356
* [X86] Fix latent bug in sibcall eligibility logic (Reid Kleckner, 2017-07-28, 1 file, -0/+7)
  The X86 tail call eligibility logic was correct when it was written, but the addition of inalloca and argument copy elision broke its assumptions. It was assuming that fixed stack objects were immutable.

  Currently, we aim to emit a tail call if no arguments have to be re-arranged in memory. This code would trace the outgoing argument values back to check if they are loads from an incoming stack object. If the stack argument is immutable, then we won't need to store it back to the stack when we tail call.

  Fortunately, stack objects track their mutability, so we can just make the obvious check to fix the bug.

  This was http://crbug.com/749826

  llvm-svn: 309343
* [X86] Don't lie about legality to TLI's demanded bits. (Ahmed Bougacha, 2017-07-27, 1 file, -2/+2)
  Like r309323, X86 had a typo where it passed the wrong flags to TLO.

  Found by inspection; I haven't been able to tickle this into having observable behavior. I don't think it does, given that X86 doesn't have custom demanded bits logic, and the generic logic doesn't have a lot of exposure to illegal constructs.

  llvm-svn: 309325
* [X86] SET0 to use XMM registers where possible PR26018 PR32862 (Dinar Temirbulatov, 2017-07-27, 1 file, -5/+14)
  Differential Revision: https://reviews.llvm.org/D35839
  llvm-svn: 309298
* Added cost of ZEROALL and ZEROUPPER instrs in btver2 cpu. (Andrew V. Tischenko, 2017-07-27, 1 file, -0/+11)
  Differential Revision: https://reviews.llvm.org/D35834
  llvm-svn: 309269
* [X86] Tidyup MaskedLoad/Store mask creation. NFCI. (Simon Pilgrim, 2017-07-27, 1 file, -8/+3)
  Assign all concat elements to zero and then just replace the first element, instead of setting them all to null and copying everything in.
  llvm-svn: 309261
* Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI. (Peter Collingbourne, 2017-07-26, 1 file, -4/+3)
  This was a use-after-free waiting to happen.
  llvm-svn: 309159
* DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offset (Zvi Rackover, 2017-07-26, 2 files, -0/+34)
  Summary: Adding support for combining power2-strided build_vectors where the first build_vector's operand is extracted from a non-zero index.

  Example:
    v4i32 build_vector((extract_elt V, 1), (extract_elt V, 3), (extract_elt V, 5), (extract_elt V, 7))
    -->
    v4i32 truncate (bitcast (shuffle<1,u,3,u,5,u,7,u> V, u) to v4i64)

  Reviewers: delena, RKSimon, guyblank
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D35700
  llvm-svn: 309108
* [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess (Michael Zuckerman, 2017-07-26, 1 file, -7/+132)
  This patch expands the support of lowerInterleavedStore to the 32x8i, stride-4 case. LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=4, VF=32) and we plan to include more patterns in the future.

  To reach our goal of "more patterns", we include two mask creators. The first function creates a shuffle mask equivalent to the unpacklo/unpackhi instructions. The other creates a mask equivalent to a concat of two half vectors (high/low).

  The patch's goal is to optimize the following sequence. At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 each holding 32 chars:

    c0, c1, ..., c31
    m0, m1, ..., m31
    y0, y1, ..., y31
    k0, k1, ..., k31

  And these need to be transposed/interleaved and stored like so:

    c0 m0 y0 k0
    c1 m1 y1 k1
    c2 m2 y2 k2
    c3 m3 y3 k3
    ...

  Reviewers: dorit, Farhana, RKSimon, guyblank, DavidKreitzer
  Differential Revision: https://reviews.llvm.org/D34601
  llvm-svn: 309086
* TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI. (Zvi Rackover, 2017-07-26, 2 files, -5/+2)
  Changing mask argument type from const SmallVectorImpl<int>& to ArrayRef<int>.

  This came up in D35700 where a mask is received as an ArrayRef<int> and we want to pass it to TargetLowering::isShuffleMaskLegal(). Also saves a few lines of code.

  llvm-svn: 309085
* [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess, part 1 (Michael Zuckerman, 2017-07-26, 2 files, -47/+47)
  Splitting patch D34601 into two parts. This part changes the location of two functions. The second part will be based on that patch. This was requested by @RKSimon.
  Reviewers: dorit, Farhana, RKSimon, guyblank, DavidKreitzer
  llvm-svn: 309084
* [X86] Prevent selecting masked aligned load instructions if the load should be non-temporal (Craig Topper, 2017-07-26, 1 file, -3/+6)
  Summary: The aligned load predicates don't suppress themselves if the load is non-temporal, the way the unaligned predicates do. For the most part this isn't a problem, because the aligned predicates are mostly used for instructions that only load, and the non-temporal loads have priority over those. The exception is masked loads.
  Reviewers: RKSimon, zvi
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D35712
  llvm-svn: 309079
* Update the comments on default subtargets based on feedback. (Eric Christopher, 2017-07-25, 1 file, -2/+3)
  llvm-svn: 309041
* Revert "This patch enables the usage of constant Enum identifiers within ↵Eric Christopher2017-07-251-55/+21
| | | | | | | | Microsoft style inline assembly statements." This reverts commit r308966. llvm-svn: 309005
* [X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914) (Simon Pilgrim, 2017-07-25, 1 file, -2/+2)
  D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining, which resulted in regressions for some optimized libc memcmp implementations (PR33914).

  Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).

  This patch should be considered for the 5.0.0 release branch as well.

  Differential Revision: https://reviews.llvm.org/D35830
  llvm-svn: 308986
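  Illustrative sketch (not part of the commit); the function name is made up, and the exact expansion depends on the target and optimization level:

      #include <string.h>

      // With expansion capped at two load pairs, a 16-byte equality check
      // like this can be inlined as two 8-byte loads from each buffer plus
      // compares, instead of a library call to memcmp.
      int same16(const void *a, const void *b) {
        return memcmp(a, b, 16) == 0;
      }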
* X86 Asm uses assertions instead of proper diagnostics. This patch fixes that. (Andrew V. Tischenko, 2017-07-25, 1 file, -23/+57)
  Differential Revision: https://reviews.llvm.org/D35115
  llvm-svn: 308972
* This patch enables the usage of constant Enum identifiers within Microsoft style inline assembly statements. (Matan Haroush, 2017-07-25, 1 file, -21/+55)
  Differential Revision: https://reviews.llvm.org/D33277
  https://reviews.llvm.org/D33278
  llvm-svn: 308966
* Revert "[X86][InlineAsm][Ms Compatibility]Prefer variable name over a ↵Reid Kleckner2017-07-241-12/+1
| | | | | | | | | | | | | | | | | | | | | | | | register when the two collides" This reverts r308867 and r308866. It broke the sanitizer-windows buildbot on C++ code similar to the following: namespace cl { } void f() { __asm { mov al, cl } } t.cpp(4,13): error: unexpected namespace name 'cl': expected expression mov al, cl ^ In this case, MSVC parses 'cl' as a register, not a namespace. llvm-svn: 308926
* [X86][AVX512] Add patterns for masked AVX512 floating point compare instructions that were missing. (Ayman Musa, 2017-07-24, 1 file, -1/+52)
  These patterns were missed by D33188. Adding for completion, and updating the test.
  Differential Revision: https://reviews.llvm.org/D35179
  llvm-svn: 308868
* [X86][InlineAsm][Ms Compatibility]Prefer variable name over a register when the two collides (Coby Tayree, 2017-07-24, 1 file, -1/+12)
  On MS-style inline assembly, the following snippet:

    int eax;
    __asm mov eax, ebx

  should yield a load of ebx into the location pointed to by the variable eax. This patch sees to it. Currently, a reg-to-reg move would have been invoked.
  clang: D34740
  Differential Revision: https://reviews.llvm.org/D34739
  llvm-svn: 308866
* [CodeGen][X86] Fuchsia supports sincos* libcalls and sin+cos->sincos optimization (Petr Hosek, 2017-07-23, 1 file, -3/+6)
  Patch by Roland McGrath
  Differential Revision: https://reviews.llvm.org/D35748
  llvm-svn: 308854
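  Illustrative sketch (not part of the commit), assuming a target whose C library provides sincos, as this change declares for Fuchsia; the function name is made up:

      #include <math.h>

      // When both the sine and cosine of the same argument are needed, the
      // two calls below may be combined into a single sincos libcall.
      void polar_to_xy(double r, double theta, double *x, double *y) {
        *x = r * cos(theta);
        *y = r * sin(theta);
      }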
* [X86] Add some hasSideEffects=0 flags. (Craig Topper, 2017-07-23, 2 files, -1/+4)
  llvm-svn: 308835
* [X86] Add patterns for memory forms of SARX/SHLX/SHRX with careful complexity adjustment to keep shift by immediate using the legacy instructions. (Craig Topper, 2017-07-23, 2 files, -6/+59)
  These patterns were previously omitted only so that we would favor the legacy instructions when the shift amount is a constant. With careful adjustment of the pattern complexity, we can make sure the immediate instructions still have priority over these patterns.
  llvm-svn: 308834
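  Illustrative sketch (not part of the commit); function names are made up and the exact instruction selection is subtarget-dependent:

      #include <stdint.h>

      // A variable-amount shift of a loaded value is a candidate for the
      // BMI2 SHRX form with a folded memory operand ...
      uint64_t shift_from_mem(const uint64_t *p, unsigned n) {
        return *p >> n;
      }

      // ... while a shift by a constant should keep using the legacy
      // shift-by-immediate instruction.
      uint64_t shift_by_const(uint64_t x) {
        return x >> 7;
      }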
* [X86] Add nopq instruction which is a rex encoded version of nopl for gas compatibility. (Craig Topper, 2017-07-22, 1 file, -0/+4)
  llvm-svn: 308818
* [X86] Add register form of NOPL and NOPW for assembler/disassembler. (Craig Topper, 2017-07-22, 1 file, -0/+5)
  Fixes PR32805.
  llvm-svn: 308817
* X86InterleaveAccess: A fix for bug33826 (Farhana Aleen, 2017-07-21, 1 file, -13/+18)
  Reviewers: DavidKreitzer
  Differential Revision: https://reviews.llvm.org/D35638
  llvm-svn: 308784
* [SystemZ, LoopStrengthReduce] (Jonas Paulsson, 2017-07-21, 2 files, -2/+4)
  This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory.

  In order to achieve this, the following common code changes were made:
  * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers.
  * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores.

  SystemZ changes:
  * isLSRCostLess() implemented with Insns first, and without ImmCost.
  * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers.

  Review: Ulrich Weigand, Quentin Colombet
  https://reviews.llvm.org/D35262
  https://reviews.llvm.org/D35049
  llvm-svn: 308729
* [X86][SSE] Add pre-AVX2 support for (i32 bitcast(v32i1)) -> 2xMOVMSK (Simon Pilgrim, 2017-07-21, 1 file, -3/+15)
  Currently we only support (i32 bitcast(v32i1)) using the AVX2 VPMOVMSKB ymm instruction.

  This patch adds support for splitting pre-AVX2 targets into 2 x (V)PMOVMSKB xmm instructions and merging the integer results.

  In future we could probably generalize this to handle more cases.

  Differential Revision: https://reviews.llvm.org/D35303
  llvm-svn: 308723
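  Illustrative sketch (not part of the commit) of the shape of the split, written with SSE2 intrinsics; the function name is made up, and the actual combine operates on (i32 bitcast(v32i1)) DAG nodes rather than on intrinsics:

      #include <emmintrin.h>
      #include <stdint.h>

      // Two 16-lane byte masks (e.g. results of _mm_cmpeq_epi8) become a
      // single 32-bit mask via two PMOVMSKB instructions plus an integer
      // merge, instead of one AVX2 VPMOVMSKB on a ymm register.
      uint32_t movemask32(__m128i lo, __m128i hi) {
        uint32_t lo_bits = (uint32_t)_mm_movemask_epi8(lo);
        uint32_t hi_bits = (uint32_t)_mm_movemask_epi8(hi);
        return lo_bits | (hi_bits << 16);
      }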
* [AVX-512] Fix a bug that prevented some non-temporal loads from using the movntdqa instruction. (Craig Topper, 2017-07-21, 1 file, -3/+3)
  The bitconverts here had an input type of 128 bits and an output type of 256 bits. The input type should also have been 256 bits.
  llvm-svn: 308702