bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[ARM] GlobalISel: Allow i8 and i16 adds	Diana Picus	2016-12-19	1	-1/+6
\| \| \| \| \| \| \| \| \|	Teach the instruction selector and legalizer that it's ok to have adds with 8 or 16-bit integers. This is the second part of https://reviews.llvm.org/D27704 llvm-svn: 290105
*	[ARM] GlobalISel: Select i8 and i16 copies	Diana Picus	2016-12-19	1	-2/+9
\| \| \| \| \| \| \| \| \|	Teach the instruction selector that it's ok to copy small values from physical registers. First part of https://reviews.llvm.org/D27704 llvm-svn: 290104
*	[Power9] Processor Model for Scheduling	Ehsan Amiri	2016-12-19	4	-3/+1145
\| \| \| \| \| \| \| \|	PWR9 processor model for instruction scheduling. A subsequent patch will migrate PWR9 to Post RA MIScheduler. https://reviews.llvm.org/D24525 llvm-svn: 290102
*	[Hexagon] Restore minimum profit check accidentally changed in r290024	Malcolm Parsons	2016-12-19	1	-2/+2
\| \| \| \|	llvm-svn: 290100
*	[ARM] GlobalISel: Lower more than 4 arguments	Diana Picus	2016-12-19	1	-10/+22
\| \| \| \| \| \| \| \| \| \|	This adds support for lowering more than 4 arguments (although still i32 only). It uses the handleAssignments / ValueHandler infrastructure extracted from the AArch64 backend in r288658. Differential Revision: https://reviews.llvm.org/D27195 llvm-svn: 290098
*	AMDGPU: [AMDGPU] Assembler: add .hsa_code_object_metadata directive for ↵	Sam Kolton	2016-12-19	4	-72/+143
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	functime metadata V2.0

	Summary: Added a pair of directives, .hsa_code_object_metadata and .end_hsa_code_object_metadata. Between them the user can put a YAML string that is copied directly into the generated note. E.g.:

	'''
	.hsa_code_object_metadata
	  { amd.MDVersion: [ 2, 0 ] }
	.end_hsa_code_object_metadata
	'''

	Based on D25046

	Reviewers: vpykhtin, nhaustov, yaxunl, tstellarAMD
	Subscribers: arsenm, kzhuravl, wdng, nhaehnle, mgorny, tony-tye
	Differential Revision: https://reviews.llvm.org/D27619
	llvm-svn: 290097
*	[ARM] GlobalISel: Support loading from the stack	Diana Picus	2016-12-19	3	-10/+45
\| \| \| \| \| \| \| \| \| \|	Add support for selecting simple G_LOAD and G_FRAME_INDEX instructions (32-bit scalars only). This will be useful for functions that need to pass arguments on the stack. First part of https://reviews.llvm.org/D27195. llvm-svn: 290096
*	[X86] When recognizing vector loads or VZEXT_LOAD in selectScalarSSELoad ↵	Craig Topper	2016-12-19	1	-2/+2
\| \| \| \| \| \|	make sure we pass the load's user rather than load itself to the second operand of IsLegalToFold. llvm-svn: 290089
*	[X86] Remove all of the patterns that use X86ISD:FAND/FXOR/FOR/FANDN except ↵	Craig Topper	2016-12-19	2	-131/+42
\| \| \| \| \| \| \| \|	for the ones needed for SSE1. Anything SSE2 or above uses the integer ISD opcode. This removes 11721 bytes from the DAG isel table or 2.2% llvm-svn: 290073
*	Revert r289955 and r289962. This is causing lots of ASAN failures for us.	Daniel Jasper	2016-12-18	1	-22/+10
\| \| \| \| \| \| \| \|	Not sure whether it causes an ASAN false positive or whether it actually leads to incorrect code or whether it even exposes bad code. Hans, I'll get you instructions to reproduce this. llvm-svn: 290066
*	[X86] [AVX512] Minor fix in encoding of scalar EVEX instructions. NFC.	Michael Zuckerman	2016-12-18	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Commit on behalf of Gadi Haber. Removed EVEX_V512 prefix from scalar EVEX instructions since HW ignores L'L bits anyway (LIG). 4 instructions are modified. The changed encodings are validated with XED. Reviewers: delena, igorb Differential revision: https://reviews.llvm.org/D27802 llvm-svn: 290065
*	[X86][SSE] Add support for combining target shuffles to SHUFPS.	Simon Pilgrim	2016-12-18	1	-2/+108
\| \| \| \| \| \|	As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064
*	[X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations ↵	Craig Topper	2016-12-18	1	-13/+14
\| \| \| \| \| \| \| \| \| \| \| \|	if they are available. This will allow a bunch of patterns to be removed. These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine. For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode. For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now. llvm-svn: 290060
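The reason integer logic ops can stand in for FAND/FOR/FXOR here is that FABS, FNEG, FNABS and FCOPYSIGN only touch the sign bit of the value. The following is a minimal scalar sketch of that identity, assuming IEEE-754 binary32 `float`; the helper names are hypothetical and this is not the LLVM lowering code itself.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as an integer and back. memcpy is used so the
// conversion is well defined (no pointer-cast aliasing tricks).
static std::uint32_t toBits(float F) {
  std::uint32_t U;
  std::memcpy(&U, &F, sizeof U);
  return U;
}

static float fromBits(std::uint32_t U) {
  float F;
  std::memcpy(&F, &U, sizeof F);
  return F;
}

// Assumption: binary32 layout, sign bit in bit 31.
constexpr std::uint32_t SignMask = 0x80000000u;

// FABS: clear the sign bit with an integer AND.
float absViaAnd(float X) { return fromBits(toBits(X) & ~SignMask); }

// FNEG: flip the sign bit with an integer XOR.
float negViaXor(float X) { return fromBits(toBits(X) ^ SignMask); }

// FNABS: force the sign bit on with an integer OR.
float nabsViaOr(float X) { return fromBits(toBits(X) | SignMask); }

// FCOPYSIGN: magnitude bits from Mag, sign bit from Sgn.
float copySignViaAndOr(float Mag, float Sgn) {
  return fromBits((toBits(Mag) & ~SignMask) | (toBits(Sgn) & SignMask));
}
```

Because only bit patterns are manipulated, the result does not depend on whether the operation is carried out in the integer or the floating-point domain, which is consistent with the commit's remark that the execution domain fixing pass can pick the domain afterwards.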
*	[AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when ↵	Craig Topper	2016-12-18	3	-5/+22
\| \| \| \| \| \| \| \|	DQI and VLX instructions are available. This can give the register allocator more registers to use. llvm-svn: 290057
*	[AVX-512] Make sure VLX is also enabled before using EVEX encoded logic ops ↵	Craig Topper	2016-12-18	2	-2/+2
\| \| \| \| \| \|	for scalars. I missed this in r290049. llvm-svn: 290055
*	[AVX-512] Use EVEX encoded logic operations for scalar types when they are ↵	Craig Topper	2016-12-17	2	-1/+38
\| \| \| \| \| \|	available. This gives the register allocator more registers to work with. llvm-svn: 290049
*	Revert "AArch64CollectLOH: Rewrite as block-local analysis."	Matthias Braun	2016-12-17	1	-279/+841
\| \| \| \| \| \| \| \|	It is still breaking Chrome. http://llvm.org/PR31361 This reverts commit r290026. llvm-svn: 290047
*	[Hexagon] Another attempt to fix build with enabled asserts broken in 290024 ↵	Eugene Zelenko	2016-12-17	1	-0/+1
\| \| \| \| \| \|	(NFC). llvm-svn: 290028
*	[Hexagon] Fix build with enabled asserts broken in 290024 (NFC).	Eugene Zelenko	2016-12-17	1	-0/+1
\| \| \| \|	llvm-svn: 290027
*	AArch64CollectLOH: Rewrite as block-local analysis.	Matthias Braun	2016-12-17	1	-841/+279
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Re-apply r288561: Liveness tracking should be correct now after r290014. Previously this pass was using up to 5% compile time in some cases which is a bit much for what it is doing. The pass featured a full blown data-flow analysis which in the default configuration was restricted to a single block. This rewrites the pass under the assumption that we only ever work on a single block. This is done in a single pass maintaining a state machine per general purpose register to catch LOH patterns. Differential Revision: https://reviews.llvm.org/D27329 llvm-svn: 290026
*	[Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; ↵	Eugene Zelenko	2016-12-17	11	-163/+220
\| \| \| \| \| \|	other minor fixes (NFC). llvm-svn: 290024
*	AArch64: Enable post-ra liveness updates	Matthias Braun	2016-12-16	3	-1/+13
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D27559 llvm-svn: 290014
*	Implement LaneBitmask::any(), use it to replace !none(), NFCI	Krzysztof Parzyszek	2016-12-16	5	-11/+11
\| \| \| \|	llvm-svn: 289974
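As an illustration of the kind of change described, here is a minimal sketch of a bitmask wrapper with both predicates. `LaneMaskSketch` and `overlaps` are hypothetical stand-ins written for this example and do not reproduce the real `LaneBitmask` class.

```cpp
#include <cstdint>

// Hypothetical stand-in for a lane bitmask type.
struct LaneMaskSketch {
  std::uint32_t Mask;

  constexpr explicit LaneMaskSketch(std::uint32_t M = 0) : Mask(M) {}

  // True when no lane bit is set.
  constexpr bool none() const { return Mask == 0; }

  // True when at least one lane bit is set; the positive form of !none().
  constexpr bool any() const { return Mask != 0; }

  constexpr LaneMaskSketch operator&(LaneMaskSketch Other) const {
    return LaneMaskSketch(Mask & Other.Mask);
  }
};

// Call sites read more directly with any() than with a negated none().
constexpr bool overlaps(LaneMaskSketch A, LaneMaskSketch B) {
  return (A & B).any(); // previously spelled !(A & B).none()
}
```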
*	[ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.	Eli Friedman	2016-12-16	3	-19/+93
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up. Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable). Recommitting with fix to avoid forming vld1dup if the type of the load doesn't match the type of the vdup (see https://llvm.org/bugs/show_bug.cgi?id=31404). Differential Revision: https://reviews.llvm.org/D27694 llvm-svn: 289972
*	AMDGPU: Fix name for v_ashrrev_i16	Matt Arsenault	2016-12-16	1	-3/+3
\| \| \| \|	llvm-svn: 289967
*	Fix -Wself-assign from r289955	Hans Wennborg	2016-12-16	1	-7/+8
\| \| \| \|	llvm-svn: 289962
*	[X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, ↵	Hans Wennborg	2016-12-16	1	-9/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	-C), COND) (PR31367)

	atomic_load_add returns the value before addition, but sets EFLAGS based on the result of the addition. That means it's setting the flags based on effectively subtracting C from the value at x, which is also what the outer cmp does.

	This targets a pattern that occurs frequently with reference counting pointers:

	  void decrement(long volatile *ptr) {
	    if (_InterlockedDecrement(ptr) == 0)
	      release();
	  }

	Clang would previously compile it (for 32-bit at -Os) as:

	  00000000 <?decrement@@YAXPCJ@Z>:
	     0: 8b 44 24 04        mov    0x4(%esp),%eax
	     4: 31 c9              xor    %ecx,%ecx
	     6: 49                 dec    %ecx
	     7: f0 0f c1 08        lock xadd %ecx,(%eax)
	     b: 83 f9 01           cmp    $0x1,%ecx
	     e: 0f 84 00 00 00 00  je     14 <?decrement@@YAXPCJ@Z+0x14>
	    14: c3                 ret

	and with this patch it becomes:

	  00000000 <?decrement@@YAXPCJ@Z>:
	     0: 8b 44 24 04        mov    0x4(%esp),%eax
	     4: f0 ff 08           lock decl (%eax)
	     7: 0f 84 00 00 00 00  je     d <?decrement@@YAXPCJ@Z+0xd>
	     d: c3                 ret

	(Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add or pre-decrement operator generate the same code.)

	Differential Revision: https://reviews.llvm.org/D27781
	llvm-svn: 289955
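The commit message notes that std::atomic variants of the same pattern are covered as well. Below is a minimal portable sketch of two such variants; `RefCountedSketch`, `decrementFetchSub` and `decrementPreDec` are hypothetical names chosen for this example. Whether a particular compiler and target actually emits `lock dec` for them is not something this sketch can guarantee.

```cpp
#include <atomic>

// Hypothetical reference-counted object used only for illustration.
struct RefCountedSketch {
  std::atomic<long> Refs{1};
  bool Released = false;
  void release() { Released = true; }
};

// fetch_sub returns the value *before* the subtraction, so comparing the
// result with 1 asks whether the counter has just reached zero. This is the
// (cmp (atomic_load_add x, -C), C) shape named in the commit title, with C = 1.
bool decrementFetchSub(RefCountedSketch &Obj) {
  if (Obj.Refs.fetch_sub(1) == 1) {
    Obj.release();
    return true;
  }
  return false;
}

// The pre-decrement operator returns the *new* value, so the same check
// is written as a comparison against zero.
bool decrementPreDec(RefCountedSketch &Obj) {
  if (--Obj.Refs == 0) {
    Obj.release();
    return true;
  }
  return false;
}
```

Both functions only need to know whether the updated counter is zero, which is the information the flags of the locked read-modify-write instruction already carry.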
*	[X86][AVX] Call lowerVectorShuffleWithSHUFPS directly instead of calling ↵	Simon Pilgrim	2016-12-16	1	-3/+4
\| \| \| \| \| \| \| \|	DAG.getVectorShuffle (PR27885) We've already done the hard work of ensuring the mask is safe for 'SHUFPS'. llvm-svn: 289950
*	[X86][AVX512] use a single shufps for 512-bit vectors when it can save ↵	Simon Pilgrim	2016-12-16	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \|	instructions This is the 512-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289946
*	[GlobalISel] Silence unused variable warnings in Release builds.	Benjamin Kramer	2016-12-16	1	-5/+4
\| \| \| \|	llvm-svn: 289941
*	[ARM] GlobalISel: Select add i32, i32	Diana Picus	2016-12-16	7	-9/+299
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add the minimal support necessary to select a function that returns the sum of two i32 values. This includes some support for argument/return lowering of i32 values through registers, as well as the handling of copy and add instructions throughout the GlobalISel pipeline. Differential Revision: https://reviews.llvm.org/D26677 llvm-svn: 289940
*	[X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain.	Simon Pilgrim	2016-12-16	1	-9/+9
\| \| \| \| \| \|	We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead. llvm-svn: 289937
*	[ARM] Expose methods to get the CCAssignFn. NFCI	Diana Picus	2016-12-16	2	-17/+21
\| \| \| \| \| \| \| \| \| \| \|	Add two public methods to ARMTargetLowering: CCAssignFnForCall and CCAssignFnForReturn, which are just calling the already existing private method CCAssignFnForNode. These will come in handy for GlobalISel on ARM. We also replace all calls to CCAssignFnForNode in ARMISelLowering.cpp, because the new methods are friendlier to the reader. llvm-svn: 289932
*	Revert r289638: [PowerPC] Fix logic dealing with nop after calls (and ↵	Chandler Carruth	2016-12-16	1	-25/+40
\| \| \| \| \| \| \| \| \| \| \| \| \|	tail-call eligibility) This patch appears to result in trampolines in vtables being miscompiled when they in turn tail call a method. I've posted some preliminary details about the failure on the thread for this commit and talked to Hal. He was comfortable going ahead and reverting until we sort out what is wrong. llvm-svn: 289928
*	Revert 279703, it caused PR31404.	Nico Weber	2016-12-16	3	-92/+19
\| \| \| \|	llvm-svn: 289923
*	[Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; ↵	Eugene Zelenko	2016-12-16	6	-151/+235
\| \| \| \| \| \|	other minor fixes (NFC). llvm-svn: 289907
*	[AArch64] Add FeatureSlowMisaligned128Store to Exynos M1 and M2	Evandro Menezes	2016-12-16	1	-0/+2
\| \| \| \| \| \| \|	This feature now gates such stores after r289845. Thus the Exynos processors now need this feature. llvm-svn: 289898
*	AMDGPU: Select branch on undef to uniform scc branch	Matt Arsenault	2016-12-15	3	-0/+21
\| \| \| \|	llvm-svn: 289877
*	AMDGPU: Fix asserting on returned tail calls	Matt Arsenault	2016-12-15	1	-2/+4
\| \| \| \|	llvm-svn: 289868
*	AMDGPU: Assembler support for vintrp instructions	Matt Arsenault	2016-12-15	3	-6/+108
\| \| \| \|	llvm-svn: 289866
*	[GlobalISel] Drop workaround for Legalizer member/class sharing a name. NFC.	Ahmed Bougacha	2016-12-15	3	-3/+3
\| \| \| \| \| \| \| \|	MachineLegalizer used to be the name of both the class and the member, causing GCC errors. r276522 fixed that by renaming the member to just 'Legalizer'. The 'class' workaround isn't necessary anymore; drop it. llvm-svn: 289848
*	[x86] use a single shufps for 256-bit vectors when it can save instructions	Sanjay Patel	2016-12-15	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \|	This is the 256-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289846
*	[AArch64] Guard Misaligned 128-bit store penalty by subtarget feature	Matthew Simpson	2016-12-15	1	-1/+2
\| \| \| \| \| \| \| \| \|	This patch checks that the SlowMisaligned128Store subtarget feature is set when penalizing such stores in getMemoryOpCost. Differential Revision: https://reviews.llvm.org/D27677 llvm-svn: 289845
*	[AArch64][GlobalISel] Remove redundant RBI comments. NFC.	Ahmed Bougacha	2016-12-15	1	-20/+1
\| \| \| \| \| \| \|	They're brittle, and Doxygen already picks up the overridden method's comment anyway. llvm-svn: 289844
*	[x86] use a single shufps when it can save instructions	Sanjay Patel	2016-12-15	1	-14/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a tiny patch with a big pile of test changes.

	This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885

	My motivating case looks like this:

	  - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
	  - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
	  - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
	  + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]

	And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win.

	So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining.

	Differential Revision: https://reviews.llvm.org/D27692
	llvm-svn: 289837
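To see why the three-instruction sequence in the motivating case and the single shufps produce the same lanes, the shuffles can be modelled on plain four-element arrays. This is a sketch of the lane semantics only, under the assumption that each 32-bit lane is treated as an opaque value; the function names are hypothetical and nothing here is LLVM code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;
using Idx4 = std::array<std::size_t, 4>;

// pshufd model: every result lane selects one lane of the single source.
V4 pshufdModel(const V4 &Src, const Idx4 &Idx) {
  return V4{{Src[Idx[0]], Src[Idx[1]], Src[Idx[2]], Src[Idx[3]]}};
}

// pblendw model for the mask xmm0[0,1,2,3],xmm1[4,5,6,7]: 16-bit words 0-3
// (32-bit lanes 0-1) come from A, words 4-7 (32-bit lanes 2-3) come from B.
V4 blendLowAHighB(const V4 &A, const V4 &B) {
  return V4{{A[0], A[1], B[2], B[3]}};
}

// shufps model: the two low result lanes select from A, the two high from B.
V4 shufpsModel(const V4 &A, const V4 &B, const Idx4 &Idx) {
  return V4{{A[Idx[0]], A[Idx[1]], B[Idx[2]], B[Idx[3]]}};
}

// The sequence removed in the diff: two pshufd followed by a pblendw.
V4 threeInstructionForm(const V4 &Xmm0, const V4 &Xmm1) {
  V4 T1 = pshufdModel(Xmm1, Idx4{{0, 1, 0, 2}});
  V4 T0 = pshufdModel(Xmm0, Idx4{{0, 2, 2, 3}});
  return blendLowAHighB(T0, T1);
}

// The replacement: xmm0 = xmm0[0,2],xmm1[0,2].
V4 singleShufps(const V4 &Xmm0, const V4 &Xmm1) {
  return shufpsModel(Xmm0, Xmm1, Idx4{{0, 2, 0, 2}});
}
```

Both forms yield lanes 0 and 2 of the first input followed by lanes 0 and 2 of the second, so the model agrees with the diff shown in the commit message.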
*	[X86][SSE] Fix domains for scalar store instructions	Simon Pilgrim	2016-12-15	1	-0/+4
\| \| \| \| \| \|	As discussed on D27692 llvm-svn: 289834
*	[lanai] Simplify small section check in LowerGlobalAddress and treat ldata ↵	Jacques Pienaar	2016-12-15	2	-3/+14
\| \| \| \| \| \| \| \|	sections specially. Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation). llvm-svn: 289832
*	[X86][AVX512] Moved instruction domain lookups to the right table. NFCI.	Simon Pilgrim	2016-12-15	1	-4/+4
\| \| \| \| \| \|	Avoid duplicating instructions in the int32/int64 domains. llvm-svn: 289830
*	[X86][SSE] Fix domains for VZEXT_LOAD type instructions	Simon Pilgrim	2016-12-15	1	-0/+6
\| \| \| \| \| \| \| \|	Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825
*	Fix for regression after Global Load Scalarization patch	Alexander Timofeev	2016-12-15	1	-1/+2
\| \| \| \|	llvm-svn: 289822