path: root/llvm/test/CodeGen
Commit message (Author, Date; Files changed, Lines -/+)
* AMDGPU: Fix folding immediates into mac src2 (Matt Arsenault, 2017-01-11; 1 file, -0/+66)
  Whether it is legal or not needs to be checked against the instruction it will be
  replaced with.
  llvm-svn: 291711
* Revert "CodeGen: Allow small copyable blocks to "break" the CFG."Kyle Butt2017-01-1157-420/+205
| | | | | | | | | This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded. This needs a simple probability check because there are some cases where it is not profitable. llvm-svn: 291695
* [ARM] More aggressive matching for vpadd and vpaddl. (Eli Friedman, 2017-01-11; 2 files, -18/+234)
  The new matchers work after legalization to make them simpler, and to avoid blocking
  other optimizations.
  Differential Revision: https://reviews.llvm.org/D27779
  llvm-svn: 291693
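  For orientation, a pairwise widening add is roughly the kind of IR shape that maps to
  vpaddl; this is an illustrative sketch (value names, types, and the exact pattern the
  new matchers accept are assumptions, not taken from the commit's tests):

    define <8 x i16> @pairwise_widen_add(<16 x i8> %a) {
      ; gather the even and odd lanes, widen them, and add the pairs
      %even = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
      %odd = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
      %evenw = zext <8 x i8> %even to <8 x i16>
      %oddw = zext <8 x i8> %odd to <8 x i16>
      %sum = add <8 x i16> %evenw, %oddw
      ret <8 x i16> %sum
    }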
* [X86][XOP] Add vpermil2ps target shuffle -> insertps combine test (Simon Pilgrim, 2017-01-11; 1 file, -0/+14)
  llvm-svn: 291690
* [ARM] Fix test CodeGen/ARM/fpcmp_ueq.ll broken by rL290616 (Evgeny Astigeevich, 2017-01-11; 1 file, -1/+5)
  Commit rL290616 (https://reviews.llvm.org/rL290616) changed a checking command for the
  triple arm-apple-darwin in LLVM::CodeGen/ARM/fpcmp_ueq.ll. As a result of the changes,
  the test could fail for valid generated code. This change fixes the test to check only
  the instructions we would expect.
  Differential Revision: https://reviews.llvm.org/D28159
  llvm-svn: 291678
* X86 CodeGen: Optimized pattern for truncate with unsigned saturation. (Elena Demikhovsky, 2017-01-11; 2 files, -0/+231)
  DAG pattern optimization: truncate + unsigned saturation is supported by the VPMOVUS*
  instructions in AVX-512, and by the VPACKUS* instructions on SSE* targets.
  Differential Revision: https://reviews.llvm.org/D28216
  llvm-svn: 291670
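  The IR shape in question is "clamp to the maximum of the narrower type, then truncate";
  a minimal sketch (value names and vector widths are illustrative, not the commit's
  test code):

    define <8 x i16> @trunc_usat(<8 x i32> %x) {
      ; unsigned saturation to 16 bits: min(x, 65535), then truncate
      %cmp = icmp ult <8 x i32> %x, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
      %min = select <8 x i1> %cmp, <8 x i32> %x, <8 x i32> <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
      %t = trunc <8 x i32> %min to <8 x i16>
      ret <8 x i16> %t
    }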
* [X86][AVX512BW] Vectorize v64i8 vector shifts (Simon Pilgrim, 2017-01-11; 3 files, -3096/+174)
  Differential Revision: https://reviews.llvm.org/D28447
  llvm-svn: 291665
* [X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d} (Elad Cohen, 2017-01-11; 4 files, -9/+108)
  The code emitted by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and
  (v)cvtss2sd is lowered to a code sequence that includes redundant (v)movss/(v)movsd
  instructions. This patch adds patterns for optimizing these sequences.
  Differential Revision: https://reviews.llvm.org/D28455
  llvm-svn: 291660
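  These intrinsics boil down to a scalar conversion inserted into lane 0 of a vector;
  a minimal sketch of that shape (names and types are illustrative assumptions, not the
  commit's test code):

    define <4 x float> @cvtsi2ss_like(<4 x float> %v, i32 %i) {
      ; convert the scalar and place it in the low element of %v
      %c = sitofp i32 %i to float
      %r = insertelement <4 x float> %v, float %c, i32 0
      ret <4 x float> %r
    }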
* Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent." (Craig Topper, 2017-01-11; 3 files, -339/+560)
  Some test appears to be hanging on the build bots.
  llvm-svn: 291650
* [DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent. (Craig Topper, 2017-01-11; 3 files, -560/+339)
  llvm-svn: 291645
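  In IR terms the combine says: selecting on an inverted mask is the same as selecting
  on the original mask with the operands swapped. An illustrative sketch (the actual
  transform operates on SelectionDAG nodes, not on IR):

    define <4 x i32> @sel_not(<4 x i1> %c, <4 x i32> %a, <4 x i32> %b) {
      %notc = xor <4 x i1> %c, <i1 true, i1 true, i1 true, i1 true>
      %r = select <4 x i1> %notc, <4 x i32> %a, <4 x i32> %b
      ; equivalent to: select <4 x i1> %c, <4 x i32> %b, <4 x i32> %a
      ret <4 x i32> %r
    }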
* DAGCombiner: Add hasOneUse checks to fadd/fma combine (Matt Arsenault, 2017-01-11; 1 file, -0/+262)
  Even with aggressive fusion enabled, this requires duplicating the fmul, or increases
  an fadd to another fma which is not an improvement.
  llvm-svn: 291642
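  A hypothetical case illustrating the multi-use situation (not the commit's test): when
  the fmul feeding an fadd has another user, fusing the pair into an fma does not make
  the fmul go away, so the combine is not a win.

    define float @multi_use(float %a, float %b, float %c, float %d) {
      %m = fmul fast float %a, %b
      ; fusing %r1 into an fma still leaves %m live for %r2
      %r1 = fadd fast float %m, %c
      %r2 = fadd fast float %m, %d
      %s = fadd fast float %r1, %r2
      ret float %s
    }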
* Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367) (Hans Wennborg, 2017-01-11; 1 file, -0/+64)
  This was reverted because it would miscompile code where the cmp had multiple uses.
  That was due to a deficiency in the existing code, which was fixed in r291630 (see the
  PR for details).
  This re-commit includes an extra test for the kind of code that got miscompiled:
  @test_sub_1_setcc_jcc.
  llvm-svn: 291640
* [X86] Dont run combineSetCCAtomicArith() when the cmp has multiple uses (Hans Wennborg, 2017-01-11; 1 file, -0/+16)
  We would miscompile the following:

    void g(int);
    int f(volatile long long *p) {
      bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0;
      g(b ? 12 : 34);
      return b ? 56 : 78;
    }

  into

    pushq   %rax
    lock    incq (%rdi)
    movl    $12, %eax
    movl    $34, %edi
    cmovlel %eax, %edi
    callq   g(int)
    testq   %rax, %rax   <---- Bad.
    movl    $56, %ecx
    movl    $78, %eax
    cmovsl  %ecx, %eax
    popq    %rcx
    retq

  because the code failed to take into account that the cmp has multiple uses, replaced
  one of them, and left the other one comparing garbage.
  llvm-svn: 291630
* AMDGPU/EG,CM: Add fp16 conversion instructions (Jan Vesely, 2017-01-11; 4 files, -35/+49)
  Differential Revision: https://reviews.llvm.org/D28164
  llvm-svn: 291622
* [TM] Restore default TargetOptions in TargetMachine::resetTargetOptions. (Justin Lebar, 2017-01-10; 3 files, -2/+125)
  Summary:
  Previously if you had
    * a function with the fast-math-enabled attr, followed by
    * a function without the fast-math attr,
  the second function would inherit the first function's fast-math-ness.

  This means that mixing fast-math and non-fast-math functions in a module was completely
  broken unless you explicitly annotated every non-fast-math function with
  "unsafe-fp-math"="false". This appears to have been broken since r176986 (March 2013),
  when the resetTargetOptions function was introduced.

  This patch tests the correct behavior as best we can. I don't think I can test
  FPDenormalMode and NoTrappingFPMath, because they aren't used in any backends during
  function lowering. Surprisingly, I also can't find any uses at all of LessPreciseFPMAD
  affecting generated code.

  The NVPTX/fast-math.ll test changes are an expected result of fixing this bug. When FMA
  is disabled, we emit add as "add.rn.f32", which prevents fma combining. Before this
  patch, fast-math was enabled in all functions following the one which explicitly
  enabled it on itself, so we were emitting plain "add.f32" where we should have
  generated "add.rn.f32".

  Reviewers: mkuper
  Subscribers: hfinkel, majnemer, jholewinski, nemanjai, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28507
  llvm-svn: 291618
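  A minimal sketch of the kind of module affected (the function bodies are illustrative;
  the attribute spelling follows LLVM's string function attributes): the second function
  must not inherit the first one's unsafe-fp-math setting just because it is compiled
  later.

    define float @fast_fn(float %x, float %y) #0 {
      %r = fadd float %x, %y
      ret float %r
    }

    define float @strict_fn(float %x, float %y) {
      ; must be lowered without unsafe-fp-math, even though it follows @fast_fn
      %r = fadd float %x, %y
      ret float %r
    }

    attributes #0 = { "unsafe-fp-math"="true" }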
* [NVPTX] Add CHECK-LABEL where appropriate to fast-math.ll test. (Justin Lebar, 2017-01-10; 1 file, -9/+4)
  Also fix up whitespace. Test-only change.
  llvm-svn: 291617
* AMDGPU: Constant fold when immediate is materialized (Matt Arsenault, 2017-01-10; 1 file, -0/+858)
  In future commits these patterns will appear after moveToVALU changes.
  llvm-svn: 291615
* CodeGen: Allow small copyable blocks to "break" the CFG. (Kyle Butt, 2017-01-10; 57 files, -205/+420)
  When choosing the best successor for a block, ordinarily we would have preferred a
  block that preserves the CFG unless there is a strong probability in the other
  direction. For small blocks that can be duplicated we now skip that requirement as
  well.
  Differential Revision: https://reviews.llvm.org/D27742
  llvm-svn: 291609
* Make the test accept different OpCode values since it doesn't really care about the value. (Douglas Yung, 2017-01-10; 1 file, -1/+1)
  Differential Revision: https://reviews.llvm.org/D28487
  llvm-svn: 291605
* DAG: Avoid OOB when legalizing vector indexing (Matt Arsenault, 2017-01-10; 16 files, -654/+999)
  If a vector index is out of bounds, the result is supposed to be undefined but is not
  undefined behavior. Change the legalization for indexing the vector on the stack so
  that an out of bounds index does not create an out of bounds memory access.
  llvm-svn: 291604
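  The case in question is a variable-index element access; a minimal sketch (illustrative,
  not one of the commit's tests): when the vector is spilled to a stack slot to perform
  the dynamic indexing, the index must be clamped so the load stays inside the slot.

    define i32 @extract_var(<4 x i32> %v, i32 %idx) {
      ; %idx may be >= 4; the result is then undefined, but the lowering must not
      ; read past the stack slot that holds %v
      %e = extractelement <4 x i32> %v, i32 %idx
      ret i32 %e
    }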
* [WebAssembly] Only RAUW a constant once in FixFunctionBitcasts (Derek Schuff, 2017-01-10; 1 file, -0/+16)
  When we collect 2 uses of a function in FindUses and then RAUW when we visit the first,
  we end up visiting the wrapper (because the second was RAUW'd). We still want to use
  RAUW instead of just Use->set() because it has special handling for Constants, so this
  patch just ensures that only one use of each constant is added to the work list.
  Differential Revision: https://reviews.llvm.org/D28504
  llvm-svn: 291603
* AMDGPU: Add tests for HasMultipleConditionRegisters (Matt Arsenault, 2017-01-10; 1 file, -0/+161)
  This was enabled without many specific tests or the comment.
  llvm-svn: 291586
* [X86][AVX512] Improving shuffle lowering by using AVX-512 EXPAND* instructions (Michael Zuckerman, 2017-01-10; 1 file, -0/+333)
  This patch fixes PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351
  1. This patch adds a new type of shuffle lowering.
  2. We can use the expand instruction when the shuffle pattern takes source elements
     a[0], a[1], ..., a[n] in ascending order, with any number of zero elements (possibly
     none) in front of each of them.
  Reviewers: igorb, guyblank, craig.topper, RKSimon
  Differential Revision: https://reviews.llvm.org/D28352
  llvm-svn: 291584
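  A sketch of such a shuffle (illustrative values; whether this exact mask is one the
  patch handles is an assumption): the low four source elements are scattered into every
  other result lane with zeros in between, which is the shape a masked expand produces.

    define <8 x float> @expand_like(<8 x float> %a) {
      ; result = [a0, 0, a1, 0, a2, 0, a3, 0]; lane index 8 selects from the zero vector
      %r = shufflevector <8 x float> %a, <8 x float> zeroinitializer, <8 x i32> <i32 0, i32 8, i32 1, i32 8, i32 2, i32 8, i32 3, i32 8>
      ret <8 x float> %r
    }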
* [AArch64] Add support for lowering bitreverse to the rbit instruction. (Chad Rosier, 2017-01-10; 2 files, -24/+33)
  Differential Revision: https://reviews.llvm.org/D28379
  llvm-svn: 291575
* [mips] Fix Mips MSA intrinsics (Simon Dardis, 2017-01-10; 2 files, -0/+2957)
  The usage of some MIPS MSA intrinsics that took immediates could crash LLVM during
  lowering. This patch addresses that behaviour. Crucially, this patch also makes the use
  of intrinsics with out-of-range immediates produce an internal error.
  The ld/st intrinsics would trigger an assertion failure for MIPS64 as their lowering
  would attempt to add an i32 offset to an i64 pointer.
  Reviewers: vkalintiris, slthakur
  Differential Revision: https://reviews.llvm.org/D25438
  llvm-svn: 291571
* [mips] Honour -mno-odd-spreg for vector splat (again) (Simon Dardis, 2017-01-10; 1 file, -0/+55)
  Previously the lowering of FILL_FW would use the MSA128W register class when performing
  a vector splat. Instead it should honour -mno-odd-spreg and only use the even registers
  when performing a splat from word to vector register.
  Logical follow-on from r230235. This fixes PR31369.
  A previous commit was missing the test case and had another differential in it.
  Reviewers: slthakur
  Differential Revision: https://reviews.llvm.org/D28373
  llvm-svn: 291566
* AMD family 17h (znver1) enablement (Craig Topper, 2017-01-10; 4 files, -0/+5)
  Summary:
  This patch enables the following:
  1. AMD family 17h architecture using the "znver1" tune flag (-march, -mcpu).
  2. ISAs that are enabled for the "znver1" architecture.
  3. Checking the ADX ISA from cpuid to identify the "znver1" flag when -march=native is used.
  4. The FMA4 and XOP ISAs are disabled as they are dropped from amdfam17.
  5. For the time being, it uses the btver2 scheduler model.
  6. The test file is updated to check this flag.
  This item is linked to the clang review item https://reviews.llvm.org/D28018
  Patch by Ganesh Gopalasubramanian
  Reviewers: RKSimon, craig.topper
  Subscribers: vprasad, RKSimon, ashutosh.nema, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28017
  llvm-svn: 291543
* [X86][AVX512VL] Added AVX512VL to 128/256 bit vector shift tests (Simon Pilgrim, 2017-01-09; 7 files, -1/+842)
  llvm-svn: 291488
* PeepholeOptimizer: Do not replace SubregToReg(bitcast like) (Matthias Braun, 2017-01-09; 1 file, -0/+40)
  While we can usually replace bitcast-like instructions (MachineInstr::isBitcast()) with
  a COPY, this is not legal if any of the users uses SUBREG_TO_REG to assert that the
  upper bits of the result are zero.
  Differential Revision: https://reviews.llvm.org/D28474
  llvm-svn: 291483
* Revert r291092 because it introduces a crash. (Michael Kuperstein, 2017-01-09; 1 file, -107/+0)
  See PR31589 for details.
  llvm-svn: 291478
* X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB. (Vyacheslav Klochkov, 2017-01-09; 1 file, -0/+129)
  Differential Revision: https://reviews.llvm.org/D28087
  llvm-svn: 291473
* AMDGPU: Add Assert[SZ]Ext during argument load creation (Matt Arsenault, 2017-01-09; 1 file, -75/+97)
  For i16 zeroext arguments when i16 was a legal type, the known bits information from
  the truncate was lost. Insert a zeroext so the known bits optimizations work with the
  32-bit loads.
  Fixes code quality regressions vs. SI in min.ll test.
  llvm-svn: 291461
* [X86][AVX512] Enable v16i8/v32i8 vector shifts to use an extend+shift+truncate pattern. (Simon Pilgrim, 2017-01-09; 6 files, -318/+279)
  Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on
  AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets.
  Cost model updates to follow.
  llvm-svn: 291451
* [X86][AVX512DQ] Enable v16i16 vector shifts to use an extend+shift+truncate pattern. (Simon Pilgrim, 2017-01-09; 6 files, -145/+56)
  Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW
  will already have lowered with vpsravw).
  Cost model updates to follow.
  llvm-svn: 291445
* [X86][AVX512DQ] Added AVX512DQ to 128/256 bit vector shift tests (Simon Pilgrim, 2017-01-09; 6 files, -84/+215)
  llvm-svn: 291444
* [SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486. (Bjorn Pettersson, 2017-01-09; 1 file, -0/+16)
  Summary:
  Originally
    i64 = umax t8, Constant:i64<4>
  was expanded into
    i32,i32 = umax Constant:i32<0>, Constant:i32<0>
    i32,i32 = umax t7, Constant:i32<4>
  Now instead the two produced umax:es return i32 instead of i32,i32.
  Thanks to Jan Vesely for help with the test case.
  Patch by mikael.holmen at ericsson.com
  Reviewers: bogner, jvesely, tstellarAMD, arsenm
  Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28135
  llvm-svn: 291441
* [AVX-512] Change another pattern that was using BLENDM to use masked moves. (Craig Topper, 2017-01-09; 2 files, -23/+23)
  A future patch will convert it back to BLENDM if it's beneficial to register allocation.
  llvm-svn: 291419
* [AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros. (Craig Topper, 2017-01-09; 11 files, -217/+153)
  Previously we emitted a VPTERNLOG and a separate masked move.
  llvm-svn: 291415
* [AVX-512] If avx512dq is available use vpmovm2d/vpmovm2q instead of vselect of zeroes/ones when handling sign extends of i1 without VLX. (Craig Topper, 2017-01-08; 1 file, -28/+88)
  llvm-svn: 291402
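  The source pattern is a sign extension of a vXi1 mask; a minimal sketch (the vector
  width is chosen for illustration): with avx512dq the mask register can be broadcast to
  all bits of each element directly, instead of selecting between all-ones and all-zeros
  vectors.

    define <8 x i32> @sext_mask(<8 x i32> %a, <8 x i32> %b) {
      %m = icmp sgt <8 x i32> %a, %b
      ; candidate for vpmovm2d: each i1 becomes all-ones or all-zeros in a 32-bit lane
      %r = sext <8 x i1> %m to <8 x i32>
      ret <8 x i32> %r
    }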
* [X86] Add avx512bw and avx512dq command lines to the vector compare results test. (Craig Topper, 2017-01-08; 1 file, -1498/+4602)
  This is preparation for improving a case with avx512dq.
  llvm-svn: 291401
* [x86] fix usage of stale operands when lowering select (Sanjay Patel, 2017-01-08; 1 file, -5/+1)
  I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR.
  The debug output shows nodes like this:
    t4: i32 = xor t2, Constant:i32<-1>
    t21: i8 = setcc t4, Constant:i32<0>, setlt:ch
    t14: i32 = select t21, t4, Constant:i32<-1>
  And because the select is holding onto the t4 (xor) node while EmitTest creates a new
  x86-specific xor node, the lowering results in:
    t4: i32 = xor t2, Constant:i32<-1>
    t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1>
    t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1
  Differential Revision: https://reviews.llvm.org/D28374
  llvm-svn: 291392
* [AVR] Implement TargetLowering::getRegisterByName (Dylan McKay, 2017-01-07; 1 file, -0/+17)
  This allows the use of the 'read_register' intrinsics used by clang's named register
  globals feature.
  llvm-svn: 291375
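  For reference, the intrinsic in question reads a named physical register. A rough
  sketch of its use (the register name "sp" and the i16 width here are illustrative
  assumptions, not taken from the AVR commit):

    declare i16 @llvm.read_register.i16(metadata)

    define i16 @read_sp() {
      ; reads the register named by metadata !0
      %sp = call i16 @llvm.read_register.i16(metadata !0)
      ret i16 %sp
    }

    !0 = !{!"sp"}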
* [AVX-512] Remove patterns from the other VBLENDM instructions. They are all redundant with masked move instructions. (Craig Topper, 2017-01-07; 12 files, -172/+300)
  We should probably teach the two address instruction pass to turn masked moves into
  BLENDM when it's beneficial to the register allocator.
  llvm-svn: 291371
* [X86] Regenerate a test to remove tab characters. (Craig Topper, 2017-01-07; 1 file, -4/+4)
  llvm-svn: 291370
* [AVX-512] Add masked forms of the alternate MOVDDUP patterns. (Craig Topper, 2017-01-07; 1 file, -0/+30)
  I'm not too sure how to get isel to select even all of the unmasked forms, but at least
  we have a consistent set now.
  llvm-svn: 291368
* [X86][AVX2] Regenerate arithmetic tests (Simon Pilgrim, 2017-01-07; 1 file, -5/+96)
  Fixed missing checks for tests that used a '-' in the name, which was messing with
  update_llc_test_checks.py
  llvm-svn: 291363
* [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470) (Simon Pilgrim, 2017-01-07; 1 file, -7/+2)
  llvm-svn: 291347
* [WebAssembly] Don't abort on code with UB. (Dan Gohman, 2017-01-07; 2 files, -1/+27)
  Gracefully leave code that performs function-pointer bitcasts implying non-trivial
  pointer conversions alone, rather than aborting, since it's just undefined behavior.
  llvm-svn: 291326
* [WebAssembly] Add a pass to create wrappers for function bitcasts. (Dan Gohman, 2017-01-07; 1 file, -0/+56)
  WebAssembly requires caller and callee signatures to match exactly. In LLVM, there are
  a variety of circumstances where signatures may be mismatched in practice, and one can
  bitcast a function address to another type to call it as that type.
  This patch adds a pass which replaces bitcasted function addresses with wrappers to
  replace the bitcasts. This doesn't catch everything, but it does match many common
  cases.
  llvm-svn: 291315
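  A sketch of the situation being handled (hypothetical functions, written in the typed-
  pointer IR of that era): the caller bitcasts @callee to a different signature, and the
  pass routes such calls through a wrapper whose signature matches the call site.

    define void @callee(i32 %x) {
      ret void
    }

    define void @caller() {
      ; called with no arguments even though @callee takes an i32
      call void bitcast (void (i32)* @callee to void ()*)()
      ret void
    }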
* AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes (Jan Vesely, 2017-01-06; 3 files, -209/+1003)
  This will make transition to SCRATCH_MEMORY easier.
  Differential Revision: https://reviews.llvm.org/D24746
  llvm-svn: 291279