bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[AVX-512] Remove unmasked BLENDM instructions from the wrong load folding table.	Craig Topper	2017-01-13	1	-4/+0
	The unmasked versions read memory from operand 2, but were in the operand 3 table. These aren't the most interesting set of blendm instructions as the unmasked version isn't useful. We were also missing the B and W forms. I'll add the masked versions of all sizes in a future patch.
	llvm-svn: 291885
*	[X86] Move some entries in the load folding tables to more appropriate grouping. NFC	Craig Topper	2017-01-13	1	-10/+10
	llvm-svn: 291884
*	[PowerPC] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).	Eugene Zelenko	2017-01-13	10	-255/+433
	llvm-svn: 291872
*	[X86] Replace AND+IMM64 with SRL/SHL	Nikolai Bozhenov	2017-01-12	1	-7/+54
	Emit SHRQ/SHLQ instead of ANDQ with a 64 bit constant mask if the result is unused and the mask has only higher/lower bits set. For example, with this patch LLVM emits
		shrq $41, %rdi
		je
	instead of
		movabsq $0xFFFFFE0000000000, %rcx
		testq %rcx, %rdi
		je
	This reduces number of instructions, code size and register pressure. The transformation is applied only for cases where the mask cannot be encoded as an immediate value within TESTQ instruction.
	Differential Revision: https://reviews.llvm.org/D28198
	llvm-svn: 291806
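The mask condition described in this commit can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the LLVM code: `shift_for_mask` and `fits_simm32` are hypothetical names, and the "fits in a TESTQ immediate" check assumes the usual sign-extended 32-bit immediate encoding.

```python
MASK64 = (1 << 64) - 1


def fits_simm32(mask):
    """True if the 64-bit pattern equals a sign-extended 32-bit immediate."""
    signed = mask - (1 << 64) if mask >> 63 else mask
    return -(1 << 31) <= signed < (1 << 31)


def shift_for_mask(mask):
    """Return ("shr", n) or ("shl", n) if testing x & mask against zero can be
    replaced by a single shift, else None (hypothetical helper)."""
    mask &= MASK64
    if mask == 0 or fits_simm32(mask):
        return None  # a plain TESTQ with an immediate is already good enough
    trailing_zeros = (mask & -mask).bit_length() - 1
    if (mask >> trailing_zeros) == (MASK64 >> trailing_zeros):
        return ("shr", trailing_zeros)  # only the higher bits are set
    leading_zeros = 64 - mask.bit_length()
    if mask == (MASK64 >> leading_zeros):
        return ("shl", leading_zeros)  # only the lower bits are set
    return None


print(shift_for_mask(0xFFFFFE0000000000))
```

For the mask in the commit message the sketch yields a right shift by 41, matching the `shrq $41, %rdi` shown above.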
*	[X86] Tune bypassing of slow division for Intel CPUs	Nikolai Bozhenov	2017-01-12	3	-5/+6
	64-bit integer division in Intel CPUs is extremely slow, much slower than 32-bit division. On the other hand, 8-bit and 16-bit divisions aren't any faster. The only important exception is Atom where DIV8 is fastest. Because of that, the patch
	1) Enables bypassing of 64-bit division for Atom, Silvermont and all big cores.
	2) Modifies 64-bit bypassing to use 32-bit division instead of 16-bit one. This doesn't make the shorter division slower but increases chances of taking it. Moreover, it's much more likely to prove at compile-time that a value fits 32 bits and doesn't require a run-time check (e.g. zext i32 to i64).
	Differential Revision: https://reviews.llvm.org/D28196
	llvm-svn: 291800
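A rough model of the bypass, for unsigned operands only: if neither operand has any of the upper 32 bits set, the narrow division gives the same result as the wide one. The function name and the returned path label are made up for this sketch; the real transformation operates on IR and emits a run-time check only when the width cannot be proven at compile time.

```python
def udiv64_with_bypass(dividend, divisor):
    """Return (quotient, remainder, path) for unsigned 64-bit operands."""
    if not (0 <= dividend < 1 << 64 and 0 < divisor < 1 << 64):
        raise ValueError("operands must be unsigned 64-bit, divisor non-zero")
    # One OR plus one test covers both operands, as in a run-time guard.
    if (dividend | divisor) >> 32 == 0:
        lo_dividend = dividend & 0xFFFFFFFF
        lo_divisor = divisor & 0xFFFFFFFF
        return lo_dividend // lo_divisor, lo_dividend % lo_divisor, "div32"
    return dividend // divisor, dividend % divisor, "div64"


print(udiv64_with_bypass(100, 7))
print(udiv64_with_bypass(1 << 40, 3))
```

Using a 32-bit rather than a 16-bit narrow path widens the range of operands that take the fast path, which is the point made in item 2 of the commit message.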
*	AMDGPU: Skip fneg/select combine if it can fold into other	Matt Arsenault	2017-01-12	1	-29/+40
	llvm-svn: 291792
*	AMDGPU: Fold free fneg into sin	Matt Arsenault	2017-01-12	1	-1/+5
	llvm-svn: 291790
*	ARM: slightly more table driven libcall setup	Saleem Abdulrasool	2017-01-12	1	-26/+59
	Switch some additional library call setup to be table driven. This makes it more immediately obvious what the library call looks like. This is important for ARM since the calling conventions for the builtins change based on the target/libcall name. NFC
	llvm-svn: 291789
*	AMDGPU: Fold fneg into fmul_legacy	Matt Arsenault	2017-01-12	1	-2/+5
	llvm-svn: 291784
*	AMDGPU: Fold fneg into rcp	Matt Arsenault	2017-01-12	1	-1/+7
	llvm-svn: 291779
*	AMDGPU: Fold fneg into fp_round	Matt Arsenault	2017-01-12	1	-2/+18
	llvm-svn: 291778
*	AMDGPU: Fold fneg into fp_extend	Matt Arsenault	2017-01-12	1	-0/+14
	llvm-svn: 291777
*	[globalisel] Move as much RegisterBank initialization to the constructor as possible	Daniel Sanders	2017-01-12	3	-21/+20
	Summary: The register bank is now entirely initialized in the constructor. However, we still have the hardcoded number of register classes which will be dealt with in the TableGen patch (D27338) since we do not have access to this information to resolve this at this stage. The number of register classes is known to the TRI and to TableGen but the RegisterBank constructor is too early for the former and too late for the latter. This will be fixed when the data is tablegen-erated.
	Reviewers: t.p.northover, ab, rovka, qcolombet
	Subscribers: aditya_nandakumar, kristof.beyls, vkalintiris, llvm-commits, dberris
	Differential Revision: https://reviews.llvm.org/D27809
	llvm-svn: 291770
*	[globalisel] Initialize RegisterBanks with static data.	Daniel Sanders	2017-01-12	3	-12/+154
	Summary: Refactor the RegisterBank initialization to use static data. This requires GlobalISel implementations to rewrite calls to createRegisterBank() and addRegBankCoverage() into a call to setRegBankData(). Out of tree targets can use diff 4 of D27807 (https://reviews.llvm.org/D27807?id=84117) to have addRegBankCoverage() dump the register classes and other data that needs to be provided to setRegBankData(). This is the method that was used to generate the static data in this patch. Tablegen-eration of this static data will follow after some refactoring.
	Reviewers: t.p.northover, ab, rovka, qcolombet
	Subscribers: aditya_nandakumar, kristof.beyls, vkalintiris, llvm-commits, dberris
	Differential Revision: https://reviews.llvm.org/D27807
	Differential Revision: https://reviews.llvm.org/D27808
	llvm-svn: 291768
*	AMDGPU: Fix sub_oneuse being marked commutative	Matt Arsenault	2017-01-12	1	-1/+2
	llvm-svn: 291748
*	[AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support.	Craig Topper	2017-01-12	1	-4/+4
	llvm-svn: 291747
*	[AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq.	Craig Topper	2017-01-12	1	-11/+13
	llvm-svn: 291746
*	[X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 mask	Elad Cohen	2017-01-12	1	-5/+8
	r289653 added a case where `vselect <cond> <vector1> <all-zeros>` is transformed to: `vselect xor(cond, DAG.getConstant(1, DL, CondVT)) <all-zeros> <vector1>`. This was not aimed to catch cases where Cond is not a vXi1 mask but it does. Moreover, when Cond type is vXiN (N > 1) then xor(cond, DAG.getConstant(1, DL, CondVT)) != NOT(cond). This patch changes the above to xor with allones, and avoids entering the case for non-mask Conds.
	llvm-svn: 291745
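The arithmetic behind this fix is easy to check on a single lane. The helpers below are hypothetical and model one vector element as a Python integer of the given bit width: xor with the constant 1 is a logical NOT only for 1-bit lanes, while xor with all-ones is a NOT for any width.

```python
def lane_not(cond, bits):
    """Bitwise NOT of one lane that is `bits` wide."""
    return ~cond & ((1 << bits) - 1)


def lane_xor(cond, const, bits):
    """XOR of one lane with a constant, truncated to `bits`."""
    return (cond ^ const) & ((1 << bits) - 1)


# i1 lane: xor with 1 really is NOT.
print(lane_xor(1, 1, 1) == lane_not(1, 1))
# i32 lane holding all-ones ("true"): xor with 1 only flips the low bit.
print(hex(lane_xor(0xFFFFFFFF, 1, 32)), hex(lane_not(0xFFFFFFFF, 32)))
# xor with all-ones is NOT at any width.
print(lane_xor(0xFFFFFFFF, 0xFFFFFFFF, 32) == lane_not(0xFFFFFFFF, 32))
```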
*	AMDGPU: Fold fneg into fma or fmad	Matt Arsenault	2017-01-12	1	-0/+24
	Patch mostly by Fiona Glaser
	llvm-svn: 291733
*	AMDGPU: Fold fneg into fmul	Matt Arsenault	2017-01-12	1	-0/+17
	Patch mostly by Fiona Glaser
	llvm-svn: 291732
*	AMDGPU: Fold fneg into fadd	Matt Arsenault	2017-01-12	2	-0/+61
	Patch mostly by Fiona Glaser
	llvm-svn: 291731
*	AMDGPU: Pull fneg/fabs out of a select	Matt Arsenault	2017-01-11	1	-0/+74
	Allows better source modifier usage.
	llvm-svn: 291729
*	X86: Remove dead code. NFC.	Peter Collingbourne	2017-01-11	1	-10/+0
	llvm-svn: 291721
*	AMDGPU: Fix shrinking of addc/subb.	Matt Arsenault	2017-01-11	1	-7/+25
	To shrink to VOP2 the input carry must also be VCC.
	llvm-svn: 291720
*	AMDGPU: Fix sext_inreg for i1 in i16	Matt Arsenault	2017-01-11	1	-0/+5
	This produces worse code when i16 is legal, mostly due to combines getting confused by conversions inserted for uniform 16-bit operations.
	llvm-svn: 291717
*	AMDGPU: Fix breaking VOP3 v_add_i32s	Matt Arsenault	2017-01-11	1	-1/+11
	This was shrinking the instruction even though the carry output register was a virtual register, not known VCC.
	llvm-svn: 291716
*	AMDGPU: Fix folding immediates into mac src2	Matt Arsenault	2017-01-11	1	-2/+30
	The legality check needs to be made against the instruction it will be replaced with.
	llvm-svn: 291711
*	[ARM] More aggressive matching for vpadd and vpaddl.	Eli Friedman	2017-01-11	1	-4/+104
	The new matchers work after legalization to make them simpler, and to avoid blocking other optimizations.
	Differential Revision: https://reviews.llvm.org/D27779
	llvm-svn: 291693
*	Remove trailing whitespace. NFCI.	Simon Pilgrim	2017-01-11	1	-3/+3
	llvm-svn: 291680
*	[SystemZ] Improve isFoldableMemAccessOffset().	Jonas Paulsson	2017-01-11	1	-2/+20
	A store of an extracted element or a load which gets inserted into a vector, will be combined into a vector load/store element instruction. Therefore, isFoldableMemAccessOffset(), which is called by LSR, should return false in these cases.
	Reviewer: Ulrich Weigand
	llvm-svn: 291673
*	X86 CodeGen: Optimized pattern for truncate with unsigned saturation.	Elena Demikhovsky	2017-01-11	1	-0/+97
	DAG patterns optimization: truncate + unsigned saturation is supported by VPMOVUS* instructions in AVX-512, and by VPACKUS* instructions on SSE* targets.
	Differential Revision: https://reviews.llvm.org/D28216
	llvm-svn: 291670
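As a scalar illustration of what "truncate with unsigned saturation" computes per element: clamp the unsigned value to the largest value the narrow type can hold, then drop the upper bits. The function name and the clamp-then-truncate formulation are assumptions made for this sketch; the commit itself does not spell out the exact DAG pattern it matches.

```python
def trunc_usat(value, src_bits, dst_bits):
    """Truncate an unsigned src_bits-wide value to dst_bits with saturation."""
    value &= (1 << src_bits) - 1      # treat the input as unsigned
    dst_max = (1 << dst_bits) - 1
    clamped = min(value, dst_max)     # unsigned saturation
    return clamped & dst_max          # the truncate no longer loses information


# One 16-bit vector narrowed to 8-bit lanes.
print([trunc_usat(v, 16, 8) for v in (0, 200, 255, 256, 300, 0xFFFF)])
```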
*	[AMDGPU] Assembler: SDWA/DPP should not accept scalar registers and immediate operands	Sam Kolton	2017-01-11	5	-39/+133
	Reviewers: artem.tamazov, nhaustov, vpykhtin, tstellarAMD
	Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
	Differential Revision: https://reviews.llvm.org/D28157
	llvm-svn: 291668
*	[X86][AVX512BW] Vectorize v64i8 vector shifts	Simon Pilgrim	2017-01-11	2	-4/+20
	Differential Revision: https://reviews.llvm.org/D28447
	llvm-svn: 291665
*	[X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d}	Elad Cohen	2017-01-11	2	-1/+112
	The code emitted by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes redundant (v)movss/(v)movsd instructions. This patch adds patterns for optimizing these sequences.
	Differential revision: https://reviews.llvm.org/D28455
	llvm-svn: 291660
*	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch.	Mohammed Agabaria	2017-01-11	15	-17/+73
	Updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. Special optimization case: pmulld is replaced with a pmullw\pmulhw\pshuf sequence when the real operand bitwidth is <= 16.
	Differential Revision: https://reviews.llvm.org/D28104
	llvm-svn: 291657
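Why the pmulld replacement is valid when both operands fit in 16 bits can be shown with scalar arithmetic: the low and high 16-bit halves of the signed product, put back together, are the full 32-bit product. The helpers below model one lane each and are named after the instructions for readability; they are not bindings to real intrinsics, and the shuffle step is reduced to a shift and an OR.

```python
def to_signed(value, bits):
    """Interpret the low `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pmullw(a, b):
    """Low 16 bits of the product of two signed 16-bit lanes."""
    return (a * b) & 0xFFFF


def pmulhw(a, b):
    """High 16 bits of the signed 32-bit product of two 16-bit lanes."""
    return ((a * b) >> 16) & 0xFFFF


def mul32_from_16(a, b):
    """Rebuild the 32-bit product from the two 16-bit halves."""
    return to_signed((pmulhw(a, b) << 16) | pmullw(a, b), 32)


print(mul32_from_16(-3, 7), mul32_from_16(32767, 32767))
```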
*	[Target] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).	Eugene Zelenko	2017-01-11	1	-11/+30
	llvm-svn: 291641
*	Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367)	Hans Wennborg	2017-01-11	1	-10/+22
	This was reverted because it would miscompile code where the cmp had multiple uses. That was due to a deficiency in the existing code, which was fixed in r291630 (see the PR for details). This re-commit includes an extra test for the kind of code that got miscompiled: @test_sub_1_setcc_jcc.
	llvm-svn: 291640
*	[X86] Don't run combineSetCCAtomicArith() when the cmp has multiple uses	Hans Wennborg	2017-01-11	1	-0/+6
	We would miscompile the following:
		void g(int);
		int f(volatile long long *p) {
		  bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0;
		  g(b ? 12 : 34);
		  return b ? 56 : 78;
		}
	into
		pushq %rax
		lock incq (%rdi)
		movl $12, %eax
		movl $34, %edi
		cmovlel %eax, %edi
		callq g(int)
		testq %rax, %rax <---- Bad.
		movl $56, %ecx
		movl $78, %eax
		cmovsl %ecx, %eax
		popq %rcx
		retq
	because the code failed to take into account that the cmp has multiple uses, replaced one of them, and left the other one comparing garbage.
	llvm-svn: 291630
*	AMDGPU/EG,CM: Add fp16 conversion instructions	Jan Vesely	2017-01-11	1	-1/+3
	Differential Revision: https://reviews.llvm.org/D28164
	llvm-svn: 291622
*	[TM] Restore default TargetOptions in TargetMachine::resetTargetOptions.	Justin Lebar	2017-01-10	1	-3/+6
	Summary: Previously if you had
	 * a function with the fast-math-enabled attr, followed by
	 * a function without the fast-math attr,
	the second function would inherit the first function's fast-math-ness.
	This means that mixing fast-math and non-fast-math functions in a module was completely broken unless you explicitly annotated every non-fast-math function with "unsafe-fp-math"="false". This appears to have been broken since r176986 (March 2013), when the resetTargetOptions function was introduced.
	This patch tests the correct behavior as best we can. I don't think I can test FPDenormalMode and NoTrappingFPMath, because they aren't used in any backends during function lowering. Surprisingly, I also can't find any uses at all of LessPreciseFPMAD affecting generated code.
	The NVPTX/fast-math.ll test changes are an expected result of fixing this bug. When FMA is disabled, we emit add as "add.rn.f32", which prevents fma combining. Before this patch, fast-math was enabled in all functions following the one which explicitly enabled it on itself, so we were emitting plain "add.f32" where we should have generated "add.rn.f32".
	Reviewers: mkuper
	Subscribers: hfinkel, majnemer, jholewinski, nemanjai, llvm-commits
	Differential Revision: https://reviews.llvm.org/D28507
	llvm-svn: 291618
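The state leak described in this commit can be reproduced with a toy model. Everything below is hypothetical scaffolding (`ToyTargetMachine`, `reset_buggy`, `reset_fixed`, a single `unsafe_fp_math` flag); it only mirrors the behaviour the message describes, namely that an option was overwritten when a function carried the attribute but never restored to the default when it did not.

```python
from dataclasses import dataclass, replace


@dataclass
class TargetOptions:
    unsafe_fp_math: bool = False


class ToyTargetMachine:
    def __init__(self, default_options):
        self.default_options = default_options
        self.options = replace(default_options)  # working copy

    def reset_buggy(self, fn_attrs):
        # Old behaviour: only touch the option if the attribute is present.
        if "unsafe-fp-math" in fn_attrs:
            self.options.unsafe_fp_math = fn_attrs["unsafe-fp-math"] == "true"

    def reset_fixed(self, fn_attrs):
        # New behaviour: fall back to the default when the attribute is absent.
        if "unsafe-fp-math" in fn_attrs:
            self.options.unsafe_fp_math = fn_attrs["unsafe-fp-math"] == "true"
        else:
            self.options.unsafe_fp_math = self.default_options.unsafe_fp_math


def compile_two(reset_name):
    """Process a fast-math function, then a plain one; return the final flag."""
    tm = ToyTargetMachine(TargetOptions())
    reset = getattr(tm, reset_name)
    reset({"unsafe-fp-math": "true"})  # first function enables fast math
    reset({})                          # second function has no attribute
    return tm.options.unsafe_fp_math


print(compile_two("reset_buggy"), compile_two("reset_fixed"))
```

With the buggy reset the second function still sees fast math enabled; with the fixed reset it gets the default back.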
*	[AArch64] Consider all vector types for FeatureSlowMisaligned128Store	Evandro Menezes	2017-01-10	1	-12/+11
	The original code considered only v2i64 as slow for this feature. This patch considers all 128-bit long vector types as slow candidates. In internal tests, extending this feature to all 128-bit vector types resulted in an overall improvement of 1% on Exynos M1.
	Differential revision: https://reviews.llvm.org/D27998
	llvm-svn: 291616
*	AMDGPU: Constant fold when immediate is materialized	Matt Arsenault	2017-01-10	1	-141/+228
	In future commits these patterns will appear after moveToVALU changes.
	llvm-svn: 291615
*	[WebAssembly] Only RAUW a constant once in FixFunctionBitcasts	Derek Schuff	2017-01-10	1	-5/+12
	When we collect 2 uses of a function in FindUses and then RAUW when we visit the first, we end up visiting the wrapper (because the second was RAUW'd). We still want to use RAUW instead of just Use->set() because it has special handling for Constants, so this patch just ensures that only one use of each constant is added to the work list.
	Differential Revision: https://reviews.llvm.org/D28504
	llvm-svn: 291603
*	[ARM] Remove rbit intrinsics and autoupgrade to generic bitreverse.	Chad Rosier	2017-01-10	1	-5/+0
	Testing already covered by CodeGen/ARM/rbit.ll
	llvm-svn: 291587
*	AMDGPU: Add tests for HasMultipleConditionRegisters	Matt Arsenault	2017-01-10	1	-0/+7
	This was enabled without many specific tests or the comment.
	llvm-svn: 291586
*	[X86][AVX512] Improving shuffle lowering by using AVX-512 EXPAND* instructions	Michael Zuckerman	2017-01-10	1	-6/+115
	This patch fixes PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351
	1. This patch adds a new type of shuffle lowering.
	2. We can use the expand instruction when the shuffle pattern is as follows: { 0*a[0]0*a[1]...0*a[n], n >= 0, where the a[] elements are in ascending order }.
	Reviewers: 1. igorb 2. guyblank 3. craig.topper 4. RKSimon
	Differential Revision: https://reviews.llvm.org/D28352
	llvm-svn: 291584
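The shuffle pattern in item 2 can be read as "any number of zeros, then the next source element, repeated". A small recognizer for that shape might look like the sketch below. It is not the LLVM matcher: `ZERO`, `expand_kmask` and `expand` are hypothetical, and the sketch assumes the non-zero entries must be source indices 0, 1, 2, ... in order, since an expand instruction places consecutive low source elements into the lanes selected by a write mask and zeroes the rest.

```python
ZERO = -1  # hypothetical marker for "this lane is zero" in the shuffle mask


def expand_kmask(shuffle):
    """Return the write mask if `shuffle` is an expand pattern, else None."""
    next_src = 0
    kmask = 0
    for pos, elt in enumerate(shuffle):
        if elt == ZERO:
            continue
        if elt != next_src:       # source elements must appear in order
            return None
        kmask |= 1 << pos
        next_src += 1
    return kmask


def expand(src, kmask, width):
    """Scalar model of a zero-masking expand over `width` lanes."""
    out, next_src = [], 0
    for pos in range(width):
        if (kmask >> pos) & 1:
            out.append(src[next_src])
            next_src += 1
        else:
            out.append(0)
    return out


mask = expand_kmask([ZERO, 0, ZERO, 1])
print(bin(mask), expand([10, 20, 30, 40], mask, 4))
```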
*	[AArch64] Add support for lowering bitreverse to the rbit instruction.	Chad Rosier	2017-01-10	2	-4/+3
	Differential Revision: https://reviews.llvm.org/D28379
	llvm-svn: 291575
*	[mips] Fix Mips MSA intrinsics	Simon Dardis	2017-01-10	1	-17/+137
	The usage of some MIPS MSA intrinsics that took immediates could crash LLVM during lowering. This patch addresses that behaviour. Crucially this patch also makes the use of intrinsics with out-of-range immediates produce an internal error. The ld,st intrinsics would trigger an assertion failure for MIPS64 as their lowering would attempt to add an i32 offset to an i64 pointer.
	Reviewers: vkalintiris, slthakur
	Differential Revision: https://reviews.llvm.org/D25438
	llvm-svn: 291571
*	[mips] Honour -mno-odd-spreg for vector splat (again)	Simon Dardis	2017-01-10	1	-2/+6
	Previously the lowering of FILL_FW would use the MSA128W register class when performing a vector splat. Instead it should be honouring -mno-odd-spreg and only use the even registers when performing a splat from word to vector register. Logical follow-on from r230235. This fixes PR/31369. A previous commit was missing the test case and had another differential in it.
	Reviewers: slthakur
	Differential Revision: https://reviews.llvm.org/D28373
	llvm-svn: 291566
*	Revert "[mips] Honour -mno-odd-spreg for vector splat"	Simon Dardis	2017-01-10	2	-18/+2
	This reverts commit r291556. It was a mixture of two differentials and was missing a test.
	llvm-svn: 291562