bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[PowerPC] Refactor popcnt[dw] target features	Hal Finkel	2016-03-29	5	-18/+28
\| \| \| \| \| \| \| \| \|	Instead of using two feature bits, one to indicate the availability of the popcnt[dw] instructions, and another to indicate whether or not they're fast, use a single enum. This allows more consistent control via target attribute strings, and via Clang's command line. llvm-svn: 264690
*	[PowerPC] Clarify a comment in PPCTTI about vector loads	Hal Finkel	2016-03-28	1	-1/+1
\| \| \| \| \| \| \|	This should say that we could do unaligned vector loads on the P7 using VSX instructions, not that we should. llvm-svn: 264683
*	[PowerPC] On the A2, popcnt[dw] are very slow	Hal Finkel	2016-03-28	5	-6/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The A2 cores support the popcntw/popcntd instructions, but they're microcoded, and slower than our default software emulation. Specifically, popcnt[dw] take approximately 74 cycles, whereas our software emulation takes only 24-28 cycles. I've added a new target feature to indicate a slow popcnt[dw], instead of just removing the existing target feature from the a2/a2q processor models, because: 1. This allows us to return more accurate information via the TTI interface (I recognize that this currently makes no practical difference) 2. Is hopefully easier to understand (it allows the core's features to match its manual while still having the desired effect). llvm-svn: 264600
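For context on what "software emulation" of popcnt can look like, here is a minimal sketch of a branch-free 64-bit population count using the well-known SWAR bit trick. The function name `softwarePopcount64` is hypothetical, and this is not claimed to be the exact sequence the PowerPC backend emits; it only illustrates why a short run of shifts, masks and one multiply can beat a 74-cycle microcoded instruction.

```cpp
#include <cassert>
#include <cstdint>

// Branch-free population count (hypothetical helper, illustration only).
// Each step sums adjacent bit fields of doubling width: 1 -> 2 -> 4 -> 8 bits.
inline unsigned softwarePopcount64(std::uint64_t X) {
  // Per 2-bit field: count of set bits in that field.
  X = X - ((X >> 1) & 0x5555555555555555ULL);
  // Per 4-bit field: sum of the two 2-bit counts.
  X = (X & 0x3333333333333333ULL) + ((X >> 2) & 0x3333333333333333ULL);
  // Per byte: sum of the two 4-bit counts.
  X = (X + (X >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  // The multiply accumulates all byte counts into the top byte.
  return static_cast<unsigned>((X * 0x0101010101010101ULL) >> 56);
}
```

The sequence has no loops or branches, so its cost is a small fixed number of integer operations regardless of the input.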
*	[Power9] Implement new altivec instructions: bcd* series	Chuang-Yu Cheng	2016-03-28	3	-0/+126
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements the following altivec instructions: - Decimal Convert From/to National/Zoned/Signed-QWord: bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq. - Decimal Copy-Sign/Set-Sign: bcdcpsgn. bcdsetsgn. - Decimal Shift/Unsigned-Shift/Shift-and-Round: bcds. bcdus. bcdsr. - Decimal (Unsigned) Truncate: bcdtrunc. bcdutrunc. Total 13 instructions Thanks Amehsan's advice! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D17838 llvm-svn: 264568
*	[Power9] Implement new vsx instructions: insert, extract, test data class, ↵	Chuang-Yu Cheng	2016-03-28	7	-0/+345
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	min/max, reverse, permute, splat This change implements the following vsx instructions: - Scalar Insert/Extract xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp - Vector Insert/Extract xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp xxextractuw xxinsertw - Scalar/Vector Test Data Class xststdcdp xststdcsp xststdcqp xvtstdcdp xvtstdcsp - Maximum/Minimum xsmaxcdp xsmaxjdp xsmincdp xsminjdp - Vector Byte-Reverse/Permute/Splat xxbrd xxbrh xxbrq xxbrw xxperm xxpermr xxspltib 30 instructions Thanks Nemanja for invaluable discussion! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16842 llvm-svn: 264567
*	[Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic	Chuang-Yu Cheng	2016-03-28	2	-0/+171
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change implements the following vsx instructions: - quad-precision move xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp - quad-precision fp-arithmetic xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o) xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o) 22 instructions Thanks Nemanja and Kit for careful review and invaluable discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16110 llvm-svn: 264565
*	[PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based ↵	Hal Finkel	2016-03-27	1	-3/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	loop legality Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc function calls (fmax[f], etc.) generally map to function calls after lowering. For some vector types with QPX at least, however, we can legally lower these, and we don't need to prohibit CTR-based loops on their account. It turned out, however, that the logic that checked the opcodes associated with intrinsics was broken (it would set the Opcode variable, but that variable was later checked only if set for some otherwise-external function call). This fixes the latter problem and adds the FMAX/MINNUM mappings. llvm-svn: 264532
*	[PowerPC] Disable the CTR optimization in the presence of {min,max}num	David Majnemer	2016-03-26	1	-0/+2
\| \| \| \| \| \| \| \| \|	The minnum and maxnum intrinsics get lowered to libcalls which invalidates the CTR optimization. This fixes PR27083. llvm-svn: 264508
*	[Power9] Implement new altivec instructions: permute, count zero, extend ↵	Chuang-Yu Cheng	2016-03-26	3	-0/+181
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sign, negate, parity, shift/rotate, mul10 This change implements the following vector operations: - vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw - vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d - vnegd vnegw - vprtybd vprtybq vprtybw - vbpermd vpermr - vrlwnm vrlwmi vrldnm vrldmi vslv vsrv - vmul10cuq vmul10uq vmul10ecuq vmul10euq 28 instructions Thanks Nemanja, Kit for invaluable hints and discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan Phabricator: http://reviews.llvm.org/D15887 llvm-svn: 264504
*	Finish the incomplete 'd' inline asm constraint support for PPC by	Eric Christopher	2016-03-24	1	-0/+5
\| \| \| \| \| \|	making sure we give it a register and mark it as a register constraint. llvm-svn: 264340
*	[PowerPC] Disable direct moves for extractelement and bitcast in 32-bit mode	Nemanja Ivanovic	2016-03-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: http://reviews.llvm.org/D17711 It disables direct moves on these operations in 32-bit mode since the patterns assume 64-bit registers. The final patch is slightly different from the Phabricator review as the bitcast operations needed to be disabled in 32-bit mode as well. This fixes PR26617. llvm-svn: 264282
*	Codegen: [PPC] Word Rotates are Zero Extending.	Kyle Butt	2016-03-23	1	-1/+8
\| \| \| \| \| \| \|	Add Word rotates to the list of instructions that are zero extending. This allows them to be used in dot form to compare with zero. llvm-svn: 264183
*	adding another optimization opportunity to readme file	Ehsan Amiri	2016-03-18	1	-0/+11
\| \| \| \|	llvm-svn: 263775
*	[PPC, FastISel] Fix ordered/unordered fcmp	Tim Shen	2016-03-17	1	-7/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For fcmp, the major concern in the following six cases is a NaN result. The comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered), only one of which will be set. The result is generated by the fcmpu instruction. However, the bc instruction only inspects one of the first 3 bits, so when un is set, bc may jump to an undesired place. More specifically, if we expect an unordered comparison and un is set, we expect to always go to the true branch; in that case UEQ, UGT and ULT still give false, which is undesired; but UNE, UGE, ULE happen to give true, since they are tested by inspecting !eq, !lt, !gt, respectively. Similarly, for an ordered comparison, when un is set, we always expect the result to be false. In that case OGT, OLT and OEQ are good, since they are actually testing GT, LT, and EQ respectively, which are false. OGE, OLE and ONE are tested through !lt, !gt and !eq, and these are true. llvm-svn: 263753
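The case analysis above can be checked with a small model. The sketch below is not the FastISel code: the enums and the functions `singleBitTest`, `expectedResult` and `countMismatchesOnNaN` are hypothetical names introduced for illustration. It models a compare outcome as exactly one of lt/gt/eq/un, models a branch as a test of a single one of the first three bits (set or clear), and counts the predicates for which that single-bit test disagrees with fcmp semantics when an operand is NaN.

```cpp
#include <cassert>

// Outcome of the compare: exactly one of these holds.
enum class Cmp { LT, GT, EQ, UN };

enum class Pred { OEQ, OGT, OGE, OLT, OLE, ONE, UEQ, UGT, UGE, ULT, ULE, UNE };

bool isOrdered(Pred P) {
  switch (P) {
  case Pred::OEQ: case Pred::OGT: case Pred::OGE:
  case Pred::OLT: case Pred::OLE: case Pred::ONE:
    return true;
  default:
    return false;
  }
}

// Branch decision when only one of lt/gt/eq is inspected (set or clear),
// which is the behaviour the commit message describes as problematic.
bool singleBitTest(Pred P, Cmp R) {
  switch (P) {
  case Pred::OEQ: case Pred::UEQ: return R == Cmp::EQ;
  case Pred::OGT: case Pred::UGT: return R == Cmp::GT;
  case Pred::OLT: case Pred::ULT: return R == Cmp::LT;
  case Pred::OGE: case Pred::UGE: return R != Cmp::LT; // tested as !lt
  case Pred::OLE: case Pred::ULE: return R != Cmp::GT; // tested as !gt
  case Pred::ONE: case Pred::UNE: return R != Cmp::EQ; // tested as !eq
  }
  return false;
}

// What fcmp semantics require: on NaN, unordered predicates are true and
// ordered predicates are false; otherwise the single-bit test is already right.
bool expectedResult(Pred P, Cmp R) {
  if (R == Cmp::UN)
    return !isOrdered(P);
  return singleBitTest(P, R);
}

// Number of predicates that the single-bit test gets wrong on a NaN input.
int countMismatchesOnNaN() {
  const Pred All[] = {Pred::OEQ, Pred::OGT, Pred::OGE, Pred::OLT,
                      Pred::OLE, Pred::ONE, Pred::UEQ, Pred::UGT,
                      Pred::UGE, Pred::ULT, Pred::ULE, Pred::UNE};
  int N = 0;
  for (Pred P : All)
    if (singleBitTest(P, Cmp::UN) != expectedResult(P, Cmp::UN))
      ++N;
  return N;
}
```

In this model the mismatching predicates are UEQ, UGT, ULT, OGE, OLE and ONE, which matches the six cases named in the message.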
*	[PowerPC] Disable CTR loops optimization for soft float operations	Petar Jovanovic	2016-03-17	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \|	This patch prevents CTR loops optimization when using soft float operations inside loop body. Soft float operations use function calls, but function calls are not allowed inside CTR optimized loops. Patch by Aleksandar Beserminji. Differential Revision: http://reviews.llvm.org/D17600 llvm-svn: 263727
*	Tweak some atomics functions in preparation for larger changes; NFC.	James Y Knight	2016-03-16	2	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both '__sync' libcalls and '__atomic' libcalls, and this function is for the '__sync' ones. - getInsertFencesForAtomic() has been replaced with shouldInsertFencesForAtomic(Instruction), so that the decision can be made per-instruction. This functionality will be used soon. - emitLeadingFence/emitTrailingFence are no longer called if shouldInsertFencesForAtomic returns false, and thus don't need to check the condition themselves. llvm-svn: 263665
*	[DAG] use !isUndef() ; NFCI	Sanjay Patel	2016-03-14	1	-1/+1
\| \| \| \|	llvm-svn: 263453
*	[DAG] use isUndef() ; NFCI	Sanjay Patel	2016-03-14	1	-8/+8
\| \| \| \|	llvm-svn: 263448
*	Fix for PR 26378	Nemanja Ivanovic	2016-03-12	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: http://reviews.llvm.org/D17712 We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This caused duplicate definition asserts when the pass is reused on the module (i.e. with -compile-twice or in JIT contexts). llvm-svn: 263338
*	[PPC] backend changes to generate xvabs[s,d]p and xvnabs[s,d]p instructions	Kit Barton	2016-03-09	1	-0/+2
\| \| \| \| \| \| \|	This has to be committed before the FE changes Phabricator: http://reviews.llvm.org/D17837 llvm-svn: 263035
*	[Power9] Implement new vsx instructions: load, store instructions for vector ↵	Kit Barton	2016-03-08	7	-0/+214
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	and scalar We follow the comments mentioned in http://reviews.llvm.org/D16842#344378 to implement this new patch. This patch implements the following vsx instructions: Vector load/store: lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx stxv stxvb16x stxvh8x stxvl stxvll stxvx Scalar load/store: lxsd lxssp lxsibzx lxsihzx stxsd stxssp stxsibx stxsihx 21 instructions Phabricator: http://reviews.llvm.org/D16919 llvm-svn: 262906
*	A couple more UB fixes for C++14 sized deallocation.	Richard Smith	2016-03-08	1	-0/+4
\| \| \| \|	llvm-svn: 262891
*	[PPCVSXFMAMutate] Temporarily disable this pass	Tim Shen	2016-03-03	1	-2/+8
\| \| \| \|	llvm-svn: 262573
*	[Power9] Implement new vector compare, extract, insert instructions	Kit Barton	2016-03-01	2	-0/+96
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change implements the following vector operations: - Vector Compare Not Equal - vcmpneb(.) vcmpneh(.) vcmpnew(.) - vcmpnezb(.) vcmpnezh(.) vcmpnezw(.) - Vector Extract Unsigned - vextractub vextractuh vextractuw vextractd - vextublx vextubrx vextuhlx vextuhrx vextuwlx vextuwrx - Vector Insert - vinsertb vinserth vinsertw vinsertd 26 instructions. Phabricator: http://reviews.llvm.org/D15916 llvm-svn: 262392
*	New file to track implementation status of new POWER9 instructions	Kit Barton	2016-03-01	1	-0/+442
\| \| \| \|	llvm-svn: 262386
*	TableGen: Check scheduling models for completeness	Matthias Braun	2016-03-01	7	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TableGen checks at compiletime that for scheduling models with "CompleteModel = 1" one of the following holds: - Is marked with the hasNoSchedulingInfo flag - The instruction is a subclass of Sched - There are InstRW definitions in the scheduling model Typical steps necessary to complete a model: - Ensure all pseudo instructions that are expanded before machine scheduling (usually everything handled with EmitYYY() functions in XXXTargetLowering). - If a CPU does not support some instructions mark the corresponding resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }". - Add missing scheduling information. Differential Revision: http://reviews.llvm.org/D17747 llvm-svn: 262384
*	Fix for PR26180	Nemanja Ivanovic	2016-02-29	3	-6/+6
\| \| \| \| \| \| \| \| \| \|	Corresponds to Phabricator review: http://reviews.llvm.org/D16592 This fix includes both an update to how we handle the "generic" CPU on LE systems as well as Anton's fix for the Fast Isel issue. llvm-svn: 262233
*	CodeGen: Change MachineInstr to use MachineInstr&, NFC	Duncan P. N. Exon Smith	2016-02-27	1	-3/+3
\| \| \| \| \| \| \| \|	Change MachineInstr API to prefer MachineInstr& over MachineInstr* whenever the parameter is expected to be non-null. Slowly inching toward being able to fix PR26753. llvm-svn: 262149
*	CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC	Duncan P. N. Exon Smith	2016-02-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Take MachineInstr by reference instead of by pointer in SlotIndexes and the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are never null, so this cleans up the API a bit. It also incidentally removes a few implicit conversions from MachineInstrBundleIterator to MachineInstr* (see PR26753). At a couple of call sites it was convenient to convert to a range-based for loop over MachineBasicBlock::instr_begin/instr_end, so I added MachineBasicBlock::instrs. llvm-svn: 262115
*	[PPC] Legalize FNEG on PPC when possible	Kit Barton	2016-02-26	1	-0/+3
\| \| \| \| \| \| \| \|	Currently we always expand ISD::FNEG. For v4f32 and v2f64 vector types VSX has native support for this opcode Phabricator: http://reviews.llvm.org/D17647 llvm-svn: 262079
*	[Power9] Implement new vsx instructions: compare and conversion	Kit Barton	2016-02-26	6	-0/+257
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change implements the following vsx instructions: Quad/Double-Precision Compare: xscmpoqp xscmpuqp xscmpexpdp xscmpexpqp xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp xvcmpnedp(.) xvcmpnesp(.) Quad-Precision Floating-Point Conversion xscvqpdp(o) xscvdpqp xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz xscvsdqp xscvudqp xscvdphp xscvhpdp xvcvhpsp xvcvsphp xsrqpi xsrqpix xsrqpxp 28 instructions Phabricator: http://reviews.llvm.org/D16709 llvm-svn: 262068
*	Silencing a signed vs unsigned mismatch.	Aaron Ballman	2016-02-23	1	-1/+1
\| \| \| \|	llvm-svn: 261640
*	CodeGen: TII: Take MachineInstr& in predicate API, NFC	Duncan P. N. Exon Smith	2016-02-23	2	-69/+66
\| \| \| \| \| \| \| \| \| \| \| \| \|	Change TargetInstrInfo API to take `MachineInstr&` instead of `MachineInstr*` in the functions related to predicated instructions (I'll try to come back later and get some of the rest). All of these functions require non-null parameters already, so references are more clear. As a bonus, this happens to factor away a host of implicit iterator => pointer conversions. No functionality change intended. llvm-svn: 261605
*	Fix for PR26690 take 2	Nemanja Ivanovic	2016-02-22	1	-1/+1
\| \| \| \| \| \| \| \|	This is what was meant to be in the initial commit to fix this bug. The parens were missing. This commit also adds a test case for the bug and has undergone full testing on PPC and X86. llvm-svn: 261546
*	Revert bad fix for PR26690.	Nemanja Ivanovic	2016-02-22	1	-1/+1
\| \| \| \|	llvm-svn: 261527
*	Fix for PR26690	Nemanja Ivanovic	2016-02-22	1	-1/+1
\| \| \| \| \| \| \|	I mistook BitVector::empty() to mean BitVector::count() == 0 and it does not. Corrected the issue with the fix for PR26500. llvm-svn: 261525
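The distinction behind this fix is between "has no elements" and "has no bits set". The sketch below uses a hypothetical stand-in type, `TinyBitVector`, built on `std::vector<bool>`; it is not `llvm::BitVector`, but it mirrors the relevant behaviour: `empty()` reports a size of zero, while `count() == 0` (or `none()`) reports that no bit is set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bit vector (illustration only).
struct TinyBitVector {
  std::vector<bool> Bits;

  explicit TinyBitVector(std::size_t N) : Bits(N, false) {}

  // True only when the vector holds zero elements.
  bool empty() const { return Bits.empty(); }

  // Number of bits that are set.
  std::size_t count() const {
    return static_cast<std::size_t>(
        std::count(Bits.begin(), Bits.end(), true));
  }

  // True when no bit is set, regardless of the vector's size.
  bool none() const { return count() == 0; }
};
```

A vector of eight cleared bits is therefore not `empty()`, even though its `count()` is zero, which is the confusion the commit message describes.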
*	Fix some abuse of auto flagged by clang's -Wrange-loop-analysis.	Benjamin Kramer	2016-02-22	1	-5/+5
\| \| \| \|	llvm-svn: 261524
*	Fix the build bot break caused by rL261441.	Nemanja Ivanovic	2016-02-20	1	-5/+11
\| \| \| \| \| \| \| \|	The patch has a necessary call to a function inside an assert. Which is fine when you have asserts turned on. Not so much when they're off. Sorry about the regression. llvm-svn: 261447
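The failure mode described here is general C++ behaviour: when `NDEBUG` is defined, `assert(expr)` expands to nothing, so `expr` is never evaluated. A minimal sketch, assuming a hypothetical helper `reserveScratch` with a side effect (none of these names come from the patch):

```cpp
#include <cassert>

// Hypothetical helper with a required side effect; returns true on success.
bool reserveScratch(int &Counter) {
  ++Counter;
  return true;
}

// Fragile: in a release build (NDEBUG defined) the call vanishes with the
// assert, so Counter is never incremented.
void fragile(int &Counter) {
  assert(reserveScratch(Counter) && "reservation failed");
}

// Robust: the call always runs; only the check is compiled out.
void robust(int &Counter) {
  bool Ok = reserveScratch(Counter);
  assert(Ok && "reservation failed");
  (void)Ok; // Avoid an unused-variable warning when asserts are off.
}
```

Calling `robust` always performs the reservation, whereas `fragile` behaves differently depending on whether asserts are enabled.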
*	Fix for PR 26500	Nemanja Ivanovic	2016-02-20	2	-52/+182
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: http://reviews.llvm.org/D17294 It ensures that whatever block we are emitting the prologue/epilogue into, we have the necessary scratch registers. It takes away the hard-coded register numbers for use as scratch registers as registers that are guaranteed to be available in the function prologue/epilogue are not guaranteed to be available within the function body. Since we shrink-wrap, the prologue/epilogue may end up in the function body. llvm-svn: 261441
*	Remove uses of builtin comma operator.	Richard Trieu	2016-02-18	2	-22/+30
\| \| \| \| \| \|	Cleanup for upcoming Clang warning -Wcomma. No functionality change intended. llvm-svn: 261270
*	[PPCLoopDataPrefetch] Move pass to Transforms/Scalar/LoopDataPrefetch. NFC	Adam Nemet	2016-02-18	4	-230/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is part of the work to make PPCLoopDataPrefetch target-independent (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758). Obviously the pass is still only used from PPC at this point. Subsequent patches will start driving this from ARM64 as well. Due to the previous patch most lines should show up as moved lines. llvm-svn: 261265
*	[PPCLoopDataPrefetch] Remove PPC from some of the names. NFC	Adam Nemet	2016-02-18	1	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \|	This is done only to make the next patch, which moves the pass out of PPC to Transforms, easier to read. After this most lines should show up as moved lines in that patch. This patch is part of the work to make PPCLoopDataPrefetch target-independent (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758). llvm-svn: 261264
*	[CodeGen] Document and use getConstant's splat-building feature. NFC.	Ahmed Bougacha	2016-02-15	1	-19/+6
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D17229 llvm-svn: 260901
*	[MC] Merge VK_PPC_TPREL in to generic VK_TPREL.	Colin LeMahieu	2016-02-10	2	-7/+7
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D17038 llvm-svn: 260401
*	Fix for PR 26193	Nemanja Ivanovic	2016-02-05	1	-1/+1
\| \| \| \| \| \| \|	This is a simple fix for a PowerPC intrinsic that was incorrectly defined (the return type was incorrect). llvm-svn: 259886
*	Fix for PR 26356	Nemanja Ivanovic	2016-02-04	1	-5/+4
\| \| \| \| \| \| \| \| \|	Using the load immediate only when the immediate (whether signed or unsigned) can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and 0 to 65535 for unsigned. This patch also ensures that we sign-extend under the right conditions. llvm-svn: 259840
*	Fix for PR 26381	Nemanja Ivanovic	2016-02-03	1	-1/+1
\| \| \| \| \| \|	Simple fix - Constant values were not being sign extended in FastIsel. llvm-svn: 259645
*	Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates.	Kyle Butt	2016-02-03	1	-19/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms on PPC. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7 ;v6 = v6 + v5 * v7 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96 ;v5 = v5 * v7 + v96 This was broken in the case where the target register was also used as a multiplicand. Fix this case by checking for it and replacing both uses with the copied register. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6 ;v6 = v6 + v5 * v6 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96 ;v5 = v5 * v96 + v96 llvm-svn: 259617
*	Refactor common code for PPC fast isel load immediate selection.	Eric Christopher	2016-01-29	1	-9/+5
\| \| \| \|	llvm-svn: 259178
*	Since LI/LIS sign extend the constant passed into the instruction we should	Eric Christopher	2016-01-29	1	-2/+3
\| \| \| \| \| \| \| \| \|	check that the sign extended constant fits into 16-bits if we want a zero extended value, otherwise go ahead and put it together piecemeal. Fixes PR26356. llvm-svn: 259177
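The sign-extension issue can be illustrated with a small model. This is a sketch, not the FastISel code: `fitsInSigned16` and `loadImmediate` are hypothetical names, and `loadImmediate` only models the stated behaviour that LI sign-extends its 16-bit immediate field.

```cpp
#include <cassert>
#include <cstdint>

// True when V can be encoded in a 16-bit signed immediate field.
constexpr bool fitsInSigned16(std::int64_t V) {
  return V >= -32768 && V <= 32767;
}

// Models the register value after a load-immediate whose 16-bit field is
// sign-extended: fields with the top bit set become negative values.
constexpr std::int64_t loadImmediate(std::uint16_t Field) {
  return Field >= 0x8000 ? static_cast<std::int64_t>(Field) - 0x10000
                         : static_cast<std::int64_t>(Field);
}
```

Under this model a field of 0x8000 yields -32768 rather than 32768, so a zero-extended value with bit 15 set cannot be produced by a single sign-extending load immediate and has to be assembled in more than one step.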