bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Fix a lot of confusion around inserting nops on empty functions.	Rafael Espindola	2014-09-15	2	-7/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On MachO, and MachO only, we cannot have a truly empty function since that breaks the linker logic for atomizing the section. When we are emitting a frame pointer, the presence of an unreachable will create a cfi instruction pointing past the last instruction. This is perfectly fine. The FDE information encodes the pc range it applies to. If some tool cannot handle this, we should explicitly say which bug we are working around and only work around it when it is actually relevant (not for ELF for example). Given the unreachable we could omit the .cfi_def_cfa_register, but then again, we could also omit the entire function prologue if we wanted to. llvm-svn: 217801
*	[X86] Fix a bug in X86's peephole optimization.	Akira Hatanaka	2014-09-15	1	-14/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Peephole optimization was folding MOVSDrm, which is a zero-extending double precision floating point load, into ADDPDrr, which is a SIMD add of two packed double precision floating point values. (before) %vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21 %vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21 (after) %vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20 X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this from happening. However, the check wasn't being conducted for loads from stack objects. This commit factors out the logic into a new function and uses it to check that loads from stack slots are not zero-extending loads. rdar://problem/18236850 llvm-svn: 217799
*	R600/SI: Prefer selecting more e64 instruction forms.	Matt Arsenault	2014-09-15	1	-7/+7
\| \| \| \| \| \| \| \|	Add some more tests to make sure better operand choices are still made. Leave some cases that seem to have no reason to ever be e64 alone. llvm-svn: 217789
*	R600/SI: Add preliminary support for flat address space	Matt Arsenault	2014-09-15	20	-11/+440
\| \| \| \|	llvm-svn: 217777
*	R600/SI: Fix promote alloca pass breaking addrspacecast	Matt Arsenault	2014-09-15	1	-0/+7
\| \| \| \|	llvm-svn: 217776
*	R600/SI: Enable named operand table for MTBUF	Matt Arsenault	2014-09-15	1	-0/+1
\| \| \| \| \| \| \|	There is already code trying to use it for getting the offset. llvm-svn: 217775
*	[mips] Use early exit in MipsAsmParser::matchCPURegisterName(). NFC.	Toma Tabacu	2014-09-15	1	-17/+18
\| \| \| \| \| \| \| \|	Patch by Vasileios Kalintiris. Differential Revision: http://reviews.llvm.org/D5270 llvm-svn: 217774
*	[mips] Marked the DADDiu instruction aliases as MIPS III.	Toma Tabacu	2014-09-15	1	-4/+4
\| \| \| \| \| \| \| \|	Patch by Vasileios Kalintiris. Differential Revision: http://reviews.llvm.org/D5239 llvm-svn: 217770
*	[x86] Begin emitting PBLENDW instructions for integer blend operations	Chandler Carruth	2014-09-15	1	-2/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	when SSE4.1 is available. This removes a ton of domain crossing from blend code paths that were ending up in the floating point code path. This is just the tip of the iceberg though. The real switch is for integer blend lowering to more actively rely on this instruction being available so we don't hit shufps at all any longer. =] That will come in a follow-up patch. Another place where we need better support is for using PBLENDVB when doing so avoids the need to have two complementary PSHUFB masks. llvm-svn: 217767
*	[x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS	Chandler Carruth	2014-09-15	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instructions from the relevant shuffle patterns. This is the last tweak I'm aware of to generate essentially perfect v4f32 and v2f64 shuffles with the new vector shuffle lowering up through SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32 is amenable to exhaustive exploration, but this is all of the tricks I'm aware of. With AVX there is a new trick to use the VPERMILPS instruction, that's coming up in a subsequent patch. llvm-svn: 217761
*	[x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP	Chandler Carruth	2014-09-15	4	-30/+105
\| \| \| \| \| \| \| \| \| \| \| \|	instructions when it finds an appropriate pattern. These are lovely instructions, and it's a shame to not use them. =] They are fast, and can have loads folded into their operands, etc. I've also plumbed the comment shuffle decoding through the various layers so that the test cases are printed nicely. llvm-svn: 217758
*	[x86] Undo a flawed transform I added to form UNPCK instructions when	Chandler Carruth	2014-09-15	1	-79/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AVX is available, and generally tidy up things surrounding UNPCK formation. Originally, I was thinking that the only advantage of PSHUFD over UNPCK instruction variants was its free copy, and otherwise we should use the shorter encoding UNPCK instructions. This isn't right though, there is a larger advantage of being able to fold a load into the operand of a PSHUFD. For UNPCK, the operand must be in a register so it can be the second input. This removes the UNPCK formation in the target-specific DAG combine for v4i32 shuffles. It also lifts the v8 and v16 cases out of the AVX-specific check as they are potentially replacing multiple instructions with a single instruction and so should always be valuable. The floating point checks are simplified accordingly. This also adjusts the formation of PSHUFD instructions to attempt to match the shuffle mask to one which would fit an UNPCK instruction variant. This was originally motivated to allow it to match the UNPCK instructions in the combiner, but clearly won't now. Eventually, we should add a MachineCombiner pass that can form UNPCK instructions post-RA when the operand is known to be in a register and thus there is no loss. llvm-svn: 217755
*	[x86] Teach the new vector shuffle lowering to use 'punpcklwd' and	Chandler Carruth	2014-09-15	1	-2/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'punpckhwd' instructions when suitable rather than falling back to the generic algorithm. While we could canonicalize to these patterns late in the process, that wouldn't help when the freedom to use them is only visible during initial lowering when undef lanes are well understood. This, it turns out, is very important for matching the shuffle patterns that are used to lower sign extension. Fixes a small but relevant regression in gcc-loops with the new lowering. When I changed this I noticed that several 'pshufd' lowerings became unpck variants. This is bad because it removes the ability to freely copy in the same instruction. I've adjusted the widening test to handle undef lanes correctly and now those will correctly continue to use 'pshufd' to lower. However, this caused a bunch of churn in the test cases. No functional change, just churn. Both of these changes are part of addressing a general weakness in the new lowering -- it doesn't sufficiently leverage undef lanes. I've at least a couple of patches that will help there at least in an academic sense. llvm-svn: 217752
*	[x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD.	Chandler Carruth	2014-09-14	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These are super simple. They even take precedence over crazy instructions like INSERTPS because they have very high throughput on modern x86 chips. I still have to teach the integer shuffle variants about this to avoid so many domain crossings. However, due to the particular instructions available, that's a touch more complex and so a separate patch. Also, the backend doesn't seem to realize it can commute blend instructions by negating the mask. That would help remove a number of copies here. Suggestions on how to do this welcome, it's an area I'm less familiar with. llvm-svn: 217744
*	[x86] Teach the vector combiner that picks a canonical shuffle form to	Chandler Carruth	2014-09-14	1	-9/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	support transforming the forms from the new vector shuffle lowering to use 'movddup' when appropriate. A bunch of the cases where we actually form 'movddup' don't actually show up in the test results because something even later than DAG legalization maps them back to 'unpcklpd'. If this shows back up as a performance problem, I'll probably chase it down, but it is at least an encoded size loss. =/ To make this work, also always do this canonicalizing step for floating point vectors where the baseline shuffle instructions don't provide any free copies of their inputs. This also causes us to canonicalize unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space win. There is one test which is "regressed" by this: extractelement-load. There, the test case where the optimization it is testing fails, the exact instruction pattern which results is slightly different. This should probably be fixed by having the appropriate extract formed earlier in the DAG, but that would defeat the purpose of the test.... If this test case is critically important for anyone, please let me know and I'll try to work on it. The prior behavior was actually contrary to the comment in the test case and seems likely to have been an accident. llvm-svn: 217738
*	[A57FPLoadBalancing] Modify r217689 - actually we do need to check defs	James Molloy	2014-09-14	1	-6/+6
\| \| \| \| \| \| \| \|	... Just make sure we check uses first so we see the kill first. It turns out ignoring defs gives some pretty nasty runtime failures. I'm certain this is the fix but I'm still reducing a testcase. llvm-svn: 217735
*	[FastISel][AArch64] Add support for non-native types for logical ops.	Juergen Ributzka	2014-09-13	1	-36/+48
\| \| \| \| \| \| \| \| \|	Extend the logical ops selection to also support non-native types such as i1, i8, and i16. Fixes rdar://problem/18330589. llvm-svn: 217732
*	Fix typo	Matt Arsenault	2014-09-13	1	-4/+4
\| \| \| \|	llvm-svn: 217730
*	[AArch64] Don't enable the post-RA MI scheduler at OptNone.	Chad Rosier	2014-09-12	1	-1/+2
\| \| \| \| \| \|	Hopefully, this will appease the bots. llvm-svn: 217712
*	The MCAssembler.h include isn't used.	Yaron Keren	2014-09-12	1	-1/+0
\| \| \| \|	llvm-svn: 217705
*	[AArch64] Enable post-RA MI scheduler.	Chad Rosier	2014-09-12	2	-1/+6
\| \| \| \| \| \| \|	Phabricator Revision: http://reviews.llvm.org/D5278 Patch by Sanjin Sijaric! llvm-svn: 217693
*	[A57FPLoadBalancing] Remove support for vector types	James Molloy	2014-09-12	1	-5/+0
\| \| \| \| \| \| \| \|	Vector MUL/MLAs have tied operands, which gives us extra constraints that we currently can't handle. Instead of silently doing the wrong thing, remove support to be readded later properly. llvm-svn: 217690
*	[A57FPLoadBalancing] Ignore <def>s when checking if a chain may be killed.	James Molloy	2014-09-12	1	-0/+4
\| \| \| \| \| \| \| \|	Defs are seen before uses, so a def without the kill flag doesn't necessarily mean that the register is not killed on that instruction. It may be killed in a later use operand. llvm-svn: 217689
*	[A57LoadBalancing] unique_ptr-ify.	James Molloy	2014-09-12	1	-25/+20
\| \| \| \| \| \|	Thanks to David Blaikie for the in-depth review! llvm-svn: 217682
*	[mips][microMIPS] Implement JRADDIUSP instruction	Zoran Jovanovic	2014-09-12	4	-0/+52
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D5046 llvm-svn: 217681
*	Address comments on r217622	Bill Schmidt	2014-09-12	1	-4/+6
\| \| \| \|	llvm-svn: 217680
*	[mips][microMIPS] Implement BGEZALS and BLTZALS instructions	Zoran Jovanovic	2014-09-12	2	-0/+13
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D5004 llvm-svn: 217678
*	[mips][microMIPS] Implement JALS and JALRS instructions.	Zoran Jovanovic	2014-09-12	2	-4/+37
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D5003 llvm-svn: 217676
*	[mips][microMIPS] Implement TLBP, TLBR, TLBWI and TLBWR instructions	Zoran Jovanovic	2014-09-12	3	-5/+19
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D5211 llvm-svn: 217675
*	[ARM] Teach the cost model that cross-class copies are costly.	James Molloy	2014-09-12	1	-0/+7
\| \| \| \| \| \|	Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case. llvm-svn: 217674
*	Fix gcc -Wpedantic.	Patrik Hagglund	2014-09-12	1	-1/+1
\| \| \| \|	llvm-svn: 217669
*	Remove a temporary variable and just construct a unique_ptr directly using make_unique.	Craig Topper	2014-09-12	1	-9/+6
\| \| \| \| \| \|	llvm-svn: 217655
*	R600/SI: Fix off by 1 error in used register count	Matt Arsenault	2014-09-11	1	-2/+4
\| \| \| \| \| \| \|	The register numbers start at 0, so if only 1 register was used, this was reported as 0. llvm-svn: 217636
*	[PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm	Bill Schmidt	2014-09-11	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Inline asm may specify 'U' and 'X' constraints to print a 'u' for an update-form memory reference, or an 'x' for an indexed-form memory reference. However, these are really only useful in GCC internal code generation. In inline asm the operand of the memory constraint is typically just a register containing the address, so 'U' and 'X' make no sense. This patch quietly accepts 'U' and 'X' in inline asm patterns, but otherwise does nothing. If we ever unexpectedly see a non-register, we'll assert and sort it out afterwards. I've added a new test for these constraints; the test case should be used for other asm-constraints changes down the road. llvm-svn: 217622
*	Provide an implementation of getNoopForMachoTarget for SPARC.	Brad Smith	2014-09-11	2	-0/+7
\| \| \| \|	llvm-svn: 217611
*	[AVX512] Fix miscompile for unpack	Adam Nemet	2014-09-11	1	-56/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack between the low and the high 256 bits of src1 into the low part of the destination and another unpack of the low and high 256 bits of src2 into the high part of the destination. I don't think that's how unpack works. AVX512 unpack simply has more 128-bit lanes but other than that it works the same way as AVX. So in each 128-bit lane, we're always interleaving certain parts of both operands rather than different parts of one of the operands. E.g. for this: __v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }; __v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; __v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13, 16, 24, 17, 25, 20, 28, 21, 29); we generated punpcklps (notice how the elements of a and b are not interleaved in the shuffle). In turn, c was set to this: 0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29 Obviously this should have just returned the mask vector of the shuffle vector. I mostly reverted this change and made sure the original AVX code worked for 512-bit vectors as well. Also updated the tests because they matched the logic from the code. llvm-svn: 217602
*	Move constant-sized bitvector to the stack.	Benjamin Kramer	2014-09-11	1	-2/+2
\| \| \| \|	llvm-svn: 217600
*	R600: Add cmpxchg instruction for evergreen	Aaron Watry	2014-09-11	2	-5/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactored the R600_LDS_1A2D class a bit to get it to actually work. It seemed to be previously unused and broken. We also have to disable the conversion to the noret variant for now in R600ISelLowering because the getLDSNoRetOp method only handles 1A1D LDS ops. Someone can feel free to modify the AMDGPU::getLDSNoRetOp method to work for more than 1A1D variants of LDS operations. It's being left as a future TODO for now. Signed-off-by: Aaron Watry <awatry at gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217596
*	R600: Add LDS_WRXCHG[_RET] instructions for Evergreen.	Aaron Watry	2014-09-11	1	-0/+4
\| \| \| \| \| \|	Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217594
*	R600: Add LDS_MIN_[U]INT[_RET] instructions for Evergreen	Aaron Watry	2014-09-11	1	-0/+8
\| \| \| \| \| \|	Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217593
*	R600: Add LDS_XOR[_RET] instructions for Evergreen	Aaron Watry	2014-09-11	1	-0/+4
\| \| \| \| \| \|	Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217592
*	R600: Add LDS_OR[_RET] instructions for Evergreen	Aaron Watry	2014-09-11	1	-0/+4
\| \| \| \| \| \|	Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217591
*	R600: Add LDS_AND[_RET] instructions for Evergreen	Aaron Watry	2014-09-11	1	-0/+4
\| \| \| \| \| \|	Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217590
*	R600: Add LDS_MAX_[U]INT[_RET] instructions for Evergreen	Aaron Watry	2014-09-11	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	This was only present for SI before. Cayman may still be missing, but I am unable to test that currently. v2: Don't create atomicrmw max tests in separate file Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 217589
*	R600/SI: Fix losing chain when fixing reg class of loads.	Matt Arsenault	2014-09-10	1	-6/+14
\| \| \| \| \| \| \|	The lost chain resulted in earlier side-effecting nodes being deleted. llvm-svn: 217561
*	R600/SI: Report offset in correct units for st64 DS instructions	Matt Arsenault	2014-09-10	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \|	Need to convert the 64 element offset into bytes, not just the element size like the normal case instructions. Noticed by inspection. This can't be hit now because st64 instructions aren't emitted during instruction selection, and the post-RA scheduler isn't enabled. llvm-svn: 217560
*	R600: Custom lower frem	Matt Arsenault	2014-09-10	2	-0/+20
\| \| \| \|	llvm-svn: 217553
*	Add doInitialization/doFinalization to DataLayoutPass.	Rafael Espindola	2014-09-10	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	With this a DataLayoutPass can be reused for multiple modules. Once we have doInitialization/doFinalization, it doesn't seem necessary to pass a Module to the constructor. Overall this change seems in line with the idea of making DataLayout a required part of Module. With it the only way of having a DataLayout used is to add it to the Module. llvm-svn: 217548
*	[AArch64] Revert r216141 for cyclone	Gerolf Hoflehner	2014-09-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The increase of the interleave factor to 4 has side effects such as performance losses, e.g. due to remainder loops being executed more frequently, and may increase code size. It requires more analysis and careful heuristic tuning. Expect double digit gains in small benchmarks like lowercase.c and losses in puzzle.c. llvm-svn: 217540
*	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.	Sanjay Patel	2014-09-10	5	-9/+9
\| \| \| \| \| \| \| \| \| \| \|	"Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528
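
Note on the AVX512 unpack entry (llvm-svn: 217602): that message describes unpack as interleaving the low elements of both operands within each 128-bit lane. The Python sketch below models that rule for illustration only. The helper names unpack_low_mask and shuffle are hypothetical and are not LLVM APIs, and the defaults (32-bit elements, 128-bit lanes) are assumptions chosen to match the punpcklps example in the message.

```python
def unpack_low_mask(num_elts, elt_bits=32, lane_bits=128):
    """Build a shufflevector-style mask for a per-lane 'unpack low'.

    Indices 0..num_elts-1 select from the first operand and
    num_elts..2*num_elts-1 select from the second operand.
    (Hypothetical helper, for illustration only.)
    """
    per_lane = lane_bits // elt_bits
    if per_lane < 2 or num_elts % per_lane != 0:
        raise ValueError("vector must consist of whole lanes")
    mask = []
    for lane_start in range(0, num_elts, per_lane):
        # Interleave the low half of this lane from both operands.
        for i in range(per_lane // 2):
            mask.append(lane_start + i)             # element from operand a
            mask.append(num_elts + lane_start + i)  # element from operand b
    return mask


def shuffle(a, b, mask):
    """Apply a two-operand shuffle mask to Python lists."""
    both = list(a) + list(b)
    return [both[i] for i in mask]


a = list(range(16))      # same values as __v16sf a in the commit message
b = list(range(16, 32))  # same values as __v16sf b in the commit message
result = shuffle(a, b, unpack_low_mask(16))
print(result)
```

With these inputs the element values equal their mask indices, so the printed list can be compared directly with the "c was set to this" sequence quoted in the commit message.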