bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	llvm/test/CodeGen/X86/peephole-fold-movsd.ll: Relax an expression for win32.	NAKAMURA Takumi	2014-09-15	1	-1/+1
\| \| \| \|	llvm-svn: 217806
*	Add a triple to fix the bots.	Rafael Espindola	2014-09-15	1	-1/+1
\| \| \| \|	llvm-svn: 217805
*	Fix a lot of confusion around inserting nops on empty functions.	Rafael Espindola	2014-09-15	3	-22/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On MachO, and MachO only, we cannot have a truly empty function since that breaks the linker logic for atomizing the section. When we are emitting a frame pointer, the presence of an unreachable will create a cfi instruction pointing past the last instruction. This is perfectly fine. The FDE information encodes the pc range it applies to. If some tool cannot handle this, we should explicitly say which bug we are working around and only work around it when it is actually relevant (not for ELF for example). Given the unreachable we could omit the .cfi_def_cfa_register, but then again, we could also omit the entire function prologue if we wanted to. llvm-svn: 217801
*	[CodeGenPrepare][AddressingModeMatcher] Fix a think-o for the sext(zext) -> ↵	Quentin Colombet	2014-09-15	1	-0/+63
\| \| \| \| \| \| \| \| \| \| \|	zext promotion introduced in r217629. We were returning the old sext instead of the new zext as the promoted instruction! Thanks Joerg Sonnenberger for the test case. llvm-svn: 217800
*	[X86] Fix a bug in X86's peephole optimization.	Akira Hatanaka	2014-09-15	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Peephole optimization was folding MOVSDrm, which is a zero-extending double precision floating point load, into ADDPDrr, which is a SIMD add of two packed double precision floating point values. (before) %vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21 %vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21 (after) %vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20 X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this from happening. However the check wasn't being conducted for loads from stack objects. This commit factors out the logic into a new function and uses it for checking loads from stack slots are not zero-extending loads. rdar://problem/18236850 llvm-svn: 217799
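To illustrate why the fold described above is unsafe, here is a minimal Python sketch that models a 128-bit register as a two-lane list. The helper names (`movsd_load`, `addpd_rr`, `addpd_rm`) are hypothetical stand-ins for the instruction semantics, not LLVM APIs.

```python
def movsd_load(memory, index):
    """Model MOVSDrm: load one double and zero the upper lane."""
    return [memory[index], 0.0]


def addpd_rr(lhs, rhs):
    """Model ADDPDrr: lane-wise add of two packed doubles."""
    return [lhs[0] + rhs[0], lhs[1] + rhs[1]]


def addpd_rm(lhs, memory, index):
    """Model ADDPDrm: the memory operand is a full two-lane (16-byte) load."""
    return addpd_rr(lhs, [memory[index], memory[index + 1]])


stack = [1.0, 99.0]   # the stack slot, followed by an unrelated neighbouring value
acc = [10.0, 20.0]

unfolded = addpd_rr(acc, movsd_load(stack, 0))  # [11.0, 20.0]
folded = addpd_rm(acc, stack, 0)                # [11.0, 119.0]
```

The folded form reads the neighbouring 8 bytes instead of using the zero produced by the scalar load, so the upper lane differs.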
*	CHECK-LABELize test	Matt Arsenault	2014-09-15	2	-19/+19
\| \| \| \|	llvm-svn: 217797
*	R600/SI: Prefer selecting more e64 instruction forms.	Matt Arsenault	2014-09-15	6	-9/+81
\| \| \| \| \| \| \| \|	Add some more tests to make sure better operand choices are still made. Leave some cases that seem to have no reason to ever be e64 alone. llvm-svn: 217789
*	R600/SI: Make sure double vector fmul is tested	Matt Arsenault	2014-09-15	1	-4/+29
\| \| \| \|	llvm-svn: 217787
*	R600/SI: Add some mubuf testcases.	Matt Arsenault	2014-09-15	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I noticed some odd looking cases where addr64 wasn't set when storing to a pointer in an SGPR. This seems to be intentional, and partially tested already. The documentation seems to describe addr64 in terms of which registers addressing modifiers come from, but I would expect to always need addr64 when using 64-bit pointers. If no offset is applied, it makes sense to not need to worry about doing a 64-bit add for the final address. A small immediate offset can be applied, so is it OK to not have addr64 set if a carry is necessary when adding the base pointer in the resource to the offset? llvm-svn: 217785
*	R600/SI: Add preliminary support for flat address space	Matt Arsenault	2014-09-15	1	-0/+182
\| \| \| \|	llvm-svn: 217777
*	[mips] Marked the DADDiu instruction aliases as MIPS III.	Toma Tabacu	2014-09-15	8	-0/+32
\| \| \| \| \| \| \| \|	Patch by Vasileios Kalintiris. Differential Revision: http://reviews.llvm.org/D5239 llvm-svn: 217770
*	[x86] Begin emitting PBLENDW instructions for integer blend operations	Chandler Carruth	2014-09-15	2	-17/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	when SSE4.1 is available. This removes a ton of domain crossing from blend code paths that were ending up in the floating point code path. This is just the tip of the iceberg though. The real switch is for integer blend lowering to more actively rely on this instruction being available so we don't hit shufps at all any longer. =] That will come in a follow-up patch. Another place where we need better support is for using PBLENDVB when doing so avoids the need to have two complementary PSHUFB masks. llvm-svn: 217767
*	[x86] Add an explicit SSE3 run to this test and flesh out a bunch of	Chandler Carruth	2014-09-15	1	-0/+86
\| \| \| \| \| \| \| \| \| \|	missing specific checks. While there is a lot of redundancy here where all-but-one mode use the same code generation, I'd rather have each variant spelled out and checked so that readers aren't misled by an omission in the test suite. llvm-svn: 217765
*	[x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS	Chandler Carruth	2014-09-15	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instructions from the relevant shuffle patterns. This is the last tweak I'm aware of to generate essentially perfect v4f32 and v2f64 shuffles with the new vector shuffle lowering up through SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32 is amenable to exhaustive exploration, but these are all of the tricks I'm aware of. With AVX there is a new trick using the VPERMILPS instruction; that's coming up in a subsequent patch. llvm-svn: 217761
*	[x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP	Chandler Carruth	2014-09-15	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \|	instructions when it finds an appropriate pattern. These are lovely instructions, and it's a shame to not use them. =] They are fast, and can have loads folded into their operands, etc. I've also plumbed the comment shuffle decoding through the various layers so that the test cases are printed nicely. llvm-svn: 217758
*	[x86] Undo a flawed transform I added to form UNPCK instructions when	Chandler Carruth	2014-09-15	6	-23/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AVX is available, and generally tidy up things surrounding UNPCK formation. Originally, I was thinking that the only advantage of PSHUFD over UNPCK instruction variants was its free copy, and otherwise we should use the shorter encoding UNPCK instructions. This isn't right though, there is a larger advantage of being able to fold a load into the operand of a PSHUFD. For UNPCK, the operand must be in a register so it can be the second input. This removes the UNPCK formation in the target-specific DAG combine for v4i32 shuffles. It also lifts the v8 and v16 cases out of the AVX-specific check as they are potentially replacing multiple instructions with a single instruction and so should always be valuable. The floating point checks are simplified accordingly. This also adjusts the formation of PSHUFD instructions to attempt to match the shuffle mask to one which would fit an UNPCK instruction variant. This was originally motivated to allow it to match the UNPCK instructions in the combiner, but clearly won't now. Eventually, we should add a MachineCombiner pass that can form UNPCK instructions post-RA when the operand is known to be in a register and thus there is no loss. llvm-svn: 217755
*	[x86] Teach the new vector shuffle lowering to use 'punpcklwd' and	Chandler Carruth	2014-09-15	1	-24/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'punpckhwd' instructions when suitable rather than falling back to the generic algorithm. While we could canonicalize to these patterns late in the process, that wouldn't help when the freedom to use them is only visible during initial lowering when undef lanes are well understood. This, it turns out, is very important for matching the shuffle patterns that are used to lower sign extension. Fixes a small but relevant regression in gcc-loops with the new lowering. When I changed this I noticed that several 'pshufd' lowerings became unpck variants. This is bad because it removes the ability to freely copy in the same instruction. I've adjusted the widening test to handle undef lanes correctly and now those will correctly continue to use 'pshufd' to lower. However, this caused a bunch of churn in the test cases. No functional change, just churn. Both of these changes are part of addressing a general weakness in the new lowering -- it doesn't sufficiently leverage undef lanes. I've at least a couple of patches that will help there at least in an academic sense. llvm-svn: 217752
*	InstSimplify: Simplify trivial and/or of icmps	David Majnemer	2014-09-15	1	-0/+120
\| \| \| \| \| \| \| \| \| \| \| \| \|	Some ICmpInsts, when anded/ored with another ICmpInst, trivially reduce to true or false depending on whether all integers or no integers satisfy the intersected/unioned range. This sort of trivial-looking code can come about when InstCombine performs a range reduction-type operation on sdiv and the like. This fixes PR20916. llvm-svn: 217750
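The idea in the message above can be sketched in Python under simplifying assumptions: 8-bit unsigned values, comparisons against constants only, and brute-force enumeration in place of LLVM's range arithmetic. The function names are hypothetical and do not correspond to InstSimplify's actual code.

```python
import operator

# Unsigned predicates only; each icmp is modelled as (predicate, constant).
PREDICATES = {
    "ult": operator.lt,
    "ule": operator.le,
    "ugt": operator.gt,
    "uge": operator.ge,
    "eq": operator.eq,
    "ne": operator.ne,
}


def satisfying(pred, const, bits=8):
    """Set of all `bits`-wide unsigned values x for which `x pred const` holds."""
    return {x for x in range(1 << bits) if PREDICATES[pred](x, const)}


def simplify_and(cmp1, cmp2, bits=8):
    """Return False if no integer satisfies both compares, else None (no fold)."""
    both = satisfying(*cmp1, bits) & satisfying(*cmp2, bits)
    return False if not both else None


def simplify_or(cmp1, cmp2, bits=8):
    """Return True if every integer satisfies at least one compare, else None."""
    either = satisfying(*cmp1, bits) | satisfying(*cmp2, bits)
    return True if len(either) == (1 << bits) else None
```

For example, `simplify_and(("ult", 5), ("ugt", 10))` folds to `False` because the intersected range is empty, while `simplify_or(("ult", 10), ("ugt", 5))` folds to `True` because the unioned range covers every value.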
*	[x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD.	Chandler Carruth	2014-09-14	3	-37/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These are super simple. They even take precedence over crazy instructions like INSERTPS because they have very high throughput on modern x86 chips. I still have to teach the integer shuffle variants about this to avoid so many domain crossings. However, due to the particular instructions available, that's a touch more complex and so a separate patch. Also, the backend doesn't seem to realize it can commute blend instructions by negating the mask. That would help remove a number of copies here. Suggestions on how to do this welcome, it's an area I'm less familiar with. llvm-svn: 217744
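The remark about commuting blends by negating the mask can be checked with a small Python model. It assumes the immediate-blend convention where a set mask bit selects the lane from the second operand; `blend` and `commuted_blend` are illustrative names, not backend functions.

```python
def blend(a, b, mask):
    """Lane i comes from b when bit i of mask is set, otherwise from a."""
    return [b[i] if (mask >> i) & 1 else a[i] for i in range(len(a))]


def commuted_blend(a, b, mask):
    """Same result with the operands swapped and the mask bits inverted."""
    lanes = len(a)
    inverted = ~mask & ((1 << lanes) - 1)  # keep only the low `lanes` bits
    return blend(b, a, inverted)


a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
result = blend(a, b, 0b0101)             # [5, 2, 7, 4]
swapped = commuted_blend(a, b, 0b0101)   # [5, 2, 7, 4]
```

Because both forms produce the same lanes, either operand can be placed in the tied register, which is what would let the backend avoid a copy.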
*	llvm/test/CodeGen/X86/vec_shuffle-38.ll: Add explicit ↵	NAKAMURA Takumi	2014-09-14	1	-1/+1
\| \| \| \| \| \|	-mtriple=x86_64-unknown to avoid incompatibility with win32. llvm-svn: 217742
*	[x86] Add an SSE41 mode to this test. Nothing interesting here, it's the	Chandler Carruth	2014-09-14	1	-0/+19
\| \| \| \| \| \|	same as SSE3. llvm-svn: 217741
*	[x86] Switch this test to use an ALL prefix with special SSE2 and SSE3	Chandler Carruth	2014-09-14	1	-120/+129
\| \| \| \| \| \| \| \| \|	variants where significant. This will make it more obvious what is happening when we start using blends in SSE41. llvm-svn: 217740
*	[x86] Add some test cases where we should emit blendpd in SSE4.1. No	Chandler Carruth	2014-09-14	1	-0/+15
\| \| \| \| \| \|	actual change yet though. llvm-svn: 217739
*	[x86] Teach the vector combiner that picks a canonical shuffle form to	Chandler Carruth	2014-09-14	8	-22/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	support transforming the forms from the new vector shuffle lowering to use 'movddup' when appropriate. A bunch of the cases where we actually form 'movddup' don't actually show up in the test results because something even later than DAG legalization maps them back to 'unpcklpd'. If this shows back up as a performance problem, I'll probably chase it down, but it is at least an encoded size loss. =/ To make this work, also always do this canonicalizing step for floating point vectors where the baseline shuffle instructions don't provide any free copies of their inputs. This also causes us to canonicalize unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space win. There is one test which is "regressed" by this: extractelement-load. There, in the test case where the optimization under test fails to apply, the exact instruction pattern that results is slightly different. This should probably be fixed by having the appropriate extract formed earlier in the DAG, but that would defeat the purpose of the test... If this test case is critically important for anyone, please let me know and I'll try to work on it. The prior behavior was actually contrary to the comment in the test case and seems likely to have been an accident. llvm-svn: 217738
*	R600/SI: Fix broken check lines	Matt Arsenault	2014-09-14	1	-2/+2
\| \| \| \|	llvm-svn: 217736
*	[FastISel][AArch64] Add support for non-native types for logical ops.	Juergen Ributzka	2014-09-13	1	-0/+176
\| \| \| \| \| \| \| \| \|	Extend the logical ops selection to also support non-native types such as i1, i8, and i16. Fixes rdar://problem/18330589. llvm-svn: 217732
*	[AArch64] Update test case to pass with post-RA MI scheduler.	Chad Rosier	2014-09-13	1	-1/+1
\| \| \| \| \| \| \| \|	Check that the post RA scheduler is being skipped, regardless of whether it's the top-down list latency scheduler or the post-RA MI scheduler. llvm-svn: 217725
*	Stop suppressing error messages in test case to see why one buildbot is failing	Nick Kledzik	2014-09-12	1	-1/+1
\| \| \| \|	llvm-svn: 217715
*	[llvm-objdump] support -rebase option for mach-o to dump rebasing info	Nick Kledzik	2014-09-12	2	-0/+15
\| \| \| \| \| \| \| \| \| \|	Similar to my previous -exports-trie option, the -rebase option dumps info from the LC_DYLD_INFO load command. The rebasing info is a list of the locations that dyld needs to adjust if a mach-o image is not loaded at its preferred address. Since ASLR is now the default, images almost never load at their preferred address, and thus need to be rebased by dyld. llvm-svn: 217709
*	llvm-profdata: Avoid undefined behaviour when reading raw profiles	Justin Bogner	2014-09-12	1	-2/+0
\| \| \| \| \| \| \| \| \| \|	The raw profiles that are generated in compiler-rt always add padding so that each profile is aligned, so we can simply treat files that don't have this property as malformed. Caught by Alexey's new ubsan bot. Thanks! llvm-svn: 217708
*	FileCheckize. NFC.	Chad Rosier	2014-09-12	1	-21/+25
\| \| \| \|	llvm-svn: 217698
*	[AArch64] Enable post-RA MI scheduler.	Chad Rosier	2014-09-12	1	-0/+31
\| \| \| \| \| \| \|	Phabricator Revision: http://reviews.llvm.org/D5278 Patch by Sanjin Sijaric! llvm-svn: 217693
*	[lit] Parse all strings as UTF-8 rather than ASCII.	Jordan Rose	2014-09-12	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	As far as I can tell UTF-8 has been supported since the beginning of Python's codec support, and it's the de facto standard for text these days, at least for primarily-English text. This allows us to put Unicode into lit RUN lines. rdar://problem/18311663 llvm-svn: 217688
*	llvm/test/CodeGen/X86/vec_ctbits.ll: Add explicit -mtriple=x86_64-unknown. ↵	NAKAMURA Takumi	2014-09-12	1	-1/+1
\| \| \| \| \| \|	It was incompatible with Win32 x64. llvm-svn: 217683
*	[mips][microMIPS] Implement JRADDIUSP instruction	Zoran Jovanovic	2014-09-12	1	-0/+5
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D5046 llvm-svn: 217681
*	Address comments on r217622	Bill Schmidt	2014-09-12	1	-0/+12
\| \| \| \|	llvm-svn: 217680
*	[mips][microMIPS] Implement BGEZALS and BLTZALS instructions	Zoran Jovanovic	2014-09-12	1	-0/+10
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D5004 llvm-svn: 217678
*	[mips][microMIPS] Implement JALS and JALRS instructions.	Zoran Jovanovic	2014-09-12	1	-0/+10
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D5003 llvm-svn: 217676
*	[mips][microMIPS] Implement TLBP, TLBR, TLBWI and TLBWR instructions	Zoran Jovanovic	2014-09-12	1	-0/+12
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D5211 llvm-svn: 217675
*	[ARM] Teach the cost model that cross-class copies are costly.	James Molloy	2014-09-12	1	-56/+56
\| \| \| \| \| \|	Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case. llvm-svn: 217674
*	Legalizer: Use the scalar bit width when promoting bit counting instrs on	Benjamin Kramer	2014-09-12	1	-1/+50
\| \| \| \| \| \| \| \| \|	vectors. e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fix up the result by 32 bits, not 64. PR20917. llvm-svn: 217671
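The arithmetic behind that fixup can be sketched in Python for a single lane. This is an illustration of the 32-versus-64 point in the message, with hypothetical helper names, not the legalizer's code.

```python
def ctlz(value, width):
    """Count leading zero bits of `value` viewed as an unsigned `width`-bit integer."""
    assert 0 <= value < (1 << width)
    return width - value.bit_length()


def promoted_ctlz(value, old_width=32, new_width=64):
    """ctlz on the zero-extended lane, corrected by the per-lane width difference."""
    # Zero-extension leaves the numeric value unchanged; only the width grows.
    return ctlz(value, new_width) - (new_width - old_width)


lane = 1
native = ctlz(lane, 32)        # 31
promoted = promoted_ctlz(lane)  # 63 - 32 == 31
```

Using whole-vector sizes instead (128 - 64 for <2 x i64> versus <2 x i32>) would subtract 64 and give the wrong count, which is why the scalar bit width is the one that matters.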
*	Revert "llvm-cov: Remove an overly system specific test"	Justin Bogner	2014-09-11	2	-0/+31
\| \| \| \| \| \| \| \| \| \|	This fixes a call to sys::fs::equivalent that should've been to CodeCoverageTool::equivalentFiles, which lets us restore the test of r217476 that was removed in r217478. This reverts r217478, but the test works this time. llvm-svn: 217646
*	R600/SI: Fix off by 1 error in used register count	Matt Arsenault	2014-09-11	1	-1/+8
\| \| \| \| \| \| \|	The register numbers start at 0, so if only 1 register was used, this was reported as 0. llvm-svn: 217636
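As a minimal sketch of the off-by-one described above, assuming the count is derived from the highest zero-based register index in use (the helper name is hypothetical):

```python
def used_register_count(used_indices):
    """Number of registers to report, given the zero-based indices in use."""
    if not used_indices:
        return 0
    # Indices start at 0, so the count is the highest index plus one.
    return max(used_indices) + 1


only_first = used_register_count([0])       # 1, not 0
sparse = used_register_count([0, 3])        # 4: registers 0..3 must be reserved
```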
*	[MCJIT] Make sure we test ARM BR24 relocations with both internal and external	Lang Hames	2014-09-11	1	-2/+7
\| \| \| \| \| \| \| \| \| \|	symbols. Previously we have only been testing these relocations with external symbols. <rdar://problem/18308413> llvm-svn: 217635
*	[CodeGenPrepare] Teach the addressing mode matcher how to promote zext.	Quentin Colombet	2014-09-11	1	-0/+15
\| \| \| \| \| \|	I.e., teach it about 'sext (zext a to ty) to ty2' => zext a to ty2. llvm-svn: 217629
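The equivalence 'sext (zext a to ty) to ty2' => 'zext a to ty2' can be checked exhaustively for small widths. The Python sketch below models the extensions on unsigned bit patterns and assumes each extension strictly widens its operand; the function names are illustrative only.

```python
def zext(value, from_bits, to_bits):
    """Zero-extend: keep the low `from_bits` bits, upper bits become zero."""
    assert from_bits < to_bits
    return value & ((1 << from_bits) - 1)


def sext(value, from_bits, to_bits):
    """Sign-extend: replicate bit `from_bits - 1` into the upper bits."""
    assert from_bits < to_bits
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    signed = (value ^ sign) - sign          # interpret as two's complement
    return signed & ((1 << to_bits) - 1)    # back to an unsigned bit pattern


# After a widening zext the sign bit of the intermediate type is always clear,
# so the following sext cannot introduce any one bits.
all_equal = all(
    sext(zext(a, 8, 16), 16, 32) == zext(a, 8, 32) for a in range(256)
)
```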
*	Add missing colon to RUN line...	Bill Schmidt	2014-09-11	1	-1/+1
\| \| \| \|	llvm-svn: 217623
*	[PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm	Bill Schmidt	2014-09-11	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Inline asm may specify 'U' and 'X' constraints to print a 'u' for an update-form memory reference, or an 'x' for an indexed-form memory reference. However, these are really only useful in GCC internal code generation. In inline asm the operand of the memory constraint is typically just a register containing the address, so 'U' and 'X' make no sense. This patch quietly accepts 'U' and 'X' in inline asm patterns, but otherwise does nothing. If we ever unexpectedly see a non-register, we'll assert and sort it out afterwards. I've added a new test for these constraints; the test case should be used for other asm-constraints changes down the road. llvm-svn: 217622
*	[MCJIT] Add support for ARM HALF_DIFF relocations to MCJIT.	Lang Hames	2014-09-11	1	-15/+19
\| \| \| \| \| \|	Fixes <rdar://problem/18297804>. llvm-svn: 217620
*	Add triple to test to fix bots	Matt Arsenault	2014-09-11	1	-1/+1
\| \| \| \|	llvm-svn: 217612
*	Provide an implementation of getNoopForMachoTarget for SPARC.	Brad Smith	2014-09-11	1	-0/+8
\| \| \| \|	llvm-svn: 217611