bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[x86] Move the AVX blend test to a generic name. I'm going to fold other	Chandler Carruth	2014-10-01	1	-0/+0
\| \| \| \| \| \|	blend tests into this one. llvm-svn: 218813
*	[x86] Remove a test that wasn't doing anything really. We have plenty of	Chandler Carruth	2014-10-01	1	-69/+0
\| \| \| \| \| \|	better tests for zext of vectors at this point. llvm-svn: 218811
*	[x86] Add a 32-bit run to the sext test, and remove a sad vec_sext.ll	Chandler Carruth	2014-10-01	2	-80/+181
\| \| \| \| \| \| \| \| \| \| \|	test file. This old test had a bunch of functions that were never even checked. =/ The only thing it really did was to make sure that we did something reasonable in 32-bit mode with SSE4.1. Adding another run line to the main vector-sext.ll test seems a better way to do that. llvm-svn: 218810
*	[x86] Teach both sext and zext vector tests to cover a nice wide range	Chandler Carruth	2014-10-01	2	-184/+662
\| \| \| \| \| \| \| \| \| \| \| \| \|	of architectures: SSE2, SSSE3, SSE4.1, AVX, and AVX2. Unfortunately, this exposes the absolute horror of the code we generate for many of these patterns. Anyone wanting to familiarize themselves with the x86 backend and improve performance could do a lot of good sitting down and making these test cases not look so terrible. While the new vector shuffle code I'm working on will help some, it won't fix all of the crimes here. llvm-svn: 218807
*	Rework the PPC TargetMachine so that the non-function specific	Eric Christopher	2014-10-01	2	-27/+32
\| \| \| \| \| \| \|	overrides happen at TargetMachine creation and not on every subtarget creation. llvm-svn: 218805
*	constify TargetMachine parameter for X86TargetLowering.	Eric Christopher	2014-10-01	4	-5/+5
\| \| \| \|	llvm-svn: 218804
*	Make the sqrt intrinsic return undef for a negative input.	Sanjay Patel	2014-10-01	2	-2/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html And again here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html The sqrt of a negative number when using the llvm intrinsic is undefined. We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref. This change should not affect any code that isn't using "no-nans-fp-math"; ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call. Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either. We knowingly approve of this difference with the other compilers in an attempt to flag code that is invoking undefined behavior. A front-end warning should also try to convince the user that the program will fail: http://llvm.org/bugs/show_bug.cgi?id=21093 Differential Revision: http://reviews.llvm.org/D5527 llvm-svn: 218803
*	[x86] Sort the ISA-specific RUN lines for vector-sext.ll to go from	Chandler Carruth	2014-10-01	1	-155/+155
\| \| \| \| \| \| \|	oldest to newest. This makes more sense to me and is more consistent with other tests. llvm-svn: 218802
*	ARM: yes it can (as of r218789)	Tim Northover	2014-10-01	1	-3/+0
\| \| \| \|	llvm-svn: 218801
*	[x86] Rename avx-{s,z}ext.ll to vector-{s,z}ext.ll.	Chandler Carruth	2014-10-01	2	-0/+0
\| \| \| \| \| \| \| \|	These tests are far and away the best sext and zext tests we have for vectors. I'm going to merge the other similar tests into them and expand the ISA coverage. llvm-svn: 218800
*	[x86] Cleanup and re-generate the checks for avx-zext.ll using the new	Chandler Carruth	2014-10-01	1	-19/+32
\| \| \| \| \| \|	script. llvm-svn: 218799
*	DIBuilder: Encapsulate DIExpression's element type	Duncan P. N. Exon Smith	2014-10-01	3	-7/+10
\| \| \| \| \| \| \| \|	`DIExpression`'s elements are 64-bit integers that are stored as `ConstantInt`. The accessors already encapsulate the storage. This commit updates the `DIBuilder` API to also encapsulate that. llvm-svn: 218797
*	[x86] Generate the FileCheck assertions for avx-blend.ll with my new	Chandler Carruth	2014-10-01	1	-96/+75
\| \| \| \| \| \| \|	script to make them nice and predictable. This will ease updating them for the new vector shuffle lowering and seeing the delta if any. llvm-svn: 218795
*	[x86] Clean up and generate detailed FileCheck assertions for	Chandler Carruth	2014-10-01	1	-123/+365
\| \| \| \| \| \| \| \| \| \| \| \| \|	avx-sext.ll using my new script. Also add an AVX2 mode to this test. Part of cleaning up the test suite before enabling the new vector shuffle lowering. This also highlights some of the abysmal failures of the old shuffle lowering. Check out those 'pinsrw' and 'pextrw' sequences! llvm-svn: 218794
*	[MemoryDepAnalysis] Fix compile time slowdown	Bruno Cardoso Lopes	2014-10-01	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Problem One program takes ~3min to compile under -O2. This happens after a certain function A is inlined ~700 times in a function B, inserting thousands of new BBs. This leads to 80% of the compilation time spent in GVN::processNonLocalLoad and MemoryDependenceAnalysis::getNonLocalPointerDependency, while searching for nonlocal information for basic blocks. Usually, to avoid spending a long time to process nonlocal loads, GVN bails out if it gets more than 100 deps as a result from MD->getNonLocalPointerDependency. However this only happens after all nonlocal information for BBs have been computed, which is the bottleneck in this scenario. For instance, there are 8280 times where getNonLocalPointerDependency returns deps with more than 100 bbs and from those, 600 times it returns more than 1000 blocks. - Solution Bail out early during the nonlocal info computation whenever we reach a specified threshold. This patch proposes a 100 BBs threshold, it also reduces the compile time from 3min to 23s. - Testing The test-suite presented no compile nor execution time regressions. Some numbers from my machine (x86_64 darwin): - 17s under -Oz (which avoids inlining). - 1.3s under -O1. - 2m51s under -O2 ToT *** 23s under -O2 w/ Result.size() > 100 - 1m54s under -O2 w/ Result.size() > 500 With NumResultsLimit = 100, GVN yields the same outcome as in the unlimited 3min version. http://reviews.llvm.org/D5532 rdar://problem/18188041 llvm-svn: 218792
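The early bail-out described in the commit above can be illustrated with a small standalone sketch. This is not the code from the patch: `PredGraph`, `collectNonLocalBlocks`, and the integer block ids are hypothetical names chosen for illustration, and only the idea of checking the result size during the walk (rather than after it completes) is taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CFG: Preds[B] lists the predecessors of block B.
using PredGraph = std::vector<std::vector<int>>;

// Walks backwards from Start and records every block reached in Result.
// Returns false as soon as more than Limit blocks have been recorded, so
// the caller gives up before the whole (possibly huge) walk is finished.
bool collectNonLocalBlocks(const PredGraph &Preds, int Start,
                           std::size_t Limit, std::vector<int> &Result) {
  assert(Start >= 0 && static_cast<std::size_t>(Start) < Preds.size());
  Result.clear();
  std::vector<bool> Seen(Preds.size(), false);
  std::vector<int> Worklist(1, Start);
  Seen[Start] = true;
  while (!Worklist.empty()) {
    int BB = Worklist.back();
    Worklist.pop_back();
    Result.push_back(BB);
    if (Result.size() > Limit)
      return false; // Early bail-out: threshold exceeded mid-walk.
    for (int Pred : Preds[BB]) {
      if (!Seen[Pred]) {
        Seen[Pred] = true;
        Worklist.push_back(Pred);
      }
    }
  }
  return true;
}
```

With a limit of 100 this mirrors the `Result.size() > 100` configuration quoted in the timings; a caller would treat a `false` return the same way it treats "too many dependencies".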
*	Don't repeat function/variable name in comment. NFC.	Sanjay Patel	2014-10-01	2	-99/+84
\| \| \| \|	llvm-svn: 218791
*	[X86 disasm tblegen backend] Clean up numPhysicalOperands asserts	Adam Nemet	2014-10-01	1	-42/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	No functionality change intended. This implements Elena's idea to put the new additionalOperand outside the switch to cover all cases (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140929/237763.html). Note only nontrivial change is in MRMSrcMemFrm. This requires an inclusive interval of [2, 4] because we have prefix-dependent optional immediate operand. llvm-svn: 218790
*	ARM: allow copying of CPSR when all else fails.	Tim Northover	2014-10-01	4	-1/+98
\| \| \| \| \| \| \| \| \| \| \| \|	As with x86 and AArch64, certain situations can arise where we need to spill CPSR in the middle of a calculation. These should be avoided where possible (MRS/MSR is rather expensive), which ARM is actually better at than the other two since it tries to Glue defs to uses, but as a last ditch effort, copying is better than crashing. rdar://problem/18011155 llvm-svn: 218789
*	Move the complex address expression out of DIVariable and into an extra	Adrian Prantl	2014-10-01	257	-1378/+1535
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787
*	LTO: Add missing target triple from r218784	Duncan P. N. Exon Smith	2014-10-01	1	-0/+2
\| \| \| \|	llvm-svn: 218786
*	Add fptrunc to mips fast-isel	Reed Kotler	2014-10-01	2	-0/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel Test Plan: fptrunc.ll checked also with 4 internal mips build bot flavors mips32r1/mips32r2 and at -O0 and -O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: rfuhler Differential Revision: http://reviews.llvm.org/D5553 llvm-svn: 218785
*	LTO: Ignore disabled diagnostic remarks	Duncan P. N. Exon Smith	2014-10-01	7	-14/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r206400 and r209442 added remarks that are disabled by default. However, if a diagnostic handler is registered, the remarks are sent unfiltered to the handler. This is the right behaviour for clang, since it has its own filters. However, the diagnostic handler exposed in the LTO API receives only the severity and message. It doesn't have the information to filter by pass name. For LTO, disabled remarks should be filtered by the producer. I've changed `LLVMContext::setDiagnosticHandler()` to take a `bool` argument indicating whether to respect the built-in filters. This defaults to `false`, so other consumers don't have a behaviour change, but `LTOCodeGenerator::setDiagnosticHandler()` sets it to `true`. To make this behaviour testable, I added a `-use-diagnostic-handler` command-line option to `llvm-lto`. This fixes PR21108. llvm-svn: 218784
*	Add an immovable type to test Optional<T>::emplace more rigorously after r218732.	David Blaikie	2014-10-01	1	-5/+26
\| \| \| \| \| \|	llvm-svn: 218783
*	Revert r218778 while investigating buildbot breakage.	Adrian Prantl	2014-10-01	257	-1530/+1378
\| \| \| \| \| \|	"Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782
*	Move the complex address expression out of DIVariable and into an extra	Adrian Prantl	2014-10-01	257	-1378/+1530
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778
*	R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table	Tom Stellard	2014-10-01	266	-1520/+1521
\| \| \| \| \| \|	llvm-svn: 218776
*	C API: Add LLVMCloneModule()	Tom Stellard	2014-10-01	2	-0/+13
\| \| \| \|	llvm-svn: 218775
*	Revert r216862 due to a performance regression	Jingyue Wu	2014-10-01	4	-59/+28
\| \| \| \| \| \|	Reported by Alexey Volkov in PR21115 llvm-svn: 218771
*	[mips] Rename emit and parse functions for the .cpload assembler directive. NFC.	Toma Tabacu	2014-10-01	3	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It's better if we have a consistent name for .cpload-related functions. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5437 llvm-svn: 218768
*	R600/SI: Add a generic pseudo EXP instruction	Tom Stellard	2014-10-01	3	-8/+30
\| \| \| \|	llvm-svn: 218767
*	R600/SI: Add generic pseudo MTBUF instructions	Tom Stellard	2014-10-01	3	-31/+58
\| \| \| \|	llvm-svn: 218766
*	R600/SI: Add generic pseudo SMRD instructions	Tom Stellard	2014-10-01	2	-14/+39
\| \| \| \|	llvm-svn: 218765
*	[ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5	Oliver Stannard	2014-10-01	4	-52/+82
\| \| \| \| \| \| \| \| \| \|	Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions when targeting ARMv8, but they are actually present on any target with FP-ARMv8. Note that FP-ARMv8 is called FPv5 when it is part of an M-profile core, but they have the same instructions so we model them both as FPARMv8 in the ARM backend. llvm-svn: 218763
*	[x86] Fix a few more tiny patterns with the new vector shuffle lowering	Chandler Carruth	2014-10-01	2	-5/+213
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	that keep cropping up in the regression test suite. This also addresses one of the issues raised on the mailing list with failing to form 'movsd' in as many cases as we realistically should. There will be corresponding patches forthcoming for v4f32 at least. This was a lot of fuss for a relatively small gain, but all the fuss was on my end trying different ways of holding the pieces of the x86 fragment patterns just right. Now that it works, the code is reasonably simple. In the new test cases I'm adding here, v2i64 sticks out as just plain horrible. I've not come up with any great ideas here other than that it would be nice to recognize when we're going to take a domain crossing hit and cross earlier to get the decent instructions. At least with AVX it is slightly less silly.... llvm-svn: 218756
*	[x86] Delete some extraneous logic from the new vector shuffle lowering.	Chandler Carruth	2014-10-01	1	-7/+0
\| \| \| \| \| \| \| \|	Nothing was relying on this and there are potentially some edge cases that it would not be correct under. Removing it seems better than trying to "fix" it as nothing was relying on it. llvm-svn: 218755
*	[AArch64] Allow access to all system registers with MRS/MSR instructions.	Tom Coxon	2014-10-01	8	-70/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The A64 instruction set includes a generic register syntax for accessing implementation-defined system registers. The syntax for these registers is: S<op0>_<op1>_<CRn>_<CRm>_<op2> The encoding space permitted for implementation-defined system registers is: op0 op1 CRn CRm op2 11 xxx 1x11 xxxx xxx The full encoding space can now be accessed: op0 op1 CRn CRm op2 xx xxx xxxx xxxx xxx This is useful to anyone needing to write assembly code supporting new system registers before the assembler has learned the official names for them. llvm-svn: 218753
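To make the generic register syntax above concrete, here is a minimal sketch of parsing such a name and packing its fields. The field widths (2, 3, 4, 4, 3 bits) come from the commit message. Everything else is an assumption for illustration: the function name `parseGenericSysReg` is hypothetical, `CRn` and `CRm` are assumed to be written as `C<n>` and `C<m>`, only uppercase spellings are accepted, and the fields are simply concatenated in the listed order into a 16-bit value. This is not the assembler's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Parses a name of the assumed form "S<op0>_<op1>_C<n>_C<m>_<op2>" and
// packs the fields as op0:op1:CRn:CRm:op2 (2+3+4+4+3 = 16 bits).
// Returns -1 if the name is malformed or a field is out of range.
int parseGenericSysReg(const std::string &Name) {
  unsigned Op0, Op1, CRn, CRm, Op2;
  int Consumed = 0;
  if (std::sscanf(Name.c_str(), "S%u_%u_C%u_C%u_%u%n", &Op0, &Op1, &CRn,
                  &CRm, &Op2, &Consumed) != 5)
    return -1;
  // Reject trailing characters after the last field.
  if (static_cast<std::size_t>(Consumed) != Name.size())
    return -1;
  // Each field must fit its bit width.
  if (Op0 > 3 || Op1 > 7 || CRn > 15 || CRm > 15 || Op2 > 7)
    return -1;
  unsigned Packed = (Op0 << 14) | (Op1 << 11) | (CRn << 7) | (CRm << 3) | Op2;
  assert(Packed <= 0xFFFFu && "five fields always fit in 16 bits");
  return static_cast<int>(Packed);
}
```

For example, `S3_0_C15_C0_0` falls in the implementation-defined space listed above (op0 = 11, CRn = 1x11), while a name such as `S1_0_C7_C5_0` lies outside it and is only expressible once the full encoding space is accepted.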
*	Revert r218721, r218735.	Evgeniy Stepanov	2014-10-01	5	-283/+9
\| \| \| \| \| \| \| \| \| \|	Failing bootstrap on Linux (arm, x86). http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470 http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518 llvm-svn: 218752
*	Add missing natural vector cast.	Asiri Rathnayake	2014-10-01	3	-0/+67
\| \| \| \| \| \| \| \| \|	Summary: The natural vector cast node (similar to bitcast) AArch64ISD::NVCAST was introduced in r217159 and r217138. This patch adds a missing cast from v2f32 to v1i64 which is causing some compilation failures. Also added test cases to cover various modimm types and BUILD_VECTORs with i64 elements. llvm-svn: 218751
*	ADTTests/OptionalTest.cpp: Use LLVM_DELETED_FUNCTION.	NAKAMURA Takumi	2014-10-01	1	-4/+4
\| \| \| \|	llvm-svn: 218750
*	[ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM)	Oliver Stannard	2014-10-01	11	-18/+75
\| \| \| \| \| \| \| \| \|	The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be modelled using the same target feature, and all double-precision operations are already disabled by the fp-only-sp target features. llvm-svn: 218747
*	[mips] Fix disassembly of [ls][wd]c[23], cache, and pref	Daniel Sanders	2014-10-01	6	-12/+144
\| \| \| \| \| \| \| \|	Fixes PR21015, and PR20993. Patch by Jun Koi llvm-svn: 218745
*	[mips] For indirect calls we don't need $gp to point to .got. Mips linker	Sasa Stankovic	2014-10-01	8	-19/+47
\| \| \| \| \| \| \| \| \|	doesn't generate lazy binding stub for a function whose address is taken in the program. Differential Revision: http://reviews.llvm.org/D5067 llvm-svn: 218744
*	test: XFAIL the non-darwin gmlt test on darwin	Justin Bogner	2014-10-01	1	-0/+3
\| \| \| \| \| \| \|	r218702 disabled a -gmlt optimization for darwin, but this means the non-darwin test isn't working there anymore. llvm-svn: 218742
*	[MCJIT] Turn the getSymbolAddress free function created in r218626 into a static	Lang Hames	2014-10-01	3	-7/+13
\| \| \| \| \| \| \| \| \| \| \|	member of RTDyldMemoryManager (and rename to getSymbolAddressInProcess). The functionality this provides is very specific to RTDyldMemoryManager, so it makes sense to keep it in that class to avoid accidental re-use. No functional change. llvm-svn: 218741
*	Fix typo in comment from r218733	Nick Lewycky	2014-10-01	1	-1/+1
\| \| \| \|	llvm-svn: 218739
*	InstrProf: Make coverage::Counter comparable	Justin Bogner	2014-10-01	1	-0/+4
\| \| \| \| \| \|	I'll be using this in a clang change very soon. llvm-svn: 218736
*	[InstCombine] Fix for assert build failures caused by r218721	Gerolf Hoflehner	2014-10-01	1	-1/+7
\| \| \| \| \| \| \| \| \|	The icmp-select-icmp optimization made the implicit assumption that the select-icmp instructions are in the same block and asserted on it. The fix explicitly checks for that condition and conservatively suppresses the optimization when it is violated. llvm-svn: 218735
*	[x86] Teach the new vector shuffle lowering to be even more aggressive	Chandler Carruth	2014-10-01	2	-17/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in exposing the scalar value to the broadcast DAG fragment so that we can catch even reloads and fold them into the broadcast. This is somewhat magical I'm afraid but seems to work. It is also what the old lowering did, and I've switched an old test to run both lowerings demonstrating that we get the same result. Unlike the old code, I'm not lowering f32 or f64 scalars through this path when we only have AVX1. The target patterns include pretty heinous code to re-cast those as shuffles when the scalar happens to not be spilled because AVX1 provides no broadcast mechanism from registers what-so-ever. This is terribly brittle. I'd much rather go through our generic lowering code to get this. If needed, we can add a peephole to get even more opportunities to broadcast-from-spill-slots that are exposed post-RA, but my suspicion is this just doesn't matter that much. llvm-svn: 218734
*	[x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is	Chandler Carruth	2014-10-01	2	-11/+27
\| \| \| \| \| \| \| \| \| \|	the same speed as pshufd but we can fold loads into the pmovzx instructions. This fixes some regressions that came up in the regression test suite for the new vector shuffle lowering. llvm-svn: 218733
*	Add an emplace(...) method to llvm::Optional<T>.	Jordan Rose	2014-10-01	2	-0/+105
\| \| \| \| \| \| \| \| \| \| \| \| \|	This can be used for in-place initialization of non-moveable types. For compilers that don't support variadic templates, only up to four arguments are supported. We can always add more, of course, but this should be good enough until we move to a later MSVC that has full support for variadic templates. Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS. Reviewed by David Blaikie. llvm-svn: 218732
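As a rough illustration of why `emplace(...)` helps with non-moveable types, the following sketch builds a value directly inside an optional's storage. `MiniOptional` and `Immovable` are hypothetical names, not the `llvm::Optional` implementation, and the sketch assumes a compiler with variadic template support (the commit notes that the real change falls back to fixed-arity overloads where that is unavailable).

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical, simplified optional: owns raw storage and constructs T in place.
template <typename T> class MiniOptional {
  alignas(T) unsigned char Storage[sizeof(T)];
  T *Ptr = nullptr; // Non-null only while a value is alive in Storage.

public:
  MiniOptional() = default;
  MiniOptional(const MiniOptional &) = delete;
  MiniOptional &operator=(const MiniOptional &) = delete;
  ~MiniOptional() { reset(); }

  // Destroys the contained value, if any.
  void reset() {
    if (Ptr) {
      Ptr->~T();
      Ptr = nullptr;
    }
  }

  // Constructs T from the arguments directly in Storage; no copy or move of T.
  template <typename... Args> void emplace(Args &&...A) {
    reset();
    Ptr = ::new (static_cast<void *>(Storage)) T(std::forward<Args>(A)...);
  }

  bool hasValue() const { return Ptr != nullptr; }

  T &get() {
    assert(Ptr && "accessing an empty MiniOptional");
    return *Ptr;
  }
};

// A type that can be neither copied nor moved.
struct Immovable {
  int X, Y;
  Immovable(int X, int Y) : X(X), Y(Y) {}
  Immovable(const Immovable &) = delete;
  Immovable(Immovable &&) = delete;
  Immovable &operator=(const Immovable &) = delete;
  Immovable &operator=(Immovable &&) = delete;
};
```

Assigning an `Immovable` into such a container would not compile, since that requires a copy or move; `emplace(1, 2)` forwards the constructor arguments instead, which is the property the follow-up test commit r218783 exercises with an immovable type.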