path: root/llvm/test
Commit message | Author | Age | Files | Lines
...
* [x86] Teach the new vector shuffle lowering how to lower 128-bit | Chandler Carruth | 2014-10-05 | 4 | -192/+116
| | | | | | | | | | | | | | | | | | | | | | shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the few remaining regressions impacting benchmarks from the new vector shuffle lowering. You may note that it "regresses" many of the vperm2x128 test cases -- these were actually "improved" by the naive lowering that the new shuffle lowering previously did. This regression gave me fits. I had this patch ready-to-go about an hour after flipping the switch but wasn't sure how to have the best of both worlds here and thought the correct solution might be a completely different approach to lowering these vector shuffles. I'm now convinced this is the correct lowering and the missed optimizations shown in vperm2x128 are actually due to missing target-independent DAG combines. I've even written most of the needed DAG combine and will submit it shortly, but this part is ready and should help some real-world benchmarks out. llvm-svn: 219079
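For context, a hand-written sketch (not one of the commit's test cases) of the kind of 256-bit, whole-lane shuffle that vperm2f128/vperm2i128 cover:
```
; Selects the high 128-bit lane of %a followed by the low 128-bit lane
; of %b; a single vperm2f128 can implement this.
define <4 x double> @lane_shuffle(<4 x double> %a, <4 x double> %b) {
  %s = shufflevector <4 x double> %a, <4 x double> %b, <4 x i32> <i32 2, i32 3, i32 4, i32 5>
  ret <4 x double> %s
}
```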
* [InstCombine] Remove redundant @llvm.assume intrinsics | Hal Finkel | 2014-10-04 | 1 | -0/+55
| | | | | | | | | For any @llvm.assume intrinsic, if there is another which dominates it and uses the same condition, then it is redundant and can be removed. While this does not alter the semantics of the @llvm.assume intrinsics, it makes subsequent handling more efficient (and the resulting IR easier to read). llvm-svn: 219067
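To illustrate (a minimal sketch, not the commit's actual test; the function and values are hypothetical): the second @llvm.assume below is dominated by the first and uses the same condition, so it can now be dropped.
```
declare void @llvm.assume(i1)

define i32 @f(i32 %x) {
entry:
  %c = icmp sgt i32 %x, 0
  call void @llvm.assume(i1 %c)
  call void @llvm.assume(i1 %c)   ; redundant: dominated by the identical assume above
  %r = add i32 %x, 1
  ret i32 %r
}
```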
* [x86] Slap a triple on this test since it is poking around at the stack | Chandler Carruth | 2014-10-04 | 1 | -0/+2
| | | | | | | and calling conventions. Otherwise it's too hard to craft a usefully generic set of assertions. llvm-svn: 219047
* [x86] Enable the new vector shuffle lowering by default. | Chandler Carruth | 2014-10-04 | 90 | -3775/+1333
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the entire regression test suite for the new shuffles. Remove most of the old testing, which was devoted to the old shuffle lowering path and is no longer really relevant. Also remove a few other random tests that only exercised shuffles incidentally or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was *extremely little* support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and *many* others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046
* R600/SI: Custom lower f64 -> i64 conversions | Matt Arsenault | 2014-10-03 | 4 | -34/+122
| | | | llvm-svn: 219038
* R600: Custom lower [s|u]int_to_fp for i64 -> f64 | Matt Arsenault | 2014-10-03 | 2 | -3/+67
| | | | llvm-svn: 219037
* R600/SI: Fix ftrunc f64 conformance failures. | Matt Arsenault | 2014-10-03 | 4 | -3/+113
| | | | | | Re-add the tests since they were deleted at some point llvm-svn: 219036
* [x86] Add a really preposterous number of patterns for matching all of | Chandler Carruth | 2014-10-03 | 4 | -35/+90
| | | | | | | | | | | | | | | | | | | | | | | | | the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation. This then allows me to bail on the element insertion lowering path when we have SSE4.1 and are going to be doing a normal blend, which in turn restores the last of the blends lost from the new vector shuffle lowering when I got it to prioritize insertion in other cases (for example when we don't *have* a blend instruction). Without the patterns, using blends here would have regressed sse-scalar-fp-arith.ll *completely* with the new vector shuffle lowering. For completeness, I've added RUN-lines with the new lowering here. This is somewhat superfluous as I'm about to flip the default, but hey, it shows that this actually significantly changed behavior. The patterns I've added are just ridiculously repetitive. Suggestions on making them better are very much welcome. In particular, handling the commuted form of the v2f64 patterns is somewhat obnoxious. llvm-svn: 219033
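The shape of code these patterns target looks roughly like this (a sketch in the spirit of sse-scalar-fp-arith.ll, not copied from it): scalar math on element 0 whose result is re-inserted into the original vector, which an SSE4.1 blend would otherwise intercept.
```
define <4 x float> @add_low_element(<4 x float> %a, <4 x float> %b) {
  %a0 = extractelement <4 x float> %a, i32 0
  %b0 = extractelement <4 x float> %b, i32 0
  %s = fadd float %a0, %b0
  ; Reinserting into element 0 should fold with the fadd into addss,
  ; not a separate blendps.
  %r = insertelement <4 x float> %a, float %s, i32 0
  ret <4 x float> %r
}
```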
* [x86] Adjust the patterns for lowering X86vzmovl nodes which don't | Chandler Carruth | 2014-10-03 | 5 | -60/+186
| | | | | | | | | | | | | | perform a load to use blendps rather than movss when it is available. For non-loads, blendps is *much* faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and *three* ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022
* PR21145: Teach LLVM about C++14 sized deallocation functions. | Richard Smith | 2014-10-03 | 1 | -0/+23
| | | | | | | | C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before. llvm-svn: 219014
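As an illustration (a hypothetical sketch, not the commit's test): C++14's sized operator delete mangles to _ZdlPvm, and a matched new/delete pair like the following can now be optimized away.
```
declare noalias i8* @_Znwm(i64)
declare void @_ZdlPvm(i8*, i64)

define void @new_delete_pair() {
  ; operator new(4) immediately freed by sized operator delete(ptr, 4):
  ; the pair is removable now that LLVM knows the sized signature.
  %p = call noalias i8* @_Znwm(i64 4)
  call void @_ZdlPvm(i8* %p, i64 4)
  ret void
}
```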
* Revert "Revert "DI: Fold constant arguments into a single MDString""Duncan P. N. Exon Smith2014-10-03342-5879/+5879
| | | | | | | | | | | | | | | | | | | | | | This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010
* [ISel] Keep matching state consistent when folding during X86 address match | Adam Nemet | 2014-10-03 | 1 | -0/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However, as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND, which the code is prepared for, but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again, hitting an assert because the node is no longer a load even though this was checked before. Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it creates a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction. I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.). So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here). The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case. Fixes rdar://problem/18206171 llvm-svn: 219009
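A rough sketch of the kind of IR involved (hypothetical, not the commit's testcase): an and-of-shift feeding an address, which the matcher reassociates so the shift can become the scale of the addressing mode.
```
; (shl %i, 2) then (and ..., 1020) can be rewritten as
; (and %i, 255) then (shl ..., 2), letting the shl fold into
; a scaled addressing mode for the load.
define i32 @load_masked_scaled(i8* %base, i32 %i) {
  %shl = shl i32 %i, 2
  %masked = and i32 %shl, 1020     ; 1020 == 255 << 2
  %idx = zext i32 %masked to i64
  %addr = getelementptr i8* %base, i64 %idx
  %ptr = bitcast i8* %addr to i32*
  %v = load i32* %ptr
  ret i32 %v
}
```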
* R600: Align functions to 256 bytes | Tom Stellard | 2014-10-03 | 1 | -0/+2
| | | | llvm-svn: 219002
* [Power] Delete redundant test Atomics-32.ll | Robin Morisset | 2014-10-03 | 1 | -715/+0
| | | | | | | | The test Atomics-32.ll was both redundant (all operations are also checked by atomics.ll at least) and not actually checking correctness (it was not using FileCheck, just verifying that the compiler does not crash). llvm-svn: 218997
* llvm-readobj: print out the fields of the COFF delay-import table | Rui Ueyama | 2014-10-03 | 1 | -0/+12
| | | | llvm-svn: 218996
* [Power] Use lwsync for non-seq_cst fences | Robin Morisset | 2014-10-03 | 1 | -0/+29
| | | | | | | | | | | | | | | | Summary: hwsync is only required for seq_cst fences; acquire and release ones can use the cheaper lwsync. Test Plan: Added some cases to atomics.ll + make check-all Reviewers: jfb, wschmidt Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5317 llvm-svn: 218995
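A small sketch of the fence orderings in question (hypothetical function, not from atomics.ll):
```
define void @fences() {
  fence acquire   ; can now use lwsync
  fence release   ; can now use lwsync
  fence seq_cst   ; still needs the full hwsync
  ret void
}
```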
* [mips] Print warning when using register names not available in N32/64 | Daniel Sanders | 2014-10-03 | 1 | -3/+23
| | | | | | | | | | | | | | | | | Summary: The register names t4-t7 are not available in the N32 and N64 ABIs. This patch prints a warning when those names are used in N32/64, along with a fix-it suggesting the correct register names. Patch by Vasileios Kalintiris. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5272 llvm-svn: 218989
* [x86] Teach the new vector shuffle lowering to aggressively form MOVSS | Chandler Carruth | 2014-10-03 | 4 | -123/+66
| | | | | | | | | | | | | | | | | | | | | | | | and MOVSD nodes for single element vector inserts. This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the bad code examples for insertion. However, it regresses a specific area: when available, blendps and blendpd are *dramatically* faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends *aren't* as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit. So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However, that's a pretty invasive restructuring so it will need to be a follow-up patch. I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress. llvm-svn: 218985
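The single-element insert shape in question looks like this (a sketch, not taken from the commit's tests): lane 0 comes from %b and the rest from %a, which is exactly movss.
```
define <4 x float> @insert_low(<4 x float> %a, <4 x float> %b) {
  ; Element 0 of %b, elements 1-3 of %a: the movss pattern.
  %v = shufflevector <4 x float> %a, <4 x float> %b, <4 x i32> <i32 4, i32 1, i32 2, i32 3>
  ret <4 x float> %v
}
```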
* Revert 202433 - Provide a target override for the latest regalloc heuristic | Renato Golin | 2014-10-03 | 1 | -2/+2
| | | | | | | | | That commit was introduced in order to help investigate a problem in ARM codegen breaking from commit 202304 (Add a limit to the heuristic that register allocates instructions in local order). Recent analysis indicated that the problem no longer exists, so I'm reverting this change. See PR18996. llvm-svn: 218981
* [x86] Fix the RUN-lines of this test to make sense. | Chandler Carruth | 2014-10-03 | 1 | -3/+5
| | | | | | | | | | I got them quite wrong when updating it and had the SSE4.1 run checked for SSE2 and the SSE2 run checked for SSE4.1. I think everything was actually generic SSE, but this still seems good to fix. While here, hoist the triple into the IR and make the flag set a bit more direct in what it is trying to test. llvm-svn: 218978
* [x86] Significantly improve the ability of the new vector shuffle | Chandler Carruth | 2014-10-03 | 4 | -222/+70
| | | | | | | | | | | | | | | | lowering to match VZEXT_MOVL patterns. I hadn't realized that these had sufficient pattern smarts in the backend to lower zext-ing from the low element of a vector without it being a scalar_to_vector node. They do, and this is how to match a bunch of patterns for movq, movss, etc. There is a weird propensity to end up using pshufd to place the element afterward even though it means domain crossing (or rather, to use xorps+movss to zext the element rather than movq) but that's an orthogonal problem with VZEXT_MOVL that someone should probably look at. llvm-svn: 218977
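The VZEXT_MOVL shape being matched, sketched as IR (a hypothetical example): keep the low element and zero the rest, which movq implements for v2i64.
```
define <2 x i64> @zext_low_element(<2 x i64> %x) {
  ; Low element preserved, high element zeroed: matches movq.
  %v = shufflevector <2 x i64> %x, <2 x i64> zeroinitializer, <2 x i32> <i32 0, i32 2>
  ret <2 x i64> %v
}
```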
* [x86] Add some important, missing test coverage for blending from one | Chandler Carruth | 2014-10-03 | 2 | -46/+371
| | | | | | | | | | | | | | vector to a zero vector for the v2 cases and fix the v4 integer cases to actually blend from a vector. There are already separate tests for the case of inserting from a scalar. These cases cover a lot of the regressions I've seen in the regression test suite for the new vector shuffle lowering and specifically cover the reported lack of using various zext-ing instruction patterns. My next patch should fix a big chunk of this, but wanted to get a nice baseline for these patterns in the test cases first. llvm-svn: 218976
* [x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widen | Chandler Carruth | 2014-10-03 | 1 | -0/+235
| | | | | | | | | element types to form illegal vector types. I've added a special SSE1 test case here that makes sure we don't break this going forward. llvm-svn: 218974
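A sketch of the constraint (hypothetical, not the committed test): with only SSE1, v4f32 is the sole legal 128-bit vector type, so a shuffle like this must not be widened to a shuffle of a wider element type.
```
define <4 x float> @sse1_no_widen(<4 x float> %a, <4 x float> %b) {
  ; Tempting to widen to a <2 x double> shuffle of the low halves,
  ; but v2f64 is illegal without SSE2.
  %v = shufflevector <4 x float> %a, <4 x float> %b, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  ret <4 x float> %v
}
```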
* [x86] Add two more triples to stabilize the precise assembly syntax | Chandler Carruth | 2014-10-03 | 2 | -0/+4
| | | | | | across platforms. llvm-svn: 218973
* [x86] Remove a couple of fairly pointless tests. These were merely | Chandler Carruth | 2014-10-03 | 2 | -58/+0
| | | | | | | | | | | | | | | | | testing that we generated divps and divss but not in a very systematic way. There are other tests for widening binary operations already that make these unnecessary. The second one seems mostly about testing Atom as well as normal X86, but despite the comment claiming it is testing a different instruction sequence, it then tests for exactly the same div instruction sequence! (The sequence of instructions is actually quite different on Atom, but not the sequence of div instructions....) And then it has an "execution" test that simply isn't run? Very strange. Anyways, none of this is really needed so clean this up. llvm-svn: 218972
* Revert r215343. | James Molloy | 2014-10-03 | 1 | -33/+0
| | | | | | This was contentious and needs investigation. llvm-svn: 218971
* [mips] Remove XFAIL from two XPASS'ing tests on the llvm-mips-linux builder | Daniel Sanders | 2014-10-03 | 2 | -2/+0
| | | | llvm-svn: 218967
* [x86] Add another triple to a test to make the comment syntax stable. | Chandler Carruth | 2014-10-03 | 1 | -1/+1
| | | | | | Should fix darwin builders. llvm-svn: 218956
* [x86] Add triples to these tests so that we see fewer calling convention | Chandler Carruth | 2014-10-03 | 2 | -3/+3
| | | | | | | differences and they're a bit easier to maintain. This should fix the tests on cygwin bots, etc. llvm-svn: 218955
* [x86] Regenerate precise FileCheck lines for the last batch of test | Chandler Carruth | 2014-10-03 | 3 | -347/+1351
| | | | | | cases. llvm-svn: 218954
* [x86] Remove another low-value test still written using grep. We have | Chandler Carruth | 2014-10-03 | 1 | -19/+0
| | | | | | many tests for movss and friends. llvm-svn: 218953
* [x86] Regenerate precise checks for a couple of test cases and remove | Chandler Carruth | 2014-10-03 | 3 | -90/+190
| | | | | | | a test case that was just grepping the debug stats output rather than actually checking the generated code for anything useful. llvm-svn: 218951
* [x86] Remove an over-reduced test case. This would need to be | Chandler Carruth | 2014-10-03 | 1 | -30/+0
| | | | | | | | integrated much more fully into some logical part of the backend to really understand what it is trying to accomplish and how to update it. I suspect it no longer holds enough value to be worth having. llvm-svn: 218950
* [x86] Regenerate and clean up more tests in preparation for vector | Chandler Carruth | 2014-10-03 | 3 | -348/+252
| | | | | | | | shuffle switch. I nuked a win64 config from one test as it doesn't really make sense to cover that ABI specially for generic v2f32 tests... llvm-svn: 218948
* [x86] Cleanup and generate precise FileCheck assertions for a bunch of | Chandler Carruth | 2014-10-03 | 4 | -525/+933
| | | | | | SSE tests. llvm-svn: 218947
* [x86] This is a terrible SSE1 test, but we should keep it. I've deleted | Chandler Carruth | 2014-10-03 | 1 | -21/+11
| | | | | | | two functions that really didn't have any interesting assertions, and generated more precise tests for one of the others. llvm-svn: 218946
* [x86] Merge two very similar tests and regenerate FileCheck lines for | Chandler Carruth | 2014-10-03 | 2 | -563/+698
| | | | | | them. llvm-svn: 218945
* [BasicAA] Revert r218714 - Make better use of zext and sign information. | Lang Hames | 2014-10-03 | 2 | -66/+0
| | | | | | | | | This patch broke 447.dealII on Darwin. I'm currently working on a reduced test-case, but reverting for now to keep the bots happy. <rdar://problem/18530107> llvm-svn: 218944
* [x86] Regenerate a number of FileCheck assertions with my script for | Chandler Carruth | 2014-10-03 | 6 | -214/+426
| | | | | | | | test cases that will change with the new vector shuffle lowering. This gives us a nice baseline for deltas against. I've checked and removed the cases where weird register usage was being pinned down, and all of these are extremely pin-pointed tests so fully checking them seems very appropriate. llvm-svn: 218941
* [x86] Remove a couple of other overly isolated tests that are low-value | Chandler Carruth | 2014-10-03 | 2 | -23/+0
| | | | | | | at this point. We have lots of tests of peephole optimizations with insert and extract on vectors. llvm-svn: 218940
* [x86] Remove a test that provides little value. There are plenty of | Chandler Carruth | 2014-10-03 | 1 | -18/+0
| | | | | | tests for zext of a vector. llvm-svn: 218939
* [x86] Regenerate a bunch more avx512 test cases using my script to have | Chandler Carruth | 2014-10-03 | 3 | -152/+252
| | | | | | | | tighter, more strict FileCheck assertions. Some of these I really like as they show case exactly what instruction sequences come out of these microscopic functionality tests. llvm-svn: 218936
* [x86] Regenerate an avx512 test with my script to provide a nice | Chandler Carruth | 2014-10-03 | 1 | -130/+192
| | | | | | | | | | | | baseline for updates from the new vector shuffle lowering. I've inspected the results here, and I couldn't find any register allocation decisions where there should be any realistic way to register allocate things differently. The closest was the imul test case. If you see something here you'd like register number variables on, just shout and I'll add them. llvm-svn: 218935
* llvm-readobj: print COFF delay-load import table | Rui Ueyama | 2014-10-03 | 3 | -12/+32
| | | | | | | | | This patch adds another iterator to access the delay-load import table and use it from llvm-readobj. http://reviews.llvm.org/D5594 llvm-svn: 218933
* [x86] Remove some of the --show-mc-encoding flags from avx512 tests that | Chandler Carruth | 2014-10-03 | 3 | -17/+17
| | | | | | | | | | | | | | | | | | | | | | | need to be updated for the new vector shuffle lowering. After talking to Adam Nemet, Tim Northover, etc., it seems that testing MC encodings in the same suite as the basic codegen isn't the right approach. Instead, we're going to want dedicated MC tests for the encodings. These encodings are starting to get in my way so I wanted to cut them out early. The total set of instructions that should have encoding tests added is: vpaddd vsqrtss vsqrtsd vmovlhps vmovhlps valignq vbroadcastss Not too many parts of these tests were even using this. =] llvm-svn: 218932
* llvm-readobj: add a test for COFF import-by-ordinal symbols | Rui Ueyama | 2014-10-02 | 3 | -4/+20
| | | | llvm-svn: 218924
* [PowerPC] Modern Book-E cores support sync | Hal Finkel | 2014-10-02 | 1 | -0/+1
| | | | | | | | | | | | | Older Book-E cores, such as the PPC 440, support only msync (which has the same encoding as sync 0), but not any of the other sync forms. Newer Book-E cores, however, do support sync, and for performance reasons we should allow the use of the more-general form. This refactors msync use into its own feature group so that it applies by default only to older Book-E cores (of the relevant cores, we only have definitions for the PPC440/450 currently). llvm-svn: 218923
* [Power] Improve the expansion of atomic loads/stores | Robin Morisset | 2014-10-02 | 4 | -6/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Atomic loads and stores of up to the native size (32 bits, or 64 for PPC64) can be lowered to a simple load or store instruction (as the synchronization is already handled by AtomicExpand, and the atomicity is guaranteed thanks to the alignment requirements of atomic accesses). This is exactly what this patch does. Previously, these were implemented by complex load-linked/store-conditional loops, an obvious performance problem.

For example, this patch turns
```
define void @store_i8_unordered(i8* %mem) {
  store atomic i8 42, i8* %mem unordered, align 1
  ret void
}
```
from
```
_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
        rlwinm r2, r3, 3, 27, 28
        li r4, 42
        xori r5, r2, 24
        rlwinm r2, r3, 0, 0, 29
        li r3, 255
        slw r4, r4, r5
        slw r3, r3, r5
        and r4, r4, r3
LBB4_1:                                 ; =>This Inner Loop Header: Depth=1
        lwarx r5, 0, r2
        andc r5, r5, r3
        or r5, r4, r5
        stwcx. r5, 0, r2
        bne cr0, LBB4_1
; BB#2:
        blr
```
into
```
_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
        li r2, 42
        stb r2, 0(r3)
        blr
```
which looks like a pretty clear win to me.

Test Plan: fixed the tests + new test for indexed accesses + make check-all

Reviewers: jfb, wschmidt, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5587

llvm-svn: 218922
* [Stackmaps] Make the frame-pointer required for stackmaps. | Juergen Ributzka | 2014-10-02 | 7 | -12/+12
| | | | | | | | | Do not eliminate the frame pointer if there is a stackmap or patchpoint in the function. All stackmap references should be FP relative. This fixes PR21107. llvm-svn: 218920
* Revert "DI: Fold constant arguments into a single MDString"Duncan P. N. Exon Smith2014-10-02342-5877/+5878
| | | | | | This reverts commit r218914 while I investigate some bots. llvm-svn: 218918