path: root/llvm/lib/Target/X86
* [X86][SSE] Add support for VZEXT constant folding
  Simon Pilgrim, 2016-04-07 (1 file, -0/+18)
  llvm-svn: 265646
* [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
  Ahmed Bougacha, 2016-04-07 (1 file, -15/+93)

  Re-apply r265450, which caused PR27245 and was reverted in r265559
  because of a wrong generalization: the fetch_and_add->add_and_fetch
  combine only works in specific, but pretty common, cases:

    (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
    (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
    (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
    (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)

  Original message:

  We only generate LOCKed versions of add/sub when the result is unused.
  It often happens that the result is used, but only by a comparison. We
  can optimize those out by reusing EFLAGS, which lets us use the proper
  instructions, instead of having to fall back to LXADD.

  Instead of doing this as an MI peephole (as we do for the other
  non-LOCKed, really non-MR, forms), do it in ISel; it becomes quite
  tricky later.

  This also makes it eventually possible to stop expanding and/or/xor if
  the only user is an icmp (also see D18141).

  This uses the LOCK ISD opcodes added by r262244.

  Differential Revision: http://reviews.llvm.org/D17633
  llvm-svn: 265636
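  A minimal sketch (constructed for illustration, not taken from the
  patch) of the source pattern this targets: the fetch_add result is
  used only by a comparison, so ISel can emit a flag-setting "lock
  add"/"lock inc" and read EFLAGS instead of falling back to LXADD.

    #include <atomic>

    // The (icmp slt x, 0) -> (icmp sle (add x, 1), 0) rewrite lets this
    // compile to "lock incl (%rdi); setle %al" rather than a "lock
    // xadd" that materializes the old value just to test its sign.
    bool fetchAddWasNegative(std::atomic<int> &Counter) {
      return Counter.fetch_add(1, std::memory_order_seq_cst) < 0;
    }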
* Re-commit r265039 "[X86] Merge adjacent stack adjustments in
  eliminateCallFramePseudoInstr (PR27140)"
  Hans Wennborg, 2016-04-07 (1 file, -14/+34)

  Third time's the charm? The previous attempt (r265345) caused ASan
  test failures on X86, as broken CFI caused stack traces to not work.
  This version of the patch makes sure not to merge with stack
  adjustments that have CFI, and not to add merged instructions'
  offsets to the CFI about to be generated.

  This is already covered by the lit tests; I just got the expectations
  wrong previously.
  llvm-svn: 265623
* NFC: make AtomicOrdering an enum class
  JF Bastien, 2016-04-06 (1 file, -3/+5)

  Summary:
  In the context of http://wg21.link/lwg2445, C++ uses the concept of
  'stronger' ordering but doesn't define it properly. This should be
  fixed in C++17, barring a small question that's still open.

  The code currently plays fast and loose with the AtomicOrdering enum.
  Using an enum class is one step towards tightening things. I later
  also want to tighten related enums, such as clang's AtomicOrderingKind
  (which should be shared with LLVM as a 'C++ ABI' enum).

  This change touches a few lines of code which can be improved later;
  I'd like to keep it as NFC for now as it's already quite complex. I
  have related changes for clang.

  As a follow-up I'll add:
    bool operator<(AtomicOrdering, AtomicOrdering) = delete;
    bool operator>(AtomicOrdering, AtomicOrdering) = delete;
    bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
    bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
  This is separate so that clang and LLVM changes don't need to be in
  sync.

  Reviewers: jyknight, reames
  Subscribers: jyknight, llvm-commits
  Differential Revision: http://reviews.llvm.org/D18775
  llvm-svn: 265602
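  A rough sketch of the shape this moves toward (the enumerators mirror
  LLVM's orderings but this is not the verbatim header; the deleted
  operators are the follow-up quoted above):

    enum class AtomicOrdering {
      NotAtomic,
      Unordered,
      Monotonic,
      Acquire,
      Release,
      AcquireRelease,
      SequentiallyConsistent
    };

    // Forbid relational comparison outright, so nothing can silently
    // assume a total "stronger than" order between orderings.
    bool operator<(AtomicOrdering, AtomicOrdering) = delete;
    bool operator>(AtomicOrdering, AtomicOrdering) = delete;
    bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
    bool operator>=(AtomicOrdering, AtomicOrdering) = delete;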
* Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC."
  Hans Wennborg, 2016-04-06 (1 file, -76/+15)

  It caused ASan 32-bit tests to hang (PR27245).
  llvm-svn: 265559
* Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in ↵Hans Wennborg2016-04-061-26/+13
| | | | | | | | | eliminateCallFramePseudoInstr (PR27140)"" It seems to be causing ASan tests to crash, probably due to miscompiling the run-time somehow. llvm-svn: 265551
* Faster stack-protector for Android/AArch64.
  Evgeniy Stepanov, 2016-04-05 (2 files, -11/+10)

  Bionic has a defined thread-local location for the stack protector
  cookie. Emit a direct load instead of going through __stack_chk_guard.
  llvm-svn: 265481
* Swift Calling Convention: add swiftcc.
  Manman Ren, 2016-04-05 (2 files, -0/+22)

  Differential Revision: http://reviews.llvm.org/D17863
  llvm-svn: 265480
* Revert "Fix Clang-tidy modernize-deprecated-headers warnings in remaining ↵Duncan P. N. Exon Smith2016-04-051-38/+33
| | | | | | | | | | files; other minor fixes." This reverts commit r265454 since it broke the build. E.g.: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_build/22413/ llvm-svn: 265459
* Fix Clang-tidy modernize-deprecated-headers warnings in remaining
  files; other minor fixes.
  Eugene Zelenko, 2016-04-05 (1 file, -33/+38)

  Some Include What You Use suggestions were applied too. Use anonymous
  namespaces in source files.

  Differential Revision: http://reviews.llvm.org/D18778
  llvm-svn: 265454
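  A representative before/after for this kind of mechanical cleanup (the
  specific headers and helper below are illustrative, not from the
  patch):

    #include <cstdint>  // was: #include <stdint.h>
    #include <cstring>  // was: #include <string.h>

    namespace { // file-local helpers go in an anonymous namespace
    int localHelper(int X) { return X + 1; }
    } // end anonymous namespace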
* [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
  Ahmed Bougacha, 2016-04-05 (1 file, -15/+76)

  We only generate LOCKed versions of add/sub when the result is unused.
  It often happens that the result is used, but only by a comparison. We
  can optimize those out by reusing EFLAGS, which lets us use the proper
  instructions, instead of having to fall back to LXADD.

  Instead of doing this as an MI peephole (as we do for the other
  non-LOCKed, really non-MR, forms), do it in ISel; it becomes quite
  tricky later.

  This also makes it eventually possible to stop expanding and/or/xor if
  the only user is an icmp (also see D18141).

  This uses the LOCK ISD opcodes added by r262244.

  Differential Revision: http://reviews.llvm.org/D17633
  llvm-svn: 265450
* [X86] Simplify early-exit check. NFC.
  Ahmed Bougacha, 2016-04-05 (1 file, -4/+4)
  llvm-svn: 265447
* fix typo; NFC
  Sanjay Patel, 2016-04-05 (1 file, -1/+1)
  llvm-svn: 265442
* Re-commit r265039 "[X86] Merge adjacent stack adjustments in
  eliminateCallFramePseudoInstr (PR27140)"
  Hans Wennborg, 2016-04-04 (1 file, -13/+26)

  The original commit miscompiled things on 32-bit Windows, e.g. a Clang
  bootstrap. It turns out that mergeSPUpdates() was a bit too generous
  in what it interpreted as a stack adjustment, causing the following
  code:

    addl $12, %esp
    leal -4(%ebp), %esp

  to be "optimized" into simply:

    addl $8, %esp

  This commit tightens up mergeSPUpdates() and includes a new test
  (test14 in movtopush.ll) for this situation.
  llvm-svn: 265345
* ARM, AArch64, X86: Check preserved registers for tail calls.
  Matthias Braun, 2016-04-04 (1 file, -7/+7)

  We can only perform a tail call to a callee that preserves all the
  registers that the caller needs to preserve. This situation happens
  with calling conventions like preserve_mostcc or cxx_fast_tls. It was
  explicitly handled for fast_tls and failing for preserve_most. This
  patch generalizes the check to any calling convention, as illustrated
  by the sketch below.

  Related to rdar://24207743

  Differential Revision: http://reviews.llvm.org/D18680
  llvm-svn: 265329
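  An illustrative case, constructed for this note rather than taken from
  the patch: a preserve_most caller must keep nearly all registers
  intact across its body, so it cannot tail-call a callee whose normal
  calling convention is free to clobber them.

    extern void normal_callee();

    // The generalized check must refuse a tail call here: after the
    // jump, normal_callee may clobber registers that pm_caller's own
    // calling convention promises its callers it will preserve.
    __attribute__((preserve_most)) void pm_caller() {
      normal_callee();
    }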
* Add MachineFunctionProperty checks for AllVRegsAllocated for target passes
  Derek Schuff, 2016-04-04 (6 files, -0/+30)

  Summary: This adds the same checks that were added in r264593 to all
  target-specific passes that run after register allocation.

  Reviewers: qcolombet
  Subscribers: jyknight, dsanders, llvm-commits
  Differential Revision: http://reviews.llvm.org/D18525
  llvm-svn: 265313
* AVX-512: Truncating store for i1 vectors
  Elena Demikhovsky, 2016-04-04 (1 file, -1/+62)

  Implemented truncstore for KNL and skylake-avx512. Covered vectors
  from v2i1 to v64i1. The value is saved in bits, not in bytes: a v32i1
  is saved in 4 bytes.

  Differential Revision: http://reviews.llvm.org/D18740
  llvm-svn: 265283
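  The layout claim in scalar form (an illustrative helper, not LLVM
  code): one bit per i1 element, so a 32-element mask occupies
  32/8 = 4 bytes.

    #include <cstdint>

    uint32_t packMask32(const bool (&M)[32]) {
      uint32_t Bits = 0;
      for (int I = 0; I < 32; ++I)
        Bits |= uint32_t(M[I]) << I; // element I -> bit I (assumed order)
      return Bits;                   // stored as 4 bytes, not 32
    }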
* [X86] Removed duplicate code.
  Simon Pilgrim, 2016-04-03 (1 file, -5/+5)
  llvm-svn: 265274
* [X86][SSE] Support for MOVMSK signbit extraction instructions
  Simon Pilgrim, 2016-04-03 (5 files, -45/+32)

  Add support for lowering with the MOVMSK instruction to extract vector
  element signbits to a GPR.

  This is an early step towards more optimal handling of vector
  comparison results.

  Differential Revision: http://reviews.llvm.org/D18741
  llvm-svn: 265266
* [X86] Tidied up X86ISD instruction nodes. NFCI.
  Simon Pilgrim, 2016-04-03 (1 file, -50/+59)

  Tidied up comments, stripped trailing whitespace, and split apart
  nodes that aren't related. No change in ordering, although there is
  definitely some scope for it.
  llvm-svn: 265263
* AVX-512: Load and Extended Load for i1 vectors
  Elena Demikhovsky, 2016-04-03 (2 files, -10/+122)

  Implemented load + {sign|zero}_extend for i1 vectors.
  Fixed failures in i1 vector load.
  Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, and v64i1 vectors
  for KNL and SKX.

  Differential Revision: http://reviews.llvm.org/D18737
  llvm-svn: 265259
* [x86] avoid intermediate splat for non-zero memsets (PR27100)
  Sanjay Patel, 2016-04-01 (1 file, -1/+2)

  Follow-up to http://reviews.llvm.org/D18566 and
  http://reviews.llvm.org/D18676, where we noticed that an intermediate
  splat was being generated for memsets of non-zero chars. That happened
  because we told getMemsetStores() to use a 32-bit vector element type,
  and it happily obliged by producing that constant using an integer
  multiply.

  The 16-byte test that was added in D18566 is now equivalent for AVX1
  and AVX2 (no splats, just a vector load), but we have PR27141 to track
  that splat difference.

  Note that the SSE1 path is not changed in this patch. That can be a
  follow-up.

  This patch should resolve PR27100.
  llvm-svn: 265161
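  At the source level, the series is about cases like this (an
  illustrative snippet, not from the patch): a fixed-size memset of a
  non-zero byte should lower to a single vector store of the splatted
  constant, with no integer-multiply splat in between.

    #include <cstring>

    void fill16(char *P) {
      // With AVX1/AVX2 this should become one 16-byte store of the
      // vector constant <42,42,...,42>, loaded rather than splatted.
      memset(P, 42, 16);
    }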
* [x86] avoid intermediate splat for non-zero memsets (PR27100)
  Sanjay Patel, 2016-04-01 (1 file, -6/+8)

  Follow-up to D18566, where we noticed that an intermediate splat was
  being generated for memsets of non-zero chars. That happened because
  we told getMemsetStores() to use a 32-bit vector element type, and it
  happily obliged by producing that constant using an integer multiply.

  The tests that were added in the last patch are now equivalent for
  AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to
  track that splat difference. In the new tests, the splat via shuffling
  looks OK to me, but there might be some room for improvement depending
  on uarch there.

  Note that the SSE1/2 paths are not changed in this patch. That can be
  a follow-up.

  This patch should resolve PR27100.

  Differential Revision: http://reviews.llvm.org/D18676
  llvm-svn: 265148
* [x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type.
  Andrea Di Biagio, 2016-04-01 (1 file, -1/+0)

  Since revision 235394, we no longer perform target-specific combines
  on build_vector nodes. No functional change intended.
  llvm-svn: 265138
* [X86] Introduce Lakemont CPU.
  Andrey Turetskiy, 2016-04-01 (1 file, -0/+3)

  Add a new Intel MCU CPU, Lakemont, which doesn't support X87.

  Differential Revision: http://reviews.llvm.org/D18650
  llvm-svn: 265128
* Use range-based for loops. NFC.
  Michael Kuperstein, 2016-04-01 (1 file, -6/+5)
  llvm-svn: 265105
* Follow-up to r265036: I got these iterators mixed up
  Hans Wennborg, 2016-03-31 (1 file, -2/+2)
  llvm-svn: 265076
* Revert r265039 "[X86] Merge adjacent stack adjustments in
  eliminateCallFramePseudoInstr (PR27140)"
  Hans Wennborg, 2016-03-31 (1 file, -19/+12)

  I think it might have caused these build breakages:
  http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio
  http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio
  llvm-svn: 265046
* [X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)
  Hans Wennborg, 2016-03-31 (1 file, -12/+19)

  For code such as:

    void f(int, int);
    void g() { f(1, 2); }

  compiled for 32-bit X86 Linux, Clang would previously generate:

    subl $12, %esp
    subl $8, %esp
    pushl $2
    pushl $1
    calll f
    addl $16, %esp
    addl $12, %esp
    retl

  This patch fixes that by merging adjacent stack adjustments in
  eliminateCallFramePseudoInstr().

  Differential Revision: http://reviews.llvm.org/D18627
  llvm-svn: 265039
* Change eliminateCallFramePseudoInstr() to return an iterator
  Hans Wennborg, 2016-03-31 (2 files, -8/+11)

  This will become necessary in a subsequent change to make this method
  merge adjacent stack adjustments, i.e. it might erase the previous
  and/or next instruction.

  It also greatly simplifies the calls to this function from
  PrologEpilogInserter. Previously, that had a bunch of logic to resume
  iteration after the call; now it just continues with the returned
  iterator.

  Note that this changes the behaviour of PEI a little. Previously, it
  attempted to re-visit the new instruction created by
  eliminateCallFramePseudoInstr(). That code was added in r36625, but I
  can't see any reason for it: the new instructions will obviously not
  be pseudo instructions, they will not have FrameIndex operands, and we
  have already accounted for the stack adjustment.

  Differential Revision: http://reviews.llvm.org/D18627
  llvm-svn: 265036
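  A sketch of the simplified caller pattern this enables (a fragment
  with assumed surrounding names: MBB, MF, TFI, and the frame opcodes
  come from the enclosing PrologEpilogInserter code; the authoritative
  signature is on TargetFrameLowering):

    for (MachineBasicBlock::iterator I = MBB.begin(); I != MBB.end();) {
      if (I->getOpcode() == FrameSetupOpcode ||
          I->getOpcode() == FrameDestroyOpcode) {
        // The hook now erases the pseudo (and possibly a neighbour)
        // and hands back the next instruction to visit.
        I = TFI->eliminateCallFramePseudoInstr(MF, MBB, I);
      } else {
        ++I;
      }
    }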
* [x86] use SSE/AVX ops for non-zero memsets (PR27100)
  Sanjay Patel, 2016-03-31 (1 file, -5/+7)

  Move the memset check down to the CPU-with-slow-SSE-unaligned-memops
  case: this allows fast targets to take advantage of SSE/AVX
  instructions and prevents slow targets from stepping into a codegen
  sinkhole while trying to splat a byte into an XMM reg.

  Follow-on bugs exposed by the current codegen are:
  https://llvm.org/bugs/show_bug.cgi?id=27141
  https://llvm.org/bugs/show_bug.cgi?id=27143

  Differential Revision: http://reviews.llvm.org/D18566
  llvm-svn: 265029
* Prevent X86ISelLowering from merging volatile loads
  Nirav Dave, 2016-03-31 (1 file, -7/+7)

  Change isConsecutiveLoads to check that loads are non-volatile, as
  this is a requirement for any load merge. Propagate the change to two
  callers.

  Reviewers: RKSimon
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D18546
  llvm-svn: 265013
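  The shape of the guard (a sketch with an assumed helper name, not the
  verbatim patch): any load-merging predicate has to reject volatile
  loads before doing offset arithmetic, since a merged wide load would
  change the number and width of volatile memory accesses.

    #include "llvm/CodeGen/SelectionDAGNodes.h"
    using namespace llvm;

    static bool mayConsiderLoadForMerge(const LoadSDNode *Ld) {
      return Ld && !Ld->isVolatile(); // never merge volatile accesses
    }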
* [X86] Use MVT instead of EVT in code called after legalization.
  Craig Topper, 2016-03-31 (1 file, -3/+3)
  llvm-svn: 264992
* [X86] Enable call frame optimization ("mov to push") not only for
  optsize (PR26325)
  Hans Wennborg, 2016-03-30 (1 file, -4/+0)

  The size savings are significant, and from what I can tell, both ICC
  and GCC do this.

  Differential Revision: http://reviews.llvm.org/D18573
  llvm-svn: 264966
* CodeGen: Factor out code for tail call result compatibility check; NFC
  Matthias Braun, 2016-03-30 (1 file, -35/+8)
  llvm-svn: 264959
* Silencing warnings from MSVC 2015 Update 2. NFC.
  Aaron Ballman, 2016-03-30 (1 file, -4/+4)

  All of these changes silence "C4334 '<<': result of 32-bit shift
  implicitly converted to 64 bits (was 64-bit shift intended?)".
  llvm-svn: 264929
* [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for
  consecutive loads/zeros
  Simon Pilgrim, 2016-03-30 (1 file, -1/+0)

  Fix for an issue introduced by D17297, where we were breaking early
  from the loop detecting consecutive loads, which could leave us
  thinking a consecutive load with zeros was possible.
  llvm-svn: 264922
* Remove HasFnAttribute guards to getFnAttribute calls
  Nirav Dave, 2016-03-30 (1 file, -1/+0)

  These checks are redundant and can be removed.

  Reviewers: hans
  Subscribers: llvm-commits, mzolotukhin
  Differential Revision: http://reviews.llvm.org/D18564
  llvm-svn: 264872
* [X86][XOP] BITREVERSE lowering using VPPERM
  Simon Pilgrim, 2016-03-30 (1 file, -1/+70)

  XOP's VPPERM can apply 'permute operations' to each byte as well as
  shuffle the bytes of a 128-bit vector; in this case we use its
  per-byte bit-reversal operation to perform BITREVERSE in a single
  instruction.
  llvm-svn: 264870
* [x86] Fix a horrible bug in our lowering of x86 floating point atomic
  operations.
  Chandler Carruth, 2016-03-30 (1 file, -24/+27)

  Specifically, we had code that tried to badly approximate
  reconstructing all of the possible variations on addressing modes in
  two x86 instructions based on those in one pseudo instruction. This is
  not the first bug uncovered by doing this, so stop doing it
  altogether. Instead, generically and pedantically copy every operand
  from the address over to both new instructions, and strip kill flags
  from any register operands.

  This fixes a subtle bug seen in the wild where we would mysteriously
  drop parts of the addressing mode, causing, for example, the index
  argument in the added test case to be completely ignored.

  Hypothetically, this was an extremely bad miscompile because it
  actually caused a predictable and leverageable write of a 64-bit
  quantity to an unintended offset (the first element of the array
  instead of whatever other element was intended). As a consequence, in
  theory this could even have introduced security vulnerabilities.

  However, this was only something that could happen with an atomic
  floating point add. No other operation could trigger this bug, so it
  seems extremely unlikely to have occurred widely in the wild. But it
  did in fact occur, and frequently, in scientific applications which
  were using relaxed atomic updates of a floating point value after
  adding a delta. Those would end up being quite badly miscompiled by
  LLVM, which is how we found this. Of course, this often looks like a
  race condition in the code, but it was actually a miscompile.

  I suspect that this whole RELEASE_FADD thing was a complete mistake.
  There is no such operation, and I worry that anything other than add
  will get remarkably worse code generation. But that's not for this
  change...
  llvm-svn: 264845
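  A sketch of the generic fix described (a fragment: MI, MIB, and the
  operand start index MemOpndSlot are assumed names from the
  surrounding lowering code, and addOperand reflects the
  MachineInstrBuilder API of that era):

    // Pedantically copy each of the five x86 address operands to the
    // new instruction instead of reconstructing the addressing mode.
    for (unsigned i = 0; i != X86::AddrNumOperands; ++i) {
      MachineOperand Op = MI->getOperand(MemOpndSlot + i); // by value
      if (Op.isReg())
        Op.setIsKill(false); // the value now has more than one user
      MIB.addOperand(Op);
    }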
* [x86] Extract a helper function to compute the full addressing mode
  from an x86 MachineInstr's operands.
  Chandler Carruth, 2016-03-30 (2 files, -21/+29)

  This will be super useful to fix some bad atomics code in my next
  commit. No functionality changed.
  llvm-svn: 264819
* Swift Calling Convention: add swiftself attribute.
  Manman Ren, 2016-03-29 (2 files, -0/+4)

  Differential Revision: http://reviews.llvm.org/D17866
  llvm-svn: 264754
* AVX-512: fixed a bug in fp_to_uint pattern on KNL
  Elena Demikhovsky, 2016-03-29 (1 file, -0/+4)

  Fixed fp_to_uint instruction selection on KNL. One pattern was missing
  for <4 x double> to <4 x i32>.

  Differential Revision: http://reviews.llvm.org/D18512
  llvm-svn: 264701
* [X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the
  same op for all their scalar elements.
  Simon Pilgrim, 2016-03-28 (1 file, -0/+50)

  If all of a BUILD_VECTOR's source elements are the same bit
  (AND/XOR/OR) operation type and each has one constant operand, lower
  to a pair of BUILD_VECTORs and just apply the bit operation to the
  vectors.

  The constant operands will form a constant vector, meaning that we
  still only have a single BUILD_VECTOR to lower and we will have
  replaced all the scalarized operations with a single SSE equivalent.

  It's not in our interest to start making a general-purpose vectorizer
  from this, but I'm seeing enough of these scalar bit operations from
  the later legalization/scalarization stages to support them at least.

  Differential Revision: http://reviews.llvm.org/D18492
  llvm-svn: 264666
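  A constructed example of the scalar pattern this lowers (illustrative
  only, not from the patch): each lane is the same bit-op with one
  constant operand, so the four scalar ANDs feeding the stored vector
  can become a single vector AND against the constant vector
  <0x0F, 0xF0, 0x0F, 0xF0>.

    void maskLanes(const int *In, int *Out) {
      Out[0] = In[0] & 0x0F;
      Out[1] = In[1] & 0xF0;
      Out[2] = In[2] & 0x0F;
      Out[3] = In[3] & 0xF0;
    }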
* AVX-512: Fixed ICMP instruction selection for i1 operands
  Elena Demikhovsky, 2016-03-28 (1 file, -5/+9)

  ICMP instruction selection fails on SKX and KNL for i1 operands. I use
  XOR to resolve this: (A == B) is equivalent to (A xor B) == 0.

  Differential Revision: http://reviews.llvm.org/D18511
  llvm-svn: 264566
* [X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets
  Simon Pilgrim, 2016-03-26 (1 file, -0/+21)

  Correct splitting of v8i32 vectors into v4i32 vectors to prevent
  scalarization.
  llvm-svn: 264517
* [X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets
  Simon Pilgrim, 2016-03-26 (1 file, -0/+2)

  Correct splitting of v16i16 vectors into v8i16 vectors to prevent
  scalarization.

  Differential Revision: http://reviews.llvm.org/D18307
  llvm-svn: 264512
* [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors
  Simon Pilgrim, 2016-03-26 (1 file, -0/+124)

  Currently this is mainly to prevent scalarization of integer division
  by constants.

  Differential Revision: http://reviews.llvm.org/D18307
  llvm-svn: 264511
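  The main beneficiary in source form (an illustrative loop, not from
  the patch): unsigned i8 division by a constant is expanded through a
  multiply-high, so with MULHU lowered for i8 vectors this loop no
  longer scalarizes into 16 per-lane divisions.

    void divLanes(unsigned char *V) {
      for (int I = 0; I != 16; ++I)
        V[I] /= 7; // expands to a MULHU-based sequence per vector
    }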
* [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler
  v32i8 multiplies.
  Simon Pilgrim, 2016-03-26 (1 file, -2/+3)

  Only pre-AVX512BW targets need to split v32i8 vectors.
  llvm-svn: 264509
* [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC.
  Simon Pilgrim, 2016-03-26 (1 file, -13/+5)

  LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow
  sign-extension back to v16i16 to occur. Since this is basically the
  same as Lower256IntArith, we simplify by using that here instead.
  llvm-svn: 264506