bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Reduce code duplication between patchpoint and non-patchpoint lowering. NFC.	Juergen Ributzka	2014-10-16	2	-44/+58
\| \| \| \| \| \| \| \| \| \| \| \|	This is in preparation for another patch that makes patchpoints invokable. Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5657 llvm-svn: 219967
*	[SROA] Switch the common variable name for the 'AllocaSlices' class to	Chandler Carruth	2014-10-16	1	-40/+42
\| \| \| \| \| \| \| \| \| \| \|	'AS'. Using 'S' as this was a terrible idea. Arguably, 'AS' is not much better, but it at least follows the idea of using initialisms and removes active confusion about the AllocaSlices variable and a Slice variable. llvm-svn: 219963
*	[SROA] More range-based cleanups to SROA, these brought to you by	Chandler Carruth	2014-10-16	1	-25/+12
\| \| \| \| \| \| \| \| \|	clang-modernize. I did have to clean up the variable types and whitespace a bit because the use of auto made the code much less readable here. llvm-svn: 219962
*	[SROA] Switch a couple of overly complex iterator accessors to just be	Chandler Carruth	2014-10-16	1	-26/+10
\| \| \| \| \| \| \| \| \|	ArrayRef accessors. I think this even came up in review that this was over-engineered, and indeed it was. Time to un-build it. llvm-svn: 219958
*	Erase fence insertion from SelectionDAGBuilder.cpp (NFC)	Robin Morisset	2014-10-16	4	-67/+35
	Summary: Backends can use setInsertFencesForAtomic to signal to the middle-end that monotonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses currently lives in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons:
	- There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand.
	- The current code in SelectionDAGBuilder does not use any target hooks; it does the same transformation for every backend that requires it.
	- As a result it is plain unsound, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331).
	- Because it produces IR-level fences, it cannot be made sound! This is noted in the C++11 standard (section 29.3, page 1140):
	```
	Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics.
	```
	It can also be seen by the following example (called IRIW in the literature):
	```
	atomic<int> x = y = 0;
	int r1, r2, r3, r4;
	Thread 0: x.store(1);
	Thread 1: y.store(1);
	Thread 2: r1 = x.load(); r2 = y.load();
	Thread 3: r3 = y.load(); r4 = x.load();
	```
	r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.
	This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that):
	- it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences, which mimics the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk).
	- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (which exactly replicates the logic of SelectionDAGBuilder, so no functional change).
	- it finally erases this logic from SelectionDAGBuilder, as it is dead code.
	Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more aggressively on these platforms, the long comment should make it clear why they should first override emitLeading/TrailingFence.
	Test Plan: make check-all, no functional change
	Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5474 llvm-svn: 219957
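The "fences + monotonic accesses" shape described in r219957 can be illustrated at the C++ source level, where LLVM's monotonic ordering corresponds to `std::memory_order_relaxed`. This is only a sketch: the function names are hypothetical, and the choice to bracket the store with two fences and follow the load with one is an assumption made for illustration, not a transcription of emitLeadingFence/emitTrailingFence. As the commit message stresses, this expansion is not equivalent to true seq_cst accesses in general (IRIW is the counterexample), so the single-threaded check below only shows that values still round-trip.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a "seq_cst" store expressed as
// leading fence + monotonic (relaxed) store + trailing fence.
void store_via_fences(std::atomic<int>& a, int value) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // leading fence
    a.store(value, std::memory_order_relaxed);            // monotonic access
    std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing fence
}

// Hypothetical helper: a "seq_cst" load expressed as
// monotonic (relaxed) load + trailing fence.
int load_via_fences(const std::atomic<int>& a) {
    int value = a.load(std::memory_order_relaxed);        // monotonic access
    std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing fence
    return value;
}

// Single-threaded sanity check: the expansion still stores and loads values.
// It says nothing about cross-thread ordering guarantees.
void check_fence_expansion() {
    std::atomic<int> x{0};
    store_via_fences(x, 1);
    assert(load_via_fences(x) == 1);
}
```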
*	R600/SI: Remove unnecessary VALU patterns	Matt Arsenault	2014-10-16	1	-41/+0
\| \| \| \| \| \| \| \|	These haven't been necessary since allowing selecting SALU instructions in non-entry blocks was enabled. llvm-svn: 219956
*	[SROA] Start more deeply moving SROA to use ranges rather than just	Chandler Carruth	2014-10-16	1	-45/+42
\| \| \| \| \| \| \| \| \| \| \| \|	iterators. There are a ton of places where it essentially wants ranges rather than just iterators. This is just the first step that adds the core slice range typedefs and uses them in a couple of places. I still have to explicitly construct them because they've not been punched throughout the entire set of code. More range-based cleanups incoming. llvm-svn: 219955
*	R600: Fix nonsensical implementation of computeKnownBits for BFE	Matt Arsenault	2014-10-16	1	-5/+1
\| \| \| \| \| \|	This was resulting in invalid simplifications of sdiv llvm-svn: 219953
*	Delete -std-compile-opts.	Rafael Espindola	2014-10-16	1	-22/+22
\| \| \| \| \| \|	These days -std-compile-opts was just a silly alias for -O3. llvm-svn: 219951
*	Allow call-slot optimization for destinations with a suitable dereferenceable attribute	Bjorn Steinbrink	2014-10-16	1	-14/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently, call slot optimization requires that if the destination is an argument, the argument has the sret attribute. This is to ensure that the memory access won't trap. In addition to sret, we can also allow the optimization to happen for arguments that have the new dereferenceable attribute, which gives the same guarantee. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5832 llvm-svn: 219950
*	fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)	Sanjay Patel	2014-10-16	1	-1/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a square root call has an FP multiplication argument that can be reassociated, then we can hoist a repeated factor out of the square root call and into a fabs(). In the simplest case, this: y = sqrt(x * x); becomes this: y = fabs(x); This patch relies on an earlier optimization in instcombine or reassociate to put the multiplication tree into a canonical form, so we don't have to search over every permutation of the multiplication tree. Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to use function-level attributes to do this optimization. This needs to be fixed for both the intrinsics and in the backend. Differential Revision: http://reviews.llvm.org/D5787 llvm-svn: 219944
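A small numeric sketch of the fold in r219944 may help show both why it is valid algebraically and why it needs fast-math style permission. The function names are hypothetical, and the overflow inputs assume IEEE-754 doubles evaluated in double precision (for example x86-64 with SSE).

```cpp
#include <cassert>
#include <cmath>

// The expression as written in the source program.
double sqrt_unfolded(double x, double y) { return std::sqrt(x * x * y); }

// The expression after hoisting the repeated factor out of the square root.
double sqrt_folded(double x, double y) { return std::fabs(x) * std::sqrt(y); }

void check_sqrt_fold() {
    // For ordinary inputs the two forms agree: sqrt(9 * 4) == 3 * 2.
    assert(sqrt_unfolded(-3.0, 4.0) == 6.0);
    assert(sqrt_folded(-3.0, 4.0) == 6.0);

    // They can differ when the intermediate product overflows: x * x is
    // infinite here, while fabs(x) * sqrt(y) stays finite. This is why the
    // fold is only done when reassociation is allowed.
    assert(std::isinf(sqrt_unfolded(1e200, 1e-300)));
    assert(std::isfinite(sqrt_folded(1e200, 1e-300)));
}
```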
*	[AArch64] Fix miscompile of sdiv-by-power-of-2.	Juergen Ributzka	2014-10-16	2	-4/+3
\| \| \| \| \| \| \| \| \| \| \|	When the constant divisor was larger than 32 bits, the optimized code generated for the AArch64 backend would be wrong, because the shift was defined as a shift of a 32-bit constant '(1<<Lg2(divisor))' and we would lose the upper 32 bits. This fixes rdar://problem/18678801. llvm-svn: 219934
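The effect of building the power-of-two constant in 32 bits can be sketched with the usual bias-and-shift lowering of signed division. This is an illustrative model, not the backend's code: the function names are hypothetical, the truncation is simulated with an explicit cast, and the code assumes arithmetic right shift and modular unsigned-to-signed conversion for negative values (guaranteed from C++20, and what mainstream compilers do before that).

```cpp
#include <cassert>
#include <cstdint>

// Signed division by 2^lg2, rounding toward zero: add (2^lg2 - 1) to
// negative dividends, then shift right arithmetically.
int64_t sdiv_pow2(int64_t x, unsigned lg2) {
    uint64_t pow2 = uint64_t{1} << lg2;  // constant built in 64 bits
    int64_t bias = x < 0 ? static_cast<int64_t>(pow2 - 1) : 0;
    return (x + bias) >> lg2;
}

// Same lowering, but the constant is truncated to 32 bits first, which
// models losing the upper 32 bits of (1 << lg2).
int64_t sdiv_pow2_truncated(int64_t x, unsigned lg2) {
    uint64_t pow2 = static_cast<uint32_t>(uint64_t{1} << lg2);  // 0 if lg2 >= 32
    int64_t bias = x < 0 ? static_cast<int64_t>(pow2 - 1) : 0;
    return (x + bias) >> lg2;
}

void check_sdiv_pow2() {
    const int64_t big = int64_t{1} << 40;
    // Small divisors are unaffected by the truncation.
    assert(sdiv_pow2(-9, 3) == -9 / 8);
    assert(sdiv_pow2_truncated(-9, 3) == -9 / 8);
    // A divisor wider than 32 bits exposes the difference.
    assert(sdiv_pow2(-1, 40) == -1 / big);            // 0, as expected
    assert(sdiv_pow2_truncated(-1, 40) != -1 / big);  // wrong result
}
```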
*	[mips] Account for endianness when expanding BuildPairF64/ExtractElementF64 ↵	Vasileios Kalintiris	2014-10-16	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nodes. Summary: In order to support big endian targets for the BuildPairF64 nodes we just need to swap the low/high pair registers. Additionally, for the ExtractElementF64 nodes we have to calculate the correct stack offset with respect to the node's register/operand that we want to extract. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5753 llvm-svn: 219931
*	[mips] Marked the DI/EI instruction aliases as MIPS32r2	Vasileios Kalintiris	2014-10-16	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5751 llvm-svn: 219927
*	Test commit access: remove extra new line at the end of file	Vasileios Kalintiris	2014-10-16	1	-1/+0
\| \| \| \|	llvm-svn: 219925
*	Reapply r219832 - InstCombine: Narrow switch instructions using known bits.	Akira Hatanaka	2014-10-16	1	-0/+31
\| \| \| \| \| \| \|	The code committed in r219832 asserted when it attempted to shrink a switch statement whose type was larger than 64-bit. llvm-svn: 219902
*	TRE: make TRE a bit more aggressive	Saleem Abdulrasool	2014-10-16	1	-8/+2
\| \| \| \| \| \| \| \| \|	Make tail recursion elimination a bit more aggressive. This allows us to get tail recursion on functions that are just branches to a different function. The fact that the function takes a byval argument does not restrict it from being optimised into just a tail call. llvm-svn: 219899
*	Revert r219832.	Akira Hatanaka	2014-10-16	1	-31/+0
\| \| \| \|	llvm-svn: 219884
*	[LVI] Add some additional comments about caching and context instructions	Hal Finkel	2014-10-16	1	-0/+13
\| \| \| \| \| \| \| \|	Philip Reames and I had a long conversation about this, mostly because it is not obvious why the current logic is correct. Hopefully, these comments will prevent such confusion in the future. llvm-svn: 219882
*	R600: Remove dead function	Matt Arsenault	2014-10-16	2	-15/+0
\| \| \| \|	llvm-svn: 219879
*	Revert "r219834 - Teach ScalarEvolution to sharpen range information"	Sanjoy Das	2014-10-15	1	-38/+0
\| \| \| \| \| \| \|	This change breaks the asan buildbots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13468 llvm-svn: 219878
*	Preserve non-byval pointer alignment attributes using @llvm.assume when inlining	Hal Finkel	2014-10-15	1	-0/+45
\| \| \| \| \| \| \| \| \|	For pointer-typed function arguments, enhanced alignment can be asserted using the 'align' attribute. When inlining, if this enhanced alignment information is not otherwise available, preserve it using @llvm.assume-based alignment assumptions. llvm-svn: 219876
*	Add CreateAlignmentAssumption to IRBuilder	Hal Finkel	2014-10-15	1	-0/+11
\| \| \| \| \| \| \| \| \| \|	Clang CodeGen had a utility function for creating pointer alignment assumptions using the @llvm.assume intrinsic. This functionality will also be needed by the inliner (to preserve function-argument alignment attributes when inlining), so this moves the utility function into IRBuilder where it can be used both by Clang CodeGen and also other LLVM-level code. llvm-svn: 219875
*	[AVX512] Add DQ subvector inserts	Adam Nemet	2014-10-15	2	-11/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4 respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative VTs). Since DQ has native support for these instructions, I peeled off the non-"Alt" part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ. Fixes <rdar://problem/18426089> llvm-svn: 219874
*	[AVX512] Two new attributes in X86VectorVTInfo for subvector insert	Adam Nemet	2014-10-15	2	-4/+14
\| \| \| \| \| \| \| \| \|	The new attributes are NumElts and the CD8TupleForm. This prepares the code to enable x8 and x2 inserts. NFC, no change in X86.td.expanded except for the new attributes. llvm-svn: 219871
*	[AVX512] Rename arg from Opcode32/64 to Opcode128/256 in vinsert_for_size	Adam Nemet	2014-10-15	1	-4/+4
\| \| \| \| \| \| \|	It's the W bit that selects between 32 or 64 elt type and not the opcode. The opcode selects between the width of the insert (128 or 256). llvm-svn: 219870
*	R600: Remove unnecessary part of computeKnownBitsForTargetNode	Matt Arsenault	2014-10-15	1	-5/+0
\| \| \| \| \| \| \|	Zero-width BFEs are combined away already, so there's no point in handling them. llvm-svn: 219868
*	Move variable down to use	Matt Arsenault	2014-10-15	1	-4/+4
\| \| \| \|	llvm-svn: 219867
*	Add MachOObjectFile::getUuid()	Alexander Potapenko	2014-10-15	1	-1/+10
\| \| \| \| \| \| \|	This CL introduces MachOObjectFile::getUuid(). This function returns an ArrayRef to the object file's UUID, or an empty ArrayRef if the object file doesn't contain an LC_UUID load command. The new function will be used by llvm-symbolizer. llvm-svn: 219866
*	Fixing the build failure due to compiler warnings and unnecessary ↵	Chris Bieneman	2014-10-15	1	-3/+2
\| \| \| \| \| \|	disambiguation. llvm-svn: 219861
*	Defining a new API for debug options that doesn't rely on static global ↵	Chris Bieneman	2014-10-15	4	-12/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cl::opts. Summary: This is based on the discussions from the LLVMDev thread: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075886.html Reviewers: chandlerc Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5389 llvm-svn: 219854
*	R600/SI: Fix bug where immediates were being used in DS addr operands	Tom Stellard	2014-10-15	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SelectDS1Addr1Offset complex pattern always tries to store constant lds pointers in the offset operand and store a zero value in the addr operand. Since the addr operand does not accept immediates, the zero value needs to first be copied to a register. This newly created zero value will not go through normal instruction selection, so we need to manually insert a V_MOV_B32_e32 in the complex pattern. This bug was hidden by the fact that if there was another zero value in the DAG that had not been selected yet, then the CSE done by the DAG would use the unselected node for the addr operand rather than the one that was just created. This would lead to the zero value being selected and the DAG automatically inserting a V_MOV_B32_e32 instruction. llvm-svn: 219848
*	Avoid caching the MachineFunction, we don't use it outside of	Eric Christopher	2014-10-15	1	-9/+7
\| \| \| \| \| \|	runOnMachineFunction. llvm-svn: 219847
*	Wrong attribute. LLVM_ATTRIBUTE_UNUSED not LLVM_ATTRIBUTE_USED	Sid Manning	2014-10-15	1	-1/+1
\| \| \| \| \| \| \| \| \|	This original fix for the build break was correct. LLVM_ATTRIBUTE_USED removes the warning message because it keeps the function in the object file. LLVM_ATTRIBUTE_UNUSED indicates that it may or may not be used depending on build settings. llvm-svn: 219846
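The situation behind r219833/r219837/r219846 is a helper that is only referenced from an assert, so it becomes unused (and triggers -Wunused-function) when asserts are compiled out. A minimal sketch follows; the macro and function names are hypothetical stand-ins, and the attribute spelling assumes GCC or Clang.

```cpp
#include <cassert>

// Stand-in for LLVM_ATTRIBUTE_UNUSED (assumed GCC/Clang spelling): tells the
// compiler the function may legitimately go unused, so no warning is issued.
// A "used" attribute would instead force the function to be emitted into the
// object file, which also hides the warning but for the wrong reason.
#define SKETCH_ATTRIBUTE_UNUSED __attribute__((__unused__))

// Only referenced from the assert below; unused when NDEBUG is defined.
SKETCH_ATTRIBUTE_UNUSED static bool is_ordered(int lo, int hi) {
    return lo <= hi;
}

int interval_length(int lo, int hi) {
    assert(is_ordered(lo, hi) && "bounds must be ordered");
    return hi - lo;
}
```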
*	IR: Move NumOperands from User to Value, NFC	Duncan P. N. Exon Smith	2014-10-15	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Store `User::NumOperands` (and `MDNode::NumOperands`) in `Value`. On 64-bit host architectures, this reduces `sizeof(User)` and all subclasses by 8, and has no effect on `sizeof(Value)` (or, incidentally, on `sizeof(MDNode)`). On 32-bit host architectures, this increases `sizeof(Value)` by 4. However, it has no effect on `sizeof(User)` and `sizeof(MDNode)`, so the only concrete subclasses of `Value` that actually see the increase are `BasicBlock`, `Argument`, `InlineAsm`, and `MDString`. Moreover, I'll be shocked and confused if this causes a tangible memory regression. This has no functionality change (other than memory footprint). llvm-svn: 219845
*	IR: Cleanup comments for Value, User, and MDNode	Duncan P. N. Exon Smith	2014-10-15	3	-43/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	A follow-up commit will modify the memory-layout of `Value`, `User`, and `MDNode`. First fix the comments to be doxygen-friendly (and to follow the coding standards). - Use "\brief" instead of "repeatedName -". - Add a brief intro where it was missing. - Remove duplicated comments from source files (and a couple of noisy/trivial comments altogether). llvm-svn: 219844
*	Wrong attribute. LLVM_ATTRIBUTE_USED not LLVM_ATTRIBUTE_UNUSED	Sid Manning	2014-10-15	1	-1/+1
\| \| \| \|	llvm-svn: 219837
*	Allow forward references to section symbols.	Rafael Espindola	2014-10-15	1	-1/+8
\| \| \| \|	llvm-svn: 219835
*	Teach ScalarEvolution to sharpen range information.	Sanjoy Das	2014-10-15	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \|	If x is known to have the range [a, b) in a loop predicated by (icmp ne x, a), its range can be sharpened to [a + 1, b). Get ScalarEvolution and hence IndVars to exploit this fact. This change triggers an optimization to widen-loop-comp.ll, so it had to be edited to get it to pass. phabricator: http://reviews.llvm.org/D5639 llvm-svn: 219834
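The sharpening rule in r219834 can be written down as a tiny function over half-open intervals. This is a sketch under simplifying assumptions: the type and function names are hypothetical, and wrap-around ranges, which a real range representation has to handle, are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Half-open interval [lo, hi); wrap-around is deliberately not modelled.
struct HalfOpenRange {
    int64_t lo;
    int64_t hi;
};

// If x is in [a, b) and the guard guarantees x != a, then x is in [a + 1, b).
// Any other excluded value leaves the interval unchanged.
HalfOpenRange sharpen_on_not_equal(HalfOpenRange r, int64_t excluded) {
    if (r.lo < r.hi && excluded == r.lo)
        return {r.lo + 1, r.hi};
    return r;
}

void check_sharpen() {
    HalfOpenRange sharpened = sharpen_on_not_equal({0, 10}, 0);
    assert(sharpened.lo == 1 && sharpened.hi == 10);

    HalfOpenRange unchanged = sharpen_on_not_equal({0, 10}, 5);
    assert(unchanged.lo == 0 && unchanged.hi == 10);
}
```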
*	Add LLVM_ATTRIBUTE_UNUSED to function currently just used in an assert	Sid Manning	2014-10-15	1	-0/+2
\| \| \| \| \| \|	Fixes break when -Wunused-function is used. llvm-svn: 219833
*	InstCombine: Narrow switch instructions using known bits.	Akira Hatanaka	2014-10-15	1	-0/+31
\| \| \| \| \| \| \| \| \|	Truncate the operands of a switch instruction to a narrower type if the upper bits are known to be all ones or zeros. rdar://problem/17720004 llvm-svn: 219832
*	Reapply "[FastISel][AArch64] Add custom lowering for GEPs."	Juergen Ributzka	2014-10-15	1	-0/+76
\| \| \| \| \| \| \| \| \| \| \|	This is mostly a copy of the existing FastISel GEP code, but we have to duplicate it for AArch64, because otherwise we would bail out even for simple cases. This is because the standard fastEmit functions don't cover MUL at all and ADD is lowered very inefficiently. The original commit had a bug in the add emit logic, which has been fixed. llvm-svn: 219831
*	[FastISel][AArch64] Factor out add with immediate emission into a helper ↵	Juergen Ributzka	2014-10-15	1	-13/+28
\| \| \| \| \| \| \| \|	function. NFC. Simplify add with immediate emission by factoring it out into a helper function. llvm-svn: 219830
*	Correctly handle references to section symbols.	Rafael Espindola	2014-10-15	3	-50/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When processing assembly like .long .text we were creating a new undefined symbol .text. GAS on the other hand would handle that as a reference to the .text section. This patch implements that by creating the section symbols earlier so that they are visible during asm parsing. The patch also updates llvm-readobj to print the symbol number in the relocation dump so that the test can differentiate between two sections with the same name. llvm-svn: 219829
*	Enable the instruction printer in HexagonMCTargetDesc	Sid Manning	2014-10-15	4	-4/+64
\| \| \| \| \| \| \| \| \| \|	This adds the MCInstPrinter to the LLVMHexagonDesc library and removes the dependency LLVMHexagonAsmPrinter had on LLVMHexagonDesc. This is a prerequisite needed by the disassembler. Phabricator Revision: http://reviews.llvm.org/D5734 llvm-svn: 219826
*	R600/SI: Also try to use 0 base for misaligned 8-byte DS loads.	Matt Arsenault	2014-10-15	1	-0/+17
\| \| \| \|	llvm-svn: 219823
*	R600: Fix miscompiles when BFE has multiple uses	Matt Arsenault	2014-10-15	1	-7/+10
\| \| \| \| \| \|	SimplifyDemandedBits would break the other uses of the operand. llvm-svn: 219819
*	correct const-ness with auto and dyn_cast	Sanjay Patel	2014-10-15	1	-3/+3
\| \| \| \| \| \| \| \| \|	1. Use const with autos. 2. Don't bother with explicit const in cast ops because they do it automagically. Thanks, David B. / Aaron B. / Reid K. llvm-svn: 219817
*	[SLPVectorize] Basic ephemeral-value awareness	Hal Finkel	2014-10-15	1	-3/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SLP vectorizer should not vectorize ephemeral values. These are used to express information to the optimizer, and vectorizing them does not lead to faster code (because the ephemeral values are dropped prior to code generation, vectorized or not), and obscures the information the instructions are attempting to communicate (the logic that interprets the arguments to @llvm.assume generically does not understand vectorized conditions). Also, uses by ephemeral values are free (because they, and the necessary extractelement instructions, will be dropped prior to code generation). llvm-svn: 219816
*	Treat the WorkSet used to find ephemeral values as double-ended	Hal Finkel	2014-10-15	1	-1/+3
	We need to make sure that we visit all operands of an instruction before moving deeper in the operand graph. We had been pushing operands onto the back of the work set, and popping them off the back as well, meaning that we might visit an instruction before visiting all of its uses that sit in between it and the call to @llvm.assume. To provide an explicit example, given the following:
	  %q0 = extractelement <4 x float> %rd, i32 0
	  %q1 = extractelement <4 x float> %rd, i32 1
	  %q2 = extractelement <4 x float> %rd, i32 2
	  %q3 = extractelement <4 x float> %rd, i32 3
	  %q4 = fadd float %q0, %q1
	  %q5 = fadd float %q2, %q3
	  %q6 = fadd float %q4, %q5
	  %qi = fcmp olt float %q6, %q5
	  call void @llvm.assume(i1 %qi)
	%q5 is used by both %qi and %q6. When we visit %qi, it will be marked as ephemeral, and we'll queue %q6 and %q5. %q6 will be marked as ephemeral and we'll queue %q4 and %q5. Under the old system, we'd then visit %q4, which would become ephemeral, %q1 and then %q0, which would become ephemeral as well, and now we have a problem. We'd visit %rd, but it would not be marked as ephemeral because we've not yet visited %q2 and %q3 (because we've not yet visited %q5). This will be covered by a test case in a follow-up commit that enables ephemeral-value awareness in the SLP vectorizer. llvm-svn: 219815
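The difference between popping from the back and popping from the front of the work set in r219815 can be reproduced with a small model of the example above. This is a sketch, not LLVM's implementation: values are plain strings, the graph and function names are hypothetical, and the exact visiting order in the back-popping mode depends on the order in which operands are pushed, so it need not match the commit message step by step. The outcome is the same, though: with back-popping, %rd is visited too early and never marked.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each value to its operands (the values it reads).
using OperandGraph = std::map<std::string, std::vector<std::string>>;

// The IR from the commit message, with "assume" standing for the call.
OperandGraph example_graph() {
    return {
        {"rd", {}},
        {"q0", {"rd"}}, {"q1", {"rd"}}, {"q2", {"rd"}}, {"q3", {"rd"}},
        {"q4", {"q0", "q1"}}, {"q5", {"q2", "q3"}},
        {"q6", {"q4", "q5"}}, {"qi", {"q6", "q5"}},
        {"assume", {"qi"}},
    };
}

// A value is ephemeral if all of its users are ephemeral. Each value is
// examined once; pop_front selects the double-ended (fixed) behaviour.
std::set<std::string> collect_ephemeral(const OperandGraph& operands,
                                        const std::string& root,
                                        bool pop_front) {
    // Invert the operand edges to find each value's users.
    std::map<std::string, std::vector<std::string>> users;
    for (const auto& entry : operands)
        for (const auto& op : entry.second)
            users[op].push_back(entry.first);

    std::set<std::string> ephemeral, visited;
    std::deque<std::string> work{root};
    while (!work.empty()) {
        std::string v = pop_front ? work.front() : work.back();
        if (pop_front) work.pop_front(); else work.pop_back();
        if (!visited.insert(v).second) continue;  // already examined

        bool all_users_ephemeral = true;
        for (const auto& user : users[v])
            if (!ephemeral.count(user)) { all_users_ephemeral = false; break; }
        if (!all_users_ephemeral) continue;

        ephemeral.insert(v);
        for (const auto& op : operands.at(v)) work.push_back(op);
    }
    return ephemeral;
}

void check_ephemeral_order() {
    // Front-popping reaches every value, including rd.
    assert(collect_ephemeral(example_graph(), "assume", true).size() == 10);
    // Back-popping examines rd before all of its users are marked.
    assert(collect_ephemeral(example_graph(), "assume", false).count("rd") == 0);
}
```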