bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[mips] Define patterns for the atomic_{load,store}_{8,16,32,64} nodes.	Vasileios Kalintiris	2015-11-06	4	-32/+110
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without these patterns we would generate a complete LL/SC sequence. This would be problematic for memory regions marked as WRITE-only or READ-only, as the instructions LL/SC would read/write to the protected memory regions correspondingly. Reviewers: dsanders Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D14397 llvm-svn: 252293
*	AMDGPU/SI: Emit HSA kernels with symbol type STT_AMDGPU_HSA_KERNEL	Tom Stellard	2015-11-06	2	-4/+24
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13804 llvm-svn: 252291
*	Add a new attribute: norecurse	James Molloy	2015-11-06	8	-12/+21
\| \| \| \| \| \|	This attribute allows the compiler to assume that the function never recurses into itself, either directly or indirectly (transitively). This can be used among other things to demote global variables to locals. llvm-svn: 252282
*	Revert r252249 (and r252255, r252258), "[WinEH] Clone funclets with multiple parents"	NAKAMURA Takumi	2015-11-06	4	-1690/+10
\| \| \| \| \| \| \| \|	It behaved flaky due to iterating pointer key values on std::set and std::map. llvm-svn: 252279
*	Temporarily disable flaky checks in wineh-multi-parent-cloning.	Andrew Kaylor	2015-11-06	1	-4/+8
\| \| \| \|	llvm-svn: 252258
*	[WinEH] Clone funclets with multiple parents	Andrew Kaylor	2015-11-06	4	-10/+1686
\| \| \| \| \| \| \| \| \| \|	Windows EH funclets need to always return to a single parent funclet. However, it is possible for earlier optimizations to combine funclets (probably based on one funclet having an unreachable terminator) in such a way that this condition is violated. These changes add code to the WinEHPrepare pass to detect situations where a funclet has multiple parents and clone such funclets, fixing up the unwind and catch return edges so that each copy of the funclet returns to the correct parent funclet. Differential Revision: http://reviews.llvm.org/D13274?id=39098 llvm-svn: 252249
*	[bugpoint] Add a named metadata (+their operands) reducer	Keno Fischer	2015-11-06	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We frequently run bugpoint on a linked module that consists of all modules we create while jitting the julia standard library. This module has a very large number of compile units (10000+) in `llvm.dbg.cu`, which didn't get reduced at all, requiring manual post processing. This is an attempt to have bugpoint go through and attempt to reduce the number of global named metadata nodes as well as their operands, to cut down the number of roots for such metadata. Reviewers: dexonsmith, reames, pete Subscribers: pete, dexonsmith, reames, llvm-commits Differential Revision: http://reviews.llvm.org/D14043 llvm-svn: 252247
*	Re-apply r251050 with a fix for PR25421	Sanjoy Das	2015-11-05	2	-0/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The bug: I missed adding break statements in the switch / case. Original commit message: [SCEV] Teach SCEV some axioms about non-wrapping arithmetic Summary: - A s< (A + C)<nsw> if C > 0 - A s<= (A + C)<nsw> if C >= 0 - (A + C)<nsw> s< A if C < 0 - (A + C)<nsw> s<= A if C <= 0 Right now `C` needs to be a constant, but we can later generalize it to be a non-constant if needed. Reviewers: atrick, hfinkel, reames, nlewycky Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D13686 llvm-svn: 252236
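The four axioms listed in this message can be sanity-checked by brute force on a narrow type. The sketch below is not SCEV code; it is a standalone check over all 8-bit signed values. The helper names `add_is_nsw` and `scev_nsw_axioms_hold` are hypothetical, and "nsw" is modeled here as "the mathematically exact sum fits in int8_t".

```cpp
#include <cassert>
#include <cstdint>

// True if a + c fits in int8_t, i.e. the 8-bit add would not signed-wrap.
// (Assumption: this is how <nsw> is modeled for the purpose of this check.)
inline bool add_is_nsw(int a, int c) {
    int sum = a + c;  // computed in int, so it cannot overflow here
    return sum >= INT8_MIN && sum <= INT8_MAX;
}

// Exhaustively checks the four axioms from the commit message for every
// 8-bit A and C whose sum does not wrap.
inline bool scev_nsw_axioms_hold() {
    for (int a = INT8_MIN; a <= INT8_MAX; ++a) {
        for (int c = INT8_MIN; c <= INT8_MAX; ++c) {
            if (!add_is_nsw(a, c)) continue;  // axioms only apply under <nsw>
            int sum = a + c;
            if (c > 0 && !(a < sum)) return false;    // A s<  (A + C)<nsw>
            if (c >= 0 && !(a <= sum)) return false;  // A s<= (A + C)<nsw>
            if (c < 0 && !(sum < a)) return false;    // (A + C)<nsw> s<  A
            if (c <= 0 && !(sum <= a)) return false;  // (A + C)<nsw> s<= A
        }
    }
    return true;
}
```

Without the no-wrap precondition the axioms fail: for example A = 127, C = 1 wraps in 8 bits, which is why `add_is_nsw(127, 1)` is false and that pair is skipped.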
*	Revert r251050 to fix miscompile when running Clang -O1	Richard Trieu	2015-11-05	1	-58/+0
\| \| \| \| \| \| \|	See bug for details: https://llvm.org/bugs/show_bug.cgi?id=25421 Some comparisons were incorrectly replaced with a constant value. llvm-svn: 252231
*	DI: Reverse direction of subprogram -> function edge.	Peter Collingbourne	2015-11-05	370	-1125/+1143
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO. This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field. Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR. Differential Revision: http://reviews.llvm.org/D14265 llvm-svn: 252219
*	Remove windows line endings introduced by r252177. NFC.	Tim Northover	2015-11-05	4	-172/+172
\| \| \| \|	llvm-svn: 252217
*	[ASan] Disable instrumentation for inalloca variables.	Alexey Samsonov	2015-11-05	1	-0/+16
\| \| \| \| \| \| \| \|	inalloca variables were not treated as static allocas, therefore didn't participate in regular stack instrumentation. We don't want them to participate in dynamic alloca instrumentation as well. llvm-svn: 252213
*	[WinEH] Fix funclet prologues with stack realignment	Reid Kleckner	2015-11-05	1	-0/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already had a test for this for 32-bit SEH catchpads, but those don't actually create funclets. We had a bug that only appeared in funclet prologues, where we would establish EBP and ESI as our FP and BP, and then downstream prologue code would overwrite them. While I was at it, I fixed Win64+funclets+stackrealign. This issue doesn't come up as often there due to the ABI requiring 16 byte stack alignment, but now we can rest easy that AVX and WinEH will work well together =P. llvm-svn: 252210
*	[WebAssembly] Update wasm builtin functions to match spec changes.	Dan Gohman	2015-11-05	2	-34/+10
\| \| \| \| \| \| \|	The page_size operator has been removed from the spec, and the resize_memory operator has been changed to grow_memory. llvm-svn: 252202
*	Reapply r250906 with many suggested updates from Rafael Espindola.	Kevin Enderby	2015-11-05	4	-0/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The matching lld changes will be submitted immediately after this one; until then, this revision alone will cause lld failures, which is expected. This removes the eating of the error in Archive::Child::getSize() when the characters in the size field in the archive header for the member are not a number. To do this we have all of the needed methods return ErrorOr to push the errors up until we get out of lib. Then the tools can handle the error in whatever way is appropriate for that tool. So the solution is to plumb all the ErrorOr stuff through everything that touches archives. This includes its iterators, as one can create an Archive object but the first or any other Child object may fail to be created due to a bad size field in its header. Thanks to Lang Hames for the changes making child_iterator contain an ErrorOr<Child> instead of a Child and the needed changes to ErrorOr.h to add operator overloading for * and -> . We don’t want to use llvm_unreachable() as it calls abort() and produces a “crash”, and using report_fatal_error() to move the error checking will cause the program to stop; neither of these is really correct in library code. There are still some uses of these that should be cleaned up in this library code for other than the size field. The test cases use archives with text files so one can see the non-digit character, in this case a ‘%’, in the size field. llvm-svn: 252192
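The idea of returning an error to the caller instead of aborting inside the library can be sketched as follows. This is not LLVM's `ErrorOr` or `Archive::Child::getSize()`; `SizeOrError` and `parse_size_field` are hypothetical stand-ins, and the only format assumption is that the size field holds decimal digits padded with trailing spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <system_error>

// Hypothetical stand-in for an ErrorOr<uint64_t>-style result:
// either an error code or a parsed value.
struct SizeOrError {
    std::error_code ec;
    std::uint64_t value;
    explicit operator bool() const { return !ec; }  // true on success
};

// Parses a space-padded decimal size field. On a non-digit character
// (such as '%') it reports an error to the caller rather than aborting.
// Note: overflow is not checked in this sketch.
inline SizeOrError parse_size_field(const std::string& field) {
    const std::error_code bad = std::make_error_code(std::errc::invalid_argument);
    std::size_t last = field.find_last_not_of(' ');
    if (last == std::string::npos) return {bad, 0};  // empty or all spaces
    std::uint64_t v = 0;
    for (std::size_t i = 0; i <= last; ++i) {
        char ch = field[i];
        if (ch < '0' || ch > '9') return {bad, 0};
        v = v * 10 + static_cast<std::uint64_t>(ch - '0');
    }
    return {std::error_code(), v};
}
```

A tool calling `parse_size_field` can then decide for itself whether a bad field is fatal, which is the division of responsibility the commit message argues for.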
*	[DebugInfo] Fix ARM/AArch64 prologue_end position. Related to D11268.	Oleg Ranevskyy	2015-11-05	4	-0/+172
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This review is related to another review request http://reviews.llvm.org/D11268, does the same and merely fixes a couple of issues with it. D11268 is quite old and has merge conflicts against the current trunk. This request - rebases D11268 onto the new trunk; - resolves the merge conflicts; - fixes the prologue_end tests, which do not pass due to the subprogram definitions not marked as distinct. Reviewers: echristo, rengolin, kubabrecka Subscribers: aemerson, rengolin, jyknight, dsanders, llvm-commits, asl Differential Revision: http://reviews.llvm.org/D14338 llvm-svn: 252177
*	Add cfi instr for CFA calculation when movpc is expanded to call and pop	Petar Jovanovic	2015-11-05	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes the issue of wrong CFA calculation in the following case: 0x08048400 <+0>: push %ebx 0x08048401 <+1>: sub $0x8,%esp 0x08048404 <+4>: call 0x8048409 <test+9> 0x08048409 <+9>: pop %eax 0x0804840a <+10>: add $0x1bf7,%eax 0x08048410 <+16>: mov %eax,%ebx 0x08048412 <+18>: call 0x80483f0 <bar> 0x08048417 <+23>: add $0x8,%esp 0x0804841a <+26>: pop %ebx 0x0804841b <+27>: ret The call at <+4> and the pop at <+9> (highlighted in the original) are a product of the movpc instruction. The call instruction changes the stack pointer, and the pop instruction restores its value. However, the rule for computing CFA is not updated and is wrong on the pop instruction. So, e.g. backtrace in gdb does not work when on the pop instruction. This adds cfi instructions for both call and pop instructions. The cfi_adjust_cfa_offset instruction is used with the appropriate offset for setting the rules to calculate CFA correctly. Patch by Violeta Vukobrat. Differential Revision: http://reviews.llvm.org/D14021 llvm-svn: 252176
*	[WebAssembly] Rename ior operator to or to match the spec	Derek Schuff	2015-11-05	4	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: The spec uses "or" for inclusive-or and "xor" for exclusive-or Reviewers: sunfish Subscribers: jfb, llvm-commits, dschuff Differential Revision: http://reviews.llvm.org/D14362 llvm-svn: 252174
*	revert rev. 252153 due to build failure on ubuntu	Asaf Badouh	2015-11-05	2	-202/+0
\| \| \| \| \| \|	[X86][AVX512] add comi with Sae llvm-svn: 252154
*	[X86][AVX512] add comi with Sae	Asaf Badouh	2015-11-05	2	-0/+202
\| \| \| \| \| \| \| \|	add builtin_ia32_vcomisd and builtin_ia32_vcomisd Differential Revision: http://reviews.llvm.org/D14331 llvm-svn: 252153
*	[X86][AVX512] small bugfix in VPBROADCASTM	Asaf Badouh	2015-11-05	1	-0/+15
\| \| \| \| \| \| \| \|	VPBROADCASTMW2D and VPBROADCASTMB2Q Differential Revision: http://reviews.llvm.org/D14335 llvm-svn: 252151
*	Fix LoopAccessAnalysis when potentially nullptr check are involved	Mehdi Amini	2015-11-05	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: GetUnderlyingObjects() can return "null" among its list of objects, we don't want to deduce that two pointers can point to the same memory in this case, so filter it out. Reviewers: anemet Subscribers: dexonsmith, llvm-commits From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 252149
*	AMDGPU: Disallow s[102:103] on VI in assembler	Matt Arsenault	2015-11-05	2	-16/+46
\| \| \| \|	llvm-svn: 252142
*	AMDGPU: Fix assert when legalizing atomic operands	Matt Arsenault	2015-11-05	1	-0/+52
\| \| \| \| \| \| \| \| \| \|	The operand layout is slightly different for the atomic opcodes from the usual MUBUF loads and stores. This should only fix it on SI/CI. VI is still broken because it still emits the addr64 replacement. llvm-svn: 252140
*	[WinEH] Fix establisher param reg in CLR funclets	Joseph Tremoulet	2015-11-05	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The CLR's personality routine passes the pointer to the establisher frame in RCX, not RDX. Reviewers: pgavlin, majnemer, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14343 llvm-svn: 252135
*	Go back to producing relocations for out of range symbols.	Rafael Espindola	2015-11-05	2	-7/+5
\| \| \| \| \| \| \| \|	This brings back the behavior from before r252090 for out of range symbols. Should bring some arm bots back. llvm-svn: 252119
*	AMDGPU: Add missing v2f64 fadd tests	Matt Arsenault	2015-11-05	1	-10/+42
\| \| \| \|	llvm-svn: 252117
*	Fix pr24832.	Rafael Espindola	2015-11-05	1	-0/+14
\| \| \| \| \| \|	It is pretty simple now that the yak is shaved. llvm-svn: 252105
*	Simplify .org processing and make it a bit more powerful.	Rafael Espindola	2015-11-04	2	-2/+5
\| \| \| \| \| \| \|	We now always create the fragment, which lets us handle things like .org after a .align. llvm-svn: 252101
*	[SimplifyLibCalls] New transformation: tan(atan(x)) -> x	Davide Italiano	2015-11-04	2	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is enabled only under -ffast-math. So, instead of emitting: 4007b0: 50 push %rax 4007b1: e8 8a fd ff ff callq 400540 <atanf@plt> 4007b6: 58 pop %rax 4007b7: e9 94 fd ff ff jmpq 400550 <tanf@plt> 4007bc: 0f 1f 40 00 nopl 0x0(%rax) for: float mytan(float x) { return tanf(atanf(x)); } we emit a single retq. Differential Revision: http://reviews.llvm.org/D14302 llvm-svn: 252098
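A small numeric sketch of why this fold is restricted to fast-math: with real numbers tan(atan(x)) is exactly x, but in floating point the two library calls each round, so the unfolded and folded forms agree only approximately in general. The helper names `mytan_folded` and `close_enough` are hypothetical and are not part of SimplifyLibCalls.

```cpp
#include <cassert>
#include <cmath>

// The function from the commit message, before the transformation.
inline float mytan(float x) { return std::tan(std::atan(x)); }

// What the function reduces to after folding tan(atan(x)) -> x.
inline float mytan_folded(float x) { return x; }

// Relative comparison, because the unfolded form accumulates rounding error.
inline bool close_enough(float a, float b, float rel_tol) {
    return std::fabs(a - b) <= rel_tol * std::fmax(1.0f, std::fabs(b));
}
```

Comparing `mytan(x)` with `mytan_folded(x)` through `close_enough` rather than `==` reflects that the fold may change the low bits of the result, which is the kind of difference fast-math permits.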
*	[CaptureTracking] Support operand bundles conservatively	Sanjoy Das	2015-11-04	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Earlier CaptureTracking would assume all "interesting" operands to a call or invoke were its arguments. With operand bundles this is no longer true. Note: an earlier change got `doesNotCapture` working correctly with operand bundles. This change uses DSE to test the changes to CaptureTracking. DSE is a vehicle for testing only, and is not directly involved in this change. Reviewers: reames, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14306 llvm-svn: 252095
*	Slightly saner handling of thumb branches.	Rafael Espindola	2015-11-04	2	-0/+27
\| \| \| \| \| \| \| \|	The generic infrastructure already did a lot of work to decide if the fixup value is known or not. It doesn't make sense to reimplement a very basic case: same fragment. llvm-svn: 252090
*	[x86] Teach the shrink-wrapping hooks to do the proper thing with Win64.	Quentin Colombet	2015-11-04	1	-0/+122
\| \| \| \| \| \| \| \| \| \|	Win64 has some strict requirements for the epilogue. As a result, we disable shrink-wrapping for Win64 unless the block that gets the epilogue is already an exit block. Fixes PR24193. llvm-svn: 252088
*	[X86][SSE] Add general memory folding for (V)INSERTPS instruction	Simon Pilgrim	2015-11-04	5	-13/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction. The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening. This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done. It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted. Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll Differential Revision: http://reviews.llvm.org/D13988 llvm-svn: 252074
*	Created new X86 FMA3 opcodes (FMA*_Int) that are used now for lowering of scalar FMA intrinsics.	Andrew Kaylor	2015-11-04	2	-248/+939
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch by Slava Klochkov. The key difference between FMA* and FMA_Int opcodes is that FMA_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result. Reviewers: Quentin Colombet, Elena Demikhovsky Differential Revision: http://reviews.llvm.org/D13710 llvm-svn: 252060
*	[ARM] Combine CMOV into BFI where possible	James Molloy	2015-11-04	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have a CMOV, OR and AND combination such as: if (x & CN) y \|= CM; And: * CN is a single bit; * All bits covered by CM are known zero in y; Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction). llvm-svn: 252057
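The equivalence behind this combine can be sketched for the single-bit case. The code below models a bit-field insert in plain C++ and checks it against the branchy form; it is an illustration of the semantics only, not the instruction sequence the patch emits, and `bfi`, `branchy` and `via_bfi` are hypothetical names. The precondition from the message applies: the CM bit must be known zero in y.

```cpp
#include <cassert>
#include <cstdint>

// Model of a bit-field insert: copies the low `width` bits of src into dst
// starting at bit `lsb`, leaving the other bits of dst unchanged.
// Assumes lsb + width <= 32.
inline std::uint32_t bfi(std::uint32_t dst, std::uint32_t src,
                         unsigned lsb, unsigned width) {
    std::uint32_t low = (width >= 32) ? ~0u : ((1u << width) - 1u);
    std::uint32_t mask = low << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

// Original form: if (x & CN) y |= CM, with CN and CM single bits.
inline std::uint32_t branchy(std::uint32_t x, std::uint32_t y,
                             unsigned cn_bit, unsigned cm_bit) {
    if (x & (1u << cn_bit)) y |= (1u << cm_bit);
    return y;
}

// Branch-free form: shift the tested bit down, then insert it at cm_bit.
// Only equivalent when bit cm_bit of y is zero on entry.
inline std::uint32_t via_bfi(std::uint32_t x, std::uint32_t y,
                             unsigned cn_bit, unsigned cm_bit) {
    return bfi(y, x >> cn_bit, cm_bit, 1);
}
```

If the CM bit of y could already be set, inserting a zero would clear it, which is why the "known zero" condition in the message is required.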
*	[ThinLTO] Always set linkage type to external when converting alias	Teresa Johnson	2015-11-04	1	-0/+11
\| \| \| \| \| \| \| \|	When converting an alias to a non-alias when the aliasee is not imported, ensure that the linkage type is set to external so that it is a valid linkage type. Added a test case that exposed this issue. llvm-svn: 252054
*	[SimplifyCFG] Merge conditional stores	James Molloy	2015-11-04	1	-0/+241
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can often end up with conditional stores that cannot be speculated. They can come from fairly simple, idiomatic code: if (c & flag1) a = x; if (c & flag2) a = y; ... There is no dominating or post-dominating store to a, so it is not legal to move the store unconditionally to the end of the sequence and cache the intermediate result in a register, as we would like to. It is, however, legal to merge the stores together and do the store once: tmp = undef; if (c & flag1) tmp = x; if (c & flag2) tmp = y; if (c & flag1 \|\| c & flag2) *a = tmp; The real power in this optimization is that it allows arbitrary length ladders such as these to be completely and trivially if-converted. The typical code I'd expect this to trigger on often uses binary-AND with constants as the condition (as in the above example), which means the ending condition can simply be truncated into a single binary-AND too: 'if (c & (flag1\|flag2))'. As in the general case there are bitwise operators here, the ladder can often be optimized further too. This optimization involves potentially increasing register pressure. Even in the simplest case, the lifetime of the first predicate is extended. This can be elided in some cases such as using binary-AND on constants, but not in the general case. Threading 'tmp' through all branches can also increase register pressure. The optimization as in this patch is enabled by default but kept in a very conservative mode. It will only optimize if it thinks the resultant code should be if-convertible, and additionally if it can thread 'tmp' through at least one existing PHI, so it will only ever in the worst case create one more PHI and extend the lifetime of a predicate. This doesn't trigger much in LNT, unfortunately, but it does trigger in a big way in a third-party test suite. llvm-svn: 252051
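The source-level shape of this transformation can be written out as two functions that must leave memory in the same state. This is a sketch of the before/after pattern from the message, not the SimplifyCFG implementation; `FLAG1`, `FLAG2` and the `run_*` helpers are hypothetical, and `tmp` is initialized to 0 where the message uses undef.

```cpp
#include <cassert>

constexpr unsigned FLAG1 = 1u;  // hypothetical flag values
constexpr unsigned FLAG2 = 2u;

// Before: two conditional stores to *a.
inline void stores_before(unsigned c, int x, int y, int* a) {
    if (c & FLAG1) *a = x;
    if (c & FLAG2) *a = y;
}

// After: the value is threaded through tmp and stored at most once.
inline void stores_merged(unsigned c, int x, int y, int* a) {
    int tmp = 0;  // stands in for undef; only read when a flag is set
    if (c & FLAG1) tmp = x;
    if (c & FLAG2) tmp = y;
    if (c & (FLAG1 | FLAG2)) *a = tmp;  // single merged store
}

// Helpers that return the final value of the stored-to location.
inline int run_before(unsigned c, int x, int y, int init) {
    int a = init;
    stores_before(c, x, y, &a);
    return a;
}
inline int run_merged(unsigned c, int x, int y, int init) {
    int a = init;
    stores_merged(c, x, y, &a);
    return a;
}
```

When neither flag is set, neither version writes to `*a`, which matters for the legality argument: the merged form never introduces a store on a path that had none.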
*	Error out when faced with value names containing '\0'	Filipe Cabecinhas	2015-11-04	2	-0/+5
\| \| \| \| \| \|	Bug found with afl-fuzz. llvm-svn: 252048
*	[ELF] elfiamcu triple should imply e_machine == EM_IAMCU	Michael Kuperstein	2015-11-04	1	-1/+4
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D14109 llvm-svn: 252043
*	[X86] DAGCombine should not introduce FILD in soft-float mode	Michael Kuperstein	2015-11-04	1	-0/+15
\| \| \| \| \| \| \|	The x86 "sitofp i64 to double" dag combine, in 32-bit mode, lowers sitofp directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode. llvm-svn: 252042
*	[CVP] Fold return values if possible	Philip Reames	2015-11-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	In my previous change to CVP (251606), I made CVP much more aggressive about trying to constant fold comparisons. This patch is a reversal in direction. Rather than being aggressive about every compare, we restore the non-block local restriction for most, and then try hard for compares feeding returns. The motivation for this is twofold: * The more I thought about it, the less comfortable I got with the possible compile time impact of the other approach. There have been no reported issues, but after talking to a couple of folks, I've come to the conclusion the time probably isn't justified. * It turns out we need to know the context to leverage the full power of LVI. In particular, asking about something at the end of its block (the use of a compare in a return) will frequently get more precise results than something in the middle of a block. This is an implementation detail, but it's also hard to get around since mid-block queries have to reason about possible throwing instructions and don't get to use most of LVI's block focused infrastructure. This will become particularly important when combined with http://reviews.llvm.org/D14263. Differential Revision: http://reviews.llvm.org/D14271 llvm-svn: 252032
*	[StatepointLowering] Remove distinction between call and invoke safepoints	Igor Laevsky	2015-11-04	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \|	There is no point in having invoke safepoints handled differently than the call safepoints. All relevant decisions could be made by looking at whether or not gc.result and gc.relocate lie in the same basic block. This change will allow us to lower call safepoints with relocates and results in different basic blocks. See test case for example. Differential Revision: http://reviews.llvm.org/D14158 llvm-svn: 252028
*	Fix the test case for Windows.	Alexey Samsonov	2015-11-04	1	-1/+1
\| \| \| \|	llvm-svn: 252027
*	[llvm-symbolizer] Improve the test for missing input file.	Alexey Samsonov	2015-11-04	1	-1/+3
\| \| \| \|	llvm-svn: 252020
*	LLE 6/6: Add LoopLoadElimination pass	Adam Nemet	2015-11-03	6	-0/+268
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The goal of this pass is to perform store-to-load forwarding across the backedge of a loop. E.g.: for (i) A[i + 1] = A[i] + B[i] => T = A[0] for (i) T = T + B[i] A[i + 1] = T The pass relies on loop dependence analysis via LoopAccessAnalysis to find opportunities of loop-carried dependences with a distance of one between a store and a load. Since it's using LoopAccessAnalysis, it was easy to also add support for versioning away may-aliasing intervening stores that would otherwise prevent this transformation. This optimization is also performed by Load-PRE in GVN without the option of multi-versioning. As was discussed with Daniel Berlin in http://reviews.llvm.org/D9548, this is inferior to a more loop-aware solution applied here. Hopefully, we will be able to remove some complexity from GVN/MemorySSA as a consequence. In the long run, we may want to extend this pass (or create a new one if there is little overlap) to also eliminate loop-independent redundant loads and stores that require versioning due to may-aliasing intervening stores/loads. I have some motivating cases for store elimination. My plan right now is to wait for MemorySSA to come online first rather than using memdep for this. The main motivation for this pass is the 456.hmmer loop in SPECint2006 where after distributing the original loop and vectorizing the top part, we are left with the critical path exposed in the bottom loop. Being able to promote the memory dependence into a register dependence (even though the HW does perform store-to-load forwarding as well) results in a major gain (~20%). This gain also transfers over to x86: it's around 8-10%. Right now the pass is off by default and can be enabled with -enable-loop-load-elim. On the LNT testsuite, there are two performance changes (negative number -> improvement): 1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the critical paths is reduced 2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this outside of LNT The pass is scheduled after the loop vectorizer (which is after loop distribution). The rationale is to try to reuse LAA state, rather than recomputing it. The order between LV and LLE is not critical because normally LV does not touch scalar st->ld forwarding cases where vectorizing would inhibit the CPU's st->ld forwarding from kicking in. LoopLoadElimination requires LAA to provide the full set of dependences (including forward dependences). LAA is known to omit loop-independent dependences in certain situations. The big comment before removeDependencesFromMultipleStores explains why this should not occur for the cases that we're interested in. Reviewers: dberlin, hfinkel Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D13259 llvm-svn: 252017
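The before/after loops from the summary can be written out as two functions and compared. This is a source-level sketch of the transformation only, under the assumption that A and B do not alias (the case the pass would otherwise handle by versioning); `loop_original` and `loop_forwarded` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: each iteration reloads A[i], which the previous
// iteration has just stored.
inline std::vector<int> loop_original(std::vector<int> A,
                                      const std::vector<int>& B) {
    for (std::size_t i = 0; i + 1 < A.size() && i < B.size(); ++i)
        A[i + 1] = A[i] + B[i];
    return A;
}

// After store-to-load forwarding across the backedge: the loop-carried
// value lives in T, so the load of A[i] inside the loop disappears.
inline std::vector<int> loop_forwarded(std::vector<int> A,
                                       const std::vector<int>& B) {
    if (A.empty()) return A;
    int T = A[0];  // initial load hoisted out of the loop
    for (std::size_t i = 0; i + 1 < A.size() && i < B.size(); ++i) {
        T = T + B[i];
        A[i + 1] = T;
    }
    return A;
}
```

Both functions take A by value, so B cannot alias the array being written; that models the no-alias condition the pass establishes with LoopAccessAnalysis run-time checks.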
*	AMDGPU: Fix asserts on invalid register ranges	Matt Arsenault	2015-11-03	3	-0/+60
\| \| \| \| \| \| \| \| \|	If the requested SGPR was not actually aligned, it was accepted and rounded down instead of rejected. Also fix an assert if the range is an invalid size. llvm-svn: 252009
*	AMDGPU: Fix off by one error in register parsing	Matt Arsenault	2015-11-03	1	-0/+14
\| \| \| \| \| \|	If trying to use one past the end, this would assert. llvm-svn: 252008
*	Address nit	Derek Schuff	2015-11-03	1	-32/+32
\| \| \| \|	llvm-svn: 252004
*	[WebAssembly] Support wasm select operator	Derek Schuff	2015-11-03	1	-0/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add support for wasm's select operator, and lower LLVM's select DAG node to it. Reviewers: sunfish Subscribers: dschuff, llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D14295 llvm-svn: 252002