bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[ARM] Combine CMOV into BFI where possible	James Molloy	2015-11-04	2	-0/+106
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have a CMOV, OR and AND combination such as: if (x & CN) y \|= CM; And: * CN is a single bit; * All bits covered by CM are known zero in y; Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction). llvm-svn: 252057
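The equivalence this commit relies on can be sketched in plain C++. This is only a model, not the ARM backend code: `condOr`, `bfi1` and `condOrAsBfi` are hypothetical names, `bfi1` imitates a one-bit-wide BFI, and the sketch assumes the two preconditions from the message (CN is a single bit, and every bit of CM is zero in y).

```cpp
#include <cstdint>

// Branchy form from the commit message: if (x & CN) y |= CM;
inline uint32_t condOr(uint32_t x, uint32_t y, uint32_t cn, uint32_t cm) {
    if (x & cn) y |= cm;
    return y;
}

// Hypothetical model of a 1-bit-wide bitfield insert:
// copy bit 0 of src into bit `pos` of y, leaving all other bits of y alone.
inline uint32_t bfi1(uint32_t y, uint32_t src, unsigned pos) {
    return (y & ~(1u << pos)) | ((src & 1u) << pos);
}

// Branch-free form: shift the tested bit of x down to bit 0, then insert it
// once per set bit of cm. Assumes cn has exactly one bit set and (y & cm) == 0.
inline uint32_t condOrAsBfi(uint32_t x, uint32_t y, uint32_t cn, uint32_t cm) {
    unsigned shift = 0;
    while (((cn >> shift) & 1u) == 0) ++shift;  // position of the single CN bit
    uint32_t bit = x >> shift;                  // tested bit now sits in bit 0
    for (unsigned pos = 0; pos < 32; ++pos)
        if ((cm >> pos) & 1u) y = bfi1(y, bit, pos);
    return y;
}
```

Each set bit of CM costs one insert in this model, which is consistent with the message's point that the transformation pays off only for small CM.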
*	[ThinLTO] Always set linkage type to external when converting alias	Teresa Johnson	2015-11-04	1	-2/+4
\| \| \| \| \| \| \| \|	When converting an alias to a non-alias when the aliasee is not imported, ensure that the linkage type is set to external so that it is a valid linkage type. Added a test case that exposed this issue. llvm-svn: 252054
*	[SimplifyCFG] Merge conditional stores	James Molloy	2015-11-04	1	-3/+312
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can often end up with conditional stores that cannot be speculated. They can come from fairly simple, idiomatic code: if (c & flag1) a = x; if (c & flag2) a = y; ... There is no dominating or post-dominating store to a, so it is not legal to move the store unconditionally to the end of the sequence and cache the intermediate result in a register, as we would like to. It is, however, legal to merge the stores together and do the store once: tmp = undef; if (c & flag1) tmp = x; if (c & flag2) tmp = y; if (c & flag1 \|\| c & flag2) *a = tmp; The real power in this optimization is that it allows arbitrary length ladders such as these to be completely and trivially if-converted. The typical code I'd expect this to trigger on often uses binary-AND with constants as the condition (as in the above example), which means the ending condition can simply be truncated into a single binary-AND too: 'if (c & (flag1\|flag2))'. As in the general case there are bitwise operators here, the ladder can often be optimized further too. This optimization involves potentially increasing register pressure. Even in the simplest case, the lifetime of the first predicate is extended. This can be elided in some cases such as using binary-AND on constants, but not in the general case. Threading 'tmp' through all branches can also increase register pressure. The optimization as in this patch is enabled by default but kept in a very conservative mode. It will only optimize if it thinks the resultant code should be if-convertible, and additionally if it can thread 'tmp' through at least one existing PHI, so it will only ever in the worst case create one more PHI and extend the lifetime of a predicate. This doesn't trigger much in LNT, unfortunately, but it does trigger in a big way in a third party test suite. llvm-svn: 252051
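A source-level sketch of the rewrite described above, written as two C++ functions that should leave `*a` in the same state. The function names are hypothetical, and `tmp` is initialised to 0 only because C++ has no `undef`; the value is never stored unless one of the conditions has set it.

```cpp
// Original ladder: two conditional stores to *a, neither of which can be
// speculated because no store dominates or post-dominates them.
inline void storeLadder(unsigned c, unsigned flag1, unsigned flag2,
                        int x, int y, int *a) {
    if (c & flag1) *a = x;
    if (c & flag2) *a = y;
}

// Merged form: the value travels through a register and the store happens
// at most once, guarded by the combined condition.
inline void storeMerged(unsigned c, unsigned flag1, unsigned flag2,
                        int x, int y, int *a) {
    int tmp = 0;  // stands in for 'undef'; only read when a flag matched
    if (c & flag1) tmp = x;
    if (c & flag2) tmp = y;
    if (c & (flag1 | flag2)) *a = tmp;  // single binary-AND, as in the message
}
```

When neither flag is set, the merged form performs no store at all, which is what keeps the transformation legal without a dominating store.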
*	Error out when faced with value names containing '\0'	Filipe Cabecinhas	2015-11-04	1	-1/+4
\| \| \| \| \| \|	Bug found with afl-fuzz. llvm-svn: 252048
*	[ELF] elfiamcu triple should imply e_machine == EM_IAMCU	Michael Kuperstein	2015-11-04	4	-4/+24
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D14109 llvm-svn: 252043
*	[X86] DAGCombine should not introduce FILD in soft-float mode	Michael Kuperstein	2015-11-04	1	-2/+2
\| \| \| \| \| \| \|	The x86 "sitofp i64 to double" dag combine, in 32-bit mode, lowers sitofp directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode. llvm-svn: 252042
*	[LVI] Update a comment to clarify what's actually happening and why	Philip Reames	2015-11-04	1	-3/+22
\| \| \| \|	llvm-svn: 252033
*	[CVP] Fold return values if possible	Philip Reames	2015-11-04	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \|	In my previous change to CVP (251606), I made CVP much more aggressive about trying to constant fold comparisons. This patch is a reversal in direction. Rather than being aggressive about every compare, we restore the non-block local restriction for most, and then try hard for compares feeding returns. The motivation for this is twofold: * The more I thought about it, the less comfortable I got with the possible compile time impact of the other approach. There have been no reported issues, but after talking to a couple of folks, I've come to the conclusion the time probably isn't justified. * It turns out we need to know the context to leverage the full power of LVI. In particular, asking about something at the end of its block (the use of a compare in a return) will frequently get more precise results than something in the middle of a block. This is an implementation detail, but it's also hard to get around since mid-block queries have to reason about possible throwing instructions and don't get to use most of LVI's block-focused infrastructure. This will become particularly important when combined with http://reviews.llvm.org/D14263. Differential Revision: http://reviews.llvm.org/D14271 llvm-svn: 252032
*	[StatepointLowering] Remove distinction between call and invoke safepoints	Igor Laevsky	2015-11-04	2	-24/+31
\| \| \| \| \| \| \| \| \| \| \| \|	There is no point in handling invoke safepoints differently from call safepoints. All relevant decisions can be made by looking at whether or not gc.result and gc.relocate lie in the same basic block. This change allows lowering call safepoints whose relocates and results are in different basic blocks. See the test case for an example. Differential Revision: http://reviews.llvm.org/D14158 llvm-svn: 252028
*	[LLVMSymbolize] Reduce indentation by using helper function. NFC.	Alexey Samsonov	2015-11-04	1	-21/+24
\| \| \| \|	llvm-svn: 252022
*	[LLVMSymbolize] Properly propagate object parsing errors from the library.	Alexey Samsonov	2015-11-04	1	-90/+107
\| \| \| \|	llvm-svn: 252021
*	Fix unused variable warning from r252017	Adam Nemet	2015-11-04	1	-3/+2
\| \| \| \|	llvm-svn: 252019
*	LLE 6/6: Add LoopLoadElimination pass	Adam Nemet	2015-11-03	4	-0/+566
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The goal of this pass is to perform store-to-load forwarding across the backedge of a loop. E.g.: for (i) A[i + 1] = A[i] + B[i] => T = A[0] for (i) T = T + B[i] A[i + 1] = T The pass relies on loop dependence analysis via LoopAccessAnalysis to find opportunities of loop-carried dependences with a distance of one between a store and a load. Since it's using LoopAccessAnalysis, it was easy to also add support for versioning away may-aliasing intervening stores that would otherwise prevent this transformation. This optimization is also performed by Load-PRE in GVN without the option of multi-versioning. As was discussed with Daniel Berlin in http://reviews.llvm.org/D9548, this is inferior to a more loop-aware solution applied here. Hopefully, we will be able to remove some complexity from GVN/MemorySSA as a consequence. In the long run, we may want to extend this pass (or create a new one if there is little overlap) to also eliminate loop-independent redundant loads and stores that require versioning due to may-aliasing intervening stores/loads. I have some motivating cases for store elimination. My plan right now is to wait for MemorySSA to come online first rather than using memdep for this. The main motivation for this pass is the 456.hmmer loop in SPECint2006 where after distributing the original loop and vectorizing the top part, we are left with the critical path exposed in the bottom loop. Being able to promote the memory dependence into a register dependence (even though the HW does perform store-to-load forwarding as well) results in a major gain (~20%). This gain also transfers over to x86: it's around 8-10%. Right now the pass is off by default and can be enabled with -enable-loop-load-elim. 
On the LNT testsuite, there are two performance changes (negative number -> improvement): 1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the critical paths is reduced 2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this outside of LNT The pass is scheduled after the loop vectorizer (which is after loop distribution). The rationale is to try to reuse LAA state, rather than recomputing it. The order between LV and LLE is not critical because normally LV does not touch scalar st->ld forwarding cases where vectorizing would prevent the CPU's st->ld forwarding from kicking in. LoopLoadElimination requires LAA to provide the full set of dependences (including forward dependences). LAA is known to omit loop-independent dependences in certain situations. The big comment before removeDependencesFromMultipleStores explains why this should not occur for the cases that we're interested in. Reviewers: dberlin, hfinkel Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D13259 llvm-svn: 252017
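The loop transformation from the summary can be written out at source level as follows. This is a hand-written illustration, not output of the pass: the function names are hypothetical, A is passed by value so that it cannot alias B (the case the pass would otherwise version for), and A is assumed to hold at least `B.size() + 1` elements.

```cpp
#include <cstddef>
#include <vector>

// Before: each iteration reloads A[i], the value stored by the previous one.
std::vector<int> recurrenceThroughMemory(std::vector<int> A,
                                         const std::vector<int>& B) {
    for (std::size_t i = 0; i < B.size(); ++i)
        A[i + 1] = A[i] + B[i];
    return A;
}

// After: the loop-carried value lives in T, so the load of A[i] disappears
// from the loop and the critical path no longer goes through memory.
std::vector<int> recurrenceThroughRegister(std::vector<int> A,
                                           const std::vector<int>& B) {
    if (B.empty()) return A;  // nothing to forward; also avoids reading A[0]
    int T = A[0];
    for (std::size_t i = 0; i < B.size(); ++i) {
        T = T + B[i];
        A[i + 1] = T;
    }
    return A;
}
```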
*	[LAA] LLE 5/6: Add predicate functions Dependence::isForward/isBackward, NFC	Adam Nemet	2015-11-03	1	-2/+22
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Will be used by the LoopLoadElimination pass. Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13258 llvm-svn: 252016
*	CodeGen, Target: Move Mach-O-specific symbol name logic to Mach-O lowering.	Peter Collingbourne	2015-11-03	3	-23/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A profile of an LTO link of Chrome revealed that we were spending some ~30-50% of execution time in the function Constant::getRelocationInfo(), which is called from TargetLoweringObjectFile::getKindForGlobal() and in turn from TargetMachine::getNameWithPrefix(). It turns out that we only need the result of getKindForGlobal() when targeting Mach-O, so this change moves the relevant part of the logic to TargetLoweringObjectFileMachO. NFCI. Differential Revision: http://reviews.llvm.org/D14168 llvm-svn: 252014
*	AMDGPU: Make flat_scratch name consistent	Matt Arsenault	2015-11-03	1	-3/+3
\| \| \| \| \| \| \|	The printed name and the parsed assembler names weren't the same. I'm not sure which name SC prints these as, but I think it's this one. llvm-svn: 252010
*	AMDGPU: Fix asserts on invalid register ranges	Matt Arsenault	2015-11-03	1	-5/+13
\| \| \| \| \| \| \| \| \|	If the requested SGPR was not actually aligned, it was accepted and rounded down instead of rejected. Also fix an assert if the range is an invalid size. llvm-svn: 252009
*	AMDGPU: Fix off by one error in register parsing	Matt Arsenault	2015-11-03	1	-4/+5
\| \| \| \| \| \|	If trying to use one past the end, this would assert. llvm-svn: 252008
*	Align whitespace	Derek Schuff	2015-11-03	2	-4/+4
\| \| \| \|	llvm-svn: 252003
*	[WebAssembly] Support wasm select operator	Derek Schuff	2015-11-03	2	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add support for wasm's select operator, and lower LLVM's select DAG node to it. Reviewers: sunfish Subscribers: dschuff, llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D14295 llvm-svn: 252002
*	AMDGPU: s[102:103] is unavailable on VI	Matt Arsenault	2015-11-03	1	-1/+10
\| \| \| \|	llvm-svn: 252000
*	AMDGPU: Define correct number of SGPRs	Matt Arsenault	2015-11-03	2	-6/+10
\| \| \| \| \| \| \| \| \|	There are actually 104 so 2 were missing. More assembler tests with high register number tuples will be included in later patches. llvm-svn: 251999
*	AMDGPU: Make findUsedSGPR more readable	Matt Arsenault	2015-11-03	1	-7/+18
\| \| \| \| \| \|	Add more comments etc. llvm-svn: 251996
*	AMDGPU: Initialize SIFixSGPRCopies so -print-after works	Matt Arsenault	2015-11-03	3	-8/+15
\| \| \| \|	llvm-svn: 251995
*	AMDGPU: Alphabetize includes	Matt Arsenault	2015-11-03	1	-1/+1
\| \| \| \|	llvm-svn: 251994
*	InstCombine: fix sinking of convergent calls	Fiona Glaser	2015-11-03	1	-0/+6
\| \| \| \|	llvm-svn: 251991
*	[SelectionDAG] Use existing constant nodes instead of recreating them. NFC.	Simon Pilgrim	2015-11-03	1	-9/+6
\| \| \| \|	llvm-svn: 251990
*	[LLVMSymbolize] Factor out the logic for printing structs from DIContext. NFC.	Alexey Samsonov	2015-11-03	3	-61/+76
\| \| \| \| \| \| \| \|	Introduce DIPrinter which takes care of rendering DILineInfo and friends. This allows the LLVMSymbolizer class to return structured data instead of plain std::strings. llvm-svn: 251989
*	[LAA] LLE 3/6: Rename InterestingDependence to Dependences, NFC	Adam Nemet	2015-11-03	2	-33/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We now collect all types of dependences including lexically forward deps not just "interesting" ones. Reviewers: hfinkel Subscribers: rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D13256 llvm-svn: 251985
*	[LLVMSymbolize] Move demangling away from printing routines. NFC.	Alexey Samsonov	2015-11-03	1	-28/+33
\| \| \| \| \| \| \| \| \|	Make printDILineInfo and friends responsible for just rendering the contents of the structures; demangling should actually be performed earlier, when we have the information about the originating SymbolizableModule at hand. llvm-svn: 251981
*	[SimplifyLibCalls] Add a new transformation: pow(exp(x), y) -> exp(x*y)	Davide Italiano	2015-11-03	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This one is enabled only under -ffast-math (due to rounding/overflows) but allows us to emit shorter code. Before (on FreeBSD x86-64): 4007f0: 50 push %rax 4007f1: f2 0f 11 0c 24 movsd %xmm1,(%rsp) 4007f6: e8 75 fd ff ff callq 400570 <exp2@plt> 4007fb: f2 0f 10 0c 24 movsd (%rsp),%xmm1 400800: 58 pop %rax 400801: e9 7a fd ff ff jmpq 400580 <pow@plt> 400806: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 40080d: 00 00 00 After: 4007b0: f2 0f 59 c1 mulsd %xmm1,%xmm0 4007b4: e9 87 fd ff ff jmpq 400540 <exp2@plt> 4007b9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) Differential Revision: http://reviews.llvm.org/D14045 llvm-svn: 251976
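A small numeric sketch of why the rewrite is restricted to -ffast-math. The two hypothetical helpers below compute the same mathematical value, but the first can overflow in the intermediate `exp(x)` where the second does not, so the results are not always bit-identical.

```cpp
#include <cmath>

// Unoptimized form: two library calls.
inline double powOfExp(double x, double y) {
    return std::pow(std::exp(x), y);
}

// Rewritten form: one multiply and one library call.
inline double expOfProduct(double x, double y) {
    return std::exp(x * y);
}
```

For moderate inputs the two agree to within rounding. For x = 800 and y = 0.5 the intermediate `exp(800)` exceeds the range of a double, so `powOfExp` yields infinity while `expOfProduct` returns the finite value of `exp(400)`.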
*	[X86][XOP] Add support for the matching of the VPCMOV bit select instruction	Simon Pilgrim	2015-11-03	2	-0/+21
\| \| \| \| \| \| \| \| \| \|	XOP has the VPCMOV instruction that performs the common vector bit select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) ) This patch adds tablegen pattern matching for this instruction. Differential Revision: http://reviews.llvm.org/D8841 llvm-svn: 251975
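The bit select expression quoted above can be modelled on a scalar as a one-liner. This is a 64-bit scalar stand-in for the vector operation, with a hypothetical function name; it is not the tablegen pattern added by the patch.

```cpp
#include <cstdint>

// OR(AND(src1, mask), AND(src2, ~mask)): take bits from src1 where mask is 1
// and from src2 where mask is 0.
inline uint64_t bitSelect(uint64_t src1, uint64_t src2, uint64_t mask) {
    return (src1 & mask) | (src2 & ~mask);
}
```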
*	[LAA] LLE 2/6: Fix a NoDep case that should be a Forward dependence	Adam Nemet	2015-11-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When the dependence distance is zero then we have a loop-independent dependence from the earlier to the later access. No current client of LAA uses forward dependences so other than potentially hitting the MaxDependences threshold earlier, this change shouldn't affect anything right now. This and the previous patch were tested together for compile-time regression. None found in LNT/SPEC. Reviewers: hfinkel Subscribers: rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D13255 llvm-svn: 251973
*	[LAA] LLE 1/6: Expose Forward dependences	Adam Nemet	2015-11-03	1	-13/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Before this change, we did not collect forward dependences since none of the current clients (LV, LDist) required them. The motivation to also collect forward dependences is a new pass LoopLoadElimination (LLE) which discovers store-to-load forwarding opportunities across the loop's backedge. The pass uses both lexically forward and backward loop-carried dependences to detect these opportunities. The new pass also analyzes loop-independent (forward) dependences since they can conflict with the loop-carried dependences in terms of how the data flows through memory. The newly added test only covers loop-carried forward dependences because loop-independent ones are currently categorized as NoDep. The next patch will fix this. The two patches were tested together for compile-time regression. None found in LNT/SPEC. Note that with this change LAA provides all dependences rather than just "interesting" ones. A subsequent NFC patch will remove the now trivial isInterestingDependence and rename the APIs. Reviewers: hfinkel Subscribers: jmolloy, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D13254 llvm-svn: 251972
*	Don't create empty sections just to look like gas.	Rafael Espindola	2015-11-03	1	-10/+0
\| \| \| \| \| \| \|	We are long past the time when this much bug-for-bug compatibility was useful. llvm-svn: 251970
*	Revert "Move metadata linking after lazy global materialization/linking."	Teresa Johnson	2015-11-03	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \|	This reverts commit r251926. I believe this is causing an LTO bootstrapping bot failure (http://lab.llvm.org:8080/green/job/llvm-stage2-cmake-RgLTO_build/3669/). Haven't been able to repro it yet, but after looking at the metadata I am pretty sure I know what is going on. llvm-svn: 251965
*	[libFuzzer] make -test_single_input more reliable: make sure the input's ↵	Kostya Serebryany	2015-11-03	1	-1/+3
\| \| \| \| \| \|	size is equal to its capacity llvm-svn: 251961
*	Delete dead code.	Rafael Espindola	2015-11-03	2	-9/+0
\| \| \| \|	llvm-svn: 251960
*	Simplify local common output.	Rafael Espindola	2015-11-03	1	-20/+14
\| \| \| \| \| \| \| \|	We now create them as they are found and use higher level APIs. This is a step in avoiding creating unnecessary sections. llvm-svn: 251958
*	[CodegenPrepare] Do not rematerialize gc.relocates across different basic blocks	Igor Laevsky	2015-11-03	1	-0/+8
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D14258 llvm-svn: 251957
*	Move code out of a loop and use a range loop.	Rafael Espindola	2015-11-03	1	-10/+8
\| \| \| \|	llvm-svn: 251952
*	Revert "Revert "[Orc] Directly emit machine code for the x86 resolver block ↵	Rafael Espindola	2015-11-03	4	-150/+90
\| \| \| \| \| \| \| \| \| \|	and trampolines."" This reverts commit r251937. The test was updated to the new API, bring the API back. llvm-svn: 251944
*	Fix PR25372 - teach replaceCongruentPHIs to handle cases where SE evaluates ↵	Silviu Baranga	2015-11-03	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a PHI to a SCEVConstant Summary: Since now Scalar Evolution can create non-add rec expressions for PHI nodes, it can also create SCEVConstant expressions. This will confuse replaceCongruentPHIs, which previously relied on the fact that SCEV could not produce constants in this case. We will now replace the node with a constant in these cases - or avoid processing the Phi in case of a type mismatch. Reviewers: sanjoy Subscribers: llvm-commits, majnemer Differential Revision: http://reviews.llvm.org/D14230 llvm-svn: 251938
*	Revert "[Orc] Directly emit machine code for the x86 resolver block and ↵	Rafael Espindola	2015-11-03	4	-90/+150
\| \| \| \| \| \| \| \| \| \|	trampolines." This reverts commit r251933. It broke the build of examples/Kaleidoscope/Orc/fully_lazy/toy.cpp. llvm-svn: 251937
*	[Orc] Directly emit machine code for the x86 resolver block and trampolines.	Lang Hames	2015-11-03	4	-150/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bypassing LLVM for this has a number of benefits: 1) Laziness support becomes asm-syntax agnostic (previously lazy jitting didn't work on Windows as the resolver block was in Darwin asm). 2) For cross-process JITs, it allows resolver blocks and trampolines to be emitted directly in the target process, reducing cross process traffic. 3) It should be marginally faster. llvm-svn: 251933
*	Move metadata linking after lazy global materialization/linking.	Teresa Johnson	2015-11-03	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently, named metadata is linked before the LazilyLinkGlobalValues list is walked and materialized/linked. As a result, references from DISubprogram and DIGlobalVariable metadata to yet unmaterialized functions and variables cause them to be added to the lazy linking list and their definitions are materialized and linked. This makes the llvm-link -only-needed option not have the intended effect when debug information is present, as the otherwise unneeded functions/variables are still linked in. Additionally, for ThinLTO I have implemented a mechanism to only link in debug metadata needed by imported functions. Moving named metadata linking after lazy GV linking will facilitate applying this mechanism to the LTO and "llvm-link -only-needed" cases as well. Reviewers: dexonsmith, tra, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14195 llvm-svn: 251926
*	Don't assert if materializing before seeing any function bodies	Filipe Cabecinhas	2015-11-03	1	-1/+3
\| \| \| \| \| \| \| \| \|	This assert was reachable from user input. A minimized test case (no FUNCTION_BLOCK_ID record) is attached. Bug found with afl-fuzz llvm-svn: 251910
*	Don't use Twine objects after their lifetimes end.	Filipe Cabecinhas	2015-11-03	1	-6/+6
\| \| \| \| \| \| \| \|	No test, since it would depend on what the compiler can optimize/reuse. My next commit made this bug visible on Linux Release compiles with some versions of gcc. llvm-svn: 251909
*	LoopVectorizer - skip 'bitcast' between GEP and load.	Elena Demikhovsky	2015-11-03	1	-2/+28
\| \| \| \| \| \| \| \| \| \| \| \|	Skipping 'bitcast' in this case allows the load to be vectorized: %arrayidx = getelementptr inbounds double, double* %in, i64 %indvars.iv %tmp53 = bitcast double** %arrayidx to i64* %tmp54 = load i64, i64* %tmp53, align 8 Differential Revision: http://reviews.llvm.org/D14112 llvm-svn: 251907
*	[X86] Generate .cfi_adjust_cfa_offset correctly when pushing arguments	Michael Kuperstein	2015-11-03	4	-27/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When push instructions are being used to pass function arguments on the stack, and either EH or debugging is enabled, we need to generate .cfi_adjust_cfa_offset directives appropriately. For (synch) EH, it is enough for the CFA offset to be correct at every call site, while for debugging we want to be correct after every push. Darwin does not support this well, so don't use pushes whenever it would be required. Differential Revision: http://reviews.llvm.org/D13767 llvm-svn: 251904