bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[ADT] Fix some Clang-tidy modernize-use-using warnings; other minor fixes (NFC).	Eugene Zelenko	2017-05-16	1	-9/+28
	llvm-svn: 303221
*	Fix for compilers with older CRT header libraries.	Zachary Turner	2017-05-16	1	-1/+6
	llvm-svn: 303220
*	[Support] Ignore OutputDebugString exceptions in our crash recovery.	Zachary Turner	2017-05-16	1	-0/+8
	Since we use AddVectoredExceptionHandler, we get notified of every exception that gets raised by a program. Sometimes these are not necessarily errors though, and this can be especially true when linking against a library that we have no control over, and may raise an exception internally which it intends to catch. In particular, the Windows API OutputDebugString does exactly this. It raises an exception inside of a __try / __except, giving the debugger a chance to handle the exception to print the message to the debug console. But this doesn't interoperate nicely with our vectored exception handler, which just sees another exception and decides that we need to terminate the program. Add a special case for this so that we ignore ODS exceptions and continue normally. Note that a better fix is to simply not use vectored exception handlers and use SEH instead, but given that MinGW doesn't support SEH, this is the only solution for MinGW. Differential Revision: https://reviews.llvm.org/D33260 llvm-svn: 303219
*	[IR] Prefer use_empty() to !hasNUsesOrMore(1) for clarity.	Davide Italiano	2017-05-16	2	-2/+2
	llvm-svn: 303218
*	[InstSimplify] add folds for constant mask of value shifted by constant	Sanjay Patel	2017-05-16	1	-0/+18
	We would eventually catch these via demanded bits and computing known bits in InstCombine, but I think it's better to handle the simple cases as soon as possible as a matter of efficiency. This fold allows further simplifications based on distributed ops transforms. eg:
	%a = lshr i8 %x, 7
	%b = or i8 %a, 2
	%c = and i8 %b, 1
	InstSimplify can directly fold this now:
	%a = lshr i8 %x, 7
	Differential Revision: https://reviews.llvm.org/D33221 llvm-svn: 303213
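The fold in the example above can be checked exhaustively for 8-bit values. The following is only an illustrative sketch in Python, not code from the patch; the helper names `fold_holds` and `check_all_i8` are made up for this check. It models the three IR instructions on unsigned 8-bit integers and confirms that `%c` always equals `%a`.

```python
def fold_holds(x: int) -> bool:
    """Model the i8 example: does %c equal %a for this value of %x?"""
    a = (x & 0xFF) >> 7   # %a = lshr i8 %x, 7  (result is 0 or 1)
    b = (a | 2) & 0xFF    # %b = or i8 %a, 2    (sets bit 1 only)
    c = b & 1             # %c = and i8 %b, 1   (keeps bit 0 only)
    return c == a


def check_all_i8() -> bool:
    """Check the fold for every possible 8-bit input."""
    return all(fold_holds(x) for x in range(256))


if __name__ == "__main__":
    print(check_all_i8())
```

The `or` only sets a bit that the `and` mask then clears, and the shifted value has no bits outside the mask, which is why `%c` can be replaced by `%a` directly.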
*	The patch excludes a case from zero check skip in	Evgeny Stupachenko	2017-05-16	1	-7/+9
	CTLZ idiom recognition (r303102).
	Summary: The following case:
	i = 1; if(n) while (n >>= 1) i++; use(i);
	Was converted to:
	i = 1; if(n) i += builtin_ctlz(n >> 1, false); use(i);
	Which is not correct. The patch makes it:
	i = 1; if(n) i += builtin_ctlz(n >> 1, true); use(i);
	From: Evgeny Stupachenko <evstupac@gmail.com>
	llvm-svn: 303212
*	Re-commit r302678, fixing PR33053.	Amara Emerson	2017-05-16	4	-269/+101
	The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211
*	[Inliner] Do not mix callsite and callee hotness based updates.	Easwaran Raman	2017-05-16	1	-15/+27
	Update threshold based on callee's hotness only when BFI is not available. Otherwise use only callsite's hotness. This makes it easier to reason about hotness related threshold updates. Differential revision: https://reviews.llvm.org/D33157 llvm-svn: 303210
*	[PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync.	Tim Shen	2017-05-16	5	-8/+67
	Summary: This fixes pr32392. The lowering pipeline is: llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in expandPostRAPseudo. The reason why expandPostRAPseudo is chosen is because previous passes are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne- 7, .+4 (some branch pass(s)). Differential Revision: https://reviews.llvm.org/D32763 llvm-svn: 303205
*	Add hasProfileSummary and has{Sample\|Instrumentation}Profile methods	Easwaran Raman	2017-05-16	1	-1/+1
	ProfileSummaryInfo already checks whether the module has sample profile in determining profile counts. This will also be useful in inliner to clean up threshold updates. llvm-svn: 303204
*	In debug builds a non-trivial amount of time is spent in InstCombine processing	Dmitry Mikulin	2017-05-16	1	-1/+4
	@llvm.dbg.* calls in visitCallInst(). They can be safely ignored. llvm-svn: 303202
*	NewGVN: Only do something in verifyStoreExpressions if assertions are ↵	Daniel Berlin	2017-05-16	1	-0/+2
	enabled, to avoid unused code warnings. llvm-svn: 303201
*	NewGVN: Fix PR 33051 by making sure we remove old store expressions	Daniel Berlin	2017-05-16	1	-27/+36
	from the ExpressionToClass mapping. llvm-svn: 303200
*	Revert "[X86] Replace slow LEA instructions in X86"	Reid Kleckner	2017-05-16	4	-237/+43
	This reverts commit r303183, it broke various buildbots and introduced sanitizer errors. llvm-svn: 303199
*	Elide stores which are overwritten without being observed.	Nirav Dave	2017-05-16	1	-7/+21
	Summary: In SelectionDAG, when a store is immediately chained to another store to the same address, elide the first store as it has no observable effects. This causes small improvements when dealing with intrinsics lowered to stores. Test notes: * Many testcases overwrite store addresses multiple times and needed minor changes, mainly making stores volatile to prevent the optimization from optimizing the test away. * Many X86 test cases optimized out instructions associated with va_start. * Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has dependencies to check and can probably be removed and potentially replaced with another test. Reviewers: rnk, john.brawn Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33206 llvm-svn: 303198
*	ShrinkWrap: Add skipFunction() call	Matthias Braun	2017-05-16	1	-1/+1
	ShrinkWrapping is a performance optimization that can safely be skipped, so we can add `if (!skipFunction()) return;` llvm-svn: 303197
*	[MetadataLoader] Remove unused Vector. NFCI.	Davide Italiano	2017-05-16	1	-1/+1
	llvm-svn: 303196
*	Revert "[ARM] Mark LEApcrel instructions as isAsCheapAsAMove"	Renato Golin	2017-05-16	3	-4/+4
	Revert "[ARM] Mark LEApcrel as not having side effects" This reverts commit r303054 and r303053, as they broke the ARM self-hosting buildbots: http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/1550 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/1349 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/1845 Offline investigation on course. llvm-svn: 303193
*	[AMDGPU] Use GCNRPTracker dumper methods in scheduler	Stanislav Mekhanoshin	2017-05-16	3	-18/+21
	Differential Revision: https://reviews.llvm.org/D33244 llvm-svn: 303186
*	[AMDGPU] Cache live-ins and register pressure in scheduler	Stanislav Mekhanoshin	2017-05-16	2	-75/+154
	Using LIS can be quite expensive, so caching of calculated region live-ins and pressure is implemented. It does two things:
	1. Caches the info for the second stage when we schedule with decreased target occupancy.
	2. Tracks the basic block from top to bottom thus eliminating the need to scan whole register file liveness at every region split in the middle of the block.
	The scheduling is now done in 3 stages instead of two, with the first one being really a no-op and only used to collect scheduling regions as sent by the scheduler driver. There is no functional change to the current behavior, only compilation speed is affected.
	In general computeBlockPressure() could be simplified if we switch to backward RP tracker, because scheduler sends regions within a block starting from the last upward. We could use a natural order of upward tracker to seamlessly change between regions of the same block, since live reg set of a previous tracked region would become a live-out of the next region. That however requires fixing upward tracker to properly account defs and uses of the same instruction as both are contributing to the current pressure. When we converge on the produced pressure we should be able to switch between them back and forth. In addition, backward tracker is less expensive as it uses LIS in recede less often than forward uses it in advance.
	At the moment the worst known case compilation time has improved from 26 minutes to 8.5.
	Differential Revision: https://reviews.llvm.org/D33117 llvm-svn: 303184
*	[X86] Replace slow LEA instructions in X86	Lama Saba	2017-05-16	4	-43/+237
	According to Intel's Optimization Reference Manual for SNB+:
	"For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1:
	- LEA that has all three source operands: base, index, and offset
	- LEA that uses base and index registers where the base is EBP, RBP, or R13
	- LEA that uses RIP relative addressing mode
	- LEA that uses 16-bit addressing mode"
	This patch currently handles the first 2 cases only.
	Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303183
*	Revert 303174, 303176, and 303178	Matthew Simpson	2017-05-16	1	-2/+2
	These commits are breaking the bots. Reverting to investigate. llvm-svn: 303182
*	[DAG] Prune deleted nodes in TokenFactor	Nirav Dave	2017-05-16	1	-8/+3
	Fix visitTokenFactor to correctly remove deleted nodes. NFC. llvm-svn: 303181
*	[AMDGPU] Turn register pressure estimation into forward tracker	Stanislav Mekhanoshin	2017-05-16	4	-135/+196
	This factors register pressure estimation mechanism from the GCNSchedStrategy into the forward tracker to unify interface with other strategies and expose it to other interested phases. Differential Revision: https://reviews.llvm.org/D33105 llvm-svn: 303179
*	[LV] Avoid potential division by zero when selecting IC	Matthew Simpson	2017-05-16	1	-2/+2
	llvm-svn: 303174
*	[coroutines] Handle unwind edge splitting	Gor Nishanov	2017-05-16	1	-4/+96
	Summary: RewritePHIs algorithm used in building of CoroFrame inserts a placeholder ``` %placeholder = phi [%val] ``` on every edge leading to a block starting with a PHI node with multiple incoming edges, so that if one of the incoming values was spilled and needs to be reloaded, we have a place to insert a reload. We use SplitEdge helper function to split the incoming edge. SplitEdge function does not deal with unwind edges coming into a block with an EHPad. This patch adds an ehAwareSplitEdge function that can correctly split the unwind edge. For landing pads, we clone the landing pad into every edge block and replace the original landing pad with a PHI collecting the values from all incoming landing pads. For WinEH pads, we keep the original EHPad in place and insert cleanuppad/cleanupret in the edge blocks. Reviewers: majnemer, rnk Reviewed By: majnemer Subscribers: EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D31845 llvm-svn: 303172
*	[DWARF] - Add RelocAddrEntry for cleanup. NFCi.	George Rimar	2017-05-16	1	-2/+2
	Was mentioned as possible cleanup during review of D33184. llvm-svn: 303171
*	Fix an improperly placed curly bracket. NFC.	Chad Rosier	2017-05-16	1	-1/+1
	llvm-svn: 303165
*	[DWARF] - Use DWARFAddressRange struct instead of uint64_t pair for ↵	George Rimar	2017-05-16	4	-19/+18
	DWARFAddressRangesVector. Recommit of r303159 "[DWARF] - Use DWARFAddressRange struct instead of uint64_t pair for DWARFAddressRangesVector" All places were switched to use DWARFAddressRange now. Suggested during review of D33184. llvm-svn: 303163
*	Revert r303159 "[DWARF] - Use DWARFAddressRange struct instead of uint64_t ↵	George Rimar	2017-05-16	2	-8/+10
	pair for DWARFAddressRangesVector." Something went wrong, it broke BB. http://green.lab.llvm.org/green//job/clang-stage1-cmake-RA-incremental_build/38477/consoleFull#-200034420049ba4694-19c4-4d7e-bec5-911270d8a58c llvm-svn: 303162
*	[DWARF] - Use DWARFAddressRange struct instead of uint64_t pair for ↵	George Rimar	2017-05-16	2	-10/+8
	DWARFAddressRangesVector. Suggested during review of D33184. llvm-svn: 303159
*	[LTO] Print time-passes information at conclusion of LTO codegen	James Henderson	2017-05-16	3	-0/+15
	The information collected when requested by -time-passes is only printed when llvm_shutdown is called at the moment. This means that when linking against the LTO library dynamically and using the C interface, it is not possible to see the timing information, because llvm_shutdown cannot be called. This change modifies the LTO code generation functions for both regular LTO and thin LTO to explicitly print and reset the timing information. I have tested that this works with our proprietary linker. However, as this relies on a specific method of building and linking against the LTO library, I'm not sure how or if this can be tested in the LLVM testsuite. Reviewed by: mehdi_amini Differential Revision: https://reviews.llvm.org/D32803 llvm-svn: 303152
*	[SCEV] Fix sorting order for AddRecExprs	Max Kazantsev	2017-05-16	1	-16/+31
	The existing sorting order defined in CompareSCEVComplexity sorts AddRecExprs by loop depth, but does not pay attention to dominance of loops. This can lead us to the following buggy situation:
	for (...) { // loop1
	  op1 = {A,+,B}
	}
	for (...) { // loop2
	  op2 = {A,+,B}
	  S = add op1, op2
	}
	In this case there is no guarantee that in the operand list of S the op2 comes before op1 (loop depth is the same, so they will be sorted just lexicographically), so we can incorrectly treat S as a recurrence of loop1, which is wrong. This patch changes the sorting logic so that it places the dominated recs before the dominating recs. This ensures that when we pick the first recurrence in the operands order, it will be the bottom-most in terms of the dominator tree. The attached test set includes some tests that produce incorrect SCEV estimations and crashes with the old logic.
	Reviewers: sanjoy, reames, apilipenko, anna Reviewed By: sanjoy Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33121 llvm-svn: 303148
*	[CorrelatedValuePropagation] Don't use -> to call a static method of ↵	Craig Topper	2017-05-16	1	-6/+4
	ConstantRange. NFC llvm-svn: 303147
*	NewGVN: Use StoreExpression StoredValue instead of looking it up again, ↵	Daniel Berlin	2017-05-16	1	-6/+5
	since it was already looked up when it was created llvm-svn: 303144
*	NewGVN: Formatting fixes	Daniel Berlin	2017-05-16	1	-2/+3
	llvm-svn: 303143
*	Revert "[NewGVN] Replace predicate info leftovers."	Davide Italiano	2017-05-16	1	-4/+0
	It's breaking the bots. llvm-svn: 303142
*	[NewGVN] Replace predicate info leftovers.	Davide Italiano	2017-05-16	1	-0/+4
	Fixes PR32945. Differential Revision: https://reviews.llvm.org/D33226 llvm-svn: 303141
*	AMDGPUCodeGen: Fix warnings in r303111. [-Wunused-variable]	NAKAMURA Takumi	2017-05-16	2	-2/+4
	llvm-svn: 303137
*	IR: Give function GlobalValue::getRealLinkageName() a less misleading name: ↵	Peter Collingbourne	2017-05-16	11	-20/+20
	dropLLVMManglingEscape(). This function gives the wrong answer on some non-ELF platforms in some cases. The function that does the right thing lives in Mangler.h. To try to discourage people from using this function, give it a different name. Differential Revision: https://reviews.llvm.org/D33162 llvm-svn: 303134
*	[ShrinkWrapping] Handle restores on no-return paths	Francis Visoiu Mistrih	2017-05-15	1	-2/+8
	Shrink-wrapping uses post-dominators to find a restore point that post-dominates all the uses of CSR / stack. The way dominator trees are modeled in LLVM today is that unreachable blocks are not present in a generic dominator tree, so, an unreachable node is dominated by anything: include/llvm/Support/GenericDomTree.h:467. Since for post-dominators, a no-return block is considered "unreachable", calling findNearestCommonDominator on an unreachable node A and a non-unreachable node B will return B, which can be wrong. If we find such a node, we bail out since there is no good restore point available. rdar://problem/30186931 llvm-svn: 303130
*	[libFuzzer] fix tests on Windows	Kostya Serebryany	2017-05-15	1	-0/+1
	llvm-svn: 303128
*	Fix memory leak	Xinliang David Li	2017-05-15	1	-0/+4
	llvm-svn: 303126
*	[libFuzzer] improve the afl driver and its tests. Make it possible to run ↵	Kostya Serebryany	2017-05-15	3	-13/+77
	individual inputs with afl driver llvm-svn: 303125
*	[AMDGPU] Kill now unused phiInfoElementGetDebugLoc(). NFCI.	Davide Italiano	2017-05-15	1	-5/+0
	llvm-svn: 303122
*	[APInt] Simplify a for loop initialization based on the fact that 'n' is ↵	Craig Topper	2017-05-15	1	-1/+1
	known to be 1 by an earlier 'if'. llvm-svn: 303120
*	[IR] Fix some Clang-tidy modernize-use-using warnings; other minor fixes (NFC).	Eugene Zelenko	2017-05-15	5	-67/+108
	llvm-svn: 303119
*	AArch64: use linker-private symbols for globals in MachO.	Tim Northover	2017-05-15	2	-0/+11
	We don't use section-relative relocations on AArch64, so all symbols must be at least visible to the linker (i.e. properly global or l_whatever, but not L_whatever). llvm-svn: 303118
*	PR32288: Describe a bool parameter's DWARF location with a simple register	David Blaikie	2017-05-15	1	-28/+23
	There's no need (& a bit incorrect) to mask off the high bits of the register reference when describing a simple bool value. Reviewers: aprantl Differential Revision: https://reviews.llvm.org/D31062 llvm-svn: 303117
*	[SLP] Enable 64-bit wide vectorization on AArch64	Adam Nemet	2017-05-15	5	-1/+27
	ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
	* Performance Analysis
	This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4.
	SPEC
	* CFP2006/482.sphinx3 -3.34%
	  A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482.
	* CFP2000/177.mesa -3.34%
	* CINT2000/256.bzip2 +6.97%
	  My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement.
	** LLVM testsuite
	* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
	  There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise.
	* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
	  The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop.
	* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
	* MultiSource/Benchmarks/Bullet/bullet -2.18%
	  This is using 3D vectors.
	* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
	  Noise, binary is unchanged.
	* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
	  There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise.
	* MultiSource/Applications/aha/aha +1.63%
	* MultiSource/Applications/JM/lencod/lencod +1.41%
	* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
	Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116