bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[DebugInfo][InstMerge] Fix -debugify for phi node created by -mldst-motion	Jordan Rupprecht	2018-11-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: -mldst-motion creates a new phi node without any debug info. Use the merged debug location from the incoming stores to fix this. Fixes PR38177. The test case here is (somewhat) simplified from: ``` struct S { int foo; void fn(int bar); }; void S::fn(int bar) { if (bar) foo = 1; else foo = 0; } ``` Reviewers: dblaikie, gbedwell, aprantl, vsk Reviewed By: vsk Subscribers: vsk, JDevlieghere, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D54019 llvm-svn: 346027
*	[LV] Avoid vectorizing loops under opt for size that involve SCEV checks	Ayal Zaks	2018-11-02	1	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix PR39417, PR39497 The loop vectorizer may generate runtime SCEV checks for overflow and stride==1 cases, leading to execution of the original scalar loop. The latter is forbidden when optimizing for size. An assert introduced in r344743 triggered the above PRs, showing it does happen. This patch fixes this behavior by preventing vectorization in such cases. Differential Revision: https://reviews.llvm.org/D53612 llvm-svn: 345959
*	[NFC][LICM] Factor out instruction erasing logic	Max Kazantsev	2018-11-02	1	-11/+15
\| \| \| \| \| \| \| \| \| \|	This patch factors out a function that makes all required updates whenever an instruction gets erased. Differential Revision: https://reviews.llvm.org/D54011 Reviewed By: apilipenko llvm-svn: 345914
*	[LoopInterchange] Fix unused variables in release build	Florian Hahn	2018-11-01	1	-0/+2
\| \| \| \|	llvm-svn: 345881
*	[LoopInterchange] Remove support for inner-only reductions.	Florian Hahn	2018-11-01	1	-105/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Inner-loop only reductions require additional checks to make sure they form a load-phi-store cycle across inner and outer loop. Otherwise the reduction value is not properly preserved. This patch disables interchanging such loops for now, as it causes miscompiles in some cases and it seems to apply only to a tiny number of loops. Across the test-suite, SPEC2000 and SPEC2006, 61 instead of 62 loops are interchanged with inner loop reduction support disabled. With -loop-interchange-threshold=-1000, 3256 instead of 3267. See the discussion and history of D53027 for an outline of what such legality checks could look like. Reviewers: efriedma, mcrosier, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D53027 llvm-svn: 345877
*	Remove unnecessary fallthrough annotation after unreachable	Reid Kleckner	2018-11-01	1	-2/+0
\| \| \| \| \| \| \| \| \| \|	Clang's -Wimplicit-fallthrough implementation warns on this. I built clang with GCC 7.3 in +asserts and -asserts mode, and GCC doesn't warn on this in either configuration. I think it is unnecessary. I separated it from the large mechanical patch (https://reviews.llvm.org/D53950) in case I am wrong and it has to be reverted. llvm-svn: 345876
*	[NFC] Reorganize code to prepare it for more transforms	Max Kazantsev	2018-11-01	1	-4/+15
\| \| \| \|	llvm-svn: 345820
*	[IndVars] Smart hard uses detection	Max Kazantsev	2018-11-01	1	-13/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When rewriting loop exit values, IndVars considers this transform not profitable if the loop instruction has a loop user which it believes cannot be optimized away. In the current implementation only calls that immediately use the instruction are considered as such. This patch extends the definition of "hard" users to any side-effecting instructions (which usually cannot be optimized away from the loop) and also allows handling of not just immediate users, but use chains. Differential Revision: https://reviews.llvm.org/D51584 Reviewed By: etherzhhb llvm-svn: 345814
*	[InstCombine] Combine nested min/max intrinsics with constants	Volkan Keles	2018-10-31	1	-1/+35
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, spatel Reviewed By: spatel Subscribers: lebedev.ri, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D53774 llvm-svn: 345751
*	[InstCombine] refactor fabs+fcmp fold; NFC	Sanjay Patel	2018-10-31	1	-39/+45
\| \| \| \| \| \| \|	Also, remove/replace/minimize/enhance the tests for this fold. The code drops FMF, so it needs more tests and at least 1 fix. llvm-svn: 345734
*	[InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC	Sanjay Patel	2018-10-31	1	-2/+5
\| \| \| \| \| \| \| \|	The 'OLT' case was updated at rL266175, so I assume it was just an oversight that 'UGE' was not included because that patch handled both predicates in InstSimplify. llvm-svn: 345727
*	[InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative	Sanjay Patel	2018-10-31	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 ...but given the current implementation (no FMF on casts), this is likely the only way to predicate the transform. This is part of solving PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 Differential Revision: https://reviews.llvm.org/D53874 llvm-svn: 345725
*	[LoopUnroll] allow customization for new-pass-manager version of LoopUnroll	Fedor Sergeev	2018-10-31	1	-12/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unlike its legacy counterpart, the new pass manager's LoopUnrollPass does not provide any means to select which flavors of unroll to run (runtime, peeling, partial), relying on global defaults. In some cases, the ability to run a restricted LoopUnroll that does more than LoopFullUnroll is needed. Introduced LoopUnrollOptions to select optional unroll behaviors. Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing. Reviewers: chandlerc, tejohnson Differential Revision: https://reviews.llvm.org/D53440 llvm-svn: 345723
*	[IndVars] Strengthen restriction in rewriteLoopExitValues	Max Kazantsev	2018-10-31	1	-28/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For some unclear reason rewriteLoopExitValues considers recalculation after the loop profitable if it has some "soft uses" outside the loop (i.e. any use other than call and return), even if we have proved that it has a user inside the loop which we think will not be optimized away. There is no existing unit test that would explain this. This patch provides an example where rematerialisation of the exit value is not profitable but it passes this check due to the presence of a "soft use" outside the loop. It makes no sense to recalculate the value on exit if we are going to compute it anyway due to some irremovable use within the loop. This patch disallows applying this transform in the described situation. Differential Revision: https://reviews.llvm.org/D51581 Reviewed By: etherzhhb llvm-svn: 345708
*	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads	Dorit Nuzman	2018-10-31	1	-32/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705
*	[MSan] another take at instrumenting inline assembly - now with calls	Alexander Potapenko	2018-10-31	1	-22/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Turns out it's not always possible to figure out whether an asm() statement argument points to a valid memory region. One example would be per-CPU objects in the Linux kernel, for which the addresses are calculated using the FS register and a small offset in the .data..percpu section. To avoid pulling all sorts of checks into the instrumentation, we replace actual checking/unpoisoning code with calls to msan_instrument_asm_load(ptr, size) and msan_instrument_asm_store(ptr, size) functions in the runtime. This patch doesn't implement the runtime hooks in compiler-rt, as there's been no demand in assembly instrumentation for userspace apps so far. llvm-svn: 345702
*	ADT/STLExtras: Introduce llvm::empty; NFC	Matthias Braun	2018-10-31	4	-6/+6
\| \| \| \| \| \| \| \|	This is modeled after C++17 std::empty(). Differential Revision: https://reviews.llvm.org/D53909 llvm-svn: 345679
*	[InstCombine] use 'match' to reduce code; NFC	Sanjay Patel	2018-10-30	1	-92/+90
\| \| \| \|	llvm-svn: 345647
*	[InstCombine] Teach the move free before null test opti how to deal with noop casts	Quentin Colombet	2018-10-30	1	-12/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	InstCombine features an optimization that essentially turns: if (a) free(a) into: free(a) Right now, this optimization is gated by the minsize attribute and therefore we only perform it if we can prove that we are going to be able to eliminate the branch and the destination block. However, when casts are involved the optimization would fail to apply, because it was not smart enough to realize that it is possible to also move the casts away from the destination block and that this is harmless to performance since they are just noops. E.g., foo(int *a) if (a) free((char*)a) wouldn't be optimized by instcombine, because: - We would refuse to hoist the `bitcast i32* %a to i8*` in the source block - We would fail to see that `bitcast i32* %a to i8*` and %a are the same value. This patch fixes both these problems: - It teaches the pattern matching of the comparison how to look through casts. - It checks whether the additional instructions in the destination block can be hoisted and are harmless performance-wise. - It hoists all the code of the destination block into the source block. Differential Revision: D53356 llvm-svn: 345644
*	[GCOV] Function counters are wrong when on one line	Calixte Denizet	2018-10-30	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: After commit https://reviews.llvm.org/rL344228, the function definitions have a counter but when on one line the counter is wrong (e.g. void foo() { }) I added a test in: https://reviews.llvm.org/D53601 Reviewers: marco-c Reviewed By: marco-c Subscribers: llvm-commits, sylvestre.ledru Differential Revision: https://reviews.llvm.org/D53600 llvm-svn: 345624
*	[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)	Simon Pilgrim	2018-10-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Correct costing of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination! I've done my best to fix a number of vectorizer uses: SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment. LV - I'm not clear on what the SK_ExtractSubvector should represent for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta. Differential Revision: https://reviews.llvm.org/D53573 llvm-svn: 345617
*	[InstCombine] use getFltSemantics() instead of duplicating it; NFC	Sanjay Patel	2018-10-30	1	-19/+3
\| \| \| \|	llvm-svn: 345613
*	[InstCombine] try to turn shuffle into insertelement	Sanjay Patel	2018-10-30	1	-0/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC' The motivating case is at least a couple of steps away: I noticed that SLPVectorizer does not analyze shuffles as well as sequences of insert/extract in PR34724: https://bugs.llvm.org/show_bug.cgi?id=34724 ...so SLP may fail to vectorize when source code has shuffles to start with or instcombine has converted insert/extract to shuffles. Independent of that, an insertelement is always a simpler op for IR analysis vs. a shuffle, so we should transform to insert when possible. I don't think there's any codegen concern here - if a target can't insert a scalar directly to some fixed element in a vector (x86?), then this should get expanded to the insert+shuffle that we started with. Differential Revision: https://reviews.llvm.org/D53507 llvm-svn: 345607
*	[LoopVectorizer] Fix for cost values of memory accesses.	Jonas Paulsson	2018-10-30	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit is a combination of two patches: * "Fix in getScalarizationOverhead()" If target returns false in TTI.prefersVectorizedAddressing(), it means the address registers will not need to be extracted. Therefore, there should be no operands scalarization overhead for a load instruction. * "Don't pass the instruction pointer from getMemInstScalarizationCost." Since VF is always > 1, this is a cost query for an instruction in the vectorized loop and it should not be evaluated within the scalar context of the instruction. Review: Ulrich Weigand, Hal Finkel https://reviews.llvm.org/D52351 https://reviews.llvm.org/D52417 llvm-svn: 345603
*	[SROA] Use offset sizes from the DataLayout instead of the pointer sizes.	Nicola Zaghen	2018-10-30	1	-6/+6
\| \| \| \| \| \| \| \| \| \|	This fixes an assertion when constant folding a GEP when part of the offset was in i32 (IndexSize, as per DataLayout) and part in i64 (PointerSize) in the newly created test case. Differential Revision: https://reviews.llvm.org/D52609 llvm-svn: 345585
*	[HotColdSplitting] Allow outlining single-block cold regions	Vedant Kumar	2018-10-29	1	-3/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It can be profitable to outline single-block cold regions because they may be large. Allow outlining single-block regions if they have over some threshold of non-debug, non-terminator instructions. I chose 3 as the threshold after experimenting with several internal frameworks. In practice, reducing the threshold further did not give much improvement, whereas increasing it resulted in substantial regressions. Differential Revision: https://reviews.llvm.org/D53824 llvm-svn: 345524
*	[Local] Keep K's range if K does not move when combining metadata.	Florian Hahn	2018-10-27	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As K has to dominate I, IIUC I's range metadata must be a subset of K's. After Eli's recent clarification to the LangRef, loading a value outside of the range is undefined behavior. Therefore if I's range contains elements outside of K's range and we would load one such value, K would cause undefined behavior. In cases like hoisting/sinking, we still want the most generic range over all code paths to/from the hoist/sink point. As suggested in the patches related to D47339, I will refactor the handling of those scenarios and try to decouple it from this function as a follow-up, once we have switched to a similar handling of metadata in most of combineMetadata. I updated some tests that mostly check the merging of metadata to keep the metadata of the dominating load. The most interesting one is probably test8 in test/Transforms/JumpThreading/thread-loads.ll. It contained a comment about the alias metadata preventing us from eliminating the branch, but it seems like the actual problem currently is that we merge the ranges of both loads and cannot eliminate the icmp afterwards. With this patch, we manage to eliminate the icmp, as the range of the first load excludes 8. Reviewers: efriedma, nlopes, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D51629 llvm-svn: 345456
*	Fix -Wdocumentation warning. NFCI.	Simon Pilgrim	2018-10-27	1	-4/+4
\| \| \| \|	llvm-svn: 345454
*	Revert "[PassManager/Sanitizer] Enable usage of ported AddressSanitizer passes with -fsanitize=address"	Leonard Chan	2018-10-26	2	-119/+65
\| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 8d6af840396f2da2e4ed6aab669214ae25443204 and commit b78d19c287b6e4a9abc9fb0545de9a3106d38d3d, which cause slower build times by initializing the AddressSanitizer on every function run. The corresponding revisions are https://reviews.llvm.org/D52814 and https://reviews.llvm.org/D52739. llvm-svn: 345433
*	Pointer types were treated as zero-size by MergeICmps	Christy Lee	2018-10-26	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The visitICmp analysis function would record compares of pointer types, as size 0. This causes the resulting memcmp() call to have the wrong total size. Found with "self-build" of clang/LLVM on Windows. Reviewers: christylee, trentxintong, courbet Reviewed By: courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53536 llvm-svn: 345413
*	[SimpleLoopUnswitch] Unswitch by experimental.guard intrinsics	Max Kazantsev	2018-10-26	1	-2/+107
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for `llvm.experimental.guard` intrinsics to non-trivial simple loop unswitching. These intrinsics represent implicit control flow which has pretty much the same semantics as usual conditional branches. The algorithm for dealing with them is as follows: - Consider guards as unswitching candidates; - If a guard is considered the best candidate, turn it into a branch; - Apply the normal unswitching algorithm on this branch. The patch has no compile time effect on code that does not contain any guards. Differential Revision: https://reviews.llvm.org/D53744 Reviewed By: chandlerc llvm-svn: 345387
*	[SimpleLoopUnswitch] Make all checks before actual non-trivial unswitch	Max Kazantsev	2018-10-26	1	-18/+20
\| \| \| \| \| \| \| \| \| \| \|	We should be able to make all relevant checks before we actually start the non-trivial unswitching, so that we could guarantee that once we have started the transform, it will always succeed. Reviewed By: chandlerc Differential Revision: https://reviews.llvm.org/D53747 llvm-svn: 345375
*	[FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes	Cameron McInally	2018-10-25	1	-2/+3
\| \| \| \| \| \| \| \| \|	Replacing BinaryOperator::isFNeg(...) to avoid regressions when we separate FNeg from the FSub IR instruction. Differential Revision: https://reviews.llvm.org/D53650 llvm-svn: 345295
*	[DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG.	Carlos Alberto Enciso	2018-10-25	2	-18/+45
\| \| \| \| \| \| \| \|	When SimplifyCFG changes the PHI node into a select instruction, the debug line records become ambiguous. This causes the debugger to display unreachable source lines. Differential Revision: https://reviews.llvm.org/D53287 llvm-svn: 345250
*	Add -instcombine-code-sinking option	Gabor Buella	2018-10-25	1	-1/+5
\| \| \| \| \| \| \| \| \| \|	Reviewers: craig.topper, andrew.w.kaylor, efriedma Reviewed By: craig.topper, andrew.w.kaylor, efriedma Differential Revision: https://reviews.llvm.org/D52709 llvm-svn: 345248
*	Update MemorySSA in LoopRotate.	Alina Sbirlea	2018-10-24	2	-13/+75
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Teach LoopRotate to preserve MemorySSA. Enable tests for correctness, dependency disabled by default. Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D51718 llvm-svn: 345216
*	[HotColdSplitting] Identify larger cold regions using domtree queries	Vedant Kumar	2018-10-24	2	-201/+192
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current splitting algorithm works in three stages: 1) Identify cold blocks, then 2) Use forward/backward propagation to mark hot blocks, then 3) Grow a SESE region of blocks outside of the set of hot blocks and start outlining. While testing this pass on Apple internal frameworks I noticed that some kinds of control flow (e.g. loops) are never outlined, even though they unconditionally lead to / follow cold blocks. I noticed two other issues related to how cold regions are identified: - An inconsistency can arise in the internal state of the hotness propagation stage, as a block may end up in both the ColdBlocks set and the HotBlocks set. Further inconsistencies can arise as these sets do not match what's in ProfileSummaryInfo. - It isn't necessary to limit outlining to single-exit regions. This patch teaches the splitting algorithm to identify maximal cold regions and outline them. A maximal cold region is defined as the set of blocks post-dominated by a cold sink block, or dominated by that sink block. This approach can successfully outline loops in the cold path. As a side benefit, it maintains less internal state than the current approach. Due to a limitation in CodeExtractor, blocks within the maximal cold region which aren't dominated by a single entry point (a so-called "max ancestor") are filtered out. Results: - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to 47KB pre-patch, or a ~3x improvement. Did not see a performance impact across two runs. - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB of TEXT were outlined. Ditto re: performance impact. - Outlining results improve marginally in the internal frameworks I tested. Follow-ups: - Outline more than once per function, outline large single basic blocks, & try to remove unconditional branches in outlined functions. Differential Revision: https://reviews.llvm.org/D53627 llvm-svn: 345209
*	[hot-cold-split] Name split functions with ".cold" suffix	Teresa Johnson	2018-10-24	2	-12/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The current default of appending "_"+entry block label to the new extracted cold function breaks demangling. Change the delimiter from "_" to "." to enable demangling. Because the header block label will be empty for release compile code, use "extracted" after the "." when the label is empty. Additionally, add a mechanism for the client to pass in an alternate suffix applied after the ".", and have the hot cold split pass use "cold."+Count, where the Count is currently 1 but can be used to uniquely number multiple cold functions split out from the same function with D53588. Reviewers: sebpop, hiraditya Subscribers: llvm-commits, erik.pilkington Differential Revision: https://reviews.llvm.org/D53534 llvm-svn: 345178
*	[InstCombine] try harder to form select from logic ops (2nd try)	Sanjay Patel	2018-10-24	2	-30/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original patch was committed here: rL344609 ...and reverted: rL344612 ...because it did not properly check/test data types before calling ComputeNumSignBits(). The tests that caused bot failures for the previous commit are over-reaching front-end tests that run the entire -O optimizer pipeline: Clang :: CodeGen/builtins-systemz-zvector.c Clang :: CodeGen/builtins-systemz-zvector2.c I've added a negative test here to ensure coverage for that case. The new early exit check also tests the type of the 'B' parameter, so we don't waste time on matching if either value is unsuitable. Original commit message: This is part of solving PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 The patterns shown here are a special case of something that we already convert to select. Using ComputeNumSignBits() catches that case (but not the more complicated motivating patterns yet). The backend has hooks/logic to convert back to logic ops if that's better for the target. llvm-svn: 345149
*	[FPEnv] Convert more BinaryOperator::isFNeg(...) to m_FNeg(...)	Cameron McInally	2018-10-24	1	-10/+7
\| \| \| \| \| \| \| \|	This work is to avoid regressions when we separate FNeg from the FSub IR instruction. Differential Revision: https://reviews.llvm.org/D53205 llvm-svn: 345146
*	Revert r345114	Gil Rapaport	2018-10-24	1	-40/+12
\| \| \| \| \| \|	Investigating fails. llvm-svn: 345123
*	[LV] Don't have fold-tail under optsize invalidate interleave-groups when masked-interleaving is enabled	Dorit Nuzman	2018-10-24	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable interleave-groups under fold-tail scenario for Opt for size compilation; D50480 added support for vectorizing loops of arbitrary trip-count without a remainder, which in turn makes everything in the loop conditional, including interleave-groups if any. It therefore invalidated all interleave-groups because we didn't have support for vectorizing predicated interleaved-groups at the time. In the meantime, D53011 introduced this support, so we don't have to invalidate interleave-groups when masked-interleaved support is enabled. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D53559 llvm-svn: 345115
*	[LSR] Combine unfolded offset into invariant register	Gil Rapaport	2018-10-24	1	-12/+40
\| \| \| \| \| \| \| \| \| \| \| \|	LSR reassociates constants as unfolded offsets when the constants fit as immediate add operands, which currently prevents such constants from being combined later with loop invariant registers. This patch modifies GenerateCombinations() to generate a second formula which includes the unfolded offset in the combined loop-invariant register. Differential Revision: https://reviews.llvm.org/D51861 llvm-svn: 345114
*	[PM] keeping history when original SCC split and then merge into itself in the same round of SCC update.	Wei Mi	2018-10-23	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In https://reviews.llvm.org/rL309784, inline history was added to prevent infinite inlining across multiple runs of the inliner and SCC update, but the history is only kept when a new SCC is actually generated during SCC update. We found a case where an SCC can be split and then merged into itself in the same round of SCC update, so the same SCC is popped from UR.CWorklist and then added back immediately, without any new SCC being generated; that is why the existing patch cannot catch the infinite inlining case. With this patch, even if no new SCC is generated, the inline history is kept if the current SCC appears in UR.CWorklist again. Differential Revision: https://reviews.llvm.org/D52915 llvm-svn: 345103
*	[HotColdSplitting] Attach MinSize to outlined code	Vedant Kumar	2018-10-23	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Outlined code is cold by assumption, so it makes sense to optimize it for minimal code size rather than performance. After r344869 moved the splitting pass to the end of the IR pipeline, this does not result in much of a code size reduction. This is probably because a comparatively small number of backend transforms make use of the MinSize hint. Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of 919 bytes of TEXT reduction. I didn't measure a significant performance impact. Differential Revision: https://reviews.llvm.org/D53518 llvm-svn: 345072
*	[InstCombine] use 'match' to simplify code	Sanjay Patel	2018-10-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	There's probably some vector-with-undef-element pattern that shows an improvement, so this is probably not quite 'NFC'. This is the last step towards removing the fake binop queries for not/neg, i.e., there are no more uses of those functions in trunk. Fneg should follow. llvm-svn: 345050
*	[DebugInfo][GlobalOpt] Fix -debugify for globalopt shrinking globals to booleans.	Jordan Rupprecht	2018-10-23	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: TryToShrinkGlobalToBoolean, when possible, will split store <value> + load <value> into store <bool> + select <bool ? value : 0>. This preserves DebugLoc during that pass. Fixes PR37959. The test case here is the simplified .ll for: ``` static int foo; int bar() { foo = 5; return foo; } ``` Reviewers: dblaikie, gbedwell, aprantl Reviewed By: dblaikie Subscribers: mehdi_amini, JDevlieghere, dexonsmith, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D53531 llvm-svn: 345046
*	[Reassociate] replace fake binop queries with 'match' API	Sanjay Patel	2018-10-23	1	-18/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to update this code before introducing an 'fneg' instruction in IR, so we might as well kill off the integer neg/not queries too. This is no-functional-change-intended for scalar code and most vector code. For vectors, we can see that the 'match' API allows for undef elements in constants, so we optimize those cases better. Ideally, there would be a test for each code diff, but I don't see evidence of that for the existing code, so I didn't try very hard to come up with new vector tests for each code change. Differential Revision: https://reviews.llvm.org/D53533 llvm-svn: 345042
*	[SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions	Simon Pilgrim	2018-10-23	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Expand arithmetic reduction to include mul/and/or/xor instructions. This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests. This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason. Differential Revision: https://reviews.llvm.org/D53473 llvm-svn: 345037
*	[InstCombine] use 'match' to handle vectors and simplify code	Sanjay Patel	2018-10-23	1	-2/+3
\| \| \| \| \| \| \|	This is another step towards completely removing the fake binop queries for not/neg/fneg. llvm-svn: 345036
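
Note on the ADT/STLExtras entry (llvm-svn: 345679): the log says llvm::empty is modeled after C++17 std::empty(). As a rough illustration only, a range-emptiness helper in that spirit could look like the sketch below. The name `range_is_empty` is hypothetical, and this is not the code from that commit; it only assumes that a range is empty when its begin and end iterators compare equal.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper (an assumption for illustration, not LLVM's code):
// a range is treated as empty when begin(R) == end(R).
template <typename Range>
constexpr bool range_is_empty(const Range &R) {
  using std::begin; // fall back to std::begin/std::end,
  using std::end;   // but allow ADL to find custom overloads
  return begin(R) == end(R);
}

// Works at compile time for built-in arrays (C++14 and later).
constexpr int kTwoValues[] = {1, 2};
static_assert(!range_is_empty(kTwoValues), "array with elements is not empty");
```

A helper like this accepts anything that exposes begin/end, including types that have no `.empty()` member, which is presumably the kind of call site such a utility is meant to tidy up.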