path: root/llvm/test/Transforms
Commit message | Author | Age | Files | Lines
...
* [InstCombine] not(sub X, Y) --> add (not X), Y | Sanjay Patel | 2018-07-27 | 2 | -10/+8
  The tests with constants show a missing optimization. Analysis for adds is
  better than subs, so this can also help with other transforms. And codegen
  is better with adds for targets like x86 (destructive ops, no sub-from).
  https://rise4fun.com/Alive/llK
  llvm-svn: 338118
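  For illustration, a minimal IR sketch of this fold (function and value names
  are hypothetical, not taken from the committed tests). It relies on the
  two's-complement identity ~(x - y) == -(x - y) - 1 == (y - x) - 1 == ~x + y:
  ```
  define i32 @not_of_sub(i32 %x, i32 %y) {
    ; before: not(sub %x, %y); 'not' is spelled xor with -1 in IR
    %sub = sub i32 %x, %y
    %not = xor i32 %sub, -1
    ret i32 %not
  }
  ; after instcombine (equivalent, and friendlier to later analysis):
  ;   %notx = xor i32 %x, -1
  ;   %r    = add i32 %notx, %y
  ```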
* [InstCombine] add tests for not+sub; NFC | Sanjay Patel | 2018-07-27 | 1 | -0/+110
  llvm-svn: 338117
* [SimplifyIndVar] Canonicalize comparisons to unsigned while eliminating truncs | Max Kazantsev | 2018-07-27 | 1 | -0/+31
  This is a follow-up for the patch rL335020. When we replace compares against
  a trunc with compares against the wide IV, we can also replace signed
  predicates with unsigned ones where it is legal.
  Reviewed By: reames
  Differential Revision: https://reviews.llvm.org/D48763
  llvm-svn: 338115
* Revert "[LV][DebugInfo] Set DL to the middle block Icmp instruction"Anastasis Grammenos2018-07-271-33/+0
| | | | | | This reverts commit r338106. llvm-svn: 338109
* [InstSimplify] tests for D48828: fold extraction from std::pair | Hiroshi Inoue | 2018-07-27 | 3 | -0/+305
  This commit includes unit tests for D48828, which enhances InstSimplify to
  enable jump threading with a method whose return type is std::pair<int, bool>
  or std::pair<bool, int>. I am going to commit the actual transformation later.
  llvm-svn: 338107
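  As a hedged sketch of the IR shape involved (the committed tests exercise
  more complex control flow; names and types here are illustrative only): a
  std::pair<int, bool> return value lowers to a two-element aggregate, and
  extracting a just-inserted component simplifies back to the scalar:
  ```
  define { i32, i1 } @make_pair(i32 %v, i1 %ok) {
    %p0 = insertvalue { i32, i1 } undef, i32 %v, 0
    %p1 = insertvalue { i32, i1 } %p0, i1 %ok, 1
    ret { i32, i1 } %p1
  }
  ; After inlining, the extraction folds away, exposing %ok directly to
  ; jump threading:
  ;   %b = extractvalue { i32, i1 } %p1, 1   ; --> %ok
  ```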
* [LV][DebugInfo] Set DL to the middle block Icmp instruction | Anastasis Grammenos | 2018-07-27 | 1 | -0/+33
  Reviewers: hsaito
  Differential Revision: https://reviews.llvm.org/D49746
  llvm-svn: 338106
* [InstCombine] canonicalize abs pattern | Chen Zheng | 2018-07-27 | 3 | -57/+121
  Differential Revision: https://reviews.llvm.org/D48754
  llvm-svn: 338092
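  For reference, a sketch of the classic select-based abs idiom that such
  patterns canonicalize toward (one common variant; the exact set of matched
  forms is in D48754):
  ```
  define i32 @abs(i32 %x) {
    %neg = sub i32 0, %x
    %cmp = icmp slt i32 %x, 0
    %abs = select i1 %cmp, i32 %neg, i32 %x
    ret i32 %abs
  }
  ; Equivalent variants differ in the compare (e.g. icmp sgt %x, -1)
  ; and in the order of the select arms.
  ```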
* [InstrProf] Use comdats on COFF for available_externally functions | Reid Kleckner | 2018-07-26 | 1 | -13/+12
  Summary:
  r262157 added ELF-specific logic to put a comdat on the __profc_* globals
  created for available_externally functions. We should be able to generalize
  that logic to all object file formats that support comdats, i.e. everything
  other than MachO. This fixes duplicate symbol errors, since on COFF,
  linkonce_odr doesn't make the symbol weak.

  Fixes PR38251.

  Reviewers: davidxl, xur
  Subscribers: hiraditya, dmajor, llvm-commits, aheejin
  Differential Revision: https://reviews.llvm.org/D49882
  llvm-svn: 338082
* [DebugInfo] LowerDbgDeclare: Add derefs when handling CallInst users | Vedant Kumar | 2018-07-26 | 1 | -0/+183
  LowerDbgDeclare inserts a dbg.value before each use of an address described
  by a dbg.declare. When inserting a dbg.value before a CallInst use, however,
  it fails to append DW_OP_deref to the DIExpression. The DW_OP_deref is
  needed to reflect the fact that a dbg.value describes a source variable
  directly (as opposed to a dbg.declare, which relies on pointer indirection).

  This patch adds in the DW_OP_deref where needed. This results in the correct
  values being shown during a debug session for a program compiled with ASan
  and optimizations (see https://reviews.llvm.org/D49520). Note that
  ConvertDebugDeclareToDebugValue is already correct -- no changes there were
  needed.

  One complication is that SelectionDAG is unable to distinguish between
  direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also fixes
  this long-standing issue in order to not regress integration tests relying
  on the incorrect assumption that all frame-index SDDbgValues are indirect.
  This is a necessary fix: the newly-added DW_OP_derefs cannot be lowered
  properly otherwise. Basically the fix prevents a direct SDDbgValue with
  DIExpression(DW_OP_deref) from being dereferenced twice by a debugger. There
  were a handful of tests relying on this incorrect "FRAMEIX => indirect"
  assumption which actually had incorrect DW_AT_locations: these are all fixed
  up in this patch.

  Testing:
  - check-llvm, and an end-to-end test using lldb to debug an optimized
    program.
  - Existing unit tests for DIExpression::appendToStack fully cover the new
    DIExpression::append utility.
  - check-debuginfo (the debug info integration tests)

  Differential Revision: https://reviews.llvm.org/D49454
  llvm-svn: 338069
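  A sketch of the difference in question (an IR fragment only; the metadata
  ids are hypothetical and the surrounding DILocalVariable definitions are
  omitted):
  ```
  ; dbg.declare describes the variable through its address, with the
  ; indirection implicit:
  ;   call void @llvm.dbg.declare(metadata i32* %x.addr, metadata !12,
  ;                               metadata !DIExpression())
  ; A dbg.value describes the variable directly, so the one inserted
  ; before a CallInst use of %x.addr must spell the indirection out:
  ;   call void @llvm.dbg.value(metadata i32* %x.addr, metadata !12,
  ;                             metadata !DIExpression(DW_OP_deref))
  ```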
* [InstCombine] fold udiv with common factor from muls with nuw | Sanjay Patel | 2018-07-26 | 1 | -12/+4
  Unfortunately, sdiv isn't as simple because of UB due to overflow.
  This fold is mentioned in PR38239:
  https://bugs.llvm.org/show_bug.cgi?id=38239
  llvm-svn: 338059
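  A minimal sketch of the fold as the title states it (names hypothetical):
  with both multiplies carrying nuw, the two products stand in the exact
  rational ratio x/y, so the truncated quotients agree:
  ```
  define i32 @udiv_common_factor(i32 %x, i32 %y, i32 %z) {
    %num = mul nuw i32 %x, %z
    %den = mul nuw i32 %y, %z
    %div = udiv i32 %num, %den   ; --> udiv i32 %x, %y
    ret i32 %div
  }
  ```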
* [InstCombine] add tests for udiv with common factor; NFC | Sanjay Patel | 2018-07-26 | 1 | -0/+82
  This fold is mentioned in PR38239:
  https://bugs.llvm.org/show_bug.cgi?id=38239
  The general case probably belongs in -reassociate, but given that we do
  basic reassociation optimizations similar to this in instcombine already,
  we might as well be consistent within instcombine and handle this pattern?
  llvm-svn: 338038
* [ConstProp] Fix calls-math-finite.ll on FreeBSD | Fangrui Song | 2018-07-26 | 1 | -1/+1
  FreeBSD's log(3.0) is less precise than glibc and musl. Let's forgive its
  rounding error of more than half an ulp.
  llvm-svn: 338009
* [GlobalMerge] Handle llvm.compiler.used correctly. | Eli Friedman | 2018-07-25 | 1 | -0/+29
  Reuse the handling for llvm.used, and don't transform such globals.
  Fixes a failure on the asan buildbot caused by my previous commit.
  llvm-svn: 337973
* [LSV] Look through selects for consecutive addresses | Roman Tereshin | 2018-07-25 | 1 | -0/+95
  In some cases LSV sees (load/store _ (select _ <pointer expression>
  <pointer expression>)) patterns in input IR, often due to sinking and other
  forms of CFG simplification, sometimes interspersed with bitcasts and
  all-constant-indices GEPs.

  With this patch, the `areConsecutivePointers` method attempts to handle
  select instructions. This leads to an increased number of successful
  vectorizations.

  Technically, select instructions could appear in index arithmetic as well;
  however, we don't see those in our test suites / benchmarks. Also, there is
  a lot more freedom in IR shapes computing integral indices in general than
  in what's common in pointer computations, and it appears quite unreliable to
  do anything short of making select instructions first-class citizens of
  Scalar Evolution, which for the purposes of this patch is most definitely
  overkill.

  Reviewed By: rampitec
  Differential Revision: https://reviews.llvm.org/D49428
  llvm-svn: 337965
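  A hedged sketch of the newly handled shape (hypothetical function; both
  selects use the same condition, and each pair of arms is consecutive):
  ```
  define i32 @load_pair(i1 %c, i32* %a, i32* %b) {
    %a1 = getelementptr inbounds i32, i32* %a, i64 1
    %b1 = getelementptr inbounds i32, i32* %b, i64 1
    %p0 = select i1 %c, i32* %a, i32* %b
    %p1 = select i1 %c, i32* %a1, i32* %b1
    ; %p0 and %p1 are consecutive whichever way %c goes, so LSV can now
    ; treat the two loads as a vectorizable <2 x i32> pair.
    %v0 = load i32, i32* %p0, align 4
    %v1 = load i32, i32* %p1, align 4
    %s = add i32 %v0, %v1
    ret i32 %s
  }
  ```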
* [GlobalMerge] Allow merging globals with arbitrary alignment. | Eli Friedman | 2018-07-25 | 4 | -6/+48
  Instead of depending on implicit padding from the structure layout code, use
  a packed struct and emit the padding explicitly.
  Differential Revision: https://reviews.llvm.org/D49710
  llvm-svn: 337961
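  A sketch of the layout change (global names and initializers assumed for
  illustration): with a packed struct the pass controls padding itself rather
  than trusting natural struct layout:
  ```
  ; Before merging: two globals with different natural alignments.
  @g1 = internal global i16 0, align 2
  @g2 = internal global i32 0, align 4

  ; After (sketch): a packed struct with the 2 padding bytes spelled out.
  @_MergedGlobals = private global <{ i16, [2 x i8], i32 }>
                    <{ i16 0, [2 x i8] zeroinitializer, i32 0 }>, align 4
  ```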
* Revert r337904: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions. | Florian Hahn | 2018-07-25 | 1 | -2/+2
  I suspect it is causing the clang-stage2-Rthinlto failures.
  llvm-svn: 337956
* [SCEV] Add [zs]ext{C,+,x} -> (D + [zs]ext{C-D,+,x})<nuw><nsw> transform | Roman Tereshin | 2018-07-25 | 2 | -1/+163
  as well as sext(C + x + ...) -> (D + sext(C-D + x + ...))<nuw><nsw>, similar
  to the equivalent transformation for zext's, if the top-level addition in
  (D + (C-D + x * n)) could be proven to not wrap, where the choice of D also
  maximizes the number of trailing zeroes of (C-D + x * n), ensuring
  homogeneous behaviour of the transformation and better canonicalization of
  such AddRec's (indeed, there are 2^(2w) different expressions in
  `B1 + ext(B2 + Y)` form for the same Y, but only 2^(2w - k) different
  expressions in the resulting `B3 + ext((B4 * 2^k) + Y)` form, where w is the
  bit width of the integral type).

  This patch generalizes the sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
  sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformations added in r209568,
  relaxing the requirements the following way:

  1. C2 doesn't have to be a power of 2; it's enough if it's divisible by 2
     a sufficient number of times.
  2. C1 doesn't have to be less than C2; instead of extracting the entire C1
     we can split it into 2 terms (00...0XXX + YY...Y000), keep the second one
     that may cause wrapping within the extension operator, and move the first
     one that doesn't affect wrapping out of the extension operator, enabling
     further simplifications.
  3. C1 and C2 don't have to be positive; splitting C1 as shown above produces
     a sum that is guaranteed to not wrap, signed or unsigned.
  4. In the AddExpr case there could be more than 2 terms, and in the case of
     AddExpr the 2nd and following terms, and in the case of AddRecExpr the
     Step component, don't have to be in the C2*X form or constant
     (respectively); they just need to have enough trailing zeros, which in
     turn could be guaranteed by means other than arithmetic, e.g. by a
     pointer alignment.
  5. The extension operator doesn't have to be a sext; the same transformation
     works and is profitable for zext's as well.

  Apparently, optimizations like SLPVectorizer currently fail to vectorize
  even rather trivial cases like the following:

    double bar(double *a, unsigned n) {
      double x = 0.0;
      double y = 0.0;
      for (unsigned i = 0; i < n; i += 2) {
        x += a[i];
        y += a[i + 1];
      }
      return x * y;
    }

  If compiled with `clang -std=c11 -Wpedantic -Wall -O3 main.c -S -o -
  -emit-llvm` (!{!"clang version 7.0.0 (trunk 337339) (llvm/trunk 337344)"})
  it produces scalar code with the loop not unrolled with the unsigned `n`
  and `i` (as shown above), but a vectorized and unrolled loop with signed `n`
  and `i`. With the changes made in this commit the unsigned version will be
  vectorized (though not unrolled, for unclear reasons).

  How it all works:

  Let's say we have an AddExpr that looks like (C + x + y + ...), where C is
  a constant and x, y, ... are arbitrary SCEVs. Let's compute the minimum
  number of trailing zeroes guaranteed for that sum without the constant
  term, (x + y + ...). If, for example, those terms look as follows:

              i
    XXXX...X000
    YYYY...YY00
       ...
    ZZZZ...0000

  then the rightmost non-guaranteed-zero bit (a potential one at the i-th
  position above) can change the bits of the sum to the left (and at the i-th
  position itself), but it cannot possibly change the bits to the right. So
  we can compute the number of trailing zeroes as the minimum over the
  numbers of trailing zeroes of the terms.

  Now let's say that our original sum with the constant is effectively just
  C + X, where X = x + y + .... Let's also say that we've got 2 guaranteed
  trailing zeros for X:

              j
    CCCC...CCCC
    XXXX...XX00 // this is X = (x + y + ...)

  Any bit of C to the left of j may in the end cause the C + X sum to wrap,
  but the rightmost 2 bits of C (at positions j and j - 1) do not affect
  wrapping in any way. If the upper bits cause a wrap, it will be a wrap
  regardless of the values of the 2 least significant bits of C. If the upper
  bits do not cause a wrap, it won't be a wrap regardless of the values of
  the 2 bits on the right (again).

  So let's split C into 2 constants as follows:

    0000...00CC = D
    CCCC...CC00 = (C - D)

  and represent the whole sum as D + (C - D + X). The second term of this new
  sum looks like this:

    CCCC...CC00
    XXXX...XX00
    ----------- // let's add them up
    YYYY...YY00

  The sum above (let's call it Y) may or may not wrap, we don't know, so we
  need to keep it under a sext/zext. Adding D to that sum, though, will never
  wrap, signed or unsigned, whether performed on the original bit width or
  the extended one, because all the final add does is set the 2 least
  significant bits of Y to the bits of D:

    YYYY...YY00 = Y
    0000...00CC = D
    ----------- <nuw><nsw>
    YYYY...YYCC

  This means we can safely move that D out of the sext or zext and claim that
  the top-level sum neither sign-wraps nor unsigned-wraps.

  Let's run an example. Say we're working in i8's and the original expression
  (the zext's or sext's operand) is 21 + 12x + 8y. It goes like this:

    0001 0101 // 21
    XXXX XX00 // 12x
    YYYY Y000 // 8y

    0001 0101 // 21
    ZZZZ ZZ00 // 12x + 8y

    0000 0001 // D
    0001 0100 // 21 - D = 20
    ZZZZ ZZ00 // 12x + 8y

    0000 0001 // D
    WWWW WW00 // 21 - D + 12x + 8y = 20 + 12x + 8y

  therefore zext(21 + 12x + 8y) = (1 + zext(20 + 12x + 8y))<nuw><nsw>.

  This approach could be improved if we move away from using trailing zeroes
  and use KnownBits instead. For instance, with KnownBits we could have the
  following picture:

           i
    10 1110...0011 // this is C
    XX X1XX...XX00 // this is X = (x + y + ...)

  Notice that some of the bits of X are known ones, and also that the known
  bits of X are interspersed with unknown bits, not grouped on the right or
  left.

  We can see at position i that C(i) and X(i) are both known ones, therefore
  the (i + 1)-th carry bit is guaranteed to be 1 regardless of the bits of C
  to the right of i. For instance, the C(i - 1) bit only affects the bits of
  the sum at positions i - 1 and i, and does not influence whether the sum is
  going to wrap or not. Therefore we could split the constant C the following
  way:

           i
    00 0010...0011 = D
    10 1100...0000 = (C - D)

  Let's compute the KnownBits of (C - D) + X:

    XX1 1 = carry bit, blanks stand for known zeroes
     10 1100...0000 = (C - D)
     XX X1XX...XX00 = X
    --- -----------
     XX X0XX...XX00

  Whether this add wraps or not essentially depends on the bits of X. Adding
  D to this sum, however, is guaranteed not to wrap:

      0 X
     00 0010...0011 = D
     sX X0XX...XX00 = (C - D) + X
    --- -----------
     sX XXXX...XX11

  As can be seen above, adding D preserves the sign bit of (C - D) + X, if
  any, and has a guaranteed 0 carry out, as expected.

  The more bits of (C - D) we constrain, the better the transformations
  introduced here canonicalize expressions, as it leaves less freedom in
  what values the constant part of ((C - D) + x + y + ...) can take.

  Reviewed By: mzolotukhin, efriedma
  Differential Revision: https://reviews.llvm.org/D48853
  llvm-svn: 337943
* Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions. | Florian Hahn | 2018-07-25 | 1 | -2/+2
  r337828 resolves a PredicateInfo issue with unnamed types.

  Original message:
  This patch updates IPSCCP to use PredicateInfo to propagate facts to true
  branches predicated by EQ and to false branches predicated by NE.

  As a follow up, we should be able to extend it to also propagate additional
  facts about nonnull.

  Reviewers: davide, mssimpso, dberlin, efriedma
  Reviewed By: davide, dberlin
  llvm-svn: 337904
* [LV] Fix for PR38110, LV encountered llvm_unreachable() | Hideki Saito | 2018-07-24 | 1 | -0/+50
  Summary:
  truncateToMinimalBitWidths() doesn't handle all Instructions, and the worst
  case is a compiler crash via llvm_unreachable(). The fix is to add a case to
  handle PHINode and to change the worst case to a NO-OP (from a compiler
  crash).
  Reviewers: sbaranga, mssimpso, hsaito
  Reviewed By: hsaito
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D49461
  llvm-svn: 337861
* [SCEV] Add zext(C + x + ...) -> D + zext(C-D + x + ...)<nuw><nsw> transform | Roman Tereshin | 2018-07-24 | 1 | -0/+78
  if the top-level addition in (D + (C-D + x + ...)) could be proven to not
  wrap, where the choice of D also maximizes the number of trailing zeroes of
  (C-D + x + ...), ensuring homogeneous behaviour of the transformation and
  better canonicalization of such expressions.

  This enables better canonicalization of expressions like

    1 + zext(5 + 20 * %x + 24 * %y)  and
        zext(6 + 20 * %x + 24 * %y)

  which both get transformed to

    2 + zext(4 + 20 * %x + 24 * %y)

  This pattern is common in address arithmetic, and the transformation makes
  it easier for passes like LoadStoreVectorizer to prove that 2 or more
  memory accesses are consecutive and to optimize (vectorize) them.

  Reviewed By: mzolotukhin
  Differential Revision: https://reviews.llvm.org/D48853
  llvm-svn: 337859
* [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types. | Florian Hahn | 2018-07-24 | 6 | -67/+103
  This is a workaround and it would be better to fix this generally, but doing
  it generally is quite tricky. See D48541 and PR38117.

  Doing it in PredicateInfo directly allows us to use the type address to
  differentiate different unnamed types, because neither the created
  declarations nor the ssa_copy calls should be visible after PredicateInfo
  got destroyed.

  Reviewers: efriedma, davide
  Reviewed By: efriedma
  Differential Revision: https://reviews.llvm.org/D49126
  llvm-svn: 337828
* ConstantFolding: Avoid a crash. | Manoj Gupta | 2018-07-23 | 1 | -0/+19
  Summary:
  Check that the parent basic block and caller exist before calling
  CS.getCaller when constant folding the strip.invariant.group intrinsic.
  This avoids a crash when the function containing the intrinsic is being
  inlined: the instruction is checked for any simplification but has not yet
  been added to a basic block.
  Reviewers: Prazek, rsmith, efriedma
  Reviewed By: efriedma
  Subscribers: eraman, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49690
  llvm-svn: 337742
* [GVN] Don't use the eliminated load as an available value in phi construction | John Brawn | 2018-07-23 | 2 | -0/+121
  In ConstructSSAForLoadSet, if an available value is actually the load that
  we're doing SSA construction to eliminate, then we can omit it, as
  SSAUpdater will add in the value for the phi that will be replacing it
  anyway. This can result in simpler IR, which can allow further optimisation.
  Differential Revision: https://reviews.llvm.org/D44160
  llvm-svn: 337686
* [MemorySSAUpdater] Update Phi operands after trivial Phi elimination | Alexandros Lamprineas | 2018-07-23 | 1 | -0/+119
  Bug fix for PR37445. The underlying problem and its fix are similar to
  PR37808. The bug lies in MemorySSAUpdater::getPreviousDefRecursive(), where
  PhiOps is computed before the call to tryRemoveTrivialPhi() and ends up
  being out of date, pointing to stale data. We have now turned each of the
  PhiOps into a TrackingVH<MemoryAccess>.
  Differential Revision: https://reviews.llvm.org/D49425
  llvm-svn: 337680
* [GVNHoist] safeToHoistLdSt allows illegal hoisting | Alexandros Lamprineas | 2018-07-23 | 1 | -0/+76
  Bug fix for PR36787. When reasoning about whether it's safe to hoist a load,
  we want to make sure that the defining memory access dominates the new
  insertion point of the hoisted instruction. safeToHoistLdSt calls
  firstInBB(InsertionPoint, DefiningAccess), which returns false if
  InsertionPoint == DefiningAccess, and therefore it falsely thinks it's safe
  to hoist.
  Differential Revision: https://reviews.llvm.org/D49555
  llvm-svn: 337674
* [InstSimplify] fold sdiv if two operands are negated and non-overflowing | Chen Zheng | 2018-07-21 | 1 | -24/+7
  Differential Revision: https://reviews.llvm.org/D49382
  llvm-svn: 337642
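  A minimal sketch of the pattern (hypothetical names): the nsw flags on the
  negations rule out INT_MIN, the only value for which the sign-cancellation
  identity (-x)/(-y) == x/y would break:
  ```
  define i32 @sdiv_of_negs(i32 %x, i32 %y) {
    %negx = sub nsw i32 0, %x
    %negy = sub nsw i32 0, %y
    %div = sdiv i32 %negx, %negy   ; simplifies to: sdiv i32 %x, %y
    ret i32 %div
  }
  ```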
* Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size"Roman Tereshin2018-07-201-2/+51
| | | | | | | | This reapplies commit r337489 reverted by r337541 Additionally, this commit contains a speculative fix to the issue reported in r337541 (the report does not contain an actionable reproducer, just a stack trace) llvm-svn: 337606
* [NFC][testcases] fold sdiv if two operands are negated and non-overflowing | Chen Zheng | 2018-07-20 | 1 | -0/+147
  llvm-svn: 337549
* Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters. | Florian Hahn | 2018-07-20 | 1 | -13/+42
  This version contains a fix to add values for which the state in ParamState
  changes to the worklist if the state in ValueState did not change. To avoid
  adding the same value multiple times, mergeInValue returns true if it added
  the value to the worklist. The value is added to the worklist depending on
  its state in ValueState.

  Original message:
  For comparisons with parameters, we can use the ParamState lattice elements,
  which also provide constant range information. This improves the code for
  PR33253 further and gets us closer to using ValueLatticeElement for all
  values.

  Also, as we are using the range information in the solver directly, we do
  not need tryToReplaceWithConstantRange afterwards anymore.

  Reviewers: dberlin, mssimpso, davide, efriedma
  Reviewed By: mssimpso
  Differential Revision: https://reviews.llvm.org/D43762
  llvm-svn: 337548
* [InstSimplify] fold srem instruction if its two operands are negated. | Chen Zheng | 2018-07-20 | 1 | -25/+11
  Differential Revision: https://reviews.llvm.org/D49423
  llvm-svn: 337545
* [NFC][testcases] more testcases for folding srem if its two operands are negated. | Chen Zheng | 2018-07-20 | 1 | -0/+34
  llvm-svn: 337543
* Revert "[LSV] Refactoring + supporting bitcasts to a type of different size"Sam McCall2018-07-201-19/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit r337489. It causes asserts to fire in some TensorFlow tests, e.g. tensorflow/compiler/tests/gather_test.py on GPU. Example stack trace: Start test case: GatherTest.testHigherRank assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits" @ 0x5559446ebe10 __assert_fail @ 0x55593ef32f5e llvm::APInt::trunc() @ 0x55593d78f86e (anonymous namespace)::Vectorizer::lookThroughComplexAddresses() @ 0x55593d78f2bc (anonymous namespace)::Vectorizer::areConsecutivePointers() @ 0x55593d78d128 (anonymous namespace)::Vectorizer::isConsecutiveAccess() @ 0x55593d78c926 (anonymous namespace)::Vectorizer::vectorizeInstructions() @ 0x55593d78c221 (anonymous namespace)::Vectorizer::vectorizeChains() @ 0x55593d78b948 (anonymous namespace)::Vectorizer::run() @ 0x55593d78b725 (anonymous namespace)::LoadStoreVectorizer::runOnFunction() @ 0x55593edf4b17 llvm::FPPassManager::runOnFunction() @ 0x55593edf4e55 llvm::FPPassManager::runOnModule() @ 0x55593edf563c (anonymous namespace)::MPPassManager::runOnModule() @ 0x55593edf5137 llvm::legacy::PassManagerImpl::run() @ 0x55593edf5b71 llvm::legacy::PassManager::run() @ 0x55593ced250d xla::gpu::IrDumpingPassManager::run() @ 0x55593ced5033 xla::gpu::(anonymous namespace)::EmitModuleToPTX() @ 0x55593ced40ba xla::gpu::(anonymous namespace)::CompileModuleToPtx() @ 0x55593ced33d0 xla::gpu::CompileToPtx() @ 0x55593b26b2a2 xla::gpu::NVPTXCompiler::RunBackend() @ 0x55593b21f973 xla::Service::BuildExecutable() @ 0x555938f44e64 xla::LocalService::CompileExecutable() @ 0x555938f30a85 xla::LocalClient::Compile() @ 0x555938de3c29 tensorflow::XlaCompilationCache::BuildExecutable() @ 0x555938de4e9e tensorflow::XlaCompilationCache::CompileImpl() @ 0x555938de3da5 tensorflow::XlaCompilationCache::Compile() @ 0x555938c5d962 tensorflow::XlaLocalLaunchBase::Compute() @ 0x555938c68151 tensorflow::XlaDevice::Compute() @ 0x55593f389e1f tensorflow::(anonymous namespace)::ExecutorState::Process() @ 0x55593f38a625 tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()() *** SIGABRT received by PID 7798 (TID 7837) from PID 7798; *** llvm-svn: 337541
* [SCCP] Don't use markForcedConstant on branch conditions. | Eli Friedman | 2018-07-19 | 3 | -1/+89
  It's more aggressive than we need to be, and leads to strange workarounds in
  other places like call return value inference. Instead, just directly mark
  an edge viable.

  Tests by Florian Hahn.

  Differential Revision: https://reviews.llvm.org/D49408
  llvm-svn: 337507
* [LSV] Refactoring + supporting bitcasts to a type of different size | Roman Tereshin | 2018-07-19 | 1 | -2/+19
  This is mostly preparation work for adding limited support for select
  instructions. That proved to be difficult to do due to the size and
  irregularity of Vectorizer::isConsecutiveAccess; this is fixed here, I
  believe.

  It also turned out that these changes make it simpler to finish one of the
  TODOs and fix a number of other small issues, namely:

  1. Looking through bitcasts to a type of a different size (requires careful
     tracking of the original load/store size and some math converting sizes
     in bytes to expected differences in indices of GEPs); see the sketch
     after this list for the second point.

  2. Reusing the partial analysis of pointers done by the first attempt at
     proving them consecutive, instead of starting from scratch. This added
     limited support for nested GEPs co-existing with difficult sext/zext
     instructions. It also required careful handling of negative differences
     between the constant parts of offsets.

  3. Handling a case where the first pointer index is not an add, but
     something else (a function parameter, for instance).

  I observe an increased number of successful vectorizations on a large set
  of shader programs. Only a few shaders are affected, but those that are
  affected sport >5% fewer loads and stores than before the patch.

  Reviewed By: rampitec
  Differential Revision: https://reviews.llvm.org/D49342
  llvm-svn: 337489
* [LoadStoreVectorizer] Use getMinusSCEV() to compute the distance between two pointers. | Farhana Aleen | 2018-07-19 | 1 | -0/+49
  Summary:
  Currently, isConsecutiveAccess() detects two pointers (PtrA and PtrB) as
  consecutive by comparing PtrB with BaseDelta+PtrA. This works when both
  pointers are factorized or both of them are not factorized. But
  isConsecutiveAccess() fails if one of the pointers is factorized and the
  other one is not. Here is an example:

    PtrA = 4 * (A + B)
    PtrB = 4 + 4A + 4B

  This patch uses getMinusSCEV() to compute the distance between the two
  pointers. getMinusSCEV() allows combining the expressions and computing the
  simplified distance.

  Author: FarhanaAleen
  Reviewed By: rampitec
  Differential Revision: https://reviews.llvm.org/D49516
  llvm-svn: 337471
* [InstCombine] Re-commit: Fold 'check for [no] signed truncation' pattern | Roman Lebedev | 2018-07-18 | 2 | -28/+20
  Summary:
  [[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

  As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR
  for the 'check for [no] signed truncation' pattern can be improved:
  https://rise4fun.com/Alive/gBf
  ^ that pattern will be produced by the Implicit Integer Truncation
  sanitizer (https://reviews.llvm.org/D48958,
  https://bugs.llvm.org/show_bug.cgi?id=21530) in the signed case, therefore
  it is probably a good idea to improve it.

  The DAGCombine will reverse this transform, see
  https://reviews.llvm.org/D49266

  This transform is surprisingly frustrating. It does not deal with non-splat
  shift amounts, or with undef shift amounts. I've outlined what I think the
  solution should be:
  ```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
  ```

  This is a re-commit: the original patch, committed in rL337190, was
  reverted in rL337344 as it broke the Chromium build:
  https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832

  Proofs that the fixed folds are ok: https://rise4fun.com/Alive/VYM

  Differential Revision: https://reviews.llvm.org/D49320
  llvm-svn: 337376
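  Concretely, a sketch of the improved form described in the linked Alive
  proofs, shown for i8 with a shift of 5, i.e. a check that %x survives
  truncation to the 3 kept bits (constants follow KeptBits = bitwidth -
  MaskedBits; names hypothetical):
  ```
  define i1 @signed_trunc_check(i8 %x) {
    ; before: ((%x << 5) a>> 5) == %x
    %shl = shl i8 %x, 5
    %shr = ashr i8 %shl, 5
    %cmp = icmp eq i8 %shr, %x
    ret i1 %cmp
  }
  ; after the fold: %x is in [-4, 3] iff %x + 4 is unsigned-below 8
  ;   %add = add i8 %x, 4        ; 1 << (KeptBits - 1)
  ;   %cmp = icmp ult i8 %add, 8 ; 1 << KeptBits
  ```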
* [NFC][InstCombine] i65 tests for 'check for [no] signed truncation' pattern | Roman Lebedev | 2018-07-18 | 2 | -0/+28
  Those initially broke the Chromium build:
  https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832
  llvm-svn: 337364
* Revert test changes part of "Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern"" | Roman Lebedev | 2018-07-18 | 2 | -108/+108
  We want the test to remain good anyway. I think the fix is incoming.
  This reverts part of commit rL337344.
  llvm-svn: 337359
* Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern"Bob Haarman2018-07-182-110/+116
| | | | | | | | | This reverts r337190 (and a few follow-up commits), which caused the Chromium build to fail. See https://bugs.llvm.org/show_bug.cgi?id=38204 and https://crbug.com/864832 llvm-svn: 337344
* [InstCombine] Preserve debug value when simplifying cast-of-select | Vedant Kumar | 2018-07-17 | 1 | -0/+11
  InstCombine has a cast transform that matches a cast-of-select:

    Orig = cast (Src = select Cond TV FV)

  and tries to replace it with a select which has the cast folded in:

    NewSel = select Cond (cast TV) (cast FV)

  The combiner does RAUW(Orig, NewSel), so any debug values for Orig would
  survive the transform. But debug values for Src would be lost. This patch
  teaches InstCombine to replace all debug uses of Src with NewSel (taking
  care of doing any necessary DIExpression rewriting).

  Differential Revision: https://reviews.llvm.org/D49270
  llvm-svn: 337310
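  A hedged sketch of the situation in IR (metadata ids hypothetical,
  surrounding debug metadata omitted):
  ```
  define i64 @cast_of_select(i1 %c, i32 %a, i32 %b) {
    %src = select i1 %c, i32 %a, i32 %b
    ; a debug use of Src:
    ;   call void @llvm.dbg.value(metadata i32 %src, metadata !7,
    ;                             metadata !DIExpression())
    %orig = zext i32 %src to i64
    ret i64 %orig
  }
  ; After the transform the select is rebuilt over the casted arms:
  ;   %a.z    = zext i32 %a to i64
  ;   %b.z    = zext i32 %b to i64
  ;   %newsel = select i1 %c, i64 %a.z, i64 %b.z
  ; and with this patch the dbg.value for %src is rewritten to describe
  ; %newsel (with any needed DIExpression adjustment) instead of being
  ; dropped.
  ```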
* Remove an errant piece of !dbg metadata from a test, NFC | Vedant Kumar | 2018-07-17 | 1 | -1/+1
  llvm-svn: 337309
* [IPSCCP] Run Solve each time we resolved an undef in a function. | Florian Hahn | 2018-07-17 | 2 | -4/+48
  Once we have resolved an undef in a function we can run Solve, which could
  lead to finding a constant return value for the function, which in turn
  could turn undefs into constants in other functions that call it, before
  resolving undefs there.

  Computationally the amount of work we are doing stays the same; just the
  order we process things in is slightly different, and potentially there are
  a few fewer undefs to resolve.

  We are still relying on the order of functions in the IR, which means that,
  depending on the order, we are able to resolve the optimal undef first or
  not. For example, if @test1 comes before @testf, we find the constant
  return value of @testf too late and cannot use it while solving @test1.

  This on its own does not lead to more constants removed in the test-suite,
  probably because currently we have to be very lucky to visit applicable
  functions in the right order. Maybe we can come up with a better way of
  resolving undefs in more 'profitable' functions first.

  Reviewers: efriedma, mssimpso, davide
  Reviewed By: efriedma, davide
  Differential Revision: https://reviews.llvm.org/D49385
  llvm-svn: 337283
* [SLPVectorizer] Don't attempt horizontal reduction on pointer types (PR38191) | Simon Pilgrim | 2018-07-17 | 1 | -0/+128
  TTI::getMinMaxReductionCost typically can't handle pointer types - until
  this is changed it's better to limit horizontal reduction to integer/float
  vector types only.
  llvm-svn: 337280
* [NFC][testcases] add testcases for folding srem whose operands are negated | Chen Zheng | 2018-07-17 | 1 | -0/+49
  This finishes the same optimization done for the add instruction in D49216
  and the sdiv instruction in D49382. This patch is for the srem instruction.
  llvm-svn: 337270
* [testcases] move testcases to the right place - NFC | Chen Zheng | 2018-07-17 | 1 | -1/+1
  Differential Revision: https://reviews.llvm.org/D49409
  llvm-svn: 337230
* [NFC][InstCombine] Fine-tune 'check for [no] signed truncation' tests | Roman Lebedev | 2018-07-16 | 2 | -110/+110
  We are using i8 for these tests, and shifting by 4, which is exactly half of
  i8. But as seen from the proofs (https://rise4fun.com/Alive/mgu),
  KeptBits = bitwidth(%x) - MaskedBits, so by always shifting by half we are
  not really testing that we properly handle the other cases, with shifts not
  by half...
  llvm-svn: 337208
* [InstCombine] Fold 'check for [no] signed truncation' pattern | Roman Lebedev | 2018-07-16 | 2 | -22/+16
  Summary:
  [[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

  As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR
  for the 'check for [no] signed truncation' pattern can be improved:
  https://rise4fun.com/Alive/gBf
  ^ that pattern will be produced by the Implicit Integer Truncation
  sanitizer (https://reviews.llvm.org/D48958,
  https://bugs.llvm.org/show_bug.cgi?id=21530) in the signed case, therefore
  it is probably a good idea to improve it.

  Proofs for this transform: https://rise4fun.com/Alive/mgu

  This transform is surprisingly frustrating. It does not deal with non-splat
  shift amounts, or with undef shift amounts. I've outlined what I think the
  solution should be:
  ```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
  ```

  The DAGCombine will reverse this transform, see
  https://reviews.llvm.org/D49266

  Reviewers: spatel, craig.topper
  Reviewed By: spatel
  Subscribers: JDevlieghere, rkruppe, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49320
  llvm-svn: 337190
* Restore "[ThinLTO] Ensure we always select the same function copy to import"Teresa Johnson2018-07-165-95/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit r337081, therefore restoring r337050 (and fix in r337059), with test fix for bot failure described after the original description below. In order to always import the same copy of a linkonce function, even when encountering it with different thresholds (a higher one then a lower one), keep track of the summary we decided to import. This ensures that the backend only gets a single definition to import for each GUID, so that it doesn't need to choose one. Move the largest threshold the GUID was considered for import into the current module out of the ImportMap (which is part of a larger map maintained across the whole index), and into a new map just maintained for the current module we are computing imports for. This saves some memory since we no longer have the thresholds maintained across the whole index (and throughout the in-process backends when doing a normal non-distributed ThinLTO build), at the cost of some additional information being maintained for each invocation of ComputeImportForModule (the selected summary pointer for each import). There is an additional map lookup for each callee being considered for importing, however, this was able to subsume a map lookup in the Worklist iteration that invokes computeImportForFunction. We also are able to avoid calling selectCallee if we already failed to import at the same or higher threshold. I compared the run time and peak memory for the SPEC2006 471.omnetpp benchmark (running in-process ThinLTO backends), as well as for a large internal benchmark with a distributed ThinLTO build (so just looking at the thin link time/memory). Across a number of runs with and without this change there was no significant change in the time and memory. (I tried a few other variations of the change but they also didn't improve time or peak memory). The new commit removes a test that no longer makes sense (Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the reverse-iteration bot. The test depends on the order of processing the summary call edges, and actually depended on the old problematic behavior of selecting more than one summary for a given GUID when encountered with different thresholds. There was no guarantee even before that we would eventually pick the linkonce copy with the hottest call edges, it just happened to work with the test and the old code, and there was no guarantee that we would end up importing the selected version of the copy that had the hottest call edges (since the backend would effectively import only one of the selected copies). Reviewers: davidxl Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D48670 llvm-svn: 337184
* [InstSimplify] add testcases for folding sdiv if two operands are negated and non-overflowing | Chen Zheng | 2018-07-16 | 1 | -0/+49
  Differential Revision: https://reviews.llvm.org/D49365
  llvm-svn: 337179
* [MemorySSAUpdater] Remove deleted trivial Phis from active workset | Alexandros Lamprineas | 2018-07-16 | 1 | -0/+40
  Bug fix for PR37808. The regression test is a reduced version of the
  original reproducer attached to the bug report. As stated in the report,
  the problem was that InsertedPHIs was keeping dangling pointers to deleted
  MemoryPhis. MemoryPhis are created eagerly and sometimes get zapped shortly
  afterwards. I've used WeakVH instead of an expensive removal operation from
  the active workset.
  Differential Revision: https://reviews.llvm.org/D48372
  llvm-svn: 337149