summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* Migrate SimplifyLibCalls to new OptimizationRemarkEmitterAdam Nemet2017-07-261-0/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This changes SimplifyLibCalls to use the new OptimizationRemarkEmitter API. In fact, as SimplifyLibCalls is only ever called via InstCombine, (as far as I can tell) the OptimizationRemarkEmitter is added there, and then passed through to SimplifyLibCalls later. I have avoided changing any remark text. This closes PR33787 Patch by Sam Elliott! Reviewers: anemet, davide Reviewed By: anemet Subscribers: davide, mehdi_amini, eraman, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D35608 llvm-svn: 309158
* [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.Michael Zuckerman2017-07-261-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch expands the support of lowerInterleavedStore to 32x8i stride 4. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=32) and we plan to include more patterns in the future. To reach our goal of "more patterns". We include two mask creators. The first function creates shuffle's mask equivalent to unpacklo/unpackhi instructions. The other creator creates mask equivalent to a concat of two half vectors(high/low). The patch goal is to optimize the following sequence: At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding each 32 chars: c0, c1, , c31 m0, m1, , m31 y0, y1, , y31 k0, k1, ., k31 And these need to be transposed/interleaved and stored like so: c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 .... Reviewers: dorit Farhana RKSimon guyblank DavidKreitzer Differential Revision: https://reviews.llvm.org/D34601 llvm-svn: 309086
* Add "REQUIRES: asserts" for test unswitch-equality-undef.ll.Wei Mi2017-07-261-0/+1
| | | | llvm-svn: 309073
* Disable loop unswitching for some patterns containing equality comparison ↵Wei Mi2017-07-251-0/+121
| | | | | | | | | | | | | | | | | | | with undef. This is a workaround for the bug described in PR31652 and http://lists.llvm.org/pipermail/llvm-dev/2017-July/115497.html. The temporary solution is to add a function EqualityPropUnSafe. In EqualityPropUnSafe, for some simple patterns we can know the equality comparison may contains undef, so we regard such comparison as unsafe and will not do loop-unswitching for them. We also need to disable the select simplification when one of select operand is undef and its result feeds into equality comparison. The patch cannot clear the safety issue caused by the bug, but it can suppress the issue from happening to some extent. Differential Revision: https://reviews.llvm.org/D35811 llvm-svn: 309059
* [X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914)Simon Pilgrim2017-07-251-1021/+53
| | | | | | | | | | | | D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914). Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os). This patch should be considered for the 5.0.0 release branch as well Differential Revision: https://reviews.llvm.org/D35830 llvm-svn: 308986
* [LIR] Teach LIR to avoid extending the BE count prior to adding one toChandler Carruth2017-07-251-0/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | it when safe. Very often the BE count is the trip count minus one, and the plus one here should fold with that minus one. But because the BE count might in theory be UINT_MAX or some such, adding one before we extend could in some cases wrap to zero and break when we scale things. This patch checks to see if it would be safe to add one because the specific case that would cause this is guarded for prior to entering the preheader. This should handle essentially all of the common loop idioms coming out of C/C++ code once canonicalized by LLVM. Before this patch, both forms of loop in the added test cases ended up subtracting one from the size, extending it, scaling it up by 8 and then adding 8 back onto it. This is really silly, and it turns out made it all the way into generated code very often, so this is a surprisingly important cleanup to do. Many thanks to Sanjoy for showing me how to do this with SCEV. Differential Revision: https://reviews.llvm.org/D35758 llvm-svn: 308968
* [tests] Cleanup vect.omp.persistence.ll test.Michael Zolotukhin2017-07-252-67/+50
| | | | llvm-svn: 308964
* [ARM] Enable partial and runtime unrollingSam Parker2017-07-252-0/+213
| | | | | | | | | | Enable runtime and partial loop unrolling of simple loops without calls on M-class cores. The thresholds are calculated based on whether the target is Thumb or Thumb-2. Differential Revision: https://reviews.llvm.org/D34619 llvm-svn: 308956
* Adding base test for interleave store VF16 and expand the test for AVX512 Michael Zuckerman2017-07-241-0/+15
| | | | | | This patch doesn't modifay any non test file. llvm-svn: 308909
* X86InterleaveAccess: A fix for bug33826Farhana Aleen2017-07-211-0/+17
| | | | | | | | Reviewers: DavidKreitzer Differential Revision: https://reviews.llvm.org/D35638 llvm-svn: 308784
* ThinLTO Minimized Bitcode File Size ReductionHaojie Wang2017-07-212-19/+0
| | | | | | | | | | | | | | Summary: Currently the ThinLTO minimized bitcode file only strip the debug info, but there is still a lot of information in the minimized bit code file that will be not used for thin linker. In this patch, most of the extra information is striped to reduce the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. Now the minimized bitcode file size is reduced to 15%-30% of the debug info stripped bitcode file size. Reviewers: danielcdh, tejohnson, pcc Reviewed By: pcc Subscribers: mehdi_amini, aprantl, inglorion, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D35334 llvm-svn: 308760
* [PGO] Move the PGOInstrumentation pass to new OptRemark API.Davide Italiano2017-07-201-2/+2
| | | | | | This fixes PR33791. llvm-svn: 308668
* LowerTypeTests: Drop function type metadata only if we're going to replace it.Peter Collingbourne2017-07-201-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | Previously we were (mis)handling jump table members with a prevailing definition in a full LTO module and a non-prevailing definition in a ThinLTO module by dropping type metadata on those functions entirely, which would cause type tests involving such functions to fail. This patch causes us to drop metadata only if we are about to replace it with metadata from cfi.functions. We also want to replace metadata for available_externally functions, which can arise in the opposite scenario (prevailing ThinLTO definition, non-prevailing full LTO definition). The simplest way to handle that is to remove the definition; there's little value in keeping it around at this point (i.e. after most optimization passes have already run) and later code will try to use the function's linkage to create an alias, which would result in invalid IR if the function is available_externally. Fixes PR33832. Differential Revision: https://reviews.llvm.org/D35604 llvm-svn: 308642
* [TRE] Add another test for OptRemark.Davide Italiano2017-07-191-0/+39
| | | | | | This shows we emit a remark for tail recursion -> loop. llvm-svn: 308525
* [TRE] Move to the new OptRemark API.Davide Italiano2017-07-191-0/+25
| | | | | | | | Fixes PR33788. Differential Revision: https://reviews.llvm.org/D35570 llvm-svn: 308524
* ThinLTOBitcodeWriter: Do not rewrite intrinsic functions when splitting modules.Peter Collingbourne2017-07-191-0/+4
| | | | | | | | Changing the type of an intrinsic may invalidate the IR. Differential Revision: https://reviews.llvm.org/D35593 llvm-svn: 308500
* [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it ↵Balaram Makam2017-07-1911-16/+148
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | can destroy canonical loop structure. Summary: When simplifying unconditional branches from empty blocks, we pre-test if the BB belongs to a set of loop headers and keep the block to prevent passes from destroying canonical loop structure. However, the current algorithm fails if the destination of the branch is a loop header. Especially when such a loop's latch block is folded into loop header it results in additional backedges and LoopSimplify turns it into a nested loop which prevent later optimizations from being applied (e.g., loop unrolling and loop interleaving). This patch augments the existing algorithm by further checking if the destination of the branch belongs to a set of loop headers and defer eliminating it if yes to LateSimplifyCFG. Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605 Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: efriedma Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35411 llvm-svn: 308422
* [LV] Test once if vector trip count is zero, instead of twiceAyal Zaks2017-07-1913-46/+42
| | | | | | | | | | | | | | | | | | | | | | | | | Generate a single test to decide if there are enough iterations to jump to the vectorized loop, or else go to the scalar remainder loop. This test compares the Scalar Trip Count: if STC < VF * UF go to the scalar loop. If requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the rest can be used to form vector iterations. So in this case the test checks instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the scalar loop if so. Otherwise the vector loop is entered for at-least one vector iteration. This test covers the case where incrementing the backedge-taken count will overflow leading to an incorrect trip count of zero. In this (rare) case we will also avoid the vector loop and jump to the scalar loop. This patch simplifies the existing tests and effectively removes the basic-block originally named "min.iters.checked", leaving the single test in block "vector.ph". Original observation and initial patch by Evgeny Stupachenko. Differential Revision: https://reviews.llvm.org/D34150 llvm-svn: 308421
* [CGP] Allow cycles during Phi traversal in OptimizaMemoryInstSerguei Katkov2017-07-191-1/+34
| | | | | | | | | | | | | | Allowing cycles in Phi traversal increases the scope of optimize memory instruction in case we are in loop. The added test shows an example of enabling optimization inside a loop. Reviewers: loladiro, spatel, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35294 llvm-svn: 308419
* Fix DebugLoc propagation for unreachable LoadInstWeiming Zhao2017-07-192-2/+81
| | | | | | | | | | | | | | Summary: Currently, when GVN creates a load and when InstCombine creates a new store for unreachable Load, the DebugLoc info gets lost. Reviewers: dberlin, davide, aprantl Reviewed By: aprantl Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D34639 llvm-svn: 308404
* [x86, CGP] increase memcmp() expansion up to 4 load pairsSimon Pilgrim2017-07-181-88/+1547
| | | | | | | | | | | | | | | | | | | | | It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads. Notes: Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz. There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare. We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329: https://bugs.llvm.org/show_bug.cgi?id=33329#c12 TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion. Committed on behalf of Sanjay Patel Differential Revision: https://reviews.llvm.org/D35067 llvm-svn: 308322
* PSCEV] Create AddRec for Phis in cases of possible integer overflow,Dorit Nuzman2017-07-181-0/+240
| | | | | | | | | | | | | using runtime checks Extend the SCEVPredicateRewriter to work a bit harder when it encounters an UnknownSCEV for a Phi node; Try to build an AddRecurrence also for Phi nodes whose update chain involves casts that can be ignored under the proper runtime overflow test. This is one step towards addressing PR30654. Differential revision: http://reviews.llvm.org/D30041 llvm-svn: 308299
* [LoopInterchange] Split up interchange.ll test case (NFC).Florian Hahn2017-07-1810-749/+795
| | | | | | | | | | | | | | | | | | Summary: Currently most tests for the loop interchange pass are in test/Transforms/LoopInterchange/interchange.ll. This patch splits up the large test file in smaller pieces, which makes debugging test failures easier. Reviewers: karthikthecool, blitz.opensource, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, mcrosier, mkuper, mzolotukhin, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D35488 llvm-svn: 308284
* [IRCE] Recognize loops with ne/eq latch conditionsMax Kazantsev2017-07-181-0/+257
| | | | | | | | | | In some particular cases eq/ne conditions can be turned into equivalent slt/sgt conditions. This patch teaches parseLoopStructure to handle some of these cases. Differential Revision: https://reviews.llvm.org/D35010 llvm-svn: 308264
* Add element-atomic mem intrinsic canary tests for InstCombine.Daniel Neilson2017-07-181-0/+98
| | | | | | | | | | | | | | | | | Summary: Add canary tests to verify that InstCombine currently does nothing with the element atomic memory intrinsics for memmove and memset. Placeholder tests that will fail once element atomic @llvm.mem[move|set] instrinsics have been added to the MemIntrinsic class hierarchy. These will act as a reminder to verify that inst combine handles these intrinsics properly once they have been added to that class hierarchy. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35502 llvm-svn: 308247
* Revert "Restore with fix "[ThinLTO] Ensure we always select the same ↵Teresa Johnson2017-07-173-88/+0
| | | | | | | | | | | | | | | function copy to import"" This reverts commit r308114 (and follow on fixes to test). There is a linking failure in a ThinLTO bot: http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/3663/ (and undefined reference). It seems like it must be a second order effect of the heuristic change I made, and may take some time to try to reproduce locally and track down. Therefore, reverting for now. llvm-svn: 308206
* [InstCombine] Don't violate dominance when replacing instructions.Davide Italiano2017-07-162-1/+55
| | | | | | Differential Revision: https://reviews.llvm.org/D35376 llvm-svn: 308144
* Fix bot failures from r308114Teresa Johnson2017-07-161-10/+10
| | | | | | | | | | | | Finally figured out that some bots were failing from r308114 with the message: llvm-lto2: LTO::run failed: No available targets are compatible with this triple. after adding in some other checking that finally caused this to show up in the FileCheck output. Added "REQUIRES: x86-registered-target" which should fix it. llvm-svn: 308119
* Attempt 2 to debug bot failuresTeresa Johnson2017-07-161-8/+10
| | | | | | | Modify checks from r308114 even more, to see if I can narrow down why some bots are still failing. llvm-svn: 308116
* Attempt to debug bot failuresTeresa Johnson2017-07-151-8/+8
| | | | | | | Simplifying checks from r308114, to see if I can narrow down why some bots are still failing. llvm-svn: 308115
* Restore with fix "[ThinLTO] Ensure we always select the same function copy ↵Teresa Johnson2017-07-153-0/+86
| | | | | | | | | | to import" This restores r308078/r308079 with a fix for bot non-determinisim (make sure we run llvm-lto in single threaded mode so the debug output doesn't get interleaved). llvm-svn: 308114
* [InstCombine] Improve the expansion in SimplifyUsingDistributiveLaws to ↵Craig Topper2017-07-152-70/+38
| | | | | | | | | | | | | | | | | | | handle cases where one side doesn't simplify, but the other side resolves to an identity value Summary: If one side simplifies to the identity value for inner opcode, we can replace the value with just the operation that can't be simplified. I've removed a couple now unneeded special cases in visitAnd and visitOr. There are probably other cases I missed. Reviewers: spatel, majnemer, hfinkel, dberlin Reviewed By: spatel Subscribers: grandinj, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D35451 llvm-svn: 308111
* [InstCombine] improve (1 << x) & 1 --> zext(x == 0) foldingSanjay Patel2017-07-151-8/+18
| | | | | | | 1. Add a one-use check to prevent increasing instruction count. 2. Generalize the pattern matching to include vector types. llvm-svn: 308105
* [InstCombine] Add test cases for (X & (Y | ~X)) -> (X & Y) where the not is ↵Craig Topper2017-07-152-0/+319
| | | | | | | | an inverted compare. NFC Do the same for (X | (Y & ~X)) -> (X | Y) llvm-svn: 308104
* [InstCombine] Move 4 test cases from a test that didn't use FileCheck and ↵Craig Topper2017-07-152-34/+48
| | | | | | merge them into a existing test file. NFC llvm-svn: 308103
* [InstCombine] add tests for (1 << x) & 1 --> zext(x == 0) ; NFCSanjay Patel2017-07-151-0/+72
| | | | | | | | | This fold hit the trifecta: 1. It was untested. 2. It oversteps (multiuse is not checked, so increases instruction count). 3. It is incomplete (doesn't work for vectors). llvm-svn: 308102
* [InstCombine] allow (0 - x) & 1 --> x & 1 for vectorsSanjay Patel2017-07-151-2/+1
| | | | llvm-svn: 308098
* [InstCombine] remove dead code/tests; NFCISanjay Patel2017-07-151-54/+0
| | | | | | | These patterns and tests were added to InstSimplify with: https://reviews.llvm.org/rL303004 llvm-svn: 308096
* Revert r308078 (and subsequent tweak in r308079) which introduces a testChandler Carruth2017-07-153-86/+0
| | | | | | | | | that appears to exhibit non-determinism and is flaking on the bots pretty consistently. r308078: [ThinLTO] Ensure we always select the same function copy to import r308079: Require asserts in new test that uses debug flag llvm-svn: 308095
* [LoopInterchange] Add some optimization remarks.Florian Hahn2017-07-151-0/+220
| | | | | | | | | | | | Reviewers: anemet, karthikthecool, blitz.opensource Reviewed By: anemet Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D35122 llvm-svn: 308094
* Require asserts in new test that uses debug flagTeresa Johnson2017-07-151-0/+3
| | | | | | This should fix bot failures from r308078. llvm-svn: 308079
* [ThinLTO] Ensure we always select the same function copy to importTeresa Johnson2017-07-153-0/+83
| | | | | | | | | | | | | | | | | | | Summary: Check if the first eligible callee is under the instruction threshold. Checking this on the first eligible callee ensures that we don't end up selecting different callees to import when we invoke this routine with different thresholds due to reaching the callee via paths that are shallower or hotter (when there are multiple copies, i.e. with weak or linkonce linkage). We don't want to leave the decision of which copy to import up to the backend. Reviewers: mehdi_amini Subscribers: inglorion, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D35436 llvm-svn: 308078
* [TTI] Refine the cost of EXT in getUserCost()Haicheng Wu2017-07-154-0/+593
| | | | | | | | | | | Now, getUserCost() only checks the src and dst types of EXT to decide it is free or not. This change first checks the types, then calls isExtFreeImpl(), and check if EXT can form ExtLoad at last. Currently, only AArch64 has customized implementation of isExtFreeImpl() to check if EXT can be folded into its use. Differential Revision: https://reviews.llvm.org/D34458 llvm-svn: 308076
* [EarlyCSE] Handle calls with no MemorySSA info.Geoff Berry2017-07-141-0/+25
| | | | | | | | | | | | | | | | | | Summary: When checking for memory dependencies between calls using MemorySSA, handle cases where the calls have no MemoryAccess associated with them because the AA analysis being used has determined that the call does not read/write memory. Fixes PR33756 Reviewers: dberlin, davide Subscribers: mcrosier, llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D35317 llvm-svn: 308051
* [JumpThreading] Add a pattern to TryToUnfoldSelectInCurrBB()Haicheng Wu2017-07-141-1/+76
| | | | | | | | | | | | | | | | Add the following pattern to TryToUnfoldSelectInCurrBB() bb: %p = phi [0, %bb1], [1, %bb2], [0, %bb3], [1, %bb4], ... %c = cmp %p, 0 %s = select %c, trueval, falseval The Select in the above pattern will be unfolded and then jump-threaded. The current implementation does not allow CMP in the middle of PHI and Select. Differential Revision: https://reviews.llvm.org/D34762 llvm-svn: 308050
* [InstCombine] convert bitwise (in)equality checks to logical ops (PR32401)Sanjay Patel2017-07-141-8/+6
| | | | | | | | | | | | | As discussed in: https://bugs.llvm.org/show_bug.cgi?id=32401 we have a backend transform to undo this: https://reviews.llvm.org/rL299542 when it's likely that the xor version leads to better codegen, but we want this form in IR for better analysis and simplification potential. llvm-svn: 308031
* [InstCombine] add tests for PR32401; NFCSanjay Patel2017-07-141-0/+36
| | | | | | Also, add comments to a couple of tests that could be moved out of instcombine. llvm-svn: 308029
* [InstCombine] auto-generate complete test checks; NFCSanjay Patel2017-07-141-76/+55
| | | | llvm-svn: 308027
* [IRCE] Fix corner case with Start = INT_MAXMax Kazantsev2017-07-141-0/+117
| | | | | | | | | | | | | | | | | | | | When iterating through loop for (int i = INT_MAX; i > 0; i--) We fail to generate the pre-loop for it. It happens because we use the overflown value in a comparison predicate when identifying whether or not we need it. In old logic, we used SLE predicate against Greatest value which exceeds all seen values of the IV and might be overflown. Now we use the GreatestSeen value of this IV with SLT predicate. Also added a test that ensures that a pre-loop is generated for such loops. Differential Revision: https://reviews.llvm.org/D35347 llvm-svn: 308001
* [InstCombine] put tests for commuted variants of the same fold together; NFCSanjay Patel2017-07-131-44/+66
| | | | llvm-svn: 307951
OpenPOWER on IntegriCloud