summaryrefslogtreecommitdiffstats
path: root/llvm/test/Transforms/CodeGenPrepare
Commit message (Collapse)AuthorAgeFilesLines
...
* [CGP] Fix the crash caused by enable of complex addr modeSerguei Katkov2017-11-201-0/+35
| | | | | | | | | | | | | We must collect all AddModes even if they are the same. This is due to Original value is different but we need all original values collected as they are used as anchors in common phi finding. Reviewers: john.brawn, reames Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40166 llvm-svn: 318638
* Let llvm.invariant.group.barrier accepts pointer to any address spaceYaxun Liu2017-11-161-3/+3
| | | | | | | | | | | llvm.invariant.group.barrier may accept pointers to arbitrary address space. This patch let it accept pointers to i8 in any address space and returns pointer to i8 in the same address space. Differential Revision: https://reviews.llvm.org/D39973 llvm-svn: 318413
* Remove stray comma in sink-addrmode testJohn Brawn2017-11-161-1/+1
| | | | | | | The extra comma meant it wasn't correctly checking that we weren't getting an extra getelementptr. llvm-svn: 318406
* Revert "[CodeGenPrepare] Check that erased sunken address are not reused"Simon Dardis2017-11-132-66/+0
| | | | | | This reverts commit r318032. The test broke some sanitizer bots. llvm-svn: 318049
* [CodeGenPrepare] Check that erased sunken address are not reusedSimon Dardis2017-11-132-0/+66
| | | | | | | | | | | | | | | | | | | | | | | CodeGenPrepare sinks address computations from one basic block to another and attempts to reuse address computations that have already been sunk. If the same address computation appears twice with the first instance as an operand of a load whose result is an operand to a simplifable select, CodeGenPrepare simplifies the select and recursively erases the now dead instructions. CodeGenPrepare then attempts to use the erased address computation for the second load. Fix this by erasing the cached address value if it has zero uses before looking for the address value in the sunken address map. This partially resolves PR35209. Thanks to Alexander Richardson for reporting the issue! Reviewers: john.brawn Differential Revision: https://reviews.llvm.org/D39841 llvm-svn: 318032
* Fix some misc. -enable-var-scope violationsMatt Arsenault2017-11-131-12/+12
| | | | llvm-svn: 318006
* [CGP] Disable Select instruction handling in optimizeMemoryInst. NFCSerguei Katkov2017-11-072-3/+3
| | | | | | | | | | | | | | | | | | This patch disables the handling of selects in optimization extensing scope of optimizeMemoryInst. The optimization itself is disable by default. The idea here is just to switch optimiztion level step by step. Specifically, first optimization will be enabled only for Phi nodes, then select instructions will be added. In case someone will complain about perfromance it will be easier to detect what part of optimizations is responsible for that. Differential Revision: https://reviews.llvm.org/D36073 llvm-svn: 317555
* [CGP] Extends the scope of optimizeMemoryInst optimization. NFCSerguei Katkov2017-11-052-0/+493
| | | | | | | | | | | Commit tests for previous commit. Reviewers: efriedma, dberlin, mkazantsev, reames, john.brawn Reviewed By: john.brawn Subscribers: javed.absar, john.brawn, dneilson, llvm-commits Differential Revision: https://reviews.llvm.org/D36073 llvm-svn: 317430
* Revert "Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()"Adrian Prantl2017-11-031-118/+0
| | | | | | This reverts commit 317342 while investigating bot breakage. llvm-svn: 317345
* Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()Adrian Prantl2017-11-031-0/+118
| | | | | | | | This preserves the debug info for the cast operation in the original location. rdar://problem/33460652 llvm-svn: 317340
* re-land [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."Clement Courbet2017-11-031-771/+0
| | | | | | Fix undefined references: ExpandMemCmp belongs to CodeGen/, not Scalar/. llvm-svn: 317318
* Revert "[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."Clement Courbet2017-11-021-0/+771
| | | | | | | | | undefined reference to `llvm::TargetPassConfig::ID' on clang-ppc64le-linux-multistage This reverts commit eea333c33fa73ad225ef28607795984829f65688. llvm-svn: 317213
* [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass.Clement Courbet2017-11-021-771/+0
| | | | | | | | | | | | | | | | | Summary: This is mostly a noop (most of the test diffs are renamed blocks). There are a few temporary register renames (eax<->ecx) and a few blocks are shuffled around. See the discussion in PR33325 for more details. Reviewers: spatel Subscribers: mgorny Differential Revision: https://reviews.llvm.org/D39456 llvm-svn: 317211
* [CGP] Fix crash on i96 bit multiplyPhilip Reames2017-10-301-0/+10
| | | | | | | | Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725 If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like. llvm-svn: 316967
* [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).Clement Courbet2017-10-301-21/+7
| | | | | | | | | | | | - Targets that want to support memcmp expansions now return the list of supported load sizes. - Expansion codegen does not assume that all power-of-two load sizes smaller than the max load size are valid. For examples, this is not the case for x86(32bit)+sse2. Fixes PR34887. llvm-svn: 316905
* Revert "[CGP] Merge empty case blocks if no extra moves are added."Balaram Makam2017-10-271-98/+2
| | | | | | This reverts commit r316711. The domtree isn't getting updated correctly. llvm-svn: 316721
* [CGP] Merge empty case blocks if no extra moves are added.Balaram Makam2017-10-261-2/+98
| | | | | | | | | | | | | | | | Summary: Currently we skip merging when extra moves may be added in the header of switch instead of the case block, if the case block is used as an incoming block of a PHI. If all the incoming values of the PHIs are non-constants and the destination block is dominated by the switch block then extra moves are likely not added by ISel, so there is no need to skip merging in this case. Reviewers: efriedma, junbuml, davidxl, hfinkel, qcolombet Reviewed By: efriedma Subscribers: dberlin, kuhar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37343 llvm-svn: 316711
* [CGP] In optimizeMemoryInst handle select similarly to phiJohn Brawn2017-10-031-0/+17
| | | | | | | | | | This lets us optimize away selects that perform the same address computation in two different ways and is also the first step towards being able to handle selects between two different, but compatible, address computations. Differential Revision: https://reviews.llvm.org/D38242 llvm-svn: 314794
* Revert "[BypassSlowDivision] Improve our handling of divisions by constants"Sanjoy Das2017-09-291-77/+0
| | | | | | | This reverts commit r314253. It causes a miscompile on P100 in an internal benchmark. Reverting while I investigate. llvm-svn: 314482
* [BypassSlowDivision] Improve our handling of divisions by constantsSanjoy Das2017-09-261-0/+77
| | | | | | | | | | | | | | | Summary: Don't bail out on constant divisors for divisions that can be narrowed without introducing control flow . This gives us a 32 bit multiply instead of an emulated 64 bit multiply in the generated PTX assembly. Reviewers: jlebar Subscribers: jholewinski, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38265 llvm-svn: 314253
* Unmerge GEPs to reduce register pressure on IndirectBr edges.Hiroshi Yamauchi2017-09-111-0/+60
| | | | | | | | | | | | | | | | | | | | | | | Summary: GEP merging can sometimes increase the number of live values and register pressure across control edges and cause performance problems particularly if the increased register pressure results in spills. This change implements GEP unmerging around an IndirectBr in certain cases to mitigate the issue. This is in the CodeGenPrepare pass (after all the GEP merging has happened.) With this patch, the Python interpreter loop runs faster by ~5%. Reviewers: sanjoy, hfinkel Reviewed By: hfinkel Subscribers: eastig, junbuml, llvm-commits Differential Revision: https://reviews.llvm.org/D36772 llvm-svn: 312930
* [CGP] Fix the rematerialization of gc.relocatesSerguei Katkov2017-08-171-0/+22
| | | | | | | | | | | | | | | | | | | If we want to substitute the relocation of derived pointer with gep of base then we must ensure that relocation of base dominates the relocation of derived pointer. Currently only check for basic block is present. However it is possible that both relocation are in the same basic block but relocation of derived pointer is defined earlier. The patch moves the relocation of base pointer right before relocation of derived pointer in this case. Reviewers: sanjoy,artagnon,igor-laevsky,reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36462 llvm-svn: 311067
* [CGP] use narrower types in memcmp expansion when possibleSanjay Patel2017-08-011-179/+84
| | | | | | | | This only affects very small memcmp on x86 for now, but it will become more important if we allow vector-sized load and compares. llvm-svn: 309711
* [CGP] use subtract or subtract-of-cmps for result of memcmp expansionSanjay Patel2017-07-311-13/+14
| | | | | | | | | | | | | | | | | | | | | | As noted in the code comment, transforming this in the other direction might require a separate transform here in CGP given the block-at-a-time DAG constraint. Besides that theoretical motivation, there are 2 practical motivations for the subtract-of-cmps form: 1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still). There is discussion about canonicalizing IR to the select form ( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ), so we probably need to add DAG transforms for those patterns anyway, but this improves the memcmp output without waiting for that step. 2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided if we choose to enable that. Differential Revision: https://reviews.llvm.org/D34904 llvm-svn: 309597
* [X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914)Simon Pilgrim2017-07-251-1021/+53
| | | | | | | | | | | | D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914). Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os). This patch should be considered for the 5.0.0 release branch as well Differential Revision: https://reviews.llvm.org/D35830 llvm-svn: 308986
* [CGP] Allow cycles during Phi traversal in OptimizaMemoryInstSerguei Katkov2017-07-191-1/+34
| | | | | | | | | | | | | | Allowing cycles in Phi traversal increases the scope of optimize memory instruction in case we are in loop. The added test shows an example of enabling optimization inside a loop. Reviewers: loladiro, spatel, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35294 llvm-svn: 308419
* [x86, CGP] increase memcmp() expansion up to 4 load pairsSimon Pilgrim2017-07-181-88/+1547
| | | | | | | | | | | | | | | | | | | | | It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads. Notes: Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz. There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare. We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329: https://bugs.llvm.org/show_bug.cgi?id=33329#c12 TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion. Committed on behalf of Sanjay Patel Differential Revision: https://reviews.llvm.org/D35067 llvm-svn: 308322
* [CodeGenPrepare] Don't create dead instructions in addrmode sinkingEli Friedman2017-07-121-0/+24
| | | | | | | | | | When we fail to sink an instruction, we must make sure not to modify the function; otherwise, we end up in an infinite loop because CodeGenPrepare iterates until it doesn't make any changes. Fixes https://bugs.llvm.org/show_bug.cgi?id=33608 . llvm-svn: 307866
* Add a test for r307754George Burgess IV2017-07-121-0/+16
| | | | | | | | | | | As promised in D35003. Uses -codegenprepare instead of -instcombine since we hit the same buggy path anyway, and CGP lets us keep this test really simple (instcombine likes turning the alloca T, N into alloca [N x T], which hides the bug this is testing for). llvm-svn: 307811
* [CGP, x86] update test checks; NFCSanjay Patel2017-07-061-38/+39
| | | | | | | | This was auto-generated using an older version of the script, and that version does not work with phis, so if we enable expansion it will go bad. llvm-svn: 307267
* [CodeGenPrepare] Don't create inttoptr for ni ptrsKeno Fischer2017-06-291-0/+68
| | | | | | | | | | | | | Summary: Arguably non-integral pointers probably shouldn't show up here at all, but since the backend doesn't complain and this takes valid (according to the Verifier) IR and makes it invalid, make sure not to introduce any inttoptr instructions if we're dealing with non-integral pointers. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D33110 llvm-svn: 306737
* [CGP] add specialization for memcmp expansion with only one basic blockSanjay Patel2017-06-271-90/+33
| | | | llvm-svn: 306485
* [CGP] eliminate a sub instruction in memcmp expansionSanjay Patel2017-06-271-30/+25
| | | | | | | | | | | | | | | | | | | | | | As noted in D34071, there are some IR optimization opportunities that could be handled by normal IR passes if this expansion wasn't happening so late in CGP. Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, so I'm proposing this change: %s = sub i32 %x, %y %r = icmp ne %s, 0 => %r = icmp ne %x, %y Changing the predicate to 'eq' mimics what InstCombine would do, so that's just an efficiency improvement if we decide this expansion should happen sooner. The fact that the PowerPC backend doesn't eliminate the 'subf.' might be something for PPC folks to investigate separately. Differential Revision: https://reviews.llvm.org/D34416 llvm-svn: 306471
* [x86] set the datalayout to match the RUN line triple; NFCSanjay Patel2017-06-211-4/+2
| | | | | | | I don't think there's any visible difference from having the wrong layout for the 32-bit case at this point, but that could change in the future. llvm-svn: 305931
* [x86] enable CGP memcmp() expansion for 2/4/8 byte sizesSanjay Patel2017-06-201-18/+132
| | | | | | | | | There are a couple of potential improvements as seen in the IR and asm: 1. We're unnecessarily extending to a larger type to compare values. 2. The codegen for (select cond, 1, -1) could avoid a cmov. (or we could change the order of the compares, so we have a select with 0 operand) llvm-svn: 305802
* [CGP, x86] add tests for potential memcmp expansion; NFCSanjay Patel2017-06-081-0/+337
| | | | | | | | | | | | No IR tests were added with rL304313 ( https://reviews.llvm.org/D28637 ), so I want these for extra coverage if we enable memcmp expansion for x86. As shown, nothing is expanded for x86 in CGP yet. Also fundamentally, we're doing an IR transform, so we should have IR tests for just that part. If something goes wrong, we need to know if the bug is in CGP or later lowering. llvm-svn: 305011
* Restrict call metadata based hotness detection to Sample PGO modeTeresa Johnson2017-05-112-9/+66
| | | | | | | | | | | | | | | | | | | | | | | Summary: Don't use the metadata on call instructions for determining hotness unless we are in sample PGO mode, where it is needed because profile counts are not accurate. In instrumentation mode this is not necessary and does more harm than good when calls have VP metadata that hasn't been properly scaled after transformations or dropped after constant prop based devirtualization (both should be fixed, but we don't need to do this in the first place for instrumentation PGO). This required adjusting a number of tests to distinguish between sample and instrumentation PGO handling, and to add in profile summary metadata so that getProfileCount can get the summary. Reviewers: davidxl, danielcdh Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D32877 llvm-svn: 302844
* Fix code section prefix for proper layoutTeresa Johnson2017-05-091-1/+1
| | | | | | | | | | | | | | | | | | Summary: r284533 added hot and cold section prefixes based on profile information, to enable grouping of hot/cold functions at link time. However, it used "cold" as the prefix for cold sections, but gold only recognizes "unlikely" (which is used by gcc for cold sections). Therefore, cold sections were not properly being grouped. Switch to using "unlikely" Reviewers: danielcdh, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32983 llvm-svn: 302502
* [CodeGenPrepare] Fix crash due to an invalid CFGBrendon Cahoon2017-04-171-0/+37
| | | | | | | | | | | | | | | The splitIndirectCriticalEdges function generates and invalid CFG when the 'Target' basic block is a loop to itself. When this occurs, the code that updates the predecessor terminator needs to update the terminator in the split basic block. This occurs when there is an edge from block D back to D. Since D is split in to D0 and D1, the code needs to update the terminator in D1. But D1 is not in the OtherPreds vector, so it was not getting updated. Differential Revision: https://reviews.llvm.org/D32126 llvm-svn: 300480
* Add address space mangling to lifetime intrinsicsMatt Arsenault2017-04-101-10/+10
| | | | | | In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876
* Turn on -addr-sink-using-gep by default.Eli Friedman2017-04-063-21/+18
| | | | | | | | | The new codepath has been in the tree for years, and there isn't any reason to use two codepaths here. Differential Revision: https://reviews.llvm.org/D30596 llvm-svn: 299723
* [BypassSlowDivision] Do not bypass division of hash-like valuesNikolai Bozhenov2017-04-021-0/+121
| | | | | | | | | | | | | | | | | Disable bypassing if one of the operands looks like a hash value. Slow division often occurs in hashtable implementations and fast division is never taken there because a hash value is extremely unlikely to have enough upper bits set to zero. A value is considered to be hash-like if it is produced by 1) XOR operation 2) Multiplication by a constant wider than the shorter type 3) PHI node with all incoming values being hash-like Differential Revision: https://reviews.llvm.org/D28200 llvm-svn: 299329
* Use isFunctionHotInCallGraph to set the function section prefix.Dehao Chen2017-03-231-0/+22
| | | | | | | | | | | | | | Summary: The current prefix based function layout algorithm only looks at function's entry count, which is not sufficient. A function should be grouped together if its entry count or any call edge count is hot. Reviewers: davidxl, eraman Reviewed By: eraman Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31225 llvm-svn: 298656
* AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernelMatt Arsenault2017-03-211-1/+1
| | | | | | | | | | | | Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
* Let llvm.objectsize be conservative with null pointersGeorge Burgess IV2017-03-211-2/+40
| | | | | | | | | | | This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null. This fixes PR23277. Differential Revision: https://reviews.llvm.org/D28494 llvm-svn: 298430
* [SimplifyCFG] move tests for PR31028 from CGPSanjay Patel2017-03-131-122/+0
| | | | | | Hopefully, this will make sense with a forthcoming patch. If not, we can move these back. llvm-svn: 297660
* [CGP] add tests for PR31028; NFCSanjay Patel2017-03-131-0/+122
| | | | llvm-svn: 297629
* [BypassSlowDivision] Use ValueTracking to simplify run-time checksNikolai Bozhenov2017-03-021-0/+95
| | | | | | | | | | | | | | | | | | | | | | | | ValueTracking is used for more thorough analysis of operands. Based on the analysis, either run-time checks can be simplified (e.g. check only one operand instead of two) or the transformation can be avoided. For example, it is quite often the case that a divisor is promoted from a shorter type and run-time checks for it are redundant. With additional compile-time analysis of values, two special cases naturally arise and are addressed by the patch: 1) Both operands are known to be short enough. Then, the long division can be simply replaced with a short one without CFG modification. 2) If a division is unsigned and the dividend is known to be short then the long division is not needed at all. Because if the divisor is too big for short division then the quotient is obviously zero (and the remainder is equal to the dividend). Actually, the division is not needed when (divisor > dividend). Differential Revision: https://reviews.llvm.org/D29897 llvm-svn: 296832
* [CGP] Split some critical edges coming out of indirect branchesMichael Kuperstein2017-02-281-0/+294
| | | | | | | | | | | | | | | | | | | | | | Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the *direct* branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296416
* Revert "[CGP] Split some critical edges coming out of indirect branches"Daniel Jasper2017-02-261-254/+0
| | | | | | | This reverts commit r296149 as it leads to crashes when compiling for PPC. llvm-svn: 296295
OpenPOWER on IntegriCloud