path: root/llvm/test/Transforms/CodeGenPrepare/X86
Each entry below lists the commit message, author, date, files changed, and -/+ lines.
...
* [X86][XOP] Update isVectorShiftByScalarCheap with cases covered by XOP (Simon Pilgrim, 2018-01-30, 1 file, -6/+3)

  Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types.

  Differential Revision: https://reviews.llvm.org/D42526
  llvm-svn: 323797

* Regenerate shuffle sink test (Simon Pilgrim, 2018-01-24, 1 file, -28/+39)

  llvm-svn: 323328

* X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW (Zvi Rackover, 2018-01-24, 1 file, -10/+28)

  Summary: AVX512BW adds support for variable shift amount for 16-bit element vectors.

  Reviewers: craig.topper, RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: rengolin, tschuett, llvm-commits
  Differential Revision: https://reviews.llvm.org/D42437
  llvm-svn: 323292

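  The shape of IR these shuffle-sink tests exercise is roughly the sketch below (a hypothetical
  reduction with invented names, not taken from the test file). CodeGenPrepare asks
  isVectorShiftByScalarCheap() whether the target has a cheap vector shift by a splatted scalar;
  if so, it sinks the splat next to the shift so instruction selection can use the variable-shift
  instruction (e.g. 16-bit element shifts on AVX-512BW, or XOP's variable shifts):

    ; illustrative sketch only
    define <16 x i16> @splat_shift(<16 x i16> %v, i16 %amt) {
    entry:
      %ins = insertelement <16 x i16> undef, i16 %amt, i32 0
      %splat = shufflevector <16 x i16> %ins, <16 x i16> undef, <16 x i32> zeroinitializer
      br label %use
    use:                          ; the splat can be sunk here, next to its only user
      %sh = shl <16 x i16> %v, %splat
      ret <16 x i16> %sh
    }
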
* Add bdver shuffle sink tests. (Simon Pilgrim, 2018-01-23, 1 file, -0/+21)

  llvm-svn: 323268

* Regenerate select test. NFCI. (Simon Pilgrim, 2018-01-23, 1 file, -53/+74)

  llvm-svn: 323265

* Regenerate shuffle sink test. NFCI. (Simon Pilgrim, 2018-01-23, 1 file, -42/+69)

  llvm-svn: 323264

* X86 Tests: Add AVX512BW config to CodeGenPrepare test. NFC (Zvi Rackover, 2018-01-23, 1 file, -9/+10)

  The case points out that we don't consider shifts supported by AVX512BW in
  isVectorShiftByScalarCheap().

  llvm-svn: 323242

* [CGP] Fix the GV handling in complex addressing mode (Serguei Katkov, 2018-01-23, 1 file, -0/+15)

  If, in complex addressing mode, the addressing modes differ in their GlobalValue, then the base
  register should not be installed, because we plan to use the base register as the merge point
  of the different GVs. This is a fix for PR35980.

  Reviewers: reames, john.brawn, santosh
  Reviewed By: john.brawn
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42230
  llvm-svn: 323192

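  A hypothetical reduction of the PR35980 shape (invented names, not the committed test): two
  incoming addresses that differ only in the global they index, merged by a phi. The GV has to
  serve as the merge point, so no base register may be pre-installed for it:

    ; illustrative sketch only
    @g1 = global [16 x i32] zeroinitializer
    @g2 = global [16 x i32] zeroinitializer

    define i32 @merge_of_globals(i1 %c, i64 %i) {
    entry:
      br i1 %c, label %a, label %b
    a:
      %pa = getelementptr [16 x i32], [16 x i32]* @g1, i64 0, i64 %i
      br label %merge
    b:
      %pb = getelementptr [16 x i32], [16 x i32]* @g2, i64 0, i64 %i
      br label %merge
    merge:                        ; the two addressing modes differ only in the GV
      %p = phi i32* [ %pa, %a ], [ %pb, %b ]
      %v = load i32, i32* %p
      ret i32 %v
    }
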
* Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) (Daniel Neilson, 2018-01-19, 1 file, -1/+1)

  Summary:
  This is a resurrection of work first proposed and discussed in Aug 2015:
    http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
  and initially landed (but then backed out) in Nov 2015:
    http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

  The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required
  to be a constant integer. It represents the alignment of the dest (and source), and so must be
  the minimum of the actual alignment of the two.

  This change is the first in a series that allows source and dest to each have their own
  alignments by using the alignment attribute on their arguments. In this change we:
  1) Remove the alignment argument.
  2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the
     alignments for source & dest be equal.

  For example, code which used to read:
    call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
  will now read
    call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

  Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset
  call/declaration patterns. The following extended sed script may help with updating the majority
  of your tests, but it does not catch all possible patterns so some manual checking and updating
  will be required.

  s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
  s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
  s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
  s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
  s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

  The remaining changes in the series will:
  Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and
          dest alignments.
  Step 3) Update Clang to use the new IRBuilder API.
  Step 4) Update Polly to use the new IRBuilder API.
  Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and
          those that use MemIntrinsicInst::[get|set]Alignment() to use getDestAlignment() and
          getSourceAlignment() instead.
  Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
          MemIntrinsicInst::[get|set]Alignment() methods.

  Reviewers: pete, hfinkel, lhames, reames, bollu
  Reviewed By: reames
  Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay,
  mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google,
  eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists,
  apazos, sabuasal, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41675
  llvm-svn: 322965

* [CGP] Fix Complex addressing mode for offset (Serguei Katkov, 2018-01-09, 1 file, -0/+19)

  If the offsets differ between two addressing modes, we can continue only if ScaleReg is not
  set, because we will use it as the merge of the different offsets.
  It should fix PR35799 and PR35805.

  Reviewers: john.brawn, reames
  Reviewed By: reames
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41227
  llvm-svn: 322056

* [CGP] Fix the handling of select inst in complex addressing mode (Serguei Katkov, 2017-12-18, 1 file, -0/+21)

  When we put the value into the select placeholder we must pass the value through the
  simplification tracker, because the value might already have been simplified and erased.
  This is a fix for PR35658.

  Reviewers: john.brawn, uabelho
  Reviewed By: john.brawn
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41251
  llvm-svn: 320956

* [CGP] Fix common type handling in optimizeMemoryInst (Serguei Katkov, 2017-11-29, 1 file, -0/+33)

  If the common type is different we should bail out, because we will not be able to create a
  select or Phi of these values. Basically this is done in ExtAddrMode::compare, however it does
  not work if we handle the null first and then two values of different types, so add a check in
  initializeMap as well. The check in ExtAddrMode::compare is kept as an earlier bail out.

  Reviewers: reames, john.brawn
  Reviewed By: john.brawn
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D40479
  llvm-svn: 319292

* [CGP] Fix the crash caused by enabling of complex addr mode (Serguei Katkov, 2017-11-20, 1 file, -0/+35)

  We must collect all AddrModes even if they are the same. This is because their Original values
  are different, and we need all original values collected, as they are used as anchors in common
  phi finding.

  Reviewers: john.brawn, reames
  Reviewed By: john.brawn
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D40166
  llvm-svn: 318638

* Remove stray comma in sink-addrmode test (John Brawn, 2017-11-16, 1 file, -1/+1)

  The extra comma meant it wasn't correctly checking that we weren't getting an extra
  getelementptr.

  llvm-svn: 318406

* [CGP] Disable Select instruction handling in optimizeMemoryInst. NFC (Serguei Katkov, 2017-11-07, 1 file, -2/+2)

  This patch disables the handling of selects in the optimization extending the scope of
  optimizeMemoryInst.

  The optimization itself is disabled by default. The idea here is just to enable the optimization
  step by step: first it will be enabled only for Phi nodes, then select instructions will be
  added. In case someone complains about performance, it will be easier to detect which part of
  the optimization is responsible for that.

  Differential Revision: https://reviews.llvm.org/D36073
  llvm-svn: 317555

* [CGP] Extends the scope of optimizeMemoryInst optimization. NFC (Serguei Katkov, 2017-11-05, 1 file, -0/+475)

  Commit tests for previous commit.

  Reviewers: efriedma, dberlin, mkazantsev, reames, john.brawn
  Reviewed By: john.brawn
  Subscribers: javed.absar, john.brawn, dneilson, llvm-commits
  Differential Revision: https://reviews.llvm.org/D36073
  llvm-svn: 317430

* Re-land "[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass." (Clement Courbet, 2017-11-03, 1 file, -771/+0)

  Fix undefined references: ExpandMemCmp belongs to CodeGen/, not Scalar/.

  llvm-svn: 317318

* Revert "[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."Clement Courbet2017-11-021-0/+771
| | | | | | | | | undefined reference to `llvm::TargetPassConfig::ID' on clang-ppc64le-linux-multistage This reverts commit eea333c33fa73ad225ef28607795984829f65688. llvm-svn: 317213
* [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass. (Clement Courbet, 2017-11-02, 1 file, -771/+0)

  Summary: This is mostly a noop (most of the test diffs are renamed blocks). There are a few
  temporary register renames (eax<->ecx) and a few blocks are shuffled around. See the discussion
  in PR33325 for more details.

  Reviewers: spatel
  Subscribers: mgorny
  Differential Revision: https://reviews.llvm.org/D39456
  llvm-svn: 317211

* [CGP] Fix crash on i96 bit multiply (Philip Reames, 2017-10-30, 1 file, -0/+10)

  Issue found by llvm-isel-fuzzer on OSS-Fuzz:
  https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725

  If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area.
  There's a bunch of obviously wrong code in the same function. I don't have the time to fix all
  of them and am just using this to understand what the workflow for fixing fuzzer cases might
  look like.

  llvm-svn: 316967

* [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2). (Clement Courbet, 2017-10-30, 1 file, -21/+7)

  - Targets that want to support memcmp expansions now return the list of supported load sizes.
  - Expansion codegen does not assume that all power-of-two load sizes smaller than the max load
    size are valid. For example, this is not the case for x86 (32-bit) + SSE2.

  Fixes PR34887.

  llvm-svn: 316905

* [CGP] In optimizeMemoryInst handle select similarly to phi (John Brawn, 2017-10-03, 1 file, -0/+17)

  This lets us optimize away selects that perform the same address computation in two different
  ways and is also the first step towards being able to handle selects between two different, but
  compatible, address computations.

  Differential Revision: https://reviews.llvm.org/D38242
  llvm-svn: 314794

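  A hypothetical reduction of the pattern (invented names, not from the test file): a select
  between two GEPs that compute the same address in different ways, feeding a load. With select
  handling enabled, optimizeMemoryInst can fold the select away when it sinks the address
  computation into the load's block:

    ; illustrative sketch only
    define i32 @select_same_addr(i32* %base, i64 %i, i1 %c) {
    entry:
      %g1 = getelementptr inbounds i32, i32* %base, i64 %i
      %g2 = getelementptr inbounds i32, i32* %base, i64 %i
      %p = select i1 %c, i32* %g1, i32* %g2
      br label %use
    use:
      %v = load i32, i32* %p      ; both select arms describe the same addressing mode
      ret i32 %v
    }
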
* [CGP] use narrower types in memcmp expansion when possible (Sanjay Patel, 2017-08-01, 1 file, -179/+84)

  This only affects very small memcmp on x86 for now, but it will become more important if we
  allow vector-sized loads and compares.

  llvm-svn: 309711

* [CGP] use subtract or subtract-of-cmps for result of memcmp expansion (Sanjay Patel, 2017-07-31, 1 file, -13/+14)

  As noted in the code comment, transforming this in the other direction might require a separate
  transform here in CGP given the block-at-a-time DAG constraint.

  Besides that theoretical motivation, there are 2 practical motivations for the subtract-of-cmps
  form:

  1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still).
     There is discussion about canonicalizing IR to the select form
     ( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ), so we probably need to
     add DAG transforms for those patterns anyway, but this improves the memcmp output without
     waiting for that step.

  2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert
     that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided
     if we choose to enable that.

  Differential Revision: https://reviews.llvm.org/D34904
  llvm-svn: 309597

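  For reference, a sketch (not taken from the patch) of what the subtract-of-cmps form means for
  the ordered memcmp result: instead of selecting between the constants -1/0/+1, the -1/0/+1
  value is computed directly from two unsigned compares of the already-loaded chunks %a and %b:

    ; illustrative sketch only
    define i32 @cmp_result_subofcmps(i32 %a, i32 %b) {
      %gt = icmp ugt i32 %a, %b
      %lt = icmp ult i32 %a, %b
      %zg = zext i1 %gt to i32
      %zl = zext i1 %lt to i32
      %r  = sub i32 %zg, %zl      ; -1, 0 or +1 without any select
      ret i32 %r
    }
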
* [X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914) (Simon Pilgrim, 2017-07-25, 1 file, -1021/+53)

  D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining, which resulted in
  regressions for some optimized libc memcmp implementations (PR33914).

  Until we can match these more optimal cases, this patch reduces the memcmp expansion to a
  maximum of 2 load pairs (which matches what we do for -Os).

  This patch should be considered for the 5.0.0 release branch as well.

  Differential Revision: https://reviews.llvm.org/D35830
  llvm-svn: 308986

* [CGP] Allow cycles during Phi traversal in optimizeMemoryInst (Serguei Katkov, 2017-07-19, 1 file, -1/+34)

  Allowing cycles in Phi traversal increases the scope of the optimize-memory-instruction
  transform in case we are in a loop. The added test shows an example of enabling the
  optimization inside a loop.

  Reviewers: loladiro, spatel, efriedma
  Reviewed By: efriedma
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D35294
  llvm-svn: 308419

* [x86, CGP] increase memcmp() expansion up to 4 load pairs (Simon Pilgrim, 2017-07-18, 1 file, -88/+1547)

  It should be a win to avoid going out to the system lib for all small memcmp() calls using
  scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles
  because we can do 8-byte loads.

  Notes:
  - Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's
    effectively a question of how much we trust the system implementation. Linux and macOS (and
    Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably
    not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.
  - There are still potential improvements to make for the CGP expansion IR and/or lowering, such
    as avoiding select-of-constants (D34904) and not doing zexts to the max load type before
    doing a compare.
  - We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be
    produced after this patch. I've shown the experimental justification for that change in
    PR33329: https://bugs.llvm.org/show_bug.cgi?id=33329#c12
    TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all
    cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small
    memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or
    poking a hole to let those fall through the CGP expansion.

  Committed on behalf of Sanjay Patel

  Differential Revision: https://reviews.llvm.org/D35067
  llvm-svn: 308322

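  As a rough illustration (a hand-written sketch, not the exact IR that CGP emits), an
  equality-only memcmp of 16 bytes on x86-64 can be covered by two 8-byte load pairs along these
  lines:

    ; illustrative sketch only
    define i1 @memcmp16_eq(i8* %x, i8* %y) {
    entry:
      %x0p = bitcast i8* %x to i64*
      %y0p = bitcast i8* %y to i64*
      %x0 = load i64, i64* %x0p
      %y0 = load i64, i64* %y0p
      %d0 = xor i64 %x0, %y0            ; first 8-byte load pair
      %x1i = getelementptr i8, i8* %x, i64 8
      %y1i = getelementptr i8, i8* %y, i64 8
      %x1p = bitcast i8* %x1i to i64*
      %y1p = bitcast i8* %y1i to i64*
      %x1 = load i64, i64* %x1p
      %y1 = load i64, i64* %y1p
      %d1 = xor i64 %x1, %y1            ; second 8-byte load pair
      %d = or i64 %d0, %d1
      %eq = icmp eq i64 %d, 0
      ret i1 %eq
    }
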
* [CodeGenPrepare] Don't create dead instructions in addrmode sinking (Eli Friedman, 2017-07-12, 1 file, -0/+24)

  When we fail to sink an instruction, we must make sure not to modify the function; otherwise,
  we end up in an infinite loop because CodeGenPrepare iterates until it doesn't make any changes.

  Fixes https://bugs.llvm.org/show_bug.cgi?id=33608 .

  llvm-svn: 307866

* [CGP, x86] update test checks; NFC (Sanjay Patel, 2017-07-06, 1 file, -38/+39)

  This was auto-generated using an older version of the script, and that version does not work
  with phis, so if we enable expansion it will go bad.

  llvm-svn: 307267

* [CGP] add specialization for memcmp expansion with only one basic block (Sanjay Patel, 2017-06-27, 1 file, -90/+33)

  llvm-svn: 306485

* [CGP] eliminate a sub instruction in memcmp expansion (Sanjay Patel, 2017-06-27, 1 file, -30/+25)

  As noted in D34071, there are some IR optimization opportunities that could be handled by
  normal IR passes if this expansion wasn't happening so late in CGP.

  Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, so I'm proposing
  this change:
    %s = sub i32 %x, %y
    %r = icmp ne %s, 0
      =>
    %r = icmp ne %x, %y

  Changing the predicate to 'eq' mimics what InstCombine would do, so that's just an efficiency
  improvement if we decide this expansion should happen sooner.

  The fact that the PowerPC backend doesn't eliminate the 'subf.' might be something for PPC
  folks to investigate separately.

  Differential Revision: https://reviews.llvm.org/D34416
  llvm-svn: 306471

* [x86] set the datalayout to match the RUN line triple; NFC (Sanjay Patel, 2017-06-21, 1 file, -4/+2)

  I don't think there's any visible difference from having the wrong layout for the 32-bit case
  at this point, but that could change in the future.

  llvm-svn: 305931

* [x86] enable CGP memcmp() expansion for 2/4/8 byte sizes (Sanjay Patel, 2017-06-20, 1 file, -18/+132)

  There are a couple of potential improvements as seen in the IR and asm:
  1. We're unnecessarily extending to a larger type to compare values.
  2. The codegen for (select cond, 1, -1) could avoid a cmov.
     (or we could change the order of the compares, so we have a select with 0 operand)

  llvm-svn: 305802

* [CGP, x86] add tests for potential memcmp expansion; NFC (Sanjay Patel, 2017-06-08, 1 file, -0/+337)

  No IR tests were added with rL304313 ( https://reviews.llvm.org/D28637 ), so I want these for
  extra coverage if we enable memcmp expansion for x86. As shown, nothing is expanded for x86 in
  CGP yet.

  Also fundamentally, we're doing an IR transform, so we should have IR tests for just that part.
  If something goes wrong, we need to know if the bug is in CGP or later lowering.

  llvm-svn: 305011

* Turn on -addr-sink-using-gep by default. (Eli Friedman, 2017-04-06, 3 files, -21/+18)

  The new codepath has been in the tree for years, and there isn't any reason to use two
  codepaths here.

  Differential Revision: https://reviews.llvm.org/D30596
  llvm-svn: 299723

* [SimplifyCFG] move tests for PR31028 from CGP (Sanjay Patel, 2017-03-13, 1 file, -122/+0)

  Hopefully, this will make sense with a forthcoming patch. If not, we can move these back.

  llvm-svn: 297660

* [CGP] add tests for PR31028; NFC (Sanjay Patel, 2017-03-13, 1 file, -0/+122)

  llvm-svn: 297629

* [CGP] Split some critical edges coming out of indirect branches (Michael Kuperstein, 2017-02-28, 1 file, -0/+294)

  Splitting critical edges when one of the source edges is an indirectbr is hard in general
  (because it requires changing the memory the indirectbr reads). But if a block only has a
  single indirectbr predecessor (which is the common case), we can simulate splitting that edge
  by splitting the destination block, and retargeting the *direct* branches.

  This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using
  an indirect branch with ~100 successors, and passing a constant to each of those. Since
  MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look
  feasible), this causes us to emit ~100 defs of registers containing constants in the
  predecessor block, where only one of those constants is used in each successor. So, at each
  computed goto, we needlessly spill about 100 constants to stack.

  The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a
  simple python reduction loop than a gcc-compiled interpreter.

  Differential Revision: https://reviews.llvm.org/D29916
  llvm-svn: 296416

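  A hypothetical reduction of the situation (invented names, not the committed test): %target has
  one indirectbr predecessor and one direct predecessor, so the critical indirect edge cannot be
  split directly. Instead, the pass can split %target itself and retarget only the direct branch,
  leaving the indirectbr untouched:

    ; illustrative sketch only
    define void @f(i8* %addr, i1 %c) {
    entry:
      br i1 %c, label %direct, label %indirect
    indirect:
      indirectbr i8* %addr, [label %target]
    direct:
      br label %target            ; this direct edge is retargeted to the split copy of %target
    target:
      call void @use()
      ret void
    }
    declare void @use()
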
* Revert "[CGP] Split some critical edges coming out of indirect branches"Daniel Jasper2017-02-261-254/+0
| | | | | | | This reverts commit r296149 as it leads to crashes when compiling for PPC. llvm-svn: 296295
* [CodeGenPrepare] Make -addr-sink-using-gep work with address spaces. (Eli Friedman, 2017-02-24, 1 file, -3/+8)

  When we construct addressing modes, we use isNoopAddrSpaceCast to ignore addrspacecast
  instructions. Make sure we insert the correct addrspacecast when we reconstruct the addressing
  mode.

  Differential Revision: https://reviews.llvm.org/D30114
  llvm-svn: 296167

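  A minimal sketch of the kind of IR involved (invented for illustration; whether the cast is a
  no-op depends on the target's datalayout): the addressing mode is built through an
  addrspacecast, so the cast has to be re-inserted when the mode is rebuilt as a sunk GEP:

    ; illustrative sketch only
    define i32 @addrspace_mode(i32* %p, i64 %i) {
    entry:
      %as = addrspacecast i32* %p to i32 addrspace(1)*
      %g  = getelementptr i32, i32 addrspace(1)* %as, i64 %i
      br label %use
    use:
      %v = load i32, i32 addrspace(1)* %g
      ret i32 %v
    }
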
* [CGP] Split some critical edges coming out of indirect branches (Michael Kuperstein, 2017-02-24, 1 file, -0/+254)

  Splitting critical edges when one of the source edges is an indirectbr is hard in general
  (because it requires changing the memory the indirectbr reads). But if a block only has a
  single indirectbr predecessor (which is the common case), we can simulate splitting that edge
  by splitting the destination block, and retargeting the *direct* branches.

  This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using
  an indirect branch with ~100 successors, and passing a constant to each of those. Since
  MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look
  feasible), this causes us to emit ~100 defs of registers containing constants in the
  predecessor block, where only one of those constants is used in each successor. So, at each
  computed goto, we needlessly spill about 100 constants to stack.

  The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a
  simple python reduction loop than a gcc-compiled interpreter.

  Differential Revision: https://reviews.llvm.org/D29916
  llvm-svn: 296149

* [CodeGenPrep] Skip merging empty case blocks (Jun Bum Lim, 2016-12-16, 1 file, -3/+3)

  This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty
  block, and the unit test failures in AVR and WebAssembly:

  Summary: Merging an empty case block into the header block of switch could cause ISel to add
  COPY instructions in the header of switch, instead of the case block, if the case block is used
  as an incoming block of a PHI. This could potentially increase dynamic instructions, especially
  when the switch is in a loop. I added a test case which was reduced from the benchmark I was
  targeting.

  Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
  Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D22696
  llvm-svn: 289988

* Revert "[CodeGenPrep] Skip merging empty case blocks"Jun Bum Lim2016-12-161-3/+3
| | | | | | This reverts commit r289951. llvm-svn: 289960
* [CodeGenPrep] Skip merging empty case blocks (Jun Bum Lim, 2016-12-16, 1 file, -3/+3)

  This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty
  block:

  Summary: Merging an empty case block into the header block of switch could cause ISel to add
  COPY instructions in the header of switch, instead of the case block, if the case block is used
  as an incoming block of a PHI. This could potentially increase dynamic instructions, especially
  when the switch is in a loop. I added a test case which was reduced from the benchmark I was
  targeting.

  Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
  Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D22696
  llvm-svn: 289951

* Revert r287553: [CodeGenPrep] Skip merging empty case blocks (Joerg Sonnenberger, 2016-11-28, 1 file, -3/+3)

  It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line 670 ("Expected
  irreducible CFG").

  llvm-svn: 288052

* [CodeGenPrep] Skip merging empty case blocks (Jun Bum Lim, 2016-11-21, 1 file, -3/+3)

  Summary: Merging an empty case block into the header block of switch could cause ISel to add
  COPY instructions in the header of switch, instead of the case block, if the case block is used
  as an incoming block of a PHI. This could potentially increase dynamic instructions, especially
  when the switch is in a loop. I added a test case which was reduced from the benchmark I was
  targeting.

  Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl
  Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D22696
  llvm-svn: 287553

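  A hypothetical reduction of the pattern being skipped (invented names): %case1 is empty apart
  from its branch, and its successor uses it as a PHI incoming block. Merging %case1 into the
  switch header would push the COPY for that PHI input into the header, executed on every path
  through the switch:

    ; illustrative sketch only
    define i32 @switch_empty_case(i32 %x) {
    entry:
      switch i32 %x, label %exit [ i32 1, label %case1 ]
    case1:                          ; empty case block, only a branch
      br label %exit
    exit:
      %r = phi i32 [ 0, %entry ], [ 42, %case1 ]
      ret i32 %r
    }
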
* [CodeGenPrepare] Don't sink a cast past its user (David Majnemer, 2016-04-27, 1 file, -0/+32)

  The sink cast machinery is supposed to sink casts as close to their user as possible. However,
  an EH pad is the first instruction in its basic block. Don't sink if the user is an EH pad.

  This fixes PR27536.

  llvm-svn: 267767

* [CodeGenPrepare] don't convert an unpredictable select into control flow (Sanjay Patel, 2016-04-26, 1 file, -5/+24)

  Suggested in the review of D19488:
  http://reviews.llvm.org/D19488

  llvm-svn: 267504

* [PR27284] Reverse the ownership between DICompileUnit and DISubprogram. (Adrian Prantl, 2016-04-15, 1 file, -3/+2)

  Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member
  functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate
  type uniquing.

  Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot
  be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a
  pointer to the owning compile unit to distinct DISubprograms. This would make it easy for
  ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info.

  Motivation
  ----------
  Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO
  build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same
  block as the function body) so we can avoid having to expensively deserialize all DISubprograms
  together with the global metadata. If a function has been inlined into another subprogram we
  need to store a reference to the block containing the inlined subprogram.

  Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR
  testcases to the new format.

  http://reviews.llvm.org/D19034
  <rdar://problem/25256815>
  llvm-svn: 266446

* [CodeGenPrepare] Avoid sinking soft-FP comparisons (Peter Zotov, 2016-04-03, 1 file, -0/+29)

  Sinking comparisons in CGP can undo the job of hoisting them done earlier by LICM, and soft-FP
  makes this an expensive mistake.

  A common pattern that produces floating point comparisons uniform over a loop is an explicit
  check for division by zero. If the divisor is hoisted out of the loop, the comparison can also
  be, but hoisting the function that unwinds is never legal, since it may cause side effects in
  the loop body prior to the unwinding to not be executed.

  Differential Revision: http://reviews.llvm.org/D18744
  llvm-svn: 265264

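  A minimal sketch of the pattern being protected (an invented example, not the committed test):
  the division-by-zero guard is loop-invariant and gets hoisted by LICM; with soft-float, sinking
  the fcmp back into the loop body would re-evaluate an expensive libcall-backed comparison on
  every iteration:

    ; illustrative sketch only
    define void @soft_fp_guard(double %d, i32 %n) {
    entry:
      %iszero = fcmp oeq double %d, 0.000000e+00   ; loop-invariant, hoisted by LICM
      br label %loop
    loop:
      %i = phi i32 [ 0, %entry ], [ %i.next, %latch ]
      br i1 %iszero, label %latch, label %div
    div:
      %q = fdiv double 1.000000e+00, %d
      call void @use(double %q)
      br label %latch
    latch:
      %i.next = add i32 %i, 1
      %done = icmp eq i32 %i.next, %n
      br i1 %done, label %exit, label %loop
    exit:
      ret void
    }
    declare void @use(double)
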