path: root/llvm/test/Transforms
Commit log (newest first). Each entry lists the commit message, author, date, number of files changed, and lines removed/added.
* [LICM] Make instruction sinking funclet-aware  (David Majnemer, 2016-01-04, 2 files, -2/+69)
  We had two bugs here:
  - We might try to sink into a catchswitch, causing verifier failures.
  - We would succeed in sinking into a cleanuppad but didn't update the funclet operand bundle.
  This fixes PR26000.
  llvm-svn: 256728
* Fix several accidental DOS line endings in source files  (Dimitry Andric, 2016-01-03, 2 files, -49/+49)
  Summary: There are a number of files in the tree which have been accidentally checked in with
  DOS line endings. Convert these to native line endings. There are also a few files which have
  DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those.
  Reviewers: joerg, aaron.ballman
  Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits
  Differential Revision: http://reviews.llvm.org/D15848
  llvm-svn: 256707
* [LibCallSimplifier] propagate FMF when shrinking binary calls  (Sanjay Patel, 2015-12-31, 1 file, -0/+16)
  llvm-svn: 256682
* [LibCallSimplifier] propagate FMF when shrinking unary calls  (Sanjay Patel, 2015-12-31, 1 file, -19/+17)
  llvm-svn: 256679
* change function names to avoid accidentally matching the substring  (Sanjay Patel, 2015-12-31, 1 file, -40/+53)
  llvm-svn: 256678
* add 'fast' attribute to calls to show that the flag isn't being propagated  (Sanjay Patel, 2015-12-31, 1 file, -55/+57)
  llvm-svn: 256677
* [JumpThreading] Fix opcode bonus in getJumpThreadDuplicationCost()  (Geoff Berry, 2015-12-29, 1 file, -0/+30)
  The code that was meant to adjust the duplication cost based on the terminator opcode was not
  being executed in cases where the initial threshold was hit inside the loop.
  Subscribers: mcrosier, llvm-commits
  Differential Revision: http://reviews.llvm.org/D15536
  llvm-svn: 256568
* [RS4GC] Fix rematerialization of bitcast of bitcast.  (Manuel Jacob, 2015-12-28, 1 file, -0/+34)
  Summary: Previously, only the outer (last) bitcast was rematerialized, resulting in a use of
  the unrelocated inner (first) bitcast after the statepoint. See the test case for an example.
  Reviewers: igor-laevsky, reames
  Subscribers: reames, alex, llvm-commits, sanjoy
  Differential Revision: http://reviews.llvm.org/D15789
  llvm-svn: 256520
* [attrs] Extract the pure inference of function attributes into a standalone pass  (Chandler Carruth, 2015-12-27, 3 files, -24/+4)
  There is no call graph or even interesting analysis for this part of function attributes -- it
  is literally inferring attributes based on the target library identification. As such, we can
  do it using a much simpler module pass that just walks the declarations. This can also happen
  much earlier in the pass pipeline, which has benefits for any number of other passes.
  In the process, I've cleaned up one particular aspect of the logic which was necessary in
  order to separate the two passes cleanly. It now counts inferred attributes independently
  rather than just counting all the inferred attributes as one, and the counts are more clearly
  explained.
  The two test cases we had for this code path are both ... woefully inadequate and copies of
  each other. I've kept the superset test and updated it. We need more testing here, but I had
  to pick somewhere to stop fixing everything broken I saw here.
  Differential Revision: http://reviews.llvm.org/D15676
  llvm-svn: 256466
* [attrs] Split off the forced attributes utility into its own pass that is (by default) run much earlier than FunctionAttrs proper  (Chandler Carruth, 2015-12-27, 2 files, -12/+12)
  This allows forcing optnone or other widely impactful attributes. It is also a bit simpler, as
  the force-attribute behavior needs no specific iteration order.
  I've added the pass into the default module pass pipeline and the LTO pass pipeline, which
  mirrors where FunctionAttrs itself was being run.
  Differential Revision: http://reviews.llvm.org/D15668
  llvm-svn: 256465
* Fix safepoint intrinsic signatures in test.  (Benjamin Kramer, 2015-12-26, 2 files, -4/+4)
  Should bring back the bots after r256443.
  llvm-svn: 256450
* [gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type  (Chen Li, 2015-12-26, 40 files, -224/+224)
  Summary: This patch changes the gc.statepoint intrinsic's return type to token type instead of
  i32 type. Using token types prevents LLVM from merging different gc.statepoint nodes into PHI
  nodes and causing further problems with gc relocations. The patch also changes the way
  gc.relocate and gc.result look for their corresponding gc.statepoint on the unwind path. The
  current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a
  hook to find the gc.statepoint, while the patch directly uses a token type landingpad
  (http://reviews.llvm.org/D15405) to find the gc.statepoint.
  Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob
  Subscribers: reames, mjacob, sanjoy, llvm-commits
  Differential Revision: http://reviews.llvm.org/D15662
  llvm-svn: 256443
* [InstCombine] transform more extract/insert pairs into shuffles (PR2109)  (Sanjay Patel, 2015-12-24, 1 file, -16/+10)
  This is an extension of the shuffle combining from r203229:
  http://reviews.llvm.org/rL203229
  The idea is to widen a short input vector with undef elements so the existing shuffle
  transform for extract/insert can kick in. The motivation is to finally solve PR2109:
  https://llvm.org/bugs/show_bug.cgi?id=2109
  For that example, the IR becomes:
    %1 = bitcast <2 x i32>* %P to <2 x float>*
    %ld1 = load <2 x float>, <2 x float>* %1, align 8
    %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
    %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
    ret <4 x float> %i2
  And x86 SSE output improves from:
    movq    (%rdi), %xmm1        ## xmm1 = mem[0],zero
    movdqa  %xmm1, %xmm2
    shufps  $229, %xmm2, %xmm2   ## xmm2 = xmm2[1,1,2,3]
    shufps  $48, %xmm0, %xmm1    ## xmm1 = xmm1[0,0],xmm0[3,0]
    shufps  $132, %xmm1, %xmm0   ## xmm0 = xmm0[0,1],xmm1[0,2]
    shufps  $32, %xmm0, %xmm2    ## xmm2 = xmm2[0,0],xmm0[2,0]
    shufps  $36, %xmm2, %xmm0    ## xmm0 = xmm0[0,1],xmm2[2,0]
    retq
  To the almost optimal:
    movhpd  (%rdi), %xmm0
  Note: There's a tension in the existing transform related to generating arbitrary
  shufflevector masks. We avoid that in other places in InstCombine because we're scared that
  codegen can't handle strange masks, but it looks like we're ok with producing those here. I
  purposely chose weird insert/extract indexes for the regression tests to see the effect in
  these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
  better for these examples.
  Differential Revision: http://reviews.llvm.org/D15096
  llvm-svn: 256394
* [OperandBundles] Have TailCallElim play nice with operand bundles  (David Majnemer, 2015-12-23, 1 file, -0/+10)
  A call site's use of a Value might not correspond to an argument operand but to a bundle
  operand. This fixes PR25928.
  llvm-svn: 256328
* [OperandBundles] Have InstCombine play nice with operand bundles  (David Majnemer, 2015-12-23, 1 file, -0/+11)
  Don't assume a call's use corresponds to an argument operand; it might correspond to a bundle
  operand.
  llvm-svn: 256327
* [OperandBundles] Have DeadArgElim play nice with operand bundles  (David Majnemer, 2015-12-23, 1 file, -0/+12)
  A call site's use of a Value might not correspond to an argument operand but to a bundle
  operand.
  llvm-svn: 256326
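  Illustrative sketch of the situation these three operand-bundle fixes describe (hypothetical
  IR, not taken from the patches): %x reaches the call only as a "deopt" bundle operand, so a
  pass must not assume that every use of the call site is an argument operand.
    ; %x is used by the call solely as a bundle operand, never as an argument.
    declare void @callee()
    define void @caller(i32* %x) {
    entry:
      call void @callee() [ "deopt"(i32* %x) ]
      ret void
    }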
* [RS4GC] Fix base pair printing for constants.  (Manuel Jacob, 2015-12-23, 2 files, -2/+2)
  Previously, "%" + name of the value was printed for each derived and base pointer. This is
  correct for instructions, but wrong for e.g. globals.
  llvm-svn: 256305
* Also add unnamed_addr to functions.  (Rafael Espindola, 2015-12-22, 1 file, -0/+6)
  llvm-svn: 256281
* Delete dead GlobalAliases.  (Rafael Espindola, 2015-12-22, 3 files, -3/+6)
  llvm-svn: 256276
* [BPI] Replace weights by probabilities in BPI.  (Cong Hou, 2015-12-22, 1 file, -1/+1)
  This patch removes all weight-related interfaces from BPI and replaces them with probability
  versions. With this patch, we won't use edge weights anymore in either IR or MC passes. Edge
  probability is a better representation in terms of CFG update and validation.
  Differential Revision: http://reviews.llvm.org/D15519
  llvm-svn: 256263
* Remove deprecated llvm.experimental.gc.result.{int,float,ptr} intrinsics.  (Manuel Jacob, 2015-12-22, 1 file, -2/+2)
  Summary: These were deprecated 11 months ago when a generic llvm.experimental.gc.result
  intrinsic, which works for all types, was added.
  Reviewers: sanjoy, reames
  Subscribers: sanjoy, chenli, llvm-commits
  Differential Revision: http://reviews.llvm.org/D15719
  llvm-svn: 256262
* [RS4GC] Fix crash in the case that a live variable has a constant base.  (Manuel Jacob, 2015-12-22, 2 files, -0/+39)
  Summary: Previously, RS4GC crashed in CreateGCRelocates() because it assumed that every base
  is also in the array of live variables, which isn't true if a live variable has a constant
  base. This change fixes the crash by making sure CreateGCRelocates() won't try to relocate a
  live variable with a constant base. This would be unnecessary anyway, because anything with a
  constant base won't move.
  Reviewers: reames
  Subscribers: llvm-commits, sanjoy
  Differential Revision: http://reviews.llvm.org/D15556
  llvm-svn: 256252
* Determine callee's hotness and adjust threshold based on that. NFC.  (Easwaran Raman, 2015-12-22, 2 files, -0/+78)
  This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold callees and uses
  the values of inlinehint-threshold and inlinecold-threshold respectively as the thresholds for
  such callees.
  Differential Revision: http://reviews.llvm.org/D15245
  llvm-svn: 256222
* [safestack] Add option for non-TLS unsafe stack pointer.  (Evgeniy Stepanov, 2015-12-22, 1 file, -0/+5)
  This patch adds an option, -safe-stack-no-tls, for using normal storage instead of
  thread-local storage for the unsafe stack pointer. This can be useful when SafeStack is
  applied to an operating system kernel.
  http://reviews.llvm.org/D15673
  Patch by Michael LeMay.
  llvm-svn: 256221
* [cfi] Fix LowerBitSets on 32-bit targets.  (Evgeniy Stepanov, 2015-12-21, 1 file, -0/+21)
  This code attempts to truncate IntPtrTy to i32, which may be the same type.
  llvm-svn: 256205
* Nonnull elements in OperandBundleCallSites are not all Instructions  (Sanjoy Das, 2015-12-19, 1 file, -0/+36)
  `CloneAndPruneIntoFromInst` sometimes RAUW's dead instructions with `undef` before erasing
  them (to avoid deleting instructions that still have uses). This changes the `WeakVH` in
  `OperandBundleCallSites` to hold an `undef`, and we need to guard against this eventuality in
  `llvm::InlineFunction`.
  llvm-svn: 256110
* [Deopt bundles] Fix a test case  (Sanjoy Das, 2015-12-19, 1 file, -1/+1)
  The `CHECK-NOT` line was incorrect, and would not have caught a breakage.
  llvm-svn: 256109
* Remove double blanks. NFC.  (Manuel Jacob, 2015-12-19, 1 file, -7/+7)
  llvm-svn: 256100
* [RS4GC] Remove an overly strong assertion  (Philip Reames, 2015-12-19, 1 file, -0/+35)
  As shown by the included test case, it's reasonable to end up with constant references during
  base pointer calculation. The code actually handled this case just fine; we only had the
  assert to help isolate problems, under the belief that constant references shouldn't be
  present in IR generated by managed frontends. This turned out to be wrong on two fronts:
  1) Manuel Jacob is working on a language with constant references, and 2) we found a case
  where the optimizer does create them in practice.
  llvm-svn: 256079
* Clean up the processing of dbg.value in various places  (Keno Fischer, 2015-12-19, 1 file, -0/+52)
  Summary: First up is instcombine, where in the dbg.declare -> dbg.value conversion, the
  llvm.dbg.value needs to be called on the actual loaded value, rather than the address (since
  the whole point of this transformation is to be able to get rid of the alloca). Further, now
  that that's cleaned up, we can remove a hack in the backend that would add an implicit
  OP_deref if the argument to dbg.value was an alloca. This stems from before the existence of
  DIExpression and is no longer necessary since the deref can be expressed explicitly.
  Now, in order to make sure that the tests pass with this change, we need to correct the
  printing of DEBUG_VALUE comments to take the expression into account, which wasn't done
  before. Unfortunately, for both these changes, there were a number of incorrect test cases
  (mostly the wrong number of DW_OP_derefs, but also a couple where the test itself was broken
  more badly). aprantl and I have gone through and adjusted these test cases in order to make
  them pass with these fixes and, in some cases, to make sure they're actually testing what
  they are meant to test.
  Reviewers: aprantl
  Subscribers: dsanders
  Differential Revision: http://reviews.llvm.org/D14186
  llvm-svn: 256077
* AMDGPU: Switch barrier intrinsics to using convergent  (Matt Arsenault, 2015-12-19, 2 files, -0/+36)
  noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop
  has a barrier in it, it is OK to duplicate it for the unroll.
  llvm-svn: 256075
* [NaryReassociate] allow candidate to have a different type  (Jingyue Wu, 2015-12-18, 1 file, -0/+17)
  Summary: If Candidate may have a different type from GEP, we should bitcast or pointer cast it
  to GEP's type so that the later RAUW doesn't complain. Added a test in nary-gep.ll.
  Reviewers: tra, meheff
  Subscribers: mcrosier, llvm-commits, jholewinski
  Differential Revision: http://reviews.llvm.org/D15618
  llvm-svn: 256035
* [WinEH] Update LCSSA to handle catchswitch with handlers inside and outside a loop  (Andrew Kaylor, 2015-12-18, 1 file, -0/+95)
  Differential Revision: http://reviews.llvm.org/D15630
  llvm-svn: 256005
* [InstCombine] Extend peephole DSE to handle unordered atomics  (Philip Reames, 2015-12-17, 1 file, -0/+113)
  This extends the same line of reasoning used in EarlyCSE (http://reviews.llvm.org/D15352) to
  the DSE implementation in InstCombine. Key points:
  * We only remove unordered or simple stores.
  * The loads producing values consumed by dead stores don't influence whether the store is
    dead.
  Differential Revision: http://reviews.llvm.org/D15354
  llvm-svn: 255932
* [EarlyCSE] DSE of atomic unordered stores  (Philip Reames, 2015-12-17, 1 file, -0/+74)
  The rules for removing trivially dead stores are a lot less complicated than for loads. Since
  we know the later store post-dominates the former and the former dominates the later, unless
  the former has side effects other than the actual store, we can remove it. One slightly
  surprising thing is that we can freely remove atomic stores, even if the later one isn't
  atomic. There's no guarantee the atomic one was ever visible.
  For the moment, we don't handle DSE of ordered atomic stores. We could extend the same chain
  of reasoning to them, but the catch is we'd then have to model the ordering effect without a
  store instruction. Since our fences are stronger than our operation orderings, simply using a
  fence isn't an obvious win. This arguably calls for a refinement in our fence specification,
  but that's (much) later work.
  Differential Revision: http://reviews.llvm.org/D15352
  llvm-svn: 255914
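  A minimal sketch of the pattern this enables (hypothetical IR, not taken from the commit):
  the first, unordered atomic store is overwritten by the later store with no intervening read,
  so it can be deleted even though the later store is not atomic.
    define void @overwritten(i32* %p) {
      store atomic i32 0, i32* %p unordered, align 4   ; dead: overwritten below
      store i32 1, i32* %p, align 4
      ret void
    }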
* [ThinLTO] Metadata linking for imported functions  (Teresa Johnson, 2015-12-17, 2 files, -0/+72)
  Summary: Second patch split out from http://reviews.llvm.org/D14752. Maps metadata as a
  post-pass from each module when importing is complete, suturing up final metadata to the
  temporary metadata left on the imported instructions. This entails saving the mapping from
  bitcode value id to temporary metadata in the importing pass, and from bitcode value id to
  final metadata during the metadata linking postpass. Depends on D14825.
  Reviewers: dexonsmith, joker.eph
  Subscribers: davidxl, llvm-commits, joker.eph
  Differential Revision: http://reviews.llvm.org/D14838
  llvm-svn: 255909
* [NFC] Update horizontal reduction test cases.  (Charlie Turner, 2015-12-16, 2 files, -2/+2)
  These testcases no longer need to specify -slp-vectorize-hor, since it was enabled by default
  in r252733.
  llvm-svn: 255783
* [SimplifyCFG] Don't create unnecessary PHIs  (James Molloy, 2015-12-16, 1 file, -0/+215)
  In conditional store merging, we were creating PHIs when we didn't need to. If the value to be
  predicated isn't defined in the block we're predicating, then it doesn't need a PHI at all
  (because we only deal with triangles and diamonds, any value not in the predicated BB must
  dominate the predicated BB).
  This fixes a large code size increase in some benchmarks in a popular embedded benchmark
  suite. Now with a fix (and fixed tests) for the conformance issue seen in Chromium.
  llvm-svn: 255767
* [EarlyCSE] DSE of stores which write back loaded values  (Philip Reames, 2015-12-16, 1 file, -0/+74)
  Extend EarlyCSE with an additional style of dead store elimination. If we write back a value
  just read from that memory location, we can eliminate the store under the assumption that the
  value hasn't changed. I'm implementing this mostly because I noticed the omission when looking
  at the code. It seemed strange to have InstCombine have a peephole which was more powerful
  than EarlyCSE. :)
  Differential Revision: http://reviews.llvm.org/D15397
  llvm-svn: 255739
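  A sketch of the new pattern (hypothetical example): the store writes back the value that was
  just loaded from the same location, so it can be dropped if nothing may modify %p in between.
    define void @writeback(i32* %p) {
      %v = load i32, i32* %p, align 4
      store i32 %v, i32* %p, align 4   ; removable: %p already holds %v
      ret void
    }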
* [IR] Add support for floating point atomic loads and stores  (Philip Reames, 2015-12-16, 1 file, -0/+82)
  This patch allows atomic loads and stores of floating point to be specified in the IR and adds
  an adapter to allow them to be lowered via the existing backend support for the
  bitcast-to-equivalent-integer idiom. Previously, the only way to specify an atomic float
  operation was to bitcast the pointer to an i32, load the value as an i32, then bitcast to a
  float. At its most basic, this patch simply moves this expansion step to the point we start
  lowering to the backend.
  This patch does not add canonicalization rules to convert the bitcast idioms to the
  appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the
  support. I'd like to get instruction selection working through at least one backend (x86-64)
  without the bitcast conversion before canonicalizing into this form. Similarly, I haven't yet
  added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it
  would make more sense to add those once at least one backend (x86) was ready to actually opt
  out.
  As you can see from the included tests, the generated code quality is not great. I plan on
  submitting some patches to fix this, but help from others along that line would be very
  welcome. I'm not super familiar with the backend and my ramp-up time may be material.
  Differential Revision: http://reviews.llvm.org/D15471
  llvm-svn: 255737
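  For illustration (an assumed example mirroring the description above, not taken from the
  commit), atomic float accesses can now be written directly instead of through an i32 bitcast
  of the pointer:
    define float @exchange(float* %p, float %v) {
      %old = load atomic float, float* %p seq_cst, align 4
      store atomic float %v, float* %p seq_cst, align 4
      ret float %old
    }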
* Cross-DSO control flow integrity (LLVM part).  (Evgeniy Stepanov, 2015-12-15, 1 file, -0/+88)
  An LTO pass that generates a __cfi_check() function that validates a call based on a hash of
  the call-site-known type and the target pointer.
  llvm-svn: 255693
* [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions  (Cong Hou, 2015-12-15, 2 files, -1/+72)
  (This is the third attempt to check in this patch; the first two are r255454 and r255460. The
  previously failing test file reg-usage.ll is now moved to the test/Transforms/LoopVectorize/X86
  directory, with a target datalayout and target triple indicated.)
  LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage
  for specific VFs. However, it takes into account many instructions that won't be vectorized,
  such as induction variables, GetElementPtr instructions, etc. This makes the loop vectorizer
  too conservative when choosing a VF. In this patch, the induction variables that won't be
  vectorized plus GetElementPtr instructions will be added to the ValuesToIgnore set so that
  their register usage won't be considered any more.
  Differential Revision: http://reviews.llvm.org/D15177
  llvm-svn: 255691
* [SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)  (Sanjay Patel, 2015-12-15, 2 files, -43/+23)
  This is the last general step to allow more IR-level speculation with a safety harness in
  place in CodeGenPrepare. The intent is to restore the behavior enabled by
  http://reviews.llvm.org/rL228826 but prevent bad performance such as
  https://llvm.org/bugs/show_bug.cgi?id=24818.
  Earlier patches in this sequence:
  - D12882 (disable SimplifyCFG speculation for expensive instructions)
  - D13297 (have CGP despeculate expensive ops)
  - D14630 (have CGP despeculate special versions of cttz/ctlz)
  As shown in the test cases, we only have two instructions currently affected: ctz for some x86
  and fdiv generally. Allowing exactly one expensive instruction is a bit of a hack, but it
  lines up with what is currently implemented in CGP. If we make the despeculation more general
  in CGP, we can make the speculation here more liberal.
  A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math
  intrinsics (currently everything is cheap by default). GPU targets would likely want to
  override those expensive default costs (just as they probably should already override the
  cost of div/rem) because just about any math is cheaper than control flow on those targets.
  Differential Revision: http://reviews.llvm.org/D15213
  llvm-svn: 255660
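  A sketch of the kind of pattern now allowed (hypothetical IR, with fdiv as the single
  expensive instruction): SimplifyCFG may speculate the fdiv and fold the triangle into a
  select.
    define float @speculate_fdiv(i1 %c, float %a, float %b) {
    entry:
      br i1 %c, label %then, label %end
    then:                              ; exactly one expensive instruction
      %d = fdiv float %a, %b
      br label %end
    end:
      %r = phi float [ %d, %then ], [ %a, %entry ]
      ret float %r
    }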
* AMDGPU: mark ldexp LibCalls as unavailable  (Nicolai Hahnle, 2015-12-15, 1 file, -10/+9)
  Summary: The LibCallSimplifier will turn llvm.exp2.* intrinsics into ldexp* libcalls which do
  not make sense with the AMDGPU backend. In the long run, we'll want an llvm.ldexp.* intrinsic
  to properly make use of this optimization, but this works around the problem for now.
  See also: http://reviews.llvm.org/D14327 (suggested llvm.ldexp.* implementation)
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709
  Reviewers: arsenm, tstellarAMD
  Differential Revision: http://reviews.llvm.org/D14990
  llvm-svn: 255658
* Instcombine: destructure loads of structs that do not contain padding  (Mehdi Amini, 2015-12-15, 3 files, -84/+108)
  For non-padded structs, we can just proceed and deaggregate them. We don't want to do this
  when there is padding in the struct, so as not to lose information about this padding (the
  subsequent passes would then try hard to preserve the padding, which is undesirable).
  Also update extractvalue.ll and cast.ll so that they use structs with padding. Remove the
  FIXME in the extractvalue-of-load case, as the non-padded case is handled when processing the
  load, and we don't want to do it on the padded case.
  Patch by: Amaury SECHET <deadalnix@gmail.com>
  Differential Revision: http://reviews.llvm.org/D14483
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 255600
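  A hedged illustration (hypothetical types, not from the patch): a load of a struct with no
  padding can be split into per-element loads, letting the extractvalue fold away.
    %pair = type { i32, i32 }            ; no padding between the two fields
    define i32 @first_field(%pair* %p) {
      %agg = load %pair, %pair* %p
      %x = extractvalue %pair %agg, 0    ; after the transform: a plain i32 load of field 0
      ret i32 %x
    }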
* [PGO] make profile prefix even shorter and more readable  (Xinliang David Li, 2015-12-15, 8 files, -39/+39)
  llvm-svn: 255586
* [PGO] Shorten profile symbol prefixes  (Xinliang David Li, 2015-12-14, 8 files, -39/+39)
  Profile symbols have long prefixes, which waste space and create pressure on the linker. This
  patch shortens the prefixes to a minimal length without losing verbosity.
  Differential Revision: http://reviews.llvm.org/D15503
  llvm-svn: 255575
* Revert "Don't create unnecessary PHIs"Reid Kleckner2015-12-142-200/+4
| | | | | | | | | This reverts commit r255489. It causes test failures in Chromium and does not appear to respect the AlternativeV parameter. llvm-svn: 255562
* add fast-math-flags to 'call' instructions (PR21290)  (Sanjay Patel, 2015-12-14, 6 files, -22/+22)
  This patch adds optional fast-math-flags (the same that apply to
  fmul/fadd/fsub/fdiv/frem/fcmp) to call instructions in IR. Follow-up patches would use these
  flags in LibCallSimplifier, add support to clang, and extend FMF to the DAG for calls.
  Motivating example:
    %y = fmul fast float %x, %x
    %z = tail call float @sqrtf(float %y)
  We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a
  function-wide attribute for unsafe-math, but we really want to trigger on the instructions
  themselves:
    %z = tail call fast float @sqrtf(float %y)
  because in an LTO build it's possible that calls with fast semantics have been inlined into a
  function with non-fast semantics.
  The code changes and tests are based on the recent commits that added "notail"
  (http://reviews.llvm.org/rL252368) and added FMF to fcmp (http://reviews.llvm.org/rL241901).
  Differential Revision: http://reviews.llvm.org/D14707
  llvm-svn: 255555
* [IR] Remove terminatepad  (David Majnemer, 2015-12-14, 5 files, -134/+8)
  It turns out that terminatepad gives little benefit over a cleanuppad which calls the
  termination function. This is not sufficient to implement fully generic filters, but MSVC
  doesn't support them, which makes terminatepad a little over-designed.
  Depends on D15478.
  Differential Revision: http://reviews.llvm.org/D15479
  llvm-svn: 255522