path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
...
* [ConstantMerge] Factor out check for un-mergeable globals, NFC | Vedant Kumar | 2019-01-20 | 1 | -10/+12
  llvm-svn: 351671
* [InstCombine] Simplify cttz/ctlz + icmp ugt/ult | Nikita Popov | 2019-01-19 | 2 | -11/+70
  Followup to D55745, this time handling comparisons with ugt and ult predicates (which are the canonical forms for non-equality predicates). For ctlz we can convert into a simple icmp, for cttz we can convert into a mask check.
  Differential Revision: https://reviews.llvm.org/D56355
  llvm-svn: 351645
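A minimal sketch, not the InstCombine code itself, that brute-forces the equivalences behind this fold on 8-bit values. The helper names and the exact rewritten forms are assumptions derived from the intrinsics' semantics, with a zero input yielding the bit width BW. Under those assumptions, `ctlz(x) ugt C` becomes `x ult (1 << (BW-1-C))`, `ctlz(x) ult C` becomes `x uge (1 << (BW-C))`, and the `cttz` cases become tests on the low `C+1` or `C` bits:

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics for 8-bit values: a zero input returns the bit
// width (8), as ctlz/cttz do when zero is not declared undefined.
unsigned ctlz8(uint8_t x) {
  unsigned n = 0;
  for (int bit = 7; bit >= 0 && !((x >> bit) & 1); --bit)
    ++n;
  return n;
}

unsigned cttz8(uint8_t x) {
  unsigned n = 0;
  for (int bit = 0; bit < 8 && !((x >> bit) & 1); ++bit)
    ++n;
  return n;
}

// Assumed rewritten forms (hypothetical names), valid for 0 <= c <= 7
// in the ugt cases and 1 <= c <= 8 in the ult cases.
bool ctlzUgt(uint8_t x, unsigned c) { return x < (1u << (7 - c)); }
bool ctlzUlt(uint8_t x, unsigned c) { return x >= (1u << (8 - c)); }
bool cttzUgt(uint8_t x, unsigned c) { return (x & ((1u << (c + 1)) - 1)) == 0; }
bool cttzUlt(uint8_t x, unsigned c) { return (x & ((1u << c) - 1)) != 0; }

// Exhaustively compares every rewritten form with the original compare.
bool checkAllEquivalences() {
  for (unsigned v = 0; v < 256; ++v) {
    uint8_t x = static_cast<uint8_t>(v);
    for (unsigned c = 0; c <= 7; ++c) {
      if ((ctlz8(x) > c) != ctlzUgt(x, c)) return false;
      if ((cttz8(x) > c) != cttzUgt(x, c)) return false;
    }
    for (unsigned c = 1; c <= 8; ++c) {
      if ((ctlz8(x) < c) != ctlzUlt(x, c)) return false;
      if ((cttz8(x) < c) != cttzUlt(x, c)) return false;
    }
  }
  return true;
}
```

For example, `ctlz(x) ugt 3` on an 8-bit value holds exactly when `x ult 16`: the top four bits must all be zero.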
* Update the file headers across all of the LLVM projects in the monorepo | Chandler Carruth | 2019-01-19 | 252 | -1008/+756
  to reflect the new license.
  We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
  Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
  llvm-svn: 351636
* Enable IPConstantPropagation to work with abstract call sites | Johannes Doerfert | 2019-01-19 | 1 | -12/+23
  This modification of the currently unused inter-procedural constant propagation pass (IPConstantPropagation) shows how abstract call sites enable optimization of callback calls alongside direct and indirect calls. Through minimal changes, mostly dealing with the partial mapping of callbacks, inter-procedural constant propagation was enabled for callbacks, e.g., OpenMP runtime calls or pthread_create.
  Differential Revision: https://reviews.llvm.org/D56447
  llvm-svn: 351628
* [MergeFunc] Allow merging identical vararg functions using aliases | Vedant Kumar | 2019-01-19 | 1 | -11/+14
  Thanks to Nikita Popov for pointing out this missed case. This is a follow-up to r351411, which disabled function merging for vararg functions outright due to a miscompile (see llvm.org/PR40345).
  Differential Revision: https://reviews.llvm.org/D56865
  llvm-svn: 351624
* [HotColdSplit] Mark inherently cold functions as such | Vedant Kumar | 2019-01-19 | 1 | -20/+39
  If an inherently cold function is found, mark it as cold. For now this means applying the `cold` and `minsize` attributes.
  As a drive-by, revisit and clean up the criteria for considering a function for splitting. Add tests.
  llvm-svn: 351623
* [HotColdSplit] Remove a set which tracked split functions (NFC) | Vedant Kumar | 2019-01-19 | 1 | -8/+3
  Use the begin/end iterator idiom to avoid visiting split functions, instead of doing a set lookup.
  llvm-svn: 351622
* [CodeExtractor] Emit lifetime markers around reloads of outputs | Vedant Kumar | 2019-01-19 | 1 | -66/+71
  CodeExtractor permits extracting a region of blocks from a function even when values defined within the region are used outside of it. This is typically done by creating an alloca in the original function and reloading the alloca after a call to the extracted function.
  Wrap the reload in lifetime start/end markers to promote stack coloring.
  Suggested by Sergei Kachkov!
  Differential Revision: https://reviews.llvm.org/D56045
  llvm-svn: 351621
* [LCSSA] Skip blocks in sub-loops when scanning for uses. | Florian Hahn | 2019-01-18 | 1 | -0/+4
  Summary:
  Scanning blocks in sub-loops for uses is unnecessary, as they were already handled while processing the sub-loop that contains them. This speeds up LCSSA for highly nested loops. For the test case in PR37202, it halves the time spent in LCSSA. In cases where we won't be able to skip any blocks, the additional lookup should be negligible.
  Time-passes without this patch for test case from PR37202:
    Total Execution Time: 48.5505 seconds (48.5511 wall clock)
       ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
      10.0822 ( 21.0%)   0.1406 ( 27.0%)  10.2228 ( 21.1%)  10.2228 ( 21.1%)  Loop-Closed SSA Form Pass
      10.0417 ( 20.9%)   0.1467 ( 28.2%)  10.1884 ( 21.0%)  10.1890 ( 21.0%)  Loop-Closed SSA Form Pass #2
       4.2703 (  8.9%)   0.0040 (  0.8%)   4.2742 (  8.8%)   4.2742 (  8.8%)  Unswitch loops
       2.7376 (  5.7%)   0.0229 (  4.4%)   2.7605 (  5.7%)   2.7611 (  5.7%)  Loop-Closed SSA Form Pass #5
       2.7332 (  5.7%)   0.0214 (  4.1%)   2.7546 (  5.7%)   2.7546 (  5.7%)  Loop-Closed SSA Form Pass #3
       2.7088 (  5.6%)   0.0230 (  4.4%)   2.7319 (  5.6%)   2.7324 (  5.6%)  Loop-Closed SSA Form Pass #4
       2.6855 (  5.6%)   0.0236 (  4.5%)   2.7091 (  5.6%)   2.7090 (  5.6%)  Loop-Closed SSA Form Pass #6
       2.1648 (  4.5%)   0.0018 (  0.4%)   2.1666 (  4.5%)   2.1664 (  4.5%)  Unroll loops
       1.8371 (  3.8%)   0.0009 (  0.2%)   1.8379 (  3.8%)   1.8380 (  3.8%)  Value Propagation
       1.8149 (  3.8%)   0.0021 (  0.4%)   1.8170 (  3.7%)   1.8169 (  3.7%)  Loop Invariant Code Motion
       1.6755 (  3.5%)   0.0226 (  4.3%)   1.6981 (  3.5%)   1.6980 (  3.5%)  Loop-Closed SSA Form Pass #7
  Time-passes with this patch:
    Total Execution Time: 29.9285 seconds (29.9276 wall clock)
       ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
       5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
       4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
       4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
       2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
       2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
       1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
       1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
       1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
       1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
       1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
       1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3
  Reviewers: davide, efriedma, mzolotukhin
  Reviewed By: davide, efriedma
  Subscribers: hiraditya, dmgreen, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56848
  llvm-svn: 351567
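The skip can be sketched on a toy loop model. `ToyLoopInfo` and `blocksToScan` are hypothetical and are not LLVM's `LoopInfo` API. The sketch assumes, as the message implies, that sub-loops have already been put into LCSSA form, so only blocks whose innermost loop is the current loop still need to be scanned. The added cost is one innermost-loop lookup per block:

```cpp
#include <cassert>
#include <vector>

// Toy model: each loop lists every block it contains, including blocks
// of nested sub-loops, and innermostLoop[b] names the innermost loop
// that contains block b.
struct ToyLoopInfo {
  std::vector<std::vector<int>> loopBlocks; // loop id -> blocks
  std::vector<int> innermostLoop;           // block id -> loop id
};

// Returns the blocks of `loop` whose instructions still need to be
// scanned for out-of-loop uses. Blocks whose innermost loop is a
// sub-loop are skipped because that sub-loop was handled earlier.
std::vector<int> blocksToScan(const ToyLoopInfo &li, int loop) {
  std::vector<int> result;
  for (int bb : li.loopBlocks[loop]) {
    if (li.innermostLoop[bb] != loop)
      continue; // belongs to a sub-loop
    result.push_back(bb);
  }
  return result;
}
```

With an outer loop {0, 1, 2, 3} that contains a sub-loop {1, 2}, scanning the outer loop visits only blocks 0 and 3. For deep nests, this avoids rescanning the same inner blocks once per enclosing loop.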
* Re-enable terminator folding in LoopSimplifyCFG: underlying bugs fixed | Max Kazantsev | 2019-01-18 | 1 | -1/+1
  llvm-svn: 351520
* [HotColdSplit] Allow outlining with live outputs | Vedant Kumar | 2019-01-17 | 1 | -10/+0
  Prior to r348205, extracting code regions with live output values was disabled because of a miscompilation (PR39433). Lift the restriction as PR39433 has been addressed.
  Tested on LNT+externals, on a run of check-llvm in a stage2 build, and with a full build of iOS (with hot/cold splitting enabled).
  As a drive-by, remove an errant TODO.
  llvm-svn: 351492
* [HotColdSplit] Consider resume instructions to be cold | Vedant Kumar | 2019-01-17 | 1 | -1/+1
  Resuming exception unwinding is roughly as unlikely as throwing an exception.
  Tested on LNT+externals (in particular, the C++ EH regression tests provide end-to-end test coverage), as well as with a full build of iOS.
  llvm-svn: 351491
* [HotColdSplit] Relax requirement that the cold sink block be extractable | Vedant Kumar | 2019-01-17 | 1 | -5/+2
  Relaxing this requirement creates opportunities to split code dominated by an EH pad.
  Tested on LNT+externals.
  llvm-svn: 351483
* [HotColdSplit] Simplify tests by lowering their splitting thresholds | Vedant Kumar | 2019-01-17 | 1 | -4/+9
  This gets rid of the brittle/mysterious calls to @sink()/@sideeffect() peppered throughout the test cases. They are no longer needed to force splitting to occur.
  llvm-svn: 351480
* [SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink | Wei Mi | 2019-01-17 | 1 | -2/+15
  If the sample profile includes no inlining hierarchy information, we call the sample profile flattened. For a flattened profile, SampleProfileLoader's hot function inlining and profile annotation do nothing in the ThinLTO postlink phase. It is therefore better to skip reading the profile and running the sample profile loader pass. This helps reduce compile time when the flattened profile is huge.
  Differential Revision: https://reviews.llvm.org/D54819
  llvm-svn: 351476
* [InstCombine] Don't sink dynamic allocas | Reid Kleckner | 2019-01-17 | 1 | -3/+5
  Summary:
  InstCombine's sinking algorithm only thinks about memory. It doesn't think about non-memory constraints like stack object lifetime. It can sink dynamic allocas across a stacksave call, which may be used with stackrestore, which can incorrectly reduce the lifetime of the dynamic alloca.
  Fixes PR40365
  Reviewers: hfinkel, efriedma
  Subscribers: hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56872
  llvm-svn: 351475
* Revert "[ThinLTO] Add summary entries for index-based WPD" | Teresa Johnson | 2019-01-17 | 1 | -33/+12
  Mistaken commit of something still under review!
  This reverts commit r351453.
  llvm-svn: 351455
* [ThinLTO] Add summary entries for index-based WPD | Teresa Johnson | 2019-01-17 | 1 | -12/+33
  Summary:
  If LTOUnit splitting is disabled, the module summary analysis computes the summary information necessary to perform single implementation devirtualization during the thin link with the index and no IR. The information collected from the regular LTO IR in the current hybrid WPD algorithm is summarized, including:
  1) For vtable definitions, record the function pointers and their offset within the vtable initializer (subsumes the information collected from IR by tryFindVirtualCallTargets).
  2) A record for each type metadata summarizing the vtable definitions decorated with that metadata (subsumes the TypeIdentiferMap collected from IR).
  Also added are the necessary bitcode records, and the corresponding assembly support.
  The index-based WPD will be sent as a follow-on.
  Depends on D53890.
  Reviewers: pcc
  Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54815
  llvm-svn: 351453
* [LoopSimplifyCFG] Form LCSSA when a parent loop becomes a sibling | Max Kazantsev | 2019-01-17 | 1 | -0/+9
  During the transforms in LoopSimplifyCFG, when we remove a dead exiting edge, the parent loop may stop being reachable from the child loop, and therefore they become siblings. If the former child loop had uses of some values from its former parent loop, now such uses will require LCSSA Phis, even if they weren't needed before. So we must form LCSSA for all loops that stopped being ancestors of the current loop in this case.
  Differential Revision: https://reviews.llvm.org/D56144
  Reviewed By: fedor.sergeev
  llvm-svn: 351434
* [LoopSimplifyCFG] Fix order of deletion of complex dead subloops | Max Kazantsev | 2019-01-17 | 1 | -2/+3
  Function `DeleteDeadBlock` requires that all predecessors of a block being deleted have already been deleted, with the exception of a single-block loop. When we use it for removal of dead subloops that contain more than one block, we may not fulfill this requirement and fail an assertion.
  This patch replaces invocation of `DeleteDeadBlock` with a generalized version `DeleteDeadBlocks` that is able to deal with multiple dead blocks, even if they contain some cycles.
  Differential Revision: https://reviews.llvm.org/D56121
  Reviewed By: fedor.sergeev
  llvm-svn: 351433
* [NFC] Factor out some local vars | Max Kazantsev | 2019-01-17 | 1 | -7/+9
  llvm-svn: 351416
* [MergeFunc] Prevent silent miscompile of vararg functions | Vedant Kumar | 2019-01-17 | 1 | -1/+7
  The function merging pass miscompiles identical vararg functions. The forwarding thunk it emits doesn't forward the full variable-length list of arguments. Disable merging for vararg functions for now.
  I've filed llvm.org/PR40345 to track the issue.
  rdar://47326238
  llvm-svn: 351411
* [FunctionComparator] Consider tail call kinds | Vedant Kumar | 2019-01-17 | 1 | -22/+11
  Essentially, do not treat `call` and `musttail call` as the same thing.
  As a drive-by, fold CallInst and InvokeInst handling together using the CallSite helper.
  Differential Revision: https://reviews.llvm.org/D56815
  llvm-svn: 351405
* Fix a mistake in rL351392. | Wei Mi | 2019-01-16 | 1 | -1/+1
  PGOInstrGen should be initialized to "" instead of false.
  llvm-svn: 351397
* [PGO] Make pgo related options in opt more consistent. | Wei Mi | 2019-01-16 | 1 | -17/+4
  Currently we have PGO options defined in PassManagerBuilder.cpp only for instrumentation PGO, but not for sample PGO. We also have PGO options defined in NewPMDriver.cpp in opt, but only for the new pass manager, and they cover all kinds of PGO. The two sets are inconsistent.
  To make the options more consistent and make writing tests easier, the patch lets the old pass manager share the same PGO options with the new pass manager in opt, and removes the options in PassManagerBuilder.cpp.
  Differential Revision: https://reviews.llvm.org/D56749
  llvm-svn: 351392
* [SLP] Fix PR40310: The reduction nodes should stay scalar. | Alexey Bataev | 2019-01-16 | 1 | -1/+2
  Summary:
  Sometimes the SLP vectorizer tries to vectorize the horizontal reduction nodes during regular vectorization. This may happen inside loops when there are some vectorizable PHIs. The patch fixes this by checking whether the node is a reduction node; such a node must not be vectorized and must be gathered instead.
  Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D56783
  llvm-svn: 351349
* Assertion in isAllocaPromotable due to extra bitcast goes into lifetime marker | Gabor Buella | 2019-01-16 | 1 | -1/+4
  For the given test, SROA detects a possible replacement and creates a correct alloca. After that, SROA adds lifetime markers for this new alloca. The function getNewAllocaSlicePtr tries to deduce the pointer type from the original alloca, which is split, so it can use that type later in the lifetime intrinsic.
  For the test we ended up with code like this (rA is the initial alloca [10 x float], which is split, and rA.sroa.0.0 is a new split allocation):
  ```
  %rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]*   ; <----- this one causes the assertion and is an extra bitcast
  %5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %5)
  ```
  isAllocaPromotable assumes that a user of an alloca may reach a lifetime marker through a bitcast, but only through a single bitcast to i8*. In the test the bitcast is not to i8*, so isAllocaPromotable returns false and the assertion fires.
  Because the pointer being created is used only in lifetime markers, the proposed fix is to create a bitcast to i8* immediately and avoid creating the extra bitcast.
  The test is greatly simplified to just reproduce the assertion.
  Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>
  Reviewers: chandlerc, craig.topper
  Reviewed By: chandlerc
  Differential Revision: https://reviews.llvm.org/D55934
  llvm-svn: 351325
* [MSan] Apply the ctor creation scheme of TSan | Philip Pfaffe | 2019-01-16 | 1 | -1/+23
  Summary:
  To avoid adding an extern function to the global ctors list, apply the changes of D56538 also to MSan.
  Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan
  Subscribers: hiraditya, bollu, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56734
  llvm-svn: 351322
* [NewPM][TSan] Reiterate the TSan port | Philip Pfaffe | 2019-01-16 | 3 | -35/+86
  Summary:
  Second iteration of D56433 which got reverted in rL350719. The problem in the previous version was that we dropped the thunk calling the tsan init function. The new version keeps the thunk which should appease dyld, but is not actually OK wrt. the current semantics of function passes. Hence, add a helper to insert the functions only on the first time. The helper allows hooking into the insertion to be able to append them to the global ctors list.
  Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan
  Subscribers: hiraditya, bollu, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56538
  llvm-svn: 351314
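A toy sketch of the "insert only the first time" helper pattern. It assumes a module modeled as a name-to-body map. `ToyModule`, `getOrCreateCtor` and the callback are hypothetical stand-ins, not LLVM's actual helper, and the function names in the test are only illustrative. The callback fires only when the functions are created, which is where a caller would append the ctor to the global ctors list:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a module: function names mapped to (fake) bodies,
// plus the list of registered global constructors.
struct ToyModule {
  std::map<std::string, std::string> functions;
  std::vector<std::string> globalCtors;
};

// Hypothetical get-or-create helper. It creates the init declaration and
// the ctor thunk only if the ctor does not exist yet, and invokes
// `onCreated` only in that case. A per-function pass can therefore call
// it many times without registering the ctor more than once.
std::string getOrCreateCtor(
    ToyModule &m, const std::string &ctorName, const std::string &initName,
    const std::function<void(const std::string &)> &onCreated) {
  if (m.functions.count(ctorName))
    return ctorName; // already created by an earlier run
  m.functions[initName] = "declare";
  m.functions[ctorName] = "call " + initName; // thunk calling the init
  onCreated(ctorName);
  return ctorName;
}
```

Keeping the registration in the caller's callback, rather than inside the helper, lets each sanitizer decide how to hook the new ctor into the module.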
* Only promote args when function attributes are compatible | Tom Stellard | 2019-01-16 | 1 | -4/+31
  Summary:
  Check to make sure that the caller and the callee have compatible function arguments before promoting arguments. This uses the same TargetTransformInfo queries that are used to determine if attributes are compatible for inlining.
  The goal here is to avoid breaking ABI when a called function's ABI depends on a target feature that is not enabled in the caller.
  This is a very conservative fix for PR37358. Ideally we would have a more sophisticated check for ABI compatibility rather than checking if the attributes are compatible for inlining.
  Reviewers: echristo, chandlerc, eli.friedman, craig.topper
  Reviewed By: echristo, chandlerc
  Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53554
  llvm-svn: 351296
* [InstCombine] Avoid introduction of unaligned mem access | Serguei Katkov | 2019-01-16 | 1 | -3/+20
  InstCombine is able to transform a mem transfer intrinsic into a single store or a store/load pair. This might result in unaligned atomic loads/stores, which the backend later transforms into libcalls. That is not an evident gain, so it is better to keep the intrinsic as is and handle it in the backend.
  Reviewers: reames, anna, apilipenko, mkazantsev
  Reviewed By: reames
  Subscribers: t.p.northover, jfb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56582
  llvm-svn: 351295
* treat invoke like call | David Callahan | 2019-01-15 | 1 | -4/+3
  Summary:
  InvokeInst should be treated like CallInst and assigned a separate discriminator. This is particularly important when an Invoke is converted to a Call during compilation, because that conversion can invalidate sample profile data collected with different link-time optimizations.
  Reviewers: twoh, Kader, danielcdh, wmi
  Reviewed By: wmi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D56491
  llvm-svn: 351251
* [SanitizerCoverage] Don't create comdat for interposable functions. | Matt Morehouse | 2019-01-15 | 1 | -1/+1
  Summary:
  Comdat groups override weak symbol behavior, allowing the linker to keep the comdats for weak symbols in favor of comdats for strong symbols.
  Fixes the issue described in: https://bugs.chromium.org/p/chromium/issues/detail?id=918662
  Reviewers: eugenis, pcc, rnk
  Reviewed By: pcc, rnk
  Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56516
  llvm-svn: 351247
* We can improve the performance (generally) by memo-izing the action to map a debug location to its function summary. | David Callahan | 2019-01-15 | 1 | -2/+7
  Summary:
  Here are timings (as reported by "opt -time-passes") for the sample-profile pass for some files holding hot functions from a major service. Average 17% reduction. The Delta column is 100*(old-new)/old.
  ```
    Old      New      Delta
  0.0537   0.0538    -0.2%
  0.8155   0.6522    20.0%
  0.0779   0.0751     3.6%
  0.0727   0.0913   -25.6%
  0.1622   0.1302    19.7%
  0.0627   0.0594     5.3%
  0.0766   0.0744     2.9%
  0.6426   0.4387    31.7%
  0.3521   0.2776    21.2%
  0.3549   0.2721    23.3%
  0.0912   0.0904     0.9%
  0.1236   0.1059    14.3%
  0.0854   0.0866    -1.4%
  0.0757   0.0722     4.6%
  0.1293   0.1147    11.3%
  0.1354   0.1122    17.1%
  0.0767   0.0770    -0.4%
  0.1135   0.0968    14.7%
  0.0524   0.0608   -16.0%
  0.1279   0.1106    13.5%
  ==========
  3.6820   3.0520    17.1%   Total
  ```
  Reviewers: twoh, Kader, danielcdh, wmi
  Reviewed By: wmi
  Subscribers: dblaikie, llvm-commits
  Differential Revision: https://reviews.llvm.org/D56435
  llvm-svn: 351211
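A minimal sketch of the memoization idea. It assumes debug locations are reduced to integer keys and uses a hypothetical `slowLookup` to stand in for the walk from a location to its function summary. A real implementation would more likely key on the location object itself; every name here is illustrative. Misses are cached as null entries too, so a repeated lookup never takes the slow path twice:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a function's profile summary.
struct Summary { std::string name; };

class SummaryCache {
public:
  explicit SummaryCache(std::unordered_map<int, Summary> table)
      : table_(std::move(table)) {}

  // Returns the summary for `loc` (or nullptr), running the slow lookup
  // at most once per key and serving later queries from the cache.
  const Summary *lookup(int loc) {
    auto it = cache_.find(loc);
    if (it != cache_.end())
      return it->second;
    ++slowLookups;
    const Summary *s = slowLookup(loc);
    cache_.emplace(loc, s); // remember hits and misses alike
    return s;
  }

  int slowLookups = 0; // number of times the slow path actually ran

private:
  // Stand-in for the expensive location-to-summary mapping.
  const Summary *slowLookup(int loc) const {
    auto it = table_.find(loc);
    return it == table_.end() ? nullptr : &it->second;
  }

  std::unordered_map<int, Summary> table_;
  std::unordered_map<int, const Summary *> cache_;
};
```

The cached pointers stay valid because `table_` is never modified after construction. This mirrors the assumption that summaries do not change while the pass runs.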
* [SimpleLoopUnswitch] Increment stats counter for unswitching switch instruction | Zaara Syeda | 2019-01-15 | 1 | -1/+4
  Increment statistics counter NumSwitches at unswitchNontrivialInvariants() for unswitching a non-trivial switch instruction. This fixes a bug where NumBranches was incremented even for a switch instruction. There is no functional change in this patch.
  Differential Revision: https://reviews.llvm.org/D56408
  llvm-svn: 351193
* [InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs. | Florian Hahn | 2019-01-15 | 1 | -4/+3
  Otherwise instcombine gets stuck in a cycle. The canonicalization was added in D55961.
  This patch fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12400
  llvm-svn: 351187
* [NFC] Remove some code duplication | Max Kazantsev | 2019-01-15 | 1 | -26/+9
  llvm-svn: 351185
* [NFC] Remove obsolete enum RangeCheckKind | Max Kazantsev | 2019-01-15 | 1 | -59/+16
  llvm-svn: 351183
* [NFC] Decrease if nest | Max Kazantsev | 2019-01-15 | 1 | -18/+14
  llvm-svn: 351180
* [NFC] Move some functions to LoopUtils | Max Kazantsev | 2019-01-15 | 2 | -42/+42
  llvm-svn: 351179
* AMDGPU: Add a fast path for icmp.i1(src, false, NE) | Marek Olsak | 2019-01-15 | 1 | -0/+5
  Summary:
  This allows moving the condition from the intrinsic to the standard ICmp opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic is an identity for retrieving the SGPR mask. And we can also get the mask from and i1, or i1, xor i1.
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52060
  llvm-svn: 351150
* [SanitizerCoverage][NFC] Use appendToUsed instead of include | Jonathan Metzman | 2019-01-14 | 1 | -16/+7
  Summary:
  Use appendToUsed instead of include to ensure that SanitizerCoverage's constructors are not stripped. Also, use isOSBinFormatCOFF() to determine if target binary format is COFF.
  Reviewers: pcc
  Reviewed By: pcc
  Subscribers: hiraditya
  Differential Revision: https://reviews.llvm.org/D56369
  llvm-svn: 351118
* Ignore PhiNodes when mapping sample profile data | David Callahan | 2019-01-14 | 1 | -3/+3
  Summary:
  Like branch instructions, phi nodes frequently do not have debug information related to the block they are in and so they should be ignored.
  Reviewers: danielcdh, twoh, Kader, wmi
  Reviewed By: wmi
  Subscribers: aprantl, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55094
  llvm-svn: 351102
* Revert "Merge branch 'arcpatch-D55094'" | David Callahan | 2019-01-14 | 1 | -3/+3
  This reverts commit a9788dd6587d67c856df74eedff5a6ad34ce8320, reversing changes made to f1309ffebf718d16aec4fab83380556c660e2825.
  unintended merge pushed
  llvm-svn: 351095
* Merge branch 'arcpatch-D55094' | David Callahan | 2019-01-14 | 1 | -3/+3
  llvm-svn: 351092
* cmake: Don't install plugins used for examples or tests | Tom Stellard | 2019-01-14 | 1 | -1/+1
  Summary:
  This patch drops install targets for LLVMHello.so, TestPlugin.so, and BugpointPasses.so.
  Reviewers: chandlerc, beanz, thakis, philip.pfaffe
  Reviewed By: chandlerc
  Subscribers: SquallATF, mgorny, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55965
  llvm-svn: 351087
* [DebugInfo] Remove un-necessary logic from HoistThenElseCodeToIf | Jeremy Morse | 2019-01-14 | 1 | -12/+1
  Following PR39807, the way in which SimplifyCFG hoists common code on branch paths was fixed in r347782. However this left extra code hanging around HoistThenElseCodeToIf that wasn't necessary and needlessly complicated matters -- we no longer need to look up through the 'if' basic block to find a location for hoisted 'select' insts, we can instead use the location chosen by applyMergedLocation.
  This patch deletes that extra logic, and updates a regression test to reflect the new logic (selects get the merged location, not a previous insts location).
  Differential Revision: https://reviews.llvm.org/D55272
  llvm-svn: 351058
* [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try | David Stuttard | 2019-01-14 | 2 | -3/+18
  TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support.
  This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2.
  This change consists of roughly 6 parts:
  1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values.
  2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done).
  3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used.
  4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value).
  5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support.
  6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly).
  There's an additional fix now to avoid a dmask=0:
  For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one.
  The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows:
    %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
    %v.vec = extractvalue {<4 x float>, i32} %v, 0
    %v.err = extractvalue {<4 x float>, i32} %v, 1
  This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission.
  Differential revision: https://reviews.llvm.org/D48826
  Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
  Work around for ppcle compiler bug
  Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
  llvm-svn: 351054
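A hypothetical sketch of the dmask fix, not code from SIISelLowering.cpp. It assumes 32-bit channels (no D16 packing) and one result VGPR per enabled dmask bit, plus one VGPR for the TFE/LWE error value. `computeImageResult` and `ImageResult` are illustrative names:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an image instruction's result layout.
struct ImageResult {
  uint32_t dmask;    // channel-enable mask after adjustment
  unsigned numVGPRs; // data VGPRs plus the optional error VGPR
};

// Counts enabled bits in the low four (channel) bits of a dmask.
unsigned popcount4(uint32_t mask) {
  unsigned n = 0;
  for (int i = 0; i < 4; ++i)
    n += (mask >> i) & 1u;
  return n;
}

// With TFE/LWE, a dmask of 0 is forced to 1 so that the instruction still
// returns one data VGPR ahead of the error VGPR, as the hardware expects.
ImageResult computeImageResult(uint32_t dmask, bool tfeOrLwe) {
  if (tfeOrLwe && (dmask & 0xF) == 0)
    dmask = 1;
  unsigned data = popcount4(dmask);
  return {dmask, data + (tfeOrLwe ? 1u : 0u)};
}
```

Under these assumptions, a TFE load with every data channel unused ends up with dmask=1 and two result VGPRs, the error value being in the second one.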
* [BasicBlockUtils] Generalize DeleteDeadBlock to deal with multiple dead blocks | Max Kazantsev | 2019-01-14 | 1 | -36/+47
  Utility function `DeleteDeadBlock` expects that all predecessors of a block being deleted are already deleted, with the exception of a single-block loop. This makes it hard to use for deleting a set of blocks that may contain cyclic dependencies. There is no order of invocations of this function that avoids dangling pointers to already deleted blocks.
  This patch introduces a generalized version of this function, `DeleteDeadBlocks`, that allows us to remove multiple blocks at once, even if there are cycles among them. The only requirement is that no block being deleted should have a predecessor that is not being deleted.
  The logic of `DeleteDeadBlocks` is as follows:
    for each block:
      create relevant DT updates;
      remove all instructions (replace with undef if needed);
      replace terminator with unreachable;
    apply DT updates;
    for each block:
      delete block;
  Therefore, `DeleteDeadBlock` becomes a particular case of the general algorithm called for a single block.
  Differential Revision: https://reviews.llvm.org/D56120
  Reviewed By: skatkov
  llvm-svn: 351045
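The batch-deletion steps above can be sketched on a toy CFG. `Block`, `ToyFunction` and `deleteDeadBlocks` are hypothetical stand-ins for LLVM's `BasicBlock` and `DeleteDeadBlocks`. Removing dead blocks from their successors' predecessor lists stands in for creating and applying DomTree updates. The key point is that no block is freed until every dead block has been detached, so cycles among dead blocks are harmless:

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Toy CFG node; the last instruction plays the role of the terminator.
struct Block {
  std::string name;
  std::vector<Block *> preds, succs;
  std::vector<std::string> insts;
};

struct ToyFunction {
  std::vector<std::unique_ptr<Block>> blocks; // owns all blocks
  Block *add(const std::string &name) {
    blocks.push_back(std::make_unique<Block>());
    blocks.back()->name = name;
    return blocks.back().get();
  }
};

void addEdge(Block *from, Block *to) {
  from->succs.push_back(to);
  to->preds.push_back(from);
}

// Deletes every block in `dead` at once. Precondition: no dead block has
// a predecessor outside `dead`. Cycles among dead blocks are allowed.
void deleteDeadBlocks(ToyFunction &f, const std::set<Block *> &dead) {
  for (Block *bb : dead)
    for (Block *pred : bb->preds)
      assert(dead.count(pred) && "dead block has a live predecessor");
  for (Block *bb : dead) {
    // Detach from successors (stand-in for the DT update step).
    for (Block *succ : bb->succs) {
      auto &p = succ->preds;
      p.erase(std::remove(p.begin(), p.end(), bb), p.end());
    }
    bb->succs.clear();
    // Drop the body, leaving only an unreachable terminator.
    bb->insts.assign(1, "unreachable");
  }
  // Only now free the blocks, so no surviving block points at freed memory.
  auto &bs = f.blocks;
  bs.erase(std::remove_if(bs.begin(), bs.end(),
                          [&](const std::unique_ptr<Block> &b) {
                            return dead.count(b.get()) != 0;
                          }),
           bs.end());
}
```

Freeing `a` first with a single-block routine would leave `b` holding a pointer to freed memory in its predecessor list. That is the ordering problem the commit message describes, and batching avoids it.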
* Give helper classes/functions local linkage. NFC. | Benjamin Kramer | 2019-01-12 | 4 | -1/+7
  llvm-svn: 351016
OpenPOWER on IntegriCloud