path: root/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
Commit message | Author | Age | Files | Lines
* [X86] Remove and autoupgrade the scalar fma intrinsics with masking.Craig Topper2018-07-121-37/+0
| | | | | | This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches. llvm-svn: 336871
* [X86] Remove X86 specific scalar FMA intrinsics and upgrade to target ↵Craig Topper2018-07-051-2/+0
| | | | | | independent FMA and extractelement/insertelement. llvm-svn: 336315
* Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 ↵Simon Pilgrim2018-06-251-1/+1
| | | | | | bits" MSVC warning (again). NFCI. llvm-svn: 335457
* Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 ↵Simon Pilgrim2018-06-251-1/+1
| | | | | | bits" MSVC warning. NFCI. llvm-svn: 335454
* AMDGPU: Remove old-style image intrinsicsNicolai Haehnle2018-06-211-51/+1
| | | | | | | | | | | | | | | | | | | | Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 llvm-svn: 335231
* InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemandedNicolai Haehnle2018-06-211-71/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Use the expanded features of the TableGen generic tables to avoid manually adding the combinatorially exploded set of intrinsics. The getAMDGPUImageDimIntrinsic lookup function is early-out, i.e. non-AMDGPU intrinsics will never look at the underlying table. Use a generic approach for getting the new intrinsic overload to keep the code simple, and make the image dmask handling more generic: - handle non-sampler image loads - handle the case where the set of demanded elements is not a prefix There is some overlap between this code and an optimization that happens in the backend during code generation. They currently complement each other: - only the codegen optimization can generate vec3 loads - only the InstCombine optimization can handle D16 The InstCombine optimization also likely covers more cases since the codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization in codegen once the infrastructure for vec3 is in place (which will probably take a long time). Modify the test cases to use dimension-aware intrinsics. This makes it easier to see that the test coverage for the new intrinsics is equivalent, and the old style intrinsics will be removed in a follow-up commit anyway. Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad Reviewers: arsenm, rampitec, majnemer Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48165 llvm-svn: 335230
* [X86] Lowering sqrt intrinsics to native IRTomasz Krupa2018-06-151-2/+0
| | | | | | | | | | | | | | Summary: Complementary patch to lowering sqrt intrinsics in Clang. Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, mike.dvoretsky, llvm-commits Differential Revision: https://reviews.llvm.org/D41599 llvm-svn: 334849
* [X86] Remove and autoupgrade a bunch of FMA intrinsics that are no longer ↵Craig Topper2018-05-111-6/+0
| | | | | | used by clang. llvm-svn: 332146
* [InstCombine] Only propagate known leading zeros from udiv input to output.Benjamin Kramer2018-05-101-2/+7
| | | | | | | | | Put in a conservatively correct estimate for now. Avoids miscompiling clang in FDO mode. This is really tricky to trigger in reality, as basically all interesting cases will be folded away by computeKnownBits earlier; I was unable to find a reasonably small test case. llvm-svn: 331975
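The conservative rule in the commit above rests on a simple fact: an unsigned quotient is never larger than its dividend, so it has at least as many leading zero bits. A minimal standalone sketch of that fact on 32-bit words follows. The helper names are hypothetical and this is not the InstCombine code itself.

```cpp
#include <cassert>
#include <cstdint>

// Number of zero bits above the highest set bit of a 32-bit value
// (32 when the value is zero). Hypothetical helper for this sketch.
unsigned countLeadingZeros32(std::uint32_t value) {
  unsigned count = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (value & bit) == 0; bit >>= 1)
    ++count;
  return count;
}

// True when the quotient keeps every leading zero of the dividend.
// Holds for any non-zero divisor because dividend / divisor <= dividend.
bool quotientKeepsLeadingZeros(std::uint32_t dividend, std::uint32_t divisor) {
  assert(divisor != 0 && "division by zero is undefined");
  return countLeadingZeros32(dividend / divisor) >= countLeadingZeros32(dividend);
}
```

Only the dividend's leading zeros are used here, which matches the "only propagate known leading zeros from udiv input to output" wording of the commit title.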
* [InstCombine] Teach SimplifyDemandedBits that udiv doesn't demand low ↵Benjamin Kramer2018-05-091-0/+16
| | | | | | | | | | | dividend bits that are zero in the divisor This is safe as long as the udiv is not exact. The pattern is not common in C++ code, but comes up all the time in code generated by XLA's GPU backend. Differential Revision: https://reviews.llvm.org/D46647 llvm-svn: 331933
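The claim in the commit above can be checked with plain integer arithmetic: if the divisor has k trailing zero bits, the low k bits of the dividend cannot affect a non-exact unsigned quotient. The sketch below is a standalone illustration under that assumption. The function names are hypothetical and it does not use any LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Mask of the bits that sit below the lowest set bit of the divisor.
// For divisor == 0b101000 this is 0b000111.
constexpr std::uint32_t lowZeroMask(std::uint32_t divisor) {
  return (divisor & (~divisor + 1u)) - 1u;
}

// Unsigned division after clearing the dividend bits that are not demanded.
// The result equals dividend / divisor for every non-zero divisor.
std::uint32_t udivWithMaskedDividend(std::uint32_t dividend, std::uint32_t divisor) {
  assert(divisor != 0 && "division by zero is undefined");
  return (dividend & ~lowZeroMask(divisor)) / divisor;
}
```

The sketch models an ordinary truncating division only. An `exact` udiv additionally promises a zero remainder, and clearing dividend bits could break that promise, which is why the commit excludes it.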
* [X86] Remove the pmuludq/pmuldq intrinsics and replace with native IR.Craig Topper2018-04-131-29/+0
| | | | | | | | This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics. We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well. llvm-svn: 329990
* Remove useless comment - seems to be a copy+paste typo. NFCISimon Pilgrim2018-02-161-1/+0
| | | | llvm-svn: 325385
* [InstCombine] fix demanded-bits propagation for zext/truncSanjay Patel2018-01-171-1/+1
| | | | | | | | | I was comparing the demanded-bits implementations between InstCombine and TargetLowering as part of investigating questions in D42088 and noticed that this was wrong in IR. We were losing all of the prior known bits when we got back to the 'zext'. llvm-svn: 322662
* [InstCombine] Fix SimplifyDemandedUseBits SHL handling (PR35515)Simon Pilgrim2017-12-091-6/+5
| | | | | | Don't assume that the pattern matched SRL can be cast to an Instruction (might be ConstExpr etc.) llvm-svn: 320270
* [InstCombine] improve demanded vector elements analysis of insertelementSanjay Patel2017-08-311-9/+10
| | | | | | | | | | | | | Recurse instead of returning on the first found optimization. Also, return early in the caller instead of continuing because that allows another round of simplification before we might potentially lose undef information from a shuffle mask by eliminating the shuffle. As noted in the review, we could probably do better and be more efficient by moving all of demanded elements into a separate pass, but this is yet another quick fix to instcombine. Differential Revision: https://reviews.llvm.org/D37236 llvm-svn: 312248
* [InstCombine] Call hasNoSignedWrap instead of hasNoUnsignedWrap to get the ↵Craig Topper2017-08-281-1/+1
| | | | | | | | | | NSW flag when handling Add in SimplifyDemandedUseBits. This is a typo from r311789. This should fix PR34349. llvm-svn: 311902
* [InstCombine] Don't fall back to only calling computeKnownBits if the upper ↵Craig Topper2017-08-251-23/+24
| | | | | | | | | | | | | | | | bit of Add/Sub is demanded. Just create an all 1s demanded mask and continue recursing like normal. The recursive calls should be able to handle an all 1s mask and do the right thing. The only time we should care about knowing whether the upper bit was demanded is when we need to know if we should clear the NSW/NUW flags. Now that we have a consistent path through the code for all cases, use KnownBits::computeForAddSub to compute the known bits at the end since we already have the LHS and RHS. My larger goal here is to move the code that turns add into xor if only 1 bit is demanded and no bits below it are non-zero from InstCombiner::OptAndOp to here. This will allow it to be more general instead of just looking for 'add' and 'and' with constant RHS. Differential Revision: https://reviews.llvm.org/D36486 llvm-svn: 311789
* [InstCombine] Consider more cases where SimplifyDemandedUseBits does not ↵Amjad Aboud2017-08-251-2/+5
| | | | | | | | convert AShr to LShr. There are cases where AShr has a better chance of being optimized than LShr, especially when the demanded bits are not known to be zero and are also known to be similar to the sign bit. Differential Revision: https://reviews.llvm.org/D36936 llvm-svn: 311773
* [InstCombine] Remove unnecessary temporary APInt. NFCICraig Topper2017-08-021-6/+1
| | | | llvm-svn: 309887
* [InstCombine] Remove explicit check for impossible condition. Replace with ↵Craig Topper2017-08-011-1/+2
| | | | | | | | | | | | | | | | | | | assert Summary: As far as I can tell, the earlier call to getLimitedValue guarantees that ShiftAmt is saturated to BitWidth-1, preventing it from ever being equal to or greater than BitWidth. At one point in the past the getLimitedValue call was only passed BitWidth, not BitWidth - 1. This would have allowed the equality case to get here. And in fact this check was initially added as just BitWidth == ShiftAmt, but was changed shortly after to include >, which should never have been possible. Reviewers: spatel, majnemer, davide Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36123 llvm-svn: 309690
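The reasoning in the commit above depends on saturation: once a value is clamped to BitWidth - 1, it can no longer reach BitWidth. A minimal sketch of that argument follows, assuming a saturating helper that behaves like a clamp to the given limit. `limitedValue` and `clampedShiftAmount` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Returns value, saturated to limit when it is larger.
constexpr std::uint64_t limitedValue(std::uint64_t value, std::uint64_t limit) {
  return value > limit ? limit : value;
}

// Shift amount clamped to bitWidth - 1. The assert documents the invariant
// that replaced the explicit check: the result is always below bitWidth.
unsigned clampedShiftAmount(std::uint64_t requested, unsigned bitWidth) {
  assert(bitWidth > 0 && "a zero-width type has no valid shift amount");
  const std::uint64_t amount = limitedValue(requested, bitWidth - 1);
  assert(amount < bitWidth && "saturation makes this unreachable");
  return static_cast<unsigned>(amount);
}
```

Clamping to bitWidth instead of bitWidth - 1 would let the equality case through, which is the historical situation the commit message describes.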
* [InstCombine] Move (0 - x) & 1 --> x & 1 to SimplifyDemandedUseBits.Craig Topper2017-07-161-2/+4
| | | | | | This removes a dedicated matcher and allows us to support more than just an AND masking the lower bit. llvm-svn: 308124
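The fold above works because negation never changes bit 0 of a two's complement value, so any user that demands only the low bit can read it from x directly. A small standalone check, with hypothetical function names and plain 32-bit arithmetic rather than LLVM IR:

```cpp
#include <cassert>
#include <cstdint>

// Low bit of the negated value, written as in the commit title: (0 - x) & 1.
constexpr std::uint32_t lowBitOfNegation(std::uint32_t x) {
  return (0u - x) & 1u;
}

// Low bit of the original value: x & 1.
constexpr std::uint32_t lowBit(std::uint32_t x) {
  return x & 1u;
}

// A user other than 'and' that also demands only bit 0: shifting left by 31
// discards every bit except the lowest one.
constexpr bool shiftUserAgrees(std::uint32_t x) {
  return ((0u - x) << 31) == (x << 31);
}
```

The last helper illustrates the "more than just an AND" remark: the demanded-bits view covers any consumer of bit 0, not only a mask with 1.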
* [InstCombine] Make InstCombine's IRBuilder be passed by reference everywhereCraig Topper2017-07-071-6/+6
| | | | | | Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference. This patch makes it a reference everywhere, including the InstCombiner class itself, so there is no more inconsistency. This is a large but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do. llvm-svn: 307451
* [Constants] If we already have a ConstantInt*, prefer to use ↵Craig Topper2017-07-061-1/+1
| | | | | | | | isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne. llvm-svn: 307292
* [InstCombine][InstSimplify] Use APInt::isNullValue/isOneValue to reduce ↵Craig Topper2017-06-071-4/+4
| | | | | | compiled code for comparing APInts with 0 and 1. NFC These methods are specifically optimized to only count leading zeros without an additional uint64_t compare. llvm-svn: 304876
* [InstCombine] Merge together the SimplifyDemandedUseBits implementations for ↵Craig Topper2017-05-241-21/+10
| | | | | | | | ZExt and Trunc. NFC While there, avoid resizing the DemandedMask twice. Make a copy into a separate variable instead. This potentially removes an allocation on large bit widths. With the use of the zextOrTrunc methods on APInt and KnownBits these can be made almost source identical. The only difference is the zeroing of the upper bits for ZExt. This is similar to how it's done in computeKnownBits in ValueTracking. llvm-svn: 303791
* [InstCombine] Use less bitwise operations to handle Instruction::SExt in ↵Craig Topper2017-05-241-19/+14
| | | | | | | | | | | | SimplifyDemandedUseBits. Other improvements. The current code created a NewBits mask and used it as a mask several times. One of them just before a call to trunc making it unnecessary. A call to getActiveBits can get us the same information for the case. We also ORed with this mask later when we should have just sign extended the known bits. We also called trunc on the guaranteed to be zero KnownZeros/Ones masks entering this code. Creating appropriately sized temporary APInts is probably better. Differential Revision: https://reviews.llvm.org/D32098 llvm-svn: 303779
* [KnownBits] Use !hasConflict() in asserts in place of Zero & One == 0 or ↵Craig Topper2017-05-231-16/+16
| | | | | | similar. NFC llvm-svn: 303614
* [KnownBits] Add bit counting methods to KnownBits struct and use them where ↵Craig Topper2017-05-121-1/+1
| | | | | | | | | | | | possible This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 llvm-svn: 302925
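The min/max distinction described above can be sketched with two 32-bit masks: "min" answers use only the bits that are known, while "max" answers assume every unknown bit takes the value that maximizes the count. The struct and helpers below are a simplified stand-in written for this illustration, not the LLVM KnownBits class.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in: Zero marks bits known to be 0, One marks bits known to be 1.
struct KnownBits32 {
  std::uint32_t Zero;
  std::uint32_t One;
};

// Number of consecutive set bits starting at the most significant bit.
unsigned countLeadingOnes32(std::uint32_t value) {
  unsigned count = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (value & bit) != 0; bit >>= 1)
    ++count;
  return count;
}

// Number of set bits.
unsigned popCount32(std::uint32_t value) {
  unsigned count = 0;
  for (; value != 0; value &= value - 1)
    ++count;
  return count;
}

// Leading zeros guaranteed by the known-zero bits alone.
unsigned countMinLeadingZeros(const KnownBits32 &known) {
  return countLeadingOnes32(known.Zero);
}

// Leading zeros if every bit above the highest known one turns out to be zero.
unsigned countMaxLeadingZeros(const KnownBits32 &known) {
  return countLeadingOnes32(~known.One);
}

// Population count if every unknown bit is zero.
unsigned countMinPopulation(const KnownBits32 &known) {
  return popCount32(known.One);
}

// Population count if every unknown bit is one.
unsigned countMaxPopulation(const KnownBits32 &known) {
  return 32u - popCount32(known.Zero);
}
```

For example, with the top 8 bits known zero and bit 0 known one, the value has between 8 and 31 leading zeros and between 1 and 24 set bits.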
* [KnownBits] Add wrapper methods for setting and clearing all bits in the ↵Craig Topper2017-05-051-2/+1
| | | | | | | | underlying APInts in KnownBits. This adds routines for resetting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown. Differential Revision: https://reviews.llvm.org/D32637 llvm-svn: 302262
* [KnownBits] Add zext, sext, and trunc methods to KnownBitsCraig Topper2017-05-031-12/+6
| | | | | | | | This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible. Differential Revision: https://reviews.llvm.org/D32784 llvm-svn: 302088
* [KnownBits] Add methods for determining if the known bits represent a ↵Craig Topper2017-04-291-4/+4
| | | | | | | | | | | | | | | | negative/nonnegative number and add methods for changing the negative/nonnegative state Summary: This patch adds isNegative, isNonNegative for querying whether the sign bit is known. It also adds makeNegative and makeNonNegative for controlling the sign bit. Reviewers: RKSimon, spatel, davide Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32651 llvm-svn: 301747
* [APInt] Use inplace shift methods where possible. NFCICraig Topper2017-04-281-1/+1
| | | | llvm-svn: 301612
* [ValueTracking] Introduce a KnownBits struct to wrap the two APInts for ↵Craig Topper2017-04-261-205/+181
| | | | | | | | | | | | | | computeKnownBits This patch introduces a new KnownBits struct that wraps the two APInts used by computeKnownBits. This allows us to treat them as more of a unit. Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do something similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch. I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. One place default-constructed the APInts, so I added a default constructor for those cases. Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with. Differential Revision: https://reviews.llvm.org/D32376 llvm-svn: 301432
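The idea of treating the two masks as one unit, together with the helper queries the commit message proposes, can be sketched on a fixed 32-bit width. The struct below is an illustration only: it uses plain integers instead of APInt, and the member names are chosen for this sketch rather than taken from the LLVM header.

```cpp
#include <cassert>
#include <cstdint>

// Pair of masks describing what is known about a 32-bit value.
struct KnownBitsSketch {
  std::uint32_t Zero; // bits known to be 0
  std::uint32_t One;  // bits known to be 1

  // A bit cannot be known zero and known one at the same time.
  bool hasConflict() const { return (Zero & One) != 0; }

  // Nothing is known about any bit.
  bool isUnknown() const { return Zero == 0 && One == 0; }

  // (Zero | One) is all ones, so every bit is known.
  bool allBitsKnown() const { return (Zero | One) == 0xFFFFFFFFu; }

  // Sign bit queries.
  bool isNegative() const { return (One & 0x80000000u) != 0; }
  bool isNonNegative() const { return (Zero & 0x80000000u) != 0; }
};
```

Passing one such object instead of two separate masks is the signature change the commit describes. The queries replace expressions like `(Zero & One) == 0` that would otherwise be written out at each call site.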
* [APInt] Use isSubsetOf, intersects, and bit counting methods to reduce ↵Craig Topper2017-04-251-2/+2
| | | | | | | | | | | | | | temporary APInts This patch uses various APInt methods to reduce temporary APInt creation. This should be all of the unrelated cleanups that got buried in D32376(creating a KnownBits struct) as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking. I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time. Differential Revision: https://reviews.llvm.org/D32495 llvm-svn: 301338
* [InstCombine] Remove superfluous curly braces around a single line if body. NFCCraig Topper2017-04-251-2/+1
| | | | llvm-svn: 301326
* Revert "[APInt] Fix a few places that use APInt::getRawData to operate ↵Renato Golin2017-04-231-1/+1
| | | | | | | | | | | | | | | | within the normal API." This reverts commit r301105, 4, 3 and 1, as a follow up of the previous revert, which broke even more bots. For reference: Revert "[APInt] Use operator<<= where possible. NFC" Revert "[APInt] Use operator<<= instead of shl where possible. NFC" Revert "[APInt] Use ashInPlace where possible." PR32754. llvm-svn: 301111
* [APInt] Use operator<<= instead of shl where possible. NFCCraig Topper2017-04-231-1/+1
| | | | llvm-svn: 301103
* [InstCombine] revert r300977 and r301021Sanjay Patel2017-04-211-14/+4
| | | | | | This can cause an inf-loop. Investigating... llvm-svn: 301035
* [InstCombine] use isSubsetOf() for efficiencySanjay Patel2017-04-211-1/+1
| | | | | | | | C | ~D == -1 is equivalent to ~(C | ~D) == 0, which is ~C & D == 0, which is D & ~C == 0, which is D.isSubsetOf(C). llvm-svn: 301021
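The chain of equivalences in the commit above can be verified on plain 64-bit words. The sketch below is a minimal check assuming a free function `isSubsetOf(d, c)` that mirrors the meaning of the method call in the commit. Both function names here are written for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// True when every bit set in d is also set in c: D & ~C == 0.
constexpr bool isSubsetOf(std::uint64_t d, std::uint64_t c) {
  return (d & ~c) == 0;
}

// The starting form of the derivation: C | ~D == -1 (all ones).
constexpr bool orWithComplementIsAllOnes(std::uint64_t c, std::uint64_t d) {
  return (c | ~d) == ~std::uint64_t{0};
}

// Both forms agree for a given pair of masks.
constexpr bool formsAgree(std::uint64_t c, std::uint64_t d) {
  return isSubsetOf(d, c) == orWithComplementIsAllOnes(c, d);
}
```

The subset form needs one AND with a complement and a compare against zero, whereas the original form builds an intermediate all-ones comparison value. That is the efficiency point of the commit title.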
* [InstCombine] prefer xor with -1 because 'not' is easier to understand (PR32706)Sanjay Patel2017-04-211-4/+14
| | | | | | | | | This matches the demanded bits behavior in the DAG and should fix: https://bugs.llvm.org/show_bug.cgi?id=32706 Differential Revision: https://reviews.llvm.org/D32255 llvm-svn: 300977
* [InstCombine] Remove the zextOrTrunc from ShrinkDemandedConstant.Craig Topper2017-04-201-4/+2
| | | | | | The demanded mask and the constant should always be the same width for all callers today. Also stop copying the demanded mask as it's passed in. We should avoid allocating memory unless we are going to do something. The final AND to create the new constant will take care of it. llvm-svn: 300927
* [InstCombine] function names start with lower-case letter; NFCSanjay Patel2017-04-201-2/+2
| | | | | | Forgot to make this fix with the signature change in r300911. llvm-svn: 300912
* [InstCombine] allow shl+shr demanded bits folds with splat constantsSanjay Patel2017-04-201-19/+13
| | | | llvm-svn: 300911
* [InstCombine] allow shl demanded bits folds with splat constantsSanjay Patel2017-04-201-2/+4
| | | | | | More fixes are needed to enable the helper SimplifyShrShlDemandedBits(). llvm-svn: 300898
* [InstCombine] Use APInt::intersects and APInt::isSubsetOf to improve a few ↵Craig Topper2017-04-201-4/+4
| | | | | | more places in SimplifyDemandedBits. llvm-svn: 300896
* [InstCombine] allow ashr/lshr demanded bits folds with splat constantsSanjay Patel2017-04-201-11/+14
| | | | llvm-svn: 300888
* [InstCombine] Use APInt::isSubsetOf to simplify some code in ↵Craig Topper2017-04-201-37/+27
| | | | | | SimplifyDemandedBits. NFC This allows us to use fewer temporary APInts for And and Invert operations. llvm-svn: 300885
* [InstCombine] Remove redundant code from SimplifyDemandedBits handling for ↵Craig Topper2017-04-201-18/+0
| | | | | | Or. The code above it is equivalent if you work through the bitwise math. llvm-svn: 300876
* [APInt] Rename getSignBit to getSignMaskCraig Topper2017-04-201-5/+5
| | | | | | | | getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask. Differential Revision: https://reviews.llvm.org/D32108 llvm-svn: 300856
* [APInt] Add isSubsetOf method that can check if one APInt is a subset of ↵Craig Topper2017-04-201-1/+1
| | | | | | | | | | | | | | another without creating temporary APInts This question comes up in many places in SimplifyDemandedBits. This makes it easy to ask without allocating additional temporary APInts. The BitVector class provides a similar functionality through its (IMHO badly named) test(const BitVector&) method. Though its output polarity is reversed. I've provided one example use case in this patch. I plan to do more as a follow up. Differential Revision: https://reviews.llvm.org/D32258 llvm-svn: 300851