path: root/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
* [InstCombine][AMDGPU] Handle more buffer intrinsics (Piotr Sobczak, 2018-12-20, 1 file, -0/+4)

    Summary:
    Include the following intrinsics in the InstCombine simplification:

    * amdgcn_raw_buffer_load
    * amdgcn_raw_buffer_load_format
    * amdgcn_struct_buffer_load
    * amdgcn_struct_buffer_load_format

    Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d

    Reviewers: arsenm

    Reviewed By: arsenm

    Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

    Differential Revision: https://reviews.llvm.org/D55882

    llvm-svn: 349735
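    An illustrative sketch of the kind of demanded-elements shrink this
    enables (hypothetical IR, not taken from the commit):

        %v = call <4 x float> @llvm.amdgcn.raw.buffer.load.v4f32(<4 x i32> %rsrc, i32 %off, i32 0, i32 0)
        %e = extractelement <4 x float> %v, i32 0
        ; only lane 0 is demanded, so the wide load can shrink to:
        %e = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 %off, i32 0, i32 0)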
* Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" (David Stuttard, 2018-11-29, 1 file, -16/+2)

    Also revert fix r347876.

    One of the buildbots was reporting a failure in some relevant tests that I
    can't repro or explain at present, so reverting until I can isolate.

    llvm-svn: 347911
* Add support for TFE/LWE in image intrinsics (David Stuttard, 2018-11-29, 1 file, -2/+16)

    TFE and LWE support requires extra result registers that are written in
    the event of a failure in order to detect that failure case. The specific
    use-case that initiated these changes is sparse texture support.

    This means that if image intrinsics are used with either option turned on,
    the programmer must ensure that the return type can contain all of the
    expected results. This can result in redundant registers since the vector
    size must be a power-of-2.

    This change takes roughly 6 parts:

    1. Modify the instruction defs in tablegen to add new instruction variants
       that can accommodate the extra return values.
    2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE
       or LWE (where the bulk of the work for these instruction types is now
       done).
    3. Extra verification code to catch cases where intrinsics have been used
       but insufficient return registers are used.
    4. Modification to the adjustWritemask optimisation to account for TFE/LWE
       being enabled (requires extra registers to be maintained for the error
       return value).
    5. An extra pass to zero initialize the error value return - this is
       because if the error does not occur, the register is not written and
       thus must be zeroed before use. Also added a new (on by default) option
       to ensure ALL return values are zero-initialized, which is required for
       sparse texture support.
    6. Disable the inst_combine optimization in the presence of tfe/lwe (later
       TODO for this to re-enable and handle correctly).

    There's an additional fix now to avoid a dmask=0: for an image intrinsic
    with tfe where all result channels except tfe were unused, I was getting
    an image instruction with dmask=0 and only a single vgpr result for tfe.
    That is incorrect because the hardware assumes there is at least one vgpr
    result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the
    desired two vgpr result with tfe in the second one.

    The TFE or LWE result is returned from the intrinsics using an aggregate
    type. Look in the test code provided to see how this works, but in essence
    IR code to invoke the intrinsic looks as follows:

        %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
        %v.vec = extractvalue {<4 x float>, i32} %v, 0
        %v.err = extractvalue {<4 x float>, i32} %v, 1

    Differential revision: https://reviews.llvm.org/D48826

    Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda

    llvm-svn: 347871
* [InstCombine] Determine demanded and known bits for funnel shifts (Nikita Popov, 2018-11-24, 1 file, -0/+24)

    Support funnel shifts in InstCombine demanded bits simplification. If the
    shift amount is constant, we can determine both the demanded bits of the
    operands, as well as the known bits of the result.

    If one of the operands has no demanded bits, it will be replaced by undef
    and the funnel shift will be simplified into a simple shift due to the
    simplifications added in D54778.

    Differential Revision: https://reviews.llvm.org/D54869

    llvm-svn: 347515
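    A sketch of that simplification (hypothetical IR, not from the commit):

        %f = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 8)
        %r = and i32 %f, -256              ; only bits 8..31 are demanded
        ; bits 8..31 of fshl(%x, %y, 8) come only from %x, so %y has no
        ; demanded bits and the funnel shift reduces to a plain shift:
        %s = shl i32 %x, 8
        %r = and i32 %s, -256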
* [InstCombine] Demand bits of UMin (David Green, 2018-10-11, 1 file, -0/+10)

    This is the umin alternative to the umax code from rL344237. We use De
    Morgan's law on the umax case to bring us to the same thing on umin, but
    using countLeadingOnes, not countLeadingZeros.

    Differential Revision: https://reviews.llvm.org/D53036

    llvm-svn: 344239
* [InstCombine] Demand bits of UMax (David Green, 2018-10-11, 1 file, -4/+16)

    Use the demanded bits of umax(A, C) to prove we can just use A, so long as
    the lowest non-zero bit of the DemandedMask is higher than the highest
    non-zero bit of C.

    Differential Revision: https://reviews.llvm.org/D53033

    llvm-svn: 344237
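    A sketch of the idea (hypothetical IR; umax is written as icmp+select,
    the form InstCombine matched at the time):

        %c = icmp ugt i32 %a, 3
        %m = select i1 %c, i32 %a, i32 3   ; umax(%a, 3)
        %r = and i32 %m, -4                ; bits 0..1 are not demanded
        ; the constant 3 only affects bits below the lowest demanded bit,
        ; so umax(%a, 3) can be replaced by %a:
        %r = and i32 %a, -4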
* [InstCombine] drop poison flags in SimplifyVectorDemandedElts (Sanjay Patel, 2018-10-04, 1 file, -2/+5)

    We established the (unfortunately complicated) rules for UB/poison
    propagation with vector ops in: D48893, D48987, D49047.

    It's clear from the affected tests that we are potentially creating poison
    where none existed before the transforms. For add/sub/mul, the answer is
    simple: just drop the flags because the extra undef vector lanes are
    generally more valuable for analysis and codegen.

    llvm-svn: 343819
* [InstCombine] reduce code duplication in SimplifyDemandedVectorElts; NFCI (Sanjay Patel, 2018-10-04, 1 file, -91/+42)

    llvm-svn: 343806
* [InstCombine] allow SimplifyDemandedVectorElts to work with FP binops (Sanjay Patel, 2018-10-03, 1 file, -18/+20)

    We're a long way from D50992 and D51553, but this is where we have to
    start.

    We weren't back-propagating undefs into binop constant values for anything
    but add/sub/mul/and/or/xor.

    This is likely because we have to be careful about not introducing
    UB/poison with div/rem/shift. But I suspect we already are getting the
    poison part wrong for add/sub/mul (although it may not be possible to
    expose the bug currently because we use SimplifyDemandedVectorElts from a
    limited set of opcodes). See the discussion/implementation from D48987 and
    D49047.

    This patch just enables functionality for FP ops because those do not have
    UB/poison potential.

    llvm-svn: 343727
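    A sketch of the back-propagation this enables for FP ops (hypothetical
    IR, not from the commit):

        %f = fmul <4 x float> %x, <float 1.0, float 2.0, float 3.0, float 4.0>
        %s = shufflevector <4 x float> %f, <4 x float> undef, <2 x i32> <i32 0, i32 1>
        ; only lanes 0 and 1 are demanded, so undef is back-propagated into
        ; the unused lanes of the constant:
        %f = fmul <4 x float> %x, <float 1.0, float 2.0, float undef, float undef>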
* [InstCombine] enhance vector demanded elements to look at a vector select condition operand (Sanjay Patel, 2018-09-11, 1 file, -2/+21)

    I noticed that we were not back-propagating undef lanes to shuffle masks
    when we have a shuffle that reduces the vector width. This is part of
    investigating/solving PR38691:
    https://bugs.llvm.org/show_bug.cgi?id=38691

    The DAG equivalent was proposed with: D51696

    Differential Revision: https://reviews.llvm.org/D51433

    llvm-svn: 341981
* [InstCombine] use SelectInst operand names to make code clearer; NFC (Sanjay Patel, 2018-09-10, 1 file, -8/+10)

    Cleanup step for D51433.

    llvm-svn: 341850
* [InstCombine] fix formatting in SimplifyDemandedVectorElts->Select; NFCI (Sanjay Patel, 2018-09-06, 1 file, -12/+16)

    I'm preparing to add the same functionality both here and to the DAG
    version of this code in D51696 / D51433, so try to make those cases as
    similar as possible to avoid bugs.

    llvm-svn: 341545
* [X86] Remove and autoupgrade the scalar fma intrinsics with masking. (Craig Topper, 2018-07-12, 1 file, -37/+0)

    This converts them to what clang is now using for codegen. Unfortunately,
    there seem to be a few kinks to work out still. I'll try to address with
    follow up patches.

    llvm-svn: 336871
* [X86] Remove X86 specific scalar FMA intrinsics and upgrade to target independent FMA and extractelement/insertelement. (Craig Topper, 2018-07-05, 1 file, -2/+0)

    llvm-svn: 336315
* Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning (again). NFCI. (Simon Pilgrim, 2018-06-25, 1 file, -1/+1)

    llvm-svn: 335457
* Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning. NFCI. (Simon Pilgrim, 2018-06-25, 1 file, -1/+1)

    llvm-svn: 335454
* AMDGPU: Remove old-style image intrinsics (Nicolai Haehnle, 2018-06-21, 1 file, -51/+1)

    Summary:
    This also removes the need for atomic pseudo instructions, since we select
    the correct encoding directly in SITargetLowering::lowerImage for
    dimension-aware image intrinsics.

    Mesa uses dimension-aware image intrinsics since commit a9a7993441.

    Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a

    Reviewers: arsenm, rampitec, mareko, tpr, b-sumner

    Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

    Differential Revision: https://reviews.llvm.org/D48167

    llvm-svn: 335231
* InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded (Nicolai Haehnle, 2018-06-21, 1 file, -71/+122)

    Summary:
    Use the expanded features of the TableGen generic tables to avoid manually
    adding the combinatorially exploded set of intrinsics. The
    getAMDGPUImageDimIntrinsic lookup function is early-out, i.e. non-AMDGPU
    intrinsics will never look at the underlying table.

    Use a generic approach for getting the new intrinsic overload to keep the
    code simple, and make the image dmask handling more generic:
    - handle non-sampler image loads
    - handle the case where the set of demanded elements is not a prefix

    There is some overlap between this code and an optimization that happens
    in the backend during code generation. They currently complement each
    other:
    - only the codegen optimization can generate vec3 loads
    - only the InstCombine optimization can handle D16

    The InstCombine optimization also likely covers more cases since the
    codegen optimization is fairly ad-hoc. Ideally, we'll remove the
    optimization in codegen once the infrastructure for vec3 is in place
    (which will probably take a long time).

    Modify the test cases to use dimension-aware intrinsics. This makes it
    easier to see that the test coverage for the new intrinsics is equivalent,
    and the old style intrinsics will be removed in a follow-up commit anyway.

    Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad

    Reviewers: arsenm, rampitec, majnemer

    Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits

    Differential Revision: https://reviews.llvm.org/D48165

    llvm-svn: 335230
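    A sketch of the dmask handling described above (hypothetical IR, not from
    the commit):

        %v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i32(i32 15, i32 %s, i32 %t, <8 x i32> %rsrc, i32 0, i32 0)
        %e = extractelement <4 x float> %v, i32 0
        ; only the x channel is demanded, so dmask shrinks from 15 to 1 and
        ; the result narrows to a scalar:
        %e = call float @llvm.amdgcn.image.load.2d.f32.i32(i32 1, i32 %s, i32 %t, <8 x i32> %rsrc, i32 0, i32 0)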
* [X86] Lowering sqrt intrinsics to native IR (Tomasz Krupa, 2018-06-15, 1 file, -2/+0)

    Summary: Complementary patch to lowering sqrt intrinsics in Clang.

    Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k

    Reviewed By: craig.topper

    Subscribers: tkrupa, mike.dvoretsky, llvm-commits

    Differential Revision: https://reviews.llvm.org/D41599

    llvm-svn: 334849
* [X86] Remove and autoupgrade a bunch of FMA intrinsics that are no longer used by clang. (Craig Topper, 2018-05-11, 1 file, -6/+0)

    llvm-svn: 332146
* [InstCombine] Only propagate known leading zeros from udiv input to output. (Benjamin Kramer, 2018-05-10, 1 file, -2/+7)

    Put in a conservatively correct estimate for now. Avoids miscompiling
    clang in FDO mode. This is really tricky to trigger in reality as
    basically all interesting cases will be folded away by computeKnownBits
    earlier; I was unable to find a reasonably small test case.

    llvm-svn: 331975
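    A sketch of the safe direction of this propagation (hypothetical IR, not
    from the commit):

        %x = and i32 %a, 65535        ; top 16 bits of %x are known zero
        %d = udiv i32 %x, %y
        ; an unsigned quotient never exceeds its dividend, so %d also has at
        ; least 16 leading zero bits and the mask below is a no-op:
        %r = and i32 %d, 65535        ; simplifies to %d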
* [InstCombine] Teach SimplifyDemandedBits that udiv doesn't demand low dividend bits that are zero in the divisor (Benjamin Kramer, 2018-05-09, 1 file, -0/+16)

    This is safe as long as the udiv is not exact. The pattern is not common
    in C++ code, but comes up all the time in code generated by XLA's GPU
    backend.

    Differential Revision: https://reviews.llvm.org/D46647

    llvm-svn: 331933
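    A sketch of the transform (hypothetical IR, not from the commit; it only
    holds when the udiv is not marked exact):

        %y = or i32 %x, 4            ; may set bit 2 of the dividend
        %d = udiv i32 %y, 24         ; 24 has its low 3 bits zero
        ; x/24 == (x/8)/3, and x/8 discards bits 0..2, so bit 2 of the
        ; dividend is not demanded and the 'or' can be dropped:
        %d = udiv i32 %x, 24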
* [X86] Remove the pmuldq/pmuludq intrinsics and replace with native IR. (Craig Topper, 2018-04-13, 1 file, -29/+0)

    This completes the work started in r329604 and r329605 when we changed
    clang to no longer use the intrinsics.

    We lost some InstCombine SimplifyDemandedBit optimizations through this
    change as we aren't able to fold 'and', bitcast, shuffle very well.

    llvm-svn: 329990
* Remove useless comment - seems to be a copy+paste typo. NFCI (Simon Pilgrim, 2018-02-16, 1 file, -1/+0)

    llvm-svn: 325385
* [InstCombine] fix demanded-bits propagation for zext/trunc (Sanjay Patel, 2018-01-17, 1 file, -1/+1)

    I was comparing the demanded-bits implementations between InstCombine and
    TargetLowering as part of investigating questions in D42088 and noticed
    that this was wrong in IR. We were losing all of the prior known bits when
    we got back to the 'zext'.

    llvm-svn: 322662
* [InstCombine] Fix SimplifyDemandedUseBits SHL handling (PR35515) (Simon Pilgrim, 2017-12-09, 1 file, -6/+5)

    Don't assume that the pattern-matched SRL can be cast to an Instruction
    (it might be a ConstantExpr, etc.).

    llvm-svn: 320270
* [InstCombine] improve demanded vector elements analysis of insertelement (Sanjay Patel, 2017-08-31, 1 file, -9/+10)

    Recurse instead of returning on the first found optimization. Also, return
    early in the caller instead of continuing, because that allows another
    round of simplification before we might potentially lose undef information
    from a shuffle mask by eliminating the shuffle.

    As noted in the review, we could probably do better and be more efficient
    by moving all of the demanded-elements work into a separate pass, but this
    is yet another quick fix to instcombine.

    Differential Revision: https://reviews.llvm.org/D37236

    llvm-svn: 312248
* [InstCombine] Call hasNoSignedWrap instead of hasNoUnsignedWrap to get the NSW flag when handling Add in SimplifyDemandedUseBits. (Craig Topper, 2017-08-28, 1 file, -1/+1)

    This is a typo from r311789.

    This should fix PR34349.

    llvm-svn: 311902
* [InstCombine] Don't fall back to only calling computeKnownBits if the upper bit of Add/Sub is demanded. (Craig Topper, 2017-08-25, 1 file, -23/+24)

    Just create an all-1s demanded mask and continue recursing like normal.
    The recursive calls should be able to handle an all-1s mask and do the
    right thing.

    The only time we should care about knowing whether the upper bit was
    demanded is when we need to know if we should clear the NSW/NUW flags.

    Now that we have a consistent path through the code for all cases, use
    KnownBits::computeForAddSub to compute the known bits at the end since we
    already have the LHS and RHS.

    My larger goal here is to move the code that turns add into xor if only 1
    bit is demanded and no bits below it are non-zero from
    InstCombiner::OptAndOp to here. This will allow it to be more general
    instead of just looking for 'add' and 'and' with constant RHS.

    Differential Revision: https://reviews.llvm.org/D36486

    llvm-svn: 311789
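    A sketch of the add-to-xor fold mentioned above (hypothetical IR, not
    from the commit):

        %a = add i32 %x, 4
        %r = and i32 %a, 4           ; only bit 2 is demanded
        ; the constant 4 has no set bits below bit 2, so no carry can reach
        ; the demanded bit and the add behaves like xor there:
        %a = xor i32 %x, 4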
* [InstCombine] Consider more cases where SimplifyDemandedUseBits does not convert AShr to LShr. (Amjad Aboud, 2017-08-25, 1 file, -2/+5)

    There are cases where AShr has a better chance of being optimized than
    LShr, especially when the demanded bits are not known to be zero and are
    also known to match the sign bit.

    Differential Revision: https://reviews.llvm.org/D36936

    llvm-svn: 311773
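    For context, a sketch of the baseline AShr-to-LShr conversion that this
    patch makes more selective (hypothetical IR, not from the commit):

        %s = ashr i32 %x, 24
        %r = and i32 %s, 255         ; the sign-extended bits are not demanded
        ; only original bits 24..31 of %x are used, so lshr is equivalent:
        %s = lshr i32 %x, 24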
* [InstCombine] Remove unnecessary temporary APInt. NFCI (Craig Topper, 2017-08-02, 1 file, -6/+1)

    llvm-svn: 309887
* [InstCombine] Remove explicit check for impossible condition. Replace with assert. (Craig Topper, 2017-08-01, 1 file, -1/+2)

    Summary:
    As far as I can tell, the earlier call to getLimitedValue guarantees that
    ShiftAmt is saturated to BitWidth-1, preventing it from ever being equal
    to or greater than BitWidth.

    At one point in the past the getLimitedValue call was only passed
    BitWidth, not BitWidth - 1. This would have allowed the equality case to
    get here. And in fact this check was initially added as just
    BitWidth == ShiftAmt, but was changed shortly after to include >, which
    should have never been possible.

    Reviewers: spatel, majnemer, davide

    Reviewed By: davide

    Subscribers: llvm-commits

    Differential Revision: https://reviews.llvm.org/D36123

    llvm-svn: 309690
* [InstCombine] Move (0 - x) & 1 --> x & 1 to SimplifyDemandedUseBits. (Craig Topper, 2017-07-16, 1 file, -2/+4)

    This removes a dedicated matcher and allows us to support more than just
    an AND masking the lower bit.

    llvm-svn: 308124
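    A sketch of the fold (hypothetical IR, not from the commit):

        %n = sub i32 0, %x
        %r = and i32 %n, 1
        ; the low bit of a negation equals the low bit of the input
        ; (-%x == ~%x + 1), so this becomes:
        %r = and i32 %x, 1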
* [InstCombine] Make InstCombine's IRBuilder be passed by reference everywhere (Craig Topper, 2017-07-07, 1 file, -6/+6)

    Previously the InstCombiner class contained a pointer to an IR builder
    that had been passed to the constructor. Sometimes this would be passed to
    helper functions as either a pointer or the pointer would be dereferenced
    to be passed by reference.

    This patch makes it a reference everywhere, including the InstCombiner
    class itself, so there is more consistency. This is a large but mechanical
    patch. I've done very minimal formatting changes on it despite what
    clang-format wanted to do.

    llvm-svn: 307451
* [Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI (Craig Topper, 2017-07-06, 1 file, -1/+1)

    Going through the Constant methods requires redetermining that the
    Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.

    llvm-svn: 307292
* [InstCombine][InstSimplify] Use APInt::isNullValue/isOneValue to reduce compiled code for comparing APInts with 0 and 1. NFC (Craig Topper, 2017-06-07, 1 file, -4/+4)

    These methods are specifically optimized to only count leading zeros
    without an additional uint64_t compare.

    llvm-svn: 304876
* [InstCombine] Merge together the SimplifyDemandedUseBits implementations for ZExt and Trunc. NFC (Craig Topper, 2017-05-24, 1 file, -21/+10)

    While there, avoid resizing the DemandedMask twice. Make a copy into a
    separate variable instead. This potentially removes an allocation on large
    bit widths.

    With the use of the zextOrTrunc methods on APInt and KnownBits these can
    be made almost source identical. The only difference is the zeroing of the
    upper bits for ZExt. This is similar to how it's done in computeKnownBits
    in ValueTracking.

    llvm-svn: 303791
* [InstCombine] Use less bitwise operations to handle Instruction::SExt in SimplifyDemandedUseBits. Other improvements. (Craig Topper, 2017-05-24, 1 file, -19/+14)

    The current code created a NewBits mask and used it as a mask several
    times. One of them was just before a call to trunc, making it unnecessary.
    A call to getActiveBits can get us the same information for that case.

    We also ORed with this mask later when we should have just sign extended
    the known bits.

    We also called trunc on the guaranteed-to-be-zero KnownZeros/Ones masks
    entering this code. Creating appropriately sized temporary APInts is
    probably better.

    Differential Revision: https://reviews.llvm.org/D32098

    llvm-svn: 303779
* [KnownBits] Use !hasConflict() in asserts in place of Zero & One == 0 or similar. NFC (Craig Topper, 2017-05-23, 1 file, -16/+16)

    llvm-svn: 303614
* [KnownBits] Add bit counting methods to KnownBits struct and use them where possible (Craig Topper, 2017-05-12, 1 file, -1/+1)

    This patch adds min/max population count and leading/trailing zero/one bit
    counting methods.

    The min methods return answers based on bits that are known without
    considering unknown bits. The max methods give answers taking into account
    the largest count that unknown bits could give.

    Differential Revision: https://reviews.llvm.org/D32931

    llvm-svn: 302925
* [KnownBits] Add wrapper methods for setting and clearing all bits in the underlying APInts in KnownBits. (Craig Topper, 2017-05-05, 1 file, -2/+1)

    This adds routines for resetting KnownBits to unknown, making the value
    all zeros or all ones. It also adds methods for querying if the value is
    zero, all ones or unknown.

    Differential Revision: https://reviews.llvm.org/D32637

    llvm-svn: 302262
* [KnownBits] Add zext, sext, and trunc methods to KnownBits (Craig Topper, 2017-05-03, 1 file, -12/+6)

    This patch adds zext, sext, and trunc methods to KnownBits and uses them
    where possible.

    Differential Revision: https://reviews.llvm.org/D32784

    llvm-svn: 302088
* [KnownBits] Add methods for determining if the known bits represent a negative/nonnegative number and add methods for changing the negative/nonnegative state (Craig Topper, 2017-04-29, 1 file, -4/+4)

    Summary:
    This patch adds isNegative and isNonNegative for querying whether the sign
    bit is known. It also adds makeNegative and makeNonNegative for
    controlling the sign bit.

    Reviewers: RKSimon, spatel, davide

    Reviewed By: RKSimon

    Subscribers: llvm-commits

    Differential Revision: https://reviews.llvm.org/D32651

    llvm-svn: 301747
* [APInt] Use inplace shift methods where possible. NFCI (Craig Topper, 2017-04-28, 1 file, -1/+1)

    llvm-svn: 301612
* [ValueTracking] Introduce a KnownBits struct to wrap the two APInts for computeKnownBits (Craig Topper, 2017-04-26, 1 file, -205/+181)

    This patch introduces a new KnownBits struct that wraps the two APInts
    used by computeKnownBits. This allows us to treat them as more of a unit.

    Initially I've just altered the signatures of computeKnownBits and
    InstCombine's simplifyDemandedBits to pass a KnownBits reference instead
    of two separate APInt references. I'll do similar to the SelectionDAG
    version of computeKnownBits/simplifyDemandedBits as a separate patch.

    I've added a constructor that allows initializing both APInts to the same
    bit width with a starting value of 0. This reduces the repeated pattern of
    initializing both APInts. One place default constructed the APInts, so I
    added a default constructor for those cases.

    Going forward I would like to add more methods that will work on the
    pairs. For example trunc, zext, and sext occur on both APInts together in
    several places. We should probably add a clear method that can be used to
    clear both pieces. Maybe a method to check for conflicting information. A
    method to return (Zero|One) so we don't write it out everywhere. Maybe a
    method for (Zero|One).isAllOnesValue() to determine if all bits are known.
    I'm sure there are many other methods we can come up with.

    Differential Revision: https://reviews.llvm.org/D32376

    llvm-svn: 301432
* [APInt] Use isSubsetOf, intersects, and bit counting methods to reduce temporary APInts (Craig Topper, 2017-04-25, 1 file, -2/+2)

    This patch uses various APInt methods to reduce temporary APInt creation.

    This should be all of the unrelated cleanups that got buried in D32376
    (creating a KnownBits struct) as well as some pointed out by Simon during
    the review of that. Plus a few improvements to use counting instead of
    masking.

    I've left out any places where we do something like
    (KnownZero & KnownOne) != 0, as I plan to add a helper method to KnownBits
    to ask that question and didn't want to thrash that code an additional
    time.

    Differential Revision: https://reviews.llvm.org/D32495

    llvm-svn: 301338
* [InstCombine] Remove superfluous curly braces around a single line if body. NFC (Craig Topper, 2017-04-25, 1 file, -2/+1)

    llvm-svn: 301326
* Revert "[APInt] Fix a few places that use APInt::getRawData to operate ↵Renato Golin2017-04-231-1/+1
| | | | | | | | | | | | | | | | within the normal API." This reverts commit r301105, 4, 3 and 1, as a follow up of the previous revert, which broke even more bots. For reference: Revert "[APInt] Use operator<<= where possible. NFC" Revert "[APInt] Use operator<<= instead of shl where possible. NFC" Revert "[APInt] Use ashInPlace where possible." PR32754. llvm-svn: 301111
* [APInt] Use operator<<= instead of shl where possible. NFC (Craig Topper, 2017-04-23, 1 file, -1/+1)

    llvm-svn: 301103
* [InstCombine] revert r300977 and r301021 (Sanjay Patel, 2017-04-21, 1 file, -14/+4)

    This can cause an inf-loop. Investigating...

    llvm-svn: 301035