path: root/llvm/lib
Commit message (Author, Age; Files, Lines changed)
...
* AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane (Matt Arsenault, 2019-07-01; 2 files, -0/+82)
  llvm-svn: 364801
* AMDGPU/GlobalISel: Implement select for 32-bit G_ADD (Tom Stellard, 2019-07-01; 2 files, -2/+7)
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D58804
  llvm-svn: 364797
* [ARM] Fix MVE_VQxDMLxDH instruction class (Mikhail Maltsev, 2019-07-01; 1 file, -6/+9)
  Summary:
  According to the ARMARM, the VQDMLADH, VQRDMLADH, VQDMLSDH and VQRDMLSDH
  instructions handle their results as follows: "The base variant writes the
  results into the lower element of each pair of elements in the destination
  register, whereas the exchange variant writes to the upper element in each
  pair". I.e., the initial content of the output register affects the result;
  as usual, we model this with an additional input.

  Also, for 32-bit variants Qd is not allowed to be the same register as Qm
  and Qn; we use @earlyclobber to indicate this.

  This patch also changes vpred_r to vpred_n because the instructions don't
  have an explicit 'inactive' operand.

  Reviewers: dmgreen, ostannard, simon_tatham
  Reviewed By: simon_tatham
  Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64007
  llvm-svn: 364796
* AMDGPU/GlobalISel: Select G_BRCOND for vcc (Matt Arsenault, 2019-07-01; 2 files, -25/+44)
  llvm-svn: 364795
* [ARM] MVE: support QQPRRegClass and QQQQPRRegClass (Mikhail Maltsev, 2019-07-01; 1 file, -2/+3)
  Summary:
  QQPRRegClass and QQQQPRRegClass are used by the interleaving/deinterleaving
  loads/stores to represent sequences of consecutive SIMD registers.

  Reviewers: ostannard, simon_tatham, dmgreen
  Reviewed By: simon_tatham
  Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64009
  llvm-svn: 364794
* [InstCombine] (Y + ~X) + 1 --> Y - X fold (PR42459) (Roman Lebedev, 2019-07-01; 1 file, -1/+4)
  Summary:
  To be noted, this pattern is not unhandled by instcombine per se: it does
  end up being folded when one runs opt -O3, but not with just -instcombine.
  Regardless, that fold is indirect, depends on some other folds, and is thus
  blind when there are extra uses.

  This does address the regression being exposed in D63992.

  https://godbolt.org/z/7DGltU
  https://rise4fun.com/Alive/EPO0

  Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42459 | PR42459 ]]

  Reviewers: spatel, nikic, huihuiz
  Reviewed By: spatel
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63993
  llvm-svn: 364792
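  A minimal IR sketch of the fold (function names are hypothetical; the
  before/after shapes follow the pattern in the title):

      define i32 @src(i32 %x, i32 %y) {
        %not = xor i32 %x, -1     ; ~X
        %t = add i32 %y, %not     ; Y + ~X
        %r = add i32 %t, 1        ; (Y + ~X) + 1
        ret i32 %r
      }

      ; After the fold, since ~X == -X - 1:
      define i32 @tgt(i32 %x, i32 %y) {
        %r = sub i32 %y, %x       ; Y - X
        ret i32 %r
      }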
* [InstCombine] Shift amount reassociation in bittest (PR42399) (Roman Lebedev, 2019-07-01; 1 file, -0/+60)
  Summary:
  Given the pattern
    icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0
  we should move the shifts to the same hand of 'and', i.e. rewrite it as
    icmp eq/ne (and (x shift (Q+K)), y), 0
  iff (Q+K) u< bitwidth(x).

  It might be tempting to not restrict this to situations where we know we'd
  fold the two shifts together, but I'm not sure what the rules should be to
  avoid endless combine loops. We pick the same shift that was originally
  used to shift the variable we picked to shift:
  https://rise4fun.com/Alive/6x1v

  Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42399 | PR42399 ]].

  Reviewers: spatel, nikic, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63829
  llvm-svn: 364791
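  A concrete IR sketch of the rewrite (hypothetical function names; here
  Q=1, K=2, and 1 + 2 = 3 u< 32, so the fold applies):

      ; Before: icmp eq (and (shl %x, 1), (lshr %y, 2)), 0
      define i1 @src(i32 %x, i32 %y) {
        %t0 = shl i32 %x, 1
        %t1 = lshr i32 %y, 2
        %t2 = and i32 %t0, %t1
        %r = icmp eq i32 %t2, 0
        ret i1 %r
      }

      ; After: both shift amounts move onto %x.
      define i1 @tgt(i32 %x, i32 %y) {
        %t0 = shl i32 %x, 3
        %t1 = and i32 %t0, %y
        %r = icmp eq i32 %t1, 0
        ret i1 %r
      }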
* [Hexagon] Custom-lower UADDO(x, 1) and USUBO(x, 1) (Krzysztof Parzyszek, 2019-07-01; 2 files, -2/+42)
  llvm-svn: 364790
* AMDGPU/GlobalISel: Select G_FRAME_INDEX (Matt Arsenault, 2019-07-01; 2 files, -0/+19)
  llvm-svn: 364789
* AMDGPU/GFX10: fix scratch resource descriptor (Nicolai Haehnle, 2019-07-01; 1 file, -2/+2)
  Summary:
  The stride should depend on the wave size, not the hardware generation.
  Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be
  relevant.

  Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6
  Reviewers: arsenm, rampitec, mareko
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63808
  llvm-svn: 364788
* AMDGPU/GlobalISel: Make s16 select legal (Matt Arsenault, 2019-07-01; 2 files, -7/+9)
  This is easy to handle and avoids legalization artifacts which are likely
  to obscure combines.

  llvm-svn: 364787
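  For reference, a sketch of the IR shape that now maps to a legal s16
  G_SELECT (hypothetical function):

      define i16 @sel16(i1 %c, i16 %a, i16 %b) {
        %r = select i1 %c, i16 %a, i16 %b
        ret i16 %r
      }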
* AMDGPU/GlobalISel: Select G_BRCOND for scc conditions (Matt Arsenault, 2019-07-01; 2 files, -0/+34)
  llvm-svn: 364786
* AMDGPU/GlobalISel: Tolerate copies with no type set (Matt Arsenault, 2019-07-01; 1 file, -3/+6)
  isVCC has the same bug, but isn't used in a context where it can cause a
  problem.

  llvm-svn: 364784
* AMDGPU/GlobalISel: Select src modifiers (Matt Arsenault, 2019-07-01; 2 files, -6/+43)
  llvm-svn: 364782
* Fixup r364512 (Diana Picus, 2019-07-01; 1 file, -10/+12)
  Fix stack-use-after-scope errors from r364512. One instance was already
  fixed in r364611 - this patch simplifies that fix and addresses one more
  instance of similar code.

  Discussed in: https://reviews.llvm.org/D63905

  llvm-svn: 364778
* [Hexagon] Rework VLCR algorithm (Krzysztof Parzyszek, 2019-07-01; 1 file, -59/+161)
  Add code to catch the pattern for commutative instructions in VLCR.

  Patch by Suyog Sarda.

  llvm-svn: 364770
* AMDGPU: Convert some places to Register (Matt Arsenault, 2019-07-01; 2 files, -9/+10)
  llvm-svn: 364769
* AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE (Matt Arsenault, 2019-07-01; 1 file, -0/+1)
  llvm-svn: 364768
* AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR (Matt Arsenault, 2019-07-01; 1 file, -1/+2)
  llvm-svn: 364767
* AMDGPU/GlobalISel: Fail on store to 32-bit address space (Matt Arsenault, 2019-07-01; 1 file, -0/+6)
  llvm-svn: 364766
* AMDGPU/GlobalISel: Improve icmp selection coverage (Matt Arsenault, 2019-07-01; 2 files, -13/+38)
  Select s64 eq/ne scalar icmp.

  llvm-svn: 364765
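  The newly covered case, as an IR sketch (hypothetical function; an eq/ne
  compare of 64-bit scalars):

      define i1 @cmp64(i64 %a, i64 %b) {
        %c = icmp eq i64 %a, %b
        ret i1 %c
      }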
* AMDGPU/GlobalISel: RegBankSelect for WWM/WQM (Matt Arsenault, 2019-07-01; 1 file, -0/+2)
  llvm-svn: 364763
* AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote (Matt Arsenault, 2019-07-01; 1 file, -1/+1)
  llvm-svn: 364762
* AMDGPU/GlobalISel: Fix scc->vcc copy handling (Matt Arsenault, 2019-07-01; 2 files, -13/+23)
  This was checking the size of the register with the value of the size,
  which happens to be exec. Also fix assuming VCC is 64-bit to fix wave32.

  Also remove some untested handling for physical registers, which is
  skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical copy
  source. I'm not sure if this should be trying to handle this special case
  instead of dealing with this in copyPhysReg.

  llvm-svn: 364761
* AMDGPU/GlobalISel: Use and instead of BFE with inline immediate (Matt Arsenault, 2019-07-01; 1 file, -6/+29)
  Zext from s1 is the only case where this should do anything with the
  current legal extensions.

  llvm-svn: 364760
* [mips] Add missing schedinfo for MSA and ASE instructions (Simon Atanasyan, 2019-07-01; 3 files, -2/+12)
  llvm-svn: 364757
* [mips] Add missing schedinfo for atomic instructions (Simon Atanasyan, 2019-07-01; 2 files, -3/+22)
  llvm-svn: 364756
* [mips] Add missing schedinfo for ADJCALLSTACKDOWN, ADJCALLSTACKUP (Simon Atanasyan, 2019-07-01; 1 file, -1/+1)
  llvm-svn: 364755
* [AMDGPU] Call isLoopExiting for blocks in the loop (Florian Hahn, 2019-07-01; 1 file, -2/+4)
  isLoopExiting should only be called for blocks in the loop. A follow-up
  patch makes this requirement an assertion.

  I've updated the usage here to only match for actual exit blocks.
  Previously, it would also match blocks not in the loop.

  Reviewers: arsenm, nhaehnle
  Reviewed By: nhaehnle
  Differential Revision: https://reviews.llvm.org/D63980

  llvm-svn: 364750
* [RISCV] Add break; to the last switch case (Fangrui Song, 2019-07-01; 1 file, -0/+1)
  As suggested by jrtc27 in the post-commit review of D60528.

  llvm-svn: 364746
* [X86] CombineShuffleWithExtract - updated description comments. NFCI. (Simon Pilgrim, 2019-07-01; 1 file, -4/+4)
  CombineShuffleWithExtract no longer requires that both shuffle ops are
  extract_subvectors of the same type or the same size.

  llvm-svn: 364745
* [SelectionDAG] Do minnum->minimum at legalization time instead of building time (Benjamin Kramer, 2019-07-01; 2 files, -16/+17)
  The SDAGBuilder behavior stems from the days when we didn't have fast math
  flags available in SDAG. We do now, and doing the transformation in the
  legalizer has the advantage that it also works for vector types.

  llvm-svn: 364743
* [InstCombine] Omit 'urem' where possible (Roman Lebedev, 2019-07-01; 1 file, -4/+20)
  This was added to the backend in D63390 / rL364286, but it makes sense to
  also handle it in the middle-end.
  https://rise4fun.com/Alive/Zsln

  llvm-svn: 364738
* [DebugInfo] Avoid adding too much indirection to pointer-valued variables (Jeremy Morse, 2019-07-01; 2 files, -2/+32)
  This patch addresses PR41675, where a stack-pointer variable is
  dereferenced too many times by its location expression, presenting a value
  on the stack as the pointer to the stack.

  The difference between a stack *pointer* DBG_VALUE and one that refers to
  a value on the stack is currently the indirect flag. However, the DWARF
  backend will also try to guess whether something is a memory location or
  not, based on whether there is any computation in the location expression.
  By simply prepending the stack offset to existing expressions, we can
  accidentally convert a register location into a memory location, which
  introduces a surprise (and unintended) dereference.

  The solution is to add DW_OP_stack_value whenever we add a DIExpression
  computation to a stack *pointer*. It's an implicit location computed on
  the expression stack, thus it needs to be flagged as a stack_value.

  For the edge case where the offset is zero and the location could be a
  register location, DIExpression::prepend will still generate opcodes, and
  thus DW_OP_stack_value must still be added.

  Differential Revision: https://reviews.llvm.org/D63429

  llvm-svn: 364736
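  As an illustrative sketch (DIExpression fragments only, not a complete
  debug-info module; the offset 16 is made up):

      ; Prepending an offset to a register-located pointer. Read as a memory
      ; location, this dereferences the computed address once too often:
      ;   !DIExpression(DW_OP_plus_uconst, 16)
      ; Flagging the computed pointer as an implicit value avoids that:
      ;   !DIExpression(DW_OP_plus_uconst, 16, DW_OP_stack_value)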
* [SimpleLoopUnswitch] Implement handling of prof branch_weights metadata for SwitchInst (Yevgeny Rouban, 2019-07-01; 1 file, -17/+39)
  Differential Revision: https://reviews.llvm.org/D60606

  llvm-svn: 364734
* [ARM] WLS/LE Code Generation (Sam Parker, 2019-07-01; 8 files, -28/+163)
  Backend changes to enable WLS/LE low-overhead loops for armv8.1-m:
  1) Use TTI to communicate to the HardwareLoop pass that we should try to
     generate intrinsics that guard the loop entry, as well as setting the
     loop trip count.
  2) Lower the BRCOND that uses said intrinsic to an Arm specific node:
     ARMWLS.
  3) ISelDAGToDAG the node to a new pseudo instruction: t2WhileLoopStart.
  4) Add support in ArmLowOverheadLoops to handle the new pseudo instruction.

  Differential Revision: https://reviews.llvm.org/D63816

  llvm-svn: 364733
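  A hedged IR sketch of the guarded loop shape the HardwareLoop pass aims
  for. The intrinsic names below are the hardware-loop intrinsics as I
  recall them; the exact mangling suffixes and operands are assumptions and
  may differ by LLVM version:

      define void @count(i32 %n) {
      entry:
        ; Guards the loop entry and sets the trip count in one step; the
        ; branch on %guard is what later becomes t2WhileLoopStart.
        %guard = call i1 @llvm.test.set.loop.iterations.i32(i32 %n)
        br i1 %guard, label %loop, label %exit

      loop:
        ; ... loop body ...
        %cont = call i1 @llvm.loop.decrement.i32(i32 1)
        br i1 %cont, label %loop, label %exit

      exit:
        ret void
      }

      declare i1 @llvm.test.set.loop.iterations.i32(i32)
      declare i1 @llvm.loop.decrement.i32(i32)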
* [X86] Improve the type checking fast-isel handling of vector bitcasts (Craig Topper, 2019-07-01; 1 file, -13/+8)
  We had a bunch of vector size legality checks for the source type based on
  feature flags, but we didn't check the destination type at all beyond
  ensuring that it was a "simple" type. But this allowed the destination to
  be i128, which isn't legal.

  This commit changes the code to use TLI's isTypeLegal logic in place of
  all the subtarget checks. Then it additionally checks that the source and
  dest are vectors.

  Fixes PR42452.

  llvm-svn: 364729
* [X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload (Craig Topper, 2019-07-01; 2 files, -0/+44)
  But only when the load isn't volatile.

  This improves load folding during isel, where we only have vzload and
  scalar_to_vector+load patterns. We can't have full vector load isel
  patterns for the same volatile load issue.

  Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns.

  llvm-svn: 364728
* [X86] Add MOVHPDrm/MOVLPDrm patterns that use VZEXT_LOAD (Craig Topper, 2019-07-01; 2 files, -0/+18)
  We already had patterns that used scalar_to_vector+load. But we can also
  have a vzload.

  Found while investigating combining scalar_to_vector+load to vzload.

  llvm-svn: 364726
* [InstCombine] canonicalize fcmp+select to minnum/maxnum intrinsics (Sanjay Patel, 2019-06-30; 1 file, -0/+13)
  This is the opposite direction of D62158 (we have to choose one form or
  the other). Now that we have FMF on the select, this becomes more
  palatable. And the benefits of having a single IR instruction for this
  operation (fewer chances of missing folds based on extra uses, etc.)
  overcome my previous comments about the potential advantage of larger
  pattern matching/analysis.

  Differential Revision: https://reviews.llvm.org/D62414

  llvm-svn: 364721
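  A minimal sketch of the canonicalization, assuming it fires when the
  required fast-math flags (here nnan nsz) are present on the select:

      define double @fmin_like(double %a, double %b) {
        %cmp = fcmp nnan nsz olt double %a, %b
        %r = select nnan nsz i1 %cmp, double %a, double %b
        ret double %r
      }

      ; Canonical form after this change (sketch):
      ;   %r = call nnan nsz double @llvm.minnum.f64(double %a, double %b)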
* Cleanup: llvm::bsearch -> llvm::partition_point after r364719 (Fangrui Song, 2019-06-30; 13 files, -42/+37)
  llvm-svn: 364720
* [X86] Custom lower AVX masked loads to masked load and vselect instead of selecting a maskmov+vblend during isel (Craig Topper, 2019-06-30; 2 files, -16/+29)
  AVX masked loads only support 0 as the value for masked off elements. So
  we need an extra blend to support other values. Previously we expanded the
  masked load to two instructions with isel patterns. With this patch we now
  insert the vselect during lowering and it will be separately selected as a
  blend.

  llvm-svn: 364718
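  Conceptually, an IR-level sketch of the split (the real change operates on
  SelectionDAG nodes during lowering; types and alignment here are made up):

      declare <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>*, i32, <8 x i1>, <8 x float>)

      ; A masked load with a non-zero passthru becomes a zero-passthru
      ; masked load plus a separate blend.
      define <8 x float> @split(<8 x float>* %p, <8 x i1> %m, <8 x float> %passthru) {
        %zeroed = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(
                    <8 x float>* %p, i32 32, <8 x i1> %m, <8 x float> zeroinitializer)
        %blend = select <8 x i1> %m, <8 x float> %zeroed, <8 x float> %passthru
        ret <8 x float> %blend
      }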
* [SelectionDAG] Use the memory VT instead of result VT for FoldingSet profiling in getMaskedLoad/getMaskedStore (Craig Topper, 2019-06-30; 1 file, -3/+2)
  This matches what is done by the Profile function. Otherwise CSE won't
  work properly.

  llvm-svn: 364717
* [LFTR] Rephrase getLoopTest into "based-on" check; NFCI (Nikita Popov, 2019-06-29; 1 file, -23/+23)
  What we want to know here is whether we're already using this value for
  the loop condition, so make the query about that. We can extend this to a
  more general "based-on" relationship, rather than a direct icmp use, later.

  llvm-svn: 364715
* [InstCombine] canonicalize fmin/fmax to LLVM intrinsics minnum/maxnum (Sanjay Patel, 2019-06-29; 1 file, -24/+14)
  This transform came up in D62414, but we should deal with it first.

  We have LLVM intrinsics that correspond exactly to libm calls (unlike most
  libm calls, these libm calls never set errno). This holds without any
  fast-math-flags, so we should always canonicalize to those intrinsics
  directly for better optimization.

  Currently, we convert to fcmp+select only when we have FMF (nnan) because
  fcmp+select does not preserve the semantics of the call in the general
  case.

  Differential Revision: https://reviews.llvm.org/D63214

  llvm-svn: 364714
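  A minimal sketch of the direction chosen here (no FMF required):

      declare double @fmin(double, double)

      define double @use_libm(double %a, double %b) {
        %r = call double @fmin(double %a, double %b)
        ret double %r
      }

      ; Canonical form after this change (sketch):
      ;   %r = call double @llvm.minnum.f64(double %a, double %b)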
* [LFTR] Remove unnecessary latch check; NFCI (Nikita Popov, 2019-06-29; 1 file, -14/+9)
  The whole indvars pass works on loops in simplified form, so there is
  always a unique latch. Convert the condition into an assertion in
  needsLFTR (though we also assert this in later LFTR functions).

  Additionally update the comment on getLoopTest() now that we are dealing
  with multiple exits.

  llvm-svn: 364713
* [InstCombine] Shift amount reassociation (PR42391) (Roman Lebedev, 2019-06-29; 1 file, -0/+48)
  Summary:
  Given the pattern
    (x shiftopcode Q) shiftopcode K
  we should rewrite it as
    x shiftopcode (Q+K)
  iff (Q+K) u< bitwidth(x).

  This is valid for any shift, but both shifts must be identical.
  * https://rise4fun.com/Alive/9E2
  * exact on both lshr => exact: https://rise4fun.com/Alive/plHk
  * exact on both ashr => exact: https://rise4fun.com/Alive/QDAA
  * nuw on both shl => nuw: https://rise4fun.com/Alive/5Uk
  * nsw on both shl => nsw: https://rise4fun.com/Alive/0plg

  Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42391 | PR42391 ]].

  Reviewers: spatel, nikic, RKSimon
  Reviewed By: nikic
  Subscribers: llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63812
  llvm-svn: 364712
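  A minimal IR illustration (hypothetical function names; both shifts are
  lshr, and 5 + 7 = 12 u< 32):

      define i32 @src(i32 %x) {
        %t = lshr i32 %x, 5
        %r = lshr i32 %t, 7
        ret i32 %r
      }

      ; After the fold:
      define i32 @tgt(i32 %x) {
        %r = lshr i32 %x, 12
        ret i32 %r
      }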
* [APInt] Fix getBitsNeeded for INT_MIN values (Dmitry Venikov, 2019-06-29; 1 file, -1/+4)
  Summary:
  This patch fixes the behaviour of APInt::getBitsNeeded for 10-bit INT_MIN
  values (e.g. the string "-512", which needs exactly 10 bits).

  Reviewers: regehr, RKSimon
  Reviewed By: RKSimon
  Subscribers: grandinj, dexonsmith, kristina, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63691
  llvm-svn: 364710
* [LFTR] Fix post-inc pointer IV with truncated exit count (PR41998) (Nikita Popov, 2019-06-29; 1 file, -40/+37)
  Fixes https://bugs.llvm.org/show_bug.cgi?id=41998. Usually when we have a
  truncated exit count we'll truncate the IV when comparing against the
  limit, in which case exit count overflow in post-inc form doesn't matter.
  However, for pointer IVs we don't do that, so we have to be careful about
  incrementing the IV in the wide type.

  I'm fixing this by removing the IVCount variable (which was ExitCount or
  ExitCount+1) and replacing it with a UsePostInc flag, and then moving the
  actual limit adjustment to the individual cases (which are: pointer IV
  where we add to the wide type, integer IV where we add to the narrow type,
  and constant integer IV where we add to the wide type).

  Differential Revision: https://reviews.llvm.org/D63686

  llvm-svn: 364709
* AMDGPU/GlobalISel: RegBankSelect for update.dpp (Matt Arsenault, 2019-06-29; 1 file, -0/+1)
  llvm-svn: 364701