summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU/idot4s.ll
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Fix typo in SIInstrInfo::memOpsHaveSameBasePtrJay Foad2019-12-171-329/+329
| | | | | | | | | | | | | | | Summary: The typo has been present since memOpsHaveSameBasePtr was introduced in r313208. It caused SIInstrInfo::shouldClusterMemOps to cluster more mem ops than it was supposed to. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71616
* AMDGPU: Fix i16 arithmetic pattern redundancyMatt Arsenault2019-10-081-9/+8
| | | | | | | | | | | | | | There were 2 problems here. First, these patterns were duplicated to handle the inverted shift operands instead of using the commuted PatFrags. Second, the point of the zext folding patterns don't apply to the non-0ing high subtargets. They should be skipped instead of inserting the extension. The zeroing high code would be emitted when necessary anyway. This was also emitting unnecessary zexts in cases where the high bits were undefined. llvm-svn: 374092
* [TargetLowering] Add SimplifyMultipleUseDemandedBitsSimon Pilgrim2019-07-231-27/+14
| | | | | | | | | | | | | | | | | | This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured. The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use. We do see a couple of regressions that need to be addressed: AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better). X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG. The code owners have confirmed its ok for these cases to fixed up in future patches. Differential Revision: https://reviews.llvm.org/D63281 llvm-svn: 366799
* [AMDGPU] Regenerate idot tests. NFCI.Simon Pilgrim2019-07-111-6/+6
| | | | | | Reduces diff in D63281. llvm-svn: 365754
* [AMDGPU] gfx10 tests. NFC.Stanislav Mekhanoshin2019-06-201-0/+154
| | | | llvm-svn: 363946
* [DAGCombine] Prune unnused nodes.Nirav Dave2019-03-291-87/+46
| | | | | | | | | | | | | | | | | | | Summary: Nodes that have no uses are eventually pruned when they are selected from the worklist. Record nodes newly added to the worklist or DAG and perform pruning after every combine attempt. Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight Reviewed By: jyknight Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58070 llvm-svn: 357283
* [AMDGPU] Split idot4/8 signed and unsigned tests. NFC.Stanislav Mekhanoshin2019-02-091-0/+1006
llvm-svn: 353593
OpenPOWER on IntegriCloud