path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
* AMDGPU: Disable some fneg combines unless nsz | Matt Arsenault | 2017-01-19 | 2 | -41/+106
  For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem it also applies there.
  fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either.
  llvm-svn: 292473
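The signed-zero hazard named in this commit message can be reproduced outside the compiler. The following is an illustrative sketch in plain Python (IEEE-754 doubles, default rounding), not code from the patch; the function names are hypothetical.

```python
import math

def neg_of_sum(x, y):
    # Original expression: -(x + y)
    return -(x + y)

def sum_of_negs(x, y):
    # Rewritten expression: (-x) + (-y)
    return (-x) + (-y)

def sign_of(value):
    # copysign exposes the sign bit, which distinguishes -0.0 from 0.0
    return math.copysign(1.0, value)

x, y = 1.0, -1.0              # the problematic case: x == -y
before = neg_of_sum(x, y)     # x + y is +0.0, so the negation is -0.0
after = sum_of_negs(x, y)     # (-1.0) + 1.0 is +0.0

print(sign_of(before), sign_of(after))
```

Both results compare equal with `==`, so the difference only shows up through the sign bit, which is why the rewrite is acceptable when signed zeros may be ignored (nsz).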
* AMDGPU: Remove modifiers from v_div_scale_* | Matt Arsenault | 2017-01-19 | 2 | -3/+5
  They seem to produce nonsense results when used.
  This should be applied to the release branch.
  llvm-svn: 292472
* [AMDGPU] Do not allow register coalescer to create big superregs | Stanislav Mekhanoshin | 2017-01-18 | 2 | -9/+80
  Limit register coalescer by not allowing it to artificially increase size of registers beyond dword. Such super-registers are in fact register sequences and not distinct HW registers.
  With more super-regs we would need to allocate adjacent registers and constrain regalloc more than needed. Moreover, our super registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates register allocation even more, resulting in excessive spilling.
  Differential Revision: https://reviews.llvm.org/D28782
  llvm-svn: 292413
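To make the overlap concrete, here is a small sketch that enumerates register tuples in the naming scheme quoted in the commit message. It is an illustration only: `vgpr_tuples` and `overlapping` are hypothetical helpers, and the model ignores everything about the real register file except adjacency.

```python
def vgpr_tuples(num_regs, width):
    # Every run of `width` adjacent registers forms one super-register.
    return [
        tuple(f"VGPR{base + i}" for i in range(width))
        for base in range(num_regs - width + 1)
    ]

def overlapping(a, b):
    # Two super-registers conflict if they share any 32-bit register.
    return bool(set(a) & set(b))

tuples = vgpr_tuples(num_regs=5, width=3)
names = ["_".join(t) for t in tuples]
print(names)

# Count how many 3-wide tuples contain VGPR2 in this 5-register model.
users_of_vgpr2 = sum("VGPR2" in t for t in tuples)
print(users_of_vgpr2)
```

In this toy model a single occupied register blocks several wide tuples at once, which is the pressure the commit message attributes to large coalesced super-registers.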
* DAG: Consider nnan in isKnownNeverNaN | Matt Arsenault | 2017-01-18 | 1 | -0/+16
  llvm-svn: 292328
* AMDGPU: Add replacement export intrinsics | Matt Arsenault | 2017-01-17 | 2 | -0/+627
  llvm-svn: 292205
* ADMGPU/EG,CM: Implement _noret global atomics | Jan Vesely | 2017-01-16 | 1 | -0/+542
  _RTN versions will be a lot more complicated
  Differential Revision: https://reviews.llvm.org/D28067
  llvm-svn: 292162
* [AMDGPU] Implement f16 fcopysign and fcopysign(f32, f64) | Konstantin Zhuravlyov | 2017-01-13 | 1 | -0/+262
  Differential Revision: https://reviews.llvm.org/D28496
  llvm-svn: 291954
* AMDGPU: Skip fneg/select combine if it can fold into other | Matt Arsenault | 2017-01-12 | 2 | -0/+159
  llvm-svn: 291792
* AMDGPU: Fold free fneg into sin | Matt Arsenault | 2017-01-12 | 1 | -0/+42
  llvm-svn: 291790
* AMDGPU: Fold fneg into fmul_legacy | Matt Arsenault | 2017-01-12 | 1 | -0/+178
  llvm-svn: 291784
* AMDGPU: Fold fneg into rcp | Matt Arsenault | 2017-01-12 | 1 | -0/+100
  llvm-svn: 291779
* AMDGPU: Fold fneg into fp_round | Matt Arsenault | 2017-01-12 | 1 | -0/+172
  llvm-svn: 291778
* AMDGPU: Fold fneg into fp_extend | Matt Arsenault | 2017-01-12 | 1 | -0/+126
  llvm-svn: 291777
* AMDGPU: Fold fneg into fma or fmad | Matt Arsenault | 2017-01-12 | 1 | -0/+308
  Patch mostly by Fiona Glaser
  llvm-svn: 291733
* AMDGPU: Fold fneg into fmul | Matt Arsenault | 2017-01-12 | 3 | -12/+189
  Patch mostly by Fiona Glaser
  llvm-svn: 291732
* AMDGPU: Fold fneg into fadd | Matt Arsenault | 2017-01-12 | 1 | -0/+179
  Patch mostly by Fiona Glaser
  llvm-svn: 291731
* AMDGPU: Pull fneg/fabs out of a select | Matt Arsenault | 2017-01-11 | 2 | -2/+729
  Allows better source modifier usage.
  llvm-svn: 291729
* AMDGPU: Fix shrinking of addc/subb. | Matt Arsenault | 2017-01-11 | 1 | -0/+292
  To shrink to VOP2 the input carry must also be VCC.
  llvm-svn: 291720
* AMDGPU: Fix sext_inreg for i1 in i16 | Matt Arsenault | 2017-01-11 | 1 | -0/+133
  This produces worse code when i16 is legal, mostly due to combines getting confused by conversions inserted for uniform 16-bit operations.
  llvm-svn: 291717
* AMDGPU: Fix breaking VOP3 v_add_i32s | Matt Arsenault | 2017-01-11 | 1 | -0/+305
  This was shrinking the instruction even though the carry output register was a virtual register, not known VCC.
  llvm-svn: 291716
* AMDGPU: Fix folding immediates into mac src2 | Matt Arsenault | 2017-01-11 | 1 | -0/+66
  Whether it is legal or not needs to check for the instruction it will be replaced with.
  llvm-svn: 291711
* Revert "CodeGen: Allow small copyable blocks to "break" the CFG." | Kyle Butt | 2017-01-11 | 4 | -51/+19
  This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded.
  This needs a simple probability check because there are some cases where it is not profitable.
  llvm-svn: 291695
* DAGCombiner: Add hasOneUse checks to fadd/fma combine | Matt Arsenault | 2017-01-11 | 1 | -0/+262
  Even with aggressive fusion enabled, this requires duplicating the fmul, or increases an fadd to another fma which is not an improvement.
  llvm-svn: 291642
* AMDGPU/EG,CM: Add fp16 conversion instructions | Jan Vesely | 2017-01-11 | 4 | -35/+49
  Differential Revision: https://reviews.llvm.org/D28164
  llvm-svn: 291622
* AMDGPU: Constant fold when immediate is materialized | Matt Arsenault | 2017-01-10 | 1 | -0/+858
  In future commits these patterns will appear after moveToVALU changes.
  llvm-svn: 291615
* CodeGen: Allow small copyable blocks to "break" the CFG. | Kyle Butt | 2017-01-10 | 4 | -19/+51
  When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability in the other direction. For small blocks that can be duplicated we now skip that requirement as well.
  Differential revision: https://reviews.llvm.org/D27742
  llvm-svn: 291609
* DAG: Avoid OOB when legalizing vector indexing | Matt Arsenault | 2017-01-10 | 2 | -4/+11
  If a vector index is out of bounds, the result is supposed to be undefined but is not undefined behavior. Change the legalization for indexing the vector on the stack so that an out of bounds index does not create an out of bounds memory access.
  llvm-svn: 291604
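One way to keep an out-of-bounds index from turning into an out-of-bounds memory access is to clamp the index before it is used to address the stack copy of the vector. The sketch below illustrates that idea in Python; it is an assumption about the general technique, not a transcription of the patch, and `clamp_vector_index` and `extract_element` are hypothetical names.

```python
def clamp_vector_index(idx, num_elts):
    # Treat the index as an unsigned 32-bit value, as a machine index would be.
    idx &= 0xFFFFFFFF
    if (num_elts & (num_elts - 1)) == 0:
        # Power-of-two element count: a mask is enough to stay in range.
        return idx & (num_elts - 1)
    # Otherwise fall back to an unsigned minimum against the last valid index.
    return min(idx, num_elts - 1)

def extract_element(stack_slot, idx):
    # The returned value is arbitrary for a bad index, but the access
    # itself always stays inside the slot.
    return stack_slot[clamp_vector_index(idx, len(stack_slot))]

print(extract_element([10, 20, 30, 40], 2))   # in-bounds index is unchanged
print(extract_element([10, 20, 30, 40], 7))   # masked into range
print(extract_element([10, 20, 30], 7))       # clamped to the last element
```

The result for a bad index is still meaningless, which matches the "undefined result, but not undefined behavior" contract stated in the commit message.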
* AMDGPU: Add tests for HasMultipleConditionRegisters | Matt Arsenault | 2017-01-10 | 1 | -0/+161
  This was enabled without many specific tests or the comment.
  llvm-svn: 291586
* AMDGPU: Add Assert[SZ]Ext during argument load creation | Matt Arsenault | 2017-01-09 | 1 | -75/+97
  For i16 zeroext arguments when i16 was a legal type, the known bits information from the truncate was lost. Insert a zeroext so the known bits optimizations work with the 32-bit loads.
  Fixes code quality regressions vs. SI in min.ll test.
  llvm-svn: 291461
* [SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486. | Bjorn Pettersson | 2017-01-09 | 1 | -0/+16
  Summary: Originally
    i64 = umax t8, Constant:i64<4>
  was expanded into
    i32,i32 = umax Constant:i32<0>, Constant:i32<0>
    i32,i32 = umax t7, Constant:i32<4>
  Now instead the two produced umax:es return i32 instead of i32, i32.
  Thanks to Jan Vesely for help with the test case.
  Patch by mikael.holmen at ericsson.com
  Reviewers: bogner, jvesely, tstellarAMD, arsenm
  Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28135
  llvm-svn: 291441
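For readers unfamiliar with type expansion, the sketch below shows one way a 64-bit unsigned max can be computed from 32-bit halves, each half producing a single 32-bit result. It is a generic illustration written for this log, not the node sequence the legalizer emits; `split64` and `umax64_via_halves` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

def split64(value):
    # Return (hi, lo) 32-bit halves of an unsigned 64-bit value.
    return (value >> 32) & MASK32, value & MASK32

def umax64_via_halves(a, b):
    a_hi, a_lo = split64(a)
    b_hi, b_lo = split64(b)
    # The high half of the result is simply the unsigned max of the high halves.
    hi = max(a_hi, b_hi)
    if a_hi != b_hi:
        # The operand with the larger high half wins outright.
        lo = a_lo if a_hi > b_hi else b_lo
    else:
        # Equal high halves: the low halves decide.
        lo = max(a_lo, b_lo)
    return (hi << 32) | lo

# Mirrors the shape of the example in the commit message: umax(t, 4)
# where the high half of both operands is zero.
print(umax64_via_halves(3, 4))
print(umax64_via_halves(0x1_0000_0000, 4))
```

When both high halves are zero, as in the commit's example, the computation reduces to one 32-bit umax on the low halves.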
* AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes | Jan Vesely | 2017-01-06 | 3 | -209/+1003
  This will make transition to SCRATCH_MEMORY easier
  Differential Revision: https://reviews.llvm.org/D24746
  llvm-svn: 291279
* [AMDGPU] Do not emit .AMDGPU.config section for amdhsa | Konstantin Zhuravlyov | 2017-01-06 | 2 | -4/+3
  Differential Revision: https://reviews.llvm.org/D27732
  llvm-svn: 291245
* AMDGPU/SI: Implement sendmsghalt intrinsic | Jan Vesely | 2017-01-04 | 4 | -41/+202
  v2: expose using amdgcn prefix
  Differential Revision: https://reviews.llvm.org/D23511
  llvm-svn: 290977
* AMDGPU: Invert cmp + select with constant | Matt Arsenault | 2016-12-22 | 3 | -27/+400
  Canonicalize a select with a constant to the false side. This enables more instruction shrinking opportunities since an inline immediate can be used for the false side of v_cndmask_b32_e32.
  This seems to usually be better but causes some code size regressions in some tests.
  llvm-svn: 290372
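The canonicalization can be pictured as swapping the select operands and inverting the comparison so that the constant lands on the false side. The sketch below models that on a toy representation, with constants as Python ints and other values as strings; it handles integer predicates only, since floating-point predicates need an ordered/unordered-aware inverse. All names here are hypothetical and not taken from the patch.

```python
# Inverse of each integer predicate in this toy model.
INVERSE = {
    "eq": "ne", "ne": "eq",
    "slt": "sge", "sge": "slt",
    "sgt": "sle", "sle": "sgt",
}

def is_constant(value):
    # In this model, constants are ints and everything else is a named value.
    return isinstance(value, int)

def canonicalize_select(pred, true_val, false_val):
    """Rewrite select(cmp pred, true_val, false_val) so that a lone
    constant operand ends up on the false side."""
    if is_constant(true_val) and not is_constant(false_val):
        return INVERSE[pred], false_val, true_val
    return pred, true_val, false_val

# select(a < b, 5, x)  becomes  select(a >= b, x, 5)
print(canonicalize_select("slt", 5, "x"))
# Already canonical: the constant is on the false side.
print(canonicalize_select("slt", "x", 5))
```

Both forms select the same value for every input; only the operand position of the constant changes, which is what the commit message says lets the inline immediate sit on the false side.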
* AMDGPU: Use i16 for i16 shift amount | Matt Arsenault | 2016-12-22 | 1 | -37/+84
  llvm-svn: 290351
* AMDGPU: Use i16 comparison instructions | Matt Arsenault | 2016-12-22 | 3 | -2/+392
  llvm-svn: 290348
* AMDGPU: Swap order of operands in fadd/fsub combine | Matt Arsenault | 2016-12-22 | 3 | -12/+12
  FMA is canonicalized to constant in the middle operand. Do the same so fmad matches and avoid an extra combine step.
  llvm-svn: 290313
* AMDGPU: Check fast math flags in fadd/fsub combines | Matt Arsenault | 2016-12-22 | 1 | -0/+63
  llvm-svn: 290312
* AMDGPU: Form more FMAs if fusion is allowed | Matt Arsenault | 2016-12-22 | 5 | -820/+1169
  Extend the existing fadd/fsub->fmad combines to produce FMA if allowed.
  llvm-svn: 290311
* AMDGPU: Enable some f32 fadd/fsub combines for f16 | Matt Arsenault | 2016-12-22 | 3 | -3/+492
  llvm-svn: 290308
* AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16 | Matt Arsenault | 2016-12-22 | 1 | -12/+41
  llvm-svn: 290307
* AMDGPU: setcc test cleanup | Matt Arsenault | 2016-12-22 | 2 | -234/+244
  llvm-svn: 290306
* AMDGPU: Allow rcp and rsq usage with f16 | Matt Arsenault | 2016-12-22 | 1 | -10/+180
  llvm-svn: 290302
* AMDGPU: Custom lower f16 fdiv | Matt Arsenault | 2016-12-22 | 1 | -16/+28
  llvm-svn: 290301
* AMDGPU: Implement f16 fcanonicalize | Matt Arsenault | 2016-12-22 | 1 | -0/+172
  llvm-svn: 290300
* AMDGPU: Allow 16-bit types in inline asm constraints | Matt Arsenault | 2016-12-20 | 1 | -0/+41
  llvm-svn: 290193
* AMDGPU: Run fp combine tests on VI | Matt Arsenault | 2016-12-20 | 3 | -135/+171
  llvm-svn: 290192
* AMDGPU: Don't add same instruction multiple times to worklist | Matt Arsenault | 2016-12-20 | 1 | -0/+14
  When the instruction is processed the first time, it may be deleted resulting in crashes. While the new test adds the same user to the worklist twice, this particular case doesn't crash but I'm not sure why.
  llvm-svn: 290191
* AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.* | Tom Stellard | 2016-12-20 | 2 | -3/+17
  Reviewers: arsenm, nhaehnle, mareko
  Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
  Differential Revision: https://reviews.llvm.org/D27834
  llvm-svn: 290184
* AMDGPU/SI: Add a MachineMemOperand to MIMG instructions | Tom Stellard | 2016-12-20 | 1 | -1/+14
  Summary: Without a MachineMemOperand, the scheduler was assuming MIMG instructions were ordered memory references, so no loads or stores could be reordered across them.
  Reviewers: arsenm
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D27536
  llvm-svn: 290179