path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* [AMDGPU] Do not allow register coalescer to create big superregs | Stanislav Mekhanoshin | 2017-01-18 | 2 | -0/+27
  Limit the register coalescer by not allowing it to artificially increase the size of registers beyond a dword. Such super-registers are in fact register sequences, not distinct HW registers. With more super-regs we would need to allocate adjacent registers and constrain regalloc more than needed. Moreover, our super-registers overlap. For instance, we have VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4, etc., which complicates register allocation even more and results in excessive spilling.
  Differential Revision: https://reviews.llvm.org/D28782
  llvm-svn: 292413
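The following toy model shows why overlapping super-registers constrain allocation. It assumes a hypothetical 8-register file in which a 3-wide tuple may start at any index, as the VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, ... example suggests. The helper names are illustrative only and are not part of LLVM's register allocator.

```python
def tuples(num_regs, width):
    """Every contiguous tuple of `width` registers, starting at any index."""
    return [tuple(range(start, start + width))
            for start in range(num_regs - width + 1)]


def containing(candidates, reg):
    """Tuples that include physical register `reg`."""
    return [t for t in candidates if reg in t]


def blocked_by(candidates, chosen):
    """Tuples that become unavailable once `chosen` is assigned."""
    taken = set(chosen)
    return [t for t in candidates if taken.intersection(t)]


if __name__ == "__main__":
    triples = tuples(8, 3)  # hypothetical 8-register file
    print(len(triples))                         # 6
    print(containing(triples, 2))               # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
    print(len(blocked_by(triples, (2, 3, 4))))  # 5
```

In this model, assigning one 3-wide tuple rules out five of the six candidates. Three independent 32-bit values would each occupy only their own register.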
* [AMDGPU] Assembler: fix v_mac_f16 immediates | Sam Kolton | 2017-01-17 | 2 | -10/+18
  Reviewers: vpykhtin, artem.tamazov, tstellarAMD
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D28802
  llvm-svn: 292224
* AMDGPU: Add replacement export intrinsics | Matt Arsenault | 2017-01-17 | 4 | -20/+80
  llvm-svn: 292205
* AMDGPU: Remove dead pattern | Matt Arsenault | 2017-01-17 | 1 | -5/+0
  This is the unsafe conversion pattern, but it is not guarded by an unsafe-math check. It is also already done in LegalizeDAG.
  llvm-svn: 292173
* AMDGPU/EG,CM: Implement _noret global atomics | Jan Vesely | 2017-01-16 | 2 | -7/+113
  _RTN versions will be a lot more complicated.
  Differential Revision: https://reviews.llvm.org/D28067
  llvm-svn: 292162
* [AMDGPU] Implement f16 fcopysign and fcopysign(f32, f64) | Konstantin Zhuravlyov | 2017-01-13 | 2 | -0/+37
  Differential Revision: https://reviews.llvm.org/D28496
  llvm-svn: 291954
* Apply clang-tidy's performance-unnecessary-value-param to LLVM. | Benjamin Kramer | 2017-01-13 | 3 | -12/+13
  With some minor manual fixes for using function_ref instead of std::function. No functional change intended.
  llvm-svn: 291904
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFC | Diana Picus | 2017-01-13 | 10 | -146/+137
  Rename addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than one operand. See https://reviews.llvm.org/D28057 for the whole discussion.
  Differential Revision: https://reviews.llvm.org/D28556
  llvm-svn: 291891
* AMDGPU: Skip fneg/select combine if it can fold into other | Matt Arsenault | 2017-01-12 | 1 | -29/+40
  llvm-svn: 291792
* AMDGPU: Fold free fneg into sin | Matt Arsenault | 2017-01-12 | 1 | -1/+5
  llvm-svn: 291790
* AMDGPU: Fold fneg into fmul_legacy | Matt Arsenault | 2017-01-12 | 1 | -2/+5
  llvm-svn: 291784
* AMDGPU: Fold fneg into rcp | Matt Arsenault | 2017-01-12 | 1 | -1/+7
  llvm-svn: 291779
* AMDGPU: Fold fneg into fp_round | Matt Arsenault | 2017-01-12 | 1 | -2/+18
  llvm-svn: 291778
* AMDGPU: Fold fneg into fp_extend | Matt Arsenault | 2017-01-12 | 1 | -0/+14
  llvm-svn: 291777
* AMDGPU: Fix sub_oneuse being marked commutative | Matt Arsenault | 2017-01-12 | 1 | -1/+2
  llvm-svn: 291748
* AMDGPU: Fold fneg into fma or fmad | Matt Arsenault | 2017-01-12 | 1 | -0/+24
  Patch mostly by Fiona Glaser
  llvm-svn: 291733
* AMDGPU: Fold fneg into fmul | Matt Arsenault | 2017-01-12 | 1 | -0/+17
  Patch mostly by Fiona Glaser
  llvm-svn: 291732
* AMDGPU: Fold fneg into fadd | Matt Arsenault | 2017-01-12 | 2 | -0/+61
  Patch mostly by Fiona Glaser
  llvm-svn: 291731
* AMDGPU: Pull fneg/fabs out of a select | Matt Arsenault | 2017-01-11 | 1 | -0/+74
  Allows better source modifier usage.
  llvm-svn: 291729
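The following check is a minimal sketch in plain Python floats rather than SelectionDAG nodes. It confirms that hoisting fneg or fabs out of a select leaves the result bit-for-bit unchanged, including for -0.0 and NaN. After hoisting, the single fneg/fabs can become a source modifier on whatever consumes the select. The functions are illustrative and are not the AMDGPU combine itself.

```python
import math
import struct


def bits(x):
    """Raw IEEE-754 double bit pattern, so signs of -0.0 and NaN compare exactly."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def select(cond, t, f):
    return t if cond else f


def before(cond, a, b):
    # select(cond, fneg(a), fneg(b)): two separate negations
    return select(cond, -a, -b)


def after(cond, a, b):
    # fneg(select(cond, a, b)): one negation, usable as a source modifier
    return -select(cond, a, b)


def before_abs(cond, a, b):
    return select(cond, math.fabs(a), math.fabs(b))


def after_abs(cond, a, b):
    return math.fabs(select(cond, a, b))
```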
* AMDGPU: Fix shrinking of addc/subb. | Matt Arsenault | 2017-01-11 | 1 | -7/+25
  To shrink to VOP2, the input carry must also be VCC.
  llvm-svn: 291720
* AMDGPU: Fix sext_inreg for i1 in i16 | Matt Arsenault | 2017-01-11 | 1 | -0/+5
  This produces worse code when i16 is legal, mostly due to combines getting confused by conversions inserted for uniform 16-bit operations.
  llvm-svn: 291717
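For reference, here is a sketch of what a sign_extend_inreg from i1 computes on a 16-bit value. Bit 0 is copied into every bit, so the result is either 0 or 0xFFFF (-1). The shift-pair form is one common expansion. This does not reproduce the patterns the commit actually added.

```python
def to_signed16(v):
    """Interpret the low 16 bits of v as a two's-complement i16."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def sext_inreg_i1(x):
    """sext_inreg(x, i1) on an i16: replicate bit 0 into all 16 bits."""
    return (-(x & 1)) & 0xFFFF


def sext_inreg_i1_shifts(x):
    """Same result via shl 15 followed by an arithmetic shr 15."""
    return (to_signed16(x << 15) >> 15) & 0xFFFF
```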
* AMDGPU: Fix breaking VOP3 v_add_i32s | Matt Arsenault | 2017-01-11 | 1 | -1/+11
  This was shrinking the instruction even though the carry output register was a virtual register not known to be VCC.
  llvm-svn: 291716
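The addc/subb and v_add_i32 shrinking fixes share one rule. The 32-bit VOP2 encodings of these carry instructions implicitly write VCC, and addc/subb also implicitly read it. A VOP3 form can therefore only be shrunk when its explicit carry operands are already VCC. The following is a hypothetical model of that check: the `CarryInst` class and the register spellings are illustrative and are not LLVM's SIShrinkInstructions code.

```python
from dataclasses import dataclass
from typing import Optional

VCC = "vcc"


@dataclass
class CarryInst:
    opcode: str
    carry_out: str                  # explicit carry destination of the VOP3 form
    carry_in: Optional[str] = None  # only addc/subb consume an incoming carry


def can_shrink_to_vop2(inst: CarryInst) -> bool:
    # VOP2 writes the carry to VCC implicitly; a virtual register that is
    # not known to be VCC cannot simply be dropped.
    if inst.carry_out != VCC:
        return False
    # VOP2 addc/subb also read the incoming carry from VCC implicitly.
    if inst.carry_in is not None and inst.carry_in != VCC:
        return False
    return True
```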
* AMDGPU: Fix folding immediates into mac src2 | Matt Arsenault | 2017-01-11 | 1 | -2/+30
  Whether the fold is legal needs to be checked against the instruction the mac will be replaced with.
  llvm-svn: 291711
* [AMDGPU] Assembler: SDWA/DPP should not accept scalar registers and immediate operands | Sam Kolton | 2017-01-11 | 5 | -39/+133
  Reviewers: artem.tamazov, nhaustov, vpykhtin, tstellarAMD
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D28157
  llvm-svn: 291668
* [X86] updating TTI costs for arithmetic instructions on X86/SLM arch. | Mohammed Agabaria | 2017-01-11 | 2 | -2/+3
  Updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. Also adds a special optimization case that replaces pmulld with a pmullw/pmulhw/pshuf sequence when the real operands' bit width is <= 16.
  Differential Revision: https://reviews.llvm.org/D28104
  llvm-svn: 291657
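This per-lane arithmetic check illustrates the pmulld special case. When both operands fit in signed 16 bits, the full 32-bit product equals pmulhw's high half shifted up and combined with pmullw's low half. The shuffle/interleave step is modeled here as a plain shift-and-or on scalars. This is not the X86 lowering code, and unsigned operands would need pmulhuw instead of pmulhw.

```python
def to_s16(v):
    """Interpret the low 16 bits of v as a signed 16-bit lane."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def pmullw(a, b):
    """Low 16 bits of the signed 16x16 product (one lane)."""
    return (to_s16(a) * to_s16(b)) & 0xFFFF


def pmulhw(a, b):
    """High 16 bits of the signed 16x16 product (one lane)."""
    return ((to_s16(a) * to_s16(b)) >> 16) & 0xFFFF


def pmulld_via_16bit(a, b):
    """32-bit product rebuilt from the two halves, as the interleave step would."""
    combined = (pmulhw(a, b) << 16) | pmullw(a, b)
    # Reinterpret the 32-bit pattern as a signed value.
    return combined - (1 << 32) if combined & 0x80000000 else combined
```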
* AMDGPU/EG,CM: Add fp16 conversion instructions | Jan Vesely | 2017-01-11 | 1 | -1/+3
  Differential Revision: https://reviews.llvm.org/D28164
  llvm-svn: 291622
* AMDGPU: Constant fold when immediate is materialized | Matt Arsenault | 2017-01-10 | 1 | -141/+228
  In future commits these patterns will appear after moveToVALU changes.
  llvm-svn: 291615
* AMDGPU: Add tests for HasMultipleConditionRegisters | Matt Arsenault | 2017-01-10 | 1 | -0/+7
  This was enabled without many specific tests or the comment.
  llvm-svn: 291586
* AMDGPU: Add Assert[SZ]Ext during argument load creation | Matt Arsenault | 2017-01-09 | 2 | -13/+17
  For i16 zeroext arguments when i16 was a legal type, the known bits information from the truncate was lost. Insert a zeroext so the known bits optimizations work with the 32-bit loads. Fixes code quality regressions vs. SI in min.ll test.
  llvm-svn: 291461
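This simplified known-bits model shows why the assertion matters. A 32-bit load of an i16 zeroext argument carries no information about its upper half unless an AssertZext records it. Without that fact, a masking AND (for example, from a truncate/zext pair) cannot be removed. The helpers are hypothetical and far simpler than SelectionDAG's known-bits analysis.

```python
def known_zero_from_assert_zext(from_bits, width=32):
    """Bits that an AssertZext from i<from_bits> guarantees are zero in a <width>-bit value."""
    all_ones = (1 << width) - 1
    return all_ones & ~((1 << from_bits) - 1)


def and_is_redundant(mask, known_zero, width=32):
    """x & mask == x whenever every bit the mask clears is already known zero."""
    cleared = ((1 << width) - 1) & ~mask
    return (cleared & ~known_zero) == 0
```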
* Reapply r291025 ("AMDGPU: Remove unnecessary intermediate vector") | Matt Arsenault | 2017-01-09 | 1 | -19/+33
  llvm-svn: 291460
* AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes | Jan Vesely | 2017-01-06 | 5 | -159/+159
  This will make the transition to SCRATCH_MEMORY easier.
  Differential Revision: https://reviews.llvm.org/D24746
  llvm-svn: 291279
* [AMDGPU] Remove extra semicolon. NFC | Konstantin Zhuravlyov | 2017-01-06 | 1 | -1/+1
  llvm-svn: 291246
* [AMDGPU] Do not emit .AMDGPU.config section for amdhsa | Konstantin Zhuravlyov | 2017-01-06 | 1 | -4/+6
  Differential Revision: https://reviews.llvm.org/D27732
  llvm-svn: 291245
* Revert "Reapply r291025 ("AMDGPU: Remove unnecessary intermediate vector")" | Evgeniy Stepanov | 2017-01-05 | 1 | -33/+19
  Summary: This reverts commit r291144. It breaks build bots:
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/3270
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/2058
  lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:1638:12: error: could not convert ‘(const unsigned int*)(& Variants)’ from ‘const unsigned int*’ to ‘llvm::ArrayRef<unsigned int>’
     return Variants;
  Reviewers: eugenis, tstellarAMD
  Patch by Alex Shlyapnikov.
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28372
  llvm-svn: 291168
* Reapply r291025 ("AMDGPU: Remove unnecessary intermediate vector") | Matt Arsenault | 2017-01-05 | 1 | -19/+33
  Arrays are supposed to be static const.
  llvm-svn: 291144
* Revert r291025 ("AMDGPU: Remove unnecessary intermediate vector") | Richard Smith | 2017-01-05 | 1 | -22/+18
  This caused buildbot failures due to returning ArrayRefs referencing local (temporary) objects.
  llvm-svn: 291067
* AMDGPU: Remove unnecessary intermediate vector | Matt Arsenault | 2017-01-04 | 1 | -18/+22
  llvm-svn: 291025
* AMDGPU/SI: Implement sendmsghalt intrinsic | Jan Vesely | 2017-01-04 | 6 | -4/+21
  v2: expose using amdgcn prefix
  Differential Revision: https://reviews.llvm.org/D23511
  llvm-svn: 290977
* [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directive | Artem Tamazov | 2016-12-29 | 1 | -12/+17
  Among other things, this allows using the predefined .option.machine_version_major/minor/stepping symbols in the directive. The relevant test was expanded at the same time (and the file renamed for clarity).
  Differential Revision: https://reviews.llvm.org/D28140
  llvm-svn: 290710
* [AMDGPU][llvm-mc] Predefined symbols to access register counts (.kernel.{v|s}gpr_count) | Artem Tamazov | 2016-12-27 | 1 | -7/+56
  The feature allows for conditional assembly, filling the entries of .amd_kernel_code_t, etc. Symbols are defined with value 0 at the beginning of each kernel scope. After each register usage, the respective symbol is set to:
    value = max( value, ( register index + 1 ) )
  Thus, at the end of the scope the value represents a count of used registers. Kernel scopes begin at a .amdgpu_hsa_kernel directive and end at the next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also a dummy scope that extends from the beginning of the source file until the first .amdgpu_hsa_kernel. Test added.
  Differential Revision: https://reviews.llvm.org/D27859
  llvm-svn: 290608
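This is a small model of the counting rule, not the AMDGPUAsmParser implementation. It assumes only plain operands (v7, s3) and ranges (v[0:3], s[4:7]) and ignores special registers such as vcc. The dictionary stands in for the .kernel.vgpr_count/.kernel.sgpr_count symbols, and the key `None` represents the dummy scope before the first kernel.

```python
import re

# Plain register operands (v7, s3) or ranges (v[0:3], s[4:7]).
REG = re.compile(r"\b([vs])(?:(\d+)|\[(\d+):(\d+)\])")


def gpr_counts(source):
    """Return {scope: {"v": count, "s": count}} using value = max(value, index + 1)."""
    scope = None  # dummy scope before the first .amdgpu_hsa_kernel
    counts = {scope: {"v": 0, "s": 0}}
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(".amdgpu_hsa_kernel"):
            scope = stripped.split()[1]
            counts[scope] = {"v": 0, "s": 0}  # symbols restart at 0
            continue
        for kind, single, _low, high in REG.findall(line):
            index = int(single) if single else int(high)  # a range uses its top index
            counts[scope][kind] = max(counts[scope][kind], index + 1)
    return counts
```

For example, a kernel that touches v3 and s[4:7] ends its scope with a VGPR count of 4 and an SGPR count of 8.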
* [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructions | Sam Kolton | 2016-12-27 | 3 | -6/+37
  Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D28051
  llvm-svn: 290599
* AMDGPU: split ret/noret patterns for global atomics | Jan Vesely | 2016-12-23 | 3 | -22/+52
  Differential Revision: https://reviews.llvm.org/D27989
  llvm-svn: 290435
* Enable '-Wstring-conversion' and fix some bad asserts that it helped find. | Chandler Carruth | 2016-12-23 | 1 | -1/+1
  Notable is the assert in NewGVN, which had no effect because of the bug.
  llvm-svn: 290400
* AMDGPU: Invert cmp + select with constant | Matt Arsenault | 2016-12-22 | 1 | -0/+19
  Canonicalize a select with a constant so the constant is on the false side. This enables more instruction shrinking opportunities, since an inline immediate can be used for the false side of v_cndmask_b32_e32. This usually seems to be better but causes code size regressions in some tests.
  llvm-svn: 290372
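The following expression-level sketch shows the canonicalization. When the true operand of a select is a constant and the false operand is not, the compare is inverted and the operands are swapped, so the constant lands on the false side, where v_cndmask_b32_e32 can take an inline immediate. The tuple encoding `((cc, a, b), true_value, false_value)` is hypothetical, and only integer condition codes are handled; floating-point compares would also need to account for unordered (NaN) cases.

```python
import operator

# Integer condition codes and their logical inverses.
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}
COMPARE = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
           "ge": operator.ge, "gt": operator.gt, "le": operator.le}


def canonicalize(sel):
    """select((cc, a, b), C, x) -> select((!cc, a, b), x, C) for a constant C."""
    (cc, a, b), t, f = sel
    if isinstance(t, int) and not isinstance(f, int):
        return (INVERSE[cc], a, b), f, t
    return sel


def evaluate(sel, env):
    """Evaluate a select; strings are variables looked up in env, ints are constants."""
    (cc, a, b), t, f = sel

    def value(x):
        return env[x] if isinstance(x, str) else x

    return value(t) if COMPARE[cc](value(a), value(b)) else value(f)
```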
* AMDGPU: Use i16 for i16 shift amount | Matt Arsenault | 2016-12-22 | 2 | -8/+10
  llvm-svn: 290351
* AMDGPU: Fix missing 16-bit cmpx instructions | Matt Arsenault | 2016-12-22 | 1 | -0/+39
  llvm-svn: 290349
* AMDGPU: Use i16 comparison instructions | Matt Arsenault | 2016-12-22 | 2 | -5/+43
  llvm-svn: 290348
* AMDGPU: Fixed '!NodePtr->isKnownSentinel()' assert | Matt Arsenault | 2016-12-22 | 1 | -17/+4
  Caused by dereferencing the end iterator when trying to const-cast the iterator.
  Patch by Martin Sherburn
  llvm-svn: 290347
* [AMDGPU] Add pseudo SDWA instructions | Sam Kolton | 2016-12-22 | 5 | -85/+159
  Summary: This is needed for later SDWA support in CodeGen.
  Reviewers: vpykhtin, tstellarAMD
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D27412
  llvm-svn: 290338
* [AMDGPU] Disassembler: fix for disassembling v_mac_f32/16_dpp/sdwa | Sam Kolton | 2016-12-22 | 4 | -5/+26
  Summary: The real instruction should copy constraints from the pseudo instruction. This allows the auto-generated disassembler to correctly process tied operands.
  Reviewers: nhaustov, vpykhtin, tstellarAMD
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D27847
  llvm-svn: 290336