summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Implemented fma cost analysisStanislav Mekhanoshin2019-12-181-0/+10
| | | | Differential Revision: https://reviews.llvm.org/D71676
* [ARM] Teach the Arm cost model that a Shift can be folded into other ↵David Green2019-12-091-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instruction that the shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966
* AMDGPU: Refactor treatment of denormal modeMatt Arsenault2019-11-191-2/+4
| | | | | | | | | | | Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
* [AMDGPU] Tune inlining parameters for AMDGPU target (part 2)dfukalov2019-11-191-1/+1
| | | | | | | | | | | | | | | | | Summary: Most of IR instructions got better code size estimations after commit 47a5c36b. So default parameters values should be updated to improve inlining and unrolling for the target. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70391
* [AMDGPU] Improve code size cost modelDaniil Fukalov2019-10-171-1/+2
| | | | | | | | | | | | | | | | | | | Summary: Added estimation for zero size insertelement, extractelement and llvm.fabs operators. Updated inline/unroll parameters default values. Reviewers: rampitec, arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68881 llvm-svn: 375109
* [System Model] [TTI] Define AMDGPUTTIImpl::getST and AMDGPUTTIImpl::getTLIVitaly Buka2019-10-091-2/+10
| | | | | | To fix "infinite recursion" warning. llvm-svn: 374222
* InferAddressSpaces: Move target intrinsic handling to TTIMatt Arsenault2019-08-141-0/+5
| | | | | | | | I'm planning on handling intrinsics that will benefit from checking the address space enums. Don't bother moving the address collection for now, since those won't need th enums. llvm-svn: 368895
* [AMDGPU] Tune inlining parameters for AMDGPU targetDaniil Fukalov2019-07-171-1/+3
| | | | | | | | | | | | | | | | | | | Summary: Since the target has no significant advantage of vectorization, vector instructions bous threshold bonus should be optional. amdgpu-inline-arg-alloca-cost parameter default value and the target InliningThresholdMultiplier value tuned then respectively. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64642 llvm-svn: 366348
* AMDGPU: Ignore subtarget for InferAddressSpacesMatt Arsenault2019-06-171-2/+1
| | | | | | | Even if the target doesn't have flat instructions, addrspace(0) is still flat. It just happens to not work. llvm-svn: 363561
* AMDGPU: Assume ECC is enabled by default if supportedMatt Arsenault2019-04-031-0/+4
| | | | | | | | | | The test should really be checking for the property directly in the code object headers, but there are problems with this. I don't see this directly represented in the text form, and for the binary emission this is depending on a function level subtarget feature to emit a global flag. llvm-svn: 357558
* AMDGPU: Remove debugger related subtarget featuresMatt Arsenault2019-02-211-2/+0
| | | | | | As far as I know these aren't needed anymore. llvm-svn: 354634
* AMDGPU: Ignore CodeObjectV3 when inliningMatt Arsenault2019-02-121-0/+1
| | | | | | | | | | This was inhibiting inlining of library functions when clang was invoking the inliner directly. This is covering a bit of a mess with subtarget feature handling, and this shouldn't be a subtarget feature. The behavior is different depending on whether you are using a -mattr flag in clang, or llc, opt. llvm-svn: 353899
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* AMDGPU: Remove remnants of old address space mappingMatt Arsenault2018-08-311-1/+1
| | | | llvm-svn: 341165
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-111-3/+3
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* AMDGPU: Separate R600 and GCN TableGen filesTom Stellard2018-06-281-11/+6
| | | | | | | | | | | | | | | | | | | | | Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942
* AMDGPU: Remove ability to reserve VGPRs for debuggerKonstantin Zhuravlyov2018-06-211-1/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D48234 llvm-svn: 335288
* AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTITom Stellard2018-05-301-1/+65
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47359 llvm-svn: 333605
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headersTom Stellard2018-05-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930
* [AMDGPU] Support horizontal vectorization of min/max.Farhana Aleen2018-05-091-0/+3
| | | | | | | | | | | | Author: FarhanaAleen Reviewed By: rampitec Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D46604 llvm-svn: 331920
* [AMDGPU] Support horizontal vectorization.Farhana Aleen2018-05-011-0/+4
| | | | | | | | | | | | Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46213 llvm-svn: 331313
* [AMDGPU] Increased vector length for global/constant loads.Farhana Aleen2018-03-071-0/+6
| | | | | | | | | | | | | | | Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44179 llvm-svn: 326910
* Revert "[AMDGPU] Widened vector length for global/constant address space."Farhana Aleen2018-03-071-6/+0
| | | | | | This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03. llvm-svn: 326907
* [AMDGPU] Widened vector length for global/constant address space.Farhana Aleen2018-03-071-0/+6
| | | | llvm-svn: 326904
* Revert "[AMDGPU] Increased vector length for global/constant loads."Konstantin Zhuravlyov2018-02-201-6/+0
| | | | | | | | | | https://reviews.llvm.org/rL325518 It breaks following OpenCL conformance tests: - Basic - parameter_types - Basic - vload_private llvm-svn: 325643
* [AMDGPU] Increased vector length for global/constant loads.Mark Searles2018-02-191-0/+6
| | | | | | | | | | | | | | Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D43275 llvm-svn: 325518
* LSR: Check more intrinsic pointer operandsMatt Arsenault2017-12-111-0/+2
| | | | llvm-svn: 320424
* [AMDGPU] Port of HSAIL inlinerStanislav Mekhanoshin2017-09-201-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D36849 llvm-svn: 313714
* [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-08-081-8/+20
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 310429
* AMDGPU: Use a custom areInlineCompatibleMatt Arsenault2017-08-071-0/+29
| | | | | | | Fixes not inlining OpenCL library functions on AMDGPU, which don't have an explicitly set target-cpu. llvm-svn: 310269
* [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.Geoff Berry2017-06-281-1/+2
| | | | | | | | | | Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554
* AMDGPU: Allow vectorization of packed typesMatt Arsenault2017-06-201-2/+4
| | | | llvm-svn: 305844
* DivergencyAnalysis patch for reviewAlexander Timofeev2017-06-151-0/+1
| | | | llvm-svn: 305494
* Const correctness for TTI::getRegisterBitWidthDaniel Neilson2017-06-121-1/+1
| | | | | | | | | | | | | | Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation. Reviewers: chandlerc, rnk, reames Reviewed By: reames Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D33903 llvm-svn: 305189
* AMDGPU: Make some packed shuffles freeMatt Arsenault2017-05-101-0/+3
| | | | | | | VOP3P instructions can encode access to either half of the register. llvm-svn: 302730
* [AMDGPU] Get address space mapping by target triple environmentYaxun Liu2017-03-271-1/+1
| | | | | | | | | | | | | | | | | | As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
* LoadStoreVectorizer: Split even sized illegal chains properlyMatt Arsenault2017-02-231-0/+11
| | | | | | | | | | | | | | | | | | | | Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933
* AMDGPU: Fix warningMatt Arsenault2017-01-311-1/+2
| | | | llvm-svn: 293717
* AMDGPU: Implement hook for InferAddressSpacesMatt Arsenault2017-01-311-1/+11
| | | | | | | | | | For now just port some of the existing NVPTX tests and from an old HSAIL optimization pass which approximately did the same thing. Don't enable the pass yet until more testing is done. llvm-svn: 293580
* [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.Mohammed Agabaria2017-01-111-1/+2
| | | | | | | | | | | | updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657
* Do a sweep over move ctors and remove those that are identical to the default.Benjamin Kramer2016-10-201-7/+0
| | | | | | | | | | All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. llvm-svn: 284721
* Add new target hooks for LoadStoreVectorizerVolkan Keles2016-10-031-1/+1
| | | | | | | | | | | | Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D24727 llvm-svn: 283099
* [TTI] The cost model should not assume vector casts get completely scalarizedMichael Kuperstein2016-07-061-0/+2
| | | | | | | | | | | | | | | | The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642
* AMDGPU: Implement getLoadStoreVecRegBitWidthMatt Arsenault2016-07-011-0/+1
| | | | llvm-svn: 274312
* AMDGPU: Implement per-function subtargetsMatt Arsenault2016-06-271-3/+4
| | | | llvm-svn: 273940
* AMDGPU: Other sizes of popcnt are fastMatt Arsenault2016-05-181-1/+1
| | | | | | | We can chain bcnt instructions together, so any width popcnt is pretty fast. llvm-svn: 269950
* AMDGPU: Partially implement getArithmeticInstrCost for FP opsMatt Arsenault2016-03-251-1/+30
| | | | llvm-svn: 264374
* AMDGPU: R600 code splitting cleanupMatt Arsenault2016-03-111-3/+3
| | | | | | | Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final. llvm-svn: 263204
* AMDGPU: Override getCFInstrCostMatt Arsenault2015-12-161-0/+2
| | | | | | The default cost was 0 with the assumption that it is predictable. llvm-svn: 255796
* AMDGPU/SI: Implement AMDGPUTargetTransformInfo::isSourceOfDivergence()Tom Stellard2015-12-151-0/+1
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15476 llvm-svn: 255661
OpenPOWER on IntegriCloud