path: root/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
Commit message (Collapse) | Author | Age | Files | Lines
* AMDGPU: Remove remnants of old address space mapping | Matt Arsenault | 2018-08-31 | 1 | -1/+1
  llvm-svn: 341165
* AMDGPU: Refactor Subtarget classes | Tom Stellard | 2018-07-11 | 1 | -3/+3
  Summary:
  This is a follow-up to r335942.
  - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
  - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
  - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation.
  Reviewers: arsenm, jvesely
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49037
  llvm-svn: 336851
* AMDGPU: Separate R600 and GCN TableGen files | Tom Stellard | 2018-06-28 | 1 | -11/+6
  Summary:
  We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc.
  Reviewers: arsenm, nhaehnle, jvesely
  Reviewed By: arsenm
  Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46365
  llvm-svn: 335942
* AMDGPU: Remove ability to reserve VGPRs for debugger | Konstantin Zhuravlyov | 2018-06-21 | 1 | -1/+0
  Differential Revision: https://reviews.llvm.org/D48234
  llvm-svn: 335288
* AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI | Tom Stellard | 2018-05-30 | 1 | -1/+65
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D47359
  llvm-svn: 333605
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers | Tom Stellard | 2018-05-22 | 1 | -0/+1
  Summary:
  MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction and register definitions, which are huge, so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files.
  I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too.
  Reviewers: arsenm, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46272
  llvm-svn: 332930
* [AMDGPU] Support horizontal vectorization of min/max. | Farhana Aleen | 2018-05-09 | 1 | -0/+3
  Author: FarhanaAleen
  Reviewed By: rampitec
  Subscribers: AMDGPU
  Differential Revision: https://reviews.llvm.org/D46604
  llvm-svn: 331920
* [AMDGPU] Support horizontal vectorization. | Farhana Aleen | 2018-05-01 | 1 | -0/+4
  Author: FarhanaAleen
  Reviewed By: rampitec, arsenm
  Subscribers: llvm-commits, AMDGPU
  Differential Revision: https://reviews.llvm.org/D46213
  llvm-svn: 331313
* [AMDGPU] Increased vector length for global/constant loads. | Farhana Aleen | 2018-03-07 | 1 | -0/+6
  Summary:
  GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.
  Author: FarhanaAleen
  Reviewed By: rampitec
  Subscribers: llvm-commits, AMDGPU
  Differential Revision: https://reviews.llvm.org/D44179
  llvm-svn: 326910
* Revert "[AMDGPU] Widened vector length for global/constant address space." | Farhana Aleen | 2018-03-07 | 1 | -6/+0
  This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03.
  llvm-svn: 326907
* [AMDGPU] Widened vector length for global/constant address space. | Farhana Aleen | 2018-03-07 | 1 | -0/+6
  llvm-svn: 326904
* Revert "[AMDGPU] Increased vector length for global/constant loads." | Konstantin Zhuravlyov | 2018-02-20 | 1 | -6/+0
  https://reviews.llvm.org/rL325518
  It breaks the following OpenCL conformance tests:
  - Basic - parameter_types
  - Basic - vload_private
  llvm-svn: 325643
* [AMDGPU] Increased vector length for global/constant loads. | Mark Searles | 2018-02-19 | 1 | -0/+6
  Summary:
  GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.
  Author: FarhanaAleen
  Reviewed By: rampitec
  Subscribers: llvm-commits, AMDGPU
  Differential Revision: https://reviews.llvm.org/D43275
  llvm-svn: 325518
* LSR: Check more intrinsic pointer operands | Matt Arsenault | 2017-12-11 | 1 | -0/+2
  llvm-svn: 320424
* [AMDGPU] Port of HSAIL inliner | Stanislav Mekhanoshin | 2017-09-20 | 1 | -0/+2
  Differential Revision: https://reviews.llvm.org/D36849
  llvm-svn: 313714
* [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2017-08-08 | 1 | -8/+20
  llvm-svn: 310429
* AMDGPU: Use a custom areInlineCompatible | Matt Arsenault | 2017-08-07 | 1 | -0/+29
  Fixes not inlining OpenCL library functions on AMDGPU, which don't have an explicitly set target-cpu.
  llvm-svn: 310269
* [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI. | Geoff Berry | 2017-06-28 | 1 | -1/+2
  Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
  Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D34531
  llvm-svn: 306554
* AMDGPU: Allow vectorization of packed types | Matt Arsenault | 2017-06-20 | 1 | -2/+4
  llvm-svn: 305844
* DivergencyAnalysis patch for review | Alexander Timofeev | 2017-06-15 | 1 | -0/+1
  llvm-svn: 305494
* Const correctness for TTI::getRegisterBitWidth | Daniel Neilson | 2017-06-12 | 1 | -1/+1
  Summary:
  The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.
  Reviewers: chandlerc, rnk, reames
  Reviewed By: reames
  Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33903
  llvm-svn: 305189
* AMDGPU: Make some packed shuffles free | Matt Arsenault | 2017-05-10 | 1 | -0/+3
  VOP3P instructions can encode access to either half of the register.
  llvm-svn: 302730
* [AMDGPU] Get address space mapping by target triple environment | Yaxun Liu | 2017-03-27 | 1 | -1/+1
  As we introduced the target triple environments amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple.
  The basic idea is to use struct AMDGPUAS to represent address space values. For address space values that do not depend on the target triple, use static const members, so that they don't occupy extra memory space and are equivalent to compile-time constants.
  Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as a member to a pass and created at the beginning of the run* function.
  Differential Revision: https://reviews.llvm.org/D31284
  llvm-svn: 298846
* LoadStoreVectorizer: Split even sized illegal chains properly | Matt Arsenault | 2017-02-23 | 1 | -0/+11
  Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address space accesses that will need to be split up later.
  This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into
    store <2 x i32>
    store i32
    store i32
  rather than
    store <2 x i32>
    store <2 x i32>
  when legal.
  llvm-svn: 295933
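The difference between the two splits described in the entry above can be counted with a small sketch. This is only an illustration and not the LoadStoreVectorizer's actual logic: `storesAfterSplit` and its parameters are hypothetical names, and it assumes the chain is cut into full pieces of the widest legal vector with any leftover elements stored as scalars.

```cpp
// Hypothetical illustration, not LLVM code. Counts the stores emitted when a
// chain of ChainElts elements is cut into vectors of MaxVecElts elements,
// with any leftover elements stored as scalars.
// Precondition (assumed): MaxVecElts > 0.
constexpr unsigned storesAfterSplit(unsigned ChainElts, unsigned MaxVecElts) {
  return ChainElts / MaxVecElts + ChainElts % MaxVecElts;
}

// A <4 x i32> chain with a legal width of 2 elements needs two
// <2 x i32> stores, not the three stores produced before the fix.
static_assert(storesAfterSplit(4, 2) == 2, "even chain splits into two halves");
```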
* AMDGPU: Fix warning | Matt Arsenault | 2017-01-31 | 1 | -1/+2
  llvm-svn: 293717
* AMDGPU: Implement hook for InferAddressSpaces | Matt Arsenault | 2017-01-31 | 1 | -1/+11
  For now, just port some of the existing NVPTX tests and tests from an old HSAIL optimization pass that did approximately the same thing.
  Don't enable the pass until more testing is done.
  llvm-svn: 293580
* [X86] updating TTI costs for arithmetic instructions on X86\SLM arch. | Mohammed Agabaria | 2017-01-11 | 1 | -1/+2
  Updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
  Special optimization case that replaces pmulld with a pmullw\pmulhw\pshuf sequence when the real operand bit width is <= 16.
  Differential Revision: https://reviews.llvm.org/D28104
  llvm-svn: 291657
* Do a sweep over move ctors and remove those that are identical to the default. | Benjamin Kramer | 2016-10-20 | 1 | -7/+0
  All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go.
  No functionality change intended.
  llvm-svn: 284721
* Add new target hooks for LoadStoreVectorizer | Volkan Keles | 2016-10-03 | 1 | -1/+1
  Summary:
  Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains.
  Reviewers: tstellarAMD, arsenm
  Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle
  Differential Revision: https://reviews.llvm.org/D24727
  llvm-svn: 283099
* [TTI] The cost model should not assume vector casts get completely scalarized | Michael Kuperstein | 2016-07-06 | 1 | -0/+2
  The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap.
  Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free.
  Differential Revision: http://reviews.llvm.org/D21251
  llvm-svn: 274642
* AMDGPU: Implement getLoadStoreVecRegBitWidth | Matt Arsenault | 2016-07-01 | 1 | -0/+1
  llvm-svn: 274312
* AMDGPU: Implement per-function subtargets | Matt Arsenault | 2016-06-27 | 1 | -3/+4
  llvm-svn: 273940
* AMDGPU: Other sizes of popcnt are fast | Matt Arsenault | 2016-05-18 | 1 | -1/+1
  We can chain bcnt instructions together, so any width popcnt is pretty fast.
  llvm-svn: 269950
* AMDGPU: Partially implement getArithmeticInstrCost for FP ops | Matt Arsenault | 2016-03-25 | 1 | -1/+30
  llvm-svn: 264374
* AMDGPU: R600 code splitting cleanup | Matt Arsenault | 2016-03-11 | 1 | -3/+3
  Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final.
  llvm-svn: 263204
* AMDGPU: Override getCFInstrCost | Matt Arsenault | 2015-12-16 | 1 | -0/+2
  The default cost was 0 with the assumption that it is predictable.
  llvm-svn: 255796
* AMDGPU/SI: Implement AMDGPUTargetTransformInfo::isSourceOfDivergence() | Tom Stellard | 2015-12-15 | 1 | -0/+1
  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D15476
  llvm-svn: 255661
* AMDGPU: Report extractelement as free in cost model | Matt Arsenault | 2015-12-01 | 1 | -0/+2
  The cost for scalarized operations is computed as N * (scalar operation cost + 1 extractelement + 1 insertelement). This partially fixes inflating the cost of scalarized operations since every operation is scalarized and free. I don't think we want any cost associated with scalarization, but for now insertelement is still counted. I'm not sure if we should pretend that insertelement is also free, or add a way to compute a custom scalarization cost.
  llvm-svn: 254438
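The formula quoted in the entry above can be written out as a minimal sketch. `scalarizedCost` is a hypothetical helper for illustration, not an LLVM API, and the unit costs below are assumed values chosen only to show the effect of reporting extractelement as free.

```cpp
// Hypothetical helper, not part of LLVM: cost of a scalarized vector
// operation as N * (scalar op cost + extractelement + insertelement).
constexpr unsigned scalarizedCost(unsigned NumElts, unsigned ScalarOpCost,
                                  unsigned ExtractCost, unsigned InsertCost) {
  return NumElts * (ScalarOpCost + ExtractCost + InsertCost);
}

// Assumed example: a 4-element operation whose scalar form costs 1.
// With extractelement and insertelement both costing 1:
static_assert(scalarizedCost(4, 1, 1, 1) == 12, "both counted");
// With extractelement reported as free and insertelement still counted:
static_assert(scalarizedCost(4, 1, 0, 1) == 8, "extract free");
```

Under these assumed numbers the estimate drops from 12 to 8; it would fall to 4 if insertelement were also treated as free, which is the open question the message raises.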
* Make TargetTransformInfo keeping a reference to the Module DataLayout | Mehdi Amini | 2015-07-09 | 1 | -14/+3
  DataLayout is no longer optional. It was initialized with or without a DataLayout, and the DataLayout when supplied could have been the one from the TargetMachine.
  Summary:
  This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module.
  Reviewers: echristo
  Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
  Differential Revision: http://reviews.llvm.org/D11021
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 241774
* R600 -> AMDGPU rename | Tom Stellard | 2015-06-13 | 1 | -0/+78
  llvm-svn: 239657