path: root/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
Commit message (author, date; files changed, lines removed/added)
* Reapply "AMDGPU: Add 32-bit constant address space" (Matt Arsenault, 2018-02-09; 1 file, -1/+2)
  This reverts r324494 and reapplies r324487.
  llvm-svn: 324747
* Revert "AMDGPU: Add 32-bit constant address space" (Rafael Espindola, 2018-02-07; 1 file, -2/+1)
  This reverts commit r324487. It broke clang tests.
  llvm-svn: 324494
* AMDGPU: Add 32-bit constant address space (Marek Olsak, 2018-02-07; 1 file, -1/+2)
  Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period.
  Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string.
  Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden.
  Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all use cases we need for Mesa.
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41651
  llvm-svn: 324487
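In an LLVM data layout string, pointer entries have the form `p[n]:<size>:<abi>...`, so `-p6:32:32` declares 32-bit pointers with 32-bit ABI alignment in address space 6. The C++17 sketch below extracts the pointer width for one address space. The function name and the example layout string are made up for illustration; this is not LLVM's DataLayout parser, it validates nothing, and it returns `std::nullopt` for an unlisted address space instead of modeling LLVM's defaults.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the pointer width in bits declared for
// `addrSpace` in a data layout string, or std::nullopt if no "p[n]:..." entry
// matches. Only the <size> field is read.
std::optional<unsigned> pointerSizeInBits(const std::string &layout,
                                          unsigned addrSpace) {
  std::stringstream stream(layout);
  std::string spec;
  // Individual specifications are separated by '-'.
  while (std::getline(stream, spec, '-')) {
    if (spec.empty() || spec[0] != 'p')
      continue;  // not a pointer spec (uppercase "P" means something else)
    const std::size_t colon = spec.find(':');
    if (colon == std::string::npos)
      continue;
    // "p:..." with no number describes address space 0.
    const std::string asText = spec.substr(1, colon - 1);
    const unsigned as =
        asText.empty() ? 0u : static_cast<unsigned>(std::stoul(asText));
    if (as != addrSpace)
      continue;
    const std::size_t next = spec.find(':', colon + 1);
    const std::string sizeText =
        next == std::string::npos ? spec.substr(colon + 1)
                                  : spec.substr(colon + 1, next - colon - 1);
    return static_cast<unsigned>(std::stoul(sizeText));
  }
  return std::nullopt;
}
```

For example, with the made-up layout `"e-p:64:64-p6:32:32-i64:64"`, address space 6 reports 32 bits while address space 0 reports 64.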
* [IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag (Sanjay Patel, 2017-11-06; 1 file, -1/+1)
  As discussed on llvm-dev:
  http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
  and again more recently:
  http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html
  ...this is a step in cleaning up our fast-math-flags implementation in IR to better match the capabilities of both clang's user-visible flags and the backend's flags for SDNode.
  As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation - 'AllowReassoc'.
  We're also adding a bit to allow approximations for library functions called 'ApproxFunc' (this was initially proposed as 'libm' or similar).
  ...and we're out of bits. 7 bits ought to be enough for anyone, right? :)
  FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day.
  We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
    %f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
  ...made me think we want to keep the shortcut synonym.
  Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. This statement:
  "Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR."
  ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
  ...provides the flexibility we want to make this change without requiring a new IR version. I.e., we're not loosening the FP strictness of existing IR. At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.
  Note: an inter-dependent clang commit to use the new API name should closely follow this commit.
  Differential Revision: https://reviews.llvm.org/D39304
  llvm-svn: 317488
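The message describes seven per-instruction flags (the keywords in the `%f.fast` line) and keeps `fast` as a shorthand for all of them, while `reassoc` becomes one flag among seven rather than an umbrella. A minimal C++ bitmask sketch of that relationship follows; the bit positions and the names in namespace `fmf` are assumptions for illustration, not LLVM's actual FastMathFlags layout.

```cpp
#include <cstdint>

// Hypothetical bit assignment for the seven IR fast-math flags.
namespace fmf {
constexpr std::uint8_t Reassoc = 1u << 0;   // reassoc
constexpr std::uint8_t NoNaNs = 1u << 1;    // nnan
constexpr std::uint8_t NoInfs = 1u << 2;    // ninf
constexpr std::uint8_t NoSZ = 1u << 3;      // nsz
constexpr std::uint8_t ARcp = 1u << 4;      // arcp
constexpr std::uint8_t Contract = 1u << 5;  // contract
constexpr std::uint8_t Afn = 1u << 6;       // afn
// 'fast' is a shorthand for "all seven flags set"; seven flags fill seven bits.
constexpr std::uint8_t Fast =
    Reassoc | NoNaNs | NoInfs | NoSZ | ARcp | Contract | Afn;
}  // namespace fmf

// An instruction counts as 'fast' only when every individual flag is present.
constexpr bool isFast(std::uint8_t flags) {
  return (flags & fmf::Fast) == fmf::Fast;
}

// Reassociation is now queried on its own instead of implying everything.
constexpr bool allowsReassoc(std::uint8_t flags) {
  return (flags & fmf::Reassoc) != 0;
}
```

Under this model an instruction carrying only `reassoc` allows reassociation but is no longer `fast`, which matches the note that some previously 'fast' code may stop being recognized as such.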
* AMDGPU : Widen extending scalar loads to 32-bits. (Wei Ding, 2017-07-26; 1 file, -0/+45)
  Differential Revision: http://reviews.llvm.org/D35146
  llvm-svn: 309178
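The commit message gives no details, so the following only illustrates what widening an extending load means in general: an 8-bit load followed by sign or zero extension can be replaced by a 32-bit load of the containing word, a truncate to the low byte, and a re-extension, provided all four bytes may be read. The C++ sketch assumes 4-byte alignment and little-endian byte order (AMDGPU is little-endian); all function names are hypothetical.

```cpp
#include <cstdint>

// Reads four bytes as a little-endian 32-bit word, independent of host endianness.
std::uint32_t loadWordLE(const std::uint8_t *p) {
  return std::uint32_t{p[0]} | (std::uint32_t{p[1]} << 8) |
         (std::uint32_t{p[2]} << 16) | (std::uint32_t{p[3]} << 24);
}

// Original form: a narrow i8 load, then sign or zero extension to 32 bits.
std::int32_t sextLoadI8(const std::uint8_t *p) {
  return static_cast<std::int8_t>(p[0]);
}
std::uint32_t zextLoadI8(const std::uint8_t *p) { return p[0]; }

// Widened form: one 32-bit load, truncate to the low byte, then re-extend.
// Assumes p is 4-byte aligned so that reading all four bytes is safe.
std::int32_t sextLoadI8Widened(const std::uint8_t *p) {
  return static_cast<std::int8_t>(
      static_cast<std::uint8_t>(loadWordLE(p) & 0xFFu));
}
std::uint32_t zextLoadI8Widened(const std::uint8_t *p) {
  return loadWordLE(p) & 0xFFu;
}
```

The same reasoning applies to i16 loads with a 0xFFFF mask.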
* [AMDGPU] Always use rcp + mul with fast math (Stanislav Mekhanoshin, 2017-07-06; 1 file, -1/+3)
  Regardless of relaxation options such as -cl-fast-relaxed-math, we are producing rather long code for fdiv via the amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace an fdiv with 2.5 ulp metadata and does not handle denormals, so it is believed to be fast.
  An fdiv instruction can also have the fast math flag, either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both the metadata and the fast flag:
    %div = fdiv fast float %v, %0, !fpmath !12
    !12 = !{float 2.500000e+00}
  The current implementation ignores the fast flag and favors the metadata. An instruction with just the fast flag would be lowered to the fastest rcp + mul, but that never happens in practice because of the combined clang and backend behavior described above.
  This change allows an "fdiv fast" to always be lowered as rcp + mul.
  Differential Revision: https://reviews.llvm.org/D34844
  llvm-svn: 307308
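The rcp + mul form computes a / b as a * (1 / b), replacing one correctly rounded division with two rounded operations. The C++ sketch below, with hypothetical function names, compares the two forms on the host CPU. It uses a correctly rounded `1.0f / b` as the reciprocal; a hardware reciprocal instruction need not be correctly rounded, so real GPU results can differ further.

```cpp
#include <cstdint>
#include <cstring>

// Correctly rounded division, as a precise fdiv would produce.
float divideExact(float a, float b) { return a / b; }

// The "rcp + mul" shape: reciprocal first, then a multiply (two roundings).
float divideRcpMul(float a, float b) {
  const float r = 1.0f / b;
  return a * r;
}

// Distance in units in the last place between two positive, finite floats.
// For such floats the IEEE-754 bit patterns are ordered like the values.
std::uint32_t ulpDistance(float x, float y) {
  std::uint32_t bx, by;
  std::memcpy(&bx, &x, sizeof bx);
  std::memcpy(&by, &y, sizeof by);
  return bx > by ? bx - by : by - bx;
}
```

With a power-of-two divisor the reciprocal is exact, so both forms agree; in general they can differ by a few ulps, which is why the commit applies this lowering only when the fast flag permits it.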
* Sort the remaining #include lines in include/... and lib/.... (Chandler Carruth, 2017-06-06; 1 file, -2/+2)
  I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.
  I've reverted a number of files where the results of sorting includes aren't healthy: either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or places where we have particular formatting around #include lines that I didn't want to disturb in this patch.
  This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.
  Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).
  llvm-svn: 304787
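As a rough picture of what mechanical include sorting does, here is a simplified C++ sketch (the function name is hypothetical) that sorts each contiguous run of `#include` lines and leaves all other lines in place, so blank lines act as block separators. It is not clang-format's algorithm: clang-format also applies configurable include categories and main-header rules, which the sketch ignores.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sorts each run of consecutive "#include" lines lexicographically and leaves
// every other line (blank lines, comments, code) where it was.
std::vector<std::string> sortIncludeBlocks(std::vector<std::string> lines) {
  auto isInclude = [](const std::string &line) {
    return line.rfind("#include", 0) == 0;  // line starts with "#include"
  };
  auto it = lines.begin();
  while (it != lines.end()) {
    if (!isInclude(*it)) {
      ++it;
      continue;
    }
    // [it, blockEnd) is one contiguous block of include lines.
    auto blockEnd = std::find_if_not(it, lines.end(), isInclude);
    std::sort(it, blockEnd);
    it = blockEnd;
  }
  return lines;
}
```

Because sorting happens only inside a block, rerunning it is a no-op, which is what makes such a change safe to regenerate when resolving conflicts.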
* [LegacyPassManager] Remove TargetMachine constructors (Francis Visoiu Mistrih, 2017-05-18; 1 file, -10/+14)
  This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency.
  The patterns replaced here are:
    * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`.
    * Passes not handling a null TargetMachine call `addRequired<TargetPassConfig>` and `getAnalysis<TargetPassConfig>`.
    * MachineFunctionPasses now use MF.getTarget().
    * Remove all the TargetMachine constructors.
    * Remove INITIALIZE_TM_PASS.
  This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults.
  Related to PR30324.
  Differential Revision: https://reviews.llvm.org/D33222
  llvm-svn: 303360
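The first pattern in that list, a pass that calls `getAnalysisIfAvailable<TargetPassConfig>` and copes with a null result, can be modeled without LLVM. All types and functions below are hypothetical stubs standing in for LLVM's classes; only the control flow (bail out instead of dereferencing a missing TargetMachine) is meant to match the description.

```cpp
#include <string>

// Hypothetical stand-ins, not LLVM classes.
struct TargetMachineStub {
  std::string triple;
};
struct TargetPassConfigStub {
  const TargetMachineStub *tm;
};

// Stand-in for a pass manager that may or may not provide the analysis.
struct AnalysisProviderStub {
  const TargetPassConfigStub *passConfig = nullptr;
  const TargetPassConfigStub *getIfAvailable() const { return passConfig; }
};

// A pass that tolerates a missing TargetMachine: it reports "no change"
// instead of dereferencing a null pointer, the failure mode described for
// StackProtector above.
bool runOnFunctionSketch(const AnalysisProviderStub &provider,
                         std::string &seenTriple) {
  const TargetPassConfigStub *tpc = provider.getIfAvailable();
  if (!tpc)
    return false;  // analysis unavailable: leave the function unchanged
  seenTriple = tpc->tm->triple;
  return true;
}
```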
* AMDGPU: Cleanup control flow intrinsics (Matt Arsenault, 2017-03-17; 1 file, -4/+1)
  Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list.
  It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those.
  llvm-svn: 298119
* AMDGPU: Support v2i16/v2f16 packed operations (Matt Arsenault, 2017-02-27; 1 file, -5/+13)
  llvm-svn: 296396
* AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops (Matt Arsenault, 2017-02-01; 1 file, -18/+41)
  These were simply preserving the flags of the original operation, which was too conservative in most cases and incorrect for mul.
  nsw/nuw may be needed for some combines to clean up messes when intermediate sext_inregs are introduced later.
  Tested valid combinations with Alive.
  llvm-svn: 293776
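A hand check makes the mul problem concrete. Assume (this is an assumption of the sketch, not a statement from the commit) that add, sub and mul are promoted by zero-extending both i16 operands, so each operand ranges over [0, 65535]. Because these operations are linear or bilinear, the extreme results over that range occur at its corners, so checking four corners shows which flags hold for every input: add gets both nsw and nuw, sub only nsw, and mul only nuw, since 65535 * 65535 = 4294836225 exceeds INT32_MAX. This is not a substitute for the Alive proofs the commit mentions, and it does not model refinements that use the original instruction's own flags. All names are hypothetical.

```cpp
#include <cstdint>

enum class Op { Add, Sub, Mul };

// Exact result on 64-bit integers (cannot overflow for 16-bit inputs).
std::int64_t applyExact(Op op, std::int64_t a, std::int64_t b) {
  switch (op) {
  case Op::Add: return a + b;
  case Op::Sub: return a - b;
  case Op::Mul: return a * b;
  }
  return 0;
}

bool fitsSigned32(std::int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }
bool fitsUnsigned32(std::int64_t v) {
  return v >= 0 && v <= static_cast<std::int64_t>(UINT32_MAX);
}

struct WrapFlags {
  bool nsw;  // no signed wrap for every input pair
  bool nuw;  // no unsigned wrap for every input pair
};

// Flags that are always valid on the promoted i32 op when both operands are
// zero-extended i16 values. Min and max results lie at the corners of the
// input box, so four evaluations cover the whole result range.
WrapFlags alwaysValidFlagsForZextI16(Op op) {
  const std::int64_t corners[] = {0, 0xFFFF};
  WrapFlags flags{true, true};
  for (std::int64_t a : corners) {
    for (std::int64_t b : corners) {
      const std::int64_t r = applyExact(op, a, b);
      flags.nsw = flags.nsw && fitsSigned32(r);
      flags.nuw = flags.nuw && fitsUnsigned32(r);
    }
  }
  return flags;
}
```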
* [AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). (Eugene Zelenko, 2017-01-20; 1 file, -14/+26)
  llvm-svn: 292623
* AMDGPU: Allow rcp and rsq usage with f16 (Matt Arsenault, 2016-12-22; 1 file, -1/+0)
  llvm-svn: 290302
* AMDGPU: Fix crash on i16 constant expression (Matt Arsenault, 2016-12-06; 1 file, -2/+3)
  llvm-svn: 288861
* [AMDGPU] AMDGPUCodeGenPrepare: remove extra ';' (Konstantin Zhuravlyov, 2016-10-07; 1 file, -1/+1)
  llvm-svn: 283558
* [AMDGPU] Promote uniform (i1, i16] operations to i32 (Konstantin Zhuravlyov, 2016-10-07; 1 file, -97/+101)
  Differential Revision: https://reviews.llvm.org/D25302
  llvm-svn: 283555
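When an iN value with 1 < N <= 16 is promoted to i32, the extension kind must match the operation's signedness. A comparison is a convenient example (this sketch does not claim to list which operations the pass actually promotes): a signed less-than must sign-extend both operands, and an unsigned one must zero-extend them. The helpers are hypothetical and take the iN value as the low N bits of a `uint32_t`.

```cpp
#include <cstdint>

// Zero-extend the low n bits (1 < n <= 16) of `bits` to 32 bits.
std::uint32_t zextN(std::uint32_t bits, unsigned n) {
  return bits & ((1u << n) - 1u);
}

// Sign-extend the low n bits of `bits` to 32 bits.
std::int32_t sextN(std::uint32_t bits, unsigned n) {
  const std::uint32_t v = zextN(bits, n);
  const std::uint32_t sign = 1u << (n - 1);
  // (v ^ sign) - sign maps [0, 2^n) onto [-2^(n-1), 2^(n-1)).
  return static_cast<std::int32_t>(v ^ sign) - static_cast<std::int32_t>(sign);
}

// Promoted comparisons: these match the iN comparisons only because each uses
// the extension that preserves its predicate's interpretation of the bits.
bool promotedSlt(std::uint32_t a, std::uint32_t b, unsigned n) {
  return sextN(a, n) < sextN(b, n);
}
bool promotedUlt(std::uint32_t a, std::uint32_t b, unsigned n) {
  return zextN(a, n) < zextN(b, n);
}
```

For i8, the bit pattern 0x80 is -128 when signed but 128 when unsigned, so zero-extending before a signed compare would give the wrong answer.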
* [AMDGPU] Promote uniform i16 bitreverse intrinsic to i32 (Konstantin Zhuravlyov, 2016-10-06; 1 file, -11/+65)
  Differential Revision: https://reviews.llvm.org/D25121
  llvm-svn: 283415
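The message does not show the expansion, but one correct way to compute an i16 bitreverse with a 32-bit operation is to zero-extend, reverse all 32 bits, and shift right by 16 (in general by 32 - N for an iN value), so the reversed bits land back in the low half. A C++ sketch with hypothetical names:

```cpp
#include <cstdint>

// Reference reversal of the low `width` bits of x.
std::uint32_t reverseBits(std::uint32_t x, unsigned width) {
  std::uint32_t r = 0;
  for (unsigned i = 0; i < width; ++i) {
    r = (r << 1) | (x & 1u);  // move the current low bit of x into r
    x >>= 1;
  }
  return r;
}

// Direct i16 bitreverse.
std::uint16_t bitreverse16(std::uint16_t x) {
  return static_cast<std::uint16_t>(reverseBits(x, 16));
}

// Promoted form: zero-extend to 32 bits, reverse all 32 bits (the original
// 16 bits end up in the high half), shift right by 32 - 16, then truncate.
std::uint16_t bitreverse16ViaI32(std::uint16_t x) {
  return static_cast<std::uint16_t>(reverseBits(x, 32) >> 16);
}
```

The test checks the promoted form against the direct one for all 65536 inputs.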
* [AMDGPU] Sign extend AShr when promoting (instead of zero extending) (Konstantin Zhuravlyov, 2016-10-03; 1 file, -2/+2)
  llvm-svn: 283130
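Arithmetic shift right replicates the sign bit, so after promotion the i16 sign bit must already fill the upper bits of the 32-bit operand; zero extension puts zeros there instead. The C++ sketch below (hypothetical names) shows both extensions for `ashr i16 0x8000, 4`. It relies on two's-complement conversion and arithmetic right shift of negative values, which mainstream compilers provide and C++20 guarantees.

```cpp
#include <cstdint>

// Promote `ashr i16 x, s` (s < 16) to 32 bits, then truncate back to 16 bits.
std::uint16_t ashr16ViaSext(std::uint16_t x, unsigned s) {
  const std::int32_t wide = static_cast<std::int16_t>(x);  // sign-extend
  return static_cast<std::uint16_t>(wide >> s);
}

// The incorrect variant: zero extension loses the sign for negative inputs.
std::uint16_t ashr16ViaZext(std::uint16_t x, unsigned s) {
  const std::int32_t wide = static_cast<std::int32_t>(x);  // zero-extend
  return static_cast<std::uint16_t>(wide >> s);
}
```

For 0x8000 (-32768 as i16), the sign-extended form yields 0xF800 (-2048), while the zero-extended form yields 0x0800; for non-negative inputs both agree.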
* Use StringRef in Pass/PassManager APIs (NFC) (Mehdi Amini, 2016-10-01; 1 file, -3/+1)
  llvm-svn: 283004
* [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions (Konstantin Zhuravlyov, 2016-09-28; 1 file, -3/+234)
  Differential Revision: https://reviews.llvm.org/D24125
  llvm-svn: 282624
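The commit message does not spell out the rewrite, so the following is only an illustration of the usual shape for unsigned or sign-agnostic i16 operations: zero-extend the operands, operate in 32 bits, and truncate. Truncation keeps exactly the low 16 bits, which is what the i16 operation would have produced, wraparound included. All names are hypothetical.

```cpp
#include <cstdint>

// Zero-extend, add in 32 bits, truncate back to 16 bits.
std::uint16_t promotedAdd(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(std::uint32_t{a} + std::uint32_t{b});
}

// Multiplying as uint32_t avoids signed int overflow from C++ integer promotion.
std::uint16_t promotedMul(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(std::uint32_t{a} * std::uint32_t{b});
}

// Unsigned division needs zero extension; b must be nonzero.
std::uint16_t promotedUDiv(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(std::uint32_t{a} / std::uint32_t{b});
}

// Logical shift right; s must be less than 16.
std::uint16_t promotedLShr(std::uint16_t a, unsigned s) {
  return static_cast<std::uint16_t>(std::uint32_t{a} >> s);
}
```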
* AMDGPU: Use rcp for fdiv 1, x with fpmath metadata (Matt Arsenault, 2016-07-26; 1 file, -1/+1)
  Using rcp should be OK for safe math usually, so this should not be replacing the original fdiv.
  llvm-svn: 276823
* AMDGPU: Change fdiv lowering based on !fpmath metadata (Matt Arsenault, 2016-07-19; 1 file, -6/+117)
  If 2.5 ulp is acceptable, denormals are not required, and the division isn't a reciprocal (which will already be handled), replace it with a faster fdiv.
  Simplify the lowering tests by using per-function subtarget features.
  llvm-svn: 276051
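The rule stated in this message can be written as a small predicate. The struct and function names are hypothetical, and the sketch treats only a literal 1.0 numerator as a reciprocal; it models the decision, not the IR rewrite.

```cpp
// Hypothetical summary of the facts the decision depends on for one fdiv.
struct FDivInfo {
  float maxUlpError;       // accuracy from !fpmath metadata; 0 if absent
  bool denormalsRequired;  // whether f32 denormals must be honored
  bool isReciprocal;       // numerator is the constant 1.0
};

// Use the faster division only if 2.5 ulp is acceptable, denormals are not
// required, and the division is not a reciprocal (handled separately).
bool shouldUseFastFDiv(const FDivInfo &d) {
  if (d.maxUlpError < 2.5f)
    return false;  // more accuracy requested than the fast path promises
  if (d.denormalsRequired)
    return false;  // the fast path does not handle denormals
  if (d.isReciprocal)
    return false;  // leave 1.0 / x to the reciprocal lowering
  return true;
}
```

A missing `!fpmath` is modeled as 0 ulp, so such a division keeps the precise lowering.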
* AMDGPU: Add stub custom CodeGenPrepare pass (Matt Arsenault, 2016-06-24; 1 file, -0/+82)
  This will do various things including ones CodeGenPrepare does, but with knowledge of uniform values.
  llvm-svn: 273657
OpenPOWER on IntegriCloud