path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* [AMDGPU] Move InstPrinter files to MCTargetDesc. NFC | Richard Trieu | 2019-05-11 | 11 | -37/+11
  For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure.
  llvm-svn: 360487
* [AMDGPU] Pattern for v_xor3_b32 | Stanislav Mekhanoshin | 2019-05-10 | 1 | -1/+4
  This also allows three-op patterns to use the increased constant bus limit of GFX10.
  Differential Revision: https://reviews.llvm.org/D61763
  llvm-svn: 360395
* [AMDGPU] gfx1010 v_interp_* instructions | Stanislav Mekhanoshin | 2019-05-09 | 1 | -6/+11
  Differential Revision: https://reviews.llvm.org/D61703
  llvm-svn: 360364
* [AMDGPU] gfx1010 changes for PAL metadata | Stanislav Mekhanoshin | 2019-05-09 | 1 | -2/+3
  Differential Revision: https://reviews.llvm.org/D61704
  llvm-svn: 360353
* AMDGPU: Mark scheduler classes as final | Matt Arsenault | 2019-05-08 | 1 | -2/+2
  llvm-svn: 360294
* AMDGPU: Select VOP3 form of add | Matt Arsenault | 2019-05-08 | 1 | -2/+2
  The VOP3 form should always be the preferred selection, to be shrunk later. This should only be an optimization issue, but this partially works around a problem from clobbering VCC when SIFixSGPRCopies rewrites an SCC defining operation directly to VCC.

  3 of the testcases are regressions from failing to fold the immediate in cases it should. These can be avoided by improving the VCC liveness handling in SIFoldOperands. Simply increasing the threshold to computeRegisterLiveness works, although this is common enough that VCC liveness should probably be tracked throughout the pass. The hack of leaving behind an implicit_def instruction to avoid breaking iterators wastes instruction count, which inhibits finding the VCC def in long chains of adds. Doing this, however, exposes different, worse looking regressions from poor scheduling behavior. This could probably be worked around by forcing the shrink of the addc here, but the scheduler should probably be fixed.

  The r600 add test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering.
  llvm-svn: 360293
* [AMDGPU] gfx1010 exp modifications | Stanislav Mekhanoshin | 2019-05-08 | 3 | -2/+17
  Differential Revision: https://reviews.llvm.org/D61701
  llvm-svn: 360287
* AMDGPU: Fix a mis-placed bracket | Changpeng Fang | 2019-05-08 | 1 | -1/+1
  Differential Revision: https://reviews.llvm.org/D61430
  llvm-svn: 360283
* [AMDGPU] Reapplied BFE canonicalization from D60462 | Simon Pilgrim | 2019-05-08 | 1 | -11/+25
  This was committed in rL358887 but reverted in rL360066 due to an x86 regression; really it should have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch.
  llvm-svn: 360263
* R600InstrInfo.cpp - Add getTransSwizzle assert for the swizzle op index. NFCI. | Simon Pilgrim | 2019-05-08 | 1 | -0/+1
  Fixes static analyzer undefined value warning.
  llvm-svn: 360239
* [SIMode] Fix typo in Status constructor | Simon Pilgrim | 2019-05-08 | 1 | -1/+1
  As noted in https://www.viva64.com/en/b/0629/ (Snippet No. 36) and the scan-build CI reports (https://llvm.org/reports/scan-build/report-SIModeRegister.cpp-Status-1-1.html#EndPath), rL348754 introduced a typo in the Status constructor due to argument variable names shadowing the member variable names.
  Differential Revision: https://reviews.llvm.org/D61595
  llvm-svn: 360236
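The commit message above does not show the affected code, so the following is only a minimal sketch of how parameter names that shadow member names can hide a typo in a constructor. `BuggyStatus` and `FixedStatus` are hypothetical types made up for illustration; they are not the actual `SIModeRegister.cpp` code.

```cpp
#include <cstdint>

// Hypothetical type for illustration; not the actual LLVM source.
struct BuggyStatus {
  uint32_t Mask;
  uint32_t Mode;

  // In the initializer list, Mask(Mask) initializes the member from the
  // parameter. Inside the body, however, the plain names Mask and Mode
  // refer to the parameters, so the statement below only changes the
  // parameter and the member keeps the unmasked value.
  BuggyStatus(uint32_t Mask, uint32_t Mode) : Mask(Mask), Mode(Mode) {
    Mode &= Mask;
  }
};

// Hypothetical fix: distinct parameter names remove the ambiguity and the
// masking is applied where the member is initialized.
struct FixedStatus {
  uint32_t Mask;
  uint32_t Mode;

  FixedStatus(uint32_t NewMask, uint32_t NewMode)
      : Mask(NewMask), Mode(NewMode & NewMask) {}
};
```

Constructing both with a mask of `0x0F` and a mode of `0xFF` leaves `BuggyStatus::Mode` at `0xFF`, while `FixedStatus::Mode` becomes `0x0F`.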
* [AMDGPU] Check MI bundles for hazards | Austin Kerbow | 2019-05-07 | 2 | -11/+62
  Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually.
  Reviewers: arsenm, msearles, rampitec
  Reviewed By: rampitec
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61564
  llvm-svn: 360199
* AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand | Nicolai Haehnle | 2019-05-07 | 1 | -0/+18
  Summary: No test case because I don't know of a way to trigger this, but I accidentally caused this to fail while working on a different change.
  Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61490
  llvm-svn: 360123
* [AMDGPU] gfx1010 verifier changes | Stanislav Mekhanoshin | 2019-05-06 | 1 | -7/+15
  Differential Revision: https://reviews.llvm.org/D61521
  llvm-svn: 360095
* [AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32 | Stanislav Mekhanoshin | 2019-05-06 | 1 | -1/+1
  GFX10 deprecates the v_mul_lo_i32 instruction, so choose the u32 form for all targets.
  Differential Revision: https://reviews.llvm.org/D61525
  llvm-svn: 360094
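One reason the u32 form can stand in for the i32 form is that the low 32 bits of a two's-complement product do not depend on whether the operands are read as signed or unsigned. The sketch below demonstrates that property in plain C++; `mulLoU32` and `mulLoI32` are hypothetical helper names and only model the arithmetic, not the hardware instructions.

```cpp
#include <cstdint>

// Low 32 bits of the product, operands treated as unsigned.
// Unsigned multiplication wraps modulo 2^32 by definition.
uint32_t mulLoU32(uint32_t A, uint32_t B) { return A * B; }

// Low 32 bits of the product, operands treated as signed.
// The multiply is done in 64 bits to avoid signed-overflow undefined
// behavior; the conversion to uint32_t then keeps the low 32 bits.
uint32_t mulLoI32(int32_t A, int32_t B) {
  int64_t Wide = static_cast<int64_t>(A) * static_cast<int64_t>(B);
  return static_cast<uint32_t>(Wide);
}
```

For example, `mulLoI32(-3, 7)` and `mulLoU32(static_cast<uint32_t>(-3), 7)` both yield `0xFFFFFFEB`, the 32-bit pattern of -21.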
* [AMDGPU] gfx1010 memory legalizer | Stanislav Mekhanoshin | 2019-05-06 | 1 | -1/+262
  Differential Revision: https://reviews.llvm.org/D61535
  llvm-svn: 360087
* Revert r359392 and r358887 | Craig Topper | 2019-05-06 | 1 | -25/+11
  Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
  Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"

  Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.

  Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases.

  Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that.

  This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.
  llvm-svn: 360066
* Fix compilation warnings when compiling with GCC 7.3 | Alexandre Ganea | 2019-05-06 | 1 | -0/+6
  Differential Revision: https://reviews.llvm.org/D61046
  llvm-svn: 360044
* [AMDGPU] Fixed asan error after D61536 | Stanislav Mekhanoshin | 2019-05-04 | 1 | -1/+1
  llvm-svn: 359963
* [AMDGPU] gfx1010 hazard recognizer | Stanislav Mekhanoshin | 2019-05-04 | 2 | -3/+268
  Differential Revision: https://reviews.llvm.org/D61536
  llvm-svn: 359961
* [AMDGPU] gfx1010: use fmac instructions | Stanislav Mekhanoshin | 2019-05-04 | 4 | -39/+105
  Differential Revision: https://reviews.llvm.org/D61527
  llvm-svn: 359959
* [AMDGPU] gfx1010 wait count insertion | Stanislav Mekhanoshin | 2019-05-03 | 1 | -56/+144
  Differential Revision: https://reviews.llvm.org/D61534
  llvm-svn: 359938
* [AMDGPU] gfx1010 s_code_end generation | Stanislav Mekhanoshin | 2019-05-03 | 4 | -2/+45
  Also add some missing metadata in the streamer.
  Differential Revision: https://reviews.llvm.org/D61531
  llvm-svn: 359937
* [AMDGPU] gfx1010 loop alignment | Stanislav Mekhanoshin | 2019-05-03 | 2 | -0/+78
  Differential Revision: https://reviews.llvm.org/D61529
  llvm-svn: 359935
* AMDGPU: Select VOP3 form of sub | Matt Arsenault | 2019-05-03 | 1 | -2/+2
  The VOP3 form should always be the preferred selection form to be shrunk later.

  The r600 sub test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering.
  llvm-svn: 359899
* AMDGPU: Support shrinking add with FI in SIFoldOperands | Matt Arsenault | 2019-05-03 | 1 | -35/+37
  Avoids test regression in a future patch
  llvm-svn: 359898
* AMDGPU: Remove redundant patterns for shifts | Matt Arsenault | 2019-05-03 | 1 | -9/+4
  llvm-svn: 359895
* AMDGPU: Remove redundant patterns for sub | Matt Arsenault | 2019-05-03 | 1 | -4/+0
  There were 2 patterns for sub, one selecting to sub and one to subrev. Only one of these will succeed, so remove the reversed one.
  llvm-svn: 359894
* AMDGPU: Replace shrunk instruction with dummy implicit_def | Matt Arsenault | 2019-05-03 | 1 | -4/+8
  This was broken if the original operand was killed. The kill flag would appear on both instructions, and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has an added benefit of really reducing the use count for future folds.

  Ideally the pass would be structured more like what PeepholeOptimizer does, which would remove the need for this hack to avoid breaking instruction iterators.
  llvm-svn: 359891
* AMDGPU: Fix incorrect commute with sub when folding immediates | Matt Arsenault | 2019-05-03 | 1 | -1/+4
  When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used. This was using the VOP2 equivalent of the original instruction, not the commuted instruction with the inverted opcode.
  llvm-svn: 359883
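The point about the inverted opcode can be illustrated with a toy model. Subtraction is not commutative, so swapping the two source operands preserves the result only if the opcode is switched between sub and subrev at the same time. The `Opcode` enum, `Inst` struct, and both functions below are hypothetical and stand in for the real machine instructions; this is a sketch of the invariant, not of the SIFoldOperands code.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical toy model: Sub computes Src0 - Src1, SubRev computes Src1 - Src0.
enum class Opcode { Sub, SubRev };

struct Inst {
  Opcode Op;
  int32_t Src0;
  int32_t Src1;
};

int32_t evaluate(const Inst &I) {
  return I.Op == Opcode::Sub ? I.Src0 - I.Src1 : I.Src1 - I.Src0;
}

// A correct commute swaps the operands and inverts the opcode together.
Inst commute(Inst I) {
  std::swap(I.Src0, I.Src1);
  I.Op = (I.Op == Opcode::Sub) ? Opcode::SubRev : Opcode::Sub;
  return I;
}
```

With `Inst{Opcode::Sub, 10, 3}`, both the original and the commuted instruction evaluate to 7. Swapping the operands while keeping the original opcode would evaluate to -7, which is the kind of mismatch the commit describes.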
* [SelectionDAG] remove constant folding limitations based on FP exceptions | Sanjay Patel | 2019-05-02 | 1 | -5/+0
  We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions.

  There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches.

  Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.
  Differential Revision: https://reviews.llvm.org/D61331
  llvm-svn: 359791
* [AMDGPU] gfx1010 lost VOP2 forms of some add/sub | Stanislav Mekhanoshin | 2019-05-02 | 1 | -0/+27
  Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.
  Differential Revision:
  llvm-svn: 359757
* [AMDGPU] gfx1010 allows VOP3 to have a literal | Stanislav Mekhanoshin | 2019-05-02 | 7 | -60/+133
  Differential Revision: https://reviews.llvm.org/D61413
  llvm-svn: 359756
* [AMDGPU] gfx1010 constant bus limit | Stanislav Mekhanoshin | 2019-05-02 | 4 | -24/+136
  Constant bus limit has increased to 2 with GFX10.
  Differential Revision: https://reviews.llvm.org/D61404
  llvm-svn: 359754
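A constant bus limit check can be sketched as a simple count against a per-target limit. Everything below is a simplified assumption for illustration: the `OperandKind` enum and the three functions are hypothetical, every SGPR or literal operand is counted as one constant bus use, and the pre-GFX10 limit of 1 is inferred from the word "increased" in the commit message. The real verifier has more detailed rules (for example, about repeated uses of the same register).

```cpp
#include <vector>

// Hypothetical operand classification for illustration only.
enum class OperandKind { VGPR, SGPR, Literal };

// 2 on GFX10 per the commit message; 1 before that is an assumption here.
unsigned constantBusLimit(bool IsGFX10) { return IsGFX10 ? 2u : 1u; }

// Simplification: each SGPR or literal operand counts as one use.
unsigned countConstantBusUses(const std::vector<OperandKind> &Operands) {
  unsigned Uses = 0;
  for (OperandKind Kind : Operands)
    if (Kind != OperandKind::VGPR)
      ++Uses;
  return Uses;
}

bool fitsConstantBus(const std::vector<OperandKind> &Operands, bool IsGFX10) {
  return countConstantBusUses(Operands) <= constantBusLimit(IsGFX10);
}
```

Under these assumptions, an instruction with two SGPR sources and one VGPR source is rejected with a limit of 1 and accepted with the GFX10 limit of 2.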
* [AMDGPU] gfx1010 GCNRegBankReassign pass | Stanislav Mekhanoshin | 2019-05-01 | 4 | -0/+803
  Reassign registers to reduce register bank conflicts.
  Differential Revision: https://reviews.llvm.org/D61344
  llvm-svn: 359704
* [AMDGPU] gfx1010 GCNNSAReassign pass | Stanislav Mekhanoshin | 2019-05-01 | 4 | -0/+362
  Convert NSA into non-NSA images.
  Differential Revision: https://reviews.llvm.org/D61341
  llvm-svn: 359700
* [AMDGPU] gfx1010 MIMG implementation | Stanislav Mekhanoshin | 2019-05-01 | 12 | -161/+922
  Differential Revision: https://reviews.llvm.org/D61339
  llvm-svn: 359698
* [AMDGPU] gfx1010 DS implementation | Stanislav Mekhanoshin | 2019-05-01 | 3 | -165/+221
  Differential Revision: https://reviews.llvm.org/D61332
  llvm-svn: 359696
* [AMDGPU] gfx1010 VMEM and SMEM implementation | Stanislav Mekhanoshin | 2019-04-30 | 16 | -317/+1071
  Differential Revision: https://reviews.llvm.org/D61330
  llvm-svn: 359621
* [TargetLowering] Change getOptimalMemOpType to take a function attribute list | Sjoerd Meijer | 2019-04-30 | 2 | -6/+5
  The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering, which calls getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, which allow implementation of TTI::getMemcpyCost.
  Differential Revision: https://reviews.llvm.org/D59785
  llvm-svn: 359537
* Avoid "checking a pointer after dereferencing" warning. NFCI. | Simon Pilgrim | 2019-04-29 | 1 | -1/+1
  Reported in https://www.viva64.com/en/b/0629/
  llvm-svn: 359473
* Move if() to newline to stop ambiguity over whether it should be else if. NFCI. | Simon Pilgrim | 2019-04-29 | 1 | -1/+2
  Reported in https://www.viva64.com/en/b/0629/
  llvm-svn: 359472
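The ambiguity named in the subject line comes from the fact that an `if` placed right after a closing brace looks like a forgotten `else if`, although the two behave differently. The commit's diff is not shown here, so the functions below are hypothetical and only contrast the two readings.

```cpp
// Two independent tests: the second condition is checked even when the
// first one already matched. Writing the second `if` on its own line makes
// that intent visible.
int independentIfs(int X) {
  int Hits = 0;
  if (X > 0) {
    ++Hits;
  }
  if (X > 10) {
    ++Hits;
  }
  return Hits;
}

// Chained form: the second condition is only checked when the first fails.
int chainedElseIf(int X) {
  int Hits = 0;
  if (X > 0) {
    ++Hits;
  } else if (X > 10) {
    ++Hits;
  }
  return Hits;
}
```

For `X = 20` the independent form counts two hits and the chained form counts one, so a reader needs to know which of the two the author meant.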
* Revert "AMDGPU: Split block for si_end_cf" | Mark Searles | 2019-04-27 | 5 | -128/+17
  This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.

  We discovered some internal test failures, so reverting for now.
  Differential Revision: https://reviews.llvm.org/D61213
  llvm-svn: 359363
* [AMDGPU] gfx1010 VOPC implementation | Stanislav Mekhanoshin | 2019-04-26 | 8 | -361/+696
  Differential Revision: https://reviews.llvm.org/D61208
  llvm-svn: 359358
* [AMDGPU] gfx1010 VOP3 and VOP3P implementation | Stanislav Mekhanoshin | 2019-04-26 | 4 | -102/+281
  Differential Revision: https://reviews.llvm.org/D61202
  llvm-svn: 359328
* [AMDGPU] gfx1010 VOP2 changes | Stanislav Mekhanoshin | 2019-04-26 | 6 | -154/+605
  Differential Revision: https://reviews.llvm.org/D61156
  llvm-svn: 359316
* [AMDGPU] gfx1010 - fix ubsan failure | Stanislav Mekhanoshin | 2019-04-25 | 1 | -1/+0
  Revert DecoderNamespace in one place for now. It will need more changes to work properly.
  llvm-svn: 359239
* [AMDGPU] gfx1010 VOP1 instructions | Stanislav Mekhanoshin | 2019-04-25 | 6 | -102/+306
  Differential Revision: https://reviews.llvm.org/D61099
  llvm-svn: 359225
* [AMDGPU] gfx1010 utility functions | Stanislav Mekhanoshin | 2019-04-25 | 4 | -29/+90
  Differential Revision: https://reviews.llvm.org/D61094
  llvm-svn: 359224
* Fix spelling error. NFC | Austin Kerbow | 2019-04-24 | 1 | -1/+1
  Summary: Test commit.
  Reviewers: msearles, jkorous
  Reviewed By: jkorous
  Subscribers: dexonsmith, arsenm, jvesely, nhaehnle, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61093
  llvm-svn: 359154