path: root/llvm/test/CodeGen/R600
Commit message, Author, Age, Files, Lines
...
* Call EmitFunctionHeader just before EmitFunctionBody.Rafael Espindola2015-03-178-11/+13
| | | | | | | This avoids switching to .AMDGPU.config and back and hardcoding the section it switches back to. llvm-svn: 232479
* R600/SI: don't try min3/max3/med3 with f64Tom Stellard2015-03-161-0/+24
| | | | | | | | | | There are no opcodes for this. This also adds a test case. v2: make test more robust Patch by: Grigori Goronzy llvm-svn: 232386
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-03-1321-185/+185
    gep operator

    Similar to gep (r230786) and load (r230794) changes.

    Similar migration script can be used to update test cases, which
    successfully migrated all of LLVM and Polly, but about 4 test cases
    needed manual changes in Clang.

    (this script will read the contents of stdin and massage it into stdout
    - wrap it in the 'apply.sh' script shown in previous commits + xargs to
    apply it over a large set of test cases)

    import fileinput
    import sys
    import re

    rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

    def conv(match):
      line = match.group(1)
      line += match.group(4)
      line += ", "
      line += match.group(2)
      return line

    line = sys.stdin.read()
    off = 0
    for match in re.finditer(rep, line):
      sys.stdout.write(line[off:match.start()])
      sys.stdout.write(conv(match))
      off = match.end()
    sys.stdout.write(line[off:])

    llvm-svn: 232184
* R600/SI: Add test for min / max with immediateMatt Arsenault2015-03-132-0/+36
| | | | | | | Make sure this isn't getting confused by canonicalizations of comparisons with a constant. llvm-svn: 232177
* R600/SI: Remove _e32 and _e64 suffixes from mnemonicsTom Stellard2015-03-123-4/+4
| | | | | | | | Instead print them as part of the $dst operand. The AsmMatcher requires the 32-bit and 64-bit encodings have the same mnemonic in order to parse them correctly. llvm-svn: 232105
* R600/SI: Limit SGPRs to 80 on Tonga and IcelandMarek Olsak2015-03-091-3/+6
| | | | | | This is a candidate for stable. llvm-svn: 231659
* DAGCombiner: Canonicalize select(and/or,x,y) depending on target.Matthias Braun2015-03-061-3/+3
    This is based on the following equivalences:

      select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
      select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))

    Many targets cannot perform and/or on the CPU flags and therefore the
    right side should be chosen to avoid materializing the i1 flags in an
    integer register. If the target can perform this operation efficiently
    we normalize to the left form.

    Differential Revision: http://reviews.llvm.org/D7622

    llvm-svn: 231507
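The two equivalences in the commit above can be checked exhaustively over both conditions. The following is a minimal Python sketch, not DAGCombiner code; `select` and `equivalences_hold` are hypothetical helpers introduced only for this illustration, with `select` standing in for the select node.

```python
from itertools import product


def select(cond, x, y):
    """Hypothetical stand-in for a select node: x if cond holds, else y."""
    return x if cond else y


def equivalences_hold():
    """Check both rewrites for every combination of the two conditions."""
    x, y = "X", "Y"
    for c0, c1 in product([False, True], repeat=2):
        # select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
        if select(c0 and c1, x, y) != select(c0, select(c1, x, y), y):
            return False
        # select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))
        if select(c0 or c1, x, y) != select(c0, x, select(c1, x, y)):
            return False
    return True


print(equivalences_hold())  # prints True
```

The nested form needs only the individual conditions, which is why it suits targets that cannot combine flags directly.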
* R600/SI: Add an intrinsic for S_FLBIT_I32 / V_FFBH_I32Marek Olsak2015-03-041-0/+28
| | | | | | Required by OpenGL (ARB_gpu_shader5). llvm-svn: 231259
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-27217-1758/+1758
    load instruction

    Essentially the same as the GEP change in r230786.

    A similar migration script can be used to update test cases, though a few
    more test case improvements/changes were required this time around:
    (r229269-r229278)

    import fileinput
    import sys
    import re

    pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

    for line in sys.stdin:
      sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

    Reviewers: rafael, dexonsmith, grosser

    Differential Revision: http://reviews.llvm.org/D7649

    llvm-svn: 230794
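To see what the load migration does to a single line of textual IR, the pattern and replacement string from the commit message above can be applied to a few sample lines. This is only a sketch: the regular expression is copied from the message, while the `migrate_load` wrapper and the sample IR lines are assumptions made for illustration.

```python
import re

# Pattern and replacement copied from the commit message above.
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")


def migrate_load(line):
    """Hypothetical wrapper: add the explicit result type to one load line."""
    return re.sub(pat, r"\1, \2\3*\4", line)


# Made-up sample lines, not taken from any real test file.
samples = [
    "%v = load i32* %p",
    "%v = load float addrspace(1)* %p",
    "%a = add i32 %x, %y",  # no load, so it is left untouched
]
for sample in samples:
    print(migrate_load(sample))
```

The address space stays on the pointer operand, and lines without a `load` pass through unchanged.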
* [opaque pointer type] Add textual IR support for explicit type parameter to ↵David Blaikie2015-02-27140-1314/+1314
    getelementptr instruction

    One of several parallel first steps to remove the target type of pointers,
    replacing them with a single opaque pointer type.

    This adds an explicit type parameter to the gep instruction so that when the
    first parameter becomes an opaque pointer type, the type to gep through is
    still available to the instructions.

    * This doesn't modify gep operators, only instructions (operators will be
      handled separately)

    * Textual IR changes only. Bitcode (including upgrade) and changing the
      in-memory representation will be in separate changes.

    * geps of vectors are transformed as:
        getelementptr <4 x float*> %x, ...
      ->getelementptr float, <4 x float*> %x, ...
      Then, once the opaque pointer type is introduced, this will ultimately look
      like:
        getelementptr float, <4 x ptr> %x
      with the unambiguous interpretation that it is a vector of pointers to float.

    * address spaces remain on the pointer, not the type:
        getelementptr float addrspace(1)* %x
      ->getelementptr float, float addrspace(1)* %x
      Then, eventually:
        getelementptr float, ptr addrspace(1) %x

    Importantly, the massive amount of test case churn has been automated by
    some crappy python code. I had to manually update a few test cases that
    wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
    python script just massages stdin and writes the result to stdout, I
    then wrapped that in a shell script to handle replacing files, then
    using the usual find+xargs to migrate all the files.

    update.py:
    import fileinput
    import sys
    import re

    ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
    normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

    def conv(match, line):
      if not match:
        return line
      line = match.groups()[0]
      if len(match.groups()[5]) == 0:
        line += match.groups()[2]
      line += match.groups()[3]
      line += ", "
      line += match.groups()[1]
      line += "\n"
      return line

    for line in sys.stdin:
      if line.find("getelementptr ") == line.find("getelementptr inbounds"):
        if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
          line = conv(re.match(ibrep, line), line)
      elif line.find("getelementptr ") != line.find("getelementptr ("):
        line = conv(re.match(normrep, line), line)
      sys.stdout.write(line)

    apply.sh:
    for name in "$@"
    do
      python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
      rm -f "$name.tmp"
    done

    The actual commands:
    From llvm/src:
    find test/ -name *.ll | xargs ./apply.sh
    From llvm/src/tools/clang:
    find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
    From llvm/src/tools/polly:
    find test/ -name *.ll | xargs ./apply.sh

    After that, check-all (with llvm, clang, clang-tools-extra, lld,
    compiler-rt, and polly all checked out).

    The extra 'rm' in the apply.sh script is due to a few files in clang's test
    suite using interesting unicode stuff that my python script was throwing
    exceptions on. None of those files needed to be migrated, so it seemed
    sufficient to ignore those cases.

    Reviewers: rafael, dexonsmith, grosser

    Differential Revision: http://reviews.llvm.org/D7636

    llvm-svn: 230786
* R600/SI: Remove M0 from DS assembly stringsTom Stellard2015-02-269-64/+64
| | | | | | This matches the assembly syntax for the proprietary compiler. llvm-svn: 230645
* R600/SI: Remove isel mubuf legalizationTom Stellard2015-02-241-3/+1
| | | | | | | We legalize mubuf instructions post-instruction selection, so this code is no longer needed. llvm-svn: 230352
* R600/SI: Use v_madmk_f32Matt Arsenault2015-02-212-1/+182
| | | | llvm-svn: 230149
* R600/SI: Try to use v_madak_f32Matt Arsenault2015-02-211-0/+193
| | | | | | | This is a code size optimization when the constant only has one use. llvm-svn: 230148
* R600/SI: Remove v_sub_f64 pseudoMatt Arsenault2015-02-203-10/+101
| | | | | | | | | | The expansion code does the same thing. Since the operands were not defined with the correct types, this has the side effect of fixing operand folding since the expanded pseudo would never use SGPRs or inline immediates. llvm-svn: 230072
* R600: Use new fmad node.Matt Arsenault2015-02-201-0/+567
| | | | | | | | | | | This enables a few useful combines that used to only use fma. Also since v_mad_f32 apparently does not support denormals, disable the existing cases that are custom handled if they are requested. llvm-svn: 230071
* R600/SI: Add missing offset operand to buffer bothenMatt Arsenault2015-02-181-20/+32
| | | | llvm-svn: 229605
* R600/SI: Add missing soffset operand to global atomicsMatt Arsenault2015-02-181-0/+9
| | | | llvm-svn: 229604
* R600/SI: Fix asan errors in SIFoldOperandsTom Stellard2015-02-172-4/+4
| | | | | | | We were trying to fold into implicit uses, which led to out of bounds access of the MCInstrDesc::OpInfo array. llvm-svn: 229533
* R600/SI: Extend private extload pattern to include zext loadsTom Stellard2015-02-171-0/+46
| | | | llvm-svn: 229507
* R600/SI: Implement correct f64 fdivMatt Arsenault2015-02-144-24/+105
| | | | | | This version passes the OpenCL conformance test. llvm-svn: 229239
* R600/SI: Use complex operand folding for div_scaleMatt Arsenault2015-02-141-0/+77
| | | | llvm-svn: 229238
* R600/SI: Add tests for div_fmas with inline immediate operandsMatt Arsenault2015-02-141-1/+43
| | | | llvm-svn: 229237
* R600/SI: Fix implicit vcc operand to v_div_fmas_*Matt Arsenault2015-02-141-2/+108
| | | | | | | | | This should allow finally fixing the f64 fdiv implementation. Test is disabled for VI since there seems to be a problem with one of the buffer load instructions on it. llvm-svn: 229236
* Fix R600 test deadlock on Windows by giving FileCheck an argumentReid Kleckner2015-02-131-2/+2
| | | | | | | | llc would hang trying to write output to a full pipe that FileCheck wasn't reading. FileCheck wasn't reading from stdin because it needs a file as a positional argument. llvm-svn: 229157
* R600/SI: Allow f64 inline immediates in i64 operandsMatt Arsenault2015-02-135-56/+327
| | | | | | | This requires considering the size of the operand when checking immediate legality. llvm-svn: 229135
* R600/SI: Minor test scheduling fixesMatt Arsenault2015-02-134-16/+17
| | | | | | This prevents these from failing in a later commit. llvm-svn: 229134
* [CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to ↵Andrea Di Biagio2015-02-131-225/+0
| | | | | | | | | | | | | | | | speculate calls to cttz/ctlz. SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are 'cheap' for the target. Therefore, some of the logic in CodeGenPrepare that was originally added at revision 224899 can now be removed. This patch is basically a no functional change. It removes the duplicated logic in CodeGenPrepare and converts all the existing target specific tests for cttz/ctlz into SimplifyCFG tests. Differential Revision: http://reviews.llvm.org/D7608 llvm-svn: 229105
* R600/SI: Disable subreg livenessTom Stellard2015-02-111-2/+67
| | | | | | This is temporary while we try to fix a crash in the register coalescer. llvm-svn: 228861
* R600/SI: Fix -march in testTom Stellard2015-02-111-4/+2
| | | | llvm-svn: 228848
* R600/SI: Enable a lot of existing tests for VI (squashed commits)Marek Olsak2015-02-1138-994/+1165
    This is a union of these commits:

    * R600/SI: Enable more tests for VI which need no changes

    * R600/SI: Enable V_BCNT tests for VI
      Differences:
      - v_bcnt_..._e32 -> _e64
      - s_load_dword* inline offset is in bytes instead of dwords

    * R600/SI: Enable all tests for VI which use S_LOAD_DWORD
      The inline offset is changed from dwords to bytes.

    * R600/SI: Enable LDS tests for VI
      Differences:
      - the s_load_dword inline offset changed from dwords to bytes
      - the tests checked very little on CI, so they have been fixed to check
        all instructions that "SI" checked

    * R600/SI: Enable lshr tests for VI

    * R600/SI: Fix divrem64 tests
      - "v_lshl_64" was missing "b" before "64"
      - added VI-NOT checks

    * R600/SI: Enable the SI.tid test for VI

    * R600/SI: Enable the frem test for VI
      Also, the frem_f64 checking is added for CI-VI.

    * R600/SI: Add VI tests for rsq.clamped

    llvm-svn: 228830
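Several of the squashed commits above hinge on one difference: the `s_load_dword` inline offset is counted in dwords on SI and in bytes on VI, so the same load needs different expected offsets in the tests. A minimal Python sketch of that conversion follows. The helper name `s_load_inline_offset`, the target labels and the sample offset are assumptions for illustration only, and the sketch does not model encoding limits.

```python
def s_load_inline_offset(byte_offset, target):
    """Hypothetical helper: inline offset as it would be written for a target.

    Assumes SI counts the offset in dwords and VI in bytes, as the commit
    message describes.
    """
    if byte_offset % 4 != 0:
        raise ValueError("offsets are assumed to be dword-aligned in this sketch")
    if target == "SI":
        return byte_offset // 4  # dwords
    if target == "VI":
        return byte_offset       # bytes
    raise ValueError(f"unknown target: {target}")


# The same 44-byte offset, as each target would spell it.
for target in ("SI", "VI"):
    print(target, hex(s_load_inline_offset(0x2c, target)))
# prints:
# SI 0xb
# VI 0x2c
```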
* R600/SI: Store immediate offsets > 12-bits in soffsetTom Stellard2015-02-111-9/+12
| | | | | | | This will save us from having to extend these offsets to 64-bits and storing them in a pair of vgprs. llvm-svn: 228776
* R600/SI: Amend a test to ensure WQM is enabled for LDS in pixel shadersMichel Danzer2015-02-061-0/+1
| | | | | Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 228374
* R600/SI: Don't enable WQM for V_INTERP_* instructions v2Michel Danzer2015-02-062-22/+30
| | | | | | | | | Doesn't seem necessary anymore. I think this was mostly compensating for not enabling WQM for texture sampling instructions. v2: Add test coverage Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 228373
* R600/SI: Also enable WQM for image opcodes which calculate LOD v3Michel Danzer2015-02-062-0/+40
| | | | | | | | | | | | | If whole quad mode isn't enabled for these, the level of detail is calculated incorrectly for pixels along diagonal triangle edges, causing artifacts. v2: Use a TSFlag instead of lots of switch cases v3: Add test coverage Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88642 Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 228372
* R600/SI: Fix bug in TTI loop unrolling preferencesTom Stellard2015-02-051-0/+58
| | | | | | | | | | | | | We should be setting UnrollingPreferences::MaxCount to MAX_UINT instead of UnrollingPreferences::Count. Count is a 'forced unrolling factor', while MaxCount sets an upper limit to the unrolling factor. Setting Count to MAX_UINT was causing the loop in the testcase to be unrolled 15 times, when it only had a maximum of 4 iterations. llvm-svn: 228303
* R600/SI: Fix bug from insertion of llvm.SI.end.cf into loop headersTom Stellard2015-02-051-0/+34
    The llvm.SI.end.cf intrinsic is used to mark the end of if-then blocks,
    if-then-else blocks, and loops. It is responsible for updating the
    exec mask to re-enable threads that had been masked during the preceding
    control flow block. For example:

    s_mov_b64 exec, 0x3                 ; Initial exec mask
    s_mov_b64 s[0:1], exec              ; Saved exec mask
    v_cmpx_gt_u32 exec, s[2:3], v0, 0   ; llvm.SI.if
    do_stuff()
    s_or_b64 exec, exec, s[0:1]         ; llvm.SI.end.cf

    The bug fixed by this patch was one where the llvm.SI.end.cf intrinsic
    was being inserted into the header of loops. This would happen when
    an if block terminated in a loop header and we would end up with
    code like this:

    s_mov_b64 exec, 0x3                 ; Initial exec mask
    s_mov_b64 s[0:1], exec              ; Saved exec mask
    v_cmpx_gt_u32 exec, s[2:3], v0, 0   ; llvm.SI.if
    do_stuff()

    LOOP:                               ; Start of loop header
    s_or_b64 exec, exec, s[0:1]         ; llvm.SI.end.cf <-BUG: The exec mask has the same value at the beginning of each loop iteration.
    do_stuff();
    s_cbranch_execnz LOOP

    The fix is to create a new basic block before the loop and insert the
    llvm.SI.end.cf there. This way the exec mask is restored before the
    start of the loop instead of at the beginning of each iteration.

    llvm-svn: 228302
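The effect of the misplaced llvm.SI.end.cf can be illustrated with a toy bitmask model. The Python sketch below is an assumption-laden simulation, not the compiler's logic: `run_loop`, its parameters and the per-lane trip counts are all hypothetical. It records the exec mask seen at the top of each loop iteration, once with the restore done a single time before the loop (the fix) and once with the restore in the loop header (the bug).

```python
def run_loop(exec_after_if, saved_exec, trip_counts, end_cf_in_header, max_iters=8):
    """Toy model: return the exec mask seen at the top of each iteration.

    exec_after_if    -- mask left by the preceding if block
    saved_exec       -- mask saved before the if block
    trip_counts      -- iterations each lane wants to run (made-up data)
    end_cf_in_header -- True models the bug, False models the fix
    """
    exec_mask = exec_after_if
    if not end_cf_in_header:
        exec_mask |= saved_exec  # fix: restore once, in a block before the loop
    remaining = list(trip_counts)
    seen = []
    for _ in range(max_iters):
        if end_cf_in_header:
            exec_mask |= saved_exec  # bug: restore on every iteration
        seen.append(exec_mask)
        for lane in range(len(remaining)):
            if exec_mask & (1 << lane):
                remaining[lane] -= 1
                if remaining[lane] <= 0:
                    exec_mask &= ~(1 << lane)  # lane is done, mask it off
        if exec_mask == 0:  # models s_cbranch_execnz falling through
            break
    return seen


print(run_loop(0b01, 0b11, [1, 2], end_cf_in_header=False))  # prints [3, 2]
print(run_loop(0b01, 0b11, [1, 2], end_cf_in_header=True))   # prints [3, 3]
```

In the fixed variant lane 0 drops out after its single iteration. In the buggy variant the header re-enables it, so the mask is identical at the top of every iteration and lane 0 runs the body a second time.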
* R600/SI: Fix i64 truncate to i1Matt Arsenault2015-02-051-0/+31
| | | | llvm-svn: 228273
* R600/SI: Enable subreg liveness by defaultTom Stellard2015-02-044-38/+38
| | | | llvm-svn: 228228
* R600/SI: Expand misaligned 16-bit memory accessesTom Stellard2015-02-041-0/+24
| | | | llvm-svn: 228190
* R600/SI: Make more store operations legalTom Stellard2015-02-041-11/+46
| | | | | | | | | v2i32, i32, trunc i32 to i16, and trunc i32 to i8 stores are legal for all address spaces. We had marked them as custom in order to lower them for the private address space, but this is no longer necessary. This enables lowering of misaligned stores of these types in the DAGLegalizer. llvm-svn: 228189
* R600: Don't promote i64 stores to v2i32 during DAG legalizationTom Stellard2015-02-041-2/+4
| | | | | | | We take care of this during instruction selection now. This fixes a potential infinite loop when lowering misaligned stores. llvm-svn: 228188
* R600/SI: Remove the -CHECK suffix from all FileCheck prefixes in LIT testsMarek Olsak2015-02-0328-1414/+1414
| | | | llvm-svn: 228040
* R600/SI: Fix B64 VALU shifts on VIMarek Olsak2015-02-034-40/+44
| | | | | | | SI only has standard versions. VI only has REV versions. Tested-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 228037
* R600/SI: Don't generate non-existent LSHL, LSHR, ASHR B32 variants on VIMarek Olsak2015-02-032-2/+53
| | | | | | | | | | | | | | | This can happen when a REV instruction is commuted. The trick is not to define the _vi versions of instructions, which has these consequences: - code generation will always fail if a pseudo cannot be lowered (very useful to catch bugs where an unsupported instruction somehow makes it to the printer) - ability to query if a pseudo can be lowered, which is done in commuteOpcode to prevent REV from commuting to non-REV on VI Tested-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 227990
* R600/SI: Fix dependency between instruction writing M0 and S_SENDMSG on VI (v2)Marek Olsak2015-02-031-0/+20
| | | | | | | | This fixes a hang when using an empty geometry shader. v2: - don't add s_nop when followed by s_waitcnt - cosmetic changes Tested-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 227986
* R600/SI: 64-bit and larger memory access must be at least 4-byte alignedTom Stellard2015-02-022-5/+76
| | | | | | | | This is true for SI only. CI+ supports unaligned memory accesses, but this requires driver support, so for now we disallow unaligned accesses for all GCN targets. llvm-svn: 227822
* R600/SI: Merge two test filesTom Stellard2015-02-022-24/+15
| | | | llvm-svn: 227821
* R600/SI: Only select cvt_flr/cvt_rpi with no NaNs.Matt Arsenault2015-01-312-15/+27
| | | | | | These have different behavior from cvt_i32_f32 on NaN. llvm-svn: 227693
* R600/SI: Implement enableAggressiveFMAFusionMatt Arsenault2015-01-291-0/+368
| | | | | | | | | Add tests for the various combines. This should always be at least cycle neutral on all subtargets for f64, and faster on some. For f32 we should prefer selecting v_mad_f32 over v_fma_f32. llvm-svn: 227484