path: root/llvm/test/CodeGen/R600
Commit message | Author | Age | Files | Lines
...
* R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions | Tom Stellard | 2013-08-16 | 1 | -2/+2
| The SIInsertWaits pass was overwriting the first operand (gds bit) of DS_WRITE_B32 with the second operand (value to write). This meant that any time the value to write was stored in an odd-numbered VGPR, the gds bit would be set, causing the instruction to write to GDS instead of LDS.
| llvm-svn: 188522
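The failure mode described in this commit can be illustrated with a toy model. This is only a sketch: the operand list layout `[gds, vgpr_index]` and both function names are hypothetical, and the real DS encoding has more fields. It assumes that only the low bit of whatever lands in operand 0 ends up in the 1-bit gds field, which is what the commit message implies.

```python
# Toy model of the DS_WRITE_B32 bug. The operand layout and the
# function names are hypothetical and exist only for illustration.

def encode_gds_bit(operands):
    """Return the 1-bit gds field taken from operand 0."""
    return operands[0] & 1

def clobber_first_operand(operands):
    """Model the bug: operand 0 (gds) is overwritten with operand 1 (data VGPR index)."""
    clobbered = list(operands)
    clobbered[0] = clobbered[1]
    return clobbered

for vgpr_index in (4, 5):
    intended = [0, vgpr_index]            # gds = 0, so the write targets LDS
    buggy = clobber_first_operand(intended)
    print(vgpr_index, encode_gds_bit(intended), encode_gds_bit(buggy))
```

In this model an even VGPR index leaves the gds bit clear by accident, while an odd index sets it, which matches the symptom reported above.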
* R600: Add support for global vector loads with element types less than 32-bits | Tom Stellard | 2013-08-16 | 1 | -0/+176
| Tested-by: Aaron Watry <awatry@gmail.com>
| llvm-svn: 188521
* R600: Add support for global vector stores with elements less than 32-bits | Tom Stellard | 2013-08-16 | 1 | -0/+62
| Tested-by: Aaron Watry <awatry@gmail.com>
| llvm-svn: 188520
* R600: Add support for i16 and i8 global stores | Tom Stellard | 2013-08-16 | 2 | -2/+61
| Tested-by: Aaron Watry <awatry@gmail.com>
| llvm-svn: 188519
* R600: Add support for v4i32 stores on Cayman | Tom Stellard | 2013-08-16 | 2 | -1/+15
| Tested-by: Aaron Watry <awatry@gmail.com>
| llvm-svn: 188518
* R600: Enable folding of inline literals into REG_SEQUENCE instructions | Tom Stellard | 2013-08-16 | 1 | -0/+13
| Tested-by: Aaron Watry <awatry@gmail.com>
| llvm-svn: 188517
* R600: Change the RAT instruction assembly names so they match the docs | Tom Stellard | 2013-08-16 | 5 | -25/+25
| Tested-by: Aaron Watry <awatry@gmail.com>
| llvm-svn: 188515
* [tests] Cleanup initialization of test suffixes. | Daniel Dunbar | 2013-08-16 | 1 | -11/+1
| - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
| - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED).
| - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code.
| llvm-svn: 188513
* R600/SI: Improve legalization of vector operations | Tom Stellard | 2013-08-14 | 1 | -0/+111
| This should fix hangs in the OpenCL piglit tests.
| llvm-svn: 188431
* R600/SI: Replace v1i32 type with i32 in imageload and sample intrinsics | Tom Stellard | 2013-08-14 | 1 | -0/+17
| llvm-svn: 188430
* R600/SI: Convert v16i8 resource descriptors to i128 | Tom Stellard | 2013-08-14 | 2 | -34/+34
| Now that compute support is better on SI, we can't continue using v16i8 for descriptors, since this is also a legal type in OpenCL.
| This patch fixes numerous hangs with the piglit OpenCL test, and since we now use a target-specific DAG node for LOAD_CONSTANT with the correct MemOperandFlags, this should also fix: https://bugs.freedesktop.org/show_bug.cgi?id=66805
| llvm-svn: 188429
* R600/SI: Use i8 types for resource descriptors in tests | Tom Stellard | 2013-08-14 | 4 | -62/+62
| We switched from i32 to i8 types a while ago and the tests were never updated.
| llvm-svn: 188428
* R600/SI: Lower BUILD_VECTOR to REG_SEQUENCE v2 | Tom Stellard | 2013-08-14 | 2 | -1/+51
| Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG instructions should make it easier for the register allocator to coalesce unnecessary copies.
| v2:
| - Use an SGPR register class if all the operands of BUILD_VECTOR are SGPRs.
| llvm-svn: 188427
* R600/SI: Assign a register class to the $vaddr operand for MIMG instructions | Tom Stellard | 2013-08-14 | 1 | -0/+44
| The previous code declared the operand as unknown:$vaddr, which made it possible for scalar registers to be used instead of vector registers.
| llvm-svn: 188425
* R600/SI: Handle MSAA texture targets | Tom Stellard | 2013-08-14 | 1 | -1/+1
| Patch by: Marek Olšák
| Signed-off-by: Marek Olšák <marek.olsak@amd.com>
| llvm-svn: 188421
* R600/SI: Allow conversion between v32i8 and v8i32 | Tom Stellard | 2013-08-14 | 1 | -0/+21
| Patch by: Marek Olšák
| Signed-off-by: Marek Olšák <marek.olsak@amd.com>
| llvm-svn: 188420
* R600/SI: Add pattern for fp_to_uint | Tom Stellard | 2013-08-14 | 1 | -9/+18
| This fixes the F2U opcode for the Mesa driver.
| Patch by: Marek Olšák
| Signed-off-by: Marek Olšák <marek.olsak@amd.com>
| llvm-svn: 188418
* R600: Set scheduling preference to Sched::Source | Tom Stellard | 2013-08-12 | 8 | -8/+8
| R600 doesn't need to do any scheduling on the SelectionDAG now that it has a very good MachineScheduler. Also, using the VLIW SelectionDAG scheduler was having a major impact on compile times. For example, with the phatk kernel, here are the LLVM IR to machine code compile times:
| With Sched::VLIW
|   Total Compile Time: 1.4890 Seconds (User + System)
|   SelectionDAG Instruction Scheduling: 1.1670 Seconds (User + System)
| With Sched::Source
|   Total Compile Time: 0.3330 Seconds (User + System)
|   SelectionDAG Instruction Scheduling: 0.0070 Seconds (User + System)
| The code output was identical with both schedulers. This may not be true for all programs, but it gives me confidence that there won't be much reduction, if any, in code quality by using Sched::Source.
| llvm-svn: 188215
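The timings quoted in this commit can be turned into a speedup and a per-phase share with a few lines of arithmetic. The sketch below only reuses the four numbers from the commit message; the dictionary keys and the `summarize` helper are names made up for this example.

```python
# Timings in seconds, copied from the commit message above.
timings = {
    "Sched::VLIW": {"total": 1.4890, "isel_sched": 1.1670},
    "Sched::Source": {"total": 0.3330, "isel_sched": 0.0070},
}

def summarize(timings):
    """Derive the speedup and the scheduling share from the raw timings."""
    vliw = timings["Sched::VLIW"]
    source = timings["Sched::Source"]
    return {
        "speedup": vliw["total"] / source["total"],
        "vliw_sched_share": vliw["isel_sched"] / vliw["total"],
        "source_sched_share": source["isel_sched"] / source["total"],
        "vliw_other": vliw["total"] - vliw["isel_sched"],
        "source_other": source["total"] - source["isel_sched"],
    }

summary = summarize(timings)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Total compile time drops by roughly 4.5x, and the time spent outside SelectionDAG scheduling is about the same in both runs (0.322 s versus 0.326 s), so essentially the whole saving comes from the scheduler itself.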
* R600/SI: FMA is faster than fmul and fadd for f64 | Niels Ole Salscheider | 2013-08-10 | 1 | -0/+31
| llvm-svn: 188136
* R600/SI: Add FMA pattern | Niels Ole Salscheider | 2013-08-10 | 1 | -0/+31
| llvm-svn: 188135
* R600/SI: Implement fp32<->fp64 conversions | Niels Ole Salscheider | 2013-08-08 | 2 | -0/+18
| llvm-svn: 187988
* R600/SI: Implement sint<->fp64 conversions | Niels Ole Salscheider | 2013-08-08 | 2 | -0/+18
| llvm-svn: 187987
* R600/SI: Use VSrc_* register classes as the default classes for types | Tom Stellard | 2013-08-06 | 1 | -0/+84
| Since the VSrc_* register classes contain both VGPRs and SGPRs, copies that used to be emitted by isel like this:
|   SGPR = COPY VGPR
| Will now be emitted like this:
|   VSrc = COPY VGPR
| This patch also adds a pass that tries to identify and fix situations where a VGPR to SGPR copy may occur. Hopefully, these changes will make it impossible for the compiler to generate illegal VGPR to SGPR copies.
| llvm-svn: 187831
* R600/SI: Add more special cases for opcodes to ensureSRegLimit() | Tom Stellard | 2013-08-06 | 6 | -45/+45
| Also factor out the register class lookup to its own function.
| llvm-svn: 187830
* Factor FlattenCFG out from SimplifyCFG | Tom Stellard | 2013-08-06 | 2 | -0/+115
| Patch by: Mei Ye
| llvm-svn: 187764
* R600/SI: Add missing test for r187749 | Tom Stellard | 2013-08-05 | 1 | -0/+48
| llvm-svn: 187754
* R600: Add 64-bit float load/store support | Tom Stellard | 2013-08-01 | 15 | -43/+161
| * Added R600_Reg64 class
| * Added T#Index#.XY registers definition
| * Added v2i32 register reads from parameter and global space
| * Added f32 and i32 elements extraction from v2f32 and v2i32
| * Added v2i32 -> v2f32 conversions
| Tom Stellard:
| - Mark vec2 operations as expand. The addition of a vec2 register class made them all legal.
| Patch by: Dmitry Cherkassov
| Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
| llvm-svn: 187582
* R600: Use 64-bit alignment for 64-bit kernel arguments | Tom Stellard | 2013-08-01 | 1 | -0/+2
| llvm-svn: 187581
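As a rough illustration of what 64-bit alignment means for an argument buffer, the sketch below computes byte offsets for a list of argument sizes. It rests on assumptions the log does not confirm: arguments are laid out in declaration order, 8-byte arguments are aligned to 8 bytes, and everything else is aligned to 4 bytes (the 4-byte expectation comes from r186916 further down this log). The helper names are hypothetical.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def layout_kernel_args(arg_sizes):
    """Return the byte offset of each argument under the assumed rules."""
    offsets = []
    offset = 0
    for size in arg_sizes:
        # Assumption: only 64-bit (8-byte) arguments get 8-byte alignment.
        alignment = 8 if size == 8 else 4
        offset = align_up(offset, alignment)
        offsets.append(offset)
        offset += size
    return offsets

# An i32, then an i64, then another i32.
print(layout_kernel_args([4, 8, 4]))
```

Under these assumptions the i64 argument is pushed from offset 4 to offset 8, leaving four bytes of padding after the first i32.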
* R600/SI: Custom lower i64 ZERO_EXTEND | Tom Stellard | 2013-08-01 | 1 | -0/+18
| llvm-svn: 187580
* Revert "R600: Non vector only instruction can be scheduled on trans unit" | Tom Stellard | 2013-07-31 | 25 | -185/+73
| This reverts commit 98ce62780ea7185ba710868bf83c8077e8d7f6d6.
| llvm-svn: 187526
* R600: Avoid more than 4 literals in the same instruction group at scheduling | Vincent Lejeune | 2013-07-31 | 1 | -0/+68
| llvm-svn: 187515
* R600: Non vector only instruction can be scheduled on trans unit | Vincent Lejeune | 2013-07-31 | 25 | -73/+185
| llvm-svn: 187514
* R600/SI: Expand vector fp <-> int conversions | Tom Stellard | 2013-07-30 | 4 | -36/+36
| llvm-svn: 187421
* [R600] Replicate old DAGCombiner behavior in target specific DAG combine. | Quentin Colombet | 2013-07-30 | 1 | -1/+0
| build_vector is lowered to REG_SEQUENCE, which is something the register allocator does a good job at optimizing.
| llvm-svn: 187397
* [DAGCombiner] insert_vector_elt: Avoid building a vector twice. | Quentin Colombet | 2013-07-30 | 1 | -0/+1
| This patch prevents the following combine when the input vector is used more than once.
|   insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
|   =>
|   build_vector elt0, ..., NewEltIdx, ..., eltN
| The reasons are:
| - Building a vector may be expensive, so try to reuse the existing part of a vector instead of creating a new one (think big vectors).
| - elt0 to eltN now have two users instead of one. This may prevent some other optimizations.
| llvm-svn: 187396
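The guard described in this commit can be sketched on plain Python lists. This is not the DAGCombiner code: the function name, the list representation of build_vector, and the explicit use-count parameter are all stand-ins chosen for illustration.

```python
def combine_insert_vector_elt(build_vector_elts, new_elt, idx, build_vector_uses):
    """Fold an element insertion into a build_vector, modeled as a list.

    Returns the folded element list, or None when the combine is skipped
    because the input build_vector has more than one user.
    """
    if build_vector_uses > 1:
        # Folding would build a second, nearly identical vector and give
        # every original element an extra user, so leave the node alone.
        return None
    folded = list(build_vector_elts)
    folded[idx] = new_elt
    return folded

print(combine_insert_vector_elt(["a", "b", "c", "d"], "x", 2, build_vector_uses=1))
print(combine_insert_vector_elt(["a", "b", "c", "d"], "x", 2, build_vector_uses=2))
```

With a single user the insertion is folded into a new element list; with two users the function declines, which mirrors the one-use check the commit adds.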
* DAGCombiner: Pass the correct type to TargetLowering::isF(Abs|Neg)Free | Tom Stellard | 2013-07-23 | 3 | -186/+38
| This commit also implements these functions for R600 and removes a test case that was relying on the buggy behavior.
| llvm-svn: 187007
* R600: Treat CONSTANT_ADDRESS loads like GLOBAL_ADDRESS loads when necessary | Tom Stellard | 2013-07-23 | 1 | -25/+122
| These are really the same address space in hardware. The only difference is that CONSTANT_ADDRESS uses a special cache for faster access. When we are unable to use the constant kcache for some reason (e.g. smaller types or lack of indirect addressing) then the instruction selector must use GLOBAL_ADDRESS loads instead.
| llvm-svn: 187006
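The fallback described in this commit amounts to a small decision rule. The sketch below is one possible reading of it, not the instruction selector's actual logic: the function name, the string labels, the 32-bit threshold for "smaller types", and the `indirect` flag (meaning the address is not known at compile time) are all assumptions made for this example.

```python
def select_load_kind(address_space, load_bits, indirect):
    """Pick a load path for the given access under the assumed rules."""
    kcache_ok = (
        address_space == "CONSTANT_ADDRESS"
        and load_bits >= 32      # assumption: sub-32-bit types cannot use the kcache
        and not indirect         # assumption: the kcache path lacks indirect addressing
    )
    return "kcache" if kcache_ok else "global_load"

print(select_load_kind("CONSTANT_ADDRESS", 32, indirect=False))
print(select_load_kind("CONSTANT_ADDRESS", 8, indirect=False))
print(select_load_kind("CONSTANT_ADDRESS", 32, indirect=True))
```

Falling back is safe in this model because, as the commit notes, both address spaces refer to the same memory in hardware and differ only in the cache used.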
* R600: Add support for 24-bit MAD instructions | Tom Stellard | 2013-07-23 | 2 | -0/+90
| Reviewed-by: Vincent Lejeune <vljn at ovi.com>
| llvm-svn: 186923
* R600: Add support for 24-bit MUL instructions | Tom Stellard | 2013-07-23 | 2 | -0/+84
| Reviewed-by: Vincent Lejeune <vljn at ovi.com>
| llvm-svn: 186922
* R600: Improve support for < 32-bit loads | Tom Stellard | 2013-07-23 | 2 | -14/+67
| Reviewed-by: Vincent Lejeune <vljn at ovi.com>
| llvm-svn: 186921
* R600: Move CONST_ADDRESS folding into AMDGPUDAGToDAGISel::Select() | Tom Stellard | 2013-07-23 | 1 | -1/+1
| This increases the number of opportunities we have for folding. With the previous implementation we were unable to fold into any instructions other than the first when multiple instructions were selected from a single SDNode.
| Reviewed-by: Vincent Lejeune <vljn at ovi.com>
| llvm-svn: 186919
* R600: Use KCache for kernel arguments | Tom Stellard | 2013-07-23 | 17 | -90/+86
| Reviewed-by: Vincent Lejeune <vljn at ovi.com>
| llvm-svn: 186918
* R600: Use the same compute kernel calling convention for all GPUs | Tom Stellard | 2013-07-23 | 1 | -2/+2
| A side-effect of this is that now the compiler expects kernel arguments to be 4-byte aligned.
| Reviewed-by: Vincent Lejeune <vljn at ovi.com>
| llvm-svn: 186916
* R600: Use correct LoadExtType when lowering kernel arguments | Tom Stellard | 2013-07-23 | 1 | -0/+19
| Reviewed-by: Vincent Lejeune <vljn at ovi.com>
| llvm-svn: 186915
* R600: Clean up extended load patterns | Tom Stellard | 2013-07-23 | 2 | -1/+2
| Reviewed-by: Vincent Lejeune <vljn at ovi.com>
| llvm-svn: 186914
* R600: Expand vector FNEG | Tom Stellard | 2013-07-23 | 1 | -0/+26
| llvm-svn: 186913
* R600: Don't emit empty then clause and use alu_pop_after | Vincent Lejeune | 2013-07-19 | 3 | -6/+129
| llvm-svn: 186725
* R600/SI: Fix crash with VSELECT | Tom Stellard | 2013-07-18 | 1 | -0/+15
| https://bugs.freedesktop.org/show_bug.cgi?id=66175
| llvm-svn: 186616
* R600/SI: Add support for v2f32 loads | Tom Stellard | 2013-07-18 | 1 | -0/+14
| llvm-svn: 186615
* R600/SI: Add support for v2f32 stores | Tom Stellard | 2013-07-18 | 1 | -0/+18
| llvm-svn: 186614