path: root/llvm/lib/Target/R600
Commit message | Author | Age | Files | Lines
* R600: Enable folding of inline literals into REG_SEQUENCE instructions | Tom Stellard | 2013-08-16 | 2 | -17/+23
    Tested-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 188517
* R600: Add IsExport bit to TableGen instruction definitions | Tom Stellard | 2013-08-16 | 6 | -10/+16
    Tested-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 188516
* R600: Change the RAT instruction assembly names so they match the docs | Tom Stellard | 2013-08-16 | 2 | -32/+35
    Tested-by: Aaron Watry <awatry@gmail.com>
    llvm-svn: 188515
* Fix spelling | Matt Arsenault | 2013-08-15 | 1 | -7/+7
    llvm-svn: 188506
* Tentative fix for global-buffer-overflow caused by r188426. Found by AddressSanitizer | Alexey Samsonov | 2013-08-15 | 1 | -1/+4
    llvm-svn: 188448
* R600/SI: Improve legalization of vector operations | Tom Stellard | 2013-08-14 | 4 | -5/+56
    This should fix hangs in the OpenCL piglit tests.
    llvm-svn: 188431
* R600/SI: Replace v1i32 type with i32 in imageload and sample intrinsics | Tom Stellard | 2013-08-14 | 4 | -4/+18
    llvm-svn: 188430
* R600/SI: Convert v16i8 resource descriptors to i128 | Tom Stellard | 2013-08-14 | 12 | -44/+280
    Now that compute support is better on SI, we can't continue using v16i8 for descriptors, since this is also a legal type in OpenCL.
    This patch fixes numerous hangs with the piglit OpenCL tests. Since we now use a target-specific DAG node for LOAD_CONSTANT with the correct MemOperandFlags, it should also fix: https://bugs.freedesktop.org/show_bug.cgi?id=66805
    llvm-svn: 188429
* R600/SI: Lower BUILD_VECTOR to REG_SEQUENCE v2 | Tom Stellard | 2013-08-14 | 7 | -111/+86
    Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG instructions should make it easier for the register allocator to coalesce unnecessary copies.
    v2:
      - Use an SGPR register class if all the operands of BUILD_VECTOR are SGPRs.
    llvm-svn: 188427
* R600/SI: Choose the correct MOV instruction for copying immediates | Tom Stellard | 2013-08-14 | 5 | -0/+69
    The instruction selector will now try to infer the destination register so it can decide whether to use V_MOV_B32 or S_MOV_B32 when copying immediates.
    llvm-svn: 188426
* R600/SI: Assign a register class to the $vaddr operand for MIMG instructions | Tom Stellard | 2013-08-14 | 7 | -60/+109
    The previous code declared the operand as unknown:$vaddr, which made it possible for scalar registers to be used instead of vector registers.
    llvm-svn: 188425
* R600/SI: Handle MSAA texture targets | Tom Stellard | 2013-08-14 | 2 | -2/+33
    Patch by: Marek Olšák
    Signed-off-by: Marek Olšák <marek.olsak@amd.com>
    llvm-svn: 188421
* R600/SI: Allow conversion between v32i8 and v8i32 | Tom Stellard | 2013-08-14 | 2 | -2/+7
    Patch by: Marek Olšák
    Signed-off-by: Marek Olšák <marek.olsak@amd.com>
    llvm-svn: 188420
* R600/SI: Fix an obvious typo | Tom Stellard | 2013-08-14 | 1 | -1/+1
    Patch by: Marek Olšák
    Signed-off-by: Marek Olšák <marek.olsak@amd.com>
    llvm-svn: 188419
* R600/SI: Add pattern for fp_to_uint | Tom Stellard | 2013-08-14 | 1 | -1/+3
    This fixes the F2U opcode for the Mesa driver.
    Patch by: Marek Olšák
    Signed-off-by: Marek Olšák <marek.olsak@amd.com>
    llvm-svn: 188418
* R600: Set scheduling preference to Sched::Source | Tom Stellard | 2013-08-12 | 1 | -1/+1
    R600 doesn't need to do any scheduling on the SelectionDAG now that it has a very good MachineScheduler. Also, using the VLIW SelectionDAG scheduler was having a major impact on compile times. For example, with the phatk kernel, here are the LLVM IR to machine code compile times:
    With Sched::VLIW
      Total Compile Time: 1.4890 Seconds (User + System)
      SelectionDAG Instruction Scheduling: 1.1670 Seconds (User + System)
    With Sched::Source
      Total Compile Time: 0.3330 Seconds (User + System)
      SelectionDAG Instruction Scheduling: 0.0070 Seconds (User + System)
    The code output was identical with both schedulers. This may not be true for all programs, but it gives me confidence that there won't be much reduction, if any, in code quality by using Sched::Source.
    llvm-svn: 188215
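The compile-time figures quoted in the Sched::Source commit can be put in proportion with a little arithmetic. The sketch below is only an illustration: the dictionary layout and helper name are made up for this note, and the only data used are the four timings from the commit message.

```python
# Timings copied from the commit message (seconds, user + system).
# The dictionary layout and names below are illustrative, not from LLVM.
timings = {
    "Sched::VLIW": {"total": 1.4890, "dag_scheduling": 1.1670},
    "Sched::Source": {"total": 0.3330, "dag_scheduling": 0.0070},
}


def ratio(before: float, after: float) -> float:
    """Return how many times larger `before` is than `after`."""
    return before / after


vliw = timings["Sched::VLIW"]
source = timings["Sched::Source"]

# Overall compile-time ratio between the two scheduler settings.
total_speedup = ratio(vliw["total"], source["total"])

# Fraction of the VLIW compile time spent in SelectionDAG scheduling.
vliw_sched_share = vliw["dag_scheduling"] / vliw["total"]

# Absolute savings, in total and in the scheduling phase alone.
total_saved = vliw["total"] - source["total"]
sched_saved = vliw["dag_scheduling"] - source["dag_scheduling"]

print(f"total speedup:         {total_speedup:.2f}x")
print(f"VLIW scheduling share: {vliw_sched_share:.1%}")
print(f"total time saved:      {total_saved:.3f} s")
print(f"scheduling time saved: {sched_saved:.3f} s")
```

The scheduling-phase saving and the total saving come out almost equal, which is consistent with the commit's claim that the SelectionDAG scheduler accounted for the difference in this one measurement.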
* R600/SI: FMA is faster than fmul and fadd for f64 | Niels Ole Salscheider | 2013-08-10 | 2 | -0/+19
    llvm-svn: 188136
* R600/SI: Add FMA pattern | Niels Ole Salscheider | 2013-08-10 | 1 | -2/+6
    llvm-svn: 188135
* R600/SI: Implement fp32<->fp64 conversions | Niels Ole Salscheider | 2013-08-08 | 2 | -2/+9
    llvm-svn: 187988
* R600/SI: Implement sint<->fp64 conversions | Niels Ole Salscheider | 2013-08-08 | 2 | -2/+12
    llvm-svn: 187987
* Initialize SIInsertWaits::ExpInstrTypesSeen in the pass constructor. | Evgeniy Stepanov | 2013-08-07 | 1 | -1/+2
    This value may be used uninitialized in SIInsertWaits::insertWait. Found with MemorySanitizer.
    llvm-svn: 187869
* R600: Add new file from r187831 to CMakeLists.txt | Tom Stellard | 2013-08-06 | 1 | -0/+1
    llvm-svn: 187834
* R600/SI: Use VSrc_* register classes as the default classes for types | Tom Stellard | 2013-08-06 | 5 | -44/+163
    Since the VSrc_* register classes contain both VGPRs and SGPRs, copies that used to be emitted by isel like this:
      SGPR = COPY VGPR
    will now be emitted like this:
      VSrc = COPY VGPR
    This patch also adds a pass that tries to identify and fix situations where a VGPR to SGPR copy may occur. Hopefully, these changes will make it impossible for the compiler to generate illegal VGPR to SGPR copies.
    llvm-svn: 187831
* R600/SI: Add more special cases for opcodes to ensureSRegLimit() | Tom Stellard | 2013-08-06 | 4 | -32/+83
    Also factor out the register class lookup to its own function.
    llvm-svn: 187830
* Target/*/CMakeLists.txt: Add the dependency to CommonTableGen explicitly for each corresponding CodeGen. | NAKAMURA Takumi | 2013-08-06 | 1 | -1/+1
    Without explicit dependencies, both the per-file action and the in-CommonTableGen action could run in parallel and race to emit *.inc files simultaneously.
    llvm-svn: 187780
* Factor FlattenCFG out from SimplifyCFG | Tom Stellard | 2013-08-06 | 1 | -1/+1
    Patch by: Mei Ye
    llvm-svn: 187764
* R600: Implement TargetLowering::getVectorIdxTy() | Tom Stellard | 2013-08-05 | 3 | -5/+14
    We use MVT::i32 for the vector index type, because we use 32-bit operations to calculate offsets when dynamically indexing vectors.
    llvm-svn: 187749
* R600: Add 64-bit float load/store support | Tom Stellard | 2013-08-01 | 8 | -23/+139
      * Added R600_Reg64 class
      * Added T#Index#.XY registers definition
      * Added v2i32 register reads from parameter and global space
      * Added f32 and i32 elements extraction from v2f32 and v2i32
      * Added v2i32 -> v2f32 conversions
    Tom Stellard:
      - Mark vec2 operations as expand. The addition of a vec2 register class made them all legal.
    Patch by: Dmitry Cherkassov
    Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
    llvm-svn: 187582
* R600: Use 64-bit alignment for 64-bit kernel arguments | Tom Stellard | 2013-08-01 | 1 | -1/+1
    llvm-svn: 187581
* R600/SI: Custom lower i64 ZERO_EXTEND | Tom Stellard | 2013-08-01 | 2 | -0/+16
    llvm-svn: 187580
* Revert "R600: Non vector only instruction can be scheduled on trans unit" | Tom Stellard | 2013-07-31 | 4 | -60/+19
    This reverts commit 98ce62780ea7185ba710868bf83c8077e8d7f6d6.
    llvm-svn: 187526
* Revert "R600: Use SchedModel enum for is{Trans,Vector}Only functions" | Tom Stellard | 2013-07-31 | 4 | -19/+23
    This reverts commit 3f1de26cb5cc0543a6a1d71259a7a39d97139051.
    llvm-svn: 187524
* R600: Do not merge vector after a vector reg is used | Vincent Lejeune | 2013-07-31 | 1 | -1/+10
    If we merge a vector after it has been used, it will generate an artificial antidependency that can prevent two tex/vtx instructions from using the same clause and thus generate extra clauses that reduce performance.
    There is no test case, as such a situation is really hard to predict.
    llvm-svn: 187516
* R600: Avoid more than 4 literals in the same instruction group at scheduling | Vincent Lejeune | 2013-07-31 | 1 | -0/+5
    llvm-svn: 187515
* R600: Non vector only instruction can be scheduled on trans unit | Vincent Lejeune | 2013-07-31 | 4 | -19/+60
    llvm-svn: 187514
* R600: Don't mix LDS and non-LDS instructions in the same group | Vincent Lejeune | 2013-07-31 | 1 | -0/+4
    There are a lot of restrictions on instruction groups that contain LDS instructions, so for now we will be conservative and not packetize anything else with them.
    llvm-svn: 187513
* R600: Use SchedModel enum for is{Trans,Vector}Only functions | Vincent Lejeune | 2013-07-31 | 4 | -23/+19
    llvm-svn: 187512
* R600: Remove predicated_break inst | Vincent Lejeune | 2013-07-31 | 4 | -59/+7
    We were using two instructions for a similar purpose: break and predicated_break. Only predicated_break was emitted, and it was lowered at R600ControlFlowFinalizer to JUMP;CF_BREAK;POP.
    This commit simplifies the situation by making AMDILCFGStructurizer emit IF_PREDICATE;BREAK;ENDIF; instead of predicated_break (which is now removed).
    There is no functionality change.
    llvm-svn: 187510
* R600/SI: Expand vector fp <-> int conversions | Tom Stellard | 2013-07-30 | 2 | -4/+4
    llvm-svn: 187421
* [R600] Replicate old DAGCombiner behavior in target specific DAG combine. | Quentin Colombet | 2013-07-30 | 1 | -0/+56
    build_vector is lowered to REG_SEQUENCE, which is something the register allocator does a good job at optimizing.
    llvm-svn: 187397
* SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions | Tom Stellard | 2013-07-27 | 5 | -0/+110
    Merge consecutive if-regions if they contain identical statements. Both transformations reduce the number of branches. The transformation is guarded by a target hook and is currently enabled only for +R600, but its correctness has been tested on the X86 target using a variety of CPU benchmarks.
    Patch by: Mei Ye
    llvm-svn: 187278
* DAGCombiner: Pass the correct type to TargetLowering::isF(Abs|Neg)Free | Tom Stellard | 2013-07-23 | 2 | -0/+17
    This commit also implements these functions for R600 and removes a test case that was relying on the buggy behavior.
    llvm-svn: 187007
* R600: Treat CONSTANT_ADDRESS loads like GLOBAL_ADDRESS loads when necessary | Tom Stellard | 2013-07-23 | 2 | -19/+7
    These are really the same address space in hardware. The only difference is that CONSTANT_ADDRESS uses a special cache for faster access. When we are unable to use the constant kcache for some reason (e.g. smaller types or lack of indirect addressing) then the instruction selector must use GLOBAL_ADDRESS loads instead.
    llvm-svn: 187006
* R600: Add support for 24-bit MAD instructions | Tom Stellard | 2013-07-23 | 2 | -2/+12
    Reviewed-by: Vincent Lejeune <vljn at ovi.com>
    llvm-svn: 186923
* R600: Add support for 24-bit MUL instructions | Tom Stellard | 2013-07-23 | 4 | -5/+75
    Reviewed-by: Vincent Lejeune <vljn at ovi.com>
    llvm-svn: 186922
* R600: Improve support for < 32-bit loads | Tom Stellard | 2013-07-23 | 4 | -11/+39
    Reviewed-by: Vincent Lejeune <vljn at ovi.com>
    llvm-svn: 186921
* R600: Rename AMDILISelDAGToDAG.cpp -> AMDGPUISelDAGToDAG.cpp | Tom Stellard | 2013-07-23 | 2 | -1/+1
    Reviewed-by: Vincent Lejeune <vljn at ovi.com>
    llvm-svn: 186920
* R600: Move CONST_ADDRESS folding into AMDGPUDAGToDAGISel::Select() | Tom Stellard | 2013-07-23 | 4 | -49/+160
    This increases the number of opportunities we have for folding. With the previous implementation we were unable to fold into any instructions other than the first when multiple instructions were selected from a single SDNode.
    Reviewed-by: Vincent Lejeune <vljn at ovi.com>
    llvm-svn: 186919
* R600: Use KCache for kernel arguments | Tom Stellard | 2013-07-23 | 4 | -49/+22
    Reviewed-by: Vincent Lejeune <vljn at ovi.com>
    llvm-svn: 186918
* R600: Simplify assembly for KCache registers using the TableGen !add operator | Tom Stellard | 2013-07-23 | 1 | -4/+4
    Before:
      MOV * T0.W, KC0[131-128].Y
    After:
      MOV * T0.W, KC0[3].Y
    Reviewed-by: Vincent Lejeune <vljn at ovi.com>
    llvm-svn: 186917
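The Before/After pair in the KCache assembly commit comes down to folding the subtraction into the printed register index. The Python sketch below only illustrates that arithmetic; it is not the TableGen change itself. The function name, its parameters, and the base offset of 128 (read off the `131-128` in the "Before" line) are assumptions made for this illustration.

```python
# Assumed base offset, inferred from "KC0[131-128]" in the commit message.
KCACHE_BASE = 128


def kcache_operand(bank: int, reg_index: int, channel: str, folded: bool = True) -> str:
    """Format a KCache operand name (hypothetical helper, not an LLVM API).

    With folded=False the index is printed as an unevaluated subtraction,
    as in the "Before" line; with folded=True the subtraction is evaluated
    first, as in the "After" line.
    """
    if folded:
        return f"KC{bank}[{reg_index - KCACHE_BASE}].{channel}"
    return f"KC{bank}[{reg_index}-{KCACHE_BASE}].{channel}"


before = kcache_operand(0, 131, "Y", folded=False)
after = kcache_operand(0, 131, "Y")
print(f"MOV * T0.W, {before}")
print(f"MOV * T0.W, {after}")
```

Evaluating the index when the name is generated keeps the emitted assembly closer to how a reader would refer to the register.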