path: root/llvm/lib/Target/R600
Commit message | Author | Age | Files | Lines
...
* R600/SI: Expand udiv v[24]i32 for SI and v2i32 for EG | Aaron Watry | 2013-06-25 | 2 | -0/+4
  Also add lit test for both cases on SI, and v2i32 for evergreen. Note: I followed the guidance of the v4i32 EG check... UDIV produces really complex code, so let's just check that the instruction was lowered successfully.
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 184843
* R600/SI: Expand ashr of v2i32/v4i32 for SI | Aaron Watry | 2013-06-25 | 1 | -0/+2
  Also add lit test for both cases on SI, and v2i32 for evergreen.
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 184842
* R600/SI: Expand srl of v2i32/v4i32 for SI | Aaron Watry | 2013-06-25 | 1 | -0/+2
  Also add lit test for both cases on SI, and v2i32 for evergreen.
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 184841
* R600/SI: Expand shl of v2i32/v4i32 for SI | Aaron Watry | 2013-06-25 | 1 | -0/+3
  Also add lit test for both cases on SI, and v2i32 for evergreen.
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 184840
* R600/SI: Expand or of v2i32/v4i32 for SI | Aaron Watry | 2013-06-25 | 1 | -0/+3
  Also add lit test for both cases on SI, and v2i32 for evergreen.
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 184839
* R600/SI: Expand mul of v2i32/v4i32 for SI | Aaron Watry | 2013-06-25 | 1 | -0/+3
  Also add lit test for both cases on SI, and v2i32 for evergreen.
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 184838
* R600/SI: Expand and of v2i32/v4i32 for SI | Aaron Watry | 2013-06-25 | 1 | -0/+3
  Also add lit test for both cases on SI, and v2i32 for evergreen.
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  llvm-svn: 184837
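The "Expand" commits above mark a vector operation as having no native instruction, so the legalizer has to rewrite it in terms of operations the target does support. For element-wise integer ops this typically amounts to one scalar operation per lane. The sketch below illustrates that idea for a v4i32 unsigned divide in standalone C++. It is not LLVM code: `V4I32` and `expand_udiv` are hypothetical names, and the divisor lanes are assumed to be non-zero.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a v4i32 value; not an LLVM type.
using V4I32 = std::array<std::uint32_t, 4>;

// Scalarized unsigned division: one scalar udiv per lane.
// Assumption: every divisor lane is non-zero.
V4I32 expand_udiv(const V4I32 &lhs, const V4I32 &rhs) {
    V4I32 result{};
    for (std::size_t lane = 0; lane < result.size(); ++lane) {
        result[lane] = lhs[lane] / rhs[lane];
    }
    return result;
}
```

The same per-lane shape would apply to the shl, srl, ashr, and, or, and mul cases listed above, with only the scalar operator changing.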
* R600/SI: Report unaligned memory accesses as legal for > 32-bit types | Tom Stellard | 2013-06-25 | 2 | -0/+13
  In reality, some unaligned memory accesses are legal for 32-bit types and smaller too, but it all depends on the address space. Allowing unaligned loads/stores for > 32-bit types is mainly to prevent the legalizer from splitting one load into multiple loads of smaller types.
  https://bugs.freedesktop.org/show_bug.cgi?id=65873
  llvm-svn: 184822
* R600: Add support for i32 loads from the constant address space on Cayman | Tom Stellard | 2013-06-25 | 1 | -0/+9
  Tested-By: Aaron Watry <awatry@gmail.com>
  llvm-svn: 184821
* R600/SI: Add support for v4i32 and v4f32 kernel args | Tom Stellard | 2013-06-25 | 1 | -4/+5
  Tested-By: Aaron Watry <awatry@gmail.com>
  llvm-svn: 184820
* R600: Fix typo in R600Schedule.td | Tom Stellard | 2013-06-25 | 1 | -2/+2
  This should only make a difference in programs that use a lot of the vector ALU instructions like BFI_INT and BIT_ALIGN. There is a slight improvement in the phatk bitcoin mining kernel with this patch on Evergreen (vector size == 1):
    Before: 1173 Instruction Groups / 9520 dwords
    After:  1167 Instruction Groups / 9510 dwords
  Reviewed-by: Vincent Lejeune <vljn at ovi.com>
  llvm-svn: 184819
* R600: Fix spelling error in comment | Aaron Watry | 2013-06-24 | 1 | -1/+1
  our -> or
  llvm-svn: 184756
* R600/SI: Expand sub for v2i32 and v4i32 for SI | Tom Stellard | 2013-06-20 | 1 | -0/+3
  Also add a v2i32 test to the existing v4i32 test.
  Patch by: Aaron Watry
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 184482
* R600/SI: Expand add for v2i32 and v4i32 | Tom Stellard | 2013-06-20 | 1 | -0/+2
  Also add SI tests to existing file and a v2i32 test for both R600 and SI.
  Patch by: Aaron Watry
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 184481
* R600: Expand v2i32 load/store instead of custom lowering | Tom Stellard | 2013-06-20 | 1 | -2/+2
  The custom lowering causes llc to crash with a segfault. Ideally, the custom lowering can be fixed, but this allows programs which load/store v2i32 to work without crashing.
  Patch by: Aaron Watry
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 184480
* Access the TargetLoweringInfo from the TargetMachine object instead of caching it. | Bill Wendling | 2013-06-19 | 2 | -4/+5
  The TLI may change between functions. No functionality change.
  llvm-svn: 184360
* Move StructurizeCFG out of R600 to generic Transforms. | Matt Arsenault | 2013-06-19 | 3 | -898/+1
  Register it with PassManager
  llvm-svn: 184343
* Use GetUnderlyingObject instead of custom function | Matt Arsenault | 2013-06-18 | 1 | -58/+20
  llvm-svn: 184261
* Remove dead prototype. | Bill Wendling | 2013-06-18 | 1 | -2/+0
  llvm-svn: 184173
* R600: PV stores Reg id, not index | Vincent Lejeune | 2013-06-17 | 1 | -1/+1
  llvm-svn: 184117
* R600: Properly set COUNT_3 bit in TEX clause initiating inst for pre EG gen. | Vincent Lejeune | 2013-06-17 | 1 | -14/+16
  Fixes rv7x0 bug in Heaven reported here: https://bugs.freedesktop.org/show_bug.cgi?id=64257
  llvm-svn: 184116
* R600: Add SI load support for v[24]i32 and store for v2i32 | Tom Stellard | 2013-06-15 | 1 | -0/+5
  Also add a separate vector lit test file, since r600 doesn't seem to handle v2i32 load/store yet, but we can test both for SI.
  Patch by: Aaron Watry
  Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
  Signed-off-by: Aaron Watry <awatry@gmail.com>
  llvm-svn: 184021
* R600: Use correct encoding for Vertex Fetch instructions on Cayman | Tom Stellard | 2013-06-14 | 3 | -156/+294
  Reviewed-by: Vincent Lejeune <vljn at ovi.com>
  llvm-svn: 184016
* R600: Use EXPORT_RAT_INST_STORE_DWORD for stores on Cayman | Tom Stellard | 2013-06-14 | 2 | -37/+57
  We were using RAT_INST_STORE_RAW, which seemed to work, but the docs say this instruction doesn't exist for Cayman, so it's probably safer to use a documented instruction instead.
  Reviewed-by: Vincent Lejeune <vljn at ovi.com>
  llvm-svn: 184015
* R600: Factor the instruction encoding out the RAT_WRITE_CACHELESS_eg class | Tom Stellard | 2013-06-14 | 2 | -50/+68
  Reviewed-by: Vincent Lejeune <vljn at ovi.com>
  llvm-svn: 184014
* R600: Move instruction encoding definitions into a separate .td file | Tom Stellard | 2013-06-14 | 2 | -362/+393
  Reviewed-by: Vincent Lejeune <vljn at ovi.com>
  llvm-svn: 184013
* R600: Don't try to fix reg class when copying IMPLICIT_DEF to a register | Tom Stellard | 2013-06-13 | 1 | -1/+2
  The test case for this is way too complex to be useful as a lit test, and I was unable to reduce it.
  https://bugs.freedesktop.org/show_bug.cgi?id=65438
  llvm-svn: 183937
* R600: Make helper functions static. | Benjamin Kramer | 2013-06-11 | 1 | -4/+5
  llvm-svn: 183744
* R600: Use a refined heuristic to choose when switching clause | Vincent Lejeune | 2013-06-07 | 2 | -10/+47
  This is using a hint from the AMD APP OpenCL Programming Guide with empirically tweaked parameters. I used Unigine Heaven 3.0 to determine the best parameters on my system (i7 2600/Radeon 6950/Kernel 3.9.4): the benchmark went from 38.8 average fps to 39.6, which is a ~2% gain. (The Lightmark 2008.2 gain is much more marginal: from 537 to 539.)
  There is no lit test provided, as the parameters were determined empirically and it would be nearly impossible to find a test program that checks for optimal behavior.
  llvm-svn: 183593
* R600: Anti dep better handled in tex clause | Vincent Lejeune | 2013-06-07 | 1 | -6/+4
  llvm-svn: 183592
* R600: Fix calculation of stack offset in AMDGPUFrameLowering | Tom Stellard | 2013-06-07 | 1 | -21/+2
  We weren't computing structure size correctly and we were relying on the original alloca instruction to compute the offset, which isn't always reliable.
  Reviewed-by: Vincent Lejeune <vljn@ovi.com>
  llvm-svn: 183568
* R600: Rework subtarget info and remove AMDILDevice classes | Tom Stellard | 2013-06-07 | 36 | -1458/+218
  This should simplify the subtarget definitions and make it easier to add new ones.
  Reviewed-by: Vincent Lejeune <vljn@ovi.com>
  llvm-svn: 183566
* Don't cache the instruction and register info from the TargetMachine, because the internals of TargetMachine could change. | Bill Wendling | 2013-06-07 | 21 | -63/+75
  No functionality change intended.
  llvm-svn: 183561
* R600: Fix the fetch limits for R600 generation GPUs | Tom Stellard | 2013-06-07 | 4 | -27/+30
  Reviewed-by: Vincent Lejeune <vljn@ovi.com>
  https://bugs.freedesktop.org/show_bug.cgi?id=64257
  llvm-svn: 183560
* R600: Move Subtarget feature definitions into AMDGPU.td | Tom Stellard | 2013-06-07 | 2 | -64/+66
  This is the convention used by the other targets.
  Reviewed-by: Vincent Lejeune <vljn@ovi.com>
  llvm-svn: 183559
* R600: Remove unnecessary include | Tom Stellard | 2013-06-07 | 3 | -2/+4
  Reviewed-by: Vincent Lejeune <vljn@ovi.com>
  llvm-svn: 183558
* R600: Don't compare iterators of different maps. | Benjamin Kramer | 2013-06-07 | 1 | -1/+1
  Found by libstdc++'s debug mode.
  llvm-svn: 183549
* Vincent says the element is at most once in the vector, so we don't need a full std::remove. | Benjamin Kramer | 2013-06-07 | 1 | -3/+7
  llvm-svn: 183541
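When an element is known to appear at most once, a single `std::find` followed by `erase` is enough, and the search can stop at the first match instead of scanning the whole range the way `std::remove` does. The sketch below is a generic illustration of that pattern, not the code from this commit; `erase_single` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

// Erase `value` from `items`, assuming it occurs at most once.
// Returns true if an element was erased.
bool erase_single(std::vector<int> &items, int value) {
    auto it = std::find(items.begin(), items.end(), value);
    if (it == items.end())
        return false;   // not present, nothing to do
    items.erase(it);    // shifts the tail down by one slot
    return true;
}
```

If the "at most once" assumption did not hold, this would leave later duplicates in place, which is the case the full erase-remove idiom covers.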
* R600: Fix a potential iterator invalidation issue. | Benjamin Kramer | 2013-06-07 | 1 | -5/+3
  As a bonus this reduces the loop from O(n^2) to O(n).
  llvm-svn: 183532
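The commit text does not show the loop in question, so the following is only a generic sketch of how an iterator-invalidation fix can also change complexity. Calling `erase` on a `std::vector` inside a loop invalidates iterators at and after the erased position and shifts the tail every time, which is quadratic in the worst case. The erase-remove idiom compacts the surviving elements in one linear pass and erases once. The function name and the "negative values" predicate are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <vector>

// Drop every negative entry in a single pass.
// std::remove_if moves the kept elements to the front and returns the
// new logical end; one erase call then trims the leftover tail.
void drop_negatives(std::vector<int> &values) {
    values.erase(std::remove_if(values.begin(), values.end(),
                                [](int v) { return v < 0; }),
                 values.end());
}
```

No iterator is held across a mutation here, so there is nothing left to invalidate.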
* R600: Remove an extra break in R600OptimizeVectorRegisters.cpp | Vincent Lejeune | 2013-06-07 | 1 | -3/+1
  llvm-svn: 183528
* R600: Rewrite an awkward loop in R600MachineScheduler | Vincent Lejeune | 2013-06-06 | 1 | -7/+15
  llvm-svn: 183458
* R600: Remove leftover code in R600MachineScheduler.cpp | Vincent Lejeune | 2013-06-06 | 1 | -16/+0
  Spotted by Benjamin Kramer.
  llvm-svn: 183413
* Cast to the correct type. Pointer, not reference. | Bill Wendling | 2013-06-06 | 1 | -1/+1
  llvm-svn: 183385
* R600OptimizeVectorRegisters.cpp: Tweak a warning. [-Wsometimes-uninitialized] | NAKAMURA Takumi | 2013-06-06 | 1 | -1/+1
  FIXME: Is it a false alarm?
  llvm-svn: 183371
* R600OptimizeVectorRegisters.cpp: Suppress a warning. [-Wunused-variable] | NAKAMURA Takumi | 2013-06-06 | 1 | -0/+1
  llvm-svn: 183370
* Trailing linefeed. | NAKAMURA Takumi | 2013-06-06 | 1 | -1/+0
  llvm-svn: 183369
* Cast to the proper type. | Bill Wendling | 2013-06-06 | 1 | -1/+1
  llvm-svn: 183365
* R600: Replace predicate loop with predicate function | Tom Stellard | 2013-06-05 | 1 | -11/+13
  llvm-svn: 183351
* R600: Add a pass that merge Vector Register | Vincent Lejeune | 2013-06-05 | 4 | -0/+370
  Previously committed @183279 but tests were failing, reverted @183286. It was broken because @183336 was missing; now it's there.
  llvm-svn: 183343
* R600: Schedule copy from phys register at beginning of block | Vincent Lejeune | 2013-06-05 | 2 | -1/+32
  It allows the regalloc pass to remove them by trivially assigning the associated reg.
  llvm-svn: 183336