| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
| |
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188517
|
|
|
|
|
| |
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188516
|
|
|
|
|
| |
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188515
|
|
|
|
| |
llvm-svn: 188506
|
|
|
|
|
|
| |
AddressSanitizer
llvm-svn: 188448
|
|
|
|
|
|
| |
This should fix hangs in the OpenCL piglit tests.
llvm-svn: 188431
|
|
|
|
| |
llvm-svn: 188430
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.
This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:
https://bugs.freedesktop.org/show_bug.cgi?id=66805
llvm-svn: 188429
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG
instructions should make it easier for the register allocator to coalesce
unnecessary copies.
v2:
- Use an SGPR register class if all the operands of BUILD_VECTOR are
SGPRs.
llvm-svn: 188427
|
|
|
|
|
|
|
|
| |
The instruction selector will now try to infer the destination register
so it can decide whether to use V_MOV_B32 or S_MOV_B32 when copying
immediates.
llvm-svn: 188426
|
|
|
|
|
|
|
| |
The previous code declared the operand as unknown:$vaddr, which made
it possible for scalar registers to be used instead of vector registers.
llvm-svn: 188425
|
|
|
|
|
|
|
| |
Patch by: Marek Olšák
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188421
|
|
|
|
|
|
|
| |
Patch by: Marek Olšák
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188420
|
|
|
|
|
|
|
| |
Patch by: Marek Olšák
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188419
|
|
|
|
|
|
|
|
|
| |
This fixes the F2U opcode for the Mesa driver.
Patch by: Marek Olšák
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188418
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
R600 doesn't need to do any scheduling on the SelectionDAG now that it
has a very good MachineScheduler. Also, using the VLIW SelectionDAG
scheduler was having a major impact on compile times. For example with
the phatk kernel here are the LLVM IR to machine code compile times:
With Sched::VLIW
Total Compile Time: 1.4890 Seconds (User + System)
SelectionDAG Instruction Scheduling: 1.1670 Seconds (User + System)
With Sched::Source
Total Compile Time: 0.3330 Seconds (User + System)
SelectionDAG Instruction Scheduling: 0.0070 Seconds (User + System)
The code output was identical with both schedulers. This may not be true
for all programs, but it gives me confidence that there won't be much
reduction, if any, in code quality by using Sched::Source.
llvm-svn: 188215
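The timings quoted in the message above can be turned into ratios with a short calculation. This is only a sketch over the four numbers given for the phatk kernel; the dictionary layout and the helper name `sched_share` are illustrative and not part of the commit.

```python
# Timings copied from the commit message above (seconds, User + System).
timings = {
    "Sched::VLIW": {"total": 1.4890, "dag_sched": 1.1670},
    "Sched::Source": {"total": 0.3330, "dag_sched": 0.0070},
}


def sched_share(entry):
    """Fraction of total compile time spent in SelectionDAG scheduling."""
    return entry["dag_sched"] / entry["total"]


# Ratio of total compile times between the two scheduler settings.
speedup = timings["Sched::VLIW"]["total"] / timings["Sched::Source"]["total"]

for name, entry in timings.items():
    print(f"{name}: scheduling is {sched_share(entry):.1%} of total compile time")
print(f"Total compile time ratio (VLIW / Source): {speedup:.2f}x")
```

For this one kernel, SelectionDAG scheduling accounts for most of the compile time with Sched::VLIW and only a small fraction with Sched::Source.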
|
|
|
|
| |
llvm-svn: 188136
|
|
|
|
| |
llvm-svn: 188135
|
|
|
|
| |
llvm-svn: 187988
|
|
|
|
| |
llvm-svn: 187987
|
|
|
|
|
|
|
| |
This value may be used uninitialized in SIInsertWaits::insertWait.
Found with MemorySanitizer.
llvm-svn: 187869
|
|
|
|
| |
llvm-svn: 187834
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the VSrc_* register classes contain both VGPRs and SGPRs, copies
that used to be emitted by isel like this:
SGPR = COPY VGPR
Will now be emitted like this:
VSrc = COPY VGPR
This patch also adds a pass that tries to identify and fix situations where
a VGPR to SGPR copy may occur. Hopefully, these changes will make it
impossible for the compiler to generate illegal VGPR to SGPR copies.
llvm-svn: 187831
|
|
|
|
|
|
| |
Also factor out the register class lookup to its own function.
llvm-svn: 187830
|
|
|
|
|
|
|
|
|
| |
each corresponding CodeGen.
Without explicit dependencies, both the per-file action and the in-CommonTableGen action could run in parallel
and race to emit *.inc files simultaneously.
llvm-svn: 187780
|
|
|
|
|
|
| |
Patch by: Mei Ye
llvm-svn: 187764
|
|
|
|
|
|
|
| |
We use MVT::i32 for the vector index type, because we use 32-bit
operations to calculate offsets when dynamically indexing vectors.
llvm-svn: 187749
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Added R600_Reg64 class
* Added T#Index#.XY registers definition
* Added v2i32 register reads from parameter and global space
* Added f32 and i32 elements extraction from v2f32 and v2i32
* Added v2i32 -> v2f32 conversions
Tom Stellard:
- Mark vec2 operations as expand. The addition of a vec2 register
class made them all legal.
Patch by: Dmitry Cherkassov
Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
llvm-svn: 187582
|
|
|
|
| |
llvm-svn: 187581
|
|
|
|
| |
llvm-svn: 187580
|
|
|
|
|
|
| |
This reverts commit 98ce62780ea7185ba710868bf83c8077e8d7f6d6.
llvm-svn: 187526
|
|
|
|
|
|
| |
This reverts commit 3f1de26cb5cc0543a6a1d71259a7a39d97139051.
llvm-svn: 187524
|
|
|
|
|
|
|
|
|
|
| |
If we merge a vector when a vector is used, it will generate an artificial
antidependency that can prevent 2 tex/vtx instructions from using the same
clause and thus generate extra clauses that reduce performance.
There is no test case, as such a situation is really hard to predict.
llvm-svn: 187516
|
|
|
|
| |
llvm-svn: 187515
|
|
|
|
| |
llvm-svn: 187514
|
|
|
|
|
|
|
|
| |
There are a lot of restrictions on instruction groups that contain
LDS instructions, so for now we will be conservative and not packetize
anything else with them.
llvm-svn: 187513
|
|
|
|
| |
llvm-svn: 187512
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were using two instructions for a similar purpose: break and
predicated break. Only predicated_break was emitted and it was
lowered at R600ControlFlowFinalizer to JUMP;CF_BREAK;POP.
This commit simplifies the situation by making AMDILCFGStructurizer
emit IF_PREDICATE;BREAK;ENDIF; instead of predicated_break (which
is now removed).
There is no functionality change.
llvm-svn: 187510
|
|
|
|
| |
llvm-svn: 187421
|
|
|
|
|
|
|
| |
build_vector is lowered to REG_SEQUENCE, which is something the register
allocator does a good job at optimizing.
llvm-svn: 187397
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
conditions
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce the number of branches. The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on the X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
|
|
|
|
|
|
|
| |
This commit also implements these functions for R600 and removes a test
case that was relying on the buggy behavior.
llvm-svn: 187007
|
|
|
|
|
|
|
|
|
|
| |
These are really the same address space in hardware. The only
difference is that CONSTANT_ADDRESS uses a special cache for faster
access. When we are unable to use the constant kcache for some reason
(e.g. smaller types or lack of indirect addressing) then the instruction
selector must use GLOBAL_ADDRESS loads instead.
llvm-svn: 187006
|
|
|
|
|
| |
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186923
|
|
|
|
|
| |
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186922
|
|
|
|
|
| |
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186921
|
|
|
|
|
| |
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186920
|
|
|
|
|
|
|
|
|
|
| |
This increases the number of opportunities we have for folding. With the
previous implementation we were unable to fold into any instructions
other than the first when multiple instructions were selected from a
single SDNode.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186919
|
|
|
|
|
| |
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186918
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before:
MOV * T0.W, KC0[131-128].Y
After:
MOV * T0.W, KC0[3].Y
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186917
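The Before/After pair above suggests that the printed kcache index is now the evaluated difference (131 - 128 = 3) instead of the unevaluated expression. A minimal sketch of that formatting follows, assuming a base of 128 read off the Before line; `KCACHE_BASE` and `format_kc0` are hypothetical names for illustration and are not taken from the LLVM sources.

```python
# Assumption: the base offset is the 128 that appears in "KC0[131-128]".
KCACHE_BASE = 128


def format_kc0(raw_index, channel):
    """Format a KC0 operand with the base already subtracted from the index."""
    return f"KC0[{raw_index - KCACHE_BASE}].{channel}"


print("MOV * T0.W, " + format_kc0(131, "Y"))
```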
|