The schedule model is not complete yet, and could be improved.
llvm-svn: 227461

This is a follow up to r227113.
It is now required to use the amdgcn target for SI and newer GPUs.
llvm-svn: 227316

llvm-svn: 227214

Fix broken check lines, use multiple check prefixes,
add an additional test for i1 or.
llvm-svn: 227137

llvm-svn: 226970

We used to do this promotion during DAG legalization, but this
caused an infinite loop in ExpandUnalignedLoad() because it assumed
that i64 loads were legal if i64 was a legal type.
It also seems better to report i64 loads as legal, since they actually
are and we were just promoting them to simplify our tablegen files.
llvm-svn: 226945

v2: add and enable tests for SI
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226881

It can help with argument juggling on some targets, and is generally a good
idea.
llvm-svn: 226740

Make sure this uses the faster expansion using magic constants
to avoid the full division path.
llvm-svn: 226734
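A minimal sketch of the "magic constant" idea for unsigned 32-bit division by a constant, assuming the simple round-up variant (multiplier m = ceil(2^s / d) with s = 32 + ceil(log2 d)): the division becomes a widening multiply and a shift. This is illustrative only. The multiplier can need 33 bits, which Python's integers hide but real lowering has to handle with extra fixup instructions, and the helper names are hypothetical.

```python
def udiv_magic(d: int, bits: int = 32) -> tuple[int, int]:
    """Return (m, s) with n // d == (n * m) >> s for all 0 <= n < 2**bits."""
    if not 0 < d < (1 << bits):
        raise ValueError("divisor out of range")
    l = (d - 1).bit_length()   # ceil(log2(d)); 0 when d == 1
    s = bits + l
    m = -(-(1 << s) // d)      # ceil(2**s / d); may need bits + 1 bits
    return m, s


def udiv_by_const(n: int, m: int, s: int) -> int:
    # The division is replaced by a multiply and a right shift.
    return (n * m) >> s
```

For example, `udiv_magic(3)` gives m = 5726623062 and s = 34. The rounding error m*d - 2^s is below d <= 2^l, which keeps the truncated product exact for every n below 2^32.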

llvm-svn: 226713

This fixes it for SI. It also removes the pattern
used previously for Evergreen for f32. I'm not sure
if the new R600 output is better or not, but it uses
one fewer instruction if BFI is available.
llvm-svn: 226682
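BFI (BFI_INT on Evergreen, V_BFI_B32 on SI) is a bitfield insert: it computes `(mask & a) | (~mask & b)`. The subject line of this commit is not shown, so as an assumption the sketch uses f32 copysign as an example of a pattern that maps onto a single BFI with mask 0x7fffffff. The helper names are hypothetical and model bits only.

```python
import struct

MASK32 = 0xFFFFFFFF


def bfi(mask: int, a: int, b: int) -> int:
    """Bitfield insert: bits of a where mask is 1, bits of b elsewhere."""
    return ((mask & a) | (~mask & b)) & MASK32


def f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(u: int) -> float:
    return struct.unpack("<f", struct.pack("<I", u))[0]


def copysign_f32(mag: float, sign: float) -> float:
    # Exponent and mantissa (mask 0x7fffffff) from mag, sign bit from sign.
    return bits_f32(bfi(0x7FFFFFFF, f32_bits(mag), f32_bits(sign)))
```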

It hadn't gone through review yet, but was still on my local copy.
This reverts commit r226663
llvm-svn: 226665

llvm-svn: 226663

llvm-svn: 226596

This allows us to re-use the same register for the scratch offset
when accessing large private arrays.
llvm-svn: 226585

We don't have a good way of legalizing this if the frame index offset
is more than 12 bits, which is the size of MUBUF's offset field, so
now we store the frame index in the vaddr field.
llvm-svn: 226584
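A minimal sketch of the constraint, assuming the immediate offset is an unsigned 12-bit field (0 through 4095). With the frame index value carried in vaddr, any frame offset is reachable and only a small constant is left for the immediate. The helpers are hypothetical and ignore the other address inputs (resource base, soffset).

```python
MUBUF_OFFSET_BITS = 12
MUBUF_MAX_OFFSET = (1 << MUBUF_OFFSET_BITS) - 1  # 4095


def fits_mubuf_offset(offset: int) -> bool:
    """True if offset can be encoded in the unsigned 12-bit immediate field."""
    return 0 <= offset <= MUBUF_MAX_OFFSET


def mubuf_effective_offset(vaddr: int, imm_offset: int) -> int:
    """Hypothetical model: frame index value in vaddr plus a small immediate."""
    if not fits_mubuf_offset(imm_offset):
        raise ValueError(f"offset {imm_offset} needs more than {MUBUF_OFFSET_BITS} bits")
    return vaddr + imm_offset
```

For instance, a frame offset of 5000 cannot be encoded in the immediate field, but placed in vaddr it still combines with a small immediate such as 16.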

This is already covered in ftrunc.ll
llvm-svn: 226412

These were using different naming schemes,
not using multiple check prefixes and not using
-LABEL.
llvm-svn: 226333

llvm-svn: 226230

Instructions with 1 operand can still use source modifiers,
so make sure we don't print an extra comma afterwards.
llvm-svn: 226226
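A toy sketch of the printing rule: join the operand strings with a separator instead of appending a comma after each one, so a single source operand with modifiers never gets a trailing comma. The `-` and `|...|` spellings for neg and abs are assumptions modeled loosely on SI assembly, and the functions are hypothetical, not the LLVM instruction printer.

```python
def format_operand(name: str, neg: bool = False, abs_: bool = False) -> str:
    text = f"|{name}|" if abs_ else name
    return f"-{text}" if neg else text


def format_instruction(mnemonic: str, dst: str, srcs: list) -> str:
    """srcs is a list of (name, neg, abs) tuples."""
    operands = [dst] + [format_operand(*src) for src in srcs]
    # join() only puts commas between operands, never after the last one.
    return f"{mnemonic} {', '.join(operands)}"
```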

llvm-svn: 226197

llvm-svn: 226190

This reduces coverage for Evergreen, since the more
complete tests have those run lines disabled.
llvm-svn: 225927

Don't do the v4i8 -> v4f32 combine if the load will need to
be expanded due to alignment. This stops adding instructions
to repack into a single register that the v_cvt_ubyteN_f32
instructions read.
llvm-svn: 225926
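Each v_cvt_f32_ubyteN instruction converts byte N of a single 32-bit register to f32, which is why the combine only pays off when all four bytes already sit in one register. A sketch of that semantics, assuming byte 0 is the least significant byte; the function names are hypothetical.

```python
def cvt_f32_ubyte(packed: int, n: int) -> float:
    """Model of v_cvt_f32_ubyteN: byte n (0..3) of a 32-bit value as a float."""
    if n not in range(4):
        raise ValueError("byte index must be 0..3")
    return float((packed >> (8 * n)) & 0xFF)


def unpack_v4i8_to_v4f32(packed: int) -> list:
    # One packed register feeds all four conversions with no repacking.
    return [cvt_f32_ubyte(packed, n) for n in range(4)]
```

If an unaligned load is split into separate byte loads, the bytes arrive in different registers and would first have to be packed back together, which is the extra work this change avoids.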

Now that the source and destination types can be specified,
allow doing an expansion that doesn't use an EXTLOAD of the
result type. Try to do a legal extload to an intermediate type
and extend that if possible.
This generalizes the special case custom lowering of extloads
R600 has been using to work around this problem.
This also happens to fix a bug that would incorrectly use more
aligned loads than should be used.
llvm-svn: 225925
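The expansion can be illustrated at the value level: if a sign-extending load from i8 straight to i64 is not legal but an i8 to i32 extload is, the same result comes from that extload followed by a separate i32 to i64 extension. The helpers are hypothetical and model only the integer values, not SelectionDAG nodes.

```python
def sext(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value; return it as a to_bits unsigned int."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def sextload_direct(mem_byte: int) -> int:
    # The extload of the result type (i8 -> i64) that may not be legal.
    return sext(mem_byte, 8, 64)


def sextload_via_i32(mem_byte: int) -> int:
    # Legal extload to the intermediate type, then extend that result.
    as_i32 = sext(mem_byte, 8, 32)
    return sext(as_i32, 32, 64)
```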

Only do this for f32, since I'm unclear on both what this is expecting
for the refinement steps in terms of accuracy, and what
the f64 instruction actually provides.
llvm-svn: 225827
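For context on "refinement steps": a reciprocal square root estimate is usually refined with Newton-Raphson iterations of the form y' = y * (1.5 - 0.5 * x * y * y), each roughly squaring the relative error. The sketch runs in Python doubles rather than f32. The starting estimate and step count are parameters because neither the hardware estimate's accuracy nor the number of steps LLVM requests is stated here.

```python
import math


def rsqrt_newton_step(x: float, y: float) -> float:
    """One Newton-Raphson step toward y = 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)


def refine_rsqrt(x: float, estimate: float, steps: int) -> float:
    y = estimate
    for _ in range(steps):
        y = rsqrt_newton_step(x, y)
    return y


def rsqrt_rel_error(x: float, y: float) -> float:
    # |y * sqrt(x) - 1| is the relative error of y against 1/sqrt(x).
    return abs(y * math.sqrt(x) - 1.0)
```

Starting from 0.7 for x = 2.0 (about 1% off), three steps bring the error down to the level of double-precision rounding.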

Speculating things is generally good. SI+ has instructions for these
for 32-bit values. This is still probably better even with the expansion
for 64-bit values, although it is odd that this callback doesn't have
the size as a parameter.
llvm-svn: 225822
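Assuming the callback here is one of TargetLowering's count-zeros speculation hooks (the subject line is not shown), this sketch shows why 64-bit values need an expansion when only 32-bit count instructions exist: a 64-bit count-leading-zeros is built from two 32-bit counts and a select. It is a value-level model with hypothetical names, and it defines ctlz(0) as the bit width, matching the zero-defined form of the LLVM intrinsic.

```python
def ctlz32(x: int) -> int:
    """Count leading zeros of a 32-bit value; 32 for x == 0."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def ctlz64(x: int) -> int:
    # Count in the high half unless it is zero, else 32 + count of the low half.
    x &= 0xFFFFFFFFFFFFFFFF
    hi, lo = x >> 32, x & 0xFFFFFFFF
    return ctlz32(hi) if hi != 0 else 32 + ctlz32(lo)
```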

Also require unsafe FP math for now, since there isn't a way to
test for signed zeros.
llvm-svn: 225744

There are some operands which can take either immediates or registers
and we were previously using different register classes to distinguish
between operands that could take immediates and those that could not.
This patch switches to using RegisterOperands which should simplify the
backend by reducing the number of register classes and also make it
easier to implement the assembler.
llvm-svn: 225662

Its functionality has been replaced by calling
SIInstrInfo::legalizeOperands() from
SIISelLowering::AdjustInstrPostInstrSelection() and running the
SIFoldOperands and SIShrinkInstructions passes.
llvm-svn: 225445

I got confused and assumed SrcIdx/DstIdx of the CoalescerPair is a
subregister index in SrcReg/DstReg, but they are actually subregister
indices of the coalesced register that get you back to SrcReg/DstReg
when applied.
Fixed the bug, improved comments and simplified code accordingly.
Testcase by Tom Stellard!
llvm-svn: 225415

llvm-svn: 225410

Folding the same immediate into multiple instructions will increase
program size, which can hurt performance.
llvm-svn: 225405
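The trade-off can be put in bytes. This sketch assumes SI-like sizes (a 4-byte instruction plus a trailing 4-byte literal when the constant is not an inline constant): folding puts a copy of the literal in every user, while keeping it in a register costs one materializing move. The sizes are assumptions, not measurements, and the functions are hypothetical.

```python
INSTR_BYTES = 4    # assumed size of a 32-bit encoded instruction
LITERAL_BYTES = 4  # assumed size of a trailing literal constant


def bytes_if_folded(num_uses: int) -> int:
    # Every user carries its own copy of the literal.
    return num_uses * (INSTR_BYTES + LITERAL_BYTES)


def bytes_if_materialized(num_uses: int) -> int:
    # One move carrying the literal, then each user reads the register.
    return (INSTR_BYTES + LITERAL_BYTES) + num_uses * INSTR_BYTES


def should_fold_literal(num_uses: int) -> bool:
    # Ties (two uses under these sizes) favor keeping the register.
    return bytes_if_folded(num_uses) < bytes_if_materialized(num_uses)
```

Under these assumptions a single use is cheaper to fold (8 vs 12 bytes), while three uses cost 24 bytes folded versus 20 with a register.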

This is used to simplify the SIFoldOperands pass and make it easier to
fold immediates.
llvm-svn: 225373

This allows folding of sequences like:
    s[0:1] = s_mov_b64 4
    v_add_i32 v0, s0, v0
    v_addc_u32 v1, s1, v1
into
    v_add_i32 v0, 4, v0
    v_addc_u32 v1, 0, v1
llvm-svn: 225369
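The fold works because the 64-bit s_mov_b64 immediate is just two 32-bit halves: s0 receives the low half and s1 the high half, so each half can be folded into the instruction that reads that subregister. A minimal sketch of the split, with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF


def split_imm64(imm: int) -> tuple:
    """Split a 64-bit immediate into the (lo, hi) values held by s[0:1]."""
    imm &= (1 << 64) - 1  # two's complement view of negative values
    return imm & MASK32, imm >> 32


def join_imm64(lo: int, hi: int) -> int:
    return ((hi & MASK32) << 32) | (lo & MASK32)
```

For `s_mov_b64 4` the halves are (4, 0): 4 folds into v_add_i32 and 0 into the add that consumes the carry.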

llvm-svn: 225310

llvm-svn: 225307

llvm-svn: 225306

llvm-svn: 225305

This ensures that all memory operations are complete when all threads
reach the barrier.
llvm-svn: 225290

This is equivalent to the AMDGPUTargetMachine now, but it is the
starting point for separating R600 and GCN functionality into separate
targets.
It is recommended that users start using the gcn triple for GCN-based
GPUs, because using the r600 triple for these GPUs will be deprecated in
the future.
llvm-svn: 225277

Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.
llvm-svn: 224691
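Why shrinking the constant is safe here: sign-extending an i1 yields either 0 or all ones, so a setne against a constant that is itself 0 or all ones can be done on the original bit against the constant truncated to i1. A quick value-level check, with hypothetical helper names:

```python
def sext_i1(b: int, bits: int = 32) -> int:
    """Sign-extend a 1-bit value: 0 -> 0, 1 -> all ones."""
    return (1 << bits) - 1 if b & 1 else 0


def setcc_ne(x: int, y: int) -> int:
    """Model of setcc x, y, setne producing an i1 (0 or 1)."""
    return int(x != y)


def shrunk_setcc_ne(b: int, c: int) -> int:
    # Valid only when c is 0 or all ones, i.e. itself a sign-extended i1.
    return setcc_ne(b & 1, c & 1)
```

With c = 0 this is the `setcc (sext(i1), 0, setne)` case, which reduces to the i1 value itself.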

If the condition is used for something else, this increases
the number of instructions.
llvm-svn: 224646

mubuf instructions now define the soffset field using the SCSrc_32
register class which indicates that only SGPRs and inline constants
are allowed.
llvm-svn: 224622
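A sketch of the rule SCSrc_32 expresses for soffset: the operand must be an SGPR or an inline constant. On SI the integer inline constants are -16 through 64, plus the floating-point values 0.5, 1.0, 2.0 and 4.0 and their negations. The operand encoding as ("sgpr", index) / ("imm", value) tuples is a hypothetical model, not the TableGen definition.

```python
import struct

INLINE_F32_VALUES = (0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 4.0, -4.0)
INLINE_F32_BITS = {struct.unpack("<I", struct.pack("<f", v))[0] for v in INLINE_F32_VALUES}


def is_inline_constant(imm: int) -> bool:
    """imm is the 32-bit operand value; negative Python ints are accepted."""
    u = imm & 0xFFFFFFFF
    signed = u - (1 << 32) if u & 0x80000000 else u
    return -16 <= signed <= 64 or u in INLINE_F32_BITS


def is_valid_soffset(operand: tuple) -> bool:
    kind, value = operand
    if kind == "sgpr":
        return True
    if kind == "imm":
        return is_inline_constant(value)
    return False  # e.g. a VGPR is not allowed
```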

llvm-svn: 224458

Now that `Metadata` is typeless, reflect that in the assembly. These
are the matching assembly changes for the metadata/value split in
r223802.
- Only use the `metadata` type when referencing metadata from a call
intrinsic -- i.e., only when it's used as a `Value`.
- Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
when referencing it from call intrinsics.
So, assembly like this:
    define void @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}
turns into this:
    define void @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}
I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines). I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.
This is part of PR21532.
llvm-svn: 224257

The returned operand needs to be permuted for the unordered
compares. Also fix incorrectly producing fmin_legacy / fmax_legacy
for f64, which don't exist.
llvm-svn: 224094
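Why the operands need permuting: assume f32 fmin_legacy behaves like `a < b ? a : b`, so a NaN input makes it return its second operand. Then `select (setcc a, b, olt), a, b` maps to fmin_legacy(a, b). The unordered `select (setcc a, b, ult), a, b` returns a whenever either input is NaN, and only fmin_legacy(b, a) reproduces that. The model below is hypothetical and ignores signed zeros.

```python
import math


def fmin_legacy(a: float, b: float) -> float:
    # Assumed semantics: a < b ? a : b, so a NaN operand yields b.
    return a if a < b else b


def select_olt(a: float, b: float) -> float:
    # The ordered compare is false when either operand is NaN.
    return a if a < b else b


def select_ult(a: float, b: float) -> float:
    # The unordered compare is true when either operand is NaN.
    return a if (math.isnan(a) or math.isnan(b) or a < b) else b


def same_float(x: float, y: float) -> bool:
    # NaN matches NaN; note that 0.0 == -0.0 here.
    return (math.isnan(x) and math.isnan(y)) or x == y
```

Checking every pair from a small set with NaN and infinity confirms that the unordered form matches fmin_legacy(b, a) but not fmin_legacy(a, b).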

This is nice for the instruction patterns, but it complicates
min / max matching. The select doesn't have the correct type and would
require looking through the bitcasts for the real float operands.
llvm-svn: 224092

Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.
llvm-svn: 224084

llvm-svn: 224067