| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
llvm-svn: 239657
|
|
|
|
| |
llvm-svn: 239377
|
|
|
|
|
|
|
|
|
|
| |
On GPU targets, materializing constants is cheap and stores are
expensive, so only doing this for zero vectors was silly.
Most of the new testcases aren't optimally merged, and are for
later improvements.
llvm-svn: 238108
|
|
|
|
|
|
|
| |
Instead add m0 as an implicit operand. This helps avoid spills
of the m0 register in some cases.
llvm-svn: 237140
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead add m0 as an implicit operand. This allows us to avoid using
the M0Reg register class and eliminates a number of unnecessary spills
when using s_sendmsg instructions. This impacts one shader in the
shader-db:
SGPRS: 48 -> 40 (-16.67 %)
VGPRS: 112 -> 108 (-3.57 %)
Code Size: 40132 -> 38796 (-3.33 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 2048 -> 0 (-100.00 %) bytes per wave
llvm-svn: 237133
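
The percentages in the shader-db report above follow from a plain relative-change calculation. A minimal sketch that reproduces them, assuming the figures are (new - old) / old in percent; the helper name `percent_change` is hypothetical and is not part of shader-db:

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    if old == 0:
        # Assumption: an unchanged zero is reported as 0.00 %, as in the LDS row.
        if new == 0:
            return 0.0
        raise ValueError("relative change from zero is undefined")
    return (new - old) / old * 100.0


# Figures copied from the commit message above.
stats = {
    "SGPRS": (48, 40),
    "VGPRS": (112, 108),
    "Code Size": (40132, 38796),
    "LDS": (0, 0),
    "Scratch": (2048, 0),
}

for name, (old, new) in stats.items():
    print(f"{name}: {old} -> {new} ({percent_change(old, new):.2f} %)")
```

The scratch row is the notable one: dropping from 2048 bytes per wave to 0 means the spills were eliminated entirely for that shader.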
|
|
|
|
|
|
| |
fix them
llvm-svn: 236775
|
|
|
|
|
|
|
|
|
| |
changes:
Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO
Add location to getConstant
Drop comment about the ops being turned into expand
llvm-svn: 236240
|
|
|
|
|
|
|
|
|
|
|
| |
Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)"
Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO"
Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB"
Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll
on hexagon, nvptx, and r600. Revert while I investigate.
llvm-svn: 234768
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: tighten the sub64 tests
v3: rename to CARRY/BORROW
v4: fixup test cmdline
add known bits computation
use sign extend instead of sub 0,x
better add test
v5: remove redundant break
move lowering to separate functions
fix comments
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewers: arsenm
llvm-svn: 234759
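
The changelog above mentions CARRY/BORROW and sub64 tests. As a rough illustration of why carry and borrow outputs are useful, the sketch below builds 64-bit add and subtract from 32-bit halves. It assumes unsigned inputs in the 64-bit range; the function names are hypothetical, and this does not model the actual R600 instructions or the LLVM lowering code:

```python
MASK32 = 0xFFFFFFFF


def uaddo32(a, b):
    """32-bit unsigned add: returns (wrapped result, carry-out bit)."""
    total = a + b
    return total & MASK32, int(total > MASK32)


def usubo32(a, b):
    """32-bit unsigned subtract: returns (wrapped result, borrow-out bit)."""
    return (a - b) & MASK32, int(a < b)


def add64(a, b):
    """64-bit add built from two 32-bit halves plus the carry."""
    lo, carry = uaddo32(a & MASK32, b & MASK32)
    hi = ((a >> 32) + (b >> 32) + carry) & MASK32
    return (hi << 32) | lo


def sub64(a, b):
    """64-bit subtract built from two 32-bit halves plus the borrow."""
    lo, borrow = usubo32(a & MASK32, b & MASK32)
    hi = ((a >> 32) - (b >> 32) - borrow) & MASK32
    return (hi << 32) | lo
```

For example, `add64(0xFFFFFFFF, 1)` carries out of the low half, and `sub64(0, 1)` borrows through both halves and wraps around.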
|
|
|
|
|
|
|
|
|
|
|
| |
This enables a few useful combines that used to only
use fma.
Also, since v_mad_f32 apparently does not support denormals,
disable the existing custom-handled cases when denormals are
requested.
llvm-svn: 230071
|
|
|
|
|
|
| |
without a Function argument.
llvm-svn: 227638
|
|
|
|
|
|
|
|
| |
v2: add and enable tests for SI
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226881
|
|
|
|
|
|
|
|
|
| |
This fixes it for SI. It also removes the pattern
used previously for Evergreen for f32. I'm not sure
if the new R600 output is better or not, but it uses
1 fewer instruction if BFI is available.
llvm-svn: 226682
|
|
|
|
|
|
|
|
|
| |
This requires a new hook to prevent expanding sqrt in terms
of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are
all the same rate, so this expansion would just double the number
of instructions and cycles.
llvm-svn: 225828
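
The cost argument above can be sketched numerically. The expansion computes sqrt(x) as the reciprocal of the reciprocal square root, which takes two operations where a direct square root takes one. The helpers below are plain Python stand-ins with hypothetical names; they do not model the precision or behavior of v_rcp_f32, v_rsq_f32, or v_sqrt_f32:

```python
import math


def rsq(x):
    """Stand-in for a reciprocal square root operation."""
    return 1.0 / math.sqrt(x)


def rcp(x):
    """Stand-in for a reciprocal operation."""
    return 1.0 / x


def sqrt_direct(x):
    """One operation: the direct square root."""
    return math.sqrt(x)


def sqrt_expanded(x):
    """Two operations: rcp(rsq(x)) gives the same value as sqrt(x)."""
    return rcp(rsq(x))
```

When all three operations issue at the same rate, as the commit message states, the expanded form costs twice as much for the same result, which is why a hook to suppress it is useful.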
|
|
|
|
|
|
|
|
| |
Only do this for f32, since I'm unclear both on what this expects
for the refinement steps in terms of accuracy and on what the
f64 instruction actually provides.
llvm-svn: 225827
|
|
|
|
|
|
|
|
|
| |
Speculating things is generally good. SI+ has instructions for these
for 32-bit values. This is still probably better even with the expansion
for 64-bit values, although it is odd that this callback doesn't have
the size as a parameter.
llvm-svn: 225822
|
|
|
|
| |
llvm-svn: 225305
|
|
|
|
|
|
|
|
| |
The returned operand needs to be permuted for the unordered
compares. Also fix incorrectly producing fmin_legacy / fmax_legacy
for f64, which don't exist.
llvm-svn: 224094
|
|
|
|
|
|
|
|
| |
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.
llvm-svn: 224084
|
|
|
|
|
|
|
|
| |
This is so it could potentially be used by SI. However, the current
implementation does not always produce correct results, so the
IntegerDivisionPass is being used instead.
llvm-svn: 222072
|
|
|
|
| |
llvm-svn: 222032
|
|
|
|
| |
llvm-svn: 222015
|
|
|
|
|
|
| |
select_cc is expanded on SI, so this was never matched.
llvm-svn: 221941
|
|
|
|
| |
llvm-svn: 219879
|
|
|
|
| |
llvm-svn: 219038
|
|
|
|
| |
llvm-svn: 219037
|
|
|
|
| |
llvm-svn: 217553
|
|
|
|
|
|
| |
No functionality change. Changes made by clang-tidy + some manual cleanup.
llvm-svn: 217028
|
|
|
|
|
|
|
| |
We can use a negate source modifier to match
this for fsub.
llvm-svn: 216735
|
|
|
|
| |
llvm-svn: 215734
|
|
|
|
|
|
|
|
|
|
| |
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
llvm-svn: 215558
|
|
|
|
|
|
|
|
|
| |
v2: drop enum keyword
use correct extension mode
don't bother computing the sign in unsigned case
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215462
|
|
|
|
|
| |
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215461
|
|
|
|
|
|
|
|
|
| |
v2: add tests
rename LowerSDIV24 to LowerSDIVREM24
handle the rem part in this function
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215460
|
|
|
|
|
|
| |
These will be used in future patches and shouldn't change anything yet.
llvm-svn: 213877
|
|
|
|
| |
llvm-svn: 213551
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements a solution for constant initializers suggested
by Vadim Girlin, where we store the data after the shader code
and then use the S_GETPC instruction to compute its address.
This saves us the trouble of creating a new buffer for constant data
and then having to pass the pointer to the kernel via user SGPRs or the
input buffer.
llvm-svn: 213530
|
|
|
|
|
|
|
|
|
| |
This helps avoid redundant instructions to unpack and repack
the vectors. Ideally we could recognize that pattern and eliminate
it. Currently v4i8 and other small element type vectors are scalarized,
so this has the added bonus of avoiding that.
llvm-svn: 213031
|
|
|
|
| |
llvm-svn: 212052
|
|
|
|
|
|
| |
No functional change intended.
llvm-svn: 211783
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
R600 was using a clamped version of rsq, but SI was not. Add a
new rsq_clamped intrinsic and use them consistently.
It's unclear to me from the documentation what behavior
the R600 instructions have, so I assume they have the legacy behavior
described by the SI documents. For R600, use RECIPSQRT_IEEE
for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also
has RECIPSQRT_FF, and I'm not sure how it fits in here.
llvm-svn: 211637
|
|
|
|
|
|
|
| |
This corresponded to an AMDIL instruction for which there is
a two-instruction equivalent.
llvm-svn: 211616
|
|
|
|
| |
llvm-svn: 211519
|
|
|
|
|
|
|
|
| |
v2: move div/rem node replacement to R600ISelLowering
make lowerSDIVREM protected
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211478
|
|
|
|
|
|
|
|
|
|
| |
Instead of separate SDIV/SREM. SDIV used UDIV which in turn used UDIVREM anyway.
SREM used SDIV(UDIV->UDIVREM)+MUL+SUB, using UDIVREM directly is more efficient.
v2: Don't use all caps names
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211477
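
The idea described above can be sketched as follows: take the absolute values, do one unsigned divide-with-remainder, then fix up the signs of both results. This assumes the usual LLVM sdiv/srem semantics (quotient truncated toward zero, remainder taking the sign of the dividend). The function names are hypothetical, and this is not the actual lowering code:

```python
def udivrem(a, b):
    """Unsigned divide returning (quotient, remainder) in one step."""
    return divmod(a, b)


def sdivrem(a, b):
    """Signed quotient and remainder from a single unsigned divrem."""
    q, r = udivrem(abs(a), abs(b))
    if (a < 0) != (b < 0):
        q = -q  # the quotient is negative when the operand signs differ
    if a < 0:
        r = -r  # the remainder takes the sign of the dividend
    return q, r
```

The older SREM path recovered the remainder as `a - (a / b) * b`, which needs an extra multiply and subtract after the division. Taking the remainder straight from the unsigned divrem avoids both. As with plain integer division, a zero divisor raises `ZeroDivisionError` in this sketch.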
|
|
|
|
|
|
|
|
| |
These will be used for custom lowering and for library
implementations of various math functions, so it's useful
to expose these as builtins.
llvm-svn: 211247
|
|
|
|
|
|
|
|
| |
The difference from rint isn't really relevant here,
so treat them as equivalent. OpenCL doesn't have nearbyint,
so this is sort of pointless other than for completeness.
llvm-svn: 211229
|
|
|
|
| |
llvm-svn: 211187
|
|
|
|
|
|
| |
CI has instructions for these, so this fixes them for older hardware.
llvm-svn: 211183
|
|
|
|
| |
llvm-svn: 211182
|