| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r340959 | mareko | 2018-08-29 22:03:00 +0200 (Wed, 29 Aug 2018) | 9 lines
AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes
Summary: This fixes GPU hangs with OpenGL bindless handle arithmetic.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51203
------------------------------------------------------------------------
llvm-svn: 341351
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r340417 | hakzsam | 2018-08-22 18:08:48 +0200 (Wed, 22 Aug 2018) | 14 lines
AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space
32-bit constant address space is declared as 6, so the
maximum number of address spaces is 6, not 5.
Fixes "LLVM ERROR: Pointer address space out of range".
v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS
v4: - fix compilation issues
- fix out of bounds access
v3: use static_assert()
v2: add a very simple test for 32-bit addr space
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
------------------------------------------------------------------------
llvm-svn: 341041
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r340416 | hakzsam | 2018-08-22 18:08:43 +0200 (Wed, 22 Aug 2018) | 8 lines
AMDGPU: fix existing alias rules for constant and global
Constant and global may alias, also one rules table wasn't
ordered correctly.
Pinpointed by Matt.
v2: add a test with swapped parameters
------------------------------------------------------------------------
llvm-svn: 341040
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r339600 | scott.linder | 2018-08-13 20:44:21 +0200 (Mon, 13 Aug 2018) | 8 lines
[CodeGen] Fix assert in SelectionDAG::computeKnownBits
Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR
when zero extending the demanded elements mask if it is already as long as the
source vector.
Differential Revision: https://reviews.llvm.org/D49574
------------------------------------------------------------------------
llvm-svn: 339664
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r339190 | jvesely | 2018-08-07 23:54:37 +0200 (Tue, 07 Aug 2018) | 12 lines
AMDGPU: Remove broken i16 ternary patterns
Fixup test to check for GCN prefix
These patterns always zero extend the result even though it might need sign extension.
This has been broken since the addition of i16 support.
It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence:
v_mad_i16 v2, v10, v9, v11
v_med3_i32 v2, v2, v8, v7
Fixes mad_sat(char) piglit on VI.
Differential Revision: https://reviews.llvm.org/D49836
------------------------------------------------------------------------
llvm-svn: 339235
|
|
|
|
|
|
|
|
|
|
|
|
| |
------------------------------------------------------------------------
r338610 | jvesely | 2018-08-01 20:36:07 +0200 (Wed, 01 Aug 2018) | 3 lines
AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS
Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8)
------------------------------------------------------------------------
llvm-svn: 339105
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add _L to _LZ image intrinsic table mapping to table gen.
In ISelLowering check if image intrinsic has lod and if it's equal
to zero, if so remove lod and change opcode to equivalent mapped _LZ.
Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71
Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49483
llvm-svn: 338523
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D49874
llvm-svn: 338470
|
|
|
|
|
|
|
| |
R600 breaks on too many things to usefully test changes
with ieee_mode on vs. off.
llvm-svn: 338435
|
|
|
|
| |
llvm-svn: 338421
|
|
|
|
|
|
|
| |
This improves code for the same reasons as scalarizing 32-bit
element vectors.
llvm-svn: 338418
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When lowering calling conventions, prefer to decompose vectors
into the constitute register types. This avoids artifical constraints
to satisfy a wide super-register.
This improves code quality because now optimizations don't need to
deal with the super-register constraint. For example the immediate
folding code doesn't deal with 4 component reg_sequences, so by
breaking the register down earlier the existing immediate folding
code is able to work.
This also avoids the need for the shader input processing code
to manually split vector types.
llvm-svn: 338416
|
|
|
|
| |
llvm-svn: 338382
|
|
|
|
|
|
|
|
|
|
| |
We could choose a free 0 for this, but this
matches the behavior for fmul undef, 1.0. Also,
the NaN use is more useful for folding use operations
although if it's not eliminated it is more expensive
in terms of code size.
llvm-svn: 338376
|
|
|
|
| |
llvm-svn: 338374
|
|
|
|
|
|
|
|
| |
When fcanonicalize is lowered to a mul, we can
use -1.0 for free and avoid the cost of the bigger
encoding for source modifers.
llvm-svn: 338244
|
|
|
|
| |
llvm-svn: 338243
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
These instructions interact with hardware blocks outside the shader core,
and they can have "scalar" side effects even when EXEC = 0. We don't
want these scalar side effects to occur when all lanes want to skip
these instructions, so always add the execz skip branch instruction
for basic blocks that contain them.
Also ensure that we skip scalar stores / atomics, though we don't
code-gen those yet.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48431
Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44
llvm-svn: 338235
|
|
|
|
|
|
|
|
|
|
| |
SelectionDAGBuilder widens v3i32/v3f32 arguments to
to v4i32/v4f32 which consume an additional register.
In addition to wasting argument space, this produces extra
instructions since now it appears the 4th vector component has
a meaningful value to most combines.
llvm-svn: 338197
|
|
|
|
|
|
|
| |
This was trying to replace i8/i16 arguments with i32, which
was broken and no longer necessary.
llvm-svn: 338193
|
|
|
|
|
|
|
|
|
| |
R600 can't handle immediates for BFE, these will be eliminated later.
Fixes powr/pow regressions n r600 since r334817
Differential Revision: https://reviews.llvm.org/D49641
llvm-svn: 338127
|
|
|
|
| |
llvm-svn: 338113
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D49624
llvm-svn: 338102
|
|
|
|
|
|
|
|
|
|
| |
Scale the offset of VGPR spills by the wave size when it cannot fit in the
12-bit offset immediate field and so is added to the soffset SGPR. This
accounts for hardware swizzling of scratch memory.
Differential Revision: https://reviews.llvm.org/D49448
llvm-svn: 338060
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D49761
llvm-svn: 337938
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D49601
llvm-svn: 337798
|
|
|
|
|
|
| |
Reverts r337079 with fix for msan error.
llvm-svn: 337535
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Part of the adjustCopiesBackFrom method wasn't correctly dealing with SubRange
intervals when updating.
2 changes. The first to ensure that bogus SubRange Segments aren't propagated when
encountering Segments of the form [1234r, 1234d:0) when preparing to merge value
numbers. These can be removed in this case.
The second forces a shrinkToUses call if SubRanges end on the copy index
(instead of just the parent register).
V2: Addressed review comments, plus MIR test instead of ll test
Subscribers: MatzeB, qcolombet, nhaehnle
Differential Revision: https://reviews.llvm.org/D40308
Change-Id: I1d2b2b4beea802fce11da01edf71feb2064aab05
llvm-svn: 337273
|
|
|
|
|
|
|
|
| |
If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops.
Differential Revision: https://reviews.llvm.org/D49262
llvm-svn: 337258
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z))
-> fdot2((v2f16)S0, (v2f16)S1, (float)z)
Author: FarhanaAleen
Reviewed By: rampitec, b-sumner
Subscribers: AMDGPU
Differential Revision: https://reviews.llvm.org/D49146
llvm-svn: 337198
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memory legalizer, waitcnt, and shrink passes can perturb the instructions,
which means that the post-RA hazard recognizer pass should run after them.
Otherwise, one of those passes may invalidate the work done by the hazard
recognizer. Note that this has adverse side-effect that any consecutive
S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a
single S_NOP <N>. This should be addressed in a follow-on patch.
Differential Revision: https://reviews.llvm.org/D49288
llvm-svn: 337154
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is almost the same as an existing IR canonicalization in instcombine,
so I'm assuming this is a good early generic DAG combine too.
The motivation comes from reduced bit-hacking for select-of-constants in IR
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html
The PPC and AArch tests show that those targets are already doing something
similar. x86 will be neutral in the minimal case and generally better when
this pattern is extended with other ops as shown in the signbit-shift.ll tests.
Note the asymmetry: we don't include the (extend (ifneg X)) transform because
it already exists in SimplifySelectCC(), and that is verified in the later
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the
general transform to use a shift is always a win because that's a single
instruction.
Alive proofs:
https://rise4fun.com/Alive/ysli
Name: if pos, get -1
%c = icmp sgt i16 %x, -1
%r = sext i1 %c to i16
=>
%n = xor i16 %x, -1
%r = ashr i16 %n, 15
Name: if pos, get 1
%c = icmp sgt i16 %x, -1
%r = zext i1 %c to i16
=>
%n = xor i16 %x, -1
%r = lshr i16 %n, 15
Differential Revision: https://reviews.llvm.org/D48970
llvm-svn: 337130
|
|
|
|
|
|
|
|
| |
This was improved with rL337127, but I missed the failure in this test.
I'm not sure what the expected result will be, so I've generalized it
and added a FIXME comment.
llvm-svn: 337128
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit r337021.
WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
#1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
#2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
#3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
#4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
#5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
#6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
#7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
#8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26
[...]
Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
#0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192
llvm-svn: 337079
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D46172
llvm-svn: 337056
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45882
llvm-svn: 337046
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was completely broken if there was ever a struct argument, as
this information is thrown away during the argument analysis.
The offsets as passed in to LowerFormalArguments are not useful,
as they partially depend on the legalized result register type,
and they don't consider the alignment in the first place.
Ignore the Ins array, and instead figure out from the raw IR type
what we need to do. This seems to fix the padding computation
if the DAG lowering is forced (and stops breaking arguments
following padded arguments if the arguments were only partially
lowered in the IR)
llvm-svn: 337021
|
|
|
|
|
|
|
| |
The piece above probably has the same problem, but I need
to try to come up with a test for it.
llvm-svn: 336935
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.
Differential Revision: https://reviews.llvm.org/D49004
llvm-svn: 336868
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See https://reviews.llvm.org/D47106 for details.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D47171
This commit drops that patch's changes to:
llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
llvm/test/CodeGen/NVPTX/param-load-store.ll
For some reason, the dos line endings there prevent me from commiting
via the monorepo. A follow-up commit (not via the monorepo) will
finish the patch.
llvm-svn: 336843
|
|
|
|
|
|
|
|
| |
amdgpu-implicitarg-num-bytes attribute
Differential Revision: https://reviews.llvm.org/D49096
llvm-svn: 336697
|
|
|
|
|
|
|
| |
hsa-metadata-enqueu-kernel.ll ->
hsa-metadata-enqueue-kernel.ll
llvm-svn: 336689
|
|
|
|
|
|
| |
This reverts commit r336623
llvm-svn: 336675
|
|
|
|
|
|
|
| |
This reverts commit r336587, it was causing test failures on the
sanitizer bots.
llvm-svn: 336623
|
|
|
|
|
|
|
|
|
| |
Ensure that, if updating a tied operand pair, to only update
that pair.
Differential Revision: https://reviews.llvm.org/D49052
llvm-svn: 336593
|
|
|
|
|
|
|
|
|
|
| |
These won't work for the forseeable future. These aren't allowed
from OpenCL, but IPO optimizations can make them appear.
Also directly set the attributes on functions, regardless
of the linkage rather than cloning functions like before.
llvm-svn: 336587
|
|
|
|
|
|
| |
Also use verify-machineinstrs.
llvm-svn: 336374
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid using allocateKernArg / AssignFn. We do not want any
of the type splitting properties of normal calling convention
lowering.
For now at least this exists alongside the IR argument lowering
pass. This is necessary to handle struct padding correctly while
some arguments are still skipped by the IR argument lowering
pass.
llvm-svn: 336373
|
|
|
|
|
|
|
|
|
|
|
|
| |
Wait states are not properly being inserted after buffer_store for v_interp instructions.
Add VALU to V_INTERP instructions so that the GCNHazardRecognizer can
check and insert the appropriate wait states when needed.
Differential Revision: https://reviews.llvm.org/D48772
Change-Id: Id540c9b074fc69b5c1de6b182276aa089c74aa64
llvm-svn: 336339
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We could split sizes that are not power of two into smaller sized
G_IMPLICIT_DEF instructions, but this ends up generating
G_MERGE_VALUES instructions which we then have to handle in the instruction
selector. Since G_IMPLICIT_DEF is really a no-op it's easier just to
keep everything that can fit into a register legal.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48777
llvm-svn: 336041
|