| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
| |
Fixes terrible code on targets without f16 support. The
legalization creates a mess that is difficult to recover
from. Also should avoid randomly breaking these tests
multiple times in sequence in future commits.
Some regressions in cases where it happens to be better
to pull the source modifier after the conversion.
llvm-svn: 334132
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.
Differential Revision: https://reviews.llvm.org/D44928
llvm-svn: 334078
|
| |
|
|
|
|
|
|
| |
Preserves the low bound of the !range. I don't think
it's legal to do anything with the top half since it's
theoretically reading garbage.
llvm-svn: 334045
|
| |
|
|
|
|
| |
Apply to i8 vectors.
llvm-svn: 334044
|
| |
|
|
|
|
|
|
|
|
| |
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)
llvm-svn: 333954
|
| |
|
|
|
|
|
|
| |
After last changes some code can be simplified.
Differential Revision: https://reviews.llvm.org/D47661
llvm-svn: 333934
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D47664
llvm-svn: 333931
|
| |
|
|
|
|
|
|
| |
On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.
Differential Revision: https://reviews.llvm.org/D46616
llvm-svn: 333926
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Avoid name clashes with the corresponding bit fields in the instruction
encoding.
Change-Id: Id1644e703e976e78f7af93788d9f44cb48c3251f
Reviewers: arsenm, rampitec, kzhuravl
Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47433
llvm-svn: 333905
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The new rules are straightforward. The main rules to keep in mind
are:
1. NAME is an implicit template argument of class and multiclass,
and will be substituted by the name of the instantiating def/defm.
2. The name of a def/defm in a multiclass must contain a reference
to NAME. If such a reference is not present, it is automatically
prepended.
And for some additional subtleties, consider these:
3. defm with no name generates a unique name but has no special
behavior otherwise.
4. def with no name generates an anonymous record, whose name is
unique but undefined. In particular, the name won't contain a
reference to NAME.
Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.
The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.
Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.
Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.
Change-Id: I694095231565b30f563e6fd0417b41ee01a12589
Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D47430
llvm-svn: 333900
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.
Target that uses these opcodes are changed in order to ensure their behavior doesn't change.
Reviewers: efriedma, craig.topper, dblaikie, bkramer
Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D47422
llvm-svn: 333748
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle, jvesely
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D47487
llvm-svn: 333720
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memory clauses are formed into bundles in presence of xnack.
Their source operands are marked as early-clobber.
This allows to allocate distinct source and destination registers
within a clause and prevent breaking the clause with s_nop in the
hazard recognizer.
Clauses are undone before post-RA scheduler to allow some rescheduling,
which will not break the clause since artificial edges are created in
the dag to keep memory operations together. Yet this allows a better
ILP in some cases.
Differential Revision: https://reviews.llvm.org/D47511
llvm-svn: 333691
|
| |
|
|
|
|
|
|
|
|
|
|
| |
LegalizerInfo::verify(...) call for AMDGPU
Reviewers: aemerson, qcolombet
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D46339
llvm-svn: 333664
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Keep track of achieved occupancy in SIMachineFunctionInfo.
At the moment we have a lot of duplicated or even missed code to
query and maintain occupancy info. Record it in the MFI and
query in a single call. Interfaces:
- getOccupancy() - returns current recorded achieved occupancy.
- getMinAllowedOccupancy() - returns lesser of the achieved occupancy
and the lowest occupancy we are ready to tolerate. For example if
a kernel is memory bound we are ready to tolerate 4 waves.
- limitOccupancy() - record occupancy level if we have to lower it.
- increaseOccupancy() - record occupancy if scheduler managed to
increase the occupancy.
MFI takes care of integrating different checks affecting occupancy,
including LDS use and waves-per-eu attribute. Note that scheduler
starts with not yet known register pressure, so has to record either
limit or increase in occupancy after it is done. Later passes can
just query a resulting value.
New interface is used in the active scheduler and NFC wrt its work.
Changes are also made to experimental schedulers to use it and record
an occupancy after they are done. Before the change waves-per-eu was
ignored by experimental schedulers and tolerance window for memory
bound kernels was not used.
Differential Revision: https://reviews.llvm.org/D47509
llvm-svn: 333629
|
| |
|
|
|
|
|
|
|
|
|
| |
v2: use "ensureAlignment"
make functions cache line aligned
Fixes GPU hangs since r333219:
"AMDGPU: Split R600 AsmPrinter code into its own class"
Differential Revision: https://reviews.llvm.org/D47516
llvm-svn: 333622
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D47359
llvm-svn: 333605
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://reviews.llvm.org/rL333556 caused a buildbot failure.
See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio
/Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(),
The unused variable was for debugging purposes; removing that piece of code
to fix the build.
llvm-svn: 333559
|
| |
|
|
|
|
|
|
|
|
| |
This was just emitting loads with the ABI alignment
for the raw type. The true alignment is often better,
especially when an illegal vector type was scalarized.
The better alignment allows using a scalar load
more often.
llvm-svn: 333558
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence
for a loop. Previously, that kicked if greater than 2 passes over a loop, which
doesn't account for loop with many bottom blocks. So, increase the threshold to
(n+1), where n is the number of bottom blocks. This gives the pass an
opportunity to consider the contribution of each bottom block, to the overall
loop, before the forced convergence potentially kicks in.
Differential Revision: https://reviews.llvm.org/D47488
llvm-svn: 333556
|
| |
|
|
| |
llvm-svn: 333457
|
| |
|
|
|
|
|
|
|
|
| |
AFAIK the driver's allocation will actually have to round this
up anyway. It is useful to track the rounded up size, so that
the end of the kernel segment is known to be dereferencable so
a wider s_load_dword can be used for a short argument at the end
of the segment.
llvm-svn: 333456
|
| |
|
|
|
|
|
|
| |
it is set by CP
Differential Revision: https://reviews.llvm.org/D47392
llvm-svn: 333451
|
| |
|
|
|
|
|
| |
These functions just query the underlying IR function,
so pass it directly.
llvm-svn: 333442
|
| |
|
|
| |
llvm-svn: 333441
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D47307
llvm-svn: 333439
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary:
V2: Use cast instead of extra if.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47426
Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a
llvm-svn: 333397
|
| |
|
|
|
|
|
|
|
|
| |
default.
Summary: Bug reported here https://bugs.freedesktop.org/show_bug.cgi?id=105464 found
to be resolved by some other fixes.
Author: FarhanaAleen
llvm-svn: 333380
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
For a block with WQM on entry and exit and containing no exact mode
code, but containing some WWM code, the WQM pass forgot to process the
block at all and so did not insert code to enter and leave WWM.
This commit fixes that.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47027
Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6
llvm-svn: 333362
|
| |
|
|
|
|
|
|
| |
With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it.
Differential Revision: https://reviews.llvm.org/D47378
llvm-svn: 333303
|
| |
|
|
|
|
|
| |
We shall not keep iterator to a map while map is modified,
this leads to a broken map.
llvm-svn: 333298
|
| |
|
|
| |
llvm-svn: 333291
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is adoption of HSAIL perfhint pass. Two types of hints are produced:
1. Function is memory bound.
2. Kernel can use wave limiter.
Currently these hints are used in the scheduler. If a function is suspected
to be memory bound we allow occupancy to decrease to 4 waves in the course
of scheduling.
Differential Revision: https://reviews.llvm.org/D46992
llvm-svn: 333289
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Lower control flow did not correctly handle the case that a loop break
in if/else was on a condition that was not guaranteed to be masked by
exec. The first test kernel shows an example of this going wrong; after
exiting the loop, exec is all ones, even if it was not before the loop.
The fix is for lowering of if-break and else-break to insert an
S_AND_B64 to mask the break condition with exec. This commit also
includes the optimization of not inserting that S_AND_B64 if it is
obviously not needed because the break condition is the result of a
V_CMP in the same basic block.
V2: Addressed some review comments.
V3: Test fixes.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D44046
Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c
llvm-svn: 333258
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp,
so we don't need a header file.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47264
llvm-svn: 333254
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D47245
llvm-svn: 333219
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp
is gone.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47181
llvm-svn: 333153
|
| |
|
|
|
|
|
|
|
|
| |
The integer operation convertion for some reason only happens
if the source is a bitcast from an integer, which happens to
always be the situation when the result is loaded. Add
an additional pattern for when the source operation is really
an FP operation.
llvm-svn: 333019
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is always false for R600.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47180
llvm-svn: 333016
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.
Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.
llvm-svn: 332953
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction
and register defintions, which are huge so we only want to include
them where needed.
This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.
I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46272
llvm-svn: 332930
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
writer. NFCI.
With this we gain a little flexibility in how the generic object
writer is created.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47045
llvm-svn: 332868
|
| |
|
|
|
|
|
|
|
|
| |
AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage
but does not list it in pass dependencies which may lead to
crash.
Differential Revision: https://reviews.llvm.org/D47151
llvm-svn: 332862
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
MCObjectWriter. NFCI.
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47035
llvm-svn: 332857
|
| |
|
|
|
|
| |
MCRegisterInfo::getPhysRegSize() will be deprecated.
llvm-svn: 332856
|
| |
|
|
|
|
| |
AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance.
llvm-svn: 332807
|
| |
|
|
|
|
|
|
|
|
|
| |
Eliminate loads from the dispatch packet when they will have
a known value.
Also pattern match the code used by the library to handle partial
workgroup dispatches, which isn't necessary if reqd_work_group_size
is used.
llvm-svn: 332771
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Provide some free functions to reduce verbosity of endian-writing
a single value, and replace the endianness template parameter with
a field.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47032
llvm-svn: 332757
|
| |
|
|
|
|
| |
EmitAMDGPUSymbolType, instead of hard-coding it to STT_AMDGPU_HSA_KERNEL.
llvm-svn: 332753
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
NFCI.
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.
Part of PR37466.
Differential Revision: https://reviews.llvm.org/D47050
llvm-svn: 332749
|