| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
There are no scalar extloads.
llvm-svn: 371414
|
|
|
|
|
|
|
| |
Fixes 8-byte, 8-byte aligned LDS loads. 16-byte case still broken due
to not be reported as legal.
llvm-svn: 371413
|
|
|
|
| |
llvm-svn: 371412
|
|
|
|
|
|
|
| |
The pointer is always a VGPR. Also fix hardcoding the pointer size to
64.
llvm-svn: 371411
|
|
|
|
| |
llvm-svn: 371409
|
|
|
|
| |
llvm-svn: 371407
|
|
|
|
|
|
| |
This will allow optimization patterns which fold adds away to work.
llvm-svn: 371406
|
|
|
|
| |
llvm-svn: 371404
|
|
|
|
| |
llvm-svn: 371364
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
|
|
|
|
| |
llvm-svn: 371253
|
|
|
|
| |
llvm-svn: 371249
|
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D66958
llvm-svn: 371214
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes poor scheduling in a function containing a barrier and a few
load instructions.
Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial
edge in the dependency graph from the barrier instruction to the exit
node representing live-out latency, with a latency of about 500 cycles.
Because of this it thinks the critical path through the graph also has
a latency of about 500 cycles. And because of that it does not think
that any of the load instructions are on the critical path, so it
schedules them with no regard for their (80 cycle) latency, which gives
poor results.
Reviewers: arsenm, dstuttard, tpr, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67218
llvm-svn: 371192
|
|
|
|
| |
llvm-svn: 371156
|
|
|
|
|
|
| |
There should probably be a size only matcher.
llvm-svn: 371155
|
|
|
|
|
|
|
| |
Report soffset as a base register if the scratch resource can be
ignored.
llvm-svn: 371149
|
|
|
|
|
|
|
|
|
|
| |
The same stack is loaded for each workitem ID, and each use. Nothing
prevents you from creating multiple fixed stack objects with the same
offsets, so this was creating a load for each unique frame index,
despite them being the same offset. Re-use the same frame index so the
loads are CSEable.
llvm-svn: 371148
|
|
|
|
| |
llvm-svn: 371141
|
|
|
|
|
|
|
|
|
|
|
|
| |
Approximately 30% of the time was spent in the std::vector
constructor. In one testcase this pushes the scheduler to being the
second slowest pass.
I'm not sure I understand why these vector are necessary. The default
scheduler initCandidate seems to use some pre-existing vectors for the
pressure.
llvm-svn: 371136
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
- `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
|
|
|
|
|
|
|
| |
The library currently uses ptrtoint and directly checks the queue ptr
for this, which counts as a pointer capture.
llvm-svn: 371009
|
|
|
|
|
|
| |
Avoids SSA violations in a future patch.
llvm-svn: 371008
|
|
|
|
| |
llvm-svn: 371007
|
|
|
|
| |
llvm-svn: 371006
|
|
|
|
| |
llvm-svn: 370980
|
|
|
|
| |
llvm-svn: 370979
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since an add instruction must produce an unused carry out, this
requires additional SGPRs. This can be avoided by keeping the entire
offset computation in SGPRs. If one SGPR is still available, this only
costs one extra mov. If none are available, the entire computation can
be done in place and reversed.
This does assume the use is a VGPR operand. This was already assumed,
and we currently only select frame indexes to VALU instructions. This
should probably be fixed at some point to handle more possible MIR.
llvm-svn: 370929
|
|
|
|
|
|
| |
This is mostly for the benefit of patterns which use 16-bit constants.
llvm-svn: 370921
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
calling conventions.
On AArch64, s128 types have to be split into s64 GPRs when passed as arguments.
This change adds the generic support in call lowering for dealing with multiple
registers, for incoming and outgoing args.
Support for splitting for return types not yet implemented.
Differential Revision: https://reviews.llvm.org/D66180
llvm-svn: 370822
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SI_PC_ADD_REL_OFFSET is 0"
Summary:
D61491 caused us to use relocs when they're not strictly necessary, to
refer to symbols in the text section. This is a pessimization and it's a
problem for some loaders that don't support relocs yet.
Reviewers: nhaehnle, arsenm, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65813
llvm-svn: 370667
|
|
|
|
|
|
|
|
|
|
| |
See AMD SWDEV-157286
Reviewers: atamazov, arsenm
Differential Revision: https://reviews.llvm.org/D65229
llvm-svn: 370665
|
|
|
|
|
|
|
|
|
|
| |
See Bug 42745: https://bugs.llvm.org/show_bug.cgi?id=42745
Reviewers: atamazov, arsenm
https://reviews.llvm.org/D65231
llvm-svn: 370660
|
|
|
|
|
|
|
|
|
|
| |
See bug 42744: https://bugs.llvm.org/show_bug.cgi?id=42744
Reviewers: atamazov, arsenm
Differential Revision: https://reviews.llvm.org/D65228
llvm-svn: 370652
|
|
|
|
| |
llvm-svn: 370603
|
|
|
|
|
|
|
| |
/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/lib/Target/AMDGPU/AMDGPUGenRegisterBankInfo.def:98:1: warning: not a Doxygen trailing comment [-Wdocumentation]
1 warning generated.
llvm-svn: 370596
|
|
|
|
| |
llvm-svn: 370405
|
|
|
|
| |
llvm-svn: 370402
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SGPR spills aren't really handled after SILowerSGPRSpills. In order to
directly control what happens if the scavenger needs to spill, the
scavenger needs to be used directly. There is an alternative to
spilling in these contexts anyway since the frame register can be
increment and restored.
This does present another possible issue if spilling is needed for the
unused carry out if an add is needed. I think this can be avoided by
using a scalar add (although that clobbers SCC, which happens anyway).
llvm-svn: 370281
|
|
|
|
|
|
|
|
|
|
|
| |
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.
This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.
llvm-svn: 370280
|
|
|
|
|
|
|
|
| |
Stop counting explicitly disabled user_spgr's in the user_sgpr_count field of the kernel descriptor.
Differential Revision: https://reviews.llvm.org/D66900
llvm-svn: 370250
|
|
|
|
|
|
|
|
|
| |
This reduces the number of SGPRs due to some concerns about running
out of SGPRs if you make all the SGPRs that aren't reserved available
for the calling convention.
Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
llvm-svn: 370215
|
|
|
|
|
|
|
| |
If the result register already had a register class assigned, the
sources may not have been properly constrained.
llvm-svn: 370150
|
|
|
|
| |
llvm-svn: 370140
|
|
|
|
| |
llvm-svn: 370089
|
|
|
|
|
|
|
| |
This is something of a workaround since computeRegisterProperties
seems to be doing the wrong thing.
llvm-svn: 370086
|
|
|
|
|
|
|
|
|
| |
The problem these are supposed to work around can occur before the
intrinsics are lowered into the nodes. Try to directly simplify them
so they are matched before the bit assert operations can be optimized
out.
llvm-svn: 369994
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The mul24 matching could interfere with SLSR and the other addressing
mode related passes. This probably is not the optimal placement, but
is an intermediate step. This should probably be moved after all the
generic IR passes, particularly LSR. Moving this after LSR seems to
help in some cases, and hurts others.
As-is in this patch, in idiv-licm, it saves 1-2 instructions inside
some of the loop bodies, but increases the number in others. Moving
this later helps these loops. In the new lsr tests in
mul24-pass-ordering, the intrinsic prevents introducing more
instructions in the loop preheader, so moving this later ends up
hurting them. This shouldn't be any worse than before the intrinsics
were introduced in r366094, and LSR should probably be smarter. I
think it's because it doesn't know the and inside the loop will be
folded away.
llvm-svn: 369991
|
|
|
|
|
|
| |
GCC 5 happy
llvm-svn: 369867
|
|
|
|
| |
llvm-svn: 369857
|