| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
Fixes bug 32248.
llvm-svn: 298125
|
If the loop condition was an i1 phi with a constantexpr input, this
would add a loop intrinsic fed by a phi dependent on a call to
if.break in the same block. Insert the call in the loop header.
llvm-svn: 298121
|
Move backend internal intrinsics along with the rest of the
normal intrinsics, and use the Intrinsic::getDeclaration
API instead of manually constructing the type list.
It's surprising this was working before. fdiv.fast had
the wrong number of parameters. The control flow intrinsic
declaration attributes were not being applied, and
their types were inconsistent. The actual IR use types
did not match the declaration, and were closer to the
types used for the patterns. The brcond lowering
was changing the types, so introduce new nodes for those.
llvm-svn: 298119
|
Loop unswitching can be extremely harmful for a SIMT target. If the
hoisted condition is not uniform, a SIMT machine will execute both
clones of the loop sequentially. Therefore LoopUnswitch checks that the
condition is non-divergent.
Since DivergenceAnalysis adds an expensive PostDominatorTree analysis
that is not needed for non-SIMT targets, a new option is added to avoid
unneeded analysis initialization. The method getAnalysisUsage is called when
TargetTransformInfo is not yet available, so we cannot use it here.
For that reason a new field, DivergentTarget, is added to PassManagerBuilder
to control the behavior; a target sets this field.
Differential Revision: https://reviews.llvm.org/D30796
llvm-svn: 298104
|
We can mark functions to always inline early in the opt pipeline. Since we do
not have call support, this early inlining creates opportunities for
inter-procedural optimizations which would not occur otherwise.
Differential Revision: https://reviews.llvm.org/D31016
llvm-svn: 297958
|
llvm-svn: 297913
|
computeKnownBits didn't handle fp_to_fp16 to report
the high bits as 0. ARM maps the generic node to an instruction
that does not modify the high bits of the register, so introduce
a target node where the high bits are known 0.
llvm-svn: 297873
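A minimal sketch of why reporting the high bits as 0 matters, assuming a 32-bit register that holds a 16-bit conversion result. The `KnownBits32` struct and both helper functions are hypothetical illustrations; they are not LLVM's actual computeKnownBits interface.

```cpp
#include <cstdint>

// Hypothetical, simplified known-bits record for a 32-bit value.
struct KnownBits32 {
  uint32_t Zero; // bits known to be 0
  uint32_t One;  // bits known to be 1
};

// Assumed behaviour of the new target node: the 16-bit result sits in the
// low half of the register and the high 16 bits are known to be zero.
constexpr KnownBits32 knownBitsForZeroExtendedF16() {
  return KnownBits32{0xFFFF0000u, 0u};
}

// An AND with Mask changes nothing if every bit it clears is already known 0.
constexpr bool isAndMaskRedundant(KnownBits32 Known, uint32_t Mask) {
  return (~Mask & ~Known.Zero) == 0;
}
```

With the high bits known to be 0, a later mask with 0xFFFF can be recognised as redundant; with nothing known about the value, it cannot.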
|
Newline fixes, early return, range loops.
llvm-svn: 297865
|
llvm-svn: 297846
|
llvm-svn: 297841
|
llvm-svn: 297840
|
llvm-svn: 297662
|
llvm-svn: 297658
|
llvm-svn: 297557
|
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.
This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.
Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")
Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 297556
|
This method inverts the Reason field of a scheduling candidate.
It compares RegCritical and RegExcess correctly, but
everything else is broken. In fact, it can prefer a weaker reason
such as Weak over RegCritical because Weak > -RegCritical.
The CandReason enum is properly sorted, so just remove the artificial
ranking.
Differential Revision: https://reviews.llvm.org/D30557
llvm-svn: 297536
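The ranking problem described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the removed code: the enum lists only the reasons named in the message, its numeric values are made up (smaller means stronger), and `artificialRank`, `preferredArtificial` and `preferredByEnum` are hypothetical names.

```cpp
// Simplified stand-in for the scheduler's reason enum. Assumption: a smaller
// value is a stronger reason, and only the names from the commit message
// are listed; the real enum has more entries.
enum CandReason { NoCand = 0, RegExcess = 1, RegCritical = 2, Weak = 3 };

// Hypothetical reconstruction of the artificial ranking: the two register
// pressure reasons are negated, everything else is passed through.
constexpr int artificialRank(CandReason R) {
  return (R == RegExcess || R == RegCritical) ? -static_cast<int>(R)
                                              : static_cast<int>(R);
}

// Under the artificial ranking, the greater rank wins.
constexpr bool preferredArtificial(CandReason A, CandReason B) {
  return artificialRank(A) > artificialRank(B);
}

// Using the sorted enum directly, the smaller (stronger) reason wins.
constexpr bool preferredByEnum(CandReason A, CandReason B) {
  return A < B;
}
```

In this sketch RegExcess still beats RegCritical under the artificial ranking, but Weak beats both because its rank stays positive. Comparing the enum values directly gives the intended order.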
|
for SI
Differential Revision: https://reviews.llvm.org/D29674
llvm-svn: 297499
|
Patch by Guansong Zhang.
Differential Revision: https://reviews.llvm.org/D30750
llvm-svn: 297498
|
vectorized.
Reviewers:
arsenm
Differential Revision:
http://reviews.llvm.org/D30719
llvm-svn: 297328
|
If there is only one successor, and that successor has only
one predecessor, the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.
llvm-svn: 297251
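The condition described above can be sketched on a toy control-flow graph. The `Block` struct and `canDelayWaitToSuccessor` are hypothetical names used only for illustration; the real pass works on machine basic blocks.

```cpp
#include <cstddef>
#include <vector>

// Toy CFG node: indices of successor and predecessor blocks.
struct Block {
  std::vector<int> Succs;
  std::vector<int> Preds;
};

// True when block B has exactly one successor and that successor has exactly
// one predecessor, i.e. a trivial fallthrough edge.
bool canDelayWaitToSuccessor(const std::vector<Block> &CFG, std::size_t B) {
  const Block &Cur = CFG[B];
  if (Cur.Succs.size() != 1)
    return false;
  return CFG[Cur.Succs[0]].Preds.size() == 1;
}
```

A straight-line pair of blocks passes this check. A join block with two predecessors does not, because the pending wait state would differ by incoming edge.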
|
When doing arcp optimization with a constant denominator,
this was leaving behind rcps with constant inputs.
llvm-svn: 297248
|
Reviewers:
arsenm
Differential Revision:
http://reviews.llvm.org/D22025
llvm-svn: 297243
|
object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h.
Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30046
llvm-svn: 297241
|
More module problems. This time it only showed up in the stage 2 compile of
clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile.
Somehow, this change causes the build to need Attributes.gen before it's been
generated.
llvm-svn: 297188
|
knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30046
llvm-svn: 297177
|
It breaks line tables because the patch is not complete; a complete one is being worked on at the moment.
This reverts commit r294031.
llvm-svn: 297118
|
also exit early on kill instead of redefinition.
Differential Revision: https://reviews.llvm.org/D30230
llvm-svn: 297060
|
llvm-svn: 296901
|
Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction).
Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code).
Added LIT tests.
llvm-svn: 296873
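A minimal sketch of the constant bus restriction described above. The operand model (`OperandKind`, `Operand`) and `satisfiesConstantBusLimit` are hypothetical, and the sketch assumes that the same SGPR used twice counts as one value; the actual check lives in the assembler and in SIInstrInfo::verifyInstruction.

```cpp
#include <set>
#include <vector>

// Hypothetical operand model for illustration only.
enum class OperandKind { VGPR, SGPR, Literal };

struct Operand {
  OperandKind Kind;
  unsigned Reg; // register number; ignored for literals
};

// At most one SGPR value or literal constant may be read by the instruction.
// Assumption: repeated uses of the same SGPR count once.
bool satisfiesConstantBusLimit(const std::vector<Operand> &Ops) {
  std::set<unsigned> SGPRs;
  unsigned Literals = 0;
  for (const Operand &Op : Ops) {
    if (Op.Kind == OperandKind::SGPR)
      SGPRs.insert(Op.Reg);
    else if (Op.Kind == OperandKind::Literal)
      ++Literals;
  }
  return SGPRs.size() + Literals <= 1;
}
```

Under these assumptions, an instruction reading one SGPR and any number of VGPRs passes, while one reading two different SGPRs, or an SGPR plus a literal, is rejected.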
|
llvm-svn: 296842
|
llvm-svn: 296523
|
This is somewhat tricky because there are two
pairs of tied operands, and it isn't allowed to be
VOP3 encoded.
llvm-svn: 296519
|
llvm-svn: 296515
|
llvm-svn: 296513
|
It's not clear to me if this is always better than
doing ds_write2_b64. This adds the constraint of
a 128-bit register input instead of a pair of
64-bit registers.
llvm-svn: 296512
|
If during scheduling we find that we cannot keep the optimistic
occupancy, increase the critical register pressure limit and try scheduling
the whole function again. In this case blocks with smaller pressure
will have a chance of better scheduling.
Differential Revision: https://reviews.llvm.org/D30442
llvm-svn: 296506
|
This change introduces a new method to estimate register pressure in
GCNScheduler. The standard RPTracker gives a huge error for the following
reasons:
1. It does not account for live-ins or live-outs if the value is not used
in the region itself. That creates a huge error in the very common case
where there are a lot of live-through registers.
2. It does not properly count subregs.
3. It assumes a register used as an input operand can be reused as an
output. This is not always possible in itself, it is not what RA
will finally do in many cases (for various reasons, not limited to RA's
inability to do so), and it does not hold if the value is actually
live-through.
In addition, we can now see a clear separation between live-in pressure,
which we cannot change with scheduling, and tentative pressure,
which we can change.
Differential Revision: https://reviews.llvm.org/D30439
llvm-svn: 296491
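The first point, registers that are live through a region without being referenced in it, can be sketched as below. This is a simplified illustration that counts whole registers and ignores subregs and register classes; `RegSet`, `liveThrough` and `correctedPressure` are hypothetical names, not the GCNScheduler implementation.

```cpp
#include <set>

using RegSet = std::set<unsigned>;

// Registers live on entry and on exit but never referenced inside the region.
RegSet liveThrough(const RegSet &LiveIn, const RegSet &LiveOut,
                   const RegSet &ReferencedInRegion) {
  RegSet Result;
  for (unsigned Reg : LiveIn)
    if (LiveOut.count(Reg) && !ReferencedInRegion.count(Reg))
      Result.insert(Reg);
  return Result;
}

// Assumption: a tracker that only sees referenced registers under-reports
// pressure by the number of live-through registers.
unsigned correctedPressure(unsigned TrackedPressure, const RegSet &LiveIn,
                           const RegSet &LiveOut,
                           const RegSet &ReferencedInRegion) {
  return TrackedPressure + static_cast<unsigned>(
             liveThrough(LiveIn, LiveOut, ReferencedInRegion).size());
}
```

Because live-through registers occupy space for the whole region regardless of instruction order, they belong to the part of the pressure that scheduling cannot change.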
|
- We do emit amd_kernel_code_t v1.1
Differential Revision: https://reviews.llvm.org/D30433
llvm-svn: 296489
|
If two subregs of the same register are defined and we need to revert
a schedule that changed the def order, we will end up with both instructions
having def,read-undef flags because adjustLaneLiveness() will only set
this flag but will not remove it.
Fix this by removing read-undef flags before calling adjustLaneLiveness.
Differential Revision: https://reviews.llvm.org/D30428
llvm-svn: 296484
|
subclass that knows how to generate it.
There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1.
llvm-svn: 296478
|
how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar
Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30046
llvm-svn: 296474
|
llvm-svn: 296401
|
llvm-svn: 296396
|
llvm-svn: 296382
|
Add packed types as legal so they may be used with inlineasm.
Keep all operations expanded for now.
llvm-svn: 296379
|
Doesn't fix any practical problems because clamp/omod
are currently folded after peephole optimizer.
llvm-svn: 296375
|
llvm-svn: 296372
|
Mostly useful for writing tests for f16 features.
llvm-svn: 296370
|
Add a few instructions that are not VOP3P but are related to packed types.
Includes a hack with dummy operands for the benefit of the assembler.
llvm-svn: 296368
|
- Verify that runtime metadata is actually valid runtime metadata when assembling, otherwise we could accept the following when assembling, but ocl runtime will reject it:
.amdgpu_runtime_metadata
{ amd.MDVersion: [ 2, 1 ], amd.RandomUnknownKey, amd.IsaInfo: ...
- Make IsaInfo optional, and always emit it.
Differential Revision: https://reviews.llvm.org/D30349
llvm-svn: 296324
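The key check described in this commit can be sketched as a lookup against a set of known keys. Only amd.MDVersion and amd.IsaInfo are taken from the message; the real key set is larger, and `hasOnlyKnownKeys` is a hypothetical helper rather than the assembler's actual validation routine.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns true only if every top-level key is recognised.
// Assumption: the known-key set below is deliberately incomplete and lists
// just the two keys named in the commit message.
bool hasOnlyKnownKeys(const std::vector<std::string> &Keys) {
  static const std::set<std::string> Known = {"amd.MDVersion", "amd.IsaInfo"};
  for (const std::string &Key : Keys)
    if (!Known.count(Key))
      return false;
  return true;
}
```

Since IsaInfo is optional, a document with only amd.MDVersion passes this check, while one containing amd.RandomUnknownKey is rejected at assembly time instead of later by the runtime.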
|