Note: This is a candidate for LLVM 6.0, because it was planned to be
in that release but was delayed due to a long review period.
Merge conflict in release_60 - resolution:
Add "-p6:32:32" into the second (non-amdgiz) string.
Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.
Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC.
The tests cover all use cases we need for Mesa.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D41651
llvm-svn: 324487
Summary:
I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.
SMEM x3 opcodes don't exist, and instead there is a possibility to use x4
with the last component being unused. If the last component is out of
buffer bounds and falls on the next 4K page, the hw hangs.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D42756
llvm-svn: 324486
SSE and shorter vector sizes will have to wait until we can add support for general SMIN/SMAX matching.
llvm-svn: 324485
llvm-svn: 324477
Both operand codes now work the same way for register and memory operands. They
print the high-order or low-order word of a double-word register or memory
location.
llvm-svn: 324476
This is a follow-up to r324321, adding a match pattern for mov with an FP16
immediate (and also fixing the vfp_f16imm operand, which wasn't even compiling).
Differential Revision: https://reviews.llvm.org/D42973
llvm-svn: 324456
Match the external thunk names that happened to end up in GCC.
This is really unfortunate, as the names don't have much rhyme or reason
to them. Originally in the discussions it seemed fine to rely on aliases
to map different names to whatever external thunk code developers wished
to use, but it turns out there are practical problems with that in the
kernel. And since we're discovering these practical problems late, and since
GCC has already shipped a release with one set of names, we are forced,
yet again, to blindly match what is there.
Somewhat rushing this patch out for the Linux kernel folks to test and
so we can get it patched into our releases.
Differential Revision: https://reviews.llvm.org/D42998
llvm-svn: 324449
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D42152
llvm-svn: 324446
1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR.
2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the previous instruction: if it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whoever inserted it, e.g. the memory legalizer, knew what they were doing.
3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production.
Differential Revision: https://reviews.llvm.org/D42854
llvm-svn: 324440
llvm-svn: 324431
Turn cttz/ctlz into cttz_zero_undef/ctlz_zero_undef if we can prove the input is never zero.
X86 currently has a late DAG combine after cttz/ctlz are turned into BSR+BSF+CMOV to detect this and remove the CMOV. But we should be able to do this much earlier and avoid creating the cmov altogether.
For the changed AMDGPU test case, it appears that previously the i8 cttz was type legalized to i16, which introduced an OR with 256 in order to limit the result to 8 on the widened type. At that point the result is known to never be zero, but nothing checked that. Then operation legalization is told to promote all i16 cttz to i32. This introduces an extend and a truncate and another OR with 65536 to limit the result to 16. With the DAG combiner change we are able to prevent the creation of the second OR, since the opcode will have been changed to cttz_zero_undef after the first OR. The lack of the OR caused the instruction to change to v_ffbl_b32_sdwa.
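For illustration only (this example is not part of the commit message), the shape of the pattern in C: an OR with a constant above the i8 range makes the value provably non-zero, so the count-trailing-zeros result needs no zero guard.
/* The widened value always has bit 8 set, so it is never zero and the
 * trailing-zero count is well defined and at most 8, even when x == 0. */
unsigned cttz8(unsigned char x) {
    unsigned widened = (unsigned)x | 0x100u;
    return __builtin_ctz(widened);
}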
Differential Revision: https://reviews.llvm.org/D42985
llvm-svn: 324427
Following up on the discussion from
http://lists.llvm.org/pipermail/llvm-dev/2017-April/112305.html, undef
values are now placed in the .bss as well as null values. This prevents
undef global values taking up potentially huge amounts of space in the
.data section.
The following two lines now both generate equivalent .bss data:
@vals1 = internal unnamed_addr global [20000000 x i32] zeroinitializer, align 4
@vals2 = internal unnamed_addr global [20000000 x i32] undef, align 4 ; previously unaccounted for
This is primarily motivated by the corresponding issue in the Rust
compiler (https://github.com/rust-lang/rust/issues/41315).
Differential Revision: https://reviews.llvm.org/D41705
Patch by varkor!
llvm-svn: 324424
Fix the modeling of long division, SIMD conversion from integer, and
horizontal minimum and maximum.
llvm-svn: 324417
llvm-svn: 324392
llvm-svn: 324391
It was always using the cmpxchg path, since rmw and cmpxchg instructions
are not distinguishable in the backend.
Differential Revision: https://reviews.llvm.org/D42976
llvm-svn: 324383
Additionally, verify that the register defined by the producer is a
32-bit register.
llvm-svn: 324381
This is a follow-up to r324321, adding f16 <-> f32 and f16 <-> f64 conversion
match patterns.
Differential Revision: https://reviews.llvm.org/D42954
llvm-svn: 324360
Instruction Selection
Clean up cycle/validity checks in ISel (IsLegalToFold,
HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full
search for cycles/dependencies, pruning the search when the topological
property of NodeId allows.
As part of this, propagate the NodeId-based cutoffs to narrow
hasPredecessorHelper searches.
Reviewers: craig.topper, bogner
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D41293
llvm-svn: 324359
Author: Bas Nieuwenhuizen
https://reviews.llvm.org/D42881
llvm-svn: 324353
llvm-svn: 324352
Vector pairs are legal types, but not every operation can work on pairs.
For those operations that are legal for single vectors, generate a concat
of their results on pair halves.
llvm-svn: 324350
llvm-svn: 324349
It was expanded directly into instructions earlier. That was to avoid
loads from a constant pool for a vector negation: "xor x, splat(i1 -1)".
Implement ISD opcodes QTRUE and QFALSE to denote logical vectors of
all true and all false values, and handle setcc with negations through
selection patterns.
llvm-svn: 324348
Follow-up to D42544 that matches PACKUSWB cases for non-AVX512; SSE and PACKUSDW cases will have to wait until we can add support for general SMIN/SMAX matching.
llvm-svn: 324347
Summary:
Now that we generate PAL metadata for the amdpal OS type, there is no need to
generate the .AMDGPU.config section.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37760
Change-Id: I303c5fad66656ce97293da60621afac6595b4c18
llvm-svn: 324346
Summary: Adds support for the SVE AND instruction with vector and logical-immediate operands, and their corresponding aliases.
Reviewers: fhahn, rengolin, samparker, echristo, aadg, kristof.beyls
Reviewed By: fhahn
Subscribers: aemerson, javed.absar, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D42295
llvm-svn: 324343
Follow-up to D42544 that matches PACKSSWB cases for non-AVX512; SSE and PACKSSDW cases will have to wait until we can add support for general SMIN/SMAX matching.
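For illustration only (not part of the commit message), the kind of source-level pattern that corresponds to a signed saturating truncate, i.e. the smin/smax-plus-truncate shape that maps to PACKSSWB:
/* Clamp 16-bit values to the signed 8-bit range, then truncate. */
void pack_signed_saturate(const short *in, signed char *out, int n) {
    for (int i = 0; i < n; ++i) {
        int v = in[i];
        if (v > 127)  v = 127;    /* acts as an smin with 127  */
        if (v < -128) v = -128;   /* acts as an smax with -128 */
        out[i] = (signed char)v;
    }
}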
llvm-svn: 324339
This patch fixes up my previous commit (adding initialization of local variables).
llvm-svn: 324336
This register was mis-spelled as ICH_ELSR_EL2, but has the correct encoding for
ICH_ELRSR_EL2.
llvm-svn: 324325
This adds the CSDB instruction, which is a new barrier instruction
described by the whitepaper at [1].
This is in encoding space which was previously executed as a NOP, so it is
available for all targets that have the relevant NOP encoding space. This
matches the binutils behaviour for these instructions [2][3].
[1] https://developer.arm.com/support/security-update
[2] https://sourceware.org/ml/binutils/2018-01/msg00116.html
[3] https://sourceware.org/ml/binutils/2018-01/msg00120.html
llvm-svn: 324324
This adds most of the FP16 codegen support, but these areas need further work:
- FP16 literals and immediates are not properly supported yet (e.g. literal
pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
added.
This will be addressed in follow-up patches.
Differential Revision: https://reviews.llvm.org/D42849
llvm-svn: 324321
Those should have the glc bit set for system and agent synchronization scopes.
llvm-svn: 324314
llvm-svn: 324303
This reverts r323297.
It breaks building grub.
llvm-svn: 324301
sext when AVX512 is enabled.
We now allow all signed comparisons and not equal. The complement that needs to be added for this is no worse than the extend. And the vector output forms of pcmpeq/pcmpgt have better latency than the k-register version on SKX.
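As a loose illustration (not part of the commit message), a compare whose result is sign-extended to the element width; the point of the change is that this can now stay in vector registers (e.g. pcmpgt) instead of going through a k-register and back:
/* Each output element is 0 or -1, i.e. the compare result sign-extended. */
void cmp_sext(const int *a, const int *b, int *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = -(a[i] > b[i]);
}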
llvm-svn: 324294
(PR35681)
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
Summary:
Copy MI-level cmp->test conversion to SelectionDAG-level memory unfold.
This fixes a regression from the upcoming D41293 change.
Reviewers: craig.topper, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D42808
llvm-svn: 324261
AND with immediate will match first.
This allows the immediate to be folded into the and instead of being forced to move into a register. This can sometimes result in shorter encodings, since the and can use a sign-extended immediate.
This also allows us to match an and to a movzx after a not.
This can cause an extra move if the input to the separate NOT has an additional user which requires a copy before the NOT.
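A small illustrative case (not part of the commit message) of the not-then-mask shape this affects:
/* Masking the inverted value with 0xff can be selected as a movzx of the
 * NOT result rather than materializing the mask in a register. */
unsigned not_then_mask(unsigned x) {
    return ~x & 0xffu;
}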
llvm-svn: 324260
llvm-svn: 324250
Teach shrinkAndImmediate to preserve the upper 32 zero bits of a 64-bit mask.
If the upper 32 bits of a 64-bit mask are all zeros, we have special isel patterns to use a 32-bit and instead of a 64-bit and, by relying on the implicit zeroing of 32-bit ops.
This patch teaches shrinkAndImmediate not to break that optimization.
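Illustrative C only (not part of the commit message): a 64-bit and whose mask has all-zero upper 32 bits, which should remain selectable as a 32-bit and:
/* The upper 32 bits of the mask are zero, so a 32-bit and suffices, since
 * 32-bit operations implicitly zero the upper half of the destination. */
unsigned long long mask_low_bits(unsigned long long x) {
    return x & 0x00000000FFFF0000ULL;
}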
Differential Revision: https://reviews.llvm.org/D42899
llvm-svn: 324249
defining file.
Patch by Dean Sturtevant!
Differential Revision: https://reviews.llvm.org/D42907
llvm-svn: 324245
llvm-svn: 324244
The function shuffp2 was breaking up a wide shuffle into a pair of
narrower ones, except that the narrower shuffle masks were actually
uninitialized.
llvm-svn: 324243
Unlike V6_vmpyhv, it produces the result in the exact form that is
expected without the need for a shuffle.
llvm-svn: 324241
See bug 36154: https://bugs.llvm.org/show_bug.cgi?id=36154
Differential Revision: https://reviews.llvm.org/D42847
Reviewers: cfang, artem.tamazov, arsenm
llvm-svn: 324237
See bugs 36094, 36095:
https://bugs.llvm.org/show_bug.cgi?id=36094
https://bugs.llvm.org/show_bug.cgi?id=36095
Differential Revision: https://reviews.llvm.org/D42692
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 324231
PPCCTRLoops transforms loops to use mtctr/bdnz instructions if the loop trip count is known and big enough to compensate for the cost of mtctr.
But if there is a loop exit edge which is known to be frequently taken (via __builtin_expect or PGO), we should not transform the loop, to avoid the cost of the mtctr instruction. Here is an example of a loop with a hot exit edge:
for (unsigned i = 0; i < TripCount; i++) {
// do something
if (__builtin_expect(check(), 1))
break;
// do something
}
Differential Revision: https://reviews.llvm.org/D42637
llvm-svn: 324229
Remove combineBitcastForMaskedOp.
Add test cases for the merge masked versions to make sure we have all those covered.
llvm-svn: 324210
llvm-svn: 324206