| Commit message | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 324497
|
| |
|
|
|
|
|
|
| |
This reverts commit r324487.
It broke clang tests.
llvm-svn: 324494
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note: This is a candidate for LLVM 6.0, because it was planned to be
in that release but was delayed due to a long review period.
Merge conflict in release_60 - resolution:
Add "-p6:32:32" into the second (non-amdgiz) string.
Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.
Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC.
The tests cover all the use cases we need for Mesa.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D41651
llvm-svn: 324487
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.
SMEM x3 opcodes don't exist, and instead there is a possibility to use x4
with the last component being unused. If the last component is out of
buffer bounds and falls on the next 4K page, the hw hangs.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D42756
llvm-svn: 324486
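The page-crossing condition described in the commit message above can be illustrated with a small address check. This is only a sketch under stated assumptions: the function name, the 4-byte component size, and the use of a raw byte address are hypothetical and are not taken from the patch, and the buffer-bounds part of the condition is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: for an x3 load emulated as x4 at byte address `addr`,
// reports whether the unused fourth dword ends on a different 4K page than
// the three dwords that are actually used. Assumes 4-byte components.
bool fourthDwordOnNextPage(uint64_t addr) {
  const uint64_t pageMask = ~uint64_t{4095};
  const uint64_t lastUsedByte = addr + 11;  // last byte of the third dword
  const uint64_t lastX4Byte = addr + 15;    // last byte of the fourth dword
  return (lastUsedByte & pageMask) != (lastX4Byte & pageMask);
}

// Usage example: only a load whose third dword ends exactly at a page
// boundary pushes the unused component onto the next page.
void demoFourthDword() {
  assert(!fourthDwordOnNextPage(0));
  assert(!fourthDwordOnNextPage(4080));
  assert(fourthDwordOnNextPage(4084));
}
```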
|
| |
|
|
|
|
| |
SSE and shorter vector sizes will have to wait until we can add support for general SMIN/SMAX matching.
llvm-svn: 324485
|
| |
|
|
| |
llvm-svn: 324479
|
| |
|
|
| |
llvm-svn: 324477
|
| |
|
|
|
|
|
|
| |
Both operand codes now work the same way for register and memory
operands: they print the high-order or low-order word of a double-word
register or memory location.
llvm-svn: 324476
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With fixes from rL324341.
Original commit message:
[MergeICmps] Enable the MergeICmps Pass by default.
Summary: Now that PR33325 is fixed, this should always improve the generated code.
Reviewers: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42793
llvm-svn: 324465
|
| |
|
|
|
|
|
|
|
| |
This is a follow up of r324321, adding a match pattern for mov with a FP16
immediate (also fixing operand vfp_f16imm that wasn't even compiling).
Differential Revision: https://reviews.llvm.org/D42973
llvm-svn: 324456
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
that happened to end up in GCC.
This is really unfortunate, as the names don't have much rhyme or reason
to them. Originally in the discussions it seemed fine to rely on aliases
to map different names to whatever external thunk code developers wished
to use, but it turns out there are practical problems with that in the
kernel. And since we're discovering these practical problems late, and
since GCC has already shipped a release with one set of names, we are
forced, yet again, to blindly match what is there.
Somewhat rushing this patch out for the Linux kernel folks to test and
so we can get it patched into our releases.
Differential Revision: https://reviews.llvm.org/D42998
llvm-svn: 324449
|
| |
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D42737
llvm-svn: 324447
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D42152
llvm-svn: 324446
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Some of the commands try to get the register without checking whether
the specified operand is a register, which causes a crash. All commands
should check the type of the operand first and reject it if the type is
not the expected one.
Reviewers: dsanders, qcolombet
Reviewed By: qcolombet
Subscribers: qcolombet, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42984
llvm-svn: 324442
|
| |
|
|
|
|
|
|
|
|
|
|
| |
1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR.
2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the previous instruction. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whoever inserted it (e.g., the memory legalizer) knew what they were doing.
3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production.
Differential Revision: https://reviews.llvm.org/D42854
llvm-svn: 324440
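The suppression rule from item 2 of the commit message above can be sketched as follows. This is a minimal illustration, not the pass itself: `Instr`, `insertWaitcnt`, and the single encoded `imm` field are hypothetical stand-ins, not LLVM types or APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct Instr {
  bool isWaitcnt;
  unsigned imm;  // encoded wait counters; meaningful only for waitcnts
};

// Inserts a waitcnt with encoding `imm` before position `pos`, unless the
// previous instruction is already an identical waitcnt. Returns true if a
// new instruction was inserted.
bool insertWaitcnt(std::vector<Instr>& block, std::size_t pos, unsigned imm) {
  if (pos > 0 && block[pos - 1].isWaitcnt && block[pos - 1].imm == imm)
    return false;  // keep the pre-existing waitcnt, suppress the duplicate
  block.insert(block.begin() + static_cast<std::ptrdiff_t>(pos),
               Instr{true, imm});
  return true;
}

// Usage example: an identical waitcnt is suppressed, a different one is not.
void demoWaitcnt() {
  std::vector<Instr> block = {{false, 0}, {true, 0x70}, {false, 0}};
  assert(!insertWaitcnt(block, 2, 0x70));
  assert(block.size() == 3);
  assert(insertWaitcnt(block, 2, 0x3f));
  assert(block.size() == 4);
}
```

Only the immediately preceding instruction is compared, which matches the "check the previous instruction" wording; tracking older waitcnts is the follow-on work named in item 3.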
|
| |
|
|
|
|
|
|
|
|
| |
be simplified to an equality/inequality or to always true/false.
For example 'ugt X, 0' can be simplified to 'ne X, 0'. Or 'uge X, 0' is always true.
We already simplify this for scalars in SimplifySetCC, but not currently for vectors. D42948 proposes to change that.
llvm-svn: 324436
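The identities quoted in the commit message above can be checked exhaustively on a small unsigned type. The sketch below is illustrative only; the function name is hypothetical, and the `ule` and `ult` cases are added here as the analogous identities rather than taken from the message.

```cpp
#include <cassert>
#include <cstdint>

// Checks, for every 8-bit unsigned value, that comparisons against zero
// reduce to an equality/inequality or to a constant.
bool unsignedCompareIdentitiesHold() {
  for (unsigned v = 0; v <= 0xFF; ++v) {
    const uint8_t x = static_cast<uint8_t>(v);
    const bool ugt = x > 0;    // 'ugt X, 0'
    const bool uge = x >= 0;   // 'uge X, 0'
    const bool ule = x <= 0;   // 'ule X, 0'
    const bool ult = x < 0;    // 'ult X, 0'
    if (ugt != (x != 0)) return false;  // same as 'ne X, 0'
    if (!uge) return false;             // always true
    if (ule != (x == 0)) return false;  // same as 'eq X, 0'
    if (ult) return false;              // always false
  }
  return true;
}

// Usage example.
void demoUnsignedCompare() { assert(unsignedCompareIdentitiesHold()); }
```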
|
| |
|
|
| |
llvm-svn: 324431
|
| |
|
|
|
|
|
|
|
|
|
|
| |
cttz_zero_undef/ctlz_zero_undef if we can prove the input is never zero
X86 currently has a late DAG combine, run after cttz/ctlz are turned into BSR+BSF+CMOV, to detect this and remove the CMOV. But we should be able to do this much earlier and avoid creating the CMOV altogether.
For the changed AMDGPU test case, it appears that previously the i8 cttz was type legalized to i16, which introduced an OR with 256 in order to limit the result to 8 on the widened type. At this point the result is known to never be zero, but nothing checked that. Then operation legalization is told to promote all i16 cttz to i32. This introduces an extend and a truncate and another OR with 65536 to limit the result to 16. With the DAG combiner change we are able to prevent the creation of the second OR, since the opcode will have been changed to cttz_zero_undef after the first OR. I think the lack of the OR caused the instruction to change to v_ffbl_b32_sdwa.
Differential Revision: https://reviews.llvm.org/D42985
llvm-svn: 324427
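The widening step described in the commit message above can be modelled in a few lines. This is a sketch under assumptions: the helper names are hypothetical, plain loops stand in for the target instructions, and the zero-undefined variant is modelled with an assertion rather than undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Count trailing zeros, returning `bits` for a zero input (the behaviour
// that is defined at zero).
unsigned cttz(uint32_t v, unsigned bits) {
  unsigned n = 0;
  while (n < bits && ((v >> n) & 1u) == 0) ++n;
  return n;
}

// Models the zero-undefined variant: the caller must guarantee v != 0.
unsigned cttzZeroUndef(uint32_t v) {
  assert(v != 0 && "input must be provably non-zero");
  unsigned n = 0;
  while (((v >> n) & 1u) == 0) ++n;
  return n;
}

// An 8-bit cttz computed on a wider type: OR-ing in bit 8 (256) caps the
// result at 8 and also makes the input provably non-zero.
unsigned cttz8ViaWiden(uint8_t x) {
  const uint32_t widened = static_cast<uint32_t>(x) | 0x100u;
  return cttzZeroUndef(widened);
}

// Usage example: the widened form agrees with the 8-bit definition.
void demoCttz() {
  for (unsigned v = 0; v < 256; ++v)
    assert(cttz8ViaWiden(static_cast<uint8_t>(v)) == cttz(v, 8));
}
```

Because the OR guarantees a set bit, no zero-input fallback is needed after it, which is the property the combine relies on.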
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Following up on the discussion from
http://lists.llvm.org/pipermail/llvm-dev/2017-April/112305.html, undef
values are now placed in the .bss as well as null values. This prevents
undef global values taking up potentially huge amounts of space in the
.data section.
The following two lines now both generate equivalent .bss data:
@vals1 = internal unnamed_addr global [20000000 x i32] zeroinitializer, align 4
@vals2 = internal unnamed_addr global [20000000 x i32] undef, align 4 ; previously unaccounted for
This is primarily motivated by the corresponding issue in the Rust
compiler (https://github.com/rust-lang/rust/issues/41315).
Differential Revision: https://reviews.llvm.org/D41705
Patch by varkor!
llvm-svn: 324424
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See D42509 for the original version of this.
Basically, there are two significant changes to behavior here:
- addLiveOuts always adds all pristine registers (even if a block has
no successors).
- addLiveOuts and addLiveOutsNoPristines always add all callee-saved
registers for return blocks (including conditional return blocks).
I cleaned up the functions a bit to make it clear these properties hold.
Differential Revision: https://reviews.llvm.org/D42655
llvm-svn: 324422
|
| |
|
|
|
|
|
|
|
|
|
|
| |
combineCmov tries to remove compares against BSR/BSF if we can prove the input to the BSR/BSF is never zero.
As far as I can tell most of the time codegenprepare despeculates ctlz/cttz and gives us a cttz_zero_undef/ctlz_zero_undef which don't use a cmov.
So the only way I found to trigger this code is to show codegenprepare an illegal type which it won't despeculate.
I think we should be turning ctlz/cttz into ctlz_zero_undef/cttz_zero_undef for these cases before we ever get to operation legalization, where the cmov is created. But I wanted to add these tests so we don't regress.
llvm-svn: 324409
|
| |
|
|
| |
llvm-svn: 324408
|
| |
|
|
| |
llvm-svn: 324404
|
| |
|
|
| |
llvm-svn: 324403
|
| |
|
|
| |
llvm-svn: 324391
|
| |
|
|
|
|
|
| |
Additionally, verify that the register defined by the producer is a
32-bit register.
llvm-svn: 324381
|
| |
|
|
| |
llvm-svn: 324367
|
| |
|
|
|
|
|
|
|
| |
This is a follow up of r324321, adding f16 <-> f32 and f16 <-> f64 conversion
match patterns.
Differential Revision: https://reviews.llvm.org/D42954
llvm-svn: 324360
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instruction Selection
Clean up cycle/validity checks in ISel (IsLegalToFold,
HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full
search for cycles / dependencies, pruning the search when the topological
property of NodeId allows.
As part of this, propagate the NodeId-based cutoffs to narrow
hasPredecessorHelper searches.
Reviewers: craig.topper, bogner
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D41293
llvm-svn: 324359
|
| |
|
|
|
|
|
|
| |
Author: Bas Nieuwenhuizen
https://reviews.llvm.org/D42881
llvm-svn: 324353
|
| |
|
|
|
|
|
|
| |
Vector pairs are legal types, but not every operation can work on pairs.
For those operations that are legal for single vectors, generate a concat
of their results on pair halves.
llvm-svn: 324350
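The split-and-concatenate approach described in the commit message above can be sketched with fixed-size arrays. Everything here is illustrative: `Single`, `Pair`, `negateSingle`, and `applyOnHalves` are hypothetical names, and the four-element vector width is an arbitrary assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Single = std::array<int, 4>;  // stand-in for a single vector
using Pair = std::array<int, 8>;    // stand-in for a vector pair

// Example of an operation that is only available on single vectors.
Single negateSingle(const Single& v) {
  Single r{};
  for (std::size_t i = 0; i < v.size(); ++i) r[i] = -v[i];
  return r;
}

// Applies a single-vector operation to each half of a pair and
// concatenates the two results.
Pair applyOnHalves(const Pair& p, Single (*op)(const Single&)) {
  Single lo{}, hi{};
  for (std::size_t i = 0; i < lo.size(); ++i) {
    lo[i] = p[i];
    hi[i] = p[i + lo.size()];
  }
  const Single rlo = op(lo);
  const Single rhi = op(hi);
  Pair out{};
  for (std::size_t i = 0; i < rlo.size(); ++i) {
    out[i] = rlo[i];
    out[i + rlo.size()] = rhi[i];
  }
  return out;
}

// Usage example.
void demoPair() {
  const Pair p = {1, 2, 3, 4, 5, 6, 7, 8};
  const Pair expected = {-1, -2, -3, -4, -5, -6, -7, -8};
  assert(applyOnHalves(p, negateSingle) == expected);
}
```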
|
| |
|
|
|
|
|
|
|
|
| |
It was expanded directly into instructions earlier. That was to avoid
loads from a constant pool for a vector negation: "xor x, splat(i1 -1)".
Implement ISD opcodes QTRUE and QFALSE to denote logical vectors of
all true and all false values, and handle setcc with negations through
selection patterns.
llvm-svn: 324348
|
| |
|
|
|
|
| |
Followup to D42544 that matches PACKUSWB cases for non-AVX512. SSE and PACKUSDW cases will have to wait until we can add support for general SMIN/SMAX matching.
llvm-svn: 324347
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Now that we generate PAL metadata for the amdpal os type, there is no need to
generate the .AMDGPU.config section.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37760
Change-Id: I303c5fad66656ce97293da60621afac6595b4c18
llvm-svn: 324346
|
| |
|
|
|
|
| |
Followup to D42544 that matches PACKSSWB cases for non-AVX512. SSE and PACKSSDW cases will have to wait until we can add support for general SMIN/SMAX matching.
llvm-svn: 324339
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds most of the FP16 codegen support, but these areas need further work:
- FP16 literals and immediates are not properly supported yet (e.g. literal
pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
added.
This will be addressed in follow-up patches.
Differential Revision: https://reviews.llvm.org/D42849
llvm-svn: 324321
|
| |
|
|
|
|
|
|
| |
Breaks clang-ppc64be-linux-multistage buildbot.
This reverts commit 515bab711f308c2e8299c49dd8c84ea6a2e0b60e.
llvm-svn: 324319
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Now that PR33325 is fixed, this should always improve the generated code.
Reviewers: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42793
llvm-svn: 324317
|
| |
|
|
|
|
|
|
| |
These used things like unsigned less than zero, which is always false because there is no unsigned number less than zero.
I plan to teach DAG combine to optimize these, so the tests need to stop using them.
llvm-svn: 324315
|
| |
|
|
|
|
| |
Those should have the glc bit set for system and agent synchronization scopes.
llvm-svn: 324314
|
| |
|
|
|
|
|
| |
Wasm uses the expand action for several FP compare ops, and that behavior
changed.
llvm-svn: 324305
|
| |
|
|
| |
llvm-svn: 324304
|
| |
|
|
|
|
|
|
| |
This reverts r323297.
It breaks building grub.
llvm-svn: 324301
|
| |
|
|
| |
llvm-svn: 324295
|
| |
|
|
|
|
|
|
| |
sext when AVX512 is enabled.
We now allow all signed comparisons and not equal. The complement that needs to be added for this is no worse than the extend. And the vector output forms of pcmpeq/pcmpgt have better latency than the k-register version on SKX.
llvm-svn: 324294
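The "complement" mentioned in the commit message above refers to deriving the remaining predicates from equality and signed greater-than. The scalar model below is a sketch under assumptions: the function names are hypothetical, one 32-bit lane stands in for a whole vector, and an all-ones/all-zeros mask models the vector compare result.

```cpp
#include <cassert>
#include <cstdint>

// Per-lane models of the two compares that produce a vector mask:
// all ones when the predicate holds, all zeros otherwise.
uint32_t cmpEq(int32_t a, int32_t b) { return a == b ? 0xFFFFFFFFu : 0u; }
uint32_t cmpGt(int32_t a, int32_t b) { return a > b ? 0xFFFFFFFFu : 0u; }

// The remaining predicates, built by swapping operands and/or
// complementing the mask.
uint32_t cmpNe(int32_t a, int32_t b) { return ~cmpEq(a, b); }
uint32_t cmpLt(int32_t a, int32_t b) { return cmpGt(b, a); }
uint32_t cmpLe(int32_t a, int32_t b) { return ~cmpGt(a, b); }
uint32_t cmpGe(int32_t a, int32_t b) { return ~cmpGt(b, a); }

// Usage example.
void demoCompare() {
  assert(cmpNe(1, 2) == 0xFFFFFFFFu);
  assert(cmpNe(3, 3) == 0u);
  assert(cmpLt(-5, 2) == 0xFFFFFFFFu);
  assert(cmpLe(2, 2) == 0xFFFFFFFFu);
  assert(cmpGe(-1, 0) == 0u);
}
```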
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR35681)
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
|
| |
|
|
|
|
| |
X86FrameLowering sets stack size to 0 if redzone is enabled.
llvm-svn: 324285
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The major visible difference here is that in line-table dumps,
directory and file names are wrapped in double-quotes; previously,
directory names got single quotes and file names were not quoted at
all.
The improvement in this patch is that when a DWARF v5 line table
header has indirect strings, in a verbose dump these will all have
their section[offset] printed as well as the name itself. This
matches the format used for dumping strings in the .debug_info
section.
Differential Revision: https://reviews.llvm.org/D42802
llvm-svn: 324270
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Copy MI-level cmp->test conversion to SelectionDAG-level memory unfold.
This fixes a regression from upcoming D41293 change.
Reviewers: craig.topper, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D42808
llvm-svn: 324261
|
| |
|
|
|
|
|
|
|
|
|
|
| |
AND with immediate will match first.
This allows the immediate to be folded into the AND instead of being forced to move into a register. This can sometimes result in shorter encodings, since the AND can sign-extend an immediate.
This also allows us to match an and to a movzx after a not.
This can cause an extra move if the input to the separate NOT has an additional user which requires a copy before the NOT.
llvm-svn: 324260
|