| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
| |
Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller
and callee have mismatched calling conventions.
llvm-svn: 263856
|
| |
|
|
|
|
|
| |
Since at O0, explicit copies via SplitCSR may not be removed even if
they are unnecessary, we choose not to use SplitCSR at O0.
llvm-svn: 263855
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid modifying other modules in `AArch64PromoteConstant` when the
constant is `ConstantData` (a horrible accident, I'm sure, caught by an
experimental follow-up to r261464).
Previously, this walked through all the users of a constant, but that
reaches into other modules when the constant doesn't depend transitively
on a `GlobalValue`! Since we're walking instructions anyway, just
modify the instructions we actually see.
As a drive-by, instead of storing `Use` and getting the instructions
again via `Use::getUser()` (which is not a constantant time lookup),
store `std::pair<Instruction, unsigned>`. Besides being cheaper, this
makes it easier to drop use-lists form `ConstantData` in the future.
(I threw this in because I was touching all the code anyway.)
Because the patch completely changes the traversal logic, it looks
like a rewrite of the pass, but the core logic is all the same (or
should be, minus the out-of-module changes). In other words, there
should be NFC as long as the LLVMContext only has a single Module.
I didn't think of a good way to test this, but I hope to submit a patch
eventually that makes walking these use-lists illegal/impossible.
llvm-svn: 263853
|
| |
|
|
|
|
| |
Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@fb.com>
llvm-svn: 263842
|
| |
|
|
|
|
| |
This fixes an issue with rL263658 pointed out by Tom Stellard.
llvm-svn: 263823
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds unscaled loads and sign-extend loads to the TII
getMemOpBaseRegImmOfs API, which is used to control clustering in the MI
scheduler. This is done to create more opportunities for load pairing. I've
also added the scaled LDRSWui instruction, which was missing from the scaled
instructions. Finally, I've added support in shouldClusterLoads for clustering
adjacent sext and zext loads that too can be paired by the load/store optimizer.
Differential Revision: http://reviews.llvm.org/D18048
llvm-svn: 263819
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18255
llvm-svn: 263792
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).
The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, rivanvx, llvm-commits
Differential Revision: http://reviews.llvm.org/D18151
llvm-svn: 263791
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.
Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18218
llvm-svn: 263790
|
| |
|
|
|
| |
Review: http://reviews.llvm.org/D18267
llvm-svn: 263789
|
| |
|
|
| |
llvm-svn: 263775
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It can hurt performance to prefetch ahead too much. Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.
Reviewers: hfinkel
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17949
llvm-svn: 263772
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
And use this TTI for Cyclone. As it was explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher work up to 2KB strides.
I am also adding tests for this and the previous change (D17943):
* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic Aarch64 subtarget not prefetching either
Reviewers: hfinkel
Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17945
llvm-svn: 263771
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This wires up the pass for Cyclone but keeps it off for now because we
need a few more TTIs.
The getPrefetchMinStride value is not very well tuned right now but it
works well with CFP2006/433.milc which motivated this.
Tests will be added as part of the upcoming large-stride prefetching
patch.
Reviewers: t.p.northover
Subscribers: llvm-commits, aemerson, hfinkel, rengolin
Differential Revision: http://reviews.llvm.org/D17943
llvm-svn: 263770
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.
More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.
Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.
llvm-svn: 263753
|
| |
|
|
| |
llvm-svn: 263741
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D17600
llvm-svn: 263727
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
MRI::eliminateFrameIndex can emit several instructions to do address
calculations; these can usually be stackified. Because instructions with
FI operands can have subsequent operands which may be expression trees,
find the top of the leftmost tree and insert the code before it, to keep
the LIFO property.
Also use stackified registers when writing back the SP value to memory
in the epilog; it's unnecessary because SP will not be used after the
epilog, and it results in better code.
Differential Revision: http://reviews.llvm.org/D18234
llvm-svn: 263725
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Symmary:
ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed.
Reviewers
arsenm, tstellarAMD
Subscribers
llvm-commits, arsenm
Differential Revision:
http://reviews.llvm.org/D18197
llvm-svn: 263720
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As explained by the comment, threads will typically see different values
returned by atomic instructions even if the arguments are equal.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18156
llvm-svn: 263719
|
| |
|
|
|
|
|
|
| |
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in a endless loop of contrasting combines
This patch stops the combine if we already have a blend in place (means we miss some domain corrections)
llvm-svn: 263717
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The two changes together weakened the test and caused a regression with division
handling in MSVC mode. They were applied to avoid an assertion being triggered
in the block frequency analysis. However, the underlying problem was simply
being masked rather than solved properly. Address the actual underlying problem
and revert the changes. Rather than analyze the cause of the assertion, the
division failure was assumed to be an overflow.
The underlying issue was a subtle bug in the BB construction in the emission of
the div-by-zero check (WIN__DBZCHK). We did not construct the proper successor
information in the basic blocks, nor did we update the PHIs associated with the
basic block when we split them. This would result in assertions being triggered
in the block frequency analysis pass.
Although the original tests are being removed, the tests themselves performed
very little in terms of validation but merely tested that we did not assert when
generating code. Update this with new tests that actually ensure that we do not
regress on the code generation.
llvm-svn: 263714
|
| |
|
|
|
|
|
|
|
| |
That allows, for example, to print hex-formatted immediates using
llvm-objdump --print-imm-hex command line option.
Differential Revision: http://reviews.llvm.org/D18195
llvm-svn: 263704
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
warnings, NFC
Summary:
This should eliminate all occurrences of this within LLVMMipsAsmParser.
This patch is in response to http://reviews.llvm.org/D17983. I was unable
to reproduce the warnings on my machine so please advise if this fixes the
warnings.
Reviewers: ariccio, vkalintiris, dsanders
Subscribers: dblaikie, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D18087
llvm-svn: 263703
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both
'__sync' libcalls and '__atomic' libcalls, and this function is for
the '__sync' ones.
- getInsertFencesForAtomic() has been replaced with
shouldInsertFencesForAtomic(Instruction), so that the decision can be
made per-instruction. This functionality will be used soon.
- emitLeadingFence/emitTrailingFence are no longer called if
shouldInsertFencesForAtomic returns false, and thus don't need to
check the condition themselves.
llvm-svn: 263665
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18137
llvm-svn: 263658
|
| |
|
|
|
|
| |
have been appended to the end.
llvm-svn: 263657
|
| |
|
|
| |
llvm-svn: 263646
|
| |
|
|
|
|
|
|
|
|
|
|
| |
And emit an error if it fails.
This prevents illegal instructions from getting sent to the GPU, which
would potentially result in a hang.
This is a candidate for the stable branch(es).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 263627
|
| |
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 263626
|
| |
|
|
|
|
|
|
| |
512bit vector because PCMPGT supported only for 128/256bit.
Differential Revision: http://reviews.llvm.org/D18204
llvm-svn: 263624
|
| |
|
|
|
|
| |
Similarly to what was done for TLSCALL in r263515.
llvm-svn: 263564
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.
Reviewers:
arsenm
tstellarAMD
Subscribers
llvm-commits, arsenm
Differential Revision:
http://reviews.llvm.org/D18064
llvm-svn: 263563
|
| |
|
|
| |
llvm-svn: 263557
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
`MCSymbolRefExpr` variant kind for TLSCALL is prefixed with
_ARM_ since this is how it was originally implemented.
The X86_64 version is exactly the same so there's no reason
to create a new variant, we can just rename the existing
one to be machine-independent.
This generalization is the first step to implement support
for GNU2 TLS dialect in MC.
Differential Revision: http://reviews.llvm.org/D18160
llvm-svn: 263515
|
| |
|
|
|
|
|
|
|
| |
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in halide. PR forthcoming.
This reverts commit r263303.
llvm-svn: 263512
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Some instructions were missing isBranch, isCall, or isTerminator
flags. This didn't really affect code generation since most of
the affected patterns were used only for the AsmParser and/or
disassembler.
However, it could affect tools using the MC layer to disassemble
and parse binary code (e.g. via MCInstrDesc::mayAffectControlFlow).
llvm-svn: 263478
|
| |
|
|
|
|
|
| |
http://reviews.llvm.org/D18125
Patch by Aditya Kumar.
llvm-svn: 263461
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When the SP in not changed because of realignment/VLAs etc., we restore the SP
by using the previous value of SP and not the FP. Breaking the dependency will
help in cases when the epilog of a callee is close to the epilog of the caller;
for then "sub sp, fp, #" depends on the load restoring the FP in the epilog of
the callee.
http://reviews.llvm.org/D18060
Patch by Aditya Kumar and Evandro Menezes.
llvm-svn: 263458
|
| |
|
|
| |
llvm-svn: 263454
|
| |
|
|
| |
llvm-svn: 263453
|
| |
|
|
| |
llvm-svn: 263448
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17543
llvm-svn: 263447
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.
1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.
Differential Revision: http://reviews.llvm.org/D18094
llvm-svn: 263446
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
MIPSR6 introduces a class of branches called compact branches. Unlike the
traditional MIPS branches which have a delay slot, compact branches do not
have a delay slot. The instruction following the compact branch is only
executed if the branch is not taken and must not be a branch.
It works by generating compact branches for MIPS32R6 when the delay slot
filler cannot fill a delay slot. Then, inspecting the generated code for
forbidden slot hazards (a compact branch with an adjacent branch or other
CTI) and inserting nops to clear this hazard.
Patch by Simon Dardis.
Reviewers: vkalintiris, dsanders
Subscribers: MatzeB, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D16353
llvm-svn: 263444
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D18058
llvm-svn: 263441
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When multiple threads perform an atomic op with the same arguments, they
will usually see different return values.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18101
llvm-svn: 263440
|
| |
|
|
| |
llvm-svn: 263438
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On the z13, it turns out to be more efficient to access a full
floating-point register than just the upper half (as done e.g.
by the LE and LER instructions).
Current code already takes this into account when loading from
memory by using the LDE instruction in place of LE. However,
we still generate LER, which shows the same performance issues
as LE in certain circumstances.
This patch changes the back-end to emit LDR instead of LER to
implement FP32 register-to-register copies on z13.
llvm-svn: 263431
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17760
llvm-svn: 263428
|