| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
This partially reverts r215844 by removing test objdump-reloc-shared.test which stated GNU objdump doesn't print relocations, it does.
In executable and shared object ELF files, relocations in the file contain the final virtual address rather than section offset so this is adjusted to display section offset.
Differential revision: http://reviews.llvm.org/D15965
llvm-svn: 263971
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When control flow is implemented using the exec mask, the compiler will
insert branch instructions to skip over the masked section when exec is
zero if the section contains more than a certain number of instructions.
The previous code would only count instructions in successor blocks,
and this patch modifies the code to start counting instructions in all
blocks between the start and end of the branch.
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18282
llvm-svn: 263969
|
| |
|
|
|
|
|
| |
These instructions do not have the same negative base
address problem that DS instructions do on SI.
llvm-svn: 263964
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces a custom lowering for ISD::SETCCE (introduced in r253572)
that allows us to emit a short code sequence for 64-bit compares.
Before:
push {r7, lr}
cmp r0, r2
mov.w r0, #0
mov.w r12, #0
it hs
movhs r0, #1
cmp r1, r3
it ge
movge.w r12, #1
it eq
moveq r12, r0
cmp.w r12, #0
bne .LBB1_2
@ BB#1: @ %bb1
bl f
pop {r7, pc}
.LBB1_2: @ %bb2
bl g
pop {r7, pc}
After:
push {r7, lr}
subs r0, r0, r2
sbcs.w r0, r1, r3
bge .LBB1_2
@ BB#1: @ %bb1
bl f
pop {r7, pc}
.LBB1_2: @ %bb2
bl g
pop {r7, pc}
Saves around 80KB in Chromium's libchrome.so.
Some notes on this patch:
- I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I
introduced (nothing else needs them). However, they are necessary in
order to avoid poor codegen, and they seem similar to existing combines
in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to
(brcond Compare)).
- No support for Thumb-1. This is in principle possible, but we'd need
to implement ARMISD::SUBE for Thumb-1.
Differential Revision: http://reviews.llvm.org/D15256
llvm-svn: 263962
|
| |
|
|
|
|
|
|
| |
Adding Cortex-A32 as an available target in the ARM backend.
Patch by Sam Parker.
llvm-svn: 263956
|
| |
|
|
|
|
|
|
| |
Implements "readelf -sW and readelf -DsW"
Differential Revision: http://reviews.llvm.org/D18224
llvm-svn: 263952
|
| |
|
|
| |
llvm-svn: 263948
|
| |
|
|
| |
llvm-svn: 263945
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
replaceCongruentIVs can break LCSSA when trying to replace IV increments
since it tries to replace all uses of a phi node with another phi node
while both of the phi nodes are not necessarily in the processed loop.
This will cause an assert in IndVars.
To fix this, we add a check to make sure that the replacement maintains
LCSSA.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18266
llvm-svn: 263941
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
while processing AND SDNodes
Summary:
extract_vector_elt can cause an implicit any_ext if the types don't
match. When processing the following pattern:
(and (extract_vector_elt (load ([non_ext|any_ext|zero_ext] V))), c)
DAGCombine was ignoring the possible extend, and sometimes removing
the AND even though it was required to maintain some of the bits
in the result to 0, resulting in a miscompile.
This change fixes the issue by limiting the transformation only to
cases where the extract_vector_elt doesn't perform the implicit
extend.
Reviewers: t.p.northover, jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18247
llvm-svn: 263935
|
| |
|
|
|
|
| |
"core-avx" does not exist; I changed to "nehalem"
llvm-svn: 263932
|
| |
|
|
|
|
| |
Expanded tests and split into sdiv/srem and udiv/urem cases for 128 and 256 bit vectors.
llvm-svn: 263917
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable
to convert the address space of a pointer induction variable. This patch adds a
new pass called NVPTXInferAddressSpaces that overcomes that limitation using a
fixed-point data-flow analysis (see the file header comments for details).
The new pass is experimental and not enabled by default. Users can turn
it on by setting the -nvptx-use-infer-addrspace flag of llc.
Reviewers: jholewinski, tra, jlebar
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D17965
llvm-svn: 263916
|
| |
|
|
|
|
|
|
| |
Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes of different element sizes to the shuffle mask.
Differential Revision: http://reviews.llvm.org/D14261
llvm-svn: 263906
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D18211
llvm-svn: 263898
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Also expose getters and setters in the C API, so that the change can be tested.
Reviewers: nhaehnle, axw, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18260
From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
llvm-svn: 263886
|
| |
|
|
|
|
|
|
|
|
| |
The sinpi/cospi can be replaced with sincospi to remove unnecessary
computations. However, we need to make sure that the calls are within
the same function!
This fixes PR26993.
llvm-svn: 263875
|
| |
|
|
|
|
|
|
|
| |
CatchSwitches are not splittable, we cannot insert casts, etc. before
them.
This fixes PR26992.
llvm-svn: 263874
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
ThinLTO is relying on linkInModule to import selected function.
However a lot of "magic" was hidden in linkInModule and the IRMover,
who would rename and promote global variables on the fly.
This is moving to an approach where the steps are decoupled and the
client is reponsible to specify the list of globals to import.
As a consequence some test are changed because they were relying on
the previous behavior which was importing the definition of *every*
single global without control on the client side.
Now the burden is on the client to decide if a global has to be imported
or not.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18122
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263863
|
| |
|
|
|
|
|
|
| |
On Rafael's suggestion!
(also fix a discrepancy between this error message format and the others)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263860
|
| |
|
|
|
|
|
|
|
| |
We need to be careful on which registers can be explicitly handled
via copies. Prologue, Epilogue use physical registers and if one belongs
to the set of CSRsViaCopy, it will no longer be CSRed, since PEI overwrites
it after the explicit copies.
llvm-svn: 263857
|
| |
|
|
|
|
|
| |
Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller
and callee have mismatched calling conventions.
llvm-svn: 263856
|
| |
|
|
|
|
|
| |
Since at O0, explicit copies via SplitCSR may not be removed even if
they are unnecessary, we choose not to use SplitCSR at O0.
llvm-svn: 263855
|
| |
|
|
|
|
|
|
|
|
|
|
| |
While not strictly necessary, since we don't support large integer
types, this avoids bugs due to silent truncation from uint64_t to a
32-bit unsigned (e.g. DL.isLegalInteger(DL.getTypeSizeInBits(Ty) )
This fixes PR26972.
Differential Revision: http://reviews.llvm.org/D18258
llvm-svn: 263850
|
| |
|
|
|
|
| |
Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@fb.com>
llvm-svn: 263842
|
| |
|
|
|
|
|
|
|
|
|
| |
The loop on IVOperand's incoming values assumes IVOperand to be an
induction variable on the loop over which `S Pred X` is invariant;
otherwise loop invariant incoming values to IVOperand are not guaranteed
to dominate the comparision.
This fixes PR26973.
llvm-svn: 263827
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds unscaled loads and sign-extend loads to the TII
getMemOpBaseRegImmOfs API, which is used to control clustering in the MI
scheduler. This is done to create more opportunities for load pairing. I've
also added the scaled LDRSWui instruction, which was missing from the scaled
instructions. Finally, I've added support in shouldClusterLoads for clustering
adjacent sext and zext loads that too can be paired by the load/store optimizer.
Differential Revision: http://reviews.llvm.org/D18048
llvm-svn: 263819
|
| |
|
|
|
|
|
| |
We aren't referencing any other kind of function currently.
Should save a bit on our debug info size.
llvm-svn: 263817
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D14781
llvm-svn: 263802
|
| |
|
|
|
|
| |
be disassembled.
llvm-svn: 263793
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18255
llvm-svn: 263792
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).
The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, rivanvx, llvm-commits
Differential Revision: http://reviews.llvm.org/D18151
llvm-svn: 263791
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.
Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18218
llvm-svn: 263790
|
| |
|
|
|
| |
Review: http://reviews.llvm.org/D18267
llvm-svn: 263789
|
| |
|
|
|
|
| |
X86 target supporting. NFC.
llvm-svn: 263781
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It can hurt performance to prefetch ahead too much. Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.
Reviewers: hfinkel
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17949
llvm-svn: 263772
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
And use this TTI for Cyclone. As it was explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher work up to 2KB strides.
I am also adding tests for this and the previous change (D17943):
* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic Aarch64 subtarget not prefetching either
Reviewers: hfinkel
Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17945
llvm-svn: 263771
|
| |
|
|
|
|
|
|
|
|
|
|
| |
functions.
A virtual index of -1u indicates that the subprogram's virtual index is
unrepresentable (for example, when using the relative vtable ABI), so do
not emit a DW_AT_vtable_elem_location attribute for it.
Differential Revision: http://reviews.llvm.org/D18236
llvm-svn: 263765
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.
More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.
Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.
llvm-svn: 263753
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Use the new LoopVersioning facility (D16712) to add noalias metadata in
the vector loop if we versioned with memchecks. This can enable some
optimization opportunities further down the pipeline (see the included
test or the benchmark improvement quoted in D16712).
The test also covers the bug I had in the initial version in D16712.
The vectorizer did not previously use LoopVersioning. The reason is
that the vectorizer performs its transformations in single shot. It
creates an empty single-block vector loop that it then populates with
the widened, if-converted instructions. Thus creating an intermediate
versioned scalar loop seems wasteful.
So this patch (rather than bringing in LoopVersioning fully) adds a
special interface to LoopVersioning to allow the vectorizer to add
no-alias annotation while still performing its own versioning.
As the vectorizer propagates metadata from the instructions in the
original loop to the vector instructions we also check the pointer in
the original instruction and see if LoopVersioning can add no-alias
metadata based on the issued memchecks.
Reviewers: hfinkel, nadav, mzolotukhin
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17191
llvm-svn: 263744
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If we decide to version a loop to benefit a transformation, it makes
sense to record the now non-aliasing accesses in the newly versioned
loop. This allows non-aliasing information to be used by subsequent
passes.
One example is 456.hmmer in SPECint2006 where after loop distribution,
we vectorize one of the newly distributed loops. To vectorize we
version this loop to fully disambiguate may-aliasing accesses. If we
add the noalias markers, we can use the same information in a later DSE
pass to eliminate some dead stores which amounts to ~25% of the
instructions of this hot memory-pipeline-bound loop. The overall
performance improves by 18% on our ARM64.
The scoped noalias annotation is added in LoopVersioning. The patch
then enables this for loop distribution. A follow-on patch will enable
it for the vectorizer. Eventually this should be run by default when
versioning the loop but first I'd like to get some feedback whether my
understanding and application of scoped noalias metadata is correct.
Essentially my approach was to have a separate alias domain for each
versioning of the loop. For example, if we first version in loop
distribution and then in vectorization of the distributed loops, we have
a different set of memchecks for each versioning. By keeping the scopes
in different domains they can conveniently be defined independently
since different alias domains don't affect each other.
As written, I also have a separate domain for each loop. This is not
necessary and we could save some metadata here by using the same domain
across the different loops. I don't think it's a big deal either way.
Probably the best is to review the tests first to see if I mapped this
problem correctly to scoped noalias markers. I have plenty of comments
in the tests.
Note that the interface is prepared for the vectorizer which needs the
annotateInstWithNoAlias API. The vectorizer does not use LoopVersioning
so we need a way to pass in the versioned instructions. This is also
why the maps have to become part of the object state.
Also currently, we only have an AA-aware DSE after the vectorizer if we
also run the LTO pipeline. Depending how widely this triggers we may
want to schedule a DSE toward the end of the regular pass pipeline.
Reviewers: hfinkel, nadav, ashutosh.nema
Subscribers: mssimpso, aemerson, llvm-commits, mcrosier
Differential Revision: http://reviews.llvm.org/D16712
llvm-svn: 263743
|
| |
|
|
| |
llvm-svn: 263741
|
| |
|
|
|
|
|
|
|
|
| |
This patch enhances InstCombine to handle following case:
A -> B bitcast
PHI
B -> A bitcast
llvm-svn: 263734
|
| |
|
|
|
|
|
|
| |
Autogenerated from the corresponding assembler tests with a few FIXME added (will fix soon).
Differential Revision: http://reviews.llvm.org/D18249
llvm-svn: 263729
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D17600
llvm-svn: 263727
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
MRI::eliminateFrameIndex can emit several instructions to do address
calculations; these can usually be stackified. Because instructions with
FI operands can have subsequent operands which may be expression trees,
find the top of the leftmost tree and insert the code before it, to keep
the LIFO property.
Also use stackified registers when writing back the SP value to memory
in the epilog; it's unnecessary because SP will not be used after the
epilog, and it results in better code.
Differential Revision: http://reviews.llvm.org/D18234
llvm-svn: 263725
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Symmary:
ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed.
Reviewers
arsenm, tstellarAMD
Subscribers
llvm-commits, arsenm
Differential Revision:
http://reviews.llvm.org/D18197
llvm-svn: 263720
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As explained by the comment, threads will typically see different values
returned by atomic instructions even if the arguments are equal.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18156
llvm-svn: 263719
|
| |
|
|
|
|
|
|
| |
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in a endless loop of contrasting combines
This patch stops the combine if we already have a blend in place (means we miss some domain corrections)
llvm-svn: 263717
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This is similar to D18133 where we allowed profile weights on select instructions.
This extends that change to also allow the 'unpredictable' attribute of branches to apply to selects.
A test to check that 'unpredictable' metadata is preserved when cloning instructions was checked in at:
http://reviews.llvm.org/rL263648
Differential Revision: http://reviews.llvm.org/D18220
llvm-svn: 263716
|