| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
These instructions are added to AArch64 only.
llvm-svn: 336913
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A complementary fold to D49179.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://rise4fun.com/Alive/Rny
Caveat: one more thing in `test/Transforms/InstCombine/icmp-logical.ll` breaks.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49205
llvm-svn: 336911
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have located a bug in AssemblyWriter::printModuleSummaryIndex(). This
function outputs path strings incorrectly. Backslashes in the strings
are not correctly escaped.
Consequently, if a path name contains a backslash followed by two
hexadecimal characters, the sequence is incorrectly interpreted when the
output is read by another component. This mangles the path and results
in error.
This patch fixes this issue by calling printEscapedString() to output
the module paths.
Patch by Chris Jackson.
Differential Revision: https://reviews.llvm.org/D49090
llvm-svn: 336908
|
| |
|
|
|
|
|
|
|
|
| |
canWidenShuffleElements can do a better job if given a mask with ZeroableElements info. Apparently, ZeroableElements was being only used to identify AllZero candidates, but possibly we could plug it into more shuffle matchers.
Original Patch by Zvi Rackover @zvi
Differential Revision: https://reviews.llvm.org/D42044
llvm-svn: 336903
|
| |
|
|
|
|
|
|
|
|
| |
Noticed while updating D42044, lowerV2X128VectorShuffle can improve the shuffle mask with the zeroable data to create a target shuffle mask to recognise more 'zero upper 128' patterns.
NOTE: lowerV4X128VectorShuffle could benefit as well but the code needs refactoring first to discriminate between SM_SentinelUndef and SM_SentinelZero for negative shuffle indices.
Differential Revision: https://reviews.llvm.org/D49092
llvm-svn: 336900
|
| |
|
|
|
|
|
|
|
| |
We no longer care about the order of blocks in these collections,
so can change to SmallPtrSets, making contains checks quicker.
Differential revision: https://reviews.llvm.org/D49060
llvm-svn: 336897
|
| |
|
|
|
|
|
|
|
|
|
| |
Mark standard encoded instructions and pseudo "standard encoded"
as not being in MIPS16e by default.
Patch by Simon Dardis.
Differential revision: https://reviews.llvm.org/D48379
llvm-svn: 336893
|
| |
|
|
|
|
| |
i128 isn't a legal type in our x86 implementation today. So remove this and the few patterns that used it until it becomes necessary.
llvm-svn: 336889
|
| |
|
|
| |
llvm-svn: 336887
|
| |
|
|
|
|
| |
We now use llvm.fma.f32/f64 or llvm.x86.fmadd.f32/f64 intrinsics that use scalar types rather than vector types. So we don't these special ISD nodes that operate on the lowest element of a vector.
llvm-svn: 336883
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D49216
llvm-svn: 336881
|
| |
|
|
| |
llvm-svn: 336879
|
| |
|
|
|
|
|
|
|
|
| |
information for comparisons of parameters." as it's causing miscompiles.
A testcase was provided in the original review thread.
This reverts commit r336098.
llvm-svn: 336877
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
there for a long time.
The boolean tracking whether we saw a kill of the flags was supposed to
be per-block we are scanning and instead was outside that loop and never
cleared. It requires a quite contrived test case to hit this as you have
to have multiple levels of successors and interleave them with kills.
I've included such a test case here.
This is another bug found testing SLH and extracted to its own focused
patch.
llvm-svn: 336876
|
| |
|
|
|
|
|
|
| |
with zero.
These showed up in some of the upgraded FMA code. We really need to improve these test cases more, but this helps for now.
llvm-svn: 336875
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
multiple successors where some of the uses end up killing the EFLAGS
register.
There was a bug where rather than skipping to the next basic block
queued up with uses once we saw a kill, we stopped processing the blocks
entirely. =/
Test case produces completely nonsensical code w/o this tiny fix.
This was found testing Speculative Load Hardening and split out of that
work.
Differential Revision: https://reviews.llvm.org/D49211
llvm-svn: 336874
|
| |
|
|
|
|
| |
This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches.
llvm-svn: 336871
|
| |
|
|
|
|
|
|
|
|
| |
This changes `-print-*` from transformation passes to analysis passes so
that `-print-after-all` and `-print-before-all` don't trigger. This
avoids some redundant output.
Patch by Son Tuan Vu!
llvm-svn: 336869
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.
Differential Revision: https://reviews.llvm.org/D49004
llvm-svn: 336868
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit suppresses turning loops like this into "(bitwidth - ctlz(input))".
unsigned foo(unsigned input) {
unsigned num = 0;
do {
++num;
input >>= 1;
} while (input != 0);
return num;
}
The loop version returns a value of 1 for both an input of 0 and an input of 1. Converting to a naive ctlz does not preserve that.
Theoretically we could do better if we checked isKnownNonZero or we could insert a select to handle the divergence. But until we have motivating cases for that, this is the easiest solution.
llvm-svn: 336864
|
| |
|
|
|
|
|
|
| |
SITargetLowering queries SIInstrInfo in its constructor, so SIInstrInfo
must be initialized first. This fixes msan buildbot failures and was
introduced by r336851.
llvm-svn: 336861
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
changes.
Summary:
The move APIs added in this patch will be used to update MemorySSA when CFG changes merge or split blocks, by moving memory accesses accordingly in MemorySSA's internal data structures.
[Split from D45299 for easier review]
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D48897
llvm-svn: 336860
|
| |
|
|
|
|
| |
This was added in r336851.
llvm-svn: 336853
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
|
| |
|
|
| |
llvm-svn: 336837
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
https://bugs.llvm.org/show_bug.cgi?id=38123
This pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in unsigned case, therefore it is probably a good idea to improve it.
https://rise4fun.com/Alive/Rny
^ there are more opportunities for folds, i will follow up with them afterwards.
Caveat: this somehow exposes a missing opportunities
in `test/Transforms/InstCombine/icmp-logical.ll`
It seems, the problem is in `foldLogOpOfMaskedICmps()` in `InstCombineAndOrXor.cpp`.
But i'm not quite sure what is wrong, because it calls `getMaskedTypeForICmpPair()`,
which calls `decomposeBitTestICmp()` which should already work for these cases...
As @spatel notes in https://reviews.llvm.org/D49179#1158760,
that code is a rather complex mess, so we'll let it slide.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: yamauchi, majnemer, t.p.northover, llvm-commits
Differential Revision: https://reviews.llvm.org/D49179
llvm-svn: 336834
|
| |
|
|
|
|
|
|
| |
We can instead block the load folding isProfitableToFold. Then isel will emit a register->register move for the zeroing part and a separate load. The PostProcessISelDAG should be able to remove the register->register move.
This saves us patterns and fixes the fact that we only had unaligned load patterns. The test changes show places where we should have been using an aligned load.
llvm-svn: 336828
|
| |
|
|
|
|
| |
Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs.
llvm-svn: 336827
|
| |
|
|
|
|
|
| |
size instead of calculating it again when filling
out the metadata.
llvm-svn: 336825
|
| |
|
|
|
|
|
|
|
| |
Make the DIE iterator bidirectional so we can move to the previous
sibling of a DIE.
Differential revision: https://reviews.llvm.org/D49173
llvm-svn: 336823
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was
inferred directly from the "default" pattern associated with the instruction
definition.
r336728 removed special node X86Movlps, and all the patterns associated to it.
Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the
'mayLoad/hasSideEffects' flags are left unset.
When the instruction info is emitted by tablegen, method
CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a
pattern, and flags are undefined. So, it conservatively sets the
"hasSideEffects" flag for it.
As a consequence, we were losing the 'mayLoad' flag, and we were gaining a
'hasSideEffect' flag in its place.
This patch fixes the issue (originally reported by Michael Holmen).
The mca tests show the differences in the instruction info flags. Instructions
that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.
Differential Revision: https://reviews.llvm.org/D49182
llvm-svn: 336818
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(REAPPLIED)
We currently only support binary instructions in the alternate opcode shuffles.
This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:
1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.
Reapplied with fix to only accept 2 different casts if they come from the same source type.
Differential Revision: https://reviews.llvm.org/D49135
llvm-svn: 336812
|
| |
|
|
|
|
|
|
| |
cast instructions.
Reverting due to buildbot failures
llvm-svn: 336806
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently only support binary instructions in the alternate opcode shuffles.
This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:
1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.
Differential Revision: https://reviews.llvm.org/D49135
llvm-svn: 336804
|
| |
|
|
|
|
|
| |
Debug uses should not count as real uses, since the presence of debug
information could affect the generated code.
llvm-svn: 336803
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This mostly brings the P5600 scheduler model to a mostly complete
status. There are a number of instructions which trigger the
`error:'MipsP5600Model' lacks information for` error. These are certain
codegen only instructions relating to MIPS64 which can be addressed by
using the correct predicates for them. That will be done in a full-up
patch.
Patch by Simon Dardis.
Differential revision: https://reviews.llvm.org/D45245
llvm-svn: 336802
|
| |
|
|
|
|
|
|
|
|
|
| |
Reuse this function as to test correctness and profitability of
reducing width of either load or store operations.
Reviewsers: samparker
Differential Revision: https://reviews.llvm.org/D48624
llvm-svn: 336800
|
| |
|
|
|
|
|
|
|
|
| |
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.
Differential revision: https://reviews.llvm.org/D49125
llvm-svn: 336795
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AT_NAME was being emitted before the directory paths were remapped. This
ensures that all paths are remapped before anything is emitted.
An additional test case has been added.
Note that this only works if the replacement string is an absolute path.
If not, then AT_decl_file believes the new path is a relative path, and
joins that path with the compilation directory. I do not know of a good
way to resolve this.
Patch by: Siddhartha Bagaria (starsid)
Differential revision: https://reviews.llvm.org/D49169
llvm-svn: 336793
|
| |
|
|
|
|
|
|
|
|
|
| |
The compact instruction shuffles active elements of vector
into lowest numbered elements and sets remaining elements
to zero.
e.g.
compact z0.s, p0, z1.s
llvm-svn: 336789
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LASTB and LASTA instructions extract the last active element,
or element after the last active, from the source vector.
The added variants are:
Scalar:
last(a|b) w0, p0, z0.b
last(a|b) w0, p0, z0.h
last(a|b) w0, p0, z0.s
last(a|b) x0, p0, z0.d
SIMD & FP Scalar:
last(a|b) b0, p0, z0.b
last(a|b) h0, p0, z0.h
last(a|b) s0, p0, z0.s
last(a|b) d0, p0, z0.d
The CLASTB and CLASTA conditionally extract the last or element after
the last active element from the source vector.
The added variants are:
Scalar:
clast(a|b) w0, p0, w0, z0.b
clast(a|b) w0, p0, w0, z0.h
clast(a|b) w0, p0, w0, z0.s
clast(a|b) x0, p0, x0, z0.d
SIMD & FP Scalar:
clast(a|b) b0, p0, b0, z0.b
clast(a|b) h0, p0, h0, z0.h
clast(a|b) s0, p0, s0, z0.s
clast(a|b) d0, p0, d0, z0.d
Vector:
clast(a|b) z0.b, p0, z0.b, z1.b
clast(a|b) z0.h, p0, z0.h, z1.h
clast(a|b) z0.s, p0, z0.s, z1.s
clast(a|b) z0.d, p0, z0.d, z1.d
Please refer to the architecture specification for more details on
the semantics of the added instructions.
llvm-svn: 336783
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D48281
llvm-svn: 336782
|
| |
|
|
|
|
| |
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).
llvm-svn: 336779
|
| |
|
|
| |
llvm-svn: 336777
|
| |
|
|
|
|
|
|
|
|
| |
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.
We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.
Differential Revision: https://reviews.llvm.org/D48975
llvm-svn: 336774
|
| |
|
|
| |
llvm-svn: 336773
|
| |
|
|
|
|
|
|
|
|
| |
gcc 4.7 seems to disagree with gcc 5.3 about whether you need to say
'return std::move(thing)' instead of just 'return thing'. All the
json::Arrays and json::Objects that I was implicitly turning into
json::Values by returning them from functions now have explicit
std::move wrappers, so hopefully 4.7 will be happy now.
llvm-svn: 336772
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The aim of this backend is to output everything TableGen knows about
the record set, similarly to the default -print-records backend. But
where -print-records produces output in TableGen's input syntax
(convenient for humans to read), this backend produces it as
structured JSON data, which is convenient for loading into standard
scripting languages such as Python, in order to extract information
from the data set in an automated way.
The output data contains a JSON representation of the variable
definitions in output 'def' records, and a few pieces of metadata such
as which of those definitions are tagged with the 'field' prefix and
which defs are derived from which classes. It doesn't dump out
absolutely every piece of knowledge it _could_ produce, such as type
information and complicated arithmetic operator nodes in abstract
superclasses; the main aim is to allow consumers of this JSON dump to
essentially act as new backends, and backends don't generally need to
depend on that kind of data.
The new backend is implemented as an EmitJSON() function similar to
all of llvm-tblgen's other EmitFoo functions, except that it lives in
lib/TableGen instead of utils/TableGen on the basis that I'm expecting
to add it to clang-tblgen too in a future patch.
To test it, I've written a Python script that loads the JSON output
and tests properties of it based on comments in the .td source - more
or less like FileCheck, except that the CHECK: lines have Python
expressions after them instead of textual pattern matches.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arichardson, labath, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D46054
llvm-svn: 336771
|
| |
|
|
|
|
|
| |
This fixes compile error in r336759. llvm::value::dump is not available
in released build.
llvm-svn: 336770
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
These changes cover the PR#31399.
Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0
and it corresponds to the following llvm code:
%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
%tobool = icmp eq i32 %v, 0
%.op = add nuw nsw i32 %cnt, 1
%add = select i1 %tobool, i32 0, i32 %.op
and x86 asm code:
bsfl %edi, %ecx
addl $1, %ecx
testl %edi, %edi
movl $0, %eax
cmovnel %ecx, %eax
In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies the EFLAGS, namely, ZF flag
that is set by the 'bsf' instruction when 'x' is zero.
We now produce the following code:
bsfl %edi, %ecx
movl $-1, %eax
cmovnel %ecx, %eax
addl $1, %eax
Patch by Ivan Kulagin
Reviewers: davide, craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48765
llvm-svn: 336768
|