| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
| |
X86InterleavedAccess test.
Expanding the test to include AVX target.
Adding base test (to trunk) for Store stride=4 vf=32.
llvm-svn: 306543
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I was testing using this expansion logic in other cases besides
NVPTX, and found some runtime failures due to the lack of a check
for a zero length memcpy/memset before the loop. There is already
such a check in the memmove expansion code though.
Reviewers: hfinkel
Subscribers: jholewinski, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D34707
llvm-svn: 306541
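
The zero-length problem described above can be modeled outside LLVM. This is a minimal sketch, assuming the expanded loop is bottom-tested (the body runs once before the length is compared); the function names are hypothetical and are not taken from the patch.

```python
def copy_loop_unguarded(dst, src, n):
    """Bottom-tested copy loop: the body runs once before n is checked."""
    i = 0
    while True:
        dst[i] = src[i]  # executes even when n == 0
        i += 1
        if i >= n:
            break
    return i  # number of bytes actually written


def copy_loop_guarded(dst, src, n):
    """Same loop, preceded by the zero-length check."""
    if n == 0:
        return 0  # skip the loop entirely for a zero-length copy
    return copy_loop_unguarded(dst, src, n)
```

With n == 0 the unguarded loop still writes one byte, while the guarded version leaves the destination untouched.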
|
| |
|
|
| |
llvm-svn: 306537
|
| |
|
|
| |
llvm-svn: 306536
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Support G_AND, G_OR, G_XOR for i8/i16/i32/i64. Selection done via TableGen'erated code.
Reviewers: zvi, guyblank, aymanmus, m_zuckerman
Reviewed By: aymanmus
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34605
llvm-svn: 306533
|
| |
|
|
| |
llvm-svn: 306532
|
| |
|
|
|
|
| |
Use triple and attribute only for consistency
llvm-svn: 306531
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CFI instructions that set appropriate cfa offset and cfa register are now
inserted in emitEpilogue() in X86FrameLowering.
Majority of the changes in this patch:
1. Ensure that CFI instructions do not affect code generation.
2. Enable maintaining correct information about cfa offset and cfa register
in a function when basic blocks are reordered, merged, split, duplicated.
These changes are target independent and described below.
Changed CFI instructions so that they:
1. are duplicable
2. are not counted as instructions when tail duplicating or tail merging
3. can be compared as equal
Add information to each MachineBasicBlock about cfa offset and cfa register
that are valid at its entry and exit (incoming and outgoing CFI info). Add
support for updating this information when basic blocks are merged, split,
duplicated, created. Add a verification pass (CFIInfoVerifier) that checks
that outgoing cfa offset and register of predecessor blocks match incoming
values of their successors.
Incoming and outgoing CFI information is used by a late pass
(CFIInstrInserter) that corrects CFA calculation rule for a basic block if
needed. That means that additional CFI instructions get inserted at basic
block beginning to correct the rule for calculating CFA. Having CFI
instructions in function epilogue can cause incorrect CFA calculation rule
for some basic blocks. This can happen if, due to basic block reordering,
or the existence of multiple epilogue blocks, some of the blocks have wrong
cfa offset and register values set by the epilogue block above them.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D18046
llvm-svn: 306529
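
The check attributed to CFIInfoVerifier above (outgoing cfa offset and register of each predecessor must match the incoming values of its successors) can be sketched on a toy control-flow graph. The data structure and names below are hypothetical and only illustrate the invariant; they do not mirror the MachineBasicBlock API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CFI = Tuple[int, str]  # (cfa offset, cfa register)


@dataclass
class BlockCFI:
    incoming: CFI                 # valid at block entry
    outgoing: CFI                 # valid at block exit
    successors: List[str] = field(default_factory=list)


def find_cfi_mismatches(blocks: Dict[str, BlockCFI]) -> List[Tuple[str, str]]:
    """Return (predecessor, successor) edges whose CFI info disagrees."""
    mismatches = []
    for name, blk in blocks.items():
        for succ in blk.successors:
            if blk.outgoing != blocks[succ].incoming:
                mismatches.append((name, succ))
    return mismatches


# Toy function: the epilogue resets the CFA rule, but a block reached from it
# still expects the frame-pointer-based rule at entry.
cfg = {
    "entry": BlockCFI((8, "rsp"), (16, "rbp"), ["body"]),
    "body": BlockCFI((16, "rbp"), (16, "rbp"), ["epilogue"]),
    "epilogue": BlockCFI((16, "rbp"), (8, "rsp"), ["cold"]),
    "cold": BlockCFI((16, "rbp"), (16, "rbp"), []),
}
```

In this toy graph only the epilogue-to-cold edge is reported; that is the kind of block where a late pass would need to insert correcting CFI instructions at the block beginning.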
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original patch was an improvement to IR ValueTracking on non-negative
integers. It was checked in to trunk (D18777, r284022) but was disabled by
default due to performance regressions.
The performance impact has since improved, so the patch is now enabled by default.
Reviewers: reames
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
llvm-svn: 306528
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.
This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNs, and the former NAN/NUM
nodes are illegal, so selection does not happen. So I decided to do
this kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.
Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D33186
llvm-svn: 306525
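
Why NaN handling matters for the float clamp pattern can be seen in a small model. This is a sketch only: clamp_select mimics a compare-and-select clamp, while fmaxnum/fminnum model "return the non-NaN operand" semantics; the Python function names are illustrative and nothing here is taken from matchSelectPattern itself.

```python
import math


def clamp_select(x, lo, hi):
    """Clamp written as two compare+select steps."""
    t = lo if x < lo else x
    return hi if t > hi else t


def fmaxnum(a, b):
    """Maximum that returns the other operand when one input is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def fminnum(a, b):
    """Minimum that returns the other operand when one input is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def clamp_minmax(x, lo, hi):
    """Clamp written with NaN-aware min/max operations."""
    return fminnum(fmaxnum(x, lo), hi)
```

The two forms agree for ordinary inputs but differ when x is NaN, which is why the rewrite is only safe when the fast-math flags rule NaNs out.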
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit adds the tests for clamp pattern as a prerequisite of
D33186 to make the impact of that fix more clear and also to document
current behavior.
Reviewers: spatel, jmolloy
Reviewed By: spatel
Subscribers: n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D34350
llvm-svn: 306524
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The benchmarking summarized in
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113525.html showed
this is beneficial for a wide range of cores.
As is to be expected, quite a few small adaptations are needed to the
regression tests, as the difference in scheduling results in:
- Quite a few small instruction schedule differences.
- A few changes in register allocation decisions caused by different
instruction schedules.
- A few changes in IfConversion decisions, due to a difference in
instruction schedule and/or the estimated cost of a branch mispredict.
llvm-svn: 306514
|
| |
|
|
|
|
| |
(trunc (lshr(X, cst1)), cst->icmp (and X, mask), cst when the shift type is larger than 64-bits. NFC
llvm-svn: 306510
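
The fold named in the subject above, rewriting an equality compare of a truncated logical shift right into a masked compare, can be modeled with plain integers. A minimal sketch, assuming the compare constant is shifted left by the same amount as the mask; the helper names and the 128-bit width are illustrative and not taken from InstCombine.

```python
def icmp_trunc_lshr(x, shift, trunc_bits, c, width=128):
    """icmp eq (trunc (lshr x, shift) to iN), c"""
    x &= (1 << width) - 1
    return ((x >> shift) & ((1 << trunc_bits) - 1)) == c


def icmp_and_mask(x, shift, trunc_bits, c, width=128):
    """icmp eq (and x, mask), c', with mask and c' shifted into place."""
    x &= (1 << width) - 1
    mask = ((1 << trunc_bits) - 1) << shift
    return (x & mask) == (c << shift)
```

Both forms inspect the same trunc_bits bits of x starting at bit position shift, so they agree for any width, including widths above 64 bits.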
|
| |
|
|
|
|
|
|
| |
handle icmp eq (trunc (lshr(X, cst1)), cst->icmp (and X, mask), cst when the shift type is larger than 64-bits. NFC"
I accidentally had an extra change in there.
llvm-svn: 306509
|
| |
|
|
|
|
| |
(trunc (lshr(X, cst1)), cst->icmp (and X, mask), cst when the shift type is larger than 64-bits. NFC
llvm-svn: 306508
|
| |
|
|
|
|
|
|
| |
If immediate in shift is less than 32 we can use alignbit too.
Differential Revision: https://reviews.llvm.org/D34729
llvm-svn: 306500
|
| |
|
|
|
|
|
|
|
|
|
| |
It is pretty common for clang to produce code like
(shl %x, (and %amt, 31)). In this situation we can still perform
trunc (shl) into shl (trunc) conversion given the known value
range of shift amount.
Differential Revision: https://reviews.llvm.org/D34723
llvm-svn: 306499
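
The equivalence behind this conversion can be checked with integer arithmetic. A minimal sketch, assuming a 64-bit value truncated to 32 bits and a shift amount masked with 31 as in the pattern above; the function names are illustrative.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def trunc_of_shl(x, amt):
    """trunc i64 (shl x, (and amt, 31)) to i32"""
    a = amt & 31
    return ((x << a) & MASK64) & MASK32


def shl_of_trunc(x, amt):
    """shl i32 (trunc x), (and amt, 31)"""
    a = amt & 31
    return ((x & MASK32) << a) & MASK32
```

Because the masked amount is always below 32, the narrow shift stays in range, and the low 32 bits of the result depend only on the low 32 bits of x.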
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is the llvm part of the initial implementation to support Windows ARM64 COFF format.
I will gradually add more functionality in subsequent patches.
Reviewers: ruiu, rnk, t.p.northover, compnerd
Reviewed By: ruiu, compnerd
Subscribers: aemerson, mgorny, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D34705
llvm-svn: 306490
|
| |
|
|
|
|
|
|
| |
Fixes PR27551.
Differential Revision: https://reviews.llvm.org/D33974
llvm-svn: 306488
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D33973
llvm-svn: 306487
|
| |
|
|
| |
llvm-svn: 306485
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D34312
llvm-svn: 306484
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Similar to X86, it should be safe to inline callees if their target-features
are a subset of the caller. This change matches GCC's inlining behavior
with respect to attributes [1].
[1] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html#AArch64-Function-Attributes
Reviewers: kristof.beyls, javed.absar, rengolin, t.p.northover
Reviewed By: t.p.northover
Subscribers: aemerson, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34698
llvm-svn: 306478
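
The subset rule described above can be illustrated on feature strings. This is a sketch under assumptions: the comma-separated "+feature" format and the function names are illustrative, and the real implementation works on subtarget feature sets rather than strings.

```python
def parse_features(features):
    """Split a comma-separated feature string into a set of names."""
    return {f.strip() for f in features.split(",") if f.strip()}


def can_inline(caller_features, callee_features):
    """Allow inlining only if the callee's features are a subset of the caller's."""
    return parse_features(callee_features) <= parse_features(caller_features)
```

For example, a callee that needs only "+neon" may be inlined into a caller compiled with "+neon,+crc", but not the other way around.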
|
| |
|
|
|
|
| |
EarlyCSE.
llvm-svn: 306477
|
| |
|
|
|
|
|
| |
Also add IRTranslator support.
https://reviews.llvm.org/D34710
llvm-svn: 306475
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D33341
llvm-svn: 306473
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in D34071, there are some IR optimization opportunities that could be
handled by normal IR passes if this expansion wasn't happening so late in CGP.
Regardless of that, it seems wasteful to knowingly produce suboptimal IR here,
so I'm proposing this change:
%s = sub i32 %x, %y
%r = icmp ne %s, 0
=>
%r = icmp ne %x, %y
Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.
The fact that the PowerPC backend doesn't eliminate the 'subf.' might be
something for PPC folks to investigate separately.
Differential Revision: https://reviews.llvm.org/D34416
llvm-svn: 306471
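
The rewrite shown above relies on x - y being zero exactly when x equals y, even with wraparound. A small exhaustive check over 8-bit values (the width is chosen only to keep the loop short) illustrates this; the function names are illustrative.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def icmp_ne_of_sub(x, y):
    """%s = sub %x, %y ; %r = icmp ne %s, 0  (wrapping subtraction)"""
    s = (x - y) & MASK
    return s != 0


def icmp_ne_direct(x, y):
    """%r = icmp ne %x, %y"""
    return x != y


def forms_agree():
    """Compare both forms on every pair of 8-bit inputs."""
    return all(
        icmp_ne_of_sub(x, y) == icmp_ne_direct(x, y)
        for x in range(1 << WIDTH)
        for y in range(1 << WIDTH)
    )
```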
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Without this check, COPY instructions can actually be one of the generic casts
in disguise. That's confusing and bad.
At some point during ISel this restriction has to be relaxed since the fully
selected instructions will usually use COPY for those purposes. Right now I
think it's possible that relaxation occurs during RegBankSelect (hence the
change there). I'm not convinced that's where it belongs long-term though.
llvm-svn: 306470
|
| |
|
|
| |
llvm-svn: 306468
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D34640
llvm-svn: 306466
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D34712
llvm-svn: 306464
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D34658
llvm-svn: 306461
|
| |
|
|
|
|
|
|
|
| |
The overall size of the data section (including BSS)
is otherwise not included in the wasm binary.
Differential Revision: https://reviews.llvm.org/D34657
llvm-svn: 306459
|
| |
|
|
| |
llvm-svn: 306458
|
| |
|
|
|
|
|
|
|
|
|
|
| |
the constant is a vector splat or the scalar bit width is larger than 64-bits
The check to see if we can propagate the nsw flag used m_ConstantInt(uint64_t*&), which doesn't work with splat vectors and has a restriction that the bitwidth of the ConstantInt must be 64 bits or less.
This patch changes it to use m_APInt to remove both of these issues.
Differential Revision: https://reviews.llvm.org/D34699
llvm-svn: 306457
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D34655
llvm-svn: 306449
|
| |
|
|
|
|
|
|
|
| |
BlockAddress are only valid within their function context, which does not
interact well with CodeExtractor. Detect this case and prevent it.
Differential Revision: https://reviews.llvm.org/D33839
llvm-svn: 306448
|
| |
|
|
|
|
|
|
|
|
|
| |
Depending on the compare code that can be either an argument of
sext or negate of it. This helps to avoid v_cndmask_b64 instruction
for sext. A reversed value can be further simplified and folded into
its parent comparison if possible.
Differential Revision: https://reviews.llvm.org/D34545
llvm-svn: 306446
|
| |
|
|
|
|
|
| |
Account for the fact that both the feeder and the compare can be moved
over instructions that kill registers.
llvm-svn: 306443
|
| |
|
|
|
|
|
|
| |
Apparently this replacement can really be substituting the
same as the original register. Avoid restarting the loop
when there's been no change in the register uses.
llvm-svn: 306441
|
| |
|
|
|
|
|
|
| |
SROA assumes the alloca address space is 0, which causes an assertion failure. This patch fixes that.
Differential Revision: https://reviews.llvm.org/D34104
llvm-svn: 306440
|
| |
|
|
|
|
|
|
|
|
| |
Also factored out function to check if a boolean is an already
deserialized value which does not require v_cndmask_b32 to be
loaded. Added binary logical operators to its check.
Differential Revision: https://reviews.llvm.org/D34500
llvm-svn: 306439
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This canonicalization was suggested in D33172 as a way to make InstCombine behavior more uniform.
We have this transform for icmp+br, so unless there's some reason that icmp+select should be
treated differently, we should do the same thing here.
The benefit comes from increasing the chances of creating identical instructions. This is shown in
the tests in logical-select.ll (PR32791). InstCombine doesn't fold those directly, but EarlyCSE
can simplify the identical cmps, and then InstCombine can fold the selects together.
The possible regression for the tests in select.ll raises questions about poison/undef:
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113261.html
...but that transform is just as likely to be triggered by this canonicalization as it is to be
missed, so we're just pointing out a commutation deficiency in the pattern matching:
https://reviews.llvm.org/rL228409
Differential Revision: https://reviews.llvm.org/D34242
llvm-svn: 306435
|
| |
|
|
|
|
| |
when converting mul by pow2 to shl when the type is larger than 64-bits. NFC
llvm-svn: 306427
|
| |
|
|
|
|
| |
when converting mul by pow2 constant to shl for splat vectors. NFC
llvm-svn: 306426
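
The mul-by-power-of-two to shl conversion that this test exercises can be modeled lane by lane. A minimal sketch, assuming a splat constant of 2**k and wrapping arithmetic at the given lane width; the function names and the 128-bit default are illustrative.

```python
def mul_by_pow2_splat(lanes, k, width=128):
    """Multiply every lane by the splat constant 2**k, wrapping at width bits."""
    mask = (1 << width) - 1
    return [(v * (1 << k)) & mask for v in lanes]


def shl_splat(lanes, k, width=128):
    """Shift every lane left by k, wrapping at width bits."""
    mask = (1 << width) - 1
    return [(v << k) & mask for v in lanes]
```

Both functions produce the same lanes for any k below the lane width, which is the property the conversion depends on.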
|
| |
|
|
|
|
|
|
|
|
|
| |
Introducing MOD binary operator
https://msdn.microsoft.com/en-us/library/hha180wt.aspx
Enhancing unary operators NEG and NOT, to support more complex patterns
Differential Revision: https://reviews.llvm.org/D33876
llvm-svn: 306425
|
| |
|
|
| |
llvm-svn: 306420
|
| |
|
|
|
|
|
|
|
|
| |
Not sure why this restriction existed, but it seems like we should support any size Constant here.
The particular pattern in the tests is not the only use of this matcher in the tree. There's one in CodeGenPrepare and one in InstSimplify as well.
Differential Revision: https://reviews.llvm.org/D34666
llvm-svn: 306417
|
| |
|
|
|
|
| |
Looks like I forgot to 'git add' when I submitted the commit. Thanks to Chandler for noticing.
llvm-svn: 306416
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to include the following data:
- static latency
- number of uOps of which the instruction consists
- all ports used by the instruction
Reviewers: RKSimon, zvi, aymanmus, m_zuckerman
Differential Revision: https://reviews.llvm.org/D33897
llvm-svn: 306414
|