| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
-INT64_MAX as power of 2 minus 1 in the multiply expansion code.
Not sure why they were being explicitly excluded, but I believe all the math inside the if works. I changed the absolute value to be uint64_t instead of int64_t so INT64_MIN+1 wouldn't be signed wrap.
llvm-svn: 338101
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is the pattern you get from the loop vectorizer for something like this
int16_t A[1024];
int16_t B[1024];
int32_t C[512];
void pmaddwd() {
for (int i = 0; i != 512; ++i)
C[i] = (A[2*i]*B[2*i]) + (A[2*i+1]*B[2*i+1]);
}
In this case we will have (add (mul (build_vector), (build_vector)), (mul (build_vector), (build_vector))). This is different than the pattern we currently match which has the build_vectors between an add and a single multiply. I'm not sure what C code would get you that pattern.
Reviewers: RKSimon, spatel, zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49636
llvm-svn: 338097
|
| |
|
|
|
|
|
|
| |
handle UpdateNodeOperands finding an existing node to CSE with.
If this happens the operands aren't updated and the existing node is returned. Make sure we pass this existing node up to the DAG combiner so that a proper replacement happens. Otherwise we get stuck in an infinite loop with an unoptimized node.
llvm-svn: 338090
|
| |
|
|
|
|
|
|
|
|
| |
DAG.setRoot.
Masked loads are calling DAG.getRoot rather than calling SelectionDAGBuilder::getRoot, which means the PendingLoads weren't emptied to update the root and create any needed TokenFactor. So it would be incorrect to call setRoot for the masked load.
This patch instead adds the masked load to PendingLoads so that the root doesn't get update until a store or scatter or something happens.. Alternatively, we could call SelectionDAGBuilder::getRoot before it, but that would create unnecessary serialization.
llvm-svn: 338085
|
| |
|
|
|
|
|
|
|
|
| |
Scale the offset of VGPR spills by the wave size when it cannot fit in the
12-bit offset immediate field and so is added to the soffset SGPR. This
accounts for hardware swizzling of scratch memory.
Differential Revision: https://reviews.llvm.org/D49448
llvm-svn: 338060
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Save/restore only registers that are used.
This includes Callee saved registers and Caller saved registers
(arguments and temporaries) for integer and FP registers.
- If there is a call in the interrupt handler, save/restore all
Caller saved registers (arguments and temporaries) and all FP registers.
- Emit special return instructions depending on "interrupt"
attribute type.
Based on initial patch by Zhaoshi Zheng.
Reviewers: asb
Reviewed By: asb
Subscribers: rkruppe, the_o, MartinMosbeck, brucehoult, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D48411
llvm-svn: 338047
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When fusing instructions A and B, we must add all predecessors of B as
predecessors of A to avoid instructions getting scheduling in between.
There is a special case involving ExitSU: Every other node must be
scheduled before it by design and we don't need to make this explicit in
the graph, however when fusing with a different node we need to schedule
every othere node before the fused node too and we need to make this
explicit now: This patch adds a dependency from the fused node to all
roots in the graph.
Differential Revision: https://reviews.llvm.org/D49830
llvm-svn: 338046
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A follow-up for D49266 / rL337166.
At least one of these cases is more canonical,
so we really do have to handle it.
https://godbolt.org/g/pkzP3X
https://rise4fun.com/Alive/pQyhZZ
We won't get to these cases with I1 being -1,
as that will be constant-folded to true or false.
I'm also not sure we actually hit the 'ule' case,
but i think the worst think that could happen is that being dead code.
Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49497
llvm-svn: 338044
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Override getTypeForExtReturn so that functions returning
an i32 typed value have it sign extended on MIPS64.
Also provide patterns to get rid of unneeded sign extensions
for arithmetic instructions which implicitly sign extend
their results.
Differential Revision: https://reviews.llvm.org/D48374
llvm-svn: 338019
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit r337951.
While that kind of shared constant generally works fine in a MinGW
setting, it broke some cases of inline assembly that worked before:
$ cat const-asm.c
int MULH(int a, int b) {
int rt, dummy;
__asm__ (
"imull %3"
:"=d"(rt), "=a"(dummy)
:"a"(a), "rm"(b)
);
return rt;
}
int func(int a) {
return MULH(a, 1);
}
$ clang -target x86_64-win32-gnu -c const-asm.c -O2
const-asm.c:4:9: error: invalid variant '00000001'
"imull %3"
^
<inline asm>:1:15: note: instantiated into assembly here
imull __real@00000001(%rip)
^
A similar error is produced for i686 as well. The same test with a
target of x86_64-win32-msvc or i686-win32-msvc works fine.
llvm-svn: 338018
|
| |
|
|
|
|
|
|
|
|
|
|
| |
worklist in combineMul.
I'm not sure if this was trying to avoid optimizing the new nodes further or what. Or maybe to prevent a cycle if something tried to reform the multiply? But I don't think its a reliable way to do that. If the user of the expanded multiply is visited by the DAGCombiner after this conversion happens, the DAGCombiner will check its operands, see that they haven't been visited by the DAGCombiner before and it will then add the first node to the worklist. This process will repeat until all the new nodes are visited.
So this seems like an unreliable prevention at best. So this patch just returns the new nodes like any other combine. If this starts causing problems we can try to add target specific nodes or something to more directly prevent optimizations.
Now that we handle the combine normally, we can combine any negates the mul expansion creates into their users since those will be visited now.
llvm-svn: 338007
|
| |
|
|
|
|
| |
We don't currently support these, fall back until we do.
llvm-svn: 337994
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some BPF JIT backends would want to optimize memcpy in their own
architecture specific way.
However, at the moment, there is no way for JIT backends to see memcpy
semantics in a reliable way. This is due to LLVM BPF backend is expanding
memcpy into load/store sequences and could possibly schedule them apart from
each other further. So, BPF JIT backends inside kernel can't reliably
recognize memcpy semantics by peephole BPF sequence.
This patch introduce new intrinsic expand infrastructure to memcpy.
To get stable in-order load/store sequence from memcpy, we first lower
memcpy into BPF::MEMCPY node which then expanded into in-order load/store
sequences in expandPostRAPseudo pass which will happen after instruction
scheduling. By this way, kernel JIT backends could reliably recognize
memcpy through scanning BPF sequence.
This new memcpy expand infrastructure is gated by a new option:
-bpf-expand-memcpy-in-order
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 337977
|
| |
|
|
|
|
|
|
|
|
|
| |
If the DAGCombiner's rotate matching was working as expected,
I don't think we'd see any test diffs here.
This sidesteps the issue of custom lowering for rotates raised in PR38243:
https://bugs.llvm.org/show_bug.cgi?id=38243
...by only dealing with legal operations.
llvm-svn: 337966
|
| |
|
|
| |
llvm-svn: 337964
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GNU binutils tools have no problems with this kind of shared constants,
provided that we actually hook it up completely in AsmPrinter and
produce a global symbol.
This effectively reverts SVN r335918 by hooking the rest of it up
properly.
This feature was implemented originally in SVN r213006, with no reason
for why it can't be used for MinGW other than the fact that GCC doesn't
do it while MSVC does.
Differential Revision: https://reviews.llvm.org/D49646
llvm-svn: 337951
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name only
was used for msvc targets) into the arch independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.
With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues with both GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:
fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4
Differential Revision: https://reviews.llvm.org/D49644
llvm-svn: 337950
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Saves materializing the immediate for the "ands".
Corresponding patterns exist for lsrs+lsls, but that seems less common
in practice.
Now implemented as a DAGCombine.
Differential Revision: https://reviews.llvm.org/D49585
llvm-svn: 337945
|
| |
|
|
|
|
|
|
|
|
|
|
| |
When VectorLegalizer::LegalizeOp creates a new SDValue after iterating
over its arguments, we need to refer to the same result number of the
new node that the original value used.
Reviewed by: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D49805
llvm-svn: 337939
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D49761
llvm-svn: 337938
|
| |
|
|
|
|
|
|
| |
For example v = <2 x i1> is represented as bbbbaaaa in a predicate register,
where b = v[1], a = v[0]. Extracting v[1] is equivalent to extracting bit 4
from the predicate register.
llvm-svn: 337934
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Add support for lowering pointer arguments.
Changing type from pointer to integer is already done in
MipsTargetLowering::getRegisterTypeForCallingConv.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D49419
llvm-svn: 337912
|
| |
|
|
|
|
|
|
| |
Add support for inline assembly with output operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR as in the PR).
llvm-svn: 337903
|
| |
|
|
| |
llvm-svn: 337889
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
against v1.2 BCBS attacks directly.
Attacks using spectre v1.2 (a subset of BCBS) are described in the paper
here:
https://people.csail.mit.edu/vlk/spectre11.pdf
The core idea is to speculatively store over the address in a vtable,
jumptable, or other target of indirect control flow that will be
subsequently loaded. Speculative execution after such a store can
forward the stored value to subsequent loads, and if called or jumped
to, the speculative execution will be steered to this potentially
attacker controlled address.
Up until now, this could be mitigated by enableing retpolines. However,
that is a relatively expensive technique to mitigate this particular
flavor. Especially because in most cases SLH will have already mitigated
this. To fully mitigate this with SLH, we need to do two core things:
1) Unfold loads from calls and jumps, allowing the loads to be post-load
hardened.
2) Force hardening of incoming registers even if we didn't end up
needing to harden the load itself.
The reason we need to do these two things is because hardening calls and
jumps from this particular variant is importantly different from
hardening against leak of secret data. Because the "bad" data here isn't
a secret, but in fact speculatively stored by the attacker, it may be
loaded from any address, regardless of whether it is read-only memory,
mapped memory, or a "hardened" address. The only 100% effective way to
harden these instructions is to harden the their operand itself. But to
the extent possible, we'd like to take advantage of all the other
hardening going on, we just need a fallback in case none of that
happened to cover the particular input to the control transfer
instruction.
For users of SLH, currently they are paing 2% to 6% performance overhead
for retpolines, but this mechanism is expected to be substantially
cheaper. However, it is worth reminding folks that this does not
mitigate all of the things retpolines do -- most notably, variant #2 is
not in *any way* mitigated by this technique. So users of SLH may still
want to enable retpolines, and the implementation is carefuly designed to
gracefully leverage retpolines to avoid the need for further hardening
here when they are enabled.
Differential Revision: https://reviews.llvm.org/D49663
llvm-svn: 337878
|
| |
|
|
|
|
|
|
| |
of 2 plus 2/4/8.
The LEA allows us to combine an add and the multiply by 2/4/8 together so we just need a shift for the larger power of 2.
llvm-svn: 337875
|
| |
|
|
|
|
| |
do for pow2 - 2.
llvm-svn: 337874
|
| |
|
|
|
|
| |
These fit a pattern used by 11, 21, and 19.
llvm-svn: 337871
|
| |
|
|
|
|
| |
These can all be handled with 2 LEAs similar to what we do for 11, 19, 21.
llvm-svn: 337870
|
| |
|
|
|
|
|
|
| |
multiply by 3 and 9 and a subtract.
Same number of operations, but ending in an add is friendlier due to it being commutable.
llvm-svn: 337869
|
| |
|
|
|
|
|
|
|
|
| |
like 31, don't generate a negate of a subtract that we'll never optimize.
We generated a subtract for the power of 2 minus one then negated the result. The negate can be optimized away by swapping the subtract operands, but DAG combine doesn't know how to do that and we don't add any of the new nodes to the worklist anyway.
This patch makes use explicitly emit the swapped subtract.
llvm-svn: 337858
|
| |
|
|
|
|
|
|
|
|
| |
minus 2.
Use a left shift and 2 subtracts like we do for 30. Move this out from behind the slow lea check since it doesn't even use an LEA.
Use this for multiply by 14 as well.
llvm-svn: 337856
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently all wasm atomic memory access instructions are sequentially
consistent, so even if LLVM IR specifies weaker orderings than that, we
should upgrade them to sequential ordering and treat them in the same
way.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D49194
llvm-svn: 337854
|
| |
|
|
|
|
| |
The new lowering can be done in 2 LEAs. The old code took 1 LEA, 1 shift, and 1 sub.
llvm-svn: 337851
|
| |
|
|
|
|
|
|
|
|
| |
created by mul by constant expansion.
Mul by constant can expand to a sequence that ends with a negate. If the next instruction is an add or sub we might be able to fold the negate away.
We currently fail to do this because we explicitly don't add anything to the DAG combine worklist when we expand multiplies. This is primarily to keep the multipy from being reformed, but we should consider adding the users to worklist.
llvm-svn: 337843
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the final DTPREL addition, rather than a lui/daddiu/daddu triple,
LLVM was erronously emitting a daddiu/daddiu pair, treating the %dtprel_hi
as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64.
Instead, use a new TlsHi node and, although unnecessary due to the exact
structure of the nodes emitted, use TlsHi for local exec too to prevent
future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes,
and TprelHi since its functionality is provided by the new common TlsHi node.
Patch by James Clarke.
Differential revision: https://reviews.llvm.org/D49259
llvm-svn: 337827
|
| |
|
|
|
|
|
|
| |
ARM Stage 2 builders have been suspiciously broken since the pass was
committed. Disabling to hopefully fix the bots and give me time to
debug.
llvm-svn: 337821
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This test was already checking microscopic behavior of tail call under
specific conditions. This just makes the CHECK lines much more
consistent, clear, and easily updated when intentional changes are made.
I've also switched the test to consistently name the entry block and to
order the helper declarations and comments for specific tests in the
more usual locations.
llvm-svn: 337806
|
| |
|
|
|
|
| |
from the script. This minimizes the diff in subsequent changes.
llvm-svn: 337805
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D49601
llvm-svn: 337798
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D49704
llvm-svn: 337771
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since SVN r335286, the .xdata sections are produced without an attached
symbol, which requires using a different syntax when printing assembly
output.
Instead of the usual syntax of '.section <name>,"dr",discard,<symbol>',
use '.section <name>,"dr"' + '.linkonce discard' (which is what GCC
uses for all assembly output).
This fixes PR38254.
Differential Revision: https://reviews.llvm.org/D49651
llvm-svn: 337756
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This matches the structure used on X86 and ARM. This requires
a little bit of duplication of the parts that are equal in both
AArch64 COFF variants though.
Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF
didn't, but now they do.
This makes AArch64 match X86 in how comdat is used for float constants
for MinGW.
Differential Revision: https://reviews.llvm.org/D49637
llvm-svn: 337755
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
models"
Don't try to generate large PIC code for non-ELF targets. Neither COFF
nor MachO have relocations for large position independent code, and
users have been using "large PIC" code models to JIT 64-bit code for a
while now. With this change, if they are generating ELF code, their
JITed code will truly be PIC, but if they target MachO or COFF, it will
contain 64-bit immediates that directly reference external symbols. For
a JIT, that's perfectly fine.
llvm-svn: 337740
|
| |
|
|
| |
llvm-svn: 337734
|
| |
|
|
|
|
| |
Instead of comparing names, compare positions in the parent module.
llvm-svn: 337723
|
| |
|
|
| |
llvm-svn: 337705
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D48809
llvm-svn: 337698
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Arm specific codegen prepare is implemented to perform type promotion
on icmp operands, which can enable the removal of uxtb and uxth
(unsigned extend) instructions. This is possible because performing
type promotion before ISel alleviates this duty from the DAG builder
which has to perform legalisation, but has a limited view on data
ranges.
The pass visits any instruction operand of an icmp and creates a
worklist to traverse the use-def tree to determine whether the values
can simply be promoted. Our concern is values in the registers
overflowing the narrow (i8, i16) data range, so instructions marked
with nuw can be promoted easily. For add and sub instructions, we are
able to use the parallel dsp instructions to operate on scalar data
types and avoid overflowing bits. Underflowing adds and subs are also
permitted when the result is only used by an unsigned icmp.
Differential Revision: https://reviews.llvm.org/D48832
llvm-svn: 337687
|
| |
|
|
|
|
|
|
|
| |
a call, and then again as a return.
Also added a comment to try and explain better why we would be doing
what we're doing when hardening the (non-call) returns.
llvm-svn: 337673
|