| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
| |
Teach the instruction selector and legalizer that it's ok to have adds with 8 or
16-bit integers.
This is the second part of https://reviews.llvm.org/D27704
llvm-svn: 290105
|
| |
|
|
|
|
|
|
|
| |
Teach the instruction selector that it's ok to copy small values from physical
registers.
First part of https://reviews.llvm.org/D27704
llvm-svn: 290104
|
| |
|
|
|
|
|
|
|
|
| |
This adds support for lowering more than 4 arguments (although still i32 only).
It uses the handleAssignments / ValueHandler infrastructure extracted from
the AArch64 backend in r288658.
Differential Revision: https://reviews.llvm.org/D27195
llvm-svn: 290098
|
| |
|
|
|
|
|
|
|
|
| |
Add support for selecting simple G_LOAD and G_FRAME_INDEX instructions (32-bit
scalars only). This will be useful for functions that need to pass arguments on
the stack.
First part of https://reviews.llvm.org/D27195.
llvm-svn: 290096
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original version of the code in XRayInstrumentation.cpp assumed that
functions may not have empty machine basic blocks (or that the first one
couldn't be). This change addresses that by special-casing that specific
situation.
We provide two .mir test-cases to make sure we're handling this
appropriately.
Fixes llvm.org/PR31424.
Reviewers: chandlerc
Subscribers: varno, llvm-commits
Differential Revision: https://reviews.llvm.org/D27913
llvm-svn: 290091
|
| |
|
|
|
|
|
|
| |
Not sure whether it causes and ASAN false positive or whether it
actually leads to incorrect code or whether it even exposes bad code.
Hans, I'll get you instructions to reproduce this.
llvm-svn: 290066
|
| |
|
|
|
|
| |
As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point.
llvm-svn: 290064
|
| |
|
|
|
|
|
|
|
|
|
|
| |
if they are available. This will allow a bunch of patterns to be removed.
These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine.
For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode.
For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now.
llvm-svn: 290060
|
| |
|
|
|
|
|
|
| |
DQI and VLX instructions are available.
This can give the register allocator more registers to use.
llvm-svn: 290057
|
| |
|
|
|
|
| |
for scalars. I missed this in r290049.
llvm-svn: 290055
|
| |
|
|
| |
llvm-svn: 290050
|
| |
|
|
|
|
| |
available. This gives the register allocator more registers to work with.
llvm-svn: 290049
|
| |
|
|
|
|
| |
encoded logic instructions to get more registers to use.
llvm-svn: 290048
|
| |
|
|
|
|
|
|
| |
It is still breaking Chrome. http://llvm.org/PR31361
This reverts commit r290026.
llvm-svn: 290047
|
| |
|
|
|
|
| |
See also test/CodeGen/MIR/README
llvm-svn: 290032
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply r288561: Liveness tracking should be correct now after r290014.
Previously this pass was using up to 5% compile time in some cases which
is a bit much for what it is doing. The pass featured a full blown
data-flow analysis which in the default configuration was restricted to a
single block.
This rewrites the pass under the assumption that we only ever work on a
single block. This is done in a single pass maintaining a state machine
per general purpose register to catch LOH patterns.
Differential Revision: https://reviews.llvm.org/D27329
llvm-svn: 290026
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D27559
llvm-svn: 290014
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly :
Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
llvm-svn: 289988
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).
Sorry for the churn!
llvm-svn: 289982
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.
Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).
Recommiting with fix to avoid forming vld1dup if the type of the load
doesn't match the type of the vdup (see
https://llvm.org/bugs/show_bug.cgi?id=31404).
Differential Revision: https://reviews.llvm.org/D27694
llvm-svn: 289972
|
| |
|
|
|
|
| |
This reverts commit r289951.
llvm-svn: 289960
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-C), COND) (PR31367)
atomic_load_add returns the value before addition, but sets EFLAGS based on the
result of the addition. That means it's setting the flags based on effectively
subtracting C from the value at x, which is also what the outer cmp does.
This targets a pattern that occurs frequently with reference counting pointers:
void decrement(long volatile *ptr) {
if (_InterlockedDecrement(ptr) == 0)
release();
}
Clang would previously compile it (for 32-bit at -Os) as:
00000000 <?decrement@@YAXPCJ@Z>:
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: 31 c9 xor %ecx,%ecx
6: 49 dec %ecx
7: f0 0f c1 08 lock xadd %ecx,(%eax)
b: 83 f9 01 cmp $0x1,%ecx
e: 0f 84 00 00 00 00 je 14 <?decrement@@YAXPCJ@Z+0x14>
14: c3 ret
and with this patch it becomes:
00000000 <?decrement@@YAXPCJ@Z>:
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: f0 ff 08 lock decl (%eax)
7: 0f 84 00 00 00 00 je d <?decrement@@YAXPCJ@Z+0xd>
d: c3 ret
(Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add
or pre-decrement operator generate the same code.)
Differential Revision: https://reviews.llvm.org/D27781
llvm-svn: 289955
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block:
Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
llvm-svn: 289951
|
| |
|
|
|
|
|
|
|
|
|
|
| |
instructions
This is the 512-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837
This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885
llvm-svn: 289946
|
| |
|
|
|
|
| |
v16i32 to VSHUFPS (PR27885)
llvm-svn: 289945
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Add the minimal support necessary to select a function that returns the sum of
two i32 values.
This includes some support for argument/return lowering of i32 values through
registers, as well as the handling of copy and add instructions throughout the
GlobalISel pipeline.
Differential Revision: https://reviews.llvm.org/D26677
llvm-svn: 289940
|
| |
|
|
|
|
| |
We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead.
llvm-svn: 289937
|
| |
|
|
| |
llvm-svn: 289936
|
| |
|
|
| |
llvm-svn: 289931
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
tail-call eligibility)
This patch appears to result in trampolines in vtables being miscompiled
when they in turn tail call a method.
I've posted some preliminary details about the failure on the thread for
this commit and talked to Hal. He was comfortable going ahead and
reverting until we sort out what is wrong.
llvm-svn: 289928
|
| |
|
|
| |
llvm-svn: 289923
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.
Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:
(1) The DIGlobalVariable should describe the source level variable,
not how to get to its location.
(2) It makes it unsafe/hard to update the expressions when we call
replaceExpression on the DIGLobalVariable.
(3) It makes it impossible to represent a global variable that is in
more than one location (e.g., a variable with multiple
DW_OP_LLVM_fragment-s). We also moved away from attaching the
DIExpression to DILocalVariable for the same reasons.
This reapplies r289902 with additional testcase upgrades.
<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769
llvm-svn: 289920
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
idiom.
r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...
This DAG combine has a bad crash in it that is quite hard to trigger
sadly -- it relies on sneaking code with UB through the SDAG build and
into this particular combine. I've responded to the original commit with
a test case that reproduces it.
However, the code also has other problems that will require substantial
changes to address and so I'm going ahead and reverting it for now. This
should unblock us and perhaps others that are hitting the crash in the
wild and will let a fresh patch with updated approach come in cleanly
afterward.
Sorry for any trouble or disruption!
llvm-svn: 289916
|
| |
|
|
|
|
| |
This reverts commit 289902 while investigating bot berakage.
llvm-svn: 289906
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.
Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:
(1) The DIGlobalVariable should describe the source level variable,
not how to get to its location.
(2) It makes it unsafe/hard to update the expressions when we call
replaceExpression on the DIGLobalVariable.
(3) It makes it impossible to represent a global variable that is in
more than one location (e.g., a variable with multiple
DW_OP_LLVM_fragment-s). We also moved away from attaching the
DIExpression to DILocalVariable for the same reasons.
<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769
llvm-svn: 289902
|
| |
|
|
|
|
|
|
|
| |
Removing sensitivity to scheduling (by using CHECK-DAG instead of CHECK) and
some other minor corrections.
In preparation to commit Power9 processor model.
llvm-svn: 289900
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The IRTranslator uses an additional block before the LLVM-IR entry block
to perform all the ABI lowering and the constant hoisting. Thus, this
block is the actual entry block and it falls through the LLVM-IR entry
block. However, with such representation, we end up with two basic
blocks that are not maximal.
Therefore, this patch adds a bit of canonicalization by merging both the
LLVM-IR entry block and the ABI lowering/constants hoisting into one
block, making the resulting block more likely to be maximal (indeed the
LLVM-IR entry block might not have been maximal).
llvm-svn: 289891
|
| |
|
|
|
|
|
|
|
|
|
| |
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.
Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .
Differential Revision: https://reviews.llvm.org/D27793
llvm-svn: 289882
|
| |
|
|
| |
llvm-svn: 289877
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.
Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.
Differential Revision: https://reviews.llvm.org/D27787
llvm-svn: 289874
|
| |
|
|
|
|
|
|
|
| |
This test is currently sensitive to scheduling. Using CHECK-DAG allows us to
preserve the main purpose of the test and remove this sensivity.
In preparation to commit Power9 processor model.
llvm-svn: 289869
|
| |
|
|
| |
llvm-svn: 289868
|
| |
|
|
|
|
|
|
|
|
|
| |
This is the 256-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837
This patch is based on the draft by @sroland (Roland Scheidegger) that is
attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885
llvm-svn: 289846
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a tiny patch with a big pile of test changes.
This partially fixes PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885
My motivating case looks like this:
- vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
- vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
- vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
+ vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
And this happens several times in the diffs. For chips with domain-crossing penalties,
the instruction count and size reduction should usually overcome any potential
domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such
as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so
using shufps is a pure win.
So the test case diffs all appear to be improvements except one test in
vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate
zero elements and one test in combine-sra.ll where multiple uses prevent the expected
shuffle combining.
Differential Revision: https://reviews.llvm.org/D27692
llvm-svn: 289837
|
| |
|
|
|
|
| |
As discussed on D27692
llvm-svn: 289834
|
| |
|
|
|
|
|
|
| |
sections specially.
Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation).
llvm-svn: 289832
|
| |
|
|
|
|
|
|
| |
Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.
Differential Revision: https://reviews.llvm.org/D27684
llvm-svn: 289825
|
| |
|
|
| |
llvm-svn: 289822
|
| |
|
|
|
|
| |
This reverts commit ee709f8988653a0334fbf100cdbbdd83a3933347.
llvm-svn: 289814
|
| |
|
|
| |
llvm-svn: 289807
|