| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Teach the instruction selector and legalizer that it's ok to have adds with 8 or
16-bit integers.
This is the second part of https://reviews.llvm.org/D27704
llvm-svn: 290105
|
|
|
|
|
|
|
|
|
| |
Teach the instruction selector that it's ok to copy small values from physical
registers.
First part of https://reviews.llvm.org/D27704
llvm-svn: 290104
|
|
|
|
|
|
|
|
|
|
| |
This adds support for lowering more than 4 arguments (although still i32 only).
It uses the handleAssignments / ValueHandler infrastructure extracted from
the AArch64 backend in r288658.
Differential Revision: https://reviews.llvm.org/D27195
llvm-svn: 290098
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
functime metadata V2.0
Summary:
Added pair of directives .hsa_code_object_metadata/.end_hsa_code_object_metadata.
Between them user can put YAML string that would be directly put to the generated note. E.g.:
'''
.hsa_code_object_metadata
{
amd.MDVersion: [ 2, 0 ]
}
.end_hsa_code_object_metadata
'''
Based on D25046
Reviewers: vpykhtin, nhaustov, yaxunl, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, mgorny, tony-tye
Differential Revision: https://reviews.llvm.org/D27619
llvm-svn: 290097
|
|
|
|
|
|
|
|
|
|
| |
Add support for selecting simple G_LOAD and G_FRAME_INDEX instructions (32-bit
scalars only). This will be useful for functions that need to pass arguments on
the stack.
First part of https://reviews.llvm.org/D27195.
llvm-svn: 290096
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original version of the code in XRayInstrumentation.cpp assumed that
functions may not have empty machine basic blocks (or that the first one
couldn't be). This change addresses that by special-casing that specific
situation.
We provide two .mir test-cases to make sure we're handling this
appropriately.
Fixes llvm.org/PR31424.
Reviewers: chandlerc
Subscribers: varno, llvm-commits
Differential Revision: https://reviews.llvm.org/D27913
llvm-svn: 290091
|
|
|
|
|
|
| |
Sorry!
llvm-svn: 290087
|
|
|
|
|
|
|
| |
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).
llvm-svn: 290086
|
|
|
|
|
|
|
| |
Add test-case that was missing in "[FileCheck] Fix --strict-whitespace
--match-full-lines" commit.
llvm-svn: 290070
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Background/motivation - I was circling back around to:
https://llvm.org/bugs/show_bug.cgi?id=28296
I made a simple patch for that and noticed some regressions, so added test cases for
those with rL281055, and this is hopefully the minimal fix for just those cases.
But as you can see from the surrounding untouched folds, we are missing commuted patterns
all over the place, and of course there are no regression tests to cover any of those cases.
We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but
then we still wouldn't have test coverage, and we'd still miss some fraction of commuted
patterns because they require adjustments to the match order.
I'm aware of the concern about the potential compile-time performance impact of adding
matches like this (currently being discussed on llvm-dev), but I don't think there's any
evidence yet to suggest that handling commutative pattern matching more thoroughly is not
a worthwhile goal of InstCombine.
Differential Revision: https://reviews.llvm.org/D24419
llvm-svn: 290067
|
|
|
|
|
|
|
|
| |
Not sure whether it causes and ASAN false positive or whether it
actually leads to incorrect code or whether it even exposes bad code.
Hans, I'll get you instructions to reproduce this.
llvm-svn: 290066
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit on behalf of Gadi Haber
Removed EVEX_V512 prefix from scalar EVEX instructions since HW ignores L'L bits anyway (LIG). 4 instructions are modified.
The changed encodings are validated with XED.
Rviewers: delena, igorb
Differential revision: https://reviews.llvm.org/D27802
llvm-svn: 290065
|
|
|
|
|
|
| |
As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point.
llvm-svn: 290064
|
|
|
|
|
|
|
|
|
|
|
|
| |
if they are available. This will allow a bunch of patterns to be removed.
These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine.
For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode.
For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now.
llvm-svn: 290060
|
|
|
|
|
|
|
|
| |
DQI and VLX instructions are available.
This can give the register allocator more registers to use.
llvm-svn: 290057
|
|
|
|
|
|
| |
for scalars. I missed this in r290049.
llvm-svn: 290055
|
|
|
|
| |
llvm-svn: 290050
|
|
|
|
|
|
| |
available. This gives the register allocator more registers to work with.
llvm-svn: 290049
|
|
|
|
|
|
| |
encoded logic instructions to get more registers to use.
llvm-svn: 290048
|
|
|
|
|
|
|
|
| |
It is still breaking Chrome. http://llvm.org/PR31361
This reverts commit r290026.
llvm-svn: 290047
|
|
|
|
|
|
| |
See also test/CodeGen/MIR/README
llvm-svn: 290032
|
|
|
|
|
|
|
|
| |
This reverts r289696, which caused TSan perf regression.
See PR31382.
llvm-svn: 290030
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply r288561: Liveness tracking should be correct now after r290014.
Previously this pass was using up to 5% compile time in some cases which
is a bit much for what it is doing. The pass featured a full blown
data-flow analysis which in the default configuration was restricted to a
single block.
This rewrites the pass under the assumption that we only ever work on a
single block. This is done in a single pass maintaining a state machine
per general purpose register to catch LOH patterns.
Differential Revision: https://reviews.llvm.org/D27329
llvm-svn: 290026
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D27863
llvm-svn: 290017
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D27559
llvm-svn: 290014
|
|
|
|
|
|
| |
Feedback on r289468 from Adrian Prantl.
llvm-svn: 290012
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Mach-O command line flag like "-arch armv7m" does not match the
arch name part of its llvm Triple which is "thumbv7m-apple-darwin”.
I think the best way to fix this is to have
llvm::object::MachOObjectFile::getArchTriple() optionally return the
name of the Mach-O arch flag that would be used with -arch that
matches the CPUType and CPUSubType. Then change
llvm::object::MachOUniversalBinary::ObjectForArch::getArchTypeName()
to use that and change it to getArchFlagName() as the type name is
really part of the Triple and the -arch flag name is a Mach-O thing
for a specific Triple with a specific Mcpu value.
rdar://29663637
llvm-svn: 290001
|
|
|
|
|
|
|
| |
The original patch was broken due to some undefined behavior
as well as warnings that were triggering -Werror.
llvm-svn: 290000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When reading the metadata bitcode, create a type declaration when
possible for composite types when we are importing. Doing this in
the bitcode reader saves memory. Also it works naturally in the case
when the type ODR map contains a definition for the same composite type
because it was used in the importing module (buildODRType will
automatically use the existing definition and not create a type
declaration).
For Chromium built with -g2, this reduces the aggregate size of the
generated native object files by 66% (from 31G to 10G). It reduced
the time through the ThinLTO link and backend phases by about 20% on
my machine.
Reviewers: mehdi_amini, dblaikie, aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27775
llvm-svn: 289993
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D27830
llvm-svn: 289992
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly :
Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
llvm-svn: 289988
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).
Sorry for the churn!
llvm-svn: 289982
|
|
|
|
|
|
|
| |
This reverts commit r289978, which is failing due to some rebase/merge
issues.
llvm-svn: 289981
|
|
|
|
|
|
|
|
|
| |
This is the 3rd of 3 patches to get reading and writing of
CodeView symbol and type records to use a single codepath.
Differential Revision: https://reviews.llvm.org/D26427
llvm-svn: 289978
|
|
|
|
|
|
|
|
| |
This ensures backward compatibility on bitcode loading.
Differential Revision: https://reviews.llvm.org/D27839
llvm-svn: 289977
|
|
|
|
|
|
|
|
| |
This patch reapplies r289863. The original patch was reverted because it
exposed a bug causing the loop vectorizer to crash in the Python runtime on
PPC. The underlying issue was fixed with r289958.
llvm-svn: 289975
|
|
|
|
|
|
|
|
|
|
|
|
| |
`dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs. The
only reason it worked at all is that `getMetadataID` returns something
unrelated -- it returns the subclass ID of the receiver (which is used
in `dyn_cast` etc.). That does not numerically match
`LLVMContext::MD_invariant_group` and ends up dropping `invariant_group`
along with every other metadata that does not numerically match
`LLVMContext::MD_invariant_group`.
llvm-svn: 289973
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.
Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).
Recommiting with fix to avoid forming vld1dup if the type of the load
doesn't match the type of the vdup (see
https://llvm.org/bugs/show_bug.cgi?id=31404).
Differential Revision: https://reviews.llvm.org/D27694
llvm-svn: 289972
|
|
|
|
| |
llvm-svn: 289967
|
|
|
|
|
|
|
|
|
| |
Reverting because this breaks lld's gdb_index support - it's probably
double counting the abbrev relocation offset.
This reverts commit r289954.
llvm-svn: 289961
|
|
|
|
|
|
| |
This reverts commit r289951.
llvm-svn: 289960
|
|
|
|
| |
llvm-svn: 289959
|
|
|
|
|
|
|
|
|
|
| |
After r288909, instructions feeding predicated instructions may be scalarized
if profitable. Since these instructions will remain scalar, we shouldn't
attempt to type-shrink them. We should only truncate vector types to their
minimal bit widths. This bug was exposed by enabling the vectorization of loops
containing conditional stores by default.
llvm-svn: 289958
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: ThinLTO needs to invoke SampleProfileLoader pass during link time in order to annotate profile correctly after module importing.
Reviewers: davidxl, mehdi_amini, tejohnson
Subscribers: pcc, davide, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D27790
llvm-svn: 289957
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-C), COND) (PR31367)
atomic_load_add returns the value before addition, but sets EFLAGS based on the
result of the addition. That means it's setting the flags based on effectively
subtracting C from the value at x, which is also what the outer cmp does.
This targets a pattern that occurs frequently with reference counting pointers:
void decrement(long volatile *ptr) {
if (_InterlockedDecrement(ptr) == 0)
release();
}
Clang would previously compile it (for 32-bit at -Os) as:
00000000 <?decrement@@YAXPCJ@Z>:
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: 31 c9 xor %ecx,%ecx
6: 49 dec %ecx
7: f0 0f c1 08 lock xadd %ecx,(%eax)
b: 83 f9 01 cmp $0x1,%ecx
e: 0f 84 00 00 00 00 je 14 <?decrement@@YAXPCJ@Z+0x14>
14: c3 ret
and with this patch it becomes:
00000000 <?decrement@@YAXPCJ@Z>:
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: f0 ff 08 lock decl (%eax)
7: 0f 84 00 00 00 00 je d <?decrement@@YAXPCJ@Z+0xd>
d: c3 ret
(Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add
or pre-decrement operator generate the same code.)
Differential Revision: https://reviews.llvm.org/D27781
llvm-svn: 289955
|
|
|
|
|
|
|
|
|
|
| |
Input can be produced by ld -r, for example (a normal LLVM workflow
never hits this - LLVM only ever produces a single abbrev table in an
object (shared by multiple CUs), so the reloc's always 0, and when it's
linked together the relocation's resolved so it doesn't need to be
handled)
llvm-svn: 289954
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block:
Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl
Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
llvm-svn: 289951
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions
This is the 512-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837
This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885
llvm-svn: 289946
|
|
|
|
|
|
| |
v16i32 to VSHUFPS (PR27885)
llvm-svn: 289945
|
|
|
|
| |
llvm-svn: 289944
|