| Commit message (Collapse) | Author | Age | Files | Lines | 
| ... |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The exciting code is actually already enough to handle the splitting
of vector arguments but we were lacking a test case.
This commit adds a test case for vector argument lowering involving
splitting and enable the related support in call lowering.
llvm-svn: 374589
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Teach buildMerge how to deal with scalar to vector kind of requests.
Prior to this patch, buildMerge would issue either a G_MERGE_VALUES
when all the vregs are scalars or a G_CONCAT_VECTORS when the destination
vreg is a vector.
G_CONCAT_VECTORS was actually not the proper instruction when the source
vregs were scalars and the compiler would assert that the sources must
be vectors. Instead we want is to issue a G_BUILD_VECTOR when we are
in this situation.
This patch fixes that.
llvm-svn: 374588
 | 
| | 
| 
| 
|  | 
llvm-svn: 374582
 | 
| | 
| 
| 
|  | 
llvm-svn: 374579
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
Implements the following arithmetic intrinsics:
  - int_aarch64_sve_sdot
  - int_aarch64_sve_sdot_lane
  - int_aarch64_sve_udot
  - int_aarch64_sve_udot_lane
This patch includes tests for the Subdivide4Argument type added by D67549
Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D67551
llvm-svn: 374566
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch adds a moveAfter method to VPRecipeBase, which can be used to
move elements after other elements, across VPBasicBlocks, if necessary.
Reviewers: dcaballe, hsaito, rengolin, hfinkel
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D46825
llvm-svn: 374565
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
The AIX system assembler does not understand .zero, so we should prefer
emitting .space.
Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68815
llvm-svn: 374564
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
ds_[read/write]_addtid_b32
See https://bugs.llvm.org/show_bug.cgi?id=37941
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D68787
llvm-svn: 374561
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
buffer_atomic_[fcmpswap/fmin/fmax]*
See https://bugs.llvm.org/show_bug.cgi?id=28232
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D68788
llvm-svn: 374559
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
See https://bugs.llvm.org/show_bug.cgi?id=43524
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D68785
llvm-svn: 374557
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The diffs suggest that we are missing some more basic
analysis/transforms, but this keeps the vector path in
sync with the scalar (rL374397). This is again a
preliminary step for introducing the reverse transform
in IR as proposed in D63382.
llvm-svn: 374555
 | 
| | 
| 
| 
|  | 
llvm-svn: 374554
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
See https://bugs.llvm.org/show_bug.cgi?id=43486
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D68350
llvm-svn: 374553
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
If a "double" (64-bit) value has zero low 32-bits, it's possible to load
such value into a GP/FP registers as an instruction immediate. But now
assembler loads only high 32-bits of the value.
For example, if a target register is GPR the `li.d $4, 1.0` instruction
converts into the `lui $4, 16368` one. As a result, we get `0x3FF00000`
in the register. While a correct representation of the `1.0` value is
`0x3FF0000000000000`. The patch fixes that.
Differential Revision: https://reviews.llvm.org/D68776
llvm-svn: 374544
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.
To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.
The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.
This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.
To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.
I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.
On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.
I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.
I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).
Differential revision: https://reviews.llvm.org/D63932
llvm-svn: 374539
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The FileCheck utility is enhanced to support a `--ignore-case`
option. This is useful in cases where the output of Unix tools
differs in case (e.g. case not specified by Posix).
Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D68146
llvm-svn: 374538
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Currently -verify-scev only fails if there is a constant difference
between two BE counts. This misses a lot of cases.
This patch adds a -verify-scev-strict options, which fails for any
non-zero differences, if used together with -verify-scev.
With the stricter checking, some unit tests fail because
of mis-matches, especially around IndVarSimplify.
If there is no reason I am missing for just checking constant deltas, I
am planning on looking into the various failures.
Reviewers: efriedma, sanjoy.google, reames, atrick
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D68592
llvm-svn: 374535
 | 
| | 
| 
| 
| 
| 
|  | 
Now that its used by isNegatibleForFree we should try to avoid costly deep recursion
llvm-svn: 374534
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
If we insert them from function pass some analysis may be missing or invalid.
Fixes PR42877.
Reviewers: eugenis, leonardchan
Reviewed By: leonardchan
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D68832
> llvm-svn: 374481
Signed-off-by: Vitaly Buka <vitalybuka@google.com>
llvm-svn: 374527
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The assertion is everzealous and fail tests like:
  renamable $x3 = LI8 0
  STD renamable $x3, 16, $x1
  renamable $x3 = LI8 0
Remove the assertion since killed flag of $x3 is not mandentory.
Differential Revision: https://reviews.llvm.org/D68344
llvm-svn: 374515
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch recognizes popcount intrinsic according to algorithm from website
  http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
Differential Revision: https://reviews.llvm.org/D68189
llvm-svn: 374512
 | 
| | 
| 
| 
| 
| 
|  | 
saturating truncating store.
llvm-svn: 374509
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This is really a known bits style transformation, but known bits isn't context sensitive. The particular case which comes up happens to involve a range which allows range based reasoning to eliminate the mask pattern, so handle that case specifically in CVP.
InstCombine likes to generate the mask-by-low-bits pattern when widening an arithmetic expression which includes a zext in the middle.
Differential Revision: https://reviews.llvm.org/D68811
llvm-svn: 374506
 | 
| | 
| 
| 
| 
| 
| 
|  | 
CodeGen/sanitizer-module-constructor.c fails on mac and windows, see e.g.
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/11424
llvm-svn: 374503
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
The original implementation failed to shift the immediate down.
This should fix some of the bot failures due to r374476.
llvm-svn: 374499
 | 
| | 
| 
| 
|  | 
llvm-svn: 374498
 | 
| | 
| 
| 
|  | 
llvm-svn: 374495
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The intended usage is to measure relatively expensive operations. So the
cost of the statistic is negligible compared to the cost of a measured
operation and can be enabled all the time without impairing the
compilation time.
rdar://problem/55715134
Reviewers: dsanders, bogner, rtereshin
Reviewed By: dsanders
Subscribers: hiraditya, jkorous, dexonsmith, ributzka, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68252
llvm-svn: 374490
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
instructions in more situations.
If we don't have VLX we won't end up selecting a saturating
truncate for 256-bit or smaller vectors so we should just use
the pack lowering.
llvm-svn: 374487
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
If we insert them from function pass some analysis may be missing or invalid.
Fixes PR42877.
Reviewers: eugenis, leonardchan
Reviewed By: leonardchan
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D68832
llvm-svn: 374481
 | 
| | 
| 
| 
|  | 
llvm-svn: 374480
 | 
| | 
| 
| 
|  | 
llvm-svn: 374479
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
This implementation has support for all relocation types except TLV.
Compact unwind sections are not yet supported, so exceptions/unwinding will not
work.
llvm-svn: 374476
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When simplifying a Phi to the unique value found incoming, check that
there wasn't a Phi already created to break a cycle. If so, remove it.
Resolves PR43541.
Some additional nits included.
llvm-svn: 374471
 | 
| | 
| 
| 
| 
| 
|  | 
Forgot to integrate this little change in previous commit
llvm-svn: 374463
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When handling the packus pattern for i32->i8 we do a two step
process using a packss to i16 followed by a packus to i8. If the
final i8 step is a type with less than 64-bits the packus step
will return SDValue(), but the i32->i16 step might have succeeded.
This leaves the nodes from the middle step dangling.
Guard against this by pre-checking that the number of elements is
at least 8 before doing the middle step.
With that check in place this should mean the only other
case the middle step itself can fail is when SSE2 is disabled. So
add an early SSE2 check then just assert that neither the middle
or final step ever fail.
llvm-svn: 374460
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
In GISel we have both G_CONSTANT and G_FCONSTANT, but because
in GISel we don't really have a concept of Float vs Int value
the only difference between the two is where the data originates
from.
What both G_CONSTANT and G_FCONSTANT return is just a bag of bits
with the constant representation in it.
By making getConstantVRegVal() return G_FCONSTANTs bit representation
as well we allow ConstantFold and other things to operate with
G_FCONSTANT.
Adding tests that show ConstantFolding to work on mixed G_CONSTANT
and G_FCONSTANT sources.
Differential Revision: https://reviews.llvm.org/D68739
llvm-svn: 374458
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
It was missing an undef flag.
Differential Revision: https://reviews.llvm.org/D68813
llvm-svn: 374455
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch improves the handling of pointer offset in GEP expressions where
one argument is the base pointer. isPointerOffset() is being used by memcpyopt
where current code synthesizes consecutive 32 bytes stores to one store and
two memset intrinsic calls. With this patch, we convert the stores to one
memset intrinsic.
Differential Revision: https://reviews.llvm.org/D67989
llvm-svn: 374454
 | 
| | 
| 
| 
| 
| 
|  | 
Also, refactor check in `LibCallSimplifier::optimizeLog()`.
llvm-svn: 374453
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
Whenever we get the previous definition, the assumption is that the
recursion starts ina  reachable block.
If the recursion starts in an unreachable block, we may recurse
indefinitely. Handle this case by returning LoE if the block is
unreachable.
Resolves PR43426.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68809
llvm-svn: 374447
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Move the default implementations of cache and prefetch queries to
TargetTransformInfoImplBase and delete them from NoTIIImpl.  This brings these
interfaces in line with how other TTI interfaces work.
Differential Revision: https://reviews.llvm.org/D68804
llvm-svn: 374446
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
A common pattern in Windows is to have all your precompiled headers
use an object named stdafx.obj.  If you've got a project with many
different static libs, you might use a separate PCH for each one of
these.
During the final link step, a file from A might reference the PCH
object from A, but it will have the same name (stdafx.obj) as any
other PCH from another project.  The only difference will be the
path.  For example, A might be A/stdafx.obj while B is B/stdafx.obj.
The existing algorithm checks only the filename that was passed on
the command line (or stored in archive), but this is insufficient in
the case where relative paths are used, because depending on the
command line object file / library order, it might find the wrong
PCH object first resulting in a signature mismatch.
The fix here is to simply check whether the absolute path of the
PCH object (which is stored in the input obj file for the file that
references the PCH) *ends with* the full relative path of whatever
is specified on the command line (or is in the archive).
Differential Revision: https://reviews.llvm.org/D66431
llvm-svn: 374442
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Add a specialization to StringMap (actually StringMapEntry) for a
value type of NoneType (the type of llvm::None), and use it for
StringSet. This'll save us a word from every entry in a StringSet,
used for alignment with the size_t that stores the string length.
I could have gone all the way to some kind of empty base class
optimization, but that seemed like overkill. Someone can consider
adding that in the future, though.
https://reviews.llvm.org/D68586
llvm-svn: 374440
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
inputs to be between 0 and 255 when zmm registers are disabled on SKX.
If we've disable zmm registers, the v16i32 will need to be split. This split will propagate through min/max the truncate. This creates two sequences that need to be concatenated back to v16i8. We can instead use packusdw to do part of the clamping, truncating, and concatenating all at once. Then we can use a vpmovuswb to finish off the clamp.
Differential Revision: https://reviews.llvm.org/D68763
llvm-svn: 374431
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
r179397 added Parallel.h and implemented it terms of concrt in 2013.
In 2015, a cross-platform implementation of the functions has appeared
and is in use everywhere but on Windows (r232419).  r246219 hints that
<thread> had issues in MSVC2013, but r296906 suggests they've been fixed
now that we require 2015+.
So remove the concrt code. It's less code, and it sounds like concrt has
conceptual and performance issues, see PR41198.
I built blink_core.dll in a debug component build with full symbols and
in a release component build without any symbols.  I couldn't measure a
performance difference for linking blink_core.dll before and after this
patch.
Differential Revision: https://reviews.llvm.org/D68820
llvm-svn: 374421
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
Add a helper function getMCSymbolForTOCPseudoMO to clean up PPCAsmPrinter
a little bit.
Differential Revision: https://reviews.llvm.org/D68721
llvm-svn: 374420
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
This improves readability of Windows path string literals in LLVM IR.
The LLVM assembler has supported \\ in IR strings for a long time, but
the lexer doesn't tolerate escaped quotes, so they have to be printed as
\22 for now.
llvm-svn: 374415
 | 
| | 
| 
| 
|  | 
llvm-svn: 374413
 | 
| | 
| 
| 
| 
| 
|  | 
Fifth time's the charm.
llvm-svn: 374411
 |