| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 200934
|
|
|
|
| |
llvm-svn: 200933
|
|
|
|
|
|
|
|
| |
There was a problem with the old pattern, so we were copying some
larger immediates into registers when we could have been encoding
them in the instruction.
llvm-svn: 200932
|
|
|
|
|
|
|
|
|
| |
The approach is similar to the existing inline-asm reporting, just more
general.
<rdar://problem/15886278>
llvm-svn: 200931
|
|
|
|
|
|
|
|
| |
IOHandler changes, this is now fixed.
<rdar://problem/15962763>
llvm-svn: 200930
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a previous commit (r199818) we added a const_cast to an existing
subtarget info instead of creating a new one so that we could reuse
it when creating the TargetAsmParser for parsing inline assembly.
This cast was necessary because we needed to reuse the existing STI
to avoid generating incorrect code when the inline asm contained
mode-switching directives (e.g. .code 16).
The root cause of the failure was that there was an implicit sharing
of the STI between the parser and the MCCodeEmitter. To fix a
different but related issue, we now explicitly pass the STI to the
MCCodeEmitter (see commits r200345-r200351).
The const_cast is no longer necessary and we can now create a fresh
STI for the inline asm parser to use.
Differential Revision: http://llvm-reviews.chandlerc.com/D2709
llvm-svn: 200929
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The most important part of this is probably adding any cost at all for
operations like zext <8 x i8> to <8 x i32>. Before they were being
recorded as extremely costly (24, I believe) which made LLVM fall back
on a 4-wide vectorisation of a loop.
It also rebalances the values for sext, zext and trunc. Lacking any
other sane metric that might work across CPU microarchitectures I went
for instructions. This seems to be in reasonable accord with the rest
of the table (sitofp, ...) though no doubt at least one value is
sub-optimal for some bizarre reason.
Finally, separate AVX and AVX2 values are provided where appropriate.
The CodeGen is quite different in many cases.
rdar://problem/15981990
llvm-svn: 200928
|
|
|
|
| |
llvm-svn: 200927
|
|
|
|
| |
llvm-svn: 200926
|
|
|
|
| |
llvm-svn: 200925
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch introduces several improvements to clang-tidy diagnostic;
1. Make filtering of messages from non-user code more reliable. Output an
error when it or any of the related notes touches user code. This fixes an
assertion when an error has a location in a system header, and one of the
notes relates to user code.
2. In order for 1. to work, subscribe to the static analyzer diagnostics using
a custom PathDiagnosticConsumer.
3. Enable colors on supported terminals.
4. Output FixItHints.
Reviewers: djasper
Reviewed By: djasper
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2714
llvm-svn: 200924
|
|
|
|
|
|
| |
https://code.google.com/p/thread-sanitizer/issues/detail?id=47
llvm-svn: 200923
|
|
|
|
|
|
|
|
| |
on file stream locks.
This should fix https://code.google.com/p/thread-sanitizer/issues/detail?id=47.
llvm-svn: 200922
|
|
|
|
|
|
|
| |
Properly support fields that come from anonymous unions and structs
when used as template arguments for pointer to data member params.
llvm-svn: 200921
|
|
|
|
|
|
|
|
|
|
|
|
| |
Properly determine the inheritance model when dealing with nullptr:
- If a nullptr template argument is being checked against
pointer-to-member parameter, nail down an inheritance model.
N.B. We will chose an inheritance model even if we won't ultimately
choose the template to instantiate! Cooky, right?
- Null pointer-to-datamembers have a virtual base table offset of -1,
not zero. Previously, we chose an offset of 0.
llvm-svn: 200920
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The aim in this patch is to reduce work that VirtRegRewriter needs to do when
telling MachineRegisterInfo which physregs are in use. Up until now
VirtRegRewriter::rewrite has been doing rewriting and populating def info and
then proceeding to set whether a physreg is used based this info for every
physreg that the target provides. This can be expensive when a target has an
unusually high number of supported physregs, and is a noticeable chunk of
compile time for small programs on such targets.
So to reduce compile time, this patch simply adds the use of a SparseSet to the
rewrite function that is used to flag each physreg that is encountered in a
MachineFunction. Afterward, rather than iterating over the set of all physregs
for a given target to set the physregs used in MachineRegisterInfo, the new way
is to iterate over the set of physregs that were actually encountered and set
in the SparseSet. This improves compile time because the existing rewrite
function was iterating over all MachineOperands already, and because the
iterations afterward to setPhysRegUsed is reduced by use of the SparseSet data.
llvm-svn: 200919
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I believe VZEXT_MOVL means "zero all vector elements except the first" (and
should have identical input & output types) whereas VZEXT means "zero extend
each element of a vector (discarding higher elements if necessary)".
For example:
(v4i32 (vzext (v16i8 ...)))
should zero extend the low 4 bytes of the incoming vector to 32-bits,
discarding higher bytes.
However, somewhere in the past, these two concepts had become confused, even
leading to a nonsensical VSEXT_MOVL.
This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL
-> VZEXT when it's an actual extension).
rdar://problem/15981990
llvm-svn: 200918
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
programs on targets with large register files. The root of the compile time
overhead was in the use of llvm::SmallVector to hold PhysRegEntries, which
resulted in slow-down from calling llvm::SmallVector::assign(N, 0). In contrast
std::vector uses the faster __platform_bzero to zero out primitive buffers when
assign is called, while SmallVector uses an iterator.
The fix for this was simply to replace the SmallVector with a dynamically
allocated buffer and to initialize or reinitialize the buffer based on the
total registers that the target architecture requires. The changes support
cases where a pass manager may be reused for different targets, and note that
the PhysRegEntries is allocated using calloc mainly for good for, and also to
quite tools like Valgrind (see comments for more info on this).
There is an rdar to track the fact that SmallVector doesn't have platform
specific speedup optimizations inside of it for things like this, and I'll
create a bugzilla entry at some point soon as well.
TL;DR: This fix replaces the expensive llvm::SmallVector<unsigned
char>::assign(N, 0) with a call to calloc for N bytes which is much faster
because SmallVector's assign uses iterators.
llvm-svn: 200917
|
|
|
|
|
|
| |
we don't use assembly files
llvm-svn: 200916
|
|
|
|
| |
llvm-svn: 200915
|
|
|
|
| |
llvm-svn: 200914
|
|
|
|
|
|
|
|
|
|
|
| |
large register files. The omission of Queries.clear() is perfectly safe because
LiveIntervalUnion::Query doesn't contain any data that needs freeing and
because LiveRegMatrix::runOnFunction happens to reset the OwningArrayPtr
holding Queries every time it is run, so there's no need to zero out the
queries either. Not having to do this for very large numbers of physregs
is a noticeable constant cost reduction in compilation of small programs.
llvm-svn: 200913
|
|
|
|
| |
llvm-svn: 200911
|
|
|
|
| |
llvm-svn: 200910
|
|
|
|
|
|
|
|
|
|
| |
clang/test/FixIt/fixit-unicode-with-utf8-output.c has begun complained since LLVM r200885.
Although it is changes for StringRef, it brought LLVM_ON_WIN32 to Support/Locale.cpp.
Before r200885, LLVM_ON_WIN32 was undefined in Locale.cpp!
FIXME: We should consider i18n on win32.
llvm-svn: 200909
|
|
|
|
|
|
| |
garbage colection to work with asan's fake stack
llvm-svn: 200908
|
|
|
|
| |
llvm-svn: 200907
|
|
|
|
| |
llvm-svn: 200906
|
|
|
|
|
|
|
|
|
|
|
|
| |
build but spectacularly changed behavior of the C++98 build. =]
This shows my one problem with not having unittests -- basic API
expectations aren't well exercised by the integration tests because they
*happen* to not come up, even though they might later. I'll probably add
a basic unittest to complement the integration testing later, but
I wanted to revive the bots.
llvm-svn: 200905
|
|
|
|
|
|
| |
pointers. Specifically, libc++abi would crash when you tried it.
llvm-svn: 200904
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The primary motivation for this pass is to separate the call graph
analysis used by the new pass manager's CGSCC pass management from the
existing call graph analysis pass. That analysis pass is (somewhat
unfortunately) over-constrained by the existing CallGraphSCCPassManager
requirements. Those requirements make it *really* hard to cleanly layer
the needed functionality for the new pass manager on top of the existing
analysis.
However, there are also a bunch of things that the pass manager would
specifically benefit from doing differently from the existing call graph
analysis, and this new implementation tries to address several of them:
- Be lazy about scanning function definitions. The existing pass eagerly
scans the entire module to build the initial graph. This new pass is
significantly more lazy, and I plan to push this even further to
maximize locality during CGSCC walks.
- Don't use a single synthetic node to partition functions with an
indirect call from functions whose address is taken. This node creates
a huge choke-point which would preclude good parallelization across
the fanout of the SCC graph when we got to the point of looking at
such changes to LLVM.
- Use a memory dense and lightweight representation of the call graph
rather than value handles and tracking call instructions. This will
require explicit update calls instead of some updates working
transparently, but should end up being significantly more efficient.
The explicit update calls ended up being needed in many cases for the
existing call graph so we don't really lose anything.
- Doesn't explicitly model SCCs and thus doesn't provide an "identity"
for an SCC which is stable across updates. This is essential for the
new pass manager to work correctly.
- Only form the graph necessary for traversing all of the functions in
an SCC friendly order. This is a much simpler graph structure and
should be more memory dense. It does limit the ways in which it is
appropriate to use this analysis. I wish I had a better name than
"call graph". I've commented extensively this aspect.
This is still very much a WIP, in fact it is really just the initial
bits. But it is about the fourth version of the initial bits that I've
implemented with each of the others running into really frustrating
problms. This looks like it will actually work and I'd like to split the
actual complexity across commits for the sake of my reviewers. =] The
rest of the implementation along with lots of wiring will follow
somewhat more rapidly now that there is a good path forward.
Naturally, this doesn't impact any of the existing optimizer. This code
is specific to the new pass manager.
A bunch of thanks are deserved for the various folks that have helped
with the design of this, especially Nick Lewycky who actually sat with
me to go through the fundamentals of the final version here.
llvm-svn: 200903
|
|
|
|
|
|
| |
in my next patch. Sorry for the breakage.
llvm-svn: 200902
|
|
|
|
|
|
|
|
|
| |
necessary until we add analyses to the driver, but I have such an
analysis ready and wanted to split this out. This is actually exercised
by the existing tests of the new pass manager as the analysis managers
are cross-checked and validated by the function and module managers.
llvm-svn: 200901
|
|
|
|
|
|
|
|
|
|
|
|
| |
opaque constants.
During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.
llvm-svn: 200900
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the following code:
struct A { static const int sz; };
template<class T> void f() { T arr[A::sz]; }
the array 'arr' is represented as a variable size array in the template.
If 'A::sz' gets value below in the translation unit, the array in
instantiation can turn into constant size array.
This change fixes PR18633.
Differential Revision: http://llvm-reviews.chandlerc.com/D2688
llvm-svn: 200899
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
225 is the default value of inline-threshold. This change will make sure
we have the same inlining behavior as prior to r200886.
As Chandler points out, even though we don't have code in our testing
suite that uses cold attribute, there are larger applications that do
use cold attribute.
r200886 + this commit intend to keep the same behavior as prior to r200886.
We can later on tune the inlinecold-threshold.
The main purpose of r200886 is to help performance of instrumentation based
PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
For instrumentation based PGO, we try to increase inlining of hot functions and
reduce inlining of cold functions by setting inlinecold-threshold.
Another option suggested by Chandler is to use a boolean flag that controls
if we should use OptSizeThreshold for cold functions. The default value
of the boolean flag should not change the current behavior. But it gives us
less freedom in controlling inlining of cold functions.
llvm-svn: 200898
|
|
|
|
|
|
|
|
| |
using-declaration, and they declare the same function (either because
the using-declaration is in the same namespace as the declaration it
imports, or because they're both extern "C"), they do not conflict.
llvm-svn: 200897
|
|
|
|
|
|
|
|
| |
the << and >> bitwise operators.
rdar://15975725
llvm-svn: 200896
|
|
|
|
|
|
|
| |
It is only used in MachObjectWriter.cpp. Another leftover from early days
of ELF in MC.
llvm-svn: 200895
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a nop. doesSectionRequireSymbols is only used from
isSymbolLinkerVisible. isSymbolLinkerVisible only use from ELF was in
if (!Asm.isSymbolLinkerVisible(Symbol) && !Symbol.isUndefined())
return false;
if (Symbol.isTemporary())
return false;
If the symbol is a temporary this code returns false and it is irrelevant if
we take the first if or not. If the symbol is not a temporary,
Asm.isSymbolLinkerVisible returns true without ever calling
doesSectionRequireSymbols.
This was an horrible leftover from when support for ELF was first added.
llvm-svn: 200894
|
|
|
|
| |
llvm-svn: 200893
|
|
|
|
|
|
|
|
|
| |
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.
llvm-svn: 200892
|
|
|
|
| |
llvm-svn: 200891
|
|
|
|
| |
llvm-svn: 200890
|
|
|
|
|
|
| |
has a more precise type.
llvm-svn: 200889
|
|
|
|
| |
llvm-svn: 200888
|
|
|
|
|
|
|
| |
On R600, some address spaces have more strict alignment
requirements than others.
llvm-svn: 200887
|
|
|
|
|
|
|
|
| |
Added command line option inlinecold-threshold to set threshold for inlining
functions with cold attribute. Listen to the cold attribute when it would
decrease the inline threshold.
llvm-svn: 200886
|
|
|
|
|
|
|
|
|
| |
Now to copy a string into a BumpPtrAllocator and get a StringRef to the copy:
StringRef myCopy = myStr.copy(myAllocator);
llvm-svn: 200885
|
|
|
|
|
|
|
|
|
| |
This option will:
- load the given pch file
- verify it is not out of date by stat'ing dependencies, and
- return 0 on success and non-zero on error
llvm-svn: 200884
|