| Commit message (Collapse) | Author | Age | Files | Lines | 
| ... |  | 
| | 
| 
| 
|  | 
llvm-svn: 352166
 | 
| | 
| 
| 
|  | 
llvm-svn: 352165
 | 
| | 
| 
| 
|  | 
llvm-svn: 352162
 | 
| | 
| 
| 
|  | 
llvm-svn: 352161
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
While a cold invoke itself and its unwind destination can't be
extracted, code which unconditionally executes before/after the invoke
may still be profitable to extract.
With cost model changes from D57125 applied, this gives a 3.5% increase
in split text across LNT+externals on arm64 at -Os.
llvm-svn: 352160
 | 
| | 
| 
| 
| 
| 
|  | 
Also legalize 64-bit compares for AMDGPU
llvm-svn: 352157
 | 
| | 
| 
| 
|  | 
llvm-svn: 352155
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
block.
Otherwise they are treated as dynamic allocas, which ends up increasing
code size significantly. This reduces size of Chromium base_unittests
by 2MB (6.7%).
Differential Revision: https://reviews.llvm.org/D57205
llvm-svn: 352152
 | 
| | 
| 
| 
|  | 
llvm-svn: 352143
 | 
| | 
| 
| 
| 
| 
|  | 
in assert
llvm-svn: 352133
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch exploits the instructions that store a single element from a vector
to preform a (store (extract_elt)). We already have code that does this with
ISA 3.0 instructions that were added to handle i8/i16 types. However, we had
never exploited the existing ones that handle f32/f64/i32/i64 types.
Differential revision: https://reviews.llvm.org/D56175
llvm-svn: 352131
 | 
| | 
| 
| 
|  | 
llvm-svn: 352130
 | 
| | 
| 
| 
|  | 
llvm-svn: 352129
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
As noted in D57156, we want to check at least part of
this pattern earlier (in combining), so this will allow
the code to be shared instead of duplicated.
llvm-svn: 352127
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
https://reviews.llvm.org/D57178
Now add a hook in TargetPassConfig to query if CSE needs to be
enabled. By default this hook returns false only for O0 opt level but
this can be overridden by the target.
As a consequence of the default of enabled for non O0, a few tests
needed to be updated to not use CSE (by passing in -O0) to the run
line.
reviewed by: arsenm
llvm-svn: 352126
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Werror bots didn't like the lambda + assert thing in my previous commit.
Capture everything to suppress the error.
Example failure here:
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/29393
llvm-svn: 352124
 | 
| | 
| 
| 
|  | 
llvm-svn: 352123
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
PDBs contain several serialized hash tables. In the microsoft-pdb
repo published to support LLVM implementing PDB support, the
provided initializes the bucket count for the TPI and IPI streams
to the maximum size. This occurs in tpi.cpp L33 and tpi.cpp L398.
In the LLVM code for generating PDBs, these streams are created with
minimum number of buckets. This difference makes LLVM generated
PDBs slower for when used for debugging.
Patch by C.J. Hebert
Differential Revision: https://reviews.llvm.org/D56942
llvm-svn: 352117
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch adds support for vector @llvm.ceil intrinsics when full 16 bit
floating point support isn't available.
To do this, this patch...
- Implements basic isel for G_UNMERGE_VALUES
- Teaches the legalizer about 16 bit floats
- Teaches AArch64RegisterBankInfo to respect floating point registers on
  G_BUILD_VECTOR and G_UNMERGE_VALUES
- Teaches selectCopy about 16-bit floating point vectors
It also adds
- A legalizer test for the 16-bit vector ceil which verifies that we create a
  G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
- An instruction selection test which makes sure we lower to G_FCEIL when
  full fp16 is supported
- A test for selecting G_UNMERGE_VALUES
And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types
work as expected.
https://reviews.llvm.org/D56682
llvm-svn: 352113
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
Using COFF's .def directive in module assembly used to crash ThinLTO
with "this directive only supported on COFF targets" when getting
symbol information in ModuleSymbolTable.  This change allows
ModuleSymbolTable to process such code and adds a test to verify that
the .def directive has the desired effect on the native object file,
with and without ThinLTO.
Fixes https://bugs.llvm.org/show_bug.cgi?id=36789
Reviewers: rnk, pcc, vlad.tsyrklevich
Subscribers: mehdi_amini, eraman, hiraditya, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D57073
llvm-svn: 352112
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
A volatile operation cannot be used to prove an address points to normal
memory.  (LangRef was recently updated to state it explicitly.)
Differential Revision: https://reviews.llvm.org/D57040
llvm-svn: 352109
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
guessLibraryShortName() separates a full Mach-O dylib install name path
into a short name and a dyld image suffix. The short name is the name
of the dylib without its path or extension. The dyld image suffix is a
string used by dyld to load variants of dylibs if available at runtime;
for example, "when binding this process, load 'debug' variants of all
required dylibs." dyld knows exactly what the image suffix is, but
by convention diagnostic tools such as llvm-nm attempt to guess suffix
names by looking at the install name path. 
These dyld image suffixes are separated from the short name by a '_'
character. Because the '_' character is commonly used to separate words
in filenames guessLibraryShortName() cannot reliably separate a dylib's
short name from an arbitrary image suffix; imagine if both the short
name and the suffix contains an '_' character! To better deal with this
ambiguity, guessLibraryShortName() will recognize only "_debug" and
"_profile" as valid Suffix values. Calling code needs to be tolerant of
guessLibraryShortName() guessing incorrectly.
The previous implementation of guessLibraryShortName() did not allow
'_' characters to appear in short names. When present, the short name
would be  truncated, e.g., "libcompiler_rt" => "libcompiler". This
change allows "libcompiler_rt" and "libcompiler_rt_debug" to both be
recognized as "libcompiler_rt".
rdar://47412244
Reviewers: kledzik, lhames, pete
Reviewed By: pete
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56978
llvm-svn: 352104
 | 
| | 
| 
| 
|  | 
llvm-svn: 352098
 | 
| | 
| 
| 
|  | 
llvm-svn: 352093
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
MemorySSA needs updating each time an instruction is moved.
LICM and control flow hoisting re-hoists instructions, thus needing another update when re-moving those instructions.
Pending cleanup: the MSSA update is duplicated, should be moved inside moveInstructionBefore.
Reviewers: jnspaulsson
Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
Differential Revision: https://reviews.llvm.org/D57176
llvm-svn: 352092
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Performing splitting early has several advantages:
  - Inhibiting inlining of cold code early improves code size. Compared
    to scheduling splitting at the end of the pipeline, this cuts code
    size growth in half within the iOS shared cache (0.69% to 0.34%).
  - Inhibiting inlining of cold code improves compile time. There's no
    need to inline split cold functions, or to inline as much *within*
    those split functions as they are marked `minsize`.
  - During LTO, extra work is only done in the pre-link step. Less code
    must be inlined during cross-module inlining.
An additional motivation here is that the most common cold regions
identified by the static/conservative splitting heuristic can (a) be
found before inlining and (b) do not grow after inlining. E.g.
__assert_fail, os_log_error.
The disadvantages are:
  - Some opportunities for splitting out cold code may be missed. This
    gap can potentially be narrowed by adding a worklist algorithm to the
    splitting pass.
  - Some opportunities to reduce code size may be lost (e.g. store
    sinking, when one side of the CFG diamond is split). This does not
    outweigh the code size benefits of splitting earlier.
On net, splitting early in the pipeline has substantial code size
benefits, and no major effects on memory locality or performance. We
measured memory locality using ktrace data, and consistently found that
10% fewer pages were needed to capture 95% of text page faults in key
iOS benchmarks. We measured performance on frequency-stabilized iOS
devices using LNT+externals.
This reverses course on the decision made to schedule splitting late in
r344869 (D53437).
Differential Revision: https://reviews.llvm.org/D57082
llvm-svn: 352080
 | 
| | 
| 
| 
| 
| 
| 
|  | 
This wasn't consistent within the file, so made it harder to search.
Standardize on the shorter name to save some typing.
llvm-svn: 352077
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
It should be emitted when any floating-point operations (including
calls) are present in the object, not just when calls to printf/scanf
with floating point args are made.
The difference caused by this is very subtle: in static (/MT) builds,
on x86-32, in a program that uses floating point but doesn't print it,
the default x87 rounding mode may not be set properly upon
initialization.
This commit also removes the walk of the types pointed to by pointer
arguments in calls. (To assist in opaque pointer types migration --
eventually the pointee type won't be available.)
That latter implies that it will no longer consider a call like
`scanf("%f", &floatvar)` as sufficient to emit _fltused on its
own. And without _fltused, `scanf("%f")` will abort with error R6002. This
new behavior is unlikely to bite anyone in practice (you'd have to
read a float, and do nothing with it!), and also, is consistent with
MSVC.
Differential Revision: https://reviews.llvm.org/D56548
llvm-svn: 352076
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
presence of `noreturn` calls"
This reverts commit cea84ab93aeb079a358ab1c8aeba6d9140ef8b47.
llvm-svn: 352069
 | 
| | 
| 
| 
|  | 
llvm-svn: 352067
 | 
| | 
| 
| 
|  | 
llvm-svn: 352066
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This is the most common usage for isUndefInRange, 
so make the code slightly less duplicated and more 
readable.
llvm-svn: 352063
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
After submitting https://reviews.llvm.org/D57138, I realized it was slightly more conservative than needed. The scalar indices don't appear to be a problem on a vector gep, we even had a test for that.
Differential Revision: https://reviews.llvm.org/D57161
llvm-svn: 352061
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This is an alternative to https://reviews.llvm.org/D57103.  After discussion, we dedicided to check this in as a temporary workaround, and pursue a true fix under the original thread.
The issue at hand is that the base rewriting algorithm doesn't consider the fact that GEPs can turn a scalar input into a vector of outputs. We had handling for scalar GEPs and fully vector GEPs (i.e. all vector operands), but not the scalar-base + vector-index forms. A true fix here requires treating GEP analogously to extractelement or shufflevector.
This patch is merely a workaround. It simply hides the crash at the cost of some ugly code gen for this presumable very rare pattern.
Differential Revision: https://reviews.llvm.org/D57138
llvm-svn: 352059
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
expandFixedPointMul. NFCI.
Match the (much shorter) name used in various legalization methods.
llvm-svn: 352056
 | 
| | 
| 
| 
|  | 
llvm-svn: 352053
 | 
| | 
| 
| 
|  | 
llvm-svn: 352051
 | 
| | 
| 
| 
| 
| 
|  | 
Added x86 scalar sadd_with_overflow/ssub_with_overflow costs.
llvm-svn: 352045
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
Added x86 scalar uadd_with_overflow/usub_with_overflow costs.
Differential Revision: https://reviews.llvm.org/D56907
llvm-svn: 352043
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
This reverts commit a6982414edf315c39ae93f3c3322476217119e99 (llvm-svn: 352036),
because it causes a memory leak in the pass manager. Failing bot
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio
llvm-svn: 352041
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
Select zero extending and sign extending load for MIPS32.
Use size from MachineMemOperand to determine number of bytes to load.
Differential Revision: https://reviews.llvm.org/D57099
llvm-svn: 352038
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Use CombinerHelper to combine extending load instructions.
G_LOAD combined with G_ZEXT, G_SEXT or G_ANYEXT gives G_ZEXTLOAD,
G_SEXTLOAD or G_LOAD with same type as def of extending instruction
respectively.
Similarly G_ZEXTLOAD combined with G_ZEXT gives G_ZEXTLOAD and
G_SEXTLOAD combined with G_SEXT gives G_SEXTLOAD with same type
as def of extending instruction.
Differential Revision: https://reviews.llvm.org/D56914
llvm-svn: 352037
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Instead of manually computing DT and PDT, we can get the from the pass
manager, which ideally has them already cached. With the new pass
manager, we could even preserve DT/PDT on a per function basis in a
module pass.
I think this also addresses the TODO about re-using the computed DTs for
BFI. IIUC, GetBFI will fetch the DT from the pass manager and when we
will fetch the cached version later.
Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D57092
llvm-svn: 352036
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This reapplies commit r351987 with a failed test fix. Now the test
accepts both DW_OP_GNU_push_tls_address and DW_OP_form_tls_address
opcode.
Original commit message:
```
  This is a fix for a regression introduced by the rL348194 commit. In
  that change new type (MEK_DTPREL) of MipsMCExpr expression was added,
  but in some places of the code this type of expression considered as
  unexpected.
  This change fixes the bug. The MEK_DTPREL type of expression is used for
  marking TLS DIEExpr only and contains a regular sub-expression. Where we
  need to handle the expression, we retrieve the sub-expression and
  handle it in a common way.
```
llvm-svn: 352034
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
After creating new PHI instructions during isel pseudo expansion, the NoPHIs
property of MF should be reset in case it was previously set.
Review: Ulrich Weigand
llvm-svn: 352030
 | 
| | 
| 
| 
| 
| 
|  | 
flag for masked loads. Add truncating and compressing for masked stores.
llvm-svn: 352029
 | 
| | 
| 
| 
| 
| 
|  | 
into a truncating masked store.
llvm-svn: 352027
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When we choose whether or not we should mark block as dead, we have an
inconsistent logic in markup of live blocks.
- We take candidate IF its terminator branches on constant AND it is immediately
  in current loop;
- We mark successor live IF its terminator doesn't branch by constant OR it branches
  by constant and the successor is its always taken block.
What we are missing here is that when the terminator branches on a constant but is
not taken as a candidate because is it not immediately in the current loop, we will
mark only one (always taken) successor as live. Therefore, we do NOT do the actual
folding but may NOT mark one of the successors as live. So the result of markup is
wrong in this case, and we may then hit various asserts.
Thanks Jordan Rupprech for reporting this!
Differential Revision: https://reviews.llvm.org/D57095
Reviewed By: rupprecht
llvm-svn: 352024
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
reading/editing
Recommits 350048, 350050 That broke buildbots because of some typos in
the test case.
llvm-svn: 352019
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This reverts commit ccfb060ecb5d7e18ea729455660484d576bde2cc.
Some tests need to to fixed before reapplying this commit.
llvm-svn: 352014
 |