| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
| |
full solution to the slowness of this formatter, but it's a 5% improvement in our testcase performance, which I am not going to complain too hard about.
llvm-svn: 224373
|
| |
|
|
| |
llvm-svn: 224372
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed on the post-commit review thread for r224012, -Wkeyword-macro fires
mostly on headers trying to set up portable defines and doesn't find much bad
stuff in practice. But [macro.names]p2 does disallow defining or undefining
keywords, override and final, and alignas, so keep the warning but move it
into -pedantic.
-Wreserved-id-macro warns on
#define __need_size_t
which is more or less public api for glibc headers. Since this warning isn't
motivated by a standard, remove it.
(See also r223114 for a previous follow-up to r224012.)
llvm-svn: 224371
|
| |
|
|
| |
llvm-svn: 224370
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The variable (and the GV) is only ever used if the function is. Putting it
in the function's comdat make it easier for the linker to discard them.
The motivating example is
struct S {
static const int x;
};
// const int S::x = 42;
inline const int *f() {
static const int y = S::x;
return &y;
}
const int *g() { return f(); }
With S::x commented out, _ZZ1fvE1y is a variable with a guard variable
that is initialized by f.
With S::x present, _ZZ1fvE1y is a constant.
llvm-svn: 224369
|
| |
|
|
| |
llvm-svn: 224368
|
| |
|
|
| |
llvm-svn: 224367
|
| |
|
|
| |
llvm-svn: 224366
|
| |
|
|
|
|
| |
rounding. Doubleword abs/neg/not. Interleave and deinterleave instructions.
llvm-svn: 224365
|
| |
|
|
| |
llvm-svn: 224362
|
| |
|
|
| |
llvm-svn: 224361
|
| |
|
|
| |
llvm-svn: 224360
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: As a side-quest for D6629 jvoung pointed out that I should use -verify-machineinstrs and this found a bug in x86-32's handling of EFLAGS for PUSHF/POPF. This patch fixes the use/def, and adds -verify-machineinstrs to all x86 tests which contain 'EFLAGS'. One exception: this patch leaves inline-asm-fpstack.ll as-is because it fails -verify-machineinstrs in a way unrelated to EFLAGS. This patch also modifies cmpxchg-clobber-flags.ll along the lines of what D6629 already does by also testing i386.
Test Plan: ninja check
Reviewers: t.p.northover, jvoung
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6687
llvm-svn: 224359
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In SemaCUDA all implicit functions were considered host device, this led to
errors such as the following code snippet failing to compile:
struct Copyable {
const Copyable& operator=(const Copyable& x) { return *this; }
};
struct Simple {
Copyable b;
};
void foo() {
Simple a, b;
a = b;
}
Above the implicit copy assignment operator was inferred as host device but
there was only a host assignment copy defined which is an error in device
compilation mode.
Differential Revision: http://reviews.llvm.org/D6565
llvm-svn: 224358
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
package
If we use the receiver's package, we can end up with identical manglings
for different functions. Consider:
package p
type U struct{}
func (U) f()
package q
import "p"
type T struct { p.U }
func (T) f()
The method set of *T has two synthetic methods named (*T).f(); one forwards to
(T).f(), and the other to (U).f(). Previously, we were distinguishing them
by the receiver's package, and in this case because both methods have the
same receiver, they received the same name.
The methods are correctly distinguished by the package owning the identifier
"f", which is available via f.Object().Pkg().
Differential Revision: http://reviews.llvm.org/D6673
llvm-svn: 224357
|
| |
|
|
|
|
| |
Also delete a dead member variable.
llvm-svn: 224356
|
| |
|
|
| |
llvm-svn: 224355
|
| |
|
|
|
|
|
| |
This was a static function before, and NVPTX duplicated it
because it wasn't exposed.
llvm-svn: 224354
|
| |
|
|
| |
llvm-svn: 224353
|
| |
|
|
| |
llvm-svn: 224352
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.
Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.
** Context **
Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.
** Motivating Example **
Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
%ld = load i8* %addr1
%zextld = zext i8 %ld to i32
%ld2 = load i32* %addr2
%add = add nsw i32 %ld2, %zextld
%sextadd = sext i32 %add to i64
%zexta = zext i8 %a to i32
%addza = add nsw i32 %zexta, %zextld
%sextaddza = sext i32 %addza to i64
%addb = add nsw i32 %b, %zextld
%sextaddb = sext i32 %addb to i64
call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
ret void
}
As it is, this IR generates the following assembly on x86_64:
[...]
movzbl (%rdi), %eax # zero-extended load
movl (%rsi), %es # plain load
addl %eax, %esi # 32-bit add
movslq %esi, %rdi # sign extend the result of add
movzbl %dl, %edx # zero extend the first argument
addl %eax, %edx # 32-bit add
movslq %edx, %rsi # sign extend the result of add
addl %eax, %ecx # 32-bit add
movslq %ecx, %rdx # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.
Now, by promoting the additions to form more extended loads we would generate:
[...]
movzbl (%rdi), %eax # zero-extended load
movslq (%rsi), %rdi # sign-extended load
addq %rax, %rdi # 64-bit add
movzbl %dl, %esi # zero extend the first argument
addq %rax, %rsi # 64-bit add
movslq %ecx, %rdx # sign extend the second argument
addq %rax, %rdx # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.
This kind of sequences happen a lot on code using 32-bit indexes on 64-bit
architectures.
Note: The throughput numbers are similar on Sandy Bridge and Haswell.
** Proposed Solution **
To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))
The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.
** Performance **
Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar: ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%
The results are consistent for both O3 and Os.
<rdar://problem/18310086>
llvm-svn: 224351
|
| |
|
|
|
|
|
|
| |
incorrectly using PRIx32
when it should have been using PRIx64 for the value that was passed as uint64_t .
llvm-svn: 224350
|
| |
|
|
|
|
| |
Added lowering tests.
llvm-svn: 224349
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An instruction alias defined with InstAlias and an optional operand in the
middle of the AsmString field, "..${a} <operands>", would get the final
"}" printed in the instruction disassembly. This wouldn't happen if the optional
operand appeared as the last item in the AsmString which is how the current
backends avoided the problem.
There don't appear to be any tests for this part of Tablegen but it passes the
pre-commit tests. Manually tested the change by enabling the generic alias
printer in the ARM backend and checking the output.
Differential Revision: http://reviews.llvm.org/D6529
llvm-svn: 224348
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On X86, the Intel asm parser tries to match all memory operand sizes when
none is explicitly specified. For LEA, which doesn't really have a memory
operand (just a pointer one), this results in multiple successful matches,
one for each memory size. There's no error because it's same opcode, so
really, it's just one match. However, the tablegen'd matcher function
adds opcode/operands to the passed MCInst, and this results in multiple
duplicated operands.
This commit clears the MCInst in the tablegen'd matcher function.
We sometimes clear it when the match failed, so there's no expectation of
keeping the previous content anyway.
Differential Revision: http://reviews.llvm.org/D6670
llvm-svn: 224347
|
| |
|
|
| |
llvm-svn: 224346
|
| |
|
|
|
|
|
|
|
| |
lld-link shells out to MSVC for certain types of work, and this
results in some MSVC output files being generated even though
clang / lld are the compiler / linker. This is expected, so we
make sure to clean these output files on make clean.
llvm-svn: 224345
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a fix for PR21709 ( http://llvm.org/bugs/show_bug.cgi?id=21709 ).
When we have 2 consecutive 16-byte loads that are merged into one 32-byte vector,
we can use a single 32-byte load instead.
But we don't do this for SandyBridge / IvyBridge because they have slower 32-byte memops.
We also don't bother using 32-byte *integer* loads on a machine that only has AVX1 (btver2)
because those operands would have to be split in half anyway since there is no support for
32-byte integer math ops.
Differential Revision: http://reviews.llvm.org/D6492
llvm-svn: 224344
|
| |
|
|
| |
llvm-svn: 224343
|
| |
|
|
| |
llvm-svn: 224342
|
| |
|
|
| |
llvm-svn: 224341
|
| |
|
|
| |
llvm-svn: 224340
|
| |
|
|
|
|
| |
actually wrong. It should be testing for FeatureGP64bit.There are no functional changes.
llvm-svn: 224339
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D5667
llvm-svn: 224338
|
| |
|
|
| |
llvm-svn: 224337
|
| |
|
|
| |
llvm-svn: 224335
|
| |
|
|
|
|
|
|
|
|
| |
The loop vectorizer optimizes loops containing conditional memory
accesses by generating masked load and store intrinsics.
This decision is target dependent.
http://reviews.llvm.org/D6527
llvm-svn: 224334
|
| |
|
|
| |
llvm-svn: 224333
|
| |
|
|
|
|
| |
This would result in a crash since the vcvt used does not support v8i32 types.
llvm-svn: 224332
|
| |
|
|
| |
llvm-svn: 224331
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to AVX specification:
"Most arithmetic and data processing instructions encoded using the VEX prefix and
performing memory accesses have more flexible memory alignment requirements
than instructions that are encoded without the VEX prefix. Specifically,
With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions,
most VEX-encoded, arithmetic and data processing instructions operate in
a flexible environment regarding memory address alignment, i.e. VEX-encoded
instruction with 32-byte or 16-byte load semantics will support unaligned load
operation by default. Memory arguments for most instructions with VEX prefix
operate normally without causing #GP(0) on any byte-granularity alignment
(unlike Legacy SSE instructions)."
The same for AVX-512.
This change does not affect anything right now, because only the "memop pattern fragment"
depends on FeatureVectorUAMem and it is not used in AVX patterns.
All AVX patterns are based on the "unaligned load" anyway.
llvm-svn: 224330
|
| |
|
|
|
|
|
| |
Bitfield RefersToEnclosingLocal of Stmt::DeclRefExprBitfields renamed to RefersToCapturedVariable to reflect latest changes introduced in commit 224323. Also renamed method Expr::refersToEnclosingLocal() to Expr::refersToCapturedVariable() and comments for constant arguments.
No functional changes.
llvm-svn: 224329
|
| |
|
|
| |
llvm-svn: 224328
|
| |
|
|
|
|
|
| |
Stop printing `metadata` in `Metadata::print()` and
`Metadata::printAsOperand()`.
llvm-svn: 224327
|
| |
|
|
| |
llvm-svn: 224326
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's horrible to inspect `MDNode`s in a debugger. All of their operands
that are `MDNode`s get dumped as `<badref>`, since we can't assign
metadata slots in the context of a `Metadata::dump()`. (Why not? Why
not assign numbers lazily? Because then each time you called `dump()`,
a given `MDNode` could have a different lazily assigned number.)
Fortunately, the C memory model gives us perfectly good identifiers for
`MDNode`. Add pointer addresses to the dumps, transforming this:
(lldb) e N->dump()
!{i32 662302, i32 26, <badref>, null}
(lldb) e ((MDNode*)N->getOperand(2))->dump()
!{i32 4, !"foo"}
into:
(lldb) e N->dump()
!{i32 662302, i32 26, <0x100706ee0>, null}
(lldb) e ((MDNode*)0x100706ee0)->dump()
!{i32 4, !"foo"}
and this:
(lldb) e N->dump()
0x101200248 = !{<badref>, <badref>, <badref>, <badref>, <badref>}
(lldb) e N->getOperand(0)
(const llvm::MDOperand) $0 = {
MD = 0x00000001012004e0
}
(lldb) e N->getOperand(1)
(const llvm::MDOperand) $1 = {
MD = 0x00000001012004e0
}
(lldb) e N->getOperand(2)
(const llvm::MDOperand) $2 = {
MD = 0x0000000101200058
}
(lldb) e N->getOperand(3)
(const llvm::MDOperand) $3 = {
MD = 0x00000001012004e0
}
(lldb) e N->getOperand(4)
(const llvm::MDOperand) $4 = {
MD = 0x0000000101200058
}
(lldb) e ((MDNode*)0x00000001012004e0)->dump()
!{}
(lldb) e ((MDNode*)0x0000000101200058)->dump()
!{null}
into:
(lldb) e N->dump()
!{<0x1012004e0>, <0x1012004e0>, <0x101200058>, <0x1012004e0>, <0x101200058>}
(lldb) e ((MDNode*)0x1012004e0)->dump()
!{}
(lldb) e ((MDNode*)0x101200058)->dump()
!{null}
llvm-svn: 224325
|
| |
|
|
|
|
|
|
| |
This test was missing a `Debug Info Version` so it's `not grep` was
passing vacuously. Update it to CHECK for something useful at the same
time so it doesn't bitrot quite so easily in the future.
llvm-svn: 224324
|
| |
|
|
|
|
|
|
| |
Currently, if global variable is marked as a private OpenMP variable, the compiler crashes in debug version or generates incorrect code in release version. It happens because in the OpenMP region the original global variable is used instead of the generated private copy. It happens because currently globals variables are not captured in the OpenMP region.
This patch adds capturing of global variables iff private copy of the global variable must be used in the OpenMP region.
Differential Revision: http://reviews.llvm.org/D6259
llvm-svn: 224323
|
| |
|
|
|
|
|
|
|
|
|
| |
Turning our _Atomic L-value into an R-value removes its _Atomic-ness.
However, we didn't update our 'FromType' which made
ScalarTypeToBooleanCastKind think we were trying to pass it a
non-scalar.
This fixes PR21836.
llvm-svn: 224322
|
| |
|
|
|
|
|
|
|
| |
The compact unwind importer is getting the wrong unwind info for one
case that I found. I haven't been able to fix the problem tonight
and I don't want to leave TOT behaving incorrectly, so just ignore
compact unwind until I can get to the bottom of this.
llvm-svn: 224321
|