| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
review thread out of order easier.
llvm-svn: 216241
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
changes to ensure discriminator changes don't introduce new DWARF DW_TAG_lexical_blocks.
Somewhat unnoticed in the original implementation of discriminators, but
it could cause instructions to end up in new, small,
DW_TAG_lexical_blocks due to the use of DILexicalBlock to track
discriminator changes.
Instead, use DILexicalBlockFile which we already use to track file
changes without introducing new scopes, so it works well to track
discriminator changes in the same way.
llvm-svn: 216239
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
isPow2DivCheap
That name doesn't specify signed or unsigned.
Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong:
srl/add/sra
is not the general sequence for signed integer division by power-of-2. We need one more 'sra':
sra/srl/add/sra
That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.
This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D5010
llvm-svn: 216237
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The advanced copy optimization does not yield any difference on the whole llvm
test-suite + SPECs, either in compile time or runtime (binaries are identical),
but has a big potential when data go back and forth between register files as
demonstrated with test/CodeGen/ARM/adv-copy-opt.ll.
Note: This was measured for both Os and O3 for armv7s, arm64, and x86_64.
<rdar://problem/12702965>
llvm-svn: 216236
|
| |
|
|
|
|
|
|
| |
Patch 2 of 11 in 'Add a "probe-stack" attribute' review thread
Patch by: john.kare.alsaker@gmail.com
llvm-svn: 216235
|
| |
|
|
|
|
|
|
| |
Patch 1 of 11 in 'Add a "probe-stack" attribute' review thread.
Patch by: <john.kare.alsaker@gmail.com>
llvm-svn: 216233
|
| |
|
|
|
|
|
|
|
|
|
| |
AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of
a group that aim at making it more target-independent. See
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html
for details
The command line option is "atomic-expand"
llvm-svn: 216231
|
| |
|
|
|
|
|
|
| |
source of a copy.
<rdar://problem/12702965>
llvm-svn: 216229
|
| |
|
|
| |
llvm-svn: 216228
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
happy.
This is mostly achieved by providing the correct register class manually,
because getRegClassFor always returns the GPR*AllRegClass for MVT::i32 and
MVT::i64.
Also cleanup the code to use the FastEmitInst_* method whenever possible. This
makes sure that the operands' register class is properly constrained. For all
the remaining cases this adds the missing constrainOperandRegClass calls for
each operand.
llvm-svn: 216225
|
| |
|
|
|
|
| |
std::unique_ptr
llvm-svn: 216223
|
| |
|
|
| |
llvm-svn: 216220
|
| |
|
|
|
|
| |
This fixes a crash in an ocl conformance test.
llvm-svn: 216219
|
| |
|
|
| |
llvm-svn: 216218
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will simplify the SGPR spilling and also allow us to use
MachineFrameInfo for calculating offsets, which should be more
reliable than our custom code.
This fixes a crash in some cases where a register would be spilled
in a branch such that the VGPR defined for spilling did not dominate
all the uses when restoring.
This fixes a crash in an ocl conformance test. The test requries
register spilling and is too big to include.
llvm-svn: 216217
|
| |
|
|
|
|
|
| |
This fixes a crash in an ocl conformance test. The test requries
register spilling and is too big to include.
llvm-svn: 216216
|
| |
|
|
|
|
|
| |
This will avoid code duplication in the next commit which calls it directly
from the gold plugin.
llvm-svn: 216211
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We discussed the issue of generality vs. readability of the AVX512 classes
recently. I proposed this approach to try to hide and centralize the mappings
we commonly perform based on the vector type. A new class X86VectorVTInfo
captures these.
The idea is to pass an instance of this class to classes/multiclasses instead
of the corresponding ValueType. Then the class/multiclass can use its field
for things that derive from the type rather than passing all those as separate
arguments.
I modified avx512_valign to demonstrate this new approach. As you can see
instead of 7 related template parameters we now have one. The downside is
that we have to refer to fields for the derived values. I named the argument
'_' in order to make this as invisible as possible. Please let me know if you
absolutely hate this. (Also once we allow local initializations in
multiclasses we can recover the original version by assigning the fields to
local variables.)
Another possible use-case for this class is to directly map things, e.g.:
RegisterClass KRC = X86VectorVTInfo<32, i16>.KRC
llvm-svn: 216209
|
| |
|
|
|
|
|
|
|
|
| |
The profile data format was recently updated and the new indexing api
requires the code coverage tool to know the function's hash as well
as the function's name to get the execution counts for a function.
Differential Revision: http://reviews.llvm.org/D4994
llvm-svn: 216207
|
| |
|
|
| |
llvm-svn: 216203
|
| |
|
|
| |
llvm-svn: 216201
|
| |
|
|
|
|
|
|
|
| |
The AdvSIMD pass may produce copies that are not coalescer-friendly. The
peephole optimizer knows how to fix that as demonstrated in the test case.
<rdar://problem/12702965>
llvm-svn: 216200
|
| |
|
|
|
|
| |
function. NFCI.
llvm-svn: 216199
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
There are two add-immediate instructions in Thumb1: tADDi8 and tADDi3. Only
the latter supports using different source and destination registers, so
whenever we materialize a new base register (at a certain offset) we'd do
so by moving the base register value to the new register and then adding in
place. This patch changes the code to use a single tADDi3 if the offset is
small enough to fit in 3 bits.
Differential Revision: http://reviews.llvm.org/D5006
llvm-svn: 216193
|
| |
|
|
|
|
|
|
| |
systems
http://reviews.llvm.org/D4984
llvm-svn: 216182
|
| |
|
|
|
|
| |
No functionality change.
llvm-svn: 216178
|
| |
|
|
| |
llvm-svn: 216176
|
| |
|
|
|
|
|
|
| |
with many arguments.
PR20677
llvm-svn: 216175
|
| |
|
|
| |
llvm-svn: 216174
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This bug was introduced in r213006 which makes an assumption that MCSection is COFF for Windows MSVC. This assumption is broken for MCJIT users where ELF is used instead [1]. The fix is to change the MCSection cast to a dyn_cast.
[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-December/068407.html.
Reviewers: majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4872
llvm-svn: 216173
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The FPv4-SP floating-point unit is generally referred to as
single-precision only, but it does have double-precision registers and
load, store and GPR<->DPR move instructions which operate on them.
This patch enables the use of these registers, the main advantage of
which is that we now comply with the AAPCS-VFP calling convention.
This partially reverts r209650, which added some AAPCS-VFP support,
but did not handle return values or alignment of double arguments in
registers.
This patch also adds tests for Thumb2 code generation for
floating-point instructions and intrinsics, which previously only
existed for ARM.
llvm-svn: 216172
|
| |
|
|
|
|
|
|
|
|
|
| |
This does not require -ffast-math, and it gives CSE/GVN more options to
eliminate duplicate expressions in, e.g.:
return ((x + 0.1234 * y) * (x - 0.1234 * y));
Differential Revision: http://reviews.llvm.org/D4904
llvm-svn: 216169
|
| |
|
|
|
|
| |
While there remove noop casts.
llvm-svn: 216168
|
| |
|
|
| |
llvm-svn: 216164
|
| |
|
|
| |
llvm-svn: 216163
|
| |
|
|
| |
llvm-svn: 216162
|
| |
|
|
|
|
|
|
| |
Added FeatureSMAP.
Broadwell ISA includes Haswell ISA + ADX + RDSEED + SMAP
llvm-svn: 216161
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"sub nsw" and "mul nsw" instructions.
Currently only "add nsw" are widened. This patch eliminates tons of "sext" instructions for 64 bit code (and the corresponding target code) in cases like:
int N = 100;
float **A;
void foo(int x0, int x1)
{
float * A_cur = &A[0][0];
float * A_next = &A[1][0];
for(int x = x0; x < x1; ++x).
{
// Currently only [x+N] case is widened. Others 2 cases lead to sext.
// This patch fixes it, so all 3 cases do not need sext.
const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N];
A_next[x] = div;
}
}
...
> clang++ test.cpp -march=core-avx2 -Ofast -fno-unroll-loops -fno-tree-vectorize -S -o -
Differential Revision: http://reviews.llvm.org/D4695
llvm-svn: 216160
|
| |
|
|
|
|
|
|
| |
EngineBuilder and DIContext.
By Arch Robison.
llvm-svn: 216159
|
| |
|
|
|
|
| |
needing to mention the size.
llvm-svn: 216158
|
| |
|
|
|
|
| |
Adapted from a patch by Richard Smith, test-case written by me.
llvm-svn: 216157
|
| |
|
|
|
|
| |
added to work an old gcc bug. I believe its been fixed by now.
llvm-svn: 216156
|
| |
|
|
|
|
| |
In constant folding stage, "TRUNC" can't handle vector data type.
llvm-svn: 216149
|
| |
|
|
|
|
| |
Builder and type".
llvm-svn: 216147
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
advanced copy optimization.
This is the final step patch toward transforming:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
bx lr
Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1
and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
into simple generic GPR copies that the coalescer managed to remove.
<rdar://problem/12702965>
llvm-svn: 216144
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
target hook.
This patch teaches the compiler that:
dX = VSETLNi32 dY, rZ, imm
is the same as:
dX = INSERT_SUBREG dY, rZ, translateImmToSubIdx(imm)
<rdar://problem/12702965>
llvm-svn: 216143
|
| |
|
|
|
|
| |
Only for Cortex-A57 and Cyclone for now, where it has shown wins.
llvm-svn: 216141
|
| |
|
|
|
|
| |
If we have a scalar reduction, we can increase the critical path length if the loop we're unrolling is inside another loop. Limit, by default to 2, so the critical path only gets increased by one reduction operation.
llvm-svn: 216140
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new property: isInsertSubreg and the related target hooks:
TargetIntrInfo::getInsertSubregInputs and
TargetInstrInfo::getInsertSubregLikeInputs to specify that a target specific
instruction is a (kind of) INSERT_SUBREG.
The approach is similar to r215394.
<rdar://problem/12702965>
llvm-svn: 216139
|
| |
|
|
|
|
|
|
|
|
|
| |
On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to
be avoided. This patch trades simplicity for implementation time at the expense
of performance... As they say: correctness first, then performance.
See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few
ideas on how to make this better.
llvm-svn: 216138
|