| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Chapter 3 of the QPX manual states that, "Scalar floating-point load
instructions, defined in the Power ISA, cause a replication of the source data
across all elements of the target register." Thus, if we have a load followed
by a QPX splat (from the first lane), the splat is redundant. This adds a late
MI-level pass to remove the redundant splats in some of these cases
(specifically when both occur in the same basic block).
This optimization is scheduled just prior to post-RA scheduling. It can't happen
before anything that might replace the load with some already-computed quantity
(i.e. store-to-load forwarding).
llvm-svn: 265047
|
| |
|
|
|
|
|
|
|
|
| |
eliminateCallFramePseudoInstr (PR27140)"
I think it might have caused these build breakages:
http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio
llvm-svn: 265046
|
| |
|
|
|
|
|
| |
Only force "extern" linkage if the function used to be a definition
in the source module. Declarations keep their original linkage.
llvm-svn: 265043
|
| |
|
|
|
|
|
| |
The default is legal, which results in 'Cannot select' errors. This is
triggered during selfhost due to a recent cost model change.
llvm-svn: 265040
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR27140)
For code such as:
void f(int, int);
void g() {
f(1, 2);
}
compiled for 32-bit X86 Linux, Clang would previously generate:
subl $12, %esp
subl $8, %esp
pushl $2
pushl $1
calll f
addl $16, %esp
addl $12, %esp
retl
This patch fixes that by merging adjacent stack adjustments in
eliminateCallFramePseudoInstr().
Differential Revision: http://reviews.llvm.org/D18627
llvm-svn: 265039
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.
It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.
Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.
Differential Revision: http://reviews.llvm.org/D18627
llvm-svn: 265036
|
| |
|
|
|
|
|
|
| |
isBrImm should accept any non-constant immediate. Previously it was only accepting LanaiMCExpr ones which was wrong.
Differential Revision: http://reviews.llvm.org/D18571
llvm-svn: 265032
|
| |
|
|
|
|
|
|
| |
http://reviews.llvm.org/D18097
Initial support does not include any patterns to generate this instructions
llvm-svn: 265031
|
| |
|
|
|
|
| |
Use emplace_back instead of push_back for simplicity.
llvm-svn: 265030
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast
targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping
into a codegen sinkhole while trying to splat a byte into an XMM reg.
Follow-on bugs exposed by the current codegen are:
https://llvm.org/bugs/show_bug.cgi?id=27141
https://llvm.org/bugs/show_bug.cgi?id=27143
Differential Revision: http://reviews.llvm.org/D18566
llvm-svn: 265029
|
| |
|
|
| |
llvm-svn: 265025
|
| |
|
|
|
|
|
|
|
|
|
|
| |
If the lhs is evaluated before the rhs, FuncletI's operator-> can trigger the
assert(isHandleInSync() && "invalid iterator access!");
at include/llvm/ADT/DenseMap.h:1061. (Happens e.g. when compiled with GCC 6.)
Differential Revision: http://reviews.llvm.org/D18440
llvm-svn: 265024
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PPCSimplifyAddress contains this code:
IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context)
: Type::getInt64Ty(*Context));
to determine the type to be used for an index register, if one needs
to be created. However, the "VT" here is the type of the data being
loaded or stored, *not* the type of an address. This means that if
a data element of type i32 is accessed using an index that does not
not fit into 32 bits, a wrong address is computed here.
Note that PPCFastISel is only ever used on 64-bit currently, so the type
of an address is actually *always* MVT::i64. Other parts of the code,
even in this same PPCSimplifyAddress routine, already rely on that fact.
Thus, this patch changes the code to simply unconditionally use
Type::getInt64Ty(*Context) as OffsetTy.
llvm-svn: 265023
|
| |
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
http://reviews.llvm.org/D18032
This patch provides asm implementation for the following instructions:
lwat, ldat, stwat, stdat, ldmx, mcrxrx
llvm-svn: 265022
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change will handle missing store pair opportunity where the first store
instruction stores zero followed by the non-zero store. For example, this change
will convert :
str wzr, [x8]
str w1, [x8, #4]
into:
stp wzr, w1, [x8]
Reviewers: jmolloy, t.p.northover, mcrosier
Subscribers: flyingforyou, aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18570
llvm-svn: 265021
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fast isel pass currently emits a COPY_TO_REGCLASS node to convert
from a F4RC to a F8RC register class during conversion of a
floating-point number to integer. There is actually no support in the
common code instruction printers to emit COPY_TO_REGCLASS nodes, so the
PowerPC back-end has special code there to simply ignore
COPY_TO_REGCLASS.
This is correct *if and only if* the source and destination registers of
COPY_TO_REGCLASS are the same (except for the different register class).
But nothing guarantees this to be the case, and if the register
allocator does end up allocating source and destination to different
registers after all, the back-end simply generates incorrect code. I've
included a test case that shows such incorrect code generation.
However, it seems that COPY_TO_REGCLASS is actually not intended to be
used at the MI layer at all. It is used during SelectionDAG, but always
lowered to a plain COPY before emitting MI. Other back-end's fast isel
passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong
for the PowerPC back-end to emit it here.
This patch changes the PowerPC back-end to directly emit COPY instead of
COPY_TO_REGCLASS and removes the special handling in the instruction
printers.
Differential Revision: http://reviews.llvm.org/D18605
llvm-svn: 265020
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There are too many instructions to exhaustively test so addiu and lwc2 are
used as representative examples.
It should be noted that many memory instructions that should have simm16
range checking do not because it is also necessary to support the macro
of the same name which accepts simm32. The range checks for these occur in
the macro expansion.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D18437
llvm-svn: 265019
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
ldc2/sdc2 now emit slightly worse diagnostics for MIPS-I. The problem
is that they don't trigger the custom parser because all the candidates
are disabled by feature bits. On all other subtargets, the diagnostics are
accurate but are subject to the usual issues of needing to report multiple
ways to correct the code (e.g. smaller offset, enable a CPU feature) but
only being able to report one error.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D18436
llvm-svn: 265018
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch is a part of http://reviews.llvm.org/D15525
GlobalIndirectSymbol class contains common implementation for both
aliases and ifuncs. This patch should be NFC change that just prepare
common code for ifunc support.
Differential Revision: http://reviews.llvm.org/D18433
llvm-svn: 265016
|
| |
|
|
|
| |
Review: http://reviews.llvm.org/D18642
llvm-svn: 265015
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Also, made test_mi10.s formatting consistent with the majority of the
MC tests.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D18435
llvm-svn: 265014
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Change isConsecutiveLoads to check that loads are non-volatile as this
is a requirement for any load merges. Propagate change to two callers.
Reviewers: RKSimon
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18546
llvm-svn: 265013
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The bug was that microMIPS's [ls]w[lr]e instructions claimed to support a
12-bit offset when it is only 9-bit.
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D18434
llvm-svn: 265010
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17334
llvm-svn: 265002
|
| |
|
|
| |
llvm-svn: 265000
|
| |
|
|
|
|
|
| |
The rule for SMIN introduced in rL236202 doesn't work as advertised: the
check for Pred == ICmpInst::ICMP_SGT was missing.
llvm-svn: 264996
|
| |
|
|
| |
llvm-svn: 264995
|
| |
|
|
|
|
|
|
|
| |
This way once we teach MatchBinaryOp to map more things into arithmetic,
the non-wrapping add recurrence construction would understand it too.
Right now MatchBinaryOp still only understands arithmetic, so this is
solely a code-reorganization change.
llvm-svn: 264994
|
| |
|
|
| |
llvm-svn: 264993
|
| |
|
|
| |
llvm-svn: 264992
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When dealing with complex<float>, and similar structures with two
single-precision floating-point numbers, especially when such things are being
passed around by value, we'll sometimes end up loading both float values by
extracting them from one 64-bit integer load. It looks like this:
t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
t16: i64 = srl t13, Constant:i32<32>
t17: i32 = truncate t16
t18: f32 = bitcast t17
t19: i32 = truncate t13
t20: f32 = bitcast t19
The problem, especially before the P8 where those bitcasts aren't legal (and
get expanded via the stack), is that it would have been better to use two
floating-point loads directly. Here we add a target-specific DAGCombine to do
just that. In short, we turn:
ld 3, 0(5)
stw 3, -8(1)
rldicl 3, 3, 32, 32
stw 3, -4(1)
lfs 3, -4(1)
lfs 0, -8(1)
into:
lfs 3, 4(5)
lfs 0, 0(5)
llvm-svn: 264988
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As discussed on llvm-dev[1].
This change adds the basic boilerplate code around having this intrinsic
in LLVM:
- Changes in Intrinsics.td, and the IR Verifier
- A lowering pass to lower @llvm.experimental.guard to normal
control flow
- Inliner support
[1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html
Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18527
llvm-svn: 264976
|
| |
|
|
|
|
|
|
|
|
| |
(PR26325)
The size savings are significant, and from what I can tell, both ICC and GCC do this.
Differential Revision: http://reviews.llvm.org/D18573
llvm-svn: 264966
|
| |
|
|
| |
llvm-svn: 264959
|
| |
|
|
| |
llvm-svn: 264944
|
| |
|
|
| |
llvm-svn: 264943
|
| |
|
|
|
|
|
| |
r264884 introduced a helper to escape the backslashes in the source file
path, but I since discovered an existing mechanism to escape strings.
llvm-svn: 264936
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Commit r260791 contained an error in that it would introduce a cross-module
reference in the old module. It also introduced O(N^2) complexity in the
module cloner by requiring the entire module to be visited for each function.
Fix both of these problems by avoiding use of the CloneDebugInfoMetadata
function (which is only designed to do intra-module cloning) and cloning
function-attached metadata in the same way that we clone all other metadata.
Differential Revision: http://reviews.llvm.org/D18583
llvm-svn: 264935
|
| |
|
|
|
|
| |
"C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC.
llvm-svn: 264929
|
| |
|
|
|
|
|
|
|
|
|
| |
For the same reason as the corresponding load change.
Note that ExpandStore is completely broken for non-byte sized element
vector stores, but preserve the current broken behavior which has tests
for it. The behavior should be the same, but now introduces a new typed
store that is incorrectly split later rather than doing it directly.
llvm-svn: 264928
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
On AMDGPU we want to be able to promote i64/f64 loads to v2i32.
If the access is unaligned, this would conclude that since i64 is legal,
it would convert it back to i64 and there is an endless legalization
loop.
Extract the logic for scalarizing the load into a new TargetLowering
function, where this can also replace the custom function AMDGPU
has for this.
llvm-svn: 264927
|
| |
|
|
|
|
|
|
|
|
| |
Widening a PHI requires us to insert a trunc.
The logical place for this trunc is in the same BB as the PHI.
This is not possible if the BB is terminated by a catchswitch.
This fixes PR27133.
llvm-svn: 264926
|
| |
|
|
|
|
|
|
| |
consecutive loads/zeros
Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible.
llvm-svn: 264922
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently it's a module pass. Make it a function pass so that we can
move it to PassManagerBuilder's EP_EarlyAsPossible extension point,
which only accepts function passes.
Reviewers: rnk
Subscribers: tra, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D18615
llvm-svn: 264919
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
rather than a function pointer.
Summary:
This gives callers flexibility to pass lambdas with captures, which lets
callers avoid the C-style void*-ptr closure style. (Currently, callers
in clang store state in the PassManagerBuilderBase arg.)
No functional change, and the new API is backwards-compatible.
Reviewers: chandlerc
Subscribers: joker.eph, cfe-commits
Differential Revision: http://reviews.llvm.org/D18613
llvm-svn: 264918
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change prevents the loop vectorizer from vectorizing when all of the vector
types it generates will be scalarized. I've run into this problem on the PPC's QPX
vector ISA, which only holds floating-point vector types. The loop vectorizer
will, however, happily vectorize loops with purely integer computation. Here's
an example:
LV: The Smallest and Widest types: 32 / 32 bits.
LV: The Widest register is: 256 bits.
LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32
LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4
LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
LV: Scalar loop costs: 3.
LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32
LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4
LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
LV: Vector loop of width 2 costs: 2.
LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32
LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4
LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
LV: Vector loop of width 4 costs: 1.
...
LV: Selecting VF: 8.
LV: The target has 32 registers
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #4 Interval # 1
LV(REG): At #5 Interval # 1
LV(REG): VF = 8
The problem is that the cost model here is not wrong, exactly. Since all of
these operations are scalarized, their cost (aside from the uniform ones) are
indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
picked, the lower the relative overhead from the loop itself (and the
induction-variable update and check), and so in a sense, picking the largest VF
here is the right thing to do.
The problem is that vectorizing like this, where all of the vectors will be
scalarized in the backend, isn't really vectorizing, but rather interleaving.
By itself, this would be okay, but then the vectorizer itself also interleaves,
and that's where the problem manifests itself. There's aren't actually enough
scalar registers to support the normal interleave factor multiplied by a factor
of VF (8 in this example). In other words, the problem with this is that our
register-pressure heuristic does not account for scalarization.
While we might want to improve our register-pressure heuristic, I don't think
this is the right motivating case for that work. Here we have a more-basic
problem: The job of the vectorizer is to vectorize things (interleaving aside),
and if the IR it generates won't generate any actual vector code, then
something is wrong. Thus, if every type looks like it will be scalarized (i.e.
will be split into VF or more parts), then don't consider that VF.
This is not a problem specific to PPC/QPX, however. The problem comes up under
SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
reduced test case from PR26837 to this commit.
Differential Revision: http://reviews.llvm.org/D18537
llvm-svn: 264904
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PGOFuncNames are used as the key to retrieve the Function definition from the
MD5 stored in the profile. For internal linkage function, we prefix the source
file name to the PGOFuncNames. LTO's internalization privatizes many global linkage
symbols. This happens after value profile annotation, but those internal
linkage functions should not have a source prefix. To differentiate compiler
generated internal symbols from original ones, PGOFuncName meta data are
created and attached to the original internal symbols in the value profile
annotation step. If a symbol does not have the meta data, its original linkage
must be non-internal.
Also add a new map that maps PGOFuncName's MD5 value to the function definition.
Differential Revision: http://reviews.llvm.org/D17895
llvm-svn: 264902
|
| |
|
|
|
|
|
| |
This restores commit 264869, with a fix for windows bots to properly
escape '\' in the path when serializing out. Added test.
llvm-svn: 264884
|
| |
|
|
| |
llvm-svn: 264882
|
| |
|
|
|
|
|
|
|
| |
Using ArrayRef in annotateValueSite's parameter instead of using an array
and it's size.
Differential Revision: http://reviews.llvm.org/D18568
llvm-svn: 264879
|