| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).
For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.
This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.
This commit also contains a trivial patch to clang to account for the C++ API
change.
Reviewers: rengolin
Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10969
llvm-svn: 247683
|
|
|
|
|
|
|
|
|
|
| |
*MCAsmInfo.h. NFC.
This is to reduce noise in a following commit.
Also fixes a couple missing spaces before the reference operator.
llvm-svn: 247679
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D11632
llvm-svn: 247670
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Added support for the following instructions:
CACHEE, LBE, LBUE, LHE, LHUE, LWE, LLE, LWLE, LWRE, PREFE,
SBE, SHE, SWE, SCE, SWLE, SWRE, TLBINV, TLBINVF
This required adding some infrastructure for the EVA ASE.
Patch by Scott Egerton.
Reviewers: vkalintiris, dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11139
llvm-svn: 247669
|
|
|
|
| |
llvm-svn: 247649
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In vectorized integer min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change wll combine :
%svn0 = vector_shuffle %0, undef<2,3,u,u>
%smax0 = smax %0, svn0
%svn3 = vector_shuffle %smax0, undef<1,u,u,u>
%sc = setcc %smax0, %svn3, gt
%n0 = extract_vector_elt %sc, #0
%n1 = extract_vector_elt %smax0, #0
%n2 = extract_vector_elt $smax0, #1
%result = select %n0, %n1, n2
becomes :
%1 = smaxv %0
%result = extract_vector_elt %1, 0
This change extends r246790.
llvm-svn: 247575
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
operands, NFC.
Summary:
These operands had the same purpose, however the MipsMemSimm9GPRAsmOperand
operand was only for micromips32r6 and the MipsMemSimm9AsmOperand did not
have a ParserMatchClass.
Patch by Scott Egerton
Reviewers: vkalintiris, dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12730
llvm-svn: 247573
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Turning (op x (mul y k)) into (op x (lsl (mul y k>>n) n)) is beneficial when
we can do the lsl as a shifted operand and the resulting multiply constant is
simpler to generate.
Do this by doing the transformation when trying to select a shifted operand,
as that ensures that it actually turns out better (the alternative would be to
do it in PreprocessISelDAG, but we don't know for sure there if extracting the
shift would allow a shifted operand to be used).
Differential Revision: http://reviews.llvm.org/D12196
llvm-svn: 247569
|
|
|
|
| |
llvm-svn: 247547
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dangling pointer
The MipsTargetELFStreamer can receive ABI info from many sources. For example,
from the MipsAsmParser instance. Lifetime of the MipsAsmParser can be shorter
than MipsTargetELFStreamer's lifetime. In that case we get a dangling pointer
to MipsABIInfo.
Differential Revision: http://reviews.llvm.org/D12805
llvm-svn: 247546
|
|
|
|
|
|
|
| |
Added shuffle decodes for MMX PUNPCK + PSHUFW shuffles.
Added shuffle decodes for 3DNow! PSWAPD shuffles.
llvm-svn: 247526
|
|
|
|
|
|
|
|
|
|
|
| |
KNL does not have VXORPS, VORPS for 512-bit values.
I use integer VPXOR, VPOR that actually do the same.
X86ISD::FXOR/FOR are generated as a result of FSUB combining.
Differential Revision: http://reviews.llvm.org/D12753
llvm-svn: 247523
|
|
|
|
|
|
|
|
|
|
| |
integer insts (2nd try)
The changes in:
test/CodeGen/X86/machine-cp.ll
are just due to scheduling differences after some logic instructions were reassociated.
llvm-svn: 247516
|
|
|
|
|
|
| |
Renamed to lowerVectorShuffleAsPermuteAndUnpack to make it clear that it lowers to more than just a UNPCK instruction.
llvm-svn: 247513
|
|
|
|
| |
llvm-svn: 247511
|
|
|
|
| |
llvm-svn: 247507
|
|
|
|
|
|
| |
integer insts
llvm-svn: 247506
|
|
|
|
|
|
|
|
|
|
| |
Summary: This fixes a variety of typos in docs, code and headers.
Subscribers: jholewinski, sanjoy, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D12626
llvm-svn: 247495
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
realignment should be forced.
With this commit, we can now force stack realignment when doing LTO and
do so on a per-function basis. Also, add a new cl::opt option
"stackrealign" to CommandFlags.h which is used to force stack
realignment via llc's command line.
Out-of-tree projects currently using -force-align-stack to force stack
realignment should make changes to attach the attribute to the functions
in the IR.
Differential Revision: http://reviews.llvm.org/D11814
llvm-svn: 247450
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).
Remove it, and, instead, introduce explicit callbacks:
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)
Differential Revision: http://reviews.llvm.org/D12557
llvm-svn: 247429
|
|
|
|
|
|
| |
This lets us generalize its usage to the other atomic instructions.
llvm-svn: 247428
|
|
|
|
|
|
| |
It caused crash in MachineInstr::hasPropertyInBundle() since r247237.
llvm-svn: 247395
|
|
|
|
|
|
| |
small. NFC.
llvm-svn: 247357
|
|
|
|
|
|
|
|
| |
The Win32 EH runtime caller does not preserve EBP, even though it does
preserve the CSRs (EBX, ESI, EDI) for us. The result was that each
finally funclet call would leave the frame pointer off by 12 bytes.
llvm-svn: 247348
|
|
|
|
| |
llvm-svn: 247345
|
|
|
|
| |
llvm-svn: 247344
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The (mostly-deprecated) SelectionDAG-based ILPListDAGScheduler scheduler
was making poor scheduling decisions, causing high register pressure and
extraneous register spills.
Switching to the newer machine scheduler generates better code -- even
without there being a machine model defined for SPARC yet.
(Actually committing the test changes too, this time, unlike r247315)
llvm-svn: 247343
|
|
|
|
|
|
|
|
| |
This reverts commit r247315.
Accidentally omitted test changes; will resubmit full change shortly.
llvm-svn: 247328
|
|
|
|
|
|
|
|
|
|
|
| |
The (mostly-deprecated) SelectionDAG-based ILPListDAGScheduler scheduler
was making poor scheduling decisions, causing high register pressure and
extraneous register spills.
Switching to the newer machine scheduler generates better code -- even
without there being a machine model defined for SPARC yet.
llvm-svn: 247315
|
|
|
|
|
|
|
|
|
| |
fixes"
Except the changes that defined virtual destructors as =default, because that
ran into problems with GCC 4.7 and overriding methods that weren't noexcept.
llvm-svn: 247298
|
|
|
|
| |
llvm-svn: 247296
|
|
|
|
|
|
|
|
|
| |
vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11802
llvm-svn: 247276
|
|
|
|
|
|
|
|
| |
The tests in isVTRNMask and isVTRN_v_undef_Mask should also check that the elements of the upper and lower half of the vectorshuffle occur in the correct order when both halves are used. Without this test the code assumes that it is correct to use vector transpose (vtrn) for the masks <1, 1, 0, 0> and <1, 3, 0, 2>, among others, but the transpose actually incorrectly generates shuffles for <0, 0, 1, 1> and <0, 2, 1, 3> in this case.
Patch by Jeroen Ketema!
llvm-svn: 247254
|
|
|
|
|
|
|
| |
splits to actually use the single character split routine which does
less work, and in a debug build is *substantially* faster.
llvm-svn: 247245
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The changes in this patch are as follows:
1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function
2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function
3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run:
Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line
A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64.
Phabricator review: http://reviews.llvm.org/D11817
llvm-svn: 247237
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First, we need to teach isFrameOffsetLegal about STNP.
It already knew about the STP/LDP variants, but those were probably
never exercised, because it's only the load/store optimizer that
generates STP/LDP, and the only user of the method is frame lowering,
which runs earlier.
The STP/LDP cases were wrong: they didn't take into account the fact
that they return two results, not one, so the immediate offset will be
the 4th operand, not the 3rd.
Follow-up to r247234.
llvm-svn: 247236
|
|
|
|
|
|
| |
Followup to r247231.
llvm-svn: 247234
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We could go through the load/store optimizer and match STNP where
we would have matched a nontemporal-annotated STP, but that's not
reliable enough, as an opportunistic optimization.
Insetad, we can guarantee emitting STNP, by matching them at ISel.
Since there are no single-input nontemporal stores, we have to
resort to some high-bits-extracting trickery to generate an STNP
from a plain store.
Also, we need to support another, LDP/STP-specific addressing mode,
base + signed scaled 7-bit immediate offset.
For now, only match the base. Let's make it smart separately.
Part of PR24086.
llvm-svn: 247231
|
|
|
|
| |
llvm-svn: 247230
|
|
|
|
|
|
|
| |
This will be caught by existing tests with a
verifier check to be added in a future commit.
llvm-svn: 247229
|
|
|
|
|
|
|
| |
This caused build breakges, e.g.
http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu-gdb-75/builds/24926
llvm-svn: 247226
|
|
|
|
|
|
| |
To be used by other targets.
llvm-svn: 247225
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All of the complexity is in cleanupret, and it mostly follows the same
codepaths as catchret, except it doesn't take a return value in RAX.
This small example now compiles and executes successfully on win32:
extern "C" int printf(const char *, ...) noexcept;
struct Dtor {
~Dtor() { printf("~Dtor\n"); }
};
void has_cleanup() {
Dtor o;
throw 42;
}
int main() {
try {
has_cleanup();
} catch (int) {
printf("caught it\n");
}
}
Don't try to put the cleanup in the same function as the catch, or Bad
Things will happen.
llvm-svn: 247219
|
|
|
|
|
|
|
|
| |
Patch by Eugene Zelenko!
Differential Revision: http://reviews.llvm.org/D12740
llvm-svn: 247216
|
|
|
|
|
|
|
|
|
|
|
| |
The 32-bit tables don't actually contain PC range data, so emitting them
is incredibly simple.
The 64-bit tables, on the other hand, use the same table for state
numbering as well as label ranges. This makes things more difficult, so
it will be implemented later.
llvm-svn: 247192
|
|
|
|
|
|
|
|
| |
,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding."
This reverts commit r247149, as it was breaking numerous buildbots of varied architectures.
llvm-svn: 247177
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With subregister liveness enabled we can detect the case where only
parts of a register are live in, this is expressed as a 32bit lanemask.
The current code only keeps registers in the live-in list and therefore
enumerated all subregisters affected by the lanemask. This turned out to
be too conservative as the subregister may also cover additional parts
of the lanemask which are not live. Expressing a given lanemask by
enumerating a minimum set of subregisters is computationally expensive
so the best solution is to simply change the live-in list to store the
lanemasks as well. This will reduce memory usage for targets using
subregister liveness and slightly increase it for other targets
Differential Revision: http://reviews.llvm.org/D12442
llvm-svn: 247171
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal, this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
llvm-svn: 247167
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of extracting both 32-bit components from the 128-bit
register. This produces fewer copies and is easier for
the copy peephole optimizer to understand and see the actual uses
as extracts from a reg_sequence.
This avoids needing to handle subregister composing in the
PeepholeOptimizer's ValueTracker for this case.
llvm-svn: 247162
|
|
|
|
| |
llvm-svn: 247161
|