| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
| |
The 32-bit tables don't actually contain PC range data, so emitting them
is incredibly simple.
The 64-bit tables, on the other hand, use the same table for state
numbering as well as label ranges. This makes things more difficult, so
it will be implemented later.
llvm-svn: 247192
|
|
|
|
|
|
|
|
| |
,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding."
This reverts commit r247149, as it was breaking numerous buildbots of varied architectures.
llvm-svn: 247177
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With subregister liveness enabled we can detect the case where only
parts of a register are live in, this is expressed as a 32bit lanemask.
The current code only keeps registers in the live-in list and therefore
enumerated all subregisters affected by the lanemask. This turned out to
be too conservative as the subregister may also cover additional parts
of the lanemask which are not live. Expressing a given lanemask by
enumerating a minimum set of subregisters is computationally expensive
so the best solution is to simply change the live-in list to store the
lanemasks as well. This will reduce memory usage for targets using
subregister liveness and slightly increase it for other targets
Differential Revision: http://reviews.llvm.org/D12442
llvm-svn: 247171
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal, this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
llvm-svn: 247167
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of extracting both 32-bit components from the 128-bit
register. This produces fewer copies and is easier for
the copy peephole optimizer to understand and see the actual uses
as extracts from a reg_sequence.
This avoids needing to handle subregister composing in the
PeepholeOptimizer's ValueTracker for this case.
llvm-svn: 247162
|
|
|
|
| |
llvm-svn: 247161
|
|
|
|
| |
llvm-svn: 247158
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This helps mostly when we use add instructions for address calculations
that contain immediates.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D12256
llvm-svn: 247157
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
select instructions
Summary:
We are not scalarizing the wide selects in codegen for i16 and i32 and
therefore we can remove the amortization factor. We still have issues
with i64 vectors in codegen though.
Reviewers: mcrosier
Subscribers: mcrosier, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D12724
llvm-svn: 247156
|
|
|
|
| |
llvm-svn: 247152
|
|
|
|
|
|
|
|
|
| |
vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4
Added tests for intrinsics and encoding.
Differential Revision: http://reviews.llvm.org/D11802
llvm-svn: 247149
|
|
|
|
|
|
|
|
| |
SRL16 instructions
Differential Revision: http://reviews.llvm.org/D11178
llvm-svn: 247146
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D11628
llvm-svn: 247125
|
|
|
|
|
|
|
| |
Broken by r247074. Should include an assembler test,
but the assembler is currently broken for VOP3b apparently.
llvm-svn: 247123
|
|
|
|
| |
llvm-svn: 247118
|
|
|
|
|
|
|
|
|
|
| |
Currently this hits an assert that extload should
always be supported, which assumes integer extloads.
This moves a hack out of SI's argument lowering and
is covered by existing tests.
llvm-svn: 247113
|
|
|
|
| |
llvm-svn: 247110
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
32-bit funclets have short prologues that allocate enough stack for the
largest call in the whole function. The runtime saves CSRs for the
funclet. It doesn't restore CSRs after we finally transfer control back
to the parent funciton via a CATCHRET, but that's a separate issue.
32-bit funclets also have to adjust the incoming EBP value, which is
what llvm.x86.seh.recoverframe does in the old model.
64-bit funclets need to spill CSRs as normal. For simplicity, this just
spills the same set of CSRs as the parent function, rather than trying
to compute different CSR sets for the parent function and each funclet.
64-bit funclets also allocate enough stack space for the largest
outgoing call frame, like 32-bit.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12546
llvm-svn: 247092
|
|
|
|
|
|
| |
read CTR and count them as reading the CTR.
llvm-svn: 247083
|
|
|
|
|
|
|
|
|
| |
Adds vcc to output string input for e32. Allows option
of using e64 encoding with assembler.
Also fixes these instructions not implicitly reading exec.
llvm-svn: 247074
|
|
|
|
|
|
|
|
|
| |
The pass is needed to remove __nvvm_reflect calls when we link in
libdevice bitcode that comes with CUDA.
Differential Revision: http://reviews.llvm.org/D11663
llvm-svn: 247072
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch modifies X86TargetLowering::LowerVASTART so that
struct va_list is initialized with 32 bit pointers in x32. It also
includes tests that call @llvm.va_start() for x32.
Patch by João Porto
Subscribers: llvm-commits, hjl.tools
Differential Revision: http://reviews.llvm.org/D12346
llvm-svn: 247069
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CodeGen passes
This allows backends which don't use a traditional register allocator,
but do need PHI lowering and other passes, to use the default
TargetPassConfig::addFastRegAlloc and
TargetPassConfig::addOptimizedRegAlloc implementations.
Differential Revision: http://reviews.llvm.org/D12691
llvm-svn: 247065
|
|
|
|
|
|
|
|
|
|
|
| |
These were marked as WriteSALU, which is low latency.
I'm guessing at the value to use, but it should probably
be considered the highest latency instruction.
I'm not sure this has any actual effect since hasSideEffects
probably is preventing any moving of these.
llvm-svn: 247060
|
|
|
|
|
|
|
|
| |
This should be convergent. This is not a
barrier in the isBarrier sense, nor
hasCtrlDep.
llvm-svn: 247059
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old implementation assumed LP64 which is broken for x32. Specifically, the
MOVE8rm_NOREX and MOVE8mr_NOREX, when selected, would cause a 'Cannot emit
physreg copy instruction' error message to be reported.
This patch also enable the h-register*ll tests for x32.
Differential Revision: http://reviews.llvm.org/D12336
Patch by João Porto
llvm-svn: 247058
|
|
|
|
|
|
|
|
|
| |
sub C, x - > add (sub 0, x), C for DS offsets.
This is mostly to fix regressions that show up when
SeparateConstOffsetFromGEP is enabled.
llvm-svn: 247054
|
|
|
|
|
|
| |
Based on a patch by Jerome Witmann.
llvm-svn: 247047
|
|
|
|
|
|
|
|
|
|
| |
x86 call frame optimization
Patch by Dave Kreitzer
Differential Revision: http://reviews.llvm.org/D12620
llvm-svn: 247042
|
|
|
|
|
|
| |
Renamed from: https://github.com/WebAssembly/design/pull/332
llvm-svn: 247028
|
|
|
|
| |
llvm-svn: 247021
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D1179
llvm-svn: 247017
|
|
|
|
|
|
|
|
| |
Added tests for encoding.
Differential Revision: http://reviews.llvm.org/D12061
llvm-svn: 247010
|
|
|
|
| |
llvm-svn: 247008
|
|
|
|
| |
llvm-svn: 247006
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D11801
llvm-svn: 246999
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: And define them to have noop casts with address spaces 0-255.
Reviewers: pekka.jaaskelainen
Subscribers: pekka.jaaskelainen, llvm-commits
Differential Revision: http://reviews.llvm.org/D12678
llvm-svn: 246990
|
|
|
|
|
|
|
|
| |
16-bit LBU16, LHU16, LW16, LWGP and LWSP instructions
Differential Revision: http://reviews.llvm.org/D10956
llvm-svn: 246987
|
|
|
|
| |
llvm-svn: 246983
|
|
|
|
| |
llvm-svn: 246982
|
|
|
|
|
|
|
|
| |
Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer.
Differential Revision: http://reviews.llvm.org/D10683
llvm-svn: 246981
|
|
|
|
|
|
|
|
| |
FLOOR.W.fmt, TRUNC.L.fmt, TRUNC.W.fmt, RSQRT.fmt and SQRT.fmt instructions
Differential Revision: http://reviews.llvm.org/D11674
llvm-svn: 246968
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D11181
llvm-svn: 246963
|
|
|
|
|
|
|
|
|
| |
SelectT2ShifterOperandReg has identical behaviour to SelectImmShifterOperand,
so get rid of it and use SelectImmShifterOperand instead.
Differential Revision: http://reviews.llvm.org/D12195
llvm-svn: 246962
|
|
|
|
|
|
|
|
| |
MAX.fmt, MIN.fmt, MAXA.fmt, MINA.fmt and CMP.condn.fmt instructions
Differential Revision: http://reviews.llvm.org/D12141
llvm-svn: 246960
|
|
|
|
| |
llvm-svn: 246953
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To commute a trivial rlwimi instructions (meaning one with a full mask and zero
shift), we'd need to ability to form an all-zero mask (instead of an all-one
mask) using rlwimi. We can't represent this, however, and we'll miscompile code
if we try.
The code quality problem that this highlights (that SDAG simplification can
lead to us generating an ISD::OR node with a constant zero LHS) will be fixed
as a follow-up.
Fixes PR24719.
llvm-svn: 246937
|
|
|
|
|
|
|
|
| |
MADDF.fmt, MSUBF.fmt and NEG.fmt instructions
Differential Revision: http://reviews.llvm.org/D11978
llvm-svn: 246919
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from
an input pattern that looks like this:
and(or(x, c1), c2)
but the associated logic does not work if there are bits that are 1 in c1 but 0
in c2 (these are normally canonicalized away, but that can't happen if the 'or'
has other users. Make sure we abort the transformation if such bits are
discovered.
Fixes PR24704.
llvm-svn: 246900
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a basic cost model for interleaved-access vectorization (and a better
default for shuffles), and enables interleaved-access vectorization by default.
The relevant difference from the default cost model for interleaved-access
vectorization, is that on PPC, the shuffles that end up being used are *much*
cheaper than modeling the process with insert/extract pairs (which are
quite expensive, especially on older cores).
llvm-svn: 246824
|