| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
generation.
llvm-svn: 231558
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will provide the analogous replacements for the PassManagerBuilder
and other code long term. This code is extracted from the opt tool
currently, and I plan to extend it as I build up support for using the
new pass manager in Clang and other places.
Mailing this out for review in part to let folks comment on the terrible names
here. A brief word about why I chose the names I did.
The library is called "Passes" to try and make it clear that it is a high-level
utility and where *all* of the passes come together and are registered in
a common library. I didn't want it to be *limited* to a registry though, the
registry is just one component.
The class is a "PassBuilder" but this name I'm less happy with. It doesn't
build passes in any traditional sense and isn't a Builder-style API at all. The
class is a PassRegisterer or PassAdder, but neither of those really make a lot
of sense. This class is responsible for constructing passes for registry in an
analysis manager or for population of a pass pipeline. If anyone has a better
name, I would love to hear it. The other candidate I looked at was
PassRegistrar, but that doesn't really fit either. There is no register of all
the passes in use, and so I think continuing the "registry" analog outside of
the registry of pass *names* and *types* is a mistake. The objects themselves
are just objects with the new pass manager.
Differential Revision: http://reviews.llvm.org/D8054
llvm-svn: 231556
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch attempts to convert a SCALAR_TO_VECTOR using an operand from an EXTRACT_VECTOR_ELT into a VECTOR_SHUFFLE.
This prevents many cases of spilling scalar data between the gpr + simd registers.
At present the optimization only accepts cases where there is no TRUNC of the scalar type (i.e. all types must match).
Differential Revision: http://reviews.llvm.org/D8132
llvm-svn: 231554
|
| |
|
|
| |
llvm-svn: 231547
|
| |
|
|
|
|
|
|
|
|
|
|
| |
to disable lane switching if we don't actually have the instruction
set we want to switch to. Models the earlier check above the
conditional for the pass.
The testcase is one that triggered with the assert that's added
as part of the fix, use it to avoid adding a new testcase as it
highlights the same problem.
llvm-svn: 231539
|
| |
|
|
| |
llvm-svn: 231528
|
| |
|
|
|
|
|
|
|
|
|
| |
Teach the load store optimizer how to sign extend a result of a load pair when
it helps creating more pairs.
The rational is that loads are more expensive than sign extensions, so if we
gather some in one instruction this is better!
<rdar://problem/20072968>
llvm-svn: 231527
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is based on the following equivalences:
select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))
Many target cannot perform and/or on the CPU flags and therefore the
right side should be choosen to avoid materializign the i1 flags in an
integer register. If the target can perform this operation efficiently
we normalize to the left form.
Differential Revision: http://reviews.llvm.org/D7622
llvm-svn: 231507
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is in preparation for changing visitSELECT to normalize towards
select(Cond0, select(Cond1, X, Y), Y);
select(Cond0, X, select(Cond1, X, Y)) which perfom an implicit and/or of
the conditions.
The factored function contains all DAGCombine rules which reduce two values
combined by an And/Or operation to a single value. This does not include rules
involving constants as visitSELECT already handles that case.
Differential Revision: http://reviews.llvm.org/D8026
llvm-svn: 231506
|
| |
|
|
| |
llvm-svn: 231503
|
| |
|
|
|
|
|
|
| |
it into a function
+ Random cleanups. No functional change.
llvm-svn: 231501
|
| |
|
|
|
|
| |
Translate german to english.
llvm-svn: 231500
|
| |
|
|
| |
llvm-svn: 231495
|
| |
|
|
| |
llvm-svn: 231491
|
| |
|
|
| |
llvm-svn: 231486
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-debug-pass is not specified, as the string is only used when dumping pass
information. There is a big cost of determining the name in
ReginBase<Tr>:getNameStr() if the region's entry or exit block doesn't have a
name. This is the case for the Release build, as names are not preserved by the
front-end.
RegionPass is mainly used by Polly, resulting in long compile time for one file
of a customer application with the Release build (1m24s) vs Release+Asserts
build (10s) when Polly is used. With this change, the compile time with the
Release build went down to 8s.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator: http://reviews.llvm.org/D8076
llvm-svn: 231485
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Multiplication is not dependent on signedness, so just treating
all input ranges as unsigned is not incorrect. However it will cause
overly pessimistic ranges (such as full-set) when used with signed
negative values.
Teach multiply to try to interpret its inputs as both signed and
unsigned, and then to take the most specific (smallest population)
as its result.
llvm-svn: 231483
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent
symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and
access through a non_lazy_symbol_pointers section is used instead.
-- before
_extgotequiv:
.long _extfoo
_delta:
.long _extgotequiv-_delta
-- after
_delta:
.long L_extfoo$non_lazy_ptr-_delta
.section __IMPORT,__pointers,non_lazy_symbol_pointers
L_extfoo$non_lazy_ptr:
.indirect_symbol _extfoo
.long 0
llvm-svn: 231475
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Follow up r230264 and add ARM64 support for replacing global GOT
equivalent symbol accesses by references to the GOT entry for the final
symbol instead, example:
-- before
.globl _foo
_foo:
.long 42
.globl _gotequivalent
_gotequivalent:
.quad _foo
.globl _delta
_delta:
.long _gotequivalent-_delta
-- after
.globl _foo
_foo:
.long 42
.globl _delta
Ltmp3:
.long _foo@GOT-Ltmp3
llvm-svn: 231474
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
directive.
Summary:
None of the .set directives can be used before the .module directives. The .set mips0/pop/push were not triggering this constraint.
Also added testing for all the other implemented directives which are supposed to trigger this constraint.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7140
llvm-svn: 231465
|
| |
|
|
|
|
|
|
|
| |
Specifically this:
* Prevents an "unused" warning in non-assert builds.
* In that error case return with out removing a child loop instead of
looping forever.
llvm-svn: 231459
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass interchanges loops to provide a more cache-friendly memory access.
For e.g. given a loop like -
for(int i=0;i<N;i++)
for(int j=0;j<N;j++)
A[j][i] = A[j][i]+B[j][i];
is interchanged to -
for(int j=0;j<N;j++)
for(int i=0;i<N;i++)
A[j][i] = A[j][i]+B[j][i];
This pass is currently disabled by default.
To give a brief introduction it consists of 3 stages-
LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.
LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.
TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.
A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499
llvm-svn: 231458
|
| |
|
|
|
|
|
|
| |
We supported forming IMGREL relocations from ConstantExprs involving
__ImageBase if the minuend was a GlobalVariable. Extend this
functionality to all GlobalObjects.
llvm-svn: 231456
|
| |
|
|
| |
llvm-svn: 231455
|
| |
|
|
|
|
|
|
|
|
| |
We extend an underlying file before mmap'ing it, but it's not needed
on Windows. Extending file is slow on Windows, so we should avoid doing that.
The difference gets larger as the size of an output file gets larger.
It shove off 2 seconds out of 25 seconds when linking chrome.dll with LLD,
for example.
llvm-svn: 231452
|
| |
|
|
| |
llvm-svn: 231447
|
| |
|
|
|
|
| |
onto PtrState.
llvm-svn: 231446
|
| |
|
|
|
|
|
| |
Though such shifts are usually optimized away by combiner, we still can
encounter them after a vector shift is legalized.
llvm-svn: 231443
|
| |
|
|
|
|
|
|
|
| |
We would set the body of a struct type (therefore making it non-opaque)
but were forgetting to move it to the non-opaque set.
Fixes pr22807.
llvm-svn: 231442
|
| |
|
|
|
|
|
|
|
|
|
| |
and out of the main dataflow.
These refactored computations check whether or not we are at a stage
of the sequence where we can perform a match. This patch moves the
computation out of the main dataflow and into
{BottomUp,TopDown}PtrState.
llvm-svn: 231439
|
| |
|
|
|
|
|
|
|
|
| |
{TopDown,BottomUp}PtrState Class.
This initialization occurs when we see a new retain or release. Before
we performed the actual initialization inline in the dataflow. That is
just messy.
llvm-svn: 231438
|
| |
|
|
|
|
|
|
|
|
|
| |
ptr state change behavior onto a PtrState class.
This will enable the main ObjCARCOpts dataflow to work with higher
level concepts such as "can this ptr state be modified by this ref
count" and not need to understand the nitty gritty details of how that
is determined. This makes the dataflow cleaner.
llvm-svn: 231437
|
| |
|
|
|
|
| |
be passed around.
llvm-svn: 231436
|
| |
|
|
|
|
|
| |
It will always be in the history if it is needed again. Now it is just dead
code.
llvm-svn: 231435
|
| |
|
|
|
|
| |
This optimization a continuation of r231140 that reasoned about signed div.
llvm-svn: 231433
|
| |
|
|
| |
llvm-svn: 231430
|
| |
|
|
| |
llvm-svn: 231427
|
| |
|
|
|
|
| |
sequence dataflow. This will allow me to separate the actual ARC queries from the meat of the dataflow algorithm.
llvm-svn: 231426
|
| |
|
|
| |
llvm-svn: 231425
|
| |
|
|
|
|
|
|
| |
It turns out 256bit V[SZ]EXT nodes are still
generated by the new shuffle lowering, so this
is here to stay!
llvm-svn: 231422
|
| |
|
|
|
|
| |
NFC.
llvm-svn: 231411
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reduces code size for all AVX targets and increases speed for some chips.
SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and
only in the packed float/double flavors.
AVX subsequently made the instruction useful by adding a 4-register operand form.
So we just need to paper over the lack of scalar forms of this instruction, complicate
the code to choose float or double forms, and use blendv on scalars since all FP is in
xmm registers anyway.
This gives us an approximately 50% speed up for a blendv microbenchmark sequence
on SandyBridge and Haswell:
blendv : 29.73 cycles/iter
logic : 43.15 cycles/iter
No new test cases with this patch because:
1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel
2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX
3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants.
http://llvm.org/bugs/show_bug.cgi?id=22483
Differential Revision: http://reviews.llvm.org/D8063
llvm-svn: 231408
|
| |
|
|
|
|
| |
NFC intended.
llvm-svn: 231406
|
| |
|
|
| |
llvm-svn: 231405
|
| |
|
|
|
|
|
| |
The copies already diverged, don't let them become any worse. Reduce
redundancy in code with a little macro metaprogramming.
llvm-svn: 231401
|
| |
|
|
|
|
|
| |
Fixes PR22761, rdar://20024866.
Differential Revision: http://reviews.llvm.org/D8042
llvm-svn: 231400
|
| |
|
|
|
|
| |
I missed an occurrence of the old symbol in my previous patch.
llvm-svn: 231398
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit enables forming vector extloads for ARM.
It only does so for legal types, and when we can't fold the extension
in a wide/long form of the user instruction.
Enabling it for larger types isn't as good an idea on ARM as it is on
X86, because:
- we pretend that extloads are legal, but end up generating vld+vmov
- we have instructions like vld {dN, dM}, which can't be generated
when we "manually expand" extloads to vld+vmov.
For legal types, the combine doesn't fire that often: in the
integration tests only in a big endian testcase, where it removes a
pointless AND.
Related to rdar://19723053
Differential Revision: http://reviews.llvm.org/D7423
llvm-svn: 231396
|
| |
|
|
|
|
|
|
|
|
| |
This will be followed by a change on the clang side to update
the only user of this function with the new version.
Differential Revision: http://reviews.llvm.org/D8074
Reviewed By: Reid Kleckner
llvm-svn: 231392
|
| |
|
|
| |
llvm-svn: 231391
|