| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As this optimization converts two loads into one load with two shift instructions,
it could potentially hurt performance if a loop is arithmetic operation intensive.
Reviewers: t.p.northover, mcrosier, jmolloy
Subscribers: evandro, jmolloy, aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20172
llvm-svn: 270251
|
|
|
|
|
|
|
|
|
|
|
| |
Check that the incoming blocks of phi nodes are identical, and block
function merging if they are not.
rdar://problem/26255167
Differential Revision: http://reviews.llvm.org/D20462
llvm-svn: 270250
|
|
|
|
|
|
|
|
|
| |
The Fast mode takes the first mapping, the greedy mode loops over all
the possible mapping for an instruction and choose the cheaper one.
Test case will come with target specific code, since we currently do not
have instructions that have several mappings.
llvm-svn: 270249
|
|
|
|
|
|
| |
The uuid_command was duplicating the load_command.cmdsize field. This removes the duplicate from the YAML mapping and from the test cases.
llvm-svn: 270248
|
|
|
|
|
|
| |
This is now encapsulated in the RepairingPlacement class.
llvm-svn: 270247
|
|
|
|
|
|
|
|
| |
We performed a number of memory allocations each time getTTI was called,
remove them by using SmallString.
No functionality change intended.
llvm-svn: 270246
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
computeMapping.
Computing the cost of a mapping takes some time.
Since in Fast mode, the cost is irrelevant, just spare some cycles by not
computing it.
In Greedy mode, we need to choose the best cost, that means that when
the local cost gets more expensive than the best cost, we can stop
computing the repairing and cost for the current mapping.
llvm-svn: 270245
|
|
|
|
|
|
|
|
|
| |
more precise cost in Greedy mode.
In Fast mode the cost is irrelevant so do not bother requiring that
those passes get scheduled.
llvm-svn: 270244
|
|
|
|
| |
llvm-svn: 270242
|
|
|
|
| |
llvm-svn: 270237
|
|
|
|
| |
llvm-svn: 270236
|
|
|
|
|
|
| |
The mode should be choose by the target when instantiating the pass.
llvm-svn: 270235
|
|
|
|
| |
llvm-svn: 270234
|
|
|
|
|
|
|
|
| |
- Do not store Twine objects.
- Remove report_fatal_error, since llvm_unreachable does terminate the
program in release mode.
llvm-svn: 270233
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous choice of the insertion points for repairing was
straightfoward but may introduce some basic block or edge splitting. In
some situation this is something we can avoid.
For instance, when repairing a phi argument, instead of placing the
repairing on the related incoming edge, we may move it to the previous
block, before the terminators. This is only possible when the argument
is not defined by one of the terminator.
llvm-svn: 270232
|
|
|
|
|
|
| |
Inline getAnalysisUsage() while I'm here.
llvm-svn: 270231
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is a first step towards a more extendible method of matching combined target shuffle masks.
Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple.
I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits.
Differential Revision: http://reviews.llvm.org/D19198
llvm-svn: 270230
|
|
|
|
| |
llvm-svn: 270228
|
|
|
|
|
|
| |
porting this pass to the new PM.
llvm-svn: 270225
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was noted in PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766#c2
We may not know whether the sign bit(s) are zero or one, but we can still
optimize based on knowing that the sign bit is repeated.
Differential Revision: http://reviews.llvm.org/D20275
llvm-svn: 270222
|
|
|
|
| |
llvm-svn: 270220
|
|
|
|
| |
llvm-svn: 270219
|
|
|
|
|
|
|
|
|
| |
I accidentally exposed a bug in MCExpr::evaluateAsRelocatableImpl() with the test file added in:
http://reviews.llvm.org/rL269977
Differential Revision: http://reviews.llvm.org/D20434
llvm-svn: 270218
|
|
|
|
|
|
|
|
|
|
|
|
| |
This refactors the logic in X86 to avoid code duplication. It also
splits it in two steps: it first decides if a symbol is local to the DSO
and then uses that information to decide how to access it.
The first part is implemented by shouldAssumeDSOLocal. It is not in any
way specific to X86. In a followup patch I intend to move it to
somewhere common and reused it in other backends.
llvm-svn: 270209
|
|
|
|
|
|
|
| |
We now handle them just like non hidden ones. This was already the case
on x86 (r207518) and arm (r207517).
llvm-svn: 270205
|
|
|
|
| |
llvm-svn: 270200
|
|
|
|
|
|
| |
Allows Sparc registers to be specifically referred to in inline assembly.
llvm-svn: 270198
|
|
|
|
| |
llvm-svn: 270195
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
for an inline function.
If an inline function is observed but unused in a translation unit, dummy
coverage mapping data with zero hash is stored for this function.
If such a coverage mapping section came earlier than real one, the latter
was ignored. As a result, llvm-cov was unable to show coverage information
for those functions.
Differential Revision: http://reviews.llvm.org/D20286
llvm-svn: 270194
|
|
|
|
|
|
|
|
| |
Note: This is specifically to allow GCC's test pr44707 to pass.
Trivial change, not put for differential revision. Test included.
llvm-svn: 270192
|
|
|
|
| |
llvm-svn: 270190
|
|
|
|
| |
llvm-svn: 270182
|
|
|
|
|
|
| |
Follow r269988 and use Optional<Reloc>.
llvm-svn: 270176
|
|
|
|
|
|
| |
supported. Without this we get isel failures on the avx-intrinsics-x86.ll test in AVX512VL.
llvm-svn: 270174
|
|
|
|
|
|
|
|
| |
On Linux ``rusage.ru_maxrss`` is in KiB but on Mac OSX it is in bytes.
Differential Revision: http://reviews.llvm.org/D20410
llvm-svn: 270173
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ``nprocs`` command does not exist under Mac OSX so use
``sysctl`` instead on that platform.
Whilst I'm here
* Use ``pclose()`` instead of ``fclose()`` which the ``popen()``
documentation says should be used.
* Check for errors that were previously unhandled.
Differential Revision: http://reviews.llvm.org/D20409
llvm-svn: 270172
|
|
|
|
|
|
| |
Reviewed by Matt Arsenault in http://reviews.llvm.org/D16311
llvm-svn: 270171
|
|
|
|
|
|
|
|
|
|
|
| |
an instruction.
Use the previously introduced RepairingPlacement class to split the code
computing the repairing placement from the code doing the actual
placement. That way, we will be able to consider different placement and
then, only apply the best one.
llvm-svn: 270168
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When assigning the register banks we may have to insert repairing code
to move already assigned values accross register banks.
Introduce a few helper classes to keep track of what is involved in the
repairing of an operand:
- InsertPoint and its derived classes record the positions, in the CFG,
where repairing has to be inserted.
- RepairingPlacement holds all the insert points for the repairing of an
operand plus the kind of action that is required to do the repairing.
This is going to be used to keep track of how the repairing should be
done, while comparing different solutions for an instruction. Indeed, we
will need the repairing placement to capture the cost of a solution and
we do not want to compute it a second time when we do the actual
repairing.
llvm-svn: 270167
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
register bank twice.
Prior to this change, we were checking if the assignment for the current
machine operand was matching, then we would check if the mismatch
requires to insert repair code.
We actually already have this information from the first check, so just
pass it along.
NFCI.
llvm-svn: 270166
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sorry for the lack testcase. There is one in the pr, but it depends on
std::sort and the .ll version is 110 lines, so I don't think it is
wort it.
The bug was that we were sorting after adding a terminator, and the
sorting algorithm could end up putting the terminator in the middle of
the List vector.
With that we would create a Spans map entry keyed on nullptr which would
then be added to CUs and fail in that sorting.
llvm-svn: 270165
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This helper class will be used to represent the cost of mapping an
instruction to a specific register bank.
The particularity of these costs is that they are mostly local, thus the
frequency of the basic block is irrelevant. However, for few
instructions (e.g., phis and terminators), the cost may be non-local and
then, we need to account for the frequency of the involved basic blocks.
This will be used by the greedy mode I am working on.
llvm-svn: 270163
|
|
|
|
|
|
| |
symbols on x86-64.
llvm-svn: 270157
|
|
|
|
| |
llvm-svn: 270156
|
|
|
|
| |
llvm-svn: 270155
|
|
|
|
|
|
|
|
| |
Before r257832, the threshold used by SimpleInliner was explicitly specified or generated from opt levels and passed to the base class Inliner's constructor. There, it was first overridden by explicitly specified -inline-threshold. The refactoring in r257832 did not preserve this behavior for all opt levels. This change brings back the original behavior.
Differential Revision: http://reviews.llvm.org/D20452
llvm-svn: 270153
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sequences of range checks expressed using guards, like
guard((I - 2) u< L)
guard((I - 1) u< L)
guard((I + 0) u< L)
guard((I + 1) u< L)
guard((I + 2) u< L)
can sometimes be combined into a smaller sequence:
guard((I - 2) u< L AND (I + 2) u< L)
if we can prove that (I - 2) u< L AND (I + 2) u< L implies all of checks
expressed in the previous sequence.
This change teaches GuardWidening to do this kind of merging when
feasible.
llvm-svn: 270151
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using Chandler's words from r265331:
This commit was greatly exacerbating PR17409 and effectively regressed
build time for lot of (very large) code when compiled with ASan or MSan.
PR17409 is fixed by r269249, so this is fine to reapply r263460.
Original commit message:
The bad behavior happens when we have a function with a long linear
chain of basic blocks, and have a live range spanning most of this
chain, but with very few uses.
Let say we have only 2 uses.
The Hopfield network is only seeded with two active blocks where the
uses are, and each iteration of the outer loop in
`RAGreedy::growRegion()` only adds two new nodes to the network due to
the completely linear shape of the CFG. Meanwhile,
`SpillPlacer->iterate()` visits the whole set of discovered nodes, which
adds up to a quadratic algorithm.
This is an historical accident effect from r129188.
When the Hopfield network is expanding, most of the action is happening
on the frontier where new nodes are being added. The internal nodes in
the network are not likely to be flip-flopping much, or they will at
least settle down very quickly. This means that while
`SpillPlacer->iterate()` is recomputing all the nodes in the network, it
is probably only the two frontier nodes that are changing their output.
Instead of recomputing the whole network on each iteration, we can
maintain a SparseSet of nodes that need to be updated:
- `SpillPlacement::activate()` adds the node to the todo list.
- When a node changes value (i.e., `update()` returns true), its
neighbors are added to the todo list.
- `SpillPlacement::iterate()` only updates the nodes in the list.
The result of Hopfield iterations is not necessarily exact. It should
converge to a local minimum, but there is no guarantee that it will find
a global minimum. It is possible that updating nodes in a different
order will cause us to switch to a different local minimum. In other
words, this is not NFC, but although I saw a few runtime improvements
and regressions when I benchmarked this change, those were side effects
and actually the performance change is in the noise as expected.
Huge thanks to Jakob Stoklund Olesen <stoklund@2pi.dk> for his
feedbacks, guidance and time for the review.
llvm-svn: 270149
|
|
|
|
|
|
| |
Addresses r270095's code review.
llvm-svn: 270147
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Work around crashes in ``__sanitizer_malloc_hook()`` under Mac OSX.
Under Mac OSX we intercept calls to malloc before thread local
storage is initialised leading to a crash when accessing
``AllocTracer``. To workaround this ``AllocTracer`` is only accessed
in the hook under Linux. For symmetry ``__sanitizer_free_hook()``
is also modified in the same way.
To support this change a set of new macros
LIBFUZZER_LINUX and LIBFUZZER_APPLE has been defined which can be
used to check the target being compiled for.
Differential Revision: http://reviews.llvm.org/D20402
llvm-svn: 270145
|