Diffstat (limited to 'llvm/docs/Frontend/PerformanceTips.rst')
-rw-r--r-- llvm/docs/Frontend/PerformanceTips.rst | 60
1 files changed, 57 insertions, 3 deletions
diff --git a/llvm/docs/Frontend/PerformanceTips.rst b/llvm/docs/Frontend/PerformanceTips.rst
index 5d7ad590464..d8c04651f0a 100644
--- a/llvm/docs/Frontend/PerformanceTips.rst
+++ b/llvm/docs/Frontend/PerformanceTips.rst
@@ -53,13 +53,25 @@ Other things to consider
#. Make sure that a DataLayout is provided (this will likely become required in
the near future, but is certainly important for optimization).
-#. Add nsw/nuw/fast-math flags as appropriate
+#. Add nsw/nuw flags as appropriate. Reasoning about overflow is
+ generally hard for an optimizer, so providing these facts from the frontend
+ can be very impactful. For languages which need overflow semantics,
+ consider using the :ref:`overflow intrinsics <int_overflow>`.
+
+#. Use fast-math flags on floating point operations if legal. If you don't
+ need strict IEEE floating point semantics, there are a number of additional
+ optimizations that can be performed. This can be highly impactful for
+ floating point intensive computations.
+
+#. Use inbounds on geps. This can help to disambiguate some aliasing queries.
#. Add noalias/align/dereferenceable/nonnull to function arguments and return
values as appropriate
-#. Mark functions as readnone/readonly/nounwind when known (especially for
- external functions)
+#. Mark functions as readnone/readonly or noreturn/nounwind when known. The
+ optimizer will try to infer these flags, but may not always be able to.
+ Manual annotations are particularly important for external functions that
+ the optimizer cannot analyze.
#. Use ptrtoint/inttoptr sparingly (they interfere with pointer aliasing
analysis), prefer GEPs
@@ -85,9 +97,51 @@ Other things to consider
and may not be well optimized by the current optimizer. Depending on your
source language, you may consider using fences instead.
+#. If calling a function which is known to throw an exception (unwind), use
+ an invoke with a normal destination which contains an unreachable
+ instruction. This form conveys to the optimizer that the call returns
+ abnormally. For an invoke which neither returns normally nor requires unwind
+ code in the current function, you can use a noreturn call instruction if
+ desired. This is generally not required because the optimizer will convert
+ an invoke with an unreachable unwind destination to a call instruction.
+
#. If you language uses range checks, consider using the IRCE pass. It is not
currently part of the standard pass order.
+#. For languages with numerous rarely executed guard conditions (e.g. null
+ checks, type checks, range checks) consider adding an extra execution or
+ two of LoopUnswitch and LICM to your pass order. The standard pass order,
+ which is tuned for C and C++ applications, may not be sufficient to remove
+ all dischargeable checks from loops.
+
+#. Use profile metadata to indicate statically known cold paths, even if
+ dynamic profiling information is not available. This can make a large
+ difference in code placement and thus the performance of tight loops.
+
+#. When generating code for loops, try to avoid terminating the header block of
+ the loop earlier than necessary. If the terminator of the loop header
+ block is a loop exiting conditional branch, the effectiveness of LICM will
+ be limited for loads not in the header. (This is due to the fact that LLVM
+ may not know such a load is safe to speculatively execute and thus can't
+ lift an otherwise loop invariant load unless it can prove the exiting
+ condition is not taken.) It can be profitable, in some cases, to emit such
+ instructions into the header even if they are not used along a rarely
+ executed path that exits the loop. This guidance specifically does not
+ apply if the condition which terminates the loop header is itself invariant,
+ or can be easily discharged by inspecting the loop index variables.
+
+#. In hot loops, consider duplicating instructions from small basic blocks
+ which end in highly predictable terminators into their successor blocks.
+ If a hot successor block contains instructions which can be vectorized
+ with the duplicated ones, this can provide a noticeable throughput
+ improvement. Note that this is not always profitable and does involve a
+ potentially large increase in code size.
+
+#. Avoid high in-degree basic blocks (e.g. basic blocks with dozens or hundreds
+ of predecessors). Among other issues, the register allocator is known to
+ perform badly when confronted with such structures. The only exception to
+ this guidance is that a unified return block with high in-degree is fine.
+
p.s. If you want to help improve this document, patches expanding any of the
above items into standalone sections of their own with a more complete
discussion would be very welcome.
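
As a rough illustration of several of the tips in this patch (nsw/nuw flags, inbounds GEPs, argument attributes, a noreturn callee, fast-math flags, and branch-weight profile metadata), a frontend might emit IR along the following lines. This is a minimal sketch, not taken from the patch: it assumes a recent LLVM with opaque pointers (``ptr``), and the function names ``@sum_checked``, ``@scale``, and ``@throw_range_error``, the chosen attributes, and the weights are all hypothetical. The ``nsw`` on the accumulator is only legal if the source language makes signed overflow undefined.

```llvm
; Sketch only: names, attributes, and weights are illustrative assumptions.
declare void @throw_range_error() noreturn

define i32 @sum_checked(ptr noalias nonnull readonly align 4 %a, i32 %n, i32 %limit) {
entry:
  ; Guard known statically to be cold; the metadata says so without a profile run.
  %bad = icmp ugt i32 %n, %limit
  br i1 %bad, label %fail, label %check, !prof !0

check:
  %empty = icmp eq i32 %n, 0
  br i1 %empty, label %exit, label %loop

loop:
  %i = phi i32 [ 0, %check ], [ %i.next, %loop ]
  %acc = phi i32 [ 0, %check ], [ %acc.next, %loop ]
  %idx = zext i32 %i to i64
  ; inbounds: the frontend asserts the address stays within the object.
  %p = getelementptr inbounds i32, ptr %a, i64 %idx
  %v = load i32, ptr %p, align 4
  ; nsw: assumes signed overflow is undefined in the source language.
  %acc.next = add nsw i32 %acc, %v
  ; nuw: %i is below %n here, so the increment cannot wrap.
  %i.next = add nuw i32 %i, 1
  %done = icmp eq i32 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  %r = phi i32 [ 0, %check ], [ %acc.next, %loop ]
  ret i32 %r

fail:
  ; The callee never returns normally, so the block ends in unreachable.
  call void @throw_range_error()
  unreachable
}

define double @scale(double %x, double %y) {
  ; fast: only if strict IEEE semantics are not required.
  %m = fmul fast double %x, %y
  ret double %m
}

; First weight is for the taken (fail) edge, second for the fall-through edge.
!0 = !{!"branch_weights", i32 1, i32 2000}
```

If LLVM's tools are available, running a module like this through ``opt -passes=verify`` is a quick way to check that it is well formed before experimenting with the pass pipeline.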