author | Philip Reames <listmail@philipreames.com> | 2015-09-10 17:03:10 +0000 |
---|---|---|
committer | Philip Reames <listmail@philipreames.com> | 2015-09-10 17:03:10 +0000 |
commit | fba81bc0767b428696ccdaf0dd590d4ed5537a18 (patch) | |
tree | de56376f4fe6edbd0fdc29d1e541a4c7bea63136 /llvm/docs/Frontend | |
parent | a938bcb89a9669b8ca09dd4c282aa103166d5829 (diff) | |
[docs][PerformanceTips] Add text on allocas and alignment
This summarizes two recent llvm-dev discussions. Most of the text provided by David Chisnall and Benoit Belley with minor editing by me.
llvm-svn: 247301
Diffstat (limited to 'llvm/docs/Frontend')
-rw-r--r-- | llvm/docs/Frontend/PerformanceTips.rst | 41 |
1 files changed, 41 insertions, 0 deletions
diff --git a/llvm/docs/Frontend/PerformanceTips.rst b/llvm/docs/Frontend/PerformanceTips.rst
index a3f977f0e03..142d262eb65 100644
--- a/llvm/docs/Frontend/PerformanceTips.rst
+++ b/llvm/docs/Frontend/PerformanceTips.rst
@@ -46,6 +46,22 @@ The Basics
 perform badly with confronted with such structures. The only exception to
 this guidance is that a unified return block with high in-degree is fine.
 
+Use of allocas
+^^^^^^^^^^^^^^
+
+An alloca instruction can be used to represent a function scoped stack slot,
+but can also represent dynamic frame expansion. When representing function
+scoped variables or locations, placing alloca instructions at the beginning of
+the entry block should be preferred. In particular, place them before any
+call instructions. Call instructions might get inlined and replaced with
+multiple basic blocks. The end result is that a following alloca instruction
+would no longer be in the entry basic block afterward.
+
+The SROA (Scalar Replacement Of Aggregates) and Mem2Reg passes only attempt
+to eliminate alloca instructions that are in the entry basic block. Given
+SSA is the canonical form expected by much of the optimizer; if allocas can
+not be eliminated by Mem2Reg or SROA, the optimizer is likely to be less
+effective than it could be.
 Avoid loads and stores of large aggregate type
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -79,6 +95,31 @@
 operations for safety. If your source language provides information about
 the range of the index, you may wish to manually extend indices to machine
 register width using a zext instruction.
+When to specify alignment
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+LLVM will always generate correct code if you don’t specify alignment, but may
+generate inefficient code. For example, if you are targeting MIPS (or older
+ARM ISAs) then the hardware does not handle unaligned loads and stores, and
+so you will enter a trap-and-emulate path if you do a load or store with
+lower-than-natural alignment. To avoid this, LLVM will emit a slower
+sequence of loads, shifts and masks (or load-right + load-left on MIPS) for
+all cases where the load / store does not have a sufficiently high alignment
+in the IR.
+
+The alignment is used to guarantee the alignment on allocas and globals,
+though in most cases this is unnecessary (most targets have a sufficiently
+high default alignment that they’ll be fine). It is also used to provide a
+contract to the back end saying ‘either this load/store has this alignment, or
+it is undefined behavior’. This means that the back end is free to emit
+instructions that rely on that alignment (and mid-level optimizers are free to
+perform transforms that require that alignment). For x86, it doesn’t make
+much difference, as almost all instructions are alignment-independent. For
+MIPS, it can make a big difference.
+
+Note that if your loads and stores are atomic, the backend will be unable to
+lower an under aligned access into a sequence of natively aligned accesses.
+As a result, alignment is mandatory for atomic loads and stores.
+
 Other Things to Consider
 ^^^^^^^^^^^^^^^^^^^^^^^^
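
Illustration (not part of the commit): the "slower sequence of loads, shifts and masks" mentioned in the alignment text can be pictured with a small C++ model. This is only a sketch of what such a sequence computes for a 32-bit little-endian load that is known to be 1-byte aligned. The helper names `load_le32_bytewise` and `demo` are hypothetical, and the code is not actual LLVM output or an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): reads a 32-bit little-endian value
// using four single-byte loads combined with shifts and ORs. Because every
// access is one byte wide, the read is valid at any address, which is the
// property a backend needs when the IR promises only 1-byte alignment.
std::uint32_t load_le32_bytewise(const unsigned char *p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical self-check: reads a value that starts at an odd offset, so a
// single naturally aligned 32-bit load could not be used for it.
void demo() {
    const unsigned char buf[] = {0x00, 0x78, 0x56, 0x34, 0x12};
    assert(load_le32_bytewise(buf + 1) == 0x12345678u);
}
```

The model uses four loads plus shifts and ORs where an aligned access would need one load. That extra work is the cost the text attributes to a load or store whose IR alignment is too low, and it suggests why stating the real alignment matters on targets without hardware support for unaligned accesses.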