ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.

The Technical Reference Manuals for these two CPUs state that branching to an unaligned 32-bit instruction incurs an extra pipeline reload penalty. That's bad.

This also enables the optimization at -Os, since it costs on average one byte per loop in return for saving 1 cycle per iteration, which is pretty good going.

llvm-svn: 342127
author: Tim Northover <tnorthover@apple.com> 2018-09-13 10:28:05 +0000
committer: Tim Northover <tnorthover@apple.com> 2018-09-13 10:28:05 +0000
commit: c15d47bb013e975da582c8fd786ba8234d70d75d
tree: e13262451793600a29c0df26342fc954e1a8a79a /llvm/lib/Target/ARM/ARMISelLowering.h
parent: 95ac65bc32180744cbc67d4e82a0f6417fb92aa9
1 files changed, 2 insertions, 0 deletions
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.h b/llvm/lib/Target/ARM/ARMISelLowering.h
index 734b1ee5aa1..bf652bc7721 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.h
+++ b/llvm/lib/Target/ARM/ARMISelLowering.h
@@ -575,6 +575,8 @@ class VectorType;
     bool isLegalInterleavedAccessType(VectorType *VecTy,
                                       const DataLayout &DL) const;
 
+    bool alignLoopsWithOptSize() const override;
+
     /// Returns the number of interleaved accesses that will be generated when
     /// lowering accesses of the given type.
     unsigned getNumInterleavedAccesses(VectorType *VecTy,
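
The "one byte per loop on average" figure in the commit message can be checked with a little arithmetic. The sketch below is illustrative only and is not part of the patch: paddingFor and averagePadding are hypothetical helper names, and it assumes that Thumb instructions start on 2-byte boundaries and that a loop header is equally likely to land on either residue modulo 4.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of padding needed to move `addr` up to the next multiple of `align`.
// `align` must be a power of two. (Hypothetical helper, not an LLVM API.)
constexpr std::uint32_t paddingFor(std::uint32_t addr, std::uint32_t align) {
  return (align - (addr & (align - 1))) & (align - 1);
}

// Average padding over every residue a loop header can start at, assuming
// instructions are already aligned to `instrAlign` bytes and each residue
// is equally likely. (Hypothetical helper, not an LLVM API.)
double averagePadding(std::uint32_t instrAlign, std::uint32_t loopAlign) {
  assert(instrAlign != 0 && loopAlign % instrAlign == 0);
  std::uint32_t total = 0;
  std::uint32_t count = 0;
  for (std::uint32_t addr = 0; addr < loopAlign; addr += instrAlign) {
    total += paddingFor(addr, loopAlign);
    ++count;
  }
  return static_cast<double>(total) / count;
}
```

With 2-byte instruction alignment and a 4-byte loop alignment, a header needs either 0 or 2 bytes of padding, so averagePadding(2, 4) evaluates to 1.0 under the equal-likelihood assumption.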