[asan-asm-instrumentation] Added comment describing how asm instrumentation works.

Summary: [asan-asm-instrumentation] Added comment describing how asm instrumentation works.
Reviewers: eugenis
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5970
llvm-svn: 220670
author: Yuri Gorshenin <ygorshenin@google.com> 2014-10-27 08:38:54 +0000
committer: Yuri Gorshenin <ygorshenin@google.com> 2014-10-27 08:38:54 +0000
commit: 3e22bb8c540e2db6d1591e9560f7c670f1361ca4 (patch)
tree: 41ab0c14520a96fd1b1f838e8268ee9e83345706 /llvm/lib/Target/X86/AsmParser/X86AsmInstrumentation.cpp
parent: 292fb6d7be7301623bbd2d19ab1306bc16e3e68c (diff)
1 files changed, 64 insertions, 0 deletions
diff --git a/llvm/lib/Target/X86/AsmParser/X86AsmInstrumentation.cpp b/llvm/lib/Target/X86/AsmParser/X86AsmInstrumentation.cpp
index 32c107deb06..9c49a113638 100644
--- a/llvm/lib/Target/X86/AsmParser/X86AsmInstrumentation.cpp
+++ b/llvm/lib/Target/X86/AsmParser/X86AsmInstrumentation.cpp
@@ -30,6 +30,70 @@
 #include <cassert>
 #include <vector>
 
+// The following comment describes how assembly instrumentation works.
+// Currently we have only AddressSanitizer instrumentation, but we're
+// planning to implement MemorySanitizer for inline assembly too. If
+// you're not familiar with the AddressSanitizer algorithm, please read
+// https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm.
+//
+// When inline assembly is parsed by an instance of X86AsmParser, all
+// instructions are emitted via the EmitInstruction method. That's the
+// place where X86AsmInstrumentation analyzes an instruction and
+// decides whether the instruction should be emitted as is or whether
+// instrumentation is required. The latter case happens when an
+// instruction reads from or writes to memory. Currently the instruction
+// opcode is explicitly checked, and if an instruction has a memory
+// operand (for instance, movq (%rsi, %rcx, 8), %rax), it is
+// instrumented. There also exist instructions that modify memory
+// but don't have explicit memory operands, for instance,
+// movs.
+//
+// Let's first consider 8-byte memory accesses when an instruction
+// has an explicit memory operand. In this case we need two registers:
+// AddressReg to compute the address of the memory cells being accessed
+// and ShadowReg to compute the corresponding shadow address. So, we need
+// to spill both registers before instrumentation code and restore them
+// after instrumentation. Thus, in general, instrumentation code will
+// look like this:
+// PUSHF  # Store flags, otherwise they will be overwritten
+// PUSH AddressReg  # spill AddressReg
+// PUSH ShadowReg   # spill ShadowReg
+// LEA MemOp, AddressReg  # compute address of the memory operand
+// MOV AddressReg, ShadowReg
+// SHR ShadowReg, 3
+// # ShadowOffset(AddressReg >> 3) is the location of the shadow
+// # corresponding to MemOp.
+// CMP ShadowOffset(ShadowReg), 0  # test shadow value
+// JZ .Done  # when the shadow equals zero, everything is fine
+// MOV AddressReg, RDI
+// # Call __asan_report function with AddressReg as an argument
+// CALL __asan_report
+// .Done:
+// POP ShadowReg  # Restore ShadowReg
+// POP AddressReg  # Restore AddressReg
+// POPF  # Restore flags
+//
+// Memory accesses of other sizes (1, 2, 4 and 16 bytes) are
+// handled in a similar manner, but small memory accesses (less than 8
+// bytes) require an additional ScratchReg, which is used for the shadow value.
+//
+// If, for example, we're instrumenting an instruction like movs, only
+// the addresses RDI, RDI + AccessSize * RCX, RSI, RSI + AccessSize *
+// RCX are checked. In this case there is no need to spill and restore
+// AddressReg, ShadowReg or flags four times; they're saved on the stack
+// just once, before instrumentation of these four addresses, and restored
+// at the end of the instrumentation.
+//
+// Several things complicate this simple algorithm.
+// * The instrumented memory operand can have RSP as a base or an index
+//   register. So we need to add a constant offset before computing
+//   the memory address, since flags, AddressReg, ShadowReg, etc. were
+//   already stored on the stack and RSP was modified.
+// * Debug info (usually DWARF) should be adjusted, because sometimes
+//   RSP is used as a frame register. So, we need to select some
+//   register as a frame register and temporarily override the current CFA
+//   register.
+
 namespace llvm {
 namespace {
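
Note: the shadow test in the comment above (shift the address right by 3, add ShadowOffset, compare the shadow byte with zero) can be modelled in plain C++. The sketch below is not part of this commit. ShadowModel, ShadowAddress and IsBadAccess are hypothetical names, the offset 0x7fff8000 is assumed to be the usual x86-64 Linux value, and the small-access rule follows the public AddressSanitizer algorithm description rather than the exact code emitted here. It also assumes that an access of 1, 2 or 4 bytes does not cross an 8-byte granule.

```cpp
#include <cstdint>
#include <unordered_map>

// Assumed parameters: 8 application bytes map to 1 shadow byte, and the
// shadow region starts at this offset (typical for x86-64 Linux).
constexpr unsigned kShadowScale = 3;
constexpr uint64_t kShadowOffset = 0x7fff8000;

// Mirrors "MOV AddressReg, ShadowReg; SHR ShadowReg, 3" followed by the
// ShadowOffset(ShadowReg) addressing mode.
inline uint64_t ShadowAddress(uint64_t Addr) {
  return (Addr >> kShadowScale) + kShadowOffset;
}

// Hypothetical stand-in for real shadow memory: a sparse map in which a
// missing entry means 0, i.e. all 8 bytes of the granule are addressable.
struct ShadowModel {
  std::unordered_map<uint64_t, int8_t> Bytes;

  int8_t Load(uint64_t ShadowAddr) const {
    auto It = Bytes.find(ShadowAddr);
    return It == Bytes.end() ? 0 : It->second;
  }

  // Value k in 1..7: only the first k bytes are addressable.
  // Negative value: the whole granule is poisoned.
  void Poison(uint64_t Addr, int8_t Value) {
    Bytes[ShadowAddress(Addr)] = Value;
  }
};

// Returns true when the access [Addr, Addr + Size) would be reported.
inline bool IsBadAccess(const ShadowModel &Shadow, uint64_t Addr,
                        unsigned Size) {
  int8_t ShadowValue = Shadow.Load(ShadowAddress(Addr));
  if (Size == 16) {
    // A 16-byte access covers two granules, so both shadow bytes count.
    return ShadowValue != 0 || Shadow.Load(ShadowAddress(Addr + 8)) != 0;
  }
  if (ShadowValue == 0)
    return false;  // the "JZ .Done" path
  if (Size == 8)
    return true;   // any non-zero shadow is an error for a full granule
  // Small access: compare the offset of the last accessed byte within the
  // granule against the shadow value (this is where ScratchReg is needed).
  int LastByte = static_cast<int>(Addr & 7) + static_cast<int>(Size) - 1;
  return LastByte >= ShadowValue;
}
```

For instance, after `Poison(0x2000, 4)` a 4-byte access at 0x2000 passes, while a 4-byte access at 0x2002 touches byte 5 of the granule and is reported.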
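
Note: two further points from the comment, the four addresses checked for movs and the constant offset needed when the memory operand is based on RSP, reduce to simple arithmetic. The sketch below is only an illustration and not code from this commit. MovsCheckedAddresses and AdjustDispForSpills are hypothetical names, spill slots are assumed to be 8 bytes each (64-bit mode), and only the case where RSP is the base register is covered.

```cpp
#include <array>
#include <cstdint>

// The four addresses named in the comment for a movs-like instruction:
// RDI, RDI + AccessSize * RCX, RSI, RSI + AccessSize * RCX.
inline std::array<uint64_t, 4> MovsCheckedAddresses(uint64_t RDI, uint64_t RSI,
                                                    uint64_t RCX,
                                                    unsigned AccessSize) {
  return {RDI, RDI + AccessSize * RCX, RSI, RSI + AccessSize * RCX};
}

// When the operand uses RSP as its base, every value pushed before the LEA
// (flags, AddressReg, ShadowReg, ...) has lowered RSP by one slot, so the
// displacement must grow by the same number of bytes to reach the original
// location. SlotSize = 8 is an assumption for 64-bit mode.
inline int64_t AdjustDispForSpills(int64_t Disp, unsigned NumSpilledSlots,
                                   unsigned SlotSize = 8) {
  return Disp + static_cast<int64_t>(NumSpilledSlots) * SlotSize;
}
```

With flags and two registers spilled, an operand written as 16(%rsp) would be addressed as 40(%rsp) inside the instrumentation under these assumptions.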