Static Analyzer: 'Regions'
--------------------------

INTRODUCTION

  The path-sensitive analysis engine in libAnalysis employs an extensible API
  for abstractly modeling the memory of an analyzed program. This API uses the
  concept of "memory regions" to model chunks of program memory, such as
  program variables and dynamically allocated memory (e.g., the memory returned
  by 'malloc' and 'alloca'). Regions are hierarchical, with subregions
  modeling subtyping relationships, field and array offsets into larger chunks
  of memory, and so on.
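
  As a rough illustration of the hierarchy, the sketch below models each region
  as a node that points to the larger region containing it. A field or an array
  element is reached by refining a variable. All names (ToyRegion, ToyVarRegion,
  ToyFieldRegion, ToyElementRegion, rootOf) are hypothetical; they are not the
  classes declared in MemRegion.h.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical toy model, not clang's MemRegion hierarchy: each region keeps a
// pointer to the larger region that contains it (nullptr for a root region).
struct ToyRegion {
  std::string name;
  const ToyRegion *super;
  ToyRegion(std::string n, const ToyRegion *s) : name(std::move(n)), super(s) {}
};

// In this sketch a variable is simply a root region.
struct ToyVarRegion : ToyRegion {
  explicit ToyVarRegion(std::string var) : ToyRegion(std::move(var), nullptr) {}
};

// A field is a subregion at a named offset into its base region.
struct ToyFieldRegion : ToyRegion {
  ToyFieldRegion(const std::string &field, const ToyRegion &base)
      : ToyRegion(base.name + "." + field, &base) {}
};

// An array element is a subregion at an index into its base region.
struct ToyElementRegion : ToyRegion {
  ToyElementRegion(long index, const ToyRegion &base)
      : ToyRegion(base.name + "[" + std::to_string(index) + "]", &base) {}
};

// Follow the super-region links up to the root of the hierarchy.
const ToyRegion *rootOf(const ToyRegion *r) {
  while (r->super != nullptr) r = r->super;
  return r;
}
```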

  The region API consists of two components. The first is the taxonomy and
  representation of regions themselves within the analyzer engine. The primary
  definitions and interfaces are described in
  'include/clang/Analysis/PathSensitive/MemRegion.h'. At the root of the region
  hierarchy is the class 'MemRegion' with specific subclasses refining the
  region concept for variables, heap allocated memory, and so forth.

  The second component in the region API is the modeling of the binding of
  values to regions. For example, modeling the value stored to a local variable
  'x' consists of recording the binding between the region for 'x' (which
  represents the raw memory associated with 'x') and the value stored to 'x'.
  This binding relationship is captured with the notion of "symbolic stores."
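
  The binding relation can be pictured as a map from regions to values. The
  sketch below is a minimal illustration under simplifying assumptions: regions
  are identified by name and values are plain ints, whereas the analyzer binds
  regions to symbolic values. The names Store, bindValue and lookupValue are
  hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy store: region identifiers (names) mapped to int values.
using RegionId = std::string;
using Store = std::map<RegionId, int>;

// Record the binding region -> value, producing a new store.
Store bindValue(Store s, const RegionId &r, int v) {
  s[r] = v;
  return s;
}

// Query the value bound to a region, if any.
std::optional<int> lookupValue(const Store &s, const RegionId &r) {
  auto it = s.find(r);
  if (it == s.end()) return std::nullopt;
  return it->second;
}
```

  The sketch treats a store as a value. bindValue() returns a new store and
  leaves the old one untouched, which is one convenient way to let different
  analysis paths keep different bindings.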

  Symbolic stores, which can be thought of as representing the relation 'regions
  -> values', are implemented by subclasses of the StoreManager class (Store.h).
  A particular StoreManager implementation has complete flexibility concerning
  (a) *how* to model the binding between regions and values and (b) *what*
  bindings are recorded. Together, these two freedoms allow different
  StoreManagers to trade off analysis precision against scalability when
  reasoning about program memory. Meanwhile, the core path-sensitive
  engine makes no assumptions about (a) or (b), and queries a StoreManager about
  the bindings to a memory region through a generic interface that all
  StoreManagers share. If a particular StoreManager cannot reason about the
  potential bindings of a given memory region (e.g., 'BasicStoreManager' does
  not reason about fields of structures) then the StoreManager can simply return
  'unknown' (represented by 'UnknownVal') for a particular region-binding. This
  separation of concerns not only isolates the core analysis engine from the
  details of reasoning about program memory but also makes it easy for a client
  of the path-sensitive engine to swap in different StoreManager
  implementations that internally reason about program memory in very different
  ways.
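
  The separation between the engine and the StoreManager can be sketched as an
  abstract interface with interchangeable implementations. This is a
  hypothetical illustration, not clang's StoreManager API: ToyStoreManager,
  VarsOnlyStore, FieldAwareStore and bindThenRead are invented names, and
  std::nullopt stands in for 'UnknownVal'. VarsOnlyStore loosely imitates the
  behavior described for 'BasicStoreManager' by not reasoning about fields.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy region: a variable, optionally refined by a field name.
struct Region {
  std::string var;
  std::string field;  // empty means the whole variable
  bool isField() const { return !field.empty(); }
  bool operator<(const Region &o) const {
    return var != o.var ? var < o.var : field < o.field;
  }
};

// std::nullopt plays the role of 'unknown' (UnknownVal) in this sketch.
using Value = std::optional<int>;

// Generic interface the engine programs against; it makes no assumptions
// about how or what an implementation records.
class ToyStoreManager {
public:
  virtual ~ToyStoreManager() = default;
  virtual void bind(const Region &r, int v) = 0;
  virtual Value retrieve(const Region &r) const = 0;
};

// Coarse store: tracks whole variables only; field bindings are dropped and
// field reads answer "unknown".
class VarsOnlyStore : public ToyStoreManager {
  std::map<std::string, int> vars;
public:
  void bind(const Region &r, int v) override {
    if (!r.isField()) vars[r.var] = v;
  }
  Value retrieve(const Region &r) const override {
    if (r.isField()) return std::nullopt;
    auto it = vars.find(r.var);
    return it == vars.end() ? Value{} : Value{it->second};
  }
};

// More precise store: also tracks individual fields.
class FieldAwareStore : public ToyStoreManager {
  std::map<Region, int> bindings;
public:
  void bind(const Region &r, int v) override { bindings[r] = v; }
  Value retrieve(const Region &r) const override {
    auto it = bindings.find(r);
    return it == bindings.end() ? Value{} : Value{it->second};
  }
};

// Engine-side code sees only the interface, so either store can be swapped in.
Value bindThenRead(ToyStoreManager &sm, const Region &r, int v) {
  sm.bind(r, v);
  return sm.retrieve(r);
}
```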

  The rest of this document is divided into two parts. We first discuss region
  taxonomy and the semantics of regions. We then discuss the StoreManager
  interface, and details of how the currently available StoreManager classes
  implement region bindings.

MEMORY REGIONS and REGION TAXONOMY

  SYMBOLIC REGIONS

  A symbolic region carries the concept of symbolic values over into the domain
  of regions: it is how we represent symbolic pointers. Whenever a symbolic
  pointer value is needed, a symbolic region is created to represent it.

  A symbolic region has no type; it simply wraps a SymbolData. Sometimes,
  however, we do have type information associated with a symbolic region. In
  that case, a TypedViewRegion is created to layer the type information on top
  of the symbolic region. We do not store type information in the symbolic
  region itself because a symbolic region may have no type at all. For
  consistency, no symbolic region carries type information.
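
  A minimal sketch of this layering follows. It uses hypothetical names
  (SymRegion, TypedView, typeOf) rather than the analyzer's classes. The
  symbolic region holds only a symbol identifier, and any type comes from a
  separate view that points back at it.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical toy model: a symbolic region wraps only a symbol identifier
// and never carries a type.
struct SymRegion {
  unsigned symbol;
};

// When type information is known, it is layered on top of the untyped region.
struct TypedView {
  std::string type;
  const SymRegion *base;
};

// A bare symbolic region has no type to report.
std::optional<std::string> typeOf(const SymRegion &) { return std::nullopt; }

// A typed view reports the type it layers over its base region.
std::optional<std::string> typeOf(const TypedView &v) { return v.type; }
```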

  Like a symbolic pointer, a symbolic region may be NULL, has unknown extent,
  and represents a generic chunk of memory.

  NOTE: We plan to stop using loc::SymbolVal in RegionStore and to remove it
  gradually.

Pointer Casts

Pointer casts allow programmers to impose different 'views' onto a chunk of
memory.

There are usually two kinds of casts. Down casts move down within the type
hierarchy; they impose more specific views onto more generic memory regions. Up
casts move up within the type hierarchy; they strip away the more specific
views layered on top of more generic memory regions.

We simulate down casts by layering another TypedViewRegion on top of the
original region, and up casts by stripping away the top TypedViewRegion. Down
casts are usually simple. For up casts, if there is no TypedViewRegion to be
stripped, we return the original region. If the underlying region has a
different type than the cast-to type, we flag an error state.

For toll-free bridging casts, we return the original region.
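
The sketch below is one possible reading of these rules. It assumes that a
region can be summarized by the type of the underlying region plus a stack of
view types. ViewedRegion, castDown, castUp and castTollFree are hypothetical
names, and std::nullopt represents the error state. The sketch interprets the
error check as comparing the cast-to type with whatever remains after the top
view is stripped.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model: an underlying region's type plus a stack of typed
// views layered on top of it (the top view is views.back()).
struct ViewedRegion {
  std::string baseType;
  std::vector<std::string> views;
  // The type the region is currently viewed as.
  std::string type() const { return views.empty() ? baseType : views.back(); }
};

// Down cast: layer another typed view on top of the region.
ViewedRegion castDown(ViewedRegion r, const std::string &toType) {
  r.views.push_back(toType);
  return r;
}

// Up cast: strip the top view if there is one; with no view to strip, the
// original region is kept. A mismatch with the cast-to type is reported as
// the error state (std::nullopt).
std::optional<ViewedRegion> castUp(ViewedRegion r, const std::string &toType) {
  if (!r.views.empty()) r.views.pop_back();
  if (r.type() != toType) return std::nullopt;
  return r;
}

// Toll-free bridging casts return the original region unchanged.
ViewedRegion castTollFree(const ViewedRegion &r) { return r; }
```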

Region Bindings

The following region kinds are boundable: VarRegion, CompoundLiteralRegion,
StringRegion, ElementRegion, FieldRegion, and ObjCIvarRegion.
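
As a small illustration, the boundable set can be written as a predicate over
region kinds. The RegionKind enum and isBoundable below are hypothetical; they
are not clang's region-kind enumeration. SymbolicRegion and TypedViewRegion
are absent from the list above, so the sketch treats them as not boundable.

```cpp
#include <cassert>

// Hypothetical enumeration of region kinds mentioned in this document.
enum class RegionKind {
  Var, CompoundLiteral, String, Element, Field, ObjCIvar,
  Symbolic, TypedView
};

// Only the kinds listed as boundable may have values bound to them directly.
constexpr bool isBoundable(RegionKind k) {
  switch (k) {
    case RegionKind::Var:
    case RegionKind::CompoundLiteral:
    case RegionKind::String:
    case RegionKind::Element:
    case RegionKind::Field:
    case RegionKind::ObjCIvar:
      return true;
    default:
      return false;
  }
}
```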

When binding regions, we perform canonicalization on element regions and field
regions. This is because we can have different views on the same region, some of
which are essentially the same view spelled with different sugared type names
(for example, a typedef name versus the type it stands for).

To canonicalize a region, we get the canonical types for all TypedViewRegions
along the way up to the root region, and make new TypedViewRegions with those
canonical types.
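
The canonicalization step can be sketched by modeling a region as its root
plus the types of the TypedViewRegions layered above it. This is a hypothetical
illustration: TypedefTable, canonicalType, ViewChain and canonicalize are
invented names. Resolving names through a typedef map stands in for asking the
type system for a canonical type.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical typedef table: sugared name -> the type it aliases.
// Assumes the table has no cycles, as real typedef chains cannot.
using TypedefTable = std::map<std::string, std::string>;

// Resolve a type name through typedef chains to its canonical spelling.
std::string canonicalType(const TypedefTable &td, std::string t) {
  for (auto it = td.find(t); it != td.end(); it = td.find(t)) t = it->second;
  return t;
}

// A region as its root plus the types of the views above it (innermost first).
struct ViewChain {
  std::string root;
  std::vector<std::string> viewTypes;
};

// Rebuild every view along the chain with its canonical type, so views that
// differ only in sugar become identical.
ViewChain canonicalize(const TypedefTable &td, const ViewChain &r) {
  ViewChain out{r.root, {}};
  for (const auto &t : r.viewTypes) out.viewTypes.push_back(canonicalType(td, t));
  return out;
}
```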

All bindings and retrievals are done on the canonicalized regions.

Canonicalization is transparent outside the region store manager; more
specifically, it is invisible outside the Bind() and Retrieve() methods. We
therefore don't need to consider region canonicalization when handling pointer
casts.
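
To illustrate why callers never see canonicalization, here is a minimal sketch
of a store that canonicalizes region keys inside its own bind() and retrieve()
operations. CanonicalizingStore is hypothetical, and regions are reduced to
strings. The canonicalizer is supplied by the caller, so the sketch simply
assumes that such a function exists.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical store that hides canonicalization: callers pass any view of a
// region, and bind()/retrieve() canonicalize the key internally.
class CanonicalizingStore {
  std::function<std::string(const std::string &)> canon;  // region -> canonical key
  std::map<std::string, int> bindings;

public:
  explicit CanonicalizingStore(std::function<std::string(const std::string &)> c)
      : canon(std::move(c)) {}

  void bind(const std::string &region, int v) { bindings[canon(region)] = v; }

  std::optional<int> retrieve(const std::string &region) const {
    auto it = bindings.find(canon(region));
    if (it == bindings.end()) return std::nullopt;
    return it->second;
  }
};
```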