Memory safety without garbage collection for embedded applications

Presented on May 7, 2018
Presenter: Michael

Preview

Michael will present Memory safety without garbage collection for embedded applications by Dinakar Dhurjati, Sumant Kowshik, Vikram Adve, and Chris Lattner. If you want to read a shorter but less detailed version, you can read Memory safety without runtime checks or garbage collection, also by Dhurjati et al.

Summary

Problem

This paper elaborates on the Dhurjati’s prior work (2003), expanding the previous paper somewhat. Like before, the problem is that enforcing memory safety via runtime checks and garbage collection requires too high an overhead for embedded applications. 100% static checking is ideal, but due to the undecidability of problems involving unrestricted pointers and memory accesses, some compromises must often be made, whether that be limiting the language semantics or requiring these unwanted runtime checks.

Solution

This paper proposes a mix of compiler techniques and C-semantics restrictions to ensure memory safety and error-detection capabilities without GC. Their key results are: 1. statically ensuring that dereferencing dangling pointers is safe without runtime checks, GC, or added annotations 2. use an interprocedural analysis for static array bounds checking

Complex array references prove to be the greatest challenge, causes their static checks to fail such that run-time checks must be added for about half of their benchmark programs. Among some of the assumptions they make are that a system has a reserved address range (e.g. addresses reserved for the kernel), whose access causes a safe run-time error (e.g. page fault); this is important because it allows them to initialize global pointers to this address to in helping ensure memory safety.

DSA

This paper explains data structure analysis (DSA) and automatic pool allocation more extensively than their previous paper. Data structure analysis computes a data structure graph (a points-to-graph) for memory objects. A separate graph is computed for each function, and all functions within a strongly connected component share a single graph, with different nodes in the graph representing different objects. The algorithm for computing DSA is fully context and field sensitive (uses full acyclic call paths and distinguishes pointer fields in structures, respectively). This technique is useful when doing pool allocation.

APA

The automatic pool allocation transformation rewrites a program to segregate heap objects into multiple pools, using separate pools for each logical data structure instance. It involves 1) identifying pools within each data structure, 2) identifying where to create/destroy pools, and 3) transforming (de) allocationg operations and function interfaces. This results in heap objects be assigning to type-homogeneous pools and nodes in disjoint data structure instances being put in different pools.

Pointer Safety

They prove various pointer safety properties. Automatic pool allocation becomes useful when proving heap safety (preventing unsafe accesses to freed memory). They rely on the type homogeneity princple, which means that it safe to dereference a dangling pointer as long as it always points to a memory block containing data of the original type and alignment (even it’s been reallocated to a different object of that type). Thus they don’t prevent dangling pointer dereferences, they just make them safe. They find that this approach may lead to increased memory consumption in some cases, and develop a compiler algorithm for categorizing pools.

Array Restrictions

The problem with array references is that a compiler must prove that index expressions in these references lie within the array’s bounds on all execution paths; the effectiveness of this analysis is limited by fundamental limits on analysis of symbolic integer expressions. They show techniques to generate Presburger-arithmetic constraints on array use, based on the a set of language rules for arrays, such positive array sizes, the existence of provably affine relationships between array references and enclosing loop index variables, etc. In addition, they rely on a set of trusted library functions implied constraints and preconditions which are propagated into the constraint set upon use. Any relevant unconstrained variables (including those involved in non-affine expressions) cause the array access to be marked as unsafe. After generating the array reference constraints, they add conditions for array bounds violations (e.g. ) and use the Omega library to check for satisfiability; the array access is safe if the system is unsatisfiable.

Evaluation

Their evaluation involves measuring the effect on execution time and memory usage their changes had, as well effort involved in making them and general ability to statically check embedded C programs (with minimal changes). The porting effort on their MiBench and MediaBench benchmarks is ‘negligible’. They could enforce heap and pointer safety in all programs via their DSA and pool allocation techniques, with very marginal increases in execution time and memory usage (except for one program, where memory was less efficiently used with the pool allocation technique because of page size inefficiencies). They achieved one false positive in the stack safety checks. Finally, the array bounds checking failed to prove safety in more than half of the programs because they used non-affine bit operations on index variables or indirect indexing of arrays, which their static checks could not handle properly.

Discussion

There was ample discussion on data structure analysis and the current state of LLVM (where this is no longer implemented).

Attachments

Slides