Enforcing High-Level Protocols in Low-Level Software

Presented on April 9, 2018
Presenter: Michael

Preview

Michael will present “Enforcing High-Level Protocols in Low-Level Software” by Robert DeLine and Manuel Fähndrich.

Summary

Problem

Safe languages provide many benefits for general-purpose programming, such as type and memory safety. However, they have not seen (in the year 2001 when this was written) widespread adoption in building low-level infrastructure software. This type of software needs to manage resource management protocols, such as regulate resource references (to mitigate dangling pointers, leaks, race conditions) and enforce usage rules. A language is needed with appropriate primitives for handling these concerns.

Solution

DeLiene and Fähndrich’s solution is to create a C-like language called Vault, which allows the specification (and more important, static verification) of domain-specific resource management protocols. Vault’s novel features are: 1. Keys: compile-time tokens representing run-time resources (not duplicated nor lost), each key can have state too (via stateset construct) 2. Held-key set: set of keys in possesion at a given program point 3. Type guards: predicates describing when operations on a value are valid

They provide several ways to manipulate keys: - Tracked types: 1-1 correspondence between compile-time key and run-time object - Functions: annotate with an /effect-clause/, which defines the pre- and post-conditions on the held-key set contents - Parameterized types: types are polymorphic over keys - Keyed variants: like algebraic datatypes, allows typechecker to safely move between static and dynamic knowledge regarding held-key set.

They present several small examples to demonstrate these features. They show how keyed variants can help express correlation between state values and functions. They also show how Vault can be used to catch dangling references and memory leaks when building a region abstraction. They also demonstrate that keys’ states can be used to enforce that the correct steps are followed, in order, to create and listen on a socket.

interface SOCKET {
  type sock;
  variant domain ['UNIX | 'INET]; variant comm_style ['STREAM | 'DGRAM];
  variant status<key K> [ 'OK {K@named} | 'Error(error_code){K@raw} ];
  struct sockaddr { ... };

  tracked(S@raw) sock socket(domain, comm_style, int);
  tracked status<S> bind(tracked(S) sock, sockaddr) [-S@raw]; 
  void listen(tracked(S) sock, int) [S@named->listening];
  tracked(N) sock accept(tracked(S) sock, sockaddr) [S@listening, new N@ready];
  void receive(tracked(S) sock, byte[]) [S@ready];
  void close(tracked(S) sock) [-S];
}
tracked (@raw) sock mysocket = socket('UNIX, 'INET, 0);
switch (bind(mysocket, mysockaddr)) {
  case 'OK:
    list(mysocket, 0);
    ...
  case 'Error(code):
    // report an error.
  ...
}  

Their type system is based on Crary’s /Capability Calculus/. Some noteworthy parts include how the concrete syntax ~tracked T~ translates to a singleton type s(r), so that every alias for a resource is given the same singleton type. They also use existential types (∃[N|C].τ) (which are used to encode that a value carries a capability with them). For example, the declaration ~tracked region create();~ has type $(,) (,.(r)). This means that calling create returns a new region and the key needed to access that region; the type binds new region (singleton type) and key.

One interesting limitation that becomes immediately apparent is how Vault can reconcile using compile-time names to statically track resources with instances where you are dealing with an unbounded number of resources. They use anonymous tracked types to solve this problem, with some loss of programming convenience. For example, in ~variant reglist [’Nil|’Cons(tracked region, tracked reglist)];~, reglist stores an unbounded number of regions, but you lose information about which key guards which region when you place them in a list like this (as the list takes ownership of the region but doesn’t have a place to store those correlated keys in a user-visible way). The following is an example of this anonymization inconvenience:

void main() {
  tracked(R) region rgn = Region.create();
  R:point pt = new(rgn) point {x=4;y=2;};
  tracked reglist list = 'Cons(rgn,'Nil);
  // We lost key R
  ...
  switch(list) {
    case 'Cons(rgn2,_): // got _some_ key back
      pt.x++;           // Bug! we need key R...
  }
}

Another limitation is the type agreement at join points, which is a typical over-conservative problem of typechecking in general.

This paper seems to resemble Rust; namely the transfer of ownership of keys (removing and adding back of keys into the held-key set, i.e. “consumption”) is like ownership types. Like Rust, Vault’s non-duplicatable, unloseable keys facilitate safe access to mutable objects by enforcing a form of mutual exclusion.

Evaluation

Their evaluation involved translating 4900 lines of C to 5200 lines of Vault, then using their compiler to translate it back to C. They had to write wrappers around C in order to link back into the original kernel (since in some cases their data representations had changed).

For case studies, they worked on Windows 2000 drivers. They demonstrate: - using tracked types to reflect the I/O Request Packets ownership model - coordinating threads access to locked objects safely via keys (detecting missing lock releases) - ensuring correct calls of IRP completion routines - representing the interrupt level via a key with several states to guard data in paged memory

They cite /typestate/ as inspiration, which allows the specification of operations allowed on data based on context. They cite the Capability Calculus as influence (their type system is so similar, they claim, that they don’t need to include the rules in this paper), and give very minor differences between the two works. Concurrent Clean uses /unique/ types to represent unaliased objects, like Vault’s anonymous tracked types. Extended Type Checking (ESC) is similarly motivated but is used for high-level languages; they say their approaches are complimentary. SLAM focuses on C by using a model checker to search an abstraction of a C program for protocol violations, iterating until found or exhausted.

Attachments