DySy: Dynamic Symbolic Execution for Invariant Inference

Presented on June 25, 2018
Presenter: Mika

Preview

Mika will present DySy: Dynamic Symbolic Execution for Invariant Inference by Csallner, Tillmann, and Smaragdakis. For more information on Daikon, see this paper.

Summary

Problem

Daikon introduced dynamic invariant inferencing, but using Daikon templates is limiting (exponential growth when instantiating templates, lots of instantiated templates are wasted). Instead of searching through a library of templates at every pre-selected program point, DySy uses a shadow interpreter to follow concrete execution over a test set and collect symbolic conditions for every instruction executed.

Afterwards, the concrete values from the tests are used to generate a program precondition, while the symbolic conditions are used to create a postcondition. Having both of these, we can send them to a theorem prover to simplify and return a set of invariants which are practically observable (true as far as one can tell, and importaant to the developers). This practicality is deduced from using the test set as input: the authors presume that a test set is a good guide to how a functionality should be used in practice.

Evaluation

The evaluation of DySy is done on the Daikon benchmarks. Even though it’s around three times slower, DySy finds a subset of invariant that has been deemed practically important by a human tester, after both tools have finished working. Daikon finds lots more, and generates more common subexpressions – a metric which can be used to tell which inferencer needs more effort to find the results. In this view, the authors are saying DySy is an upgrade and basis for future evolutions of invariant inferencing. There is no real discussion of how much the quality of the test set influences these metrics, nor of how important invariant coverage is, that is, the problem where, given an application, one isn’t searching for the obvious invariants, but rather the ones that aren’t as obvious, and which a wider tool, like Daikon, might actually find by sheer luck.

Discussion

We discussed far and wide about the different aspects of invariant inferencing that the authors miss while talking about their tool. Ben asked why the authors do path-based symbolic execution instead of regular full symbolic execution, and we concluded that knowing the paths of the concrete inputs makes the symbolic execution engine have lots of work less to do, thus making it easier to operate.

An important note is that the power of the DySy method depends, and this without any mention from the authors, on the similarity between the concrete and symbolic execution engines: if the symbolic execution engine cannot replicate some state that the concrete one has done, then the postconditions can be emitted in such a way that they do not match (or do not fully match but still align, in the sense of an overapproximation) with the preconditions.

A number of interesting ideas has been mentioned while the paper was presented, namely, where we can plug a fuzzer in and make this (or, really, any similar method) better. There has been mention of using a similar method for API use-case inference from the API interfaces, but I need to write it down in more detail to be able to talk about it.

Attachments

Slides